Dremio Blog

9 minute read · April 22, 2025

A Journey from AI to LLMs and MCP — 2 — How LLMs Work — Embeddings, Vectors, and Context Windows

Alex Merced Head of DevRel, Dremio

Start For Free

Copied to clipboard

A Journey from AI to LLMs and MCP — 2 — How LLMs Work — Embeddings, Vectors, and Context Windows

How LLMs Think: It’s All Math Underneath

Embeddings: From Words to Numbers

What is an embedding?

Vector Search and Semantic Understanding

Context Windows: The Model’s Working Memory

Limitations of Embeddings and Context Windows

Embedding limitations:

Context window limitations:

Recap: Key Concepts from This Post

Up Next: Making LLMs Smarter with Fine-Tuning, Prompt Engineering, and RAG

In our last post, we explored the evolution of AI — from rule-based systems to deep learning — and how Large Language Models (LLMs) like GPT-4 and Claude represent a transformative leap in capability.

But how do these models actually work?

In this post, we’ll peel back the curtain on the inner workings of LLMs. We’ll explore the fundamental concepts that make these models tick: embeddings, vector spaces, and context windows. You’ll walk away with a clearer understanding of how LLMs “understand” language — and what their limits are.

How LLMs Think: It’s All Math Underneath

Despite their fluent text output, LLMs don’t truly “understand” language in the human sense. Instead, they operate on numerical representations of text, using vast networks of mathematical weights to predict the next word in a sequence.

The key mechanism behind this: transformers.

Transformers revolutionized NLP by allowing models to weigh the relevance of each word in a sentence — attention mechanisms — instead of processing words one-by-one like RNNs.

Here’s the simplified flow:

Text is tokenized (split into chunks)
Tokens are converted into embeddings (vectors)
Those vectors pass through layers of attention to capture meaning
The model generates the next token based on probability

But what are these embeddings and why do they matter?

Try Dremio’s Interactive Demo

Explore this interactive demo and see how Dremio's Intelligent Lakehouse enables Agentic AI

Embeddings: From Words to Numbers

Before an LLM can do anything with language, it must convert words into numbers it can operate on.

That’s where embeddings come in.

What is an embedding?

An embedding is a high-dimensional vector (think: a long list of numbers) that represents the meaning of a word or phrase.

Words with similar meanings have similar embeddings.

For example:

Embedding("dog") ≈ Embedding("puppy") Embedding("Paris") ≈ Embedding("London")

These vectors live in an abstract vector space, where distance encodes similarity.

LLMs use embeddings not just for input, but throughout every layer of their neural network to understand relationships, context, and meaning.

Vector Search and Semantic Understanding

Because embeddings encode meaning, they’re also incredibly useful for semantic search.

Instead of matching exact words (like keyword search), vector search compares embeddings to find text that’s conceptually similar.

For example:

Query: “How do I fix a leaking pipe?”
Match: “Plumbing repair for minor water leaks”

Even though the words don’t overlap, the meaning does — and that’s what embeddings capture.

This is the foundation for many powerful AI techniques like:

Document similarity
Retrieval-Augmented Generation (RAG) (more on this in Blog 3)
Context injection from external data sources

Context Windows: The Model’s Working Memory

Another crucial concept in LLMs is the context window — the maximum number of tokens the model can “see” at once.

Every input to an LLM gets broken into tokens, and the model has a limited capacity for how many tokens it can process per request.

ModelMax Context WindowGPT-3.54,096 tokens (~3,000 words)GPT-4 TurboUp to 128,000 tokensClaude 3 OpusUp to 200,000 tokens

If you go over the limit, you’ll need to:

Truncate input (losing information)
Summarize
Use techniques like RAG or memory management

TL;DR: The larger the context window, the more the model can “remember” during a conversation or task.

Limitations of Embeddings and Context Windows

Even though LLMs are powerful, they come with trade-offs:

Embedding limitations:

Don’t always reflect nuanced context (e.g., sarcasm, tone)
Fixed dimensionality: can’t represent everything
Require separate handling for different modalities (text vs images)

Context window limitations:

Long documents may get truncated or ignored
Memory is not persistent — everything resets after a session unless you manually re-include previous context
More tokens = higher latency and cost

These limits are precisely why so much effort goes into enhancing LLMs through fine-tuning, retrieval systems, and smarter prompt engineering.

We’ll dive into that next.

Recap: Key Concepts from This Post

Concept	What It Is	Why It Matters
Embeddings	Vector representations of tokens/text	Enable semantic understanding & search
Vector Space	Mathematical space where embeddings live	Allows similarity comparison & clustering
Context Window	Max token size per LLM input	Defines how much the model can “see”
Attention	Weighs token relationships dynamically	Enables context awareness in LLMs

Up Next: Making LLMs Smarter with Fine-Tuning, Prompt Engineering, and RAG

In our next post, we’ll show how to enhance LLM performance using proven techniques:

Fine-tuning
Prompt engineering
Retrieval-Augmented Generation (RAG)

These strategies help you move beyond limitations — and get the most out of your models.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.

Start For Free

Article Topics

Dremio Blog: Various Insights

Blog coverpage for Ingesting Data into Aparche Iceberg with Dremio

Feb 1, 2024 Product Insights from the Dremio Blog

Ingesting Data Into Apache Iceberg Tables with Dremio: A Unified Path to Iceberg

By unifying data from diverse sources, simplifying data operations, and providing powerful tools for data management, Dremio stands out as a comprehensive solution for modern data needs. Whether you are a data engineer, business analyst, or data scientist, harnessing the combined power of Dremio and Apache Iceberg will undoubtedly be a valuable asset in your data management toolkit.

Alex Merced

Sep 22, 2023 Dremio Blog: Open Data Insights

Intro to Dremio, Nessie, and Apache Iceberg on Your Laptop

We're always looking for ways to better handle and save money on our data. That's why the "data lakehouse" is becoming so popular. It offers a mix of the flexibility of data lakes and the ease of use and performance of data warehouses. The goal? Make data handling easier and cheaper. So, how do we […]

Alex Merced

Oct 12, 2023 Product Insights from the Dremio Blog

Table-Driven Access Policies Using Subqueries

This blog helps you learn about table-driven access policies in Dremio Cloud and Dremio Software v24.1+.

Albert Vernon

A Journey from AI to LLMs and MCP — 2 — How LLMs Work — Embeddings, Vectors, and Context Windows

Table of Contents

How LLMs Think: It’s All Math Underneath

Try Dremio’s Interactive Demo

Embeddings: From Words to Numbers

What is an embedding?

Vector Search and Semantic Understanding

Context Windows: The Model’s Working Memory

Limitations of Embeddings and Context Windows

Embedding limitations:

Context window limitations:

Recap: Key Concepts from This Post

Up Next: Making LLMs Smarter with Fine-Tuning, Prompt Engineering, and RAG

Try Dremio Cloud free for 30 days

Ready to Get Started?

Table of Contents

How LLMs Think: It’s All Math Underneath

Try Dremio’s Interactive Demo

Embeddings: From Words to Numbers

What is an embedding?

Vector Search and Semantic Understanding

Context Windows: The Model’s Working Memory

Limitations of Embeddings and Context Windows

Embedding limitations:

Context window limitations:

Recap: Key Concepts from This Post

Up Next: Making LLMs Smarter with Fine-Tuning, Prompt Engineering, and RAG

Try Dremio Cloud free for 30 days

Related Dremio Articles

Ingesting Data Into Apache Iceberg Tables with Dremio: A Unified Path to Iceberg

Intro to Dremio, Nessie, and Apache Iceberg on Your Laptop

Table-Driven Access Policies Using Subqueries

Ready to Get Started?