Dremio Blog

8 minute read · December 19, 2025

How LLMs Work: Tokens, Embeddings, and Transformers

Will Martin · Technical Evangelist

Key Takeaways

  • LLMs use mathematical representations to process language differently than humans, focusing on tokenization and embeddings.
  • Tokenization breaks text into smaller units, which helps LLMs build a comprehensive vocabulary and understand syntax.
  • Embeddings are numerical representations that define token meanings in a multi-dimensional space, enabling conceptual similarity searches.
  • Transformers, a type of neural network, leverage tokenization and embeddings to predict subsequent tokens and understand context effectively.
  • While LLM techniques are complex, they conceptually mirror how humans process language, using weights and attention mechanisms.

Large Language Models (LLMs) are capable of understanding and generating language. However, they do not understand or process language in the same way that you or I do.

When reading text, humans construct meaning by processing the syntax of each sentence as it unfolds. This involves combining word definitions with context from the rest of the sentence, drawing on prior knowledge, and even anticipating upcoming words.

In comparison, LLMs operate by converting the text into numerical representations, then using vast networks of mathematical weights to predict the next word in a sequence. To put it in terms of school subjects: humans use English Language while LLMs use Mathematics.

At a high-level, the LLM process is as follows:

  • Text is split into chunks.
  • Chunks are converted into mathematical vectors.
  • Vectors pass through layers of attention to capture their meaning.
  • The model uses the vectors to guess the next probable chunk.
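The four stages above can be sketched in miniature. Everything here is illustrative: the whitespace tokenizer, the three-token vocabulary, the made-up 3-dimensional vectors, and the averaging stand-in for attention are toys, not how any real model works internally.

```python
# Toy sketch of the four LLM stages above. All names and values are
# hypothetical; real models use learned subword tokenizers, vectors with
# hundreds of dimensions, and attention layers rather than averaging.

def tokenize(text):
    # Stage 1: split text into chunks.
    return text.lower().split()

EMBEDDINGS = {  # Stage 2: hypothetical 3-dimensional vectors per token.
    "the": [0.1, 0.0, 0.2],
    "cat": [0.3, 0.4, 0.1],
    "sat": [0.5, 0.9, 0.6],
}

def embed(tokens):
    return [EMBEDDINGS.get(t, [0.0, 0.0, 0.0]) for t in tokens]

def contextualize(vectors):
    # Stage 3 stand-in: blend each vector with the average of all
    # vectors, loosely imitating how attention mixes context into
    # each position.
    n = len(vectors)
    avg = [sum(v[i] for v in vectors) / n for i in range(3)]
    return [[(x + a) / 2 for x, a in zip(v, avg)] for v in vectors]

def predict_next(vectors):
    # Stage 4 stand-in: pick the vocabulary token whose embedding has
    # the largest dot product with the final context vector.
    last = vectors[-1]
    dot = lambda u, v: sum(a * b for a, b in zip(u, v))
    return max(EMBEDDINGS, key=lambda t: dot(EMBEDDINGS[t], last))

tokens = tokenize("The cat")
print(predict_next(contextualize(embed(tokens))))
```

With these particular toy vectors, the model continues "The cat" with "sat", but only because the numbers were chosen to make it do so; a real model learns such patterns from vast training data.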


Tokenization

The first part of the process, tokenization, breaks the text block into small units for the model to analyse. The resulting "tokens" can be:

  • whole words,
  • parts of words (useful for complex or uncommon words),
  • single characters (e.g. individual letters or numbers),
  • or even punctuation. 

The complete set of tokens output by the tokenization process is known as an LLM's “vocabulary”: the range of tokens that the LLM will be able to recognise and generate. At the time of writing, modern LLMs typically have vocabularies ranging from 30,000 to 100,000 tokens.

As with people, the size of an LLM's vocabulary affects how well it can understand and generate text. An LLM with an extensive vocabulary can handle more words and express more nuanced concepts. However, unlike with people, bigger vocabularies have a distinct drawback for LLMs: models with larger vocabularies are more computationally intensive to run, which affects the speed and cost of using the model. As such, the best-performing LLMs strike a balance between model efficiency and output quality.
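A minimal sketch of how subword tokenization can cover a rare word using common pieces. The vocabulary below is hypothetical and the greedy longest-match loop is only similar in spirit to how real tokenizers (such as byte-pair encoding) apply their learned vocabularies.

```python
# Toy greedy longest-match subword tokenizer. The vocabulary is
# hypothetical; real LLMs learn theirs from huge corpora and hold
# tens of thousands of entries.

VOCAB = {"un", "break", "able",
         # single characters as a fallback for unseen sequences
         "u", "n", "b", "r", "e", "a", "k", "l"}

def tokenize(word):
    tokens = []
    i = 0
    while i < len(word):
        # Take the longest vocabulary entry matching at position i,
        # falling back to a single character if nothing longer fits.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
    return tokens

print(tokenize("unbreakable"))  # splits a rare word into known pieces
```

Even if "unbreakable" never appeared whole in training data, the model can still represent it as the familiar pieces "un", "break", and "able".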

Embeddings

Using tokenization, the block of text is broken down into parts for the LLM to process. However, these units of language must first be converted into numbers before the LLM can understand and work with them. This is where embeddings come into play. 

LLMs use embeddings throughout every layer of their neural network to understand the relationships, context, and meanings of tokens. An embedding is a numerical representation of a token's meaning, defined as a high-dimensional vector, i.e. a very long list of numbers. At the time of writing, a typical token vector consists of hundreds to thousands of floating-point numbers.

The embedding vector defines a token’s meaning as a position in a multi-dimensional space. The closer two vectors are in this “embedding” space, the closer their relationship and their meanings are to each other. So while dealing with thousands of floating-point numbers sounds complicated, conceptually it is quite simple: the more similar the numbers, the more similar the words.

For example, "cat" will have more similar embedding values to "lion" than it does to "moon". Embeddings also encode a token's context: the token for "hat" in "hi-hat" will have different embedding values than "hat" in "bowler hat". This is what enables Vector Search, the process of searching for words that are conceptually similar rather than spelt the same (as in Keyword Search).
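The "cat"/"lion"/"moon" comparison can be made concrete with cosine similarity, a standard way to measure how close two vectors point in embedding space. The 4-dimensional vectors below are made up for illustration; real embeddings are far longer, but the arithmetic is identical.

```python
import math

# Hypothetical low-dimensional embeddings. Real embeddings have
# hundreds to thousands of dimensions, but the similarity
# computation is the same.
embeddings = {
    "cat":  [0.9, 0.8, 0.1, 0.2],
    "lion": [0.8, 0.9, 0.2, 0.1],
    "moon": [0.1, 0.1, 0.9, 0.8],
}

def cosine_similarity(u, v):
    # Dot product of the vectors, normalised by their lengths,
    # giving a score near 1 for similar directions.
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

print(cosine_similarity(embeddings["cat"], embeddings["lion"]))
print(cosine_similarity(embeddings["cat"], embeddings["moon"]))
```

With these toy values, "cat" scores much higher against "lion" than against "moon", which is exactly the comparison a vector search engine performs across millions of stored embeddings.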

Transformers

Transformers are a type of neural network model that leverage both tokenization and embeddings to understand blocks of text. They represent the technological breakthrough that revolutionized how LLMs process natural language.

Transformers train on large text sets, using probability to predict the next token in each sequence. This prediction task is repeated throughout training until the model builds reliable patterns of successful predictions. This is also how LLMs are evaluated on text comprehension: by reliably predicting what comes next. Literature, search requests, programming code, and chat logs are all different types of text with distinct patterns that LLM models can learn and replicate.
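The "predict the next token" objective can be illustrated with a counts-based stand-in. This is not how a transformer works internally (it uses neural network layers, not a count table), but it shows the task the network is trained on: given a token, guess its most likely successor from patterns seen in training text.

```python
from collections import Counter, defaultdict

# Count which token follows which in a tiny "training" corpus, then
# predict the most frequent follower. A real transformer learns far
# richer, longer-range patterns, but the objective is the same.
corpus = "the cat sat on the mat the cat ran".split()

followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(token):
    # Return the most common successor seen during "training".
    return followers[token].most_common(1)[0][0]

print(predict_next("the"))
```

In this corpus "the" is followed by "cat" twice and "mat" once, so the model predicts "cat". Scaling this idea from a nine-word corpus to trillions of tokens, and from a count table to billions of learned weights, is essentially the leap transformers make.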

To understand a sentence, the LLM model needs to quantify the relevance of each token that makes up that sentence. This importance is represented by an assigned number, known as a “weight”. However, a given token's importance is not universal; it depends on the position in the sentence. As such, the weighting process is repeated for every token at each position in the sentence.

Attention mechanisms allow a model to focus on specific parts of the sentence, helping it to decide which tokens matter the most at the current point in the text. The more useful detail a token adds, the higher its assigned weight. This effectively builds a relationship map across the entire sentence. While this can be difficult to conceptualize, it is not too dissimilar from how a person would link words in a sentence, such as a verb linking to an object or a pronoun relating to the subject.
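The weighting described above can be sketched with the core arithmetic of scaled dot-product attention: score each token's vector against the current token, then normalise the scores into weights that sum to 1. The 2-dimensional vectors for "the cat sat" are hypothetical, and real attention also involves learned query/key/value projections omitted here.

```python
import math

def softmax(xs):
    # Turn raw scores into positive weights that sum to 1.
    exps = [math.exp(x - max(xs)) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention_weights(query, keys):
    # Scaled dot-product scores: how relevant each token's vector is
    # to the query token, scaled by the vector dimension.
    d = len(query)
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d)
              for key in keys]
    return softmax(scores)

# Hypothetical 2-d vectors for the tokens in "the cat sat".
vectors = {"the": [0.1, 0.2], "cat": [0.9, 0.7], "sat": [0.8, 0.6]}

# From "sat"'s point of view, "cat" receives the largest weight
# because their vectors point in similar directions.
weights = attention_weights(vectors["sat"], list(vectors.values()))
for token, w in zip(vectors, weights):
    print(f"{token}: {w:.2f}")
```

The resulting weights are the "relationship map" for one position; a transformer computes such a map for every token at every layer.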

Summary

That concludes this brief overview of the mechanisms LLMs use to understand text. Whilst these are complex processes on a technical level, they are conceptually quite simple: neural networks use mathematics to approximate the complexities of syntax, word definitions, and context, techniques which, in many ways, mirror how we humans process language ourselves.

Try Dremio Cloud free for 30 days

Deploy agentic analytics directly on Apache Iceberg data with no pipelines and no added overhead.