What is a Transformer?


A Transformer is a neural network architecture that underlies virtually all modern large language models; it uses self-attention to process sequential data in parallel rather than token by token.

WHY IT MATTERS

The Transformer architecture, introduced in the 2017 paper "Attention Is All You Need," is the foundation of modern AI. Its self-attention mechanism allows the model to weigh the importance of different parts of the input relative to each other — capturing long-range dependencies that previous architectures (RNNs, LSTMs) struggled with.

The key innovation: parallelism. Unlike sequential models, Transformers process all tokens simultaneously during training, enabling massive scaling on GPU hardware. This is what made models with hundreds of billions of parameters feasible.

Every major LLM — GPT, Claude, Gemini, Llama — is based on the Transformer architecture, with variations in attention patterns, positional encoding, and training methodology.

FREQUENTLY ASKED QUESTIONS

What is self-attention?
A mechanism in which each token computes a weighted relationship with every other token in the sequence. This lets the model use context regardless of distance in the input.

Why are Transformers better than RNNs?
Parallelization (faster training), long-range dependencies (any position can attend to any other), and scalability (more data and parameters consistently improve performance).

Will Transformers be replaced?
Alternatives are being explored (state space models such as Mamba, and RWKV), but Transformers remain dominant. Any replacement would need to match their scaling properties.
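The self-attention mechanism described above can be sketched in a few lines of NumPy. This is a minimal, illustrative single-head version of scaled dot-product attention; the shapes, weight matrices, and values here are made up for the example, not taken from any real model.

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention over token vectors X."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv                 # project tokens to queries/keys/values
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax -> attention weights per token
    return weights @ V                               # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_model = 4, 8                              # 4 tokens, 8-dim embeddings (illustrative)
X = rng.normal(size=(seq_len, d_model))
Wq = rng.normal(size=(d_model, d_model))
Wk = rng.normal(size=(d_model, d_model))
Wv = rng.normal(size=(d_model, d_model))

out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8): one context-aware vector per token
```

Note that the whole sequence is processed in one set of matrix multiplications — this is the parallelism that lets Transformers scale on GPU hardware, in contrast to an RNN's step-by-step loop.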
