Lecture 2

LLMs & Agents

From language models to autonomous agents that reason, plan, and take actions.

Nir Naim
Tel Aviv University
Queueing Theory Seminar

Part 1

Large Language Models

Understanding how LLMs work

What is an LLM?

Core Principle

An LLM is a neural network trained to predict the next token given all previous tokens.

\[P(x_{t+1} | x_1, x_2, \ldots, x_t)\]

This simple objective, scaled to billions of parameters and trillions of tokens, produces remarkably capable models.
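
A minimal sketch of this objective at inference time, assuming the Hugging Face transformers library with GPT-2 as a small stand-in model (not named in the lecture): feed a prefix, take the logits at the last position, and softmax them into a distribution over the next token.

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# P(x_{t+1} | x_1, ..., x_t): distribution over the next token
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top = torch.topk(next_token_probs, k=5)
for p, idx in zip(top.values, top.indices):
    print(f"{tokenizer.decode(idx)!r}: {p:.3f}")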

Decoder-Only Architecture

Modern LLMs (GPT-4, Claude, Llama) use decoder-only transformers:

  • Autoregressive: Generate one token at a time
  • Causal masking: Each position only sees previous positions
  • Context window: Fixed max sequence length
[Diagram: Decoder Block × N, consisting of Masked Self-Attention and a Feed-Forward Network, each with Residual + LayerNorm]
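
To make the causal masking concrete, here is a compact PyTorch sketch of one decoder block (a simplified pre-LayerNorm variant, not any particular model's exact layer):

import torch
import torch.nn as nn

class DecoderBlock(nn.Module):
    """Masked self-attention + feed-forward, each wrapped with a residual and LayerNorm."""
    def __init__(self, d_model: int = 512, n_heads: int = 8):
        super().__init__()
        self.ln1 = nn.LayerNorm(d_model)
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(d_model)
        self.ff = nn.Sequential(
            nn.Linear(d_model, 4 * d_model), nn.GELU(), nn.Linear(4 * d_model, d_model)
        )

    def forward(self, x):
        n = x.size(1)
        # Causal mask: position i may only attend to positions <= i
        mask = torch.triu(torch.ones(n, n, dtype=torch.bool), diagonal=1)
        h = self.ln1(x)
        attn_out, _ = self.attn(h, h, h, attn_mask=mask)
        x = x + attn_out                  # residual around attention
        x = x + self.ff(self.ln2(x))      # residual around feed-forward
        return x

x = torch.randn(2, 16, 512)               # (batch, seq_len, d_model)
print(DecoderBlock()(x).shape)             # stack N of these blocks for the full model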

Training Pipeline

01 Pretraining: Next-token prediction on trillions of tokens from the web
02 Fine-tuning (SFT): Supervised learning on curated instruction-response pairs
03 RLHF: Reinforcement learning from human preferences

\[\mathcal{L} = -\sum_{t=1}^{T} \log P(x_t | x_1, \ldots, x_{t-1}; \theta)\]
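
This loss is ordinary cross-entropy on targets shifted by one position; a minimal PyTorch sketch with toy tensors standing in for real model outputs:

import torch
import torch.nn.functional as F

logits = torch.randn(2, 16, 1000)          # (batch, seq_len, vocab): model output at every position
tokens = torch.randint(0, 1000, (2, 16))   # the training sequence itself

# Predict token t from everything before it: shift targets left by one
pred = logits[:, :-1, :].reshape(-1, 1000)
target = tokens[:, 1:].reshape(-1)

loss = F.cross_entropy(pred, target)       # mean of -log P(x_t | x_1, ..., x_{t-1})
print(loss.item())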

Training Data Sources

  • Common Crawl: Petabytes of web pages
  • Books: Literature, textbooks
  • Wikipedia: Encyclopedic knowledge
  • Code: GitHub repositories
  • Scientific papers: arXiv, PubMed

💡 Scale

GPT-3: 175B parameters, trained on ~300B tokens sampled from a ~500B-token dataset
GPT-4: Estimated ~1.8T parameters

Inference

Generating Text

How LLMs produce output

Temperature

Controls randomness by scaling logits before softmax:

\[P(x_i) = \frac{\exp(z_i / T)}{\sum_j \exp(z_j / T)}\]

Prompt: "The capital of France is ___"

[Interactive demo: a temperature slider from deterministic (low T) to random (high T), shown at T = 1.0]
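
A small NumPy sketch of temperature scaling on made-up logits (the three candidate tokens are illustrative), showing how T reshapes the distribution:

import numpy as np

def softmax_with_temperature(logits, T):
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()                            # numerical stability
    e = np.exp(z)
    return e / e.sum()

logits = [5.0, 2.0, 1.0]                    # hypothetical scores for ["Paris", "Lyon", "Berlin"]
for T in (0.1, 1.0, 2.0):
    print(T, np.round(softmax_with_temperature(logits, T), 3))
# T=0.1 puts almost all mass on "Paris"; T=2.0 gives a noticeably flatter distribution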

Sampling Strategies

  • Greedy: Always pick the highest-probability token
  • Top-k: Sample from the k most likely tokens
  • Top-p (Nucleus): Sample from the smallest set of tokens whose cumulative probability reaches p (all three sketched below)
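
A NumPy sketch of the three strategies on a toy next-token distribution (token IDs and probabilities are made up):

import numpy as np

rng = np.random.default_rng(0)

def top_k_sample(probs, k):
    idx = np.argsort(probs)[-k:]                 # k most likely tokens
    p = probs[idx] / probs[idx].sum()            # renormalize
    return rng.choice(idx, p=p)

def top_p_sample(probs, p):
    order = np.argsort(probs)[::-1]              # most likely first
    cdf = np.cumsum(probs[order])
    cutoff = np.searchsorted(cdf, p) + 1         # smallest nucleus covering probability p
    nucleus = order[:cutoff]
    q = probs[nucleus] / probs[nucleus].sum()
    return rng.choice(nucleus, p=q)

probs = np.array([0.5, 0.2, 0.15, 0.1, 0.05])
print(int(np.argmax(probs)))                     # greedy: always token 0
print(top_k_sample(probs, k=3))                  # one of tokens {0, 1, 2}
print(top_p_sample(probs, p=0.8))                # nucleus {0, 1, 2} covers 0.85 >= 0.8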

💡 Context Window

Maximum tokens the model can process. Attention is O(n²), so longer contexts are expensive.
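
A back-of-envelope illustration of that quadratic growth, counting attention scores for a single head in one layer (if the full matrix were materialized in fp16):

for n in (1_024, 8_192, 128_000):
    scores = n * n                               # O(n^2) pairwise attention scores
    print(f"{n:>7} tokens -> {scores:>15,} scores (~{scores * 2 / 1e6:,.0f} MB)")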

Prompting

The Art of Instruction

How to get the best results from LLMs

Zero-Shot Prompting

No examples provided. The model relies on its pretraining knowledge.

Classify the sentiment as positive, negative, or neutral:

"The food was amazing but the service was slow."

Sentiment:

The model outputs: "neutral" or "mixed"

Few-Shot Prompting

Provide examples that demonstrate the pattern:

Review: "Best purchase ever!" โ†’ positive
Review: "Broke after one day." โ†’ negative
Review: "The food was amazing but the service was slow." โ†’

The model learns the format from context and outputs: "mixed"

Chain-of-Thought

Encourage step-by-step reasoning before answering:

# Prompt with "Let's think step by step"

1. "food was amazing" โ†’ positive signal
2. "service was slow" โ†’ negative signal  
3. "amazing" is stronger than "slow"
4. Overall: positive with caveats

💡 Emergent Ability

CoT only works reliably in sufficiently large models

Scaling Laws

Performance scales predictably as a power law:

\[L(N) \propto N^{-0.076}, \quad L(D) \propto D^{-0.095}\]
  • Larger models are more sample efficient
  • No signs of diminishing returns at current scales
  • Some abilities emerge suddenly at certain scales
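
For intuition, plugging the exponents above into a quick calculation of what a 10x scale-up buys (this applies to the power-law term of the loss, not any irreducible floor):

print(10 ** -0.076)   # ~0.84: 10x more parameters -> roughly 16% lower loss
print(10 ** -0.095)   # ~0.80: 10x more data       -> roughly 20% lower loss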

Part 2

LLM Agents

From text generation to autonomous action

What is an Agent?

Definition

LLM Agent = LLM (reasoning) + Tools (actions) + Memory (state) + Loop (orchestration)

[Diagram: the LLM at the center, connected to Input, Output, Tools, and Memory]

The Agent Loop

1. Perceive: Receive input from the user or environment
2. Think: The LLM reasons about what to do
3. Act: Execute a tool or respond
4. Observe: Process the action's result
5. Repeat: Until the task is complete

↻ Loop: Continuous reasoning cycle (sketched in code below)
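
A stripped-down version of this loop in plain Python. Here call_llm is a hypothetical placeholder for a chat-model API call, and the convention that tool calls arrive as JSON is an assumption for the sketch:

import json

def call_llm(messages: list[dict]) -> str:
    """Hypothetical helper: send the conversation to a chat model, return its reply."""
    raise NotImplementedError

TOOLS = {
    "calculate": lambda expr: str(eval(expr)),       # demo only; never eval untrusted input
}

def run_agent(task: str, max_steps: int = 10) -> str:
    messages = [{"role": "user", "content": task}]   # memory = the message history
    for _ in range(max_steps):
        reply = call_llm(messages)                   # Think
        messages.append({"role": "assistant", "content": reply})
        try:
            action = json.loads(reply)               # e.g. {"name": "calculate", "args": {...}}
        except json.JSONDecodeError:
            return reply                             # plain text means a final answer
        result = TOOLS[action["name"]](**action["args"])            # Act
        messages.append({"role": "tool", "content": str(result)})   # Observe, then loop
    return "Stopped: step limit reached."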

Tool Use / Function Calling

LLMs output structured requests that external code executes:

# Define tools
tools = [
    {"name": "search_web", "description": "Search for info"},
    {"name": "calculate", "description": "Do math"}
]

# User: "What's 15% of 847?"
# LLM outputs: {"name": "calculate", "args": {"expr": "847 * 0.15"}}
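
On the application side, that structured output gets parsed and routed to real code. A minimal dispatch sketch (the exact schema and parsing differ by provider, so treat this as illustrative):

import json

def calculate(expr: str) -> str:
    return str(eval(expr))                # demo only; never eval untrusted input

TOOL_IMPLS = {"calculate": calculate}

llm_output = '{"name": "calculate", "args": {"expr": "847 * 0.15"}}'
call = json.loads(llm_output)
result = TOOL_IMPLS[call["name"]](**call["args"])
print(result)                             # 127.05, which is sent back to the LLM as the tool result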

Common Tool Types

  • 🔍 Web Search
  • 🧮 Calculator
  • 💻 Code Execution
  • 📁 File Operations
  • 🌐 API Calls
  • 🗄️ Database

Multi-Agent Systems

Complex tasks decomposed across specialized agents:

  • Hierarchical: A manager delegates to specialists (sketched below)
  • Collaborative: Peers share information
  • Debate: Agents argue, a judge decides
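
As a sketch of the hierarchical pattern only, a manager that routes sub-tasks to two specialist prompts (call_llm is again a hypothetical placeholder, and the prompts are illustrative):

def call_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to a chat model, return its reply."""
    raise NotImplementedError

SPECIALISTS = {
    "research": "You are a research agent. Gather key facts for: {task}",
    "writing":  "You are a writing agent. Draft a concise summary of: {task}",
}

def manager(task: str) -> str:
    # The manager decomposes the task, then hands each piece to a specialist
    plan = call_llm(f"List the research questions needed to answer: {task}")
    notes = call_llm(SPECIALISTS["research"].format(task=plan))
    return call_llm(SPECIALISTS["writing"].format(task=notes))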

Agent Frameworks

  • 🦜 LangChain: General purpose, extensive integrations
  • ⚡ PydanticAI: Type-safe, clean API
  • 👥 CrewAI: Role-based multi-agent
  • 💬 AutoGen: Multi-agent chat
  • 📚 LlamaIndex: RAG & retrieval
  • 🔧 Your Own: Custom implementation

Workshop

Build Your Own Agent

Hands-on with PydanticAI

from pydantic_ai import Agent

# pip install pydantic-ai; Gemini models read the GEMINI_API_KEY environment variable
agent = Agent('gemini-2.5-flash')  # Free!

@agent.tool_plain  # tools that need the run context use @agent.tool instead
def search_web(query: str) -> str:
    """Search the web and return a short summary of the results."""
    return perform_search(query)  # placeholder: plug in your own search backend

result = agent.run_sync("Find latest AI news")
print(result.output)  # the agent's final answer (.data in older PydanticAI versions)

Questions?

Thank you for your attention.