Building AI Agents with Google ADK: What I Taught at GDG Khujand

May 26, 2026

I recently ran a workshop for GDG Khujand called "Build your first AI agent!": two and a half hours of guiding twenty Python developers through Google's Agent Development Kit (ADK). Most had used an LLM API but had never built an agent, and none had connected a model to tools, sessions, callbacks, or evaluation.

By the end of the session, they all had a multi-agent system with memory, guardrails, and a passing eval suite running locally.

I built the workshop as eight progressive checkpoints — each a self-contained ADK project you can adk web your way into. This article distills that path into something you can follow in an afternoon. Every code block here is taken straight from the open-sourced workshop repo.

This isn't a "here's what an agent is" piece. It's a tutorial. By the end you'll know how to ship one.

Why ADK (and not LangChain, or raw API calls)

I've built agentic systems on top of LangChain, on top of raw Gemini and OpenAI API calls, and on top of LiteLLM. After running this workshop, my position is clear: if you're building on Gemini, ADK is the default.

A few reasons:

• It's official, open source, and the same framework Google uses internally for Vertex AI Agent Engine.
• Sessions, memory, sub-agents, MCP, callbacks, streaming, and evaluation are all in the box. You don't bolt them on later.
• adk web gives you a local dev UI with full event inspection and traces. The first time I showed this to the workshop room, three people audibly gasped. It's that good as a debugging tool.
• It deploys to Cloud Run, GKE, or Agent Engine with one command.

The trade-off is lock-in to the Google ecosystem. You can swap models via LiteLLM, but the framework itself is Google's. For a founder shipping on Gemini, that's fine. For a multi-cloud play, look elsewhere.

Setup: 90 seconds to a working agent

ADK expects a strict folder structure. If you fight it, nothing works. If you accept it, you get the dev UI for free.

# Install uv (a fast Python package manager, optional but recommended)
curl -LsSf https://astral.sh/uv/install.sh | sh

uv venv --python 3.12
source .venv/bin/activate
uv pip install google-adk python-dotenv

Then create a folder that looks exactly like this:

parent_folder/
└── my_first_agent/         # folder name = agent name in adk web
    ├── __init__.py         # contains: from . import agent
    ├── agent.py            # MUST define a variable named root_agent
    └── .env                # API key and config

Grab a free Gemini key from aistudio.google.com/apikey and drop it into .env:

GOOGLE_GENAI_USE_VERTEXAI=FALSE
GOOGLE_API_KEY=your_key_here
ADK_MODEL=gemini-3.1-flash-lite

One thing I make every workshop participant do: use gemini-3.1-flash-lite, not gemini-2.5-flash. The free tier on 2.5-flash is 20 requests per day. On 3.1-flash-lite it's 500. Twenty people sharing 20 RPD turns a workshop into a queue.
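The folder layout and .env described above can be scaffolded with a few lines of standard-library Python. This is a minimal sketch, not part of ADK or the workshop repo: the scaffold_agent helper and ENV_TEMPLATE constant are names I made up for illustration, and the agent.py it writes is only a placeholder.

```python
from pathlib import Path

# Template mirrors the .env shown above; the key is a placeholder.
ENV_TEMPLATE = (
    "GOOGLE_GENAI_USE_VERTEXAI=FALSE\n"
    "GOOGLE_API_KEY=your_key_here\n"
    "ADK_MODEL=gemini-3.1-flash-lite\n"
)


def scaffold_agent(parent: Path, agent_name: str = "my_first_agent") -> Path:
    """Create the agent folder layout under parent and return its path."""
    agent_dir = parent / agent_name
    agent_dir.mkdir(parents=True, exist_ok=True)

    # __init__.py imports the agent module so root_agent can be discovered.
    (agent_dir / "__init__.py").write_text("from . import agent\n")

    # Placeholder only: replace with a real root_agent definition.
    agent_file = agent_dir / "agent.py"
    if not agent_file.exists():
        agent_file.write_text("# Define a variable named root_agent here.\n")

    # Never overwrite an existing .env, it may already hold a real key.
    env_file = agent_dir / ".env"
    if not env_file.exists():
        env_file.write_text(ENV_TEMPLATE)

    return agent_dir
```

Calling scaffold_agent(Path("parent_folder")) produces the tree shown earlier; running it a second time leaves an existing agent.py and .env untouched.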

Checkpoint 1: Hello, Agent!

The minimum viable agent is a model plus an instruction. That's it.

from google.adk.agents import Agent

root_agent = Agent(
    name="python_helper",
    model="gemini-3.1-flash-lite",
    description="A friendly assistant that helps with Python questions.",
    instruction=(
        "You are a friendly Python mentor. "
        "Answer concisely, to the point, with code examples. "
        "If the question is not about programming, "
        "politely redirect the conversation back to Python."
    ),
)

Run adk web from the parent folder, open http://localhost:8000, pick my_first_agent, and you're chatting. The Events tab shows every message, tool call, and model response. Trace shows latency per step. This is what you'll live in for the rest of your agent-building career.

A note on terminology: Agent is an alias for LlmAgent. They're the same class. The workshop uses Agent early for friendliness and switches to LlmAgent once callbacks enter the picture.

Checkpoint 2: Tools — the real superpower

An agent without tools is a chatbot. An agent with tools is software.

ADK turns any Python function into a tool by reading its signature and docstring. Both are sent to the model verbatim. Write them like you'd write an API spec for a junior engineer.

def get_weather(city: str) -> dict:
    """Returns the current weather in the specified city.

    Args:
        city: City name in English (e.g., "New York").

    Returns:
        Dict with status and a report, or an error message.
    """
    fake_data = {
        "new york": "It's sunny in New York, 25°C.",
        "london": "It's cloudy in London, 14°C.",
        "tashkent": "It's clear in Tashkent, 28°C.",
    }
    report = fake_data.get(city.lower())
    if report:
        return {"status": "success", "report": report}
    return {"status": "error", "error_message": f"No data for {city}."}


# get_current_time is a second tool function; its definition is not shown in this excerpt.
root_agent = Agent(
    name="weather_time_agent",
    model="gemini-3.1-flash-lite",
    description="Agent that answers questions about weather and time.",
    instruction=(
        "You are a helpful assistant. "
        "When the user asks about weather or time — use the tools. "
        "If the city is not in the data — say so honestly."
    ),
    tools=[get_weather, get_current_time],
)

Three things I drill into every participant:

• Type hints and docstrings are not optional. They are the tool spec. Forget them and the LLM either guesses arguments or refuses to call the tool.
• Always return a structured dict with a status field: {"status": "success", ...} or {"status": "error", "error_message": ...}. This gives the model a clean signal for handling failure paths.
• Built-in tools exist: from google.adk.tools import google_search gives you grounding in Google Search. But you can't mix built-in tools with sub-agents on the same agent (see Checkpoint 6, which also covers the workaround).
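To make the "signature plus docstring is the spec" point concrete, here is a rough sketch of what any framework can recover from a plain function using only the standard library. It is not ADK's actual implementation; describe_tool is a hypothetical helper, and the shortened get_weather is there only so the example runs on its own.

```python
import inspect
from typing import get_type_hints


def get_weather(city: str) -> dict:
    """Returns the current weather in the specified city.

    Args:
        city: City name in English (e.g., "New York").
    """
    return {"status": "error", "error_message": f"No data for {city}."}


def describe_tool(func) -> dict:
    """Collect the name, summary line, and typed parameters of a function."""
    hints = get_type_hints(func)
    params = {
        # Parameters without a type hint show up as "unknown".
        name: getattr(hints.get(name), "__name__", "unknown")
        for name in inspect.signature(func).parameters
    }
    doc = inspect.getdoc(func) or ""
    return {
        "name": func.__name__,
        "summary": doc.splitlines()[0] if doc else "",
        "parameters": params,
        "returns": getattr(hints.get("return"), "__name__", "unknown"),
    }


spec = describe_tool(get_weather)
```

Remove the type hint from city and the parameter comes back as "unknown", which is roughly the position the model is in when hints are missing.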

Checkpoint 3: Memory and sessions

ADK has three levels of memory, and confusing them is the most common bug I saw in the workshop:

Level	Class / field	What it holds
Session	`Session` (`InMemorySessionService`)	Message history of one conversation
State	`session.state` (dict-like)	Structured data inside a session — name, preferences, flags
Memory	`MemoryService`	Long-term memory across sessions — user profiles, RAG indexes

Writing to state from a tool is dead simple — you just declare a tool_context parameter and ADK injects it:

from google.adk.tools.tool_context import ToolContext

def remember_preference(key: str, value: str, tool_context: ToolContext) -> dict:
    """Saves a user preference to session state."""
    tool_context.state[key] = value
    return {"status": "success", "saved": {key: value}}

def recall_preference(key: str, tool_context: ToolContext) -> dict:
    """Retrieves a saved preference."""
    value = tool_context.state.get(key)
    if value is None:
        return {"status": "error", "error_message": f"No value for {key}."}
    return {"status": "success", "value": value}

In adk web, the State tab shows the live session dict as the agent updates it. This single panel converted more skeptics than any slide I had.

In production you swap InMemorySessionService for DatabaseSessionService, VertexAiSessionService, or a Redis-backed one. The agent code doesn't change.
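Because the two state tools above only touch tool_context.state, they can be unit-tested without ADK installed. The sketch below assumes a hypothetical FakeToolContext stand-in that exposes nothing but a state dict; the real ToolContext has more on it, so treat this as a test double and not a replacement.

```python
from dataclasses import dataclass, field


@dataclass
class FakeToolContext:
    """Hypothetical test double: mimics only the .state attribute."""
    state: dict = field(default_factory=dict)


def remember_preference(key: str, value: str, tool_context: FakeToolContext) -> dict:
    """Saves a user preference to session state."""
    tool_context.state[key] = value
    return {"status": "success", "saved": {key: value}}


def recall_preference(key: str, tool_context: FakeToolContext) -> dict:
    """Retrieves a saved preference."""
    value = tool_context.state.get(key)
    if value is None:
        return {"status": "error", "error_message": f"No value for {key}."}
    return {"status": "success", "value": value}


# One context object plays the role of one session.
ctx = FakeToolContext()
saved = remember_preference("language", "tajik", ctx)
found = recall_preference("language", ctx)
missing = recall_preference("timezone", ctx)
```

A fresh FakeToolContext() behaves like a new session: nothing saved in ctx is visible there.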

Checkpoint 4: Streaming

You don't change the agent to enable streaming. You change the RunConfig.

import asyncio

from google.adk.agents.run_config import RunConfig, StreamingMode
from google.adk.runners import InMemoryRunner
from google.genai import types

# root_agent is the agent defined in the earlier checkpoints.


async def main() -> None:
    runner = InMemoryRunner(agent=root_agent, app_name="workshop")
    session = await runner.session_service.create_session(
        app_name="workshop", user_id="user_1"
    )

    run_config = RunConfig(
        streaming_mode=StreamingMode.SSE,
        max_llm_calls=20,   # guard against runaway loops
    )

    async for event in runner.run_async(
        user_id="user_1",
        session_id=session.id,
        new_message=types.Content(
            role="user",
            parts=[types.Part(text="Explain recursion with a long example.")],
        ),
        run_config=run_config,
    ):
        # Print text parts as they arrive.
        if event.content and event.content.parts:
            for part in event.content.parts:
                if part.text:
                    print(part.text, end="", flush=True)


# await and async for need a running event loop outside a notebook.
asyncio.run(main())

Three streaming modes: NONE (default), SSE (token-by-token, one-way — perfect for chat UIs), and BIDI (bidirectional via Gemini Live API — for voice and interruptions).

The max_llm_calls=20 line is non-negotiable in production. An agent in a tool-call loop can run up your bill in minutes. I've seen it.
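The idea behind a call budget fits in a few lines of plain Python. This sketch is not how ADK enforces max_llm_calls internally (the article does not show that); run_agent_loop, LlmCallBudgetExceeded, and the step callable are hypothetical names used only to illustrate the guard.

```python
from typing import Any, Callable, Tuple


class LlmCallBudgetExceeded(RuntimeError):
    """Raised when the loop uses up its model-call budget."""


def run_agent_loop(
    step: Callable[[int], Tuple[bool, Any]],
    max_llm_calls: int = 20,
) -> Tuple[Any, int]:
    """Call step until it reports completion or the budget runs out.

    step receives the 1-based call number and returns (done, payload).
    """
    for call_number in range(1, max_llm_calls + 1):
        done, payload = step(call_number)
        if done:
            return payload, call_number
    # The loop never finished: fail loudly instead of spending more.
    raise LlmCallBudgetExceeded(f"stopped after {max_llm_calls} model calls")


# A well-behaved agent that finishes on its third model call.
answer, calls_used = run_agent_loop(lambda n: (n == 3, "final answer"))
```

A step that never reports completion raises after exactly max_llm_calls iterations, which puts a hard ceiling on what a stuck tool-call loop can cost.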

Checkpoint 5: Callbacks — where guardrails live

This is the checkpoint that separates "agent demo" from "agent in production." Callbacks are extension points: return None to pass through, return a value to short-circuit the step.

from google.adk.agents.callback_context import CallbackContext
from google.adk.models import LlmRequest, LlmResponse
from google.genai import types
from typing import Optional

BLOCKED_WORDS = ("api_key", "password", "secret", "token")

def block_secrets(
    callback_context: CallbackContext, llm_request: LlmRequest
) -> Optional[LlmResponse]:
    """If the user message contains blocked words — substitute the response
    and skip the LLM call entirely."""
    last_text = ""
    if llm_request.contents:
        last = llm_request.contents[-1]
        if last.parts:
            # Join every text part so a blocked word in a later part is not missed.
            last_text = " ".join(p.text or "" for p in last.parts).lower()

    if any(word in last_text for word in BLOCKED_WORDS):
        return LlmResponse(
            content=types.Content(
                role="model",
                parts=[types.Part(text="I don't discuss secrets.")],
            )
        )
    return None

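Wire it up like this (log_entry, normalize_city, and append_signature are further callbacks whose definitions are not shown here):

from google.adk.agents import LlmAgent

root_agent = LlmAgent(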
    name="guarded_weather_agent",
    model="gemini-3.1-flash-lite",
    instruction="You are a helpful assistant. Use get_weather for weather questions.",
    tools=[get_weather],
    before_agent_callback=log_entry,
    before_model_callback=block_secrets,
    before_tool_callback=normalize_city,
    after_model_callback=append_signature,
)

ADK gives you six callback points — before/after agent, before/after model, before/after tool. They're the right place for: structured logging (Datadog, OpenTelemetry), PII redaction, prompt-injection guards, request normalization, LLM response caching, and audit signatures.
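The "return None to pass through, return a value to short-circuit" contract is easy to see once it is stripped of framework types. Below is a simplified sketch that works on plain strings; run_model_step and fake_model are hypothetical names, and the real callbacks receive CallbackContext and LlmRequest objects, not strings.

```python
from typing import Callable, List, Optional

BLOCKED_WORDS = ("api_key", "password", "secret", "token")


def block_secrets(prompt: str) -> Optional[str]:
    """Return a canned reply for blocked prompts, or None to pass through."""
    if any(word in prompt.lower() for word in BLOCKED_WORDS):
        return "I don't discuss secrets."
    return None


def run_model_step(
    prompt: str,
    before_model: Callable[[str], Optional[str]],
    call_model: Callable[[str], str],
) -> str:
    """Run the callback first; only call the model if it returned None."""
    override = before_model(prompt)
    if override is not None:
        return override  # short-circuit: the model is never called
    return call_model(prompt)


model_calls: List[str] = []


def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; records that it was invoked."""
    model_calls.append(prompt)
    return f"echo: {prompt}"


blocked = run_model_step("What is my password?", block_secrets, fake_model)
allowed = run_model_step("What is a list comprehension?", block_secrets, fake_model)
```

After both calls, model_calls holds only the second prompt: the blocked request never reached the model, which is why this pattern also saves money.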

One rule: don't do heavy synchronous I/O in a callback. It blocks the agent loop. Fire-and-forget to a queue if you need durability.
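One way to keep slow I/O out of a callback is to enqueue the record and let a background worker ship it. The sketch below uses an in-process queue.Queue as a stand-in for a durable queue, so it illustrates the hand-off and does not give you durability by itself; the shipped list, the _drain worker, and the string-based log_entry signature are all assumptions made for this example.

```python
import queue
import threading

# Bounded so a stalled sink cannot grow memory without limit.
log_queue = queue.Queue(maxsize=1000)
shipped = []  # stand-in for a real sink such as a log collector


def _drain() -> None:
    """Background worker: the only place where slow I/O would happen."""
    while True:
        record = log_queue.get()
        try:
            shipped.append(record)  # replace with the real network call
        finally:
            log_queue.task_done()


threading.Thread(target=_drain, daemon=True).start()


def log_entry(agent_name: str) -> None:
    """Callback-shaped function: enqueue and return None immediately."""
    try:
        log_queue.put_nowait({"event": "agent_start", "agent": agent_name})
    except queue.Full:
        pass  # drop the record instead of blocking the agent loop
    return None
```

The design choice here is to drop on overflow. If losing a record is unacceptable, the queue has to be something persistent outside the process.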

Checkpoint 6: Multi-agent systems (and one Gemini gotcha)

The naive way to build a research assistant is to give one agent both google_search and a writer sub-agent. This is also the way to discover a Gemini API restriction that cost me an hour the first time I hit it:

400 INVALID_ARGUMENT: Please enable tool_config.include_server_side_tool_invocations to use Built-in tools with Function calling.

Built-in tools (google_search, code_execution) cannot coexist with regular function-calling on the same agent. And when you pass sub_agents=[...], ADK auto-injects a transfer_to_agent function — which counts as function-calling. Conflict.

The fix is the AgentTool pattern. Instead of sub_agents, wrap each sub-agent as a tool of the coordinator. Each runs in its own isolated context — no injected functions, no conflict.

from google.adk.agents import LlmAgent
from google.adk.tools import google_search
from google.adk.tools.agent_tool import AgentTool

researcher = LlmAgent(
    name="researcher",
    model="gemini-3.1-flash-lite",
    instruction="Use google_search to find facts. Return 3–5 facts with sources.",
    tools=[google_search],
)

writer = LlmAgent(
    name="writer",
    model="gemini-3.1-flash-lite",
    instruction="Write a short 3–4 paragraph article based on the provided facts.",
)

root_agent = LlmAgent(
    name="research_coordinator",
    model="gemini-3.1-flash-lite",
    instruction=(
        "1. Call researcher to gather facts via Google Search.\n"
        "2. Pass facts to writer to draft the article.\n"
        "3. Return the final article.\n"
        "Never invent facts — rely on what researcher returns."
    ),
    tools=[AgentTool(agent=researcher), AgentTool(agent=writer)],
)

The coordinator calls each sub-agent like a function and gets back text. It's a cleaner mental model than "handoff" anyway.

For deterministic pipelines (no LLM coordination), ADK also offers SequentialAgent, ParallelAgent, and LoopAgent. And via MCPToolset, any Model Context Protocol server (filesystem, GitHub, Postgres, Slack) becomes a tool with zero glue code.
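The three deterministic patterns named above are ordinary control flow once the LLM is taken out of the coordinator role. The following is a plain-Python sketch of the ideas only; it does not use or reproduce SequentialAgent, ParallelAgent, or LoopAgent, and the Step type and run_* helpers are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List

# A step reads the shared state and returns the updates it wants to make.
Step = Callable[[dict], dict]


def run_sequential(steps: List[Step], state: dict) -> dict:
    """Run steps in order; each one sees the updates of the previous ones."""
    for step in steps:
        state.update(step(state))
    return state


def run_parallel(steps: List[Step], state: dict) -> dict:
    """Run independent steps concurrently against the same snapshot."""
    snapshot = dict(state)
    with ThreadPoolExecutor() as pool:
        results = list(pool.map(lambda step: step(snapshot), steps))
    for updates in results:
        state.update(updates)
    return state


def run_loop(step: Step, state: dict, should_stop: Callable[[dict], bool],
             max_iterations: int = 5) -> dict:
    """Repeat one step until should_stop is true or the cap is reached."""
    for _ in range(max_iterations):
        state.update(step(state))
        if should_stop(state):
            break
    return state


# Toy researcher and writer steps, with no model calls involved.
research = lambda s: {"facts": ["ADK is open source", "adk web is a dev UI"]}
write = lambda s: {"article": " / ".join(s["facts"])}
pipeline_state = run_sequential([research, write], {})
```

The writer only works because it runs after the researcher; that ordering guarantee is what a sequential pipeline buys you over letting a model decide.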

Checkpoint 7: Evaluation — the part most teams skip

If you change a prompt or swap a model, can you tell whether the agent got better or worse? Most teams I've audited answer this with vibes. ADK gives you a real answer.

You write an evalset: a JSON file with conversations, expected tool calls, and expected responses. Then you run it via adk eval, pytest, or the Eval tab in adk web.

{
  "eval_set_id": "basic",
  "eval_cases": [{
    "eval_id": "weather_ny",
    "conversation": [{
      "invocation_id": "1",
      "user_content": { "parts": [{"text": "What's the weather in New York?"}], "role": "user" },
      "final_response": { "parts": [{"text": "It's sunny in New York, 25°C."}], "role": "model" },
      "intermediate_data": {
        "tool_uses": [{"name": "get_weather", "args": {"city": "New York"}}]
      }
    }]
  }]
}

adk eval my_first_agent my_first_agent/tests/basic.evalset.json
# or
pytest -v my_first_agent/tests/test_agent.py
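Hand-writing evalset JSON gets tedious past a few cases, so a small generator helps. This sketch only reproduces the fields visible in the sample above; make_eval_case and make_evalset are hypothetical helpers, and the real schema may accept or require fields that are not shown in this article.

```python
import json
from typing import List


def make_eval_case(eval_id: str, question: str, expected_reply: str,
                   tool_uses: List[dict]) -> dict:
    """Build one single-turn eval case in the shape of the sample above."""
    return {
        "eval_id": eval_id,
        "conversation": [{
            "invocation_id": "1",
            "user_content": {"parts": [{"text": question}], "role": "user"},
            "final_response": {"parts": [{"text": expected_reply}], "role": "model"},
            "intermediate_data": {"tool_uses": tool_uses},
        }],
    }


def make_evalset(eval_set_id: str, cases: List[dict]) -> str:
    """Serialize cases into an evalset JSON string."""
    document = {"eval_set_id": eval_set_id, "eval_cases": cases}
    # ensure_ascii=False keeps characters such as the degree sign readable.
    return json.dumps(document, indent=2, ensure_ascii=False)


evalset_json = make_evalset("basic", [
    make_eval_case(
        "weather_ny",
        "What's the weather in New York?",
        "It's sunny in New York, 25°C.",
        [{"name": "get_weather", "args": {"city": "New York"}}],
    ),
])
```

Writing evalset_json to my_first_agent/tests/basic.evalset.json gives you the file that the commands above expect.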

Two metrics that catch most regressions: tool_trajectory_avg_score (did the agent call the right tools in the right order?) and response_match_score (ROUGE overlap with the expected reply). Both are cheap. Both should be in CI before your agent ever sees a user.

LLM-as-Judge metrics (final_response_match_v2, hallucinations_v1, safety_v1) are more expensive because they call a judge model — run them in nightly jobs, not on every commit.
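To build intuition for the two cheap metrics, here is a simplified re-implementation in plain Python. It assumes exact, in-order matching of tool calls and a unigram (ROUGE-1 style) F1 for text; ADK's own scoring may differ in tokenization and matching details, so use these functions to understand the numbers, not to replace adk eval.

```python
from collections import Counter
from typing import List, Tuple


def trajectory_score(expected: List[dict], actual: List[dict]) -> float:
    """1.0 if the tool calls match exactly and in order, else 0.0."""
    return 1.0 if expected == actual else 0.0


def trajectory_avg_score(pairs: List[Tuple[List[dict], List[dict]]]) -> float:
    """Average the per-invocation trajectory scores."""
    scores = [trajectory_score(expected, actual) for expected, actual in pairs]
    return sum(scores) / len(scores) if scores else 0.0


def rouge1_f1(reference: str, candidate: str) -> float:
    """Unigram-overlap F1 between a reference and a candidate reply."""
    ref = Counter(reference.lower().split())
    cand = Counter(candidate.lower().split())
    overlap = sum((ref & cand).values())  # shared words, counted with multiplicity
    if overlap == 0:
        return 0.0
    precision = overlap / sum(cand.values())
    recall = overlap / sum(ref.values())
    return 2 * precision * recall / (precision + recall)


expected_calls = [{"name": "get_weather", "args": {"city": "New York"}}]
good_run = [{"name": "get_weather", "args": {"city": "New York"}}]
bad_run = [{"name": "get_weather", "args": {"city": "new york"}}]
avg = trajectory_avg_score([(expected_calls, good_run), (expected_calls, bad_run)])
```

Note how strict the trajectory check is: a lowercase city name is enough to score zero for that invocation, which is exactly the kind of prompt regression you want CI to flag.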

Checkpoint 8: Deployment

Run adk deploy cloud_run --project <PROJECT> --region us-central1 my_first_agent, and three minutes later you have a public URL with a FastAPI endpoint. That's it.

The three options:

Platform	When	Command
Vertex AI Agent Engine	Production, managed, A2A, sessions in the box	`adk deploy agent_engine`
Cloud Run	Serverless container, cheap	`adk deploy cloud_run`
GKE	You already live in Kubernetes	`adk deploy gke`

Two things to do before shipping: swap InMemorySessionService for a persistent backend, and move the API key into Secret Manager (not .env).
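Once the key no longer lives in .env, it is worth failing fast at startup if it is missing. The sketch below assumes the secret reaches the container as an environment variable, which is one common way to wire Secret Manager but not the only one; require_env and MissingConfigError are hypothetical names.

```python
import os
from typing import Mapping, Optional


class MissingConfigError(RuntimeError):
    """Raised at startup when a required setting is absent."""


def require_env(name: str, environ: Optional[Mapping[str, str]] = None) -> str:
    """Return a required setting, or raise with a clear message."""
    env = os.environ if environ is None else environ
    value = env.get(name, "").strip()
    # Treat the template placeholder from .env as "not configured".
    if not value or value == "your_key_here":
        raise MissingConfigError(f"{name} is not set")
    return value
```

Calling require_env("GOOGLE_API_KEY") once at import time turns a confusing authentication error on the first user request into an obvious crash at deploy time.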

Key Takeaways

I'll keep this short.

The eight checkpoints are eight habits. Not just an order of building, but the order in which to think: instruction first, then tools, then memory, then streaming, then guardrails, then composition, then evaluation, then deployment.

The two non-obvious wins I'd flag for anyone starting now:

• adk web is the highest-leverage tool in the box. Use it before you write a line of test code. Watching the Events tab as the agent reasons through a tool call teaches you more about agent behavior than any blog post, including this one.
• Evaluation isn't optional, but it's cheap. A 10-case evalset with tool_trajectory_avg_score will catch 80% of the regressions you'd otherwise ship to users. Start there. Get fancier later.

If you want the full workshop materials — eight checkpoints, a Jupyter notebook, slides, and a Russian translation — they're open on GitHub. Fork it, run it for your meetup, send me what you build.

I'm building AI agents for businesses at Lookona Labs. If you're shipping something with ADK and want a second pair of eyes, reach out.

Workshop run at GDG Khujand. Full source code and slides: github.com/dev-muhammad/ADK-workshop.