How to build a Web Search Agent from scratch

Most AI applications today are limited by their training data cutoff. Your users ask about recent events, current stock prices, or the latest news, and your AI assistant responds with "I don't have access to real-time information." This is where web search agents become game-changers.
Building a web search agent isn't just about connecting an LLM to Google. It's about creating an intelligent system that can formulate search queries, evaluate results, extract relevant content, and synthesize information into coherent responses. Let's build one from scratch.
What is a Web Search Agent?
Think of a web search agent as an LLM in a loop: instead of querying a vector database or relying only on its trained weights, it queries the web repeatedly to find pages with the information the user needs.
The key difference between a simple web search integration and a proper agent is iterative reasoning, or in simpler terms, a loop (sketched in code just after this list). A basic integration might search once and return results. An agent can:
- Formulate multiple search queries based on initial findings
- Extract content from specific URLs for deeper analysis
- Refine its search strategy based on previous results
- Synthesize information from multiple sources
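Stripped to its essence, the loop looks something like this minimal sketch, where `call_llm` and `run_tool` are hypothetical stand-ins for the real pieces we build later in this post:

```python
# Conceptual sketch only: call_llm and run_tool are hypothetical stand-ins
# for the Mirascope-based pieces we build out below.
def agent_loop(question: str, max_turns: int = 10) -> str:
    history = []
    for _ in range(max_turns):
        response = call_llm(question, history)  # model answers or requests a tool
        if not response.tool_calls:
            return response.text                # no tool requested: final answer
        for tool_call in response.tool_calls:
            history.append(run_tool(tool_call))  # feed tool output back to the model
    return "Ran out of turns before reaching an answer."
```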
Architecture Overview
Our web search agent consists of four core components:
- Query Formulation: The LLM analyzes user questions and creates appropriate search queries
- Web Search: Executes searches using a search API (we'll use Brave Search)
- Content Extraction: Parses and extracts meaningful content from web pages
- Response Synthesis: Combines information from multiple sources into a coherent answer
```
User Question → LLM → Search Query → Web Results → Content Extraction → Final Response
      ↑                                                                       ↓
      └──────────────────────── Iterative Refinement ←────────────────────────┘
```
The Implementation
Dependencies and Setup
First, let's set up our environment with the necessary packages:
```python
# Install the dependencies first (package names may differ slightly in your setup):
#   pip install "mirascope[google]" python-dotenv requests markdownify brave-search
from mirascope import llm, prompt_template
import os
from dotenv import load_dotenv
from datetime import datetime
import requests
import markdownify

load_dotenv()
```
We're using Mirascope for LLM calls because it provides clean abstractions for tool calling and conversation management. The Brave Search API gives us access to web search without the complexity of Google's pricing tiers.
Before we forget: you'll need a BRAVE_API_KEY and a GOOGLE_API_KEY in your .env file. You can get a Google API key easily and for free by following these instructions.
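As a quick sanity check (my own addition, not part of the original setup), you can fail fast when either key is missing:

```python
import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast if either key is missing, rather than erroring mid-search later.
for key in ("BRAVE_API_KEY", "GOOGLE_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set - add it to your .env file")
```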
Tools
Think of tools as functions you give to an LLM to expand its functionality. LLMs by themselves are stateless black boxes. Your favorite AIs? Those aren't plain LLMs anymore, they are agents.
For example, ChatGPT is an agent. It has tools for web search, image generation, deep research, retrieving past conversations, and so on. That functionality was added on top of proprietary LLMs with tools.
Below we define two tools: Web Search and Content Extraction. One searches the web; the other extracts the content of a web page when the LLM thinks it holds information we need.
Web Search Tool
```python
def web_search(query: str) -> str:
    """Searches the web and returns the summaries of top results.

    Args:
        query: The search query to be executed.

    Returns:
        A string containing the summaries of the top results.
    """
    try:
        from brave import Brave

        brave = Brave(api_key=os.getenv("BRAVE_API_KEY"))
        results = brave.search(q=query, count=10, raw=True)
        web_results = results.get("web", {}).get("results", [])

        summaries = []
        for result in web_results:
            # Results without a profile tend to be lower quality; skip them.
            if 'profile' not in result:
                continue
            url = result['url']
            header = f"{result['profile']['name']} - {result['profile']['long_name']}"
            title = result['title']
            snippet = result['description']
            summaries.append(f"{header}\n{title}\n{snippet}\n{url}")
        return "\n\n".join(summaries)
    except Exception as e:
        return f"Error searching the web: {e}"
```
This function handles the initial web search. We're filtering results to ensure they have profiles (which indicates higher quality sources) and structuring the output to include source information, titles, descriptions, and URLs.
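A quick way to eyeball the output format, assuming your BRAVE_API_KEY is set:

```python
# Each result is formatted as: header, title, snippet, URL, separated by blank lines.
print(web_search("NBA finals 2025 game 1 result")[:500])
```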
Content Extraction Tool
```python
def extract_content(url: str) -> str:
    """Fetches the content of a given URL and returns it as a markdown page.

    Args:
        url: The URL to fetch the content from.

    Returns:
        A string containing the content of the URL as a markdown page.
    """
    # requests and markdownify are already imported at the top of the file.
    response = requests.get(url, timeout=10)  # fail fast on unresponsive sites
    markdown = markdownify.markdownify(response.text)
    return markdown
```
The content extraction tool converts HTML to markdown, making it easier for the LLM to process. Markdown provides better structure than raw HTML while being more concise than plain text.
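One caveat: raw pages can be huge, and every character the tool returns lands in the model's context window. A simple guard, which is my own addition and not part of the original code, is to cap the tool's output:

```python
MAX_CHARS = 20_000  # rough cap; tune this to your model's context window


def extract_content_capped(url: str) -> str:
    """Like extract_content, but truncates very long pages to control token costs."""
    markdown = extract_content(url)
    return markdown[:MAX_CHARS]
```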
The Agent Brain
```python
@llm.call(provider='google', model='gemini-2.0-flash', tools=[web_search, extract_content])
@prompt_template("""
SYSTEM:
You are an expert web searcher. Your task is to answer the user's question using the provided tools.
Use the current date provided to search the web for the most up to date information.
The current date is {current_date}.

You have access to the following tools:
- `web_search(query: str)`: Searches the web and returns summaries of top results.
- `extract_content(url: str)`: Parses the content of the webpage at a given URL and returns it as a markdown page.

The `web_search` tool only returns a short summary of each result along with its URL.
You MUST then call the `extract_content` tool to get the actual content of a webpage.
It is up to you to determine which search results to parse.

You may call one tool per turn, for up to 10 turns before giving your final answer.
In each turn you should share your thinking process, and give the final answer once you have gathered all of the information you need.

Once you have gathered all of the information you need, generate a writeup that
strikes the right balance between brevity and completeness based on the context of the user's query.

MESSAGES: {history}
USER: {question}
""")
def search(question: str, history: list | None = None):
    return {
        "computed_fields": {
            "current_date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "history": history or [],
        }
    }
```
Using Mirascope's llm.call, we define which LLM we want to use, which tools it has, and its prompt. Read this function once, then twice, then three times to understand it.
The agent's prompt is carefully crafted to:
- Establish clear role and capabilities: The agent knows it's a web searcher with specific tools
- Provide current context: Date information helps with time-sensitive queries
- Define workflow: Search first, then extract content from relevant URLs
- Set boundaries: Maximum 10 iterations prevents infinite loops
- Guide output quality: Balance between brevity and completeness
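Before wiring up the loop, it helps to see what a single decorated call gives us. The response object exposes the tool calls the model requested (the same `.tools`, `._name()`, and `.args` we use in the orchestration code below); this probe is just illustrative:

```python
# One turn, no loop: the model either answers directly or requests a tool call.
response = search("Who won game 1 of the 2025 NBA finals?")
if response.tools:
    for tool in response.tools:
        print(f"Model wants: {tool._name()}({tool.args})")
else:
    print(response.content)  # the model answered without needing tools
```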
Orchestration Engine
Now that we have the LLM call and have given it the tools it needs, it's time to run it in a loop. We will limit it to 10 iterations.
```python
def run_agent_with_tools(question: str, max_iterations: int = 10):
    """Run the agent with iterative tool calling until completion.

    Args:
        question: The user's question.
        max_iterations: Maximum number of tool calling iterations to prevent infinite loops.

    Returns:
        Dict containing the final response and execution details.
    """
    conversation_history = []
    total_cost = 0
    total_tokens = 0
    iteration = 0

    print(f"🤖 Starting agent for question: {question}")
    print("=" * 60)

    while iteration < max_iterations:
        iteration += 1
        print(f"\n📍 Iteration {iteration}")

        # Make the LLM call with conversation history
        result = search(question, history=conversation_history)

        # Track costs and tokens (cost can be None for some providers)
        total_cost += result.cost or 0
        total_tokens += result.input_tokens + result.output_tokens

        # Display LLM reasoning
        print(f"💭 LLM Response: {result.content}")

        # Add the user message to history on the first turn (agent state management)
        if iteration == 1 and result.user_message_param:
            conversation_history.append(result.user_message_param)

        # Add assistant message to history
        conversation_history.append(result.message_param)

        # Check if tools were called
        if result.tools:
            print(f"🔧 Tools called: {len(result.tools)}")
            tools_and_outputs = []
            for i, tool in enumerate(result.tools):
                print(f"   Tool {i + 1}: {tool._name()}({tool.args})")
                # Execute the tool
                try:
                    output = tool.call()
                    tools_and_outputs.append((tool, output))
                    print(f"   ✅ Tool output length: {len(str(output))} characters")
                except Exception as e:
                    print(f"   ❌ Tool error: {e}")
                    tools_and_outputs.append((tool, f"Error: {e}"))

            # Add tool results to conversation history
            if tools_and_outputs:
                conversation_history.extend(
                    result.tool_message_params(tools_and_outputs)
                )
            # Continue the loop for another LLM call with the tool results
            continue
        else:
            # No tools called - agent is done
            print("✅ No tools called - Agent completed!")
            break

    if iteration >= max_iterations:
        print(f"⚠️ Reached maximum iterations ({max_iterations})")

    print("\n" + "=" * 60)
    print("📊 Final Stats:")
    print(f"   Iterations: {iteration}")
    print(f"   Total Cost: ${total_cost:.6f}")
    print(f"   Total Tokens: {total_tokens}")
    print(f"   Conversation History Length: {len(conversation_history)}")

    return {
        'final_response': result.content,
        'iterations': iteration,
        'total_cost': total_cost,
        'total_tokens': total_tokens,
        'conversation_history': conversation_history,
        'completed': iteration < max_iterations,
    }
```
This orchestration engine manages the agent's execution flow with several key features:
- State Management: Maintains conversation history throughout iterations
- Cost Tracking: Monitors token usage and costs for production budgeting
- Error Handling: Gracefully handles tool failures and continues execution
- Termination Logic: Stops when the agent decides it has enough information or hits the iteration limit
- Observability: Provides detailed logging for debugging and monitoring
Agent Execution Flow
Let's trace through how the agent handles a typical query:
```python
# Test the agent
result = run_agent_with_tools("Why did OKC Thunder lose game 1 of the NBA finals?")
```
Iteration 1: The agent analyzes the question and realizes it needs current NBA information, so it calls web_search("OKC Thunder NBA finals game 1 loss 2025").
Iteration 2: The agent reviews the search results and identifies relevant articles. It calls extract_content() on the most promising URLs to get detailed information.
Iteration 3: If needed, the agent might search for additional context like "NBA finals 2025 schedule" or specific player statistics.
Final Response: The agent synthesizes information from multiple sources into a comprehensive answer.
Here is an example run:
```python
result = run_agent_with_tools("Why did OKC Thunder lose game 1 of the NBA finals?")
print("\n🎯 Final Answer:")
print(result['final_response'])
```
Output:
```
🤖 Starting agent for question: Why did OKC Thunder lose game 1 of the NBA finals?
============================================================

📍 Iteration 1
💭 LLM Response: The NBA finals are in June. The current date is June 8, 2025. Therefore, I will search for the results of the 2025 NBA finals.
🔧 Tools called: 1
   Tool 1: web_search({'query': 'OKC Thunder NBA Finals 2025 game 1 result'})
   ✅ Tool output length: 3868 characters

📍 Iteration 2
💭 LLM Response:
🔧 Tools called: 1
   Tool 1: extract_content({'url': 'https://www.espn.com/nba/story/_/id/44610574/nba-finals-2025-postseason-news-scores-highlights'})
   ✅ Tool output length: 630 characters

📍 Iteration 3
💭 LLM Response:
🔧 Tools called: 1
   Tool 1: web_search({'query': 'OKC Thunder vs Pacers Game 1 NBA Finals 2025 recap'})
   ✅ Tool output length: 4562 characters

📍 Iteration 4
💭 LLM Response: The OKC Thunder lost Game 1 of the 2025 NBA Finals to the Indiana Pacers with a score of 111-110. A late comeback by the Pacers, capped by a pull-up jumper from Tyrese Haliburton with 0.3 seconds remaining, secured the victory for Indiana. The Thunder had built a 15-point lead in the fourth quarter but were unable to maintain it.
✅ No tools called - Agent completed!

============================================================
📊 Final Stats:
   Iterations: 4
   Total Cost: $0.000783
   Total Tokens: 7190
   Conversation History Length: 8

🎯 Final Answer:
The OKC Thunder lost Game 1 of the 2025 NBA Finals to the Indiana Pacers with a score of 111-110. A late comeback by the Pacers, capped by a pull-up jumper from Tyrese Haliburton with 0.3 seconds remaining, secured the victory for Indiana. The Thunder had built a 15-point lead in the fourth quarter but were unable to maintain it.
```
If you take a closer look, you can see the agent only called the extraction tool once, meaning it got all the information it needed to answer the user's question from the search summaries alone, saving the user time and money. That Haliburton game-winner still hurts, sad!
Why This Architecture Is Not Good (What??!)
This is just a beginner's guide to building an agent, and there are a bunch of improvements that can be made. I made them, but since it is homework I shall not be sharing them with the class (maybe later)!
Further improvements
We're making a single LLM call handle too much. It can easily lose context and lose track of what it needs to do across the following responsibilities:
- Generating queries for the Brave Search API
- Searching the web with said queries
- Extracting the content it needs from selected websites
- Generating the final answer
To improve this further, you should separate the points above into dedicated LLM calls and functions. This way you get an agentic workflow that reduces the load on any one LLM call and further enhances the web search capabilities.
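For instance, query generation could become its own narrowly scoped call; here's a hedged sketch of the idea (not the homework solution):

```python
@llm.call(provider='google', model='gemini-2.0-flash')
@prompt_template("""
SYSTEM: You write web search queries. Given the user's question, return up to
three concise queries, one per line, and nothing else.
USER: {question}
""")
def generate_queries(question: str): ...


# Downstream steps (searching, extraction, synthesis) would each get their own
# similarly focused call instead of one prompt juggling everything.
queries = generate_queries("Why did OKC Thunder lose game 1?").content.splitlines()
```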
You can also turn this into an MCP tool! More on that next time.
Closing Thoughts
Web search agents represent a significant step forward in AI application capabilities. By combining structured tool calling with iterative reasoning, we can build systems that provide accurate, up-to-date information while maintaining transparency about sources and reasoning.
The architecture we've built here is a great starting point for many, but everything can be improved or customized to your needs. You can change the LLM you are using, which web search API you want to use, how you want to extract content, or even chunk it. It's all up to you.
Remember that building great agents is an iterative process. Start with the basics, test thoroughly with real user queries, and gradually add sophistication based on actual usage patterns. The key is balancing capability with reliability, ensuring your agent provides value without overwhelming users with unnecessary complexity.
Your users will thank you for building AI that actually knows what happened yesterday.