How to build a Web Search Agent from scratch

Most AI applications today are limited by their training data cutoff. Your users ask about recent events, current stock prices, or the latest news, and your AI assistant responds with "I don't have access to real-time information." This is where web search agents become game-changers.
Building a web search agent isn't just about connecting an LLM to Google. It's about creating an intelligent system that can formulate search queries, evaluate results, extract relevant content, and synthesize information into coherent responses. Let's build one from scratch.
What is a Web Search Agent?
Think of a web search agent as an LLM in a loop: instead of querying a vector database or relying only on its trained weights, it queries the web repeatedly to find pages with the information the user needs.
The key difference between a simple web search integration and a proper agent is iterative reasoning, or in simpler terms, a loop (sketched in code just after this list). A basic integration might search once and return results. An agent can:
- Formulate multiple search queries based on initial findings
- Extract content from specific URLs for deeper analysis
- Refine its search strategy based on previous results
- Synthesize information from multiple sources
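Stripped to its essence, the loop looks something like this minimal sketch, where `call_llm` and `run_tool` are hypothetical stand-ins for the real pieces we build later in this post:

```python
# Conceptual sketch only: call_llm and run_tool are hypothetical stand-ins
# for the Mirascope-based pieces we build out below.
def agent_loop(question: str, max_turns: int = 10) -> str:
    history = []
    for _ in range(max_turns):
        response = call_llm(question, history)  # model answers or requests a tool
        if not response.tool_calls:
            return response.text                # no tool requested: final answer
        for tool_call in response.tool_calls:
            history.append(run_tool(tool_call))  # feed tool output back to the model
    return "Ran out of turns before reaching an answer."
```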
Architecture Overview
Our web search agent consists of four core components:
- Query Formulation: The LLM analyzes user questions and creates appropriate search queries
- Web Search: Executes searches using a search API (we'll use Brave Search)
- Content Extraction: Parses and extracts meaningful content from web pages
- Response Synthesis: Combines information from multiple sources into a coherent answer
```
User Question → LLM → Search Query → Web Results → Content Extraction → Final Response
      ↑                                                                       ↓
      └──────────────────────── Iterative Refinement ←────────────────────────┘
```
The Implementation
Dependencies and Setup
First, let's set up our environment with the necessary packages:
```python
# Install the dependencies first (package names may differ slightly in your setup):
#   pip install "mirascope[google]" python-dotenv requests markdownify brave-search
from mirascope import llm, prompt_template
import os
from dotenv import load_dotenv
from datetime import datetime
import requests
import markdownify

load_dotenv()
```
We're using Mirascope for LLM calls because it provides clean abstractions for tool calling and conversation management. The Brave Search API gives us access to web search without the complexity of Google's pricing tiers.
Before we forget: you'll need a BRAVE_API_KEY and a GOOGLE_API_KEY in your .env file. You can get a Google API key easily and for free by following these instructions.
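As a quick sanity check (my own addition, not part of the original setup), you can fail fast when either key is missing:

```python
import os
from dotenv import load_dotenv

load_dotenv()

# Fail fast if either key is missing, rather than erroring mid-search later.
for key in ("BRAVE_API_KEY", "GOOGLE_API_KEY"):
    if not os.getenv(key):
        raise RuntimeError(f"{key} is not set - add it to your .env file")
```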
Tools
Think of tools as functions you give to an LLM to expand its functionality. LLMs by themselves are stateless black boxes. Your favorite AIs? Those aren't plain LLMs anymore, they are agents.
For example, ChatGPT is an agent. It has tools for web search, image generation, deep research, retrieving past conversations, and so on. That functionality was added on top of proprietary LLMs with tools.
Below we define two tools: Web Search and Content Extraction. One searches the web; the other extracts the content of a web page when the LLM thinks it holds information we need.
Web Search Tool
```python
def web_search(query: str) -> str:
    """Searches the web and returns the summaries of top results.

    Args:
        query: The search query to be executed.

    Returns:
        A string containing the summaries of the top results.
    """
    try:
        from brave import Brave

        brave = Brave(api_key=os.getenv("BRAVE_API_KEY"))
        results = brave.search(q=query, count=10, raw=True)
        web_results = results.get("web", {}).get("results", [])

        summaries = []
        for result in web_results:
            # Results without a profile tend to be lower quality; skip them.
            if 'profile' not in result:
                continue
            url = result['url']
            header = f"{result['profile']['name']} - {result['profile']['long_name']}"
            title = result['title']
            snippet = result['description']
            summaries.append(f"{header}\n{title}\n{snippet}\n{url}")
        return "\n\n".join(summaries)
    except Exception as e:
        return f"Error searching the web: {e}"
```
This function handles the initial web search. We're filtering results to ensure they have profiles (which indicates higher quality sources) and structuring the output to include source information, titles, descriptions, and URLs.
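A quick way to eyeball the output format, assuming your BRAVE_API_KEY is set:

```python
# Each result is formatted as: header, title, snippet, URL, separated by blank lines.
print(web_search("NBA finals 2025 game 1 result")[:500])
```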
Content Extraction Tool
```python
def extract_content(url: str) -> str:
    """Fetches the content of a given URL and returns it as a markdown page.

    Args:
        url: The URL to fetch the content from.

    Returns:
        A string containing the content of the URL as a markdown page.
    """
    # requests and markdownify are already imported at the top of the file.
    response = requests.get(url, timeout=10)  # fail fast on unresponsive sites
    markdown = markdownify.markdownify(response.text)
    return markdown
```
The content extraction tool converts HTML to markdown, making it easier for the LLM to process. Markdown provides better structure than raw HTML while being more concise than plain text.
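One caveat: raw pages can be huge, and every character the tool returns lands in the model's context window. A simple guard, which is my own addition and not part of the original code, is to cap the tool's output:

```python
MAX_CHARS = 20_000  # rough cap; tune this to your model's context window


def extract_content_capped(url: str) -> str:
    """Like extract_content, but truncates very long pages to control token costs."""
    markdown = extract_content(url)
    return markdown[:MAX_CHARS]
```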
The Agent Brain
```python
@llm.call(provider='google', model='gemini-2.0-flash', tools=[web_search, extract_content])
@prompt_template("""
SYSTEM:
You are an expert web searcher. Your task is to answer the user's question using the provided tools.
Use the current date provided to search the web for the most up to date information.
The current date is {current_date}.

You have access to the following tools:
- `web_search(query: str)`: Searches the web and returns summaries of top results.
- `extract_content(url: str)`: Parses the content of the webpage at a given URL and returns it as a markdown page.

The `web_search` tool only returns a short summary of each result along with its URL.
You MUST then call the `extract_content` tool to get the actual content of a webpage.
It is up to you to determine which search results to parse.

You may call one tool per turn, for up to 10 turns before giving your final answer.
In each turn you should share your thinking process, and give the final answer once you have gathered all of the information you need.

Once you have gathered all of the information you need, generate a writeup that
strikes the right balance between brevity and completeness based on the context of the user's query.

MESSAGES: {history}
USER: {question}
""")
def search(question: str, history: list | None = None):
    return {
        "computed_fields": {
            "current_date": datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
            "history": history or [],
        }
    }
```
Using Mirascope's llm.call, we define which LLM we want to use, which tools it has, and its prompt. Read this function once, then twice, then three times to understand it.
The agent's prompt is carefully crafted to:
- Establish clear role and capabilities: The agent knows it's a web searcher with specific tools
- Provide current context: Date information helps with time-sensitive queries
- Define workflow: Search first, then extract content from relevant URLs
- Set boundaries: Maximum 10 iterations prevents infinite loops
- Guide output quality: Balance between brevity and completeness
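Before wiring up the loop, it helps to see what a single decorated call gives us. The response object exposes the tool calls the model requested (the same `.tools`, `._name()`, and `.args` we use in the orchestration code below); this probe is just illustrative:

```python
# One turn, no loop: the model either answers directly or requests a tool call.
response = search("Who won game 1 of the 2025 NBA finals?")
if response.tools:
    for tool in response.tools:
        print(f"Model wants: {tool._name()}({tool.args})")
else:
    print(response.content)  # the model answered without needing tools
```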
Orchestration Engine
Now that we have the LLM call and have given it the tools it needs, it's time to run it in a loop. We will limit it to 10 iterations.
```python
def run_agent_with_tools(question: str, max_iterations: int = 10):
    """Run the agent with iterative tool calling until completion.

    Args:
        question: The user's question.
        max_iterations: Maximum number of tool calling iterations to prevent infinite loops.

    Returns:
        Dict containing the final response and execution details.
    """
    conversation_history = []
    total_cost = 0
    total_tokens = 0
    iteration = 0

    print(f"🤖 Starting agent for question: {question}")
    print("=" * 60)

    while iteration < max_iterations:
        iteration += 1
        print(f"\n📍 Iteration {iteration}")

        # Make the LLM call with conversation history
        result = search(question, history=conversation_history)

        # Track costs and tokens (cost can be None for some providers)
        total_cost += result.cost or 0
        total_tokens += result.input_tokens + result.output_tokens

        # Display LLM reasoning
        print(f"💭 LLM Response: {result.content}")

        # Add the user message to history on the first turn (agent state management)
        if iteration == 1 and result.user_message_param:
            conversation_history.append(result.user_message_param)

        # Add assistant message to history
        conversation_history.append(result.message_param)

        # Check if tools were called
        if result.tools:
            print(f"🔧 Tools called: {len(result.tools)}")
            tools_and_outputs = []
            for i, tool in enumerate(result.tools):
                print(f"   Tool {i + 1}: {tool._name()}({tool.args})")
                # Execute the tool
                try:
                    output = tool.call()
                    tools_and_outputs.append((tool, output))
                    print(f"   ✅ Tool output length: {len(str(output))} characters")
                except Exception as e:
                    print(f"   ❌ Tool error: {e}")
                    tools_and_outputs.append((tool, f"Error: {e}"))

            # Add tool results to conversation history
            if tools_and_outputs:
                conversation_history.extend(
                    result.tool_message_params(tools_and_outputs)
                )
            # Continue the loop for another LLM call with the tool results
            continue
        else:
            # No tools called - agent is done
            print("✅ No tools called - Agent completed!")
            break

    if iteration >= max_iterations:
        print(f"⚠️ Reached maximum iterations ({max_iterations})")

    print("\n" + "=" * 60)
    print("📊 Final Stats:")
    print(f"   Iterations: {iteration}")
    print(f"   Total Cost: ${total_cost:.6f}")
    print(f"   Total Tokens: {total_tokens}")
    print(f"   Conversation History Length: {len(conversation_history)}")

    return {
        'final_response': result.content,
        'iterations': iteration,
        'total_cost': total_cost,
        'total_tokens': total_tokens,
        'conversation_history': conversation_history,
        'completed': iteration < max_iterations,
    }
```
This orchestration engine manages the agent's execution flow with several key features:
- State Management: Maintains conversation history throughout iterations
- Cost Tracking: Monitors token usage and costs for production budgeting
- Error Handling: Gracefully handles tool failures and continues execution
- Termination Logic: Stops when the agent decides it has enough information or hits the iteration limit
- Observability: Provides detailed logging for debugging and monitoring
Agent Execution Flow
Let's trace through how the agent handles a typical query:
```python
# Test the agent
result = run_agent_with_tools("Why did OKC Thunder lose game 1 of the NBA finals?")
```
Iteration 1: The agent analyzes the question and realizes it needs current NBA information, so it calls web_search("OKC Thunder NBA finals game 1 loss 2025").
Iteration 2: The agent reviews the search results and identifies relevant articles. It calls extract_content() on the most promising URLs to get detailed information.
Iteration 3: If needed, the agent might search for additional context like "NBA finals 2025 schedule" or specific player statistics.
Final Response: The agent synthesizes information from multiple sources into a comprehensive answer.
Here is an example run:
```python
result = run_agent_with_tools("Why did OKC Thunder lose game 1 of the NBA finals?")
print("\n🎯 Final Answer:")
print(result['final_response'])
```
Output:
```
🤖 Starting agent for question: Why did OKC Thunder lose game 1 of the NBA finals?
============================================================

📍 Iteration 1
💭 LLM Response: The NBA finals are in June. The current date is June 8, 2025. Therefore, I will search for the results of the 2025 NBA finals.
🔧 Tools called: 1
   Tool 1: web_search({'query': 'OKC Thunder NBA Finals 2025 game 1 result'})
   ✅ Tool output length: 3868 characters

📍 Iteration 2
💭 LLM Response:
🔧 Tools called: 1
   Tool 1: extract_content({'url': 'https://www.espn.com/nba/story/_/id/44610574/nba-finals-2025-postseason-news-scores-highlights'})
   ✅ Tool output length: 630 characters

📍 Iteration 3
💭 LLM Response:
🔧 Tools called: 1
   Tool 1: web_search({'query': 'OKC Thunder vs Pacers Game 1 NBA Finals 2025 recap'})
   ✅ Tool output length: 4562 characters

📍 Iteration 4
💭 LLM Response: The OKC Thunder lost Game 1 of the 2025 NBA Finals to the Indiana Pacers with a score of 111-110. A late comeback by the Pacers, capped by a pull-up jumper from Tyrese Haliburton with 0.3 seconds remaining, secured the victory for Indiana. The Thunder had built a 15-point lead in the fourth quarter but were unable to maintain it.
✅ No tools called - Agent completed!

============================================================
📊 Final Stats:
   Iterations: 4
   Total Cost: $0.000783
   Total Tokens: 7190
   Conversation History Length: 8

🎯 Final Answer:
The OKC Thunder lost Game 1 of the 2025 NBA Finals to the Indiana Pacers with a score of 111-110. A late comeback by the Pacers, capped by a pull-up jumper from Tyrese Haliburton with 0.3 seconds remaining, secured the victory for Indiana. The Thunder had built a 15-point lead in the fourth quarter but were unable to maintain it.
```
If you take a closer look, you can see the agent only called the extraction tool once, meaning it got all the information it needed to answer the user's question from the search summaries alone, saving the user time and money. That Haliburton game-winner still hurts, sad!
Why This Architecture Is Not Good (What??!)
This is just a beginner's guide to building an agent, and there are a bunch of improvements that can be made. I made them, but since it is homework I shall not be sharing them with the class (maybe later)!
Further improvements
We're making a single LLM call handle too much. It can easily lose context and lose track of what it needs to do across the following responsibilities:
- Generating queries for the Brave Search API
- Searching the web with said queries
- Extracting the content it needs from selected websites
- Generating the final answer
To improve this further, you should separate the points above into dedicated LLM calls and functions. This way you get an agentic workflow that reduces the load on any one LLM call and further enhances the web search capabilities.
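For instance, query generation could become its own narrowly scoped call; here's a hedged sketch of the idea (not the homework solution):

```python
@llm.call(provider='google', model='gemini-2.0-flash')
@prompt_template("""
SYSTEM: You write web search queries. Given the user's question, return up to
three concise queries, one per line, and nothing else.
USER: {question}
""")
def generate_queries(question: str): ...


# Downstream steps (searching, extraction, synthesis) would each get their own
# similarly focused call instead of one prompt juggling everything.
queries = generate_queries("Why did OKC Thunder lose game 1?").content.splitlines()
```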
You can also turn this into an MCP tool! More on that next time.
Closing Thoughts
Web search agents represent a significant step forward in AI application capabilities. By combining structured tool calling with iterative reasoning, we can build systems that provide accurate, up-to-date information while maintaining transparency about sources and reasoning.
The architecture we've built here is a great starting point for many, but everything can be improved or customized to your needs. You can change the LLM you are using, which web search API you want to use, how you want to extract content, or even chunk it. It's all up to you.
Remember that building great agents is an iterative process. Start with the basics, test thoroughly with real user queries, and gradually add sophistication based on actual usage patterns. The key is balancing capability with reliability, ensuring your agent provides value without overwhelming users with unnecessary complexity.
Your users will thank you for building AI that actually knows what happened yesterday.