When building an AI agent, the design choice matters. A single agent may be enough for straightforward tasks, while more complex workflows may need multiple specialised agents working together, with each one responsible for a specific part of the process, such as retrieval, writing, verification, coding, testing or review.
This post explains the core components of AI agent design, the ReAct approach, the difference between single-agent and multi-agent architectures, and how to choose the right design depending on the task. It also includes a walkthrough of how a practical Multi-Agent RAG system works and how it was built.
AI agents have become popular because modern LLMs are now highly capable at tasks like coding, writing, reasoning, and solving problems across different fields. This has reduced the need to train custom models and shifted more attention toward building practical applications around existing LLMs. Tools like Codex, Claude Code, Cursor and Windsurf are already helping software engineers work faster, while businesses use agents for customer support, automation and other real-world tasks.
An AI agent is an application that uses an LLM to reason, plan and use tools to perform tasks, allowing the model to interact with its environment in a practical and useful way.
Some of the major components of most AI agents are the LLM, tools, and memory.

An AI agent differs from a basic chatbot because a chatbot usually follows a more direct workflow: user query → LLM → response. The LLM receives the user’s message and generates a reply based mainly on the prompt and its existing context.
An AI agent goes beyond this by using the LLM to reason about the task, decide what needs to be done, choose whether tools are needed, call those tools, observe the results and continue until it can produce a useful answer.
This is where the ReAct approach comes in. ReAct means Reasoning + Acting. It is an agent pattern where the LLM reasons about a task and takes actions, usually through tools, based on that reasoning. It involves designing a core logic loop around an LLM.

Step 1: The agent receives a user query
The LLM reasons over the task and decides whether it can answer directly or needs to use tools. It checks what tools are available and decides which ones are needed to solve the task.
Step 2: The agent calls the required tools
Based on its reasoning, the agent takes action by calling the necessary tools. These tools may search the web, retrieve documents from a vector database, access files, run code or connect to an external API. The results returned from these tools are known as tool outputs.
Step 3: The tool outputs are sent back to the LLM
The tool outputs are passed back to the LLM as additional context. This gives the agent more relevant information to work with instead of relying only on the original prompt.
Step 4: The LLM checks the evidence and generates a response
The LLM reviews the tool outputs and checks whether they are enough to solve the task. If the evidence is sufficient, it generates a grounded response for the user. If not, the agent may repeat the reasoning, tool-calling and observation steps until it has enough information to provide a useful answer.
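The four steps above form a loop that can be sketched in a few lines of Python. Everything here is a hypothetical stand-in, not a specific framework's API: `stub_llm` plays the role of a real LLM call, and the calculator is a toy tool.

```python
# Minimal ReAct-style loop: reason -> act -> observe -> repeat.
def react_loop(query, llm, tools, max_steps=5):
    messages = [{"role": "user", "content": query}]
    for _ in range(max_steps):
        decision = llm(messages)             # Step 1: reason over the task
        if decision["type"] == "final":      # Step 4: evidence is sufficient
            return decision["content"]
        tool = tools[decision["tool"]]       # Step 2: call the chosen tool
        observation = tool(decision["input"])
        # Step 3: tool output goes back to the LLM as extra context
        messages.append({"role": "tool", "content": str(observation)})
    return "Could not finish within the step budget."

# Stub LLM: asks for the calculator once, then answers from its output.
def stub_llm(messages):
    if messages[-1]["role"] == "tool":
        return {"type": "final", "content": f"The answer is {messages[-1]['content']}"}
    return {"type": "tool", "tool": "calculator", "input": "6*7"}

tools = {"calculator": lambda expr: eval(expr)}  # toy tool; never eval untrusted input
print(react_loop("What is 6*7?", stub_llm, tools))  # -> The answer is 42
```

In a real agent, `stub_llm` would be an LLM client call that returns either a tool call or a final message, but the control flow stays the same.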
AI agents can be either single-agent or multi-agent, depending on the design structure.

A single agent is an agent design where one LLM handles the whole task. It reasons, plans and calls the required tools when needed. Most AI agents start as single-agent systems because they are simpler, easier to maintain and usually enough for many tasks.
A multi-agent system uses specialised agents to solve different parts of a task. It often has a central agent, usually called an orchestrator, supervisor or planner, that coordinates the other agents and decides when each one should act. Each specialised agent can have its own role, tools and reasoning logic, making the system more modular and suitable for complex workflows.
A single-agent design works well for simple tasks that require limited tool use. For example, a personal assistant agent that can access your calendar to book reminders, a calculator agent that only uses a calculator tool, or a web search agent that uses a web search API to retrieve up-to-date information.
However, a single agent can become overloaded when the task requires many tools, multi-step reasoning, different responsibilities or verification before the final response is returned to the user. Common issues include overloaded prompting, poor tool routing, unclear agent responsibilities and reduced reliability due to too much complexity in one agent.
A multi-agent system is a better choice when the task may overwhelm a single-agent design and when you need specialised agents with clear roles, their own tools and separate responsibilities.
For example, a software engineering agent may work better as a multi-agent system:
Orchestrator → Coder → Tester → Reviewer
The Orchestrator coordinates the workflow, the Coder agent generates the code, the Tester agent checks whether the code works, and the Reviewer agent reviews the solution to check for missing parts or possible improvements.
Another example is a research agent that researches a topic, retrieves information from different data sources and generates grounded content:
Orchestrator → Retriever → Writer → Verifier
The Retriever agent gathers information from the web and local documents stored in a vector database. The Writer agent writes based on the retrieved content. The Verifier agent checks the written content for errors, citations and factual accuracy before the final response is returned.
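The Orchestrator → Retriever → Writer → Verifier flow can be sketched as a simple pipeline. The worker functions below are illustrative stubs standing in for LLM-backed agents, not an actual implementation.

```python
# Sketch of an orchestrator coordinating three specialised workers.

def retriever(topic):
    # Would query the vector database and the web; returns evidence snippets.
    return [f"evidence about {topic}"]

def writer(topic, evidence):
    # Would draft grounded content from the retrieved evidence.
    return f"Draft on {topic}, based on: {'; '.join(evidence)}"

def verifier(draft, evidence):
    # Would check citations and factual accuracy; here it just checks
    # that every evidence snippet is reflected in the draft.
    return all(snippet in draft for snippet in evidence)

def orchestrator(topic):
    evidence = retriever(topic)
    draft = writer(topic, evidence)
    if verifier(draft, evidence):
        return draft
    return "Verification failed; revise the draft."

print(orchestrator("RAG"))
```

Each worker has one clear responsibility, which is exactly what makes the multi-agent design modular: you can swap the retriever's data sources or tighten the verifier's checks without touching the other stages.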
Multi-agent systems make the workflow more modular and give each stage a clear role. However, they should be used only when the task genuinely needs that design, because they usually increase latency, cost and maintenance complexity due to more LLM calls and more moving parts.
A simple rule is:
Use a single agent when the task is simple, has fewer steps and needs only a few tools. Use a multi-agent system when the task requires specialised roles, multi-step reasoning, stronger verification or coordination across different tools and workflows.
I built a project called Multi-Agent RAG Researcher to make the idea of multi-agent systems more practical.
The goal of the project is to show how a central agent can coordinate multiple specialised agents to research a topic, retrieve evidence from documents and the web, write grounded content and verify it before returning it to the user. Instead of using one agent to handle everything, the system splits the workflow into distinct responsibilities.

Check the project on GitHub: https://github.com/ayoolaolafenwa/multi-agent-rag-researcher
Clone Project repo
git clone https://github.com/ayoolaolafenwa/multi-agent-rag-researcher.git
Clone the repo to follow along with the code in this post. Once cloned, the project structure looks like this:
.
├── docs/ # Default PDF files
├── memory/ # SQLite-backed session memory helpers
├── qdrant_vector_database/ # PDF ingestion and similarity search
├── ui/ # Gradio app and UI handlers
├── utils/
│   └── requirements.txt # Python dependencies
├── worker_agents/ # Retriever, writer, and verifier
├── orchestrator_agent.py # Main coordinator
└── run_orchestrator.py # CLI entry point
There are two major data sources:
Qdrant Vector Database

Information retrieval from PDFs is handled in stages: PDFs are placed in the docs/ folder or uploaded through the UI, then ingested, chunked, embedded and indexed for similarity search. This document retrieval pipeline, including the Qdrant vector database setup, is implemented in qdrant_vector_database/vector_store.py.
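To illustrate the chunking, embedding and similarity-search stages without the real Qdrant setup, here is a dependency-free sketch. The bag-of-words "embedding" is a toy stand-in for the embedding model the project actually uses.

```python
import math
from collections import Counter

def chunk(text, size=40):
    # Split text into fixed-size character chunks (real chunkers are smarter).
    return [text[i:i + size] for i in range(0, len(text), size)]

def embed(text):
    # Toy embedding: bag-of-words counts instead of a neural embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def search(query, chunks, top_k=1):
    # Rank chunks by cosine similarity to the query embedding.
    q = embed(query)
    scored = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return scored[:top_k]

chunks = chunk("Gemma 3 is a technical report. DeepSeek V3.2 covers training details.")
print(search("DeepSeek training", chunks))
```

A real vector store like Qdrant does the same ranking over learned embeddings, with persistence and approximate-nearest-neighbour indexing on top.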
Tavily Web Search

Tavily is used to retrieve up-to-date or external information from the web. The retriever agent can use it when the indexed documents do not contain the information needed to answer the query.
Worker Agents
Retriever Agent

Its role is to gather evidence from the local documents in the Qdrant vector database and, when needed, from the web via Tavily search. The code for the retriever agent, including the Tavily web search tool, is available in worker_agents/retriever.py. It uses gpt-5.4-mini with low reasoning effort.
Writer Agent

Its role is to write grounded content based on the retrieved evidence. The code for the writer agent is available in worker_agents/writer.py. It uses gpt-5.4 with low reasoning effort.
Verifier Agent

Its role is to check the written content for errors, citations and factual accuracy. The code for the verifier agent is available in worker_agents/verifier.py. It uses gpt-5.4 with low reasoning effort.
SQLite is used to provide short-term memory for the multi-agent workflow. For a given session ID, the system stores the evidence retrieved during the session.
This allows the orchestrator to reuse relevant evidence for follow-up questions instead of retrieving the same information again every time.
The code for the memory is available in memory/memory.py .
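A minimal sketch of this kind of SQLite-backed session memory is shown below. The schema and function names are illustrative only, not the project's actual memory/memory.py.

```python
import sqlite3

def init_memory(path=":memory:"):
    # One row per stored evidence snippet, keyed by session ID.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS evidence (session_id TEXT, snippet TEXT)"
    )
    return conn

def store_evidence(conn, session_id, snippet):
    conn.execute("INSERT INTO evidence VALUES (?, ?)", (session_id, snippet))
    conn.commit()

def recall_evidence(conn, session_id):
    # Reuse earlier evidence for follow-up questions in the same session.
    rows = conn.execute(
        "SELECT snippet FROM evidence WHERE session_id = ?", (session_id,)
    ).fetchall()
    return [r[0] for r in rows]

conn = init_memory()
store_evidence(conn, "s1", "snippet from Gemma 3 report")
print(recall_evidence(conn, "s1"))
```

Keying everything on a session ID is what lets the orchestrator answer follow-up questions without re-running retrieval for evidence it already has.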
The orchestrator coordinates the three worker agents: Retriever, Writer and Verifier.
The code for the orchestrator is in orchestrator_agent.py . It uses gpt-5.4-mini with low reasoning effort.
The orchestrator has a guardrail that keeps the system focused on research and factual questions. It refuses unrelated general tasks such as coding help or simple math because the goal of the system is to function as a research assistant.
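In the project this guardrail lives in the orchestrator's instructions and logic. As an illustration of the idea only, here is a simple keyword-based sketch; a production guardrail would classify the query with the LLM rather than match markers.

```python
# Illustrative guardrail: refuse off-topic requests before they reach
# the worker agents. Marker list is a hypothetical example.
OFF_TOPIC_MARKERS = ("write code", "debug", "solve 2+2", "calculate")

def guardrail(query):
    q = query.lower()
    if any(marker in q for marker in OFF_TOPIC_MARKERS):
        return "I can only help with research and factual questions."
    return None  # None means the query may proceed to the workers

print(guardrail("Can you debug my Python script?"))
print(guardrail("Summarise the Gemma 3 technical report."))
```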
Note: The orchestrator and worker agents default to gpt-5.4 models, but you can swap in any OpenAI-provided model of your choice.
Installation
1. Create and activate a virtual environment:
python3 -m venv env
source env/bin/activate
2. Install the dependencies:
cd multi-agent-rag-researcher
pip3 install -r utils/requirements.txt
3. Create a utils/var.env file and store your API keys:
OPENAI_API_KEY=your_openai_api_key
TAVILY_API_KEY=your_tavily_api_key
4. Place the PDFs you want to index in the docs/ folder, or upload PDFs later through the UI. The project already includes two PDFs in docs/ (Gemma 3 Technical Report.pdf and DeepSeek-V3.2.pdf), so you can use those directly or replace them with your own documents.
Run Project
Start the command-line app:
python3 run_orchestrator.py
When the CLI starts, it ingests the PDFs in docs/ into the local Qdrant store. Type q or exit to end the session.
Run UI for Multi-Agent Chat
Start the Gradio UI:
python3 ui/gradio_app.py
The UI automatically loads the default PDFs from docs/ on startup. If you upload new PDFs, they replace the active indexed document set for that UI session.
Session data is persisted locally: the SQLite memory in utils/memory.db and the Qdrant index in utils/qdrant_storage/.

In this post, I explained how an AI agent works, how it uses tools to interact with its environment, and how the ReAct approach helps it reason, plan, select tools and execute specific tasks.
I also covered the structural design of AI agents, which can be single-agent or multi-agent systems. I explained how both designs work, when to choose each one based on the workflow, and compared single-agent implementation with multi-agent architecture.
Finally, I walked through the multi-agent design behind my Multi-Agent RAG Researcher project, showing how it uses an orchestrator to coordinate three worker agents, retrieve information from the web and local documents, use memory for consistency, and write and verify grounded content before returning the final output.
Email: [email protected]
LinkedIn: https://www.linkedin.com/in/ayoola-olafenwa-003b901a9/
https://developers.openai.com/cookbook
https://developers.openai.com/api/docs/guides/function-calling