[{"id": "89206315-5f59-4686-83f6-d33ab035bb2b", "doc": "# Agentic Retrieval-Augmented Generation : A Survey On Agentic RAG\n\n\n\n\nOverview of Agentic RAG\n\n\n\n\n\n\n\n\n\n\nRecent Update (2025-02-04):\n\n\n> Check section 4 in the table of contents in this repo for the new **Agentic Workflow Patterns**.\n> New images have been added to enhance the **Overview of Agentic RAG**. The paper is also updated.\n\n\n\n---\n\n## Abstract\n\nAgentic Retrieval-Augmented Generation ( Agentic RAG) represents a transformative leap in artificial intelligence by embedding autonomous agents into the RAG pipeline. This repository complements the survey paper \"Agentic Retrieval-Augmented Generation (Agentic RAG): A Survey On Agentic RAG,\" providing insights into:\n\n- Foundational principles, including **Agentic Patterns** such as reflection, planning, tool use, and multi-agent collaboration.\n- A detailed taxonomy of Agentic RAG systems, showcasing frameworks like single-agent, multi-agent, hierarchical, corrective, adaptive, and graph-based RAG.\n- Comparative analysis of traditional RAG, Agentic RAG, and Agentic Document Workflows (ADW) to highlight their strengths, weaknesses, and best-fit scenarios.\n- Real-world applications across industries like healthcare, education, finance, and legal analysis.\n- Challenges and future directions in scaling, ethical AI, multimodal integration, and human-agent collaboration.\n\nThis repository serves as a comprehensive resource for researchers and practitioners to explore, implement, and advance the capabilities of Agentic RAG systems.\n\n---\n\n## Table of Contents\n1. \ud83d\udcdc [Abstract](#abstract)\n2. \ud83e\udde9 [Introduction](#introduction)\n3. \ud83e\udd16 [Agentic Patterns](#agentic-patterns)\n4. \ud83d\udd04 [Agentic Workflow Patterns](#agentic-workflow-patterns)\n5. \ud83d\udee0\ufe0f [Taxonomy of Agentic RAG Systems](#taxonomy-of-agentic-rag-systems)\n6. 
6. 🔍 [Comparative Analysis of Agentic RAG Frameworks](#comparative-analysis-of-agentic-rag-frameworks)
7. 💼 [Applications](#applications)
8. 🚧 [Challenges and Future Directions](#challenges-and-future-directions)
9. 🛠️ [Implementation of RAG Agentic Taxonomy: Techniques and Tools](#implementation-of-rag-agentic-taxonomy-techniques-and-tools)
10. 📰 [Blogs and Tutorials on Agentic RAG](#blogs-and-tutorials-on-agentic-rag)
11. 🖊️ [Noteworthy Related Concepts](#noteworthy-related-concepts)
12. 💡 [Practical Implementations and Use Cases of Agentic RAG](#practical-implementations-and-use-cases-of-agentic-rag)
13. 📚 [References](#references)
14. 🖊️ [How to Cite](#how-to-cite)

---

## Introduction

Retrieval-Augmented Generation (RAG) systems combine the capabilities of large language models (LLMs) with retrieval mechanisms to generate contextually relevant and accurate responses. While traditional RAG systems excel at knowledge retrieval and generation, they often fall short on dynamic multi-step reasoning, adaptability, and the orchestration of complex workflows.

**Agentic Retrieval-Augmented Generation (Agentic RAG)** overcomes these limitations by integrating autonomous AI agents.
These agents employ core **Agentic Patterns**, such as reflection, planning, tool use, and multi-agent collaboration, to adapt dynamically to task-specific requirements and deliver superior performance in:

- Multi-domain knowledge retrieval.
- Real-time, document-centric workflows.
- Scalable, adaptive, and ethical AI systems.

This repository explores the evolution from RAG to Agentic RAG, presenting:
- **Agentic Patterns**: The core principles driving the system’s adaptability and intelligence.
- **Taxonomy**: A comprehensive classification of Agentic RAG architectures.
- **Comparative Analysis**: Key differences between traditional RAG, Agentic RAG, and ADW.
- **Applications**: Practical use cases across healthcare, education, finance, and more.
- **Challenges and Future Directions**: Addressing scalability, ethical AI, and multimodal integration.

Whether you’re a researcher, developer, or practitioner, this repository offers valuable insights and resources for understanding and advancing Agentic RAG systems.

---

## Agentic Patterns

Agentic RAG systems derive their intelligence and adaptability from well-defined agentic patterns. These patterns enable agents to handle complex reasoning tasks, adapt to dynamic environments, and collaborate effectively. Below are the key patterns central to Agentic RAG.

### 1. Reflection
- **Definition**: Agents evaluate their own decisions and outputs, identifying errors and areas for improvement.
- **Key Benefits**:
  - Enables iterative refinement of results.
  - Enhances accuracy in multi-step reasoning tasks.
- **Example**: In a medical diagnostic system, agents refine diagnoses based on iterative feedback from retrieved data.

Figure 1: Reflection Pattern
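The reflection loop can be sketched in a few lines. This is a minimal illustration, not an implementation from the survey: `generate`, `critique`, and `revise` are hypothetical stand-ins for LLM calls.

```python
# Reflection sketch: draft an answer, let a critic list issues,
# and revise until the critique passes or the round budget runs out.

def generate(query: str) -> str:
    # Stand-in for an LLM generation call.
    return f"draft answer for: {query}"

def critique(answer: str) -> list[str]:
    # Stand-in for a critic; an empty list means the answer is accepted.
    return ["missing citation"] if "revised" not in answer else []

def revise(answer: str, issues: list[str]) -> str:
    # Stand-in for a revision call conditioned on the critique.
    return f"revised ({', '.join(issues)} fixed): {answer}"

def reflect(query: str, max_rounds: int = 3) -> str:
    answer = generate(query)
    for _ in range(max_rounds):
        issues = critique(answer)
        if not issues:
            break
        answer = revise(answer, issues)
    return answer

print(reflect("likely diagnosis given these symptoms?"))
```

The bounded `max_rounds` guard is the usual defence against the infinite-loop risk that self-critique introduces.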
### 2. Planning
- **Definition**: Agents create structured workflows and task sequences to solve problems efficiently.
- **Key Benefits**:
  - Facilitates multi-step reasoning by breaking down tasks.
  - Reduces computational overhead through optimized task prioritization.
- **Example**: A financial analysis system plans data retrieval tasks to assess risks and provide recommendations.

Figure 2: Planning Pattern

### 3. Tool Use
- **Definition**: Agents interact with external tools, APIs, and knowledge bases to retrieve and process data.
- **Key Benefits**:
  - Expands the system's capabilities beyond pre-trained knowledge.
  - Enables domain-specific applications by integrating external resources.
- **Example**: A legal assistant agent retrieves clauses from contract databases and applies domain-specific rules for compliance analysis.

Figure 3: Tool Use Pattern
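At its core, tool use is a registry of callables the agent can dispatch to. The sketch below is illustrative only; the tool names (`search_contracts`, `check_compliance`) are hypothetical stand-ins for real API integrations.

```python
# Tool-use sketch: the agent keeps a registry of named tools
# and invokes the one it selects for the current step.

def search_contracts(query: str) -> str:
    # Stand-in for a contract-database lookup.
    return f"clauses matching '{query}'"

def check_compliance(clause: str) -> str:
    # Stand-in for a domain-specific compliance check.
    return f"compliance report for '{clause}'"

TOOLS = {"search": search_contracts, "compliance": check_compliance}

def run_tool(name: str, arg: str) -> str:
    if name not in TOOLS:
        raise KeyError(f"unknown tool: {name}")
    return TOOLS[name](arg)

print(run_tool("search", "termination clause"))
```

Frameworks like LangChain formalize this same idea with tool schemas and an LLM choosing the tool name and arguments.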
### 4. Multi-Agent Collaboration
- **Definition**: Multiple agents collaborate to divide and conquer complex tasks, sharing information and results.
- **Key Benefits**:
  - Handles large-scale and distributed problems efficiently.
  - Combines specialized agent capabilities for better outcomes.
- **Examples**:
  - In customer support, agents collaborate to retrieve knowledge from FAQs, generate responses, and provide follow-ups.
  - **LawGlance** simplifies legal research by leveraging **multi-agent workflows** to retrieve relevant documents, analyze information, and deliver precise legal insights. It integrates Crew AI, LangChain, and Chroma to retrieve legal documents, perform web searches, and provide concise, accurate answers tailored to user queries.
    [Access LawGlance on Google Colab](https://colab.research.google.com/drive/1yrS2Kp-kprYWot_sEu7JeWMIRAei_vov?usp=sharing#scrollTo=80NmedKtmCtI)

Figure 4: Multi-Agent Collaboration Pattern

---

### Significance of Agentic Patterns
These patterns form the backbone of Agentic RAG systems, enabling them to:
- Adapt dynamically to task requirements.
- Improve decision-making through self-evaluation.
- Leverage external resources for domain-specific reasoning.
- Handle complex, distributed workflows via collaboration.

---

## Agentic Workflow Patterns: Adaptive Strategies for Dynamic Collaboration

Agentic workflow patterns help structure LLM-based applications to optimize performance, accuracy, and efficiency. Different approaches suit different task complexities and processing requirements.
**Source:** [Anthropic Research](https://www.anthropic.com/research/building-effective-agents) and [LangGraph Workflows](https://langchain-ai.github.io/langgraph/tutorials/workflows/)

### 1. Prompt Chaining: Enhancing Accuracy Through Sequential Processing
- **Definition:**
  Prompt chaining decomposes a complex task into multiple steps, where each step builds on the previous one. This structured approach improves accuracy by simplifying each subtask before moving forward, though it may increase latency due to sequential processing.
- **When to Use:**
  Most effective when a task can be broken down into fixed subtasks, each contributing to the final output, and when step-by-step reasoning enhances accuracy.
- **Example Applications:**
  - Generating marketing content in one language and then translating it into another while preserving nuances.
  - Structuring document creation by first generating an outline, verifying its completeness, and then developing the full text.

Figure 1: Illustration of Prompt Chaining Workflow
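The outline-then-draft example above can be sketched as a chain with a gate between steps. All functions are hypothetical stand-ins for LLM calls, assuming the two-step outline/draft decomposition described in the text.

```python
# Prompt-chaining sketch: each step consumes the previous step's
# output; a programmatic gate can abort the chain early.

def outline(topic: str) -> str:
    # Step 1 stand-in: produce an outline.
    return f"outline for {topic}"

def gate(text: str) -> bool:
    # Gate stand-in: verify the outline looks complete before continuing.
    return bool(text.strip())

def draft(outline_text: str) -> str:
    # Step 2 stand-in: expand the outline into full text.
    return f"full text based on: {outline_text}"

def chain(topic: str) -> str:
    o = outline(topic)
    if not gate(o):
        raise ValueError("outline failed the completeness check")
    return draft(o)

print(chain("agentic RAG survey"))
```

The gate is what trades latency for accuracy: each subtask is validated before the next prompt runs.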
---

### 2. Routing: Directing Inputs to Specialized Processes
- **Definition:**
  Routing classifies an input and directs it to an appropriate specialized prompt or process. This ensures distinct queries or tasks are handled separately, improving efficiency and response quality.
- **When to Use:**
  Ideal when different types of input require distinct handling strategies, ensuring optimized performance for each category.
- **Example Applications:**
  - Directing customer service queries into categories such as technical support, refund requests, or general inquiries.
  - Assigning simple queries to smaller models for cost efficiency, while routing complex requests to more capable models.

Figure 2: Illustration of Routing Workflow

---

### 3. Parallelization: Speeding Up Processing Through Concurrent Execution
- **Definition:**
  Parallelization divides a task into independent processes that run simultaneously, reducing latency and improving throughput. It can be categorized into:
  - **Sectioning:** Splitting tasks into independent subtasks.
  - **Voting:** Generating multiple outputs for increased accuracy.
- **When to Use:**
  Useful when tasks can be executed independently to enhance speed, or when multiple outputs improve confidence.
- **Example Applications:**
  - **Sectioning:** Splitting tasks like content moderation, where one model screens input while another generates a response.
  - **Voting:** Using multiple models to cross-check code for vulnerabilities or review content moderation decisions.

Figure 3: Illustration of Parallelization Workflow
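The customer-service routing example can be sketched as a classifier plus a handler table. The keyword classifier here is a deliberately naive stand-in; in practice the classification step would itself be an LLM call.

```python
# Routing sketch: classify the query, then dispatch to the handler
# registered for that category.

def classify(query: str) -> str:
    # Naive keyword classifier standing in for an LLM router.
    q = query.lower()
    if "refund" in q:
        return "refund"
    if "error" in q or "crash" in q:
        return "technical"
    return "general"

HANDLERS = {
    "refund": lambda q: f"refund flow: {q}",
    "technical": lambda q: f"tech-support flow: {q}",
    "general": lambda q: f"general flow: {q}",
}

def route(query: str) -> str:
    return HANDLERS[classify(query)](query)

print(route("I want a refund for my order"))
```

The same table-dispatch shape covers the cost-efficiency case too: categories can map to different model sizes instead of different prompts.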
---

### 4. Orchestrator-Workers: Dynamic Task Delegation
- **Definition:**
  A central orchestrator model dynamically breaks tasks into subtasks, assigns them to specialized worker models, and compiles the results. Unlike parallelization, it adapts to varying input complexity.
- **When to Use:**
  Best suited for tasks requiring dynamic decomposition and real-time adaptation, where subtasks are not predefined.
- **Example Applications:**
  - Automatically modifying multiple files in a codebase based on the nature of requested changes.
  - Conducting real-time research by gathering and synthesizing relevant information from multiple sources.

Figure 4: Illustration of Orchestrator-Workers Workflow

---

### 5. Evaluator-Optimizer: Refining Output Through Iteration
- **Definition:**
  The evaluator-optimizer workflow iteratively improves content by generating an initial output and refining it based on feedback from an evaluation model.
- **When to Use:**
  Effective when iterative refinement significantly enhances response quality, especially when clear evaluation criteria exist.
- **Example Applications:**
  - Improving literary translations through multiple evaluation and refinement cycles.
  - Conducting multi-round research queries where additional iterations refine search results.

Figure 5: Illustration of Evaluator-Optimizer Workflow

---

## Taxonomy of Agentic RAG Systems

Agentic Retrieval-Augmented Generation (RAG) systems encompass various architectures and workflows, each tailored to specific tasks and levels of complexity. Below is a detailed taxonomy of these systems.

### 1. Single-Agent RAG
- **Key Idea**: A single autonomous agent manages both retrieval and generation.
- **Workflow**:
  1. A query is submitted to the agent.
  2. The agent retrieves relevant data from external sources.
  3. The data is processed and synthesized into a response.
- **Advantages**:
  - Simple architecture for basic use cases.
  - Easy to implement and maintain.
- **Limitations**:
  - Limited scalability.
  - Ineffective for multi-step reasoning or large datasets.
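The three single-agent steps can be sketched end to end with a toy in-memory corpus. Word-overlap scoring stands in for a vector store, and `synthesize` stands in for the generation step; everything here is illustrative.

```python
# Single-agent RAG sketch: retrieve top-k documents by naive word
# overlap, then synthesize a response from them.
import re

CORPUS = [
    "Agentic RAG embeds autonomous agents in the RAG pipeline.",
    "Traditional RAG retrieves documents and generates an answer.",
    "Planning decomposes a task into ordered subtasks.",
]

def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z]+", text.lower()))

def retrieve(query: str, k: int = 2) -> list[str]:
    # Score by shared words; a real system would use embeddings.
    q = tokens(query)
    return sorted(CORPUS, key=lambda d: -len(q & tokens(d)))[:k]

def synthesize(query: str, docs: list[str]) -> str:
    # Stand-in for the LLM generation step.
    return f"Q: {query} | A (from {len(docs)} docs): " + " ".join(docs)

print(synthesize("What is Agentic RAG?", retrieve("What is Agentic RAG?")))
```

Swapping `retrieve` for a FAISS or Chroma query and `synthesize` for an LLM call turns this shape into the single-agent systems listed in the implementation table below.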
### 2. Multi-Agent RAG
- **Key Idea**: A team of agents collaborates on complex retrieval and reasoning tasks.
- **Workflow**:
  1. Agents dynamically divide tasks (e.g., retrieval, reasoning, synthesis).
  2. Each agent specializes in a specific sub-task.
  3. Results are aggregated and synthesized into a coherent output.
- **Advantages**:
  - Better performance on distributed, multi-step tasks.
  - Increased modularity and scalability.
- **Limitations**:
  - Coordination complexity increases with the number of agents.
  - Risk of redundancy or conflicts between agents.

#### Case Study: AgentFlow
AgentFlow is a **trainable, tool-integrated agentic framework** designed to overcome the **scalability and generalization limits** of today’s tool-augmented reasoning approaches. It coordinates four specialized modules (**Planner**, **Executor**, **Verifier**, **Generator**) and optimizes the **planner** *in the flow* of multi-turn tasks using **Flow-GRPO**, improving long-horizon credit assignment and tool-use reliability.

**Key Features:**
- 🧩 **Modular Agentic System** – Four specialized agent modules (Planner, Executor, Verifier, Generator) that coordinate via evolving memory and integrated tools across multiple turns.
- 🔗 **Multi-Tool Integration** – Seamlessly connects with diverse tool ecosystems, including `base_generator`, `python_coder`, `google_search`, `wikipedia_search`, `web_search`, and more.
- 🎯 **Flow-GRPO Algorithm** – Enables in-the-flow agent optimization for long-horizon reasoning tasks with sparse rewards.
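The three-step multi-agent workflow can be sketched as specialized functions behind a coordinator. The agent names and their behaviour are illustrative stubs, not any specific framework's API.

```python
# Multi-agent RAG sketch: one agent per sub-task (retrieval,
# reasoning, synthesis), with a coordinator aggregating results.

def retrieval_agent(query: str) -> list[str]:
    # Stand-in for a retrieval specialist.
    return [f"document about {query}"]

def reasoning_agent(docs: list[str]) -> str:
    # Stand-in for a reasoning specialist.
    return f"conclusions drawn from {len(docs)} document(s)"

def synthesis_agent(query: str, reasoning: str) -> str:
    # Stand-in for the agent that writes the final answer.
    return f"answer to '{query}': {reasoning}"

def coordinator(query: str) -> str:
    docs = retrieval_agent(query)
    reasoning = reasoning_agent(docs)
    return synthesis_agent(query, reasoning)

print(coordinator("graph-based RAG"))
```

Frameworks such as crewAI or AutoGen add what this sketch omits: message passing, shared memory, and conflict resolution between agents.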
### 3. Hierarchical Agentic RAG
- **Key Idea**: Organizes agents in a hierarchy for better task prioritization and delegation.
- **Workflow**:
  1. A top-level agent orchestrates subtasks among lower-level agents.
  2. Each lower-level agent handles a specific part of the process.
  3. Results are iteratively refined and integrated at higher levels.
- **Advantages**:
  - Scales to large and complex tasks.
  - Modular design facilitates specialization.
- **Limitations**:
  - Requires sophisticated orchestration mechanisms.
  - Potential bottlenecks at higher levels of the hierarchy.

### 4. Corrective Agentic RAG
- **Key Idea**: Feedback loops enable agents to evaluate and refine their outputs iteratively.
- **Workflow**:
  1. The agent generates an initial response.
  2. A critic module evaluates the response for errors or inconsistencies.
  3. The agent refines the response based on the feedback.
  4. Steps 2-3 repeat until the output meets quality standards.
- **Advantages**:
  - High accuracy and reliability through iterative improvement.
  - Useful for error-prone or high-stakes tasks.
- **Limitations**:
  - Increased computational overhead.
  - Feedback mechanisms must be well designed to avoid infinite loops.

### 5. Adaptive Agentic RAG
- **Key Idea**: Dynamically adjusts retrieval strategies and workflows based on task requirements.
- **Workflow**:
  1. The agent assesses the query and its context.
  2. It adapts retrieval strategies in real time based on available data and user needs.
  3. It synthesizes a response using dynamic workflows.
- **Advantages**:
  - High flexibility for varied tasks and dynamic environments.
  - Improves context relevance and user satisfaction.
- **Limitations**:
  - Robust adaptation mechanisms are challenging to design.
  - Real-time adjustments add computational overhead.
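The adaptive step 1-2 pairing (assess the query, then pick a strategy) can be sketched as a heuristic selector over a strategy table. The heuristics and strategy names are hypothetical; a production system would learn or prompt for this decision.

```python
# Adaptive-RAG sketch: inspect the query, choose a retrieval
# strategy at run time, and dispatch to it.

def assess(query: str) -> str:
    # Illustrative heuristics standing in for a learned/LLM assessor.
    q = query.lower()
    if "latest" in q or "today" in q:
        return "web_search"          # freshness needed
    if len(q.split()) > 12:
        return "multi_hop"           # long, compositional question
    return "vector_store"            # default indexed lookup

STRATEGIES = {
    "web_search": lambda q: f"web results for: {q}",
    "multi_hop": lambda q: f"multi-hop retrieval for: {q}",
    "vector_store": lambda q: f"vector-store hits for: {q}",
}

def adaptive_retrieve(query: str) -> str:
    return STRATEGIES[assess(query)](query)

print(adaptive_retrieve("latest guidance on invoice processing"))
```

This is the same shape the Adaptive RAG notebook in the implementation table uses, with an LLM router in place of the keyword heuristics.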
### 6. Graph-Based Agentic RAG
Graph-based RAG systems extend traditional RAG by integrating graph-based data structures for advanced reasoning.

#### 6.1 Agent-G: Agentic Framework for Graph RAG
- **Key Idea**: Dynamically assigns tasks to specialized agents using graph knowledge bases and feedback loops.
- **Workflow**:
  1. Extract relationships from graph knowledge bases (e.g., disease-to-symptom mappings).
  2. Complement them with unstructured data from external sources.
  3. Use a critic module to validate results and iterate.
- **Advantages**:
  - Combines structured and unstructured data.
  - Modular and scalable for complex tasks.
  - Achieves high accuracy through iterative refinement.

#### 6.2 GeAR: Graph-Enhanced Agent for RAG
- **Key Idea**: Enhances RAG systems with graph expansion techniques and agent-based architectures.
- **Workflow**:
  1. Expand query-related graphs for better relational understanding.
  2. Leverage specialized agents for multi-hop reasoning.
  3. Synthesize graph-structured and unstructured information into responses.
- **Advantages**:
  - Excels in multi-hop reasoning scenarios.
  - Improves accuracy for deep contextual tasks.
  - Adapts dynamically to complex query environments.
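The graph-expansion step underlying both frameworks can be sketched as a bounded breadth-first walk over an adjacency map. The toy disease/symptom graph below is illustrative, not data from either paper.

```python
# Multi-hop graph-expansion sketch: from the query entity, collect
# neighbours up to `hops` levels deep in a toy knowledge graph.

GRAPH = {
    "flu": ["fever", "cough"],
    "fever": ["paracetamol"],
    "cough": ["hydration"],
}

def expand(entity: str, hops: int = 2) -> set[str]:
    frontier, seen = {entity}, {entity}
    for _ in range(hops):
        # One hop: neighbours of everything on the current frontier.
        frontier = {n for e in frontier for n in GRAPH.get(e, []) if n not in seen}
        seen |= frontier
    return seen

print(sorted(expand("flu")))
```

The expanded entity set is then what gets handed to retrieval and synthesis, giving the multi-hop context a flat vector lookup would miss.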
### 7. Agentic Document Workflows (ADW)

Agentic Document Workflows (ADW) extend traditional RAG systems by automating document-centric processes with intelligent agents.

#### Workflow
1. **Document Parsing and Structuring**:
   - Extracts structured data from documents such as invoices or contracts.
2. **State Maintenance**:
   - Tracks context across multi-step workflows for consistency.
3. **Knowledge Retrieval**:
   - Retrieves relevant references from external sources or domain-specific databases.
4. **Agentic Orchestration**:
   - Applies business rules, performs multi-hop reasoning, and orchestrates external APIs.
5. **Actionable Output Generation**:
   - Produces structured outputs tailored to specific use cases (e.g., reports or summaries).

#### Key Features and Advantages
- **State Maintenance**: Ensures consistency in multi-step workflows.
- **Domain-Specific Intelligence**: Adapts to specialized domains with tailored rules.
- **Scalability**: Handles large-scale document processing efficiently.
- **Enhanced Productivity**: Reduces manual effort and augments human expertise.
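An ADW invoice pass can be sketched as parse, verify against contract terms, then recommend. The field names, the contract lookup, and the rules are all hypothetical; a real workflow would use a document parser and an actual contracts database.

```python
# ADW sketch for the invoice use case: parse fields, check the
# (stubbed) vendor contract, emit a payment recommendation.

def parse_invoice(text: str) -> dict:
    # Naive "key: value" parser standing in for real document parsing.
    fields = dict(line.split(": ", 1) for line in text.splitlines())
    fields["amount"] = float(fields["amount"])
    return fields

def contract_terms(vendor: str) -> dict:
    # Stand-in for a contract-database lookup.
    return {"max_amount": 10_000.0, "early_pay_discount": 0.02}

def recommend(invoice: dict) -> str:
    terms = contract_terms(invoice["vendor"])
    if invoice["amount"] > terms["max_amount"]:
        return "hold: amount exceeds contracted maximum"
    saving = invoice["amount"] * terms["early_pay_discount"]
    return f"pay early, saving {saving:.2f}"

inv = parse_invoice("vendor: Acme\namount: 1200.50")
print(recommend(inv))
```

Carrying `inv` through every step is the "state maintenance" feature in miniature: later stages reason over the structured record, not the raw document.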
---

### Visual Representations

Figure 5: Single-Agent RAG Diagram

Figure 6: Multi-Agent RAG Diagram

Figure 7: Hierarchical RAG Workflow

Figure 8: Graph-Based RAG Workflow

Figure 9: ADW Workflow Diagram
[Source]

---

## Comparative Analysis of Agentic RAG Frameworks

The table below compares the three architectural frameworks: traditional RAG, Agentic RAG, and Agentic Document Workflows (ADW). It highlights their respective strengths, weaknesses, and best-fit scenarios, offering insight into their applicability across diverse use cases.

| **Feature** | **Traditional RAG** | **Agentic RAG** | **Agentic Document Workflows (ADW)** |
| --- | --- | --- | --- |
| **Focus** | Isolated retrieval and generation tasks | Multi-agent collaboration and reasoning | Document-centric end-to-end workflows |
| **Context Maintenance** | Limited | Enabled through memory modules | Maintains state across multi-step workflows |
| **Dynamic Adaptability** | Minimal | High | Tailored to document workflows |
| **Workflow Orchestration** | Absent | Orchestrates multi-agent tasks | Integrates multi-step document processing |
| **Use of External Tools/APIs** | Basic integration (e.g., retrieval tools) | Extends via tools such as APIs and knowledge bases | Deeply integrates business rules and domain-specific tools |
| **Scalability** | Limited to small datasets or queries | Scalable for multi-agent systems | Scales to multi-domain enterprise workflows |
| **Complex Reasoning** | Basic (e.g., simple Q&A) | Multi-step reasoning with agents | Structured reasoning across documents |
| **Primary Applications** | QA systems, knowledge retrieval | Multi-domain knowledge and reasoning | Contract review, invoice processing, claims analysis |
| **Strengths** | Simplicity, quick setup | High accuracy, collaborative reasoning | End-to-end automation, domain-specific intelligence |
| **Challenges** | Poor contextual understanding | Coordination complexity | Resource overhead, domain standardization |

---

### Key Takeaways
- **Traditional RAG** is best suited to simpler tasks requiring basic retrieval and generation.
- **Agentic RAG** excels at multi-agent collaborative reasoning, making it suitable for more complex, multi-domain tasks.
- **Agentic Document Workflows (ADW)** provide tailored, document-centric solutions for enterprise-scale applications such as contract analysis and invoice processing.

---

## Applications

Agentic Retrieval-Augmented Generation (RAG) systems have transformative potential across diverse industries, enabling intelligent retrieval, multi-step reasoning, and dynamic adaptation to complex tasks. Key domains include:
### 1. Healthcare and Personalized Medicine
- **Problem**: Rapid retrieval and synthesis of medical knowledge for diagnostics, treatment planning, and research.
- **Applications**:
  - Clinical decision support systems leveraging multi-modal data (e.g., patient records, medical literature).
  - Automating medical report generation with relevant contextual references.
  - Multi-hop reasoning for analyzing complex relationships (e.g., disease-to-symptom mappings or treatment-to-outcome correlations).

### 2. Education and Personalized Learning
- **Problem**: Delivering personalized and adaptive learning experiences for diverse learners.
- **Applications**:
  - Intelligent tutors capable of real-time knowledge retrieval and personalized feedback.
  - Customized educational content generated from student progress and preferences.
  - Multi-agent systems for collaborative learning simulations.

### 3. Legal and Contract Analysis
- **Problem**: Analyzing complex legal documents and extracting actionable insights.
- **Applications**:
  - Contract summarization and clause comparison with contextual alignment to legal standards.
  - Retrieval of precedent cases and regulatory guidelines for compliance.
  - Iterative workflows for identifying inconsistencies or conflicts in contracts.

### 4. Finance and Risk Analysis
- **Problem**: Analyzing large-scale financial datasets and identifying trends, risks, and opportunities.
- **Applications**:
  - Automated generation of financial summaries and investment recommendations.
  - Real-time fraud detection through multi-step reasoning and data correlation.
  - Scenario-based modeling for risk analysis using graph-based workflows.
### 5. Customer Support and Virtual Assistants
- **Problem**: Providing contextually relevant and dynamic responses to customer queries.
- **Applications**:
  - AI-powered virtual assistants for real-time customer support.
  - Adaptive systems that improve responses by learning from user feedback.
  - Multi-agent orchestration for handling complex, multi-query interactions.

### 6. Graph-Enhanced Applications in Multimodal Workflows
- **Problem**: Tackling tasks that require relational understanding and multi-modal data integration.
- **Applications**:
  - Graph-based retrieval systems that connect structured and unstructured data.
  - Enhanced reasoning workflows in domains such as scientific research and knowledge management.
  - Synthesis of insights across text, images, and structured data into actionable outputs.

### 7. Document-Centric Workflows
- **Problem**: Automating complex workflows involving document parsing, data extraction, and multi-step reasoning.
- **Applications**:
  - **Invoice Payments Workflow**:
    - Parses invoices to extract key details (e.g., invoice number, vendor info, payment terms).
    - Retrieves related vendor contracts to verify terms and compliance.
    - Generates a payment recommendation report, including cost-saving suggestions (e.g., early payment discounts).
  - **Contract Review**:
    - Analyzes legal contracts for critical clauses and compliance issues.
    - Automatically identifies risks and provides actionable recommendations.
  - **Insurance Claims Analysis**:
    - Automates claims review, extracting policy terms and calculating payouts based on predefined rules.
- **Key Advantages**:
  - **State Maintenance**: Tracks the document’s context across workflow stages.
  - **Domain-Specific Intelligence**: Applies tailored rules for industry-specific needs.
  - **Scalability**: Handles large volumes of enterprise documents efficiently.
  - **Enhanced Productivity**: Reduces manual effort and augments human expertise.
---

## Challenges and Future Directions

While Agentic Retrieval-Augmented Generation (RAG) systems show immense promise, several challenges and research opportunities remain:

### Challenges
1. **Coordination Complexity in Multi-Agent Systems**:
   - Managing communication and collaboration among multiple agents can lead to inefficiencies and increased computational overhead.
   - Balancing task assignments and resolving conflicts between agents remains a critical issue.

2. **Ethical and Responsible AI**:
   - Ensuring unbiased retrieval and decision-making in sensitive domains such as healthcare and finance.
   - Addressing data privacy concerns and building transparent systems that adhere to ethical standards.

3. **Scalability and Latency**:
   - Scaling Agentic RAG systems to large datasets and high-volume queries without compromising response times.
   - Addressing latency in multi-agent and graph-based workflows.

4. **Hybrid Human-Agent Collaboration**:
   - Designing systems that effectively integrate human oversight with autonomous agents for tasks requiring domain expertise.
   - Maintaining user trust and control while leveraging the strengths of AI agents.

5. **Expanding Multimodal Capabilities**:
   - Integrating text, image, audio, and video data for richer, more comprehensive outputs.
   - Handling the complexity of multimodal reasoning in real-time applications.

---

### Future Directions
1. **Enhanced Agentic Orchestration**:
   - More robust coordination frameworks for hierarchical and multi-agent systems.
   - Adaptive learning mechanisms that dynamically improve task allocation.

2. **Domain-Specific Applications**:
   - Customizing Agentic RAG systems for niche domains such as legal analysis, personalized education, and advanced scientific research.
**Ethical AI and Governance Frameworks**:\n - Building tools to monitor, explain, and mitigate biases in AI outputs.\n - Developing policies and guidelines for ethical deployment in high-stakes environments.\n\n4. **Efficient Graph-Based Reasoning**:\n - Optimizing graph-based workflows for large-scale, real-world applications.\n - Exploring hybrid approaches that combine graph-based reasoning with neural networks.\n\n5. **Human-AI Synergy**:\n - Designing intuitive interfaces and workflows to empower humans to interact effectively with Agentic RAG systems.\n - Focusing on explainability and user-centric design.\n\n---\n## Implementation of RAG Agentic Taxonomy: Techniques and Tools\n| Technique | Tools | Description | Notebooks |\n| ------------------------------------ | ------------------------------------------------------------------------------------------------ | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |\n| Single Agentic RAG | LangChain, FAISS, Athina AI | Uses AI agents to find and generate answers using tools like vectordb and web searches. 
| [View Notebook](https://colab.research.google.com/github/athina-ai/rag-cookbooks/blob/main/agentic_rag_techniques/basic_agentic_rag.ipynb) |\n| | LlamaIndex, Vertex AI (Vector Store, Text Embedding, LLM), Google Cloud Storage | Demonstrates a single-router Agentic RAG system using LlamaIndex with Vertex AI for context retrieval and response generation. | [View Notebook](https://docs.llamaindex.ai/en/stable/examples/agent/agentic_rag_using_vertex_ai/) |\n| | LangChain, IBM Granite-3-8B-Instruct, Watsonx.ai, Chroma DB, WebBaseLoader | Builds an Agentic RAG system using IBM Granite-3-8B-Instruct model in Watsonx.ai to answer complex queries with external information. | [View Notebook](https://github.com/ibm-granite-community/granite-snack-cookbook/blob/main/recipes/AI-Agents/Agentic_RAG.ipynb) |\n| | LangGraph, Chroma, NVIDIA Inference Microservices (NIMs), Tavily Search API | This system uses a router-based architecture to determine whether a query should be handled by a RAG pipeline (retrieving from a vector database) or a websearch pipeline. An AI agent evaluates the query's topic and routes it to the appropriate pipeline for information retrieval and response generation, ensuring accurate, relevant, and contextually augmented answers. | [View Notebook](https://github.com/NVIDIA/workbench-example-agentic-rag) |\n| | LlamaIndex, Redis, Amazon Bedrock, RedisVectorStore, LlamaParse, BedrockEmbedding, SemanticCache | This system implements a ReAct agent-based RAG pipeline where the agent interacts with a Redis-backed index and vector store to retrieve and process data from a PDF document. It utilizes Amazon Bedrock embeddings and LlamaIndex to process the document, build embeddings, and handle retrieval-based augmented generation. Additionally, semantic caching optimizes the system by reducing redundant LLM queries for repeated or similar user questions, improving response times and efficiency. 
| [View Notebook](https://github.com/redis-developer/agentic-rag) |\n| Multi-Agent Agentic RAG Orchestrator | AutoGen, SQL, AI Search Indexes | This orchestrator utilizes a multi-agent system to facilitate complex task execution through coordinated agent interactions. Using a factory pattern and various predefined strategies (e.g., classic_rag for retrieval-augmented generation and nl2sql for translating natural language to SQL), the system enables flexible, multi-agent collaboration for tasks like database querying and document retrieval. The orchestrator supports agent communication, iterative responses, and customizable strategies, offering a high level of adaptability for diverse use cases. | [View Notebook](https://github.com/Azure/gpt-rag-agentic) |\n| Hierarchical Multi-Agent Agentic RAG | Weaviate, ExaSearch, Groq, crewAI | This approach uses a hierarchical agentic architecture with multiple agents, each responsible for specific tasks or tools. A manager agent coordinates the work of specialized agents (such as WeaviateTool for internal document retrieval, ExaSearchTool for web searches, and Groq for fast AI inference) to handle complex queries. The flexible, task-oriented system can support various use cases such as QA and workflow automation. | [View Notebook](https://github.com/lorenzejay/agentic-rag-practical-example) |\n| Corrective RAG | LangChain, LangGraph, Chromadb, Athina AI | Refines relevant documents, removes irrelevant ones or does the web search. | [View Notebook](https://colab.research.google.com/github/athina-ai/rag-cookbooks/blob/main/agentic_rag_techniques/corrective_rag.ipynb) |\n| | LangChain, FAISS, HuggingFace Inference API, SmolAgents, HyDE, Self-Query | This system incorporates query reformulation and self-query strategies to address limitations in traditional RAG systems. It performs iterative retrieval by critiquing the relevance of retrieved documents and re-querying as needed. 
The agent refines queries to improve semantic similarity and ensure higher accuracy. Self-grading mechanisms assess the quality of retrieved information, enhancing results through iterative improvement. The system aligns with Corrective RAG principles by reducing confabulations and improving retrieval relevance. | [View Notebook](https://github.com/aymericroucher/agentic-rag-query-reformulation) |\n| Adaptive RAG | LangChain, LangGraph, FAISS, Athina AI | Adjusts retrieval methods based on query type, using indexed data or web search. | [View Notebook](https://colab.research.google.com/github/athina-ai/rag-cookbooks/blob/main/agentic_rag_techniques/adaptive_rag.ipynb) |\n| ReAct RAG | LangChain, LangGraph, FAISS, Athina AI | Combines reasoning and retrieval for context-aware responses. | [View Notebook](https://colab.research.google.com/github/athina-ai/rag-cookbooks/blob/main/agentic_rag_techniques/react_rag.ipynb) |\n| Self RAG | LangChain, LangGraph, FAISS, Athina AI | Reflects on retrieved data to ensure accurate and complete responses. | [View Notebook](https://colab.research.google.com/github/athina-ai/rag-cookbooks/blob/main/agentic_rag_techniques/self_rag.ipynb) |\n\n---\n## Blogs and Tutorials on Agentic RAG\n1. DeepLearning.AI: How agents can improve LLM performance. [DeepLearning.AI](https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io)\n2. Weaviate Blog: What is agentic RAG? [Weaviate Blog](https://weaviate.io/blog/what-is-agentic-rag#:~:text=is%20Agentic%20RAG%3F-,%E2%80%8B,of%20the%20non%2Dagentic%20pipeline)\n3. LangGraph CRAG Tutorial: Corrective retrieval-augmented generation tutorial. [LangGraph CRAG](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_crag/)\n4. LangGraph Adaptive RAG Tutorial: Adaptive retrieval-augmented generation tutorial. [LangGraph Adaptive RAG](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag/). 
Accessed: 2025-01-14.\n5. LlamaIndex Blog: Agentic RAG with LlamaIndex. [LlamaIndex Blog](https://www.llamaindex.ai/blog/agentic-rag-with-llamaindex-2721b8a49ff6)\n6. Hugging Face Cookbook. Agentic RAG: Turbocharge your retrieval-augmented generation with query reformulation and self-query. [Hugging Face Cookbook](https://huggingface.co/learn/cookbook/en/agent_rag)\n7. Hugging Face Agentic RAG: https://huggingface.co/docs/smolagents/en/examples/rag\n8. Qdrant Blog. Agentic RAG: Combining RAG with agents for enhanced information retrieval. [Qdrant Blog](https://qdrant.tech/articles/agentic-rag/)\n9. Semantic Kernel: Semantic Kernel is an open-source SDK by Microsoft that integrates large language models (LLMs) into applications. It supports agentic patterns, enabling the creation of autonomous AI agents for natural language understanding, task automation, and decision-making. It has been used in scenarios like ServiceNow\u2019s P1 incident management to facilitate real-time collaboration, automate task execution, and retrieve contextual information seamlessly.\n\n - [GitHub - RAG with Semantic Kernel](https://github.com/bostonazure/rag-vector-agent-semantic-kernel)\n - [GitHub - Semantic Kernel](https://github.com/microsoft/semantic-kernel)\n - [ServiceNow Case Study](https://devblogs.microsoft.com/semantic-kernel/customer-case-study-pushing-the-boundaries-of-multi-agent-ai-collaboration-with-servicenow-and-microsoft-semantic-kernel/)\n\n---\n\n## Practical Implementations and Use Cases of Agentic RAG\n1. AWS Machine Learning Blog. How Twitch used agentic workflow with RAG on Amazon Bedrock to supercharge ad sales. [AWS Machine Learning Blog](https://aws.amazon.com/blogs/machine-learning/how-twitch-used-agentic-workflow-with-rag-on-amazon-bedrock-to-supercharge-ad-sales/)\n2. LlamaCloud Demo Repository. Patient case summary workflow using LlamaCloud. 
[GitHub](https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/patient_case_summary/patient_case_summary.ipynb) 2025. Accessed: 2025-01-13.\n3. LlamaCloud Demo Repository. Contract review workflow using LlamaCloud. [GitHub](https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/contract_review/contract_review.ipynb)\n4. LlamaCloud Demo Repository. Auto insurance claims workflow using LlamaCloud. [GitHub](https://github.com/run-llama/llamacloud-demo/blob/main/examples/document_workflows/auto_insurance_claims/auto_insurance_claims.ipynb)\n5. LlamaCloud Demo Repository. Research paper report generation workflow using LlamaCloud. [GitHub](https://github.com/run-llama/llamacloud-demo/blob/main/examples/report_generation/research_paper_report_generation.ipynb)\n\n---\n\n### Noteworthy Related Concepts\nBelow are some noteworthy resources related to Agentic Design Patterns. The first five items are from **Andrew Ng**\u2019s series at **DeepLearning.ai**:\n\n1. **Agentic Design Patterns Part 1** \n [How Agents Can Improve LLM Performance](https://www.deeplearning.ai/the-batch/how-agents-can-improve-llm-performance/?ref=dl-staging-website.ghost.io)\n2. **Agentic Design Patterns Part 2, Reflection** \n [Read More](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-2-reflection/)\n3. **Agentic Design Patterns Part 3, Tool Use** \n [Read More](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-3-tool-use/)\n4. **Agentic Design Patterns Part 4, Planning** \n [Read More](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-4-planning/)\n5. 
**Agentic Design Patterns Part 5, Multi-Agent Collaboration** \n [Read More](https://www.deeplearning.ai/the-batch/agentic-design-patterns-part-5-multi-agent-collaboration/)\n\n**Additional Resources**\n\n- **Building Agentic RAG with LlamaIndex** \n [Explore the Course](https://www.deeplearning.ai/short-courses/building-agentic-rag-with-llamaindex/)\n- **AI Agentic Design Patterns with AutoGen** \n [Explore the Course](https://www.deeplearning.ai/short-courses/ai-agentic-design-patterns-with-autogen/)\n- **LangGraph Agentic RAG** \n [Read More](https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_agentic_rag/#nodes-and-edges)\n\n---\n## References\n\n### Research Papers on Agentic RAG\n\n#### 1. Single-Agent RAG (Router-Based)\n- Search-o1: Agentic Search-Enhanced Large Reasoning Models https://arxiv.org/abs/2501.05366\n\n#### 2. Multi-Agent Agentic RAG\n- Agentic Retrieval-Augmented Generation for Time Series Analysis https://arxiv.org/abs/2408.14484\n\n#### 3. Corrective Agentic RAG\n- Agentic AI-Driven Technical Troubleshooting for Enterprise Systems https://arxiv.org/abs/2412.12006\n- Corrective RAG (CRAG) https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_crag/\n- Corrective Retrieval Augmented Generation https://arxiv.org/abs/2401.15884\n\n#### 4. 
Adaptive Agentic RAG\n- LangGraph Adaptive RAG https://langchain-ai.github.io/langgraph/tutorials/rag/langgraph_adaptive_rag/\n- MBA-RAG: A Bandit Approach for Adaptive Retrieval-Augmented https://arxiv.org/abs/2412.01572\n- CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control https://arxiv.org/abs/2405.18727\n- Adaptive-RAG: Learning to Adapt Retrieval-Augmented Large Language Models through Question Complexity https://arxiv.org/abs/2403.14403\n- AT-RAG: An Adaptive RAG Model Enhancing Query Efficiency with Topic Filtering and Iterative Reasoning https://arxiv.org/abs/2410.12886\n\n#### 5. Graph-Based Agentic RAG\n- GeAR: Graph-enhanced Agent for Retrieval-augmented Generation https://arxiv.org/abs/2412.18431\n- Agent-G: An Agentic Framework for Graph Retrieval Augmented Generation https://openreview.net/forum?id=g2C947jjjQ\n\n---\n\n## How to Cite\n\nIf you find this work useful in your research, please cite:\n\n```bibtex\n@misc{singh2025agenticretrievalaugmentedgenerationsurvey,\n title={Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG},\n author={Aditi Singh and Abul Ehtesham and Saket Kumar and Tala Talaei Khoei},\n year={2025},\n eprint={2501.09136},\n archivePrefix={arXiv},\n primaryClass={cs.AI},\n url={https://arxiv.org/abs/2501.09136},\n}\n"}, {"id": "72fe83ca-0dfe-415c-a59f-63e0fd9520b1", "doc": "Agentic RAG for PDFs with mixed data\nShaan Desai\n\nMotivation\nRetrieval-augmented generation (RAG) allows language models to generate grounded answers to questions about documents. However, the complexity of the documents can significantly influence overall RAG performance. 
For instance, the documents may be PDFs that contain a mix of text and tables.\nMore broadly, the implementation of a RAG pipeline - including parsing and chunking of documents, along with the embedding and retrieval of the chunks - is critical to the accuracy of grounded answers. Additionally, it is sometimes not sufficient to merely retrieve the answers; a user may want further postprocessing performed on the output. This use case would benefit from giving the model access to tools.\nObjective\nIn this notebook, we will guide you through best practices for setting up a RAG pipeline to process documents that contain both tables and text. We will also demonstrate how to create a ReAct agent with a Cohere model, and then give the agent access to a RAG pipeline tool to improve accuracy. The general structure of the notebook is as follows:\n\nindividual components around parsing, retrieval and generation are covered for documents with mixed tabular and textual data\na class object is created that can be used to instantiate the pipeline with parametric input\nthe RAG pipeline is then used as a tool for a Cohere ReAct agent\n\nReference Documents\nWe recommend the following notebook as a guide to semi-structured RAG.\nWe also recommend the following notebook to explore various parsing techniques for PDFs.\nVarious LangChain-supported parsers can be found here.\nInstall Dependencies\nPYTHON\n# there may be other dependencies that will need installation\n# !pip install --quiet langchain langchain_cohere langchain_experimental\n# !pip install --quiet faiss-cpu tiktoken\n# !pip install pypdf\n# !pip install pytesseract\n# !pip install opencv-python --upgrade\n# !pip install \"unstructured[all-docs]\"\n# !pip install chromadb\nPYTHON\n# LLM\nimport os\nimport uuid\nfrom typing import Any\n\nimport cohere, json\nimport pandas as pd\nfrom datasets import load_dataset\nfrom joblib import Parallel, delayed\nfrom langchain.text_splitter import RecursiveCharacterTextSplitter\nfrom langchain.retrievers.multi_vector import MultiVectorRetriever\nfrom langchain.storage import InMemoryStore\nfrom langchain_cohere import CohereEmbeddings\nfrom langchain_community.document_loaders import WebBaseLoader, PyPDFLoader\nfrom langchain_community.vectorstores import FAISS, Chroma\nfrom langchain_core.documents import Document\nfrom pydantic import BaseModel\nfrom unstructured.partition.pdf import partition_pdf\n\nos.environ['COHERE_API_KEY'] = \"\"\nParsing\nTo improve RAG performance on PDFs with mixed types (text and tables), we investigated a number of parsing and chunking strategies from various libraries:\n\nPyPDFLoader (LC)\nLlamaParse (Llama-Index)\nUnstructured\n\nWe have found that the best option for parsing is unstructured.io since the parser can:\n\nseparate tables from text\nautomatically chunk the tables and text by title during the parsing step so that similar elements are grouped\n\nPYTHON\n# UNSTRUCTURED pdf loader\n# Get elements\nraw_pdf_elements = partition_pdf(\n    filename=\"city_ny_popular_fin_report.pdf\",\n    # Unstructured first finds embedded image blocks\n    extract_images_in_pdf=False,\n    # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n    # Titles are any sub-section of the document\n    infer_table_structure=True,\n    # Post processing to aggregate text once we have the title\n    chunking_strategy=\"by_title\",\n    # Chunking params to aggregate text blocks\n    # Attempt to create a new chunk at 3800 chars\n    # Attempt to keep chunks > 2000 chars\n    max_characters=4000,\n    new_after_n_chars=3800,\n    combine_text_under_n_chars=2000,\n    image_output_dir_path='.',\n)\nPYTHON\n# extract table and textual objects from parser\nclass Element(BaseModel):\n    type: str\n    text: Any\n\n# Categorize by type\ncategorized_elements = []\nfor element in raw_pdf_elements:\n    if \"unstructured.documents.elements.Table\" in str(type(element)):\n        categorized_elements.append(Element(type=\"table\", text=str(element)))\n    elif \"unstructured.documents.elements.CompositeElement\" in str(type(element)):\n        categorized_elements.append(Element(type=\"text\", text=str(element)))\n\n# Tables\ntable_elements = [e for e in categorized_elements if e.type == \"table\"]\nprint(len(table_elements))\n\n# Text\ntext_elements = [e for e in categorized_elements if e.type == \"text\"]\nprint(len(text_elements))\nOutput\n14\n24\nVector Store Setup\nThere are many options for setting up a vector store. Here, we show how to do so using Chroma and LangChain\u2019s multi-vector retrieval.\nAs the name implies, multi-vector retrieval allows us to store multiple vectors per document; for instance, for a single document chunk, one could keep embeddings for both the chunk itself, and a summary of that document. 
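Before wiring this up with Chroma, the core idea of multi-vector retrieval can be sketched without any retrieval libraries. The following is a toy, dependency-free illustration (made-up documents, and a bag-of-words set as a stand-in for a real embedding model), not part of the cookbook's pipeline: the index stores summaries, but retrieval returns the full parent chunk.

```python
def embed(text):
    # Stand-in "embedding": a bag-of-words set. A real pipeline would call an
    # embedding model (e.g. Cohere embeddings) here instead.
    return set(text.lower().split())

def similarity(a, b):
    # Jaccard overlap between two bag-of-words sets.
    union = a | b
    return len(a & b) / len(union) if union else 0.0

# Parent docstore: doc_id -> full chunk text (made-up examples)
docstore = {
    "doc_0": "FY2022 Charges for Services were $5,266 million; FY2023 were $5,769 million.",
    "doc_1": "Operating grants, real estate taxes, and sales taxes are covered elsewhere.",
}

# The vector store indexes the *summaries*, keyed by the same doc_id
summary_index = {
    "doc_0": embed("table of charges for services by fiscal year"),
    "doc_1": embed("summary of other revenue categories"),
}

def retrieve(query, k=1):
    q = embed(query)
    ranked = sorted(summary_index, key=lambda d: similarity(q, summary_index[d]), reverse=True)
    # Return the full parent documents, not the summaries that matched
    return [docstore[d] for d in ranked[:k]]

print(retrieve("charges for services")[0])
```

The summary is what gets matched, but the full chunk is what reaches the generator, which is exactly the split the Chroma/MultiVectorRetriever setup below implements.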
A summary may be able to distill more accurately what a chunk is about, leading to better retrieval.\nYou can read more about this here: https://python.langchain.com/docs/how_to/multi_vector/\nBelow, we demonstrate the following process:\n\nsummaries of each chunk are embedded\nduring inference, the multi-vector retrieval returns the full context document related to the summary\n\nPYTHON\nco = cohere.Client()\n\ndef get_chat_output(message, preamble, chat_history, model, temp, documents=None):\n    return co.chat(\n        message=message,\n        preamble=preamble,\n        chat_history=chat_history,\n        documents=documents,\n        model=model,\n        temperature=temp\n    ).text\n\ndef parallel_proc_chat(prompts, preamble, chat_history=None, model='command-a-03-2025', temp=0.1, n_jobs=10):\n    \"\"\"Parallel processing of chat endpoint calls.\"\"\"\n    responses = Parallel(n_jobs=n_jobs, prefer=\"threads\")(delayed(get_chat_output)(prompt, preamble, chat_history, model, temp) for prompt in prompts)\n    return responses\n\ndef rerank_cohere(query, returned_documents, model: str = \"rerank-multilingual-v3.0\", top_n: int = 3):\n    response = co.rerank(\n        query=query,\n        documents=returned_documents,\n        top_n=top_n,\n        model=model,\n        return_documents=True\n    )\n    top_chunks_after_rerank = [result.document.text for result in response.results]\n    return top_chunks_after_rerank\nPYTHON\n# generate table and text summaries\nprompt_text = \"\"\"You are an assistant tasked with summarizing tables and text. \\\nGive a concise summary of the table or text. Table or text chunk: {element}. Only provide the summary and no other text.\"\"\"\n\ntable_prompts = [prompt_text.format(element=i.text) for i in table_elements]\ntable_summaries = parallel_proc_chat(table_prompts, None)\ntext_prompts = [prompt_text.format(element=i.text) for i in text_elements]\ntext_summaries = parallel_proc_chat(text_prompts, None)\ntables = [i.text for i in table_elements]\ntexts = [i.text for i in text_elements]\nPYTHON\n# The vectorstore to use to index the child chunks\nvectorstore = Chroma(collection_name=\"summaries\", embedding_function=CohereEmbeddings())\n# The storage layer for the parent documents\nstore = InMemoryStore()\nid_key = \"doc_id\"\n# The retriever (empty to start)\nretriever = MultiVectorRetriever(\n    vectorstore=vectorstore,\n    docstore=store,\n    id_key=id_key,\n)\n# Add texts\ndoc_ids = [str(uuid.uuid4()) for _ in texts]\nsummary_texts = [\n    Document(page_content=s, metadata={id_key: doc_ids[i]})\n    for i, s in enumerate(text_summaries)\n]\nretriever.vectorstore.add_documents(summary_texts)\nretriever.docstore.mset(list(zip(doc_ids, texts)))\n# Add tables\ntable_ids = [str(uuid.uuid4()) for _ in tables]\nsummary_tables = [\n    Document(page_content=s, metadata={id_key: table_ids[i]})\n    for i, s in enumerate(table_summaries)\n]\nretriever.vectorstore.add_documents(summary_tables)\nretriever.docstore.mset(list(zip(table_ids, tables)))\nRAG Pipeline\nWith our database in place, we can run queries against it. 
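Schematically, the query flow we are about to implement is: augment the query, retrieve and rerank candidates for each augmented query, then concatenate the shortlist as context for generation. The following toy, dependency-free sketch (stand-in word-overlap scoring and made-up documents, not the cookbook's actual Cohere/Chroma calls) shows that shape end to end:

```python
def overlap(query, doc):
    # Crude relevance score: number of words shared between query and document.
    return len(set(query.lower().split()) & set(doc.lower().split()))

def augment_query(query):
    # Stand-in for LLM-generated search queries: simple reformulations.
    return [query, f"{query} by fiscal year"]

def retrieve(query, corpus, k=2):
    # Stand-in retriever: top-k documents by word overlap.
    return sorted(corpus, key=lambda d: overlap(query, d), reverse=True)[:k]

def rerank(query, docs, top_n=1):
    # Stand-in reranker: keep the best of the retrieved candidates.
    return sorted(docs, key=lambda d: overlap(query, d), reverse=True)[:top_n]

corpus = [
    "charges for services were 5266 million in fiscal year 2022",
    "real estate taxes rose in fiscal year 2023",
]

shortlisted = []
for q in augment_query("charges for services"):
    shortlisted.extend(rerank(q, retrieve(q, corpus)))

# Deduplicated, concatenated context that would be passed to the generation model
context = "\n".join(dict.fromkeys(shortlisted))
print(context)
```

In the real pipeline below, `augment_query` becomes a Cohere chat call with `search_queries_only=True`, `retrieve` is the multi-vector retriever, and `rerank` is the Cohere rerank endpoint.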
The query process can be broken down into the following steps:\n\naugment the query, which really helps retrieve all the relevant information\nuse each augmented query to retrieve the top k docs and then rerank them\nconcatenate all the shortlisted/reranked docs and pass them to the generation model\n\nPYTHON\ndef process_query(query, retriever):\n    \"\"\"Runs query augmentation, retrieval, rerank and final generation in one call.\"\"\"\n    augmented_queries = co.chat(message=query, model='command-a-03-2025', temperature=0.2, search_queries_only=True)\n    # augment queries\n    if augmented_queries.search_queries:\n        reranked_docs = []\n        for itm in augmented_queries.search_queries:\n            docs = retriever.invoke(itm.text)\n            temp_rerank = rerank_cohere(itm.text, docs)\n            reranked_docs.extend(temp_rerank)\n        documents = [{\"title\": f\"chunk {i}\", \"snippet\": reranked_docs[i]} for i in range(len(reranked_docs))]\n    else:\n        # no queries will be run through RAG\n        documents = None\n\n    preamble = \"\"\"\n## Task & Context\nYou help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.\n\n## Style Guide\nUnless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.\n\"\"\"\n    model = 'command-a-03-2025'\n    temp = 0.2\n\n    response = co.chat(\n        message=query,\n        documents=documents,\n        preamble=preamble,\n        model=model,\n        temperature=temp\n    )\n\n    final_answer_docs = \"\"\"The final answer is from the documents below:\n\n    {docs}\"\"\".format(docs=str(response.documents))\n\n    final_answer = response.text\n    return final_answer, final_answer_docs\nExample\nWe can now test out a query. 
In this example, the final answer can be found on page 12 of the PDF, which aligns with the response provided by the model:\nPYTHON\nquery = \"what are the charges for services in 2022\"\nfinal_answer, final_answer_docs = process_query(query, retriever)\nprint(final_answer)\nprint(final_answer_docs)\n\nchat_history = [{'role': \"USER\", 'message': query}, {'role': \"CHATBOT\", 'message': f'The final answer is: {final_answer}.' + final_answer_docs}]\nOutput\nThe charges for services in 2022 were $5,266 million.\nThe final answer is from the documents below: [{'id': 'doc_0', 'snippet': 'Program and General Revenues FY 2023 FY 2022 FY 2021 Category (in millions) Charges for Services (CS) $5,769 $5,266 $5,669 Operating Grants and Contributions (OGC) 27,935 31,757 28,109 Capital Grants and Contributions (CGC) 657 656 675 Real Estate Taxes (RET) 31,502 29,507 31,421 Sales and Use Taxes (SUT) 10,577 10,106 7,614 Personal Income Taxes (PIT) 15,313 15,520 15,795 Income Taxes, Other (ITO) 13,181 9,521 9,499 Other Taxes* (OT) 3,680 3,777 2,755 Investment Income* (II) 694 151 226 Unrestricted Federal and State Aid (UFSA) 234 549 108 Other* (O) Total Program and General Revenues - Primary Government 2,305 $110,250 $107,535 $104,176 708 725', 'title': 'chunk 0'}]\nChat History Management\nIn the example below, we ask a follow-up question that relies on the chat history, but does not require a rerun of the RAG pipeline.\nWe detect questions that do not require RAG by examining the search_queries object returned by calling co.chat to generate candidate queries to answer our question. If this object is empty, then the model has determined that a document query is not needed to answer the question.\nIn the example below, the else statement is invoked based on query2. 
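As an aside, arithmetic follow-ups like "divide this by two" can also be resolved deterministically rather than trusting the model's math. This toy, dependency-free sketch (a hypothetical helper, not part of the cookbook) pulls the most recent dollar figure out of the chat history and does the division in Python:

```python
import re

# Chat history in the same shape the cookbook uses (example values)
chat_history = [
    {"role": "USER", "message": "what are the charges for services in 2022"},
    {"role": "CHATBOT", "message": "The charges for services in 2022 were $5,266 million."},
]

def last_dollar_amount(history):
    # Scan the history backwards for the most recent dollar figure, e.g. "$5,266"
    for turn in reversed(history):
        match = re.search(r"\$([\d,]+(?:\.\d+)?)", turn["message"])
        if match:
            return float(match.group(1).replace(",", ""))
    return None

# "divide this by two" resolved against the prior answer
print(last_dollar_amount(chat_history) / 2)  # → 2633.0
```

This is the same motivation behind pairing the RAG tool with a Python interpreter tool later in the notebook: postprocessing such as arithmetic is more reliable in code than in free-form generation.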
We still pass in the chat history, allowing the question to be answered with only the prior context.\nPYTHON\nquery2 = 'divide this by two'\naugmented_queries = co.chat(message=query2, model='command-a-03-2025', temperature=0.2, search_queries_only=True)\nif augmented_queries.search_queries:\n    print('RAG is needed')\n    final_answer, final_answer_docs = process_query(query, retriever)\n    print(final_answer)\nelse:\n    print('RAG is not needed')\n    response = co.chat(\n        message=query2,\n        model='command-a-03-2025',\n        chat_history=chat_history,\n        temperature=0.3\n    )\n\n    print(\"Final answer:\")\n    print(response.text)\nOutput\nRAG is not needed\nFinal answer:\nThe result of dividing the charges for services in 2022 by two is $2,633.\n\nRAG Pipeline Class\nHere, we connect all of the pieces discussed above into one class object, which is then used as a tool for a Cohere ReAct agent. This class definition consolidates and clarifies the key parameters used to define the RAG pipeline.\nPYTHON\nco = cohere.Client()\nPYTHON\nclass Element(BaseModel):\n    type: str\n    text: Any\n\nclass RAG_pipeline():\n    def __init__(self, paths):\n        self.embedding_model = \"embed-v4.0\"\n        self.generation_model = \"command-a-03-2025\"\n        self.summary_model = \"command-a-03-2025\"\n        self.rerank_model = \"rerank-multilingual-v3.0\"\n        self.num_docs_to_retrieve = 10\n        self.top_k_rerank = 3\n        self.temperature = 0.2\n        self.preamble = \"\"\"\n## Task & Context\nYou help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.\n\n## Style Guide\nUnless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.\n\"\"\"\n        self.n_jobs = 10  # number of parallel processes to run summarization of chunks\n        self.extract_images_in_pdf = False\n        self.infer_table_structure = True\n        self.chunking_strategy = \"by_title\"\n        self.max_characters = 4000\n        self.new_after_n_chars = 3800\n        self.combine_text_under_n_chars = 2000\n        self.image_output_dir_path = '.'\n        self.paths = paths\n        self.parse_and_build_retriever()\n\n    def parse_and_build_retriever(self):\n        # step 1: parse pdfs\n        parsed_pdf_list = self.parse_pdfs(self.paths)\n        # separate tables and text\n        extracted_tables, extracted_text = self.extract_text_and_tables(parsed_pdf_list)\n        # generate summaries for everything\n        tables, table_summaries, texts, text_summaries = self.generate_summaries(extracted_tables, extracted_text)\n        self.tables = tables\n        self.table_summaries = table_summaries\n        self.texts = texts\n        self.text_summaries = text_summaries\n        # set up the multi-vector retriever\n        self.make_retriever(tables, table_summaries, texts, text_summaries)\n\n    def extract_text_and_tables(self, parsed_pdf_list):\n        # extract table and textual objects from parser\n        # Categorize by type\n        all_table_elements = []\n        all_text_elements = []\n        for raw_pdf_elements in parsed_pdf_list:\n            categorized_elements = []\n            for element in raw_pdf_elements:\n                if \"unstructured.documents.elements.Table\" in str(type(element)):\n                    categorized_elements.append(Element(type=\"table\", text=str(element)))\n                elif \"unstructured.documents.elements.CompositeElement\" in str(type(element)):\n                    categorized_elements.append(Element(type=\"text\", text=str(element)))\n\n            # Tables\n            table_elements = [e for e in categorized_elements if e.type == \"table\"]\n            print(len(table_elements))\n\n            # Text\n            text_elements = [e for e in categorized_elements if e.type == \"text\"]\n            print(len(text_elements))\n            all_table_elements.extend(table_elements)\n            all_text_elements.extend(text_elements)\n\n        return all_table_elements, all_text_elements\n\n    def parse_pdfs(self, paths):\n        path_raw_elements = []\n        for path in paths:\n            raw_pdf_elements = partition_pdf(\n                filename=path,\n                # Unstructured first finds embedded image blocks\n                extract_images_in_pdf=self.extract_images_in_pdf,\n                # Use layout model (YOLOX) to get bounding boxes (for tables) and find titles\n                # Titles are any sub-section of the document\n                infer_table_structure=self.infer_table_structure,\n                # Post processing to aggregate text once we have the title\n                chunking_strategy=self.chunking_strategy,\n                # Chunking params to aggregate text blocks\n                # Attempt to create a new chunk at 3800 chars\n                # Attempt to keep chunks > 2000 chars\n                max_characters=self.max_characters,\n                new_after_n_chars=self.new_after_n_chars,\n                combine_text_under_n_chars=self.combine_text_under_n_chars,\n                image_output_dir_path=self.image_output_dir_path,\n            )\n            path_raw_elements.append(raw_pdf_elements)\n        print('PDFs parsed')\n        return path_raw_elements\n\n    def get_chat_output(self, message, preamble, model, temp):\n        response = co.chat(\n            message=message,\n            preamble=preamble,\n            model=model,\n            temperature=temp\n        ).text\n        return response\n\n    def parallel_proc_chat(self, prompts, preamble, model, temp, n_jobs):\n        \"\"\"Parallel processing of chat endpoint calls.\"\"\"\n        responses = Parallel(n_jobs=n_jobs, prefer=\"threads\")(delayed(self.get_chat_output)(prompt, preamble, model, temp) for prompt in prompts)\n        return responses\n\n    def rerank_cohere(self, query, returned_documents, model, top_n):\n        response = co.rerank(\n            query=query,\n            documents=returned_documents,\n            top_n=top_n,\n            model=model,\n            return_documents=True\n        )\n        top_chunks_after_rerank = [result.document.text for result in response.results]\n        return top_chunks_after_rerank\n\n    def generate_summaries(self, table_elements, text_elements):\n        # generate table and text summaries\n        summarize_prompt = \"\"\"You are an assistant tasked with summarizing tables and text. \\\nGive a concise summary of the table or text. Table or text chunk: {element}. Only provide the summary and no other text.\"\"\"\n\n        table_prompts = [summarize_prompt.format(element=i.text) for i in table_elements]\n        table_summaries = self.parallel_proc_chat(table_prompts, self.preamble, self.summary_model, self.temperature, self.n_jobs)\n        text_prompts = [summarize_prompt.format(element=i.text) for i in text_elements]\n        text_summaries = self.parallel_proc_chat(text_prompts, self.preamble, self.summary_model, self.temperature, self.n_jobs)\n        tables = [i.text for i in table_elements]\n        texts = [i.text for i in text_elements]\n        print('summaries generated')\n        return tables, table_summaries, texts, text_summaries\n\n    def make_retriever(self, tables, table_summaries, texts, text_summaries):\n        # The vectorstore to use to index the child chunks\n        vectorstore = Chroma(collection_name=\"summaries\", embedding_function=CohereEmbeddings())\n        # The storage layer for the parent documents\n        store = InMemoryStore()\n        id_key = \"doc_id\"\n        # The retriever (empty to start)\n        retriever = MultiVectorRetriever(\n            vectorstore=vectorstore,\n            docstore=store,\n            id_key=id_key,\n            search_kwargs={\"k\": self.num_docs_to_retrieve}\n        )\n        # Add texts\n        doc_ids = [f'text_{i}' for i in range(len(texts))]\n        summary_texts = [\n            Document(page_content=s, metadata={id_key: doc_ids[i]})\n            for i, s in enumerate(text_summaries)\n        ]\n        retriever.vectorstore.add_documents(summary_texts, ids=doc_ids)\n        retriever.docstore.mset(list(zip(doc_ids, texts)))\n        # Add tables\n        table_ids = [f'table_{i}' for i in range(len(tables))]  # fixed: original used range(len(texts))\n        summary_tables = [\n            Document(page_content=s, metadata={id_key: table_ids[i]})\n            for i, s in enumerate(table_summaries)\n        ]\n        retriever.vectorstore.add_documents(summary_tables, ids=table_ids)\n        retriever.docstore.mset(list(zip(table_ids, tables)))\n        self.retriever = retriever\n        print('retriever built')\n\n    def process_query(self, query):\n        \"\"\"Runs query augmentation, retrieval, rerank and generation in one call.\"\"\"\n        augmented_queries = co.chat(message=query, model=self.generation_model, temperature=self.temperature, search_queries_only=True)\n        # augment queries\n        if augmented_queries.search_queries:\n            reranked_docs = []\n            for itm in augmented_queries.search_queries:\n                docs = self.retriever.invoke(itm.text)\n                temp_rerank = self.rerank_cohere(itm.text, docs, model=self.rerank_model, top_n=self.top_k_rerank)\n                reranked_docs.extend(temp_rerank)\n            documents = [{\"title\": f\"chunk {i}\", \"snippet\": reranked_docs[i]} for i in range(len(reranked_docs))]\n        else:\n            documents = None\n\n        response = co.chat(\n            message=query,\n            documents=documents,\n            preamble=self.preamble,\n            model=self.generation_model,\n            temperature=self.temperature\n        )\n\n        final_answer_docs = \"\"\"The final answer is from the documents below:\n\n        {docs}\"\"\".format(docs=str(response.documents))\n\n        final_answer = response.text\n        return final_answer, final_answer_docs\nPYTHON\nrag_object = RAG_pipeline(paths=[\"city_ny_popular_fin_report.pdf\"])\nThis function will be deprecated in a future release and unstructured will simply use the DEFAULT_MODEL from unstructured_inference.model.base to set default model name\nOutput\nPDFs parsed\n14\n24\nsummaries generated\nretriever built\nCohere ReAct Agent with RAG Tool\nFinally, we build a simple agent that utilizes the RAG pipeline defined above. 
We do this by granting the agent access to two tools:\n\nthe end-to-end RAG pipeline\na Python interpreter\n\nThe intention behind coupling these tools is to enable the model to perform mathematical and other postprocessing operations on RAG outputs using Python.\nPYTHON1from langchain.agents import Tool2from langchain_experimental.utilities import PythonREPL3from langchain.agents import AgentExecutor4from langchain_cohere.react_multi_hop.agent import create_cohere_react_agent5from langchain_core.prompts import ChatPromptTemplate6from langchain_cohere.chat_models import ChatCohere7from langchain.tools.retriever import create_retriever_tool8from langchain_core.pydantic_v1 import BaseModel, Field9from langchain_core.tools import tool1011class react_agent():12 def __init__(self,rag_retriever,model=\"command-a-03-2025\",temperature=0.2):13 self.llm = ChatCohere(model=model, temperature=temperature)14 self.preamble=\"\"\"15## Task & Context16You help people answer their questions and other requests interactively. You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. 
You should focus on serving the user's needs as best you can, which will be wide-ranging.\n\n## Style Guide\nUnless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.\n\n## Guidelines\nYou are an expert who answers the user's question.\nYou have access to a vectorsearch tool that will use your query to search through documents and find the relevant answer.\nYou also have access to a python interpreter tool which you can use to run code for mathematical operations.\n\"\"\"\n        self.get_tools(rag_retriever)\n        self.build_agent()\n\n    def get_tools(self, rag_retriever):\n        @tool\n        def vectorsearch(query: str):\n            \"\"\"Uses the query to search through a list of documents and return the most relevant documents as well as the answer.\"\"\"\n            final_answer, final_answer_docs = rag_retriever.process_query(query)\n            return final_answer + final_answer_docs\n\n        vectorsearch.name = \"vectorsearch\"  # use python case\n        vectorsearch.description = \"Uses the query to search through a list of documents and return the most relevant documents as well as the answer.\"\n\n        class vectorsearch_inputs(BaseModel):\n            query: str = Field(description=\"the user's query\")\n\n        vectorsearch.args_schema = vectorsearch_inputs\n\n        python_repl = PythonREPL()\n        python_tool = Tool(\n            name=\"python_repl\",\n            description=\"Executes python code and returns the result. 
The code runs in a static sandbox without interactive mode, so print output or save output to a file.\",\n            func=python_repl.run,\n        )\n        python_tool.name = \"python_interpreter\"\n\n        class ToolInput(BaseModel):\n            code: str = Field(description=\"Python code to execute.\")\n\n        python_tool.args_schema = ToolInput\n\n        self.alltools = [vectorsearch, python_tool]\n\n    def build_agent(self):\n        # Prompt template\n        prompt = ChatPromptTemplate.from_template(\"{input}\")\n        # Create the ReAct agent\n        agent = create_cohere_react_agent(\n            llm=self.llm,\n            tools=self.alltools,\n            prompt=prompt,\n        )\n        self.agent_executor = AgentExecutor(agent=agent, tools=self.alltools, verbose=True, return_intermediate_steps=True)\n\n    def run_agent(self, query, history=None):\n        if history:\n            response = self.agent_executor.invoke({\n                \"input\": query,\n                \"preamble\": self.preamble,\n                \"chat_history\": history\n            })\n        else:\n            response = self.agent_executor.invoke({\n                \"input\": query,\n                \"preamble\": self.preamble,\n            })\n        return response\nPYTHON\nagent_object = react_agent(rag_retriever=rag_object)\nPYTHON\nstep1_response = agent_object.run_agent(\"what are the charges for services in 2022 and 2023\")\nOutput\n> Entering new AgentExecutor chain...\nI will search for the charges for services in 2022 and 2023.\n{'tool_name': 'vectorsearch', 'parameters': {'query': 'charges for services in 2022 and 2023'}}\nThe charges for services in 2022 were $5,266 million and in 2023 were $5,769 million.\nThe final answer is from the documents below: [{'id': 'doc_0', 'snippet': 'Program and General Revenues FY 2023 FY 2022 FY 2021 Category (in millions) Charges for Services (CS) $5,769 $5,266 $5,669 Operating Grants and Contributions (OGC) 27,935 31,757 28,109 Capital Grants and Contributions (CGC) 657 656 675 Real Estate Taxes (RET) 31,502 29,507 31,421 Sales and Use Taxes (SUT) 10,577 10,106 7,614 Personal Income Taxes (PIT) 15,313 15,520 15,795 Income Taxes, Other (ITO) 
13,181 9,521 9,499 Other Taxes* (OT) 3,680 3,777 2,755 Investment Income* (II) 694 151 226 Unrestricted Federal and State Aid (UFSA) 234 549 108 Other* (O) Total Program and General Revenues - Primary Government 2,305 $110,250 $107,535 $104,176 708 725', 'title': 'chunk 0'}]\nRelevant Documents: 0\nCited Documents: 0\nAnswer: The charges for services in 2022 were $5,266 million and in 2023 were $5,769 million.\nGrounded answer: The charges for services in <co: 0=\"\">2022</co:> were <co: 0=\"\">$5,266 million</co:> and in <co: 0=\"\">2023</co:> were <co: 0=\"\">$5,769 million</co:>.\n> Finished chain.\nJust like earlier, we can also pass chat history to the LangChain agent to refer to for any other queries.\nPYTHON\nfrom langchain_core.messages import HumanMessage, AIMessage\nPYTHON\nchat_history = [\n    HumanMessage(content=step1_response['input']),\n    AIMessage(content=step1_response['output'])\n]\nPYTHON\nagent_object.run_agent(\"what is the mean of the two values\", history=chat_history)\nOutput\n> Entering new AgentExecutor chain...\nPython REPL can execute arbitrary code. Use with caution.\nI will use the Python Interpreter tool to calculate the mean of the two values.\n{'tool_name': 'python_interpreter', 'parameters': {'code': 'import numpy as np\\n\\n# Data\\nvalues = [5266, 5769]\\n\\n# Calculate the mean\\nmean_value = np.mean(values)\\n\\nprint(f\"The mean of the two values is: {mean_value:.0f} million\")'}}\nThe mean of the two values is: 5518 million\nRelevant Documents: 0\nCited Documents: 0\nAnswer: The mean of the two values is 5518 million.\nGrounded answer: The mean of the two values is <co: 0=\"\">5518 million</co:>.\n> Finished chain.\nOutput\n{'input': 'what is the mean of the two values',\n'preamble': \"\\n## Task & Context\\nYou help people answer their questions and other requests interactively. 
You will be asked a very wide array of requests on all kinds of topics. You will be equipped with a wide range of search engines or similar tools to help you, which you use to research your answer. You should focus on serving the user's needs as best you can, which will be wide-ranging.\\n\\n## Style Guide\\nUnless the user asks for a different style of answer, you should answer in full sentences, using proper grammar and spelling.\\n\\n## Guidelines\\nYou are an expert who answers the user's question. \\nYou have access to a vectorsearch tool that will use your query to search through documents and find the relevant answer.\\nYou also have access to a python interpreter tool which you can use to run code for mathematical operations.\\n\",\n'chat_history': [HumanMessage(content='what are the charges for services in 2022 and 2023'),\nAIMessage(content='The charges for services in 2022 were $5,266 million and in 2023 were $5,769 million.')],\n'output': 'The mean of the two values is 5518 million.',\n'citations': [CohereCitation(start=30, end=42, text='5518 million', documents=[{'output': 'The mean of the two values is: 5518 million\\n'}])],\n'intermediate_steps': [(AgentActionMessageLog(tool='python_interpreter', tool_input={'code': 'import numpy as np\\n\\n# Data\\nvalues = [5266, 5769]\\n\\n# Calculate the mean\\nmean_value = np.mean(values)\\n\\nprint(f\"The mean of the two values is: {mean_value:.0f} million\")'}, log='\\nI will use the Python Interpreter tool to calculate the mean of the two values.\\n{\\'tool_name\\': \\'python_interpreter\\', \\'parameters\\': {\\'code\\': \\'import numpy as np\\\\n\\\\n# Data\\\\nvalues = [5266, 5769]\\\\n\\\\n# Calculate the mean\\\\nmean_value = np.mean(values)\\\\n\\\\nprint(f\"The mean of the two values is: {mean_value:.0f} million\")\\'}}\\n', message_log=[AIMessage(content='\\nPlan: I will use the Python Interpreter tool to calculate the mean of the two values.\\nAction: ```json\\n[\\n {\\n \"tool_name\": 
\"python_interpreter\",\\n \"parameters\": {\\n \"code\": \"import numpy as np\\\\n\\\\n# Data\\\\nvalues = [5266, 5769]\\\\n\\\\n# Calculate the mean\\\\nmean_value = np.mean(values)\\\\n\\\\nprint(f\\\\\"The mean of the two values is: {mean_value:.0f} million\\\\\")\"\\n }\\n }\\n]\\n```')]),8'The mean of the two values is: 5518 million\\n')]}\nConclusion\nAs you can see, the RAG pipeline can be used as a tool for a Cohere ReAct agent. This allows the agent to access the RAG pipeline for document retrieval and generation, as well as a Python interpreter for postprocessing mathematical operations to improve accuracy. This setup can be used to improve the accuracy of grounded answers to questions about documents that contain both tables and text.Was this page helpful?YesNoEdit this pagePreviousAnalysis of Form 10-K/10-Q Using Cohere and RAGNextBuild withDocsv2 APIv2 APIDASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG IN"}, {"id": "f7b5bfbb-dc9f-4467-8338-3166f34e42d6", "doc": "Routing Queries to Data Sources | CohereDASHBOARDPLAYGROUNDDOCSCOMMUNITYLOG INGuides and conceptsAPI ReferenceRelease NotesLLMUCookbooksSearch/Ask AIGuides and conceptsAPI ReferenceRelease NotesLLMUCookbooksGet StartedIntroductionInstallationCreating a clientQuickstartPlaygroundFAQsModelsAn Overview of Cohere's ModelsCommandEmbedRerankAyaText GenerationIntroduction to Text Generation at CohereUsing the Chat APIReasoningImage InputsStreaming ResponsesStructured OutputsPredictable OutputsAdvanced Generation ParametersRetrieval Augmented Generation (RAG)Tool UseTokens and TokenizersSummarizing TextSafety ModesEmbeddings (Vectors, Search, Retrieval)Introduction to Embeddings at CohereSemantic Search with EmbeddingsMultimodal EmbeddingsBatch Embedding JobsRerankingGoing to ProductionAPI Keys and Rate LimitsGoing LiveDeprecationsHow Does Cohere's Pricing Work?IntegrationsIntegrating Embedding Models with Other ToolsCohere and LangChainLlamaIndex and CohereDeployment OptionsOverviewSDK CompatibilityPrivate 
Imagine a RAG system that can search over diverse sources, such as a website, a database, and a set of documents.\nIn a standard RAG setting, the application would aggregate retrieved documents from all the different sources it is connected to. This may contribute to noise from less relevant documents.\nAdditionally, it doesn\u2019t take into consideration that, given a data source\u2019s nature, it might be less or more relevant to a query than the other data sources.\nAn agentic RAG system can solve this problem by routing queries to the most relevant tools based on the query\u2019s nature. This is done by leveraging the tool use capabilities of the Chat endpoint.\nIn this tutorial, we\u2019ll cover:\n\nSetting up the tools\nRunning an agentic RAG workflow\nRouting queries to tools\n\nWe\u2019ll build an agent that can answer questions about using Cohere, equipped with a number of different tools.\nSetup\nTo get started, first we need to install the cohere library and create a Cohere client.\nWe also need to import the tool definitions that we\u2019ll use in this tutorial.\n Important: the source code for tool definitions can be found here. Make sure to have the tool_def.py file in the same directory as this notebook for the imports to work correctly. \nPYTHON\n! 
pip install cohere langchain langchain-community pydantic -qq\nPYTHON\nimport json\nimport os\nimport cohere\n\nfrom tool_def import (\n    search_developer_docs,\n    search_developer_docs_tool,\n    search_internet,\n    search_internet_tool,\n    search_code_examples,\n    search_code_examples_tool,\n)\n\nco = cohere.ClientV2(\n    \"COHERE_API_KEY\"\n)  # Get your free API key: https://dashboard.cohere.com/api-keys\n\nos.environ[\"TAVILY_API_KEY\"] = (\n    \"TAVILY_API_KEY\"  # We'll need the Tavily API key to perform internet search. Get your API key: https://app.tavily.com/home\n)\nSetting up the tools\nIn an agentic RAG system, each data source is represented as a tool. A tool is broadly any function or service that can receive and send objects to the LLM. But in the case of RAG, this becomes a more specific case of a tool that takes a query as input and returns a set of documents.\nHere, we are defining a Python function for each tool.\n\nsearch_developer_docs: Searches Cohere developer documentation. Here we are creating a small list of sample documents for simplicity and will return the same list for every query. In practice, you will want to implement a search function such as those that use semantic search.\nsearch_internet: Performs an internet search using Tavily search, which we take from LangChain\u2019s ready implementation.\nsearch_code_examples: Searches for Cohere code examples and tutorials. 
Here we are also creating a small list of sample documents for simplicity.\n\nThese functions are mapped to a dictionary called functions_map for easy access.\nFurther reading:\n\nDocumentation on parameter types in tool use\n\nPYTHON\nfunctions_map = {\n    \"search_developer_docs\": search_developer_docs,\n    \"search_internet\": search_internet,\n    \"search_code_examples\": search_code_examples,\n}\nThe second and final setup step is to define the tool schemas in a format that can be passed to the Chat endpoint. A tool schema must contain the following fields: name, description, and parameters in the format shown below.\nThis schema informs the LLM about what the tool does, which enables an LLM to decide whether to use a particular tool. Therefore, the more descriptive and specific the schema, the more likely the LLM will make the right tool call decisions.\nRunning an agentic RAG workflow\nWe can now run an agentic RAG workflow using a tool use approach. We can think of the system as consisting of four components:\n\nThe user\nThe application\nThe LLM\nThe tools\n\nAt its most basic, these four components interact in a workflow through four steps:\n\nStep 1: Get user message \u2013 The LLM gets the user message (via the application)\nStep 2: Tool planning and calling \u2013 The LLM makes a decision on the tools to call (if any) and generates the tool calls\nStep 3: Tool execution \u2013 The application executes the tools and sends the results to the LLM\nStep 4: Response and citation generation \u2013 The LLM generates the response and citations back to the user\n\nWe wrap all these steps in a function called run_agent.\nPYTHON\ntools = [\n    search_developer_docs_tool,\n    search_internet_tool,\n    search_code_examples_tool,\n]\nPYTHON\nsystem_message = \"\"\"## Task and Context\nYou are an assistant who helps developers use Cohere. You are equipped with a number of tools that can provide different types of information. 
If you can't find the information you need from one tool, you should try other tools if there is a possibility that they could provide the information you need.\"\"\"\nPYTHON\nmodel = \"command-a-03-2025\"\n\n\ndef run_agent(query, messages=None):\n    if messages is None:\n        messages = []\n\n    if \"system\" not in {m.get(\"role\") for m in messages}:\n        messages.append({\"role\": \"system\", \"content\": system_message})\n\n    # Step 1: get user message\n    print(f\"QUESTION:\\n{query}\")\n    print(\"=\" * 50)\n\n    messages.append({\"role\": \"user\", \"content\": query})\n\n    # Step 2: Generate tool calls (if any)\n    response = co.chat(\n        model=model, messages=messages, tools=tools, temperature=0.3\n    )\n\n    while response.message.tool_calls:\n\n        print(\"TOOL PLAN:\")\n        print(response.message.tool_plan, \"\\n\")\n        print(\"TOOL CALLS:\")\n        for tc in response.message.tool_calls:\n            print(\n                f\"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}\"\n            )\n        print(\"=\" * 50)\n\n        messages.append(\n            {\n                \"role\": \"assistant\",\n                \"tool_calls\": response.message.tool_calls,\n                \"tool_plan\": response.message.tool_plan,\n            }\n        )\n\n        # Step 3: Get tool results\n        for tc in response.message.tool_calls:\n            tool_result = functions_map[tc.function.name](\n                **json.loads(tc.function.arguments)\n            )\n            tool_content = []\n            for data in tool_result:\n                tool_content.append(\n                    {\n                        \"type\": \"document\",\n                        \"document\": {\"data\": json.dumps(data)},\n                    }\n                )\n            # Optional: add an \"id\" field in the \"document\" object, otherwise IDs are auto-generated\n            messages.append(\n                {\n                    \"role\": \"tool\",\n                    \"tool_call_id\": tc.id,\n                    \"content\": tool_content,\n                }\n            )\n\n        # Step 4: Generate response and citations\n        response = co.chat(\n            model=model,\n            messages=messages,\n            tools=tools,\n            temperature=0.3,\n        )\n\n    messages.append(\n        {\n            \"role\": \"assistant\",\n            \"content\": response.message.content[0].text,\n        }\n    )\n\n    # Print final response\n    print(\"RESPONSE:\")\n    print(response.message.content[0].text)\n    print(\"=\" * 50)\n\n    # Print citations (if any)\n    verbose_source = (\n        False  # Change to True to display the contents of a source\n    )\n    if response.message.citations:\n        print(\"CITATIONS:\\n\")\n        for citation in response.message.citations:\n            print(\n                f\"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' \"\n            )\n            print(\"Sources:\")\n            for idx, source in enumerate(citation.sources):\n                print(f\"{idx+1}. {source.id}\")\n                if verbose_source:\n                    print(f\"{source.tool_output}\")\n            print(\"\\n\")\n\n    return messages\nRouting queries to tools\nLet\u2019s ask the agent a few questions, starting with this one about the Embed endpoint.\nBecause the question asks about a specific feature, the agent decides to use the search_developer_docs tool (instead of retrieving from all the data sources it\u2019s connected to).\nIt first generates a tool plan that describes how it will handle the query. Then, it generates tool calls to the search_developer_docs tool with the associated query parameter.\nThe tool does indeed contain the information asked by the user, which the agent then uses to generate its response.\nPYTHON\nmessages = run_agent(\"How many languages does Embed support?\")\n\nQUESTION:\nHow many languages does Embed support?\n==================================================\nTOOL PLAN:\nI will search the Cohere developer documentation for 'how many languages does Embed support'.\n\nTOOL CALLS:\nTool name: search_developer_docs | Parameters: {\"query\":\"how many languages does Embed support\"}\n==================================================\nRESPONSE:\nThe Embed endpoint supports over 100 languages.\n==================================================\nCITATIONS:\n\nStart: 28| End:47| Text:'over 100 languages.'\nSources:\n1. search_developer_docs_gwt5g55gjc3w:2\nLet\u2019s now ask the agent a question about setting up the Notion API so we can connect it to LLMs. 
This information is not likely to be found in the developer documentation or code examples because it is not Cohere-specific, so we can expect the agent to use the internet search tool.\nAnd this is exactly what the agent does. This time, it decides to use the search_internet tool, triggers the search through Tavily search, and uses the results to generate its response.\nPYTHON\nmessages = run_agent(\"How to set up the Notion API.\")\n\nQUESTION:\nHow to set up the Notion API.\n==================================================\nTOOL PLAN:\nI will search for 'Notion API setup' to find out how to set up the Notion API.\n\nTOOL CALLS:\nTool name: search_internet | Parameters: {\"query\":\"Notion API setup\"}\n==================================================\nRESPONSE:\nTo set up the Notion API, you need to create a new integration in Notion's integrations dashboard. You can do this by navigating to https://www.notion.com/my-integrations and clicking '+ New integration'.\n\nOnce you've done this, you'll need to get your API secret by visiting the Configuration tab. You should keep your API secret just that \u2013 a secret! You can refresh your secret if you accidentally expose it.\n\nNext, you'll need to give your integration page permissions. To do this, you'll need to pick or create a Notion page, then click on the ... More menu in the top-right corner of the page. Scroll down to + Add Connections, then search for your integration and select it. You'll then need to confirm the integration can access the page and all of its child pages.\n\nIf your API requests are failing, you should confirm you have given the integration permission to the page you are trying to update.\n\nYou can also create a Notion API integration and get your internal integration token. 
You'll then need to create a .env file and add environmental variables, get your Notion database ID and add your integration to your database.\n\nFor more information on what you can build with Notion's API, you can refer to this guide.\n==================================================\nCITATIONS:\n\nStart: 38| End:62| Text:'create a new integration'\nSources:\n1. search_internet_cwabyfc5mn8c:0\n2. search_internet_cwabyfc5mn8c:2\n\nStart: 75| End:98| Text:'integrations dashboard.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 132| End:170| Text:'https://www.notion.com/my-integrations'\nSources:\n1. search_internet_cwabyfc5mn8c:0\n\nStart: 184| End:203| Text:''+ New integration''\nSources:\n1. search_internet_cwabyfc5mn8c:0\n2. search_internet_cwabyfc5mn8c:2\n\nStart: 244| End:263| Text:'get your API secret'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 280| End:298| Text:'Configuration tab.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 310| End:351| Text:'keep your API secret just that \u2013 a secret'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 361| End:411| Text:'refresh your secret if you accidentally expose it.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 434| End:473| Text:'give your integration page permissions.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 501| End:529| Text:'pick or create a Notion page'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 536| End:599| Text:'click on the ... More menu in the top-right corner of the page.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 600| End:632| Text:'Scroll down to + Add Connections'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 639| End:681| Text:'search for your integration and select it.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 702| End:773| Text:'confirm the integration can access the page and all of its child pages.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 783| End:807| Text:'API requests are failing'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 820| End:907| Text:'confirm you have given the integration permission to the page you are trying to update.'\nSources:\n1. search_internet_cwabyfc5mn8c:2\n\nStart: 922| End:953| Text:'create a Notion API integration'\nSources:\n1. search_internet_cwabyfc5mn8c:1\n\nStart: 958| End:994| Text:'get your internal integration token.'\nSources:\n1. search_internet_cwabyfc5mn8c:1\n\nStart: 1015| End:1065| Text:'create a .env file and add environmental variables'\nSources:\n1. search_internet_cwabyfc5mn8c:1\n\nStart: 1067| End:1094| Text:'get your Notion database ID'\nSources:\n1. search_internet_cwabyfc5mn8c:1\n\nStart: 1099| End:1137| Text:'add your integration to your database.'\nSources:\n1. search_internet_cwabyfc5mn8c:1\n\nStart: 1223| End:1229| Text:'guide.'\nSources:\n1. search_internet_cwabyfc5mn8c:3\nLet\u2019s ask the agent a final question, this time about tutorials that are relevant for enterprises.\nAgain, the agent uses the context of the query to decide on the most relevant tool. In this case, it selects the search_code_examples tool and provides a response based on the information found.\nPYTHON\nmessages = run_agent(\n    \"Any tutorials that are relevant for enterprises?\"\n)\n\nQUESTION:\nAny tutorials that are relevant for enterprises?\n==================================================\nTOOL PLAN:\nI will search for 'enterprise tutorials' in the code examples and tutorials tool.\n\nTOOL CALLS:\nTool name: search_code_examples | Parameters: {\"query\":\"enterprise tutorials\"}\n==================================================\nRESPONSE:\nI found a tutorial called 'Advanced Document Parsing For Enterprises'.\n==================================================\nCITATIONS:\n\nStart: 26| End:69| Text:''Advanced Document Parsing For Enterprises''\nSources:\n1. 
search_code_examples_jhh40p32wxpw:4\nSummary\nIn this tutorial, we learned about:\n\nHow to set up tools in an agentic RAG system\nHow to run an agentic RAG workflow\nHow to automatically route queries to the most relevant data sources\n\nHowever, so far we have only seen rather simple queries. In practice, we may run into a complex query that needs to be simplified, optimized, or split before we can perform the retrieval.\nIn Part 2, we\u2019ll learn how to build an agentic RAG system that can expand user queries into parallel queries."}, {"id": "c745bd22-8486-47e8-ae97-a6691d7ef85b", "doc": "Generate Parallel Queries for Better RAG Retrieval | Cohere\n
Compare two user queries to a RAG chatbot, \u201cWhat was Apple\u2019s revenue in 2023?\u201d and \u201cWhat were Apple\u2019s and Google\u2019s revenue in 2023?\u201d.\nThe first query is straightforward as we can perform retrieval using pretty much the same query we get.\nBut the second query is more complex. We need to break it down into two separate queries, one for Apple and one for Google.\nThis is an example that requires query expansion. Here, the agentic RAG will need to transform the query into a more optimized set of queries it should use to perform the retrieval.\nIn this part, we\u2019ll learn how to create an agentic RAG system that can perform query expansion and then run those queries in parallel:\n\nQuery expansion\nQuery expansion over multiple data sources\nQuery expansion in multi-turn conversations\n\nWe\u2019ll learn these by building an agent that answers questions about using Cohere.\nSetup\nTo get started, first we need to install the cohere library and create a Cohere client.\nWe also need to import the tool definitions that we\u2019ll use in this tutorial.\n Important: the source code for tool definitions can be found here. Make sure to have the tool_def.py file in the same directory as this notebook for the imports to work correctly. \nPYTHON\n! 
pip install cohere langchain langchain-community pydantic -qq\nPYTHON\nimport json\nimport os\nimport cohere\n\nfrom tool_def import (\n    search_developer_docs,\n    search_developer_docs_tool,\n    search_internet,\n    search_internet_tool,\n    search_code_examples,\n    search_code_examples_tool,\n)\n\nco = cohere.ClientV2(\n    \"COHERE_API_KEY\"\n)  # Get your free API key: https://dashboard.cohere.com/api-keys\n\nos.environ[\"TAVILY_API_KEY\"] = (\n    \"TAVILY_API_KEY\"  # We'll need the Tavily API key to perform internet search. Get your API key: https://app.tavily.com/home\n)\nSetting up the tools\nWe set up the same set of tools as in Part 1. If you want further details on how to set up the tools, check out Part 1.\nPYTHON\nfunctions_map = {\n    \"search_developer_docs\": search_developer_docs,\n    \"search_internet\": search_internet,\n    \"search_code_examples\": search_code_examples,\n}\nRunning an agentic RAG workflow\nWe create a run_agent function to run the agentic RAG workflow, the same as in Part 1. If you want further details on how it works, check out Part 1.\nPYTHON\ntools = [\n    search_developer_docs_tool,\n    search_internet_tool,\n    search_code_examples_tool,\n]\nPYTHON\nsystem_message = \"\"\"## Task and Context\nYou are an assistant who helps developers use Cohere. You are equipped with a number of tools that can provide different types of information. 
If you can't find the information you need from one tool, you should try other tools if there is a possibility that they could provide the information you need.\"\"\"\nPYTHON\nmodel = \"command-a-03-2025\"\n\n\ndef run_agent(query, messages=None):\n    if messages is None:\n        messages = []\n\n    if \"system\" not in {m.get(\"role\") for m in messages}:\n        messages.append({\"role\": \"system\", \"content\": system_message})\n\n    # Step 1: get user message\n    print(f\"QUESTION:\\n{query}\")\n    print(\"=\" * 50)\n\n    messages.append({\"role\": \"user\", \"content\": query})\n\n    # Step 2: Generate tool calls (if any)\n    response = co.chat(\n        model=model, messages=messages, tools=tools, temperature=0.3\n    )\n\n    while response.message.tool_calls:\n\n        print(\"TOOL PLAN:\")\n        print(response.message.tool_plan, \"\\n\")\n        print(\"TOOL CALLS:\")\n        for tc in response.message.tool_calls:\n            print(\n                f\"Tool name: {tc.function.name} | Parameters: {tc.function.arguments}\"\n            )\n        print(\"=\" * 50)\n\n        messages.append(\n            {\n                \"role\": \"assistant\",\n                \"tool_calls\": response.message.tool_calls,\n                \"tool_plan\": response.message.tool_plan,\n            }\n        )\n\n        # Step 3: Get tool results\n        for tc in response.message.tool_calls:\n            tool_result = functions_map[tc.function.name](\n                **json.loads(tc.function.arguments)\n            )\n            tool_content = []\n            for data in tool_result:\n                tool_content.append(\n                    {\n                        \"type\": \"document\",\n                        \"document\": {\"data\": json.dumps(data)},\n                    }\n                )\n            # Optional: add an \"id\" field in the \"document\" object, otherwise IDs are auto-generated\n            messages.append(\n                {\n                    \"role\": \"tool\",\n                    \"tool_call_id\": tc.id,\n                    \"content\": tool_content,\n                }\n            )\n\n        # Step 4: Generate response and citations\n        response = co.chat(\n            model=model,\n            messages=messages,\n            tools=tools,\n            temperature=0.3,\n        )\n\n    messages.append(\n        {\n            \"role\": \"assistant\",\n            \"content\": response.message.content[0].text,\n        }\n    )\n\n    # Print final response\n    print(\"RESPONSE:\")\n    print(response.message.content[0].text)\n    print(\"=\" * 50)\n\n    # Print citations (if any)\n    verbose_source = (\n        False  # Change to True to display the contents of a source\n    )\n    if response.message.citations:\n        print(\"CITATIONS:\\n\")\n        for citation in response.message.citations:\n            print(\n                f\"Start: {citation.start}| End:{citation.end}| Text:'{citation.text}' \"\n            )\n            print(\"Sources:\")\n            for idx, source in enumerate(citation.sources):\n                print(f\"{idx+1}. {source.id}\")\n                if verbose_source:\n                    print(f\"{source.tool_output}\")\n            print(\"\\n\")\n\n    return messages\nQuery expansion\nLet\u2019s ask the agent a few questions, starting with this one about the Chat endpoint and the RAG feature.\nFirstly, the agent rightly chooses the search_developer_docs tool to retrieve the information it needs.\nAdditionally, because the question asks about two different things, retrieving information using the same query as the user\u2019s may not be the optimal approach. Instead, the query needs to be expanded or split into multiple parts, each retrieving its own set of documents.\nThus, the agent expands the original query into two queries.\nThis is enabled by the parallel tool calling feature that comes with the Chat endpoint.\nThis results in a richer and more representative list of documents retrieved, and therefore a more accurate and comprehensive answer.\nPYTHON\nmessages = run_agent(\"Explain the Chat endpoint and the RAG feature\")\n\nQUESTION:\nExplain the Chat endpoint and the RAG feature\n==================================================\nTOOL PLAN:\nI will search the Cohere developer documentation for the Chat endpoint and the RAG feature. 
67TOOL CALLS:8Tool name: search_developer_docs | Parameters: {\"query\":\"Chat endpoint\"}9Tool name: search_developer_docs | Parameters: {\"query\":\"RAG feature\"}10==================================================11RESPONSE:12The Chat endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses.1314Retrieval Augmented Generation (RAG) is a method for generating text using additional information fetched from an external data source, which can greatly increase the accuracy of the response.15==================================================16CITATIONS:1718Start: 18| End:56| Text:'facilitates a conversational interface' 19Sources:201. search_developer_docs_c059cbhr042g:3212. search_developer_docs_beycjq0ejbvx:3222324Start: 58| End:130| Text:'allowing users to send messages to the model and receive text responses.' 25Sources:261. search_developer_docs_c059cbhr042g:3272. search_developer_docs_beycjq0ejbvx:3282930Start: 132| End:162| Text:'Retrieval Augmented Generation' 31Sources:321. search_developer_docs_c059cbhr042g:4332. search_developer_docs_beycjq0ejbvx:4343536Start: 174| End:266| Text:'method for generating text using additional information fetched from an external data source' 37Sources:381. search_developer_docs_c059cbhr042g:4392. search_developer_docs_beycjq0ejbvx:4404142Start: 278| End:324| Text:'greatly increase the accuracy of the response.' 43Sources:441. search_developer_docs_c059cbhr042g:4452. search_developer_docs_beycjq0ejbvx:4\nQuery expansion over multiple data sources\nThe earlier example focused on a single data source, the Cohere developer documentation. 
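The single-source fan-out just shown — one user question expanded into several queries, retrieved in parallel, and the result lists merged — can be sketched without the Cohere API. Everything below (the tiny corpus, the keyword matcher, the function names) is a hypothetical stand-in for the tutorial's actual search_developer_docs tool:

```python
# Minimal sketch of parallel query expansion with a mocked retriever.
# The corpus and keyword matching are illustrative stand-ins only.
from concurrent.futures import ThreadPoolExecutor

CORPUS = [
    "The Chat endpoint facilitates a conversational interface.",
    "RAG generates text using information fetched from an external data source.",
    "The Embed endpoint returns text embeddings.",
]


def search_developer_docs_mock(query):
    """Naive keyword retriever: return docs sharing a word with the query."""
    words = set(query.lower().split())
    return [doc for doc in CORPUS if words & set(doc.lower().split())]


def expanded_retrieval(queries):
    """Run one retrieval per expanded query in parallel, then merge
    the result lists while dropping duplicate documents."""
    with ThreadPoolExecutor() as pool:
        result_lists = list(pool.map(search_developer_docs_mock, queries))
    merged, seen = [], set()
    for results in result_lists:
        for doc in results:
            if doc not in seen:
                seen.add(doc)
                merged.append(doc)
    return merged


# "Explain the Chat endpoint and the RAG feature", expanded into two queries:
docs = expanded_retrieval(["Chat endpoint", "RAG feature"])
```

A real system would swap the keyword match for vector search; the point is only that the expanded queries fan out concurrently and their results are de-duplicated before the generation step.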
However, an agentic RAG system can also perform query expansion over multiple data sources.

Here, the agent is asked a question that contains two parts: first an explanation of the Embed endpoint, then a request for code examples.

It correctly identifies that this requires searching both the developer documentation and the code examples. Thus, it generates two queries, one for each data source, and performs the two searches in parallel.

Its response then contains information referenced from both data sources.

```python
messages = run_agent(
    "What is the Embed endpoint? Give me some code tutorials"
)
```

```
QUESTION:
What is the Embed endpoint? Give me some code tutorials
==================================================
TOOL PLAN:
I will search for 'what is the Embed endpoint' and 'Embed endpoint code tutorials' at the same time. 

TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"what is the Embed endpoint"}
Tool name: search_code_examples | Parameters: {"query":"Embed endpoint code tutorials"}
==================================================
RESPONSE:
The Embed endpoint returns text embeddings. An embedding is a list of floating point numbers that captures semantic information about the text that it represents.

I'm afraid I couldn't find any code tutorials for the Embed endpoint.
==================================================
CITATIONS:

Start: 19| End:43| Text:'returns text embeddings.' 
Sources:
1. search_developer_docs_pgzdgqd3k0sd:1

Start: 62| End:162| Text:'list of floating point numbers that captures semantic information about the text that it represents.' 
Sources:
1. search_developer_docs_pgzdgqd3k0sd:1
```

## Query expansion in multi-turn conversations

A RAG chatbot needs to be able to infer the user's intent for a given query, sometimes from vague context. This is especially important in multi-turn conversations, where the intent may not be clear from a single query.

For example, in the first turn a user might ask "What is A?", and in the second turn, "Compare that with B and C". The agent needs to infer that the user's intent is to compare A with B and C.

Let's see an example of this. First, note that the run_agent function is already set up to handle multi-turn conversations: it can take the messages from previous conversation turns and append the current turn to that list.

In the first turn, the user asks about the Chat endpoint, to which the agent duly responds.

```python
messages = run_agent("What is the Chat endpoint?")
```

```
QUESTION:
What is the Chat endpoint?
==================================================
TOOL PLAN:
I will search the Cohere developer documentation for 'Chat endpoint'. 

TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"Chat endpoint"}
==================================================
RESPONSE:
The Chat endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses.
==================================================
CITATIONS:

Start: 18| End:130| Text:'facilitates a conversational interface, allowing users to send messages to the model and receive text responses.' 
Sources:
1. search_developer_docs_qx7dht277mg7:3
```

In the second turn, the user asks a question that has two parts: first, how the endpoint differs from RAG, and then, a request for code examples. We pass the messages from the previous conversation turn to the run_agent function.

Because of this, the agent is able to infer that the question refers to the Chat endpoint, even though the user didn't explicitly mention it. The agent then expands the query into two separate queries, one for the search_developer_docs tool and one for the search_code_examples tool.

```python
messages = run_agent(
    "How is it different from RAG? Also any code tutorials?", messages
)
```

```
QUESTION:
How is it different from RAG? Also any code tutorials?
==================================================
TOOL PLAN:
I will search the Cohere developer documentation for 'Chat endpoint vs RAG' and 'Chat endpoint code tutorials'. 

TOOL CALLS:
Tool name: search_developer_docs | Parameters: {"query":"Chat endpoint vs RAG"}
Tool name: search_code_examples | Parameters: {"query":"Chat endpoint"}
==================================================
RESPONSE:
The Chat endpoint facilitates a conversational interface, allowing users to send messages to the model and receive text responses.

RAG (Retrieval Augmented Generation) is a method for generating text using additional information fetched from an external data source, which can greatly increase the accuracy of the response.

I could not find any code tutorials for the Chat endpoint, but I did find a tutorial on RAG with Chat Embed and Rerank via Pinecone.
==================================================
CITATIONS:

Start: 414| End:458| Text:'RAG with Chat Embed and Rerank via Pinecone.' 
Sources:
1. search_code_examples_h8mn6mdqbrc3:2
```

## Summary

In this tutorial, we learned about:

- How query expansion works in an agentic RAG system
- How query expansion works over multiple data sources
- How query expansion works in multi-turn conversations

That said, we may encounter even more complex queries than we've seen so far. In particular, some queries require sequential reasoning, where retrieval needs to happen over multiple steps.

In Part 3, we'll learn how an agentic RAG system can perform sequential reasoning.
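As an API-free recap of the multi-turn behavior above, the history-driven rewriting step can be caricatured in a few lines. The pronoun rule below is a deliberately crude, hypothetical stand-in for what the model does when it reads the accumulated messages list:

```python
# Toy illustration (not the Cohere implementation): resolve a vague
# follow-up query using the topic of the most recent user turn.
def rewrite_with_history(query, messages):
    """Substitute the last user turn's topic for the pronoun 'it'."""
    user_turns = [m["content"] for m in messages if m["role"] == "user"]
    if user_turns and " it " in f" {query} ":
        topic = user_turns[-1]
        if topic.startswith("What is "):
            topic = topic[len("What is "):]
        topic = topic.rstrip("?")
        # Replace only the first occurrence of the standalone pronoun
        query = query.replace(" it ", f" {topic} ", 1)
    return query


# Turn 1 established the topic; turn 2's query is vague on its own.
history = [{"role": "user", "content": "What is the Chat endpoint?"}]
expanded = rewrite_with_history("How is it different from RAG?", history)
# expanded == "How is the Chat endpoint different from RAG?"
```

In the actual tutorial the model performs this resolution itself; passing the accumulated messages back into run_agent is what gives it the context to do so.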