Commit 2c0dc66

Add haystack deep research agent example
1 parent 540408c commit 2c0dc66

9 files changed: +857 -0 lines changed
Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
<!--
SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: Apache-2.0

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Haystack Deep Research Agent

This example demonstrates how to build a research agent with Haystack that combines web search and Retrieval-Augmented Generation (RAG) within the AIQ toolkit framework.

## Overview

The Haystack Deep Research Agent is a research assistant that can:

- **Web Search**: Search the internet for current information using the SerperDev API
- **Document Retrieval**: Query an internal document database using RAG with OpenSearch
- **Comprehensive Research**: Combine both sources to produce thorough, well-cited research reports
- **Intelligent Routing**: Automatically decide when to use web search and when to use internal documents

## Architecture

The workflow consists of three main components:

1. **Web Search Tool** (`web_search_tool.py`): Uses Haystack's `SerperDevWebSearch` and `LinkContentFetcher` to search the web and extract content from web pages
2. **RAG Tool** (`rag_tool.py`): Uses `OpenSearchDocumentStore` to index and query internal documents with semantic retrieval
3. **Deep Research Agent** (`deep_research_agent.py`): Combines both tools using Haystack's Agent framework, with an OpenAI model handling orchestration
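
The control flow implied by the three components above can be sketched without any framework. The following is a toy illustration only, not the code in `deep_research_agent.py`: the `search` and `rag` functions are hypothetical stand-ins that return canned strings, and the tool choices are passed in as a fixed `plan`, whereas in the real workflow the LLM decides which tool to call next.

```python
from typing import Callable


# Hypothetical stand-ins for the two tools. The real tools call the
# SerperDev API and OpenSearch; these only return labelled strings.
def search(query: str) -> str:
    return f"[web] results for: {query}"


def rag(query: str) -> str:
    return f"[docs] passages for: {query}"


TOOLS: dict[str, Callable[[str], str]] = {"search": search, "rag": rag}


def run_agent(question: str, plan: list[str], max_agent_steps: int = 20) -> list[str]:
    """Call the tools named in `plan`, in order, stopping at the step limit.

    The plan is an assumption made for illustration: it replaces the
    LLM's tool-selection step so the loop is easy to follow.
    """
    observations: list[str] = []
    for step, tool_name in enumerate(plan):
        if step >= max_agent_steps:
            break  # step budget exhausted
        tool = TOOLS.get(tool_name)
        if tool is None:
            observations.append(f"unknown tool: {tool_name}")
            continue
        observations.append(tool(question))
    return observations
```

Calling `run_agent("Artemis", ["search", "rag"])` collects one observation from each stub; lowering `max_agent_steps` truncates the run, which mirrors the role of the `max_agent_steps` option described under Configuration.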

## Prerequisites

Before using this workflow, ensure you have:

1. **OpenAI API Key**: Required for the chat generator and RAG functionality
   - Get your key from [OpenAI Platform](https://platform.openai.com/api-keys)
   - Set it as an environment variable: `export OPENAI_API_KEY=your_key_here`

2. **SerperDev API Key**: Required for web search functionality
   - Get your key from [SerperDev](https://serper.dev/api-key)
   - Set it as an environment variable: `export SERPERDEV_API_KEY=your_key_here`

3. **OpenSearch Instance**: Required for RAG functionality
   - You can run OpenSearch locally using Docker:
   ```bash
   docker run -d --name opensearch -p 9200:9200 -p 9600:9600 \
     -e "discovery.type=single-node" \
     -e "plugins.security.disabled=true" \
     opensearchproject/opensearch:2.11.1
   ```

## Installation and Usage

Follow the instructions in the [Install Guide](../../../../docs/source/quick-start/installing.md#install-from-source) to create the development environment and install the AIQ toolkit.

### Step 1: Set Your API Keys

```bash
export OPENAI_API_KEY=<YOUR_OPENAI_API_KEY>
export SERPERDEV_API_KEY=<YOUR_SERPERDEV_API_KEY>
```
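
Before running the workflow, it can help to confirm that both variables are visible to Python. The helper below is a small sketch and not part of the example package; the function name `missing_api_keys` is made up for illustration.

```python
import os

# The two variable names come from this README.
REQUIRED_KEYS = ("OPENAI_API_KEY", "SERPERDEV_API_KEY")


def missing_api_keys(env=None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_KEYS if not env.get(name)]
```

An empty list from `missing_api_keys()` means both keys are set in the current environment; otherwise the list names the ones to export.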

### Step 2: Start OpenSearch (if not already running)

```bash
docker run -d --name opensearch -p 9200:9200 -p 9600:9600 \
  -e "discovery.type=single-node" \
  -e "plugins.security.disabled=true" \
  opensearchproject/opensearch:2.11.1
```
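
The container can take a little while to start accepting requests. A quick reachability check, sketched here with only the standard library, is one way to confirm it is up before running the workflow. The function name `opensearch_is_up` is hypothetical, and the sketch assumes the security plugin is disabled as in the command above, so a plain HTTP request to the root URL succeeds without credentials.

```python
import urllib.error
import urllib.request


def opensearch_is_up(host: str = "http://localhost:9200", timeout: float = 3.0) -> bool:
    """Return True if an HTTP GET on the root URL answers with status 200."""
    try:
        with urllib.request.urlopen(host, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError, ValueError):
        # Connection refused, timeout, HTTP error status, or a malformed URL.
        return False
```

If this returns `False` for the configured host, the "OpenSearch Connection Error" entry under Troubleshooting is the likely cause.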

### Step 3: Install the Workflow

```bash
uv pip install -e examples/basic/frameworks/haystack_deep_research_agent
```

### Step 4: Add Sample Documents (Optional)

Place PDF documents in the `data/` directory to enable RAG functionality:

```bash
# Example: Download a sample PDF
wget "https://docs.aws.amazon.com/pdfs/bedrock/latest/userguide/bedrock-ug.pdf" \
  -O examples/basic/frameworks/haystack_deep_research_agent/data/bedrock-ug.pdf
```
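
To check which files are in place, a short standard-library sketch like the following lists the PDFs in a directory. The helper name `find_pdfs` is made up for illustration, and it looks only at the top level of the directory; whether the RAG tool itself also scans subdirectories is not stated here.

```python
from pathlib import Path


def find_pdfs(data_directory: str) -> list[Path]:
    """Return the PDF files directly inside `data_directory`, sorted by path."""
    root = Path(data_directory)
    if not root.is_dir():
        return []  # a missing directory is treated as "no documents"
    return sorted(
        path for path in root.iterdir()
        if path.is_file() and path.suffix.lower() == ".pdf"
    )
```

For example, `find_pdfs("examples/basic/frameworks/haystack_deep_research_agent/data")` should include `bedrock-ug.pdf` after the download above.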

### Step 5: Run the Workflow

```bash
aiq run --config_file=examples/basic/frameworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml --input "What are the latest updates on the Artemis moon mission?"
```

## Example Queries

Here are some example queries you can try:

**Web Search Examples:**
```bash
# Current events
aiq run --config_file=examples/basic/frameworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml --input "What are the latest developments in AI research for 2024?"

# Technology news
aiq run --config_file=examples/basic/frameworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml --input "What are the new features in the latest Python release?"
```

**RAG Examples (if you have documents indexed):**
```bash
# Document-specific queries
aiq run --config_file=examples/basic/frameworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml --input "What are the key features of AWS Bedrock?"

# Mixed queries (will use both web search and RAG)
aiq run --config_file=examples/basic/frameworks/haystack_deep_research_agent/src/aiq_haystack_deep_research_agent/configs/config.yml --input "How does AWS Bedrock compare to other AI platforms in 2024?"
```

## Configuration

The workflow is configured via `config.yml`. Key configuration options include:

- **Web Search Tool**:
  - `top_k`: Number of search results to retrieve (default: 10)
  - `timeout`: Timeout for fetching web content (default: 3 seconds)
  - `retry_attempts`: Number of retry attempts for failed requests (default: 2)

- **RAG Tool**:
  - `document_store_host`: OpenSearch host URL (default: "http://localhost:9200")
  - `index_name`: OpenSearch index name (default: "deep_research_docs")
  - `top_k`: Number of documents to retrieve (default: 15)
  - `data_directory`: Directory containing PDF documents to index

- **Agent**:
  - `max_agent_steps`: Maximum number of agent steps (default: 20)
  - `system_prompt`: Customizable system prompt for the agent
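
The options and defaults listed above can be summarized as plain dataclasses. This is only an illustrative sketch: the class names are hypothetical and are not the configuration classes used by the example, the `data_directory` value is a placeholder because no default is given above, and the `system_prompt` value is just the first line of the prompt in `config.yml`.

```python
from dataclasses import dataclass


@dataclass
class WebSearchSettings:
    top_k: int = 10          # number of search results to retrieve
    timeout: int = 3         # seconds allowed for fetching web content
    retry_attempts: int = 2  # retries for failed requests


@dataclass
class RagSettings:
    document_store_host: str = "http://localhost:9200"
    index_name: str = "deep_research_docs"
    top_k: int = 15                # number of documents to retrieve
    data_directory: str = "data"   # placeholder: no default is documented


@dataclass
class AgentSettings:
    max_agent_steps: int = 20
    system_prompt: str = "You are a deep research assistant."
```

Overriding a value is then a keyword argument, for example `WebSearchSettings(top_k=5)` to fetch fewer search results.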

## Customization

You can customize the workflow by:

1. **Modifying the system prompt** in `config.yml` to change the agent's behavior
2. **Adding more document types** by extending the RAG tool to support other file formats
3. **Changing the LLM model** by updating the OpenAI model in the configuration
4. **Adjusting search parameters** to optimize for your use case

## Troubleshooting

**Common Issues:**

1. **OpenSearch Connection Error**: Ensure OpenSearch is running and accessible at the configured host
2. **Missing API Keys**: Verify that both `OPENAI_API_KEY` and `SERPERDEV_API_KEY` are set
3. **No Documents Found**: Check that PDF files are placed in the data directory and the path is correct
4. **Web Search Fails**: Verify your SerperDev API key is valid and has remaining quota

**Logs**: Check the AIQ logs for detailed error information and debugging.

## Architecture Details

The workflow demonstrates several key AIQ patterns:

- **Function Registration**: Each tool is registered as a function with its own configuration
- **Builder Pattern**: The AIQ Builder is used to create and manage tools and LLMs
- **Component Integration**: Haystack components are wrapped as AIQ functions
- **Error Handling**: Robust error handling with fallback behaviors
- **Async Operations**: All operations are asynchronous for better performance
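
As a rough illustration of the registration and async patterns named above, the following framework-free sketch registers two coroutine functions under names and runs them concurrently. It deliberately does not use the AIQ API: `REGISTRY`, `register`, and `call_all` are hypothetical names invented for this sketch, and the tool bodies are stubs.

```python
import asyncio
from typing import Awaitable, Callable

# Name -> coroutine function. A stand-in for a framework's function registry.
REGISTRY: dict[str, Callable[[str], Awaitable[str]]] = {}


def register(name: str):
    """Decorator that stores a coroutine function under `name`."""
    def decorator(func):
        REGISTRY[name] = func
        return func
    return decorator


@register("search")
async def search(query: str) -> str:
    return f"web:{query}"  # stub; a real tool would call a search API


@register("rag")
async def rag(query: str) -> str:
    return f"docs:{query}"  # stub; a real tool would query a document store


async def call_all(query: str) -> dict[str, str]:
    """Run every registered function concurrently and collect the results."""
    names = list(REGISTRY)
    results = await asyncio.gather(*(REGISTRY[name](query) for name in names))
    return dict(zip(names, results))
```

Running `asyncio.run(call_all("bedrock"))` returns one result per registered name. Looking tools up by name keeps the caller independent of how each tool is implemented, which is the point of the registration pattern.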

This example shows how an external AI framework such as Haystack can be integrated into AIQ workflows while retaining the flexibility of the underlying framework.
Lines changed: 39 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,39 @@
AIQ Toolkit: A Comprehensive AI Workflow Framework

The AIQ Toolkit is a powerful framework for building AI-powered workflows and applications. It provides a unified interface for working with multiple AI frameworks including LangChain, LlamaIndex, Haystack, and Semantic Kernel.

Key Features:

1. Multi-Framework Support: AIQ Toolkit seamlessly integrates different AI frameworks, allowing developers to choose the best tool for each task without being locked into a single ecosystem.

2. Builder Pattern: The framework uses a builder pattern that makes it easy to construct complex AI workflows through configuration rather than extensive coding.

3. Function Registration: Functions can be easily registered and reused across different workflows, promoting modularity and code reuse.

4. Tool Integration: The toolkit provides easy integration with various tools including web search, document retrieval, and language models.

5. Agent Support: AIQ supports different types of agents including ReAct agents, reasoning agents, and custom workflow agents.

6. Configuration-Driven: Most workflows can be configured through YAML files, making them easy to modify and deploy.

Architecture:

The AIQ Toolkit follows a modular architecture where:
- Functions represent individual AI capabilities
- Workflows orchestrate multiple functions
- Tools provide external integrations
- Agents handle complex reasoning and tool usage

This architecture allows for flexible composition of AI capabilities while maintaining clean separation of concerns.

Use Cases:

AIQ Toolkit is particularly well-suited for:
- Research applications that need to combine web search with document analysis
- Multi-step reasoning tasks that require different AI capabilities
- Enterprise applications that need to integrate multiple AI services
- Rapid prototyping of AI workflows

Getting Started:

To get started with AIQ Toolkit, install the package and its dependencies, configure your AI services, and define your workflow through the configuration system. The framework handles the complex orchestration automatically.
Lines changed: 28 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,28 @@
[build-system]
build-backend = "setuptools.build_meta"
requires = ["setuptools >= 64", "setuptools-scm>=8"]

[tool.setuptools_scm]
root = "../../../.."

[project]
name = "aiq_haystack_deep_research_agent"
dynamic = ["version"]
dependencies = [
  "aiqtoolkit~=1.2",
  "haystack-ai~=2.15.0",
  "opensearch-haystack~=4.2.0",
  "trafilatura~=2.0.0",
  "pypdf~=5.8.0",
  "docstring-parser~=0.16",
  "openai~=1.94.0",
]
requires-python = ">=3.11,<3.13"
description = "Haystack Deep Research Agent workflow for AIQ toolkit"
classifiers = ["Programming Language :: Python"]

[tool.uv.sources]
aiqtoolkit = { path = "../../../..", editable = true }

[project.entry-points.'aiq.components']
aiq_haystack_deep_research_agent = "aiq_haystack_deep_research_agent.register"
Lines changed: 14 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,14 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
Lines changed: 40 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,40 @@
# SPDX-FileCopyrightText: Copyright (c) 2025, NVIDIA CORPORATION & AFFILIATES. All rights reserved.
# SPDX-License-Identifier: Apache-2.0
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.

general:
  use_uvloop: true

llms:
  openai_llm:
    _type: openai
    model_name: gpt-4o-mini
    temperature: 0.0
    max_tokens: 4000

workflow:
  _type: haystack_deep_research_agent
  llm: openai_llm
  system_prompt: |
    You are a deep research assistant.
    You create comprehensive research reports to answer the user's questions.
    You use the 'search' tool to answer any questions by using web search.
    You use the 'rag' tool to answer any questions by using retrieval augmented generation on your internal document database.
    You perform multiple searches until you have the information you need to answer the question.
    Make sure you research different aspects of the question.
    Use markdown to format your response.
    When you use information from the websearch results, cite your sources using markdown links.
    When you use information from the document database, cite the text used from the source document.
    It is important that you cite accurately.
  max_agent_steps: 20