feat(web): configurable web search providers with fallback

Add multi-provider web search support: Brave (default), Tavily,
DuckDuckGo, and SearXNG. Falls back to DuckDuckGo when provider
credentials are missing. Providers are dispatched via a map with
register_provider() for plugin extensibility.

- WebSearchConfig with env-var resolution and from_legacy() bridge
- Config migration for legacy flat keys (tavilyApiKey, searxngBaseUrl)
- SearXNG URL validation, explicit error for unknown providers
- ddgs package (replaces deprecated duckduckgo-search)
- 16 tests covering all providers, fallback, env resolution, edge cases
- docs/web-search.md with full config reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
commit 71d90de31b (parent 99b896f5d4)
Author: Chris Alexander
Date: 2026-02-10 13:06:56 +00:00
10 changed files with 722 additions and 97 deletions


@@ -150,7 +150,7 @@ nanobot channels login
> [!TIP]
> Set your API key in `~/.nanobot/config.json`.
> Get API keys: [OpenRouter](https://openrouter.ai/keys) (Global) · [Brave Search](https://brave.com/search/api/) (optional, for web search)
> Get API keys: [OpenRouter](https://openrouter.ai/keys) (Global) · [DashScope](https://dashscope.console.aliyun.com) (Qwen) · [Brave Search](https://brave.com/search/api/) or [Tavily](https://tavily.com/) (optional, for web search). SearXNG is supported via a base URL.
**1. Initialize**
@@ -185,6 +185,21 @@ Add or merge these **two parts** into your config (other options have defaults).
}
```
**Optional: Web search provider** — set `tools.web.search.provider` to `brave` (default), `duckduckgo`, `tavily`, or `searxng`. See [docs/web-search.md](docs/web-search.md) for full configuration.
```json
{
"tools": {
"web": {
"search": {
"provider": "tavily",
"apiKey": "tvly-..."
}
}
}
}
```
**3. Chat**
```bash

docs/web-search.md (new file, 95 lines)

@@ -0,0 +1,95 @@
# Web Search Providers
NanoBot supports multiple web search providers. Configure in `~/.nanobot/config.json` under `tools.web.search`.
| Provider | Key | Env var |
|----------|-----|---------|
| `brave` (default) | `apiKey` | `BRAVE_API_KEY` |
| `tavily` | `apiKey` | `TAVILY_API_KEY` |
| `searxng` | `baseUrl` | `SEARXNG_BASE_URL` |
| `duckduckgo` | — | — |
Brave and Tavily read the `apiKey` field, SearXNG reads `baseUrl`, and DuckDuckGo needs no credentials. If `apiKey` is set but no provider is specified, Brave is assumed.
When credentials are missing and `fallbackToDuckduckgo` is `true` (the default), searches fall back to DuckDuckGo automatically.
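Credentials can also come from the environment instead of the config file, using the variables listed in the table above (the values below are placeholders):

```shell
# Environment variables take effect when the corresponding config key is empty;
# an explicit config value wins over the environment.
export TAVILY_API_KEY="tvly-placeholder"
export SEARXNG_BASE_URL="https://searx.example"
echo "$SEARXNG_BASE_URL"
```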
## Examples
**Brave** (default — just set the key):
```json
{
"tools": {
"web": {
"search": {
"apiKey": "BSA..."
}
}
}
}
```
**Tavily:**
```json
{
"tools": {
"web": {
"search": {
"provider": "tavily",
"apiKey": "tvly-..."
}
}
}
}
```
**SearXNG** (self-hosted, no API key needed):
```json
{
"tools": {
"web": {
"search": {
"provider": "searxng",
"baseUrl": "https://searx.example"
}
}
}
}
```
**DuckDuckGo** (no credentials required):
```json
{
"tools": {
"web": {
"search": {
"provider": "duckduckgo"
}
}
}
}
```
## Options
| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `provider` | string | `"brave"` | Search backend |
| `apiKey` | string | `""` | API key for the selected provider |
| `baseUrl` | string | `""` | Base URL for SearXNG (appends `/search`) |
| `maxResults` | integer | `5` | Default results per search |
| `fallbackToDuckduckgo` | boolean | `true` | Fall back to DuckDuckGo when credentials are missing |
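For instance, combining these options for a self-hosted SearXNG instance (the URL is a placeholder) while disabling the DuckDuckGo fallback:

```json
{
  "tools": {
    "web": {
      "search": {
        "provider": "searxng",
        "baseUrl": "https://searx.example",
        "maxResults": 8,
        "fallbackToDuckduckgo": false
      }
    }
  }
}
```

With this config, a missing or invalid `baseUrl` returns an explicit error instead of silently falling back to DuckDuckGo.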
## Custom providers
Plugins can register additional providers at runtime via the dispatch dict:
```python
async def my_search(query: str, n: int) -> str:
...
tool._provider_dispatch["my-engine"] = my_search
```
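The dispatch mechanism itself can be sketched standalone; the names below are illustrative stand-ins, not the actual tool internals:

```python
import asyncio
from collections.abc import Awaitable, Callable

# Stand-in for the tool's provider dispatch dict (illustrative only).
provider_dispatch: dict[str, Callable[[str, int], Awaitable[str]]] = {}

async def my_search(query: str, n: int) -> str:
    # A real provider would call an HTTP API and format the response.
    return f"Results for: {query}\n1. example result (showing up to {n})"

# Registering under a name makes the provider selectable at dispatch time.
provider_dispatch["my-engine"] = my_search

async def main() -> None:
    search = provider_dispatch["my-engine"]
    print(await search("nanobot", 3))

asyncio.run(main())
```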


@@ -28,7 +28,7 @@ from nanobot.providers.base import LLMProvider
from nanobot.session.manager import Session, SessionManager
if TYPE_CHECKING:
from nanobot.config.schema import ChannelsConfig, ExecToolConfig
from nanobot.config.schema import ChannelsConfig, ExecToolConfig, WebSearchConfig
from nanobot.cron.service import CronService
@@ -57,7 +57,7 @@ class AgentLoop:
max_tokens: int = 4096,
memory_window: int = 100,
reasoning_effort: str | None = None,
brave_api_key: str | None = None,
web_search_config: "WebSearchConfig | None" = None,
web_proxy: str | None = None,
exec_config: ExecToolConfig | None = None,
cron_service: CronService | None = None,
@@ -66,7 +66,9 @@ class AgentLoop:
mcp_servers: dict | None = None,
channels_config: ChannelsConfig | None = None,
):
from nanobot.config.schema import ExecToolConfig
from nanobot.config.schema import ExecToolConfig, WebSearchConfig
from nanobot.cron.service import CronService
self.bus = bus
self.channels_config = channels_config
self.provider = provider
@@ -77,8 +79,8 @@ class AgentLoop:
self.max_tokens = max_tokens
self.memory_window = memory_window
self.reasoning_effort = reasoning_effort
self.brave_api_key = brave_api_key
self.web_proxy = web_proxy
self.web_search_config = web_search_config or WebSearchConfig()
self.exec_config = exec_config or ExecToolConfig()
self.cron_service = cron_service
self.restrict_to_workspace = restrict_to_workspace
@@ -94,7 +96,7 @@ class AgentLoop:
temperature=self.temperature,
max_tokens=self.max_tokens,
reasoning_effort=reasoning_effort,
brave_api_key=brave_api_key,
web_search_config=self.web_search_config,
web_proxy=web_proxy,
exec_config=self.exec_config,
restrict_to_workspace=restrict_to_workspace,
@@ -107,7 +109,9 @@ class AgentLoop:
self._mcp_connecting = False
self._consolidating: set[str] = set() # Session keys with consolidation in progress
self._consolidation_tasks: set[asyncio.Task] = set() # Strong refs to in-flight tasks
self._consolidation_locks: weakref.WeakValueDictionary[str, asyncio.Lock] = weakref.WeakValueDictionary()
self._consolidation_locks: weakref.WeakValueDictionary[str, asyncio.Lock] = (
weakref.WeakValueDictionary()
)
self._active_tasks: dict[str, list[asyncio.Task]] = {} # session_key -> tasks
self._processing_lock = asyncio.Lock()
self._register_default_tools()
@@ -117,13 +121,15 @@ class AgentLoop:
allowed_dir = self.workspace if self.restrict_to_workspace else None
for cls in (ReadFileTool, WriteFileTool, EditFileTool, ListDirTool):
self.tools.register(cls(workspace=self.workspace, allowed_dir=allowed_dir))
self.tools.register(ExecTool(
working_dir=str(self.workspace),
timeout=self.exec_config.timeout,
restrict_to_workspace=self.restrict_to_workspace,
path_append=self.exec_config.path_append,
))
self.tools.register(WebSearchTool(api_key=self.brave_api_key, proxy=self.web_proxy))
self.tools.register(
ExecTool(
working_dir=str(self.workspace),
timeout=self.exec_config.timeout,
restrict_to_workspace=self.restrict_to_workspace,
path_append=self.exec_config.path_append,
)
)
self.tools.register(WebSearchTool(config=self.web_search_config, proxy=self.web_proxy))
self.tools.register(WebFetchTool(proxy=self.web_proxy))
self.tools.register(MessageTool(send_callback=self.bus.publish_outbound))
self.tools.register(SpawnTool(manager=self.subagents))
@@ -136,6 +142,7 @@ class AgentLoop:
return
self._mcp_connecting = True
from nanobot.agent.tools.mcp import connect_mcp_servers
try:
self._mcp_stack = AsyncExitStack()
await self._mcp_stack.__aenter__()
@@ -169,12 +176,14 @@ class AgentLoop:
@staticmethod
def _tool_hint(tool_calls: list) -> str:
"""Format tool calls as concise hint, e.g. 'web_search("query")'."""
def _fmt(tc):
args = (tc.arguments[0] if isinstance(tc.arguments, list) else tc.arguments) or {}
val = next(iter(args.values()), None) if isinstance(args, dict) else None
if not isinstance(val, str):
return tc.name
return f'{tc.name}("{val[:40]}")' if len(val) > 40 else f'{tc.name}("{val}")'
return ", ".join(_fmt(tc) for tc in tool_calls)
async def _run_agent_loop(
@@ -213,13 +222,15 @@ class AgentLoop:
"type": "function",
"function": {
"name": tc.name,
"arguments": json.dumps(tc.arguments, ensure_ascii=False)
}
"arguments": json.dumps(tc.arguments, ensure_ascii=False),
},
}
for tc in response.tool_calls
]
messages = self.context.add_assistant_message(
messages, response.content, tool_call_dicts,
messages,
response.content,
tool_call_dicts,
reasoning_content=response.reasoning_content,
thinking_blocks=response.thinking_blocks,
)
@@ -241,7 +252,9 @@ class AgentLoop:
final_content = clean or "Sorry, I encountered an error calling the AI model."
break
messages = self.context.add_assistant_message(
messages, clean, reasoning_content=response.reasoning_content,
messages,
clean,
reasoning_content=response.reasoning_content,
thinking_blocks=response.thinking_blocks,
)
final_content = clean
@@ -273,7 +286,12 @@ class AgentLoop:
else:
task = asyncio.create_task(self._dispatch(msg))
self._active_tasks.setdefault(msg.session_key, []).append(task)
task.add_done_callback(lambda t, k=msg.session_key: self._active_tasks.get(k, []) and self._active_tasks[k].remove(t) if t in self._active_tasks.get(k, []) else None)
task.add_done_callback(
lambda t, k=msg.session_key: self._active_tasks.get(k, [])
and self._active_tasks[k].remove(t)
if t in self._active_tasks.get(k, [])
else None
)
async def _handle_stop(self, msg: InboundMessage) -> None:
"""Cancel all active tasks and subagents for the session."""
@@ -287,9 +305,13 @@ class AgentLoop:
sub_cancelled = await self.subagents.cancel_by_session(msg.session_key)
total = cancelled + sub_cancelled
content = f"⏹ Stopped {total} task(s)." if total else "No active task to stop."
await self.bus.publish_outbound(OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id, content=content,
))
await self.bus.publish_outbound(
OutboundMessage(
channel=msg.channel,
chat_id=msg.chat_id,
content=content,
)
)
async def _dispatch(self, msg: InboundMessage) -> None:
"""Process a message under the global lock."""
@@ -299,19 +321,26 @@ class AgentLoop:
if response is not None:
await self.bus.publish_outbound(response)
elif msg.channel == "cli":
await self.bus.publish_outbound(OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id,
content="", metadata=msg.metadata or {},
))
await self.bus.publish_outbound(
OutboundMessage(
channel=msg.channel,
chat_id=msg.chat_id,
content="",
metadata=msg.metadata or {},
)
)
except asyncio.CancelledError:
logger.info("Task cancelled for session {}", msg.session_key)
raise
except Exception:
logger.exception("Error processing message for session {}", msg.session_key)
await self.bus.publish_outbound(OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id,
content="Sorry, I encountered an error.",
))
await self.bus.publish_outbound(
OutboundMessage(
channel=msg.channel,
chat_id=msg.chat_id,
content="Sorry, I encountered an error.",
)
)
async def close_mcp(self) -> None:
"""Close MCP connections."""
@@ -336,8 +365,9 @@ class AgentLoop:
"""Process a single inbound message and return the response."""
# System messages: parse origin from chat_id ("channel:chat_id")
if msg.channel == "system":
channel, chat_id = (msg.chat_id.split(":", 1) if ":" in msg.chat_id
else ("cli", msg.chat_id))
channel, chat_id = (
msg.chat_id.split(":", 1) if ":" in msg.chat_id else ("cli", msg.chat_id)
)
logger.info("Processing system message from {}", msg.sender_id)
key = f"{channel}:{chat_id}"
session = self.sessions.get_or_create(key)
@@ -345,13 +375,18 @@ class AgentLoop:
history = session.get_history(max_messages=self.memory_window)
messages = self.context.build_messages(
history=history,
current_message=msg.content, channel=channel, chat_id=chat_id,
current_message=msg.content,
channel=channel,
chat_id=chat_id,
)
final_content, _, all_msgs = await self._run_agent_loop(messages)
self._save_turn(session, all_msgs, 1 + len(history))
self.sessions.save(session)
return OutboundMessage(channel=channel, chat_id=chat_id,
content=final_content or "Background task completed.")
return OutboundMessage(
channel=channel,
chat_id=chat_id,
content=final_content or "Background task completed.",
)
preview = msg.content[:80] + "..." if len(msg.content) > 80 else msg.content
logger.info("Processing message from {}:{}: {}", msg.channel, msg.sender_id, preview)
@@ -366,19 +401,21 @@ class AgentLoop:
self._consolidating.add(session.key)
try:
async with lock:
snapshot = session.messages[session.last_consolidated:]
snapshot = session.messages[session.last_consolidated :]
if snapshot:
temp = Session(key=session.key)
temp.messages = list(snapshot)
if not await self._consolidate_memory(temp, archive_all=True):
return OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id,
channel=msg.channel,
chat_id=msg.chat_id,
content="Memory archival failed, session not cleared. Please try again.",
)
except Exception:
logger.exception("/new archival failed for {}", session.key)
return OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id,
channel=msg.channel,
chat_id=msg.chat_id,
content="Memory archival failed, session not cleared. Please try again.",
)
finally:
@@ -387,14 +424,18 @@ class AgentLoop:
session.clear()
self.sessions.save(session)
self.sessions.invalidate(session.key)
return OutboundMessage(channel=msg.channel, chat_id=msg.chat_id,
content="New session started.")
return OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id, content="New session started."
)
if cmd == "/help":
return OutboundMessage(channel=msg.channel, chat_id=msg.chat_id,
content="🐈 nanobot commands:\n/new — Start a new conversation\n/stop — Stop the current task\n/help — Show available commands")
return OutboundMessage(
channel=msg.channel,
chat_id=msg.chat_id,
content="🐈 nanobot commands:\n/new — Start a new conversation\n/stop — Stop the current task\n/help — Show available commands",
)
unconsolidated = len(session.messages) - session.last_consolidated
if (unconsolidated >= self.memory_window and session.key not in self._consolidating):
if unconsolidated >= self.memory_window and session.key not in self._consolidating:
self._consolidating.add(session.key)
lock = self._consolidation_locks.setdefault(session.key, asyncio.Lock())
@@ -421,19 +462,26 @@ class AgentLoop:
history=history,
current_message=msg.content,
media=msg.media if msg.media else None,
channel=msg.channel, chat_id=msg.chat_id,
channel=msg.channel,
chat_id=msg.chat_id,
)
async def _bus_progress(content: str, *, tool_hint: bool = False) -> None:
meta = dict(msg.metadata or {})
meta["_progress"] = True
meta["_tool_hint"] = tool_hint
await self.bus.publish_outbound(OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id, content=content, metadata=meta,
))
await self.bus.publish_outbound(
OutboundMessage(
channel=msg.channel,
chat_id=msg.chat_id,
content=content,
metadata=meta,
)
)
final_content, _, all_msgs = await self._run_agent_loop(
initial_messages, on_progress=on_progress or _bus_progress,
initial_messages,
on_progress=on_progress or _bus_progress,
)
if final_content is None:
@@ -448,22 +496,31 @@ class AgentLoop:
preview = final_content[:120] + "..." if len(final_content) > 120 else final_content
logger.info("Response to {}:{}: {}", msg.channel, msg.sender_id, preview)
return OutboundMessage(
channel=msg.channel, chat_id=msg.chat_id, content=final_content,
channel=msg.channel,
chat_id=msg.chat_id,
content=final_content,
metadata=msg.metadata or {},
)
def _save_turn(self, session: Session, messages: list[dict], skip: int) -> None:
"""Save new-turn messages into session, truncating large tool results."""
from datetime import datetime
for m in messages[skip:]:
entry = dict(m)
role, content = entry.get("role"), entry.get("content")
if role == "assistant" and not content and not entry.get("tool_calls"):
continue # skip empty assistant messages — they poison session context
if role == "tool" and isinstance(content, str) and len(content) > self._TOOL_RESULT_MAX_CHARS:
entry["content"] = content[:self._TOOL_RESULT_MAX_CHARS] + "\n... (truncated)"
if (
role == "tool"
and isinstance(content, str)
and len(content) > self._TOOL_RESULT_MAX_CHARS
):
entry["content"] = content[: self._TOOL_RESULT_MAX_CHARS] + "\n... (truncated)"
elif role == "user":
if isinstance(content, str) and content.startswith(ContextBuilder._RUNTIME_CONTEXT_TAG):
if isinstance(content, str) and content.startswith(
ContextBuilder._RUNTIME_CONTEXT_TAG
):
# Strip the runtime-context prefix, keep only the user text.
parts = content.split("\n\n", 1)
if len(parts) > 1 and parts[1].strip():
@@ -473,10 +530,15 @@ class AgentLoop:
if isinstance(content, list):
filtered = []
for c in content:
if c.get("type") == "text" and isinstance(c.get("text"), str) and c["text"].startswith(ContextBuilder._RUNTIME_CONTEXT_TAG):
if (
c.get("type") == "text"
and isinstance(c.get("text"), str)
and c["text"].startswith(ContextBuilder._RUNTIME_CONTEXT_TAG)
):
continue # Strip runtime context from multimodal messages
if (c.get("type") == "image_url"
and c.get("image_url", {}).get("url", "").startswith("data:image/")):
if c.get("type") == "image_url" and c.get("image_url", {}).get(
"url", ""
).startswith("data:image/"):
filtered.append({"type": "text", "text": "[image]"})
else:
filtered.append(c)
@@ -490,8 +552,11 @@ class AgentLoop:
async def _consolidate_memory(self, session, archive_all: bool = False) -> bool:
"""Delegate to MemoryStore.consolidate(). Returns True on success."""
return await MemoryStore(self.workspace).consolidate(
session, self.provider, self.model,
archive_all=archive_all, memory_window=self.memory_window,
session,
self.provider,
self.model,
archive_all=archive_all,
memory_window=self.memory_window,
)
async def process_direct(
@@ -505,5 +570,7 @@ class AgentLoop:
"""Process a message directly (for CLI or cron usage)."""
await self._connect_mcp()
msg = InboundMessage(channel=channel, sender_id="user", chat_id=chat_id, content=content)
response = await self._process_message(msg, session_key=session_key, on_progress=on_progress)
response = await self._process_message(
msg, session_key=session_key, on_progress=on_progress
)
return response.content if response else ""


@@ -30,12 +30,12 @@ class SubagentManager:
temperature: float = 0.7,
max_tokens: int = 4096,
reasoning_effort: str | None = None,
brave_api_key: str | None = None,
web_search_config: "WebSearchConfig | None" = None,
web_proxy: str | None = None,
exec_config: "ExecToolConfig | None" = None,
restrict_to_workspace: bool = False,
):
from nanobot.config.schema import ExecToolConfig
from nanobot.config.schema import ExecToolConfig, WebSearchConfig
self.provider = provider
self.workspace = workspace
self.bus = bus
@@ -43,8 +43,8 @@ class SubagentManager:
self.temperature = temperature
self.max_tokens = max_tokens
self.reasoning_effort = reasoning_effort
self.brave_api_key = brave_api_key
self.web_proxy = web_proxy
self.web_search_config = web_search_config or WebSearchConfig()
self.exec_config = exec_config or ExecToolConfig()
self.restrict_to_workspace = restrict_to_workspace
self._running_tasks: dict[str, asyncio.Task[None]] = {}
@@ -106,7 +106,7 @@ class SubagentManager:
restrict_to_workspace=self.restrict_to_workspace,
path_append=self.exec_config.path_append,
))
tools.register(WebSearchTool(api_key=self.brave_api_key, proxy=self.web_proxy))
tools.register(WebSearchTool(config=self.web_search_config, proxy=self.web_proxy))
tools.register(WebFetchTool(proxy=self.web_proxy))
system_prompt = self._build_subagent_prompt()


@@ -1,13 +1,16 @@
"""Web tools: web_search and web_fetch."""
import asyncio
import html
import json
import os
import re
from collections.abc import Awaitable, Callable
from typing import Any
from urllib.parse import urlparse
import httpx
from ddgs import DDGS
from loguru import logger
from nanobot.agent.tools.base import Tool
@@ -44,8 +47,22 @@ def _validate_url(url: str) -> tuple[bool, str]:
return False, str(e)
def _format_results(query: str, items: list[dict[str, Any]], n: int) -> str:
"""Format provider results into a shared plaintext output."""
if not items:
return f"No results for: {query}"
lines = [f"Results for: {query}\n"]
for i, item in enumerate(items[:n], 1):
title = _normalize(_strip_tags(item.get('title', '')))
snippet = _normalize(_strip_tags(item.get('content', '')))
lines.append(f"{i}. {title}\n {item.get('url', '')}")
if snippet:
lines.append(f" {snippet}")
return "\n".join(lines)
class WebSearchTool(Tool):
"""Search the web using Brave Search API."""
"""Search the web using configured provider."""
name = "web_search"
description = "Search the web. Returns titles, URLs, and snippets."
@@ -58,49 +75,133 @@ class WebSearchTool(Tool):
"required": ["query"]
}
def __init__(self, api_key: str | None = None, max_results: int = 5, proxy: str | None = None):
self._init_api_key = api_key
self.max_results = max_results
self.proxy = proxy
def __init__(
self,
config: "WebSearchConfig | None" = None,
transport: httpx.AsyncBaseTransport | None = None,
ddgs_factory: Callable[[], DDGS] | None = None,
proxy: str | None = None,
):
from nanobot.config.schema import WebSearchConfig
@property
def api_key(self) -> str:
"""Resolve API key at call time so env/config changes are picked up."""
return self._init_api_key or os.environ.get("BRAVE_API_KEY", "")
self.config = config if config is not None else WebSearchConfig()
self._transport = transport
self._ddgs_factory = ddgs_factory or (lambda: DDGS(timeout=10))
self.proxy = proxy
self._provider_dispatch: dict[str, Callable[[str, int], Awaitable[str]]] = {
"duckduckgo": self._search_duckduckgo,
"tavily": self._search_tavily,
"searxng": self._search_searxng,
"brave": self._search_brave,
}
async def execute(self, query: str, count: int | None = None, **kwargs: Any) -> str:
if not self.api_key:
return (
"Error: Brave Search API key not configured. Set it in "
"~/.nanobot/config.json under tools.web.search.apiKey "
"(or export BRAVE_API_KEY), then restart the gateway."
)
provider = (self.config.provider or "brave").strip().lower()
n = min(max(count or self.config.max_results, 1), 10)
search = self._provider_dispatch.get(provider)
if search is None:
return f"Error: unknown search provider '{provider}'"
return await search(query, n)
async def _fallback_to_duckduckgo(self, missing_key: str, query: str, n: int) -> str:
logger.warning("Falling back to DuckDuckGo: {} not configured", missing_key)
ddg = await self._search_duckduckgo(query=query, n=n)
if ddg.startswith('Error:'):
return ddg
return f'Using DuckDuckGo fallback ({missing_key} missing).\n\n{ddg}'
async def _search_brave(self, query: str, n: int) -> str:
api_key = self.config.api_key or os.environ.get("BRAVE_API_KEY", "")
if not api_key:
if self.config.fallback_to_duckduckgo:
return await self._fallback_to_duckduckgo('BRAVE_API_KEY', query, n)
return "Error: BRAVE_API_KEY not configured"
try:
n = min(max(count or self.max_results, 1), 10)
logger.debug("WebSearch: {}", "proxy enabled" if self.proxy else "direct connection")
async with httpx.AsyncClient(proxy=self.proxy) as client:
async with httpx.AsyncClient(transport=self._transport, proxy=self.proxy) as client:
r = await client.get(
"https://api.search.brave.com/res/v1/web/search",
params={"q": query, "count": n},
headers={"Accept": "application/json", "X-Subscription-Token": self.api_key},
timeout=10.0
headers={"Accept": "application/json", "X-Subscription-Token": api_key},
timeout=10.0,
)
r.raise_for_status()
results = r.json().get("web", {}).get("results", [])[:n]
if not results:
items = [{"title": x.get("title", ""), "url": x.get("url", ""),
"content": x.get("description", "")}
for x in r.json().get("web", {}).get("results", [])]
return _format_results(query, items, n)
except Exception as e:
return f"Error: {e}"
async def _search_tavily(self, query: str, n: int) -> str:
api_key = self.config.api_key or os.environ.get("TAVILY_API_KEY", "")
if not api_key:
if self.config.fallback_to_duckduckgo:
return await self._fallback_to_duckduckgo('TAVILY_API_KEY', query, n)
return "Error: TAVILY_API_KEY not configured"
try:
async with httpx.AsyncClient(transport=self._transport, proxy=self.proxy) as client:
r = await client.post(
"https://api.tavily.com/search",
headers={"Authorization": f"Bearer {api_key}"},
json={"query": query, "max_results": n},
timeout=15.0,
)
r.raise_for_status()
results = r.json().get("results", [])
return _format_results(query, results, n)
except Exception as e:
return f"Error: {e}"
async def _search_duckduckgo(self, query: str, n: int) -> str:
try:
ddgs = self._ddgs_factory()
raw_results = await asyncio.to_thread(ddgs.text, query, max_results=n)
if not raw_results:
return f"No results for: {query}"
lines = [f"Results for: {query}\n"]
for i, item in enumerate(results, 1):
lines.append(f"{i}. {item.get('title', '')}\n {item.get('url', '')}")
if desc := item.get("description"):
lines.append(f" {desc}")
return "\n".join(lines)
except httpx.ProxyError as e:
logger.error("WebSearch proxy error: {}", e)
return f"Proxy error: {e}"
items = [
{
"title": result.get("title", ""),
"url": result.get("href", ""),
"content": result.get("body", ""),
}
for result in raw_results
]
return _format_results(query, items, n)
except Exception as e:
logger.warning("DuckDuckGo search failed: {}", e)
return f"Error: DuckDuckGo search failed ({e})"
async def _search_searxng(self, query: str, n: int) -> str:
base_url = (self.config.base_url or os.environ.get("SEARXNG_BASE_URL", "")).strip()
if not base_url:
if self.config.fallback_to_duckduckgo:
return await self._fallback_to_duckduckgo('SEARXNG_BASE_URL', query, n)
return "Error: SEARXNG_BASE_URL not configured"
endpoint = f"{base_url.rstrip('/')}/search"
is_valid, error_msg = _validate_url(endpoint)
if not is_valid:
return f"Error: invalid SearXNG URL: {error_msg}"
try:
async with httpx.AsyncClient(transport=self._transport, proxy=self.proxy) as client:
r = await client.get(
endpoint,
params={"q": query, "format": "json"},
headers={"User-Agent": USER_AGENT},
timeout=10.0,
)
r.raise_for_status()
results = r.json().get("results", [])
return _format_results(query, results, n)
except Exception as e:
logger.error("WebSearch error: {}", e)
return f"Error: {e}"
@@ -157,7 +258,8 @@ class WebFetchTool(Tool):
text, extractor = r.text, "raw"
truncated = len(text) > max_chars
if truncated: text = text[:max_chars]
if truncated:
text = text[:max_chars]
return json.dumps({"url": url, "finalUrl": str(r.url), "status": r.status_code,
"extractor": extractor, "truncated": truncated, "length": len(text), "text": text}, ensure_ascii=False)


@@ -332,7 +332,7 @@ def gateway(
max_iterations=config.agents.defaults.max_tool_iterations,
memory_window=config.agents.defaults.memory_window,
reasoning_effort=config.agents.defaults.reasoning_effort,
brave_api_key=config.tools.web.search.api_key or None,
web_search_config=config.tools.web.search,
web_proxy=config.tools.web.proxy or None,
exec_config=config.tools.exec,
cron_service=cron,
@@ -517,7 +517,7 @@ def agent(
max_iterations=config.agents.defaults.max_tool_iterations,
memory_window=config.agents.defaults.memory_window,
reasoning_effort=config.agents.defaults.reasoning_effort,
brave_api_key=config.tools.web.search.api_key or None,
web_search_config=config.tools.web.search,
web_proxy=config.tools.web.proxy or None,
exec_config=config.tools.exec,
cron_service=cron,


@@ -288,7 +288,10 @@ class GatewayConfig(Base):
class WebSearchConfig(Base):
"""Web search tool configuration."""
api_key: str = "" # Brave Search API key
provider: str = "" # brave, tavily, searxng, duckduckgo (empty = brave)
api_key: str = "" # API key for selected provider
base_url: str = "" # Base URL (SearXNG)
fallback_to_duckduckgo: bool = True
max_results: int = 5


@@ -24,6 +24,7 @@ dependencies = [
"websockets>=16.0,<17.0",
"websocket-client>=1.9.0,<2.0.0",
"httpx>=0.28.0,<1.0.0",
"ddgs>=9.5.5,<10.0.0",
"oauth-cli-kit>=0.1.3,<1.0.0",
"loguru>=0.7.3,<1.0.0",
"readability-lxml>=0.8.4,<1.0.0",


@@ -1,8 +1,10 @@
from typing import Any
from nanobot.agent.tools.web import WebSearchTool
from nanobot.agent.tools.base import Tool
from nanobot.agent.tools.registry import ToolRegistry
from nanobot.agent.tools.shell import ExecTool
from nanobot.config.schema import WebSearchConfig
class SampleTool(Tool):
@@ -337,3 +339,16 @@ def test_cast_params_single_value_not_auto_wrapped_to_array() -> None:
assert result["items"] == 5 # Not wrapped to [5]
result = tool.cast_params({"items": "text"})
assert result["items"] == "text" # Not wrapped to ["text"]
async def test_web_search_no_fallback_returns_provider_error() -> None:
tool = WebSearchTool(
config=WebSearchConfig(
provider="brave",
api_key="",
fallback_to_duckduckgo=False,
)
)
result = await tool.execute(query="fallback", count=1)
assert result == "Error: BRAVE_API_KEY not configured"


@@ -0,0 +1,327 @@
import httpx
import pytest
from collections.abc import Callable
from typing import Literal
from nanobot.agent.tools.web import WebSearchTool
from nanobot.config.schema import WebSearchConfig
def _tool(config: WebSearchConfig, handler) -> WebSearchTool:
return WebSearchTool(config=config, transport=httpx.MockTransport(handler))
def _assert_tavily_request(request: httpx.Request) -> bool:
return (
request.method == "POST"
and str(request.url) == "https://api.tavily.com/search"
and request.headers.get("authorization") == "Bearer tavily-key"
and '"query":"openclaw"' in request.read().decode("utf-8")
)
@pytest.mark.asyncio
@pytest.mark.parametrize(
("provider", "config_kwargs", "query", "count", "assert_request", "response", "assert_text"),
[
(
"brave",
{"api_key": "brave-key"},
"nanobot",
1,
lambda request: (
request.method == "GET"
and str(request.url)
== "https://api.search.brave.com/res/v1/web/search?q=nanobot&count=1"
and request.headers["X-Subscription-Token"] == "brave-key"
),
httpx.Response(
200,
json={
"web": {
"results": [
{
"title": "NanoBot",
"url": "https://example.com/nanobot",
"description": "Ultra-lightweight assistant",
}
]
}
},
),
["Results for: nanobot", "1. NanoBot", "https://example.com/nanobot"],
),
(
"tavily",
{"api_key": "tavily-key"},
"openclaw",
2,
_assert_tavily_request,
httpx.Response(
200,
json={
"results": [
{
"title": "OpenClaw",
"url": "https://example.com/openclaw",
"content": "Plugin-based assistant framework",
}
]
},
),
["Results for: openclaw", "1. OpenClaw", "https://example.com/openclaw"],
),
(
"searxng",
{"base_url": "https://searx.example"},
"nanobot",
1,
lambda request: (
request.method == "GET"
and str(request.url) == "https://searx.example/search?q=nanobot&format=json"
),
httpx.Response(
200,
json={
"results": [
{
"title": "nanobot docs",
"url": "https://example.com/nanobot",
"content": "Lightweight assistant docs",
}
]
},
),
["Results for: nanobot", "1. nanobot docs", "https://example.com/nanobot"],
),
],
)
async def test_web_search_provider_formats_results(
provider: Literal["brave", "tavily", "searxng"],
config_kwargs: dict,
query: str,
count: int,
assert_request: Callable[[httpx.Request], bool],
response: httpx.Response,
assert_text: list[str],
) -> None:
def handler(request: httpx.Request) -> httpx.Response:
assert assert_request(request)
return response
tool = _tool(WebSearchConfig(provider=provider, max_results=5, **config_kwargs), handler)
result = await tool.execute(query=query, count=count)
for text in assert_text:
assert text in result
@pytest.mark.asyncio
async def test_web_search_from_legacy_config_works() -> None:
def handler(request: httpx.Request) -> httpx.Response:
return httpx.Response(
200,
json={
"web": {
"results": [
{"title": "Legacy", "url": "https://example.com", "description": "ok"}
]
}
},
)
config = WebSearchConfig(api_key="legacy-key", max_results=3)
tool = WebSearchTool(config=config, transport=httpx.MockTransport(handler))
result = await tool.execute(query="constructor", count=1)
assert "1. Legacy" in result
@pytest.mark.asyncio
@pytest.mark.parametrize(
("provider", "config", "missing_env", "expected_title"),
[
(
"brave",
WebSearchConfig(provider="brave", api_key="", max_results=5),
"BRAVE_API_KEY",
"Fallback Result",
),
(
"tavily",
WebSearchConfig(provider="tavily", api_key="", max_results=5),
"TAVILY_API_KEY",
"Tavily Fallback",
),
],
)
async def test_web_search_missing_key_falls_back_to_duckduckgo(
monkeypatch: pytest.MonkeyPatch,
provider: str,
config: WebSearchConfig,
missing_env: str,
expected_title: str,
) -> None:
    """With the API key absent from both config and env, the tool falls back to DuckDuckGo."""
monkeypatch.delenv(missing_env, raising=False)
called = False

class FakeDDGS:
def __init__(self, *args, **kwargs):
pass
        def text(self, keywords: str, max_results: int) -> list[dict[str, str]]:
nonlocal called
called = True
return [
{
"title": expected_title,
"href": f"https://example.com/{provider}-fallback",
"body": "Fallback snippet",
}
]

monkeypatch.setattr("nanobot.agent.tools.web.DDGS", FakeDDGS, raising=False)
result = await WebSearchTool(config=config).execute(query="fallback", count=1)
assert called
assert "Using DuckDuckGo fallback" in result
assert f"1. {expected_title}" in result
@pytest.mark.asyncio
async def test_web_search_brave_missing_key_without_fallback_returns_error(
monkeypatch: pytest.MonkeyPatch,
) -> None:
monkeypatch.delenv("BRAVE_API_KEY", raising=False)
tool = WebSearchTool(
config=WebSearchConfig(
provider="brave",
api_key="",
fallback_to_duckduckgo=False,
)
)
result = await tool.execute(query="fallback", count=1)
assert result == "Error: BRAVE_API_KEY not configured"


@pytest.mark.asyncio
async def test_web_search_searxng_missing_base_url_falls_back_to_duckduckgo() -> None:
tool = WebSearchTool(
config=WebSearchConfig(provider="searxng", base_url="", max_results=5)
)
result = await tool.execute(query="nanobot", count=1)
assert "DuckDuckGo fallback" in result
assert "SEARXNG_BASE_URL" in result
@pytest.mark.asyncio
async def test_web_search_searxng_missing_base_url_no_fallback_returns_error() -> None:
tool = WebSearchTool(
config=WebSearchConfig(
provider="searxng", base_url="",
fallback_to_duckduckgo=False, max_results=5,
)
)
result = await tool.execute(query="nanobot", count=1)
assert result == "Error: SEARXNG_BASE_URL not configured"


@pytest.mark.asyncio
async def test_web_search_searxng_uses_env_base_url(
monkeypatch: pytest.MonkeyPatch,
) -> None:
    """An empty config base_url defers to the SEARXNG_BASE_URL environment variable."""
monkeypatch.setenv("SEARXNG_BASE_URL", "https://searx.env")

def handler(request: httpx.Request) -> httpx.Response:
assert request.method == "GET"
assert str(request.url) == "https://searx.env/search?q=nanobot&format=json"
return httpx.Response(
200,
json={
"results": [
{
"title": "env result",
"url": "https://example.com/env",
"content": "from env",
}
]
},
)

config = WebSearchConfig(provider="searxng", base_url="", max_results=5)
result = await _tool(config, handler).execute(query="nanobot", count=1)
assert "1. env result" in result
@pytest.mark.asyncio
async def test_web_search_register_custom_provider() -> None:
config = WebSearchConfig(provider="custom", max_results=5)
tool = WebSearchTool(config=config)
async def _custom_provider(query: str, n: int) -> str:
return f"custom:{query}:{n}"
tool._provider_dispatch["custom"] = _custom_provider
result = await tool.execute(query="nanobot", count=2)
assert result == "custom:nanobot:2"


@pytest.mark.asyncio
async def test_web_search_duckduckgo_uses_injected_ddgs_factory() -> None:
class FakeDDGS:
        def text(self, keywords: str, max_results: int) -> list[dict[str, str]]:
assert keywords == "nanobot"
assert max_results == 1
return [
{
"title": "NanoBot result",
"href": "https://example.com/nanobot",
"body": "Search content",
}
]

tool = WebSearchTool(
config=WebSearchConfig(provider="duckduckgo", max_results=5),
ddgs_factory=lambda: FakeDDGS(),
)
result = await tool.execute(query="nanobot", count=1)
assert "1. NanoBot result" in result
@pytest.mark.asyncio
async def test_web_search_unknown_provider_returns_error() -> None:
tool = WebSearchTool(
config=WebSearchConfig(provider="google", max_results=5),
)
result = await tool.execute(query="nanobot", count=1)
assert result == "Error: unknown search provider 'google'"


@pytest.mark.asyncio
async def test_web_search_dispatch_dict_overwrites_builtin() -> None:
    """Assigning into the dispatch map replaces the built-in provider implementation."""
async def _custom_brave(query: str, n: int) -> str:
return f"custom-brave:{query}:{n}"
tool = WebSearchTool(
config=WebSearchConfig(provider="brave", api_key="key", max_results=5),
)
tool._provider_dispatch["brave"] = _custom_brave
result = await tool.execute(query="nanobot", count=2)
assert result == "custom-brave:nanobot:2"


@pytest.mark.asyncio
async def test_web_search_searxng_rejects_invalid_url() -> None:
tool = WebSearchTool(
config=WebSearchConfig(
provider="searxng",
base_url="ftp://internal.host",
max_results=5,
),
)
result = await tool.execute(query="nanobot", count=1)
assert "Error: invalid SearXNG URL" in result