feat(web): configurable web search providers with fallback

Add multi-provider web search support: Brave (default), Tavily,
DuckDuckGo, and SearXNG. Falls back to DuckDuckGo when provider
credentials are missing. Providers are dispatched via a map with
register_provider() for plugin extensibility.

- WebSearchConfig with env-var resolution and from_legacy() bridge
- Config migration for legacy flat keys (tavilyApiKey, searxngBaseUrl)
- SearXNG URL validation, explicit error for unknown providers
- ddgs package (replaces deprecated duckduckgo-search)
- 16 tests covering all providers, fallback, env resolution, edge cases
- docs/web-search.md with full config reference

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Chris Alexander
2026-02-10 13:06:56 +00:00
parent 99b896f5d4
commit 71d90de31b
10 changed files with 722 additions and 97 deletions


@@ -150,7 +150,7 @@ nanobot channels login
 > [!TIP]
 > Set your API key in `~/.nanobot/config.json`.
-> Get API keys: [OpenRouter](https://openrouter.ai/keys) (Global) · [Brave Search](https://brave.com/search/api/) (optional, for web search)
+> Get API keys: [OpenRouter](https://openrouter.ai/keys) (Global) · [DashScope](https://dashscope.console.aliyun.com) (Qwen) · [Brave Search](https://brave.com/search/api/) or [Tavily](https://tavily.com/) (optional, for web search). SearXNG is supported via a base URL.
 
 **1. Initialize**
@@ -185,6 +185,21 @@ Add or merge these **two parts** into your config (other options have defaults).
   }
 }
 ```
+
+**Optional: Web search provider** — set `tools.web.search.provider` to `brave` (default), `duckduckgo`, `tavily`, or `searxng`. See [docs/web-search.md](docs/web-search.md) for full configuration.
+
+```json
+{
+  "tools": {
+    "web": {
+      "search": {
+        "provider": "tavily",
+        "apiKey": "tvly-..."
+      }
+    }
+  }
+}
+```
 
 **3. Chat**
 ```bash

docs/web-search.md (new file, 95 lines)

@@ -0,0 +1,95 @@
# Web Search Providers

NanoBot supports multiple web search providers. Configure in `~/.nanobot/config.json` under `tools.web.search`.

| Provider | Key | Env var |
|----------|-----|---------|
| `brave` (default) | `apiKey` | `BRAVE_API_KEY` |
| `tavily` | `apiKey` | `TAVILY_API_KEY` |
| `searxng` | `baseUrl` | `SEARXNG_BASE_URL` |
| `duckduckgo` | — | — |

Each provider uses the same `apiKey` field — set the provider and key together. If no provider is specified but `apiKey` is given, Brave is assumed.

When credentials are missing and `fallbackToDuckduckgo` is `true` (the default), searches fall back to DuckDuckGo automatically.
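
Credential resolution follows the table: the config value is checked first, then the environment variable, and DuckDuckGo takes over when nothing is set and fallback is enabled. A minimal sketch of that decision (the helper names here are illustrative, not NanoBot's actual API):

```python
import os


def resolve_brave_key(config_api_key: str) -> str:
    # The config value wins; the env var is only consulted when it is empty.
    return config_api_key or os.environ.get("BRAVE_API_KEY", "")


def pick_backend(provider_key: str, fallback_to_ddg: bool) -> str:
    # No key and fallback enabled: use the DuckDuckGo backend.
    # No key and fallback disabled: the search returns an error instead.
    if provider_key:
        return "provider"
    return "duckduckgo" if fallback_to_ddg else "error"
```

The same precedence applies to `TAVILY_API_KEY` and `SEARXNG_BASE_URL` for their providers.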

## Examples

**Brave** (default — just set the key):

```json
{
  "tools": {
    "web": {
      "search": {
        "apiKey": "BSA..."
      }
    }
  }
}
```

**Tavily:**

```json
{
  "tools": {
    "web": {
      "search": {
        "provider": "tavily",
        "apiKey": "tvly-..."
      }
    }
  }
}
```

**SearXNG** (self-hosted, no API key needed):

```json
{
  "tools": {
    "web": {
      "search": {
        "provider": "searxng",
        "baseUrl": "https://searx.example"
      }
    }
  }
}
```
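
Requests go to the `/search` endpoint derived from `baseUrl`: any trailing slash is stripped before `/search` is appended. A small sketch of the derivation (the helper name is hypothetical):

```python
def searxng_endpoint(base_url: str) -> str:
    # "https://searx.example/" and "https://searx.example" yield the same endpoint.
    return f"{base_url.rstrip('/')}/search"


print(searxng_endpoint("https://searx.example/"))  # https://searx.example/search
```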

**DuckDuckGo** (no credentials required):

```json
{
  "tools": {
    "web": {
      "search": {
        "provider": "duckduckgo"
      }
    }
  }
}
```

## Options

| Key | Type | Default | Description |
|-----|------|---------|-------------|
| `provider` | string | `"brave"` | Search backend |
| `apiKey` | string | `""` | API key for the selected provider |
| `baseUrl` | string | `""` | Base URL for SearXNG (appends `/search`) |
| `maxResults` | integer | `5` | Default results per search |
| `fallbackToDuckduckgo` | boolean | `true` | Fall back to DuckDuckGo when credentials are missing |

## Custom providers

Plugins can register additional providers at runtime via the dispatch dict:

```python
async def my_search(query: str, n: int) -> str:
    ...

tool._provider_dispatch["my-engine"] = my_search
```


@@ -28,7 +28,7 @@ from nanobot.providers.base import LLMProvider
 from nanobot.session.manager import Session, SessionManager
 
 if TYPE_CHECKING:
-    from nanobot.config.schema import ChannelsConfig, ExecToolConfig
+    from nanobot.config.schema import ChannelsConfig, ExecToolConfig, WebSearchConfig
     from nanobot.cron.service import CronService
@@ -57,7 +57,7 @@ class AgentLoop:
         max_tokens: int = 4096,
         memory_window: int = 100,
         reasoning_effort: str | None = None,
-        brave_api_key: str | None = None,
+        web_search_config: "WebSearchConfig | None" = None,
         web_proxy: str | None = None,
         exec_config: ExecToolConfig | None = None,
         cron_service: CronService | None = None,
@@ -66,7 +66,9 @@ class AgentLoop:
         mcp_servers: dict | None = None,
         channels_config: ChannelsConfig | None = None,
     ):
-        from nanobot.config.schema import ExecToolConfig
+        from nanobot.config.schema import ExecToolConfig, WebSearchConfig
+        from nanobot.cron.service import CronService
 
         self.bus = bus
         self.channels_config = channels_config
         self.provider = provider
@@ -77,8 +79,8 @@ class AgentLoop:
         self.max_tokens = max_tokens
         self.memory_window = memory_window
         self.reasoning_effort = reasoning_effort
-        self.brave_api_key = brave_api_key
         self.web_proxy = web_proxy
+        self.web_search_config = web_search_config or WebSearchConfig()
         self.exec_config = exec_config or ExecToolConfig()
         self.cron_service = cron_service
         self.restrict_to_workspace = restrict_to_workspace
@@ -94,7 +96,7 @@ class AgentLoop:
             temperature=self.temperature,
             max_tokens=self.max_tokens,
             reasoning_effort=reasoning_effort,
-            brave_api_key=brave_api_key,
+            web_search_config=self.web_search_config,
             web_proxy=web_proxy,
             exec_config=self.exec_config,
             restrict_to_workspace=restrict_to_workspace,
@@ -107,7 +109,9 @@ class AgentLoop:
         self._mcp_connecting = False
         self._consolidating: set[str] = set()  # Session keys with consolidation in progress
         self._consolidation_tasks: set[asyncio.Task] = set()  # Strong refs to in-flight tasks
-        self._consolidation_locks: weakref.WeakValueDictionary[str, asyncio.Lock] = weakref.WeakValueDictionary()
+        self._consolidation_locks: weakref.WeakValueDictionary[str, asyncio.Lock] = (
+            weakref.WeakValueDictionary()
+        )
         self._active_tasks: dict[str, list[asyncio.Task]] = {}  # session_key -> tasks
         self._processing_lock = asyncio.Lock()
         self._register_default_tools()
@@ -117,13 +121,15 @@ class AgentLoop:
         allowed_dir = self.workspace if self.restrict_to_workspace else None
         for cls in (ReadFileTool, WriteFileTool, EditFileTool, ListDirTool):
             self.tools.register(cls(workspace=self.workspace, allowed_dir=allowed_dir))
-        self.tools.register(ExecTool(
-            working_dir=str(self.workspace),
-            timeout=self.exec_config.timeout,
-            restrict_to_workspace=self.restrict_to_workspace,
-            path_append=self.exec_config.path_append,
-        ))
-        self.tools.register(WebSearchTool(api_key=self.brave_api_key, proxy=self.web_proxy))
+        self.tools.register(
+            ExecTool(
+                working_dir=str(self.workspace),
+                timeout=self.exec_config.timeout,
+                restrict_to_workspace=self.restrict_to_workspace,
+                path_append=self.exec_config.path_append,
+            )
+        )
+        self.tools.register(WebSearchTool(config=self.web_search_config, proxy=self.web_proxy))
         self.tools.register(WebFetchTool(proxy=self.web_proxy))
         self.tools.register(MessageTool(send_callback=self.bus.publish_outbound))
         self.tools.register(SpawnTool(manager=self.subagents))
@@ -136,6 +142,7 @@ class AgentLoop:
             return
         self._mcp_connecting = True
         from nanobot.agent.tools.mcp import connect_mcp_servers
+
         try:
             self._mcp_stack = AsyncExitStack()
             await self._mcp_stack.__aenter__()
@@ -169,12 +176,14 @@ class AgentLoop:
     @staticmethod
     def _tool_hint(tool_calls: list) -> str:
         """Format tool calls as concise hint, e.g. 'web_search("query")'."""
+
         def _fmt(tc):
             args = (tc.arguments[0] if isinstance(tc.arguments, list) else tc.arguments) or {}
             val = next(iter(args.values()), None) if isinstance(args, dict) else None
             if not isinstance(val, str):
                 return tc.name
             return f'{tc.name}("{val[:40]}")' if len(val) > 40 else f'{tc.name}("{val}")'
+
         return ", ".join(_fmt(tc) for tc in tool_calls)
 
     async def _run_agent_loop(
@@ -213,13 +222,15 @@ class AgentLoop:
                         "type": "function",
                         "function": {
                             "name": tc.name,
-                            "arguments": json.dumps(tc.arguments, ensure_ascii=False)
-                        }
+                            "arguments": json.dumps(tc.arguments, ensure_ascii=False),
+                        },
                     }
                     for tc in response.tool_calls
                 ]
                 messages = self.context.add_assistant_message(
-                    messages, response.content, tool_call_dicts,
+                    messages,
+                    response.content,
+                    tool_call_dicts,
                     reasoning_content=response.reasoning_content,
                     thinking_blocks=response.thinking_blocks,
                 )
@@ -241,7 +252,9 @@ class AgentLoop:
                     final_content = clean or "Sorry, I encountered an error calling the AI model."
                     break
                 messages = self.context.add_assistant_message(
-                    messages, clean, reasoning_content=response.reasoning_content,
+                    messages,
+                    clean,
+                    reasoning_content=response.reasoning_content,
                     thinking_blocks=response.thinking_blocks,
                 )
                 final_content = clean
@@ -273,7 +286,12 @@ class AgentLoop:
         else:
             task = asyncio.create_task(self._dispatch(msg))
             self._active_tasks.setdefault(msg.session_key, []).append(task)
-            task.add_done_callback(lambda t, k=msg.session_key: self._active_tasks.get(k, []) and self._active_tasks[k].remove(t) if t in self._active_tasks.get(k, []) else None)
+            task.add_done_callback(
+                lambda t, k=msg.session_key: self._active_tasks.get(k, [])
+                and self._active_tasks[k].remove(t)
+                if t in self._active_tasks.get(k, [])
+                else None
+            )
 
     async def _handle_stop(self, msg: InboundMessage) -> None:
         """Cancel all active tasks and subagents for the session."""
@@ -287,9 +305,13 @@ class AgentLoop:
         sub_cancelled = await self.subagents.cancel_by_session(msg.session_key)
         total = cancelled + sub_cancelled
         content = f"⏹ Stopped {total} task(s)." if total else "No active task to stop."
-        await self.bus.publish_outbound(OutboundMessage(
-            channel=msg.channel, chat_id=msg.chat_id, content=content,
-        ))
+        await self.bus.publish_outbound(
+            OutboundMessage(
+                channel=msg.channel,
+                chat_id=msg.chat_id,
+                content=content,
+            )
+        )
 
     async def _dispatch(self, msg: InboundMessage) -> None:
         """Process a message under the global lock."""
@@ -299,19 +321,26 @@ class AgentLoop:
             if response is not None:
                 await self.bus.publish_outbound(response)
             elif msg.channel == "cli":
-                await self.bus.publish_outbound(OutboundMessage(
-                    channel=msg.channel, chat_id=msg.chat_id,
-                    content="", metadata=msg.metadata or {},
-                ))
+                await self.bus.publish_outbound(
+                    OutboundMessage(
+                        channel=msg.channel,
+                        chat_id=msg.chat_id,
+                        content="",
+                        metadata=msg.metadata or {},
+                    )
+                )
         except asyncio.CancelledError:
             logger.info("Task cancelled for session {}", msg.session_key)
             raise
         except Exception:
             logger.exception("Error processing message for session {}", msg.session_key)
-            await self.bus.publish_outbound(OutboundMessage(
-                channel=msg.channel, chat_id=msg.chat_id,
-                content="Sorry, I encountered an error.",
-            ))
+            await self.bus.publish_outbound(
+                OutboundMessage(
+                    channel=msg.channel,
+                    chat_id=msg.chat_id,
+                    content="Sorry, I encountered an error.",
+                )
+            )
 
     async def close_mcp(self) -> None:
         """Close MCP connections."""
@@ -336,8 +365,9 @@ class AgentLoop:
         """Process a single inbound message and return the response."""
         # System messages: parse origin from chat_id ("channel:chat_id")
         if msg.channel == "system":
-            channel, chat_id = (msg.chat_id.split(":", 1) if ":" in msg.chat_id
-                                else ("cli", msg.chat_id))
+            channel, chat_id = (
+                msg.chat_id.split(":", 1) if ":" in msg.chat_id else ("cli", msg.chat_id)
+            )
             logger.info("Processing system message from {}", msg.sender_id)
             key = f"{channel}:{chat_id}"
             session = self.sessions.get_or_create(key)
@@ -345,13 +375,18 @@ class AgentLoop:
             history = session.get_history(max_messages=self.memory_window)
             messages = self.context.build_messages(
                 history=history,
-                current_message=msg.content, channel=channel, chat_id=chat_id,
+                current_message=msg.content,
+                channel=channel,
+                chat_id=chat_id,
             )
             final_content, _, all_msgs = await self._run_agent_loop(messages)
             self._save_turn(session, all_msgs, 1 + len(history))
             self.sessions.save(session)
-            return OutboundMessage(channel=channel, chat_id=chat_id,
-                                   content=final_content or "Background task completed.")
+            return OutboundMessage(
+                channel=channel,
+                chat_id=chat_id,
+                content=final_content or "Background task completed.",
+            )
 
         preview = msg.content[:80] + "..." if len(msg.content) > 80 else msg.content
         logger.info("Processing message from {}:{}: {}", msg.channel, msg.sender_id, preview)
@@ -372,13 +407,15 @@ class AgentLoop:
                     temp.messages = list(snapshot)
                     if not await self._consolidate_memory(temp, archive_all=True):
                         return OutboundMessage(
-                            channel=msg.channel, chat_id=msg.chat_id,
+                            channel=msg.channel,
+                            chat_id=msg.chat_id,
                             content="Memory archival failed, session not cleared. Please try again.",
                         )
                 except Exception:
                     logger.exception("/new archival failed for {}", session.key)
                     return OutboundMessage(
-                        channel=msg.channel, chat_id=msg.chat_id,
+                        channel=msg.channel,
+                        chat_id=msg.chat_id,
                         content="Memory archival failed, session not cleared. Please try again.",
                     )
                 finally:
@@ -387,14 +424,18 @@ class AgentLoop:
             session.clear()
             self.sessions.save(session)
             self.sessions.invalidate(session.key)
-            return OutboundMessage(channel=msg.channel, chat_id=msg.chat_id,
-                                   content="New session started.")
+            return OutboundMessage(
+                channel=msg.channel, chat_id=msg.chat_id, content="New session started."
+            )
 
         if cmd == "/help":
-            return OutboundMessage(channel=msg.channel, chat_id=msg.chat_id,
-                                   content="🐈 nanobot commands:\n/new — Start a new conversation\n/stop — Stop the current task\n/help — Show available commands")
+            return OutboundMessage(
+                channel=msg.channel,
+                chat_id=msg.chat_id,
+                content="🐈 nanobot commands:\n/new — Start a new conversation\n/stop — Stop the current task\n/help — Show available commands",
+            )
 
         unconsolidated = len(session.messages) - session.last_consolidated
-        if (unconsolidated >= self.memory_window and session.key not in self._consolidating):
+        if unconsolidated >= self.memory_window and session.key not in self._consolidating:
             self._consolidating.add(session.key)
             lock = self._consolidation_locks.setdefault(session.key, asyncio.Lock())
@@ -421,19 +462,26 @@ class AgentLoop:
             history=history,
             current_message=msg.content,
             media=msg.media if msg.media else None,
-            channel=msg.channel, chat_id=msg.chat_id,
+            channel=msg.channel,
+            chat_id=msg.chat_id,
         )
 
         async def _bus_progress(content: str, *, tool_hint: bool = False) -> None:
             meta = dict(msg.metadata or {})
             meta["_progress"] = True
             meta["_tool_hint"] = tool_hint
-            await self.bus.publish_outbound(OutboundMessage(
-                channel=msg.channel, chat_id=msg.chat_id, content=content, metadata=meta,
-            ))
+            await self.bus.publish_outbound(
+                OutboundMessage(
+                    channel=msg.channel,
+                    chat_id=msg.chat_id,
+                    content=content,
+                    metadata=meta,
+                )
+            )
 
         final_content, _, all_msgs = await self._run_agent_loop(
-            initial_messages, on_progress=on_progress or _bus_progress,
+            initial_messages,
+            on_progress=on_progress or _bus_progress,
         )
 
         if final_content is None:
@@ -448,22 +496,31 @@ class AgentLoop:
         preview = final_content[:120] + "..." if len(final_content) > 120 else final_content
         logger.info("Response to {}:{}: {}", msg.channel, msg.sender_id, preview)
         return OutboundMessage(
-            channel=msg.channel, chat_id=msg.chat_id, content=final_content,
+            channel=msg.channel,
+            chat_id=msg.chat_id,
+            content=final_content,
             metadata=msg.metadata or {},
         )
 
     def _save_turn(self, session: Session, messages: list[dict], skip: int) -> None:
         """Save new-turn messages into session, truncating large tool results."""
         from datetime import datetime
 
         for m in messages[skip:]:
             entry = dict(m)
             role, content = entry.get("role"), entry.get("content")
             if role == "assistant" and not content and not entry.get("tool_calls"):
                 continue  # skip empty assistant messages — they poison session context
-            if role == "tool" and isinstance(content, str) and len(content) > self._TOOL_RESULT_MAX_CHARS:
+            if (
+                role == "tool"
+                and isinstance(content, str)
+                and len(content) > self._TOOL_RESULT_MAX_CHARS
+            ):
                 entry["content"] = content[: self._TOOL_RESULT_MAX_CHARS] + "\n... (truncated)"
             elif role == "user":
-                if isinstance(content, str) and content.startswith(ContextBuilder._RUNTIME_CONTEXT_TAG):
+                if isinstance(content, str) and content.startswith(
+                    ContextBuilder._RUNTIME_CONTEXT_TAG
+                ):
                     # Strip the runtime-context prefix, keep only the user text.
                     parts = content.split("\n\n", 1)
                     if len(parts) > 1 and parts[1].strip():
@@ -473,10 +530,15 @@ class AgentLoop:
                 if isinstance(content, list):
                     filtered = []
                     for c in content:
-                        if c.get("type") == "text" and isinstance(c.get("text"), str) and c["text"].startswith(ContextBuilder._RUNTIME_CONTEXT_TAG):
+                        if (
+                            c.get("type") == "text"
+                            and isinstance(c.get("text"), str)
+                            and c["text"].startswith(ContextBuilder._RUNTIME_CONTEXT_TAG)
+                        ):
                             continue  # Strip runtime context from multimodal messages
-                        if (c.get("type") == "image_url"
-                                and c.get("image_url", {}).get("url", "").startswith("data:image/")):
+                        if c.get("type") == "image_url" and c.get("image_url", {}).get(
+                            "url", ""
+                        ).startswith("data:image/"):
                             filtered.append({"type": "text", "text": "[image]"})
                         else:
                             filtered.append(c)
@@ -490,8 +552,11 @@ class AgentLoop:
     async def _consolidate_memory(self, session, archive_all: bool = False) -> bool:
         """Delegate to MemoryStore.consolidate(). Returns True on success."""
         return await MemoryStore(self.workspace).consolidate(
-            session, self.provider, self.model,
-            archive_all=archive_all, memory_window=self.memory_window,
+            session,
+            self.provider,
+            self.model,
+            archive_all=archive_all,
+            memory_window=self.memory_window,
        )
 
     async def process_direct(
@@ -505,5 +570,7 @@ class AgentLoop:
         """Process a message directly (for CLI or cron usage)."""
         await self._connect_mcp()
         msg = InboundMessage(channel=channel, sender_id="user", chat_id=chat_id, content=content)
-        response = await self._process_message(msg, session_key=session_key, on_progress=on_progress)
+        response = await self._process_message(
+            msg, session_key=session_key, on_progress=on_progress
+        )
         return response.content if response else ""


@@ -30,12 +30,12 @@ class SubagentManager:
         temperature: float = 0.7,
         max_tokens: int = 4096,
         reasoning_effort: str | None = None,
-        brave_api_key: str | None = None,
+        web_search_config: "WebSearchConfig | None" = None,
         web_proxy: str | None = None,
         exec_config: "ExecToolConfig | None" = None,
         restrict_to_workspace: bool = False,
     ):
-        from nanobot.config.schema import ExecToolConfig
+        from nanobot.config.schema import ExecToolConfig, WebSearchConfig
 
         self.provider = provider
         self.workspace = workspace
         self.bus = bus
@@ -43,8 +43,8 @@ class SubagentManager:
         self.temperature = temperature
         self.max_tokens = max_tokens
         self.reasoning_effort = reasoning_effort
-        self.brave_api_key = brave_api_key
         self.web_proxy = web_proxy
+        self.web_search_config = web_search_config or WebSearchConfig()
         self.exec_config = exec_config or ExecToolConfig()
         self.restrict_to_workspace = restrict_to_workspace
         self._running_tasks: dict[str, asyncio.Task[None]] = {}
@@ -106,7 +106,7 @@ class SubagentManager:
             restrict_to_workspace=self.restrict_to_workspace,
             path_append=self.exec_config.path_append,
         ))
-        tools.register(WebSearchTool(api_key=self.brave_api_key, proxy=self.web_proxy))
+        tools.register(WebSearchTool(config=self.web_search_config, proxy=self.web_proxy))
         tools.register(WebFetchTool(proxy=self.web_proxy))
 
         system_prompt = self._build_subagent_prompt()


@@ -1,13 +1,16 @@
"""Web tools: web_search and web_fetch.""" """Web tools: web_search and web_fetch."""
import asyncio
import html import html
import json import json
import os import os
import re import re
from collections.abc import Awaitable, Callable
from typing import Any from typing import Any
from urllib.parse import urlparse from urllib.parse import urlparse
import httpx import httpx
from ddgs import DDGS
from loguru import logger from loguru import logger
from nanobot.agent.tools.base import Tool from nanobot.agent.tools.base import Tool
@@ -44,8 +47,22 @@ def _validate_url(url: str) -> tuple[bool, str]:
return False, str(e) return False, str(e)
def _format_results(query: str, items: list[dict[str, Any]], n: int) -> str:
"""Format provider results into a shared plaintext output."""
if not items:
return f"No results for: {query}"
lines = [f"Results for: {query}\n"]
for i, item in enumerate(items[:n], 1):
title = _normalize(_strip_tags(item.get('title', '')))
snippet = _normalize(_strip_tags(item.get('content', '')))
lines.append(f"{i}. {title}\n {item.get('url', '')}")
if snippet:
lines.append(f" {snippet}")
return "\n".join(lines)
class WebSearchTool(Tool): class WebSearchTool(Tool):
"""Search the web using Brave Search API.""" """Search the web using configured provider."""
name = "web_search" name = "web_search"
description = "Search the web. Returns titles, URLs, and snippets." description = "Search the web. Returns titles, URLs, and snippets."
@@ -58,49 +75,133 @@ class WebSearchTool(Tool):
"required": ["query"] "required": ["query"]
} }
def __init__(self, api_key: str | None = None, max_results: int = 5, proxy: str | None = None): def __init__(
self._init_api_key = api_key self,
self.max_results = max_results config: "WebSearchConfig | None" = None,
self.proxy = proxy transport: httpx.AsyncBaseTransport | None = None,
ddgs_factory: Callable[[], DDGS] | None = None,
proxy: str | None = None,
):
from nanobot.config.schema import WebSearchConfig
@property self.config = config if config is not None else WebSearchConfig()
def api_key(self) -> str: self._transport = transport
"""Resolve API key at call time so env/config changes are picked up.""" self._ddgs_factory = ddgs_factory or (lambda: DDGS(timeout=10))
return self._init_api_key or os.environ.get("BRAVE_API_KEY", "") self.proxy = proxy
self._provider_dispatch: dict[str, Callable[[str, int], Awaitable[str]]] = {
"duckduckgo": self._search_duckduckgo,
"tavily": self._search_tavily,
"searxng": self._search_searxng,
"brave": self._search_brave,
}
async def execute(self, query: str, count: int | None = None, **kwargs: Any) -> str: async def execute(self, query: str, count: int | None = None, **kwargs: Any) -> str:
if not self.api_key: provider = (self.config.provider or "brave").strip().lower()
return ( n = min(max(count or self.config.max_results, 1), 10)
"Error: Brave Search API key not configured. Set it in "
"~/.nanobot/config.json under tools.web.search.apiKey " search = self._provider_dispatch.get(provider)
"(or export BRAVE_API_KEY), then restart the gateway." if search is None:
) return f"Error: unknown search provider '{provider}'"
return await search(query, n)
async def _fallback_to_duckduckgo(self, missing_key: str, query: str, n: int) -> str:
logger.warning("Falling back to DuckDuckGo: {} not configured", missing_key)
ddg = await self._search_duckduckgo(query=query, n=n)
if ddg.startswith('Error:'):
return ddg
return f'Using DuckDuckGo fallback ({missing_key} missing).\n\n{ddg}'
async def _search_brave(self, query: str, n: int) -> str:
api_key = self.config.api_key or os.environ.get("BRAVE_API_KEY", "")
if not api_key:
if self.config.fallback_to_duckduckgo:
return await self._fallback_to_duckduckgo('BRAVE_API_KEY', query, n)
return "Error: BRAVE_API_KEY not configured"
try: try:
n = min(max(count or self.max_results, 1), 10) async with httpx.AsyncClient(transport=self._transport, proxy=self.proxy) as client:
logger.debug("WebSearch: {}", "proxy enabled" if self.proxy else "direct connection")
async with httpx.AsyncClient(proxy=self.proxy) as client:
r = await client.get( r = await client.get(
"https://api.search.brave.com/res/v1/web/search", "https://api.search.brave.com/res/v1/web/search",
params={"q": query, "count": n}, params={"q": query, "count": n},
headers={"Accept": "application/json", "X-Subscription-Token": self.api_key}, headers={"Accept": "application/json", "X-Subscription-Token": api_key},
timeout=10.0 timeout=10.0,
) )
r.raise_for_status() r.raise_for_status()
results = r.json().get("web", {}).get("results", [])[:n] items = [{"title": x.get("title", ""), "url": x.get("url", ""),
if not results: "content": x.get("description", "")}
for x in r.json().get("web", {}).get("results", [])]
return _format_results(query, items, n)
except Exception as e:
return f"Error: {e}"
async def _search_tavily(self, query: str, n: int) -> str:
api_key = self.config.api_key or os.environ.get("TAVILY_API_KEY", "")
if not api_key:
if self.config.fallback_to_duckduckgo:
return await self._fallback_to_duckduckgo('TAVILY_API_KEY', query, n)
return "Error: TAVILY_API_KEY not configured"
try:
async with httpx.AsyncClient(transport=self._transport, proxy=self.proxy) as client:
r = await client.post(
"https://api.tavily.com/search",
headers={"Authorization": f"Bearer {api_key}"},
json={"query": query, "max_results": n},
timeout=15.0,
)
r.raise_for_status()
results = r.json().get("results", [])
return _format_results(query, results, n)
except Exception as e:
return f"Error: {e}"
async def _search_duckduckgo(self, query: str, n: int) -> str:
try:
ddgs = self._ddgs_factory()
raw_results = await asyncio.to_thread(ddgs.text, query, max_results=n)
if not raw_results:
return f"No results for: {query}"
items = [
{
"title": result.get("title", ""),
"url": result.get("href", ""),
"content": result.get("body", ""),
}
for result in raw_results
]
return _format_results(query, items, n)
except Exception as e:
logger.warning("DuckDuckGo search failed: {}", e)
return f"Error: DuckDuckGo search failed ({e})"
async def _search_searxng(self, query: str, n: int) -> str:
base_url = (self.config.base_url or os.environ.get("SEARXNG_BASE_URL", "")).strip()
if not base_url:
if self.config.fallback_to_duckduckgo:
return await self._fallback_to_duckduckgo('SEARXNG_BASE_URL', query, n)
return "Error: SEARXNG_BASE_URL not configured"
endpoint = f"{base_url.rstrip('/')}/search"
is_valid, error_msg = _validate_url(endpoint)
if not is_valid:
return f"Error: invalid SearXNG URL: {error_msg}"
try:
async with httpx.AsyncClient(transport=self._transport, proxy=self.proxy) as client:
r = await client.get(
endpoint,
params={"q": query, "format": "json"},
headers={"User-Agent": USER_AGENT},
timeout=10.0,
)
r.raise_for_status()
results = r.json().get("results", [])
return _format_results(query, results, n)
except Exception as e:
logger.error("WebSearch error: {}", e)
return f"Error: {e}"
@@ -157,7 +258,8 @@ class WebFetchTool(Tool):
text, extractor = r.text, "raw"
truncated = len(text) > max_chars
if truncated:
text = text[:max_chars]
return json.dumps({"url": url, "finalUrl": str(r.url), "status": r.status_code,
"extractor": extractor, "truncated": truncated, "length": len(text), "text": text}, ensure_ascii=False)
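The `_search_*` coroutines above are selected through a provider-name → coroutine map (`_provider_dispatch`), which is also how `register_provider()` lets plugins add or overwrite entries. A toy sketch of that dispatch pattern — class and method names here are illustrative, not the tool's actual API:

```python
import asyncio
from collections.abc import Awaitable, Callable

# A search provider takes (query, result_count) and returns formatted text.
SearchFn = Callable[[str, int], Awaitable[str]]

class MiniSearch:
    def __init__(self) -> None:
        # Built-in providers registered under their config names.
        self._dispatch: dict[str, SearchFn] = {"brave": self._brave}

    async def _brave(self, query: str, n: int) -> str:
        return f"brave:{query}:{n}"

    def register_provider(self, name: str, fn: SearchFn) -> None:
        # Plugins may add new names or overwrite a built-in.
        self._dispatch[name] = fn

    async def execute(self, query: str, n: int, provider: str = "") -> str:
        # Empty provider falls back to the default; unknown names are
        # an explicit error rather than a silent no-op.
        fn = self._dispatch.get(provider or "brave")
        if fn is None:
            return f"Error: unknown search provider '{provider}'"
        return await fn(query, n)

async def _custom(query: str, n: int) -> str:
    return f"custom:{query}:{n}"

tool = MiniSearch()
tool.register_provider("custom", _custom)
assert asyncio.run(tool.execute("nanobot", 2, provider="custom")) == "custom:nanobot:2"
assert asyncio.run(tool.execute("nanobot", 1)) == "brave:nanobot:1"
assert "unknown search provider" in asyncio.run(tool.execute("x", 1, provider="google"))
```

Keeping the map a plain dict is what makes the overwrite and unknown-provider tests below trivial to write.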


@@ -332,7 +332,7 @@ def gateway(
max_iterations=config.agents.defaults.max_tool_iterations,
memory_window=config.agents.defaults.memory_window,
reasoning_effort=config.agents.defaults.reasoning_effort,
web_search_config=config.tools.web.search,
web_proxy=config.tools.web.proxy or None,
exec_config=config.tools.exec,
cron_service=cron,
@@ -517,7 +517,7 @@ def agent(
max_iterations=config.agents.defaults.max_tool_iterations,
memory_window=config.agents.defaults.memory_window,
reasoning_effort=config.agents.defaults.reasoning_effort,
web_search_config=config.tools.web.search,
web_proxy=config.tools.web.proxy or None,
exec_config=config.tools.exec,
cron_service=cron,


@@ -288,7 +288,10 @@ class GatewayConfig(Base):
class WebSearchConfig(Base):
"""Web search tool configuration."""
provider: str = ""  # brave, tavily, searxng, duckduckgo (empty = brave)
api_key: str = ""  # API key for selected provider
base_url: str = ""  # Base URL (SearXNG)
fallback_to_duckduckgo: bool = True
max_results: int = 5
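These fields resolve at call time: an explicit `api_key` or `base_url` wins, otherwise the provider's environment variable is consulted, and an empty result triggers the DuckDuckGo fallback. A minimal standalone sketch of that resolution order (the helper name is hypothetical; the tool inlines this as `self.config.api_key or os.environ.get(...)`):

```python
import os

def resolve_credential(config_value: str, env_var: str) -> str:
    """Return the configured value if set, else the environment fallback."""
    return config_value or os.environ.get(env_var, "")

# Explicit config wins over the environment.
os.environ["DEMO_BRAVE_API_KEY"] = "env-key"
assert resolve_credential("cfg-key", "DEMO_BRAVE_API_KEY") == "cfg-key"
# Empty config falls through to the env var.
assert resolve_credential("", "DEMO_BRAVE_API_KEY") == "env-key"
# Neither set -> empty string, which triggers the DuckDuckGo fallback path.
del os.environ["DEMO_BRAVE_API_KEY"]
assert resolve_credential("", "DEMO_BRAVE_API_KEY") == ""
```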


@@ -24,6 +24,7 @@ dependencies = [
"websockets>=16.0,<17.0",
"websocket-client>=1.9.0,<2.0.0",
"httpx>=0.28.0,<1.0.0",
"ddgs>=9.5.5,<10.0.0",
"oauth-cli-kit>=0.1.3,<1.0.0",
"loguru>=0.7.3,<1.0.0",
"readability-lxml>=0.8.4,<1.0.0",


@@ -1,8 +1,10 @@
from typing import Any
from nanobot.agent.tools.web import WebSearchTool
from nanobot.agent.tools.base import Tool
from nanobot.agent.tools.registry import ToolRegistry
from nanobot.agent.tools.shell import ExecTool
from nanobot.config.schema import WebSearchConfig
class SampleTool(Tool):
@@ -337,3 +339,16 @@ def test_cast_params_single_value_not_auto_wrapped_to_array() -> None:
assert result["items"] == 5  # Not wrapped to [5]
result = tool.cast_params({"items": "text"})
assert result["items"] == "text"  # Not wrapped to ["text"]
async def test_web_search_no_fallback_returns_provider_error() -> None:
tool = WebSearchTool(
config=WebSearchConfig(
provider="brave",
api_key="",
fallback_to_duckduckgo=False,
)
)
result = await tool.execute(query="fallback", count=1)
assert result == "Error: BRAVE_API_KEY not configured"


@@ -0,0 +1,327 @@
import httpx
import pytest
from collections.abc import Callable
from typing import Literal
from nanobot.agent.tools.web import WebSearchTool
from nanobot.config.schema import WebSearchConfig
def _tool(config: WebSearchConfig, handler) -> WebSearchTool:
return WebSearchTool(config=config, transport=httpx.MockTransport(handler))
def _assert_tavily_request(request: httpx.Request) -> bool:
return (
request.method == "POST"
and str(request.url) == "https://api.tavily.com/search"
and request.headers.get("authorization") == "Bearer tavily-key"
and '"query":"openclaw"' in request.read().decode("utf-8")
)
@pytest.mark.asyncio
@pytest.mark.parametrize(
("provider", "config_kwargs", "query", "count", "assert_request", "response", "assert_text"),
[
(
"brave",
{"api_key": "brave-key"},
"nanobot",
1,
lambda request: (
request.method == "GET"
and str(request.url)
== "https://api.search.brave.com/res/v1/web/search?q=nanobot&count=1"
and request.headers["X-Subscription-Token"] == "brave-key"
),
httpx.Response(
200,
json={
"web": {
"results": [
{
"title": "NanoBot",
"url": "https://example.com/nanobot",
"description": "Ultra-lightweight assistant",
}
]
}
},
),
["Results for: nanobot", "1. NanoBot", "https://example.com/nanobot"],
),
(
"tavily",
{"api_key": "tavily-key"},
"openclaw",
2,
_assert_tavily_request,
httpx.Response(
200,
json={
"results": [
{
"title": "OpenClaw",
"url": "https://example.com/openclaw",
"content": "Plugin-based assistant framework",
}
]
},
),
["Results for: openclaw", "1. OpenClaw", "https://example.com/openclaw"],
),
(
"searxng",
{"base_url": "https://searx.example"},
"nanobot",
1,
lambda request: (
request.method == "GET"
and str(request.url) == "https://searx.example/search?q=nanobot&format=json"
),
httpx.Response(
200,
json={
"results": [
{
"title": "nanobot docs",
"url": "https://example.com/nanobot",
"content": "Lightweight assistant docs",
}
]
},
),
["Results for: nanobot", "1. nanobot docs", "https://example.com/nanobot"],
),
],
)
async def test_web_search_provider_formats_results(
provider: Literal["brave", "tavily", "searxng"],
config_kwargs: dict,
query: str,
count: int,
assert_request: Callable[[httpx.Request], bool],
response: httpx.Response,
assert_text: list[str],
) -> None:
def handler(request: httpx.Request) -> httpx.Response:
assert assert_request(request)
return response
tool = _tool(WebSearchConfig(provider=provider, max_results=5, **config_kwargs), handler)
result = await tool.execute(query=query, count=count)
for text in assert_text:
assert text in result
@pytest.mark.asyncio
async def test_web_search_from_legacy_config_works() -> None:
def handler(request: httpx.Request) -> httpx.Response:
return httpx.Response(
200,
json={
"web": {
"results": [
{"title": "Legacy", "url": "https://example.com", "description": "ok"}
]
}
},
)
config = WebSearchConfig(api_key="legacy-key", max_results=3)
tool = WebSearchTool(config=config, transport=httpx.MockTransport(handler))
result = await tool.execute(query="constructor", count=1)
assert "1. Legacy" in result
@pytest.mark.asyncio
@pytest.mark.parametrize(
("provider", "config", "missing_env", "expected_title"),
[
(
"brave",
WebSearchConfig(provider="brave", api_key="", max_results=5),
"BRAVE_API_KEY",
"Fallback Result",
),
(
"tavily",
WebSearchConfig(provider="tavily", api_key="", max_results=5),
"TAVILY_API_KEY",
"Tavily Fallback",
),
],
)
async def test_web_search_missing_key_falls_back_to_duckduckgo(
monkeypatch: pytest.MonkeyPatch,
provider: str,
config: WebSearchConfig,
missing_env: str,
expected_title: str,
) -> None:
monkeypatch.delenv(missing_env, raising=False)
called = False
class FakeDDGS:
def __init__(self, *args, **kwargs):
pass
def text(self, keywords: str, max_results: int):
nonlocal called
called = True
return [
{
"title": expected_title,
"href": f"https://example.com/{provider}-fallback",
"body": "Fallback snippet",
}
]
monkeypatch.setattr("nanobot.agent.tools.web.DDGS", FakeDDGS, raising=False)
result = await WebSearchTool(config=config).execute(query="fallback", count=1)
assert called
assert "Using DuckDuckGo fallback" in result
assert f"1. {expected_title}" in result
@pytest.mark.asyncio
async def test_web_search_brave_missing_key_without_fallback_returns_error(
monkeypatch: pytest.MonkeyPatch,
) -> None:
monkeypatch.delenv("BRAVE_API_KEY", raising=False)
tool = WebSearchTool(
config=WebSearchConfig(
provider="brave",
api_key="",
fallback_to_duckduckgo=False,
)
)
result = await tool.execute(query="fallback", count=1)
assert result == "Error: BRAVE_API_KEY not configured"
@pytest.mark.asyncio
async def test_web_search_searxng_missing_base_url_falls_back_to_duckduckgo() -> None:
tool = WebSearchTool(
config=WebSearchConfig(provider="searxng", base_url="", max_results=5)
)
result = await tool.execute(query="nanobot", count=1)
assert "DuckDuckGo fallback" in result
assert "SEARXNG_BASE_URL" in result
@pytest.mark.asyncio
async def test_web_search_searxng_missing_base_url_no_fallback_returns_error() -> None:
tool = WebSearchTool(
config=WebSearchConfig(
provider="searxng", base_url="",
fallback_to_duckduckgo=False, max_results=5,
)
)
result = await tool.execute(query="nanobot", count=1)
assert result == "Error: SEARXNG_BASE_URL not configured"
@pytest.mark.asyncio
async def test_web_search_searxng_uses_env_base_url(
monkeypatch: pytest.MonkeyPatch,
) -> None:
monkeypatch.setenv("SEARXNG_BASE_URL", "https://searx.env")
def handler(request: httpx.Request) -> httpx.Response:
assert request.method == "GET"
assert str(request.url) == "https://searx.env/search?q=nanobot&format=json"
return httpx.Response(
200,
json={
"results": [
{
"title": "env result",
"url": "https://example.com/env",
"content": "from env",
}
]
},
)
config = WebSearchConfig(provider="searxng", base_url="", max_results=5)
result = await _tool(config, handler).execute(query="nanobot", count=1)
assert "1. env result" in result
@pytest.mark.asyncio
async def test_web_search_register_custom_provider() -> None:
config = WebSearchConfig(provider="custom", max_results=5)
tool = WebSearchTool(config=config)
async def _custom_provider(query: str, n: int) -> str:
return f"custom:{query}:{n}"
tool._provider_dispatch["custom"] = _custom_provider
result = await tool.execute(query="nanobot", count=2)
assert result == "custom:nanobot:2"
@pytest.mark.asyncio
async def test_web_search_duckduckgo_uses_injected_ddgs_factory() -> None:
class FakeDDGS:
def text(self, keywords: str, max_results: int):
assert keywords == "nanobot"
assert max_results == 1
return [
{
"title": "NanoBot result",
"href": "https://example.com/nanobot",
"body": "Search content",
}
]
tool = WebSearchTool(
config=WebSearchConfig(provider="duckduckgo", max_results=5),
ddgs_factory=lambda: FakeDDGS(),
)
result = await tool.execute(query="nanobot", count=1)
assert "1. NanoBot result" in result
@pytest.mark.asyncio
async def test_web_search_unknown_provider_returns_error() -> None:
tool = WebSearchTool(
config=WebSearchConfig(provider="google", max_results=5),
)
result = await tool.execute(query="nanobot", count=1)
assert result == "Error: unknown search provider 'google'"
@pytest.mark.asyncio
async def test_web_search_dispatch_dict_overwrites_builtin() -> None:
async def _custom_brave(query: str, n: int) -> str:
return f"custom-brave:{query}:{n}"
tool = WebSearchTool(
config=WebSearchConfig(provider="brave", api_key="key", max_results=5),
)
tool._provider_dispatch["brave"] = _custom_brave
result = await tool.execute(query="nanobot", count=2)
assert result == "custom-brave:nanobot:2"
@pytest.mark.asyncio
async def test_web_search_searxng_rejects_invalid_url() -> None:
tool = WebSearchTool(
config=WebSearchConfig(
provider="searxng",
base_url="ftp://internal.host",
max_results=5,
),
)
result = await tool.execute(query="nanobot", count=1)
assert "Error: invalid SearXNG URL" in result
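The last test relies on `_validate_url` rejecting non-HTTP schemes such as `ftp://`. A plausible sketch of such a check, assuming a simple scheme-and-host validation — the real implementation may differ:

```python
from urllib.parse import urlparse

def validate_url(url: str) -> tuple[bool, str]:
    """Accept only absolute http(s) URLs with a host; return (ok, error_msg)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("http", "https"):
        return False, f"unsupported scheme '{parsed.scheme}'"
    if not parsed.netloc:
        return False, "missing host"
    return True, ""

assert validate_url("https://searx.example/search") == (True, "")
ok, msg = validate_url("ftp://internal.host/search")
assert not ok and "scheme" in msg
```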