Replace fire-and-forget consolidation with archive_messages(), which
retries until the raw-dump fallback triggers — making it effectively
infallible. /new now clears the session immediately and archives in
the background. Pending archive tasks are drained on shutdown via
close_mcp() so no data is lost on process exit.
PR #881 (commit 755e424) fixed the race condition between normal consolidation
and /new consolidation, but did so by making /new wait for consolidation to
complete before returning. This hurts user experience - /new should be instant.
This PR restores the original immediate-return behavior while keeping safety:
1. **Immediate return**: Session clears and user sees "New session started" right away
2. **Background archival**: Consolidation runs in background via asyncio.create_task
3. **Serialized consolidation**: Uses the same lock as normal consolidation via
`memory_consolidator.get_lock()` to prevent concurrent writes
If consolidation fails after session clear, archived messages may be lost.
This is acceptable because:
- User already sees the new session and can continue working
- Failure is logged for debugging
- The alternative (blocking /new on every call) hurts UX for all users
Instead of adding a separate load_skill tool to bypass workspace restrictions,
extend ReadFileTool with extra_allowed_dirs so it can read builtin skill paths
while keeping write/edit tools locked to the workspace. Fixes the original issue
for both main agent and subagents.
Made-with: Cursor
When restrictToWorkspace is enabled, the agent cannot read builtin skill
files via read_file since they live outside the workspace. This adds a
dedicated load_skill tool that reads skills by name through the SkillsLoader,
which accesses files directly via Python without the workspace restriction.
- Add LoadSkillTool to filesystem tools
- Register it in the agent loop
- Update system prompt to instruct agent to use load_skill instead of read_file
- Remove raw filesystem paths from skills summary
- Enhance _strip_think to handle stray tags:
* Remove unmatched closing tags (</think>)
* Remove incomplete blocks (<think> ... to end of string)
- Apply _strip_think to tool hint messages as well
- Prevents blank/parse errors from showing </think> in chat outputs
Fixes issue with empty </think> appearing in Feishu tool call cards and other messages.
Implement asynchronous memory consolidation that runs in the background when
sessions are idle, instead of blocking user interactions after each message.
Changes:
- MemoryConsolidator: Add background task management with idle detection
* Track session activity timestamps
* Background loop checks idle sessions every 30s
* Consolidation triggers only when session idle > 60s
- AgentLoop: Integrate background task lifecycle
* Start consolidation task when loop starts
* Stop gracefully on shutdown
* Record activity on each message
- Refactor maybe_consolidate_by_tokens: Keep sync API but schedule async
- Add debug logging for consolidation completion
Benefits:
- Non-blocking: Users no longer wait for consolidation after responses
- Efficient: Only consolidate idle sessions, avoiding redundant work
- Scalable: Background task can process multiple sessions efficiently
- Backward compatible: Existing API unchanged
Tests: 11 new tests covering background task lifecycle, idle detection,
scheduling, and error handling. All passing.
🤖 Generated with Claude Code
On Windows, sys.argv[0] may be just "nanobot" without full path when
running from PATH. os.execv() doesn't search PATH, causing restart to
fail with "No such file or directory".
Fix by using `python -m nanobot` instead of relying on sys.argv[0].
Fixes#1937
Fix issue #1823: Memory consolidation does not inherit agent temperature
and maxTokens configuration.
The agent's configured generation parameters were not being passed through
to the memory consolidation call, causing it to fall back to default values.
This resulted in the consolidation response being truncated before the
save_memory tool call was emitted.
- Pass temperature, max_tokens, reasoning_effort from AgentLoop to
MemoryConsolidator and then to MemoryStore.consolidate()
- Forward these parameters to the provider.chat_with_retry() call
Fixes#1823
Move consolidation policy into MemoryConsolidator, keep backward compatibility for legacy config, and compress history by token budget instead of message count.
Major changes:
- Replace message-count-based memory window with token-budget-based compression
- Add max_tokens_input, compression_start_ratio, compression_target_ratio config
- Implement _maybe_compress_history() that triggers based on prompt token usage
- Use _build_compressed_history_view() to provide compressed history to LLM
- Refactor MemoryStore.consolidate() -> consolidate_chunk() for chunk-based compression
- Remove last_consolidated from Session, use _compressed_until metadata instead
- Add background compression scheduling to avoid blocking message processing
Key improvements:
- Compression now based on actual token usage, not arbitrary message counts
- Better handling of long conversations with large context windows
- Non-destructive compression: old messages remain in session, but excluded from prompt
- Automatic compression when history exceeds configured token thresholds
provider.chat() had no retry logic — a transient 429 rate limit,
502 gateway error, or network timeout would permanently fail the
entire message. For a system running cron jobs and heartbeats 24/7,
even a brief provider blip causes lost tasks.
Adds _chat_with_retry() that:
- Retries up to 3 times with 1s/2s/4s exponential backoff
- Only retries transient errors (429, 5xx, timeout, connection)
- Returns immediately on permanent errors (400, 401, etc.)
- Falls through to the final attempt if all retries exhaust
Some LLM providers (Minimax, Dashscope) strictly reject consecutive
messages with the same role. build_messages() was emitting two separate
user messages back-to-back: the runtime context and the actual user
content.
Merge them into a single user message, handling both plain text and
multimodal (image) content. Update _save_turn() to strip the runtime
context prefix from the merged message when persisting to session
history.
Fixes#1414Fixes#1344