OpenManus v2 upgrade #1366
Open
MohamedElsaeidy wants to merge 70 commits into
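1. Do not instantiate or run any Agent directly in the entry file.
2. If logic such as run_task / run_agent / main_execute exists, comment it out or remove it.
3. The entry file keeps only service-startup or placeholder logic.
4. Add no new features and change no Agent behavior.
5. The code must remain importable; it does not need to be executable.
Modify only entry-related files; do not change agent, tool, or memory code.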
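Requirements:
1. A Task must contain:
   - id (string)
   - status (CREATED / RUNNING / DONE / FAILED / INTERRUPTED)
   - interrupt_flag (bool)
   - event_queue (thread-safe or asyncio-compatible)
2. Provide the methods:
   - emit(type: str, data: Any)
   - interrupt()
   - is_interrupted()
3. No dependency on FastAPI / WebSocket.
4. Do not modify any existing code.
5. Keep the Task design as generic as possible so that agents / tools can use it later.
Only add new files; do not change existing files.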
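A minimal sketch of a Task that meets these requirements, using only the standard library. The `TaskStatus` enum, the use of `threading.Event` behind `interrupt_flag`, and the event dictionary layout are assumptions made for illustration, not the PR's actual implementation.

```python
import queue
import threading
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class TaskStatus(str, Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    INTERRUPTED = "INTERRUPTED"


@dataclass
class Task:
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: TaskStatus = TaskStatus.CREATED
    # queue.Queue is thread-safe; an asyncio.Queue could be swapped in instead.
    event_queue: "queue.Queue[dict]" = field(default_factory=queue.Queue, repr=False)
    # Assumption: a threading.Event backs the boolean interrupt_flag.
    _interrupt: threading.Event = field(
        default_factory=threading.Event, init=False, repr=False
    )

    @property
    def interrupt_flag(self) -> bool:
        return self._interrupt.is_set()

    def emit(self, type: str, data: Any) -> None:
        """Queue an event for whoever is listening (API layer, logger, tests)."""
        self.event_queue.put({"task_id": self.id, "type": type, "data": data})

    def interrupt(self) -> None:
        """Request cooperative cancellation; running code must poll for it."""
        self._interrupt.set()

    def is_interrupted(self) -> bool:
        return self._interrupt.is_set()
```

Because nothing here imports FastAPI or WebSocket code, the same object can be handed to agents, tools, and the server layer alike.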
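Requirements:
1. Change the entry-method signature of every Agent to: run(task: Task, input: Any)
2. Agents must not use print / logging directly as output.
3. All externally visible information must go through: task.emit(event_type, data)
4. Before each major execution step, check: if task.is_interrupted(): raise TaskInterrupted
5. Do not change the Agent's decision logic, tune prompts, or add capabilities.
Modify only Agent-related files.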
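1. The Planner's output must be a structured plan (e.g. list[dict] or a dataclass).
2. While generating the plan, report each step via task.emit(plan.step, step).
3. The Planner executes no tools.
4. When the Planner finishes, emit(plan.done).
Do not modify the Executor; introduce no new model calls.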
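1. The Executor only consumes the plan generated by the Planner.
2. Before executing each step: task.emit(execute.step.start, step)
3. Before calling a tool: task.emit(tool.call, tool_name, args)
4. After the tool finishes: task.emit(tool.result, result)
5. Support interrupt (stop immediately once interrupted).
Do not modify the Planner or the Tool implementations.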
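The executor rules above can be sketched as a plain loop over a structured plan. `RecordingTask`, the plan's `id` / `tool` / `args` keys, and the `tools` mapping are hypothetical names chosen for this example; since emit takes (type, data), the tool name and arguments are packed into one dictionary here.

```python
from typing import Any, Callable, Dict, List, Tuple


class TaskInterrupted(Exception):
    """Raised when the task is interrupted mid-plan."""


class RecordingTask:
    """Hypothetical stand-in for Task that records events in a list."""

    def __init__(self) -> None:
        self.events: List[Tuple[str, Any]] = []
        self.interrupted = False

    def emit(self, type: str, data: Any) -> None:
        self.events.append((type, data))

    def is_interrupted(self) -> bool:
        return self.interrupted


def execute_plan(
    task: RecordingTask,
    plan: List[Dict[str, Any]],
    tools: Dict[str, Callable[..., Any]],
) -> List[Any]:
    """Consume a structured plan, emitting an event around every tool call."""
    results = []
    for step in plan:
        # Checked before each step so an interrupt stops execution promptly.
        if task.is_interrupted():
            raise TaskInterrupted(f"stopped before step {step['id']}")
        task.emit("execute.step.start", step)
        task.emit("tool.call", {"tool_name": step["tool"], "args": step["args"]})
        result = tools[step["tool"]](**step["args"])
        task.emit("tool.result", result)
        results.append(result)
    return results
```

The executor never builds or edits the plan, which keeps the Planner and Executor changes independent of each other.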
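Implement a ToolRunner class as the single entry point for all tool execution.
Provide run(task, tool_name, args) -> ToolResult
Support:
- timeout
- capturing stdout / stderr
- checking task.is_interrupted() during execution
Do not call os.system directly.
Do not modify the business logic of individual tools.
Only add the runner; do not refactor tools.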
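One way such a runner could look for command-line tools, as a sketch under stated assumptions: the `commands` mapping from tool name to argv prefix, the `poll` interval, and the choice to kill the child on interrupt or timeout are all illustrative, and the real runner may dispatch to in-process tools instead.

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ToolResult:
    stdout: str
    stderr: str
    exit_code: int


class ToolRunner:
    def __init__(self, commands: Dict[str, List[str]],
                 timeout: float = 30.0, poll: float = 0.05) -> None:
        self.commands = commands  # hypothetical: tool name -> argv prefix
        self.timeout = timeout
        self.poll = poll

    def run(self, task, tool_name: str, args: List[str]) -> ToolResult:
        proc = subprocess.Popen(
            self.commands[tool_name] + list(args),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # Short waits let us poll the interrupt flag while the tool runs.
                out, err = proc.communicate(timeout=self.poll)
                return ToolResult(out, err, proc.returncode)
            except subprocess.TimeoutExpired:
                if task.is_interrupted() or time.monotonic() >= deadline:
                    proc.kill()
                    out, err = proc.communicate()
                    return ToolResult(out, err, proc.returncode)


# Example registration: run Python snippets through the current interpreter.
runner = ToolRunner({"python": [sys.executable, "-c"]}, timeout=10.0)
```

Retrying `communicate()` after `TimeoutExpired` does not lose buffered output, which is why the loop can poll in short slices.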
1. No direct subprocess.run / os.system calls.
2. Route all calls through ToolRunner.
3. Return a structured ToolResult (stdout / stderr / exit_code).
4. Support interrupt.
Do not change the functional behavior of the shell.
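Implement ContextEngine.build(task, agent_role, step_type)
The context is divided into:
- Hard facts (user goal, current plan)
- Recent events (latest tool output)
- Process summary (string)
Support a context token budget (simple character trimming is enough).
Do not call vector data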
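A rough sketch of the three-part context with a character budget. `TaskContext` and its fields, the `budget_chars` and `max_events` parameters, and the policy of keeping hard facts whole while trimming the rest are assumptions for illustration only.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class TaskContext:
    """Hypothetical per-task state that the engine reads from."""
    goal: str
    plan: List[str] = field(default_factory=list)
    recent_events: List[str] = field(default_factory=list)
    summary: str = ""


class ContextEngine:
    def __init__(self, budget_chars: int = 2000, max_events: int = 5) -> None:
        self.budget_chars = budget_chars
        self.max_events = max_events

    def build(self, task: TaskContext, agent_role: str, step_type: str) -> str:
        # Hard facts: user goal and current plan, kept whole when they fit.
        hard = (
            f"[role={agent_role} step={step_type}]\n"
            f"Goal: {task.goal}\n"
            f"Plan: {'; '.join(task.plan)}"
        )
        # Recent events: only the newest few tool outputs.
        recent = "\n".join(task.recent_events[-self.max_events:])
        rest = f"Recent events:\n{recent}\n\nSummary: {task.summary}"
        # Simple character trimming stands in for a real token budget.
        room = max(self.budget_chars - len(hard) - 2, 0)
        return (hard + "\n\n" + rest[:room])[: self.budget_chars]
```

Trimming from the end means the process summary is the first thing to be cut; a different priority order would only need the sections rearranged.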
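1. Before calling the LLM, every Agent must obtain its context through ContextEngine.build.
2. Agents must not assemble history on their own.
3. Do not modify the prompt content itself; only replace the source of the context.
Change only the prompt-construction code.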
1. Use FastAPI.
2. Provide:
   - POST /tasks (create a task)
   - GET /tasks/{id} (query status)
   - POST /tasks/{id}/interrupt
3. The API calls TaskRegistry.
4. Agents run in a background task.
5. Do not introduce WebSocket.
Only add server/api.py.
1. WebSocket endpoint: /tasks/{id}/stream
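2. Read events from task.event_queue.
3. Push events to the client as JSON.
4. Support an interrupt command sent by the client.
5. Contain no business logic; only forward events.
Do not modify the API layer.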
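The forwarding loop behind such an endpoint can be sketched without committing to a web framework. In this sketch `send_text` is a hypothetical coroutine standing in for the WebSocket send call, a `None` event is an assumed end-of-stream sentinel, and the `{"action": "interrupt"}` message shape is invented for the example.

```python
import asyncio
import json
from typing import Any, Awaitable, Callable


async def forward_events(
    event_queue: "asyncio.Queue[Any]",
    send_text: Callable[[str], Awaitable[None]],
) -> int:
    """Relay queued events to the client as JSON; returns how many were sent."""
    sent = 0
    while True:
        event = await event_queue.get()
        if event is None:  # assumed sentinel: the task has no more events
            return sent
        await send_text(json.dumps(event, ensure_ascii=False, default=str))
        sent += 1


def handle_client_message(task: Any, raw: str) -> bool:
    """Apply a client command; returns True only if an interrupt was requested."""
    try:
        message = json.loads(raw)
    except ValueError:
        return False  # ignore malformed input rather than closing the stream
    if isinstance(message, dict) and message.get("action") == "interrupt":
        task.interrupt()
        return True
    return False
```

Both functions only move data between the queue and the socket, which keeps business logic out of the streaming layer.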
1. TaskInterrupted → status = INTERRUPTED
2. Uncaught exception → status = FAILED
3. Normal completion → status = DONE
4. Every status change emits a task.status event.
Do not change business logic; only add status convergence.
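The convergence rules above amount to a single wrapper around the task body. The sketch below assumes a hypothetical `StatusTask` stand-in and a `run_to_terminal_status` helper name; the real code may re-raise or record the exception rather than returning `None`.

```python
from typing import Any, Callable, List, Tuple


class TaskInterrupted(Exception):
    """Raised by agent code when it notices an interrupt."""


class StatusTask:
    """Hypothetical stand-in for Task: holds a status and records events."""

    def __init__(self) -> None:
        self.status = "CREATED"
        self.events: List[Tuple[str, Any]] = []

    def emit(self, type: str, data: Any) -> None:
        self.events.append((type, data))


def set_status(task: StatusTask, status: str) -> None:
    """Change the status and announce it, so no transition is silent."""
    task.status = status
    task.emit("task.status", status)


def run_to_terminal_status(task: StatusTask,
                           body: Callable[[StatusTask], Any]) -> Any:
    """Run body(task) and leave the task in exactly one terminal status."""
    set_status(task, "RUNNING")
    try:
        result = body(task)
    except TaskInterrupted:
        set_status(task, "INTERRUPTED")
        return None
    except Exception:  # any other uncaught error converges to FAILED
        set_status(task, "FAILED")
        return None
    set_status(task, "DONE")
    return result
```

Routing every transition through `set_status` is what guarantees the fourth rule: a status can never change without a matching task.status event.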
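Define a Task model with SQLAlchemy for PostgreSQL.
Task fields:
- task_id (UUID, primary key)
- status (string)
- input (JSON)
- result (JSON, nullable)
- created_at (timestamp)
- updated_at (timestamp)
Replace the in-memory TaskRegistry with DB operations:
- get_task(task_id)
- create_task(...)
- update_task(task)
Put the ORM model in server/models.py.
Do not modify business logic or the API layer.
Stay compatible with the existing /tasks API.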
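Use Celery as the asynchronous task execution framework.
Create server/celery_app.py and initialize Celery:
- broker: Redis (redis://redis:6379/0)
- backend: PostgreSQL or Redis
Wrap the Task execution logic in a Celery task:
- Accept a task_id and run the original logic.
- On completion, update the database: task.status = COMPLETED
- Write the execution result to task.result.
FastAPI POST /tasks API:
- Create the Task (write to DB).
- Immediately dispatch asynchronous execution through Celery.
FastAPI GET /tasks/{task_id} API:
- Query the database directly.
Do not change existing API paths.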
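1. Modify server/api.py so that API queries and status responses read from PostgreSQL.
2. POST /tasks:
   - Write to the DB.
   - Invoke the Celery task for asynchronous execution.
   - Return task_id and status=CREATED.
3. GET /tasks/{task_id}:
   - Query the DB and return status + result.
4. POST /tasks/{task_id}/interrupt:
   - Mark DB task.status=INTERRUPTED.
   - Optionally send a Celery revoke command to cancel the task.
5. Keep the FastAPI routes unchanged.
6. Do not modify business logic.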
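1. Create a Dockerfile in the project root for the FastAPI web service:
   - Python 3.10-slim
   - Install requirements.txt
   - CMD uvicorn main:app --host 0.0.0.0 --port 8000
2. Create docker-compose.yml:
   - Services:
     a) web: FastAPI
     b) worker: Celery worker, depends on Redis + PostgreSQL
     c) redis: latest official image
     d) postgres: latest official image, with a persistent volume
   - Services share a network.
   - Explicitly expose port 8000.
3. Web and worker use the same codebase.
4. Keep API and DB connection parameters consistent.
5. Do not modify business logic.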
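1. Create .github/workflows/docker-celery-smoke-test.yml
2. CI steps:
   a) Check out the code
   b) Build with docker-compose
   c) Start the containers (web + worker + redis + postgres)
   d) Wait for the services to start
   e) Verify /health returns 200
   f) Verify /docs returns 200
   g) Verify /tasks POST + GET return the correct status
   h) Stop and clean up the containers
3. A failure in any step turns CI red.
4. Do not modify source-code business logic.
5. Docker + Celery + DB must match the PR code.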
…servability, and UI overhaul
…servability, and UI overhaul
… non-actionable model outputs
…nd cleanup minor whitespace
merging tree to main
This was referenced May 14, 2026
added 12 commits
May 16, 2026 04:15
…nd integration health monitoring
…ion for long-term memory components
…pendencies with Python scripts and enhancing diagnostic logging on failure
…se polling attempts
…n lifecycle tracking
…onse on local model failures
…otency, and add diff statistics to chat UI
…recise file modifications
Works well! Great work.
added 14 commits
May 22, 2026 00:26
## Agent Core
- Semantic stuck-loop detection: repeated tool+args signatures (Counter-based)
- Tool-call dedup: seen_sigs set in act() skips identical calls within a step
- _looks_final_response: replaced brittle single-regex with 3-tier signal system
(_STRONG_FINAL_RE, _WEAK_FINAL_RE) — prevents mid-task words from early-finishing
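The stuck-loop and dedup bullets above both hinge on a stable signature for a tool call. The following is an illustrative sketch only: `call_signature`, `StuckLoopDetector`, `dedupe_calls`, and the `threshold` / `window` defaults are hypothetical names and values, not the PR's code.

```python
import json
from collections import Counter, deque
from typing import Any, Deque, Dict, List, Tuple

ToolCall = Tuple[str, Dict[str, Any]]


def call_signature(tool_name: str, args: Dict[str, Any]) -> str:
    """Stable signature: tool name plus key-sorted JSON of the arguments."""
    return f"{tool_name}:{json.dumps(args, sort_keys=True, default=str)}"


class StuckLoopDetector:
    """Flags a loop when one signature repeats too often in recent history."""

    def __init__(self, threshold: int = 3, window: int = 10) -> None:
        self.threshold = threshold
        self.recent: Deque[str] = deque(maxlen=window)

    def record(self, tool_name: str, args: Dict[str, Any]) -> bool:
        sig = call_signature(tool_name, args)
        self.recent.append(sig)
        return Counter(self.recent)[sig] >= self.threshold


def dedupe_calls(calls: List[ToolCall]) -> List[ToolCall]:
    """Drop identical calls within one step, keeping first occurrences in order."""
    seen_sigs = set()
    unique = []
    for name, args in calls:
        sig = call_signature(name, args)
        if sig in seen_sigs:
            continue
        seen_sigs.add(sig)
        unique.append((name, args))
    return unique
```

Sorting the argument keys matters: without it, two calls that differ only in dictionary order would look distinct and slip past both checks.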
## LLM Layer
- Remove stdout-polluting print() from streaming chunks (MCP pipe fix)
- stream_options={include_usage:true} for real completion token counts
- Expanded REASONING_MODELS (o1-mini, o3, o4-mini) + REASONING_EFFORT_MODELS
- CLAUDE_THINKING_MODELS + extended thinking block injection in ask_tool()
- LM-Studio dynamic capability probe: _probe_local_server_caps() reads
reasoning/vision flags from /api/v0/models at init, falls back to Ollama
- enable_thinking: Optional[bool] in LLMSettings + config.example.toml docs
- ensure_user_query: non-mutating (list spread instead of .append())
## Tool System
- ToolResult: exit_code, metadata, is_error property (structured results)
- BaseTool: parallel_safe, can_retry, emits_progress capability flags
- GlobSearch, GrepSearch, ReadFiles, CodebaseOverview: parallel_safe=True
- WebSearch: parallel_safe=True + SearchResponse.metadata -> search_metadata
- ToolCallAgent._is_parallel_safe(): reads tool.parallel_safe flag (data-driven)
- execute_tool(): retry-with-feedback loop for tools with can_retry=True
- Remove dead str_replace_editor branch
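To make the structured result and the capability flags concrete, here is a small sketch of a retry loop gated on `can_retry`. The `Tool` wrapper, the `feedback` keyword passed to the tool function, the `attempts` metadata key, and the exact `is_error` rule are assumptions for illustration; in the PR the feedback may well go back to the model rather than to the tool.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class ToolResult:
    output: str = ""
    error: Optional[str] = None
    exit_code: int = 0
    metadata: Dict[str, Any] = field(default_factory=dict)

    @property
    def is_error(self) -> bool:
        # Assumed rule: an error message or a non-zero exit code means failure.
        return self.error is not None or self.exit_code != 0


@dataclass
class Tool:
    """Hypothetical tool wrapper carrying the capability flags."""
    name: str
    fn: Callable[..., ToolResult]
    parallel_safe: bool = False
    can_retry: bool = False
    emits_progress: bool = False


def execute_tool(tool: Tool, args: Dict[str, Any],
                 max_attempts: int = 3) -> ToolResult:
    """Run a tool, retrying only when it declares can_retry=True."""
    attempts = max_attempts if tool.can_retry else 1
    feedback: Optional[str] = None
    result = ToolResult(error="not run", exit_code=1)
    for attempt in range(1, attempts + 1):
        # The previous failure is handed to the next attempt as feedback.
        result = tool.fn(feedback=feedback, **args)
        result.metadata["attempts"] = attempt
        if not result.is_error:
            break
        feedback = result.error or f"exit code {result.exit_code}"
    return result
```

Reading the flag from the tool object, instead of hard-coding tool names in the agent, is the same data-driven idea as `_is_parallel_safe()`.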
## UI Rewind
- HomePage: suggested prompts (6), capability chips, recent conversations resume
- TaskDetailPage: step progress pill, token counter, removed hardcoded v2.0
- preview-content.tsx: 1087 -> 120 lines (pure router)
+ panels/BrowserPanel.tsx (~40 lines)
+ panels/RuntimePanel.tsx (~180 lines)
+ panels/ToolsPanel.tsx (~220 lines, live/terminal/tool panels)
+ panels/WorkspacePanel.tsx (~260 lines, file browser + collapsible diffs)
- ChangesPanel: collapsible <details> diff viewer (replaces always-expanded)
## Server
- _get_llm_caps() helper exposed in health check
(model, is_local_server, caps_thinking, caps_vision, thinking_enabled)
## 1. LLM Singleton → Factory (Medium)
- Add get_llm(config_name) module-level factory with _llm_registry dict
- Add _evict_llm(config_name) for test isolation / hot-config-reload
- LLM() still works as a backwards-compatible constructor (delegates to factory)
- _init_from_config() is now a separate method called at most once per instance
- Eliminates the __new__ + __init__ double-call fragility
## 2. ReAct Trace Events (Low, but high observability value)
- react.py: rewritten from 46-line stub to full Reason→Act→Observe step loop
  Each step emits: step:reason, step:act, step:complete (or step:observe on act)
- toolcall.py think(): emits agent:lifecycle:step:reason after LLM responds
  (includes reasoning text, will_act bool, tools_planned list)
- toolcall.py act(): emits agent:lifecycle:step:observe after all tools complete
  (includes tool_count, tools_executed, observation_preview)
## 3. Tool Progress Streaming (Low)
- Bash: emits tool:progress start/done events in sandbox path
- PythonExecute: emits tool:progress start/done events in sandbox path
- Both: emits_progress=True capability flag added
- Local session path already streamed; sandbox path was a silent black hole (now fixed)
## 4. Agent-Level Planning visibility (Medium, lite version)
- PlanningTool: adds emit_current_task calls to create/update/mark_step/delete
- Emits agent:plan:updated with full step list, statuses, notes, progress %
- UI: LiveActivityPanel now shows sticky PlanCard above the event stream
  Real-time progress bar + per-step status icons (○→✓!) that update as agent
  calls mark_step; no polling required
## 5. Settings Page (Low)
- ConversationSettingsPage: enable_thinking three-way toggle (Auto/Always On/Disabled)
  with descriptive help text explaining each option
- ConversationSettingsPage: Performance Mode checkbox in LLM Limits card
- Both saved to updateConversationSettings as enable_thinking + performance_mode
…oss LLM and agent modules
…uttons
## Search fallback improvements (web_search.py)
- Fix User-Agent header typo ('WebSearch' key → 'User-Agent') — content fetcher
was sending no User-Agent header at all, causing many sites to 403
- Fix failed_engines tracking bug: engines were never added to failed_engines
during the loop so silent failures went unrecorded and unlogged
- Add per-engine asyncio.wait_for(timeout=12s) so a hung Bing Playwright call
can't block all other engines for 30s+
- Add proper exception catch per engine (timeout + generic) so one crashing
engine doesn't abort the entire fallback chain
- Surface an LLM-readable error message when ALL engines fail, explicitly
telling the model NOT to fall back to python_execute with requests (which
has the same restrictions and also fails)
- Increase default num_results 5 → 8 for richer agent context
- Update tool description to be explicit about fallback order and warn against
python fallback
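A compact sketch of the fallback chain described above, with a per-engine timeout and explicit failure tracking. The `engines` mapping of name to coroutine function, the result dictionary layout, and the wording of the final error message are assumptions for illustration.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict, List

Engine = Callable[[str], Awaitable[List[str]]]


async def search_with_fallback(
    query: str,
    engines: Dict[str, Engine],
    per_engine_timeout: float = 12.0,
) -> Dict[str, Any]:
    """Try engines in order; record every failure instead of dropping it."""
    failed_engines: List[str] = []
    for name, engine in engines.items():
        try:
            # A hung engine is cut off here so it cannot stall the whole chain.
            results = await asyncio.wait_for(engine(query), timeout=per_engine_timeout)
        except asyncio.TimeoutError:
            failed_engines.append(f"{name} (timeout)")
            continue
        except Exception as exc:  # one crashing engine must not abort the chain
            failed_engines.append(f"{name} ({type(exc).__name__})")
            continue
        if results:
            return {"results": results, "engine": name,
                    "failed_engines": failed_engines}
        failed_engines.append(f"{name} (no results)")
    return {
        "results": [],
        "engine": None,
        "failed_engines": failed_engines,
        "error": (
            "All search engines failed: " + ", ".join(failed_engines)
            + ". Do NOT fall back to python_execute with requests; "
            "it is subject to the same restrictions."
        ),
    }
```

Returning the failure list alongside successful results keeps silent engine failures visible in logs even when a later engine succeeds.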
## browser_use_tool.py web_search action crash fix
- Guard against IndexError when search_response.results is empty
(previously: results[0] would crash with unhandled exception)
- Return the search error string directly to the agent instead of crashing
## UI: remove duplicate Manus Computer header buttons (index.tsx)
- Remove 'Live' text button that was a visual+functional duplicate of the
ListChecksIcon button (both called setData({ type: 'live' }))
- Replace Skills button icon from ListChecksIcon (duplicate of Live) to
BookOpenIcon for clear visual differentiation
## requirements.txt
- Bump cloakbrowser 0.3.24 → 0.3.30 (Chromium 146, 58 C++ fingerprint patches,
  SOCKS5 proxy, WebRTC IP spoofing, humanize actionability improvements)
## Dockerfile
- Pre-download CloakBrowser binary at image build time via ensure_binary() so it
  is baked into the image and available without a network hit at runtime
## browser_use_tool.py
- Add _get_cloak_binary_path() helper: reads binary_info(), downloads if not
  cached, returns path or None on any error (graceful fallback)
- _ensure_browser_initialized: inject cloak binary as chrome_instance_path so
  browser_use launches its CDP session into the stealth Chromium binary instead
  of stock Playwright Chromium
- Only applies when no custom chrome_instance_path/wss_url/cdp_url is set
  (respects existing user overrides)
- Falls back transparently to stock Playwright if CloakBrowser is unavailable
Effect: ALL agent browser sessions (go_to_url, extract_content, click_element,
web_search, etc.) now use Chromium with source-level anti-detection patches
instead of detectable headless Chromium. Cloudflare, FingerprintJS, BrowserScan
and Turnstile should no longer block agent browsing.
## bing_search.py
- Add &setlang=en&cc=US&mkt=en-US to Bing URLs to force English results
regardless of container IP geolocation (was returning German credit card results)
- Extend selector wait timeout 5s → 8s for CloakBrowser render time
## browser_use_tool.py
- extract_content: use page.inner_text('body') as primary method (works on SPAs)
- Fall back to markdownify on raw HTML for static pages
- Return explicit error when page is empty (login wall / rate limit / unrendered SPA)
## config.toml (not tracked by git — apply manually or already on disk)
- Primary engine: Bing → DuckDuckGo
- Fallback order: Google → Bing → Baidu
- retry_delay: 60s → 5s (was sleeping 60s per retry between engine attempts)
- max_retries: 3 → 1 (was retrying all 4 broken engines 3 times = 3 min hang)
Root cause: with all container search engines failing (DDG rate-limited, Bing
returning garbage), the retry loop was sleeping 60s × 3 retries = 3min before
returning an error to the agent. LM Studio never got called.
Now fails fast (≤20s) and returns actionable error to the agent immediately.
…retries on blocked search engines
… (matches web UI)
…browser tools and MCP server configuration
ahamerski
approved these changes
Jun 14, 2026
…DeepSpec research scaffolding
Summary
This PR introduces my OpenManus v2 upgrade focused on conversation-first execution, better runtime reliability, and a significantly improved web UX.
What’s Included
- read_files tool compatibility fix (path and paths)
Why
The goal is to make OpenManus practical for daily use:
Validation
vite build)
Notes