OpenManus v2 upgrade by MohamedElsaeidy · Pull Request #1366 · FoundationAgents/OpenManus

MohamedElsaeidy · 2026-05-14T22:09:50Z

Summary

This PR introduces my OpenManus v2 upgrade focused on conversation-first execution, better runtime reliability, and a significantly improved web UX.

What’s Included

Conversation-first task flow with continuity across follow-ups
Per-conversation workspace/sandbox isolation
Mid-task user messaging support
Live observability improvements:
- tool traces
- terminal output
- browser screenshots
- token counts
Runtime controls for process/container visibility and kill actions
Agent completion reliability fixes (prevents non-final “stopping early” responses)
read_files tool compatibility fix (path and paths)
Frontend UX overhaul (chat/runtime/settings/admin improvements)
LM Studio model eject action from UI + backend endpoint
Config/runtime settings enhancements
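The read_files compatibility item above only says that both `path` and `paths` are handled. A minimal sketch of one way such a shim could look, assuming the tool accepts either a single string or a list; `normalize_paths` is a hypothetical helper name, not taken from the PR:

```python
from typing import Optional, Sequence, Union


def normalize_paths(
    path: Optional[str] = None,
    paths: Union[str, Sequence[str], None] = None,
) -> list[str]:
    """Merge the singular and plural arguments into one list (hypothetical helper)."""
    collected: list[str] = []
    for value in (path, paths):
        if value is None:
            continue
        if isinstance(value, str):
            # A bare string counts as a single file path.
            collected.append(value)
        else:
            collected.extend(value)
    if not collected:
        raise ValueError("read_files needs 'path' or 'paths'")
    return collected
```

With this shape, a model that sends `{"path": "a.py"}` and one that sends `{"paths": ["a.py", "b.py"]}` both reach the same code path.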

Why

The goal is to make OpenManus practical for daily use:

fewer stuck/unfinished runs
better transparency while the agent works
safer conversation/project isolation
better operator control from the UI

Validation

Backend compile checks passed
Frontend build passed (vite build)
Manual runtime checks on task streaming, model controls, and conversation behavior

Notes

This is a broad feature PR with backend, frontend, and runtime changes.
I’m actively maintaining this branch and will respond quickly to review comments and follow-up fixes.

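1. Do not instantiate or run any Agent directly in the entry file.
2. If logic such as run_task / run_agent / main_execute exists, comment it out or remove it.
3. The entry file keeps only service startup or placeholder logic.
4. Introduce no new features and change no Agent behavior.
5. The code must remain importable; it does not need to be executable.

Modify only entry-related files; do not touch agent, tool, or memory code.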

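Requirements:
1. A Task must contain:
   - id (string)
   - status (CREATED / RUNNING / DONE / FAILED / INTERRUPTED)
   - interrupt_flag (bool)
   - event_queue (thread-safe or asyncio-compatible)
2. Provide the methods:
   - emit(type: str, data: Any)
   - interrupt()
   - is_interrupted()
3. No dependency on FastAPI / WebSocket.
4. Do not modify any existing code.
5. Keep the Task design as generic as possible so that agents / tools can use it later.

Only add new files; do not change existing ones.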
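A minimal sketch of a Task that meets these requirements, using only the standard library. The event dictionary shape, the `TaskStatus` enum name, and the use of `threading.Event` behind `interrupt_flag` are assumptions made for illustration, not the PR's actual implementation:

```python
import queue
import threading
from dataclasses import dataclass, field
from enum import Enum
from typing import Any
from uuid import uuid4


class TaskStatus(str, Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    INTERRUPTED = "INTERRUPTED"


@dataclass
class Task:
    id: str = field(default_factory=lambda: str(uuid4()))
    status: TaskStatus = TaskStatus.CREATED
    # queue.Queue is thread-safe; an asyncio.Queue could be swapped in instead.
    event_queue: "queue.Queue[dict]" = field(default_factory=queue.Queue)
    # Assumption: a threading.Event backs the boolean interrupt_flag.
    _interrupt: threading.Event = field(default_factory=threading.Event, repr=False)

    @property
    def interrupt_flag(self) -> bool:
        return self._interrupt.is_set()

    def emit(self, type: str, data: Any = None) -> None:
        """Publish an event; consumers read it from event_queue."""
        self.event_queue.put({"task_id": self.id, "type": type, "data": data})

    def interrupt(self) -> None:
        self._interrupt.set()

    def is_interrupted(self) -> bool:
        return self._interrupt.is_set()


task = Task()
task.emit("plan.step", {"index": 0})
task.interrupt()
```

Nothing here imports FastAPI or a WebSocket library, so the same object can be shared by agents, tools, and any transport layer.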

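Requirements:
1. Change every Agent's entry method signature to: run(task: Task, input: Any)
2. Agents must not use print / logging directly as output.
3. All externally visible information must go through: task.emit(event_type, data)
4. Before each major execution step, check: if task.is_interrupted(): raise TaskInterrupted
5. Do not change the Agent's decision logic, optimize prompts, or add capabilities.

Modify only Agent-related files.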

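1. The Planner's output must be a structured plan (e.g. list[dict] or a dataclass).
2. While generating the plan, emit each step via task.emit(plan.step, step).
3. The Planner does not execute any tool.
4. When the Planner finishes, emit(plan.done).

Do not modify the Executor or introduce new model calls.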

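1. The Executor only consumes the plan generated by the Planner.
2. Before executing each step: task.emit(execute.step.start, step)
3. Before calling a tool: task.emit(tool.call, tool_name, args)
4. After the tool finishes: task.emit(tool.result, result)
5. Support interrupt (stop immediately once interrupted).

Do not modify the Planner or Tool implementations.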
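The Executor event sequence can be sketched as follows. `StubTask` is a hypothetical stand-in for the Task object the requirements describe, the plan step shape (`tool` / `args` keys) is an assumption, and the tool name and arguments are bundled into one dictionary because `emit` takes a single data argument:

```python
from typing import Any, Callable


class TaskInterrupted(Exception):
    """Raised when the task was interrupted between steps."""


class StubTask:
    """Hypothetical stand-in for Task; records events in a list."""

    def __init__(self) -> None:
        self.events: list[tuple[str, Any]] = []
        self.interrupted = False

    def emit(self, type: str, data: Any = None) -> None:
        self.events.append((type, data))

    def is_interrupted(self) -> bool:
        return self.interrupted


def execute_plan(task, plan: list[dict], tools: dict[str, Callable[..., Any]]) -> list[Any]:
    results = []
    for step in plan:
        # Check before every step so an interrupt stops the run promptly.
        if task.is_interrupted():
            raise TaskInterrupted()
        task.emit("execute.step.start", step)
        task.emit("tool.call", {"tool_name": step["tool"], "args": step["args"]})
        result = tools[step["tool"]](**step["args"])
        task.emit("tool.result", result)
        results.append(result)
    return results


task = StubTask()
plan = [{"tool": "add", "args": {"a": 1, "b": 2}}]
results = execute_plan(task, plan, {"add": lambda a, b: a + b})
```

The Executor never builds the plan itself; it only walks the list it is given, which keeps it independent of the Planner.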

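Implement a ToolRunner class as the single entry point for all tool execution.
Provide run(task, tool_name, args) -> ToolResult.
Support:
- timeout
- capturing stdout / stderr
- checking task.is_interrupted() during execution

Do not call os.system directly. Do not modify the business logic of individual tools. Only add the runner; do not refactor tools.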
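A minimal sketch of such a runner for command-line tools, assuming each tool name maps to a function that builds an argv list (the `commands` registry is hypothetical). It polls `Popen.communicate` with a short timeout so that the interrupt flag and the overall deadline are both checked while the process runs:

```python
import subprocess
import sys
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToolResult:
    stdout: str
    stderr: str
    exit_code: int


class ToolRunner:
    def __init__(self, commands: dict[str, Callable[[dict], list[str]]], timeout: float = 30.0):
        self.commands = commands  # hypothetical registry: tool name -> argv builder
        self.timeout = timeout

    def run(self, task, tool_name: str, args: dict) -> ToolResult:
        argv = self.commands[tool_name](args)
        proc = subprocess.Popen(
            argv, stdout=subprocess.PIPE, stderr=subprocess.PIPE, text=True
        )
        deadline = time.monotonic() + self.timeout
        while True:
            try:
                # Retrying communicate() after a timeout does not lose output.
                out, err = proc.communicate(timeout=0.1)
                return ToolResult(out, err, proc.returncode)
            except subprocess.TimeoutExpired:
                if task.is_interrupted() or time.monotonic() >= deadline:
                    proc.kill()
                    out, err = proc.communicate()
                    return ToolResult(out, err, proc.returncode)


class NeverInterrupted:
    """Stand-in task object that is never interrupted."""

    def is_interrupted(self) -> bool:
        return False


runner = ToolRunner({"py": lambda a: [sys.executable, "-c", a["code"]]})
result = runner.run(NeverInterrupted(), "py", {"code": "print('hello')"})
```

This sketch does not distinguish a timeout from an interrupt in the returned result; a real runner would probably record which of the two ended the process.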

1. No direct subprocess.run / os.system.
2. Route all calls through ToolRunner.
3. Return a structured ToolResult (stdout / stderr / exit_code).
4. Support interrupt.

Do not change the shell's functional behavior.

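Implement ContextEngine.build(task, agent_role, step_type).
The context is divided into:
- Hard facts (user goal, current plan)
- Recent events (latest tool output)
- Process summary (string)

Support a context token budget (simple character trimming is enough). Do not call vector data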
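A minimal sketch of the three-part context with a character budget. The task attributes (`goal`, `plan`, `summary`, `recent_events`) and the trimming order are assumptions; the requirement only fixes the `build(task, agent_role, step_type)` signature and the three categories:

```python
from dataclasses import dataclass
from types import SimpleNamespace


@dataclass
class ContextEngine:
    max_chars: int = 4000  # character budget standing in for a token budget

    def build(self, task, agent_role: str, step_type: str) -> str:
        # Hard facts are always kept whole.
        hard = (
            f"[role={agent_role} step={step_type}]\n"
            f"Goal: {task.goal}\nPlan: {task.plan}"
        )
        budget = max(self.max_chars - len(hard), 0)
        # The process summary is trimmed from the end.
        summary = task.summary[:budget]
        budget -= len(summary)
        # Recent events keep their newest characters when space runs out.
        recent = "\n".join(task.recent_events)
        recent = recent[-budget:] if budget > 0 else ""
        # Note: the blank-line separators are not counted against the budget.
        return "\n\n".join(part for part in (hard, summary, recent) if part)


task = SimpleNamespace(
    goal="fix bug",
    plan=["read", "patch"],
    summary="s" * 50,
    recent_events=["old output", "new output"],
)
full = ContextEngine(max_chars=200).build(task, "executor", "tool_call")
tight = ContextEngine(max_chars=10).build(task, "executor", "tool_call")
```

With a tight budget only the hard facts survive, which matches the idea that the goal and plan matter more than old tool output.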

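1. Every Agent must obtain its context through ContextEngine.build before calling the LLM.
2. Agents must not assemble history themselves.
3. Do not modify the prompt content itself; only replace the context source.

Change only prompt-construction code.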

1. Use FastAPI.
2. Provide:
   - POST /tasks (create a task)
   - GET /tasks/{id} (query status)
   - POST /tasks/{id}/interrupt
3. The API calls TaskRegistry.
4. The Agent runs in a background task.
5. Do not introduce WebSocket.

Only add server/api.py.

1. WebSocket endpoint: /tasks/{id}/stream
2. Read events from task.event_queue.
3. Push events to the client as JSON.
4. Support an interrupt command sent by the client.
5. No business logic; only forward events.

Do not modify the API layer.

1. TaskInterrupted → status = INTERRUPTED
2. Uncaught exception → status = FAILED
3. Normal completion → status = DONE
4. Every status change emits a task.status event.

Do not change business logic; only add status convergence.
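The status convergence rules can be sketched as a single wrapper around the task body. `StubTask` and `run_with_status` are hypothetical names used for illustration:

```python
from typing import Any, Callable


class TaskInterrupted(Exception):
    """Signals that the task was interrupted on request."""


class StubTask:
    """Hypothetical stand-in for Task; records emitted events in a list."""

    def __init__(self) -> None:
        self.status = "CREATED"
        self.events: list[tuple[str, Any]] = []

    def emit(self, type: str, data: Any = None) -> None:
        self.events.append((type, data))


def run_with_status(task, body: Callable[[], Any]) -> Any:
    def set_status(status: str) -> None:
        # Every transition is both stored and emitted as a task.status event.
        task.status = status
        task.emit("task.status", status)

    set_status("RUNNING")
    try:
        result = body()
    except TaskInterrupted:
        set_status("INTERRUPTED")
        return None
    except Exception:
        set_status("FAILED")
        return None
    set_status("DONE")
    return result


ok_task = StubTask()
value = run_with_status(ok_task, lambda: 42)
```

Putting the mapping in one place means agents and tools only raise exceptions; they never set the final status themselves.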

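Use SQLAlchemy to define a Task model for PostgreSQL.
Task fields:
- task_id (UUID, primary key)
- status (string)
- input (JSON)
- result (JSON, nullable)
- created_at (timestamp)
- updated_at (timestamp)

Replace the in-memory TaskRegistry with DB operations:
- get_task(task_id)
- create_task(...)
- update_task(task)

Put the ORM models in server/models.py. Do not modify business logic or the API layer. Stay compatible with the existing /tasks API.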

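Use Celery as the asynchronous task execution framework.
Create server/celery_app.py and initialize Celery:
- broker: Redis (redis://redis:6379/0)
- backend: PostgreSQL or Redis

Wrap the Task execution logic in a Celery task:
- receive task_id and run the original logic
- after execution, update the database: task.status = COMPLETED
- write the execution result to task.result

FastAPI POST /tasks:
- create the Task (write to DB)
- immediately dispatch it to Celery for asynchronous execution

FastAPI GET /tasks/{task_id}:
- query the database directly

Do not change existing API paths.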

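1. Modify server/api.py so that API queries and status responses read from PostgreSQL.
2. POST /tasks:
   - write to DB
   - dispatch a Celery task for asynchronous execution
   - return task_id and status=CREATED
3. GET /tasks/{task_id}:
   - query DB and return status + result
4. POST /tasks/{task_id}/interrupt:
   - mark DB task.status=INTERRUPTED
   - optionally send a Celery revoke command to cancel the task
5. Keep the FastAPI routes unchanged.
6. Do not modify business logic.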

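1. Create a Dockerfile in the project root for the FastAPI web service:
   - Python 3.10-slim
   - install requirements.txt
   - CMD uvicorn main:app --host 0.0.0.0 --port 8000
2. Create docker-compose.yml:
   - Services:
     a) web: FastAPI
     b) worker: Celery worker, depends on Redis + PostgreSQL
     c) redis: latest official image
     d) postgres: latest official image, with a persistent volume
   - Services share a network
   - Explicitly expose port 8000
3. Web and worker use the same codebase.
4. Keep the API and DB connection parameters consistent.
5. Do not modify business logic.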

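1. Create .github/workflows/docker-celery-smoke-test.yml
2. CI steps:
   a) check out the code
   b) build docker-compose
   c) start the containers (web + worker + redis + postgres)
   d) wait for the services to start
   e) verify that /health returns 200
   f) verify that /docs returns 200
   g) verify that /tasks POST + GET return the correct status
   h) stop and clean up the containers
3. CI fails if any step fails.
4. Do not modify source business logic.
5. Docker + Celery + DB must match the PR code.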


tayseer-dev · 2026-05-19T20:30:18Z

Works well! Great work.

## Agent Core
- Semantic stuck-loop detection: repeated tool+args signatures (Counter-based)
- Tool-call dedup: seen_sigs set in act() skips identical calls within a step
- _looks_final_response: replaced brittle single-regex with 3-tier signal system (_STRONG_FINAL_RE, _WEAK_FINAL_RE); prevents mid-task words from early-finishing

## LLM Layer
- Remove stdout-polluting print() from streaming chunks (MCP pipe fix)
- stream_options={include_usage:true} for real completion token counts
- Expanded REASONING_MODELS (o1-mini, o3, o4-mini) + REASONING_EFFORT_MODELS
- CLAUDE_THINKING_MODELS + extended thinking block injection in ask_tool()
- LM-Studio dynamic capability probe: _probe_local_server_caps() reads reasoning/vision flags from /api/v0/models at init, falls back to Ollama
- enable_thinking: Optional[bool] in LLMSettings + config.example.toml docs
- ensure_user_query: non-mutating (list spread instead of .append())

## Tool System
- ToolResult: exit_code, metadata, is_error property (structured results)
- BaseTool: parallel_safe, can_retry, emits_progress capability flags
- GlobSearch, GrepSearch, ReadFiles, CodebaseOverview: parallel_safe=True
- WebSearch: parallel_safe=True + SearchResponse.metadata -> search_metadata
- ToolCallAgent._is_parallel_safe(): reads tool.parallel_safe flag (data-driven)
- execute_tool(): retry-with-feedback loop for tools with can_retry=True
- Remove dead str_replace_editor branch

## UI Rewind
- HomePage: suggested prompts (6), capability chips, recent conversations resume
- TaskDetailPage: step progress pill, token counter, removed hardcoded v2.0
- preview-content.tsx: 1087 -> 120 lines (pure router)
  + panels/BrowserPanel.tsx (~40 lines)
  + panels/RuntimePanel.tsx (~180 lines)
  + panels/ToolsPanel.tsx (~220 lines, live/terminal/tool panels)
  + panels/WorkspacePanel.tsx (~260 lines, file browser + collapsible diffs)
- ChangesPanel: collapsible <details> diff viewer (replaces always-expanded)

## Server
- _get_llm_caps() helper exposed in health check (model, is_local_server, caps_thinking, caps_vision, thinking_enabled)
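The stuck-loop detection and tool-call dedup items under Agent Core both rest on a stable signature for a tool call. A minimal sketch, assuming the signature is the tool name plus its JSON-serialized arguments and that three repeats count as stuck; the function names and the threshold are illustrative, not taken from the PR:

```python
import json
from collections import Counter


def call_signature(tool_name: str, args: dict) -> str:
    # sort_keys makes the signature independent of argument order.
    return f"{tool_name}:{json.dumps(args, sort_keys=True)}"


def is_stuck(history: list[tuple[str, dict]], threshold: int = 3) -> bool:
    """True when any identical tool+args call appears `threshold` times or more."""
    counts = Counter(call_signature(name, args) for name, args in history)
    return any(count >= threshold for count in counts.values())


def dedup_calls(calls: list[tuple[str, dict]]) -> list[tuple[str, dict]]:
    """Drop repeated identical calls within one step, keeping first occurrences."""
    seen_sigs: set[str] = set()
    unique = []
    for name, args in calls:
        sig = call_signature(name, args)
        if sig in seen_sigs:
            continue
        seen_sigs.add(sig)
        unique.append((name, args))
    return unique


history = [("read_files", {"path": "a.py"})] * 3 + [("bash", {"cmd": "ls"})]
stuck = is_stuck(history)
unique_calls = dedup_calls(history)
```

Counting signatures rather than comparing only the last two calls also catches loops that alternate between a few repeated calls.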

## 1. LLM Singleton → Factory (Medium)
- Add get_llm(config_name) module-level factory with _llm_registry dict
- Add _evict_llm(config_name) for test isolation / hot-config-reload
- LLM() still works as a backwards-compatible constructor (delegates to factory)
- _init_from_config() is now a separate method called at most once per instance
- Eliminates the __new__ + __init__ double-call fragility

## 2. ReAct Trace Events (Low, but high observability value)
- react.py: rewritten from 46-line stub to full Reason→Act→Observe step loop. Each step emits: step:reason, step:act, step:complete (or step:observe on act)
- toolcall.py think(): emits agent:lifecycle:step:reason after LLM responds (includes reasoning text, will_act bool, tools_planned list)
- toolcall.py act(): emits agent:lifecycle:step:observe after all tools complete (includes tool_count, tools_executed, observation_preview)

## 3. Tool Progress Streaming (Low)
- Bash: emits tool:progress start/done events in sandbox path
- PythonExecute: emits tool:progress start/done events in sandbox path
- Both: emits_progress=True capability flag added
- Local session path already streamed; sandbox path was a silent black hole (now fixed)

## 4. Agent-Level Planning visibility (Medium, lite version)
- PlanningTool: adds emit_current_task calls to create/update/mark_step/delete
- Emits agent:plan:updated with full step list, statuses, notes, progress %
- UI: LiveActivityPanel now shows sticky PlanCard above the event stream. Real-time progress bar + per-step status icons (○→✓!) that update as agent calls mark_step; no polling required

## 5. Settings Page (Low)
- ConversationSettingsPage: enable_thinking three-way toggle (Auto/Always On/Disabled) with descriptive help text explaining each option
- ConversationSettingsPage: Performance Mode checkbox in LLM Limits card
- Both saved to updateConversationSettings as enable_thinking + performance_mode


…uttons

## Search fallback improvements (web_search.py)
- Fix User-Agent header typo ('WebSearch' key → 'User-Agent'): the content fetcher was sending no User-Agent header at all, causing many sites to 403
- Fix failed_engines tracking bug: engines were never added to failed_engines during the loop, so silent failures went unrecorded and unlogged
- Add per-engine asyncio.wait_for(timeout=12s) so a hung Bing Playwright call can't block all other engines for 30s+
- Add proper exception catch per engine (timeout + generic) so one crashing engine doesn't abort the entire fallback chain
- Surface an LLM-readable error message when ALL engines fail, explicitly telling the model NOT to fall back to python_execute with requests (which has the same restrictions and also fails)
- Increase default num_results 5 → 8 for richer agent context
- Update tool description to be explicit about fallback order and warn against python fallback

## browser_use_tool.py web_search action crash fix
- Guard against IndexError when search_response.results is empty (previously: results[0] would crash with unhandled exception)
- Return the search error string directly to the agent instead of crashing

## UI: remove duplicate Manus Computer header buttons (index.tsx)
- Remove 'Live' text button that was a visual+functional duplicate of the ListChecksIcon button (both called setData({ type: 'live' }))
- Replace Skills button icon from ListChecksIcon (duplicate of Live) to BookOpenIcon for clear visual differentiation
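The per-engine timeout and failure tracking described under the search fallback improvements can be sketched with `asyncio.wait_for`. The engine functions and the shape of the return value are assumptions made for illustration; only the 12-second default comes from the commit message:

```python
import asyncio
from typing import Awaitable, Callable

Engine = Callable[[str], Awaitable[list[str]]]


async def search_with_fallback(
    query: str, engines: list[tuple[str, Engine]], timeout: float = 12.0
) -> tuple[list[str], list[str]]:
    failed_engines: list[str] = []
    for name, search in engines:
        try:
            # A hung engine is cancelled after `timeout` seconds.
            results = await asyncio.wait_for(search(query), timeout=timeout)
        except asyncio.TimeoutError:
            failed_engines.append(name)
            continue
        except Exception:
            # One crashing engine must not abort the rest of the chain.
            failed_engines.append(name)
            continue
        if results:
            return results, failed_engines
        failed_engines.append(name)  # an empty result also counts as a failure
    return [], failed_engines


async def hung(query: str) -> list[str]:
    await asyncio.sleep(10)
    return ["never returned"]


async def broken(query: str) -> list[str]:
    raise RuntimeError("blocked")


async def healthy(query: str) -> list[str]:
    return [f"result for {query}"]


results, failed = asyncio.run(
    search_with_fallback(
        "openmanus",
        [("bing", hung), ("google", broken), ("duckduckgo", healthy)],
        timeout=0.05,
    )
)
```

When every engine fails, the caller receives an empty list plus the full `failed_engines` list, which is enough to build the explicit error message for the model.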


## requirements.txt
- Bump cloakbrowser 0.3.24 → 0.3.30 (Chromium 146, 58 C++ fingerprint patches, SOCKS5 proxy, WebRTC IP spoofing, humanize actionability improvements)

## Dockerfile
- Pre-download CloakBrowser binary at image build time via ensure_binary() so it is baked into the image and available without a network hit at runtime

## browser_use_tool.py
- Add _get_cloak_binary_path() helper: reads binary_info(), downloads if not cached, returns path or None on any error (graceful fallback)
- _ensure_browser_initialized: inject cloak binary as chrome_instance_path so browser_use launches its CDP session into the stealth Chromium binary instead of stock Playwright Chromium
- Only applies when no custom chrome_instance_path/wss_url/cdp_url is set (respects existing user overrides)
- Falls back transparently to stock Playwright if CloakBrowser is unavailable

Effect: ALL agent browser sessions (go_to_url, extract_content, click_element, web_search, etc.) now use Chromium with source-level anti-detection patches instead of detectable headless Chromium. Cloudflare, FingerprintJS, BrowserScan and Turnstile should no longer block agent browsing.

## bing_search.py
- Add &setlang=en&cc=US&mkt=en-US to Bing URLs to force English results regardless of container IP geolocation (was returning German credit card results)
- Extend selector wait timeout 5s → 8s for CloakBrowser render time

## browser_use_tool.py
- extract_content: use page.inner_text('body') as primary method (works on SPAs)
- Fall back to markdownify on raw HTML for static pages
- Return explicit error when page is empty (login wall / rate limit / unrendered SPA)

## config.toml (not tracked by git; apply manually or already on disk)
- Primary engine: Bing → DuckDuckGo
- Fallback order: Google → Bing → Baidu
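The locale parameters from the bing_search.py item can be illustrated with a small URL builder. The function name and the way the query string is assembled are assumptions; only the three parameters come from the commit message:

```python
from urllib.parse import urlencode


def build_bing_url(query: str) -> str:
    """Build a Bing search URL pinned to English / US results (illustrative helper)."""
    params = {"q": query, "setlang": "en", "cc": "US", "mkt": "en-US"}
    return "https://www.bing.com/search?" + urlencode(params)


url = build_bing_url("open manus")
```

Pinning the language and market in the URL means the results no longer depend on where the container's IP address appears to be located.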

- retry_delay: 60s → 5s (was sleeping 60s per retry between engine attempts)
- max_retries: 3 → 1 (was retrying all 4 broken engines 3 times = 3 min hang)

Root cause: with all container search engines failing (DDG rate-limited, Bing returning garbage), the retry loop was sleeping 60s × 3 retries = 3min before returning an error to the agent. LM Studio never got called. Now fails fast (≤20s) and returns actionable error to the agent immediately.


LukeJiaoR and others added 28 commits January 11, 2026 22:39

fix pillow

ac43c73

Add core/task_registry.py:

6a21f50

Modify tools/shell.py:

e08b122

1. No direct subprocess.run / os.system.
2. Route all calls through ToolRunner.
3. Return a structured ToolResult (stdout / stderr / exit_code).
4. Support interrupt.

Do not change the shell's functional behavior.

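Modify the Agent prompt construction logic:

d52acd2

1. Every Agent must obtain its context through ContextEngine.build before calling the LLM.
2. Agents must not assemble history themselves.
3. Do not modify the prompt content itself; only replace the context source.

Change only prompt-construction code.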

Add server/api.py:

6445f43

1. Use FastAPI.
2. Provide:
   - POST /tasks (create a task)
   - GET /tasks/{id} (query status)
   - POST /tasks/{id}/interrupt
3. The API calls TaskRegistry.
4. The Agent runs in a background task.
5. Do not introduce WebSocket.

Only add server/api.py.

Add server/ws.py:

7e9f56e

1. WebSocket endpoint: /tasks/{id}/stream
2. Read events from task.event_queue.
3. Push events to the client as JSON.
4. Support an interrupt command sent by the client.
5. No business logic; only forward events.

Do not modify the API layer.

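Add unified exception handling:

f8604d4

1. TaskInterrupted → status = INTERRUPTED
2. Uncaught exception → status = FAILED
3. Normal completion → status = DONE
4. Every status change emits a task.status event.

Do not change business logic; only add status convergence.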

add dockerfile and ci

1ccc414

fix pre-commit check

29d352a

OpenManus v2: conversation-first runtime, persistent sandbox, live ob…

c3ebfff

…servability, and UI overhaul

OpenManus v2: conversation-first runtime, persistent sandbox, live ob…

57c52f9

…servability, and UI overhaul

implement final response detection and add step-tracking for repeated…

66575a5

… non-actionable model outputs

apply consistent code formatting and black linting across codebase

b98eec1

refactor: reformat codebase with consistent black-style line breaks a…

dac8cbe

…nd cleanup minor whitespace

Merge branch 'openmanus-v2'

d9c7bb3

mergin tree to main

chore: update path alias in vite.config.ts to use node:path for absol…

4b32c52

…ute resolution

This was referenced May 14, 2026

Missing dependencies and Daytona SDK Authentication Crash on Startup #1361

Closed

Problem repeats, cannot continue, cannot deliver during execution #749

Open

MohamedElsaeidy added 12 commits May 16, 2026 04:15

chore: increase smoke test polling limit from 10 to 30 iterations

e2f0b2a

feat: implement long-term memory system with agent-accessible tools a…

bb527da

…nd integration health monitoring

refactor: improve import organization, formatting, and tool registrat…

acda9c6

…ion for long-term memory components

refactor: improve smoke test reliability by replacing external CLI de…

6d41419

…pendencies with Python scripts and enhancing diagnostic logging on failure

test: update celery smoke test to verify task interruption and increa…

1a1c330

…se polling attempts

refactor: improve LLM retry logic and enhance chat message aggregatio…

2bec2f0

…n lifecycle tracking

feat: implement template error fallback mechanism to force final resp…

b662004

…onse on local model failures

feat: introduce ApplyPatchEditor tool, improve StrReplaceEditor idemp…

ec87310

…otency, and add diff statistics to chat UI

refactor: reformat codebase to comply with standard linting and style…

bf6f250

… guidelines

feat: add MCP server implementation and introduce LineEdit tool for p…

098585e

…recise file modifications

style: reformat Python codebase for PEP 8 compliance and consistency

a3c9498

updated README

5cc68f5

MohamedElsaeidy mentioned this pull request May 21, 2026

feat: comprehensive security hardening, flow expansion, test infra, CI, and bug fixes #1372

Closed

MohamedElsaeidy added 14 commits May 22, 2026 00:26

chore: freeze requirements.txt to speed up docker builds and prevent …

05723c9

…solver issues

refactor: apply consistent code formatting and style improvements acr…

814f1f5

…oss LLM and agent modules

fix: correct indentation in LLM._init_from_config

b95e6a4

style: refactor whitespace and formatting in web search and browser t…

37ec1c6

…ools

feat: implement v1-grounded-heuristic policy for agent execution and …

533c2dc

…safety standards

fix: change engine-level stop_after_attempt to 1 to prevent multiple …

adad432

…retries on blocked search engines

fix(mcp): sync LM Studio context window to 128k before every run_task…

95f83c3

… (matches web UI)

refactor: standardize code formatting and improve readability across …

89a9914

…browser tools and MCP server configuration

ahamerski approved these changes Jun 14, 2026


feat: integrate FAISS vector search for hybrid memory recall and add …

3310c14

…DeepSpec research scaffolding

MohamedElsaeidy wants to merge 70 commits into FoundationAgents:main from MohamedElsaeidy:main


4 participants