OpenManus v2 upgrade #1366

Open
MohamedElsaeidy wants to merge 70 commits into FoundationAgents:main from MohamedElsaeidy:main

@MohamedElsaeidy

Summary

This PR introduces my OpenManus v2 upgrade focused on conversation-first execution, better runtime reliability, and a significantly improved web UX.

What’s Included

  • Conversation-first task flow with continuity across follow-ups
  • Per-conversation workspace/sandbox isolation
  • Mid-task user messaging support
  • Live observability improvements:
    • tool traces
    • terminal output
    • browser screenshots
    • token counts
  • Runtime controls for process/container visibility and kill actions
  • Agent completion reliability fixes (prevents non-final “stopping early” responses)
  • read_files tool compatibility fix (accepts both path and paths)
  • Frontend UX overhaul (chat/runtime/settings/admin improvements)
  • LM Studio model eject action from UI + backend endpoint
  • Config/runtime settings enhancements

Why

The goal is to make OpenManus practical for daily use:

  • fewer stuck/unfinished runs
  • better transparency while the agent works
  • safer conversation/project isolation
  • better operator control from the UI

Validation

  • Backend compile checks passed
  • Frontend build passed (vite build)
  • Manual runtime checks on task streaming, model controls, and conversation behavior

Notes

  • This is a broad feature PR with backend, frontend, and runtime changes.
  • I’m actively maintaining this branch and will respond quickly to review comments and follow-up fixes.

LukeJiaoR and others added 28 commits January 11, 2026 22:39
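1. Do not instantiate or run any Agent directly in the entry file
2. If logic such as run_task / run_agent / main_execute exists, comment it out or remove it
3. The entry file keeps only service-startup or placeholder logic
4. Do not introduce new features or change any Agent behavior
5. The code must remain importable; it does not need to be executable

Only modify entry-related files; do not change agent, tool, or memory code.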
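Requirements:
1. Task must include:
   - id (string)
   - status (CREATED / RUNNING / DONE / FAILED / INTERRUPTED)
   - interrupt_flag (bool)
   - event_queue (thread-safe or asyncio-compatible)
2. Provide methods:
   - emit(type: str, data: Any)
   - interrupt()
   - is_interrupted()
3. No dependency on FastAPI / WebSocket
4. Do not modify any existing code
5. Keep the Task design as generic as possible so agents / tools can use it later

Only add new files; do not change existing files.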
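
A minimal sketch of a Task that would satisfy the requirements above, using only the standard library. The `TaskStatus` enum, the private `_interrupt` event, and the event dict layout are assumptions made for illustration; the PR's actual class may be organized differently.

```python
import queue
import threading
import uuid
from dataclasses import dataclass, field
from enum import Enum
from typing import Any


class TaskStatus(str, Enum):
    CREATED = "CREATED"
    RUNNING = "RUNNING"
    DONE = "DONE"
    FAILED = "FAILED"
    INTERRUPTED = "INTERRUPTED"


@dataclass
class Task:
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    status: TaskStatus = TaskStatus.CREATED
    # Assumption: a threading.Event backs the interrupt flag so it is safe across threads.
    _interrupt: threading.Event = field(default_factory=threading.Event, repr=False)
    # queue.Queue is thread-safe; no FastAPI / WebSocket dependency is needed.
    event_queue: "queue.Queue[dict]" = field(default_factory=queue.Queue, repr=False)

    @property
    def interrupt_flag(self) -> bool:
        return self._interrupt.is_set()

    def emit(self, type: str, data: Any) -> None:
        # Events are plain dicts so any consumer can serialize them later.
        self.event_queue.put({"task_id": self.id, "type": type, "data": data})

    def interrupt(self) -> None:
        self._interrupt.set()

    def is_interrupted(self) -> bool:
        return self._interrupt.is_set()
```

A consumer would drain `task.event_queue` on its own thread, while an agent calls `task.is_interrupted()` between steps.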
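Requirements:
1. Change the entry-method signature of every Agent to:
   run(task: Task, input: Any)
2. Agents must not use print / logging directly as output
3. All externally visible information must go through:
   task.emit(event_type, data)
4. Before each major execution step, check:
   if task.is_interrupted(): raise TaskInterrupted
5. Do not change the Agent's decision logic, tune prompts, or add capabilities

Only modify Agent-related files.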
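1. Planner output must be a structured plan (e.g. list[dict] or a dataclass)
2. While generating the plan, call task.emit(plan.step, step)
3. The Planner does not execute any tool
4. When the Planner finishes, emit(plan.done)

Do not modify the Executor or introduce new model calls.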
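1. The Executor only consumes the plan generated by the Planner
2. Before executing each step:
   task.emit(execute.step.start, step)
3. Before calling a tool:
   task.emit(tool.call, tool_name, args)
4. After the tool finishes:
   task.emit(tool.result, result)
5. Support interrupt (stop immediately once interrupted)

Do not modify the Planner or Tool implementations.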
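
The executor loop described above could look roughly like the following. `RecordingTask` is a hypothetical stand-in for the real Task, the plan step shape (`{"tool": ..., "args": ...}`) is assumed, and the tool name and arguments are packed into one dict so that `emit` keeps its two-argument form.

```python
from typing import Any, Callable


class TaskInterrupted(Exception):
    """Raised when the task was interrupted between steps."""


class RecordingTask:
    """Hypothetical stand-in for Task: records events in a list."""

    def __init__(self) -> None:
        self.events = []
        self.interrupted = False

    def emit(self, type: str, data: Any) -> None:
        self.events.append((type, data))

    def is_interrupted(self) -> bool:
        return self.interrupted


def execute_plan(task, plan: list, tools: dict) -> list:
    """Run each plan step through its tool, emitting events along the way."""
    results = []
    for step in plan:
        # Check before every step so an interrupt stops execution promptly.
        if task.is_interrupted():
            raise TaskInterrupted()
        task.emit("execute.step.start", step)
        task.emit("tool.call", {"tool_name": step["tool"], "args": step["args"]})
        result = tools[step["tool"]](**step["args"])
        task.emit("tool.result", result)
        results.append(result)
    return results
```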
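Implement a ToolRunner class as the single entry point for all tool execution
Provide run(task, tool_name, args) -> ToolResult
Support:
timeout
capturing stdout / stderr
checking task.is_interrupted() during execution
Do not call os.system directly
Do not modify the business logic of individual tools
Only add the runner; do not refactor tools.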
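
One way such a runner might be sketched is with `subprocess.Popen` and a short polling interval, so that both the timeout and the interrupt flag are checked while the process runs. The `commands` mapping, the list-of-strings `args`, and the `poll_interval` parameter are assumptions for illustration, not the PR's actual interface.

```python
import subprocess
import sys
import time
from dataclasses import dataclass


@dataclass
class ToolResult:
    stdout: str
    stderr: str
    exit_code: int


class ToolRunner:
    """Hypothetical runner: maps tool names to argv prefixes."""

    def __init__(self, commands: dict, poll_interval: float = 0.05):
        self.commands = commands
        self.poll_interval = poll_interval

    def run(self, task, tool_name: str, args: list, timeout: float = 30.0) -> ToolResult:
        proc = subprocess.Popen(
            self.commands[tool_name] + list(args),
            stdout=subprocess.PIPE,
            stderr=subprocess.PIPE,
            text=True,
        )
        deadline = time.monotonic() + timeout
        while True:
            try:
                # communicate() may be retried after TimeoutExpired without losing output.
                out, err = proc.communicate(timeout=self.poll_interval)
                return ToolResult(out, err, proc.returncode)
            except subprocess.TimeoutExpired:
                if task.is_interrupted() or time.monotonic() >= deadline:
                    proc.kill()
                    out, err = proc.communicate()
                    return ToolResult(out, err, proc.returncode)
```

For example, `ToolRunner({"py": [sys.executable, "-c"]}).run(task, "py", ["print('hi')"])` would return a `ToolResult` whose `stdout` holds the printed text.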
1. No direct subprocess.run / os.system calls
2. Route all calls through ToolRunner
3. Return a structured ToolResult (stdout / stderr / exit_code)
4. Support interrupt

Do not change the shell's functional behavior.
Implement ContextEngine.build(task, agent_role, step_type)
Context is divided into:
Hard facts (user goal, current plan)
Recent events (latest tool output)
Process summary (string)
Support a context token budget (simple character truncation is enough)
Do not query vector data
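
A rough sketch of the three-part context with a character budget follows. `TaskContext` is a hypothetical container for the fields the engine reads, and the policy of keeping hard facts and the summary first, then filling the remainder with the newest events, is one possible reading of the requirement.

```python
from dataclasses import dataclass, field


@dataclass
class TaskContext:
    """Hypothetical container for what the engine reads from a task."""
    goal: str
    plan: list = field(default_factory=list)
    recent_events: list = field(default_factory=list)
    summary: str = ""


class ContextEngine:
    def __init__(self, budget_chars: int = 2000):
        self.budget_chars = budget_chars

    def build(self, task: TaskContext, agent_role: str, step_type: str) -> str:
        hard = f"[{agent_role}/{step_type}]\nGoal: {task.goal}\nPlan: " + "; ".join(task.plan)
        summary = f"Summary: {task.summary}"
        # Hard facts and the summary are kept first; recent events fill the rest.
        remaining = self.budget_chars - len(hard) - len(summary) - 2
        kept = []
        for event in reversed(task.recent_events):  # newest first
            if len(event) + 1 > remaining:
                break
            kept.append(event)
            remaining -= len(event) + 1
        parts = [hard, summary] + list(reversed(kept))
        # Final slice is a hard cap in case the fixed parts alone exceed the budget.
        return "\n".join(parts)[: self.budget_chars]
```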
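1. Every Agent must obtain its context through ContextEngine.build before calling the LLM
2. Agents must not assemble history themselves
3. Do not modify the prompt content itself; only replace the context source

Only change prompt-construction code.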
1. Use FastAPI
2. Provide:
   - POST /tasks (create a task)
   - GET /tasks/{id} (query status)
   - POST /tasks/{id}/interrupt
3. The API calls TaskRegistry
4. Agents run in a background task
5. Do not introduce WebSocket

Only add server/api.py.
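1. WebSocket endpoint: /tasks/{id}/stream
2. Read events from task.event_queue
3. Push events to the client as JSON
4. Support interrupt commands sent by the client
5. Contains no business logic; only forwards events

Do not modify the API layer.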
1. TaskInterrupted → status = INTERRUPTED
2. Uncaught exception → status = FAILED
3. Normal completion → status = DONE
4. Every status change emits a task.status event

Do not change business logic; only add status convergence.
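
The status convergence rules above reduce to a single try/except wrapper. In this sketch `run_with_status` and its callback parameters are illustrative names, and the initial RUNNING event is an assumption beyond the four listed rules.

```python
from typing import Any, Callable


class TaskInterrupted(Exception):
    """Raised by an agent when it observes the interrupt flag."""


def run_with_status(run: Callable[[], Any], emit: Callable[[str, Any], None]) -> str:
    """Run a task body and converge on exactly one terminal status."""
    emit("task.status", "RUNNING")
    try:
        run()
        status = "DONE"
    except TaskInterrupted:
        status = "INTERRUPTED"
    except Exception:
        # Any other uncaught exception marks the task as failed.
        status = "FAILED"
    emit("task.status", status)
    return status
```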
Use SQLAlchemy to define the Task model for PostgreSQL.
Task fields:
task_id (UUID, primary key)
status (string)
input (JSON)
result (JSON, nullable)
created_at (timestamp)
updated_at (timestamp)
Replace the in-memory TaskRegistry with DB operations:
get_task(task_id)
create_task(...)
update_task(task)
Put the ORM model in server/models.py.
Do not modify business logic or the API layer.
Stay compatible with the existing /tasks API.
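Use Celery as the asynchronous task execution framework.
Create server/celery_app.py and initialize Celery:
broker uses Redis (redis://redis:6379/0)
backend uses PostgreSQL or Redis
Wrap the Task execution logic in a Celery task:
Receive task_id and run the original logic
After execution, update the database with task.status = COMPLETED
Write the execution result to task.result
FastAPI POST /tasks API:
Create the Task (write to DB)
Immediately dispatch it to Celery for asynchronous execution
FastAPI GET /tasks/{task_id} API:
Query the database directly
Do not change existing API paths.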
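1. Modify server/api.py so that API queries and status responses read from PostgreSQL.
2. POST /tasks:
   - Write to DB
   - Call the Celery task for asynchronous execution
   - Return task_id and status=CREATED
3. GET /tasks/{task_id}:
   - Query the DB and return status + result
4. POST /tasks/{task_id}/interrupt:
   - Mark DB task.status=INTERRUPTED
   - Optionally send a Celery revoke command to cancel the task
5. Keep FastAPI routes unchanged
6. Do not modify business logic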
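1. Create a Dockerfile in the project root for the FastAPI web service:
   - Python 3.10-slim
   - Install requirements.txt
   - CMD uvicorn main:app --host 0.0.0.0 --port 8000
2. Create docker-compose.yml:
   - Services:
     a) web: FastAPI
     b) worker: Celery worker, depends on Redis + PostgreSQL
     c) redis: latest official image
     d) postgres: latest official image, persistent volume
   - Services share a network
   - Explicitly expose port 8000
3. Web and Worker use the same codebase
4. Keep API and DB connection parameters consistent
5. Do not modify business logic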
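1. Create .github/workflows/docker-celery-smoke-test.yml
2. CI steps:
   a) check out the code
   b) build with docker-compose
   c) start the containers (web + worker + redis + postgres)
   d) wait for services to start
   e) verify /health returns 200
   f) verify /docs returns 200
   g) verify /tasks POST + GET return the correct status
   h) stop and clean up the containers
3. Any failing step turns CI red
4. Do not modify source business logic
5. Docker + Celery + DB must match the PR code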
merging tree to main
@tayseer-dev

Works well! Great work.

MohamedElsaeidy added 14 commits May 22, 2026 00:26
## Agent Core
- Semantic stuck-loop detection: repeated tool+args signatures (Counter-based)
- Tool-call dedup: seen_sigs set in act() skips identical calls within a step
- _looks_final_response: replaced brittle single-regex with 3-tier signal system
  (_STRONG_FINAL_RE, _WEAK_FINAL_RE) — prevents mid-task words from early-finishing
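
To illustrate the signature-based ideas in this section, here is a small sketch of tool-call dedup and Counter-based loop detection. The helper names (`call_signature`, `dedupe_calls`, `is_stuck`), the JSON-based signature, and the threshold of 3 are assumptions; only the `seen_sigs` set and the use of `Counter` come from the commit message.

```python
import json
from collections import Counter
from typing import Any


def call_signature(tool_name: str, args: dict) -> str:
    # sort_keys makes the signature independent of argument order.
    return f"{tool_name}:{json.dumps(args, sort_keys=True, default=str)}"


def dedupe_calls(calls: list) -> list:
    """Drop identical (tool, args) calls within one step, keeping order."""
    seen_sigs = set()
    unique = []
    for name, args in calls:
        sig = call_signature(name, args)
        if sig not in seen_sigs:
            seen_sigs.add(sig)
            unique.append((name, args))
    return unique


def is_stuck(history: list, threshold: int = 3) -> bool:
    """True if any signature repeats at least `threshold` times in history."""
    counts = Counter(call_signature(n, a) for n, a in history)
    return any(c >= threshold for c in counts.values())
```

Comparing signatures rather than raw message text means two calls with the same arguments in a different key order still count as the same call.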

## LLM Layer
- Remove stdout-polluting print() from streaming chunks (MCP pipe fix)
- stream_options={include_usage:true} for real completion token counts
- Expanded REASONING_MODELS (o1-mini, o3, o4-mini) + REASONING_EFFORT_MODELS
- CLAUDE_THINKING_MODELS + extended thinking block injection in ask_tool()
- LM-Studio dynamic capability probe: _probe_local_server_caps() reads
  reasoning/vision flags from /api/v0/models at init, falls back to Ollama
- enable_thinking: Optional[bool] in LLMSettings + config.example.toml docs
- ensure_user_query: non-mutating (list spread instead of .append())

## Tool System
- ToolResult: exit_code, metadata, is_error property (structured results)
- BaseTool: parallel_safe, can_retry, emits_progress capability flags
- GlobSearch, GrepSearch, ReadFiles, CodebaseOverview: parallel_safe=True
- WebSearch: parallel_safe=True + SearchResponse.metadata -> search_metadata
- ToolCallAgent._is_parallel_safe(): reads tool.parallel_safe flag (data-driven)
- execute_tool(): retry-with-feedback loop for tools with can_retry=True
- Remove dead str_replace_editor branch

## UI Rewind
- HomePage: suggested prompts (6), capability chips, recent conversations resume
- TaskDetailPage: step progress pill, token counter, removed hardcoded v2.0
- preview-content.tsx: 1087 -> 120 lines (pure router)
  + panels/BrowserPanel.tsx (~40 lines)
  + panels/RuntimePanel.tsx (~180 lines)
  + panels/ToolsPanel.tsx (~220 lines, live/terminal/tool panels)
  + panels/WorkspacePanel.tsx (~260 lines, file browser + collapsible diffs)
- ChangesPanel: collapsible <details> diff viewer (replaces always-expanded)

## Server
- _get_llm_caps() helper exposed in health check
  (model, is_local_server, caps_thinking, caps_vision, thinking_enabled)
## 1. LLM Singleton → Factory (Medium)
- Add get_llm(config_name) module-level factory with _llm_registry dict
- Add _evict_llm(config_name) for test isolation / hot-config-reload
- LLM() still works as a backwards-compatible constructor (delegates to factory)
- _init_from_config() is now a separate method called at most once per instance
- Eliminates the __new__ + __init__ double-call fragility

## 2. ReAct Trace Events (Low — but high observability value)
- react.py: rewritten from 46-line stub to full Reason→Act→Observe step loop
  Each step emits: step:reason, step:act, step:complete (or step:observe on act)
- toolcall.py think(): emits agent:lifecycle:step:reason after LLM responds
  (includes reasoning text, will_act bool, tools_planned list)
- toolcall.py act(): emits agent:lifecycle:step:observe after all tools complete
  (includes tool_count, tools_executed, observation_preview)

## 3. Tool Progress Streaming (Low)
- Bash: emits tool:progress start/done events in sandbox path
- PythonExecute: emits tool:progress start/done events in sandbox path
- Both: emits_progress=True capability flag added
- Local session path already streamed; sandbox path was a silent black hole — fixed

## 4. Agent-Level Planning visibility (Medium — lite version)
- PlanningTool: adds emit_current_task calls to create/update/mark_step/delete
- Emits agent:plan:updated with full step list, statuses, notes, progress %
- UI: LiveActivityPanel now shows sticky PlanCard above the event stream
  Real-time progress bar + per-step status icons (○→✓!) that update as
  agent calls mark_step — no polling required

## 5. Settings Page (Low)
- ConversationSettingsPage: enable_thinking three-way toggle (Auto/Always On/Disabled)
  with descriptive help text explaining each option
- ConversationSettingsPage: Performance Mode checkbox in LLM Limits card
- Both saved to updateConversationSettings as enable_thinking + performance_mode
…uttons

## Search fallback improvements (web_search.py)
- Fix User-Agent header typo ('WebSearch' key → 'User-Agent') — content fetcher
  was sending no User-Agent header at all, causing many sites to 403
- Fix failed_engines tracking bug: engines were never added to failed_engines
  during the loop so silent failures went unrecorded and unlogged
- Add per-engine asyncio.wait_for(timeout=12s) so a hung Bing Playwright call
  can't block all other engines for 30s+
- Add proper exception catch per engine (timeout + generic) so one crashing
  engine doesn't abort the entire fallback chain
- Surface an LLM-readable error message when ALL engines fail, explicitly
  telling the model NOT to fall back to python_execute with requests (which
  has the same restrictions and also fails)
- Increase default num_results 5 → 8 for richer agent context
- Update tool description to be explicit about fallback order and warn against
  python fallback
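
The fallback chain with a per-engine timeout can be sketched with `asyncio.wait_for`. The `search_with_fallback` name, the engine callables, and the choice to count an empty result as a failure are assumptions for illustration; only the 12 s default and the `failed_engines` list come from the commit message.

```python
import asyncio


async def search_with_fallback(query: str, engines: dict, per_engine_timeout: float = 12.0):
    """Try engines in order; return (results, failed_engine_names)."""
    failed_engines = []
    for name, engine in engines.items():
        try:
            results = await asyncio.wait_for(engine(query), timeout=per_engine_timeout)
        except asyncio.TimeoutError:
            failed_engines.append(name)  # hung engine: record it and move on
            continue
        except Exception:
            failed_engines.append(name)  # a crashing engine must not abort the chain
            continue
        if results:
            return results, failed_engines
        failed_engines.append(name)  # assumption: an empty result counts as a failure
    return [], failed_engines
```

When the returned result list is empty, the caller can build the LLM-readable error message from `failed_engines`.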

## browser_use_tool.py web_search action crash fix
- Guard against IndexError when search_response.results is empty
  (previously: results[0] would crash with unhandled exception)
- Return the search error string directly to the agent instead of crashing

## UI: remove duplicate Manus Computer header buttons (index.tsx)
- Remove 'Live' text button that was a visual+functional duplicate of the
  ListChecksIcon button (both called setData({ type: 'live' }))
- Replace Skills button icon from ListChecksIcon (duplicate of Live) to
  BookOpenIcon for clear visual differentiation
## requirements.txt
- Bump cloakbrowser 0.3.24 → 0.3.30 (Chromium 146, 58 C++ fingerprint patches,
  SOCKS5 proxy, WebRTC IP spoofing, humanize actionability improvements)

## Dockerfile
- Pre-download CloakBrowser binary at image build time via ensure_binary()
  so it is baked into the image and available without a network hit at runtime

## browser_use_tool.py
- Add _get_cloak_binary_path() helper: reads binary_info(), downloads if not
  cached, returns path or None on any error (graceful fallback)
- _ensure_browser_initialized: inject cloak binary as chrome_instance_path
  so browser_use launches its CDP session into the stealth Chromium binary
  instead of stock Playwright Chromium
- Only applies when no custom chrome_instance_path/wss_url/cdp_url is set
  (respects existing user overrides)
- Falls back transparently to stock Playwright if CloakBrowser is unavailable

Effect: ALL agent browser sessions (go_to_url, extract_content, click_element,
web_search, etc.) now use Chromium with source-level anti-detection patches
instead of detectable headless Chromium. Cloudflare, FingerprintJS, BrowserScan
and Turnstile should no longer block agent browsing.
## bing_search.py
- Add &setlang=en&cc=US&mkt=en-US to Bing URLs to force English results
  regardless of container IP geolocation (was returning German credit card results)
- Extend selector wait timeout 5s → 8s for CloakBrowser render time
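
As a small illustration of the locale parameters, the query string could be built as follows. The `build_bing_url` helper and the base URL are assumptions; only the `setlang`, `cc`, and `mkt` values come from the commit message.

```python
from urllib.parse import urlencode


def build_bing_url(query: str) -> str:
    # Pin language and market so results do not depend on the container's IP location.
    params = {"q": query, "setlang": "en", "cc": "US", "mkt": "en-US"}
    return "https://www.bing.com/search?" + urlencode(params)
```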

## browser_use_tool.py
- extract_content: use page.inner_text('body') as primary method (works on SPAs)
- Fall back to markdownify on raw HTML for static pages
- Return explicit error when page is empty (login wall / rate limit / unrendered SPA)

## config.toml (not tracked by git — apply manually or already on disk)
- Primary engine: Bing → DuckDuckGo
- Fallback order: Google → Bing → Baidu
- retry_delay: 60s → 5s (was sleeping 60s per retry between engine attempts)
- max_retries: 3 → 1 (was retrying all 4 broken engines 3 times = 3 min hang)

Root cause: with all container search engines failing (DDG rate-limited,
Bing returning garbage), the retry loop was sleeping 60s × 3 retries = 3min
before returning an error to the agent. LM Studio never got called.
Now fails fast (≤20s) and returns actionable error to the agent immediately.
…browser tools and MCP server configuration

4 participants