Skip to content
Closed
Changes from 1 commit
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
17 changes: 13 additions & 4 deletions pr_agent/servers/github_app.py
Original file line number Diff line number Diff line change
Expand Up @@ -200,10 +200,19 @@ async def handle_push_trigger_for_new_commits(body: Dict[str, Any],
await _perform_auto_commands_github("push_commands", agent, body, api_url, log_context)

finally:
# release the waiting task block
async with _pending_task_duplicate_push_conditions[api_url]:
_pending_task_duplicate_push_conditions[api_url].notify(1)
_duplicate_push_triggers[api_url] -= 1
# release the waiting task block, then clean up if no tasks remain
should_cleanup = False
try:
async with _pending_task_duplicate_push_conditions[api_url]:
_pending_task_duplicate_push_conditions[api_url].notify(1)
_duplicate_push_triggers[api_url] -= 1
should_cleanup = _duplicate_push_triggers[api_url] <= 0
except KeyError:
# TTL eviction already cleaned up this entry
pass
if should_cleanup:
_duplicate_push_triggers.pop(api_url, None)
_pending_task_duplicate_push_conditions.pop(api_url, None)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Pop bypasses ttl bookkeeping 🐞 Bug ☼ Reliability

handle_push_trigger_for_new_commits() removes DefaultDictWithTimeout entries via .pop(), which
does not update DefaultDictWithTimeout’s internal __key_times map. After this PR makes
__refresh() run, a later refresh can try to del self[key] for a key that was already popped from
the dict, raising KeyError and potentially failing webhook handling.
Agent Prompt
## Issue description
`DefaultDictWithTimeout` maintains a private `__key_times` map that is only kept in sync when deletion goes through `__delitem__`. The new cleanup uses `.pop()`, which bypasses `__delitem__`, leaving stale timestamps behind.

Once `__refresh()` runs (and this PR makes it run at the correct cadence), it will build `to_delete` from `__key_times` and execute `del self[key]`. If the dict entry was previously removed via `.pop()`, `super().__delitem__(key)` raises `KeyError`, propagating out of `__refresh()`/`__getitem__()`.

## Issue Context
This affects both `_duplicate_push_triggers` and `_pending_task_duplicate_push_conditions`, which are `DefaultDictWithTimeout` instances.

## Fix Focus Areas
- pr_agent/servers/github_app.py[203-215]
- pr_agent/servers/utils.py[63-72]
- pr_agent/servers/utils.py[84-86]

## Suggested fix
Prefer one of:
1) **Don’t use `.pop()` on `DefaultDictWithTimeout`**; delete via `del` so `__delitem__` runs, e.g.:
```python
try:
    del _duplicate_push_triggers[api_url]
except KeyError:
    pass
try:
    del _pending_task_duplicate_push_conditions[api_url]
except KeyError:
    pass
```

2) **Implement `pop()` (and possibly `clear()`/`popitem()`) on `DefaultDictWithTimeout`** to also remove `__key_times` entries (`self.__key_times.pop(key, None)`) before delegating to `super().pop(...)`.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Non-atomic dedupe cleanup 🐞 Bug ☼ Reliability

handle_push_trigger_for_new_commits() decides to clean up shared per-PR state while holding the
Condition lock, but performs the actual .pop() outside that lock, allowing a concurrent push-trigger
task to start in between. This can delete the condition/counter for an in-flight task and corrupt
the dedupe bookkeeping (missed wait/notify, inconsistent counters).
Agent Prompt
### Issue description
`handle_push_trigger_for_new_commits()` computes `should_cleanup` under the per-PR `asyncio.Condition` lock, but then removes `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` after releasing that lock. Because the increment/admission path mutates `_duplicate_push_triggers` before acquiring the condition lock, a new task can interleave between the decrement/check and the pop, causing live state to be removed.

### Issue Context
This code is implementing a concurrency/deduplication gate for push triggers per `api_url`. The counter and the condition must have a consistent lifecycle across all concurrent tasks.

### Fix Focus Areas
- pr_agent/servers/github_app.py[175-215]

### Suggested fix direction
- Ensure **all** mutations and lifecycle transitions for both `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` are performed under a single, consistent synchronization mechanism.
  - One workable approach: fetch/create the per-PR condition first, then `async with condition:` to protect **both** admission (`setdefault`/increment) and cleanup (decrement + possible pop) for that `api_url`.
  - Alternatively, introduce a dedicated lock (global or per-`api_url`) that guards: setdefault/increment, decrement/check, and pop, so no new task can start between the cleanup decision and the removal.
- If you keep the `pop()`, perform it inside the same critical section as the `should_cleanup` decision, and re-check the counter immediately before removing entries.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools



def handle_closed_pr(body, event, action, log_context):
Expand Down
Loading