fix: invert TTL refresh guard in DefaultDictWithTimeout (memory leak) by karesansui-u · Pull Request #2255 · The-PR-Agent/pr-agent

karesansui-u · 2026-03-16T12:29:34Z

Bug description

DefaultDictWithTimeout.__refresh() has an inverted guard condition. The comparison > should be <:

# Current (bug): returns when refresh IS needed
if request_time - self.__last_refresh > self.__refresh_interval:
    return

# Fixed: returns when refresh is NOT needed
if request_time - self.__last_refresh < self.__refresh_interval:
    return

As a result, expired keys are never cleaned up. The to_delete loop runs only while request_time - __last_refresh is still below the refresh interval (too little time has passed for anything to expire) and is skipped once enough time has passed, which is exactly when cleanup is needed.
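To make the guard concrete, here is a minimal sketch of a TTL-backed defaultdict with the corrected comparison. It is not the project's DefaultDictWithTimeout: the class name `TTLDefaultDict`, the single-underscore attributes, and the injectable `clock` parameter are assumptions made so the example can fake the passage of time.

```python
import time
from collections import defaultdict


class TTLDefaultDict(defaultdict):
    """Hypothetical, simplified stand-in for DefaultDictWithTimeout."""

    def __init__(self, default_factory=None, ttl=None, refresh_interval=60.0,
                 clock=time.monotonic):
        super().__init__(default_factory)
        self._ttl = ttl
        self._refresh_interval = refresh_interval
        self._clock = clock  # injectable so the example can fake time (assumption)
        self._key_times = {}
        self._last_refresh = clock()

    def _refresh(self):
        if self._ttl is None:
            return
        now = self._clock()
        # Corrected guard: return early while the interval has NOT yet elapsed.
        if now - self._last_refresh < self._refresh_interval:
            return
        expired = [k for k, t in self._key_times.items() if now - t > self._ttl]
        for key in expired:
            del self[key]
        self._last_refresh = now

    def __getitem__(self, key):
        self._refresh()
        self._key_times[key] = self._clock()
        return super().__getitem__(key)

    def __delitem__(self, key):
        self._key_times.pop(key, None)
        super().__delitem__(key)


fake_now = [0.0]
cache = TTLDefaultDict(int, ttl=10.0, refresh_interval=5.0,
                       clock=lambda: fake_now[0])
cache["pr-1"] += 1       # recorded at t=0
fake_now[0] = 100.0      # well past both the TTL and the refresh interval
cache["pr-2"] += 1       # this access triggers the cleanup pass
remaining = sorted(cache.keys())
```

With `>` in place of `<` in the guard, the access at t=100 would return early and "pr-1" would stay in the dict.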

Impact

_duplicate_push_triggers and _pending_task_duplicate_push_conditions in the GitHub App server grow without bound. Every processed PR URL stays in memory forever. On long-running server instances, this is a memory leak.

Fix

Change > to < on line 67 of pr_agent/servers/utils.py.

Affected files

pr_agent/servers/utils.py (L67) — 1 character change

qodo-free-for-open-source-projects · 2026-03-16T12:29:47Z

Review Summary by Qodo

Fix inverted TTL refresh guard causing memory leak

🐞 Bug fix

Walkthroughs

Description

• Inverts TTL refresh guard condition from > to <
• Fixes memory leak in DefaultDictWithTimeout expiration logic
• Prevents unbounded growth of duplicate push trigger caches
• Ensures expired keys are properly cleaned up periodically

Diagram

flowchart LR
  A["Guard Condition<br/>request_time - last_refresh"] --> B{"Comparison<br/>Operator"}
  B -->|Before: >| C["Returns early<br/>when refresh needed"]
  B -->|After: <| D["Returns early<br/>when refresh not needed"]
  C --> E["Expired keys<br/>never deleted"]
  D --> F["Expired keys<br/>properly cleaned"]
  E --> G["Memory Leak"]
  F --> H["Fixed"]

File Changes

1. pr_agent/servers/utils.py 🐞 Bug fix +1/-1

Invert TTL refresh guard comparison operator

• Changed comparison operator from > to < in __refresh() method guard condition
• Fixes logic so deletion loop runs when enough time has passed
• Prevents memory leak in _duplicate_push_triggers and _pending_task_duplicate_push_conditions caches

pr_agent/servers/utils.py

qodo-free-for-open-source-projects · 2026-03-16T12:29:48Z

Code Review by Qodo

🐞 Bugs (4) 📘 Rule violations (1) 📜 Skill insights (0)

1. Non-atomic dedupe cleanup 🐞 Bug ☼ Reliability

Description

handle_push_trigger_for_new_commits() decides to clean up shared per-PR state while holding the
Condition lock, but performs the actual .pop() outside that lock, allowing a concurrent push-trigger
task to start in between. This can delete the condition/counter for an in-flight task and corrupt
the dedupe bookkeeping (missed wait/notify, inconsistent counters).

Code

pr_agent/servers/github_app.py[R203-215]

+        # release the waiting task block, then clean up if no tasks remain
+        should_cleanup = False
+        try:
+            async with _pending_task_duplicate_push_conditions[api_url]:
+                _pending_task_duplicate_push_conditions[api_url].notify(1)
+                _duplicate_push_triggers[api_url] -= 1
+                should_cleanup = _duplicate_push_triggers[api_url] <= 0
+        except KeyError:
+            # TTL eviction already cleaned up this entry
+            pass
+        if should_cleanup:
+            _duplicate_push_triggers.pop(api_url, None)
+            _pending_task_duplicate_push_conditions.pop(api_url, None)

Evidence

The admission path increments _duplicate_push_triggers[api_url] before acquiring the condition
lock, while the new cleanup code removes the dict entries after releasing the condition
lock—creating a check-then-act race window where a concurrent task can start and then have its state
popped out from under it.

pr_agent/servers/github_app.py[175-195]
pr_agent/servers/github_app.py[202-215]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`handle_push_trigger_for_new_commits()` computes `should_cleanup` under the per-PR `asyncio.Condition` lock, but then removes `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` after releasing that lock. Because the increment/admission path mutates `_duplicate_push_triggers` before acquiring the condition lock, a new task can interleave between the decrement/check and the pop, causing live state to be removed.
### Issue Context
This code is implementing a concurrency/deduplication gate for push triggers per `api_url`. The counter and the condition must have a consistent lifecycle across all concurrent tasks.
### Fix Focus Areas
- pr_agent/servers/github_app.py[175-215]
### Suggested fix direction
- Ensure **all** mutations and lifecycle transitions for both `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` are performed under a single, consistent synchronization mechanism.
- One workable approach: fetch/create the per-PR condition first, then `async with condition:` to protect **both** admission (`setdefault`/increment) and cleanup (decrement + possible pop) for that `api_url`.
- Alternatively, introduce a dedicated lock (global or per-`api_url`) that guards: setdefault/increment, decrement/check, and pop, so no new task can start between the cleanup decision and the removal.
- If you keep the `pop()`, perform it inside the same critical section as the `should_cleanup` decision, and re-check the counter immediately before removing entries.
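
The fix direction above can be sketched as follows. This is only an illustration under assumptions: `_counters`, `_conditions`, `admit`, and `release` are hypothetical names, plain dicts stand in for DefaultDictWithTimeout, and the sketch does not claim to cover every interleaving in the real handler. The point it shows is that admission and cleanup take the same lock, and the removal happens in the same critical section as the decision, with no `await` in between.

```python
import asyncio
from collections import defaultdict

# Hypothetical module-level state; plain dicts stand in for the TTL-backed ones.
_counters = {}
_conditions = defaultdict(asyncio.Condition)


async def admit(api_url, max_active=1):
    """Admission: check and increment under the per-URL condition lock."""
    condition = _conditions[api_url]
    async with condition:
        active = _counters.get(api_url, 0)
        if active >= max_active:
            return False
        _counters[api_url] = active + 1
        return True


async def release(api_url):
    """Cleanup: decrement, notify, and remove inside one critical section."""
    condition = _conditions[api_url]
    async with condition:
        condition.notify(1)
        _counters[api_url] = _counters.get(api_url, 0) - 1
        # Decide and remove while still holding the lock, with no await between.
        if _counters[api_url] <= 0:
            _counters.pop(api_url, None)
            _conditions.pop(api_url, None)


async def main():
    first = await admit("pr-1")
    second = await admit("pr-1")  # rejected: one task is already active
    await release("pr-1")
    return first, second, dict(_counters), dict(_conditions)


first, second, counters_after, conditions_after = asyncio.run(main())
```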

2. pop bypasses TTL bookkeeping 🐞 Bug ☼ Reliability

Description

handle_push_trigger_for_new_commits() removes DefaultDictWithTimeout entries via .pop(), which
does not update DefaultDictWithTimeout’s internal __key_times map. After this PR makes
__refresh() run, a later refresh can try to del self[key] for a key that was already popped from
the dict, raising KeyError and potentially failing webhook handling.

Code

pr_agent/servers/github_app.py[R214-215]

Evidence

The cleanup uses .pop() on DefaultDictWithTimeout, but the class only synchronizes its internal
timestamp map on __delitem__. __refresh() deletes based on __key_times using del self[key],
which will raise if the dict entry was already removed via .pop() and __key_times wasn’t
updated.

pr_agent/servers/github_app.py[203-215]
pr_agent/servers/utils.py[63-72]
pr_agent/servers/utils.py[84-86]
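
The mechanism described in the evidence can be reproduced in isolation. The sketch below uses a hypothetical `TrackedDict` class, not the project's code, and relies on CPython behavior: the built-in `dict.pop` does not call an overridden `__delitem__`, so a side map maintained only there goes stale.

```python
class TrackedDict(dict):
    """Hypothetical dict subclass whose side map is synced only in __delitem__."""

    def __init__(self):
        super().__init__()
        self.key_times = {}

    def __setitem__(self, key, value):
        self.key_times[key] = 0.0  # placeholder timestamp
        super().__setitem__(key, value)

    def __delitem__(self, key):
        del self.key_times[key]
        super().__delitem__(key)


d = TrackedDict()
d["a"] = 1
d["b"] = 2
d.pop("a", None)  # built-in pop: the overridden __delitem__ is not invoked
del d["b"]        # goes through __delitem__, so the side map is updated

stale_keys = list(d.key_times)  # "a" is still tracked although it left the dict

try:
    del d["a"]    # what a later timestamp-driven cleanup would attempt
    raised = False
except KeyError:
    raised = True
```

Overriding `pop` to drop the timestamp as well, or deleting with `del`, would keep the two structures consistent.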

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DefaultDictWithTimeout` maintains a private `__key_times` map that is only kept in sync when deletion goes through `__delitem__`. The new cleanup uses `.pop()`, which bypasses `__delitem__`, leaving stale timestamps behind.
Once `__refresh()` runs (and this PR makes it run at the correct cadence), it will build `to_delete` from `__key_times` and execute `del self[key]`. If the dict entry was previously removed via `.pop()`, `super().__delitem__(key)` raises `KeyError`, propagating out of `__refresh()`/`__getitem__()`.
## Issue Context
This affects both `_duplicate_push_triggers` and `_pending_task_duplicate_push_conditions`, which are `DefaultDictWithTimeout` instances.
## Fix Focus Areas
- pr_agent/servers/github_app.py[203-215]
- pr_agent/servers/utils.py[63-72]
- pr_agent/servers/utils.py[84-86]
## Suggested fix
Prefer one of:
1) **Don’t use `.pop()` on `DefaultDictWithTimeout`**; delete via `del` so `__delitem__` runs, e.g.:

3. TTL evicts in-flight state 🐞 Bug ≡ Correctness

Description

Now that DefaultDictWithTimeout.__refresh() actually runs, it can delete the asyncio.Condition and
counter entries used by handle_push_trigger_for_new_commits while the first task is still
running/waiting, if processing time exceeds the configured TTL. That can cause a waiting task to
never be notified (notify happens on a newly-created Condition) and/or raise KeyError when
decrementing _duplicate_push_triggers in the finally block.

Code

pr_agent/servers/utils.py[R67-68]

Evidence

DefaultDictWithTimeout will now perform periodic TTL-based deletion (every refresh_interval) of any
key whose last recorded access time is older than ttl. The GitHub App uses this structure to store
per-PR-url synchronization primitives and a running-task counter, but it does not touch these dict
keys during the long await of PR processing, so another webhook can trigger a refresh and evict the
key while it is still needed; default TTL is 300 seconds and the e2e test explicitly allows ~5
minutes for processing, making this eviction plausible in normal operation.

pr_agent/servers/utils.py[63-72]
pr_agent/servers/github_app.py[175-206]
pr_agent/settings/configuration.toml[234-255]
tests/e2e_tests/test_github_app.py[23-29]
pr_agent/tools/pr_reviewer.py[152-156]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DefaultDictWithTimeout` eviction is now active and can delete `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` while `handle_push_trigger_for_new_commits` is still processing. This can break synchronization: a waiter may block forever if `notify()` runs on a newly-created `Condition`, and `_duplicate_push_triggers[api_url] -= 1` can raise if the key was evicted.
## Issue Context
- The dicts are accessed, then the code awaits long-running PR processing (`agent.handle_request` via `_perform_auto_commands_github`) without touching those keys.
- TTL default is 300s, and the e2e GitHub App test allows ~5 minutes processing.
## Fix Focus Areas
- pr_agent/servers/github_app.py[175-206]
- pr_agent/servers/utils.py[63-78]
- pr_agent/settings/configuration.toml[234-252]
## Implementation notes
- Prefer explicit lifecycle cleanup: in the `finally` block, after decrementing, if the counter becomes `0`, delete both dict entries for `api_url` (guard with `try/except KeyError`).
- If you keep TTL, ensure it cannot evict in-flight state: either set TTL to a safe upper bound relative to worst-case processing time or periodically refresh/touch the key during processing.
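
Why eviction can strand a waiter is easier to see in a small simulation. The sketch below assumes a plain `defaultdict(asyncio.Condition)` in place of the TTL-backed dict and simulates eviction with `del`; the key "pr-1" and all local names are hypothetical.

```python
import asyncio
from collections import defaultdict


async def main():
    conditions = defaultdict(asyncio.Condition)  # stand-in for the TTL dict
    api_url = "pr-1"  # hypothetical key

    waiter_condition = conditions[api_url]    # object held by the waiting task
    del conditions[api_url]                   # simulate TTL eviction mid-flight
    notifier_condition = conditions[api_url]  # finishing task re-creates entry

    woke_up = False

    async def waiter():
        nonlocal woke_up
        async with waiter_condition:
            await waiter_condition.wait()
            woke_up = True

    task = asyncio.create_task(waiter())
    await asyncio.sleep(0)                    # let the waiter start waiting
    async with notifier_condition:
        notifier_condition.notify(1)          # reaches nobody: different object
    await asyncio.sleep(0.01)
    task.cancel()                             # otherwise the waiter blocks forever
    try:
        await task
    except asyncio.CancelledError:
        pass
    return waiter_condition is notifier_condition, woke_up


same_object, woke_up = asyncio.run(main())
```

Holding a local reference to the condition for the whole lifetime of the task, rather than looking it up again in the `finally` block, is one way to avoid notifying the wrong object.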

4. Cancelled wait leaks dedupe 🐞 Bug ☼ Reliability ⭐ New

Description

handle_push_trigger_for_new_commits() increments the per-PR counter and may await Condition.wait()
before entering the try/finally that decrements and removes the per-PR dedupe entries. If that wait
is cancelled/raises (only possible on the current_active_tasks==1 waiter path), the function exits
without cleanup and the per-PR dedupe state can remain stuck indefinitely because these dicts are
configured with ttl=None.

Code

pr_agent/servers/github_app.py[R181-198]

+    async with _pending_task_duplicate_push_conditions[api_url]:
+        current_active_tasks = _duplicate_push_triggers.setdefault(api_url, 0)
+        max_active_tasks = 2 if get_settings().github_app.push_trigger_pending_tasks_backlog else 1
+        if current_active_tasks >= max_active_tasks:
+            get_logger().info(
+                f"Skipping push trigger for {api_url=} because another event already triggered the same processing"
+            )
+            return {}
        # first task can enter, and second tasks too if backlog is enabled
        get_logger().info(
            f"Continue processing push trigger for {api_url=} because there are {current_active_tasks} active tasks"
        )
        _duplicate_push_triggers[api_url] += 1
-    else:
-        get_logger().info(
-            f"Skipping push trigger for {api_url=} because another event already triggered the same processing"
-        )
-        return {}
-    async with _pending_task_duplicate_push_conditions[api_url]:
        if current_active_tasks == 1:
-            # second task waits
+            # second task waits for the in-progress task to finish
            get_logger().info(
                f"Waiting to process push trigger for {api_url=} because the first task is still in progress"
            )

Evidence

The admission section can increment the counter and then await the condition before any try/finally
cleanup is established; cleanup (decrement/pop) only happens in the later finally. Since both dicts
are configured with ttl=None, leaked entries have no time-based eviction fallback.

pr_agent/servers/github_app.py[77-82]
pr_agent/servers/github_app.py[178-201]
pr_agent/servers/github_app.py[202-220]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`handle_push_trigger_for_new_commits()` increments `_duplicate_push_triggers[api_url]` and can suspend on `Condition.wait()` before entering the `try/finally` that decrements and cleans up. If the waiter is cancelled (or `wait()` raises) before reaching the `try/finally`, the per-PR counter/condition entry will not be decremented/removed.

Because these caches are now `ttl=None`, this leaked state can persist for the lifetime of the process and suppress/distort future push-trigger handling for that PR URL.

## Issue Context
The wait happens only for the second admitted task when backlog is enabled (`current_active_tasks == 1`). Cleanup logic currently lives in the later `finally` block.

## Fix Focus Areas
- pr_agent/servers/github_app.py[175-221]
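
One possible shape for a fix, offered as a sketch rather than the project's actual change: move the increment and the wait inside the same `try/finally` that performs cleanup, so a cancelled waiter still decrements. The names `counters` and `wait_then_process` are hypothetical, and a plain dict stands in for the dedupe cache.

```python
import asyncio

counters = {}  # hypothetical per-PR active-task counter


async def wait_then_process(api_url, condition, process):
    counters[api_url] = counters.get(api_url, 0) + 1
    try:
        # The wait sits inside try/finally, so cancellation still cleans up.
        async with condition:
            await condition.wait()
        await process()
    finally:
        counters[api_url] -= 1
        if counters[api_url] <= 0:
            counters.pop(api_url, None)


async def main():
    condition = asyncio.Condition()

    async def process():
        pass

    task = asyncio.create_task(wait_then_process("pr-1", condition, process))
    await asyncio.sleep(0)  # let the task reach condition.wait()
    during = dict(counters)
    task.cancel()
    try:
        await task
    except asyncio.CancelledError:
        pass
    return during, dict(counters)


during, after = asyncio.run(main())
```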

5. Silently ignores KeyError 📘 Rule violation ☼ Reliability

Description

The new except KeyError: ... pass suppresses unexpected state errors without logging, which can
hide real bugs and make incidents harder to diagnose. This conflicts with the requirement for robust
error handling rather than ignoring errors.

Code

pr_agent/servers/github_app.py[R210-212]

Evidence

PR Compliance ID 3 disallows ignored errors. The added code catches KeyError and performs no
logging/handling (pass), which is an ignored error path in newly changed lines.

Rule 3: Robust Error Handling
pr_agent/servers/github_app.py[210-212]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`handle_push_trigger_for_new_commits()` catches `KeyError` and silently ignores it with `pass`, which can hide unexpected runtime issues.
## Issue Context
The `finally` cleanup block is intended to be resilient to TTL eviction, but completely suppressing the exception provides no signal when the `KeyError` is unexpected.
## Fix Focus Areas
- pr_agent/servers/github_app.py[205-215]
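
A minimal sketch of handling the missing key with a log line instead of a bare `pass`. The standard-library `logging` module is used here only as a stand-in for the project's own logger, and the `decrement` helper and logger name are hypothetical.

```python
import logging

logger = logging.getLogger("dedupe_cleanup")  # hypothetical logger name


def decrement(counters, api_url):
    """Decrement the counter for api_url; log instead of silently passing."""
    try:
        counters[api_url] -= 1
        return True
    except KeyError:
        # The entry may have been removed already; record it so it is visible.
        logger.warning("No dedupe entry for api_url=%s during cleanup", api_url)
        return False
```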

Author self-review: I have reviewed the code review findings, and addressed the relevant ones.

Previous review results

Review updated until commit 3604957

Results up to commit 215c9c3

🐞 Bugs (3) 📘 Rule violations (1) 📜 Skill insights (0)

Results up to commit 74ded19

🐞 Bugs (3) 📘 Rule violations (1) 📜 Skill insights (0)

1. Non-atomic dedupe cleanup 🐞 Bug ☼ Reliability

Description

handle_push_trigger_for_new_commits() decides to clean up shared per-PR state while holding the
Condition lock, but performs the actual .pop() outside that lock, allowing a concurrent push-trigger
task to start in between. This can delete the condition/counter for an in-flight task and corrupt
the dedupe bookkeeping (missed wait/notify, inconsistent counters).

Code

pr_agent/servers/github_app.py[R203-215]

+        # release the waiting task block, then clean up if no tasks remain
+        should_cleanup = False
+        try:
+            async with _pending_task_duplicate_push_conditions[api_url]:
+                _pending_task_duplicate_push_conditions[api_url].notify(1)
+                _duplicate_push_triggers[api_url] -= 1
+                should_cleanup = _duplicate_push_triggers[api_url] <= 0
+        except KeyError:
+            # TTL eviction already cleaned up this entry
+            pass
+        if should_cleanup:
+            _duplicate_push_triggers.pop(api_url, None)
+            _pending_task_duplicate_push_conditions.pop(api_url, None)

Evidence

The admission path increments _duplicate_push_triggers[api_url] before acquiring the condition
lock, while the new cleanup code removes the dict entries after releasing the condition
lock—creating a check-then-act race window where a concurrent task can start and then have its state
popped out from under it.

pr_agent/servers/github_app.py[175-195]
pr_agent/servers/github_app.py[202-215]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`handle_push_trigger_for_new_commits()` computes `should_cleanup` under the per-PR `asyncio.Condition` lock, but then removes `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` after releasing that lock. Because the increment/admission path mutates `_duplicate_push_triggers` before acquiring the condition lock, a new task can interleave between the decrement/check and the pop, causing live state to be removed.
### Issue Context
This code is implementing a concurrency/deduplication gate for push triggers per `api_url`. The counter and the condition must have a consistent lifecycle across all concurrent tasks.
### Fix Focus Areas
- pr_agent/servers/github_app.py[175-215]
### Suggested fix direction
- Ensure **all** mutations and lifecycle transitions for both `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` are performed under a single, consistent synchronization mechanism.
- One workable approach: fetch/create the per-PR condition first, then `async with condition:` to protect **both** admission (`setdefault`/increment) and cleanup (decrement + possible pop) for that `api_url`.
- Alternatively, introduce a dedicated lock (global or per-`api_url`) that guards: setdefault/increment, decrement/check, and pop, so no new task can start between the cleanup decision and the removal.
- If you keep the `pop()`, perform it inside the same critical section as the `should_cleanup` decision, and re-check the counter immediately before removing entries.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

2. pop bypasses TTL bookkeeping 🐞 Bug ☼ Reliability

Description

handle_push_trigger_for_new_commits() removes DefaultDictWithTimeout entries via .pop(), which
does not update DefaultDictWithTimeout’s internal __key_times map. After this PR makes
__refresh() run, a later refresh can try to del self[key] for a key that was already popped from
the dict, raising KeyError and potentially failing webhook handling.

Code

pr_agent/servers/github_app.py[R214-215]

Evidence

The cleanup uses .pop() on DefaultDictWithTimeout, but the class only synchronizes its internal
timestamp map on __delitem__. __refresh() deletes based on __key_times using del self[key],
which will raise if the dict entry was already removed via .pop() and __key_times wasn’t
updated.

pr_agent/servers/github_app.py[203-215]
pr_agent/servers/utils.py[63-72]
pr_agent/servers/utils.py[84-86]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DefaultDictWithTimeout` maintains a private `__key_times` map that is only kept in sync when deletion goes through `__delitem__`. The new cleanup uses `.pop()`, which bypasses `__delitem__`, leaving stale timestamps behind.
Once `__refresh()` runs (and this PR makes it run at the correct cadence), it will build `to_delete` from `__key_times` and execute `del self[key]`. If the dict entry was previously removed via `.pop()`, `super().__delitem__(key)` raises `KeyError`, propagating out of `__refresh()`/`__getitem__()`.
## Issue Context
This affects both `_duplicate_push_triggers` and `_pending_task_duplicate_push_conditions`, which are `DefaultDictWithTimeout` instances.
## Fix Focus Areas
- pr_agent/servers/github_app.py[203-215]
- pr_agent/servers/utils.py[63-72]
- pr_agent/servers/utils.py[84-86]
## Suggested fix
Prefer one of:
1) **Don’t use `.pop()` on `DefaultDictWithTimeout`**; delete via `del` so `__delitem__` runs, e.g.:

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

3. TTL evicts in-flight state 🐞 Bug ≡ Correctness

Description

Now that DefaultDictWithTimeout.__refresh() actually runs, it can delete the asyncio.Condition and
counter entries used by handle_push_trigger_for_new_commits while the first task is still
running/waiting, if processing time exceeds the configured TTL. That can cause a waiting task to
never be notified (notify happens on a newly-created Condition) and/or raise KeyError when
decrementing _duplicate_push_triggers in the finally block.

Code

pr_agent/servers/utils.py[R67-68]

Evidence

DefaultDictWithTimeout will now perform periodic TTL-based deletion (every refresh_interval) of any
key whose last recorded access time is older than ttl. The GitHub App uses this structure to store
per-PR-url synchronization primitives and a running-task counter, but it does not touch these dict
keys during the long await of PR processing, so another webhook can trigger a refresh and evict the
key while it is still needed; default TTL is 300 seconds and the e2e test explicitly allows ~5
minutes for processing, making this eviction plausible in normal operation.

pr_agent/servers/utils.py[63-72]
pr_agent/servers/github_app.py[175-206]
pr_agent/settings/configuration.toml[234-255]
tests/e2e_tests/test_github_app.py[23-29]
pr_agent/tools/pr_reviewer.py[152-156]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`DefaultDictWithTimeout` eviction is now active and can delete `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` while `handle_push_trigger_for_new_commits` is still processing. This can break synchronization: a waiter may block forever if `notify()` runs on a newly-created `Condition`, and `_duplicate_push_triggers[api_url] -= 1` can raise if the key was evicted.
## Issue Context
- The dicts are accessed, then the code awaits long-running PR processing (`agent.handle_request` via `_perform_auto_commands_github`) without touching those keys.
- TTL default is 300s, and the e2e GitHub App test allows ~5 minutes processing.
## Fix Focus Areas
- pr_agent/servers/github_app.py[175-206]
- pr_agent/servers/utils.py[63-78]
- pr_agent/settings/configuration.toml[234-252]
## Implementation notes
- Prefer explicit lifecycle cleanup: in the `finally` block, after decrementing, if the counter becomes `0`, delete both dict entries for `api_url` (guard with `try/except KeyError`).
- If you keep TTL, ensure it cannot evict in-flight state: either set TTL to a safe upper bound relative to worst-case processing time or periodically refresh/touch the key during processing.
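
The "periodically refresh/touch the key during processing" option could look roughly like the sketch below. It is not taken from the PR: `keep_alive` and `run_with_keep_alive` are hypothetical helpers, and the approach assumes that reading a key refreshes its recorded access time.

```python
import asyncio
import contextlib


async def keep_alive(touch, interval):
    """Call touch() every `interval` seconds until cancelled."""
    while True:
        touch()
        await asyncio.sleep(interval)


async def run_with_keep_alive(work, touch, interval=60.0):
    """Await work() while a background task keeps the TTL entries fresh."""
    refresher = asyncio.create_task(keep_alive(touch, interval))
    try:
        return await work()
    finally:
        # Stop refreshing as soon as processing ends, on success or failure.
        refresher.cancel()
        with contextlib.suppress(asyncio.CancelledError):
            await refresher
```

Here `touch` would be a small callable that reads both dict entries for `api_url`, and `interval` should stay well below the TTL. Explicit cleanup in the `finally` block avoids this extra moving part, which is presumably why the notes list it first.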

4. Silently ignores KeyError 📘 Rule violation ☼ Reliability

Description

The new except KeyError: ... pass suppresses unexpected state errors without logging, which can
hide real bugs and make incidents harder to diagnose. This conflicts with the requirement for robust
error handling rather than ignoring errors.

Code

pr_agent/servers/github_app.py[R210-212]

Evidence

PR Compliance ID 3 disallows ignored errors. The added code catches KeyError and performs no
logging/handling (pass), which is an ignored error path in newly changed lines.

Rule 3: Robust Error Handling
pr_agent/servers/github_app.py[210-212]

Agent prompt

The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`handle_push_trigger_for_new_commits()` catches `KeyError` and silently ignores it with `pass`, which can hide unexpected runtime issues.
## Issue Context
The `finally` cleanup block is intended to be resilient to TTL eviction, but completely suppressing the exception provides no signal when the `KeyError` is unexpected.
## Fix Focus Areas
- pr_agent/servers/github_app.py[205-215]
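
One way to keep the cleanup resilient without hiding the error is to log it. The sketch below is an assumption-laden illustration: `decrement_active_tasks` is a hypothetical helper, and it uses the standard `logging` module in place of whatever logger the project uses.

```python
import logging

logger = logging.getLogger(__name__)


def decrement_active_tasks(counters, api_url):
    """Decrement the counter for api_url and return the new value.

    Returns None, after logging a warning, when the entry is missing
    (for example because TTL eviction already removed it).
    """
    try:
        counters[api_url] -= 1
        return counters[api_url]
    except KeyError:
        logger.warning(
            "No active-task counter for %s; it may have been evicted by TTL",
            api_url,
        )
        return None
```

The caller can then treat `None` as "nothing left to clean up" while the warning leaves a trace for diagnosis.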

karesansui-u · 2026-03-16T12:37:31Z

Good catch. You're right that once TTL eviction starts working correctly, it can evict entries while handle_push_trigger_for_new_commits is still processing — particularly if PR processing exceeds the 300s default TTL.

Two options:

1. Explicit lifecycle cleanup (preferred): In the finally block of handle_push_trigger_for_new_commits, after decrementing the counter, delete both dict entries when the counter reaches 0. Guard with try/except KeyError for the case where TTL already cleaned up. This makes the code self-cleaning regardless of TTL.
2. Increase TTL: Set the TTL high enough to cover worst-case processing time. Less robust but simpler.

I can add option 1 to this PR if that's the preferred direction. The fix to the guard condition itself is still correct — the current code never evicts anything.

qodo-free-for-open-source-projects · 2026-03-16T12:51:31Z

Code Review by Qodo

Looking for bugs?

Check back in a few minutes. An AI review agent is analyzing this pull request.

qodo-free-for-open-source-projects · 2026-05-25T05:46:21Z

Persistent review updated to latest commit 4dc7874

qodo-free-for-open-source-projects · 2026-05-25T05:49:02Z

+            _duplicate_push_triggers.pop(api_url, None)
+            _pending_task_duplicate_push_conditions.pop(api_url, None)


1. Pop bypasses ttl bookkeeping 🐞 Bug ☼ Reliability

handle_push_trigger_for_new_commits() removes DefaultDictWithTimeout entries via .pop(), which does not update DefaultDictWithTimeout’s internal __key_times map. After this PR makes __refresh() run, a later refresh can try to del self[key] for a key that was already popped from the dict, raising KeyError and potentially failing webhook handling.

Agent Prompt

## Issue description
`DefaultDictWithTimeout` maintains a private `__key_times` map that is only kept in sync when deletion goes through `__delitem__`. The new cleanup uses `.pop()`, which bypasses `__delitem__`, leaving stale timestamps behind. Once `__refresh()` runs (and this PR makes it run at the correct cadence), it will build `to_delete` from `__key_times` and execute `del self[key]`. If the dict entry was previously removed via `.pop()`, `super().__delitem__(key)` raises `KeyError`, propagating out of `__refresh()`/`__getitem__()`.
## Issue Context
This affects both `_duplicate_push_triggers` and `_pending_task_duplicate_push_conditions`, which are `DefaultDictWithTimeout` instances.
## Fix Focus Areas
- pr_agent/servers/github_app.py[203-215]
- pr_agent/servers/utils.py[63-72]
- pr_agent/servers/utils.py[84-86]
## Suggested fix
Prefer one of:

1) **Don’t use `.pop()` on `DefaultDictWithTimeout`**; delete via `del` so `__delitem__` runs, e.g.:
```python
try:
    del _duplicate_push_triggers[api_url]
except KeyError:
    pass
try:
    del _pending_task_duplicate_push_conditions[api_url]
except KeyError:
    pass
```
2) **Implement `pop()` (and possibly `clear()`/`popitem()`) on `DefaultDictWithTimeout`** to also remove `__key_times` entries (`self.__key_times.pop(key, None)`) before delegating to `super().pop(...)`.
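
Option 2 might be sketched as follows. This is a minimal model, not the project's class: `TTLDictWithPop`, `_key_times` (the real attribute is the name-mangled `__key_times`), and `tracked_keys` are hypothetical names, and only the timestamp bookkeeping is modelled.

```python
import time
from collections import defaultdict

_MISSING = object()  # sentinel so that pop(key, None) can be told apart from pop(key)


class TTLDictWithPop(defaultdict):
    """Hypothetical stand-in that keeps timestamps in sync on pop()."""

    def __init__(self, default_factory=None):
        super().__init__(default_factory)
        self._key_times = {}

    def __setitem__(self, key, value):
        self._key_times[key] = time.monotonic()
        super().__setitem__(key, value)

    def __delitem__(self, key):
        del self._key_times[key]
        super().__delitem__(key)

    def pop(self, key, default=_MISSING):
        # Drop the timestamp first so a later sweep never sees a stale key.
        self._key_times.pop(key, None)
        if default is _MISSING:
            return super().pop(key)  # still raises KeyError when absent
        return super().pop(key, default)

    def tracked_keys(self):
        """Keys that a TTL sweep would consider (for inspection in tests)."""
        return set(self._key_times)
```

With this override, `d.pop(key, None)` leaves no timestamp behind, so a later sweep has nothing stale to delete.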

naorpeled · 2026-06-22T18:29:29Z

Hey @karesansui-u,
Thanks for opening this!

Part of the issue was already resolved by another commit. Can you please rebase and add tests?

When the active task counter reaches 0, delete both dict entries immediately so TTL eviction cannot race with in-flight processing. Guard with try/except KeyError for the case where TTL already evicted.

qodo-free-for-open-source-projects · 2026-07-03T09:43:26Z

+        # release the waiting task block, then clean up if no tasks remain
+        should_cleanup = False
+        try:
+            async with _pending_task_duplicate_push_conditions[api_url]:
+                _pending_task_duplicate_push_conditions[api_url].notify(1)
+                _duplicate_push_triggers[api_url] -= 1
+                should_cleanup = _duplicate_push_triggers[api_url] <= 0
+        except KeyError:
+            # TTL eviction already cleaned up this entry
+            pass
+        if should_cleanup:
+            _duplicate_push_triggers.pop(api_url, None)
+            _pending_task_duplicate_push_conditions.pop(api_url, None)


1. Non-atomic dedupe cleanup 🐞 Bug ☼ Reliability

handle_push_trigger_for_new_commits() decides to clean up shared per-PR state while holding the Condition lock, but performs the actual .pop() outside that lock, allowing a concurrent push-trigger task to start in between. This can delete the condition/counter for an in-flight task and corrupt the dedupe bookkeeping (missed wait/notify, inconsistent counters).

Agent Prompt

### Issue description
`handle_push_trigger_for_new_commits()` computes `should_cleanup` under the per-PR `asyncio.Condition` lock, but then removes `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` after releasing that lock. Because the increment/admission path mutates `_duplicate_push_triggers` before acquiring the condition lock, a new task can interleave between the decrement/check and the pop, causing live state to be removed.
### Issue Context
This code is implementing a concurrency/deduplication gate for push triggers per `api_url`. The counter and the condition must have a consistent lifecycle across all concurrent tasks.
### Fix Focus Areas
- pr_agent/servers/github_app.py[175-215]
### Suggested fix direction
- Ensure **all** mutations and lifecycle transitions for both `_duplicate_push_triggers[api_url]` and `_pending_task_duplicate_push_conditions[api_url]` are performed under a single, consistent synchronization mechanism.
- One workable approach: fetch/create the per-PR condition first, then `async with condition:` to protect **both** admission (`setdefault`/increment) and cleanup (decrement + possible pop) for that `api_url`.
- Alternatively, introduce a dedicated lock (global or per-`api_url`) that guards: setdefault/increment, decrement/check, and pop, so no new task can start between the cleanup decision and the removal.
- If you keep the `pop()`, perform it inside the same critical section as the `should_cleanup` decision, and re-check the counter immediately before removing entries.
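
The dedicated-lock variant could be sketched as below. `PushGate` and its methods are hypothetical names, a single global lock is assumed for brevity, and only the counter is modelled; the per-PR condition would need to be created and removed inside the same critical sections.

```python
import asyncio


class PushGate:
    """Hypothetical gate: one lock guards both admission and cleanup."""

    def __init__(self):
        self._lock = asyncio.Lock()
        self._active = {}  # api_url -> number of admitted tasks

    async def admit(self, api_url):
        async with self._lock:
            self._active[api_url] = self._active.get(api_url, 0) + 1

    async def release(self, api_url):
        async with self._lock:
            remaining = self._active[api_url] - 1
            if remaining <= 0:
                # The decision and the removal share one critical section,
                # so no admit() can slip in between them.
                del self._active[api_url]
            else:
                self._active[api_url] = remaining

    def active(self, api_url):
        return self._active.get(api_url, 0)
```

A caller would `await gate.admit(api_url)` before processing and `await gate.release(api_url)` in its `finally` block; the entry disappears only when the last admitted task releases it.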

qodo-free-for-open-source-projects · 2026-07-03T09:43:28Z

Code review by qodo was updated up to the latest commit fd0a503

Update the completion test to assert the dedupe entries are removed (the new self-cleaning contract) and add coverage for the two new branches:
- cleanup is skipped while another task is still active
- a TTL-evicted counter is tolerated in the finally block (KeyError guard)

qodo-free-for-open-source-projects · 2026-07-03T09:49:51Z

Code review by qodo was updated up to the latest commit e10a438

Resolve CodeQL py/mixed-returns on handle_push_trigger_for_new_commits by returning {} explicitly on the success path, matching every other return in the function instead of falling through to an implicit None.

qodo-free-for-open-source-projects · 2026-07-03T13:23:12Z

Code review by qodo was updated up to the latest commit 53da118

- DefaultDictWithTimeout.pop() now keeps the internal __key_times map in sync. The finally block cleaned up via .pop(), which bypassed __delitem__ and left a stale timestamp; a later __refresh() would then run 'del self[key]' for an already-removed key and raise KeyError.
- Move the decrement, the no-tasks-left decision, and the entry removal all under the same per-PR condition lock so a newly admitted task cannot interleave between the decision and the removal.
- Log the KeyError from a TTL-evicted counter instead of silently passing, and leave any stray condition entry for the TTL sweep in that case.
- Add unit tests for pop() bookkeeping (incl. a regression for the refresh-after-pop KeyError).

qodo-free-for-open-source-projects · 2026-07-03T13:36:22Z

Code review by qodo was updated up to the latest commit 74ded19

CodeQL py/side-effect-in-assert: an assert expression is skipped under python -O, so the mutating pop() must run on its own line.
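
A small illustration of that rule, using a hypothetical helper rather than the PR's actual test code: the mutation runs as its own statement and the `assert` only inspects the result.

```python
def pop_then_check(mapping, key, expected):
    """Remove `key` and verify its value without mutating inside the assert."""
    # Risky form: `assert mapping.pop(key) == expected` is stripped under
    # `python -O`, so the pop would silently never happen.
    value = mapping.pop(key)   # always executes
    assert value == expected   # check only, no side effects
    return value
```

Under `-O` the check disappears but the entry is still removed, so the rest of the test sees the same state either way.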

qodo-free-for-open-source-projects · 2026-07-03T13:52:18Z

Code review by qodo was updated up to the latest commit 215c9c3

…ction

Addresses the remaining review feedback on the push-trigger dedupe:
- Non-atomic cleanup: admission (setdefault/increment/wait) now runs under the same per-PR condition lock as the finally cleanup, so the counter and condition have a single consistent lifecycle and no task can interleave between the cleanup decision and the removal.
- TTL evicts in-flight state: switch both dicts to ttl=None. The deterministic finally cleanup already bounds memory (entries are removed once no task remains), and disabling eviction removes the risk of deleting a condition/counter that an in-flight or waiting task still needs. Drop the now-unused push_trigger_pending_tasks_ttl config.
- With eviction gone and admission locked, the counter always exists during cleanup, so the defensive try/except KeyError (the last silently-handled error) is removed.

Replace the obsolete TTL-eviction test with a concurrent backlog test that exercises the wait/notify path and asserts the shared state is cleaned up only after both tasks drain.

qodo-free-for-open-source-projects · 2026-07-04T11:22:41Z

Code review by qodo was updated up to the latest commit 3604957

naorpeled · 2026-07-04T11:34:24Z

Closing this as the main issue was fixed by another PR. I tried to fix things further, but it feels like the change that landed was good enough.

I'm open to revisiting this in the future.
Thanks for your work on this!

karesansui-u mentioned this pull request Mar 16, 2026

Bug: DefaultDictWithTimeout TTL refresh guard is inverted — expired keys never cleaned up (memory leak) #2260

Closed

qodo-free-for-open-source-projects Bot reviewed Mar 16, 2026

Comment thread pr_agent/servers/utils.py

qodo-free-for-open-source-projects Bot reviewed May 25, 2026

Add explicit lifecycle cleanup for push trigger entries

fd0a503

When the active task counter reaches 0, delete both dict entries immediately so TTL eviction cannot race with in-flight processing. Guard with try/except KeyError for the case where TTL already evicted.

naorpeled force-pushed the fix/ttl-refresh-guard-inversion branch from 4dc7874 to fd0a503 Compare July 3, 2026 09:40

github-actions Bot added the bug label Jul 3, 2026

qodo-free-for-open-source-projects Bot reviewed Jul 3, 2026

fix(github_app): make handle_push_trigger return consistent

53da118

Resolve CodeQL py/mixed-returns on handle_push_trigger_for_new_commits by returning {} explicitly on the success path, matching every other return in the function instead of falling through to an implicit None.

github-advanced-security AI found potential problems Jul 3, 2026

Comment thread tests/unittest/test_github_app_timeout_core.py Fixed

Comment thread tests/unittest/test_github_app_timeout_core.py Fixed

test: hoist pop() calls out of assert expressions

215c9c3

CodeQL py/side-effect-in-assert: an assert expression is skipped under python -O, so the mutating pop() must run on its own line.

naorpeled closed this Jul 4, 2026

3 participants
