
feat: support CI artifact context injection and workflow_run trigger#2494

Open
MahmoudHaouachi wants to merge 3 commits into The-PR-Agent:main from idealo:feat/ci-artifact-context


@MahmoudHaouachi


Closes #2493

Summary

Adds two related enhancements that together enable a common CI/CD pattern: running PR-Agent after a prior workflow completes, with that workflow's artifacts (e.g. terraform plan, test report) injected as extra review context.

Changes

1. pr_agent/algo/artifacts.py (new)

Registry-based artifact parser that reads a local file and formats its content for injection into tool prompts. Three built-in parsers:

  • generic — plain context, useful for any CI output
  • terraform_plan — instructs the AI to verify infrastructure changes match the code diff and flag risky deletions
  • test_report — instructs the AI to correlate failures with the code changes

Key functions: resolve_artifact_path (handles relative/absolute, respects GITHUB_WORKSPACE), load_artifact_content (reads, truncates, formats; no-op unless enabled).
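The registry pattern and the `load_artifact_content()` gate described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `artifacts.py`: `PARSERS`, `register_parser`, and `load_artifact_text` are hypothetical names, settings are a plain dict instead of Dynaconf, the default label `"CI Artifact"` is invented, and falling back to `generic` for an unknown `artifact_type` is an assumption (the PR may skip injection instead).

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical registry: artifact_type -> formatter(content, label) -> prompt text
PARSERS: Dict[str, Callable[[str, str], str]] = {}


def register_parser(name: str):
    """Decorator that stores a formatter in the registry under `name`."""
    def decorator(func: Callable[[str, str], str]) -> Callable[[str, str], str]:
        PARSERS[name] = func
        return func
    return decorator


@register_parser("generic")
def parse_generic(content: str, label: str) -> str:
    # Plain context with no extra review instructions
    return f"{label}:\n{content}"


@register_parser("terraform_plan")
def parse_terraform_plan(content: str, label: str) -> str:
    return (
        f"{label} (terraform plan):\n{content}\n\n"
        "Verify that these infrastructure changes match the code diff "
        "and flag risky deletions."
    )


@register_parser("test_report")
def parse_test_report(content: str, label: str) -> str:
    return (
        f"{label} (test report):\n{content}\n\n"
        "Correlate any test failures with the code changes in this PR."
    )


def load_artifact_text(settings: dict, tool_name: str) -> str:
    """Return formatted artifact text, or "" when injection should not run."""
    if not settings.get("enable", False):
        return ""  # opt-in: disabled by default
    if tool_name not in settings.get("target_tools", []):
        return ""
    raw_path = settings.get("artifact_path", "")
    if not raw_path or not Path(raw_path).is_file():
        return ""
    content = Path(raw_path).read_text(encoding="utf-8", errors="replace")
    max_size = settings.get("max_artifact_size", 50000)
    if len(content) > max_size:
        content = content[:max_size] + "\n\n[... content truncated due to size limit ...]"
    # Assumption: unknown types fall back to the generic parser
    parser = PARSERS.get(settings.get("artifact_type", "generic"), PARSERS["generic"])
    return parser(content, settings.get("artifact_label", "CI Artifact"))
```

Adding a new artifact type in this design means registering one more formatter; the gate and truncation logic stay shared.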

2. pr_agent/servers/github_action_runner.py

Artifact injection block: reads ARTIFACT_PATH / ARTIFACT_TYPE env vars before event dispatch, calls load_artifact_content(), and appends the result to extra_instructions for pr_description, pr_code_suggestions, and pr_reviewer. Fully wrapped in try/except — safe no-op when ARTIFACT_PATH is not set.
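A sketch of the append step, assuming plain dicts in place of the Dynaconf sections the runner mutates. `inject_artifact_context` is a hypothetical helper; the separator string matches the one used in the PR's runner code.

```python
from typing import Dict, Iterable

SEPARATOR = "\n======\n\nCI Artifact Context:\n"


def inject_artifact_context(
    tool_settings: Dict[str, dict],
    artifact_text: str,
    target_tools: Iterable[str] = ("pr_description", "pr_code_suggestions", "pr_reviewer"),
) -> None:
    """Append artifact_text to each target tool's extra_instructions, in place."""
    if not artifact_text:
        return  # nothing to inject: safe no-op
    for tool in target_tools:
        section = tool_settings.get(tool)
        if section is None:
            continue  # tool not configured
        existing = section.get("extra_instructions", "")
        # Keep any user-provided instructions and add the artifact after them
        section["extra_instructions"] = (
            f"{existing}{SEPARATOR}{artifact_text}" if existing else artifact_text
        )
```

Because the artifact text is passed in once, all targeted tools share a single read of the file rather than loading it per tool.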

workflow_run handler: new elif GITHUB_EVENT_NAME == "workflow_run": branch. Extracts the PR URL from workflow_run.pull_requests[0].url and runs the same auto tools as the pull_request handler. Guards: skips non-pull_request origins and empty pull_requests arrays (fork PRs).
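The `workflow_run` guards can be expressed as a small pure function over the event payload. This is a sketch: `pr_url_from_workflow_run` is a hypothetical name, and it assumes the payload has already been parsed into a dict (GitHub Actions writes the event payload to a JSON file whose path is in `GITHUB_EVENT_PATH`).

```python
from typing import Optional


def pr_url_from_workflow_run(event: dict) -> Optional[str]:
    """Return the PR API URL from a workflow_run payload, or None to skip."""
    run = event.get("workflow_run") or {}
    # Only handle runs that were originally triggered by a pull_request event
    if run.get("event") != "pull_request":
        return None
    # pull_requests is empty for PRs opened from forks
    prs = run.get("pull_requests") or []
    if not prs:
        return None
    return prs[0].get("url")
```

A `None` result means the runner should skip; otherwise the value is the PR's API URL, which feeds the same auto tools as the `pull_request` handler.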

3. action.yaml

Two new optional inputs with safe defaults:

  • artifact_path (default "") — path to artifact relative to GITHUB_WORKSPACE
  • artifact_type (default "generic") — parser type

4. pr_agent/settings/configuration.toml

New [artifacts] section documenting all options (enable, artifact_path, artifact_type, artifact_label, target_tools, max_artifact_size).

5. Tests

  • tests/unittest/test_artifacts.py (new, 176 lines) — full coverage of the artifacts module
  • tests/unittest/test_github_action_runner_core.py — 3 new tests for workflow_run: runs tools, skips non-PR origin, skips empty pull_requests

6. github_action/entrypoint.sh

Added set -e for fail-fast behavior.


Example: terraform plan review via workflow_run

on:
  workflow_run:
    workflows: ["Terraform Plan"]
    types: [completed]

jobs:
  pr-agent:
    if: github.event.workflow_run.event == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - name: Download terraform plan
        uses: actions/download-artifact@v4
        with:
          run-id: ${{ github.event.workflow_run.id }}
          github-token: ${{ secrets.GITHUB_TOKEN }}
          name: terraform-plan
          path: plans

      - name: Review PR with plan context
        uses: The-PR-Agent/pr-agent@main
        env:
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          artifact_path: plans/plan.txt
          artifact_type: terraform_plan

Toggle / opt-out

  • Default: completely off — artifact_path defaults to "", artifact injection never runs
  • Enable: set artifact_path in action inputs, or [artifacts] enable = true + artifact_path in .pr_agent.toml
  • Disable: remove artifact_path from action inputs, or [artifacts] enable = false
  • The workflow_run handler only fires when GITHUB_EVENT_NAME=workflow_run — existing pull_request and issue_comment flows are completely unchanged
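The precedence between the action inputs and `[artifacts]` in `.pr_agent.toml` can be sketched as a simple merge. The env var names (`ARTIFACT_PATH`, `PR_AGENT_ARTIFACT_PATH`, `ARTIFACT_TYPE`, `PR_AGENT_ARTIFACT_TYPE`) follow the runner code quoted in the review; `effective_artifact_settings` is hypothetical and works on plain dicts rather than Dynaconf.

```python
import os
from typing import Mapping


def effective_artifact_settings(toml_artifacts: Mapping, env: Mapping[str, str] = os.environ) -> dict:
    """Merge the [artifacts] config with action inputs passed as env vars."""
    settings = dict(toml_artifacts)  # copy so the original config is untouched
    path = env.get("ARTIFACT_PATH") or env.get("PR_AGENT_ARTIFACT_PATH")
    art_type = env.get("ARTIFACT_TYPE") or env.get("PR_AGENT_ARTIFACT_TYPE")
    if path:
        # A non-empty artifact_path input turns injection on
        settings["enable"] = True
        settings["artifact_path"] = path
        if art_type:
            settings["artifact_type"] = art_type
    return settings
```

In this sketch a non-empty `artifact_path` input forces `enable = True`, so `[artifacts] enable = false` only takes effect when the input is left empty.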

Testing

The implementation has been validated end-to-end against real terraform plan artifacts using Bedrock Claude Haiku.

Adds a registry-based artifact parser system that reads a CI-produced
file (e.g. terraform plan output, test report) and injects its content
into tool extra_instructions as extra review context.

New files:
- pr_agent/algo/artifacts.py: core module with resolve_artifact_path,
  load_artifact_content, and three built-in parsers (generic,
  terraform_plan, test_report)
- tests/unittest/test_artifacts.py: full unit test suite

Configuration (off by default, opt-in via artifact_path action input):
  [artifacts]
  enable = false
  artifact_path = ""
  artifact_type = "generic"
  target_tools = ["pr_reviewer", "pr_description", "pr_code_suggestions"]
  max_artifact_size = 50000

Two additions to github_action_runner.py:

1. Artifact injection: reads ARTIFACT_PATH / ARTIFACT_TYPE env vars
   before event dispatch and appends parsed artifact content to
   extra_instructions for pr_description, pr_code_suggestions, and
   pr_reviewer. Delegates to pr_agent/algo/artifacts.py. No-op when
   ARTIFACT_PATH is not set.

2. workflow_run handler: new elif branch that handles
   GITHUB_EVENT_NAME=workflow_run. Extracts the PR API URL from
   workflow_run.pull_requests[0].url and runs the same auto tools
   (PRDescription, PRReviewer, PRCodeSuggestions) as the pull_request
   handler. Guards: skips non-pull_request origins and empty
   pull_requests arrays (fork PRs).

This enables the common pattern of downloading cross-workflow artifacts
before running PR-Agent:

  on:
    workflow_run:
      workflows: ["CI"]
      types: [completed]
  steps:
    - uses: actions/download-artifact@v4
      with:
        run-id: ${{ github.event.workflow_run.id }}
        github-token: ${{ secrets.GITHUB_TOKEN }}
        name: terraform-plan
    - uses: The-PR-Agent/pr-agent@main
      with:
        artifact_path: plan.txt
        artifact_type: terraform_plan

Exposes two optional inputs for the GitHub Action:
- artifact_path: path to a CI artifact file (relative to GITHUB_WORKSPACE
  or absolute). When set, artifact injection is automatically enabled.
- artifact_type: parser to apply — generic (default), terraform_plan,
  or test_report.

Both default to empty/generic so existing users see no behavior change.
Also adds `set -e` to entrypoint.sh for fail-fast behavior.
github-actions bot added the feature 💡 label on Jul 2, 2026
@qodo-free-for-open-source-projects

Contributor

PR Summary by Qodo

Support workflow_run trigger and inject CI artifact context into PR tools

✨ Enhancement 🧪 Tests ⚙️ Configuration changes 🕐 40+ Minutes


AI Description

• Add registry-based artifact parsers to inject CI outputs into PR-Agent prompts.
• Extend GitHub Action runner to support workflow_run events and run auto tools.
• Expose artifact path/type as action inputs and document artifacts configuration defaults.
Diagram

graph TD
  A{{"GitHub event"}} --> B["github_action_runner.py"] --> C["Dynaconf settings"] --> D["artifacts.py"] --> E[("Artifact file")]
  B --> F["PR tools"] --> G["GitHub PR API"]
  C --> F

  subgraph Legend
    direction LR
    _evt{{"Event"}} ~~~ _proc["Process/Module"] ~~~ _file[("File")]
  end
High-Level Assessment

The following are alternative approaches to this PR:

1. Fetch artifacts via GitHub API instead of local file
  • ➕ No dependency on workspace file layout or download-artifact step
  • ➕ Could support larger/multiple artifacts with pagination
  • ➖ Requires additional API calls/permissions and artifact lookup logic
  • ➖ More complexity and failure modes than local file injection
2. Support multiple artifacts as a list input
  • ➕ Better matches real CI runs (plan + test report + lints)
  • ➕ Avoids overloading a single artifact with mixed content
  • ➖ Needs prompt-size budgeting and ordering/labeling strategy
  • ➖ Expands configuration/testing surface area
3. Attach artifact context as a PR comment link/summary instead of prompt injection
  • ➕ Keeps model prompts smaller and more predictable
  • ➕ Makes artifact context visible/auditable in GitHub UI
  • ➖ Less direct for model reasoning unless comments are always fetched
  • ➖ Potentially noisy PR timeline depending on frequency

Recommendation: The PR’s approach (opt-in local file injection + parser registry) is the right default: it’s simple, safe (no-op when unset), and keeps permissions minimal. Consider multi-artifact support as a future extension once prompt-size limits and prioritization rules are defined.

Files changed (7) +507 / -0

Enhancement (2) +209 / -0
artifacts.py (+132/-0): Add artifact parsing/formatting module with registry and truncation

• Implements artifact path resolution (absolute, workspace-relative, or CWD) and safe file reading with size truncation. Adds a parser registry with built-in generic, terraform_plan, and test_report formatters and a load_artifact_content() gate controlled by settings.

pr_agent/algo/artifacts.py

github_action_runner.py (+77/-0): Inject artifact context and add workflow_run event handling

• Adds an opt-in artifact injection block that enables ARTIFACTS settings from env vars and appends parsed artifact text to extra_instructions for selected tools. Adds a workflow_run handler that extracts the PR URL from the payload, applies repo settings, and runs the same auto tools with guards for non-PR origins and empty pull_requests arrays.

pr_agent/servers/github_action_runner.py

Tests (2) +271 / -0
test_artifacts.py (+176/-0): Add unit tests for artifact resolution, parsing, and load gating

• Covers path resolution behaviors (workspace vs CWD), file truncation, parser registry contents, and load_artifact_content() gating (disabled/missing path/unknown type). Verifies terraform_plan parsing and generic fallback behavior.

tests/unittest/test_artifacts.py

test_github_action_runner_core.py (+95/-0): Add workflow_run dispatch tests and skip-guard coverage

• Adds tests ensuring workflow_run triggers the expected auto tools when originating from pull_request. Verifies the runner skips execution for non-pull_request origins and when the payload has no pull_requests entries.

tests/unittest/test_github_action_runner_core.py

Other (3) +27 / -0
action.yaml (+12/-0): Add artifact_path/type inputs and map them to runner env vars

• Introduces optional action inputs for artifact_path and artifact_type with safe defaults. Exposes them as ARTIFACT_PATH/ARTIFACT_TYPE environment variables for the container runner.

action.yaml

entrypoint.sh (+1/-0): Fail fast in GitHub Action entrypoint

• Adds 'set -e' so the action exits on the first failing command instead of continuing silently.

github_action/entrypoint.sh

configuration.toml (+14/-0): Document artifacts configuration section and defaults

• Adds an [artifacts] section documenting enable/path/type/label/target_tools and max_artifact_size, with injection off by default.

pr_agent/settings/configuration.toml

@qodo-free-for-open-source-projects

Contributor

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (1) 📎 Requirement gaps (1) 📜 Skill insights (0)




Action required

1. Artifact path can escape workspace 📎 Requirement gap ⛨ Security
Description
resolve_artifact_path() accepts absolute paths and falls back to resolving relative paths against
the current working directory, allowing ARTIFACT_PATH to escape GITHUB_WORKSPACE (including ..
traversal) and read arbitrary local files instead of workspace-local artifacts. Because the resolved
file content is injected into tool extra_instructions and rendered into LLM prompts, this violates
the workspace-relative artifact requirement and can leak unintended sensitive files to the model
provider.
Code

pr_agent/algo/artifacts.py[R60-77]

+def resolve_artifact_path(path: str) -> Optional[Path]:
+    if not path:
+        return None
+
+    artifact_path = Path(path)
+    if artifact_path.is_absolute():
+        return artifact_path if artifact_path.is_file() else None
+
+    workspace = os.environ.get("GITHUB_WORKSPACE", "")
+    if workspace:
+        resolved = Path(workspace) / artifact_path
+        if resolved.is_file():
+            return resolved
+
+    resolved = artifact_path.resolve()
+    if resolved.is_file():
+        return resolved
+
Evidence
The requirement is to read artifacts from a GITHUB_WORKSPACE-relative path, but the current
implementation returns an arbitrary absolute path when provided and otherwise falls back to
artifact_path.resolve() (which depends on the process CWD) without verifying that the resolved
target remains inside GITHUB_WORKSPACE, enabling traversal/escape to any readable file in the
container. Separately, the GitHub Action runner takes ARTIFACT_PATH, reads the resolved artifact
contents, appends that content into tool extra_instructions, and the PR review flow renders those
instructions into prompts that are sent via chat_completion; since the runner environment can
include secrets (e.g., GITHUB_TOKEN/OPENAI_KEY in env or files like /proc/self/environ),
arbitrary file read becomes prompt-based exfiltration.

Artifact file is read from GITHUB_WORKSPACE-relative artifact_path and injected into extra_instructions for configured tools
pr_agent/algo/artifacts.py[60-77]
pr_agent/algo/artifacts.py[60-90]
pr_agent/servers/github_action_runner.py[33-66]
pr_agent/servers/github_action_runner.py[110-139]
pr_agent/tools/pr_reviewer.py[77-102]
pr_agent/tools/pr_reviewer.py[203-225]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
`resolve_artifact_path()` currently permits resolving artifact paths outside `GITHUB_WORKSPACE` by allowing absolute paths and by resolving relative paths against the container’s current working directory. Because the resolved file contents are injected into tool `extra_instructions` and rendered into prompts sent to the model, this breaks the workspace-relative artifact contract and creates an arbitrary local file read → prompt exfiltration risk.

## Issue Context
Compliance requires artifact reading to be `GITHUB_WORKSPACE`-relative and effectively confined to that directory before injecting content into `extra_instructions`. Today, `ARTIFACT_PATH` is sourced from environment variables in the GitHub Action runner, the artifact text is loaded and injected into `extra_instructions` for tools, and those instructions are rendered into prompts and sent to the model via `chat_completion`; the runner also has access to sensitive data (e.g., `GITHUB_TOKEN`/`OPENAI_KEY`), so allowing artifacts to point outside the workspace can leak unintended local files.

## Fix Focus Areas
- pr_agent/algo/artifacts.py[60-78]
- pr_agent/servers/github_action_runner.py[110-139]
- pr_agent/settings/configuration.toml[371-383]

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools
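One possible remediation for the traversal finding, sketched under stated assumptions: the helper `resolve_in_workspace` is hypothetical, it drops the CWD fallback entirely, and it accepts an absolute path only if that path resolves inside the workspace. `Path.is_relative_to` requires Python 3.9 or newer.

```python
from pathlib import Path
from typing import Optional


def resolve_in_workspace(path: str, workspace: str) -> Optional[Path]:
    """Resolve `path` against `workspace` and reject anything outside it."""
    if not path or not workspace:
        return None  # no workspace means no safe root: refuse rather than guess
    root = Path(workspace).resolve()
    # Joining an absolute path replaces root, so absolute inputs are checked too
    candidate = (root / path).resolve()
    # resolve() collapses ".." and follows symlinks before the containment check
    if not candidate.is_relative_to(root):
        return None
    return candidate if candidate.is_file() else None
```

Because resolution happens before the containment check, a symlink inside the workspace that points elsewhere is rejected as well.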



Remediation recommended

2. Broad except Exception masks errors 📘 Rule violation ⛨ Security
Description
The artifact-injection block uses a broad except Exception and only logs at info, which can
silently mask configuration/path-validation failures in a security-relevant code path. This reduces
visibility and can hide unsafe or unintended behavior.
Code

pr_agent/servers/github_action_runner.py[R110-141]

+    # Inject artifact content into extra_instructions for configured tools
+    try:
+        ARTIFACT_PATH = os.environ.get('ARTIFACT_PATH') or os.environ.get('PR_AGENT_ARTIFACT_PATH')
+        ARTIFACT_TYPE = os.environ.get('ARTIFACT_TYPE') or os.environ.get('PR_AGENT_ARTIFACT_TYPE')
+        if ARTIFACT_PATH:
+            get_settings().set("ARTIFACTS.ENABLE", True)
+            get_settings().set("ARTIFACTS.ARTIFACT_PATH", ARTIFACT_PATH)
+            if ARTIFACT_TYPE:
+                get_settings().set("ARTIFACTS.ARTIFACT_TYPE", ARTIFACT_TYPE)
+
+        artifacts_enabled = get_settings().get("ARTIFACTS.ENABLE", False)
+        if is_true(artifacts_enabled):
+            from pr_agent.algo.artifacts import load_artifact_content
+
+            get_logger().info("Artifact injection enabled, processing artifacts")
+            for key in get_settings():
+                setting = get_settings().get(key)
+                if str(type(setting)) == "<class 'dynaconf.utils.boxing.DynaBox'>":
+                    if key.lower() in ['pr_description', 'pr_code_suggestions', 'pr_reviewer']:
+                        artifact_text = load_artifact_content(key.lower())
+                        if artifact_text:
+                            if hasattr(setting, 'extra_instructions'):
+                                extra_instructions = setting.extra_instructions
+                                separator = "\n======\n\nCI Artifact Context:\n"
+                                updated_instructions = (
+                                    str(extra_instructions) + separator + artifact_text
+                                    if extra_instructions else artifact_text
+                                )
+                                setting.extra_instructions = updated_instructions
+                                get_logger().info(f"Injected artifact context into {key}")
+    except Exception as e:
+        get_logger().info(f"github action: failed to process artifacts: {e}")
Evidence
The checklist requires treating configuration and logs as security boundaries and avoiding broad
exception masking in security paths. The new artifact injection code wraps the entire injection flow
in except Exception, potentially suppressing validation/normalization failures and continuing
execution without clear signaling.

pr_agent/servers/github_action_runner.py[110-141]
Best Practice: Learned patterns

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Artifact injection is wrapped in a broad `except Exception`, which can mask important failures and reduces the safety of configuration-as-input handling.

## Issue Context
Configuration and filesystem path handling are security boundaries; failures should be explicit, validated, and handled with targeted exceptions where feasible.

## Fix Focus Areas
- pr_agent/servers/github_action_runner.py[110-141]
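A narrower alternative to the blanket `except Exception`, assuming the standard `logging` module as a stand-in for PR-Agent's `get_logger()`: only filesystem errors are treated as a soft failure and logged at warning level, while anything else propagates. `read_artifact_or_empty` and the logger name are hypothetical.

```python
import logging
from pathlib import Path

logger = logging.getLogger("pr_agent.artifacts")  # hypothetical logger name


def read_artifact_or_empty(path: Path) -> str:
    """Read an artifact, failing soft only on expected filesystem errors."""
    try:
        return path.read_text(encoding="utf-8", errors="replace")
    except FileNotFoundError:
        logger.warning("Artifact file not found: %s", path)
    except PermissionError:
        logger.warning("Artifact file is not readable: %s", path)
    except OSError as exc:
        # Other I/O problems, e.g. the path is a directory
        logger.warning("Failed to read artifact file %s: %s", path, exc)
    # Non-OSError exceptions (programming errors) are not caught and propagate
    return ""
```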



3. Artifact read loads full file 🐞 Bug ➹ Performance
Description
_read_and_truncate() reads the entire artifact into memory before truncating, and
github_action_runner may call load_artifact_content() separately for each targeted tool, multiplying
I/O and memory usage. Large CI artifacts can cause slow runs or excessive memory usage even when
max_artifact_size is small.
Code

pr_agent/algo/artifacts.py[R81-90]

+def _read_and_truncate(path: Path, max_size: int) -> str:
+    try:
+        content = path.read_text(encoding="utf-8", errors="replace")
+    except (OSError, IOError) as e:
+        get_logger().warning(f"Failed to read artifact file {path}: {e}")
+        return ""
+
+    if len(content) > max_size:
+        content = content[:max_size] + "\n\n[... content truncated due to size limit ...]"
+    return content
Evidence
The artifacts reader uses read_text() (full file read) prior to truncation, and the action runner
calls load_artifact_content() inside a loop over tools, causing redundant reads for the same
artifact.

pr_agent/algo/artifacts.py[81-90]
pr_agent/servers/github_action_runner.py[124-139]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
Artifact truncation is applied only after reading the full file into memory, and the runner calls `load_artifact_content()` per tool. This makes the action unnecessarily slow and memory-hungry for large artifacts.

## Issue Context
`max_artifact_size` is documented as a limit, but currently it doesn't prevent the initial full read.

## Fix Focus Areas
- pr_agent/algo/artifacts.py[81-90]
- pr_agent/servers/github_action_runner.py[124-139]

## Implementation notes
- Change `_read_and_truncate()` to read at most `max_size + 1` characters (or bytes) from disk, then append the truncation marker if needed.
 - Example: `with path.open('r', encoding='utf-8', errors='replace') as f: content = f.read(max_size + 1)`.
- Avoid repeated reads/parsing:
 - Option A: compute `artifact_text` once in `github_action_runner.py` and reuse it for all targeted tools.
 - Option B: cache the parsed artifact text inside `load_artifact_content()` keyed by `(artifact_path, artifact_type, max_size, label)` for the process lifetime.
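The bounded read and per-process caching from the implementation notes can be sketched like this. `read_and_truncate` and `cached_artifact_text` are hypothetical names, and the cache keys only on path and size because parsing and labeling are assumed to happen after the raw read.

```python
from functools import lru_cache
from pathlib import Path

TRUNCATION_MARKER = "\n\n[... content truncated due to size limit ...]"


def read_and_truncate(path: Path, max_size: int) -> str:
    """Read at most max_size + 1 characters; the extra one detects overflow."""
    with path.open("r", encoding="utf-8", errors="replace") as f:
        content = f.read(max_size + 1)
    if len(content) > max_size:
        content = content[:max_size] + TRUNCATION_MARKER
    return content


@lru_cache(maxsize=8)
def cached_artifact_text(path_str: str, max_size: int) -> str:
    """Process-lifetime cache so several tools share one read of the same file."""
    return read_and_truncate(Path(path_str), max_size)
```

In text mode `f.read(n)` counts characters, not bytes, so multibyte UTF-8 content can pull somewhat more than `max_size` bytes from disk. The cache also assumes the file does not change during the run.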




ⓘ 1 issue published inline · 3 in summary

  • Author self-review: I have reviewed the code review findings and addressed the relevant ones.


Comment on lines +60 to +77
def resolve_artifact_path(path: str) -> Optional[Path]:
    if not path:
        return None

    artifact_path = Path(path)
    if artifact_path.is_absolute():
        return artifact_path if artifact_path.is_file() else None

    workspace = os.environ.get("GITHUB_WORKSPACE", "")
    if workspace:
        resolved = Path(workspace) / artifact_path
        if resolved.is_file():
            return resolved

    resolved = artifact_path.resolve()
    if resolved.is_file():
        return resolved


Action required

1. Artifact path can escape workspace 📎 Requirement gap ⛨ Security


@naorpeled

Member

Hey @MahmoudHaouachi,
First of all, thanks for this. WDYT about dropping the specific artifact types and instead using the generic artifact prompt by default, while allowing the prompt itself to be overridden within the action?
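A sketch of the reviewer's suggestion: one generic default prompt that an action input could override. The template text, `build_artifact_prompt`, and the idea of an override input (for example a hypothetical `artifact_prompt`) are illustrative assumptions, not part of the PR.

```python
from typing import Optional

# Hypothetical default template; {label} and {content} are filled in at runtime
DEFAULT_ARTIFACT_PROMPT = (
    "The following {label} was produced by CI for this PR. "
    "Use it as additional context for your review.\n\n{content}"
)


def build_artifact_prompt(content: str, label: str = "CI artifact",
                          prompt_override: Optional[str] = None) -> str:
    """Render the artifact prompt, preferring a user-supplied template."""
    template = prompt_override or DEFAULT_ARTIFACT_PROMPT
    # Only the template is parsed for fields; content is inserted verbatim
    return template.format(label=label, content=content)
```

Since only the template goes through `str.format`, braces inside the artifact content are safe; a custom template that needs literal braces would have to escape them as `{{` and `}}`.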

@@ -0,0 +1,176 @@
import os
import tempfile
from pathlib import Path
from unittest.mock import patch, MagicMock

import pytest


Successfully merging this pull request may close these issues.

feat: Support CI artifact context injection and workflow_run trigger for GitHub Action

3 participants