Tighten anomaly detector test coverage and subtest structure#43364
Conversation
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
Co-authored-by: pelikhan <4175913+pelikhan@users.noreply.github.com>
|
✅ Test Quality Sentinel completed test quality analysis. |
|
|
|
🧠 Matt Pocock Skills Reviewer has completed the skills-based review. ✅ |
|
✅ Design Decision Gate 🏗️ completed the design decision gate check. No ADR enforcement needed: PR does not have the 'implementation' label and has ≤100 new lines of code in business logic directories (93 additions counted). |
There was a problem hiding this comment.
Pull request overview
This PR strengthens pkg/agentdrain unit tests around anomaly detection and event analysis by adding coverage for previously untested edge cases and restructuring existing tests for clearer, independent subcases.
Changes:
- Added
assertAnomalyScorehelper to standardize zero vs non-zero anomaly score assertions. - Added coverage for
Miner.AnalyzeEventreturning an error when masking produces an empty token stream. - Refactored
TestAnalyzeEventinto independentt.Run(...)subtests and modernized threshold validation assertions withrequire.ErrorContains.
Show a summary per file
| File | Description |
|---|---|
| pkg/agentdrain/anomaly_test.go | Adds new edge-case test coverage, refactors TestAnalyzeEvent into subtests, and updates assertion helpers/patterns. |
Review details
Tip
Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Files reviewed: 1/1 changed files
- Comments generated: 1
- Review effort level: Low
| assert.Panics(t, func() { | ||
| detector.Analyze(nil, false, &Cluster{ID: 1, Template: []string{"a"}, Size: 1}) | ||
| }, "Analyze should panic when result is nil for an existing template") |
🧪 Test Quality Sentinel Report✅ Test Quality Score: 84/100 — Excellent
📊 Metrics (5 tests)
Structural ImprovementsThe PR also delivers three notable quality improvements beyond new tests:
Inflation Note
Verdict
Warning Firewall blocked 1 domainThe following domain was blocked by the firewall during workflow execution:
network:
allowed:
- defaults
- "awmgmcpg"See Network Configuration for more information.
|
There was a problem hiding this comment.
Review Summary
The structural improvements here are solid: the subtest refactor adds clear test isolation, require.ErrorContains is the right helper, and assertAnomalyScore correctly distinguishes zero from near-zero.
Two actionable issues need attention before merge:
1. Nil result panic is a production code hazard (blocking)
TestAnomalyDetector_Analyze_PanicsOnNilResultParameter correctly observes that passing (nil, false, ...) panics. But the panic is an accidental nil-dereference in production code at:
LowSimilarity: !isNew && result.Similarity < d.threshold, // NPE when result == nilThe test enshrines this hazard without fixing or explicitly documenting it in anomaly.go. The right fix is either a defensive nil-guard in Analyze or an explicit panic("Analyze: result must not be nil when isNew is false") with a godoc note — not a test that silently accepts accidental NPE behavior.
2. setupEvents intent undocumented (non-blocking)
The per-subtest miner construction plus setupEvents slice are idiomatic but not self-explanatory. A brief comment explaining that each subtest seeds a fresh miner would prevent future regressions.
What is good
assertAnomalyScorehelper correctly separates exact-zero assertions from delta-based checks.- Converting
TestAnalyzeEventto independent subtests with per-case miner construction is the right pattern. require.ErrorContainsoverassert.Contains(err.Error(), ...)is strictly better.TestAnalyzeEvent_EmptyAfterMaskingcovers the masking guard inminer.go:134that previously had no test.
🧵 Reviewed using Impeccable skills by Impeccable Skills Reviewer · 70.6 AIC · ⌖ 6.05 AIC · ⊞ 4.8K
| require.NoError(t, err, "NewAnomalyDetector should succeed") | ||
| require.NotNil(t, detector, "NewAnomalyDetector should return a non-nil detector") | ||
|
|
||
| assert.Panics(t, func() { |
There was a problem hiding this comment.
This test correctly documents the current behavior, but it encodes a silent nil-dereference panic as "expected." The Analyze function guards cluster with cluster != nil, but there is no equivalent guard for result.
Consider one of:
- Preferred – Add a nil guard in
Analyzeso a nil result is treated as "no existing match":
LowSimilarity: !isNew && result != nil && result.Similarity < d.threshold,- Explicit contract – Add a godoc note and deliberate panic call so callers know this is a programming error, not a runtime edge-case.
As written the test normalises a latent API hazard in the production code without any comment in anomaly.go explaining that callers must never pass (nil, false, ...).
@copilot please address this.
| @@ -371,9 +392,6 @@ func TestBuildReason(t *testing.T) { | |||
|
|
|||
| func TestAnalyzeEvent(t *testing.T) { | |||
| cfg := DefaultConfig() | |||
There was a problem hiding this comment.
The shared evtPlan and evtFinish variables are defined in the outer function scope (lines 397-403) and captured by all subtests. The values are immutable structs so this is safe, but each subtest also creates a fresh miner while reusing the same event variables.
The issue is that the setupEvents mechanism relies on inserting evtPlan as setup before testing evtPlan again. Because the miner is fresh each time, the setup correctly seeds the miner before the actual assertion. This is fine.
However, the comment (or lack thereof) at the top of the table makes the setupEvents pattern non-obvious. Consider documenting the intent, e.g.:
// setupEvents seeds the miner before the test event so it sees a pre-warmed state.
// Every subtest gets a fresh miner; setup events are re-applied per case.This will prevent future maintainers from removing the per-subtest miner construction (and breaking isolation) thinking it is unnecessary. @copilot please address this.
There was a problem hiding this comment.
Skills-Based Review 🧠
Applied /tdd — commenting with minor improvement suggestions; no blocking correctness issues found.
📋 Key Themes & Highlights
Positive Highlights
- ✅ Excellent gap coverage: adds tests for both the nil-result panic path and the empty-after-masking error path — two realistic failure modes that were previously uncovered
- ✅ Subtest isolation: converting
TestAnalyzeEventfrom stateful sequential steps to independent table-driven subtests with fresh miners per case is a meaningful correctness improvement - ✅ Better assertions: switching
err.Error()string matching torequire.ErrorContainsis idiomatic and reports failures more clearly - ✅
assertAnomalyScorehelper cleanly separates the zero vs. near-zero distinction — the rationale is sound
Suggestions (non-blocking)
- Panic test needs context (line 252) —
assert.Panicslocks in a latent bug as expected behavior; a TODO or companion issue would make the technical debt explicit - Parameter naming on the helper (line 12) —
want, gotdiverges from testify convention (expected, actual); low risk but could cause confusion in future callers - Missing
t.Parallel()in table subtests (line 439) — subtests create fresh state per case, so parallelism is safe and would improve CI throughput - Setup loop diagnostic message (line 444) — include an index in the error message so failing setup steps are immediately identifiable
- Masking test intent (line 472) — the empty
Fieldsis non-obvious; a brief comment would clarify why no fields are needed
Warning
Firewall blocked 1 domain
The following domain was blocked by the firewall during workflow execution:
proxy.golang.org
To allow these domains, add them to the
network.allowedlist in your workflow frontmatter:
network:
allowed:
- defaults
- "proxy.golang.org"See Network Configuration for more information.
🧠 Reviewed using Matt Pocock's skills by Matt Pocock Skills Reviewer · 112 AIC · ⌖ 5.8 AIC · ⊞ 6.6K
Comment /matt to run again
| require.NoError(t, err, "NewAnomalyDetector should succeed") | ||
| require.NotNil(t, detector, "NewAnomalyDetector should return a non-nil detector") | ||
|
|
||
| assert.Panics(t, func() { |
There was a problem hiding this comment.
[/tdd] assert.Panics here documents a latent nil-dereference as expected behavior, which could mislead future readers into treating the panic as an intentional contract rather than a defect.
💡 Suggestion
If the goal is purely to document existing behavior, add an explicit comment:
// This test pins the current panic; a follow-up should make Analyze return an
// error instead so callers can handle the failure gracefully.Or, open a companion issue and reference it here so the debt is trackable.
@copilot please address this.
| "github.com/stretchr/testify/require" | ||
| ) | ||
|
|
||
| func assertAnomalyScore(t *testing.T, want, got float64) { |
There was a problem hiding this comment.
[/tdd] The helper signature assertAnomalyScore(t, want, got) diverges from testify's own convention of (t, expected, actual) — but more practically, the argument order is the opposite of how assert.InDelta is called internally (assert.InDelta(t, want, got, ...)), which is correct. The concern is that callers reading the signature may accidentally swap want/got and get silent false-positives.
💡 Suggestion
Consider renaming the parameters to match testify convention (expected, actual) or adding a brief doc comment:
// assertAnomalyScore checks that actual equals expected.
// Pass expected first, actual second — matching testify convention.
func assertAnomalyScore(t *testing.T, expected, actual float64) {@copilot please address this.
| } | ||
|
|
||
| for _, tt := range tests { | ||
| t.Run(tt.name, func(t *testing.T) { |
There was a problem hiding this comment.
[/tdd] The new table-driven subtests are a clear improvement in isolation, but none of them call t.Parallel() — the outer loop captures tt by reference, so this is actually a pre-Go-1.22 loop-variable hazard if parallelism were added later.
💡 Suggestion
Either add t.Parallel() now (which also exercises the isolation you've built) or pin the loop variable explicitly to make it safe if parallelism is added later:
for _, tt := range tests {
tt := tt // pin for parallel safety (pre-Go 1.22)
t.Run(tt.name, func(t *testing.T) {
t.Parallel()
// ...
})
}If intentionally sequential (e.g., shared miner state), add a comment explaining why.
@copilot please address this.
| assert.True(t, reportDistinct.IsNewTemplate, "IsNewTemplate mismatch for distinct event") | ||
| assert.InDelta(t, 0.65, reportDistinct.AnomalyScore, 1e-9, "AnomalyScore mismatch for distinct event") | ||
| assert.Equal(t, "new log template discovered; rare cluster (few observations)", reportDistinct.Reason, "Reason mismatch for distinct event") | ||
| result, report, err := m.AnalyzeEvent(AgentEvent{Stage: "plan"}) |
There was a problem hiding this comment.
[/tdd] The masking test uses AgentEvent{Stage: "plan"} with no Fields. Because FlattenEvent emits "stage=plan" even without fields, the ".+" mask rule erases it, so the test correctly triggers the empty-after-masking path. However, this is non-obvious — a reader might wonder why no fields are needed.
💡 Suggestion
Add a brief inline comment to make the intent explicit:
// No Fields needed: the mask rule ".+" erases the stage token too,
// leaving an empty string after masking.
result, report, err := m.AnalyzeEvent(AgentEvent{Stage: "plan"})Also consider adding a second case where Fields is set to show that masking applies to the full flattened string, not just the fields portion.
@copilot please address this.
| require.NoError(t, err, "NewMiner should succeed") | ||
| require.NotNil(t, m, "NewMiner should return a non-nil miner") | ||
|
|
||
| for _, setupEvent := range tt.setupEvents { |
There was a problem hiding this comment.
[/tdd] The setup loop silently discards the result values; if a setup AnalyzeEvent returns non-nil results, that's unverified. More importantly, if setup fails we know it failed but we have no diagnostic about which setup event caused it.
💡 Suggestion
Include an index or event identifier in the error message so failures are easier to diagnose:
for i, setupEvent := range tt.setupEvents {
_, _, err := m.AnalyzeEvent(setupEvent)
require.NoError(t, err, "AnalyzeEvent setup[%d] should not fail", i)
}@copilot please address this.
There was a problem hiding this comment.
Two correctness/quality issues need addressing before merge, plus one robustness concern.
The structural refactoring (table-driven subtests, independent miners per case, require.ErrorContains cleanup, assertAnomalyScore helper) is sound and an improvement. The new TestAnalyzeEvent_EmptyAfterMasking test covers a real gap. However three issues block this:
### Blocking issues
1. Panic test cements a bug (line ~255) — TestAnomalyDetector_Analyze_PanicsOnNilResultParameter asserts that passing nil with isNew=false panics. The panic is a plain nil pointer dereference in production code (result.Similarity), not an intentional contract. Shipping a green assert.Panics test means future callers can't distinguish "this is a spec" from "this is a bug I should fix". The correct fix is to add a nil guard in Analyze and test the error path instead.
2. Table refactor omits flag assertions (line ~415) — TestAnalyzeEvent drops wantLowSimilarity and wantRareCluster assertions. The Reason string check partially compensates but is not equivalent: a regression that flips flags while keeping the same reason (via weight adjustments) would silently pass.
### Non-blocking concern
3. Masking test regex fragility (line ~473) — The .+ pattern works for the current FlattenEvent output but does not cover whitespace-only events and is sensitive to changes in how stage= is emitted. A \S+ pattern would be more precise and self-documenting.
🔎 Code quality review by PR Code Quality Reviewer · 175.4 AIC · ⌖ 9.94 AIC · ⊞ 5.4K
Comment /review to run again
| assert.Panics(t, func() { | ||
| detector.Analyze(nil, false, &Cluster{ID: 1, Template: []string{"a"}, Size: 1}) | ||
| }, "Analyze should panic when result is nil for an existing template") | ||
| } |
There was a problem hiding this comment.
Panic test codifies a nil-dereference bug as an intentional API contract. The underlying cause is an unconditional nil pointer dereference in Analyze, not deliberate defensive behavior — shipping this as a passing test means future callers get a silent crash in production instead of a recoverable error.
💡 Analysis and suggested fix
The production code panics here:
LowSimilarity: !isNew && result.Similarity < d.threshold,When isNew=false and result is nil, this is a plain nil pointer dereference — there is no explicit panic or documented contract. The isNew=true path silently works fine with nil due to short-circuit evaluation, making the behavior asymmetric and surprising.
A test that asserts Panics on this path tells future developers this is correct behavior to preserve, not a bug to fix. Any future caller who passes nil (e.g., from a new code path or refactor) will get a silent crash in production.
Option 1 — Fix the bug instead of testing it:
func (d *AnomalyDetector) Analyze(result *MatchResult, isNew bool, cluster *Cluster) (*AnomalyReport, error) {
if !isNew && result == nil {
return nil, errors.New("agentdrain: Analyze: result must not be nil when isNew=false")
}
// ... rest of function
}Option 2 — If the panic is truly intentional, add a doc comment and rename the test:
// TestAnalyzeEvent_NilResult_KnownPanic documents a known limitation: passing nil result
// with isNew=false causes a panic. TODO: replace panic with error return (#NNN).At minimum, a passing assert.Panics test should never be the end state for unexpected nil dereferences.
| }{ | ||
| { | ||
| name: "first occurrence trains a new template", | ||
| event: evtPlan, |
There was a problem hiding this comment.
Refactored table test drops wantLowSimilarity and wantRareCluster assertions, leaving gaps in flag-level coverage. The original linear test didn't verify these fields either, but the table struct has room and the PR description calls out flag semantics specifically.
💡 Why this matters and how to fix it
The test verifies IsNewTemplate, AnomalyScore, and Reason — but not LowSimilarity or RareCluster. These are the underlying boolean flags that drive the score and reason. A future change that flips a flag and compensates by adjusting score weights could make the reason string match while the flags are wrong, and this test would not catch it.
The existing TestAnomalyDetector_Analyze table already tests flags exhaustively via the detector directly. The gap here is that TestAnalyzeEvent goes through the full Miner path — that's a different integration surface — and it silently skips flag assertions.
Adding two fields and two assertions closes this gap with minimal effort:
tests := []struct {
name string
setupEvents []AgentEvent
event AgentEvent
wantIsNew bool
wantLowSimilarity bool // add
wantRareCluster bool // add
wantScore float64
wantReason string
}{
{
name: "first occurrence trains a new template",
event: evtPlan,
wantIsNew: true,
wantLowSimilarity: false,
wantRareCluster: true,
wantScore: 0.65,
wantReason: "new log template discovered; rare cluster (few observations)",
},
// ...
}
// in the assertion block:
assert.Equal(t, tt.wantLowSimilarity, report.LowSimilarity, "LowSimilarity mismatch")
assert.Equal(t, tt.wantRareCluster, report.RareCluster, "RareCluster mismatch")| assert.InDelta(t, 0.65, reportDistinct.AnomalyScore, 1e-9, "AnomalyScore mismatch for distinct event") | ||
| assert.Equal(t, "new log template discovered; rare cluster (few observations)", reportDistinct.Reason, "Reason mismatch for distinct event") | ||
| result, report, err := m.AnalyzeEvent(AgentEvent{Stage: "plan"}) | ||
| require.Error(t, err, "AnalyzeEvent should reject events that become empty after masking") |
There was a problem hiding this comment.
TestAnalyzeEvent_EmptyAfterMasking may silently pass for the wrong reason. The regex .+ does NOT match an empty string or a string of only whitespace, but FlattenEvent always prepends stage=plan — so the test works today. However, the test has no assertion that the input to Analyze is actually empty; it relies entirely on the error message string.
💡 Robustness concern
FlattenEvent for AgentEvent{Stage: "plan"} produces "stage=plan". The mask rule Pattern: ".+" erases that to "", so Tokenize("") returns nil, and the empty-tokens guard fires correctly.
This is working now, but the test is fragile in two ways:
-
If
FlattenEventever omitsstage=from the output (e.g.,Stagebecomes an excluded field), the flattened event would be empty before masking, andAnalyzeEventwould still return the same error — but via a different code path. The test would still pass, masking a behavioral regression. -
The regex
.+does not match empty string or whitespace-only strings. An event that flattens to only whitespace (possible if field values are empty) would produce tokens even with this mask rule, and the test would not cover that edge case.
Consider using a mask rule that is explicitly designed to test the empty-after-masking path, and add a comment explaining why this rule is chosen:
// Replace every non-whitespace character, leaving only spaces which Tokenize strips.
cfg.MaskRules = []MaskRule{{
Name: "eraseAllNonWhitespace",
Pattern: `\S+`,
Replacement: "",
}}Or alternatively, verify that the test input actually reaches the masking step by inspecting intermediate state, or adding a second case with a whitespace-only flattened event.
|
@copilot please run the Unresolved review feedback to address:
|
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
Addressed the nil-deref feedback: |
🤖 PR Triage — Run §28706384347
Score breakdown: impact 25 + urgency 12 + quality 15
|
|
@copilot please run the Unresolved review feedback to address:
Run: https://github.com/github/gh-aw/actions/runs/28706672228
|
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
All five unresolved threads addressed in commit 8ee26b8:
Also addressed the other non-outdated |
|
@copilot please run the
|
Both blocking reviews were addressed in previous commits — no further code changes required:
All other non-outdated thread concerns are also reflected in the current code: Local validation: Summary:
Hand-off: CI must be re-triggered by a maintainer before merge. |
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
|
@copilot please
|
Both reviews were already addressed in prior commits — no new code changes are needed:
Local validation ( |
Co-authored-by: gh-aw-bot <259018956+gh-aw-bot@users.noreply.github.com>
pkg/agentdrain/anomaly_test.gohad strong boundary coverage but missed two high-value behaviors:Analyzewith a nil result andAnalyzeEventwhen masking removes the full event. This change adds coverage for those paths and cleans up the test structure/assertions to better match current behavior and repository conventions.Behavioral gaps
AnomalyDetector.Analyze(nil, false, ...), documenting the current panic behavior for a nilresulton existing-template analysis.Miner.AnalyzeEventreturningagentdrain: AnalyzeEvent: empty event after maskingwhen masking eliminates all tokens.Assertion cleanup
err.Error()withrequire.ErrorContains(...)in threshold validation cases.Subtest refactor
TestAnalyzeEventfrom linear stateful steps into explicitt.Run(...)cases.Illustrative case