Skip to content

docs(skills): anti-overfit pass over the skill library#2257

Open
braden-w wants to merge 12 commits into
mainfrom
skill-audit
Open

docs(skills): anti-overfit pass over the skill library#2257
braden-w wants to merge 12 commits into
mainfrom
skill-audit

Conversation

@braden-w

@braden-w braden-w commented Jul 1, 2026

Copy link
Copy Markdown
Member

An anti-overfit audit of all 80 skills found the same failure shapes recurring: descriptions claiming everyday phrases ("be brief", "what should we do") that load a skill in conversations where it is wrong, closing sections that restate the body until the two copies drift apart, content harvested verbatim from the one conversation that spawned a skill, and one broken mandated command. This PR fixes the ten clearest offenders, one commit per skill, and encodes the recurring shapes as new detectors so the next pass is mechanical.

Trigger narrowing

The worst misfires were sticky or heavyweight skills claiming cheap phrases. caveman's persistent persona fired on a one-off "be brief"; progress-summary claimed any "can you summarize". Each narrowed description now names the real trigger and, where a near-miss is likely, says what to do instead:

# progress-summary, before
Use when: "can you summarize", "what happened", "where are we at", ...

# after
Use when: "summarize what we did", "where are we at", "walk me through the
changes", or "give me an overview" of session or branch work. Not for
summarizing an article, file, or codebase you did not change.

Bodies were aligned with their narrowed descriptions so a skill never contradicts its own trigger.

Restated content deleted

Where a skill taught the same rule twice (a Quick Reference re-rendering the workflow, an Examples block restating The Pattern case for case), the copy is deleted and the one owner remains. svg-animations drops from 421 lines to 68 plus a load-on-demand recipes reference; specification-writing loses its three restating closers. Two real bugs rode along: spec-execution mandated bun scripts/check-doc-hygiene.mjs (the script is .ts), and error-handling's own examples used console.log against both the repo rule and its own Logging section.

Detectors 5-9

skill-creator/references/composition-audit.md gains five grep-recipe detectors for the shapes this audit kept finding: description-restatement body sections, restating closers, transcript/import residue, dash-strip damage, and everyday-ask triggers. Each was validated against the live library and finds its known offenders (detector 5 flags exactly the 40 skills with a "When to Apply" copy; detector 8 finds the ~111 colon artifacts). Those library-wide findings are deliberately not fixed here; they are the next, mechanical pass.

govuk-style boundary

govuk-style claimed "documentation, summaries, or any prose where clarity and accessibility matter", colliding with the house writing-voice skill. It now owns external-reader prose (reports, research write-ups, guidance) and explicitly hands repo docs, UI strings, commit messages, and code comments to writing-voice.

Edited skills that came from upstream imports (caveman, from mattpocock/skills) now diverge from their skills-lock.json hash; that is the documented "customizable after install" model, and the lock keeps recording the installed baseline.


View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

braden-w added 12 commits July 1, 2026 15:30
"Be brief" is an everyday instruction, but matching it engages a
persistent persona that stays active until the user says "stop
caveman". Keep only explicit mode invocations as triggers.
"Simplify this" collides with collapse-pass, fresh-eyes-grill, and the
harness /simplify command, and most such asks are not about nesting.
Also repair four space-colon artifacts left by an earlier em dash
removal pass.
The txt block still sent agents to run the asymmetric wins pass in
cohesive-clean-breaks, but that pass moved to the asymmetric-wins
skill; the paragraph above it already delegates there, so drop the
re-taught steps. Also stop claiming the everyday phrase "what does X
do", which audit-framed plain code-comprehension questions.
Bare "can you summarize" and "what happened" fire on everyday asks
(summarize this article, explain this stack trace) that have nothing
to do with PR-style work recaps. Also repair three glued-colon
artifacts from an earlier em dash removal pass and compress the git
command block to the two non-obvious rules.
That phrase matches nearly every planning question, loading a
heavyweight visual-proposal format where a plain answer is wanted.
Collapse the four-way diagram-type menu to a default (ownership) with
a single escape clause, and repair a glued-colon artifact.
The skill hardcoded artifacts of a single 2025 Whispering thread as
unconditional rules: always include the cal.com link, claim "I'm free
as early as tomorrow", reference v7.1.0 by number, and ask that
thread's 4-second-recording follow-ups. Agents were set up to paste
stale versions and fabricated availability into public replies. Make
the call offer and Discord invite conditional, swap version literals
for placeholders, generalize the follow-up questions, resolve the
always-greet vs no-formulaic-openings contradiction, and cut the
Writing Style Notes bullets that restate Anti-Patterns and
writing-voice.
The mandated hygiene check ran bun scripts/check-doc-hygiene.mjs, but
the script is check-doc-hygiene.ts, so the gate failed on every spec
execution. Also delete the Quick Reference (a third rendering of the
workflow already shown as the loop diagram and the phases), fold the
commit-shape table back into the intro it restated, and drop the
Length Panic anti-pattern that repeated Phase 0's large-specs-are-fine
rule. Triple-rendered workflow copies drift out of sync.
Good vs Bad Specs, Writing for Agent Implementers, and the Quick
Reference Checklist each re-rendered the Document Structure and Core
Philosophy sections item for item, so every spec-writing task loaded
the same guidance three times and the copies could drift apart. Keep
the two novel pieces (the process-theater rule moves into Core
Philosophy, the too-prescriptive/too-vague sweet spot stays as the
closer) and collapse the duplicated Adjacent Work philosophy block
into the template section.
The Pattern's own snippets and the Examples block used console.log in
service-shaped code, contradicting the repo's no-console-in-library
rule and this skill's own Logging section; agents copy examples
verbatim, so the skill was actively teaching the violation. Delete the
Examples block (a case-for-case restatement of The Pattern), compress
the trySync/tryAsync halves of the when-to-use taxonomy down to the
try-catch escape hatch that actually needs stating, align the
corrupted-row example with the canonical log.warn pattern, and repair
the space-colon artifacts left by an earlier em dash removal pass.
The skill was a 421-line generic essay (coordinate systems, shape
primitives, a path-command table, gradient syntax) with no
repo-specific line anywhere; a competent agent writes that SVG
identically without the skill, and every icon-adjacent task paid the
full load. Keep what an agent actually gets wrong: SMIL-vs-CSS reach
in img contexts, transform-origin defaulting to the viewBox origin,
fill=freeze snap-back, morph command matching, getTotalLength, GPU
compositing, and reduced motion. Move the seven ready-made recipes to
references/recipes.md loaded on demand, and narrow the description so
static SVG and icon-layout asks stop triggering it. Also removes 21
space-colon artifacts from an earlier em dash strip.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant