Skip to content

feat(whispering): replace Transformations with Polish, a Dictionary, and Recipes (Waves 1-3)#2104

Closed
braden-w wants to merge 28 commits into
mainfrom
whispering-cleanup-formats-restart
Closed

feat(whispering): replace Transformations with Polish, a Dictionary, and Recipes (Waves 1-3)#2104
braden-w wants to merge 28 commits into
mainfrom
whispering-cleanup-formats-restart

Conversation

@braden-w

@braden-w braden-w commented Jun 18, 2026

Copy link
Copy Markdown
Member

Summary

Replaces Whispering's single Transformation concept with three sharper nouns, per ADR 0021. A Transformation fused correction (a property of every transcript) with reformatting (a choice between alternatives); splitting them lifts implementation vocabulary (pre/post replacements, prompt templates) out of the product and makes the reformatting half portable to a future writing app.

The model is three nouns, two behaviors:

  • Dictionary (dictionary: string[]): words Whispering should know. Injection-only: a runtime block in every AI prompt. No find/replace, no regex.
  • Polish (polish.enabled + polish.instructions): the always-on, meaning-preserving AI base. On by default, gated on a configured key (no key means speed mode: instant raw transcript, no AI). Instruction editable under Advanced.
  • Recipes: the on-demand reshape library. Built-ins in code, a recipes table for customs; each runs over already-polished text. (Picker and library land in a follow-up.)

buildSystemPrompt(instructions, dictionary) is pure and unit-tested; the runners read the dictionary at use and pass it in.

What's here

  • Wave 1 (data model + runtime): formats to recipes, cleanup.* to polish.* plus dictionary, the pure buildSystemPrompt, runner rewire, command and shortcut renames. Deletes the find/replace path.
  • Wave 2 (Settings → Dictation page): a Polish toggle (off is speed mode), the instruction under an Advanced disclosure, and the Dictionary as an add/remove term list.
  • Wave 3 (Polishing HUD): a "Polishing…" surface on the floating recording pill while the AI pass runs, with a ship-raw control that cancels the in-flight completion (AbortSignal threaded through complete() and every provider) and delivers the raw transcript. Delivery stays single-write.
  • Merge: reconciled with main's parallel evolution of the same surfaces (clipboard-default output, keyboard permission tiers, capture-surface restructure, view transitions). Details in the merge commit.

Delivery model

Polish delivers the polished transcript once. Under main's clipboard-default output that means one clipboard write when the pass completes; the HUD makes the wait legible and ship-raw bails to raw. The wait only happens for users who configured a key and left Polish on; everyone else gets instant raw.

Not in this PR

  • Recipes picker and the built-in library (Wave 4).
  • Dictionary terms in the transcription initial_prompt (Wave 5).
  • Deferred per ADR 0021: the deterministic fuzzy matcher, local-default Polish, and per-context recipe modes.

View with Codesmith Autofix with Codesmith
Need help on this PR? Tag /codesmith with what you need. Autofix is disabled.

braden-w added 28 commits June 16, 2026 19:22
Replace the fused `Transformation` model (preReplacements -> prompt ->
postReplacements) with two concepts: a singular automatic Cleanup layer
(dictionary + one optional AI tidy) and a plural, picked library of portable
Format units (name + one instruction, text in / text out) that any host can run
over a selection through one shared picker.

- ADR 0013 (Proposed) records the durable decision and the refusals.
- Greenfield spec lives in root `specs/`; its first screen answers current
  state, target shape, the four implementation waves, and per-wave verification,
  and reflects current main (ADR 0011/0012, settings projection, shortcut and
  `#platform` command seams, the raw transcript already on `recordings`).
Delete the `Transformation` concept and replace it with the two-concept model
from ADR 0013: a singular automatic Cleanup layer and a plural, picked library
of Format units. This wave lands the data model and the reshaped runner; the
automatic path (Wave 2), the picker/UI (Wave 3), and migration (Wave 4) follow.

Data model:
- Add a `formats` table { id, name, instructions, icon } (the portable text
  action: name + one instruction, text in / text out).
- Add Cleanup settings: `cleanup.autoCleanup` (enabled-with-instructions or
  disabled, ships enabled) and `cleanup.dictionary` (deterministic entries).
- Add the single global AI default `completion.provider` / `completion.model`
  (Google / gemini-2.5-flash); there is no per-Format model or provider.
- Delete the `transformations` and `transformationRuns` tables, the
  `transformation.selectedId` setting, and the pre/prompt/post shapes.
- Document that `recordings.transcript` stays the raw transcript underneath, so
  Cleanup never loses the user's original words.

Runner:
- New pure `run({ input, format })` in operations/run-format.ts: one AI call,
  instructions as the directive and input as the content. No pre/post, no
  `{{input}}` template, no persistence.

Clean break (no compat layer). The old authoring UI (transformations editor,
routes, run views, selector) and the candidate picker window are deleted; the
two picker/clipboard commands are kept as Wave-3 stubs so shortcuts stay wired.
The post-transcription auto path is stubbed with a Wave-2 marker. Refresh the
state/rpc/settings READMEs to drop the deleted modules.
Wire the automatic correction path the Transformation split left as a TODO:
every transcription now runs Cleanup before delivery.

- run-cleanup.ts: applies cleanup.dictionary deterministically and in order
  (literal default; regex/wholeWord advanced, invalid regex skipped), then
  makes one AI tidy pass only when cleanup.autoCleanup.enabled AND a provider
  key is actually configured. cleanup.*/completion.* are read at use per
  ADR 0012; on AI failure the dictionary-corrected text rides along in the
  error so a transcript is never lost.
- completion.ts: extract the shared COMPLETION_PROVIDERS map plus complete()
  and hasCompletionKey() so the cleanup and Format runners share one call path
  instead of duplicating provider/model/key resolution. run-format.ts now
  consumes it.
- pipeline.ts: deliver the CLEANED text once, after Cleanup finishes
  (deliver-after-cleanup), keeping the raw transcript underneath on
  recordings.transcript. Delivering raw then re-typing cleaned would
  double-type, so we hold delivery instead.

See ADR 0013.
…ry to format

Pre-release hygiene on the three "transformation" surfaces Wave 1 left dead at
runtime. Under deliver-after-cleanup the automatic Cleanup path rides
output.transcription.*, so this scope's real successor is the Wave-3 Format
picker, not Cleanup: rename to format, not cleanup.

- output.transformation.* -> output.format.* (KV keys + settings UI)
- sound.transformationComplete -> sound.formatComplete (KV, sound map, asset,
  sound settings UI)
- deliverTransformationResult -> deliverFormatResult (delivery.ts; the
  settingsScope union and the cursor-asymmetry comment follow)

Clean break, no migration: pre-release, the old keys simply orphan and fall
back to defaults. The Wave-3 picker command IDs (openTransformationPicker /
runTransformationOnClipboard) are intentionally untouched.
Mark the automatic Cleanup path landed and capture the two load-bearing Wave 2
decisions: deliver-after-cleanup (cleaned text delivered once via
output.transcription.*, raw kept underneath) and renaming the dead
transformation output vocabulary to format rather than cleanup, since under
deliver-after-cleanup that scope serves the Wave-3 Format picker. Also note the
non-fatal auto-cleanup fallback and the extracted shared completion call path.
…cipe model

Externalize the post-Wave-2 design pass. Two collapses, recorded in ADR 0013's
Evolution section and a new spec "Evolved direction" section:

- "Cleanup" is not a second concept: the auto AI tidy pass needs no primitive a
  Recipe lacks, so it dissolves into a deterministic Dictionary plus one Recipe
  pinned to auto-run (defaulting to a meaning-preserving Polish). One AI concept.
- "Format" becomes "Recipe."

Also settle the dictionary-ordering question (grounded in the old transform.ts
pre/prompt/post recovery and Handy's source): correct once, deterministically,
before the AI; inject the terms into the polish prompt so the AI cannot un-fix
them; no post-AI pass. Record the fuzzy single-column Dictionary (Handy-style
Levenshtein + Soundex + n-gram) as the strong form over literal heard->spell, and
the single-write-to-cursor + HUD delivery story. Open forks captured for the next
pass. Naming locked to Recipe; shipped cleanup.*/formats names follow later.
…ecipes

ADR 0013's one-AI-mechanism / auto-pinned-Recipe collapse was over-stated:
the pinnedId pointer only ever held Polish-or-null, and the real future
(per-context modes) is a different shape. Revert to three nouns matching the
two behaviors the category actually has: an always-on Polish base, an
injection-only Dictionary, and an on-demand Recipe library. Drop pinnedId, the
deterministic dictionary, and streaming; defer the fuzzy matcher, local Polish,
and per-context modes with reasons.

Rewrites ADR 0013, the spec's model + wave plan, and adds the glossary terms.
Mechanical rename of the on-demand reshape library from `formats` to
`recipes`, per ADR 0013's three-noun model (Dictionary, Polish, Recipes).
No behavior change.

- Table `formats` -> `recipes`; `Format` type -> `Recipe`
- state/formats.svelte.ts -> state/recipes.svelte.ts (formats -> recipes,
  generateDefaultFormat -> generateDefaultRecipe)
- run-format.ts -> run-recipe.ts (run -> runRecipe, RunFormatError -> RunRecipeError)
- output.format.* -> output.recipe.*; sound.formatComplete -> sound.recipeComplete
- deliverFormatResult -> deliverRecipeResult
- debug page and settings UI labels follow the rename
…nners (Wave 1.2-1.3)

Land ADR 0013's data model and runtime: the always-on Polish base, the
injection-only Dictionary, and a pure shared system-prompt builder. Greenfield
clean break (compatibility released): old keys are orphaned, no migration.

Data model (definition.ts):
- cleanup.autoCleanup (union) -> polish.enabled (bool, default true) +
  polish.instructions (string)
- cleanup.dictionary (DictionaryEntry[]) -> dictionary (string[], default [])
- delete the AutoCleanup and DictionaryEntry schemas/types
- shortcut.{openTransformationPicker,runTransformationOnClipboard} ->
  shortcut.{openRecipePicker,runRecipeOnClipboard}

Runtime:
- add pure buildSystemPrompt(instructions, dictionary): instructions plus a
  tagged term block when the dictionary is non-empty, instructions alone when
  empty (with a unit test); the one unit worth testing
- run-cleanup.ts -> run-polish.ts (runPolish): one completion through
  buildSystemPrompt when enabled + keyed + non-empty, else the raw input
  unchanged (speed mode); deletes applyDictionary/escapeRegExp and the whole
  find/replace path; Polish failure stays non-fatal via the error fallback
- runRecipe composes buildSystemPrompt with the dictionary read at use
- pipeline runCleanup -> runPolish (delivery and keep-raw behavior unchanged)
- rename the two command stubs + their command ids and device-config global
  bindings to openRecipePicker / runRecipeOnClipboard

Also retires dangling user-facing references to the deleted Transformation
feature (api-keys "AI" tab, provider tooltips, nav/comment cleanup). The legacy
Dexie `transformations`/`transformationRuns` IndexedDB schema is left untouched:
it is the audio blob store's version history, a name collision, not this feature.
Records the Recipes rename and the Polish + Dictionary data model / runtime
reshape as shipped in the build plan.
A Settings -> Dictation page over Wave 1's data: a Polish toggle (off = speed
mode) with the instruction editable under an Advanced disclosure, and a
Dictionary as an add/remove list of plain terms. The only in-app way to reach
speed mode or add known terms. Reads/writes through the settings namespace.

Also corrects the stale "format completion" sound copy to "recipe completion".
Records the Dictation settings page (Polish toggle + Advanced instruction +
Dictionary term list) as shipped.
…llation (Wave 3)

Mask the ~1s Polish latency on the surface the user is already watching during
dictation. Extends the floating recording-overlay pill (always-on-top, over
other apps) with a 'polishing' state so the dictation HUD reads as one
continuous surface: recording -> Polishing... -> gone. The pill shows a spinner,
"Polishing...", and a clickable x that ships the raw transcript immediately.

- complete() and every provider (openai-compatible, anthropic, google) accept an
  optional AbortSignal threaded to the SDK request; runPolish takes a signal and
  treats a user abort as a clean success (ships raw), not an error
- new polish-hud reactive state owns the active flag + AbortController; the
  pipeline brackets the AI call with begin()/end() and only shows the HUD when
  polishWillRun is true (no flicker in speed mode), routing the overlay's
  ship-raw action to shipRaw()
- RecordingOverlayStatus gains { mode: 'polishing' }; RecordingOverlayAction
  gains 'ship-raw'

Surface choice (greenfield): the HUD lives on the floating pill because the
canonical flow is dictation into another app, where an in-window overlay would
be invisible; this matches the category (Wispr Flow, superwhisper, VoiceInk). On
web (no out-of-window surface) the existing loading toast stands; cancellation
there is a no-op. Global esc-over-other-apps would need a transient rdev key
hook and is deliberately deferred to a follow-up; the clickable pill control is
the cancel affordance for now. No streaming.
Records the Polishing HUD on the floating pill + AbortSignal ship-raw, and flags
true esc-over-other-apps cancellation as a deferred follow-up needing a steer.
Reconcile the dictation rewrite (Polish + Dictionary + Recipes) with main's
parallel evolution of the same surfaces. Policy: main's structural and product
changes win; the transformation->recipe / cleanup->polish rename is re-applied
on top; the two behaviors compose.

Key resolutions:
- Overlay status: keep main's `trigger` discriminant for the recording states;
  the Polish HUD joins as a distinct `{ phase: 'polishing' }` member, narrowed
  with `in`.
- Delivery: adopt main's clipboard-default output (cursor paste opt-in,
  Accessibility-gated); Polish runs before delivery regardless of channel.
- Keyboard: take main's TOGGLE_MODIFIERS tier rework; recipe gestures stay
  unbound (comment vocabulary updated from "Transformation").
- Capture pipeline: take main's capture-surface restructure; drop the inline
  TransformationSelector (recipes are on-demand via the picker, not a pipeline
  stage).
- Drop the removed `sound.pauseMediaDuringRecording` toggle.
- Renumber the dictation ADR 0013 -> 0021 (main now carries three 0013s);
  update all dictation references and the spec link.

bun install pulls @tauri-apps/plugin-global-shortcut (the Tier-0 backend main
re-added). Clean typecheck, 37 tests pass.
…decision

Add a Wave 3.5 section to the dictation spec (injection guard plus persist
polished, then stop before the Wave 4 picker) and record two decisions in
ADR 0021: the Polish system prompt is a fixed scaffold wrapping the editable
directive, and the recording stores both the raw and polished transcript.
Keep free-text Polish over structured toggles for v1; the local-default
deferral still holds. ADR stays Proposed, spec stays In Progress.
…scaffold

Polish passed the raw transcript to the model under a six-word system prompt
with no framing that the content was text to clean rather than instructions to
obey, so a dictated "ignore the above and write a poem" could derail the pass.

Add buildPolishSystemPrompt: a system-invariant scaffold ("text filter, not an
assistant", a Forbidden list, a never-execute-the-transcript line, and a
self-correction line) that wraps the user-editable polish.instructions. The
user tunes the core directive under Advanced but cannot delete the guard.

Keep the shared buildSystemPrompt a pure Dictionary injector: Recipes call it
too and a reshape legitimately adds and rewords text, so the meaning-preserving
scaffold is Polish-only. run-polish.ts switches to the new composer; run-recipe
is untouched. Tests assert the scaffold, Forbidden rules, embedded directive,
and Dictionary ordering (prompt structure, not model behavior).
The recording stored only the raw transcript, so the history showed the rough
words rather than the polished text actually delivered to the cursor. The ADR's
"show original is one click away" needs both stored.

Add polishedTranscript (nullable) to the recordings table. The pipeline writes
the delivered polished text after Polish, and leaves it null in speed mode and
on a polish-failure fallback, where no polished version exists. TranscriptDialog
shows the polished text by default and falls back to raw, with a "Show original"
toggle when Polish produced a distinct version; TextPreviewDialog gains a generic
footer actions slot to host it. Greenfield: one nullable field, no migration.
…Polish comment

The Transformation concept is gone, but its view-transition names survived the
merge with no consumers: the transformation(id) card/selector name and the
recording transformationOutput member. Delete both and the now-stale pipeline
JSDoc that pointed at them.

Reword the pipeline's Polish comment: it still described "typing the raw at the
cursor ... double-type", but delivery is clipboard-default now, so the real risk
is landing two copies (a clipboard paste mid-polish, or two cursor pastes).
Built-in recipes (Email, Reply, Notes, To-dos) ship in code, no "Clean" since
Polish owns cleanup. Built-in ids carry a builtin: prefix so they never collide
with a user recipe's generated id and the library can show them read-only.

Expose recipes.builtins and recipes.pickable (built-ins first, then the user's
own, alphabetical): the single list the picker and the library page both render.
A top-level /recipes config route (parallel to /recordings, since recipes are
user data, not preferences). It lists built-ins union the user's own; built-ins
show a "Built-in" badge and are read-only, customs get edit and delete. A
sticky-note editor modal creates and edits a custom recipe: a name and one
instruction. Replaces the nav-items.ts TODO with a Recipes entry that propagates
to both the sidebar and the mobile bottom nav.
The openRecipePicker and runRecipeOnClipboard commands reported "coming soon".
Repoint them at a real in-app command palette: capture the source (the
foreground selection, or the clipboard) first, focus the Whispering window, then
open a picker over builtins union customs. Picking a recipe runs it via
runRecipe and delivers the take through deliverRecipeResult.

A module-level recipePicker rune is the shared seam (the same shape as the Polish
HUD): the operations open it, the mounted RecipePicker palette reads it and runs
the choice. Add tauri.mainWindow.focus so a shortcut fired from another app
surfaces the palette; this is the in-app step before a floating picker window.
…l_prompt

Whisper and OpenAI take an initial_prompt as a spelling and vocabulary hint, so
append the user's Dictionary terms to transcription.prompt at both transcribe
call sites (local Rust engine and cloud/self-hosted upload). This nudges the
transcriber toward the right spellings before Polish runs. The default Parakeet
ignores initial_prompt, so it is harmless there; an empty dictionary leaves the
prompt unchanged.
Post-implementation review found two stragglers: recipes.builtins was exported
with no consumers (the page and picker both read recipes.pickable), and the
recipe icon field had been inert since Wave 1, never displayed or editable.

Remove the dead getter. Give the built-ins emoji icons, render the icon in the
library list and the picker, and add an optional icon input to the editor so the
schema field finally earns its place.
Polish, the Dictionary, and the Recipe library + picker have all shipped, so
flip ADR 0021 from Proposed to Accepted and record the v1 picker decision: an
in-app command palette now, with the floating Tauri window deferred as a clean
follow-up. Delete the spent spec; git and the generated spec-history index keep
its body.
recordings.set already leaves polishedTranscript null, so writing null again
when Polish did not run (speed mode, or a polish failure that ships the raw
words) was a redundant Yjs transaction on the transcription hot path. Only
update when a Polish pass actually produced a result.
Drop the dead public accessors on the recipes rune: `all`, `get`,
`sorted`, `update`, and `count` have zero consumers. The live surface is
`pickable`, `set`, `delete`, and `[Symbol.dispose]`. The internal
`sorted` $derived stays since `pickable` is built from it.

recordings.svelte.ts keeps its fuller CRUD surface because that one is
actually consumed (TanStack table, bulk delete); the asymmetry is
intentional, not an oversight.
Reconcile the Polish/Dictionary/Recipes feature (ADR 0029) with main's
recording-overlay, desktop-events, and tauri-seam refactors.

Conflict resolutions:
- ADR number collision: main shipped ADRs 0021-0028, so this feature's ADR
  is renumbered 0021 -> 0029 (file, README row, and 20 in-app references).
  Cross-references to main's ADR 0021 (actions boundary) are left intact.
- delivery.ts: OUTPUT_SCOPES becomes ['transcription', 'recipe'] (the old
  'transformation' scope was renamed by this feature); settingsScope adopts
  main's DRY `OutputScope`. outputWritesToCursor() now covers recipe.cursor,
  so the auto-paste intent our deleted attach-desktop-events carried is now
  served by main's attach-auto-paste-intent for free.
- attach-desktop-events.svelte.ts: deleted, superseded by main's granular
  runtime owners (auto-paste-intent, update-check, main-window-reveal, etc.).
- attach-recording-overlay.svelte.ts: adopt main's typed
  recordingOverlayAction.listen; re-graft the polishing-phase + ship-raw
  handling. The focus-main handler moved to main's attach-main-window-reveal.
- recording-overlay/events.ts: keep the ship-raw action on main's typed
  window-event channels.
- transformation-picker: keep this feature's deletion.
- (app)/+page.svelte: drop TransformationSelector, keep main's
  CaptureBehaviorPopover.
@github-actions

Copy link
Copy Markdown
Contributor

Preview Deployment

App Preview URL
Whispering https://920e5ea9-whispering.epicenter.workers.dev
Landing https://5333e911-landing.epicenter.workers.dev

These previews update automatically with new commits to this PR.
No cleanup needed — previews are version aliases on the existing workers.


Commit 2e678b6

@braden-w

Copy link
Copy Markdown
Member Author

Superseded by #2137 (feat/whispering-polish-report), the cleaner reroll of the same Dictionary/Polish/Recipes work. Closing to keep the effort in one place.

@braden-w braden-w closed this Jun 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant