feat(whispering): replace Transformations with Polish, a Dictionary, and Recipes (Waves 1-3)#2104
Closed
braden-w wants to merge 28 commits into
Closed
feat(whispering): replace Transformations with Polish, a Dictionary, and Recipes (Waves 1-3)#2104braden-w wants to merge 28 commits into
braden-w wants to merge 28 commits into
Conversation
Replace the fused `Transformation` model (preReplacements -> prompt -> postReplacements) with two concepts: a singular automatic Cleanup layer (dictionary + one optional AI tidy) and a plural, picked library of portable Format units (name + one instruction, text in / text out) that any host can run over a selection through one shared picker. - ADR 0013 (Proposed) records the durable decision and the refusals. - Greenfield spec lives in root `specs/`; its first screen answers current state, target shape, the four implementation waves, and per-wave verification, and reflects current main (ADR 0011/0012, settings projection, shortcut and `#platform` command seams, the raw transcript already on `recordings`).
Delete the `Transformation` concept and replace it with the two-concept model
from ADR 0013: a singular automatic Cleanup layer and a plural, picked library
of Format units. This wave lands the data model and the reshaped runner; the
automatic path (Wave 2), the picker/UI (Wave 3), and migration (Wave 4) follow.
Data model:
- Add a `formats` table { id, name, instructions, icon } (the portable text
action: name + one instruction, text in / text out).
- Add Cleanup settings: `cleanup.autoCleanup` (enabled-with-instructions or
disabled, ships enabled) and `cleanup.dictionary` (deterministic entries).
- Add the single global AI default `completion.provider` / `completion.model`
(Google / gemini-2.5-flash); there is no per-Format model or provider.
- Delete the `transformations` and `transformationRuns` tables, the
`transformation.selectedId` setting, and the pre/prompt/post shapes.
- Document that `recordings.transcript` stays the raw transcript underneath, so
Cleanup never loses the user's original words.
Runner:
- New pure `run({ input, format })` in operations/run-format.ts: one AI call,
instructions as the directive and input as the content. No pre/post, no
`{{input}}` template, no persistence.
Clean break (no compat layer). The old authoring UI (transformations editor,
routes, run views, selector) and the candidate picker window are deleted; the
two picker/clipboard commands are kept as Wave-3 stubs so shortcuts stay wired.
The post-transcription auto path is stubbed with a Wave-2 marker. Refresh the
state/rpc/settings READMEs to drop the deleted modules.
Wire the automatic correction path the Transformation split left as a TODO: every transcription now runs Cleanup before delivery. - run-cleanup.ts: applies cleanup.dictionary deterministically and in order (literal default; regex/wholeWord advanced, invalid regex skipped), then makes one AI tidy pass only when cleanup.autoCleanup.enabled AND a provider key is actually configured. cleanup.*/completion.* are read at use per ADR 0012; on AI failure the dictionary-corrected text rides along in the error so a transcript is never lost. - completion.ts: extract the shared COMPLETION_PROVIDERS map plus complete() and hasCompletionKey() so the cleanup and Format runners share one call path instead of duplicating provider/model/key resolution. run-format.ts now consumes it. - pipeline.ts: deliver the CLEANED text once, after Cleanup finishes (deliver-after-cleanup), keeping the raw transcript underneath on recordings.transcript. Delivering raw then re-typing cleaned would double-type, so we hold delivery instead. See ADR 0013.
…ry to format Pre-release hygiene on the three "transformation" surfaces Wave 1 left dead at runtime. Under deliver-after-cleanup the automatic Cleanup path rides output.transcription.*, so this scope's real successor is the Wave-3 Format picker, not Cleanup: rename to format, not cleanup. - output.transformation.* -> output.format.* (KV keys + settings UI) - sound.transformationComplete -> sound.formatComplete (KV, sound map, asset, sound settings UI) - deliverTransformationResult -> deliverFormatResult (delivery.ts; the settingsScope union and the cursor-asymmetry comment follow) Clean break, no migration: pre-release, the old keys simply orphan and fall back to defaults. The Wave-3 picker command IDs (openTransformationPicker / runTransformationOnClipboard) are intentionally untouched.
Mark the automatic Cleanup path landed and capture the two load-bearing Wave 2 decisions: deliver-after-cleanup (cleaned text delivered once via output.transcription.*, raw kept underneath) and renaming the dead transformation output vocabulary to format rather than cleanup, since under deliver-after-cleanup that scope serves the Wave-3 Format picker. Also note the non-fatal auto-cleanup fallback and the extracted shared completion call path.
…cipe model Externalize the post-Wave-2 design pass. Two collapses, recorded in ADR 0013's Evolution section and a new spec "Evolved direction" section: - "Cleanup" is not a second concept: the auto AI tidy pass needs no primitive a Recipe lacks, so it dissolves into a deterministic Dictionary plus one Recipe pinned to auto-run (defaulting to a meaning-preserving Polish). One AI concept. - "Format" becomes "Recipe." Also settle the dictionary-ordering question (grounded in the old transform.ts pre/prompt/post recovery and Handy's source): correct once, deterministically, before the AI; inject the terms into the polish prompt so the AI cannot un-fix them; no post-AI pass. Record the fuzzy single-column Dictionary (Handy-style Levenshtein + Soundex + n-gram) as the strong form over literal heard->spell, and the single-write-to-cursor + HUD delivery story. Open forks captured for the next pass. Naming locked to Recipe; shipped cleanup.*/formats names follow later.
…ecipes ADR 0013's one-AI-mechanism / auto-pinned-Recipe collapse was over-stated: the pinnedId pointer only ever held Polish-or-null, and the real future (per-context modes) is a different shape. Revert to three nouns matching the two behaviors the category actually has: an always-on Polish base, an injection-only Dictionary, and an on-demand Recipe library. Drop pinnedId, the deterministic dictionary, and streaming; defer the fuzzy matcher, local Polish, and per-context modes with reasons. Rewrites ADR 0013, the spec's model + wave plan, and adds the glossary terms.
Mechanical rename of the on-demand reshape library from `formats` to `recipes`, per ADR 0013's three-noun model (Dictionary, Polish, Recipes). No behavior change. - Table `formats` -> `recipes`; `Format` type -> `Recipe` - state/formats.svelte.ts -> state/recipes.svelte.ts (formats -> recipes, generateDefaultFormat -> generateDefaultRecipe) - run-format.ts -> run-recipe.ts (run -> runRecipe, RunFormatError -> RunRecipeError) - output.format.* -> output.recipe.*; sound.formatComplete -> sound.recipeComplete - deliverFormatResult -> deliverRecipeResult - debug page and settings UI labels follow the rename
…nners (Wave 1.2-1.3)
Land ADR 0013's data model and runtime: the always-on Polish base, the
injection-only Dictionary, and a pure shared system-prompt builder. Greenfield
clean break (compatibility released): old keys are orphaned, no migration.
Data model (definition.ts):
- cleanup.autoCleanup (union) -> polish.enabled (bool, default true) +
polish.instructions (string)
- cleanup.dictionary (DictionaryEntry[]) -> dictionary (string[], default [])
- delete the AutoCleanup and DictionaryEntry schemas/types
- shortcut.{openTransformationPicker,runTransformationOnClipboard} ->
shortcut.{openRecipePicker,runRecipeOnClipboard}
Runtime:
- add pure buildSystemPrompt(instructions, dictionary): instructions plus a
tagged term block when the dictionary is non-empty, instructions alone when
empty (with a unit test); the one unit worth testing
- run-cleanup.ts -> run-polish.ts (runPolish): one completion through
buildSystemPrompt when enabled + keyed + non-empty, else the raw input
unchanged (speed mode); deletes applyDictionary/escapeRegExp and the whole
find/replace path; Polish failure stays non-fatal via the error fallback
- runRecipe composes buildSystemPrompt with the dictionary read at use
- pipeline runCleanup -> runPolish (delivery and keep-raw behavior unchanged)
- rename the two command stubs + their command ids and device-config global
bindings to openRecipePicker / runRecipeOnClipboard
Also retires dangling user-facing references to the deleted Transformation
feature (api-keys "AI" tab, provider tooltips, nav/comment cleanup). The legacy
Dexie `transformations`/`transformationRuns` IndexedDB schema is left untouched:
it is the audio blob store's version history, a name collision, not this feature.
Records the Recipes rename and the Polish + Dictionary data model / runtime reshape as shipped in the build plan.
A Settings -> Dictation page over Wave 1's data: a Polish toggle (off = speed mode) with the instruction editable under an Advanced disclosure, and a Dictionary as an add/remove list of plain terms. The only in-app way to reach speed mode or add known terms. Reads/writes through the settings namespace. Also corrects the stale "format completion" sound copy to "recipe completion".
Records the Dictation settings page (Polish toggle + Advanced instruction + Dictionary term list) as shipped.
…llation (Wave 3)
Mask the ~1s Polish latency on the surface the user is already watching during
dictation. Extends the floating recording-overlay pill (always-on-top, over
other apps) with a 'polishing' state so the dictation HUD reads as one
continuous surface: recording -> Polishing... -> gone. The pill shows a spinner,
"Polishing...", and a clickable x that ships the raw transcript immediately.
- complete() and every provider (openai-compatible, anthropic, google) accept an
optional AbortSignal threaded to the SDK request; runPolish takes a signal and
treats a user abort as a clean success (ships raw), not an error
- new polish-hud reactive state owns the active flag + AbortController; the
pipeline brackets the AI call with begin()/end() and only shows the HUD when
polishWillRun is true (no flicker in speed mode), routing the overlay's
ship-raw action to shipRaw()
- RecordingOverlayStatus gains { mode: 'polishing' }; RecordingOverlayAction
gains 'ship-raw'
Surface choice (greenfield): the HUD lives on the floating pill because the
canonical flow is dictation into another app, where an in-window overlay would
be invisible; this matches the category (Wispr Flow, superwhisper, VoiceInk). On
web (no out-of-window surface) the existing loading toast stands; cancellation
there is a no-op. Global esc-over-other-apps would need a transient rdev key
hook and is deliberately deferred to a follow-up; the clickable pill control is
the cancel affordance for now. No streaming.
Records the Polishing HUD on the floating pill + AbortSignal ship-raw, and flags true esc-over-other-apps cancellation as a deferred follow-up needing a steer.
Reconcile the dictation rewrite (Polish + Dictionary + Recipes) with main's
parallel evolution of the same surfaces. Policy: main's structural and product
changes win; the transformation->recipe / cleanup->polish rename is re-applied
on top; the two behaviors compose.
Key resolutions:
- Overlay status: keep main's `trigger` discriminant for the recording states;
the Polish HUD joins as a distinct `{ phase: 'polishing' }` member, narrowed
with `in`.
- Delivery: adopt main's clipboard-default output (cursor paste opt-in,
Accessibility-gated); Polish runs before delivery regardless of channel.
- Keyboard: take main's TOGGLE_MODIFIERS tier rework; recipe gestures stay
unbound (comment vocabulary updated from "Transformation").
- Capture pipeline: take main's capture-surface restructure; drop the inline
TransformationSelector (recipes are on-demand via the picker, not a pipeline
stage).
- Drop the removed `sound.pauseMediaDuringRecording` toggle.
- Renumber the dictation ADR 0013 -> 0021 (main now carries three 0013s);
update all dictation references and the spec link.
bun install pulls @tauri-apps/plugin-global-shortcut (the Tier-0 backend main
re-added). Clean typecheck, 37 tests pass.
…decision Add a Wave 3.5 section to the dictation spec (injection guard plus persist polished, then stop before the Wave 4 picker) and record two decisions in ADR 0021: the Polish system prompt is a fixed scaffold wrapping the editable directive, and the recording stores both the raw and polished transcript. Keep free-text Polish over structured toggles for v1; the local-default deferral still holds. ADR stays Proposed, spec stays In Progress.
…scaffold
Polish passed the raw transcript to the model under a six-word system prompt
with no framing that the content was text to clean rather than instructions to
obey, so a dictated "ignore the above and write a poem" could derail the pass.
Add buildPolishSystemPrompt: a system-invariant scaffold ("text filter, not an
assistant", a Forbidden list, a never-execute-the-transcript line, and a
self-correction line) that wraps the user-editable polish.instructions. The
user tunes the core directive under Advanced but cannot delete the guard.
Keep the shared buildSystemPrompt a pure Dictionary injector: Recipes call it
too and a reshape legitimately adds and rewords text, so the meaning-preserving
scaffold is Polish-only. run-polish.ts switches to the new composer; run-recipe
is untouched. Tests assert the scaffold, Forbidden rules, embedded directive,
and Dictionary ordering (prompt structure, not model behavior).
The recording stored only the raw transcript, so the history showed the rough words rather than the polished text actually delivered to the cursor. The ADR's "show original is one click away" needs both stored. Add polishedTranscript (nullable) to the recordings table. The pipeline writes the delivered polished text after Polish, and leaves it null in speed mode and on a polish-failure fallback, where no polished version exists. TranscriptDialog shows the polished text by default and falls back to raw, with a "Show original" toggle when Polish produced a distinct version; TextPreviewDialog gains a generic footer actions slot to host it. Greenfield: one nullable field, no migration.
…Polish comment The Transformation concept is gone, but its view-transition names survived the merge with no consumers: the transformation(id) card/selector name and the recording transformationOutput member. Delete both and the now-stale pipeline JSDoc that pointed at them. Reword the pipeline's Polish comment: it still described "typing the raw at the cursor ... double-type", but delivery is clipboard-default now, so the real risk is landing two copies (a clipboard paste mid-polish, or two cursor pastes).
Built-in recipes (Email, Reply, Notes, To-dos) ship in code, no "Clean" since Polish owns cleanup. Built-in ids carry a builtin: prefix so they never collide with a user recipe's generated id and the library can show them read-only. Expose recipes.builtins and recipes.pickable (built-ins first, then the user's own, alphabetical): the single list the picker and the library page both render.
A top-level /recipes config route (parallel to /recordings, since recipes are user data, not preferences). It lists built-ins union the user's own; built-ins show a "Built-in" badge and are read-only, customs get edit and delete. A sticky-note editor modal creates and edits a custom recipe: a name and one instruction. Replaces the nav-items.ts TODO with a Recipes entry that propagates to both the sidebar and the mobile bottom nav.
The openRecipePicker and runRecipeOnClipboard commands reported "coming soon". Repoint them at a real in-app command palette: capture the source (the foreground selection, or the clipboard) first, focus the Whispering window, then open a picker over builtins union customs. Picking a recipe runs it via runRecipe and delivers the take through deliverRecipeResult. A module-level recipePicker rune is the shared seam (the same shape as the Polish HUD): the operations open it, the mounted RecipePicker palette reads it and runs the choice. Add tauri.mainWindow.focus so a shortcut fired from another app surfaces the palette; this is the in-app step before a floating picker window.
…l_prompt Whisper and OpenAI take an initial_prompt as a spelling and vocabulary hint, so append the user's Dictionary terms to transcription.prompt at both transcribe call sites (local Rust engine and cloud/self-hosted upload). This nudges the transcriber toward the right spellings before Polish runs. The default Parakeet ignores initial_prompt, so it is harmless there; an empty dictionary leaves the prompt unchanged.
Post-implementation review found two stragglers: recipes.builtins was exported with no consumers (the page and picker both read recipes.pickable), and the recipe icon field had been inert since Wave 1, never displayed or editable. Remove the dead getter. Give the built-ins emoji icons, render the icon in the library list and the picker, and add an optional icon input to the editor so the schema field finally earns its place.
Polish, the Dictionary, and the Recipe library + picker have all shipped, so flip ADR 0021 from Proposed to Accepted and record the v1 picker decision: an in-app command palette now, with the floating Tauri window deferred as a clean follow-up. Delete the spent spec; git and the generated spec-history index keep its body.
recordings.set already leaves polishedTranscript null, so writing null again when Polish did not run (speed mode, or a polish failure that ships the raw words) was a redundant Yjs transaction on the transcription hot path. Only update when a Polish pass actually produced a result.
Drop the dead public accessors on the recipes rune: `all`, `get`, `sorted`, `update`, and `count` have zero consumers. The live surface is `pickable`, `set`, `delete`, and `[Symbol.dispose]`. The internal `sorted` $derived stays since `pickable` is built from it. recordings.svelte.ts keeps its fuller CRUD surface because that one is actually consumed (TanStack table, bulk delete); the asymmetry is intentional, not an oversight.
Reconcile the Polish/Dictionary/Recipes feature (ADR 0029) with main's recording-overlay, desktop-events, and tauri-seam refactors. Conflict resolutions: - ADR number collision: main shipped ADRs 0021-0028, so this feature's ADR is renumbered 0021 -> 0029 (file, README row, and 20 in-app references). Cross-references to main's ADR 0021 (actions boundary) are left intact. - delivery.ts: OUTPUT_SCOPES becomes ['transcription', 'recipe'] (the old 'transformation' scope was renamed by this feature); settingsScope adopts main's DRY `OutputScope`. outputWritesToCursor() now covers recipe.cursor, so the auto-paste intent our deleted attach-desktop-events carried is now served by main's attach-auto-paste-intent for free. - attach-desktop-events.svelte.ts: deleted, superseded by main's granular runtime owners (auto-paste-intent, update-check, main-window-reveal, etc.). - attach-recording-overlay.svelte.ts: adopt main's typed recordingOverlayAction.listen; re-graft the polishing-phase + ship-raw handling. The focus-main handler moved to main's attach-main-window-reveal. - recording-overlay/events.ts: keep the ship-raw action on main's typed window-event channels. - transformation-picker: keep this feature's deletion. - (app)/+page.svelte: drop TransformationSelector, keep main's CaptureBehaviorPopover.
Contributor
Preview Deployment
These previews update automatically with new commits to this PR. Commit 2e678b6 |
Member
Author
|
Superseded by #2137 (feat/whispering-polish-report), the cleaner reroll of the same Dictionary/Polish/Recipes work. Closing to keep the effort in one place. |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Replaces Whispering's single
Transformationconcept with three sharper nouns, per ADR 0021. ATransformationfused correction (a property of every transcript) with reformatting (a choice between alternatives); splitting them lifts implementation vocabulary (pre/post replacements, prompt templates) out of the product and makes the reformatting half portable to a future writing app.The model is three nouns, two behaviors:
dictionary: string[]): words Whispering should know. Injection-only: a runtime block in every AI prompt. No find/replace, no regex.polish.enabled+polish.instructions): the always-on, meaning-preserving AI base. On by default, gated on a configured key (no key means speed mode: instant raw transcript, no AI). Instruction editable under Advanced.recipestable for customs; each runs over already-polished text. (Picker and library land in a follow-up.)buildSystemPrompt(instructions, dictionary)is pure and unit-tested; the runners read the dictionary at use and pass it in.What's here
formatstorecipes,cleanup.*topolish.*plusdictionary, the purebuildSystemPrompt, runner rewire, command and shortcut renames. Deletes the find/replace path.complete()and every provider) and delivers the raw transcript. Delivery stays single-write.Delivery model
Polish delivers the polished transcript once. Under main's clipboard-default output that means one clipboard write when the pass completes; the HUD makes the wait legible and ship-raw bails to raw. The wait only happens for users who configured a key and left Polish on; everyone else gets instant raw.
Not in this PR
initial_prompt(Wave 5).Need help on this PR? Tag
/codesmithwith what you need. Autofix is disabled.