Skip to content

fix(language-detector): changed tinyld for google Language Detector#4544

Open
AARP41298 wants to merge 1 commit into
pear-devs:masterfrom
AARP41298:language-detector
Open

fix(language-detector): changed tinyld for google Language Detector#4544
AARP41298 wants to merge 1 commit into
pear-devs:masterfrom
AARP41298:language-detector

Conversation

@AARP41298

@AARP41298 AARP41298 commented Jul 2, 2026

Copy link
Copy Markdown
Contributor

Fix #4529
Vibe coded

Summary by CodeRabbit

  • New Features

    • Added smarter language detection for synced lyrics, helping choose the right romanization more accurately.
    • Improved support for loading language-model assets needed by the app.
  • Bug Fixes

    • Empty or unsupported lyric lines now fall back more safely when language detection isn’t available.
    • Reduced reliance on older text-detection behavior for more consistent results.

@coderabbitai

coderabbitai Bot commented Jul 2, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Replaces the tinyld-based synchronous language detection in the synced-lyrics romanization flow with an async MediaPipe Tasks Text detector. Adds a new language-detector module, a TypeScript declaration for .tflite?url assets, updates package.json dependencies, and adjusts romanize to await the new detector.

Changes

Language Detection Replacement

Layer / File(s) Summary
Dependency and asset type declaration
package.json, src/pear-desktop.ts
Adds @mediapipe/tasks-text dependency, removes tinyld, and declares a module type for *.tflite?url imports exporting a url: string.
MediaPipe language detector module
src/plugins/synced-lyrics/language-detector.ts
New module lazily initializes a MediaPipe LanguageDetector from a CDN WASM fileset and bundled language_detector.tflite model, exposing detectLanguage(text) which returns the top normalized language code or undefined.
Romanization handler wiring
src/plugins/synced-lyrics/renderer/utils.tsx
romanize now imports and awaits detectLanguage instead of tinyld's synchronous detect, selecting a handler only when a truthy language is returned, otherwise falling back to existing heuristics.

Estimated code review effort: 2 (Simple) | ~10 minutes

Sequence Diagram(s)

sequenceDiagram
  participant Utils as romanize()
  participant Detector as detectLanguage()
  participant MediaPipe as LanguageDetector
  participant CDN as CDN WASM/model

  Utils->>Detector: detectLanguage(line)
  Detector->>MediaPipe: lazy init (if needed)
  MediaPipe->>CDN: fetch WASM fileset + tflite model
  CDN-->>MediaPipe: assets loaded
  Detector->>MediaPipe: detect(text)
  MediaPipe-->>Detector: top language result
  Detector-->>Utils: normalized language code or undefined
Loading

Related issues: #4529 — addresses wrong language detection causing incorrect romanization of Chinese lyrics.

Suggested labels: dependencies, plugin: synced-lyrics, enhancement

Suggested reviewers: semvis123

🐰 A rabbit swapped one guesser for a smarter friend,
now Chinese lines to pinyin bend.
No more romaji sneaking through,
MediaPipe tells the tongue anew.
Hop, detect, and sing along! 🎵

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly summarizes the main change: replacing tinyld with Google Language Detector for lyrics language detection.
Linked Issues check ✅ Passed The changes align with #4529 by swapping the lyrics language detector and wiring the new detector into romanization.
Out of Scope Changes check ✅ Passed The dependency update and module declaration support the detector swap and appear in scope for the bug fix.
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🧹 Nitpick comments (1)
src/plugins/synced-lyrics/language-detector.ts (1)

9-20: 🩺 Stability & Availability | 🔵 Trivial | 🏗️ Heavy lift

CDN-hosted WASM fetch with a likely-permanent cached failure and silent error swallowing.

The WASM fileset is fetched at runtime from an external CDN (Line 10) inside a lazy-var singleton (Lines 12-20). If that first fetch fails (offline, blocked CDN, transient network issue), the memoized lazy value is very likely a rejected promise, so all subsequent detectLanguage calls for the rest of the session immediately hit the bare catch {} (Lines 35-37) and silently fall back to the old heuristic-based detection — reintroducing the exact Chinese-vs-Japanese misdetection this PR is meant to fix, with no logging to diagnose it.

Consider: bundling the WASM runtime alongside the model (consistent with how the model is already bundled locally) to remove the network dependency, and/or resetting the lazy value on failure so a later call can retry, and/or logging the failure for observability.

Also applies to: 35-37

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@src/plugins/synced-lyrics/language-detector.ts` around lines 9 - 20, The
runtime WASM load in language-detector.ts is relying on a CDN-backed lazy
singleton that can fail once and then stay rejected for the rest of the session,
while detectLanguage silently swallows the error. Update the LanguageDetector
setup around lazyVar.lazy and detectLanguage so the WASM/fileset is bundled or
otherwise not dependent on the external CDN, and ensure a failed initialization
can be retried or the cached failure is cleared. Also replace the bare catch in
detectLanguage with logging so failures are observable instead of silently
falling back.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Nitpick comments:
In `@src/plugins/synced-lyrics/language-detector.ts`:
- Around line 9-20: The runtime WASM load in language-detector.ts is relying on
a CDN-backed lazy singleton that can fail once and then stay rejected for the
rest of the session, while detectLanguage silently swallows the error. Update
the LanguageDetector setup around lazyVar.lazy and detectLanguage so the
WASM/fileset is bundled or otherwise not dependent on the external CDN, and
ensure a failed initialization can be retried or the cached failure is cleared.
Also replace the bare catch in detectLanguage with logging so failures are
observable instead of silently falling back.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: e4ef0fbf-95dc-4fa9-a4c9-41f82d6a36f5

📥 Commits

Reviewing files that changed from the base of the PR and between 2b67ee5 and a082c87.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (5)
  • package.json
  • src/pear-desktop.ts
  • src/plugins/synced-lyrics/language-detector.ts
  • src/plugins/synced-lyrics/language_detector.tflite
  • src/plugins/synced-lyrics/renderer/utils.tsx

@AARP41298

Copy link
Copy Markdown
Contributor Author

Radwimps - Sparkle 2:47

1000年周期を 一日で息しよう //100 JP
1000年周期を 一日で息し //91 JP
1000年周期を 一日で息 // 89 ZH

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug]: Wrong language detected on lyrics romanization

1 participant