feat(sync): alibaba provider sync script #2950
Open
nathannli wants to merge 23 commits into
Author
Output of running sync manually (I added the "Skipped creating:" section temporarily).
Author
Spot-checked a few against https://modelstudio.console.alibabacloud.com/ap-southeast-1?spm=a2ty_o05.31384571.0.0.5e7d9f6bAkjBB4&tab=doc#/doc/?type=model&url=prices and the prices match up.
Author
Waiting for the automated review to trigger.
DashScope's inference_metadata.request_modality never surfaces pdf, but vision-understanding models accept PDF inputs (the underlying VL stack parses document pages as images). Push pdf into modalities.input only for models with the VU capability instead of fabricating it for every image-input model.
Refs: alibabacloud.com/model-studio vision-model docs
The cost.output chain was falling back to image_number, content_duration*1000, and cosy_tts_number when no output_token was present. Those fields are per-image, per-second, and per-character respectively — writing them into cost.output silently corrupts the per-token semantic for image-gen, ASR, and TTS models (e.g. qwen3-asr-flash's hand-curated output would be overwritten on next sync). Match google.ts: preserve existing.cost for models without a token output price.
The sync script perpetuated existing.reasoning_options verbatim, leaving 5 reasoning models (qvq-max, qwen3-next-80b-a3b-thinking, qwen3-vl-235b-a22b, qwen3-vl-30b-a3b, qwq-plus) without a reasoning_options block on every sync. Default to [] when reasoning is true and the TOML has no existing options. Hand-curated values (qwen3.5-plus et al.) are preserved. DashScope's catalog blob has no signal for reasoning controls, so [] is the honest default.
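The defaulting rule described above can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: `resolveReasoningOptions`, `ReasoningOption`, and the `ExistingModel` shape are hypothetical names, and the curated value is a placeholder.

```typescript
// Hypothetical shapes; the real sync script's types are not shown here.
type ReasoningOption = Record<string, unknown>;

interface ExistingModel {
  reasoning_options?: ReasoningOption[];
}

function resolveReasoningOptions(
  reasoning: boolean,
  existing: ExistingModel | undefined,
): ReasoningOption[] | undefined {
  // Hand-curated values from the TOML always win.
  if (existing?.reasoning_options !== undefined) {
    return existing.reasoning_options;
  }
  // The catalog has no signal for reasoning controls, so emit an
  // empty list for reasoning models and nothing for the rest.
  return reasoning ? [] : undefined;
}

// Placeholder curated entry, only to show that it passes through untouched.
const curatedOptions: ReasoningOption[] = [{ placeholder: true }];

console.log(resolveReasoningOptions(true, {})); // []
console.log(resolveReasoningOptions(false, {})); // undefined
console.log(
  resolveReasoningOptions(true, { reasoning_options: curatedOptions }) === curatedOptions,
); // true
```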
DashScope renamed the text/image/video input bucket to omni_no_audio_input_token and the text-only output bucket to omni_no_audio_output_token in the qwen3.5 omni series (qwen3.5-omni-flash, qwen3.5-omni-plus, +realtime). Add them to the input/output lookup lists so sync captures the rates instead of falling through to 0. Old omni models still report text_input_token / purein_text_output_token and match those first, so no behavior change for them.
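A rough sketch of the ordered lookup this commit describes, assuming prices arrive as a flat map from price-type name to rate. `firstPrice`, `PriceTable`, and the abbreviated candidate lists are hypothetical; the real lists likely contain more entries, and the rates below are made-up illustration values.

```typescript
// Assumed shape: price-type name -> rate. Not the script's real type.
type PriceTable = Record<string, number | undefined>;

// Older names come first so existing omni models keep matching them.
const INPUT_PRICE_TYPES: readonly string[] = [
  "text_input_token",
  "omni_no_audio_input_token",
];
const OUTPUT_PRICE_TYPES: readonly string[] = [
  "purein_text_output_token",
  "omni_no_audio_output_token",
];

// Return the rate of the first candidate the API actually reported.
function firstPrice(
  prices: PriceTable,
  candidates: readonly string[],
): number | undefined {
  for (const candidate of candidates) {
    const rate = prices[candidate];
    if (rate !== undefined) return rate;
  }
  return undefined;
}

// Illustrative rates only.
const olderOmni: PriceTable = { text_input_token: 1, purein_text_output_token: 4 };
const newerOmni: PriceTable = { omni_no_audio_input_token: 2, omni_no_audio_output_token: 8 };

console.log(firstPrice(olderOmni, INPUT_PRICE_TYPES)); // 1
console.log(firstPrice(newerOmni, INPUT_PRICE_TYPES)); // 2
console.log(firstPrice(newerOmni, OUTPUT_PRICE_TYPES)); // 8
```

Returning `undefined` rather than 0 when nothing matches keeps "no price reported" distinguishable from a genuinely free rate; the caller decides how to fall back.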
DashScope exposes tiered pricing with named ranges (e.g.
'Input<=128k', '128k<Input<=256k'). The catalog schema's
context_over_200k is a sync-only convenience field that lets
consumers read the 200k+ rate directly without walking the
tiers array. Without this, models with a base tier ending below
200k silently misreport their 200k+ cost (qwen3.6-max-preview
shows the 128k base rate 1.3/7.8 at 200k+ context instead of
the correct 2.0/12.0).
Find the tier with the largest lowerBound that is still < 200,000.
That tier covers the 200k+ range (its upper bound is the next tier
or the model's context_window). If only the base covers 200k+
(no non-base tier below 200k), omit the field so consumers fall
back to cost.input/cost.output.
Affects 7 intl models whose tier structure crosses 200k:
qwen3.6-max-preview and the qwen3-coder-{plus,flash,30b-a3b-instruct,
480b-a35b-instruct,plus-2025-09-23,plus-2025-07-22,flash-2025-07-28} family.
The as unknown as Cost cast widens the return type: SyncedFullModel
is typed as AuthoredCost (forbids context_over_200k) but the
catalog validates the synced output as OutputCost (allows it).
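The tier-selection rule in this commit message (largest lowerBound still below 200,000, omit when none qualifies) might look like the sketch below. `contextOver200k` and the `Tier` shape are hypothetical, the tiers array is assumed to hold only non-base tiers with `size` as the lower bound, and treating "128k" as 128,000 is an assumption. The rates are the qwen3.6-max-preview figures quoted above.

```typescript
// Assumed tier shape: size is the tier's lower bound in tokens.
interface Tier {
  size: number;
  input: number;
  output: number;
}

interface Over200kRate {
  input: number;
  output: number;
}

const THRESHOLD = 200_000;

function contextOver200k(tiers: Tier[]): Over200kRate | undefined {
  // Only tiers that start below the threshold can cover the 200k+ range.
  const candidates = tiers.filter((tier) => tier.size < THRESHOLD);
  // No non-base tier below 200k: omit the field so consumers fall
  // back to cost.input / cost.output.
  if (candidates.length === 0) return undefined;
  // The tier with the largest lower bound is the one in effect at 200k.
  const active = candidates.reduce((best, tier) =>
    tier.size > best.size ? tier : best,
  );
  return { input: active.input, output: active.output };
}

// Base tier (Input<=128k) is 1.3/7.8; the next tier starts at 128k.
const maxPreviewTiers: Tier[] = [{ size: 128_000, input: 2.0, output: 12.0 }];
console.log(contextOver200k(maxPreviewTiers)); // { input: 2, output: 12 }

// A model whose first non-base tier starts at 256k gets no field.
console.log(contextOver200k([{ size: 256_000, input: 3, output: 9 }])); // undefined
```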
The previous code rebuilt tiers from the API via ranges.slice(1), silently overwriting any hand-curated tiers in the TOML even when the team had curated additional tiers (e.g. per-tier cache_read values the API doesn't expose). Now compare apiTiers.length vs existingTiers.length:
- API has at least as many tiers as the TOML: API wins (up to date)
- TOML has more tiers: preserve the TOML's (hand-curated, the API is behind)
- API has no ranges: return existing.cost wholesale (unchanged)
The base rate is still spread into the top-level cost so consumers can read the default rate without indexing into tiers[0]. Each tier carries its lowerBound as size: a tier with size: N covers N < context <= (next tier's size, or model context_window).
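The length comparison in this commit can be sketched as below. This is an illustration under assumptions: `mergeCost`, `ApiRange`, `Tier`, and `Cost` are hypothetical shapes, and the curated tier values are invented for the example (only the 1.3/7.8 and 2.0/12.0 rates come from the text above).

```typescript
// Hypothetical shapes for illustration.
interface ApiRange {
  lowerBound: number;
  input: number;
  output: number;
}

interface Tier {
  size: number; // the tier's lower bound
  input: number;
  output: number;
  cache_read?: number;
}

interface Cost {
  input: number;
  output: number;
  tiers?: Tier[];
}

function mergeCost(ranges: ApiRange[], existing: Cost | undefined): Cost | undefined {
  const base = ranges[0];
  // API has no ranges: return the existing cost wholesale.
  if (base === undefined) return existing;

  const apiTiers: Tier[] = ranges.slice(1).map((range) => ({
    size: range.lowerBound,
    input: range.input,
    output: range.output,
  }));
  const existingTiers = existing?.tiers ?? [];

  // API wins unless the TOML carries strictly more tiers.
  const tiers = apiTiers.length >= existingTiers.length ? apiTiers : existingTiers;

  // The base rate is always spread into the top-level cost.
  const cost: Cost = { input: base.input, output: base.output };
  if (tiers.length > 0) cost.tiers = tiers;
  return cost;
}

const apiRanges: ApiRange[] = [
  { lowerBound: 0, input: 1.3, output: 7.8 },
  { lowerBound: 128_000, input: 2.0, output: 12.0 },
];

// Invented curated tiers: two entries, so the TOML has more than the API.
const curatedCost: Cost = {
  input: 1.3,
  output: 7.8,
  tiers: [
    { size: 128_000, input: 2.0, output: 12.0, cache_read: 0.2 },
    { size: 256_000, input: 3.0, output: 15.0, cache_read: 0.3 },
  ],
};

console.log(mergeCost(apiRanges, undefined)?.tiers?.length); // 1
console.log(mergeCost(apiRanges, curatedCost)?.tiers?.length); // 2
console.log(mergeCost([], curatedCost) === curatedCost); // true
```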
- context_over_200k is owned by generate.ts (derived from tiers at build time). Sync output validates as AuthoredCost, which forbids it; writing it crashes safeParse when tiers exist. Drop the computation, the field, and the `as unknown as Cost` cast.
- Tiers: API is the source of truth; hand-curated TOML tiers are never preserved.
equivalent_snapshot and inference_provider are declared in AlibabaModel but never read. .passthrough() preserves them at runtime if a use ever appears.
- cache_read/cache_write: API-only, no `?? existing` fallback.
DashScope reliably exposes cache price types; omission is
intentional. Differs from reasoning/input_audio/output_audio
which keep the fallback (less reliable API exposure).
- imageOutput/duration/tts: guard-only. The Cost schema has no
field for per-image/per-second/per-char pricing; the arms
suppress early-return so `?? existing` preserves curated token
cost for non-token models (qwen3-asr-flash, content_duration
only — without the arm, requireExisting("cost") throws).
When a model has base metadata (models/<id>/<model>.toml) but no provider TOML, mint a thin stub (base_model + reasoning default + API cost) via factorBaseModel inheritance. Skip if no base metadata or no API pricing. 0 candidates today; forward-looking for future base-metadata additions.
L4: replace `requireExisting(...) as cast` (factorBaseModel's required 3rd arg feeds baseModelOmit, crashes on undefined) with limitForOmit = translatedLimit ?? baseLimit, matching all other factorBaseModel callers.
L1: reasoning base_model models default reasoning_options to [] when neither provider nor base metadata supply them; inherited options left to factorBaseModel.
Helpers baseModelMetadata/baseModelMetadataExists read base metadata locally (MODELS_DIR 5 levels up). openrouter.ts and other vendor sync scripts untouched.
Implements the auto-create shape from d0f3203's intent commit: skipCreates defaults to false, and translateModel returns undefined only when both existing and base are missing. New models with a models/alibaba/<id>.toml base match are minted as thin stubs via factorBaseModel; required inline fields (family, temperature, open_weights, knowledge) are passed as undefined so the base is the sole authority and the API's empty signals can't override true base values with false.
L1: drop the unreachable baseModelMetadataExists create-branch from the original PR; replace with resolveAlibabaBaseModel walking the models/<id>/ directory. New-model + base-matched path routes through buildAlibabaModel's thin-stub branch (which already factored limitForOmit correctly).
L2: cost/limit/modalities now accept ExistingModel | undefined so the new-model path can call them without throwing on undefined.
L3: reasoning_options guard restored: default to [] only when the base has no options of its own, otherwise leave undefined so any future curated base options inherit cleanly.
L4: buildAlibabaModel signature takes baseModel as the third arg, defaulting to existing?.base_model ?? resolveAlibabaBaseModel(id). Existing-model callers are unchanged; new-model callers pass the resolved base explicitly.
Verified: bun validate exit 0 (51 existing TOMLs round-trip), bun models:sync alibaba --dry-run exit 0 (0 created, 42 updated, 9 unchanged, 9 retained via missingNotice), and a targeted buildAlibabaModel check confirms the new-model + base-matched path emits a thin stub with no fabricated overrides and refuses the unreachable (existing=undefined, baseModel=undefined) state.
Match the shape of google.ts / xai.ts / chutes.ts / ovhcloud.ts / baseten.ts / venice.ts. Drops the AlibabaProviderOptions interface and createAlibabaProvider factory; the alibaba provider is now a single object literal that hardcodes the intl deployment config (DASHSCOPE_API_KEY, INTL_API_ENDPOINT, international prose). Net: 80 inserts, 104 deletes (-24 lines). No behavior change. Verified: bun validate, bun models:sync alibaba --dry-run, and the targeted buildAlibabaModel check all stay green.
The intl API has 7 unique range_name values, all English with
ASCII <=; the Chinese (输入) alternative is never used. Verified
via live API curl.
$ jq -rs '[.[] | .output.models[] | .prices[]? | .range_name] | unique' ...
[
"128k<Input<=256k",
"256k<Input<=1m",
"32k<Input<=128k",
"Default",
"Input<=128k",
"Input<=256k",
"Input<=32k"
]
The Chinese branch was speculative scaffolding for the China
deployment, which we don't sync. A China sync would need its own
tierLowerBound (different API, different locale conventions)
anyway — keeping the 输入 handling in the intl version conflates
two deployments.
Verified: bun validate, bun models:sync alibaba --dry-run, and the
targeted buildAlibabaModel check all stay green. The cleanup
produces identical results on all 221 intl models.
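For reference, an English-only lower-bound parser covering the seven range_name values listed above could look like this sketch. It is not the script's actual tierLowerBound: the function body, the decimal reading of `k`/`m` (1,000 and 1,000,000), and returning `undefined` for unrecognized names are all assumptions.

```typescript
// Sketch of a lower-bound parser for the intl range_name values.
// Assumption: "k" means 1,000 and "m" means 1,000,000.
function tierLowerBound(rangeName: string): number | undefined {
  // "Default" and "Input<=128k" style names describe the base tier.
  if (rangeName === "Default" || /^Input<=\d+[km]$/.test(rangeName)) {
    return 0;
  }
  // "128k<Input<=256k" style names carry the lower bound up front.
  const match = /^(\d+)([km])<Input<=\d+[km]$/.exec(rangeName);
  if (match === null) return undefined;
  const digits = match[1];
  const unit = match[2];
  if (digits === undefined || unit === undefined) return undefined;
  return Number(digits) * (unit === "k" ? 1_000 : 1_000_000);
}

console.log(tierLowerBound("128k<Input<=256k")); // 128000
console.log(tierLowerBound("256k<Input<=1m")); // 256000
console.log(tierLowerBound("Input<=32k")); // 0
console.log(tierLowerBound("Default")); // 0
console.log(tierLowerBound("输入<=128k")); // undefined
```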
The pdf modality fudge relied solely on the (undocumented) VU
capability. Live intl catalog has 2 VL models where DashScope
omits VU despite the vl-segment in the model id:
$ jq -rs '[.[] | .output.models[] | select(((.capabilities // [])
| contains(["VU"])) | not) | select(((.model | ascii_downcase)
| contains("vl"))) | .model]' …
[
"qwen3-vl-32b-instruct",
"qwen3-vl-32b-thinking"
]
Add a second signal: a segment regex matching vl as a delimited
token (qwen3-vl-32b, qwen-vl-max). Avoids false positives from
arbitrary substrings containing vl (e.g. a hypothetical 'devlog-7b'
would not match because there's no delimiter before vl).
Behavior:
- 16 models with both VU and vl-substring: pdf added via VU
(existing behavior preserved).
- 21 models with VU only (qwen3.5-plus, qvq-max, wan2.2-kf2v-flash,
kimi-k2.7-code, etc.): pdf added via VU (existing behavior).
- 2 models with vl-substring only (qwen3-vl-32b-*): pdf now added
via the segment regex. Without this fix, pdf was silently missing
for these models.
- 182 models with neither: no pdf (unchanged).
Verified: bun validate, bun models:sync alibaba --dry-run, and the
targeted buildAlibabaModel check all stay green.
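The two-signal check described in this commit might be sketched as follows. The regex, its delimiter set (`-`, `_`, `.`, `/`), and the names `acceptsPdf` and `CatalogModel` are assumptions for illustration; the PR text does not show the actual pattern.

```typescript
// Assumed minimal catalog shape.
interface CatalogModel {
  model: string;
  capabilities?: string[];
}

// "vl" must be a whole segment: start-or-delimiter, vl, delimiter-or-end.
// The delimiter set is a guess, not taken from the PR.
const VL_SEGMENT = /(?:^|[-_./])vl(?:$|[-_./])/i;

function acceptsPdf(entry: CatalogModel): boolean {
  // Signal 1: the (undocumented) VU capability.
  const hasVu = (entry.capabilities ?? []).includes("VU");
  // Signal 2: a delimited vl segment in the model id.
  return hasVu || VL_SEGMENT.test(entry.model);
}

console.log(acceptsPdf({ model: "qvq-max", capabilities: ["VU"] })); // true
console.log(acceptsPdf({ model: "qwen3-vl-32b-instruct" })); // true
console.log(acceptsPdf({ model: "qwen-vl-max" })); // true
// Hypothetical id containing "vl" only as part of a longer word.
console.log(acceptsPdf({ model: "devlog-7b" })); // false
```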
Match chutes/openrouter; file used tabs.
existsSync is case-insensitive on macOS's default (case-insensitive) file system; a wrong-case base file would mint a stub locally but skip on Linux CI. Mirror chutes' canonicalExists.
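A case-exact existence check can be sketched by comparing each path segment against the directory listing instead of asking the file system. This is not chutes' canonicalExists, whose implementation is not shown here: the signature, the injected `ListDir` callback, and the in-memory tree are assumptions made so the example runs without touching disk.

```typescript
// Injected directory lister; in a real script this would likely wrap
// readdirSync from node:fs (assumption).
type ListDir = (dir: string) => string[];

function canonicalExists(
  root: string,
  segments: string[],
  listDir: ListDir,
): boolean {
  let current = root;
  for (const segment of segments) {
    let entries: string[];
    try {
      entries = listDir(current);
    } catch {
      return false; // directory missing or unreadable
    }
    // Exact string comparison, so case must match on every platform.
    if (!entries.includes(segment)) return false;
    current = `${current}/${segment}`;
  }
  return true;
}

// In-memory stand-in for a directory tree (illustrative file name).
const fakeTree: Record<string, string[]> = {
  models: ["alibaba"],
  "models/alibaba": ["qwen3-vl-32b-instruct.toml"],
};
const listFake: ListDir = (dir) => {
  const entries = fakeTree[dir];
  if (entries === undefined) throw new Error(`ENOENT: ${dir}`);
  return entries;
};

console.log(canonicalExists("models", ["alibaba", "qwen3-vl-32b-instruct.toml"], listFake)); // true
console.log(canonicalExists("models", ["alibaba", "Qwen3-VL-32B-Instruct.toml"], listFake)); // false
```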
fetchModelsPage and parseModels both ran AlibabaCatalogResponse.parse, validating every model twice. fetchModelsPage now uses a light AlibabaCatalogPage envelope; parseModels does the single full model parse.
a3592d5 to
f420bb1
#2121