Fix GenerationConfig continuous batching serialization #47038

Open
VectorPeak wants to merge 4 commits into huggingface:main from VectorPeak:fix-continuous-batching-config-serialization

VectorPeak commented Jul 3, 2026

What does this PR do?

Fixes #47039

This PR fixes a GenerationConfig serialization round-trip loss for ContinuousBatchingConfig.

What Problem This Solves

GenerationConfig.save_pretrained() persists generation settings by writing the result of GenerationConfig.to_json_string() into generation_config.json. During that JSON conversion, nested dataclass values are passed through convert_dataclass_to_dict(), which currently serializes dataclasses by calling to_dict() when that method exists.

The problem is that ContinuousBatchingConfig is a dataclass that does not define to_dict(). Because the helper has no fallback return for dataclasses without to_dict(), that branch falls through and the helper implicitly returns None for the continuous batching config. In practice, a configured continuous batching block can therefore be persisted as JSON null.
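The fall-through can be reproduced in isolation. The sketch below is an illustration only, not the transformers source: StandInBatchingConfig is a hypothetical stand-in for ContinuousBatchingConfig, and the helper is a simplified copy of the pre-fix logic described above.

```python
import json
from dataclasses import dataclass, is_dataclass


@dataclass
class StandInBatchingConfig:
    # Hypothetical stand-in for ContinuousBatchingConfig: a dataclass with no to_dict().
    block_size: int = 128


def convert_dataclass_to_dict(obj):
    # Simplified copy of the pre-fix helper logic (an assumption for illustration).
    if isinstance(obj, dict):
        return {key: convert_dataclass_to_dict(value) for key, value in obj.items()}
    elif is_dataclass(obj):
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        # No return statement on this path, so Python returns None implicitly.
    else:
        return obj


payload = {"continuous_batching_config": StandInBatchingConfig()}
serialized = json.dumps(convert_dataclass_to_dict(payload))
print(serialized)  # prints {"continuous_batching_config": null}
```

No exception is raised at any point, which is why the loss is silent.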

A user or service can construct a generation config like this:

GenerationConfig(
    continuous_batching_config=ContinuousBatchingConfig(
        block_size=128,
        default_compile_level=2,
        varlen_compile_config=CompileConfig(dynamic=True),
        decode_compile_config=CompileConfig(mode="default"),
    )
).save_pretrained(...)

Before this fix, the saved generation_config.json could contain:

{
  "continuous_batching_config": null
}

That means the saved config no longer carries the actual continuous batching parameters, including values such as block_size, default_compile_level, and the nested varlen_compile_config / decode_compile_config settings. After a normal save_pretrained() -> from_pretrained() round trip, the loaded GenerationConfig has lost the continuous batching configuration instead of reconstructing it.

This matters for serving and inference setups that rely on saved generation configs: a configuration that was valid in memory can become incomplete after being saved and reloaded, so behavior can drift from the original runtime settings without an explicit error.

Change

This PR fixes both directions of the GenerationConfig round trip: writing ContinuousBatchingConfig into JSON, and restoring it back into typed config objects when the generation config is loaded again.

For the save / serialization path:

  • Add ContinuousBatchingConfig.to_dict() so convert_dataclass_to_dict() has an explicit structured representation to use instead of falling through to None.
  • Serialize the top-level continuous batching fields from the dataclass state, preserving user-provided values such as block_size, default_compile_level, max_cached_graphs, and other continuous batching knobs.
  • Delegate nested varlen_compile_config and decode_compile_config serialization to CompileConfig.to_dict() when those fields are present, so nested compile settings keep the same serialization behavior as standalone CompileConfig objects.
  • Preserve the existing CompileConfig.to_dict() filtering behavior, including not leaking internal implementation fields such as _compile_all_devices into the saved JSON.

For the load / deserialization path:

  • Convert a saved continuous_batching_config dictionary back into a ContinuousBatchingConfig inside GenerationConfig.__init__, matching the existing pattern used by other nested generation config objects.
  • Convert nested varlen_compile_config and decode_compile_config dictionaries back into CompileConfig during ContinuousBatchingConfig.__post_init__, so callers get typed config objects after GenerationConfig.from_pretrained() instead of raw dictionaries.
  • Keep the reconstruction local to continuous batching config handling rather than changing the generic dataclass serializer, which limits the behavioral surface of the fix.
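The load-side reconstruction described in the bullets above can be sketched as follows. This is a minimal illustration under stated assumptions: StandInCompileConfig and StandInBatchingConfig are hypothetical stand-ins with made-up defaults, not the real CompileConfig and ContinuousBatchingConfig classes.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class StandInCompileConfig:
    # Hypothetical stand-in for CompileConfig; field defaults are assumptions.
    dynamic: Optional[bool] = None
    mode: str = "default"


@dataclass
class StandInBatchingConfig:
    # Hypothetical stand-in for ContinuousBatchingConfig.
    block_size: int = 32
    varlen_compile_config: Optional[Any] = None
    decode_compile_config: Optional[Any] = None

    def __post_init__(self):
        # Nested configs arrive as plain dicts after a JSON load; rebuild typed objects.
        if isinstance(self.varlen_compile_config, dict):
            self.varlen_compile_config = StandInCompileConfig(**self.varlen_compile_config)
        if isinstance(self.decode_compile_config, dict):
            self.decode_compile_config = StandInCompileConfig(**self.decode_compile_config)


# Shape of the data as it would look after json.load() on a saved config.
loaded = {
    "block_size": 128,
    "varlen_compile_config": {"dynamic": True},
    "decode_compile_config": None,
}
restored = StandInBatchingConfig(**loaded)
print(type(restored.varlen_compile_config).__name__)  # prints StandInCompileConfig
```

Doing the conversion in __post_init__ means the same path handles both direct construction with dicts and construction from a loaded file.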

For coverage:

  • Add a regression test that saves a GenerationConfig containing ContinuousBatchingConfig, reloads it with GenerationConfig.from_pretrained(), and verifies that the continuous batching fields survive the round trip.
  • Include nested CompileConfig values in the test to cover the deeper round-trip path, not just the top-level ContinuousBatchingConfig object.
  • Assert that nested compile configs are restored as CompileConfig instances and that internal compile-only fields are not emitted through to_dict().

This keeps the fix scoped to continuous batching generation config serialization. It does not alter continuous batching scheduling, generation execution, compile defaults, or unrelated generation config fields; it only makes the saved configuration faithfully represent the object that was already present in memory.

Evidence

Local behavior proof after the patch:

contains null: False
contains block_size: True
ContinuousBatchingConfig
CompileConfig
False

The final False verifies that _compile_all_devices is not leaked through nested CompileConfig.to_dict() serialization.

Possible call chain / impact

User / service saves generation config
  -> GenerationConfig.save_pretrained(...)
  -> GenerationConfig.to_json_string(...)
  -> convert_dataclass_to_dict(continuous_batching_config)
  -> ContinuousBatchingConfig parameters are preserved instead of becoming null

User / service reloads generation config
  -> GenerationConfig.from_pretrained(...)
  -> GenerationConfig.__init__(...)
  -> continuous_batching_config dict is restored to ContinuousBatchingConfig
  -> nested compile config dicts are restored to CompileConfig

This PR only changes serialization/deserialization of ContinuousBatchingConfig. It does not change continuous batching runtime scheduling, generation behavior, compile defaults, or unrelated generation config fields.

Code Agent Policy

  • I confirm that this is not a pure code agent PR.

AI assistance was used for diagnosis, patch drafting, validation planning, and PR text preparation. The human submitter should review all changed lines and understand the diff before checking this box.

Before submitting

  • This PR fixes a typo or improves the docs (you can dismiss the other checks if that's the case).
  • Did you read the contributor guideline and the Pull Request checks?
  • Was this discussed/approved via a Github issue or the forum? Please add a link to it if that's the case.
  • Did you make sure to update the documentation with your changes according to the guidelines?
    • No docs update needed; this fixes config round-trip behavior and adds a regression test.
  • Did you write any new necessary tests?

Validation run locally:

python -m pytest tests/generation/test_configuration_utils.py::GenerationConfigSerializationTest::test_serialize_generation_continuous_batching_config -q
python -m pytest tests/generation/test_configuration_utils.py::GenerationConfigSerializationTest::test_serialize_generation_watermarking_config -q
python -m ruff format --check src/transformers/generation/configuration_utils.py tests/generation/test_configuration_utils.py
python -m ruff check src/transformers/generation/configuration_utils.py tests/generation/test_configuration_utils.py
git diff --check

Limitations:

  • make fix-repo was not run because make is unavailable in the current Windows PowerShell environment.
  • python utils/tests_fetcher.py --diff_with_last_commit failed with an IndexError after selecting files from the previous commit rather than the current uncommitted diff. The default tests_fetcher.py invocation completed, but reported no changed files before the patch was committed.

Who can review?

Continuous batching / generation reviewers from the template are likely relevant after tests pass and the coordination issue has maintainer feedback: @remi-or, @ArthurZucker, @McPatate.

VectorPeak marked this pull request as ready for review July 3, 2026 09:52

VectorPeak (Author) commented:

CI note: the remaining red check appears to be a self-hosted runner/container infrastructure failure rather than a test failure from this PR.

The CI recap reports 68,570 tests with 0 failures. The failing job is tests_non_model [shard 6/8], and it failed during Initialize containers before checkout, dependency installation, or test execution. The relevant log lines are:

Pod ... is unhealthy with phase status Failed
TypeError: Cannot read properties of null (reading 'jobPod')
Executing the custom container implementation failed. Please contact your self hosted runner administrator.

I do not have permission to rerun the failed upstream job from the fork side.

Comment on lines +1847 to +1854
def to_dict(self) -> dict[str, Any]:
    """Serializes this instance to a Python dictionary."""
    output = copy.deepcopy(self.__dict__)
    if self.varlen_compile_config is not None:
        output["varlen_compile_config"] = self.varlen_compile_config.to_dict()
    if self.decode_compile_config is not None:
        output["decode_compile_config"] = self.decode_compile_config.to_dict()
    return output

Member

Rather than doing it manually per key, let's force all dataclasses to resolve as dicts. I copied this from a few lines above:

def convert_dataclass_to_dict(obj):
    if isinstance(obj, dict):
        return {key: convert_dataclass_to_dict(value) for key, value in obj.items()}
    elif is_dataclass(obj):
        # Some of our dataclasses have a custom `to_dict()` method, and we prefer it
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
    else:
        return obj

Author

Thanks for the suggestion! I updated the patch to use a generic dataclass fallback in convert_dataclass_to_dict() and removed the per-key ContinuousBatchingConfig.to_dict() handling.

My initial intent with the manual keys was to avoid widening the behavioral surface, but this shared fallback is cleaner and matches the direction you suggested while still preferring custom to_dict() implementations when present.
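For readers following the thread, one possible shape of such a generic fallback is sketched below. The updated patch itself is not shown in this conversation, so this is an assumption about the approach rather than the actual diff; StandInCompileConfig and StandInBatchingConfig are hypothetical stand-ins for the real config classes.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Optional


@dataclass
class StandInCompileConfig:
    # Hypothetical stand-in: has a custom to_dict() that hides an internal field.
    mode: str = "default"
    _compile_all_devices: Optional[bool] = None

    def to_dict(self) -> dict[str, Any]:
        return {k: v for k, v in self.__dict__.items() if k != "_compile_all_devices"}


@dataclass
class StandInBatchingConfig:
    # Hypothetical stand-in: no to_dict(), so it relies on the generic fallback.
    block_size: int = 128
    decode_compile_config: Optional[StandInCompileConfig] = None


def convert_dataclass_to_dict(obj: Any) -> Any:
    if isinstance(obj, dict):
        return {key: convert_dataclass_to_dict(value) for key, value in obj.items()}
    if is_dataclass(obj) and not isinstance(obj, type):
        # Prefer a custom to_dict() when the dataclass provides one.
        if hasattr(obj, "to_dict"):
            return obj.to_dict()
        # Generic fallback: walk the fields and recurse, so nested configs
        # still go through their own to_dict() where available.
        return {f.name: convert_dataclass_to_dict(getattr(obj, f.name)) for f in fields(obj)}
    return obj


config = StandInBatchingConfig(decode_compile_config=StandInCompileConfig())
result = convert_dataclass_to_dict({"continuous_batching_config": config})
print(result)
```

Recursing through the helper instead of calling dataclasses.asdict() is a deliberate choice in this sketch: asdict() expands nested dataclasses field by field and would bypass a nested custom to_dict(), which would expose _compile_all_devices.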

Validation rerun locally:

python -m pytest tests/generation/test_configuration_utils.py::GenerationConfigSerializationTest::test_serialize_generation_continuous_batching_config -q
python -m pytest tests/generation/test_configuration_utils.py::GenerationConfigSerializationTest::test_serialize_generation_watermarking_config -q
python -m ruff format --check src/transformers/generation/configuration_utils.py tests/generation/test_configuration_utils.py
python -m ruff check src/transformers/generation/configuration_utils.py tests/generation/test_configuration_utils.py
git diff --check

Member

cc @remi-or, to confirm whether saving the continuous batching config is intended

github-actions Bot commented Jul 4, 2026

Contributor

CI recap

Dashboard: View test results in Grafana
Latest run: 28701913722:2
Result: failure | Jobs: 15 | Tests: 171,151 | Failures: 6 | Duration: 23h 35m

Development

Successfully merging this pull request may close these issues.

GenerationConfig serializes ContinuousBatchingConfig as null