feat(data-collection): create DataCollection option in client#6702

Open
ericapisani wants to merge 16 commits into master from ep/db-spec-experiement-foundation-dict

narrow the type on a variable

0c3be5c
@sentry/warden / warden completed Jul 3, 2026 in 0s

16 issues

Medium

`should_send_default_pii()` ignores `data_collection["user_info"]`, causing silent divergence - `sentry_sdk/client.py:357`

After this change, EventScrubber is initialized from data_collection["user_info"], but should_send_default_pii() (line 753) still reads only options["send_default_pii"]. A user who migrates to data_collection={"user_info": True} without setting send_default_pii=True will find that EventScrubber no longer scrubs PII fields, while every code path guarded by should_send_default_pii(), including scope user attachment (scope.py:1749), cookies, IP addresses, and dozens of integrations, still suppresses PII. The configuration ends up silently split.

`should_send_default_pii()` not updated to reflect `data_collection["user_info"]`, creating split-brain PII gating - `sentry_sdk/client.py:357`

_Client.should_send_default_pii() still reads options["send_default_pii"] directly. Setting data_collection={"user_info": True} without send_default_pii=True therefore puts EventScrubber (now driven by data_collection["user_info"]) in PII-on mode, while should_send_default_pii(), which the integrations and scope.py consult, still returns False, contradicting the user's explicit intent.
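
A minimal sketch of a single source of truth, assuming the resolved data_collection dict is stored in the client options under a "data_collection" key (the findings imply this but do not show it): should_send_default_pii() reads data_collection["user_info"], the same value EventScrubber is built from, and falls back to send_default_pii only when nothing was resolved. The standalone function is a hypothetical stand-in for the _Client method, and the precedence rule (data_collection wins) is an assumption, not the PR's code.

```python
from typing import Any, Mapping


def should_send_default_pii(options: Mapping[str, Any]) -> bool:
    """Hypothetical stand-in for _Client.should_send_default_pii().

    Assumes the resolved data_collection dict lives in options["data_collection"].
    """
    data_collection = options.get("data_collection")
    if isinstance(data_collection, Mapping) and "user_info" in data_collection:
        # Same value EventScrubber is initialized from, so the two cannot diverge.
        # Assumption: an explicit data_collection value takes precedence.
        return bool(data_collection["user_info"])
    # Legacy fallback when no data_collection value has been resolved.
    return bool(options.get("send_default_pii"))
```

With this shape, data_collection={"user_info": True} and no send_default_pii yields True for both the scrubber and every integration that asks.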

Low

Spotlight re-derivation treats `None` option values differently than `resolve_data_collection` - `sentry_sdk/client.py:640-643`

Using is not False identity checks for include_local_variables and include_source_context means an explicit None value is treated as True here, while resolve_data_collection passes None directly to _map_from_send_default_pii (where it's falsy). If a user passes either option as None, the spotlight override would produce a data_collection inconsistent with the initial resolution. Consider using the raw option values directly (matching resolve_data_collection's approach) or explicitly coercing with bool(...).
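
The difference is easy to see in isolation. The two helpers below are hypothetical and only contrast the flagged identity check with the suggested bool(...) coercion; they are not the SDK's functions.

```python
from typing import Optional


def flag_via_identity(value: Optional[bool]) -> bool:
    # Pattern flagged in the finding: only an explicit False disables the flag,
    # so None is treated as True.
    return value is not False


def flag_via_bool(value: Optional[bool]) -> bool:
    # Suggested alternative: coerce with bool(), so None is falsy, which matches
    # how the resolve_data_collection path treats it.
    return bool(value)


if __name__ == "__main__":
    for raw in (True, False, None):
        print(raw, flag_via_identity(raw), flag_via_bool(raw))
```

The helpers agree for True and False and disagree only for None, which is exactly the case where the spotlight re-derivation and resolve_data_collection drift apart.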

_http_headers_from_value silently mishandles non-dict values via string-contains fallback - `sentry_sdk/consts.py:1282`

_resolve_explicit passes d.get("http_headers", {}) straight into _http_headers_from_value without verifying the value is a dict. Inside, the guard "request" in val performs a substring check when val is a string. If a user passes a string like "off" (which contains neither "request" nor "response"), the code silently falls back to the deny_list default — meaning a user attempting to disable header collection actually still collects headers (a PII concern). If instead the string contains "request" or "response" as a substring (e.g. "request_off"), val["request"] performs string indexing and raises a cryptic TypeError/AttributeError in _kvcb_from_value rather than a clear validation error. The test test_http_headers_collection_defaults exercises "off" and confirms the silent-fallback path, but no validation rejects malformed values.

Also found at:

  • sentry_sdk/data_collection.py:152-158
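
The substring behaviour is reproducible in plain Python: "request" in "request_off" is True, while "request" in "off" is False. A hypothetical pre-check, run before the value reaches _http_headers_from_value, could reject non-dict values and unknown keys up front. The accepted keys ("request", "response") come from the finding; the function name, the {"mode": ...} value shape, and the error messages are assumptions.

```python
from typing import Any, Dict

# Keys the finding says _http_headers_from_value looks for.
_HTTP_HEADER_KEYS = frozenset({"request", "response"})


def validate_http_headers(val: Any) -> Dict[str, Any]:
    """Hypothetical pre-check for data_collection["http_headers"]."""
    if not isinstance(val, dict):
        # Without this guard, a str would reach `"request" in val`,
        # which is a substring test rather than a key lookup.
        raise TypeError(
            'data_collection["http_headers"] must be a dict with "request" '
            f'and/or "response" keys, got {type(val).__name__}'
        )
    unknown = set(val) - _HTTP_HEADER_KEYS
    if unknown:
        raise ValueError(
            f'unknown data_collection["http_headers"] keys: {sorted(unknown)}'
        )
    return val
```

Raising here turns both failure modes described above (silent fallback for "off", cryptic error for "request_off") into one explicit error at init time.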

DeprecationWarning stacklevel points to internal SDK code instead of user code - `sentry_sdk/data_collection.py:249-251`

With stacklevel=2, the warning points to _get_options() in client.py rather than the user's sentry_sdk.init() call, making it hard for users to locate the offending line in their own code. A stacklevel of about 5 is needed to reach user code through the chain resolve_data_collection → _get_options → _Client.__init__ → _init → user code.

Also found at:

  • sentry_sdk/client.py:28
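
A self-contained sketch of how stacklevel counts frames, using hypothetical stand-ins named after the chain in the finding (the real SDK may have extra wrapper frames, so the exact number should be checked against the actual call path). Level 1 is the function calling warnings.warn, and each further level moves one caller up.

```python
import warnings

# Hypothetical stand-ins mirroring the chain described in the finding:
# user code -> _init -> _Client.__init__ -> _get_options -> resolve_data_collection


def resolve_data_collection(options):
    if "send_default_pii" in options:
        warnings.warn(
            "send_default_pii is deprecated; use data_collection instead",
            DeprecationWarning,
            # 1 = this function, 2 = _get_options, 3 = _Client.__init__,
            # 4 = _init, 5 = the user's init() call site
            stacklevel=5,
        )
    return options.get("data_collection", {})


def _get_options(options):
    return resolve_data_collection(options)


class _Client:
    def __init__(self, options):
        self.data_collection = _get_options(options)


def _init(**options):
    return _Client(options)


def user_code():
    return _init(send_default_pii=True)  # the warning should point at this line


if __name__ == "__main__":
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        user_code()
    for w in caught:
        print(f"{w.filename}:{w.lineno}: {w.message}")
```

A hard-coded depth breaks whenever a frame is added or removed. On Python 3.12+, warnings.warn also accepts skip_file_prefixes, which skips frames from the SDK's own files and may be less brittle than counting.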

`_kvcb_from_value` raises confusing AttributeError on non-dict `cookies`/`query_params` values

When a user passes a non-dict value for cookies or query_params (e.g. data_collection={"cookies": "off"}), _resolve_explicit forwards it to _kvcb_from_value, which calls val.get("mode", "deny_list") unconditionally. Strings and other non-dicts have no .get(), so sentry_sdk.init() fails with a confusing AttributeError instead of the clear ValueError the module raises for other invalid modes. This is inconsistent with _http_headers_from_value, which, for a string value, performs a substring test with the in operator ("request" in val) and silently falls back to the deny-list default. As a result, one malformed input shape crashes while another is silently ignored. Impact is limited to SDK misconfiguration surfaced at startup (no security or runtime data impact); the concern is a misleading error and inconsistent validation.

`http_bodies` list items are never validated against the set of known body types

When a user provides data_collection={"http_bodies": ["bad_value"]}, the list is accepted and stored without checking each item against ALL_HTTP_BODY_TYPES, silently producing an invalid configuration that downstream consumers may mishandle.

Individual `http_bodies` items are not validated against `ALL_HTTP_BODY_TYPES`

Arbitrary strings in the http_bodies list are accepted without error. An invalid body type silently never matches any request body, so collection appears configured while nothing is actually captured.
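
A minimal per-item check, sketched under loud assumptions: the members of ALL_HTTP_BODY_TYPES are not shown in this review, so the set below is a placeholder, and validate_http_bodies is a hypothetical helper rather than the PR's code.

```python
from typing import Any, List

# Placeholder values: the real ALL_HTTP_BODY_TYPES constant is defined in the PR
# and its members are not listed in this review.
ALL_HTTP_BODY_TYPES = frozenset({"json", "form", "text"})


def validate_http_bodies(value: Any) -> List[str]:
    """Hypothetical check for data_collection["http_bodies"]."""
    if not isinstance(value, (list, tuple)):
        # Rejects a bare string such as "json", which would otherwise be
        # iterated character by character.
        raise TypeError('data_collection["http_bodies"] must be a list of strings')
    items = list(value)
    # The isinstance check short-circuits before hashing non-string items.
    invalid = [
        item
        for item in items
        if not isinstance(item, str) or item not in ALL_HTTP_BODY_TYPES
    ]
    if invalid:
        raise ValueError(
            f'unknown data_collection["http_bodies"] entries {invalid!r}; '
            f"expected a subset of {sorted(ALL_HTTP_BODY_TYPES)}"
        )
    return items
```

Failing at init turns data_collection={"http_bodies": ["bad_value"]} into an explicit error instead of a configuration that never matches.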

`_kvcb_from_value` crashes with unhelpful AttributeError when a non-dict is given for `cookies` or `query_params` - `sentry_sdk/client.py:27-28`

In sentry_sdk/data_collection.py, _kvcb_from_value calls val.get(...) with no isinstance guard, so passing data_collection={"cookies": "off"} instead of data_collection={"cookies": {"mode": "off"}} raises an unhelpful AttributeError: 'str' object has no attribute 'get' during sentry_sdk.init(). Add an isinstance(val, dict) check at the top of _kvcb_from_value and raise a clear TypeError/ValueError for non-dict inputs so misconfiguration produces an actionable error.

Also found at:

  • sentry_sdk/data_collection.py:164-180
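
The suggested guard can be sketched as below. kvcb_from_value_checked and its option_name parameter are hypothetical; "off" and "deny_list" appear in the findings, while "allow_list" is only a guess at the remaining mode.

```python
from typing import Any, Dict

# Assumed mode names: "off" and "deny_list" appear in the findings;
# "allow_list" is a guess at the remaining option.
_KVCB_MODES = frozenset({"off", "deny_list", "allow_list"})


def kvcb_from_value_checked(val: Any, option_name: str) -> Dict[str, Any]:
    """Hypothetical guarded variant of _kvcb_from_value."""
    if not isinstance(val, dict):
        raise TypeError(
            f"data_collection[{option_name!r}] must be a dict such as "
            f"{{'mode': 'off'}}, got {type(val).__name__}: {val!r}"
        )
    mode = val.get("mode", "deny_list")  # same default as the original call
    if mode not in _KVCB_MODES:
        raise ValueError(
            f"invalid mode {mode!r} for data_collection[{option_name!r}]; "
            f"expected one of {sorted(_KVCB_MODES)}"
        )
    return {**val, "mode": mode}
```

With the guard in place, data_collection={"cookies": "off"} fails with a message naming the option and the expected shape rather than 'str' object has no attribute 'get'.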

`DeprecationWarning` for `send_default_pii` uses wrong `stacklevel`, pointing to internal SDK code - `sentry_sdk/client.py:28`

In sentry_sdk/data_collection.py, resolve_data_collection emits a DeprecationWarning for send_default_pii with warnings.warn(..., stacklevel=2). Because the warning is raised deep in the SDK call chain (sentry_sdk.init() -> Client.__init__ -> _get_options() -> resolve_data_collection() -> warnings.warn), stacklevel=2 resolves to _get_options() in client.py rather than the user's sentry_sdk.init() call site. This makes the deprecation warning reference an unhelpful internal frame. Increase stacklevel to match the actual call depth from user code.

Also found at:

  • sentry_sdk/data_collection.py:247-251

...and 6 more

4 skills analyzed
| Skill | Findings | Duration | Cost |
|---|---|---|---|
| security-review | 0 | 55.0s | $0.26 |
| code-review | 4 | 11m 19s | $4.40 |
| find-bugs | 12 | 16m 25s | $7.53 |
| skill-scanner | 0 | 11m 39s | $0.11 |

⏱ 40m 19s · 11.1M in / 288.2k out · $12.30