feat(data-collection): create DataCollection option in client #6702

Open
ericapisani wants to merge 16 commits into master from ep/db-spec-experiement-foundation-dict

narrow the type on a variable

0c3be5c
@sentry/warden / warden: find-bugs completed Jul 3, 2026

11 issues

find-bugs: Found 12 issues (1 medium, 11 low)

Medium

`should_send_default_pii()` not updated to reflect `data_collection["user_info"]`, creating split-brain PII gating - `sentry_sdk/client.py:357`

_Client.should_send_default_pii() still reads options["send_default_pii"] directly. Setting data_collection={"user_info": True} without send_default_pii=True therefore puts EventScrubber (now driven by data_collection["user_info"]) in PII-on mode, while every call to should_send_default_pii() from the integrations and scope.py still returns False, contradicting the user's explicit intent.
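One way to close the gap is to route every PII check through a single helper. The sketch below is illustrative only: the standalone function and the precedence rule (an explicit `data_collection["user_info"]` wins over the legacy flag) are assumptions, not something the PR specifies.

```python
from typing import Any, Dict


def should_send_default_pii(options: Dict[str, Any]) -> bool:
    """Hypothetical single source of truth for PII gating.

    Assumption: an explicit data_collection["user_info"] takes precedence
    over the legacy send_default_pii flag. The PR may choose another rule.
    """
    data_collection = options.get("data_collection") or {}
    if "user_info" in data_collection:
        return bool(data_collection["user_info"])
    # Fall back to the legacy option when user_info was not set.
    return bool(options.get("send_default_pii"))
```

With a helper like this, EventScrubber and the integrations would read the same answer for `data_collection={"user_info": True}`.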

Low

`_kvcb_from_value` raises confusing AttributeError on non-dict `cookies`/`query_params` values

When a user passes a non-dict value for cookies or query_params (e.g. data_collection={"cookies": "off"}), _resolve_explicit forwards it to _kvcb_from_value, which calls val.get("mode", "deny_list") unconditionally. Strings and other non-dicts have no .get(), so this raises a confusing AttributeError at sentry_sdk.init time instead of the clear ValueError the module raises for other invalid modes. This is inconsistent with _http_headers_from_value, which for a string value uses the in substring operator ("request" in val) and silently falls back to the deny-list default. So one bad-input shape crashes and another is silently ignored. Impact is limited to SDK misconfiguration surfaced at startup (no security/runtime data impact); the concern is a misleading error and inconsistent validation.
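A minimal sketch of the kind of guard that would turn the AttributeError into an actionable error. The function name, the `field` parameter, and the set of modes are stand-ins: only `"deny_list"` and `"off"` appear in this review, so the real helper may accept more.

```python
from typing import Any, Dict

# Assumed set of modes; the real module may define others.
_KNOWN_MODES = {"deny_list", "off"}


def kvcb_from_value(field: str, val: Any) -> Dict[str, Any]:
    """Hypothetical guarded variant of _kvcb_from_value."""
    if not isinstance(val, dict):
        # Fail with a message that names the field and the expected shape.
        raise TypeError(
            "data_collection[%r] must be a dict such as {'mode': 'off'}, got %s"
            % (field, type(val).__name__)
        )
    mode = val.get("mode", "deny_list")
    if mode not in _KNOWN_MODES:
        raise ValueError("data_collection[%r]: unknown mode %r" % (field, mode))
    return {"mode": mode}
```

Here `kvcb_from_value("cookies", "off")` raises a TypeError that mentions `cookies`, while `kvcb_from_value("cookies", {"mode": "off"})` passes through.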

`http_bodies` list items are never validated against the set of known body types

When a user provides data_collection={"http_bodies": ["bad_value"]}, the list is accepted and stored without checking each item against ALL_HTTP_BODY_TYPES, silently producing an invalid configuration that downstream consumers may mishandle.
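Per-item validation could look roughly like the sketch below. The contents of `ALL_HTTP_BODY_TYPES` are not shown in this review, so the values here are placeholders, and `validate_http_bodies` is a hypothetical name.

```python
from typing import Iterable, List

# Placeholder values; the real ALL_HTTP_BODY_TYPES is defined in the SDK.
ALL_HTTP_BODY_TYPES = frozenset({"example_type_a", "example_type_b"})


def validate_http_bodies(items: Iterable[str]) -> List[str]:
    """Reject any entry that is not a known body type."""
    checked = list(items)
    unknown = [item for item in checked if item not in ALL_HTTP_BODY_TYPES]
    if unknown:
        raise ValueError(
            "data_collection['http_bodies'] contains unknown body types: %r"
            % (unknown,)
        )
    return checked
```

Raising at init time keeps a typo from producing a configuration that looks enabled but never matches a request body.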

Individual `http_bodies` items are not validated against `ALL_HTTP_BODY_TYPES`

Arbitrary strings in the http_bodies list are accepted without error; if an invalid body type is supplied the SDK will silently never match any request body, making collection appear configured while nothing is actually captured.

`_kvcb_from_value` crashes with unhelpful AttributeError when a non-dict is given for `cookies` or `query_params` - `sentry_sdk/client.py:27-28`

In sentry_sdk/data_collection.py, _kvcb_from_value calls val.get(...) with no isinstance guard, so passing data_collection={"cookies": "off"} instead of data_collection={"cookies": {"mode": "off"}} raises an unhelpful AttributeError: 'str' object has no attribute 'get' during sentry_sdk.init(). Add an isinstance(val, dict) check at the top of _kvcb_from_value and raise a clear TypeError/ValueError for non-dict inputs so misconfiguration produces an actionable error.

Also found at:

  • sentry_sdk/data_collection.py:164-180

`DeprecationWarning` for `send_default_pii` uses wrong `stacklevel`, pointing to internal SDK code - `sentry_sdk/client.py:28`

In sentry_sdk/data_collection.py, resolve_data_collection emits a DeprecationWarning for send_default_pii with warnings.warn(..., stacklevel=2). Because the warning is raised deep in the SDK call chain (sentry_sdk.init() -> Client.__init__ -> _get_options() -> resolve_data_collection() -> warnings.warn), stacklevel=2 resolves to _get_options() in client.py rather than the user's sentry_sdk.init() call site. This makes the deprecation warning reference an unhelpful internal frame. Increase stacklevel to match the actual call depth from user code.

Also found at:

  • sentry_sdk/data_collection.py:247-251

`_DISABLED_DATA_COLLECTION_CONFIG` is a mutable module-level dict returned by reference from `BaseClient.data_collection` - `sentry_sdk/client.py:394`

BaseClient.data_collection returns the shared _DISABLED_DATA_COLLECTION_CONFIG dict directly; any caller that mutates the returned dict (e.g. client.data_collection["cookies"]["mode"] = "deny_list") permanently alters the singleton, affecting every subsequent BaseClient instance.

Also found at:

  • sentry_sdk/client.py:440-441

Spotlight re-derive uses `is not False`, diverging from normal data_collection resolution when frame options are `None` - `sentry_sdk/client.py:640-643`

In the DSN-less spotlight override path in client.py, data_collection is re-derived via _map_from_send_default_pii using self.options['include_local_variables'] is not False and self.options['include_source_context'] is not False. This differs from the normal resolution in resolve_data_collection, which passes the raw option value through to _map_from_send_default_pii (include_local_variables=include_local_variables). When a user explicitly sets include_local_variables=None or include_source_context=None (both declared Optional[bool] in consts.py:1312-1313), the two paths disagree: the spotlight re-derive treats None as enabled (None is not False -> True), producing stack_frame_variables=True / frame_context_lines=DEFAULT_FRAME_CONTEXT_LINES, while normal resolution stores the raw None (falsy) / frame_context_lines=0. For any True/False value the two paths agree, so the divergence is limited to the explicit-None edge case. Impact today is latent: no code currently reads stack_frame_variables or frame_context_lines from the resolved DataCollection dict, so the difference is not yet observable in behavior.
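The divergence comes down to how the two checks treat `None`. The helper names below are hypothetical, and `bool(option)` only models a downstream truthiness check on the raw value that normal resolution stores.

```python
def spotlight_flag(option):
    # Mirrors the re-derive path: anything except the literal False is enabled.
    return option is not False


def passthrough_flag(option):
    # Models normal resolution: the raw value is kept and later read by truthiness.
    return bool(option)


# (value, spotlight result, normal result)
comparison = [(v, spotlight_flag(v), passthrough_flag(v)) for v in (True, False, None)]
# comparison == [(True, True, True), (False, False, False), (None, True, False)]
```

Only the `None` row disagrees, which matches the edge case described above.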

Deprecation warning stacklevel points to internal SDK frame instead of user code

The stacklevel=2 in resolve_data_collection makes the DeprecationWarning point to client.py:353 (the _get_options call site) rather than to the user's sentry_sdk.init() call; it should be at least stacklevel=5 to reach user code.
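To see why `stacklevel=5` lines up with the chain described here, the sketch below models how `warnings.warn` counts frames: level 1 is the function that calls `warn`, level 2 is its caller, and so on. The function names only imitate the call chain from the review, and `frame_blamed` is a hypothetical helper that uses `sys._getframe` instead of emitting a real warning.

```python
import sys


def frame_blamed(stacklevel):
    """Name of the function a warning raised by our caller would point at."""
    # Frame 0 is this helper, frame 1 is the caller (stacklevel=1), and so on.
    return sys._getframe(stacklevel).f_code.co_name


def resolve_data_collection(stacklevel):
    return frame_blamed(stacklevel)


def get_options(stacklevel):
    return resolve_data_collection(stacklevel)


def client_init(stacklevel):
    return get_options(stacklevel)


def sdk_init(stacklevel):
    return client_init(stacklevel)


def user_code(stacklevel):
    return sdk_init(stacklevel)


blamed_at_2 = user_code(2)  # "get_options"
blamed_at_5 = user_code(5)  # "user_code"
```

A fixed number is fragile if the real chain gains or loses a frame. On Python 3.12 and later, `warnings.warn` also accepts `skip_file_prefixes`, which may be a sturdier way to skip SDK-internal frames.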

Non-dict values for `cookies`/`query_params`/`http_headers` raise confusing AttributeError instead of a clear config error

resolve_data_collection validates only that the top-level data_collection option is a dict; it does not validate the types of nested fields. If a user passes a non-dict for a field expected to be a KeyValueCollectionBehaviour (e.g. data_collection={"cookies": False}), _resolve_explicit forwards the value directly to _kvcb_from_value, whose first line calls val.get("mode", "deny_list"). For a bool (or any non-mapping) this raises AttributeError: 'bool' object has no attribute 'get' at client init, rather than a clear validation error like the TypeError/ValueError used elsewhere in this module. The same applies to query_params and to http_headers.request/http_headers.response. Because user_info: False is a valid bool field, a user may plausibly try the analogous cookies: False and get an opaque traceback. This is a config-time, fail-loud robustness/UX issue, not a security or data-integrity bug.

`_http_headers_from_value` silently ignores non-dict input via substring membership test

When a non-dict value such as http_headers: "off" is supplied, _http_headers_from_value performs "request" in "off" / "response" in "off", which are Python substring checks that both return False. As a result both request and response headers silently fall back to the collection-enabled deny_list default instead of raising an error or disabling collection. This diverges from _kvcb_from_value (used for cookies/query_params), which raises AttributeError on the same malformed string input, and means headers keep being collected when the user's malformed config appears intended to turn them off.
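The silent fallback is easy to reproduce, since `in` means substring search on a `str` and key lookup on a `dict`. Both function names below are hypothetical; the second shows one possible strict check, not the PR's implementation.

```python
def headers_keys_present(val):
    # Mirrors the membership tests described above.
    return ("request" in val, "response" in val)


string_result = headers_keys_present("off")             # (False, False)
wrong_dict_result = headers_keys_present({"mode": "off"})  # (False, False)


def require_headers_dict(val):
    """Reject anything that is not a dict before looking for keys."""
    if not isinstance(val, dict):
        raise TypeError(
            "data_collection['http_headers'] must be a dict, got %s"
            % type(val).__name__
        )
    return val
```

Both malformed inputs report that neither key is present, so the deny_list default applies without any error. The strict check catches the string case; rejecting unexpected keys such as `mode` would need a further check.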

`_http_headers_from_value` silently ignores malformed `http_headers` values, falling back to `deny_list` - `sentry_sdk/data_collection.py:187-195`

_http_headers_from_value only inspects val for the keys "request" and "response". Any other shape — a bare string like "off", or a mis-shaped dict like {"mode": "off"} — is silently ignored and both request and response default to deny_list (i.e. headers are still collected). This is inconsistent with sibling helpers such as _kvcb_from_value, which raise on invalid modes, and with resolve_data_collection, which raises TypeError for a non-dict data_collection. A user who intuitively writes http_headers="off" or http_headers={"mode": "off"} (a plausible mistake since cookies/query_params accept {"mode": "off"}) gets no error and silently keeps collecting headers. Note the impact is currently latent: the resolved data_collection is stored on the client but not yet consumed by integrations for actual header filtering, so this is a robustness/UX correctness issue rather than an active PII leak.

Also found at:

  • tests/test_data_collection.py:33-36

⏱ 16m 25s · 6.2M in / 180.2k out · $7.50

Annotations

Check warning on line 357 in sentry_sdk/client.py

@sentry-warden sentry-warden / warden: find-bugs

`should_send_default_pii()` not updated to reflect `data_collection["user_info"]`, creating split-brain PII gating

`_Client.should_send_default_pii()` still reads `options["send_default_pii"]` directly. Setting `data_collection={"user_info": True}` without `send_default_pii=True` therefore puts `EventScrubber` (now driven by `data_collection["user_info"]`) in PII-on mode, while every call to `should_send_default_pii()` from the integrations and `scope.py` still returns `False`, contradicting the user's explicit intent.