feat(data-collection): create DataCollection option in client#6702
feat(data-collection): create DataCollection option in client#6702ericapisani wants to merge 16 commits into
4 issues
code-review: Found 4 issues (1 medium, 3 low)
Medium
`should_send_default_pii()` ignores `data_collection["user_info"]`, causing silent divergence - `sentry_sdk/client.py:357`
After this change, EventScrubber is initialized from data_collection["user_info"], but should_send_default_pii() (line 753) still reads only options["send_default_pii"]. A user who migrates to data_collection={"user_info": True} without setting send_default_pii=True will find EventScrubber won't scrub PII fields, yet all integration code paths guarded by should_send_default_pii() — including scope user attachment (scope.py:1749), cookies, IP addresses, and dozens of integrations — will still suppress PII, leaving the configuration in a silently split state.
Low
Spotlight re-derivation treats `None` option values differently than `resolve_data_collection` - `sentry_sdk/client.py:640-643`
Using is not False identity checks for include_local_variables and include_source_context means an explicit None value is treated as True here, while resolve_data_collection passes None directly to _map_from_send_default_pii (where it's falsy). If a user passes either option as None, the spotlight override would produce a data_collection inconsistent with the initial resolution. Consider using the raw option values directly (matching resolve_data_collection's approach) or explicitly coercing with bool(...).
_http_headers_from_value silently mishandles non-dict values via string-contains fallback - `sentry_sdk/consts.py:1282`
_resolve_explicit passes d.get("http_headers", {}) straight into _http_headers_from_value without verifying the value is a dict. Inside, the guard "request" in val performs a substring check when val is a string. If a user passes a string like "off" (which contains neither "request" nor "response"), the code silently falls back to the deny_list default — meaning a user attempting to disable header collection actually still collects headers (a PII concern). If instead the string contains "request" or "response" as a substring (e.g. "request_off"), val["request"] performs string indexing and raises a cryptic TypeError/AttributeError in _kvcb_from_value rather than a clear validation error. The test test_http_headers_collection_defaults exercises "off" and confirms the silent-fallback path, but no validation rejects malformed values.
Also found at:
sentry_sdk/data_collection.py:152-158
DeprecationWarning stacklevel points to internal SDK code instead of user code - `sentry_sdk/data_collection.py:249-251`
With stacklevel=2, the warning points to _get_options() in client.py rather than the user's sentry_sdk.init() call, making it hard for users to locate the offending line in their own code. A stacklevel of ~5 is needed to reach user code via resolve_data_collection → _get_options → _Client.__init__ → _init → user code.
Also found at:
sentry_sdk/client.py:28
⏱ 11m 19s · 4.6M in / 105.0k out · $4.37
Annotations
Check warning on line 357 in sentry_sdk/client.py
sentry-warden / warden: code-review
`should_send_default_pii()` ignores `data_collection["user_info"]`, causing silent divergence
After this change, `EventScrubber` is initialized from `data_collection["user_info"]`, but `should_send_default_pii()` (line 753) still reads only `options["send_default_pii"]`. A user who migrates to `data_collection={"user_info": True}` without setting `send_default_pii=True` will find EventScrubber won't scrub PII fields, yet all integration code paths guarded by `should_send_default_pii()` — including scope user attachment (`scope.py:1749`), cookies, IP addresses, and dozens of integrations — will still suppress PII, leaving the configuration in a silently split state.