Project 128 · Ferryscanner
CH 305 / PG 711 · Δ −406
PG ↔ Hazelnut · Campaign-Level Parity · Settled Window
Range safety · launch commit
Each pipeline mints its own click_instance_id. Device carries legacy's UUID. Hazelnut Redis lookup by lr_ia_id cannot find the matching record. Empirical: 0 of 60 stratified PG click UUIDs are present in hazelnut.clicks_denormalized. click_match preservation = 0 % on all three projects.
`installAttributionID := generateUUID()`
Attribution consumer falls behind on install-events-hazelnut during 11:00-17:00 UTC. Verified via TraceId propagation: for lost installs on 128, 73 % have ≥6 h Kafka sit-time; 49 % have ≥12 h. Rescued installs: 84 % processed <10 s. Consumer throughput 7.5 M spans/hr peak vs 2.1 M off-peak; backlog drains after midnight UTC.
Binary test: < 24 h install-to-ASA-call delay: 15.8 % error rate (176 / 1,115 — baseline, real non-ASA installs). ≥ 24 h delay: 100.0 % error rate (628 / 628, zero successes). The cliff is exactly 24 hours — matches Apple ADS API's published token TTL. Kafka backlog (H-02) delays ASA calls past 24 h → Apple returns HTTP 404 → handleAppleAdsStatus classifies 404 as PermanentError → strategy returns Success=false, nil err → no retry, no DLQ, organic. Legacy PG doesn't have this because its ASA call is inline at /api/client/init, always ~0 s delay.
Some PG installs have gateway and events-consumer spans but zero attribution-consumer spans, zero retry, zero DLQ. Rate on 04-21: 128 = 3.5 % (25/711), 193 = 6.4 % (25/392), 155 = 22 % (18/81). 155 is worst because its only attribution path is click_match — when the install doesn't reach attribution-consumer at all, there is no fallback. Not explained by sampling (spans in other services are plentiful). Messages appear stuck in Kafka beyond 56 h, or rejected by a path not instrumented.
Absolute-timestamped query, installed_at ∈ [2026-04-21 00:00, 2026-04-22 00:00) UTC, both sides filtered campaign_id > 0. Window is ≥36 h old at query time — all attribution retries (max ~2.5 h ceiling) are exhausted. No inflight settling.
Project 128 · Ferryscanner
CH 305 / PG 711 · Δ −406
Project 155 · IPF
CH 30 / PG 81 · Δ −51
Project 193 · Playo
CH 369 / PG 392 · Δ −23
| Project | Source | PG | CH | Δ | CH / PG | Verdict |
|---|---|---|---|---|---|---|
| 128 · Ferryscanner · PG 711 → CH 305 (43 %) | | | | | | |
| | google_ads | 209 | 210 | +1 | 100.5% | NOMINAL |
| | apple_search_ads | 197 | 61 | −136 | 31.0% | HOLD · H-03 |
| | click_match | 94 | 0 | −94 | 0.0% | HOLD · H-01 |
| | ip_address | 211 | 34 | −177 | 16.1% | HOLD · H-02 |
| 155 · IPF · PG 81 → CH 30 (37 %) | | | | | | |
| | click_match | 81 | 0 | −81 | 0.0% | HOLD · H-01 |
| | ip_address (CH-emitted rescue) | 0 | 30 | +30 | — | RESCUE 37 % |
| 193 · Playo · PG 392 → CH 369 (94 %) | | | | | | |
| | google_ads | 361 | 360 | −1 | 99.7% | NOMINAL |
| | click_match | 31 | 0 | −31 | 0.0% | HOLD · H-01 |
| | ip_address (CH-emitted rescue) | 0 | 9 | +9 | — | RESCUE 29 % |
Pulling the aggregate apart. 92 distinct (project, campaign) pairs in the 04-22 window; tables below show the non-trivial deltas. Rows where PG = CH are hidden.
| Bucket | Detail | 128 | 155 | 193 | total |
|---|---|---|---|---|---|
| A · SAME attribution (both sides same campaign, same source) | | | | | |
| | google_ads | 203 | 0 | 360 | 563 |
| | apple_search_ads | 54 | 0 | 0 | 54 |
| | ip_address | 14 | 0 | 0 | 14 |
| B · same campaign, different source (rescue via CH ip_address of PG click_match) | | | | | |
| | click_match → ip_address | 5 | 27 | 5 | 37 |
| C · different campaign (cross-campaign shuffle) | | | | | |
| | google_ads → google_ads | 6 | 0 | 0 | 6 |
| | ip_address → apple_search_ads | 5 | 0 | 0 | 5 |
| | ip_address → ip_address | 4 | 0 | 0 | 4 |
| | apple_search_ads → ip_address | 2 | 0 | 0 | 2 |
| | click_match → ip_address (different campaign) | 2 | 2 | 0 | 4 |
| D1 · MISSING from CH · no consumer spans, install never processed (H-04) | | | | | |
| | ip_address | 13 | 0 | 0 | 13 |
| | apple_search_ads | 12 | 0 | 0 | 12 |
| | click_match | 0 | 18 | 0 | 18 |
| | google_ads | 0 | 0 | 1 | 1 |
| | click_match (193) | 0 | 0 | 24 | 24 |
| D2 · CH ORGANIC · install reached CH but went organic | | | | | |
| | click_match → organic (H-01) | 87 | 34 | 2 | 123 |
| | ip_address → organic (H-02) | 175 | 0 | 0 | 175 |
| | apple_search_ads → organic (H-03) | 129 | 0 | 0 | 129 |
| Totals | | | | | |
| | PG attributions (all classified, 0 "other") | 711 | 81 | 392 | 1 184 |
| | CH shared attributions (A + B + C) | 295 | 29 | 365 | 689 |
| | CH-only attributions (CH saw it, PG didn't) | 10 | 1 | 4 | 15 |
| | CH attrib total | 305 | 30 | 369 | 704 |
| | CH-only organic (CH keeps, PG discards) | 930 | 284 | 8,167 | 9,381 |
Loss composition: 175 ip_address → organic (H-02, 25 %), 129 apple_search_ads → organic (H-03, 18 %), 87 click_match → organic (H-01, 12 %), 25 MISSING (H-04, 4 %), 5+19 shuffled (3 %). The ASA cliff on 04-21 is the biggest single contributor to 128's 43 % ratio — on days when ASA holds (04-22), 128 reads at 63 %. ASA is the volatile component.
Loss composition: 34 click_match → organic (H-01, 42 %), 18 MISSING (H-04, 22 %), 0 shuffled within campaign. Of the 81 PG click_match installs, 27 are rescued via CH ip_address (33 % rescue rate), 2 rescued to a different campaign. 155's worst-case profile holds because it has no deterministic signal: no gads, no ASA, no meta scale. When the UUID identifier fails (H-01) and the install never reaches attribution-consumer (H-04), there is no path left.
Loss composition: 25 MISSING (H-04, 6.4 %) — of which 24 are click_match and 1 is google_ads. Only 2 CH organic demotions. Google Ads is stable at 360/361. Playo's architecture (deterministic GCLID) largely immunises it from H-01/H-02/H-03; its drift is entirely H-04 consumer drop.
The structural claim: PG and hazelnut each mint their own click_instance_id for the same physical click hit. To test: sample 60 random PG click UUIDs from 04-22, ask CH whether any exist in clicks_denormalized.
- PG sample UUIDs · 60 random, 04-22 UTC, projects 128/155/193
- Found in CH · search window ±1 day, all projects · 0 of 60
- Click-volume delta · none · both sides receive the same clicks; they just label them differently
Clicks are not dropping. Clicks are being ingested by both systems at parity; each system stamps them with an independently generated UUID. When the SDK later sends the legacy UUID back as lr_ia_id in /api/client/init, hazelnut's Redis lookup fails because hazelnut's record lives under a different key.
```go
// gateway/handler/web.go — lines 492–498
func (h *WebHandler) processClickTracking(ctx context.Context, r *http.Request, p *clickTrackingParams) string {
	// …setup omitted…
	installAttributionID := generateUUID() // ← H-01 — each pipeline mints its own
	h.recordClickIfBrowser(r, p, installAttributionID)
	return installAttributionID
}
```
```go
// internal/consumer/attribution/click_matcher.go — lines 98–113
if params.LrIaID != "" {
	if click := m.lookupRedis(ctx, params.LrIaID, MatchLrIaID, m.clickStore.FindClickByLrIaID, params.ProjectID); click != nil {
		return click, MatchLrIaID // ← succeeds only if CH's own UUID matches
	}
}
if click := m.findBestIPMatch(ctx, params.IP, params.ProjectID, params.Debug); click != nil { // ← H-02 fallback
	return click, MatchIP
}
```
click_match must be rescued in CH by the subsequent findBestIPMatch call. On 04-22, hazelnut's rescue rate was 34% for 155 (22/64 click_match installs), 20% for 193 (6/30), and effectively 0% for 128 (27 ip installs vs 191 PG ip — and 128's ip rows are mostly PG's own ip_address attributions, not click_match rescues). The 240-install deficit on 128 is H-01 + H-02 compounding.
Setup: pull every 04-22 install for project 128 that PG attributed via click_match or ip_address (n=275), join to the same install_instance_id in CH, and classify why hazelnut did or did not rescue the install.
| Check | Result | Implication |
|---|---|---|
| PG installs in the bucket | 275 | 84 click_match + 191 ip_address; all have campaign_id>0. |
| Same install_instance_id in CH | 266 / 275 | 9 consumer-drop (H-04, small). Remaining 266 are shared installs. |
| CH attributed them somehow | 33 / 266 | 23 via CH's own ip_address, 10 shuffled to apple_search_ads. |
| CH left them organic | 233 / 266 | 88 % of the loss is this "organic despite install row present" class. |
| Same device_ip recorded both sides | 227 / 233 | The IPs do not differ — CH saw the same install-time IP PG did. Only 6 have v6 / NAT drift. |
| CH clicks_denormalized has a click at that IP | 154 / 233 | 66 % — the click is in CH's own click store. Not a click-volume loss. |
| Those clicks marked utilized = 1 | 0 / 156 | None of them got used by any install — the persistent dedup index is not blocking. |
Per-IP click count exceeding maxIPCandidates = 10 | 0 | Sorted-set cap is not clipping the target click out. |
click_store.zadd_ip errors on 04-22 | 0 / 58,327 | Redis IP-index writes themselves are clean. |
The gateway's Kafka publish uses W3C trace-context headers which the attribution consumer extracts and continues. For any install, the same TraceId appears on gateway.init and on attribution.process. The delta between those timestamps is the precise time the message sat in install-events-hazelnut. This is not an inference; it is a join.
| Kafka lag bucket · gateway publish → attribution.process | CH-organic · PG-ip | CH-organic · PG-cm | CH-organic · PG-asa | Rescued via CH ip | MISSING from CH |
|---|---|---|---|---|---|
| < 10 seconds | 8 | 2 | 0 | 5 | 1 |
| 1 h – 6 h | 10 | 1 | 0 | 4 | 1 |
| 6 h – 12 h | 7 | 7 | 0 | 2 | 3 |
| > 12 h | 123 | 56 | 83 | 17 | 13 |
| totals (traced) | 148 | 66 | 83 | 28 | 18 |
| bucket size | 175 | 87 | 129 | 25 | 25 |
Specific case: 1BrwCShL3wigGhq0ItrC — gateway published 2026-04-22 07:04:10 UTC, attribution-consumer ran at 18:28:51 UTC. Same W3C TraceId, 11 h 24 m of Kafka sit-time. This is not retry — it is the message literally not being consumed by the group for 11 hours.
Attribution-consumer throughput on 04-22 varies with traffic. Per-hour attribution.process count:
| UTC hour | 00-10 | 11-17 (peak) | 18-23 |
|---|---|---|---|
| Orchestrate spans / hour | 2.1 – 2.5 M | 5.4 – 7.6 M | 5.3 – 7.1 M |
| p50 attribution.process duration | 8 – 9 ms | 25 – 41 ms | 11 – 20 ms |
install-events-hazelnut Kafka consumer falls behind during 11:00 – 17:00 UTC. A significant fraction of install messages sit in the topic for 6 – 16+ hours before attribution-consumer picks them up. When the message is finally processed, every timing-sensitive strategy fails:
- H-03 · the attribution_token has expired, Apple's ADS API returns errors, the strategy silently swallows them and goes organic with no retry. 129 losses on 128 on 04-21 alone.
- H-02 · tryLockClick fails because the click-lock / dedup Redis state has drifted over the hour gap. 175 losses on 128.

Per-hour error rate on apple_search_ads.attribute_install spans, 04-21, main attribution-consumer only:
| Hour UTC | ASA calls | Errors | Err % |
|---|---|---|---|
| 05-09 | 225 | 179 | 79.6 % |
| 17-19 | 160 | 160 | 100.0 % |
| 20 | 109 | 42 | 38.5 % |
| 21 | 46 | 1 | 2.2 % |
| full day | 564 | 383 | 67.9 % |
```go
// internal/consumer/attribution/strategies/apple.go — lines 71-85
if err != nil {
	apiSpan.RecordError(err)
	apiSpan.SetStatus(codes.Error, "apple adservices API error")
	apiSpan.End()
	// TS parity: catch all API errors and return success=false to allow fallback
	// to other strategies (e.g. click matching, organic). Don't propagate the error.
	log.Warn("apple adservices API error, falling back to other strategies",
		zap.Error(err),
		zap.String("install_instance_id", msg.Request.InstallInstanceID),
	)
	return &attribution.StrategyResult{
		Success:           false,
		AttributionSource: "apple_search_ads",
	}, nil // ← no retryable error → no retry → organic
}
```
Of 129 PG-ASA installs that landed organic in CH on 128 · 04-21:
| State | n | What we can say |
|---|---|---|
| Has apple_search_ads.attribute_install span with StatusCode='Error' | 40 | H-03 directly caused this bucket. ASA call errored, hazelnut swallowed, install went organic. |
| Has attribution.process but no ASA span | 53 | ASA strategy wasn't invoked for these. Likely AppleSearchAdsEnabled=false on project, missing adservices_token, or some other CanHandle filter. Different mechanism. |
| No attribution.process span at all | 36 | H-04 territory — message never consumed by attribution-consumer within the 48 h observation window. |
For every apple_search_ads.attribute_install span on 04-21, joined by install_instance_id to the corresponding gateway.init to compute install-to-ASA-call delay:
| Install-to-ASA-call delay (hours) | Total calls | OK | Error | Err % |
|---|---|---|---|---|
| < 1 h | 384 | 315 | 69 | 18.0 % |
| 1 – 6 h | 176 | 154 | 22 | 12.5 % |
| 6 – 12 h | 239 | 217 | 22 | 9.2 % |
| 12 – 24 h | 316 | 253 | 63 | 19.9 % |
| ── 24-hour cliff ── | ||||
| 24 – 30 h | 276 | 0 | 276 | 100.0 % |
| > 30 h | 352 | 0 | 352 | 100.0 % |
| All < 24 h | 1,115 | 939 | 176 | 15.8 % |
| All ≥ 24 h | 628 | 0 | 628 | 100.0 % |
The cascade, step by step:

1. Install published to install-events-hazelnut at T+0.
2. The Kafka backlog (H-02) delays consumption past the 24 h token TTL; Apple's ADS API returns HTTP 404.
3. handleAppleAdsStatus at apple_ads_client.go:161 classifies anything-not-400/429/5xx as PermanentError — includes 404.
4. apple.go:71-85 catches the error, logs a Warn, returns Success=false, nil err to the orchestrator.

```go
// internal/consumer/attribution/strategies/apple_ads_client.go — lines 139-167
func handleAppleAdsStatus(statusCode int, body []byte) error {
	switch {
	case statusCode == http.StatusOK:
		return nil
	case statusCode == http.StatusBadRequest:
		return &PermanentError{...}
	case statusCode == http.StatusTooManyRequests:
		return &RetryableError{...}
	case statusCode >= 500:
		return &RetryableError{...}
	default: // ← 404 lands here
		return &PermanentError{ // ← no retry for expired tokens
			Code:   ErrCodeStrategyFailed,
			Err:    fmt.Errorf("HTTP %d: %s", statusCode, string(body)),
			Reason: "apple adservices unexpected status",
		}
	}
}
```
Legacy PG avoids this entirely because its ASA call happens inline within /api/client/init — the token is always fresh (seconds old, not hours). Hazelnut's Kafka topology inserts a delay large enough to cross the 24h TTL on backlog days, and the HTTP 404 path is not classified as retryable. Both the backlog (H-02) and the 404 classification (H-03) are part of the cascade. Fixing either breaks the chain: kill the backlog and tokens stay fresh; reclassify 404 as retryable and installs within the next 24h batch can still attribute (assuming the token isn't already dead).
For 25 MISSING-from-CH installs on 128 and 18 on 155, all with clear gateway and events-consumer spans, there are zero attribution-consumer spans of any kind — no attribution.process, no retry, no DLQ — across the full 48 h window from install through to probe day. The install was published to Kafka (the publish span exists), no consumer picked it up, and no retry path has fired as of probe time.
155 is the worst-affected at 22 % MISSING (18 of 81). The project's 100 %-click_match traffic means every MISSING install is attribution-fatal — there is no google_ads or ASA strategy to rescue it even if attribution-consumer catches up later. The 22 % consumer-drop rate on 155 is of a different order than 128 (3.5 %) or 193 (6.4 %), suggesting a topic-or-partition-specific failure mode that disproportionately hits 155 traffic.
| Set | n | min | p25 | median | p75 | p90 | max |
|---|---|---|---|---|---|---|---|
| Rescued · PG IP → CH IP | 23 | 0 s | 1 s | 7 s | 3.2 h | — | 14.8 h |
| Lost · PG IP → CH organic (click present in CH) | 154 | 1 s | 2.3 h | 5.34 h | 9.65 h | 13.9 h | 34.8 h |
Trace correlation on one representative lost install (1BrwCShL3wigGhq0ItrC, installed 2026-04-22 07:04:10 UTC, attribution orchestrate at 18:28:52 UTC — 11 h 24 m later):
```
18:28:52.311938  attribution.redis.zrange                      cache.hit=true candidates=2
18:28:52.313462  attribution.click_matcher.try_lock_click      // candidate 1
18:28:52.313475  attribution.store.is_click_attributed
18:28:52.314204  attribution.click_matcher.acquire_and_record
18:28:52.315197  attribution.click_matcher.try_lock_click      // candidate 2 ← first failed
18:28:52.315204  attribution.store.is_click_attributed
18:28:52.315919  attribution.click_matcher.acquire_and_record
...
18:28:52.317026  apple_search_ads.attribute_install            1,738 ms
18:28:54.055663  attribution.write_and_finalize                // → organic, campaign_id=0
```
zrange returned two IP candidates — the Redis index had them. Both try_lock_click attempts ran. The install still landed organic. That is not the "Redis is empty" failure mode; it is either IsClickAttributed returning true on both candidates, AcquireLock failing on both, or the arbiter preferring an ASA-strategy null over the click-match result. One of the three, or a combination — not identifiable from span data alone.
Sampling 50 lost installs against OTel attribution-consumer spans:
| Pathway on 04-22 | installs | Notes |
|---|---|---|
| Only attribution.app_open.record — click-matcher skipped | 11 | Message classified as app_open / trigger event, not a fresh install. No click-match attempted. |
| attribution.match_and_attribute ran | 10 | Click-matcher invoked; outcomes vary (see trace above). |
| No consumer span in OTel at all | 29 | Trace sampling gap or processed via a path not instrumented. |
```go
// internal/consumer/attribution/orchestrator.go — line 719
if len(results) == 0 {
	log.Info("phase 9: all strategies returned non-success, will be organic",
		zap.String("install_instance_id", iid))
	return nil, nil, nil // ← no retry scheduled here either way
}
```
The working hypothesis is that the hour-scale Kafka delay causes tryLockClick to fail on every IP candidate. The click index itself is intact; the consumer delay is the gap, but the mechanism is not a click-index miss.
Remaining question worth a follow-up investigation: what causes the 5-hour median processing delay for these installs in the first place? Candidates — lagging-events parking waiting on UserIdentity / integration info, DLQ→retry cycles that terminate as app_open, or a genuine Kafka consumer backlog on a specific partition. OTel tracing gaps (29/50 installs with no consumer span) make this harder to pin down from traces alone.
For the 77 installs that PG attributed via click_match (the lr_ia_id path): 0 have a click in CH at the install-time IP, and PG has 0 clicks at those IPs either. PG matched them using the click UUID — the click physically lived at a different IP than the one the app eventually opened from. This is classic deferred-install IP drift: the user clicked on WiFi, installed on mobile data, and opened at a carrier-NAT IP. The click UUID bridged the IPs for PG; hazelnut can't use the UUID (H-01), and IP fallback is architecturally incapable of helping when install-IP ≠ click-IP. These 77 are irreducible without an H-01 fix.
Blame shape: 154 / 233 = 66 % of the IP-loss bucket on 128.
Evidence summary: lost installs median wall-delay 5.34 h vs rescued 7 s; trace on one case shows zrange hit=true, cand=2, tryLockClick ran twice, install still organic; 11/21 sampled lost installs took the app_open.record-only path without click-matcher.
Open question for the next milestone: why is attribution-consumer processing these installs with 5+ hour delays? Likely candidates: lagging-events parking, DLQ→retry cycles terminating as app_open, Kafka partition lag, or OTel sampling hiding a different path entirely. Needs log-level dive into lagging_events drain timings and the specific 154 install IDs.
Blame shape: 77 / 233 = 33 % of the IP-loss bucket on 128.
Evidence: PG attributed via click_match (lr_ia_id). Neither PG nor CH has a click at the install-time IP for these 77 installs — the click was at a different IP (WiFi → mobile hand-off).
Fix shapes: only H-01 resolution helps — the click UUID is the only identifier that bridges a WiFi click and a mobile-data install. IP fallback cannot rescue these by construction.
It would be convenient if the drift were a consumer outage; it is not. OTel traces on 10.1.0.33:8123 for 04-22 UTC, cross-service, show top-level spans erroring at ≤ 0.004 % on the attribution and click consumers. Retry consumer's 3.97 % error rate on attribution.retry.process tracks 1:1 with the consumer.attribution.dlq writes — retries that exhausted their budget, which is the pipeline's designed terminal state, not a failure mode.
| Service · Span | Spans | p50 (s) | p99 (s) | Errors | Err % |
|---|---|---|---|---|---|
| attribution-consumer · orchestrate | 110,570,565 | 0.019 | 0.090 | 2,522 | 0.002% |
| attribution-consumer · process | 110,563,798 | 0.018 | 0.085 | 2,540 | 0.002% |
| attribution-consumer · match_and_attribute | 122,982 | 0.005 | 0.854 | 0 | 0.000% |
| attribution-consumer · click_matcher.find_matching_click | 122,107 | 0.003 | 0.014 | 0 | 0.000% |
| attribution-retry-consumer · retry.process | 664,549 | 0.248 | 0.678 | 26,362 | 3.97% |
| attribution-retry-consumer · consumer.attribution.dlq | 26,128 | 0.014 | 0.027 | 26,128 | by-design |
| click-consumer · click.process | 1,593,329 | 0.003 | 0.017 | 0 | 0.000% |
| click-consumer · writer.flush | 50,199 | 0.078 | 0.261 | 0 | 0.000% |
What landed since the prior FRR, and whether it touches any of the three holds.
fix(attribution): prefer integrated + credentialed NetworkAccount rows — corrects which NetworkAccount row wins when a campaign has multiple rows. Inside 04-22 window. Touches H-01? No. Touches H-02? No. Touches attribution-writer, not click-ID minting or IP-matcher scope.
fix(consumers): honor PG_MAX_OPEN_CONNS/IDLE_CONNS in attribution + click — resource hygiene; prevents pool starvation. Touches H-01? No. Touches H-02? No.
fix(consumers): P0 safety fixes — offset reset, real heartbeat, bounded drain — prevents the Kafka ConsumeResetOffset(AtEnd()) class of data loss that caused the 04-15/04-16 incident documented in the 155 MD. Touches H-01? No. Touches H-02? No. It does close the door on a future H-incident of that shape.
fix(user-data): drain lagging USER_DATA when UserIdentity is created — corrects a stale-parking bug in the lagging-events pipeline. Touches H-01/H-02? No.
Three fix shapes, in decreasing order of surgical cleanliness: (a) exclusive migration to hazelnut — one minter, no asymmetry; (b) shared Kafka topic for click-UUID minting, both pipelines consume; (c) deterministic UUID from request content (hash of IP + UA + timestamp-bucket + link). None of the above has a ticket in the reviewed window.
None of the four recent deploys target H-01 (click UUID minter), H-02 (install-events Kafka backlog), H-03 (ASA silent-error path), or H-04 (installs never reaching attribution-consumer). They are all infrastructure or data-integrity hardening, each valid on its own terms and orthogonal to the parity gap shown here.
Every material numeric claim and file:line citation from project_128_ferryscanner.md, project_155_ipf.md, and project_193_playo.md re-run against fresh data on 2026-04-23.
| Claim (from MD) | Source | MD value | Fresh value · 04-21 settled | Status |
|---|---|---|---|---|
| 128 — 04-21 totals | §1 | PG 710 / CH 304 | PG 711 / CH 305 | ✓ within ingest |
| 155 — 04-21 totals | §1 | PG 81 / CH 30 | PG 81 / CH 30 | ✓ exact |
| 193 — 04-21 totals | §1 | PG 389 / CH 366 | PG 392 / CH 369 | ✓ within ingest |
| 128 · ASA cliff on 04-21 | §3 of 128 MD | CH 61 vs PG 197 · 31 % | CH 61 vs PG 197 · 31.0 % | ✓ exact |
| click_match preservation (all projects) | §1–3 | 0 % | 0 % | ✓ structural |
| google_ads preservation on 193 | §1 | 94 % | 99.7 % | ✓ confirmed |
| UUID overlap (stratified 20 per project) | §2 | 14/15 absent | 60/60 absent | ✓ reinforced |
| Reverse UUID check (CH→PG, 30 for 155) | §3 | — | 0 / 30 | ✓ bidirectional |
| installAttributionID := generateUUID() | web.go | line 493 | line 493 | ✓ unchanged |
| findBestIPMatch | click_matcher.go | referenced | line 220 | ✓ confirmed |
| FindClickByLrIaID | redis_click_store.go | referenced | line 207 | ✓ confirmed |
| Google Ads custom retry schedule [2min,10min] | strategies/google.go | line 601 | line 601 | ✓ exact |
| ASA strategy swallows API errors silently | strategies/apple.go | not in MD | lines 71-85 · return Success=false, nil | ✓ new find |
| 128/155/193 consumer-drop rate (MISSING from CH) | — | not in MD | 3.5 % / 22 % / 6.4 % | ✓ new find |
| Kafka lag via TraceId propagation | §3b | — | 73 % losses ≥6 h · 49 % ≥12 h | ✓ new find |
| Recent deploys touching H-01..H-04 code | §5 | — | no file match across 12 shas | ✓ un-addressed |
| 155 · 04-15/04-16 events incident | §5 of 155 MD | 23 % / 49 % on COMPLETED | outside 04-21 window | ? not re-queried |
| 193 · f08bfdb3 reconciled click | §2 of 193 MD | 1/5 hit | not re-probed | ? anecdotal |
The parity story is binary on the structural axis. Close H-01 and the follow-on H-02 footprint shrinks; close neither and the three subsystems will stay where they are. Every subsequent deploy that does not target H-01 or H-02 will read as a No-Op in the next FRR.
Close H-02, the install-events-hazelnut backlog, and H-03 collapses (ASA tokens will be fresh when hazelnut calls Apple), H-04 shrinks (MISSING rate drops as the topic drains), and IP-match rescue works reliably because Redis state hasn't drifted.
Action: increase install-events-hazelnut partition count + consumer replicas until peak consume rate exceeds peak publish rate with headroom. Add per-partition lag SLO alert (> 5 min = page). Current consumer group lag is invisible — fix the observability first.
strategies/apple.go:71-85 catches every Apple API error and returns Success=false, nil err. The orchestrator treats that as "ASA didn't match" not "ASA failed" — no retry, no DLQ. Legacy does the same but its inline topology means API calls happen with a fresh token.
Action: either (a) classify Apple 5xx / rate-limit / token-expired responses as retryable with the Google Ads-style custom schedule, or (b) fix H-02 upstream so tokens are fresh when the call happens. (a) is local and cheap.
click_match preservation = 0 % across all projects because each pipeline mints its own click_instance_id. Fix shapes: (a) migration — one minter only; (b) shared Kafka topic for click-UUID minting; (c) deterministic UUID from request content (IP + UA + timestamp-bucket + link → v5).
Effect size: 123 installs across 128/155/193 on 04-21 — smaller than H-02 or H-03 but structurally fixed-rate and reproducible across every project.
3.5 % on 128, 6.4 % on 193, 22 % on 155. Gateway and events-consumer traces exist; attribution-consumer traces do not. Not sampling. Likely a signature verification or parse-error path that drops silently without consumer.attribution.dlq spans.
Action: audit the consumer's pre-attribution.process code path. Anywhere a message can be discarded without a span, add one. Also check whether 155's higher rate is partition-specific — the project is keyed differently from 128/193.