split observe analytics group for cleaner resiliency tracking #7891

## Summary
A single error-analytics group (Observe grouping hash `16864652937831232783`) is acting as a catch-all for essentially **every GraphQL / Admin-API client error in the CLI**, across four unrelated product areas. Over the last 10 days it held ~1,170 events spanning `app`, `theme`, `store`, and `hydrogen`. This makes the group un-routable, un-actionable, and a permanent source of false P1 escalations. We should make the CLI emit a meaningful `groupingHash` so distinct failures land in distinct buckets.

Context: this surfaced from resiliency issue [Vault 31608](https://vault.shopify.io/issues/31608) / shop/issues#32995 ("Access denied for themes field… `read_themes`"). That specific error was fixed in CLI 4.2.0 (#7652) and is confirmed gone on 4.2.0 — but the resiliency item stays hot because ~94% of the bucket is unrelated errors sharing the same hash.

## What's actually in the bucket (last 10 days, ~1,170 events)
By slice: `app` 618 · `theme` 385 · `store` 105 · `hydrogen` 40 · unknown 18 · `cli` 4.
Handled split: 708 unhandled / 462 handled.

Distinct families lumped together:
- **App Management 403 "Unauthorized"** (~316; slice `app`, `app deploy`) — caller lacks permission/membership. Owner: App Management.
- **401 authentication failures** (~390; slice `store`/`theme`, e.g. `store info`, `store execute`) — invalid/expired session token; 163 are literally `Service is not valid for authentication`. Owner: CLI auth/identity.
- **Missing access scope ACCESS_DENIED** (166 = `read_themes` 75 + `write_themes` 91; slice `theme`) — custom-app token missing scope. Already fixed in 4.2.0 (#7652); now a clean `AbortError`, aging out as adoption rolls.
- **5xx server errors** (~150; Admin + App-Management HTTP 500) — server-side API reliability, not a CLI bug. Owner: the respective API teams.
- **THROTTLED rate limiting** (46; slice `theme`) — expected/transient; should be retried or suppressed, not crash-reported.
- **Hydrogen 403/404** (~40; `hydrogen deploy`/`link`) — separate surface again.
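
To make the families above concrete, here is a minimal sketch of how one event could be mapped to a family from its slice and message. The `ErrorFamily` labels and both function names are hypothetical (they do not exist in the CLI), and the matching rules are assumptions inferred from the two message shapes and the error codes listed in this issue, not a description of how Observe or the CLI classify errors today.

```ts
// Hypothetical family labels mirroring the list above; not an existing CLI type.
type ErrorFamily =
  | 'app-management-403'
  | 'auth-401'
  | 'missing-scope'
  | 'server-5xx'
  | 'throttled'
  | 'hydrogen-403-404'
  | 'other'

// Pulls the HTTP status out of either message shape seen in this bucket:
// "GraphQL Error (Code: NNN): {...}" or "... with the HTTP status NNN ...".
function extractStatus(message: string): number | undefined {
  const match = /\(Code: (\d{3})\)/.exec(message) ?? /HTTP status (\d{3})/.exec(message)
  return match ? Number(match[1]) : undefined
}

// Assumed rules: GraphQL error codes win over HTTP status, then status + slice decide.
function classifyFamily(slice: string, message: string): ErrorFamily {
  const status = extractStatus(message)
  if (message.includes('THROTTLED')) return 'throttled'
  if (message.includes('ACCESS_DENIED')) return 'missing-scope'
  if (status !== undefined && status >= 500) return 'server-5xx'
  if (slice === 'hydrogen' && (status === 403 || status === 404)) return 'hydrogen-403-404'
  if (status === 401) return 'auth-401'
  if (slice === 'app' && status === 403) return 'app-management-403'
  return 'other'
}
```

Checking the GraphQL error codes before the HTTP status is a deliberate choice in this sketch: the error code identifies the family regardless of which status the response carries.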

## Root cause
In `packages/cli-kit/src/public/node/error-handler.ts` (`sendErrorToBugsnag`):
1. Every error is rebuilt as `reportableError = new Error(error.message)`, so the error **class is always the generic `Error`**.
2. **No `event.groupingHash` is ever set**, so the backend falls back to stack-trace grouping (cf. the `stack_frame_grouping_hash` column).
3. Stack frame paths are aggressively normalized (`cleanStackFrameFilePath`), so every failure thrown from the shared request site produces the same stack. Only the message differs, in two shapes: `graphql-request`'s `GraphQL Error (Code: NNN): {…}` for theme/store, and `The Admin/App Management GraphQL API responded unsuccessfully with the HTTP status NNN …` for app/hydrogen. Stack-trace grouping does not look at the message.

Net: same class + same normalized stack + ignored message ⇒ one bucket for all of them.
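
The collapse can be illustrated with a deliberately simplified model of stack-based grouping. The real backend logic is not shown in this issue, so `ReportedEvent`, `fallbackGroupKey`, and the placeholder frame names are all hypothetical. The only property the sketch relies on is the one stated above: the key is derived from class and normalized stack, never from the message.

```ts
// Simplified, hypothetical model of stack-trace grouping (not the backend's actual algorithm).
interface ReportedEvent {
  errorClass: string
  message: string
  normalizedFrames: string[]
}

function fallbackGroupKey(event: ReportedEvent): string {
  // The message is intentionally left out of the key.
  return [event.errorClass, ...event.normalizedFrames].join('|')
}

// Placeholder frames standing in for the shared, normalized request site.
const sharedFrames = ['frame-a', 'frame-b']

const throttledEvent: ReportedEvent = {
  errorClass: 'Error',
  message: 'GraphQL Error (Code: 200): {...THROTTLED...}',
  normalizedFrames: sharedFrames,
}

const unauthorizedEvent: ReportedEvent = {
  errorClass: 'Error',
  message: 'The App Management GraphQL API responded unsuccessfully with the HTTP status 403',
  normalizedFrames: sharedFrames,
}

// Two unrelated failures end up with an identical key.
const sameBucket = fallbackGroupKey(throttledEvent) === fallbackGroupKey(unauthorizedEvent)
```

Under this model `sameBucket` is `true`, which is the behavior the bucket exhibits: changing either the error class or the key derivation is needed to separate the two events.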

## Why we should track these separately
- **Different owners.** This group is assigned to one team, but it contains errors owned by App Management, Hydrogen, Storefront, CLI auth, and the server-side API teams. It cannot be routed as a single item.
- **Different root causes and fixes.** Scope→UX messaging, 401→token refresh, 403→app permissions, THROTTLED→backoff/suppress, 5xx→server reliability. One issue can't carry five fixes, one assignee, or one fix-due date.
- **Expected vs. real regressions get mixed.** Most of the volume is expected user-config / transient throttles; a genuine regression in any single family (e.g. a 401 or 5xx spike from a real bug) is invisible inside the aggregate.
- **It manufactures false P1s.** Severity escalates on aggregate volume, so the group oscillates P1↔P3 independent of whether any underlying problem is bad — and a shipped fix (like #7652) can never turn it green.

## Proposed fix
Set a meaningful grouping key in the Bugsnag `eventHandler`, reusing the existing analytics taxonomy (`categorizeError` + `formatErrorMessage` in `packages/cli-kit/src/private/node/analytics/error-categorizer.ts`, already used by `storage.ts` to emit `error:${category}:${signature}` events):

```ts
import {categorizeError, formatErrorMessage} from '../../private/node/analytics/error-categorizer.js'

// Inside the Bugsnag eventHandler in sendErrorToBugsnag.
// Assumes `event`, `error`, and `sliceName` are in scope there.
const category = categorizeError(error)
event.groupingHash = `${sliceName}:${category.toLowerCase()}:${formatErrorMessage(error, category)}`
```

Notes:
- Include `slice_name` so `app` / `theme` / `store` / `hydrogen` split immediately.
- Trim the message at the first `: {` before categorizing — the raw GraphQL `ClientError` dumps the full request/response JSON, and the literal `"request"` substring currently mis-routes errors into the `network` category. Trimming makes categories semantically correct (scope→Permission, 401→Authentication, THROTTLED→RateLimit) and keeps signatures stable.
- Complementary, higher-leverage: stop reporting known transient/user errors (THROTTLED, and arguably the 401 "Service is not valid for authentication") as *unexpected* — same pattern as #7652 — so they leave crash reporting entirely rather than just getting relabeled.
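
The trimming step and the key composition from the notes above could look roughly like this. It is a sketch under assumptions: `trimGraphQLPayload` and `buildGroupingHash` are hypothetical helper names, the category and signature are passed in as plain strings rather than taken from `categorizeError` / `formatErrorMessage`, and the sample values in the usage note are placeholders.

```ts
// Cuts the message at the first ": {" so the JSON request/response dump is dropped.
// Messages without that marker are returned unchanged.
function trimGraphQLPayload(message: string): string {
  const cut = message.indexOf(': {')
  return cut === -1 ? message : message.slice(0, cut)
}

// Hypothetical helper: composes the key from parts the eventHandler would already have.
function buildGroupingHash(sliceName: string, category: string, signature: string): string {
  return `${sliceName}:${category.toLowerCase()}:${signature}`
}
```

For example, `trimGraphQLPayload('GraphQL Error (Code: 401): {"request":{}}')` returns `GraphQL Error (Code: 401)`, so the literal `"request"` substring is no longer visible to the categorizer. With placeholder inputs, `buildGroupingHash('theme', 'AUTHENTICATION', 'some-signature')` returns `theme:authentication:some-signature`.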

## Considerations
- Setting `groupingHash` universally re-buckets **all** CLI errors (a one-time grouping migration across every error dashboard), not just these. That's an improvement (grouping finally matches the analytics taxonomy) but should be a conscious, coordinated change.
- Update `packages/cli-kit/src/public/node/error-handler.test.ts`.

## References
- Resiliency: [Vault 31608](https://vault.shopify.io/issues/31608) / shop/issues#32995
- Prior fix: #7652 (shipped in CLI 4.2.0, 2026-06-16)
- Observe grouping hash: `16864652937831232783`
- Reporting code: `packages/cli-kit/src/public/node/error-handler.ts`
