docs: define deals, retrievals, and metrics by SgtPooki · Pull Request #164 · FilOzone/dealbot

SgtPooki · 2026-01-28T14:40:33Z

This PR attempts to handle the documentation parts of #158, #159, #160.

This is all in a single PR to make the entire scope of documentation changes easier to review.

Copilot

Pull request overview

Adds public “source of truth” documentation for Dealbot’s Data Storage and Retrieval checks, plus a consolidated reference for check events and dashboard metrics (addressing #158, #159, #160).

Changes:

Introduces a Data Storage Check specification and lifecycle description.
Introduces a Retrieval Check specification, including selection logic and recorded metrics.
Adds an Events & Metrics reference for dashboard consumers (with TBD markers for not-yet-implemented items).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File	Description
docs/checks/data-storage.md	Defines intended Data Storage check behavior, lifecycle, assertions, metrics, and configuration.
docs/checks/retrievals.md	Defines intended Retrieval check behavior, deal selection, assertions, recording, metrics, and configuration.
docs/checks/events-and-metrics.md	Documents a canonical event/metric model and links expected sources of truth in code.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

BigLep

Thanks for putting this together Russell. I think this will be a useful artifact going forward, and I think it's useful right now for making sure we're fully aligned on what #158 and #159 should implement. I think it was smart to start with it.

I have read through everything, but because of time I didn't comment on each thing I saw. I have a few inline comments and some more general things. If I had more time tonight, I would have just created a PR on top of yours to point out some other suggestions or changes. I think in terms of next steps it would be great if you can review/incorporate the feedback I have provided so far. I'll then checkout the change and as I read through it again, make any other changes in a PR on top so it's easy for you to review/accept/reject my items. Sound good?

Generall items

I think we should write this document assuming the flow for when #158 and #159 are done (since they will be completed soon). It's fine/good to denote the parts that aren't complete as TBD or TBI (to be implemented). For example, lets write it assuming the IPNI advertising and verification happens as part of the sequential "data storage" flow
What retries are we doing?
Rather than stating line number in text can we just #123 in the URLs (more convenient and less text presented to user)
Maybe this is a losing battle, but rather than use the word "deal" throughout the document, do we use "piece" since that has a definition throughout FWSS?
Some parts in each document seem duplicative? Maybe ask AI for ideas to reduce redundancy?
In particular, there seems to be some duplication of the metric definitions across the files? Either reference events-and-metrics.md as much as possible or inline that content into the other two files?
Not critical, but maybe do mermaid diagrams?

SgtPooki · 2026-01-30T17:52:19Z

@BigLep I addressed all comments.. lmk if you need anything else from me

timfong888 · 2026-02-05T05:06:12Z

Is this mean to include the thresholds from https://linear.app/filoz/document/public-definition-of-thresholds-for-dealbot-ccf29ae32583

I scanned in manually and had AI do a check as well. AI output:

No success rate targets (97% for storage/retrieval)
No TTLB threshold (20 seconds for retrieval)
No PDP/Data Retention documentation (the entire third pillar is missing)
Retrieval selection algorithm differs from Linear spec (this one this document might be correct since we froze the Linear so this may represent other changes).

BigLep · 2026-02-05T06:09:59Z

@timfong888 : I am active in adding a lot more content to this including thresholds - I'll ping you when it's ready for review.

BigLep · 2026-02-05T09:58:15Z

#228 is where I am working on top of this PR, but it's not ready for review yet. I need to review with fresh eyes and explain my thinking better.

BigLep · 2026-02-10T00:44:16Z

I'll approve this once #228 is merged.

@SgtPooki

) * docs: explain deal status as composite of sub-statuses with branching flow - Add explanation that overall deal status depends on Upload, Onchain, Discoverability, and Retrieval sub-statuses - Document branching flow: Upload → (Onchain ∥ Discoverability) → Retrieval - Add multiple diagram options showing the dependency structure - Update assertions table with retries and metric references - Add production configuration and approval methodology doc * docs: clean up FAQ and add checks directory README - Add README.md to checks directory with overview and terminology - Move 'What is DealBot?' section to README - Add MIN_NUM_DATASETS_FOR_CHECKS to configuration table with description - Clean up incomplete FAQ entries in data-storage.md - Add link to filecoin-pin FAQ for filecoinpin.contact vs cid.contact - Add SP maintenance window section to production config doc - Add check frequency information to production config doc - Fix metric name typo (dataStorageCheckMs -> retrievalCheckMs) * docs: fix typos, broken links, and grammar issues in checks docs - Fix typos: instnace→instance, appropraite→appropriate, verificaiont→verification, etc. - Fix broken anchor links in data-storage.md - Fix grammar: dealbot confirm→confirms, already knows→know, etc. - Remove TBD_VARIABLE placeholders and TODO comments - Fix missing closing backticks in formulas - Fix duplicate words and double spaces - Standardize terminology and fix inconsistent section references - Fix metric name typos and broken event links in events-and-metrics.md - Complete incomplete sentences and fix punctuation * docs: expand Prometheus/OpenTelemetry metrics label documentation Add details about checkType and providerId labels/attributes for filtering metrics * docs: use 'SP under test' consistently in checks docs Replace SP-under-test and SP under testing with SP under test / SPs under test throughout. * docs: add MAX_RETRIEVAL_CHECKS config vars to retrievals Configuration section * docs: document `MAX_RETRIEVAL_CHECKS_PER_SP_PER_CYCLE` for production * docs: redraw deal status diagram so Retrieval clearly follows Discoverability Show Upload → Onchain and Discoverability in parallel, then Retrieval only after Discoverability succeeds; deal success requires both Onchain and Retrieval. * docs: fix typo and cross-references in production config doc - bandwith → bandwidth - Link Data Storage Check and Retrieval Check to data-storage.md and retrievals.md * docs: align data-storage metric links with events-and-metrics definitions - Use pieceConfirmedOnChainMs and spIndexLocallyMs for assertions 3 and 4 - Set assertion 2 metric to n/a (no matching metric in events-and-metrics) * docs: add Sub Status Affected column to What Gets Asserted table Column links to sub-status meanings and maps each assertion to Upload, Onchain, Discoverability, or Retrieval. * docs: light polish in checks docs - events-and-metrics: fix 'metrics results' wording, Emmitted→Emitted, remove empty bullet, fix table cell - retrievals: subject-verb agreement (remain), 'on 5xx response' - production-config: clarify RANDOM_PIECE_SIZES note for retrieval downloads * docs: fix Count Related Metrics table alignment in events-and-metrics * docs: clarify count metrics use a value label (success|failure or HTTP code) * Incorporating feedback based on @SgtPooki 2026-02-05 discussion * Various improvements from reading through fresh * More fixup when doing self review * Applying copilot fixups. * Apply suggestion from @Copilot Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> * Incorporate feedback from @SgtPooki in #228 (review) * Incorporated feedback from latest round of verbal conversation and me reading through again --------- Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

BigLep · 2026-02-13T21:08:31Z

We've had a lot of discussion sync and async on this, especially in #228.

I'm going to merge at this point given we have a couple of items tracking open related items:
#174
#282

docs: define deals, retrievals, and metrics

96dc749

Copilot AI review requested due to automatic review settings January 28, 2026 14:40

FilOzzy added this to FOC Jan 28, 2026

SgtPooki requested a review from BigLep January 28, 2026 14:40

github-project-automation Bot moved this to 📌 Triage in FOC Jan 28, 2026

Copilot started reviewing on behalf of SgtPooki January 28, 2026 14:40 View session

Copilot AI reviewed Jan 28, 2026

View reviewed changes

rjan90 moved this from 📌 Triage to ⌨️ In Progress in FOC Jan 28, 2026

rjan90 added this to the M4.0: mainnet staged milestone Jan 28, 2026

rjan90 assigned SgtPooki Jan 28, 2026

rjan90 moved this from ⌨️ In Progress to 🔎 Awaiting review in FOC Jan 28, 2026

BigLep reviewed Jan 29, 2026

View reviewed changes

BigLep moved this from 🔎 Awaiting review to ⌨️ In Progress in FOC Jan 29, 2026

SgtPooki mentioned this pull request Jan 29, 2026

feat: deal overhaul #175

Merged

BigLep modified the milestones: M4.0: mainnet staged, M4.1: mainnet ready Jan 30, 2026

BigLep linked an issue Jan 30, 2026 that may be closed by this pull request

Document "check" events and metrics #160

Closed

SgtPooki added 6 commits January 30, 2026 11:17

fix: address pr comments

4b7da38

docs: fix events/metrics and retrieval check details

42e618b

chore: link to lines

7718547

chore: deal -> piece wording

c46e21a

chore: further doc cleanup

b8c3f65

chore: mermaid diagrams

97d1e95

rjan90 moved this from ⌨️ In Progress to 🔎 Awaiting review in FOC Jan 30, 2026

BigLep mentioned this pull request Jan 31, 2026

Determine GA scope #123

Closed

rjan90 requested a review from BigLep February 2, 2026 09:44

BigLep moved this from 🔎 Awaiting review to ⌨️ In Progress in FOC Feb 5, 2026

BigLep self-assigned this Feb 5, 2026

BigLep mentioned this pull request Feb 5, 2026

docs: improve checks documentation - fix typos, links, and grammar #228

Merged

BigLep moved this from ⌨️ In Progress to 🔎 Awaiting review in FOC Feb 7, 2026

BigLep mentioned this pull request Feb 10, 2026

Document "check" events and metrics #160

Closed

BigLep mentioned this pull request Feb 10, 2026

Document production configuration settings #174

Closed

BigLep approved these changes Feb 13, 2026

View reviewed changes

github-project-automation Bot moved this from 🔎 Awaiting review to ✔️ Approved by reviewer in FOC Feb 13, 2026

BigLep merged commit a9e6e7d into main Feb 13, 2026
6 checks passed

github-project-automation Bot moved this from ✔️ Approved by reviewer to 🎉 Done in FOC Feb 13, 2026

SgtPooki mentioned this pull request Feb 23, 2026

fix: align retrieval checks with docs #299

Merged

Conversation

SgtPooki commented Jan 28, 2026

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

BigLep left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

SgtPooki commented Jan 30, 2026

Uh oh!

timfong888 commented Feb 5, 2026

Uh oh!

BigLep commented Feb 5, 2026

Uh oh!

BigLep commented Feb 5, 2026

Uh oh!

BigLep commented Feb 10, 2026

Uh oh!

BigLep commented Feb 13, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

6 participants