Skip to content

docs: define deals, retrievals, and metrics#164

Merged
BigLep merged 8 commits intomainfrom
docs/whats-it-doing
Feb 13, 2026
Merged

docs: define deals, retrievals, and metrics#164
BigLep merged 8 commits intomainfrom
docs/whats-it-doing

Conversation

@SgtPooki
Copy link
Copy Markdown
Collaborator

This PR attempts to handle the documentation parts of #158, #159, #160.

This is all in a single PR to make the entire scope of documentation changes easier to review.

Copilot AI review requested due to automatic review settings January 28, 2026 14:40
@FilOzzy FilOzzy added this to FOC Jan 28, 2026
@SgtPooki SgtPooki requested a review from BigLep January 28, 2026 14:40
@github-project-automation github-project-automation Bot moved this to 📌 Triage in FOC Jan 28, 2026
Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds public “source of truth” documentation for Dealbot’s Data Storage and Retrieval checks, plus a consolidated reference for check events and dashboard metrics (addressing #158, #159, #160).

Changes:

  • Introduces a Data Storage Check specification and lifecycle description.
  • Introduces a Retrieval Check specification, including selection logic and recorded metrics.
  • Adds an Events & Metrics reference for dashboard consumers (with TBD markers for not-yet-implemented items).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 8 comments.

File Description
docs/checks/data-storage.md Defines intended Data Storage check behavior, lifecycle, assertions, metrics, and configuration.
docs/checks/retrievals.md Defines intended Retrieval check behavior, deal selection, assertions, recording, metrics, and configuration.
docs/checks/events-and-metrics.md Documents a canonical event/metric model and links expected sources of truth in code.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/retrievals.md Outdated
Comment thread docs/checks/retrievals.md Outdated
Comment thread docs/checks/retrievals.md Outdated
Comment thread docs/checks/retrievals.md Outdated
Comment thread docs/checks/data-storage.md
Comment thread docs/checks/data-storage.md Outdated
Comment thread docs/checks/data-storage.md Outdated
@rjan90 rjan90 moved this from 📌 Triage to ⌨️ In Progress in FOC Jan 28, 2026
@rjan90 rjan90 added this to the M4.0: mainnet staged milestone Jan 28, 2026
@rjan90 rjan90 moved this from ⌨️ In Progress to 🔎 Awaiting review in FOC Jan 28, 2026
Copy link
Copy Markdown
Contributor

@BigLep BigLep left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together Russell. I think this will be a useful artifact going forward, and I think it's useful right now for making sure we're fully aligned on what #158 and #159 should implement. I think it was smart to start with it.

I have read through everything, but because of time I didn't comment on each thing I saw. I have a few inline comments and some more general things. If I had more time tonight, I would have just created a PR on top of yours to point out some other suggestions or changes. I think in terms of next steps it would be great if you can review/incorporate the feedback I have provided so far. I'll then checkout the change and as I read through it again, make any other changes in a PR on top so it's easy for you to review/accept/reject my items. Sound good?

Generall items

  • I think we should write this document assuming the flow for when #158 and #159 are done (since they will be completed soon). It's fine/good to denote the parts that aren't complete as TBD or TBI (to be implemented). For example, lets write it assuming the IPNI advertising and verification happens as part of the sequential "data storage" flow
  • What retries are we doing?
  • Rather than stating line number in text can we just #123 in the URLs (more convenient and less text presented to user)
  • Maybe this is a losing battle, but rather than use the word "deal" throughout the document, do we use "piece" since that has a definition throughout FWSS?
  • Some parts in each document seem duplicative? Maybe ask AI for ideas to reduce redundancy?
  • In particular, there seems to be some duplication of the metric definitions across the files? Either reference events-and-metrics.md as much as possible or inline that content into the other two files?
  • Not critical, but maybe do mermaid diagrams?

Comment thread docs/checks/data-storage.md
Comment thread docs/checks/data-storage.md Outdated
Comment thread docs/checks/data-storage.md Outdated
Comment thread docs/checks/data-storage.md Outdated
Comment thread docs/checks/data-storage.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
Comment thread docs/checks/events-and-metrics.md Outdated
@BigLep BigLep moved this from 🔎 Awaiting review to ⌨️ In Progress in FOC Jan 29, 2026
@SgtPooki SgtPooki mentioned this pull request Jan 29, 2026
@BigLep BigLep linked an issue Jan 30, 2026 that may be closed by this pull request
@SgtPooki
Copy link
Copy Markdown
Collaborator Author

@BigLep I addressed all comments.. lmk if you need anything else from me

@rjan90 rjan90 moved this from ⌨️ In Progress to 🔎 Awaiting review in FOC Jan 30, 2026
@BigLep BigLep mentioned this pull request Jan 31, 2026
@rjan90 rjan90 requested a review from BigLep February 2, 2026 09:44
@timfong888
Copy link
Copy Markdown

Is this mean to include the thresholds from https://linear.app/filoz/document/public-definition-of-thresholds-for-dealbot-ccf29ae32583

I scanned in manually and had AI do a check as well. AI output:

  1. No success rate targets (97% for storage/retrieval)
  2. No TTLB threshold (20 seconds for retrieval)
  3. No PDP/Data Retention documentation (the entire third pillar is missing)
  4. Retrieval selection algorithm differs from Linear spec (this one this document might be correct since we froze the Linear so this may represent other changes).

@BigLep
Copy link
Copy Markdown
Contributor

BigLep commented Feb 5, 2026

@timfong888 : I am active in adding a lot more content to this including thresholds - I'll ping you when it's ready for review.

@BigLep BigLep moved this from 🔎 Awaiting review to ⌨️ In Progress in FOC Feb 5, 2026
@BigLep
Copy link
Copy Markdown
Contributor

BigLep commented Feb 5, 2026

#228 is where I am working on top of this PR, but it's not ready for review yet. I need to review with fresh eyes and explain my thinking better.

@BigLep BigLep moved this from ⌨️ In Progress to 🔎 Awaiting review in FOC Feb 7, 2026
@BigLep
Copy link
Copy Markdown
Contributor

BigLep commented Feb 10, 2026

I'll approve this once #228 is merged.

)

* docs: explain deal status as composite of sub-statuses with branching flow

- Add explanation that overall deal status depends on Upload, Onchain, Discoverability, and Retrieval sub-statuses
- Document branching flow: Upload → (Onchain ∥ Discoverability) → Retrieval
- Add multiple diagram options showing the dependency structure
- Update assertions table with retries and metric references
- Add production configuration and approval methodology doc

* docs: clean up FAQ and add checks directory README

- Add README.md to checks directory with overview and terminology
- Move 'What is DealBot?' section to README
- Add MIN_NUM_DATASETS_FOR_CHECKS to configuration table with description
- Clean up incomplete FAQ entries in data-storage.md
- Add link to filecoin-pin FAQ for filecoinpin.contact vs cid.contact
- Add SP maintenance window section to production config doc
- Add check frequency information to production config doc
- Fix metric name typo (dataStorageCheckMs -> retrievalCheckMs)

* docs: fix typos, broken links, and grammar issues in checks docs

- Fix typos: instnace→instance, appropraite→appropriate, verificaiont→verification, etc.
- Fix broken anchor links in data-storage.md
- Fix grammar: dealbot confirm→confirms, already knows→know, etc.
- Remove TBD_VARIABLE placeholders and TODO comments
- Fix missing closing backticks in formulas
- Fix duplicate words and double spaces
- Standardize terminology and fix inconsistent section references
- Fix metric name typos and broken event links in events-and-metrics.md
- Complete incomplete sentences and fix punctuation

* docs: expand Prometheus/OpenTelemetry metrics label documentation

Add details about checkType and providerId labels/attributes for filtering metrics

* docs: use 'SP under test' consistently in checks docs

Replace SP-under-test and SP under testing with SP under test / SPs under test throughout.

* docs: add MAX_RETRIEVAL_CHECKS config vars to retrievals Configuration section

* docs: document `MAX_RETRIEVAL_CHECKS_PER_SP_PER_CYCLE` for production

* docs: redraw deal status diagram so Retrieval clearly follows Discoverability

Show Upload → Onchain and Discoverability in parallel, then Retrieval only after Discoverability succeeds; deal success requires both Onchain and Retrieval.

* docs: fix typo and cross-references in production config doc

- bandwith → bandwidth
- Link Data Storage Check and Retrieval Check to data-storage.md and retrievals.md

* docs: align data-storage metric links with events-and-metrics definitions

- Use pieceConfirmedOnChainMs and spIndexLocallyMs for assertions 3 and 4
- Set assertion 2 metric to n/a (no matching metric in events-and-metrics)

* docs: add Sub Status Affected column to What Gets Asserted table

Column links to sub-status meanings and maps each assertion to Upload, Onchain, Discoverability, or Retrieval.

* docs: light polish in checks docs

- events-and-metrics: fix 'metrics results' wording, Emmitted→Emitted, remove empty bullet, fix table cell
- retrievals: subject-verb agreement (remain), 'on 5xx response'
- production-config: clarify RANDOM_PIECE_SIZES note for retrieval downloads

* docs: fix Count Related Metrics table alignment in events-and-metrics

* docs: clarify count metrics use a value label (success|failure or HTTP code)

* Incorporating feedback based on @SgtPooki 2026-02-05 discussion

* Various improvements from reading through fresh

* More fixup when doing self review

* Applying copilot fixups.

* Apply suggestion from @Copilot

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Incorporate feedback from @SgtPooki in #228 (review)

* Incorporated feedback from latest round of verbal conversation and me reading through again

---------

Co-authored-by: Russell Dempsey <1173416+SgtPooki@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
@BigLep
Copy link
Copy Markdown
Contributor

BigLep commented Feb 13, 2026

We've had a lot of discussion sync and async on this, especially in #228.

I'm going to merge at this point given we have a couple of items tracking open related items:
#174
#282

@github-project-automation github-project-automation Bot moved this from 🔎 Awaiting review to ✔️ Approved by reviewer in FOC Feb 13, 2026
@BigLep BigLep merged commit a9e6e7d into main Feb 13, 2026
6 checks passed
@github-project-automation github-project-automation Bot moved this from ✔️ Approved by reviewer to 🎉 Done in FOC Feb 13, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: 🎉 Done

Development

Successfully merging this pull request may close these issues.

Document "check" events and metrics

6 participants