Skip to content

fix: bound ipns cache-control to record eol#1166

Open
lidel wants to merge 7 commits into
mainfrom
fix/ipns-cache-control-expiry
Open

fix: bound ipns cache-control to record eol#1166
lidel wants to merge 7 commits into
mainfrom
fix/ipns-cache-control-expiry

Conversation

@lidel
Copy link
Copy Markdown
Member

@lidel lidel commented Jun 3, 2026

Problem

GET /routing/v1/ipns/{name} advertised a cache window that outlived the record: max-age was the record TTL and stale-while-revalidate/stale-if-error covered the full remaining validity, so the stale-serving window ran past the record's EOL by up to one max-age. A cache (the CDN in front of delegated-ipfs.dev, a browser, or a Helia HTTP cache) then served a record whose signature had already expired, which validating clients reject, surfacing as sporadic 500s in the service-worker gateway. The other IPNS surfaces had the same gap, and a malformed negative TTL propagated unchecked, even underflowing the v1 uint64 TTL field.

Fix

  • routing/http/server: cap max-age to the record's remaining validity and size the stale window to the validity left after it, so max-age+stale never crosses EOL; return no-store for a record that is expired or carries an unrecognized ValidityType.
  • gateway: the raw record endpoint (GET /ipns/{name}?format=ipns-record) caps max-age to the remaining validity.
  • namesys: the resolver floors a negative record TTL, and a cache hit reports the entry's remaining lifetime instead of the original TTL.
  • ipns: NewRecord floors a negative TTL (also closing the v1 uint64 underflow) and Validate rejects a record carrying one.

No cache reuses an IPNS response past the record's EOL, and a negative TTL can no longer surface through any IPNS API.

max-age plus stale-while-revalidate could extend past an IPNS
record's EOL, letting caches serve an expired record that then
fails validation. Cap max-age to the remaining validity and size
the stale window to whatever validity is left, so max-age+stale
never crosses EOL. An already-expired record is returned with
Cache-Control: no-store.
@lidel lidel requested a review from achingbrain June 3, 2026 15:36
@lidel lidel requested a review from a team as a code owner June 3, 2026 15:36
@codecov
Copy link
Copy Markdown

codecov Bot commented Jun 3, 2026

Codecov Report

❌ Patch coverage is 77.27273% with 10 lines in your changes missing coverage. Please review.
✅ Project coverage is 63.45%. Comparing base (fc2837e) to head (55fd621).

Files with missing lines Patch % Lines
internal/ipnstest/ipnstest.go 64.70% 3 Missing and 3 partials ⚠️
gateway/handler_ipns_record.go 63.63% 3 Missing and 1 partial ⚠️

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1166      +/-   ##
==========================================
+ Coverage   63.39%   63.45%   +0.05%     
==========================================
  Files         268      269       +1     
  Lines       26967    27002      +35     
==========================================
+ Hits        17096    17133      +37     
+ Misses       8147     8142       -5     
- Partials     1724     1727       +3     
Files with missing lines Coverage Δ
ipns/record.go 75.37% <100.00%> (+0.07%) ⬆️
ipns/validation.go 58.26% <100.00%> (+0.66%) ⬆️
namesys/ipns_resolver.go 53.08% <100.00%> (+3.70%) ⬆️
namesys/namesys_cache.go 55.55% <100.00%> (+8.33%) ⬆️
routing/http/server/server.go 79.09% <100.00%> (+1.26%) ⬆️
gateway/handler_ipns_record.go 28.33% <63.63%> (+8.72%) ⬆️
internal/ipnstest/ipnstest.go 64.70% <64.70%> (ø)

... and 6 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@achingbrain
Copy link
Copy Markdown
Member

lidel added 5 commits June 5, 2026 14:59
A record's TTL is read from CBOR without a sign check, so a negative
value would make min(ttl, remainingValidity) emit a malformed
Cache-Control: max-age=-N. Floor ttl at zero before capping.
The raw IPNS record handler set max-age straight from record.TTL(),
bypassing the resolver's EOL clamp, so a cache could reuse the record
past its signature expiry; a negative TTL also yielded a negative
max-age. Cap max-age to the remaining validity and floor it at zero.
calculateBestTTL clamped a record TTL down to the remaining EOL
validity but never floored a negative record TTL, so a malformed
record with a future EOL leaked a negative duration through
namesys.Result.TTL. Floor the record TTL at zero.
GET /routing/v1/ipns/{name} assumed the default 48h record lifetime
for any record whose ValidityType was not EOL (or could not be
parsed), advertising a stale-serving window for a record whose
expiration is unknown. Treat such records as uncacheable instead.
On a cache hit the resolver returned the record's original TTL, so a
hit late in the cache window re-advertised the full TTL and could let
a downstream cache outlive the record's EOL. Return the cache entry's
remaining lifetime instead, which is already bounded by the EOL.
@lidel lidel marked this pull request as draft June 5, 2026 15:16
@lidel lidel self-assigned this Jun 5, 2026
A negative TTL was stored verbatim: the CBOR field went negative and
the v1 uint64 protobuf field underflowed to a huge value. NewRecord
now floors the TTL at zero, and Validate rejects a record whose CBOR
TTL is negative.

Tests that need a genuinely negative TTL build records at the wire
level via internal/ipnstest, since NewRecord no longer mints them.
@lidel lidel marked this pull request as ready for review June 5, 2026 18:12
@lidel
Copy link
Copy Markdown
Member Author

lidel commented Jun 5, 2026

Hey @gammazero, heads-up before this lands: it grew quite a bit since your approval (which was on just the initial routing/v1 cap, bd30921). I extended the same EOL-bounding to the other IPNS surfaces and hardened negative-TTL handling:

  • routing/http/server: also returns no-store for an unrecognized ValidityType, and never emits a negative max-age.
  • gateway: the raw record endpoint (?format=ipns-record) caps max-age to the remaining validity.
  • namesys: the resolver floors a negative record TTL, and a cache hit reports the entry's remaining lifetime (if less than TTL) instead of the original TTL.
  • ipns: NewRecord floors a negative TTL (also closing the v1 uint64 underflow) and Validate rejects a record carrying one.

I smoke-tested all of it in ipfs/kubo#11349 (bumps kubo to this branch's tip, CI green) and used it there to add user-facing --ttl/--lifetime sanity checks to ipfs name publish. Green here and there.

Could you give the post-approval commits a final look? If they look good to you, this is ready to merge and cut a boxo release, which is what lets someguy/delegated-ipfs.dev pick it up and clear the service-worker-gateway 500s (ipfs/service-worker-gateway#1072).

Thanks for the earlier review!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants