Skip to content

Add DNS cache to deduplicate upstream lookups#610

Open
clemlesne wants to merge 1 commit intocontainers:mainfrom
dualeai:feat/dnscache
Open

Add DNS cache to deduplicate upstream lookups#610
clemlesne wants to merge 1 commit intocontainers:mainfrom
dualeai:feat/dnscache

Conversation

@clemlesne
Copy link
Copy Markdown

@clemlesne clemlesne commented Feb 12, 2026

Summary

  • Add pkg/dnscache, a thread-safe LRU cache (map + doubly-linked list) with per-entry TTL (5 s) and lazy expiry, capped at 10k entries (~2.5 MB at full load)
  • Wire cache into the DNS handler: Type A lookups check the cache first, fall back to upstream on miss, and populate on resolve
  • Cache is created in addServices() and passed through to dns.New(); passing nil disables caching

Why

Go's net.Resolver does not cache DNS lookups. Each LookupIPAddr() call goes through to the system resolver.

On hosts with a local DNS cache daemon — systemd-resolved (up to 4096 entries, honors upstream TTL up to 2 hours), macOS mDNSResponder (unlimited active entries, min 15 s TTL) — repeated lookups are already cheap. Our cache adds no meaningful value there.

On hosts without a local cache daemon — glibc's stub resolver (minimal containers, Alpine, some RHEL/Debian setups) — every LookupIPAddr() is a real network round trip to the upstream DNS server. Since gvisor-tap-vsock acts as the DNS server for VMs, bursts of repeated queries for the same domain (TLS handshakes, HTTP redirects, retries) each hit upstream. This is where the cache helps.

Relationship with #609 (SNI-based outbound filtering)

This cache becomes more valuable when coupled with #609. The TCP forwarder's SNI allowlist check needs to correlate a TLS ClientHello hostname with previously resolved IPs. Without this cache, that would require a fresh DNS lookup on every TCP connection. With it, the forwarder calls cache.Get() to retrieve the already-resolved IPs — no extra round trip per connection.

Prior art

Application-level DNS caching with short TTLs is a common pattern when the runtime doesn't cache:

Design decisions

  • 5 s TTL — short enough to avoid serving stale results to clients without their own cache, long enough to deduplicate bursts
  • LRU eviction — bounded memory with O(1) Get/Put via map + linked list
  • Lazy expiry — expired entries are cleaned up on access, no background goroutine
  • nil-safe — all existing call sites pass nil and behave identically to before

What is NOT included

Test plan

  • go build ./cmd/gvproxy/
  • go test ./pkg/dnscache/ -count=1 -v (24 tests: correctness, TTL, LRU, concurrency, security, memory usage)
  • go test ./pkg/services/dns/ -count=1 -v (all existing tests pass with nil cache)
  • go vet ./pkg/...
  • Manual smoke test with gvproxy

🤖 Generated with Claude Code

Introduce pkg/dnscache, a thread-safe LRU cache (map + doubly-linked
list) with per-entry TTL and lazy expiry. The DNS handler now checks the
cache before querying upstream for Type A lookups, and populates it on
cache miss. This deduplicates repeated upstream lookups for clients
without their own cache.

The cache is created in addServices() with a 5 s TTL and 10 k entry
capacity (~2.5 MB at full load, verified by a memory-usage test).
Passing nil disables caching, keeping all existing call sites and tests
unchanged in behaviour.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@openshift-ci
Copy link
Copy Markdown
Contributor

openshift-ci bot commented Feb 12, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: clemlesne
Once this PR has been reviewed and has the lgtm label, please assign evidolob for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Details Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@clemlesne clemlesne marked this pull request as ready for review February 12, 2026 15:20
@cfergeau
Copy link
Copy Markdown
Collaborator

Have you considered using https://github.com/rs/dnscache which you mention in your PR description? Just curious, not saying you should change.

You will also need to add a DCO to your commits https://github.com/containers/gvisor-tap-vsock/pull/610/checks?check_run_id=63405935692

@clemlesne
Copy link
Copy Markdown
Author

Have you considered using https://github.com/rs/dnscache which you mention in your PR description? Just curious, not saying you should change.

Well this implementation is really simple, why use a lib? A lib inherently add risks. Don’t see the added values there. I would rather to strengthen tests if you want.

What’s your opinion?

You will also need to add a DCO to your commits https://github.com/containers/gvisor-tap-vsock/pull/610/checks?check_run_id=63405935692

I will!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants