Open
Conversation
…e resource management during shutdown
…tion in BootstrapNode
…educing ticker interval and improving logging
…g metrics in BootstrapNode
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
ConnManagerHighWater + 100so gossipsub peer-scoring state cannot grow unbounded as peers churn.BootstrapNode.Close()with cancelable context,NotifyBundlederegistration, and DHT close — wired intocmd/main.goshutdown.PPROF_PORT) and a 60-second status logger reporting connected/peerstore/dht_rt/goroutines/heap_alloc/heap_inuse/sys.start.sh/stop.sh/build-docker.sh,--buildflag forstart.sh, signal trap for clean shutdown, Go base image bumped to 1.25, README debugging guide.Why
The bootstrap node was leaking memory under sustained peer churn. Three root causes were identified:
Conns,ConnsInbound,ConnsOutbound,Memory, andFDwere allrcmgr.Unlimitedin both theSystemandTransientscopes. Gossipsub peer-score state is tracked per peer/connection, so unbounded conns meant unbounded scoring memory.RemovePeerdoes not clear addresses, so we now callClearAddrsfirst and thenRemovePeer.host.Close(). The DHT, the network notification bundle, and the parent context for DHT/gossipsub were never released — so background goroutines and routing-table state lingered across restarts in long-running test cycles.Changes
pkg/service/bootstrap.goSystemandTransientscopes:Conns / ConnsInbound / ConnsOutboundnowrcmgr.LimitVal(cfg.ConnManagerHighWater + 100)instead ofUnlimited.BootstrapNodenow holdsPubsub,notificationBundle,ctx, andcancelfor proper lifecycle.hostCtx) is used for DHT and gossipsub construction; it is cancelled on every constructor failure path.Close()method orders cleanup as: cancel context →Network().StopNotify(bundle)→DHT.Close()→Host.Close(), aggregating errors into a single returned error.startPeerstoreGC()goroutine ticks every 2 minutes; for each non-self peer not currently connected, it callsClearAddrs(p)thenRemovePeer(p)and logs a per-cycle summary at info level when anything was removed (debug otherwise)._, err = pubsub.NewGossipSub(...)), so the GC root is explicit and we can extend pubsub usage later without reconstructing the instance.cmd/main.goPPROF_PORTenv var; logs the listen address on startup and an error ifListenAndServefails.Connected peers: Nlog) emits:node.Close()(instead ofnode.Host.Close()) and logs aggregated shutdown errors.signal.Stop(sigs)is called on the shutdown path.Docker / scripts
Dockerfile:golang:1.24.5-alpine→golang:1.25-alpine.docker-compose.yaml: expose${PPROF_PORT:-6060}:${PPROF_PORT:-6060}and passPPROF_PORTthrough as an environment variable.start.sh,stop.sh,build-docker.sh: detectdocker composeplugin vs the standalonedocker-composebinary and fail clearly if neither is present.start.sh: parses a--buildflag, runs<DOCKER_COMPOSE> buildfirst when set, and installs a signal trap that runs<DOCKER_COMPOSE> downonSIGINT/SIGTERM.Docs / misc
README.md: new Debugging Memory / Performance section explaining the status line metrics and pprof recipes (heap,-alloc_space, two-snapshot-basediff, goroutine dump)..gitignore: add.DS_Store.File map
pkg/service/bootstrap.gocmd/main.gostart.sh--build, signal trapREADME.mdstop.shbuild-docker.shdocker-compose.yamlDockerfile.gitignore.DS_StoreCommits
51882964d248260eb61eaa9f7554f762208fb6ddb7Test plan
go build ./...succeeds on Go 1.25../start.sh --buildbuilds the image and runs the container;./stop.shshuts it down cleanly on bothdocker compose(plugin) anddocker-compose(standalone) hosts.PPROF_PORT=6060set,curl http://localhost:6060/debug/pprof/heapreturns a profile and the port is reachable from the host.go tool pprof -base heap1.pb.gz heap2.pb.gzdoes not show unbounded growth in gossipsub peer-score allocations.peerstore=value in the status line tracksconnected=rather than monotonically increasing, andPeerstore GC: removed N stale peerslog lines appear.SIGINT, logs showShutting down bootstrap node...followed by clean DHT and host close (noError during shutdownlines).Risk / rollout
ConnManagerHighWater + 100. Confirm the configured high-water mark is sized for expected peak inbound peer count — too low will reject legitimate connections during traffic spikes.