
Implement otela-fleet project with CLI, templates, and deployment #54

Merged
xzyaoi merged 19 commits into main from feat/fleet-manager
Apr 12, 2026

@xzyaoi (Collaborator) commented Apr 9, 2026

No description provided.

xzyaoi and others added 13 commits April 9, 2026 10:51
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…c_prefix

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…t single)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…lica support

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ity hashing

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
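The truncated commit above, together with the later `job_identity` unit test, indicates that each job gets a derived identity. The page does not show the implementation, so the following is only a minimal sketch, assuming the identity is a short, stable hash of the fields that define a job. The function name and its `cluster`/`preset`/`params` arguments are hypothetical and may not match cluster.py.

```python
import hashlib
import json


def job_identity(cluster: str, preset: str, params: dict) -> str:
    """Return a short, stable identifier for a job spec (hypothetical signature).

    json.dumps(sort_keys=True) canonicalizes key order at every nesting level,
    so specs that differ only in dict ordering hash to the same value.
    """
    payload = json.dumps(
        {"cluster": cluster, "preset": preset, "params": params}, sort_keys=True
    )
    # Truncate the SHA-256 hex digest to keep SLURM job names readable.
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]
```

A stable identity like this is what would let a reconciler recognise a live job as "the same" as an entry in the desired fleet, regardless of how the config dict was built.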
…mmands

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- Fix double #SBATCH prefix in extra_sbatch template rendering
- Use unique job script filenames to avoid race condition with replicas
- Add job_identity unit test
- Update template test to match extra_sbatch convention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
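The commit lists the two fixes without showing the template code. A minimal sketch of both, assuming `extra_sbatch` is a list of directive strings that may or may not already carry the `#SBATCH` prefix, and that replicas of one job share a base name. Both helper names are hypothetical.

```python
import uuid
from typing import List


def format_extra_sbatch(directives: List[str]) -> List[str]:
    """Render extra_sbatch entries as '#SBATCH ...' lines exactly once.

    Entries may be bare ('--exclusive') or already prefixed
    ('#SBATCH --exclusive'); an existing prefix is stripped before re-adding,
    which avoids emitting '#SBATCH #SBATCH ...'. Blank entries are skipped.
    """
    lines = []
    for d in directives:
        d = d.strip()
        if d.startswith("#SBATCH"):
            d = d[len("#SBATCH"):].strip()
        if d:
            lines.append(f"#SBATCH {d}")
    return lines


def job_script_name(job_name: str) -> str:
    """Return a per-submission script filename.

    A random suffix means concurrent replica submissions never write to
    (and race on) the same remote script file.
    """
    return f"{job_name}-{uuid.uuid4().hex[:8]}.sh"
```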
- Add README.md and docs/ (getting-started, cluster-config, fleet-apply)
- Add `otela-fleet clusters` and `otela-fleet presets <cluster>` commands
- Default config dir: ./clusters or ~/.config/opentela/fleet/clusters/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
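The lookup order for the default config directory can be sketched as follows. This assumes `./clusters` takes precedence when it exists (the commit lists it first but does not state the rule), and `default_clusters_dir` is a hypothetical name; `cwd` and `home` are injectable only to make the function easy to test.

```python
from pathlib import Path
from typing import Optional


def default_clusters_dir(cwd: Optional[Path] = None, home: Optional[Path] = None) -> Path:
    """Pick the cluster config directory: ./clusters if present, else the per-user path."""
    cwd = cwd if cwd is not None else Path.cwd()
    home = home if home is not None else Path.home()
    local = cwd / "clusters"
    if local.is_dir():
        return local
    return home / ".config" / "opentela" / "fleet" / "clusters"
```

Checking the working directory first lets a repository checkout carry its own cluster files, while the per-user directory covers runs started from anywhere else.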
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clariden now has a persistent relay at 148.187.108.172, so relay
startup is skipped and workers bootstrap directly from it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
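The "skip relay startup when a persistent relay exists" rule can be sketched like this. The `persistent_relay` field and the `start_relay` callback are hypothetical stand-ins for whatever relay.py and the cluster config actually use.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ClusterConfig:
    name: str
    # Hypothetical field: address of an always-on relay, e.g. Clariden's.
    persistent_relay: Optional[str] = None


def resolve_relay(cluster: ClusterConfig, start_relay: Callable[[ClusterConfig], str]) -> str:
    """Return the relay address workers should bootstrap from.

    If the cluster declares a persistent relay, reuse it and never call
    start_relay; otherwise launch a relay and return its address.
    """
    if cluster.persistent_relay:
        return cluster.persistent_relay
    return start_relay(cluster)
```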
Copilot AI review requested due to automatic review settings April 9, 2026 12:26
@github-actions (Bot) commented Apr 9, 2026

Test Coverage Report 📊

Detailed coverage (per function):
opentela/entry/cmd/init.go:103:					writeDefaultConfig		0.0%
opentela/entry/cmd/root.go:34:					init				100.0%
opentela/entry/cmd/root.go:70:					configFilePath			85.7%
opentela/entry/cmd/root.go:85:					initConfig			88.4%
opentela/entry/cmd/root.go:215:					Execute				0.0%
opentela/entry/cmd/start.go:40:					init				100.0%
opentela/entry/cmd/update.go:13:				doUpdate			0.0%
opentela/entry/cmd/wallet.go:388:				init				100.0%
opentela/entry/main.go:18:					main				0.0%
opentela/internal/attestation/attestation.go:43:		loadPubKey			53.8%
opentela/internal/attestation/attestation.go:65:		attestationMessage		100.0%
opentela/internal/attestation/attestation.go:72:		Verify				75.0%
opentela/internal/attestation/attestation.go:96:		Sign				100.0%
opentela/internal/attestation/cmd/buildsign/main.go:22:		main				0.0%
opentela/internal/attestation/cmd/buildsign/main.go:39:		keygen				0.0%
opentela/internal/attestation/cmd/buildsign/main.go:51:		sign				0.0%
opentela/internal/common/constants.go:10:			GetHomePath			63.6%
opentela/internal/common/constants.go:27:			GetDBPath			100.0%
opentela/internal/common/filesystem.go:7:			RemoveDir			100.0%
opentela/internal/common/logger.go:11:				init				100.0%
opentela/internal/common/logger.go:15:				InitLogger			50.0%
opentela/internal/common/logger.go:55:				ReportError			50.0%
opentela/internal/common/process/manager.go:13:			NewProcessManager		100.0%
opentela/internal/common/process/manager.go:20:			StartProcess			0.0%
opentela/internal/common/process/manager.go:27:			StopAllProcesses		0.0%
opentela/internal/common/process/manager.go:33:			StartCriticalProcess		57.1%
opentela/internal/common/process/manager.go:44:			HealthCheck			50.0%
opentela/internal/common/process/process.go:36:			NewProcess			100.0%
opentela/internal/common/process/process.go:63:			Start				94.4%
opentela/internal/common/process/process.go:93:			SetTimeout			66.7%
opentela/internal/common/process/process.go:101:		Wait				100.0%
opentela/internal/common/process/process.go:105:		awaitOutput			100.0%
opentela/internal/common/process/process.go:111:		Kill				100.0%
opentela/internal/common/process/process.go:120:		OpenInputStream			71.4%
opentela/internal/common/process/process.go:133:		StreamOutput			77.8%
opentela/internal/common/process/process.go:151:		finishTimeOutOrDie		87.5%
opentela/internal/common/process/process.go:169:		cleanup				100.0%
opentela/internal/common/process/process.go:181:		isRunning			0.0%
opentela/internal/common/requests.go:12:			RemoteGET			52.6%
opentela/internal/common/serialization.go:5:			DictionaryToBytes		100.0%
opentela/internal/common/utils.go:3:				DeduplicateStrings		100.0%
opentela/internal/ingest/server.go:40:				Run				0.0%
opentela/internal/ingest/server.go:70:				collectStats			0.0%
opentela/internal/metrics/collector.go:20:			NewAggregatedCollector		100.0%
opentela/internal/metrics/collector.go:38:			Describe			0.0%
opentela/internal/metrics/collector.go:42:			Collect				88.9%
opentela/internal/metrics/collector.go:58:			SetNetworkStats			100.0%
opentela/internal/metrics/collector.go:63:			SetScraperTargets		100.0%
opentela/internal/metrics/collector.go:67:			metricFromDTO			91.7%
opentela/internal/metrics/peer_provider.go:15:			GetScrapablePeers		0.0%
opentela/internal/metrics/peer_provider.go:42:			extractServices			100.0%
opentela/internal/metrics/peer_provider.go:57:			buildPeerLabels			100.0%
opentela/internal/metrics/relabeler.go:12:			Relabel				100.0%
opentela/internal/metrics/relabeler.go:53:			sanitizeMetricName		75.0%
opentela/internal/metrics/scraper.go:54:			NewMetricsScraper		100.0%
opentela/internal/metrics/scraper.go:83:			Start				0.0%
opentela/internal/metrics/scraper.go:99:			Stop				0.0%
opentela/internal/metrics/scraper.go:103:			GetCachedMetrics		100.0%
opentela/internal/metrics/scraper.go:114:			GetSelfMetrics			0.0%
opentela/internal/metrics/scraper.go:118:			scrapeAll			100.0%
opentela/internal/metrics/scraper.go:145:			scrapePeer			72.7%
opentela/internal/metrics/scraper.go:166:			scrapeTarget			84.6%
opentela/internal/metrics/scraper.go:189:			parseMetrics			84.6%
opentela/internal/metrics/scraper.go:210:			evictStale			85.7%
opentela/internal/platform/gpu.go:10:				GetGPUInfo			75.0%
opentela/internal/platform/gpu.go:18:				getNvidiaGPUs			26.3%
opentela/internal/platform/gpu.go:48:				getAMDGPUs			12.5%
opentela/internal/platform/gpu.go:117:				csvField			100.0%
opentela/internal/platform/slurm/env.go:10:			IsSlurm				100.0%
opentela/internal/platform/slurm/env.go:14:			getJobId			100.0%
opentela/internal/platform/slurm/env.go:18:			getNodeId			100.0%
opentela/internal/platform/slurm/env.go:22:			getRemainingTimeInSeconds	44.4%
opentela/internal/platform/slurm/env.go:43:			GetJobInfo			100.0%
opentela/internal/protocol/bootstrap.go:17:			getDefaultBootstrapPeers	77.8%
opentela/internal/protocol/bootstrap.go:49:			collectBootstrapSources		92.3%
opentela/internal/protocol/bootstrap.go:71:			resolveBootstrapSources		90.0%
opentela/internal/protocol/bootstrap.go:88:			resolveBootstrapSource		85.7%
opentela/internal/protocol/bootstrap.go:104:			fetchHTTPBootstraps		80.0%
opentela/internal/protocol/bootstrap.go:120:			fetchDNSAddrBootstraps		77.8%
opentela/internal/protocol/bootstrap.go:147:			expandBootstrapValue		54.5%
opentela/internal/protocol/bootstrap.go:167:			splitBootstrapValue		100.0%
opentela/internal/protocol/bootstrap.go:174:			parseBootstrapMultiaddrs	75.0%
opentela/internal/protocol/clock.go:20:				StartTicker			0.0%
opentela/internal/protocol/crdt.go:33:				GetCRDTStore			0.0%
opentela/internal/protocol/crdt.go:183:				Reconnect			0.0%
opentela/internal/protocol/crdt.go:199:				ClearCRDTStore			0.0%
opentela/internal/protocol/go-ds-crdt/compaction.go:16:		decodeTombstoneTimestamp	0.0%
opentela/internal/protocol/go-ds-crdt/compaction.go:32:		CompactTombstones		0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:114:		verify				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:147:		DefaultOptions			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:231:		New				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:370:		handleNext			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:461:		decodeBroadcast			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:492:		encodeBroadcast			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:501:		randomizeInterval		0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:510:		rebroadcast			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:527:		repair				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:555:		rebroadcastHeads		0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:588:		logStats			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:614:		handleBlock			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:632:		handleBranch			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:650:		dagWorker			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:690:		sendNewJobs			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:775:		sendJobWorker			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:791:		processedBlockKey		0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:795:		isProcessed			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:799:		markProcessed			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:803:		dirtyKey			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:808:		MarkDirty			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:817:		IsDirty				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:826:		MarkClean			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:836:		processNode			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:959:		repairDAG			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1072:		Repair				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1078:		Get				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1086:		Has				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1093:		GetSize				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1108:		Query				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1117:		Put				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1123:		Delete				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1137:		Sync				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1181:		Context				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1186:		Close				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1198:		Batch				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1202:		deltaMerge			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1215:		addToDelta			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1221:		rmvToDelta			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1232:		updateDeltaWithRemove		0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1254:		updateDelta			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1265:		publishDelta			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1276:		putBlock			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1295:		publish				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1307:		addDAGNode			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1349:		broadcast			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1383:		Put				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1395:		Delete				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1409:		Commit				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1415:		PrintDAG			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1434:		printDAGRec			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1495:		DotDAG				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1523:		dotDAGRec			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1576:		InternalStats			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1591:		newCidSafeSet			0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1597:		Visit				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1610:		Remove				0.0%
opentela/internal/protocol/go-ds-crdt/crdt.go:1618:		Has				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:29:		newHeads			0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:42:		key				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:47:		write				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:56:		delete				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:68:		IsHead				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:79:		Len				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:90:		Replace				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:134:		Add				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:149:		List				0.0%
opentela/internal/protocol/go-ds-crdt/heads.go:176:		primeCache			0.0%
opentela/internal/protocol/go-ds-crdt/ipld.go:23:		GetDelta			0.0%
opentela/internal/protocol/go-ds-crdt/ipld.go:33:		GetPriority			0.0%
opentela/internal/protocol/go-ds-crdt/ipld.go:48:		GetDeltas			0.0%
opentela/internal/protocol/go-ds-crdt/ipld.go:72:		extractDelta			0.0%
opentela/internal/protocol/go-ds-crdt/ipld.go:82:		makeNode			0.0%
opentela/internal/protocol/go-ds-crdt/migrations.go:18:		versionKey			0.0%
opentela/internal/protocol/go-ds-crdt/migrations.go:22:		getVersion			0.0%
opentela/internal/protocol/go-ds-crdt/migrations.go:39:		setVersion			0.0%
opentela/internal/protocol/go-ds-crdt/migrations.go:50:		applyMigrations			0.0%
opentela/internal/protocol/go-ds-crdt/migrations.go:81:		migrate0to1			0.0%
opentela/internal/protocol/go-ds-crdt/pubsub_broadcaster.go:28:	NewPubSubBroadcaster		0.0%
opentela/internal/protocol/go-ds-crdt/pubsub_broadcaster.go:60:	Broadcast			0.0%
opentela/internal/protocol/go-ds-crdt/pubsub_broadcaster.go:65:	Next				0.0%
opentela/internal/protocol/go-ds-crdt/set.go:50:		newCRDTSet			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:73:		Add				0.0%
opentela/internal/protocol/go-ds-crdt/set.go:86:		Rmv				0.0%
opentela/internal/protocol/go-ds-crdt/set.go:133:		Element				0.0%
opentela/internal/protocol/go-ds-crdt/set.go:150:		Elements			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:251:		InSet				0.0%
opentela/internal/protocol/go-ds-crdt/set.go:259:		keyPrefix			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:264:		elemsPrefix			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:269:		tombsPrefix			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:274:		valueKey			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:279:		priorityKey			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:283:		getPriority			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:300:		setPriority			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:313:		setValue			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:359:		findBestValue			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:464:		putElems			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:510:		putTombs			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:584:		Merge				0.0%
opentela/internal/protocol/go-ds-crdt/set.go:599:		inTombsKeyID			0.0%
opentela/internal/protocol/go-ds-crdt/set.go:620:		datastoreSync			0.0%
opentela/internal/protocol/host.go:51:				GetP2PNode			0.0%
opentela/internal/protocol/host.go:73:				newHost				0.0%
opentela/internal/protocol/host.go:288:				StartAutoReconnect		0.0%
opentela/internal/protocol/host.go:296:				startAutoReconnect		0.0%
opentela/internal/protocol/host.go:349:				tryReconnectToBootstraps	0.0%
opentela/internal/protocol/host.go:405:				waitFor				100.0%
opentela/internal/protocol/host.go:421:				backoffDelay			77.8%
opentela/internal/protocol/host.go:437:				backoffBaseDelay		90.0%
opentela/internal/protocol/host.go:457:				isTransientNetworkError		87.5%
opentela/internal/protocol/host.go:474:				newResourceManager		0.0%
opentela/internal/protocol/host.go:484:				newDHT				0.0%
opentela/internal/protocol/host.go:498:				ConnectedPeers			0.0%
opentela/internal/protocol/host.go:513:				AllPeers			0.0%
opentela/internal/protocol/host.go:527:				BuildBootstrapAddr		85.7%
opentela/internal/protocol/host.go:539:				buildPublicTCPMultiaddr		66.7%
opentela/internal/protocol/host.go:552:				publicAddrHostAndProtocol	78.6%
opentela/internal/protocol/host.go:578:				appendUniqueMultiaddrs		88.2%
opentela/internal/protocol/host.go:610:				isRecentRelayPeer		100.0%
opentela/internal/protocol/host.go:619:				ConnectedBootstraps		0.0%
opentela/internal/protocol/host.go:659:				MakeRelayReservations		0.0%
opentela/internal/protocol/host.go:697:				IsDirectlyConnected		0.0%
opentela/internal/protocol/host.go:713:				FindRelayFor			0.0%
opentela/internal/protocol/host.go:750:				GetResourceManagerStats		0.0%
opentela/internal/protocol/key.go:11:				writeKeyToFile			0.0%
opentela/internal/protocol/key.go:34:				loadKeyFromFile			0.0%
opentela/internal/protocol/node_table.go:34:			InitScalableNodeTable		0.0%
opentela/internal/protocol/node_table.go:42:			GetScalableSnapshot		0.0%
opentela/internal/protocol/node_table.go:49:			GetNodeTableWriter		0.0%
opentela/internal/protocol/node_table.go:55:			StartSWIM			0.0%
opentela/internal/protocol/node_table.go:211:			getNodeTable			100.0%
opentela/internal/protocol/node_table.go:218:			UpdateNodeTable			0.0%
opentela/internal/protocol/node_table.go:253:			MarkSelfAsBootstrap		0.0%
opentela/internal/protocol/node_table.go:279:			AnnounceLeave			0.0%
opentela/internal/protocol/node_table.go:304:			UpdateNodeTableHook		70.5%
opentela/internal/protocol/node_table.go:381:			DeleteNodeTableHook		100.0%
opentela/internal/protocol/node_table.go:388:			GetPeerFromTable		100.0%
opentela/internal/protocol/node_table.go:399:			GetConnectedPeers		0.0%
opentela/internal/protocol/node_table.go:411:			GetAllPeers			100.0%
opentela/internal/protocol/node_table.go:421:			GetService			0.0%
opentela/internal/protocol/node_table.go:438:			GetAllProviders			0.0%
opentela/internal/protocol/node_table.go:466:			InitializeMyself		0.0%
opentela/internal/protocol/node_table.go:555:			GetSelf				100.0%
opentela/internal/protocol/node_table.go:562:			SetMyselfRelayPeer		0.0%
opentela/internal/protocol/node_table.go:569:			SetMyselfForTest		100.0%
opentela/internal/protocol/node_table.go:577:			RegisterRemotePeer		0.0%
opentela/internal/protocol/nodetable/snapshot.go:55:		NewNodeTable			100.0%
opentela/internal/protocol/nodetable/snapshot.go:62:		Snapshot			100.0%
opentela/internal/protocol/nodetable/snapshot.go:67:		Store				100.0%
opentela/internal/protocol/nodetable/snapshot.go:72:		NewSnapshot			100.0%
opentela/internal/protocol/nodetable/snapshot.go:84:		Clone				100.0%
opentela/internal/protocol/nodetable/snapshot.go:100:		ApplyEvent			71.7%
opentela/internal/protocol/nodetable/snapshot.go:187:		RebuildIndexes			91.7%
opentela/internal/protocol/nodetable/writer.go:33:		init				100.0%
opentela/internal/protocol/nodetable/writer.go:52:		NewWriter			100.0%
opentela/internal/protocol/nodetable/writer.go:60:		Start				100.0%
opentela/internal/protocol/nodetable/writer.go:65:		Stop				100.0%
opentela/internal/protocol/nodetable/writer.go:71:		Send				100.0%
opentela/internal/protocol/nodetable/writer.go:79:		run				100.0%
opentela/internal/protocol/nodetable/writer.go:109:		drainAndApply			66.7%
opentela/internal/protocol/nodetable/writer.go:123:		applyBatch			100.0%
opentela/internal/protocol/registrar.go:25:			addLocalService			100.0%
opentela/internal/protocol/registrar.go:54:			snapshotLocalServices		100.0%
opentela/internal/protocol/registrar.go:62:			RegisterLocalServices		0.0%
opentela/internal/protocol/registrar.go:96:			healthCheckRemote		0.0%
opentela/internal/protocol/registrar.go:120:			registerLLMService		0.0%
opentela/internal/protocol/registrar.go:147:			provideService			0.0%
opentela/internal/protocol/registrar.go:173:			ReannounceLocalServices		0.0%
opentela/internal/protocol/swim/dissemination.go:23:		NewDisseminator			66.7%
opentela/internal/protocol/swim/dissemination.go:35:		UpdateN				100.0%
opentela/internal/protocol/swim/dissemination.go:44:		retransmitLimit			100.0%
opentela/internal/protocol/swim/dissemination.go:50:		Enqueue				88.9%
opentela/internal/protocol/swim/dissemination.go:73:		GetPiggyback			93.3%
opentela/internal/protocol/swim/dissemination.go:100:		statusPriority			50.0%
opentela/internal/protocol/swim/messages.go:58:			MarshalJSON			100.0%
opentela/internal/protocol/swim/messages.go:67:			UnmarshalJSON			87.5%
opentela/internal/protocol/swim/messages.go:87:			Marshal				28.6%
opentela/internal/protocol/swim/messages.go:109:		Unmarshal			100.0%
opentela/internal/protocol/swim/messages.go:129:		Marshal				80.0%
opentela/internal/protocol/swim/messages.go:149:		Unmarshal			76.9%
opentela/internal/protocol/swim/metrics.go:30:			init				100.0%
opentela/internal/protocol/swim/swim.go:72:			NewSWIM				100.0%
opentela/internal/protocol/swim/swim.go:86:			AddMember			100.0%
opentela/internal/protocol/swim/swim.go:97:			RemoveMember			100.0%
opentela/internal/protocol/swim/swim.go:107:			GetStatus			100.0%
opentela/internal/protocol/swim/swim.go:117:			GetIncarnation			100.0%
opentela/internal/protocol/swim/swim.go:124:			Members				100.0%
opentela/internal/protocol/swim/swim.go:136:			probeOnce			86.7%
opentela/internal/protocol/swim/swim.go:168:			processPendingProbes		94.2%
opentela/internal/protocol/swim/swim.go:273:			processSuspects			100.0%
opentela/internal/protocol/swim/swim.go:314:			HandleMessage			100.0%
opentela/internal/protocol/swim/swim.go:366:			processEvents			27.8%
opentela/internal/protocol/swim/swim.go:437:			Run				100.0%
opentela/internal/protocol/swim/swim.go:460:			updateMemberGauge		100.0%
opentela/internal/protocol/swim/swim.go:475:			Close				75.0%
opentela/internal/protocol/swim/transport.go:27:		NewLibP2PTransport		0.0%
opentela/internal/protocol/swim/transport.go:31:		send				0.0%
opentela/internal/protocol/swim/transport.go:49:		SendPing			0.0%
opentela/internal/protocol/swim/transport.go:53:		SendAck				0.0%
opentela/internal/protocol/swim/transport.go:57:		SendPingReq			0.0%
opentela/internal/protocol/swim/transport.go:62:		RegisterHandler			0.0%
opentela/internal/protocol/tombstone_compactor.go:21:		startTombstoneCompactor		0.0%
opentela/internal/protocol/tombstone_compactor.go:91:		readDurationSetting		100.0%
opentela/internal/protocol/tombstone_manager.go:24:		GetTombstoneManager		0.0%
opentela/internal/protocol/tombstone_manager.go:37:		CleanupLeftNodes		87.5%
opentela/internal/protocol/tombstone_manager.go:58:		collectCandidates		100.0%
opentela/internal/server/access_control.go:46:			resolveCallerWallet		18.2%
opentela/internal/server/access_control.go:73:			accessControlMiddleware		73.5%
opentela/internal/server/access_control.go:147:			isLibp2pRemoteAddr		100.0%
opentela/internal/server/access_control.go:153:			containsWallet			100.0%
opentela/internal/server/auth_client.go:41:			get				100.0%
opentela/internal/server/auth_client.go:51:			set				100.0%
opentela/internal/server/auth_client.go:64:			verifyBearerToken		83.3%
opentela/internal/server/auth_client.go:107:			resolveClientWallet		75.0%
opentela/internal/server/cors.go:10:				corsHeader			100.0%
opentela/internal/server/cors.go:26:				rewriteHeader			100.0%
opentela/internal/server/crdt_handler.go:12:			listPeers			100.0%
opentela/internal/server/crdt_handler.go:17:			listPeersWithStatus		100.0%
opentela/internal/server/crdt_handler.go:23:			listBootstraps			0.0%
opentela/internal/server/crdt_handler.go:28:			getResourceStats		100.0%
opentela/internal/server/crdt_handler.go:45:			updateLocal			0.0%
opentela/internal/server/crdt_handler.go:55:			deleteLocal			0.0%
opentela/internal/server/crdt_handler.go:64:			getDNT				0.0%
opentela/internal/server/health.go:9:				healthStatusCheck		100.0%
opentela/internal/server/ingest.go:38:				getIngestStats			88.9%
opentela/internal/server/p2p_listener.go:11:			P2PListener			0.0%
opentela/internal/server/proxy_handler.go:58:			init				100.0%
opentela/internal/server/proxy_handler.go:62:			getGlobalTransport		0.0%
opentela/internal/server/proxy_handler.go:77:			ErrorHandler			0.0%
opentela/internal/server/proxy_handler.go:89:			WriteHeader			0.0%
opentela/internal/server/proxy_handler.go:99:			Flush				0.0%
opentela/internal/server/proxy_handler.go:106:			P2PForwardHandler		0.0%
opentela/internal/server/proxy_handler.go:144:			ServiceForwardHandler		0.0%
opentela/internal/server/proxy_handler.go:180:			parseFallbackLevel		100.0%
opentela/internal/server/proxy_handler.go:199:			selectCandidates		100.0%
opentela/internal/server/proxy_handler.go:274:			weightedRandomSelect		81.2%
opentela/internal/server/proxy_handler.go:304:			scoreCandidates			0.0%
opentela/internal/server/proxy_handler.go:316:			excludePeers			100.0%
opentela/internal/server/proxy_handler.go:330:			shouldShedLoad			100.0%
opentela/internal/server/proxy_handler.go:339:			filterByTrust			0.0%
opentela/internal/server/proxy_handler.go:354:			GlobalServiceForwardHandler	0.0%
opentela/internal/server/ratelimit.go:27:			newRateLimiterStore		100.0%
opentela/internal/server/ratelimit.go:37:			getLimiter			100.0%
opentela/internal/server/ratelimit.go:51:			cleanup				57.1%
opentela/internal/server/ratelimit.go:66:			rateLimitMiddleware		88.9%
opentela/internal/server/registration.go:52:			challengePeer			81.8%
opentela/internal/server/registration.go:77:			registerPeer			37.9%
opentela/internal/server/registration.go:227:			StartChallengeCleanup		0.0%
opentela/internal/server/registration.go:236:			cleanExpiredChallenges		0.0%
opentela/internal/server/self_handler.go:14:			isLoopback			80.0%
opentela/internal/server/self_handler.go:24:			getSelf				100.0%
opentela/internal/server/self_handler.go:34:			signData			10.7%
opentela/internal/server/server.go:24:				StartServer			0.0%
opentela/internal/server/tracer.go:20:				initTracer			0.0%
opentela/internal/server/tracer.go:44:				IngestEvents			0.0%
opentela/internal/solana/client.go:27:				NewClient			100.0%
opentela/internal/solana/client.go:55:				call				88.2%
opentela/internal/solana/client.go:93:				HasSPLToken			100.0%
opentela/internal/solana/client.go:141:				GetBalance			88.9%
opentela/internal/solana/client.go:159:				GetBalanceSOL			75.0%
opentela/internal/solana/client.go:173:				GetTokenBalance			73.3%
opentela/internal/solana/client.go:215:				RequestAirdrop			77.8%
opentela/internal/solana/client.go:253:				SendSOL				0.0%
opentela/internal/solana/client.go:309:				getRecentBlockhash		0.0%
opentela/internal/solana/client.go:336:				buildTransferMessage		100.0%
opentela/internal/solana/client.go:387:				serializeTransaction		100.0%
opentela/internal/solana/processor.go:45:			NewPaymentProcessor		83.3%
opentela/internal/solana/processor.go:72:			ProcessUsageRecords		0.0%
opentela/internal/solana/processor.go:134:			submitPayment			0.0%
opentela/internal/solana/processor.go:223:			confirmTransaction		0.0%
opentela/internal/solana/processor.go:247:			VerifyBalance			0.0%
opentela/internal/solana/rates.go:30:				NewRateManager			100.0%
opentela/internal/solana/rates.go:38:				rateKey				100.0%
opentela/internal/solana/rates.go:44:				GetRate				100.0%
opentela/internal/solana/rates.go:72:				SetRate				100.0%
opentela/internal/solana/rates.go:79:				LoadFromConfig			87.5%
opentela/internal/solana/settlement.go:22:			SubmitSettlement		7.3%
opentela/internal/solana/spl.go:26:				FindATA				85.7%
opentela/internal/solana/spl.go:42:				findProgramAddress		57.1%
opentela/internal/solana/spl.go:57:				createProgramAddress		88.9%
opentela/internal/solana/spl.go:88:				isOnCurve			100.0%
opentela/internal/solana/spl.go:98:				BuildSPLTransferChecked		100.0%
opentela/internal/solana/spl.go:155:				BuildCreateATAInstruction	100.0%
opentela/internal/solana/spl.go:208:				SendSPLTransfer			0.0%
opentela/internal/solana/spl.go:249:				CreateATA			0.0%
opentela/internal/solana/spl.go:292:				GetSignatureStatus		0.0%
opentela/internal/solana/spl.go:331:				mustDecodeBase58		75.0%
opentela/internal/usage/aggregator.go:31:			NewAggregator			100.0%
opentela/internal/usage/aggregator.go:39:			AddRecord			100.0%
opentela/internal/usage/aggregator.go:59:			ShouldFlush			92.3%
opentela/internal/usage/aggregator.go:86:			BuildAggregate			88.9%
opentela/internal/usage/aggregator.go:114:			GetValue			83.3%
opentela/internal/usage/aggregator.go:127:			SetWindowStart			83.3%
opentela/internal/usage/crdt.go:13:				PublishAggregate		0.0%
opentela/internal/usage/crdt.go:27:				GetPeerAggregate		0.0%
opentela/internal/usage/crdt.go:46:				getAggregateKey			100.0%
opentela/internal/usage/extractor.go:15:			ExtractUsageMetrics		93.8%
opentela/internal/usage/reconciler.go:11:			ReconcileRecords		92.6%
opentela/internal/usage/store.go:17:				NewUsageStore			83.3%
opentela/internal/usage/store.go:30:				Close				100.0%
opentela/internal/usage/store.go:35:				SaveRecord			83.3%
opentela/internal/usage/store.go:49:				GetRecord			100.0%
opentela/internal/usage/store.go:73:				GetPendingRecords		88.9%
opentela/internal/usage/store.go:114:				MarkAggregated			83.3%
opentela/internal/usage/store.go:127:				SaveAggregate			83.3%
opentela/internal/usage/tracker.go:28:				InitTracker			0.0%
opentela/internal/usage/tracker.go:54:				CloseTracker			0.0%
opentela/internal/usage/tracker.go:62:				Track				0.0%
opentela/internal/usage/tracker.go:89:				GenerateRequestID		100.0%
opentela/internal/wallet/identity.go:35:			SignIdentity			75.0%
opentela/internal/wallet/identity.go:67:			VerifyIdentity			77.8%
opentela/internal/wallet/wallet.go:67:				NewWalletManager		66.7%
opentela/internal/wallet/wallet.go:98:				NewWalletManagerWithDir		66.7%
opentela/internal/wallet/wallet.go:116:				loadAccounts			94.4%
opentela/internal/wallet/wallet.go:151:				migrateLegacyWallet		87.5%
opentela/internal/wallet/wallet.go:186:				migrateLegacyDir		82.9%
opentela/internal/wallet/wallet.go:254:				saveAccounts			71.4%
opentela/internal/wallet/wallet.go:275:				Accounts			100.0%
opentela/internal/wallet/wallet.go:282:				DefaultAccount			100.0%
opentela/internal/wallet/wallet.go:291:				AddSolanaAccount		73.3%
opentela/internal/wallet/wallet.go:325:				ImportSolanaKeypair		81.8%
opentela/internal/wallet/wallet.go:375:				ExportKeypair			77.8%
opentela/internal/wallet/wallet.go:394:				ExportBase58PrivateKey		85.7%
opentela/internal/wallet/wallet.go:413:				FindByFile			100.0%
opentela/internal/wallet/wallet.go:423:				FindByPublicKey			100.0%
opentela/internal/wallet/wallet.go:433:				FindByProviderID		100.0%
opentela/internal/wallet/wallet.go:443:				WalletExists			100.0%
opentela/internal/wallet/wallet.go:451:				GetPublicKey			100.0%
opentela/internal/wallet/wallet.go:458:				GetPrivateKey			100.0%
opentela/internal/wallet/wallet.go:465:				GetWalletPath			100.0%
opentela/internal/wallet/wallet.go:472:				GetWalletType			100.0%
opentela/internal/wallet/wallet.go:481:				GetProviderID			100.0%
opentela/internal/wallet/wallet.go:490:				GetPrivateKeyBytes		77.8%
opentela/internal/wallet/wallet.go:510:				InitializeWallet		83.3%
opentela/internal/wallet/wallet.go:527:				writeSolanaKeypair		77.8%
opentela/internal/wallet/wallet.go:544:				parseSolanaKeypairJSON		100.0%
opentela/internal/wallet/wallet.go:565:				deriveProviderID		100.0%
opentela/internal/wallet/wallet.go:574:				copyDirRecursive		76.9%
opentela/plugins/webui/embed.go:12:				Static				0.0%
total:								(statements)			36.3%


Copilot AI (Contributor) left a comment


Pull request overview

This PR introduces otela-fleet, a new “fleet manager” subproject under contrib/fleet_manager that deploys and manages OpenTela worker and LLM backend jobs on SLURM clusters using Apptainer or enroot containers. It also handles relay management and provides a declarative “apply” workflow.

Changes:

  • Adds a Click-based otela-fleet CLI for starting and stopping jobs, checking status, viewing logs, and applying a fleet YAML file.
  • Implements cluster config loading and SSH connection helpers, relay lifecycle management, job submission, cancellation, and listing, and Jinja2 templates for SLURM job scripts and OpenTela configs.
  • Adds initial pytest coverage for cluster parsing, deploy and apply logic, templates, relay handling, and connection behavior.
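The declarative “apply” workflow compares a desired fleet with the jobs that are actually live. A minimal sketch of that diff, assuming the fleet YAML reduces to a name-to-replica-count mapping and live jobs are known as (job_id, name) pairs; `compute_diff` is hypothetical and may not mirror apply.py.

```python
from typing import Dict, List, Tuple


def compute_diff(
    desired: Dict[str, int], live: List[Tuple[str, str]]
) -> Tuple[Dict[str, int], List[str]]:
    """Compare desired replica counts with live jobs.

    desired: name -> number of replicas wanted.
    live: (job_id, name) pairs for jobs currently queued or running.
    Returns (to_submit, to_cancel): new jobs needed per name, and job IDs to cancel.
    """
    live_by_name: Dict[str, List[str]] = {}
    for job_id, name in live:
        live_by_name.setdefault(name, []).append(job_id)

    to_submit: Dict[str, int] = {}
    to_cancel: List[str] = []
    for name, want in desired.items():
        have = live_by_name.get(name, [])
        if want > len(have):
            to_submit[name] = want - len(have)
        elif want < len(have):
            # Keep the first `want` jobs in listing order; cancel the surplus.
            to_cancel.extend(have[want:])

    # Jobs whose name no longer appears in the fleet file are cancelled entirely.
    for name, jobs in live_by_name.items():
        if name not in desired:
            to_cancel.extend(jobs)
    return to_submit, to_cancel
```

Computing the diff as pure data, before touching SLURM, keeps it unit-testable and makes a dry-run mode straightforward.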

Reviewed changes

Copilot reviewed 31 out of 35 changed files in this pull request and generated 12 comments.

| File | Description |
| --- | --- |
| `contrib/fleet_manager/tests/__init__.py` | Test package marker. |
| `contrib/fleet_manager/tests/test_apply.py` | Tests fleet-file parsing and diff computation. |
| `contrib/fleet_manager/tests/test_cluster.py` | Tests cluster config loading, presets, and job identity. |
| `contrib/fleet_manager/tests/test_connection.py` | Tests SSH connection behavior and host selection. |
| `contrib/fleet_manager/tests/test_deploy.py` | Tests deploy orchestration and binary sync behavior. |
| `contrib/fleet_manager/tests/test_relay.py` | Tests relay status/ensure logic. |
| `contrib/fleet_manager/tests/test_templates.py` | Tests template rendering for job scripts and configs. |
| `contrib/fleet_manager/tests/test_worker.py` | Tests worker job listing/cancel and exec prefix/template selection. |
| `contrib/fleet_manager/src/fleet_manager/__init__.py` | Package marker. |
| `contrib/fleet_manager/src/fleet_manager/apply.py` | Fleet reconciliation (desired vs. live) and apply execution. |
| `contrib/fleet_manager/src/fleet_manager/cli.py` | Click CLI entry point and commands. |
| `contrib/fleet_manager/src/fleet_manager/cluster.py` | Cluster config schema/loading and SSH connection abstraction. |
| `contrib/fleet_manager/src/fleet_manager/deploy.py` | Deployment flow (sync binary, render configs, submit jobs). |
| `contrib/fleet_manager/src/fleet_manager/relay.py` | Relay process status/start/stop/ensure. |
| `contrib/fleet_manager/src/fleet_manager/worker.py` | Worker job list/submit/cancel/logs and runtime template selection. |
| `contrib/fleet_manager/src/fleet_manager/templates/__init__.py` | Jinja2 rendering helper. |
| `contrib/fleet_manager/src/fleet_manager/templates/apptainer_single.sh.j2` | Single-node apptainer SLURM job script template. |
| `contrib/fleet_manager/src/fleet_manager/templates/apptainer_multi.sh.j2` | Multi-node apptainer SLURM job script template. |
| `contrib/fleet_manager/src/fleet_manager/templates/enroot_single.sh.j2` | Single-node enroot SLURM job script template. |
| `contrib/fleet_manager/src/fleet_manager/templates/clariden_sglang.toml.j2` | Enroot EDF template for Clariden. |
| `contrib/fleet_manager/src/fleet_manager/templates/relay.cfg.yaml.j2` | Relay OpenTela config template. |
| `contrib/fleet_manager/src/fleet_manager/templates/worker.cfg.yaml.j2` | Worker OpenTela config template. |
| `contrib/fleet_manager/src/otela_fleet.egg-info/PKG-INFO` | Generated packaging metadata artifact. |
| `contrib/fleet_manager/src/otela_fleet.egg-info/SOURCES.txt` | Generated packaging sources list artifact. |
| `contrib/fleet_manager/src/otela_fleet.egg-info/dependency_links.txt` | Generated packaging artifact. |
| `contrib/fleet_manager/src/otela_fleet.egg-info/entry_points.txt` | Generated packaging artifact (console_script). |
| `contrib/fleet_manager/src/otela_fleet.egg-info/requires.txt` | Generated packaging artifact (deps). |
| `contrib/fleet_manager/src/otela_fleet.egg-info/top_level.txt` | Generated packaging artifact. |
| `contrib/fleet_manager/pyproject.toml` | Packaging configuration, deps, and console script entry point. |
| `contrib/fleet_manager/README.md` | Project-level usage and links to docs. |
| `contrib/fleet_manager/docs/getting-started.md` | Installation and quickstart documentation. |
| `contrib/fleet_manager/docs/cluster-config.md` | Cluster YAML schema documentation and examples. |
| `contrib/fleet_manager/docs/fleet-apply.md` | Declarative apply documentation and behavior description. |
| `contrib/fleet_manager/clusters/.gitkeep` | Keeps the empty clusters dir in the repo. |


Comment on lines +44 to +46
```python
for cluster, backend, cmd, preset, replicas in desired:
    job_name = job_identity(backend, cmd, preset)
    cluster_jobs = live_jobs.get(cluster, [])
```

Copilot AI Apr 9, 2026


compute_diff() only iterates over the desired deployments, so jobs that exist in live_jobs but are removed from the fleet file will never be cancelled. This contradicts the documented behavior (“remove the entry and re-apply”) and will leave orphaned jobs running. Consider also generating cancel actions for any live OpenTela jobs whose (backend, cmd, preset) identity is not present in the desired set for that cluster (or update the docs to require replicas: 0).
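A minimal sketch of the orphan-cancellation pass suggested above, assuming (hypothetically) that every fleet-managed job name starts with a shared `otela-` prefix and that the desired job names per cluster have already been computed with `job_identity()`. `Job`, `JOB_PREFIX`, and `orphan_cancels` are illustrative names, not the PR's API.

```python
from dataclasses import dataclass

@dataclass
class Job:
    # Hypothetical minimal job record; the PR's Job type has more fields.
    id: str
    name: str

JOB_PREFIX = "otela-"  # assumption: prefix shared by all fleet-managed job names

def orphan_cancels(desired_names, live_jobs):
    """Return (cluster, job_id) pairs for managed jobs absent from the fleet file.

    desired_names: dict of cluster -> set of job names from the fleet file
    live_jobs:     dict of cluster -> list[Job] currently queued or running
    """
    cancels = []
    for cluster, jobs in live_jobs.items():
        # A cluster dropped from the fleet file has no wanted names at all.
        wanted = desired_names.get(cluster, set())
        for job in jobs:
            # Only touch jobs this tool created; unrelated SLURM jobs are left alone.
            if job.name.startswith(JOB_PREFIX) and job.name not in wanted:
                cancels.append((cluster, job.id))
    return cancels

live = {
    "jsc": [
        Job("101", "otela-vllm-aaaa"),
        Job("102", "otela-sglang-bbbb"),  # entry removed from the fleet file
        Job("103", "jupyter"),            # not managed by otela-fleet
    ]
}
print(orphan_cancels({"jsc": {"otela-vllm-aaaa"}}, live))  # [('jsc', '102')]
```

These cancels would be appended to the actions produced by the per-entry loop; the prefix filter is what keeps the reconciler from cancelling a user's unrelated jobs on the same account.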

Copilot uses AI. Check for mistakes.
Comment on lines +52 to +55
```python
elif current > replicas:
    excess = sorted(matching, key=lambda j: j.id, reverse=True)[: current - replicas]
    for job in excess:
        actions.append(Action(action="cancel", cluster=cluster, backend=backend, cmd=cmd, preset=preset, job_id=job.id))
```

Copilot AI Apr 9, 2026


When scaling down, excess is chosen by sorting job.id as a string. SLURM job IDs are numeric, so lexicographic ordering can cancel the wrong “newest” job (e.g., "99" > "100"). Sort by int(job.id) (with a safe fallback) to reliably cancel the newest jobs first.

Collaborator Author


@copilot apply changes based on this feedback

Contributor


Fixed in ef00066. Changed the sort key from lambda j: j.id to lambda j: int(j.id) if j.id and str(j.id).isdigit() else 0 so SLURM job IDs are compared numerically (e.g., 100 > 99) rather than lexicographically (where "99" > "100"). Also added a test test_compute_diff_cancel_numeric_order to cover this exact edge case.
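To make the difference concrete, here is a small self-contained illustration of the sort key described in the fix. `slurm_job_key` is an illustrative name; the PR inlines the same expression as a lambda.

```python
def slurm_job_key(job_id) -> int:
    """Numeric sort key for SLURM job IDs, falling back to 0.

    Same rule as the fix above: IDs that are not plain digits (for example
    array-task IDs such as "123_4") sort as 0, i.e. as the oldest.
    """
    s = "" if job_id is None else str(job_id)
    return int(s) if s.isdigit() else 0

ids = ["99", "100", "1000"]
print(sorted(ids, reverse=True))                     # ['99', '1000', '100']
print(sorted(ids, key=slurm_job_key, reverse=True))  # ['1000', '100', '99']
```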

Comment on lines +79 to +82
```python
actions = compute_diff(desired, live_jobs)
if not actions:
    click.echo("\nNo changes needed. Fleet is at desired state.")
    return
```

Copilot AI Apr 9, 2026


apply() returns early when actions is empty, but any SSH connections created in the earlier loop are never closed. This can leak SSH sessions and file descriptors in long-running uses. Ensure connections are closed on all return paths (e.g., via try/finally around the whole function or closing before returning).

Collaborator Author


@copilot apply changes based on this feedback

Comment on lines +88 to +91
```python
click.echo(f" - cancel job {a.job_id} on {a.cluster}")
if dry_run:
    click.echo("\n(dry run - no changes made)")
    return
```

Copilot AI Apr 9, 2026


In the --dry-run path, apply() returns without closing any of the per-cluster ClusterConnections opened earlier. Please close connections before returning (or use a finally block) so dry-runs don’t leak SSH sessions.
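One way to cover both early returns (no changes, and `--dry-run`) at once is to register each connection on a `contextlib.ExitStack` as soon as it is opened. This is a sketch that assumes only that `ClusterConnection` exposes `close()`; `connect` and `plan` are hypothetical callables standing in for the PR's connection setup and its `compute_diff()` call, and `print` stands in for `click.echo`.

```python
from contextlib import ExitStack, closing

def apply(clusters, connect, plan, dry_run=False):
    """Sketch: every connection opened here is closed on every exit path."""
    with ExitStack() as stack:
        conns = {}
        for name in clusters:
            # closing() calls conn.close() when the ExitStack unwinds, including
            # when a later connect() raises partway through the loop.
            conns[name] = stack.enter_context(closing(connect(name)))
        actions = plan(conns)
        if not actions:
            print("\nNo changes needed. Fleet is at desired state.")
            return []
        if dry_run:
            print("\n(dry run - no changes made)")
            return actions
        # ... execute each action using conns[action.cluster] here ...
        return actions


class DemoConnection:
    """Hypothetical stand-in for ClusterConnection that records close()."""
    def __init__(self, name):
        self.name = name
        self.closed = False

    def close(self):
        self.closed = True


opened = []

def demo_connect(name):
    conn = DemoConnection(name)
    opened.append(conn)
    return conn

apply(["jsc", "euler"], demo_connect, plan=lambda conns: [], dry_run=True)
print([c.closed for c in opened])  # [True, True]
```

`ExitStack` also keeps unwinding if one `close()` raises, so a single broken SSH session does not prevent the others from being closed.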

Collaborator Author


@copilot apply changes based on this feedback

Comment on lines +20 to +23
```python
pattern = f"{cfg.binary_remote_path}.*relay.cfg.yaml"
out, _, code = conn.run(f'pgrep -f "{pattern}"', target="relay")
if code != 0:
    return RelayStatus.STOPPED
```

Copilot AI Apr 9, 2026


pgrep/pkill -f patterns are built from cfg.binary_remote_path and then wrapped in double quotes. If binary_remote_path contains ~ (as shown throughout the docs), tilde expansion will not occur inside quotes, so the pattern won’t match the actual running process (which uses an expanded absolute path). This can cause relay_status() to incorrectly report STOPPED and relay_start() to spawn duplicate relays. Consider expanding ~ to $HOME on the remote side before constructing the pattern, or match on a more stable substring (e.g., start --config .*relay.cfg.yaml).
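A minimal sketch of the first suggestion: rewrite a leading `~` to `$HOME` before building the pattern, because the remote shell still expands `$HOME` inside double quotes. `relay_pgrep_command` is a hypothetical helper; `~user/...` forms are deliberately left untouched, and the sketch assumes the path contains no double quotes, backticks, backslashes, or `$` characters of its own.

```python
def relay_pgrep_command(binary_remote_path: str) -> str:
    """Build a pgrep command whose pattern survives double-quoting.

    A leading "~" is not expanded inside double quotes, but "$HOME" is,
    so "~" and "~/..." are rewritten before the pattern is quoted.
    """
    path = binary_remote_path
    if path == "~":
        path = "$HOME"
    elif path.startswith("~/"):
        path = "$HOME/" + path[2:]
    pattern = f"{path}.*relay.cfg.yaml"
    return f'pgrep -f "{pattern}"'

print(relay_pgrep_command("~/opentela/otela"))
# pgrep -f "$HOME/opentela/otela.*relay.cfg.yaml"
```

The same rewrite would apply to the `pkill -f` pattern used when stopping the relay, so that status and stop agree on which process they are looking at.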

Collaborator Author


@copilot apply changes based on this feedback

Comment on lines +66 to +71
```markdown
### Job Identity

Each deployment is identified by a hash of `backend + cmd + preset`. This means:
- Changing the command triggers a redeploy (new hash)
- Changing the preset triggers a redeploy (new hash)
- Changing only the replica count scales up/down without redeploying
```

Copilot AI Apr 9, 2026


The docs say the job identity hash is computed from backend + cmd + preset, but job_identity() currently hashes only cmd + preset_name (the backend is only in the prefix). Either include backend in the hash computation or adjust this documentation so users don’t assume hash changes when switching backends.
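If the code is changed to match the docs, a hypothetical `job_identity()` that folds the backend into the hash could look like the following. The `otela-<backend>-<digest>` format and the 10-character SHA-256 prefix are assumptions for illustration, not the PR's actual naming scheme.

```python
import hashlib

def job_identity(backend: str, cmd: str, preset: str, prefix: str = "otela") -> str:
    """Hypothetical job name: readable backend tag plus a hash of all three inputs."""
    # NUL separators keep e.g. ("a", "bc") and ("ab", "c") from hashing the same.
    key = "\0".join((backend, cmd, preset))
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()[:10]
    return f"{prefix}-{backend}-{digest}"

a = job_identity("vllm", "vllm serve my-model", "gpu-4")
b = job_identity("sglang", "vllm serve my-model", "gpu-4")
# Not only the readable tag differs: the hash suffix changes with the backend too.
print(a.rsplit("-", 1)[1] != b.rsplit("-", 1)[1])  # True
```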

Comment on lines +88 to +90
````markdown
```

To remove a deployment, set `replicas: 0` or remove the entry and re-apply.
````

Copilot AI Apr 9, 2026


This doc claims that removing a deployment entry and re-applying will remove the deployment, but the current reconciliation logic only considers deployments present in the fleet file (and won’t cancel jobs for removed entries). Either update the reconcile logic to cancel undesired jobs, or revise this statement to require replicas: 0 to remove a deployment.

Comment on lines +16 to +31
```toml
[project.optional-dependencies]
dev = ["pytest>=7.0"]

[project.scripts]
otela-fleet = "fleet_manager.cli:cli"

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
fleet_manager = ["templates/*.j2"]

[dependency-groups]
dev = [
    "pytest>=9.0.2",
]
```

Copilot AI Apr 9, 2026


The dev dependency for pytest is specified twice with different minimum versions ([project.optional-dependencies].dev uses pytest>=7.0, while [dependency-groups].dev uses pytest>=9.0.2). This can lead to inconsistent environments depending on installer/tooling. Consider using a single mechanism (and a single version constraint) for dev dependencies.

Comment on lines +1 to +10
```text
pyproject.toml
src/fleet_manager/__init__.py
src/fleet_manager/templates/__init__.py
src/otela_fleet.egg-info/PKG-INFO
src/otela_fleet.egg-info/SOURCES.txt
src/otela_fleet.egg-info/dependency_links.txt
src/otela_fleet.egg-info/entry_points.txt
src/otela_fleet.egg-info/requires.txt
src/otela_fleet.egg-info/top_level.txt
tests/test_cluster.py
```
(No newline at end of file)

Copilot AI Apr 9, 2026


The src/otela_fleet.egg-info/* files (e.g., SOURCES.txt, PKG-INFO, etc.) are build artifacts typically generated by setuptools during packaging. Committing them tends to create noisy diffs and can go stale relative to pyproject.toml. Consider removing the .egg-info directory from version control and adding it to .gitignore, letting the build backend regenerate it as needed.

Comment on lines +76 to +92
```python
try:
    cfg = load_cluster(name, cluster_dir)
    conn = ClusterConnection(cfg)
    click.echo(f"\n=== {name} ===")
    rs = relay_status(conn, cfg)
    click.echo(f" Relay: {rs.value}")
    jobs = worker_list(conn, target="slurm")
    if jobs:
        click.echo(f" Jobs ({len(jobs)}):")
        for j in jobs:
            click.echo(f" {j.id} {j.name} {j.state} {j.elapsed}/{j.time_limit} {j.node}")
    else:
        click.echo(" Jobs: none")
    conn.close()
except Exception as e:
    click.echo(f"\n=== {name} ===")
    click.echo(f" ERROR: {e}")
```

Copilot AI Apr 9, 2026


In status(), conn.close() is only called at the end of the try block. If an exception occurs after establishing the SSH connection (e.g., during relay_status() / worker_list()), the except path doesn’t close the connection, potentially leaking SSH sessions when iterating multiple clusters. Wrap the per-cluster body in try/finally (or use a context manager) to always close the connection.
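A sketch of the per-cluster pattern this comment asks for, using `contextlib.closing` so `close()` runs whether the relay and job queries succeed or raise. `connect` and `describe` are hypothetical callables standing in for `ClusterConnection(cfg)` and the `relay_status()`/`worker_list()` calls; as a side effect, the `=== name ===` header is emitted once instead of in both branches.

```python
from contextlib import closing

def status_all(names, connect, describe):
    """Collect status lines per cluster; each opened connection is always closed."""
    lines = []
    for name in names:
        lines.append(f"=== {name} ===")
        try:
            # If connect() itself raises there is nothing to close; otherwise
            # closing() calls conn.close() on both normal and error exits.
            with closing(connect(name)) as conn:
                lines.extend(describe(conn))
        except Exception as e:
            lines.append(f"  ERROR: {e}")
    return lines


class DemoConn:
    """Hypothetical connection stand-in that records whether close() ran."""
    def __init__(self, name):
        self.name, self.closed = name, False

    def close(self):
        self.closed = True


conns = []

def demo_connect(name):
    conns.append(DemoConn(name))
    return conns[-1]

def demo_describe(conn):
    if conn.name == "euler":
        raise RuntimeError("squeue timed out")  # simulated failure mid-query
    return ["  Relay: running", "  Jobs: none"]

for line in status_all(["jsc", "euler"], demo_connect, demo_describe):
    print(line)
print(all(c.closed for c in conns))  # True
```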

xzyaoi and others added 5 commits April 9, 2026 17:52
…and enhance logging

- Updated GitHub Actions workflow to use new Dockerfile for amd64 architecture and set build metadata.
- Added new Dockerfile for building signed OpenTela binaries.
- Removed obsolete deployment scripts and configuration files for Clariden and Euler.
- Updated Dockerfile to streamline the build process and ensure proper entry points.
- Enhanced logging in the protocol to provide more detailed debug information.
- Adjusted public address handling to support DNS hostnames in the network configuration.
Add ProxyChains dataclass to cluster.py with enabled/ssh_key/proxy_target/
socks_port/skip_partitions fields. When enabled and the preset partition is
not in skip_partitions, apptainer_single/multi templates open an SSH SOCKS
tunnel from inside the SLURM allocation and export HTTPS_PROXY/HTTP_PROXY/
ALL_PROXY so downstream commands (apptainer pull, HF hub, user code) have
internet access via the tunnel.

Populate contrib/fleet_manager/clusters/ with jsc, jsc-dev, euler, clariden
migrated to the preset-based schema. JSC enables proxychains by default with
develbooster and dc-gpu-devel listed in skip_partitions.
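Read together, the fields and the skip rule described in this commit amount to roughly the following. This is a hypothetical sketch: the field names come from the commit message, but the types, defaults, the `applies_to`/`proxy_env` helpers, the `socks5h://127.0.0.1` proxy URL, and the `gpu` partition name are assumptions, not the PR's code.

```python
from dataclasses import dataclass, field

@dataclass
class ProxyChains:
    # Field names follow the commit message; types and defaults are assumed.
    enabled: bool = False
    ssh_key: str = ""
    proxy_target: str = ""
    socks_port: int = 1080
    skip_partitions: list[str] = field(default_factory=list)

    def applies_to(self, partition: str) -> bool:
        """Open the tunnel only when enabled and the partition is not exempt."""
        return self.enabled and partition not in self.skip_partitions

    def proxy_env(self) -> dict[str, str]:
        """Proxy variables pointing at the local end of the SOCKS tunnel."""
        url = f"socks5h://127.0.0.1:{self.socks_port}"
        return {"HTTPS_PROXY": url, "HTTP_PROXY": url, "ALL_PROXY": url}

jsc = ProxyChains(enabled=True, skip_partitions=["develbooster", "dc-gpu-devel"])
print(jsc.applies_to("gpu"))           # True
print(jsc.applies_to("develbooster"))  # False
```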

Update README to document presets, user-command env vars ($SERVICE_PORT,
$HF_HOME), multi-node conventions, and proxychains. Delete stale docs/
files that described the old model-centric layout.

52/52 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@xzyaoi xzyaoi merged commit 696bd0b into main Apr 12, 2026
2 checks passed
@xzyaoi xzyaoi deleted the feat/fleet-manager branch April 12, 2026 19:16
@xzyaoi xzyaoi restored the feat/fleet-manager branch April 26, 2026 14:51
@xzyaoi xzyaoi deleted the feat/fleet-manager branch April 26, 2026 14:52

Labels

None yet

Projects

None yet


3 participants