Implement otela-fleet project with CLI, templates, and deployment#54
Conversation
- Fix double #SBATCH prefix in extra_sbatch template rendering
- Use unique job script filenames to avoid race condition with replicas
- Add job_identity unit test
- Update template test to match extra_sbatch convention

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
- Add README.md and docs/ (getting-started, cluster-config, fleet-apply)
- Add `otela-fleet clusters` and `otela-fleet presets <cluster>` commands
- Default config dir: ./clusters or ~/.config/opentela/fleet/clusters/

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Clariden now has a persistent relay at 148.187.108.172, so relay startup is skipped and workers bootstrap directly from it.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pull request overview
This PR introduces a new otela-fleet “fleet manager” subproject under contrib/fleet_manager to deploy and manage OpenTela worker + LLM backend jobs on SLURM clusters (apptainer/enroot), including relay management and a declarative “apply” workflow.
Changes:
- Adds a Click-based `otela-fleet` CLI for starting/stopping jobs, status/logs, and applying a fleet YAML.
- Implements cluster config loading + SSH connection helpers, relay lifecycle management, job submission/cancel/list, and Jinja2 templates for SLURM + OpenTela configs.
- Adds initial pytest coverage for cluster parsing, deploy/apply logic, templates, relay, and connection behaviors.
Reviewed changes
Copilot reviewed 31 out of 35 changed files in this pull request and generated 12 comments.
| File | Description |
|---|---|
| contrib/fleet_manager/tests/__init__.py | Test package marker. |
| contrib/fleet_manager/tests/test_apply.py | Tests parsing fleet file and diff computation. |
| contrib/fleet_manager/tests/test_cluster.py | Tests cluster config loading, presets, and job identity. |
| contrib/fleet_manager/tests/test_connection.py | Tests SSH connection behavior/host selection. |
| contrib/fleet_manager/tests/test_deploy.py | Tests deploy orchestration and binary sync behavior. |
| contrib/fleet_manager/tests/test_relay.py | Tests relay status/ensure logic. |
| contrib/fleet_manager/tests/test_templates.py | Tests template rendering for job scripts and configs. |
| contrib/fleet_manager/tests/test_worker.py | Tests worker job listing/cancel and exec prefix/template selection. |
| contrib/fleet_manager/src/fleet_manager/__init__.py | Package marker. |
| contrib/fleet_manager/src/fleet_manager/apply.py | Fleet reconciliation (desired vs live) and apply execution. |
| contrib/fleet_manager/src/fleet_manager/cli.py | Click CLI entrypoint and commands. |
| contrib/fleet_manager/src/fleet_manager/cluster.py | Cluster config schema/loading + SSH connection abstraction. |
| contrib/fleet_manager/src/fleet_manager/deploy.py | Deployment flow (sync binary, render configs, submit jobs). |
| contrib/fleet_manager/src/fleet_manager/relay.py | Relay process status/start/stop/ensure. |
| contrib/fleet_manager/src/fleet_manager/worker.py | Worker job list/submit/cancel/logs + runtime template selection. |
| contrib/fleet_manager/src/fleet_manager/templates/__init__.py | Jinja2 rendering helper. |
| contrib/fleet_manager/src/fleet_manager/templates/apptainer_single.sh.j2 | Single-node apptainer SLURM job script template. |
| contrib/fleet_manager/src/fleet_manager/templates/apptainer_multi.sh.j2 | Multi-node apptainer SLURM job script template. |
| contrib/fleet_manager/src/fleet_manager/templates/enroot_single.sh.j2 | Single-node enroot SLURM job script template. |
| contrib/fleet_manager/src/fleet_manager/templates/clariden_sglang.toml.j2 | Enroot EDF template for Clariden. |
| contrib/fleet_manager/src/fleet_manager/templates/relay.cfg.yaml.j2 | Relay OpenTela config template. |
| contrib/fleet_manager/src/fleet_manager/templates/worker.cfg.yaml.j2 | Worker OpenTela config template. |
| contrib/fleet_manager/src/otela_fleet.egg-info/PKG-INFO | Generated packaging metadata artifact. |
| contrib/fleet_manager/src/otela_fleet.egg-info/SOURCES.txt | Generated packaging sources list artifact. |
| contrib/fleet_manager/src/otela_fleet.egg-info/dependency_links.txt | Generated packaging artifact. |
| contrib/fleet_manager/src/otela_fleet.egg-info/entry_points.txt | Generated packaging artifact (console_script). |
| contrib/fleet_manager/src/otela_fleet.egg-info/requires.txt | Generated packaging artifact (deps). |
| contrib/fleet_manager/src/otela_fleet.egg-info/top_level.txt | Generated packaging artifact. |
| contrib/fleet_manager/pyproject.toml | Packaging configuration, deps, and console script entry point. |
| contrib/fleet_manager/README.md | Project-level usage and links to docs. |
| contrib/fleet_manager/docs/getting-started.md | Installation + quickstart documentation. |
| contrib/fleet_manager/docs/cluster-config.md | Cluster YAML schema documentation and examples. |
| contrib/fleet_manager/docs/fleet-apply.md | Declarative apply documentation and behavior description. |
| contrib/fleet_manager/clusters/.gitkeep | Keeps empty clusters dir in repo. |
```python
for cluster, backend, cmd, preset, replicas in desired:
    job_name = job_identity(backend, cmd, preset)
    cluster_jobs = live_jobs.get(cluster, [])
```
compute_diff() only iterates over the desired deployments, so jobs that exist in live_jobs but are removed from the fleet file will never be cancelled. This contradicts the documented behavior (“remove the entry and re-apply”) and will leave orphaned jobs running. Consider also generating cancel actions for any live OpenTela jobs whose (backend, cmd, preset) identity is not present in the desired set for that cluster (or update the docs to require replicas: 0).
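The suggested orphan-cancellation pass can be sketched as a second loop over `live_jobs`. The `Job` dataclass, the `job_identity()` stand-in, and the plain action tuples below are illustrative assumptions, not the project's actual types:

```python
# Hypothetical sketch of cancelling orphaned jobs in compute_diff().
# Job, job_identity(), and the action tuples are illustrative stand-ins.
from dataclasses import dataclass

@dataclass
class Job:
    id: str
    name: str

def job_identity(backend, cmd, preset):
    # stand-in for the real hash-based identity
    return f"otela-{backend}-{hash((cmd, preset)) & 0xFFFF:04x}"

def compute_diff(desired, live_jobs):
    actions = []
    desired_names = {}
    # First pass: record every identity the fleet file still wants.
    for cluster, backend, cmd, preset, replicas in desired:
        desired_names.setdefault(cluster, set()).add(job_identity(backend, cmd, preset))
        # ... existing scale-up / scale-down logic goes here ...
    # Second pass: cancel any live job whose identity is no longer desired.
    for cluster, jobs in live_jobs.items():
        wanted = desired_names.get(cluster, set())
        for job in jobs:
            if job.name not in wanted:
                actions.append(("cancel", cluster, job.id))
    return actions
```

With this shape, removing an entry from the fleet file produces cancel actions for its jobs on the next apply, matching the documented "remove the entry and re-apply" behavior.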
```python
elif current > replicas:
    excess = sorted(matching, key=lambda j: j.id, reverse=True)[: current - replicas]
    for job in excess:
        actions.append(Action(action="cancel", cluster=cluster, backend=backend, cmd=cmd, preset=preset, job_id=job.id))
```
When scaling down, excess is chosen by sorting job.id as a string. SLURM job IDs are numeric, so lexicographic ordering can cancel the wrong “newest” job (e.g., "99" > "100"). Sort by int(job.id) (with a safe fallback) to reliably cancel the newest jobs first.
@copilot apply changes based on this feedback
Fixed in ef00066. Changed the sort key from `lambda j: j.id` to `lambda j: int(j.id) if j.id and str(j.id).isdigit() else 0` so SLURM job IDs are compared numerically (e.g., 100 > 99) rather than lexicographically (where "99" > "100"). Also added a test `test_compute_diff_cancel_numeric_order` to cover this exact edge case.
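Extracted as a standalone key function, the numeric-sort fix reads roughly like this (the fallback-to-zero behavior mirrors the commit description; the function name is an assumption):

```python
# Sort SLURM job IDs numerically so "100" ranks above "99";
# non-numeric or missing IDs fall back to 0.
def job_sort_key(job_id):
    s = str(job_id) if job_id is not None else ""
    return int(s) if s.isdigit() else 0

job_ids = ["99", "100", "7"]
newest_first = sorted(job_ids, key=job_sort_key, reverse=True)
# a plain descending string sort would instead yield ["99", "7", "100"]
```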
```python
actions = compute_diff(desired, live_jobs)
if not actions:
    click.echo("\nNo changes needed. Fleet is at desired state.")
    return
```
apply() returns early when actions is empty, but any SSH connections created in the earlier loop are never closed. This can leak SSH sessions and file descriptors in long-running uses. Ensure connections are closed on all return paths (e.g., via try/finally around the whole function or closing before returning).
@copilot apply changes based on this feedback
```python
    click.echo(f" - cancel job {a.job_id} on {a.cluster}")
if dry_run:
    click.echo("\n(dry run - no changes made)")
    return
```
In the --dry-run path, apply() returns without closing any of the per-cluster ClusterConnections opened earlier. Please close connections before returning (or use a finally block) so dry-runs don’t leak SSH sessions.
@copilot apply changes based on this feedback
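Both leaks (the no-op return and the dry-run return) have the same fix: open connections into a collection and close them in a `finally` block so every return path cleans up. A minimal sketch, where `FakeConn` stands in for the project's `ClusterConnection` and the return values are placeholders:

```python
# Sketch of wrapping apply() in try/finally so SSH connections are
# closed on every return path. FakeConn stands in for ClusterConnection.
class FakeConn:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

def apply(conns, actions, dry_run=False):
    opened = []
    try:
        for conn in conns:            # stands in for opening per-cluster SSH
            opened.append(conn)
        if not actions:
            return "no-op"            # early return: finally still runs
        if dry_run:
            return "dry-run"          # early return: finally still runs
        return "applied"
    finally:
        for conn in opened:
            try:
                conn.close()
            except Exception:
                pass                  # best-effort cleanup
```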
```python
pattern = f"{cfg.binary_remote_path}.*relay.cfg.yaml"
out, _, code = conn.run(f'pgrep -f "{pattern}"', target="relay")
if code != 0:
    return RelayStatus.STOPPED
```
pgrep/pkill -f patterns are built from cfg.binary_remote_path and then wrapped in double quotes. If binary_remote_path contains ~ (as shown throughout the docs), tilde expansion will not occur inside quotes, so the pattern won’t match the actual running process (which uses an expanded absolute path). This can cause relay_status() to incorrectly report STOPPED and relay_start() to spawn duplicate relays. Consider expanding ~ to $HOME on the remote side before constructing the pattern, or match on a more stable substring (e.g., start --config .*relay.cfg.yaml).
@copilot apply changes based on this feedback
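The second workaround from the review (matching on a stable command-line substring rather than the configured binary path) sidesteps tilde expansion entirely. A hypothetical helper, not the project's code:

```python
# Build a pgrep invocation that matches the relay by its stable
# 'start --config ...relay.cfg.yaml' suffix instead of the configured
# binary path, so it matches whether or not '~' was expanded at launch.
def relay_pgrep_command(relay_cfg="relay.cfg.yaml"):
    pattern = f"start --config .*{relay_cfg}"
    return f'pgrep -f "{pattern}"'
```

Since the suffix never contains `~`, the same pattern works for both `pgrep` and the corresponding `pkill` in `relay_stop()`.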
```markdown
### Job Identity

Each deployment is identified by a hash of `backend + cmd + preset`. This means:

- Changing the command triggers a redeploy (new hash)
- Changing the preset triggers a redeploy (new hash)
- Changing only the replica count scales up/down without redeploying
```
The docs say the job identity hash is computed from backend + cmd + preset, but job_identity() currently hashes only cmd + preset_name (the backend is only in the prefix). Either include backend in the hash computation or adjust this documentation so users don’t assume hash changes when switching backends.
```markdown
To remove a deployment, set `replicas: 0` or remove the entry and re-apply.
```
This doc claims that removing a deployment entry and re-applying will remove the deployment, but the current reconciliation logic only considers deployments present in the fleet file (and won’t cancel jobs for removed entries). Either update the reconcile logic to cancel undesired jobs, or revise this statement to require replicas: 0 to remove a deployment.
```toml
[project.optional-dependencies]
dev = ["pytest>=7.0"]

[project.scripts]
otela-fleet = "fleet_manager.cli:cli"

[tool.setuptools.packages.find]
where = ["src"]

[tool.setuptools.package-data]
fleet_manager = ["templates/*.j2"]

[dependency-groups]
dev = [
    "pytest>=9.0.2",
]
```
The dev dependency for pytest is specified twice with different minimum versions ([project.optional-dependencies].dev uses pytest>=7.0, while [dependency-groups].dev uses pytest>=9.0.2). This can lead to inconsistent environments depending on installer/tooling. Consider using a single mechanism (and a single version constraint) for dev dependencies.
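One possible consolidation, assuming the project standardizes on PEP 735 dependency groups (the version pin shown is the one already in the file, kept only for illustration):

```toml
# Single source of truth for dev dependencies; the duplicate
# [project.optional-dependencies].dev table would be removed.
[dependency-groups]
dev = ["pytest>=9.0.2"]
```

Installers that only understand extras (`pip install -e ".[dev]"`) would instead need the `[project.optional-dependencies]` form, so the choice depends on the tooling the project targets; either way, only one table should carry the constraint.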
```
pyproject.toml
src/fleet_manager/__init__.py
src/fleet_manager/templates/__init__.py
src/otela_fleet.egg-info/PKG-INFO
src/otela_fleet.egg-info/SOURCES.txt
src/otela_fleet.egg-info/dependency_links.txt
src/otela_fleet.egg-info/entry_points.txt
src/otela_fleet.egg-info/requires.txt
src/otela_fleet.egg-info/top_level.txt
tests/test_cluster.py
```
The src/otela_fleet.egg-info/* files (e.g., SOURCES.txt, PKG-INFO, etc.) are build artifacts typically generated by setuptools during packaging. Committing them tends to create noisy diffs and can go stale relative to pyproject.toml. Consider removing the .egg-info directory from version control and adding it to .gitignore, letting the build backend regenerate it as needed.
```python
try:
    cfg = load_cluster(name, cluster_dir)
    conn = ClusterConnection(cfg)
    click.echo(f"\n=== {name} ===")
    rs = relay_status(conn, cfg)
    click.echo(f"  Relay: {rs.value}")
    jobs = worker_list(conn, target="slurm")
    if jobs:
        click.echo(f"  Jobs ({len(jobs)}):")
        for j in jobs:
            click.echo(f"    {j.id}  {j.name}  {j.state}  {j.elapsed}/{j.time_limit}  {j.node}")
    else:
        click.echo("  Jobs: none")
    conn.close()
except Exception as e:
    click.echo(f"\n=== {name} ===")
    click.echo(f"  ERROR: {e}")
```
In status(), conn.close() is only called at the end of the try block. If an exception occurs after establishing the SSH connection (e.g., during relay_status() / worker_list()), the except path doesn’t close the connection, potentially leaking SSH sessions when iterating multiple clusters. Wrap the per-cluster body in try/finally (or use a context manager) to always close the connection.
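`contextlib.closing` is one compact way to get the per-cluster guarantee. The sketch below uses a `FakeConn` stand-in and a generic `probe` callable in place of `relay_status()`/`worker_list()`; these names are assumptions for illustration:

```python
# contextlib.closing guarantees conn.close() runs per cluster even
# when a probe (e.g. relay_status()/worker_list()) raises.
import contextlib

class FakeConn:
    def __init__(self):
        self.closed = False
    def close(self):
        self.closed = True

def check_cluster(conn, probe):
    with contextlib.closing(conn):
        return probe(conn)

def failing_probe(conn):
    raise RuntimeError("ssh error")

conn = FakeConn()
try:
    check_cluster(conn, failing_probe)
except RuntimeError:
    pass
# conn.closed is now True despite the exception
```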
Agent-Logs-Url: https://github.com/eth-easl/OpenTela/sessions/12e7506b-7a54-4197-80a3-da0be219e533
Co-authored-by: xzyaoi <10499297+xzyaoi@users.noreply.github.com>
…and enhance logging

- Updated GitHub Actions workflow to use the new Dockerfile for amd64 architecture and set build metadata.
- Added new Dockerfile for building signed OpenTela binaries.
- Removed obsolete deployment scripts and configuration files for Clariden and Euler.
- Updated Dockerfile to streamline the build process and ensure proper entry points.
- Enhanced logging in the protocol to provide more detailed debug information.
- Adjusted public address handling to support DNS hostnames in the network configuration.
Add ProxyChains dataclass to cluster.py with enabled/ssh_key/proxy_target/socks_port/skip_partitions fields. When enabled and the preset partition is not in skip_partitions, the apptainer_single/multi templates open an SSH SOCKS tunnel from inside the SLURM allocation and export HTTPS_PROXY/HTTP_PROXY/ALL_PROXY so downstream commands (apptainer pull, HF hub, user code) have internet access via the tunnel.

Populate contrib/fleet_manager/clusters/ with jsc, jsc-dev, euler, and clariden migrated to the preset-based schema. JSC enables proxychains by default, with develbooster and dc-gpu-devel listed in skip_partitions.

Update the README to document presets, user-command env vars ($SERVICE_PORT, $HF_HOME), multi-node conventions, and proxychains. Delete stale docs/ files that described the old model-centric layout. 52/52 tests pass.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
No description provided.