feat(CubeMaster): add node label management API by devincd · Pull Request #633 · TencentCloud/CubeSandbox

devincd · 2026-06-24T07:31:36Z

feat(CubeMaster): add node label management API

Summary

Add two internal CubeMaster endpoints for administrators to add and remove custom labels on nodes:

POST /internal/meta/nodes/{node_id}/labels — merge new labels into a node
DELETE /internal/meta/nodes/{node_id}/labels?key=example.com%2Fenv — remove a label from a node

Motivation

In production deployments, cluster administrators often need to attach metadata to nodes that cubelet does not know about. Today the only way to influence scheduling is through canonical affinity keys (kubernetes.io/cpu-type, topology.kubernetes.io/zone, etc.) that cubelet derives from hardware. This covers infrastructure attributes but not operational or business attributes.

Scenario	Example Label	Purpose
Environment segregation	`env=staging` or `env=production`	Prevent staging workloads from landing on production nodes
Maintenance mode	`maintenance=planned`	Taint nodes earmarked for downtime so no new sandboxes are scheduled
Tenant / team isolation	`team=ml-platform`	Pin a team's workloads to a subset of nodes via node affinity
Custom topology	`rack=rack-3`	Express physical topology that cubelet cannot detect
Gradual rollout	`canary=true`	Limit a canary deployment to specific nodes

Multi-Master Data Synchronization

CubeSandbox supports multiple CubeMaster replicas for HA. There is no direct inter-master communication — replicas converge through shared backing stores:

Data Type	Store	Cross-Replica Mechanism	Latency
Node labels, registration, status	MySQL	`syncAllFromDB` periodic reload (default 30s)	≤ 30s
Resource metrics (CPU, memory, disk)	Redis HSET	`loopUpdateMetric` tick (default 1s)	~1s

Write path: The receiving replica wraps the read-merge-write in a DB transaction with SELECT ... FOR UPDATE to prevent concurrent label updates from silently overwriting each other. The in-memory snapshot and localcache are updated immediately so the scheduler on this replica sees the change right away. Other replicas converge within the next syncAllFromDB cycle (≤ 30s).

Cubelet interaction: When cubelet sends RegisterNode, it merges its reported labels into the existing labels_json with cubelet priority on conflicts. Admin-set labels that cubelet does not report are preserved across heartbeats; if cubelet later reports the same key, it overwrites the admin value — system-critical labels always reflect the node's actual hardware state.

Label Validation

Label key/value validation follows Kubernetes conventions (IsQualifiedName / IsValidLabelValue).

Key format: [prefix/]name

Component	Rules	Max Length	Example
`prefix`	Optional. DNS1123 subdomain: lowercase alphanumeric, `-` or `.`	253	`example.com`, `my-org.io`
`name`	Required. Qualified name: alphanumeric, `-`, `_`, or `.`; must start and end with alphanumeric	63	`env`, `my-label`, `rack_3`

Value format: empty string or qualified name (same rules as name, max 63 chars).

Note: Labels from cubelet RegisterNode are not validated — they are trusted system input. Only labels set via the admin API are subject to the constraints above.

Reserved Label Keys

Labels whose key namespace (prefix before /) matches a reserved namespace — or is a subdomain of one — cannot be created or deleted via the admin API.

Reserved Namespace	Description	Example Blocked Keys
`kubernetes.io`	Standard K8s labels, includes all subdomains	`kubernetes.io/os`, `topology.kubernetes.io/zone`, `node.kubernetes.io/instance-type`
`beta.kubernetes.io`	Legacy beta K8s labels, includes all subdomains	`beta.kubernetes.io/arch`, `failure-domain.beta.kubernetes.io/region`
`cube.cloud.tencentcloud.com`	Cube-specific labels	`cube.cloud.tencentcloud.com/instance-type`

Blocked:

# Rejected — kubernetes.io namespace is reserved
POST /internal/meta/nodes/node-1/labels
{"labels": {"kubernetes.io/os": "windows"}}

# Rejected — cannot delete a reserved label
DELETE /internal/meta/nodes/node-1/labels?key=kubernetes.io%2Fos

Allowed:

# Accepted — non-reserved key
POST /internal/meta/nodes/node-1/labels
{"labels": {"env": "production"}}

# Accepted — custom domain prefix
POST /internal/meta/nodes/node-1/labels
{"labels": {"example.com/team": "ml-platform"}}

# Accepted — delete a custom label
DELETE /internal/meta/nodes/node-1/labels?key=env

cubesandboxbot · 2026-06-24T07:37:45Z

+	existing := global.readLabelsJSON(nodeID)
+	for k, v := range labels {
+		existing[k] = v
+	}
+	if err := global.db.Table(constants.NodeMetaRegistrationTable).
+		Where("node_id = ?", nodeID).
+		Updates(map[string]interface{}{
+			"labels_json": mustJSON(existing),
+			"updated_at":  time.Now(),
+		}).Error; err != nil {
+		return err


Critical: Read-modify-write race on labels_json

UpdateNodeLabels reads labels from DB, merges in Go memory, then writes back — but there's no DB-level locking or optimistic concurrency. If two requests arrive concurrently for the same node, the second read sees stale data before the first write commits, and one update is silently lost.

Example: Request A adds env=prod, Request B adds tier=web. Both read {}, merge, write. A's write {"env":"prod"} is overwritten by B's write {"tier":"web"}.

Fix: Use an optimistic conditional update (add a WHERE labels_json = ? check with retry) or wrap the read-merge-write in a DB transaction with row-level locking (SELECT ... FOR UPDATE).

cubesandboxbot · 2026-06-24T07:37:46Z

+	existing := global.readLabelsJSON(nodeID)
+	delete(existing, key)
+	if err := global.db.Table(constants.NodeMetaRegistrationTable).
+		Where("node_id = ?", nodeID).
+		Updates(map[string]interface{}{
+			"labels_json": mustJSON(existing),
+			"updated_at":  time.Now(),
+		}).Error; err != nil {
+		return err
+	}
+
+	snap := global.ensureNode(nodeID)
+	global.mu.Lock()
+	snap.Labels = existing
+	global.mu.Unlock()
+	syncLocalcache(snap)


Critical: Same read-modify-write race as UpdateNodeLabels

DeleteNodeLabel has the identical TOCTOU issue — reads labels, deletes a key, writes back — without DB-level protection against concurrent writes.

cubesandboxbot · 2026-06-24T07:37:47Z

+func (s *service) readLabelsJSON(nodeID string) map[string]string {
+	var raw string
+	if err := s.db.Table(constants.NodeMetaRegistrationTable).
+		Select("labels_json").
+		Where("node_id = ?", nodeID).
+		Scan(&raw).Error; err != nil {
+		return map[string]string{}
+	}
+	m := map[string]string{}
+	_ = json.Unmarshal([]byte(raw), &m)
+	return m
+}


Critical: Data loss on DB read failure

If the DB read fails (connection timeout, transient error), this returns an empty map. The callers (UpdateNodeLabels, DeleteNodeLabel, RegisterNode) then write that empty map back to the database, permanently wiping all existing labels for that node.

The comment says "returns empty map if row is missing" — but it also returns empty map on DB error, which is not the documented contract and is destructive.

Fix: Either propagate the error upward ((map[string]string, error) return signature), or at minimum log the error and return a sentinel that callers check before proceeding with the write.

cubesandboxbot · 2026-06-24T07:37:48Z

+
+	snap := global.ensureNode(nodeID)
+	global.mu.Lock()
+	snap.Labels = existing


Should use cloneStringMap for consistency and safety

All other write paths (RegisterNode L241, UpdateNodeLabels L565) use cloneStringMap(existing) to give the snapshot its own copy of the map. This line assigns the map reference directly, meaning the in-memory snapshot shares the same map as the DB-read value. A concurrent read of snap.Labels (e.g., via ListNodes holding only an RLock) could observe intermediate state during a future mutation.

Fix: snap.Labels = cloneStringMap(existing)

cubesandboxbot · 2026-06-24T07:37:49Z

+	existing := global.readLabelsJSON(req.NodeID)
+	for k, v := range req.Labels {
+		existing[k] = v
+	}
+	if err := global.db.Table(constants.NodeMetaRegistrationTable).
+		Where("node_id = ?", req.NodeID).
+		Update("labels_json", mustJSON(existing)).Error; err != nil {
+		return nil, err
+	}


Cubelet bypasses reserved label check

Cubelet-supplied labels (req.Labels) are merged into the node here without any validation — no call to validateNodeLabels or IsReservedLabelKey. This means a compromised or misconfigured cubelet can set labels in reserved namespaces (e.g., kubernetes.io/os, topology.kubernetes.io/zone) that the admin API explicitly blocks.

By contrast, UpdateNodeLabels calls validateNodeLabels before writing. The two paths should apply the same validation.

Fix: Call validateNodeLabels(req.Labels) before the merge loop, or at minimum strip/block reserved-namespace keys.

cubesandboxbot · 2026-06-24T07:37:50Z

+		return
+	}
+			if err := nodemeta.UpdateNodeLabels(r.Context(), nodeID, req.Labels); err != nil {
+		writeErr(w, http.StatusOK, err)


Wrong HTTP status code on error — using 200 for all error responses

writeErr is called with http.StatusOK (200), but this is an error path. A 200 response with an error body defeats standard HTTP error handling on the client (interceptors, middleware, proxies).

Validation errors (invalid label format, reserved key) should use http.StatusBadRequest (400). DB errors should use http.StatusInternalServerError (500). The writeErr function already accepts the status code, so this is a straightforward fix.

cubesandboxbot · 2026-06-24T07:37:50Z

+func DeleteNodeLabelHandler(w http.ResponseWriter, r *http.Request) {
+	nodeID := mux.Vars(r)["node_id"]
+	key := mux.Vars(r)["key"]
+	if err := nodemeta.DeleteNodeLabel(r.Context(), nodeID, key); err != nil {


Same issue as L181 — 200 status on error

Same as UpdateNodeLabelsHandler — error paths should return 400/500 instead of 200.

cubesandboxbot · 2026-06-24T07:37:51Z

 	nodeAction          = "/nodes/{node_id}"
 	nodeStatusAction    = "/nodes/{node_id}/status"
+	nodeLabelsAction    = "/nodes/{node_id}/labels"
+	nodeLabelKeyAction  = "/nodes/{node_id}/labels/{key}"


{key} route pattern cannot match label keys containing a slash

gorilla/mux's default {key} pattern matches [^/]+ — it stops at the first /. Label keys with a prefix (e.g., example.com/env) cannot be deleted via this endpoint since the key contains a slash that would be parsed as additional path segments.

Fix: Use a query parameter approach: DELETE /nodes/{node_id}/labels?key=example.com%2Fenv, or document this limitation and provide a workaround (operators can overwrite the label value to empty via POST instead).

cubesandboxbot · 2026-06-24T08:27:31Z

TOP-LEVEL REVIEW: feat(CubeMaster) - node label management API

CRITICAL: SELECT FOR UPDATE missing from readLabelsJSONForUpdate (service.go:841).
The function promises a row-level lock in its name and docstring, but the
GORM query is a plain Take() without any locking clause. Concurrent label
writes on the same node silently lose updates.

FIX: Add the GORM locking clause to the query inside readLabelsJSONForUpdate.

HIGH: DB transaction held across Redis I/O. UpdateNodeLabels and
DeleteNodeLabel wrap in-memory state updates and cache sync inside the
DB transaction closure. RegisterNode correctly does this outside the
transaction.

MEDIUM:

mustJSON silently swallows marshal errors (service.go:1159)
HTTP 200 for client errors (meta.go:269,281)
No request body size limit (meta.go:264)
DeleteNodeLabel skips key format validation

TEST COVERAGE: Validation tests are excellent but integration tests for
DB-backed functions are missing.

cubesandboxbot · 2026-06-24T08:27:41Z

 	unlockIso := global.lockIsolationWrite(req.NodeID)
 	iso := loadIsolationFromDB(req.NodeID)

+	// Read existing labels from DB, merge cubelet labels (cubelet wins on conflict),


TOCTOU race: RegisterNode label read-modify-write is unprotected

RegisterNode reads labels with a plain SELECT (no FOR UPDATE, line 342), merges cubelet labels in Go (lines 346-348), and writes back with a bare UPDATE (line 349-353). Meanwhile UpdateNodeLabels/DeleteNodeLabel use SELECT ... FOR UPDATE inside a GORM transaction.

A concurrent operator label write can be silently overwritten by a heartbeat that holds stale data. Consider either:

Moving this read-merge-write into a transaction with FOR UPDATE, or

Introducing a per-node label write mutex (following the existing versionWriteLocks/isolationWriteLocks pattern) shared by all three code paths.

cubesandboxbot · 2026-06-24T08:27:43Z

+	if err != nil {
+		return nil, err
+	}
+	for k, v := range req.Labels {


No label validation on cubelet-supplied labels in RegisterNode

req.Labels from cubelet are merged into the persisted store without calling validateNodeLabels. This allows a compromised or misconfigured cubelet to set labels in reserved namespaces (kubernetes.io, cube.cloud.tencentcloud.com).

Consider applying the same validateNodeLabels check (including IsReservedLabelKey) on req.Labels, or at minimum stripping reserved keys before persisting.

cubesandboxbot · 2026-06-24T08:27:45Z

+// readLabelsJSONForUpdate reads the labels_json column with a row-level lock
+// (SELECT ... FOR UPDATE) inside an ongoing transaction, preventing concurrent
+// read-modify-write races on the same node's labels.
+func readLabelsJSONForUpdate(tx *gorm.DB, nodeID string) (map[string]string, error) {


Phantom snapshot: readLabelsJSONForUpdate does not error on non-existent nodes

GORM Scan() does not return ErrRecordNotFound when no rows match — it just leaves the output at its zero value. This means UpdateNodeLabels/DeleteNodeLabel with a non-existent node ID return an empty map, then call ensureNode() which creates a phantom in-memory NodeSnapshot.

The existing SetNodeIsolated (line 527-533) guards against this with an explicit Take check. Consider adding a similar node-existence check here (inside the transaction so the FOR UPDATE lock protects it).

cubesandboxbot · 2026-06-24T08:27:47Z

+		return nil, err
+	}
+	m := map[string]string{}
+	_ = json.Unmarshal([]byte(raw), &m)


Silent json.Unmarshal error discard can destroy all labels

If the labels_json column contains corrupt data (from a prior bug, manual DB edit, or partial write), this returns an empty map. Callers then merge into the empty map, marshal, and overwrite the DB — permanently destroying all existing labels.

Same pattern exists at line 842 in readLabelsJSONForUpdate. Consider at minimum logging a warning when the unmarshal fails and raw != "", so operators can detect corruption before it gets overwritten.

cubesandboxbot · 2026-06-24T08:27:48Z

+	if name == "" {
+		errs = append(errs, "name part must not be empty")


Redundant error messages when name is empty

When name == "" (e.g. key "prefix/"), the function appends "name part must not be empty" at line 936, then falls through to line 940 where the regex match on empty string fails, appending "name part a qualified name must consist of..." too. User sees a redundant combined message.

Consider returning early after the empty-name check to avoid the cascading format error.

Signed-off-by: devincd <505259926@qq.com>

- Eliminate read-modify-write race: wrap UpdateNodeLabels and DeleteNodeLabel in a DB transaction with SELECT ... FOR UPDATE (readLabelsJSONForUpdate) to prevent concurrent label updates from silently losing writes. - Fix readLabelsJSON to propagate DB errors instead of returning an empty map, so transient failures cannot accidentally clear labels. - Use cloneStringMap when assigning to snap.Labels in DeleteNodeLabel, consistent with RegisterNode and UpdateNodeLabels, to avoid shared map reference between snapshot and DB-read value. - Change DELETE label route from path param to query parameter (?key=example.com%2Fenv) so namespaced keys containing '/' can be deleted. - Remove unused nodeLabelKeyAction constant and NodeLabelKeyAction accessor. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com> Signed-off-by: devincd <505259926@qq.com>

…educe cascading validation errors - Wrap RegisterNode label read-merge-write in a DB transaction with SELECT ... FOR UPDATE, consistent with UpdateNodeLabels and DeleteNodeLabel, preventing concurrent admin label writes from being lost during cubelet registration. - In isQualifiedLabelKey, chain the regex check with else-if after the empty-name check so that an empty name produces a single "name part must not be empty" error instead of a cascading format error. Signed-off-by: devincd <505259926@qq.com>

cubesandboxbot · 2026-06-24T08:52:23Z

-	// otherwise let this stale read clobber a just-applied cordon until the
-	// next reconcile tick).
+	// SetNodeIsolated cannot land its DB+memory write in between.
 	unlockIso := global.lockIsolationWrite(req.NodeID)


⚠️ Mutex leak: unlockIso() never called when the labels transaction fails.

On line 337, unlockIso := global.lockIsolationWrite(req.NodeID) acquires a per-node mutex. If the global.db.Transaction(...) on line 344 returns an error, line 360 (return nil, err) returns without calling unlockIso(). The mutex is only released on line 386 in the success path.

With MySQL FOR UPDATE timeouts, deadlock detection, or transient connection drops, the per-node sync.Mutex stays locked forever. Every subsequent RegisterNode or SetNodeIsolated for this nodeID blocks indefinitely on lockIsolationWrite. In a production cluster, a single timeout permanently wedges all isolation/label operations for that node.

Fix: Move unlockIso() into a defer right after acquisition:

unlockIso := global.lockIsolationWrite(req.NodeID) defer unlockIso()

cubesandboxbot · 2026-06-24T08:52:25Z

+		snap := global.ensureNode(nodeID)
+		global.mu.Lock()
+		snap.Labels = cloneStringMap(existing)
+		global.mu.Unlock()
+		syncLocalcache(snap)
+		return nil
+	})


syncLocalcache called inside DB transaction — inconsistent with RegisterNode pattern

Holding the FOR UPDATE row lock across syncLocalcache (in-memory cache operations) is unnecessary — the lock only needs to cover the read-modify-write of labels_json. If anything inside syncLocalcache ever grows to touch I/O (Redis, another DB call), this silently extends the lock duration.

RegisterNode correctly calls syncLocalcache outside the transaction on line 385. Follow the same pattern here: move lines 794-798 outside the transaction closure.

cubesandboxbot · 2026-06-24T08:52:27Z

+		snap := global.ensureNode(nodeID)
+		global.mu.Lock()
+		snap.Labels = cloneStringMap(existing)
+		global.mu.Unlock()
+		syncLocalcache(snap)
+		return nil
+	})


Same issue as UpdateNodeLabels — syncLocalcache inside the transaction.

Move syncLocalcache (lines 829-833) outside the transaction closure, like RegisterNode does on line 385.

cubesandboxbot · 2026-06-24T08:52:29Z

+func DeleteNodeLabel(ctx context.Context, nodeID, key string) error {
+	if nodeID == "" {
+		return fmt.Errorf("node_id is required")
+	}
+	if key == "" {
+		return fmt.Errorf("label key is required")
+	}
+	if config.IsReservedLabelKey(key) {
+		return fmt.Errorf("label key %q is reserved for system use and cannot be deleted", key)
+	}


Missing key format validation in DeleteNodeLabel

The function checks key == "" and IsReservedLabelKey(key), but does NOT validate the key's format via isQualifiedLabelKey. By contrast, UpdateNodeLabels calls validateNodeLabels which includes isQualifiedLabelKey for every key.

This means a caller can DELETE /.../labels?key=invalid@format or DELETE /.../labels?key=a/b/c. Both enter the DB transaction, perform a no-op delete on the Go map (safe on absent keys), and return success — a silent no-op that's indistinguishable from an actual deletion.

Add the same format check:

if errs := isQualifiedLabelKey(key); len(errs) != 0 { return fmt.Errorf("label key %q is invalid: %s", key, strings.Join(errs, ", ")) }

cubesandboxbot · 2026-06-24T08:52:31Z

+func readLabelsJSONForUpdate(tx *gorm.DB, nodeID string) (map[string]string, error) {
+	var raw string
+	if err := tx.Raw(
+		"SELECT labels_json FROM "+constants.NodeMetaRegistrationTable+" WHERE node_id = ? FOR UPDATE",
+		nodeID,
+	).Scan(&raw).Error; err != nil {
+		return nil, err
+	}
+	m := map[string]string{}
+	err := json.Unmarshal([]byte(raw), &m)
+	if err != nil {
+		return nil, err
+	}
+	return m, nil
+}


Misleading error when node doesn't exist

If the queried nodeID doesn't exist, GORM's Scan leaves raw as "" (zero value) and returns nil (no rows is not an error for Scan). Then json.Unmarshal([]byte(""), &m) fails with "unexpected end of JSON input" — a confusing error that bubbles up to the API caller as a JSON parse error rather than "node not found".

Use .Row() + row.Scan() to get sql.ErrNoRows, or add an explicit check:

if raw == "" { return nil, fmt.Errorf("node %q not found", nodeID) }

cubesandboxbot · 2026-06-24T08:52:34Z

+func UpdateNodeLabelsHandler(w http.ResponseWriter, r *http.Request) {
+	nodeID := mux.Vars(r)["node_id"]
+	req := &nodemeta.UpdateNodeLabelsRequest{}
+	if err := common.GetBodyReq(r, req); err != nil {
+		writeErr(w, http.StatusBadRequest, err)
+		return
+	}
+	if err := nodemeta.UpdateNodeLabels(r.Context(), nodeID, req.Labels); err != nil {
+		writeErr(w, http.StatusOK, err)
+		return
+	}
+	common.WriteResponse(w, http.StatusOK, &sandboxtypes.Res{
+		Ret: successRet(),
+	})
+}


Unbounded request body: no size limit on labels payload

common.GetBodyReq calls io.Copy(buffer, r.Body) with no cap — it reads the entire body into memory. Combined with the map[string]string having no entry limit, a caller could send tens of thousands of labels. This payload is processed inside the DB transaction that acquires SELECT ... FOR UPDATE (service.go:777-778), meaning a large body holds the row lock for the entire parse + process duration, starving cubelet registrations on the same node.

Recommendations:

Add r.Body = http.MaxBytesReader(w, r.Body, 1<<20) before GetBodyReq (1 MB limit)

Cap the labels map count in validateNodeLabels (e.g., 64 entries, matching typical K8s label cardinality)

…odes Use Take() instead of Raw+Scan so that gorm.ErrRecordNotFound is returned when the node does not exist, preventing ensureNode from creating a phantom in-memory snapshot. Consistent with the existing guard in SetNodeIsolated. Signed-off-by: devincd <505259926@qq.com>

cubesandboxbot · 2026-06-24T09:13:01Z

+	var reg models.NodeRegistration
+	if err := tx.Table(constants.NodeMetaRegistrationTable).
+		Where("node_id = ?", nodeID).
+		Take(&reg).Error; err != nil {


Critical: The function name and docstring (lines 838-840) promise SELECT ... FOR UPDATE, but the implementation issues a plain SELECT ... LIMIT 1 with no locking clause. GORM requires an explicit .Clauses(clause.Locking{Strength: "UPDATE"}) to append FOR UPDATE. Without it, concurrent UpdateNodeLabels/DeleteNodeLabel calls on the same node can race: both transactions read the same baseline labels_json, merge independently, and the second UPDATE silently overwrites the first — a classic lost-update bug.

This affects all three call sites: RegisterNode (line 345), UpdateNodeLabels (line 778), and DeleteNodeLabel (line 815).

Fix:

tx.Clauses(clause.Locking{Strength: "UPDATE"}). Table(constants.NodeMetaRegistrationTable). Where("node_id = ?", nodeID). Take(&reg)

cubesandboxbot · 2026-06-24T09:13:02Z

+			}).Error; err != nil {
+			return err
+		}
+


High: The DB transaction wraps both global.mu.Lock() (in-memory snapshot update) and syncLocalcache(snap) (Redis I/O), keeping the MySQL connection and row lock held across network round-trips. If Redis is slow or many concurrent admin operations hit, this can exhaust the DB connection pool.

RegisterNode (lines 344-384) demonstrates the correct pattern: commit the transaction first, then update in-memory state outside it. Consider restructuring UpdateNodeLabels and DeleteNodeLabel the same way.

cubesandboxbot · 2026-06-24T09:13:03Z

+		}
+		if err := tx.Table(constants.NodeMetaRegistrationTable).
+			Where("node_id = ?", nodeID).
+			Updates(map[string]interface{}{


Medium: This silently returns an empty string on marshal failure. If json.Marshal fails on the labels map (unlikely but possible with unexpected data), labels_json is overwritten with "", permanently losing all labels for that node with no error logged. Either log the error or propagate it so the transaction rolls back.

Same issue at line 823 (DeleteNodeLabel).

cubesandboxbot · 2026-06-24T09:13:04Z

+		return
+	}
+	if err := nodemeta.UpdateNodeLabels(r.Context(), nodeID, req.Labels); err != nil {
+		writeErr(w, http.StatusOK, err)


Medium: Validation errors (empty node_id, invalid label format, reserved key) all return HTTP 200. Clients must parse the response body to detect errors. Consider returning http.StatusBadRequest for client-side validation failures. Using http.StatusOK for errors violates REST conventions and makes API client error handling fragile.

cubesandboxbot · 2026-06-24T09:13:05Z

+func DeleteNodeLabelHandler(w http.ResponseWriter, r *http.Request) {
+	nodeID := mux.Vars(r)["node_id"]
+	key := r.URL.Query().Get("key")
+	if err := nodemeta.DeleteNodeLabel(r.Context(), nodeID, key); err != nil {


Medium: Same HTTP 200 issue as UpdateNodeLabelsHandler. Also, DeleteNodeLabel doesn't validate the key format (unlike UpdateNodeLabels which calls validateNodeLabels) — an operator could DELETE with an invalid key and silently get a no-op + wasted DB write. Consider passing the key through isQualifiedLabelKey (minus the reserved check which is already done above) for consistency.

- Add maxLabelsPerNode=64 constant to validation block - Check label count in validateNodeLabels before iterating keys - Check merged label count after merge in UpdateNodeLabels to prevent exceeding the limit when adding new labels - Add test for too many labels rejection Signed-off-by: devincd <505259926@qq.com>

chenhengqi · 2026-06-25T02:39:57Z

+		ns = prefix
+	}
+	for _, reserved := range reservedLabelNamespaces {
+		if ns == reserved || strings.HasSuffix(ns, "."+reserved) {


strings.HasSuffix()?

ns is the prefix part of the key (e.g. topology.kubernetes.io for key topology.kubernetes.io/zone). strings.HasSuffix(ns, "."+reserved) correctly matches subdomains of a reserved namespace — e.g. ns="topology.kubernetes.io" has suffix ".kubernetes.io", so it's recognized as a subdomain of kubernetes.io. This follows the same pattern K8s uses in its NodeRestriction admission plugin, where a key like failure-domain.beta.kubernetes.io/region is blocked because failure-domain.beta.kubernetes.io is a subdomain of the reserved beta.kubernetes.io.

fslongjin · 2026-06-25T03:47:23Z

Thanks for adding the node label management API — the backend design (validation, reserved keys, multi-master sync) looks solid.

One follow-up request: please consider integrating this into the WebUI so cluster admins can view and manage node labels without calling the internal API directly.

Suggested scope for the UI (can be in this PR or a follow-up — either works):

Display labels on the existing Nodes list / Node detail pages (web/src/pages/Nodes.tsx, NodeDetail.tsx), distinguishing system labels (from cubelet) vs admin-set custom labels if feasible
Manage labels — add/edit via merge (POST) and remove (DELETE with ?key=) for non-reserved keys, with client-side validation aligned with the K8s label rules documented in this PR

This would make operational scenarios like env=production, maintenance=planned, and team=ml-platform much easier to adopt. Happy to review a WebUI PR whenever it's ready.

devincd · 2026-06-25T06:07:04Z

Thanks for adding the node label management API — the backend design (validation, reserved keys, multi-master sync) looks solid.

One follow-up request: please consider integrating this into the WebUI so cluster admins can view and manage node labels without calling the internal API directly.

Suggested scope for the UI (can be in this PR or a follow-up — either works):

Display labels on the existing Nodes list / Node detail pages (web/src/pages/Nodes.tsx, NodeDetail.tsx), distinguishing system labels (from cubelet) vs admin-set custom labels if feasible

Manage labels — add/edit via merge (POST) and remove (DELETE with ?key=) for non-reserved keys, with client-side validation aligned with the K8s label rules documented in this PR

This would make operational scenarios like env=production, maintenance=planned, and team=ml-platform much easier to adopt. Happy to review a WebUI PR whenever it's ready.

@fslongjin
Thanks for the suggestion! I agree that exposing this through the WebUI would make the feature much more practical for day-to-day cluster management.

I'd be happy to work on the WebUI integration. I'll take a look at the existing Nodes pages and prepare a follow-up PR that includes label display, add/edit/delete operations, and client-side validation aligned with the Kubernetes label rules.

cubesandboxbot Bot reviewed Jun 24, 2026

View reviewed changes

devincd force-pushed the feat/node-labels branch from 7851539 to a1333d4 Compare June 24, 2026 07:44

cubesandboxbot Bot reviewed Jun 24, 2026

View reviewed changes

devincd and others added 2 commits June 24, 2026 16:34

feat(CubeMaster): add node label management API

b012260

Signed-off-by: devincd <505259926@qq.com>

devincd force-pushed the feat/node-labels branch from 389fc03 to 6fc141a Compare June 24, 2026 08:34

cubesandboxbot Bot reviewed Jun 24, 2026

View reviewed changes

devincd marked this pull request as ready for review June 24, 2026 13:32

devincd requested review from chenhengqi, fslongjin, ls-ggg, tinklone and up2wing as code owners June 24, 2026 13:32

chenhengqi reviewed Jun 25, 2026

View reviewed changes

		if name == "" {
		errs = append(errs, "name part must not be empty")

Uh oh!

Conversation

devincd commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

feat(CubeMaster): add node label management API

Summary

Motivation

Multi-Master Data Synchronization

Label Validation

Reserved Label Keys

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot commented Jun 24, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

cubesandboxbot Bot Jun 24, 2026

Choose a reason for hiding this comment

Uh oh!

devincd commented Jun 24, 2026 •

edited

Loading

cubesandboxbot Bot commented Jun 24, 2026 •

edited

Loading