Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -200,6 +200,7 @@ karpenter-api: $(CONTROLLER_GEN) $(YQ)
karpenter-operator/hack/adjust-cel.sh
$(CONTROLLER_GEN) $(CRD_OPTIONS) paths="./api/karpenter/..." output:crd:artifacts:config=karpenter-operator/controllers/karpenter/assets
cp karpenter-operator/controllers/karpenter/assets/karpenter.hypershift.openshift.io_openshiftec2nodeclasses.yaml karpenter-operator/controllers/karpenter/assets/zz_generated.crd-manifests/openshiftec2nodeclasses.crd.yaml
$(GO) run ./hack/kubelet-ratcheting-gen/main.go

.PHONY: control-plane-operator
control-plane-operator:
Expand Down
21 changes: 21 additions & 0 deletions api/.golangci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -319,6 +319,27 @@ linters:
- kubeapilinter
path: hypershift/v1beta1/openstack.go
text: 'nobools: field OpenStackPlatformSpec.DisableExternalNetwork pointer should not use a bool. Use a string type with meaningful constant values as an enum.'
- linters:
- kubeapilinter
path: karpenter/v1/kubelet_config.go
text: 'nobools: field KubeletConfiguration.CPUCFSQuota pointer should not use a bool. Use a string type with meaningful constant values as an enum.'

- linters:
- kubeapilinter
path: karpenter/v1/karpenter_types.go
text: 'minlength: field OpenshiftEC2NodeClassSpec.Kubelet type KubeletConfiguration must have a minimum properties, add kubebuilder:validation:MinProperties marker'
- linters:
- kubeapilinter
path: karpenter/v1/karpenter_types.go
text: 'optionalfields: field OpenshiftEC2NodeClassSpec.Kubelet has a valid zero value'
- linters:
- kubeapilinter
path: karpenter/v1/kubelet_config.go
text: 'nodurations: field KubeletConfiguration.EvictionSoftGracePeriod map value should not use a Duration. Use an integer type with units in the name to avoid the need for clients to implement Go style duration parsing.'
- linters:
- kubeapilinter
path: karpenter/v1/kubelet_config.go
text: 'nomaps: field KubeletConfiguration\.(SystemReserved|KubeReserved|EvictionHard|EvictionSoft|EvictionSoftGracePeriod)'

# statussubresource (1 issue)
- linters:
Expand Down
27 changes: 27 additions & 0 deletions api/AGENTS.md
Original file line number Diff line number Diff line change
Expand Up @@ -55,6 +55,33 @@ To avoid introducing new dependencies, do not add utils or methods to the API ty

- **Ratcheting validation**: when adding new validation to existing fields, verify that existing clusters with values that predate the new validation can still be updated. CRD validation ratchets (allows unchanged invalid values through), but only for fields that are literally unchanged in the update.

### KubeletConfiguration Field Graduation (Karpenter)

The `KubeletConfiguration` type in `api/karpenter/v1/kubelet_config.go` accepts arbitrary kubelet
configuration fields via a `PreserveUnknownFields` schema and a `runtime.RawExtension` overflow
mechanism. A subset of these fields are promoted to explicitly typed struct fields with CEL validation.

When graduating a field from overflow to a typed struct field:

- **Match upstream kubelet's field name and JSON tag exactly.** The primary compatibility target is
`k8s.io/kubelet/config/v1beta1.KubeletConfiguration`. Users today set kubelet fields that land in
our overflow using kubelet field names. Changing a name would be a breaking API change requiring a
new API version, because existing serialized data (and user-supplied YAML) uses the kubelet name.
- **Use upstream Karpenter as a secondary reference.** Upstream Karpenter
(`github.com/aws/karpenter-provider-aws/pkg/apis/v1.KubeletConfiguration`) currently mirrors
kubelet field names. If Karpenter ever diverges from kubelet naming, follow kubelet and add a
translation layer in `karpenterKubeletConfigurationFromNodeClassSpec()`.
- **Match the Go type from upstream kubelet/Karpenter.** Use the same Go type (pointer vs value, map
key/value types) to keep the mapping trivial and avoid serialization incompatibilities. Minor
differences (e.g., bare `int32` vs `*int32`) are acceptable only when they don't change the JSON
representation.
- **Add stricter CEL validation as needed.** Admission-time validation does not affect serialization
compatibility. We can add constraints beyond what upstream provides (e.g., `Minimum=1` instead of
`Minimum=0`, cross-field rules like `imageGCHigh > imageGCLow`) without breaking the API.
- **Do not rename fields for OpenShift conventions.** Normally OpenShift API conventions may prefer
different naming. For KubeletConfiguration fields, upstream kubelet compatibility takes precedence
over convention to avoid breaking changes.

## API Type Change Guidelines

### N-1 and N+1 Compatibility
Expand Down
5 changes: 5 additions & 0 deletions api/karpenter/v1/karpenter_types.go
Original file line number Diff line number Diff line change
Expand Up @@ -275,6 +275,11 @@ type OpenshiftEC2NodeClassSpec struct {
// +kubebuilder:validation:MaxLength=64
// +optional
Version string `json:"version,omitempty"`

// kubelet configures kubelet settings for nodes provisioned by this NodeClass.
// These settings are injected into the node's ignition configuration via MachineConfig.
// +optional
Kubelet KubeletConfiguration `json:"kubelet,omitzero"`
}

// SubnetSelectorTerm defines selection logic for a subnet used by Karpenter to launch nodes.
Expand Down
219 changes: 219 additions & 0 deletions api/karpenter/v1/kubelet_config.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,219 @@
package v1

import (
"encoding/json"
"reflect"
"strings"

"k8s.io/apimachinery/pkg/runtime"
)

// kubeletConfigKnownFields is the set of JSON keys corresponding to the explicitly typed
// fields in KubeletConfiguration. It is derived from the struct's json tags at init time
// so it stays in sync automatically when fields are added or removed.
var kubeletConfigKnownFields map[string]struct{}

func init() {
t := reflect.TypeOf(KubeletConfiguration{})
kubeletConfigKnownFields = make(map[string]struct{}, t.NumField())
for i := range t.NumField() {
f := t.Field(i)
if tag, ok := f.Tag.Lookup("json"); ok {
name, _, _ := strings.Cut(tag, ",")
if name != "" && name != "-" {
kubeletConfigKnownFields[name] = struct{}{}
}
}
}
}

// EvictionThreshold is a threshold value for a kubelet eviction signal.
// Values are either a percentage (e.g. "10%") or a Kubernetes quantity (e.g. "100Mi").
// +kubebuilder:validation:MaxLength=64
type EvictionThreshold string

// KubeletConfiguration configures kubelet settings for nodes provisioned by this NodeClass.
// These settings are injected into the node's ignition configuration via MachineConfig.
// The fields listed below are validated at admission time. Additional kubelet configuration
// fields beyond those listed here are also accepted and will be passed through to the node's
// kubelet configuration without validation. Overflow fields bypass all CRD validation;
// invalid overflow values will cause node bootstrap failures (kubelet crash loop) rather
// than admission errors.
// When graduating new fields from overflow to typed fields, match upstream kubelet's
// field names and types exactly. See api/AGENTS.md "KubeletConfiguration Field Graduation"
// for the full strategy.
// +kubebuilder:pruning:PreserveUnknownFields
Comment thread
JoelSpeed marked this conversation as resolved.
// +kubebuilder:validation:XValidation:rule="!has(self.imageGCHighThresholdPercent) || !has(self.imageGCLowThresholdPercent) || self.imageGCHighThresholdPercent > self.imageGCLowThresholdPercent",message="imageGCHighThresholdPercent must be greater than imageGCLowThresholdPercent"
// +kubebuilder:validation:XValidation:rule="!has(self.podsPerCore) || !has(self.maxPods) || self.podsPerCore <= self.maxPods",message="podsPerCore must not exceed maxPods"
// +kubebuilder:validation:XValidation:rule="!has(self.evictionSoft) || (has(self.evictionSoftGracePeriod) && self.evictionSoft.all(e, e in self.evictionSoftGracePeriod))",message="evictionSoft entry does not have a matching evictionSoftGracePeriod"
// +kubebuilder:validation:XValidation:rule="!has(self.evictionSoftGracePeriod) || (has(self.evictionSoft) && self.evictionSoftGracePeriod.all(e, e in self.evictionSoft))",message="evictionSoftGracePeriod entry does not have a matching evictionSoft"
// +kubebuilder:validation:XValidation:rule="!has(self.evictionHard) || !has(self.evictionSoft) || self.evictionHard.all(key, !(key in self.evictionSoft) || ((self.evictionSoft[key].endsWith('%') && self.evictionHard[key].endsWith('%')) ? (self.evictionSoft[key].size() <= 1 || self.evictionHard[key].size() <= 1 || double(self.evictionSoft[key].substring(0, self.evictionSoft[key].size() - 1)) >= double(self.evictionHard[key].substring(0, self.evictionHard[key].size() - 1))) : (!(isQuantity(self.evictionSoft[key]) && isQuantity(self.evictionHard[key])) || quantity(self.evictionSoft[key]).compareTo(quantity(self.evictionHard[key])) >= 0)))",message="evictionSoft threshold must be greater than or equal to evictionHard threshold for the same signal (soft eviction should fire before hard)"
type KubeletConfiguration struct {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this will need test coverage once d448ab4 merges

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can either come back afterwards and add it, or we can wait for that to merge and then do this one, and I can add it here. I'll set a test up based on your branch in the mean time.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added envtest test coverage for KubeletConfiguration

// maxPods is the maximum number of pods that can run on a node.
Comment thread
jkyros marked this conversation as resolved.
// The value must be between 1 and 2500, inclusive.
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=2500
// +optional
MaxPods int32 `json:"maxPods,omitempty"`
// podsPerCore is the maximum number of pods per core. The value must be between 1 and 2500,
// inclusive, and cannot exceed maxPods when both are set.
// +kubebuilder:validation:Minimum=1
// +kubebuilder:validation:Maximum=2500
// +optional
PodsPerCore int32 `json:"podsPerCore,omitempty"`
// systemReserved is a set of ResourceName=ResourceQuantity pairs that describe
// resources reserved for non-kubernetes components.
// Currently only cpu, memory, ephemeral-storage, and pid are supported.
// +kubebuilder:validation:XValidation:message="valid keys for systemReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')"
// +kubebuilder:validation:XValidation:message="systemReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
Comment thread
JoelSpeed marked this conversation as resolved.
// +kubebuilder:validation:XValidation:message="systemReserved values must not be empty",rule="self.all(x, self[x].size() > 0)"
// +kubebuilder:validation:MinProperties=1
// +kubebuilder:validation:MaxProperties=20
// +optional
SystemReserved map[string]string `json:"systemReserved,omitempty"`
// kubeReserved is a set of ResourceName=ResourceQuantity pairs that describe
// resources reserved for kubernetes system components.
// Currently only cpu, memory, ephemeral-storage, and pid are supported.
// +kubebuilder:validation:XValidation:message="valid keys for kubeReserved are ['cpu','memory','ephemeral-storage','pid']",rule="self.all(x, x=='cpu' || x=='memory' || x=='ephemeral-storage' || x=='pid')"
// +kubebuilder:validation:XValidation:message="kubeReserved value cannot be a negative resource quantity",rule="self.all(x, !self[x].startsWith('-'))"
Comment thread
JoelSpeed marked this conversation as resolved.
// +kubebuilder:validation:XValidation:message="kubeReserved values must not be empty",rule="self.all(x, self[x].size() > 0)"
// +kubebuilder:validation:MinProperties=1
// +kubebuilder:validation:MaxProperties=20
// +optional
KubeReserved map[string]string `json:"kubeReserved,omitempty"`
// evictionHard is a map of signal names to quantities that defines hard eviction thresholds.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need to validate that evictionHard and EvictionSoft doesn't clash any values?

@jkyros jkyros May 8, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added as much CEL as we can to validate this.

They can be either quantities or percentages (yay!) and that makes it difficult to compare since CEL doesn't know node capacity. But we can can compare percentages to percentages and values to values so I added what ended up being kind of a gnarly expression that compares them if it can and fails open if they are mix/match quantity/percentage.

EDIT: Nope. Too gnarly. quantity() and compareTo() are too expensive. And apparently...I can't put max lengths on the map key strings to help without type aliasing them so I have somewhere to attach. Yuck. (unless we want to hack on the CRD like adjust-cel.sh does or something? I don't think we do )

I'm going to turn into Joel here with the "ugh, this API sucks" 😛

Yeah I think (because contents can be either type) we either have to hack on the CRD or type alias it to pass budget and still validate.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Added separate type definition for the strings so we can limit their length to 64 inside the map, CEL budget is okay now.

// +kubebuilder:validation:XValidation:message="valid keys for evictionHard are ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available']",rule="self.all(x, x in ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available'])"
// +kubebuilder:validation:XValidation:message="evictionHard values must not be empty",rule="self.all(x, self[x].size() > 0)"
// +kubebuilder:validation:MinProperties=1
// +kubebuilder:validation:MaxProperties=20
// +optional
EvictionHard map[string]EvictionThreshold `json:"evictionHard,omitempty"`
// evictionSoft is a map of signal names to quantities that defines soft eviction thresholds.
// +kubebuilder:validation:XValidation:message="valid keys for evictionSoft are ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available']",rule="self.all(x, x in ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available'])"
// +kubebuilder:validation:XValidation:message="evictionSoft values must not be empty",rule="self.all(x, self[x].size() > 0)"
// +kubebuilder:validation:MinProperties=1
// +kubebuilder:validation:MaxProperties=20
// +optional
EvictionSoft map[string]EvictionThreshold `json:"evictionSoft,omitempty"`
// evictionSoftGracePeriod is a map of signal names to quantities that defines grace periods
// for each soft eviction signal.
// +kubebuilder:validation:XValidation:message="valid keys for evictionSoftGracePeriod are ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available']",rule="self.all(x, x in ['memory.available','nodefs.available','nodefs.inodesFree','imagefs.available','imagefs.inodesFree','pid.available'])"
// +kubebuilder:validation:MinProperties=1
// +kubebuilder:validation:MaxProperties=20
// +optional
EvictionSoftGracePeriod map[string]string `json:"evictionSoftGracePeriod,omitempty"`
// evictionMaxPodGracePeriod is the maximum allowed grace period (in seconds) to use
Comment thread
JoelSpeed marked this conversation as resolved.
// when terminating pods in response to soft eviction thresholds.
// +optional
EvictionMaxPodGracePeriod *int32 `json:"evictionMaxPodGracePeriod,omitempty"`
Comment thread
coderabbitai[bot] marked this conversation as resolved.
// imageGCHighThresholdPercent is the percent of disk usage which triggers image garbage collection.
// The value must be between 0 and 100, inclusive, and must be greater than imageGCLowThresholdPercent when both are set.
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=100
// +optional
ImageGCHighThresholdPercent *int32 `json:"imageGCHighThresholdPercent,omitempty"`
// imageGCLowThresholdPercent is the percent of disk usage to which image garbage collection attempts to free.
// The value must be between 0 and 100, inclusive, and must be less than imageGCHighThresholdPercent when both are set.
Comment thread
jkyros marked this conversation as resolved.
// +kubebuilder:validation:Minimum=0
// +kubebuilder:validation:Maximum=100
// +optional
ImageGCLowThresholdPercent *int32 `json:"imageGCLowThresholdPercent,omitempty"`
// cpuCFSQuota enables CPU CFS quota enforcement for containers that specify CPU limits.
// +optional
CPUCFSQuota *bool `json:"cpuCFSQuota,omitempty"`
Comment thread
JoelSpeed marked this conversation as resolved.

// Overflow holds additional kubelet configuration fields not explicitly defined above.
// These fields are preserved during serialization and deserialization, allowing arbitrary
// kubelet configuration to pass through to the node's ignition/MachineConfig.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: may be mention overflow fields bypass all CRD validation, invalid values will manifest as node bootstrap failures (kubelet crash loop), not admission errors.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated language to mention the kubelet crash loop

Overflow runtime.RawExtension `json:"-"`
}

// UnmarshalJSON implements custom JSON unmarshaling for KubeletConfiguration.
// It deserializes known fields into the struct and captures all additional fields
// into the overflow map for pass-through.
func (k *KubeletConfiguration) UnmarshalJSON(data []byte) error {
// Zero the receiver so that fields absent from the new input
// (including Overflow, which is json:"-") do not survive from a
// previous decode.
*k = KubeletConfiguration{}

// Unmarshal known fields via alias to avoid infinite recursion
type Alias KubeletConfiguration
aux := &struct{ *Alias }{Alias: (*Alias)(k)}
if err := json.Unmarshal(data, aux); err != nil {
return err
}

// Unmarshal everything into a raw map
var raw map[string]json.RawMessage
if err := json.Unmarshal(data, &raw); err != nil {
return err
}

// Separate unknown fields into overflow
for key := range kubeletConfigKnownFields {
delete(raw, key)
}
if len(raw) > 0 {
overflowBytes, err := json.Marshal(raw)
if err != nil {
return err
}
k.Overflow = runtime.RawExtension{Raw: overflowBytes}
}
return nil
Comment thread
coderabbitai[bot] marked this conversation as resolved.
}

// MarshalJSON implements custom JSON marshaling for KubeletConfiguration.
// It serializes the known typed fields and merges any overflow fields back in.
func (k KubeletConfiguration) MarshalJSON() ([]byte, error) {
// Marshal known fields via alias to avoid infinite recursion
type Alias KubeletConfiguration
data, err := json.Marshal((*Alias)(&k))
if err != nil {
return nil, err
}

if len(k.Overflow.Raw) == 0 {
return data, nil
}

// Merge overflow fields into the output; structured fields win on conflict.
var overflowMap map[string]json.RawMessage
if err := json.Unmarshal(k.Overflow.Raw, &overflowMap); err != nil {
return nil, err
}
var structured map[string]json.RawMessage
if err := json.Unmarshal(data, &structured); err != nil {
return nil, err
}
for key, val := range structured {
overflowMap[key] = val
}
return json.Marshal(overflowMap)
}

// HasTypedFields reports whether any explicitly defined struct fields are set.
// This is used by IsZero, but is separate so we can differentiate the zero case
// from "only overflow fields set". This must be kept in sync with the typed
// fields in KubeletConfiguration.
func (k KubeletConfiguration) HasTypedFields() bool {
return k.MaxPods != 0 ||
k.PodsPerCore != 0 ||
k.SystemReserved != nil ||
k.KubeReserved != nil ||
k.EvictionHard != nil ||
k.EvictionSoft != nil ||
k.EvictionSoftGracePeriod != nil ||
k.EvictionMaxPodGracePeriod != nil ||
k.ImageGCHighThresholdPercent != nil ||
k.ImageGCLowThresholdPercent != nil ||
k.CPUCFSQuota != nil
}

// IsZero reports whether the KubeletConfiguration is empty (no typed fields set and
// no overflow fields). This is used by the omitzero JSON tag to determine whether the
// field should be omitted during serialization.
func (k KubeletConfiguration) IsZero() bool {
return !k.HasTypedFields() && len(k.Overflow.Raw) == 0
}
Loading