47 changes: 20 additions & 27 deletions machine_configuration/index.adoc
@@ -7,10 +7,8 @@ include::_attributes/common-attributes.adoc[]
toc::[]



There are times when you need to make changes to the operating systems running on {product-title} nodes. This can include changing settings for network time service, adding kernel arguments, or configuring journaling in a specific way.

Aside from a few specialized features, most changes to operating systems on {product-title} nodes can be done by creating what are referred to as `MachineConfig` objects that are managed by the Machine Config Operator. For example, you can use the Machine Config Operator (MCO) and machine configs to manage update to systemd, CRI-O and kubelet, the kernel, Network Manager and other system features.
[role="_abstract"]
You can make most changes to the operating systems on {product-title} nodes by creating `MachineConfig` objects, which are managed by the Machine Config Operator. For example, you can use the Machine Config Operator (MCO) and machine configs to manage updates to systemd, CRI-O and kubelet, the kernel, NetworkManager, and other system features.

Tasks in this section describe how to use features of the Machine Config Operator to configure operating system features on {product-title} nodes.

@@ -23,41 +21,36 @@ Previously, NetworkManager stored new network configurations to `/etc/sysconfig/

include::modules/understanding-machine-config-operator.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* xref:../networking/ovn_kubernetes_network_provider/about-ovn-kubernetes.adoc#about-ovn-kubernetes[About the OVN-Kubernetes network plugin]

include::modules/machine-config-overview.adoc[leveloffset=+1]

include::modules/architecture-machine-config-pools.adoc[leveloffset=+2]

include::modules/machine-config-node-drain.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* xref:../machine_configuration/index.adoc#about-machine-config-operator_machine-config-overview[About the Machine Config Operator]
* xref:../machine_configuration/machine-config-node-disruption.adoc#machine-configs-configure[Using node disruption policies to minimize disruption from machine config changes]
* xref:../support/troubleshooting/troubleshooting-operator-issues.adoc#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues[Disabling the Machine Config Operator from automatically rebooting]

include::modules/machine-config-drift-detection.adoc[leveloffset=+1]

include::modules/checking-mco-status.adoc[leveloffset=+1]

include::modules/checking-mco-node-status.adoc[leveloffset=+1]

[role="_additional-resources"]
.Additional resources

* xref:../machine_configuration/mco-coreos-layering.adoc#coreos-layering-configuring-on_mco-coreos-layering[About on-cluster image mode]
* xref:../nodes/clusters/nodes-cluster-enabling-features.adoc#nodes-cluster-enabling-features[Enabling features using feature gates]

include::modules/checking-mco-node-status-configuring.adoc[leveloffset=+2]

[id="machine-config-operator-certificates_{context}"]
== Understanding Machine Config Operator certificates

Machine Config Operator certificates are used to secure connections between the Red Hat Enterprise Linux CoreOS (RHCOS) nodes and the Machine Config Server. For more information, see xref:../security/certificate_types_descriptions/machine-config-operator-certificates.adoc#cert-types-machine-config-operator-certificates[Machine Config Operator certificates].
include::modules/checking-mco-status-certs.adoc[leveloffset=+1]

include::modules/checking-mco-status-certs.adoc[leveloffset=+2]
[role="_additional-resources"]
[id="additional-resources_{context}"]
== Additional resources
* link:https://github.com/openshift/machine-config-operator#machine-config-operator[machine-config-operator]
* link:https://coreos.github.io/ignition/configuration-v3_5/[Configuration Specification v3.5.0 (Ignition documentation)]
* link:https://coreos.github.io/ignition/configuration-v3_2/[Configuration Specification v3.2.0 (Ignition documentation)]
* link:https://access.redhat.com/solutions/5414371[How to skip validation of failing / stuck MachineConfig in OCP 4? (Red{nbsp}Hat Knowledgebase article)]
* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/security_hardening/index#using-the-system-wide-cryptographic-policies_security-hardening[Using system-wide cryptographic policies (Red{nbsp}Hat documentation)]
* link:https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html-single/security_hardening/index#protecting-systems-against-intrusive-usb-devices_security-hardening[usbguard]
* xref:../networking/ovn_kubernetes_network_provider/about-ovn-kubernetes.adoc#about-ovn-kubernetes[About the OVN-Kubernetes network plugin]
* xref:../machine_configuration/index.adoc#about-machine-config-operator_machine-config-overview[About the Machine Config Operator]
* xref:../machine_configuration/machine-config-node-disruption.adoc#machine-configs-configure[Using node disruption policies to minimize disruption from machine config changes]
* link:https://github.com/openshift/runbooks/blob/master/alerts/machine-config-operator/MachineConfigControllerDrainError.md[MCCDrainError]
* xref:../support/troubleshooting/troubleshooting-operator-issues.adoc#troubleshooting-disabling-autoreboot-mco_troubleshooting-operator-issues[Disabling the Machine Config Operator from automatically rebooting]
* xref:../machine_configuration/mco-coreos-layering.adoc#coreos-layering-configuring-on_mco-coreos-layering[About on-cluster image mode]
* xref:../nodes/clusters/nodes-cluster-enabling-features.adoc#nodes-cluster-enabling-features[Enabling features using feature gates]
* xref:../security/certificate_types_descriptions/machine-config-operator-certificates.adoc#cert-types-machine-config-operator-certificates[Machine Config Operator certificates]
3 changes: 3 additions & 0 deletions modules/architecture-machine-config-pools.adoc
@@ -7,6 +7,9 @@
[id="architecture-machine-config-pools_{context}"]
= Node configuration management with machine config pools

[role="_abstract"]
You can change a group of nodes at the same time by applying the change to all of the nodes in the same machine config pool (MCP).

Machines that run control plane components or user workloads are divided into groups based on the types of resources they handle. These groups of machines are called machine config pools (MCP). Each MCP manages a set of nodes and its corresponding machine configs. The role of the node determines which MCP it belongs to; the MCP governs nodes based on its assigned node role label. Nodes in an MCP have the same configuration; this means nodes can be scaled up and torn down in response to increased or decreased workloads.

By default, there are two MCPs created by the cluster when it is installed: `master` and `worker`. Each default MCP has a defined configuration applied by the Machine Config Operator (MCO), which is responsible for managing MCPs and facilitating MCP updates.
34 changes: 20 additions & 14 deletions modules/checking-mco-node-status.adoc
@@ -80,8 +80,8 @@ rendered-worker-01f27f752eb84eba917450e43636b210 c00e2c941bc6e236b50e0bf3988e6
rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 c00e2c941bc6e236b50e0bf3988e6c790cf2bbb2 3.5.0 7m26s <2>
# ...
----
<1> The current machine config for the worker nodes.
<2> The newly-created machine config that is being applied to the worker nodes.

In this example, the current machine config for the worker nodes is listed before the newly created machine config, which is being applied to the worker nodes.

You can watch as the nodes are updated with the new machine config:

@@ -97,13 +97,16 @@ NAME POOLNAME DESIREDCONFIG
ci-ln-ds73n5t-72292-9xsm9-master-0 master rendered-master-a386c2d1550b927d274054124f58be68 rendered-master-a386c2d1550b927d274054124f58be68 True 27M
ci-ln-ds73n5t-72292-9xsm9-master-1 master rendered-master-a386c2d1550b927d274054124f58be68 rendered-master-23cf200e4ee97daa6e39fdce24c9fb67 False 27M
ci-ln-ds73n5t-72292-9xsm9-master-2 master rendered-master-23cf200e4ee97daa6e39fdce24c9fb67 rendered-master-23cf200e4ee97daa6e39fdce24c9fb67 True 27M
ci-ln-ds73n5t-72292-9xsm9-worker-a-2d8tz worker-cnf rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 True 20M <1>
ci-ln-ds73n5t-72292-9xsm9-worker-b-gw5sd worker rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 rendered-worker-01f27f752eb84eba917450e43636b210 False 20M <2>
ci-ln-ds73n5t-72292-9xsm9-worker-c-t227w worker rendered-worker-01f27f752eb84eba917450e43636b210 rendered-worker-01f27f752eb84eba917450e43636b210 True 19M <3>
ci-ln-ds73n5t-72292-9xsm9-worker-a-2d8tz worker-cnf rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 True 20M
ci-ln-ds73n5t-72292-9xsm9-worker-b-gw5sd worker rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 rendered-worker-01f27f752eb84eba917450e43636b210 False 20M
ci-ln-ds73n5t-72292-9xsm9-worker-c-t227w worker rendered-worker-01f27f752eb84eba917450e43636b210 rendered-worker-01f27f752eb84eba917450e43636b210 True 19M
----
<1> This node has been updated. The new machine config, `rendered-worker-f351f6947f15cd0380514f4b1c89f8f2`, is shown as the desired and current machine configs.
<2> This node is currently being updated to the new machine config. The previous and new machine configs are shown as the desired and current machine configs, respectively.
<3> This node has not yet been updated to the new machine config. The previous machine config is shown as the desired and current machine configs.

In this example, the `ci-ln-ds73n5t-72292-9xsm9-worker-a-2d8tz` node has been updated. The new machine config, `rendered-worker-f351f6947f15cd0380514f4b1c89f8f2`, is shown as the desired and current machine configs.

The `ci-ln-ds73n5t-72292-9xsm9-worker-b-gw5sd` node is currently being updated to the new machine config. The new and previous machine configs are shown as the desired and current machine configs, respectively.

The `ci-ln-ds73n5t-72292-9xsm9-worker-c-t227w` node has not yet been updated to the new machine config. The previous machine config is shown as the desired and current machine configs.
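
The comparison described above can be sketched in a few lines of Python. This is only an illustration, not part of the MCO tooling: it assumes the columns are `NAME`, `POOLNAME`, `DESIREDCONFIG`, `CURRENTCONFIG`, `UPDATED`, and an age value, and the names `SAMPLE_ROWS`, `NEW_CONFIG`, and `classify_nodes` are hypothetical.

```python
# Rows copied from the example output above. The column order is an
# assumption: NAME, POOLNAME, DESIREDCONFIG, CURRENTCONFIG, UPDATED, age.
SAMPLE_ROWS = """
ci-ln-ds73n5t-72292-9xsm9-worker-a-2d8tz worker-cnf rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 True 20M
ci-ln-ds73n5t-72292-9xsm9-worker-b-gw5sd worker rendered-worker-f351f6947f15cd0380514f4b1c89f8f2 rendered-worker-01f27f752eb84eba917450e43636b210 False 20M
ci-ln-ds73n5t-72292-9xsm9-worker-c-t227w worker rendered-worker-01f27f752eb84eba917450e43636b210 rendered-worker-01f27f752eb84eba917450e43636b210 True 19M
"""

# The newly created machine config from the example.
NEW_CONFIG = "rendered-worker-f351f6947f15cd0380514f4b1c89f8f2"


def classify_nodes(rows: str, new_config: str) -> dict:
    """Map each node name to its progress toward new_config."""
    states = {}
    for line in rows.strip().splitlines():
        name, _pool, desired, current = line.split()[:4]
        if desired != new_config:
            # The node has not been asked to move to the new config yet.
            states[name] = "not yet updated"
        elif current == new_config:
            # Desired and current both point at the new config.
            states[name] = "updated"
        else:
            # Desired is the new config, but the node still runs the old one.
            states[name] = "updating"
    return states


for node, state in classify_nodes(SAMPLE_ROWS, NEW_CONFIG).items():
    print(f"{node}: {state}")
```

The sketch relies only on the desired and current columns. The `UPDATED` column alone cannot distinguish a node that has finished updating from one that has not started, because both report `True`.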

.Basic machine config node fields
[cols="1,4",options="header"]
@@ -174,11 +177,11 @@ kind: MachineConfigNode
metadata:
creationTimestamp: "2025-04-28T18:40:29Z"
generation: 3
name: <machine_config_node_name> <1>
name: <machine_config_node_name>
# ...
spec:
configVersion:
desired: rendered-master-34f96af2e41acb615410b97ce1c819e6 <2>
desired: rendered-master-34f96af2e41acb615410b97ce1c819e6
node:
name: ci-ln-921r7qk-72292-kxv95-master-0
pool:
@@ -259,13 +262,16 @@ status:
status: "False"
type: PinnedImageSetsDegraded
configVersion:
current: rendered-master-34f96af2e41acb615410b97ce1c819e6 <3>
current: rendered-master-34f96af2e41acb615410b97ce1c819e6
desired: rendered-master-34f96af2e41acb615410b97ce1c819e6
observedGeneration: 4
----
<1> The `MachineConfigNode` object name.
<2> The new machine configuration. This field updates after the MCO validates the machine config in the `UPDATEPREPARED` phase, then the status adds the new configuration.
<3> The current machine config on the node.
+
where:

`metadata.name`:: Specifies the `MachineConfigNode` object name.
`spec.configVersion.desired`:: Specifies the new machine configuration. This field updates after the MCO validates the machine config in the `UPDATEPREPARED` phase. The status then adds the new configuration.
`status.configVersion.current`:: Specifies the current machine config on the node.
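
As a rough illustration of how these fields relate, the following Python sketch compares the desired and current versions of a `MachineConfigNode` object that has already been loaded into a dictionary. The dictionary shape mirrors the example above. The helper names `config_versions` and `is_up_to_date` are hypothetical, and how you obtain the dictionary (for example, by parsing the object's JSON or YAML output) is left as an assumption.

```python
def config_versions(mcn: dict) -> tuple:
    """Return (desired, current) from a MachineConfigNode-shaped dict."""
    desired = mcn.get("spec", {}).get("configVersion", {}).get("desired")
    current = mcn.get("status", {}).get("configVersion", {}).get("current")
    return desired, current


def is_up_to_date(mcn: dict) -> bool:
    """True when the node already runs the machine config it was asked to run."""
    desired, current = config_versions(mcn)
    return desired is not None and desired == current


# Values copied from the example object above; only the fields that the
# sketch reads are included.
sample = {
    "spec": {
        "configVersion": {
            "desired": "rendered-master-34f96af2e41acb615410b97ce1c819e6",
        },
        "node": {"name": "ci-ln-921r7qk-72292-kxv95-master-0"},
    },
    "status": {
        "configVersion": {
            "current": "rendered-master-34f96af2e41acb615410b97ce1c819e6",
            "desired": "rendered-master-34f96af2e41acb615410b97ce1c819e6",
        },
    },
}

print(is_up_to_date(sample))
```

Missing fields are treated as "not up to date" rather than raising an error, which is a design choice for this sketch and not documented MCO behavior.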

For clusters configured with {image-mode-os-on-lower}, the machine config node output also includes the name of the custom layered image that was applied to affected nodes.

7 changes: 5 additions & 2 deletions modules/checking-mco-status-certs.adoc
@@ -6,6 +6,11 @@
[id="checking-mco-status-certs_{context}"]
= Viewing and interacting with certificates

[role="_abstract"]
Machine Config Operator certificates are used to secure connections between the Red Hat Enterprise Linux CoreOS (RHCOS) nodes and the Machine Config Server.

For more information, see "Machine Config Operator certificates" in the _Additional resources_ section.

The following certificates are handled in the cluster by the Machine Config Controller (MCC) and can be found in the `ControllerConfig` resource:

* `/etc/kubernetes/kubelet-ca.crt`
@@ -30,7 +35,6 @@ $ oc get controllerconfig/machine-config-controller -o yaml | yq -y '.status.con
----
+
.Example output
+
[source,yaml]
----
- bundleFile: KubeAPIServerServingCAData
@@ -59,7 +63,6 @@ $ oc get mcp master -o yaml | yq -y '.status.certExpirys'
----
+
.Example output
+
[source,yaml]
----
- bundle: KubeAPIServerServingCAData
11 changes: 6 additions & 5 deletions modules/checking-mco-status.adoc
@@ -6,7 +6,8 @@
[id="checking-mco-status_{context}"]
= Checking machine config pool status

To see the status of the Machine Config Operator (MCO), its sub-components, and the resources it manages, use the following `oc` commands:
[role="_abstract"]
You can see the status of the Machine Config Operator (MCO), its sub-components, and the resources it manages by using `oc` commands.

.Procedure
. To see the number of MCO-managed nodes available on your cluster for each machine config pool (MCP), run the following command:
@@ -155,13 +156,13 @@ ExecStart=/usr/bin/hyperkube \
--config=/etc/kubernetes/kubelet.conf \ ...
----

If something goes wrong with a machine config that you apply, you can always back out that change. For example, if you had run `oc create -f ./myconfig.yaml` to apply a machine config, you could remove that machine config by running the following command:

. If something goes wrong with a machine config that you apply, you can always back out that change. For example, if you had run `oc create -f ./myconfig.yaml` to apply a machine config, you could remove that machine config by running the following command:
+
[source,terminal]
----
$ oc delete -f ./myconfig.yaml
----

+
If that was the only problem, the nodes in the affected pool should return to a non-degraded state. Deleting the machine config causes the rendered configuration to roll back to its previously rendered state.

+
If you add your own machine configs to your cluster, you can use the commands shown in the previous example to check their status and the related status of the pool to which they are applied.
23 changes: 12 additions & 11 deletions modules/machine-config-drift-detection.adoc
@@ -6,10 +6,11 @@
[id="machine-config-drift-detection_{context}"]
= Understanding configuration drift detection

There might be situations when the on-disk state of a node differs from what is configured in the machine config. This is known as _configuration drift_. For example, a cluster admin might manually modify a file, a systemd unit file, or a file permission that was configured through a machine config. This causes configuration drift. Configuration drift can cause problems between nodes in a Machine Config Pool or when the machine configs are updated.

[role="_abstract"]
The Machine Config Operator (MCO) uses the Machine Config Daemon (MCD) to check nodes for configuration drift on a regular basis. If drift is detected, the MCO sets the node and the machine config pool (MCP) to `Degraded` and reports the error. A degraded node is online and operational, but it cannot be updated.

There might be situations when the on-disk state of a node differs from what is configured in the machine config. This is known as _configuration drift_. For example, a cluster admin might manually modify a file, a systemd unit file, or a file permission that was configured through a machine config. This causes configuration drift. Configuration drift can cause problems between nodes in a Machine Config Pool or when the machine configs are updated.

The MCD performs configuration drift detection upon each of the following conditions:

* When a node boots.
@@ -58,14 +59,14 @@ $ oc describe mcp worker
----
...
Last Transition Time: 2021-12-20T18:54:00Z
Message: Node ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4 is reporting: "content mismatch for file \"/etc/mco-test-file\"" <1>
Message: Node ci-ln-j4h8nkb-72292-pxqxz-worker-a-fjks4 is reporting: "content mismatch for file \"/etc/mco-test-file\""
Reason: 1 nodes are reporting degraded status on sync
Status: True
Type: NodeDegraded <2>
Type: NodeDegraded
...
----
<1> This message shows that a node's `/etc/mco-test-file` file, which was added by the machine config, has changed outside of the machine config.
<2> The state of the node is `NodeDegraded`.

In this example, the text in the `Message` field shows that a node's `/etc/mco-test-file` file, which was added by the machine config, has changed outside of the machine config. In response, the `Type` field shows that the state of the node is `NodeDegraded`.

Or, if you know which node is degraded, examine that node:

@@ -86,18 +87,18 @@ Annotations: cloud.network.openshift.io/egress-ipconfig: [{"interface":"n
machineconfiguration.openshift.io/controlPlaneTopology: HighlyAvailable
machineconfiguration.openshift.io/currentConfig: rendered-worker-67bd55d0b02b0f659aef33680693a9f9
machineconfiguration.openshift.io/desiredConfig: rendered-worker-67bd55d0b02b0f659aef33680693a9f9
machineconfiguration.openshift.io/reason: content mismatch for file "/etc/mco-test-file" <1>
machineconfiguration.openshift.io/state: Degraded <2>
machineconfiguration.openshift.io/reason: content mismatch for file "/etc/mco-test-file"
machineconfiguration.openshift.io/state: Degraded
...
----
<1> The error message indicating that configuration drift was detected between the node and the listed machine config. Here the error message indicates that the contents of the `/etc/mco-test-file`, which was added by the machine config, has changed outside of the machine config.
<2> The state of the node is `Degraded`.

The `content mismatch for file` error message indicates that configuration drift was detected between the node and the listed machine config. Here the error message indicates that the contents of the `/etc/mco-test-file` file, which was added by the machine config, have changed outside of the machine config. As a result, the state of the node is `Degraded`.
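
If you collect node annotations programmatically, a check along the following lines could flag this kind of drift. This is a minimal sketch, assuming the annotations are already available as a dictionary and that the reason text follows the `content mismatch for file "<path>"` pattern shown above. Other reason formats are not handled, and the function name `drifted_file` is hypothetical.

```python
import re

# Annotation prefix taken from the example output above.
PREFIX = "machineconfiguration.openshift.io/"


def drifted_file(annotations: dict):
    """Return the path of the drifted file, or None if no drift is reported.

    Only the 'content mismatch for file' message from the example is
    recognized; any other reason text yields None.
    """
    if annotations.get(PREFIX + "state") != "Degraded":
        return None
    reason = annotations.get(PREFIX + "reason", "")
    match = re.search(r'content mismatch for file "([^"]+)"', reason)
    return match.group(1) if match else None


# Annotation values copied from the example output above.
sample_annotations = {
    PREFIX + "currentConfig": "rendered-worker-67bd55d0b02b0f659aef33680693a9f9",
    PREFIX + "desiredConfig": "rendered-worker-67bd55d0b02b0f659aef33680693a9f9",
    PREFIX + "reason": 'content mismatch for file "/etc/mco-test-file"',
    PREFIX + "state": "Degraded",
}

print(drifted_file(sample_annotations))
```

Note that in this example the current and desired configs are identical, so comparing them would not reveal the problem. The `state` and `reason` annotations are what indicate drift.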

You can correct configuration drift and return the node to the `Ready` state by performing one of the following remediations:

* Ensure that the contents and file permissions of the files on the node match what is configured in the machine config. You can manually rewrite the file
contents or change the file permissions.
* Generate a link:https://access.redhat.com/solutions/5414371[force file] on the degraded node. The force file causes the MCD to bypass the usual configuration drift detection and reapplies the current machine config.
* Generate a force file on the degraded node. The force file causes the MCD to bypass the usual configuration drift detection and reapply the current machine config. For more information, see "Force File" in the _Additional resources_ section.
+
[NOTE]
====