Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions docs/cli-parity-known-deltas.md
Original file line number Diff line number Diff line change
Expand Up @@ -51,6 +51,12 @@ Row IDs match the command-catalogue indexes used by `cli-parity-refresh.sh` (see

| 82 | `rd clone` data plane (`use_zfs_clone` vs `zfs send\|recv`) | BEHAVIOR | permanent | Bug-020. Upstream LINSTOR clones a VD-bearing RD by internal snapshot + either `zfs clone` (when the request carries `use_zfs_clone=true`, golinstor v0.58+/linstor-csi) or `zfs send \| zfs recv` (default, fully independent copy). blockstor's clone routes through the snapshot-restore machinery: internal snapshot `clone-<target>` + `BlockstorRestoreFromSnapshot` marker, whose ZFS provider materialises the target with `zfs clone` (cross-node placements use the existing send/recv restore path). Consequences accepted: (a) `use_zfs_clone=true` — the linstor-csi case — gets exactly the requested semantics; (b) `use_zfs_clone=false`/absent ALSO lands on the snapshot-clone path instead of an independent full copy, so same-node clone targets stay dependent on the origin snapshot (the snapshot is visible in `linstor s l` and must outlive the clone); (c) sources on non-snapshot-capable (thick) pools refuse the clone with an actionable envelope where upstream would full-copy. Pinned by `pkg/rest/clone_use_zfs_clone_bug020_test.go` (L1) + `tests/integration` Group J `CSICreateVolumeFromClone` (Tier 2). |

| 07 | `rd l --resource-definitions <rd>` (Layers column on a placed RD) | WIRE_SHAPE | permanent | Same `stampRDLayerDataFromStack` behaviour as row 81 — the flag-qualified `rd l --resource-definitions <rd>` catalogue cell (harness index 07) the bare `rd l` whitelist string does not literally cover. BS re-synthesises `layer_data: [{"type":"DRBD"},{"type":"STORAGE"}]` from `Spec.LayerStack` on every RD read, so the python CLI's Layers column renders `DRBD,STORAGE` even for a fresh `parity-rd` with only a volume-definition; upstream 1.33.2 leaves the column blank until DRBD layer data is actually allocated. BLOCKSTOR_SUPERSET, operator-friendly, and linstor-csi / piraeus-operator do not gate on RD `layer_data`. See row 81 for the full rationale and the L1/L3 pins (`stampRDLayerDataFromStack`, `tests/contract/normalize_test.go::TestNormalizeRDLayerDataDropped`). |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a minor grammatical omission in this description. A relative pronoun (such as "that" or "which") is missing between "index 07)" and "the bare rd l".

Suggested change
| 07 | `rd l --resource-definitions <rd>` (Layers column on a placed RD) | WIRE_SHAPE | permanent | Same `stampRDLayerDataFromStack` behaviour as row 81 — the flag-qualified `rd l --resource-definitions <rd>` catalogue cell (harness index 07) the bare `rd l` whitelist string does not literally cover. BS re-synthesises `layer_data: [{"type":"DRBD"},{"type":"STORAGE"}]` from `Spec.LayerStack` on every RD read, so the python CLI's Layers column renders `DRBD,STORAGE` even for a fresh `parity-rd` with only a volume-definition; upstream 1.33.2 leaves the column blank until DRBD layer data is actually allocated. BLOCKSTOR_SUPERSET, operator-friendly, and linstor-csi / piraeus-operator do not gate on RD `layer_data`. See row 81 for the full rationale and the L1/L3 pins (`stampRDLayerDataFromStack`, `tests/contract/normalize_test.go::TestNormalizeRDLayerDataDropped`). |
| 07 | `rd l --resource-definitions <rd>` (Layers column on a placed RD) | WIRE_SHAPE | permanent | Same `stampRDLayerDataFromStack` behaviour as row 81 — the flag-qualified `rd l --resource-definitions <rd>` catalogue cell (harness index 07) that the bare `rd l` whitelist string does not literally cover. BS re-synthesises `layer_data: [{"type":"DRBD"},{"type":"STORAGE"}]` from `Spec.LayerStack` on every RD read, so the python CLI's Layers column renders `DRBD,STORAGE` even for a fresh `parity-rd` with only a volume-definition; upstream 1.33.2 leaves the column blank until DRBD layer data is actually allocated. BLOCKSTOR_SUPERSET, operator-friendly, and linstor-csi / piraeus-operator do not gate on RD `layer_data`. See row 81 for the full rationale and the L1/L3 pins (`stampRDLayerDataFromStack`, `tests/contract/normalize_test.go::TestNormalizeRDLayerDataDropped`). |

| 33 | `s d <rd> <nonexistent-snap>` (idempotent snapshot delete) | WIRE_SHAPE | permanent | BLOCKSTOR idempotent-delete envelope. Deleting a snapshot definition that does not exist returns `SUCCESS: snapshot already absent: <snap>` (the desired-state delete is satisfied); upstream LINSTOR returns `WARNING: Snapshot definition <snap> of resource <rd> not found`. Both exit 0. Deliberate: a CSI `DeleteSnapshot` / operator retry MUST be idempotent, so "already gone" is a success, not a warning. Mirrors the resource-delete idempotency in row 42. Pinned by the snapshot-delete REST handler's already-absent path (`pkg/rest/snapshots.go`). |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

There is a discrepancy between this documentation and the actual code implementation in pkg/rest/snapshots.go. The description states that deleting a non-existent snapshot returns SUCCESS: snapshot already absent: <snap>. However, the code in pkg/rest/snapshots.go (line 1240) actually returns warnSnapshotNotFound as the RetCode, which carries the warning mask (0x4000_0000). Therefore, the LINSTOR CLI will render this as a WARNING, not SUCCESS. The documentation should be updated to reflect that it returns a warning instead of success.

Suggested change
| 33 | `s d <rd> <nonexistent-snap>` (idempotent snapshot delete) | WIRE_SHAPE | permanent | BLOCKSTOR idempotent-delete envelope. Deleting a snapshot definition that does not exist returns `SUCCESS: snapshot already absent: <snap>` (the desired-state delete is satisfied); upstream LINSTOR returns `WARNING: Snapshot definition <snap> of resource <rd> not found`. Both exit 0. Deliberate: a CSI `DeleteSnapshot` / operator retry MUST be idempotent, so "already gone" is a success, not a warning. Mirrors the resource-delete idempotency in row 42. Pinned by the snapshot-delete REST handler's already-absent path (`pkg/rest/snapshots.go`). |
| 33 | `s d <rd> <nonexistent-snap>` (idempotent snapshot delete) | WIRE_SHAPE | permanent | BLOCKSTOR idempotent-delete envelope. Deleting a snapshot definition that does not exist returns `WARNING: snapshot already absent: <snap>` (the desired-state delete is satisfied); upstream LINSTOR returns `WARNING: Snapshot definition <snap> of resource <rd> not found`. Both exit 0. Deliberate: a CSI `DeleteSnapshot` / operator retry MUST be idempotent, so "already gone" is a success, not a warning. Mirrors the resource-delete idempotency in row 42. Pinned by the snapshot-delete REST handler's already-absent path (`pkg/rest/snapshots.go`). |

| 42 | `r d <rd> <nonexistent-node>` (idempotent resource delete) | WIRE_SHAPE | permanent | BLOCKSTOR idempotent-delete envelope. Deleting a resource placement on a node that holds none returns `SUCCESS: resource already absent: <node> on <rd>`; upstream LINSTOR returns `WARNING: Node: <rd>, Resource: <node> not found`. Both exit 0. Deliberate: a CSI `DeleteVolume` / operator retry on an already-removed placement MUST be idempotent. Same family as the snapshot-delete idempotency in row 33. Pinned by the resource-delete REST handler's already-absent path (`pkg/rest/resources.go`). |

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

Similar to row 33, there is a likely discrepancy here. If the resource-delete handler was updated to use a warning mask (similar to warnSnapshotNotFound in the snapshot-delete handler) to align with the audit fixes, the CLI will render this as a WARNING rather than SUCCESS. The documentation should be updated to reflect that it returns a warning.

Suggested change
| 42 | `r d <rd> <nonexistent-node>` (idempotent resource delete) | WIRE_SHAPE | permanent | BLOCKSTOR idempotent-delete envelope. Deleting a resource placement on a node that holds none returns `SUCCESS: resource already absent: <node> on <rd>`; upstream LINSTOR returns `WARNING: Node: <rd>, Resource: <node> not found`. Both exit 0. Deliberate: a CSI `DeleteVolume` / operator retry on an already-removed placement MUST be idempotent. Same family as the snapshot-delete idempotency in row 33. Pinned by the resource-delete REST handler's already-absent path (`pkg/rest/resources.go`). |
| 42 | `r d <rd> <nonexistent-node>` (idempotent resource delete) | WIRE_SHAPE | permanent | BLOCKSTOR idempotent-delete envelope. Deleting a resource placement on a node that holds none returns `WARNING: resource already absent: <node> on <rd>`; upstream LINSTOR returns `WARNING: Node: <rd>, Resource: <node> not found`. Both exit 0. Deliberate: a CSI `DeleteVolume` / operator retry on an already-removed placement MUST be idempotent. Same family as the snapshot-delete idempotency in row 33. Pinned by the resource-delete REST handler's already-absent path (`pkg/rest/resources.go`). |

| 40 | `n c <node> <ip> --node-type Satellite` (node-create success envelope) | WIRE_SHAPE | permanent | BLOCKSTOR envelope shape on satellite node registration. BS emits `node created: <node>` + a `SUCCESS: No active connection to satellite '<node>'` line; upstream emits `New node '<node>' registered.` (with a UUID detail) + a `WARNING: No active connection to satellite '<node>'` line whose Details explain the controller will (re-)establish the connection. Both register the node and exit 0; the operator-visible outcome (node exists, awaiting satellite handshake) is identical. The "no active connection" notice is INFO/SUCCESS-class in BS vs WARNING-class upstream, and BS does not surface the volatile node UUID inline. Envelope-shape only; no behavioural divergence. |
| 16 | `ps l` (physical-storage list — `size` field) | WIRE_SHAPE | 2026-12-31 | BS `/v1/physical-storage` omits the per-device `size` key that python-linstor's `show_physical_storage` reads unconditionally (`linstor/responses.py` `devices.size`), so `linstor ps l` raises `KeyError: 'size'` and exits 2 against BS where upstream renders the table and exits 0. `ps l` is a hardware-discovery convenience surface; linstor-csi / piraeus-operator never call it, and the device-pool creation path (`ps create-device-pool`, cli-matrix `ps-cdp-*`) is unaffected. Tracked as a missing wire field to populate on the physical-storage DTO; whitelisted until then. |

## Open (block merge until addressed)

These rows are **NOT** whitelisted on purpose — they appear in the audit but block any future refresh, so an open issue stays visible.
Expand Down
40 changes: 27 additions & 13 deletions stand/csi-sanity-job.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -38,12 +38,30 @@ spec:
- {name: CSI_ENDPOINT, value: "unix:///csi/csi.sock"}
# mTLS endpoint. golinstor (which linstor-csi wraps) upgrades
# to HTTPS and presents the client cert when LS_USER_*/
# LS_ROOT_CA point at PEM files. The Service exposes ONLY
# :3371 now, so the old http://...:3370 endpoint is gone.
# LS_ROOT_CA are set. The Service exposes ONLY :3371 now, so
# the old http://...:3370 endpoint is gone.
#
# IMPORTANT: golinstor v0.58.0 (vendored by piraeus-csi
# v1.10.1) reads LS_ROOT_CA / LS_USER_CERTIFICATE / LS_USER_KEY
# as PEM *content*, not file paths — the env value is fed
# straight into x509.CertPool.AppendCertsFromPEM /
# tls.X509KeyPair. The path-based LS_*_FILE variants only exist
# from golinstor v0.61.0. Passing a path made golinstor try to
# PEM-decode the literal string "/etc/linstor/client/ca.crt"
# and fail with "failed to get a valid certificate from
# 'LS_ROOT_CA'", so linstor-csi never opened the socket and the
# whole Job hung until the harness wait timed out. Inject the
# PEM bytes directly from the cert-manager secret instead.
- {name: LS_CONTROLLERS, value: "https://blockstor-controller.blockstor-system.svc:3371"}
- {name: LS_USER_CERTIFICATE, value: "/etc/linstor/client/tls.crt"}
- {name: LS_USER_KEY, value: "/etc/linstor/client/tls.key"}
- {name: LS_ROOT_CA, value: "/etc/linstor/client/ca.crt"}
- name: LS_USER_CERTIFICATE
valueFrom:
secretKeyRef: {name: blockstor-apiserver-client-tls, key: tls.crt}
- name: LS_USER_KEY
valueFrom:
secretKeyRef: {name: blockstor-apiserver-client-tls, key: tls.key}
- name: LS_ROOT_CA
valueFrom:
secretKeyRef: {name: blockstor-apiserver-client-tls, key: ca.crt}
# Resolved from a downward-API spec.nodeName if the job
# is scheduled onto a worker, otherwise hard-coded to a
# known stand worker. The recorder e2e6 stand uses
Expand All @@ -52,13 +70,12 @@ spec:
- name: BLOCKSTOR_NODE
valueFrom:
fieldRef: {fieldPath: spec.nodeName}
# The client cert (tls.crt/tls.key/ca.crt) from
# blockstor-apiserver-client-tls is injected as PEM content via
# the LS_USER_*/LS_ROOT_CA env vars above (golinstor v0.58.0 wants
# content, not a mount path), so no volume mount is needed here.
volumeMounts:
- {name: csi, mountPath: /csi}
# cert-manager-issued client cert (tls.crt/tls.key/ca.crt)
# presented to the apiserver's RequireAndVerifyClientCert
# listener. No reloader.stakater.com — the Job is short-lived
# so cert rotation is out of scope for it.
- {name: client-tls, mountPath: /etc/linstor/client, readOnly: true}

- name: csi-sanity
image: golang:1.25
Expand Down Expand Up @@ -92,9 +109,6 @@ spec:
- {name: csi, emptyDir: {}}
- name: scparams
configMap: {name: csi-sanity-params}
- name: client-tls
secret:
secretName: blockstor-apiserver-client-tls

---
apiVersion: v1
Expand Down