Commit 298ca8b

feat(shared/safety): add retry, with-timeout, file-lock, dry-run, confirm-dangerous, and audit-event scripts with docs
1 parent f2f6def commit 298ca8b

13 files changed

Lines changed: 1410 additions & 0 deletions

Lines changed: 80 additions & 0 deletions
@@ -0,0 +1,80 @@
1+
# audit-event.sh
2+
3+
## Purpose
4+
Emit structured JSON audit events for operational actions, outcomes, and metadata.
5+
6+
## Location
7+
`shared/safety/audit-event.sh`
8+
9+
## Preconditions
10+
- Required tools: `bash`, `date`
11+
- Required permissions: write permissions for `--output` target file (if used)
12+
- Required environment variables: optional `AUDIT_ACTOR`
13+
14+
## Arguments
15+
| Flag | Required | Default | Description |
16+
|------|----------|---------|-------------|
17+
| `--action TEXT` | Yes | N/A | Action identifier |
18+
| `--actor TEXT` | No | `AUDIT_ACTOR` or `USER` | Acting principal |
19+
| `--target TEXT` | No | empty | Target resource id |
20+
| `--status VALUE` | No | `info` | `info\|success\|failure\|warning` |
21+
| `--message TEXT` | No | empty | Human-readable message |
22+
| `--event-id ID` | No | generated id | Event correlation id |
23+
| `--source TEXT` | No | script basename | Originating component |
24+
| `--meta KEY=VALUE` | No | none | Metadata pair (repeatable, last key wins) |
25+
| `--timestamp-format F` | No | `%Y-%m-%dT%H:%M:%S%z` | Timestamp format |
26+
| `--output FILE` | No | stdout | Append event to file |
27+
| `--pretty` | No | `false` | Multi-line JSON output |
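
The repeatable `--meta KEY=VALUE` flag with "last key wins" semantics could be handled roughly as follows. This is a minimal sketch, not the script's actual code: the `add_meta` function, the `META` array, and the key pattern are assumptions made for illustration, and associative arrays require Bash 4 or later.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of repeatable KEY=VALUE parsing (requires Bash 4+).
# The names and the key pattern are illustrative, not taken from audit-event.sh.

declare -A META=()

add_meta() {
  local pair=$1
  # A pair must contain at least one '='.
  case $pair in
    *=*) ;;
    *) echo "malformed meta pair: $pair" >&2; return 2 ;;
  esac
  local key=${pair%%=*}    # text before the first '='
  local value=${pair#*=}   # text after the first '=' (may itself contain '=')
  # Assumed key rule: a letter or underscore, then letters, digits, '_', '.', '-'.
  if [[ ! $key =~ ^[A-Za-z_][A-Za-z0-9_.-]*$ ]]; then
    echo "invalid meta key: $key" >&2
    return 2
  fi
  META[$key]=$value        # assigning again overwrites, so the last value wins
}
```

Calling `add_meta env=staging` and then `add_meta env=prod` leaves a single `env` entry holding `prod`.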
28+
29+
## Scenarios
30+
- Happy path: emit audit JSON to stdout for stream ingestion.
31+
- Common operational path: append JSON lines to audit file.
32+
- Failure path: missing action, invalid status, or invalid meta key.
33+
- Recovery/rollback path: correct schema fields and rerun emit step.
34+
35+
## Usage
36+
```bash
shared/safety/audit-event.sh --action deploy.start --status info --target service/api
shared/safety/audit-event.sh --action deploy.finish --status success --meta version=1.4.2 --meta env=prod
shared/safety/audit-event.sh --action db.backup --status failure --message "snapshot timeout" --output /var/log/devops-audit.log
```
41+
42+
## Behavior
43+
- Main execution flow:
44+
- validate required fields and enums
45+
- generate timestamp/event id when absent
46+
- build escaped JSON payload
47+
- emit to stdout or append to output file
48+
- Idempotency notes: each execution emits a new event (non-idempotent by design).
49+
- Side effects: appends to audit file when `--output` is used.
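
The validate, escape, and emit steps above can be sketched in plain Bash. This is an illustration under stated assumptions rather than the script's implementation: `json_escape` and `emit_event` are hypothetical names, the JSON field names are guessed, and only the most common escapes are handled.

```bash
#!/usr/bin/env bash
# Hypothetical sketch; function and field names are not taken from audit-event.sh.

# Escape a string for use inside a JSON string literal.
# Other control characters are not handled in this sketch.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash first, so later escapes are not doubled
  s=${s//\"/\\\"}      # double quote
  s=${s//$'\n'/\\n}    # newline
  s=${s//$'\r'/\\r}    # carriage return
  s=${s//$'\t'/\\t}    # tab
  printf '%s' "$s"
}

# emit_event ACTION [STATUS] [MESSAGE]: print one compact JSON line to stdout.
emit_event() {
  local action=${1:-} status=${2:-info} message=${3:-}
  if [ -z "$action" ]; then
    echo "missing action" >&2
    return 2
  fi
  case $status in
    info|success|failure|warning) ;;
    *) echo "invalid status: $status" >&2; return 2 ;;
  esac
  local ts
  ts=$(date +%Y-%m-%dT%H:%M:%S%z)
  printf '{"timestamp":"%s","action":"%s","status":"%s","message":"%s"}\n' \
    "$ts" "$(json_escape "$action")" "$status" "$(json_escape "$message")"
}
```

Appending to a file instead of stdout is then a matter of redirecting with `>>`, which matches the append-only behavior described for `--output`.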
50+
51+
## Output
52+
- Standard output format: JSON object (compact by default, pretty with `--pretty`).
53+
- Exit codes:
54+
- `0` event emitted successfully
55+
- `2` validation/usage error
56+
57+
## Failure Modes
58+
- Common errors and likely causes:
59+
- missing `--action`
60+
- invalid `--status`
61+
- malformed `--meta` pair or invalid metadata key
62+
- output directory does not exist
63+
- Recovery and rollback steps:
64+
- correct required fields and enums
65+
- ensure output path parent exists and is writable
66+
- validate metadata format before emission
67+
68+
## Security Notes
69+
- Secret handling: avoid including credentials/tokens in message or metadata.
70+
- Least-privilege requirements: restrict write access to audit output path.
71+
- Audit/logging expectations: designed for SIEM/log pipeline ingestion.
72+
73+
## Testing
74+
- Unit tests:
75+
- status enum and metadata validation
76+
- JSON escaping logic
77+
- Integration tests:
78+
- append behavior to output files
79+
- Manual verification:
80+
- emit success/failure events and validate JSON schema
Lines changed: 71 additions & 0 deletions
@@ -0,0 +1,71 @@
1+
# confirm-dangerous.sh
2+
3+
## Purpose
4+
Enforce explicit human confirmation before running potentially destructive operations.
5+
6+
## Location
7+
`shared/safety/confirm-dangerous.sh`
8+
9+
## Preconditions
10+
- Required tools: `bash`, interactive stdin for prompted mode
11+
- Required permissions: none beyond script execution
12+
- Required environment variables: optional `CONFIRM_DANGEROUS`
13+
14+
## Arguments
15+
| Flag | Required | Default | Description |
16+
|------|----------|---------|-------------|
17+
| `--message TEXT` | No | destructive warning message | Context shown to operator |
18+
| `--prompt TEXT` | No | `Type '<token>' to continue` | Prompt text |
19+
| `--expect TOKEN` | No | `CONFIRM` | Required input token |
20+
| `-y`, `--yes` | No | `false` | Non-interactive bypass |
21+
| `--timeout SEC` | No | `0` | Prompt timeout in seconds |
22+
23+
## Scenarios
24+
- Happy path: operator types expected token and script exits `0`.
25+
- Common operational path: audited automation uses `--yes` or `CONFIRM_DANGEROUS=1`.
26+
- Failure path: mismatch, timeout, or non-interactive stdin without override exits `1`.
27+
- Recovery/rollback path: rerun with explicit approval and validated context.
28+
29+
## Usage
30+
```bash
shared/safety/confirm-dangerous.sh --message "About to delete production resources"
shared/safety/confirm-dangerous.sh --expect DELETE --prompt "Type DELETE to continue"
CONFIRM_DANGEROUS=1 shared/safety/confirm-dangerous.sh
```
35+
36+
## Behavior
37+
- Main execution flow:
38+
- check non-interactive overrides
39+
- require interactive input when override absent
40+
- compare response with expected token
41+
- Idempotency notes: idempotent; no mutable side effects.
42+
- Side effects: user interaction and stderr messaging.
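
A rough sketch of this flow, assuming a hypothetical `confirm_dangerous` function that takes the expected token and a yes-flag as positional arguments. It omits `--timeout` handling and custom prompt text, and is not the script's actual code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of the confirmation flow; not taken from confirm-dangerous.sh.

# confirm_dangerous [TOKEN] [ASSUME_YES]
# Returns 0 when confirmed, 1 when rejected or when no confirmation is possible.
confirm_dangerous() {
  local expect=${1:-CONFIRM} assume_yes=${2:-false}

  # 1. Non-interactive overrides: an explicit yes-flag or CONFIRM_DANGEROUS=1.
  if [ "$assume_yes" = true ] || [ "${CONFIRM_DANGEROUS:-}" = 1 ]; then
    return 0
  fi

  # 2. Without an override, stdin must be a terminal.
  if [ ! -t 0 ]; then
    echo "confirmation required but stdin is not interactive" >&2
    return 1
  fi

  # 3. Prompt on stderr and compare the reply with the expected token.
  local reply
  printf "Type '%s' to continue: " "$expect" >&2
  IFS= read -r reply || return 1
  if [ "$reply" = "$expect" ]; then
    return 0
  fi
  echo "confirmation token did not match" >&2
  return 1
}
```

A caller would typically gate the destructive step on the result, for example `confirm_dangerous DELETE && ./cleanup.sh`, where `./cleanup.sh` stands in for any destructive command.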
43+
44+
## Output
45+
- Standard output format: confirmation status messages on stderr.
46+
- Exit codes:
47+
- `0` confirmed
48+
- `1` confirmation rejected/timed out/unavailable
49+
- `2` invalid script arguments
50+
51+
## Failure Modes
52+
- Common errors and likely causes:
53+
- running in non-interactive session without `--yes`
54+
- wrong confirmation token entered
55+
- read timeout reached
56+
- Recovery and rollback steps:
57+
- rerun in interactive shell or with explicit audited override
58+
- confirm correct token before retry
59+
60+
## Security Notes
61+
- Secret handling: avoid embedding secrets in prompt text.
62+
- Least-privilege requirements: no elevated permissions required.
63+
- Audit/logging expectations: pair with audit logging to record approval context.
64+
65+
## Testing
66+
- Unit tests:
67+
- option validation and token matching
68+
- Integration tests:
69+
- non-interactive behavior with and without overrides
70+
- Manual verification:
71+
- interactive acceptance/rejection and timeout paths
Lines changed: 73 additions & 0 deletions
@@ -0,0 +1,73 @@
1+
# dry-run.sh
2+
3+
## Purpose
4+
Standardize dry-run behavior for shell automation by printing command intent without executing mutations.
5+
6+
## Location
7+
`shared/safety/dry-run.sh`
8+
9+
## Preconditions
10+
- Required tools: `bash`
11+
- Required permissions: execute permission for wrapped command when not in dry-run mode
12+
- Required environment variables: optional `DRY_RUN`
13+
14+
## Arguments
15+
| Flag | Required | Default | Description |
16+
|------|----------|---------|-------------|
17+
| `--dry-run` | No | from `DRY_RUN` env | Force dry-run mode |
18+
| `--execute` | No | from `DRY_RUN` env | Force execution mode |
19+
| `--prefix TEXT` | No | `DRY-RUN` | Prefix for dry-run message |
20+
| `--quiet` | No | `false` | Suppress dry-run message |
21+
| `-- COMMAND [ARGS...]` | Yes | N/A | Command to print/execute |
22+
23+
## Scenarios
24+
- Happy path: in dry-run mode, command is printed and not executed.
25+
- Common operational path: CI pipeline toggles execution via `DRY_RUN`.
26+
- Failure path: missing command or bad flags returns usage error.
27+
- Recovery/rollback path: rerun with `--execute` after validation.
28+
29+
## Usage
30+
```bash
shared/safety/dry-run.sh --dry-run -- terraform apply
DRY_RUN=1 shared/safety/dry-run.sh -- kubectl delete ns temp
shared/safety/dry-run.sh --execute -- ./migrate.sh
```
35+
36+
## Behavior
37+
- Main execution flow:
38+
- resolve dry-run mode from env/flags
39+
- validate command input
40+
- print quoted command in dry-run mode or execute directly
41+
- Idempotency notes: dry-run branch is non-mutating; execution branch depends on wrapped command.
42+
- Side effects: none in dry-run mode; wrapped command side effects in execute mode.
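
The mode resolution and quoted printing can be sketched with Bash's `printf %q`. The `maybe_run` function below is hypothetical, and it assumes that an explicit flag overrides the `DRY_RUN` environment variable and that `DRY_RUN=1` means dry-run; the real script may resolve precedence differently.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a dry-run wrapper; not taken from dry-run.sh.

# maybe_run [--dry-run|--execute] -- COMMAND [ARGS...]
maybe_run() {
  local dry_run=${DRY_RUN:-0}          # environment supplies the default
  case ${1:-} in
    --dry-run) dry_run=1; shift ;;     # assumed: flags override the environment
    --execute) dry_run=0; shift ;;
  esac
  if [ "${1:-}" = "--" ]; then
    shift
  fi
  if [ "$#" -eq 0 ]; then
    echo "usage: maybe_run [--dry-run|--execute] -- COMMAND [ARGS...]" >&2
    return 2
  fi
  if [ "$dry_run" = 1 ]; then
    local quoted
    printf -v quoted '%q ' "$@"        # shell-quote every argument
    echo "DRY-RUN: ${quoted% }" >&2    # print intent on stderr, run nothing
    return 0
  fi
  "$@"                                 # execute mode: run the command as given
}
```

Quoting with `%q` keeps arguments that contain spaces unambiguous, so the printed line can be pasted back into a shell.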
43+
44+
## Output
45+
- Standard output format:
46+
- dry-run mode: `<prefix>: <quoted command>` to stderr (unless `--quiet`)
47+
- execute mode: wrapped command output
48+
- Exit codes:
49+
- `0` dry-run success or wrapped command success
50+
- wrapped command non-zero exit in execute mode
51+
- `2` invalid script arguments
52+
53+
## Failure Modes
54+
- Common errors and likely causes:
55+
- no command passed after `--`
56+
- conflicting assumptions about dry-run state (for example, `DRY_RUN` set in the environment while the caller expects execution)
57+
- Recovery and rollback steps:
58+
- pass explicit `--dry-run`/`--execute` for clarity
59+
- verify command arguments before execution mode
60+
61+
## Security Notes
62+
- Secret handling: dry-run output may include full arguments; avoid secret-bearing args in logged channels.
63+
- Least-privilege requirements: no elevated privileges for dry-run path.
64+
- Audit/logging expectations: useful for change preview in approval workflows.
65+
66+
## Testing
67+
- Unit tests:
68+
- env/flag precedence for dry-run mode
69+
- command quoting behavior
70+
- Integration tests:
71+
- verify no mutation occurs in dry-run mode
72+
- Manual verification:
73+
- compare behavior for `--dry-run` vs `--execute`
Lines changed: 74 additions & 0 deletions
@@ -0,0 +1,74 @@
1+
# file-lock.sh
2+
3+
## Purpose
4+
Provide exclusive lock-based command execution to prevent concurrent unsafe operations.
5+
6+
## Location
7+
`shared/safety/file-lock.sh`
8+
9+
## Preconditions
10+
- Required tools: `bash`, `mkdir`, `rm`, `date`, `awk`, `sleep`, `stat`
11+
- Required permissions: write permissions on lock path parent directory
12+
- Required environment variables: none
13+
14+
## Arguments
15+
| Flag | Required | Default | Description |
16+
|------|----------|---------|-------------|
17+
| `--lock-file PATH` | Yes | N/A | Lock path (created as a directory via atomic `mkdir`) |
18+
| `--timeout SEC` | No | `0` | Wait timeout (`0` waits indefinitely) |
19+
| `--poll-interval SEC` | No | `0.2` | Poll interval while waiting |
20+
| `--stale-after SEC` | No | `0` | Break stale lock older than SEC |
21+
| `--quiet` | No | `false` | Suppress wait/stale logs |
22+
| `-- COMMAND [ARGS...]` | Yes | N/A | Command to run while lock is held |
23+
24+
## Scenarios
25+
- Happy path: lock acquired immediately; command runs and lock is released.
26+
- Common operational path: multiple workers serialize writes safely.
27+
- Failure path: lock wait timeout expires or lock path cannot be created.
28+
- Recovery/rollback path: investigate stale lock owner, tune `--stale-after`, retry safely.
29+
30+
## Usage
31+
```bash
shared/safety/file-lock.sh --lock-file /tmp/deploy.lock -- ./deploy.sh
shared/safety/file-lock.sh --lock-file /tmp/state.lock --timeout 60 -- terraform apply
shared/safety/file-lock.sh --lock-file /tmp/sync.lock --stale-after 300 -- ./sync-state.sh
```
36+
37+
## Behavior
38+
- Main execution flow:
39+
- attempt atomic lock acquisition via `mkdir`
40+
- optionally break stale lock
41+
- run command under lock
42+
- release lock on exit/signals
43+
- Idempotency notes: lock handling is idempotent for same process lifecycle.
44+
- Side effects: lock directory create/remove and metadata file writes.
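
The `mkdir`-based acquisition and release-on-exit steps can be sketched as below. The `with_lock` function is hypothetical; it polls once per second with whole-second timeouts, and it leaves out stale-lock breaking and the owner metadata file, so treat it as an outline of the technique rather than the script's code.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of mkdir-based locking; not taken from file-lock.sh.

# with_lock LOCK_DIR TIMEOUT_SEC COMMAND [ARGS...]
# TIMEOUT_SEC of 0 waits indefinitely; on timeout the function returns 73.
with_lock() {
  local lock_dir=$1 timeout=$2
  shift 2
  local waited=0
  # mkdir either creates the directory or fails, so only one caller can win.
  until mkdir "$lock_dir" 2>/dev/null; do
    if [ "$timeout" -gt 0 ] && [ "$waited" -ge "$timeout" ]; then
      echo "timeout waiting for lock: $lock_dir" >&2
      return 73
    fi
    sleep 1
    waited=$((waited + 1))
  done
  (
    # Release the lock whenever this subshell exits, including on failure.
    trap 'rmdir "$lock_dir" 2>/dev/null' EXIT
    "$@"
    exit "$?"    # pass the wrapped command's status through
  )
}
```

Running the command in a subshell keeps the `EXIT` trap scoped to the locked section, so the caller's own traps are left untouched.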
45+
46+
## Output
47+
- Standard output format: wrapped command output; lock status logs on stderr unless `--quiet`.
48+
- Exit codes:
49+
- wrapped command exit code
50+
- `73` timeout waiting for lock
51+
- `2` invalid script arguments
52+
53+
## Failure Modes
54+
- Common errors and likely causes:
55+
- missing or invalid `--lock-file`
56+
- lock path parent not writable
57+
- stale lock removal failure
58+
- Recovery and rollback steps:
59+
- correct filesystem permissions
60+
- inspect stale lock metadata (`.owner`)
61+
- manually clear lock only after owner validation
62+
63+
## Security Notes
64+
- Secret handling: do not store secret values in lock path names.
65+
- Least-privilege requirements: only filesystem permissions required for lock location.
66+
- Audit/logging expectations: lock wait/timeout logs useful for concurrency incident analysis.
67+
68+
## Testing
69+
- Unit tests:
70+
- timeout and stale lock validation logic
71+
- Integration tests:
72+
- concurrent process contention and serialization behavior
73+
- Manual verification:
74+
- run two commands against same lock path and confirm mutual exclusion
Lines changed: 77 additions & 0 deletions
@@ -0,0 +1,77 @@
1+
# retry.sh
2+
3+
## Purpose
4+
Retry a command on failure with configurable attempts, delay, backoff, and retry-code filtering.
5+
6+
## Location
7+
`shared/safety/retry.sh`
8+
9+
## Preconditions
10+
- Required tools: `bash`, `awk`, `sleep`, `date`
11+
- Required permissions: execute permission for target command and this script
12+
- Required environment variables: none
13+
14+
## Arguments
15+
| Flag | Required | Default | Description |
16+
|------|----------|---------|-------------|
17+
| `--attempts N` | No | `3` | Total number of attempts |
18+
| `--delay SEC` | No | `1` | Initial delay before retry |
19+
| `--backoff FACTOR` | No | `2` | Multiplier applied to the delay after each failed attempt |
20+
| `--max-delay SEC` | No | `0` | Delay cap (`0` disables cap) |
21+
| `--jitter PERCENT` | No | `0` | Positive jitter percentage added to delay |
22+
| `--retry-on CODES` | No | all non-zero | Comma-separated exit codes to retry |
23+
| `--quiet` | No | `false` | Suppress retry logs |
24+
| `-- COMMAND [ARGS...]` | Yes | N/A | Command to execute and retry |
25+
26+
## Scenarios
27+
- Happy path: command succeeds on first attempt and exits `0`.
28+
- Common operational path: transient failures recover after one or more retries.
29+
- Failure path: command keeps failing or returns non-retryable status.
30+
- Recovery/rollback path: tune retry policy (`--retry-on`, attempts/delay) and rerun.
31+
32+
## Usage
33+
```bash
shared/safety/retry.sh --attempts 5 --delay 1 --backoff 2 -- curl -fsS https://example.com/health
shared/safety/retry.sh --retry-on 1,2,28 --attempts 4 --delay 0.5 -- terraform plan
shared/safety/retry.sh --quiet --attempts 3 -- make deploy
```
38+
39+
## Behavior
40+
- Main execution flow:
41+
- validate retry policy options
42+
- execute command
43+
- retry on eligible non-zero statuses until attempt limit
44+
- exit with command status
45+
- Idempotency notes: wrapper is idempotent; wrapped command may not be.
46+
- Side effects: repeated execution of wrapped command.
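
The retry loop with exponential backoff and exit-code filtering can be sketched as follows. The `retry` and `is_retryable` functions and their positional arguments are hypothetical stand-ins for the documented flags; `--max-delay`, `--jitter`, and option validation are left out of this sketch.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of a retry loop; not taken from retry.sh.

# Succeeds if exit code $1 is in comma-separated list $2 (empty list = any code).
is_retryable() {
  local code=$1 list=$2
  if [ -z "$list" ]; then
    return 0
  fi
  case ",$list," in
    *",$code,"*) return 0 ;;
    *) return 1 ;;
  esac
}

# retry ATTEMPTS DELAY BACKOFF RETRY_ON COMMAND [ARGS...]
retry() {
  local attempts=$1 delay=$2 backoff=$3 retry_on=$4
  shift 4
  local attempt=1 status
  while :; do
    status=0
    "$@" || status=$?
    if [ "$status" -eq 0 ]; then
      return 0
    fi
    # Stop on the last attempt or on a code outside the retry list.
    if [ "$attempt" -ge "$attempts" ] || ! is_retryable "$status" "$retry_on"; then
      return "$status"
    fi
    echo "attempt $attempt/$attempts failed (exit $status); retrying in ${delay}s" >&2
    sleep "$delay"
    # awk handles fractional delays, which shell arithmetic cannot.
    delay=$(awk -v d="$delay" -v b="$backoff" 'BEGIN { print d * b }')
    attempt=$((attempt + 1))
  done
}
```

For example, `retry 5 1 2 "" curl -fsS https://example.com/health` would wait 1, 2, 4, and 8 seconds between its five attempts.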
47+
48+
## Output
49+
- Standard output format: wrapped command output; retry logs on stderr unless `--quiet`.
50+
- Exit codes:
51+
- `0` command eventually succeeded
52+
- wrapped command exit code on final/non-retryable failure
53+
- `2` invalid script arguments
54+
55+
## Failure Modes
56+
- Common errors and likely causes:
57+
- invalid numeric options (`--attempts`, `--delay`, etc.)
58+
- missing command after `--`
59+
- bad retry code list format
60+
- Recovery and rollback steps:
61+
- correct option values
62+
- ensure command is present and executable
63+
- reduce retry scope for non-idempotent commands
64+
65+
## Security Notes
66+
- Secret handling: avoid printing secrets in wrapped command args or stderr.
67+
- Least-privilege requirements: run with minimum privileges required by wrapped command.
68+
- Audit/logging expectations: retry logs support incident and change tracing.
69+
70+
## Testing
71+
- Unit tests:
72+
- argument validation
73+
- retry-on filtering logic
74+
- Integration tests:
75+
- flaky command simulation with deterministic exit codes
76+
- Manual verification:
77+
- force failure then success and confirm retry timing/status behavior
