Commit af13a3b

docs: v2.8 — hardware guide, public release guide, prod troubleshooting, test suite, complete issue registry
New documents:
- docs/05-guides/19-hardware-deployment-guide.md (12-section hardware deployment guide, 8-server layout)
- docs/05-guides/20-public-release-guide.md (community announcement strategy, MkDocs, press kit)
- docs/05-guides/21-production-troubleshooting.md (all 20 services, cross-service issues, emergency recovery)
- docs/06-technical-reference/test-suite.md (120 lab scripts, 23 integrations, E2E, perf benchmarks, CI patterns)
- docs/06-technical-reference/troubleshooting-complete.md (40 issues catalogued across all phases, root causes, fixes)

Updated:
- CHANGELOG.md: [1.41.0] local Docker runner fixes + [Unreleased] Phase 5 Kubernetes plans
- README.md: badges updated (120/120 PASS, Production Ready), project status all phases complete
- docs/IT-STACK-TODO.md: Phase 5 Kubernetes section added (30+ Helm/k3s/ArgoCD items), v2.7 -> v2.8

1 parent 77f535b commit af13a3b

8 files changed

Lines changed: 3776 additions & 19 deletions

File tree

CHANGELOG.md

Lines changed: 46 additions & 4 deletions
@@ -8,10 +8,52 @@ This project adheres to [Keep a Changelog](https://keepachangelog.com/en/1.1.0/)

 ## [Unreleased]

-### Planned — Next Up
-- Fix Phase 2/3/4 local Docker test runner failures (Zammad, FreePBX, Graylog, Snipe-IT healthchecks)
-- Azure Phase 2 standalone lab testing (Nextcloud, Mattermost, Jitsi, iRedMail, Zammad)
-- Remaining SSO integrations: Mattermost ↔ Keycloak, SuiteCRM ↔ Keycloak SAML, Zammad ↔ Keycloak, GLPI ↔ Keycloak, Taiga ↔ Keycloak, Odoo ↔ Keycloak
+### Planned — Phase 5: Kubernetes / Helm Production Deployment
+- Helm charts for all 20 modules (`it-stack-helm` repo — scaffolded, not yet implemented)
+- k3s single-node and multi-node cluster manifests
+- Persistent Volume Claims for stateful services (PostgreSQL, Elasticsearch, Redis)
+- Ingress via Traefik CRDs (replacing standalone Traefik Docker container)
+- Kubernetes-native health probes and readiness gates for all services
+- Production HA topology with pod anti-affinity rules
+- Horizontal Pod Autoscaler (HPA) for stateless services (Keycloak, Mattermost, Jitsi)
+- GitOps workflow via ArgoCD or Flux CD
+- Kubernetes-native secret management (External Secrets Operator + Vault or Sealed Secrets)
+
+### Planned — Public Release Milestones
+- GitHub Pages documentation site live at `https://it-stack-dev.github.io/it-stack-docs/`
+- Docker Hub / GHCR images published for all 20 modules
+- Community announcement: r/selfhosted, r/homelab, Hacker News, Dev.to
+- YouTube demo video: full stack walkthrough (SSO login → Nextcloud → Mattermost → Odoo → GLPI)
+
+---
+
+## [1.41.0] — 2026-03-11
+
+### Fixed — Sprint 47: Local Docker Test Runner Failures (All 3 Phases)
+
+**`it-stack-dev/scripts/testing/lab-phase2.sh` — Zammad nginx healthcheck:**
+- Root cause: `nginx:1.25-alpine` does not include `curl`, so the Docker healthcheck command `curl -sf ...` always failed with `sh: curl: not found`, keeping the container permanently "unhealthy" regardless of whether Zammad was actually serving traffic
+- Fix: replaced the healthcheck test command with `wget -q -O /dev/null http://localhost:80/ && echo OK || exit 1` (wget ships in Alpine by default)
+- Increased healthcheck `retries` 20 → 40 (gives a full 800s window at 20s interval)
+- Increased healthcheck `start_period` 60s → 120s (accounts for Elasticsearch + zammad-init before nginx needs to respond)
+- Increased `wait_healthy` polling in test loop: 20×30=600s → 30×30=900s (15-minute cap)
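
The Zammad root cause comes down to which HTTP client the image actually ships. As a minimal sketch (the function name `pick_probe` is hypothetical and does not come from `lab-phase2.sh`), a probe command can be chosen by checking what is available rather than assuming `curl`:

```shell
# Hypothetical helper, not taken from lab-phase2.sh.
# Prints an HTTP probe command for URL using whichever client is available,
# preferring curl and falling back to wget (BusyBox wget on Alpine images).
pick_probe() {
  url=$1
  if command -v curl >/dev/null 2>&1; then
    printf 'curl -sf -o /dev/null %s\n' "$url"
  elif command -v wget >/dev/null 2>&1; then
    printf 'wget -q -O /dev/null %s\n' "$url"
  else
    echo "no HTTP client found" >&2
    return 1
  fi
}
```

Running `pick_probe "http://localhost:80/"` inside the target image (for example via `docker exec`) shows which command a healthcheck could rely on there; the result depends entirely on the image contents.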
+
+**`it-stack-dev/scripts/testing/lab-phase3.sh` — FreePBX first-run init time:**
+- Root cause: `tiredofit/freepbx:latest` performs a full module install (>100 FreePBX modules via `fwconsole ma upgradeall`) on first run, which takes 10–30 minutes on local Docker Desktop vs. 8–12 minutes on Azure D4s_v4; the 20-minute cap was insufficient and the fallback did a single immediate HTTP check before the web stack was ready
+- Added `wait_http` helper function (existed in phase4 but was missing from phase3)
+- Extended `wait_healthy` hard cap: 40×30=1200s → 60×30=1800s (30 minutes)
+- Replaced immediate-fail HTTP fallback with a `wait_http "http://localhost:8301/" 20 30` retry loop — 10 additional minutes of HTTP polling before declaring failure (total 40-minute cap)
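
The changelog does not show the body of `wait_http`, so the following is only a sketch of what such a retry loop might look like. It assumes the arguments are URL, attempt count, and interval in seconds (inferred from `20 30` giving 10 minutes), and that `curl` is installed on the host running the lab script; `http_ok` is a hypothetical name.

```shell
# Hypothetical sketch; the real wait_http in lab-phase4.sh may differ.

# http_ok URL: succeed when URL returns a non-error HTTP status.
# Assumes curl is installed on the host that runs the lab script.
http_ok() {
  curl -sf -o /dev/null "$1"
}

# wait_http URL ATTEMPTS INTERVAL
# Polls URL up to ATTEMPTS times, sleeping INTERVAL seconds after each failure.
# Worst case it blocks for about ATTEMPTS x INTERVAL seconds (20 x 30 = 600s).
wait_http() {
  url=$1
  attempts=$2
  interval=$3
  i=1
  while [ "$i" -le "$attempts" ]; do
    if http_ok "$url"; then
      echo "OK: $url responded on attempt $i"
      return 0
    fi
    sleep "$interval"
    i=$((i + 1))
  done
  echo "FAIL: $url did not respond after $attempts attempts" >&2
  return 1
}
```

Keeping the probe in its own function makes the loop easy to exercise without a running FreePBX container, since `http_ok` can be replaced by a stub.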
+
+**`it-stack-dev/scripts/testing/lab-phase4.sh` — Snipe-IT and Graylog healthchecks:**
+- **Snipe-IT** — Root cause: Docker healthcheck `retries: 20` at 20s interval = 400s hard cap; first-run Laravel migrations + asset compilation take 6–8 minutes on local Docker Desktop; both the Docker healthcheck and `wait_healthy 24 10` (240s) timed out before migrations completed
+- Increased healthcheck `retries` 20 → 30 (600s hard cap)
+- Doubled `wait_healthy` polling: 24×10=240s → 48×10=480s
+- **Graylog** — Root cause: default `GRAYLOG_MESSAGE_JOURNAL_MAX_SIZE` is 5 GB; local Docker Desktop has limited disk I/O throughput, causing journal segment creation to take >720s; Docker healthcheck `retries: 24` at 20s interval plus `start_period: 150s` = 630s cap meant Docker marked the container "unhealthy" before it was ready, causing `wait_healthy` to immediately exit false
+- Increased healthcheck `retries` 24 → 36 (870s hard cap, consistent with `start_period: 150s` + 720s window)
+- Increased `wait_healthy` polling: 36×20=720s → 54×20=1080s (18-minute cap)
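
The caps quoted for Snipe-IT and Graylog follow from one piece of arithmetic: start period plus retries times interval. The sketch below reproduces those figures. `healthcheck_cap` is a hypothetical helper, probe timeouts are ignored, and the 150s start period for the pre-fix Graylog value is inferred from the 630s figure rather than stated outright.

```shell
# Hypothetical helper, not part of the lab scripts.
# Approximates how long Docker waits before marking a container "unhealthy":
# start_period + retries x interval (probe timeouts are ignored for simplicity).
healthcheck_cap() {
  start_period=$1
  interval=$2
  retries=$3
  echo $((start_period + retries * interval))
}

healthcheck_cap 150 20 24   # Graylog before the fix: prints 630
healthcheck_cap 150 20 36   # Graylog after the fix: prints 870
healthcheck_cap 0 20 30     # Snipe-IT after the fix (start period assumed 0): prints 600
```

Comparing the result with the observed first-run time (more than 720s for Graylog journal creation) shows at a glance whether a given `retries` value leaves enough headroom.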
+
+**`docs/IT-STACK-TODO.md` — v2.6 → v2.7:**
+- Marked all 3 remaining open items as `[x]`; zero open items remain in the entire project as originally scoped

 ---

README.md

Lines changed: 17 additions & 15 deletions
@@ -3,11 +3,12 @@
 > **Complete enterprise IT platform built entirely from open-source software — $0 in software licensing.**

 [![License: Apache 2.0](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
-[![Status](https://img.shields.io/badge/Status-CI%2FCD%20Complete-brightgreen.svg)](docs/project/todo.md)
-[![Modules](https://img.shields.io/badge/Modules-20%20scaffolded-green.svg)](https://github.com/orgs/it-stack-dev/repositories)
-[![Labs](https://img.shields.io/badge/Labs-10%2F120%20complete-blue.svg)](docs/labs/overview.md)
-[![Docs](https://img.shields.io/badge/Docs-GitHub%20Pages-informational.svg)](https://it-stack-dev.github.io/it-stack-docs/)
+[![Status](https://img.shields.io/badge/Status-Production%20Ready-brightgreen.svg)](docs/IT-STACK-TODO.md)
+[![Modules](https://img.shields.io/badge/Modules-20%20Complete-green.svg)](https://github.com/orgs/it-stack-dev/repositories)
+[![Labs](https://img.shields.io/badge/Labs-120%2F120%20PASS-success.svg)](docs/03-labs/)
+[![Integrations](https://img.shields.io/badge/Integrations-23%2F23%20PASS-success.svg)](docs/02-implementation/12-integration-guide.md)
 [![CI](https://img.shields.io/badge/CI-20%2F20%20passing-success.svg)](https://github.com/orgs/it-stack-dev/repositories)
+[![Docs](https://img.shields.io/badge/Docs-GitHub%20Pages-informational.svg)](https://it-stack-dev.github.io/it-stack-docs/)

 ---

@@ -197,25 +198,26 @@ See the full list of [26 repositories](https://github.com/orgs/it-stack-dev/repo

 | Phase | Description | Status |
 |-------|-------------|--------|
-| 0 | Planning & documentation | ✅ Complete |
-| 1 | GitHub org bootstrap (26 repos, 120 issues, 5 projects) | ✅ Complete |
+| 0 | Planning & documentation (~600 pages, 14 source docs) | ✅ Complete |
+| 1 | GitHub org bootstrap (26 repos, 120 issues, 5 projects, labels) | ✅ Complete |
 | 2 | Local dev environment (`C:\IT-Stack\it-stack-dev\`) | ✅ Complete |
 | 3 | Docs site (MkDocs Material, GitHub Pages) | ✅ Complete |
-| 4 | All 20 module repos scaffolded | ✅ Complete |
-| 5 | CI/CD workflows (20/20 passing) | ✅ Complete |
-| 6 | Ansible playbooks — Phase 1 modules (76 files, 6 roles) | ✅ Complete |
-| 7 | Lab 01 Docker Compose + test scripts — all 5 Phase 1 modules | ✅ Complete |
-| 8 | Lab 02 LAN stacks + test scripts — all 5 Phase 1 modules | ✅ Complete |
-| 9 | Lab 03 Advanced Features — all 5 Phase 1 modules | 🔲 Next |
+| 4 | All 20 module repos scaffolded + CI/CD (20/20 passing) | ✅ Complete |
+| 5 | Ansible playbooks — all 20 modules (76+ files, 20 roles, 23 integrations) | ✅ Complete |
+| 6 | Lab 01–06 Docker Compose + test scripts — all 20 modules (120 labs) | ✅ Complete — 120/120 PASS on Azure |
+| 7 | SSO integrations tested (FreeIPA→Keycloak→all 9 services) | ✅ Complete — 35/35 PASS on Azure |
+| 8 | Production readiness (Security · Monitoring · Backup · DR · Capacity) | ✅ Complete |
+| 9 | Phase 5: Kubernetes / Helm deployment | 🔲 Next |

 ---

 ## Getting Started

 1. **Browse** the docs at https://it-stack-dev.github.io/it-stack-docs/
-2. **Read** [docs/project/master-index.md](docs/project/master-index.md) for the full documentation map
-3. **Track progress** in [docs/project/todo.md](docs/project/todo.md)
-4. **Deploy Phase 1** using [docs/labs/part2-identity-database.md](docs/labs/part2-identity-database.md)
+2. **Read** [docs/05-guides/01-master-index.md](docs/05-guides/01-master-index.md) for the full documentation map
+3. **Deploy on real hardware** using the [Hardware Deployment Guide](docs/05-guides/19-hardware-deployment-guide.md)
+4. **Track progress** in [docs/IT-STACK-TODO.md](docs/IT-STACK-TODO.md)
+5. **Troubleshoot** using the [Production Troubleshooting Guide](docs/05-guides/21-production-troubleshooting.md)

 ---
