21 commits
599f067
MAF-19265: feat(helm): add MinIO, Loki, and Vector dependencies to th…
seongsu-dev Feb 20, 2026
a91acbc
MAF-19265: feat(website): Update package-lock.json and enhance prereq…
seongsu-dev Feb 20, 2026
05cd757
MAF-19265: docs(AGENTS): expand agent self-improvement and design pri…
seongsu-dev Feb 20, 2026
2581b86
MAF-19265: feat(deploy): Update Helm chart dependencies for MoAI Infe…
seongsu-dev Feb 20, 2026
31ed7fd
MAF-19265: chore(deploy): replace bitnami minio chart with official m…
seongsu-dev Feb 20, 2026
afc58e9
MAF-19265: refactor(deploy): update service names for Loki and MinIO …
seongsu-dev Feb 20, 2026
24669ac
MAF-19265: feat(deploy): Update MinIO configuration in values.yaml an…
seongsu-dev Feb 20, 2026
c310d61
MAF-19265: feat(docs): create AGENTS.md files for test and helm direc…
seongsu-dev Feb 20, 2026
97427bf
MAF-19265: refactor(helm): improve helm-lint command to handle multip…
seongsu-dev Feb 20, 2026
67bc88b
MAF-19265: chore(docs): update README.md for MinIO configuration and …
seongsu-dev Feb 20, 2026
53aa0a8
MAF-19265: feat(deploy): enhance log collection configuration in Helm…
seongsu-dev Feb 20, 2026
0eb18f0
MAF-19265: docs(log-collection): enhance log collection documentation…
seongsu-dev Feb 20, 2026
43550ca
MAF-19265: docs(prerequisites): update YAML configuration examples fo…
seongsu-dev Feb 20, 2026
f36206b
MAF-19265: docs(prerequisites): clarify storage-class requirements in…
seongsu-dev Feb 20, 2026
f5081ca
MAF-19265: feat(website): Update AGENTS.md and log-collection.mdx for…
seongsu-dev Feb 20, 2026
32fafb6
MAF-19265: docs(AGENTS, log-collection): standardize output formattin…
seongsu-dev Feb 20, 2026
3e60b77
MAF-19265: docs(log-collection): update log query language references…
seongsu-dev Feb 20, 2026
dda854f
MAF-19265: docs(values.yaml, prerequisites, log-collection): update c…
seongsu-dev Feb 23, 2026
d600e5c
MAF-19265: chore(values.yaml, log-collection): streamline resource co…
seongsu-dev Feb 23, 2026
877b1e5
MAF-19265: chore(README.md, values.yaml): simplify configuration opti…
seongsu-dev Feb 23, 2026
4e7a4ab
MAF-19265: docs(log-collection): improve output formatting for comman…
seongsu-dev Feb 23, 2026
83 changes: 41 additions & 42 deletions AGENTS.md
@@ -57,45 +57,44 @@ The commit message should be structured as follows:

### E2E Test

- **Version scope**:
  - E2E tests cover only `vX.Y.Z` (release) and `vX.Y.Z-rc.N` (release candidate) version formats.
  - Other version formats (e.g. dev builds, custom tags) are out of scope and should not be tested in E2E.

- **Do not test resource specifications**:
  - Do not validate individual fields of the YAML file declaring the resource (resource spec).
  - Instead, create the resource and verify that its status reaches the expected state.

- **Assume fully controlled cluster**:
  - Do not check if components are already installed.
  - Assume the cluster is fully controlled by the test and installed components are safe to overwrite or delete.

- **Test suite layout**:
  - Split tests by purpose under `test/e2e`, for example `test/e2e/performance` and `test/e2e/quality`.
  - In each directory, define shared Ginkgo configuration (labels, timeouts, common hooks) in `suite_test.go`, and keep scenarios in separate `*_test.go` files.
  - Shared configuration values must come from the `test/utils/settings` package instead of hard-coded constants in test files.

- **Environment variable management**:
  - Manage all E2E environment variables centrally in `test/e2e/envs/env_vars.go`.
  - When a new environment variable is required:
    - Add it to the `envVars` slice with default value, description, category, and type.
    - Expose it via public variables (for example `TestModel`, `HFToken`) and access it only through those variables.
    - Do not call `os.Getenv` directly in test code.
  - Keep the documentation consistent: changes must pass the `validateEnvVars()` check.

- **Resource templates and settings**:
  - Manage Kubernetes resource specifications for Gateway, InferenceService, Jobs, and similar resources as Go templates (`.yaml.tmpl`) under `test/config/**`.
  - Tests must read template paths and default values from constants in `test/utils/settings/constants.go`.
  - When adding a new benchmark or performance test Job:
    - Add the template file under an appropriate `test/config/<domain>` subdirectory.
    - Define the corresponding path and default parameters in the `settings` package.

- **Utility reuse**:
  - Implement all cluster manipulation logic (namespace creation, Gateway create/delete, Heimdall install/uninstall, InferenceService(Template) create/delete, etc.) in the `test/utils` package and call only those helpers from tests.
  - Follow this pattern for scenario flow:
    - `BeforeAll`: create namespace → install Gateway → install Heimdall → create InferenceServiceTemplates → create InferenceServices → wait until they are Ready.
    - `AfterAll`: if `envs.SkipCleanup` is `false`, clean up the above resources in reverse order.
    - `It(...)`: render the Job template → create the Job with `kubectl create -f -` → wait for completion with `kubectl wait` → collect logs and perform domain-specific assertions.

- **Makefile and workflow integration**:
  - Provide separate Make targets per test purpose (for example `e2e-performance`, `e2e-quality`) so that CI can run them independently.
  - GitHub Actions and other workflows should invoke these targets directly, and new test categories should follow the same pattern when adding additional targets and workflows.
See [`test/AGENTS.md`](test/AGENTS.md).

## Agent Self-Improvement

After completing any non-trivial task, evaluate whether the work involved:
- A recurring pattern that will likely appear again in future tasks, or
- A mistake that was corrected through user feedback, or
- A design decision that required deliberate reasoning to reach the right answer.

If any of the above applies, **record it in the most relevant `AGENTS.md`** before closing the task — this file for general patterns, [`test/AGENTS.md`](test/AGENTS.md) for test-specific patterns, [`deploy/helm/AGENTS.md`](deploy/helm/AGENTS.md) for Helm chart patterns, and [`website/AGENTS.md`](website/AGENTS.md) for documentation patterns. Entries should be concise, actionable, and placed under the most relevant existing section. If no section fits, create one.

The goal is to make every repeated task faster and every repeated mistake impossible.

### Creating Sub-directory AGENTS.md Files

When a directory accumulates enough domain-specific rules to warrant separation, create a dedicated `AGENTS.md` in that directory. Follow this checklist:

1. **Create `AGENTS.md`** in the target directory with a header that links back to this root file:
```markdown
# <Domain> — Agent Rules

Rules specific to the `<dir>/` directory. General contribution guidelines are in the root [`AGENTS.md`](/AGENTS.md).
```

2. **Create a `CLAUDE.md` symlink** pointing to `AGENTS.md` in the same directory. Cursor reads `CLAUDE.md` as context; the symlink ensures both tools see the same content:
```shell
cd <dir> && ln -s AGENTS.md CLAUDE.md
```

3. **Move the relevant sections** from the root `AGENTS.md` (or parent `AGENTS.md`) into the new file. Replace the moved content in the parent with a one-line reference:
```markdown
### E2E Test

See [`test/AGENTS.md`](test/AGENTS.md).
```

4. **Update the Agent Self-Improvement section** in the parent to mention the new file as a recording target.

## Helm Charts

See [`deploy/helm/AGENTS.md`](deploy/helm/AGENTS.md) for design principles and chart development rules.
6 changes: 5 additions & 1 deletion Makefile
@@ -19,7 +19,11 @@ help: ## Display this help.

.PHONY: helm-lint
helm-lint: ## Lint Helm charts.
	@helm lint ./deploy/helm/*
	@for chart in ./deploy/helm/*; do \
		if [ -d "$$chart" ] && [ -f "$$chart/Chart.yaml" ]; then \
			helm lint "$$chart"; \
		fi; \
	done

.PHONY: helm-docs
helm-docs: ## Generate Helm chart documentation.
130 changes: 130 additions & 0 deletions deploy/helm/AGENTS.md
@@ -0,0 +1,130 @@
# Helm Charts — Agent Rules

Rules specific to the `deploy/helm/` directory. General contribution guidelines are in the root [`AGENTS.md`](/AGENTS.md).

## Design Principles

### Minimum Necessary Complexity

- **Do not add configuration options, fields, or abstractions for hypothetical future use cases.** Only add what the current task concretely requires.
- Before introducing a new value field, ask: "Is there a real, current use case that cannot be handled without it?" If the answer is no, omit the field and handle the edge case through documentation instead.
- Example: when considering whether to add a `minio.externalHost` field to support cross-namespace MinIO, the right answer was to document that users can point `loki.storage.s3.endpoint` to the external host directly — no new field needed.

### Documentation over Code for Edge Cases

- When a behavior difference only arises in a non-default, edge-case configuration, prefer documenting the workaround over adding a dedicated code path or configuration key.
- Reserve code changes for cases where the default path is broken or the workaround is genuinely error-prone.

### Reject Designs Before They Are Built

- If an initial design is heading in the wrong direction (e.g., standalone prerequisites instead of sub-chart dependencies, `enabled: false` defaults, nested config instead of top-level sections), raise the issue and redesign before writing code. Retrofitting a wrong structure usually costs more than redesigning before any code is written.

## Helm Chart Development

### Sub-chart Integration

- **All infrastructure components belong as sub-chart dependencies** of `moai-inference-framework`. Do not design them as standalone prerequisites that users install separately.
- **Enablement convention**: Every sub-chart dependency must have both a `condition:` entry in `Chart.yaml` AND `enabled: true` in the default `values.yaml`. Setting `enabled: false` as the default breaks the "install everything in one chart" philosophy. Follow the same pattern as existing components (`keda`, `lws`, `odin`, etc.).

```yaml
# Chart.yaml: always add condition: and use the official repository
- name: vector
  version: 0.39.0
  repository: https://helm.vector.dev
  condition: vector.enabled

# values.yaml: always default to true
vector:
  enabled: true
```

- **Official repositories**: Always use the chart's official upstream repository, not a mirror.
  - loki: `https://grafana.github.io/helm-charts`
  - vector: `https://helm.vector.dev`
  - minio: `https://charts.min.io`

### Dynamic Service Name References

- **Do not use `fullnameOverride`** to fix service names. Instead, build references using `.Release.Name` so that names are always consistent with whatever release name the user chooses.

```yaml
# templates/grafana/datasource-loki.yaml
url: http://{{ .Release.Name }}-loki-gateway.{{ include "common.names.namespace" . }}.svc.cluster.local

# templates/loki/credentials.yaml
BUCKET_HOST: {{ printf "%s-minio" .Release.Name | quote }}
```

- In sub-chart `customConfig` values rendered through `tpl`, use `{{ .Release.Name }}` directly — it is evaluated by the sub-chart's `tpl` call and resolves to the parent release name.

```yaml
# values.yaml (vector customConfig) — .Release.Name evaluated by tpl
endpoint: "http://{{ .Release.Name }}-loki-gateway"
```
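
Helm templates are Go `text/template` templates, so the effect of building a service name from `.Release.Name` can be reproduced with the standard library alone. The sketch below is an illustration under assumptions: `lokiGatewayURL` is a hypothetical helper, and a plain `.Namespace` field stands in for the `common.names.namespace` include, which only exists inside Helm.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// urlTemplate mirrors the datasource URL pattern; .Namespace replaces the Helm-only include.
const urlTemplate = "http://{{ .Release.Name }}-loki-gateway.{{ .Namespace }}.svc.cluster.local"

// lokiGatewayURL renders the URL for a given release name and namespace.
func lokiGatewayURL(release, namespace string) (string, error) {
	t, err := template.New("url").Parse(urlTemplate)
	if err != nil {
		return "", err
	}
	data := map[string]any{
		"Release":   map[string]any{"Name": release},
		"Namespace": namespace,
	}
	var sb strings.Builder
	if err := t.Execute(&sb, data); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// The service host follows whatever release name the user picks.
	for _, release := range []string{"mif", "staging"} {
		url, err := lokiGatewayURL(release, "logging")
		if err != nil {
			fmt.Println("render failed:", err)
			return
		}
		fmt.Println(url)
	}
}
```

Because the name is derived at render time, two releases in the same cluster get distinct service references without a hard-coded `fullnameOverride`.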

### Separation of Concerns in values.yaml

- **Large infrastructure components must be top-level sections**, not nested under their consumers. For example, MinIO configuration belongs at `minio:`, not at `loki.minio:`. This allows MinIO to be independently enabled/disabled and reused by other components in the future.

### MinIO Provisioning Pattern

- Use the `minio/minio` chart (`https://charts.min.io`), not the bitnami chart.
- Create buckets, users, and policies directly via the chart's top-level `buckets`, `users`, and `policies` fields (not under a `provisioning` key).
- Create a **dedicated user per consuming service** with a policy scoped to only its bucket — do not use root credentials for service-to-service access.

```yaml
minio:
  policies:
    - name: loki
      statements:
        - resources: ["arn:aws:s3:::loki/*"]
          effect: Allow
          actions: ["s3:*"]
  users:
    - accessKey: loki
      secretKey: "loki123!"
      policy: loki
  buckets:
    - name: loki
```

- Templates that read MinIO credentials must reference the `users` array directly:

```yaml
# credentials.yaml
stringData:
  AWS_ACCESS_KEY_ID: {{ (index .Values.minio.users 0).accessKey | quote }}
  AWS_SECRET_ACCESS_KEY: {{ (index .Values.minio.users 0).secretKey | quote }}
```
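
The `index .Values.minio.users 0` lookup can be tried outside Helm with Go's `text/template`. This is a minimal sketch under assumptions: the nested map stands in for the chart values, `renderCredentials` is a hypothetical helper, and `printf "%q"` replaces `quote`, which Helm adds and the standard library does not provide.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// credentialsTemplate reads the first entry of the users array, as the chart template does.
const credentialsTemplate = `AWS_ACCESS_KEY_ID: {{ (index .Values.minio.users 0).accessKey | printf "%q" }}
AWS_SECRET_ACCESS_KEY: {{ (index .Values.minio.users 0).secretKey | printf "%q" }}`

// renderCredentials renders the template against a values tree shaped like values.yaml.
func renderCredentials(values map[string]any) (string, error) {
	t, err := template.New("credentials").Parse(credentialsTemplate)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, map[string]any{"Values": values}); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	values := map[string]any{
		"minio": map[string]any{
			"users": []any{
				map[string]any{"accessKey": "loki", "secretKey": "loki123!", "policy": "loki"},
			},
		},
	}
	out, err := renderCredentials(values)
	if err != nil {
		fmt.Println("render failed:", err)
		return
	}
	fmt.Println(out)
}
```

Note that the lookup is positional: it always reads the first user, so the order of the `users` array matters once a second consuming service is added.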

### Helm `tpl` Passthrough — Vector Label Syntax

- The vector chart renders `customConfig` through Helm's `tpl` function (`{{ tpl (toYaml .Values.customConfig) . | indent 4 }}`). This means any `{{ }}` expression in `customConfig` is evaluated as a Go template at render time.
- To pass **Vector's own field-template syntax** (`{{ field }}`) through `tpl` without evaluation, use Go raw string literals:

```yaml
# values.yaml: correct
labels:
  namespace: "{{`{{ namespace }}`}}"

# values.yaml: WRONG, tpl evaluates {{ namespace }} as a Go template function
labels:
  namespace: "{{ namespace }}"
```

- **Before using `customConfig` with any sub-chart, always verify whether the chart applies `tpl` to it** by running `helm pull <chart> --version <ver> --untar` and inspecting the ConfigMap template.
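
Since `tpl` evaluates its input as a Go template, the difference between the two forms above can be checked with `text/template` directly. The sketch below is illustrative only: `renderValues` is a hypothetical helper, and it does not reproduce the vector chart itself.

```go
package main

import (
	"fmt"
	"strings"
	"text/template"
)

// renderValues parses and executes src the way a tpl call would treat a values string.
func renderValues(src string) (string, error) {
	t, err := template.New("values").Parse(src)
	if err != nil {
		return "", err
	}
	var sb strings.Builder
	if err := t.Execute(&sb, nil); err != nil {
		return "", err
	}
	return sb.String(), nil
}

func main() {
	// Escaped form: the action contains a raw string literal, which is emitted verbatim.
	escaped := "namespace: \"{{`{{ namespace }}`}}\""
	out, err := renderValues(escaped)
	fmt.Println(out, err)

	// Unescaped form: the template engine looks for a function named namespace and fails.
	unescaped := "namespace: \"{{ namespace }}\""
	_, err = renderValues(unescaped)
	fmt.Println("unescaped error:", err)
}
```

The escaped form survives rendering as Vector's own `{{ namespace }}` placeholder, while the unescaped form is rejected at parse time because no function called `namespace` is defined.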

### YAML Anchors

- **Do not use YAML anchors at the root level of `values.yaml`** (e.g., `_defaults: &defaults`). The anchor needs a holder key that no chart consumes, and such unknown root-level keys can trigger warnings or errors in Helm tooling. Instead, duplicate shared configuration explicitly for each component.

### MIF Pod Label Keys

When filtering or labeling logs, metrics, or other signals by MIF-specific pod attributes, use these label keys:

| Concept | Label key | Example value |
| :---------------- | :--------------------------- | :------------------ |
| Pool | `mif.moreh.io/pool` | `heimdall` |
| Role | `mif.moreh.io/role` | `prefill`, `decode` |
| App name | `app.kubernetes.io/name` | `vllm` |
| Inference service | `app.kubernetes.io/instance` | `llama-3-2-1b` |
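
As an illustration of how these keys might be combined, the sketch below builds an equality-based Kubernetes label selector string from a map of labels. The `selector` helper is hypothetical and not part of the repository; the example values come from the table above.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// selector joins labels into a comma-separated key=value selector.
// Keys are sorted so that the output is deterministic.
func selector(labels map[string]string) string {
	keys := make([]string, 0, len(labels))
	for k := range labels {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	parts := make([]string, 0, len(keys))
	for _, k := range keys {
		parts = append(parts, k+"="+labels[k])
	}
	return strings.Join(parts, ",")
}

func main() {
	// Select the prefill pods of one inference service.
	fmt.Println(selector(map[string]string{
		"mif.moreh.io/role":          "prefill",
		"app.kubernetes.io/instance": "llama-3-2-1b",
	}))
}
```

A string in this form can be passed to tools that accept label selectors, for example the `-l` flag of `kubectl get pods`.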
1 change: 1 addition & 0 deletions deploy/helm/CLAUDE.md
13 changes: 11 additions & 2 deletions deploy/helm/moai-inference-framework/Chart.lock
@@ -23,5 +23,14 @@ dependencies:
- name: node-feature-discovery
  repository: oci://registry.k8s.io/nfd/charts
  version: 0.18.3
digest: sha256:d7f75e788dca4192775595637ec123afa390e09eebcaef9e9c0e40ff46c23e23
generated: "2026-02-19T16:14:15.495286+09:00"
- name: minio
  repository: https://charts.min.io
  version: 5.4.0
- name: loki
  repository: https://grafana.github.io/helm-charts
  version: 6.30.0
- name: vector
  repository: https://helm.vector.dev
  version: 0.39.0
digest: sha256:85af11696c630ed9ac9ef85a7a18c8b821187a76e949e535019fc5b91d929ee8
generated: "2026-02-20T18:51:13.630416372+09:00"
12 changes: 12 additions & 0 deletions deploy/helm/moai-inference-framework/Chart.yaml
@@ -42,3 +42,15 @@ dependencies:
  version: 0.18.3
  repository: oci://registry.k8s.io/nfd/charts
  condition: nfd.enabled
- name: minio
  version: 5.4.0
  repository: https://charts.min.io
  condition: minio.enabled
- name: loki
  version: 6.30.0
  repository: https://grafana.github.io/helm-charts
  condition: loki.enabled
- name: vector
  version: 0.39.0
  repository: https://helm.vector.dev
  condition: vector.enabled