Merged
8 changes: 7 additions & 1 deletion .github/workflows/all_test.yml
@@ -65,10 +65,15 @@ jobs:
 # Scene selection:
 # - ci_top_attention_doc_page_build validates doc build through the prebuilt Docker image.
 # - ci_top_attention_bin_kvtest keeps the Rust kv_test entry under the testbed scene contract.
+# - ci_top_attention_mq_core keeps MQ correctness coverage inside the same CI testbed contract.
 suite["scenes"] = {
     key: value
     for key, value in suite["scenes"].items()
-    if key in ("ci_top_attention_doc_page_build", "ci_top_attention_bin_kvtest")
+    if key in (
+        "ci_top_attention_doc_page_build",
+        "ci_top_attention_bin_kvtest",
+        "ci_top_attention_mq_core",
+    )
 }

 # Profile selection:
@@ -91,6 +96,7 @@ jobs:
 # - Keep the original per-scene scales from ci_test_list.yaml.
 # - ci_top_attention_doc_page_build stays on n1_kvowner_dram_3gib.
 # - ci_top_attention_bin_kvtest stays on n1_kvowner_dram_20gib.
+# - ci_top_attention_mq_core stays on n1_kvowner_dram_20gib.

 out_path.write_text(
     yaml.safe_dump(suite, sort_keys=False, allow_unicode=False),
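
The scene selection in this workflow hunk is an allowlist filter over `suite["scenes"]`. The sketch below isolates that step as a function so it can be exercised without the workflow. The function name `select_ci_scenes`, the constant `CI_SCENE_ALLOWLIST`, and the fail-fast check for missing scenes are assumptions made for illustration; only the three scene names and the dict-comprehension shape come from the diff.

```python
# Hypothetical helper: the workflow inlines this logic; the names here are illustrative.
CI_SCENE_ALLOWLIST = (
    "ci_top_attention_doc_page_build",
    "ci_top_attention_bin_kvtest",
    "ci_top_attention_mq_core",
)


def select_ci_scenes(suite: dict, allowlist: tuple = CI_SCENE_ALLOWLIST) -> dict:
    """Return a shallow copy of ``suite`` whose ``scenes`` keep only allowlisted keys."""
    missing = [name for name in allowlist if name not in suite["scenes"]]
    if missing:
        # Assumption: fail loudly rather than silently run fewer scenes than intended.
        raise KeyError(f"scenes missing from suite: {missing}")
    filtered = dict(suite)
    filtered["scenes"] = {
        key: value
        for key, value in suite["scenes"].items()
        if key in allowlist
    }
    return filtered
```

The filter keeps the suite's own scene order and leaves the input mapping untouched, so the same loaded suite can be reused for other selections.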
3 changes: 3 additions & 0 deletions AGENTS.md
@@ -9,6 +9,9 @@ Keep this document concise.
 - Git operations are limited to basic `stage`, `unstage`, `commit`, and `push`. Do not use other Git operations.
 - Prefer contraction over compatibility by default. Do not add compatibility layers, deprecated paths, or aliases unless the task explicitly requires them.
 - Prefer one canonical name for one concept. Avoid synonym parameters, duplicated entrypoints, and parallel config surfaces.
+- For test entrypoints, match the real execution model directly. If a test is a standalone script/process test, invoke it as a script/process; do not wrap it in `pytest` just for uniformity.
+- Do not forward pytest-style flags (`-k`, `-q`, node selectors, etc.) through direct-process test wrappers unless the wrapper explicitly implements and documents that selector surface.
+- For new integration or process-lifecycle tests, prefer direct process startup with explicit arguments and explicit exit-code checks over adding new pytest-only wrappers.
 - Control branching deliberately. Prefer a small, explicit, enumerated set of supported branches in the style of a Rust enum over open-ended proliferation of near-duplicate cases.
 - When extending a surface, prefer folding the new case into an existing finite branch set. If a new branch is unavoidable, make it explicit, bounded, and easy to list exhaustively.
 - Names for testbed-scoped concepts should say `testbed` explicitly. Avoid generic names for testbed-only modes, ports, roots, workdirs, and other testbed-scoped settings.
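
The three added test-entrypoint rules ask for direct process startup, explicit arguments, and an explicit exit-code check. A minimal sketch of that pattern follows, using only the standard library. The function name `run_script_test` and the timeout default are hypothetical and do not come from the repository.

```python
import subprocess
import sys
from pathlib import Path


def run_script_test(script: Path, *args: str, timeout_s: float = 60.0) -> None:
    """Run a standalone test script as its own process and check its exit code."""
    completed = subprocess.run(
        [sys.executable, str(script), *args],  # explicit arguments, no pytest flags
        capture_output=True,
        text=True,
        timeout=timeout_s,
        check=False,  # the exit code is inspected explicitly below
    )
    if completed.returncode != 0:
        raise RuntimeError(
            f"{script} exited with {completed.returncode}\n"
            f"stdout:\n{completed.stdout}\nstderr:\n{completed.stderr}"
        )
```

Because the wrapper accepts only the script's own arguments, it exposes no selector surface such as `-k` or `-q`, which is the behaviour the second rule describes.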
3 changes: 3 additions & 0 deletions AGENTS_CN.md
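@@ -9,6 +9,9 @@
 - Git operations are limited to the basic `stage`, `unstage`, `commit`, and `push`. Do not use other Git operations
 - Prefer contraction over compatibility by default. Unless the task explicitly requires it, do not add compatibility layers, deprecated paths, or aliases
 - Prefer keeping only one canonical name per concept. Avoid synonym parameters, duplicated entrypoints, and parallel config surfaces
+- For test entrypoints, match the real execution model directly. If a test is in essence a standalone-script / standalone-process test, start it directly as a script / process; do not wrap it in an extra layer of `pytest` just for surface uniformity
+- For test wrappers that start processes directly, do not pass through pytest-style arguments such as `-k`, `-q`, or node selectors, unless the wrapper explicitly implements and documents that selector interface
+- When adding integration tests or process-lifecycle tests, prefer the pattern "direct process startup + explicit arguments + explicit exit-code checks" over adding more pytest-only wrapper layers
 - Control branching deliberately. Prefer a small, explicit, exhaustively listable finite set of branches in the style of a Rust enum, rather than an open-ended spread of near-duplicate branches
 - When extending a surface, prefer folding the new case into an existing finite branch set; if a new branch is truly unavoidable, keep it explicit, clearly bounded, and easy to list exhaustively
 - Concepts scoped to the testbed should carry `testbed` explicitly in their names. For settings that belong only to the testbed, such as mode, port, root, and workdir, avoid overly generic names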
72 changes: 69 additions & 3 deletions fluxon_doc_cn/design/teststack_1_当前架构与CI测试流程.md
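@@ -9,10 +9,11 @@
 **Stable conclusions:**

 - `teststack` consists of three layers:
-  - **suite compilation layer**: combines `scene × scale × profile` into executable cases;
-  - **testbed orchestration layer**: brings up the shared testbed, keeps the controller online, and schedules workload deployment through `fluxon_ops` / the ops interface
-  - **case execution layer**: handles prepare, execute, collect, and finalize for each case.
+  - **Upper layer: suite compilation layer**: combines `scene × scale × profile` into executable cases;
+  - **Middle layer: unified case plan / dispatch layer**: converges the compilation result into a unified `prepare / execute / collect / finalize` shell and dispatches by runtime backend
+  - **Lower layer: runtime backend execution layer**: hosts the concrete prepare, execute, collect, and finalize implementations of the `CI` backend and the `TEST_STACK` backend respectively
 - `test_runner.py` is the unified executor, covering `CI` cases, `TEST_STACK` benchmark cases, and the UI / GitOps integration entrypoints.
+- `test_runner.py` currently hosts mainly the upper and middle layers; `test_runner_runtime_backend.py` hosts the lower-layer runtime backend implementation.
 - `start_test_bed.py` is responsible only for starting the shared testbed and for controller-side apply orchestration; it does not take on general test execution.
 - `ci_2_virt_node.py` is the wrapper entrypoint for GitHub Actions / local two-logical-node CI; it chains packaging, dispatch, testbed startup, running the runner, building the docs, and other steps.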
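
The `scene × scale × profile` composition named in the suite compilation layer can be sketched as a per-scene Cartesian product. This is an illustration under stated assumptions, not the runner's code: the `ResolvedCase` dataclass, the `compile_cases` function, and the simplified scene shape with a top-level `select` mapping are all hypothetical.

```python
from dataclasses import dataclass
from itertools import product
from typing import Dict, List


@dataclass(frozen=True)
class ResolvedCase:
    """Hypothetical stand-in for one compiled case."""
    scene: str
    scale: str
    profile: str


def compile_cases(scenes: Dict[str, dict]) -> List[ResolvedCase]:
    """Expand each scene's selected scales and profiles into concrete cases."""
    cases: List[ResolvedCase] = []
    for scene_name, scene in scenes.items():
        select = scene["select"]  # assumed shape: {"scales": [...], "profiles": [...]}
        for scale, profile in product(select["scales"], select["profiles"]):
            cases.append(ResolvedCase(scene_name, scale, profile))
    return cases
```

Each scene contributes `len(scales) * len(profiles)` cases, so the combination space stays bounded by what the scene itself selects.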

@@ -56,6 +57,46 @@ flowchart TD
 B --> I[test_runner UI + GitOps]
 ```

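+### 4.0 Upper / middle / lower layering inside `test_runner`
+
+The "three layers of teststack" and the "three layers inside `test_runner`" need to be kept apart.
+
+`teststack` as a whole is still:
+
+- suite compilation layer
+- testbed orchestration layer
+- case execution layer
+
+Inside `test_runner` itself, however, the current stable implementation is further split into three layers:
+
+| Layer | Role | Current main location |
+| --- | --- | --- |
+| Upper | Parses the suite, selectors, and `scene/scale/profile`, and materializes `resolved_case` | `test_runner.py` |
+| Middle | Converges the different case families into a unified `_CasePlan` shell and handles unified dispatch | `test_runner.py` |
+| Lower | Executes the concrete runtime logic per runtime backend | `test_runner_runtime_backend.py` |
+
+The key points are:
+
+- **The upper layer unifies the schema and the case compilation model**;
+- **The middle layer unifies the `prepare / execute / collect / finalize` shell**;
+- **The lower layer is no longer split by `scene/scale/profile`, but by runtime backend**.
+
+This means:
+
+- `scene / scale / profile` remains a single unified input model;
+- the differences between `CI` and `TEST_STACK` fall mainly in the lower-layer runtime backend, not in the upper-layer schema.
+
+The current correspondence can be simplified as:
+
+```text
+scene / scale / profile
+  -> resolved_case
+    -> _CasePlan
+      -> runtime backend dispatch
+        -> CI backend
+        -> TEST_STACK backend
+```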
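
The middle-layer idea, one plan shell with a finite set of backends behind it, can be sketched with an enum and a dispatch table. Every name below (`RuntimeBackend`, `CasePlan`, `dispatch`, and the two stage functions) is hypothetical; the real `_CasePlan` and backend functions in `test_runner.py` and `test_runner_runtime_backend.py` are not shown in this diff, and the stage functions here only return labels.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List

STAGES = ("prepare", "execute", "collect", "finalize")


class RuntimeBackend(Enum):
    """Finite, exhaustively listable set of backends."""
    CI = "CI"
    TEST_STACK = "TEST_STACK"


@dataclass(frozen=True)
class CasePlan:
    """Hypothetical unified shell: the same fields regardless of backend."""
    case_id: str
    backend: RuntimeBackend


def run_ci_stage(plan: CasePlan, stage: str) -> str:
    # Placeholder for CI-specific work; returns a label instead of doing I/O.
    return f"ci:{stage}:{plan.case_id}"


def run_test_stack_stage(plan: CasePlan, stage: str) -> str:
    # Placeholder for TEST_STACK-specific work.
    return f"test_stack:{stage}:{plan.case_id}"


BACKENDS: Dict[RuntimeBackend, Callable[[CasePlan, str], str]] = {
    RuntimeBackend.CI: run_ci_stage,
    RuntimeBackend.TEST_STACK: run_test_stack_stage,
}


def dispatch(plan: CasePlan) -> List[str]:
    """Run the four shared stages through the backend selected by the plan."""
    handler = BACKENDS[plan.backend]
    return [handler(plan, stage) for stage in STAGES]
```

The stage order lives in one place and only the handler varies, so adding a backend means adding one enum member and one table entry rather than a parallel case model.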

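 ### 4.1 Suite compilation layer

 The input to this layer is `ci_test_list.yaml`, which mainly defines three kinds of core objects and one kind of artifact registry:
@@ -135,6 +176,25 @@ flowchart TD
 - `summary.yaml` is the final-state summary of a single run_dir;
 - `resolved_case.yaml` / `resolved_case_full.yaml` are the compilation artifacts of a single run.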

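+### 4.4 Boundary between `test_runner.py` and `test_runner_runtime_backend.py`
+
+The repo has already begun splitting the main body of `test_runner` into "unified compilation/dispatch" and "runtime backend execution".
+
+The stable boundary is as follows:
+
+| File | Main responsibilities | Not responsible for |
+| --- | --- | --- |
+| `fluxon_test_stack/test_runner.py` | Upper-layer suite/schema/case compilation; middle-layer `_CasePlan` compilation and unified dispatch; runner entrypoint, workdir history, general utilities | No longer directly hosts large blocks of `CI` / `TEST_STACK` backend detail |
+| `fluxon_test_stack/test_runner_runtime_backend.py` | Lower-layer backend runtime logic: `_prepare_ci_case`, `_execute_ci_case`, `_prepare_test_stack_case`, `_execute_test_stack_case`, and the corresponding finalize / result wait | Does not parse the suite and does not decide the `scene × scale × profile` combination space |
+
+The purpose of this split is not to create a second case model, but to keep three things:
+
+- **the unified case schema**
+- **the unified `_CasePlan` shell**
+- **the different runtime backend implementations**
+
+separate from one another, so that all logic does not keep piling up in a single `test_runner.py`.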

 ## 5. Public contract of teststack

 ### 5.1 Two kinds of scenes
@@ -407,6 +467,12 @@ sequenceDiagram

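 `test_runner.py` first compiles each case into a `_CasePlan`. There is a common skeleton here: every case is divided into three segments, `prepare_phases / execute_phases / collect_phases`. Scenes differ not in the three-segment structure itself, but in which runtime phases go into each segment, which instances each phase covers, and how run_dir is staged.

+To be explicit:
+
+- `_CasePlan` belongs to the middle-layer unified shell;
+- both `CI` and `TEST_STACK` must first land on `_CasePlan`;
+- the real backend differences are deferred until the lower-layer runtime backend.
+
 The general semantics are as follows:

 - the prepare phase first prepares the runtime, configuration, scripts, and shared directories that the scene depends on;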
5 changes: 4 additions & 1 deletion fluxon_py/tests/test_config.yaml
@@ -1,2 +1,5 @@
-deployconf_path: ../../deployment/deployconf.yaml
 kv_svc_type: fluxon
+etcd_address: 127.0.0.1:2379

Review comment (Collaborator, Author):

This moves the checked-in default test authority from the generic deployconf indirection to a concrete localhost runtime. CI does rewrite src/fluxon_py/tests/test_config.yaml with case-scoped values, but outside that prepared src_root every default import of fluxon_py.tests.test_lib now resolves to 127.0.0.1:2379 and /tmp/fluxon-example-cluster/* unless callers know to set FLUXON_TEST_CONFIG_PATH. Since repo YAMLs are examples by default and environment-specific runtime authority should be supplied separately, please keep the checked-in file as a generic example or make the local override path explicit in the runner/docs instead of committing this concrete runtime.

+cluster_name: fluxon-example-cluster
+shared_memory_path: /tmp/fluxon-example-cluster/shm
+shared_file_path: /tmp/fluxon-example-cluster/share
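
The review comment above asks that the local override path be made explicit. One way to do that is a small resolver that documents the precedence between an explicit argument, the `FLUXON_TEST_CONFIG_PATH` environment variable named in the comment, and the checked-in default. The sketch below is hypothetical: the function name `resolve_test_config_path` and the precedence order are assumptions, not the repository's actual behaviour.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

ENV_VAR = "FLUXON_TEST_CONFIG_PATH"  # variable name taken from the review comment


def resolve_test_config_path(
    default_path: Path,
    *,
    explicit: Optional[Path] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Path:
    """Pick the test config path: explicit argument, then env override, then default."""
    env = os.environ if environ is None else environ
    if explicit is not None:
        return Path(explicit)
    override = env.get(ENV_VAR, "").strip()
    if override:
        return Path(override)
    return Path(default_path)
```

Passing `environ` explicitly keeps the resolver testable without mutating the process environment.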
38 changes: 16 additions & 22 deletions fluxon_py/tests/test_lib.py
@@ -32,12 +32,11 @@
 from setup_and_pack.utils.repo_config_utils import (
     _verify_host_port,
     _verify_url,
-    load_deployconf_etcd_address,
-    load_deployconf_fluxon_cluster_name,
-    load_deployconf_fluxon_shared_file_path,
-    load_deployconf_fluxon_shared_memory_path,
     load_test_config_mapping,
-    load_test_deployconf_path,
+    load_test_etcd_address_from_test_config,
+    load_test_fluxon_cluster_name_from_test_config,
+    load_test_fluxon_shared_file_path_from_test_config,
+    load_test_fluxon_shared_memory_path_from_test_config,
     load_test_kv_svc_type_from_test_config,
 )

@@ -50,10 +49,9 @@ def load_test_kv_svc_type(*, config_path: Optional[Path] = None) -> str:


 def load_test_kv_svc_ip(*, config_path: Optional[Path] = None) -> str:
-    """Load test backend host from the shared deployconf."""
-    deployconf_path = load_test_deployconf_path(config_path=config_path)
-    etcd_addr = load_deployconf_etcd_address(config_path=deployconf_path)
-    s, _port = _verify_host_port(etcd_addr, field="deployconf.global_envs.ETCD_FULL_ADDRESS")
+    """Load test backend host from test_config.yaml."""
+    etcd_addr = load_test_etcd_address_from_test_config(config_path=config_path)
+    s, _port = _verify_host_port(etcd_addr, field="test_config.yaml.etcd_address")
     if "://" in s or not s:
         raise ValueError("test backend host should be a host or IP without scheme, e.g. 127.0.0.1")
     return s
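
`load_test_kv_svc_ip` relies on `_verify_host_port` to split an address such as `127.0.0.1:2379`; that helper's implementation is not part of this diff. The sketch below shows one plausible shape for such a check. The name `split_host_port` and its exact rules are hypothetical, and it deliberately does not handle bracketed IPv6 literals.

```python
from typing import Tuple


def split_host_port(address: str, *, field: str) -> Tuple[str, int]:
    """Split ``host:port`` and validate both parts; ``field`` names the config key in errors."""
    host, sep, port_text = address.strip().rpartition(":")
    if not sep or not host or "://" in host:
        raise ValueError(f"{field}: expected host:port without scheme, got {address!r}")
    if not (port_text.isascii() and port_text.isdigit()):
        raise ValueError(f"{field}: port must be a decimal integer, got {port_text!r}")
    port = int(port_text)
    if not 0 < port < 65536:
        raise ValueError(f"{field}: port out of range, got {port}")
    return host, port
```

Carrying the field name into the error message mirrors the `field="test_config.yaml.etcd_address"` argument in the diff, so a failure points at the config key rather than at the parser.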
@@ -81,21 +79,18 @@ def load_test_mooncake_master_server_address(*, config_path: Optional[Path] = No


 def load_test_fluxon_cluster_name(*, config_path: Optional[Path] = None) -> str:
-    """Load required fluxon cluster name from the shared deployconf."""
-    deployconf_path = load_test_deployconf_path(config_path=config_path)
-    return load_deployconf_fluxon_cluster_name(config_path=deployconf_path)
+    """Load required Fluxon cluster name from test_config.yaml."""
+    return load_test_fluxon_cluster_name_from_test_config(config_path=config_path)


 def load_test_fluxon_share_mem_path(*, config_path: Optional[Path] = None) -> str:
-    """Load required fluxon shared memory path from the shared deployconf."""
-    deployconf_path = load_test_deployconf_path(config_path=config_path)
-    return load_deployconf_fluxon_shared_memory_path(config_path=deployconf_path)
+    """Load required Fluxon shared-memory path from test_config.yaml."""
+    return load_test_fluxon_shared_memory_path_from_test_config(config_path=config_path)


 def load_test_fluxon_share_file_path(*, config_path: Optional[Path] = None) -> str:
-    """Load required fluxon shared file path from the shared deployconf."""
-    deployconf_path = load_test_deployconf_path(config_path=config_path)
-    return load_deployconf_fluxon_shared_file_path(config_path=deployconf_path)
+    """Load required Fluxon shared-file path from test_config.yaml."""
+    return load_test_fluxon_shared_file_path_from_test_config(config_path=config_path)


 def load_test_chan_config(*, config_path: Optional[Path] = None) -> Dict[str, int]:
@@ -105,10 +100,9 @@ def load_test_chan_config(*, config_path: Optional[Path] = None) -> Dict[str, in
     """
     return {"capacity": 10, "ttl_seconds": 90, "weight": 1}

-# Resolve ETCD host/port and test configuration via config utils (no direct field access)
-_TEST_DEPLOYCONF_PATH = load_test_deployconf_path()
-_ETCD_ADDRESS = load_deployconf_etcd_address(config_path=_TEST_DEPLOYCONF_PATH)
-ETCD_HOST, _ETCD_PORT = _verify_host_port(_ETCD_ADDRESS, field="deployconf.global_envs.ETCD_FULL_ADDRESS")
+# Resolve ETCD host/port and test configuration via test_config.yaml (single explicit authority)
+_ETCD_ADDRESS = load_test_etcd_address_from_test_config()
+ETCD_HOST, _ETCD_PORT = _verify_host_port(_ETCD_ADDRESS, field="test_config.yaml.etcd_address")
 ETCD_PORT = int(_ETCD_PORT)
 KV_SVC_TYPE = load_test_kv_svc_type()
 KV_SVC_IP = load_test_kv_svc_ip()
2 changes: 2 additions & 0 deletions fluxon_release/test_rsc/source/prepare.yaml
@@ -19,6 +19,8 @@ python_runtime:
 source: wheel
 - pinned: readerwriterlock==1.0.9
 source: wheel
+- pinned: pytest==8.3.5
+source: wheel
 zerorpc:
 requirements:
 - pinned: zerorpc==0.6.3
12 changes: 4 additions & 8 deletions fluxon_test_stack/benchmark_full_matrix.yaml
@@ -193,8 +193,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_fastws
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_fastws
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
@@ -215,8 +214,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_tquic
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_tquic
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
@@ -237,8 +235,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_sockudo_ws
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_sockudo_ws
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
@@ -259,8 +256,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_tcp
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_tcp
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
24 changes: 16 additions & 8 deletions fluxon_test_stack/ci_test_list.yaml
@@ -29,6 +29,14 @@ scenes:
 scales: [n1_kvowner_dram_20gib]
 profiles: [fluxon_tcp]

+ci_top_attention_mq_core:

Review comment (Collaborator, Author):

This adds a third top-attention runner-native CI scene, but the teststack design doc still describes the stable native dispatch set as only _bin_kvtest.py and _doc_page_build.py under section 9.1, and says ci_scene_config.yaml is handed to those two entries. Since ci_test_list.yaml is the suite contract and this PR is extending that public finite branch set, please update the design doc alongside the scene addition to include ci_top_attention_mq_core, its cluster_kv_owner runtime, and how _mq_core.py consumes the case-scoped config. Otherwise the implementation and architecture contract diverge immediately, making future CI scene work copy whichever source is stale.

+ci:
+subject: mq
+runtime_contract: cluster_kv_owner
+select:
+scales: [n1_kvowner_dram_20gib]
+profiles: [fluxon_tcp]
+
 kv_read_heavy_zipf:
 test_stack:
 mode: KVSTORE
@@ -221,8 +229,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_fastws
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_fastws
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
@@ -242,8 +249,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_tquic
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_tquic
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
@@ -263,8 +269,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_sockudo_ws
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_sockudo_ws
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
@@ -284,8 +289,7 @@ artifact_sets:
 region: us-east-1
 key_prefix: profiles/fluxon_tcp
 release_artifacts:
-python_wheel: fluxon-0.2.1-py3-none-any.whl
-pyo3_wheel: fluxon_pyo3-0.2.1-cp38-abi3-manylinux_2_28_x86_64.whl
+wheel: fluxon-0.2.1-py3-none-any.whl
 test_rsc_source: &test_rsc_source_tcp
 kind: FLUXON_OPS_FS_S3
 bucket: fluxon-release
@@ -315,6 +319,7 @@ profiles:
 doc_site_base_url: example.com
 ci_top_attention_bin_kvtest:
 kv_test_rounds: all
+ci_top_attention_mq_core: {}
 runtime_contracts:
 cluster_kv_owner: &cluster_kv_owner_runtime
 base_runtime:
@@ -460,6 +465,7 @@ profiles:
 doc_site_base_url: example.com
 ci_top_attention_bin_kvtest:
 kv_test_rounds: all
+ci_top_attention_mq_core: {}
 test_stack:
 <<: *common_test_stack_runtime
 fluxon_sockudo_ws:
@@ -472,6 +478,7 @@ profiles:
 doc_site_base_url: example.com
 ci_top_attention_bin_kvtest:
 kv_test_rounds: all
+ci_top_attention_mq_core: {}
 test_stack:
 <<: *common_test_stack_runtime
 fluxon_tcp:
@@ -484,6 +491,7 @@ profiles:
 doc_site_base_url: example.com
 ci_top_attention_bin_kvtest:
 kv_test_rounds: all
+ci_top_attention_mq_core: {}
 test_stack:
 <<: *common_test_stack_runtime
 redis_sharded: