MNEMOS Embedkit: NPU-accelerated semantic memory for agentic AI on Cix Sky1


---

The MNEMOS Embedkit is the embedding layer for NCZ 26.5 Magnetar. It provides a production-grade semantic memory backend that runs entirely on-device, with no cloud dependency.

## What it does

The Embedkit provides a unified embedding interface with automatic backend selection:

```python
from embedkit import Engine

engine = Engine.auto()  # picks the NPU on Cix Sky1, a CPU backend elsewhere
vector = engine.embed("What does the agent remember about the user?")
```

On Cix Sky1, `Engine.auto()` detects `/dev/aipu` (the Zhouyi V3 NPU device) and routes to the `npu-cix` adapter using `libnoe` and a compiled `.cix` model. On other hardware it falls back to `cpu-llamacpp` or `cpu-fastembed` (ONNX).
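The selection rule described above can be sketched in a few lines. This is an illustration only, not Embedkit's actual implementation: `select_backend` and its `fastembed_available` parameter are hypothetical names, and the preference order between the two CPU adapters is an assumption, since the post does not state it. Only the `/dev/aipu` path and the adapter names come from the text.

```python
import os

NPU_DEVICE = "/dev/aipu"  # Zhouyi V3 NPU device node named in this post


def select_backend(device_path: str = NPU_DEVICE,
                   fastembed_available: bool = True) -> str:
    """Return an adapter name following the rule described above.

    Hypothetical helper: the real Engine.auto() may check more than
    the presence of the device node.
    """
    if os.path.exists(device_path):
        return "npu-cix"
    # Assumption: prefer the ONNX adapter when it is installed,
    # otherwise fall back to llama.cpp.
    return "cpu-fastembed" if fastembed_available else "cpu-llamacpp"
```

Passing the device path as a parameter keeps the rule easy to exercise on machines without an NPU.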

## Validated performance (Cix Sky1)

| Backend | Throughput | Notes |
|---|---|---|
| NPU (libnoe + bge-small-zh.cix) | 30+ emb/sec sustained | Job is rebuilt on every call; persistent-job path pending |
| CPU ONNX (fastembed) | 700+ emb/sec batched | Leaves NPU + GPU free for other workloads |

Model: `bge-small-zh-v1.5` (512-dim, 256-token context), validated over 2000-call sustained runs on MS-R1 64GB.
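For anyone who wants to reproduce a sustained-run figure, a measurement loop along these lines should be enough. It is a rough sketch: the post does not say how the 2000-call runs were timed, `measure_throughput` is a hypothetical helper, and `fake_embed` is a stand-in so the snippet runs without Embedkit installed. On real hardware you would pass `engine.embed` instead.

```python
import time
from typing import Callable, List, Sequence


def measure_throughput(embed: Callable[[str], Sequence[float]],
                       text: str,
                       calls: int = 2000) -> float:
    """Call `embed` sequentially `calls` times and return embeddings per second."""
    start = time.perf_counter()
    for _ in range(calls):
        embed(text)
    elapsed = time.perf_counter() - start
    # Guard against a zero reading on very short runs.
    return calls / max(elapsed, 1e-9)


def fake_embed(text: str) -> List[float]:
    """Stand-in for engine.embed: returns a 512-dim zero vector."""
    return [0.0] * 512
```

Calling `measure_throughput(fake_embed, "hello")` only measures Python call overhead, so the number it prints says nothing about the NPU or CPU backends.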

## Architecture

```
Agent query
    ↓
MNEMOS semantic memory API
    ↓
Embedkit Engine.auto()
    ├── NPU adapter (libnoe + .cix model) — Cix Sky1
    ├── CPU ONNX adapter (fastembed) — any arm64/x86
    └── llama.cpp adapter (CPU/Vulkan) — fallback
    ↓
Vector embedding → MNEMOS vector store → retrieval results
```

The NPU runs concurrently with the Mali-G720 GPU, which handles LLM decode via llama.cpp Vulkan. Embedding and decode run on separate accelerators, so the two paths do not contend for the same compute unit and both stay warm.
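The last stage of the diagram (vector embedding, then vector store, then retrieval results) can be illustrated with a toy in-memory store ranked by cosine similarity. The post does not describe how the MNEMOS vector store works, so `TinyVectorStore` and `cosine` are hypothetical names used purely to show the data flow.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class TinyVectorStore:
    """Toy store for illustration; not the MNEMOS implementation."""

    def __init__(self) -> None:
        self._items: Dict[str, List[float]] = {}

    def add(self, key: str, vector: Sequence[float]) -> None:
        self._items[key] = list(vector)

    def search(self, query: Sequence[float], k: int = 3) -> List[Tuple[str, float]]:
        """Return the k stored keys most similar to `query`, best first."""
        scored = [(key, cosine(query, vec)) for key, vec in self._items.items()]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:k]


store = TinyVectorStore()
store.add("likes tea", [1.0, 0.0])
store.add("lives in Oslo", [0.0, 1.0])
best = store.search([0.9, 0.1], k=1)  # the top hit is "likes tea"
```

In practice the vectors would come from `engine.embed(...)` rather than being written by hand.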

## Status

- ✅ `npu-cix` adapter: validated on Cix Sky1 / MS-R1
- ✅ `cpu-fastembed` adapter: validated on arm64 and x86
- 🔜 Bundled in ISO: currently installs post-boot via `ncz install mnemos`; will be baked into the next Magnetar release
- 🔜 70–80 emb/sec on NPU: pending upstream libnoe persistent-job fix (tracked with cixtech)
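To put the NPU figures in latency terms, the throughput numbers can be inverted into a per-call budget. This is back-of-the-envelope arithmetic that assumes calls run one after another, so latency is simply the inverse of throughput. "30+" is a floor, so the results are approximate, and `ms_per_call` is a hypothetical helper.

```python
def ms_per_call(emb_per_sec: float) -> float:
    """Average milliseconds per embedding for sequential calls."""
    return 1000.0 / emb_per_sec


current = ms_per_call(30)      # about 33.3 ms per call today
target_low = ms_per_call(70)   # about 14.3 ms at the low end of the target
target_high = ms_per_call(80)  # 12.5 ms at the high end of the target

# If the whole gap is per-call job overhead, the persistent-job fix
# would save roughly this many milliseconds on each call.
saved_ms = (current - target_low, current - target_high)  # about 19 to 21 ms
```

Under these assumptions, more than half of each NPU call today would be job setup rather than inference.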

## Install (current)

On a running Magnetar system:
```bash
ncz install mnemos   # pulls MNEMOS server + Embedkit
```

The Embedkit source will be published to the MNEMOS organization shortly.

---

*NCZ 26.5 Magnetar: https://github.com/nclawzero/distro/releases/tag/v26.5-magnetar*
