feat: implement high-entropy synthetic data generator skill (Issue #22) by rosspeili · Pull Request #31 · ARPAHLS/skillware

rosspeili · 2026-04-03T07:36:22Z

Description

This PR introduces the data_engineering/synthetic_generator skill, which gives agents a pipeline for generating high-entropy synthetic training data.

Logic, Cognition, and Governance

Logic: Implemented a model-agnostic execution layer in skill.py that supports internal routing to Ollama, Gemini, and Anthropic. It includes a zero-dependency entropy validator that uses zlib compression ratios as a heuristic for lexical diversity, guarding against "model collapse."
Cognition: The instructions.md (cognitive map) enforces the use of combinatorial personas and edge-case scenarios while strictly prohibiting common AI tropes and boilerplate.
Governance: The skill operates entirely in Python, using the SkillLoader adapter patterns to maintain strict schema compliance for input/output. It encapsulates high-temperature generation to keep the primary agent's state stable.
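For reviewers unfamiliar with the compression-ratio heuristic mentioned under Logic, here is a minimal sketch of the idea. It is not the code in skill.py: the function names and the 0.35 threshold are assumptions made for illustration. Repetitive text compresses to a small fraction of its size, so a low compressed/raw ratio suggests low lexical diversity.

```python
import zlib


def compression_ratio(text: str) -> float:
    """Return compressed_size / raw_size for the UTF-8 bytes of `text`.

    Lower values mean the text is more repetitive. Empty input returns 0.0.
    """
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw, 9)) / len(raw)


def passes_entropy_check(samples: list[str], min_ratio: float = 0.35) -> bool:
    """Hypothetical gate: accept a batch only if it does not compress too well.

    The batch is joined first so that repetition *across* samples is
    penalised, not just repetition inside a single sample.
    `min_ratio` is an illustrative value, not one taken from the PR.
    """
    joined = "\n".join(samples)
    return compression_ratio(joined) >= min_ratio
```

The ratio depends on input length (very short strings carry fixed zlib overhead), so a real threshold would likely need tuning per batch size.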

Type of Change

🚀 Skill Proposal: New Skill (Contains manifest.yaml, skill.py, and instructions.md)
🐛 Bug Report Fix: Non-breaking change which fixes an execution error or framework bug
📖 Doc Fix: Documentation Update
🧠 Framework Feature / RFC Updates: Core Framework Update

Checklist

My code follows the Agent Code of Conduct.
I have included a properly formatted manifest.yaml.
The skill logic operates purely in Python and does not rely on arbitrary LLM code generation.
Requirements and env_vars are explicitly documented in the manifest.
I have written unit tests proving deterministic execution and schema compliance.
I have verified that SkillLoader successfully loads this module without missing dependency errors.
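As an illustration of what a schema-compliance test could assert, the sketch below checks a skill's output dictionary against an expected shape. The field names (samples, entropy_score, passed) and the helper schema_errors are hypothetical; the actual contract is whatever manifest.yaml declares.

```python
from typing import Any

# Hypothetical output schema: field name -> expected Python type.
# The real fields are defined by the skill's manifest, not by this sketch.
OUTPUT_SCHEMA: dict[str, type] = {
    "samples": list,
    "entropy_score": float,
    "passed": bool,
}


def schema_errors(
    payload: dict[str, Any], schema: dict[str, type] = OUTPUT_SCHEMA
) -> list[str]:
    """Return a list of human-readable schema violations (empty if compliant)."""
    errors: list[str] = []
    for field, expected in schema.items():
        if field not in payload:
            errors.append(f"missing field: {field}")
        elif not isinstance(payload[field], expected):
            actual = type(payload[field]).__name__
            errors.append(f"{field}: expected {expected.__name__}, got {actual}")
    # Reject extra keys so the output stays strictly within the schema.
    for field in payload:
        if field not in schema:
            errors.append(f"unexpected field: {field}")
    return errors
```

Because the check is a pure function of its input, calling it twice on the same payload yields the same result, which is the property a determinism test would rely on.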

Constitution & Safety

This skill is restricted to text synthesis and deterministic entropy evaluation. It does not perform any file system modifications (leaving data persistence to the orchestrating agent), nor does it execute any generated strings as code. It strictly isolates internal LLM calls to the providers specified in the configuration.
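The provider isolation described above can be pictured as an allowlist in front of the routing layer. The following is a sketch under stated assumptions, not the skill's implementation: make_router, ProviderNotAllowedError, and the stub providers are invented for illustration, and no real Ollama, Gemini, or Anthropic client is called.

```python
from typing import Callable, Dict, Iterable

# A provider is modelled as any callable that maps a prompt to generated text.
Provider = Callable[[str], str]


class ProviderNotAllowedError(ValueError):
    """Raised when a call targets a provider outside the configured set."""


def make_router(
    registry: Dict[str, Provider], allowed: Iterable[str]
) -> Callable[[str, str], str]:
    """Build a routing function that only dispatches to allowed providers."""
    allowed_set = frozenset(allowed)

    def route(provider_name: str, prompt: str) -> str:
        if provider_name not in allowed_set:
            raise ProviderNotAllowedError(f"provider not configured: {provider_name}")
        if provider_name not in registry:
            raise KeyError(f"no adapter registered for: {provider_name}")
        return registry[provider_name](prompt)

    return route


# Stub adapters stand in for real client calls (hypothetical, offline).
registry: Dict[str, Provider] = {
    "ollama": lambda prompt: f"[ollama] {prompt}",
    "gemini": lambda prompt: f"[gemini] {prompt}",
    "anthropic": lambda prompt: f"[anthropic] {prompt}",
}

# Only the provider named in the (hypothetical) configuration is reachable.
route = make_router(registry, allowed=["ollama"])
```

With this shape, a request for a provider that is registered but not configured fails before any network call is attempted.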

Related Issues

Fixes #22

…PAHLS#22) Resolves issue ARPAHLS#22 by introducing a data engineering skill that leverages model-agnostic execution and zlib compression heuristics to compute synthetic diversity. Includes tests, example dataset pipeline script, and fully updated docs.

rosspeili merged commit 8f8a963 into ARPAHLS:main Apr 3, 2026
2 of 5 checks passed

rosspeili deleted the feature/synthetic-generator-skill branch April 3, 2026 07:50

rosspeili merged 1 commit into ARPAHLS:main from rosspeili:feature/synthetic-generator-skill
