Merged
28 commits
c2306a1
Enforce branch promotion flow
ctn Mar 13, 2026
9f8bffe
Enforce branch promotion flow
ctn Mar 13, 2026
ecc6875
Switch project license to MIT
ctn Mar 14, 2026
f9c55a4
Switch project license to MIT (#21)
ctn Mar 14, 2026
29368c6
Reorganize repository around model and ontology
ctn Mar 14, 2026
aa82b74
Refactor model README into focused docs
ctn Mar 14, 2026
b478e4d
Add model Makefile and simplify project docs
ctn Mar 14, 2026
71acf3b
Tighten top-level README
ctn Mar 14, 2026
fe7c40a
Import ontology subtree and refresh project README
ctn Mar 14, 2026
5807795
Document SemiKong paper references
ctn Mar 14, 2026
8c8e89c
Simplify SemiKong paper references
ctn Mar 14, 2026
9b31b4c
Add full SemiKong citations to READMEs
ctn Mar 14, 2026
c1d974b
Add ontology Makefile commands
ctn Mar 14, 2026
24e1625
Scope Codacy to project-authored files
ctn Mar 14, 2026
13a0eef
Update PyTorch to address security alerts
ctn Mar 14, 2026
9cb08e8
Fix Codacy config format
ctn Mar 14, 2026
9b41a5a
Merge remote-tracking branch 'origin/main' into develop
ctn Mar 14, 2026
1ca518a
Resolve remaining Codacy issues on PR
ctn Mar 14, 2026
ba3a20b
Address remaining Codacy formatting issues
ctn Mar 14, 2026
02a2f63
Reduce Codacy noise from documentation
ctn Mar 14, 2026
217d8a3
Fix final Codacy formatting notice
ctn Mar 14, 2026
57228c6
Restructure SemiKong around model and ontology (#22)
ctn Mar 14, 2026
2cab6e9
Improve model setup and usage documentation
ctn Mar 14, 2026
2561d92
Clarify public model availability
ctn Mar 14, 2026
9875aae
Fix README links for GitHub
ctn Mar 14, 2026
e588d15
Merge origin/main into develop for promotion
ctn Mar 18, 2026
be31cea
Merge pull request #24 from aitomatic/develop
ctn Mar 18, 2026
e433961
Merge origin/stable into main for promotion
ctn Mar 18, 2026
23 changes: 12 additions & 11 deletions README.md
@@ -4,8 +4,8 @@

SemiKong is an open-source semiconductor AI project that combines:

- a semiconductor language model in [model/](/Users/ctn/src/aitomatic/semikong/model)
- a semiconductor ontology and knowledge graph in [ontology/](/Users/ctn/src/aitomatic/semikong/ontology)
- a semiconductor language model in [model/](model/)
- a semiconductor ontology and knowledge graph in [ontology/](ontology/)

SemiKong began as an early open effort to build a semiconductor-specific language model from real industry collaboration. Publicly, it was presented through the AI Alliance ecosystem with contributions from Aitomatic, Tokyo Electron, FPT, and others, and later described by the AI Alliance as its first domain-specific open model.

@@ -56,20 +56,21 @@ make -C model infer

Key model entry points:

- [model/README.md](/Users/ctn/src/aitomatic/semikong/model/README.md)
- [model/INSTALL.md](/Users/ctn/src/aitomatic/semikong/model/INSTALL.md)
- [model/Makefile](/Users/ctn/src/aitomatic/semikong/model/Makefile)
- [model/README.md](model/README.md)
- [model/INSTALL.md](model/INSTALL.md)
- [model/USAGE.md](model/USAGE.md)
- [model/Makefile](model/Makefile)

If you want to work with the ontology:

- [ontology/README.md](/Users/ctn/src/aitomatic/semikong/ontology/README.md)
- [ontology/MANIFESTO.md](/Users/ctn/src/aitomatic/semikong/ontology/MANIFESTO.md)
- [ontology/ontology/README.md](/Users/ctn/src/aitomatic/semikong/ontology/ontology/README.md)
- [ontology/README.md](ontology/README.md)
- [ontology/MANIFESTO.md](ontology/MANIFESTO.md)
- [ontology/ontology/README.md](ontology/ontology/README.md)

## Repository Guide

- [model/](/Users/ctn/src/aitomatic/semikong/model) contains the language model code, configs, docs, and references
- [ontology/](/Users/ctn/src/aitomatic/semikong/ontology) contains the ontology modules, shapes, curation materials, and ontology docs
- [model/](model/) contains the language model code, configs, docs, and references
- [ontology/](ontology/) contains the ontology modules, shapes, curation materials, and ontology docs

## Why This Project Matters

@@ -84,7 +85,7 @@ SemiKong is aimed at that gap.

## License

The repository code and checked-in contents are distributed under the [MIT License](/Users/ctn/src/aitomatic/semikong/LICENSE).
The repository code and checked-in contents are distributed under the [MIT License](LICENSE).

Some model weights, datasets, and imported ontology assets may also carry upstream licenses or provenance-specific terms.

23 changes: 23 additions & 0 deletions model/INSTALL.md
@@ -4,6 +4,8 @@ This guide covers the current contents of the `model/` subtree.

This document explains how to set up the environment for training, evaluating, and running inference with the SEMIKONG model.

For runnable commands after installation, see [USAGE.md](USAGE.md).

## Hardware Requirements

- CUDA Version: use a current NVIDIA driver/runtime compatible with the PyTorch version pinned in `model/requirements.txt`
@@ -30,6 +32,21 @@ conda activate semikong-env
make -C model install
~~~

## Supported Package Versions

The current dependency baseline in [requirements.txt](requirements.txt) includes:

- `torch==2.8.0`
- `torchaudio==2.8.0`
- `torchvision==0.23.0`
- `datasets`
- `transformers`
- `flash_attn`
- `vllm`
- `vllm-flash-attn`

If you need a reproducible environment for debugging install issues, start from that file rather than mixing older PyTorch and torchaudio pins.
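As a quick sanity check that an environment actually matches that baseline, a small helper can compare installed versions against the pins. This is a sketch, not part of the repository: the pin dictionary below copies only the pinned entries from the baseline above (`datasets`, `transformers`, `flash_attn`, and the vLLM packages are unpinned there, so they are not checked).

```python
from importlib.metadata import version, PackageNotFoundError

# Pinned versions copied from the baseline above; unpinned packages are skipped.
PINS = {"torch": "2.8.0", "torchaudio": "2.8.0", "torchvision": "0.23.0"}

def find_mismatches(pins, get_version=None):
    """Return (package, expected, installed) tuples where versions differ.

    `get_version` is injectable for testing; by default it reads the
    installed distribution metadata and returns None for missing packages.
    """
    def default_get(pkg):
        try:
            return version(pkg)
        except PackageNotFoundError:
            return None  # package not installed at all

    get_version = get_version or default_get
    return [
        (pkg, expected, got)
        for pkg, expected in pins.items()
        if (got := get_version(pkg)) != expected
    ]

if __name__ == "__main__":
    for pkg, expected, got in find_mismatches(PINS):
        print(f"{pkg}: expected {expected}, installed {got}")
```

An empty result means the pinned packages match the baseline; anything printed is a candidate cause of install or runtime breakage.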

## Training
~~~
1. Download the Meta-LLaMA/Meta-LLaMA base model from the Hugging Face Hub first
@@ -49,3 +66,9 @@ python -m vllm.entrypoints.openai.api_server --model <path_to_model_or_HF_model_
2. Using vLLM Server:
python -m vllm.entrypoints.api_server --model <path_to_model_or_HF_model_card_name> --device cuda --max-lora-rank 32 --dtype auto --port 8080
~~~

## References

- Usage guide: [USAGE.md](USAGE.md)
- Dataset and benchmark resources: <https://drive.google.com/drive/u/0/folders/1IjuVyP35-xBEe_i_KkG9MnE-4o7Eb7tq>
- Tech report / paper: <https://arxiv.org/abs/2411.13802>
25 changes: 15 additions & 10 deletions model/README.md
@@ -17,7 +17,11 @@ The top-level repository is being organized around two major areas:
## Quick Links

- Dataset and benchmarks: <https://drive.google.com/drive/u/0/folders/1IjuVyP35-xBEe_i_KkG9MnE-4o7Eb7tq>
- Model weights: <https://huggingface.co/pentagoniac/SEMIKONG-70B>, <https://huggingface.co/pentagoniac/SEMIKONG-8b-GPTQ>
- Public model weights:
- Base 70B: <https://huggingface.co/pentagoniac/SEMIKONG-70B>
- Quantized 8B GPTQ: <https://huggingface.co/pentagoniac/SEMIKONG-8b-GPTQ>
- Quantized 8B instruct GPTQ: <https://huggingface.co/sitloboi2012/SEMIKONG-8B-Instruct-GPTQ>
- Instruct chat API: launch with `python -m vllm.entrypoints.openai.api_server ...` as shown in [USAGE.md](USAGE.md)
- Paper: <https://arxiv.org/abs/2411.13802>

## Papers
@@ -36,10 +40,11 @@

## Start Here

- Setup and environment: [INSTALL.md](/Users/ctn/src/aitomatic/semikong/model/INSTALL.md)
- Commands: [Makefile](/Users/ctn/src/aitomatic/semikong/model/Makefile)
- Training config: [configs/training-config.yaml](/Users/ctn/src/aitomatic/semikong/model/configs/training-config.yaml)
- Inference config: [configs/inference-config.yaml](/Users/ctn/src/aitomatic/semikong/model/configs/inference-config.yaml)
- Setup and environment: [INSTALL.md](INSTALL.md)
- Usage and serving: [USAGE.md](USAGE.md)
- Commands: [Makefile](Makefile)
- Training config: [configs/training-config.yaml](configs/training-config.yaml)
- Inference config: [configs/inference-config.yaml](configs/inference-config.yaml)

## How To Use

@@ -53,14 +58,14 @@ make -C model infer

If you need to change paths or parameters first, edit:

- [configs/training-config.yaml](/Users/ctn/src/aitomatic/semikong/model/configs/training-config.yaml)
- [configs/inference-config.yaml](/Users/ctn/src/aitomatic/semikong/model/configs/inference-config.yaml)
- [configs/training-config.yaml](configs/training-config.yaml)
- [configs/inference-config.yaml](configs/inference-config.yaml)

## Documentation

- Project overview and model summary: [docs/overview.md](/Users/ctn/src/aitomatic/semikong/model/docs/overview.md)
- Ecosystem, deployment, and references: [docs/ecosystem.md](/Users/ctn/src/aitomatic/semikong/model/docs/ecosystem.md)
- Governance, contributions, disclaimer, and license notes: [docs/governance.md](/Users/ctn/src/aitomatic/semikong/model/docs/governance.md)
- Project overview and model summary: [docs/overview.md](docs/overview.md)
- Ecosystem, deployment, and references: [docs/ecosystem.md](docs/ecosystem.md)
- Governance, contributions, disclaimer, and license notes: [docs/governance.md](docs/governance.md)

## License

88 changes: 88 additions & 0 deletions model/USAGE.md
@@ -0,0 +1,88 @@
# Using the SEMIKONG Model

This guide covers the intended command paths for training, local inference, and vLLM serving.

## Quick Start

From the repository root:

```bash
make -C model install
make -C model train
make -C model infer
```

Before running those commands, update:

- [configs/training-config.yaml](configs/training-config.yaml)
- [configs/inference-config.yaml](configs/inference-config.yaml)

At minimum, set the paths for the base model, adapter output, and local dataset.
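The authoritative schema is the config file itself; purely as an illustration (the key names below are hypothetical, not the real schema — check [configs/training-config.yaml](configs/training-config.yaml) for the actual fields), the values to localize typically look like:

```yaml
# Hypothetical field names for illustration only; consult the real config.
base_model_path: /models/Meta-Llama-3-70B      # local base checkpoint
dataset_path: /data/semikong-instruct.jsonl    # local instruction-style dataset
output_dir: /checkpoints/semikong-adapters     # writable output for adapters
```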

## Training

The supported training entrypoint is:

```bash
make -C model train
```

That target runs:

```bash
python model/training/train.py --config model/configs/training-config.yaml
```

Expected inputs:

- a local base model checkpoint
- a local instruction-style dataset
- a writable output directory for checkpoints and adapters

## Local Inference

The supported inference entrypoint is:

```bash
make -C model infer
```

That target runs:

```bash
python model/inference/raw_inference.py --config model/configs/inference-config.yaml
```

Use this path for direct local generation against a configured base model and adapter.

## vLLM Serving

For an OpenAI-compatible API server:

```bash
python -m vllm.entrypoints.openai.api_server \
--model <path_to_model_or_hf_model> \
--dtype auto \
--max-lora-rank 32 \
--api-key token-abc123
```

For the plain vLLM API server endpoint:

```bash
python -m vllm.entrypoints.api_server \
--model <path_to_model_or_hf_model> \
--device cuda \
--max-lora-rank 32 \
--dtype auto \
--port 8080
```
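Once the OpenAI-compatible server is up, any OpenAI-style HTTP client can talk to it. Below is a stdlib-only sketch; the port, model name, and API key are assumptions and must match whatever the server was actually launched with.

```python
import json
from urllib import request

def build_chat_request(base_url, model, prompt, api_key="token-abc123"):
    """Prepare a POST to the OpenAI-compatible /v1/chat/completions route."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Assumed local server address and model name; adjust to your launch flags.
req = build_chat_request("http://localhost:8000", "SEMIKONG-70B", "What is CMP?")
# with request.urlopen(req) as resp:  # uncomment with a running server
#     print(json.load(resp)["choices"][0]["message"]["content"])
```

Separating request construction from sending keeps the payload shape testable without a live server.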

## Models, Datasets, and Paper

- Public model weights:
- Base 70B: <https://huggingface.co/pentagoniac/SEMIKONG-70B>
- Quantized 8B GPTQ: <https://huggingface.co/pentagoniac/SEMIKONG-8b-GPTQ>
- Quantized 8B instruct GPTQ: <https://huggingface.co/sitloboi2012/SEMIKONG-8B-Instruct-GPTQ>
- Dataset and benchmark resources: <https://drive.google.com/drive/u/0/folders/1IjuVyP35-xBEe_i_KkG9MnE-4o7Eb7tq>
- Tech report / paper: <https://arxiv.org/abs/2411.13802>
26 changes: 13 additions & 13 deletions ontology/README.md
@@ -18,11 +18,11 @@ We intend to contribute the SemiKong ontology work into the broader SEMI standar

## What Is Here

- ontology modules in Turtle under [ontology/ontology/](/Users/ctn/src/aitomatic/semikong/ontology/ontology)
- architecture and methodology docs under [ontology/docs/](/Users/ctn/src/aitomatic/semikong/ontology/docs)
- SHACL shapes under [ontology/shapes/](/Users/ctn/src/aitomatic/semikong/ontology/shapes)
- examples under [ontology/examples/](/Users/ctn/src/aitomatic/semikong/ontology/examples)
- curation and ontologist workflow materials under [ontology/ontologist/](/Users/ctn/src/aitomatic/semikong/ontology/ontologist)
- ontology modules in Turtle under [ontology/ontology/](ontology/)
- architecture and methodology docs under [ontology/docs/](docs/)
- SHACL shapes under [ontology/shapes/](shapes/)
- examples under [ontology/examples/](examples/)
- curation and ontologist workflow materials under [ontology/ontologist/](ontologist/)

## Design Intent

@@ -37,12 +37,12 @@ This aligns with the original `semicont` direction and now serves as the ontolog

## Start Here

- manifesto: [MANIFESTO.md](/Users/ctn/src/aitomatic/semikong/ontology/MANIFESTO.md)
- commands: [Makefile](/Users/ctn/src/aitomatic/semikong/ontology/Makefile)
- ontology source overview: [ontology/ontology/README.md](/Users/ctn/src/aitomatic/semikong/ontology/ontology/README.md)
- architecture: [ontology/docs/architecture.md](/Users/ctn/src/aitomatic/semikong/ontology/docs/architecture.md)
- industry hierarchy: [ontology/docs/semiconductor-industry-ontology-hierarchy.md](/Users/ctn/src/aitomatic/semikong/ontology/docs/semiconductor-industry-ontology-hierarchy.md)
- ontologist workflow: [ontology/ontologist/README.md](/Users/ctn/src/aitomatic/semikong/ontology/ontologist/README.md)
- manifesto: [MANIFESTO.md](MANIFESTO.md)
- commands: [Makefile](Makefile)
- ontology source overview: [ontology/ontology/README.md](ontology/README.md)
- architecture: [ontology/docs/architecture.md](docs/architecture.md)
- industry hierarchy: [ontology/docs/semiconductor-industry-ontology-hierarchy.md](docs/semiconductor-industry-ontology-hierarchy.md)
- ontologist workflow: [ontology/ontologist/README.md](ontologist/README.md)

## How To Use

@@ -62,7 +62,7 @@ The imported ontology currently includes:
- industry-layer modules such as integrators, EDA, foundry/IDM, OSAT, WFE, materials, and supply chain
- validation shapes and curation assets

The canonical semantic source remains the Turtle content under [ontology/ontology/](/Users/ctn/src/aitomatic/semikong/ontology/ontology).
The canonical semantic source remains the Turtle content under [ontology/ontology/](ontology/).

## Contributor Signal

@@ -72,4 +72,4 @@ Initial contributor called out in the imported ontology materials:

## License Note

The imported ontology subtree includes its own [ontology/LICENSE](/Users/ctn/src/aitomatic/semikong/ontology/LICENSE). The repository as a whole is MIT-licensed at the top level, but ontology assets may also carry their own preserved licensing and provenance context from the imported source.
The imported ontology subtree includes its own [ontology/LICENSE](LICENSE). The repository as a whole is MIT-licensed at the top level, but ontology assets may also carry their own preserved licensing and provenance context from the imported source.