
Updating AFM results #36

Open

sdeeptan-aws wants to merge 1 commit into aws-neuron:main from sdeeptan-aws:afm

Conversation

@sdeeptan-aws (Contributor)

Description

Updated AFM-4.5B-Base contrib model README with correct architecture details, validated accuracy results, and working usage examples. The model uses the Arcee architecture (arcee-ai/AFM-4.5B-Base) with YaRN RoPE scaling and ReLU² activation. Validation achieved 100% token match accuracy with a deterministic prompt.

Model Information

Model Name: AFM-4.5B-Base (Arcee)
Model Architecture: Decoder-only transformer (Arcee — similar to LLaMA with YaRN RoPE and ReLU² activation)
Purpose: Text generation
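
For a quick sanity check of the upstream checkpoint outside of Neuron, here is a minimal sketch using plain Hugging Face transformers. This is not the Neuron inference path; the NxDI usage example lives in the contrib README.

    # Hedged sketch: load the upstream checkpoint with transformers on CPU/GPU.
    # May require a recent transformers release (or trust_remote_code=True)
    # that knows the Arcee architecture. Not the Neuron path.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "arcee-ai/AFM-4.5B-Base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("1+1=", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)  # greedy, deterministic
    print(tokenizer.decode(out[0], skip_special_tokens=True))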

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
  • README.md with the following sections:
    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types
    • Example Checkpoints: Links to compatible model checkpoints
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)

Optional Components

  • Unit Tests (CPU or Neuron-based)

Folder Structure

Confirm your contribution follows this structure:

/contrib/models/AFM-4.5B-Base/
    README.md
    /src
        modeling_afm.py
    /test
        /integration
            test_model.py

Testing

README updated to reflect validated results. The model was compiled and tested with TP=1, batch_size=1, seq_len=128, bfloat16. Token matching achieved 100% on the deterministic prompt ("1+1="). Key implementation notes documented: YaRN RoPE scaling, ReLU² activation (no gate_proj), and QKV projection combining for Neuron.

Test Results:

Test            Status    Result
Smoke Test      ✅ PASS   Model loads successfully
Token Matching  ✅ PASS   100% match
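
The token-matching check compares greedy output token-for-token against a reference. A hedged sketch of what such a comparison can look like (the model handles and names are placeholders, not the actual code in test/integration/test_model.py):

    # Hedged sketch of a greedy token-match comparison between two model
    # handles (e.g. a Neuron-compiled model and a CPU reference).
    def token_match_rate(model_a, model_b, tokenizer, prompt, max_new_tokens=32):
        inputs = tokenizer(prompt, return_tensors="pt")
        ids_a = model_a.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)[0]
        ids_b = model_b.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)[0]
        # Note: the generated sequences include the prompt tokens; a real test
        # would slice those off before comparing.
        n = min(len(ids_a), len(ids_b))
        matches = sum(int(a == b) for a, b in zip(ids_a[:n].tolist(), ids_b[:n].tolist()))
        return matches / n

    # e.g. token_match_rate(neuron_model, cpu_reference, tokenizer, "1+1=") -> 1.0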

Compatibility

Tested with:

  • Instance Type(s): Trn1
  • Configuration: TP=1, batch_size=1, seq_len=128, bfloat16

Additional Information

  • YaRN RoPE scaling extends context to 65k tokens (factor=20, original_max_pos=4096)
  • ReLU² activation (relu(x).pow(2)) replaces SwiGLU: only up_proj and down_proj, no gate_proj (see the sketch after this list)
  • HuggingFace QKV projections are separate and are combined into a single qkv_proj for Neuron (also sketched below)
  • BF16 precision causes divergence on ambiguous prompts (e.g., "The capital of France is" at 7.81%) but both outputs are coherent and correct
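
The ReLU² MLP and the QKV combining noted above can be summarized in a few lines of PyTorch. This is a hedged sketch; the module and weight names mirror the bullet points and are illustrative, not the contrib source in src/modeling_afm.py:

    # Hedged sketch of the AFM MLP: ReLU^2 activation, no gate_proj.
    import torch
    import torch.nn as nn

    class AFMMLP(nn.Module):
        def __init__(self, hidden_size, intermediate_size):
            super().__init__()
            self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
            self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

        def forward(self, x):
            # relu(x).pow(2) in place of SwiGLU's gated activation
            return self.down_proj(torch.relu(self.up_proj(x)).pow(2))

    # Combining the separate HuggingFace q/k/v weights into a single qkv_proj
    # weight for Neuron (illustrative; dim=0 assumes row-concatenated weights).
    def combine_qkv(q_weight, k_weight, v_weight):
        return torch.cat([q_weight, k_weight, v_weight], dim=0)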

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions (a hedged sketch follows this list)
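
vLLM supports out-of-tree model registration; a hedged sketch of what that could look like follows. The architecture string and wrapper class here are assumptions, not the documented contrib flow, so defer to the README's registration instructions.

    # Hedged sketch of out-of-tree registration with vLLM's ModelRegistry.
    # The architecture name and wrapper module below are assumptions.
    from vllm import ModelRegistry
    from my_afm_plugin import AFMForCausalLM  # hypothetical wrapper module

    ModelRegistry.register_model("ArceeForCausalLM", AFMForCausalLM)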

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included


Review thread

    config = AFM45BBaseInferenceConfig(
    neuron_config,
    config = AFMInferenceConfig(

Reviewer: Could you move the test to a different file to follow the structure in the PR Conversation?

@sdeeptan-aws (Contributor, Author):

I'm not sure I follow - the file is under test/integration/test_model.py
