
Updating AFM results #36

Open

sdeeptan-aws wants to merge 1 commit into aws-neuron:main from sdeeptan-aws:afm

Conversation

@sdeeptan-aws (Contributor)

Description

Updated AFM-4.5B-Base contrib model README with correct architecture details, validated accuracy results, and working usage examples. The model uses the Arcee architecture (arcee-ai/AFM-4.5B-Base) with YaRN RoPE scaling and ReLU² activation. Validation achieved 100% token match accuracy with a deterministic prompt.

Model Information

Model Name: AFM-4.5B-Base (Arcee)
Model Architecture: Decoder-only transformer (Arcee — similar to LLaMA with YaRN RoPE and ReLU² activation)
Purpose: Text generation
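
For a quick sanity check of the upstream checkpoint outside of Neuron, here is a minimal sketch using plain Hugging Face transformers. This is not the Neuron inference path; the NxDI usage example lives in the contrib README.

    # Hedged sketch: load the upstream checkpoint with transformers on CPU/GPU.
    # May require a recent transformers release (or trust_remote_code=True)
    # that knows the Arcee architecture. Not the Neuron path.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_id = "arcee-ai/AFM-4.5B-Base"
    tokenizer = AutoTokenizer.from_pretrained(model_id)
    model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

    inputs = tokenizer("1+1=", return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=8, do_sample=False)  # greedy, deterministic
    print(tokenizer.decode(out[0], skip_special_tokens=True))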

Checklist

Required Components

  • Accuracy Test (test/integration/test_model.py)
  • README.md with the following sections:
    • Usage Example: Clear code example showing how to use the model
    • Compatibility Matrix: Table showing tested Neuron SDK versions and instance types
    • Example Checkpoints: Links to compatible model checkpoints
    • Testing Instructions: Command to run the test suite for the model
  • Source Code (src/)

Optional Components

  • Unit Tests (CPU or Neuron-based)

Folder Structure

Confirm your contribution follows this structure:

/contrib/models/AFM-4.5B-Base/
    README.md
    /src
        modeling_afm.py
    /test
        /integration
            test_model.py

Testing

README updated to reflect validated results. The model was compiled and tested with TP=1, batch_size=1, seq_len=128, bfloat16. Token matching achieved 100% on the deterministic prompt ("1+1="). Key implementation notes documented: YaRN RoPE scaling, ReLU² activation (no gate_proj), and QKV projection combining for Neuron.

Test Results:

Test            Status    Result
Smoke Test      ✅ PASS   Model loads successfully
Token Matching  ✅ PASS   100% match
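
The token-matching check compares greedy output token-for-token against a reference. A hedged sketch of what such a comparison can look like (the model handles and names are placeholders, not the actual code in test/integration/test_model.py):

    # Hedged sketch of a greedy token-match comparison between two model
    # handles (e.g. a Neuron-compiled model and a CPU reference).
    def token_match_rate(model_a, model_b, tokenizer, prompt, max_new_tokens=32):
        inputs = tokenizer(prompt, return_tensors="pt")
        ids_a = model_a.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)[0]
        ids_b = model_b.generate(**inputs, max_new_tokens=max_new_tokens, do_sample=False)[0]
        # Note: the generated sequences include the prompt tokens; a real test
        # would slice those off before comparing.
        n = min(len(ids_a), len(ids_b))
        matches = sum(int(a == b) for a, b in zip(ids_a[:n].tolist(), ids_b[:n].tolist()))
        return matches / n

    # e.g. token_match_rate(neuron_model, cpu_reference, tokenizer, "1+1=") -> 1.0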

Compatibility

Tested with:

  • Instance Type(s): Trn1
  • Configuration: TP=1, batch_size=1, seq_len=128, bfloat16

Additional Information

  • YaRN RoPE scaling extends context to 65k tokens (factor=20, original_max_pos=4096)
  • ReLU² activation (relu(x).pow(2)) replaces SwiGLU: only up_proj and down_proj, no gate_proj (see the sketch after this list)
  • HuggingFace QKV projections are separate and are combined into a single qkv_proj for Neuron (also sketched below)
  • BF16 precision causes divergence on ambiguous prompts (e.g., "The capital of France is" at 7.81%) but both outputs are coherent and correct
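
The ReLU² MLP and the QKV combining noted above can be summarized in a few lines of PyTorch. This is a hedged sketch; the module and weight names mirror the bullet points and are illustrative, not the contrib source in src/modeling_afm.py:

    # Hedged sketch of the AFM MLP: ReLU^2 activation, no gate_proj.
    import torch
    import torch.nn as nn

    class AFMMLP(nn.Module):
        def __init__(self, hidden_size, intermediate_size):
            super().__init__()
            self.up_proj = nn.Linear(hidden_size, intermediate_size, bias=False)
            self.down_proj = nn.Linear(intermediate_size, hidden_size, bias=False)

        def forward(self, x):
            # relu(x).pow(2) in place of SwiGLU's gated activation
            return self.down_proj(torch.relu(self.up_proj(x)).pow(2))

    # Combining the separate HuggingFace q/k/v weights into a single qkv_proj
    # weight for Neuron (illustrative; dim=0 assumes row-concatenated weights).
    def combine_qkv(q_weight, k_weight, v_weight):
        return torch.cat([q_weight, k_weight, v_weight], dim=0)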

Related Issues

N/A

vLLM Integration

  • This model/feature is intended for use with vLLM
  • Documentation includes vLLM registration instructions (a hedged sketch follows this list)
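
vLLM supports out-of-tree model registration; a hedged sketch of what that could look like follows. The architecture string and wrapper class here are assumptions, not the documented contrib flow, so defer to the README's registration instructions.

    # Hedged sketch of out-of-tree registration with vLLM's ModelRegistry.
    # The architecture name and wrapper module below are assumptions.
    from vllm import ModelRegistry
    from my_afm_plugin import AFMForCausalLM  # hypothetical wrapper module

    ModelRegistry.register_model("ArceeForCausalLM", AFMForCausalLM)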

By submitting this PR, I confirm that:

  • I have read and followed the contributing guidelines
  • This is a community contribution and may have limited testing compared to officially-supported models
  • The code follows best practices and is well-documented
  • All required components listed above are included


Review thread

    config = AFM45BBaseInferenceConfig(
    neuron_config,
    config = AFMInferenceConfig(

Reviewer: Could you move the test to a different file to follow the structure in the PR Conversation?

@sdeeptan-aws (Contributor, Author):

I'm not sure I follow - the file is under test/integration/test_model.py
