Skip to content

Releases: aws/nova-forge-sdk

Release 1.3.24

30 Mar 19:44
dc43062

Choose a tag to compare

New Features

Training Job Email Notifications (SMTJ, SMHP)

  • Added support for job notifications via email for training on the SMTJ and SMHP platforms via the NotificationManager class.
  • Provided CloudFormation notification templates to allow users to deploy a stack of resources for receiving status updates.

SMHP Cluster Instance Group Scaling

  • Added the scale_cluster() and get_instance_groups() functions to the SMHPRuntimeManager to allow users to scale their SMHP cluster restricted instance groups (RIGs) from the SDK.

RFT Lambda Deploy and Validation via RuntimeManager

  • Added deploy_lambda() to RuntimeManager to package a local python file into zip, create/update a Lambda function, and store the resulting ARN on rft_lambda_arn.
  • Added validate_lambda() method to invoke a deployed lambda with sample data from S3 to validate correctness before training.

Enhancements

  • Simplified ECS infrastructure for RFT Multi-Turn and improved reliability
    • Removed Docker/ECR image push requirements
    • Switched ECS tasks to a public Amazon Linux 2023 base image
    • Improved CF stack state and virtual environment validation
    • Note: User might need to update their IAM policies to remove ECR permissions and add new SSM/EC2/CloudFormation permissions for RFT multiturn.
  • Updated troubleshooting documentation to provide guidance on Bedrock deployment with permission restraints.
  • Improved CloudWatchLogMonitor usability by allowing job start time resolution from platform APIs when not explicitly provided.

Bug Fixes

  • Fixed evaluate() passing the incorrect training method to the RecipeBuilder to instead pass TrainingMethod.EVALUATION.
  • Added an optional “recipe” flag to the “validation_config” to bypass recipe validation when needed.

Release 1.3.18

19 Mar 21:33
6ed1f59

Choose a tag to compare

Release 1.3.18

Bug Fixes

  • Fixed incorrect input data upload format for RFT SageMaker Training Jobs

Release 1.3.17

19 Mar 19:02
0c80693

Choose a tag to compare

Release 1.3.17

Bug Fixes

  • Fixed incorrect input data upload format for Nova 1.0 SageMaker Training Jobs

Release 1.3.16

18 Mar 15:01
dd90fff

Choose a tag to compare

Release 1.3.16

Bug Fixes

  • Removed requirements.txt to address installation issue for Mac OS

Release 1.3.14

17 Mar 19:36
2506a29

Choose a tag to compare

Release 1.3.14

The package has been renamed to Nova Forge SDK.
All internal imports are now referenced as amzn_nova_forge.

Upgrade Instructions

To use the latest version of Nova Forge SDK, run:

# Remove the old SDK
pip uninstall amzn-nova-customization-sdk

# Install the new SDK
pip install amzn-nova-forge

New Features

Bedrock Fine‑tuning

  • Added support for Supervised Fine‑Tuning (SFT) and Reinforcement Fine‑Tuning (RFT) with Low‑Rank Adaptation (LoRA) on Bedrock.
  • Introduced a new runtime manager, platform support, and extensive infrastructure to integrate Bedrock as a training platform.
  • Implemented job creation, status tracking, and cleanup for Bedrock jobs.

Limitations & Constraints

Limitation Details
Evaluation & batch inference Not supported on Bedrock
Supported methods Only SFT LoRA and RFT LoRA
Validation datasets Not supported for Nova Lite 2 models
Monitoring MLFlow monitoring is not available for Bedrock jobs

Serverless SageMaker Training Jobs

  • Enabled serverless SageMaker Training Jobs for SFT and Direct Preference Optimization (DPO) with Full‑Rank and LoRA.

Plot Training Metrics

  • Enabled plotting of:
    • Training loss curve for SFT and Continuous Pre‑Training (CPT) jobs.
    • Reward score curve for RFT jobs.

Enhancements

  • Data Prep operations refactored
    These changes are backward compatible via deprecation warnings:

    • transform()/validate() → method parameter renamed to training_method; the new parameter selects the operation type (default: SCHEMA).
    • column_mappings moved from loader constructor to transform() kwarg.
    • save_data() renamed to save().
    • split_data() renamed to split().
    • New guide: docs/data_prep.md.
    • Updated README and QuickStart notebook.
  • SageMaker SDK V3 Upgrade

  • Security documentation
    SECURITY.md contents moved to README.md.

  • Added Image URI override validation support.

Bug Fixes

  • Added missing IAM permissions for the RFT multiturn guidance.
  • Updated replicas override to use the customer‑provided instance_count instead of the recipe template value.

Release 1.1.2

03 Mar 22:15
6a15c05

Choose a tag to compare

Release 1.1.2

New Features

Reinforcement Fine-Tuning (RFT) Multiturn

  • Added RFT Multiturn data transformer and validator.
  • Added support for session restoration using dump() and load().
  • Enabled custom starter kit path via starter_kit_path to use custom environments.
  • Deprecated start_training_environment() and start_evaluation_environment() in favor of start_environment().
  • Added a feature to detect duplicate environments running on same stack and infrastructure.

Reinforcement Fine-Tuning (RFT) Singleturn

  • Introduced RFT Lambda verification via the validation_config parameter in NovaModelCustomizer.

Job Caching

  • Added enable_job_caching to NovaModelCustomizer to save job results to disk and reuse them on subsequent calls with matching parameters.

Enhancements

Validation for SageMaker Inference (SMI) Configs

  • Added validation for context length and concurrency settings based on model and instance type when deploying to SMI endpoints.

Bug Fixes

  • Fixed validation logic for save_steps for RFT multiturn to accept integer values.
  • Removed pinned version constraint for numpy (numpy<=2.2.6) to resolve dependency conflicts.

Release 1.0.97

20 Feb 16:11
3dcffe0

Choose a tag to compare

Bug Features

  • Addresses bug reported in #31
  • Updates error message with correct Github repo link

Release 1.0.96

16 Feb 20:44
23d2705

Choose a tag to compare

New Features

SageMaker Model Deployment

  • Introduced SageMaker as a deployment platform option alongside Bedrock On-Demand and Provisioned Throughput

  • Introduced invoke_inference() method for real-time inference supporting both SageMaker and Bedrock platforms

Reinforcement Fine-Tuning (RFT) Multiturn

  • Enabled Reinforcement Fine-Tuning (RFT) multiturn training and evaluation for Nova 2.0 models (Forge-subscribed feature only)

  • Introduced CustomEnvironment and RFTMultiturnInfrastructure classes for setting up custom Reinforcement Learning (RL) environments on local, EC2, or ECS infrastructure for training and evaluation

Inference to SageMaker and Bedrock Endpoints

  • Added support for streaming and non-streaming inference requests for SageMaker text models

  • Added support for non-streaming inference requests for Bedrock text models

MLflow Monitoring

  • Added get_presigned_url() function to MLflowMonitor for generating presigned URLs to access the MLflow tracking server UI

Enhancements

Region Support

  • Added support for us-west-2

Model deployment interface (NovaModelCustomizer.deploy)

  • Renamed pt_units parameter to unit_count for broader applicability across deployment platforms

  • Replaced bedrock_execution_role_name parameter with execution_role_name for flexible role configuration across platforms

  • Added sagemaker_instance_type parameter with default value ml.g5.4xlarge for SageMaker deployments

  • Added sagemaker_environment_variablesparameter for SageMaker environment configuration

Bug Fixes

  • Fix recipe overrides by creating deep copies to avoid shared references especially for overrides that have the same values.

  • Fix validation logic for the "temperature" parameter to accept both int and float types.

Release 1.0.83

30 Jan 18:01
ac24ef0

Choose a tag to compare

New Features

Training & Model Support

  • Adds Direct preference optimization (DPO) support (LoRA and Full) for SMTJ and SMHP on Nova 1.0

Memory Management

  • Refactors dataset loading to use iterators and lazy loading, allowing us to load large datasets with bounded host-memory utilization

Enhancements

Documentation

  • Improves documentation clarity by reducing bulk in README.md
  • Moves allowed instance types information from README.md to its own document
  • Expands SECURITY.md with security best practices and code examples
  • Adds CPT/DPO examples to JumpStart notebook
  • Updates spec.md with latest SDK changes and additional AWS documentation links

Installation & Setup

  • Improves guidance for Forge set-up
  • Improves notebook markdown formatting

Logging

  • Adds warning for checkpoint resolution failures with base model fallback in evaluate and batch inference

Bug Fixes

  • Fixed DatasetValidator for Multi-modal data with Nova models

Release 1.0.72

23 Jan 18:49
99e6489

Choose a tag to compare

Hey Nova builders! We have a bunch of features and Quality-of-Life enhancements for this release, and also improved some edge-case behaviors from the initial release.

New Features

MLflow Integration

  • Track training experiments with Amazon SageMaker MLflow tracking servers
  • Auto-discover DefaultMLFlowApp in your AWS account or specify a custom tracking URI
  • Log metrics, hyperparameters, and model artifacts automatically during training

Continued Pre-Training (CPT) Support

  • Pre-train Nova models on your own datasets

Data Mixing for SFT and CPT (Nova Forge customers only)

  • Blend your custom training data with Nova's high-quality curated datasets

OpenAI Messages Format Conversion

  • Transform datasets from OpenAI chat format to the Converse API format for use with Nova models

Multimodal SFT Dataset Validation

  • Validate image content in SFT datasets (PNG, JPEG, GIF formats)
  • Validate video content in SFT datasets (MOV, MKV, MP4 formats)
  • Validate document content for Nova 2.0 (PDF format)

Enhancements

Dataset Validation

  • Enhanced data-format validation for SFT, RFT, CPT, and Eval jobs, including message role ordering, content types, and tool specifications / uses.

IAM and Security

  • We now have more granular validations of whether the IAM calling role has the required permissions for a job. Where possible, we also validate the SMTJ execution role.
    • IAM validation can be disabled via the validation_config parameter of NovaModelCustomizer
  • New create_bedrock_execution_role() utility function generates scoped-down IAM roles for Bedrock model deployment with minimal required permissions
  • New VPC configuration parameters (subnets, security_group_ids) in SMTJRuntimeManager allow training jobs to run within your VPC for network isolation
  • New kms_key_id parameter encrypts training job output artifacts and inter-container traffic with your KMS key

Recipe Management

  • Recipes are now automatically pulled from SageMaker JumpStart, ensuring you always use the latest supported configurations
  • When starting evaluation, batch inference, or model deployment, we now automatically extract the model checkpoint from your most recent training job output unless explicitly specified.
    • You can also provide a TrainingResult object to extract the model checkpoint from it.

Documentation

  • Added SECURITY.md for vulnerability reporting

Bug Fixes

  • Fixed edge-case where evaluation jobs would fail if data_s3_path was set on the customizer but not needed for the evaluation task
  • Improved validation error messages with specific field locations
  • Improved README setup documentation and fixed some errors in the examples