Releases: aws/nova-forge-sdk
Release 1.3.24
New Features
Training Job Email Notifications (SMTJ, SMHP)
- Added email notifications for training jobs on the SMTJ and SMHP platforms through the NotificationManager class.
- Provided CloudFormation notification templates to allow users to deploy a stack of resources for receiving status updates.
SMHP Cluster Instance Group Scaling
- Added the scale_cluster() and get_instance_groups() functions to the SMHPRuntimeManager to allow users to scale their SMHP cluster restricted instance groups (RIGs) from the SDK.
RFT Lambda Deploy and Validation via RuntimeManager
- Added deploy_lambda() to RuntimeManager to package a local Python file into a zip archive, create or update a Lambda function, and store the resulting ARN on rft_lambda_arn.
- Added validate_lambda() method to invoke a deployed lambda with sample data from S3 to validate correctness before training.
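As a rough illustration of the packaging step that deploy_lambda() is described as performing, the sketch below zips a single Python file in memory using only the standard library. It is not the SDK's implementation: the helper name `zip_python_file` and the file name `reward_fn.py` are hypothetical, and the Lambda create/update call is deliberately left out.

```python
import io
import tempfile
import zipfile
from pathlib import Path


def zip_python_file(source_path: str) -> bytes:
    """Return the bytes of a zip archive containing one Python file at its root."""
    source = Path(source_path)
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        # Store the file under its base name so a handler module
        # can be resolved from the archive root.
        archive.write(source, arcname=source.name)
    return buffer.getvalue()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        handler = Path(tmp) / "reward_fn.py"  # hypothetical file name
        handler.write_text(
            "def lambda_handler(event, context):\n    return {'ok': True}\n"
        )
        payload = zip_python_file(str(handler))
        # List the archive contents to confirm the file sits at the root.
        print(zipfile.ZipFile(io.BytesIO(payload)).namelist())
```

Building the archive in memory avoids leaving temporary zip files behind; writing it to disk instead would work equally well for larger payloads.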
Enhancements
- Simplified ECS infrastructure for RFT Multi-Turn and improved reliability
- Removed Docker/ECR image push requirements
- Switched ECS tasks to a public Amazon Linux 2023 base image
- Improved CF stack state and virtual environment validation
- Note: Users may need to update their IAM policies to remove ECR permissions and add the new SSM/EC2/CloudFormation permissions for RFT Multi-Turn.
- Updated troubleshooting documentation with guidance on Bedrock deployment under restricted permissions.
- Improved CloudWatchLogMonitor usability by allowing job start time resolution from platform APIs when not explicitly provided.
Bug Fixes
- Fixed evaluate() passing an incorrect training method to the RecipeBuilder; it now passes TrainingMethod.EVALUATION.
- Added an optional “recipe” flag to the “validation_config” to bypass recipe validation when needed.
Release 1.3.18
Bug Fixes
- Fixed incorrect input data upload format for RFT SageMaker Training Jobs
Release 1.3.17
Bug Fixes
- Fixed incorrect input data upload format for Nova 1.0 SageMaker Training Jobs
Release 1.3.16
Bug Fixes
- Removed `requirements.txt` to address an installation issue on macOS
Release 1.3.14
The package has been renamed to Nova Forge SDK.
All internal imports are now referenced as amzn_nova_forge.
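Because the rename changes the distribution name as well as the import path, it can help to check which of the two distributions is present in the current environment. The sketch below uses `importlib.metadata` from the standard library; the two distribution names are the ones used in the pip commands of this release, and the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

OLD_DIST = "amzn-nova-customization-sdk"
NEW_DIST = "amzn-nova-forge"


def installed_versions() -> dict:
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in (OLD_DIST, NEW_DIST):
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

If both distributions show a version, the old one is still installed and can be removed.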
Upgrade Instructions
To use the latest version of Nova Forge SDK, run:
# Remove the old SDK
pip uninstall amzn-nova-customization-sdk
# Install the new SDK
pip install amzn-nova-forge
New Features
Bedrock Fine‑tuning
- Added support for Supervised Fine‑Tuning (SFT) and Reinforcement Fine‑Tuning (RFT) with Low‑Rank Adaptation (LoRA) on Bedrock.
- Introduced a new runtime manager, platform support, and extensive infrastructure to integrate Bedrock as a training platform.
- Implemented job creation, status tracking, and cleanup for Bedrock jobs.
Limitations & Constraints
| Limitation | Details |
|---|---|
| Evaluation & batch inference | Not supported on Bedrock |
| Supported methods | Only SFT LoRA and RFT LoRA |
| Validation datasets | Not supported for Nova Lite 2 models |
| Monitoring | MLflow monitoring is not available for Bedrock jobs |
Serverless SageMaker Training Jobs
- Enabled serverless SageMaker Training Jobs for SFT and Direct Preference Optimization (DPO) with Full‑Rank and LoRA.
Plot Training Metrics
- Enabled plotting of:
  - Training loss curve for SFT and Continued Pre‑Training (CPT) jobs.
  - Reward score curve for RFT jobs.
Enhancements
- Data Prep operations refactored. These changes are backward compatible via deprecation warnings:
  - `transform()` / `validate()` → `method` parameter renamed to `training_method`; the new parameter selects the operation type (default: `SCHEMA`).
  - `column_mappings` moved from loader constructor to `transform()` kwarg.
  - `save_data()` renamed to `save()`.
  - `split_data()` renamed to `split()`.
  - New guide: docs/data_prep.md.
- Updated `README` and QuickStart notebook.
- SageMaker SDK V3 Upgrade
- Security documentation: `SECURITY.md` contents moved to `README.md`.
- Added Image URI override validation support.
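The Data Prep renames above are described as backward compatible via deprecation warnings. One common way to achieve that in Python is to keep the old method as a thin alias that warns and delegates to the new one. The sketch below shows the pattern for the `save_data()` to `save()` rename; `ExampleLoader` is a hypothetical stand-in, not the SDK's loader class, and the SDK may implement its shims differently.

```python
import warnings


class ExampleLoader:
    """Hypothetical stand-in for a dataset loader; not the SDK's class."""

    def __init__(self):
        self.saved_to = None

    def save(self, path: str) -> str:
        # New method name: records where the data would be written.
        self.saved_to = path
        return path

    def save_data(self, path: str) -> str:
        # Old method name kept as an alias that warns, then delegates.
        warnings.warn(
            "save_data() is deprecated; use save() instead.",
            DeprecationWarning,
            stacklevel=2,  # point the warning at the caller's line
        )
        return self.save(path)
```

Existing code that calls `save_data()` keeps working and surfaces a `DeprecationWarning`, which test runners such as pytest display by default.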
Bug Fixes
- Added missing IAM permissions for the RFT multiturn guidance.
- Updated replicas override to use the customer‑provided `instance_count` instead of the recipe template value.
Release 1.1.2
New Features
Reinforcement Fine-Tuning (RFT) Multiturn
- Added RFT Multiturn data transformer and validator.
- Added support for session restoration using `dump()` and `load()`.
- Enabled custom starter kit path via `starter_kit_path` to use custom environments.
- Deprecated `start_training_environment()` and `start_evaluation_environment()` in favor of `start_environment()`.
- Added a feature to detect duplicate environments running on the same stack and infrastructure.
Reinforcement Fine-Tuning (RFT) Singleturn
- Introduced RFT Lambda verification via the `validation_config` parameter in `NovaModelCustomizer`.
Job Caching
- Added `enable_job_caching` to `NovaModelCustomizer` to save job results to disk and reuse them on subsequent calls with matching parameters.
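To make the idea of "reuse on matching parameters" concrete, here is a minimal sketch of a parameter-keyed disk cache. It is an illustration of the general technique only: the function names `cache_key` and `run_with_cache`, the JSON file layout, and the hashing scheme are all assumptions, and the SDK's actual caching behavior may differ.

```python
import hashlib
import json
from pathlib import Path


def cache_key(params: dict) -> str:
    """Hash the parameters in a stable order so equal dicts give equal keys."""
    canonical = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def run_with_cache(params: dict, run_job, cache_dir: str) -> dict:
    """Return a cached result for matching parameters, otherwise run and store it."""
    path = Path(cache_dir) / f"{cache_key(params)}.json"
    if path.exists():
        # Cache hit: skip the job and return the stored result.
        return json.loads(path.read_text())
    result = run_job(params)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(result))
    return result
```

Sorting keys before hashing means that two calls with the same parameters in a different order resolve to the same cache entry, while any changed value produces a new key and a fresh run.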
Enhancements
Validation for SageMaker Inference (SMI) Configs
- Added validation for context length and concurrency settings based on model and instance type when deploying to SMI endpoints.
Bug Fixes
- Fixed validation logic for `save_steps` for RFT multiturn to accept integer values.
- Removed pinned version constraint for numpy (`numpy<=2.2.6`) to resolve dependency conflicts.
Release 1.0.97
Bug Fixes
- Addresses bug reported in #31
- Updates error message with correct GitHub repo link
Release 1.0.96
New Features
SageMaker Model Deployment
- Introduced SageMaker as a deployment platform option alongside Bedrock On-Demand and Provisioned Throughput
- Introduced `invoke_inference()` method for real-time inference supporting both SageMaker and Bedrock platforms
Reinforcement Fine-Tuning (RFT) Multiturn
- Enabled Reinforcement Fine-Tuning (RFT) multiturn training and evaluation for Nova 2.0 models (Forge-subscribed feature only)
- Introduced CustomEnvironment and RFTMultiturnInfrastructure classes for setting up custom Reinforcement Learning (RL) environments on local, EC2, or ECS infrastructure for training and evaluation
Inference to SageMaker and Bedrock Endpoints
- Added support for streaming and non-streaming inference requests for SageMaker text models
- Added support for non-streaming inference requests for Bedrock text models
MLflow Monitoring
- Added `get_presigned_url()` function to MLflowMonitor for generating presigned URLs to access the MLflow tracking server UI
Enhancements
Region Support
- Added support for us-west-2
Model deployment interface (NovaModelCustomizer.deploy)
- Renamed `pt_units` parameter to `unit_count` for broader applicability across deployment platforms
- Replaced `bedrock_execution_role_name` parameter with `execution_role_name` for flexible role configuration across platforms
- Added `sagemaker_instance_type` parameter with default value `ml.g5.4xlarge` for SageMaker deployments
- Added `sagemaker_environment_variables` parameter for SageMaker environment configuration
Bug Fixes
- Fixed recipe overrides by creating deep copies to avoid shared references, especially for overrides that have the same values.
- Fixed validation logic for the "temperature" parameter to accept both int and float types.
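The shared-reference problem behind the recipe override fix is easy to reproduce in plain Python: if two override keys point at the same object, mutating one silently changes the other. The sketch below shows the general deep-copy remedy; `apply_overrides` and the recipe keys are hypothetical and do not reflect the SDK's internal structures.

```python
import copy


def apply_overrides(template: dict, overrides: dict) -> dict:
    """Return a new recipe with overrides applied; inputs are left untouched."""
    recipe = copy.deepcopy(template)
    for key, value in overrides.items():
        # Deep-copy each value so two keys given the same object do not alias.
        recipe[key] = copy.deepcopy(value)
    return recipe


# The same dict object is passed for two different override keys.
shared = {"lr": 1e-5}
recipe = apply_overrides({"name": "base"}, {"train": shared, "eval": shared})

# Changing one section no longer leaks into the other or into the input.
recipe["train"]["lr"] = 5e-5
print(recipe["eval"]["lr"], shared["lr"])
```

Without the per-value `deepcopy`, `recipe["train"]` and `recipe["eval"]` would be the same object and the assignment would change both.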
Release 1.0.83
New Features
Training & Model Support
- Adds Direct Preference Optimization (DPO) support (LoRA and Full) for SMTJ and SMHP on Nova 1.0
Memory Management
- Refactors dataset loading to use iterators and lazy loading, allowing us to load large datasets with bounded host-memory utilization
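The effect of iterator-based loading can be sketched with a generator that reads a JSON Lines file one record at a time. This is a generic illustration under the assumption of a JSONL input; the function names are hypothetical and the SDK's loaders may be structured differently.

```python
import json
from typing import Iterator


def iter_jsonl(path: str) -> Iterator[dict]:
    """Yield one parsed record at a time instead of loading the whole file."""
    with open(path, "r", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # skip blank lines
                yield json.loads(line)


def count_records(path: str) -> int:
    """Consume the iterator while holding only one record in memory."""
    return sum(1 for _ in iter_jsonl(path))
```

Because the file is streamed, peak memory depends on the largest single record rather than on the size of the dataset.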
Enhancements
Documentation
- Improves documentation clarity by reducing bulk in README.md
- Moves allowed instance types information from README.md to its own document
- Expands SECURITY.md with security best practices and code examples
- Adds CPT/DPO examples to JumpStart notebook
- Updates spec.md with latest SDK changes and additional AWS documentation links
Installation & Setup
- Improves guidance for Forge set-up
- Improves notebook markdown formatting
Logging
- Adds warning for checkpoint resolution failures with base model fallback in evaluate and batch inference
Bug Fixes
- Fixed DatasetValidator for Multi-modal data with Nova models
Release 1.0.72
Hey Nova builders! We have a bunch of features and Quality-of-Life enhancements for this release, and also improved some edge-case behaviors from the initial release.
New Features
MLflow Integration
- Track training experiments with Amazon SageMaker MLflow tracking servers
- Auto-discover DefaultMLFlowApp in your AWS account or specify a custom tracking URI
- Log metrics, hyperparameters, and model artifacts automatically during training
Continued Pre-Training (CPT) Support
- Pre-train Nova models on your own datasets
Data Mixing for SFT and CPT (Nova Forge customers only)
- Blend your custom training data with Nova's high-quality curated datasets
OpenAI Messages Format Conversion
- Transform datasets from OpenAI chat format to the Converse API format for use with Nova models
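For orientation, the sketch below shows what such a conversion can look like for text-only conversations. It assumes OpenAI-style messages with string `content`, and a Converse-style record in which system prompts live in a separate `system` list and each message's `content` is a list of `{"text": ...}` blocks. The function name is hypothetical, and tool calls, images, and any schema fields the SDK adds are not covered.

```python
def openai_to_converse(messages: list) -> dict:
    """Convert text-only OpenAI-style chat messages to a Converse-style record."""
    system, converted = [], []
    for message in messages:
        role, content = message["role"], message["content"]
        if not isinstance(content, str):
            raise ValueError("this sketch handles plain-text content only")
        if role == "system":
            # Converse keeps system prompts outside the message list.
            system.append({"text": content})
        elif role in ("user", "assistant"):
            converted.append({"role": role, "content": [{"text": content}]})
        else:
            raise ValueError(f"unsupported role: {role}")
    record = {"messages": converted}
    if system:
        record["system"] = system
    return record
```

Raising on unsupported roles and non-string content keeps the sketch from silently dropping data it does not know how to translate.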
Multimodal SFT Dataset Validation
- Validate image content in SFT datasets (PNG, JPEG, GIF formats)
- Validate video content in SFT datasets (MOV, MKV, MP4 formats)
- Validate document content for Nova 2.0 (PDF format)
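A simple pre-check for the formats listed above can be done on file extensions before any upload. The sketch below is only a first-pass filter: the extension spellings (for example `.jpg` alongside `.jpeg`) and the function name are assumptions, and the SDK's validator may inspect the content itself rather than the file name.

```python
from pathlib import PurePosixPath

# Formats named in the release notes; extension spellings are assumptions.
ALLOWED_EXTENSIONS = {
    "image": {".png", ".jpg", ".jpeg", ".gif"},
    "video": {".mov", ".mkv", ".mp4"},
    "document": {".pdf"},
}


def is_supported(kind: str, uri: str) -> bool:
    """Return True if the URI's extension is allowed for the given media kind."""
    suffix = PurePosixPath(uri).suffix.lower()
    return suffix in ALLOWED_EXTENSIONS.get(kind, set())
```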
Enhancements
Dataset Validation
- Enhanced data-format validation for SFT, RFT, CPT, and Eval jobs, including message role ordering, content types, and tool specifications / uses.
IAM and Security
- We now have more granular validations of whether the IAM calling role has the required permissions for a job. Where possible, we also validate the SMTJ execution role.
  - IAM validation can be disabled via the `validation_config` parameter of `NovaModelCustomizer`
- New `create_bedrock_execution_role()` utility function generates scoped-down IAM roles for Bedrock model deployment with minimal required permissions
- New VPC configuration parameters (`subnets`, `security_group_ids`) in `SMTJRuntimeManager` allow training jobs to run within your VPC for network isolation
- New `kms_key_id` parameter encrypts training job output artifacts and inter-container traffic with your KMS key
Recipe Management
- Recipes are now automatically pulled from SageMaker JumpStart, ensuring you always use the latest supported configurations
- When starting evaluation, batch inference, or model deployment, we now automatically extract the model checkpoint from your most recent training job output unless explicitly specified.
  - You can also provide a `TrainingResult` object to extract the model checkpoint from it.
Documentation
- Added SECURITY.md for vulnerability reporting
Bug Fixes
- Fixed edge-case where evaluation jobs would fail if `data_s3_path` was set on the customizer but not needed for the evaluation task
- Improved validation error messages with specific field locations
- Improved README setup documentation and fixed some errors in the examples