
Implement TransformerRegressor and update documentation#11

Merged
jrosenfeld13 merged 1 commit into main from feat/keras-transformer-estimator on Feb 10, 2026
Conversation


@jrosenfeld13 jrosenfeld13 commented Feb 10, 2026

  • Added TransformerRegressor class for sequence modeling with transformer architecture, supporting multiple attention modes and pooling strategies.
  • Updated documentation to include TransformerRegressor details and usage examples.

Note

Medium Risk
Adds a new Keras transformer-based estimator with multiple attention/pooling modes; primary risk is correctness and training stability across backends rather than security or data handling.

Overview
Introduces a new Keras TransformerRegressor sequence estimator, including learned positional embeddings, optional dual-axis (CrossAttention) vs temporal/feature attention modes, and attention/average pooling before an MLP head.
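The learned positional embeddings mentioned above can be illustrated with a minimal numpy sketch (the actual estimator uses trainable Keras weights; the table, shapes, and scale here are illustrative assumptions, not the PR's implementation):

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d_model = 6, 4

# Learned positional embeddings: one vector per position (trainable in
# practice), added to the per-timestep feature projections before the
# encoder blocks see the sequence.
pos_table = rng.normal(scale=0.02, size=(seq_len, d_model))
x = rng.normal(size=(seq_len, d_model))   # stand-in for projected inputs
h = x + pos_table[np.arange(seq_len)]     # lookup by position index, then add
print(h.shape)  # (6, 4)
```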

Exports the estimator (and supporting layers) via the package/estimator __init__ lazy-import surfaces, adds unit tests covering fit/predict and all attention modes, and updates docs to list and demonstrate the new Transformer model; bumps the editable package version to 0.3.2 in uv.lock.
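The two pooling strategies the estimator supports (attention pooling vs. average pooling over timesteps, applied before the MLP head) can be sketched in numpy. This is a simplified stand-in, not the PR's Keras layers; the score vector `w` is a hypothetical learned parameter:

```python
import numpy as np

def average_pool(h):
    # h: (seq_len, d_model) encoder output; plain mean over the time axis.
    return h.mean(axis=0)

def attention_pool(h, w):
    # Attention pooling: score each timestep with a learned vector w,
    # softmax the scores, and take the weighted sum of timesteps.
    scores = h @ w                         # (seq_len,)
    weights = np.exp(scores - scores.max())
    weights /= weights.sum()
    return weights @ h                     # (d_model,)

rng = np.random.default_rng(0)
h = rng.normal(size=(10, 4))
w = rng.normal(size=4)
print(average_pool(h).shape, attention_pool(h, w).shape)  # (4,) (4,)
```

With a zero score vector the softmax weights are uniform, so attention pooling degenerates to average pooling — a useful sanity check for either implementation.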

Written by Cursor Bugbot for commit 403698d.


@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


ffn = layers.Dropout(self.dropout_rate)(ffn)
ffn = layers.Dense(self.d_model)(ffn)
ffn = layers.Dropout(self.dropout_rate)(ffn)
return x + ffn

Post-norm mode applies no LayerNorm at all

High Severity

When use_pre_norm is False, _encoder_block applies zero LayerNormalization operations. The docstring says False means "apply LayerNorm after attention/FFN" (post-norm), but the conditional only adds normalization when use_pre_norm is True and omits it entirely otherwise. A post-norm encoder block needs LayerNormalization after each residual connection (inputs + attention_out and x + ffn). Without any normalization, training will be numerically unstable and produce poor results.

Additional Locations (1)

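The ordering the reviewer describes can be shown with a minimal numpy sketch of an encoder block. The sublayers here are stand-in callables (not the PR's attention/FFN layers), but the normalization placement is the point: post-norm applies LayerNorm after each residual connection, which the reviewed `_encoder_block` omits entirely when `use_pre_norm` is False:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature axis, as keras.layers.LayerNormalization does.
    mu = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def encoder_block(x, sublayer_attn, sublayer_ffn, use_pre_norm):
    if use_pre_norm:
        # Pre-norm: normalize the INPUT of each sublayer; residual stays raw.
        x = x + sublayer_attn(layer_norm(x))
        x = x + sublayer_ffn(layer_norm(x))
    else:
        # Post-norm: normalize AFTER each residual connection --
        # the step missing from the reviewed code when use_pre_norm is False.
        x = layer_norm(x + sublayer_attn(x))
        x = layer_norm(x + sublayer_ffn(x))
    return x

rng = np.random.default_rng(0)
x = rng.normal(size=(3, 8))
identity = lambda h: h  # dummy sublayer standing in for attention / FFN
out = encoder_block(x, identity, identity, use_pre_norm=False)
print(out.shape)  # (3, 8)
```

Because post-norm ends each block with a LayerNorm, every output row is normalized (mean ≈ 0, variance ≈ 1 over features), which is what keeps deep post-norm stacks numerically stable.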

@jrosenfeld13 merged commit 130a850 into main on Feb 10, 2026
4 checks passed
@jrosenfeld13 deleted the feat/keras-transformer-estimator branch on February 10, 2026 at 16:01