Implement TransformerRegressor and update documentation #11
Conversation
- Added TransformerRegressor class for sequence modeling with a transformer architecture, supporting multiple attention modes and pooling strategies.
- Updated documentation to include TransformerRegressor details and usage examples.
Cursor Bugbot has reviewed your changes and found 1 potential issue.
```python
ffn = layers.Dropout(self.dropout_rate)(ffn)
ffn = layers.Dense(self.d_model)(ffn)
ffn = layers.Dropout(self.dropout_rate)(ffn)
return x + ffn
```
Post-norm mode applies no LayerNorm at all
High Severity
When `use_pre_norm` is False, `_encoder_block` applies no LayerNormalization at all. The docstring says False means "apply LayerNorm after attention/FFN" (post-norm), but the conditional only adds normalization when `use_pre_norm` is True and omits it entirely otherwise. A post-norm encoder block needs a LayerNormalization after each residual connection (`inputs + attention_out` and `x + ffn`). Without any normalization, training will be numerically unstable and produce poor results.
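To make the intended distinction concrete, here is a minimal NumPy sketch of the two orderings (not the PR's actual Keras code): post-norm applies LayerNorm after each residual sum, while pre-norm applies it before each sublayer. The function names and the unparameterized `layer_norm` (no learned scale/shift) are illustrative assumptions.

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize over the feature axis; learned scale/shift omitted for brevity.
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def post_norm_block(inputs, attention, ffn):
    # Post-norm: LayerNorm AFTER each residual connection -- this is what the
    # use_pre_norm=False branch should do per the docstring.
    x = layer_norm(inputs + attention(inputs))
    return layer_norm(x + ffn(x))

def pre_norm_block(inputs, attention, ffn):
    # Pre-norm: LayerNorm BEFORE each sublayer; residual sums stay unnormalized.
    x = inputs + attention(layer_norm(inputs))
    return x + ffn(layer_norm(x))
```

Either variant keeps activations normalized somewhere in the block; the buggy branch has neither, so activation scale can drift freely across stacked layers.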


Note
Medium Risk
Adds a new Keras transformer-based estimator with multiple attention/pooling modes; primary risk is correctness and training stability across backends rather than security or data handling.
Overview
Introduces a new Keras TransformerRegressor sequence estimator, including learned positional embeddings, optional dual-axis (CrossAttention) vs. temporal/feature attention modes, and attention/average pooling before an MLP head. Exports the estimator (and supporting layers) via the package/estimator __init__ lazy-import surfaces, adds unit tests covering fit/predict and all attention modes, and updates the docs to list and demonstrate the new Transformer model; bumps the editable package version to 0.3.2 in uv.lock.
Written by Cursor Bugbot for commit 403698d.
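To illustrate the "attention pooling before an MLP head" step mentioned above, here is a small NumPy sketch (not the PR's implementation): a learned scoring vector weights each timestep, and the pooled output is the score-weighted average over time. The names `attention_pool` and the single-vector scoring scheme are illustrative assumptions.

```python
import numpy as np

def softmax(z, axis=-1):
    # Numerically stable softmax.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def attention_pool(x, w):
    # x: (batch, time, d_model); w: (d_model,) learned scoring vector.
    # Score each timestep, then take the score-weighted average over time,
    # collapsing the sequence to one (batch, d_model) vector for the MLP head.
    scores = softmax(x @ w, axis=1)             # (batch, time)
    return (scores[..., None] * x).sum(axis=1)  # (batch, d_model)
```

With a zero scoring vector the weights are uniform, so attention pooling degenerates to plain average pooling over time; a trained `w` instead lets the model emphasize informative timesteps.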