fix: raise clear error when id_col is not found in dataframe#754
fix: raise clear error when id_col is not found in dataframe#754W057 wants to merge 3 commits intoNixtla:mainfrom
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bacb817d69
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if id_col is None: | ||
| id_col = "unique_id" | ||
| drop_id = True |
There was a problem hiding this comment.
Return normalized id_col from validations
When id_col=None, this function rewrites id_col to 'unique_id', but the updated value is not returned to callers; downstream methods still use the original None when calling _preprocess/_validate_exog (for example in forecast and cross_validation). In that path, _run_validations creates a 'unique_id' column while later code looks for column None, which causes failures before the API call and breaks the new single-series id_col=None behavior introduced by this commit.
Useful? React with 👍 / 👎.
Previously, passing a wrong id_col (e.g. id_col="Sth") would silently create a dummy ID column, leading to confusing downstream errors about timestamps. Now raises ValueError with the column name and available columns. Also accepts id_col=None for single-series DataFrames without an ID column, as discussed with @jmoralez and @ngupta23. Closes Nixtla#565
When id_col=None is passed, _run_validations rewrites it to "unique_id" internally but previously did not return the updated value. This caused downstream functions (_preprocess, _validate_exog, _maybe_drop_id) to receive None instead of "unique_id", breaking the id_col=None single-series flow. Now _run_validations returns id_col as part of its tuple, and all 5 callers (finetune, forecast, detect_anomalies, detect_anomalies_online, cross_validation) capture and use the normalized value.
d3b4d1f to
abf270e
Compare
Summary
ValueErrorwith a clear message whenid_colis not found in the dataframe columns, instead of silently creating a dummy ID that causes confusing downstream errors about timestampsid_col=Nonefor single-series DataFrames without an ID column, as discussed in [bug] error message is not correct #565Changes
nixtla/nixtla_client.py_run_validations()to check ifid_colexists in columns and raiseValueErrorif notid_col=None, creates a dummyunique_idcolumn (explicit single-series mode)strtoOptional[str]infinetune,forecast,detect_anomalies,detect_anomalies_online,cross_validationBreaking change
DataFrames without a
unique_idcolumn that relied on the implicit dummy ID behavior will now get a clear error message guiding them to useid_col=None. This was discussed and agreed upon in #565 by @jmoralez and @ngupta23.Test plan
id_col(e.g.id_col="Sth") raisesValueErrorwith column name and suggestionid_col=Nonecreates dummy ID for single seriesid_col="unique_id"with column present works normallyid_col="unique_id"without column raises clear errorid_colthat exists in columns works normallyid_col=NonewithX_dfalso adds dummy ID toX_dfCloses #565