Dev: live data #226
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](lodash/lodash@4.17.21...4.17.23)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.17.23
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
| }] | ||
| return make_csv_response(rows) | ||
| except Exception as e: | ||
| return Response(f"error,{str(e)}", mimetype='text/csv'), 500 |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 2 days ago
In general, to fix this issue you should avoid returning raw exception messages or stack traces to clients. Instead, log detailed errors on the server side (including exception type, message, and stack trace if needed) and return a generic, non-sensitive error message to the client, optionally with a simple error code.
For this specific function, the best minimal fix is to keep the logging line (so operators still see details) but change the HTTP response to return a static, generic CSV-formatted error string that does not embed e at all. For example, replace f"error,{str(e)}" with something like "error,Failed to fetch earthquake data". This preserves the CSV shape and HTTP status code while removing the sensitive data exposure. No changes to imports are required because we continue to use the existing logger and Response.
Concretely, in py-src/data_formulator/demo_stream_routes.py, in the get_earthquakes function’s except Exception as e: block (around lines 319–321), keep the logger.warning call as-is, and change the Response(...) construction to use a constant message that does not include e or str(e).
| @@ -318,7 +318,7 @@ | ||
| return make_csv_response(rows) | ||
| except Exception as e: | ||
| logger.warning(f"Failed to fetch earthquakes: {e}") | ||
| - return Response(f"error,{str(e)}", mimetype='text/csv'), 500 | ||
| + return Response("error,Failed to fetch earthquake data", mimetype='text/csv'), 500 | ||
| # ============================================================================ |
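The pattern described in this autofix (log the detail, return a constant) can be sketched without Flask as follows. This is a minimal illustration, not the project's code: the helper name `build_csv_error`, the logger name, and the constant `GENERIC_EARTHQUAKE_ERROR` are hypothetical, and a real route would wrap the returned body in a `Response` with `mimetype='text/csv'`.

```python
import logging
from typing import Tuple

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("demo_stream_routes_sketch")

# Constant, pre-vetted CSV body: nothing from the exception is interpolated.
GENERIC_EARTHQUAKE_ERROR = "error,Failed to fetch earthquake data"


def build_csv_error(exc: Exception) -> Tuple[str, int]:
    """Log the real failure server-side and return a fixed CSV body plus status."""
    logger.warning("Failed to fetch earthquakes: %s", exc)
    return GENERIC_EARTHQUAKE_ERROR, 500


# The exception text (which may contain hosts, paths, etc.) stays in the log.
body, status = build_csv_error(RuntimeError("connection to 10.0.0.5 refused"))
print(body, status)
```

The CSV shape (`error,<message>`) and the 500 status are unchanged, so clients that already parse the error row should keep working.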
| return jsonify({ | ||
| "status": "error", | ||
| "message": safe_msg | ||
| }), status_code |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI about 24 hours ago
In general, to fix this class of problem you should separate what is logged (full error details, including stack traces) from what is sent to users (sanitized, generic messages). The sanitization function should only ever return specific, pre‑vetted messages; it should not propagate arbitrary exception text to the client.
For this code, the best minimal fix without changing existing behavior more than necessary is:
- Keep the existing “safe patterns” behavior (where specific error messages are allowed or mapped to safer text).
- For all other errors (the else case), stop returning error_msg to the client. Instead:
  - Log the full error details (and, ideally, the stack trace) using logger.error.
  - Return a fixed, generic message such as "An unexpected error occurred. Please try again later." along with the 500 status code.
- Optionally, include some opaque ID or classification in the log only, not in the response, if needed for correlation; but that isn’t necessary for the fix.
Concretely:
- Modify the implementation of sanitize_db_error_message in py-src/data_formulator/tables_routes.py.
  - Keep the safe pattern block intact.
  - Replace the final return f"An unexpected error occurred: {error_msg}", 500 with a generic message that does not include error_msg.
  - Optionally enhance logging to include the full exception and/or stack trace (using logger.exception or traceback.format_exc()), but only on the server side.
No changes are required at the data_loader_refresh_table call site other than benefiting from the safer sanitization function.
| @@ -695,11 +695,11 @@ | ||
| if re.search(pattern, error_msg, re.IGNORECASE): | ||
| return safe_msg, status_code | ||
| - # Log the full error for debugging | ||
| + # Log the full error for debugging (including full exception details on the server only) | ||
| logger.error(f"Unexpected error occurred: {error_msg}") | ||
| - # Return a generic error message for unknown errors | ||
| - return f"An unexpected error occurred: {error_msg}", 500 | ||
| + # Return a generic error message for unknown errors to avoid leaking internal details | ||
| + return "An unexpected error occurred. Please try again later.", 500 | ||
| @tables_bp.route('/data-loader/list-data-loaders', methods=['GET']) |
| return jsonify({ | ||
| "status": "error", | ||
| "message": safe_msg | ||
| }), status_code |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI about 24 hours ago
In general, the fix is to ensure that exception details (including str(e)) are never sent to the client for unexpected errors, and that even “known” errors expose only minimal, user-friendly messages that do not reveal implementation details. Detailed messages and stack traces should only go to server logs.
The best targeted fix here is to modify sanitize_db_error_message so that:
- For matched “safe” patterns, we return a short, controlled, human-friendly message string, not the original error_msg, and avoid including variable internal details (paths, SQL, etc.).
- For unmatched errors, we return a completely generic message (e.g., "An unexpected error occurred. Please try again later.") without interpolating error_msg at all.
- We continue to log the full error server-side for debugging, possibly including a stack trace, but keep that out of the HTTP response.
Concretely, in py-src/data_formulator/tables_routes.py:
- Update sanitize_db_error_message at lines 670–702 so that:
  - safe_error_patterns maps patterns to (sanitized_message, status_code) where sanitized_message is a fixed, generic description per pattern (e.g., "Requested table does not exist" rather than error_msg).
  - The default branch no longer formats error_msg into the returned string, but instead returns a fixed generic message.
- Optionally, enhance logging to include the traceback (using logger.exception or traceback.format_exc()), but only for the log, not the HTTP response.
All call sites (e.g., data_loader_list_data_loaders, data_loader_get_table_metadata, data_loader_list_table_metadata) already use the sanitizer, so no changes are needed there.
| @@ -667,39 +667,41 @@ | ||
| Sanitize error messages before sending to client. | ||
| Returns a tuple of (sanitized_message, status_code) | ||
| """ | ||
| - # Convert error to string | ||
| + # Convert error to string for logging purposes only | ||
| error_msg = str(error) | ||
| - # Define patterns for known safe errors | ||
| + # Define patterns for known error types and their sanitized messages | ||
| safe_error_patterns = { | ||
| # Database table errors | ||
| - r"Table.*does not exist": (error_msg, 404), | ||
| - r"Table.*already exists": (error_msg, 409), | ||
| + r"Table.*does not exist": ("Requested table does not exist.", 404), | ||
| + r"Table.*already exists": ("A table with this name already exists.", 409), | ||
| # Query errors | ||
| - r"syntax error": (error_msg, 400), | ||
| - r"Catalog Error": (error_msg, 404), | ||
| - r"Binder Error": (error_msg, 400), | ||
| - r"Invalid input syntax": (error_msg, 400), | ||
| + r"syntax error": ("There is a syntax error in the query.", 400), | ||
| + r"Catalog Error": ("The requested catalog object was not found.", 404), | ||
| + r"Binder Error": ("There was a problem binding the query.", 400), | ||
| + r"Invalid input syntax": ("Invalid input syntax.", 400), | ||
| # File errors | ||
| - r"No such file": (error_msg, 404), | ||
| - r"Permission denied": ("Access denied", 403), | ||
| + r"No such file": ("Requested file was not found.", 404), | ||
| + r"Permission denied": ("Access denied.", 403), | ||
| # Data loader errors | ||
| - r"Entity ID": (error_msg, 500), | ||
| - r"session_id": ("session_id not found, please refresh the page", 500), | ||
| + r"Entity ID": ("There was a problem with the provided entity identifier.", 500), | ||
| + r"session_id": ("session_id not found, please refresh the page.", 500), | ||
| } | ||
| - # Check if error matches any safe pattern | ||
| + # Check if error matches any known pattern | ||
| for pattern, (safe_msg, status_code) in safe_error_patterns.items(): | ||
| if re.search(pattern, error_msg, re.IGNORECASE): | ||
| + # Log the detailed error but return only a sanitized message | ||
| + logger.error(f"Handled error ({pattern}): {error_msg}") | ||
| return safe_msg, status_code | ||
| - # Log the full error for debugging | ||
| - logger.error(f"Unexpected error occurred: {error_msg}") | ||
| + # Log the full error and stack trace for debugging of unexpected errors | ||
| + logger.exception(f"Unexpected error occurred: {error_msg}") | ||
| - # Return a generic error message for unknown errors | ||
| - return f"An unexpected error occurred: {error_msg}", 500 | ||
| + # Return a generic error message for unknown errors without exposing details | ||
| + return "An unexpected error occurred. Please try again later.", 500 | ||
| @tables_bp.route('/data-loader/list-data-loaders', methods=['GET']) |
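A condensed, standalone sketch of the sanitizer shape this autofix proposes may help when reviewing it. It is not the project's implementation: the function name `sanitize_error`, the logger name, and the reduced pattern table are assumptions for illustration, and only three of the patterns from the diff are reproduced.

```python
import logging
import re
from typing import Dict, Tuple

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("tables_routes_sketch")

# Each pattern maps to a fixed message; none of them embeds the raw error text.
SAFE_ERROR_PATTERNS: Dict[str, Tuple[str, int]] = {
    r"Table.*does not exist": ("Requested table does not exist.", 404),
    r"Table.*already exists": ("A table with this name already exists.", 409),
    r"Permission denied": ("Access denied.", 403),
}
GENERIC_MESSAGE = "An unexpected error occurred. Please try again later."


def sanitize_error(error: Exception) -> Tuple[str, int]:
    """Return (client_message, status_code); raw details go to the log only."""
    error_msg = str(error)  # used for matching and logging, never returned
    for pattern, (safe_msg, status_code) in SAFE_ERROR_PATTERNS.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            logger.error("Handled error (%s): %s", pattern, error_msg)
            return safe_msg, status_code
    logger.error("Unexpected error occurred: %s", error_msg)
    return GENERIC_MESSAGE, 500


print(sanitize_error(Exception('Table "users" does not exist in /var/db/main.duckdb')))
print(sanitize_error(Exception("segfault while reading /opt/secret/config")))
```

Because every branch returns a string literal, the client-facing text can be audited by reading the table alone, which is the property the CodeQL alert is asking for.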
| return jsonify({ | ||
| "status": "error", | ||
| "message": safe_msg | ||
| }), status_code |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI about 24 hours ago
In general, to fix this kind of issue you should ensure that raw exception details are never sent directly to clients. Instead, log the detailed error on the server (including stack trace if desired) and return a generic, user-friendly message. For known safe and intentionally user-facing errors, you can still pass through some or all of the message, but unexpected errors must not leak internal information.
The best minimally invasive fix here is to change sanitize_db_error_message so that for unexpected errors it no longer includes error_msg in the message returned to clients. We can keep the current whitelisted patterns and behavior for those (since the project already treats them as safe), but for the “fallback” case we should: (1) log the full error and stack trace on the server; (2) return a generic message, e.g. "An unexpected error occurred. Please try again later or contact support.". To capture more context for developers, we can log the stack trace via logger.exception, which uses exc_info=True under the hood, instead of just logging str(error).
Concretely: in py-src/data_formulator/tables_routes.py, update lines 698–702 in sanitize_db_error_message. Replace the logger.error(f"Unexpected error occurred: {error_msg}") with logger.exception("Unexpected error occurred") (so we include stack trace in logs), and change the return from f"An unexpected error occurred: {error_msg}", 500 to a generic constant message like "An unexpected error occurred. Please try again later or contact support.", 500. This change does not alter existing control flow or status codes, only the text of unknown-error responses, and it uses imports that already exist (no new imports needed).
| @@ -695,11 +695,11 @@ | ||
| if re.search(pattern, error_msg, re.IGNORECASE): | ||
| return safe_msg, status_code | ||
| - # Log the full error for debugging | ||
| - logger.error(f"Unexpected error occurred: {error_msg}") | ||
| + # Log the full error (with stack trace) for debugging | ||
| + logger.exception("Unexpected error occurred") | ||
| - # Return a generic error message for unknown errors | ||
| - return f"An unexpected error occurred: {error_msg}", 500 | ||
| + # Return a generic error message for unknown errors without exposing internal details | ||
| + return "An unexpected error occurred. Please try again later or contact support.", 500 | ||
| @tables_bp.route('/data-loader/list-data-loaders', methods=['GET']) |
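To see what switching from logger.error to logger.exception buys, the sketch below captures log output in memory and checks that the traceback lands in the log while the client message stays generic. The logger name and the `risky`/`handle` functions are hypothetical. Note that logger.exception only records a useful traceback when called while an exception is being handled, which this sketch assumes is true of the call site.

```python
import io
import logging
from typing import Tuple

# Capture log output in memory so the effect is visible without a log file.
stream = io.StringIO()
logger = logging.getLogger("exception_logging_sketch")  # hypothetical name
logger.setLevel(logging.ERROR)
logger.propagate = False
logger.addHandler(logging.StreamHandler(stream))


def risky() -> None:
    """Stand-in for a database call that fails with sensitive detail."""
    raise ValueError("disk path /srv/data is unreadable")


def handle() -> Tuple[str, int]:
    try:
        risky()
    except Exception:
        # Equivalent to logger.error(..., exc_info=True): appends the traceback.
        logger.exception("Unexpected error occurred")
        return "An unexpected error occurred. Please try again later or contact support.", 500
    return "ok", 200


message, status = handle()
log_text = stream.getvalue()
print(message, status)
print(log_text)
```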
| return jsonify({ | ||
| "status": "error", | ||
| "message": safe_msg | ||
| }), status_code |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI about 24 hours ago
In general, the fix is to ensure that exception details are logged on the server but not sent to the client. User-facing responses should contain only generic, high-level information, while server logs keep the detailed str(e) or full tracebacks for debugging.
The best targeted fix here is to adjust sanitize_db_error_message so that, for unexpected errors (those that do not match any of the defined safe regex patterns), it returns a fully generic error message that does not include error_msg at all. We should still log the detailed error on the server side, ideally with a traceback, but the client should receive a constant, non-sensitive string like "An unexpected error occurred. Please contact support.". The rest of the behavior and status codes should be preserved.
Concretely, in py-src/data_formulator/tables_routes.py:
- Keep the existing safe patterns and their behavior intact.
- In sanitize_db_error_message, at the end of the function:
  - Instead of logging only the message string, log the full exception information (using logger.exception or logger.error(..., exc_info=True)).
  - Change the returned message from f"An unexpected error occurred: {error_msg}" to a generic string without interpolating error_msg.
- No changes are needed in the calling code (e.g., lines 1171–1176), because the interface of sanitize_db_error_message remains (message, status_code).
No new imports are necessary; logging is already imported.
| @@ -695,11 +695,11 @@ | ||
| if re.search(pattern, error_msg, re.IGNORECASE): | ||
| return safe_msg, status_code | ||
| - # Log the full error for debugging | ||
| - logger.error(f"Unexpected error occurred: {error_msg}") | ||
| + # Log the full error for debugging (including traceback) | ||
| + logger.exception(f"Unexpected error occurred while processing database operation: {error_msg}") | ||
| # Return a generic error message for unknown errors | ||
| - return f"An unexpected error occurred: {error_msg}", 500 | ||
| + return "An unexpected error occurred. Please contact support if the problem persists.", 500 | ||
| @tables_bp.route('/data-loader/list-data-loaders', methods=['GET']) |
| return jsonify({ | ||
| "status": "error", | ||
| "message": safe_msg | ||
| }), status_code |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI about 24 hours ago
In general, to fix this issue we should ensure that detailed exception text is never echoed back to the client for unexpected errors. Instead, we should log the full error (and optionally a stack trace) on the server, and return a generic, non-sensitive message to the user. For “known safe” error types we can still provide relatively specific, user-friendly messages, but those should be curated and not directly derived from the raw exception text.
The best targeted change here is to update sanitize_db_error_message so that, for errors that do not match any of the “safe” patterns, it does not include error_msg in the returned client-facing string. We should keep or even improve the server-side logging so developers still get the details. Specifically:
- Keep the existing pattern matching; for matched patterns, we can continue returning the configured message and status code.
- For unmatched errors, change the return value from f"An unexpected error occurred: {error_msg}", 500 to a fully generic string like "An unexpected error occurred. Please contact support if the problem persists.".
- Optionally enhance logging to include the full stack trace using logger.exception, which uses the existing logging import and does not change external behavior.
All of this is confined to sanitize_db_error_message in py-src/data_formulator/tables_routes.py. No new imports or other files are required.
| @@ -695,11 +695,11 @@ | ||
| if re.search(pattern, error_msg, re.IGNORECASE): | ||
| return safe_msg, status_code | ||
| - # Log the full error for debugging | ||
| - logger.error(f"Unexpected error occurred: {error_msg}") | ||
| + # Log the full error for debugging (including stack trace) | ||
| + logger.exception(f"Unexpected error occurred: {error_msg}") | ||
| - # Return a generic error message for unknown errors | ||
| - return f"An unexpected error occurred: {error_msg}", 500 | ||
| + # Return a generic error message for unknown errors without exposing internal details | ||
| + return "An unexpected error occurred. Please contact support if the problem persists.", 500 | ||
| @tables_bp.route('/data-loader/list-data-loaders', methods=['GET']) |
| return jsonify({ | ||
| "status": "error", | ||
| "message": result.get('content', 'Unknown error during transformation') | ||
| }), 400 |
Check warning
Code scanning / CodeQL
Information exposure through an exception Medium
Stack trace information
Copilot Autofix
AI 4 days ago
General approach: Do not return raw exception messages generated while executing user-supplied code. Instead, log the detailed error (including traceback) on the server and return only a generic, non-sensitive message to the client, optionally with a coarse error category (e.g., “Invalid transformation code”).
Concrete best fix here:
- In py_sandbox.py, keep generating a detailed error_message for logging or internal debugging, but do not propagate that exact string for use in HTTP responses.
- Change run_transform_in_sandbox2020 (and, for consistency, run_derive_concept) so that when an error occurs, it returns a generic, sanitized message like "An error occurred while executing the transformation code." instead of result['error_message'].
- In tables_routes.py within refresh_derived_data, keep using result.get('content', ...) but rely on the sanitized generic message now provided by run_transform_in_sandbox2020. This avoids exposing stack-trace-adjacent information while keeping the control flow and API contract intact.
- Optionally, in run_in_main_process, we can keep returning error_message for logging purposes, since it no longer gets surfaced to clients.
No new imports are required; we do not change logging behavior in these snippets beyond how messages are propagated.
Specific changes:
- File py-src/data_formulator/py_sandbox.py:
  - In run_transform_in_sandbox2020, replace the error branch (lines 141–144) to return a generic content string not derived from result['error_message'].
  - In run_derive_concept, similarly replace the error branch (lines 170–171) to return a generic content string.
- File py-src/data_formulator/tables_routes.py:
  - No change is strictly required once run_transform_in_sandbox2020 is sanitized, because result.get('content', 'Unknown error during transformation') will only see the generic message. To keep the diff minimal and preserve behavior, we leave this file unchanged.
| @@ -138,9 +138,10 @@ | ||
| 'content': result_df | ||
| } | ||
| else: | ||
| + # Return a generic error message to avoid exposing internal details of the exception | ||
| return { | ||
| 'status': 'error', | ||
| - 'content': result['error_message'] | ||
| + 'content': 'An error occurred while executing the transformation code.' | ||
| } | ||
| @@ -168,4 +167,8 @@ | ||
| result_df[output_field_name] = result['allowed_objects']['new_column'] | ||
| return { 'status': 'ok', 'content': result_df } | ||
| else: | ||
| - return { 'status': 'error', 'content': result['error_message'] } | ||
| + # Return a generic error message to avoid exposing internal details of the exception | ||
| + return { | ||
| + 'status': 'error', | ||
| + 'content': 'An error occurred while executing the derivation code.' | ||
| + } |
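One caveat with the diff as written is that it drops result['error_message'] without logging it anywhere. A possible variant, sketched below, logs the detail before returning the generic content. The helper name `to_client_result`, the logger name, and the assumed shape of the raw sandbox result (a dict with 'status', 'content', and 'error_message' keys) are illustrative assumptions, not the project's API.

```python
import logging
from typing import Any, Dict

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("py_sandbox_sketch")

GENERIC_TRANSFORM_ERROR = "An error occurred while executing the transformation code."


def to_client_result(result: Dict[str, Any]) -> Dict[str, Any]:
    """Map a raw sandbox result to the dict that a route may safely return."""
    if result.get("status") == "ok":
        return {"status": "ok", "content": result.get("content")}
    # Keep the detailed message for operators; do not hand it to the client.
    logger.warning("Transformation failed: %s", result.get("error_message"))
    return {"status": "error", "content": GENERIC_TRANSFORM_ERROR}


raw = {"status": "error", "error_message": 'File "/tmp/sandbox_x.py", line 3: NameError'}
print(to_client_result(raw))
```

The trade-off is that users writing transformation code lose the specific error text. If that feedback matters, a curated category (for example "Invalid transformation code") could be returned instead of the raw message.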
…-4.17.23 Bump lodash from 4.17.21 to 4.17.23
Pull request overview
Adds streaming/refreshable data support and expands the agentic data loader flow, with UI updates to better support live/refreshing tables and multi-table agent context.
Changes:
- Introduces table source/refresh metadata (frontend + backend) and new refresh endpoints for data-loader tables plus derived-table refresh.
- Updates UI flows for loading/refreshing data (unified upload entry points, draggable table headers, multi-table preview, report auto-refresh behavior).
- Updates charting/templates (Vega-Lite v6, new World Map template, smarter nominal axis handling).
Reviewed changes
Copilot reviewed 52 out of 57 changed files in this pull request and generated 54 comments.
| File | Description |
|---|---|
| yarn.lock | Lockfile updates for dependency bumps (incl. Vega-Lite ecosystem). |
| src/views/VisualizationView.tsx | Updates effect dependencies related to table row updates. |
| src/views/SelectableDataGrid.tsx | Adds draggable column headers and custom sort UI. |
| src/views/ReportView.tsx | Adds auto-refresh of report chart images when table data changes; updates report style options. |
| src/views/RefreshDataDialog.tsx | New dialog to replace table data via paste/file/url. |
| src/views/MultiTablePreview.tsx | New component for previewing multiple tables with a selectable active table. |
| src/views/EncodingShelfThread.tsx | React key/array construction adjustments for thread rendering. |
| src/views/EncodingShelfCard.tsx | Expands agent input to include multiple base tables; removes older UI controls. |
| src/views/DerivedDataDialog.tsx | Removes derived data candidate dialog. |
| src/views/DataView.tsx | Tweaks calculated column min width for table display. |
| src/views/DataLoadingChat.tsx | Refactors layout to a sidebar + main panel UI and adjusts empty state. |
| src/views/DataFormulator.tsx | Switches landing “load data” entry to unified upload menu/dialog; hooks in refresh logic. |
| src/views/ChartRecBox.tsx | Changes selected input tables logic for agent recs to include more base tables; fixes a MUI selector. |
| src/views/About.tsx | Reworks About page structure/layout and action buttons. |
| src/scss/DataView.scss | Adjusts header container padding; removes nested header-title styling block. |
| src/scss/App.scss | Adds global Dialog layout CSS to address scrolling issues. |
| src/data/utils.ts | Hardens Excel ingestion to skip empty sheets and ignore empty rows; adds per-sheet try/catch. |
| src/components/ComponentType.tsx | Adds table source/refresh config and contentHash; extends createDictTable signature. |
| src/components/ChartTemplates.tsx | Adds World Map template and nominal-axis enforcement helpers; adjusts grouped-bar color/offset behavior. |
| src/assets/chart-icon-world-map-min.png | Adds icon asset for World Map template. |
| src/app/utils.tsx | Adds fetchWithSession helper, content hashing, and improved temporal handling for Vega assembly. |
| src/app/store.ts | Exports store instance for use in utilities. |
| src/app/dfSlice.tsx | Adds table row/source update reducers, contentHash updates, and map auto-populate for lat/lon. |
| src/app/App.tsx | Replaces old “Data” menu/dialog set with unified upload dialog entry. |
| requirements.txt | Adds flask-limiter and yfinance. |
| pyproject.toml | Bumps project version and adds dependencies for limiter and yfinance. |
| py-src/data_formulator/tables_routes.py | Adds table source metadata storage, refresh-table endpoint, derived refresh endpoint, and metadata listing. |
| py-src/data_formulator/example_datasets_config.py | Minor formatting fix. |
| py-src/data_formulator/data_loader/s3_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/postgresql_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/mysql_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/mssql_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/mongodb_data_loader.py | Adds optional sort support during ingestion. |
| py-src/data_formulator/data_loader/kusto_data_loader.py | Adds optional sort clause support during ingestion. |
| py-src/data_formulator/data_loader/external_data_loader.py | Changes ingestion to CREATE OR REPLACE behavior. |
| py-src/data_formulator/data_loader/bigquery_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/azure_blob_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/app.py | Registers new demo stream blueprint and initializes rate limiter. |
| py-src/data_formulator/agents/agent_utils.py | Refactors data summary formatting and removes old dedup helpers. |
| py-src/data_formulator/agents/agent_sql_data_transform.py | Updates prompt/examples and introduces shared SQL data summary generator. |
| py-src/data_formulator/agents/agent_sql_data_rec.py | Switches to new SQL data summary generator; extends schema for input_tables. |
| py-src/data_formulator/agents/agent_report_gen.py | Adds “live report” style and uses shared SQL data summary generator. |
| py-src/data_formulator/agents/agent_py_data_transform.py | Extends prompts to select input_tables and executes python transforms on selected tables only. |
| py-src/data_formulator/agents/agent_py_data_rec.py | Extends prompts to select input_tables; adds boxplot guidance; executes transforms on selected tables only. |
| py-src/data_formulator/agents/agent_interactive_explore.py | Improves prompt structure and uses shared SQL data summary generator. |
| py-src/data_formulator/agents/agent_exploration.py | Uses shared SQL data summary generator. |
| py-src/data_formulator/agent_routes.py | Removes query-completion endpoint and agent. |
| py-src/data_formulator/agents/agent_query_completion.py | Deletes query completion agent implementation. |
| package.json | Bumps lodash and pins Vega-Lite to 6.4.1. |
| README.md | Adds release notes entry for 0.6 streaming/live data. |
| .gitignore | Adds experiment_data/ and additional NUL patterns. |
| Input, | ||
| Alert, | ||
| Tooltip, | ||
| Link, | ||
| alpha, | ||
| useTheme, | ||
| } from '@mui/material'; | ||
| import CloseIcon from '@mui/icons-material/Close'; | ||
| import UploadFileIcon from '@mui/icons-material/UploadFile'; | ||
| import { useDispatch, useSelector } from 'react-redux'; | ||
| import { AppDispatch } from '../app/store'; | ||
| import { DataFormulatorState, dfActions, dfSelectors, fetchFieldSemanticType } from '../app/dfSlice'; | ||
| import { DictTable } from '../components/ComponentType'; | ||
| import { createTableFromFromObjectArray, createTableFromText, loadTextDataWrapper, loadBinaryDataWrapper } from '../data/utils'; | ||
| import { getUrls } from '../app/utils'; | ||
|
|
Copilot
AI
Jan 26, 2026
RefreshDataDialog has multiple unused imports/variables (e.g., Alert, although Tooltip from the same import is used; useDispatch/AppDispatch/dispatch; dfActions/dfSelectors/fetchFieldSemanticType; createTableFromFromObjectArray; loadBinaryDataWrapper; getUrls). This will fail builds with noUnusedLocals/noUnusedParameters. Remove unused imports/variables or wire them up if intended.
| <TextField | ||
| autoFocus | ||
| multiline | ||
| fullWidth | ||
| value={displayContent} | ||
| onChange={handlePasteContentChange} |
Copilot
AI
Jan 26, 2026
Paste preview mode is currently lossy: the TextField is controlled by displayContent (which becomes a truncated preview), and onChange writes event.target.value back into pasteContent. If a user edits while in preview, pasteContent is overwritten with the truncated preview and data is lost. Consider controlling the TextField with pasteContent and making it readOnly while showing a separate preview, or disable editing until “Full” is selected.
| const [{ isDragging }, dragSource, dragPreview] = useDrag(() => ({ | ||
| type: "concept-card", | ||
| item: field ? { | ||
| type: 'concept-card', | ||
| fieldID: field.id, | ||
| source: "conceptShelf" | ||
| } : undefined, | ||
| canDrag: !!field, |
Copilot
AI
Jan 26, 2026
react-dnd useDrag is configured with item: undefined when no matching FieldItem is found. This is inconsistent with other drag sources in the codebase and can cause runtime/type issues inside react-dnd. Prefer always providing a stable item object and rely on canDrag to disable dragging (or conditionally skip calling useDrag and skip attaching refs when not draggable).
| // Convert to ISO date strings for Vega-Lite compatibility | ||
| if (typeof val === 'number') { | ||
| // Handle Year/Decade semantic types - these are year numbers, not timestamps | ||
| if (semanticType === 'Year' || semanticType === 'Decade') { | ||
| // Year values like 2018 should become "2018-01-01" | ||
| r[temporalKey] = `${Math.floor(val)}`; | ||
| } else if (isLikelyTimestamp(val)) { |
Copilot
AI
Jan 26, 2026
Year/Decade handling sets a numeric year to a bare string (e.g. "2018"), but the inline comment says it should become an ISO date (e.g. "2018-01-01"). With Vega-Lite temporal encodings, a bare year string can parse inconsistently. Either convert to a full ISO date string for temporal fields or keep the encoding as ordinal/nominal for Year/Decade.
src/views/ReportView.tsx
Outdated
| const rowCount = table.rows.length; | ||
| const firstRows = JSON.stringify(table.rows.slice(0, 3)); | ||
| const lastRows = JSON.stringify(table.rows.slice(-2)); | ||
| const signature = `${rowCount}:${firstRows}:${lastRows}`; | ||
|
|
||
| const prevSignature = tableRowSignaturesRef.current.get(tableId); | ||
| if (prevSignature && prevSignature !== signature) { | ||
| hasChanges = true; | ||
| } | ||
| tableRowSignaturesRef.current.set(tableId, signature); | ||
| } | ||
| }); | ||
|
|
||
| // If data changed, regenerate chart images for the report | ||
| if (hasChanges) { | ||
| console.log('[ReportView] Table data changed, refreshing chart images...'); |
Copilot
AI
Jan 26, 2026
This effect recomputes JSON.stringify(table.rows.slice(...)) for every affected table on every tables change, which is expensive for streaming updates. You already added table.contentHash support in the state—use that (or a lightweight hash/rowCount+updatedAt) as the signature instead of stringifying row slices. Also avoid leaving console.log in production UI code; gate behind a debug flag or remove.
| try: | ||
| since_dt = datetime.fromisoformat(since_str.replace("Z", "+00:00")).replace(tzinfo=None) | ||
| since_timestamp = since_dt.timestamp() * 1000 # USGS uses milliseconds | ||
| except: |
Copilot
AI
Jan 26, 2026
Except block directly handles BaseException.
| - except: | |
| + except Exception: |
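The reason this warning matters: a bare except also catches BaseException subclasses such as KeyboardInterrupt and SystemExit, so Ctrl-C or a shutdown request can be silently swallowed. A small illustration follows, where all function names are hypothetical:

```python
def swallow_everything(action):
    """Bare except: also traps KeyboardInterrupt and SystemExit."""
    try:
        action()
    except:  # noqa: E722 (kept bare on purpose to show the problem)
        return "swallowed"
    return "ok"


def swallow_errors_only(action):
    """except Exception: ordinary errors are handled, interrupts propagate."""
    try:
        action()
    except Exception:
        return "swallowed"
    return "ok"


def interrupt():
    raise KeyboardInterrupt()


def fail():
    raise ValueError("bad value")


print(swallow_everything(interrupt))   # the interrupt is hidden from the caller
print(swallow_errors_only(fail))       # ordinary errors are still handled
try:
    swallow_errors_only(interrupt)
except KeyboardInterrupt:
    print("KeyboardInterrupt propagated")
```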
| """Check if value is valid (not NaN/None)""" | ||
| try: | ||
| return val is not None and not (isinstance(val, float) and math.isnan(val)) | ||
| except: |
Copilot
AI
Jan 26, 2026
Except block directly handles BaseException.
| return date_utc.strftime("%Y-%m-%d %H:%M:%S") | ||
| else: | ||
| return str(date_utc) | ||
| except: |
Copilot
AI
Jan 26, 2026
Except block directly handles BaseException.
| for date, row in hist.iterrows(): | ||
| try: | ||
| date_str = date.strftime("%Y-%m-%d") if hasattr(date, 'strftime') else str(date) | ||
| except: |
Copilot
AI
Jan 26, 2026
Except block directly handles BaseException.
| - except: | |
| + except Exception: |
| except: | ||
| pass |
Copilot
AI
Jan 26, 2026
'except' clause does nothing but pass and there is no explanatory comment.
| - except: | |
| - pass | |
| + except (ValueError, TypeError) as e: | |
| + # If the 'since' parameter is malformed, ignore it and proceed without a time filter | |
| + logger.debug("Invalid 'since' parameter '%s': %s", since_str, e) |
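The suggested handling can be pulled into a small helper, sketched below. The name `parse_since_ms` and the logger name are hypothetical. Unlike the snippet under review, which strips tzinfo before calling timestamp(), this sketch keeps the value timezone-aware and treats naive inputs as UTC. That is an assumption about the intended semantics, not something the PR states.

```python
import logging
from datetime import datetime, timezone
from typing import Optional

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("since_param_sketch")


def parse_since_ms(since_str: Optional[str]) -> Optional[float]:
    """Parse an ISO-8601 'since' value into epoch milliseconds, or None if unusable."""
    if not since_str:
        return None
    try:
        # fromisoformat() on older Python versions rejects a trailing "Z".
        dt = datetime.fromisoformat(since_str.replace("Z", "+00:00"))
    except (ValueError, TypeError) as e:
        # Malformed input: proceed without a time filter, but leave a trace.
        logger.debug("Invalid 'since' parameter %r: %s", since_str, e)
        return None
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)  # assumption: naive values are UTC
    return dt.timestamp() * 1000  # milliseconds, as the USGS feed uses


print(parse_since_ms("2026-01-26T00:00:00Z"))
print(parse_since_ms("not-a-date"))
```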
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
Mestway
left a comment
looking good!
Adding streaming data and agentic data loader