
Conversation

@Chenglong-MS
Collaborator

Adding streaming data and agentic data loader

dependabot bot and others added 2 commits January 23, 2026 04:42
Bumps [lodash](https://github.com/lodash/lodash) from 4.17.21 to 4.17.23.
- [Release notes](https://github.com/lodash/lodash/releases)
- [Commits](lodash/lodash@4.17.21...4.17.23)

---
updated-dependencies:
- dependency-name: lodash
  dependency-version: 4.17.23
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
        }]
        return make_csv_response(rows)
    except Exception as e:
        return Response(f"error,{str(e)}", mimetype='text/csv'), 500

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 2 days ago

In general, to fix this issue you should avoid returning raw exception messages or stack traces to clients. Instead, log detailed errors on the server side (including exception type, message, and stack trace if needed) and return a generic, non-sensitive error message to the client, optionally with a simple error code.

For this specific function, the best minimal fix is to keep the logging line (so operators still see details) but change the HTTP response to return a static, generic CSV-formatted error string that does not embed e at all. For example, replace f"error,{str(e)}" with something like "error,Failed to fetch earthquake data". This preserves the CSV shape and HTTP status code while removing the sensitive data exposure. No changes to imports are required because we continue to use the existing logger and Response.

Concretely, in py-src/data_formulator/demo_stream_routes.py, in the get_earthquakes function’s except Exception as e: block (around lines 319–321), keep the logger.warning call as-is, and change the Response(...) construction to use a constant message that does not include e or str(e).
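
The separation described above (details to the log, a constant string to the client) can be sketched without any web framework. This is a minimal illustration, not the project's code: `rows_to_csv_or_error`, `failing_fetch`, and the returned `(body, status)` tuple are hypothetical stand-ins for the real route, which returns a Flask `Response`.

```python
import csv
import io
import logging

logger = logging.getLogger("demo_stream_sketch")

# Constant client-facing body; it keeps the CSV shape but never embeds the exception.
GENERIC_ERROR_CSV = "error,Failed to fetch earthquake data"


def rows_to_csv_or_error(fetch_rows):
    """Return (csv_body, status). `fetch_rows` is a hypothetical callable
    that returns a list of dicts or raises."""
    try:
        rows = fetch_rows()
        buf = io.StringIO()
        fieldnames = list(rows[0].keys()) if rows else []
        writer = csv.DictWriter(buf, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)
        return buf.getvalue(), 200
    except Exception as e:
        # Details stay in the server log only.
        logger.warning("Failed to fetch earthquakes: %s", e)
        # The client sees a constant string that does not depend on `e`.
        return GENERIC_ERROR_CSV, 500


def failing_fetch():
    # Simulated upstream failure whose message contains an internal address.
    raise ConnectionError("upstream host 10.0.0.5 refused connection")


body, status = rows_to_csv_or_error(failing_fetch)
```

Here `body` is the fixed CSV error line and the internal address appears only in the log record.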

Suggested changeset 1
py-src/data_formulator/demo_stream_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/demo_stream_routes.py b/py-src/data_formulator/demo_stream_routes.py
--- a/py-src/data_formulator/demo_stream_routes.py
+++ b/py-src/data_formulator/demo_stream_routes.py
@@ -318,7 +318,7 @@
         return make_csv_response(rows)
     except Exception as e:
         logger.warning(f"Failed to fetch earthquakes: {e}")
-        return Response(f"error,{str(e)}", mimetype='text/csv'), 500
+        return Response("error,Failed to fetch earthquake data", mimetype='text/csv'), 500
 
 
 # ============================================================================
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Comment on lines +1113 to +1116
return jsonify({
    "status": "error",
    "message": safe_msg
}), status_code

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 24 hours ago

In general, to fix this class of problem you should separate what is logged (full error details, including stack traces) from what is sent to users (sanitized, generic messages). The sanitization function should only ever return specific, pre‑vetted messages; it should not propagate arbitrary exception text to the client.

For this code, the best minimal fix without changing existing behavior more than necessary is:

  • Keep the existing “safe patterns” behavior (where specific error messages are allowed or mapped to safer text).
  • For all other errors (the else case), stop returning error_msg to the client. Instead:
    • Log the full error details (and, ideally, the stack trace) using logger.error.
    • Return a fixed, generic message such as "An unexpected error occurred. Please try again later." along with the 500 status code.
  • Optionally, include some opaque ID or classification in the log only, not in the response, if needed for correlation; but that isn’t necessary for the fix.

Concretely:

  • Modify the implementation of sanitize_db_error_message in py-src/data_formulator/tables_routes.py.
    • Keep the safe pattern block intact.
    • Replace the final return f"An unexpected error occurred: {error_msg}", 500 with a generic message that does not include error_msg.
    • Optionally enhance logging to include the full exception and/or stack trace (using logger.exception or traceback.format_exc()), but only on the server side.

No changes are required at the data_loader_refresh_table call site other than benefiting from the safer sanitization function.
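
As a rough sketch of the fallback behavior described above, including the optional log-only correlation ID: the names `SAFE_PATTERNS`, `GENERIC_MESSAGE`, and `sanitize_error` are illustrative and the pattern table is a reduced, hypothetical subset, so this should not be read as the actual `sanitize_db_error_message`.

```python
import logging
import re
import uuid

logger = logging.getLogger("tables_routes_sketch")

# Hypothetical allowlist: pattern -> (client message, HTTP status).
SAFE_PATTERNS = {
    r"Permission denied": ("Access denied", 403),
    r"session_id": ("session_id not found, please refresh the page", 500),
}

GENERIC_MESSAGE = "An unexpected error occurred. Please try again later."


def sanitize_error(error):
    """Return (client_message, status_code) without echoing unknown errors."""
    error_msg = str(error)
    for pattern, (safe_msg, status_code) in SAFE_PATTERNS.items():
        if re.search(pattern, error_msg, re.IGNORECASE):
            return safe_msg, status_code
    # The opaque ID is written to the log only, so operators can find this
    # entry later; it is deliberately not part of the returned message.
    log_id = uuid.uuid4().hex
    logger.error("Unexpected error [%s]: %s", log_id, error_msg)
    return GENERIC_MESSAGE, 500


msg, code = sanitize_error(RuntimeError("could not open /srv/secret/db.duckdb"))
```

With this shape, the call site can keep passing `msg` straight into its JSON response, since nothing derived from the raw exception reaches it in the fallback case.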


Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -695,11 +695,11 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
             return safe_msg, status_code
             
-    # Log the full error for debugging
+    # Log the full error for debugging (including full exception details on the server only)
     logger.error(f"Unexpected error occurred: {error_msg}")
     
-    # Return a generic error message for unknown errors
-    return f"An unexpected error occurred: {error_msg}", 500
+    # Return a generic error message for unknown errors to avoid leaking internal details
+    return "An unexpected error occurred. Please try again later.", 500
 
 
 @tables_bp.route('/data-loader/list-data-loaders', methods=['GET'])
EOF
Comment on lines +1147 to +1150
return jsonify({
    "status": "error",
    "message": safe_msg
}), status_code

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 24 hours ago

In general, the fix is to ensure that exception details (including str(e)) are never sent to the client for unexpected errors, and that even “known” errors expose only minimal, user-friendly messages that do not reveal implementation details. Detailed messages and stack traces should only go to server logs.

The best targeted fix here is to modify sanitize_db_error_message so that:

  • For matched “safe” patterns, we return a short, controlled, human-friendly message string, not the original error_msg, and avoid including variable internal details (paths, SQL, etc.).
  • For unmatched errors, we return a completely generic message (e.g., "An unexpected error occurred. Please try again later.") without interpolating error_msg at all.
  • We continue to log the full error server-side for debugging, possibly including a stack trace, but keep that out of the HTTP response.

Concretely, in py-src/data_formulator/tables_routes.py:

  • Update sanitize_db_error_message at lines 670–702 so that:
    • safe_error_patterns maps patterns to (sanitized_message, status_code) where sanitized_message is a fixed, generic description per pattern (e.g., "Requested table does not exist" rather than error_msg).
    • The default branch no longer formats error_msg into the returned string, but instead returns a fixed generic message.
  • Optionally, enhance logging to include the traceback (using logger.exception or traceback.format_exc()), but only for the log, not the HTTP response.

All call sites (e.g., data_loader_list_data_loaders, data_loader_get_table_metadata, data_loader_list_table_metadata) already use the sanitizer, so no changes are needed there.
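
One way to check the stricter property described above, that even matched errors return only fixed text, is to assert that every returned message comes from a closed set. The sketch below assumes a reduced pattern table and hypothetical names (`SANITIZED`, `FALLBACK`, `sanitize`); the sample error strings are invented for illustration.

```python
import re

# Fixed messages per pattern, mirroring the shape of `safe_error_patterns`.
SANITIZED = {
    r"Table.*does not exist": ("Requested table does not exist.", 404),
    r"No such file": ("Requested file was not found.", 404),
}
FALLBACK = ("An unexpected error occurred. Please try again later.", 500)

# Every message a client can ever receive from `sanitize`.
ALLOWED_MESSAGES = {msg for msg, _ in SANITIZED.values()} | {FALLBACK[0]}


def sanitize(error):
    """Map an exception to a (fixed_message, status_code) pair."""
    text = str(error)
    for pattern, response in SANITIZED.items():
        if re.search(pattern, text, re.IGNORECASE):
            return response
    return FALLBACK


# Invented sample errors containing details that must not leak.
samples = [
    Exception("Catalog Error: Table with name secret_payroll does not exist!"),
    FileNotFoundError("No such file or directory: '/home/svc/creds.csv'"),
    ValueError("unrelated failure in module x.y.z"),
]
results = [sanitize(e) for e in samples]
```

Because the table name and file path only ever appear in `str(error)`, and `sanitize` never returns that string, the responses cannot contain them.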

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -667,39 +667,41 @@
     Sanitize error messages before sending to client.
     Returns a tuple of (sanitized_message, status_code)
     """
-    # Convert error to string
+    # Convert error to string for logging purposes only
     error_msg = str(error)
     
-    # Define patterns for known safe errors
+    # Define patterns for known error types and their sanitized messages
     safe_error_patterns = {
         # Database table errors
-        r"Table.*does not exist": (error_msg, 404),
-        r"Table.*already exists": (error_msg, 409),
+        r"Table.*does not exist": ("Requested table does not exist.", 404),
+        r"Table.*already exists": ("A table with this name already exists.", 409),
         # Query errors
-        r"syntax error": (error_msg, 400),
-        r"Catalog Error": (error_msg, 404), 
-        r"Binder Error": (error_msg, 400),
-        r"Invalid input syntax": (error_msg, 400),
+        r"syntax error": ("There is a syntax error in the query.", 400),
+        r"Catalog Error": ("The requested catalog object was not found.", 404), 
+        r"Binder Error": ("There was a problem binding the query.", 400),
+        r"Invalid input syntax": ("Invalid input syntax.", 400),
         
         # File errors
-        r"No such file": (error_msg, 404),
-        r"Permission denied": ("Access denied", 403),
+        r"No such file": ("Requested file was not found.", 404),
+        r"Permission denied": ("Access denied.", 403),
 
         # Data loader errors
-        r"Entity ID": (error_msg, 500),
-        r"session_id": ("session_id not found, please refresh the page", 500),
+        r"Entity ID": ("There was a problem with the provided entity identifier.", 500),
+        r"session_id": ("session_id not found, please refresh the page.", 500),
     }
     
-    # Check if error matches any safe pattern
+    # Check if error matches any known pattern
     for pattern, (safe_msg, status_code) in safe_error_patterns.items():
         if re.search(pattern, error_msg, re.IGNORECASE):
+            # Log the detailed error but return only a sanitized message
+            logger.error(f"Handled error ({pattern}): {error_msg}")
             return safe_msg, status_code
             
-    # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    # Log the full error and stack trace for debugging of unexpected errors
+    logger.exception(f"Unexpected error occurred: {error_msg}")
     
-    # Return a generic error message for unknown errors
-    return f"An unexpected error occurred: {error_msg}", 500
+    # Return a generic error message for unknown errors without exposing details
+    return "An unexpected error occurred. Please try again later.", 500
 
 
 @tables_bp.route('/data-loader/list-data-loaders', methods=['GET'])
EOF
Comment on lines +1147 to +1150
return jsonify({
    "status": "error",
    "message": safe_msg
}), status_code

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 24 hours ago

In general, to fix this kind of issue you should ensure that raw exception details are never sent directly to clients. Instead, log the detailed error on the server (including stack trace if desired) and return a generic, user-friendly message. For known safe and intentionally user-facing errors, you can still pass through some or all of the message, but unexpected errors must not leak internal information.

The best minimally invasive fix here is to change sanitize_db_error_message so that for unexpected errors it no longer includes error_msg in the message returned to clients. We can keep the current whitelisted patterns and behavior for those (since the project already treats them as safe), but for the “fallback” case we should: (1) log the full error and stack trace on the server; (2) return a generic message, e.g. "An unexpected error occurred. Please try again later or contact support.". To capture more context for developers, we can log the stack trace via logger.exception, which uses exc_info=True under the hood, instead of just logging str(error).

Concretely: in py-src/data_formulator/tables_routes.py, update lines 698–702 in sanitize_db_error_message. Replace the logger.error(f"Unexpected error occurred: {error_msg}") with logger.exception("Unexpected error occurred") (so we include stack trace in logs), and change the return from f"An unexpected error occurred: {error_msg}", 500 to a generic constant message like "An unexpected error occurred. Please try again later or contact support.", 500. This change does not alter existing control flow or status codes, only the text of unknown-error responses, and it uses imports that already exist (no new imports needed).
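
To illustrate what `logger.exception` adds over logging `str(error)`, the following sketch captures log output in memory and inspects it. The logger name, the `handle` helper, and the simulated `KeyError` are assumptions made for the example; only the `logging` calls themselves are standard library behavior.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("exception_logging_sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False


def handle(error):
    # Called from inside an `except` block, so the active exception and its
    # traceback are picked up by logger.exception automatically.
    logger.exception("Unexpected error occurred")
    return "An unexpected error occurred. Please try again later or contact support.", 500


try:
    raise KeyError("internal_column_name")
except Exception as exc:
    message, status = handle(exc)

log_text = stream.getvalue()
```

The traceback and the exception text end up in `log_text`, while `message` stays a constant string.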


Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -695,11 +695,11 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
             return safe_msg, status_code
             
-    # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    # Log the full error (with stack trace) for debugging
+    logger.exception("Unexpected error occurred")
     
-    # Return a generic error message for unknown errors
-    return f"An unexpected error occurred: {error_msg}", 500
+    # Return a generic error message for unknown errors without exposing internal details
+    return "An unexpected error occurred. Please try again later or contact support.", 500
 
 
 @tables_bp.route('/data-loader/list-data-loaders', methods=['GET'])
EOF
Comment on lines +1168 to +1171
return jsonify({
    "status": "error",
    "message": safe_msg
}), status_code

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 24 hours ago

In general, the fix is to ensure that exception details are logged on the server but not sent to the client. User-facing responses should contain only generic, high-level information, while server logs keep the detailed str(e) or full tracebacks for debugging.

The best targeted fix here is to adjust sanitize_db_error_message so that, for unexpected errors (those that do not match any of the defined safe regex patterns), it returns a fully generic error message that does not include error_msg at all. We should still log the detailed error on the server side, ideally with a traceback, but the client should receive a constant, non-sensitive string like "An unexpected error occurred. Please contact support.". The rest of the behavior and status codes should be preserved.

Concretely, in py-src/data_formulator/tables_routes.py:

  • Keep the existing safe patterns and their behavior intact.
  • In sanitize_db_error_message, at the end of the function:
    • Instead of logging only the message string, log the full exception information (using logger.exception or logger.error(..., exc_info=True)).
    • Change the returned message from f"An unexpected error occurred: {error_msg}" to a generic string without interpolating error_msg.
  • No changes are needed in the calling code (e.g., lines 1171–1176), because the interface of sanitize_db_error_message remains (message, status_code).

No new imports are necessary; logging is already imported.
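
A related detail: `logger.exception` and `exc_info=True` read the exception that is currently being handled, so they depend on the sanitizer being called from inside an `except` block. Passing the exception object itself as `exc_info` removes that dependency. The sketch below assumes a hypothetical `sanitize` helper and an in-memory log stream; it is one possible variation, not what the patch does.

```python
import io
import logging

# Capture log output in memory so it can be inspected.
stream = io.StringIO()
logger = logging.getLogger("exc_info_sketch")
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False


def sanitize(error):
    # Passing the exception instance as exc_info attaches its traceback
    # whether or not an `except` block is active at call time.
    logger.error("Unexpected error occurred", exc_info=error)
    return "An unexpected error occurred. Please contact support if the problem persists.", 500


try:
    int("not-a-number")
except ValueError as exc:
    saved = exc  # keep a reference beyond the `except` block

# Called after the `except` block has ended.
message, status = sanitize(saved)
log_text = stream.getvalue()
```

The return value keeps the `(message, status_code)` shape, so callers would not need to change.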


Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -695,11 +695,11 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
             return safe_msg, status_code
             
-    # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    # Log the full error for debugging (including traceback)
+    logger.exception(f"Unexpected error occurred while processing database operation: {error_msg}")
     
     # Return a generic error message for unknown errors
-    return f"An unexpected error occurred: {error_msg}", 500
+    return "An unexpected error occurred. Please contact support if the problem persists.", 500
 
 
 @tables_bp.route('/data-loader/list-data-loaders', methods=['GET'])
EOF
Comment on lines +1168 to +1171
return jsonify({
    "status": "error",
    "message": safe_msg
}), status_code

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI about 24 hours ago

In general, to fix this issue we should ensure that detailed exception text is never echoed back to the client for unexpected errors. Instead, we should log the full error (and optionally a stack trace) on the server, and return a generic, non-sensitive message to the user. For “known safe” error types we can still provide relatively specific, user-friendly messages, but those should be curated and not directly derived from the raw exception text.

The best targeted change here is to update sanitize_db_error_message so that, for errors that do not match any of the “safe” patterns, it does not include error_msg in the returned client-facing string. We should keep or even improve the server-side logging so developers still get the details. Specifically:

  • Keep the existing pattern matching; for matched patterns, we can continue returning the configured message and status code.
  • For unmatched errors, change the return value from f"An unexpected error occurred: {error_msg}", 500 to a fully generic string like "An unexpected error occurred. Please contact support if the problem persists.".
  • Optionally enhance logging to include the full stack trace using logger.exception, which uses the existing logging import and does not change external behavior.

All of this is confined to sanitize_db_error_message in py-src/data_formulator/tables_routes.py. No new imports or other files are required.

Suggested changeset 1
py-src/data_formulator/tables_routes.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/tables_routes.py b/py-src/data_formulator/tables_routes.py
--- a/py-src/data_formulator/tables_routes.py
+++ b/py-src/data_formulator/tables_routes.py
@@ -695,11 +695,11 @@
         if re.search(pattern, error_msg, re.IGNORECASE):
             return safe_msg, status_code
             
-    # Log the full error for debugging
-    logger.error(f"Unexpected error occurred: {error_msg}")
+    # Log the full error for debugging (including stack trace)
+    logger.exception(f"Unexpected error occurred: {error_msg}")
     
-    # Return a generic error message for unknown errors
-    return f"An unexpected error occurred: {error_msg}", 500
+    # Return a generic error message for unknown errors without exposing internal details
+    return "An unexpected error occurred. Please contact support if the problem persists.", 500
 
 
 @tables_bp.route('/data-loader/list-data-loaders', methods=['GET'])
EOF
Comment on lines +1241 to +1244
return jsonify({
    "status": "error",
    "message": result.get('content', 'Unknown error during transformation')
}), 400

Check warning

Code scanning / CodeQL

Information exposure through an exception Medium

Stack trace information
flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 days ago

General approach: Do not return raw exception messages generated while executing user-supplied code. Instead, log the detailed error (including traceback) on the server and return only a generic, non-sensitive message to the client, optionally with a coarse error category (e.g., “Invalid transformation code”).

Concrete best fix here:

  1. In py_sandbox.py, keep generating a detailed error_message for logging or internal debugging, but do not propagate that exact string for use in HTTP responses.
  2. Change run_transform_in_sandbox2020 (and, for consistency, run_derive_concept) so that when an error occurs, it returns a generic, sanitized message like "An error occurred while executing the transformation code." instead of result['error_message'].
  3. In tables_routes.py within refresh_derived_data, keep using result.get('content', ...) but rely on the sanitized generic message now provided by run_transform_in_sandbox2020. This avoids exposing stack-trace-adjacent information while keeping the control flow and API contract intact.
  4. Optionally, in run_in_main_process, we can keep returning error_message for logging purposes, since it no longer gets surfaced to clients.

No new imports are required; we do not change logging behavior in these snippets beyond how messages are propagated.

Specific changes:

  • File py-src/data_formulator/py_sandbox.py:

    • In run_transform_in_sandbox2020, replace the error branch (lines 141–144) to return a generic content string not derived from result['error_message'].
    • In run_derive_concept, similarly replace the error branch (lines 170–171) to return a generic content string.
  • File py-src/data_formulator/tables_routes.py:

    • No change is strictly required once run_transform_in_sandbox2020 is sanitized, because result.get('content', 'Unknown error during transformation') will only see the generic message. To keep the diff minimal and preserve behavior, we leave this file unchanged.
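
The shape of that change can be sketched with a toy runner. Everything here is hypothetical: `run_transform` merely stands in for a sandbox entry point, uses plain `exec` (which is not a sandbox), and adds a `logger.exception` call that the suggested patch itself does not include.

```python
import logging

logger = logging.getLogger("py_sandbox_sketch")

GENERIC_TRANSFORM_ERROR = "An error occurred while executing the transformation code."


def run_transform(code, data):
    """Hypothetical stand-in for a sandbox runner: executes `code`, which is
    expected to define `transform(data)`, and returns a status dict."""
    namespace = {}
    try:
        exec(code, namespace)  # NOT a real sandbox; illustration only
        return {"status": "ok", "content": namespace["transform"](data)}
    except Exception:
        # The full traceback goes to the server log.
        logger.exception("Transformation code failed")
        # The caller, and therefore the HTTP response, sees only this string.
        return {"status": "error", "content": GENERIC_TRANSFORM_ERROR}


ok = run_transform("def transform(d):\n    return [x * 2 for x in d]", [1, 2])
bad = run_transform("def transform(d):\n    return d / 0", [1, 2])
```

A route that forwards `result.get('content', ...)` then returns the generic string unchanged. The cost of this design is that the author of the transformation code gets less feedback about what failed.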

Suggested changeset 1
py-src/data_formulator/py_sandbox.py
Outside changed files

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/py-src/data_formulator/py_sandbox.py b/py-src/data_formulator/py_sandbox.py
--- a/py-src/data_formulator/py_sandbox.py
+++ b/py-src/data_formulator/py_sandbox.py
@@ -138,9 +138,10 @@
             'content': result_df
         }
     else:
+        # Return a generic error message to avoid exposing internal details of the exception
         return {
             'status': 'error',
-            'content': result['error_message']
+            'content': 'An error occurred while executing the transformation code.'
         }
 
 
@@ -168,4 +167,8 @@
         result_df[output_field_name] = result['allowed_objects']['new_column']
         return { 'status': 'ok', 'content': result_df }
     else:
-        return { 'status': 'error', 'content': result['error_message'] }
\ No newline at end of file
+        # Return a generic error message to avoid exposing internal details of the exception
+        return {
+            'status': 'error',
+            'content': 'An error occurred while executing the derivation code.'
+        }
\ No newline at end of file
EOF
Contributor

Copilot AI left a comment


Pull request overview

Adds streaming/refreshable data support and expands the agentic data loader flow, with UI updates to better support live/refreshing tables and multi-table agent context.

Changes:

  • Introduces table source/refresh metadata (frontend + backend) and new refresh endpoints for data-loader tables plus derived-table refresh.
  • Updates UI flows for loading/refreshing data (unified upload entry points, draggable table headers, multi-table preview, report auto-refresh behavior).
  • Updates charting/templates (Vega-Lite v6, new World Map template, smarter nominal axis handling).

Reviewed changes

Copilot reviewed 52 out of 57 changed files in this pull request and generated 54 comments.

Show a summary per file
| File | Description |
| --- | --- |
| yarn.lock | Lockfile updates for dependency bumps (incl. Vega-Lite ecosystem). |
| src/views/VisualizationView.tsx | Updates effect dependencies related to table row updates. |
| src/views/SelectableDataGrid.tsx | Adds draggable column headers and custom sort UI. |
| src/views/ReportView.tsx | Adds auto-refresh of report chart images when table data changes; updates report style options. |
| src/views/RefreshDataDialog.tsx | New dialog to replace table data via paste/file/url. |
| src/views/MultiTablePreview.tsx | New component for previewing multiple tables with a selectable active table. |
| src/views/EncodingShelfThread.tsx | React key/array construction adjustments for thread rendering. |
| src/views/EncodingShelfCard.tsx | Expands agent input to include multiple base tables; removes older UI controls. |
| src/views/DerivedDataDialog.tsx | Removes derived data candidate dialog. |
| src/views/DataView.tsx | Tweaks calculated column min width for table display. |
| src/views/DataLoadingChat.tsx | Refactors layout to a sidebar + main panel UI and adjusts empty state. |
| src/views/DataFormulator.tsx | Switches landing “load data” entry to unified upload menu/dialog; hooks in refresh logic. |
| src/views/ChartRecBox.tsx | Changes selected input tables logic for agent recs to include more base tables; fixes a MUI selector. |
| src/views/About.tsx | Reworks About page structure/layout and action buttons. |
| src/scss/DataView.scss | Adjusts header container padding; removes nested header-title styling block. |
| src/scss/App.scss | Adds global Dialog layout CSS to address scrolling issues. |
| src/data/utils.ts | Hardens Excel ingestion to skip empty sheets and ignore empty rows; adds per-sheet try/catch. |
| src/components/ComponentType.tsx | Adds table source/refresh config and contentHash; extends createDictTable signature. |
| src/components/ChartTemplates.tsx | Adds World Map template and nominal-axis enforcement helpers; adjusts grouped-bar color/offset behavior. |
| src/assets/chart-icon-world-map-min.png | Adds icon asset for World Map template. |
| src/app/utils.tsx | Adds fetchWithSession helper, content hashing, and improved temporal handling for Vega assembly. |
| src/app/store.ts | Exports store instance for use in utilities. |
| src/app/dfSlice.tsx | Adds table row/source update reducers, contentHash updates, and map auto-populate for lat/lon. |
| src/app/App.tsx | Replaces old “Data” menu/dialog set with unified upload dialog entry. |
| requirements.txt | Adds flask-limiter and yfinance. |
| pyproject.toml | Bumps project version and adds dependencies for limiter and yfinance. |
| py-src/data_formulator/tables_routes.py | Adds table source metadata storage, refresh-table endpoint, derived refresh endpoint, and metadata listing. |
| py-src/data_formulator/example_datasets_config.py | Minor formatting fix. |
| py-src/data_formulator/data_loader/s3_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/postgresql_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/mysql_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/mssql_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/mongodb_data_loader.py | Adds optional sort support during ingestion. |
| py-src/data_formulator/data_loader/kusto_data_loader.py | Adds optional sort clause support during ingestion. |
| py-src/data_formulator/data_loader/external_data_loader.py | Changes ingestion to CREATE OR REPLACE behavior. |
| py-src/data_formulator/data_loader/bigquery_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/data_loader/azure_blob_data_loader.py | Adds optional ORDER BY support during ingestion. |
| py-src/data_formulator/app.py | Registers new demo stream blueprint and initializes rate limiter. |
| py-src/data_formulator/agents/agent_utils.py | Refactors data summary formatting and removes old dedup helpers. |
| py-src/data_formulator/agents/agent_sql_data_transform.py | Updates prompt/examples and introduces shared SQL data summary generator. |
| py-src/data_formulator/agents/agent_sql_data_rec.py | Switches to new SQL data summary generator; extends schema for input_tables. |
| py-src/data_formulator/agents/agent_report_gen.py | Adds “live report” style and uses shared SQL data summary generator. |
| py-src/data_formulator/agents/agent_py_data_transform.py | Extends prompts to select input_tables and executes python transforms on selected tables only. |
| py-src/data_formulator/agents/agent_py_data_rec.py | Extends prompts to select input_tables; adds boxplot guidance; executes transforms on selected tables only. |
| py-src/data_formulator/agents/agent_interactive_explore.py | Improves prompt structure and uses shared SQL data summary generator. |
| py-src/data_formulator/agents/agent_exploration.py | Uses shared SQL data summary generator. |
| py-src/data_formulator/agent_routes.py | Removes query-completion endpoint and agent. |
| py-src/data_formulator/agents/agent_query_completion.py | Deletes query completion agent implementation. |
| package.json | Bumps lodash and pins Vega-Lite to 6.4.1. |
| README.md | Adds release notes entry for 0.6 streaming/live data. |
| .gitignore | Adds experiment_data/ and additional NUL patterns. |

Comment on lines 18 to 33
Input,
Alert,
Tooltip,
Link,
alpha,
useTheme,
} from '@mui/material';
import CloseIcon from '@mui/icons-material/Close';
import UploadFileIcon from '@mui/icons-material/UploadFile';
import { useDispatch, useSelector } from 'react-redux';
import { AppDispatch } from '../app/store';
import { DataFormulatorState, dfActions, dfSelectors, fetchFieldSemanticType } from '../app/dfSlice';
import { DictTable } from '../components/ComponentType';
import { createTableFromFromObjectArray, createTableFromText, loadTextDataWrapper, loadBinaryDataWrapper } from '../data/utils';
import { getUrls } from '../app/utils';

Copilot AI Jan 26, 2026

RefreshDataDialog has multiple unused imports/variables (e.g., Alert, whereas Tooltip is used; useDispatch/AppDispatch/dispatch; dfActions/dfSelectors/fetchFieldSemanticType; createTableFromFromObjectArray; loadBinaryDataWrapper; getUrls). This will fail builds when noUnusedLocals/noUnusedParameters are enabled. Remove the unused imports/variables, or wire them up if they are intended.

Copilot uses AI. Check for mistakes.
Comment on lines +460 to +465
<TextField
autoFocus
multiline
fullWidth
value={displayContent}
onChange={handlePasteContentChange}

Copilot AI Jan 26, 2026

Paste preview mode is currently lossy: the TextField is controlled by displayContent (which becomes a truncated preview), and onChange writes event.target.value back into pasteContent. If a user edits while in preview, pasteContent is overwritten with the truncated preview and data is lost. Consider controlling the TextField with pasteContent and making it readOnly while showing a separate preview, or disable editing until “Full” is selected.

Comment on lines +117 to +124
const [{ isDragging }, dragSource, dragPreview] = useDrag(() => ({
type: "concept-card",
item: field ? {
type: 'concept-card',
fieldID: field.id,
source: "conceptShelf"
} : undefined,
canDrag: !!field,

Copilot AI Jan 26, 2026

react-dnd useDrag is configured with item: undefined when no matching FieldItem is found. This is inconsistent with other drag sources in the codebase and can cause runtime/type issues inside react-dnd. Prefer always providing a stable item object and rely on canDrag to disable dragging (or conditionally skip calling useDrag and skip attaching refs when not draggable).

Comment on lines +738 to +744
// Convert to ISO date strings for Vega-Lite compatibility
if (typeof val === 'number') {
// Handle Year/Decade semantic types - these are year numbers, not timestamps
if (semanticType === 'Year' || semanticType === 'Decade') {
// Year values like 2018 should become "2018-01-01"
r[temporalKey] = `${Math.floor(val)}`;
} else if (isLikelyTimestamp(val)) {

Copilot AI Jan 26, 2026

Year/Decade handling sets a numeric year to a bare string (e.g. "2018"), but the inline comment says it should become an ISO date (e.g. "2018-01-01"). With Vega-Lite temporal encodings, a bare year string can parse inconsistently. Either convert to a full ISO date string for temporal fields or keep the encoding as ordinal/nominal for Year/Decade.

Comment on lines 399 to 414
const rowCount = table.rows.length;
const firstRows = JSON.stringify(table.rows.slice(0, 3));
const lastRows = JSON.stringify(table.rows.slice(-2));
const signature = `${rowCount}:${firstRows}:${lastRows}`;

const prevSignature = tableRowSignaturesRef.current.get(tableId);
if (prevSignature && prevSignature !== signature) {
hasChanges = true;
}
tableRowSignaturesRef.current.set(tableId, signature);
}
});

// If data changed, regenerate chart images for the report
if (hasChanges) {
console.log('[ReportView] Table data changed, refreshing chart images...');

Copilot AI Jan 26, 2026

This effect recomputes JSON.stringify(table.rows.slice(...)) for every affected table on every tables change, which is expensive for streaming updates. You already added table.contentHash support in the state—use that (or a lightweight hash/rowCount+updatedAt) as the signature instead of stringifying row slices. Also avoid leaving console.log in production UI code; gate behind a debug flag or remove.

try:
since_dt = datetime.fromisoformat(since_str.replace("Z", "+00:00")).replace(tzinfo=None)
since_timestamp = since_dt.timestamp() * 1000 # USGS uses milliseconds
except:

Copilot AI Jan 26, 2026

Except block directly handles BaseException.

Suggested change
- except:
+ except Exception:
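
To illustrate why the narrower clause matters, here is a minimal sketch (the function names are hypothetical and not taken from the PR). A bare `except:` also catches `SystemExit` and `KeyboardInterrupt`, which derive from `BaseException`, while `except Exception:` lets them propagate.

```python
def swallow_everything(action):
    """Run `action` under a bare except, which catches every BaseException."""
    try:
        action()
    except:  # noqa: E722 - the pattern flagged in the review
        return "swallowed"
    return "ok"


def swallow_errors_only(action):
    """Run `action` and handle only ordinary errors (Exception subclasses)."""
    try:
        action()
    except Exception:
        return "swallowed"
    return "ok"


def request_exit():
    # SystemExit derives from BaseException, not from Exception.
    raise SystemExit(1)


def fail_with_value_error():
    raise ValueError("bad input")
```

With these definitions, `swallow_everything(request_exit)` hides the exit request, whereas `swallow_errors_only(request_exit)` lets `SystemExit` reach the caller. Both functions still handle `ValueError`.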

"""Check if value is valid (not NaN/None)"""
try:
return val is not None and not (isinstance(val, float) and math.isnan(val))
except:

Copilot AI Jan 26, 2026

Except block directly handles BaseException.
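
No suggested change accompanies this comment. One possible rewrite, sketched under the assumption that the helper only needs to reject `None` and float NaN as its docstring states, drops the `try`/`except` altogether. The name `is_valid` is hypothetical, since the excerpt shows only the docstring and body.

```python
import math


def is_valid(val):
    """Check if value is valid (not NaN/None)."""
    if val is None:
        return False
    # math.isnan is only called on floats, so it cannot raise TypeError here.
    if isinstance(val, float) and math.isnan(val):
        return False
    return True
```

If the surrounding code does rely on catching an error here, `except Exception:` would be the minimal change instead.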

return date_utc.strftime("%Y-%m-%d %H:%M:%S")
else:
return str(date_utc)
except:

Copilot AI Jan 26, 2026

Except block directly handles BaseException.

for date, row in hist.iterrows():
try:
date_str = date.strftime("%Y-%m-%d") if hasattr(date, 'strftime') else str(date)
except:

Copilot AI Jan 26, 2026

Except block directly handles BaseException.

Suggested change
- except:
+ except Exception:

Comment on lines +229 to +230
except:
pass

Copilot AI Jan 26, 2026

'except' clause does nothing but pass and there is no explanatory comment.

Suggested change
- except:
-     pass
+ except (ValueError, TypeError) as e:
+     # If the 'since' parameter is malformed, ignore it and proceed without a time filter
+     logger.debug("Invalid 'since' parameter '%s': %s", since_str, e)
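
The suggestion can be tried in isolation with a small helper. This is a sketch only: `parse_since_ms` is a hypothetical name, and treating offset-less timestamps as UTC is an assumption rather than something the PR specifies. The sketch keeps the parsed value timezone-aware, because `datetime.timestamp()` interprets a naive datetime as local time.

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger(__name__)


def parse_since_ms(since_str):
    """Return `since_str` as epoch milliseconds, or None if missing or malformed."""
    if not since_str:
        return None
    try:
        since_dt = datetime.fromisoformat(since_str.replace("Z", "+00:00"))
    except (ValueError, TypeError) as e:
        # A malformed value is ignored so the caller can proceed without a time filter.
        logger.debug("Invalid 'since' parameter '%s': %s", since_str, e)
        return None
    if since_dt.tzinfo is None:
        # Assumption: timestamps without an offset are meant as UTC.
        since_dt = since_dt.replace(tzinfo=timezone.utc)
    return since_dt.timestamp() * 1000  # milliseconds, as in the quoted code
```

A caller would apply the time filter only when the helper returns a number, which keeps the "ignore malformed input" behavior explicit and logged instead of silent.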

Copilot AI left a comment

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

@Chenglong-MS changed the title from "[WIP] Dev: streaming data, agentic data loader" to "Dev: live data" on Jan 27, 2026
@Chenglong-MS requested a review from Mestway on January 27, 2026 01:00

@Mestway left a comment

looking good!

@Chenglong-MS merged commit eebe31a into main on Jan 27, 2026
7 checks passed