[Fixes #13936] Support for XLSX File Uploads in GeoNode #13937

Open
Gpetrak wants to merge 23 commits into master from ISSUE_13936

Conversation

@Gpetrak
Member

@Gpetrak Gpetrak commented Feb 5, 2026

This PR was created according to this issue: #13936

Checklist

Reviewing is a process done by project maintainers, mostly on a volunteer basis. We try to keep the overhead as small as possible and appreciate it if you help us do so by completing the following items. Feel free to ask in a comment if you have trouble with any of them.

For all pull requests:

  • Confirm you have read the contribution guidelines
  • You have sent a Contribution Licence Agreement (CLA) as necessary (not required for small changes, e.g., fixing typos in the documentation)
  • Make sure the first PR targets the master branch, eventual backports will be managed later. This can be ignored if the PR is fixing an issue that only happens in a specific branch, but not in newer ones.

The following are required only for core and extension modules (they are welcomed, but not required, for contrib modules):

  • There is a ticket in https://github.com/GeoNode/geonode/issues describing the issue/improvement/feature (a notable exception is changes not visible to end-users)
  • The issue connected to the PR must have Labels and Milestone assigned
  • PR for bug fixes and small new features are presented as a single commit
  • PR title must be in the form "[Fixes #<issue_number>] Title of the PR"
  • New unit tests have been added covering the changes, unless there is an explanation on why the tests are not necessary/implemented

Submitting the PR does not require you to check all items, but by the time it gets merged, they should be either satisfied or inapplicable.

@Gpetrak Gpetrak added this to the 5.1.0 milestone Feb 5, 2026
@Gpetrak Gpetrak added the feature (A new feature to be added to the codebase) and enhancement labels Feb 5, 2026
@cla-bot cla-bot bot added the cla-signed (CLA Bot: community license agreement signed) label Feb 5, 2026
@gemini-code-assist
Contributor

Summary of Changes

Hello @Gpetrak, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances GeoNode's data upload capabilities by introducing native support for XLSX and XLS file formats. It provides a robust and secure mechanism for ingesting tabular geographic data from Excel spreadsheets, ensuring data integrity through strict validation of headers and coordinates, and integrating seamlessly with the existing upload orchestration. The feature is also made configurable via an environment variable, offering flexibility for deployment.

Highlights

  • XLSX/XLS File Upload Support: Introduced a new XLSXFileHandler to enable the upload of both .xlsx (OpenXML) and .xls (Binary/Legacy) Excel files, expanding GeoNode's data ingestion capabilities.
  • Internal Conversion to CSV: The handler internally converts Excel files to CSV format using the python-calamine library, leveraging existing CSV processing logic while ensuring robust parsing of Excel data.
  • Strict Data Validation: Implemented comprehensive validation checks including sheet validation (only the first sheet is processed), header validation (requiring unique, non-empty Latitude and Longitude columns), and high-speed coordinate data sense-checking to prevent invalid or malformed geographic data.
  • Configurable Feature Toggle: Added an XLSX_UPLOAD_ENABLED environment variable across all .env files, allowing administrators to easily enable or disable the XLSX/XLS upload functionality.
  • Enhanced Security and Testing: Included a dedicated security test to protect against XML Entity Expansion (Billion Laughs) attacks, alongside extensive unit tests covering various scenarios for file handling, validation, and error conditions.
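
To make the header and coordinate checks listed above more concrete, here is a minimal Python sketch. It is not the PR's implementation: the function names validate_headers and check_coordinates are hypothetical, the case-insensitive match on "latitude" and "longitude" is an assumption, and the 90/180 degree bounds are an assumed reading of the "magnitude" check.

```python
import math


def validate_headers(headers):
    """Return (lat_index, lon_index) or raise ValueError.

    Hypothetical helper: assumes headers are matched case-insensitively
    against the names "latitude" and "longitude".
    """
    names = ["" if h is None else str(h).strip().lower() for h in headers]
    if any(name == "" for name in names):
        raise ValueError("Empty column header found")
    if len(set(names)) != len(names):
        raise ValueError("Duplicate column headers found")
    if "latitude" not in names or "longitude" not in names:
        raise ValueError("Latitude and Longitude columns are required")
    return names.index("latitude"), names.index("longitude")


def check_coordinates(rows, lat_idx, lon_idx):
    """Raise ValueError on the first row whose coordinates are unusable."""
    # Data rows start at spreadsheet line 2 (line 1 is the header).
    for line_no, row in enumerate(rows, start=2):
        try:
            lat, lon = float(row[lat_idx]), float(row[lon_idx])
        except (TypeError, ValueError, IndexError):
            raise ValueError(f"Row {line_no}: coordinates are not numeric")
        if not (math.isfinite(lat) and math.isfinite(lon)):
            raise ValueError(f"Row {line_no}: coordinates are not finite")
        if abs(lat) > 90 or abs(lon) > 180:
            raise ValueError(f"Row {line_no}: coordinates out of range")
```

A caller would run validate_headers on the first row of the first sheet and then pass the remaining rows to check_coordinates before converting anything to CSV.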

Changelog
  • .env.sample
    • Added XLSX_UPLOAD_ENABLED=True to enable or disable XLSX/XLS upload functionality.
  • .env_dev
    • Added XLSX_UPLOAD_ENABLED=True for development environments.
  • .env_local
    • Added XLSX_UPLOAD_ENABLED=True for local development environments.
  • .env_test
    • Added XLSX_UPLOAD_ENABLED=True for test environments.
  • geonode/upload/handlers/xlsx/handler.py
    • New file: Implements XLSXFileHandler for processing Excel files.
    • Inherits from CSVFileHandler and converts XLSX/XLS to CSV internally.
    • Includes methods for can_handle, is_valid, create_ogr2ogr_command, create_dynamic_model_fields, and pre_processing.
    • Adds strict validation for sheet presence, unique and non-empty headers, and valid Latitude/Longitude columns.
    • Incorporates _data_sense_check for robust coordinate validation (numeric, finite, magnitude).
  • geonode/upload/handlers/xlsx/tests.py
    • New file: Contains unit tests for the XLSXFileHandler.
    • Tests can_handle for both .xlsx and .xls extensions.
    • Includes tests for successful pre_processing and failure cases (missing lat/lon, wrong data types).
    • Features a security test (test_security_billion_laughs_protection) to prevent XML entity expansion attacks.
  • geonode/upload/settings.py
    • Registered geonode.upload.handlers.xlsx.handler.XLSXFileHandler in RESOURCE_HANDLERS to activate the new handler.
  • geonode/upload/tests/fixture/wrong_data.csv
    • New file: Added a CSV fixture for testing error conditions related to data validation.
  • pyproject.toml
    • Added python-calamine==0.6.1 as a new dependency for parsing Excel files.
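
As a rough illustration of how an environment toggle such as XLSX_UPLOAD_ENABLED could gate the handler registration, consider the sketch below. The changelog does not show how GeoNode actually reads the variable or whether the registration is conditional, so env_flag and build_handlers are hypothetical helpers; only the variable name and the dotted handler path come from the summary above.

```python
import os

# Dotted path taken from the changelog entry for geonode/upload/settings.py.
XLSX_HANDLER = "geonode.upload.handlers.xlsx.handler.XLSXFileHandler"


def env_flag(name, default=False, environ=None):
    """Parse a boolean-like environment variable (hypothetical helper)."""
    environ = os.environ if environ is None else environ
    raw = environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


def build_handlers(base_handlers, environ=None):
    """Return the handler list, adding the XLSX handler only when enabled."""
    handlers = list(base_handlers)
    enabled = env_flag("XLSX_UPLOAD_ENABLED", environ=environ)
    if enabled and XLSX_HANDLER not in handlers:
        handlers.append(XLSX_HANDLER)
    return handlers
```

Passing the environment as a parameter keeps the helper easy to test; in a settings module the default of os.environ would be used.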

Activity
  • The pull request was created by Gpetrak to address issue #13936 (Support for XLSX File Uploads in GeoNode).
  • Initial code changes include the addition of a new file handler for XLSX/XLS, corresponding unit tests, and configuration updates.

Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page. Here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

| Feature | Command | Description |
| --- | --- | --- |
| Code Review | /gemini review | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | /gemini summary | Provides a summary of the current pull request in its current state. |
| Comment | @gemini-code-assist | Responds in comments when explicitly tagged, both in pull request comments and review comments. |
| Help | /gemini help | Displays a list of available commands. |

Customization

To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for GitHub and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.


        except Exception as e:
            logger.exception("XLSX Pre-processing failed")
            raise InvalidInputFileException(detail=f"Failed to securely parse Excel: {str(e)}")

Check warning

Code scanning / CodeQL

Information exposure through an exception (Medium)

Stack trace information flows to this location and may be exposed to an external user.

Copilot Autofix

AI 4 days ago

In general, the fix is to avoid returning or propagating raw exception messages or stack traces to the client. Instead, log the full details server-side and return a generic, user-safe error message. For developers and operators, the log entry (with stack trace) provides enough information for debugging without exposing internals to attackers.

For this specific code, we should keep the existing logger.exception("XLSX Pre-processing failed") call, which already records the stack trace. Then, modify the raised InvalidInputFileException so that its detail is a static, generic message without interpolating str(e) (or any other exception-derived text). Functionality is preserved: clients still get a clear signal that the Excel file could not be processed; the only change is that they no longer see the internal error message. All changes occur in geonode/upload/handlers/xlsx/handler.py within the shown snippet, by replacing the single line that currently builds the tainted message.

Concretely:

  • In pre_processing, inside the except Exception as e: block, change:
raise InvalidInputFileException(detail=f"Failed to securely parse Excel: {str(e)}")

to something like:

raise InvalidInputFileException(
    detail="Failed to securely parse the Excel file. Please verify the file format and contents."
)

No new imports, methods, or definitions are required.
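
The pattern described above (log the details server-side, return a generic message to the client) can be sketched in isolation as follows. This is a self-contained illustration, not GeoNode code: the InvalidInputFileException class here is a stand-in assumed to accept a detail argument, and pre_process and convert are hypothetical names. The use of "from None" is an extra step beyond the suggested patch.

```python
import logging

logger = logging.getLogger(__name__)


class InvalidInputFileException(Exception):
    """Stand-in for GeoNode's exception of the same name (assumed API)."""

    def __init__(self, detail):
        super().__init__(detail)
        self.detail = detail


GENERIC_DETAIL = (
    "Failed to securely parse the Excel file. "
    "Please verify the file format and contents."
)


def pre_process(convert, path):
    """Run a conversion callable, hiding internal error text from callers."""
    try:
        return convert(path)
    except Exception:
        # The full stack trace goes to the server log only.
        logger.exception("XLSX Pre-processing failed")
        # "from None" suppresses the chained exception when this error is
        # rendered; the client-facing detail is a static string.
        raise InvalidInputFileException(detail=GENERIC_DETAIL) from None
```

The key property is that nothing derived from the original exception reaches the detail string, so paths, library names and parser messages stay in the logs.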

Suggested changeset 1
geonode/upload/handlers/xlsx/handler.py

Autofix patch

Run the following command in your local git repository to apply this patch
cat << 'EOF' | git apply
diff --git a/geonode/upload/handlers/xlsx/handler.py b/geonode/upload/handlers/xlsx/handler.py
--- a/geonode/upload/handlers/xlsx/handler.py
+++ b/geonode/upload/handlers/xlsx/handler.py
@@ -215,7 +215,9 @@
 
         except Exception as e:
             logger.exception("XLSX Pre-processing failed")
-            raise InvalidInputFileException(detail=f"Failed to securely parse Excel: {str(e)}")
+            raise InvalidInputFileException(
+                detail="Failed to securely parse the Excel file. Please verify that the file is a valid XLSX document with the expected structure."
+            )
 
         # update the file path in the payload
         _data["files"]["base_file"] = output_file
EOF
Copilot is powered by AI and may make mistakes. Always verify output.
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces support for uploading XLSX and XLS files by converting them to CSV during a pre-processing step and then utilizing the existing CSV handler pipeline. While the implementation includes some security considerations, a critical command injection vulnerability was identified in the ogr2ogr command construction and execution flow. This vulnerability could allow an authenticated attacker to achieve remote code execution by uploading a specially crafted XLSX file, and remediation is required to ensure all user-supplied data is properly sanitized before being used in shell commands. Furthermore, a critical issue was found in the is_valid method that incorrectly attempts to validate an XLSX file using a CSV driver, which would block all uploads of this type. There are also several medium-severity recommendations to improve error handling by using more specific exception types instead of generic ones, which will enhance maintainability and provide clearer feedback to users.
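
One common way to reduce the command injection risk described in this review is to build the command as an argument list and validate any user-derived identifier against an allowlist, so that no shell ever parses the values. The sketch below is only an illustration under assumptions: build_ogr2ogr_args is a hypothetical helper, GeoNode's actual create_ogr2ogr_command is not shown in this conversation, and the open option values are placeholders. The flags themselves (-f, -nln, -oo) are standard ogr2ogr options.

```python
import re
import shlex

# Conservative allowlist for a layer or table name (an assumption, not a
# GeoNode rule): letter or underscore first, then letters, digits, underscores.
_SAFE_LAYER_NAME = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")


def build_ogr2ogr_args(csv_path, layer_name, pg_connection):
    """Return an argv list for ogr2ogr; no shell is involved."""
    if not _SAFE_LAYER_NAME.fullmatch(layer_name):
        raise ValueError(f"Unsafe layer name: {layer_name!r}")
    return [
        "ogr2ogr",
        "-f", "PostgreSQL",
        pg_connection,          # e.g. "PG:dbname=..." (placeholder)
        csv_path,               # passed as a single argument, never parsed
        "-nln", layer_name,
        "-oo", "X_POSSIBLE_NAMES=longitude",
        "-oo", "Y_POSSIBLE_NAMES=latitude",
    ]


def loggable(args):
    """Quote the argv list for display in logs only."""
    return shlex.join(args)
```

The list would typically be passed to subprocess.run(args, check=True) without shell=True, so characters such as ";", "$" or "%" in a file name stay inside a single argument.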

@Gpetrak Gpetrak linked an issue Feb 5, 2026 that may be closed by this pull request
8 tasks
@Gpetrak Gpetrak marked this pull request as ready for review February 5, 2026 11:20
@codecov

codecov bot commented Feb 5, 2026

Codecov Report

❌ Patch coverage is 80.08850% with 45 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.07%. Comparing base (a3959be) to head (7f6c030).
⚠️ Report is 27 commits behind head on master.

Additional details and impacted files
@@            Coverage Diff             @@
##           master   #13937      +/-   ##
==========================================
- Coverage   74.24%   74.07%   -0.18%     
==========================================
  Files         947      950       +3     
  Lines       56620    56826     +206     
  Branches     7675     7719      +44     
==========================================
+ Hits        42038    42093      +55     
- Misses      12892    13044     +152     
+ Partials     1690     1689       -1     

@mattiagiupponi mattiagiupponi requested review from sijandh35 and removed request for giohappy February 11, 2026 11:15
Contributor

@sijandh35 sijandh35 left a comment

Hi @Gpetrak, when I manually renamed the valid_excel.xlsx file (provided in tests/fixture in this PR) to valid_excel_%$^_rename_test.xlsx (adding special characters to the file name) and tried to upload it, the upload failed, showing:

Image

but when converted to CSV with the same file name, the upload succeeds.
Otherwise, it looks good to me.

@giohappy
Contributor

giohappy commented Feb 17, 2026

but when converted to CSV with the same file name, the upload succeeds.

@Gpetrak is the name validation and sanitization different from CSV?

@Gpetrak
Member Author

Gpetrak commented Feb 17, 2026

@sijandh35 re-pushed

but when converted to CSV with the same file name, the upload succeeds.

@Gpetrak is the name validation and sanitization different from CSV?

@giohappy The XLSX handler is a subclass of the CSV handler, but I simplified some methods since we don't need the WKT geometry-related processes. To do that I overrode some methods, such as create_ogr2ogr_command, which in our case is the one that caused the bug. I just re-pushed a fix.

@Gpetrak
Member Author

Gpetrak commented Feb 20, 2026

As a note, before merging this PR we are waiting for fixes to two issues in the CSV handler that also affect the XLSX handler.

Labels

cla-signed CLA Bot: community license agreement signed enhancement feature A new feature to be added to the codebase

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Support for XLSX File Uploads in GeoNode

4 participants