Feat/improving object download by wordsworthc · Pull Request #1 · wordsworthc/evo-python-sdk

wordsworthc · 2025-10-07T22:44:17Z

Description

Big changes are coming to evo-objects!

The idea behind this PR is to lay some foundations for more expressive interactions with Geoscience Objects, in this case focusing specifically on consuming Geoscience Object data. The bullet-point changes are:

Add ObjectReference type for structured URL references
Add DownloadedObject.from_reference() constructor
Add DownloadedObject.search() method for JMESPath queries
Add DownloadedObject.download_table(), DownloadedObject.download_dataframe(), and DownloadedObject.download_array() methods for downloading parquet data
Deprecate KnownTableFormat.load_table() in favor of the ParquetLoader utility class

ObjectReference Type

A new ObjectReference type has been introduced as a (de)structured URL reference to geoscience objects. This type is implemented as a subclass of str, so it is fully backward compatible: it can be used anywhere an object URL string is expected without breaking existing code. The ObjectMetadata.url property, which previously returned a plain string, now returns an ObjectReference. Additionally, a static method constructor, ObjectReference.new(), has been added to create an ObjectReference from its component parts, simplifying object URL construction.

Example:

obj_ref = ObjectReference.new(
    environment=environment,  # an existing Environment instance
    object_path="path/to/object.json",  # or object_id=UUID("<object-id>")
    version_id="<version-id>",  # optional
)
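To illustrate why subclassing str keeps existing code working, here is a minimal sketch of the general pattern. It is not the ObjectReference implementation: the class name UrlReference, its properties, and the example.invalid URL are hypothetical and exist only for this illustration.

```python
from urllib.parse import urlparse


class UrlReference(str):
    """Hypothetical str subclass that exposes parsed parts of a URL."""

    @property
    def host(self) -> str:
        # The instance is itself the URL string, so it can be parsed directly.
        return urlparse(self).hostname or ""

    @property
    def path_segments(self) -> list[str]:
        # Non-empty path components, in order.
        return [segment for segment in urlparse(self).path.split("/") if segment]

    @classmethod
    def new(cls, host: str, *segments: str) -> "UrlReference":
        # Build the URL from component parts (hypothetical layout).
        return cls(f"https://{host}/" + "/".join(segments))


ref = UrlReference.new("example.invalid", "workspaces", "w1", "objects", "o1")

# Still a plain string wherever a str is expected...
assert isinstance(ref, str)
assert ref.startswith("https://")

# ...but with structured accessors on top.
print(ref.host)
print(ref.path_segments)
```

Because the instance is a str, it can be passed to any function that already accepts a URL string, which is the compatibility property described above.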

DownloadedObject.from_reference() Constructor

A new static method constructor DownloadedObject.from_reference() has been added to enable simpler interactions when the Object URL is already known. This streamlines the process of working with geoscience objects by reducing the steps needed to download and interact with object data. The existing ObjectAPIClient has been refactored internally to use this new implementation for downloading geoscience objects, ensuring consistent behavior across the SDK while maintaining full backward compatibility with existing code.

Example:

downloaded_object = await DownloadedObject.from_reference(
    connector=connector,  # an existing APIConnector instance
    object_reference="<geoscience-object-url>",  # or an ObjectReference instance
    cache=cache,  # an existing ICache instance, optional
)

DownloadedObject.search() Method

The new DownloadedObject.search() method provides powerful querying capabilities for Geoscience Object JSON data using JMESPath expressions. This allows developers to efficiently extract specific data from complex object structures without manually traversing the JSON hierarchy, making data access more intuitive and less error-prone.

Example:

# Use a JMESPath expression to query the object data
result = downloaded_object.search("locations.coordinates")

# JMESPath expressions can also be used to filter and transform data
filtered_result = downloaded_object.search("locations.attributes[?attribute_type=='scalar']")
transformed_result = downloaded_object.search("locations.attributes[].{key: key || name, name: name}")

Data Download Methods

Three new methods have been added to DownloadedObject for downloading parquet data in different formats: download_table() returns a pyarrow.Table, download_dataframe() returns a pandas.DataFrame, and download_array() returns a numpy.ndarray. Each method is available only when its optional dependencies are installed; the utils extra provides all required packages. Like the existing DataClient, these methods accept a dictionary resembling the TableInfo format.
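As a rough illustration of what a dependency check can look like, the sketch below tests whether optional packages are importable before they are used. This is an assumed pattern, not the code in this PR; require_packages() is a hypothetical helper name.

```python
import importlib.util


def require_packages(*names: str) -> None:
    """Raise if any of the named top-level packages cannot be found.

    Hypothetical helper: find_spec() locates a top-level package without
    importing it, so the check is cheap and has no import side effects.
    """
    missing = [name for name in names if importlib.util.find_spec(name) is None]
    if missing:
        raise RuntimeError(f"Missing optional packages: {', '.join(missing)}")


# A standard-library module is always available, so this passes silently.
require_packages("json")

# A package that is not installed produces a clear error instead of a
# late ImportError deep inside a download call.
try:
    require_packages("json", "definitely_not_installed_pkg_xyz")
except RuntimeError as error:
    print(error)
```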

These methods offer several improvements over the existing DataClient:

JMESPath Support: Instead of requiring dictionary input, you can provide a JMESPath expression that resolves to a TableInfo-like JSON object
Self-contained Data Access: The data ID referenced by TableInfo must be available within the current object's JSON data
Simplified Interface: No additional identifiers are needed since DownloadedObject already contains the necessary details and API connector

This approach represents the preferred method for accessing parquet data, and the existing DataClient implementation will be gradually phased out through deprecation warnings before eventual removal.

Example:

# Using a dictionary
table_info = {
    "data": "<data-id>",
    "length": 1234,
    "width": 3,
    "data_type": "float64",
}
table = await downloaded_object.download_table(table_info)
df = await downloaded_object.download_dataframe(table_info)
array = await downloaded_object.download_array(table_info)

# Using a JMESPath expression
table = await downloaded_object.download_table("locations.coordinates")
df = await downloaded_object.download_dataframe("locations.coordinates")
array = await downloaded_object.download_array("locations.coordinates")

A new ParquetDownloader utility class has been introduced for downloading parquet data from evo.common.io.Download instances, with a ParquetLoader for schema validation and loading data into required formats. These lower-level utilities serve as the foundation for the DownloadedObject data download methods but may also be useful in other contexts where direct parquet data loading is needed, providing flexibility for advanced use cases.

Deprecation of KnownTableFormat.load_table()

The existing KnownTableFormat.load_table() method has been marked as deprecated in favor of the new ParquetLoader implementation. This change encourages developers to transition to the more robust and flexible ParquetLoader for loading parquet data, aligning with the overall improvements in data handling within the SDK. The deprecation is communicated through warnings, allowing developers time to adapt their code before the method is potentially removed in future releases.
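For readers unfamiliar with how a deprecation is usually signalled in Python, here is a minimal sketch using the standard warnings module. The function names load_table_legacy() and load_table_new() are hypothetical stand-ins, not the SDK's API, and the PR may implement its warning differently.

```python
import warnings


def load_table_new(values):
    """Hypothetical replacement function."""
    return list(values)


def load_table_legacy(values):
    """Hypothetical deprecated function that forwards to the replacement."""
    warnings.warn(
        "load_table_legacy() is deprecated; use load_table_new() instead",
        DeprecationWarning,
        stacklevel=2,  # attribute the warning to the caller, not this function
    )
    return load_table_new(values)


# Capture warnings so the deprecation can be observed explicitly.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = load_table_legacy([1, 2])

print(result)
print(caught[0].category.__name__)
```

The old entry point keeps working and returns the same result, while callers who enable deprecation warnings are told which replacement to move to.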

Other non-specific changes

Removed the DataType, Schema, Table, and DataFrame protocols from evo.objects.utils in favour of the actual types from pyarrow and pandas
Refactored parsing of API responses for improved composition and re-use
Always use HTTPS URLs in test data (http://unittest.localhost/ -> https://unittest.localhost/)

Checklist

I have read the contributing guide and the code of conduct


wordsworthc · 2025-10-08T02:12:58Z

Just to clarify, the file count in this PR is largely due to using find & replace to change http://unittest.localhost/ to https://unittest.localhost/. Most of the .json files touched in this PR are for that change alone and can be safely skimmed over. The only exception is packages/evo-objects/tests/data/get_object_detailed.json, which was added for TestDownloadedObject.

In fact, all of the changes outside of evo-objects are for this reason alone.

BenLewis-Seequent

Looks good to me.

wordsworthc · 2025-10-08T02:38:26Z

Moved to SeequentEvo#113

wordsworthc added 30 commits October 3, 2025 12:53

Update evo-objects dependencies

85b2818

Refactor parsing object API responses

ab90ae9

Fix workspace_id mismatch in list_objects_for_instance()

eff1ba6

move ObjectAPIClient into client sub-package

5716ff0

Add license header

7f71cc7

Move DownloadedObject into new file

b82c367

add numpy and pyarrow-stubs to evo-objects utils dependencies

47de124

Refactor to use ParquetLoader for downloading parquet data

e75913b

WIP: Update unit tests

1701667

Update data client unit tests

6897d86

Remove outdated data client tests

a021eb3

Fix type annotation

d4796bb

Add object reference type

82a1145

Use HTTPS test URLs

8b7ef61

Refactor downloading a geoscience object

04834fe

Fix formatting of hub URL and object path in ObjectReference

81d773a

Merge branch 'feat/jmespath-support' into feat/improving-object-download

a265195

Move parse.py to client submodule

4326454

Add optional cache to ObjectAPIClient

8092e4e

Add optional JMESPath support to DownloadedObject

faecdd6

Move ParquetLoader to a separate submodule

a904cfe

Add optional support to DownloadedObject for downloading tables as py…

0d7d890

…arrow tables, pandas dataframes, or numpy arrays

Export ObjectReference from evo.objects

2d30f68

Add unit tests for DownloadedObject

2533605

Bump evo-objects to 0.3.0

e124d40

Run uv lock

c066364

Update evo-objects quickstart

4cd763e

JMESPath support is not optional in evo-objects

878b6b4

get_parquet_loader() doesn't need to be part of the public API

a56f875

Rename evo.objects.loader -> evo.objects.parquet

4112b6b

wordsworthc added 3 commits October 8, 2025 11:39

Refactor ParquetLoader into ParquetDownloader + ParquetLoader

8b07860

Fix pydantic error in python 3.11

4836794

Fix NoImport test util for macos

a43c437

Merge branch 'main' into feat/improving-object-download

d1ffa20

wordsworthc changed the base branch from feat/jmespath-support to main October 8, 2025 02:34

BenLewis-Seequent approved these changes Oct 8, 2025


wordsworthc closed this Oct 8, 2025

wordsworthc wants to merge 34 commits into main from feat/improving-object-download
