Closed
…arrow tables, pandas dataframes, or numpy arrays
Just to clarify, the file count in this PR is largely due to using find & replace to change In fact, all of the changes outside of |
Moved to SeequentEvo#113 |
Description
Big changes are coming to evo-objects!
The idea behind this PR is to lay some foundations for more expressive interactions with Geoscience Objects, in this case focusing specifically on consuming Geoscience Object data. The main changes are:
- ObjectReference type for structured URL references
- DownloadedObject.from_reference() constructor
- DownloadedObject.search() method for JMESPath queries
- DownloadedObject.download_table(), DownloadedObject.download_dataframe(), and DownloadedObject.download_array() methods for downloading parquet data
- Deprecation of KnownTableFormat.load_table() in favor of the ParquetLoader utility class
ObjectReference Type
A new ObjectReference type has been introduced as a (de)structured URL reference to geoscience objects. This type is implemented as a subclass of str, ensuring full backward compatibility - it can be used anywhere an object URL string is expected without breaking existing code. The ObjectReference is now provided via the ObjectMetadata.url property (which previously returned a plain string), maintaining compatibility while adding enhanced functionality. Additionally, a static method constructor has been added to make it easy to create an ObjectReference from component parts, simplifying object URL construction.
Example:
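The str-subclass idea can be sketched in isolation. The block below is a minimal illustration, not the SDK's ObjectReference: the class name UrlReference, the from_parts constructor, the property names, the host, and the /orgs/.../workspaces/.../objects/... path layout are all assumptions made for this example.

```python
from __future__ import annotations

from urllib.parse import urlparse


class UrlReference(str):
    """Hypothetical str subclass exposing the parts of an object URL.

    The path layout parsed here is an assumption for illustration only.
    """

    @property
    def hub_url(self) -> str:
        # Scheme and host, e.g. "https://hub.example.invalid".
        parsed = urlparse(self)
        return f"{parsed.scheme}://{parsed.netloc}"

    def _value_after(self, key: str) -> str:
        # Return the path segment that immediately follows `key`.
        segments = [s for s in urlparse(self).path.split("/") if s]
        return segments[segments.index(key) + 1]

    @property
    def org_id(self) -> str:
        return self._value_after("orgs")

    @property
    def workspace_id(self) -> str:
        return self._value_after("workspaces")

    @property
    def object_id(self) -> str:
        return self._value_after("objects")

    @staticmethod
    def from_parts(hub_url: str, org_id: str, workspace_id: str, object_id: str) -> UrlReference:
        # Assemble the URL from its components (hypothetical layout).
        base = hub_url.rstrip("/")
        return UrlReference(f"{base}/orgs/{org_id}/workspaces/{workspace_id}/objects/{object_id}")


ref = UrlReference.from_parts("https://hub.example.invalid", "org-1", "ws-2", "obj-3")
```

Because the instance is still a str, it compares equal to the plain URL and can be passed to any code that expects a string, which is the backward-compatibility property described above.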
DownloadedObject.from_reference() Constructor
A new static method constructor DownloadedObject.from_reference() has been added to enable simpler interactions when the Object URL is already known. This streamlines the process of working with geoscience objects by reducing the steps needed to download and interact with object data. The existing ObjectAPIClient has been refactored internally to use this new implementation for downloading geoscience objects, ensuring consistent behavior across the SDK while maintaining full backward compatibility with existing code.
Example:
DownloadedObject.search() Method
The new DownloadedObject.search() method provides powerful querying capabilities for Geoscience Object JSON data using JMESPath expressions. This allows developers to efficiently extract specific data from complex object structures without manually traversing the JSON hierarchy, making data access more intuitive and less error-prone.
Example:
Data Download Methods
Three new methods have been added to DownloadedObject for downloading parquet data in different formats: download_table() returns a pyarrow.Table, download_dataframe() returns a pandas.DataFrame, and download_array() returns a numpy.ndarray. These methods are optionally enabled through dependency checks, with the utils extra dependency providing all required packages. Similar to the existing DataClient, these methods accept a dictionary resembling TableInfo format.
These methods offer several improvements over the existing DataClient:
- TableInfo-like JSON object
- TableInfo must be available within the current object's JSON data
- DownloadedObject already contains the necessary details and API connector
This approach represents the preferred method for accessing parquet data, and the existing DataClient implementation will be gradually phased out through deprecation warnings before eventual removal.
Example:
A new ParquetDownloader utility class has been introduced for downloading parquet data from evo.common.io.Download instances, with a ParquetLoader for schema validation and loading data into required formats. These lower-level utilities serve as the foundation for the DownloadedObject data download methods but may also be useful in other contexts where direct parquet data loading is needed, providing flexibility for advanced use cases.
Deprecation of KnownTableFormat.load_table()
The existing KnownTableFormat.load_table() method has been marked as deprecated in favor of the new ParquetLoader implementation. This change encourages developers to transition to the more robust and flexible ParquetLoader for loading parquet data, aligning with the overall improvements in data handling within the SDK. The deprecation is communicated through warnings, allowing developers time to adapt their code before the method is potentially removed in future releases.
Other non-specific changes
- DataType, Schema, Table, and DataFrame protocols from evo.objects.utils in favour of the actual types from pyarrow and pandas.
- (http://unittest.localhost/ -> https://unittest.localhost/)
Checklist