The current implementation saves dataframes to disk as parquet files and then uses them to generate statistics via duckdb:
https://github.com/getml/getml-io/pull/49/files#diff-2e534974074df91c1edcab0c279d52e8228c7ec26346a746a97e90bcb582abbcR127
If someone does not want to explicitly save the dataframes, we need to stream them from getml into duckdb instead:
https://github.com/getml/getml-io/pull/19/files#diff-7a912f9ee2a1c8c724e374aa668d7cd394b96fa18db5b2fd912be63b092cf53eR60
- add options so that a user can decide whether, and which, dataframes to store
- add a method to generate statistics from the getml-arrow-stream instead of a parquet file for dataframes that are not stored
- adjust the dataframe-information-path to reflect that dataframes might not have been saved
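One possible shape for the options and the path handling, as a rough sketch only. `DataFrameStorageOptions`, `should_store` and `parquet_path` are hypothetical names, not existing getml-io API; the idea is that the path becomes optional and is `None` for dataframes that were not stored:

```python
from dataclasses import dataclass
from pathlib import Path
from typing import FrozenSet, Optional


@dataclass(frozen=True)
class DataFrameStorageOptions:
    """Hypothetical options controlling whether and which dataframes are stored."""

    store: bool = True
    # None means "store all dataframes"; otherwise only the named ones.
    only: Optional[FrozenSet[str]] = None

    def should_store(self, name: str) -> bool:
        if not self.store:
            return False
        return self.only is None or name in self.only


def parquet_path(
    name: str, base_dir: Path, options: DataFrameStorageOptions
) -> Optional[Path]:
    """Return the parquet path for a dataframe, or None if it is not stored."""
    if options.should_store(name):
        return base_dir / f"{name}.parquet"
    return None


options = DataFrameStorageOptions(only=frozenset({"population"}))
stored = parquet_path("population", Path("dataframes"), options)
skipped = parquet_path("peripheral", Path("dataframes"), options)
```

With this layout, a `None` path would signal that statistics have to be generated from the stream rather than from a file.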