Support for reading from unseekable streams #140
Description
This has been in the back of my mind since the project started, but I just realized there was no issue to track the task.
It would be nice (though not currently a priority) to be able to read ASDF files directly from a stream without seek support (e.g. reading from stdin, a network, etc.).
Much of the I/O code in stream.c is already designed with this functionality in mind, but it has never been tested, and there are some areas of the parser that currently still assume that seek support is available.
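As a starting point for testing this, here is a minimal sketch of how a stream could be classified as seekable or not on POSIX systems. The helper names `fd_is_seekable` and `file_is_seekable` are hypothetical and not part of the existing code in stream.c. The check relies on `lseek()` failing (with `ESPIPE`) on pipes, FIFOs, and sockets; some character devices may still report success without supporting meaningful positioning, so treat the result as a hint.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an existing libasdf function).
 * Returns true if the descriptor accepts a no-op seek. On pipes, FIFOs
 * and sockets lseek() returns -1 and sets errno to ESPIPE. */
static bool fd_is_seekable(int fd) {
    assert(fd >= 0);
    return lseek(fd, 0, SEEK_CUR) != (off_t)-1;
}

/* Same check for a stdio stream, via its underlying descriptor. */
static bool file_is_seekable(FILE *fp) {
    assert(fp != NULL);
    return fd_is_seekable(fileno(fp));
}
```

A parser could run a check like this once when the stream is opened and then select a forward-only code path instead of discovering the limitation on the first failed seek.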
Of course, there are various trade-offs and decisions that need to be made when implementing stream support. Obviously the YAML tree is already buffered into memory and can be read, but the question is what to do with binary blocks. There are many possibilities here depending on the use-case and this can probably be specified in the parser configuration depending on the application requirements. For example:
- Are the binary blocks required at all? Do we want to retain just their metadata (e.g. the block headers), the full data, or nothing at all?
- If we want to be able to access block data later how is it buffered? In memory? To a temp file?
- This can also depend a lot on how many blocks are in the file: multiple blocks or just one (perhaps even a single block that was written in streaming mode).
- Does the user need access to every block (if there is more than one) or can they specify only specific blocks that should be buffered?
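To make the options above more concrete, here is a rough sketch of what such a configuration might look like, together with a forward-only routine for consuming a block. All names here (`block_policy_t`, `stream_block_config_t`, `block_selected`, `consume_block`) are hypothetical and do not exist in the current API; the block size is assumed to be known from the block header.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical policies for handling binary blocks on an unseekable stream. */
typedef enum {
    BLOCK_POLICY_SKIP,           /* discard block data entirely */
    BLOCK_POLICY_HEADER_ONLY,    /* keep block headers, discard data */
    BLOCK_POLICY_BUFFER_MEMORY,  /* keep block data in memory */
    BLOCK_POLICY_BUFFER_TMPFILE  /* spool block data to a temp file */
} block_policy_t;

/* Hypothetical parser configuration fragment. */
typedef struct {
    block_policy_t policy;
    const size_t *block_indices; /* NULL means "all blocks" */
    size_t n_block_indices;
} stream_block_config_t;

/* Returns 1 if the block at `index` should be buffered, 0 otherwise. */
static int block_selected(const stream_block_config_t *cfg, size_t index) {
    assert(cfg != NULL);
    if (cfg->block_indices == NULL)
        return 1;
    for (size_t i = 0; i < cfg->n_block_indices; i++) {
        if (cfg->block_indices[i] == index)
            return 1;
    }
    return 0;
}

/* Reads exactly `size` bytes from `in` using forward reads only.
 * If `out` is non-NULL the bytes are copied to it (e.g. a tmpfile());
 * if `out` is NULL they are discarded. Returns 0 on success, -1 on a
 * short read or a write error. */
static int consume_block(FILE *in, FILE *out, size_t size) {
    unsigned char buf[4096];
    assert(in != NULL);
    while (size > 0) {
        size_t want = size < sizeof buf ? size : sizeof buf;
        size_t got = fread(buf, 1, want, in);
        if (got == 0)
            return -1;
        if (out != NULL && fwrite(buf, 1, got, out) != got)
            return -1;
        size -= got;
    }
    return 0;
}
```

With something like this, skipping a block and spooling it to a temp file share one code path: unselected blocks are consumed with `out == NULL`, while selected blocks are copied to a `tmpfile()` that can be seeked later even though the original stream cannot.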
Here it might also be useful to have access to the lower-level parser events. For example, it is possible to read the file just up to the YAML data, read the YAML, and then decide based on that how block parsing should proceed (if at all). This is already possible using the parser APIs, but the high-level asdf_file_t APIs don't currently expose the underlying parser. This might be a use case for doing so.