Improve dmod.client and update CLI#424
Conversation
6f55b6d to
8d29ffe
Compare
8d29ffe to
5caf4a3
Compare
aaraney
left a comment
There was a problem hiding this comment.
Thanks @robertbartel! I have a few fairly minor comments that more or less boil down to API parameter choices. There is one bug that will need to be addressed but other than that, this should be pretty easy to get through. Thanks!
| async def upload_to_dataset(self, dataset_name: str, paths: List[Path]) -> bool: | ||
| chunk_size = 1024 | ||
|
|
||
| with source.open('r') as file: |
There was a problem hiding this comment.
I think we need to open this in reading bytes, right? Since we dont know what kind of item is being uploaded.
There was a problem hiding this comment.
Hmm ... yeah, I believe you are correct, and as such we should do similarly for writes as you suggested above. But this actually becomes a larger undertaking. All the links in the chain will need to be modified to accommodate, from DataTransmitMessage.data to the declaration and implementations of _DatasetManager.add_data()` (and associated helper methods).
It probably makes sense to do this as part of #432. What do you think?
There was a problem hiding this comment.
Ping again here also. This probably relates to the earlier item and my old follow-up questions.
There was a problem hiding this comment.
Yeah I agree. This is a larger undertaking and not directly related so should be in its own PR.
I opened #486 to track changing DataTransmitMessage's data field to a more appropriate type. Im not sure if there are other models off the top of my head that will also need to be changed.
There was a problem hiding this comment.
It didn't occur to me back in September, but (unless I'm missing something) we can't actually do this because it will break JSON serialization.
There was a problem hiding this comment.
Yeahhhhh, you are right. I forgot about that. Does that mean you can only upload and download unicode encoded data at this moment? I don't recall seeing any logic to base64 / base85 encode binary data.
There was a problem hiding this comment.
Just as a sanity check, if we decode and then encode binary data using the "wrong" encoding, but it's the same encoding on both ends, won't we get the same data we started with?
Assuming more discussion and work is needed, I suggest that we defer this problem. A solution will probably be complex, and we can find workarounds as needed to get datasets into the system (worst case: temporarily opening up MinIO externally and uploading directly).
There was a problem hiding this comment.
Just as a sanity check, if we decode and then encode binary data using the "wrong" encoding, but it's the same encoding on both ends, won't we get the same data we started with?
Yeah! You are right. I may be blowing this out of proportion and it might not be a problem after all. I thought to transmit any data to and from the request service (and therefore the data service) it must be in json and therefore in some unicode flavor (e.g. utf-8). So, for example if I wanted to upload a netCDF forcing file to a dataset, I think I would first need to encode and chunk the data using base64 or base85, package that up in json, and send it along, right? I agree that we should move this discussion elsewhere and just focus on getting the PR through lol. Sorry to trying to derail us 😅!
|
|
||
| def __init__(self, *args, **kwargs): | ||
| super().__init__(*args, **kwargs) | ||
| #super().__init__(*args, **kwargs) |
5caf4a3 to
f39d89e
Compare
3c08c84 to
a1d10d3
Compare
48cb083 to
0ac3a65
Compare
dca9d6d to
2b38710
Compare
3c431ff to
e0c9ed9
Compare
cd66f76 to
c0bec91
Compare
|
Trying to get this one moving again after it has sat for a while. I think there are some outstanding follow-up questions, and possible tasks resulting from those discussions, but everything else is settled. @aaraney, when you are able, can you confirm all the other conversations can be marked resolved? |
|
@robertbartel, thanks for revitalizing this! I'd forgotten about it and let it get stale. I've resolved relevant comments and contributed to existing conversation to move this forward. We are getting really close, just a few minor things. |
c0bec91 to
8e6e3bd
Compare
Replacing ngen-job-specific, deprecated client classes with more general JobClient.
Adding more help detail for various subcommands, and renaming some subcommands to be more intuitive.
Replacing pyyaml with Pydantic for ClientConfig changes, and bumping to latest communication package version.
Co-authored-by: Austin Raney <austin.raney@noaa.gov>
Fixing bug with misnamed initial variable usage for setting up first message to send before look-ahead sending loop.
Having usage wrap raised exceptions and raise new ones that include the message of the former, rather than crudely pass the name of the user function to DataServiceClient._process_request().
Rename get_dataset_items method in DataServiceClient to get_dataset_item_names.
Fixing mismatched strings between CLI prepared args and supported action strings in DmodClient data_service_action() func.
Commenting out seemingly unneeded SchedulerClient init in RequestService.
Ensuring ConnectionContextClient async_recv() uses an async context manager.
Adding support to package __main__ routine for turning on remote debugging when properly configured.
Re-add DEFAULT_CLIENT_CONFIG_BASENAME definition (that was still being used) and refactor an error message.
Updating Dockerfile for app_server image to create a client config file using the new JSON syntax, complete with optional debugging config if env is set appropriately.
Fixing methods involved in making requests and processing responses to support collections of response types, rather than only a single, to account for scenarios like data transfer.
Updating classes to properly use async context management in transport client when appropriate.
8e6e3bd to
51f583d
Compare
|
The only thing different was manually resolving a conflict for a version bump to the communication package, and the result of that ended up the same. As such, I'm going to bypass waiting for another approval this time. |
General improvements to the CLI, as well as updates to bring in line with more recent DMOD updates to serialization, communication client classes, and user config representation.
Note that this PR depends on #409 and should remain in draft status until that is merged.