Organize structure #5
I'm not a big fan of organizing by contributor. That information is already available in the repo metadata if someone really cares.
The pcn_common.py functions were placed there because they are common to the notebooks I provided and I did not want to create long notebooks. Since they are for scalar data requests, their use is agnostic to the data being requested. In my experience, even with the overhead, packages like Pandas and Xarray are common in scientific computing and data exploration. My notebooks are one interpretation of how users can access and organize ONC data. Users are more than welcome to make edits and suggestions for improving their clarity and efficiency.
However, I can also see the pcn_common.py file being confusing or causing environment issues for new Python users. I will spend some time making each notebook standalone and will remove pcn_common.py from the repo. Users wishing to reuse code can then copy and paste, as you say.
I think the goal for this repository should be to 1) show people how to access and discover ONC data via the onc Python package, and 2) highlight interesting ONC assets and data. What do you think about separating into tutorials and science/advanced processing examples? For tutorials, notebooks could describe how to extend basic operations with the api-python-client, such as finding all location codes within a bounding box, or finding all location codes that produced 'seawatertemperature' between two dates. Science examples would then be for reviewing data, making figures, or performing data corrections.
Thoughts? @kan-fu @aschlesin
Once a decision is made on the repo structure, I will update the notebooks and their descriptions in the README.
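As a rough illustration of the bounding-box tutorial idea mentioned above, here is a minimal sketch. It assumes location records are plain dicts with `locationCode`, `lat`, and `lon` keys, which is an assumption about the shape of what the onc package returns and should be checked against a real response. The function name, sample records, and coordinates are made up for illustration.

```python
def location_codes_in_bbox(locations, min_lat, max_lat, min_lon, max_lon):
    """Return the locationCode of every record inside the bounding box.

    Assumes each record is a dict with 'locationCode', 'lat', and 'lon' keys
    (hypothetical shape). Records without coordinates are skipped.
    Boxes that cross the antimeridian are not handled.
    """
    codes = []
    for loc in locations:
        lat, lon = loc.get("lat"), loc.get("lon")
        if lat is None or lon is None:
            continue  # no position available for this record
        if min_lat <= lat <= max_lat and min_lon <= lon <= max_lon:
            codes.append(loc["locationCode"])
    return codes


# Made-up records standing in for a real locations response.
sample_locations = [
    {"locationCode": "SITE_A", "lat": 48.8, "lon": -125.2},
    {"locationCode": "SITE_B", "lat": 49.0, "lon": -123.4},
    {"locationCode": "SITE_C", "lat": None, "lon": None},
]

codes_in_box = location_codes_in_bbox(
    sample_locations, min_lat=48.5, max_lat=49.5, min_lon=-126.0, max_lon=-125.0
)
print(codes_in_box)
```

In a tutorial notebook, `sample_locations` would be replaced by whatever the location discovery call actually returns.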
Contributor name way or Category way
My main reason for organizing under contributor names is that notebooks (along with the helper files) from one contributor can be independent of those from other contributors, so contributors are free to modify their own content as they like in an autonomous space. It also reduces maintenance burden. Soon we will add https://github.com/OceanNetworksCanada/Barkley-Sound-datalabs and possibly https://github.com/g-bertozzi/Ocean-Hackathon-Datasets into this repository. The whole restructure in this PR is about how to add more notebooks from different contributors. In this case, they have their own repo first, then we want to incorporate them into our repo.
To be frank, I was leaning towards the category way in the very beginning (tutorials and specific topics were the words in my mind), as organizing by contributor names seems weird. It's just that my experience is not in the data/science area, so I cannot help with classifying each notebook. I added keywords in the description as an alternative that acts like categories. I am OK with either way. I just don't want this to discourage contributors from sharing their notebooks, as this requires extra work.
For pcn_common.py
I like the idea of having helper methods. I put it in the subfolder because I was thinking in the contributor name way. If we go with the category way, we should keep it in the root. So I don't think making notebooks standalone is necessary (or even beneficial). Having the same helper methods in 10 standalone notebooks goes against the DRY principle. A shared helper file also highlights and advertises some common methods (like the xarray usage you mentioned) to users. The environment issue I mentioned is about the pinned versions in the different requirements.txt files. Users need to be aware of that, but contributors should not worry about it.
One minor issue with having pcn_common.py in the root is that other contributors need to either append their helpers to this file and adapt the imports in their notebooks, or simply ignore it and use their own. Take Barkley-Sound-datalabs as an example. It looks like a tutorial for a conference or workshop. They have their own structure and helper files. It would be best if they could just move everything into the repository without any changes. We just need to let other contributors know that they can have their own helper files. Not a big issue here.
BTW, I added the Code Organization section in the README because I hope users can run the notebooks smoothly. Right now I believe users need to put pcn_common.py beside the notebooks to make the import work.
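To make the import point above concrete: Python resolves a statement like `import pcn_common` by searching `sys.path`, which in a notebook typically includes the notebook's working directory. A minimal sketch of importing a helper module that lives in a different folder follows. It uses a throwaway directory and a hypothetical `demo_helpers.py` so that it is self-contained; neither name comes from this repo.

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Create a throwaway folder holding a hypothetical helper module.
helper_dir = Path(tempfile.mkdtemp())
(helper_dir / "demo_helpers.py").write_text(
    "def greet(name):\n    return f'hello {name}'\n"
)

# Make the folder importable; without this step the import below would fail
# unless the helper file sat next to the running notebook or script.
if str(helper_dir) not in sys.path:
    sys.path.insert(0, str(helper_dir))

importlib.invalidate_caches()  # the module file was created after startup
demo_helpers = importlib.import_module("demo_helpers")

greeting = demo_helpers.greet("ONC")
print(greeting)
```

Modifying `sys.path` inside a notebook is only one option. Keeping the helper file beside the notebooks, as described above, avoids the extra cell entirely.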
Hi, I also don't prefer the naming convention by initial contributor. I like Ian's thoughts: "What do you think about separating into tutorials and science/advanced processing examples?" At first I thought we should organize by instrument category, but that does not really work if one creates a notebook to get data from different instruments to compare or to investigate a specific research idea. An easy example is water property changes in the Strait of Georgia: one would request data from different sites (along the strait, ferries, moorings (autonomous sites)) and also from different instrument categories (oxygen sensors, turbidity sensors, CO2 sensors, ...).
Hi, this is my attempt to reorganize the repo so that it is easier to incorporate other people's work. I also proposed guidelines (recommended, not enforced) on some common topics. The main motivation is to establish an easy and consistent way for users to try out the notebooks. Feel free to leave comments if you have any questions or suggestions.
I used Copilot to generate the catalog for Ian's notebooks (it actually polished the whole README file).
Explanations of some of the decisions (where I hesitated between both options and am open to change):
I put the helper Python script (pcn_common.py) in the subdirectory instead of the root directory.
Having the helper file in the root directory helps other contributors reuse the methods. But in my experience, different people tend to have their own helper files. And if they really want to use the methods in other people's helper files, they can always copy and paste. Original authors would then not need to consider backward compatibility issues.
I used author names as the directory name.
I think it would be easier to manage the repo by organizing the notebooks under the author name instead of categories. I just used Ian's GitHub name. Feel free to change that @IanTBlack.
I put description and keywords in the catalog section.
I don't want to overburden the contributors, but I think a brief description and some keywords would help users navigate the repo. Users might just want to take a look at a random notebook, or they might want to look for some specific topics. Having both a description and keywords (including the names of the external libraries used) would give users a good idea of whether a notebook is of interest. Users can also simply search for keywords in the README file.
I initially planned to replace all the notebook names with links, but later decided not to because it adds extra workload for the contributors.