Organize structure #5

kan-fu · 2026-01-22T22:51:02Z

Hi, this is my attempt to reorganize the repo so that it is easier to incorporate other people's work. I also proposed guidelines (recommended, not enforced) on some common topics. The main motivation is to establish an easy and consistent way for users to try out the notebooks. Feel free to leave comments if you have any questions or suggestions.

I used copilot to generate the catalog for Ian's notebooks (actually polished the whole README file).

@IanTBlack may need to double-check whether those descriptions are correct.

Explanations of some of the decisions (I hesitated between both options on each and am open to change):

1. I put the helper Python script (pcn_common.py) in the subdirectory instead of the root directory.

Having the helper file in the root directory helps other contributors reuse the methods. But in my experience, different people tend to have their own helper files. If they really want to use methods from someone else's helper file, they can always copy and paste, and the original authors would not need to consider backward compatibility.

2. I used author names as the directory names.

I think it would be easier to manage the repo by organizing the notebooks under the author name instead of by category. I just used Ian's GitHub name. Feel free to change that, @IanTBlack.

3. I put descriptions and keywords in the catalog section.

I don't want to overburden the contributors, but I think a brief description and some keywords would help users navigate the repo. Users might just want to take a look at a random notebook, or they might want to look for specific topics. Having both a description and keywords (including the names of the external libraries used) would give users a good idea of whether a notebook is interesting to them. Users can also simply search for keywords in the README file.

I initially planned to replace all the notebook names with links, but later decided not to because it adds extra workload for the contributors.

IanTBlack · 2026-01-23T21:09:25Z

I'm not a big fan of organizing by contributor. That information is already available in repo metadata if someone really cares.

The pcn_common.py functions were placed there because they are common to the notebooks I provided and I did not want to create long notebooks. Since they are for scalar data requests, their use is agnostic to the data being requested. In my experience, even with the overhead, packages like Pandas and Xarray are common in scientific computing and data exploration. My notebooks are one interpretation of how users can access and organize ONC data. Users are more than welcome to make edits and suggestions for improving their clarity and efficiency. However, I can also see the pcn_common.py file being confusing or causing environment issues for new Python users. I will spend some time making each notebook standalone and will remove pcn_common.py from the repo. Users wishing to reuse code can then copy and paste, as you say.

I think the goal for this repository should be to 1) show people how to access and discover ONC data via the onc python package, and 2) highlight interesting ONC assets and data.

What do you think about separating into tutorials and science/advanced processing examples?

For tutorials, notebooks could describe how to extend basic operations with the api-python-client, such as finding all location codes within a bounding box or finding all location codes that produced 'seawatertemperature' between two dates.
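
A minimal sketch of the bounding-box idea, assuming the location records have already been retrieved as a list of dictionaries with `locationCode`, `lat`, and `lon` keys. The function name `codes_in_bbox` and the sample records are hypothetical and only for illustration; the request that fetches the records is not shown, and boxes that cross the antimeridian are not handled.

```python
def codes_in_bbox(locations, min_lat, max_lat, min_lon, max_lon):
    """Return sorted location codes whose lat/lon fall inside the box (inclusive)."""
    return sorted(
        loc["locationCode"]
        for loc in locations
        # Skip records without coordinates before comparing.
        if loc.get("lat") is not None
        and loc.get("lon") is not None
        and min_lat <= loc["lat"] <= max_lat
        and min_lon <= loc["lon"] <= max_lon
    )


# Made-up records (assumption: real records expose these three keys).
sample_locations = [
    {"locationCode": "SITE_A", "lat": 48.8, "lon": -125.3},
    {"locationCode": "SITE_B", "lat": 49.0, "lon": -123.4},
    {"locationCode": "SITE_C", "lat": None, "lon": None},
]

inside = codes_in_bbox(sample_locations, 48.5, 49.5, -124.0, -123.0)
print(inside)
```

A tutorial notebook could start from a pure function like this and then swap the sample list for the records returned by a real location query.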

Science examples would then be for reviewing data, making figures, or performing data corrections, such as making GIFs from seafloor cameras, recreating plots like those from this ONC story, identifying and binning profiles/ferry transits, or doing depth matchup for profiling bioacoustic data (as @slonimer has done for BACVP). These are things that are generally a little more advanced or that show that the data exist.

Thoughts? @kan-fu @aschlesin

Once a decision is made on the repo structure, I will update the notebooks and their descriptions in the README.

kan-fu · 2026-01-23T23:31:32Z

Contributor name way or Category way

My main reason for organizing under contributor names is that notebooks (along with their helper files) from one contributor stay independent of those from other contributors, so contributors are free to modify their own content as they like in an autonomous space. It also reduces the maintenance burden. Soon we will add https://github.com/OceanNetworksCanada/Barkley-Sound-datalabs and possibly https://github.com/g-bertozzi/Ocean-Hackathon-Datasets to this repository. The whole restructuring in this PR is about how to add more notebooks from different contributors. In these cases, the contributors have their own repo first, and then we want to incorporate it into ours.

With the contributor-name approach, it will be pretty easy to move them.
With the category approach, someone (probably the contributor) needs to put them into the correct categories.

To be frank, I was leaning towards the category approach in the very beginning (tutorials and specific topics were the words in my mind), as organizing by contributor names seems weird. But because my experience is not in the data/science area, I cannot help with classifying each notebook. I added keywords in the description as an alternative that acts like categories.

I am OK with either way. I just don't want the extra work to discourage contributors from sharing their notebooks.

For pcn_common.py

I like the idea of having helper methods. I put the file in the subfolder because I was thinking in terms of the contributor-name approach. If we go with the category approach, we should keep it in the root. So I don't think making notebooks standalone is necessary (or even beneficial). Having the same helper methods in 10 standalone notebooks goes against the DRY principle. A shared helper also highlights and advertises some common methods (like the xarray you mentioned) to users. The environment issue I mentioned is about the pinned versions in the different requirements.txt files. Users need to be aware of that, but contributors should not have to worry about it.

One minor issue with having a pcn_common.py in the root is that other contributors need to either append their helpers to this file and adapt the imports in their notebooks, or simply ignore it and use their own. Take Barkley-Sound-datalabs as an example. It looks like a tutorial for a conference or workshop, with its own structure and helper files. It would be best if they could just move everything into the repository without any changes.

We just need to let other contributors know that they can have their own helper files. Not a big issue here.

BTW, I added the Code Organization section in the README because I hope users can run the notebooks smoothly. Right now I believe users need to put pcn_common.py beside the notebooks to make the import work.
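
To illustrate the import point: Python resolves `import pcn_common` against `sys.path`, which for a notebook normally includes the directory the kernel was started in. Below is a minimal sketch of loading a helper module from a different directory. The function `import_helper` and the stand-in module `demo_helper` are hypothetical and do not exist in the repo; the demo writes its own tiny module to a temporary directory so it can run anywhere.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def import_helper(module_name, helper_dir):
    """Import module_name from helper_dir by putting that directory on sys.path."""
    helper_dir = str(Path(helper_dir).resolve())
    if helper_dir not in sys.path:
        sys.path.insert(0, helper_dir)
    # Make sure the import system notices files created after startup.
    importlib.invalidate_caches()
    return importlib.import_module(module_name)


# Stand-in for a contributor's helper file (assumption: not the real pcn_common.py).
demo_dir = tempfile.mkdtemp()
Path(demo_dir, "demo_helper.py").write_text("def greet():\n    return 'hello'\n")

helper = import_helper("demo_helper", demo_dir)
print(helper.greet())
```

Keeping the helper beside the notebooks avoids this step entirely, which is one argument for the current layout.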

aschlesin · 2026-01-27T18:38:35Z

Hi, I also don't prefer the naming convention by initial contributor. I like Ian's thoughts:
_"I think the goal for this repository should be to 1) show people how to access and discover ONC data via the onc python package, and 2) highlight interesting ONC assets and data.

What do you think about separating into tutorials and science/advanced processing examples?"_

At first I thought we should organize by instrument category, but that does not really work if someone creates a notebook that gets data from different instruments to compare them or to investigate a specific research idea. An easy example is water property changes in the Strait of Georgia: one would request data from different sites (along the strait, ferries, moorings (autonomous sites)) and also from different instrument categories (oxygen sensors, turbidity sensors, CO2 sensors, ...).
I think it should be up to the owner of the notebooks to describe (1) what research aspect they were looking at and (2) what instruments/data they are requesting in the notebooks. I don't see it just as a way to show multiple ways of accessing data, but more as a way to show interesting research and ONC's data. Hope that makes sense. I will read the instructions guide that Kan wrote (many thanks!) and see if I can modify it to my understanding, if you all agree.

kan-fu added 2 commits January 22, 2026 14:11

Update README for guideline and catalog

7e7635c

Move notebooks

746421d

kan-fu requested review from IanTBlack and aschlesin January 22, 2026 22:51
