Description
I have been part of the Google OSS ML ecosystem for almost a decade, so I understand very clearly what "experimental" means. What I do not understand is why the documentation for grain is so poor, especially in the LLM era, when assisted documentation writing is widely available.
There is functionality I am interested in using and showcasing to other people, but the lack of documentation means I spend more time reading the source code to figure out the right way to use an API than actually using it. For example, look at the documentation of `ParquetIterDataset`. Does it answer any of the following questions?
- What are the `read_kwargs`?
- What if I want to lazy-load the shards?
- What if I want to lazy-load, but also ensure that the next shard is already read, so that I do not create a data pipeline bubble during training?
- How do I maximize performance with a small or a large number of shards?
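To make the third question concrete, this is the read-ahead pattern I mean, sketched in plain Python. The `read_shard` callable here is a stand-in for whatever actually loads one shard (e.g. a parquet read); nothing in this sketch is a grain API, and I would like the docs to tell me whether grain already does something like this for me:

```python
import threading
from queue import Queue

def prefetched_shards(shard_paths, read_shard, buffer_size=1):
    """Yield shards lazily while a background thread reads ahead.

    `read_shard` is a placeholder for the actual shard loader; up to
    `buffer_size` shards are read ahead of the consumer so training
    does not stall waiting for I/O.
    """
    queue = Queue(maxsize=buffer_size)
    sentinel = object()  # marks the end of the shard stream

    def producer():
        for path in shard_paths:
            queue.put(read_shard(path))  # blocks when the buffer is full
        queue.put(sentinel)

    threading.Thread(target=producer, daemon=True).start()
    while (shard := queue.get()) is not sentinel:
        yield shard
```

If `ParquetIterDataset` (or some combination of `read_kwargs`) already provides this kind of read-ahead, the documentation should say so explicitly.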
I have been waiting for the documentation to improve for almost two years now. A few of us would be happy to help, but even for that we need to know the API well (a chicken-and-egg problem!).