Issue on page /grain.experimental.html #1238

@AakashKumarNain

Description

I have been part of the Google OSS ML stack for almost a decade, so I understand very clearly what "experimental" means. What I do not understand is why the documentation for Grain is so poor, especially in the LLM era, when assisted documentation writing is a thing.

There are features that I am interested in using and showcasing to other people, but the lack of documentation means I spend more time reading the source code to figure out the right way to use an API. For example, look at the documentation of ParquetIterDataset. Does it answer any of the following questions?

  1. What are the read_kwargs?
  2. What if I want to lazy load the shards?
  3. What if I want to lazy load, but also want to ensure that the next shard is already read so that I do not create a data pipeline bubble during training?
  4. How do I maximize performance with a small or a large number of shards?
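
To make question 3 concrete, here is a minimal sketch of the behavior I mean, written in plain Python with the standard library only. It does not use any Grain API and says nothing about how ParquetIterDataset actually works. `iter_with_lookahead`, `load_shard`, and the shard names are hypothetical stand-ins for illustration.

```python
import queue
import threading
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")
_DONE = object()  # sentinel marking the end of the shard stream


def iter_with_lookahead(
    shard_paths: Iterable[str],
    load_shard: Callable[[str], List[T]],  # hypothetical reader, not a Grain call
    lookahead: int = 1,
) -> Iterator[T]:
    """Yield records shard by shard while a background thread reads ahead.

    Shards are loaded lazily: at most `lookahead` loaded shards wait in the
    queue, plus one more that the producer may hold while blocked on `put`.
    """
    buffer: queue.Queue = queue.Queue(maxsize=lookahead)

    def producer() -> None:
        try:
            for path in shard_paths:
                # Blocks once the queue is full, so reading stays bounded.
                buffer.put(load_shard(path))
        except Exception as exc:
            # Forward read errors to the consumer instead of losing them.
            buffer.put(exc)
        finally:
            buffer.put(_DONE)

    # Daemon thread: it will not keep the process alive if the consumer
    # stops iterating early.
    threading.Thread(target=producer, daemon=True).start()

    while True:
        item = buffer.get()
        if item is _DONE:
            return
        if isinstance(item, Exception):
            raise item
        # While these records are consumed, the producer reads the next shard.
        yield from item


if __name__ == "__main__":
    # In-memory stand-in for shard files on disk.
    fake_shards = {"shard-0": [0, 1], "shard-1": [2, 3], "shard-2": [4, 5]}
    print(list(iter_with_lookahead(fake_shards, fake_shards.__getitem__)))
    # prints [0, 1, 2, 3, 4, 5]
```

What I cannot tell from the documentation is whether ParquetIterDataset already does something like this, and if so, how to configure it.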

I have been waiting for the documentation to improve for almost 2 years now. A few of us could also help, but even for that we need to know the API well (a chicken-and-egg problem!).

Metadata

    Labels

    documentation: Improvements or additions to documentation
