QarnotCluster for Dask? #6

@remiadon

I see Qarnot computing instances as potential candidates for data science use cases.

For now I consider this example to be the typical workflow for machine learning with your framework.
It is great, but it requires the code to be wrapped in a single Python file, and some prior Docker knowledge, to run in an embarrassingly parallel fashion.
On the other hand, I would like to run some computations in a notebook environment, using Qarnot computing power.

For these - more interactive - workflows I think dask is a good candidate.

Even if dask is designed to build complex graphs, there are some workflows in data science (and ML) that are embarrassingly parallel, examples being:

Dask contributors have already worked on dask-cloudprovider, a library to facilitate deployment on AWS and Azure. Providing a QarnotCluster would make the following workflow possible (I think):

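>>> import dask.dataframe as dd
>>> from dask.distributed import Client
>>> cluster = QarnotCluster('<YOUR_API_TOKEN>')  # proposed class, does not exist yet
>>> client = Client(cluster)  # make this cluster the default scheduler
>>> df = dd.read_csv('path-to-qarnot-bucket/*.csv')
>>> # ... some regular dask workflow ...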
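
To make the embarrassingly parallel case concrete, here is a minimal sketch that runs without any cluster. It assumes only that the executor exposes `submit(fn, *args)` and returns futures with `.result()`, which is the interface shared by `concurrent.futures` executors and `dask.distributed.Client`. The `score` function, the `run_grid` helper and the parameter grid are hypothetical placeholders, not part of Qarnot's SDK or of dask.

```python
from concurrent.futures import ThreadPoolExecutor


def score(params):
    """Toy stand-in for training and scoring one model configuration."""
    alpha, depth = params
    return round(1.0 / (1.0 + alpha) + 0.01 * depth, 4)


def run_grid(executor, grid):
    """Submit one independent task per configuration and collect the scores.

    `executor` only needs `submit(fn, *args)` returning a future with
    `.result()`. A local thread pool is used below; a dask Client could
    be passed instead (assumption: it is connected to a running cluster).
    """
    futures = [executor.submit(score, params) for params in grid]
    return {params: fut.result() for params, fut in zip(grid, futures)}


# Hypothetical hyperparameter grid: every entry is evaluated independently.
grid = [(alpha, depth) for alpha in (0.1, 1.0) for depth in (2, 4)]

with ThreadPoolExecutor(max_workers=4) as pool:
    results = run_grid(pool, grid)

best = max(results, key=results.get)
print(best, results[best])
```

No task depends on another, so the only thing that changes between a laptop and a remote cluster is the object passed as `executor`. That is the slot a QarnotCluster, wrapped in a `Client`, would fill.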

I know that for now you rely a lot on Docker; I am just asking whether this is a possible direction for the future.

Cheers,
