Add read image and process labels notebook #162
sfc-gh-dan wants to merge 1 commit into Snowflake-Labs:main from
"from snowflake.snowpark.context import get_active_session\n",
"session = get_active_session()\n"
This won't work outside of Snowflake Notebooks
(Need to create the session from config first)
Yeah, Ray Data won't work outside a notebook either; this notebook is meant to be run inside a Snowflake Notebook.
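For reference, a minimal sketch of what the first reviewer describes, in case the session setup ever needs to work outside Snowflake Notebooks: reuse the active session when there is one, otherwise build one from connection parameters. The `SNOWFLAKE_*` environment variable names and both helper names are assumptions for illustration, not something this PR defines; per the reply above, the Ray Data steps would still need the Snowflake Notebook runtime.

```python
import os
from typing import Mapping, Optional

# Hypothetical environment variable names: SNOWFLAKE_ACCOUNT, SNOWFLAKE_USER, ...
_REQUIRED_KEYS = ("account", "user", "password")
_OPTIONAL_KEYS = ("role", "warehouse", "database", "schema")


def connection_params_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect Snowpark connection parameters from SNOWFLAKE_* variables."""
    env = os.environ if env is None else env
    params = {}
    for key in _REQUIRED_KEYS + _OPTIONAL_KEYS:
        value = env.get(f"SNOWFLAKE_{key.upper()}")
        if value:  # skip unset or empty values
            params[key] = value
    missing = [key for key in _REQUIRED_KEYS if key not in params]
    if missing:
        raise ValueError(f"missing Snowflake connection settings: {missing}")
    return params


def get_session():
    """Reuse the Snowflake Notebook session if there is one, else create one from config."""
    try:
        from snowflake.snowpark.context import get_active_session
        return get_active_session()
    except Exception:
        # Outside a notebook there is no active session (or the import fails),
        # so fall back to an explicit connection built from the environment.
        from snowflake.snowpark import Session
        return Session.builder.configs(connection_params_from_env()).create()
```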
" database = \"ST_DB\",\n",
" schema = \"ST_SCHEMA\",\n",
This database and schema don't exist for users.
I wonder what the best practice is here. I guess we cannot assume that any particular database and schema exist in a customer account.
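One common answer to the question above is to stop assuming the objects exist and create them idempotently at the top of the notebook. A minimal sketch, assuming the notebook's role is allowed to create databases, schemas, and stages; `ST_DB`, `ST_SCHEMA`, and `DATA_STAGE_RAY` are the names used in the diff, while `setup_statements` and `ensure_objects` are hypothetical helpers. Another option is to default to the session's current database and schema rather than hard-coding names at all.

```python
import re
from typing import List

# Only plain unquoted identifiers are accepted, so the names can be
# interpolated into SQL text without quoting or injection concerns.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_$]*$")


def setup_statements(database: str, schema: str, stage: str) -> List[str]:
    """Build idempotent DDL for the objects the notebook expects (hypothetical helper)."""
    for name in (database, schema, stage):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsupported identifier: {name!r}")
    return [
        f"CREATE DATABASE IF NOT EXISTS {database}",
        f"CREATE SCHEMA IF NOT EXISTS {database}.{schema}",
        # A directory table makes the staged image/label files easy to list.
        f"CREATE STAGE IF NOT EXISTS {database}.{schema}.{stage} DIRECTORY = (ENABLE = TRUE)",
    ]


def ensure_objects(session, database: str, schema: str, stage: str) -> None:
    """Run the DDL through Snowpark, then point the session at the database and schema."""
    for statement in setup_statements(database, schema, stage):
        session.sql(statement).collect()
    session.use_database(database)
    session.use_schema(schema)
```

With the session pointed at the new schema, an unqualified stage reference such as `@DATA_STAGE_RAY/images/` resolves inside it.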
},
"source": [
"### Process both datasets to include additional columns\n",
"**Image Dataset**: add a join key, encode the images, standardize the images\n",
"\n",
"**Label Dataset**: add a join key, interpret the labels"
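The per-dataset steps in that cell map naturally onto Ray Data's `Dataset.map_batches`. A minimal sketch of the join-key and label steps only (image encoding and standardization are left out), assuming the data sources expose a `file_name` column and the label source a `text` column holding one class name per file. Those column names, the `join_key`/`label_id` output columns, and the class list are all hypothetical; check the schema the Snowflake data sources actually produce.

```python
import os
from typing import Dict, Sequence

# Hypothetical column names: verify against the real data source schema.
FILE_COL = "file_name"
TEXT_COL = "text"
KEY_COL = "join_key"


def join_key(path: str) -> str:
    """Map 'images/0001.jpg' and 'labels/0001.txt' to the same key, '0001'."""
    return os.path.splitext(os.path.basename(path))[0]


def add_join_key(batch):
    """Add a join key column; works on a dict of columns or a pandas DataFrame batch."""
    batch[KEY_COL] = [join_key(str(p)) for p in batch[FILE_COL]]
    return batch


def make_label_parser(class_names: Sequence[str]):
    """Return a batch function that adds a join key and an integer class id."""
    index: Dict[str, int] = {name.lower(): i for i, name in enumerate(class_names)}

    def parse_labels(batch):
        batch = add_join_key(batch)
        # Assumes each label file holds a single class name, e.g. "cat\n".
        batch["label_id"] = [index[str(t).strip().lower()] for t in batch[TEXT_COL]]
        return batch

    return parse_labels
```

In the notebook these would be applied with something like `ray.data.read_datasource(image_source).map_batches(add_join_key, batch_format="pandas")`, and likewise with the parser for `label_source`.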
"source": [
"from snowflake.ml.ray.datasource import SFStageImageDataSource, SFStageTextDataSource\n",
"\n",
"image_source = SFStageImageDataSource(\n",
" stage_location = \"@DATA_STAGE_RAY/images/\",\n",
" database = \"ST_DB\",\n",
" schema = \"ST_SCHEMA\",\n",
" image_size=(256, 256),\n",
")"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "2324e409-b4c5-4405-ad1c-267831be1773",
"metadata": {
"language": "python",
"name": "cell15"
},
"outputs": [],
"source": [
"label_source = SFStageTextDataSource(\n",
" stage_location = \"@DATA_STAGE_RAY/labels/\",\n",
" database = \"ST_DB\",\n",
" schema = \"ST_SCHEMA\",\n",
")"
]
},
Where should external users get the images and labels?
Let me add a step before this notebook to prepare the data. To answer your question: this uses a public third-party dataset.
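The preparation step mentioned in the reply could look roughly like this: keep only images that have a matching label file, then upload both to the stage paths the data sources read from. It assumes the third-party dataset has already been downloaded locally as image files plus one `.txt` label per image with the same base name; the dataset is not named here, and `pair_by_stem` and `upload_pairs` are hypothetical helpers. `session.file.put` is Snowpark's wrapper around `PUT`; `auto_compress=False` keeps the images as-is instead of gzipping them.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png")


def pair_by_stem(image_names: Iterable[str], label_names: Iterable[str]) -> List[Tuple[str, str]]:
    """Pair image and label file names that share a base name; drop unmatched files."""
    images = {Path(n).stem: n for n in image_names if Path(n).suffix.lower() in IMAGE_SUFFIXES}
    labels = {Path(n).stem: n for n in label_names if Path(n).suffix.lower() == ".txt"}
    return [(images[k], labels[k]) for k in sorted(images.keys() & labels.keys())]


def upload_pairs(session, image_dir: str, label_dir: str, stage: str = "@DATA_STAGE_RAY") -> int:
    """Upload matched files to <stage>/images/ and <stage>/labels/; return the pair count."""
    pairs = pair_by_stem(
        (str(p) for p in Path(image_dir).iterdir()),
        (str(p) for p in Path(label_dir).iterdir()),
    )
    for image_path, label_path in pairs:
        # One PUT per file is simple but slow; a wildcard PUT per directory is
        # faster when no filtering is needed.
        session.file.put(image_path, f"{stage}/images/", auto_compress=False, overwrite=True)
        session.file.put(label_path, f"{stage}/labels/", auto_compress=False, overwrite=True)
    return len(pairs)
```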
},
"source": [
"### Merge image source and label source into a single dataset\n",
"There are two ways to achieve this: 1) if you are more familiar with `pandas.DataFrame` and the data fits into memory, convert all the data into pandas (or write it into Snowflake) and do the remaining operations there; 2) if the data does not fit into memory, use Ray Data directly for the processing. \n",
"\n",
"**Note**: Ray Data is not naturally architected to support join operations, so it's better to use another method (in memory or in Snowflake) to perform joins."
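Following the note above, a minimal sketch of the in-memory option: materialize both sides and join them outside Ray Data. It assumes both datasets fit in memory and already carry a `join_key` column (a hypothetical name); the rows could come from `Dataset.take_all()` or `Dataset.iter_rows()`. This is a plain hash join, swapped in for illustration; with pandas the equivalent is `images_df.merge(labels_df, on="join_key", how="inner", validate="one_to_one")`.

```python
from typing import Dict, Iterable, List


def hash_join(left: Iterable[dict], right: Iterable[dict], key: str = "join_key") -> List[dict]:
    """Inner-join two row iterables on `key`, expecting at most one right row per key."""
    index: Dict[object, dict] = {}
    for row in right:
        k = row[key]
        if k in index:
            raise ValueError(f"duplicate key on right side: {k!r}")
        index[k] = row
    joined = []
    for row in left:
        match = index.get(row[key])
        if match is not None:
            # Left-side values win if both rows share a column name.
            joined.append({**match, **row})
    return joined
```

Images without a label are dropped, which matches an inner join; switch to a left join if unlabeled images should be kept for inference.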
"resultHeight": 46
},
"source": [
"## Save the Transformed Dataset to a snowflake table\n",
nit: capitalize Snowflake
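For the save step, one option when the joined data is already in memory is Snowpark's `Session.write_pandas`. A minimal sketch, assuming pandas is available in the notebook and that `RAY_DEMO_JAN21_IMAGE_DS` (the table queried later in the notebook) is the target; `snowflake_column_names` and `save_rows` are hypothetical helpers. Column names are upper-cased first because `write_pandas` quotes identifiers by default, so lower-case pandas column names would otherwise become case-sensitive Snowflake columns that need double quotes in every query. Image columns may also need encoding (for example to bytes or strings) first, as the processing step already plans.

```python
import re
from typing import Iterable, List, Optional

_VALID = re.compile(r"^[A-Z_][A-Z0-9_$]*$")


def snowflake_column_names(columns: Iterable[str]) -> List[str]:
    """Upper-case column names so they can be queried without double quotes."""
    names = [str(c).upper() for c in columns]
    bad = [n for n in names if not _VALID.match(n)]
    if bad:
        raise ValueError(f"columns need renaming before upload: {bad}")
    if len(set(names)) != len(names):
        raise ValueError("column names collide after upper-casing")
    return names


def save_rows(session, rows: List[dict], table_name: str = "RAY_DEMO_JAN21_IMAGE_DS",
              database: Optional[str] = None, schema: Optional[str] = None) -> None:
    """Write in-memory rows to a Snowflake table, replacing any previous contents."""
    import pandas as pd  # imported lazily; only needed when actually writing

    df = pd.DataFrame(rows)
    df.columns = snowflake_column_names(df.columns)
    session.write_pandas(
        df,
        table_name,
        database=database,  # None means the session's current database
        schema=schema,      # None means the session's current schema
        auto_create_table=True,
        overwrite=True,
    )
```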
" database = \"ST_DB\",\n",
" schema = \"ST_SCHEMA\",\n",
(just a reminder that db/schema don't exist for users)
"source": [
"# sql cell\n",
"\n",
"# SELECT * FROM RAY_DEMO_JAN21_IMAGE_DS;"
]
Convert to Snowpark Python call?
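The reviewer's suggestion could look like this: the commented-out SQL becomes a Snowpark DataFrame call, which keeps the notebook in Python and makes the preview size explicit. A minimal sketch; the helper name and the default row count are illustrative. `session.table(...).show(n)` would print the rows instead of returning them.

```python
def preview_table(session, table_name: str = "RAY_DEMO_JAN21_IMAGE_DS", n: int = 10):
    """Snowpark equivalent of `SELECT * FROM <table_name> LIMIT <n>`, returned as pandas."""
    # session.table() is lazy; the query only runs when to_pandas() is called.
    return session.table(table_name).limit(n).to_pandas()
```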
Add notebook to show unstructured data processing on Container Runtime