Add read image and process labels notebook by sfc-gh-dan · Pull Request #162 · Snowflake-Labs/sf-samples

sfc-gh-dan · 2025-02-07T01:09:37Z

Add notebook to show unstructured data processing on container runtime

sfc-gh-dhung · 2025-02-10T18:41:28Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+    "from snowflake.snowpark.context import get_active_session\n",
+    "session = get_active_session()\n"


This won't work outside of Snowflake Notebooks

(Need to create the session from config first)

Yeah, Ray Data won't work from a regular notebook either; this notebook is meant to be used inside a Snowflake Notebook.
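If the notebook should also run outside Snowflake Notebooks, one option is to try the active session first and fall back to a session built from configuration. The following is a minimal sketch, not the PR's approach: the environment variable names and the `connection_params_from_env` and `get_session` helpers are hypothetical, while `get_active_session` and `Session.builder.configs(...).create()` are Snowpark calls.

```python
import os
from typing import Mapping, Optional

# Hypothetical mapping from environment variables to Snowpark connection keys.
_ENV_TO_PARAM = {
    "SNOWFLAKE_ACCOUNT": "account",
    "SNOWFLAKE_USER": "user",
    "SNOWFLAKE_PASSWORD": "password",
    "SNOWFLAKE_ROLE": "role",
    "SNOWFLAKE_WAREHOUSE": "warehouse",
    "SNOWFLAKE_DATABASE": "database",
    "SNOWFLAKE_SCHEMA": "schema",
}


def connection_params_from_env(env: Optional[Mapping[str, str]] = None) -> dict:
    """Collect the connection parameters that are set (non-empty) in the environment."""
    env = os.environ if env is None else env
    return {param: env[var] for var, param in _ENV_TO_PARAM.items() if env.get(var)}


def get_session():
    """Return the active session inside Snowflake Notebooks, else create one from config."""
    # Imported lazily so the helper above stays usable without Snowpark installed.
    from snowflake.snowpark import Session
    from snowflake.snowpark.context import get_active_session
    from snowflake.snowpark.exceptions import SnowparkSessionException

    try:
        return get_active_session()
    except SnowparkSessionException:
        # No active session: we are running outside a Snowflake Notebook.
        return Session.builder.configs(connection_params_from_env()).create()
```

The notebook cell would then call `session = get_session()` instead of `get_active_session()` directly. This only covers the session; as noted above, the Ray Data parts are still expected to need the container runtime.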

sfc-gh-dhung · 2025-02-10T18:42:07Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+    "    database = \"ST_DB\",\n",
+    "    schema = \"ST_SCHEMA\",\n",


This db/schema don't exist for users

I wonder what the best practice is here? I guess we cannot assume that any particular database and schema will exist on a customer account.
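One possible answer, sketched under assumptions: take the database and schema as optional parameters and fall back to whatever the session is currently using, instead of hardcoding `ST_DB` / `ST_SCHEMA`. The `resolve_db_schema` helper is hypothetical; `get_current_database()` and `get_current_schema()` are Snowpark `Session` methods.

```python
from typing import Optional, Tuple


def resolve_db_schema(session, database: Optional[str] = None,
                      schema: Optional[str] = None) -> Tuple[str, str]:
    """Prefer explicit names; otherwise use the session's current database and schema."""
    db = database or session.get_current_database()
    sch = schema or session.get_current_schema()
    if not db or not sch:
        raise ValueError("No database/schema given and none is set on the session.")
    return db, sch
```

The returned pair could then be passed as `database=` and `schema=` to the data sources. Snowpark may return the current names as quoted identifiers; whether the data sources accept that form is not verified here.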

sfc-gh-dhung · 2025-02-10T18:43:08Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+   },
+   "source": [
+    "### Process both dataset to include addition columns\n",
+    "**Image Dataset**: add a join key, encode the images, standardize image\\n\n",


nit: remove \\n

sfc-gh-dhung · 2025-02-10T18:43:21Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+    "### Process both dataset to include addition columns\n",
+    "**Image Dataset**: add a join key, encode the images, standardize image\\n\n",
+    "\n",
+    "**Label Dataset**: add a join key, interrpet the labels"


sfc-gh-dhung · 2025-02-10T18:48:12Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+   "source": [
+    "from snowflake.ml.ray.datasource import SFStageImageDataSource, SFStageTextDataSource\n",
+    "\n",
+    "image_source = SFStageImageDataSource(\n",
+    "    stage_location = \"@DATA_STAGE_RAY/images/\",\n",
+    "    database = \"ST_DB\",\n",
+    "    schema = \"ST_SCHEMA\",\n",
+    "    image_size=(256, 256),\n",
+    ")"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "id": "2324e409-b4c5-4405-ad1c-267831be1773",
+   "metadata": {
+    "language": "python",
+    "name": "cell15"
+   },
+   "outputs": [],
+   "source": [
+    "label_source = SFStageTextDataSource(\n",
+    "    stage_location = \"@DATA_STAGE_RAY/labels/\",\n",
+    "    database = \"ST_DB\",\n",
+    "    schema = \"ST_SCHEMA\",\n",
+    ")"
+   ]
+  },


Where should external users get the images and labels?

Let me add a step before this notebook to prepare the data. To answer your question: this uses a public third-party dataset.
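A data preparation step along these lines could upload locally downloaded files to the stage folders the notebook reads from. This is a rough sketch with several assumptions: the files are already in a local directory, images are identified by the extensions listed, everything else is treated as a label file, and the `plan_uploads` and `upload` helpers are hypothetical. `session.file.put` is the Snowpark file upload call.

```python
from pathlib import Path
from typing import Iterable, List, Tuple

IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png"}  # assumed image extensions


def plan_uploads(files: Iterable[Path],
                 stage: str = "@DATA_STAGE_RAY") -> List[Tuple[str, str]]:
    """Pair each local file with a stage folder: images/ for images, labels/ otherwise."""
    plan = []
    for path in sorted(files):
        folder = "images" if path.suffix.lower() in IMAGE_SUFFIXES else "labels"
        plan.append((str(path), f"{stage}/{folder}/"))
    return plan


def upload(session, local_dir: str, stage: str = "@DATA_STAGE_RAY") -> None:
    """Upload every file directly under local_dir using Snowpark's file API."""
    files = [p for p in Path(local_dir).iterdir() if p.is_file()]
    for local_path, stage_location in plan_uploads(files, stage):
        # auto_compress=False keeps the original bytes so images stay readable.
        session.file.put(local_path, stage_location, auto_compress=False, overwrite=True)
```

The stage itself would need to exist before `upload` is called.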

sfc-gh-dhung · 2025-02-10T18:49:51Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+   },
+   "source": [
+    "### Merge image source and label source into a single dataset\n",
+    "We have two ways of achieving this: 1) if customer is more famaliar with `pandas.Dataframe` and if the data fit into memory, then we can convert all data into pandas (or write into snowflake) and do the rest of the ops. 2) If the data does not fit into memory, we can directly leverage ray dataset to do the processing. \n",


nit: sp famaliar

sfc-gh-dhung · 2025-02-10T18:50:15Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+    "### Merge image source and label source into a single dataset\n",
+    "We have two ways of achieving this: 1) if customer is more famaliar with `pandas.Dataframe` and if the data fit into memory, then we can convert all data into pandas (or write into snowflake) and do the rest of the ops. 2) If the data does not fit into memory, we can directly leverage ray dataset to do the processing. \n",
+    "\n",
+    "**Note**: Ray dataset is not naturally architeched to support join ops, so it's better for to use other method (in memory / snowflake) to perform joins"


nit: sp architeched
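For reference, the in-memory option described in that cell can be sketched with pandas as below. The column names, the sample rows, and the rule for deriving the join key from the file name are all assumptions for illustration, not taken from the notebook.

```python
import pandas as pd


def add_join_key(df: pd.DataFrame, path_col: str = "file_path") -> pd.DataFrame:
    """Derive a join key from the file name, dropping its directory and extension."""
    out = df.copy()
    file_name = out[path_col].str.rsplit("/", n=1).str[-1]
    out["join_key"] = file_name.str.rsplit(".", n=1).str[0]
    return out


# Tiny made-up inputs; the image bytes are placeholders.
images = pd.DataFrame({
    "file_path": ["images/cat_01.jpg", "images/dog_02.jpg"],
    "image": [b"\x00", b"\x01"],
})
labels = pd.DataFrame({
    "file_path": ["labels/dog_02.txt", "labels/cat_01.txt"],
    "label": ["dog", "cat"],
})

# An inner join keeps only images that have a matching label.
merged = add_join_key(images).merge(
    add_join_key(labels)[["join_key", "label"]], on="join_key", how="inner"
)
```

This only applies when both sides fit in memory; otherwise the join would be done in Snowflake, as the note in the cell suggests.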

sfc-gh-dhung · 2025-02-10T18:50:49Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+    "resultHeight": 46
+   },
+   "source": [
+    "## Save the Transformed Dataset to a snowflake table\n",


nit: capitalize Snowflake

sfc-gh-dhung · 2025-02-10T18:51:15Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+    "    database = \"ST_DB\",\n",
+    "    schema = \"ST_SCHEMA\",\n",


(just a reminder that db/schema don't exist for users)

sfc-gh-dhung · 2025-02-10T18:51:39Z

samples/ml/container_runtime/read_image_and_process_labels.ipynb

+   "source": [
+    "# sql cell\n",
+    "\n",
+    "# SELECT * FROM RAY_DEMO_JAN21_IMAGE_DS;"
+   ]


Convert to Snowpark Python call?
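A Snowpark version of that cell might look like the sketch below. The `preview_table` helper and the row limit are additions for illustration (the commented SQL has no `LIMIT`); `session.table`, `limit`, and `to_pandas` are Snowpark DataFrame calls.

```python
def preview_table(session, table_name: str = "RAY_DEMO_JAN21_IMAGE_DS", n: int = 10):
    """Snowpark equivalent of `SELECT * FROM <table> LIMIT n`, returned as pandas."""
    return session.table(table_name).limit(n).to_pandas()
```

Inside the notebook this would be called as `preview_table(session)` with the session obtained earlier.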

add read image and process labels notebook

6b29e14
