From e4c47b12cca9a32dcb84da94dcca797a2c08f6db Mon Sep 17 00:00:00 2001 From: rzimmerdev Date: Sat, 7 May 2022 01:20:21 -0300 Subject: [PATCH 1/4] [ 01_huggingface-hub-tour.md ] - WIP - translation to Portuguese --- .../PT/01_huggingface-hub-tour.md | 18 +++++++++--------- 1 file changed, 9 insertions(+), 9 deletions(-) rename 01_huggingface-hub-tour.md => tutorials/PT/01_huggingface-hub-tour.md (97%) diff --git a/01_huggingface-hub-tour.md b/tutorials/PT/01_huggingface-hub-tour.md similarity index 97% rename from 01_huggingface-hub-tour.md rename to tutorials/PT/01_huggingface-hub-tour.md index 0b5429c..e1ed2ff 100644 --- a/01_huggingface-hub-tour.md +++ b/tutorials/PT/01_huggingface-hub-tour.md @@ -55,19 +55,19 @@ The interface has many components, so let’s go through them: - At the top, you can find different **tags** for things such as the task (*text generation, image classification*, etc.), frameworks (*PyTorch*, *TensorFlow*, etc.), the model’s language (*English*, *Arabic*, *etc.*), and license (*e.g. MIT*). -![](./images/mode_card_tags.png) +![](../../images/mode_card_tags.png) - At the right column, you can play with the model directly in the browser using the *Inference API*. GPT2 is a text generation model, so it will generate additional text given an initial input. Try typing something like, “It was a bright and sunny day.” -![](./images/model_card_inference_api.png) +![](../../images/model_card_inference_api.png) - In the middle, you can go through the model card content. It has sections such as Intended uses & limitations, Training procedure, and Citation Info. -![](./images/model_card_content.png) +![](../../images/model_card_content.png) Where does all this data come from? At Hugging Face, everything is based in **Git repositories** and is open-sourced. You can click the “Files and Versions” tab, which will allow you to see all the repository files, including the model weights. 
The model card is a markdown file **([README.md](http://README.md))** which on top of the content contains metadata such as the tags. -![](./images/model_card_git.png) +![](../../images/model_card_git.png) Since all models are Git-based repositories, you get version control out of the box. Just as with GitHub, you can do things such as Git cloning, adding, committing, branching, and pushing. If you’ve never used Git before, we suggest the following [resource](https://learngitbranching.js.org/). @@ -84,7 +84,7 @@ So far, we’ve explored a single model. Let’s go wild! At the left of [https: - **Libraries:** Although the Hub was originally for transformers models, the Hub has integration with dozens of libraries. You can find models of Keras, spaCy, allenNLP, and more. - **Datasets:** The Hub also hosts thousands of datasets, as you’ll find more about later. -![](./images/model_card_filters.png) +![](../../images/model_card_filters.png) - **Languages:** Many of the models on the Hub are NLP-related. You can find models for hundreds of languages, including low-resource languages. @@ -147,7 +147,7 @@ Let’s go through the steps: And we're done! You can check your repository with all the recently added files! -![](./images/model_card_updated_repo.png) +![](../../images/model_card_updated_repo.png) The UI allows you to explore the model files and commits and to see the diff introduced by each commit. @@ -182,15 +182,15 @@ Let’s explore the [GLUE](https://huggingface.co/datasets/glue) dataset, which - Similar to model repositories, you have a dataset card that documents the dataset. If you scroll down a bit, you will find things such as the summary, the structure, and more. -![](./images/datasets_card.png) +![](../../images/datasets_card.png) - At the top, you can explore a slice of the dataset directly in the browser. The GLUE dataset is divided into multiple sub-datasets (or subsets) that you can select, such as COLA and QNLI. 
- ![](./images/datasets_slices.png) + ![](../../images/datasets_slices.png) - At the right of the dataset card, you can see a list of models trained on this dataset. -![](./images/datasets_models_trained.png) +![](../../images/datasets_models_trained.png) **Challenge 6**. Search for the Common Voice dataset. Answer these questions: From d895ad6f99a2c956805401ce15be2bbc5cb05b1e Mon Sep 17 00:00:00 2001 From: rzimmerdev Date: Sat, 7 May 2022 16:04:23 -0300 Subject: [PATCH 2/4] [ 02_huggingface-hub-tour.md ] - Finished translation to Portuguese --- 02_ml-demos-with-gradio.ipynb | 525 -------------------- tutorials/PT/02_ml-demos-with-gradio.ipynb | 527 +++++++++++++++++++++ 2 files changed, 527 insertions(+), 525 deletions(-) delete mode 100644 02_ml-demos-with-gradio.ipynb create mode 100644 tutorials/PT/02_ml-demos-with-gradio.ipynb diff --git a/02_ml-demos-with-gradio.ipynb b/02_ml-demos-with-gradio.ipynb deleted file mode 100644 index 6a2316c..0000000 --- a/02_ml-demos-with-gradio.ipynb +++ /dev/null @@ -1,525 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": { - "id": "gh6QOr-qO4Ym" - }, - "source": [ - "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/education-toolkit/blob/main/02_ml-demos-with-gradio.ipynb)\n", - "\n", - "\n", - "\n", - "💡 **Welcome!**\n", - "\n", - "We’ve assembled a toolkit that university instructors and organizers can use to easily prepare labs, homework, or classes. The content is designed in a self-contained way such that it can easily be incorporated into the existing curriculum. 
This content is free and uses widely known Open Source technologies (`transformers`, `gradio`, etc).\n", - "\n", - "Alternatively, you can request for someone on the Hugging Face team to run the tutorials for your class via the [ML demo.cratization tour](https://huggingface2.notion.site/ML-Demo-cratization-tour-with-66847a294abd4e9785e85663f5239652) initiative!\n", - "\n", - "You can find all the tutorials and resources we’ve assembled [here](https://huggingface2.notion.site/Education-Toolkit-7b4a9a9d65ee4a6eb16178ec2a4f3599). " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "NkJmA-r5L0EB" - }, - "source": [ - "# Tutorial: Build and Host Machine Learning Demos with Gradio ⚡ & Hugging Face 🤗 " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "D_Iv1CJZPekG" - }, - "source": [ - "**Learning goals:** The goal of this tutorial is to learn How To\n", - "\n", - "1. Build a quick demo for your machine learning model in Python using the `gradio` library\n", - "2. Host the demos for free with Hugging Face Spaces\n", - "3. Add your demo to the Hugging Face org for your class or conference. This includes:\n", - " * A setup step for instructors (or conference organizers)\n", - " * Upload instructions for students (or conference participants)\n", - "\n", - "**Duration**: 20-40\n", - " minutes\n", - "\n", - "**Prerequisites:** Knowledge of Python and basic familiarity with machine learning \n", - "\n", - "\n", - "**Author**: [Abubakar Abid](https://twitter.com/abidlabs) (feel free to ping me with any questions about this tutorial) \n", - "\n", - "All of these steps can be done for free! All you need is an Internet browser and a place where you can write Python 👩‍💻" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "PR9faV2NWTrG" - }, - "source": [ - "## Why Demos?\n", - "\n", - "**Demos** of machine learning models are an increasingly important part of machine learning _courses_ and _conferences_. 
Demos allow:\n", - "\n", - "* model developers to easily **present** their work to a wide audience\n", - "* increase **reproducibility** of machine learning research\n", - "* diverse users to more easily **identify and debug** failure points of models\n", - "\n", - "\n", - "As a quick example of what we would like to build, check out the [Keras Org on Hugging Face](https://huggingface.co/keras-io), which includes a description card and a collection of Models and Spaces built by Keras community. Any Space can be opened in your browser and you can use the model immediately, as shown here: \n", - "\n", - "![](https://i.ibb.co/7y6DGjB/ezgif-5-cc52b7e590.gif)\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "g0KzbU4lQtv3" - }, - "source": [ - "## 1. Build Quick ML Demos in Python Using the Gradio Library" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "rlSs72oUQ1VW" - }, - "source": [ - "`gradio` is a handy Python library that lets you build web demos simply by specifying the list of input and output **components** expected by your machine learning model. \n", - "\n", - "What do I mean by input and output components? Gradio comes with a bunch of predefined components for different kinds of machine learning models. Here are some examples:\n", - "\n", - "* For an **image classifier**, the expected input type is an `Image` and the output type is a `Label`. \n", - "* For a **speech recognition model**, the expected input component is an `Microphone` (which lets users record from the browser) or `Audio` (which lets users drag-and-drop audio files), while the output type is `Text`. \n", - "* For a **question answering model**, we expect **2 inputs**: [`Text`, `Text`], one textbox for the paragraph and one for the question, and the output type is a single `Text` corresponding to the answer. \n", - "\n", - "You get the idea... 
(for all of the supported components, [see the docs](https://gradio.app/docs/))\n", - "\n", - "In addition to the input and output types, Gradio expects a third parameter, which is the prediction function itself. This parameter can be ***any* regular Python function** that takes in parameter(s) corresponding to the input component(s) and returns value(s) corresponding to the output component(s)\n", - "\n", - "Enough words. Let's see some code!" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/" - }, - "id": "p0MkPbbZbSiP", - "outputId": "e143c5df-5b98-46c6-f2f7-7fc7abebd3d7" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\u001b[K |████████████████████████████████| 871 kB 5.1 MB/s \n", - "\u001b[K |████████████████████████████████| 2.0 MB 41.5 MB/s \n", - "\u001b[K |████████████████████████████████| 52 kB 787 kB/s \n", - "\u001b[K |████████████████████████████████| 1.1 MB 25.8 MB/s \n", - "\u001b[K |████████████████████████████████| 52 kB 1.1 MB/s \n", - "\u001b[K |████████████████████████████████| 210 kB 56.5 MB/s \n", - "\u001b[K |████████████████████████████████| 94 kB 2.8 MB/s \n", - "\u001b[K |████████████████████████████████| 271 kB 58.7 MB/s \n", - "\u001b[K |████████████████████████████████| 144 kB 58.8 MB/s \n", - "\u001b[K |████████████████████████████████| 10.9 MB 44.8 MB/s \n", - "\u001b[K |████████████████████████████████| 58 kB 5.3 MB/s \n", - "\u001b[K |████████████████████████████████| 79 kB 6.6 MB/s \n", - "\u001b[K |████████████████████████████████| 856 kB 60.6 MB/s \n", - "\u001b[K |████████████████████████████████| 61 kB 374 kB/s \n", - "\u001b[K |████████████████████████████████| 3.6 MB 50.0 MB/s \n", - "\u001b[K |████████████████████████████████| 58 kB 4.5 MB/s \n", - "\u001b[?25h Building wheel for ffmpy (setup.py) ... \u001b[?25l\u001b[?25hdone\n", - " Building wheel for python-multipart (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n" - ] - } - ], - "source": [ - "# First, install Gradio\n", - "!pip install --quiet gradio" - ] - }, - { - "cell_type": "code", - "execution_count": 1, - "metadata": { - "id": "SjTxhry8bWS7" - }, - "outputs": [], - "source": [ - "import numpy as np\n", - "\n", - "def sepia(image):\n", - " sepia_filter = np.array(\n", - " [[0.393, 0.769, 0.189], \n", - " [0.349, 0.686, 0.168], \n", - " [0.272, 0.534, 0.131]]\n", - " )\n", - " sepia_img = image.dot(sepia_filter.T)\n", - " sepia_img /= sepia_img.max()\n", - " return sepia_img" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "OgqlIG2DbrJq" - }, - "outputs": [], - "source": [ - "import gradio as gr\n", - "\n", - "# Write 1 line of Python to create a simple GUI\n", - "gr.Interface(fn=sepia, inputs=\"image\", outputs=\"image\").launch(share=True);" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "0TyTGpSsb7bs" - }, - "source": [ - "Running the code above should produce a simple GUI inside this notebook allowing you to type example inputs and see the output returned by your function. \n", - "\n", - "Notice that we define an `Interface` using the 3 ingredients mentioned earlier:\n", - "* A function\n", - "* Input component(s)\n", - "* Output component(s)\n", - "\n", - "This is a simple example for images, but the same principle holds true for any other kind of data type. For example, here is an interface that generates a musical tone when provided a few different parameters (the specific code inside `generate_tone()` is not important for the purpose of this tutorial):" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 643 - }, - "id": "cHiZAO6ub6kA", - "outputId": "ee9e8bfd-4b86-4ddf-c96d-d389cdc0730e" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Colab notebook detected. 
To show errors in colab notebook, set `debug=True` in `launch()`\n", - "Running on public URL: https://20619.gradio.app\n", - "\n", - "This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)\n" - ] - }, - { - "data": { - "text/html": [ - "\n", - " \n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - }, - { - "data": { - "text/plain": [ - "(,\n", - " 'http://127.0.0.1:7860/',\n", - " 'https://20619.gradio.app')" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "import numpy as np\n", - "import gradio as gr\n", - "\n", - "def generate_tone(note, octave, duration):\n", - " sampling_rate = 48000\n", - " a4_freq, tones_from_a4 = 440, 12 * (octave - 4) + (note - 9)\n", - " frequency = a4_freq * 2 ** (tones_from_a4 / 12)\n", - " audio = np.linspace(0, int(duration), int(duration) * sampling_rate)\n", - " audio = (20000 * np.sin(audio * (2 * np.pi * frequency))).astype(np.int16)\n", - " return sampling_rate, audio\n", - "\n", - "gr.Interface(\n", - " generate_tone,\n", - " [\n", - " gr.inputs.Dropdown([\"C\", \"C#\", \"D\", \"D#\", \"E\", \"F\", \"F#\", \"G\", \"G#\", \"A\", \"A#\", \"B\"], type=\"index\"),\n", - " gr.inputs.Slider(4, 6, step=1),\n", - " gr.inputs.Textbox(type=\"number\", default=1, label=\"Duration in seconds\"),\n", - " ],\n", - " \"audio\",\n", - " title=\"Generate a Musical Tone!\"\n", - ").launch(share=True)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "23gD280-w-kT" - }, - "source": [ - "**Challenge #1**: build a Gradio demo that takes in an image and returns the same image *flipped upside down* in less than 10 lines of Python code." 
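As a quick sanity check on the pitch math inside `generate_tone` above, here is the same formula as a standalone sketch: each semitone step multiplies the frequency by 2^(1/12), so moving 12 semitones doubles it.

```python
# Same pitch formula as generate_tone: semitone distance from A4 (440 Hz) sets the frequency.
def note_frequency(note, octave):
    # note indexes C, C#, D, ..., B (0..11), matching the Dropdown with type="index"
    a4_freq, tones_from_a4 = 440, 12 * (octave - 4) + (note - 9)
    return a4_freq * 2 ** (tones_from_a4 / 12)

print(note_frequency(9, 4))   # A4 -> 440.0
print(note_frequency(9, 5))   # A5, one octave up -> 880.0
```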
- ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DSE6TZF5e9Oz" - }, - "source": [ - "There are a lot more examples you can try in Gradio's [getting started page](https://gradio.app/getting_started/), which cover additional features such as:\n", - "* Adding example inputs\n", - "* Adding _state_ (e.g. for chatbots)\n", - "* Sharing demos easily using one parameter called `share` (<-- this is pretty cool 😎)\n", - "\n", - "It is especially easy to demo a `transformers` model from Hugging Face's Model Hub, using the special `gr.Interface.load` method. \n", - "\n", - "Let's try a text-to-speech model built by Facebook:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [ - "import gradio as gr\n", - "\n", - "gr.Interface.load(\"huggingface/facebook/fastspeech2-en-ljspeech\").launch(share=True);" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Here is the code to build a demo for [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B), a large language model & add a couple of examples inputs:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "colab": { - "base_uri": "https://localhost:8080/", - "height": 608 - }, - "id": "N_Cobhx8e8v9", - "outputId": "2bac3837-feff-42ea-a577-60343f19535b" - }, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Fetching model from: https://huggingface.co/EleutherAI/gpt-j-6B\n", - "Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`\n", - "Running on public URL: https://30262.gradio.app\n", - "\n", - "This share link expires in 72 hours. 
For free permanent hosting, check out Spaces (https://huggingface.co/spaces)\n" - ] - }, - { - "data": { - "text/html": [ - "\n", - " \n", - " " - ], - "text/plain": [ - "" - ] - }, - "metadata": {}, - "output_type": "display_data" - } - ], - "source": [ - "import gradio as gr\n", - "\n", - "examples = [[\"The Moon's orbit around Earth has\"], [\"There once was a pineapple\"]]\n", - "\n", - "gr.Interface.load(\"huggingface/EleutherAI/gpt-j-6B\", examples=examples).launch(share=True);" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "EoUYf0rYksA9" - }, - "source": [ - "**Challenge #2**: Go to the [Hugging Face Model Hub](https://huggingface.co/models), and pick a model that performs one of the other tasks supported in the `transformers` library (other than the two you just saw: text generation or text-to-speech). Create a Gradio demo for that model using `gr.Interface.load`." - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "b6Ek7cORgDkQ" - }, - "source": [ - "## 2. Host the Demo (for free) on Hugging Face Spaces\n", - "\n", - "Once you made a Gradio demo, you can host it permanently on Hugging Spaces very easily:\n", - "\n", - "Here are the steps to that (shown in the GIF below):\n", - "\n", - "A. First, create a Hugging Face account if you do not already have one, by visiting https://huggingface.co/ and clicking \"Sign Up\"\n", - "\n", - "B. Once you are logged in, click on your profile picture and then click on \"New Space\" underneath it to get to this page: https://huggingface.co/new-space\n", - "\n", - "C. Give your Space a name and a license. Select \"Gradio\" as the Space SDK, and then choose \"Public\" if you are fine with everyone accessing your Space and the underlying code\n", - "\n", - "D. Then you will find a page that provides you instructions on how to upload your files into the Git repository for that Space. You may also need to add a `requirements.txt` file to specify any Python package dependencies.\n", - "\n", - "E. 
Once you have pushed your files, that's it! Spaces will automatically build your Gradio demo allowing you to share it with anyone, anywhere!\n", - "\n", - "![GIF](https://huggingface.co/blog/assets/28_gradio-spaces/spaces-demo-finalized.gif)\n", - "\n", - "\n", - "\n" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "d4XCmQ_RILoq" - }, - "source": [ - "You can even embed your Gradio demo on any website -- in a blog, a portfolio page, or even in a colab notebook, like I've done with a Pictionary sketch recognition model below:" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": { - "id": "IwNP5DJOKUql" - }, - "outputs": [], - "source": [ - "from IPython.display import IFrame\n", - "IFrame(src='https://hf.space/gradioiframe/abidlabs/Draw/+', width=1000, height=800)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "Dw6H-iQAlF8I" - }, - "source": [ - "**Challenge #3**: Upload your Gradio demo to Hugging Face Spaces and get a permanent URL for it. Share the permanent URL with someone (a colleague, a collaborator, a friend, a user, etc.) -- what kind of feedback do you get on your machine learning model?" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "MqD0O1PKIg3g" - }, - "source": [ - "## 3. Add your demo to the Hugging Face org for your class or conference" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "DrMObQbwLOHm" - }, - "source": [ - "#### **Setup** (for instructors or conference organizers)" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "_45C7MnXNbc0" - }, - "source": [ - "A. First, create a Hugging Face account if you do not already have one, by visiting https://huggingface.co/ and clicking \"Sign Up\"\n", - "\n", - "B. Once you are logged in, click on your profile picture and then click on \"New Organization\" underneath it to get to this page: https://huggingface.co/organizations/new\n", - "\n", - "C. 
Fill out the information for your class or conference. We recommend creating a separate organization each time that a class is taught (for example, \"Stanford-CS236g-20222\") and for each year of the conference.\n", - "\n", - "D. Your organization will be created and now now users will be able request adding themselves to your organizations by visiting the organization page.\n", - "\n", - "E. Optionally, you can change the settings by clicking on the \"Organization settings\" button. Typically, for classes and conferences, you will want to navigate to `Settings > Members` and set the \"Default role for new members\" to be \"write\", which allows them to submit Spaces but not change the settings. " - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "iSqzO-w8LY0R" - }, - "source": [ - "#### For students or conference participants" - ] - }, - { - "cell_type": "markdown", - "metadata": { - "id": "3x1Oyh4wOdOK" - }, - "source": [ - "A. Ask your instructor / coneference organizer for the link to the Organization page if you do not already have it\n", - "\n", - "B. Visit the Organization page and click \"Request to join this org\" button, if you are not yet part of the org.\n", - "\n", - "C. Then, once you have been approved to join the organization (and built your Gradio Demo and uploaded it to Spaces -- see Sections 1 and 2), then simply go to your Space and go to `Settings > Rename or transfer this space` and then select the organization name under `New owner`. Click the button and the Space will now be added to your class or conference Space! 
" - ] - } - ], - "metadata": { - "colab": { - "collapsed_sections": [], - "name": "Building and Hosting Machine Learning Demos with Gradio & Hugging Face", - "provenance": [] - }, - "kernelspec": { - "display_name": "Python 3", - "language": "python", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.9.5" - } - }, - "nbformat": 4, - "nbformat_minor": 1 -} diff --git a/tutorials/PT/02_ml-demos-with-gradio.ipynb b/tutorials/PT/02_ml-demos-with-gradio.ipynb new file mode 100644 index 0000000..c6b2a91 --- /dev/null +++ b/tutorials/PT/02_ml-demos-with-gradio.ipynb @@ -0,0 +1,527 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "gh6QOr-qO4Ym" + }, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/education-toolkit/blob/main/02_ml-demos-with-gradio.ipynb)\n", + "\n", + "\n", + "\n", + "💡 **Bem-vindo!**\n", + "\n", + "Nós reunimos um conjunto de ferramentas que instrutores universitários e organizadores podem usar para preparar laboratórios, tarefas ou aulas.\n", + "O conteudo foi projetado de uma forma autocontida, para ser facilmente incorporado no currículo existente. O conteúdo é gratuito e usa\n", + "tecnologias amplamente reconhecidas como Open Source (`transformers`, `gradio`, etc).\n", + "\n", + "Alternativamente, você pode pedir para que alguém no time da Hugging Face rodar os tutoriais para suas aulas via a iniciativa [Tour de demo.cratização de ML](https://huggingface2.notion.site/ML-Demo-cratization-tour-with-66847a294abd4e9785e85663f5239652)!\n", + "\n", + "Você também pode encontrar todos os tutoriais e recursos que nós montamos [aqui](https://huggingface2.notion.site/Education-Toolkit-7b4a9a9d65ee4a6eb16178ec2a4f3599)." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NkJmA-r5L0EB" + }, + "source": [ + "# Tutorial: Construa e Hospede uma demonstração de Aprendizado de Máquina com o Gradio ⚡ & Hugging Face 🤗" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D_Iv1CJZPekG" + }, + "source": [ + "**Objetivos de aprendizado:** Os objetivos deste tutorial são aprender como\n", + "\n", + "1. Construir uma demonstração rápida para o seu modelo de aprendizado de máquina em Python usando a biblioteca `gradio`\n", + "2. Hospedar a demo de graça com o Hugging Face Spaces\n", + "3. Adicionar sua demo à Hugging Face org para sua aula ou conferência. Isso incluirá:\n", + " * Uma configuração passo a passo para instrutores (ou organizadores de conferências)\n", + " * Envio de instruções para estudantes (ou participantes de conferências)\n", + "\n", + "**Duração**: 20 a 40\n", + " minutos\n", + "\n", + "**Pré-requisitos:** Conhecimento de Python e familiaridade básica com aprendizado de máquina\n", + "\n", + "**Autor**: [Abubakar Abid](https://twitter.com/abidlabs) (feel free to ping me with any questions about this tutorial)\n", + "**Tradutor**: [Rafael Zimmer](https://github.com/rzimmerdev)\n", + "\n", + "Todas as etapas podem ser realizadas de graça! Tudo que você irá precisar é uma conexão à internet e um lugar onde possa programar em Python 👩‍💻" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PR9faV2NWTrG" + }, + "source": [ + "## Porque ?\n", + "\n", + "**Demos** de modelos de aprendizado de máquina são uma parte importante de _cursos_ e _conferências_ sobre aprendizado de máquina. 
Demonstrações permitem:\n", + "\n", + "* que desenvolvedores de modelos **apresentem** os seus trabalhos a uma ampla audiência\n", + "* um aumento na **reprodutividade** da pesquisa sobre aprendizado de máquina\n", + "* que usuários *identifiquem e debugem* pontos de falhas de modelos mais facilmente\n", + "\n", + "\n", + "Para um exemplo rápido sobre o que gostariamos de montar, confira o [Keras Org na Hugging Face](https://huggingface.co/keras-io), que inclui uma descrição\n", + "e uma coleção de Modelos e Espaços construídos pela comunidade do Keras. Qualquer Espaço pode ser aberto em seu navegador, e você poderá usar o modelo imediatamente, como mostrado a seguir:\n", + "\n", + "![](https://i.ibb.co/7y6DGjB/ezgif-5-cc52b7e590.gif)\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g0KzbU4lQtv3" + }, + "source": [ + "## 1. Montando demonstrações rápidas em Python de Aprendizado de Máquina usando o Gradio" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rlSs72oUQ1VW" + }, + "source": [ + "`gradio` é uma biblioteca de Python extremamente útil que permite a construção de demonstrações online simplesmente especificando uma lista de componentes de entrada e saída esperados pelo seu modelo de aprendizado de máquina.\n", + "\n", + "O que podem então ser considerados como componentes de entrada e saída? O Gradio vem com um conjunto de componentes pré-definidos para diversos tipos de modelos de aprendizado de máquina. 
A seguir temos alguns exemplos:\n", + "\n", + "* Para um **classificador de imagem**, a entrada esperada é do tipo `Imagem` e a saída do tipo `Label`.\n", + "* Para um **modelo reconhecedor de fala**, o componente de entrada é do tipo `Microphone` (que permite aos usuários gravar áudio pelo navegador), ou áudio (que permite usuários puxar e soltar arquivos de áudio), enquanto a saída é do tipo `Text`.\n", + "* Para um **modelo de questões e respostas**, **2 entradas** são esperadas: [`Text`, `Text`], uma para a caixa de texto com um parágrafo, e outro para questão, e a saída é única, do tipo `Text` correspondendo à resposta.\n", + "\n", + "Você entendeu a idéia... (para todos os componentes aceitos, [acesse a documentação](https://gradio.app/docs/))\n", + "\n", + "Além da entrada e saída, o Gradio espera também um terceiro parâmetro, que é a predição do modelo em si. Esse parâmetro pode ser ***qualquer* função regular do Python** que receba parâmetro(s) correspondendo ao(s) componente(s) de entrada e que tenha como retorno valor(es), correspondendo ao(s) componente(s) de saída.\n", + "\n", + "Chega de discutir. Vamos ao programa!" 
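To make the two-input bullet above concrete, here is a minimal sketch (the `answer` function is a hypothetical stand-in, not a real question-answering model):

```python
# Hypothetical stand-in for a QA model: two text inputs in, one text output.
def answer(paragraph, question):
    # A real model would read the paragraph; this stub just echoes the inputs back.
    return f"(resposta para {question!r} usando {len(paragraph)} caracteres de contexto)"

# The input component list mirrors the function signature (assuming gradio is installed):
# import gradio as gr
# gr.Interface(fn=answer, inputs=["text", "text"], outputs="text").launch()
```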
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "p0MkPbbZbSiP", + "outputId": "e143c5df-5b98-46c6-f2f7-7fc7abebd3d7" + }, + "outputs": [], + "source": [ + "# Primeiro, installe o gradio\n", + "!pip install --quiet gradio" + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": { + "id": "SjTxhry8bWS7" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def sepia(image):\n", + " sepia_filter = np.array(\n", + " [[0.393, 0.769, 0.189], \n", + " [0.349, 0.686, 0.168], \n", + " [0.272, 0.534, 0.131]]\n", + " )\n", + " sepia_img = image.dot(sepia_filter.T)\n", + " sepia_img /= sepia_img.max()\n", + " return sepia_img" + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": { + "id": "OgqlIG2DbrJq" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Running on local URL: http://127.0.0.1:7860/\n", + "Running on public URL: https://10801.gradio.app\n", + "\n", + "This share link expires in 72 hours. 
For free permanent hosting, check out Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "data": { + "text/plain": "", + "text/html": "\n \n " + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import gradio as gr\n", + "\n", + "# Escreva uma simples linha para criar uma Interface Gráfica\n", + "gr.Interface(fn=sepia, inputs=\"image\", outputs=\"image\").launch(share=True);" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0TyTGpSsb7bs" + }, + "source": [ + "O código acima deverá produzir uma simples interface gráfica dentro do ‘notebook’, que lhe permitirá enviar uma entrada e ver a saída como retorno da sua função.\n", + "\n", + "Note também que definimos a `Interface` usando os três ingredientes mencionados anteriormente:\n", + "* Uma função\n", + "* Componente(s) de entrada\n", + "* Componente(s) de saída\n", + "\n", + "Fizemos um exemplo simples para imagens, mas a idéia fundamental vale para quaisquer outros tipos de dados. Por exemplo, abaixo há uma interface que irá\n", + "gerar um tom musical quando receber alguns parâmetros diferentes (o código específico dentro de `generate_tone()` não é importante para os propósitos deste tutorial):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 643 + }, + "id": "cHiZAO6ub6kA", + "outputId": "ee9e8bfd-4b86-4ddf-c96d-d389cdc0730e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`\n", + "Running on public URL: https://20619.gradio.app\n", + "\n", + "This share link expires in 72 hours. 
For free permanent hosting, check out Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "(,\n", + " 'http://127.0.0.1:7860/',\n", + " 'https://20619.gradio.app')" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "import gradio as gr\n", + "\n", + "def generate_tone(note, octave, duration):\n", + " sampling_rate = 48000\n", + " a4_freq, tones_from_a4 = 440, 12 * (octave - 4) + (note - 9)\n", + " frequency = a4_freq * 2 ** (tones_from_a4 / 12)\n", + " audio = np.linspace(0, int(duration), int(duration) * sampling_rate)\n", + " audio = (20000 * np.sin(audio * (2 * np.pi * frequency))).astype(np.int16)\n", + " return sampling_rate, audio\n", + "\n", + "gr.Interface(\n", + " generate_tone,\n", + " [\n", + " gr.inputs.Dropdown([\"C\", \"C#\", \"D\", \"D#\", \"E\", \"F\", \"F#\", \"G\", \"G#\", \"A\", \"A#\", \"B\"], type=\"index\"),\n", + " gr.inputs.Slider(4, 6, step=1),\n", + " gr.inputs.Textbox(type=\"number\", default=1, label=\"Duration in seconds\"),\n", + " ],\n", + " \"audio\",\n", + " title=\"Generate a Musical Tone!\"\n", + ").launch(share=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "23gD280-w-kT" + }, + "source": [ + "**Desafio #1**: construa uma demonstração do Gradio que receba uma imagem e retorne a mesma image *virada de cabeça pra baixo* em menos de 10 linhas de código em Python." 
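One possible sketch for this challenge (an illustrative example, not the only solution): flip the array with `numpy` and wire it into the same `Interface` pattern used for `sepia` above.

```python
import numpy as np

def flip(image):
    # np.flipud reverses the row order of the array: the image comes out upside down
    return np.flipud(image)

# Hook it up exactly like the sepia demo (assuming gradio is installed):
# import gradio as gr
# gr.Interface(fn=flip, inputs="image", outputs="image").launch()
```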
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DSE6TZF5e9Oz" + }, + "source": [ + "Há diversos outros exemplos para você testar na [página de introdução](https://gradio.app/getting_started/) do Gradio, que cobre funcionalidades adicionais, como:\n", + "* Adicionar exemplos para entradas\n", + "* Adicionar _estados_ (para chatbots, por exemplo)\n", + "* Compartilhar demonstrações facilmente usando o parâmetro `share` (<-- bem interessante 😎)\n", + "\n", + "É especialmente fácil transformar um modelo `transformers` do Model Hub da Hugging Face em uma demo, usando o método especial `gr.Interface.load`.\n", + "\n", + "Testaremos um modelo de texto para fala, construído pelo Facebook:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr\n", + "\n", + "gr.Interface.load(\"huggingface/facebook/fastspeech2-en-ljspeech\").launch(share=True);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Aqui está o código para construir a demo do [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B), um grande modelo de linguagem, além de algumas entradas de exemplo:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 608 + }, + "id": "N_Cobhx8e8v9", + "outputId": "2bac3837-feff-42ea-a577-60343f19535b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Fetching model from: https://huggingface.co/EleutherAI/gpt-j-6B\n", + "Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`\n", + "Running on public URL: https://30262.gradio.app\n", + "\n", + "This share link expires in 72 hours.
For free permanent hosting, check out Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import gradio as gr\n", + "\n", + "examples = [[\"The Moon's orbit around Earth has\"], [\"There once was a pineapple\"]]\n", + "\n", + "gr.Interface.load(\"huggingface/EleutherAI/gpt-j-6B\", examples=examples).launch(share=True);" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EoUYf0rYksA9" + }, + "source": [ + "**Desafio #2**: vá para o [Model Hub da Hugging Face](https://huggingface.co/models) e escolha um modelo que realize alguma das outras tarefas aceitas pela biblioteca `transformers` (diferente das duas que você acabou de ver: geração de texto e texto para fala). Crie uma demonstração do Gradio para o modelo usando o `gr.Interface.load`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b6Ek7cORgDkQ" + }, + "source": [ + "## 2. Hospede a Demo (de graça) no Hugging Face Spaces\n", + "\n", + "Quando tiver feito a demonstração com o Gradio, poderá hospedá-la permanentemente no Hugging Face Spaces com facilidade:\n", + "\n", + "A seguir estão os passos (demonstrados no GIF abaixo):\n", + "\n", + "A. Primeiro, crie uma conta na Hugging Face se já não tiver uma; visite https://huggingface.co/ e clique em \"Sign Up\"\n", + "\n", + "B. Após entrar, clique na sua foto de perfil e depois clique em \"Novo Espaço\" logo abaixo para chegar a essa página: https://huggingface.co/new-space\n", + "\n", + "C. Dê um nome e uma licença ao seu Space. Selecione \"Gradio\" como o SDK do Space e selecione \"Público\" se quiser dar a todos acesso ao seu Space e ao código dentro dele\n", + "\n", + "D. Em seguida, você encontrará uma página com instruções de como carregar arquivos do seu repositório Git para o Space.
Será necessário\n", + "adicionar um arquivo `requirements.txt` para especificar quaisquer dependências de pacotes do Python.\n", + "\n", + "E. Após enviar seus arquivos, sente-se e relaxe! Os Spaces irão automaticamente construir a sua demonstração do Gradio, permitindo que você a compartilhe com qualquer um, em qualquer lugar!\n", + "\n", + "![GIF](https://huggingface.co/blog/assets/28_gradio-spaces/spaces-demo-finalized.gif)\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d4XCmQ_RILoq" + }, + "source": [ + "Você pode até embutir a sua demonstração do Gradio em qualquer página — seja num blog, em um portfólio online, ou até mesmo em um Notebook do Colab, como foi feito a seguir com um modelo de reconhecimento de desenhos no estilo Pictionary:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IwNP5DJOKUql" + }, + "outputs": [], + "source": [ + "from IPython.display import IFrame\n", + "IFrame(src='https://hf.space/gradioiframe/abidlabs/Draw/+', width=1000, height=800)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dw6H-iQAlF8I" + }, + "source": [ + "**Desafio #3**: Carregue a sua demonstração do Gradio para o Hugging Face Spaces e receba um link permanente para ela. Compartilhe o link permanente com alguém (um colega, colaborador, amigo, usuário, etc.) - e receba opiniões sobre o seu modelo de aprendizado de máquina." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MqD0O1PKIg3g" + }, + "source": [ + "## 3. Adicione sua demo à organização da sua aula ou conferência no Hugging Face" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DrMObQbwLOHm" + }, + "source": [ + "#### **Setup** (para instrutores ou organizadores de conferências)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_45C7MnXNbc0" + }, + "source": [ + "A.
Primeiro, crie uma conta na Hugging Face se já não tiver uma; visite https://huggingface.co/ e clique em \"Sign Up\"\n", + "\n", + "B. Após entrar, clique na sua foto de perfil e depois clique em \"Nova Organização\" logo abaixo para acessar essa página: https://huggingface.co/organizations/new\n", + "\n", + "C. Preencha as informações relativas à sua aula ou conferência. Recomendamos criar uma organização separada toda vez que um curso diferente for dado (por exemplo, \"Stanford-CS236g-2022\") ou para cada ano em que houver a conferência.\n", + "\n", + "D. Sua organização será criada, e agora novos usuários poderão enviar pedidos de inscrição ao visitar a página da sua organização.\n", + "\n", + "E. Opcionalmente, você também pode mudar as configurações ao clicar em \"Organization settings\". Tipicamente, para aulas e conferências, deverá ir em `Settings > Members` e mudar a \"Default role for new members\" (cargo padrão para novos membros) para \"write\" (escrita), o que permitirá que novos membros enviem Spaces, mas não mudem as configurações da organização." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iSqzO-w8LY0R" + }, + "source": [ + "#### Para estudantes ou participantes de conferências" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3x1Oyh4wOdOK", + "pycharm": { + "name": "#%% md\n" + } + }, + "source": [ + "A. Peça ao seu instrutor ou organizador da conferência o link da página da Organização, se já não o tiver.\n", + "\n", + "B. Visite a página da Organização e clique em \"Request to join this org\" (pedir para se juntar a essa organização), se já não fizer parte dela.\n", + "\n", + "C. Finalmente, após ter sido aprovado para entrar (e já ter construído sua demonstração do Gradio e enviado para o Spaces - retorne às Seções 1 e 2 para ver como), simplesmente vá ao seu Space e acesse `Settings > Rename or transfer this space` (renomear ou transferir este Space) e selecione a organização desejada no menu `New owner`.
Clique no botão para confirmar e o seu Space será adicionado à organização do seu curso ou conferência." + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "Building and Hosting Machine Learning Demos with Gradio & Hugging Face", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.5" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} \ No newline at end of file From c160b89237cf419d60244faac3cbd037f3c1fd9d Mon Sep 17 00:00:00 2001 From: rzimmerdev Date: Sat, 7 May 2022 16:11:52 -0300 Subject: [PATCH 3/4] [ Undo Move ] - Accidentally moved original files instead of copying them. --- 01_huggingface-hub-tour.md | 221 +++++++++++++++++++++++++++++++++++++ 1 file changed, 221 insertions(+) create mode 100644 01_huggingface-hub-tour.md diff --git a/01_huggingface-hub-tour.md b/01_huggingface-hub-tour.md new file mode 100644 index 0000000..e1ed2ff --- /dev/null +++ b/01_huggingface-hub-tour.md @@ -0,0 +1,221 @@ +# Workshop: A Tour through the Hugging Face Hub + + + +**Duration:** 20 to 40 minutes + +**Goal:** Learn how to efficiently use the free [Hub platform](http://hf.co) to be able to collaborate in the ecosystem and within teams in Machine Learning (ML) projects. + +Learning goals: + +- Learn about and explore the over 30,000 models shared on the Hub. +- Learn efficient ways to find suitable models and datasets for your task. +- Learn how to contribute and work collaboratively. +- Explore ML demos created by the community. + +**Format:** Either short lab or take-home + +**Audience:** Students from any background interested in using existing models or sharing their models.
+ +**Prerequisites** + +- High-level understanding of Machine Learning. +- (Optional, but encouraged) Experience with Git ([resource](https://learngitbranching.js.org/)) + +## **Why the Hub?** + +The Hub is a central platform where anyone can share and explore models, datasets, and ML demos. AI will not be "solved" by a single company, but by a culture of sharing knowledge and resources. Because of this, the Hub aims to build the most extensive collection of Open Source models, datasets, and demos. + +Here are some facts about the Hugging Face Hub: + +- There are over 30,000 public models. +- There are models for Natural Language Processing, Computer Vision, Audio/Speech, and Reinforcement Learning! +- There are models for over 180 languages. +- Any ML library can leverage the Hub: from TensorFlow and PyTorch to advanced integrations with spaCy, SpeechBrain, and 20 other libraries. + +## Exploring a model + +Let’s kick off the exploration of models. You can access over 30,000 models at [hf.co/models](http://hf.co/models). You will see [gpt2](https://huggingface.co/gpt2) as one of the models with the most downloads. Let’s click on it. + +When you click a model, the website takes you to its model card. A model card documents a model, providing helpful information about it, and is essential for discoverability and reproducibility. + +The interface has many components, so let’s go through them: + +[https://www.youtube.com/watch?v=XvSGPZFEjDY&feature=emb_imp_woyt](https://www.youtube.com/watch?v=XvSGPZFEjDY&feature=emb_imp_woyt) + +- At the top, you can find different **tags** for things such as the task (*text generation, image classification*, etc.), frameworks (*PyTorch*, *TensorFlow*, etc.), the model’s language (*English*, *Arabic*, *etc.*), and license (*e.g. MIT*). + +![](../../images/mode_card_tags.png) + +- In the right column, you can play with the model directly in the browser using the *Inference API*.
GPT2 is a text generation model, so it will generate additional text given an initial input. Try typing something like, “It was a bright and sunny day.” + +![](../../images/model_card_inference_api.png) + +- In the middle, you can go through the model card content. It has sections such as Intended uses & limitations, Training procedure, and Citation Info. + +![](../../images/model_card_content.png) + +Where does all this data come from? At Hugging Face, everything is based on **Git repositories** and is open-sourced. You can click the “Files and Versions” tab, which will allow you to see all the repository files, including the model weights. The model card is a markdown file **([README.md](http://README.md))** which, in addition to the content, contains metadata such as the tags. + +![](../../images/model_card_git.png) + +Since all models are Git-based repositories, you get version control out of the box. Just as with GitHub, you can do things such as Git cloning, adding, committing, branching, and pushing. If you’ve never used Git before, we suggest the following [resource](https://learngitbranching.js.org/). + +**Challenge 1**. Open the `config.json` file of the GPT2 repository. The config file contains hyperparameters as well as useful information for loading the model. From this file, answer: + +- What is the activation function? +- What is the vocabulary size? + +## **Exploring Models** + +So far, we’ve explored a single model. Let’s go wild! At the left of [https://huggingface.co/models](https://huggingface.co/models), you can filter for different things: + +- **Tasks:** There is support for dozens of tasks in different domains: Computer Vision, Natural Language Processing, Audio, and more. You can click the +13 to see all available tasks. +- **Libraries:** Although the Hub was originally for transformers models, the Hub has integration with dozens of libraries. You can find models for Keras, spaCy, AllenNLP, and more.
+- **Datasets:** The Hub also hosts thousands of datasets, as you’ll learn more about later. + +![](../../images/model_card_filters.png) + +- **Languages:** Many of the models on the Hub are NLP-related. You can find models for hundreds of languages, including low-resource languages. + +**Challenge 2**. How many token classification models are there in English? + +**Challenge 3**. If you had to pick a Spanish model for Automatic Speech Recognition, which would you choose? (It can be any model for this task and language) + +## Adding a model + +Let’s say you want to upload a model to the Hub. This model could come from any ML library: Scikit-learn, Keras, Transformers, etc. + +Let’s go through the steps: + +1. Go to [huggingface.co/new](http://huggingface.co/new) to create a new model repository. The repositories you make can be either public or private. +2. You start with a public repo that has a model card. You can upload your model either by using the Web UI or with Git. If you’ve never used Git before, we suggest just using the Web interface. You can click Add File and drag and drop the files you want to add. If you want to understand the complete workflow, let’s go with the Git approach. + + 1. Install both git and git-lfs on your system. + 1. Git: [https://git-scm.com/book/en/v2/Getting-Started-Installing-Git](https://git-scm.com/book/en/v2/Getting-Started-Installing-Git) + 2. Git-lfs: [https://git-lfs.github.com/](https://git-lfs.github.com/). Large files need to be uploaded with Git LFS. Git does not work well once your files are above a few megabytes, which is frequent in ML. ML models can be up to gigabytes or terabytes! 🤯 + 2. Clone the repository you just created + + ```bash + git clone https://huggingface.co// + ``` + + 3. Go to the directory and initialize Git LFS + 1. Optional.
We already provide a list of common file extensions for the large files in `.gitattributes`. If the files you want to upload are not included in the `.gitattributes` file, you can track them with: + + ```bash + git lfs track "*.your_extension" + ``` + + Then initialize Git LFS: + + ```bash + git lfs install + ``` + + 4. Add your files to the repository. The files depend on the framework/libraries you’re using. Overall, what is important is that you provide all artifacts required to load the model. For example: + 1. For TensorFlow, you might want to upload a SavedModel or `h5` file. + 2. For PyTorch, usually, it’s a `pytorch_model.bin`. + 3. For Scikit-Learn, it’s usually a `joblib` file. + + Here is an example in Python saving a Scikit-Learn model file. + + ```python + from sklearn import linear_model + reg = linear_model.LinearRegression() + reg.fit([[0, 0], [1, 1], [2, 2]], [0, 1, 2]) + + from joblib import dump, load + dump(reg, 'model.joblib') + ``` + + 5. Commit and push your files (make sure the saved file is within the repository) + + ```bash + git add . + git commit -m "First model version" + git push + ``` + +And we're done! You can check your repository with all the recently added files! + +![](../../images/model_card_updated_repo.png) + +The UI allows you to explore the model files and commits and to see the diff introduced by each commit. + +**Challenge 4**. It’s your turn! Upload a dummy model with the library of your choice. + +Now that the model is in the Hub, others can find it! You can also collaborate with others easily by creating an organization. Hosting through the Hub allows a team to update repositories and do things you might be used to, such as working in branches and working collaboratively. The Hub also enables versioning in your models: if a model checkpoint is suddenly broken, you can always head back to a previous version. + +At the top of the `README`, you can find some metadata.
You will only find the license right now, but you can add more things. Let’s try some of it: + +```yaml +tags: +- es # This will automatically be detected as a language tag. +- bert # You can have additional tags for filtering +- text-classification # This will automatically be detected as a task tag. +datasets: +- llamas # This will link to a dataset on the Hub if it exists. +``` + +**Challenge 5**. Using the [documentation](https://huggingface.co/docs/hub/model-repos#how-are-model-tags-determined), change the default example in the widget. + +The metadata allows people to discover your model quickly. Your model will now show up when you search for text classification models in Spanish. The model will also show up when looking at the dataset. + +Wait...datasets? + +## Datasets + +With ML pipelines, you usually have a dataset to train the model. The Hub hosts around 3,000 datasets that are open-sourced and free to use in multiple domains. On top of that, the open-source `datasets` [library](https://huggingface.co/docs/datasets/) allows the easy use of these datasets, including huge ones, through very convenient features such as streaming. This lab won't go through the library itself, but it will explain how to explore the datasets on the Hub. + +Similar to models, you can head to [https://hf.co/datasets](https://hf.co/datasets). At the left, you can find different filters based on the task, license, and size of the dataset. + +Let’s explore the [GLUE](https://huggingface.co/datasets/glue) dataset, which is a famous dataset used to test the performance of NLP models. + +- Similar to model repositories, you have a dataset card that documents the dataset. If you scroll down a bit, you will find things such as the summary, the structure, and more. + +![](../../images/datasets_card.png) + +- At the top, you can explore a slice of the dataset directly in the browser. The GLUE dataset is divided into multiple sub-datasets (or subsets) that you can select, such as COLA and QNLI.
+ + ![](../../images/datasets_slices.png) + +- At the right of the dataset card, you can see a list of models trained on this dataset. + +![](../../images/datasets_models_trained.png) + +**Challenge 6**. Search for the Common Voice dataset. Answer these questions: + +- What tasks can the Common Voice dataset be used for? +- How many languages are covered in this dataset? +- What are the dataset splits? + +## ML Demos + +Sharing your models and datasets is great, but creating an interactive, publicly available demo is even cooler. Demos of models are an increasingly important part of the ecosystem. Demos allow: + +- model developers to easily **present** their work to a wide audience, such as in stakeholder presentations, conferences, and course projects +- practitioners to increase **reproducibility** in machine learning by lowering the barrier to test a model +- non-technical audiences to understand **the impact of a model** +- developers to build a machine learning **portfolio** + +There are Open-Source Python frameworks such as Gradio and Streamlit that allow building these demos very easily, and tools such as Hugging Face [Spaces](http://hf.co/spaces/launch) which allow you to host and share them. As a follow-up lab, we recommend doing the **Build and Host Machine Learning Demos with Gradio & Hugging Face** tutorial. + +> In this tutorial, you get to: +> +> - Explore ML demos created by the community.
+> - Build a quick demo for your machine learning model in Python using the `gradio` library +> - Host the demos for free with Hugging Face Spaces +> - Add your demo to the Hugging Face org for your class or conference +> +> ***Duration: 20-40 minutes*** +> +> 👉 [click here to access the tutorial](https://colab.research.google.com/github/huggingface/education-toolkit/blob/main/02_ml-demos-with-gradio.ipynb) From 2fa41565832fac6f7b43d00a52d1b9abcfffca42 Mon Sep 17 00:00:00 2001 From: rzimmerdev Date: Sat, 7 May 2022 16:13:11 -0300 Subject: [PATCH 4/4] [ Undo Move ] - Copied 02_ml-demos-with-gradio.ipynb back --- 02_ml-demos-with-gradio.ipynb | 525 ++++++++++++++++++++++++++++++++++ 1 file changed, 525 insertions(+) create mode 100644 02_ml-demos-with-gradio.ipynb diff --git a/02_ml-demos-with-gradio.ipynb b/02_ml-demos-with-gradio.ipynb new file mode 100644 index 0000000..aae2364 --- /dev/null +++ b/02_ml-demos-with-gradio.ipynb @@ -0,0 +1,525 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": { + "id": "gh6QOr-qO4Ym" + }, + "source": [ + "[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/huggingface/education-toolkit/blob/main/02_ml-demos-with-gradio.ipynb)\n", + "\n", + "\n", + "\n", + "💡 **Welcome!**\n", + "\n", + "We’ve assembled a toolkit that university instructors and organizers can use to easily prepare labs, homework, or classes. The content is designed in a self-contained way such that it can easily be incorporated into the existing curriculum.
This content is free and uses widely known Open Source technologies (`transformers`, `gradio`, etc).\n", + "\n", + "Alternatively, you can request for someone on the Hugging Face team to run the tutorials for your class via the [ML demo.cratization tour](https://huggingface2.notion.site/ML-Demo-cratization-tour-with-66847a294abd4e9785e85663f5239652) initiative!\n", + "\n", + "You can find all the tutorials and resources we’ve assembled [here](https://huggingface2.notion.site/Education-Toolkit-7b4a9a9d65ee4a6eb16178ec2a4f3599). " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "NkJmA-r5L0EB" + }, + "source": [ + "# Tutorial: Build and Host Machine Learning Demos with Gradio ⚡ & Hugging Face 🤗 " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "D_Iv1CJZPekG" + }, + "source": [ + "**Learning goals:** The goal of this tutorial is to learn How To\n", + "\n", + "1. Build a quick demo for your machine learning model in Python using the `gradio` library\n", + "2. Host the demos for free with Hugging Face Spaces\n", + "3. Add your demo to the Hugging Face org for your class or conference. This includes:\n", + " * A setup step for instructors (or conference organizers)\n", + " * Upload instructions for students (or conference participants)\n", + "\n", + "**Duration**: 20-40\n", + " minutes\n", + "\n", + "**Prerequisites:** Knowledge of Python and basic familiarity with machine learning \n", + "\n", + "\n", + "**Author**: [Abubakar Abid](https://twitter.com/abidlabs) (feel free to ping me with any questions about this tutorial) \n", + "\n", + "All of these steps can be done for free! All you need is an Internet browser and a place where you can write Python 👩‍💻" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "PR9faV2NWTrG" + }, + "source": [ + "## Why Demos?\n", + "\n", + "**Demos** of machine learning models are an increasingly important part of machine learning _courses_ and _conferences_. 
Demos allow:\n", + "\n", + "* model developers to easily **present** their work to a wide audience\n", + "* increase **reproducibility** of machine learning research\n", + "* diverse users to more easily **identify and debug** failure points of models\n", + "\n", + "\n", + "As a quick example of what we would like to build, check out the [Keras Org on Hugging Face](https://huggingface.co/keras-io), which includes a description card and a collection of Models and Spaces built by Keras community. Any Space can be opened in your browser and you can use the model immediately, as shown here: \n", + "\n", + "![](https://i.ibb.co/7y6DGjB/ezgif-5-cc52b7e590.gif)\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "g0KzbU4lQtv3" + }, + "source": [ + "## 1. Build Quick ML Demos in Python Using the Gradio Library" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "rlSs72oUQ1VW" + }, + "source": [ + "`gradio` is a handy Python library that lets you build web demos simply by specifying the list of input and output **components** expected by your machine learning model. \n", + "\n", + "What do I mean by input and output components? Gradio comes with a bunch of predefined components for different kinds of machine learning models. Here are some examples:\n", + "\n", + "* For an **image classifier**, the expected input type is an `Image` and the output type is a `Label`. \n", + "* For a **speech recognition model**, the expected input component is an `Microphone` (which lets users record from the browser) or `Audio` (which lets users drag-and-drop audio files), while the output type is `Text`. \n", + "* For a **question answering model**, we expect **2 inputs**: [`Text`, `Text`], one textbox for the paragraph and one for the question, and the output type is a single `Text` corresponding to the answer. \n", + "\n", + "You get the idea... 
(for all of the supported components, [see the docs](https://gradio.app/docs/))\n", + "\n", + "In addition to the input and output types, Gradio expects a third parameter, which is the prediction function itself. This parameter can be ***any* regular Python function** that takes in parameter(s) corresponding to the input component(s) and returns value(s) corresponding to the output component(s)\n", + "\n", + "Enough words. Let's see some code!" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/" + }, + "id": "p0MkPbbZbSiP", + "outputId": "e143c5df-5b98-46c6-f2f7-7fc7abebd3d7" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "\u001b[K |████████████████████████████████| 871 kB 5.1 MB/s \n", + "\u001b[K |████████████████████████████████| 2.0 MB 41.5 MB/s \n", + "\u001b[K |████████████████████████████████| 52 kB 787 kB/s \n", + "\u001b[K |████████████████████████████████| 1.1 MB 25.8 MB/s \n", + "\u001b[K |████████████████████████████████| 52 kB 1.1 MB/s \n", + "\u001b[K |████████████████████████████████| 210 kB 56.5 MB/s \n", + "\u001b[K |████████████████████████████████| 94 kB 2.8 MB/s \n", + "\u001b[K |████████████████████████████████| 271 kB 58.7 MB/s \n", + "\u001b[K |████████████████████████████████| 144 kB 58.8 MB/s \n", + "\u001b[K |████████████████████████████████| 10.9 MB 44.8 MB/s \n", + "\u001b[K |████████████████████████████████| 58 kB 5.3 MB/s \n", + "\u001b[K |████████████████████████████████| 79 kB 6.6 MB/s \n", + "\u001b[K |████████████████████████████████| 856 kB 60.6 MB/s \n", + "\u001b[K |████████████████████████████████| 61 kB 374 kB/s \n", + "\u001b[K |████████████████████████████████| 3.6 MB 50.0 MB/s \n", + "\u001b[K |████████████████████████████████| 58 kB 4.5 MB/s \n", + "\u001b[?25h Building wheel for ffmpy (setup.py) ... \u001b[?25l\u001b[?25hdone\n", + " Building wheel for python-multipart (setup.py) ... 
\u001b[?25l\u001b[?25hdone\n" + ] + } + ], + "source": [ + "# First, install Gradio\n", + "!pip install --quiet gradio" + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": { + "id": "SjTxhry8bWS7" + }, + "outputs": [], + "source": [ + "import numpy as np\n", + "\n", + "def sepia(image):\n", + " sepia_filter = np.array(\n", + " [[0.393, 0.769, 0.189], \n", + " [0.349, 0.686, 0.168], \n", + " [0.272, 0.534, 0.131]]\n", + " )\n", + " sepia_img = image.dot(sepia_filter.T)\n", + " sepia_img /= sepia_img.max()\n", + " return sepia_img" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "OgqlIG2DbrJq" + }, + "outputs": [], + "source": [ + "import gradio as gr\n", + "\n", + "# Write 1 line of Python to create a simple GUI\n", + "gr.Interface(fn=sepia, inputs=\"image\", outputs=\"image\").launch(share=True);" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "0TyTGpSsb7bs" + }, + "source": [ + "Running the code above should produce a simple GUI inside this notebook allowing you to type example inputs and see the output returned by your function. \n", + "\n", + "Notice that we define an `Interface` using the 3 ingredients mentioned earlier:\n", + "* A function\n", + "* Input component(s)\n", + "* Output component(s)\n", + "\n", + "This is a simple example for images, but the same principle holds true for any other kind of data type. For example, here is an interface that generates a musical tone when provided a few different parameters (the specific code inside `generate_tone()` is not important for the purpose of this tutorial):" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 643 + }, + "id": "cHiZAO6ub6kA", + "outputId": "ee9e8bfd-4b86-4ddf-c96d-d389cdc0730e" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Colab notebook detected. 
To show errors in colab notebook, set `debug=True` in `launch()`\n", + "Running on public URL: https://20619.gradio.app\n", + "\n", + "This share link expires in 72 hours. For free permanent hosting, check out Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "text/plain": [ + "(,\n", + " 'http://127.0.0.1:7860/',\n", + " 'https://20619.gradio.app')" + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import numpy as np\n", + "import gradio as gr\n", + "\n", + "def generate_tone(note, octave, duration):\n", + " sampling_rate = 48000\n", + " a4_freq, tones_from_a4 = 440, 12 * (octave - 4) + (note - 9)\n", + " frequency = a4_freq * 2 ** (tones_from_a4 / 12)\n", + " audio = np.linspace(0, int(duration), int(duration) * sampling_rate)\n", + " audio = (20000 * np.sin(audio * (2 * np.pi * frequency))).astype(np.int16)\n", + " return sampling_rate, audio\n", + "\n", + "gr.Interface(\n", + " generate_tone,\n", + " [\n", + " gr.inputs.Dropdown([\"C\", \"C#\", \"D\", \"D#\", \"E\", \"F\", \"F#\", \"G\", \"G#\", \"A\", \"A#\", \"B\"], type=\"index\"),\n", + " gr.inputs.Slider(4, 6, step=1),\n", + " gr.inputs.Textbox(type=\"number\", default=1, label=\"Duration in seconds\"),\n", + " ],\n", + " \"audio\",\n", + " title=\"Generate a Musical Tone!\"\n", + ").launch(share=True)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "23gD280-w-kT" + }, + "source": [ + "**Challenge #1**: build a Gradio demo that takes in an image and returns the same image *flipped upside down* in less than 10 lines of Python code." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DSE6TZF5e9Oz" + }, + "source": [ + "There are a lot more examples you can try in Gradio's [getting started page](https://gradio.app/getting_started/), which cover additional features such as:\n", + "* Adding example inputs\n", + "* Adding _state_ (e.g. for chatbots)\n", + "* Sharing demos easily using one parameter called `share` (<-- this is pretty cool 😎)\n", + "\n", + "It is especially easy to demo a `transformers` model from Hugging Face's Model Hub, using the special `gr.Interface.load` method. \n", + "\n", + "Let's try a text-to-speech model built by Facebook:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [ + "import gradio as gr\n", + "\n", + "gr.Interface.load(\"huggingface/facebook/fastspeech2-en-ljspeech\").launch(share=True);" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Here is the code to build a demo for [GPT-J](https://huggingface.co/EleutherAI/gpt-j-6B), a large language model & add a couple of examples inputs:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "colab": { + "base_uri": "https://localhost:8080/", + "height": 608 + }, + "id": "N_Cobhx8e8v9", + "outputId": "2bac3837-feff-42ea-a577-60343f19535b" + }, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Fetching model from: https://huggingface.co/EleutherAI/gpt-j-6B\n", + "Colab notebook detected. To show errors in colab notebook, set `debug=True` in `launch()`\n", + "Running on public URL: https://30262.gradio.app\n", + "\n", + "This share link expires in 72 hours. 
For free permanent hosting, check out Spaces (https://huggingface.co/spaces)\n" + ] + }, + { + "data": { + "text/html": [ + "\n", + " \n", + " " + ], + "text/plain": [ + "" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import gradio as gr\n", + "\n", + "examples = [[\"The Moon's orbit around Earth has\"], [\"There once was a pineapple\"]]\n", + "\n", + "gr.Interface.load(\"huggingface/EleutherAI/gpt-j-6B\", examples=examples).launch(share=True);" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "EoUYf0rYksA9" + }, + "source": [ + "**Challenge #2**: Go to the [Hugging Face Model Hub](https://huggingface.co/models), and pick a model that performs one of the other tasks supported in the `transformers` library (other than the two you just saw: text generation or text-to-speech). Create a Gradio demo for that model using `gr.Interface.load`." + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "b6Ek7cORgDkQ" + }, + "source": [ + "## 2. Host the Demo (for free) on Hugging Face Spaces\n", + "\n", + "Once you have made a Gradio demo, you can host it permanently on Hugging Face Spaces very easily.\n", + "\n", + "Here are the steps (shown in the GIF below):\n", + "\n", + "A. First, create a Hugging Face account if you do not already have one, by visiting https://huggingface.co/ and clicking \"Sign Up\"\n", + "\n", + "B. Once you are logged in, click on your profile picture and then click on \"New Space\" underneath it to get to this page: https://huggingface.co/new-space\n", + "\n", + "C. Give your Space a name and a license. Select \"Gradio\" as the Space SDK, and then choose \"Public\" if you are fine with everyone accessing your Space and the underlying code\n", + "\n", + "D. Then you will find a page that provides you with instructions on how to upload your files into the Git repository for that Space. You may also need to add a `requirements.txt` file to specify any Python package dependencies.\n", + "\n", + "E.
Once you have pushed your files, that's it! Spaces will automatically build your Gradio demo, allowing you to share it with anyone, anywhere!\n", + "\n", + "![GIF](https://huggingface.co/blog/assets/28_gradio-spaces/spaces-demo-finalized.gif)\n", + "\n", + "\n", + "\n" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "d4XCmQ_RILoq" + }, + "source": [ + "You can even embed your Gradio demo on any website -- in a blog, a portfolio page, or even in a Colab notebook, like I've done with a Pictionary sketch recognition model below:" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": { + "id": "IwNP5DJOKUql" + }, + "outputs": [], + "source": [ + "from IPython.display import IFrame\n", + "IFrame(src='https://hf.space/gradioiframe/abidlabs/Draw/+', width=1000, height=800)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "Dw6H-iQAlF8I" + }, + "source": [ + "**Challenge #3**: Upload your Gradio demo to Hugging Face Spaces and get a permanent URL for it. Share the permanent URL with someone (a colleague, a collaborator, a friend, a user, etc.) -- what kind of feedback do you get on your machine learning model?" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "MqD0O1PKIg3g" + }, + "source": [ + "## 3. Add your demo to the Hugging Face org for your class or conference" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "DrMObQbwLOHm" + }, + "source": [ + "#### **Setup** (for instructors or conference organizers)" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "_45C7MnXNbc0" + }, + "source": [ + "A. First, create a Hugging Face account if you do not already have one, by visiting https://huggingface.co/ and clicking \"Sign Up\"\n", + "\n", + "B. Once you are logged in, click on your profile picture and then click on \"New Organization\" underneath it to get to this page: https://huggingface.co/organizations/new\n", + "\n", + "C.
Fill out the information for your class or conference. We recommend creating a separate organization each time a class is taught (for example, \"Stanford-CS236g-2022\") and for each year of the conference.\n", + "\n", + "D. Your organization will be created, and users will now be able to request to join it by visiting the organization page.\n", + "\n", + "E. Optionally, you can change the settings by clicking on the \"Organization settings\" button. Typically, for classes and conferences, you will want to navigate to `Settings > Members` and set the \"Default role for new members\" to be \"write\", which allows them to submit Spaces but not change the settings. " + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "iSqzO-w8LY0R" + }, + "source": [ + "#### For students or conference participants" + ] + }, + { + "cell_type": "markdown", + "metadata": { + "id": "3x1Oyh4wOdOK" + }, + "source": [ + "A. Ask your instructor / conference organizer for the link to the Organization page if you do not already have it\n", + "\n", + "B. Visit the Organization page and click the \"Request to join this org\" button, if you are not yet part of the org.\n", + "\n", + "C. Then, once you have been approved to join the organization (and built your Gradio demo and uploaded it to Spaces -- see Sections 1 and 2), simply go to your Space, navigate to `Settings > Rename or transfer this space`, and select the organization name under `New owner`. Click the button and the Space will now be added to your class or conference organization!
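To recap step D from Section 2, the two files a Gradio Space repository typically needs can be created like this (a sketch only: the `greet` demo content is illustrative, and the commit/push commands are left as comments since the repository URL comes from your own Space page):

```python
from pathlib import Path

# app.py is the script Spaces runs to build your Gradio demo
# (the greet function here is just a placeholder example)
APP_PY = '''import gradio as gr

def greet(name):
    return "Hello " + name + "!"

gr.Interface(fn=greet, inputs="text", outputs="text").launch()
'''

# requirements.txt lists the Python packages Spaces installs before running app.py
REQUIREMENTS = "gradio\n"

Path("app.py").write_text(APP_PY)
Path("requirements.txt").write_text(REQUIREMENTS)

# Then, inside your cloned Space repo:
#   git add app.py requirements.txt
#   git commit -m "Add Gradio demo"
#   git push
```

After the push, Spaces builds and serves the demo automatically, as described in Section 2.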
" + ] + } + ], + "metadata": { + "colab": { + "collapsed_sections": [], + "name": "Building and Hosting Machine Learning Demos with Gradio & Hugging Face", + "provenance": [] + }, + "kernelspec": { + "display_name": "Python 3", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.5" + } + }, + "nbformat": 4, + "nbformat_minor": 1 +} \ No newline at end of file