Update lesson 10 (APIs)

pletcher · pletcher · commit db358dabf350 · 2025-03-24T12:12:40.000-04:00
diff --git a/.gitignore b/.gitignore
@@ -1,3 +1,4 @@
 .ipynb_checkpoints
 _build
-.venv
+.venv
+.envrc
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
@@ -0,0 +1,6 @@
+repos:
+- repo: https://github.com/kynan/nbstripout
+  rev: 0.8.1
+  hooks:
+    - id: nbstripout
+      args: ['--extra-keys=metadata.celltoolbar cell.metadata.heading_collapsed']
diff --git a/10_apis.ipynb b/10_apis.ipynb
@@ -23,46 +23,122 @@
     "\n",
     "Often, however, when someone mentions an API, they are referring to a web-based API that is usually accessed over HTTP(S). You might have heard about the kerfuffle when Twitter shut down much of the access to its API, or when Reddit did the same thing a few years earlier. These APIs are servers that provide _interfaces_ (the \"I\" in \"API\") to a platform's data.\n",
     "\n",
-    "As you probably noticed while reading [Walker 2019](https://studentwork.prattsi.org/dh/2019/05/13/getting-data-for-digital-humanities-with-apis/), it is not exactly uncommon for references to APIs to become out of date.\n",
+    "As you probably noticed while reading @Walker2019, it is not exactly uncommon for references to APIs to become out of date.\n",
     "\n",
     "Luckily, we can still use the API provided by the [Digital Public Library of America](https://dp.la) for our work for this class.\n",
     "\n",
     "We'll be working with the Python [Requests](https://docs.python-requests.org/en/latest/) library, which provides its own easy-to-use API for making HTTP requests. In other words, it's APIs all the way down.\n",
     "\n",
-    "## Getting an access token"
+    "## Getting an access token\n",
+    "\n",
+    "Generally, APIs will ask that you first obtain a key to use them. Even if APIs offer unlimited requests, it is important for them to require users to supply an API key so that they can track (often anonymized) usage statistics, errors, and so on.\n",
+    "\n",
+    "Sometimes, APIs require you to pay, either immediately or after making a certain number of requests. Keys can be used to track usage for payment calculations, too. For an example of this system, see OpenAI's [pricing page](https://openai.com/api/pricing/).\n",
+    "\n",
+    "### An API Key for DPLA\n",
+    "\n",
+    "For this tutorial, we'll work with the Digital Public Library of America's (DPLA) API. Take a few minutes to read through their [API Basics](https://pro.dp.la/developers/api-basics), then request an API key.\n",
+    "\n",
+    ":::{note} Request types\n",
+    "\n",
+    "You'll notice that you must submit a `POST` request to receive an API key. `POST` is one of several HTTP verbs. When you enter a URL into a web browser and hit \"Enter,\" you're typically issuing a `GET` request: `GET` requests do not have a request body; they simply ask for the information at the provided URL, perhaps with some query parameters (the `key=value` pairs after a `?` in the URL).\n",
+    "\n",
+    "`POST` requests, by contrast, _may_ contain a request body. You've probably submitted `POST` requests without knowing it whenever you've signed up for a new service. That's essentially what we're doing with DPLA here; we're just doing it in code instead of through an interface that DPLA has built.\n",
+    "\n",
+    "The DPLA [documentation](https://pro.dp.la/developers/policies#get-a-key) instructs you to submit a request using `curl`, but we don't have access to `curl` from this notebook. Instead, let's make the request using the Python \"Requests\" library."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "vscode": {
-     "languageId": "plaintext"
-    }
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
-    "# Getting an access token"
+    "%pip install requests"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import requests\n",
+    "\n",
+    "my_email = \"YOUR EMAIL HERE\"\n",
+    "\n",
+    "requests.post(f\"https://api.dp.la/v2/api_key/{my_email}\")"
    ]
   },
   {
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "## Making your first request"
+    "After running the above code cell, you should receive an email with your API key. It's good practice not to share API keys or include them in version control (i.e., git).\n",
+    "\n",
+    "Instead, create an account-specific [secret](https://docs.github.com/en/codespaces/managing-your-codespaces/managing-your-account-specific-secrets-for-github-codespaces) by following the instructions provided by GitHub.\n",
+    "\n",
+    "Let's call the secret `DPLA_API_KEY`. (It's conventional to use all caps for environment variables and secrets.)\n",
+    "\n",
+    "Make sure to give your fork of this repository access to the secret, and then restart this codespace. We'll be here when you get back."
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Making your first request\n",
+    "\n",
+    "As we saw above, making requests using the `requests` library is pretty straightforward — for a `GET` request, we can just pass a URL to `requests.get()`.\n",
+    "\n",
+    "In order for the request to be successful, though, we'll need to include the API key in the `api_key` query string parameter. Codespaces secrets are exposed as environment variables, so to read the key we'll use Python's built-in `os` module."
    ]
   },
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "vscode": {
-     "languageId": "plaintext"
-    }
-   },
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "import os\n",
+    "import requests\n",
+    "\n",
+    "DPLA_API_KEY = os.getenv(\"DPLA_API_KEY\")"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "Let's use the example provided by the DPLA documentation, querying for the term \"weasels\"."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
    "outputs": [],
    "source": [
-    "# Making your first request"
+    "\n",
+    "requests.get(f\"https://api.dp.la/v2/items?q=weasels&api_key={DPLA_API_KEY}\")\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "`<Response [200]>` means that our request was successful, but it doesn't give us a whole lot of information. This is because we have not read the response body. To do so, let's assign the response — which is the return value of `requests.get()` — to a variable and read it as JSON."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "response = requests.get(f\"https://api.dp.la/v2/items?q=weasels&api_key={DPLA_API_KEY}\")\n",
+    "\n",
+    "response.json()"
    ]
   },
   {
@@ -75,11 +151,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "vscode": {
-     "languageId": "plaintext"
-    }
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "# Reading responses"
@@ -95,11 +167,7 @@
   {
    "cell_type": "code",
    "execution_count": null,
-   "metadata": {
-    "vscode": {
-     "languageId": "plaintext"
-    }
-   },
+   "metadata": {},
    "outputs": [],
    "source": [
     "# Constructing queries"
@@ -111,7 +179,8 @@
    "source": [
     "## Readings\n",
     "\n",
-    "- [Walker 2019](https://studentwork.prattsi.org/dh/2019/05/13/getting-data-for-digital-humanities-with-apis/): Getting Data for Digital Humanities with APIs: A Gentle Introduction\n",
+    "- @Walker2019\n",
+    "- @Matthes2023 [chs. 15–17]\n",
     "\n",
     "## Homework\n",
     "\n",
@@ -124,8 +193,22 @@
   }
  ],
  "metadata": {
+  "kernelspec": {
+   "display_name": ".venv",
+   "language": "python",
+   "name": "python3"
+  },
   "language_info": {
-   "name": "python"
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.12.7"
   }
  },
  "nbformat": 4,
diff --git a/bibliography.bib b/bibliography.bib
@@ -221,6 +221,17 @@ @article{Verhelst2023
   file = {/Users/pletcher/Zotero/storage/U2Z8SVZR/Verhelst - 2023 - Who is Speaking A Computational Analysis of Homeric Heroic Voices in the Homerocentones (First Rece.pdf}
 }
 
+@online{Walker2019,
+  title = {Getting {{Data}} for {{Digital Humanities}} with {{APIs}}: {{A Gentle Introduction}} – {{Digital Humanities}} @ {{Pratt School}} of {{Information}}},
+  shorttitle = {Getting {{Data}} for {{Digital Humanities}} with {{APIs}}},
+  author = {Walker},
+  date = {2019-05-13},
+  url = {https://studentwork.prattsi.org/dh/2019/05/13/getting-data-for-digital-humanities-with-apis/},
+  urldate = {2025-03-24},
+  langid = {american},
+  file = {/Users/pletcher/Zotero/storage/CND48CPP/getting-data-for-digital-humanities-with-apis.html}
+}
+
 @article{Wellmon2015,
   entrysubtype = {magazine},
   title = {Sacred {{Reading}}: {{From Augustine}} to the {{Digital Humanists}}},
diff --git a/requirements.txt b/requirements.txt
@@ -10,3 +10,6 @@ jupyterlab_myst
 
 # To tokenize texts
 nltk
+
+# To strip notebook output before committing
+pre-commit
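
Review note on `10_apis.ipynb`: the notebook interpolates the search term and key straight into the URL with an f-string, and `os.getenv()` returns `None` when `DPLA_API_KEY` is unset. A minimal sketch of one way to guard against both, using only the standard library, might look like the following. The helper name `build_items_url` is hypothetical and not part of the commit; the endpoint and the `q` and `api_key` parameters are taken from the notebook cells above.

```python
import os
from urllib.parse import urlencode

# Endpoint as used in the notebook cells in this diff.
DPLA_ITEMS_URL = "https://api.dp.la/v2/items"


def build_items_url(query, api_key):
    """Hypothetical helper: return the items URL with encoded query parameters."""
    if not api_key:
        # os.getenv() returns None when the variable is unset; fail early
        # rather than sending the literal text "None" as the key.
        raise ValueError("DPLA_API_KEY is not set")
    # urlencode() percent-encodes values, so spaces and symbols are safe.
    params = urlencode({"q": query, "api_key": api_key})
    return f"{DPLA_ITEMS_URL}?{params}"


if __name__ == "__main__":
    key = os.getenv("DPLA_API_KEY")
    if key:
        print(build_items_url("weasels", key))
    else:
        print("Set DPLA_API_KEY to build a request URL.")
```

In the notebook itself, passing a dictionary through the `params` argument of `requests.get()` should give the same encoding without building the string by hand.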
