|
6 | 6 | "source": [ |
7 | 7 | "# Data Exploration Example in Python\n", |
8 | 8 | "\n", |
9 | | - "**Author**: Shania Braithwaite and PatentsView Team\n", |
10 | | - "\n" |
| 9 | + "- **Author**: Shania Braithwaite and PatentsView Team\n", |
| 10 | + "- **Date**: February 2024\n" |
11 | 11 | ] |
12 | 12 | }, |
13 | 13 | { |
14 | 14 | "cell_type": "markdown", |
15 | 15 | "metadata": {}, |
16 | 16 | "source": [ |
17 | 17 | "**Table of contents**<a id='toc0_'></a> \n", |
18 | | - "- 1. [Initialize Local DuckDB Database](#toc1_) \n", |
19 | | - " - 1.1. [Download Bulk Data Files from PatentsView](#toc1_1_) \n", |
20 | | - " - 1.2. [Create/Connect to DuckDB Database](#toc1_2_) \n", |
21 | | - " - 1.3. [Create Tables in Database](#toc1_3_) \n", |
22 | | - "- 2. [Prepare Data](#toc2_) \n", |
23 | | - " - 2.1. [Preview Tables](#toc2_1_) \n", |
24 | | - " - 2.2. [Joining Tables](#toc2_2_) \n", |
25 | | - "- 3. [Analyzing Biotechnology Patents](#toc3_) \n", |
26 | | - " - 3.1. [IPC Codes](#toc3_1_) \n", |
27 | | - " - 3.2. [Trends Over Time](#toc3_2_) \n", |
28 | | - " - 3.3. [Assignee Types](#toc3_3_) \n", |
29 | | - " - 3.4. [U.S. Maps](#toc3_4_) \n", |
| 18 | + "- 1. [Overview](#toc1_) \n", |
| 19 | + " - 1.1. [System Dependencies](#toc1_1_) \n", |
| 20 | + "- 2. [Initialize Local DuckDB Database](#toc2_) \n", |
| 21 | + " - 2.1. [Download Bulk Data Files from PatentsView](#toc2_1_) \n", |
| 22 | + " - 2.2. [Create/Connect to DuckDB Database](#toc2_2_) \n", |
| 23 | + " - 2.3. [Create Tables in Database](#toc2_3_) \n", |
| 24 | + "- 3. [Prepare Data](#toc3_) \n", |
| 25 | + " - 3.1. [Preview Tables](#toc3_1_) \n", |
| 26 | + " - 3.2. [Joining Tables](#toc3_2_) \n", |
| 27 | + "- 4. [Analyzing Biotechnology Patents](#toc4_) \n", |
| 28 | + " - 4.1. [IPC Codes](#toc4_1_) \n", |
| 29 | + " - 4.2. [Trends Over Time](#toc4_2_) \n", |
| 30 | + " - 4.3. [Assignee Types](#toc4_3_) \n", |
| 31 | + " - 4.4. [U.S. Maps](#toc4_4_) \n", |
30 | 32 | "\n", |
31 | 33 | "<!-- vscode-jupyter-toc-config\n", |
32 | 34 | "\tnumbering=true\n", |
|
42 | 44 | "cell_type": "markdown", |
43 | 45 | "metadata": {}, |
44 | 46 | "source": [ |
45 | | - "## Overview\n", |
| 47 | + "## 1. <a id='toc1_'></a>[Overview](#toc0_)\n", |
46 | 48 | "\n", |
47 | 49 | "This notebook provides an example exploration of patents data using PatentsView's bulk data downloads, Python, and popular data science tools. It shows how to:\n", |
48 | 50 | "\n", |
|
52 | 54 | "\n", |
53 | 55 | "As a running example, we're looking into biotechnology patents identified via a subset of International Patent Classification (IPC) codes. Our goal is to explore the data, considering the distribution of biotechnology patent topics and the geographic distribution of biotechnology patent assignees.\n", |
54 | 56 | "\n", |
55 | | - "### System Dependencies\n", |
| 57 | + "### 1.1. <a id='toc1_1_'></a>[System Dependencies](#toc0_)\n", |
56 | 58 | "\n", |
57 | 59 | "You can install required packages for running this notebook using:\n", |
58 | 60 | "```python\n", |
|
70 | 72 | "cell_type": "markdown", |
71 | 73 | "metadata": {}, |
72 | 74 | "source": [ |
73 | | - "## 1. <a id='toc1_'></a>[Initialize Local DuckDB Database](#toc0_)\n", |
| 75 | + "## 2. <a id='toc2_'></a>[Initialize Local DuckDB Database](#toc0_)\n", |
74 | 76 | "\n", |
75 | 77 | "First, we set up a local DuckDB database to help efficiently process larger-than-memory data." |
76 | 78 | ] |
|
79 | 81 | "cell_type": "markdown", |
80 | 82 | "metadata": {}, |
81 | 83 | "source": [ |
82 | | - "### 1.1. <a id='toc1_1_'></a>[Download Bulk Data Files from PatentsView](#toc0_)" |
| 84 | + "### 2.1. <a id='toc2_1_'></a>[Download Bulk Data Files from PatentsView](#toc0_)" |
83 | 85 | ] |
84 | 86 | }, |
85 | 87 | { |
|
149 | 151 | "cell_type": "markdown", |
150 | 152 | "metadata": {}, |
151 | 153 | "source": [ |
152 | | - "### 1.2. <a id='toc1_2_'></a>[Create/Connect to DuckDB Database](#toc0_)" |
| 154 | + "### 2.2. <a id='toc2_2_'></a>[Create/Connect to DuckDB Database](#toc0_)" |
153 | 155 | ] |
154 | 156 | }, |
155 | 157 | { |
|
186 | 188 | "cell_type": "markdown", |
187 | 189 | "metadata": {}, |
188 | 190 | "source": [ |
189 | | - "### 1.3. <a id='toc1_3_'></a>[Create Tables in Database](#toc0_)" |
| 191 | + "### 2.3. <a id='toc2_3_'></a>[Create Tables in Database](#toc0_)" |
190 | 192 | ] |
191 | 193 | }, |
192 | 194 | { |
|
225 | 227 | "cell_type": "markdown", |
226 | 228 | "metadata": {}, |
227 | 229 | "source": [ |
228 | | - "## 2. <a id='toc2_'></a>[Prepare Data](#toc0_)\n", |
| 230 | + "## 3. <a id='toc3_'></a>[Prepare Data](#toc0_)\n", |
229 | 231 | "\n", |
230 | 232 | "With the data loaded in our local DuckDB database, we can now preview and prepare the data." |
231 | 233 | ] |
|
234 | 236 | "cell_type": "markdown", |
235 | 237 | "metadata": {}, |
236 | 238 | "source": [ |
237 | | - "### 2.1. <a id='toc2_1_'></a>[Preview Tables](#toc0_)" |
| 239 | + "### 3.1. <a id='toc3_1_'></a>[Preview Tables](#toc0_)" |
238 | 240 | ] |
239 | 241 | }, |
240 | 242 | { |
|
638 | 640 | "cell_type": "markdown", |
639 | 641 | "metadata": {}, |
640 | 642 | "source": [ |
641 | | - "### 2.2. <a id='toc2_2_'></a>[Joining Tables](#toc0_)" |
| 643 | + "### 3.2. <a id='toc3_2_'></a>[Joining Tables](#toc0_)" |
642 | 644 | ] |
643 | 645 | }, |
644 | 646 | { |
|
714 | 716 | "cell_type": "markdown", |
715 | 717 | "metadata": {}, |
716 | 718 | "source": [ |
717 | | - "## 3. <a id='toc3_'></a>[Analyzing Biotechnology Patents](#toc0_)" |
| 719 | + "## 4. <a id='toc4_'></a>[Analyzing Biotechnology Patents](#toc0_)" |
718 | 720 | ] |
719 | 721 | }, |
720 | 722 | { |
|
735 | 737 | "cell_type": "markdown", |
736 | 738 | "metadata": {}, |
737 | 739 | "source": [ |
738 | | - "### 3.1. <a id='toc3_1_'></a>[IPC Codes](#toc0_)" |
| 740 | + "### 4.1. <a id='toc4_1_'></a>[IPC Codes](#toc0_)" |
739 | 741 | ] |
740 | 742 | }, |
741 | 743 | { |
@@ -285131,7 +285133,7 @@ |
285131 | 285133 | "cell_type": "markdown", |
285132 | 285134 | "metadata": {}, |
285133 | 285135 | "source": [ |
285134 | | - "### 3.2. <a id='toc3_2_'></a>[Trends Over Time](#toc0_)" |
| 285136 | + "### 4.2. <a id='toc4_2_'></a>[Trends Over Time](#toc0_)" |
285135 | 285137 | ] |
285136 | 285138 | }, |
285137 | 285139 | { |
@@ -286168,7 +286170,7 @@ |
286168 | 286170 | "cell_type": "markdown", |
286169 | 286171 | "metadata": {}, |
286170 | 286172 | "source": [ |
286171 | | - "### 3.3. <a id='toc3_3_'></a>[Assignee Types](#toc0_)" |
| 286173 | + "### 4.3. <a id='toc4_3_'></a>[Assignee Types](#toc0_)" |
286172 | 286174 | ] |
286173 | 286175 | }, |
286174 | 286176 | { |
@@ -286526,7 +286528,7 @@ |
286526 | 286528 | "cell_type": "markdown", |
286527 | 286529 | "metadata": {}, |
286528 | 286530 | "source": [ |
286529 | | - "### 3.4. <a id='toc3_4_'></a>[U.S. Maps](#toc0_)" |
| 286531 | + "### 4.4. <a id='toc4_4_'></a>[U.S. Maps](#toc0_)" |
286530 | 286532 | ] |
286531 | 286533 | }, |
286532 | 286534 | { |
|