Patent Data API & Graph Modeling

Overview

This project was developed during a Technological Initiation research program under the supervision of a PhD researcher (https://editoraessentia.iff.edu.br/index.php/conepe/article/view/20627).

The objective was to design and implement a system capable of extracting, structuring, and modeling large-scale patent data in order to enable refined search and relational analysis.

While massive patent datasets are publicly available through platforms like PatentsView, there is a lack of structured tools that support interactive filtering and relationship-based exploration. This project aimed to address that gap through API integration and graph data modeling.

Problem Statement

Patent databases contain extensive and complex information, including:

- Inventors
- Organizations
- Technologies
- Filing dates
- Patent citations
- Geographic data

Although accessible via public APIs, meaningful relational exploration requires:

- Structured data ingestion
- An efficient storage model
- Relationship-focused database design

The project focused on building a structured patent data repository to support advanced exploration and analysis.

Technical Architecture

The system architecture followed these steps:

Data Extraction
- Integration with the PatentsView REST API
- Data retrieval using Node.js
- Direct communication with the PatentsView team for technical clarification
Data Processing
- JSON payload transformation
- Structuring relational entities
Database Evaluation
- Comparative analysis of relational vs graph databases
- Selection of Neo4j due to:
  - Native graph modeling
  - Efficient relationship queries
  - High performance for connected datasets
Data Storage
- Implementation of graph data modeling
- Storage of patent entities and relationships in Neo4j
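
The processing and storage steps above can be illustrated with a minimal Node.js sketch. It is not the code from this repository: the payload shape (`id`, `title`, `filingDate`, `inventors`, `organizations`, `cites`), the node labels, and the relationship types (`INVENTED`, `ASSIGNED_TO`, `CITES`) are all assumptions chosen for illustration, and the real PatentsView field names may differ. The sketch flattens a JSON payload into deduplicated nodes and relationships, then builds a parameterized Cypher `MERGE` statement for each relationship.

```javascript
// Hypothetical payload: field names are illustrative, not the PatentsView schema.
const samplePatents = [
  { id: 'P1', title: 'Example patent A', filingDate: '2015-03-01',
    inventors: [{ id: 'I1', name: 'Inventor One' }],
    organizations: [{ id: 'O1', name: 'Example Org' }],
    cites: [] },
  { id: 'P2', title: 'Example patent B', filingDate: '2018-07-15',
    inventors: [{ id: 'I1', name: 'Inventor One' }, { id: 'I2', name: 'Inventor Two' }],
    organizations: [{ id: 'O1', name: 'Example Org' }],
    cites: ['P1'] },
];

// Flatten patents into graph nodes and relationships.
function toGraph(patents) {
  const nodes = new Map(); // keyed by "Label:id" so repeated entities collapse
  const relationships = [];

  // Insert or update a node, merging properties, and return a reference to it.
  const addNode = (label, id, props = {}) => {
    const key = `${label}:${id}`;
    nodes.set(key, { ...(nodes.get(key) ?? {}), label, id, ...props });
    return { label, id };
  };

  for (const p of patents) {
    const patent = addNode('Patent', p.id, { title: p.title, filingDate: p.filingDate });
    for (const inv of p.inventors ?? []) {
      const inventor = addNode('Inventor', inv.id, { name: inv.name });
      relationships.push({ type: 'INVENTED', from: inventor, to: patent });
    }
    for (const org of p.organizations ?? []) {
      const organization = addNode('Organization', org.id, { name: org.name });
      relationships.push({ type: 'ASSIGNED_TO', from: patent, to: organization });
    }
    for (const citedId of p.cites ?? []) {
      // A cited patent may not be in this batch, so create a placeholder node.
      const cited = addNode('Patent', citedId);
      relationships.push({ type: 'CITES', from: patent, to: cited });
    }
  }
  return { nodes: [...nodes.values()], relationships };
}

// Build a parameterized Cypher statement for one relationship.
// Labels and types are interpolated because they come from the fixed set
// above; ids are passed as parameters rather than concatenated.
function toMergeStatement(rel) {
  return {
    text:
      `MERGE (a:${rel.from.label} {id: $fromId}) ` +
      `MERGE (b:${rel.to.label} {id: $toId}) ` +
      `MERGE (a)-[:${rel.type}]->(b)`,
    params: { fromId: rel.from.id, toId: rel.to.id },
  };
}

const graph = toGraph(samplePatents);
const statements = graph.relationships.map(toMergeStatement);
console.log(graph.nodes.length, 'nodes,', statements.length, 'statements');
```

Each `{ text, params }` pair could then be passed to a Neo4j session, for example `session.run(text, params)` with the official `neo4j-driver` package. Using `MERGE` rather than `CREATE` keeps repeated ingestion runs from duplicating nodes or relationships.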

Development Methodology

The project followed the Scrum framework, organized into:

- Two-week sprints
- Incremental feature delivery
- Continuous refinement

Technologies Used

- Node.js
- REST API integration
- JSON processing
- Neo4j (graph database) - https://graphacademy.neo4j.com/c/597f2254-e561-4859-b1b0-51500d2d4345/
- Selenium
- Scrum

Key Contributions

- API integration and data ingestion
- JSON data processing and transformation
- Graph database modeling
- Evaluation and selection of database technology
- Agile collaboration in a research environment
- Technical communication in English with external API providers

Results

The project resulted in a structured graph-based patent database capable of representing complex relationships between patent entities.

This foundation supports future development of interactive visualization and advanced filtering tools for patent exploration.
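
As a rough illustration of what such filtering could look like on top of the graph, the sketch below builds a parameterized Cypher query from optional filters. It is a hypothetical helper, not part of this repository: the function name `buildPatentFilterQuery`, the labels (`Patent`, `Organization`, `Inventor`), the relationship types, and the property names are assumptions, and filing dates are assumed to be stored as ISO `YYYY-MM-DD` strings so that string comparison matches date order.

```javascript
// Build a Cypher query and its parameters from optional filters.
// All labels, relationship types, and property names are illustrative.
function buildPatentFilterQuery({ organization, inventor, filedFrom, filedTo } = {}) {
  const matches = ['MATCH (p:Patent)'];
  const where = [];
  const params = {};

  if (organization) {
    matches.push('MATCH (p)-[:ASSIGNED_TO]->(o:Organization)');
    where.push('o.name = $organization');
    params.organization = organization;
  }
  if (inventor) {
    matches.push('MATCH (i:Inventor)-[:INVENTED]->(p)');
    where.push('i.name = $inventor');
    params.inventor = inventor;
  }
  if (filedFrom) {
    where.push('p.filingDate >= $filedFrom');
    params.filedFrom = filedFrom;
  }
  if (filedTo) {
    where.push('p.filingDate <= $filedTo');
    params.filedTo = filedTo;
  }

  // Only emit a WHERE clause when at least one filter is active.
  const text = [
    ...matches,
    where.length ? `WHERE ${where.join(' AND ')}` : null,
    'RETURN DISTINCT p.id AS id, p.title AS title',
    'ORDER BY id',
  ].filter(Boolean).join('\n');

  return { text, params };
}

const query = buildPatentFilterQuery({ organization: 'Example Org', filedFrom: '2015-01-01' });
console.log(query.text);
console.log(query.params);
```

Filter values travel as query parameters instead of being concatenated into the query text, so user input from a future search interface would not alter the query structure.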

Notes

This repository contains the implementation code for data extraction and modeling. The live database instance is not publicly available.
