This project was developed during a Technological Initiation research program under the supervision of a PhD researcher (https://editoraessentia.iff.edu.br/index.php/conepe/article/view/20627).
The objective was to design and implement a system capable of extracting, structuring, and modeling large-scale patent data in order to enable refined search and relational analysis.
While massive patent datasets are publicly available through platforms like PatentsView, there is a lack of structured tools that support interactive filtering and relationship-based exploration. This project aimed to address that gap through API integration and graph data modeling.
Patent databases contain extensive and complex information, including:
- Inventors
- Organizations
- Technologies
- Filing dates
- Patent citations
- Geographic data
Although accessible via public APIs, meaningful relational exploration requires:
- Structured data ingestion
- Efficient storage model
- Relationship-focused database design
The project focused on building a structured patent data repository to support advanced exploration and analysis.
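
As an illustration of the structured ingestion requirement above, the following is a minimal sketch of how a paginated query URL might be assembled in Node.js. The endpoint, the `q`/`f`/`o` parameter layout, and the field names are assumptions modeled on the legacy PatentsView query format; the helper `buildPatentQueryUrl` is hypothetical and is not taken from this repository. Check the current PatentsView documentation before relying on any of it.

```javascript
// Assumed endpoint, modeled on the legacy PatentsView query format.
// Verify against the current API documentation before use.
const BASE_URL = 'https://api.patentsview.org/patents/query';

// Hypothetical helper: builds one paginated query URL.
//   since   - earliest patent date to include (YYYY-MM-DD)
//   fields  - list of field names to return (assumed names)
//   page    - 1-based page index
//   perPage - number of records per page
function buildPatentQueryUrl({ since, fields, page = 1, perPage = 100 }) {
  const params = new URLSearchParams({
    q: JSON.stringify({ _gte: { patent_date: since } }),
    f: JSON.stringify(fields),
    o: JSON.stringify({ page, per_page: perPage }),
  });
  return `${BASE_URL}?${params.toString()}`;
}

const url = buildPatentQueryUrl({
  since: '2020-01-01',
  fields: ['patent_number', 'patent_title', 'patent_date'],
});
console.log(url);
```

Keeping URL construction in a pure function makes pagination easy to test without network access; the actual request could then be issued with any HTTP client.
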
The system architecture followed these steps:
- Data Extraction
  - Integration with the PatentsView REST API
  - Data retrieval using Node.js
  - Direct communication with the PatentsView team for technical clarification
- Data Processing
  - JSON payload transformation
  - Structuring relational entities
- Database Evaluation
  - Comparative analysis of relational vs. graph databases
  - Selection of Neo4j due to:
    - Native graph modeling
    - Efficient relationship queries
    - High performance for connected datasets
- Data Storage
  - Implementation of graph data modeling
  - Storage of patent entities and relationships in Neo4j
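
The processing and storage steps above can be sketched roughly as follows: a nested JSON record is flattened into nodes and relationships, which are then turned into parameterized Cypher `MERGE` statements. The record shape, the labels (`Patent`, `Inventor`, `Organization`), the relationship types, and the helper names are all illustrative assumptions, not the schema or code actually used in this project.

```javascript
// Hypothetical record shape: these field names are illustrative assumptions,
// not the actual PatentsView response schema.
const samplePatent = {
  patent_number: '1234567',
  patent_title: 'Example widget',
  inventors: [{ inventor_id: 'inv-1', name: 'Ada Example' }],
  assignees: [{ assignee_id: 'org-1', organization: 'Example Corp' }],
  cited_patents: ['7654321'],
};

// Flatten one nested record into graph nodes and relationships.
function toGraph(patent) {
  const id = patent.patent_number;
  const nodes = [{ label: 'Patent', id, props: { title: patent.patent_title } }];
  const rels = [];

  for (const inv of patent.inventors ?? []) {
    nodes.push({ label: 'Inventor', id: inv.inventor_id, props: { name: inv.name } });
    rels.push({ fromLabel: 'Inventor', from: inv.inventor_id, type: 'INVENTED', toLabel: 'Patent', to: id });
  }
  for (const org of patent.assignees ?? []) {
    nodes.push({ label: 'Organization', id: org.assignee_id, props: { name: org.organization } });
    rels.push({ fromLabel: 'Patent', from: id, type: 'ASSIGNED_TO', toLabel: 'Organization', to: org.assignee_id });
  }
  for (const cited of patent.cited_patents ?? []) {
    // The cited patent may not have been loaded yet, so emit a placeholder node.
    nodes.push({ label: 'Patent', id: cited, props: {} });
    rels.push({ fromLabel: 'Patent', from: id, type: 'CITES', toLabel: 'Patent', to: cited });
  }
  return { nodes, rels };
}

// Labels and relationship types come from the fixed strings above, never from
// input data; all values are passed as query parameters.
function nodeToCypher(node) {
  return {
    text: `MERGE (n:${node.label} {id: $id}) SET n += $props`,
    params: { id: node.id, props: node.props },
  };
}

function relToCypher(rel) {
  return {
    text: `MATCH (a:${rel.fromLabel} {id: $from}), (b:${rel.toLabel} {id: $to}) MERGE (a)-[:${rel.type}]->(b)`,
    params: { from: rel.from, to: rel.to },
  };
}

const { nodes, rels } = toGraph(samplePatent);
for (const stmt of [...nodes.map(nodeToCypher), ...rels.map(relToCypher)]) {
  console.log(stmt.text, JSON.stringify(stmt.params));
}
```

Using `MERGE` rather than `CREATE` keeps repeated imports idempotent, which helps when the same inventor or cited patent appears in many records. With the official `neo4j-driver` package, each statement could be executed via `session.run(text, params)`.
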
The project followed the Scrum framework, organized into:
- Two-week sprints
- Incremental feature delivery
- Continuous refinement
The following technologies and methods were used:
- Node.js
- REST API Integration
- JSON Processing
- Neo4j (Graph Database) - https://graphacademy.neo4j.com/c/597f2254-e561-4859-b1b0-51500d2d4345/
- Selenium
- Scrum
The work involved the following skills:
- API integration and data ingestion
- JSON data processing and transformation
- Graph database modeling
- Evaluation and selection of database technology
- Agile collaboration in a research environment
- Technical communication in English with external API providers
The project resulted in a structured graph-based patent database capable of representing complex relationships between patent entities.
This foundation supports future development of interactive visualization and advanced filtering tools for patent exploration.
This repository contains the implementation code for data extraction and modeling. The live database instance is not publicly available.