ClarissaDamas/Patent-Data-API-Graph-Modeling

Patent Data API & Graph Modeling

Overview

This project was developed during a Technological Initiation research program under the supervision of a PhD researcher (https://editoraessentia.iff.edu.br/index.php/conepe/article/view/20627).

The objective was to design and implement a system capable of extracting, structuring, and modeling large-scale patent data in order to enable refined search and relational analysis.

Massive patent datasets are publicly available through platforms like PatentsView, but few structured tools support interactive filtering and relationship-based exploration of that data. This project aimed to address that gap through API integration and graph data modeling.


Problem Statement

Patent databases contain extensive and complex information, including:

  • Inventors
  • Organizations
  • Technologies
  • Filing dates
  • Patent citations
  • Geographic data

Although accessible via public APIs, meaningful relational exploration requires:

  • Structured data ingestion
  • Efficient storage model
  • Relationship-focused database design

The project focused on building a structured patent data repository to support advanced exploration and analysis.
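
To illustrate what "relational exploration" means here, the following is a minimal sketch, not the project's actual code. It assumes a hypothetical edge list and made-up node identifiers, and answers a two-hop question ("which organizations own the patents that a given patent cites?") by following relationship types in sequence.

```javascript
// Hypothetical edge list: [source, relationship type, target].
// Identifiers and relationship names are illustrative assumptions.
const edges = [
  ["patent:A", "CITES", "patent:B"],
  ["patent:A", "CITES", "patent:C"],
  ["patent:B", "ASSIGNED_TO", "org:Example Corp"],
  ["patent:C", "ASSIGNED_TO", "org:Sample Labs"],
  ["patent:C", "CITES", "patent:D"],
];

// Index outgoing edges by source node and relationship type.
function buildIndex(edgeList) {
  const index = new Map();
  for (const [from, type, to] of edgeList) {
    const key = `${from}|${type}`;
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(to);
  }
  return index;
}

// Follow a fixed sequence of relationship types from a start node
// and return the sorted set of nodes reached at the end.
function follow(index, start, types) {
  let frontier = new Set([start]);
  for (const type of types) {
    const next = new Set();
    for (const node of frontier) {
      for (const target of index.get(`${node}|${type}`) ?? []) next.add(target);
    }
    frontier = next;
  }
  return [...frontier].sort();
}

const index = buildIndex(edges);
const citedOwners = follow(index, "patent:A", ["CITES", "ASSIGNED_TO"]);
console.log(citedOwners); // [ 'org:Example Corp', 'org:Sample Labs' ]
```

In a relational schema the same question would typically need one join per hop, which is one reason a relationship-focused design is attractive for this kind of data.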


Technical Architecture

The system architecture followed these steps:

  1. Data Extraction

    • Integration with the PatentsView REST API
    • Data retrieval using Node.js
    • Direct communication with the PatentsView team for technical clarification
  2. Data Processing

    • JSON payload transformation
    • Structuring relational entities
  3. Database Evaluation

    • Comparative analysis of relational vs graph databases
    • Selection of Neo4j due to:
      • Native graph modeling
      • Efficient relationship queries
      • High performance for connected datasets
  4. Data Storage

    • Implementation of graph data modeling
    • Storage of patent entities and relationships in Neo4j
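
The processing and storage steps above can be sketched roughly as follows. This is an illustrative outline rather than the repository's implementation: the payload shape, field names (`patent_id`, `cited_patent_ids`, and so on), the `id` property, and the relationship types `INVENTED`, `ASSIGNED_TO`, and `CITES` are all assumptions, and the real PatentsView response should be checked against its documentation. The sketch flattens nested JSON into deduplicated nodes and relationships, then builds parameterized Cypher `MERGE` statements.

```javascript
// Hypothetical, simplified payload; real PatentsView field names may differ.
const samplePayload = {
  patents: [
    {
      patent_id: "P1",
      title: "Example patent A",
      inventors: [{ inventor_id: "I1", name: "Ada Example" }],
      assignees: [{ assignee_id: "O1", organization: "Example Corp" }],
      cited_patent_ids: ["P2"],
    },
    {
      patent_id: "P2",
      title: "Example patent B",
      inventors: [{ inventor_id: "I1", name: "Ada Example" }],
      assignees: [],
      cited_patent_ids: [],
    },
  ],
};

// Labels are interpolated into Cypher text, so restrict them to a known set.
const LABELS = new Set(["Patent", "Inventor", "Organization"]);

// Flatten nested patent records into deduplicated nodes and relationships.
function toGraph(payload) {
  const nodes = new Map(); // "Label:id" -> node
  const relationships = [];

  // Insert a node, or merge new properties into an existing one.
  const addNode = (label, id, props = {}) => {
    const key = `${label}:${id}`;
    const existing = nodes.get(key);
    nodes.set(key, { label, id, props: { ...(existing ? existing.props : {}), ...props } });
    return { label, id };
  };

  for (const p of payload.patents ?? []) {
    const patent = addNode("Patent", p.patent_id, { title: p.title });
    for (const inv of p.inventors ?? []) {
      const inventor = addNode("Inventor", inv.inventor_id, { name: inv.name });
      relationships.push({ type: "INVENTED", from: inventor, to: patent });
    }
    for (const org of p.assignees ?? []) {
      const assignee = addNode("Organization", org.assignee_id, { name: org.organization });
      relationships.push({ type: "ASSIGNED_TO", from: patent, to: assignee });
    }
    for (const citedId of p.cited_patent_ids ?? []) {
      const cited = addNode("Patent", citedId); // stub until the cited record is seen
      relationships.push({ type: "CITES", from: patent, to: cited });
    }
  }
  return { nodes: [...nodes.values()], relationships };
}

// Build an idempotent, parameterized Cypher statement for one node.
function nodeStatement(node) {
  if (!LABELS.has(node.label)) throw new Error(`Unknown label: ${node.label}`);
  return {
    text: `MERGE (n:${node.label} {id: $id}) SET n += $props`,
    params: { id: node.id, props: node.props },
  };
}

// Build a statement that links two already-merged nodes.
function relationshipStatement(rel) {
  return {
    text:
      `MATCH (a:${rel.from.label} {id: $from}), (b:${rel.to.label} {id: $to}) ` +
      `MERGE (a)-[:${rel.type}]->(b)`,
    params: { from: rel.from.id, to: rel.to.id },
  };
}

const graph = toGraph(samplePayload);
const statements = [
  ...graph.nodes.map(nodeStatement),
  ...graph.relationships.map(relationshipStatement),
];
console.log(statements.length, statements[0].text);
```

Using `MERGE` instead of `CREATE` keeps repeated ingestion runs from duplicating nodes. With the official Neo4j JavaScript driver, each statement could then be executed with something like `session.run(text, params)`; connection details are omitted here because they depend on the deployment.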

Development Methodology

The project followed the Scrum framework, organized into:

  • Two-week sprints
  • Incremental feature delivery
  • Continuous refinement

Technologies Used

  • Node.js
  • PatentsView REST API
  • Neo4j

Key Contributions

  • API integration and data ingestion
  • JSON data processing and transformation
  • Graph database modeling
  • Evaluation and selection of database technology
  • Agile collaboration in a research environment
  • Technical communication in English with external API providers

Results

The project resulted in a structured graph-based patent database capable of representing complex relationships between patent entities.

This foundation supports future development of interactive visualization and advanced filtering tools for patent exploration.


Notes

This repository contains the implementation code for data extraction and modeling. The live database instance is not publicly available.
