fpasoslibres/AIMS-Justice-Miners

Justice Miners Team:

  • Lina María Gómez Mesa
  • Federico Melo Barrero
  • David Alejandro Fuquen Flórez
  • Sofía Prada Ávila
  • Santiago Martínez Novoa

AIMS Hackathon - Modern Slavery Detection & Analysis Project

This repository contains a comprehensive research project focused on detecting and analyzing modern slavery practices through data-driven solutions. The project is structured as a modular system with independent sub-projects organized in two main pillars, facilitating individual development, maintenance, and future enhancements.

Project Architecture

The solution is organized into independent sub-projects that can be developed and maintained separately, each addressing specific aspects of modern slavery detection and analysis. This modular approach enables:

  • Individual development: Each sub-project can be worked on independently.
  • Enhanced maintainability: Isolated components reduce complexity and dependencies.
  • Scalable future development: New features can be added without affecting existing modules.
  • Flexible deployment: Components can be deployed and scaled independently.

1. Problem Statement

Government agents and policymakers currently lack automated tools for three tasks: collecting Modern Slavery (MS) Statements, detecting and analyzing the visual elements in those statements, and performing transparency checks on them.

2. Objective

To develop an automated, data-driven solution that:

  • Systematically collects corporate modern slavery statements from multiple sources.
  • Enriches company profiles with relevant compliance and risk data.
  • Analyzes and visualizes patterns, trends, and compliance levels.
  • Provides actionable insights for researchers, policymakers, and organizations working to combat modern slavery.

3. Solution/Data Use Case Description

Our comprehensive solution consists of interconnected yet independent modules:

Data Collection Pipeline (Pillar 1)

  • Automated web crawler (collect_statements) that systematically gathers corporate modern slavery statements.
  • Company profile enrichment system (build_company_profiles) that enhances collected data with additional corporate information.
  • Visibility analysis scraper (check_visibility) that assesses how accessible and compliant each statement is on the company's website.
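
As an illustration of the kind of check a visibility scraper might perform, the sketch below scans a page's HTML for links that mention modern slavery. It is not the check_visibility implementation: the class name, the function name, and the keyword list are assumptions made for this example, and only the Python standard library is used.

```python
from html.parser import HTMLParser

# Assumed keyword list; the real module may match differently.
KEYWORDS = ("modern slavery", "modern-slavery", "modern_slavery")


class StatementLinkFinder(HTMLParser):
    """Collects <a> links whose href or anchor text mentions modern slavery."""

    def __init__(self):
        super().__init__()
        self.matches = []   # (href, anchor text) pairs that matched
        self._href = None   # href of the <a> tag currently open, if any
        self._text = []     # text fragments seen inside that tag

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Collapse whitespace so multi-line anchor text compares cleanly.
            text = " ".join("".join(self._text).split())
            haystack = f"{self._href} {text}".lower()
            if any(keyword in haystack for keyword in KEYWORDS):
                self.matches.append((self._href, text))
            self._href = None


def find_statement_links(html):
    """Return (href, text) pairs for links that look like MS statements."""
    finder = StatementLinkFinder()
    finder.feed(html)
    return finder.matches
```

A page with no matching link returns an empty list, which a caller could treat as a sign that the statement is not visible from that page.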

Analysis & Processing Engine (Pillar 2)

  • Multi-modal document processor that converts PDF statements into structured markdown with intelligent content extraction using vision-enabled LLMs.
  • Signature compliance extractor that systematically identifies and validates executive signatures, dates, and signatory information for regulatory compliance.
  • Batch content analyzer that scales document processing across directories and datasets with configurable LLM providers and extraction pipelines.
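
To make the "configurable LLM providers" idea concrete, here is a minimal sketch of a batch loop that takes the provider as a plain callable. The `Provider` type and the `process_directory` function are hypothetical names chosen for this example and are not taken from the pillar_2 code. A real provider would call a vision-enabled LLM; here it is only assumed to map a PDF path to markdown text.

```python
from pathlib import Path
from typing import Callable, Dict

# Hypothetical provider interface: takes a PDF path, returns markdown text.
Provider = Callable[[Path], str]


def process_directory(input_dir: Path, output_dir: Path, provider: Provider) -> Dict[str, str]:
    """Convert every PDF directly under input_dir and report a status per file."""
    output_dir.mkdir(parents=True, exist_ok=True)
    results = {}
    for pdf in sorted(input_dir.glob("*.pdf")):
        try:
            markdown = provider(pdf)
        except Exception as exc:
            # One failing document should not stop the rest of the batch.
            results[pdf.name] = f"failed: {exc}"
            continue
        (output_dir / f"{pdf.stem}.md").write_text(markdown, encoding="utf-8")
        results[pdf.name] = "ok"
    return results
```

Passing the provider as an argument means a different model backend can be used without changing the batch loop.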

4. Pitch

Link to video

A 5-minute presentation showcasing the automated detection capabilities, dashboard functionality, and real-world impact potential of the modern slavery analysis platform.

5. Datasets

Location: /datasets

Our datasets are primarily generated by two key sub-projects within the collection pipeline:

Primary Datasets:

  • Corporate Statements Dataset (company_statements_crawled.zip): Generated by the collect_statements crawler

    • Modern slavery statements from corporate websites
    • Metadata including publication dates, company information, and accessibility metrics
  • Enriched Company Profiles (datasets_processed_company_profiles.zip): Produced by the build_company_profiles system

    • Enhanced corporate data with risk indicators
    • Industry classifications and supply chain information
    • Compliance history and regulatory data

Data Quality & Transformations:

  • Automated data validation pipelines ensuring consistency
  • Standardized formatting across multiple source types
  • Deduplication and merge processes for comprehensive company profiles
  • Quality assurance metrics and data lineage tracking
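
The deduplication and merge step can be sketched as follows. This is an illustrative approach under stated assumptions, not the project's pipeline: the record fields, the list of legal suffixes, and the "first non-empty value wins" merge rule are all choices made for this example.

```python
import re

# Assumed list of legal-form suffixes to ignore when comparing names.
LEGAL_SUFFIXES = {"ltd", "limited", "pty", "plc", "inc", "llc", "corp", "corporation"}


def normalize_name(name):
    """Lowercase, strip punctuation, and drop legal suffixes from a company name."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", name.lower()).split()
    return " ".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def merge_profiles(records):
    """Merge records that share a normalized name; the first non-empty value wins."""
    merged = {}
    for record in records:
        key = normalize_name(record["name"])
        profile = merged.setdefault(key, {})
        for field, value in record.items():
            if value not in (None, "") and field not in profile:
                profile[field] = value
    return list(merged.values())
```

For example, records for "Acme Pty Ltd" and "ACME PTY. LTD." collapse into one profile, and a field that is empty in the first record is filled from the second.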

API Access:

  • RESTful APIs for accessing processed datasets
  • Real-time data streaming capabilities for dashboard integration
  • Secure authentication for sensitive compliance data

6. Project Code

Location: /project

The codebase is organized into two main pillars with independent sub-projects:

Pillar 1: Data Collection & Processing

pillar_1/
├── build_company_profiles/    # Company data enrichment
├── check_visibility/          # Compliance visibility analysis  
└── collect_statements/        # Web crawler for statement collection

Pillar 2: PDF extractor leveraging open-source LLMs

pillar_2/
├── adapters/                  # Data integration layer
├── core/                      # Main processing engine
├── models/                    # ML/AI analytical models
├── outputs/                   # Generated reports and results
├── prompts/                   # AI prompt engineering
└── utils/                     # Shared utilities

Key Features:

  • Modular design: Each sub-project operates independently
  • Scalable architecture: Components can be deployed separately
  • Comprehensive testing: Individual module testing capabilities
  • Configuration management: Centralized config system (config.py)
  • Easy deployment: Independent requirements and setup per module
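
A centralized config module often looks something like the sketch below. This is not the contents of the repository's config.py: every field name, default value, and environment variable here is a placeholder invented for illustration.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical settings object; all fields are illustrative placeholders."""
    llm_provider: str = "example-provider"
    llm_model: str = "example-vision-model"
    output_dir: str = "pillar_2/outputs"
    request_timeout_s: float = 60.0


def load_settings(env=None):
    """Build Settings from environment variables, falling back to defaults."""
    env = os.environ if env is None else env
    defaults = Settings()
    return Settings(
        llm_provider=env.get("AIMS_LLM_PROVIDER", defaults.llm_provider),
        llm_model=env.get("AIMS_LLM_MODEL", defaults.llm_model),
        output_dir=env.get("AIMS_OUTPUT_DIR", defaults.output_dir),
        request_timeout_s=float(env.get("AIMS_REQUEST_TIMEOUT_S", defaults.request_timeout_s)),
    )
```

Keeping the settings in one frozen object lets each sub-project read the same values without being able to change them at runtime.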

7. Additional Documentation

Location: /docs

Included Documentation:

  • Project presentation: Comprehensive overview of methodology and results
  • Interactive dashboards: Real-time visualization tools that consume the generated datasets
    • Compliance trend analysis
    • Risk assessment visualizations
    • Corporate statement analytics
    • Industry comparison metrics

Dashboard Features:

  • Real-time data integration from project datasets
  • Interactive filtering by industry, region, and risk level
  • Trend analysis over time periods
  • Exportable reports for further analysis

8. Declaration of Intellectual Property

This project is developed as part of the AIMS Hackathon focused on combating modern slavery through technological innovation. The modular, open-source approach is designed to facilitate collaboration and continued development by the research community.

Technical Specifications

  • Python-based: Core implementation in Python 3.8+
  • Modular architecture: Independent sub-projects for scalability
  • API-first design: RESTful interfaces for data access
  • Dashboard integration: Real-time visualization capabilities
  • Docker support: Containerized deployment options

Getting Started

Each sub-project contains its own README with specific setup instructions. The modular design allows you to:

  1. Run individual components independently
  2. Scale specific modules based on requirements
  3. Contribute to specific areas without affecting other components
  4. Deploy incrementally as modules are completed

For detailed setup instructions, refer to the individual module documentation in each pillar directory.

This project builds on the open research of Project AIMS (AI against Modern Slavery) by Mila and QUT.

GitHub repository: ai4h_aims-au.

Disclaimers

Computational Resources & Comparative Results

  • Describe here the resources used in developing your solution (e.g. GPUs, etc).

No Claims About Companies

This repository and its accompanying models, datasets, metrics, dashboards, and comparative analyses are provided strictly for research and demonstration purposes.

Any comparisons, rankings, or assessments of companies or organizations are exploratory in nature. They may be affected by incomplete data, modeling limitations, or methodological choices. These results must not be used to make factual, legal, or reputational claims about any entity without independent expert review and validation.

Do not use this repository’s contents to make public statements or claims about specific companies, organizations, or individuals.

Terms and Conditions

By submitting this solution to the AIMS Hackathon, our team acknowledges and agrees to abide by the Event’s Terms and Conditions.
