ADSCIAN/pmf-extract

PMF Extract

A Streamlit-based web application that extracts structured data about natural hazards (specifically floods) from news articles and web pages using Mistral AI.

Features

  • URL Content Extraction: Uses trafilatura to fetch and clean article text from any given URL.
  • AI-Powered Analysis: Leverages Mistral AI models to identify and extract detailed information about flood events.
  • Structured Data: Extracts metadata, flood types (Muddy, Pluvial, Fluvial, etc.), severity, hazard drivers, impact locations, and detailed damage reports (Human, Housing, Infrastructure, etc.).
  • Multi-language Support: Can process articles in various languages and provides a summary in English.
  • Interactive UI: Simple Streamlit interface for easy configuration and visualization of results.
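The URL extraction step can be sketched as follows. This is a minimal illustration, not the code in app.py: the function names `is_http_url` and `fetch_article_text` are hypothetical, and the only trafilatura calls used are `fetch_url` and `extract`, both of which return `None` on failure.

```python
from typing import Optional
from urllib.parse import urlparse


def is_http_url(url: str) -> bool:
    """Return True when the string looks like an absolute http(s) URL."""
    parts = urlparse(url.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def fetch_article_text(url: str) -> Optional[str]:
    """Download a page and return its main text, or None on failure.

    Hypothetical helper: the real application may structure this differently.
    """
    if not is_http_url(url):
        return None
    # Imported lazily so the URL check above works without trafilatura installed.
    import trafilatura

    downloaded = trafilatura.fetch_url(url)  # None if the download fails
    if downloaded is None:
        return None
    return trafilatura.extract(downloaded)  # None if no main text is found
```

Checking for `None` at both stages lets the UI report a failed download separately from a page with no extractable article text, before any request is sent to the Mistral API.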

Installation

This project uses uv for dependency management.

  1. Clone the repository:

    git clone https://github.com/ADSCIAN/pmf-extract.git
    cd pmf-extract
  2. Install dependencies with uv:

    uv sync

    Or using pip:

    pip install -r requirements.txt

    (Note: If requirements.txt is missing, you can generate it with uv export --format requirements-txt -o requirements.txt)

Configuration

The application requires a Mistral AI API key.

  1. Obtain an API key from the Mistral AI Console.
  2. You can either:
    • Enter the key directly in the application sidebar.
    • Create a secret.py file in the root directory with the following content:
      MISTRAL_API_KEY = "your_api_key_here"
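The two options above could be combined along these lines. This is a sketch under assumptions: the function name `resolve_api_key` is hypothetical, and giving the sidebar value precedence over secret.py is a guess about the application's behavior, not something the README states.

```python
import importlib
from typing import Optional


def resolve_api_key(sidebar_value: str = "") -> Optional[str]:
    """Return the key typed in the sidebar, else the one in secret.py, else None.

    Assumption: a non-empty sidebar value wins over the file.
    """
    typed = sidebar_value.strip()
    if typed:
        return typed
    try:
        # secret.py is optional, so a missing module is not an error.
        secret = importlib.import_module("secret")
    except ImportError:
        return None
    # Treat a missing or empty MISTRAL_API_KEY as "no key configured".
    return getattr(secret, "MISTRAL_API_KEY", None) or None
```

Returning `None` instead of raising lets the caller show a prompt in the UI when no key is available from either source.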

Usage

Run the Streamlit application:

uv run streamlit run app.py

Or, if you installed the dependencies with pip:

streamlit run app.py

Then, in the app:

  1. Navigate to the Extractor page.
  2. Enter a URL of a news article about a flood or natural hazard.
  3. Click Extract Structured Data.
  4. View the extracted information in the interactive tables and JSON format.

Project Structure

  • app.py: Main Streamlit application and UI logic.
  • document.py: Pydantic schemas and prompts for AI extraction.
  • pyproject.toml: Project metadata and dependencies.
  • secret.py: (Optional) Local storage for your API key.
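To give a rough idea of what the schemas in document.py might describe, here is a simplified sketch. It uses standard-library dataclasses as a stand-in for Pydantic so that it runs without extra dependencies. All class and field names (`FloodType`, `DamageReport`, `FloodEvent`, `parse_flood_event`) are hypothetical; only the flood types and damage categories come from the feature list above.

```python
import json
from dataclasses import dataclass, field
from enum import Enum
from typing import List


class FloodType(str, Enum):
    # Values taken from the feature list; the real schema may define more.
    MUDDY = "Muddy"
    PLUVIAL = "Pluvial"
    FLUVIAL = "Fluvial"


@dataclass
class DamageReport:
    category: str  # e.g. "Human", "Housing", "Infrastructure"
    description: str


@dataclass
class FloodEvent:
    flood_type: FloodType
    severity: str
    locations: List[str] = field(default_factory=list)
    damages: List[DamageReport] = field(default_factory=list)


def parse_flood_event(raw_json: str) -> FloodEvent:
    """Turn a JSON string (such as a model response) into a FloodEvent."""
    data = json.loads(raw_json)
    return FloodEvent(
        flood_type=FloodType(data["flood_type"]),  # ValueError on unknown types
        severity=data["severity"],
        locations=list(data.get("locations", [])),
        damages=[DamageReport(**d) for d in data.get("damages", [])],
    )


# Illustrative input only; not real extracted data.
SAMPLE = json.dumps({
    "flood_type": "Pluvial",
    "severity": "high",
    "locations": ["Example Town"],
    "damages": [{"category": "Housing", "description": "Basements flooded"}],
})
event = parse_flood_event(SAMPLE)
```

With Pydantic, the same validation would come from the model classes themselves rather than from a hand-written parser. The point of the sketch is only that typed schemas reject malformed model output early.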

About

Demo of flood information extraction from news articles, using the Mistral AI API and a Streamlit app.
