A Streamlit-based web application that extracts structured data about natural hazards (specifically floods) from news articles and web pages using Mistral AI.
- URL Content Extraction: Uses trafilatura to fetch and clean article text from any given URL.
- AI-Powered Analysis: Leverages Mistral AI models to identify and extract detailed information about flood events.
- Structured Data: Extracts metadata, flood types (Muddy, Pluvial, Fluvial, etc.), severity, hazard drivers, impact locations, and detailed damage reports (Human, Housing, Infrastructure, etc.).
- Multi-language Support: Can process articles in various languages and provides a summary in English.
- Interactive UI: Simple Streamlit interface for easy configuration and visualization of results.
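The URL extraction step listed above can be sketched roughly as follows. This is a minimal illustration, not the code in app.py: it assumes trafilatura's `fetch_url` and `extract` functions, and the helper name `fetch_article_text` and its `fetch`/`extract` parameters are hypothetical, added only so the function can be tried without network access.

```python
from typing import Callable, Optional


def fetch_article_text(
    url: str,
    fetch: Optional[Callable[[str], Optional[str]]] = None,
    extract: Optional[Callable[[str], Optional[str]]] = None,
) -> Optional[str]:
    """Download a page and return its cleaned main text, or None on failure."""
    if fetch is None or extract is None:
        # Imported lazily so stand-in callables can be used without trafilatura.
        import trafilatura

        fetch = fetch or trafilatura.fetch_url
        extract = extract or trafilatura.extract

    html = fetch(url)  # raw HTML, or None if the download failed
    if not html:
        return None
    text = extract(html)  # main article text with page boilerplate removed
    return text.strip() if text else None
```

Calling `fetch_article_text("https://example.com/some-article")` with trafilatura installed would return the article body as plain text, which could then be passed to the model for analysis.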
This project uses uv for dependency management.
- Clone the repository:
    git clone https://github.com/ADSCIAN/pmf-extract.git
    cd pmf-extract
- Install dependencies. Using uv:
    uv sync
  Or using pip:
    pip install -r requirements.txt
  (Note: You can generate requirements.txt using uv export --format requirements-txt -o requirements.txt if needed.)
The application requires a Mistral AI API key.
- Obtain an API key from the Mistral AI Console.
- You can either:
  - Enter the key directly in the application sidebar.
  - Create a secret.py file in the root directory with the following content:
      MISTRAL_API_KEY = "your_api_key_here"
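One way the two key sources could be combined is sketched below. The precedence (sidebar value first, then secret.py) is an assumption rather than something this README states, and `resolve_api_key` and its `module_name` parameter are hypothetical names used only for illustration.

```python
import importlib
from typing import Optional


def resolve_api_key(sidebar_value: str = "", module_name: str = "secret") -> Optional[str]:
    """Return the key typed in the sidebar, else the one from secret.py, else None."""
    key = (sidebar_value or "").strip()
    if key:
        return key
    try:
        # secret.py is optional, so a missing module is not treated as an error.
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    value = getattr(module, "MISTRAL_API_KEY", None)
    return value or None
```

If neither source provides a key, the function returns None, and the caller can prompt the user instead of sending a request that is bound to fail.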
Run the Streamlit application:
    uv run streamlit run app.py
Or, if using standard Python:
    streamlit run app.py
- Navigate to the Extractor page.
- Enter a URL of a news article about a flood or natural hazard.
- Click Extract Structured Data.
- View the extracted information in the interactive tables and JSON format.
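To give a feel for how one extracted record might map onto both the JSON view and a table, here is a small sketch. The project defines its schemas with Pydantic in document.py; this example deliberately swaps in standard-library dataclasses to stay dependency-free, and every class name, field name, and value in it is hypothetical and invented purely for illustration.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class DamageReport:
    category: str  # e.g. "Human", "Housing", "Infrastructure"
    description: str


@dataclass
class FloodEvent:
    flood_type: str  # e.g. "Pluvial", "Fluvial", "Muddy"
    severity: str
    locations: List[str] = field(default_factory=list)
    damages: List[DamageReport] = field(default_factory=list)


def damage_rows(event: FloodEvent) -> List[dict]:
    """Flatten the nested damage reports into one table row per report."""
    return [
        {"flood_type": event.flood_type, "category": d.category, "description": d.description}
        for d in event.damages
    ]


# Made-up example record, not real extraction output.
event = FloodEvent(
    flood_type="Fluvial",
    severity="High",
    locations=["Example Town"],
    damages=[
        DamageReport("Housing", "Ground floors flooded"),
        DamageReport("Infrastructure", "Bridge closed"),
    ],
)
as_json = json.dumps(asdict(event), indent=2)  # nested form, suited to a JSON view
rows = damage_rows(event)  # flat form, suited to a table
```

The nested form keeps the full structure for export, while the flattened rows are easier to show in a tabular widget.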
- app.py: Main Streamlit application and UI logic.
- document.py: Pydantic schemas and prompts for AI extraction.
- pyproject.toml: Project metadata and dependencies.
- secret.py: (Optional) Local storage for your API key.