SIEM-equivalent AWS CloudTrail threat hunting on a single ordinary laptop, with no cloud infrastructure required.
Drop in your CloudTrail logs, run one command, and start hunting threats immediately.
- No-query hunting: select a built-in hunt from the Streamlit dropdown and get instant results, with no SQL knowledge required
- AI-assisted analysis: the OpenAI API (gpt-5.4) automatically analyses query-result DataFrames and surfaces key findings in plain language
- GeoIP enrichment: country, city, and ASN for every source IP via MaxMind GeoLite2
- Built-in BI dashboard: Apache Superset with pre-built CloudTrail charts
- Single-command launch: docker compose up -d
Three Docker containers share one DuckDB file via a bind mount (docker/data/db/).
┌────────────────────────────────────────────────────────┐
│                     Docker Compose                     │
│                                                        │
│  ┌──────────────┐  ┌──────────────┐  ┌──────────────┐  │
│  │ ingester     │  │ agent        │  │ dashboard    │  │
│  │ (Rust)       │  │ (Streamlit)  │  │ (Superset)   │  │
│  │              │  │              │  │              │  │
│  │ CloudTrail   │  │ AI Chat      │  │ Visualize    │  │
│  │ gz ingest    │  │ SQL gen/exec │  │              │  │
│  │ READ_WRITE   │  │ READ_ONLY    │  │ READ_ONLY    │  │
│  └──────┬───────┘  └──────┬───────┘  └──────┬───────┘  │
│         └─────────────────┴─────────────────┘          │
│                           │                            │
│                    ┌──────▼───────┐                    │
│                    │ DuckDB       │                    │
│                    │ (Bind Mount) │                    │
│                    │ (SSD)        │                    │
│                    └──────────────┘                    │
└────────────────────────────────────────────────────────┘
The diagram below shows the full lifecycle from log ingestion through to a completed AI-assisted threat hunting session.
sequenceDiagram
participant OPS as Operator
participant ING as ingester (Rust)
participant DB as DuckDB (bind mount)
participant APP as chat / Streamlit
participant OAI as OpenAI API
participant SS as dashboard / Superset
participant U as Analyst (Browser)
Note over OPS,ING: Phase 1 - Ingest
OPS->>ING: docker compose run ingester ingest --path /data/logs
ING->>ING: walk & filter files (date, path glob)
ING->>ING: parallel parse (rayon) + SHA-256 dedup
ING->>DB: batch insert via DuckDB Appender (READ_WRITE)
ING->>DB: GeoIP enrich (optional)
ING-->>OPS: IngestStats printed
Note over OPS,SS: Phase 2 - Start services
OPS->>APP: docker compose up -d
OPS->>SS: docker compose up -d
APP->>DB: open READ_ONLY connection
SS->>DB: open READ_ONLY connection
Note over U,OAI: Phase 3 - AI-assisted hunting (chat)
U->>APP: natural language question
APP->>OAI: generate_sql(question, schema, history)
OAI-->>APP: SQL string
APP->>APP: apply_date_filter + apply_row_limit
APP->>APP: validate_query (blocklist + EXPLAIN)
APP->>DB: execute SQL (READ_ONLY)
DB-->>APP: result rows (DataFrame)
APP->>OAI: generate_analysis(sql, results)
OAI-->>APP: fact-based Markdown summary
APP-->>U: table + analysis + chat history
Note over U,SS: Phase 4 - BI dashboard (Superset)
U->>SS: open http://localhost:8088
SS->>DB: execute chart queries (READ_ONLY)
DB-->>SS: aggregated result sets
SS-->>U: interactive charts + filters
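Phase 3 routes every AI-generated query through guardrails before it reaches DuckDB. The snippet below is a minimal, stdlib-only Python sketch of two of those ideas, a keyword blocklist and a row cap. check_sql, add_row_limit, the keyword list, and the 1000-row default are hypothetical; they are not the agent's actual validate_query or apply_row_limit, whose signatures and rules are not documented here. The EXPLAIN dry run and apply_date_filter are left out because they need a live DuckDB connection and the table schema.

```python
import re

# Hypothetical blocklist: statement keywords that can modify data, the catalog,
# or the session. The agent's real list is not documented in this README.
BLOCKED_KEYWORDS = re.compile(
    r"\b(INSERT|UPDATE|DELETE|MERGE|DROP|CREATE|ALTER|ATTACH|DETACH|COPY|"
    r"EXPORT|IMPORT|INSTALL|LOAD|PRAGMA|SET|CALL)\b",
    re.IGNORECASE,
)
# Matches a trailing "LIMIT n" with an optional "OFFSET m".
TRAILING_LIMIT = re.compile(r"\bLIMIT\s+(\d+)(\s+OFFSET\s+\d+)?\s*$", re.IGNORECASE)


def check_sql(sql: str) -> str:
    """Return the trimmed query, or raise ValueError if it is not a single read-only query."""
    query = sql.strip().rstrip(";").strip()
    if ";" in query:
        raise ValueError("multiple statements are not allowed")
    if not re.match(r"(SELECT|WITH)\b", query, re.IGNORECASE):
        raise ValueError("only SELECT or WITH queries are allowed")
    if BLOCKED_KEYWORDS.search(query):
        raise ValueError("query contains a blocked keyword")
    return query


def add_row_limit(sql: str, max_rows: int = 1000) -> str:
    """Cap the number of returned rows at max_rows."""
    match = TRAILING_LIMIT.search(sql)
    if match:
        # Keep a smaller existing LIMIT, shrink a larger one, preserve any OFFSET.
        capped = min(int(match.group(1)), max_rows)
        return f"{sql[:match.start()]}LIMIT {capped}{match.group(2) or ''}"
    # No trailing LIMIT: append one.
    return f"{sql} LIMIT {max_rows}"
```

For example, add_row_limit(check_sql("SELECT eventName FROM cloudtrail;"), 500) yields SELECT eventName FROM cloudtrail LIMIT 500 (the table name is illustrative). The checks err on the side of rejection: a semicolon inside a string literal, or a blocked word used as a bare identifier, is refused rather than parsed, which is acceptable for an extra layer on top of the READ_ONLY connection.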
| Requirement | Details |
|---|---|
| Docker | Docker Desktop or Docker Engine + Compose v2 |
| Resources | 16 GB RAM minimum, SSD recommended |
| CloudTrail logs | .json or .json.gz files exported from AWS |
| (Optional) OpenAI API key | Required for AI query generation and result analysis |
| (Optional) MaxMind GeoLite2 | .mmdb files for GeoIP enrichment |
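CloudTrail log files exported from AWS are JSON documents whose events sit in a top-level Records array, usually gzip-compressed. The sketch below is a hypothetical Python illustration of the Phase 1 ingest idea: read both .json and .json.gz files and skip exact duplicates by SHA-256. The real ingester is written in Rust and parses files in parallel with rayon; its actual dedup key is not documented here, so iter_unique_records and the canonical-JSON hashing are assumptions.

```python
import gzip
import hashlib
import json
from pathlib import Path
from typing import IO, Iterator


def open_log(path: Path) -> IO[str]:
    """Open a CloudTrail log as text, decompressing .json.gz transparently."""
    if path.suffix == ".gz":
        return gzip.open(path, "rt", encoding="utf-8")
    return path.open("r", encoding="utf-8")


def iter_unique_records(root: Path) -> Iterator[dict]:
    """Yield every CloudTrail event under root once, skipping exact duplicates.

    Assumption (hypothetical): an event's identity is the SHA-256 of its
    canonical JSON form (sorted keys, compact separators).
    """
    seen: set[str] = set()
    files = sorted(
        p for p in root.rglob("*")
        if p.is_file() and p.name.endswith((".json", ".json.gz"))
    )
    for path in files:
        with open_log(path) as fh:
            document = json.load(fh)
        # CloudTrail log files wrap their events in a top-level "Records" array.
        for record in document.get("Records", []):
            canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
            digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
            if digest not in seen:
                seen.add(digest)
                yield record
```

For example, sum(1 for _ in iter_unique_records(Path("logs"))) counts the distinct events in the directory the Quick Start copies logs into. Hashing the whole record catches byte-identical duplicates from overlapping exports without having to trust any single field.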
# 1. Clone
git clone https://github.com/fukusuket/THuntCloud.git
cd THuntCloud/docker
# 2. Place CloudTrail logs
cp /path/to/cloudtrail/logs/*.json.gz logs/
# 3. Ingest logs
docker compose --profile ingest run --rm ingester ingest --path /data/logs
# 4. Start all services
docker compose up -d --build

Open http://localhost:8501 (AI Chat) or http://localhost:8088 (Dashboard, admin/admin).
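docker compose up -d returns before the containers are necessarily ready to serve requests. As a convenience, here is a hypothetical stdlib-only Python helper (wait_for_http and SERVICE_URLS are not part of THuntCloud) that polls the two URLs above until each server answers, treating any HTTP response, including an error status, as "up".

```python
import time
import urllib.error
import urllib.request

# Quick Start URLs; adjust if you changed the published ports.
SERVICE_URLS = {
    "AI Chat (Streamlit)": "http://localhost:8501",
    "Dashboard (Superset)": "http://localhost:8088",
}

# Bypass any HTTP(S)_PROXY environment settings: the services run locally.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_http(url: str, timeout: float = 120.0, interval: float = 2.0) -> bool:
    """Poll url until the server answers with any HTTP status; False after timeout seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with _OPENER.open(url, timeout=interval):
                return True
        except urllib.error.HTTPError as err:
            # An error status (e.g. 401/404/500) still proves the server is listening.
            err.close()
            return True
        except OSError:
            # Connection refused, reset, or timed out: the container may still be starting.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

For example, all(wait_for_http(u) for u in SERVICE_URLS.values()) blocks until both UIs respond or the timeout expires.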
Place GeoLite2 .mmdb files in docker/data/geoip/, then:
docker compose --profile ingest run --rm ingester ingest \
--path /data/logs \
--geoip-city /data/geoip/GeoLite2-City.mmdb \
--geoip-asn /data/geoip/GeoLite2-ASN.mmdb

All commands are run from the docker/ directory.
docker compose down && docker compose up -d --build # Rebuild & restart
docker compose logs -f # View logs
docker compose --profile resync run --rm superset-resync # Fix blank dashboard after re-ingest

| Module | Language | Role | README |
|---|---|---|---|
| ingester | Rust 1.85+ | CloudTrail log ingestion (READ_WRITE) | ingester/README.md |
| agent | Python 3.12+ / Streamlit | AI-assisted interactive chat for threat hunting (READ_ONLY) | agent/README.md |
| dashboard | Apache Superset | BI visualization (READ_ONLY) | dashboard/README.md |
Apache License 2.0. See LICENSE for details and NOTICE for third-party license attributions.
This project exists thanks to these wonderful projects and datasets :)
- Yamato Security: suzaku-sample-data
- Suzaku: a CloudTrail log analysis tool created by Yamato Security
- flaws.cloud: intentionally vulnerable AWS CloudTrail dataset
- Apache Superset: BI platform
- DuckDB: embedded analytical database
- SIEM on Amazon OpenSearch Service: SIEM-like CloudTrail analytics reference implementation
- AWS CloudTrail Lake query samples: CloudTrail Lake query examples
- MaxMind GeoLite2: GeoIP databases

