moneysense data is a local, production-style data platform for collecting, storing, and analyzing historical market data from Polymarket, with a strong focus on historical backtesting of trading bots.
The core goal of this project is to build a reliable data foundation for researching, testing, and iterating on algorithmic trading strategies in prediction markets.
Building trading bots without high-quality historical data is pointless, and reliable datasets for backtesting Polymarket markets are hard to find.
Polymarket does not provide clean, replayable datasets out of the box, especially for high-frequency signals like price changes, spreads, and short-term market dynamics. Its historical data endpoints return only the final price of each candle, which is insufficient for backtesting because price moves within a candle can be significant.
This platform solves that by:
- Recording live market price change events
- Storing them in queryable, compressed formats (ClickHouse and Parquet)
- Making it easy to replay history and run backtests
If you are experimenting with trading bots on Polymarket, this project is meant to be the data layer underneath them.
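To make the "replay history" idea concrete, here is a minimal sketch (not part of the repo) of a backtest loop over recorded events. It assumes the Parquet layout and column names (`event_type`, `source_ts`) shown later in this README; `on_event` is a hypothetical strategy hook.

```python
import duckdb

def replay(parquet_glob: str, on_event) -> None:
    """Feed recorded price_change events to a strategy, in source-time order."""
    conn = duckdb.connect()
    rows = conn.execute(
        f"SELECT * FROM read_parquet('{parquet_glob}') "
        "WHERE event_type = 'price_change' ORDER BY source_ts"
    ).fetchall()
    for row in rows:
        on_event(row)  # your strategy reacts to each historical event

# Example (hypothetical local path; use an s3:// path after the httpfs setup shown below):
# replay("data/date=2024-01-15/market=sol-updown-15m/hour=14/*.parquet", print)
```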
Flow:
- A Go-based scraper streams live Polymarket market events
- Events are published to Kafka-compatible infrastructure (Redpanda)
- The same stream is consumed by:
  - a cold-path archivist writing Parquet files to object storage (MinIO) for long-term storage and backtesting
  - (optional) a hot-path consumer writing to ClickHouse for real-time analytics and monitoring
- Data is used for trading bot backtesting and analytics
Note: For backtesting, you can read Parquet files directly from MinIO using DuckDB or Polars. The consumer service is optional and mainly useful for real-time monitoring and analytics.
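If you want to tail the raw event stream yourself, here is a minimal sketch (not part of the repo) using the `kafka-python` client. The topic name matches the scraper configuration below; the exposed listener address and the JSON payload fields are assumptions.

```python
import json
from kafka import KafkaConsumer  # pip install kafka-python

consumer = KafkaConsumer(
    "moneysense.events.raw",
    bootstrap_servers="localhost:9092",  # adjust to the Redpanda listener you expose
    auto_offset_reset="earliest",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for msg in consumer:
    event = msg.value
    # Assumed payload fields; check the scraper README for the exact schema
    if event.get("event_type") == "price_change":
        print(event.get("market"), event.get("source_ts"))
```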
Service Profiles:
- `--profile cold`: Scraper + Archivist (Parquet only, best for backtesting)
- `--profile hot`: Scraper + Consumer (ClickHouse only, best for real-time analytics)
- `--profile apps`: Scraper + Consumer + Archivist (full platform)
The platform uses Docker Compose profiles to let you choose which services to run:
Option 1: Cold path only (Parquet archiving for backtesting)
`docker compose --profile cold up -d --build`

Starts: scraper + archivist → writes Parquet files to MinIO
Option 2: Hot path only (Real-time analytics)
`docker compose --profile hot up -d --build`

Starts: scraper + consumer → writes to ClickHouse for real-time queries
Option 3: Both paths (Full platform)
`docker compose --profile apps up -d --build`

Starts: scraper + consumer + archivist → both ClickHouse and Parquet
Option 4: With monitoring tools
`docker compose --profile apps --profile monitoring up -d --build`

Adds Redpanda Console (web UI) and Monitor (terminal dashboard)
You can modify the configuration in the `docker-compose.yml` file.
You can find more information about each service in the README files inside the `apps` directory:
- `apps/scraper/README.md`
- `apps/consumer/README.md`
- `apps/archivist/README.md`

You can configure the markets to scrape by modifying the `MARKETS` environment variable in the `docker-compose.yml` file. For example:
scraper:
  build:
    context: ./apps/scraper
    dockerfile: Dockerfile
  container_name: moneysense-scraper
  profiles: ["apps"]
  depends_on:
    redpanda:
      condition: service_healthy
  volumes:
    - ./recordings:/data
  environment:
    SINK_MODE: kafka
    KAFKA_BROKERS: redpanda:9092
    KAFKA_TOPIC: moneysense.events.raw
    MARKETS: sol-updown-15m,btc-updown-15m
    DISCORD_WEBHOOK_URL:
    LOG_FILE: logs/scraper.log

The `MARKETS` environment variable is a comma-separated list of markets to scrape, without the timestamp suffix.
Parquet files in MinIO are partitioned by `date=YYYY-MM-DD/market=.../hour=HH/` where:
- Date and hour are in UTC
- All timestamps (`source_ts`) are stored in UTC, and partitioning uses the UTC timezone
- This ensures consistent partitioning regardless of your local timezone
- Example: an event at `2024-01-15T14:30:00Z` is stored in `date=2024-01-15/market=sol-updown-15m/hour=14/` (see the sketch below)
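As an illustration (not code from the repo), this sketch derives the expected partition prefix for a given UTC timestamp and market, following the layout above:

```python
from datetime import datetime, timezone

def partition_prefix(source_ts: str, market: str) -> str:
    """Map a UTC ISO-8601 timestamp to its date=/market=/hour= prefix."""
    ts = datetime.fromisoformat(source_ts.replace("Z", "+00:00")).astimezone(timezone.utc)
    return f"date={ts:%Y-%m-%d}/market={market}/hour={ts:%H}/"

print(partition_prefix("2024-01-15T14:30:00Z", "sol-updown-15m"))
# date=2024-01-15/market=sol-updown-15m/hour=14/
```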
To run backtests, query the partitioned Parquet files directly from MinIO:
Using DuckDB (Recommended for backtesting):
import duckdb
conn = duckdb.connect()
# Configure S3 access to MinIO
conn.execute("""
INSTALL httpfs;
LOAD httpfs;
SET s3_endpoint='localhost:9000';
SET s3_access_key_id='minioadmin';
SET s3_secret_access_key='minioadmin_change_me';
SET s3_use_ssl=false;
SET s3_url_style='path';  -- MinIO requires path-style URLs
""")
# Read Parquet files directly from MinIO
df = conn.execute("""
SELECT *
FROM read_parquet('s3://lakehouse/date=2024-01-15/market=sol-updown-15m/hour=14/*.parquet')
WHERE event_type = 'price_change'
ORDER BY source_ts
""").df()
# Or query multiple partitions
df = conn.execute("""
SELECT *
FROM read_parquet([
's3://lakehouse/date=2024-01-15/market=sol-updown-15m/hour=14/*.parquet',
's3://lakehouse/date=2024-01-15/market=sol-updown-15m/hour=15/*.parquet'
])
""").df()Using Polars:
import polars as pl
# Read Parquet files from MinIO
df = pl.read_parquet(
    "s3://lakehouse/date=2024-01-15/market=sol-updown-15m/hour=14/*.parquet",
    storage_options={
        "endpoint_url": "http://localhost:9000",
        "access_key_id": "minioadmin",
        "secret_access_key": "minioadmin_change_me",
        "aws_allow_http": "true"
    }
)
# Filter and process
price_changes = df.filter(pl.col("event_type") == "price_change")

Using ClickHouse S3 Tables:
-- Create external table pointing to MinIO
CREATE TABLE events_parquet_s3
ENGINE = S3(
    'http://minio:9000/lakehouse/date=*/market=*/hour=*/*.parquet',
    'minioadmin',
    'minioadmin_change_me',
    'Parquet'
);
-- Query the data
SELECT * FROM events_parquet_s3
WHERE market = 'sol-updown-15m'
AND toDate(source_ts) = '2024-01-15'
LIMIT 100;

# Connect to ClickHouse
docker exec -it moneysense-clickhouse clickhouse-client
# Query recent events
SELECT * FROM polymarket.events_raw
ORDER BY ingested_ts DESC
LIMIT 100;
# Aggregate by market
SELECT market, count() as events,
countIf(event_type = 'price_change') as price_changes
FROM polymarket.events_raw
GROUP BY market;
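If you prefer to hit the hot path from Python instead of the CLI, a minimal sketch with the `clickhouse-connect` client might look like this; the host, port, and default credentials are assumptions, so check your `docker-compose.yml`.

```python
import clickhouse_connect  # pip install clickhouse-connect

# Assumed connection details; adjust to the ports/credentials in docker-compose.yml
client = clickhouse_connect.get_client(host="localhost", port=8123)

result = client.query(
    """
    SELECT market,
           count() AS events,
           countIf(event_type = 'price_change') AS price_changes
    FROM polymarket.events_raw
    GROUP BY market
    """
)
for market, events, price_changes in result.result_rows:
    print(market, events, price_changes)
```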
