PyPI Download Analytics for compliance-trestle

Recent insights into PyPI package adoption and usage patterns

This repository contains automated BigQuery analytics and reports for PyPI packages.

Report Date: 2026-06-15


📊 Version Adoption Trends

Quarterly download trends by major version over the last 3 years. Shows version adoption patterns and migration trends across releases.

Quarterly Version Trends


🔑 Key Metrics Summary

Metric                     30 Days   90 Days
Total Downloads            96,839    238,594
Countries Reached          52        67
CI/CD Installs             66.8%     67.8%
UV Adoption                15.3%     18.0%
uvx Downloads (MCP proxy)  185       420

🌍 Geographic Distribution

30-Day Analysis

90-Day Analysis

30-Day Geographic Distribution

90-Day Geographic Distribution

Countries (30 days):

Code Country Downloads Share
US United States 89,081 92.0%
SG Singapore 2,063 2.1%
AE United Arab Emirates 1,165 1.2%
RU Russian Federation 830 0.9%
CN China 516 0.5%
SE Sweden 369 0.4%
JP Japan 353 0.4%
GB United Kingdom 345 0.4%
FR France 323 0.3%
HK Hong Kong 296 0.3%
ES Spain 279 0.3%
DE Germany 261 0.3%
IN India 145 0.1%
CH Switzerland 121 0.1%
CA Canada 116 0.1%
AU Australia 100 0.1%
IE Ireland 91 0.1%
NL Netherlands 77 0.1%
KR Korea, Republic of 41 0.0%
PL Poland 30 0.0%
PT Portugal 26 0.0%
IL Israel 26 0.0%
FI Finland 18 0.0%
PK Pakistan 18 0.0%
AT Austria 15 0.0%
NO Norway 14 0.0%
DK Denmark 14 0.0%
TW Taiwan, Province of China 13 0.0%
SA Saudi Arabia 12 0.0%
BE Belgium 10 0.0%
GR Greece 10 0.0%
IT Italy 9 0.0%
VN Viet Nam 7 0.0%
QA Qatar 6 0.0%
CZ Czechia 4 0.0%
RO Romania 4 0.0%
ZA South Africa 4 0.0%
MD Moldova, Republic of 3 0.0%
EE Estonia 3 0.0%
CL Chile 2 0.0%
BG Bulgaria 2 0.0%
BR Brazil 2 0.0%
CO Colombia 2 0.0%
CR Costa Rica 2 0.0%
TH Thailand 2 0.0%
RS Serbia 2 0.0%
MA Morocco 2 0.0%
AD Andorra 1 0.0%
BD Bangladesh 1 0.0%
ID Indonesia 1 0.0%
NZ New Zealand 1 0.0%
PE Peru 1 0.0%

Countries (90 days):

Code Country Downloads Share
US United States 219,017 91.8%
SG Singapore 6,455 2.7%
RU Russian Federation 1,811 0.8%
CN China 1,791 0.8%
GB United Kingdom 1,628 0.7%
AE United Arab Emirates 1,171 0.5%
JP Japan 954 0.4%
DE Germany 735 0.3%
FR France 640 0.3%
KR Korea, Republic of 621 0.3%
ES Spain 491 0.2%
HK Hong Kong 486 0.2%
SE Sweden 422 0.2%
CA Canada 404 0.2%
IN India 371 0.2%
AU Australia 273 0.1%
TW Taiwan, Province of China 231 0.1%
IE Ireland 191 0.1%
CH Switzerland 171 0.1%
NL Netherlands 144 0.1%
PT Portugal 64 0.0%
FI Finland 52 0.0%
AT Austria 52 0.0%
IT Italy 47 0.0%
SA Saudi Arabia 39 0.0%
IL Israel 35 0.0%
PL Poland 35 0.0%
NO Norway 31 0.0%
DK Denmark 29 0.0%
PK Pakistan 24 0.0%
BE Belgium 17 0.0%
RO Romania 17 0.0%
VN Viet Nam 14 0.0%
EE Estonia 14 0.0%
QA Qatar 13 0.0%
CZ Czechia 12 0.0%
CR Costa Rica 11 0.0%
GR Greece 10 0.0%
BR Brazil 6 0.0%
CO Colombia 5 0.0%
BD Bangladesh 4 0.0%
ZA South Africa 4 0.0%
CL Chile 4 0.0%
NZ New Zealand 4 0.0%
PE Peru 3 0.0%
MD Moldova, Republic of 3 0.0%
MX Mexico 3 0.0%
PR Puerto Rico 3 0.0%
RS Serbia 2 0.0%
CY Cyprus 2 0.0%
AZ Azerbaijan 2 0.0%
TN Tunisia 2 0.0%
TH Thailand 2 0.0%
ID Indonesia 2 0.0%
GH Ghana 2 0.0%
BG Bulgaria 2 0.0%
LU Luxembourg 2 0.0%
MA Morocco 2 0.0%
MY Malaysia 2 0.0%
PH Philippines 2 0.0%
PY Paraguay 2 0.0%
AD Andorra 1 0.0%
GE Georgia 1 0.0%
EG Egypt 1 0.0%
LI Liechtenstein 1 0.0%
LT Lithuania 1 0.0%
TR Türkiye 1 0.0%

Key Insights:

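  • The United States dominates downloads (92.0% over 30 days, 91.8% over 90 days), consistent across both periods
  • Downloads came from 52 countries over 30 days and 67 over 90 days, although every country other than the United States accounts for less than 3% of the total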

🤖 MCP (Model Context Protocol) Usage Analysis

What is MCP?

MCP (Model Context Protocol) is an open protocol, introduced by Anthropic, for connecting AI assistants such as Claude to external tools and data sources. MCP clients such as Claude Desktop often launch Python-based MCP servers through uvx (uv's tool runner), which downloads the server's package from PyPI.

Detection Methodology

Since MCP servers don't explicitly identify themselves in PyPI logs, we use proxy signals with significant limitations:

  1. HIGH Confidence: uvx subcommand usage (MCP's recommended pattern, but also used for other tools)
  2. Contextual: UV vs pip adoption trends (UV is MCP's recommended installer)
  3. Observational: CI vs non-CI patterns (shows usage context, not MCP specifically)

Important Limitations:

  • Install vs Usage: PyPI data shows package downloads, not actual execution - packages may be installed but never run
  • uvx Ambiguity: The uvx command is used for many tools beyond MCP servers (any Python CLI tool can be run via uvx)
  • Non-CI Context: Non-CI downloads don't isolate MCP usage - most PyPI downloads are non-CI regardless of use case
  • CI Detection Issues: The details.ci field in BigQuery is reported by the installer itself rather than verified independently (pip, for example, sets it when environment variables such as CI or BUILD_ID are present), so it is unreliable: CI systems that set none of the expected variables go undetected, and non-CI environments that happen to set one are counted as CI
  • User-Agent Limitations: Cannot distinguish MCP from other UV usage without access to raw user-agent strings, which are not available in the public BigQuery dataset
  • Proxy Signals Only: All MCP detection relies on indirect signals (installer choice, subcommand usage) rather than explicit MCP identification
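
The proxy signals and their caveats above can be sketched as a small classifier. This is a minimal illustration, not this repository's actual code: the row shape, the `installer`, `subcommand`, and `ci` keys, and the label strings are all assumptions made for the example.

```python
from collections import Counter
from typing import Optional


def mcp_signal(installer: Optional[str], subcommand: Optional[str], ci: Optional[bool]) -> str:
    """Label one download row with the strongest proxy signal it carries.

    The three inputs are hypothetical, pre-extracted fields. The labels
    describe signal strength only; none of them confirms MCP usage.
    """
    if installer != "uv":
        return "no-signal"      # pip, poetry, etc. carry no MCP hint
    if subcommand == "uvx":
        return "uvx"            # strongest proxy, but uvx also runs non-MCP tools
    if ci is True:
        return "uv-ci"          # automated install, unlikely to be an MCP client
    return "uv-non-ci"          # contextual signal only


def summarize(rows: list[dict]) -> Counter:
    """Count rows per signal label."""
    return Counter(
        mcp_signal(r.get("installer"), r.get("subcommand"), r.get("ci"))
        for r in rows
    )


# Made-up rows, purely to exercise the function.
sample = [
    {"installer": "uv", "subcommand": "uvx", "ci": None},
    {"installer": "uv", "subcommand": "sync", "ci": True},
    {"installer": "uv", "subcommand": None, "ci": None},
    {"installer": "pip", "subcommand": None, "ci": True},
]
print(summarize(sample))
```

Checking `uvx` before the CI flag is a design choice for this sketch: a uvx download inside CI is still counted under the stronger signal, which keeps the uvx count an upper bound on MCP-driven installs.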

MCP Analysis Charts

1. Installer Utilized

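Shows which installer was used to download the package (pip, uv, poetry, or another tool). UV share is only a loose proxy for MCP activity: MCP clients commonly launch servers through UV, but most UV downloads come from ordinary development and CI workflows.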

30 Days

Installer Share 30d

UV: 15.3% of downloads (14,863)

90 Days

Installer Share 90d

UV: 18.0% of downloads (42,922)

2. UV Subcommands (uvx = MCP Pattern)

Breaks down all UV downloads by the UV subcommand that triggered them. uvx is the standard way MCP clients (e.g., Claude Desktop, Cline) launch Python-based MCP servers.

30 Days

UV Subcommands 30d

185 uvx downloads (HIGH-confidence MCP signal, not confirmed MCP usage)

90 Days

UV Subcommands 90d

420 uvx downloads (HIGH-confidence MCP signal, not confirmed MCP usage)

UV Subcommand Meanings:

  • sync - Synchronize project dependencies → CI/CD pipelines, developers syncing environments
  • pip install - UV's pip-compatible install command → CI/CD, automated builds, legacy workflows
  • no subcommand - UV downloads without subcommand data → Older UV versions or incomplete logging
  • run - Run a script in a virtual environment → Developers, test runners, automation scripts
  • tool install - Install a tool globally → Developers setting up their environment
  • uvx - Run a tool without installing it → MCP clients (Claude Desktop, Cline), developers trying tools
  • lock - Generate a lockfile for dependencies → Developers, CI/CD for reproducible builds
  • pip compile - Compile requirements files → CI/CD, dependency management workflows
  • add - Add a dependency to the project → Developers adding new packages
  • tool run - Run an installed tool → Developers, automation scripts
  • tool upgrade - Upgrade an installed tool → Developers maintaining tools

3. CI vs Non-CI Usage

Separates automated CI/CD installs from other downloads for pip, uv, poetry, and other installers.

30 Days

CI vs Non-CI 30d

UV: 32.9% non-CI (4,894 downloads)

90 Days

CI vs Non-CI 90d

UV: 41.5% non-CI (17,800 downloads)

4. Daily UV Trend

Time series showing daily UV download trends. Highlights confirmed uvx subcommand usage (MCP pattern) alongside total UV downloads to visualize MCP adoption patterns over time.

30 Days

Daily Trend 30d

185 uvx downloads over 30 days

90 Days

Daily Trend 90d

420 uvx downloads over 90 days

Key Findings:

30-Day Analysis:

  1. uvx Downloads (MCP proxy): 185 downloads using the uvx subcommand
  2. UV Adoption: 15.3% of downloads
  3. Non-CI Usage: 32.9% of UV downloads are non-CI

The MCP signal is detectable but small: uvx accounts for 185 of 96,839 downloads (about 0.2%). The broader story is UV's adoption as a modern Python installer.

90-Day Analysis:

  1. uvx Downloads (MCP proxy): 420 downloads using the uvx subcommand
  2. UV Adoption: 18.0% of downloads
  3. Non-CI Usage: 41.5% of UV downloads are non-CI

The MCP signal is detectable but small: uvx accounts for 420 of 238,594 downloads (about 0.2%). The broader story is UV's adoption as a modern Python installer.
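
The percentages quoted in the findings above can be reproduced from the raw counts printed elsewhere in this report. The snippet below is a small worked check. The `share` helper and the dictionary layout are illustrative only; the numbers are copied from the 30-day and 90-day sections.

```python
def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


# Counts copied from this report (total downloads, UV downloads,
# non-CI UV downloads, and uvx downloads for each window).
report = {
    "30d": {"total": 96_839, "uv": 14_863, "uv_non_ci": 4_894, "uvx": 185},
    "90d": {"total": 238_594, "uv": 42_922, "uv_non_ci": 17_800, "uvx": 420},
}

for window, c in report.items():
    print(
        window,
        "| UV share:", share(c["uv"], c["total"]),
        "| non-CI share of UV:", share(c["uv_non_ci"], c["uv"]),
        "| uvx share of all downloads:", share(c["uvx"], c["total"]),
    )
```

Note the two denominators: UV adoption is measured against all downloads, while the non-CI figure is measured against UV downloads only.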


🚀 Deployment Environment Analysis

Platform Distribution

Categorizes downloads by platform based on OS and distribution detection. Identifies AWS (Amazon Linux), Containers (Alpine), Enterprise (RHEL), Ubuntu, Debian, macOS, Windows, and other platforms. Shows the overall platform mix of package users.

30 Days

Platform Distribution 30d

90 Days

Platform Distribution 90d

Deployment Types

Shows the distribution of downloads across different deployment environments, automatically categorized based on OS, distribution, libc type, and CI detection. Categories may include containers, cloud VMs, CI/CD pipelines, and developer workstations.

30 Days

Deployment Types 30d

90 Days

Deployment Types 90d

Architecture Distribution

Shows CPU architecture breakdown (x86_64, ARM64, etc.) detected from download metadata. Tracks adoption of ARM-based systems like AWS Graviton and Apple Silicon.

30 Days

Architecture Distribution 30d

90 Days

Architecture Distribution 90d

Enterprise vs Cloud-Native

Compares traditional enterprise Linux distributions (RHEL, CentOS) against cloud-native platforms (Amazon Linux, Alpine). Indicates adoption patterns in regulated vs cloud-first environments.

30 Days

Enterprise vs Cloud-Native 30d

90 Days

Enterprise vs Cloud-Native 90d

libc Distribution (Container Signal)

Shows the distribution of C library implementations (glibc vs musl). musl libc is a strong indicator of containerized deployments, particularly Alpine Linux in Docker/Kubernetes.

30 Days

libc Distribution 30d

90 Days

libc Distribution 90d

Deployment Context

Categorizes downloads by deployment scenario based on OS type, Linux distribution, and CI detection. Shows patterns like containerized pipelines (Alpine+CI), cloud automation (Amazon Linux+CI), enterprise Linux (RHEL), CI environments, developer workstations (macOS/Windows), and other contexts.
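
A rule-based mapping of this kind might look like the sketch below. It is an assumption-laden illustration rather than the report's actual logic: the function name, the exact distribution strings it matches, the rule order, and the output labels are all hypothetical.

```python
from typing import Optional


def deployment_context(system: Optional[str], distro: Optional[str], ci: Optional[bool]) -> str:
    """Map OS name, Linux distribution, and CI flag to a coarse context label.

    Rules are checked from most specific to least specific, so a row that is
    both Alpine and CI lands in the container bucket, not the generic CI one.
    """
    distro_l = (distro or "").lower()
    in_ci = ci is True  # a missing flag (None) is treated as non-CI

    if "alpine" in distro_l and in_ci:
        return "containerized pipeline"
    if "amazon linux" in distro_l and in_ci:
        return "cloud automation"
    if "red hat" in distro_l or "rhel" in distro_l:
        return "enterprise linux"
    if in_ci:
        return "ci environment"
    if system in ("Darwin", "Windows"):
        return "developer workstation"
    return "other"


# Made-up inputs, purely to exercise each rule.
for row in [
    ("Linux", "Alpine Linux", True),
    ("Linux", "Amazon Linux", True),
    ("Linux", "Red Hat Enterprise Linux", None),
    ("Linux", "Ubuntu", True),
    ("Darwin", None, None),
    ("Linux", "Ubuntu", None),
]:
    print(row, "->", deployment_context(*row))
```

Because the CI flag is itself unreliable, any bucket that depends on it inherits that uncertainty.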

30 Days

Use Cases 30d

90 Days

Use Cases 90d

Deployment Summary

Key deployment metrics at a glance: container adoption, cloud provider usage, enterprise deployment, CI/CD percentage, ARM architecture adoption, and musl libc usage.

30 Days

Deployment Summary 30d

90 Days

Deployment Summary 90d



🔄 Automated Updates

This repository is automatically updated weekly by GitHub Actions:

  • Schedule: Weekly on Mondays at 6 AM UTC (2 AM EDT / 1 AM EST)
  • Authentication: Service account JSON key stored in GitHub secrets
  • Manual trigger: Available via GitHub Actions UI
  • Setup guide: See SETUP.md

🔍 Data Sources & Methodology

Data Source: Google BigQuery public dataset bigquery-public-data.pypi.file_downloads

Analysis Period:

  • 30-day reports: Last 30 days from data fetch date
  • 90-day reports: Last 90 days from data fetch date
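
As an illustration of how such a window could be expressed against the public dataset, the sketch below builds a query string in Python. It is not this repository's actual query: the `build_query` helper, the half-open date window, and the chosen output columns are assumptions. The table and the `file.project`, `timestamp`, `country_code`, and `details.installer.name` fields are part of the public dataset.

```python
from datetime import date, timedelta


def build_query(project: str, fetch_date: date, days: int) -> str:
    """Return a query counting downloads per country and installer.

    The window is assumed to be [fetch_date - days, fetch_date). Filtering
    on the timestamp limits how much of the table BigQuery scans. The
    project name is interpolated directly for readability; real code
    should use query parameters instead.
    """
    start = fetch_date - timedelta(days=days)
    return f"""
SELECT
  country_code,
  details.installer.name AS installer,
  COUNT(*) AS downloads
FROM `bigquery-public-data.pypi.file_downloads`
WHERE file.project = '{project}'
  AND DATE(timestamp) >= DATE '{start.isoformat()}'
  AND DATE(timestamp) < DATE '{fetch_date.isoformat()}'
GROUP BY country_code, installer
ORDER BY downloads DESC
""".strip()


query_30d = build_query("compliance-trestle", date(2026, 6, 15), 30)
query_90d = build_query("compliance-trestle", date(2026, 6, 15), 90)
print(query_30d)
```

Building both windows from the same fetch date keeps the 30-day and 90-day reports aligned to the same end point.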

Update Frequency:

  • Automated: Weekly via GitHub Actions (see the schedule under Automated Updates)
  • Caching: Data fetched once per day, cached locally to minimize BigQuery costs
  • Cache Management: Old cache files automatically removed after successful new fetch
  • Manual trigger: Available for on-demand updates
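
The caching behaviour described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the repository's implementation: the `load_or_fetch` helper, the `downloads-YYYY-MM-DD.json` file naming, and the stand-in `fake_fetch` function are all hypothetical.

```python
import tempfile
from datetime import date
from pathlib import Path
from typing import Callable


def load_or_fetch(cache_dir: Path, today: date, fetch: Callable[[], str]) -> str:
    """Return today's payload, calling fetch() at most once per day."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    todays_file = cache_dir / f"downloads-{today.isoformat()}.json"
    if todays_file.exists():
        return todays_file.read_text()   # cache hit: no new query is issued
    payload = fetch()                    # if this raises, old cache files stay intact
    todays_file.write_text(payload)
    for old in cache_dir.glob("downloads-*.json"):
        if old != todays_file:
            old.unlink()                 # prune only after a successful fetch
    return payload


# Demonstration with a fake fetcher that records how often it is called.
with tempfile.TemporaryDirectory() as tmp:
    calls = []

    def fake_fetch() -> str:
        calls.append(1)
        return '{"rows": []}'

    cache = Path(tmp)
    load_or_fetch(cache, date(2026, 6, 14), fake_fetch)
    load_or_fetch(cache, date(2026, 6, 15), fake_fetch)  # new day: fetch, then prune
    load_or_fetch(cache, date(2026, 6, 15), fake_fetch)  # same day: served from cache
    remaining = sorted(p.name for p in cache.iterdir())
    print(len(calls), remaining)
```

Pruning after the write, rather than before the fetch, means a failed query never leaves the cache empty.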

Privacy: All data comes from PyPI's public dataset. No personal information is collected or stored.


Analytics powered by Google BigQuery and GitHub Actions
