Skip to content

datanexum/databridge-core

Repository files navigation

DataBridge Core

PyPI version Python License: MIT

Your finance team just spent 4 hours on VLOOKUP. This takes 5 seconds.

DataBridge Core is a Python toolkit for data reconciliation, profiling, ingestion, and Excel triage. Compare CSV files, find fuzzy matches, detect schema drift, scan Excel workbooks, and send results to Slack -- from the command line or Python.

pip install databridge-core

5-Second Demo

# Profile a file
databridge profile sales.csv

# Compare two sources -- find orphans, conflicts, match rate
databridge compare source.csv target.csv --keys id

# Fuzzy match names across systems
databridge fuzzy erp_accounts.csv gl_accounts.csv --column name --threshold 80

# Scan Excel files and classify by archetype
pip install 'databridge-core[triage]'
databridge triage ./excel_files/

Python API

from databridge_core import compare_hashes, profile_data, load_csv

# Profile your data
profile = profile_data("chart_of_accounts.csv")
print(f"{profile['rows']} rows, {profile['columns']} columns")
print(f"Potential keys: {profile['potential_key_columns']}")

# Compare two sources
result = compare_hashes("source.csv", "target.csv", key_columns="account_id")
stats = result["statistics"]
print(f"Match rate: {stats['match_rate_percent']}%")
print(f"Conflicts: {stats['conflicts']}, Orphans: {stats['total_orphans']}")

Smart Excel Import

from databridge_core import smart_import_excel, detect_anchor_cell

# Auto-detect header rows, skip junk, clean column names
df = smart_import_excel("messy_report.xlsx")

# Find the real data start in a complex spreadsheet
anchor = detect_anchor_cell("messy_report.xlsx")
print(f"Data starts at: {anchor['cell']}")

Templates

from databridge_core.templates import TemplateService

svc = TemplateService(templates_dir="templates")
templates = svc.list_templates(domain="accounting")
rec = svc.get_template_recommendations(industry="manufacturing", statement_type="pl")

Slack Integration

from databridge_core.integrations import SlackClient

slack = SlackClient(bot_token="xoxb-...")
slack.send_message("#data-ops", "Reconciliation complete: 99.5% match rate")
slack.post_reconciliation_report("#data-ops", result)

Excel Triage

from databridge_core.triage import scan_and_classify

result = scan_and_classify("./excel_files/", output_dir="./reports/")
print(f"Scanned {result['summary']['total_files']} files")
print(f"Archetypes: {result['summary']['archetype_counts']}")

Commands

Command Description
databridge profile <file> Profile data: structure, quality, cardinality
databridge compare <a> <b> --keys <col> Hash comparison: orphans, conflicts, match rate
databridge fuzzy <a> <b> -c <col> Fuzzy match columns across two files
databridge diff <a> <b> Text diff between two files
databridge drift <old> <new> Detect schema drift between CSVs
databridge transform <file> -c <col> --op upper Clean a column (upper/lower/strip/trim/remove_special)
databridge merge <a> <b> --keys <col> Merge two CSVs on key columns
databridge find "*.csv" Find files matching a pattern
databridge parse <text> Parse tabular data from messy text
databridge triage <dir> Scan Excel files and classify by archetype
databridge smart-import <file> Smart Excel import with anchor detection

Optional Extras

pip install 'databridge-core[fuzzy]'    # Fuzzy matching (rapidfuzz)
pip install 'databridge-core[pdf]'      # PDF text extraction (pypdf)
pip install 'databridge-core[ocr]'      # OCR image extraction (pytesseract)
pip install 'databridge-core[sql]'      # Database queries (sqlalchemy)
pip install 'databridge-core[triage]'   # Excel triage scanning (openpyxl)
pip install 'databridge-core[all]'      # Everything
pip install 'databridge-core[dev]'      # Development tools (pytest, ruff, build)

Modules

Module Description Extra Required
reconciler Hash comparison, fuzzy matching, diffing, merging -
profiler Data profiling, schema drift detection, expectations -
ingestion CSV, JSON, PDF, OCR, smart Excel import [pdf], [ocr]
linker Entity resolution and record linkage -
connectors Snowflake, database connectors [sql]
detection ERP detection, anomaly detection -
templates Industry hierarchy templates, skills, knowledge base -
integrations Slack client (BaseClient + SlackClient) -
triage Batch Excel scanning and archetype classification [triage]

Built for Finance

DataBridge Core is the open-source foundation of DataBridge AI -- a full platform for financial hierarchy management, dbt model generation, and enterprise data reconciliation with 336+ MCP tools.

How it works: Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config.

What's Next?

DataBridge Core provides the SDK foundation. For the full platform experience:

  • MCP Server (336+ tools): Headless AI-native data engine
  • Docker: docker run -p 786:786 ghcr.io/datanexum/databridge-mcp:latest
  • Claude Code Plugin: claude plugin install datanexum/databridge-plugin
  • Remote SSE: https://mcp.databridge.dataamplifier.io/sse

See the full documentation for details.

Changelog

See CHANGELOG.md for full version history.

License

MIT -- See LICENSE.

About

Upload your Chart of Accounts. Get a production-ready financial hierarchy and dbt models. Zero config.

Resources

License

Stars

Watchers

Forks

Packages

 
 
 

Contributors

Languages