
Yashasvi2229/DataUnion


Team: Waystar RoyCo
Theme: Open Innovation



DataUnion

Ethical AI Data Economy

Empowering Transparency and Consent in the AI Data Economy

Where contributors own their data • Companies get quality datasets • Everyone wins

Next.js TypeScript Supabase Tailwind

๐Ÿ“ Architecture โ€ข ๐Ÿš€ Scalability โ€ข ๐Ÿ’ฐ Finance โ€ข ๐Ÿ“Š Research โ€ข ๐Ÿ”ฎ Round 2 Updates




🚨 The Crisis in AI Data

The AI industry has a dirty secret: most training data is acquired without consent, compensation, or transparency.

The Current Reality

  • ๐Ÿดโ€โ˜ ๏ธ Unauthorized scraping is the industry standard
  • โš–๏ธ Billion-dollar lawsuits (NYT vs OpenAI, Getty vs Stability AI)
  • ๐Ÿ”’ Zero consent from people whose data powers AI
  • ๐Ÿ’ธ No compensation for creators and contributors
  • ๐Ÿ“‰ Poor quality from unverified, untraceable sources
  • ๐ŸŒ Legal uncertainty threatening AI innovation

The Impact

  • $1.5B settlement paid by Anthropic for copyright infringement
  • €250M fine imposed on Google by French regulators
  • $67.4B annual losses from AI hallucinations (2024)
  • 50+ active lawsuits targeting AI companies for data theft
  • Regulatory crackdown underway (EU AI Act fines of up to €35M)

📊 See RESEARCH.md for detailed analysis


💡 Introducing DataUnion

The world's first consent-driven AI data marketplace

We're building the infrastructure to make ethical AI development the new standard


🎯 How It Works

👥 For Contributors

OWN YOUR DATA

  • Full control over usage rights
  • Fair compensation for every use
  • Complete transparency & tracking
  • Revoke consent anytime
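The contributor controls above (usage rights plus revocation) come down to a check that must pass before every licensed use. The sketch below is a minimal illustration under stated assumptions: the `UsagePurpose` values, the `ConsentRecord` shape, and the `isUseAllowed` / `revokeConsent` helpers are hypothetical and are not taken from DataUnion's actual schema (supabase/schema.sql).

```typescript
// Hypothetical usage purposes a contributor can opt into; not DataUnion's real schema.
type UsagePurpose = "model_training" | "evaluation" | "commercial_redistribution";

interface ConsentRecord {
  contributorId: string;
  datasetId: string;
  allowedPurposes: Set<UsagePurpose>; // granular, per-purpose permissions
  revokedAt: Date | null;             // set when the contributor withdraws consent
}

// True only if consent covers this purpose and had not been revoked at the time of use.
function isUseAllowed(consent: ConsentRecord, purpose: UsagePurpose, at: Date = new Date()): boolean {
  if (consent.revokedAt !== null && consent.revokedAt.getTime() <= at.getTime()) {
    return false; // revocation blocks every use at or after the revocation time
  }
  return consent.allowedPurposes.has(purpose);
}

// Revocation returns a new record instead of mutating the old one,
// so the earlier state can still be kept for the audit trail.
function revokeConsent(consent: ConsentRecord, at: Date = new Date()): ConsentRecord {
  return { ...consent, revokedAt: at };
}
```

Keeping revocation time-stamped (rather than deleting the record) means uses licensed before the revocation remain explainable, while any later use is refused.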

๐Ÿข For AI Companies

LICENSE WITH CONFIDENCE

  • Legally-sourced, consented data
  • Quality-verified datasets
  • Immutable audit trails
  • GDPR & AI Act compliant

๐ŸŒ For Society

ETHICAL AI FUTURE

  • Fair data economy
  • Full traceability
  • Legal certainty
  • Trust-based innovation

✨ Core Features

| Feature | Description | Benefit |
| --- | --- | --- |
| 🔐 Explicit Consent | Granular permissions with one-click revocation | You stay in control |
| 📊 Full Traceability | Immutable audit logs for every transaction | Complete transparency |
| 💰 Fair Compensation | Automatic payouts based on actual usage | Get paid what you deserve |
| 🤖 AI Quality Engine | Validates data integrity, assigns quality scores | Higher-value datasets |
| 🏪 Transparent Marketplace | Browse verified datasets with clear pricing | No hidden terms |
| 📈 Real-time Analytics | Track your data's impact and earnings | Stay informed |
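The table promises immutable audit logs but does not say how immutability is enforced. One common technique is hash chaining: each entry stores the hash of the previous entry, so editing or reordering history breaks verification. The sketch below assumes that technique; `AuditEntry`, `appendEntry`, and `verifyChain` are hypothetical names, and a Supabase deployment would more likely enforce the same idea with an append-only table.

```typescript
import { createHash } from "node:crypto";

// Hypothetical audit entry; DataUnion's real log schema may differ.
interface AuditEntry {
  index: number;
  timestamp: string; // ISO-8601
  actor: string;     // e.g. a company ID
  action: string;    // e.g. "dataset.access"
  datasetId: string;
  prevHash: string;  // hash of the previous entry (GENESIS_HASH for the first)
  hash: string;      // SHA-256 over this entry's fields plus prevHash
}

const GENESIS_HASH = "0".repeat(64);

function hashEntry(e: Omit<AuditEntry, "hash">): string {
  // Fixed field order keeps the hash deterministic.
  const payload = [e.index, e.timestamp, e.actor, e.action, e.datasetId, e.prevHash].join("|");
  return createHash("sha256").update(payload).digest("hex");
}

// Appends an entry linked to the last one; returns a new array (entries are never edited in place).
function appendEntry(
  log: readonly AuditEntry[],
  actor: string,
  action: string,
  datasetId: string,
  timestamp: string = new Date().toISOString(),
): AuditEntry[] {
  const prevHash = log.length > 0 ? log[log.length - 1].hash : GENESIS_HASH;
  const partial = { index: log.length, timestamp, actor, action, datasetId, prevHash };
  return [...log, { ...partial, hash: hashEntry(partial) }];
}

// Recomputes every hash; an edited or reordered entry (or one removed mid-chain) returns false.
function verifyChain(log: readonly AuditEntry[]): boolean {
  return log.every((entry, i) => {
    const expectedPrev = i === 0 ? GENESIS_HASH : log[i - 1].hash;
    const { hash, ...rest } = entry;
    return entry.index === i && entry.prevHash === expectedPrev && hashEntry(rest) === hash;
  });
}
```

Note that a hash chain alone does not detect truncation of the newest entries; publishing or separately storing the latest hash closes that gap.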

User Journeys

👥 Contributor Journey
  1. 🔑 Sign Up → Create an account and verify your email
  2. 📤 Upload Data → Submit with customizable consent preferences
  3. ✅ Get Validated → The AI engine analyzes the data and assigns a quality score (0-100)
  4. 📊 Track Usage → See which companies licensed your data in real time
  5. 💰 Earn Rewards → Receive automatic payouts when your data is used

Result: You control your data, earn fair compensation, and maintain full transparency.

๐Ÿข Company Journey (Click to expand)
  1. ๐Ÿ” Browse Marketplace โ†’ Discover verified, consented datasets
  2. ๐Ÿ“ˆ Review Metrics โ†’ Check quality scores, sample data, contributor count
  3. ๐Ÿ’ณ Purchase License โ†’ Transparent pricing, instant access via secure API
  4. โœ… Use Ethically โ†’ Full audit trail of every data access
  5. ๐Ÿ“‹ Stay Compliant โ†’ Automatic GDPR/AI Act compliance documentation

Result: Legal certainty, quality data, and ethical AI development.


🎬 See It In Action

📹 Demo Flow: Upload Data → AI Validation → Quality Score → Marketplace → License → Payout

๐Ÿ—๏ธ System Architecture

DataUnion System Architecture

Production-ready architecture: Next.js • Supabase • Payment Gateway

🔄 Complete Data Journey

Data Lifecycle Flow

From contribution to payout in 7 transparent steps

1. CONTRIBUTE → Upload data with consent preferences
2. VALIDATE   → AI engine assigns quality score
3. POOL       → Data added to verified marketplace
4. LICENSE    → Companies purchase with transparent terms
5. TRACK      → Every use logged immutably
6. DISTRIBUTE → Revenue shared fairly
7. EARN       → Contributors receive automatic payouts
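The seven steps read as a linear pipeline, which can be modeled as an ordered list of stages where a record may only advance to the next one. This is a minimal sketch under that assumption: the stage names come from the list, while `nextStage` and `canTransition` are hypothetical helpers, not functions from the DataUnion codebase.

```typescript
// The seven lifecycle stages, in order.
const STAGES = ["CONTRIBUTE", "VALIDATE", "POOL", "LICENSE", "TRACK", "DISTRIBUTE", "EARN"] as const;
type Stage = (typeof STAGES)[number];

// Returns the following stage, or null once the payout (EARN) is complete.
function nextStage(current: Stage): Stage | null {
  const i = STAGES.indexOf(current);
  return i < STAGES.length - 1 ? STAGES[i + 1] : null;
}

// Rejects skipped steps, e.g. jumping from VALIDATE straight to LICENSE.
function canTransition(from: Stage, to: Stage): boolean {
  return nextStage(from) === to;
}
```

In practice steps 4-7 would repeat for every new license of the same pooled dataset, so a fuller model would probably track submission state (steps 1-3) separately from per-license state (steps 4-7).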

๐Ÿ“ For deep technical dive: TECHNICAL.md contains full architecture, database schema, and sequence diagrams


๐Ÿ› ๏ธ Tech Stack

Modern, Production-Ready Technologies

| Layer | Technology | Why We Chose It |
| --- | --- | --- |
| Frontend | Next.js 16.1 (App Router) | Server-side rendering, React Server Components, optimal performance |
| Language | TypeScript 5.0 | Type safety, better developer experience, fewer bugs |
| Styling | Tailwind CSS v4 | Utility-first, responsive design, small bundle size |
| Animations | Framer Motion | Smooth, professional UI interactions |
| Backend | Supabase (PostgreSQL) | Real-time, scalable, built-in auth, RLS security |
| Authentication | Supabase Auth | Secure, battle-tested, multiple providers |
| Deployment | Vercel + Supabase Cloud | Edge network, auto-scaling, zero config |

๐Ÿ“ Project Structure

DataUnion/
├── 📄 README.md              ← You are here!
├── 📐 TECHNICAL.md           ← Architecture & diagrams
├── 📊 RESEARCH.md            ← Market analysis & regulations
├── 🚀 Round2 Updates.md      ← Round 2 improvements
│
├── app/                      ← Next.js App Router
│   ├── page.tsx             ← Landing page
│   ├── contributor/         ← Contributor dashboard
│   ├── company/             ← Company marketplace
│   └── walkthrough/         ← Interactive transparency demo
│
├── components/
│   ├── ui/                  ← Reusable UI components
│   ├── dashboard/           ← Dashboard widgets
│   └── walkthrough/         ← Tutorial components
│
├── lib/
│   ├── supabase/            ← Database client & utilities
│   └── utils.ts             ← Helper functions
│
├── supabase/
│   ├── schema.sql           ← Database schema (7 tables)
│   └── seed.sql             ← Demo data
│
└── docs/
    └── diagrams/            ← Architecture visualizations
        ├── system-architecture.jpg
        ├── data-lifecycle.jpg
        ├── database-schema.jpg
        └── sequence-diagram.jpg

🚀 Quick Start

Prerequisites

✅ Node.js 18 or higher
✅ npm or yarn
✅ Supabase account (free tier works!)

Installation in 5 Minutes

# 1๏ธโƒฃ Clone the repository
git clone https://github.com/theDakshJaitly/DataUnion.git
cd DataUnion

# 2๏ธโƒฃ Install dependencies
npm install

# 3๏ธโƒฃ Set up environment variables
cp env.example .env.local

# Edit .env.local and add:
# NEXT_PUBLIC_SUPABASE_URL=your_supabase_url
# NEXT_PUBLIC_SUPABASE_ANON_KEY=your_supabase_key

# 4๏ธโƒฃ Initialize database
# - Open Supabase SQL Editor
# - Run supabase/schema.sql
# - (Optional) Run supabase/seed.sql for demo data

# 5๏ธโƒฃ Start the app
npm run dev

# โœ… Open http://localhost:3000

📚 Complete Documentation

| Document | What's Inside |
| --- | --- |
| 📐 TECHNICAL.md | System architecture • 4 detailed diagrams • Database schema • Scalability & security |
| 🚀 SCALABILITY.md | 4-phase scaling strategy • Failure handling • Circuit breakers • 99.5% uptime target |
| 💰 FINANCE.md | 90/10 revenue split • Market analysis ($17B by 2032) • Business model • PESTLE analysis |
| 📊 RESEARCH.md | Problem analysis • Regulatory landscape (GDPR, EU AI Act) • Market statistics • Case studies |
| 🔮 Round2 Updates.md | Round 2 features • Technical improvements • Scaling strategy • Timeline |

📊 Why This Matters

The AI Data Economy is Worth Billions, But It's Built on Broken Foundations

DataUnion fixes this by providing:

  • ✅ Legal Certainty for AI companies (consented, licensed data reduces lawsuit risk)
  • ✅ Fair Compensation for data contributors (share in the value you create)
  • ✅ Transparency in the AI supply chain (know where data comes from)
  • ✅ Trust through verifiable consent and immutable audit trails

💡 Market Opportunity: The ethical AI data market is projected to reach $15B by 2030
📊 Current Crisis: $67.4B lost annually from poor-quality AI data (hallucinations)
⚖️ Legal Risk: $1.75B+ already paid in settlements and fines (Anthropic's $1.5B settlement + Google's €250M fine)

Source: Analysis in RESEARCH.md


👥 Meet the Team

| Team Member | Role | Contribution |
| --- | --- | --- |
| Yashasvi Pandey | 💻 Full-Stack Developer, Team Lead | UI • Main README • Transaction Sequence |
| Daksh Jaitly | 💻 Full-Stack Developer, Designer | System Architecture • Database design • Scalability • Technical documentation |
| Arjun Sharma | 📊 Research Analyst, Designer | Market research • Client-side AI Engine • Financial Model |
| Shivansh Sharma | Frontend Developer, 📊 Research Analyst | Authentication • Round2 Updates • Designing |

Collaborative Development: All code reviews, architectural decisions, and documentation were done as a team

🆕 Round 2 Changes Implemented

🧠 Client-Side AI Engine

Zero-Server Inference Architecture

  • AI quality scoring runs entirely in the user's browser via Web Workers
  • Built on @xenova/transformers (Transformers.js), which runs WASM-optimized language models
  • 100 concurrent uploads with 0% server CPU increase
  • Instant feedback (no network latency)

Quality Scoring Formula:

  • Domain Relevance (35%)
  • Semantic Coherence (35%)
  • Entity Density (20%)
  • Readability (10%)
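Taking the listed percentages as weights on sub-scores normalized to [0, 1], the formula is a weighted sum, score = 100 × (0.35·D + 0.35·S + 0.20·E + 0.10·R), scaled to the 0-100 range used in the contributor journey. The sketch below assumes that normalization and rounding; how each sub-score is derived from the in-browser model is not specified here, so the sub-scores are simply inputs, and `computeQualityScore` is a hypothetical name.

```typescript
// Sub-scores, each expected in [0, 1]; computing them from model outputs is assumed to happen upstream.
interface QualitySignals {
  domainRelevance: number;
  semanticCoherence: number;
  entityDensity: number;
  readability: number;
}

// Weights from the quality scoring formula; they sum to 1.0.
const WEIGHTS: Readonly<Record<keyof QualitySignals, number>> = {
  domainRelevance: 0.35,
  semanticCoherence: 0.35,
  entityDensity: 0.2,
  readability: 0.1,
};

const clamp01 = (x: number): number => Math.min(1, Math.max(0, x));

// Weighted sum of the clamped sub-scores, rounded to an integer score from 0 to 100.
function computeQualityScore(signals: QualitySignals): number {
  let total = 0;
  for (const key of Object.keys(WEIGHTS) as (keyof QualitySignals)[]) {
    total += WEIGHTS[key] * clamp01(signals[key]);
  }
  return Math.round(total * 100);
}
```

For example, sub-scores of 0.8, 0.6, 0.5, and 1.0 give 28 + 21 + 10 + 10 = 69.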

๐Ÿ“ See TECHNICAL.md for architecture details

💰 Advanced Financial Model

90/10 Revenue Split

  • Contributors get 90% of every license sale
  • Platform takes only 10% (vs industry 50-60%)
  • Quality-weighted payout distribution
  • Atomic transactions + idempotency keys
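A 90/10 split with quality-weighted distribution can be sketched in integer cents so that no money is lost to floating-point rounding. This is a minimal illustration under stated assumptions: each contributor's share is taken to be proportional to their quality score, the leftover cents are allocated with the largest-remainder method, and `splitLicenseSale` is a hypothetical function, not the project's payout code.

```typescript
interface Contribution {
  contributorId: string;
  qualityScore: number; // 0-100, as produced by the quality engine
}

interface LicenseSplit {
  platformFeeCents: number;
  payouts: Map<string, number>; // contributorId -> payout in cents
}

const PLATFORM_FEE_PERCENT = 10; // the 90/10 split

// Splits one license sale (integer cents) into a platform fee and a contributor
// pool, then divides the pool in proportion to quality scores.
function splitLicenseSale(saleCents: number, contributions: Contribution[]): LicenseSplit {
  if (!Number.isSafeInteger(saleCents) || saleCents < 0) {
    throw new Error("saleCents must be a non-negative integer");
  }
  if (contributions.length === 0) {
    throw new Error("at least one contribution is required");
  }

  // The fee is rounded down, so a leftover cent goes to contributors, never the platform.
  const platformFeeCents = Math.floor((saleCents * PLATFORM_FEE_PERCENT) / 100);
  const poolCents = saleCents - platformFeeCents;

  // Integer weights keep the arithmetic exact; if every score is 0, split equally.
  let weights = contributions.map((c) => Math.max(0, Math.round(c.qualityScore)));
  if (weights.every((w) => w === 0)) weights = weights.map(() => 1);
  const weightSum = weights.reduce((a, b) => a + b, 0);

  // Largest-remainder method: exact integer quotients first, then one extra
  // cent each to the largest remainders so payouts sum to poolCents exactly.
  const products = weights.map((w) => poolCents * w);
  const shares = products.map((p) => (p - (p % weightSum)) / weightSum);
  const byRemainder = products
    .map((p, i) => ({ remainder: p % weightSum, i }))
    .sort((a, b) => b.remainder - a.remainder);
  let leftover = poolCents - shares.reduce((a, b) => a + b, 0);
  for (let k = 0; leftover > 0; k++, leftover--) {
    shares[byRemainder[k].i] += 1;
  }

  const payouts = new Map<string, number>();
  contributions.forEach((c, i) => {
    payouts.set(c.contributorId, (payouts.get(c.contributorId) ?? 0) + shares[i]);
  });
  return { platformFeeCents, payouts };
}
```

For example, a 1000-cent sale shared by contributors with scores 80 and 20 yields a 100-cent platform fee and payouts of 720 and 180 cents.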

Market Opportunity:

  • Projected $17B market by 2032
  • Targeting high-value RLHF sector ($1,400-$56,000/domain)
  • Undercutting Scale AI's 50% take rate

💰 See FINANCE.md for full economic analysis
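The "atomic transactions + idempotency keys" item in the revenue-split list refers to a standard pattern: the client attaches a unique key to each purchase attempt, and the server stores the outcome under that key, so a retried request returns the stored result instead of charging twice. The in-memory sketch below illustrates the pattern only; `IdempotentPurchaseProcessor` and the `ChargeFn` payment step are hypothetical, and the charge step is assumed to be atomic (it either succeeds fully or throws without charging).

```typescript
interface PurchaseResult {
  licenseId: string;
  amountCents: number;
}

// Hypothetical payment step: charges the buyer and returns a new license ID.
// Assumed atomic: it either fully succeeds or throws without charging.
type ChargeFn = (amountCents: number) => string;

// Stores each purchase outcome under its client-supplied idempotency key, so a
// retry (network timeout, double click) returns the stored result without a second charge.
class IdempotentPurchaseProcessor {
  private readonly completed = new Map<string, PurchaseResult>();
  private readonly charge: ChargeFn;

  constructor(charge: ChargeFn) {
    this.charge = charge;
  }

  purchase(idempotencyKey: string, amountCents: number): PurchaseResult {
    const previous = this.completed.get(idempotencyKey);
    if (previous !== undefined) {
      // Reusing a key for a different request is a client bug, not a retry.
      if (previous.amountCents !== amountCents) {
        throw new Error("idempotency key reused with different parameters");
      }
      return previous; // retry of a completed purchase
    }
    // If charge() throws, nothing is recorded, so the client can safely retry.
    const result: PurchaseResult = { licenseId: this.charge(amountCents), amountCents };
    this.completed.set(idempotencyKey, result);
    return result;
  }
}
```

With asynchronous payment calls and several server instances, the keys would need durable shared storage, for example a table with a UNIQUE constraint on the key written in the same database transaction as the license, so that concurrent duplicates are rejected.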

๐Ÿ›ก๏ธ Failure Handling System

Mission-Critical Resilience

AI Engine Failure:
→ Graceful degradation to server-side queue

Database Outage:
→ Circuit breaker pattern, read-only mode

Transaction Failures:
→ Atomic rollback, zero double-charges

Monitoring:
→ Sentry, Supabase Logs, Vercel Analytics
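The circuit breaker listed for database outages can be sketched as a small state machine: after repeated failures it stops calling the database and serves a fallback (such as cached, read-only data) until a cool-down passes, then lets one trial call through. This is a minimal synchronous illustration; the `CircuitBreaker` class, its thresholds, and the injectable clock are assumptions, and real database calls would be asynchronous.

```typescript
type BreakerState = "CLOSED" | "OPEN" | "HALF_OPEN";

// CLOSED: calls reach the database. After `failureThreshold` consecutive failures
// the breaker turns OPEN and serves the fallback without touching the database.
// After `resetTimeoutMs`, one trial call is allowed (HALF_OPEN): success closes
// the circuit again, failure reopens it.
class CircuitBreaker {
  private state: BreakerState = "CLOSED";
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly resetTimeoutMs: number;
  private readonly now: () => number;

  constructor(failureThreshold = 5, resetTimeoutMs = 30_000, now: () => number = Date.now) {
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now;
  }

  getState(): BreakerState {
    return this.state;
  }

  call<T>(operation: () => T, fallback: () => T): T {
    if (this.state === "OPEN") {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        return fallback(); // fail fast while the database is presumed down
      }
      this.state = "HALF_OPEN"; // cool-down elapsed: allow one trial call
    }
    const isTrial = this.state === "HALF_OPEN";
    try {
      const result = operation();
      this.state = "CLOSED";
      this.consecutiveFailures = 0;
      return result;
    } catch {
      this.consecutiveFailures += 1;
      if (isTrial || this.consecutiveFailures >= this.failureThreshold) {
        this.state = "OPEN";
        this.openedAt = this.now();
      }
      return fallback();
    }
  }
}
```

Failing fast while the circuit is open keeps requests from piling up on a database that is already struggling, which is what makes a read-only fallback mode practical.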

🚀 See SCALABILITY AND FAILURE HANDLING.md for contingency plans

📈 4-Phase Scalability Strategy

Growth Roadmap:

Phase 1 (0-100 users):
Single Supabase instance, client-side AI

Phase 2 (100-1K users):
Connection pooling, edge caching

Phase 3 (1K-10K users):
Read replicas, CDN, composite indices

Phase 4 (10K+ users):
Sharding, geo-replication, async payouts

Target: 99.5% uptime, <2s transaction latency
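To make the 99.5% target concrete: it leaves 0.5% of the time as a downtime budget, about 216 minutes (3.6 hours) per 30-day month and about 43.8 hours per 365-day year. The helper below (a hypothetical name) computes this; the README does not say whether uptime is measured per calendar month or over a rolling window, so the periods are illustrative.

```typescript
// Downtime permitted by an availability target over a period of `periodDays`.
function allowedDowntimeMinutes(availabilityPercent: number, periodDays: number): number {
  if (availabilityPercent < 0 || availabilityPercent > 100) {
    throw new Error("availabilityPercent must be between 0 and 100");
  }
  const periodMinutes = periodDays * 24 * 60;
  return periodMinutes * (1 - availabilityPercent / 100);
}

// 99.5% over a 30-day month: 43,200 min * 0.005 = 216 min (3.6 hours).
const monthly = allowedDowntimeMinutes(99.5, 30);
// 99.5% over a 365-day year: 525,600 min * 0.005 = 2,628 min (43.8 hours).
const yearly = allowedDowntimeMinutes(99.5, 365);
console.log(monthly.toFixed(1), (yearly / 60).toFixed(1)); // prints "216.0 43.8"
```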

🚀 See SCALABILITY AND FAILURE HANDLING.md for phase details



🙌 Acknowledgments

Built for Hack the Winter - The Second Wave (Angry Bird Edition)
Graphic Era Hill University, Bhimtal

Powered by Next.js • Supabase • Tailwind CSS • The Open Source Community


📄 Documentation Hub

Quick Links for Judges:

| Technical | Business | Planning |
| --- | --- | --- |
| Architecture | Finance Model | Round 2 Updates |
| Scalability | Market Research | Live Demo |

🌟 Star this repo if you believe in ethical AI! 🌟

⭐ Star on GitHub • 📐 View Architecture • 💰 View Finance • 🚀 View Scalability • 📊 Read Research

Building an Ethical AI Future, One Dataset at a Time ❤️
