Skip to content

FaaizS/DiskDBLite

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

18 Commits
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

DiskDBLite

Overview

DiskDBLite is a from-scratch, disk-backed database engine built to understand how real databases store, index, and retrieve data at the byte and page level. The project focuses on core systems concepts such as B-Tree indexing, slotted page layouts, and SQL parsing — without relying on existing database libraries.

Features

  • Disk-backed storage - Data persists to disk using fixed-size 4KB pages
  • B-Tree indexing - Efficient O(log n) key lookups with automatic page splitting
  • SQL interface - Supports CREATE TABLE, INSERT, and SELECT statements
  • Interactive shell - REPL for executing SQL commands
  • Persistence - Database survives restarts with full data integrity
  • Metrics harness - Non-interactive benchmark and restart-time scripts with JSON output

Quick Start

Build

mvn clean package

Run Interactive Shell

# Using Maven
mvn exec:java -Dexec.mainClass="com.diskdblite.Main" -Dexec.args="mydb.db"

# Or using the JAR directly
java -jar target/diskdblite-1.0-SNAPSHOT.jar mydb.db

Run Metrics Harness

# Non-interactive benchmark (writes artifacts/bench/<timestamp>.json)
./scripts/bench.sh

# Restart-time probe (writes artifacts/recovery/<timestamp>.json)
./scripts/recovery.sh

Detailed metric definitions and reproducible commands are documented in docs/metrics.md.

Example Session

╔═══════════════════════════════════════════════════════╗
║         DiskDBLite v1.0.0 - Disk-Backed Database      ║
╚═══════════════════════════════════════════════════════╝

Created new database: mydb.db

Type 'help' for commands, 'exit' to quit.

diskdb> CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR, email VARCHAR);
Table 'users' created.
Time: 5ms

diskdb> INSERT INTO users VALUES (1, 'Alice', 'alice@example.com');
1 row inserted.
Time: 2ms

diskdb> INSERT INTO users VALUES (2, 'Bob', 'bob@example.com');
1 row inserted.
Time: 1ms

diskdb> SELECT * FROM users;
id | name | email
------------------
1 | Alice | alice@example.com
2 | Bob | bob@example.com
(2 row(s))
Time: 1ms

diskdb> SELECT * FROM users WHERE id = 1;
id | name | email
------------------
1 | Alice | alice@example.com
(1 row(s))
Time: 0ms

diskdb> exit
Goodbye!

Removing a database

The database is a single file on disk (e.g. mydb.db). There is no in-app command to drop or clear it. To remove a database:

  1. Exit the shell (exit or quit) so the file is not in use.
  2. Delete the file (e.g. rm mydb.db in a terminal, or delete it in your file manager).

Supported SQL

CREATE TABLE

CREATE TABLE tablename (
    column1 INT PRIMARY KEY,
    column2 VARCHAR,
    column3 INTEGER
)

INSERT

INSERT INTO tablename VALUES (1, 'text', 42)

SELECT

-- Select all rows
SELECT * FROM tablename

-- Select with WHERE clause
SELECT * FROM tablename WHERE column = value

Shell Commands

Command Description
help Show help
tables List all tables
describe <table> Show table schema
exit Exit the shell

Architecture

┌─────────────────────────────────────────────────────────────┐
│                        SQL Shell                            │
│                      (Main.java)                            │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      SQL Engine                             │
│                    (sql package)                            │
│         Lexer → Parser → Statement Objects                  │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                    Query Executor                           │
│                   (query package)                           │
│              High-level CRUD operations                     │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                       Catalog                               │
│                  (catalog package)                          │
│              Table metadata management                      │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                        B-Tree                               │
│                   (btree package)                           │
│          Key-value storage with indexing                    │
│     BTree ← BTreeLeafPage / BTreeInternalPage               │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
┌─────────────────────────────────────────────────────────────┐
│                      Storage Layer                          │
│                  (storage package)                          │
│   Page → SlottedPage → FileManager → Disk                   │
└─────────────────────────────────────────────────────────────┘

Project Structure

src/main/java/com/diskdblite/
├── Main.java                    # Interactive SQL shell
├── bench/
│   ├── BenchRunner.java         # Non-interactive benchmark runner
│   └── RestartProbe.java        # Single-query restart-time probe
├── storage/
│   ├── Constants.java           # PAGE_SIZE, etc.
│   ├── FileManager.java         # Low-level file I/O
│   ├── Page.java                # 4KB page abstraction
│   ├── PageType.java            # Page type enum
│   └── SlottedPage.java         # Variable-length record storage
├── btree/
│   ├── BTree.java               # B-Tree index structure
│   ├── BTreeLeafPage.java       # Leaf nodes (store data)
│   └── BTreeInternalPage.java   # Internal nodes (routing)
├── catalog/
│   ├── Catalog.java             # Table registry
│   ├── TableInfo.java           # Table metadata
│   ├── ColumnInfo.java          # Column metadata
│   ├── ColumnType.java          # INT, VARCHAR types
│   └── Row.java                 # Row representation
├── query/
│   ├── QueryExecutor.java       # Execute operations
│   └── QueryResult.java         # Query results
└── sql/
    ├── Lexer.java               # SQL tokenizer
    ├── Token.java               # Token types
    ├── Parser.java              # SQL parser
    ├── Statement.java           # AST nodes
    ├── ParseException.java      # Parse errors
    └── SQLEngine.java           # Parsing + execution

scripts/
├── bench.sh                     # Build + benchmark runner wrapper
└── recovery.sh                  # Build + restart-time probe wrapper

docs/
└── metrics.md                   # Metric definitions and reproducibility guide

Design Principles

  1. Disk-backed storage - All data persists to disk in 4KB pages
  2. Explicit page layout - Clear byte-level serialization
  3. Separation of concerns - Each layer has a single responsibility
  4. Correctness over performance - Readable code, not optimized
  5. Minimal features - INSERT and SELECT only; UPDATE and DELETE are omitted because they would require altering the B-tree (key removal, rebalancing) and significantly complicate the implementation
  6. No concurrency - Single-threaded, no locks needed
  7. No transactions - No ACID guarantees beyond durability

Running Tests

# Run all tests
mvn test

# Run specific test class
mvn test -Dtest=IntegrationTest

# Run with output
mvn test -Dtest=IntegrationTest -q

Running Benchmarks

# Standard benchmark run
./scripts/bench.sh

# 5 benchmark trials
for i in 1 2 3 4 5; do ./scripts/bench.sh; done

# Restart-time measurement
./scripts/recovery.sh

Key Concepts Demonstrated

  • B-Tree indexing with automatic page splitting
  • Slotted page layout for variable-length records
  • Serialization of structured data to/from bytes
  • SQL parsing with lexer/parser architecture
  • Layered architecture for database systems

Limitations

  • Primary key must be INTEGER type
  • No UPDATE or DELETE (would require B-tree key removal and rebalancing)
  • No transactions or recovery
  • No concurrent access
  • No query optimizer
  • SELECT only supports * (no column selection)
  • WHERE only supports equality (=)
  • In the interactive shell, arrow keys may show as escape sequences (e.g. ^[[A); the shell still runs the correct command. For less visual clutter, avoid arrow keys or run in an environment that supports line editing.

License

Educational project - not for production use.

About

A from-scratch, disk-backed database engine in Java with B-Tree indexing, slotted pages, and a SQL REPL—no external DB libraries.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors