byteink/ByteBucket

ByteBucket

Self-hosted, S3-compatible object storage. One small Go binary, a React admin UI, and a filesystem-backed store. Speaks the AWS S3 wire protocol on one port and exposes the same operations through a browser-friendly admin API on another.

docker pull ghcr.io/byteink/bytebucket:latest
  • S3 API on port 9000: AWS Signature V4, XML responses. Works with the AWS SDK, aws s3, rclone, s3cmd, mc, Terraform, boto3, anything that speaks S3.
  • Admin API + UI on port 9001: header-authenticated JSON surface plus an embedded React dashboard at /.
  • Multipart upload, per-bucket CORS, presigned URLs, real ETags, request IDs, structured JSON logs, Prometheus metrics.

Heads up: security. The admin port (9001) must not be exposed to the public internet. Put it behind a private network, VPN, SSH tunnel, or reverse proxy with access control. Details and the deferred-hardening list are in SECURITY.md.

Working on the code? See DEVELOPMENT.md for the contributor guide (repo layout, local setup, Vite dev loop, testing, release flow, conventions).


Contents

  1. Quick start
  2. Configuration
  3. Admin web UI
  4. S3 API (port 9000)
  5. Admin API (port 9001)
  6. Per-bucket CORS
  7. Observability
  8. Using it from code
  9. Storage layout and persistence
  10. Limits
  11. Troubleshooting
  12. License

Quick start

Docker

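# Generate the secrets once and keep them somewhere safe.
# ENCRYPTION_KEY is needed on every boot; SECRET_ACCESS_KEY is your first login.
export ENCRYPTION_KEY="$(openssl rand -base64 32)"
export SECRET_ACCESS_KEY="$(openssl rand -base64 32)"
echo "ENCRYPTION_KEY=$ENCRYPTION_KEY"
echo "SECRET_ACCESS_KEY=$SECRET_ACCESS_KEY"

# "-e NAME" with no value passes the variable through from the current shell.
docker run -d \
  --name bytebucket \
  -p 9000:9000 \
  -p 9001:9001 \
  -v bytebucket-data:/data \
  -e ENCRYPTION_KEY \
  -e ACCESS_KEY_ID="admin" \
  -e SECRET_ACCESS_KEY \
  ghcr.io/byteink/bytebucket:latest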

Then open http://localhost:9001 and log in with the admin access key / secret you just set.

docker compose

services:
  bytebucket:
    image: ghcr.io/byteink/bytebucket:latest
    restart: unless-stopped
    ports:
      - "9000:9000"
      - "9001:9001"
    environment:
      ENCRYPTION_KEY: "32-byte-random-or-base64-encoded-key"
      ACCESS_KEY_ID: "admin"
      SECRET_ACCESS_KEY: "your-strong-secret"
    volumes:
      - bytebucket-data:/data

volumes:
  bytebucket-data:

On first boot only, the server reads ACCESS_KEY_ID / SECRET_ACCESS_KEY to create a super-user in BoltDB. After that, those env vars are ignored; rotate credentials through the admin API. ENCRYPTION_KEY is required on every boot (it decrypts stored secrets at rest).


Configuration

All configuration is via environment variables.

| Variable | Required | Default | Description |
|---|---|---|---|
| ENCRYPTION_KEY | yes | - | 32 raw bytes or base64-encoded 32-byte key. Encrypts stored user secrets at rest. Lose it, lose every credential. Rotate carefully. |
| ACCESS_KEY_ID | first boot only | - | Super-user access key, used once to seed the user database. |
| SECRET_ACCESS_KEY | first boot only | - | Super-user secret, same. |
| GIN_MODE | no | debug | Set to release in production. The provided Docker image sets this. |
| LOG_LEVEL | no | info | debug, info, warn, error. |
| LOG_FORMAT | no | json | json for production / log aggregators, text for local dev readability. |
| RATE_LIMIT_ENABLED | no | false | Master switch for per-client request rate limiting. Off by default; when disabled the middleware short-circuits after one atomic read, so the per-request cost is negligible. Seeds the baseline that a runtime override (see below) can replace. |
| RATE_LIMIT_RPS | no | 0 | Sustained requests per second allowed per client IP (token refill rate). Only meaningful when limiting is enabled. |
| RATE_LIMIT_BURST | no | 0 | Token-bucket depth: the largest instantaneous spike one client may make before the sustained RPS rate gates it. |
| RATE_LIMIT_TRUSTED_PROXIES | no | 0 | Number of reverse-proxy hops in front of the server. Selects which X-Forwarded-For entry is the real client. 0 ignores X-Forwarded-For and keys on the socket peer. |
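
If you would rather generate the key from a script than from openssl, a minimal Python sketch could look like this. The helper names are made up for illustration, and the acceptance check is only a guess at what "32 raw bytes or base64-encoded 32-byte key" means in practice; the server's own parsing may differ.

```python
import base64
import secrets


def new_encryption_key() -> str:
    """Return 32 random bytes, base64-encoded, for use as ENCRYPTION_KEY."""
    return base64.b64encode(secrets.token_bytes(32)).decode("ascii")


def looks_like_valid_key(value: str) -> bool:
    """Assumed rule: accept 32 raw bytes, or base64 that decodes to 32 bytes."""
    if len(value.encode("utf-8")) == 32:
        return True
    try:
        return len(base64.b64decode(value, validate=True)) == 32
    except ValueError:  # binascii.Error is a ValueError subclass
        return False
```

Store the generated value in your secret manager before the first boot; the same value has to be supplied on every later start.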

Rate limiting

Off by default. Set RATE_LIMIT_ENABLED=true to throttle requests per client IP with a token bucket. It runs early in the chain on both ports (after logging/metrics, before auth), so an unauthenticated flood is rejected before it reaches signature verification or the filesystem. Both ports share one per-IP budget, so a client cannot double its allowance by splitting traffic across the S3 and admin surfaces.

  • Set RATE_LIMIT_RPS (sustained rate) and RATE_LIMIT_BURST (spike depth) together. A request that exceeds its bucket gets 503 Slow Down with a Retry-After header; AWS SDKs treat SlowDown as retryable and back off automatically.
  • Behind a proxy, set RATE_LIMIT_TRUSTED_PROXIES to the number of hops in front of ByteBucket (e.g. 1 for a single nginx / traefik / ALB). The client IP is resolved by counting that many trusted hops in from the right of X-Forwarded-For; the nearest proxy is the connection peer and is not in the header. Leaving it at 0 ignores X-Forwarded-For and keys on the socket peer, which is correct only when ByteBucket is directly exposed. Match it to your actual topology: setting it too low lets a client spoof its limiter key by prepending X-Forwarded-For entries.
  • The limiter store is bounded (hard entry cap plus idle eviction), so it cannot be turned into a memory-exhaustion vector by an attacker minting source IPs.
  • Runtime override. The RATE_LIMIT_* variables are only the startup baseline. An admin can enable, disable, or retune limiting at runtime from the dashboard's Settings page (or PUT /api/config/ratelimit) without a restart; changes apply live to both ports. A saved override is persisted and wins over the environment until you clear it ("Reset to defaults" / DELETE /api/config/ratelimit), which reverts to the RATE_LIMIT_* baseline.
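
To make the two mechanisms above concrete, here is a rough Python sketch of a per-client token bucket and of hop counting in X-Forwarded-For. It is not ByteBucket's Go implementation: client_ip, TokenBucket, the exact indexing, and the fallback when the header is too short are all assumptions made for illustration.

```python
from typing import Optional


def client_ip(peer_ip: str, xff: Optional[str], trusted_proxies: int) -> str:
    """Pick the limiter key from the socket peer and X-Forwarded-For.

    Assumption: the connection peer is trusted hop 1 and is not in the
    header, so with N trusted hops the client is the N-th entry counted
    from the right.
    """
    if trusted_proxies <= 0 or not xff:
        return peer_ip
    entries = [e.strip() for e in xff.split(",") if e.strip()]
    if len(entries) < trusted_proxies:
        return peer_ip  # assumed fallback when the header is too short
    return entries[-trusted_proxies]


class TokenBucket:
    """Refills at `rps` tokens per second up to `burst`; one token per request."""

    def __init__(self, rps: float, burst: int, now: float = 0.0):
        self.rps = rps
        self.burst = burst
        self.tokens = float(burst)  # start full so an initial spike is allowed
        self.last = now

    def allow(self, now: float) -> bool:
        elapsed = max(0.0, now - self.last)
        self.tokens = min(float(self.burst), self.tokens + elapsed * self.rps)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False  # a server would answer 503 with Retry-After here
```

With rps=1 and burst=2, a client can send two requests back to back, is rejected on the third, and is admitted again one second later. Passing the clock in as `now` keeps the sketch deterministic; real code would use a monotonic clock.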

Ports

| Port | Role | Auth | Expose publicly? |
|---|---|---|---|
| 9000 | S3 wire protocol | AWS SigV4 | Yes, if that's the point. |
| 9001 | Admin API + web UI + /metrics | X-Admin-AccessKey + X-Admin-Secret headers | No. Keep private. |

Persistence

One volume at /data. Layout:

/data
  users.db              # BoltDB: users, ACLs, encrypted secrets
  objects/<bucket>/...  # object bytes + .meta / .tags.json / .acl.json / .cors.json sidecars
  uploads/<bucket>/...  # in-flight multipart uploads

Back up /data as a unit. Objects and metadata are on a filesystem; any snapshot / rsync / restic flow works.


Admin web UI

Port 9001 serves a minimal React dashboard at /. It is same-origin with the admin API: no CORS, no AWS SDK in the browser, no third-party calls.

  • Log in with the admin access key and secret.
  • Manage users and per-user ACLs.
  • Create / list / delete buckets.
  • Browse, upload, download, delete objects.
  • Edit per-bucket CORS as a JSON document.

Credentials live in the browser's localStorage for the session and are sent on every request as X-Admin-* headers. There are no session cookies, no CSRF tokens, no login rate limiting, which is why the admin port must not be public. See SECURITY.md for the hardening backlog.


S3 API (port 9000)

Standard S3 surface. Any S3 client pointed at http://<host>:9000 with forcePathStyle: true and a user's access key / secret works.

Operations

  • Buckets: PUT /:bucket, GET /, GET /:bucket (list objects), DELETE /:bucket, HEAD /:bucket.
  • Objects: PUT /:bucket/:key, GET /:bucket/:key, HEAD /:bucket/:key, DELETE /:bucket/:key. GET honours the Range: header for partial / resumable downloads, returning 206 Partial Content with Content-Range; HEAD and full GET advertise Accept-Ranges: bytes.
  • Multipart upload: POST /:bucket/:key?uploads, PUT /:bucket/:key?partNumber=N&uploadId=X, POST /:bucket/:key?uploadId=X (complete), DELETE /:bucket/:key?uploadId=X (abort), GET /:bucket?uploads (list uploads), GET /:bucket/:key?uploadId=X (list parts).
  • Object tagging: PUT /:bucket/:key?tagging, GET /:bucket/:key?tagging, DELETE /:bucket/:key?tagging. Up to 10 tags per object (key 1-128, value 0-256 UTF-8 chars, no duplicate keys). Tags are stored in a .tags.json sidecar independently of object data, so setting or removing them never changes the object's ETag.
  • CORS: PUT /:bucket?cors, GET /:bucket?cors, DELETE /:bucket?cors.
  • Presigned URLs: SigV4 X-Amz-* query-string style. The TTL is the X-Amz-Expires value the client signed (see Limits); no server-side state needed.
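
For the Range support mentioned under Objects, the sketch below shows how a single byte range maps to the Content-Range value in a 206 response. It follows standard HTTP single-range semantics (RFC 9110), not ByteBucket's actual parser, and the function names are illustrative.

```python
from typing import Optional, Tuple


def resolve_range(header: str, size: int) -> Optional[Tuple[int, int]]:
    """Map a single `bytes=...` range to an inclusive (start, end) pair.

    Returns None when the header is unusable (malformed, multi-range,
    or not satisfiable for an object of `size` bytes).
    """
    unit, _, spec = header.partition("=")
    spec = spec.strip()
    if unit.strip() != "bytes" or "," in spec or "-" not in spec:
        return None
    first, _, last = spec.partition("-")
    if first == "":  # suffix form "bytes=-N": the final N bytes
        if not last.isdigit() or int(last) == 0 or size == 0:
            return None
        return max(0, size - int(last)), size - 1
    if not first.isdigit() or (last and not last.isdigit()):
        return None
    start = int(first)
    end = min(int(last), size - 1) if last else size - 1
    if start >= size or start > end:
        return None
    return start, end


def content_range(start: int, end: int, size: int) -> str:
    """Format the Content-Range header value for a 206 response."""
    return f"bytes {start}-{end}/{size}"
```

A resumable download asks for `bytes=<offset>-` where the offset is the number of bytes already on disk, then appends the 206 body to the partial file.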

Wire format

XML in, XML out. Matches AWS S3 response shapes for ListAllMyBucketsResult, ListBucketResult, CORSConfiguration, InitiateMultipartUploadResult, CompleteMultipartUploadResult, and the standard <Error> body. ETags are the hex MD5 of object bytes, quoted. Multipart ETags are <hex>-<partCount>, matching S3's composite format.
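
To check an ETag on the client side, the two formats can be reproduced as follows. This sketch assumes the multipart value follows the usual S3 composite rule (MD5 over the concatenated binary MD5 digests of the parts, then a dash and the part count) and that both forms are quoted; the function names are illustrative.

```python
import hashlib
from typing import Iterable


def single_etag(data: bytes) -> str:
    """Quoted hex MD5 of the object bytes."""
    return '"' + hashlib.md5(data).hexdigest() + '"'


def multipart_etag(parts: Iterable[bytes]) -> str:
    """Composite ETag: MD5 of the per-part MD5 digests, plus the part count."""
    digests = [hashlib.md5(p).digest() for p in parts]
    combined = hashlib.md5(b"".join(digests)).hexdigest()
    return f'"{combined}-{len(digests)}"'
```

Note that a multipart ETag depends on where the part boundaries fall, so the same bytes uploaded with a different part size produce a different value.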

Example: put and get via curl --aws-sigv4

export AK=your_access_key
export SK=your_secret_key

# Create a bucket
curl -X PUT http://localhost:9000/my-bucket \
  --aws-sigv4 "aws:amz:us-east-1:s3" --user "$AK:$SK"

# Upload an object
curl -X PUT http://localhost:9000/my-bucket/hello.txt \
  --aws-sigv4 "aws:amz:us-east-1:s3" --user "$AK:$SK" \
  --data-binary 'hello'

# Download it back
curl http://localhost:9000/my-bucket/hello.txt \
  --aws-sigv4 "aws:amz:us-east-1:s3" --user "$AK:$SK"

Admin API (port 9001)

All admin API endpoints live under /api/* so they cannot collide with the React SPA's client-side routes (/users, /buckets, /buckets/:name/cors, ...) served at the root. /health and /metrics stay at the root as operational endpoints.

Every authenticated request carries:

X-Admin-AccessKey: <your-admin-access-key>
X-Admin-Secret:    <your-admin-secret>

Health

  • GET /health → { "status": "ok" }. Unauthenticated, suitable for readiness probes.

Users

  • POST /api/users: create a user. Server generates the access key + secret and returns them once in the response. Body takes an acl array.
  • GET /api/users: list users (secrets never returned).
  • PUT /api/users/:accessKeyID: replace ACL.
  • DELETE /api/users/:accessKeyID: remove.

Admin vs regular users. "Admin" is not a flag; it's an ACL pattern. A user is considered admin (can log in to the dashboard and hit the admin API) if and only if their ACL contains {"effect":"Allow","buckets":["*"],"actions":["*"]}. Anything narrower is an S3-only user, scoped to whatever the ACL allows, and cannot access the admin surface. Multiple admins are fine. New users created from the admin UI start with an empty ACL; edit the ACL afterwards to grant exactly the access they need.

Examples:

// Admin: full access, can use the dashboard
{ "acl": [{ "effect": "Allow", "buckets": ["*"], "actions": ["*"] }] }

// Read-only user on one bucket: no dashboard access
{ "acl": [{ "effect": "Allow", "buckets": ["reports"], "actions": ["s3:GetObject", "s3:ListBucket"] }] }

// Write-only uploader: no dashboard, no reads
{ "acl": [{ "effect": "Allow", "buckets": ["uploads"], "actions": ["s3:PutObject"] }] }
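
The admin rule can be checked before you send an ACL, for example to avoid granting dashboard access by accident. The sketch below applies the "contains the wildcard Allow rule" test literally; how the server compares rules (extra fields, casing) is not documented here, so treat the exact matching as an assumption.

```python
from typing import Any, Dict, List

# The one rule that makes a user an admin, as stated above.
ADMIN_RULE = {"effect": "Allow", "buckets": ["*"], "actions": ["*"]}


def is_admin(acl: List[Dict[str, Any]]) -> bool:
    """True when the ACL contains the wildcard Allow rule."""
    return any(rule == ADMIN_RULE for rule in acl)
```

Dictionary comparison ignores key order, so {"buckets": ["*"], "actions": ["*"], "effect": "Allow"} matches as well.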

S3 operations via the admin surface

Every S3 bucket and object operation is mounted at /api/s3/* with a JSON wire format. Same handlers, same storage, just admin auth instead of SigV4. This is what the embedded UI uses; external tooling can use it too.

  • GET /api/s3/: list buckets.
  • PUT /api/s3/:bucket: create.
  • GET /api/s3/:bucket: list objects.
  • DELETE /api/s3/:bucket: delete.
  • PUT /api/s3/:bucket/:key: upload (raw body).
  • GET /api/s3/:bucket/:key: download (raw body).
  • HEAD /api/s3/:bucket/:key: metadata only.
  • DELETE /api/s3/:bucket/:key: delete.
  • GET /api/s3/:bucket/:key honours Range: (partial download) exactly as the SigV4 surface does.
  • PUT|GET|DELETE /api/s3/:bucket/:key?tagging: per-object tags as JSON ({"tagSet":[{"key":"env","value":"prod"}]}).
  • PUT|GET|DELETE /api/s3/:bucket?cors: per-bucket CORS as JSON.
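
Before sending a tagging document, it can help to check it against the documented limits (10 tags, key 1-128 and value 0-256 characters, no duplicate keys). A minimal sketch, assuming the tagSet JSON shape shown above and counting Python characters rather than bytes; the function name is illustrative and the server's own validation may be stricter.

```python
from typing import Any, Dict, List


def tag_set_problems(doc: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the tag set looks fine."""
    tags = doc.get("tagSet", [])
    problems = []
    if len(tags) > 10:
        problems.append("more than 10 tags")
    seen = set()
    for tag in tags:
        key, value = tag.get("key", ""), tag.get("value", "")
        if not 1 <= len(key) <= 128:
            problems.append(f"key length out of range: {key!r}")
        if len(value) > 256:
            problems.append(f"value too long for key {key!r}")
        if key in seen:
            problems.append(f"duplicate key: {key!r}")
        seen.add(key)
    return problems
```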

Example

export ADMIN_AK=...
export ADMIN_SK=...

# Create a bucket
curl -X PUT http://localhost:9001/api/s3/my-bucket \
  -H "X-Admin-AccessKey: $ADMIN_AK" -H "X-Admin-Secret: $ADMIN_SK"

# Upload an object
curl -X PUT http://localhost:9001/api/s3/my-bucket/hello.txt \
  -H "X-Admin-AccessKey: $ADMIN_AK" -H "X-Admin-Secret: $ADMIN_SK" \
  --data-binary 'hello'

# Create a user with full access
curl -X POST http://localhost:9001/api/users \
  -H "X-Admin-AccessKey: $ADMIN_AK" -H "X-Admin-Secret: $ADMIN_SK" \
  -H "Content-Type: application/json" \
  -d '{"acl":[{"effect":"Allow","buckets":["*"],"actions":["*"]}]}'

Per-bucket CORS

CORS lives on the bucket, exactly like AWS S3. There is no global allowlist, no CORS_ALLOWED_ORIGINS env var. A bucket with no CORS configuration rejects cross-origin browser requests; that is the S3 contract.

Endpoints

  • PUT /:bucket?cors (port 9000, XML body) or PUT /api/s3/:bucket?cors (port 9001, JSON body)
  • GET /:bucket?cors / GET /api/s3/:bucket?cors
  • DELETE /:bucket?cors / DELETE /api/s3/:bucket?cors

JSON shape (admin surface)

{
  "CORSRules": [
    {
      "AllowedMethods": ["GET", "PUT"],
      "AllowedOrigins": ["https://app.example.com"],
      "AllowedHeaders": ["*"],
      "ExposeHeaders":  ["ETag"],
      "MaxAgeSeconds":  600
    }
  ]
}

XML shape (SigV4 surface)

Same grammar as AWS PutBucketCors:

<CORSConfiguration>
  <CORSRule>
    <AllowedMethod>GET</AllowedMethod>
    <AllowedOrigin>https://app.example.com</AllowedOrigin>
    <MaxAgeSeconds>600</MaxAgeSeconds>
  </CORSRule>
</CORSConfiguration>
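
If you keep CORS rules as JSON but need to push them through the SigV4 surface, the two shapes convert mechanically. The sketch below uses the element names from the AWS PutBucketCors grammar; the function name is illustrative, and the optional ID field of a rule is not handled.

```python
import xml.etree.ElementTree as ET
from typing import Any, Dict

# JSON list key -> repeated XML element name
_FIELDS = [
    ("AllowedHeaders", "AllowedHeader"),
    ("AllowedMethods", "AllowedMethod"),
    ("AllowedOrigins", "AllowedOrigin"),
    ("ExposeHeaders", "ExposeHeader"),
]


def cors_json_to_xml(doc: Dict[str, Any]) -> str:
    """Convert the admin-surface JSON shape to a CORSConfiguration document."""
    root = ET.Element("CORSConfiguration")
    for rule in doc.get("CORSRules", []):
        node = ET.SubElement(root, "CORSRule")
        for json_key, tag in _FIELDS:
            for value in rule.get(json_key, []):
                ET.SubElement(node, tag).text = value
        if "MaxAgeSeconds" in rule:
            ET.SubElement(node, "MaxAgeSeconds").text = str(rule["MaxAgeSeconds"])
    return ET.tostring(root, encoding="unicode")
```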

Observability

Request IDs

Every response carries an x-amz-request-id header (UUIDv4, minted per request). Error bodies repeat the same ID as <RequestId> in XML or requestId in JSON. Use this to correlate a client-visible failure with a server log line.

Structured logs

One JSON line per request at the end of handling:

{"time":"2026-04-14T07:15:03.882Z","level":"INFO","msg":"http_request","method":"GET","path":"/s3/:bucket","status":200,"duration_ms":3.1,"remote_ip":"10.0.0.4","request_id":"5aba...","auth_method":"sigv4","user_access_key":"AKIA...","bytes_in":0,"bytes_out":482}

Stable fields. path is always the route template (no object keys, no signatures). Query strings are stripped. Status drives the level: 5xx → ERROR, 4xx → WARN, else INFO. Configure with LOG_LEVEL and LOG_FORMAT.
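
Because each request is one JSON line, correlating a client-visible request ID with its log entry takes only a few lines of scripting. A small sketch, assuming LOG_FORMAT=json and the request_id field shown in the sample above; the function names are illustrative.

```python
import json
from typing import Any, Dict, Iterable, Optional


def level_for_status(status: int) -> str:
    """Mirror the documented mapping from HTTP status to log level."""
    if status >= 500:
        return "ERROR"
    if status >= 400:
        return "WARN"
    return "INFO"


def find_request(lines: Iterable[str], request_id: str) -> Optional[Dict[str, Any]]:
    """Return the first log entry whose request_id matches, or None."""
    for line in lines:
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip any non-JSON output mixed into the stream
        if isinstance(entry, dict) and entry.get("request_id") == request_id:
            return entry
    return None
```

Feed it the container's log output line by line, with the ID taken from the x-amz-request-id response header or the error body.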

Prometheus metrics

GET /metrics on port 9001 serves Prometheus text format. ByteBucket speaks the format; it does not bundle a scraper. Point any Prometheus-compatible collector at it (Prometheus, Grafana Agent, VictoriaMetrics, the OpenTelemetry Collector's Prometheus receiver).

Exposed series:

  • http_requests_total{method,path,status}: counter.
  • http_request_duration_seconds{method,path}: latency histogram.
  • http_request_size_bytes, http_response_size_bytes: payload histograms.
  • bytebucket_multipart_uploads_in_progress: gauge.
  • bytebucket_objects_bytes_total{bucket}: per-bucket byte total (best-effort delta, not reconciled on restart).
  • Standard go_* and process_* collectors.

The endpoint is unauthenticated. It relies on the same network boundary that protects the admin port (see SECURITY.md).

Graceful shutdown

On SIGTERM or SIGINT the server stops accepting new connections and drains in-flight requests for up to 30 seconds before exiting. Kubernetes' default terminationGracePeriodSeconds is 30s, which means Shutdown wins the race to SIGKILL in a normal rollout.


Using it from code

AWS SDK for JavaScript (v3)

import { S3Client, PutObjectCommand, GetObjectCommand } from '@aws-sdk/client-s3';
import { getSignedUrl } from '@aws-sdk/s3-request-presigner';

const s3 = new S3Client({
  region: 'us-east-1',
  endpoint: 'http://localhost:9000',
  forcePathStyle: true,
  credentials: { accessKeyId: 'AK', secretAccessKey: 'SK' },
});

await s3.send(new PutObjectCommand({ Bucket: 'b', Key: 'k.txt', Body: 'hi' }));
const url = await getSignedUrl(s3, new GetObjectCommand({ Bucket: 'b', Key: 'k.txt' }), { expiresIn: 900 });

Multipart, presigned URLs, and streaming uploads all work as expected.

boto3 (Python)

import boto3
s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',
    aws_access_key_id='AK',
    aws_secret_access_key='SK',
    region_name='us-east-1',
    config=boto3.session.Config(s3={'addressing_style': 'path'}),
)
s3.upload_file('big.bin', 'my-bucket', 'big.bin')  # uses multipart automatically

aws CLI

aws --endpoint-url http://localhost:9000 s3 cp ./big.bin s3://my-bucket/big.bin

rclone

[bytebucket]
type = s3
provider = Other
access_key_id = AK
secret_access_key = SK
endpoint = http://localhost:9000
force_path_style = true

Admin API (any language)

It's plain HTTP + JSON; use fetch, axios, requests, httpx, or curl. No SDK, no SigV4.
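
As an illustration, here is a dependency-free Python sketch that builds an authenticated admin request with the standard library. The helper name admin_request is made up for this example; the headers and paths are the ones documented in the Admin API section.

```python
import json
import urllib.request
from typing import Any, Optional


def admin_request(base_url: str, access_key: str, secret: str,
                  method: str, path: str,
                  body: Optional[Any] = None) -> urllib.request.Request:
    """Build (but do not send) a request carrying the X-Admin-* headers."""
    data = None if body is None else json.dumps(body).encode("utf-8")
    req = urllib.request.Request(base_url.rstrip("/") + path,
                                 data=data, method=method)
    req.add_header("X-Admin-AccessKey", access_key)
    req.add_header("X-Admin-Secret", secret)
    if data is not None:
        req.add_header("Content-Type", "application/json")
    return req


# Sending it requires a running server, for example:
#   req = admin_request("http://localhost:9001", ak, sk, "GET", "/api/users")
#   with urllib.request.urlopen(req) as resp:
#       print(json.load(resp))
```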


Storage layout and persistence

Everything lives under /data:

/data/
  users.db                              # BoltDB
  objects/
    <bucket>/
      <object>                          # raw bytes
      <object>.meta                     # JSON sidecar: ETag, checksums, user metadata
      <object>.tags.json                # JSON sidecar: object tag set (independent of ETag)
      .acl.json                         # per-bucket canned ACL
      .cors.json                        # per-bucket CORS config
  uploads/
    <bucket>/
      <uploadId>/
        manifest.json                   # metadata + state
        <partNumber>                    # raw part bytes
  • Backups. Snapshot the whole /data volume. Object bytes + their sidecar must travel together. BoltDB's single file is consistent on snapshot thanks to its copy-on-write design (it has no separate write-ahead log to capture).
  • Corruption recovery. If a .meta sidecar is missing, the ETag is recomputed lazily on next read. Stored objects are never mutated after PUT, so bitrot detection is a matter of periodically verifying MD5 against the stored ETag.
  • Deletion removes the object and its sidecar, then collapses empty parent directories.
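
A periodic bitrot check along the lines described above could be sketched like this. The "etag" field name inside the .meta sidecar is an assumption (the sidecar schema is not documented here), and objects with a composite multipart ETag are skipped because a plain MD5 of the bytes will not match them.

```python
import hashlib
import json
from pathlib import Path
from typing import Optional


def md5_hex(path: Path, chunk: int = 1 << 20) -> str:
    """Stream the file through MD5 so large objects do not load into memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def verify_object(obj: Path) -> Optional[bool]:
    """True/False for a match/mismatch, None when the object cannot be checked."""
    meta = obj.with_name(obj.name + ".meta")
    if not meta.exists():
        return None
    # "etag" is a hypothetical field name; adjust to the real sidecar schema.
    etag = str(json.loads(meta.read_text()).get("etag", "")).strip('"')
    if not etag or "-" in etag:  # missing, or a multipart composite ETag
        return None
    return md5_hex(obj) == etag
```

Run it read-only against a snapshot or during a quiet period, and treat a False result as a signal to restore that object from backup.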

Limits

  • Max header size: 1 MiB.
  • Max request body: 5 GiB on port 9000 (S3 single-PUT ceiling), 100 MiB on port 9001 (admin surface).
  • Per-connection timeouts: 10 s on headers, 5 min on read/write, 120 s idle. Very large single-PUT or GET on slow links may hit the 5-min bound; prefer multipart upload for anything above a few hundred MiB.
  • Multipart: 1 to 10000 parts per upload, no minimum part size enforced (real S3 requires 5 MiB for all but the last part; ByteBucket is lenient).
  • Object tags: up to 10 per object; key 1-128 and value 0-256 UTF-8 chars; no duplicate keys; tagging document capped at 16 KiB.
  • Rate limiting: off by default (see Configuration). When enabled, requests are throttled per client IP and over-limit calls get 503 SlowDown with Retry-After.
  • Presigned URL expiry: bounded by the request's X-Amz-Expires claim; no server-side cap beyond what the client signed.
  • Versioning, object locking, server-side encryption, replication, and lifecycle policies: not implemented.
  • BoltDB is a single-writer embedded DB. Fine for up to tens of thousands of users on a single node; don't expect horizontal scale.
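
When driving multipart uploads by hand, the 10000-part ceiling determines the smallest usable part size for a given object. A quick sketch of that arithmetic; the 8 MiB starting size is an arbitrary choice for the example, not a ByteBucket setting.

```python
from typing import Tuple

MAX_PARTS = 10_000  # documented upper bound on parts per upload


def plan_parts(size: int, part_size: int = 8 * 1024 * 1024) -> Tuple[int, int]:
    """Return (part_size, part_count) so the upload stays within MAX_PARTS."""
    if size <= 0:
        raise ValueError("size must be positive")
    if part_size <= 0:
        raise ValueError("part_size must be positive")
    # Grow the part size only when the preferred one would need too many parts.
    min_part = -(-size // MAX_PARTS)  # integer ceiling division
    part_size = max(part_size, min_part)
    count = -(-size // part_size)
    return part_size, count
```

For a 100 MiB object this keeps the 8 MiB parts and yields 13 of them; for a 1 TB object the part size is raised to 100,000,000 bytes so the count lands exactly on 10000.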

Troubleshooting

  • SignatureDoesNotMatch: clock skew between client and server, wrong region (ByteBucket treats all requests as us-east-1), or trailing slash / header canonicalisation differences. The error body's <RequestId> matches a server log line with the full canonical request trace at DEBUG.
  • NoSuchCORSConfiguration on a preflight: set one via the admin UI or the ?cors endpoint.
  • Admin UI says "Invalid credentials": you're hitting /api/users with X-Admin-* headers; the super-user bootstrap only runs when the user DB is empty. Check that ENCRYPTION_KEY matches what was used on first boot.
  • Lost admin credentials or ENCRYPTION_KEY: delete /data/users.db and restart with fresh env vars. Objects survive; users and ACLs are gone.
  • Empty <Owner> or dummy-* in responses: you're on an older build. Upgrade to ghcr.io/byteink/bytebucket:latest.
  • Connection hangs on large uploads: use multipart. Per-connection write timeout is 5 minutes.
  • Metrics endpoint returns 404: you hit port 9000. /metrics is on 9001.

License

Licensed under the Server Side Public License. Free for open-source and commercial use; offering ByteBucket itself as a managed, paid service requires open-sourcing the complete service stack.
