I Built a Database in a Weekend

March 2026
How AI Changes Technical Tradeoffs

(post largely written by AI, forgive me. i’m tired.)

This weekend I built a database engine from scratch. Not a wrapper around SQLite. Not a thin layer over someone else’s storage engine. A full LSM-tree storage engine, append-only log, CRDT conflict resolution, query engine with reactive subscriptions, vector search, and peer-to-peer sync — all in Rust, compiled to WebAssembly, with 220 tests passing.

The old calculus

Before AI coding assistants, building a database was a multi-year, multi-team endeavor. You’d never even consider it for a side project. The rational decision was always: evaluate existing tools, pick the closest fit, work around its limitations. The ecosystem mattered enormously.

This created a gravitational pull toward popular languages and established tools. You’d pick Node.js not because it was ideal for your problem, but because npm had the packages you needed. You’d use PostgreSQL not because relational was the right model, but because the ecosystem of ORMs, migration tools, and hosting providers was unbeatable.

Technical decisions were ecosystem decisions.

What changed

AI doesn’t just make you type faster. It changes which projects are feasible.

When I can describe a bloom filter implementation and get working code with proper false-positive rate tuning in minutes instead of hours, the cost side of “roll your own” drops by an order of magnitude.
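
The tuning in question is standard math: for n expected keys and a target false-positive rate p, the optimal filter uses m = −n·ln(p)/(ln 2)² bits and k = (m/n)·ln 2 hash functions. A sketch with illustrative numbers (not Aregula's actual parameters):

```rust
// Sizing a bloom filter for a target false-positive rate.
// Standard formulas: m = -n·ln(p) / (ln 2)² bits, k = (m/n)·ln 2 hashes.

fn bloom_params(n: usize, p: f64) -> (usize, u32) {
    let ln2 = std::f64::consts::LN_2;
    // Total bits needed for the filter.
    let m = (-(n as f64) * p.ln() / (ln2 * ln2)).ceil() as usize;
    // Number of hash functions, at least one.
    let k = ((m as f64 / n as f64) * ln2).round().max(1.0) as u32;
    (m, k)
}

fn main() {
    // Example: 10,000 keys per SSTable at a 1% false-positive target.
    let (bits, hashes) = bloom_params(10_000, 0.01);
    println!("{} bits (~{} bytes), {} hash functions", bits, bits / 8, hashes);
}
```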

This shifts the calculus. The question is no longer the compromise of “what off-the-shelf solution best fits my use case?”

The database I actually needed

I wanted a database for the personal/AI era. Something that:

  • Runs entirely in the browser via WASM
  • Works offline, syncs when online via peer-to-peer
  • Has reactive queries
  • Supports vector search for AI embeddings

Existing WASM databases include:

  • SQLite Wasm
  • PGlite
  • DuckDB-Wasm

I could have pieced together several existing tools to get most of what I wanted. But to be honest, I just really wanted to build a database for my specific needs.

What Aregula is

Aregula is a local-first database engine. The architecture draws from three systems:

  • LevelDB’s LSM-tree for the storage layer — sorted SSTables, bloom filters, write-ahead log, size-tiered compaction

  • Kappa architecture where an append-only log is the single source of truth — every query is a materialized view of the log

  • Secure Scuttlebutt’s replication model for sync — per-device signed feeds, hash chains, state vectors for incremental delta computation
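
The hash-chain part of that replication model can be sketched in a few lines: each log entry commits to the hash of the previous entry, so a feed's history can't be rewritten without detection. Here std's `DefaultHasher` stands in for SHA-256 and signing is omitted; the real system uses SHA-256 chains plus ed25519 signatures:

```rust
// Toy hash chain for an append-only log: each entry links to the
// previous entry's hash, making tampering detectable on replay.
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

struct Entry {
    prev_hash: u64, // hash of the preceding entry (0 for the genesis entry)
    payload: String,
}

fn hash_entry(e: &Entry) -> u64 {
    // Hash covers the back-link too, so the chain is transitive:
    // changing any earlier entry invalidates every later hash.
    let mut h = DefaultHasher::new();
    e.prev_hash.hash(&mut h);
    e.payload.hash(&mut h);
    h.finish()
}

fn main() {
    let e1 = Entry { prev_hash: 0, payload: "put k1 v1".into() };
    let e2 = Entry { prev_hash: hash_entry(&e1), payload: "put k1 v2".into() };
    // Verifying the chain: recompute e1's hash, compare with e2's link.
    assert_eq!(e2.prev_hash, hash_entry(&e1));
}
```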

It’s written in Rust as 12 crates, each defined by a trait interface so any layer can be swapped:

    aregula-engine    ← WASM entry point
    aregula-query     ← Filters, sorting, reactive subscriptions
    aregula-ql        ← Rich predicates (Like, Contains, In, regex)
    aregula-agg       ← Aggregations (count, sum, avg, group_by)
    aregula-vector    ← HNSW vector index (cosine, L2, dot product)
    aregula-crdt      ← LWW per-field merge, HLC timestamps
    aregula-log       ← Append-only log, ed25519 signing, SHA-256 hash chains
    aregula-lsm       ← LSM-tree storage engine
    aregula-io        ← FileSystem abstraction (Memory, OPFS, Disk)

Plus a signaling server for WebRTC handshakes, an HTTP server with SSE for relay-based sync, and a TypeScript SDK.
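
To make "any layer can be swapped" concrete, here's a minimal sketch of the trait-per-layer pattern. The names (`FileSystem`, `MemFs`) are illustrative, not Aregula's actual API:

```rust
// One trait per layer: the storage engine above depends only on the
// trait, so an in-memory, OPFS, or disk backend can be swapped in.
use std::collections::HashMap;

trait FileSystem {
    fn write(&mut self, path: &str, data: &[u8]);
    fn read(&self, path: &str) -> Option<Vec<u8>>;
}

// In-memory backend, useful for tests; an OPFS or disk backend would
// implement the same trait and the caller never knows the difference.
struct MemFs {
    files: HashMap<String, Vec<u8>>,
}

impl FileSystem for MemFs {
    fn write(&mut self, path: &str, data: &[u8]) {
        self.files.insert(path.to_string(), data.to_vec());
    }
    fn read(&self, path: &str) -> Option<Vec<u8>> {
        self.files.get(path).cloned()
    }
}

fn main() {
    // The LSM layer would hold a Box<dyn FileSystem>, not a concrete type.
    let mut fs: Box<dyn FileSystem> = Box::new(MemFs { files: HashMap::new() });
    fs.write("wal/000001.log", b"put k1 v1");
    assert_eq!(fs.read("wal/000001.log"), Some(b"put k1 v1".to_vec()));
}
```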

Performance that surprised me

I expected “custom engine” to mean “slower than production databases.” That’s not what happened:

Benchmark               SQLite (native CLI)   Aregula
10K writes              47,800 ops/sec        31,700 ops/sec
10K reads (PK lookup)   87,700 ops/sec        1,430,000 ops/sec

Writes are 0.66x SQLite — and 81% of that cost is ed25519 signature generation, not storage. The raw key-value layer runs at ~370K ops/sec, which is the low end of RocksDB territory. Reads are 16x faster because point lookups on hot data are served from the in-memory memtable and block cache rather than a B-tree traversal. I still plan to spend more time optimizing write performance.

A purpose-built engine can beat general-purpose databases on the workloads it was designed for. This has always been true. What’s new is that building one is a weekend project instead of a multi-year commitment.

Niche tools, not mainstream ecosystems

I wrote this in Rust. Not because Rust has the biggest ecosystem for database development — it doesn’t. I chose it because it compiles to small, fast WASM with no garbage collector, which is exactly what you want for a database running inside a browser worker thread.

In the old world, this would’ve been a risky choice. Fewer libraries, smaller community, steeper learning curve. But when AI handles the implementation details — the SSTable binary format, the bloom filter math, the HLC clock algorithm — the ecosystem gap stops mattering. You’re choosing the language for its technical properties, not its package registry.
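
For a sense of what "implementation details" means here, a simplified hybrid logical clock update (after Kulkarni et al.): it uses physical time when the wall clock advances, and a logical counter to break ties when it doesn't. Illustrative only, not Aregula's code:

```rust
// Simplified hybrid logical clock (HLC): timestamps stay close to
// physical time but still totally order causally related events.
#[derive(Clone, Copy, Debug, PartialEq)]
struct Hlc {
    wall: u64,  // physical component (e.g. ms since epoch)
    count: u16, // logical counter, breaks ties within one wall tick
}

// Local event: take physical time if it advanced, else bump the counter.
fn tick(local: Hlc, now: u64) -> Hlc {
    if now > local.wall {
        Hlc { wall: now, count: 0 }
    } else {
        Hlc { wall: local.wall, count: local.count + 1 }
    }
}

// Receiving a remote timestamp: advance past both clocks.
fn recv(local: Hlc, remote: Hlc, now: u64) -> Hlc {
    let wall = now.max(local.wall).max(remote.wall);
    let count = if wall == local.wall && wall == remote.wall {
        local.count.max(remote.count) + 1
    } else if wall == local.wall {
        local.count + 1
    } else if wall == remote.wall {
        remote.count + 1
    } else {
        0
    };
    Hlc { wall, count }
}

fn main() {
    let a = tick(Hlc { wall: 100, count: 0 }, 100); // clock didn't advance
    assert_eq!(a, Hlc { wall: 100, count: 1 });
    let b = recv(a, Hlc { wall: 105, count: 3 }, 102); // remote is ahead
    assert_eq!(b, Hlc { wall: 105, count: 4 });
}
```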

This applies beyond language choice. I implemented my own LSM-tree instead of wrapping RocksDB. I built HNSW vector search instead of embedding Faiss. I wrote a signaling server in Rust instead of using a Node.js library. Each of these would’ve been a hard “no” two years ago. Not because they’re technically wrong, but because the implementation cost wasn’t justified. AI changed the justification.

Starting out, I had an idea of what I wanted to build: the core abstractions around Kappa, LevelDB, and SSB. But using AI to research database internals made it far easier to learn from existing systems and understand the design trade-offs. That matters, because understanding trade-offs is the real engineering. Writing code is just syntax.

Sync to a Pi, not the cloud

We’re entering an era where LLMs have enabled millions of people to write personal software. Aregula enables these apps to run locally with redundancy.

Every device — your phone, your laptop, a Raspberry Pi on your shelf — gets its own ed25519 keypair and becomes a full peer. Sync happens directly between them. There’s no central server that holds your data, no subscription fee, no terms of service that change, no company that gets acquired and shuts down your backend.

The Raspberry Pi case is the one I keep coming back to. A $35 computer on your home network running the Aregula server becomes your personal sync hub. Your browser apps sync to it over the local network. It’s always on, it’s always yours, and it’s a complete replica with cryptographic proof of every mutation’s origin. If your laptop dies, the Pi has everything. If the Pi dies, your laptop has everything. No cloud required.

This isn’t new technology — people have run personal servers forever. What’s new is that the sync protocol is built into the database itself. You open the app, it syncs. The signed append-only logs handle conflict resolution automatically. Two devices that edited the same record offline converge deterministically when they reconnect.
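
The deterministic convergence comes from last-writer-wins merging per field: each field carries a timestamp plus a device id as tie-breaker, so merge order doesn't matter and both replicas pick the same winner. A sketch with illustrative types (not Aregula's real structures):

```rust
// Per-field LWW merge: higher timestamp wins; on a tie, the device id
// breaks it, so both sides of a sync resolve to the same value.
#[derive(Clone, Debug, PartialEq)]
struct FieldValue {
    value: String,
    ts: u64,              // HLC timestamp, simplified to one integer
    device: &'static str, // deterministic tie-breaker
}

fn merge(a: &FieldValue, b: &FieldValue) -> FieldValue {
    // Lexicographic comparison on (timestamp, device id).
    if (a.ts, a.device) >= (b.ts, b.device) {
        a.clone()
    } else {
        b.clone()
    }
}

fn main() {
    let laptop = FieldValue { value: "draft v2".into(), ts: 12, device: "laptop" };
    let pi = FieldValue { value: "draft v3".into(), ts: 15, device: "pi" };
    // Commutative: both devices converge on the same winner.
    assert_eq!(merge(&laptop, &pi), merge(&pi, &laptop));
    assert_eq!(merge(&laptop, &pi).value, "draft v3");
}
```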

For personal, AI-developed apps, your data lives on your devices.

Conclusion

When AI lowers the cost of building from scratch, the question stops being “which existing tool is closest to what I need?” and becomes “what do I actually need?”


Aregula is open source and written in Rust. It’s new, so you probably shouldn’t use it yet.