137 migrations and counting: building a personal OS schema from scratch

14 March 2026 by

Suraj Barman

This article chronicles the evolution of a personal operating system built atop PostgreSQL, growing from a modest ten‑table Fastify prototype to a 147‑table schema after 137 migrations. It explores design choices such as a unified public schema, soft‑delete columns, pgvector embeddings, and a sync engine that together enable a self‑hosted, AI‑aware digital life manager.

Initial Schema Design

The first release defined thirteen tables, each with a UUID primary key generated by gen_random_uuid(). Core entities included notes, contacts, events, and an embeddings table that stored vector representations for semantic search. By installing the pgcrypto and vector extensions, the system prepared for future cryptographic operations and high‑dimensional similarity queries, laying a flexible foundation for later growth.

Embedding Architecture with pgvector

Embedding storage leveraged the vector data type, allowing 768‑dimensional arrays to be indexed with ivfflat for efficient nearest‑neighbor lookup. The embeddings table featured a domain column to differentiate source types (notes, emails, etc.) and a model_used field added later to track versioning. This design enabled a single retrieval path for any content, simplifying LLM‑driven augmentation pipelines.

Unified Public Schema Decision

All domain tables were placed in the default public schema rather than segregating by business area. This eliminated cross‑schema joins for the knowledge graph, allowing the entity_links table to reference any record via a generic (source_type, source_id) pair. The approach also streamlined hook implementation: a single CRUD interceptor could apply embedding, heat tracking, and entity extraction uniformly across all entities.

Soft Delete Strategy with deleted_at

Every table received a deleted_at timestamp from day one, providing a non‑destructive deletion model. This facilitated bidirectional sync with external services such as Gmail and Google Calendar, as the system could differentiate between never‑seen records and those explicitly removed. Restoration became a trivial UPDATE operation, and audit trails could reconstruct deletion timelines without additional logging tables.

Sync Engine and Change Log

The sync subsystem introduced change_log, sync_connectors, and sync_history tables. Each CRUD action emitted a log entry containing the affected record_id, operation type, and timestamp. Connectors consumed these entries to propagate changes to third‑party APIs, while sync_history recorded batch outcomes, enabling retry logic and conflict resolution. This architecture turned the database into an event‑sourced source of truth.

Heat Scoring and Ranking Mechanism

To prioritize frequently accessed items, a record_heat table stored a decay‑based score between 0 and 1. Every read operation incremented the score, while a background job applied exponential decay to older interactions. Search queries multiplied relevance scores by the heat factor, ensuring that hot records surfaced prominently without sacrificing lexical matching quality.

Multi‑User Migration Challenges

Introducing multi‑user support required adding a user_id column to every domain table, a migration that generated thousands of ALTER TABLE statements and index rebuilds. Earlier inclusion of user_id would have avoided this churn. Additionally, embedding versioning was retrofitted by adding model_version to the embeddings table, allowing incremental re‑embedding when switching from nomic‑embed‑text to qwen3‑embedding:0.6b. The experience highlighted the importance of anticipating cross‑tenant identifiers and model provenance from the outset.