Tommi Hippeläinen
December 14, 2024
In recent conversations with seasoned database engineers and venture capitalists, one theme repeatedly surfaces: the idea of a “perfect” database is more myth than reality. These discussions have prompted me to re-examine the architectural choices underpinning modern databases and what it would mean, in theory, to have a perfect infrastructure stack. While no such perfect database truly exists, the thought exercise of imagining one—and the underlying hardware and software needed to achieve near-ideal performance—can teach us a great deal about both the constraints and the possibilities within today’s data ecosystem.
At a high level, database technologies differ in their internal architectures, storage models, and target workloads. Relational databases, for instance, have long relied on row-oriented storage and B-tree indexing, making them well suited for transactional workloads but less so for large-scale analytics. Conversely, columnar databases and data warehouses store the values of each attribute together, enabling high-performance analytical queries over large datasets while being less efficient at small, record-level lookups or frequent updates.
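To make the layout difference concrete, here is a small C sketch (illustrative only, not tied to any particular engine): the same orders stored row-wise as an array of structs and column-wise as a struct of arrays. A point lookup touches one contiguous row, while an aggregate streams one contiguous column.

```c
/* Toy illustration of row vs. columnar layout; field names are hypothetical. */
#include <stdio.h>

#define N 4

/* Row-oriented: each record's fields sit together (array of structs). */
struct OrderRow { int id; double amount; int customer_id; };
struct OrderRow rows[N] = {
    {1, 10.0, 100}, {2, 25.5, 101}, {3, 7.25, 100}, {4, 99.0, 102}
};

/* Column-oriented: each attribute is stored contiguously (struct of arrays). */
struct OrderColumns {
    int    id[N];
    double amount[N];
    int    customer_id[N];
} cols = {
    {1, 2, 3, 4}, {10.0, 25.5, 7.25, 99.0}, {100, 101, 100, 102}
};

int main(void) {
    /* Transactional-style point lookup reads one contiguous row. */
    printf("order 3 amount: %.2f\n", rows[2].amount);

    /* Analytical-style aggregate streams one contiguous column. */
    double total = 0.0;
    for (int i = 0; i < N; i++) total += cols.amount[i];
    printf("total amount: %.2f\n", total);
    return 0;
}
```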
NoSQL systems introduce other trade-offs: key-value stores, document databases, and graph databases abandon strict relational schemas for more flexible data models, improving scalability and often simplifying horizontal distribution. However, these approaches can complicate certain query patterns or weaken data integrity guarantees. Each type of database engine is a product of its design choices: a storage format, indexing strategy, and architecture that reflect the workloads and use cases it aims to serve.
Another fundamental distinction lies in whether the database is distributed across multiple nodes or kept on a single machine. Non-distributed, single-node databases tend to have simpler architectures. They rely on a single host’s CPU, memory, and storage, and while they may scale up by adding more resources to that host, they hit practical limits of capacity, throughput, and fault tolerance.
Distributed databases, on the other hand, split data across multiple nodes. Each node contributes processing power, memory, and storage capacity, allowing the system to grow linearly by adding more machines. Distributed databases can improve fault tolerance by replicating data across geographically dispersed data centers. However, distributing data introduces complexity: network latency comes into play, data must be partitioned and balanced, consistency models vary, and query planners must orchestrate operations across multiple nodes. The cost of data retrieval is no longer just a matter of reading from local storage, but also of communicating over a network, synchronizing state, and ensuring consistency between nodes.
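As a toy illustration of the partitioning step (my sketch, not any specific system's scheme), the C snippet below hashes each key and takes it modulo a hypothetical node count to decide which node owns the record. Real systems layer rebalancing, replication, and consistency protocols on top of this basic routing decision.

```c
/* Minimal hash-partitioning sketch: route a key to one of num_nodes nodes. */
#include <stdio.h>
#include <stdint.h>

/* FNV-1a: a simple, well-known hash used here purely for illustration. */
static uint64_t fnv1a(const char *key) {
    uint64_t h = 1469598103934665603ULL;
    for (const char *p = key; *p; p++) {
        h ^= (uint8_t)*p;
        h *= 1099511628211ULL;
    }
    return h;
}

int main(void) {
    const char *keys[] = {"user:42", "order:1001", "invoice:7"};
    const int num_nodes = 3;  /* hypothetical cluster size */
    for (int i = 0; i < 3; i++) {
        int node = (int)(fnv1a(keys[i]) % num_nodes);
        printf("%s -> node %d\n", keys[i], node);
    }
    return 0;
}
```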
Just as no single storage format is perfect for all queries, no single distribution model is ideal for all workloads. High-scale online services that demand global availability and elastic scaling benefit from distributed architectures. Smaller or highly latency-sensitive applications might find the overhead of distribution unnecessary. Ultimately, both distributed and non-distributed approaches face constraints that influence how data should be stored, accessed, and optimized.
A critical axis along which databases differentiate is whether data resides primarily in memory or on disk. In-memory databases keep active datasets in RAM, delivering microsecond-level access times and allowing simpler data structures, free of the latency of disk I/O. This often leads to phenomenal performance in latency-sensitive applications, especially where real-time analytics or ultra-fast transactions are required.
Yet, pure in-memory systems have their own limitations: RAM is expensive and not nearly as large as disk-based storage. As data volumes swell into the terabyte or petabyte range, keeping everything in memory becomes cost-prohibitive or simply impossible. Disk-based databases therefore remain prevalent, employing sophisticated caching layers, prefetching algorithms, and indexing structures to partially mask the latency of reading data from persistent storage.
Traditional disk access methods, even with SSDs and NVMe technologies, still operate on a block-based paradigm. Databases must issue I/O requests, wait for completion, and manage buffer pools to cache frequently accessed pages. Though SSDs and NVMe devices have brought enormous improvements over spinning disks, their latencies remain in the microseconds to low millisecond range, and they require a layer of indirection: the database engine must translate logical requests into block I/O operations.
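The sketch below shows the shape of that path, assuming a hypothetical data file named table.dat and a deliberately tiny, direct-mapped buffer pool: a logical page request first consults the pool, and only on a miss does the engine translate it into a block-sized pread() against storage.

```c
/* Minimal sketch of a block-I/O read path (not any real engine's code). */
#include <stdio.h>
#include <stdlib.h>
#include <fcntl.h>
#include <unistd.h>

#define PAGE_SIZE  4096
#define POOL_FRAMES 8

struct Frame { long page_no; int valid; char data[PAGE_SIZE]; };
static struct Frame pool[POOL_FRAMES];

/* Fetch a logical page: hit the buffer pool if possible, else issue block I/O. */
static char *fetch_page(int fd, long page_no) {
    int slot = (int)(page_no % POOL_FRAMES);   /* trivial direct-mapped policy */
    struct Frame *f = &pool[slot];
    if (f->valid && f->page_no == page_no)
        return f->data;                        /* cache hit: no I/O */
    if (pread(fd, f->data, PAGE_SIZE, page_no * PAGE_SIZE) < 0) {
        perror("pread");
        exit(1);
    }
    f->page_no = page_no;
    f->valid = 1;
    return f->data;                            /* cache miss: one block read */
}

int main(void) {
    int fd = open("table.dat", O_RDONLY);      /* hypothetical data file */
    if (fd < 0) { perror("open"); return 1; }
    char *page = fetch_page(fd, 0);
    printf("first byte of page 0: %d\n", page[0]);
    close(fd);
    return 0;
}
```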
This block-oriented approach forces database designers to choose how data is arranged—row-wise, column-wise, or in some hybrid format—and how it is indexed. Data layout decisions aim to minimize expensive block I/Os, but they inevitably benefit some queries at the expense of others. This leads to the fundamental constraint: there is no single data organization that is optimal for all possible query patterns. One must choose priorities—low-latency point lookups, large scans, complex joins—and design accordingly.
What if we could change the rules? Imagine building a storage infrastructure that presents persistent data directly to the CPU as if it were memory. Rather than thinking in terms of block I/O, the CPU would issue load/store instructions directly to remote persistent storage.
Technologies like Fibre Channel (FC) and InfiniBand (IB) have long offered high-throughput, low-latency interconnects for storage or clustering. If these fabrics were extended to carry a memory-access protocol—a concept we might call “memory-over-FC”—then remote storage arrays could appear to the host as another layer of addressable memory. This approach would let databases directly access persistent capacity over standard interconnects. The block layer could be bypassed, and caching structures simplified.
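No memory-over-FC protocol exists today, but the programming model it implies can be approximated with plain POSIX memory mapping: map the persistent data into the address space and let ordinary CPU loads replace explicit block reads. In the sketch below, a local file (table.dat, hypothetical) stands in for the fabric-attached array.

```c
/* Sketch of "storage as addressable memory": reads become CPU loads, not read() calls. */
#include <stdio.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <sys/stat.h>

int main(void) {
    int fd = open("table.dat", O_RDONLY);  /* stand-in for fabric-attached storage */
    if (fd < 0) { perror("open"); return 1; }
    struct stat st;
    if (fstat(fd, &st) < 0) { perror("fstat"); return 1; }

    /* Map the persistent data into the address space. */
    const char *base = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* No block I/O request, no buffer pool: dereference the mapping directly. */
    printf("first byte: %d\n", base[0]);

    munmap((void *)base, st.st_size);
    close(fd);
    return 0;
}
```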
In such a scenario, performance would improve over traditional disk-based models. Latency overheads from complex I/O scheduling would shrink. Yet, even with memory-over-FC or memory-over-IB, you would still incur protocol overheads and distance limitations. The performance enhancements would be measurable but only to a point. The fundamental hardware and interconnect layers, though improved, would not fully emulate DRAM-level speeds. The database would run faster, but it would not suddenly make all data layouts equally optimal.
If we truly wanted to maximize performance, we could go further. Redefine the entire stack—hardware and software—to support direct CPU addressing of persistent storage media at nearly DRAM-like latency. This might involve special-purpose controllers, integrated coherent memory fabrics like CXL, and storage devices engineered to handle load/store operations natively. The CPU could treat petabytes of persistent data as if it were main memory, drastically cutting down on complexity.
In such a fully customized environment, latency-sensitive operations would be executed orders of magnitude faster. Large analytical queries that used to thrash buffer caches could execute seamlessly. The complexity of choosing row vs. columnar layouts might diminish, as random access becomes nearly as cheap as sequential scans. The database engine could lean on simpler, more uniform data representations, trusting the hardware to deliver data quickly wherever it resides.
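For a rough sense of what load/store access to persistent data looks like in code, the x86 sketch below (my assumption of the idiom, not a CXL API) updates a mapped record with an ordinary store and then flushes the cache line. On DAX-mapped persistent memory, that flush-plus-fence is what makes the store durable; a plain file mapping would still need msync().

```c
/* Store-then-flush idiom for directly addressable persistent memory (x86 sketch). */
#include <stdio.h>
#include <string.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/mman.h>
#include <emmintrin.h>  /* _mm_clflush, _mm_sfence */

int main(void) {
    int fd = open("records.pmem", O_RDWR); /* hypothetical pmem/CXL-backed region */
    if (fd < 0) { perror("open"); return 1; }

    char *base = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (base == MAP_FAILED) { perror("mmap"); return 1; }

    /* Update a record with an ordinary store... */
    memcpy(base, "balance=42", 10);

    /* ...then flush the cache line and fence. On DAX-mapped persistent memory this
       pushes the store to the persistence domain; a regular file needs msync(). */
    _mm_clflush(base);
    _mm_sfence();

    munmap(base, 4096);
    close(fd);
    return 0;
}
```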
Pushing the Vision Further: The Ideal Hardware and Software Stack

To achieve truly transformative gains, the underlying infrastructure must be reimagined from the ground up. Rather than adapting existing transports and protocols, consider a purpose-built stack designed solely around near-DRAM-latency persistent storage:
Coherent Interconnects: Use advanced coherence protocols (like CXL or CCIX) natively integrated into the CPU’s memory subsystem. These protocols would ensure that persistent storage modules—whether local or remote—are treated as coherent memory regions, maintaining cache consistency, load/store semantics, and direct addressing without layers of indirection.
Specialized Controllers and ASICs: Replace generic storage controllers with custom ASICs or FPGAs optimized for memory semantics. These controllers would eliminate translation layers and block-based abstractions, presenting a direct load/store interface. They could also handle redundancy, erasure coding, and data integrity at the hardware level, further offloading complexity from the database engine.
Tiered Persistent Memory Layers: Incorporate non-volatile memory modules (e.g., next-generation NVDIMMs, persistent memory DIMMs, or storage-class memory (SCM) devices) that blend the persistence of SSDs with near-DRAM access times. The CPU could freely address petabytes of data without significant overhead, and data placement or migration between tiers would happen automatically in hardware.
Simplified Software Layers: The database engine would no longer need a conventional buffer pool or elaborate caching algorithms. It could rely on direct addressing and let the hardware handle data retrieval optimizations. This approach would also minimize the complexity of choosing between row-store or column-store formats. With near-instant random access, the database could store data in a more generalized, flexible structure, focusing on simpler internal logic.
Network-Transparent Memory Fabric: For distributed setups, extend the coherent memory fabric beyond a single chassis. Controllers and switches that maintain memory coherence between geographically separate nodes would allow multiple systems to share a vast, unified address space. While challenging, this would effectively merge the concepts of cluster computing, distributed memory, and persistent storage into one seamless domain.
Building such a stack involves reinventing data center infrastructure. It would require industry-wide cooperation, new standards, and a rethinking of how we design, purchase, and deploy compute and storage systems. The benefits, however, would be enormous: microsecond-level access times at the petabyte scale, simpler database architectures, and the potential to run analytical and transactional workloads on the same platform with minimal compromise.
Even with all this progress, the notion of a perfect database that arranges data optimally for all workloads remains elusive. While a near-perfect hardware stack reduces the penalty of suboptimal data access patterns, it does not eliminate the inherent trade-offs. Different queries and workloads require different optimizations. Some benefit from columnar formats, others from row-oriented designs, and still others from specialized indexing and compression schemes. Computation, query optimization, and data modeling complexities persist, no matter how fast the storage access becomes.
The perfect database does not exist because it represents a moving target—shaped by evolving workloads, changing user requirements, and new analytical methods. What specialized hardware and direct memory semantics can do is raise the baseline of performance. With an infrastructure that minimizes storage latency and complexity, database engines have more freedom and flexibility. But they will still need to choose how to represent, index, and process data. Just as no single tool solves every problem in software design, no single data arrangement or architectural approach will handle every query with equal perfection.
The idea of a perfect database is a useful thought experiment, pushing us to imagine what would be possible if we redesigned our systems from first principles. By transcending traditional disk access and moving towards direct memory semantics—initially over existing fabrics like FC or IB, and ultimately through purpose-built hardware and coherent memory protocols—we could achieve substantial performance gains.
However, such an approach only addresses certain dimensions of complexity. It will not magically unify all data models or index strategies under one optimal configuration. The inherent trade-offs remain, and as data grows more portable and becomes increasingly critical to AI-driven applications, working with different databases and managing database migrations will remain essential. In fact, these practices become even more important as organizations demand greater flexibility, data portability, and the ability to integrate new AI-driven workloads seamlessly.
This is precisely why we are building reDB: to embrace the realities of evolving data ecosystems, data portability, and the complexities of integrating AI, while striving to push the boundaries of performance and efficiency in today’s multi-database world.
Maybe someday we will build the perfect database stack too.