concept: noSQL

David Liu edited this page Jan 8, 2025 · 11 revisions

An RDBMS struggles to provide the following features efficiently in a distributed system:

  • Transactions
  • Referential Integrity: Joins

Polymorphism

  • Also known as schemaless
  • This eliminates the need to maintain a central system (data) catalog or to update an object-relational mapper (ORM) when the shape of the data changes
  • Downstream modeling techniques, e.g. slowly changing dimension (SCD) types, are not available, because every dimension is already joined to the fact data in an extremely denormalized form
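
To make the schemaless idea concrete, here is a minimal sketch in plain Python. The order records, the field names, and the `customer_tier` helper are all hypothetical, chosen only for illustration. Each record embeds its customer "dimension" directly, and the two records do not share the same set of fields.

```python
# Hypothetical order records: the customer dimension is embedded in each
# fact record, and the records do not all carry the same fields.
orders = [
    {"order_id": 1, "amount": 30.0,
     "customer": {"name": "Ann", "tier": "gold"}},
    {"order_id": 2, "amount": 12.5, "coupon": "SPRING",
     "customer": {"name": "Bob"}},  # no "tier" field on this record
]


def customer_tier(order, default="unknown"):
    """Read an optional embedded field without consulting a schema catalog."""
    return order.get("customer", {}).get("tier", default)


tiers = [customer_tier(order) for order in orders]
```

Because the customer data is copied into every order, there is no separate dimension table whose history an SCD scheme could track. The reader of the data has to cope with missing fields instead.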

Key-value stores

Based on the Dynamo paper, published by Amazon at the ACM Symposium on Operating Systems Principles (SOSP) in 2007

Vendor

Riak

  • Riak implements the principles from Amazon's Dynamo paper
  • Written in Erlang
  • Pluggable storage backends for its core storage (e.g. Bitcask, LevelDB)
  • Fully open source since 2017, when the previously enterprise-only code was released
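
One technique described in the Dynamo paper is consistent hashing, which decides which node owns a key. The sketch below is a toy, in-memory illustration of that idea only. It is not Riak's API: the `HashRing` and `TinyKV` classes, the node names, and the `vnodes` parameter are assumptions made for the example, and replication and conflict resolution are left out entirely.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode("utf-8")).hexdigest(), 16)


class HashRing:
    """Consistent-hash ring with a few virtual points per node."""

    def __init__(self, nodes, vnodes=8):
        self._ring = sorted(
            (_hash(f"{node}#{i}"), node)
            for node in nodes
            for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def node_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash, wrapping around.
        idx = bisect.bisect_right(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


class TinyKV:
    """In-memory key-value store that routes each key to one node."""

    def __init__(self, nodes):
        nodes = list(nodes)
        self.ring = HashRing(nodes)
        self.storage = {node: {} for node in nodes}

    def put(self, key, value):
        self.storage[self.ring.node_for(key)][key] = value

    def get(self, key, default=None):
        return self.storage[self.ring.node_for(key)].get(key, default)


kv = TinyKV(["node-a", "node-b", "node-c"])
kv.put("user:1", {"name": "Ann"})
value = kv.get("user:1")
```

The only operations are `put` and `get` by key. There are no joins and no multi-key transactions, which is what makes this model easy to spread across nodes.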

Redis

Aerospike

Column-family stores

A very wide, sparsely populated table structure

  • Columns are grouped into column families; each row is addressed by a row key and populates only the columns it needs
  • The lineage starts with Google's Bigtable paper, presented in 2006 at the USENIX Symposium on Operating Systems Design and Implementation (OSDI)
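
The wide, sparse layout can be pictured as nested maps: row key, then column family, then column, then value. The following is a minimal in-memory sketch under that assumption. The `ColumnFamilyTable` class and the family and column names are hypothetical and do not correspond to any vendor's client API.

```python
from collections import defaultdict


class ColumnFamilyTable:
    """Toy sparse table: row key -> column family -> column -> value."""

    def __init__(self, families):
        # Families are fixed up front; columns inside them are free-form.
        self.families = set(families)
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.families:
            raise KeyError(f"unknown column family: {family}")
        self.rows[row_key][family][column] = value

    def get(self, row_key, family, column, default=None):
        row = self.rows.get(row_key)
        if row is None:
            return default
        return row.get(family, {}).get(column, default)


table = ColumnFamilyTable(["profile", "metrics"])
table.put("user#1", "profile", "name", "Ann")
table.put("user#2", "metrics", "clicks", 7)
```

Cells that were never written take no space at all: `user#1` has no `metrics` entry, which is what "sparsely populated" means here.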

Use cases

  • In many cases, these systems are combined with batch-oriented systems that use Map/Reduce as the processing model for advanced querying
  • Similar to a document store, but with a table-like schema
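
The Map/Reduce processing model mentioned above can be sketched in a few lines. This is a single-process illustration, not a distributed job: the sample rows, the `map_row` and `reduce_values` functions, and the "clicks per country" question are all made up for the example.

```python
from collections import defaultdict

# Hypothetical sparse rows: row key -> column family -> column -> value.
rows = {
    "user#1": {"profile": {"country": "DE"}, "metrics": {"clicks": 3}},
    "user#2": {"profile": {"country": "US"}, "metrics": {"clicks": 5}},
    "user#3": {"profile": {"country": "DE"}},  # sparse: no metrics family
}


def map_row(row_key, row):
    """Map step: emit (country, clicks) for each row that has a country."""
    country = row.get("profile", {}).get("country")
    clicks = row.get("metrics", {}).get("clicks", 0)
    if country is not None:
        yield country, clicks


def reduce_values(key, values):
    """Reduce step: sum all click counts emitted for one country."""
    return key, sum(values)


def map_reduce(rows):
    grouped = defaultdict(list)  # shuffle phase: group values by key
    for row_key, row in rows.items():
        for key, value in map_row(row_key, row):
            grouped[key].append(value)
    return dict(reduce_values(key, values) for key, values in grouped.items())


clicks_by_country = map_reduce(rows)
```

The store itself only answers lookups by row key. An aggregate question such as this one is answered by scanning the rows in a batch job.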

Vendor

  • Google Cloud Bigtable
  • Apache Cassandra
  • Apache HBase

Document stores

  • A document can be XML, JSON, YAML, or an office file (LibreOffice, Microsoft 365)
  • Documents are stored and retrieved in a schemaless fashion
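
As a rough illustration of schemaless storage and retrieval, the sketch below keeps JSON documents under an id and finds them by a top-level field. The `DocumentStore` class and its `save`, `load`, and `find` methods are hypothetical names for this example, not the API of any document database.

```python
import json


class DocumentStore:
    """Toy document store: ids map to serialized JSON text."""

    def __init__(self):
        self._docs = {}

    def save(self, doc_id, document):
        # No schema check: any JSON-serializable shape is accepted.
        self._docs[doc_id] = json.dumps(document)

    def load(self, doc_id):
        return json.loads(self._docs[doc_id])

    def find(self, field, value):
        """Return ids of documents whose top-level field equals value."""
        return [
            doc_id
            for doc_id, raw in self._docs.items()
            if json.loads(raw).get(field) == value
        ]


store = DocumentStore()
store.save("a", {"type": "invoice", "total": 10})
store.save("b", {"type": "note", "text": "hi"})
invoice_ids = store.find("type", "invoice")
```

The two documents share only the `type` field, and the store accepts both without any prior declaration of their structure.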

Vendor
