Dynamo: Amazon's Highly Available Key-value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels

What is the Problem?

Amazon runs a world-wide e-commerce platform using tens of thousands of servers located in many data centers around the world. Reliability is one of the most important requirements because even the slightest outage has significant financial consequences and impacts customer trust. Besides, the platform has to be highly scalable in order to support continuous growth.

Summary

This paper describes the design and implementation of Dynamo, a highly available and scalable distributed data store, which is used to manage the state of services that have very high reliability requirements and need tight control over the tradeoffs between availability, consistency, cost-effectiveness, and performance. Dynamo even allows read and write operations to continue during network partitions and resolves update conflicts using different conflict resolution mechanisms. When compared to traditional database systems, Dynamo sacrifice some features such as richness of schema representation and strong consistency in favor of higher performance and availability at massive scales.

Key Insights

Data fetched are not guaranteed to be up-to-date, but updates are guaranteed to be propagated to all nodes eventually.

Notable Design Details/Strengths

Dynamo employs a mechanism to dynamically partition the data over the set of nodes in the system, which makes incrementally scale possible.
To further achieve high availability and durability, Dynamo replicates the data on multiple hosts.

Limitations/Weaknesses

Dynamo exposes data consistency and reconciliation logic issues to the developers. For any new applications that want to use Dynamo, some analysis is required during the initial stages of the development to pick the right conflict resolution mechanisms that meet the business case appropriately, which requires extra efforts.
A write operation in Dynamo also requires a read to be performed for managing the vector timestamps. This can be very limiting in environments where systems need to handle a very high write throughput.

Summary of Key Results

This paper is built and evaluated based on a realistic scenario, which further proves that Dynamo suits Amazon's large-scale scenario very well.
The object buffer maintained by each storage node results in lowering the 99.9^(th) percentile latency by a factor of 5 during peak traffic even for a very small buffer of a thousand objects.

Open Questions

How can we balance the tradeoff between the network communication and the promising performance of the large-scale clusters? The principal bottleneck in large-scale clusters is often inter-node communication bandwidth. Many applications must exchange information with remote nodes to proceed with their local computation. Internet services increasingly employ service-oriented architectures (e.g., Dynamo), where the retrieval of a single web page can require coordination and communication with literally hundreds of individual sub-services running on remote nodes.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Dynamo: Amazon's Highly Available Key-value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels

What is the Problem?

Summary

Key Insights

Notable Design Details/Strengths

Limitations/Weaknesses

Summary of Key Results

Open Questions

FilesExpand file tree

10.11.md

Latest commit

History

10.11.md

File metadata and controls

Dynamo: Amazon's Highly Available Key-value Store

Giuseppe DeCandia, Deniz Hastorun, Madan Jampani, Gunavardhan Kakulapati, Avinash Lakshman, Alex Pilchin, Swaminathan Sivasubramanian, Peter Vosshall, and Werner Vogels

What is the Problem?

Summary

Key Insights

Notable Design Details/Strengths

Limitations/Weaknesses

Summary of Key Results

Open Questions