Cassandra
Paper: Cassandra

Cassandra / Distributed Wide Column NoSQL Database

Goal
Design a distributed and scalable system that can store a huge amount of semi-structured data, indexed by a row key, where each row can have an unbounded number of columns.

Background
Open-source Apache project, originally developed at Facebook in 2007 for the Inbox Search feature.
Designed to provide scalability, availability, and reliability for storing large amounts of data.
Combines the distributed nature of Amazon's Dynamo (a key-value store) with the data model of Google's BigTable (a column-based store).
Decentralized architecture with no single point of failure (SPOF); performance can scale linearly with the addition of nodes.

What is Cassandra?
Cassandra is typically classified as an AP (i.e., Available and Partition Tolerant) system, which means that availability and partition tolerance are generally considered more important than consistency.
Eventually consistent: similar to Dynamo, Cassandra can be tuned through its replication factor and consistency levels to meet strong consistency requirements, but this comes at a performance cost.
Uses a peer-to-peer architecture in which each node communicates with all other nodes.

Cassandra Use Cases
Any application for which eventual consistency is acceptable can use Cassandra.
Cassandra is optimized for high-throughput writes.
Can be used to collect big data for real-time analysis.
Storing key-value data with high availability (Reddit/Digg), thanks to linear scaling without downtime.
Time-series data model.
Write-heavy applications.
NoSQL.

High Level Architecture

Agenda
Cassandra Common Terms
High Level Architecture

Cassandra Common Terms
Column: A key-value pair. The most basic unit of data structure in Cassandra. ...
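
To make the terms above concrete, here is a minimal sketch of the data model as these notes describe it: a column is a key-value pair, and a row is found by its row key and can hold any number of columns. The names `Column`, `Row`, and `table` are hypothetical and chosen only for illustration. They are not part of Cassandra's API, and the sketch ignores anything a real column stores beyond the pair itself.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class Column:
    """A column modeled as a plain key-value pair (hypothetical type)."""
    name: str
    value: str


@dataclass
class Row:
    """A row identified by its row key, holding any number of columns."""
    row_key: str
    columns: Dict[str, Column] = field(default_factory=dict)

    def put(self, name: str, value: str) -> None:
        # No fixed schema here: any row may gain any column at any time.
        self.columns[name] = Column(name, value)

    def get(self, name: str) -> str:
        return self.columns[name].value


# A toy "table": row key -> row. Different rows may carry different columns.
table: Dict[str, Row] = {}

alice = table.setdefault("user:alice", Row("user:alice"))
alice.put("email", "alice@example.com")
alice.put("city", "Lisbon")

bob = table.setdefault("user:bob", Row("user:bob"))
bob.put("email", "bob@example.com")
```

In this sketch the two rows share a table but not a column set, which is the semi-structured property named in the Goal section.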