Intro To Cassandra
I am currently reading Designing Data-Intensive Applications and am about to do a mini series on some of the technologies I come across in the book, along with some of their pros and cons.
What is Cassandra?
Cassandra is an LSM-tree based NoSQL database that is a good choice when there is a large amount of data and consistency is not a priority. Cassandra is fully distributed and boasts that there is no single point of failure. It specialises in high performance and is horizontally scalable.
Thanks to its distribution model being p2p, it can easily distribute data across multiple data centers and cloud availability zones.
A hashing mechanism known as the "partitioner" is used to take a table row's primary key, compute a numerical token for it and assign it to one of the nodes in a cluster.
According to a blog post over at Rackspace, "While Cassandra has multiple partitioners from which to choose, the default partitioner randomizes data across a cluster and ensures an even distribution of all of the data. In addition, Cassandra automatically maintains the balance of data across a cluster even when existing nodes are removed or new nodes are added to a system."
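To make the partitioner idea a bit more concrete, here is a minimal Python sketch of token-based placement. It is a simplification under stated assumptions: MD5 stands in for the hash (Cassandra's default partitioner uses Murmur3), each node gets exactly one evenly spaced token, and there are no virtual nodes or replicas. The node names and the `token_for`, `build_ring` and `node_for` helpers are made up for illustration.

```python
import bisect
import hashlib

# Size of the token space in this sketch (an assumption, not Cassandra's Murmur3 range).
RING_SIZE = 2 ** 127


def token_for(partition_key: str) -> int:
    """Hash a partition key to a numerical token (MD5 as a stand-in hash)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % RING_SIZE


def build_ring(nodes):
    """Give each node one evenly spaced token; the list is sorted by token."""
    step = RING_SIZE // len(nodes)
    return [(i * step, node) for i, node in enumerate(nodes)]


def node_for(ring, partition_key):
    """Pick the first node whose token is >= the key's token, wrapping around."""
    tokens = [token for token, _ in ring]
    index = bisect.bisect_left(tokens, token_for(partition_key)) % len(ring)
    return ring[index][1]


ring = build_ring(["node-a", "node-b", "node-c"])  # hypothetical node names
print(node_for(ring, "user:42"))
```

Because the hash spreads keys roughly uniformly over the token space, each node ends up owning a similar share of the rows, which is the "even distribution" the quote refers to.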
In relation to terminology, the main point of difference when compared to an Oracle Database is that a Database/Schema is referred to in Cassandra as a keyspace.
Cassandra also boasts a few important features:
- Rich data model
- Dynamic schema
- Typed data
- Data locality
- Field updates
- Easy for programmers
Cassandra Query Language
Cassandra uses the Cassandra Query Language (CQL) which runs through the Cassandra shell (cqlsh).
You can actually run a hybrid deployment of Cassandra and an Oracle Database; whether that makes sense usually comes down to the company's needs.
Cassandra itself, given its lack of emphasis on consistency, offers the AID part of ACID (atomicity, isolation and durability).
Some of the downsides to Cassandra include its lack of aggregation functionality, its lack of table joins (therefore requiring de-normalisation before insertion) and search being limited to keys and indexes.
Playing around with Cassandra
This section requires Docker to be installed on your local machine.
This intro follows the initial post at https://medium.com/@michaeljpr/five-minute-guide-getting-started-with-cassandra-on-docker-4ef69c710d84 - be sure to support them.
Additional calls that are useful
A cutdown version taken from the blog post referenced above:
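The snippet below is not the referenced post's exact listing. It is a minimal sketch of the container lifecycle, driven from Python so it can sit in the same notebook as the rest of the examples. It assumes the official `cassandra` image from Docker Hub and a hypothetical container name, `cassandra-intro`. The helper names are made up for illustration.

```python
import subprocess

CONTAINER = "cassandra-intro"  # hypothetical container name
IMAGE = "cassandra"            # official image on Docker Hub


def start_cmd():
    """Run the container detached and publish the CQL port 9042 to the host."""
    return ["docker", "run", "--name", CONTAINER, "-d", "-p", "9042:9042", IMAGE]


def cqlsh_cmd():
    """Open an interactive cqlsh shell inside the running container."""
    return ["docker", "exec", "-it", CONTAINER, "cqlsh"]


def stop_cmds():
    """Stop the container, then remove it."""
    return [["docker", "stop", CONTAINER], ["docker", "rm", CONTAINER]]


def run(cmd):
    """Echo a command and execute it, raising if it exits with an error."""
    print("$", " ".join(cmd))
    subprocess.run(cmd, check=True)
```

Calling `run(start_cmd())` starts the node. Cassandra can take a little while to finish booting, so `run(cqlsh_cmd())` may fail until it is ready. The same commands can of course be typed straight into a terminal.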
Great! We can then run a notebook to play around with Cassandra.
Basic Cassandra Calls
Once you log into the cqlsh shell, you can start playing around.
Creating a Keyspace
The keyspace (the equivalent of a database in an RDBMS) holds data objects and is the level where you specify options for the data partitioning and replication strategy.
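As a rough illustration, creating a keyspace might look like the sketch below. The keyspace name `intro_demo` is hypothetical, and `SimpleStrategy` with a replication factor of 1 is an assumption that only suits a single-node playground. The CQL string can be pasted into cqlsh as-is (with a trailing semicolon). The Python wrapper assumes `session` is any object with an `execute(cql)` method, such as a session from the DataStax Python driver.

```python
KEYSPACE = "intro_demo"  # hypothetical keyspace name

# Replication settings here are an assumption for a one-node local cluster.
CREATE_KEYSPACE = (
    f"CREATE KEYSPACE IF NOT EXISTS {KEYSPACE} "
    "WITH replication = {'class': 'SimpleStrategy', 'replication_factor': 1}"
)


def create_keyspace(session):
    """Create the keyspace; `session` only needs an execute(cql) method."""
    session.execute(CREATE_KEYSPACE)


# Possible usage with the DataStax driver (pip install cassandra-driver):
# from cassandra.cluster import Cluster
# session = Cluster(["127.0.0.1"], port=9042).connect()
# create_keyspace(session)
```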
Create a Table, Inserting Data, Updating Data and Querying Data
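A minimal sketch of these four steps follows, assuming a keyspace named `intro_demo` already exists. The `users` table, its columns and the sample row are all made up for illustration. Each CQL string can also be run directly in cqlsh. The `session` argument is assumed to be any object with an `execute(cql)` method, such as a DataStax Python driver session.

```python
TABLE = "intro_demo.users"  # hypothetical keyspace and table

STATEMENTS = [
    # Create a table keyed on user_id.
    f"CREATE TABLE IF NOT EXISTS {TABLE} "
    "(user_id int PRIMARY KEY, name text, city text)",
    # Insert a row.
    f"INSERT INTO {TABLE} (user_id, name, city) VALUES (1, 'Ada', 'London')",
    # Update one column of that row, addressed by its primary key.
    f"UPDATE {TABLE} SET city = 'Paris' WHERE user_id = 1",
]

# Query by primary key, which needs no extra index.
QUERY = f"SELECT user_id, name, city FROM {TABLE} WHERE user_id = 1"


def run_basics(session):
    """Run the create/insert/update statements, then return the queried rows."""
    for cql in STATEMENTS:
        session.execute(cql)
    return list(session.execute(QUERY))
```

Note that the query filters on the primary key, which is the only column this table can be filtered on out of the box.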
Querying columns other than the primary key
In order to do this, we need to generate an index on another column:
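A minimal sketch of that step follows, assuming a hypothetical `intro_demo.users` table with a `city` column. The index name `users_city_idx` is also made up. Without an index, Cassandra rejects a `WHERE` clause on a non-key column like this unless `ALLOW FILTERING` is added. As before, `session` is assumed to be any object with an `execute(cql)` method, and the CQL strings can be run in cqlsh directly.

```python
TABLE = "intro_demo.users"  # hypothetical keyspace and table

# Secondary index on a non-key column.
CREATE_INDEX = f"CREATE INDEX IF NOT EXISTS users_city_idx ON {TABLE} (city)"

# With the index in place, the non-key column can appear in WHERE.
QUERY_BY_CITY = f"SELECT user_id, name, city FROM {TABLE} WHERE city = 'Paris'"


def query_by_city(session):
    """Create the index if needed, then query on the indexed column."""
    session.execute(CREATE_INDEX)
    return list(session.execute(QUERY_BY_CITY))
```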
This should give you a general intro to the basics of Cassandra! The Docker image gives you a quick way to spin up containers running Cassandra to play around with, stop and remove.