If you are reading this blog post then you must have used at least
one database in your career and wondering what's up with these new trend of
"NoSQL".
|
Before getting into NoSQL databases, let’s first look at the
history of RDBMS (Relational Data Base Management System) and its main
concepts.
|
RBDMS has a history of 40 years with many enhancements. It all
started with E.F. Codd and his 12 rules to make a database as RDBMS.
|
Beyond a lot of intended benefits, the relational model is
well-suited to client-server programming and today it is the predominant
technology for storing structured data in web and business applications.
Classical relational databases follow the ACID property. That is, a database
transaction must be Atomic, Consistent, Isolated and Durable. The details of
ACID are as follows:
|
ACID - Atomicity,
Consistency, Isolation, Durability
|
||||
Atomicity
|
Consistency
|
Isolation
|
Durability
|
|
Atomicity requires that each transaction be "all or
nothing": if one part of the transaction fails, then the entire
transaction fails, and the database state is left unchanged. An atomic system
must guarantee atomicity in each and every situation, including power
failures, errors and crashes. To the outside world, a committed transaction
appears (by its effects on the database) to be indivisible
("atomic"), and an aborted transaction does not happen.
|
The consistency property
ensures that any transaction will bring the database from one valid state to
another. Any data written to the database must be valid according to all
defined rules, including constraints, cascades, triggers, and
any combination thereof. This does not guarantee correctness of the
transaction in all ways the application programmer might have wanted (that is
the responsibility of application-level code), but merely that any
programming errors cannot result in the violation of any defined rules.
|
The isolation property
ensures that the concurrent execution of transactions results in a system
state that would be obtained if transactions were executed sequentially,
i.e., one after the other. Providing isolation is the main goal
of concurrency control. Depending on the concurrency control method
(i.e., if it uses strict - as opposed to relaxed -
serializability), the effects of an incomplete transaction might not even be
visible to another transaction.
|
The durability property
ensures that once a transaction has been committed, it will remain so, even
in the event of power loss, crashes, or errors. In a relational
database, for instance, once a group of SQL statements execute, the results
need to be stored permanently (even if the database crashes immediately
thereafter). To defend against power loss, transactions (or their effects)
must be recorded in a non-volatile memory.
|
Apart from these ACID properties, there are some basic characteristics due to which Relational DBMS become popular. Some of them are:
|
||||
·
Data is stored in a set
of Tables or data is stored in the format of row and column in a table.
|
||||
·
Relationships are
represented by data.
|
||||
·
Tables are joined by
relational links.
|
||||
·
Reduced duplication of
data in database can be achieved by normalization.
|
||||
·
They allow greater
flexibility and efficiency
|
||||
Shortcomings of RDBMS
|
||||
RDBMS is sufficient to store and manipulate all the structured
data efficiently but in today’s world the velocity and nature of data
used/generated over the Internet is growing exponentially. As we can often
see in areas like social media, the data used has no specific structure
boundary. This makes unavoidable the need to handle unstructured data which
is non-relational and schema-less in nature. For RDBMS it becomes a real
challenge to provide the cost effective and fast Create, Read, Update and
Delete (CRUD) operation as it has to deal with the overhead of joins and
maintaining relationships amongst various data.
|
||||
Therefore a new mechanism is required to deal with such data in
an easy and efficient way. This is where NoSQL comes into the picture to
handle unstructured BIG data in an efficient way to provide maximum business value
and customer satisfaction.
|
||||
NoSQL
|
||||
NoSQL is not a campaign against the SQL language. NoSQL stands
for “Not Only SQL.” It provides more possibilities beyond the classic
relational approach of data persistence to the developers.
|
||||
NoSQL refers to a broad class of non-relational databases that
differ from classical RDBMS in some significant aspects, most notably because
they do not use SQL as their primary query language, instead providing access
by means of Application Programming Interfaces (APIs).
|
||||
The reason behind such a big switch or in other words the
advantages of NoSQL are the following:
|
||||
·
High scalability
|
||||
·
Distributed Computing
|
||||
·
Lower cost
|
||||
·
Schema flexibility
|
||||
·
Un/semi-structured data
|
||||
·
No complex relationships
|
||||
As RDBMS follows the ACID property, NoSQL databases are “BASE”
Systems. The BASE acronym was defined by Eric Brewer, who is also known for
formulating the CAP theorem whose properties are used by BASE System.
|
A BASE system gives up on consistency so as to have
greater Availability and Partition tolerance. A BASE can be defined as
following:
|
||||
Basically Available indicates that the system does
guarantee availability.
|
||||
Soft state indicates that the state of the system may change
over time, even without input. This is because of the eventual consistency
model.
|
||||
Eventual consistency indicates that the system will become
consistent over time, given that the system doesn’t receive input during that
time.
|
||||
With that said here is the simple pictorial representation of
above mentioned concepts and which one to choose from the given business
needs.
|
||||
Types of NoSQL:
|
||||
As of today there are four types of NoSQL Databases available:
|
||||
Key-Value: Data is stored in an
array of Key-Value pair. It has a Big Hash Table of keys & values. It is
Fast scalable but cannot handle complex data.
|
||||
Columnar: Unlike RDBMS, Each
storage block contains data from only one column. Fast, scalable,
denormalized, map reduce, good for series, not efficient for complex data.
|
||||
Document Database: It stores documents made
up of tagged elements. A Good operational system to store XML, JSON object
related content. Not Analytical and moderately scalable. Its Object Oriented
Database
|
||||
Graph: A network database that
uses edges and nodes to represent and store data. Great for complex data,
transactional. Less scalable.
|
||||
Detailed Description:
|
||||
Key-Value:
|
||||
These type of databases work by matching keys with values,
similar to a dictionary. There is no structure nor relation. After connecting
to the database server (e.g. Redis), an application can state a key (e.g.
the_answer_to_life) and provide a matching value (e.g. 42) which can later be
retrieved the same way by supplying the key.
|
||||
Key / value DBMSs are usually used for quickly storing basic
information, and sometimes not-so-basic ones after performing, for example, a
CPU and memory intensive computation. They are extremely performant,
efficient and usually easily scalable.
|
||||
Columnar:
|
||||
Column based NoSQL database management systems work by advancing
the simple nature of key / value based ones.
|
||||
Despite their complicated-to-understand image on the internet,
these databases work very simply by creating collections of one or more key /
value pairs that match a record.
|
||||
Unlike the traditional defines schemas of relational databases,
column-based NoSQL solutions do not require a pre-structured table to work
with the data. Each record comes with one or more columns containing the
information and each column of each record can be different.
|
||||
Basically, column-based NoSQL databases are two dimensional
arrays whereby each key (i.e. row / record) has one or more key / value pairs
attached to it and these management systems allow very large and
un-structured data to be kept and used (e.g. a record with tons of
information).
|
||||
Document Database:
|
||||
Document based NoSQL database management systems can be
considered the latest craze that managed to take a lot of people by storm.
These DBMS work in a similar fashion to column-based ones; however, they
allow much deeper nesting and complex structures to be achieved (e.g. a
document, a document within a document).
|
||||
Documents overcome the constraints of one or two level of key /
value nesting of columnar databases. Basically, any complex and arbitrary
structure can form a document, which can be stored using these management
systems.
|
||||
Despite their powerful nature, and the ability to query records
by individual keys, document based management systems have their own issues
and downfalls compared to others. For example, retrieving a value of a record
means getting the whole lot of it and same goes for updates, all of which
affect the performance.
|
||||
Graph:
|
||||
Finally, the very interesting flavor of NoSQL database
management systems is the graph based ones.
|
||||
The graph based DBMS models represent the data in a completely
different way than the previous three models. They use tree-like structures
(i.e. graphs) with nodes and edges connecting each other through relations.
|
||||
Similarly to mathematics, certain operations are much simpler to
perform using these type of models thanks to their nature of linking and
grouping related pieces of information (e.g. connected people).
|
||||
These databases are commonly used by applications whereby clear
boundaries for connections are necessary to establish. For example, when you
register to a social network of any sort, your friends' connection to you and
their friends' friends' relation to you are much easier to work with using
graph-based database management systems.
|
NoSQL Advantages
|
NoSQL Disadvantages
|
|||
High Scalability
|
Too many options (> 250), which one to pick.
|
|||
Schema Flexibility
|
Limited query capabilities (so far)
|
|||
Distributed Computing (Reliability, Scalability, Sharing of
Resources, Speed)
|
Eventual consistency is not intuitive to program for strict
scenarios like banking applications.
|
|||
No complicated relationships
|
Lacks Joins, Group by, Order by facilities
|
|||
Lower cost (Hardware Costs)
|
ACID transactions
|
|||
Open Source – All of the NoSQL options with the exceptions of
Amazon S3 (Amazon Dynamo) are open-source solutions. This provides a low-cost
entry point.
|
Limited or no guarantee of support – Open source
|
Feature
|
NoSQL
|
RDBMS
|
Data Volume
|
Handles Huge Data Volumes
|
Handles Limited Data Volumes
|
Data Validity
|
Highly Guaranteed
|
Less Guaranteed
|
Scalability
|
Horizontally
|
Horizontally & Vertically
|
Query Language
|
No declarative query language
|
Structured Query Language (SQL)
|
Schema
|
No predefined schema or less rigid schemas
|
Predefined Schema (Data Definition Language & Data
Manipulation Language)
|
Data Type
|
Supports unstructured and unpredictable data
|
Supports relational data and its relationships are stored in
separate tables
|
ACID/BASE
|
Based on BASE principle (Basically, Available, Soft State,
Eventually Consistent)
|
Based on ACID principle (Atomicity, Consistency, Isolation and
Durability)
|
Transaction Management
|
Weaker transactional guarantee
|
Strong transactional guarantees
|
Data Storage Technique
|
Schema-free collections are utilized to store different types and
document structures, such as {“color”, “blue”} and {“price”, “23.5”} can be
stored within a single collection.
|
No collections are used for data storage; instead use DML for
it.
|
Conclusion
|
|||
Here is another quick comparison between NoSQL and RDBMS:
|
|||
Opt NoSQL
|
|||
1. If data is huge,
unstructured, sparse/growing
|
|||
2. Less rigid schema
|
|||
3. Performance &
Availability preferred over Redundancy
|
|||
4. While scaling out
is an out-of-the-box feature, it does not prevent scale up,
|
|||
5. Cost Effective-
uses clusters of cheap commodity servers to manage the exploding data and
transaction volumes
|
|||
Opt RDBMS
|
|||
1. If Analytics, BI or
Reporting is required.
|
|||
2. For Benefits of
ACID
|
|||
3. Rigid Schema
|
|||
4. No redundancy
allowed
|
|||
5. Allows Scale up
& limited Scale-out (sharding)
|
|||
6. Expensive- rely on
expensive proprietary servers and storage systems
|
|||
There are more than 250+ NoSQL Products available in the market
today. Below are few of them with their characteristics.
DB
Name/Type
|
Year
Started
|
Document
Database
|
Columner
Database
|
Graph
Database
|
Key-Value
|
Multi-Model
|
Dead
|
ACID
|
MongoDB
|
2009
Feb
|
Y
|
||||||
CouchDB
|
2010
July
|
Y
|
||||||
ArangoDB
|
2011
|
Y
|
Y
|
Y
|
Y
|
|||
Cloud Datastore (Google)
|
2013
May
|
Y
|
Y
|
|||||
DocumentDB (Microsoft Azure)
|
2015
|
Y
|
Y
|
|||||
RethinkDB
|
2009
June
|
Y
|
||||||
SequoiaDB
|
2013
|
Y
|
||||||
NosDB (for .NET)
|
2016
July
|
Y
|
||||||
RavenDB (for .NET)
|
2010
April
|
Y
|
||||||
MarkLogic
|
2001
|
Y
|
Y
|
Y
|
||||
Clusterpoint Database
|
2006
|
Y
|
Y
|
Y
|
||||
Terrastore
|
2009
|
Y
|
Y !!!
|
|||||
JasDB
|
2013
June
|
Y
|
||||||
RaptorDB (for .NET)
|
2012
April
|
Y
|
||||||
EJDB
|
2012
Nov
|
Y
|
||||||
Cassandra
|
2010
Apr
|
Y
|
Y
|
|||||
Scylla
|
2015
Sep
|
Y
|
||||||
Hbase
|
2008
!!!
|
Y
|
||||||
BigTable (Google)
|
2005
Feb
|
Y
|
||||||
Accumulo
|
2012
March
|
Y
|
Y
|
Y
|
||||
MonetDB
|
2016
Dec
|
Y
|
||||||
Riak
|
2009
Aug
|
Y
|
||||||
Redis
|
2009
May
|
Y
|
||||||
Dynamo (Amazon)
|
2013
|
Y
|
||||||
MemcacheDB
|
2008
Dec
|
Y
|
||||||
Neo4j
|
2010
Feb
|
Y
|
||||||
OrientDB
|
2010
|
Y
|
Y
|
|||||
FlockDB
|
2010
April
|
Y
|
||||||
InfoGrid
|
2010 !!
|
Y
|
||||||
InfiniteGraph
|
2010
|
Y
|
||||||
gunDB
|
2015
Jan
|
Y
|
Y
|
Y
|
||||
LucidDB
|
Y
|
|||||||
Hypertable
|
2015
Dec
|
Y
|