Thursday, June 21, 2018

AWS (Amazon Web Services) Product Listing - Data(base) Related: as of June 2018

There is nothing much to blog about in this post as there is already way too much information listed below. 

I thought of putting this together after spending a decent amount of time reviewing AWS (Amazon Web Services) products and what they have to offer.

The listing below covers only the products related to the Data(base) area, as that is the only area I am focusing on for now.

There are tons of products available in AWS if one wants a wealth of information.

This list is as of June 2018 and I do not think I can keep it up to date, as the pace at which AWS is growing is uncontrollable :)

AWS Product List for Data(base) Related
AWS Compute
Amazon EC2
(Elastic Compute Cloud)
Virtual Servers in the Cloud
Amazon Elastic Compute Cloud (Amazon EC2) is a web service that provides secure, resizable compute capacity in the cloud. It is designed to make web-scale cloud computing easier for developers.

Amazon EC2’s simple web service interface allows you to obtain and configure capacity with minimal friction. It provides you with complete control of your computing resources and lets you run on Amazon’s proven computing environment. Amazon EC2 reduces the time required to obtain and boot new server instances to minutes, allowing you to quickly scale capacity, both up and down, as your computing requirements change. Amazon EC2 changes the economics of computing by allowing you to pay only for capacity that you actually use. Amazon EC2 provides developers the tools to build failure resilient applications and isolate them from common failure scenarios.
AWS Lambda
Run your code in response to events
AWS Lambda lets you run code without provisioning or managing servers. You pay only for the compute time you consume - there is no charge when your code is not running.

With Lambda, you can run code for virtually any type of application or backend service - all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability. You can set up your code to automatically trigger from other AWS services or call it directly from any web or mobile app.
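To make "just upload your code" a little more concrete, here is a minimal sketch of what a Python function for Lambda can look like: an ordinary function that receives an event and a context object. The handler name `lambda_handler`, the `name` key in the event, and the shape of the response are assumptions for illustration only; real events depend on the service that triggers the function.

```python
import json


def lambda_handler(event, context):
    """Entry point that Lambda would invoke once per event.

    `event` is assumed here to be a dict with an optional "name" key;
    `context` carries runtime metadata and is unused in this sketch.
    """
    name = event.get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}!"}),
    }


if __name__ == "__main__":
    # Local smoke test: call the handler directly with a hand-made event.
    print(lambda_handler({"name": "AWS"}, None))
```

Because the handler is a plain function, it can be exercised locally like this before it is ever uploaded.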
Amazon ECS
(Elastic Container Service)
Run containerized applications in production
Amazon Elastic Container Service (Amazon ECS) is a highly scalable, high-performance container orchestration service that supports Docker containers and allows you to easily run and scale containerized applications on AWS. Amazon ECS eliminates the need for you to install and operate your own container orchestration software, manage and scale a cluster of virtual machines, or schedule containers on those virtual machines.

With simple API calls, you can launch and stop Docker-enabled applications, query the complete state of your application, and access many familiar features such as IAM roles, security groups, load balancers, Amazon CloudWatch Events, AWS CloudFormation templates, and AWS CloudTrail logs.
Amazon EKS
(Elastic Container Service for Kubernetes)
Highly available, scalable, and secure Kubernetes service
Amazon Elastic Container Service for Kubernetes (Amazon EKS) makes it easy to deploy, manage, and scale containerized applications using Kubernetes on AWS.

Amazon EKS runs the Kubernetes management infrastructure for you across multiple AWS availability zones to eliminate a single point of failure. Amazon EKS is certified Kubernetes conformant so you can use existing tooling and plugins from partners and the Kubernetes community. Applications running on any standard Kubernetes environment are fully compatible and can be easily migrated to Amazon EKS.

Amazon EKS is generally available for all AWS customers.
AWS Storage Offering
Amazon S3
Object storage built to store and retrieve any amount of data from anywhere
Companies today need the ability to simply and securely collect, store, and analyze their data at a massive scale.
Amazon S3 is object storage built to store and retrieve any amount of data from anywhere – web sites and mobile apps, corporate applications, and data from IoT sensors or devices. It is designed to deliver 99.999999999% durability, and stores data for millions of applications used by market leaders in every industry.
S3 provides comprehensive security and compliance capabilities that meet even the most stringent regulatory requirements. It gives customers flexibility in the way they manage data for cost optimization, access control, and compliance. S3 provides query-in-place functionality, allowing you to run powerful analytics directly on your data at rest in S3. And Amazon S3 is the most supported cloud storage service available, with integration from the largest community of third-party solutions, systems integrator partners, and other AWS services.
Amazon EBS
(Elastic Block Store)
Persistent block storage for Amazon EC2
Amazon Elastic Block Store (Amazon EBS) provides persistent block storage volumes for use with Amazon EC2 instances in the AWS Cloud. Each Amazon EBS volume is automatically replicated within its Availability Zone to protect you from component failure, offering high availability and durability. Amazon EBS volumes offer the consistent and low-latency performance needed to run your workloads. With Amazon EBS, you can scale your usage up or down within minutes – all while paying a low price for only what you provision.

Amazon EBS is designed for application workloads that benefit from fine tuning for performance, cost and capacity. Typical use cases include Big Data analytics engines (like the Hadoop/HDFS ecosystem and Amazon EMR clusters), relational and NoSQL databases (like Microsoft SQL Server and MySQL or Cassandra and MongoDB), stream and log processing applications (like Kafka and Splunk), and data warehousing applications (like Vertica and Teradata).

AWS offers two flavours of networked block storage as a service with EBS (Elastic Block Store):
1] Standard
2] Provisioned IOPS (Input/Output Operations per Second)
The two offerings differ primarily in performance, so the choice affects the speed of any application hosted on AWS that uses EBS for storage.
Amazon EFS
(Elastic File System)
Scalable, reliable, and elastic file storage for the AWS Cloud
Amazon Elastic File System (Amazon EFS) provides simple, scalable, elastic file storage for use with AWS Cloud services and on-premises resources. It is easy to use and offers a simple interface that allows you to create and configure file systems quickly and easily. Amazon EFS is built to elastically scale on demand without disrupting applications, growing and shrinking automatically as you add and remove files, so your applications have the storage they need, when they need it.

Amazon EFS is a regional service designed for high availability and durability supporting a broad spectrum of use cases, including web serving and content management, enterprise applications, media and entertainment processing workflows, home directories, database backups, developer tools, container storage, and big data analytics workloads.

You can mount your Amazon EFS file systems on your on-premises datacenter servers when connected to your Amazon VPC with AWS Direct Connect. You can mount your EFS file systems on on-premises servers to migrate data sets to EFS, enable cloud bursting scenarios, or backup your on-premises data to EFS.
Amazon Glacier
Long-term, secure, durable object storage for data archiving
Amazon Glacier is a secure, durable, and extremely low-cost cloud storage service for data archiving and long-term backup. It is designed to deliver 99.999999999% durability, and provides comprehensive security and compliance capabilities that can help meet even the most stringent regulatory requirements. Amazon Glacier provides query-in-place functionality, allowing you to run powerful analytics directly on your archive data at rest. Customers can store data for as little as $0.004 per gigabyte per month, a significant savings compared to on-premises solutions. To keep costs low yet suitable for varying retrieval needs, Amazon Glacier provides three options for access to archives, from a few minutes to several hours.
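The quoted price is easier to judge with a quick calculation. The sketch below multiplies the $0.004 per gigabyte per month figure from the text by a hypothetical archive size and retention period. It covers storage only; retrieval and request fees are deliberately ignored, and the 10 TB over 12 months scenario is an assumption.

```python
from decimal import Decimal

# Storage price quoted in the text above (USD per GB per month, as of June 2018).
PRICE_PER_GB_MONTH = Decimal("0.004")


def storage_cost(gigabytes, months):
    """Return the storage-only cost in USD; retrieval and request fees are ignored."""
    return PRICE_PER_GB_MONTH * gigabytes * months


if __name__ == "__main__":
    # Hypothetical archive: 10 TB (taken as 10,000 GB) kept for one year.
    print(storage_cost(10_000, 12))
```

For that hypothetical archive the storage bill works out to 480 USD for the year.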
AWS Storage Gateway
Hybrid cloud storage with seamless local integration and optimized data transfer
AWS Storage Gateway is a hybrid storage service that enables your on-premises applications to seamlessly use AWS cloud storage. You can use the service for backup and archiving, disaster recovery, cloud bursting, storage tiering, and migration. Your applications connect to the service through a gateway appliance using standard storage protocols, such as NFS and iSCSI. The gateway connects to AWS storage services, such as Amazon S3, Amazon Glacier, and Amazon EBS, providing storage for files, volumes, and virtual tapes in AWS. The service includes a highly-optimized data transfer mechanism, with bandwidth management, automated network resilience, and efficient data transfer, along with a local cache for low-latency on-premises access to your most active data.
Amazon Snowball
Migrate or transport petabyte-scale data sets into and out of AWS
Snowball is a petabyte-scale data transport solution that uses devices designed to be secure to transfer large amounts of data into and out of the AWS Cloud. Using Snowball addresses common challenges with large-scale data transfers including high network costs, long transfer times, and security concerns. Customers today use Snowball to migrate analytics data, genomics data, video libraries, image repositories, backups, and to archive part of data center shutdowns, tape replacement or application migration projects. Transferring data with Snowball is simple, fast, more secure, and can be as little as one-fifth the cost of transferring data via high-speed Internet.

With Snowball, you don’t need to write any code or purchase any hardware to transfer your data. Simply create a job in the AWS Management Console ("Console") and a Snowball device will be automatically shipped to you. Once it arrives, attach the device to your local network, download and run the Snowball Client ("Client") to establish a connection, and then use the Client to select the file directories that you want to transfer to the device. The Client will then encrypt and transfer the files to the device at high speed. Once the transfer is complete and the device is ready to be returned, the E Ink shipping label will automatically update and you can track the job status via Amazon Simple Notification Service (SNS), text messages, or directly in the Console.
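The "long transfer times" that Snowball is meant to avoid can be estimated with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not an AWS tool: the 100 TB data set, the 1 Gbps link, and the utilization factor are all assumed values, and protocol overhead is ignored.

```python
SECONDS_PER_DAY = 86_400


def transfer_days(terabytes, link_gbps, utilization=1.0):
    """Days needed to push `terabytes` (decimal TB) over a network link.

    `link_gbps` is the nominal link speed in gigabits per second and
    `utilization` is the fraction of it actually available for the transfer.
    """
    bits = terabytes * 1e12 * 8
    bits_per_second = link_gbps * 1e9 * utilization
    return bits / bits_per_second / SECONDS_PER_DAY


if __name__ == "__main__":
    # Hypothetical: 100 TB over a fully dedicated 1 Gbps line, then at 50% utilization.
    print(round(transfer_days(100, 1.0), 1))
    print(round(transfer_days(100, 1.0, utilization=0.5), 1))
```

Even with the whole link dedicated to the job, the transfer takes more than nine days, which is the kind of situation where shipping a device starts to look attractive.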
AWS Snowball Edge
Petabyte-scale data transport with on-board storage and compute
AWS Snowball Edge is a 100TB data transfer device with on-board storage and compute capabilities. You can use Snowball Edge to move large amounts of data into and out of AWS, as a temporary storage tier for large local datasets, or to support local workloads in remote or offline locations.

Snowball Edge connects to your existing applications and infrastructure using standard storage interfaces, streamlining the data transfer process and minimizing setup and integration. Snowball Edge can cluster together to form a local storage tier and process your data on-premises, helping ensure your applications continue to run even when they are not able to access the cloud.
AWS Snowmobile
Migrate or transport exabyte-scale data sets into and out of AWS
AWS Snowmobile is an Exabyte-scale data transfer service used to move extremely large amounts of data to AWS. You can transfer up to 100PB per Snowmobile, a 45-foot long ruggedized shipping container, pulled by a semi-trailer truck. Snowmobile makes it easy to move massive volumes of data to the cloud, including video libraries, image repositories, or even a complete data center migration. Transferring data with Snowmobile is more secure, fast and cost effective.

After an initial assessment, a Snowmobile will be transported to your data center and AWS personnel will configure it for you so it can be accessed as a network storage target. When your Snowmobile is on site, AWS personnel will work with your team to connect a removable, high-speed network switch from Snowmobile to your local network and you can begin your high-speed data transfer from any number of sources within your data center to the Snowmobile. After your data is loaded, Snowmobile is driven back to AWS where your data is imported into Amazon S3 or Amazon Glacier.

Snowmobile uses multiple layers of security to help protect your data including dedicated security personnel, GPS tracking, alarm monitoring, 24/7 video surveillance, and an optional escort security vehicle while in transit. All data is encrypted with 256-bit encryption keys you manage through the AWS Key Management Service (KMS) and designed for security and full chain-of-custody of your data.
AWS Databases Offering
Amazon RDS (Relational Database Service):
Amazon Aurora
Aurora is a MySQL- and PostgreSQL-compatible RDBMS built for the cloud that combines the performance and availability of high-end commercial databases with the simplicity and cost-effectiveness of open source databases.
Aurora is up to five times faster than standard MySQL and three times faster than standard PostgreSQL.
It features a distributed, fault-tolerant, self-healing storage system that auto-scales up to 64TB per database instance.
PostgreSQL
PostgreSQL is an open source RDBMS. An Amazon RDS for PostgreSQL database instance can be provisioned with either standard storage or Provisioned IOPS storage. Once provisioned, you can scale up to 16TB of storage and 40,000 IOPS.
It also enables you to scale out beyond the capacity of a single database deployment for read-heavy database workloads.
MySQL
The world's most popular open source RDBMS. Amazon RDS supports versions 5.5, 5.6, and 5.7 as of June 2018.
MariaDB
A popular open source RDBMS created by the original developers of MySQL. Amazon RDS supports MariaDB 10.2 as of June 2018.
Oracle
You can run Amazon RDS for Oracle under two different licensing models:
1] License Included
2] BYOL (Bring Your Own License)
Support for a given edition varies by licensing model:
Standard Edition One and Two: supports both licensing models
Standard Edition: BYOL
Enterprise Edition: BYOL
MS SQL Server
Amazon RDS for MS SQL Server supports the "License Included" licensing model, so you do not need a separately purchased license.
Maximum database storage size: 16TB (up from 4TB) as of June 2018.
Amazon RDS supports multiple versions of SQL Server (2008 R2, 2012, 2014, 2016, and 2017) in the Express, Web, Standard and Enterprise editions.
Not all editions may be available in all regions; see pricing for details.
Non-Relational Databases:
Amazon DynamoDB
(NoSQL Database)
Amazon DynamoDB is a fast and flexible nonrelational database service for all applications that need consistent, single-digit millisecond latency at any scale. It is a fully managed cloud database and supports both document and key-value store models. Its flexible data model, reliable performance, and automatic scaling of throughput capacity make it a great fit for mobile, web, gaming, ad tech, IoT, and many other applications.

Amazon DynamoDB Accelerator (DAX) is a fully managed, highly available, in-memory cache that can reduce DynamoDB response times from milliseconds to microseconds, even at millions of requests per second.
Amazon Redshift
Amazon Redshift is a fast, fully managed data warehouse that makes it simple and cost-effective to analyze all your data using standard SQL and your existing Business Intelligence (BI) tools. It allows you to run complex analytic queries against petabytes of structured data, using sophisticated query optimization, columnar storage on high-performance local disks, and massively parallel query execution. Most results come back in seconds.

Amazon Redshift also includes Redshift Spectrum, allowing you to directly run SQL queries against exabytes of unstructured data in Amazon S3. No loading or transformation is required, and you can use open data formats, including Avro, CSV, Grok, Ion, JSON, ORC, Parquet, RCFile, RegexSerDe, SequenceFile, TextFile, and TSV. Redshift Spectrum automatically scales query compute capacity based on the data being retrieved, so queries against Amazon S3 run fast, regardless of data set size.
Amazon ElastiCache
Amazon ElastiCache offers fully managed Redis and Memcached. Seamlessly deploy, operate, and scale popular open source compatible in-memory data stores. Build data-intensive apps or improve the performance of your existing apps by retrieving data from high throughput and low latency in-memory data stores. Amazon ElastiCache is a popular choice for Gaming, Ad-Tech, Financial Services, Healthcare, and IoT apps.

Redis is a fast, open source, in-memory key-value data store for use as a database, cache, message broker, and queue. It delivers sub-millisecond response times enabling millions of requests per second for real-time applications in Gaming, Ad-Tech, Financial Services, Healthcare, and IoT. Redis is a popular choice for caching, session management, real-time analytics, geospatial, chat/messaging, media streaming, and gaming leaderboards.

Memcached is a general-purpose distributed memory caching system. It is often used to speed up dynamic database-driven websites by caching data and objects in RAM to reduce the number of times an external data source (such as a database or API) must be read. Memcached is free and open-source software, licensed under the Revised BSD license.
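The caching idea described above (check memory first, and only go to the external source on a miss) can be sketched in a few lines. This is an illustration with a plain Python dict standing in for Memcached or Redis; `CachedReader` and `slow_lookup` are made-up names, and a real deployment would also need expiry and invalidation, which are left out here.

```python
class CachedReader:
    """Check the in-memory cache first, fall back to the slow source on a miss."""

    def __init__(self, load_from_source):
        self._load = load_from_source   # callable that hits the database or API
        self._cache = {}                # stand-in for Memcached or Redis
        self.source_reads = 0           # counts how often the slow path ran

    def get(self, key):
        if key in self._cache:
            return self._cache[key]     # cache hit: no external read
        value = self._load(key)         # cache miss: read the external source
        self.source_reads += 1
        self._cache[key] = value
        return value


def slow_lookup(user_id):
    # Hypothetical database query, replaced by a computed value here.
    return {"id": user_id, "name": f"user-{user_id}"}


if __name__ == "__main__":
    users = CachedReader(slow_lookup)
    users.get(7)
    users.get(7)
    print(users.source_reads)
```

The second `get(7)` is served from memory, so the external source is read only once.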
Amazon Neptune
Amazon Neptune is a fast, reliable, fully-managed graph database service that makes it easy to build and run applications that work with highly connected datasets. The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Amazon Neptune supports popular graph models Property Graph and W3C's RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as recommendation engines, fraud detection, knowledge graphs, drug discovery, and network security.
AWS Databases Migration Service Offering
AWS DMS
(Database Migration Service)
AWS Database Migration Service helps you migrate databases to AWS quickly and securely. The source database remains fully operational during the migration, minimizing downtime to applications that rely on the database. The AWS Database Migration Service can migrate your data to and from most widely used commercial and open-source databases.

The service supports homogeneous migrations such as Oracle to Oracle, as well as heterogeneous migrations between different database platforms, such as Oracle to Amazon Aurora or Microsoft SQL Server to MySQL. It also allows you to stream data to Amazon Redshift, Amazon DynamoDB, and Amazon S3 from any of the supported sources, which are Amazon Aurora, PostgreSQL, MySQL, MariaDB, Oracle Database, SAP ASE, SQL Server, IBM DB2 LUW, and MongoDB, enabling consolidation and easy analysis of data in a petabyte-scale data warehouse. AWS Database Migration Service can also be used for continuous data replication with high-availability.

When migrating databases to Aurora, Redshift or DynamoDB, you can use DMS free for six months. 
AWS Analytics
Amazon Athena
Amazon Athena is an interactive query service that makes it easy to analyze data in Amazon S3 using standard SQL. Athena is serverless, so there is no infrastructure to manage, and you pay only for the queries that you run.

Athena is easy to use. Simply point to your data in Amazon S3, define the schema, and start querying using standard SQL. Most results are delivered within seconds. With Athena, there’s no need for complex ETL jobs to prepare your data for analysis. This makes it easy for anyone with SQL skills to quickly analyze large-scale datasets.

Athena is out-of-the-box integrated with AWS Glue Data Catalog, allowing you to create a unified metadata repository across various services, crawl data sources to discover schemas and populate your Catalog with new and modified table and partition definitions, and maintain schema versioning. You can also use Glue’s fully-managed ETL capabilities to transform data or convert it into columnar formats to optimize cost and improve performance.
AWS Glue
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue generates the code to execute your data transformations and data loading processes.

AWS Glue generates code that is customizable, reusable, and portable. Once your ETL job is ready, you can schedule it to run on AWS Glue's fully managed, scale-out Apache Spark environment. AWS Glue provides a flexible scheduler with dependency resolution, job monitoring, and alerting.

AWS Glue is serverless, so there is no infrastructure to buy, set up, or manage. It automatically provisions the environment needed to complete the job, and customers pay only for the compute resources consumed while running ETL jobs. With AWS Glue, data can be available for analytics in minutes.
Amazon EMR
Amazon Elastic MapReduce (Amazon EMR) is a web service that enables businesses, researchers, data analysts, and developers to easily and cost-effectively process vast amounts of data.

Amazon EMR provides a managed Hadoop framework that makes it easy, fast, and cost-effective to process vast amounts of data across dynamically scalable Amazon EC2 instances. You can also run other popular distributed frameworks such as Apache Spark, HBase, Presto, and Flink in Amazon EMR, and interact with data in other AWS data stores such as Amazon S3 and Amazon DynamoDB.

Amazon EMR securely and reliably handles a broad set of big data use cases, including log analysis, web indexing, data transformations (ETL), machine learning, financial analysis, scientific simulation, and bioinformatics.
Amazon CloudSearch
Amazon CloudSearch manages all the server resources needed to build and deploy search indexes. All you have to do is upload your data to a search domain and start submitting requests.

Amazon CloudSearch is a managed service in the AWS Cloud that makes it simple and cost-effective to set up, manage, and scale a search solution for your website or application.
Amazon CloudSearch supports 34 languages and popular search features such as highlighting, autocomplete, and geospatial search.
Amazon Elasticsearch Service
Amazon Elasticsearch Service makes it easy to deploy, secure, operate, and scale Elasticsearch for log analytics, full text search, application monitoring, and more. Amazon Elasticsearch Service is a fully managed service that delivers Elasticsearch’s easy-to-use APIs and real-time analytics capabilities alongside the availability, scalability, and security that production workloads require. The service offers built-in integrations with Kibana, Logstash, and AWS services including Amazon Virtual Private Cloud (VPC), AWS Key Management Service (KMS), Amazon Kinesis Data Firehose, AWS Lambda, Amazon Cognito and Amazon CloudWatch so that you can go from raw data to actionable insights quickly and securely.
Amazon Kinesis
Amazon Kinesis makes it easy to collect, process, and analyze real-time, streaming data so you can get timely insights and react quickly to new information. Amazon Kinesis offers key capabilities to cost-effectively process streaming data at any scale, along with the flexibility to choose the tools that best suit the requirements of your application. With Amazon Kinesis, you can ingest real-time data such as video, audio, application logs, website clickstreams, and IoT telemetry data for machine learning, analytics, and other applications. Amazon Kinesis enables you to process and analyze data as it arrives and respond instantly instead of having to wait until all your data is collected before the processing can begin.
Amazon QuickSight
Amazon QuickSight is a fast, cloud-powered BI service that makes it easy to build visualizations, perform ad-hoc analysis, and quickly get business insights from your data. Using the cloud-based service, you can easily connect to your data, perform advanced analysis, and create stunning visualizations and rich dashboards that can be accessed from any browser or mobile device.

QuickSight is the first BI service to offer pay-per-session pricing, making it even more cost-effective for you to provide access to analytics and insights for all of your users. With pay-per-session pricing there are no upfront costs, no annual commitments, and no charges for inactive users!
AWS Data Pipeline
AWS Data Pipeline is a web service that helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals. With AWS Data Pipeline, you can regularly access your data where it’s stored, transform and process it at scale, and efficiently transfer the results to AWS services such as Amazon S3, Amazon RDS, Amazon DynamoDB, and Amazon EMR.

AWS Data Pipeline helps you easily create complex data processing workloads that are fault tolerant, repeatable, and highly available. You don’t have to worry about ensuring resource availability, managing inter-task dependencies, retrying transient failures or timeouts in individual tasks, or creating a failure notification system. AWS Data Pipeline also allows you to move and process data that was previously locked up in on-premises data silos.

Tuesday, June 20, 2017

Databases - RDBMS and NoSQL

If you are reading this blog post, you have probably used at least one database in your career and are wondering what is up with this new trend called "NoSQL".

Before getting into NoSQL databases, let’s first look at the history of RDBMS (Relational Data Base Management System) and its main concepts.

RDBMS has a history of more than 40 years, with many enhancements along the way. It all started with E.F. Codd's relational model, and his 12 rules later spelled out what a database must satisfy to qualify as an RDBMS.

Beyond its many intended benefits, the relational model is well-suited to client-server programming, and today it is the predominant technology for storing structured data in web and business applications. Classical relational databases follow the ACID properties. That is, a database transaction must be Atomic, Consistent, Isolated and Durable. The details of ACID are as follows:

ACID - Atomicity, Consistency, Isolation, Durability
Atomicity requires that each transaction be "all or nothing": if one part of the transaction fails, then the entire transaction fails, and the database state is left unchanged. An atomic system must guarantee atomicity in each and every situation, including power failures, errors and crashes. To the outside world, a committed transaction appears (by its effects on the database) to be indivisible ("atomic"), and an aborted transaction does not happen.
The consistency property ensures that any transaction will bring the database from one valid state to another. Any data written to the database must be valid according to all defined rules, including constraints, cascades, triggers, and any combination thereof. This does not guarantee correctness of the transaction in all ways the application programmer might have wanted (that is the responsibility of application-level code), but merely that any programming errors cannot result in the violation of any defined rules.
The isolation property ensures that the concurrent execution of transactions results in a system state that would be obtained if transactions were executed sequentially, i.e., one after the other. Providing isolation is the main goal of concurrency control. Depending on the concurrency control method (i.e., if it uses strict - as opposed to relaxed - serializability), the effects of an incomplete transaction might not even be visible to another transaction.
The durability property ensures that once a transaction has been committed, it will remain so, even in the event of power loss, crashes, or errors. In a relational database, for instance, once a group of SQL statements execute, the results need to be stored permanently (even if the database crashes immediately thereafter). To defend against power loss, transactions (or their effects) must be recorded in a non-volatile memory.
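Atomicity and consistency are easy to see in a small experiment. The sketch below uses Python's built-in `sqlite3` module with an in-memory database; the `accounts` table, the account names, and the `transfer` helper are all made up for illustration. The transfer credits the receiver first and debits the sender second, so when the debit violates the `CHECK (balance >= 0)` rule, the already-applied credit has to be rolled back for the transaction to be "all or nothing".

```python
import sqlite3


def make_db():
    """Create an in-memory database with two hypothetical accounts."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE accounts (name TEXT PRIMARY KEY, "
        "balance INTEGER NOT NULL CHECK (balance >= 0))"
    )
    with conn:
        conn.executemany(
            "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
        )
    return conn


def transfer(conn, src, dst, amount):
    """Move `amount` from `src` to `dst` inside a single transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected the debit


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


if __name__ == "__main__":
    db = make_db()
    print(transfer(db, "alice", "bob", 30), balances(db))
    print(transfer(db, "alice", "bob", 500), balances(db))
```

After the failed second transfer the balances are exactly what they were before it started: the credit to `bob` did not survive on its own.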

Apart from these ACID properties, there are some basic characteristics that made relational DBMSs popular. Some of them are:
·         Data is stored in a set of tables, in the format of rows and columns.
·         Relationships are represented by data.
·         Tables are joined by relational links.
·         Reduced duplication of data in the database can be achieved by normalization.
·         They allow greater flexibility and efficiency.
Shortcomings of RDBMS
An RDBMS is sufficient to store and manipulate structured data efficiently, but in today’s world the volume and velocity of data used and generated over the Internet are growing exponentially, and its nature is changing. As we can often see in areas like social media, the data has no fixed structure. This makes it unavoidable to handle unstructured data, which is non-relational and schema-less in nature. For an RDBMS it becomes a real challenge to provide cost-effective and fast Create, Read, Update and Delete (CRUD) operations, as it has to deal with the overhead of joins and of maintaining relationships among the data.

Therefore a new mechanism is required to deal with such data in an easy and efficient way. This is where NoSQL comes into the picture, handling unstructured big data efficiently to provide maximum business value and customer satisfaction.

NoSQL is not a campaign against the SQL language. NoSQL stands for “Not Only SQL.” It provides more possibilities beyond the classic relational approach of data persistence to the developers.

NoSQL refers to a broad class of non-relational databases that differ from classical RDBMS in some significant aspects, most notably because they do not use SQL as their primary query language, instead providing access by means of Application Programming Interfaces (APIs).

The reasons behind such a big switch, in other words the advantages of NoSQL, are the following:
·         High scalability
·         Distributed Computing
·         Lower cost
·         Schema flexibility
·         Un/semi-structured data
·         No complex relationships
Where RDBMSs follow the ACID properties, NoSQL databases are “BASE” systems. The BASE acronym was defined by Eric Brewer, who is also known for formulating the CAP theorem, whose properties BASE systems build on.

The CAP theorem states that a distributed computer system cannot guarantee all of the following three properties at the same time:

CAP theorem
Consistency: Every read receives the most recent write or an error.
Availability: Every request receives a (non-error) response, without a guarantee that it contains the most recent write.
Partition tolerance: The system continues to operate despite an arbitrary number of messages being dropped (or delayed) by the network between nodes.

A BASE system gives up on consistency so as to have greater availability and partition tolerance. BASE can be defined as follows:
Basically Available indicates that the system does guarantee availability, in terms of the CAP theorem.
Soft state indicates that the state of the system may change over time, even without input. This is because of the eventual consistency model.
Eventual consistency indicates that the system will become consistent over time, given that the system doesn’t receive input during that time.
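To make eventual consistency more concrete, here is a minimal sketch in Python. It is not how any particular NoSQL product works; the `Replica` class, the `sync` function and the version numbers are assumptions made up for illustration. It shows a write accepted by one node, a stale read from another node, and both nodes agreeing once replication has caught up.

```python
class Replica:
    """One node holding its own copy of the data (hypothetical, for illustration)."""

    def __init__(self):
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep the write only if it is newer than what this node already has.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(a, b):
    """Exchange entries in both directions; the higher version wins."""
    for src, dst in ((a, b), (b, a)):
        for key, (version, value) in list(src.data.items()):
            dst.write(key, value, version)


node_a, node_b = Replica(), Replica()
node_a.write("balance", 100, version=1)  # accepted locally, not yet replicated

stale = node_b.read("balance")  # node_b has not seen the write yet
sync(node_a, node_b)            # background replication catches up
fresh = node_b.read("balance")  # both nodes now agree

print(stale, fresh)
```

The stale read in the middle is the price paid for staying available: node_b answers immediately instead of waiting for node_a.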

With that said, here is a simple pictorial representation of the above-mentioned concepts and of which one to choose for given business needs.

Types of NoSQL:
As of today there are four types of NoSQL databases available:
Key-Value: Data is stored as a collection of key-value pairs, essentially one big hash table of keys and values. It is fast and scalable but cannot handle complex data.
Columnar: Unlike in an RDBMS, each storage block contains data from only one column. Fast, scalable, denormalized, suited to MapReduce, good for series data; not efficient for complex data.
Document Database: It stores documents made up of tagged elements. A good operational system for storing XML and JSON object content. Not analytical and moderately scalable. It is similar in spirit to an object-oriented database.
Graph: A network database that uses edges and nodes to represent and store data. Great for complex data, transactional. Less scalable.
Detailed Description:
Key-Value:
These types of databases work by matching keys with values, similar to a dictionary. There is no structure or relation. After connecting to the database server (e.g. Redis), an application can state a key (e.g. the_answer_to_life) and provide a matching value (e.g. 42), which can later be retrieved the same way by supplying the key.
Key / value DBMSs are usually used for quickly storing basic information, and sometimes not-so-basic information, for example the result of a CPU- and memory-intensive computation. They are extremely performant, efficient and usually easily scalable.
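The dictionary analogy can be sketched in a few lines of Python. This is a toy in-memory stand-in, not a client for Redis or any other real server; the `KeyValueStore` class and its method names are made up for illustration.

```python
class KeyValueStore:
    """A toy in-memory key/value store (illustrative, not a real client)."""

    def __init__(self):
        self._items = {}

    def set(self, key, value):
        self._items[key] = value

    def get(self, key, default=None):
        return self._items.get(key, default)

    def delete(self, key):
        # Returns True if the key existed, False otherwise.
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
store.set("the_answer_to_life", 42)

answer = store.get("the_answer_to_life")  # looked up by key only
missing = store.get("no_such_key")        # unknown keys fall back to the default

# Storing the result of a costly computation so it is not repeated.
cache_key = "sum_of_squares:1000"
if store.get(cache_key) is None:
    store.set(cache_key, sum(i * i for i in range(1000)))
```

Note that there is no way to ask "which keys hold the value 42" other than scanning everything, which is the limitation behind "cannot handle complex data".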
Columnar:
Column-based NoSQL database management systems work by advancing the simple nature of key / value based ones.
Despite their complicated-to-understand image on the internet, these databases work very simply by creating collections of one or more key / value pairs that match a record.
Unlike the traditional defined schemas of relational databases, column-based NoSQL solutions do not require a pre-structured table to work with the data. Each record comes with one or more columns containing the information, and each column of each record can be different.

Basically, column-based NoSQL databases are two-dimensional arrays in which each key (i.e. row / record) has one or more key / value pairs attached to it. These management systems allow very large and unstructured data to be kept and used (e.g. a record with tons of information).
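The "two-dimensional array" description can be pictured as a dictionary of dictionaries. The sketch below is only a mental model in Python, with made-up row keys and column names; real column-based systems add storage layout, distribution and much more on top of this idea.

```python
from collections import defaultdict

# Row key -> {column name: value}. No schema is declared up front.
table = defaultdict(dict)


def put(row_key, column, value):
    table[row_key][column] = value


def get_row(row_key):
    # Return a copy so callers cannot modify the stored row by accident.
    return dict(table.get(row_key, {}))


put("user:1", "name", "Alice")
put("user:1", "email", "alice@example.com")
put("user:2", "name", "Bob")
put("user:2", "last_login", "2018-06-21")  # a column that user:1 does not have

row_1 = get_row("user:1")
row_2 = get_row("user:2")
```

Each row carries only the columns it actually needs, which is what "each column of each record can be different" means in practice.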
Document Database:
Document based NoSQL database management systems can be considered the latest craze that managed to take a lot of people by storm. These DBMS work in a similar fashion to column-based ones; however, they allow much deeper nesting and complex structures to be achieved (e.g. a document, a document within a document).
Documents overcome the constraint of one or two levels of key / value nesting found in columnar databases. Basically, any complex and arbitrary structure can form a document, which can be stored using these management systems.
Despite their powerful nature, and the ability to query records by individual keys, document-based management systems have their own issues and downsides compared to the others. For example, retrieving a single value of a record means fetching the whole record, and the same goes for updates, all of which affects performance.
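The nesting, and the cost of whole-document reads and writes, can be sketched with plain JSON in Python. The `collection`, `save` and `load` names below are hypothetical and stand in for a document store; real products may offer partial updates, so treat this as a simplified picture.

```python
import json

collection = {}  # document id -> serialized JSON text


def save(doc_id, document):
    collection[doc_id] = json.dumps(document)


def load(doc_id):
    return json.loads(collection[doc_id])


# A document within a document, plus a list of sub-documents.
save("order:1", {
    "customer": {"name": "Alice", "address": {"city": "Austin"}},
    "items": [{"sku": "A1", "qty": 2}],
})

order = load("order:1")       # the whole document is read and parsed
order["items"][0]["qty"] = 3  # change one small value
save("order:1", order)        # the whole document is written back

city = load("order:1")["customer"]["address"]["city"]
```

Changing one quantity required reading and rewriting the entire order, which is the performance concern described above.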
Graph:
Finally, a very interesting flavor of NoSQL database management systems is the graph-based one.

Graph-based DBMS models represent the data in a completely different way than the previous three models. They use graph structures, with nodes and edges connecting them through relations.

As in mathematics, certain operations are much simpler to perform using these types of models, thanks to their nature of linking and grouping related pieces of information (e.g. connected people).

These databases are commonly used by applications where clear boundaries for connections need to be established. For example, when you register with a social network of any sort, your friends' connections to you and their friends' friends' relations to you are much easier to work with using graph-based database management systems.
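The social network example can be sketched with an adjacency structure in Python. The names and the `friends_of_friends` helper are made up for illustration; a real graph database would expose this through its own query language or API.

```python
from collections import defaultdict

friends = defaultdict(set)  # person (node) -> set of direct friends (edges)


def add_friendship(a, b):
    # Friendship is mutual, so store the edge in both directions.
    friends[a].add(b)
    friends[b].add(a)


def friends_of_friends(person):
    direct = friends.get(person, set())
    reachable = set()
    for friend in direct:
        reachable |= friends.get(friend, set())
    # Exclude the person themselves and people they already know directly.
    return reachable - direct - {person}


add_friendship("alice", "bob")
add_friendship("bob", "carol")
add_friendship("carol", "dave")
add_friendship("alice", "erin")

suggestions = friends_of_friends("alice")
print(suggestions)
```

The same question asked of a relational schema would typically need a self-join on a friendship table for every extra hop.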

NoSQL Advantages | NoSQL Disadvantages
High scalability | Too many options (more than 250); hard to know which one to pick
Schema flexibility | Limited query capabilities (so far)
Distributed computing (reliability, scalability, sharing of resources, speed) | Eventual consistency is not intuitive to program for strict scenarios like banking applications
No complicated relationships | Lacks join, group by and order by facilities
Lower cost (hardware costs) | Limited support for ACID transactions
Open source: many NoSQL options are open-source solutions (proprietary offerings such as Amazon DynamoDB are exceptions), which provides a low-cost entry point | Limited or no guarantee of support for open-source products

Feature | NoSQL | RDBMS
Data Volume | Handles huge data volumes | Handles limited data volumes
Data Validity | Less guaranteed | Highly guaranteed
Scalability |  | Horizontally & Vertically
Query Language | No declarative query language | Structured Query Language (SQL)
Schema | No predefined schema or less rigid schemas | Predefined schema (Data Definition Language & Data Manipulation Language)
Data Type | Supports unstructured and unpredictable data | Supports relational data, with relationships stored in separate tables
Underlying Principle | Based on BASE principle (Basically Available, Soft state, Eventual consistency) | Based on ACID principle (Atomicity, Consistency, Isolation and Durability)
Transaction Management | Weaker transactional guarantees | Strong transactional guarantees
Data Storage Technique | Schema-free collections store different types and document structures; for example, {“color”, “blue”} and {“price”, “23.5”} can be stored within a single collection | No collections are used for data storage; data is manipulated through DML instead

Here is another quick comparison of when to choose NoSQL and when to choose an RDBMS:
NoSQL:
1.     If data is huge, unstructured, sparse or growing
2.     Less rigid schema
3.     Performance & availability preferred over redundancy
4.     Scaling out is an out-of-the-box feature, and it does not prevent scaling up
5.     Cost effective: uses clusters of cheap commodity servers to manage the exploding data and transaction volumes
RDBMS:
1.     If analytics, BI or reporting is required
2.     For the benefits of ACID
3.     Rigid schema
4.     No redundancy allowed
5.     Allows scale-up & limited scale-out (sharding)
6.     Expensive: relies on expensive proprietary servers and storage systems
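Scaling out, which both lists above mention, usually means spreading keys across several servers. A minimal sketch of hash-based sharding in Python follows. The shard count, the in-memory `shards` list and the helper names are assumptions for illustration; production systems typically use more elaborate schemes (for example consistent hashing) so that adding a server does not move most of the keys.

```python
import hashlib


def shard_for(key, shard_count):
    """Map a key to a shard index in a stable, repeatable way."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


# Each dict stands in for one cheap commodity server.
shards = [dict() for _ in range(4)]


def put(key, value):
    shards[shard_for(key, len(shards))][key] = value


def get(key):
    return shards[shard_for(key, len(shards))].get(key)


for user_id in range(100):
    put(f"user:{user_id}", {"id": user_id})

sizes = [len(shard) for shard in shards]  # how the 100 keys were spread out
print(sizes)
```

Every read goes straight to the one server that owns the key, which is why this approach works well for key lookups and poorly for queries that span many keys.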
There are more than 250 NoSQL products available in the market today. Below are a few of them with their characteristics.

DB Name/Type | Year Started
Cloud Datastore (Google) | 2013 May
DocumentDB (Microsoft Azure) |
NosDB (for .NET) | 2016 July
RavenDB (for .NET) | 2010 April
Clusterpoint Database |
RaptorDB (for .NET) | 2012 April
BigTable (Google) | 2005 Feb
Dynamo (Amazon) |