
SQL, NoSQL, NewSQL

In this part we cover the following topics: data and databases, SQL and the relational model, Big Data, NoSQL, and NewSQL.


Data and databases

We can define a database as an organized collection of data stored in accordance with specific rules. In this sense a postage stamp collection, books stored on a shelf (in some systematic or chaotic way), or even a kid's toy car collection are examples of databases.

We are used to thinking about databases in a much narrower and thus more useful sense.

First, we think about purely immaterial data -- we store numbers, texts, images and songs, but not real objects, probably because real objects are much harder to manipulate automatically than sequences of characters.

We define data as a set of values of qualitative or quantitative variables (properties) describing some object or phenomenon.


One data, many data...

The Latin word data is the plural of datum (en. "(thing) given"), the neuter past participle of dare (en. "to give"). Consequently, datum should be used in the singular and data in the plural, though in non-specialist, everyday writing data is most commonly used in the singular, as a mass noun (like "information", "sand" or "rain"). Truth be told, I observe this tendency becoming more and more popular. The first English use of the word data is from the 1640s. The word data was first used to mean transmittable and storable computer information in 1946, and the expression data processing was first used in 1954. [W3]

Although the terms "data", "information" and "knowledge" are often used interchangeably, each of them has a distinct meaning. Data is a dumb set of values, nothing more. When data is processed and transformed in such a way that it becomes useful to its users, it is known as information. So when data starts to "speak", when something valueless turns into something priceless, we have information. Going further with data "transformations" we reach the DIKW (data, information, knowledge, wisdom) pyramid. The DIKW pyramid shows that data, produced by events, can be enriched with context to create information; information can be supplied with meaning to create knowledge; and knowledge can be integrated to form wisdom, which sits at the top.

There is a nice saying (by Miles Kington):

Knowledge is knowing a tomato is a fruit.
Wisdom is not putting it in a fruit salad.

And this is the true essence of the problem we are discussing.

Second, we pay great attention to automatic processing. The best known tools allowing us to do so nowadays are computers. That is why immaterial data is so useful to us -- we can turn it into digital data and feed it to computer systems, making them do for us the things we won't do ourselves.

This explains why nowadays we define a database as digital data collected in accordance with the rules adopted by a given computer program specialized in collecting, storing and processing this data. Such a program (often a package of various programs) is called a database management system (DBMS).

The database management system (DBMS) is the software that interacts with end users, applications, and the database itself to capture and analyze the data. It serves as an intermediate layer isolating the end user from all "unnecessary" technical details. In common language we use the term database to refer loosely to the DBMS, the database system or an application associated with the database.




SQL

We can classify database management systems according to the database models they support. Not going far into the past, we can say that the first model used on a large scale, dominant in the market for more than 20 years, was the relational model, which arose in the 1970s. We refer to these systems as SQL databases because the Structured Query Language (pronounced "S-Q-L" or "sequel") was used by the vast majority of them for writing and querying data. SQL (in the sense of SQL databases) utilizes Edgar F. Codd's relational model.

The database model used by SQL assumes that data is represented in terms of tuples, grouped into relations. We can think of a relation as an Excel table and of tuples as the rows of this table. Every tuple in turn consists of one or more attributes, which resemble a spreadsheet's columns. The main properties of this model are:

  • We may have any number of tables in our database.
  • Every table can have any number of columns, but that number must be precisely defined.
  • Using keys (unique identifiers for every row within a given table) we can define relationships between tables.


[IMAGE: Relational_model_concepts]
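A minimal sketch of such a schema (the table and column names are illustrative and match the Customers/Orders example used below) shows how keys define the relationship between tables:

    -- Every customer gets a unique identifier: the primary key.
    CREATE TABLE Customers (
        ID      INTEGER PRIMARY KEY,
        Name    VARCHAR(100) NOT NULL,
        Country VARCHAR(50)
    );

    -- Every order references its customer through a foreign key --
    -- this is exactly a "relationship between tables" defined by keys.
    CREATE TABLE Orders (
        OrderID    INTEGER PRIMARY KEY,
        CustomerID INTEGER NOT NULL REFERENCES Customers(ID),
        Amount     DECIMAL(10, 2),
        OrderDate  DATE
    );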

Having some data organized this way, we can do some basic operations on it, each sketched in SQL after the list below. Notice that the "CustomerID" column in the "Orders" table refers to the "ID" column in the "Customers" table.

  • Retrieve data stored in the database.
  • Retrieve data stored in the database, imposing some conditions.
  • Retrieve joined data stored in the database, imposing some conditions.
  • Insert data into the database.
  • Update existing data.
  • Delete existing data.
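Each of these operations maps to a single SQL statement. A minimal sketch against the Customers/Orders tables above (the conditions and values are made up for illustration):

    -- Retrieve data stored in the database
    SELECT * FROM Customers;

    -- Retrieve data imposing some conditions
    SELECT Name FROM Customers WHERE Country = 'Poland';

    -- Retrieve joined data imposing some conditions
    SELECT c.Name, o.Amount
      FROM Customers c
      JOIN Orders o ON o.CustomerID = c.ID
     WHERE o.Amount > 100;

    -- Insert data
    INSERT INTO Customers (ID, Name, Country) VALUES (1, 'Alice', 'Poland');

    -- Update existing data
    UPDATE Customers SET Country = 'Germany' WHERE ID = 1;

    -- Delete existing data
    DELETE FROM Orders WHERE OrderID = 42;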

An inseparable part of this system is a set of rules known as the normal forms. Interestingly, the relational model defines several levels of conformance specifying how data should be organized into tables. The main goal of all normal forms is to force the user to keep data in a form that limits data redundancy and helps avoid trouble while data is inserted, updated or deleted. Normalization guidelines are cumulative: for a database to be in 2NF (second normal form), it must first fulfill all the criteria of a 1NF (first normal form) database; to be in 3NF, it must be in 2NF, and so on.
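A small illustration of the redundancy the normal forms fight (a toy example of my own, not tied to any particular normal form's formal definition):

    -- Redundant design: the customer's name is repeated in every
    -- order row, so renaming one customer means updating many rows
    -- and risking inconsistent copies.
    CREATE TABLE OrdersFlat (
        OrderID      INTEGER PRIMARY KEY,
        CustomerName VARCHAR(100),
        Amount       DECIMAL(10, 2)
    );

    -- Normalized design (as in the Customers/Orders schema above):
    -- the name is stored exactly once, and orders carry only the key.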

The way of organizing data imposed by normal forms affects the way we deal with real objects. A relational database is like a garage that forces you to take your car apart and store the pieces in little drawers, every time you drive into it.

[IMAGE: car and disassembled car]

The mismatch between the relational model (the way we store data) and real objects as represented in object-oriented programming languages (the way we use data) had serious consequences. Whenever an object was stored into or retrieved from a relational database, multiple SQL operations were required to convert between the object-oriented representation and the relational representation. This was cumbersome for the programmer and could lead to performance or reliability issues.

This led to the first attempt to replace relational databases with something else, which is why object-oriented DBMSs (OODBMS) were developed in the late 1980s and early 1990s. They never saw widespread market adoption, however. The main reason was that they lacked a standard, universal interface like SQL. People are so used to SQL that other interfaces seemed awkward or useless. This is true even now -- every modern database technology offers an SQL-like interface, even if internally it is not a relational system. The OODBMS offered advantages to the application developer but forgot about those who wished to consume information for business purposes, and this could be the reason these systems completely failed to gain market share. We have to remember that databases don't exist to make programmers' lives simpler. They represent significant assets that must be accessible to those who want to mine the information for decision making and business intelligence. By implementing a data model that only the programmer could understand and use, and denying the business user a usable SQL interface, the OODBMS failed to gain support outside the programmers' world.

This is how SQL databases defended their dominant position in the market. In other words, relational databases, despite their drawbacks, were very well established in IT, seemed perfectly crafted for all needs, and nothing suggested that a new era was coming. Then, unexpectedly, in the 2000s non-relational databases became popular. They are referred to as NoSQL because they use query languages different from the SQL used so far.


Transaction model and ACID

The relational model does not itself define the way the database handles concurrent data change requests, known as transactions. To ensure consistency and integrity of data, the ACID transaction model is used; it became the de facto standard for all serious relational database implementations. An ACID transaction should be

  • Atomic. The transaction cannot be divided -- either all the statements in the transaction are applied to the database or none are.
  • Consistent. The database remains in a consistent state before and after transaction execution.
  • Isolated. While multiple transactions can be executed by one or more users simultaneously, one transaction should not see the effects of other in-progress transactions.
  • Durable. Once a transaction is saved (committed) to the database, its changes are expected to persist even if there is a failure of operating system or hardware.
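The textbook illustration is a money transfer: both updates must be applied together or not at all. A sketch (the Accounts table is made up for this example; BEGIN/COMMIT syntax varies slightly between systems):

    BEGIN;

    -- Atomicity: these two statements succeed or fail as a whole.
    UPDATE Accounts SET Balance = Balance - 100 WHERE AccountID = 1;
    UPDATE Accounts SET Balance = Balance + 100 WHERE AccountID = 2;

    -- Durability: once committed, the change survives a crash.
    COMMIT;

    -- On any error we would issue ROLLBACK instead, leaving the
    -- database in its previous consistent state.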

On the one hand, ACID, together with relations, is the source of the power of relational databases. On the other hand, it is a source of serious and very hard to overcome problems.


Summary

The relational model is undoubtedly characterized by the following set of positive features.

  • ACID transactions at the database level makes development and usage easier.
  • Most SQL code is portable to other SQL databases.
  • Typed columns and constraints help validate data before it's added to the database, which increases the consistency of the stored data.
  • Built-in mechanisms like views or roles prevent data from being changed or viewed by unauthorized users.

To be honest, one cannot forget about the negative side of the relational model.

  • ACID transactions may block the system for a short time, which may be unacceptable.
  • The object-relational mapping is possible but can be complex and add one more intermediate layer.
  • RDBMSs don’t scale out. Sharding over many servers can be done but requires new or tuned application code and will be operationally inefficient.
  • It is difficult to store high-variability data in tables.
  • It is difficult to ingest data in real time and do real-time processing.


Big Data -- big problem with data

Why was NoSQL created? You could say SQL was getting old. Nothing could be further from the truth. Information technology is one of the few areas in which system components are not replaced because of their age; the only impulse for change is usefulness, never age. An unknown factor had to appear, forcing us to abandon previously known technologies.


When data becomes a problem

Let's make a very simple mental experiment.

Operations on the stack of N sheets of paper.

  1. Stamping task: stamp each sheet of paper.
    To increase performance, we can do one of the following.

    1. Increase the speed of stamping a single sheet by employing a mega man -- a man who can do this much faster than any other known man.
      [IMAGE: mega man at work]
      Even with the mega man, this increase in speed has its natural physical limitations.
    2. Divide the stack into smaller (sub)stacks and assign each smaller stack to a different person. Stamping speed can then be increased by assigning more people to the task. What's more, even though each of these people will be much less efficient than the mega man from the first option, the task as a whole will be completed much faster.
  2. Numbering task: number all N sheets with the natural numbers from 1 to N. In this case there is no way to divide the task into smaller (sub)tasks. The only chance to increase performance is to increase the processing power of one single processing unit (employ the mega man).

In the stamping task we saw an example of so-called system scaling. Generally speaking, scalability is the property of a system to handle a growing amount of work by adding resources to the system. In our case there are two options for scaling the system: either increase the power of a single execution unit (the mega man), which is called vertical scaling, or increase the number of execution units, which is called horizontal scaling. Horizontal scaling seems more promising. There is "only" one small but important detail: the task should be divisible into independent subtasks, which is not always the case, as we saw in the numbering task. We will say that the stamping task is a scalable task while the numbering task is a non-scalable task.
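This intuition can be made precise (the text does not name it, but it is essentially Amdahl's law): if a fraction $p$ of a task can be divided among $k$ workers and the remaining fraction $1-p$ is inherently sequential, the best achievable speedup is

$$ S(k) = \frac{1}{(1 - p) + p/k} $$

For the stamping task $p \approx 1$, so $S(k) \approx k$ and adding workers keeps paying off; for the numbering task $p \approx 0$, so $S(k) \approx 1$ no matter how many people we employ.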

When does a non-scalable task become a problem for us? Notice that the need for scaling occurs only for voluminous data sets (we often say: large data or big data). For small N there should be no problem completing both the stamping task and the numbering task by one man in reasonable time. When there is relatively little data, non-scalable (or at most vertically scalable) systems are sufficient. So data may become a problem when there is a lot of it. But what does a lot mean? How is it today? Do we have a lot of data today?


Volume

It is estimated that today organizations and users world-wide create over 2.5 EB (1 EB = 2^60 bytes, more than 10^18 bytes; roughly 500 million DVDs) of data a day. As a point of comparison, the Library of Congress currently holds more than 300 TB (1 TB = 2^40 bytes, more than 10^12 bytes; about 65,000 DVDs) of data. Almost any aspect of our lives is or can be a source of data (another question is whether we really need it all), and data can be generated by humans, machines and the environment. The most common sources are (H denotes human, M -- machines, E -- environment):

  • (H) social media, such as Facebook and Twitter,
  • (H) scientific and research experiments, such as physics simulations,
  • (H) online transactions, such as point-of-sale and banking,
  • (M) sensors, such as GPS sensors, RFIDs, smart meters and telematics,
  • (M) any working device we monitor for our safety, such as planes or cars,
  • (E) weather conditions, cosmic radiation.

How much data?

Every minute

  • 4,166,667 likes are given by Facebook users,
  • 347,222 tweets are posted on Twitter,
  • 100,040 calls are made on Skype,
  • 77,166 hours of movies are streamed from Netflix,
  • 694 Uber users take a ride,
  • 51,000 applications are downloaded from the App Store.

[A9, W26, W27, W28, W29, W30]

Today we collect all the data we can get, regardless of whether we really need it or not. The term data lake was coined for this type of data "collection". A data lake holds lots of different things, not always easily accessible and visible. We do this (collect data) not necessarily because we need it but because we can. Maybe one day we will need it, we think. Over time, large amounts of data can hide what we previously stored. It is like my basement: I know that some elements I need are somewhere in it. Unfortunately, because there are so many elements, it is cheaper, easier and faster to buy a new one than to try to find it in my basement element lake.

What is worse, volume is not the only problem with data. Two more factors play a very important role: velocity and variety. Together they constitute what we nowadays call big data.


According to some sources, big data today may be characterized by more than three different Vs -- see Big Data characteristics [W29, W31] for more details.

This growth of data size is outpacing the growth of storage capacity, leading to the emergence of information management systems in which data is stored in a distributed way but accessed and analysed as if it resided on a single machine. Unfortunately, an increase in processing power does not directly translate to faster data access, triggering a rethink of existing database systems.

This explains why we have heard about NoSQL only so "recently" (relational (SQL) databases have been known since the 1970s, non-relational (NoSQL) databases since the 2000s). Until the data explosion, the existing solutions (SQL) proved sufficient. Now they are not able to cope with the current needs of working with large data sets in real time.


NoSQL

One of the most important negative features of SQL databases is their relational nature, forced by the normal forms. Every time we need some information, we have to combine data distributed among many tables into something unified. Going back to the earlier car example: every time we want to take a ride, we have to collect all the pieces from the little drawers and rebuild our car. In real database life such a combining operation, called a join, is repeated many times a minute or even a second. It takes time and memory and consumes CPU power. The natural way of thinking is to get rid of it. Let's try to think about our data as if it didn't need joins. Let's organize it in some different way.
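For example (an illustrative sketch, not the format of any particular product), a document store can keep a customer together with their orders in a single JSON document, so reading a customer's order history needs no join at all:

    {
      "customerId": 1,
      "name": "Alice",
      "country": "Poland",
      "orders": [
        { "orderId": 42, "amount": 120.50, "date": "2021-05-01" },
        { "orderId": 57, "amount": 80.00,  "date": "2021-06-12" }
      ]
    }

The price is that data which relational normalization would keep in one place is now duplicated or denormalized across documents.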

In a way, it turned out that this change in the approach to data storage resulted in easily implementable distribution and scalability.

NoSQL shouldn't be taken too literally: it is not against the SQL language. SQL, as well as other query languages, is used with NoSQL databases. For our purposes, we will use a NoSQL definition I found in [1] and liked very much:

NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility.

As we can see, we don't discredit any existing database system, including the well-deserved RDBMS systems. As the above definition shows, in NoSQL solutions we focus primarily on

  • fast processing,
  • efficient processing,
  • reliable processing,
  • agile processing.


Motivation

Let's pay attention to the factors that were taken into consideration during the development of NoSQL.

  • Flexibility. One drawback of working with a relational database is that we have to know many things (or all of them) in advance, before we start using the database. We are expected to know all the tables and columns needed to support an application. It is also commonly assumed that most of the columns in a table will be needed by most of the rows. In practice we sometimes have no idea how many columns, and of what types, we will need, or we add a column just to support an option that occurs once in a million cases.

    Moreover, our choices often change as we develop our business or IT infrastructure and identify important factors or redefine those already known. This may be called the agile approach to data: the flexibility to collect, store and process any type of data any time it appears.

    Flexibility also allows us to reduce costs, which is highlighted in one of the following items.

  • Availability. Today's world is full of information, in the sense that we can find the information we are looking for in many different sources. We know it, so if one of the sources is unavailable we will not wait for it but will most likely shift to another one. Think about the way you use a web browser to find the page with the information you need. In just a second you have thousands of results. In consequence, if you click the first link and wait more than 3 seconds to see the resulting page, you get irritated and click the second link in the hope of getting a result in less than one second. If you don't believe it, see the notes below.

    User requirements for database availability are becoming more and more rigorous. First of all, users expect that in the event of a system failure they will lose no data, or only a strictly defined amount of it -- the database system must be reliable. Fixing a failure and restoring the database, however, may take a long time. Therefore, meeting the second expectation -- continuous availability of the database system even when individual components are damaged or the load suddenly increases -- requires appropriate high availability (HA) technology.

    Availability is usually expressed by the number of nines:

    Availability   Unavailability per month   Unavailability per year
    95%            36 hours                   18 days
    99%            7 hours                    3.5 days
    99.5%          3.5 hours                  <2 days
    99.9%          43m 12s                    8h 45m
    99.99%         4m 19s                     52m 36s
    99.999%        25s                        5m 15s

    [A2]
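    These numbers follow from a simple calculation: for availability $A$ over a period $T$, the expected downtime is

    $$ T_{\text{down}} = (1 - A) \cdot T $$

    For example, $A = 99.9\%$ over a year gives $0.001 \times 365 \times 24 \approx 8.76$ hours, which is the "8h 45m" row of the table.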


    Amazon S3 is designed for 99.999999999% (11 9's) of durability, and stores data for millions of applications for companies all around the world. [W18]

  • Speed. From the customer's point of view, availability seems to be the most important factor. The relational architecture, with lots of constraints on data (tables), additional rules (e.g. database triggers) and the need to split every object into the smallest pieces (normal forms), does not support fast data saving. It would be much more useful to be able to store anything we want immediately, at the time we want, deferring all rule checking to some later point in time.

    Moreover, very often relational complexity is not needed. We don't need sophisticated logic to handle the articles of a blog; such logic can easily be implemented in application code in favour of data availability.

  • Cost. One option is to use one expensive server plus a second one as its backup. At the beginning most of its resources will be unused; the load will grow over time and reach the server's limit in, say, five years, but you have to pay for all of it now -- a waste of money. The other option is to buy low-cost machine(s) sufficient for current needs. When the load increases a little, we add one more node to our cluster, and we can repeat this small step every time the load goes up. Thus we can save some money, which is especially important when we are starting our business.
  • Scalability. Scaling is the only way we can meet the above requirements. It is important that it be as easy and as cheap as possible.

Depending on the needs of a particular application, some of these factors may be more important than others -- this explains why the NoSQL family is so big.


NoSQL family

  • Column family stores
  • Key-value stores
  • Document stores
  • Graph stores
  • Column databases
  • Time series databases


BASE set of rules

Relational database systems have their ACID set of rules. By chemical analogy, NoSQL systems have their BASE set of rules. While ACID systems focus on high data integrity, NoSQL systems take into consideration a slightly different set of constraints named BASE.

ACID systems focus on the consistency and integrity of data above all other considerations. Temporary blocking is a reasonable price to pay to ensure that the system returns reliable and accurate information. ACID systems are said to be pessimistic in that they must consider all possible failure modes in a computing environment. According to Murphy's Law, if anything can go wrong it will go wrong; ACID systems are ready for this and guarantee that they will survive.

In contrast, BASE systems focus on something significantly different: availability. A BASE system's most important objective is to allow new data to be stored, even at the risk of being out of sync for a short period of time. BASE systems aren't considered pessimistic in that they don't worry about the details if one process is behind. They are optimistic in that they assume that eventually, in the not so distant future, all systems will catch up and become consistent.

BASE is the equivalent of ACID in the NoSQL world. It stands for these concepts:

  • Basic availability means that the database appears to work most of the time. It allows systems to be temporarily inconsistent so that transactions are manageable. In BASE systems, the information and service capability are basically available. This means that there can be a partial failure in some parts of the distributed system but the rest of the system continues to function.
  • Soft-state means that stores don’t have to be write-consistent, nor do different replicas have to be mutually consistent all the time. Some inaccuracy is temporarily allowed and data may change while being used. State of the system may change over time, even without input. This is because of eventual consistency.
  • Eventual consistency means that there may be times when the database is in an inconsistent state. Eventually, when all service logic is executed, the system is left in a consistent state.


CAP theorem

The CAP theorem is about how distributed database systems behave in the face of network instability.

When working with distributed systems over unreliable networks we need to consider the properties of consistency and availability in order to make the best decision about what to do when systems fail. The CAP theorem introduced by Eric Brewer in 2000 states that any distributed database system can have at most two of the following three desirable properties

  • Consistency. Consistency is about having a single, up-to-date, readable version of our data available to all clients. No matter how many clients read the same items from replicated and distributed partitions, they should get consistent results. All writes are atomic and all subsequent requests retrieve the new value.
  • High availability. This property states that the distributed database will always allow database clients to make operations like select or update on items without delay. Internal communication failures between replicated data shouldn’t prevent operations on it. The database will always return a value as long as a single server is running.
  • Partition tolerance. This is the ability of the system to keep responding to client requests even if there’s a communication failure between database partitions. The system will still function even if network communication between partitions is temporarily lost.

Note that the CAP theorem only applies in cases when there's a connection failure between partitions in our cluster. The more reliable our network, the lower the probability that we will need to think about this theorem. The CAP theorem helps us understand that once we partition our data, we must determine which option best matches our business requirements: consistency or availability. Remember: at most two of the three desirable properties can be fulfilled at once, and since in a distributed system partitions cannot be ruled out, in practice we have to select either consistency or availability.


Summary

NoSQL is undoubtedly characterized by the following set of positive features

  • Relaxing the ACID rules in favor of BASE is, in many cases, a price we can pay.
  • Schema-less design increases processing flexibility.
  • It's easy to store a high volume of highly variable data arriving with high velocity.
  • In many cases a modular architecture allows components to be exchanged.
  • Possible linear scaling as new processing nodes are added to the cluster.
  • Possible low(er) operational costs.

To be honest, one cannot forget about the negative side of NoSQL.

  • ACID transactions are not supported, which demands a different way of thinking compared to the relational model.
  • Lack of a common query language (like SQL in relational databases).
  • Lack of built-in security mechanisms like views or roles.

After a few years, the distributed, easily scalable nature of NoSQL databases proved to be something we must have -- not because we want it but because we need it: it is the only option for processing enormous amounts of data. On the other hand, the lack of the ACID model, replaced by BASE, is unacceptable in many business cases. As usually happens in real life, the best solution lies somewhere in the middle, between the SQL and NoSQL worlds. This is how NewSQL databases arose.


NewSQL

The term NewSQL was first used in 2011 (according to [W24]; see also [W21, W22]). Like NoSQL, NewSQL is not to be taken too literally: the new thing about NewSQL is not new SQL but certain design goals we want to achieve.

I would say that, like NoSQL, NewSQL describes a loosely affiliated group of companies developing a certain type of software. Whether it is really new, justifying the New prefix before the SQL term, is questionable. All of them try to develop (new) relational database products and services designed to bring the benefits of the relational model to distributed architectures, or to improve the performance of relational databases to the extent that horizontal scalability is no longer a necessity. The first seems difficult -- the inability to scale SQL databases was one of the causes of the creation of NoSQL databases. The second seems (to me) impossible -- another cause of NoSQL's creation was the inability to increase (or decrease, if needed) the processing power of a single unit with the flexibility required by today's business world on the one hand, and without paying a lot on the other. If you recall the mega man example from the section When data becomes a problem, you know that it is physically impossible to increase the processing power of a single unit indefinitely.

Of course, as always when a new technology emerges, there were people who thought that NoSQL would be the perfect solution to all IT problems. For those who know the NoSQL background (which we tried to present in the previous section) it should not be surprising that NoSQL couldn't be such an answer. In spite of the technological maturity proved by NoSQL solutions, RDBMS users in enterprises are reluctant to switch. We may ask: why?

The greatest deployment of traditional RDBMSs is in enterprises, which seems a perfect area of application. Probably every enterprise has fallen into the big-data rush, so all big-data issues (discussed in the Big Data -- big problem with data section) are relevant to them. So why not NoSQL? The IT world is very, very practical, and nothing happens without an evident need. Even though there is a variety of NoSQL offerings, they are typically characterised by a lack of SQL support and non-adherence to the ACID properties, replaced by the unpredictable BASE. Although NoSQL could help enterprises manage large distributed data, enterprises cannot afford to lose the ACID properties, which were the key reasons for choosing an RDBMS in the first place. Also, NoSQL solutions don't provide the SQL support that most current enterprise applications require, which pushes enterprises away from NoSQL.

Once again in database history, the omission of compliance with well-known and widely adopted standards (SQL) met with rejection by the IT environment (the first time it concerned object-oriented databases, as described in the SQL section).

To address big-data transactional business scenarios that neither traditional relational systems (SQL) nor NoSQL systems address, alternative database systems have evolved, collectively named NewSQL systems. NewSQL is a shorthand for the various new scalable and/or high-performance SQL databases.

The technical characteristics of NewSQL solutions, according to Michael Stonebraker, are as follows [W25]:

  1. SQL as the primary mechanism for application interaction.
  2. ACID support for transactions.
  3. A non-locking concurrency control mechanism so real-time reads will not conflict with writes, and thereby cause them to stall.
  4. An architecture providing much higher per-node performance than available from the traditional RDBMS (SQL) solutions.
  5. A scale-out, shared-nothing architecture, capable of running on a large number of nodes without suffering bottlenecks.

So we can think of NewSQL as a Holy Grail unifying the SQL and NoSQL approaches into one universal solution, whose specific use (as SQL, NoSQL or mixed) follows from existing needs. Unfortunately, as experience teaches, if something seems applicable to all cases, it can't really work.

Given that relational DBMSs have been around for over four decades, it is justifiable to ask whether the claim of NewSQL's superiority is actually true or whether it is simply marketing. If they are indeed able to get better performance, then the next question is whether there is anything scientifically new about them that enables these gains, or whether hardware has simply advanced so much that the bottlenecks of earlier years are no longer a problem. [A10] If NewSQL is only a marketing term, do we really need it? Wouldn't our time be better invested in trying to understand what the fundamental issues are and how to overcome them? [W25]

Moreover, despite there being plenty of NewSQL systems, they have had a relatively slow rate of adoption, especially compared with the developer-driven NoSQL stream. This is probably because NewSQL DBMSs are designed to support the transactional workloads mostly found in enterprise applications, and decisions regarding database choices for these applications are likely to be more conservative than for new Web application workloads. This is also evident from the fact that NewSQL DBMSs are used to complement or replace existing RDBMS deployments, whereas NoSQL systems are deployed in new application workloads.


Summary

NewSQL is undoubtedly characterized by the following set of positive features.

  • ACID support for transactions.
  • SQL as the primary mechanism for application interaction.
  • Much higher per-node performance than available from the traditional RDBMS (SQL) solutions.
  • Non-locking concurrency control.
  • Scalability through a scale-out, shared-nothing architecture.

To be honest, one cannot forget about the negative side of NewSQL.

  • If NewSQL can achieve much higher per-node performance than traditional RDBMS (SQL) solutions have offered so far, those techniques can also be adopted by the SQL systems themselves, reducing NewSQL's advantage in this respect.
  • The slow adoption of NewSQL compared to NoSQL is a significant sign that should be taken into consideration -- no matter what NewSQL vendors say, something prevents market adoption. It might be a sign that this direction is irrelevant to real needs.


Summary

  • SQL examples:
    Oracle, MySQL, Microsoft SQL Server, PostgreSQL, IBM Db2, Microsoft Access, SQLite.
  • NoSQL examples:
    • Column family stores
      Amazon SimpleDB, Accumulo, Cassandra, Druid, HBase, Hypertable, Vertica.
    • Key-value stores
      Apache Ignite, ArangoDB, BerkeleyDB, Dynamo, FoundationDB, InfinityDB, LevelDB, MemcacheDB, MUMPS, Oracle NoSQL Database, OrientDB, Project Voldemort, Redis, Riak.
    • Document stores
      Apache CouchDB, ArangoDB, BaseX, Clusterpoint, Couchbase, Cosmos DB, IBM Domino, MarkLogic, MongoDB, OrientDB, Qizx, RethinkDB.
    • Graph stores
      Apache Giraph, MarkLogic, Neo4J, OrientDB, Virtuoso.
    • Column databases
      Druid, Vertica.
    • Time series databases
      Druid, InfluxDB, OpenTSDB, Riak-TS, RedisTimeSeries.
  • NewSQL examples:
    Clustrix, GenieDB, ScalArc, Schooner, VoltDB, RethinkDB, ScaleDB, Akiban, CodeFutures, ScaleBase, Translattice, NimbusDB, Drizzle, MySQL Cluster with NDB, and MySQL with HandlerSocket, Tokutek, JustOne DB, Amazon Relational Database Service, Microsoft SQL Azure, Xeround, Database.com, FathomDB.