The Need For MapReduce and NoSQL – Gaurav Sharma's Blog

The Need for MapReduce

Relational Database Management Systems (RDBMSs) have been in use since the 1970s. They expose their data through the SQL query language.

They are good at needle-in-a-haystack problems: finding a small result set within a big dataset.

They provide a number of advantages:

A declarative query language
Schemas
Logical Data Independence
Database Indexing
Optimizations Through Use of Relational Algebra
Views
ACID Properties (Atomicity, Consistency, Isolation and Durability)
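
A few of these advantages (a declarative query, an index, and atomicity) can be illustrated with a small sketch. It uses Python's built-in sqlite3 module purely for convenience; the orders table, its columns, and the sample data are hypothetical and are not taken from any particular system.

```python
import sqlite3

# In-memory database; the table and column names are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [(f"customer{i % 1000}", float(i)) for i in range(10000)],
)
# An index lets the engine locate matching rows without scanning the whole table.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer)")
conn.commit()

# Declarative query: it states what is wanted, not how to find it.
query = "SELECT COUNT(*), SUM(amount) FROM orders WHERE customer = ?"
rows = conn.execute(query, ("customer42",)).fetchone()

# Atomicity: the connection context manager commits on success and
# rolls back if an exception escapes the block.
try:
    with conn:
        conn.execute(
            "UPDATE orders SET amount = 0 WHERE customer = ?", ("customer42",)
        )
        raise RuntimeError("simulated failure before commit")
except RuntimeError:
    pass

# The failed update left no trace, so the totals are unchanged.
rows_after = conn.execute(query, ("customer42",)).fetchone()
```

Here the query picks 10 rows out of 10,000, which is the kind of small-result lookup that an index serves well.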

They scale in one sense: even if the data doesn't fit in main memory, a query can still finish efficiently.

However, they do not scale well in another sense: making multiple machines or multiple cores available will not, by itself, reduce the time a query takes.

The need thus arose for a system that scales as more machines are added and can therefore process huge datasets (in the range of 50 GB or more).
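
The MapReduce model named in the title is one answer to this need. The following is a minimal single-process sketch of the idea, assuming a word-count task; the function names (map_phase, shuffle, reduce_phase, word_count) and the sample chunks are hypothetical, and a real framework would run the map and reduce calls on separate machines rather than in a loop.

```python
from collections import defaultdict

def map_phase(chunk):
    """Emit a (word, 1) pair for every word in one chunk of input lines."""
    return [(word.lower(), 1) for line in chunk for word in line.split()]

def shuffle(mapped_outputs):
    """Group all emitted values by key across every mapper's output."""
    groups = defaultdict(list)
    for pairs in mapped_outputs:
        for key, value in pairs:
            groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(chunks):
    # Each map_phase call depends only on its own chunk, so in a distributed
    # setting every chunk could be handled by a different machine.
    mapped = [map_phase(chunk) for chunk in chunks]
    grouped = shuffle(mapped)
    # Each key is reduced independently, so reducers can also run in parallel.
    return dict(reduce_phase(key, values) for key, values in grouped.items())

# Two chunks stand in for pieces of a dataset stored on two machines.
chunks = [
    ["the quick brown fox", "the lazy dog"],
    ["the dog barks"],
]
counts = word_count(chunks)
```

Because no map call needs another chunk's data, adding machines lets more chunks be processed at the same time, which is the kind of scalability described above.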

NoSQL databases give up at least one of the ACID properties to achieve better performance on parallel and/or distributed hardware.