Answer to Question #243657 in UNIX/Linux Programming for Avi

Question

Answer to Question #243657 in UNIX/Linux Programming for Avi

Answers>

Programming & Computer Science>

UNIX/Linux Programming

Question #243657

Why Hadoop and Why not RDBMS for managing Big Data? Elaborate with suitable
example. [5]

Expert's answer

The following are some of the reasons why Hadoop should be used in managing big data over RDBMS

In Terms of Data Volume

Volume means the quantity of data which could be comfortably stored and effectively processed. Relational databases surely work better when the load is low, probably gigabytes of data. This was the case for so long in information technology applications, but when the data size has grown to Terabytes or Petabytes, RDBMS isn’t competent to ensure the desired results.

On the other hand, considering Hadoop is the right approach when the need is to handle a bigger data size. Hadoop can be used to process a huge volume of data effectively compared to the traditional relational database management systems.

Database Architecture

Considering the database architecture, as we have seen above Hadoop works on the components as:

HDFS, which is the distributed file system of the Hadoop ecosystem.

MapReduce, which is a programming model that help process huge data sets.

Hadoop YARN, which helps in managing the computing resources in multiple clusters.

However, the traditional RDBMS will possess data based on the ACID properties, i.e., Atomicity, Consistency, Isolation, and Durability, which are used to maintain integrity and accuracy in data transactions. Such transactions would be of any sectors like banking systems, telecommunication, e-commerce, manufacturing, or education, etc.

Throughput

It is the total data volume process over a specific time period so that the output could be optimized. Relational database management systems are found to be a failure in terms of achieving a higher throughput if the data volume is high, whereas Apache Hadoop Framework does an appreciable job in this regard. This is one major reason why there is an increasing usage of Hadoop in the modern-day data applications than RDBMS.

Data Diversity

The diversity of data refers to various types of data processed. There are structures, unstructured, and semi-structured data available now. Hadoop possesses a significant ability to store and process data of all the above-mentioned types and prepare it for processing. When it comes to processing big volume unstructured data, Hadoop is now the best-known solution.

However, traditional relational databases could only be used to manage structured or semi-structured data, in a limited volume. RDBMS fails in managing unstructured data. However, it is very difficult to fit in data from various sources to any proper structure. So, we can see that Hadoop is the apt solution in handling data diversity than RDBMS.

Learn more about our help with Assignments: Programming & Computer Science

Comments

No comments. Be the first!

Answer to Question #243657 in UNIX/Linux Programming for Avi

Comments

Leave a comment

Related Questions