What is Big Data?

Big data

The term "big data" appears in the media and in job postings quite often. It literally refers to data of massive size: billions of records and petabytes of storage. More about this later. Big data is first about the WHAT, and then about the WHY. That is, analysis of a large dataset helps in finding interesting and valuable facts, and then, after studying those facts, in understanding the reasons behind them.

Big data is not a technology or a piece of software. Rather, it indicates that the entire set of available data will be analyzed, with results derived from all of it instead of a sample. Studies that used to rely on small subsets of data can, in today's world, be done using massive amounts of data, and at much greater speed. The benefits of such analysis include deeper insight into what is happening around us: how markets are changing, how human nature and the environment around humans are changing, how the climate is shifting, and so on.

A short example is grand slam tennis. After a long rally, the amount of data collected and compared with previous play is mesmerizing. Commentators routinely discuss how many shots each player hit, how rally lengths compare between the women's and men's games, which surface sees the longest rallies, the average speed of each shot, or the average number of drop shots and lobs in a rally. All of these enjoyable statistics are the result of data analysis.

Understanding relationships

Big data is all about understanding relationships between various things, and sometimes the findings can be bizarre. Think about analyzing the songs being played on the radio alongside recordings of sneezing babies. If huge amounts of data were collected and studied only to find that babies sneeze the most when certain notes are played, how interestingly bizarre would that be?

Can a smaller dataset, such as a few gigabytes, be categorized as big data? Yes, in my opinion, it can. Especially relevant in my field of study is the analysis of logs: a few gigabytes of data collected from dataloggers may reveal anomalies that may not be easy to detect with random sampling.
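As a rough sketch of why scanning the full log beats sampling, here is a minimal anomaly check in plain Python using z-scores. The function name, threshold, and sample readings are all illustrative assumptions, not part of any logging tool; real datalogger analysis would be more involved.

```python
import statistics

def find_anomalies(readings, threshold=2.5):
    """Flag readings more than `threshold` standard deviations from the mean.

    A modest threshold is used because small samples compress z-scores.
    """
    mean = statistics.mean(readings)
    stdev = statistics.pstdev(readings)
    if stdev == 0:
        return []  # all readings identical; nothing stands out
    return [(i, x) for i, x in enumerate(readings)
            if abs(x - mean) / stdev > threshold]

# Hypothetical sensor log: steady values with one spike that a
# random sample of a few readings could easily miss.
readings = [20.1, 20.3, 19.9, 20.0, 20.2, 95.0, 20.1, 19.8, 20.0, 20.2]
print(find_anomalies(readings))
```

Because the function scans every reading, the spike is always caught; a random sample of, say, three readings would miss it most of the time.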

Technologies such as MapReduce and Hadoop are among the many used to study large datasets. For an example, refer to the article on big data in the manufacturing industry.

Recommended book: Big Data: A Revolution That Will Transform How We Live, Work, and Think (ISBN 978-0-544-00269-2).