What is Big Data?
Big data is a large set of data used for analysis to discover what is happening; it is not about explaining why things are happening. Examples of big data in manufacturing range from product development to shipping. Instead of randomly sampling a small portion of a large quantity of information, big data takes all the information that is available. Relevant, and sometimes not-so-relevant, information gets added to allow visualization of what is happening. In short, big data is an approach that encourages the use of all available data in analysis to answer the question: what is happening?
Big data and N=all
It is often practical to pick 5 random parts out of 100 that are ready for shipping and analyze them for certain defects. Random sampling like that is rather common. Once data is digitized, random sampling may sometimes continue: a manager may choose to look at 20 of their 100 customers to see how this sample of 20 (N=20) is ordering products. Big data instead encourages analyzing all 100 customers and surfacing interesting information. N=all means take all the data.
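The difference between sampling and N=all can be sketched in a few lines of Python; the customer order counts below are made up purely for illustration:

```python
import random

random.seed(7)  # fixed seed so the illustration is reproducible

# Hypothetical order counts for 100 customers (assumed data).
orders = {f"customer_{i}": random.randint(0, 50) for i in range(100)}

# Traditional approach: look at a sample of N=20 customers.
sample = random.sample(sorted(orders), 20)
sample_avg = sum(orders[c] for c in sample) / len(sample)

# Big data approach: N=all, use every customer.
full_avg = sum(orders.values()) / len(orders)

print(f"sample average: {sample_avg:.1f}")
print(f"full average:   {full_avg:.1f}")
```

With all 100 customers in the calculation, the average is exact rather than an estimate that shifts from one sample to the next.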
Big data example in manufacturing business
In this example we take a manufacturing plant that makes steel boxes with locks and handles. That's all. Designers build models of different boxes, handles and locks with the help of engineers, who produce drawings for them. Bills of materials are built, parts are purchased, and manufacturing begins after orders start coming in. Even this simple manufacturing process generates a decent amount of data.
Big data takes all orders that have come in. The order system may also keep history. Based on that, one can determine how many orders come in daily, whether that rate has changed over time, when the last spike of orders was, and how many cancellations happened. One can determine which customers have ordered the most quantities and the most dollars, the average dollars per order shipped, the total dollars of all cancelled orders, and so on. It can also be determined how often an order changes, and whether the changes lead to the dollars going up on the order or to customers asking to reduce or cancel orders. This is all general analysis so far, on a specific data set.
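A minimal sketch of this kind of order analysis, using made-up order records (the field names and values here are assumptions for illustration, not a real order system's schema):

```python
from collections import Counter
from datetime import date

# Hypothetical order history (assumed fields and values).
orders = [
    {"customer": "A", "date": date(2023, 1, 3), "dollars": 500, "cancelled": False},
    {"customer": "B", "date": date(2023, 1, 3), "dollars": 250, "cancelled": True},
    {"customer": "A", "date": date(2023, 1, 4), "dollars": 900, "cancelled": False},
    {"customer": "C", "date": date(2023, 1, 4), "dollars": 120, "cancelled": False},
]

# Orders per day, to spot spikes.
per_day = Counter(o["date"] for o in orders)

# Total dollars per customer on shipped (non-cancelled) orders.
dollars = Counter()
for o in orders:
    if not o["cancelled"]:
        dollars[o["customer"]] += o["dollars"]

top_customer, top_total = dollars.most_common(1)[0]
cancelled_total = sum(o["dollars"] for o in orders if o["cancelled"])

print(top_customer, top_total, cancelled_total)  # A 1400 250
```

The same pattern extends to any of the questions above: each is a filter, a grouping, and an aggregate over the full order history.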
Going along the manufacturing line, it can be noticed that blue colored boxes are built in batches of 50. When orders come in, a manufacturing order is released. If 2,000 boxes have to be shipped, 40 manufacturing orders are created; each manufacturing order builds 50 boxes in a batch. The boxes are inspected and then shipped to customers. During this process, parts are needed. Based on the orders coming in and forecasts of how much the company will sell, parts are ordered. Once a manufacturing order is released, parts are staged in an area so that the area can build boxes.
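The batch arithmetic is simple division, rounded up to cover a partial final batch:

```python
import math

# Figures from the example: 50 boxes per manufacturing order.
batch_size = 50
demand = 2000

manufacturing_orders = math.ceil(demand / batch_size)
print(manufacturing_orders)  # 40
```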
Technicians building the boxes record how much time each step takes and associate it with the steps of the manufacturing orders. Based on all of this, throughput can be determined: how many boxes can be built in a month when there is no shortage of parts, which parts were short, how many minutes it takes to complete step 1 of the manufacturing process, which step takes the longest, how much idle time employees have while a long-running process is going on, etc.
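A sketch of how recorded step timings translate into a bottleneck analysis; the step names and minute values are invented for illustration:

```python
# Hypothetical step timings in minutes, as recorded by technicians
# across four manufacturing orders (assumed data).
step_minutes = {
    "cut":      [12, 11, 13, 12],
    "weld":     [30, 28, 31, 29],
    "paint":    [20, 22, 21, 20],
    "assemble": [15, 14, 16, 15],
}

# Average duration of each step.
avg = {step: sum(t) / len(t) for step, t in step_minutes.items()}

slowest = max(avg, key=avg.get)      # the step that takes the longest
minutes_per_box = sum(avg.values())  # serial time through all steps

print(f"bottleneck: {slowest}, avg minutes per box: {minutes_per_box:.2f}")
```

The longest-running step is the natural place to look for idle time elsewhere on the line, since other stations wait on it.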
Data from orders and manufacturing can be combined to investigate the time it takes to build the blue boxes going to customer X, who buys the most products in the year. It can be predicted that if customer X doubles their order, we would have to add 3 more people to the production line to speed up operations, or buy a new machine to keep up with the doubled demand. This can be done by evaluating operational efficiencies and machine capacities. One can even determine whether certain equipment is reaching replacement age.
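The staffing prediction is straightforward capacity arithmetic; all the figures below are invented to make the example concrete:

```python
import math

# Hypothetical capacity figures (assumed for illustration).
boxes_per_person_per_month = 40
current_staff = 6                  # currently handles 240 boxes/month
customer_x_monthly = 120           # customer X's share of monthly demand
other_demand = 120                 # everyone else

# Scenario: customer X doubles their order.
projected = other_demand + 2 * customer_x_monthly

staff_needed = math.ceil(projected / boxes_per_person_per_month)
extra_hires = staff_needed - current_staff
print(extra_hires)  # 3
```

The same projection, run against machine capacities instead of headcount, tells you whether new equipment is the cheaper lever.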
Supervisors can investigate the failure rate of products using quality control data. Quality control may have a dataset of its own; this can be added to the overall analysis to determine the amount of rework and the customer impact. Now that the data is much larger and the analysis considers all of it (N=all), we refer to this as big data. Purchasing managers can evaluate how quickly an important part can be procured and which vendor provides the highest on-time delivery. The operations team can evaluate the parts that fail the most and the vendors that supply those parts. Counterfeit parts can be caught quickly as well through analysis of big data.
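Vendor on-time delivery reduces to a rate per vendor over the full delivery history; the records below are hypothetical:

```python
from collections import defaultdict

# Hypothetical delivery log: (vendor, arrived_on_time) pairs (assumed data).
deliveries = [
    ("VendorA", True), ("VendorA", True), ("VendorA", False),
    ("VendorB", True), ("VendorB", True), ("VendorB", True),
]

# vendor -> [on_time_count, total_count]
stats = defaultdict(lambda: [0, 0])
for vendor, on_time in deliveries:
    stats[vendor][1] += 1
    if on_time:
        stats[vendor][0] += 1

best = max(stats, key=lambda v: stats[v][0] / stats[v][1])
print(best)  # VendorB
```

Swapping "arrived on time" for "part failed inspection" gives the failure-rate-by-vendor view with the same few lines.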
In case of a product recall, big data can help identify the vendors that sold the affected parts and the customers that received products containing those parts. Data generated from the recall can help an organization understand how long it takes to contact customers and how quickly each and every product can be located. Organizations can also figure out how much time it takes to recall 80% (or N%) of their products.
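The time to recall N% of products is a percentile over the recall log; the days-to-locate figures below are made up for illustration:

```python
# Hypothetical recall log: days taken to locate each shipped unit (assumed data).
days_to_locate = sorted([1, 2, 2, 3, 3, 4, 5, 7, 9, 30])

def days_to_recall(pct, located):
    """Days elapsed before pct% of units have been located (located is sorted)."""
    n = len(located)
    k = max(1, round(n * pct / 100))  # how many units make up pct%
    return located[k - 1]

print(days_to_recall(80, days_to_locate))   # 7
print(days_to_recall(100, days_to_locate))  # 30
```

Note how one straggler unit dominates the 100% figure, which is exactly why the 80% number is worth tracking separately.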
This is the power of big data: the power of analyzing an entire set of data to find out what is happening in a subject of interest.