This book, “Big Data,” serves as a guide to a world in which the vast amounts of information gathered by corporations and governments make it possible to build predictions and strategies from correlation alone, without first understanding causes. In the authors’ telling, correlation by itself can uncover striking trends and phenomena.
However, the evidence also suggests that big data is not inherently more reliable than traditional statistical methods, and can lead to overconfidence in inaccurate predictions.
If you are still undecided about whether to read the book, this summary will provide you with all the information you need to make an informed decision.
Without further ado, let’s dive into it.
Book Summary
Lesson 1: Big Data eliminates the need to rely on small samples to make inferences about larger populations.
Before the Internet and computers, gathering information was difficult. Data collection capabilities were limited, and we had to work with what we had. When conducting a telephone survey of voters before an election, for instance, we could only interview a few hundred people and assume their responses represented the entire population. This process is called sampling: selecting a subset of the data in the hope that it represents the total set.
However, the problem with sampling is that if we want to predict voting behavior for a specific subgroup of the population, we may not have interviewed enough people to draw any conclusions. For instance, if we want to predict the voting behavior of state employees, we may only have interviewed ten of them, which makes our conclusions unreliable. Even if we focus on a subgroup like civil servants under 30, if we only interview one person, we have no basis for predictions.
This is the built-in limitation of sampling: the closer we zoom in on the data, the fewer observations remain to support sound conclusions.
But with the advent of Big Data, we now have access to vast amounts of information. A Big Data version of the election survey could include tens of thousands of people, or even every voter in the city. We can therefore “zoom in” on specific subsets of the data almost without limit and still have enough observations to make accurate predictions.
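To make this concrete, here is a minimal Python sketch with entirely made-up numbers: a hypothetical electorate of 100,000 voters in which “young civil servants” are a 1% subgroup that favors one candidate 70% of the time. A traditional 500-person poll catches only a handful of them, so its subgroup estimate swings wildly from sample to sample, while the full data set pins the figure down.

```python
import random

random.seed(42)

# Hypothetical population: 100,000 voters, each tagged with a group and a vote.
# Assumption for the sketch: "young civil servants" are 1% of voters and
# favor candidate A 70% of the time; everyone else favors A 48% of the time.
population = []
for _ in range(100_000):
    group = "young_civil_servant" if random.random() < 0.01 else "other"
    p_a = 0.70 if group == "young_civil_servant" else 0.48
    population.append((group, random.random() < p_a))

def support_in_group(voters, group):
    """Estimated support for candidate A within a group, plus sample size."""
    votes = [vote for g, vote in voters if g == group]
    return (sum(votes) / len(votes), len(votes)) if votes else (None, 0)

# Traditional poll: a few hundred respondents.
poll = random.sample(population, 500)
print("poll estimate (share, n):", support_in_group(poll, "young_civil_servant"))

# "Big data" version: the entire population.
print("full-data estimate (share, n):", support_in_group(population, "young_civil_servant"))
```

Run it with different seeds: the poll’s subgroup estimate rests on roughly five people and jumps around, while the full-data estimate stays near the true 70%.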
Lesson 2: Sometimes larger, less organized records are better than fewer, more accurate records.
Have you ever wondered how language translation programs work? In the 1980s, IBM engineers pioneered an approach that reshaped machine translation. Instead of encoding grammar rules and dictionaries, they fed the computer pairs of texts that had already been translated and let it use statistical probability to pick the most likely word or phrase. The system handled the most common words and phrases well, but it was far less accurate with rarer ones.
The problem was that the training sample was small: only three million sentence pairs from certified translations of Canadian government records. Rare words and phrases simply did not appear often enough in that corpus to support reliable predictions, so despite the high quality of the input, the system’s accuracy hit a ceiling.
A decade later, however, Google made a leap by taking a different approach. Rather than relying on a curated corpus, it drew on the entire Internet, despite the dubious accuracy of much of that text. Its system searched billions of pages to find the best possible translation, and its output proved more accurate than any competing system’s, despite the questionable quality of the input.
The lesson is that inaccuracies in the data are a serious problem when the sample is tiny, but their impact shrinks as the data set grows: the sheer volume of a Big Data set swamps the significance of individual errors.
So, what can we learn from this? When working with small data sets, it is essential to keep the data clean and accurate. When working with Big Data, the focus shifts to analyzing the data for patterns and trends rather than trying to eliminate every inaccuracy.
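The point lends itself to a small simulation. The sketch below (all numbers invented) gathers translation evidence for a rare phrase two ways: from a tiny, carefully vetted corpus with no errors, and from a web-scale corpus in which 20% of the examples are wrong. The clean corpus barely ever sees the phrase, while the messy corpus sees it hundreds of times, and the correct translation wins by simple majority despite the noise.

```python
import random
from collections import Counter

random.seed(0)

def observed_translations(n_sentences, noise_rate):
    """Simulate translation evidence for a rare phrase.

    Assumptions for the sketch: the phrase occurs in 0.05% of sentences,
    its true translation is "bank_river", and each observed example is
    wrong with probability `noise_rate`.
    """
    wrong = ["bank_money", "shore", "slope"]
    observations = []
    for _ in range(n_sentences):
        if random.random() < 0.0005:          # the rare phrase occurs
            if random.random() < noise_rate:  # a noisy, incorrect example
                observations.append(random.choice(wrong))
            else:
                observations.append("bank_river")
    return observations

small_clean = observed_translations(3_000, 0.0)      # vetted corpus, no errors
huge_messy = observed_translations(1_000_000, 0.2)   # web-scale, 20% wrong

print("clean corpus evidence:", len(small_clean), Counter(small_clean))
print("messy corpus evidence:", len(huge_messy), Counter(huge_messy))
print("messy corpus majority vote:", Counter(huge_messy).most_common(1)[0][0])
```

The majority vote from the messy corpus recovers “bank_river” reliably; scale, not cleanliness, is doing the work.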
Lesson 3: Often it is enough to know that two things are connected, and that is frequently all Big Data can tell us.
Have you ever wondered if the color of a car affects its reliability? It may seem like an odd question, but in a data analysis contest in 2012, participants found that cars with orange paint were half as likely to have problems as other cars. This surprising correlation is an excellent example of how big data is changing the way we approach problem-solving.
In the past, we would have had to theorize and test hypotheses about why certain things were correlated. However, with the vast amounts of data available today, we can let the statistics speak for themselves. By analyzing all the data, we can uncover correlations we never expected.
For example, IBM and the University of Ontario Institute of Technology conducted a study to support the treatment of premature infants. By examining streams of vital-sign data from newborns, they found that a baby’s vital signs often became unusually stable just before a severe infection set in, well ahead of any outward symptoms. This runs against the previous assumption that only unstable vital signs signal a problem. With this knowledge, clinicians can now intervene when patients need it most.
Analysis built on correlations like these is already in use, and it is changing how we approach problem-solving. Instead of devising a theory and then testing it, we can analyze the data first and see what correlations emerge. We may never fully understand why certain things are correlated, but we can still act on the information.
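As a toy illustration, the Python sketch below generates synthetic car records and scans every pair of variables for correlation, with no hypothesis in advance. One association is deliberately planted (orange paint goes with fewer defects, echoing the contest result above); all names and thresholds are invented for the sketch.

```python
import random
from itertools import combinations

random.seed(7)

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Synthetic records for 10,000 used cars. Only one real association is
# planted: orange cars average one fewer defect. Everything else is noise.
n = 10_000
data = {
    "is_orange": [1 if random.random() < 0.05 else 0 for _ in range(n)],
    "mileage": [random.uniform(0, 200_000) for _ in range(n)],
    "age_years": [random.uniform(0, 15) for _ in range(n)],
}
data["defects"] = [random.gauss(2.0 - 1.0 * o, 0.8) for o in data["is_orange"]]

# Scan every pair of variables and surface the notable correlations.
for a, b in combinations(data, 2):
    r = pearson(data[a], data[b])
    if abs(r) > 0.1:
        print(f"{a} ~ {b}: r = {r:+.2f}")
```

Only the planted pair surfaces (r comes out around -0.26 for is_orange vs. defects). The scan says nothing about why orange cars fare better; as the book argues, knowing that they do is often enough.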
Lesson 4: Although most data is collected for a specific reason, its later uses can often be even more valuable.
In today’s world, data collection is happening all around us, often with a specific goal in mind, such as financial accounting or improving the customer experience. However, companies are quickly realizing that the data they collect can be used for much more than just their initial purpose. This is where the concept of “Big Data” comes in.
One example is the SWIFT interbank payment network, which handles billions of financial transactions globally. By analyzing this transaction data, SWIFT can produce remarkably accurate estimates of the state of the international economy. Similarly, historical search queries, which seem worthless once they have served their primary purpose of returning results, can be analyzed for valuable insights into market movements and consumer preferences.
Even something as simple as the location data captured by mobile operators for routing calls can be used for traffic monitoring or targeted advertising based on the user’s location.
The value of Big Data is being recognized by companies and individuals alike, and they are already developing services and tools to leverage the many potential applications for the information they collect.
Lesson 5: With the right attitude, anyone can discover untapped sources of value in the information they collect.
Having access to data is only valuable if you know what to do with it, and knowing how to evaluate data is useless without access to it. Yet some people have succeeded in the Big Data industry while starting with neither. What they possess is a “Big Data mindset”: an eye for when and where data sets can yield insights valuable to many people.
One example is Bradford Cross, who co-founded the travel website FlightCaster. Without proprietary data or sophisticated analysis tools of their own, Cross and his team predicted flight delays in the United States using publicly available weather records and airline performance data. Their predictions were accurate enough that even airline employees began using the site to check their own flights.
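For a flavor of how such a service might combine public data sets, here is a minimal, entirely hypothetical sketch: it tabulates historical (airline, weather) outcomes and turns them into a delay probability. FlightCaster’s real models were far more sophisticated; none of the records or names below come from the book.

```python
from collections import defaultdict

# Hypothetical historical records: (airline, departure weather, was delayed).
history = [
    ("UA", "storm", True), ("UA", "storm", True), ("UA", "storm", False),
    ("UA", "clear", False), ("UA", "clear", False),
    ("DL", "storm", True), ("DL", "storm", True),
    ("DL", "clear", False), ("DL", "clear", False),
]

# Count delays and totals per (airline, weather) condition.
counts = defaultdict(lambda: [0, 0])
for airline, weather, delayed in history:
    counts[(airline, weather)][0] += delayed
    counts[(airline, weather)][1] += 1

def delay_probability(airline, weather):
    """Historical delay frequency for the condition, or None if unseen."""
    delays, total = counts[(airline, weather)]
    return delays / total if total else None

print(delay_probability("UA", "storm"))  # 2 of 3 UA storm departures were late
print(delay_probability("DL", "clear"))  # 0.0
```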
Decide.com is another successful company that utilizes a similar approach. They store over twenty-five billion price checks of e-commerce websites and analyze the data to present consumers with the best possible prices and inform them of the ideal time to purchase a product.
The potential value of data is becoming increasingly recognized as the data-driven economy takes shape. The data gold rush presents a tremendous opportunity for individuals and companies with a Big Data perspective.
Big Data Review
Big Data by Viktor Mayer-Schönberger and Kenneth Cukier is an informative and engaging exploration of the vast amount of information that governments and corporations collect about us. The book explores how big data can be used to make predictions and decisions about everything from our buying habits to our health, and even criminal activity. The authors argue that big data can reveal trends and phenomena we would not otherwise recognize, even when we do not understand the causal process behind them.
The book is filled with fascinating examples, such as how Walmart used big data to boost sales of Pop-Tarts during storm season, and how Billy Beane, general manager of the Oakland A’s baseball team, used statistical analysis to improve the team’s performance. However, the authors’ reliance on interpretations of examples from previous popular science books, rather than original research and examples, may be a drawback for some readers.
While the authors spend much of the book hyping the potential and achievements of big data, they do caution about the dangers of over-reliance on this method of analysis. They discuss the implications for privacy in the future and the frightening concept of predictive policing, in which data is used to select individuals for extra scrutiny simply because an algorithm pointed to them as more likely to commit a crime.
Overall, Big Data is an interesting and thought-provoking read. The authors make a convincing case for the power and potential of big data while also acknowledging its limitations and potential pitfalls. It’s a worthwhile read for anyone interested in the impact of big data on our lives.
Viktor Mayer-Schönberger (born 1966) is a Professor of Internet Governance and Regulation at the Oxford Internet Institute, University of Oxford. He is also a faculty affiliate at Harvard’s Belfer Center. Mayer-Schönberger is the co-author of several books including “Framers” (with Kenneth Cukier and Francis de Vericourt), “Reinventing Capitalism” (with Thomas Ramge), the international bestseller “Big Data” (with Kenneth Cukier), and the award-winning “Delete”.
Kenneth Cukier is an award-winning journalist and bestselling author. He currently serves as the Deputy Executive Editor at The Economist and is the host of its weekly tech podcast. His book “Big Data,” co-authored with Viktor Mayer-Schönberger, was a New York Times bestseller and has been translated into over 20 languages.
From 2002 to 2004, Cukier was a research fellow at Harvard’s Kennedy School of Government. He also serves on the board of directors of Chatham House, a British foreign-policy think-tank, and is an associate fellow at Saïd Business School at the University of Oxford. In addition, he is a member of the Council on Foreign Relations.
Buy The Book: Big Data
If you want to buy the book Big Data, you can get it from the following links: