Since George Orwell created the sinister ‘Big Brother’ in his epic novel ‘1984’, the world has come to terms with nothing being sacrosanct, least of all privacy. Since the 1990s, when information technology was just beginning to be recognized as the next big wave that would change the course of the world, businesses have realized that the technologies of the web give them easier access to a larger audience of potential customers. From the earliest browsers, email and chat messaging services, forums and bulletin boards, online games, web portals and other web properties requiring registration, all our transactions, monetary and otherwise, have been recorded. Big data was waiting to take the world by storm.
Fast forward to 2003. In the words of Google’s Executive Chairman Eric Schmidt, “From the dawn of civilization until 2003, humankind generated five exabytes of data. Now we produce five exabytes every two days… and the pace is accelerating.” Indeed, the amount of data we produce today is mind-boggling. Even more worrying is the fact that the data generated automatically by machines such as CCTV cameras, sensors, vehicles, computers and mobile devices is overtaking the data generated manually by humans. Data volume is growing to such an extent that a complementary field, machine learning, is currently a ‘hot’ specialization, generating intense demand in the job market as businesses covet such specialists. The irony is that this might very soon render a very large number of people across the world redundant, because their less-than-creative, machine-programmable tasks will soon be performed efficiently and effectively by machines that have been fed all the parameters on which the tasks need to be performed and have perfected the right responses to all possible scenarios.
Big data analytics tools like Hadoop typically ‘crunch’ massive amounts of data, amounting to several terabytes, in astonishingly small timeframes such as a few hours or even minutes. They do this by breaking the data down into smaller sets, each of which is processed independently on a separate computer, and then combining the individual results to produce a final result covering the entire mass of data. This idea was Google’s guiding principle in creating its path-breaking search engine, and today it forms the basis of big data analysis.
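The split–process–combine pattern described above can be sketched in a few lines of Python. This is only an illustrative toy, not Hadoop's actual API: it counts words by mapping over chunks of text (which, in a real cluster, would live on separate machines) and then merging the partial counts into one result.

```python
from collections import Counter
from functools import reduce

def map_chunk(chunk: str) -> Counter:
    # 'Map' step: count the words within a single chunk of the data.
    return Counter(chunk.split())

def merge(a: Counter, b: Counter) -> Counter:
    # 'Reduce' step: combine two partial counts into one.
    a.update(b)
    return a

# In a real deployment each chunk would be processed on a different node;
# here they are simply elements of a list.
chunks = ["big data big ideas", "data beats opinion", "big plans"]
partials = map(map_chunk, chunks)           # independent per-chunk work
total = reduce(merge, partials, Counter())  # combine the partial results
```

The key property is that `map_chunk` needs no knowledge of any other chunk, which is what allows the work to be spread across many computers.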
What is the purpose of using the tools of big data? Primarily businesses, and increasingly governments, are realizing the need to analyze data that is ever-increasing in volume and complexity, in order to study individual and collective trends at a minute level. This helps them come up with policies, products and services that serve their own purposes while ostensibly serving their stakeholders better. The range of data encompasses phone records, commercial transactions, email, chat and browser logs, GPS coordinates, social media activities, and the output of devices that transmit data about our activities, such as smart watches, Google Glass, smart phones, tablets, etc.; the list goes on.
Businesses that have a large volume of customers, such as telecom service providers, the travel industry, retail, etc., use big data tools to analyze individual behaviour so as to take predictive actions. For example, a retail chain can analyze the log of a customer’s transactions to predict an event in her life, such as a birthday or a pregnancy, and target relevant promotions at her to coincide with the event. Thus, in real time, businesses can study customer behaviour and run promotions based on anticipated future activity. In effect, they can predict our lives.
An example of a non-commercial application of big data is the analysis of our music consumption by an audio website over time and its subsequently generating playlists containing music that matches our personal tastes.
From the point of view of machine learning, a great example of the application of big data analysis would be in the case of Google X’s ambitious project of developing self-driving cars that have to process a massive amount of information in real time using their various sensors so as to be safe for use on the roads.
There are four pillars on which big data is based: volume, velocity, variety and veracity. These refer, respectively, to the increasing size of data, the speed at which data is being generated, the increasing complexity of data due to its various forms, and the amount of real-world value that the data represents.
While critically evaluating the idea of big data, one cannot deny some of its benefits, which include creating customized products and services across various sectors, providing enterprise-wide insights that can help companies develop policies that redefine them, reducing unnecessary costs, identifying new opportunities and revenue streams, providing better security, analyzing potential risk better, making our infrastructure smarter, making healthcare more efficient, understanding customer preferences better, and many more.
Though organizations like Google, Facebook and others face numerous lawsuits over breaches of privacy, we are coming to terms with the glum fact that in today’s digital age, we can never hope to be completely anonymous, and that a record of our activities will exist, and be updated, in some form or another across the span of our lives.
The recent revelations by ex-NSA contractor Edward Snowden show how rampant spying is in this digital age, and that no one is exempt from it. Spy agencies have gone so far as to plant tracking devices in USB cords, so that devices can transmit data even when they are not online. This is indeed scary, and makes us realize that we can be manipulated in innumerable ways, and that a variety of institutions know more about us as individuals than we dare acknowledge.
George Orwell’s disturbing dystopia is indeed manifesting itself subtly but steadily.