ET: What is 'Big Data' and what is its importance to business and society today?
KS: As the popular definition goes, Big Data is characterized by the 3Vs: high-volume, high-velocity and high-variety information assets that can provide deep insights and thus enable businesses to make effective decisions. It is really a phenomenon of data deluge, brought about by advances in technology and a parallel explosion of newer channels of data collection via a nexus of social, mobile and cloud computing. It reflects the sheer pervasiveness of digitization in our social fabric, and it is only set to grow with the internet of things. The real power of Big Data is in its ability to fuse varied data from multiple sources and derive meaningful and powerful insights from it. It has the enormous ability to deliver the right product, service or message to consumers even before they ask for it!
Businesses today are no longer grappling with the term Big Data but have graduated to the next step, asking: how can I leverage it to derive a good ROI? Most businesses are collecting data about their customers and their preferences from various sources, and a lot of it is generally not required for their day-to-day operations. Additionally, businesses typically have other internal sources of data which are untapped or underused, the so-called “dark data”. With the advent of technologies that help process all kinds of data, businesses can run analytics on these untapped datasets and derive valuable insights that will help them devise innovative strategies to increase their revenue. Businesses have to realize that this dark data has economic potential and that it is in their interest to mine it for competitive advantage. There are many use cases across industries and geographies where Big Data has provided businesses with unprecedented value. In this new era of Big Data, information is a big asset for any organization.
The importance of Big Data to society today is an interesting and important question. Unprecedented computational power, coupled with the falling cost of storage, has enabled unexpected discoveries, innovations and improvements in our quality of life. The data-centric world brings numerous benefits to society, ranging from competitive retail pricing and early warning signs of infection in both adults and neonates to improved energy efficiency, higher agricultural productivity, and many more. As with anything else, there are two sides to the coin: one is the opportunity to benefit from it, and the other is the risk of loss of privacy and serious damage. The technology is so advanced that it can do near-perfect personalization. The sophisticated algorithms that run on data avalanches could potentially cause unintended collateral damage to an individual. While these risks may seem to deter the use of Big Data, privacy laws will be put in place that act as a protective shield for society and at the same time allow an unfettered flow of data that can be used to benefit society at large.
ET: With businesses trying to tackle the data explosion era that we live in today, what tools and technologies are available for managing Big Data?
KS: The tools and technology landscape has seen significant advancements since the release of the 100% open source Apache Hadoop, a very popular framework for distributed parallel processing. It has gained tremendous adoption in the industry, driven primarily by the enterprise-grade Hadoop distributions from Hortonworks and Cloudera. Hadoop, at its core, has the Hadoop Distributed File System (HDFS), a file system to store any kind of data, and a new paradigm called MapReduce for processing that data. This is supplemented by a number of tools that form the Hadoop ecosystem. Pig is one such tool, a simple data flow language that abstracts the complexity of MapReduce. Hive is another tool that provides a SQL-like interface for basic analytics on data in HDFS. Just as Hadoop is a fantastic platform for batch processing, Apache Storm is a platform that facilitates distributed real-time computation. It is the de facto platform for real-time use cases. Another complementary toolkit in this ecosystem is Apache Kafka, which provides a high-throughput distributed messaging platform. Moving on to visualization, Hadoop is not inherently designed to provide very high read throughput. This is where a new category of tools called NoSQL technologies comes in. Apache Cassandra, HBase and MongoDB are some of the more popular ones, and more often than not they act as the bridge between Hadoop and the visualization layer. On the visualization front, there is a powerful open source JavaScript library called D3.js (and NVD3.js) that provides fantastic charting capabilities. In addition, there are notable third-party vendor tools like Datameer and Platfora for data visualization with native connectivity to the Hadoop file system.
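To make the MapReduce paradigm mentioned above a little more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop's API: the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are illustrative assumptions, and a real cluster would run the map and reduce steps in parallel over data blocks stored in HDFS.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input record."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by key (done by the framework in Hadoop)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse all values for one key into a single result."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample input, purely for illustration.
    print(word_count(["big data is big", "data is an asset"]))
```

Because each map call looks at only one record and each reduce call at only one key, the work can be split across many machines, which is the property that tools such as Pig and Hive build on.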
Apart from these, there is Apache Spark, which provides very fast in-memory processing of data in Hadoop and is especially suitable for running certain types of algorithms (such as logistic regression) that are not well suited to processing on Hadoop.
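One way to see why an algorithm like logistic regression benefits from in-memory processing is that it is iterative: every training step scans the same dataset again. The sketch below is plain Python, not Spark's API, and the toy data, function names and parameters (`iterations`, `lr`) are illustrative assumptions. It simply counts how many full passes over the data a basic gradient-descent fit makes.

```python
import math
from typing import List, Tuple


def sigmoid(z: float) -> float:
    """Logistic function mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(
    data: List[Tuple[float, int]], iterations: int = 200, lr: float = 0.5
) -> Tuple[float, float, int]:
    """Fit y ~ sigmoid(w * x + b) by batch gradient descent.

    Returns (w, b, passes), where passes counts full scans of the data.
    """
    w, b, passes = 0.0, 0.0, 0
    n = len(data)
    for _ in range(iterations):
        grad_w = grad_b = 0.0
        for x, y in data:  # one full pass over the same dataset
            error = sigmoid(w * x + b) - y
            grad_w += error * x
            grad_b += error
        w -= lr * grad_w / n
        b -= lr * grad_b / n
        passes += 1
    return w, b, passes


def predict(w: float, b: float, x: float) -> int:
    """Classify x as 1 if the fitted probability is at least 0.5."""
    return 1 if sigmoid(w * x + b) >= 0.5 else 0


if __name__ == "__main__":
    # Hypothetical one-feature dataset: negative x is class 0, positive x is class 1.
    toy = [(-2.0, 0), (-1.0, 0), (1.0, 1), (2.0, 1)]
    w, b, passes = train_logistic(toy)
    print(passes, predict(w, b, -1.5), predict(w, b, 1.5))
```

If each of those passes had to reread the data from disk, as a chain of batch jobs would, the scans would dominate the running time. Keeping the dataset in memory across iterations avoids that cost.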
In addition to all of these, almost all traditional BI tool vendors have built connectors to Hadoop.
In summary, as is evident, tools are available in abundance to process Big Data. Once these tool sets mature and become a part of the organization’s enterprise landscape, Big Data will cease to be so and will be considered normal data.
The tremendous advancement in technology has made “finding a needle in a haystack” practical!
ET: What are the challenges for Big Data in the Indian context? Is there a different rate of progress for Big Data in India versus other developed countries?
KS: In the Indian context, we have to look at Indian IT services organizations and other industry organizations through separate lenses. The reason is that Indian IT companies naturally jumped on this bandwagon as soon as there was a whiff of it in the air. They have been aggressively trying to build centres of excellence around this emerging technology and to build competency, both to garner their share of the pie and to deliver value to their existing clients, primarily in the western world. In this context, the challenge has been the same as that faced by any other IT organization: a generally slow adoption of this technology worldwide. It took time to sift through the hype and get a handle on the real substance that this technology offers.
When it comes to non-IT organizations, i.e., the corporates in India, adoption of this technology is still slow. The Economist Intelligence Unit conducted a survey in 2013 and published a report, “The Hype and Hope: The road to Big Data adoption in Asia-Pacific”, which gives a good insight into the subject. Although many believe in the benefits of leveraging Big Data, the lack of progress is attributed to internal issues that inhibit adoption, restricted access to data, lack of clear communication about Big Data strategy and, in general, more of a wait-and-watch philosophy. That said, any industry that is data-centric, i.e., one that naturally collects lots of data and analyses it, will gravitate towards this technology; telecom and retail are examples.
It is quite clear that the rate of progress is different and that we lag quite significantly, especially when compared to the US. It is also important to note that, from a pure technology standpoint, we are building the competency at a fair clip and can be fairly competitive. As entrepreneurship blossoms and customer centricity increases, more organizations will take the leap to reap benefits from Big Data. For example, India’s largest carmaker, Maruti Suzuki, is an early Big Data adopter and has been able to grow 2.42% amid a slowdown in the automobile industry. ICICI Prudential Life, India’s largest private life insurance firm, is also testing the waters with Big Data.
ET: What is your advice to those preparing for the even bigger data explosion that is yet to come?
KS: It is necessary to be prepared before the avalanche arrives. The internet of things is bound to increase data manifold. Organizations have to realize that this information is an asset and that they therefore need a proper infrastructure strategy and organization structure in place for information management. Organizations have to assess the competencies available internally and build more as necessary to respond to potential opportunities. Organizations should keenly follow the path trodden by other industries and look out for cases that may have parallels in their own business. Big Data initiatives are not just for IT; business leaders need to be intimately involved, and therefore communicating the organization’s Big Data strategy to all stakeholders is extremely important. Organizations will have to develop a data culture wherein insights from facts are valued, while at the same time recognizing the human frailty in discovering such insights. Finally, while it is necessary to be opportunistic, it is also important to ensure that social etiquette is not breached, to avoid any kind of embarrassment to individuals or disrepute to the organization.
ET: Sears India has come a long way from starting operations in 2009 to what it is today. Could you please share some insights about your firm and how your company operates in the Big Data arena?
KS: Sears India is a global in-house centre for the parent company, Sears Holdings Corporation, headquartered in Hoffman Estates, Illinois, USA.
Sears India started on its Big Data journey back in 2011. Since then, we have made pioneering efforts in this technology, ranging from legacy modernization and replacing traditional ETL to pricing analytics and real-time processing and reporting. We started by modernizing the executive information system, moving it from a legacy platform to a Big Data platform. A few other efforts were started in parallel, and one that blossomed into a sizeable program was our legacy modernization effort. This program was primarily run from Sears India, wherein we migrated our core business applications to the newer Hadoop platform. This program helped the parent organization save millions of dollars year on year and at the same time helped build a very strong Big Data competency within the organization. We have proven a very strong use case for deriving operational efficiency by using Big Data technologies. One Gartner study says that 51% of organizations are leveraging Hadoop for process efficiency and cost reduction. During the last couple of years, Sears India has successfully implemented multiple projects on the Hadoop platform within the core systems, supply chain, point-of-sale, pricing, marketing and online business units. Big Data is an integral part of our technology landscape. Sears India has proven to be more than an able partner to our parent organization and has certainly stolen a march on other organizations in India in this emerging technology.