Companies of every size understand, or are starting to understand, the potential value locked in the data they collect, store and process every day. Machine learning (ML) is a broad, and admittedly simplistic, label for the algorithms that data scientists are developing to help companies draw new insights from big data.
Raw data is almost useless. Like oil sitting deep underground, it can’t do much for you where it is.
Oil needs to be extracted, processed and refined before it’s of any use. Data is the same, and the data-oil analogy has been going around for a while. Like oil, data is dirty too, and ML is one of the most useful ways to clean it and turn it into something more valuable that doesn't need gloves to handle.
According to IBM, unclean data costs $3.1 trillion a year, counting both the cleaning itself and the opportunities lost. Everything from missing or incorrect data fields to duplicates to formatting and structural errors makes data hard to use and make sense of. Too often, when data is passed around internally, one team is cleaning up the mess of another, or simply copying what they've been sent to create a new, slightly different version.
All of these changes and errors pile up. So, when a company gets excited about investing in a big data project, the results take longer to materialize than many would expect. Extracting, processing and refining is what takes so much time.
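To make the problems above concrete (missing fields, duplicates, inconsistent formatting), here is a minimal sketch of a cleaning pass in Python. Everything in it is an assumption made for illustration: the records, the field names and the choice of email as the duplicate key are hypothetical. It is plain rule-based cleaning rather than ML, which is the kind of handcrafted work that takes up so much time.

```python
# Hypothetical customer records showing the problems described above:
# inconsistent formatting, a duplicate, and a missing field.
RAW_RECORDS = [
    {"name": "Ada Lovelace", "email": "ADA@Example.com ", "country": "UK"},
    {"name": "ada lovelace", "email": "ada@example.com", "country": "uk"},
    {"name": "Grace Hopper", "email": "", "country": "US"},
    {"name": "  Alan Turing", "email": "alan@example.com", "country": "UK"},
]


def normalize(record):
    """Return a copy with whitespace trimmed and casing made consistent."""
    return {
        "name": " ".join(record["name"].split()).title(),
        "email": record["email"].strip().lower(),
        "country": record["country"].strip().upper(),
    }


def clean(records):
    """Split records into usable rows and rows that need manual review."""
    cleaned, rejected, seen_emails = [], [], set()
    for record in map(normalize, records):
        if not record["email"]:
            rejected.append(record)  # missing field: cannot be used as-is
        elif record["email"] in seen_emails:
            continue  # duplicate of a row that was already kept
        else:
            seen_emails.add(record["email"])
            cleaned.append(record)
    return cleaned, rejected


cleaned, rejected = clean(RAW_RECORDS)
print(len(cleaned), "usable rows;", len(rejected), "need review")
```

Even in this toy case, every rule encodes a decision someone had to make: which field identifies a duplicate, and what to do with a row that is missing one. Multiply that across real datasets and the time adds up.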
In a New York Times article, data scientists explained that far too much handcrafted work, which they call ‘data wrangling,’ ‘data munging’ and ‘data janitor work,’ is still required. Monica Rogati, Vice President for data science at Jawbone, said that this data janitor work is a huge, and surprisingly large, part of the job: “At times, it feels like everything we do.”
This is one of the main reasons big data projects take longer than expected. Results aren't achieved as quickly, and more investment is often needed than companies initially expect. It’s also why we aren't seeing any kind of “Big Data revolution.” As an FT journalist points out: “the largest changes happen at incremental level, where they creep up on an industry before turning into revolutionary value creation/destruction cycles.”
Accepting that making progress with big data, and unlocking new value and insights from it, takes time, is this something businesses should invest in?
It depends on the company. It depends on the goals, what you want to achieve (and the ROI from that, based on the amount invested) and the project outcomes. In some cases, this is an investment in process, and therefore in efficiency (automating previously manual processes).
In others, companies could be seeking to unlock new business opportunities or create new products and services. ML is also increasingly used in the investment sector. Investors are turning to ML to overcome behavioral biases in factor-based investing. Société Générale is one of the many banks experimenting with ML to implement new investment strategies and approaches.
In another case, a UK-based startup is using ML algorithms to help enterprise organizations increase profits with real-time, consumer-driven pricing insights. Before these ML algorithms were developed, it would have taken decades to run the calculations for that kind of insight. Now it can take a matter of days or weeks, and companies are seeing a 19% net margin increase in under 9 weeks.
Other examples of ML in action include the Amazon, Netflix and Pinterest recommendation engines. Self-driving cars and trucks, such as those Tesla and Google are developing, are other examples, as are most online sentiment analysis and monitoring tools, and financial sector fraud detection.
All of these are services that wouldn't exist without the ML algorithms that make them work effectively at scale. A McKinsey Global Institute study predicts that “by 2030, 70 percent of companies” will have adopted ML or an AI-driven technology solution, whether to improve and automate manual processes, replace legacy software, or create new products and services.
Either way, despite the gradual nature of this revolution, change is coming. Slowly but surely, as data is cleaned and processed, the ability of companies to use ML will increase. As startups develop new ML-based solutions, and the cost of implementing them goes down (along with the cost of recruiting data scientists, as the number of professionals with those qualifications increases), businesses will adopt them more readily. The positive impact of ML on businesses of every size will increase and accelerate, generating the kind of economic benefit many have been predicting.