How Machines Learn: The Top Four Approaches to ML in Business
Friday March 23, 2018. 01:00 PM , from The Apple Blog
Machine learning sits at the forefront of innovation across a growing number of industries in today’s business world. Still, it’s a mistake to think of machine learning as one monolithic business solution — there are many forms of machine learning and each is capable of solving different sets of problems. The most popular forms of ML used in business today are supervised, unsupervised, semi-supervised, and reinforcement learning. At Vidora, we’ve used these techniques to help Fortune 500 partners solve some of their most pressing problems in innovative ways. This article draws from our experiences to demystify these four common approaches to ML, introducing practical applications of each technique so that anyone in your organization can recognize how machine learning can enhance your business.
Machine Learning at a Glance
Machine learning is an approach to artificial intelligence that borrows principles from computer science and statistics to model relationships in data. Unlike AI systems that distill human knowledge into explicit rules (e.g. expert systems), ML instructs an algorithm to learn for itself by analyzing data. In general, the more relevant data it processes, the more accurate the algorithm’s predictions become.
Machine learning is not a new concept. Its theoretical foundation was laid in the 1950s when Alan Turing conceptualized a “learning machine”. That same decade, Frank Rosenblatt invented the “perceptron” to roughly simulate the learning process of the brain. More algorithms followed, but machine learning remained largely confined to academia until recently. With the explosion in data availability and computational power, it is finally possible for businesses to deploy machine learning at scale. Organizations have had success with each type of learning, but making the right choice for your business problem requires an understanding of which conditions are best suited for each approach.
Supervised Learning
If you know which metric you’d like to predict and have examples labeled with that metric, supervised learning is the best approach. A supervised algorithm is shown the “right answer” for a set of sample data and finds a function which approximates the relationship between the inputs and outputs. This functional mapping takes the general form y = f(x) — specify your target output y, provide your inputs x, and the ML algorithm will learn the optimal f() by finding patterns in the data.
y = f(x) (used to generate predictions)
Supervised learning outputs typically have one of two forms. Regression outputs are real-valued numbers that exist in a continuous space. For instance, many of Vidora’s eCommerce customers want to forecast how much money each customer is likely to spend, so that high-value customers may be targeted with personalized promotional offers. A simple linear regression structures this problem through the familiar formula y = mx + b, where y is predicted expenditure and x is some attribute of each customer, say, number of site visits. During training, we supply labeled input-output pairs (i.e. customers for which transaction history is already known) and the algorithm finds the parameters m and b that make this relationship as accurate as possible. In reality, Vidora’s regression model is likely to take hundreds of customer attributes as input, each with its own parameter, but the algorithm’s mechanism of action remains the same.
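As a minimal sketch of the single-attribute case described above, the following Python fits m and b by ordinary least squares. The function name fit_line and the visit and spend figures are made up for illustration. They are not Vidora data, and a production model would use many more attributes.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = m*x + b with one input attribute."""
    x_bar, y_bar = mean(xs), mean(ys)
    # Spread of x around its mean, and how x and y vary together.
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    m = sxy / sxx            # slope that minimizes squared error
    b = y_bar - m * x_bar    # intercept: the line passes through the means
    return m, b


# Hypothetical labeled examples: site visits (x) and known spend (y).
site_visits = [1, 2, 3, 4, 5]
spend = [25.0, 45.0, 65.0, 85.0, 105.0]

m, b = fit_line(site_visits, spend)

# Predict spend for a new customer with 6 site visits.
predicted_spend = m * 6 + b
print(m, b, predicted_spend)
```

Because this toy data lies exactly on a line, the fit recovers m = 20 and b = 5 and predicts a spend of 125 for the new customer. Real transaction data is noisy, so the fitted line would only approximate it.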
Classification outputs, on the other hand, fall into discrete categories. For example, Vidora’s subscription customers often wish to identify the best communication channel to reach and retain each user: email or push notification. A linear classification algorithm distinguishes between the two by plotting attributes of each user and finding a line which separates the data into two groups based on their labels. Users known to be responsive to email fall on one side of the line, and those responsive to push fall on the other.
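One simple way to find such a separating line is the perceptron rule mentioned earlier in this article. The sketch below is an illustration under assumptions, not necessarily the algorithm Vidora uses: the two attributes (weekly email opens and weekly app sessions), the function names, and every data point are hypothetical.

```python
def train_perceptron(points, labels, epochs=100, lr=1.0):
    """Learn weights w and a bias so that sign(w.x + bias) matches the labels.

    labels are +1 (responds to email) or -1 (responds to push).
    """
    w = [0.0, 0.0]
    bias = 0.0
    for _ in range(epochs):
        mistakes = 0
        for (x1, x2), y in zip(points, labels):
            score = w[0] * x1 + w[1] * x2 + bias
            if y * score <= 0:           # point is on the wrong side of the line
                w[0] += lr * y * x1      # nudge the line toward the correct side
                w[1] += lr * y * x2
                bias += lr * y
                mistakes += 1
        if mistakes == 0:                # every training point is separated
            break
    return w, bias


def predict_channel(w, bias, point):
    """Return the channel for the side of the line the point falls on."""
    score = w[0] * point[0] + w[1] * point[1] + bias
    return "email" if score > 0 else "push"


# Hypothetical users: (weekly email opens, weekly app sessions).
users = [(5, 1), (6, 2), (7, 1), (1, 5), (2, 6), (1, 7)]
channel = [+1, +1, +1, -1, -1, -1]

w, bias = train_perceptron(users, channel)
print(predict_channel(w, bias, (8, 2)), predict_channel(w, bias, (2, 8)))
```

The perceptron only settles on a line when the two groups can actually be separated by one. On overlapping data, a method such as logistic regression or a support vector machine is usually a better fit.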
Popular supervised learning algorithms:
Convolutional deep neural networks
Support vector machines
Unsupervised Learning
Unsupervised learning is used when training data has no specific label for the algorithm to predict. Without “right answers” to train on, the job of an unsupervised algorithm becomes finding structure in the data, for example by clustering similar records, in order to uncover new rules and patterns. Finding these inherent structures can yield important and practical insights, from detecting data anomalies that mark credit card fraud, to revealing what your best customers have in common.
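To make the idea of grouping unlabeled data concrete, here is a small sketch of k-means clustering, one common clustering technique (chosen here for illustration; the article does not name it). The customer attributes, data points, and starting centroids are all hypothetical.

```python
from math import dist


def k_means(points, centroids, iterations=20):
    """Alternate between assigning points to the nearest centroid and
    moving each centroid to the mean of its assigned points."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)),
                          key=lambda i: dist(p, centroids[i]))
            clusters[nearest].append(p)
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                # Mean of each coordinate across the cluster's points.
                new_centroids.append(
                    tuple(sum(c) / len(cluster) for c in zip(*cluster)))
            else:
                new_centroids.append(old)   # keep an empty cluster's centroid
        if new_centroids == centroids:      # assignments have stabilized
            break
        centroids = new_centroids
    return centroids, clusters


# Hypothetical unlabeled customers: (orders per month, average basket size).
customers = [(1, 10), (2, 12), (1, 11), (9, 50), (10, 52), (11, 51)]

# Assumption: start from two of the data points as initial centroids.
centroids, clusters = k_means(customers, [customers[0], customers[3]])
print(centroids)
print(clusters)
```

No labels are supplied anywhere. The algorithm separates occasional small-basket shoppers from frequent large-basket shoppers purely from the shape of the data, and it is then up to a person to interpret what each group means.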
Popular unsupervised learning algorithms:
Principal component analysis
Non-negative matrix factorization
Hidden Markov model
Semi-Supervised Learning
At Vidora, we’ve seen that collecting labeled data at scale is a challenge for many business organizations, but unlabeled data is relatively abundant. Semi-supervised learning makes use of this plentiful unlabeled data to gain a better understanding of the population structure and distribution. For instance, a bank which offers home loans may wish to identify which of its customers own a house, but may have limited access to this information. Under the semi-supervised approach, an algorithm would first use information obtained from labeled data to predict homeownership for unlabeled data. Next, both the labeled data and the data with predicted labels are passed through a supervised framework to learn a homeowner identification model. Although the estimated labels are never verified, they may improve performance of the supervised model by providing a larger set of potential homeowners from which the algorithm can learn.
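The two-step procedure described above is often called self-training or pseudo-labeling. The sketch below illustrates it with a deliberately simple nearest-centroid classifier, which is an assumption made for brevity rather than the model a bank would actually use. The customer attributes, labels, and function names are hypothetical.

```python
from math import dist


def fit_centroids(points, labels):
    """Supervised step: compute the mean point of each labeled class."""
    groups = {}
    for p, y in zip(points, labels):
        groups.setdefault(y, []).append(p)
    return {y: tuple(sum(c) / len(ps) for c in zip(*ps))
            for y, ps in groups.items()}


def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    return min(centroids, key=lambda y: dist(point, centroids[y]))


# Hypothetical customer attributes, e.g. (years at address, housing cost index).
labeled_points = [(2.0, 1.0), (9.0, 8.0)]
labeled_status = ["renter", "owner"]
unlabeled_points = [(1.0, 2.0), (3.0, 2.0), (8.0, 9.0), (10.0, 9.0)]

# Step 1: train on the small labeled set and estimate the missing labels.
initial_model = fit_centroids(labeled_points, labeled_status)
pseudo_labels = [predict(initial_model, p) for p in unlabeled_points]

# Step 2: retrain on the labeled data plus the pseudo-labeled data.
final_model = fit_centroids(labeled_points + unlabeled_points,
                            labeled_status + pseudo_labels)

print(pseudo_labels)
print(predict(final_model, (4.0, 3.0)))
```

The final model is built from six customers instead of two, so its class centroids reflect the wider population. The risk is that wrong pseudo-labels get reinforced, which is why practical systems often keep only the predictions the first model is most confident about.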
Popular semi-supervised learning algorithms:
Reinforcement Learning
Reinforcement learning is used in situations where the computer is an agent interacting with its environment in pursuit of a goal. Here, feedback is the key ingredient. Rather than being shown a “right answer”, the algorithm is provided a reward signal against which it evaluates and adjusts its methods. With experience, the algorithm learns which sequence of actions gives it the best chance of maximizing its reward and achieving its goal.
Reinforcement learning typically requires huge amounts of data, but doesn’t force your business to be highly specific about its goals. Some autonomous vehicles learn to drive through reinforcement. These cars are instructed to get from point A to point B under only two broad conditions: obey the rules of the road, and don’t crash. The rest is learned through trial and error. Google DeepMind’s famed AlphaGo program also used reinforcement learning to master the ancient Chinese board game Go. After an initial phase of training on records of human expert games, AlphaGo played against itself and learned which moves tended to maximize its chance of winning. Roughly two years after the project began, AlphaGo famously defeated Go world champion Lee Sedol in 2016.
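The reward-driven trial and error described above can be sketched with tabular Q-learning, a standard textbook reinforcement learning method (chosen here for illustration; it is not the method behind AlphaGo or any particular self-driving system). The environment is a hypothetical five-position corridor in which the agent earns a reward only on reaching the last position, and all parameter values are assumptions.

```python
import random

N_STATES = 5          # positions 0..4 on a line; position 4 is the goal
ACTIONS = (-1, +1)    # index 0 = step left, index 1 = step right


def step(state, action):
    """Move within the corridor; reward 1.0 only on reaching the goal."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    reward = 1.0 if next_state == N_STATES - 1 else 0.0
    return next_state, reward


def train(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    rng = random.Random(seed)
    # q[state][a] estimates the long-run reward of taking action a in state.
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state = 0
        while state != N_STATES - 1:
            if rng.random() < epsilon or q[state][0] == q[state][1]:
                a = rng.randrange(2)                       # explore or break a tie
            else:
                a = 0 if q[state][0] > q[state][1] else 1  # exploit best estimate
            next_state, reward = step(state, ACTIONS[a])
            # Feedback: move the estimate toward reward plus discounted future value.
            target = reward + gamma * max(q[next_state])
            q[state][a] += alpha * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy policy for every non-goal position.
policy = ["left" if left > right else "right" for left, right in q_table[:-1]]
print(policy)
```

The agent is never told that moving right is correct. It only receives a reward at the goal, and the update rule propagates that signal backward until stepping right looks best from every position.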
Popular reinforcement learning algorithms:
Monte Carlo tree search
ML and Your Business
Supervised, unsupervised, semi-supervised, and reinforcement learning have each shown meaningful success in the business world. As the practical scope of machine learning broadens, fluency in its key concepts becomes an increasingly important business skill even for those with no data science experience. Knowing which sorts of problems each ML approach is best equipped to solve helps business experts identify where the technology may make its greatest contributions to key business outcomes.
Michael Firn is a Product Manager at Vidora, where he works closely with both Vidora’s engineering team and Vidora’s Fortune 500 partners such as News Corp, Walmart and Time to help develop and implement machine learning solutions to their business problems.