What is machine learning?

By Pierre Baudin

Data Manager

Over the past decade, a growing number of companies have made artificial intelligence techniques part of their digital transformation strategy, changing the way they design their products, manage their businesses, and define their processes.

The collection, analysis and use of data are now seen as essential engines for the growth of a business.

The many recent achievements of artificial intelligence, whether in cancer detection, chatbots, recommendation algorithms or fraud detection, are for the most part linked to advances in the field of machine learning.

To learn more about this subject, I invite you to discover in this article what machine learning is, how it works and why it is used.

Definition

Machine learning is a branch of artificial intelligence (AI). It is a set of methods that gives systems the ability to make and improve predictions or behaviors based on data, without being explicitly programmed.

Machine learning is a paradigm shift from "normal programming", where all instructions must be explicitly given to the computer, to "indirect programming", where the instructions are derived from data.
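To make this contrast concrete, here is a minimal Python sketch. The transaction amounts, the labels and the function names are invented for illustration. The only point is that the first function has its rule written by hand, while the second derives its rule from labeled examples.

```python
# "Normal programming": the developer writes the rule by hand.
def is_suspicious_rule(amount):
    return amount > 1000.0  # threshold chosen by a person


# "Indirect programming": the threshold is derived from labeled examples.
def learn_threshold(amounts, labels):
    """Return the cut-off that misclassifies the fewest training examples."""
    candidates = sorted(set(amounts))
    best_threshold, best_errors = candidates[0], len(amounts) + 1
    for t in candidates:
        # Predict "suspicious" when the amount is at or above the candidate.
        errors = sum((a >= t) != y for a, y in zip(amounts, labels))
        if errors < best_errors:
            best_threshold, best_errors = t, errors
    return best_threshold


# Hypothetical labeled data: amount, and whether it was actually suspicious.
amounts = [20.0, 35.0, 50.0, 400.0, 900.0, 1500.0]
labels = [False, False, False, True, True, True]

threshold = learn_threshold(amounts, labels)


def is_suspicious_learned(amount):
    return amount >= threshold


print(threshold)                     # cut-off found from the data
print(is_suspicious_rule(500.0))     # decision of the hand-written rule
print(is_suspicious_learned(500.0))  # decision of the learned rule
```

With different example data, the learned threshold changes without anyone editing the code, which is the essence of the shift described above.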

The performance of machine learning models is closely tied to the data, in both quantity and quality. In general, the more good-quality data is available, the more the algorithm can learn and refine its model to deliver more precise results.

Once trained, a machine learning model can be used, for example, to find patterns in new data or to predict future values.

🔎A machine learning algorithm can also keep learning as its dataset grows, and thus continue to improve its performance.

How does it work?

Housing price estimation, product recommendations, traffic sign detection, credit default prediction, and fraud detection are all examples of problems that can be solved by machine learning.

The tasks are different, but the approach is the same.

Step 1. Data collection and preparation

As we saw above, the more data, the better. The data should contain the outcome you want to predict, along with additional information from which to make the prediction.

Here are some examples:

For a traffic light detector ("Is there a traffic light in the image?"), you need a collection of images of intersections, each with an indication of whether a traffic light is visible or not.
For a default risk predictor, you need past data on actual loans, information on whether or not customers defaulted on those loans, and data that will help you make predictions, such as income, past credit defaults, etc.
For churn prediction, which consists of detecting customers who may cancel their subscription to a service, you need data on how they use the service (for example: purchasing behavior, session duration, response to marketing campaigns, etc.).
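As a rough illustration of what such data can look like once collected, here is a small Python sketch for the default risk example. The records, column names and the `prepare` helper are all hypothetical. It shows the two ingredients described above: the outcome to predict and the information used to predict it.

```python
# Hypothetical past loans: two descriptive columns plus the outcome to predict.
loans = [
    {"income": 52000, "past_defaults": 0, "defaulted": False},
    {"income": 18000, "past_defaults": 2, "defaulted": True},
    {"income": None, "past_defaults": 1, "defaulted": True},  # incomplete record
    {"income": 34000, "past_defaults": 0, "defaulted": False},
]

FEATURES = ["income", "past_defaults"]  # information used to make the prediction
TARGET = "defaulted"                    # outcome we want to predict


def prepare(rows):
    """Drop incomplete rows, then separate the features from the target."""
    columns = FEATURES + [TARGET]
    complete = [r for r in rows if all(r[name] is not None for name in columns)]
    X = [[r[name] for name in FEATURES] for r in complete]
    y = [r[TARGET] for r in complete]
    return X, y


X, y = prepare(loans)
print(X)  # one list of feature values per usable loan
print(y)  # the matching outcomes
```

Dropping incomplete rows is only one possible preparation choice, used here to keep the sketch short.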

Step 2. Training the algorithm and adjusting the parameters

When the data has been cleaned and prepared (see: the concept of tidy data), we can move on to selecting a machine learning algorithm. Using the data for the chosen problem, the algorithm produces a model: our traffic light detection model, default risk model or churn model.

In general terms, the algorithm begins with a rough, often randomly initialized, model for predicting the target variable. Whenever its predictions are wrong, the algorithm itself makes a series of adjustments to bring them closer to the target values in the training data.
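The loop below is a minimal sketch of this idea in Python, using gradient descent on a straight-line model. Gradient descent is one common way to make these adjustments, not the only one, and the data, the learning rate and the number of iterations are assumptions chosen for the example.

```python
import random

# Hypothetical training data following y = 2x + 1 (arbitrary units).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

# Start from a random guess for the model y = w * x + b.
random.seed(0)
w = random.uniform(-1.0, 1.0)
b = random.uniform(-1.0, 1.0)

learning_rate = 0.05  # size of each adjustment (assumed value)

for _ in range(2000):
    # Prediction error on every training example.
    errors = [(w * x + b) - y for x, y in zip(xs, ys)]
    # Gradient of the mean squared error with respect to w and b.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    # Adjust the model in the direction that reduces the error.
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # should end up close to 2 and 1
```

Each pass measures how far the predictions are from the targets and nudges `w` and `b` accordingly, so the random starting point matters less and less as training proceeds.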

The machine learning algorithm also exposes a series of additional parameters (often called hyperparameters) that its user can adjust to optimize the results and performance of the model.

Adjusting the machine learning model is like adjusting the buttons and switches on a television until you get a clearer signal.

These processes constitute the training phase of a machine learning system.
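One such user-facing setting is the learning rate, which controls how large each adjustment is. The Python sketch below tries a few values and keeps the one that gives the lowest error on data held back from training. The data, the candidate values and the helper names are assumptions made for the example.

```python
def train(xs, ys, learning_rate, epochs=200):
    """Fit y = w * x + b by gradient descent, starting from w = b = 0."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        w -= learning_rate * 2 * sum(e * x for e, x in zip(errors, xs)) / n
        b -= learning_rate * 2 * sum(errors) / n
    return w, b


def mean_squared_error(model, xs, ys):
    w, b = model
    return sum(((w * x + b) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Hypothetical data following y = 2x + 1, split into training and validation sets.
train_x, train_y = [1.0, 2.0, 3.0, 4.0], [3.0, 5.0, 7.0, 9.0]
valid_x, valid_y = [5.0, 6.0], [11.0, 13.0]

# Turn the "knob": train once per candidate value and compare the results.
results = {}
for lr in [0.001, 0.01, 0.05]:
    model = train(train_x, train_y, learning_rate=lr)
    results[lr] = mean_squared_error(model, valid_x, valid_y)

best_lr = min(results, key=results.get)
print(results)
print(best_lr)  # candidate with the lowest validation error
```

Comparing candidates on validation data rather than on the training data helps check that the chosen setting also works on examples the model has not seen.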

Step 3. Using the model

The goal of building a machine learning model is to solve a problem, and a machine learning model can only do that when faced with new data.

Integrating a machine learning model into a product or a process, such as an autonomous car, a credit application process or a CRM (Customer Relationship Management) platform, then makes it possible to apply the power of past data in a real-time application.

As such, deploying models is as important as creating models.
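As a small illustration of this step, the Python sketch below applies an already trained churn model to customers it has never seen. The model parameters, the feature names and the 0.5 threshold are hypothetical stand-ins for whatever a real training phase and a real CRM would provide.

```python
import math

# Hypothetical parameters, standing in for the output of a training phase.
MODEL = {
    "bias": -1.0,
    "weights": {"days_since_last_login": 0.08, "support_tickets": 0.5},
}


def churn_probability(customer):
    """Score one customer with a logistic model and return a value in (0, 1)."""
    score = MODEL["bias"] + sum(
        weight * customer[feature] for feature, weight in MODEL["weights"].items()
    )
    return 1.0 / (1.0 + math.exp(-score))


def flag_at_risk(customers, threshold=0.5):
    """Return the ids of customers whose churn probability reaches the threshold."""
    return [c["id"] for c in customers if churn_probability(c) >= threshold]


# New data, not seen during training.
new_customers = [
    {"id": "A", "days_since_last_login": 2, "support_tickets": 0},
    {"id": "B", "days_since_last_login": 30, "support_tickets": 2},
]

print(flag_at_risk(new_customers))  # ids to hand over to the retention team
```

In a deployed system, the same scoring function would typically run whenever fresh customer data arrives, rather than once on a fixed list.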

Why use machine learning?

The increase in the volume and variety of available data (see Big data and the 4 Vs of data), cheaper and more powerful computer processing (cloud computing), and affordable data storage (cloud storage) encourage the rapid, automatic production of models. These models can analyze ever larger and more complex data and thus provide faster and more precise results, even on a very large scale.

Therefore, by building accurate machine learning models, an organization has a better chance of identifying profitable opportunities - or avoiding unknown risks.

💡In the case of our CRM, predicting customer churn can lead to measures being put in place to retain customers before it's too late. The ability to predict that a customer is at high risk of leaving while there is still time to do something about it represents a huge source of potential additional revenue for the business.

Congratulations, you now know more about machine learning. Keep reading and learning about this fascinating area of data science.