### Machine Learning Methods for Data Classification

Classification methods divide objects into predefined categories according to their characteristics with the help of classifiers. A classifier is a function that maps an input to a class; its aim is to find suitable rules by which the data can be assigned to the respective class. In machine learning, this is normally done with a supervised learning approach. In this article, we briefly introduce the most common machine learning methods for solving classification problems.

### Decision Trees

Decision trees in their simplest form can be visualized by thinking of a regular tree; they represent a flowchart-like structure where each internal node represents a “test” on an attribute, each branch represents an outcome of the test, and each leaf node represents a class label. The paths from the root to the leaves represent the rules of the classification.
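To make the node, branch, and leaf structure concrete, here is a minimal sketch of a hand-written decision tree in Python. The attribute names, values, and class labels are hypothetical and chosen only for illustration; a real tree would be learned from training data.

```python
# A toy decision tree written by hand. Attribute names and labels are
# hypothetical; nothing here is learned from data.
TREE = {
    "attribute": "has_feathers",
    "branches": {
        True: {"label": "bird"},
        False: {
            "attribute": "legs",
            "branches": {4: {"label": "mammal"}, 0: {"label": "fish"}},
        },
    },
}


def classify(node, sample):
    """Follow one root-to-leaf path: test an attribute, take the matching branch."""
    while "label" not in node:
        outcome = sample[node["attribute"]]
        node = node["branches"][outcome]
    return node["label"]


print(classify(TREE, {"has_feathers": False, "legs": 4}))  # mammal
```

Each call walks exactly one path from the root to a leaf, which corresponds to one classification rule.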

### The Bayes Classifier

The Bayes classifier assigns objects to the class to which they most likely belong. The basis for calculating this probability is called a cost function. Objects are represented as vectors in which each trait is mapped to a dimension; the cost function then determines the probability that a single trait of the object belongs to a class. The individual traits are considered independently of each other (the assumption behind the variant known as naive Bayes). Finally, the object is assigned to the class that its individual traits match most closely.
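Treating the traits independently is the idea commonly known as naive Bayes. The following is a minimal sketch of that idea for yes/no traits, not a production implementation. The toy data, the trait meanings, and the add-one smoothing (which assumes two possible values per trait) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train(samples, labels):
    """Count how often each class and each (class, dimension, value) occurs."""
    class_counts = Counter(labels)
    trait_counts = defaultdict(Counter)  # (class, dimension) -> Counter of values
    for traits, label in zip(samples, labels):
        for dim, value in enumerate(traits):
            trait_counts[(label, dim)][value] += 1
    return class_counts, trait_counts


def predict(model, traits):
    """Pick the class with the highest product of per-trait probabilities."""
    class_counts, trait_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -1.0
    for label, count in class_counts.items():
        score = count / total  # prior probability of the class
        for dim, value in enumerate(traits):
            # Add-one smoothing, assuming two possible values per trait,
            # avoids multiplying by zero for unseen combinations.
            score *= (trait_counts[(label, dim)][value] + 1) / (count + 2)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical traits per email: (contains_link, contains_greeting)
samples = [(True, False), (True, False), (True, True),
           (False, True), (False, True), (False, False)]
labels = ["spam", "spam", "spam", "ham", "ham", "ham"]

model = train(samples, labels)
print(predict(model, (True, False)))  # spam
```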

### k-Nearest-Neighbors

The k-nearest-neighbors method determines the class of an object by majority vote: the object is assigned to the class that is most strongly represented among its k nearest neighbors. To determine the distance between two objects, the Euclidean distance is often used. However, there are alternatives, such as the Manhattan metric, which may work better in certain circumstances. A commonly used way to find a suitable k is to run the algorithm with different values for k and then keep the value that gives the best results. Note that the method is not limited to two-dimensional data; it works the same way in a multi-dimensional space.
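The majority vote can be sketched in a few lines of Python. This is an illustrative sketch with made-up points and labels; the distance function is a parameter so that the Euclidean distance can be swapped for the Manhattan metric.

```python
import math
from collections import Counter


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def knn_predict(points, labels, query, k=3, distance=euclidean):
    """Majority vote among the k training points closest to `query`."""
    ranked = sorted(zip(points, labels), key=lambda pl: distance(pl[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical training data: two well-separated groups in the plane.
points = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.0), (6.0, 6.0), (7.0, 5.5), (6.5, 7.0)]
labels = ["a", "a", "a", "b", "b", "b"]

print(knn_predict(points, labels, (2.0, 2.0), k=3))  # a
```

Because the points are plain tuples, the same functions work unchanged for data with more than two dimensions.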

### Support Vector Machine

The Support Vector Machine represents each object as a vector in an n-dimensional vector space. To separate the vectors of different classes linearly, you choose a hyperplane such that the nearest vectors of each class are as far away from it as possible. Further objects are then classified according to which side of the hyperplane they fall on. For training data that is not linearly separable, the vectors can first be mapped into a higher-dimensional space in which a separating plane exists. In the example shown above, the original two-dimensional vectors are mapped into a three-dimensional space. The vectors cannot be separated linearly in the two-dimensional space; in the three-dimensional space, however, it is possible to place a plane "e" in such a way that the two sets are separated linearly.
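The effect of mapping into a higher-dimensional space can be illustrated with a small sketch. It does not train a Support Vector Machine; it only shows that points which no straight line can separate in 2D become separable by a plane after an added third coordinate. The mapping, the sample points, and the plane height are assumptions chosen for illustration.

```python
def lift(point):
    """Map (x, y) to (x, y, x^2 + y^2); the third coordinate is the squared radius."""
    x, y = point
    return (x, y, x * x + y * y)


# Hypothetical data: one class near the origin, the other in a ring around it.
inner = [(0.5, 0.0), (0.0, -0.8), (-0.6, 0.6), (0.3, 0.4)]
outer = [(3.0, 0.0), (0.0, -3.2), (-2.5, 2.0), (2.2, 2.4)]

PLANE_HEIGHT = 4.0  # hand-picked horizontal plane z = 4, not a trained result


def side(point3d):
    """Return +1 above the plane and -1 below it."""
    return 1 if point3d[2] > PLANE_HEIGHT else -1


print([side(lift(p)) for p in inner])  # [-1, -1, -1, -1]
print([side(lift(p)) for p in outer])  # [1, 1, 1, 1]
```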

### Neural Networks

With neural networks, it is possible to classify content in images such as objects, faces, and places with a high degree of certainty, provided sufficient training data is available. Object recognition with neural networks has become increasingly popular because of the very good results it achieves. Our blog entry on neural networks shows this method in detail.

## Challenges of Classification

When choosing the appropriate classifier, the following characteristics must be taken into account:

### Accuracy

The accuracy of the classification depends on the rules used. It must be remembered that accuracy on the training data usually differs from accuracy on unseen data: a machine learning model may produce perfect results on the training data and still perform poorly on test data or when classifying new data. In the end, the ratio of correctly classified objects to all classified objects determines the accuracy of the model.
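The gap between training and test accuracy can be sketched with a deliberately bad "model" that only memorizes its training examples. The data and the fallback label are hypothetical.

```python
def accuracy(predicted, actual):
    """Ratio of correctly classified objects to all classified objects."""
    correct = sum(1 for p, a in zip(predicted, actual) if p == a)
    return correct / len(actual)


# Hypothetical training data: input value -> class label.
training = {1: "a", 2: "a", 8: "b", 9: "b"}


def memorizer(x):
    """Looks up memorized answers and guesses "a" for anything unseen."""
    return training.get(x, "a")


train_inputs = list(training)
train_labels = [training[x] for x in train_inputs]
test_inputs = [3, 7, 10, 0]
test_labels = ["a", "b", "b", "a"]

print(accuracy([memorizer(x) for x in train_inputs], train_labels))  # 1.0
print(accuracy([memorizer(x) for x in test_inputs], test_labels))    # 0.5
```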

### Speed

Under certain circumstances, the speed of classification can be an important criterion for a model. For example, one classifier may achieve 90% accuracy in one-hundredth of the time that another classifier needs to achieve 95% accuracy. In such a case, it may be better to give up some accuracy in exchange for the improvement in speed.

### Comprehensibility

It may be important that it is easy to understand how a classifier arrives at its result. Even if the result is accurate and fast, people working with the model must above all have confidence that it produces a valid result (e.g., when making important business decisions).

## Application

Classification offers many exciting applications in diverse disciplines. One is computer vision, where the aim is to give a computer a visual understanding of the world; this, in turn, is used in areas such as self-driving cars and object recognition. There are many other uses as well, for example classifying emails as spam or non-spam, or the early detection of diseases in medicine.


### What are Artificial Neural Networks?

An artificial neural network is a mathematical model that is based on biological neural networks found in animal brains. Within such a network, groups of neurons are combined in multiple layers that are connected to each other.

Each neural network has at least one input layer and one output layer. In addition, any number of layers, known as hidden layers, can be placed in between the input and the output layer, as is the case in deep learning. In practice, more hidden layers in a network result in a higher degree of complexity and depth, but also more computing power required to train and run the model.

## How Artificial Neural Networks Work

In a neural network, the neurons of adjacent layers are connected to each other by weighted connections. The weight indicates how strongly one neuron influences the neuron it is connected to. When the network is initialized, the weights are set to random values.

Neurons of hidden layers always have an input and an output. The total input of a neuron, known as the network input, is computed by a propagation function such as the linear combination, which adds up the inputs multiplied by their respective weights.

Next, the activation function is used to calculate the activity level of a neuron. In order for neural networks to be able to solve nonlinear problems, the activation function must be nonlinear, such as the sigmoid function or the rectifier function. Neurons that use the rectifier are called rectified linear units (ReLU), which have recently gained widespread popularity. The rectifier function has the form:

𝑓(𝑥) = max(0 , 𝑥)

The output function of a neuron determines its output. The neuron passes this output on to the neurons in the next layer. Usually, the identity function is used for this, where the output is equal to the activity level (id(𝑥) = 𝑥).

### Sample Calculation

Suppose you have two input values, v1 = 0.5 and v2 = 0.8, with weights 0.3 and 0.7, respectively. The propagation function, as in the example of the graph, would be calculated as:

0.3 * 0.5 + 0.7 * 0.8 = 0.71

With this value you can evaluate the activation function, in this example the rectifier function:

f(0.71) = max(0, 0.71) = 0.71

Afterwards, the result is passed to the output function:

id(0.71) = 0.71
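The three steps of the sample calculation (propagation, activation, output) can be reproduced in a few lines of Python. This is a minimal sketch; the function names are illustrative, and it assumes the weights 0.3 and 0.7 belong to v1 and v2 respectively.

```python
def relu(x):
    """Rectifier function: f(x) = max(0, x)."""
    return max(0.0, x)


def identity(x):
    """Output function id(x) = x."""
    return x


def neuron_output(inputs, weights, activation=relu, output=identity):
    # Propagation function: linear combination of inputs and weights.
    net_input = sum(w * v for w, v in zip(weights, inputs))
    return output(activation(net_input))


result = neuron_output(inputs=[0.5, 0.8], weights=[0.3, 0.7])
print(round(result, 2))  # 0.71
```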

### Training Processes

Weight adjustment takes place with each training process according to certain learning rules. The weights of the neuron connections are adjusted in such a way that the output of the model moves closer and closer to the desired result.

A frequently used learning rule when working with neural networks is called backpropagation.

#### Backpropagation

In backpropagation, an external "teacher" compares the result of the neural network with the desired outcome and determines the quadratic error, which is then fed back into the net. During this process, the weights in each layer are adjusted so that the resulting error becomes smaller and smaller over time. This is done using the gradient method (gradient descent), a numerical method for determining the minimum of a function. This method usually finds only a local minimum; however, it is still used in practice because determining a global minimum analytically is too complex and costly.

In order to use backpropagation, one needs a large amount of already labeled data to train the neural network. This is because the method requires a relatively low learning rate. The low learning rate is necessary so that the gradient method approaches the minimum gradually instead of taking steps that are too big and mistakenly skipping over the minimum. One of the advantages of backpropagation is that it can be applied to multi-layer networks.
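The role of the learning rate can be seen in a sketch of the gradient method on a single variable. This is not backpropagation through a network; it only minimizes the hypothetical error function E(w) = (w - 3)^2, whose minimum is at w = 3, to show how a small step size converges while a large one skips over the minimum.

```python
def gradient_descent(learning_rate, steps=50, start=0.0):
    """Minimize E(w) = (w - 3)^2 by repeatedly stepping against the gradient."""
    w = start
    for _ in range(steps):
        gradient = 2 * (w - 3)  # derivative of (w - 3)^2
        w -= learning_rate * gradient
    return w


small_steps = gradient_descent(learning_rate=0.1)
large_steps = gradient_descent(learning_rate=1.1)

print(abs(small_steps - 3) < 1e-3)  # True: gradually approaches the minimum
print(abs(large_steps - 3) > 1000)  # True: each step overshoots further
```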

### Test Phase

After the training has been completed, the net needs to be evaluated to determine whether it has learned something, that is, whether meaningful weight adjustments have taken place; this is called model validation. To do this, you give the model many inputs it was trained on as well as inputs it has not seen before. This makes it possible to examine whether the model has merely memorized the training objects or whether it actually solves general tasks correctly.

## Network Types

### Feed Forward Neural Network

A feed forward neural network is an artificial network in which information is passed layer by layer from the input to the output layer. It is important that the information is always transferred in the direction of the output layer and never the other way around.

Two types of feed forward neural networks are:

#### Perceptron

The perceptron is the simplest form of an artificial neural network. First introduced in 1958 by Frank Rosenblatt, it has only one neuron, which also produces the output. The input weights were already adjustable, allowing the perceptron to classify inputs that differ slightly from the vector originally learned. However, the term is not reserved for the original version of the perceptron; a distinction is made between the following variants:

• Single Layer Perceptron:
The single layer perceptron corresponds to the perceptron originally published by Rosenblatt as described above.
• Multi Layer Perceptron (MLP):
The multi layer perceptron has several layers of neurons. In most cases, a sigmoid function is used as the activation function. An advantage of MLPs over simple perceptrons is that they are able to solve nonlinear problems.
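As a sketch of how a hidden layer lets an MLP solve a nonlinear problem, the following network computes XOR with two hidden neurons. The weights are chosen by hand rather than learned, and the rectifier is used instead of a sigmoid to keep the arithmetic exact; both are assumptions made for illustration.

```python
def relu(x):
    return max(0.0, x)


def xor_mlp(x1, x2):
    """Two-layer network with hand-picked weights (not trained)."""
    h1 = relu(x1 + x2)        # hidden neuron 1: weights (1, 1), bias 0
    h2 = relu(x1 + x2 - 1.0)  # hidden neuron 2: weights (1, 1), bias -1
    return h1 - 2.0 * h2      # output neuron: weights (1, -2)


for a in (0, 1):
    for b in (0, 1):
        print(a, b, xor_mlp(a, b))
```

A single neuron without the hidden layer could not produce this output pattern, because XOR is not linearly separable.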

#### Convolutional Neural Network (CNN)

The convolutional neural network usually has at least 5 layers, and the main principle of CNNs is that each layer performs pattern recognition. Each layer refines the recognized patterns further, based on the output of the previous layer in a local region. This procedure is modeled on the receptive field that can be found, for example, on the retina of the human eye. Because backpropagation is used as the learning method, however, a large amount of labeled training data is necessary for CNNs to deliver useful results.

CNNs are used for image and video recognition, image classification and natural language processing, among other things, and have delivered outstanding results in recent years in competitions such as the SQuAD (Stanford Question Answering Dataset) test.

### Recurrent Neural Networks

In contrast to feed forward networks, recurrent neural networks allow information to go backward and pass through areas of the network again, which means that neurons can also return information to previous layers. This creates a kind of memory in the neural network. Recurrent neural networks are able to predict the next element of a sequence of inputs, which is why they are often used for handwriting or speech recognition.

The following types of Recurrent Neural Networks can be distinguished:

• Direct Feedback:
Neurons are fed their immediate output as an input
• Indirect Feedback:
The output is fed back to the input of the neurons from previous layers
• Lateral Feedback:
The output of a neuron is passed on to the input of a neuron of the same layer
• Complete Connections:
All neurons are connected to each other
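The "memory" created by feedback can be sketched with a single neuron that uses direct feedback, the first type in the list above. The weights and the tanh activation are assumptions chosen for illustration.

```python
import math


def run_direct_feedback(inputs, w_in=1.0, w_back=0.5):
    """A single neuron that receives its own previous output as extra input."""
    output = 0.0
    outputs = []
    for x in inputs:
        output = math.tanh(w_in * x + w_back * output)
        outputs.append(output)
    return outputs


with_feedback = run_direct_feedback([1.0, 0.0, 0.0])
without_feedback = run_direct_feedback([1.0, 0.0, 0.0], w_back=0.0)

# With feedback, the first input still influences later outputs.
print(with_feedback[1] > 0.0)       # True
print(without_feedback[1] == 0.0)   # True
```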


### Where Does the Machine Learning and AI Megatrend Come From?

Nowadays, machine learning (ML), neural networks, and artificial intelligence (AI) are trending topics that seem to be the focus of discussion everywhere. In this article, we briefly summarize the development of machine learning in the last ten years and explain why this trend will find more and more applications in all economic sectors.

## Background

In the 1940s, Warren McCulloch and Walter Pitts laid the foundations of machine learning with their publication “A Logical Calculus of the Ideas Immanent in Nervous Activity” on neurons and nerve networks.

In 1957, Frank Rosenblatt developed the perceptron algorithm, which represents a simplified model of a biological neuron. Three years later, Bernard Widrow and Marcian Hoff developed ADALINE, an early artificial neural network in which, for the first time, the weights of the inputs could be learned by the network.

However, the publication of the book "Perceptrons" by Marvin Minsky and Seymour Papert in 1969 meant that, after the initial euphoria about machine learning, the topic lost its importance and the field fell into the so-called "AI winter". The book presents not only the strengths but also the serious limitations of perceptrons, such as the XOR problem. The XOR problem represented such a hurdle because classical perceptrons can only solve linearly separable problems. The XOR function, however, is not linearly separable and therefore cannot be solved by a single perceptron.

## New Revivals

David Rumelhart, Geoff Hinton, and Ronald Williams laid the foundation for deep learning with their backpropagation experiments in 1986. They solved the XOR problem by applying backpropagation to multi-layer neural networks.

Another big step in machine learning was the use of deep learning. Deep learning refers to a class of machine learning algorithms that can solve nonlinear problems thanks to their large number of layers. Each layer processes the data passed on from the previous layer, thus abstracting the data layer by layer.

## Machine Learning Today

### The Influence of AlexNet on Machine Learning

In the last decade, the topic gained popularity again. In 2012 in particular, Geoff Hinton, Alex Krizhevsky, and Ilya Sutskever caused quite a stir with their convolutional neural network AlexNet.

#### Success in the Large Scale Visual Recognition Challenge

With AlexNet, they achieved an outstanding result using deep learning methods at the ImageNet Large Scale Visual Recognition Challenge (ILSVRC), which has been held annually since 2010. The aim is to design the most efficient image recognition software possible using the free ImageNet database. In the first year, the best result was an error rate of 28.2%. In the second year, the best error rate was still 25.7%, and the second-best result of 2012 still had an error rate of 26.2%. The AlexNet team, in contrast, achieved an error rate of just 16.4%. This result quickly made a big impact in the professional world and rekindled the hype about, and the importance of, machine learning.

#### Reasons for the Success of AlexNet

On the one hand, this result can be attributed to advances in the theory of machine learning algorithms. For example, the use of the so-called "rectified linear activation unit" (ReLU) has greatly increased the efficiency and speed of deep learning algorithms. Among other things, the use of ReLU has largely mitigated the vanishing gradient problem, in which the gradients in parts of a network become so small during training that those parts stop learning and, in the worst case, the network can no longer be trained.
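One way to see why ReLU helps with the vanishing gradient problem is to compare activation derivatives, which are multiplied together layer by layer during backpropagation. The sketch below is a simplification: it ignores the weights and assumes the same pre-activation value in every layer.

```python
import math


def sigmoid_derivative(x):
    s = 1.0 / (1.0 + math.exp(-x))
    return s * (1.0 - s)  # never larger than 0.25


def relu_derivative(x):
    return 1.0 if x > 0 else 0.0


def chained_gradient(derivative, pre_activation, layers):
    """Product of the activation derivatives along one path through `layers` layers."""
    return derivative(pre_activation) ** layers


print(chained_gradient(sigmoid_derivative, 0.0, layers=10))  # 0.25 ** 10, below 1e-6
print(chained_gradient(relu_derivative, 1.0, layers=10))     # 1.0
```

With the sigmoid, the gradient shrinks by at least a factor of four per layer, while an active ReLU passes it through unchanged.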

Unlike previous competitors, Hinton's team used graphics cards instead of CPUs, thanks to the CUDA technology released by Nvidia in 2007, which allowed graphics cards to be used for general calculations. In a 2009 study, Rajat Raina, Anand Madhavan, and Andrew Ng showed that the use of graphics cards instead of CPUs could increase the speed of neural network training by up to 15 times.

### Development After AlexNet

After the success of AlexNet, the potential of these methods was increasingly recognized, which is why big companies like Google also started to engage with machine learning. For example, because of their ability to solve non-linear problems, machine learning algorithms can be used to develop self-driving cars (e.g., Waymo). Out of this trend emerged various program libraries such as Google's TensorFlow, Keras, and Theano, the latter developed at the University of Montreal.

## Why is it applicable today?

Machine learning methods have recently become widely applicable because of the tools above and the more widely available computing power. In recent years, the prices of graphics cards have fallen in relation to their computing power, as the following table shows.

| Graphics Card | GFLOPS | Price (\$) | Publication Year | GFLOPS/\$ |
|---|---|---|---|---|
| Nvidia GeForce GTX 680 | 3,090 | 500 | 2012 | 6.2 |
| Nvidia GeForce GTX 780 | 3,977 | 499 | 2013 | 6.1 |
| Nvidia GeForce GTX 780 Ti | 5,046 | 699 | 2013 | 7.2 |
| Nvidia GeForce GTX 980 | 4,612 | 549 | 2014 | 8.4 |
| Nvidia GeForce GTX 980 Ti | 5,632 | 649 | 2015 | 8.7 |
| Nvidia GeForce GTX 1080 | 8,228 | 499 | 2017 | 16.5 |
| Nvidia GeForce GTX 1080 Ti | 10,609 | 699 | 2017 | 15.2 |
| Nvidia GeForce RTX 2080 | 8,920 | 699 | 2018 | 12.8 |
| Nvidia GeForce RTX 2080 Ti | 11,750 | 999 | 2018 | 11.8 |

### Development of the most Powerful Graphics Cards for Machine Learning Applications

Google's Tensor Processing Units (TPUs), introduced in 2016, accelerated machine learning applications, and the later generations from 2017 and 2018 also accelerated the training of neural networks. Also helpful in the application of neural networks is the ability to rely on GPU clusters, because they allow fast training of the networks. Today, it is not even necessary to perform the calculations on your own computer; instead, they can be performed in the cloud at very reasonable prices (ImageNet Benchmark).

## Areas of Applications

Computer vision is one of the most important areas of application for machine learning algorithms. The term describes enabling a computer to gain a general understanding of images or videos in order to obtain information from them. Other areas of application are speech analysis and the evaluation of texts. In speech analysis, the computer learns to understand spoken words and, for example, to convert them into written text. In text analysis, the computer is supposed to be able to extract information from any text.

All of these areas result in exciting use cases such as the evaluation of satellite data, the enhancement of image searches, the analysis of public sentiment, or self-driving cars.

## Do only International IT Companies Benefit from this Development?

Use cases where good results can be achieved quickly include:

• Automatic evaluation of images or video recordings
• Prediction of key figures (demand, inventory levels, etc.) so that quicker and better decisions can be made
• Knowledge extraction from documents and large text bodies
• Automatic classification of frequently occurring business transactions (for example, in banking, insurance, or other audit cases) into automatically acceptable requests and those that still require manual post-processing.


### What is Machine Learning?

Machine learning enables computers to acquire knowledge from data without being explicitly programmed. This knowledge takes the form of a function that assigns a suitable output to an input. An algorithm adjusts the function until it achieves the desired results. In recent years, a number of training methods have become established that lead to good results only when very large data sets are available. The phrase "data is the new oil" often refers to the fact that companies with richer data sets can train more powerful models.

## Narrow AI vs. General AI

Our current machine learning algorithms are generally only applicable to very specific problems; this is the so-called Narrow AI. General AI would be an algorithm that learns an abstract model of the world. Like a human being, it would be able to combine knowledge from different fields and transfer it to previously unknown problems. However, we are still a long way from such a General AI. It is not even clear whether such a General AI is possible in principle. Nevertheless, Narrow AI is already able to solve problems that were previously difficult or impossible to solve with computers, in some cases more efficiently than humans.

### Supervised and Unsupervised Learning

There are two approaches to training machine learning models: supervised and unsupervised learning. In supervised learning, the input and the desired output are known at training time. From this information, the algorithm generates a model that describes the relationship between input and output. After the training, this model can provide general results for an input, i.e. results that are not limited to the training data set. For example, supervised learning is used to recognize image content. During training, images are provided as input together with a list of the content of each image. The training should enable the model to recognize the correct objects as output.

In unsupervised learning, however, the program does not receive any information about the desired output. The program then has to create a model that generates suitable outputs for given inputs on the basis of their similarities. It is difficult to judge the result of such a model qualitatively because there is no specification for the results. The automatic recognition of clusters in quantitative data is a typical problem for which unsupervised learning models nevertheless achieve good results. Among other things, a program can automatically recognize outliers and new patterns in data.

## Problems Machine Learning Can Solve

As mentioned earlier, machine learning algorithms are currently being applied to very specific problems. Three abstract problem types can be solved:

### The Classification Problem

The classification problem is the automated grouping of objects into classes. A classic example of this is a program that is able to recognize whether a picture shows a cat or a dog. For classification problems, supervised learning models are often used. On the basis of suitable examples, the model independently learns to assign objects to the correct class. Depending on the method, the properties to be considered are either given to the model (feature engineering) or learned automatically as well.

### The Regression Problem

The regression problem is about estimating a continuous value, for example the future course of a function. A classic example would be the prediction of the water level of a river. Here, conclusions are drawn from previous years to predict the future situation. A supervised learning model is often used for this purpose. This model can be trained with historical data that allow conclusions to be drawn about future data.
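A very simple regression model is a straight line fitted to historical data with the least-squares method. The sketch below uses hypothetical water levels; real river data would rarely follow a line this neatly.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


years = [1, 2, 3, 4, 5]
levels = [2.0, 2.5, 3.0, 3.5, 4.0]  # hypothetical water levels in meters

slope, intercept = fit_line(years, levels)
prediction = slope * 6 + intercept  # estimate for year 6
print(slope, intercept, prediction)
```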

### The Clustering Problem

The goal of the clustering problem is to design an algorithm that independently categorizes given data into groups of similar objects; an unsupervised learning model is often used for this. Such a model does not need more precise specifications as to how it should categorize. This ensures that certain groups of data, which the developer may not even perceive as such, are not excluded from the outset. A classic example would be target group analysis in marketing, such as personalized advertising. Customers are divided into different groups in order to display specific advertising.
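One common clustering technique is k-means, sketched here for one-dimensional data. The text does not name a specific algorithm, so this choice, the customer ages, and the starting centers are assumptions made for illustration.

```python
def kmeans_1d(values, centers, iterations=10):
    """Alternate between assigning values to the nearest center and updating centers."""
    groups = [[] for _ in centers]
    for _ in range(iterations):
        groups = [[] for _ in centers]
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # Move each center to the mean of its group; keep it if the group is empty.
        centers = [sum(g) / len(g) if g else c for g, c in zip(groups, centers)]
    return centers, groups


ages = [18, 20, 22, 45, 47, 50]  # hypothetical customer ages
centers, groups = kmeans_1d(ages, centers=[20.0, 40.0])
print(groups)  # [[18, 20, 22], [45, 47, 50]]
```

No labels are supplied at any point; the two groups emerge only from the similarity of the values. The result depends on the starting centers, which is a known limitation of this technique.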

## Challenges when using Machine Learning Methods

Despite its many advantages, the use of machine learning methods poses a number of challenges:

### Data Quality

In practice, much of the work in machine learning projects consists of obtaining, understanding, and preparing the right data. This step, generally called "preprocessing", ensures that the data the model is trained on represents exactly what it is supposed to learn. Particularly in larger organizations, the collection and aggregation of data is a non-trivial task for which several departments have to cooperate with each other. If the necessary data is not available in the desired scope or quality, the first project phase often consists of collecting and processing this data.

Errors and distortions in the data can lead to the trained model learning them as well. These are referred to as "biases", i.e. prejudices that emerge from the data.

### Differences Between Different Machine Learning Methods in terms of Performance and Explainability

The different machine learning methods differ in terms of their performance and their explainability. In general, the higher the performance a machine learning model achieves, the lower its explainability, and vice versa.

For example, neural networks perform very well, but trying to understand the solution reveals that it does not have much to do with what we would understand as logical problem-solving. This is because the models consist of hundreds or even more independent variables and many more calculations. This is called the black box problem. It becomes problematic whenever the reasoning behind a result is important, e.g. in vital decisions such as in medicine, when a machine learning model suggests a treatment.

In contrast, there are procedures such as linear regression or decision trees that are very transparent. In return, however, these models are not well suited for complex problems.