In supervised learning, the training dataset also contains the correct value for each sample; such datasets are called annotated datasets. In a character recognition model, the annotation is the character shown in the image; in an animal classification and localisation model, it is the species of the animal and the coordinates of the bounding box. In semantic segmentation, each pixel is assigned to a class, and that mask of classes is the expected result. Data is said to be valuable today because creating and annotating a dataset is an enormous job. Supervised learning can be used for many tasks, from filtering spam messages to weather forecasting.
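As a minimal sketch of the idea, the snippet below fits a classifier on a tiny annotated dataset; it assumes scikit-learn is available, and the feature values and labels are invented for illustration.

```python
# Minimal supervised-learning sketch: the training data carries both the
# samples (X) and their annotations (y). All values here are made up.
from sklearn.linear_model import LogisticRegression

X = [[120, 1], [30, 0], [200, 1], [15, 0]]  # features of four messages
y = ["spam", "ham", "spam", "ham"]          # annotations: the correct labels

model = LogisticRegression()
model.fit(X, y)                    # training uses samples and labels together
print(model.predict([[100, 1]]))   # a new, unannotated sample is classified
```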
Unsupervised learning means that the algorithm learns on unlabelled data. In this case, patterns and structures must be recognised autonomously: classes emerge naturally, but they are not necessarily given a name. New samples are then assigned to their closest cluster. Unsupervised learning is particularly useful, for example, in marketing for creating customer groups, which enable more effective targeted advertising and more relevant product recommendations.
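A clustering algorithm such as k-means illustrates this; the snippet below groups invented customer records without any labels and then assigns a new customer to the closest cluster (scikit-learn is assumed).

```python
# Minimal clustering sketch with k-means; the customer features
# (annual spend, visits per month) are invented for illustration.
from sklearn.cluster import KMeans

customers = [[1200, 8], [90, 1], [1100, 9], [60, 2], [950, 7], [120, 1]]

kmeans = KMeans(n_clusters=2, n_init=10, random_state=0)
labels = kmeans.fit_predict(customers)   # clusters found without any labels
print(labels)                            # two unnamed groups emerge

# A new customer is assigned to the closest cluster:
print(kmeans.predict([[1000, 6]]))
```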
RL algorithms learn from reward feedback on their actions, discovering over time the best way to reach their goal. As their experience grows, they learn to accept short-term penalties in exchange for a better long-term result. They can be used, for example, to train robots to move through unknown areas.
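The text does not name a specific algorithm, but tabular Q-learning is a common example; the sketch below shows only its update rule, with a hypothetical environment and made-up states, actions and rewards.

```python
# A hedged sketch of the tabular Q-learning update rule. The environment
# is hypothetical: states and actions are small integers, and the reward
# and next state would come from interacting with that environment.
import numpy as np

n_states, n_actions = 5, 2
Q = np.zeros((n_states, n_actions))   # value of each action in each state
alpha, gamma = 0.1, 0.9               # learning rate, discount factor

def update(state, action, reward, next_state):
    # gamma < 1 trades short-term reward against long-term return,
    # which is how a short-term penalty can still pay off later.
    best_next = Q[next_state].max()
    Q[state, action] += alpha * (reward + gamma * best_next - Q[state, action])

update(state=0, action=1, reward=-1.0, next_state=2)   # one learning step
```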
Artificial neural networks and deep neural networks are based on the same concept, but DNN refers to a more complex, deeper structure. Neurons form layers, and data flows from layer to layer. If there are many hidden layers between the input and the output layer, it is a deep network, and training it is deep learning. Each neuron receives values from its connected peers, inserts them into a formula and passes the result on. Training the network means discovering the best formulas, which takes a lot of time, trials and computational power.
Neurons are represented by circles, with lines connecting the neurons of adjacent layers. Each line has a weight, which indicates the importance of the data flowing along it. A neuron calculates the weighted sum of the values received from its connected neurons (potentially thousands or millions of them) and then adds a constant (called a bias) that is unique to each neuron. This value is not passed on immediately, but is first inserted into an activation function.
$ y = f\left(b + \displaystyle\sum_{i=0}^{n} x_i \cdot w_i\right) $
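The formula translates directly into code; in the sketch below the inputs, weights and bias are invented, and ReLU stands in for the activation function $f$.

```python
# A direct transcription of the neuron formula above into NumPy;
# the input values, weights and bias are made up for illustration.
import numpy as np

def relu(z):
    return np.maximum(0.0, z)   # one possible activation function f

x = np.array([0.5, -1.2, 3.0])  # values received from connected neurons
w = np.array([0.8, 0.1, -0.4])  # weights of the incoming connections
b = 0.2                         # the neuron's own bias

y = relu(b + np.sum(x * w))     # y = f(b + sum_i x_i * w_i)
print(y)
```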
Their task is to break the linearity: without them, a stack of layers would collapse into a single linear transformation and the network would behave like simple linear regression. In order to learn the complex relationship between the input and output data, different functions are used. In the figure below, one of the most important activation functions, Softmax, is not shown, because it cannot be visualised in this way. It is commonly used in output layers, as it produces a probability distribution.
$ \mathrm{softmax}(z)_i = \dfrac{e^{z_i}}{\displaystyle\sum_{j=1}^{K} e^{z_j}} $
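A minimal implementation of this formula might look as follows; subtracting the maximum before exponentiating is a standard numerical-stability trick and does not change the result.

```python
# Softmax exactly as in the formula above; the input scores are made up.
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))   # shift for stability: softmax is shift-invariant
    return e / e.sum()

z = np.array([2.0, 1.0, 0.1])   # raw output-layer scores
p = softmax(z)
print(p, p.sum())               # a probability distribution; sums to 1
```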
In an already trained neural network, data flows from input to output during use; at this point, all weights and biases are fixed. Forward propagation therefore refers to the use of a ready-made neural network, for which a smaller computing capacity may be sufficient, such as an office laptop, a smartphone or a microcontroller.
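As an illustration, the sketch below pushes an input through a tiny two-layer network; in a trained network the weights and biases would be fixed values learned earlier, while here they are random placeholders.

```python
# A hedged sketch of forward propagation through a tiny two-layer network;
# the weights and biases below stand in for values produced by training.
import numpy as np

rng = np.random.default_rng(0)

W1, b1 = rng.normal(size=(4, 3)), np.zeros(4)   # hidden layer: 3 inputs -> 4 neurons
W2, b2 = rng.normal(size=(2, 4)), np.zeros(2)   # output layer: 4 -> 2 neurons

def forward(x):
    h = np.maximum(0.0, W1 @ x + b1)   # each layer: weighted sums, biases, activation
    return W2 @ h + b2                 # raw output scores

print(forward(np.array([0.5, -1.2, 3.0])))
```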
The first step of training is the generation of random weight and bias values. Then, using the training dataset and its annotations, the algorithm tunes them, going backwards from the output to the input: it computes the error (cross-entropy, MSE, etc.) and the partial derivatives, and then updates the weights and biases using an optimiser such as gradient descent or Adam. The goal is to reduce the error with each iteration.
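The sketch below shows these steps on the simplest possible case, a single linear neuron with an MSE error; the data is invented and the partial derivatives are written out by hand rather than produced by a framework.

```python
# A minimal gradient-descent training loop on one linear neuron with MSE.
import numpy as np

X = np.array([[1.0], [2.0], [3.0], [4.0]])
y = np.array([3.0, 5.0, 7.0, 9.0])         # the hidden rule is y = 2x + 1

rng = np.random.default_rng(0)
w, b = rng.normal(), rng.normal()          # step 1: random initial values
lr = 0.05                                  # learning rate

for step in range(1000):
    pred = X[:, 0] * w + b
    err = pred - y
    loss = np.mean(err ** 2)               # MSE error
    grad_w = 2 * np.mean(err * X[:, 0])    # partial derivative dloss/dw
    grad_b = 2 * np.mean(err)              # partial derivative dloss/db
    w -= lr * grad_w                       # gradient-descent update
    b -= lr * grad_b

print(w, b, loss)                          # w approaches 2, b approaches 1
```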
Two commonly used concepts are classification and regression. In regression, the output does not represent a category or class, but can take any value: estimating the price of real estate, locating the skeleton points of poses in video, finding the bounding-box coordinates of objects, and even increasing the resolution of photographs are all regression problems. The output of a classifier is also a set of numbers, but those numbers indicate a category.
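To make the distinction concrete, the sketch below feeds the same made-up feature to a regressor and a classifier (scikit-learn is assumed): the first returns a continuous value, the second one of the known categories.

```python
# Regression vs. classification on the same invented feature (flat size in m^2).
from sklearn.linear_model import LinearRegression, LogisticRegression

X = [[50], [80], [120], [200]]

reg = LinearRegression().fit(X, [100, 160, 250, 420])           # prices: any value
clf = LogisticRegression().fit(X, ["small", "small", "large", "large"])

print(reg.predict([[100]]))   # a continuous estimate
print(clf.predict([[100]]))   # one of the known categories
```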
In addition to neural networks, other machine learning techniques include SVM (Support Vector Machine), Decision Tree, kNN (k-Nearest Neighbour), Naive Bayes and many others, all of which have their own uses.