A convolutional neural network (CNN) is a type of artificial neural network (ANN) used mainly for image processing and computer vision.
CNNs extract features from an image, ranging from edges, corners, and textures to objects, figures, patterns, and more complex information. They are used across many industries for tasks such as image classification, object detection, face recognition, medical imaging, video analysis, natural language processing, autonomous vehicles, robotics, games, and simulations.
The following are the steps of a CNN:
Step 1: Input Layer
Tensor: A tensor is a mathematical object used in machine learning (ML), deep learning (DL), and artificial neural networks (ANNs). It generalizes scalars, vectors, and matrices to multi-dimensional arrays. A tensor may be a scalar (a = 5), which is a 0th-order tensor; a vector (v = [2, 3, 5]), which is a 1st-order tensor; or a matrix (M = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]), which is a 2nd-order tensor. Higher-order tensors also exist, such as a 3rd-order tensor of shape H x W x C, where H is the height, W is the width, and C is the number of channels (3 for RGB (Red, Green, Blue) color images). For videos, we can add a time dimension (T), giving a 4th-order tensor of shape T x H x W x C.
The input layer is the first layer of a CNN. The input is typically an image, which has a height, a width, and a depth (number of channels). For example, a 32x32x3 tensor represents a 32×32-pixel image with 3 color channels.
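To make the idea of tensor dimensions concrete, here is a minimal sketch that represents tensors as nested Python lists and counts their dimensions. The helper name shape is made up for this illustration, and it assumes the nesting is regular (every sub-list at the same level has the same length). In practice, a library such as NumPy or a deep learning framework would be used instead.

```python
def shape(tensor):
    """Return the shape of a nested-list tensor as a tuple.

    Illustrative helper: assumes regular nesting and non-empty lists.
    """
    dims = []
    while isinstance(tensor, list):
        dims.append(len(tensor))
        tensor = tensor[0]
    return tuple(dims)


scalar = 5                                  # no dimensions
vector = [2, 3, 5]                          # one dimension
matrix = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]  # two dimensions
# Three dimensions: a tiny 2x2 RGB image laid out as H x W x C
image = [
    [[255, 0, 0], [0, 255, 0]],
    [[0, 0, 255], [255, 255, 255]],
]

for name, t in [("scalar", scalar), ("vector", vector),
                ("matrix", matrix), ("image", image)]:
    s = shape(t)
    print(name, "shape:", s, "number of dimensions:", len(s))
```

The length of the shape tuple is the number of dimensions of the tensor, so the 2x2 RGB image reports shape (2, 2, 3) and three dimensions.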
Step 2: Convolutional Layer
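Kernel: A kernel (also called a filter) is a small matrix of weights used to extract information from the input image. For example, a 3×3 or 5×5 kernel slides (also referred to as striding) over the image; at each position, its values are multiplied element-wise with the pixels beneath it and the products are summed.
Stride: The stride defines how many pixels the kernel moves at each step. If the stride is 1, the filter moves one pixel at a time; a larger stride skips positions and produces a smaller output.
Feature Maps: Applying a kernel to the image produces a new matrix called a feature map. Each feature map highlights where a specific characteristic of the image, such as an edge or a texture, appears.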
In the convolutional layer, kernels are applied to the input layer. Each filter attempts to extract specific information from the image.
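The sliding-window operation described above can be sketched in plain Python. This is a simplified illustration, not a production implementation: it assumes a single-channel image, a square kernel, and no padding, and the function name conv2d and the edge-detecting kernel values are chosen only for this example. Like most deep learning frameworks, it does not flip the kernel, so strictly speaking it computes a cross-correlation.

```python
def conv2d(image, kernel, stride=1):
    """Slide a square kernel over a 2-D image (no padding) and return the feature map."""
    k = len(kernel)
    out_h = (len(image) - k) // stride + 1
    out_w = (len(image[0]) - k) // stride + 1
    feature_map = []
    for i in range(out_h):
        row = []
        for j in range(out_w):
            top, left = i * stride, j * stride
            # Multiply the kernel with the patch beneath it and sum the products
            total = sum(
                image[top + a][left + b] * kernel[a][b]
                for a in range(k)
                for b in range(k)
            )
            row.append(total)
        feature_map.append(row)
    return feature_map


# 5x5 image: dark (0) on the left, bright (1) on the right
image = [
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
    [0, 0, 0, 1, 1],
]
# Illustrative 3x3 kernel that responds to vertical edges
kernel = [
    [-1, 0, 1],
    [-1, 0, 1],
    [-1, 0, 1],
]

print(conv2d(image, kernel, stride=1))  # [[0, 3, 3], [0, 3, 3], [0, 3, 3]]
print(conv2d(image, kernel, stride=2))  # [[0, 3], [0, 3]]
```

The feature map is 0 where the image is uniform and 3 where the window covers the dark-to-bright transition. With a stride of 2 the kernel visits fewer positions, so the feature map shrinks from 3×3 to 2×2.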
Step 3: Activation Function
Activation functions are used in ANNs to introduce non-linearity into the output of a neuron. Each function maps its input to an output within a specific range. Common examples include the sigmoid function (range: (0, 1)), the Tanh function (range: (-1, 1)), the ReLU function (range: [0, ∞)), the Leaky ReLU function (range: (-∞, ∞)), and the Softmax function (each output in (0, 1), with all outputs summing to 1).
After Step 2, an activation function is applied element-wise to the feature map. The most common activation function is ReLU (Rectified Linear Unit), defined as f(x) = max(0, x). The ReLU function sets all negative values to 0 and leaves positive values unchanged.
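As a small sketch of these functions, the following defines sigmoid, Tanh, ReLU, and Leaky ReLU for single numbers and applies ReLU element-wise to a made-up 2×2 feature map. The slope alpha = 0.01 for Leaky ReLU is a commonly used default, assumed here for illustration.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def tanh(x):
    return math.tanh(x)


def relu(x):
    return max(0.0, x)


def leaky_relu(x, alpha=0.01):
    # alpha is the small slope used for negative inputs (assumed value)
    return x if x > 0 else alpha * x


# Hypothetical feature map values produced by a convolutional layer
feature_map = [[-2.0, 0.5], [3.0, -0.1]]

# Apply ReLU to every element of the feature map
activated = [[relu(v) for v in row] for row in feature_map]
print(activated)  # [[0.0, 0.5], [3.0, 0.0]]
```

The negative entries become 0 while the positive entries pass through unchanged, which matches f(x) = max(0, x).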
Step 4: Pooling Layer
This layer reduces the spatial dimensions (height and width) of the feature maps, which lowers the computational cost of the model. The most common pooling technique is Max Pooling: a window (e.g., with dimensions 2×2) scans the feature maps, and only the maximum value in each region is kept. With a 2×2 window and a stride of 2, for example, the height and width are each halved.
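Max Pooling can be sketched in a few lines of plain Python. The function name max_pool2d and the sample values are made up for this example, and the sketch assumes a single 2-D feature map with no padding.

```python
def max_pool2d(feature_map, size=2, stride=2):
    """Keep the maximum value of each size x size window of a 2-D feature map."""
    out_h = (len(feature_map) - size) // stride + 1
    out_w = (len(feature_map[0]) - size) // stride + 1
    return [
        [
            max(
                feature_map[i * stride + a][j * stride + b]
                for a in range(size)
                for b in range(size)
            )
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Hypothetical 4x4 feature map
feature_map = [
    [1, 3, 2, 0],
    [4, 6, 1, 1],
    [0, 2, 9, 5],
    [1, 1, 3, 7],
]

pooled = max_pool2d(feature_map)
print(pooled)  # [[6, 2], [2, 9]]
```

The 4×4 input becomes a 2×2 output, so the next layer processes a quarter as many values.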
Step 5: Fully Connected Layer
Flattening is the process of converting a 3-dimensional tensor into a one-dimensional vector. For example, if the output tensor of the pooling layer is 7x7x64, flattening produces a one-dimensional vector of length 7 × 7 × 64 = 3136.
The output of the pooling layer is flattened and passed to this layer. It acts as a standard neural network layer, connecting each of its neurons to every neuron in the previous layer. Operations like classification and regression are performed in this layer.
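A minimal sketch of flattening followed by one fully connected layer is shown below. The helper names flatten and dense, along with the weights and biases, are invented for illustration; in a trained network the weights would be learned rather than fixed by hand.

```python
def flatten(tensor):
    """Recursively flatten nested lists into a single one-dimensional list."""
    if not isinstance(tensor, list):
        return [tensor]
    flat = []
    for item in tensor:
        flat.extend(flatten(item))
    return flat


def dense(inputs, weights, biases):
    """Fully connected layer: each output neuron sees every input value."""
    return [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]


# A 7x7x64 tensor of zeros flattens to 7 * 7 * 64 = 3136 values
zeros = [[[0.0] * 64 for _ in range(7)] for _ in range(7)]
print(len(flatten(zeros)))  # 3136

# Small 2x2x2 example so the numbers are easy to follow
pooled = [[[1.0, 2.0], [3.0, 4.0]],
          [[5.0, 6.0], [7.0, 8.0]]]
flat = flatten(pooled)

# Hypothetical weights: two output neurons, eight inputs each
weights = [[0.5] * 8, [-0.5] * 8]
biases = [0.0, 1.0]
scores = dense(flat, weights, biases)
print(scores)  # [18.0, -17.0]
```

Each row of weights belongs to one output neuron, which is what "connecting each neuron to every neuron in the previous layer" amounts to in code.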
Step 6: Output Layer
This is the final layer of the CNN and provides the results. For classification problems, it uses the Softmax function to compute the probability that an image belongs to each class.
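The Softmax computation can be sketched as follows. The three scores are made-up outputs of a fully connected layer for three hypothetical classes; subtracting the maximum score before exponentiating is a common trick for numerical stability and does not change the result.

```python
import math


def softmax(scores):
    """Turn raw scores into probabilities that sum to 1."""
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical raw scores for three classes
scores = [2.0, 1.0, 0.1]
probs = softmax(scores)
predicted_class = probs.index(max(probs))

print([round(p, 3) for p in probs])
print("predicted class index:", predicted_class)
```

The class with the highest score also receives the highest probability, so here the predicted class is index 0.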
Step 7: Backpropagation and Optimization
During training, the model’s predictions are compared to the actual values, and the error (loss) is calculated. Backpropagation computes the gradient of this error with respect to each weight, and an optimization algorithm such as Adam or Stochastic Gradient Descent (SGD) uses these gradients to update the model’s weights. This process repeats throughout the training phase.
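The training loop can be illustrated on a deliberately tiny model. The sketch below is not a CNN: it trains a single neuron (prediction = w * x + b) with plain stochastic gradient descent on made-up data generated from y = 2x + 1, using a squared-error loss. The learning rate and number of epochs are arbitrary choices for this example. A real CNN applies the same predict, compare, and update cycle to millions of weights, with a framework computing the gradients automatically.

```python
# Toy training data generated from the assumed target y = 2x + 1
data = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0), (3.0, 7.0)]

w, b = 0.0, 0.0       # initial weights
learning_rate = 0.05  # assumed value for this example

for epoch in range(200):
    for x, target in data:
        prediction = w * x + b       # forward pass
        error = prediction - target  # compare prediction with actual value
        # Gradients of the squared error (error ** 2) via the chain rule
        grad_w = 2 * error * x
        grad_b = 2 * error
        # Gradient descent update: step against the gradient
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b

print("learned w:", round(w, 4), "learned b:", round(b, 4))
```

After training, w and b approach 2 and 1, the values used to generate the data. Adam follows the same loop but adapts the step size for each weight based on past gradients.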