A convolutional neural network (CNN or ConvNet) is a deep learning algorithm, one of the various types of artificial neural networks used for different applications and data types. CNNs have become the dominant approach to a range of computer vision tasks, particularly image recognition and other tasks that involve processing pixel data. A CNN can take in an input image, assign importance (learnable weights and biases) to different aspects or objects in the image, and differentiate one from another. CNNs can also classify speech or audio signal inputs.
CNN architecture is analogous to the connectivity pattern of the human brain, which consists of billions of neurons. More specifically, the arrangement of neurons in a CNN is inspired by the organization of the visual cortex, the area of the brain responsible for processing visual stimuli. CNNs are designed to automatically and adaptively learn spatial hierarchies of features (low- and high-level patterns) via a backpropagation algorithm. The mathematical construction of a CNN is typically composed of three types of layers, or building blocks: convolution, pooling, and fully connected layers. Convolution and pooling layers perform feature extraction, while a fully connected layer maps the extracted features to the final output, such as a classification.
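As an illustration of how these three building blocks fit together, the following PyTorch sketch stacks one convolution layer, one pooling layer, and one fully connected layer. The 28x28 grayscale input size, 16 kernels, and 10 output classes are illustrative assumptions rather than details from the text.

```python
import torch
import torch.nn as nn

class SimpleCNN(nn.Module):
    """Minimal CNN: convolution + pooling extract features,
    a fully connected layer maps them to class scores."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        # Convolution layer: learns 16 kernels over a 1-channel (grayscale) image
        self.conv = nn.Conv2d(in_channels=1, out_channels=16, kernel_size=3, padding=1)
        # Pooling layer: downsamples each feature map by a factor of 2
        self.pool = nn.MaxPool2d(kernel_size=2)
        # Fully connected layer: maps the extracted features to the final output
        self.fc = nn.Linear(16 * 14 * 14, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = torch.relu(self.conv(x))   # feature extraction
        x = self.pool(x)               # spatial downsampling
        x = x.flatten(start_dim=1)     # flatten feature maps for the classifier
        return self.fc(x)              # class scores

# Example: a batch of four 28x28 grayscale images
model = SimpleCNN()
scores = model(torch.randn(4, 1, 28, 28))
print(scores.shape)  # torch.Size([4, 10])
```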
The convolution layer, composed of a stack of mathematical operations, plays a critical role in CNNs and is what gives them their name. Images store pixel values in a two-dimensional grid or array. A small grid of parameters called a kernel is applied at each image position, allowing the network to efficiently detect features wherever they occur in the image. As one layer feeds its output to the next, the extracted features become hierarchically more complex. During training, the kernel parameters are optimized for a specific task by minimizing the difference between the network's outputs and the ground-truth labels, using backpropagation to compute gradients and an optimization algorithm such as gradient descent to update the parameters.
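To make the sliding-kernel idea concrete, the following NumPy sketch applies a hand-written 3x3 kernel at every valid position of a small image. In an actual CNN the kernel values would be learned by gradient descent rather than fixed by hand; the edge-detecting kernel and 8x8 image used here are illustrative assumptions.

```python
import numpy as np

def convolve2d(image: np.ndarray, kernel: np.ndarray) -> np.ndarray:
    """Slide a small kernel over every valid position of a 2D image
    and sum the element-wise products (no padding, stride 1)."""
    kh, kw = kernel.shape
    out_h = image.shape[0] - kh + 1
    out_w = image.shape[1] - kw + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i:i + kh, j:j + kw]   # local image region under the kernel
            output[i, j] = np.sum(patch * kernel)
    return output

# Example: a vertical-edge-detecting kernel applied to a random 8x8 "image"
image = np.random.rand(8, 8)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]])
features = convolve2d(image, kernel)
print(features.shape)  # (6, 6)
```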
The foundational research behind CNNs dates back to the work of Kunihiko Fukushima and Yann LeCun. In 1980, Fukushima published a paper titled "Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position." In 1989, LeCun successfully applied backpropagation to train neural networks to identify and recognize patterns within a series of handwritten zip codes. LeCun's work was described in his paper titled "Backpropagation Applied to Handwritten Zip Code Recognition."
However, the use of CNNs remained limited by the need for large training datasets and substantial computational resources. Early CNNs also only worked with low-resolution images because of their simple architectures. The first CNN to find significant use in computer vision tasks was AlexNet, released in 2012. A complex CNN trained on GPUs, AlexNet outperformed all existing non-neural models.
The most common computer vision and audio processing use cases for CNNs are in the following fields:
- Healthcare—Examples include examining thousands of visual reports to detect anomalous conditions in patients.
- Automotive—CNN technology is powering research into autonomous vehicles and self-driving cars.
- Social media—Social media platforms use CNNs to identify people in a user's photograph and help the user tag their friends.
- Retail—E-commerce platforms are incorporating visual search.
- Audio processing for virtual assistants—CNNs in virtual assistants learn and detect user-spoken keywords and process the input to guide their actions and respond to the user.