Deep learning is a subset of machine learning based on neural networks that process data in a way inspired by the human brain. In traditional computing, a program directs the computer to perform a task by following explicit, step-by-step instructions. In deep learning, the computer is not explicitly told how to solve a particular task. Instead, a learning algorithm extracts patterns that relate the input data to the desired output. Deep learning models can learn to perform classification tasks by recognizing complex patterns in various forms of data, such as images, text, and audio. They can also automate tasks that previously required human intelligence, such as describing images or transcribing audio files.
The use of deep learning has grown significantly with the development of deeply layered neural networks, the use of GPUs to accelerate execution, and access to large training datasets. Deep learning technology drives many AI applications across a range of fields, including computer vision, natural language processing, speech engines, recommendation engines, digital assistants, text generation, and industrial automation, as well as emerging technologies such as self-driving cars and virtual reality.
Deep learning can be defined as a neural network with three or more layers. While a single-layer neural network can make approximate predictions, adding hidden layers helps refine the analysis and improve the accuracy of the output. Neural networks attempt to simulate the behavior of the human brain, allowing deep learning models to "learn" from large amounts of data. The name "deep learning" refers to this depth: the networks learn from data through many stacked layers.
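As a rough illustration of this layered structure, the sketch below passes data through a small network with two hidden layers. The layer sizes, the ReLU activation, and the random weights are illustrative assumptions rather than a reference implementation.

```python
import numpy as np

def relu(x):
    # Nonlinear activation applied between layers
    return np.maximum(0.0, x)

def forward(x, params):
    # Forward pass: input layer -> two hidden layers -> output layer
    (W1, b1), (W2, b2), (W3, b3) = params
    h1 = relu(x @ W1 + b1)      # first hidden layer
    h2 = relu(h1 @ W2 + b2)     # second hidden layer
    return h2 @ W3 + b3         # output layer (raw scores)

rng = np.random.default_rng(0)
# Illustrative sizes: 4 input features, two hidden layers of 8 units, 3 output classes
params = [(rng.normal(size=(4, 8)) * 0.1, np.zeros(8)),
          (rng.normal(size=(8, 8)) * 0.1, np.zeros(8)),
          (rng.normal(size=(8, 3)) * 0.1, np.zeros(3))]

x = rng.normal(size=(2, 4))      # a batch of two example inputs
print(forward(x, params).shape)  # (2, 3): one score per class for each example
```

Each call composes the layers in sequence, so every additional hidden layer adds another transformation between the raw input and the final prediction.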
Deep learning differs from classical machine learning in the type of data it uses and the methods by which it learns. Classical machine learning leverages structured, labeled data to make predictions: specific features are defined in the input data for the model, and using unstructured data generally requires some form of pre-processing to organize it. Deep learning reduces the need for this pre-processing, allowing models to ingest and process unstructured data directly. Deep learning automates feature extraction and removes some of the reliance on human experts. For example, a deep learning algorithm could identify the features needed to classify images in a dataset without human intervention. Then, using processes such as gradient descent and backpropagation, the algorithm adjusts its weights to improve accuracy. In contrast, classical machine learning would require a hierarchy of features defined manually by a human expert.
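That adjustment process can be sketched for a toy network. Everything below (the synthetic data, the single hidden layer, the learning rate, and the step count) is an illustrative assumption, chosen only to show gradient descent driven by backpropagated errors.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(64, 3))                  # toy inputs with 3 features
y = (X[:, :1] + X[:, 1:2] > 0).astype(float)  # made-up binary labels, shape (64, 1)

W1, b1 = rng.normal(size=(3, 5)) * 0.1, np.zeros((1, 5))
W2, b2 = rng.normal(size=(5, 1)) * 0.1, np.zeros((1, 1))
lr = 0.5                                      # illustrative learning rate

for step in range(201):
    # Forward pass: one hidden layer with tanh, sigmoid output
    h = np.tanh(X @ W1 + b1)
    p = 1.0 / (1.0 + np.exp(-(h @ W2 + b2)))
    loss = np.mean((p - y) ** 2)              # mean squared error

    # Backpropagation: apply the chain rule layer by layer
    dp = 2.0 * (p - y) / len(X)
    dz2 = dp * p * (1.0 - p)
    dW2, db2 = h.T @ dz2, dz2.sum(axis=0, keepdims=True)
    dz1 = (dz2 @ W2.T) * (1.0 - h ** 2)
    dW1, db1 = X.T @ dz1, dz1.sum(axis=0, keepdims=True)

    # Gradient descent: nudge every weight against its error gradient
    W1 -= lr * dW1; b1 -= lr * db1
    W2 -= lr * dW2; b2 -= lr * db2

    if step % 50 == 0:
        print(f"step {step}: loss {loss:.4f}")  # loss should shrink over iterations
```

The network is never told which combination of features matters; it discovers that from the data as the gradient updates accumulate.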
Deep learning neural networks are constructed from multiple layers of software nodes (artificial neurons) that mimic the interconnected neurons of the human brain. Deep learning models learn by example, building a hierarchy of abstractions in which each level is created using knowledge gained from the preceding layer. Each layer builds upon the last to refine and optimize predictions and classifications. A deep learning model applies nonlinear transformations to its input to produce a statistical model as output, and iterations continue until the output reaches an acceptable level of accuracy. Achieving this level requires a model trained on a large dataset and significant processing power.
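Written out, these stacked nonlinear transformations take roughly the following form; the generic activation and the squared-error loss shown here are one common convention rather than something fixed by the article.

```latex
h^{(0)} = x, \qquad
h^{(l)} = \sigma\!\left(W^{(l)} h^{(l-1)} + b^{(l)}\right), \quad l = 1, \dots, L,
\qquad \hat{y} = h^{(L)},
\qquad
\mathcal{L} = \frac{1}{N} \sum_{i=1}^{N} \bigl\lVert \hat{y}_i - y_i \bigr\rVert^{2}
```

Training repeatedly adjusts the weights W and biases b to reduce the loss until it falls to an acceptable level.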
Deep learning isn't a single approach; it is a class of algorithms that can be applied to a broad spectrum of problems. There are many architectures and algorithms used in deep learning. These can be divided into supervised deep learning (convolutional neural networks, recurrent neural networks, long short-term memory networks, and gated recurrent units) and unsupervised deep learning (self-organizing maps, autoencoders, and restricted Boltzmann machines).
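To make one of the unsupervised architectures concrete, the sketch below outlines an autoencoder: an encoder compresses the input into a small code and a decoder reconstructs it, with the reconstruction error serving as the training signal. The dimensions and activation are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)
# Illustrative autoencoder: 16-dimensional inputs squeezed through a 4-unit bottleneck
W_enc = rng.normal(size=(16, 4)) * 0.1
W_dec = rng.normal(size=(4, 16)) * 0.1

def autoencoder(x):
    code = np.tanh(x @ W_enc)     # encoder: compress input to a low-dimensional code
    recon = code @ W_dec          # decoder: reconstruct the original input from the code
    return code, recon

x = rng.normal(size=(8, 16))      # a batch of unlabeled examples
code, recon = autoencoder(x)
loss = np.mean((recon - x) ** 2)  # reconstruction error minimized during training
print(code.shape, recon.shape, round(loss, 3))
```

Because the target is the input itself, no human-provided labels are needed, which is what places autoencoders on the unsupervised side of the divide.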
French mathematician Adrien-Marie Legendre published what is now often referred to as a linear neural network in 1805; Johann Carl Friedrich Gauss was later also credited for similar unpublished work from 1795. Legendre's network consisted of two layers, an input layer and an output, with each input unit holding a number and connected to the output by a weight. The output is the sum of the products of the inputs and their weights. Given a training set of input vectors and a desired target value for each, the weights are adjusted so that the sum of squared errors between the outputs and the corresponding targets is minimized.
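In modern notation, the fitting procedure described here is ordinary least squares; the symbols below are a present-day convention rather than Legendre's own.

```latex
\hat{y}_i = \mathbf{w}^{\top} \mathbf{x}_i, \qquad
\mathbf{w}^{\star} = \arg\min_{\mathbf{w}} \sum_{i} \bigl( \mathbf{w}^{\top} \mathbf{x}_i - t_i \bigr)^{2}
```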
The first non-learning recurrent neural network architecture was introduced and analyzed by physicists Ernst Ising and Wilhelm Lenz in the 1920s; it settles into an equilibrium state in response to input conditions. Warren McCulloch and Walter Pitts proposed an artificial neuron (a computational model of the "nerve net" in the brain) in 1943. In 1958, American psychologist Frank Rosenblatt introduced the Perceptron, a device that mimicked the neural structure of the brain and demonstrated an ability to learn. His multilayer perceptrons (MLPs) had a non-learning first layer with randomized weights and an adaptive output layer. Although only the output layer could learn, Rosenblatt developed ideas that would go on to become extreme learning machines. In 1962, Rosenblatt also wrote about "back-propagating errors" in MLPs with a hidden layer, but he did not go on to develop a general deep-learning algorithm for them.
Soviet mathematician Alexey Ivakhnenko (with help from his associate V.G. Lapa) demonstrated successful learning in deep feedforward network architectures in 1965. Their work introduced the first general learning algorithms for deep MLPs with arbitrarily many hidden layers. They went on to publish a paper in 1971 describing a deep learning net with eight layers, trained by their method. Using a training set of input vectors with corresponding target output vectors, layers are incrementally grown and trained by regression analysis. Similar to later deep neural networks, Ivakhnenko's nets learned to create hierarchical, distributed, internal representations of incoming data.
In 1986, Geoffrey Hinton, along with colleagues David Rumelhart and Ronald Williams, published the back-propagation training algorithm as a method of training multilayer neural networks, although some in the field credit Finnish mathematician Seppo Linnainmaa, who described the underlying reverse-mode technique in 1970. Yann LeCun pioneered the use of neural networks for image recognition tasks, and in his 1998 paper, he defined the concept of convolutional neural networks, which mimic the human visual cortex. Earlier, in 1982, John Hopfield had popularized a form of recurrent neural network known as the "Hopfield network." Jürgen Schmidhuber and Sepp Hochreiter expanded on this line of recurrent-network research, introducing long short-term memory (LSTM) in 1997, an architecture that improved the ability of recurrent neural networks to capture long-range dependencies.
In 2000, Yoshua Bengio introduced high-dimensional word embeddings as a representation of word meaning. His group also introduced a form of attention mechanism that led to breakthroughs in machine translation. In 2012, Hinton and two of his students (Alex Krizhevsky and Ilya Sutskever) demonstrated the power of deep learning with significant results in the ImageNet competition. Their work was based on a dataset collated by Fei-Fei Li and others. Around the same time, Jeffrey Dean and Andrew Ng were making significant breakthroughs in large-scale image recognition at Google Brain.
Yoshua Bengio, Geoffrey Hinton, and Yann LeCun were the recipients of the 2018 ACM A.M. Turing Award for "conceptual and engineering breakthroughs that have made deep neural networks a critical component of computing."