Computer vision is a field of computer science that builds applications to extract relevant information from visual imagery, typically by training models on known examples. Computer vision deals with camera imaging geometry, image formation, depth perception, feature detection and matching, motion estimation and tracking, and classification.
Computer vision is a multidisciplinary field that can be considered a subfield of artificial intelligence and machine learning. Because of this, computer vision borrows and reuses techniques from a range of disparate engineering and computer science fields. Computer vision methods and systems are highly application dependent. Some systems stand alone to solve specific measurement or detection problems. Others are subsystems of a larger design, working within the larger system to control mechanical actuators, plan actions, maintain information databases, or drive human-machine interfaces. The specific implementation of a computer vision subsystem also depends on its intended functionality.
Early experiments in computer vision started in the 1950s. By the 1970s, computer vision was being used commercially to distinguish between typed and handwritten text, and the range of applications for computer vision has grown steadily since.
Applications of computer vision
In computer vision, image processing, and machine vision, recognition is the task of determining whether or not image data contains a specific object, feature, or activity. Computer vision for recognition includes tasks such as image classification (or identification), object localization, and object detection.
The best algorithms for recognition tasks are based on convolutional neural networks. These algorithms still struggle with objects that are small or thin, such as an ant on the stem of a flower or a person holding a narrow pen, and they have trouble with images that have been distorted with filters.
Specialized recognition tasks
In image classification, a computer vision model is tasked with sorting images into distinct categories. This is done by training the model on labelled examples of each image class, so that a learning algorithm can capture the visual appearance of each class.
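The training-on-examples idea above can be illustrated with a deliberately minimal classifier. This sketch is an assumption-laden toy, not the CNN-based approach the article describes: it represents each class by the mean of its training images (a nearest-centroid rule) and assigns a new image to the class whose centroid is closest in pixel space.

```python
import numpy as np

def train_centroids(images, labels):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        members = [img.ravel() for img, l in zip(images, labels) if l == label]
        centroids[label] = np.mean(members, axis=0)
    return centroids

def classify(image, centroids):
    """Assign the class whose centroid is nearest in pixel space."""
    flat = image.ravel()
    return min(centroids, key=lambda c: np.linalg.norm(flat - centroids[c]))

# Toy example: classify tiny images as "bright" or "dark".
images = [np.zeros((2, 2)), np.full((2, 2), 0.1),
          np.ones((2, 2)), np.full((2, 2), 0.9)]
labels = ["dark", "dark", "bright", "bright"]
centroids = train_centroids(images, labels)
print(classify(np.full((2, 2), 0.95), centroids))  # a near-white test image
```

Real classifiers replace raw pixel distances with learned features, but the train/predict split shown here is the same.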
The purpose of image classification is to automate the performance of a task. This can include labelling an image through tagging, locating an object in an image, or acting as part of a larger system such as one guiding an autonomous car. Image classification is also used in surveillance systems to detect threats, camera occlusions, and emergency situations. The technique has been used in facial recognition systems for biometrics, including iris recognition, and in robotics for automated systems.
In object localization, a computer vision system is trained to find objects within images, outputting bounding boxes and labels for individual objects. Typically a single dominant object is detected and placed in a bounding box for classification, separating the image content inside the box from everything outside it. This can be used for detecting all cars within an image for autonomous-driving models, or in surveillance for identifying specified objects in images.
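Localization systems are usually evaluated by how well a predicted bounding box overlaps the ground-truth box, measured as intersection over union (IoU). A minimal sketch of that standard computation, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region (empty if boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
```

A prediction is commonly counted as correct when its IoU with the ground truth exceeds a threshold such as 0.5.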
In object detection, a computer vision system will process image data for a specific condition. This includes the detection of possible abnormal cells or tissues in medical images or the detection of a vehicle in an automatic road toll system. Detection based on simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.
Newer image detection models include the You Only Look Once (YOLO) model, which uses a single neural network trained end to end: it takes a photograph as input and predicts bounding boxes and a class label for each bounding box. Another is the Fast R-CNN model (Fast Region-based Convolutional Neural Network), which improved on its predecessor in both training speed and detection speed. Its successor, Faster R-CNN, learns to propose and refine region proposals as part of the training process; this reduces the number of region proposals needed and accelerates the test-time operation of the model.
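Detectors like those above typically emit many overlapping candidate boxes for the same object, which are pruned with non-maximum suppression (NMS). As a hedged sketch of that standard post-processing step (not any particular library's implementation), each detection is a (box, score) pair and boxes are (x1, y1, x2, y2):

```python
def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression over (box, score) pairs."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    kept = []
    # Visit detections from highest to lowest confidence; keep a box only
    # if it does not heavily overlap an already-kept box.
    for box, score in sorted(detections, key=lambda d: -d[1]):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept

dets = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8), ((20, 20, 30, 30), 0.7)]
print(nms(dets))  # the 0.8 box is suppressed by the overlapping 0.9 box
```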
Object tracking refers to the use of computer vision systems to follow a specific object, or multiple objects of interest, in a given scene. It has traditionally been used in video and real-world interactions where observations are made following an initial object detection. Object tracking is useful for autonomous vehicle systems to understand the directionality of objects around the car. Object tracking methods can be divided by their observation model into generative methods and discriminative methods.
The generative method uses generative models to describe the appearance of the target and searches for the object by minimizing reconstruction error.
The discriminative method distinguishes between the object and the background; it is also referred to as tracking-by-detection. To achieve tracking-by-detection, the method uses deep learning to recognize the wanted object among candidates. It does this through two basic network models: stacked autoencoders (SAE) and convolutional neural networks (CNN).
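Whatever the detector, a tracker must still associate each new detection with an existing track across frames. A minimal sketch of that association step, assuming detections and tracks are reduced to 2D centroids and matched greedily by distance (real trackers use motion models and appearance features):

```python
import math

def match_tracks(tracks, detections, max_dist=50.0):
    """Greedily assign each detection to the nearest existing track centroid.

    tracks: {track_id: (x, y)}; detections: list of (x, y) centroids.
    Returns the updated tracks dict; unmatched detections start new tracks.
    """
    updated, free = {}, dict(tracks)
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        if free:
            tid = min(free, key=lambda t: math.dist(free[t], det))
            if math.dist(free[tid], det) <= max_dist:
                updated[tid] = det      # detection continues an existing track
                del free[tid]
                continue
        updated[next_id] = det          # too far from every track: new object
        next_id += 1
    return updated

tracks = {0: (0, 0), 1: (100, 100)}
print(match_tracks(tracks, [(2, 1), (98, 99), (500, 500)]))
```

The distant detection at (500, 500) exceeds `max_dist` and is registered as a new track.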
Object tracking tasks
The process of segmentation in computer vision divides images into pixel groupings, which can in turn be labelled and classified. Semantic segmentation in particular works to understand the role of each pixel in the image: it recognizes whether the pixels describe a car, a bike, a person, or a pole, and delineates the boundaries of each object. Unlike image classification, segmentation produces dense pixel-wise predictions from models.
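Because the predictions are dense, segmentation quality is usually scored per pixel and per class, most often with mean intersection over union across classes. A small sketch of that standard metric for integer label maps:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for dense pixel-wise predictions.

    pred, target: integer class-label arrays of the same shape.
    Classes absent from both arrays are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))  # one of four pixels disagrees
```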
Most segmentation is done through fully convolutional networks (FCNs), which provide architectures for dense prediction without fully connected layers. This allows segmentation maps to be generated for images of any size, and faster than with other approaches. FCN networks also use downsampling and upsampling inside the network to avoid the cost of running every layer at the original image resolution. The downsampling layers are known as strided convolutions, and the upsampling layers are known as transposed convolutions.
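The strided/transposed pair can be made concrete in one dimension. This is an illustrative sketch, not an FCN layer: the strided convolution shrinks the signal by moving the kernel `stride` samples per step, and the transposed convolution grows it back by scattering a scaled copy of the kernel for each input sample.

```python
import numpy as np

def strided_conv1d(x, kernel, stride=2):
    """Downsample: valid convolution that advances `stride` samples per step."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(0, len(x) - k + 1, stride)])

def transposed_conv1d(y, kernel, stride=2):
    """Upsample: each input sample scatters a scaled copy of the kernel."""
    k = len(kernel)
    out = np.zeros((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        out[i * stride:i * stride + k] += v * kernel
    return out

x = np.arange(8.0)
down = strided_conv1d(x, np.array([1.0, 1.0]))   # 8 samples -> 4
up = transposed_conv1d(down, np.array([1.0, 1.0]))  # 4 samples -> 8
print(len(down), len(up))
```

In an FCN the kernels are learned, but the size relations shown here are the same: stride 2 roughly halves the resolution on the way down and doubles it on the way up.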
Instance segmentation segments different instances of the same class, such as labelling five cars with five different colors. It works similarly to semantic segmentation, but whereas in classification there is generally a single object of focus and the task is to say what the image shows, an instance segmentation system locates the objects in the image with bounding boxes, classifies the different objects, identifies their boundaries, and captures their relations to each other.
Given one or more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case the model can be a set of 3D points; other methods produce complete 3D surface models. Advances in 3D imaging that does not require motion or scanning, together with the related image processing algorithms, have driven progress in scene reconstruction. Using scene reconstruction, a digital version of real-world objects can be developed.
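The simplest point-set model mentioned above can be produced directly from a depth image. As a hedged sketch assuming an idealized pinhole camera with focal lengths `fx`, `fy` and principal point `(cx, cy)` (all illustrative parameters), each pixel is back-projected to a 3D point:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into 3D points with a pinhole camera model:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth at (v, u)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel coordinates for every entry
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A flat 2x2 depth map, two units from a toy unit-focal-length camera.
points = depth_to_points(np.full((2, 2), 2.0), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points.shape)  # one 3D point per pixel
```

Multi-view methods estimate the depth map itself from image correspondences; the back-projection step stays the same.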
Part of computer vision, image restoration is a family of inverse problems for obtaining a high-quality image from a corrupted input image. The corruption may occur during the image-capture process (from signal noise or lens blur), during post-processing (from file compression), or from photography in non-ideal conditions (such as haze or motion blur). Computer vision systems restore such images by analyzing the image data in terms of local image structures, such as lines or edges, and controlling the filtering based on those structures, giving better image noise removal than simpler approaches.
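A simple concrete instance of structure-aware filtering is the median filter: unlike a plain mean blur, the median within each window discards outlier pixel values, so impulse ("salt") noise is removed while step edges survive largely intact. A minimal sketch:

```python
import numpy as np

def median_filter(image, size=3):
    """Remove impulse noise with a sliding median window.

    The median rejects isolated outliers, so it preserves edges better
    than averaging over the same window.
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")  # replicate borders
    h, w = image.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

noisy = np.zeros((5, 5))
noisy[2, 2] = 255.0                 # a single "salt" pixel
print(median_filter(noisy)[2, 2])   # the outlier is removed
```

The structure-adaptive restoration methods the article describes go further, steering the filter along detected lines and edges rather than using a fixed window.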