Computer vision is a field of computer science that builds applications to extract relevant information from visual imagery, typically by training models on known examples. Computer vision deals with camera imaging geometry, image formation, depth perception, feature detection and matching, motion estimation and tracking, and classification.
Computer vision is a multidisciplinary field that can be considered a subfield of artificial intelligence and machine learning. Because of this, computer vision borrows and reuses techniques from a range of disparate engineering and computer science fields. Computer vision methods and systems are highly application dependent. Some systems stand alone to solve specific measurement or detection problems. Others are subsystems of a larger design, working within the larger system to control mechanical actuators, plan actions, maintain information databases, or drive human-machine interfaces. The specific implementation of a computer vision subsystem also depends on its intended functionality.
Early experiments in computer vision started in the 1950s. By the 1970s, computer vision was being used commercially to distinguish between typed and handwritten text, and the range of applications for computer vision has grown steadily since.
Applications of computer vision
In computer vision, image processing, and machine vision, recognition is the task of determining whether or not image data contains a specific object, feature, or activity. Computer vision for recognition includes tasks such as image classification (or identification), object localization, and object detection.
The best algorithms for recognition tasks are based on convolutional neural networks. These algorithms still struggle with objects that are small or thin, such as an ant on the stem of a flower or a person holding a narrow pen, and they have trouble with images that have been distorted with filters.
Specialized recognition tasks
In image classification, a computer vision model is tasked with sorting images into distinct categories. This is done by training the model on labelled examples of each image class, so that a learning algorithm can capture the visual appearance of each class.
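The training-on-examples idea above can be illustrated with a deliberately minimal classifier. This sketch is an assumption-laden toy, not the CNN-based approach the article describes: it represents each class by the mean of its training images (a nearest-centroid rule) and assigns a new image to the class whose centroid is closest in pixel space.

```python
import numpy as np

def train_centroids(images, labels):
    """Compute one mean feature vector (centroid) per class label."""
    centroids = {}
    for label in set(labels):
        members = [img.ravel() for img, l in zip(images, labels) if l == label]
        centroids[label] = np.mean(members, axis=0)
    return centroids

def classify(image, centroids):
    """Assign the class whose centroid is nearest in pixel space."""
    flat = image.ravel()
    return min(centroids, key=lambda c: np.linalg.norm(flat - centroids[c]))

# Toy example: classify tiny images as "bright" or "dark".
images = [np.zeros((2, 2)), np.full((2, 2), 0.1),
          np.ones((2, 2)), np.full((2, 2), 0.9)]
labels = ["dark", "dark", "bright", "bright"]
centroids = train_centroids(images, labels)
print(classify(np.full((2, 2), 0.95), centroids))  # a near-white test image
```

Real classifiers replace raw pixel distances with learned features, but the train/predict split shown here is the same.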
The purpose of image classification is to automate the performance of a task. This can include labelling an image through tagging, locating an object in an image, or acting as part of a larger system such as one guiding an autonomous car. Image classification is also used in surveillance systems to detect threats, camera occlusions, and emergency situations. The technique has been used in facial recognition systems for biometrics, including iris recognition, and in robotics for automated systems.
In object localization, a computer vision system is trained to find objects within images, outputting bounding boxes and labels for individual objects. Typically a single dominant object is detected and placed in a bounding box for classification, separating the image content inside the box from everything outside it. This can be used for detecting all cars within an image for autonomous-driving models, or in surveillance for identifying specified objects in images.
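Localization systems are usually evaluated by how well a predicted bounding box overlaps the ground-truth box, measured as intersection over union (IoU). A minimal sketch of that standard computation, with boxes given as (x1, y1, x2, y2) corner coordinates:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    # Corners of the overlapping region (empty if boxes are disjoint).
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union else 0.0

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # partial overlap
```

A prediction is commonly counted as correct when its IoU with the ground truth exceeds a threshold such as 0.5.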
In object detection, a computer vision system will process image data for a specific condition. This includes the detection of possible abnormal cells or tissues in medical images or the detection of a vehicle in an automatic road toll system. Detection based on simple and fast computations is sometimes used for finding smaller regions of interesting image data which can be further analyzed by more computationally demanding techniques to produce a correct interpretation.
Newer image detection models include the You Only Look Once (YOLO) model, which uses a single neural network trained end to end: it takes a photograph as input and predicts bounding boxes and a class label for each bounding box. Another is the Fast R-CNN model (Fast Region-based Convolutional Neural Network), which improved on its predecessor in both training speed and detection speed. Its successor, Faster R-CNN, learns to propose and refine region proposals as part of the training process; this reduces the number of region proposals needed and accelerates the test-time operation of the model.
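Detectors like those above typically emit many overlapping candidate boxes for the same object, which are pruned with non-maximum suppression (NMS). As a hedged sketch of that standard post-processing step (not any particular library's implementation), each detection is a (box, score) pair and boxes are (x1, y1, x2, y2):

```python
def nms(detections, iou_threshold=0.5):
    """Greedy non-maximum suppression over (box, score) pairs."""
    def iou(a, b):
        ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
        ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
        inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
        union = ((a[2] - a[0]) * (a[3] - a[1])
                 + (b[2] - b[0]) * (b[3] - b[1]) - inter)
        return inter / union if union else 0.0

    kept = []
    # Visit detections from highest to lowest confidence; keep a box only
    # if it does not heavily overlap an already-kept box.
    for box, score in sorted(detections, key=lambda d: -d[1]):
        if all(iou(box, k[0]) < iou_threshold for k in kept):
            kept.append((box, score))
    return kept

dets = [((0, 0, 10, 10), 0.9), ((1, 1, 10, 10), 0.8), ((20, 20, 30, 30), 0.7)]
print(nms(dets))  # the 0.8 box is suppressed by the overlapping 0.9 box
```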
Object tracking refers to the use of computer vision systems to follow a specific object, or multiple objects of interest, in a given scene. It has traditionally been used in video and real-world interactions where observations are made following an initial object detection. Object tracking is useful for autonomous vehicle systems to understand the directionality of objects around the car. Object tracking methods can be divided by their observation model into generative methods and discriminative methods.
The generative method uses generative models to describe the appearance of the target and searches for the object by minimizing reconstruction error.
The discriminative method distinguishes between the object and the background; it is also referred to as tracking-by-detection. To achieve tracking-by-detection, the method uses deep learning to recognize the wanted object among candidates. It does this through two basic network models: stacked autoencoders (SAE) and convolutional neural networks (CNN).
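Whatever the detector, a tracker must still associate each new detection with an existing track across frames. A minimal sketch of that association step, assuming detections and tracks are reduced to 2D centroids and matched greedily by distance (real trackers use motion models and appearance features):

```python
import math

def match_tracks(tracks, detections, max_dist=50.0):
    """Greedily assign each detection to the nearest existing track centroid.

    tracks: {track_id: (x, y)}; detections: list of (x, y) centroids.
    Returns the updated tracks dict; unmatched detections start new tracks.
    """
    updated, free = {}, dict(tracks)
    next_id = max(tracks, default=-1) + 1
    for det in detections:
        if free:
            tid = min(free, key=lambda t: math.dist(free[t], det))
            if math.dist(free[tid], det) <= max_dist:
                updated[tid] = det      # detection continues an existing track
                del free[tid]
                continue
        updated[next_id] = det          # too far from every track: new object
        next_id += 1
    return updated

tracks = {0: (0, 0), 1: (100, 100)}
print(match_tracks(tracks, [(2, 1), (98, 99), (500, 500)]))
```

The distant detection at (500, 500) exceeds `max_dist` and is registered as a new track.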
Object tracking tasks
The process of segmentation in computer vision divides images into pixel groupings, which can in turn be labelled and classified. Semantic segmentation in particular works to understand the role of each pixel in the image: it recognizes whether the pixels describe a car, a bike, a person, or a pole, and delineates the boundaries of each object. Unlike image classification, segmentation produces dense pixel-wise predictions from models.
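Because the predictions are dense, segmentation quality is usually scored per pixel and per class, most often with mean intersection over union across classes. A small sketch of that standard metric for integer label maps:

```python
import numpy as np

def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union for dense pixel-wise predictions.

    pred, target: integer class-label arrays of the same shape.
    Classes absent from both arrays are skipped.
    """
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union:
            ious.append(inter / union)
    return float(np.mean(ious))

pred = np.array([[0, 0], [1, 1]])
target = np.array([[0, 1], [1, 1]])
print(mean_iou(pred, target, num_classes=2))  # one of four pixels disagrees
```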
Most segmentation is done through fully convolutional networks (FCNs), which provide architectures for dense prediction without fully connected layers. This allows segmentation maps to be generated for images of any size, and faster than with other approaches. FCN networks also use downsampling and upsampling inside the network to avoid the cost of running every layer at the original image resolution. The downsampling layers are known as strided convolutions, and the upsampling layers are known as transposed convolutions.
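The strided/transposed pair can be made concrete in one dimension. This is an illustrative sketch, not an FCN layer: the strided convolution shrinks the signal by moving the kernel `stride` samples per step, and the transposed convolution grows it back by scattering a scaled copy of the kernel for each input sample.

```python
import numpy as np

def strided_conv1d(x, kernel, stride=2):
    """Downsample: valid convolution that advances `stride` samples per step."""
    k = len(kernel)
    return np.array([np.dot(x[i:i + k], kernel)
                     for i in range(0, len(x) - k + 1, stride)])

def transposed_conv1d(y, kernel, stride=2):
    """Upsample: each input sample scatters a scaled copy of the kernel."""
    k = len(kernel)
    out = np.zeros((len(y) - 1) * stride + k)
    for i, v in enumerate(y):
        out[i * stride:i * stride + k] += v * kernel
    return out

x = np.arange(8.0)
down = strided_conv1d(x, np.array([1.0, 1.0]))   # 8 samples -> 4
up = transposed_conv1d(down, np.array([1.0, 1.0]))  # 4 samples -> 8
print(len(down), len(up))
```

In an FCN the kernels are learned, but the size relations shown here are the same: stride 2 roughly halves the resolution on the way down and doubles it on the way up.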
Instance segmentation segments different instances of the same class, such as labelling five cars with five different colors. It works similarly to semantic segmentation, but whereas in classification there is generally a single object of focus and the task is to say what the image shows, an instance segmentation system locates the objects in the image with bounding boxes, classifies the different objects, identifies their boundaries, and captures their relations to each other.
Given one or more images of a scene, or a video, scene reconstruction aims at computing a 3D model of the scene. In the simplest case the model can be a set of 3D points; other methods produce complete 3D surface models. Advances in 3D imaging that does not require motion or scanning, together with the related image processing algorithms, have driven progress in scene reconstruction. Using scene reconstruction, a digital version of real-world objects can be developed.
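The simplest point-set model mentioned above can be produced directly from a depth image. As a hedged sketch assuming an idealized pinhole camera with focal lengths `fx`, `fy` and principal point `(cx, cy)` (all illustrative parameters), each pixel is back-projected to a 3D point:

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a depth map into 3D points with a pinhole camera model:
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy,  Z = depth at (v, u)."""
    h, w = depth.shape
    v, u = np.mgrid[0:h, 0:w]          # pixel coordinates for every entry
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)

# A flat 2x2 depth map, two units from a toy unit-focal-length camera.
points = depth_to_points(np.full((2, 2), 2.0), fx=1.0, fy=1.0, cx=0.0, cy=0.0)
print(points.shape)  # one 3D point per pixel
```

Multi-view methods estimate the depth map itself from image correspondences; the back-projection step stays the same.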
Part of computer vision, image restoration is a family of inverse problems for obtaining a high-quality image from a corrupted input image. The corruption may occur during the image-capture process (from signal noise or lens blur), during post-processing (from file compression), or from photography in non-ideal conditions (such as haze or motion blur). Computer vision systems restore such images by analyzing the image data in terms of local image structures, such as lines or edges, and controlling the filtering based on those structures, giving better image noise removal than simpler approaches.
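A simple concrete instance of structure-aware filtering is the median filter: unlike a plain mean blur, the median within each window discards outlier pixel values, so impulse ("salt") noise is removed while step edges survive largely intact. A minimal sketch:

```python
import numpy as np

def median_filter(image, size=3):
    """Remove impulse noise with a sliding median window.

    The median rejects isolated outliers, so it preserves edges better
    than averaging over the same window.
    """
    pad = size // 2
    padded = np.pad(image, pad, mode="edge")  # replicate borders
    h, w = image.shape
    out = np.empty((h, w), dtype=float)
    for i in range(h):
        for j in range(w):
            out[i, j] = np.median(padded[i:i + size, j:j + size])
    return out

noisy = np.zeros((5, 5))
noisy[2, 2] = 255.0                 # a single "salt" pixel
print(median_filter(noisy)[2, 2])   # the outlier is removed
```

The structure-adaptive restoration methods the article describes go further, steering the filter along detected lines and edges rather than using a fixed window.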