Computer vision started as an MIT undergraduate summer project in 1966 that was supposed to be done and dusted within the season. Although it did not pan out as planned, the technology has grown rapidly since then and now finds application in several industries.
Here we take a look at the basic (and sometimes more complex) terminology of this field.
A branch of tech which makes sense of visual content (images, videos, graphics). All visual content is basically a collection of pixel values; computer vision considers these pixel values and tries to understand what they signify or represent.
A method in which computers are taught by example.
In traditional programming, the “rules” have to be written by hand; the program then converts inputs to outputs by following those rules. In machine learning, you instead give the program varied examples of inputs and the desired outputs, and it learns the rules by itself through trial and error.
Neuron - aka Perceptron
A mathematical function that takes a number of inputs, multiplies each input by a learned weight [the weights change over time as the network learns], sums the results, and gives a single output value. This output is then fed as an input into other neurons.
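As a rough sketch (the function name and numbers below are our own, not from any library), a single neuron can be written in a few lines of Python:

```python
import math

def neuron(inputs, weights, bias):
    """One artificial neuron: a weighted sum of the inputs plus a bias,
    squashed through a sigmoid activation into a single value in (0, 1)."""
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))

# Three inputs, three weights, one bias -> one output value.
out = neuron([0.5, -1.2, 3.0], [0.4, 0.1, -0.2], bias=0.05)
```

During training, it is the weights and the bias that get adjusted, which is why they are called the model's parameters.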
An arrangement of neurons/perceptrons designed so that the network can learn the underlying patterns and relationships in a dataset. These networks loosely mimic the working of neurons in the human brain.
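Stacking such neurons into layers gives a minimal feed-forward network. The weights below are made up purely for illustration; a real network learns them from data:

```python
import math

def neuron(inputs, weights, bias):
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid activation

def layer(inputs, weight_rows, biases):
    """One layer: every neuron sees the same inputs but has its own weights."""
    return [neuron(inputs, w, b) for w, b in zip(weight_rows, biases)]

# A tiny 2-input -> 2-hidden -> 1-output network with made-up weights.
hidden = layer([1.0, 0.5], [[0.3, -0.4], [0.8, 0.2]], [0.0, 0.1])
output = layer(hidden, [[1.0, -1.0]], [0.0])[0]
```

Each neuron's output becomes an input to the next layer, which is exactly how the single-neuron definition above composes into a full network.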
The set of data and ground-truth outputs that are used to train a machine learning model. For instance, in the case of object detection, the set of data would comprise images and the ground truth would be the annotations that you want your model to learn.
Machine Learning model
A mathematical model that recognises certain types of patterns. You train a model on a set of data, providing it with an algorithm that it can use to learn from that data.
Did you know?
A neural network is a subclass of machine learning models.
Image / Video keywording
The ability to detect concrete and abstract content inside an image or video.
The ability to identify faces in images and videos and provide valuable information about them.
The mathematical representation of the salient qualities or features of an object in data, in the form of a list of numbers. This representation is then used for statistical analysis.
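To make this concrete, here is a toy sketch with made-up three-number "embeddings" (real feature vectors typically have hundreds of dimensions) showing how such lists of numbers can be compared, using cosine similarity:

```python
import math

def cosine_similarity(a, b):
    """Compare two feature vectors; values near 1.0 mean very similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

cat1 = [0.9, 0.1, 0.3]   # toy feature vectors for two cat photos
cat2 = [0.8, 0.2, 0.4]
car  = [0.1, 0.9, 0.0]   # and one car photo

cosine_similarity(cat1, cat2)  # high: similar objects
cosine_similarity(cat1, car)   # much lower: dissimilar objects
```

Because similar objects end up with similar vectors, this kind of comparison is what powers features such as similarity search further down this page.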
On-premises software is technology that is installed and run on devices on the premises of the individual or organization using the software, rather than at a remote facility such as a server farm or cloud.
Metrics are similar to a student’s marksheet: they’re used to evaluate the performance of a machine learning system. The most commonly used metrics are accuracy, precision, F1-score, and the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC).
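The first few of these can be computed directly from counts of correct and incorrect predictions. A small, self-contained sketch (our own helper, not part of any SDK):

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall and F1 for binary labels (1 = positive)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return accuracy, precision, recall, f1

# 5 ground-truth labels vs. 5 model predictions.
acc, prec, rec, f1 = classification_metrics([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```

Precision asks "of everything the model flagged, how much was right?", recall asks "of everything it should have flagged, how much did it find?", and F1 balances the two.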
The Mobius SDK provides nearly 11k keywords, aka concepts, out of the box. However, users may want to create new, highly specific concepts, so Mobius Labs provides the ability to train any number of them. This way, users are not limited to the predefined set of concepts.
An important concept in videos is that of ‘shots’. A shot is a sequence of frames in which the semantics (that is, the content) changes only slowly. In order to perform a meaningful analysis of a video, it is highly beneficial to identify so-called ‘video shot boundaries’, or ‘shot boundaries’ for short.
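The SDK's actual detection method isn't shown here, but a naive illustration of the idea is to flag a boundary wherever consecutive frames suddenly differ sharply:

```python
def shot_boundaries(frames, threshold=0.5):
    """Flag a shot boundary wherever the mean absolute pixel difference
    between consecutive frames jumps above a threshold.
    `frames` is a list of flat pixel lists with values in [0, 1]."""
    boundaries = []
    for i in range(1, len(frames)):
        diff = sum(abs(a - b) for a, b in zip(frames[i - 1], frames[i])) / len(frames[i])
        if diff > threshold:
            boundaries.append(i)  # frame i starts a new shot
    return boundaries

# Three near-identical dark frames, then an abrupt cut to bright frames.
video = [[0.1] * 4, [0.12] * 4, [0.1] * 4, [0.9] * 4, [0.88] * 4]
shot_boundaries(video)  # -> [3]
```

Production systems use far more robust comparisons (and handle gradual transitions such as fades), but the principle is the same: content changes slowly within a shot and abruptly at its boundary.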
The highlighting feature lets users obtain ‘highlight scores’ for video frames, which makes it possible to identify the most important or interesting parts of a video. This can be very useful, for example, for creating a summary of a video that can be shown while someone is browsing through a video database.
The Similarity search module of the Mobius Vision SDK allows users to find images that are visually similar to an input image.
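The SDK's actual interface isn't reproduced here; as an illustrative sketch (with made-up filenames and toy three-number feature vectors), similarity search amounts to ranking stored feature vectors by their closeness to the query image's vector:

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def most_similar(query, index, top_k=2):
    """Rank indexed images by cosine similarity of their feature vectors."""
    ranked = sorted(index.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [name for name, _ in ranked[:top_k]]

# Toy index mapping image names to their (hypothetical) feature vectors.
index = {
    "beach.jpg":  [0.9, 0.1, 0.2],
    "forest.jpg": [0.1, 0.9, 0.3],
    "coast.jpg":  [0.8, 0.2, 0.3],
}
most_similar([0.9, 0.1, 0.2], index)  # beach and coast rank highest
```

At real-world scale, the ranking is done with approximate nearest-neighbour indexes rather than an exhaustive sort, but the feature-vector comparison at the core is the same.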