Idemetric Processing

Flexible processor architecture for idemetric perception

Intesym Ltd. has developed a reconfigurable software processor based on its Idemetrics technology for performing real-time full-colour multi-dimensional data processing, such as image recognition, image similarity, image categorisation, movie scene analysis & modification, and scene understanding.

The Idemetric Processor is highly parallel and efficient, and its internal units can be dynamically configured in myriad ways to process both static and dynamic data, either to produce a processed output of that data (absolute proximal stimulus) or to compare it against databases of other data (relative proximal stimulus) for autonomous decision making. In addition, it can operate in either closed-loop or open-loop mode as desired, with feedback-driven reconfiguration and semantic analysis available for additional intelligent behaviour.

An internal receptive field network exists within the processor, and larger networks can be built by connecting multiple processors together, forming a counterpart to biological vision systems.

An example of the processor architecture is seen in the Intesym Image Similarity Server, which operates as a web-based search engine and can not only measure overall similarity of a query against samples in a database but can also extract and identify objects within the query scene.

Cortica is a microchip embodiment of the architecture, aimed primarily at high-speed similarity searches through large databases, with a single energy-efficient chip offering throughputs of dozens of images per second, each image being compared against a database of millions of samples. Multiple Cortica processors can be combined to multiply the speed or increase the comprehensiveness of the searching.
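As a rough illustration of how combining several processors could multiply search speed, the sketch below splits a database into shards, searches each shard independently, and merges the partial results. It is only a minimal sketch under stated assumptions: the names `sharded_search` and `search_shard`, the list-of-floats signature format, and the distance function are all hypothetical, and a thread pool merely stands in for separate processors. How Cortica units are actually combined is not described here.

```python
import heapq
from concurrent.futures import ThreadPoolExecutor
from typing import List, Sequence, Tuple

Sample = Tuple[str, Sequence[float]]  # (name, signature) - assumed format
Match = Tuple[float, str]             # (distance, name)


def distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure: half the L1 distance between normalised signatures."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def search_shard(query: Sequence[float], shard: Sequence[Sample], k: int) -> List[Match]:
    """Return the k closest samples within one shard, nearest first."""
    scored = ((distance(query, sig), name) for name, sig in shard)
    return heapq.nsmallest(k, scored)


def sharded_search(query: Sequence[float], database: Sequence[Sample],
                   workers: int = 2, k: int = 3) -> List[Match]:
    """Search each shard on its own worker, then merge the partial results."""
    # Round-robin split: worker i receives samples i, i + workers, ...
    shards = [database[i::workers] for i in range(workers)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(lambda s: search_shard(query, s, k), shards))
    merged = [match for part in partials for match in part]
    return heapq.nsmallest(k, merged)
```

Because every shard returns its own k best candidates, the merged result is the same as a single search over the whole database; only the work is divided.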

Whilst the Idemetric Processor has been developed primarily for image processing, its power and flexibility also make it suited to many other forms of data and signals, such as audio, music, tactile, smell, and motion stimuli. In addition, the arrangement of the Idemetric Processor, especially through the hardware-accelerated features of Cortica, is ideal for artificial intelligence applications and algorithmic complexity investigations.

Idemetric: Idem ‘same’ + Metric ‘measure’

Idemetrics is a very flexible technology and has many uses, including:

Taxonomy: Identifying the species of plants and animals; discerning between spiral, barred, and elliptical galaxies.
Sorting: Assigning images to categories based on their content, such as people, cars, animals, landscapes.
Distribution: Determining the distribution of objects within a scene, such as locating and following a crowd.
Density: Determining the density or quantity of objects within a scene, such as how busy a street is, or measuring bacterial cultures.
Geography: Matching landscapes; identifying changes in landscapes.
Cartography: Matching OS-style maps and road-maps with aerial photographs.
OCR: Number-plate recognition; road-sign reading; font recognition; reading handwriting; spell-checking; interpreting sign-language.
Stylistic neutrality: Comparing images of similar things in different styles, such as matching photographs against sketches, clip-art, or “photo-fits”.
Image cleansing: Automatic removal of unwanted parts of an image (e.g. background stars in photographs of nebulae).
Scene understanding: Identifying objects; identifying objects within objects; identifying errors in a scene; identifying image faults.

The nature of image recognition

Image recognition is an internal process operating on an object in a scene:

  • Observer can determine an object using only prior knowledge of what to look for.
  • Objects recognised according to the discovery of expected characteristics.
  • Algorithms designed to match visual data against expectations.
  • Often requires a different algorithm for each class of object to be recognised.
  • If there is insufficient prior knowledge then recognition fails.
  • If the context is unknown or the image is ambiguous then recognition fails.

Recognition is most suited to situations where objects within a scene need to be identified independently of the overall content of the scene. It is usually performed by searching the image for characteristic elements of known objects, such as shape or colour. Algorithms for specific applications, such as finding people or car registration numbers, are common, effective, and robust.
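To make the idea of searching for a characteristic element concrete, here is a minimal sketch of an object-specific recogniser that relies on prior knowledge of an expected colour. The function names, the nested-list image format, and the thresholds are assumptions chosen for illustration and are not taken from any Intesym product.

```python
from typing import Callable, List, Optional, Tuple

Pixel = Tuple[int, int, int]   # (R, G, B), each 0..255
Image = List[List[Pixel]]      # rows of pixels - assumed format
Box = Tuple[int, int, int, int]  # (top, left, bottom, right), inclusive


def looks_red(pixel: Pixel) -> bool:
    """Prior knowledge: the sought object is strongly red (assumed thresholds)."""
    r, g, b = pixel
    return r > 180 and g < 80 and b < 80


def find_by_colour(image: Image, is_target: Callable[[Pixel], bool],
                   min_pixels: int = 4) -> Optional[Box]:
    """Return the bounding box of matching pixels, or None if recognition fails."""
    hits = [(y, x)
            for y, row in enumerate(image)
            for x, pixel in enumerate(row)
            if is_target(pixel)]
    if len(hits) < min_pixels:
        return None  # the image does not meet the algorithm's criteria
    ys = [y for y, _ in hits]
    xs = [x for _, x in hits]
    return (min(ys), min(xs), max(ys), max(xs))
```

A different class of object would need a different predicate, or a different algorithm altogether, which is the dependence on prior knowledge noted in the list above.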

Since almost any image analysis task can be expressed by algorithms working on prior knowledge of what to look for in an image, the term ‘image recognition’ can span almost any analysis function. By extension, the same principle can be applied to other forms of data such as audio or other arbitrary data-sets; for example, to identify a specific rendition of a musical score.

Pros:

  • Good accuracy if algorithm is robust.
  • Can be made to suit almost any application.
  • No database.
  • Low resource requirements.

Cons:

  • Requires multiple object-specific algorithms.
  • Speed varies according to each algorithm.
  • Requires high computational power.
  • Only as good as the prior knowledge.
  • Can fail if image does not meet algorithm criteria.
  • Not suited to arbitrary whole-scene comparisons.

The nature of image similarity

Image similarity is an external process operating over a whole scene:

  • Observer compares two scenes against each other.
  • Calculates the probability that two scenes are similar.
  • Does not require any prior knowledge.
  • Uses a database of sample images of each class of scenes.
  • A single algorithm works with all classes of scenes according to database.
  • A scene containing a single object gives similar functionality to image recognition.

Similarity is most suited to situations where an imprecise determination is required of what a scene depicts, e.g. ‘Is this unknown painting most likely a Van Gogh or a Monet?’. It is usually performed by generating a suitable signature of an image, where the difference between the signatures of two images is a measure of the distance between the two images (a distance of 0 means the images are identical; a distance of 1 means they are completely different). It is not so suited to finding single objects within a scene unless the scene depicts only a single object.
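The signature-and-distance idea can be sketched as follows. This is only an illustration under stated assumptions: a coarse colour histogram stands in for the signature, and half the L1 difference stands in for the distance, so that 0 means identical signatures and 1 means completely different ones. The actual Idemetrics signature and metric are not described here, and both function names are hypothetical.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def colour_signature(pixels: Sequence[Pixel], bins_per_channel: int = 4) -> List[float]:
    """Assumed signature: a normalised, coarse RGB histogram that sums to 1."""
    if not pixels:
        raise ValueError("image has no pixels")
    n = bins_per_channel
    hist = [0.0] * (n ** 3)
    for r, g, b in pixels:
        # Map each 0..255 channel value to a bin index 0..n-1.
        idx = ((r * n // 256) * n + (g * n // 256)) * n + (b * n // 256)
        hist[idx] += 1.0
    total = float(len(pixels))
    return [count / total for count in hist]


def signature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Half the L1 difference of two normalised signatures, in the range 0..1."""
    if len(a) != len(b):
        raise ValueError("signatures must have the same length")
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))
```

Since the distance is computed on the signatures alone, any data source that can be reduced to a normalised vector of the same length could be compared in the same way.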

Since the calculation of similarity is performed on a signature, not on the visual image, then in principle the signature can be derived from any data source, allowing the single algorithm to operate not only on images, but also on audio or other arbitrary data sets; for example, to compare the similarity of musical performances, or even to synaesthetically measure the similarity of an image to a piece of music.

Pros:

  • Needs only a single algorithm for comparison of signatures.
  • Requires no prior knowledge of objects.
  • New objects can be found by adding samples to a database.
  • Database samples can be of disparate nature, e.g. photos, sketches, clip-art.
  • Is tolerant of variations between images.
  • Can be applied to an entire arbitrary scene.
  • Has low computational requirements.
  • Speeds of millions of comparisons per second, independent of scene complexity.

Cons:

  • No single interpretation of what constitutes similarity.
  • Requires context-dependent judgements of what is important.
  • May need multiple feature-extraction algorithms.
  • Quality of result depends on the quality of the database.
  • Not directly suited to identifying objects within a scene.
  • Has moderate resource requirements.

Capability Comparison

Measure               | Recognition         | Similarity
Conceptual nature     | Analytical          | Intuitive
Applicability         | Specific inspection | General impression
Basis of capabilities | Prior knowledge     | Database of samples
Complexity            | High                | Low
Number of algorithms  | Many                | One
Speed                 | Low                 | High
Accuracy              | High; Objective     | Good; Subjective
Resources used        | Low ~ Medium        | Medium ~ High
Computing power       | Medium ~ High       | Low

Adaptive receptive fields with reconfigurable Idemetric Processors

Intesym Idemetric Processors form a stimulus receptive field network with optional feedback paths, and multiple processors can be combined to expand the size of such a field to support more complex processing requirements. Such a network resembles biological receptive fields; for example, a visual perception system would be comparable to the field hierarchy running from the retina (image input) to the extrastriate visual areas (useful output).

The input of such a field can be multi-dimensional and is typically an image, but could also be audio or another signal, and the output is a conceptual representation of the input. The output could be as simple as percentage similarities of the query relative to the contents of a database for image recognition, or it could be as complex as representing an “understanding” of a complete scene for a roving robot. Feedback paths can exist within the field to allow for the resolution of ambiguities or for refining processing according to subsequent discoveries. The feedback mechanism can also provide a dynamic memory for the processing of real-time signals or motion pictures.
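One simple way to picture feedback acting as a dynamic memory is an exponentially decaying blend of successive frame signatures, where the previous output is fed back and mixed with each new input. This is a minimal sketch and not the processor's actual mechanism: the class name `FeedbackMemory`, the `decay` parameter, and the list-of-floats signature format are assumptions.

```python
from typing import List, Optional, Sequence


class FeedbackMemory:
    """Blend each new signature with the fed-back previous state (assumed model)."""

    def __init__(self, decay: float = 0.5) -> None:
        if not 0.0 <= decay < 1.0:
            raise ValueError("decay must be in the range [0, 1)")
        self.decay = decay
        self.state: Optional[List[float]] = None

    def update(self, signature: Sequence[float]) -> List[float]:
        """Feed one frame's signature in and return the remembered state."""
        if self.state is None:
            # The first frame initialises the memory directly.
            self.state = list(signature)
        else:
            # Feedback path: the old state decays while the new input is mixed in.
            self.state = [self.decay * old + (1.0 - self.decay) * new
                          for old, new in zip(self.state, signature)]
        return list(self.state)
```

A higher `decay` makes the memory longer-lived, so a brief change in one frame of a motion picture has less effect on the remembered state.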

Web-based image search engine

Through a convenient AJAX-based interface, the similarity server provides a search engine in which a user selects a database and submits a query image, and the most similar images within the database are returned. Databases of millions of images are handled easily, with searches through such databases completing in as little as one second.
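The query flow described above can be sketched as a ranking of database signatures by their distance from the query, reported as percentage similarities. This is a hedged illustration only: the function name `rank_by_similarity`, the in-memory dictionary standing in for a selected database, and the half-L1 distance are assumptions, not the server's actual interface or metric.

```python
import heapq
from typing import Dict, List, Sequence, Tuple


def signature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure: half the L1 difference of normalised signatures (0..1)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def rank_by_similarity(query: Sequence[float],
                       database: Dict[str, Sequence[float]],
                       k: int = 3) -> List[Tuple[str, float]]:
    """Return the k most similar entries as (name, similarity in percent)."""
    scored = ((signature_distance(query, sig), name)
              for name, sig in database.items())
    best = heapq.nsmallest(k, scored)  # smallest distance first
    return [(name, round(100.0 * (1.0 - dist), 1)) for dist, name in best]
```

Using `heapq.nsmallest` avoids sorting the whole database when only the few best matches are wanted.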

Applications

Of course, there is the question: “What constitutes similarity?” The answer is not as simple as might at first be assumed. Is Snoopy more similar to a real beagle than he is to Garfield? It depends on whether the measure of similarity relates to what Snoopy is (a cartoon drawing) or what he represents (a beagle). As such, truly generic similarity is immeasurable by any single method. To cope with this dilemma, the similarity server uses a reconfigurable Idemetric Processor to allow different types of images to be measured for similarity in different ways.

Once configured for a particular type of image, all processing of queries is automatic without any user decisions or manipulations, and all required configurations can be stored and automatically applied on-the-fly by the processor according to the type, content, and context of the image presented to it, thus making the system very autonomous.
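The notion of stored configurations being selected automatically by image type can be sketched as a lookup keyed on a detected type. Everything in this sketch is hypothetical: the `ProcessorConfig` fields, the two example types, and the crude distinct-colour heuristic are assumptions for illustration and do not describe the real configuration options of the Idemetric Processor.

```python
from dataclasses import dataclass
from typing import Dict, Sequence, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B)


@dataclass(frozen=True)
class ProcessorConfig:
    """Hypothetical stored configuration for one type of image."""
    name: str
    use_colour: bool
    use_edges: bool


STORED_CONFIGS: Dict[str, ProcessorConfig] = {
    "photograph": ProcessorConfig("photograph", use_colour=True, use_edges=False),
    "sketch": ProcessorConfig("sketch", use_colour=False, use_edges=True),
}


def detect_image_type(pixels: Sequence[Pixel], max_sketch_colours: int = 4) -> str:
    """Crude assumed heuristic: very few distinct colours suggests a sketch."""
    return "sketch" if len(set(pixels)) <= max_sketch_colours else "photograph"


def select_config(pixels: Sequence[Pixel]) -> ProcessorConfig:
    """Pick the stored configuration matching the detected image type."""
    return STORED_CONFIGS[detect_image_type(pixels)]
```

In this arrangement the user never chooses a configuration; the choice follows from the image itself, which is the autonomy described above.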

Web-client screenshots

Interface

Using a normal web-browser, a query image is submitted to the server and the results returned in the main panel.

The Intesym Idemetric Processor used by the server can be either a software-based implementation or a hardware-based Cortica unit.

Scene similarity

In this example, a picture of three golden dragons in front of a hedge is submitted. From the database, the most similar images are found to be a copy of the submitted image (best match), a photograph of a single dragon (second best match), and a photograph of a tree hedge (third best match). Other photographs in the database were quite different to these and so followed with lower probabilities of similarity.

Object taxonomy

In this example, a picture of a spiral galaxy is submitted. The best results are primarily spiral galaxies, with elliptical shapes following. The database contained many images of galaxies, clusters, and nebulae. This demonstrates the ability of the system to perform purely visual taxonomic discrimination, where the results accord with the category of the scene instead of being ranked by the similarity of detail.
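One common way to turn similarity rankings into a taxonomic category is a nearest-neighbour vote over labelled database samples. The sketch below swaps in that well-known technique purely for illustration; it is not stated to be the method used by the server, and the function name `classify`, the labels, and the signature format are assumptions.

```python
from collections import Counter
from typing import List, Sequence, Tuple

Labelled = Tuple[str, Sequence[float]]  # (category label, signature)


def signature_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Assumed measure: half the L1 difference of normalised signatures (0..1)."""
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))


def classify(query: Sequence[float], samples: List[Labelled], k: int = 3) -> str:
    """Return the most common category among the k nearest labelled samples."""
    if not samples:
        raise ValueError("database has no samples")
    nearest = sorted(samples, key=lambda s: signature_distance(query, s[1]))[:k]
    votes = Counter(label for label, _ in nearest)
    # On a tied vote, the label seen first (i.e. the nearer sample) wins.
    return votes.most_common(1)[0][0]
```

Adding a new category then only requires adding labelled samples to the database, with no change to the algorithm.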