News Releases

Toshiba’s Ultra-Fast Data Matching Technology is Over 50 Times Faster than its Predecessors

Advances big data analytics and can identify an individual in a database of 10 million images in 8.31 milliseconds
25 May, 2016

TOKYO -- Toshiba Corporation (TOKYO: 6502) has developed an ultra-fast data search and matching technology that is about 50 times faster than comparable systems. It can be applied to any data that can be represented as a high-dimensional vector, and its wide-ranging applications include big data analytics of large-scale media databases*1 and facial recognition. In experiments, the system identified a single individual among 5,800 people in a database of 10 million face images in only 8.31 milliseconds*2.

Advances in big data analysis continue to drive dramatic improvements in areas such as machine learning and failure prediction, bringing growing benefits to daily life. However, data volumes continue to grow exponentially, and analysis and recognition capabilities must accelerate to keep pace.

Toshiba’s technology builds indexes of high-dimensional feature data*3 extracted from objects, including complex, multi-faceted objects such as the human face, or representations of product sales patterns and stock prices over time. The index can then be searched for pattern matches at very high speed. This acceleration rests on three components, shown below.

Three components of the matching technology

•   Vector Coding Technology: encodes the features of objects as short vectors while keeping the difference from the original vectors as small as possible.

•   Vector Indexing Technology: identifies similar vectors without computing the distance between the query and each individual vector.

•   Pipeline Lookup Technology: combines a coarse lookup with a fine lookup.
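Toshiba does not describe how Vector Coding works internally. As a rough illustration of the first component only, the sketch below swaps in uniform 8-bit scalar quantization, a standard and much simpler compression technique: each dimension is stored in one byte instead of a 4-byte float, and the reconstruction error in each dimension is bounded by half a quantization step. The functions `fit_quantizer`, `encode` and `decode` and the random test data are hypothetical.

```python
import math
import random

# Hypothetical encoder: uniform 8-bit scalar quantization, one byte per dimension.
# This is NOT Toshiba's method; it only illustrates "short codes that stay
# close to the original vectors".

def fit_quantizer(vectors):
    """Learn the per-dimension value range from a sample of vectors."""
    dims = len(vectors[0])
    lo = [min(v[d] for v in vectors) for d in range(dims)]
    hi = [max(v[d] for v in vectors) for d in range(dims)]
    return lo, hi

def encode(vec, lo, hi):
    """Map each component to an integer 0..255, giving a compact byte code."""
    code = bytearray()
    for x, a, b in zip(vec, lo, hi):
        span = (b - a) or 1.0              # guard against constant dimensions
        q = round((x - a) / span * 255)
        code.append(min(255, max(0, q)))   # clamp values outside the fitted range
    return bytes(code)

def decode(code, lo, hi):
    """Reconstruct an approximate float vector from its byte code."""
    return [a + (c / 255) * ((b - a) or 1.0) for c, a, b in zip(code, lo, hi)]

random.seed(0)
data = [[random.gauss(0, 1) for _ in range(64)] for _ in range(500)]
lo, hi = fit_quantizer(data)
codes = [encode(v, lo, hi) for v in data]

# 64 dimensions -> 64 bytes, versus 256 bytes if stored as float32.
print(len(codes[0]))
# Worst-case reconstruction error over the sample (Euclidean distance).
print(max(math.dist(v, decode(c, lo, hi)) for v, c in zip(data, codes)))
```

Real vector-compression schemes typically go further (for example, learned codebooks), trading a little accuracy for much shorter codes.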

Vector Indexing Technology is an original Toshiba development. It groups similar vectors together, so that the group closest to a query vector can be identified rapidly. Because it does not need to compute the distance between the query and each individual vector, lookup is extremely fast.
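Toshiba has not published how Vector Indexing Technology forms its groups or how Pipeline Lookup divides the work. The sketch below substitutes random-hyperplane locality-sensitive hashing (LSH), a well-known and different technique, purely to illustrate the two ideas described here: the query is hashed directly to a group of similar vectors without measuring its distance to any stored vector (coarse lookup), and exact distances are computed only within that group (fine lookup). The class `HyperplaneIndex`, its parameters, and the random data are hypothetical.

```python
import math
import random
from collections import defaultdict

class HyperplaneIndex:
    """Hypothetical vector index using random-hyperplane LSH (not Toshiba's method).

    Each stored vector goes into a bucket keyed by which side of several
    random hyperplanes it falls on; vectors at a small angle to each other
    tend to share a bucket.
    """

    def __init__(self, dims, n_bits=8, seed=0):
        rng = random.Random(seed)
        self.planes = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(n_bits)]
        self.buckets = defaultdict(list)  # bucket key -> [(item_id, vector), ...]

    def _key(self, vec):
        # One bit per hyperplane: the sign of the projection onto its normal.
        return tuple(sum(p * x for p, x in zip(plane, vec)) >= 0 for plane in self.planes)

    def add(self, item_id, vec):
        self.buckets[self._key(vec)].append((item_id, vec))

    def search(self, query, k=1):
        # Coarse lookup: hash the query straight to one bucket, without
        # computing a distance to any stored vector.
        candidates = self.buckets.get(self._key(query), [])
        # Fine lookup: exact Euclidean distance, but only over that bucket.
        ranked = sorted(candidates, key=lambda item: math.dist(item[1], query))
        return [item_id for item_id, _ in ranked[:k]], len(candidates)

rng = random.Random(1)
dims = 32
data = [[rng.gauss(0, 1) for _ in range(dims)] for _ in range(5000)]
index = HyperplaneIndex(dims, n_bits=8)
for i, vec in enumerate(data):
    index.add(i, vec)

# Query with a very slightly perturbed copy of item 42.
query = [x + rng.gauss(0, 1e-4) for x in data[42]]
ids, n_checked = index.search(query)
print(ids, "found after exact distance checks on", n_checked, "of", len(data), "vectors")
```

A single hash table can miss a true neighbor that falls just across a hyperplane into a different bucket; practical LSH systems reduce this risk by using several tables or by probing neighboring buckets.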

Toshiba initially intends to apply the technology in three areas: pattern mining, media recognition and big data analysis. For example, pattern mining would allow a particular person to be identified almost instantly among a large set of images taken by surveillance cameras, while media recognition could be used to protect soft targets, such as airports and railway stations*4, by automatically identifying persons wanted by the authorities.

Toshiba also hopes to support its clients and contribute to society by deploying this technology in new fields such as deep learning.

In fiscal year 2016, the company plans to release a new database product, based on the new recognition technology and GridDB, its scalable database, that will enable ultra-fast processing of big data and large-scale media databases.

Future applications

1. Pattern Mining—finding similar patterns
Surveillance cameras installed over a wide area, such as a town, can be monitored. Cameras can be installed in diverse facilities such as railway stations, airports, highway entrances, amusement parks, ATMs, banks and ticket vendors. Another possible application is monitoring crowds moving from a railway station to a stadium or concert hall.

Pattern Mining of surveillance camera images

Financial data can be mined for characteristic movements in stock prices.

Pattern Mining of characteristic movements of stock prices

Individuals could be identified in a massive image database, for example to create or search a video database.

Pattern Mining of a massive image database

2. Enhanced media recognition
Soft targets could be protected by detecting individuals on wanted lists. In industry, a single component could be recognized almost instantly within a database of 10 million industrial parts, boosting productivity.

3. Big data analytics
Cloud services could offer automatic data analysis using machine learning and prediction, so that sales data or sensor data could be analyzed simply by uploading it to a server.
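The stock-price pattern mining mentioned under item 1 can be illustrated with a small sketch. Toshiba does not say how it turns price series into feature vectors; this example assumes a common approach, z-normalized sliding windows, so that only the shape of a movement matters and not its price level. It then finds the window closest to a query pattern by brute force, which is the exhaustive distance computation that an index such as Vector Indexing Technology is meant to avoid. The functions and the price data are hypothetical.

```python
import math

def znorm(window):
    """Remove level and scale so that only the shape of the movement remains."""
    mean = sum(window) / len(window)
    std = math.sqrt(sum((x - mean) ** 2 for x in window) / len(window)) or 1.0
    return [(x - mean) / std for x in window]

def windows(series, width):
    """Yield (start_index, feature_vector) for every window of the given width."""
    for start in range(len(series) - width + 1):
        yield start, znorm(series[start:start + width])

def best_match(series, pattern):
    """Brute-force search for the start of the window whose shape is closest to `pattern`."""
    target = znorm(pattern)
    return min(windows(series, len(pattern)), key=lambda sw: math.dist(sw[1], target))[0]

# Hypothetical price series: a flat stretch, then a dip and recovery, then flat again.
prices = [100, 101, 100, 101, 100, 101, 100, 96, 92, 95, 99, 103, 102, 103, 102]
# The same dip-and-recovery shape at a much lower price level.
pattern = [50, 46, 42, 45, 49]

print(best_match(prices, pattern))  # → 6
```

Because the windows are z-normalized, the match at index 6 is found even though the pattern sits about 50 points lower than the matching prices.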

Video: Wide-area surveillance

This video shows the technology applied to wide-area surveillance.

Notes:
[1]
Collections of surveillance video, TV program archives, phone conversations recorded by call centers, and web texts are examples of such databases.
[2]
Experiment parameters: recognition of 10 million face images of 5,800 people at a precision of 98%. The results were as follows.

This technology: elapsed time 8.31 ms; face recognition precision 98% or higher
Preceding technology A: elapsed time 449.3 ms
Preceding technology B: elapsed time 389.5 ms

[3]
Feature data expressed as high-dimensional vectors with 100 to 100,000 dimensions, far more than 2D (planar) or 3D (spatial) vectors.
[4]
With current technology, identifying a single individual in a database of 10 million criminals takes approximately 20 seconds; Toshiba’s technology can do it in just 0.68 seconds (Toshiba estimate for a theoretical system).
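As a quick check on the timings in notes [2] and [4], the speedup is the ratio of the baseline elapsed time to the new elapsed time. The helper `speedup` is an illustrative name, not part of any Toshiba software. The ratio works out to about 54 against technology A and about 47 against technology B; the note [4] estimate, which covers a different and theoretical setup, corresponds to roughly 29 times.

```python
def speedup(baseline, new):
    """How many times faster `new` is than `baseline` (both in the same time unit)."""
    return baseline / new

# Elapsed times from note [2], in milliseconds.
this_tech_ms = 8.31
baselines_ms = {"Preceding technology A": 449.3, "Preceding technology B": 389.5}

for name, ms in baselines_ms.items():
    print(f"{name}: {speedup(ms, this_tech_ms):.1f}x")

# Estimate from note [4], in seconds.
print(f"Note [4] estimate: {speedup(20, 0.68):.1f}x")
```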