Toshiba’s High-Speed Crowd Counting AI Delivers World-Class Accuracy on General-Use PC
-Toshiba’s unique deep learning algorithm provides low-cost crowd analysis solution,
supports public safety in the new normal-

June 12, 2020
Toshiba Corporation

TOKYO—At a time of growing concern about social distancing and public health and safety, Toshiba Corporation (TOKYO: 6502) has developed a PC-based, high-speed, high accuracy image-analysis AI that monitors crowds and congestion by estimating the number of people in camera images using its unique deep learning algorithm.

Unlike many other solutions, Toshiba’s AI does not require a high-end graphics processing unit (GPU)(Note 1). It runs on a PC with a typical central processing unit (CPU)(Note 2) and handles high-speed analysis of a total of 180 images a minute. The new AI both reduces the cost of estimating crowd numbers and improves accuracy: compared to other approaches, it reduces estimation error per image from 16.0% to 14.7%, confirming its world-leading accuracy(Note 3).

In today’s “new normal,” when people are as concerned about going about their daily lives as they are about the heightened risk of infection from being in crowds, Toshiba’s new AI is a public health and safety tool for quickly identifying and managing crowds and congestion. It makes crowd analysis possible in many different kinds of locations with only minimal computing resources.

Toshiba will present this technology at the 26th Symposium on Sensing via Image Information (SSII2020) held from June 10 to 12, 2020.

As more security cameras are installed, with an estimated 1 billion worldwide by 2022, their usefulness in monitoring the movement of traffic and individuals has made them important tools for public safety. The Covid-19 pandemic has also shown how monitoring crowds can contribute to public health by identifying congestion hot spots. Advances in deep learning AI, an area where Toshiba cultivates expertise, now make it possible to estimate crowd numbers and density at high speed with high accuracy.

Figure 1: Examples of Use in Crowd Measurement and Control

Until now, cost has been a hurdle to the wider use of technology for estimating crowd size, according to Yuto Yamaji, an AI researcher at Toshiba’s corporate R&D Center. “Current approaches need high-end hardware, a powerful GPU to handle the scale and complexity of the computing. Installing technology like that across large facilities is expensive,” he explains. But that’s not the only concern. “There’s also an accuracy problem. As the result of applying a fixed scale to image analysis, it’s harder to realize accurate detection if the apparent size of a person changes, depending on distance from the camera.”

Toshiba approached the first problem by developing a deep network that uses only the CPU, the core component of any PC, for all high-speed processing. Toshiba has confirmed that the AI can analyze one image per second from each of three cameras, a total of 180 images a minute, on a single PC.
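The throughput figure above follows from simple arithmetic, sketched here in Python. The function names are hypothetical, and the per-image time budget assumes the PC processes images one after another, which the announcement does not state.

```python
# Figures taken from the announcement: three cameras, one image per camera per second.
CAMERAS = 3
IMAGES_PER_CAMERA_PER_SECOND = 1


def images_per_minute(cameras, images_per_camera_per_second):
    """Total number of images the PC must analyze each minute."""
    return cameras * images_per_camera_per_second * 60


def per_image_budget_ms(cameras, images_per_camera_per_second):
    """Average time available per image, in milliseconds.

    Assumption (not from the announcement): images are processed sequentially.
    """
    return 1000.0 / (cameras * images_per_camera_per_second)


print(images_per_minute(CAMERAS, IMAGES_PER_CAMERA_PER_SECOND))               # 180
print(round(per_image_budget_ms(CAMERAS, IMAGES_PER_CAMERA_PER_SECOND), 1))   # 333.3
```

Under that assumption, each image would need to be analyzed in roughly a third of a second on the CPU.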

Figure 2: Technology Overview

Alongside this, the company developed an algorithm to boost count accuracy. It compensates for perspective by creating a network structure that allows multiple groups to be identified and analyzed, regardless of size, as shown in Figure 3.
Evaluated against images of densely populated locations from a public dataset(Note 4), Toshiba’s new AI reduced estimation error for a single image from the previously achieved 16.0% to 14.7%, confirming its world-leading accuracy.
By extracting data from the images and visualizing it as a density map overlaid on them, the AI can display areas of congestion and slow-moving flows of people.
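To illustrate how a density map relates to a head count, here is a minimal Python sketch. It assumes the common convention in density-map-based crowd counting, where the estimated count is the sum of the map’s cells. The announcement does not say how Toshiba’s network derives its count, and the map values, threshold, and function names below are hypothetical.

```python
def count_from_density(density):
    """Estimated number of people: the sum of all cells in the density map."""
    return sum(sum(row) for row in density)


def congested_cells(density, threshold):
    """Return (row, column) positions whose density exceeds the threshold."""
    return [
        (r, c)
        for r, row in enumerate(density)
        for c, value in enumerate(row)
        if value > threshold
    ]


# Hypothetical 3x4 density map; each cell holds the estimated people in that region.
density = [
    [0.0, 0.25, 0.0, 0.0],
    [0.25, 1.5, 1.25, 0.0],
    [0.0, 0.75, 2.0, 0.0],
]

print(count_from_density(density))      # 6.0
print(congested_cells(density, 1.0))    # [(1, 1), (1, 2), (2, 2)]
```

Cells above the threshold are the ones an overlay would highlight as congested.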

Figure 3: Unique Deep Learning Technology

Toshiba has high hopes for the AI: “We see this as an easily implemented, automated solution for public spaces,” says Yuto Yamaji. “Installed in places like stores, stadiums and other facilities, it can monitor, visualize and provide notifications on crowd conditions, and contribute to the management of traffic flows and reduce congestion.”

Toshiba is now evaluating application of the AI in Toshiba Group’s products and services, and looking to commercialize it by the end of this fiscal year. Going forward, the company plans to extend its use into other areas, such as counting vehicles to support traffic management, counting objects as part of inventory management, and other areas where it can meet social needs.

(Note 1)
GPU: Graphics processing unit; a specialized electronic circuit designed for image processing.
(Note 2)
CPU: Central processing unit; the core component of any computer, its “brain.”
(Note 3)
Based on the ShanghaiTech-PartA public dataset. Training was carried out on 300 images. The absolute error ratio of the estimated number of people in each crowd image to the actual number was calculated using 182 evaluation images. (Toshiba’s investigation, April 2020)
(Note 4)
Public datasets: images provided by universities as an objective benchmark for comparing different approaches to image analysis. Toshiba evaluated its AI with the ShanghaiTech-PartA data, which is commonly used for evaluating crowd counting.
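The error measure described in Note 3 can be sketched in Python as follows. This is one reading of the note: it assumes the per-image ratios |estimated - actual| / actual are averaged over the evaluation images, which the note does not spell out. The sample counts and function names are hypothetical. Only the 16.0% and 14.7% figures come from the announcement.

```python
def mean_absolute_error_ratio(estimated, actual):
    """Average of |estimated - actual| / actual over all images."""
    if len(estimated) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of equal length")
    ratios = [abs(e - a) / a for e, a in zip(estimated, actual)]
    return sum(ratios) / len(ratios)


def relative_reduction(before, after):
    """Fraction by which the error shrank relative to the earlier figure."""
    return (before - after) / before


# Hypothetical counts for four evaluation images (not Toshiba's data).
actual = [100, 200, 400, 500]
estimated = [90, 220, 380, 550]

print(f"{mean_absolute_error_ratio(estimated, actual):.2%}")   # 8.75%

# Figures from the announcement: error per image fell from 16.0% to 14.7%.
print(f"{relative_reduction(16.0, 14.7):.1%}")                 # 8.1%
```

On this reading, the move from 16.0% to 14.7% is a drop of 1.3 percentage points, or about 8% relative to the earlier figure.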

Media Inquiry

Toshiba Corporation Media Relations Group: +81 3 3457 2100