Corporate Research & Development Center

Web Crawler Policy

This page briefly describes the goal and policy of the Corporate Research & Development Center, Toshiba Corporation, concerning the collection of web pages. If you have any questions, please email us. We appreciate your cooperation and support.

1. Goal

Our main objective in collecting web pages at the Corporate Research & Development Center is to conduct research on natural language processing and to develop new technologies and products for the Toshiba Group.

2. Policy

Our crawler always respects common crawling norms, such as the following:

Our crawler accesses each site one page at a time, with an interval between accesses.

Although we interleave crawling with the detection of host aliases, an aliased server may still be accessed simultaneously under different host names.

It always reads the robots.txt file and never crawls restricted pages.

You can specify directives for the crawler in the robots.txt file at the top level of your site (e.g., http://www.toshiba.co.jp/robots.txt). For example, the following directive forbids our crawler from retrieving any content from your site.


User-agent: TosCrawler
Disallow: /
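
Site administrators who want to check the effect of this directive before publishing it can test it locally. The following is a minimal sketch using Python's standard urllib.robotparser module. The host example.com and the second user-agent name OtherBot are placeholders introduced only for illustration, and the sketch does not describe how the crawler itself parses robots.txt.

```python
from urllib.robotparser import RobotFileParser

# The directive from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: TosCrawler
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if user_agent may fetch url under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com and OtherBot are placeholders, not names from this policy.
    page = "http://example.com/index.html"
    print("TosCrawler allowed:", is_allowed(ROBOTS_TXT, "TosCrawler", page))
    print("OtherBot allowed:", is_allowed(ROBOTS_TXT, "OtherBot", page))
```

Because the directive names only TosCrawler, other crawlers are unaffected unless the file also contains rules for them.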

If you want to control the rate of access, specify the Crawl-delay parameter in the robots.txt file. For example, the following directs our crawler to access the site no more than once every 30 seconds.


User-agent: TosCrawler
Crawl-delay: 30.0
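
To illustrate how a crawl delay of this kind can be interpreted, here is a small hand-written sketch in Python. It reads the Crawl-delay value for a given user agent, accepting decimal values such as 30.0, and computes how long to wait before the next access. The function names crawl_delay and seconds_to_wait are hypothetical, and this is only one plausible reading of the directive, not a description of the crawler's actual implementation.

```python
from typing import Optional

# The directive from this page, as it would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: TosCrawler
Crawl-delay: 30.0
"""


def crawl_delay(robots_txt: str, user_agent: str) -> Optional[float]:
    """Return the Crawl-delay in seconds for user_agent, or None if absent."""
    in_matching_group = False
    reading_agents = False  # True while consecutive User-agent lines are read
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue  # blank or malformed line
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            if not reading_agents:
                in_matching_group = False  # a new group of rules starts here
            reading_agents = True
            if value.lower() == user_agent.lower():
                in_matching_group = True
        else:
            reading_agents = False
            if field == "crawl-delay" and in_matching_group:
                try:
                    return float(value)
                except ValueError:
                    return None  # unparsable value: treat as not specified
    return None


def seconds_to_wait(last_access: float, now: float, delay: float) -> float:
    """Return how many seconds remain before the next access is permitted."""
    return max(0.0, delay - (now - last_access))


if __name__ == "__main__":
    delay = crawl_delay(ROBOTS_TXT, "TosCrawler")
    print("delay:", delay)
    if delay is not None:
        # Last access at t=100 s, current time t=110 s (illustrative values).
        print("wait:", seconds_to_wait(100.0, 110.0, delay))
```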

If a web page includes the robots meta tag shown below, our crawler never crawls the page.

You can also protect content on a file-by-file basis with robots meta tags. If you put the following in the head section of your HTML documents, our crawler will not follow the links found in the documents.


<META NAME="robots" CONTENT="nofollow, noindex">
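
To confirm that a page actually carries this tag, the directives can be extracted with Python's standard html.parser module. The sketch below is a minimal illustration; the class name RobotsMetaParser and the function name robots_directives are hypothetical, and the sketch makes no claim about how the crawler itself reads the tag.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives listed in <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, so <META NAME=...> matches.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        for token in attr.get("content", "").split(","):
            token = token.strip().lower()
            if token:
                self.directives.add(token)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="robots" CONTENT="nofollow, noindex"></head></html>'
    print(sorted(robots_directives(page)))
```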

We exercise great care in managing the web pages we collect.

We register the collected web pages and dictionaries in databases at the Corporate Research & Development Center, Toshiba Corp. We control access to these databases and prevent unauthorized access.

3. Contact

For any query, comment, or request, please email us.

address

Please state the host name(s) and IP address(es) of your site in the email.

Knowledge Media Laboratory, R&D Center

2012/10/04