Robots.txt Template

A simple robots.txt template that keeps unwanted robots out (disallow) and whitelists (allows) legitimate user agents. Useful for all websites.

The Robots Exclusion Standard [1], also known as the robots exclusion protocol or simply robots.txt, is a standard used by websites to communicate with web crawlers and other web robots. The standard specifies how to inform the web robot about which areas of the website should not be processed or scanned.

Robots Exclusion Standard file templates

This website contains two robots.txt file templates (regular and minified) to help webmasters keep unwanted web robots (e.g. scraper bots, people search engines, SEO tools, marketing tools) away from their websites while allowing access to legitimate robots (e.g. search engine crawlers).

To be considered legitimate and get listed, robots must fully obey the Robots Exclusion Standard. The robots.txt file templates contain a whitelist; by the conventions of the Robots Exclusion Standard, unlisted robots (user agents) are denied access.

Templates

The robots.txt template files contain an alphabetically ordered whitelist of legitimate web robots. In the commented version, each bot is briefly described in a comment above the (list of) user agent(s). Comment out or delete the bots (User-agents) you do not wish to allow to access your website.

There are two robots.txt file versions, which you can simply copy and paste:

  1. Regular (with comments)
  2. Minified (no comments)

Regular template (with comments)

################################# ROBOTS.TXT ###################################
#                                                                              #
# Alphabetically ordered whitelisting of legitimate web robots, which obey the #
# Robots Exclusion Standard (robots.txt). Each bot is shortly described in a   #
# comment above the (list of) user-agent(s). Comment out or delete lines which #
# contain User-agents you do not wish to allow on your website.                #
# Important: Blank lines are not allowed in the final robots.txt file!         #
# Updates can be retrieved from: https://www.ditig.com/robots-txt-template     #
#                                                                              #
# This document is licensed with a CC BY-NC-SA 4.0 license.                    #
#                                                                              #
# Last update: 2021-11-04                                                      #
#                                                                              #
################################################################################
# so.com chinese search engine
User-agent: 360Spider
User-agent: 360Spider-Image
User-agent: 360Spider-Video
# google.com landing page quality checks
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
# google.com app resource fetcher
User-agent: AdsBot-Google-Mobile-Apps
# bing ads bot
User-agent: adidxbot
# apple.com search engine
User-agent: Applebot
User-agent: AppleNewsBot
# baidu.com chinese search engine
User-agent: Baiduspider
User-agent: Baiduspider-image
User-agent: Baiduspider-news
User-agent: Baiduspider-video
# bing.com international search engine
User-agent: bingbot
User-agent: BingPreview
# bublup.com suggestion/search engine
User-agent: BublupBot
# commoncrawl.org open repository of web crawl data
User-agent: CCBot
# cliqz.com german in-product search engine
User-agent: Cliqzbot
# coccoc.com vietnamese search engine
User-agent: coccoc
User-agent: coccocbot-image
User-agent: coccocbot-web
# daum.net korean search engine
User-agent: Daumoa
# dazoo.fr french search engine
User-agent: Dazoobot
# deusu.de german search engine
User-agent: DeuSu
# duckduckgo.com international privacy search engine
User-agent: DuckDuckBot
User-agent: DuckDuckGo-Favicons-Bot
# eurip.com european search engine
User-agent: EuripBot
# exploratodo.com latin american search engine
User-agent: Exploratodo
# facebook.com social network
User-agent: Facebot
# feedly.com feed fetcher
User-agent: Feedly
# findx.com european search engine
User-agent: Findxbot
# goo.ne.jp japanese search engine
User-agent: gooblog
# google.com international search engine
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
User-agent: Googlebot-News
User-agent: Googlebot-Video
# so.com chinese search engine
User-agent: HaoSouSpider
# goo.ne.jp japanese search engine
User-agent: ichiro
# istella.it italian search engine
User-agent: istellabot
# jike.com / chinaso.com chinese search engine
User-agent: JikeSpider
# lycos.com & hotbot.com international search engines
User-agent: Lycos
# mail.ru russian search engine
User-agent: Mail.Ru
# google.com adsense bot
User-agent: Mediapartners-Google
# mojeek.com search engine
User-agent: MojeekBot
# bing.com international search engine
User-agent: msnbot
User-agent: msnbot-media
# orange.com international search engine
User-agent: OrangeBot
# pinterest.com social network
User-agent: Pinterest
# botje.nl dutch search engine
User-agent: Plukkie
# qwant.com french search engine
User-agent: Qwantify
# rambler.ru russian search engine
User-agent: Rambler
# seznam.cz czech search engine
User-agent: SeznamBot
# yahoo.com international search engine
User-agent: Slurp
# sogou.com chinese search engine
User-agent: Sogou blog
User-agent: Sogou inst spider
User-agent: Sogou News Spider
User-agent: Sogou Orion spider
User-agent: Sogou spider2
User-agent: Sogou web spider
# soso.com chinese search engine
User-agent: Sosospider
# sputnik.ru russian search engine
User-agent: SputnikBot
# ask.com international search engine
User-agent: Teoma
# twitter.com bot
User-agent: Twitterbot
# wotbox.com international search engine
User-agent: wotbox
# yacy.net p2p search software
User-agent: yacybot
# yandex.com russian search engine
User-agent: Yandex
User-agent: YandexMobileBot
# search.naver.com south korean search engine
User-agent: Yeti
# yioop.com international search engine
User-agent: YioopBot
# yooz.ir iranian search engine
User-agent: yoozBot
# youdao.com chinese search engine
User-agent: YoudaoBot
# crawling rule(s) for above bots
Disallow:
# disallow all other bots
User-agent: *
Disallow: /

Minified template (without comments)

################################# ROBOTS.TXT ###################################
# Updates can be retrieved from: https://www.ditig.com/robots-txt-template     #
# This document is licensed with a CC BY-NC-SA 4.0 license.                    #
# Last update: 2021-11-04                                                      #
################################################################################
User-agent: 360Spider
User-agent: 360Spider-Image
User-agent: 360Spider-Video
User-agent: AdsBot-Google
User-agent: AdsBot-Google-Mobile
User-agent: AdsBot-Google-Mobile-Apps
User-agent: adidxbot
User-agent: Applebot
User-agent: AppleNewsBot
User-agent: Baiduspider
User-agent: Baiduspider-image
User-agent: Baiduspider-news
User-agent: Baiduspider-video
User-agent: bingbot
User-agent: BingPreview
User-agent: BublupBot
User-agent: CCBot
User-agent: Cliqzbot
User-agent: coccoc
User-agent: coccocbot-image
User-agent: coccocbot-web
User-agent: Daumoa
User-agent: Dazoobot
User-agent: DeuSu
User-agent: DuckDuckBot
User-agent: DuckDuckGo-Favicons-Bot
User-agent: EuripBot
User-agent: Exploratodo
User-agent: Facebot
User-agent: Feedly
User-agent: Findxbot
User-agent: Googlebot
User-agent: Googlebot-Image
User-agent: Googlebot-Mobile
User-agent: Googlebot-News
User-agent: Googlebot-Video
User-agent: HaoSouSpider
User-agent: ichiro
User-agent: istellabot
User-agent: JikeSpider
User-agent: Lycos
User-agent: Mail.Ru
User-agent: Mediapartners-Google
User-agent: MojeekBot
User-agent: msnbot
User-agent: msnbot-media
User-agent: OrangeBot
User-agent: Pinterest
User-agent: Plukkie
User-agent: Qwantify
User-agent: Rambler
User-agent: SeznamBot
User-agent: Slurp
User-agent: Sogou blog
User-agent: Sogou inst spider
User-agent: Sogou News Spider
User-agent: Sogou Orion spider
User-agent: Sogou spider2
User-agent: Sogou web spider
User-agent: Sosospider
User-agent: SputnikBot
User-agent: Teoma
User-agent: Twitterbot
User-agent: wotbox
User-agent: yacybot
User-agent: Yandex
User-agent: YandexMobileBot
User-agent: Yeti
User-agent: YioopBot
User-agent: yoozBot
User-agent: YoudaoBot
Disallow:
User-agent: *
Disallow: /

Warranty and Liability

The author makes absolutely no claims or representations as to warranties regarding the accuracy or completeness of the information provided. You may use the templates on this website AT YOUR OWN RISK.

The decision as to which bots are wanted or unwanted was made by the author, who is very conservative and opinionated when it comes to blocking bots. Nevertheless, the author's choices should be sufficient for many. Do not forget to adjust the list of allowed/forbidden directories to your needs.
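To restrict allowed bots to certain parts of a site rather than granting blanket access, the shared empty "Disallow:" rule can be replaced with directory rules. A hypothetical sketch, again checked with Python's urllib.robotparser (the Googlebot group and the /admin/ path are only examples, not part of the templates):

```python
from urllib.robotparser import RobotFileParser

# Hypothetical adjustment: the whitelisted group may crawl everything
# except /admin/; all unlisted bots remain fully disallowed.
rules = """\
User-agent: Googlebot
Disallow: /admin/
User-agent: *
Disallow: /
""".splitlines()

parser = RobotFileParser()
parser.parse(rules)

print(parser.can_fetch("Googlebot", "https://example.com/blog/post"))    # True
print(parser.can_fetch("Googlebot", "https://example.com/admin/login"))  # False
```

In the templates, the same change means editing the "Disallow:" line that follows the whitelist (and adding further Disallow lines as needed) while leaving the final catch-all block untouched.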

License

The Robots.txt template was originally written by Jonas Jacek who licensed it under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.

References

  1. Wikipedia - Robots Exclusion Standard