# Block the AI crawlers and assistants listed below
# from crawling this site and from using its content
# to train AI models.
# Source: https://robotstxt.com/ai
User-Agent: GPTBot
User-Agent: ClaudeBot
User-Agent: Claude-User
User-Agent: Claude-SearchBot
User-Agent: CCBot
User-Agent: Google-Extended
User-Agent: Applebot-Extended
User-Agent: Facebookbot
User-Agent: Meta-ExternalAgent
User-Agent: Meta-ExternalFetcher
User-Agent: diffbot
User-Agent: PerplexityBot
User-Agent: Perplexity-User
User-Agent: Omgili
User-Agent: Omgilibot
User-Agent: webzio-extended
User-Agent: ImagesiftBot
User-Agent: Bytespider
User-Agent: TikTokSpider
User-Agent: Amazonbot
User-Agent: Youbot
User-Agent: SemrushBot-OCOB
User-Agent: Petalbot
User-Agent: VelenPublicWebCrawler
User-Agent: TurnitinBot
User-Agent: Timpibot
User-Agent: OAI-SearchBot
User-Agent: ICC-Crawler
User-Agent: AI2Bot
User-Agent: AI2Bot-Dolma
User-Agent: DataForSeoBot
User-Agent: AwarioBot
User-Agent: AwarioSmartBot
User-Agent: AwarioRssBot
User-Agent: Google-CloudVertexBot
User-Agent: PanguBot
User-Agent: Kangaroo Bot
User-Agent: Sentibot
User-Agent: img2dataset
User-Agent: Meltwater
User-Agent: Seekr
User-Agent: peer39_crawler
User-Agent: cohere-ai
User-Agent: cohere-training-data-crawler
User-Agent: DuckAssistBot
User-Agent: Scrapy
User-Agent: Cotoyogi
User-Agent: aiHitBot
User-Agent: Factset_spyderbot
User-Agent: FirecrawlAgent
User-Agent: bedrockbot
User-Agent: DeepSeekBot
User-Agent: GoogleAgent-Mariner
User-Agent: Gemini-Deep-Research
User-Agent: Google-NotebookLM
User-Agent: Google-Agent
User-Agent: GoogleAgent-URLContext
User-Agent: Google-Firebase
User-Agent: MistralAI-User
User-Agent: SemrushBot-FT
User-Agent: SemrushBot-ESI
User-Agent: AddSearchBot
User-Agent: bigsur.ai
User-Agent: Brightbot
User-Agent: Crawlspace
User-Agent: EchoboxBot
User-Agent: FriendlyCrawler
User-Agent: LinerBot
User-Agent: Panscient
User-Agent: Panscient.com
User-Agent: Poseidon Research Crawler
User-Agent: SBIntuitionsBot
User-Agent: TerraCotta
User-Agent: Thinkbot
User-Agent: Yak
User-Agent: YandexAdditional
User-Agent: YandexAdditionalBot
Disallow: /
DisallowAITraining: /

# Block any crawler not listed above (e.g., new
# or unknown AI bots) from using content to train
# AI models, while still allowing all bots to crawl
# and index the site. DisallowAITraining,
# Content-Usage, and Content-Signal are experimental
# directives outside the Robots Exclusion Protocol
# (RFC 9309) and may be ignored by crawlers that do
# not support them.
User-Agent: *
DisallowAITraining: /
Content-Usage: ai=n
Content-Signal: search=yes, ai-input=no, ai-train=no
Allow: /