Merged
28 changes: 28 additions & 0 deletions docs/fumadocs/next.config.mjs
@@ -8,6 +8,34 @@ const config = {
  trailingSlash: false,
  skipTrailingSlashRedirect: true,
  assetPrefix: 'https://skillkit-docs.vercel.app',
  async headers() {
    return [
      {
        source: '/:path*.(css|js|png|jpg|jpeg|svg|webp|woff2|woff|ttf|ico)',
        headers: [
          { key: 'Cache-Control', value: 'public, max-age=86400, s-maxage=604800, stale-while-revalidate=2592000' },
        ],
      },
      {
        source: '/_next/static/:path*',
        headers: [
          { key: 'Cache-Control', value: 'public, max-age=31536000, immutable' },
        ],
      },
      {
        source: '/docs/:path*',
        headers: [
          { key: 'Cache-Control', value: 'public, max-age=300, s-maxage=86400, stale-while-revalidate=604800' },
        ],
      },
      {
        source: '/',
        headers: [
          { key: 'Cache-Control', value: 'public, max-age=300, s-maxage=86400, stale-while-revalidate=604800' },
        ],
      },
    ];
  },
};

export default withMDX(config);
138 changes: 138 additions & 0 deletions docs/fumadocs/public/robots.txt
@@ -0,0 +1,138 @@
# Humans + agents welcome. Training crawlers + SEO scrapers blocked.

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: FirecrawlAgent
Allow: /

User-agent: firecrawl
Allow: /

User-agent: Context7Bot
Allow: /

User-agent: Crawl4AI
Allow: /

User-agent: Clawdbot
Allow: /

User-agent: OpenClaw
Allow: /

User-agent: Hermes
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: peer39_crawler
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Timpibot
Disallow: /

User-agent: ICC-Crawler
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: MegaIndex
Disallow: /

User-agent: SeznamBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://skillkit-docs.vercel.app/sitemap.xml
22 changes: 22 additions & 0 deletions docs/fumadocs/src/middleware.ts
@@ -0,0 +1,22 @@
import { NextResponse } from 'next/server';
import type { NextRequest } from 'next/server';

const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;


🟡 FacebookBot missing from middleware BLOCK regex despite being in robots.txt Disallow list

The robots.txt at docs/fumadocs/public/robots.txt:78-79 explicitly disallows FacebookBot, but the BLOCK regex in the middleware omits it. As a result, a FacebookBot request passes through the middleware (falling through to the default NextResponse.next() at line 17) and is served content normally, which undermines the intended bot-blocking enforcement.

Suggested change
-const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;
+const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;
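
Beyond the one-token fix, this kind of omission is easier to avoid if the regex is generated from a list rather than maintained by hand. The following is only a sketch of that idea, not code from this PR: `BLOCKED_AGENTS`, `escapeRegExp`, and `buildAgentPattern` are hypothetical names, and the token list is abbreviated for illustration.

```typescript
// Hypothetical single source of truth for blocked crawler tokens.
// Abbreviated for illustration; the real list mirrors the robots.txt Disallow entries.
const BLOCKED_AGENTS: readonly string[] = [
  'GPTBot',
  'ClaudeBot',
  'Amazonbot',
  'FacebookBot',
  'Meta-ExternalAgent',
  'peer39_crawler',
];

// Escape regex metacharacters so every token is matched literally.
function escapeRegExp(token: string): string {
  return token.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build one case-insensitive alternation from the token list.
// An empty list yields a pattern that never matches, instead of one that matches everything.
function buildAgentPattern(agents: readonly string[]): RegExp {
  if (agents.length === 0) return /(?!)/;
  return new RegExp(agents.map(escapeRegExp).join('|'), 'i');
}

const BLOCK = buildAgentPattern(BLOCKED_AGENTS);
```

With a list like this, adding a crawler is a one-line change, and the same array could in principle also be used to generate the Disallow groups in robots.txt.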


const ALLOW = /Googlebot|Bingbot|DuckDuckBot|Applebot(?!-Extended)|ChatGPT-User|OAI-SearchBot|PerplexityBot|Perplexity-User|Claude-User|Claude-SearchBot|FirecrawlAgent|firecrawl|Context7Bot|Crawl4AI|Clawdbot|OpenClaw|Hermes/i;

export function middleware(req: NextRequest) {
  const ua = req.headers.get('user-agent') || '';
  if (ALLOW.test(ua)) return NextResponse.next();
  if (BLOCK.test(ua)) {
Comment on lines +4 to +11


⚠️ Potential issue | 🟠 Major

Blocklist/enforcement mismatch and precedence bug in UA checks.

FacebookBot is disallowed in docs/fumadocs/public/robots.txt (Line 78) but missing from BLOCK (Line 4). Also, Line 10 checks ALLOW before BLOCK, so a mixed UA containing both patterns can bypass blocking.

Suggested fix
-const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;
+const BLOCK = /GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot/i;

 export function middleware(req: NextRequest) {
   const ua = req.headers.get('user-agent') || '';
-  if (ALLOW.test(ua)) return NextResponse.next();
   if (BLOCK.test(ua)) {
     return new NextResponse('disallowed by robots.txt', {
       status: 403,
       headers: { 'Cache-Control': 'public, max-age=86400' },
     });
   }
+  if (ALLOW.test(ua)) return NextResponse.next();
   return NextResponse.next();
 }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/fumadocs/src/middleware.ts` around lines 4 - 11, The middleware
currently tests ALLOW before BLOCK which lets mixed UAs bypass blocking and also
omits FacebookBot from BLOCK; update the middleware function to first test BLOCK
(e.g., if (BLOCK.test(ua)) return
NextResponse.rewrite(.../NextResponse.redirect/NextResponse.next with block
code) so block precedence wins, then test ALLOW afterwards, and add the missing
"FacebookBot" token to the BLOCK RegExp constant so explicit disallowed agents
are caught; keep using the existing symbols BLOCK, ALLOW, middleware,
NextRequest and NextResponse to locate and modify the code.
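
To make the precedence point concrete, here is a small standalone sketch with BLOCK evaluated first. It is an illustration under stated assumptions rather than the middleware itself: `classify` and `Verdict` are hypothetical names, the two patterns are shortened versions of the real lists, and no Next.js types are involved.

```typescript
type Verdict = 'block' | 'allow' | 'default';

// Abbreviated patterns for illustration; the lists in the PR are longer.
const BLOCK = /GPTBot|ClaudeBot|FacebookBot|Applebot-Extended/i;
const ALLOW = /Googlebot|Bingbot|Applebot(?!-Extended)|Claude-User/i;

// Hypothetical helper: classify a User-Agent string with BLOCK taking precedence.
function classify(ua: string | null): Verdict {
  const value = ua ?? '';
  if (BLOCK.test(value)) return 'block'; // blocked tokens win, even in a mixed UA
  if (ALLOW.test(value)) return 'allow';
  return 'default';
}

// A mixed UA that names both an allowed and a blocked crawler.
const mixed = classify('Mozilla/5.0 (compatible; Googlebot/2.1) GPTBot/1.0');
```

Here `mixed` evaluates to `'block'`. With the ALLOW test placed first, the same string would be classified as allowed, which is the bypass described above. Note that once BLOCK runs first, a separate ALLOW branch only matters if allowed and unknown agents are meant to be treated differently.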

    return new NextResponse('disallowed by robots.txt', {
      status: 403,
      headers: { 'Cache-Control': 'public, max-age=86400' },
    });
  }
  return NextResponse.next();
}

export const config = {
  matcher: '/((?!_next/static|_next/image|favicon|robots\\.txt|sitemap\\.xml).*)',
};
138 changes: 138 additions & 0 deletions docs/skillkit/public/robots.txt
@@ -0,0 +1,138 @@
# Humans + agents welcome. Training crawlers + SEO scrapers blocked.

User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: DuckDuckBot
Allow: /

User-agent: Applebot
Allow: /

User-agent: ChatGPT-User
Allow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /

User-agent: Perplexity-User
Allow: /

User-agent: Claude-User
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: FirecrawlAgent
Allow: /

User-agent: firecrawl
Allow: /

User-agent: Context7Bot
Allow: /

User-agent: Crawl4AI
Allow: /

User-agent: Clawdbot
Allow: /

User-agent: OpenClaw
Allow: /

User-agent: Hermes
Allow: /

User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: anthropic-ai
Disallow: /

User-agent: CCBot
Disallow: /

User-agent: Google-Extended
Disallow: /

User-agent: Applebot-Extended
Disallow: /

User-agent: Bytespider
Disallow: /

User-agent: Amazonbot
Disallow: /

User-agent: FacebookBot
Disallow: /

User-agent: Meta-ExternalAgent
Disallow: /

User-agent: cohere-ai
Disallow: /

User-agent: Diffbot
Disallow: /

User-agent: ImagesiftBot
Disallow: /

User-agent: Omgilibot
Disallow: /

User-agent: peer39_crawler
Disallow: /

User-agent: YouBot
Disallow: /

User-agent: Timpibot
Disallow: /

User-agent: ICC-Crawler
Disallow: /

User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: MJ12bot
Disallow: /

User-agent: DotBot
Disallow: /

User-agent: PetalBot
Disallow: /

User-agent: BLEXBot
Disallow: /

User-agent: MegaIndex
Disallow: /

User-agent: SeznamBot
Disallow: /

User-agent: DataForSeoBot
Disallow: /

User-agent: *
Allow: /

Sitemap: https://skillkit.dev/sitemap.xml
42 changes: 42 additions & 0 deletions docs/skillkit/vercel.json
@@ -21,5 +21,47 @@
      "source": "/docs",
      "destination": "https://skillkit-docs.vercel.app/docs"
    }
  ],
  "headers": [
    {
      "source": "/(.*)\\.(css|js|png|jpg|jpeg|svg|webp|woff2|woff|ttf|ico|mp4)",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=86400, s-maxage=604800, stale-while-revalidate=2592000" }
      ]
    },
    {
      "source": "/assets/(.*)",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=31536000, immutable" }
      ]
    },
    {
      "source": "/(.*)\\.html",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=300, s-maxage=86400, stale-while-revalidate=604800" }
      ]
    },
    {
      "source": "/",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=300, s-maxage=86400, stale-while-revalidate=604800" }
      ]
    },
    {
      "source": "/api",
      "headers": [
        { "key": "Cache-Control", "value": "public, max-age=300, s-maxage=86400, stale-while-revalidate=604800" }
      ]
    }
  ],
  "redirects": [
    {
      "source": "/((?!robots\\.txt$).*)",
      "has": [
        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }


🟡 FacebookBot missing from vercel.json redirect user-agent pattern despite being in robots.txt Disallow list

The robots.txt at docs/skillkit/public/robots.txt:78-79 explicitly disallows FacebookBot, but the user-agent regex in the vercel.json redirect rule omits it. This means FacebookBot requests will not be redirected to /robots.txt and will be served content normally, undermining the intended bot-blocking enforcement for the skillkit docs site.

Suggested change
-        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
+        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }

      ],
      "destination": "/robots.txt",
      "permanent": false
    }
Comment on lines +57 to +65


⚠️ Potential issue | 🟠 Major

FacebookBot is disallowed in robots but not matched in redirect blocklist.

Line 61 omits FacebookBot, while docs/skillkit/public/robots.txt (Line 78) disallows it. This creates policy drift and allows that crawler to bypass this redirect control.

Suggested fix
-        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
+        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
-  "redirects": [
-    {
-      "source": "/((?!robots\\.txt$).*)",
-      "has": [
-        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
-      ],
-      "destination": "/robots.txt",
-      "permanent": false
-    }
+  "redirects": [
+    {
+      "source": "/((?!robots\\.txt$).*)",
+      "has": [
+        { "type": "header", "key": "user-agent", "value": "(?i).*(GPTBot|ClaudeBot|anthropic-ai|CCBot|Google-Extended|Applebot-Extended|Bytespider|Amazonbot|FacebookBot|Meta-ExternalAgent|cohere-ai|Diffbot|ImagesiftBot|Omgilibot|peer39_crawler|YouBot|Timpibot|ICC-Crawler|AhrefsBot|SemrushBot|MJ12bot|DotBot|PetalBot|BLEXBot|MegaIndex|SeznamBot|DataForSeoBot).*" }
+      ],
+      "destination": "/robots.txt",
+      "permanent": false
+    }
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@docs/skillkit/vercel.json` around lines 57 - 65, The redirect blocklist in
the "redirects" array (the object that has the header with "key": "user-agent"
and the "value" regex) is missing FacebookBot whereas robots.txt disallows it;
update the user-agent regex value to include FacebookBot (add the token
"FacebookBot" into the alternation list) so the redirect that maps to
"/robots.txt" will also match and block FacebookBot, ensuring the header-based
redirect and docs/skillkit/public/robots.txt remain consistent.
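
The policy drift flagged here could also be caught mechanically. Below is a rough sketch of such a check, not part of this PR: `disallowedAgents` and `missingFromPattern` are hypothetical helpers, the parser handles only the simple `User-agent` / `Allow` / `Disallow` groups used in these robots.txt files, and the caller is assumed to have converted the `(?i)`-prefixed string from vercel.json into a JavaScript `RegExp` with the `i` flag.

```typescript
// Hypothetical drift check: list the agents whose group contains "Disallow: /".
function disallowedAgents(robotsTxt: string): string[] {
  const blocked: string[] = [];
  let group: string[] = []; // user-agents named by the current group
  let inRules = false;      // true once the current group has seen a rule line

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*$/, '').trim(); // drop comments and whitespace
    const sep = line.indexOf(':');
    if (sep === -1) continue;
    const field = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (field === 'user-agent') {
      // A User-agent line after rules starts a new group.
      if (inRules) {
        group = [];
        inRules = false;
      }
      group.push(value);
    } else if (field === 'allow' || field === 'disallow') {
      inRules = true;
      if (field === 'disallow' && value === '/') {
        blocked.push(...group.filter((agent) => agent !== '*'));
      }
    }
  }
  return blocked;
}

// Report every disallowed agent that the enforcement pattern fails to match.
function missingFromPattern(agents: string[], pattern: RegExp): string[] {
  return agents.filter((agent) => !pattern.test(agent));
}
```

Run against the robots.txt and the redirect pattern in this diff, a check of this shape would be expected to report FacebookBot as the only missing token; wiring it into CI is left as a design choice.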

  ]
}