SEO · Technical SEO · Updated 2026.04.28

robots.txt

Also known as: robot text, robots file

In one line

robots.txt is the text file at a site's root that tells search engines and AI crawlers which paths they may or may not crawl — a long-standing web standard.

Going deeper

robots.txt has been around since 1994 and was finally standardised by the IETF as RFC 9309 in 2022. It is a plain text file at the site root (https://example.com/robots.txt). You target a bot with User-agent and control path access with Allow and Disallow. Compliant crawlers fetch it before anything else: the major search crawlers (Googlebot, Bingbot) and the new wave of AI crawlers (GPTBot, ClaudeBot, PerplexityBot). Google-Extended works differently: it is not a separate crawler but a token you address in robots.txt to control whether content Google crawls may be used for Gemini. In other words, robots.txt is the first gate deciding whether your site shows up in AI answers at all.
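Because the file always sits at the top level of a host, a crawler can work out which robots.txt governs any page by keeping only the scheme, host and port of the page URL. The sketch below assumes only Python's standard library; robots_url is a hypothetical helper name. It also makes a practical point visible: each subdomain, scheme and port is a separate service with its own file, so shop.example.com is not covered by example.com/robots.txt.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (hypothetical helper).

    The rules apply per scheme + host (+ port), so the page's path, query
    and fragment are dropped and replaced with /robots.txt.
    Note: netloc also carries any user:password@ part; real crawlers rarely see one.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


for url in [
    "https://example.com/blog/post?utm_source=x",
    "https://shop.example.com/cart",        # a subdomain has its own file
    "http://example.com:8080/admin/",       # so does a different port
]:
    print(url, "->", robots_url(url))
```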

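The syntax is simple. A 'User-agent: *' line names which bots the rules below it apply to, then lines such as 'Disallow: /admin/' or 'Allow: /' control access by path. The asterisk matches every bot, and you can add per-bot groups (e.g., 'User-agent: GPTBot' followed by 'Disallow: /'). A bot that finds a group naming it follows only that group and ignores the '*' group. When an Allow and a Disallow both match a URL, RFC 9309 applies the one with the longest (most specific) path. A 'Sitemap: https://example.com/sitemap.xml' line, conventionally placed at the end, is optional.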
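To see how those lines are read, here is a sketch using Python's standard-library urllib.robotparser on the example rules from the paragraph above; example.com and the test URLs are placeholders. One caveat: robotparser applies the first matching rule in file order, whereas RFC 9309 picks the longest match. Listing 'Disallow: /admin/' before the broader 'Allow: /' keeps both readings in agreement here. site_maps() needs Python 3.8 or later.

```python
from urllib.robotparser import RobotFileParser

# Example rules: a default group, a GPTBot-specific group, and a Sitemap line.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /

User-agent: GPTBot
Disallow: /

Sitemap: https://example.com/sitemap.xml
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())  # parse() also marks the rules as loaded

checks = [
    ("Googlebot", "https://example.com/blog/post"),    # no own group, falls back to *
    ("Googlebot", "https://example.com/admin/users"),  # matches Disallow: /admin/
    ("GPTBot", "https://example.com/blog/post"),       # its own group blocks everything
]
for agent, url in checks:
    print(agent, url, rp.can_fetch(agent, url))

print(rp.site_maps())  # ['https://example.com/sitemap.xml']
```

GPTBot is refused even on /blog/post because it obeys only its own group; Googlebot, having no group of its own, gets the '*' rules.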

The most common misunderstanding is that robots.txt blocks indexing. It does not: it blocks crawling. A disallowed URL with many external links can still be indexed without its content, appearing on SERPs with the message "No information is available for this page", which is worse than not appearing at all. To genuinely keep a page out of the index, add <meta name="robots" content="noindex"> to the page. Caveat: a bot can only see noindex if it is allowed to crawl the page, so combining a robots.txt Disallow with noindex on the same URL is self-defeating.
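The crawl-versus-index distinction can be turned into a small audit. The sketch below, assuming Python's standard library only, combines the robots.txt verdict from urllib.robotparser with any <meta name="robots"> directives found in the page HTML. index_outcome and MetaRobotsParser are hypothetical names, and the sketch ignores the X-Robots-Tag HTTP header, which can also carry noindex.

```python
from html.parser import HTMLParser
from urllib.robotparser import RobotFileParser


class MetaRobotsParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)  # HTMLParser lower-cases tag and attribute names
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            for token in (attr.get("content") or "").split(","):
                self.directives.add(token.strip().lower())


def index_outcome(rp, user_agent, url, html):
    """Classify a URL by combining robots.txt access with the page's meta robots tag."""
    crawlable = rp.can_fetch(user_agent, url)
    meta = MetaRobotsParser()
    meta.feed(html)
    noindex = "noindex" in meta.directives
    if not crawlable and noindex:
        return "conflict"           # the bot never fetches the page, so never sees noindex
    if not crawlable:
        return "crawl-blocked"      # body not fetched, but the URL can still be indexed via links
    if noindex:
        return "kept out of index"
    return "indexable"


rp = RobotFileParser()
rp.parse(["User-agent: *", "Disallow: /private/"])
page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'

print(index_outcome(rp, "Googlebot", "https://example.com/private/report", page))
print(index_outcome(rp, "Googlebot", "https://example.com/drafts/report", page))
```

The first URL is the contradictory case described above; the second is the setup that actually keeps a page out of the index.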

Two production incidents are worth pre-empting. First, a staging 'Disallow: /' shipped to production: the entire site vanishes from search. More than half of the "traffic dropped to zero after relaunch" postmortems trace back to that single line. Second, AI crawler blocks switched on by default at the CDN layer: Cloudflare and others have shipped "Block AI bots" features that are on by default, silently locking out GPTBot and ClaudeBot. These blocks are enforced at the edge rather than in robots.txt, so the file itself can look perfectly open. Unless you have a deliberate policy reason, leaving these defaults on can quietly remove you from ChatGPT and Claude answers.
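The first incident is cheap to catch before release. Below is a minimal sketch of a pre-deploy guard, assuming the candidate robots.txt is available as text and that the homepage must stay reachable for a hypothetical MUST_REACH list of crawlers; blocked_crawlers is likewise a hypothetical name, and the parsing is done by the standard-library urllib.robotparser. It cannot detect the second incident: a CDN-level block never appears in robots.txt, so it needs a separate check against the live site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical list of crawlers the site must never lock out by accident.
MUST_REACH = ["Googlebot", "Bingbot", "GPTBot", "ClaudeBot", "PerplexityBot"]


def blocked_crawlers(robots_txt: str, url: str = "https://example.com/") -> list[str]:
    """Return the crawlers in MUST_REACH that robots_txt would keep away from url."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    return [bot for bot in MUST_REACH if not rp.can_fetch(bot, url)]


staging = "User-agent: *\nDisallow: /\n"          # the line that must not ship
production = "User-agent: *\nDisallow: /admin/\n"

print(blocked_crawlers(staging))     # ['Googlebot', 'Bingbot', 'GPTBot', 'ClaudeBot', 'PerplexityBot']
print(blocked_crawlers(production))  # []
```

In a CI pipeline, a non-empty result could fail the build, for example with sys.exit(1).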

For GEO and AEO, robots.txt is literally the gate to citation eligibility. The tokens to check by name: GPTBot (OpenAI, model training), OAI-SearchBot (ChatGPT search), ChatGPT-User (user-initiated browsing), Google-Extended (Gemini training and grounding; it does not affect Google Search, including AI Overviews, which follow Googlebot), ClaudeBot and anthropic-ai (Anthropic), PerplexityBot, Applebot-Extended (Apple), CCBot (Common Crawl). CCBot in particular feeds the Common Crawl corpus that many major LLMs are trained on, so blocking it can reduce your long-term presence in model answers. Add Search Console's robots.txt report (which replaced the retired robots.txt Tester) and a direct check of https://example.com/robots.txt to your monthly hygiene routine.
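For the monthly check, here is a minimal sketch that reports, for each token named above, whether it may fetch a given path. It assumes a simplified reading of RFC 9309: a token obeys every group naming it (falling back to the '*' groups), and among matching rules the longest path wins, with Allow winning ties. Wildcards ('*' and '$' inside paths) and percent-encoding normalisation are deliberately not handled, so treat it as a rough cross-check rather than a replacement for Search Console. parse_groups, rules_for and is_allowed are hypothetical names. The sample file puts 'Allow: /admin/help/' after 'Disallow: /admin/' to show longest-match at work; Python's urllib.robotparser, which uses first-match order, would read that case differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Group:
    agents: list[str] = field(default_factory=list)              # lower-cased tokens
    rules: list[tuple[bool, str]] = field(default_factory=list)  # (is_allow, path prefix)


def parse_groups(text: str) -> list[Group]:
    """Split robots.txt into groups: consecutive User-agent lines plus the rules after them."""
    groups: list[Group] = []
    current: Group | None = None
    in_agent_run = False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and surrounding whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if not in_agent_run:             # a User-agent after rules starts a new group
                current = Group()
                groups.append(current)
            current.agents.append(value.lower())
            in_agent_run = True
        elif key in ("allow", "disallow") and current is not None:
            current.rules.append((key == "allow", value))
            in_agent_run = False
        else:
            in_agent_run = False             # e.g. Sitemap lines belong to no group
    return groups


def rules_for(groups: list[Group], token: str) -> list[tuple[bool, str]]:
    """Rules a token must obey: every group naming it, else every '*' group."""
    token = token.lower()
    matched = [g for g in groups if token in g.agents]
    if not matched:
        matched = [g for g in groups if "*" in g.agents]
    return [rule for g in matched for rule in g.rules]


def is_allowed(groups: list[Group], token: str, path: str) -> bool:
    """Longest matching prefix wins; Allow wins ties; no match means allowed."""
    best_len, allowed = -1, True
    for is_allow, prefix in rules_for(groups, token):
        if prefix and path.startswith(prefix):  # an empty value matches nothing
            if len(prefix) > best_len or (len(prefix) == best_len and is_allow):
                best_len, allowed = len(prefix), is_allow
    return allowed


ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Allow: /admin/help/

User-agent: GPTBot
Disallow: /

User-agent: CCBot
Disallow:
"""

TOKENS = ["GPTBot", "ChatGPT-User", "Google-Extended", "ClaudeBot",
          "anthropic-ai", "PerplexityBot", "Applebot-Extended", "CCBot"]

groups = parse_groups(ROBOTS_TXT)
for token in TOKENS:
    verdict = "allowed" if is_allowed(groups, token, "/") else "BLOCKED"
    print(f"{token:<18} {verdict}")
```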


How does your brand show up in AI answers?

Villion measures how your brand appears across ChatGPT, Perplexity and AI Overviews, then automates the work that lifts citation rate and share of voice.
