llms-generator

Crawl any website and generate llms.txt with one command.

Updated: Jun 15, 2026 | Open Source
Python 3.10+ | MIT License | One Command
By Abdul Aouwal, Technical SEO Consultant (AI Search Visibility & Structured Data)

TL;DR

Install with pip install llms-generator, run llms-gen https://example.com, and upload the generated llms.txt to your domain root. No account required.

Quickstart

pip install llms-generator
llms-gen https://example.com

What Is llms.txt?

The AI-ready site map standard

llms.txt is a standard proposed by Jeremy Howard (AnswerDotAI, September 2024). Place a Markdown file at /llms.txt on your domain. Inside, list your important pages with short descriptions. AI assistants such as ChatGPT and Claude can read this file when answering user questions, instead of working through your HTML, navigation menus, and ads.

The format is strictly structured Markdown: an H1 heading with your project name, a blockquote summary, and sections split by H2 headings. Each entry is a link followed by a colon and a description.

llms.txt coexists with robots.txt and sitemap.xml. Robots.txt controls crawler access. Sitemap.xml lists every indexable page. llms.txt provides a curated overview. It is for answering user questions, not for training models.

Why Does llms.txt Matter Now?

Context windows are limited

ChatGPT, Claude, and Gemini have limited context windows. They cannot read your entire website, with its navigation, JavaScript, and ads, in one request. llms.txt gives them your essential pages in one request.

The Optional section has a special meaning. Entries marked Optional can be skipped when the LLM needs shorter context. Use it for secondary resources, tutorials, or examples. Primary docs go in named sections.
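
To make the Optional behavior concrete, here is a minimal Python sketch of how a consumer could trim the file when context is tight. The function name drop_optional and the sample content are illustrative assumptions, not part of llms-generator or the specification.

```python
def drop_optional(llms_txt: str) -> str:
    """Return llms.txt content without the '## Optional' section."""
    kept = []
    skipping = False
    for line in llms_txt.splitlines():
        if line.startswith("## "):
            # A new H2 begins; skip it only if it is the Optional section.
            skipping = line[3:].strip().lower() == "optional"
        if not skipping:
            kept.append(line)
    return "\n".join(kept).rstrip() + "\n"


# Hypothetical sample file used only for illustration.
sample = """# Example Site
> Example documentation.

## Docs
- [Getting Started](https://example.com/docs/getting-started): Setup steps.

## Optional
- [Tutorial Videos](https://example.com/tutorials): Walkthroughs.
"""

short = drop_optional(sample)
```

Everything under a named section such as Docs survives; only the Optional entries are dropped.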

Key Takeaways

  • LLMs have limited context windows and cannot read your entire website in one request.
  • llms.txt gives AI assistants your essential pages in a single Markdown file.
  • The Optional section lets you mark secondary content that LLMs can skip when context is tight.

What Is the llms.txt Specification?

Standard file structure

The llms.txt specification defines a Markdown file structure with an H1 heading, an optional blockquote summary, H2 section headings, and link lists with colon-separated descriptions. The file goes at the root of your domain at /llms.txt.

Standard file structure, in order:

  • H1 heading. Project or site name (required).
  • Blockquote. Short summary of the site.
  • Optional paragraphs. Additional details about the project.
  • H2 sections. Categories like "Docs", "Guides", "API".
  • Link lists. [Title](URL): Description inside each section.

A minimal example:

# Project Name
> Short description of the project or site.

Additional context for the LLM goes here.

## Docs
- [Getting Started](https://example.com/docs/getting-started): Step-by-step setup for new users.
- [API Reference](https://example.com/docs/api): Complete REST endpoint documentation.

## Optional
- [Tutorial Videos](https://example.com/tutorials): Walkthroughs for common tasks.

## Optional is a special section. LLMs can skip its links when context is tight. Everything else is considered essential.
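
The structure above is simple enough to build by hand. The following Python sketch renders it from plain data; render_llms_txt and its argument layout are assumptions made for this example, not the tool's actual internals.

```python
def render_llms_txt(title, summary, sections):
    """Render an llms.txt document.

    sections maps an H2 name to a list of (link_title, url, description).
    """
    lines = [f"# {title}", f"> {summary}", ""]
    for name, links in sections.items():
        lines.append(f"## {name}")
        for link_title, url, description in links:
            lines.append(f"- [{link_title}]({url}): {description}")
        lines.append("")  # blank line between sections
    return "\n".join(lines).rstrip() + "\n"


doc = render_llms_txt(
    "Project Name",
    "Short description of the project or site.",
    {
        "Docs": [
            ("Getting Started", "https://example.com/docs/getting-started",
             "Step-by-step setup for new users."),
        ],
        "Optional": [
            ("Tutorial Videos", "https://example.com/tutorials",
             "Walkthroughs for common tasks."),
        ],
    },
)
```

Sections are written in the order they appear in the dictionary, so put Optional last.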

File | Role | Typical Size
llms.txt | Curated directory of essential pages | 2 to 50 KB
llms-full.txt | Full page content concatenated into one document | 50 KB to 5+ MB

Key Takeaways

  • llms.txt uses a standard Markdown structure: H1 title, blockquote summary, H2 sections, link lists with descriptions.
  • Two file types exist: llms.txt (2-50 KB curated directory) and llms-full.txt (50 KB-5+ MB full content).
  • The Optional section lets LLMs skip secondary content when working with limited context.

What Features Does llms-generator Include?

What the tool does automatically

llms-generator includes six built-in features that handle crawling, filtering, grouping, and output generation automatically. You only need to point it at a URL.

  1. Robot Checks. Respects robots.txt, X-Robots-Tag headers, and meta robots tags on every page. Pages marked noindex are excluded.
  2. Auto Grouping. Pages are grouped by directory path. /docs/* becomes "Docs", /blog/* becomes "Blog". Sections are ordered by priority.
  3. JS Fallback. A Playwright headless browser is used as a fallback for JavaScript-rendered sites. The browser is launched once and reused.
  4. Dual Output. Generates both llms.txt (curated directory) and llms-full.txt (full page content) with a single --full flag.
  5. URL Normalization. Deduplicates http/https variants and trailing slash differences. Follows HTTP redirects and records final URLs.
  6. Spec Compliant. Output follows the llmstxt.org specification: H1 title, blockquote summary, H2 sections, and link lists with descriptions.
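
URL normalization is easy to picture with a small Python sketch. It assumes https is the canonical scheme and that fragments can be dropped; normalize_url is a hypothetical helper, and the real tool may apply different rules. Following redirects needs a live HTTP request and is left out.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Collapse scheme, host case, trailing slash, and fragment differences."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Strip trailing slashes, but keep "/" for the site root.
    path = parts.path.rstrip("/") or "/"
    # Assumption: https is canonical and the fragment is irrelevant.
    return urlunsplit(("https", host, path, parts.query, ""))


urls = [
    "http://example.com/docs/",
    "https://example.com/docs",
    "https://Example.com/docs/#intro",
]
unique = sorted({normalize_url(u) for u in urls})
```

All three variants collapse to one entry, so the page is listed once.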

Key Takeaways

  • llms-generator respects robots.txt, X-Robots-Tag, and meta robots on every page automatically.
  • Pages are grouped by directory path into named sections like Docs, Blog, and API.
  • Dual output mode generates both llms.txt and llms-full.txt with a single --full flag.

What CLI Options Are Available?

All available flags and defaults

Flag | Default | Description
URL | required | Target website URL
--depth | 2 | Maximum crawl depth
--output | llms.txt | Output file path
--full | false | Also generate llms-full.txt
--delay | 1.0 | Seconds between requests
--no-js | false | Skip Playwright JS fallback

How Does llms-generator Work?

Step-by-step crawl process

  1. Parse robots.txt. Respects Disallow and Crawl-Delay rules. Gracefully handles missing or restricted files.
  2. BFS Crawl. Starts from your URL and follows internal links breadth-first up to the configured depth.
  3. Per-Page Analysis. Extracts the title, h1, meta description, first paragraph, and directory path. Falls back to Playwright for JS-rendered content.
  4. Section Grouping. Groups pages by top-level directory path. /docs/* becomes "Docs", /blog/* becomes "Blog".
  5. Spec Output. Writes a valid llms.txt per the llmstxt.org specification with a proper H1, blockquote, H2 sections, and link lists.
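
The breadth-first step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the tool's code: bfs_crawl is a hypothetical name, and the link graph is an in-memory dictionary standing in for real HTTP fetches.

```python
from collections import deque


def bfs_crawl(start, get_links, max_depth=2):
    """Visit pages breadth-first and return {url: depth}."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        depth = seen[url]
        if depth == max_depth:
            continue  # do not follow links beyond the configured depth
        for link in get_links(url):
            if link not in seen:
                seen[link] = depth + 1
                queue.append(link)
    return seen


# Hypothetical site map: each page lists its internal links.
site = {
    "/": ["/docs", "/blog"],
    "/docs": ["/docs/api", "/"],
    "/docs/api": ["/docs/api/auth"],
    "/blog": [],
    "/docs/api/auth": [],
}
pages = bfs_crawl("/", lambda url: site.get(url, []), max_depth=2)
```

With a depth of 2, /docs/api/auth is never reached because it sits three links away from the start page.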

Key Takeaways

  • llms-generator starts by parsing robots.txt, then runs a breadth-first crawl respecting all robots directives.
  • Each page is analyzed for title, h1, meta description, and falls back to Playwright for JS-rendered content.
  • Output is a spec-compliant llms.txt with proper H1, blockquote, H2 sections, and link lists.

How Do I Install and Run llms-generator?

Install and run in minutes

You need Python 3.10 or newer installed on your computer. Python is a programming language. Many Mac and Linux computers have it already. On Windows, download it from python.org.

Open your terminal (Command Prompt on Windows, Terminal on Mac/Linux). Type this and press Enter:

pip install llms-generator

Wait for the installation to finish. Then type this, replacing the URL with your own website:

llms-gen https://example.com

That creates llms.txt in the current folder. Open it with any text editor. You will see your pages grouped into sections with descriptions.

To also generate the full content file, add --full:

llms-gen https://example.com --full

What Do You Need to Run llms-generator?

Requirements

Python 3.10 or higher. Upload access to your server root folder (usually called public_html or www).

How Do I Crawl and Audit My Site?

Start the crawl

llms-gen https://example.com --depth 3 --delay 1.0

The tool starts at your URL. It follows internal links up to the depth you set. Every page goes through three checks before it makes it into the output:

  • robots.txt. Skips disallowed paths automatically.
  • X-Robots-Tag. Respects noindex and nofollow from HTTP headers.
  • <meta name="robots">. Respects page-level directives in the HTML.
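
The header and meta tag checks above can be approximated with the Python standard library. This is a simplified sketch: is_noindex is a hypothetical helper, and it ignores user-agent-specific forms such as "googlebot: noindex" in X-Robots-Tag.

```python
from html.parser import HTMLParser


class _RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="...">."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            self.directives.update(
                d.strip().lower() for d in content.split(",") if d.strip()
            )


def is_noindex(headers: dict, html: str) -> bool:
    """True if the HTTP header or the page's meta tag says noindex."""
    header_value = ""
    for key, value in headers.items():
        if key.lower() == "x-robots-tag":  # header names are case-insensitive
            header_value = value
    header_directives = {d.strip().lower() for d in header_value.split(",")}
    parser = _RobotsMetaParser()
    parser.feed(html)
    return "noindex" in header_directives or "noindex" in parser.directives
```

A page that returns True here would be left out of the generated file.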

Set --delay 1.0 to wait one second between requests. For sites under 50 pages, 0.5 seconds is safe.
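
As a rough rule, crawl time is about the delay multiplied by the number of pages. The Python sketch below does that arithmetic; estimated_crawl_seconds is a hypothetical helper, and the result ignores download and rendering time, so treat it as a lower bound.

```python
def estimated_crawl_seconds(pages: int, delay: float = 1.0) -> float:
    """Lower-bound crawl time: one delay per page, fetch time not included."""
    return pages * delay


one_second_delay = estimated_crawl_seconds(50, 1.0)   # 50.0 seconds
half_second_delay = estimated_crawl_seconds(50, 0.5)  # 25.0 seconds
```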

What Gets Filtered Automatically When I Crawl?

Automatic filtering

You do not need to build a URL list by hand. The tool filters while it crawls:

Kept in llms.txt | Filtered Out
HTML pages with real content | Login, signup, admin, and account pages
Docs, guides, tutorials, blog posts | 404s, 500s, and empty responses
API references and changelogs | PDFs, images, CSS, JS files
About, contact, FAQ, privacy pages | Pages with noindex meta tag

To skip a directory like /tag/ or /author/, add a Disallow rule in your robots.txt. The tool reads it automatically.

User-agent: llms-generator/0.1
Disallow: /tag/
Disallow: /author/
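
To preview which URLs such rules would block, you can test them with Python's built-in urllib.robotparser. This sketch is independent of llms-generator. As simplifying assumptions, it uses a wildcard User-agent group and the agent token llms-generator without a version suffix, and it adds a Crawl-delay line for illustration.

```python
from urllib.robotparser import RobotFileParser

# Assumed rules for illustration; a wildcard group applies to every crawler.
robots_txt = """\
User-agent: *
Disallow: /tag/
Disallow: /author/
Crawl-delay: 2
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

agent = "llms-generator"  # assumed user-agent token
allowed_docs = parser.can_fetch(agent, "https://example.com/docs/getting-started")
allowed_tag = parser.can_fetch(agent, "https://example.com/tag/python")
delay = parser.crawl_delay(agent)
```

Here the /docs/ page is allowed, the /tag/ page is blocked, and the crawl delay is read as 2 seconds.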

How Do I Generate llms.txt?

Auto-grouped output

Pages are grouped by directory path. /docs/getting-started goes under ## Docs. /blog/hello-world goes under ## Blog.

llms-gen https://example.com --depth 3 --output llms.txt

Sections appear in priority order: Home, About, Docs, API, Blog, then the rest alphabetically. Each entry shows the page title, URL, and a one-line description from the page's meta description or h1 tag.

# Example Site
> Example documentation and blog content.

## Docs
- [Getting Started](https://example.com/docs/getting-started): How to get started on the platform.
- [API Reference](https://example.com/docs/api): Full endpoint documentation.

## Blog
- [Hello World](https://example.com/blog/hello-world): Announcing the public launch.
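
The grouping and ordering rules can be sketched in Python as follows. The names section_for and group_pages, and the way section titles are derived from the first path segment, are assumptions for illustration; the tool's exact naming rules may differ.

```python
from urllib.parse import urlsplit

# Priority order described above; everything else sorts alphabetically.
PRIORITY = ["Home", "About", "Docs", "API", "Blog"]


def section_for(url: str) -> str:
    """Map a URL to a section name based on its top-level directory."""
    segments = [s for s in urlsplit(url).path.split("/") if s]
    if not segments:
        return "Home"
    name = segments[0]
    # Assumed naming rule: title-case the segment, with "API" kept uppercase.
    return "API" if name.lower() == "api" else name.replace("-", " ").title()


def group_pages(urls):
    """Group URLs by section and return the sections in priority order."""
    groups = {}
    for url in urls:
        groups.setdefault(section_for(url), []).append(url)

    def sort_key(name):
        rank = PRIORITY.index(name) if name in PRIORITY else len(PRIORITY)
        return (rank, name)

    return {name: groups[name] for name in sorted(groups, key=sort_key)}


grouped = group_pages([
    "https://example.com/blog/hello-world",
    "https://example.com/docs/getting-started",
    "https://example.com/",
    "https://example.com/changelog/v1",
    "https://example.com/api/users",
])
```

The resulting section order is Home, Docs, API, Blog, Changelog.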

How Do I Generate llms-full.txt?

Full content version

Pass --full once. The tool writes both files in one run. llms.txt stays as a structured directory. llms-full.txt stacks every page's full text under section headings.

llms-gen https://example.com --full

ChatGPT or Claude can read llms-full.txt in one request without fetching each page separately. Keep the file under 100,000 tokens, about 75,000 words. Smaller models drop content past their limit.
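
To check a file against that budget, a word count gives a quick estimate. The Python sketch below uses the ratio implied by the figures above (100,000 tokens for about 75,000 words). It is a heuristic, not a real tokenizer, and estimate_tokens and fits_budget are hypothetical helper names.

```python
# Ratio implied by "100,000 tokens, about 75,000 words".
TOKENS_PER_WORD = 100_000 / 75_000


def estimate_tokens(text: str) -> int:
    """Rough token estimate from a whitespace word count."""
    return round(len(text.split()) * TOKENS_PER_WORD)


def fits_budget(text: str, budget: int = 100_000) -> bool:
    """True if the estimated token count is within the budget."""
    return estimate_tokens(text) <= budget
```

For example, fits_budget(open("llms-full.txt", encoding="utf-8").read()) tells you whether the generated file is likely to fit. Actual token counts vary by model and language.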

How Do I Write Descriptions That Get Cited by AI?

Optimize for AI

The tool pulls descriptions from your page's meta description or h1 tag. You can edit them after generation. Open the llms.txt file and rewrite any line. The format is plain Markdown.
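
The fallback from meta description to h1 can be sketched with Python's standard HTML parser. This is an approximation for illustration, not the tool's code; pick_description is a hypothetical name.

```python
from html.parser import HTMLParser


class _DescriptionParser(HTMLParser):
    """Captures the meta description and the text of the first h1."""

    def __init__(self):
        super().__init__()
        self.meta_description = ""
        self.h1 = ""
        self._in_h1 = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = (attr.get("content") or "").strip()
        elif tag == "h1" and not self.h1:
            self._in_h1 = True  # only the first h1 is recorded

    def handle_endtag(self, tag):
        if tag == "h1":
            self._in_h1 = False

    def handle_data(self, data):
        if self._in_h1:
            self.h1 += data


def pick_description(html: str) -> str:
    """Prefer the meta description; fall back to the first h1 text."""
    parser = _DescriptionParser()
    parser.feed(html)
    return parser.meta_description or " ".join(parser.h1.split())
```

If a page has neither, the function returns an empty string, which is a sign that the entry needs a hand-written description.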

Use your actual brand name in every description. Do not write "our platform" or "we offer". Write the real name so the AI can connect your content to your business.

Put the main point first in each entry. Write each description so it works as a standalone sentence. If the AI reads only that one line, it should still be useful. For example, instead of "How to install the package", write "Install llms-generator with pip in under one minute."

Add Article, Person, and Organization structured data to your pages. Schema markup tells AI systems what a page is, who wrote it, and who published it. A crawler can find the page through llms.txt, then use the schema to judge whether to cite you. Use a schema generator to create this code without writing it by hand.

Use semantic HTML tags like <article> and <section> instead of nested div containers. Clean HTML helps the crawler extract your text accurately for the full content file.

How Do I Deploy llms.txt to My Server?

Upload to your server

Open your FTP client or hosting file manager. Upload llms.txt to the same folder that holds index.html and robots.txt. That folder is usually called public_html, www, or htdocs.

Open this URL in your browser to confirm it works:

https://yourdomain.com/llms.txt

You should see a plain text file starting with # Your Site Name. If you get a 404, the file is probably in a subfolder rather than the web root. Move it up one folder and reload.
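
You can also run a quick structural check on the file before uploading it. The Python sketch below is an assumption-based helper, not part of llms-generator: it checks only that the first non-empty line is an H1 and that list entries look like Markdown links.

```python
import re

# A list entry: "- [Title](URL)" with an optional ": Description".
LINK_LINE = re.compile(r"^- \[[^\]]+\]\([^)\s]+\)(: .+)?$")


def validate_llms_txt(text: str) -> list[str]:
    """Return a list of problems; an empty list means the basics look fine."""
    problems = []
    non_empty = [line for line in text.splitlines() if line.strip()]
    if not non_empty or not non_empty[0].startswith("# "):
        problems.append("first line must be an H1 such as '# Your Site Name'")
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith("- ") and not LINK_LINE.match(line):
            problems.append(f"line {number}: malformed link entry")
    return problems
```

Calling validate_llms_txt on the contents of your llms.txt returns an empty list when the heading and link lines are well formed.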

How Do I Automate llms.txt Updates?

Keep it fresh

Most users can skip this. Re-run llms-gen when your site changes and upload the new file. If you use GitHub, you can set it to regenerate on every push. That is optional.

Review your llms.txt every three to six months. Remove old pages. Add new ones. Refresh descriptions if your content has changed. A stale directory points AI assistants at outdated or missing pages.

Frequently Asked Questions

Common questions about llms-generator

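Does llms.txt improve my search rankings?
No. It is not a ranking signal. It only affects how AI crawlers discover your content.

Can I publish more than one llms.txt on a domain?
No. Only one llms.txt goes at the root of your domain.

Does llms-generator work on JavaScript-rendered sites?
Yes, if you install Playwright. Run pip install "llms-generator[js]". The quotes stop shells such as zsh from treating the square brackets as a pattern. The tool falls back to JS rendering when the HTTP fetch returns empty content.

How do I exclude pages or directories?
Add Disallow rules to your robots.txt. The tool respects them on every crawl. There is no CLI flag for path exclusion.

How long does a crawl take?
About --delay times the number of pages. A 50-page site at a 1-second delay takes roughly one minute.

Can I edit the generated descriptions?
Yes. The tool auto-generates descriptions from your page's meta description or h1 tag. Open llms.txt after generation and edit any line. The format is plain Markdown.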

What's the Final Checklist Before Shipping?

Before you ship

  • Installed llms-generator with pip
  • Ran llms-gen https://yoursite.com --depth 3
  • Generated llms.txt (add --full for the full version)
  • Checked that all links work
  • Uploaded the file to your domain root
  • Opened https://yoursite.com/llms.txt in a browser to verify

Install the package, run one command, upload the file. That is all it takes to create an llms.txt for any website.

Once it is live, open ChatGPT or Claude and ask what your site does. If the answer is accurate, your descriptions are working. If not, tighten them and regenerate.