API Documentation Crawler

A simple Python utility that crawls a website's API documentation and extracts it into a single readable text file. That file can then be used as context for AI tools when a developer is building an app integration or working on related tasks.

Setup

Install dependencies:

pip install requests beautifulsoup4

Clone repository:

git clone [repository-url]
cd api-crawler

Usage

Edit crawler.py to set your target documentation URL:

base_url = "https://docs.example.com/api/"  # Replace with API docs URL
links_list

Run crawler:

python api-crawler/crawler.py

Find the extracted documentation in documentation.txt.

Features

Crawls all pages under specified documentation URL
Extracts readable text content
Preserves page structure with clear section boundaries
Includes source URLs for reference
Rate-limited to be server-friendly
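
The crawling and rate-limiting behaviour listed above can be sketched roughly as follows. This is a minimal illustration, not the code in crawler.py: the names LinkCollector, links_under_base, crawl, fetch, and delay_seconds are assumptions made for this example, and it uses only the standard library so it runs without the project's dependencies.

```python
import time
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin


class LinkCollector(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


def links_under_base(html, page_url, base_url):
    """Return absolute, fragment-free links on the page that start with base_url."""
    collector = LinkCollector()
    collector.feed(html)
    found = []
    for href in collector.hrefs:
        absolute, _fragment = urldefrag(urljoin(page_url, href))
        if absolute.startswith(base_url) and absolute not in found:
            found.append(absolute)
    return found


def crawl(base_url, fetch, delay_seconds=1.0):
    """Breadth-first crawl of every page reachable under base_url.

    fetch is a hypothetical hook: any callable mapping a URL to its HTML text.
    With requests it could be `lambda url: requests.get(url, timeout=10).text`.
    """
    queue = deque([base_url])
    seen = {base_url}
    pages = {}
    while queue:
        url = queue.popleft()
        pages[url] = fetch(url)
        for link in links_under_base(pages[url], url, base_url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
        if queue:
            time.sleep(delay_seconds)  # pause between requests to stay server-friendly
    return pages
```

Passing fetch in as a parameter keeps the traversal logic separate from the HTTP client, so the same loop can be exercised against an in-memory dictionary of pages before pointing it at a live site.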

Output Format

The generated documentation.txt will contain sections formatted as:

================================================================================
PAGE: [Page Title]
URL: [Source URL]
================================================================================

[Page Content]
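
As an illustration of the layout above, a section could be assembled along these lines. The helper names format_page and write_documentation are hypothetical, and the 80-character separator width is assumed to match the sample.

```python
SEPARATOR = "=" * 80  # assumed width, chosen to match the sample layout


def format_page(title, url, content):
    """Build one section: a separator-framed header followed by the page text."""
    header = f"{SEPARATOR}\nPAGE: {title}\nURL: {url}\n{SEPARATOR}"
    return f"{header}\n\n{content.strip()}\n\n"


def write_documentation(pages, path="documentation.txt"):
    """Write (title, url, content) tuples to a single text file, one section each."""
    with open(path, "w", encoding="utf-8") as handle:
        for title, url, content in pages:
            handle.write(format_page(title, url, content))
```

Because every section starts with the same separator and a PAGE: line, the file can later be split back into pages with a simple text search if needed.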

Customization

Modify the get_page_text() function to adjust content extraction for the structure of a specific documentation site.
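
One possible shape for such a customization is sketched below. The signature, the content_selector parameter, and the tag lists are assumptions for illustration only; the actual get_page_text() in crawler.py may look different. The sketch uses BeautifulSoup with Python's built-in "html.parser" backend.

```python
from bs4 import BeautifulSoup

# Assumed tag lists; adjust them to the documentation site being crawled.
BLOCK_TAGS = ["h1", "h2", "h3", "h4", "p", "li", "pre"]
NOISE_TAGS = ["script", "style", "nav", "header", "footer"]


def get_page_text(html, content_selector="main"):
    """Hypothetical variant: return one line of text per content block."""
    soup = BeautifulSoup(html, "html.parser")

    # Remove elements that rarely carry documentation text.
    while (tag := soup.find(NOISE_TAGS)) is not None:
        tag.decompose()

    # Prefer the main content container; fall back to the whole document.
    root = soup.select_one(content_selector)
    if root is None:
        root = soup

    blocks = []
    for element in root.find_all(BLOCK_TAGS):
        if element.find_parent(BLOCK_TAGS) is not None:
            continue  # text is already covered by the enclosing block
        text = " ".join(element.get_text().split())  # collapse whitespace
        if text:
            blocks.append(text)
    return "\n".join(blocks)
```

For a site that wraps its content in something other than a main element, passing a different CSS selector (for example "article" or "div.content") would be the first thing to try. Note that collapsing whitespace also flattens code samples inside pre elements, so a site with many code listings may want to treat those separately.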

Contributing

Feel free to submit issues and pull requests for improvements.
