A powerful web crawler tool that extracts website content and converts it to Markdown format. Perfect for analyzing how search engines, AI agents, and bots perceive your website structure.
Example crawl on site rdjarbeng.com produces these lines (showed just first few lines)
[  ](https://rdjarbeng.com/)
[Posts](https://rdjarbeng.com/posts) [Personal](https://rdjarbeng.com/personal) [ Gallery ](https://rdjarbeng.com/gallery) [About](https://rdjarbeng.com/about)
# Richard Djarbeng's Blog
Occasion Spotlight
# Ghana, an Introduction- All That Glitters is Still Gold
Celebrating the 69th Independence Day of Ghana. Explore the rich culture, history, and golden opportunities of the first sub-Saharan nation to gain independence.
[Read the Post](https://rdjarbeng.com/personal/a-country-called-ghana/)

## Latest Posts
[Tags](https://rdjarbeng.com/tags/) | [Categories](https://rdjarbeng.com/categories/)
[  How Google Turned 5 Million News Articles into a Flash Flood Warning System 13 March 2026 · 7 min read ](https://rdjarbeng.com/how-google-turned-5-million-news-articles-into-a-flash-flood-warning-system/)
[  Why Your Monthly AI Bill Might Soon Rival Your Headcount Costs From A Personal Experience 6 March 2026 · 2 min read ](https://rdjarbeng.com/why-your-monthly-ai-bill-might-soon-rival-your-headcount-costs-from-a-personal-experience/)
[  Google Bans Hundreds of Paying Antigravity Users for Using OpenClaw - Then Says "We Heard You" 2 March 2026 · 3 min read ](https://rdjarbeng.com/google-bans-hundreds-of-paying-antigravity-users-for-using-openclaw-then-says-we-heard-you/)
Crawl Tester is a Python-based web crawler that helps you understand how your website appears to search engines and AI agents. It crawls a given URL and outputs the extracted content as clean, well-formatted Markdown, allowing you to verify that your site structure, metadata, and content are properly accessible to bots and agents.
- 🕷️ Async Web Crawling: Fast, non-blocking crawling using async/await
- 📝 Markdown Output: Clean, readable Markdown conversion of web content
- ⚙️ Configurable: Easily customize crawl behavior and output
- 🤖 Bot-Friendly Analysis: Test how bots and search engines see your content
- 🎯 Flexible URLs: Crawl any website and verify its structure
- ⏱️ Timestamped Output: Automatic file naming with timestamps for organization
- Python 3.8+
- pip (Python package manager)
- Clone the repository:
git clone https://github.com/yourusername/crawl-tester.git
cd crawl-tester- Install dependencies:
pip install -r requirements.txt- Install Playwright browsers (one-time setup):
playwright installRun the crawler on a website:
python main.py --output my_siteThis will crawl http://rdjarbeng.com by default and save output to my_site_YYYYMMDD_HHMMSS.md.
To crawl a different URL, modify the url parameter in main.py:
result = await crawler.arun(url="https://your-website.com", crawler_config=config)- crawl4ai: Advanced web crawling library
- playwright: Browser automation for rendering JavaScript-heavy sites
- fastapi: (Optional) For building APIs around the crawler
- uvicorn: (Optional) ASGI server for FastAPI
crawl-tester/
├── main.py # Main crawler script
├── requirements.txt # Python dependencies
├── README.md # This file
├── LICENSE # MIT License
└── .gitignore # Git ignore patterns
The crawler can be configured using CrawlerRunConfig(). Some useful options:
verbose: Enable detailed loggingwait_until: Wait for specific page load conditionstimeout: Set crawl timeout- Custom headers and user agents
Example:
config = CrawlerRunConfig(
verbose=True,
timeout=30
)
result = await crawler.arun(url="https://example.com", crawler_config=config)- SEO Analysis: Verify how search engines see your content
- Bot Testing: Check how AI agents perceive your site structure
- Content Verification: Ensure metadata and structured data are properly exposed
- Accessibility Audit: Verify semantic HTML is crawlable
- Web Scraping: Extract and archive website content as Markdown
- Monitoring: Track how bots index your site over time
Contributions are welcome! Please feel free to submit a Pull Request. For major changes, please open an issue first to discuss what you would like to change.
- Fork the repository
- Create your feature branch (
git checkout -b feature/AmazingFeature) - Commit your changes (
git commit -m 'Add some AmazingFeature') - Push to the branch (
git push origin feature/AmazingFeature) - Open a Pull Request
This project is licensed under the MIT License - see the LICENSE file for details.
If you get errors related to Playwright browsers, run:
playwright installIncrease the timeout in CrawlerRunConfig:
config = CrawlerRunConfig(timeout=60) # 60 secondsThe crawler automatically handles JavaScript-rendered content through Playwright.
For issues, questions, or suggestions, please open an Issue.
- crawl4ai - Advanced web crawling library
- Playwright - Browser automation
- FastAPI - Modern web framework
Made with ❤️ by developers, for developers