✨ Launching Soon: Private Beta

Turn any website into clean content

Submit any URL and get a complete snapshot with sanitized HTML + Markdown for every page. Perfect for RAG systems, offline analysis, and content migration.


crawlkit.dev

POST /v1/jobs

{
  "url": "https://example.com",
  "depth": 2,
  "max_pages": 100
}

Response: archive_example_com_20240601.zip

index.md

about.md

contact.md
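
For readers who want to see the request above as code, here is a minimal Python sketch that builds (but does not send) the `POST /v1/jobs` call. Only the path and the three body fields come from this page; the base URL `https://api.crawlkit.example`, the bearer-token header, and the helper name `build_job_request` are assumptions made for illustration.

```python
import json
import urllib.request

# Assumption: the base URL and bearer-token auth are hypothetical. Only the
# path (/v1/jobs) and the body fields url, depth, max_pages appear on this page.
API_BASE = "https://api.crawlkit.example"


def build_job_request(url: str, depth: int = 2, max_pages: int = 100,
                      token: str = "YOUR_API_TOKEN") -> urllib.request.Request:
    """Build a POST /v1/jobs request without sending it."""
    body = json.dumps({"url": url, "depth": depth, "max_pages": max_pages})
    return urllib.request.Request(
        f"{API_BASE}/v1/jobs",
        data=body.encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
        method="POST",
    )


req = build_job_request("https://example.com")
# Sending it would be: urllib.request.urlopen(req)
```

Building the request separately from sending it keeps the payload easy to inspect and test before any network call is made.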

Features that power your content pipeline

Everything you need to extract clean, structured content from any website.

JavaScript Rendering

Full JavaScript rendering using Playwright ensures you capture dynamic content that traditional crawlers miss.

Clean Content

Removes scripts, styles, tracking code, and unwanted elements for clean, readable content.
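
To make the idea of stripping scripts and styles concrete, here is a small sketch using only Python's standard `html.parser`. It extracts visible text rather than producing sanitized HTML, and the `DROP_TAGS` list is an illustrative assumption, not the service's actual rule set.

```python
from html.parser import HTMLParser

# Assumption: this tag list is illustrative, not the service's real rules.
DROP_TAGS = {"script", "style", "noscript", "iframe"}


class TextOnlySanitizer(HTMLParser):
    """Collect visible text, skipping everything inside DROP_TAGS."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0          # > 0 while inside a dropped element
        self.parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in DROP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text only when we are outside dropped elements.
        if self._skip_depth == 0 and data.strip():
            self.parts.append(data.strip())


def visible_text(html: str) -> str:
    parser = TextOnlySanitizer()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


sample = "<p>Hello</p><script>track()</script><style>p{}</style><p>World</p>"
print(visible_text(sample))
```

A production sanitizer would also need to handle attributes such as inline event handlers and tracking pixels, which this sketch ignores.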

HTML & Markdown

Get both sanitized HTML and converted Markdown for every page, ready for any workflow.

Download as ZIP

All content delivered as a structured ZIP archive for simple integration into your systems.
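
As a sketch of what consuming the archive could look like, the following reads every Markdown file from a ZIP held in memory using Python's standard `zipfile` module. The archive layout (Markdown files such as `index.md` at the top level) is assumed from the demo on this page, and the stand-in archive is built locally so the code runs without the service.

```python
import io
import zipfile


def read_markdown_pages(archive_bytes: bytes) -> dict[str, str]:
    """Return {member name: Markdown text} for every .md file in the archive."""
    pages = {}
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in archive.namelist():
            if name.endswith(".md"):
                pages[name] = archive.read(name).decode("utf-8")
    return pages


# Build a small stand-in archive so the sketch runs offline.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as demo:
    demo.writestr("index.md", "# Home\n")
    demo.writestr("about.md", "# About\n")
    demo.writestr("index.html", "<h1>Home</h1>")

pages = read_markdown_pages(buffer.getvalue())
```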

Configurable Crawling

Set crawl depth, page limits, and other options to get exactly the content you need.
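
One plausible reading of how `depth` and `max_pages` interact is a breadth-first crawl that stops following links past a given level and stops entirely once the page budget is spent. The sketch below simulates that on an in-memory link graph; the function name `crawl_order` and the exact semantics (start page counts as depth 0 and counts toward the limit) are assumptions, not documented behavior.

```python
from collections import deque


def crawl_order(links: dict[str, list[str]], start: str,
                depth: int, max_pages: int) -> list[str]:
    """Breadth-first visit order, limited by link depth and total page count."""
    if max_pages < 1:
        return []
    visited = [start]
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, level = queue.popleft()
        if level == depth:
            continue  # do not follow links beyond the depth limit
        for target in links.get(page, []):
            if len(visited) >= max_pages:
                return visited  # page budget exhausted
            if target not in seen:
                seen.add(target)
                visited.append(target)
                queue.append((target, level + 1))
    return visited


# A toy site: each key links to the pages in its list.
site = {
    "/": ["/about", "/blog"],
    "/about": ["/team"],
    "/blog": ["/blog/post"],
    "/team": ["/team/alice"],
}
print(crawl_order(site, "/", depth=2, max_pages=100))
```

Breadth-first order means a tight page limit favors pages close to the start URL over deep ones.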

Webhooks

Receive a webhook notification as soon as your crawl job completes.
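
A receiver for such a notification might look like the sketch below, built on Python's standard `http.server`. The payload shape is not described on this page: the field names `job_id`, `status`, and `archive_url`, and the status value `"completed"`, are hypothetical placeholders to be replaced with whatever the API documentation specifies.

```python
import json
from http.server import BaseHTTPRequestHandler


def parse_job_webhook(body: bytes) -> dict:
    """Summarize a webhook body. All field names here are hypothetical."""
    event = json.loads(body)
    return {
        "job_id": event.get("job_id"),
        "done": event.get("status") == "completed",
        "archive_url": event.get("archive_url"),
    }


class WebhookHandler(BaseHTTPRequestHandler):
    """Minimal receiver: parse the JSON body and acknowledge with 204."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        summary = parse_job_webhook(self.rfile.read(length))
        if summary["done"]:
            print(f"Job {summary['job_id']} finished: {summary['archive_url']}")
        self.send_response(204)
        self.end_headers()
```

To try it locally, the handler can be served with `HTTPServer(("", 8000), WebhookHandler).serve_forever()` from the same module. A real deployment would also want to verify that the request genuinely comes from the service.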

Perfect for

RAG Systems

Build powerful Retrieval Augmented Generation systems with clean, structured web content as your knowledge base.
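
As an illustration of preparing the Markdown output for retrieval, here is a minimal sketch that splits a page into chunks at its headings, a common first step before embedding. The function name `split_by_heading` and the chunk shape are assumptions, and the sketch does not account for `#` lines inside code fences.

```python
import re

HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def split_by_heading(markdown: str) -> list[dict[str, str]]:
    """Split Markdown into {title, text} chunks, one per heading section."""
    chunks: list[dict[str, str]] = []
    title, lines = "", []

    def flush() -> None:
        # Emit the current section if it has a title or any non-blank text.
        if title or any(line.strip() for line in lines):
            chunks.append({"title": title, "text": "\n".join(lines).strip()})

    for line in markdown.splitlines():
        match = HEADING.match(line)
        if match:
            flush()
            title, lines = match.group(2).strip(), []
        else:
            lines.append(line)
    flush()
    return chunks


page = "# About\nWe build crawlers.\n\n## Team\nThree people.\n"
for chunk in split_by_heading(page):
    print(chunk)
```

Keeping the heading as a separate `title` field lets a retriever show or weight section titles independently of the body text.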

Content Migration

Extract content from legacy sites for migration to modern CMS platforms, with no manual copying and pasting.

Offline Archives

Create offline archives of websites for research, compliance, or preservation purposes.

Competitive Analysis

Analyze competitor content in a structured format without extracting it by hand or writing your own scraper.

Join the private beta

We're launching soon! Sign up to get early access and help shape the future of web content extraction.