The Contents API extracts clean HTML or Markdown content from a given URL. Pass it a list of URLs and get back the full page content for each, ready for LLM consumption—no parsing, no HTML noise, no browser automation required.
The Contents API and the livecrawl parameter in the Search API both extract full page content, but they serve different workflows:
Use the Contents API when you have a list of specific URLs you want to read. Use livecrawl when you want full content returned alongside search results in one go.
Each URL in your request returns a structured object:
You control which formats are returned via the formats parameter—request markdown, html, and/or metadata in any combination.
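As a rough sketch, one per-URL object could be modeled on the client side like this. The markdown, html, and metadata fields come from the description above; the url field and the names PageContent and parse_page are assumptions made for illustration, so check the API reference for the actual schema.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass
class PageContent:
    """Client-side model of one per-URL result (hypothetical, not the official schema)."""
    url: str                                   # assumed field name
    markdown: Optional[str] = None             # present when "markdown" was requested
    html: Optional[str] = None                 # present when "html" was requested
    metadata: Optional[dict[str, Any]] = None  # site name, favicon URL


def parse_page(raw: dict[str, Any]) -> PageContent:
    # Assumes formats that were not requested are absent or null in the response.
    return PageContent(
        url=raw.get("url", ""),
        markdown=raw.get("markdown"),
        html=raw.get("html"),
        metadata=raw.get("metadata"),
    )
```

Modeling every format as optional keeps the same code path working no matter which combination of formats you request.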
Pass up to 10 URLs in a single request. The API crawls them all in parallel and returns the content. No need to manage a headless browser or deal with raw HTML yourself.
The markdown format strips navigation menus, ads, footers, and other boilerplate. You get the actual content of the page, ready to drop into a prompt.
Use crawl_timeout (1–60 seconds) to balance speed vs. completeness. For fast pages: 5–10 seconds. For heavy JavaScript-rendered pages: 20–30 seconds.
Request metadata alongside content to get the page’s site name and favicon URL—useful for building UIs that display source attribution.
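The limits described above (up to 10 URLs, a crawl_timeout of 1 to 60 seconds, and the three formats) can be checked on the client before a request is sent. The sketch below only builds and validates a request body; the helper name build_contents_payload and the "urls" key are assumptions, while formats and crawl_timeout are the parameter names used on this page.

```python
from typing import Any, Sequence

MAX_URLS = 10                                    # per-request limit stated above
ALLOWED_FORMATS = {"markdown", "html", "metadata"}


def build_contents_payload(
    urls: Sequence[str],
    formats: Sequence[str] = ("markdown",),
    crawl_timeout: int = 10,
) -> dict[str, Any]:
    """Validate inputs and return a request body (hypothetical helper)."""
    if not 1 <= len(urls) <= MAX_URLS:
        raise ValueError(f"expected 1 to {MAX_URLS} URLs, got {len(urls)}")
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unknown formats: {sorted(unknown)}")
    if not 1 <= crawl_timeout <= 60:
        raise ValueError("crawl_timeout must be between 1 and 60 seconds")
    # The "urls" key is an assumption; confirm it against the API reference.
    return {
        "urls": list(urls),
        "formats": list(formats),
        "crawl_timeout": crawl_timeout,
    }
```

Failing fast on the client avoids spending a round trip on a request the API would reject anyway.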
Monitor competitor pricing, feature, or blog pages. Fetch the content on a schedule, feed it to an LLM, and surface meaningful changes—without manual checking.
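Before handing two snapshots to an LLM, it can help to reduce them to the lines that actually changed. This is a minimal sketch using Python's standard difflib; the function name changed_lines is hypothetical, and it assumes you have stored the Markdown from the previous fetch yourself.

```python
import difflib


def changed_lines(old_markdown: str, new_markdown: str) -> list[str]:
    """Return added ("+ ") and removed ("- ") lines between two snapshots."""
    diff = difflib.ndiff(old_markdown.splitlines(), new_markdown.splitlines())
    # ndiff also emits "  " (unchanged) and "? " (hint) lines; keep only real changes.
    return [line for line in diff if line.startswith(("+ ", "- "))]
```

An empty result means the page text is unchanged, so the LLM call can be skipped for that run.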
You have a list of authoritative sources—documentation pages, whitepapers, internal wikis. Fetch them all, convert to clean Markdown, and index into your vector store.
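Most vector stores work better with passages than with whole pages, so the fetched Markdown usually needs to be split before indexing. The sketch below shows one simple approach, packing paragraphs into size-limited chunks; chunk_markdown and its max_chars default are illustrative choices, not part of the API.

```python
def chunk_markdown(markdown: str, max_chars: int = 1000) -> list[str]:
    """Pack blank-line-separated paragraphs into chunks of up to max_chars.

    A single paragraph longer than max_chars is kept whole as its own chunk.
    """
    chunks: list[str] = []
    current = ""
    for para in (p.strip() for p in markdown.split("\n\n")):
        if not para:
            continue
        candidate = f"{current}\n\n{para}" if current else para
        if len(candidate) <= max_chars:
            current = candidate
        else:
            if current:
                chunks.append(current)
            current = para
    if current:
        chunks.append(current)
    return chunks
```

Splitting on paragraph boundaries keeps each chunk readable on its own, which tends to matter more for retrieval quality than hitting an exact size.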
Give users the ability to ask questions about specific URLs. Fetch the page content on the fly and feed it as context into your LLM—turning any URL into a searchable document.
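Once the Markdown for a URL is in hand, turning it into LLM context is mostly string assembly. A minimal sketch, assuming a character budget rather than a token budget; build_prompt and the prompt wording are hypothetical and should be adapted to your model.

```python
def build_prompt(question: str, url: str, markdown: str, max_context_chars: int = 8000) -> str:
    """Combine page content and a user question into a single prompt string."""
    context = markdown[:max_context_chars]  # crude truncation to stay within budget
    return (
        "Answer the question using only the page content below.\n\n"
        f"Source: {url}\n\n"
        f"{context}\n\n"
        f"Question: {question}"
    )
```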
Each format adds processing time. If you only need Markdown for LLM consumption, don’t request html. If you don’t need site metadata for your UI, skip metadata.
A single request with 10 URLs is faster than 10 separate requests. The API processes them in parallel.
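If you have more than 10 URLs, one way to follow this advice is to split the list into groups of 10 and send one request per group. The helper name batches is an assumption for illustration.

```python
from typing import Iterator, Sequence


def batches(urls: Sequence[str], size: int = 10) -> Iterator[list[str]]:
    """Yield consecutive groups of at most `size` URLs."""
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])
```

For example, 23 URLs become three requests carrying 10, 10, and 3 URLs.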
Set crawl_timeout based on the target site. For simple static pages, 5–10 seconds is usually enough. For JavaScript-heavy pages (SPAs, dashboards), increase to 20–30 seconds to give the renderer time to complete.
If one URL in a batch fails to crawl (e.g., it’s behind a login wall or returns a 404), the API returns null for its markdown and html fields. Always check before processing:
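One way to do that check is to separate usable results from failed ones before any further processing. This sketch assumes each result is a dictionary with markdown and html keys, as described above; the function name split_results is hypothetical.

```python
from typing import Any


def split_results(results: list[dict[str, Any]]) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Return (usable, failed), where failed items have no markdown and no html."""
    usable: list[dict[str, Any]] = []
    failed: list[dict[str, Any]] = []
    for item in results:
        # A format that was not requested is treated the same as null here.
        if item.get("markdown") is None and item.get("html") is None:
            failed.append(item)
        else:
            usable.append(item)
    return usable, failed
```

Keeping the failed items around, rather than dropping them silently, makes it easy to retry them later with a longer crawl_timeout.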
$1.00 per 1,000 pages
All new accounts receive $100 in free credits to get started. Pricing is simple and based on the number of pages you fetch.
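To make the arithmetic concrete, here is a small sketch of the cost calculation. It assumes pages are billed pro rata at the listed rate with no minimums or rounding, which this page does not confirm; the function names are illustrative.

```python
from decimal import Decimal

PRICE_PER_1000_PAGES = Decimal("1.00")  # USD, from the pricing above
FREE_CREDITS = Decimal("100")           # USD, for new accounts


def cost_usd(pages: int) -> Decimal:
    """Cost of fetching `pages` pages, assuming simple pro rata billing."""
    return PRICE_PER_1000_PAGES * pages / 1000


def pages_covered(credits: Decimal) -> int:
    """Number of pages a credit balance covers at the listed rate."""
    return int(credits / PRICE_PER_1000_PAGES * 1000)
```

Under these assumptions, 2,500 pages cost $2.50, and the $100 in free credits covers 100,000 pages.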
What’s included:
For volume discounts, annual pricing, or enterprise features, visit you.com/pricing or contact [email protected].
Full parameter reference, request/response schemas, and error codes
Pair search results with livecrawl to get full content alongside real-time web data
Get your API key and make your first call in under five minutes
Use the official SDK for cleaner integration