
Retrieve page content

Live-crawl search results to get full HTML or Markdown page content. Ideal for RAG, knowledge base construction, and deep content analysis.


Overview

By default, search results include snippets — 100–200 words of extracted text per result. Enable live crawling to get the full page content: typically 2,000–10,000 words of clean HTML or Markdown per result.

This is what enables:

  • Deep RAG with full document context
  • Knowledge base construction from live web data
  • Comprehensive content synthesis across sources
  • Full article bodies for news results

How it works

Add livecrawl to any search request. The API fetches each matching result’s page in real time and attaches a contents object to it. You choose which result types to crawl and what format to return.

| Parameter | Type | Options | Description |
| --- | --- | --- | --- |
| livecrawl | string | web, news, all | Which result types to crawl |
| livecrawl_formats | string | html, markdown | Format for returned content. Repeat to get both: ?livecrawl_formats=html&livecrawl_formats=markdown (GET) or pass an array (POST) |
| crawl_timeout | integer | 1–60 (default 10) | Max seconds to wait per page |

markdown is recommended for LLM use cases — it strips navigation, ads, and boilerplate HTML, leaving only the core content.
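As a rough illustration of the repeated-parameter form for GET requests, the sketch below builds only the query string and validates values against the table above. The helper name build_livecrawl_query is hypothetical, and the query parameter name is an assumption borrowed from the SDK keyword of the same name; the endpoint URL is deliberately left out.

```python
from urllib.parse import urlencode

ALLOWED_TARGETS = {"web", "news", "all"}
ALLOWED_FORMATS = {"html", "markdown"}


def build_livecrawl_query(query, livecrawl="web", formats=("markdown",), crawl_timeout=10):
    """Return a URL-encoded query string for a livecrawl search (hypothetical helper).

    Assumption: the HTTP parameter names match the SDK keyword names.
    """
    if livecrawl not in ALLOWED_TARGETS:
        raise ValueError(f"livecrawl must be one of {sorted(ALLOWED_TARGETS)}")
    unknown = set(formats) - ALLOWED_FORMATS
    if unknown:
        raise ValueError(f"unsupported formats: {sorted(unknown)}")
    if not 1 <= crawl_timeout <= 60:
        raise ValueError("crawl_timeout must be between 1 and 60 seconds")

    # A list of pairs lets livecrawl_formats repeat once per requested format.
    params = [("query", query), ("livecrawl", livecrawl)]
    params += [("livecrawl_formats", fmt) for fmt in formats]
    params.append(("crawl_timeout", crawl_timeout))
    return urlencode(params)


qs = build_livecrawl_query("transformer architecture", formats=("html", "markdown"))
print(qs)
```

Passing a list of pairs rather than a dict is what allows the same key to appear twice in the encoded string.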

Livecrawl is billed separately from the base Search API rate. Crawled pages cost $1.00 per 1,000 pages ($0.001 per page), the same rate as the Contents API. A single call with count=10 and livecrawl=all crawls up to 20 pages (10 web + 10 news), adding up to $0.02 to the $0.005 base call cost, for a maximum of $0.025.
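The billing arithmetic can be sketched as a small upper-bound estimator. The function name max_request_cost is hypothetical; the rates are the ones quoted above, and the result is a ceiling because a request crawls "up to" count pages per result type.

```python
from decimal import Decimal

BASE_CALL_USD = Decimal("0.005")                 # base Search API call, as quoted above
PER_PAGE_USD = Decimal("1.00") / Decimal("1000")  # $1.00 per 1,000 crawled pages

# Number of result types crawled for each livecrawl setting.
RESULT_TYPES = {None: 0, "web": 1, "news": 1, "all": 2}


def max_request_cost(count, livecrawl=None):
    """Upper bound on the cost of one search call, in USD (hypothetical helper)."""
    max_pages = count * RESULT_TYPES[livecrawl]
    return BASE_CALL_USD + PER_PAGE_USD * max_pages


print(max_request_cost(10, "all"))  # worked example from the text: 0.005 + 20 * 0.001
```

Decimal is used instead of float so that small per-page amounts add up without rounding drift.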

Crawl web results

Set livecrawl=web to attach full page content to web results. The contents.markdown (or contents.html) field is added to each result that was successfully crawled.

from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="transformer architecture explained",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            print(f"{result.title}")
            print(f" URL: {result.url}")
            if result.contents:
                print(f" Content ({len(result.contents.markdown)} chars)")
                print(f" Preview: {result.contents.markdown[:200]}...\n")
            else:
                print(" (No content retrieved)\n")

Crawl news results

Set livecrawl=news to get full article bodies for news results. Combine with freshness for breaking news pipelines.

from youdotcom import You
from youdotcom.models import Freshness, LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="semiconductor supply chain",
        freshness=Freshness.WEEK,
        count=5,
        livecrawl=LiveCrawl.NEWS,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.news:
        for article in res.results.news:
            print(f"{article.title}")
            if article.contents:
                print(article.contents.markdown[:400])
            print()

Crawl both web and news

Use livecrawl=all to crawl every result type in one request.

from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="quantum computing breakthroughs",
        count=5,
        livecrawl=LiveCrawl.ALL,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results:
        for result in (res.results.web or []):
            if result.contents:
                print(f"[WEB] {result.title}: {len(result.contents.markdown)} chars")

        for result in (res.results.news or []):
            if result.contents:
                print(f"[NEWS] {result.title}: {len(result.contents.markdown)} chars")
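When both result types are crawled, downstream code often wants a single list of documents. The sketch below is one possible way to merge them; CrawledDoc and collect_docs are hypothetical names, and it assumes only the attributes used on this page (title, url, contents.markdown). Whether news results expose url is an assumption, so it is read defensively.

```python
from dataclasses import dataclass


@dataclass
class CrawledDoc:
    source: str    # "web" or "news"
    title: str
    url: str
    markdown: str


def collect_docs(web, news):
    """Merge crawled web and news results into one list (hypothetical helper).

    Results without contents, or with empty markdown, are skipped.
    """
    docs = []
    for source, results in (("web", web), ("news", news)):
        for result in results or []:
            contents = getattr(result, "contents", None)
            markdown = getattr(contents, "markdown", None) if contents else None
            if markdown:
                # Assumption: url may be missing on some result types.
                url = getattr(result, "url", "") or ""
                docs.append(CrawledDoc(source, result.title, url, markdown))
    return docs
```

With the SDK response from a livecrawl=all request, a call might look like collect_docs(res.results.web, res.results.news).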

Control crawl timeout

By default the crawler waits up to 10 seconds per page. For latency-sensitive applications, reduce crawl_timeout. For complex or slow-loading pages, increase it (up to 60 seconds).

from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    # Low-latency pipeline: only wait 3 seconds per page
    res = you.search.unified(
        query="latest Python releases",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
        crawl_timeout=3,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            status = "crawled" if result.contents else "skipped (timeout)"
            print(f"{result.title} — {status}")
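One way to balance latency against coverage is to start with a short timeout and retry with a longer one only when too many pages were skipped. This is a sketch of that idea, not a documented feature: crawl_with_escalation and its search argument are hypothetical, and search stands in for any callable that runs one request with the given crawl_timeout and returns the result list (for example res.results.web). Each attempt is a separate request, so it would presumably be billed again.

```python
MIN_TIMEOUT, MAX_TIMEOUT = 1, 60  # documented crawl_timeout range, in seconds


def crawl_with_escalation(search, timeouts=(3, 10, 30), min_crawled_ratio=0.8):
    """Retry with longer timeouts until enough results carry contents.

    Returns (timeout_used, results) from the last attempt made.
    """
    used, results = 0, []
    for timeout in timeouts:
        if not MIN_TIMEOUT <= timeout <= MAX_TIMEOUT:
            raise ValueError(
                f"crawl_timeout must be {MIN_TIMEOUT}-{MAX_TIMEOUT}, got {timeout}"
            )
        used, results = timeout, search(timeout)
        # A result counts as crawled when its contents attribute is truthy.
        crawled = sum(1 for r in results if getattr(r, "contents", None))
        if results and crawled / len(results) >= min_crawled_ratio:
            break
    return used, results
```

The default timeouts and the 0.8 threshold are arbitrary starting points and should be tuned to the latency budget of the pipeline.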

HTML vs Markdown

| Format | Best for |
| --- | --- |
| markdown | LLM prompts, RAG, text analysis: clean, no boilerplate |
| html | Rendering, scraping structured data, preserving page layout |

Already have URLs?

If you have a list of URLs and don’t need to search first, use the Contents API directly. It accepts URLs without a query and returns the same markdown or html content.


Next steps

  • Contents API: Fetch page content directly from URLs, no search query needed
  • Get live news: Retrieve and filter real-time news results
  • API reference: View all parameters and response schemas
  • Search overview: Back to Search API overview