Retrieve page content

Live-crawl search results to get full HTML or Markdown page content. Ideal for RAG, knowledge base construction, and deep content analysis.

Overview

By default, search results include snippets — 100–200 words of extracted text per result. Enable live crawling to get the full page content: typically 2,000–10,000 words of clean HTML or Markdown per result.

Full page content enables:

Deep RAG with full document context
Knowledge base construction from live web data
Comprehensive content synthesis across sources
Full article bodies for news results
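For RAG, a full crawled page is usually too long to embed as a single unit, so it is typically split into chunks first. The sketch below shows one simple approach: fixed-size word windows with overlap, applied to a crawled Markdown string. `chunk_words` is a hypothetical helper, not part of the You.com SDK, and the default sizes are illustrative, not recommendations. Production pipelines often split on headings or model tokens instead.

```python
def chunk_words(text: str, size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be >= 0 and smaller than size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # this window already reached the end of the text
    return chunks


# Usage with a short stand-in for result.contents.markdown
page_markdown = "one two three four five"
print(chunk_words(page_markdown, size=3, overlap=1))  # ['one two three', 'three four five']
```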

How it works

Add the `livecrawl` parameter to any search request. The API fetches each matching result’s page in real time and attaches a `contents` object to that result. You choose which result types to crawl and which format to return.

| Parameter | Type | Options | Description |
|---|---|---|---|
| `livecrawl` | string | `web`, `news`, `all` | Which result types to crawl |
| `livecrawl_formats` | string | `html`, `markdown` | Format for returned content. Repeat to get both: `?livecrawl_formats=html&livecrawl_formats=markdown` (GET) or pass an array (POST) |
| `crawl_timeout` | integer | `1`–`60` (default `10`) | Maximum seconds to wait per page |
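For GET requests, the repeated-key form of `livecrawl_formats` can be produced with the standard library's `urllib.parse.urlencode` and `doseq=True`, which expands a list value into one key per item. This sketch builds only the query string; the endpoint URL and authentication are not covered here, and `build_livecrawl_query` is a hypothetical helper.

```python
from urllib.parse import urlencode


def build_livecrawl_query(query: str, livecrawl: str, formats: list[str], crawl_timeout: int = 10) -> str:
    """Build a GET query string; a list of formats becomes repeated livecrawl_formats keys."""
    params = {
        "query": query,
        "livecrawl": livecrawl,
        "livecrawl_formats": formats,  # doseq=True emits one key per list item
        "crawl_timeout": crawl_timeout,
    }
    return urlencode(params, doseq=True)


print(build_livecrawl_query("quantum computing", "web", ["html", "markdown"]))
# query=quantum+computing&livecrawl=web&livecrawl_formats=html&livecrawl_formats=markdown&crawl_timeout=10
```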

`markdown` is recommended for LLM use cases: it strips navigation, ads, and boilerplate HTML, leaving only the core content.

Livecrawl is billed separately from the base Search API rate. Crawled pages cost $1.00 per 1,000 pages ($0.001 per page), the same rate as the Contents API. A single call with `count=10` and `livecrawl=all` crawls up to 20 pages (10 web + 10 news), adding up to $0.02 to the $0.005 base call cost.
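The arithmetic above can be written as a small estimator. This is a sketch that assumes the $0.005 base cost and the $1.00-per-1,000-pages rate quoted here, and treats `count` as the maximum number of crawled pages per result type, so it returns an upper bound; check your plan's actual pricing before relying on it. `estimate_max_cost` is a hypothetical helper.

```python
BASE_CALL_COST = 0.005           # USD per search call, as quoted above
LIVECRAWL_PER_1000_PAGES = 1.00  # USD per 1,000 crawled pages, as quoted above

# Number of result types each livecrawl value crawls
RESULT_TYPES = {"web": 1, "news": 1, "all": 2}


def estimate_max_cost(count: int, livecrawl: str) -> float:
    """Upper-bound USD cost of one search call with live crawling."""
    max_pages = count * RESULT_TYPES[livecrawl]
    return round(BASE_CALL_COST + max_pages * LIVECRAWL_PER_1000_PAGES / 1000, 6)


print(estimate_max_cost(count=10, livecrawl="all"))  # 0.025
```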

Crawl web results

Set livecrawl=web to attach full page content to web results. The contents.markdown (or contents.html) field is added to each result that was successfully crawled.

```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

# Replace "api_key" with your You.com API key
with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="transformer architecture explained",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            print(result.title)
            print(f"  URL: {result.url}")
            # contents is only set when the page was crawled successfully
            markdown = result.contents.markdown if result.contents else None
            if markdown:
                print(f"  Content ({len(markdown)} chars)")
                print(f"  Preview: {markdown[:200]}...\n")
            else:
                print("  (No content retrieved)\n")
```
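Because `contents` is only present for pages that were crawled, and a format field may be empty if that format was not requested, it can help to read page text through one small accessor. The sketch below prefers Markdown and falls back to HTML. `page_text` is a hypothetical helper; it uses `getattr` so it assumes nothing beyond the `contents.markdown` and `contents.html` names shown on this page, and `SimpleNamespace` objects stand in for SDK results.

```python
from types import SimpleNamespace
from typing import Any, Optional


def page_text(result: Any) -> Optional[str]:
    """Return crawled Markdown if present, otherwise HTML, otherwise None."""
    contents = getattr(result, "contents", None)
    if contents is None:
        return None  # page was not crawled
    # An empty or missing Markdown field falls through to HTML
    return getattr(contents, "markdown", None) or getattr(contents, "html", None)


crawled = SimpleNamespace(contents=SimpleNamespace(markdown="# Attention", html=None))
html_only = SimpleNamespace(contents=SimpleNamespace(markdown=None, html="<h1>Attention</h1>"))
not_crawled = SimpleNamespace(contents=None)

print(page_text(crawled))      # # Attention
print(page_text(html_only))    # <h1>Attention</h1>
print(page_text(not_crawled))  # None
```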

Crawl news results

Set livecrawl=news to get full article bodies for news results. Combine it with the `freshness` parameter for breaking-news pipelines.

```python
from youdotcom import You
from youdotcom.models import Freshness, LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="semiconductor supply chain",
        freshness=Freshness.WEEK,
        count=5,
        livecrawl=LiveCrawl.NEWS,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results and res.results.news:
        for article in res.results.news:
            print(article.title)
            # Print the first 400 characters of the full article body, if it was crawled
            if article.contents and article.contents.markdown:
                print(article.contents.markdown[:400])
            print()
```
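For a news pipeline that feeds an LLM, crawled article bodies are often joined into a single context string with a per-article length cap, so one long article cannot crowd out the rest. This is a sketch under that assumption: `build_news_context` and its 2,000-character default are hypothetical choices, it relies only on the `title` and `contents.markdown` attributes used in the example above, and the sample articles are stand-ins.

```python
from types import SimpleNamespace
from typing import Any, Iterable


def build_news_context(articles: Iterable[Any], max_chars_per_article: int = 2000) -> str:
    """Join crawled article bodies into one prompt-ready string, truncating each."""
    blocks = []
    for article in articles:
        contents = getattr(article, "contents", None)
        body = getattr(contents, "markdown", None) if contents is not None else None
        if not body:
            continue  # skip articles without crawled content
        blocks.append(f"## {article.title}\n{body[:max_chars_per_article]}")
    return "\n\n".join(blocks)


# Stand-in objects shaped like crawled news results
articles = [
    SimpleNamespace(title="Fab expansion", contents=SimpleNamespace(markdown="Chipmakers announced...")),
    SimpleNamespace(title="Export rules", contents=None),  # not crawled
]
print(build_news_context(articles, max_chars_per_article=10))
```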

Crawl both web and news

Use livecrawl=all to crawl every result type in one request.

```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    res = you.search.unified(
        query="quantum computing breakthroughs",
        count=5,
        livecrawl=LiveCrawl.ALL,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
    )

    if res.results:
        # Either list can be None, so fall back to an empty list
        for result in res.results.web or []:
            if result.contents and result.contents.markdown:
                print(f"[WEB] {result.title}: {len(result.contents.markdown)} chars")

        for result in res.results.news or []:
            if result.contents and result.contents.markdown:
                print(f"[NEWS] {result.title}: {len(result.contents.markdown)} chars")
```
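When both result types are crawled, downstream code is often simpler if web and news results are flattened into one list. The sketch below tags each page with its source type and skips uncrawled results. `crawled_pages` is a hypothetical helper that assumes only the `results.web`, `results.news`, `title`, and `contents.markdown` attributes used above; `SimpleNamespace` stands in for the SDK response.

```python
from types import SimpleNamespace
from typing import Any, List, Tuple


def crawled_pages(results: Any) -> List[Tuple[str, str, str]]:
    """Flatten crawled web and news results into (kind, title, markdown) tuples."""
    pages = []
    for kind in ("web", "news"):
        # Either list may be None when the search returned no results of that type
        for item in getattr(results, kind, None) or []:
            contents = getattr(item, "contents", None)
            markdown = getattr(contents, "markdown", None) if contents is not None else None
            if markdown:
                pages.append((kind, item.title, markdown))
    return pages


results = SimpleNamespace(
    web=[SimpleNamespace(title="Qubit primer", contents=SimpleNamespace(markdown="# Qubits"))],
    news=None,
)
print(crawled_pages(results))  # [('web', 'Qubit primer', '# Qubits')]
```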

Control crawl timeout

By default the crawler waits up to 10 seconds per page. For latency-sensitive applications, reduce crawl_timeout. For complex or slow-loading pages, increase it (up to 60 seconds).

```python
from youdotcom import You
from youdotcom.models import LiveCrawl, LiveCrawlFormats

with You(api_key_auth="api_key") as you:
    # Low-latency pipeline: wait at most 3 seconds per page
    res = you.search.unified(
        query="latest Python releases",
        count=5,
        livecrawl=LiveCrawl.WEB,
        livecrawl_formats=LiveCrawlFormats.MARKDOWN,
        crawl_timeout=3,
    )

    if res.results and res.results.web:
        for result in res.results.web:
            # No contents means the page was not crawled, e.g. the 3-second timeout expired
            status = "crawled" if result.contents else "not crawled"
            print(f"{result.title}: {status}")
```
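When tuning `crawl_timeout`, it can help to keep the value inside the documented 1 to 60 second range and to measure what fraction of results came back with content. A low fraction may suggest the timeout is too short for the sites you query, although pages can also fail for other reasons. Both helpers below are hypothetical sketches; `crawl_rate` assumes only that uncrawled results have no `contents`.

```python
from types import SimpleNamespace
from typing import Any, Sequence

MIN_CRAWL_TIMEOUT = 1   # seconds, documented lower bound
MAX_CRAWL_TIMEOUT = 60  # seconds, documented upper bound


def clamp_crawl_timeout(seconds: int) -> int:
    """Keep a requested timeout within the documented crawl_timeout range."""
    return max(MIN_CRAWL_TIMEOUT, min(MAX_CRAWL_TIMEOUT, seconds))


def crawl_rate(results: Sequence[Any]) -> float:
    """Fraction of results that came back with a contents object."""
    if not results:
        return 0.0
    crawled = sum(1 for r in results if getattr(r, "contents", None) is not None)
    return crawled / len(results)


print(clamp_crawl_timeout(90))  # 60
sample = [SimpleNamespace(contents=object()), SimpleNamespace(contents=None)]
print(crawl_rate(sample))  # 0.5
```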

HTML vs Markdown

| Format | Best for |
|---|---|
| `markdown` | LLM prompts, RAG, text analysis: clean, no boilerplate |
| `html` | Rendering, scraping structured data, preserving page layout |

Already have URLs?

If you have a list of URLs and don’t need to search first, use the Contents API directly. It accepts URLs without a query and returns the same markdown or html content.

Next steps

Contents API

Fetch page content directly from URLs, no search query needed

Get live news

Retrieve and filter real-time news results

API reference

View all parameters and response schemas

Search overview

Back to Search API overview
