The Research API returns grounded, natural language answers to questions of varying complexity.
It runs multiple searches, processes the results, cross-references sources, and synthesizes everything into a thorough, Markdown-formatted answer with inline citations.
When you need a typed response, you can also get structured JSON by defining an output_schema.
Ask a hard question, get a researched answer with sources.
The Search API and the Research API serve different purposes by delivering different outputs:
Use the Search API when you want raw results to feed into your own pipeline. Use the Research API when you want a ready-to-use answer backed by sources.
Research operates as an agentic system that autonomously plans and executes a multi-step research strategy for your question.
Research uses You.com’s Search, Contents, and Live News APIs as its core tools. Rather than firing generic web queries, the system selects the right tool for each sub-question: search for discovery, contents for deep page reads, and live news for time-sensitive information. It also draws on several other internal tools to help generate the best possible answer. This targeted tool selection reduces wasted calls and gives the reasoning model cleaner inputs at each step.
The system also evaluates retrieved sources for freshness, diversity, and relevance before incorporating them into the answer.
Deep research generates far more information than any single LLM context window can hold. Research uses context-masking and compaction strategies that let it operate well beyond those limits — maintaining coherent reasoning across hundreds or thousands of turns without losing track of what it found, what it verified, and what remains unresolved.
At higher effort levels, a single query can run more than 1,000 reasoning turns and process up to 10 million tokens.
The system receives a compute budget determined by the research_effort tier you choose. It plans its approach around that budget, allocating more effort to verifying ambiguous or high-stakes claims and moving quickly through well-sourced facts. This is the mechanism that enables the range of latency, accuracy, and cost tradeoffs across tiers.
Every Research API response includes:
content: A Markdown-formatted answer by default, or a JSON object when you provide output_schema. Inline citations such as [[1, 2]] reference items in the sources array.
content_type: The format of the content field. text is returned for default Markdown responses. object is returned for structured output.
sources: The web pages the API read and cited in the answer, each with a URL, title, and relevant snippets.

The research_effort parameter controls how much compute the API allocates to your question. Higher effort means more searches, deeper source reading, and more cross-referencing, at the cost of longer response times.
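A minimal sketch of a request body in Python, using only the two top-level fields this page names (input and research_effort). The helper names build_request and post_json are hypothetical. The endpoint URL and authentication headers are left as parameters because this section does not state them, and the "standard" default is this sketch's choice rather than a documented API default.

```python
import json
import urllib.request

# Effort tiers named on this page.
EFFORT_TIERS = ("lite", "standard", "deep", "exhaustive")


def build_request(question: str, research_effort: str = "standard") -> dict:
    """Build a Research API request body (hypothetical helper)."""
    if research_effort not in EFFORT_TIERS:
        raise ValueError(f"research_effort must be one of {EFFORT_TIERS}")
    return {"input": question, "research_effort": research_effort}


def post_json(endpoint: str, payload: dict, headers: dict) -> dict:
    """POST a JSON body and decode the JSON reply.

    `endpoint` and the auth entries in `headers` are assumptions: take
    both from the official API reference before calling this.
    """
    request = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", **headers},
        method="POST",
    )
    with urllib.request.urlopen(request) as reply:
        return json.load(reply)


payload = build_request(
    "Which global cities improved air quality the most over the past 10 years?",
    research_effort="deep",
)
print(json.dumps(payload, indent=2))
```

Validating the tier locally catches typos before any compute is spent on the request.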
For the same query, the difference between tiers is substantial. Here’s an abridged comparison for the question “Which global cities improved air quality the most over the past 10 years, and what measurable actions contributed?”:
The exhaustive response identifies additional cities (Seoul, with specific UNEP data), includes more granular measurements (µg/m³ ranges, percentage reductions over specific date ranges), and cross-references more sources to verify claims.
Every claim in the response links back to a specific source via inline citations. Your users (or your system) can verify any statement by following the numbered references to the sources array.
The content field is formatted in Markdown with headers, lists, and inline citations — ready to render in a UI or feed into downstream processing.
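As a rough illustration of following the numbered references, the sketch below pulls [[n]] markers out of the content string and looks them up in the sources array. It assumes citation numbers are 1-based positions in sources and that each source exposes url and title keys. Neither detail is confirmed on this page, and the sample data is made up.

```python
import re

# Matches inline markers such as [[1]] or [[1, 2]].
CITATION = re.compile(r"\[\[(\d+(?:\s*,\s*\d+)*)\]\]")


def cited_sources(content: str, sources: list[dict]) -> list[dict]:
    """Return the sources cited in `content`, in first-mention order.

    Assumes citation numbers are 1-based positions in `sources`.
    Numbers that fall outside the array are skipped.
    """
    seen: list[int] = []
    for match in CITATION.finditer(content):
        for number in (int(n) for n in match.group(1).split(",")):
            if number not in seen:
                seen.append(number)
    return [sources[n - 1] for n in seen if 1 <= n <= len(sources)]


# Made-up data for illustration only.
content = "Emissions fell sharply [[2]]. Two reports agree on the trend [[1, 2]]."
sources = [
    {"url": "https://example.com/a", "title": "Report A"},
    {"url": "https://example.com/b", "title": "Report B"},
]
for source in cited_sources(content, sources):
    print(source["title"], source["url"])
```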
source_control lets you constrain which web sources the research agent searches and visits. Use it when you want results from trusted domains only, need to block specific sites, want recent content, or need results focused on a specific country.
source_control is a top-level request field alongside input and research_effort.
include_domains and exclude_domains cannot be used together in the same request. boost_domains can be combined with exclude_domains, but not with include_domains.
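The combination rules above can be checked on the client before a request is sent. This is a sketch only: check_source_control is a hypothetical helper, and representing each filter as a list of domain strings is an assumption.

```python
def check_source_control(source_control: dict) -> list[str]:
    """Return the combination rules a source_control object breaks."""
    has_include = bool(source_control.get("include_domains"))
    has_exclude = bool(source_control.get("exclude_domains"))
    has_boost = bool(source_control.get("boost_domains"))

    problems = []
    if has_include and has_exclude:
        problems.append("include_domains cannot be combined with exclude_domains")
    if has_include and has_boost:
        problems.append("boost_domains cannot be combined with include_domains")
    return problems


# boost + exclude is allowed; include + boost is not.
allowed = {"boost_domains": ["who.int"], "exclude_domains": ["example.com"]}
rejected = {"include_domains": ["who.int"], "boost_domains": ["nih.gov"]}
print(check_source_control(allowed))
print(check_source_control(rejected))
```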
You can also combine filters:
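One possible combined request, written as a Python dictionary. It uses only the two domain filters this page says may appear together (boost_domains and exclude_domains). The recency and country filters are omitted because their field names are not given in this section, and the list-of-strings shape and the sample question are assumptions.

```python
import json

request_body = {
    "input": "What changed in EU battery recycling rules this year?",
    "research_effort": "standard",
    # source_control sits at the top level, next to input and research_effort.
    "source_control": {
        # Domain lists as plain strings are an assumption of this sketch.
        "boost_domains": ["europa.eu"],
        "exclude_domains": ["example.com"],
    },
}
print(json.dumps(request_body, indent=2))
```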
Use output_schema when you want output.content returned as a JSON object instead of free-form text. This is useful for returning predictable fields, extracting entities, or feeding Research API output into another typed system.
output_schema is supported with the standard, deep, and exhaustive research effort levels. It is not supported with lite: sending output_schema with research_effort: "lite" returns an HTTP 422 error.
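A small client-side guard can surface this restriction before the request leaves your system. The function name check_schema_supported is hypothetical, and the check mirrors only the rule stated above.

```python
from typing import Optional


def check_schema_supported(research_effort: str, output_schema: Optional[dict]) -> None:
    """Raise locally for a combination the API would reject with 422."""
    if output_schema is not None and research_effort == "lite":
        raise ValueError(
            "output_schema is not supported with research_effort 'lite'; "
            "use standard, deep, or exhaustive"
        )


check_schema_supported("standard", {"type": "object"})  # passes silently
try:
    check_schema_supported("lite", {"type": "object"})
except ValueError as error:
    print(error)
```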
When output_schema is provided, the structured result is returned in output.content and output.content_type is object. Sources remain in output.sources. The API does not add citation fields into your schema object automatically.
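A sketch of reading a decoded response, assuming the JSON body nests results under an output key as the output.content and output.sources paths suggest. The helper read_answer and the sample response are illustrative, not part of the API.

```python
def read_answer(response: dict):
    """Return (content, sources) after checking content against content_type."""
    output = response["output"]
    content = output["content"]
    content_type = output["content_type"]
    if content_type == "object" and not isinstance(content, dict):
        raise TypeError("content_type is 'object' but content is not a JSON object")
    if content_type == "text" and not isinstance(content, str):
        raise TypeError("content_type is 'text' but content is not a string")
    # Sources stay in their own array for both content types.
    return content, output.get("sources", [])


# Made-up response for illustration only.
sample = {
    "output": {
        "content": {"summary": "An example structured answer."},
        "content_type": "object",
        "sources": [{"url": "https://example.com/report", "title": "Example report"}],
    }
}
content, sources = read_answer(sample)
print(content["summary"], len(sources))
```

Because the API does not add citation fields to your schema object, any link between a structured field and its sources has to come from the sources array returned alongside it.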
output_schema follows a narrow JSON Schema subset designed for reliable structured generation.
Required rules:
anyOf.
properties.
additionalProperties: false.
required.
{"type": "null"} is not supported outside anyOf. Use a nullable union such as ["string", "null"] instead.

Supported patterns include nested objects, arrays, enums, nested anyOf, and non-recursive $defs and $ref.
Unsupported keywords:
allOf, contains, not, dependentRequired, dependentSchemas, format, if / then / else, maxContains / minContains, maxItems / minItems, maxLength / minLength, maxProperties / minProperties, maximum / minimum, multipleOf, pattern, patternProperties, propertyNames, unevaluatedItems / unevaluatedProperties, uniqueItems

Selected limits:
If the schema is invalid, the request fails validation before model execution. The schema string budget counts property names, $defs names, enum values, and const values. It applies to schema shape only. Request-level limits such as total task spec size are enforced separately at the request layer.
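Since an invalid schema fails before any research runs, a local pre-check can save a round trip. The sketch below scans a schema for the unsupported keywords listed above. It is not the API's validator: it does not enforce the required rules or the size limits, and find_unsupported is a hypothetical helper.

```python
UNSUPPORTED = {
    "allOf", "contains", "not", "dependentRequired", "dependentSchemas",
    "format", "if", "then", "else", "maxContains", "minContains",
    "maxItems", "minItems", "maxLength", "minLength", "maxProperties",
    "minProperties", "maximum", "minimum", "multipleOf", "pattern",
    "patternProperties", "propertyNames", "unevaluatedItems",
    "unevaluatedProperties", "uniqueItems",
}
# Keys whose values map user-chosen names to subschemas.
NAME_MAPS = {"properties", "$defs"}
# Keys whose values are plain data rather than subschemas.
DATA_KEYS = {"enum", "const", "required"}


def find_unsupported(schema, path="$"):
    """Return the paths of unsupported keywords found in a schema."""
    found = []
    if isinstance(schema, dict):
        for key, value in schema.items():
            here = f"{path}.{key}"
            if key in UNSUPPORTED:
                found.append(here)
            elif key in NAME_MAPS and isinstance(value, dict):
                # Names here are user-chosen, so only their subschemas are scanned.
                for name, sub in value.items():
                    found.extend(find_unsupported(sub, f"{here}.{name}"))
            elif key not in DATA_KEYS:
                found.extend(find_unsupported(value, here))
    elif isinstance(schema, list):
        for index, item in enumerate(schema):
            found.extend(find_unsupported(item, f"{path}[{index}]"))
    return found


schema = {
    "type": "object",
    "properties": {
        "format": {"type": "string"},  # a property *named* format is fine
        "year": {"type": "integer", "minimum": 1900},
        "tags": {"type": "array", "items": {"type": "string"}, "maxItems": 5},
    },
    "required": ["format", "year", "tags"],
    "additionalProperties": False,
}
print(find_unsupported(schema))
```

Treating the keys under properties and $defs as names rather than keywords avoids false positives when a field happens to be called format or pattern.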
source_control and output_schema can be combined in a single request. For example, you can restrict research to specific domains while requesting a structured response:
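One way such a combined request might look, written as a Python dictionary. The question, the domain, and the schema fields are made up for illustration, the list-of-strings shape for include_domains is an assumption, and the schema sets required and additionalProperties conservatively rather than as a statement of the exact rules.

```python
import json

request_body = {
    "input": "Summarize current WHO guidance on PM2.5 exposure limits.",
    "research_effort": "standard",  # output_schema is not supported with lite
    "source_control": {
        "include_domains": ["who.int"],  # list-of-strings shape is assumed
    },
    "output_schema": {
        "type": "object",
        "properties": {
            "summary": {"type": "string"},
            # Nullable union, in the style this page recommends.
            "limit_ug_m3": {"type": ["number", "null"]},
        },
        "required": ["summary", "limit_ug_m3"],
        "additionalProperties": False,
    },
}
print(json.dumps(request_body, indent=2))
```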
When a question can’t be answered from a single source — comparative analyses, multi-factor evaluations, questions that span multiple domains — the Research API handles the synthesis for you.
“Compare the pricing models of the top 3 vector databases and their tradeoffs for a 10M-document collection”
Quickly gather verified, cited information about companies, markets, or technologies. The citation-backed output gives you traceability that raw LLM generation can’t.
Build internal research tools where employees can ask complex questions and get sourced answers — product comparisons, regulatory summaries, technical deep dives — without manually reading dozens of pages.
Use the Research API as the first step in a content pipeline: ask a research question, get a cited draft, then use it as source material for blog posts, reports, or briefings.
Don’t use exhaustive for simple factual questions — lite or standard will be faster and cheaper. Save deep and exhaustive for questions where thoroughness and accuracy justify the longer response time.
The inline citations make verification straightforward. For legal, financial, or medical contexts, build a step that follows citation URLs to confirm claims before surfacing them to end users.
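A verification step could start from a review queue that pairs each cited line of the answer with the URLs behind it. The sketch below builds that queue and stops short of fetching or judging the pages. It assumes 1-based citation numbers and a url key on each source, and the sample data is made up.

```python
import re

CITATION = re.compile(r"\[\[(\d+(?:\s*,\s*\d+)*)\]\]")


def build_review_queue(content: str, sources: list[dict]) -> list[dict]:
    """Pair each cited line of the answer with the URLs that back it."""
    queue = []
    for line in content.splitlines():
        numbers = []
        for match in CITATION.finditer(line):
            numbers.extend(int(n) for n in match.group(1).split(","))
        if not numbers:
            continue  # nothing to verify on this line
        urls, unresolved = [], []
        for number in dict.fromkeys(numbers):  # de-duplicate, keep order
            if 1 <= number <= len(sources):
                urls.append(sources[number - 1]["url"])
            else:
                unresolved.append(number)
        queue.append({"claim": line.strip(), "urls": urls, "unresolved": unresolved})
    return queue


# Made-up data for illustration only.
answer = "First claim. [[1]]\nA line without citations.\nSecond claim. [[2, 3]]"
answer_sources = [{"url": "https://example.com/a"}, {"url": "https://example.com/b"}]
for item in build_review_queue(answer, answer_sources):
    print(item)
```

Entries with a non-empty unresolved list point at citation numbers that have no matching source, which is worth flagging before anything reaches end users.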
The input field supports up to 40,000 characters. For complex research tasks, include context, constraints, or specific angles you want covered. A well-scoped question produces a more focused answer.
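A sketch of assembling a scoped input string and checking it against the 40,000-character limit. The compose_input helper and its "Context:" and "Constraints:" labels are this example's own convention, and it assumes the API counts characters the way Python's len does.

```python
MAX_INPUT_CHARS = 40_000  # limit stated on this page


def compose_input(question: str, context: str = "", constraints: tuple = ()) -> str:
    """Join a question with optional context and constraints, then check length."""
    parts = [question.strip()]
    if context.strip():
        parts.append("Context: " + context.strip())
    if constraints:
        parts.append("Constraints:\n" + "\n".join(f"- {c}" for c in constraints))
    text = "\n\n".join(parts)
    if len(text) > MAX_INPUT_CHARS:
        raise ValueError(
            f"input is {len(text)} characters; the limit is {MAX_INPUT_CHARS}"
        )
    return text


scoped = compose_input(
    "Compare the pricing models of the top 3 vector databases.",
    context="We store a 10M-document collection.",
    constraints=("Focus on tradeoffs", "Cite vendor pricing pages"),
)
print(scoped)
```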
Research API pricing is tiered by effort level. All new accounts receive $100 in free credits to get started.
Higher effort tiers allocate more compute for deeper reasoning, more source verification, and higher accuracy. See the research effort levels table above for pricing and latency by tier.
For volume discounts, annual pricing, or enterprise features, visit you.com/pricing or contact [email protected].