AI URL Extraction lets you generate a complete, pre-filled schema form from any live URL. Paste the address of a product page, article, business listing, or event, and SchemaGen’s AI pipeline fetches the page, analyzes its content, and maps everything it finds to the correct Schema.org type and properties. You review the result, make any adjustments, and publish—no manual data entry required.
AI URL Extraction is available on Pro ($29/mo) and Agency ($99/mo) plans. Free plan users can use the manual Schema Builder instead.

How to extract a schema from a URL

1. Open Generate from URL

In the dashboard, click Generate from URL in the left sidebar or from the New Schema menu. The extraction interface opens with a URL input field.

2. Paste your URL

Enter the full URL of the page you want to extract schema from, including the https:// prefix. For example: https://example.com/products/trail-running-shoes.

3. Start extraction

Click Extract. SchemaGen sends the URL through the AI pipeline: the page is crawled and rendered, its content is analyzed by Claude AI, and the relevant data is mapped to Schema.org fields. This typically takes 5–15 seconds.

4. Review the pre-filled form

When extraction completes, the Schema Builder opens with every detected field pre-filled. The AI selects the most appropriate schema type for the page and populates properties like name, description, price, images, author, and dates from the page content.

5. Adjust and correct

Review every field before saving. AI extraction is accurate but not infallible: check that names, prices, dates, and URLs are correct. Add any missing required or recommended fields that the AI did not detect.

6. Save as draft or publish

When the form looks right, click Save as Draft to review it further, or Publish to deploy it to the SDN immediately.
Always review the extracted schema before publishing. The AI reads visible page content, so it works best on pages with clear, structured text. Verify that required fields (marked with a red asterisk) are filled and that factual details like prices and dates match your page exactly.
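
Because the URL field expects a full address with its scheme prefix, it can help to sanity-check URLs before pasting them in, especially when working through a long list. The following is a minimal sketch using Python's standard library. The helper name `is_extractable_url` is hypothetical and not part of SchemaGen, and accepting plain http:// alongside https:// is an assumption.

```python
from urllib.parse import urlparse


def is_extractable_url(url: str) -> bool:
    """Return True if the URL is absolute, with an http(s) scheme and a host.

    Hypothetical helper for illustration; SchemaGen's own validation rules
    may differ. Accepting plain http is an assumption.
    """
    parsed = urlparse(url.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


print(is_extractable_url("https://example.com/products/trail-running-shoes"))  # True
print(is_extractable_url("example.com/products/trail-running-shoes"))          # False
```

The second call fails the check because the address lacks a scheme, so the parser finds no host.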

Pages that work best

AI extraction produces the highest-quality results on pages with rich, clearly structured content.

Product pages

E-commerce product pages with names, prices, descriptions, images, SKUs, and availability status extract with high accuracy.

Articles and blog posts

News articles and blog posts with bylines, publish dates, headlines, and body copy map cleanly to Article and NewsArticle types.

Local business listings

Pages with business name, address, phone number, hours, and service area work well for LocalBusiness and its subtypes.

Events

Event pages with names, dates, locations, ticket links, and organizer information populate Event schemas reliably.
Pages with very little visible text, heavy JavaScript rendering requirements, or paywalled content may extract with fewer populated fields. For those pages, use the manual Schema Builder and fill in the fields directly.
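
To make the mapping concrete, here is a sketch of the kind of Schema.org Product markup a well-structured product page could yield, built as a Python dictionary and serialized to JSON-LD. All values are made up for illustration, and the `missing_fields` helper and the list of fields it checks are assumptions rather than SchemaGen's actual required-field rules.

```python
import json

# Illustrative values only; real output depends on what the page contains.
product = {
    "@context": "https://schema.org",
    "@type": "Product",
    "name": "Trail Running Shoes",
    "description": "Lightweight trail running shoes with reinforced toe cap and Vibram outsole.",
    "sku": "TRS-001",
    "image": ["https://example.com/images/trail-running-shoes.jpg"],
    "offers": {
        "@type": "Offer",
        "price": "89.99",
        "priceCurrency": "USD",
        "availability": "https://schema.org/InStock",
        "url": "https://example.com/products/trail-running-shoes",
    },
}


def missing_fields(schema: dict, expected: list[str]) -> list[str]:
    """List expected top-level fields that are absent or empty (hypothetical check)."""
    return [field for field in expected if not schema.get(field)]


print(json.dumps(product, indent=2))
print(missing_fields(product, ["name", "description", "image", "offers"]))  # []
```

A page with little visible text would leave more of these fields empty, which is when filling them in manually becomes the faster route.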

The AI copywriting agent

During extraction, SchemaGen also runs its AI copywriting agent on the detected content. The agent rewrites descriptions and names to be more concise, better formatted, and optimized for how search engines read schema values. For example, if a product page has a description like “Buy our trail running shoes online today—free shipping on orders over $50!”, the agent trims it to a factual description that works better as structured data: “Lightweight trail running shoes with reinforced toe cap and Vibram outsole.” You can see the AI-enhanced version in the pre-filled form. If you prefer the original page text, paste it back in manually before saving.

Error states

If extraction fails, SchemaGen returns one of these errors:
The page did not respond within the allowed window. This happens on slow servers, pages with long redirect chains, or pages that block automated crawlers with JavaScript challenges. Wait a moment and try again. If it consistently times out, use the manual Schema Builder instead.
The AI pipeline is under heavy load and cannot accept new requests right now. This is temporary. Wait a few minutes and try again.
The page actively blocks crawlers (for example, via robots directives or bot detection). SchemaGen cannot access this page automatically. Use the manual Schema Builder and fill in the fields from the page yourself.
The AI analysis step completed but returned no usable data. This is rare and typically resolves on retry. If it persists, the page may have too little extractable text content for the AI to work with.
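
The guidance above amounts to a simple policy: retry on timeouts, overload, and empty results, but stop immediately when the page blocks crawlers. The sketch below expresses that policy in Python. It assumes a caller-supplied `extract` function that returns a `(result, error)` pair. The error labels, the function names, and the exponential backoff delays are all hypothetical, since this page does not document a programmatic API.

```python
import time
from enum import Enum
from typing import Callable, Optional, Tuple


class ExtractionError(Enum):
    # Hypothetical labels for the four failure modes described above.
    TIMEOUT = "timeout"
    OVERLOADED = "overloaded"
    BLOCKED = "blocked"
    EMPTY_RESULT = "empty_result"


# Failures worth retrying; a blocked page will not unblock on retry.
RETRYABLE = {
    ExtractionError.TIMEOUT,
    ExtractionError.OVERLOADED,
    ExtractionError.EMPTY_RESULT,
}

ExtractFn = Callable[[str], Tuple[Optional[dict], Optional[ExtractionError]]]


def extract_with_retry(
    extract: ExtractFn,
    url: str,
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[Optional[dict], Optional[ExtractionError]]:
    """Call extract(url), retrying retryable failures with exponential backoff."""
    last_error: Optional[ExtractionError] = None
    for attempt in range(max_attempts):
        result, error = extract(url)
        if error is None:
            return result, None
        last_error = error
        if error not in RETRYABLE:
            break  # e.g. BLOCKED: fall back to the manual Schema Builder
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None, last_error


# Demo with a stub that always reports a blocked page (no real network calls).
def always_blocked(url: str):
    return None, ExtractionError.BLOCKED


_, demo_error = extract_with_retry(always_blocked, "https://example.com/page", sleep=lambda s: None)
print(demo_error.name)  # BLOCKED
```

Passing `sleep` as a parameter keeps the function easy to test without real delays.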

Usage quota

Every AI URL Extraction counts toward your monthly AI generation quota. Pro plans have a monthly limit on AI generations, and Agency plans have a higher limit. If you reach your quota, the extraction feature is unavailable until the quota resets at the start of your next billing cycle. Manual schema generation via the Schema Builder does not count toward this quota.
You can check your current usage at any time from Account → Usage in the dashboard.
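
The quota rules can be modeled in a few lines, which may be useful when planning a batch of extractions. This is a sketch only: the `AiQuota` class is hypothetical, and the limit of 2 in the example is a placeholder, since the actual Pro and Agency limits are not stated here.

```python
from dataclasses import dataclass


@dataclass
class AiQuota:
    """Toy model of the monthly AI generation quota (hypothetical, not SchemaGen's API)."""

    monthly_limit: int  # placeholder value; actual plan limits are not documented here
    used: int = 0

    @property
    def remaining(self) -> int:
        return max(self.monthly_limit - self.used, 0)

    def record(self, kind: str) -> bool:
        """Record one schema generation; return False if it cannot proceed."""
        if kind == "manual":
            return True  # manual Schema Builder work does not count toward the quota
        if self.remaining == 0:
            return False  # AI extraction unavailable until the quota resets
        self.used += 1
        return True

    def reset(self) -> None:
        """Called at the start of the next billing cycle."""
        self.used = 0


quota = AiQuota(monthly_limit=2)
print(quota.record("ai"), quota.record("manual"), quota.record("ai"), quota.record("ai"))
# True True True False
```

The final call returns False because both AI generations have been used, while the manual generation in between left the count unchanged.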