AI Extract Node

Use a language model to extract structured fields from unstructured text, outputting a typed JSON object.

Runs a prompt + JSON schema through a language model to extract structured data from free-form text. The output is a validated JSON object stored in a variable.

When to use it

- Extract name, email, and budget from a free-text enquiry message.
- Parse a PDF (scraped with the HTTP Scrape node) into structured fields like company name, address, and services.

---

Required fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | enum | Yes | `gpt-4o`, `claude-3-5-sonnet`, or `gemini-1.5-pro` |
| prompt | string | Yes | Instructions for what to extract |
| schema | JSON object | Yes | JSON Schema defining the fields to extract |
| text | string | Yes | The source text, typically `{{variables.pdfText}}` or `{{variables.scrapedText}}` |

---

Optional fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| outputVar | string | No | Variable name for the result (default: first key of schema or `extractResult`) |

---

Variables available

| Variable | Description |
| --- | --- |
| `{{variables.extractResult}}` | Structured JSON object matching your schema (stored under your custom name instead if `outputVar` is set) |

---

Step-by-step setup

1. Scrape or retrieve the source text earlier in the flow (e.g. HTTP Scrape → `scrapedText`).
2. Add an AI Extract node.
3. Write the extraction prompt, e.g. "Extract the company name, registration number, and address from the following text."
4. Define your schema:

   ```json
   {
     "type": "object",
     "properties": {
       "companyName": { "type": "string" },
       "regNumber": { "type": "string" },
       "address": { "type": "string" }
     }
   }
   ```

5. Set `text` to `{{variables.scrapedText}}`.
6. Reference the extracted fields in downstream nodes, e.g. `{{variables.extractResult.companyName}}`.
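
To make the variable references above concrete, here is a minimal Python sketch of how a `{{variables.…}}` placeholder could resolve against the stored flow variables. The `resolve` function, the `variables` dictionary, and the sample values are hypothetical illustrations, not the platform's actual implementation.

```python
import re

# Hypothetical flow state: the AI Extract node stored its result under "extractResult".
variables = {
    "extractResult": {
        "companyName": "Acme Ltd",
        "regNumber": "01234567",
        "address": None,  # field the model could not find in the source text
    }
}

# Matches {{variables.some.dotted.path}} and captures the dotted path.
PLACEHOLDER = re.compile(r"\{\{\s*variables\.([A-Za-z0-9_.]+)\s*\}\}")


def resolve(template: str, variables: dict) -> str:
    """Replace each {{variables.a.b}} placeholder with the value at that dotted path."""

    def lookup(match: re.Match) -> str:
        value = variables
        for key in match.group(1).split("."):
            # Missing keys and null parents both end up as None in this sketch.
            value = value.get(key) if isinstance(value, dict) else None
        return "" if value is None else str(value)

    return PLACEHOLDER.sub(lookup, template)


print(resolve("Company: {{variables.extractResult.companyName}}", variables))
```

In this sketch a null or missing field resolves to an empty string. That is an assumption made for illustration; how the real node renders missing values may differ.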

---

Example config

```json
{
  "model": "gpt-4o",
  "prompt": "Extract the fields defined in the schema from the text below.",
  "schema": {
    "type": "object",
    "properties": {
      "budget": { "type": "number" },
      "timeline": { "type": "string" },
      "service": { "type": "string" }
    }
  },
  "text": "{{variables.enquiryText}}",
  "outputVar": "enquiryData"
}
```
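
The node's output is described as a validated JSON object. As an illustration of what such a check involves, the Python sketch below tests a result against the `properties` of the schema in the example config. It handles only the `string` and `number` types and tolerates null values. The `check_types` helper and the sample result are hypothetical, and this is not a full JSON Schema validator.

```python
# The schema from the example config above.
schema = {
    "type": "object",
    "properties": {
        "budget": {"type": "number"},
        "timeline": {"type": "string"},
        "service": {"type": "string"},
    },
}

# Maps the JSON Schema type names used above to Python types.
PY_TYPES = {"string": str, "number": (int, float)}


def check_types(result: dict, schema: dict) -> list[str]:
    """Return a list of readable problems; an empty list means the result passes."""
    problems = []
    for name, spec in schema["properties"].items():
        value = result.get(name)
        if value is None:
            continue  # missing or null fields are tolerated in this sketch
        expected = PY_TYPES[spec["type"]]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            problems.append(
                f"{name}: expected {spec['type']}, got {type(value).__name__}"
            )
    return problems


# Hypothetical result where the model returned the budget as a string.
sample = {"budget": "5000", "timeline": "3 months", "service": None}
print(check_types(sample, schema))
```

A check like this could flag cases where a model returns a number as text, such as `"5000"` instead of `5000`, before the value reaches a downstream calculation.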

---

Tips & gotchas

- Larger schemas use more tokens, which raises the cost per run. Keep schemas focused on the fields you need.
- If a field can't be found in the source text, the model will typically return `null` for that key. Add an If/Else node to check for `null` before any downstream step that depends on the field.
- This node works particularly well directly after the HTTP Scrape node, for parsing scraped page content.
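
As a sketch of the null check mentioned above, the following Python snippet lists which fields came back empty, which is the condition an If/Else node would branch on. The `missing_fields` helper and the sample result are hypothetical.

```python
def missing_fields(result: dict, required: list[str]) -> list[str]:
    """Return the names in `required` that are absent or null in the extracted result."""
    return [name for name in required if result.get(name) is None]


# Hypothetical extraction result where the model could not find a budget.
enquiry_data = {"budget": None, "timeline": "3 months", "service": "web design"}

missing = missing_fields(enquiry_data, ["budget", "timeline"])
if missing:
    print("Follow up with the sender about:", ", ".join(missing))
else:
    print("All required fields were extracted.")
```

The check compares against `None` rather than using truthiness, so a legitimate value such as a budget of `0` is not mistaken for a missing field.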