AI Extract Node

Use a language model to extract structured fields from unstructured text, outputting a typed JSON object.

Runs a prompt and a JSON Schema through a language model to extract structured data from free-form text. The output is a validated JSON object stored in a variable.

When to use it
  • Extract name, email, and budget from a free-text enquiry message.
  • Parse a PDF (scraped with the HTTP Scrape node) into structured fields like company name, address, and services.

---

Required fields
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| model | enum | Yes | gpt-4o, claude-3-5-sonnet, or gemini-1.5-pro |
| prompt | string | Yes | Instructions for what to extract |
| schema | JSON object | Yes | JSON Schema defining the fields to extract |
| text | string | Yes | The source text, typically {{variables.pdfText}} or {{variables.scrapedText}} |
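
As a rough illustration of the requirements in this table, the sketch below checks a config object before it is saved. It is not part of the product: `check_config` is a hypothetical helper, and the node performs its own validation. The field names and model values are taken from the table above.

```python
# Hypothetical pre-flight check for an AI Extract config (illustration only).
REQUIRED_FIELDS = ("model", "prompt", "schema", "text")
SUPPORTED_MODELS = {"gpt-4o", "claude-3-5-sonnet", "gemini-1.5-pro"}


def check_config(config: dict) -> list[str]:
    """Return a list of problems; an empty list means nothing obvious is missing."""
    problems = [
        f"missing required field: {name}"
        for name in REQUIRED_FIELDS
        if name not in config
    ]
    model = config.get("model")
    if model is not None and model not in SUPPORTED_MODELS:
        problems.append(f"unsupported model: {model}")
    if "schema" in config and not isinstance(config["schema"], dict):
        problems.append("schema must be a JSON object")
    return problems


print(check_config({"model": "gpt-4o", "prompt": "Extract the fields."}))
```

Running a check like this locally can catch a missing `text` or a mistyped model name before the flow runs.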

---

Optional fields
| Field | Type | Required | Description |
| --- | --- | --- | --- |
| outputVar | string | No | Variable name for the result (default: first key of schema or extractResult) |

---

Variables available
| Variable | Description |
| --- | --- |
| {{variables.extractResult}} | Structured JSON object matching your schema (or custom name) |

---

Step-by-step setup
  1. Scrape or retrieve source text earlier in the flow (e.g. HTTP Scrape → scrapedText).
  2. Add an AI Extract node.
  3. Write the extraction prompt, e.g. "Extract the company name, registration number, and address from the following text."
  4. Define your schema:

```json
{
  "type": "object",
  "properties": {
    "companyName": { "type": "string" },
    "regNumber": { "type": "string" },
    "address": { "type": "string" }
  }
}
```

  5. Set text to {{variables.scrapedText}}.
  6. Downstream nodes can reference {{variables.extractResult.companyName}} etc.
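
To make the dotted reference in the last step concrete, here is a minimal sketch of how a placeholder such as {{variables.extractResult.companyName}} could be resolved against the stored variables. The `resolve` function and the sample values are assumptions for illustration; the platform's actual template engine may behave differently (for example, on missing keys).

```python
import re

# Matches {{variables.some.dotted.path}}, allowing optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*variables\.([A-Za-z0-9_.]+)\s*\}\}")


def resolve(template: str, variables: dict) -> str:
    """Replace each placeholder with the value found by walking the dotted path."""

    def lookup(match: re.Match) -> str:
        value = variables
        for key in match.group(1).split("."):
            value = value[key]  # raises KeyError if the path does not exist
        return str(value)

    return PLACEHOLDER.sub(lookup, template)


# Made-up sample data shaped like the schema from the setup steps.
variables = {
    "extractResult": {
        "companyName": "Acme Ltd",
        "regNumber": "01234567",
        "address": "1 High Street",
    }
}

print(resolve("Company: {{variables.extractResult.companyName}}", variables))
```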

---

Example config

```json
{
  "model": "gpt-4o",
  "prompt": "Extract the fields defined in the schema from the text below.",
  "schema": {
    "type": "object",
    "properties": {
      "budget": { "type": "number" },
      "timeline": { "type": "string" },
      "service": { "type": "string" }
    }
  },
  "text": "{{variables.enquiryText}}",
  "outputVar": "enquiryData"
}
```
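
For intuition about what a schema-conforming result looks like with this config, the sketch below compares a sample `enquiryData` object against the declared property types. `type_errors` is a simplified, hypothetical checker covering only the `type` keyword, not a full JSON Schema validator, and the sample values are invented.

```python
# Map JSON Schema type names to Python types (only the basic scalar/container types).
JSON_TYPES = {
    "string": str,
    "number": (int, float),
    "integer": int,
    "boolean": bool,
    "object": dict,
    "array": list,
}


def type_errors(result: dict, schema: dict) -> list[str]:
    """List properties whose value does not match the schema's declared type."""
    errors = []
    for key, spec in schema.get("properties", {}).items():
        value = result.get(key)
        if value is None:
            continue  # missing or null values are not type errors in this sketch
        type_name = spec.get("type")
        expected = JSON_TYPES.get(type_name)
        if expected is None:
            continue  # unknown or absent type: nothing to check
        # bool is a subclass of int in Python, so reject it explicitly for numbers.
        bool_as_number = isinstance(value, bool) and type_name in ("number", "integer")
        if bool_as_number or not isinstance(value, expected):
            errors.append(f"{key}: expected {type_name}, got {type(value).__name__}")
    return errors


schema = {
    "type": "object",
    "properties": {
        "budget": {"type": "number"},
        "timeline": {"type": "string"},
        "service": {"type": "string"},
    },
}

print(type_errors({"budget": 5000, "timeline": "3 months", "service": "Web design"}, schema))
print(type_errors({"budget": "5000", "timeline": "3 months", "service": "Web design"}, schema))
```

Declaring `budget` as a number rather than a string is what lets later steps compare it numerically.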

---

Tips & gotchas
  • Larger schemas → more tokens used → higher cost per run. Keep schemas focused.
  • If a field can't be found in the source text, the model will typically return null for that key. If downstream steps depend on that field, check for null with an If/Else node before using it.
  • This node is particularly powerful after the HTTP Scrape node for parsing scraped page content.
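
The null check mentioned above can be sketched as follows. This is only an illustration of the condition an If/Else node would test; `null_fields` is a hypothetical helper and the sample result is invented.

```python
def null_fields(result: dict, schema: dict) -> list[str]:
    """Return schema properties that are absent or null in the extracted result."""
    return [key for key in schema.get("properties", {}) if result.get(key) is None]


schema = {
    "type": "object",
    "properties": {
        "budget": {"type": "number"},
        "timeline": {"type": "string"},
        "service": {"type": "string"},
    },
}

# The enquiry text did not mention a budget, so the model returned null for it.
result = {"budget": None, "timeline": "ASAP", "service": "SEO audit"}

missing = null_fields(result, schema)
if missing:
    print("Fields needing a fallback:", missing)
```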