HTTP Scrape Node

Fetch and extract the readable text content of any public web page.

Fetches a URL, strips HTML, and stores the readable text content in a variable. Combine with the AI Extract node to turn any web page into structured data.

When to use it
  • Scrape a lead's company website to auto-populate their industry, services, and description.
  • Pull content from a news article URL found via Google Search, then summarise it with an AI node.

---

Required fields
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Full URL to scrape; {{variables}} supported |

---

Optional fields
| Field | Type | Required | Description |
|---|---|---|---|
| maxChars | number | No | Truncate output to this many characters (default: 8000) |
| includeLinks | boolean | No | Include URLs found in the page (default: false) |
| outputVar | string | No | Variable name for result (default: scrapedText) |
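
To make the effect of these options concrete, here is a minimal Python sketch of how a text-extraction step with the same options could behave. It is an illustration under stated assumptions, not the node's actual implementation: the names `extract_text` and `run_scrape_node` are hypothetical, the set of skipped tags and the way links are appended are guesses, and the sketch works on an HTML string so it runs without network access.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and link targets from raw HTML."""

    # Assumption: content of these tags is never treated as readable text.
    SKIP_TAGS = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []
        self.links = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_text(html, max_chars=8000, include_links=False):
    """Strip tags, optionally append link URLs, then truncate to max_chars."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.chunks)
    if include_links and parser.links:
        # Assumption: links are appended after the text; the real format may differ.
        text += "\nLinks: " + " ".join(parser.links)
    return text[:max_chars]


def run_scrape_node(html, config, variables):
    """Apply the documented defaults and store the result under outputVar."""
    name = config.get("outputVar", "scrapedText")
    variables[name] = extract_text(
        html,
        max_chars=config.get("maxChars", 8000),
        include_links=config.get("includeLinks", False),
    )
    return variables
```

With this sketch, a config that omits `outputVar` writes to `variables["scrapedText"]`, and a config with `"outputVar": "siteText"` writes to `variables["siteText"]` instead, which mirrors the defaults listed in the table.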

---

Variables available
| Variable | Description |
|---|---|
| {{variables.scrapedText}} | Extracted text content of the page. If outputVar is set, the variable uses that name instead. |

---

Step-by-step setup
  1. Add an HTTP Scrape node.
  2. Set url — often sourced from a previous Google Search result: {{variables.searchResults[0].url}}.
  3. Set maxChars to limit the text before passing to an AI node (8000 chars ≈ 2000 tokens).
  4. Connect an AI Extract node downstream to parse the text into structured fields.
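
The sizing hint in step 3 (8000 chars ≈ 2000 tokens) implies roughly 4 characters per token. The sketch below turns that ratio into two small helpers for picking a maxChars value. The ratio is a rough heuristic for English text and varies by tokenizer and language; the function names are hypothetical and not part of the product.

```python
# Implied by the "8000 chars ≈ 2000 tokens" hint; an approximation only.
CHARS_PER_TOKEN = 4


def estimate_tokens(max_chars):
    """Rough token count for a text truncated to max_chars characters."""
    return max_chars // CHARS_PER_TOKEN


def max_chars_for_budget(token_budget):
    """Largest maxChars value that stays within a rough token budget."""
    return token_budget * CHARS_PER_TOKEN


print(estimate_tokens(8000))       # default maxChars
print(max_chars_for_budget(1500))  # maxChars for a ~1500-token budget
```

Under this heuristic, the default of 8000 characters corresponds to about 2000 tokens, and a budget of about 1500 tokens suggests a maxChars of 6000.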

---

Example config

```json
{
  "url": "{{variables.companyWebsite}}",
  "maxChars": 6000,
  "includeLinks": false,
  "outputVar": "siteText"
}
```

---

Tips & gotchas
  • JavaScript-heavy SPAs may return little useful content — the scraper fetches raw HTML, not rendered DOM.
  • Set maxChars explicitly before piping to an AI node to control token costs; if you leave it unset, the default of 8000 characters applies.
  • Some sites block automated requests. If the scrape returns an error or empty text, the site may have bot protection.
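
Because both JavaScript-heavy pages and bot-protected sites can produce empty or very short text, it can help to check the scraped text before sending it to an AI node. The sketch below shows one possible check. The function name `classify_scrape` and the 200-character threshold are illustrative assumptions, not product behaviour; tune the threshold to the pages you scrape.

```python
def classify_scrape(text, min_chars=200):
    """Label scraped text as 'empty', 'thin', or 'ok'.

    min_chars is an arbitrary illustrative threshold, not a documented value.
    """
    stripped = (text or "").strip()
    if not stripped:
        return "empty"  # possible bot protection or a failed fetch
    if len(stripped) < min_chars:
        return "thin"   # possibly an SPA shell such as "Loading..."
    return "ok"
```

A workflow could branch on the result, for example skipping the AI Extract step when the label is not "ok".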