HTTP Scrape Node

Fetch and extract the readable text content of any public web page.

Fetches a URL, strips HTML, and stores the readable text content in a variable. Combine with the AI Extract node to turn any web page into structured data.
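
To make "strips HTML" concrete, here is a minimal Python sketch of the kind of text extraction described above. It is not the node's actual implementation: the names `_TextExtractor` and `html_to_text` are hypothetical, the fetch step is left out, and the choice to skip `<script>`, `<style>`, and `<noscript>` content is an assumption about what counts as "readable".

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects visible text and ignores script/style/noscript content."""

    SKIPPED = {"script", "style", "noscript"}  # assumed non-readable tags

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that sits outside skipped tags.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html):
    """Return the visible text of an HTML string, tags removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


page = (
    "<html><head><style>p{color:red}</style><script>var a=1;</script></head>"
    "<body><h1>Acme</h1><p>We build &amp; ship rockets.</p></body></html>"
)
print(html_to_text(page))
```

Because the sketch works on the raw HTML string only, anything a page renders with client-side JavaScript never appears in the output.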

When to use it

- Scrape a lead's company website to auto-populate their industry, services, and description.
- Pull content from a news article URL found via Google Search, then summarise it with an AI node.

---

Required fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| url | string | Yes | Full URL to scrape (`{{variables}}` supported) |

---

Optional fields

| Field | Type | Required | Description |
| --- | --- | --- | --- |
| maxChars | number | No | Truncate output to this many characters (default: 8000) |
| includeLinks | boolean | No | Include URLs found in the page (default: false) |
| outputVar | string | No | Variable name for result (default: `scrapedText`) |
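
The defaults in the table can be read as a merge step followed by truncation. The Python sketch below illustrates that reading only. `resolve_config` and `store_result` are hypothetical helpers, not part of the product, and the sketch does not model how `includeLinks` formats its output.

```python
# Defaults copied from the optional-fields table.
DEFAULTS = {"maxChars": 8000, "includeLinks": False, "outputVar": "scrapedText"}


def resolve_config(config):
    """Fill in optional fields; `url` is the only required one."""
    if not config.get("url"):
        raise ValueError("url is required")
    return {**DEFAULTS, **config}


def store_result(text, config, variables):
    """Truncate to maxChars and save under the configured variable name."""
    resolved = resolve_config(config)
    variables[resolved["outputVar"]] = text[: resolved["maxChars"]]
    return variables


# With no optional fields set, the result lands in `scrapedText`, capped at 8000 characters.
variables = store_result("x" * 10000, {"url": "https://example.com"}, {})
print(len(variables["scrapedText"]))
```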

---

Variables available

| Variable | Description |
| --- | --- |
| `{{variables.scrapedText}}` | Extracted text content of the page (stored under the name set in `outputVar` if you changed it) |

---

Step-by-step setup

1. Add an HTTP Scrape node.
2. Set `url`. It is often sourced from a previous Google Search result, for example `{{variables.searchResults[0].url}}`.
3. Set `maxChars` to limit the text before passing it to an AI node (8000 characters is roughly 2000 tokens).
4. Connect an AI Extract node downstream to parse the text into structured fields.
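
The character-to-token figure in the steps above works out to about 4 characters per token. That ratio is a rough rule of thumb for English text, not an exact tokenizer count. The Python sketch below shows the arithmetic in both directions; `estimate_tokens` and `max_chars_for_budget` are hypothetical helper names.

```python
import math


def estimate_tokens(char_count, chars_per_token=4.0):
    """Rough token estimate, rounded up to stay on the safe side."""
    return math.ceil(char_count / chars_per_token)


def max_chars_for_budget(token_budget, chars_per_token=4.0):
    """Largest maxChars value that fits an approximate token budget."""
    return int(token_budget * chars_per_token)


print(estimate_tokens(8000))       # 2000
print(max_chars_for_budget(1500))  # 6000
```

Text in other languages, or text heavy with code and URLs, often uses fewer characters per token, so leave some headroom when choosing `maxChars`.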

---

Example config

```json
{
  "url": "{{variables.companyWebsite}}",
  "maxChars": 6000,
  "includeLinks": false,
  "outputVar": "siteText"
}
```

---

Tips & gotchas

- JavaScript-heavy single-page apps (SPAs) may return little useful content, because the scraper fetches the raw HTML rather than the rendered DOM.
- Always set `maxChars` before piping the text to an AI node to control token costs.
- Some sites block automated requests. If the scrape returns an error or empty text, the site may have bot protection.
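
If you post-process scraped text in your own script, a simple heuristic can flag results that are probably empty, blocked, or an unrendered SPA shell before they reach an AI node. The Python sketch below is one possible check. The function name, the length thresholds, and the phrase list are all illustrative assumptions, not behaviour of the node.

```python
# Illustrative phrases that often appear on block or "enable JavaScript" pages.
SUSPECT_PHRASES = (
    "enable javascript",
    "checking your browser",
    "access denied",
    "captcha",
)


def scrape_looks_unusable(text, min_chars=200):
    """Return True when the scraped text is too short or looks like a block page."""
    cleaned = text.strip().lower()
    if len(cleaned) < min_chars:
        return True
    # A short page that mentions a block notice is probably not real content.
    return len(cleaned) < 1000 and any(p in cleaned for p in SUSPECT_PHRASES)


print(scrape_looks_unusable(""))                             # True
print(scrape_looks_unusable("Acme builds rockets. " * 100))  # False
```

Tune the thresholds to the pages you scrape; a legitimately short landing page would also trip the length check.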
