HTTP Scrape Node
Fetch and extract the readable text content of any public web page.
Fetches a URL, strips HTML, and stores the readable text content in a variable. Combine with the AI Extract node to turn any web page into structured data.
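To make "strips HTML" concrete, here is a minimal Python sketch of tag stripping using only the standard library. It is an illustration, not the node's actual implementation: `TextExtractor` and `html_to_text` are hypothetical names, and skipping `script`, `style`, and `noscript` content is an assumption about what counts as "readable" text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects visible text and ignores script/style/noscript contents."""

    SKIPPED = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while inside a skipped element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def html_to_text(html: str) -> str:
    """Return the text of an HTML string with all markup removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Because a parser like this only sees the markup it is given, content that a page renders with JavaScript after load never reaches the output.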
When to use it
- Scrape a lead's company website to auto-populate their industry, services, and description.
- Pull content from a news article URL found via Google Search, then summarise it with an AI node.
---
Required fields
| Field | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | Full URL to scrape — {{variables}} supported |
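The `{{variables}}` placeholders in `url` are resolved against the workflow's variables before the request is made. The sketch below shows one plausible way such a lookup could work, including indexed paths like `searchResults[0].url`. The `resolve` and `lookup` helpers are hypothetical and are not part of the product.

```python
import re

# Matches {{variables.<path>}} and captures <path>.
_PLACEHOLDER = re.compile(r"\{\{\s*variables\.([^}]+?)\s*\}\}")
# Matches either a name (e.g. searchResults) or an index (e.g. [0]).
_TOKEN = re.compile(r"([A-Za-z_]\w*)|\[(\d+)\]")


def lookup(variables, path):
    """Walk a dotted/indexed path such as 'searchResults[0].url'."""
    value = variables
    for name, index in _TOKEN.findall(path):
        value = value[name] if name else value[int(index)]
    return value


def resolve(template, variables):
    """Replace each {{variables.<path>}} placeholder with its value."""
    return _PLACEHOLDER.sub(
        lambda match: str(lookup(variables, match.group(1))), template
    )


variables = {
    "companyWebsite": "https://example.com",
    "searchResults": [{"url": "https://example.org/news"}],
}
first_result_url = resolve("{{variables.searchResults[0].url}}", variables)
```

In this sketch a missing variable raises `KeyError` or `IndexError`. How the real node handles an unresolved placeholder is not stated here.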
---
Optional fields
| Field | Type | Required | Description |
|---|---|---|---|
| maxChars | number | No | Truncate output to this many characters (default: 8000) |
| includeLinks | boolean | No | Include URLs found in the page (default: false) |
| outputVar | string | No | Variable name for result (default: scrapedText) |
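The defaults in the table above can be read as a simple merge: any field you omit falls back to its default. The Python sketch below illustrates that behaviour under the assumption that truncation is a plain character cut. `with_defaults` and `store_result` are hypothetical helper names.

```python
# Defaults taken from the optional fields table.
DEFAULTS = {"maxChars": 8000, "includeLinks": False, "outputVar": "scrapedText"}


def with_defaults(config):
    """Return the node config with omitted optional fields filled in."""
    if "url" not in config:
        raise ValueError("url is required")
    return {**DEFAULTS, **config}


def store_result(config, text, variables):
    """Truncate text to maxChars and store it under outputVar."""
    cfg = with_defaults(config)
    variables[cfg["outputVar"]] = text[: cfg["maxChars"]]
    return variables


result = store_result({"url": "https://example.com"}, "x" * 9000, {})
```

With only `url` set, the text lands in `scrapedText` and is cut to 8000 characters.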
---
Variables available
| Variable | Description |
|---|---|
| {{variables.scrapedText}} | Extracted text content of the page (or the custom name set in outputVar) |
---
Step-by-step setup
- Add an HTTP Scrape node.
- Set the url field, often sourced from a previous Google Search result: {{variables.searchResults[0].url}}.
- Set maxChars to limit the text before passing it to an AI node (8000 chars ≈ 2000 tokens).
- Connect an AI Extract node downstream to parse the text into structured fields.
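The "8000 chars ≈ 2000 tokens" figure implies roughly 4 characters per token. Treat that as a rule of thumb, since actual token counts depend on the model and the language of the text. Under that assumption, a small Python sketch for converting between a token budget and `maxChars` could look like this (both function names are hypothetical):

```python
CHARS_PER_TOKEN = 4  # assumption derived from 8000 chars ≈ 2000 tokens


def estimate_tokens(max_chars):
    """Approximate token count for a text of max_chars characters."""
    return max_chars // CHARS_PER_TOKEN


def max_chars_for(token_budget):
    """Approximate maxChars value that fits within a token budget."""
    return token_budget * CHARS_PER_TOKEN


default_tokens = estimate_tokens(8000)
budgeted_chars = max_chars_for(1500)
```

For example, a budget of about 1500 tokens for the scraped text corresponds to a `maxChars` of about 6000.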
---
Example config
```json
{
  "url": "{{variables.companyWebsite}}",
  "maxChars": 6000,
  "includeLinks": false,
  "outputVar": "siteText"
}
```
---
Tips & gotchas
- JavaScript-heavy SPAs may return little useful content — the scraper fetches raw HTML, not rendered DOM.
- Always set maxChars before piping to an AI node to control token costs.
- Some sites block automated requests. If the scrape returns an error or empty text, the site may have bot protection.
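Since JavaScript-heavy or bot-protected pages can yield empty or very short text, it may be worth checking the result before spending tokens on it. The Python sketch below shows one such check. The `diagnose_scrape` name and the 200-character threshold are illustrative assumptions, not product behaviour.

```python
def diagnose_scrape(text, min_chars=200):
    """Classify scraped text as 'empty', 'thin', or 'ok'.

    min_chars is an arbitrary illustrative threshold; tune it for your pages.
    """
    stripped = text.strip()
    if not stripped:
        return "empty"  # possible bot protection or a failed fetch
    if len(stripped) < min_chars:
        return "thin"  # possibly a JavaScript-rendered page
    return "ok"


status = diagnose_scrape("Loading...")
```

A workflow could branch on a result like this and skip the AI node when the status is not `ok`.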