
Commit 2f3b4b4 (2 parents: bdef511 + 64be0e2)

Merge pull request #33 from ScrapeGraphAI/docs/add-wait-ms-to-crawl

docs: add wait_ms parameter to SmartCrawler

2 files changed: 7 additions & 3 deletions

File tree

api-reference/endpoint/smartcrawler/start.mdx

Lines changed: 5 additions & 3 deletions
````diff
@@ -36,8 +36,9 @@ Content-Type: `application/json`
     "same_domain": "boolean"
   },
   "sitemap": "boolean",
-  "stealth": "boolean"
-  "webhook_url": str
+  "stealth": "boolean",
+  "webhook_url": "string",
+  "wait_ms": "integer"
 }
 ```
 
````

````diff
@@ -58,7 +59,8 @@ Content-Type: `application/json`
 | rules | object | No | - | Crawl rules for filtering URLs. Object with optional fields: `exclude` (array of regex URL patterns), `include_paths` (array of path patterns to include, supports wildcards `*` and `**`), `exclude_paths` (array of path patterns to exclude, takes precedence over `include_paths`), `same_domain` (boolean, default: true). See Rules section below for details. |
 | sitemap | boolean | No | false | Use sitemap.xml for discovery |
 | stealth | boolean | No | false | Enable stealth mode to bypass bot protection using advanced anti-detection techniques. Adds +4 credits to the request cost |
-| webhook_url | str | No | None | Webhook URL to send the job result to. When provided, a signed webhook notification will be sent upon job completion. See [Webhook Signature Verification](#webhook-signature-verification) below.
+| webhook_url | string | No | None | Webhook URL to send the job result to. When provided, a signed webhook notification will be sent upon job completion. See [Webhook Signature Verification](#webhook-signature-verification) below. |
+| wait_ms | integer | No | 3000 | Milliseconds to wait before scraping each page. Useful for pages with heavy JavaScript rendering that need extra time to load. |
 
 ### Example
 ```json
````

services/smartcrawler.mdx

Lines changed: 2 additions & 0 deletions
````diff
@@ -107,6 +107,7 @@ curl -X 'POST' \
 | rules | object | No | Crawl rules object with optional fields: `exclude` (array of regex URL patterns), `include_paths` (array of path patterns to include, supports wildcards `*` and `**`), `exclude_paths` (array of path patterns to exclude, takes precedence over `include_paths`), `same_domain` (boolean, default: true). See below for details. |
 | sitemap | bool | No | Use sitemap.xml for discovery (default: false). |
 | webhook_url | string | No | URL to receive webhook notification on job completion. |
+| wait_ms | int | No | Milliseconds to wait before scraping each page. Useful for pages with heavy JavaScript rendering that need extra time to load (default: 3000). |
 
 
 <Note>
@@ -463,6 +464,7 @@ POST https://api.scrapegraphai.com/v1/crawl
 | max_pages | int | No | Max pages to crawl |
 | rules | object | No | Crawl rules object with optional fields: `exclude` (regex URL patterns), `include_paths` (path patterns to include), `exclude_paths` (path patterns to exclude), `same_domain` (boolean) |
 | sitemap | bool | No | Use sitemap.xml |
+| wait_ms | int | No | Milliseconds to wait before scraping each page. Useful for pages with heavy JavaScript rendering (default: 3000). |
 
 #### Response Format
 ```json
````
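The new `wait_ms` field slots into the request body alongside the existing optional parameters. A minimal sketch of building such a body in Python, assuming `url` and `prompt` are the required fields (they are not shown in these hunks) and that the helper name is ours, not part of the API:

```python
import json
from typing import Optional


def build_crawl_payload(
    url: str,
    prompt: str,
    wait_ms: int = 3000,  # documented default: wait 3000 ms before scraping each page
    webhook_url: Optional[str] = None,
) -> str:
    """Serialize a hypothetical SmartCrawler request body as JSON."""
    body = {
        "url": url,
        "prompt": prompt,
        "sitemap": False,
        "stealth": False,
        "webhook_url": webhook_url,
        "wait_ms": wait_ms,
    }
    # Drop unset optional fields so only caller-provided values are sent
    return json.dumps({k: v for k, v in body.items() if v is not None})
```

Raising `wait_ms` (e.g. to 5000) gives JavaScript-heavy pages extra time to render before each page is scraped, per the parameter description above.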
