Constructor
new Crawler()
Members
private _after
private _cleanup
private _concurrency
private _pageTimeout
private _progressbar
private _setup
private _skipAfter
private _tasks
private _timeout
private _verbose
private _worker
Methods
after(callback) → {module:crawler/crawler~Crawler}
Set a callback to be called after a worker has finished processing all of its tasks.
Parameters:
Name | Type | Description |
---|---|---|
callback | function | called after a worker has finished processing all of its tasks |
Returns: the crawler, so that method calls can be chained.
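The arguments the crawler passes to this callback are not documented here. The sketch below is a minimal illustration that assumes a worker drains a shared queue of tasks. `runWorker` is a hypothetical helper, and it passes `(id, processed)` to the callback purely for illustration. It calls its `after` callback exactly once, when no tasks remain:

```javascript
// Hypothetical sketch, not the crawler's implementation.
// A worker pulls tasks from a shared queue and calls `after` once it runs out.
async function runWorker(id, queue, processTask, after) {
  let processed = 0;
  while (queue.length > 0) {
    const task = queue.shift(); // take the next pending task
    await processTask(task);
    processed += 1;
  }
  // Called once, after this worker has finished every task it took.
  // The (id, processed) arguments are illustrative only.
  await after(id, processed);
}
```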
cleanup(func) → {module:crawler/crawler~Crawler}
The function given as a parameter is called if the browser crashes, before setup is called again.
Parameters:
Name | Type | Description |
---|---|---|
func | function | called after a browser crash, before setup is called again |
Returns: the crawler, so that method calls can be chained.
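The order of calls around a crash can be sketched with a hypothetical driver. It is written only to illustrate the sequence described for cleanup and setup: `setup(false)` runs once, and each crash is followed by `cleanup()` and then `setup(true)`. It is not the library's code. The crash is simulated by a thrown error, and the sketch does not retry the failed task:

```javascript
// Hypothetical sketch of the crash-recovery sequence; not the crawler's code.
// `setup`, `cleanup` and `runTask` are user-supplied functions.
async function runWithRecovery(tasks, { setup, cleanup, runTask }) {
  await setup(false); // first launch
  for (const task of tasks) {
    try {
      await runTask(task);
    } catch (err) {
      // Treat any error as a browser crash: clean up, then set up again.
      // (This sketch moves on to the next task instead of retrying.)
      await cleanup();
      await setup(true); // true: restarting after a crash
    }
  }
}
```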
concurrency(nb) → {module:crawler/crawler~Crawler}
Set the number of workers for the crawler. This can be used to run several browsers concurrently.
Parameters:
Name | Type | Description |
---|---|---|
nb | number | number of workers |
Returns: the crawler, so that method calls can be chained.
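A common way to run tasks with a fixed number of workers is a pool that shares one queue. The sketch below shows that technique under the assumption that each worker simply takes the next pending task. `runPool` is a hypothetical helper, not the crawler's implementation. Results are collected in completion order, not task order:

```javascript
// Hypothetical sketch of a fixed-size worker pool; not the crawler's code.
// `nb` workers share one queue, so at most `nb` tasks run at the same time.
async function runPool(tasks, nb, processTask) {
  const queue = [...tasks];
  const results = []; // filled in completion order
  async function worker(workerId) {
    while (queue.length > 0) {
      const task = queue.shift();
      results.push(await processTask(task, workerId));
    }
  }
  // Start `nb` workers and wait until all of them have drained the queue.
  const workers = [];
  for (let i = 0; i < nb; i++) workers.push(worker(i));
  await Promise.all(workers);
  return results;
}
```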
async crawl() → {Promise}
Start crawling. This is an asynchronous operation that returns a Promise, so don't forget to await it.
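Because crawl() returns a Promise, code placed after an un-awaited call runs before the crawl has finished. In the sketch below, the hypothetical `fakeCrawl` stands in for `crawler.crawl()` and only simulates work with a timer. It shows that nothing has completed until the Promise is awaited:

```javascript
// Hypothetical stand-in for crawl(): resolves after all tasks are "processed".
async function fakeCrawl(tasks, done) {
  for (const task of tasks) {
    await new Promise((resolve) => setTimeout(resolve, 5)); // simulated work
    done.push(task);
  }
}

async function main() {
  const done = [];
  const pending = fakeCrawl(["a", "b"], done); // started, but not awaited yet
  const beforeAwait = done.length; // still 0: no task has finished
  await pending; // wait for the crawl to complete
  return { beforeAwait, afterAwait: done.length };
}
```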
setup(func) → {module:crawler/crawler~Crawler}
The function passed as an argument is used to set up the context (i.e. launch the browser, start the logger, etc.). Its first parameter is a boolean: true if setup is called to restart the browser after a crash, and false if it is the first call.
Parameters:
Name | Type | Description |
---|---|---|
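func | function | sets up the context; receives true when restarting after a crash |
Returns: the crawler, so that method calls can be chained.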
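A setup function can use its boolean argument to behave differently on a restart, for example by logging it. The sketch below is hypothetical. It returns a plain context object in place of a real browser and logger, and the field names are illustrative only. It returns the context so that it can be inspected; whether the crawler uses setup's return value is not documented here:

```javascript
// Hypothetical setup function; the returned context shape is illustrative only.
function createSetup() {
  let launches = 0;
  return async function setup(isRestart) {
    launches += 1;
    const context = {
      browser: { id: launches }, // stand-in for a launched browser
      logger: console,
    };
    if (isRestart) {
      // Only reached when restarting after a crash.
      context.logger.warn(`restarting browser (launch #${launches})`);
    }
    return context;
  };
}
```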
skipAfter(nb) → {module:crawler/crawler~Crawler}
Indicate how many times a worker should try to run a task before skipping it, if the task keeps failing (e.g. the browser keeps crashing).
Parameters:
Name | Type | Description |
---|---|---|
nb | number | number of attempts before a failing task is skipped |
Returns: the crawler, so that method calls can be chained.
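The retry-then-skip behaviour can be sketched as a bounded retry loop. In the sketch below, `runWithSkip` is a hypothetical helper and not the crawler's implementation. It assumes that `nb` counts total attempts, following the description above, and that any thrown error counts as a failed attempt:

```javascript
// Hypothetical sketch of retry-then-skip; not the crawler's implementation.
// Tries `runTask(task)` up to `maxAttempts` times, then gives up on the task.
async function runWithSkip(task, runTask, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await runTask(task);
      return { task, status: "done", attempts: attempt, result };
    } catch (err) {
      // Failed attempt (e.g. the browser crashed); retry if attempts remain.
    }
  }
  return { task, status: "skipped", attempts: maxAttempts };
}
```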
tasks(tasks) → {module:crawler/crawler~Crawler}
Set the tasks for the crawler. A task is just data passed as an argument to the worker function, so it can be anything; for example, the tasks can be an array of URLs.
Parameters:
Name | Type | Description |
---|---|---|
tasks | Array.<Task> | the tasks to process |
Returns: the crawler, so that method calls can be chained.
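Since a task is only data handed to the worker function, a task can be a bare URL or a richer object. The sketch below uses objects with a URL and a label and hands each one to a worker function. `processTask` and the sequential `runAll` driver are hypothetical. The crawler itself may spread tasks across several workers:

```javascript
// Tasks are plain data; here each one is an object with a URL and a label.
const tasks = [
  { url: "https://example.com/", label: "home" },
  { url: "https://example.com/about", label: "about" },
];

// Hypothetical worker: receives one task and returns something derived from it.
async function processTask(task) {
  return `${task.label}: ${new URL(task.url).pathname}`;
}

// Hypothetical sequential driver: hands each task to the worker in order.
async function runAll(taskList, worker) {
  const results = [];
  for (const task of taskList) results.push(await worker(task));
  return results;
}
```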
timeout(duration)
Set the maximum time a task may take. If a task takes longer than this, the browser is considered unresponsive and is restarted.
Parameters:
Name | Type | Description |
---|---|---|
duration | number | number of seconds allotted to a task |
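One common way to enforce such a limit is to race the task against a timer. The sketch below assumes `duration` is in seconds, as documented. `withTimeout` is a hypothetical helper, and the crawler may enforce its timeout differently. When the timer wins, the returned Promise rejects, and a caller could treat that rejection as an unresponsive browser:

```javascript
// Hypothetical helper: rejects if `promise` does not settle within `seconds`.
function withTimeout(promise, seconds) {
  let timer;
  const timeout = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`task exceeded ${seconds}s`)),
      seconds * 1000, // the documented unit is seconds; setTimeout expects ms
    );
  });
  // Whichever settles first wins; the timer is always cleared afterwards.
  return Promise.race([promise, timeout]).finally(() => clearTimeout(timer));
}
```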
verbose(bool) → {module:crawler/crawler~Crawler}
By default, the crawler displays a progress bar. To hide it, call crawler.verbose(false).
Parameters:
Name | Type | Description |
---|---|---|
bool | boolean | false to hide the progress bar |
Returns: the crawler, so that method calls can be chained.
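The appearance of the crawler's progress bar is not documented. The sketch below roughly illustrates what a text progress bar involves. The hypothetical `renderBar` turns a done/total count into a fixed-width line, and `reportProgress` prints it only when a verbose flag is set. Both names are illustrative, and the crawler's own bar may look different:

```javascript
// Hypothetical text progress bar; the crawler's own bar may look different.
function renderBar(done, total, width = 20) {
  const ratio = total === 0 ? 1 : Math.min(done / total, 1);
  const filled = Math.round(ratio * width);
  return `[${"#".repeat(filled)}${"-".repeat(width - filled)}] ${done}/${total}`;
}

// Print progress only when verbose output is enabled (on by default here).
function reportProgress(done, total, verbose = true) {
  if (verbose) process.stdout.write(`\r${renderBar(done, total)}`);
}
```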
worker(func) → {module:crawler/crawler~Crawler}
Give the crawler the function that a worker executes to process a task.
Parameters:
Name | Type | Description |
---|---|---|
func | function | processes a single task |
Returns: the crawler, so that method calls can be chained.
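The sketch below shows how chainable setters that return the crawler fit together. `MiniCrawler` is a deliberately small, hypothetical stand-in and is NOT the real Crawler class. It mimics only tasks, worker, concurrency and crawl, reusing the private member names listed above. It omits setup, cleanup, timeouts, retries and the progress bar. Its crawl() resolves to the collected results, which is an assumption made for the sketch:

```javascript
// Hypothetical stand-in illustrating the chaining pattern; NOT the real Crawler.
class MiniCrawler {
  constructor() {
    this._tasks = [];
    this._worker = null;
    this._concurrency = 1;
  }

  // Each setter stores its value and returns `this`, which enables chaining.
  tasks(tasks) { this._tasks = tasks; return this; }
  worker(func) { this._worker = func; return this; }
  concurrency(nb) { this._concurrency = nb; return this; }

  // Runs every task through the worker function with `nb` parallel loops.
  async crawl() {
    const queue = [...this._tasks];
    const results = []; // completion order, not task order
    const loop = async () => {
      while (queue.length > 0) {
        const task = queue.shift();
        results.push(await this._worker(task));
      }
    };
    await Promise.all(Array.from({ length: this._concurrency }, loop));
    return results;
  }
}
```

With this stand-in, a chained call reads like `await new MiniCrawler().tasks(urls).worker(fn).concurrency(2).crawl()`.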