Class

Crawler

Crawler()

Constructor

Methods

after(callback) → {module:crawler/crawler~Crawler}

Set a callback that is called after a worker has finished processing all of its tasks.

Parameters:
Name Type Description
callback

View Source crawler/crawler.js, line 73

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler

cleanup(func) → {module:crawler/crawler~Crawler}

The function passed as an argument is called if the browser crashes, before setup is called again.

Parameters:
Name Type Description
func

View Source crawler/crawler.js, line 97

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler

concurrency(nb) → {module:crawler/crawler~Crawler}

Set the number of workers for the crawler. This can be used to launch several browsers.

Parameters:
Name Type Description
nb number

View Source crawler/crawler.js, line 62

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler
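As a rough illustration of what a number of workers can mean, the sketch below runs several workers that pull tasks from one shared queue. The runPool helper is hypothetical and is not taken from crawler/crawler.js; how the crawler actually distributes tasks among its workers is not described on this page.

```javascript
// Hypothetical pool: `nb` workers pull tasks from one shared queue.
// This is an illustration only, not the crawler's implementation.
async function runPool(tasks, nb, worker) {
  const queue = [...tasks]; // copy so the caller's array is left untouched
  const results = [];

  async function runWorker(id) {
    while (queue.length > 0) {
      const task = queue.shift(); // synchronous, so no two workers get the same task
      const value = await worker(task);
      results.push({ worker: id, task, value });
    }
  }

  // Start nb workers and wait until all of them have drained the queue.
  await Promise.all(Array.from({ length: nb }, (_, id) => runWorker(id)));
  return results;
}
```

With nb set to 1 the tasks run strictly one after another; larger values let slow tasks overlap.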

async crawl() → {Promise}

Start crawling. This is an async operation, so don't forget to await it!

View Source crawler/crawler.js, line 154

Promise
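The setters on this page return the crawler, so a typical call site is a chain that ends with an awaited crawl(). The sketch below shows that shape. This page does not show how the module is imported, what the constructor takes, or what crawl() resolves with, so the snippet uses FakeCrawler, a purely hypothetical stand-in that only mimics the chainable setters.

```javascript
// FakeCrawler is a hypothetical stand-in that mimics the chainable API
// described on this page. It is NOT the library's implementation.
class FakeCrawler {
  constructor() {
    this.config = { tasks: [], concurrency: 1, worker: async () => {} };
  }

  // Each setter stores its argument and returns `this` so calls can be chained.
  tasks(tasks) { this.config.tasks = tasks; return this; }
  concurrency(nb) { this.config.concurrency = nb; return this; }
  worker(func) { this.config.worker = func; return this; }

  // Assumption: tasks are processed one by one; the real crawl() may differ.
  async crawl() {
    for (const task of this.config.tasks) {
      await this.config.worker(task);
    }
  }
}

// Builds a crawler with a chain of setters, awaits crawl(), and returns
// the list of tasks the worker function received.
async function runDemo() {
  const visited = [];
  const crawler = new FakeCrawler()
    .tasks(["https://example.com/a", "https://example.com/b"])
    .concurrency(2)
    .worker(async (url) => { visited.push(url); });

  await crawler.crawl(); // crawl() returns a Promise, so await it
  return visited;
}
```

Without the await, the caller would carry on before any task had been processed.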

setup(func) → {module:crawler/crawler~Crawler}

The function passed as an argument is used to set up the context (i.e. launch the browser, start the logger, etc.). Its first parameter is a boolean: true if setup is being called to restart the browser after a crash, and false if it is the first call.

Parameters:
Name Type Description
func

View Source crawler/crawler.js, line 86

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler
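To make the boolean argument concrete, the sketch below shows one way a setup function and a cleanup function might be written, and the order in which they would run around a crash. The context object, the simulateCrashCycle driver, and the log strings are hypothetical; they stand in for the crawler and a real browser. Whether the crawler awaits these functions is also an assumption.

```javascript
// Hypothetical shared context; a real setup would launch a browser here.
const context = { browser: null, log: [] };

// Setup function: receives true when called to restart after a crash.
async function setup(isRestart) {
  context.log.push(isRestart ? "setup (restart)" : "setup (first run)");
  context.browser = { alive: true }; // placeholder for a browser handle
}

// Cleanup function: called after a crash, before setup runs again.
async function cleanup() {
  context.log.push("cleanup");
  context.browser = null;
}

// Hypothetical driver imitating the documented order:
// first setup, then (on a crash) cleanup followed by setup(true).
async function simulateCrashCycle() {
  await setup(false);
  context.browser.alive = false; // pretend the browser crashed
  await cleanup();
  await setup(true);
  return context.log;
}
```

In real use these two functions would be handed to crawler.setup(setup) and crawler.cleanup(cleanup) rather than called directly.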

skipAfter(nb) → {module:crawler/crawler~Crawler}

Set how many times a worker should try to run a task before skipping it if the task keeps failing (i.e. the browser keeps crashing).

Parameters:
Name Type Description
nb number

View Source crawler/crawler.js, line 145

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler
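The retry-then-skip behaviour can be pictured with the sketch below. The runWithSkip and makeFlakyWorker helpers are hypothetical and only illustrate the idea; the crawler's own retry logic is not shown on this page. A thrown error stands in for a browser crash, and nb is assumed to count total attempts.

```javascript
// Hypothetical helper: try a task up to maxAttempts times, then skip it.
async function runWithSkip(task, worker, maxAttempts) {
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      const result = await worker(task);
      return { status: "done", attempts: attempt, result };
    } catch (err) {
      // A failure here stands in for a browser crash; loop and try again.
    }
  }
  return { status: "skipped", attempts: maxAttempts };
}

// Hypothetical worker factory: the returned worker fails `failures` times,
// then succeeds on every later call.
function makeFlakyWorker(failures) {
  let calls = 0;
  return async (task) => {
    calls += 1;
    if (calls <= failures) throw new Error("simulated crash");
    return `processed ${task}`;
  };
}
```

A task that fails twice still completes with a limit of 3, while a task that always fails is skipped after the third attempt.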

tasks(tasks) → {module:crawler/crawler~Crawler}

Set the tasks for the crawler. A task is just data passed as an argument to the worker function. The tasks can be anything, such as an array of URLs.

Parameters:
Name Type Description
tasks Array.<Task>

View Source crawler/crawler.js, line 51

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler

timeout(duration)

Set the maximum time a task may take. If a task takes longer than this, the browser is considered unresponsive and is restarted.

Parameters:
Name Type Description
duration number Number of seconds allotted for a task

View Source crawler/crawler.js, line 118
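A per-task time limit of this kind is commonly built by racing the task against a timer. The sketch below shows that general technique with a hypothetical withTimeout helper; it is not taken from crawler/crawler.js, and it only rejects on expiry rather than restarting a browser.

```javascript
// Hypothetical helper: reject if `promise` does not settle within `seconds`.
function withTimeout(promise, seconds) {
  let timer;
  const expiry = new Promise((_, reject) => {
    timer = setTimeout(
      () => reject(new Error(`task exceeded ${seconds}s`)),
      seconds * 1000 // the duration is in seconds, timers use milliseconds
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([promise, expiry]).finally(() => clearTimeout(timer));
}

// Resolves with "ok" after `ms` milliseconds; used to simulate a task.
const sleep = (ms) =>
  new Promise((resolve) => setTimeout(() => resolve("ok"), ms));
```

For instance, withTimeout(sleep(10), 1) resolves normally, whereas withTimeout(sleep(200), 0.05) rejects because the simulated task outlasts its allowance.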

verbose(bool) → {module:crawler/crawler~Crawler}

By default, the crawler displays a progress bar. To hide it, call crawler.verbose(false).

Parameters:
Name Type Description
bool boolean

View Source crawler/crawler.js, line 129

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler

worker(func) → {module:crawler/crawler~Crawler}

Give the crawler the function that a worker executes to process a task.

Parameters:
Name Type Description
func

View Source crawler/crawler.js, line 107

Returns the crawler so that method calls can be chained.

module:crawler/crawler~Crawler