path: root/crocoite/task.py
Age | Commit message | Author | Files | Lines
2018-06-20 | Synchronous SiteLoader event handling | Lars-Dominik Braun | 1 | -7/+31
Previously a browser crash stalled the entire grab, since events from pychrome were handled asynchronously in a different thread and exceptions were not propagated to the main thread. Now all browser events are stored in a queue and processed by the main thread, allowing us to handle browser crashes gracefully (more or less).

This made the following additional changes necessary:
- Clear separation between producer (browser) and consumer (WARC, stats, …)
- Behavior scripts now yield events as well, instead of accessing the WARC writer
- WARC logging was removed (for now) and WARC writer does not require serialization any more
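The queue-based design described in this commit message can be illustrated with a minimal sketch. This is not crocoite's code: `BrowserCrashed`, `produce`, `consume` and `grab` are hypothetical names, and a plain thread stands in for the pychrome callbacks. It only shows the general idea that the producer enqueues events and the main thread handles them, so a crash surfaces as an exception in the main thread.

```python
import queue
import threading


class BrowserCrashed(Exception):
    """Hypothetical error used to signal that the producer died."""


def produce(events, items, crash_after=None):
    """Producer (stand-in for browser callbacks): only enqueues, never handles."""
    for i, item in enumerate(items):
        if crash_after is not None and i == crash_after:
            # Report the crash through the queue instead of raising in this thread.
            events.put(BrowserCrashed("browser went away"))
            return
        events.put(item)
    events.put(None)  # sentinel: no more events


def consume(events, handlers):
    """Consumer: runs in the main thread, so exceptions surface there."""
    while True:
        item = events.get()
        if item is None:
            return
        if isinstance(item, Exception):
            raise item
        for handle in handlers:  # e.g. a WARC writer, statistics, ...
            handle(item)


def grab(items, crash_after=None):
    """Run one producer thread and process its events in the calling thread."""
    events = queue.Queue()
    seen = []
    worker = threading.Thread(
        target=produce, args=(events, items, crash_after), daemon=True
    )
    worker.start()
    try:
        consume(events, [seen.append])
        status = "ok"
    except BrowserCrashed:
        status = "crashed"
    worker.join()
    return status, seen
```

Because the consumers only ever see items taken from the queue, they need no locking of their own, which is consistent with the note that the WARC writer no longer requires serialization.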
2018-05-04 | Share recursive argument parser | Lars-Dominik Braun | 1 | -7/+2
2018-05-04 | Add distributed recursive crawls | Lars-Dominik Braun | 1 | -3/+56
2018-05-04 | Move page archiving logic to SinglePageController | Lars-Dominik Braun | 1 | -0/+71
In preparation for recursive crawls.