Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2018-08-04 | Reintroduce WARC logging | Lars-Dominik Braun | 1 | -4/+5 | |
Commit 7730e0d64ec895091a0dd7eb0e3c6ce2ed02d981 removed logging to WARC files. Add it again, but with a different implementation. Credit to structlog for inspiration. | |||||
2018-06-21 | Fix a few issues pointed out by pylint | Lars-Dominik Braun | 1 | -6/+6 | |
2018-06-20 | Add __slots__ to classes | Lars-Dominik Braun | 1 | -0/+2 | |
This is mainly a quality of life change | |||||
2018-06-20 | Synchronous SiteLoader event handling | Lars-Dominik Braun | 1 | -7/+31 | |
Previously, a browser crash stalled the entire grab, since events from pychrome were handled asynchronously in a different thread and exceptions were not propagated to the main thread. Now all browser events are stored in a queue and processed by the main thread, allowing us to handle browser crashes gracefully (more or less). This made the following additional changes necessary: (1) clear separation between producer (browser) and consumer (WARC, stats, …); (2) behavior scripts now yield events as well, instead of accessing the WARC writer; (3) WARC logging was removed (for now) and the WARC writer does not require serialization any more. | |||||
2018-05-04 | Share recursive argument parser | Lars-Dominik Braun | 1 | -7/+2 | |
2018-05-04 | Add distributed recursive crawls | Lars-Dominik Braun | 1 | -3/+56 | |
2018-05-04 | Move page archiving logic to SinglePageController | Lars-Dominik Braun | 1 | -0/+71 | |
In preparation for recursive crawls. |
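
The 2018-06-20 "Synchronous SiteLoader event handling" entry describes moving browser events onto a queue that the main thread drains. The following is a minimal sketch of that producer/consumer pattern using only the standard library; it is not the project's actual code. The names `BrowserCrashed`, `browser_callback`, `process_events`, and `run_demo` are hypothetical, and the fake browser thread stands in for pychrome's callback thread.

```python
import queue
import threading


class BrowserCrashed(Exception):
    """Hypothetical error standing in for a browser crash notification."""


def browser_callback(events, item):
    # Producer side: runs in the browser library's thread and only enqueues.
    events.put(item)


def process_events(events, handlers):
    # Consumer side: runs in the main thread, so exceptions surface here
    # instead of being lost in a background thread.
    handled = []
    while True:
        item = events.get()
        if item is None:  # sentinel: the producer has finished
            return handled
        if isinstance(item, Exception):
            raise item
        for handler in handlers:
            handler(item)
        handled.append(item)


def run_demo():
    events = queue.Queue()
    seen = []

    def fake_browser():
        # Illustrative event payload, followed by a simulated crash.
        browser_callback(events, {"method": "Network.requestWillBeSent"})
        browser_callback(events, BrowserCrashed("target crashed"))

    producer = threading.Thread(target=fake_browser)
    producer.start()
    try:
        process_events(events, [seen.append])
        crashed = False
    except BrowserCrashed:
        crashed = True
    producer.join()
    return seen, crashed
```

Because the consumer loop runs in the main thread, a crash reported by the producer becomes an ordinary exception that the caller can catch, and consumers such as a WARC writer or a statistics collector only ever see events from one thread.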