Age | Commit message | Author | Files | Lines |
|
This is mainly a quality-of-life change.
|
|
Previously, a browser crash stalled the entire grab, since events from
pychrome were handled asynchronously in a different thread and
exceptions were not propagated to the main thread.
Now all browser events are stored in a queue and processed by the main
thread, allowing us to handle browser crashes gracefully (more or less).
This made the following additional changes necessary:
- Clear separation between producer (browser) and consumer (WARC, stats,
…)
- Behavior scripts now yield events as well, instead of accessing the
WARC writer
- WARC logging was removed (for now) and WARC writer does not require
serialization any more
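
The producer/consumer split described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: `BrowserCrashed`, `producer`, `consume` and `run` are hypothetical names, and the browser is simulated by a thread that emits a few events and then fails. The point is only that the exception travels through the queue and is re-raised in the main thread.

```python
import queue
import threading


class BrowserCrashed(Exception):
    """Hypothetical error standing in for a browser crash."""


def producer(events, fail_after):
    """Simulated browser thread: emit some events, then crash."""
    try:
        for i in range(fail_after):
            events.put(("event", i))
        raise BrowserCrashed("browser went away")
    except Exception as exc:
        # Forward the exception instead of letting it die in this thread.
        events.put(("error", exc))


def consume(events):
    """Main-thread loop: yield event payloads, re-raise forwarded errors."""
    while True:
        kind, payload = events.get()
        if kind == "error":
            raise payload
        yield payload


def run(fail_after=3):
    events = queue.Queue()
    thread = threading.Thread(target=producer, args=(events, fail_after))
    thread.start()
    handled = []
    try:
        for item in consume(events):
            handled.append(item)
    except BrowserCrashed as exc:
        # The crash surfaces here, in the main thread, so it can be handled.
        thread.join()
        return handled, str(exc)
```

Calling `run()` returns the events handled before the simulated crash together with the error message, rather than hanging on a queue that will never be filled again.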
|
|
In preparation for recursive crawls.
|
|
|
|
If there is any and it was not included in the response already.
|
|
Broken by commit a21d7332e33a3e47a363004196451721d449e70b
|
|
|
|
|
|
No functional changes, just cleanup. Replaces onload and onsnapshot
events. Move screen metric emulation, DOM snapshots and screenshots here
as well.
|
|
|
|
+refactoring.
|
|
This is an undocumented DevTools feature.
|
|
Logger and SiteWriter both access .write_record() concurrently, which
can corrupt WARC files. Move the writer to its own thread and decouple
it with a queue. Since we’re probably I/O-bound, this may speed up
writeback as well.
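
A rough sketch of that idea, assuming a single thread that owns the output stream and a queue in front of it. `QueuedWriter` and `demo` are hypothetical names used for illustration only; `write_record()` mirrors the method name mentioned above, but its signature here is an assumption.

```python
import io
import queue
import threading


class QueuedWriter:
    """Hypothetical writer: only the internal thread touches the sink."""

    def __init__(self, sink):
        self._sink = sink
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._drain)
        self._thread.start()

    def write_record(self, data):
        # Safe to call from any thread; it only enqueues the record.
        self._queue.put(data)

    def _drain(self):
        while True:
            data = self._queue.get()
            if data is None:  # sentinel: stop after pending records
                break
            self._sink.write(data)

    def close(self):
        self._queue.put(None)
        self._thread.join()


def demo():
    sink = io.BytesIO()
    writer = QueuedWriter(sink)

    def worker(tag):
        for i in range(100):
            writer.write_record(f"{tag}{i}\n".encode())

    # Two concurrent callers, standing in for Logger and SiteWriter.
    threads = [threading.Thread(target=worker, args=(tag,))
               for tag in ("logger-", "site-")]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    writer.close()
    return sink.getvalue().splitlines()
```

Because records are written one at a time by a single thread, two callers can no longer interleave partial records, and callers do not block on disk I/O.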
|
|
We can’t do that safely due to a race condition.
|
|
|
|
Reusable browser communication and WARC writing.
|