Age | Commit message | Author | Files | Lines |
|
Using websockets, vue and bulma.
|
|
Gonna rewrite that properly.
|
|
Judging from the docs this is the proper way to store these resources.
Enable both for the IRC bot by default, since they won’t interfere with
IA’s Wayback Machine.
|
|
Previously a browser crash stalled the entire grab, since events from
pychrome were handled asynchronously in a different thread and
exceptions were not propagated to the main thread.
Now all browser events are stored in a queue and processed by the main
thread, allowing us to handle browser crashes gracefully (more or less).
This made the following additional changes necessary:
- Clear separation between producer (browser) and consumer (WARC, stats,
  …)
- Behavior scripts now yield events as well, instead of accessing the
  WARC writer
- WARC logging was removed (for now) and WARC writer does not require
  serialization any more
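
The queue-based design described above can be illustrated with a minimal sketch. This is not the project's actual code: it uses only the Python standard library rather than pychrome, and every name in it (`BrowserCrashed`, `browser`, `grab`, `crash_after`) is hypothetical. The producer thread never touches a consumer directly. It only puts events on a queue and, if it fails, puts the exception there as well, so the main thread sees the crash instead of stalling.

```python
import queue
import threading


class BrowserCrashed(Exception):
    """Hypothetical error signalling that the browser went away."""


_DONE = object()  # sentinel: the producer finished normally


def browser(events, out, crash_after=None):
    """Producer: runs in its own thread and only ever writes to the queue."""
    try:
        for i, event in enumerate(events):
            if crash_after is not None and i == crash_after:
                raise BrowserCrashed("browser died")
            out.put(event)
        out.put(_DONE)
    except Exception as exc:
        # Hand the exception to the main thread instead of losing it here.
        out.put(exc)


def grab(events, consumers, crash_after=None):
    """Consumer loop: runs in the calling (main) thread and owns all consumers.

    Returns True if the simulated browser crashed, False otherwise.
    """
    q = queue.Queue()
    worker = threading.Thread(
        target=browser, args=(events, q, crash_after), daemon=True
    )
    worker.start()
    crashed = False
    while True:
        item = q.get()
        if item is _DONE:
            break
        if isinstance(item, Exception):
            # The crash is now visible in the main thread and can be handled.
            crashed = True
            break
        for consume in consumers:
            consume(item)
    worker.join()
    return crashed
```

In this sketch a consumer is any callable that accepts one event, for example `grab(["a", "b"], [print])`. Because only the main thread calls the consumers, they need no locking or serialization of their own.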
|
|
Move contrib/ scripts to .tools and add entry points to setup.py, rename
crocoite-standalone to crocoite-grab.
|
|
Very useful for distributed, recursive crawls which create one WARC per
page.
|
|
In preparation for recursive crawls.
|
|
This is a workaround for https://github.com/celery/celery/issues/4480
|
|
No functional changes, just cleanup. Replaces onload and onsnapshot
events. Move screen metric emulation, DOM snapshots and screenshots here
as well.
|
|
Using celery. Also adds a plugin for the IRC bot sopel. Code still needs
some love, but it should work.
|