Age | Commit message | Author | Files | Lines |
|
We may not be able to reproduce every failure, so logging as much as
possible is important to figure out what went wrong. Also, in case a bug
is uncovered in the future, we can check the logs and possibly fix it
with -errata.
|
|
Needs a testcase.
|
|
Previously Item was just a simple wrapper around Chrome’s Network.*
events. This turned out to be quite nasty when testing, so its
replacement, RequestResponsePair, does some level of abstraction. This
makes testing a lot easier, since we can now simply instantiate it
without building a proper DevTools event.
Should come without any functional changes.
|
|
Replaces str.format, which is less readable due to its separation of
format and arguments.
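
A small illustration of the readability argument, assuming the replacement is Python's f-strings (the message above does not name it). The function names and the message text are made up for this sketch.

```python
def describe_old(url, status):
    # str.format: placeholders on the left, arguments on the right,
    # so the reader has to match them up by position.
    return 'fetched {} with status {}'.format(url, status)


def describe_new(url, status):
    # f-string: each value appears exactly where it is rendered.
    return f'fetched {url} with status {status}'
```

Both functions return the same string; only the second keeps format and arguments together.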
|
|
Use the yarl library (already pulled in by aiohttp). No URL being
processed should be a plain string.
|
|
Fixes None dereference.
|
|
|
|
Fix a few random issues pointed out by pylint, mainly unused imports.
|
|
canClearBrowserCookies apparently has been removed from protocol 1.3.
|
|
Move it to .devtools. Seems more fitting.
|
|
|
|
Fixes test failures. Very fragile code, unfortunately.
|
|
Just truncate the WARC record like we do with responses. Also add a few
tests, but they’re not covering the call to getRequestPostData. Not sure
what we have to do here.
|
|
Commit 7730e0d64ec895091a0dd7eb0e3c6ce2ed02d981 removed logging to WARC
files. Add it again, but with a different implementation. Credits to
structlog for inspiration.
|
|
Judging from the docs this is the proper way to store these resources.
Enable both for the IRC bot by default, since they won’t interfere with
IA’s wayback machine.
|
|
|
|
It just seems a little nicer than plain old unittest.
|
|
This is mainly a quality of life change.
|
|
Previously a browser crash stalled the entire grab, since events from
pychrome were handled asynchronously in a different thread and
exceptions were not propagated to the main thread.
Now all browser events are stored in a queue and processed by the main
thread, allowing us to handle browser crashes gracefully (more or less).
This made the following additional changes necessary:
- Clear separation between producer (browser) and consumer (WARC, stats,
…)
- Behavior scripts now yield events as well, instead of accessing the
WARC writer
- WARC logging was removed (for now) and WARC writer does not require
serialization any more
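
The producer/consumer split described above can be sketched roughly as follows. This is not the project's code: `BrowserCrashed`, `browser_producer`, `consume` and the tuple-based message format are hypothetical names chosen for illustration, and a plain list of strings stands in for real browser events.

```python
import queue
import threading


class BrowserCrashed(Exception):
    """Hypothetical marker for a browser failure."""


def browser_producer(events, out):
    """Runs in a worker thread and only ever talks to the queue."""
    try:
        for ev in events:
            if ev == 'crash':
                raise BrowserCrashed('browser went away')
            out.put(('event', ev))
    except Exception as e:
        # Hand the exception to the main thread instead of losing it.
        out.put(('error', e))
    finally:
        # Always signal the end, so the consumer never blocks forever.
        out.put(('done', None))


def consume(events):
    """Main-thread loop; returns (handled events, error or None)."""
    q = queue.Queue()
    t = threading.Thread(target=browser_producer, args=(events, q),
                         daemon=True)
    t.start()
    handled, error = [], None
    while True:
        kind, payload = q.get()
        if kind == 'event':
            handled.append(payload)  # e.g. write to WARC, update stats
        elif kind == 'error':
            error = payload
        else:  # 'done'
            break
    t.join()
    return handled, error
```

Because the exception travels through the same queue as regular events, the main thread sees it in order and can shut down cleanly instead of stalling.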
|
|
It was replaced by --remote-debugging-pipe in version 67. pychrome does
not support that out of the box, so instead we’ll let Chrome choose its
own port and poll a file in its user-data-dir.
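
A minimal sketch of the polling step, under two assumptions not stated in the message above: that Chrome is started with `--remote-debugging-port=0`, and that it then writes a file named `DevToolsActivePort` into the user data directory with the chosen port on its first line. The function name and its parameters are hypothetical.

```python
import os
import time


def wait_for_devtools_port(user_data_dir, timeout=10.0, interval=0.1):
    """Poll the profile directory until Chrome reports its port."""
    # Assumed file name, see the note above.
    path = os.path.join(user_data_dir, 'DevToolsActivePort')
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with open(path) as fd:
                first = fd.readline().strip()
            # The file may exist but still be empty while Chrome writes it.
            if first.isdigit():
                return int(first)
        except FileNotFoundError:
            pass
        time.sleep(interval)
    raise TimeoutError(f'no DevTools port found in {user_data_dir}')
```

The returned port can then be handed to the DevTools client in place of a fixed one.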
|
|
Broken by commit 75019eac4545bb2e8b90033834e91beef614cdf3
|
|
Use an actual class that supports multiple invocations.
|
|
|
|
If there is any and it was not included in the response already.
|
|
|
|
Broken by commit a21d7332e33a3e47a363004196451721d449e70b
|
|
When something goes wrong, these block the entire grab.
|
|
alert, confirm, prompt and beforeunload
|
|
To be expanded, but it’s a start…
|
|
|
|
We passed it to the child and don’t need it any more.
|
|
|
|
|
|
|
|
Fixes bcfbdd9b45b7e872ee77e1366197443d855d8c7c
|
|
|
|
|
|
|
|
Using celery. Also adds a plugin for the IRC bot sopel. Code still needs
some love, but it should work.
|
|
Unless --browser argument is given. Uses sane settings and a temporary
profile directory.
|
|
Fixes 6f628ca24ac2b243dd4a611ff1ecff2d35aaa019
|
|
|
|
Reusable browser communication and WARC writing.
|