path: root/crocoite
Age | Commit message | Author | Files | Lines
2019-01-10 | browser: Use hypothesis’ domains() | Lars-Dominik Braun | 1 | -5/+2
Fixes test.
2019-01-07 | controller: Test timeouts | Lars-Dominik Braun | 1 | -0/+106
Lots of copy&pasta. Unfortunately the controller uses asyncio.sleep in a few places.
2019-01-07 | Log Chrome’s responses to WARC by default | Lars-Dominik Braun | 5 | -19/+32
We may not be able to reproduce every failure, so logging as much as possible is important to figure out what went wrong. Also, in case a bug is uncovered in the future, we can check the logs and possibly fix it with -errata.
2019-01-05 | browser: Do not overwrite request data when prefetching | Lars-Dominik Braun | 1 | -2/+0
Needs a testcase.
2019-01-05 | html: Handle CDATA | Lars-Dominik Braun | 1 | -1/+5
When loading XML documents Chrome presents a pretty-printed version to the user, which still contains the original XML when exporting via DOM.getDocument. Not sure how to test this.
2019-01-05 | controller: Fix PrefixLimit | Lars-Dominik Braun | 1 | -1/+1
Probably broken by the transition to URL() in commit 5e444dd6511d97308a84ae9c86ebf14547d01f01. And yes, we desperately need some tests for this.
2019-01-04 | behavior: Ignore onstop() failure | Lars-Dominik Braun | 1 | -4/+14
Fails if the page is reloaded/redirected. See issue #13.
2019-01-04 | logger: Do not log debug by default | Lars-Dominik Braun | 1 | -1/+1
Must’ve slipped through.
2019-01-04 | coverage: Ignore a few unreachable statements | Lars-Dominik Braun | 2 | -7/+7
2019-01-04 | behavior: Support clicking area and add testcase | Lars-Dominik Braun | 2 | -7/+76
2019-01-03 | browser: Turn Item into RequestResponsePair | Lars-Dominik Braun | 6 | -485/+627
Previously Item was just a simple wrapper around Chrome’s Network.* events. This turned out to be quite nasty when testing, so its replacement, RequestResponsePair, does some level of abstraction. This makes testing a lot easier, since we can now simply instantiate it without building a proper DevTools event. Should come without any functional changes.
2018-12-31 | extract-screenshot: Remove URL from filename | Lars-Dominik Braun | 1 | -8/+19
URLs can get quite long, overflowing the file name length limit. Instead use sequential filenames and output metadata to stdout.
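The idea can be sketched as follows. This is only an illustration: the function name `sequential_names`, the `screenshot-` prefix, and the JSON-lines output are assumptions made for the example, not what extract-screenshot actually emits.

```python
import json
import sys

def sequential_names(urls, prefix="screenshot", suffix=".png"):
    """Pair each URL with a short, sequential file name (hypothetical helper)."""
    for index, url in enumerate(urls):
        yield {"filename": f"{prefix}-{index:05d}{suffix}", "url": url}

# The first URL is far too long to be embedded in a file name.
urls = ["http://example.com/" + "a" * 300, "http://example.org/"]
records = list(sequential_names(urls))

# The mapping from file name to URL goes to stdout, one JSON object per line.
for record in records:
    json.dump(record, sys.stdout)
    sys.stdout.write("\n")
```

The file name length no longer depends on the URL, and the URL is still recoverable from the metadata stream.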
2018-12-25 | warc: Add tests | Lars-Dominik Braun | 4 | -17/+280
Using hypothesis-based testcase generation. This is quite nice compared to manual test data generation, since it catches a lot more corner cases (if done right). This commit also fixes a few issues, including:
- log records will only be written if the log is nonempty
- properly quote packageUrl paths
- drop old thread checking code
- use placeholder url for scripts without name
2018-12-25 | logger: Fix constructor default arguments | Lars-Dominik Braun | 2 | -3/+12
Default arguments cannot be mutable objects.
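A minimal sketch of the pitfall, using hypothetical function names rather than the logger's actual constructor: Python evaluates a default value once, when the function is defined, so a mutable default is shared by every call that omits the argument.

```python
def append_bad(item, bucket=[]):
    # The same list object is reused across calls that omit `bucket`.
    bucket.append(item)
    return bucket

def append_good(item, bucket=None):
    # Use None as a sentinel and create a fresh list per call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket

first = append_bad(1)
second = append_bad(2)
print(first is second, second)  # True [1, 2]
print(append_good(1), append_good(2))  # [1] [2]
```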
2018-12-24 | Drop deprecated debug parameter | Lars-Dominik Braun | 1 | -1/+1
2018-12-24 | Use f-strings where possible | Lars-Dominik Braun | 11 | -60/+63
Replaces str.format, which is less readable due to its separation of format and arguments.
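For illustration, with made-up variable names and message text (not taken from the crocoite source), the two styles compare like this:

```python
url = "http://example.com/"
status = 200

# str.format: placeholders and their values are written in separate places.
old_style = "fetched {} with status {}".format(url, status)

# f-string: each value appears where it is used.
new_style = f"fetched {url} with status {status}"

print(old_style == new_style)  # True
```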
2018-12-23 | Skip test if invalid domain exists | Lars-Dominik Braun | 1 | -7/+17
Must not exist for this test.
2018-12-22 | Fix recursive mode’s URL parsing | Lars-Dominik Braun | 1 | -1/+2
Broken by commit 5e444dd6511d97308a84ae9c86ebf14547d01f01. URLs read from stdin must be converted from str.
2018-12-22 | Switch -recursive to asyncio’s .cancel() | Lars-Dominik Braun | 2 | -55/+58
RecursiveController used a custom .cancel() method before. Instead we can simply cancel .run() and handle the CancelledError inside run() and fetch().
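The general asyncio pattern looks roughly like the sketch below. It is not RecursiveController's code: `run()` here is a stand-in coroutine, and the `log` list only exists to make the order of events visible.

```python
import asyncio

async def run(log):
    try:
        await asyncio.sleep(3600)  # stands in for long-running work
    except asyncio.CancelledError:
        log.append("cleanup")  # release resources here
        raise  # re-raise so the task is marked as cancelled

async def main():
    log = []
    task = asyncio.create_task(run(log))
    await asyncio.sleep(0)  # yield once so run() starts and reaches its await
    task.cancel()  # standard asyncio cancellation, no custom method needed
    try:
        await task
    except asyncio.CancelledError:
        log.append("cancelled")
    return log

result = asyncio.run(main())
print(result)  # ['cleanup', 'cancelled']
```

Re-raising CancelledError after cleanup keeps the task's cancelled state intact for whoever awaits it.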
2018-12-21 | Remove unused EventHandler property | Lars-Dominik Braun | 1 | -6/+0
Crash detection was moved into -recursive’s return code checking a while ago.
2018-12-21 | util: Skip missing source files | Lars-Dominik Braun | 1 | -1/+1
Requirement extraction fails if the package is an .egg file (i.e. not extracted). Do not try to compute checksum/file length for them.
2018-12-21 | Parse URLs by default | Lars-Dominik Braun | 10 | -89/+68
Use library yarl (already pulled in by aiohttp). No URL processed should be a string.
2018-12-17 | Add simple errata tool | Lars-Dominik Braun | 2 | -1/+98
Fixes #9.
2018-12-13 | behavior: Whitelist gab.com as well | Lars-Dominik Braun | 1 | -4/+6
2018-12-11 | behavior: Add click test URLs for Twitter | Lars-Dominik Braun | 1 | -1/+3
2018-12-08 | behavior: Dump script options to file as well | Lars-Dominik Braun | 1 | -3/+5
click.js’s data was part of the script before 22adde79940d32c5f094f26f3e18b7160e7ccafc. Now it is injected dynamically, but it still would be nice to have the data available.
2018-12-08 | controller: Reraise queue processing errors early | Lars-Dominik Braun | 1 | -1/+7
2018-12-08 | tools: Add version info to merged WARCs | Lars-Dominik Braun | 4 | -17/+54
In preparation for #9. I was hoping to reuse one of schema.org’s microdata schemas, but neither Action (archival action) nor SoftwareApplication (version information) seems to be suitable.
2018-12-06 | behavior: Fix patreon selector | Lars-Dominik Braun | 1 | -3/+2
And that proves their CSS class names are not stable and cannot be used.
2018-12-05 | behavior: Add gamasutra.com click selector | Lars-Dominik Braun | 1 | -0/+7
2018-12-02 | behavior: Add more documentation | Lars-Dominik Braun | 1 | -2/+14
2018-12-02 | behavior: Remove outdated comment | Lars-Dominik Braun | 1 | -3/+0
2018-12-02 | behavior: Re-enable clearDeviceMetricsOverride | Lars-Dominik Braun | 1 | -4/+1
Seems to be working again. Chrome bug?
2018-12-02 | behavior: Improve click testing | Lars-Dominik Braun | 2 | -22/+56
Some pages require scrolling, so we need a SinglePageController. Also mark network-dependent tests with xfail, so they won’t affect the overall test result unless you know what you’re doing (--runxfail).
2018-12-02 | controller: Add only enabled behavior scripts to warcinfo | Lars-Dominik Braun | 1 | -5/+5
2018-12-02 | behavior: Remove unused slots | Lars-Dominik Braun | 1 | -2/+0
2018-12-02 | controller: Remove unused argument | Lars-Dominik Braun | 2 | -5/+4
Has been replaced by handler a while ago.
2018-12-01 | util: Remove unused function | Lars-Dominik Braun | 2 | -6/+1
2018-12-01 | behavior: Add selector test cases | Lars-Dominik Braun | 1 | -0/+78
Fixes #3.
2018-12-01 | behavior: Move click script data to external file | Lars-Dominik Braun | 4 | -149/+169
First step of issue #3
2018-12-01 | cli: Fix --behavior | Lars-Dominik Braun | 1 | -2/+3
2018-11-28 | behavior: Expand issue comments on GitHub | Lars-Dominik Braun | 1 | -0/+6
2018-11-26 | behavior: Close Facebook’s nag screen | Lars-Dominik Braun | 1 | -1/+1
Worked previously, broken by a site update.
2018-11-25 | behavior: Turn scroll JS code into class | Lars-Dominik Braun | 2 | -27/+33
2018-11-25 | single: Graceful ^C | Lars-Dominik Braun | 2 | -2/+13
Allow cancellation of timeout wait.
2018-11-24 | behavior: Never scroll html/body elements | Lars-Dominik Braun | 1 | -1/+1
Fixes weird positioning of elements tethered to viewport top.
2018-11-24 | behavior: Fix scrolling | Lars-Dominik Braun | 4 | -42/+49
- Introduce stop() method callable from Python. Looks like the old method (global variable) was not working (any more?). This is much better anyway.
- Restore state of scrolled elements (not window). Fixes weird screenshots of twitter.com.
2018-11-24 | browser: Ignore load failures for nonexisting requests | Lars-Dominik Braun | 1 | -2/+3
Fixes None dereference.
2018-11-22 | controller: Improve idle waiting | Lars-Dominik Braun | 3 | -19/+89
2018-11-19 | controller: Add parameters to warcinfo | Lars-Dominik Braun | 1 | -0/+7
Add parameters the grab was run with, so we can actually reproduce a run.