path: root/crocoite
Age | Commit message | Author | Files | Lines
2018-12-22 | Switch -recursive to asyncio’s .cancel() | Lars-Dominik Braun | 2 | -55/+58
RecursiveController used a custom .cancel() method before. Instead we can simply cancel .run() and handle the CancelledError inside run() and fetch().
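The cancellation pattern this commit message describes can be sketched roughly as follows. This is a minimal illustration, not crocoite's actual code: the class body, the `fetched`, `interrupted` and `cancelled` attributes, the example URLs and the `asyncio.sleep()` placeholder are all assumptions made for the example. Only the idea of cancelling the task that runs `run()` and handling `CancelledError` inside `run()` and `fetch()` comes from the message.

```python
import asyncio


class RecursiveController:
    """Hypothetical stand-in for illustration; not crocoite's real class."""

    def __init__(self, urls):
        self.urls = list(urls)
        self.fetched = []        # URLs grabbed completely
        self.interrupted = None  # URL that was in flight when cancelled
        self.cancelled = False

    async def fetch(self, url):
        try:
            # Placeholder for the real grab of a single URL.
            await asyncio.sleep(1)
            self.fetched.append(url)
        except asyncio.CancelledError:
            # Per-fetch cleanup would go here; then let the
            # cancellation propagate to run().
            self.interrupted = url
            raise

    async def run(self):
        try:
            for url in self.urls:
                await self.fetch(url)
        except asyncio.CancelledError:
            # Controller-wide cleanup, then re-raise so the task
            # is actually marked as cancelled.
            self.cancelled = True
            raise


async def main():
    controller = RecursiveController(
        ["http://example.com/a", "http://example.com/b"])
    task = asyncio.ensure_future(controller.run())
    await asyncio.sleep(0)  # let run() start its first fetch
    task.cancel()           # no custom .cancel() method needed
    try:
        await task
    except asyncio.CancelledError:
        pass
    return controller


controller = asyncio.run(main())
```

Re-raising `CancelledError` after cleanup is a design choice in this sketch: it keeps the task's cancelled state visible to whoever awaits it.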
2018-12-21 | Remove unused EventHandler property | Lars-Dominik Braun | 1 | -6/+0
Crash detection was moved into -recursive’s return code checking a while ago.
2018-12-21 | util: Skip missing source files | Lars-Dominik Braun | 1 | -1/+1
Requirement extraction fails if the package is an .egg file (i.e. not extracted). Do not try to compute checksum/file length for them.
2018-12-21 | Parse URLs by default | Lars-Dominik Braun | 10 | -89/+68
Use library yarl (already pulled in by aiohttp). No URL processed should be a string.
2018-12-17 | Add simple errata tool | Lars-Dominik Braun | 2 | -1/+98
Fixes #9.
2018-12-13 | behavior: Whitelist gab.com as well | Lars-Dominik Braun | 1 | -4/+6
2018-12-11 | behavior: Add click test URLs for Twitter | Lars-Dominik Braun | 1 | -1/+3
2018-12-08 | behavior: Dump script options to file as well | Lars-Dominik Braun | 1 | -3/+5
click.js’s data was part of the script before 22adde79940d32c5f094f26f3e18b7160e7ccafc. Now it is injected dynamically, but it still would be nice to have the data available.
2018-12-08 | controller: Reraise queue processing errors early | Lars-Dominik Braun | 1 | -1/+7
2018-12-08 | tools: Add version info to merged WARCs | Lars-Dominik Braun | 4 | -17/+54
In preparation for #9. I was hoping to reuse one of schema.org’s microdata schemas, but neither Action (archival action) nor SoftwareApplication (version information) seems to be suitable.
2018-12-06 | behavior: Fix patreon selector | Lars-Dominik Braun | 1 | -3/+2
And that proves their CSS class names are not stable and cannot be used.
2018-12-05 | behavior: Add gamasutra.com click selector | Lars-Dominik Braun | 1 | -0/+7
2018-12-02 | behavior: Add more documentation | Lars-Dominik Braun | 1 | -2/+14
2018-12-02 | behavior: Remove outdated comment | Lars-Dominik Braun | 1 | -3/+0
2018-12-02 | behavior: Re-enable clearDeviceMetricsOverride | Lars-Dominik Braun | 1 | -4/+1
Seems to be working again. Chrome bug?
2018-12-02 | behavior: Improve click testing | Lars-Dominik Braun | 2 | -22/+56
Some pages require scrolling, so we need a SinglePageController. Also mark network-dependent tests with xfail, so they won’t affect the overall test result unless you know what you’re doing (--runxfail).
2018-12-02 | controller: Add only enabled behavior scripts to warcinfo | Lars-Dominik Braun | 1 | -5/+5
2018-12-02 | behavior: Remove unused slots | Lars-Dominik Braun | 1 | -2/+0
2018-12-02 | controller: Remove unused argument | Lars-Dominik Braun | 2 | -5/+4
Was replaced by handler a while ago.
2018-12-01 | util: Remove unused function | Lars-Dominik Braun | 2 | -6/+1
2018-12-01 | behavior: Add selector test cases | Lars-Dominik Braun | 1 | -0/+78
Fixes #3.
2018-12-01 | behavior: Move click script data to external file | Lars-Dominik Braun | 4 | -149/+169
First step of issue #3.
2018-12-01 | cli: Fix --behavior | Lars-Dominik Braun | 1 | -2/+3
2018-11-28 | behavior: Expand issue comments on GitHub | Lars-Dominik Braun | 1 | -0/+6
2018-11-26 | behavior: Close Facebook’s nag screen | Lars-Dominik Braun | 1 | -1/+1
Worked previously, broken by a site update.
2018-11-25 | behavior: Turn scroll JS code into class | Lars-Dominik Braun | 2 | -27/+33
2018-11-25 | single: Graceful ^C | Lars-Dominik Braun | 2 | -2/+13
Allow cancellation of timeout wait.
2018-11-24 | behavior: Never scroll html/body elements | Lars-Dominik Braun | 1 | -1/+1
Fixes weird positioning of elements tethered to viewport top.
2018-11-24 | behavior: Fix scrolling | Lars-Dominik Braun | 4 | -42/+49
- Introduce stop() method callable from Python. Looks like the old method (global variable) was not working (any more?). This is much better anyway.
- Restore state of scrolled elements (not window). Fixes weird screenshots of twitter.com.
2018-11-24 | browser: Ignore load failures for nonexisting requests | Lars-Dominik Braun | 1 | -2/+3
Fixes None dereference.
2018-11-22 | controller: Improve idle waiting | Lars-Dominik Braun | 3 | -19/+89
2018-11-19 | controller: Add parameters to warcinfo | Lars-Dominik Braun | 1 | -0/+7
Add parameters the grab was run with, so we can actually reproduce a run.
2018-11-19 | Coding style | Lars-Dominik Braun | 12 | -58/+44
Fix a few random issues pointed out by pylint, mainly unused imports.
2018-11-17 | html: Add tests for tree walker | Lars-Dominik Braun | 1 | -1/+23
2018-11-17 | logger: Add more tests | Lars-Dominik Braun | 2 | -3/+25
2018-11-17 | browser: Add tests for header deserialization | Lars-Dominik Braun | 1 | -0/+39
2018-11-17 | devtools: Update browser flags | Lars-Dominik Braun | 1 | -0/+12
Add a few more that seem reasonable.
2018-11-17 | browser: clearBrowserCookies is supported unconditionally | Lars-Dominik Braun | 1 | -4/+1
canClearBrowserCookies apparently has been removed from protocol 1.3.
2018-11-17 | tools: Add original HTTP header to revisit record | Lars-Dominik Braun | 2 | -11/+13
The payloads may be the same, but the headers are usually not.
2018-11-17 | click: Add gab.ai | Lars-Dominik Braun | 1 | -0/+10
Load more posts on profile page and more comments and replies on individual post pages.
2018-11-14 | Async chrome process startup | Lars-Dominik Braun | 6 | -157/+161
Move it to .devtools. Seems more fitting.
2018-11-10 | tools: Fix WARC merging | Lars-Dominik Braun | 2 | -18/+205
WARC-Target-URI was taken from the previous record, even if the URI was different. This essentially removes the revisited URL from the archive. Also add a few tests. And boy, warcio is a mess.
2018-11-08 | devtools: Disable websocket pings to Chrome | Lars-Dominik Braun | 2 | -1/+12
Chrome does not like that.
2018-11-06 | Switch single mode to asyncio | Lars-Dominik Braun | 5 | -175/+141
This is a direct port to asyncio without any design changes. These need to happen in further refinements. Fixes issue #1.
2018-11-06 | Switch site loader to async DevTools communication | Lars-Dominik Braun | 2 | -229/+236
2018-11-06 | Add simple asyncio-based DevTool communication | Lars-Dominik Braun | 2 | -0/+406
Inspired by pychrome/aiochrome, but includes crash handling and async get() instead of callbacks.
2018-11-03 | html: Add tests for tag/attribute stripping | Lars-Dominik Braun | 1 | -0/+38
2018-10-30 | recursive: Actually stop the grab when canceled | Lars-Dominik Braun | 1 | -1/+3
This change was lost during the merge of 958563a3602780b48599c27acf212139c2e6904d.
2018-10-30 | Reduce idle wait time after stopping page | Lars-Dominik Braun | 1 | -4/+4
2018-10-30 | Increase default timeouts | Lars-Dominik Braun | 1 | -2/+2
These are more sane than the previous super-short defaults. Obviously this will slow down recursive crawls.