Age | Commit message | Author | Files | Lines
2018-11-24 | browser: Ignore load failures for nonexisting requests | Lars-Dominik Braun | 1 | -2/+3
    Fixes None dereference.
2018-11-22 | travis: Switch to xenial | Lars-Dominik Braun | 1 | -1/+3
    The image offers Python 3.7 and 3.8-dev
2018-11-22 | controller: Improve idle waiting | Lars-Dominik Braun | 3 | -19/+89
2018-11-19 | controller: Add parameters to warcinfo | Lars-Dominik Braun | 1 | -0/+7
    Add parameters the grab was run with, so we can actually reproduce a run.
2018-11-19 | Coding style | Lars-Dominik Braun | 12 | -58/+44
    Fix a few random issues pointed out by pylint, mainly unused imports.
2018-11-17 | html: Add tests for tree walker | Lars-Dominik Braun | 1 | -1/+23
2018-11-17 | logger: Add more tests | Lars-Dominik Braun | 2 | -3/+25
2018-11-17 | browser: Add tests for header deserialization | Lars-Dominik Braun | 1 | -0/+39
2018-11-17 | devtools: Update browser flags | Lars-Dominik Braun | 1 | -0/+12
    Add a few more that seem reasonable.
2018-11-17 | browser: clearBrowserCookies is supported unconditionally | Lars-Dominik Braun | 1 | -4/+1
    canClearBrowserCookies apparently has been removed from protocol 1.3.
2018-11-17 | tools: Add original HTTP header to revisit record | Lars-Dominik Braun | 2 | -11/+13
    The payloads may be the same, but the headers are usually not.
2018-11-17 | click: Add gab.ai | Lars-Dominik Braun | 1 | -0/+10
    Load more posts on profile page and more comments and replies on individual post pages.
2018-11-14 | Async chrome process startup | Lars-Dominik Braun | 6 | -157/+161
    Move it to .devtools. Seems more fitting.
2018-11-10 | tools: Fix entry point | Lars-Dominik Braun | 1 | -1/+1
2018-11-10 | tools: Fix WARC merging | Lars-Dominik Braun | 2 | -18/+205
    WARC-Target-URI was taken from the previous record, even if the URI was different. This essentially removes the revisited URL from the archive. Also add a few tests. And boy, warcio is a mess.
2018-11-09 | Add xml report for codecov.io | Lars-Dominik Braun | 2 | -1/+4
2018-11-09 | Add codecov.io | Lars-Dominik Braun | 2 | -0/+5
2018-11-08 | Travis has no version 3.7 yet | Lars-Dominik Braun | 1 | -1/+2
    Use -dev
2018-11-08 | Drop support for Python <3.6 | Lars-Dominik Braun | 2 | -4/+2
2018-11-08 | Update README | Lars-Dominik Braun | 1 | -2/+4
    Dependency changes after asyncio transition.
2018-11-08 | devtools: Disable websocket pings to Chrome | Lars-Dominik Braun | 2 | -1/+12
    Chrome does not like that.
2018-11-06 | Switch single mode to asyncio | Lars-Dominik Braun | 6 | -176/+141
    This is a direct port to asyncio without any design changes. These need to happen in further refinements. Fixes issue #1.
2018-11-06 | Switch site loader to async DevTools communication | Lars-Dominik Braun | 2 | -229/+236
2018-11-06 | Add simple asyncio-based DevTool communication | Lars-Dominik Braun | 4 | -1/+412
    Inspired by pychrome/aiochrome, but includes crash handling and async get() instead of callbacks.
2018-11-03 | html: Add tests for tag/attribute stripping | Lars-Dominik Braun | 1 | -0/+38
2018-10-30 | recursive: Actually stop the grab when canceled | Lars-Dominik Braun | 1 | -1/+3
    This change was lost during the merge of 958563a3602780b48599c27acf212139c2e6904d.
2018-10-30 | Reduce idle wait time after stopping page | Lars-Dominik Braun | 1 | -4/+4
2018-10-30 | Increase default timeouts | Lars-Dominik Braun | 1 | -2/+2
    These are more sane than the previous super-short defaults. Obviously this will slow down recursive crawls.
2018-10-23 | single: Set and recursive: check exit status | Lars-Dominik Braun | 2 | -12/+34
    Use exit status to signal something is wrong. Check it within recursive, increment crashed counter and do not move the resulting WARC, it might be broken.
2018-10-22 | behavior: Unload script only if the handle is valid | Lars-Dominik Braun | 1 | -2/+4
    For some reason with Google Chrome 70 this is not the case any more.
2018-10-14 | irc: Add PoC dashboard | Lars-Dominik Braun | 7 | -16/+277
    Using websockets, vue and bulma.
2018-10-14 | irc: Graceful bot shutdown | Lars-Dominik Braun | 3 | -16/+110
    Wait for remaining jobs to finish without accepting new ones, but still allow some interaction with the bot (status/revoke).
2018-10-11 | recursive: Gracefully shut down on SIGINT/TERM | Lars-Dominik Braun | 2 | -4/+18
2018-10-10 | Add timezone to logger dates | Lars-Dominik Braun | 2 | -1/+4
    UTC everywhere. Make that clear.
2018-10-03 | controller: Depth limit does not work with i>1 | Lars-Dominik Braun | 1 | -1/+3
    No easy way to fix this, so just limit to [0, 1] for now.
2018-10-03 | irc: Fix mode parsing | Lars-Dominik Braun | 2 | -7/+37
    Ignore unsupported modes, add tests.
2018-10-02 | irc: Refactoring/beautification | Lars-Dominik Braun | 2 | -101/+266
    Add logging, split bot into abstract bot implementation and actual chromebot implementation, move some reusable checks into decorators.
2018-09-29 | Add documentation | Lars-Dominik Braun | 3 | -3/+44
    For -recursive and -irc
2018-09-29 | irc: Limit number of processes spawned | Lars-Dominik Braun | 2 | -21/+25
2018-09-29 | Add simple IRC bot | Lars-Dominik Braun | 3 | -0/+275
    chromebot is back! Dropping sopel, because it does not work well with asyncio.
2018-09-25 | Prevent recursing into arbitrary schemes | Lars-Dominik Braun | 1 | -1/+9
    HTTP(S) only.
2018-09-25 | Parallelize recursive grabs | Lars-Dominik Braun | 2 | -5/+17
    ❤️ asyncio.
2018-09-25 | Add recursive controller | Lars-Dominik Braun | 3 | -1/+170
    Simple and sequential.
2018-09-25 | Immediately flush logger | Lars-Dominik Braun | 1 | -0/+2
    Consumers can read the latest gossip faster now.
2018-09-25 | Log extracted links | Lars-Dominik Braun | 2 | -2/+25
2018-08-21 | Remove celery and recursion | Lars-Dominik Braun | 6 | -609/+24
    Gonna rewrite that properly.
2018-08-19 | README: Add rationale | Lars-Dominik Braun | 1 | -25/+87
    Explain a few design decisions
2018-08-17 | behavior: Load more comments from Facebook | Lars-Dominik Braun | 1 | -0/+4
2018-08-05 | test_browser: Properly handle failed requests | Lars-Dominik Braun | 2 | -15/+14
    Fixes test failures. Very fragile code unfortunately.
2018-08-04 | Properly handle failure to retrieve request body | Lars-Dominik Braun | 3 | -5/+50
    Just truncate the WARC record like we do with responses. Also add a few tests, but they’re not covering the call to getRequestPostData. Not sure what we have to do here.