summaryrefslogtreecommitdiff
path: root/crocoite
AgeCommit message (Collapse)AuthorFilesLines
2019-06-17html: Fix CDATA walkingLars-Dominik Braun2-5/+42
Missing “from” keyword, returned generator instead of dicts. Properly recreate CDATA elements now.
2019-06-17cli: Log exit statusLars-Dominik Braun1-0/+1
2019-05-30controller: Fix -recursive statsLars-Dominik Braun1-2/+5
have previously included running jobs. Remove them.
2019-05-30controller: Correctly re-raise exceptionsLars-Dominik Braun1-1/+2
asyncio.gather returns the task’s results or exception, not task objects. Probably a copy&paste error.
2019-05-30controller: Fix DepthLimitLars-Dominik Braun2-12/+45
The policy itself must be stateless, since there can be multiple ExtractLinks events (which would cause DepthLimit to reduce its depth every time).
2019-05-26behavior: Add clicking for vimeo.comLars-Dominik Braun1-0/+11
2019-05-22behavior: Extract links from plain-text documentsLars-Dominik Braun1-0/+13
2019-05-13devtools: Try to delete temp Chrome data dir – hardLars-Dominik Braun1-1/+11
Fixes #17.
2019-05-12behavior: Ignore invalid URLs when extracting linksLars-Dominik Braun2-2/+18
Fixes #18.
2019-05-05irc: Switch job id’s to proquintsLars-Dominik Braun1-4/+41
They’re easier to read and remember for humans. Plus we don’t really need 128 bits of randomness. Time-based id’s are fine here.
2019-05-05irc: Add job info to warcinfo recordLars-Dominik Braun2-6/+22
2019-05-05cli: Allow adding extra data to warcinfo recordLars-Dominik Braun2-4/+12
2019-05-04behavior: Add clicking for imgur.comLars-Dominik Braun1-0/+12
2019-05-02behavior: Load more content on steamcommunity.comLars-Dominik Braun1-1/+7
2019-03-22Move documentation to SphinxLars-Dominik Braun1-0/+44
2019-03-22behavior: Test DomSnapshotLars-Dominik Braun1-1/+27
2019-03-21behavior: Test ScreenshotLars-Dominik Braun2-16/+61
2019-03-21behavior: Test crashLars-Dominik Braun1-13/+36
2019-03-20behavior: Fix Reddit selectorsLars-Dominik Braun1-3/+11
2019-03-16browser: Raise exception if navigation failedLars-Dominik Braun3-8/+12
Stop early if there’s nothing to do.
2019-03-16Add more debug messagesLars-Dominik Braun3-2/+23
…to controller and behavior
2019-03-16browser: Use different UUID for loadingFinished/FailedLars-Dominik Braun1-1/+1
2019-03-08Use yaml.safe_load_allLars-Dominik Braun2-2/+2
load_all is deprecated. A safe YAML subset is fine for our purpose. See https://msg.pyyaml.org/load
2019-03-08behavior: Add “more replies” selector for YouTubeLars-Dominik Braun1-0/+4
2019-03-08behavior: Fix selectorsLars-Dominik Braun1-7/+5
Fix Facebook/Patreon selectors and Instagram example URL.
2019-03-08irc: Add config option need_voiceLars-Dominik Braun3-26/+53
Do not hardcode required priviledge to use bot, make it configureable.
2019-03-06irc: Remove unused args for on*Lars-Dominik Braun1-3/+3
onMode will not always receive nick and user argument (i.e. server sets mode). Remove them, since they are unused.
2019-03-05irc: Fix NAMES reply handlingLars-Dominik Braun1-1/+6
User list may be send using multiple reply messages if too long. Do not overwrite the previous one.
2019-03-05Replace mutable default argumentsLars-Dominik Braun2-9/+9
This fixes IRC permission checks. Previously all users who joined the channel after the bot stored their modes in the same set(). Can be detected with pylint W0102.
2019-02-02irc: Fail if bot command is emptyLars-Dominik Braun1-1/+1
2019-02-02irc: Retry if reconnect failsLars-Dominik Braun1-4/+8
2019-01-27Support manhole debuggingLars-Dominik Braun1-0/+5
Add optional support for manhole to all cli tools. Activated by signal USR1.
2019-01-27irc: Add URL blacklistLars-Dominik Braun2-3/+17
2019-01-27irc: Switch configuration to JSONLars-Dominik Braun1-12/+12
2019-01-27recursive: Avoid deadlock if unknown exception occursLars-Dominik Braun1-0/+9
Kill the subprocess and make sure we retrieve exceptions from .fetch()
2019-01-27Increase subprocess’ StreamReader limitsLars-Dominik Braun2-2/+2
We’re sending quite big JSON objects since 3a2fcc69a8eb4237b2862b3e291971d38748f115.
2019-01-26controller: Make sure idleTimeout is always appliedLars-Dominik Braun1-1/+3
If the browser goes idle before we enter `while True` we never notice and thus the idleTimeout is never applied.
2019-01-26irc: Fix format stringLars-Dominik Braun1-6/+6
2019-01-10browser: Use hypothesis’ domains()Lars-Dominik Braun1-5/+2
Fixes test.
2019-01-07controller: Test timeoutsLars-Dominik Braun1-0/+106
Lots of copy&pasta. Unfortunately the controller uses asyncio.sleep in a few places.
2019-01-07Log Chrome’s responses to WARC by defaultLars-Dominik Braun5-19/+32
We may not be able to reproduce every failure, so logging as much as possible is important to figure out what went wrong. Also, in case a bug is uncovered in the future, we can check the logs and possibly fix it with -errata.
2019-01-05browser: Do not overwrite request data when prefetchingLars-Dominik Braun1-2/+0
Needs a testcase.
2019-01-05html: Handle CDATALars-Dominik Braun1-1/+5
When loading XML documents Chrome presents a pretty-printed version to the user, which still contains the original XML when exporting via DOM.getDocument. Not sure how to test this.
2019-01-05controller: Fix PrefixLimitLars-Dominik Braun1-1/+1
Probably broken by the transition to URL() in commit 5e444dd6511d97308a84ae9c86ebf14547d01f01 And yes, we desperately need some tests for this.
2019-01-04behavior: Ignore onstop() failureLars-Dominik Braun1-4/+14
Fails if the page is reloaded/redirected. See issue #13.
2019-01-04logger: Do not log debug by defaultLars-Dominik Braun1-1/+1
Must’ve slipped through.
2019-01-04coverage: Ignore a few unreachable statementsLars-Dominik Braun2-7/+7
2019-01-04behavior: Support clicking area and add testcaseLars-Dominik Braun2-7/+76
2019-01-03browser: Turn Item into RequestResponsePairLars-Dominik Braun6-485/+627
Previously Item was just a simple wrapper around Chrome’s Network.* events. This turned out to be quite nasty when testing, so its replacement, RequestResponsePair, does some level of abstraction. This makes testing alot easier, since we now can simply instantiate it without building a proper DevTools event. Should come without any functional changes.
2018-12-31extract-screenshot: Remove URL from filenameLars-Dominik Braun1-8/+19
URL’s can get quite long, overflowing the file name length limit. Instead use sequential filenames and output metadata to stdout.