path: root/crocoite
Age | Commit message | Author | Files | Lines
2019-07-02 | Stabilize WARC headers | Lars-Dominik Braun | 6 | -46/+73
    In preparation for 1.0 release:
    - Correct mime types
    - Add X-Crocoite-Type, so logs, scripts, dom-snapshots and screenshots can be identified easily
    - Remove random WARC headers like X-Chrome-Initiator. We don’t want to maintain those.
    - Remove non-standard urn-based package URLs. Can’t use them without a urn-registration.
2019-06-28 | tools: Add missing \n to JSON output | Lars-Dominik Braun | 1 | -0/+1
    Fixes 76811bd3f0b3fc8688939e31fdab2c71c89cc75b
2019-06-27 | extract-screenshot: Allow extracting only the first screenshot | Lars-Dominik Braun | 1 | -1/+6
2019-06-27 | merge: Dump machine-readable info | Lars-Dominik Braun | 1 | -2/+18
2019-06-26 | Allow turning off cert validation | Lars-Dominik Braun | 3 | -11/+37
    Add an --insecure switch (shamelessly stolen from curl) to both -grab and -irc.
2019-06-26 | behavior: screenshot: Extend viewport for fixed elements | Lars-Dominik Braun | 2 | -11/+57
    Fixes #14, but needs a test case.
2019-06-18 | behavior: Fix screenshots | Lars-Dominik Braun | 1 | -4/+16
    Chrome’s behavior wrt screenshots changed in some version, so now artificially extending the viewport via device metrics is required.
2019-06-18 | Re-inject behavior scripts on site reload | Lars-Dominik Braun | 7 | -52/+114
    Fixes #13. Event handler’s push() is async now.
2019-06-18 | Fix idle state tracking race condition | Lars-Dominik Braun | 4 | -93/+121
    Closes #16. Expose SiteLoader’s page idle changes through events and move state tracking into controller event handler. Relies on tracking time instead of asyncio event, which is more reliable.
2019-06-17 | devtools: Fix testcase | Lars-Dominik Braun | 1 | -3/+18
    The body is only available after receiving the loadingFinished event.
2019-06-17 | html: Fix CDATA walking | Lars-Dominik Braun | 2 | -5/+42
    Missing “from” keyword, returned generator instead of dicts. Properly recreate CDATA elements now.
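The missing “from” keyword mentioned here is a common generator pitfall in Python. The sketch below is not crocoite's code; the tree shape and the names `walk` and `walk_buggy` are assumptions made up for illustration. It shows how `yield` on a recursive call hands the caller a nested generator object, while `yield from` delegates and produces the dicts themselves.

```python
def walk_buggy(node):
    """Recursive walker with the bug: 'yield' instead of 'yield from'."""
    yield {"name": node["name"]}
    for child in node.get("children", []):
        # Bug: this yields the generator object, not the child's dicts.
        yield walk_buggy(child)


def walk(node):
    """Fixed walker: delegate to the recursive generator."""
    yield {"name": node["name"]}
    for child in node.get("children", []):
        yield from walk(child)


# Hypothetical two-level tree used only for this example.
tree = {"name": "root", "children": [{"name": "cdata"}]}

fixed = list(walk(tree))
buggy = list(walk_buggy(tree))
```

With this input, `fixed` contains two dicts, while the second element of `buggy` is a generator rather than a dict.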
2019-06-17 | cli: Log exit status | Lars-Dominik Braun | 1 | -0/+1
2019-05-30 | controller: Fix -recursive stats | Lars-Dominik Braun | 1 | -2/+5
    The stats previously included running jobs. Remove them.
2019-05-30 | controller: Correctly re-raise exceptions | Lars-Dominik Braun | 1 | -1/+2
    asyncio.gather returns the task’s results or exception, not task objects. Probably a copy&paste error.
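To illustrate the point about `asyncio.gather`: when called with `return_exceptions=True`, it returns a list that mixes plain results with exception instances, so there is no task object to call `.exception()` on. The following is a minimal sketch under that assumption; the coroutines `ok` and `fail` and the helper names are hypothetical and not taken from crocoite.

```python
import asyncio


async def ok():
    return 1


async def fail():
    raise ValueError("boom")


async def collect():
    # Results and exception instances come back in argument order.
    return await asyncio.gather(ok(), fail(), return_exceptions=True)


async def collect_and_reraise():
    results = await collect()
    for item in results:
        # Each item is either a return value or an exception instance.
        if isinstance(item, Exception):
            raise item
    return results
```

Checking with `isinstance(item, Exception)` and re-raising is one way to surface the first failure once all awaitables have finished.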
2019-05-30 | controller: Fix DepthLimit | Lars-Dominik Braun | 2 | -12/+45
    The policy itself must be stateless, since there can be multiple ExtractLinks events (which would cause DepthLimit to reduce its depth every time).
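One way to picture a stateless depth policy is sketched below. This is an assumption-laden illustration, not crocoite's actual `DepthLimit`: the class name `StatelessDepthLimit`, its call signature and the `(url, depth)` pairs are invented here. The idea is that the current depth travels with each job, so the policy can be called any number of times without changing its own state.

```python
class StatelessDepthLimit:
    """Hypothetical policy: the depth is passed in, never stored or decremented."""

    def __init__(self, maxdepth):
        self.maxdepth = maxdepth

    def __call__(self, urls, depth):
        """Return (url, next_depth) pairs to follow from a page at `depth`."""
        if depth >= self.maxdepth:
            return set()
        return {(url, depth + 1) for url in urls}


policy = StatelessDepthLimit(maxdepth=1)
# Two link-extraction events for the same page give the same answer.
first = policy({"http://example.com/a"}, depth=0)
second = policy({"http://example.com/a"}, depth=0)
```

Because nothing is mutated inside `__call__`, repeated events for one page cannot shrink the remaining depth.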
2019-05-26 | behavior: Add clicking for vimeo.com | Lars-Dominik Braun | 1 | -0/+11
2019-05-22 | behavior: Extract links from plain-text documents | Lars-Dominik Braun | 1 | -0/+13
2019-05-13 | devtools: Try to delete temp Chrome data dir – hard | Lars-Dominik Braun | 1 | -1/+11
    Fixes #17.
2019-05-12 | behavior: Ignore invalid URLs when extracting links | Lars-Dominik Braun | 2 | -2/+18
    Fixes #18.
2019-05-05 | irc: Switch job id’s to proquints | Lars-Dominik Braun | 1 | -4/+41
    They’re easier to read and remember for humans. Plus we don’t really need 128 bits of randomness. Time-based id’s are fine here.
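For readers unfamiliar with proquints, here is a minimal sketch of the encoding: each 16-bit word becomes five letters in a consonant-vowel-consonant-vowel-consonant pattern, and words are joined with hyphens. The function names are assumptions for this example, and how crocoite derives the underlying number (the commit only says time-based) is not shown.

```python
CONSONANTS = "bdfghjklmnprstvz"  # 16 symbols, 4 bits each
VOWELS = "aiou"                  # 4 symbols, 2 bits each


def u16_to_proquint(value):
    """Encode one 16-bit word as consonant-vowel-consonant-vowel-consonant."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit into 16 bits")
    return (
        CONSONANTS[(value >> 12) & 0xF]
        + VOWELS[(value >> 10) & 0x3]
        + CONSONANTS[(value >> 6) & 0xF]
        + VOWELS[(value >> 4) & 0x3]
        + CONSONANTS[value & 0xF]
    )


def to_proquint(value, bits=32):
    """Encode an unsigned integer of `bits` width (a multiple of 16)."""
    words = [
        u16_to_proquint((value >> shift) & 0xFFFF)
        for shift in range(bits - 16, -1, -16)
    ]
    return "-".join(words)


print(to_proquint(0x7F000001))  # prints "lusab-babad"
```

A 32-bit value therefore yields two pronounceable five-letter words, which is much easier to read aloud or retype than a 128-bit UUID.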
2019-05-05 | irc: Add job info to warcinfo record | Lars-Dominik Braun | 2 | -6/+22
2019-05-05 | cli: Allow adding extra data to warcinfo record | Lars-Dominik Braun | 2 | -4/+12
2019-05-04 | behavior: Add clicking for imgur.com | Lars-Dominik Braun | 1 | -0/+12
2019-05-02 | behavior: Load more content on steamcommunity.com | Lars-Dominik Braun | 1 | -1/+7
2019-03-22 | Move documentation to Sphinx | Lars-Dominik Braun | 1 | -0/+44
2019-03-22 | behavior: Test DomSnapshot | Lars-Dominik Braun | 1 | -1/+27
2019-03-21 | behavior: Test Screenshot | Lars-Dominik Braun | 2 | -16/+61
2019-03-21 | behavior: Test crash | Lars-Dominik Braun | 1 | -13/+36
2019-03-20 | behavior: Fix Reddit selectors | Lars-Dominik Braun | 1 | -3/+11
2019-03-16 | browser: Raise exception if navigation failed | Lars-Dominik Braun | 3 | -8/+12
    Stop early if there’s nothing to do.
2019-03-16 | Add more debug messages | Lars-Dominik Braun | 3 | -2/+23
    …to controller and behavior
2019-03-16 | browser: Use different UUID for loadingFinished/Failed | Lars-Dominik Braun | 1 | -1/+1
2019-03-08 | Use yaml.safe_load_all | Lars-Dominik Braun | 2 | -2/+2
    load_all is deprecated. A safe YAML subset is fine for our purpose. See https://msg.pyyaml.org/load
2019-03-08 | behavior: Add “more replies” selector for YouTube | Lars-Dominik Braun | 1 | -0/+4
2019-03-08 | behavior: Fix selectors | Lars-Dominik Braun | 1 | -7/+5
    Fix Facebook/Patreon selectors and Instagram example URL.
2019-03-08 | irc: Add config option need_voice | Lars-Dominik Braun | 3 | -26/+53
    Do not hardcode the privilege required to use the bot; make it configurable.
2019-03-06 | irc: Remove unused args for on* | Lars-Dominik Braun | 1 | -3/+3
    onMode will not always receive nick and user arguments (e.g. when the server sets a mode). Remove them, since they are unused.
2019-03-05 | irc: Fix NAMES reply handling | Lars-Dominik Braun | 1 | -1/+6
    The user list may be sent using multiple reply messages if it is too long. Do not overwrite the previous one.
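The NAMES fix can be sketched as follows. In IRC, a long user list arrives as several RPL_NAMREPLY (353) messages followed by one RPL_ENDOFNAMES (366), so each reply has to be merged into what was already received. The class and method names below are hypothetical and do not mirror crocoite's bot or any particular IRC library.

```python
class NamesCollector:
    """Hypothetical helper that merges multi-part NAMES replies per channel."""

    def __init__(self):
        self._pending = {}  # channel -> names received so far
        self.users = {}     # channel -> complete user set

    def on_namreply(self, channel, names):
        # Merge into the pending set instead of overwriting it.
        self._pending.setdefault(channel, set()).update(names)

    def on_endofnames(self, channel):
        # The list is complete; publish it and reset the buffer.
        self.users[channel] = self._pending.pop(channel, set())


collector = NamesCollector()
collector.on_namreply("#archive", ["@op", "alice"])
collector.on_namreply("#archive", ["bob"])
collector.on_endofnames("#archive")
```

After the end-of-names message, `collector.users["#archive"]` holds the names from both replies.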
2019-03-05 | Replace mutable default arguments | Lars-Dominik Braun | 2 | -9/+9
    This fixes IRC permission checks. Previously all users who joined the channel after the bot stored their modes in the same set(). Can be detected with pylint W0102.
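The bug class described here is easy to reproduce. In the sketch below, the `User` classes are hypothetical stand-ins, not crocoite's code. A default argument such as `set()` is evaluated once at function definition time, so every instance created without an explicit value shares the same object.

```python
class BuggyUser:
    def __init__(self, name, modes=set()):  # one set shared by all calls
        self.name = name
        self.modes = modes


class User:
    def __init__(self, name, modes=None):
        self.name = name
        # Create a fresh set per instance unless one is passed in.
        self.modes = set() if modes is None else modes


a, b = BuggyUser("alice"), BuggyUser("bob")
a.modes.add("v")          # give alice voice ...
shared = "v" in b.modes   # ... and bob appears to have it too

c, d = User("carol"), User("dave")
c.modes.add("v")
isolated = "v" not in d.modes
```

Using `None` as the sentinel and building the container inside the function body is the usual fix, and it is what pylint's W0102 (dangerous-default-value) check points toward.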
2019-02-02 | irc: Fail if bot command is empty | Lars-Dominik Braun | 1 | -1/+1
2019-02-02 | irc: Retry if reconnect fails | Lars-Dominik Braun | 1 | -4/+8
2019-01-27 | Support manhole debugging | Lars-Dominik Braun | 1 | -0/+5
    Add optional support for manhole to all cli tools. Activated by signal USR1.
2019-01-27 | irc: Add URL blacklist | Lars-Dominik Braun | 2 | -3/+17
2019-01-27 | irc: Switch configuration to JSON | Lars-Dominik Braun | 1 | -12/+12
2019-01-27 | recursive: Avoid deadlock if unknown exception occurs | Lars-Dominik Braun | 1 | -0/+9
    Kill the subprocess and make sure we retrieve exceptions from .fetch().
2019-01-27 | Increase subprocess’ StreamReader limits | Lars-Dominik Braun | 2 | -2/+2
    We’re sending quite big JSON objects since 3a2fcc69a8eb4237b2862b3e291971d38748f115.
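As background for this change: asyncio's subprocess helpers accept a `limit` argument that sets the buffer size of the pipe `StreamReader`s (64 KiB by default), and `readline()` raises `ValueError` when a single line exceeds it. The sketch below assumes a child process that prints one long line; the helper name, payload size and chosen limit are made up for illustration and are not the values used in crocoite.

```python
import asyncio
import sys


async def read_big_line(limit):
    """Read one long line from a child process, using the given buffer limit."""
    proc = await asyncio.create_subprocess_exec(
        sys.executable, "-c", "print('x' * 200_000)",
        stdout=asyncio.subprocess.PIPE,
        limit=limit,  # raise the StreamReader limit above the line length
    )
    line = await proc.stdout.readline()
    await proc.wait()
    return len(line.rstrip())


if __name__ == "__main__":
    # 1 MiB is comfortably larger than the 200,000-byte line.
    print(asyncio.run(read_big_line(1024 * 1024)))
```

When line-delimited JSON messages can grow large, the limit has to be sized for the biggest expected line rather than for the average one.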
2019-01-26 | controller: Make sure idleTimeout is always applied | Lars-Dominik Braun | 1 | -1/+3
    If the browser goes idle before we enter `while True` we never notice and thus the idleTimeout is never applied.
2019-01-26 | irc: Fix format string | Lars-Dominik Braun | 1 | -6/+6
2019-01-10 | browser: Use hypothesis’ domains() | Lars-Dominik Braun | 1 | -5/+2
    Fixes test.
2019-01-07 | controller: Test timeouts | Lars-Dominik Braun | 1 | -0/+106
    Lots of copy&pasta. Unfortunately the controller uses asyncio.sleep in a few places.