Age  Commit message  Author  Files  Lines
2019-07-02behavior: Add missing uuid’s to logging callLars-Dominik Braun1-2/+5
2019-07-02Fix exit status loggingLars-Dominik Braun1-1/+1
Fixes commit 158f55eb7fb24fa26727a008ad44964390171060. Logger works only if WARC is still open.
2019-07-02Stabilize WARC headersLars-Dominik Braun6-46/+73
In preparation for 1.0 release:
- Correct mime types
- Add X-Crocoite-Type, so logs, scripts, dom-snapshots and screenshots can be identified easily
- Remove random WARC headers like X-Chrome-Initiator. We don’t want to maintain those.
- Remove non-standard urn-based package URLs. Can’t use them without a urn-registration
2019-06-28tools: Add missing \n to JSON outputLars-Dominik Braun1-0/+1
Fixes 76811bd3f0b3fc8688939e31fdab2c71c89cc75b
2019-06-27extract-screenshot: Allow extracting only the first screenshotLars-Dominik Braun1-1/+6
2019-06-27merge: Dump machine-readable infoLars-Dominik Braun1-2/+18
2019-06-26Allow turning off cert validationLars-Dominik Braun3-11/+37
Add an --insecure switch (shamelessly stolen from curl) to both -grab and -irc.
2019-06-26behavior: screenshot: Extend viewport for fixed elementsLars-Dominik Braun2-11/+57
Fixes #14, but needs a test case.
2019-06-18behavior: Fix screenshotsLars-Dominik Braun1-4/+16
Chrome’s behavior wrt. screenshots changed in some version, so artificially extending the viewport via device metrics is now required.
2019-06-18Re-inject behavior scripts on site reloadLars-Dominik Braun7-52/+114
Fixes #13. Event handler’s push() is async now.
2019-06-18Fix idle state tracking race conditionLars-Dominik Braun4-93/+121
Closes #16. Expose SiteLoader’s page idle changes through events and move state tracking into controller event handler. Relies on tracking time instead of asyncio event, which is more reliable.
2019-06-17devtools: Fix testcaseLars-Dominik Braun1-3/+18
The body is only available after receiving the loadingFinished event.
2019-06-17html: Fix CDATA walkingLars-Dominik Braun2-5/+42
Missing “from” keyword, returned generator instead of dicts. Properly recreate CDATA elements now.
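The kind of bug described above can be reproduced with a small recursive walker. This is only a sketch: the tuple-based tree and the names `walk_buggy` and `walk_fixed` are hypothetical and not taken from crocoite's HTML code. It shows how a plain `yield` of a recursive call hands back a generator object, while `yield from` delegates to it and produces the dicts.

```python
def walk_buggy(node):
    """Recursive walk with the 'from' keyword missing (hypothetical example)."""
    name, children = node
    yield {"name": name}
    for child in children:
        # Bug: this yields the child's generator object, not its items.
        yield walk_buggy(child)


def walk_fixed(node):
    """Same walk, but delegating to the recursive generator."""
    name, children = node
    yield {"name": name}
    for child in children:
        # 'yield from' re-yields every dict produced by the recursive call.
        yield from walk_fixed(child)


# Hypothetical tree: (name, list of child nodes)
tree = ("root", [("a", []), ("b", [("c", [])])])
buggy = list(walk_buggy(tree))
fixed = list(walk_fixed(tree))
```

With this input, `fixed` contains only dicts, while `buggy` mixes one dict with generator objects.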
2019-06-17cli: Log exit statusLars-Dominik Braun1-0/+1
2019-05-30controller: Fix -recursive statsLars-Dominik Braun1-2/+5
The stats previously included running jobs. Remove them.
2019-05-30controller: Correctly re-raise exceptionsLars-Dominik Braun1-1/+2
asyncio.gather returns the task’s results or exception, not task objects. Probably a copy&paste error.
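A minimal sketch of the behavior described above, assuming the call uses `return_exceptions=True` (the commit message does not say so explicitly). The coroutine names `ok`, `boom` and `run_all` are made up for illustration. The values returned by `asyncio.gather` are results or exception instances, so re-raising means raising the value itself rather than calling a task method on it.

```python
import asyncio


async def ok():
    return 42


async def boom():
    raise ValueError("job failed")


async def run_all():
    # With return_exceptions=True, gather returns each awaitable's result,
    # or the exception instance it raised. It never returns Task objects.
    results = await asyncio.gather(ok(), boom(), return_exceptions=True)
    for item in results:
        if isinstance(item, Exception):
            # Re-raise the stored exception directly.
            raise item
    return results


def main():
    try:
        asyncio.run(run_all())
    except ValueError as exc:
        return str(exc)
    return None


outcome = main()
```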
2019-05-30controller: Fix DepthLimitLars-Dominik Braun2-12/+45
The policy itself must be stateless, since there can be multiple ExtractLinks events (which would cause DepthLimit to reduce its depth every time).
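One way to picture a stateless policy is to pass the current depth in from the caller instead of decrementing a counter inside the policy. The sketch below is hypothetical: the signature `policy(urls, depth)` and the frozen dataclass are assumptions for illustration, not crocoite's actual interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: the policy cannot mutate its own state
class DepthLimit:
    maxdepth: int

    def __call__(self, urls, depth):
        """Return the URLs to follow for links found on a page at `depth`.

        `depth` is supplied by the caller (an assumption of this sketch),
        so repeated calls for the same page give the same answer.
        """
        if depth >= self.maxdepth:
            return set()
        return set(urls)


policy = DepthLimit(maxdepth=1)
links = {"http://example.com/a", "http://example.com/b"}
first = policy(links, depth=0)
second = policy(links, depth=0)   # a second ExtractLinks-style event
too_deep = policy(links, depth=1)
```

Because nothing is decremented, a second call for the same page returns the same set as the first.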
2019-05-26behavior: Add clicking for vimeo.comLars-Dominik Braun1-0/+11
2019-05-24dashboard: Remove delete buttonLars-Dominik Braun2-16/+3
There’s really no point in having it
2019-05-24dashboard: Add global bot statsLars-Dominik Braun2-2/+18
2019-05-22behavior: Extract links from plain-text documentsLars-Dominik Braun1-0/+13
2019-05-13devtools: Try to delete temp Chrome data dir – hardLars-Dominik Braun1-1/+11
Fixes #17.
2019-05-12behavior: Ignore invalid URLs when extracting linksLars-Dominik Braun2-2/+18
Fixes #18.
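A rough illustration of skipping unparsable links, using only the standard library. The helper name `valid_links` is hypothetical, and crocoite may use a different URL library; the point is merely that a parser error on one link should not abort the whole extraction.

```python
from urllib.parse import urlsplit


def valid_links(candidates):
    """Yield only those candidate strings that urlsplit can parse."""
    for raw in candidates:
        try:
            urlsplit(raw)
        except ValueError:
            # e.g. an unbalanced '[' in the host part; ignore and move on
            continue
        yield raw


found = list(valid_links(["http://example.com/", "http://[invalid"]))
```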
2019-05-05irc: Switch job id’s to proquintsLars-Dominik Braun1-4/+41
They’re easier to read and remember for humans. Plus we don’t really need 128 bits of randomness. Time-based id’s are fine here.
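For readers unfamiliar with proquints: each 16-bit word is spelled as five letters (consonant, vowel, consonant, vowel, consonant), and words are joined with dashes. The following is a small sketch of that encoding; the function names are made up here, and how crocoite derives the integer from the time is not shown in this log.

```python
CONSONANTS = "bdfghjklmnprstvz"  # 16 symbols, 4 bits each
VOWELS = "aiou"                  # 4 symbols, 2 bits each


def proquint16(value):
    """Encode one 16-bit word as a five-letter proquint."""
    if not 0 <= value <= 0xFFFF:
        raise ValueError("value must fit into 16 bits")
    return (
        CONSONANTS[(value >> 12) & 0xF]
        + VOWELS[(value >> 10) & 0x3]
        + CONSONANTS[(value >> 6) & 0xF]
        + VOWELS[(value >> 4) & 0x3]
        + CONSONANTS[value & 0xF]
    )


def to_proquint(value, bits=32):
    """Encode an integer of `bits` width (a multiple of 16) as proquints."""
    words = [(value >> shift) & 0xFFFF for shift in range(bits - 16, -1, -16)]
    return "-".join(proquint16(word) for word in words)


example = to_proquint(0x7F000001)  # 127.0.0.1 as a 32-bit integer
```

Here `example` is `lusab-babad`, which is noticeably easier to read aloud or retype than a 128-bit UUID. A time-based id could be obtained by passing something like `int(time.time()) & 0xFFFFFFFF`, though that particular choice is an assumption.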
2019-05-05irc: Add job info to warcinfo recordLars-Dominik Braun2-6/+22
2019-05-05cli: Allow adding extra data to warcinfo recordLars-Dominik Braun2-4/+12
2019-05-04behavior: Add clicking for imgur.comLars-Dominik Braun1-0/+12
2019-05-02behavior: Load more content on steamcommunity.comLars-Dominik Braun1-1/+7
2019-03-22Move documentation to SphinxLars-Dominik Braun9-215/+433
2019-03-22behavior: Test DomSnapshotLars-Dominik Braun1-1/+27
2019-03-21behavior: Test ScreenshotLars-Dominik Braun2-16/+61
2019-03-21behavior: Test crashLars-Dominik Braun1-13/+36
2019-03-21setup.py: Require Python >=3.6Lars-Dominik Braun1-1/+2
2019-03-20behavior: Fix Reddit selectorsLars-Dominik Braun1-3/+11
2019-03-16browser: Raise exception if navigation failedLars-Dominik Braun3-8/+12
Stop early if there’s nothing to do.
2019-03-16Add more debug messagesLars-Dominik Braun3-2/+23
…to controller and behavior
2019-03-16browser: Use different UUID for loadingFinished/FailedLars-Dominik Braun1-1/+1
2019-03-08Use yaml.safe_load_allLars-Dominik Braun2-2/+2
Calling load_all without an explicit Loader is deprecated. A safe YAML subset is fine for our purpose. See https://msg.pyyaml.org/load
2019-03-08behavior: Add “more replies” selector for YouTubeLars-Dominik Braun1-0/+4
2019-03-08behavior: Fix selectorsLars-Dominik Braun1-7/+5
Fix Facebook/Patreon selectors and Instagram example URL.
2019-03-08irc: Add config option need_voiceLars-Dominik Braun4-27/+55
Do not hardcode the privilege required to use the bot; make it configurable.
2019-03-06irc: Remove unused args for on*Lars-Dominik Braun1-3/+3
onMode will not always receive the nick and user arguments (i.e. when the server sets a mode). Remove them, since they are unused.
2019-03-05irc: Fix NAMES reply handlingLars-Dominik Braun1-1/+6
The user list may be sent using multiple reply messages if it is too long. Do not overwrite the previous one.
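A sketch of accumulating a user list that arrives in several pieces. In IRC, names are delivered in one or more RPL_NAMREPLY (353) messages followed by RPL_ENDOFNAMES (366). The class and method names below are hypothetical and do not mirror the bot's real handlers.

```python
class ChannelUsers:
    """Collects names from several NAMES replies (hypothetical helper)."""

    def __init__(self):
        self.users = set()
        self._pending = set()

    def on_namreply(self, names):
        # Merge into the pending set; assigning here would drop the
        # names received in earlier replies.
        self._pending.update(names.split())

    def on_endofnames(self):
        # The list is complete: publish it and start fresh for next time.
        self.users = self._pending
        self._pending = set()


channel = ChannelUsers()
channel.on_namreply("@alice bob")
channel.on_namreply("+carol")
channel.on_endofnames()
```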
2019-03-05Replace mutable default argumentsLars-Dominik Braun2-9/+9
This fixes IRC permission checks. Previously all users who joined the channel after the bot stored their modes in the same set(). Can be detected with pylint W0102.
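The pitfall behind this fix can be shown in a few lines. The class names are invented for the example; only the pattern (a `set()` as default argument, which pylint reports as W0102) comes from the commit message. Python evaluates a default value once, when the function is defined, so every instance created without an explicit argument shares the same set.

```python
class UserBuggy:
    def __init__(self, name, modes=set()):  # one set shared by all calls
        self.name = name
        self.modes = modes


class UserFixed:
    def __init__(self, name, modes=None):
        self.name = name
        # Create a fresh set per instance when none is given.
        self.modes = set() if modes is None else modes


a, b = UserBuggy("alice"), UserBuggy("bob")
a.modes.add("v")          # grant voice to alice only
leaked = "v" in b.modes   # bob sees it too, because the set is shared

c, d = UserFixed("carol"), UserFixed("dave")
c.modes.add("v")
isolated = "v" not in d.modes
```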
2019-02-02irc: Fail if bot command is emptyLars-Dominik Braun1-1/+1
2019-02-02irc: Retry if reconnect failsLars-Dominik Braun1-4/+8
2019-01-27Support manhole debuggingLars-Dominik Braun2-0/+8
Add optional support for manhole to all cli tools. Activated by signal USR1.
2019-01-27irc: Add URL blacklistLars-Dominik Braun3-3/+20
2019-01-27irc: Switch configuration to JSONLars-Dominik Braun4-23/+25
2019-01-27recursive: Avoid deadlock if unknown exception occursLars-Dominik Braun1-0/+9
Kill the subprocess and make sure we retrieve exceptions from .fetch()