Age | Commit message | Author | Files | Lines
2019-07-25 | behavior: Ignore failed onload script injection | Lars-Dominik Braun | 1 | -10/+21
It will be re-injected by the controller anyway.
2019-07-13 | Cookie injection support | Lars-Dominik Braun | 8 | -22/+214
Add command-line options for injecting individual cookies or a cookie file into Chrome, and provide a default cookie file. This changes the IRC bot’s command splitting to shlex.split, which allows shell-like argument quoting. Fixes #7.
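To illustrate what shell-like quoting buys over plain whitespace splitting, here is a small comparison using the standard library's shlex.split. The command text and the `--cookie` option in it are hypothetical examples, not the bot's documented syntax.

```python
import shlex

# Hypothetical IRC command whose cookie value contains a space.
message = 'a https://example.com --cookie "session=abc def"'

naive = message.split()            # breaks the quoted value in two
shell_like = shlex.split(message)  # keeps the quoted value as one argument

print(naive)       # ['a', 'https://example.com', '--cookie', '"session=abc', 'def"']
print(shell_like)  # ['a', 'https://example.com', '--cookie', 'session=abc def']
```

Note that shlex.split raises ValueError ("No closing quotation") on unbalanced quotes, so a bot using it needs to handle that case for malformed user input.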
2019-07-11 | devtools: Add more crash error handling | Lars-Dominik Braun | 1 | -6/+23
If the whole browser crashes (rare), we can neither close the tab on __aexit__ nor send SIGTERM to it. Make sure we still terminate gracefully.
2019-07-06 | Improve documentation | Lars-Dominik Braun | 2 | -12/+68
2019-07-06 | controller: Add missing import | Lars-Dominik Braun | 1 | -1/+1
2019-07-04 | Release version 1.0.0 (tag: v1.0.0) | Lars-Dominik Braun | 1 | -1/+11
2019-07-04 | dashboard: Ignore invalid json input | Lars-Dominik Braun | 1 | -1/+5
We should be able to recover from this.
2019-07-04 | behavior: Update click selector list | Lars-Dominik Braun | 1 | -18/+3
Remove Instagram, which has no stable CSS names. Update Gab.
2019-07-04 | Update documentation | Lars-Dominik Braun | 6 | -60/+62
Re-arrange stuff, add release guide. Needs a lot more work though.
2019-07-04 | devtools: Prefix temp directories | Lars-Dominik Braun | 1 | -1/+1
2019-07-04 | Rename cli utils | Lars-Dominik Braun | 6 | -98/+127
crocoite-recursive is now just crocoite; crocoite-grab is no longer user-facing and is now called crocoite-single. In preparation for the 1.0 release.
2019-07-03 | irc: Do not respond when not addressed directly | Lars-Dominik Braun | 1 | -1/+1
This fixes annoying replies when the bot’s nick is used as the first word of a message, e.g. “chromebot can do that”.
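A minimal sketch of the "addressed directly" test, assuming (this is not stated in the commit) that a message counts as addressed only when its first word is the nick followed by ":" or ",". The helper name `is_addressed` is hypothetical, and the simple lower() comparison only approximates IRC's case-insensitive nick matching.

```python
def is_addressed(message: str, nick: str) -> bool:
    """Hypothetical check: True only for "nick: ..." or "nick, ..." messages."""
    first, _, _ = message.partition(" ")
    # Compare case-insensitively; IRC nicks are case-insensitive.
    target = nick.lower()
    return first.lower() in (target + ":", target + ",")


print(is_addressed("chromebot: a https://example.com", "chromebot"))  # True
print(is_addressed("chromebot can do that", "chromebot"))             # False
```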
2019-07-02 | behavior: Add missing UUIDs to logging call | Lars-Dominik Braun | 1 | -2/+5
2019-07-02 | Fix exit status logging | Lars-Dominik Braun | 1 | -1/+1
Fixes commit 158f55eb7fb24fa26727a008ad44964390171060. The logger only works while the WARC is still open.
2019-07-02 | Stabilize WARC headers | Lars-Dominik Braun | 6 | -46/+73
In preparation for the 1.0 release:
- Correct MIME types
- Add X-Crocoite-Type, so logs, scripts, dom-snapshots and screenshots can be identified easily
- Remove random WARC headers like X-Chrome-Initiator. We don’t want to maintain those.
- Remove non-standard URN-based package URLs. We can’t use them without a URN registration.
2019-06-28 | tools: Add missing \n to JSON output | Lars-Dominik Braun | 1 | -0/+1
Fixes 76811bd3f0b3fc8688939e31fdab2c71c89cc75b
2019-06-27 | extract-screenshot: Allow extracting only the first screenshot | Lars-Dominik Braun | 1 | -1/+6
2019-06-27 | merge: Dump machine-readable info | Lars-Dominik Braun | 1 | -2/+18
2019-06-26 | Allow turning off cert validation | Lars-Dominik Braun | 3 | -11/+37
Add an --insecure switch (shamelessly stolen from curl) to both -grab and -irc.
2019-06-26 | behavior: screenshot: Extend viewport for fixed elements | Lars-Dominik Braun | 2 | -11/+57
Fixes #14, but needs a test case.
2019-06-18 | behavior: Fix screenshots | Lars-Dominik Braun | 1 | -4/+16
Chrome’s behavior with regard to screenshots changed in some version, so artificially extending the viewport via device metrics is now required.
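The device-metrics trick can be sketched as a sequence of Chrome DevTools Protocol commands: override the emulated screen to the full page size, capture, then clear the override. This only builds the JSON messages. The websocket transport, how the page size is obtained (e.g. via Page.getLayoutMetrics), and the helper names are assumptions for illustration, not crocoite's code.

```python
import itertools
import json

_ids = itertools.count(1)  # CDP commands need a unique id per connection


def cdp_message(method: str, **params) -> str:
    """Serialise one Chrome DevTools Protocol command as JSON text."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


def full_page_screenshot_commands(width: int, height: int) -> list:
    """Hypothetical helper: stretch the viewport to the page size, then capture."""
    return [
        # Pretend the device screen is as large as the whole page.
        # deviceScaleFactor=0 leaves the scale factor un-overridden.
        cdp_message("Emulation.setDeviceMetricsOverride",
                    width=width, height=height,
                    deviceScaleFactor=0, mobile=False),
        cdp_message("Page.captureScreenshot", format="png"),
        # Restore the original viewport afterwards.
        cdp_message("Emulation.clearDeviceMetricsOverride"),
    ]


for command in full_page_screenshot_commands(1280, 5000):
    print(command)
```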
2019-06-18 | Re-inject behavior scripts on site reload | Lars-Dominik Braun | 7 | -52/+114
Fixes #13. The event handler’s push() is now async.
2019-06-18 | Fix idle state tracking race condition | Lars-Dominik Braun | 4 | -93/+121
Closes #16. Expose SiteLoader’s page idle changes through events and move state tracking into the controller’s event handler. Rely on tracking time instead of an asyncio event, which is more reliable.
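A minimal sketch of time-based idle tracking: the event handler records when the page last became idle, and the controller asks how long it has stayed idle instead of waiting on an asyncio.Event that may be set and cleared between checks. The class and method names are hypothetical; the clock is injectable so the logic can be tested.

```python
import time
from typing import Callable, Optional


class IdleTracker:
    """Hypothetical tracker: remembers when the page last became idle."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._idle_since: Optional[float] = None

    def on_idle_change(self, idle: bool) -> None:
        # Called by the event handler for every idle/busy notification.
        if idle:
            if self._idle_since is None:
                self._idle_since = self._clock()
        else:
            self._idle_since = None

    def idle_for(self) -> float:
        """Seconds the page has been idle without interruption (0.0 if busy)."""
        if self._idle_since is None:
            return 0.0
        return self._clock() - self._idle_since
```

A controller loop could then stop once, say, `tracker.idle_for() >= 2`; the threshold here is illustrative.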
2019-06-17 | devtools: Fix testcase | Lars-Dominik Braun | 1 | -3/+18
The body is only available after receiving the loadingFinished event.
2019-06-17 | html: Fix CDATA walking | Lars-Dominik Braun | 2 | -5/+42
A missing “from” keyword returned generators instead of dicts. CDATA elements are now recreated properly.
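Presumably the missing keyword is the `from` in `yield from`: a recursive generator that writes `yield walk(child)` hands out the child generator object itself rather than the items it produces. The tree shape below is a hypothetical stand-in, not crocoite's html module.

```python
import types


def walk_broken(node):
    """Buggy: yields child *generators* instead of their nodes."""
    yield node
    for child in node.get("children", []):
        yield walk_broken(child)  # missing "from"


def walk(node):
    """Fixed: `yield from` re-yields every node of the recursive call."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


tree = {"name": "html", "children": [{"name": "body", "children": [{"name": "p"}]}]}

print([n["name"] for n in walk(tree)])  # ['html', 'body', 'p']
print(isinstance(list(walk_broken(tree))[1], types.GeneratorType))  # True
```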
2019-06-17 | cli: Log exit status | Lars-Dominik Braun | 1 | -0/+1
2019-05-30 | controller: Fix -recursive stats | Lars-Dominik Braun | 1 | -2/+5
Stats previously included running jobs. Remove them.
2019-05-30 | controller: Correctly re-raise exceptions | Lars-Dominik Braun | 1 | -1/+2
asyncio.gather returns the tasks’ results or exceptions, not task objects. Probably a copy-and-paste error.
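The gather() behavior described above can be shown with return_exceptions=True, which the commit's wording suggests but does not state: the returned list holds each awaitable's result or the exception it raised, so re-raising means inspecting those values, not task objects. The coroutine names are hypothetical.

```python
import asyncio


async def ok() -> int:
    return 1


async def fail() -> int:
    raise RuntimeError("job failed")


async def run_all(*coros):
    # With return_exceptions=True, gather() returns a list of results and/or
    # exception instances, in the order the awaitables were passed in.
    results = await asyncio.gather(*coros, return_exceptions=True)
    for result in results:
        if isinstance(result, BaseException):
            raise result
    return results


print(asyncio.run(run_all(ok(), ok())))  # [1, 1]
```

Without return_exceptions=True, gather() propagates the first exception directly, and no manual re-raise is needed.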
2019-05-30 | controller: Fix DepthLimit | Lars-Dominik Braun | 2 | -12/+45
The policy itself must be stateless, since there can be multiple ExtractLinks events (which would cause DepthLimit to reduce its depth every time).
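One way to keep a depth policy stateless is to carry the depth on each queued URL and make the policy a pure function of the parent and its links, so repeated ExtractLinks events for the same page always yield the same answer. `QueuedURL` and this `DepthLimit` are hypothetical shapes for illustration, not crocoite's classes.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class QueuedURL:
    """Hypothetical queue entry: a URL tagged with the depth it was found at."""
    url: str
    depth: int


@dataclass(frozen=True)
class DepthLimit:
    """Stateless policy: the result depends only on the arguments."""
    maxdepth: int

    def __call__(self, parent: QueuedURL, links: Iterable[str]) -> List[QueuedURL]:
        # Links sit one level below their parent; drop them past the limit.
        depth = parent.depth + 1
        if depth > self.maxdepth:
            return []
        return [QueuedURL(url, depth) for url in links]


limit = DepthLimit(maxdepth=1)
root = QueuedURL("https://example.com/", 0)
print(limit(root, ["https://example.com/a"]))
```

Because nothing is mutated, calling `limit(root, ...)` once per ExtractLinks event is harmless.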
2019-05-26 | behavior: Add clicking for vimeo.com | Lars-Dominik Braun | 1 | -0/+11
2019-05-24 | dashboard: Remove delete button | Lars-Dominik Braun | 2 | -16/+3
There’s really no point in having it.
2019-05-24 | dashboard: Add global bot stats | Lars-Dominik Braun | 2 | -2/+18
2019-05-22 | behavior: Extract links from plain-text documents | Lars-Dominik Braun | 1 | -0/+13
2019-05-13 | devtools: Try to delete temp Chrome data dir – hard | Lars-Dominik Braun | 1 | -1/+11
Fixes #17.
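A plausible reason deletion can fail is that Chrome is still writing into its data directory while it is being removed, which surfaces as an OSError such as "Directory not empty". A minimal retry sketch with shutil.rmtree; the helper name, attempt count and delay are assumptions, not what the commit actually does.

```python
import shutil
import time


def remove_tree_hard(path: str, attempts: int = 5, delay: float = 0.1) -> None:
    """Hypothetical helper: delete `path`, retrying on transient OS errors."""
    for attempt in range(attempts):
        try:
            shutil.rmtree(path)
            return
        except FileNotFoundError:
            return  # already gone, nothing to do
        except OSError:
            # e.g. "Directory not empty" if files appear while deleting.
            if attempt == attempts - 1:
                raise
            time.sleep(delay)
```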
2019-05-12 | behavior: Ignore invalid URLs when extracting links | Lars-Dominik Braun | 2 | -2/+18
Fixes #18.
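Skipping invalid URLs during link extraction can be sketched with the standard library's urllib.parse as a stand-in for whatever URL parser crocoite actually uses. urlsplit raises ValueError for some malformed input (for example an unclosed IPv6 bracket), and the sketch additionally keeps only absolute http(s) links, which is an assumption about what "valid" means here.

```python
from typing import Iterable, List
from urllib.parse import urlsplit


def valid_links(candidates: Iterable[str]) -> List[str]:
    """Keep parseable absolute http(s) URLs, silently skipping the rest."""
    keep = []
    for candidate in candidates:
        try:
            parts = urlsplit(candidate)
        except ValueError:
            continue  # e.g. "http://[::1" raises "Invalid IPv6 URL"
        if parts.scheme in ("http", "https") and parts.netloc:
            keep.append(candidate)
    return keep


print(valid_links(["https://example.com/a", "http://[::1", "mailto:x@example.com"]))
# ['https://example.com/a']
```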
2019-05-05 | irc: Switch job IDs to proquints | Lars-Dominik Braun | 1 | -4/+41
They’re easier for humans to read and remember. Also, we don’t really need 128 bits of randomness; time-based IDs are fine here.
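Proquints ("PRO-nounceable QUINT-uplets") encode each 16 bits as consonant-vowel-consonant-vowel-consonant, using 16 consonants (4 bits each) and 4 vowels (2 bits each). The encoder below follows that scheme and reproduces the proposal's example of 127.0.0.1 as "lusab-babad". The `job_id` helper, which encodes the current Unix time as 32 bits, is a hypothetical example of a time-based ID; the commit does not say how crocoite derives its IDs.

```python
import time

CONSONANTS = "bdfghjklmnprstvz"  # 16 consonants, 4 bits each
VOWELS = "aiou"                  # 4 vowels, 2 bits each


def proquint16(word: int) -> str:
    """Encode one 16-bit value as consonant-vowel-consonant-vowel-consonant."""
    return (CONSONANTS[(word >> 12) & 0xF] + VOWELS[(word >> 10) & 0x3]
            + CONSONANTS[(word >> 6) & 0xF] + VOWELS[(word >> 4) & 0x3]
            + CONSONANTS[word & 0xF])


def proquint(value: int, bits: int = 32) -> str:
    """Encode an unsigned integer as dash-separated 16-bit proquint groups."""
    groups = [proquint16((value >> shift) & 0xFFFF)
              for shift in range(bits - 16, -1, -16)]
    return "-".join(groups)


def job_id() -> str:
    """Hypothetical time-based job ID: current Unix time as a 32-bit proquint."""
    return proquint(int(time.time()) & 0xFFFFFFFF)


print(proquint(0x7F000001))  # lusab-babad  (127.0.0.1)
```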
2019-05-05 | irc: Add job info to warcinfo record | Lars-Dominik Braun | 2 | -6/+22
2019-05-05 | cli: Allow adding extra data to warcinfo record | Lars-Dominik Braun | 2 | -4/+12
2019-05-04 | behavior: Add clicking for imgur.com | Lars-Dominik Braun | 1 | -0/+12
2019-05-02 | behavior: Load more content on steamcommunity.com | Lars-Dominik Braun | 1 | -1/+7
2019-03-22 | Move documentation to Sphinx | Lars-Dominik Braun | 9 | -215/+433
2019-03-22 | behavior: Test DomSnapshot | Lars-Dominik Braun | 1 | -1/+27
2019-03-21 | behavior: Test Screenshot | Lars-Dominik Braun | 2 | -16/+61
2019-03-21 | behavior: Test crash | Lars-Dominik Braun | 1 | -13/+36
2019-03-21 | setup.py: Require Python >=3.6 | Lars-Dominik Braun | 1 | -1/+2
2019-03-20 | behavior: Fix Reddit selectors | Lars-Dominik Braun | 1 | -3/+11
2019-03-16 | browser: Raise exception if navigation failed | Lars-Dominik Braun | 3 | -8/+12
Stop early if there’s nothing to do.
2019-03-16 | Add more debug messages | Lars-Dominik Braun | 3 | -2/+23
…to controller and behavior
2019-03-16 | browser: Use different UUID for loadingFinished/Failed | Lars-Dominik Braun | 1 | -1/+1
2019-03-08 | Use yaml.safe_load_all | Lars-Dominik Braun | 2 | -2/+2
Calling load_all without an explicit Loader is deprecated, and a safe YAML subset is fine for our purpose. See https://msg.pyyaml.org/load