path: root/crocoite
Age | Commit message | Author | Files | Lines
2019-12-30 | behavior: Additional selector for Twitter [HEAD, master] | Lars-Dominik Braun | 1 | -0/+3
Fixes #23. Thanks to SootBectr.
2019-12-29 | behavior: Document RAM usage of screenshot plugin | Lars-Dominik Braun | 1 | -0/+2
2019-12-29 | cli: Ignore future cancellation on the top level | Lars-Dominik Braun | 1 | -2/+6
2019-12-29 | controller: Include dest template in temp file name | Lars-Dominik Braun | 1 | -1/+2
Makes it easier to figure out which temporary file belongs to which job for the IRC bot.
2019-12-29 | behavior: Replace broken test link | Lars-Dominik Braun | 1 | -1/+1
2019-12-29 | behavior: Fix test failure | Lars-Dominik Braun | 1 | -0/+1
yarl before version 1.4 fails to parse this (invalid) URL. Now it simply accepts it. Pin yarl version to avoid random breakage in the future.
2019-10-19 | devtools: Fix load testcase | Lars-Dominik Braun | 1 | -18/+35
Handle new *ExtraInfo events, but do not use them in browser yet, since they’re still marked experimental.
2019-10-18 | click: Fix click selectors | Lars-Dominik Braun | 1 | -2/+2
YouTube and Vimeo.
2019-10-13 | browser: Work around missing responseReceived events | Lars-Dominik Braun | 1 | -0/+7
Looks like Chrome extensively reuses request ids now. Sucks, since we relied on their uniqueness. For now ignore requests without a dedicated responseReceived event. See issue #24.
2019-10-13 | extract-links: Do not depend on document.body | Lars-Dominik Braun | 1 | -1/+1
Fixes #25. Root frame does not actually display a page. Can’t reproduce this issue with a simple test case unfortunately.
2019-10-13 | devtools: Remove explicit loop parameter | Lars-Dominik Braun | 1 | -5/+4
aiohttp removed it with release 4.0.0a1: https://github.com/aio-libs/aiohttp/commit/c8dbe758e2cfa4304cab9a1b056031aba92e4f02 and we weren’t using it anyway.
2019-07-29 | doc: Auto-generate list of supported click selectors | Lars-Dominik Braun | 1 | -20/+22
Using Sphinx plugin. Also improve click selector descriptions for this purpose.
2019-07-28 | behavior: Update click selectors | Lars-Dominik Braun | 1 | -9/+3
2019-07-28 | behavior: Increase idle timeout for click testing | Lars-Dominik Braun | 1 | -1/+3
2019-07-28 | Fix wrong Content-Type header parameter | Lars-Dominik Braun | 4 | -23/+119
In line with HTTP, the “encoding” parameter should be called “charset”. Fixable errata item created. Fixes issue #19.
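To illustrate why the parameter name matters, here is a minimal sketch using only the Python standard library. It is not crocoite's code; the helper name `charset_of` and the fallback value are assumptions made for this example. A standard parser looks for `charset` and does not see a parameter named `encoding`.

```python
from email.message import Message


def charset_of(content_type: str, default: str = "utf-8") -> str:
    """Return the charset parameter of a Content-Type value (hypothetical helper)."""
    msg = Message()
    msg["Content-Type"] = content_type
    # get_content_charset() only looks at the "charset" parameter and
    # returns it lowercased; anything else falls back to `default`.
    return msg.get_content_charset(failobj=default)


if __name__ == "__main__":
    print(charset_of("text/html; charset=ISO-8859-1"))
    # A non-standard "encoding" parameter is not recognized as a charset.
    print(charset_of("text/html; encoding=utf-16"))
```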
2019-07-25 | behavior: Ignore failed onload script injection | Lars-Dominik Braun | 1 | -10/+21
Will be re-injected by controller anyway.
2019-07-13 | Cookie injection support | Lars-Dominik Braun | 6 | -16/+138
Add command-line options injecting individual cookies or cookie file into Chrome. Provide default cookie file. This changes the IRC bot’s command splitting to shlex.split, which allows shell-like argument quoting. Fixes #7.
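The difference between plain whitespace splitting and `shlex.split` can be sketched as follows. The bot command and the `--cookie` flag shown here are hypothetical and only serve to demonstrate quoting; the log does not show the actual option names.

```python
import shlex

# Hypothetical bot command; the flag name is an assumption for illustration.
line = 'a --cookie "session=abc def" http://example.com'

naive = line.split()        # breaks the quoted argument apart
quoted = shlex.split(line)  # honours shell-like quoting

if __name__ == "__main__":
    print(naive)
    print(quoted)
```

With `shlex.split` the quoted cookie value stays a single argument, which is what makes values containing spaces usable in a chat command.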
2019-07-11 | devtools: Add more crash error handling | Lars-Dominik Braun | 1 | -6/+23
In case the whole browser crashes (rare) we will neither be able to close the tab on __aexit__, nor send SIGTERM to it. Make sure we still terminate gracefully.
2019-07-06 | controller: Add missing import | Lars-Dominik Braun | 1 | -1/+1
2019-07-04 | dashboard: Ignore invalid json input | Lars-Dominik Braun | 1 | -1/+5
We should be able to recover from this.
2019-07-04 | behavior: Update click selector list | Lars-Dominik Braun | 1 | -18/+3
Remove instagram, no stable CSS names. Update gab.
2019-07-04 | devtools: Prefix temp directories | Lars-Dominik Braun | 1 | -1/+1
2019-07-04 | Rename cli utils | Lars-Dominik Braun | 3 | -90/+102
crocoite-recursive is now just crocoite, crocoite-grab is not user-facing any more and called crocoite-single. In preparation for 1.0 release.
2019-07-03 | irc: Do not respond when not addressed directly | Lars-Dominik Braun | 1 | -1/+1
This fixes annoying messages when using the bot’s nick as the first word of a message, e.g. “chromebot can do that”.
2019-07-02 | behavior: Add missing uuid’s to logging call | Lars-Dominik Braun | 1 | -2/+5
2019-07-02 | Fix exit status logging | Lars-Dominik Braun | 1 | -1/+1
Fixes commit 158f55eb7fb24fa26727a008ad44964390171060. Logger works only if WARC is still open.
2019-07-02 | Stabilize WARC headers | Lars-Dominik Braun | 6 | -46/+73
In preparation for 1.0 release:
- Correct mime types
- Add X-Crocoite-Type, so logs, scripts, dom-snapshots and screenshots can be identified easily
- Remove random WARC headers like X-Chrome-Initiator. We don’t want to maintain those.
- Remove non-standard urn-based package URLs. Can’t use them without a urn-registration
2019-06-28 | tools: Add missing \n to JSON output | Lars-Dominik Braun | 1 | -0/+1
Fixes 76811bd3f0b3fc8688939e31fdab2c71c89cc75b
2019-06-27 | extract-screenshot: Allow extracting only the first screenshot | Lars-Dominik Braun | 1 | -1/+6
2019-06-27 | merge: Dump machine-readable info | Lars-Dominik Braun | 1 | -2/+18
2019-06-26 | Allow turning off cert validation | Lars-Dominik Braun | 3 | -11/+37
Add --insecure switch (shamelessly stolen from curl) to both -grab and -irc.
2019-06-26 | behavior: screenshot: Extend viewport for fixed elements | Lars-Dominik Braun | 2 | -11/+57
Fixes #14, but needs a test case.
2019-06-18 | behavior: Fix screenshots | Lars-Dominik Braun | 1 | -4/+16
Chrome’s behavior wrt screenshots changed in some version, so now artificially extending the viewport via device metrics is required.
2019-06-18 | Re-inject behavior scripts on site reload | Lars-Dominik Braun | 7 | -52/+114
Fixes #13. Event handler’s push() is async now.
2019-06-18 | Fix idle state tracking race condition | Lars-Dominik Braun | 4 | -93/+121
Closes #16. Expose SiteLoader’s page idle changes through events and move state tracking into controller event handler. Relies on tracking time instead of asyncio event, which is more reliable.
2019-06-17 | devtools: Fix testcase | Lars-Dominik Braun | 1 | -3/+18
The body is only available after receiving the loadingFinished event.
2019-06-17 | html: Fix CDATA walking | Lars-Dominik Braun | 2 | -5/+42
Missing “from” keyword, returned generator instead of dicts. Properly recreate CDATA elements now.
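The class of bug described here (a recursive generator that uses `yield` where `yield from` was intended) can be reproduced with a small sketch. The node layout and the function names `walk_buggy` and `walk` are assumptions for illustration, not crocoite's actual HTML walker.

```python
import inspect


def walk_buggy(node):
    """Depth-first walk with the bug: the recursive call lacks `from`."""
    yield node
    for child in node.get("children", []):
        # Yields the generator object itself instead of the child dicts.
        yield walk_buggy(child)


def walk(node):
    """Corrected depth-first walk that delegates to the sub-generator."""
    yield node
    for child in node.get("children", []):
        yield from walk(child)


# Hypothetical tree of dict nodes.
tree = {"name": "a", "children": [{"name": "b", "children": [{"name": "c"}]}]}

if __name__ == "__main__":
    print([type(item).__name__ for item in walk_buggy(tree)])
    print([item["name"] for item in walk(tree)])
```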
2019-06-17 | cli: Log exit status | Lars-Dominik Braun | 1 | -0/+1
2019-05-30 | controller: Fix -recursive stats | Lars-Dominik Braun | 1 | -2/+5
Stats previously included running jobs. Remove them.
2019-05-30 | controller: Correctly re-raise exceptions | Lars-Dominik Braun | 1 | -1/+2
asyncio.gather returns the task’s results or exception, not task objects. Probably a copy&paste error.
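A small sketch of the behaviour referred to above, assuming `return_exceptions=True` is in use (the log does not show the actual call). The coroutine names and `reraise_first` are made up for this example. The list returned by `asyncio.gather` holds plain results and exception instances, so re-raising means checking the items themselves rather than calling task methods on them.

```python
import asyncio


async def ok():
    return 42


async def fail():
    raise ValueError("boom")


async def collect():
    # With return_exceptions=True, exceptions are returned in place of results.
    return await asyncio.gather(ok(), fail(), return_exceptions=True)


def reraise_first(results):
    """Re-raise the first exception found among gathered results."""
    for item in results:
        if isinstance(item, Exception):
            raise item
    return results


if __name__ == "__main__":
    results = asyncio.run(collect())
    print(results)
    try:
        reraise_first(results)
    except ValueError as exc:
        print("re-raised:", exc)
```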
2019-05-30 | controller: Fix DepthLimit | Lars-Dominik Braun | 2 | -12/+45
The policy itself must be stateless, since there can be multiple ExtractLinks events (which would cause DepthLimit to reduce its depth every time).
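One way to picture a stateless depth policy is sketched below. This is not crocoite's DepthLimit; the class name, the `filter` method and its signature are assumptions for illustration. The depth travels with each call instead of being decremented inside the policy, so handling several link-extraction events for the same page gives the same answer every time.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DepthLimitSketch:
    """Hypothetical stateless policy: follow links while depth < maxdepth."""

    maxdepth: int

    def filter(self, links, depth):
        """Return the links to follow for a page that was found at `depth`."""
        if depth >= self.maxdepth:
            return set()
        return set(links)


if __name__ == "__main__":
    policy = DepthLimitSketch(maxdepth=1)
    # Two events for the same page yield identical decisions.
    print(policy.filter({"http://example.com/a"}, depth=0))
    print(policy.filter({"http://example.com/a"}, depth=0))
    print(policy.filter({"http://example.com/b"}, depth=1))
```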
2019-05-26 | behavior: Add clicking for vimeo.com | Lars-Dominik Braun | 1 | -0/+11
2019-05-22 | behavior: Extract links from plain-text documents | Lars-Dominik Braun | 1 | -0/+13
2019-05-13 | devtools: Try to delete temp Chrome data dir – hard | Lars-Dominik Braun | 1 | -1/+11
Fixes #17.
2019-05-12 | behavior: Ignore invalid URLs when extracting links | Lars-Dominik Braun | 2 | -2/+18
Fixes #18.
2019-05-05 | irc: Switch job id’s to proquints | Lars-Dominik Braun | 1 | -4/+41
They’re easier to read and remember for humans. Plus we don’t really need 128 bits of randomness. Time-based id’s are fine here.
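For readers unfamiliar with proquints, the encoding maps each 16-bit word to a pronounceable five-letter syllable group (consonant, vowel, consonant, vowel, consonant). The sketch below follows the published proquint scheme; the function names and the choice of a 32-bit Unix timestamp as input are assumptions, since the log does not show how the bot derives its ids.

```python
import time

CONSONANTS = "bdfghjklmnprstvz"  # 16 symbols, 4 bits each
VOWELS = "aiou"                  # 4 symbols, 2 bits each


def quint(word16: int) -> str:
    """Encode one 16-bit word as consonant-vowel-consonant-vowel-consonant."""
    return (
        CONSONANTS[(word16 >> 12) & 0xF]
        + VOWELS[(word16 >> 10) & 0x3]
        + CONSONANTS[(word16 >> 6) & 0xF]
        + VOWELS[(word16 >> 4) & 0x3]
        + CONSONANTS[word16 & 0xF]
    )


def to_proquint(value: int, bits: int = 32) -> str:
    """Encode `value` (most significant word first) as dash-separated quints."""
    words = [(value >> shift) & 0xFFFF for shift in range(bits - 16, -1, -16)]
    return "-".join(quint(w) for w in words)


if __name__ == "__main__":
    # 0x7F000001 is the worked example from the proquint proposal (127.0.0.1).
    print(to_proquint(0x7F000001))
    # Assumed id source: the low 32 bits of the current Unix time.
    print(to_proquint(int(time.time()) & 0xFFFFFFFF))
```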
2019-05-05 | irc: Add job info to warcinfo record | Lars-Dominik Braun | 2 | -6/+22
2019-05-05 | cli: Allow adding extra data to warcinfo record | Lars-Dominik Braun | 2 | -4/+12
2019-05-04 | behavior: Add clicking for imgur.com | Lars-Dominik Braun | 1 | -0/+12
2019-05-02 | behavior: Load more content on steamcommunity.com | Lars-Dominik Braun | 1 | -1/+7