path: root/crocoite/behavior.py
Age | Commit message | Author | Files | Lines
2019-12-29 | behavior: Document RAM usage of screenshot plugin | Lars-Dominik Braun | 1 | -0/+2
2019-07-25 | behavior: Ignore failed onload script injection | Lars-Dominik Braun | 1 | -10/+21
Will be re-injected by controller anyway.
2019-07-02 | behavior: Add missing uuid’s to logging call | Lars-Dominik Braun | 1 | -2/+5
2019-07-02 | Stabilize WARC headers | Lars-Dominik Braun | 1 | -2/+8
In preparation for 1.0 release:
- Correct mime types
- Add X-Crocoite-Type, so logs, scripts, dom-snapshots and screenshots can be identified easily
- Remove random WARC headers like X-Chrome-Initiator. We don’t want to maintain those.
- Remove non-standard urn-based package URLs. Can’t use them without a urn-registration
2019-06-26 | behavior: screenshot: Extend viewport for fixed elements | Lars-Dominik Braun | 1 | -11/+37
Fixes #14, but needs a test case.
2019-06-18 | behavior: Fix screenshots | Lars-Dominik Braun | 1 | -4/+16
Chrome’s behavior with respect to screenshots changed in some version, so now artificially extending the viewport via device metrics is required.
2019-05-12 | behavior: Ignore invalid URLs when extracting links | Lars-Dominik Braun | 1 | -1/+8
Fixes #18.
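As a rough illustration of what skipping invalid URLs during link extraction can look like, here is a minimal sketch. It is not the code from this commit: it uses the standard library's `urllib.parse` instead of the project's URL handling, and the function name `valid_links` is hypothetical.

```python
from urllib.parse import urlsplit


def valid_links(candidates):
    """Return only the candidates that parse as absolute http(s) URLs.

    Hypothetical helper for illustration; not part of crocoite.
    """
    result = []
    for raw in candidates:
        try:
            parts = urlsplit(raw)
        except ValueError:
            # urlsplit raises ValueError for malformed input,
            # e.g. an unterminated IPv6 host such as "http://[::1"
            continue
        if parts.scheme in ("http", "https") and parts.netloc:
            result.append(raw)
    return result
```

Dropping a bad link instead of raising keeps one malformed `href` from aborting the whole extraction step.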
2019-03-21 | behavior: Test Screenshot | Lars-Dominik Braun | 1 | -6/+11
2019-03-16 | Add more debug messages | Lars-Dominik Braun | 1 | -0/+4
…to controller and behavior
2019-03-08 | Use yaml.safe_load_all | Lars-Dominik Braun | 1 | -1/+1
load_all is deprecated. A safe YAML subset is fine for our purpose. See https://msg.pyyaml.org/load
2019-01-07 | Log Chrome’s responses to WARC by default | Lars-Dominik Braun | 1 | -2/+7
We may not be able to reproduce every failure, so logging as much as possible is important to figure out what went wrong. Also, in case a bug is uncovered in the future, we can check the logs and possibly fix it with -errata.
2019-01-04 | behavior: Ignore onstop() failure | Lars-Dominik Braun | 1 | -4/+14
Fails if the page is reloaded/redirected. See issue #13.
2019-01-04 | coverage: Ignore a few unreachable statements | Lars-Dominik Braun | 1 | -6/+6
2018-12-25 | warc: Add tests | Lars-Dominik Braun | 1 | -0/+3
Using hypothesis-based test case generation. This is quite nice compared to manual test data generation, since it catches a lot more corner cases (if done right). This commit also fixes a few issues, including:
- log records will only be written if the log is nonempty
- properly quote packageUrl paths
- drop old thread checking code
- use placeholder URL for scripts without name
2018-12-24 | Use f-strings where possible | Lars-Dominik Braun | 1 | -5/+5
Replaces str.format, which is less readable due to its separation of format and arguments.
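The readability argument can be seen in a small side-by-side sketch. The variable names and message text are made up for illustration and do not come from the commit.

```python
name = "scroll"  # illustrative values, not taken from crocoite
count = 3

# str.format: placeholders and arguments are separated
old = "{} ran {} times".format(name, count)

# f-string: each value appears where it is used
new = f"{name} ran {count} times"
```

Both expressions build the same string; the f-string just keeps each value next to its position in the text.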
2018-12-21 | Parse URLs by default | Lars-Dominik Braun | 1 | -22/+10
Use library yarl (already pulled in by aiohttp). No URL processed should be a string.
2018-12-08 | behavior: Dump script options to file as well | Lars-Dominik Braun | 1 | -3/+5
click.js’s data was part of the script before 22adde79940d32c5f094f26f3e18b7160e7ccafc. Now it is injected dynamically, but it still would be nice to have the data available.
2018-12-02 | behavior: Add more documentation | Lars-Dominik Braun | 1 | -2/+14
2018-12-02 | behavior: Remove outdated comment | Lars-Dominik Braun | 1 | -3/+0
2018-12-02 | behavior: Re-enable clearDeviceMetricsOverride | Lars-Dominik Braun | 1 | -4/+1
Seems to be working again. Chrome bug?
2018-12-02 | behavior: Remove unused slots | Lars-Dominik Braun | 1 | -2/+0
2018-12-01 | util: Remove unused function | Lars-Dominik Braun | 1 | -1/+1
2018-12-01 | behavior: Move click script data to external file | Lars-Dominik Braun | 1 | -2/+25
First step of issue #3
2018-11-25 | behavior: Turn scroll JS code into class | Lars-Dominik Braun | 1 | -1/+2
2018-11-24 | behavior: Fix scrolling | Lars-Dominik Braun | 1 | -21/+13
- Introduce stop() method callable from Python. Looks like the old method (global variable) was not working (any more?). This is much better anyway.
- Restore state of scrolled elements (not window). Fixes weird screenshots of twitter.com.
2018-11-06 | Switch single mode to asyncio | Lars-Dominik Braun | 1 | -41/+48
This is a direct port to asyncio without any design changes. These need to happen in further refinements. Fixes issue #1.
2018-10-22 | behavior: Unload script only if the handle is valid | Lars-Dominik Braun | 1 | -2/+4
For some reason with Google Chrome 70 this is not the case any more.
2018-08-04 | Reintroduce WARC logging | Lars-Dominik Braun | 1 | -17/+17
Commit 7730e0d64ec895091a0dd7eb0e3c6ce2ed02d981 removed logging to WARC files. Add it again, but with a different implementation. Credits to structlog for inspiration.
2018-06-25 | warc: Save DOM-/image screenshot as WARC conversion | Lars-Dominik Braun | 1 | -5/+12
Judging from the docs this is the proper way to store these resources. Enable both for the IRC bot by default, since they won’t interfere with IA’s wayback machine.
2018-06-21 | Fix a few issues pointed out by pylint | Lars-Dominik Braun | 1 | -4/+1
2018-06-20 | Add __slots__ to classes | Lars-Dominik Braun | 1 | -1/+19
This is mainly a quality of life change
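For readers unfamiliar with `__slots__`, a minimal sketch of its effect follows. The class `Behavior` and its attributes are hypothetical stand-ins, not the actual classes touched by this commit.

```python
class Behavior:
    """Hypothetical class; the attribute names are assumptions."""

    # Only these attribute names may be assigned on instances.
    __slots__ = ("name", "url")

    def __init__(self, name, url):
        self.name = name
        self.url = url


b = Behavior("scroll", "http://example.com/")
caught = False
try:
    b.nmae = "typo"  # misspelled attribute name
except AttributeError:
    # Without __slots__ this assignment would silently succeed.
    caught = True
```

Besides catching misspelled attributes, slotted instances carry no per-instance `__dict__`, which can reduce memory use slightly.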
2018-06-20 | Synchronous SiteLoader event handling | Lars-Dominik Braun | 1 | -44/+80
Previously a browser crash stalled the entire grab, since events from pychrome were handled asynchronously in a different thread and exceptions were not propagated to the main thread. Now all browser events are stored in a queue and processed by the main thread, allowing us to handle browser crashes gracefully (more or less). This made the following additional changes necessary:
- Clear separation between producer (browser) and consumer (WARC, stats, …)
- Behavior scripts now yield events as well, instead of accessing the WARC writer
- WARC logging was removed (for now) and WARC writer does not require serialization any more
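The queue-based design described above can be sketched with the standard library alone. This is an illustration of the general producer/consumer pattern, not the commit's code: the function names, the event tuples, and the use of `None` to simulate a crash are all assumptions.

```python
import queue
import threading


def producer(events, items):
    """Stand-in for the browser thread; puts events onto the queue."""
    try:
        for item in items:
            if item is None:
                # Simulated crash (assumption for this sketch)
                raise RuntimeError("browser crashed")
            events.put(("event", item))
    except Exception as exc:
        # Forward the failure instead of losing it in the worker thread
        events.put(("error", exc))
    finally:
        events.put(("done", None))


def consume(items):
    """Main thread: process every event, including forwarded errors."""
    events = queue.Queue()
    worker = threading.Thread(target=producer, args=(events, items))
    worker.start()
    handled, error = [], None
    while True:
        kind, payload = events.get()
        if kind == "done":
            break
        if kind == "error":
            error = payload
        else:
            handled.append(payload)
    worker.join()
    return handled, error
```

Because the exception travels through the same queue as ordinary events, the main thread sees the crash in order and can decide how to shut down instead of stalling.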
2018-06-03 | behavior: Wrap extract links script in anonymous namespace | Lars-Dominik Braun | 1 | -1/+2
Otherwise it may clash with symbols defined by the page.
2018-05-04 | behavior: Add link extraction script | Lars-Dominik Braun | 1 | -2/+20
2018-04-20 | Save screenshot of entire page | Lars-Dominik Braun | 1 | -6/+16
…and not just the current viewport. Due to limitations within Chrome it may be necessary to manually stitch multiple images if the page height exceeds 16k pixels.
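To make the stitching remark concrete, the vertical ranges of the individual captures could be computed as in the sketch below. The function name is hypothetical, and reading "16k" as 16384 pixels is an assumption; the commit message does not state the exact limit.

```python
def tile_ranges(page_height, max_height=16384):
    """Split a page into (top, bottom) pixel ranges of at most max_height.

    max_height=16384 is an assumed reading of the "16k" limit.
    """
    if page_height < 0 or max_height <= 0:
        raise ValueError("page_height must be >= 0 and max_height > 0")
    return [
        (top, min(top + max_height, page_height))
        for top in range(0, page_height, max_height)
    ]
```

For a page 40000 pixels tall this yields three ranges, the last one shorter than the others; each range would be captured separately and the images joined afterwards.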
2018-03-05 | Add generic click behavior script | Lars-Dominik Braun | 1 | -7/+8
Configurable. Clicks elements matching one (or more) CSS selectors once or multiple times. Currently supported: Facebook, Twitter, Disqus (embedded iframe)
2018-03-04 | Remove instagram behavior script | Lars-Dominik Braun | 1 | -6/+1
The “load more” button does not exist any more.
2017-12-24 | Refactor behavior scripts | Lars-Dominik Braun | 1 | -11/+213
No functional changes, just cleanup. Replaces onload and onsnapshot events. Move screen metric emulation, DOM snapshots and screenshots here as well.
2017-12-19 | Select default behavior scripts by site URL | Lars-Dominik Braun | 1 | -0/+41