path: root/crocoite/behavior.py
Age | Commit message | Author | Files | Lines
2018-06-20 | Synchronous SiteLoader event handling | Lars-Dominik Braun | 1 | -44/+80
Previously a browser crash stalled the entire grab, since events from pychrome were handled asynchronously in a different thread and exceptions were not propagated to the main thread. Now all browser events are stored in a queue and processed by the main thread, allowing us to handle browser crashes gracefully (more or less). This made the following additional changes necessary:
- Clear separation between producer (browser) and consumer (WARC, stats, …)
- Behavior scripts now yield events as well, instead of accessing the WARC writer
- WARC logging was removed (for now) and the WARC writer no longer requires serialization
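The queue-based hand-off described above can be illustrated with a minimal, standard-library sketch. It is not crocoite's implementation. `browser_producer` stands in for pychrome's event thread, and `consume`, `BrowserCrashed` and the `(kind, payload)` event tuples are hypothetical names chosen for illustration. The worker thread only enqueues events, including its own failure. The main thread therefore sees the crash and can abort the grab instead of stalling.

```python
import queue
import threading
from typing import Any, Callable, Tuple

Event = Tuple[str, Any]  # hypothetical (kind, payload) event shape


class BrowserCrashed(Exception):
    """Raised in the main thread when the producer reports a browser failure."""


def browser_producer(events: "queue.Queue[Event]") -> None:
    """Stand-in for the browser's event thread: it never processes events
    itself, it only puts them on the queue for the main thread."""
    try:
        for i in range(3):
            events.put(("response", f"https://example.com/{i}"))
        raise RuntimeError("browser crashed")  # simulated crash
    except Exception as exc:
        # Forward the failure as an event instead of letting it vanish
        # inside this worker thread.
        events.put(("error", exc))
    finally:
        events.put(("done", None))


def consume(events: "queue.Queue[Event]", handle: Callable[[Any], None]) -> None:
    """Main-thread consumer (e.g. WARC writer, stats): processes events in
    order and re-raises a reported crash so the caller can react to it."""
    while True:
        kind, payload = events.get()
        if kind == "done":
            return
        if kind == "error":
            raise BrowserCrashed(str(payload)) from payload
        handle(payload)


if __name__ == "__main__":
    q: "queue.Queue[Event]" = queue.Queue()
    worker = threading.Thread(target=browser_producer, args=(q,))
    worker.start()
    try:
        consume(q, lambda url: print("archived", url))
    except BrowserCrashed as exc:
        print("grab aborted:", exc)
    worker.join()
```

Because the consumer runs in the main thread, an exception raised there propagates normally. This is the property the commit is after.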
2018-06-03 | behavior: Wrap extract links script in anonymous namespace | Lars-Dominik Braun | 1 | -1/+2
Otherwise it may clash with symbols defined by the page.
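A common way to give an injected script its own namespace is to wrap it in an immediately invoked function expression (IIFE). The minimal sketch below does this on the Python side before the script is sent to the browser, for example via the DevTools `Runtime.evaluate` method. `wrap_in_iife` is a hypothetical helper, and the commit does not say exactly how crocoite performs the wrapping.

```python
def wrap_in_iife(js_source: str) -> str:
    """Wrap JavaScript in an immediately invoked function expression so its
    top-level declarations are local to the function instead of living in
    the page's global scope, where they could clash with the page's symbols."""
    return "(function () {\n" + js_source + "\n})();"


if __name__ == "__main__":
    print(wrap_in_iife("var links = [];"))
```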
2018-05-04 | behavior: Add link extraction script | Lars-Dominik Braun | 1 | -2/+20
2018-04-20 | Save screenshot of entire page | Lars-Dominik Braun | 1 | -6/+16
…and not just the current viewport. Due to limitations within Chrome it may be necessary to manually stitch multiple images if the page height exceeds 16k pixels.
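The minimal sketch below shows how a tall page could be split into strips that are small enough to capture one at a time. It assumes a per-image limit of 16384 pixels, since the commit only says "16k". `screenshot_tiles` and `MAX_TILE_HEIGHT` are hypothetical names. Each `(y, height)` strip could then be captured, for example with the `clip` rectangle of the DevTools `Page.captureScreenshot` command, and the resulting images pasted together vertically.

```python
from typing import List, Tuple

# Assumed per-image height limit; the commit message only says "16k pixels".
MAX_TILE_HEIGHT = 16384


def screenshot_tiles(page_height: int, max_tile: int = MAX_TILE_HEIGHT) -> List[Tuple[int, int]]:
    """Split a page of page_height pixels into (y_offset, height) strips,
    each at most max_tile pixels tall, covering the page top to bottom."""
    tiles: List[Tuple[int, int]] = []
    y = 0
    while y < page_height:
        h = min(max_tile, page_height - y)  # last strip may be shorter
        tiles.append((y, h))
        y += h
    return tiles
```

For a 40000-pixel page this yields `[(0, 16384), (16384, 16384), (32768, 7232)]`. A page that fits within the limit produces a single strip, so no stitching is needed.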
2018-03-05 | Add generic click behavior script | Lars-Dominik Braun | 1 | -7/+8
Configurable. Clicks elements matching one or more CSS selectors, either once or multiple times. Currently supported: Facebook, Twitter, Disqus (embedded iframe).
2018-03-04 | Remove instagram behavior script | Lars-Dominik Braun | 1 | -6/+1
The “load more” button does not exist any more.
2017-12-24 | Refactor behavior scripts | Lars-Dominik Braun | 1 | -11/+213
No functional changes, just cleanup. Replaces the onload and onsnapshot events. Also moves screen metric emulation, DOM snapshots and screenshots here.
2017-12-19 | Select default behavior scripts by site URL | Lars-Dominik Braun | 1 | -0/+41
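Selecting default behavior scripts by site URL could look roughly like the minimal sketch below. A few generic scripts always run, and site-specific ones are added when the URL's host matches a registered domain. The script names, the `GENERIC` and `SITE_SPECIFIC` tables and `default_behavior` are hypothetical; the commit does not show crocoite's actual registry.

```python
from typing import List
from urllib.parse import urlsplit

# Hypothetical registry: names and domains are illustrative only.
GENERIC = ["scroll", "screenshot"]
SITE_SPECIFIC = {
    "facebook.com": ["click-facebook"],
    "twitter.com": ["click-twitter"],
}


def _host_matches(host: str, domain: str) -> bool:
    """True if host is the domain itself or one of its subdomains."""
    return host == domain or host.endswith("." + domain)


def default_behavior(url: str) -> List[str]:
    """Return the generic scripts plus any registered for the URL's host."""
    host = urlsplit(url).hostname or ""  # hostname is already lowercased
    scripts = list(GENERIC)
    for domain, names in SITE_SPECIFIC.items():
        if _host_matches(host, domain):
            scripts.extend(names)
    return scripts
```

Matching on `"." + domain` rather than a plain suffix keeps a host such as `notfacebook.com` from picking up the Facebook script.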