Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2019-05-24 | dashboard: Add global bot stats | Lars-Dominik Braun | 2 | -2/+18 | |
2019-03-08 | irc: Add config option need_voice | Lars-Dominik Braun | 1 | -1/+2 | |
Do not hardcode the privilege required to use the bot; make it configurable. | |||||
2019-01-27 | irc: Add URL blacklist | Lars-Dominik Braun | 1 | -0/+3 | |
2019-01-27 | irc: Switch configuration to JSON | Lars-Dominik Braun | 2 | -10/+12 | |
2018-12-05 | irc: Add example config file | Lars-Dominik Braun | 1 | -0/+10 | |
2018-10-14 | irc: Add PoC dashboard | Lars-Dominik Braun | 3 | -0/+156 | |
Using websockets, vue and bulma. | |||||
2018-08-21 | Remove celery and recursion | Lars-Dominik Braun | 1 | -229/+0 | |
Gonna rewrite that properly. | |||||
2018-06-25 | warc: Save DOM-/image screenshot as WARC conversion | Lars-Dominik Braun | 1 | -2/+1 | |
Judging from the docs this is the proper way to store these resources. Enable both for the IRC bot by default, since they won’t interfere with IA’s wayback machine. | |||||
2018-06-20 | Synchronous SiteLoader event handling | Lars-Dominik Braun | 1 | -5/+4 | |
Previously a browser crash stalled the entire grab, since events from pychrome were handled asynchronously in a different thread and exceptions were not propagated to the main thread. Now all browser events are stored in a queue and processed by the main thread, allowing us to handle browser crashes gracefully (more or less). This made the following additional changes necessary: (1) clear separation between producer (browser) and consumer (WARC, stats, …); (2) behavior scripts now yield events as well, instead of accessing the WARC writer; (3) WARC logging was removed (for now) and the WARC writer does not require serialization any more. | |||||
2018-05-05 | Rename command line tools | Lars-Dominik Braun | 2 | -124/+0 | |
Move contrib/ scripts to .tools and add entry points to setup.py, rename crocoite-standalone to crocoite-grab. | |||||
2018-05-05 | contrib: Add WARC merging script | Lars-Dominik Braun | 1 | -0/+70 | |
Very useful for distributed, recursive crawls which create one WARC per page. | |||||
2018-05-04 | sopel: Use recursive, distributed controller | Lars-Dominik Braun | 1 | -2/+7 | |
2018-05-04 | IRC plugin: Use argparse | Lars-Dominik Braun | 1 | -17/+33 | |
2018-05-04 | Move page archiving logic to SinglePageController | Lars-Dominik Braun | 1 | -15/+12 | |
In preparation for recursive crawls. | |||||
2018-04-20 | Add screenshot extraction script to contrib/ | Lars-Dominik Braun | 1 | -0/+54 | |
2018-02-22 | irc plugin: Serialize celery operations | Lars-Dominik Braun | 1 | -68/+105 | |
This is a workaround for https://github.com/celery/celery/issues/4480 | |||||
2017-12-25 | Increase default body size | Lars-Dominik Braun | 1 | -4/+4 | |
2017-12-24 | Refactor behavior scripts | Lars-Dominik Braun | 1 | -10/+7 | |
No functional changes, just cleanup. Replaces onload and onsnapshot events. Move screen metric emulation, DOM snapshots and screenshots here as well. | |||||
2017-12-22 | Add simple stats-keeping SiteLoader | Lars-Dominik Braun | 1 | -1/+14 | |
2017-12-19 | Select default behavior scripts by site URL | Lars-Dominik Braun | 1 | -2/+24 | |
2017-12-17 | Add distributed archiving | Lars-Dominik Braun | 1 | -0/+144 | |
Using celery. Also adds a plugin for the IRC bot sopel. Code still needs some love, but it should work. |
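
The 2018-06-20 entry ("Synchronous SiteLoader event handling") describes moving browser events into a queue that the main thread drains, so that a browser crash surfaces as an exception in the main thread. The following is only a minimal sketch of that general producer/consumer pattern using the Python standard library. It is not the project's code: `BrowserCrashed`, `producer`, `consume`, `grab` and the `crash_after` parameter are hypothetical names invented for illustration, and pychrome is not used here.

```python
import queue
import threading


class BrowserCrashed(Exception):
    """Hypothetical error type standing in for a dead browser."""


def producer(events, items, crash_after=None):
    """Runs in a background thread; stands in for browser callbacks."""
    for i, item in enumerate(items):
        if crash_after is not None and i == crash_after:
            # Hand the failure to the consumer through the queue instead of
            # raising here, where the main thread would never see it.
            events.put(BrowserCrashed("browser went away"))
            return
        events.put(item)
    events.put(None)  # sentinel: no more events


def consume(events):
    """Runs in the main thread; yields events and re-raises failures."""
    while True:
        event = events.get()
        if event is None:
            return
        if isinstance(event, BrowserCrashed):
            raise event
        yield event


def grab(items, crash_after=None):
    """Drive one simulated grab and report how it ended."""
    events = queue.Queue()
    worker = threading.Thread(target=producer, args=(events, items, crash_after))
    worker.start()
    handled = []
    try:
        for event in consume(events):
            handled.append(event)  # a real consumer might write WARC records
    except BrowserCrashed:
        status = "crashed"
    else:
        status = "ok"
    finally:
        worker.join()
    return status, handled
```

Calling `grab(["a", "b", "c"], crash_after=1)` in this sketch handles the first event and then reports the crash in the calling thread instead of stalling.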