author | Lars-Dominik Braun <lars@6xq.net> | 2017-12-10 12:31:07 +0100 |
---|---|---|
committer | Lars-Dominik Braun <lars@6xq.net> | 2017-12-17 16:40:16 +0100 |
commit | 84c3f69293fa79d752127410c7468038c907c96a | |
tree | 4a71dcddd6abc6eeda30ed40bd78d91518efde38 /README.rst | |
parent | f816319081d5253974ddb70b655d55f4a880a77a | |
Add distributed archiving
Using celery. Also adds a plugin for the IRC bot sopel. Code still needs
some love, but it should work.
Diffstat (limited to 'README.rst')
-rw-r--r-- | README.rst | 38 |
1 files changed, 38 insertions, 0 deletions
@@ -66,3 +66,41 @@ also saved. This causes its own set of issues though:
 
 - JavaScript-based navigation does not work.
 
+Distributed crawling
+--------------------
+
+Configure using celeryconfig.py
+
+.. code:: python
+
+    broker_url = 'pyamqp://'
+    result_backend = 'rpc://'
+    warc_filename = '{domain}-{date}-{id}.warc.gz'
+    temp_dir = '/tmp/'
+    finished_dir = '/tmp/finished'
+
+Start a Celery worker::
+
+    celery -A crocoite.cli worker --loglevel=info
+
+Then queue archive job::
+
+    crocoite-standalone --distributed …
+
+Alternative: IRC bot using sopel_. Use contrib/celerycrocoite.py
+
+~/.sopel/default.cfg
+
+.. code:: ini
+
+    [core]
+    nick = chromebot
+    host = irc.efnet.fr
+    port = 6667
+    owner = someone
+    extra = /path/to/crocoite/contrib
+    enable = celerycrocoite
+    channels = #somechannel
+
+Then in #somechannel ``chromebot: ao <url>``
+
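The `warc_filename` setting added in this hunk is a template with `{domain}`, `{date}` and `{id}` placeholders. The README does not say how crocoite fills them, so the following is only a minimal sketch of how such a template could expand via `str.format`. The helper `expand_warc_filename`, the `%Y%m%d` date format, and the use of a random hex string as the job id are all assumptions made for illustration, not crocoite's actual behaviour.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit
from uuid import uuid4

# Template copied verbatim from the celeryconfig.py example in the README hunk.
warc_filename = '{domain}-{date}-{id}.warc.gz'


def expand_warc_filename(template, url, now=None, job_id=None):
    """Hypothetical helper: fill the filename template for one archive job.

    The date format and the id source are assumptions; crocoite may
    choose different values for these fields.
    """
    now = now or datetime.now(timezone.utc)
    job_id = job_id or uuid4().hex
    domain = urlsplit(url).hostname
    return template.format(
        domain=domain,
        date=now.strftime('%Y%m%d'),
        id=job_id,
    )


if __name__ == '__main__':
    print(expand_warc_filename(warc_filename, 'http://example.com/page'))
```

Under these assumptions every queued job gets a distinct, sortable file name, which matters once several Celery workers write into the same `finished_dir`.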