author: Lars-Dominik Braun <lars@6xq.net> 2017-12-10 12:31:07 +0100
committer: Lars-Dominik Braun <lars@6xq.net> 2017-12-17 16:40:16 +0100
commit: 84c3f69293fa79d752127410c7468038c907c96a
tree: 4a71dcddd6abc6eeda30ed40bd78d91518efde38
parent: f816319081d5253974ddb70b655d55f4a880a77a
Add distributed archiving
Using celery. Also adds a plugin for the IRC bot sopel. Code still needs some love, but it should work.
Diffstat (limited to 'README.rst')
-rw-r--r-- README.rst 38
1 files changed, 38 insertions, 0 deletions
diff --git a/README.rst b/README.rst
index 3a7aa7c..3d5af5f 100644
--- a/README.rst
+++ b/README.rst
@@ -66,3 +66,41 @@ also saved. This causes its own set of issues though:
- JavaScript-based navigation does not work.
+Distributed crawling
+--------------------
+
+Configure the workers in ``celeryconfig.py``:
+
+.. code:: python
+
+    broker_url = 'pyamqp://'
+    result_backend = 'rpc://'
+    warc_filename = '{domain}-{date}-{id}.warc.gz'
+    temp_dir = '/tmp/'
+    finished_dir = '/tmp/finished'
+
+Start a Celery worker::
+
+    celery -A crocoite.cli worker --loglevel=info
+
+Then queue an archive job::
+
+    crocoite-standalone --distributed …
+
+Alternatively, run an IRC bot using sopel_ with the plugin ``contrib/celerycrocoite.py``.
+
+Example ``~/.sopel/default.cfg``:
+
+.. code:: ini
+
+    [core]
+    nick = chromebot
+    host = irc.efnet.fr
+    port = 6667
+    owner = someone
+    extra = /path/to/crocoite/contrib
+    enable = celerycrocoite
+    channels = #somechannel
+
+Then, in #somechannel, send ``chromebot: ao <url>``.
+
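The ``warc_filename`` setting in the ``celeryconfig.py`` example is a format-string template with the fields ``domain``, ``date`` and ``id``. The diff does not show how crocoite fills them in, so the following is only a minimal sketch of how such a template might be expanded. The helper name ``expand_warc_filename``, the timestamp format, and the reading of ``id`` as a job identifier are all assumptions, not part of this commit.

```python
from datetime import datetime, timezone
from urllib.parse import urlsplit

# Template copied from the celeryconfig.py example in the diff.
WARC_FILENAME = '{domain}-{date}-{id}.warc.gz'


def expand_warc_filename(template, url, job_id, now=None):
    """Fill a warc_filename-style template (hypothetical helper).

    Assumptions: ``domain`` is the host part of the archived URL,
    ``date`` is a compact UTC timestamp, ``id`` is a job identifier.
    """
    if now is None:
        now = datetime.now(timezone.utc)
    # Fall back to a placeholder when the URL has no host component.
    domain = urlsplit(url).hostname or 'unknown'
    # Avoid ':' in the timestamp so the name is safe on most filesystems.
    date = now.strftime('%Y%m%dT%H%M%SZ')
    return template.format(domain=domain, date=date, id=job_id)
```

Calling ``expand_warc_filename(WARC_FILENAME, 'http://example.com/page', 'abc123')`` would yield one file name per job, which is one way to keep several workers that share ``finished_dir`` from overwriting each other's output.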