author | Lars-Dominik Braun <lars@6xq.net> | 2018-08-21 11:27:05 +0200
committer | Lars-Dominik Braun <lars@6xq.net> | 2018-08-21 13:19:47 +0200
commit | 53e4df3fe732417988532e5b3d8b4dc7e781a3df
tree | 2ed52af2b575afcb0165e03eebf6d4f4d30f965e /README.rst
parent | 8e5ac24c85ca9388410b2afda9a05fa4a3d9bf92
Remove celery and recursion
Gonna rewrite that properly.
Diffstat (limited to 'README.rst')
-rw-r--r-- | README.rst | 61
1 files changed, 0 insertions, 61 deletions
@@ -17,12 +17,10 @@ The following dependencies must be present to run crocoite:
 - pychrome_
 - warcio_
 - html5lib_
-- Celery_ (optional)
 
 .. _pychrome: https://github.com/fate0/pychrome
 .. _warcio: https://github.com/webrecorder/warcio
 .. _html5lib: https://github.com/html5lib/html5lib-python
-.. _Celery: http://www.celeryproject.org/
 
 It is recommended to prepare a virtualenv and let pip handle the dependency
 resolution for Python packages instead:
@@ -121,65 +119,6 @@ does not work any more. Secondly it also saves a screenshot of the full
 page, so even if future browsers cannot render and display the stored HTML a
 fully rendered version of the website can be replayed instead.
 
-Advanced usage
---------------
-
-crocoite offers more than just a one-shot command-line interface.
-
-Distributed crawling
-^^^^^^^^^^^^^^^^^^^^
-
-Configure using celeryconfig.py
-
-.. code:: python
-
-    broker_url = 'pyamqp://'
-    result_backend = 'rpc://'
-    warc_filename = '{domain}-{date}-{id}.warc.gz'
-    temp_dir = '/tmp/'
-    finished_dir = '/tmp/finished'
-
-Start a Celery worker::
-
-    celery -A crocoite.task worker -Q crocoite.archive,crocoite.controller --loglevel=info
-
-Then queue archive job::
-
-    crocoite-grab --distributed http://example.com
-
-The worker will create a temporary file named according to ``warc_filename`` in
-``/tmp`` while archiving and move it to ``/tmp/finished`` when done.
-
-IRC bot
-^^^^^^^
-
-Configure sopel_ (``~/.sopel/default.cfg``) to use the plugin located in
-``contrib/celerycrocoite.py``
-
-.. code:: ini
-
-    [core]
-    nick = chromebot
-    host = irc.efnet.fr
-    port = 6667
-    owner = someone
-    extra = /path/to/crocoite/contrib
-    enable = celerycrocoite
-    channels = #somechannel
-
-Then start it by running ``sopel``. The bot must be addressed directly (i.e.
-``chromebot: <command>``). The following commands are currently supported:
-
-a <url>
-    Archives <url> and all of its resources (images, css, …). A unique UID
-    (UUID) is assigned to each job.
-s <uuid>
-    Get status of job with <uuid>
-r <uuid>
-    Revoke job with <uuid>. If it started already the job will be killed.
-
-.. _sopel: https://sopel.chat/
-
 Related projects
 ----------------