author    Lars-Dominik Braun <>  2018-09-29 16:51:57 +0200
committer Lars-Dominik Braun <>  2018-09-29 16:51:57 +0200
commit    07c34b2d004f16798c17ed479679a511c6bd2f29 (patch)
tree      d3b0696a23ed155ab5fad067f6ab003166343e77 /README.rst
parent    cbcdde65aa667369b0890a042e5b44d6b1e377aa (diff)
Add documentation
For -recursive and -irc
Diffstat (limited to 'README.rst')
1 file changed, 35 insertions, 0 deletions
diff --git a/README.rst b/README.rst
index b1fce2c..61c5b04 100644
--- a/README.rst
+++ b/README.rst
@@ -17,10 +17,12 @@ The following dependencies must be present to run crocoite:
- pychrome_
- warcio_
- html5lib_
+- bottom_ (IRC client)
.. _pychrome:
.. _warcio:
.. _html5lib:
+.. _bottom:
It is recommended to prepare a virtualenv and let pip handle the dependency
resolution for Python packages instead:
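The invocation itself lies outside this hunk. As a sketch only (the
environment name and install source are assumptions, not taken from the
diff), it could look like:

.. code:: bash

   # Hypothetical commands, not part of this diff: create an isolated
   # environment and let pip install crocoite and its dependencies.
   python3 -m venv crocoite-env
   . crocoite-env/bin/activate
   pip install .   # run from a checkout of the crocoite repository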
@@ -119,6 +121,39 @@ does not work any more. Secondly it also saves a screenshot of the full page,
so even if future browsers cannot render and display the stored HTML a fully
rendered version of the website can be replayed instead.
+Advanced usage
+--------------
+
+crocoite is built with the Unix philosophy (“do one thing and do it well”) in
+mind. Thus ``crocoite-grab`` can only save a single page. If you want recursion,
+use ``crocoite-recursive``, which follows hyperlinks according to ``--policy``.
+It can either recurse a maximum number of levels or grab all pages with the
+same prefix as the start URL:
+
+.. code:: bash
+
+   crocoite-recursive --policy prefix http://example.com/dir/ output
+
+will save all pages in ``/dir/`` and below to individual files in the output
+directory ``output``. You can customize the command used to grab individual
+pages by appending it after ``output``. This way distributed grabs (ssh to a
+different machine and execute the job there, queue the command with Slurm, …)
+are possible.
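As an illustration of such a customized grab command, a sketch (the remote
host name is a placeholder and the exact argument pass-through is an
assumption; consult ``crocoite-recursive --help``):

.. code:: bash

   # Hypothetical: execute each single-page grab on another machine by
   # appending a command after the output directory; "archive-host" is a
   # made-up host name.
   crocoite-recursive --policy prefix http://example.com/dir/ output \
       ssh archive-host crocoite-grab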
+IRC bot
+-------
+
+A simple IRC bot (“chromebot”) is provided with the command ``crocoite-irc``.
+It reads its configuration from a config file (an example is provided in
+``contrib/chromebot.ini``) and supports the following commands:
+
+a <url> -j <concurrency> -r <policy>
+ Archive <url> with <concurrency> processes according to recursion <policy>
+s <uuid>
+ Get job status for <uuid>
+r <uuid>
+ Revoke or abort running job with <uuid>
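A hypothetical exchange with the bot, based only on the commands listed
above (``<uuid>`` stands for the job identifier the bot reports; the bot's
replies are omitted):

.. code:: text

   <user> a http://example.com -j 2 -r prefix
   <user> s <uuid>
   <user> r <uuid>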
Related projects