author: Lars-Dominik Braun <lars@6xq.net>, 2017-12-17 19:52:33 +0100
committer: Lars-Dominik Braun <lars@6xq.net>, 2017-12-17 19:52:33 +0100
commit: 5e56d12e4f2fc37f759f7e5115916adcae8642e7
tree: fcd9eb5fa2f64f8bc0cbd3e6a4a90bc198b98df2
parent: 6879a7e6a7625129d3fbec2db8016eae07196f76
Extend README
-rw-r--r-- README.rst 35
1 files changed, 25 insertions, 10 deletions
diff --git a/README.rst b/README.rst
index 145477f..b2bebf2 100644
--- a/README.rst
+++ b/README.rst
@@ -1,8 +1,9 @@
crocoite
========
-Archive websites using Google Chrome and its DevTools protocol.
-Tested with Google Chrome 62.0.3202.89 for Linux only.
+Archive websites using `headless Google Chrome`_ and its DevTools protocol.
+
+.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
Dependencies
------------
@@ -11,26 +12,24 @@ Dependencies
- pychrome_
- warcio_
- html5lib_
+- Celery_
.. _pychrome: https://github.com/fate0/pychrome
.. _warcio: https://github.com/webrecorder/warcio
.. _html5lib: https://github.com/html5lib/html5lib-python
+.. _Celery: http://www.celeryproject.org/
Usage
-----
One-shot commandline interface and pywb_ playback::
- google-chrome-stable --window-size=1920,1080 --remote-debugging-port=9222 &
crocoite-standalone http://example.com/ example.com.warc.gz
rm -rf collections && wb-manager init test && wb-manager add test example.com.warc.gz
wayback &
$BROWSER http://localhost:8080
-For `headless Google Chrome`_ add the parameters ``--headless --disable-gpu``.
-
.. _pywb: https://github.com/ikreymer/pywb
-.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
Injecting JavaScript
^^^^^^^^^^^^^^^^^^^^
@@ -86,11 +85,16 @@ Start a Celery worker::
Then queue archive job::
- crocoite-standalone --distributed …
+ crocoite-standalone --distributed http://example.com ''
-Alternative: IRC bot using sopel_. Use contrib/celerycrocoite.py
+The worker will create a temporary file named according to ``warc_filename`` in
+``/tmp`` while archiving and move it to ``/tmp/finished`` when done.
-~/.sopel/default.cfg
+IRC bot
+^^^^^^^
+
+Configure sopel_ (``~/.sopel/default.cfg``) to use the plugin located in
+``contrib/celerycrocoite.py``
.. code:: ini
@@ -103,5 +107,16 @@ Alternative: IRC bot using sopel_. Use contrib/celerycrocoite.py
enable = celerycrocoite
channels = #somechannel
-Then in #somechannel ``chromebot: ao <url>``
+Then start it by running ``sopel``. The bot must be addressed directly (i.e.
+``chromebot: <command>``). The following commands are currently supported:
+
+ao <url>
+ Archives <url> and all of its resources (images, css, …). A unique
+ identifier (UUID) is assigned to each job.
+s <uuid>
+ Get status of job with <uuid>
+r <uuid>
+ Revoke job with <uuid>. If it has already started, the job will be killed.
+
+.. _sopel: https://sopel.chat/
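
Note on the bot commands documented in this change: the addressing rule and the three commands (``ao``, ``s``, ``r``) can be illustrated with a small parser. This is only a sketch of the grammar as the README describes it, not the code in ``contrib/celerycrocoite.py``. The names ``BotCommand``, ``parse_message`` and ``COMMANDS`` are hypothetical, and the nick ``chromebot`` is taken from the README example.

```python
from typing import NamedTuple, Optional

# Hypothetical names throughout: COMMANDS, BotCommand and parse_message are
# illustrative only and are not taken from contrib/celerycrocoite.py.
COMMANDS = {"ao": "archive", "s": "status", "r": "revoke"}


class BotCommand(NamedTuple):
    action: str    # "archive", "status" or "revoke"
    argument: str  # the URL for "ao", the job UUID for "s" and "r"


def parse_message(line: str, nick: str = "chromebot") -> Optional[BotCommand]:
    """Return the parsed command, or None if the line is not a recognised,
    directly addressed command."""
    prefix = nick + ":"
    if not line.startswith(prefix):
        return None  # the bot must be addressed directly
    parts = line[len(prefix):].split()
    if len(parts) != 2 or parts[0] not in COMMANDS:
        return None  # unknown command or wrong number of arguments
    return BotCommand(COMMANDS[parts[0]], parts[1])
```

Under these assumptions, ``parse_message("chromebot: ao http://example.com/")`` yields an ``archive`` command carrying the URL, while a message that does not start with the bot's nick is ignored.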