From 5e56d12e4f2fc37f759f7e5115916adcae8642e7 Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun
Date: Sun, 17 Dec 2017 19:52:33 +0100
Subject: Extend README

---
 README.rst | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/README.rst b/README.rst
index 145477f..b2bebf2 100644
--- a/README.rst
+++ b/README.rst
@@ -1,8 +1,9 @@
 crocoite
 ========
 
-Archive websites using Google Chrome and its DevTools protocol.
-Tested with Google Chrome 62.0.3202.89 for Linux only.
+Archive websites using `headless Google Chrome_` and its DevTools protocol.
+
+.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
 
 Dependencies
 ------------
@@ -11,26 +12,24 @@ Dependencies
 - pychrome_
 - warcio_
 - html5lib_
+- Celery_
 
 .. _pychrome: https://github.com/fate0/pychrome
 .. _warcio: https://github.com/webrecorder/warcio
 .. _html5lib: https://github.com/html5lib/html5lib-python
+.. _Celery: http://www.celeryproject.org/
 
 Usage
 -----
 
 One-shot commandline interface and pywb_ playback::
 
-    google-chrome-stable --window-size=1920,1080 --remote-debugging-port=9222 &
     crocoite-standalone http://example.com/ example.com.warc.gz
     rm -rf collections && wb-manager init test && wb-manager add test example.com.warc.gz
     wayback &
     $BROWSER http://localhost:8080
 
-For `headless Google Chrome`_ add the parameters ``--headless --disable-gpu``.
-
 .. _pywb: https://github.com/ikreymer/pywb
-.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
 
 Injecting JavaScript
 ^^^^^^^^^^^^^^^^^^^^
@@ -86,11 +85,16 @@ Start a Celery worker::
 
 Then queue archive job::
 
-    crocoite-standalone --distributed …
+    crocoite-standalone --distributed http://example.com ''
 
-Alternative: IRC bot using sopel_. Use contrib/celerycrocoite.py
+The worker will create a temporary file named according to ``warc_filename`` in
+``/tmp`` while archiving and move it to ``/tmp/finished`` when done.
 
-~/.sopel/default.cfg
+IRC bot
+^^^^^^^
+
+Configure sopel_ (``~/.sopel/default.cfg``) to use the plugin located in
+``contrib/celerycrocoite.py``
 
 .. code:: ini
 
@@ -103,5 +107,16 @@ Alternative: IRC bot using sopel_. Use contrib/celerycrocoite.py
     enable = celerycrocoite
     channels = #somechannel
 
-Then in #somechannel ``chromebot: ao ``
+Then start it by running ``sopel``. The bot must be addressed directly (i.e.
+``chromebot: ``). The following commands are currently supported:
+
+ao
+    Archives and all of its resources (images, css, …). A unique UID
+    (UUID) is assigned to each job.
+s
+    Get status of job with
+r
+    Revoke job with . If it started already the job will be killed.
+
+.. _sopel: https://sopel.chat/
 
-- 
cgit v1.2.3
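
Note on the worker behaviour described in the patch: the README text says the worker writes a file named according to ``warc_filename`` in ``/tmp`` while archiving and moves it to ``/tmp/finished`` when done. The following is a minimal Python sketch of that write-then-move pattern, not crocoite's actual implementation. The function name ``archive_to_finished``, the ``write_warc`` callback, and the ``work_dir`` parameter are hypothetical and exist only for illustration.

```python
import os
import shutil
import tempfile


def archive_to_finished(warc_filename, write_warc, work_dir="/tmp"):
    """Write a file into work_dir, then move it to work_dir/finished.

    Hypothetical helper: write_warc is any callable that accepts a
    binary file object and writes the archive data to it.
    """
    finished_dir = os.path.join(work_dir, "finished")
    os.makedirs(finished_dir, exist_ok=True)

    # While the job runs, the file lives directly in work_dir.
    in_progress = os.path.join(work_dir, warc_filename)
    with open(in_progress, "wb") as fd:
        write_warc(fd)

    # Only a completed file is moved into the finished directory.
    final_path = os.path.join(finished_dir, warc_filename)
    shutil.move(in_progress, final_path)
    return final_path


if __name__ == "__main__":
    # Use a throwaway directory instead of /tmp for the demonstration.
    with tempfile.TemporaryDirectory() as demo_dir:
        path = archive_to_finished(
            "example.com.warc.gz",
            lambda fd: fd.write(b"placeholder data"),
            work_dir=demo_dir,
        )
        print(path)
```

Moving the file only after writing has finished means anything watching the finished directory never sees a partially written archive, which is presumably the motivation for the two-directory layout.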