-rw-r--r-- | README.rst | 35 |
1 files changed, 25 insertions, 10 deletions
@@ -1,8 +1,9 @@
 crocoite
 ========
 
-Archive websites using Google Chrome and its DevTools protocol.
-Tested with Google Chrome 62.0.3202.89 for Linux only.
+Archive websites using `headless Google Chrome_` and its DevTools protocol.
+
+.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
 
 Dependencies
 ------------
@@ -11,26 +12,24 @@ Dependencies
 - pychrome_
 - warcio_
 - html5lib_
+- Celery_
 
 .. _pychrome: https://github.com/fate0/pychrome
 .. _warcio: https://github.com/webrecorder/warcio
 .. _html5lib: https://github.com/html5lib/html5lib-python
+.. _Celery: http://www.celeryproject.org/
 
 Usage
 -----
 
 One-shot commandline interface and pywb_ playback::
 
-    google-chrome-stable --window-size=1920,1080 --remote-debugging-port=9222 &
     crocoite-standalone http://example.com/ example.com.warc.gz
     rm -rf collections && wb-manager init test && wb-manager add test example.com.warc.gz
     wayback &
     $BROWSER http://localhost:8080
 
-For `headless Google Chrome`_ add the parameters ``--headless --disable-gpu``.
-
 .. _pywb: https://github.com/ikreymer/pywb
-.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
 
 Injecting JavaScript
 ^^^^^^^^^^^^^^^^^^^^
@@ -86,11 +85,16 @@ Start a Celery worker::
 
 Then queue archive job::
 
-    crocoite-standalone --distributed …
+    crocoite-standalone --distributed http://example.com ''
 
-Alternative: IRC bot using sopel_. Use contrib/celerycrocoite.py
+The worker will create a temporary file named according to ``warc_filename`` in
+``/tmp`` while archiving and move it to ``/tmp/finished`` when done.
 
-~/.sopel/default.cfg
+IRC bot
+^^^^^^^
+
+Configure sopel_ (``~/.sopel/default.cfg``) to use the plugin located in
+``contrib/celerycrocoite.py``
 
 .. code:: ini
 
@@ -103,5 +107,16 @@ Alternative: IRC bot using sopel_. Use contrib/celerycrocoite.py
     enable = celerycrocoite
     channels = #somechannel
 
-Then in #somechannel ``chromebot: ao <url>``
+Then start it by running ``sopel``. The bot must be addressed directly (i.e.
+``chromebot: <command>``). The following commands are currently supported:
+
+ao <url>
+    Archives <url> and all of its resources (images, css, …). A unique UID
+    (UUID) is assigned to each job.
+s <uuid>
+    Get status of job with <uuid>
+r <uuid>
+    Revoke job with <uuid>. If it started already the job will be killed.
+
+.. _sopel: https://sopel.chat/
 
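
Reviewer note: the README text added in this change says the worker writes a temporary file while archiving and moves it to a ``finished`` directory when done. The sketch below illustrates that general write-then-move pattern only. It is not taken from crocoite's source; the function name ``archive_to_finished`` and its parameters are hypothetical, and the base directory is passed in rather than hard-coded to ``/tmp``.

```python
import os
import shutil
import tempfile


def archive_to_finished(data: bytes, warc_filename: str, base_dir: str) -> str:
    """Write data to a temporary file in base_dir, then move it to base_dir/finished.

    Hypothetical helper for illustration; not part of crocoite.
    """
    finished_dir = os.path.join(base_dir, "finished")
    os.makedirs(finished_dir, exist_ok=True)

    # The file stays in base_dir under a temporary name while it is being written.
    fd, tmp_path = tempfile.mkstemp(prefix=warc_filename + ".", dir=base_dir)
    with os.fdopen(fd, "wb") as f:
        f.write(data)

    # Only a completely written file ever appears in the finished directory.
    final_path = os.path.join(finished_dir, warc_filename)
    shutil.move(tmp_path, final_path)
    return final_path
```

Moving the file only after the write has completed means a consumer watching the ``finished`` directory never sees a partially written archive.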