author Lars-Dominik Braun <lars@6xq.net> 2017-11-17 19:54:30 +0100
committer Lars-Dominik Braun <lars@6xq.net> 2017-11-17 19:54:30 +0100
commit 0b8a8e88a3c33c14e52241190ee6478cb2acd49d
tree 784b8672d65527bf22227de782b9d512acfba2cb /README.rst
Initial import
Diffstat (limited to 'README.rst')
-rw-r--r-- README.rst 40
1 files changed, 40 insertions, 0 deletions
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..7eea272
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,40 @@
+crocoite
+========
+
+Archive websites using Google Chrome and its DevTools protocol.
+Tested with Google Chrome 62.0.3202.89 for Linux only.
+
+Dependencies
+------------
+
+- Python 3
+- pychrome_
+- warcio_
+
+.. _pychrome: https://github.com/fate0/pychrome
+.. _warcio: https://github.com/webrecorder/warcio
+
+Usage
+-----
+
+One-shot command-line interface and pywb_ playback::
+
+ google-chrome-stable --window-size=1920,1080 --remote-debugging-port=9222 &
+ crocoite-standalone http://example.com/ example.com.warc.gz
+ rm -rf collections && wb-manager init test && wb-manager add test example.com.warc.gz
+ wayback &
+ $BROWSER http://localhost:8080
+
+For `headless Google Chrome`_ add the parameters ``--headless --disable-gpu``.
+
+.. _pywb: https://github.com/ikreymer/pywb
+.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
+
+Caveats
+-------
+
+- The original HTTP requests and responses are not available; they are rebuilt
+  from the data that is available. Text documents are re-encoded as UTF-8.
+- Some sites request different assets based on screen resolution, and some
+  fetch different scripts based on the user agent.
+
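Note on the usage example: the Chrome invocation in the README can be assembled programmatically. The following is a minimal sketch and not part of this commit. The helper name `chrome_command` and its parameters are hypothetical; only the flags themselves (`--window-size`, `--remote-debugging-port`, `--headless`, `--disable-gpu`) and the binary name come from the README text.

```python
import shlex


def chrome_command(port=9222, window_size=(1920, 1080), headless=False,
                   binary="google-chrome-stable"):
    """Build the Chrome argument list from the README's usage example.

    Hypothetical helper for illustration; crocoite itself does not ship it.
    """
    width, height = window_size
    args = [
        binary,
        f"--window-size={width},{height}",
        f"--remote-debugging-port={port}",
    ]
    if headless:
        # The two extra parameters the README names for headless operation.
        args += ["--headless", "--disable-gpu"]
    return args


if __name__ == "__main__":
    # Print a shell-quoted command line that could be pasted into a terminal.
    print(shlex.join(chrome_command(headless=True)))
```

Because some sites serve different assets depending on screen resolution, keeping the window size as an explicit parameter makes it easier to record which resolution an archive was captured with.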