author | Lars-Dominik Braun <lars@6xq.net> | 2017-11-17 19:54:30 +0100 |
---|---|---|
committer | Lars-Dominik Braun <lars@6xq.net> | 2017-11-17 19:54:30 +0100 |
commit | 0b8a8e88a3c33c14e52241190ee6478cb2acd49d (patch) | |
tree | 784b8672d65527bf22227de782b9d512acfba2cb /README.rst | |
Initial import
Diffstat (limited to 'README.rst')
-rw-r--r-- | README.rst | 40 |
1 files changed, 40 insertions, 0 deletions
diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..7eea272
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,40 @@
+crocoite
+========
+
+Archive websites using Google Chrome and its DevTools protocol.
+Tested with Google Chrome 62.0.3202.89 for Linux only.
+
+Dependencies
+------------
+
+- Python 3
+- pychrome_
+- warcio_
+
+.. _pychrome: https://github.com/fate0/pychrome
+.. _warcio: https://github.com/webrecorder/warcio
+
+Usage
+-----
+
+One-shot commandline interface and pywb_ playback::
+
+    google-chrome-stable --window-size=1920,1080 --remote-debugging-port=9222 &
+    crocoite-standalone http://example.com/ example.com.warc.gz
+    rm -rf collections && wb-manager init test && wb-manager add test example.com.warc.gz
+    wayback &
+    $BROWSER http://localhost:8080
+
+For `headless Google Chrome`_ add the parameters ``--headless --disable-gpu``.
+
+.. _pywb: https://github.com/ikreymer/pywb
+.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
+
+Caveats
+-------
+
+- Original HTTP requests/responses are not available. They are rebuilt from
+  data available. Character encoding for text documents is changed to UTF-8.
+- Some sites request different assets based on screen resolution, some fetch
+  different scripts based on user agent.
+
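The Usage section of the README starts Chrome with a remote debugging port before running crocoite-standalone. A minimal sketch of that first step in Python follows; it is not part of this commit. The helper names `chrome_command` and `devtools_version` are hypothetical, and the sketch assumes Chrome serves its standard DevTools HTTP endpoint `/json/version` on the chosen port. It builds the same command line as the README and can check whether Chrome is listening before an archive run begins.

```python
import json
import shlex
from urllib.request import urlopen


def chrome_command(port=9222, window_size=(1920, 1080), headless=False):
    """Build the Chrome command line from the README as an argument list.

    Hypothetical helper; defaults mirror the values shown in the README.
    """
    args = [
        "google-chrome-stable",
        f"--window-size={window_size[0]},{window_size[1]}",
        f"--remote-debugging-port={port}",
    ]
    if headless:
        # The README names these two parameters for headless Chrome.
        args += ["--headless", "--disable-gpu"]
    return args


def devtools_version(port=9222, host="localhost", timeout=5):
    """Fetch browser metadata from the DevTools HTTP endpoint.

    Raises an OSError subclass (e.g. URLError) if nothing is listening.
    """
    with urlopen(f"http://{host}:{port}/json/version", timeout=timeout) as resp:
        return json.load(resp)


if __name__ == "__main__":
    # Print a shell-ready command line for headless operation.
    print(shlex.join(chrome_command(headless=True)))
```

Passing the list from `chrome_command()` to `subprocess.Popen` would start the browser in the background, and calling `devtools_version()` afterwards gives a quick way to confirm the debugging port is reachable before invoking crocoite-standalone.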