From 0b8a8e88a3c33c14e52241190ee6478cb2acd49d Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun
Date: Fri, 17 Nov 2017 19:54:30 +0100
Subject: Initial import

---
 README.rst | 40 ++++++++++++++++++++++++++++++++++++++++
 1 file changed, 40 insertions(+)
 create mode 100644 README.rst

diff --git a/README.rst b/README.rst
new file mode 100644
index 0000000..7eea272
--- /dev/null
+++ b/README.rst
@@ -0,0 +1,40 @@
+crocoite
+========
+
+Archive websites using Google Chrome and its DevTools protocol.
+Tested with Google Chrome 62.0.3202.89 for Linux only.
+
+Dependencies
+------------
+
+- Python 3
+- pychrome_
+- warcio_
+
+.. _pychrome: https://github.com/fate0/pychrome
+.. _warcio: https://github.com/webrecorder/warcio
+
+Usage
+-----
+
+One-shot commandline interface and pywb_ playback::
+
+    google-chrome-stable --window-size=1920,1080 --remote-debugging-port=9222 &
+    crocoite-standalone http://example.com/ example.com.warc.gz
+    rm -rf collections && wb-manager init test && wb-manager add test example.com.warc.gz
+    wayback &
+    $BROWSER http://localhost:8080
+
+For `headless Google Chrome`_ add the parameters ``--headless --disable-gpu``.
+
+.. _pywb: https://github.com/ikreymer/pywb
+.. _headless Google Chrome: https://developers.google.com/web/updates/2017/04/headless-chrome
+
+Caveats
+-------
+
+- Original HTTP requests/responses are not available. They are rebuilt from
+  data available. Character encoding for text documents is changed to UTF-8.
+- Some sites request different assets based on screen resolution, some fetch
+  different scripts based on user agent.
+
--
cgit v1.2.3