From ca01f82227a8b79f1cbc4f5e0be5434804dc3c0e Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun
Date: Mon, 20 Nov 2017 19:19:05 +0100
Subject: Add page created from DOM snapshot

---
 README.rst | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

(limited to 'README.rst')

diff --git a/README.rst b/README.rst
index 7eea272..f66da27 100644
--- a/README.rst
+++ b/README.rst
@@ -10,6 +10,7 @@ Dependencies
 - Python 3
 - pychrome_
 - warcio_
+- html5lib
 
 .. _pychrome: https://github.com/fate0/pychrome
 .. _warcio: https://github.com/webrecorder/warcio
@@ -34,7 +35,20 @@ Caveats
 -------
 
 - Original HTTP requests/responses are not available. They are rebuilt from
-  data available. Character encoding for text documents is changed to UTF-8.
-- Some sites request different assets based on screen resolution, some fetch
-  different scripts based on user agent.
+  parsed data. Character encoding for text documents is changed to UTF-8.
+- Some sites request assets based on screen resolution, pixel ratio and
+  supported image formats (webp). Replaying those with different parameters
+  won’t work, since assets for those are missing. Example: missguided.com.
+- Some fetch different scripts based on user agent. Example: youtube.com.
+- Requests containing randomly generated JavaScript callback function names
+  won’t work. Example: weather.com.
+
+Most of these issues can be worked around by using the DOM snapshot, which is
+also saved. This causes its own set of issues though:
+
+- JavaScript-based navigation does not work.
+- Scripts modifying styles based on scrolling position are stuck at the end of
+  page state at the moment. Example: twitter.com
+- CSS-based asset loading (screen size, pixel ratio, …) still does not work.
+- Canvas contents are probably not preserved.
 
-- 
cgit v1.2.3
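
Note on the callback-name caveat added by this patch: the failure mode can be illustrated with a small sketch. It assumes, purely for illustration, that a replay tool looks up archived responses by the exact request URL. The helper name, the example URL, and the response body are hypothetical and are not taken from the project.

```python
import random
import string


def random_callback_name(rng, length=8):
    """Return a JSONP-style callback name such as 'cb_qwertyui' (hypothetical helper)."""
    suffix = "".join(rng.choice(string.ascii_lowercase) for _ in range(length))
    return "cb_" + suffix


BASE = "https://example.invalid/api"  # placeholder URL, not a real endpoint

# Capture time: the page generates a callback name and requests the URL.
# Assumption: the archive is keyed by the exact URL that was requested.
capture_rng = random.Random(1)
captured_name = random_callback_name(capture_rng)
archive = {
    f"{BASE}?callback={captured_name}": f'{captured_name}({{"temp": 21}})',
}

# Replay time: the page's script runs again and generates a fresh name,
# so the requested URL differs from the one stored in the archive.
replay_rng = random.Random(2)
replay_url = f"{BASE}?callback={random_callback_name(replay_rng)}"
found = replay_url in archive  # the exact-URL lookup misses
```

Under this assumption the lookup fails even though the underlying data was captured, which is consistent with the patch's note that such requests won't work on replay.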