From d0e031ceff074667516130c58113e1ef7f24fa7a Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun <lars@6xq.net>
Date: Mon, 26 Feb 2018 09:50:34 +0100
Subject: Add more related work

---
 README.rst | 35 +++++++++++++++++++++++++----------
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/README.rst b/README.rst
index 59629fe..fd4ee93 100644
--- a/README.rst
+++ b/README.rst
@@ -12,15 +12,15 @@ HTML pages to adapt them to a new origin and path hierarchy (i.e.
 ``https://web.archive.org/web/<date>/<url>``). With the rise of web apps, which
 load their content dynamically, this is no longer sufficient.
 
-Let’s look at Instagram as an example for this: User’s profiles dynamically
-load content to implement “infinite scrolling”. The corresponding request is a
-GraphQL query, which returns JSON-encoded data with an application-defined
-structure.  This response includes URL’s to images, which must be rewritten as
-well, in order for replay to work correctly. So the replay software needs to
-parse and rewrite JSON as well as HTML.
+Instagram is an example of this: user profiles dynamically load content to
+implement “infinite scrolling”. The corresponding request is a GraphQL query,
+which returns JSON-encoded data with an application-defined structure.  This
+response includes URL’s to images, which must be rewritten as well, in order
+for replay to work correctly. So the replay software needs to parse and rewrite
+JSON as well as HTML.
 
 However, this response could have used an arbitrary serialization format and
-may contain relative URL’s or just values used in a URL template, which are
+may contain relative URL’s or just values used in a URL template. Both are
 more difficult to spot than absolute URL’s. This makes server-side rewriting
 difficult and cumbersome, perhaps even impossible.
 
@@ -30,16 +30,16 @@ Implementation
 Instead swayback relies on a new web technology called *Service Workers*. These
 can be installed for a given domain and path prefix. They basically act as a
 proxy between the browser and server, allowing them to intercept and rewrite
-any request a web app makes. Which is exactly what we need to properly replay
+any request a web app makes. This is exactly what is needed to properly replay
 archived web apps.
 
-So swayback provides an HTTP server, responing to queries for the wildcard
+swayback provides an HTTP server, responding to queries for the wildcard
 domain ``*.swayback.localhost``. The page served first installs a service
 worker and then reloads the page. Now the service worker is in control of
 network requests and rewrites a request like (for instance)
 ``www.instagram.com.swayback.localhost:5000/bluebellwooi/`` to
 ``swayback.localhost:5000/raw`` with the real URL in the POST request body.
-swayback’s server looks up that URL in the WARC files provided and and replies
+swayback’s server looks up that URL in the WARC files provided and replies
 with the original server’s response, which is then returned by the service
 worker to the browser without modification.
 
@@ -84,5 +84,20 @@ Related projects
 This approach complements efforts such as crocoite_, a web crawler based on
 Google Chrome.
 
+Reconstructive_/ipwb_
+    Uses a Service Worker to intercept and rewrite requests. Relies on the
+    Referer header. Rewrites links inside HTML pages using regular expressions
+    before passing them to the browser. See `Client-side Reconstruction of
+    Composite Mementos Using ServiceWorker`__.
+
+    __ http://www.cs.odu.edu/%7Emkelly/papers/2017_jcdl_serviceWorker.pdf
+pywb_
+    Uses `rewrite modules`_ to alter URLs in HTML pages/JSON
+    responses/cookies/…
+
+.. _rewrite modules: https://github.com/webrecorder/pywb/tree/master/pywb/rewrite
+.. _pywb: https://github.com/webrecorder/pywb/
 .. _crocoite: https://github.com/PromyLOPh/crocoite
+.. _Reconstructive: https://github.com/oduwsdl/Reconstructive/
+.. _ipwb: https://github.com/oduwsdl/ipwb/
 
-- 
cgit v1.2.3