Age | Commit message | Author | Files | Lines
---|---|---|---|---
2018-05-04 | Move page archiving logic to SinglePageController. In preparation for recursive crawls. | Lars-Dominik Braun | 1 | -3/+3
2018-05-04 | Move header unfolding into Item | Lars-Dominik Braun | 1 | -21/+2
2018-05-04 | Fetch request POST body. Only if there is one and it was not already included in the response. | Lars-Dominik Braun | 1 | -7/+5
2018-04-14 | Fix base64 body detection. Broken by commit a21d7332e33a3e47a363004196451721d449e70b. | Lars-Dominik Braun | 1 | -1/+1
2018-03-25 | Move getResponseBody call to Item wrapper | Lars-Dominik Braun | 1 | -11/+2
2017-12-25 | Increase default body size | Lars-Dominik Braun | 1 | -2/+4
2017-12-24 | Refactor behavior scripts. No functional changes, just cleanup. Replaces the onload and onsnapshot events. Also moves screen metric emulation, DOM snapshots and screenshots here. | Lars-Dominik Braun | 1 | -2/+3
2017-12-22 | Add simple stats-keeping SiteLoader | Lars-Dominik Braun | 1 | -4/+6
2017-12-22 | Don’t write WARC record if body cannot be retrieved. Also includes refactoring. | Lars-Dominik Braun | 1 | -19/+48
2017-12-20 | Fix HTTP headers using the same key more than once. This is undocumented DevTools behavior. | Lars-Dominik Braun | 1 | -2/+15
2017-12-19 | Serialize WARC writing. Logger and SiteWriter both access .write_record() concurrently, which can corrupt WARC files. Move the writer to its own thread and decouple it with a queue. Since we’re probably I/O-bound, this may speed up writeback as well. | Lars-Dominik Braun | 1 | -0/+35
2017-12-17 | Don’t fetch redirected request body. We can’t do that safely due to a race condition. | Lars-Dominik Braun | 1 | -8/+12
2017-11-29 | Use Chrome’s timestamps as WARC-Date | Lars-Dominik Braun | 1 | -0/+6
2017-11-29 | Refactoring. Reusable browser communication and WARC writing. | Lars-Dominik Braun | 1 | -0/+174
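The 2017-12-19 “Serialize WARC writing” commit moves the WARC writer to its own thread and feeds it through a queue, so concurrent callers of `.write_record()` can no longer interleave records. The following is a minimal sketch of that pattern, not crocoite’s actual code: `QueuedWriter` and `ListWriter` are hypothetical names, and the underlying writer is assumed to be any object with a `write_record()` method.

```python
import queue
import threading


class ListWriter:
    """Hypothetical stand-in for a WARC writer; it just collects records."""

    def __init__(self):
        self.records = []

    def write_record(self, record):
        self.records.append(record)


class QueuedWriter:
    """Funnels write_record() calls from many threads onto one worker thread."""

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, writer):
        self._writer = writer
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        # Only this thread ever touches the underlying writer, so records
        # are written one at a time even if producers enqueue concurrently.
        while True:
            record = self._queue.get()
            if record is self._STOP:
                break
            self._writer.write_record(record)

    def write_record(self, record):
        # Producers (e.g. a logger and a site writer) only enqueue.
        self._queue.put(record)

    def close(self):
        # The sentinel is queued after all pending records, so they are
        # flushed before the worker stops.
        self._queue.put(self._STOP)
        self._thread.join()
```

Because `queue.Queue` is FIFO, records from a single producer keep their relative order; records from different producers may interleave at record granularity, but never within one record.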
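The 2017-12-20 and 2018-05-04 entries deal with HTTP headers that repeat the same key, and with “unfolding” them. The commit message only calls this undocumented DevTools behavior. The sketch below assumes DevTools reports such headers as a single dictionary entry whose value joins the individual values with newline characters, and splits them back into separate `(name, value)` pairs. `unfold_headers` is a hypothetical name, not the project’s API.

```python
def unfold_headers(headers):
    """Turn a DevTools-style {name: value} dict into a list of (name, value) pairs.

    Assumption: a header sent more than once arrives as one entry whose
    value joins the individual values with a newline character.
    """
    items = []
    for name, value in headers.items():
        # Each newline-separated part becomes its own header line again.
        for part in value.split("\n"):
            items.append((name, part))
    return items
```

For example, `unfold_headers({"Set-Cookie": "a=1\nb=2"})` returns `[("Set-Cookie", "a=1"), ("Set-Cookie", "b=2")]`, which a WARC writer can then emit as two separate header lines.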