path: root/crocoite/warc.py
Age | Commit message | Author | Files | Lines
2017-12-22 | Add simple stats-keeping SiteLoader | Lars-Dominik Braun | 1 | -4/+6
2017-12-22 | Don’t write WARC record if body cannot be retrieved | Lars-Dominik Braun | 1 | -19/+48
+refactoring.
2017-12-20 | Fix HTTP headers using the same key more than once | Lars-Dominik Braun | 1 | -2/+15
This is an undocumented DevTools feature.
2017-12-19 | Serialize WARC writing | Lars-Dominik Braun | 1 | -0/+35
Logger and SiteWriter both access .write_record() concurrently, which can corrupt WARC files. Move the writer to its own thread and decouple it with a queue. Since we’re probably I/O-bound this may speed up writeback as well.
2017-12-17 | Don’t fetch redirected request body | Lars-Dominik Braun | 1 | -8/+12
We can’t do that safely due to a race-condition.
2017-11-29 | Use Chrome’s timestamps as WARC-Date | Lars-Dominik Braun | 1 | -0/+6
2017-11-29 | Refactoring | Lars-Dominik Braun | 1 | -0/+174
Reusable browser communication and WARC writing.
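The 2017-12-19 commit "Serialize WARC writing" describes moving the writer to its own thread and decoupling it from the Logger and SiteWriter with a queue. The following is a minimal, hypothetical sketch of that pattern, not the code in crocoite/warc.py. `SerializedWriter`, `ListSink` and the sentinel are illustrative names. The only assumption about the real writer is that it exposes a `write_record()` method, as the commit message implies.

```python
import queue
import threading


class SerializedWriter:
    """Hypothetical sketch: funnel every record write through one thread.

    `sink` is any object with a write_record(record) method (assumed to
    mirror the .write_record() call named in the commit message).
    """

    _STOP = object()  # sentinel that tells the worker thread to exit

    def __init__(self, sink):
        self._sink = sink
        self._queue = queue.Queue()  # thread-safe FIFO
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def write_record(self, record):
        # Safe to call from any producer thread: it only enqueues and
        # never touches the underlying file.
        self._queue.put(record)

    def _run(self):
        # The single consumer is the only caller of sink.write_record(),
        # so records can never be interleaved inside the output file.
        while True:
            item = self._queue.get()
            if item is self._STOP:
                break
            self._sink.write_record(item)

    def close(self):
        # Everything enqueued before the sentinel is written first (FIFO),
        # then the worker exits and we wait for it.
        self._queue.put(self._STOP)
        self._thread.join()


class ListSink:
    """Stand-in for a real WARC writer, used only for demonstration."""

    def __init__(self):
        self.records = []

    def write_record(self, record):
        self.records.append(record)
```

Because producers only touch the queue, they also no longer block on disk I/O. This matches the commit's remark that writeback may get faster when the process is I/O-bound.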
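The 2017-12-20 commit "Fix HTTP headers using the same key more than once" refers to an undocumented DevTools behavior. The sketch below assumes that behavior works as follows: a header sent several times (for example `Set-Cookie`) arrives as a single entry in the headers object, with its values joined by a newline. The function name is hypothetical. It expands such a mapping back into one (name, value) pair per header line, which is the shape an HTTP header block in a WARC record needs.

```python
from typing import Dict, List, Tuple


def split_devtools_headers(headers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Hypothetical helper: expand a DevTools-style header mapping.

    Assumption: repeated header keys are merged by DevTools into one
    value separated by '\n', so splitting on '\n' recovers each line.
    """
    pairs: List[Tuple[str, str]] = []
    for name, value in headers.items():
        # Emit one (name, value) pair per original header line.
        for part in value.split("\n"):
            pairs.append((name, part))
    return pairs
```

For example, `{"Set-Cookie": "a=1\nb=2"}` becomes two separate `Set-Cookie` pairs instead of one header whose value contains a raw newline.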