Age | Commit message | Author | Files | Lines |
|
In preparation for 1.0 release:
- Correct mime types
- Add X-Crocoite-Type, so logs, scripts, dom-snapshots and screenshots
can be identified easily
- Remove random WARC headers like X-Chrome-Initiator. We don’t want to
maintain those.
- Remove non-standard URN-based package URLs. They can’t be used without
a URN registration
|
|
Fixes 76811bd3f0b3fc8688939e31fdab2c71c89cc75b
|
|
|
|
|
|
URLs can get quite long, overflowing the file name length limit.
Instead, use sequential filenames and output metadata to stdout.
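
A minimal sketch of the idea, not the actual implementation: file names come from a counter only, and the mapping back to the URL is written as one JSON line per file. The names `sequential_names` and `write_index`, the `.warc.gz` suffix and the JSON layout are all assumptions made for illustration.

```python
import io
import json
import sys

def sequential_names(urls, suffix=".warc.gz"):
    """Yield (filename, metadata) pairs; the name never contains the URL."""
    for i, url in enumerate(urls):
        name = f"{i:06d}{suffix}"  # fixed length, regardless of URL length
        yield name, {"file": name, "url": url}

def write_index(urls, out=sys.stdout):
    """Write one JSON line of metadata per URL and return the file names."""
    names = []
    for name, meta in sequential_names(urls):
        json.dump(meta, out)
        out.write("\n")
        names.append(name)
    return names
```

A consumer can then read stdout line by line to find out which file belongs to which URL.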
|
|
Replaces str.format, which is less readable due to its separation of
format and arguments.
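
For illustration only (the variables and message are made up, not taken from the code base), the same line written both ways. f-strings need Python 3.6 or later.

```python
url = "http://example.com/"
status = 200

# str.format: placeholders and their values are separated
old = "fetched {} with status {}".format(url, status)

# f-string: each value sits where it is printed
new = f"fetched {url} with status {status}"

assert old == new
```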
|
|
Fixes #9.
|
|
In preparation for #9.
I was hoping to reuse one of schema.org’s microdata schemas, but
neither Action (archival action) nor SoftwareApplication (version
information) seems to be suitable.
|
|
Fix a few random issues pointed out by pylint, mainly unused imports.
|
|
The payloads may be the same, but the headers are usually not.
|
|
WARC-Target-URI was taken from the previous record, even if the URI was
different. This essentially removes the revisited URL from the archive.
Also add a few tests. And boy, warcio is a mess.
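
The intended behaviour can be sketched with plain dicts instead of warcio objects. This is a simplified model, not the actual fix: `revisit_headers` and the `seen` mapping are hypothetical, only the WARC header names are standard. The point is that a revisit record keeps the URI of the current request and merely refers to the earlier one.

```python
def revisit_headers(current_uri, payload_digest, seen):
    """Build WARC headers for a record.

    seen maps payload digest -> URI of the first record with that payload.
    """
    original_uri = seen.get(payload_digest)
    if original_uri is None:
        # First time this payload is seen: ordinary response record.
        seen[payload_digest] = current_uri
        return {
            "WARC-Type": "response",
            "WARC-Target-URI": current_uri,
            "WARC-Payload-Digest": payload_digest,
        }
    # Same payload again: the target is the URI requested now,
    # not the URI of the previous record.
    return {
        "WARC-Type": "revisit",
        "WARC-Target-URI": current_uri,
        "WARC-Refers-To-Target-URI": original_uri,
        "WARC-Payload-Digest": payload_digest,
    }
```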
|
|
Judging from the docs, this is the proper way to store these resources.
Enable both for the IRC bot by default, since they won’t interfere with
IA’s Wayback Machine.
|
|
Move contrib/ scripts to .tools and add entry points to setup.py, rename
crocoite-standalone to crocoite-grab.
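
As a rough illustration of what a console entry point looks like in setup.py: the target `crocoite.cli:main` is an assumed module path, not confirmed by this log, and the dict below would be passed to `setup()` as `entry_points=entry_points`.

```python
# Hypothetical setup.py fragment; the module path is an assumption.
entry_points = {
    "console_scripts": [
        # "<command name> = <module>:<function>"
        "crocoite-grab = crocoite.cli:main",
    ],
}
```

After installation, setuptools generates a `crocoite-grab` executable that calls the named function.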
|