Age | Commit message (Collapse) | Author | Files | Lines |
|
|
|
URLs can get quite long, overflowing the file name length limit.
Instead, use sequential filenames and output metadata to stdout.
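
A minimal sketch of the idea, not the actual code from this commit: each
item is stored under a short numbered filename, and the mapping back to its
URL is printed as one JSON line per item. The function name
save_sequentially, the write_file callback, the .bin suffix and the JSON
layout are all assumptions made for illustration.

```python
import json
import sys


def save_sequentially(items, write_file, out=None):
    """Store (url, payload) pairs under numbered names, log metadata.

    Hypothetical helper: write_file(filename, payload) does the actual
    storing, so the naming scheme stays independent of the filesystem.
    """
    if out is None:
        out = sys.stdout
    records = []
    for index, (url, payload) in enumerate(items):
        # The filename length no longer depends on the URL length.
        filename = f"{index:05d}.bin"
        write_file(filename, payload)
        # One JSON object per line links the file back to its URL.
        record = {"file": filename, "url": url}
        out.write(json.dumps(record) + "\n")
        records.append(record)
    return records
```

Passing a dict's __setitem__ as write_file keeps everything in memory,
which is handy for trying the naming scheme without touching the disk.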
|
|
Replaces str.format, which is less readable due to its separation of
format and arguments.
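
The subject line of this commit is not shown here, so the replacement is
not named. Assuming it is Python's f-strings, the readability difference
looks roughly like this (the variable names and message text are made up):

```python
url = "http://example.com/"
status = 200

# str.format: the placeholders sit in the template, the values at the end.
old = "{} returned status {}".format(url, status)

# f-string (assumed replacement): each value appears where it is used.
new = f"{url} returned status {status}"

# Both spellings build the same string.
assert old == new
```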
|
|
Fixes #9.
|
|
In preparation for #9.
I was hoping to reuse one of schema.org’s microdata schemas, but
neither Action (archival action) nor SoftwareApplication (version
information) seems to be suitable.
|
|
Fix a few random issues pointed out by pylint, mainly unused imports.
|
|
The payloads may be the same, but the headers are usually not.
|
|
WARC-Target-URI was taken from the previous record, even if the URI was
different. This essentially removes the revisited URL from the archive.
Also add a few tests. And boy, warcio is a mess.
|
|
Judging from the docs this is the proper way to store these resources.
Enable both for the IRC bot by default, since they won’t interfere with
IA’s wayback machine.
|
|
Move contrib/ scripts to .tools, add entry points to setup.py, and rename
crocoite-standalone to crocoite-grab.
|