author: Lars-Dominik Braun <lars@6xq.net> 2019-07-11 10:59:05 +0200
committer: Lars-Dominik Braun <lars@6xq.net> 2019-07-13 10:32:56 +0200
commit: 4905ac083b5f570988446a2b9dde3a8747020f1a
tree: c9866b6831835526e5dbee038cf48df3de628a82 /doc
parent: 8761275f1f569b747cb26578e1c3411e108fb8dd
Cookie injection support
Add command-line options for injecting individual cookies or a cookie file into Chrome. Provide a default cookie file. This changes the IRC bot’s command splitting to shlex.split, which allows shell-like argument quoting. Fixes #7.
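The practical effect of switching to shlex.split can be sketched as follows. This is only an illustration, not the bot's actual code: the command name `a`, the cookie value, and the URL are made-up examples. It shows why shell-like quoting matters once an argument, such as a cookie string, contains spaces.

```python
import shlex

# Hypothetical bot command line; the command name, cookie and URL are
# illustrative assumptions, not taken from crocoite itself.
line = 'a -b "session=abc; Path=/" https://example.com/'

# Plain whitespace splitting breaks the quoted cookie into two arguments
# and leaves the quote characters in place.
naive = line.split()

# shlex.split follows shell-like quoting rules, so the quoted cookie
# stays a single argument and the quotes are removed.
quoted = shlex.split(line)

print(naive)
print(quoted)
```

With whitespace splitting the cookie would reach the option parser as two broken fragments. With shlex.split it arrives as the single string `session=abc; Path=/`.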
Diffstat (limited to 'doc')
-rw-r--r-- doc/usage.rst 80
1 files changed, 74 insertions, 6 deletions
diff --git a/doc/usage.rst b/doc/usage.rst
index 9bba693..c18f9fb 100644
--- a/doc/usage.rst
+++ b/doc/usage.rst
@@ -24,6 +24,8 @@ Otherwise page screenshots may be unusable due to missing glyphs.
Recursion
^^^^^^^^^
+.. program:: crocoite
+
By default crocoite will only retrieve the URL specified on the command line.
However it can follow links as well. There are currently two recursion strategies
available, depth- and prefix-based.
@@ -59,16 +61,18 @@ each page of a single job and should always be used.
When running a recursive job, increasing the concurrency (i.e. how many pages
are fetched at the same time) can speed up the process. For example you can
-pass :option:`-j 4` to retrieve four pages at the same time. Keep in mind that each
-process starts a full browser that requires a lot of resources (one to two GB
-of RAM and one or two CPU cores).
+pass :option:`-j` :samp:`4` to retrieve four pages at the same time. Keep in mind
+that each process starts a full browser that requires a lot of resources (one
+to two GB of RAM and one or two CPU cores).
Customizing
^^^^^^^^^^^
-Under the hood crocoite starts one instance of :program:`crocoite-single` to fetch
-each page. You can customize its options by appending a command template like
-this:
+.. program:: crocoite-single
+
+Under the hood :program:`crocoite` starts one instance of
+:program:`crocoite-single` to fetch each page. You can customize its options by
+appending a command template like this:
.. code:: bash
@@ -79,6 +83,70 @@ This reduces the global timeout to 5 seconds and ignores TLS errors. If an
option is prefixed with an exclamation mark (``!``) it will not be expanded.
This is useful for passing :option:`--warcinfo`, which expects JSON-encoded data.
+Command line options
+^^^^^^^^^^^^^^^^^^^^
+
+Below is a list of all command line arguments available:
+
+.. program:: crocoite
+
+crocoite
+++++++++
+
+Front-end with recursion support and simple job management.
+
+.. option:: -j N, --concurrency N
+
+ Maximum number of concurrent fetch jobs.
+
+.. option:: -r POLICY, --recursion POLICY
+
+ Enables recursion based on POLICY, which can be a positive integer
+ (recursion depth) or the string :kbd:`prefix`.
+
+.. option:: --tempdir DIR
+
+ Directory for temporary WARC files.
+
+.. program:: crocoite-single
+
+crocoite-single
++++++++++++++++
+
+Back-end to fetch a single page.
+
+.. option:: -b SET-COOKIE, --cookie SET-COOKIE
+
+ Add cookie to browser’s cookie jar. This option always *appends* cookies,
+ replacing those provided by :option:`-c`.
+
+ .. versionadded:: 1.1
+
+.. option:: -c FILE, --cookie-jar FILE
+
+ Load cookies from FILE. :program:`crocoite` provides a default cookie file,
+ which contains cookies to, for example, circumvent age restrictions. This
+ option *replaces* that default file.
+
+ .. versionadded:: 1.1
+
+.. option:: --idle-timeout SEC
+
+ Time after which a page is considered “idle”.
+
+.. option:: -k, --insecure
+
+ Allow insecure connections, i.e. self-signed or expired HTTPS certificates.
+
+.. option:: --timeout SEC
+
+ Global archiving timeout.
+
+
+.. option:: --warcinfo JSON
+
+ Inject additional JSON-encoded information into the resulting WARC.
+
IRC bot
^^^^^^^