From 4905ac083b5f570988446a2b9dde3a8747020f1a Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun
Date: Thu, 11 Jul 2019 10:59:05 +0200
Subject: Cookie injection support
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Add command-line options injecting individual cookies or cookie file
into Chrome. Provide default cookie file.

This changes the IRC bot’s command splitting to shlex.split, which allows
shell-like argument quoting.

Fixes #7.
---
 doc/usage.rst | 80 ++++++++++++++++++++++++++++++++++++++++++++++++++++++-----
 1 file changed, 74 insertions(+), 6 deletions(-)

(limited to 'doc')

diff --git a/doc/usage.rst b/doc/usage.rst
index 9bba693..c18f9fb 100644
--- a/doc/usage.rst
+++ b/doc/usage.rst
@@ -24,6 +24,8 @@ Otherwise page screenshots may be unusable due to missing glyphs.
 Recursion
 ^^^^^^^^^
 
+.. program:: crocoite
+
 By default crocoite will only retrieve the URL specified on the command line.
 However it can follow links as well. There’s currently two recursion strategies
 available, depth- and prefix-based.
@@ -59,16 +61,18 @@ each page of a single job and should always be used.
 
 When running a recursive job, increasing the concurrency (i.e. how many pages
 are fetched at the same time) can speed up the process. For example you can
-pass :option:`-j 4` to retrieve four pages at the same time. Keep in mind that each
-process starts a full browser that requires a lot of resources (one to two GB
-of RAM and one or two CPU cores).
+pass :option:`-j` :samp:`4` to retrieve four pages at the same time. Keep in mind
+that each process starts a full browser that requires a lot of resources (one
+to two GB of RAM and one or two CPU cores).
 
 Customizing
 ^^^^^^^^^^^
 
-Under the hood crocoite starts one instance of :program:`crocoite-single` to fetch
-each page. You can customize its options by appending a command template like
-this:
+.. program:: crocoite-single
+
+Under the hood :program:`crocoite` starts one instance of
+:program:`crocoite-single` to fetch each page. You can customize its options by
+appending a command template like this:
 
 .. code:: bash
 
@@ -79,6 +83,70 @@ This reduces the global timeout to 5 seconds and ignores TLS errors. If an
 option is prefixed with an exclamation mark (``!``) it will not be expanded.
 This is useful for passing :option:`--warcinfo`, which expects JSON-encoded data.
 
+Command line options
+^^^^^^^^^^^^^^^^^^^^
+
+Below is a list of all command line arguments available:
+
+.. program:: crocoite
+
+crocoite
+++++++++
+
+Front-end with recursion support and simple job management.
+
+.. option:: -j N, --concurrency N
+
+   Maximum number of concurrent fetch jobs.
+
+.. option:: -r POLICY, --recursion POLICY
+
+   Enables recursion based on POLICY, which can be a positive integer
+   (recursion depth) or the string :kbd:`prefix`.
+
+.. option:: --tempdir DIR
+
+   Directory for temporary WARC files.
+
+.. program:: crocoite-single
+
+crocoite-single
++++++++++++++++
+
+Back-end to fetch a single page.
+
+.. option:: -b SET-COOKIE, --cookie SET-COOKIE
+
+   Add cookie to browser’s cookie jar. This option always *appends* cookies,
+   replacing those provided by :option:`-c`.
+
+   .. versionadded:: 1.1
+
+.. option:: -c FILE, --cookie-jar FILE
+
+   Load cookies from FILE. :program:`crocoite` provides a default cookie file,
+   which contains cookies to, for example, circumvent age restrictions. This
+   option *replaces* that default file.
+
+   .. versionadded:: 1.1
+
+.. option:: --idle-timeout SEC
+
+   Time after which a page is considered “idle”.
+
+.. option:: -k, --insecure
+
+   Allow insecure connections, i.e. self-signed or expired HTTPS certificates.
+
+.. option:: --timeout SEC
+
+   Global archiving timeout.
+
+
+.. option:: --warcinfo JSON
+
+   Inject additional JSON-encoded information into the resulting WARC.
+
 IRC bot
 ^^^^^^^
 
-- 
cgit v1.2.3
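
Note on the shlex.split change mentioned in the commit message: the sketch below is an illustration only, not code from this patch. It compares plain whitespace splitting with Python's standard ``shlex.split`` on a made-up bot command. The command name ``archive``, the cookie value and the URL are all hypothetical; only the ``-b`` option is taken from the documentation added above.

```python
import shlex

# Hypothetical bot command: the command name, cookie value and URL are
# invented for illustration and do not come from the patch.
line = 'archive -b "session=abc123; Path=/" https://example.com/'

# Plain whitespace splitting breaks the quoted cookie into two pieces
# and leaves the quote characters in place.
naive = line.split()

# shlex.split applies shell-like quoting rules, so the quoted cookie
# stays a single argument and the surrounding quotes are removed.
quoted = shlex.split(line)

print(naive)
print(quoted)
```

With shell-like quoting, a cookie string that contains spaces can presumably be passed as one argument to an option such as ``-b``, which plain whitespace splitting would not allow.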