Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2018-11-24 | browser: Ignore load failures for nonexisting requests | Lars-Dominik Braun | 1 | -2/+3 | |
Fixes None dereference. | |||||
2018-11-22 | travis: Switch to xenial | Lars-Dominik Braun | 1 | -1/+3 | |
The image offers Python 3.7 and 3.8-dev | |||||
2018-11-22 | controller: Improve idle waiting | Lars-Dominik Braun | 3 | -19/+89 | |
2018-11-19 | controller: Add parameters to warcinfo | Lars-Dominik Braun | 1 | -0/+7 | |
Add parameters the grab was run with, so we can actually reproduce a run. | |||||
2018-11-19 | Coding style | Lars-Dominik Braun | 12 | -58/+44 | |
Fix a few random issues pointed out by pylint, mainly unused imports. | |||||
2018-11-17 | html: Add tests for tree walker | Lars-Dominik Braun | 1 | -1/+23 | |
2018-11-17 | logger: Add more tests | Lars-Dominik Braun | 2 | -3/+25 | |
2018-11-17 | browser: Add tests for header deserialization | Lars-Dominik Braun | 1 | -0/+39 | |
2018-11-17 | devtools: Update browser flags | Lars-Dominik Braun | 1 | -0/+12 | |
Add a few more that seem reasonable. | |||||
2018-11-17 | browser: clearBrowserCookies is supported unconditionally | Lars-Dominik Braun | 1 | -4/+1 | |
canClearBrowserCookies apparently has been removed from protocol 1.3. | |||||
2018-11-17 | tools: Add original HTTP header to revisit record | Lars-Dominik Braun | 2 | -11/+13 | |
The payloads may be the same, but the headers are usually not. | |||||
2018-11-17 | click: Add gab.ai | Lars-Dominik Braun | 1 | -0/+10 | |
Load more posts on profile page and more comments and replies on individual post pages. | |||||
2018-11-14 | Async chrome process startup | Lars-Dominik Braun | 6 | -157/+161 | |
Move it to .devtools. Seems more fitting. | |||||
2018-11-10 | tools: Fix entry point | Lars-Dominik Braun | 1 | -1/+1 | |
2018-11-10 | tools: Fix WARC merging | Lars-Dominik Braun | 2 | -18/+205 | |
WARC-Target-URI was taken from the previous record, even if the URI was different. This essentially removes the revisited URL from the archive. Also add a few tests. And boy, warcio is a mess. | |||||
2018-11-09 | Add xml report for codecov.io | Lars-Dominik Braun | 2 | -1/+4 | |
2018-11-09 | Add codecov.io | Lars-Dominik Braun | 2 | -0/+5 | |
2018-11-08 | Travis has no version 3.7 yet | Lars-Dominik Braun | 1 | -1/+2 | |
Use -dev | |||||
2018-11-08 | Drop support for Python <3.6 | Lars-Dominik Braun | 2 | -4/+2 | |
2018-11-08 | Update README | Lars-Dominik Braun | 1 | -2/+4 | |
Dependency changes after asyncio transition. | |||||
2018-11-08 | devtools: Disable websocket pings to Chrome | Lars-Dominik Braun | 2 | -1/+12 | |
Chrome does not like that. | |||||
2018-11-06 | Switch single mode to asyncio | Lars-Dominik Braun | 6 | -176/+141 | |
This is a direct port to asyncio without any design changes. These need to happen in further refinements. Fixes issue #1. | |||||
2018-11-06 | Switch site loader to async DevTools communication | Lars-Dominik Braun | 2 | -229/+236 | |
2018-11-06 | Add simple asyncio-based DevTool communication | Lars-Dominik Braun | 4 | -1/+412 | |
Inspired by pychrome/aiochrome, but includes crash handling and async get() instead of callbacks. | |||||
2018-11-03 | html: Add tests for tag/attribute stripping | Lars-Dominik Braun | 1 | -0/+38 | |
2018-10-30 | recursive: Actually stop the grab when canceled | Lars-Dominik Braun | 1 | -1/+3 | |
This change was lost during the merge of 958563a3602780b48599c27acf212139c2e6904d. | |||||
2018-10-30 | Reduce idle wait time after stopping page | Lars-Dominik Braun | 1 | -4/+4 | |
2018-10-30 | Increase default timeouts | Lars-Dominik Braun | 1 | -2/+2 | |
These are more sane than the previous super-short defaults. Obviously this will slow down recursive crawls. | |||||
2018-10-23 | single: Set and recursive: check exit status | Lars-Dominik Braun | 2 | -12/+34 | |
Use exit status to signal something is wrong. Check it within recursive, increment crashed counter and do not move the resulting WARC, it might be broken. | |||||
2018-10-22 | behavior: Unload script only if the handle is valid | Lars-Dominik Braun | 1 | -2/+4 | |
For some reason with Google Chrome 70 this is not the case any more. | |||||
2018-10-14 | irc: Add PoC dashboard | Lars-Dominik Braun | 7 | -16/+277 | |
Using websockets, vue and bulma. | |||||
2018-10-14 | irc: Graceful bot shutdown | Lars-Dominik Braun | 3 | -16/+110 | |
Wait for remaining jobs to finish without accepting new ones, but still allow some interaction with the bot (status/revoke). | |||||
2018-10-11 | recursive: Gracefully shut down on SIGINT/TERM | Lars-Dominik Braun | 2 | -4/+18 | |
2018-10-10 | Add timezone to logger dates | Lars-Dominik Braun | 2 | -1/+4 | |
UTC everywhere. Make that clear. | |||||
2018-10-03 | controller: Depth limit does not work with i>1 | Lars-Dominik Braun | 1 | -1/+3 | |
No easy way to fix this, so just limit to [0, 1] for now. | |||||
2018-10-03 | irc: Fix mode parsing | Lars-Dominik Braun | 2 | -7/+37 | |
Ignore unsupported modes, add tests. | |||||
2018-10-02 | irc: Refactoring/beautification | Lars-Dominik Braun | 2 | -101/+266 | |
Add logging, split bot into abstract bot implementation and actual chromebot implementation, move some reusable checks into decorators. | |||||
2018-09-29 | Add documentation | Lars-Dominik Braun | 3 | -3/+44 | |
For -recursive and -irc | |||||
2018-09-29 | irc: Limit number of processes spawned | Lars-Dominik Braun | 2 | -21/+25 | |
2018-09-29 | Add simple IRC bot | Lars-Dominik Braun | 3 | -0/+275 | |
chromebot is back! Dropping sopel, because it does not work well with asyncio. | |||||
2018-09-25 | Prevent recursing into arbitrary schemes | Lars-Dominik Braun | 1 | -1/+9 | |
HTTP(S) only. | |||||
2018-09-25 | Parallelize recursive grabs | Lars-Dominik Braun | 2 | -5/+17 | |
❤️ asyncio. | |||||
2018-09-25 | Add recursive controller | Lars-Dominik Braun | 3 | -1/+170 | |
Simple and sequential. | |||||
2018-09-25 | Immediately flush logger | Lars-Dominik Braun | 1 | -0/+2 | |
Consumers can read the latest gossip faster now. | |||||
2018-09-25 | Log extracted links | Lars-Dominik Braun | 2 | -2/+25 | |
2018-08-21 | Remove celery and recursion | Lars-Dominik Braun | 6 | -609/+24 | |
Gonna rewrite that properly. | |||||
2018-08-19 | README: Add rationale | Lars-Dominik Braun | 1 | -25/+87 | |
Explain a few design decisions | |||||
2018-08-17 | behavior: Load more comments from Facebook | Lars-Dominik Braun | 1 | -0/+4 | |
2018-08-05 | test_browser: Properly handle failed requests | Lars-Dominik Braun | 2 | -15/+14 | |
Fixes test failures. Very fragile code unfortunately. | |||||
2018-08-04 | Properly handle failure to retrieve request body | Lars-Dominik Braun | 3 | -5/+50 | |
Just truncate the WARC record like we do with responses. Also add a few tests, but they’re not covering the call to getRequestPostData. Not sure what we have to do here. |
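
The 2018-09-25 entry "Prevent recursing into arbitrary schemes" says only "HTTP(S) only." A minimal sketch of such a filter follows, assuming extracted links arrive as plain URL strings. The names `ALLOWED_SCHEMES` and `is_crawlable` are hypothetical and not taken from the project's code.

```python
from urllib.parse import urlsplit

# Assumption: only these schemes are followed during a recursive grab.
ALLOWED_SCHEMES = {"http", "https"}


def is_crawlable(url: str) -> bool:
    """Return True only for absolute HTTP(S) URLs with a host part."""
    parts = urlsplit(url)
    return parts.scheme.lower() in ALLOWED_SCHEMES and bool(parts.netloc)


# Hypothetical list of links extracted from a page.
links = [
    "https://example.com/page",
    "http://example.com/other",
    "mailto:someone@example.com",
    "javascript:void(0)",
    "ftp://example.com/file",
]

# Keep only the links a recursive crawl would be allowed to follow.
crawlable = [u for u in links if is_crawlable(u)]
```

Filtering on the parsed scheme rather than on a string prefix avoids treating something like `httpx://` as a web URL.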