Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2018-12-01 | behavior: Add selector test cases | Lars-Dominik Braun | 1 | -0/+78 | |
Fixes #3. | |||||
2018-12-01 | behavior: Move click script data to external file | Lars-Dominik Braun | 4 | -149/+169 | |
First step of issue #3 | |||||
2018-12-01 | cli: Fix --behavior | Lars-Dominik Braun | 1 | -2/+3 | |
2018-11-28 | behavior: Expand issue comments on GitHub | Lars-Dominik Braun | 1 | -0/+6 | |
2018-11-26 | behavior: Close Facebook’s nag screen | Lars-Dominik Braun | 1 | -1/+1 | |
Worked previously, broken by a site update. | |||||
2018-11-25 | behavior: Turn scroll JS code into class | Lars-Dominik Braun | 2 | -27/+33 | |
2018-11-25 | single: Graceful ^C | Lars-Dominik Braun | 2 | -2/+13 | |
Allow cancellation of timeout wait. | |||||
2018-11-24 | behavior: Never scroll html/body elements | Lars-Dominik Braun | 1 | -1/+1 | |
Fixes weird positioning of elements tethered to viewport top. | |||||
2018-11-24 | behavior: Fix scrolling | Lars-Dominik Braun | 4 | -42/+49 | |
Introduce stop() method callable from Python. Looks like the old method (global variable) was not working (any more?). This is much better anyway. Also restore state of scrolled elements (not window), which fixes weird screenshots of twitter.com. | |||||
2018-11-24 | browser: Ignore load failures for nonexisting requests | Lars-Dominik Braun | 1 | -2/+3 | |
Fixes None dereference. | |||||
2018-11-22 | controller: Improve idle waiting | Lars-Dominik Braun | 3 | -19/+89 | |
2018-11-19 | controller: Add parameters to warcinfo | Lars-Dominik Braun | 1 | -0/+7 | |
Add parameters the grab was run with, so we can actually reproduce a run. | |||||
2018-11-19 | Coding style | Lars-Dominik Braun | 12 | -58/+44 | |
Fix a few random issues pointed out by pylint, mainly unused imports. | |||||
2018-11-17 | html: Add tests for tree walker | Lars-Dominik Braun | 1 | -1/+23 | |
2018-11-17 | logger: Add more tests | Lars-Dominik Braun | 2 | -3/+25 | |
2018-11-17 | browser: Add tests for header deserialization | Lars-Dominik Braun | 1 | -0/+39 | |
2018-11-17 | devtools: Update browser flags | Lars-Dominik Braun | 1 | -0/+12 | |
Add a few more that seem reasonable. | |||||
2018-11-17 | browser: clearBrowserCookies is supported unconditionally | Lars-Dominik Braun | 1 | -4/+1 | |
canClearBrowserCookies apparently has been removed from protocol 1.3. | |||||
2018-11-17 | tools: Add original HTTP header to revisit record | Lars-Dominik Braun | 2 | -11/+13 | |
The payloads may be the same, but the headers are usually not. | |||||
2018-11-17 | click: Add gab.ai | Lars-Dominik Braun | 1 | -0/+10 | |
Load more posts on profile page and more comments and replies on individual post pages. | |||||
2018-11-14 | Async chrome process startup | Lars-Dominik Braun | 6 | -157/+161 | |
Move it to .devtools. Seems more fitting. | |||||
2018-11-10 | tools: Fix WARC merging | Lars-Dominik Braun | 2 | -18/+205 | |
WARC-Target-URI was taken from the previous record, even if the URI was different. This essentially removes the revisited URL from the archive. Also add a few tests. And boy, warcio is a mess. | |||||
2018-11-08 | devtools: Disable websocket pings to Chrome | Lars-Dominik Braun | 2 | -1/+12 | |
Chrome does not like that. | |||||
2018-11-06 | Switch single mode to asyncio | Lars-Dominik Braun | 5 | -175/+141 | |
This is a direct port to asyncio without any design changes. These need to happen in further refinements. Fixes issue #1. | |||||
2018-11-06 | Switch site loader to async DevTools communication | Lars-Dominik Braun | 2 | -229/+236 | |
2018-11-06 | Add simple asyncio-based DevTool communication | Lars-Dominik Braun | 2 | -0/+406 | |
Inspired by pychrome/aiochrome, but includes crash handling and async get() instead of callbacks. | |||||
2018-11-03 | html: Add tests for tag/attribute stripping | Lars-Dominik Braun | 1 | -0/+38 | |
2018-10-30 | recursive: Actually stop the grab when canceled | Lars-Dominik Braun | 1 | -1/+3 | |
This change was lost during the merge of 958563a3602780b48599c27acf212139c2e6904d. | |||||
2018-10-30 | Reduce idle wait time after stopping page | Lars-Dominik Braun | 1 | -4/+4 | |
2018-10-30 | Increase default timeouts | Lars-Dominik Braun | 1 | -2/+2 | |
These are more sane than the previous super-short defaults. Obviously this will slow down recursive crawls. | |||||
2018-10-23 | single: Set and recursive: check exit status | Lars-Dominik Braun | 2 | -12/+34 | |
Use exit status to signal something is wrong. Check it within recursive, increment crashed counter and do not move the resulting WARC, since it might be broken. | |||||
2018-10-22 | behavior: Unload script only if the handle is valid | Lars-Dominik Braun | 1 | -2/+4 | |
For some reason with Google Chrome 70 this is not the case any more. | |||||
2018-10-14 | irc: Add PoC dashboard | Lars-Dominik Braun | 3 | -16/+119 | |
Using websockets, vue and bulma. | |||||
2018-10-14 | irc: Graceful bot shutdown | Lars-Dominik Braun | 3 | -16/+110 | |
Wait for remaining jobs to finish without accepting new ones, but still allow some interaction with the bot (status/revoke). | |||||
2018-10-11 | recursive: Gracefully shut down on SIGINT/TERM | Lars-Dominik Braun | 2 | -4/+18 | |
2018-10-10 | Add timezone to logger dates | Lars-Dominik Braun | 1 | -1/+3 | |
UTC everywhere. Make that clear. | |||||
2018-10-03 | controller: Depth limit does not work with i>1 | Lars-Dominik Braun | 1 | -1/+3 | |
No easy way to fix this, so just limit to [0, 1] for now. | |||||
2018-10-03 | irc: Fix mode parsing | Lars-Dominik Braun | 2 | -7/+37 | |
Ignore unsupported modes, add tests. | |||||
2018-10-02 | irc: Refactoring/beautification | Lars-Dominik Braun | 2 | -101/+266 | |
Add logging, split bot into abstract bot implementation and actual chromebot implementation, move some reusable checks into decorators. | |||||
2018-09-29 | Add documentation | Lars-Dominik Braun | 2 | -3/+9 | |
For -recursive and -irc | |||||
2018-09-29 | irc: Limit number of processes spawned | Lars-Dominik Braun | 2 | -21/+25 | |
2018-09-29 | Add simple IRC bot | Lars-Dominik Braun | 2 | -0/+273 | |
chromebot is back! Dropping sopel, because it does not work well with asyncio. | |||||
2018-09-25 | Prevent recursing into arbitrary schemes | Lars-Dominik Braun | 1 | -1/+9 | |
HTTP(S) only. | |||||
2018-09-25 | Parallelize recursive grabs | Lars-Dominik Braun | 2 | -5/+17 | |
❤️ asyncio. | |||||
2018-09-25 | Add recursive controller | Lars-Dominik Braun | 2 | -1/+169 | |
Simple and sequential. | |||||
2018-09-25 | Immediately flush logger | Lars-Dominik Braun | 1 | -0/+2 | |
Consumers can read the latest gossip faster now. | |||||
2018-09-25 | Log extracted links | Lars-Dominik Braun | 2 | -2/+25 | |
2018-08-21 | Remove celery and recursion | Lars-Dominik Braun | 3 | -317/+23 | |
Gonna rewrite that properly. | |||||
2018-08-17 | behavior: Load more comments from Facebook | Lars-Dominik Braun | 1 | -0/+4 | |
2018-08-05 | test_browser: Properly handle failed requests | Lars-Dominik Braun | 2 | -15/+14 | |
Fixes test failures. Very fragile code unfortunately. |