path: root/crocoite
Date | Commit message | Author | Files | Lines (-removed/+added)
2018-05-04 | Support --browser again for local crawls | Lars-Dominik Braun | 2 | -2/+6
2018-05-04 | Add distributed recursive crawls | Lars-Dominik Braun | 3 | -31/+91
2018-05-04 | Add support for recursive crawls | Lars-Dominik Braun | 2 | -2/+115
2018-05-04 | browser: Replace context manager decorator | Lars-Dominik Braun | 1 | -51/+66
2018-05-04 | behavior: Add link extraction script | Lars-Dominik Braun | 4 | -5/+43
2018-05-04 | Move page archiving logic to SinglePageController | Lars-Dominik Braun | 5 | -144/+198
2018-05-04 | Move header unfolding into Item | Lars-Dominik Braun | 2 | -21/+24
2018-05-04 | Fetch request POST body | Lars-Dominik Braun | 2 | -8/+20
2018-05-04 | Test chained redirects | Lars-Dominik Braun | 1 | -12/+32
2018-04-20 | Save screenshot of entire page | Lars-Dominik Braun | 1 | -6/+16
2018-04-14 | Fix base64 body detection | Lars-Dominik Braun | 2 | -10/+10
2018-04-14 | Add timeout to request body fetch | Lars-Dominik Braun | 1 | -3/+4
2018-04-14 | Handle JavaScript dialogs | Lars-Dominik Braun | 1 | -2/+37
2018-04-04 | behavior: Add selector for YouTube. | Lars-Dominik Braun | 1 | -0/+6
2018-03-30 | Add click selectors for Instagram | Lars-Dominik Braun | 1 | -0/+8
2018-03-25 | Add a few simple tests | Lars-Dominik Braun | 1 | -0/+190
2018-03-25 | Replace deprecated logger.warn | Lars-Dominik Braun | 1 | -3/+3
2018-03-25 | ChromeService: Close listening socket | Lars-Dominik Braun | 1 | -0/+1
2018-03-25 | Move getResponseBody call to Item wrapper | Lars-Dominik Braun | 2 | -13/+21
2018-03-18 | browser: Don’t overwrite LogEntry’s args | Lars-Dominik Braun | 1 | -1/+1
2018-03-18 | behavior: Add click selectors for reddit | Lars-Dominik Braun | 1 | -7/+27
2018-03-05 | Add generic click behavior script | Lars-Dominik Braun | 3 | -37/+119
2018-03-04 | Remove instagram behavior script | Lars-Dominik Braun | 2 | -27/+1
2018-01-20 | behavior: Scroll all DOM elements | Lars-Dominik Braun | 1 | -0/+6
2018-01-20 | twitter: Expand “more replies” links | Lars-Dominik Braun | 1 | -8/+21
2017-12-27 | Log messages from browser console | Lars-Dominik Braun | 1 | -0/+12
2017-12-25 | Increase default body size | Lars-Dominik Braun | 3 | -5/+34
2017-12-24 | Refactor behavior scripts | Lars-Dominik Braun | 6 | -172/+288
2017-12-23 | Set fake finished response for redirects | Lars-Dominik Braun | 1 | -1/+4
2017-12-23 | Drain tab event queue before stopping | Lars-Dominik Braun | 1 | -0/+2
2017-12-22 | Add simple stats-keeping SiteLoader | Lars-Dominik Braun | 3 | -9/+46
2017-12-22 | SiteLoader: Save entire finished response | Lars-Dominik Braun | 1 | -2/+9
2017-12-22 | Don’t write WARC record if body cannot be retrieved | Lars-Dominik Braun | 1 | -19/+48
2017-12-20 | Increase hardcoded max timeouts | Lars-Dominik Braun | 1 | -2/+2
2017-12-20 | Fix HTTP headers using the same key more than once | Lars-Dominik Braun | 1 | -2/+15
2017-12-19 | Serialize WARC writing | Lars-Dominik Braun | 2 | -3/+38
2017-12-19 | Select default behavior scripts by site URL | Lars-Dominik Braun | 4 | -1/+51
2017-12-17 | Add Twitter fixups | Lars-Dominik Braun | 1 | -0/+17
2017-12-17 | Don’t fetch redirected request body | Lars-Dominik Braun | 1 | -8/+12
2017-12-17 | Add distributed archiving | Lars-Dominik Braun | 2 | -151/+221
2017-12-06 | Start Chrome browser instance | Lars-Dominik Braun | 2 | -44/+101
2017-12-06 | Add flags to disable screenshot/DOM snapshot | Lars-Dominik Braun | 1 | -5/+9
2017-12-03 | Fix UTF-8 encoding name | Lars-Dominik Braun | 1 | -1/+1
2017-12-03 | Add page screenshot to WARC | Lars-Dominik Braun | 1 | -0/+14
2017-11-29 | Add missing timestamp to response data for redirects | Lars-Dominik Braun | 1 | -1/+1
2017-11-29 | argparse: Add metavar | Lars-Dominik Braun | 1 | -7/+7
2017-11-29 | Use Chrome’s timestamps as WARC-Date | Lars-Dominik Braun | 2 | -8/+14
2017-11-29 | Refactoring | Lars-Dominik Braun | 5 | -403/+571
2017-11-26 | DOM snapshot: Generate valid HTML5 | Lars-Dominik Braun | 2 | -9/+31
2017-11-25 | Ignore duplicate URLs when saving DOM snapshot | Lars-Dominik Braun | 1 | -1/+10
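Two entries in this log concern HTTP headers: "Fix HTTP headers using the same key more than once" and "Move header unfolding into Item". The log itself does not show the code. The following is a minimal sketch, not crocoite's implementation, assuming the browser reports repeated header fields as a single dictionary entry whose values are separated by newlines. It "unfolds" such a mapping into one (name, value) pair per value, so that each value can be written as its own header line in a WARC record. The function name `unfold_headers` is hypothetical.

```python
from typing import Dict, List, Tuple


def unfold_headers(headers: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split newline-joined header values into separate (name, value) pairs.

    Hypothetical helper: assumes repeated header fields were merged into
    one dict entry with the individual values separated by '\n'.
    """
    items: List[Tuple[str, str]] = []
    for name, value in headers.items():
        # Each newline-separated part was originally its own header line.
        for part in value.split('\n'):
            items.append((name, part))
    return items


if __name__ == '__main__':
    merged = {'Content-Type': 'text/html', 'Set-Cookie': 'a=1\nb=2'}
    for name, value in unfold_headers(merged):
        print(f'{name}: {value}')
```

Returning a list of pairs rather than a dict keeps repeated names such as `Set-Cookie` intact, which a plain mapping keyed by header name cannot represent.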