Age | Commit message | Author | Files | Lines | |
---|---|---|---|---|---|
2020-08-22 | text: Fix HTMLSerializer’s entity decoding | Lars-Dominik Braun | 1 | -1/+1 | |
2020-08-22 | text: Fix broken circular import | Lars-Dominik Braun | 1 | -1/+2 | |
2020-04-25 | Improve lulua-write | Lars-Dominik Braun | 1 | -86/+206 | |
Introduce composable filters and switch to brotli-compressed tarballs, which have good compression ratios and decompress quickly, reducing I/O significantly. | |||||
2020-03-15 | text: Add actual text to debugging output | Lars-Dominik Braun | 1 | -0/+2 | |
2020-03-15 | text: Fix newlines | Lars-Dominik Braun | 1 | -2/+3 | |
Replace Windows-style newlines with Unix newlines. There were quite a few stray \r’s in the stats. This takes care of them. | |||||
2019-11-17 | Add more tests | Lars-Dominik Braun | 1 | -17/+19 | |
2019-11-16 | Add OpenStreetMap label corpus | Lars-Dominik Braun | 1 | -0/+5 | |
Extract node labels (name:ar) from OpenStreetMap’s planet dump. The labels lean heavily towards a few common words (“street”, obviously), but we should be fine since the corpus is not that large. | |||||
2019-11-08 | Add OpenSubtitles corpus | Lars-Dominik Braun | 1 | -0/+18 | |
See issue #5. | |||||
2019-11-06 | text: Add TEI.2 parser | Lars-Dominik Braun | 1 | -1/+27 | |
2019-10-03 | text: Add epub reader and hindawi corpus | Lars-Dominik Braun | 1 | -21/+50 | |
See issue #5. | |||||
2019-10-03 | text: Fail if workers die | Lars-Dominik Braun | 1 | -29/+39 | |
2019-09-17 | Initial import | Lars-Dominik Braun | 1 | -0/+260 | |