path: root/lulua/text.py
Age         Commit message                              Author              Files  Lines
2021-10-27  text: epub: Prevent memory leak             Lars-Dominik Braun  1      -0/+4
            It looks like ebooklib creates non-GC-able references to ZipFile, which must be cleaned up manually.
2020-08-22  text: Fix HTMLSerializer’s entity decoding  Lars-Dominik Braun  1      -1/+1
2020-08-22  text: Fix broken circular import            Lars-Dominik Braun  1      -1/+2
2020-04-25  Improve lulua-write                         Lars-Dominik Braun  1      -86/+206
            Introduce composable filters, switch to brotli-compressed tarballs, which has good ratios and fast decompression, reducing I/O significantly.
2020-03-15  text: Add actual text to debugging output   Lars-Dominik Braun  1      -0/+2
2020-03-15  text: Fix newlines                          Lars-Dominik Braun  1      -2/+3
            Replace Windows-style newlines with Unix newlines. There were quite a few stray \r’s in the stats. This takes care of them.
2019-11-17  Add more tests                              Lars-Dominik Braun  1      -17/+19
2019-11-16  Add OpenStreetMap label corpus              Lars-Dominik Braun  1      -0/+5
            Extract node labels (name:ar) from OpenStreetMap’s planet dump. Heavily leans towards a few common words (“street”, obviously), but we should be fine since the corpus is not that large.
2019-11-08  Add OpenSubtitles corpus                    Lars-Dominik Braun  1      -0/+18
            See issue #5.
2019-11-06  text: Add TEI.2 parser                      Lars-Dominik Braun  1      -1/+27
2019-10-03  text: Add epub reader and hindawi corpus    Lars-Dominik Braun  1      -21/+50
            See issue #5.
2019-10-03  text: Fail if workers die                   Lars-Dominik Braun  1      -29/+39
2019-09-17  Initial import                              Lars-Dominik Braun  1      -0/+260