author | Lars-Dominik Braun <lars@6xq.net> | 2019-10-03 17:23:53 +0200 |
---|---|---|
committer | Lars-Dominik Braun <lars@6xq.net> | 2019-10-03 17:23:53 +0200 |
commit | 2d45ef655f8791037373ab83174fc6c3596227b0 (patch) | |
tree | a05d506928fcc16f8dfdddb860c6ce4c5193bfc4 /doc/index.html | |
parent | 8048f6351fb4611134c2f6e2d9129ec025376914 (diff) | |
download | lulua-2d45ef655f8791037373ab83174fc6c3596227b0.tar.gz lulua-2d45ef655f8791037373ab83174fc6c3596227b0.tar.bz2 lulua-2d45ef655f8791037373ab83174fc6c3596227b0.zip |
text: Add epub reader and hindawi corpus
See issue #5.
Diffstat (limited to 'doc/index.html')
-rw-r--r-- | doc/index.html | 6 |
1 files changed, 4 insertions, 2 deletions
diff --git a/doc/index.html b/doc/index.html
index f9daf88..6749647 100644
--- a/doc/index.html
+++ b/doc/index.html
@@ -129,13 +129,15 @@
  dump</a> of the <a href="https://ar.wikipedia.org/">Arabic Wikipedia</a> as of
  July 2019, extracted using <a
  href="https://github.com/attardi/wikiextractor/tree/3162bb6c3c9ebd2d15be507aa11d6fa818a454ac">wikiextractor</a>
- containing 857386 articles</li>
+ containing 857,386 articles</li>
+ <li>1,709 ebooks from <a
+ href="https://www.hindawi.org/books">hindawi.org</a></li>
  <li>and a plain-text copy of the Quran from <a
  href="http://tanzil.net/docs/download">tanzil.net</a> using the
  options Simple Enhanced and Text (for inclusion of diacritics)</li>
  </ul>
  <p>
- summing up to roughly 1.5 billion characters.
+ summing up to roughly two billion characters.
  <!-- -->
  The plot below shows <bdo dir="ltr" lang="ar">ا ل ي م و ن</bdo> can be
  considered the most frequently used letters in the Arabic language.
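The documentation text changed in this diff quotes a total character count and a ranking of the most frequent letters. As a rough illustration of how such figures could be derived from plain text, here is a minimal sketch. It is not taken from the lulua code base: the function name `letter_frequencies` and the two sample strings are hypothetical, and treating every code point in a Unicode letter category as a "letter" is an assumption.

```python
from collections import Counter
import unicodedata


def letter_frequencies(texts):
    """Return (total characters, per-letter counts) for an iterable of strings.

    Assumption: a "letter" is any code point whose Unicode general
    category starts with "L"; spaces, digits and diacritics are only
    included in the total, not in the per-letter counts.
    """
    counts = Counter()
    total = 0
    for text in texts:
        total += len(text)
        counts.update(
            ch for ch in text if unicodedata.category(ch).startswith("L")
        )
    return total, counts


# Hypothetical two-line sample; the corpora named in the diff are far larger.
total, counts = letter_frequencies(["السلام عليكم", "اللغة العربية"])
top = [letter for letter, _ in counts.most_common(2)]
```

For corpora of the size mentioned in the diff, the same function can be fed a generator that yields one line or one document at a time, so the full text never has to be held in memory.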