This is work in progress and contributions are welcome. Head over to GitHub to see where you can help.
Other languages using the Arabic alphabet (regional dialects, Urdu, Persian) are explicitly not supported.
The Arabic alphabet consists of 28 letters, but proper text input requires quite a few additional symbols, such as the hamza in its different forms أ إ آ ء ئ ؤ, ta marbutah ة, alif maqsurah ى and various diacritics for vowelized text. Since the performance of a keyboard layout depends on the text entered, the mono-, di- and trigraph frequencies of that text have to be studied first. The novel corpus built for the following analysis consists of these sources:
| Source | Size | Words | Characters |
|---|---|---|---|
| Al-Jazeera | 547,110 articles | 187.6 million | 1.2 billion |
| Arabic Wikipedia | 857,386 articles | 110.3 million | 700.6 million |
| BBC Arabic | 149,901 articles | 36.3 million | 232.1 million |
| hindawi.org | 1,709 books | 68.0 million | 498.2 million |
| OpenStreetMap Arabic Labels | 376,148 labels | 909.9 thousand | 5.5 million |
| OPUS OpenSubtitles 2018 | 94,093 movies | 361.4 million | 2.1 billion |
| tanzil.net Quran | | 72.0 thousand | 703.8 thousand |
| United Nations Parallel Corpus v1.0 | 116,754 documents | 423.1 million | 2.9 billion |
| Total | | 1.2 billion | 7.6 billion |
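Counting mono-, di- and trigraphs amounts to sliding a window of length 1, 2 or 3 over the letters of each word. The sketch below is a minimal illustration under stated assumptions, not the project's own tooling: it treats the code points U+0621 to U+064A as letters, strips the harakat (U+064B to U+0652) and tatweel (U+0640) so that vowelized and unvowelized spellings give the same letter n-grams, and does not let n-grams cross word boundaries. Diacritics themselves would need a separate count; the function names are hypothetical.

```python
import re
from collections import Counter

# Assumption: harakat (U+064B..U+0652) and tatweel (U+0640) are removed first,
# so that vowelized and unvowelized spellings yield the same letter n-grams.
STRIP = re.compile(r"[\u064B-\u0652\u0640]")
# Assumption: a letter is any code point in U+0621..U+064A; everything else
# (spaces, digits, punctuation) ends the current word.
LETTER_RUN = re.compile(r"[\u0621-\u064A]+")


def ngram_counts(text: str, n: int) -> Counter:
    """Count n-grams of consecutive letters without crossing word boundaries."""
    counts = Counter()
    for run in LETTER_RUN.findall(STRIP.sub("", text)):
        for i in range(len(run) - n + 1):
            counts[run[i:i + n]] += 1
    return counts


def relative_frequencies(counts: Counter) -> dict[str, float]:
    """Normalize raw counts so that all frequencies sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}
```

Calling `ngram_counts(text, 1)`, `ngram_counts(text, 2)` and `ngram_counts(text, 3)` yields mono-, di- and trigraph counts; for a corpus of this size one would feed it article by article and add the resulting `Counter` objects together rather than load all text at once.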
The plot below shows that ا ل ي م و ن are the most frequently used letters in Arabic. Together they account for more than 55% of all letters in the corpus.
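The 55% figure is a cumulative share: the summed counts of the six most frequent letters divided by the total letter count. The helper below is a hypothetical sketch of that computation on monograph counts such as those produced by a `Counter`; the toy numbers in the usage note are made up for illustration and are not corpus data.

```python
from collections import Counter


def top_k_share(counts: Counter, k: int) -> tuple[list[str], float]:
    """Return the k most frequent letters and their combined share of all letters."""
    total = sum(counts.values())
    top = counts.most_common(k)
    return [letter for letter, _ in top], sum(c for _, c in top) / total
```

For example, with the toy counts `Counter({"ا": 40, "ل": 30, "ي": 20, "م": 10})`, `top_k_share(counts, 2)` returns the letters ا and ل with a combined share of 0.7. Applied to the corpus monograph counts with `k=6`, the same ratio is what the 55% statement refers to.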
Throughout this evaluation, color coding is used to identify fingers.
Asymmetry is defined as the difference between left- and right-hand usage.
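One way to make this definition concrete is to weight every letter by its frequency, sum the weights per hand, and take the normalized difference. The sketch below assumes that reading: a signed value where positive means the left hand does more work, and letters without a hand assignment are ignored. The function name and the `"L"`/`"R"` labels are hypothetical, and the example assignment is illustrative rather than the proposed layout.

```python
def hand_asymmetry(freqs: dict[str, float], hand_of: dict[str, str]) -> float:
    """Left-hand share minus right-hand share of all keystrokes, in [-1, 1].

    Assumption: hand_of maps a letter to "L" or "R"; unmapped letters are ignored.
    """
    left = sum(f for letter, f in freqs.items() if hand_of.get(letter) == "L")
    right = sum(f for letter, f in freqs.items() if hand_of.get(letter) == "R")
    total = left + right
    if total == 0:
        raise ValueError("no letter in freqs is assigned to a hand")
    return (left - right) / total
```

With the hypothetical assignment `{"ا": "L", "ل": "R"}` and counts `{"ا": 3, "ل": 1}`, the asymmetry is (3 − 1) / 4 = 0.5; a perfectly balanced layout scores 0.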
The proposed layout uses four layers and assumes a 102/105-key ISO keyboard, common in Europe, to accommodate the necessary shift keys (ANSI keyboards lack the key to the right of the left Shift key). These are, in order: Shift on the left and right; Caps Lock on the left and the rightmost key of the middle row; the key to the right of the left Shift key and the key labeled AltGr to the right of the spacebar. Symbols are assigned to the four layers by function: characters, punctuation, diacritics and other.
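The layer scheme can be pictured as a lookup from held modifiers to an active layer. The sketch below rests on two assumptions not spelled out in the text: that each of the three modifier groups selects one layer in the order listed (Shift for layer 2, Caps Lock or the rightmost middle-row key for layer 3, the extra ISO key or AltGr for layer 4), and that the layer functions follow the same order. The key names are hypothetical, and the rule for several groups held at once (highest layer wins) is an arbitrary choice for illustration.

```python
from enum import Enum


class Layer(Enum):
    # Assumption: layer functions follow the order given in the text.
    CHARACTERS = 1
    PUNCTUATION = 2
    DIACRITICS = 3
    OTHER = 4


# Hypothetical key names; each modifier pair selects one layer.
MODIFIER_LAYER = {
    "shift_left": Layer.PUNCTUATION,
    "shift_right": Layer.PUNCTUATION,
    "caps_lock": Layer.DIACRITICS,
    "middle_row_rightmost": Layer.DIACRITICS,
    "iso_extra_key": Layer.OTHER,  # the key to the right of the left Shift
    "altgr": Layer.OTHER,
}


def active_layer(held: set[str]) -> Layer:
    """Return the layer selected by the held modifiers (base layer if none).

    Assumption: if modifiers of different groups are held, the highest layer wins.
    """
    layers = [MODIFIER_LAYER[key] for key in held if key in MODIFIER_LAYER]
    return max(layers, key=lambda layer: layer.value, default=Layer.CHARACTERS)
```

Pairing each modifier on both sides of the keyboard lets either hand reach every layer, which is presumably why the scheme needs the extra ISO key.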
The first layer was optimized using an extended reimplementation of carpalx. From several runs of 100,000 iterations each, the layout that scored well and also looked reasonable to the human eye was picked. The optimal arrangement of layers two and up is still under investigation.
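carpalx searches for good layouts with simulated annealing: it repeatedly swaps two keys, always keeps improvements, and keeps worse layouts with a probability that shrinks as a temperature parameter cools. The sketch below shows only that search loop under simplifying assumptions. Its cost function is a toy (frequency times a hypothetical per-key effort value) rather than carpalx's effort model or the project's extended version, and the exponential cooling schedule with parameters `t0` and `k` is an assumed choice. A layout is represented as a list mapping key position to letter.

```python
import math
import random


def layout_cost(layout: list[str], freqs: dict[str, float], key_effort: list[float]) -> float:
    """Toy cost: frequency-weighted effort of the key each letter sits on."""
    return sum(freqs.get(letter, 0.0) * key_effort[pos] for pos, letter in enumerate(layout))


def anneal(layout, freqs, key_effort, iterations=100_000, t0=1.0, k=10.0, seed=0):
    """Simulated annealing over pairwise key swaps; returns the best layout seen."""
    rng = random.Random(seed)
    current = list(layout)
    cost = layout_cost(current, freqs, key_effort)
    best, best_cost = current[:], cost
    for i in range(iterations):
        # Exponential cooling from t0 towards t0 * exp(-k).
        temperature = t0 * math.exp(-k * i / iterations)
        a, b = rng.sample(range(len(current)), 2)
        current[a], current[b] = current[b], current[a]
        new_cost = layout_cost(current, freqs, key_effort)
        delta = new_cost - cost
        # Accept improvements always, worse layouts with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            cost = new_cost
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[a], current[b] = current[b], current[a]  # reject: undo the swap
    return best, best_cost
```

Running `anneal` several times with different seeds mirrors the procedure described above: each run returns a candidate, and the final choice among the well-scoring candidates can still be made by eye.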
The most frequent letters have all been assigned to the home row, which keeps them within easy reach. ا and ل are typed with different hands, which balances the load between the hands almost evenly. The index and middle fingers of both hands carry most of the typing load, and the left middle finger is used most frequently because it is assigned the letter alif.
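Per-finger load, as discussed above, is the same weighted sum as the hand asymmetry, only grouped by finger instead of by hand. The sketch below assumes letter frequencies and a letter-to-finger mapping as inputs; the finger labels and the example assignment are hypothetical and only loosely follow the text (alif on the left middle finger, ل on the other hand).

```python
from collections import defaultdict


def finger_load(freqs: dict[str, float], finger_of: dict[str, str]) -> dict[str, float]:
    """Share of all keystrokes typed by each finger.

    Assumption: letters missing from finger_of are ignored.
    """
    load = defaultdict(float)
    for letter, f in freqs.items():
        if letter in finger_of:
            load[finger_of[letter]] += f
    total = sum(load.values())
    return {finger: v / total for finger, v in load.items()}
```

With the hypothetical assignment `{"ا": "left_middle", "ل": "right_index", "ي": "right_middle", "م": "left_index"}` and toy counts `{"ا": 4, "ل": 3, "ي": 2, "م": 1}`, the left middle finger carries 40% of the keystrokes, the largest share.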