This is a work in progress, and contributions are welcome. Head over to GitHub to see where you can help.
Regional Arabic dialects and other languages written in the Arabic script (such as Urdu and Persian) are explicitly not supported.
There are 28 letters in the Arabic alphabet, plus quite a few extra symbols required for proper text input, such as the hamza in its different forms أ إ آ ء ئ ؤ, ta marbutah ة, alif maqsurah ى, and various diacritics for vowelized text. Since the usability of a keyboard layout depends on the text being entered, letter and letter-combination frequencies have to be studied first. The corpus used for the following analysis consists of:
| Source | Size | Words | Characters |
|---|---|---|---|
| Al-Jazeera | 547,110 articles | 187.6 million | 1.2 billion |
| Arabic Wikipedia | 857,386 articles | 110.3 million | 700.6 million |
| BBC Arabic | 149,901 articles | 36.3 million | 232.1 million |
| hindawi.org | 1,709 books | 68.0 million | 498.2 million |
| OpenStreetMap Arabic Labels | 376,148 labels | 909.9 thousand | 5.5 million |
| OPUS OpenSubtitles 2018 | 94,093 movies | 361.4 million | 2.1 billion |
| tanzil.net Quran | | 72.0 thousand | 703.8 thousand |
| United Nations Parallel Corpus v1.0 | 116,754 documents | 423.1 million | 2.9 billion |
| Total | | 1.2 billion | 7.6 billion |
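The frequency study starts from raw counts of letters and adjacent letter pairs. Below is a minimal sketch of such a count, assuming plain Unicode input. The letter inventory (the 28 base letters plus أ إ آ ء ئ ؤ ة ى, taken from two ranges of the Unicode Arabic block), dropping diacritics and tatweel, and counting bigrams only inside whitespace-separated words are all assumptions of this sketch, not necessarily how the corpus above was processed; `letter_frequencies` is a hypothetical name.

```python
from collections import Counter

# Assumed letter inventory from the Unicode Arabic block:
#   U+0621..U+063A  hamza forms, alif ... ghain (incl. ta marbutah U+0629)
#   U+0641..U+064A  fa ... ya (incl. alif maqsurah U+0649)
# That is the 28 base letters plus أ إ آ ء ئ ؤ ة ى (36 characters).
# Tatweel (U+0640), U+063B..U+063F and diacritics (U+064B onward) are excluded.
LETTERS = frozenset(
    [chr(cp) for cp in range(0x0621, 0x063B)]
    + [chr(cp) for cp in range(0x0641, 0x064B)]
)


def letter_frequencies(text: str) -> tuple[Counter, Counter]:
    """Count single letters and adjacent letter pairs (bigrams) in `text`.

    Bigrams are only counted inside whitespace-separated words.
    """
    letters: Counter = Counter()
    bigrams: Counter = Counter()
    for word in text.split():
        # Keep only Arabic letters; punctuation, digits and diacritics are dropped,
        # so letters separated only by a diacritic still form a bigram.
        chars = [c for c in word if c in LETTERS]
        letters.update(chars)
        bigrams.update(a + b for a, b in zip(chars, chars[1:]))
    return letters, bigrams


letters, bigrams = letter_frequencies("السلام عليكم")
print(letters.most_common(3))
print(bigrams.most_common(3))
```

Run over every text in the corpus, the two counters give the raw material for frequency plots and for judging which letter pairs should be easy to type in sequence.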
The plot below shows that ا ل ي م و ن are the most frequently used letters in Arabic. Together they account for more than 55% of all letters in the corpus.
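The 55% figure is a cumulative share: the summed counts of the six most frequent letters divided by the total letter count. A small helper for that calculation is sketched below, assuming the counts are held in a `collections.Counter`; the name `top_share` is hypothetical.

```python
from collections import Counter


def top_share(counts: Counter, n: int) -> float:
    """Fraction of all counted letters covered by the n most frequent ones."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in counts.most_common(n))
    return top / total


# Toy counts: the two most frequent letters cover 8 of 10 occurrences.
print(top_share(Counter({"ا": 5, "ل": 3, "ب": 2}), 2))  # 0.8
```

With corpus-wide counts, the statement above corresponds to `top_share(counts, 6) > 0.55`.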
The following evaluation uses color coding to identify fingers:
Asymmetry is defined as the difference between left-hand and right-hand usage, i.e. between the shares of keystrokes typed by each hand.
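One way to compute this is sketched below. It assumes usage is measured as keystroke counts and reports the left-hand share minus the right-hand share, so positive values mean the left hand does more work; this sign convention is an assumption. The character-to-hand mapping `hand_of` is hypothetical and would come from the layout being evaluated.

```python
from collections import Counter


def hand_asymmetry(counts: Counter, hand_of: dict[str, str]) -> float:
    """Left-hand share minus right-hand share of all keystrokes.

    `hand_of` maps each character to "left" or "right" (hypothetical layout data).
    Characters missing from the mapping are ignored.
    """
    left = sum(c for ch, c in counts.items() if hand_of.get(ch) == "left")
    right = sum(c for ch, c in counts.items() if hand_of.get(ch) == "right")
    total = left + right
    if total == 0:
        return 0.0
    return (left - right) / total


# Toy example: 6 left-hand and 4 right-hand keystrokes.
print(hand_asymmetry(Counter({"a": 6, "b": 4}), {"a": "left", "b": "right"}))  # 0.2
```

A perfectly balanced layout scores 0, and a layout that puts every frequent letter on one hand approaches +1 or -1.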
While technically not a layout but an alternative input method, Intellark by Intellaren is worth mentioning. It is based on repeatedly pressing the same key to modify the current character. For example, pressing A on a QWERTY keyboard cycles through the alternatives ا أ إ آ and ء. Obviously this is slow, error-prone, and violates Dvorak’s guidelines for keyboard layout design.
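The cycling principle can be sketched as follows. Only the A key is modelled, using the cycle order quoted above. Wrapping back to ا after ء and the names `CYCLES`, `multitap` and `presses_for` are assumptions of this sketch, not Intellark's actual behavior or API.

```python
# Cycle order for the A key as quoted above: ا أ إ آ ء
# (written as escapes so the exact code points are unambiguous).
CYCLES = {
    "a": ["\u0627", "\u0623", "\u0625", "\u0622", "\u0621"],
}


def multitap(key: str, presses: int) -> str:
    """Character produced after pressing `key` `presses` times in a row.

    Assumes the cycle wraps around after its last entry.
    """
    if presses < 1:
        raise ValueError("presses must be at least 1")
    cycle = CYCLES[key]
    return cycle[(presses - 1) % len(cycle)]


def presses_for(char: str) -> int:
    """Number of consecutive presses needed to produce `char`."""
    for cycle in CYCLES.values():
        if char in cycle:
            return cycle.index(char) + 1
    raise KeyError(char)


print(multitap("a", 2))       # أ
print(presses_for("\u0621"))  # 5
```

Reaching ء costs five presses of the same key in this model, whereas a layout with a dedicated key needs one, which illustrates the speed penalty.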