لوحة مفاتيح عربية

Ergonomic Arabic Keyboard Layout

لؤلؤة

This is a work in progress, and contributions are welcome. Head over to GitHub to see where you can help.

Goals

  • Ergonomic typing of unvocalized and vocalized text with 10 fingers
  • Modern Standard Arabic and Quranic Arabic
  • Localized numbers (European/Arabic-Indic)
  • Usable as primary or secondary keyboard
  • Compose-based
  • Support for Markdown, RST, Wikitext and similar markup

Other languages using the Arabic alphabet (regional dialects, Urdu, Persian) are explicitly not supported.

Usage

  • Windows: Download the driver and follow the instructions in INSTALL.txt
  • Android: Install AnySoftKeyboard and Arabic for AnySoftKeyboard
  • Linux: Run xmodmap ar-lulua.xmodmap

The Arabic alphabet

The Arabic alphabet has 28 letters, but proper text input requires quite a few extra symbols, such as the hamza in its different shapes أ إ آ ء ئ ؤ, ta marbutah ة, alif maqsurah ى and various diacritics for vowelized texts. Since the performance of a keyboard layout depends on the text entered, its mono-, di- and trigraph frequencies have to be studied first. The novel corpus built for the following analysis consists of:

Source                                                   Words           Characters
Al-Jazeera (547,110 articles)                            189.4 million   1.2 billion
Arabic Wikipedia (857,386 articles)                      110.8 million   702.2 million
BBC Arabic (149,901 articles)                            36.9 million    233.3 million
hindawi.org (1,709 books)                                68.5 million    498.2 million
OpenStreetMap Arabic Labels (376,148 labels)             910.1 thousand  5.5 million
OPUS OpenSubtitles 2018 (94,093 movies)                  363.9 million   2.1 billion
tanzil.net Quran                                         78.2 thousand   703.8 thousand
United Nations Parallel Corpus v1.0 (116,754 documents)  424.3 million   2.9 billion
Total                                                    1.2 billion     7.7 billion

The chosen Quran representation, like the other datasets, does not include all Quranic diacritization symbols. This makes the comparison fairer, since most keyboards presented below do not include any of them.

The plot below shows ا ل ي م و ن can be considered the most frequently used letters in the Arabic language. Together they account for more than 55% of all letters in the corpus.

Figure: Arabic letter frequency distribution
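
The frequency analysis described above can be sketched roughly as follows. This is a minimal illustration, not the tooling used for the corpus: the function names are hypothetical, n-grams are assumed to be counted within words only, and only code points U+0621 to U+064A (minus the tatweel) are treated as letters, so diacritics are ignored.

```python
from collections import Counter


def is_arabic_letter(ch):
    # Assumption: only the basic letter range U+0621..U+064A counts as a
    # letter; the tatweel (U+0640) and all diacritics are skipped.
    return "\u0621" <= ch <= "\u064a" and ch != "\u0640"


def ngram_counts(text, n):
    """Count letter n-grams (n=1 mono-, n=2 di-, n=3 trigraphs) inside words."""
    counts = Counter()
    for word in text.split():
        letters = [ch for ch in word if is_arabic_letter(ch)]
        for i in range(len(letters) - n + 1):
            counts["".join(letters[i:i + n])] += 1
    return counts


def top_share(counts, k):
    """Fraction of all counted n-grams covered by the k most frequent ones."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return sum(c for _, c in counts.most_common(k)) / total


# A tiny sample sentence stands in for the corpus here.
sample = "اللغة العربية هي أكثر اللغات السامية"
mono = ngram_counts(sample, 1)
di = ngram_counts(sample, 2)
print(mono.most_common(6))
print(top_share(mono, 6))
```

Run over the full corpus, `top_share(mono, 6)` would be the quantity behind the "more than 55%" statement; on a single sentence the value is of course not representative.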

Layout properties

Throughout this evaluation color coding is used to identify fingers:

  • red: little finger
  • blue: ring finger
  • magenta: middle finger
  • violet: index finger
  • cyan: thumb

Asymmetry is defined as the difference between left and right hand button usage \(b_{left/right}\) and includes the thumb:

$$a = \frac{b_{left}}{b_{total}} - \frac{b_{right}}{b_{total}}$$
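
As a small worked example, the formula can be evaluated on the per-finger shares reported for the proposed layout further down in this section. The sketch assumes that \(b_{total}\) is simply the sum of left-hand and right-hand presses; the function name is illustrative.

```python
def asymmetry(left_presses, right_presses):
    """a = b_left / b_total - b_right / b_total, with b_total = b_left + b_right."""
    total = left_presses + right_presses
    if total == 0:
        raise ValueError("no key presses recorded")
    return (left_presses - right_presses) / total


# Per-finger shares in percent as reported for the proposed layout,
# thumbs included.
left_fingers = [6.34, 6.04, 16.19, 13.56, 7.74]
right_fingers = [7.79, 18.79, 10.60, 6.14, 6.81]

a = asymmetry(sum(left_fingers), sum(right_fingers))
print(f"{a:.3f}")
```

A negative value means the right hand does slightly more work; a value of zero would be a perfectly balanced layout.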

The proposed layout uses four layers and assumes a 102/105-key ISO keyboard, common in Europe, to accommodate the necessary shift keys. These are, in order: Shift on the left and right; Caps Lock on the left and the rightmost key in the middle row; the key to the right of the left Shift key; and the key labeled Alt Gr to the right of the spacebar. Symbols are assigned to the four layers by their function: characters, punctuation, diacritics, other.

The first layer was optimized using an extended reimplementation of carpalx. From several runs with 100,000 iterations each, the layout that scored well and looked reasonable to the human eye was picked. Afterwards the second layer was optimized using the same process, but with data from the Hindawi corpus only, because it is the only one with at least some fully diacriticised texts. Finally, the different brackets were arranged by hand and the remaining symbols were distributed algorithmically on the third layer using the raw Wikitext from the Arabic Wikipedia dataset.
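
To give an idea of what such an optimization run looks like: carpalx describes its search as simulated annealing over key swaps. The sketch below shows that general technique only. The effort model (letter frequency times a per-key cost), the linear cooling schedule, and all names and numbers are assumptions made for illustration; they are not taken from carpalx or from the reimplementation used here.

```python
import math
import random


def effort(layout, freq, key_cost):
    # Hypothetical effort model: each letter's frequency weighted by the
    # cost of the key position it currently occupies.
    return sum(freq[letter] * key_cost[pos] for pos, letter in enumerate(layout))


def anneal(layout, freq, key_cost, iterations=100_000, t0=1.0, seed=0):
    """Swap two keys per iteration; always accept improvements and accept
    worse layouts with a probability that shrinks as the temperature drops."""
    rng = random.Random(seed)
    current = list(layout)
    current_effort = effort(current, freq, key_cost)
    best, best_effort = list(current), current_effort
    for step in range(iterations):
        temperature = t0 * (1 - step / iterations)  # assumed linear cooling
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        candidate = effort(current, freq, key_cost)
        delta = candidate - current_effort
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current_effort = candidate
            if candidate < best_effort:
                best, best_effort = list(current), candidate
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_effort


# Toy input: six letters, made-up weights and made-up key costs.
letters = ["ن", "و", "م", "ي", "ل", "ا"]
toy_freq = {"ا": 6, "ل": 5, "ي": 4, "م": 3, "و": 2, "ن": 1}
toy_cost = [1.0, 1.5, 2.0, 2.5, 3.0, 3.5]

start_effort = effort(letters, toy_freq, toy_cost)
best_layout, best_effort = anneal(letters, toy_freq, toy_cost, iterations=2000)
print(start_effort, best_effort, best_layout)
```

Several runs with different seeds produce different candidates, which matches the approach of picking one result by score and by eye rather than trusting a single run.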

The most frequent letters have all been assigned to the home row, which makes them easily accessible. ا and ل are typed with different hands, which balances the load between the hands almost evenly. The index and middle fingers of both hands carry the majority of the typing load, but naturally the left middle finger is used more frequently due to its assignment to the letter alif.

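Figure: finger and hand usage of the proposed layout (per-finger shares in %, in the order shown in the figure)

Left hand: 49.87% (6.34, 6.04, 16.19, 13.56, 7.74)
Right hand: 50.13% (7.79, 18.79, 10.60, 6.14, 6.81)
Asymmetry: -0.003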

Examples:

  • أَوْ كَصَيِّبٍ مِّنَ السَّمَاءِ فِيهِ ظُلُمَاتٌ وَرَعْدٌ وَبَرْقٌ يَجْعَلُونَ أَصَابِعَهُمْ فِي آذَانِهِم مِّنَ الصَّوَاعِقِ حَذَرَ الْمَوْتِ وَاللَّهُ مُحِيطٌ بِالْكَافِرِينَ
  • اللغة العربية هي أكثرُ اللغاتِ السامية تحدثاً، وإحدى أكثر اللغات انتشاراً في العالم، يتحدثُها أكثرُ من ٤٦٧ مليون نسمة.

Acknowledgements

This work would not have been possible without Martin Krzywinski’s work on carpalx. Neo Layout’s kbdneo2 Windows driver proved to be an invaluable source for understanding the undocumented nature of Windows keyboard driver development.