Lulua (لؤلؤة): Arabic keyboard layout

Goals

Ergonomic typing of unvocalized and vocalized text with ten fingers
Modern Standard Arabic and Quranic Arabic
Localized digits (European and Arabic-Indic)
Usable as a primary or secondary keyboard layout
Compose-based: hamza is entered as a combining diacritic
Support for Markdown, reStructuredText, Wikitext and similar markup

Other languages and varieties written in the Arabic script (Urdu, Persian, regional dialects of Arabic) are explicitly not supported.

Usage

Printout

PDF

Windows

Download the driver and follow the instructions in INSTALL.txt.

Android

Install AnySoftKeyboard and the Arabic for AnySoftKeyboard add-on.

Linux

Run: xmodmap ar-lulua.xmodmap

Klavaro

A touch typing tutor. Copy the custom keyboard file to %localappdata%\klavaro (Windows) or ~/.config/klavaro/ (Linux), then select Keyboard → (Custom) and choose lulua_ar.

The Arabic alphabet

The Arabic alphabet consists of 28 letters, but quite a few additional symbols are required for proper text input, such as the hamza in its different forms أ إ آ ء ئ ؤ, the ta marbuta ة, the alif maqsura ى and various diacritics for vocalized text. Since the performance of a keyboard layout depends on the text being entered, it is necessary to first study the mono-, di- and trigraph frequencies of that text. The corpus newly built for this analysis consists of the following sources:

Corpus composition: for each source, the number of items and the approximate number of words and characters, followed by the corpus total.
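
The mono-, di- and trigraph frequencies mentioned above can be counted with a sliding window over the text. The following Python sketch only illustrates that idea and is not the project's own analysis code; the helper names `ngram_frequencies` and `relative` are hypothetical, and the sketch counts every character, including spaces and punctuation.

```python
from collections import Counter


def ngram_frequencies(text: str, n: int) -> Counter:
    """Count every overlapping run of n consecutive characters (hypothetical helper)."""
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def relative(counts: Counter) -> dict:
    """Turn absolute counts into relative frequencies that sum to 1."""
    total = sum(counts.values())
    return {gram: c / total for gram, c in counts.items()}


if __name__ == "__main__":
    sample = "السلام عليكم"
    for n in (1, 2, 3):  # mono-, di- and trigraphs
        print(n, ngram_frequencies(sample, n).most_common(3))
```

For a real analysis the counts would be accumulated over the whole corpus; whether spaces, digits and diacritics are kept depends on which symbols the layout is meant to cover.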

Like other datasets, the chosen Quran representation does not include all Quranic diacritization symbols. This makes the comparison fairer, since most of the keyboard layouts presented below include none of them.

The plot below shows that ا ل ي م و ن can be considered the most frequently used letters of the Arabic language: together they account for more than 55% of all letters in the corpus.

Arabic letter frequency distribution
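
The statement that six letters cover more than half of all letters is a cumulative share of the top entries in the letter histogram. A minimal sketch, assuming "letter" means any character for which Python's `str.isalpha()` is true (this excludes the diacritics, which are combining marks, and counts the standalone hamza ء as a letter); the function name `top_k_share` is hypothetical:

```python
from __future__ import annotations

from collections import Counter


def top_k_share(text: str, k: int) -> tuple[list[str], float]:
    """Return the k most frequent letters and the fraction of all letters they cover."""
    # Spaces, digits, punctuation and combining diacritics are not alphabetic.
    letters = Counter(ch for ch in text if ch.isalpha())
    total = sum(letters.values())
    if total == 0:
        return [], 0.0
    top = letters.most_common(k)
    return [ch for ch, _ in top], sum(count for _, count in top) / total


if __name__ == "__main__":
    letters, share = top_k_share("السلام عليكم", 1)
    print(letters, round(share, 3))  # ['ل'] 0.273
```

Applied to the whole corpus with k = 6, this is the kind of computation behind the 55% figure above.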

Layout properties

Throughout this evaluation, colors identify the fingers:

red: little finger
blue: ring finger
magenta: middle finger
violet: index finger
cyan: thumb

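Asymmetry is defined as the difference between the shares of button presses made by the left and the right hand, $b_{left}$ and $b_{right}$, relative to the total number of presses $b_{total}$; thumb presses are included: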

$$a = \frac{b_{left}}{b_{total}} - \frac{b_{right}}{b_{total}}$$
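The definition translates directly into code. This sketch assumes that every button press, including thumb presses on the space bar, is attributed to exactly one hand, so that $b_{total} = b_{left} + b_{right}$; the function name `asymmetry` is hypothetical. The result lies between -1 (right hand only) and 1 (left hand only), and 0 means a perfectly balanced load.

```python
def asymmetry(b_left: int, b_right: int) -> float:
    """a = b_left / b_total - b_right / b_total, assuming b_total = b_left + b_right."""
    b_total = b_left + b_right
    if b_total == 0:
        raise ValueError("no button presses recorded")
    return b_left / b_total - b_right / b_total


if __name__ == "__main__":
    print(f"{asymmetry(5200, 4800):.3f}")  # 0.040
```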

The proposed layout uses four layers, selected with shift keys in a way inspired by Neo2. It therefore assumes a 102/105-key ISO keyboard, which is common in Europe but also available in Arab countries, to accommodate the necessary shift keys. These are, in order: Shift on the left and right; Caps Lock on the left and the rightmost key of the middle row; and the key to the right of the left Shift key together with the key labeled AltGr to the right of the space bar. Symbols are assigned to the four layers by function: letters, punctuation, diacritics and everything else.

Apple, for instance, offers an Arabic hardware keyboard with this physical layout. However, both variants, 101/104-key and 102/105-key devices, appear to be in use in the Arab world.
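
The relation between shift keys and layers described above can be written down as a small lookup table. The key names below are hypothetical labels for the physical keys, not identifiers used by any driver, and the rule applied when several shift keys are held at once (take the highest layer) is an arbitrary choice made only for this sketch.

```python
from enum import IntEnum


class Layer(IntEnum):
    LETTERS = 1      # base layer: the alphabet plus the hamza diacritic
    PUNCTUATION = 2
    DIACRITICS = 3
    OTHER = 4


# Hypothetical names for the physical shift keys, in the order given above.
MODIFIER_LAYER = {
    "shift_left": Layer.PUNCTUATION,
    "shift_right": Layer.PUNCTUATION,
    "caps_lock": Layer.DIACRITICS,
    "home_row_rightmost": Layer.DIACRITICS,
    "right_of_left_shift": Layer.OTHER,
    "altgr": Layer.OTHER,
}


def active_layer(held_keys: set) -> Layer:
    """Return the layer selected by the currently held keys (non-modifiers are ignored)."""
    layers = [MODIFIER_LAYER[key] for key in held_keys if key in MODIFIER_LAYER]
    return max(layers, default=Layer.LETTERS)


if __name__ == "__main__":
    print(active_layer({"caps_lock"}).name)  # DIACRITICS
```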

The first layer was optimized using an extended reimplementation of carpalx. From several runs with 100,000 iterations each, a layout that scored well and looked reasonable to the human eye was picked. Afterwards, the third layer was optimized using the same process, but only with data from the Hindawi corpus, because it is the only one containing at least some fully diacritized texts. Finally, the different brackets were arranged by hand and the remaining symbols were distributed algorithmically on the second layer using the raw Wikitext from the Arabic Wikipedia dataset.

Arranging brackets by hand is common practice, because most optimization algorithms ignore the human desire for symmetry.
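
carpalx searches for good layouts with simulated annealing: it repeatedly swaps keys, always accepts improvements, and accepts deteriorations with a probability that shrinks as a temperature parameter cools down. The sketch below illustrates only that search strategy. It is not the extended reimplementation used for this layout; the effort table `POSITION_EFFORT` is a toy stand-in for carpalx's far more detailed typing-effort model, and the letter weights are illustrative values, not corpus measurements.

```python
import math
import random

# Toy effort per key position (lower is easier), one entry per key.
# Stand-in for carpalx's much richer typing-effort model.
POSITION_EFFORT = [1.0, 1.0, 1.2, 1.2, 2.0, 2.0, 3.0, 3.0]


def layout_cost(layout, weights):
    """Sum over keys of letter weight times the effort of the key's position."""
    return sum(weights.get(ch, 0.0) * POSITION_EFFORT[i] for i, ch in enumerate(layout))


def anneal(layout, weights, iterations=10_000, t_start=1.0, t_end=1e-3, seed=0):
    """Simulated annealing over key swaps; returns the best layout seen and its cost."""
    rng = random.Random(seed)
    current = list(layout)
    current_cost = layout_cost(current, weights)
    best, best_cost = current[:], current_cost
    for step in range(iterations):
        # Exponential cooling from t_start towards t_end.
        t = t_start * (t_end / t_start) ** (step / iterations)
        i, j = rng.sample(range(len(current)), 2)
        current[i], current[j] = current[j], current[i]
        cost = layout_cost(current, weights)
        delta = cost - current_cost
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current_cost = cost  # keep the swap
            if cost < best_cost:
                best, best_cost = current[:], cost
        else:
            current[i], current[j] = current[j], current[i]  # undo the swap
    return best, best_cost


if __name__ == "__main__":
    # Illustrative weights only, not measured frequencies.
    weights = {"ا": 8, "ل": 7, "ي": 6, "م": 5, "و": 4, "ن": 3, "ر": 2, "ب": 1}
    start = list(weights)[::-1]  # rarest letters on the easiest keys
    best, cost = anneal(start, weights)
    print("".join(best), cost)
```

Tracking the best layout seen, rather than returning the final state, guarantees the result is never worse than the starting layout.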

The most frequent letters are all assigned to the home row, where they are easiest to reach. ا and ل are typed with different hands, which balances the load between the hands almost evenly. The index and middle fingers of both hands carry most of the typing load, but the left middle finger is used most, because it types the letter alif.
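
The hand and finger loads discussed here are simple proportions of button presses. A minimal sketch, assuming each press has already been mapped to a (hand, finger) pair by some layout model (that mapping and the function name `load_shares` are hypothetical): it returns each hand's share of all presses, each finger's share of all presses, and each finger's share within its own hand.

```python
from collections import Counter


def load_shares(presses):
    """Hand and finger load in percent.

    presses: iterable of (hand, finger) pairs, one per button press.
    """
    fingers = Counter(presses)
    hands = Counter()
    for (hand, _finger), n in fingers.items():
        hands[hand] += n
    total = sum(hands.values())
    hand_pct = {hand: 100 * n / total for hand, n in hands.items()}
    finger_pct = {key: 100 * n / total for key, n in fingers.items()}
    # Share of a finger relative to the presses of its own hand.
    within_hand_pct = {key: 100 * n / hands[key[0]] for key, n in fingers.items()}
    return hand_pct, finger_pct, within_hand_pct


if __name__ == "__main__":
    presses = [("left", "middle")] * 3 + [("left", "index")] + [("right", "index")] * 4
    print(load_shares(presses))
```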

The layout targets only Quranic and Modern Standard Arabic (MSA), also called Fusha (الفصحى). Dialectal Arabic (العامية) is mainly a spoken language, although this is changing with the rise of social media sites like Twitter and Facebook. For now, however, it is not an optimization target due to the lack of a good, representative corpus.

Designing the layout to be compose-based has both benefits and drawbacks. Compose-based mainly means that the hamza ء is treated as an optional diacritic on alif, waw and ya, instead of treating alif-hamza, waw-hamza and ya-hamza as precombined, atomic units. Although أ and ا are not the same, the hamza can be dropped if the writer's intention can be unambiguously inferred from context. Thus it makes sense to provide hamza as a combining character on the keyboard. Additionally, this uses two fewer keys than precombining hamza with its carriers, which allows the entire alphabet plus the hamza diacritic to fit on a single keyboard layer. However, this approach has a cost: all hamza variants account for {{ '%.1f'|format(layoutstats['ar-osx'].hamzaImpact*100) }}% of button combinations. Splitting the hamza from its carrier means that each of these characters requires two button combinations instead of one, which increases the total number of button presses and slightly decreases scores like words per minute (WPM). Placing alif and alif-hamza on separate keys could also reduce the load on the left middle finger and allow for a more even distribution, since {{ layoutstats['ar-osx'].hamzaOnAlef|fraction }}^th of all alif uses carry a hamza.

See, for example, section 3.3 of Buckwalter's Issues in Arabic Morphological Analysis.
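
Unicode itself treats most hamza letters as compositions: under canonical decomposition (NFD), أ becomes ا followed by U+0654 ARABIC HAMZA ABOVE, and ؤ becomes و followed by the same mark. This makes it easy to estimate the extra cost of the compose-based approach. The sketch below assumes one key press per code point and ignores shift keys; note that NFD also splits آ into ا and U+0653 ARABIC MADDAH ABOVE, which may not match how the layout actually enters madda. The function names are hypothetical.

```python
import unicodedata

# Hamza forms listed above, including the standalone hamza.
HAMZA_VARIANTS = set("أإآءئؤ")


def presses_precombined(text: str) -> int:
    """Each precomposed letter (e.g. أ) is a single key press."""
    return len(unicodedata.normalize("NFC", text))


def presses_compose(text: str) -> int:
    """Hamza letters are typed as carrier + combining mark, e.g. ا + U+0654."""
    return len(unicodedata.normalize("NFD", text))


def hamza_share(text: str) -> float:
    """Fraction of (precomposed) characters that are one of the hamza forms."""
    return sum(ch in HAMZA_VARIANTS for ch in text) / len(text) if text else 0.0


if __name__ == "__main__":
    word = "سأل"  # "he asked"
    print(presses_precombined(word), presses_compose(word))  # 3 4
```

The standalone hamza ء has no decomposition, so it costs one press in both models.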
