Diffstat (limited to 'lulua/data/report/index.html')
-rw-r--r-- | lulua/data/report/index.html | 286 |
1 files changed, 209 insertions, 77 deletions
diff --git a/lulua/data/report/index.html b/lulua/data/report/index.html index e35e08f..e2108cd 100644 --- a/lulua/data/report/index.html +++ b/lulua/data/report/index.html @@ -5,15 +5,23 @@ <title>لؤلؤة: لوحة مفاتيح عربية</title> <!--<meta name="description" content="">--> <meta name="viewport" content="width=device-width, initial-scale=1"> - + {# Fonts #} <link href="https://fonts.googleapis.com/css?family=IBM+Plex+Mono|IBM+Plex+Sans:100,400&display=swap" rel="stylesheet"> + + {# Pure.css #} <link rel="stylesheet" href="https://unpkg.com/purecss@2.0.3/build/pure-min.css" integrity="sha384-cg6SkqEOCV1NbJoCu11+bm0NvBRc8IYLRGXkmNrqUBfTjmMYwNKPWBTIKyw9mHNJ" crossorigin="anonymous"> <link rel="stylesheet" href="https://unpkg.com/purecss@2.0.3/build/grids-responsive-min.css" integrity="sha384-+lKLtMyKzY/ZdqRXlhw2whazop7duDxgP+oWPB+EGry5wK+vJnZwSsgMnAmDS/39" crossorigin="anonymous"> - <script src="https://polyfill.io/v3/polyfill.min.js?features=es6" crossorigin="anonymous"></script> - <script id="MathJax-script" async src="https://www.unpkg.com/mathjax@3.1.0/es5/tex-mml-chtml.js" integrity="sha384-ynpCZozLxgqK3wrgBu8qH8qPG3eD8mME8z0zugAX26UMb5HfLp2PtvtDH4vdmgkm" crossorigin="anonymous"></script> + + {# TeX support #} + <link rel="stylesheet" href="https://unpkg.com/katex@0.13.18/dist/katex.min.css" integrity="sha384-zTROYFVGOfTw7JV7KUu8udsvW2fx4lWOsCEDqhBreBwlHI4ioVRtmIvEThzJHGET" crossorigin="anonymous"> + <script defer src="https://unpkg.com/katex@0.13.18/dist/katex.min.js" integrity="sha384-GxNFqL3r9uRJQhR+47eDxuPoNE7yLftQM8LcxzgS4HT73tp970WS/wV5p8UzCOmb" crossorigin="anonymous"></script> + + <script defer src="https://unpkg.com/katex@0.13.18/dist/contrib/auto-render.min.js" integrity="sha384-vZTG03m+2yp6N6BNi5iM4rW4oIwk5DfcNdFfxkk9ZWpDriOkXX8voJBFrAO7MpVl" crossorigin="anonymous" onload="renderMathInElement(document.body);"></script> + {# bokeh #} + {# Defer loading bokeh, so it does not block rendering #} {% for f in bokehres.js_files -%} - <script src="{{ f 
}}"></script> + <script defer crossorigin="anonymous" src="{{ f }}"></script> {%- endfor %} {% for f in bokehres.css_files -%} <link rel="stylesheet" href="{{ f }}"> @@ -129,11 +137,10 @@ <div class="lbox"> <h2>The Arabic alphabet</h2> <p> - 28 letters make up the Arabic alphabet and quite a few extra - symbols are required for proper text input, like the hamza in its different - shapes <bdo dir="ltr" lang="ar">أ إ آ ء ئ ؤ</bdo>, ta marbutah <bdo - dir="ltr" lang="ar">ة</bdo>, alif maqsurah <bdo dir="ltr" - lang="ar">ى</bdo> and various diacritics for vowelized texts. + 28 letters make up the Arabic alphabet and quite a few extra symbols are + required for proper text input, like the {{ hamzah }} in its different + shapes <bdo dir="ltr" lang="ar">أ إ آ ء ئ ؤ</bdo>, {{ tamarbutah + }}, {{ alifmaqsurah }} and various diacritics for vowelized texts. <!-- --> Since the performance of a keyboard layout depends on the text entered it is necessary to study its mono-, di- and trigraph frequencies first. @@ -222,8 +229,9 @@ </details> <p> - The plot below shows <bdo dir="ltr" lang="ar">ا ل ي م و ن</bdo> can be - considered the most frequently used letters in the Arabic language. + The plot below shows {{ alif }}, {{ lam }}, {{ ya }}, {{ mim }}, {{ + waw }} and {{ nun }} can be considered the most frequently used letters + in the Arabic language. <!-- --> Together they account for more than 55% of all letters in the corpus. </p> @@ -328,17 +336,17 @@ The most frequent letters have all been assigned to the home row, which makes them easily accessible. <!-- --> - <bdo lang="ar" dir="ltr">ا</bdo> and <bdo lang="ar" dir="ltr">ل</bdo> + {{ Alif }} and {{ lam }} are typed with different hands, balancing the load on hands almost evenly. <!-- --> The index and middle finger of both hands share the majority of the typing load, but naturally the left middle finger is used more - frequently due to its assignment to the letter alif. 
+ frequently due to its assignment to the letter {{ alif }}. </p> <p> - The layout targets Quaranic and Modern Standard Arabic (MSA), also called Fusha + The layout targets Quaranic and Modern Standard Arabic (MSA), also called Fuṣḥa (<bdo lang="ar">الفصحى</bdo>), only. <!-- --> Dialectical Arabic (<bdo lang="ar">العامية</bdo>) is mainly a spoken @@ -353,35 +361,35 @@ Designing the layout to be compose-based has both benefits and disadvantages. <!-- --> - Compose-based mainly means the hamza <bdo lang="ar" dir="ltr">ء</bdo> - is treated like an optional diacritic for Alef, Waw and Yah instead of - viewing Alef-Hamza, Waw-Hamza and Yah-Hamza as precombined, atomic - units. + Compose-based mainly means the {{ hamzah }} is treated like an optional + diacritic for {{ alif }}, {{ waw }} and {{ ya }} instead of viewing + {{ alifhamzah }}, {{ wawhamzah }} and {{ yahamzah }} as precombined, + atomic units. <!-- --> - Although <bdo lang="ar" dir="ltr">أ</bdo> and <bdo lang="ar" - dir="ltr">ا</bdo> are not the same, the hamza can be dropped if the - writer’s intention is unambigiously inferable from context. + Although {{ alifhamzah_ }} and {{ alif_ }} are not the same, the {{ + hamzah_ }} can be dropped if the writer’s intention is unambigiously + inferable from context. <!-- --> - Thus it makes sense to provide hamza as a combining character on the - keyboard. + Thus it makes sense to provide {{ hamzah_ }} as a combining character + on the keyboard. <!-- --> Additionally it uses two keys less than precombining it with its stems, - allowing the entire alphabet plus hamza diacritic to fit on a single + allowing the entire alphabet plus hamzah diacritic to fit on a single keyboard layer. <!-- --> However, there is a cost to this approach: - All hamza variants account for {{ + All {{ hamzah_ }} variants account for {{ '%.1f'|format(layoutstats['ar-osx'].hamzaImpact*100) }}% of button combinations. 
<!-- --> - Splitting hamza and from its stem means doubling the total number of - button combinations and thus button presses, decreasing scores like + Splitting {{ hamzah_ }} and from its stem means doubling the total number + of button combinations and thus button presses, decreasing scores like words per minute (WPM) slightly. <!-- --> - Splitting Alef and Alef-Hamza could also reduce pressure on left middle - finger and allow for more even distribution, since {{ - layoutstats['ar-osx'].hamzaOnAlef|fraction }}<sup>th</sup> of all Alef - uses are with Hamza. + Splitting {{ alif }} and {{ alifhamzah }} could also reduce pressure + on left middle finger and allow for more even distribution, since {{ + layoutstats['ar-osx'].hamzaOnAlef|fraction }}<sup>th</sup> of all {{ + alif }} uses are with {{ hamzah }}. </p> <details class="remarks"> <summary></summary> @@ -415,23 +423,6 @@ {% endif %} {% endfor %} </div> -<div class="sentencestats"> -<p lang="en">Examples:</p> -<ul lang="ar"> -{% for sentence in stats.sentences %} - <li> - {% for match, weight in sentence[0] -%} - {%- if weight is none -%} - <span>{{ match }}</span> - {%- else -%} - {%- set c = weight|blendn((38, 139, 210), (108, 113, 196), (211, 54, 130), (220, 50, 47)) -%} - <span style="color: rgb({{ c[0] }}, {{ c[1] }}, {{ c[2] }});" title="{{ '%5.5f'|format(weight) }}">{{ match }}</span> - {%- endif -%} - {%- endfor %} - </li> -{% endfor %} -</ul> -</div> {% endmacro %} <figure id="ar-lulua-heat"> @@ -450,7 +441,62 @@ <div class="lbox"> <h2><a href="#related">Related work</a></h2> <p>This section explores existing keyboard layouts made for the - Arabic language and analyzes their usability.</p> + Arabic language and analyzes their usability. + <!-- --> + Comparing them with the proposed layout above is difficult at best, + because the layouts presented below cover different character sets. + <!-- --> + Some lack numbers, some do not include short vowels and others provide + no way to type symbols. 
+ <!-- --> + Therefore no individual score is assigned to each layout, but an analysis + of each layout’s features is given. + </p> + </div> + </div> + </div> + + <figure id="triadeffort"> + <div class="lbox" lang="en"> + <div id="triadeffort-div"></div> + </div> + <figcaption class="pure-g flexreverse"> + </figcaption> + </figure> + + <div class="pure-g flexreverse"> + <div class="pure-u-1 pure-u-xl-1-2"> + </div> + <div class="pure-u-1 pure-u-xl-1-2" lang="en"> + <div class="lbox"> + <p> + Carpalx tries to minimize the effort of typing in blocks of three + consecutive keystrokes, triads, and thus a good layout in that sense + should make typing frequent triads easy. + <!-- --> + The figure above plots cumulative triad frequency on the x-axis and + weighted cumulative effort on the y-axis. + <!-- --> + Thus, at an x value of 0.5 the y-axis is the sum of triad frequencies + multiplied by their effort for all triads responsible for 50% of the + typing process. + <!-- --> + Standard layouts are the layouts from <a href="#ar-asmo663">ASMO</a>, + <a href="#ar-linux">Linux</a> and <a href="#ar-osx">OSX</a> whereas + <em>usable</em> lists only those which are actually relevant for typing. + <!-- --> + As we can see the layout presented above meets the optimization goal. + <!-- --> + Only the top 5% of all triads are “easier” to type with <a + href="#ar-malas">Malas’ layout</a>, because lulua splits {{ hamzah }} + from its {{ alif }} stem. + <!-- --> + As expected the <a href="#ar-phonetic">phonetic layout</a> is one of the + worst ones, because QWERTY is not optimized for Arabic letter frequencies. + </p> + <p> + The following sections provide details about these layouts. + </p> </div> </div> </div> @@ -474,8 +520,8 @@ dir="ltr" lang="ar">ض ص، س ش، ح ج خ</bdo>) and not frequency. <!-- --> Also it overuses the right index finger by assigning the four - high-frequency letters <bdo lang="ar" dir="ltr">ا ت و ة</bdo> to - it. 
+ high-frequency letters {{ alif }}, {{ ta }}, {{ waw }} and {{ tamarbutah + }} to it. </p> </div> </div> @@ -497,14 +543,14 @@ <h3><a href="#ar-osx">Mac OS X</a></h3> <p> Mac OS X’s Arabic keyboard layout makes a few small changes to ASMO - 663 by moving the <bdo lang="ar" dir="ltr">ة</bdo> to a hard to + 663 by moving the {{ tamarbutah }} to a hard to reach spot on the right of the top row. <!-- --> It also moves the short vowels from the first to the top row of the second layer and replaces them with symbols. <!-- --> The bottom row keys are aditionally shifted to the right, beginning - with <bdo lang="ar" dir="ltr">ر</bdo>. + with {{ ra }}. </p> </div> </div> @@ -528,15 +574,14 @@ A more common layout is the one used on Linux, which also exists on Windows with minor changes to the first layer. <!-- --> - While its top and center row barely differ from ASMO 663 the - bottom row now contains a separate key for the ligature <bdo - lang="ar" dir="ltr">ﻻ</bdo>, likely inherited from <a + While its top and center row barely differ from + ASMO 663 the bottom row now contains a separate key + for the ligature {{ lamalif }} , likely inherited from <a href="https://oztypewriter.blogspot.com/2014/10/the-arabic-typewriter-keyboard-and.html">early typewriter layouts</a>. <!-- --> But at the cost of pushing punctuation characters to the second - layer, <bdo dir="ltr" lang="ar">د</bdo> into the top and <bdo - dir="ltr" lang="ar">ذ</bdo> even further into the number row. + layer, {{ dal }} into the top and {{ dhal }} even further into the number row. </p> </div> </div> @@ -591,10 +636,10 @@ </p> <p> While the layout distributes load between fingers quite well it - favors the left hand by assigning <bdo dir="ltr" lang="ar">ا</bdo> - and <bdo dir="ltr" lang="ar">ل</bdo> to it. + favors the left hand by assigning {{ alif }} + and {{ lam }} to it. 
<!-- --> - The decision to place <bdo dir="ltr" lang="ar">ث</bdo> in a very + The decision to place {{ tha }} in a very prominent spot seems weird, given it only accounts for 0.5% of all symbols, even in their own analysis. </p> @@ -636,15 +681,13 @@ <!-- --> Probably due to their unusual assumption that middle- and ring-finger rest in the top row their results are suboptimal, - placing both <bdo dir="ltr" lang="ar">ا</bdo> and <bdo dir="ltr" - lang="ar">ي</bdo> in the top row. + placing both {{ alif }} and {{ ya }} in the top row. <!-- --> Their analysis notices this and suggests improved positions for both characters, but these are not actually implemented. <!-- --> - The big asymmetry is caused by placing <bdo dir="ltr" lang="ar">ا - ل ي</bdo> and <bdo dir="ltr" lang="ar">و</bdo>, four of the five - most frequent letters, on the right hand side. + The big asymmetry is caused by placing {{ alif }}, {{ lam }}, {{ ya }} and + {{ waw }}, four of the five most frequent letters, on the right hand side. </p> </div></div> </div> @@ -672,11 +715,11 @@ optimized for typing speed only, claiming 35% faster typing compared to the <a href="#ar-linux">currently used layouts</a>. <!-- --> - However the decision to put <bdo dir="ltr" lang="ar">ي</bdo> in the top + However the decision to put {{ ya }} in the top row seems odd. <!-- --> - Assigning the same left index finger to <bdo dir="ltr" lang="ar">ا - ي و</bdo>, which are three of the most frequent letters, heavily + Assigning the same left index finger to {{ alif }}, + {{ ya }} and {{ waw }}, which are three of the most frequent letters, heavily strains this particular finger. </p> </div> @@ -711,8 +754,8 @@ well. <!-- --> However their algorithm seems to favor the bottom row instead of the - easier to use top row since it places the letters <bdo dir="ltr" - lang="ar">ب ت ر</bdo> there. + easier to use top row since it places the letters {{ ba }}, {{ ta }} + and {{ ra }} there. 
</p> </div> </div> @@ -746,20 +789,102 @@ provide <em>three</em> single-quote marks ’ and <em>two</em> Arabic semicolon <bdo dir="ltr" lang="ar">؛</bdo>. <!-- --> - Additionally it places <bdo dir="ltr" lang="ar">ي</bdo> in an even + Additionally it places {{ ya }} in an even worse position than Malas’ layout. </p> </div> </div> </div> - <figure id="ar-osman"> + <figure> <div class="lbox"> <img src="ar-osman-heat.svg"> {{ fingerhandstats(layoutstats['ar-osman']) }} </div> </figure> + <div id="ar-qtaish" class="pure-g flexreverse"> + <div class="pure-u-1 pure-u-xl-1-2"> + </div> + <div class="pure-u-1 pure-u-xl-1-2" lang="en"> + <div class="lbox"> + <h3><a href="#ar-qtaish">Qtaish et al</a></h3> + <p> + Qtaish et al presented this layout in + <a href="http://www.sci-int.com/pdf/637456047563529791.pdf">An Improved Arabic Keyboard Layout</a> in 2021. + <!-- --> + They use a novel corpus of 5 million words or roughly 66 million letters + <!-- 65713689 adding up the numbers in their paper --> consisting of + newspapers, (now defunct) social networks and blogs, as well as six + dictionaries (see remarks). + <!-- --> + Then letters were classified into three categories based on their + frequency, which apparently were used to populate home, top and bottom + row (in this order). + <!-- --> + Additionally bigrams were somehow used to arrange letters and avoid + placing them on the same or adjacent fingers, making rolling finger + movements incentiviced by carpalx impossible. + <!-- --> + Ultimately it looks like the layout was designed by hand and not through + an automated process. <!-- see section 5 of the paper --> + </p> + + <details class="remarks"> + <summary></summary> + <em>Alfarahindi</em> and <em>Alein Dictionary</em> are most likely the same 8th century book <a href="https://en.wikipedia.org/wiki/Kitab_al-%27Ayn">Kitab al-'Ayn</a>. 
+ <!-- --> + There are also <a href="https://ar.wikipedia.org/wiki/%D8%AA%D8%A7%D8%AC_%D8%A7%D9%84%D8%B9%D8%B1%D9%88%D8%B3_%D9%85%D9%86_%D8%AC%D9%88%D8%A7%D9%87%D8%B1_%D8%A7%D9%84%D9%82%D8%A7%D9%85%D9%88%D8%B3">Taj-Alaroos</a>, <a href="https://www.lesanarab.com/letter/">Lesan Alarab</a>, <a href="https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D9%85%D8%B9%D8%AC%D9%85_%D8%A7%D9%84%D9%88%D8%B3%D9%8A%D8%B7">Almujam Alwaseet</a> and <a href="https://ar.wikipedia.org/wiki/%D8%A7%D9%84%D9%85%D9%86%D8%AC%D8%AF_(%D9%84%D9%88%D9%8A%D8%B3_%D9%85%D8%B9%D9%84%D9%88%D9%81)">Almunjed</a>. + <!-- --> + All of them are lexica, not dictionaries. + </details> + + <p> + Although the authors do not provide a number row, it has been added for + fair comparison. + <!-- --> + However to be actually usable the layout would need punctuation symbols and diacritics. + </p> + </div> + </div> + </div> + + <figure id="ar-qtaish"> + <div class="lbox"> + <img src="ar-qtaish-heat.svg"> + {{ fingerhandstats(layoutstats['ar-qtaish']) }} + </div> + </figure> + + <div id="ar-ergoarabic" class="pure-g flexreverse"> + <div class="pure-u-1 pure-u-xl-1-2"> + </div> + <div class="pure-u-1 pure-u-xl-1-2" lang="en"> + <div class="lbox"> + <h3><a href="#ar-ergoarabic">Ergoarabic</a></h3> + <p> + Being first released in 2021 <a + href="https://github.com/darkstego/ergoarabic">Ergoarabic</a> is the + most recent optimized layout for Arabic. + <!-- --> + On the first layer it combines the positions of 17 keys from the <a + href="#ar-linux">standard PC layout</a> with brackets and punctuation + symbols from QWERTY, aiming to retain compatibility with both. + <!-- --> + The remaining keys have been re-arranged or moved to the shift layer by + hand for improved ergonomics. 
+ </p> + </div> + </div> + </div> + + <figure id="ar-ergoarabic"> + <div class="lbox"> + <img src="ar-ergoarabic-heat.svg"> + {{ fingerhandstats(layoutstats['ar-ergoarabic']) }} + </div> + </figure> + <div id="ar-phonetic" class="pure-g flexreverse"> <div class="pure-u-1 pure-u-xl-1-2"> </div> @@ -769,7 +894,7 @@ <p> The <a href="http://arabic.omaralzabir.com/home">Arabic Phonetic Keyboard</a> simply maps the QWERTY layout to Arabic letters, based on their sound. - Thus Q becomes <bdo dir="ltr" lang="ar">ق</bdo>, Y becomes <bdo dir="ltr" lang="ar">ي</bdo> and so on. + Thus Q becomes {{ qaf }}, Y becomes {{ ya }} and so on. It claims to be optimized for writing vowelized texts, especially Quranic Arabic, and thus includes quite a few combining characters and special symbols. @@ -836,9 +961,16 @@ </section> <script> -fetch('letterfreq.json') - .then(function(response) { return response.json(); }) - .then(function(item) { Bokeh.embed.embed_item(item, 'letterfreq-div'); }) +function loadPlots () { + const plots = ['letterfreq', 'triadeffort']; + for (const p of plots) { + fetch(p + '.json') + .then(function(response) { return response.json(); }) + .then(function(item) { Bokeh.embed.embed_item(item, p + '-div'); }); + } +} +if (document.readyState != "loading") loadPlots (); +else document.addEventListener("DOMContentLoaded", loadPlots); </script> </body> |
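The text added in the `triadeffort` hunk describes a curve of cumulative triad frequency (x-axis) against weighted cumulative effort (y-axis). As a rough illustration only, the Python sketch below computes such points from (frequency, effort) pairs. The function name, the input shape, and the ordering by descending frequency are assumptions made here; the diff does not show how lulua actually produces `triadeffort.json`.

```python
def cumulative_triad_effort(triads):
    """Return (x, y) points for a cumulative triad-effort curve.

    `triads` is an iterable of (frequency, effort) pairs. This input shape
    is hypothetical and not taken from the lulua code base.
    """
    triads = list(triads)
    total = sum(freq for freq, _ in triads)
    # Assumption: the most frequent triads come first on the x-axis.
    ordered = sorted(triads, key=lambda t: t[0], reverse=True)

    points = []
    x = 0.0  # cumulative share of all triad occurrences
    y = 0.0  # cumulative effort, weighted by relative frequency
    for freq, effort in ordered:
        share = freq / total
        x += share
        y += share * effort
        points.append((x, y))
    return points


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    for x, y in cumulative_triad_effort([(30, 2.0), (50, 1.0), (20, 4.0)]):
        print(f"{x:.2f} {y:.2f}")
```

Read this way, a layout whose curve stays lower at a given x value needs less weighted effort for the triads that make up that share of typing.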