Talk:Semantic Arabic Encoding and Format

From the Arabeyes Wiki

File Problems

Currently, users are unable to access the zip file with the demo code here.

The Basic Concept Needs Reviewing

A Quick Overview of the Problem

After reviewing the page here, one user suggested that this encoding will not solve the problem, but will simply replace it with a new one. The encoding system separates out morphology to the point that the way a word is read and the way it is encoded no longer match. The average user tends to write a given Arabic word as it is read. Consider the following examples:

Original Arabic, the way the user reads and writes:

Insert Arabic example 1 here.

Arabic via the Tarmeez Semantic Arabic Encoding Format:

Insert Arabic example 2 here.

Thus, this will only lead to a new problem: text input. This format means that we need another program that can properly interpret input in such a format. The problem becomes even more complex if the user tries to enter a word built on a verbal root that the program does not know. And this ignores the different lexicons for the different varieties of Arabic, from Classical to Modern Standard to dialectal Arabic. We do not have the ability to build such large and complex dictionaries.

So Where Do We Go From Here?

Instead of looking at the encoding system at the lexical (i.e. word) level, we should look at it at the level of the individual letters in the word. The following picture illustrates the improved encoding system:

Ar new encoding diagram.gif

The idea here is that we are focusing on the naked letter, meaning the letter without its diacritic markers (i.e. the technical name for the dots you see above or below certain Arabic letters). So the possible letters are as follows:

The Basic Alphabet in the Encoding:

ا ب ح د ر س ص ط ع ف ك ل م ن ه null

If you ignore the dot underneath the ba'a (treating the result as the base form for ba'a, ta'a, and tha'a), then you will see that the other divisions of the encoding add the diacritics, the grammatical vowelling (tashkeel), and so on.
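
As a rough illustration of the naked-letter idea, the Python sketch below splits each written letter into a bare base, a dot count, a dot position, and any tashkeel attached to it. The `DOTS` table, the `decompose` function, and the slot layout are all hypothetical: the groupings follow the base alphabet listed above, but the actual divisions of the proposed encoding are defined only in the diagram, so this should be read as one possible interpretation rather than the format itself. Only some of the dotted letters are covered.

```python
import unicodedata

# Hypothetical table: written letter -> (bare base letter, number of dots, dot position).
# The groupings follow the base alphabet listed above; the tuple layout is an
# assumption made for illustration, not the field layout from the diagram.
DOTS = {
    "ب": ("ب", 1, "below"),
    "ت": ("ب", 2, "above"),
    "ث": ("ب", 3, "above"),
    "ج": ("ح", 1, "below"),
    "خ": ("ح", 1, "above"),
    "ذ": ("د", 1, "above"),
    "ز": ("ر", 1, "above"),
    "ش": ("س", 3, "above"),
    "ض": ("ص", 1, "above"),
    "ظ": ("ط", 1, "above"),
    "غ": ("ع", 1, "above"),
    "ف": ("ف", 1, "above"),
    "ق": ("ف", 2, "above"),
}


def decompose(text):
    """Split text into one slot per letter: bare base, dots, and attached tashkeel."""
    slots = []
    for ch in text:
        if unicodedata.combining(ch):
            # A vowel mark never gets a slot of its own; it joins the previous letter.
            if slots:
                slots[-1]["marks"] += ch
            continue
        # Letters missing from the table are kept as their own base with no dots.
        base, dots, position = DOTS.get(ch, (ch, 0, None))
        slots.append({"base": base, "dots": dots, "position": position, "marks": ""})
    return slots


for slot in decompose("بِسْمِ"):
    print(slot)
```

In this sketch a tashkeel mark with no preceding letter is simply dropped, which is one way of modelling the rule that vowelling must always be tied to a letter.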

A more thorough explanation of this proposed encoding is to follow.

Benefits of the New Encoding

There are several benefits to this new encoding:

  • The grammatical vowelling (tashkeel) is linked directly to the letter and therefore independent letters are not permissible. This is truer to the combination of morphology and syntax in the original Arabic script.
  • This leads to the added bonus of reducing the bit space needed for the field. Consider the following:
ص "بِسْمِ اللّهِ الرَّحْمَنِ الرَّحِيمِ
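
For comparison, Unicode stores each tashkeel mark as its own combining code point after the letter it belongs to, so fully vowelled text carries roughly one extra code point per vowelled letter. The sketch below only measures that baseline for the first word of the phrase above; `count_code_points` is a hypothetical helper. The field widths of the proposed encoding are not given on this page, so no figure for the actual saving is claimed here.

```python
import unicodedata


def count_code_points(text):
    """Return (letters, marks): base letters and combining marks in Unicode text."""
    letters = sum(1 for ch in text if unicodedata.category(ch) == "Lo")
    marks = sum(1 for ch in text if unicodedata.combining(ch))
    return letters, marks


sample = "بِسْمِ"  # first word of the phrase above, fully vowelled
letters, marks = count_code_points(sample)
print("letters:", letters)
print("marks stored as separate code points:", marks)
print("UTF-8 bytes:", len(sample.encode("utf-8")))
```

Every code point in the basic Arabic block takes two bytes in UTF-8, so each separately stored mark costs as much as a letter. An encoding that folds the mark into a field of the letter would need fewer bits per letter only if that field is narrower than a full code point.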

A Review of the Basic Concept

Notes Concerning the Suggestions

I wonder if this works if I at least have text under here.

Extending the Short Vowelling

I would hope so.