«Duali»: Difference between revisions

 
m (Text replacement - 'https://git.arabeyes.org/' to 'https://gitlab.com/')
 
(39 intermediate revisions by 6 users not shown)
Line 1: Line 1:
  +
{{ Project/en
Duali, named after the legendary founder of Arabic grammar (Abul Aswad al Du'ali - d. 688), is an Arabic spell-checker designed around the Arabic language (and extensible to non-Arabic languages as well).
 
  +
|project-name = Duali
  +
|project-type = Development
  +
|logo = duali.png
  +
|logo-size = 100px
  +
|start-date = Jan 31, 2002
  +
|status = Inactive
  +
|maintainer = [[User:Elzubeir‎|Mohammed Elzubeir]]
  +
|contributors = [[User:Yousif‎|Mohammed Yousif]]
   
  +
}}
The Duali project page can be found [http://www.arabeyes.org/project.php?proj=duali here].
 
   
== How It Works ==
 
   
  +
Duali, named after the legendary founder of Arabic grammar (Abul Aswad al [http://www.sakkal.com/ArtArabicCalligraphy.html Du'ali] - d. 688), is an Arabic spell-checker designed around the Arabic language (and extensible to non-Arabic languages as well).
Duali's dictionary data comes in six files (from the Buckwalter Morphological Analyzer). These files are:
 
   
  +
'''Notes'''<br />
* prefixes
 
* suffixes
 
* stems
 
* tableab (compatibility table prefix+stem)
 
* tableac (compatibility table prefix+suffix)
 
* tablebc (compatibility table stem+suffix)
 
   
  +
* Duali is written under the [http://www.opensource.org/licenses/bsd-license.html BSD] license.
Currently, compatibility support is not implemented. This means that some incorrectly spelled words are flagged as correct, but no correctly spelled word is flagged as incorrect.
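The gap described above can be illustrated with a minimal Python sketch. It assumes, as in Buckwalter-style data, that every prefix, stem and suffix carries a morphological category and that each compatibility table lists the allowed category pairs. The category names, the transliterated toy entries and the function names are all hypothetical; this is not Duali's actual code.

```python
# Toy, transliterated data. Entries and category names are hypothetical.
PREFIX_CAT = {"": "Pref-0", "al": "NPref-Al"}
STEM_CAT = {"kitab": "N", "katab": "PV"}
SUFFIX_CAT = {"": "Suff-0", "at": "PVSuff-t"}

# Allowed category pairs, one set per compatibility table.
TABLE_AB = {("Pref-0", "N"), ("Pref-0", "PV"), ("NPref-Al", "N")}          # prefix+stem
TABLE_AC = {("Pref-0", "Suff-0"), ("Pref-0", "PVSuff-t"),
            ("NPref-Al", "Suff-0")}                                         # prefix+suffix
TABLE_BC = {("N", "Suff-0"), ("PV", "Suff-0"), ("PV", "PVSuff-t")}          # stem+suffix


def accepted_without_tables(prefix, stem, suffix):
    """Membership test only: each part merely has to exist."""
    return prefix in PREFIX_CAT and stem in STEM_CAT and suffix in SUFFIX_CAT


def accepted_with_tables(prefix, stem, suffix):
    """Membership test plus the three pairwise compatibility checks."""
    if not accepted_without_tables(prefix, stem, suffix):
        return False
    a, b, c = PREFIX_CAT[prefix], STEM_CAT[stem], SUFFIX_CAT[suffix]
    return (a, b) in TABLE_AB and (a, c) in TABLE_AC and (b, c) in TABLE_BC
```

In this toy data, the split `("al", "katab", "")` passes the membership-only test but is rejected once the tables are consulted, which is the kind of false accept the paragraph describes.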
 
   
  +
<br />'''Links'''<br />
The data files are encoded in UTF-8. The Python version of Duali also lets the user generate the dictionary data files in CP-1256. Choosing an encoding other than UTF-8 makes Duali slower: look-ups are done in UTF-8, so a character-encoding conversion has to happen on every look-up when CP-1256 is used.
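The cost mentioned above can be sketched in Python as follows. This is only an illustration under assumptions: the helper name `make_lookup` and the idea of storing dictionary keys as encoded bytes are hypothetical, and the source does not show how Duali itself performs the conversion.

```python
def make_lookup(words, data_encoding="utf-8"):
    """Return a look-up function over keys stored in `data_encoding`.

    Queries always arrive as UTF-8 bytes. If the data uses another
    encoding, every single look-up pays for a decode/encode round trip.
    """
    keys = {w.encode(data_encoding) for w in words}
    needs_conversion = data_encoding.lower().replace("-", "") != "utf8"

    def lookup(word_utf8):
        key = word_utf8
        if needs_conversion:
            try:
                # Extra work on each look-up when the data is not UTF-8.
                key = word_utf8.decode("utf-8").encode(data_encoding)
            except UnicodeEncodeError:
                # The word cannot be represented in the data encoding at all.
                return False
        return key in keys

    return lookup


kitab = "\u0643\u062a\u0627\u0628"                 # the word "kitab"
fast = make_lookup([kitab])                         # UTF-8 data, no conversion
slow = make_lookup([kitab], data_encoding="cp1256") # converts on every call
```

Both `fast` and `slow` give the same answers; the only difference is the per-look-up conversion in the CP-1256 case.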
 
   
  +
* General Links
The files listed above carry a lot of data, but Duali uses only a small subset of it. The process is as follows:
 
  +
** http://www.arabic-morphology.com/
  +
** http://www.glue.umd.edu/~kareem/research/
  +
** http://www.angelfire.com/tx4/lisan/roots1.htm
  +
** http://members.aol.com/ArabicLexicons/index.html
  +
** http://www.glue.umd.edu/~dlrg/clir/trec2002/resources.html
  +
** [http://www.ccse.kfupm.edu.sa/~husni/ICS484/WebPAges/Munawes/Aracbic.htm http://www.ccse.kfupm.edu.sa/~husni/ICS484/WebPAges/Munawes/Aracbic.htm]
* OpenOffice Spellchecking Implementation
  +
** [http://api.openoffice.org/docs/DevelopersGuide/OfficeDev/OfficeDev.htm#1+2+3+Linguistics Implementing a Spell Checker]
  +
** [http://sw.openoffice.org/drafts/linguistic_howto.html Linguistics HOWTO]
   
  +
<br />'''Screenshots'''<br />
1. Duali parses the file.

2. An Arabic word is recognized.

3. The word is segmented into all possible prefix+stem+suffix combinations.

4. Each of those combinations is checked against the prefixes, stems and suffixes.

5. As soon as one combination matches, Duali moves on to the next word; if none matches, the word is flagged as incorrectly spelled.
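The steps above can be sketched in a few lines of Python. This is a minimal illustration, not Duali's code: the three sets hold hypothetical toy entries, the function names are made up, and the Arabic-letter range used to recognize words (U+0621 to U+064A) is an assumption.

```python
import re

# Toy dictionary data (hypothetical entries).
PREFIXES = {"", "\u0627\u0644", "\u0648"}    # "", "al", "wa"
STEMS = {"\u0643\u062a\u0627\u0628"}         # "kitab"
SUFFIXES = {"", "\u0647\u0627"}              # "", "ha"

# Step 2: a run of Arabic letters counts as one word (assumed range).
ARABIC_WORD = re.compile("[\u0621-\u064a]+")


def segmentations(word):
    """Step 3: yield every (prefix, stem, suffix) split with a non-empty stem."""
    n = len(word)
    for i in range(n):                  # prefix is word[:i]
        for j in range(i + 1, n + 1):   # stem is word[i:j]
            yield word[:i], word[i:j], word[j:]


def is_known(word):
    """Steps 4-5: stop at the first split whose three parts are all listed."""
    return any(p in PREFIXES and s in STEMS and x in SUFFIXES
               for p, s, x in segmentations(word))


def misspelled(text):
    """Steps 1-2 and 5: return the Arabic words in `text` with no valid split."""
    return [w for w in ARABIC_WORD.findall(text) if not is_known(w)]
```

A word of n letters has n(n+1)/2 such splits, so the exhaustive search stays cheap for ordinary word lengths. Note that this check only tests membership; it does not consult the compatibility tables.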
 
   
  +
* Duali [http://art.arabeyes.org/duali screenshots]
Because Arabic words are written in different forms (i.e., the spelling of a word is sometimes simplified), some correctly spelled words may be flagged as incorrect. For this reason, Duali has a feature to 'normalize' words. The normalization process does the following:
 
   
  +
<br />'''Downloads'''<br />
1. Replaces ALEF_MADDA, ALEF_HAMZA and ALEF_HAMZA_BELOW with a plain ALEF
 
2. Combines a YEH and HAMZA into a YEH_HAMZA
 
3. Replaces an ALEF_MAKSURA with a YEH
 
4. Replaces a TEH_MARBUTA with a HEH
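The four rules above can be expressed as a short Python sketch. The constant names follow the list; the function name `normalize` and the choice to apply the rules in the listed order are assumptions, not taken from Duali's source.

```python
ALEF = "\u0627"
ALEF_MADDA = "\u0622"
ALEF_HAMZA = "\u0623"
ALEF_HAMZA_BELOW = "\u0625"
HAMZA = "\u0621"
YEH = "\u064a"
YEH_HAMZA = "\u0626"
ALEF_MAKSURA = "\u0649"
TEH_MARBUTA = "\u0629"
HEH = "\u0647"

# Rules 1, 3 and 4 are one-to-one character replacements.
_SINGLE_CHAR_RULES = str.maketrans({
    ALEF_MADDA: ALEF,
    ALEF_HAMZA: ALEF,
    ALEF_HAMZA_BELOW: ALEF,
    ALEF_MAKSURA: YEH,
    TEH_MARBUTA: HEH,
})


def normalize(word):
    """Apply the four normalization rules to a Unicode string."""
    # Rule 2 first: a YEH followed by a HAMZA becomes a single YEH_HAMZA.
    # Running it before rule 3 matches the order in which the rules are listed.
    word = word.replace(YEH + HAMZA, YEH_HAMZA)
    # Rules 1, 3 and 4 in one pass.
    return word.translate(_SINGLE_CHAR_RULES)
```

For normalization to help, the dictionary entries would presumably have to be normalized the same way before look-up.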
 
   
  +
* Duali - The Arabic Spell Checker
All of this happens internally in Unicode. The C++ version of Duali, however, works on UTF-8 instead, mainly because regex engines with proper Unicode support are lacking.
 
  +
** [http://prdownloads.sourceforge.net/arabeyes/duali-0.2.0.tar.bz2?download duali ver 0.2.0] - Arabic spell checker [source]
  +
** [http://prdownloads.sourceforge.net/arabeyes/duali-0.1.1.tar.gz?download duali ver 0.1.1] - Arabic spell checker [source]
  +
** [http://prdownloads.sourceforge.net/arabeyes/duali_0.1.1-2_all.deb?download duali ver 0.1.1] - Arabic spell checker [debian]
  +
** [http://prdownloads.sourceforge.net/arabeyes/duali-0.1b.tar.bz2?download duali ver 0.1b] - Arabic spell checker [source]
  +
** [http://prdownloads.sourceforge.net/arabeyes/duali-0.1a.tar.bz2?download duali ver 0.1a] - Arabic spell checker [source]
* Duali - Dictionary Data
  +
** [http://prdownloads.sourceforge.net/arabeyes/duali-data-0.1b.tar.gz?download duali-data-0.1b] - Arabic dictionary and data files (GPL) [source]
  +
** [http://prdownloads.sourceforge.net/arabeyes/duali-data_0.1b-1_i386.deb?download duali-data-0.1b] - Arabic dictionary and data files (GPL) [debian]
* Dictionary Generator [obsolete]
  +
** [http://prdownloads.sourceforge.net/arabeyes/gendic-0.1.tar.bz2?download gendic ver 0.1] - Dictionary generator to create a compact Arabic dictionary for Duali.
   
  +
<br /><br />
== How It Should Work ==
 
   
  +
<center>
Despite all of the above, this is not how Duali is ultimately meant to work; the current method is a "second best" alternative. The real goal of Duali is to produce a very compact, root-based dictionary. That is, the dictionary would hold the following information:
 
   
  +
{| width="95%" border="border" align="center"
* root word
 
  +
|- valign="top"
* possible variations (derivatives) of the root word
 
  +
! ID
* this would be represented numerically from a table of the possible derivatives of the root forms
 
  +
! Due Date
  +
! Priority
  +
! State
  +
! Assigned To
  +
! Description
  +
|- valign="top"
  +
|
  +
[[Duali/Todos/16|16]]
  +
| 2003-04-16
  +
| style="background-color: #FFCC00" | Normal
  +
| style="background-color: #33CC00" | Done
  +
|
  +
[[User:Elzubeir‎|Mohammed Elzubeir]]
  +
| Switch db library
  +
|- valign="top"
  +
|
  +
[[Duali/Todos/79|79]]
  +
| 2003-08-01
  +
| style="background-color: #FFCC00" | Normal
  +
| style="background-color: #FF6600" | Pending
  +
|
  +
[[User:Elzubeir‎|Mohammed Elzubeir]]
  +
| Start porting pyduali to C
  +
|}
   
  +
</center><br />Contribute by choosing an unassigned todo ("Assigned To"=None) or a shared todo ("Assigned To"=All).<br /><br />
The spell checker would then:
 
   
  +
== History Log ==
1. Parse file
 
2. Recognize Arabic word
 
3. Strip the word of its prefix and suffix, if any
 
4. Get the root from the stem (if not already a root)
 
5. Look up the root word in dictionary
 
6. Verify the derivative is a valid variation of the root
 
7. Flag word accordingly
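Steps 3 to 7 of this intended, root-based design can be sketched in Python as follows. Everything here is hypothetical: the transliterated toy entries are not linguistically accurate, the numeric form IDs stand in for the table of possible derivatives, and the stem-to-root mapping is exactly the data that the project did not have.

```python
# Hypothetical affix lists (transliterated toy data).
PREFIXES = ("al", "wa")
SUFFIXES = ("ha", "at")

# Hypothetical analysis data: stem -> (root, numeric derivative-form ID).
STEM_TO_ROOT = {
    "katab": ("ktb", 1),
    "kaatib": ("ktb", 2),
    "daras": ("drs", 1),
    "daaris": ("drs", 2),
}

# The compact dictionary itself: root -> set of allowed derivative-form IDs.
ROOT_DICT = {"ktb": {1, 2}, "drs": {1}}


def strip_affixes(word):
    """Step 3: remove at most one known prefix and one known suffix."""
    for p in PREFIXES:
        if word.startswith(p) and len(word) > len(p):
            word = word[len(p):]
            break
    for s in SUFFIXES:
        if word.endswith(s) and len(word) > len(s):
            word = word[:-len(s)]
            break
    return word


def is_correct(word):
    """Steps 4-7: find the root, look it up, and verify the derivative form."""
    stem = strip_affixes(word)
    analysis = STEM_TO_ROOT.get(stem)          # step 4: stem -> root
    if analysis is None:
        return False
    root, form_id = analysis
    allowed = ROOT_DICT.get(root, set())       # step 5: root look-up
    return form_id in allowed                  # steps 6-7: valid variation?
```

The point of the design is visible in `ROOT_DICT`: each root stores only a small set of integers instead of a full list of stems. In this toy data, `daaris` is rejected solely because form 2 is not listed for the root `drs`.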
 
   
  +
----
Unfortunately, this is not currently possible due to the lack of data. In other words, the data that would form the ideal dictionary data set is not available, and is unlikely to become available without a massive effort.
 
  +
'''(Mar 28, 2004)'''
  +
----
  +
[[User:Yousif‎|M.Yousif]] joins the project and puts the port to C++ in 4th gear. More details of progress can be found [http://lists.arabeyes.org/archives/developer/2004/March/msg00248.html here].<br /><br />
  +
----
  +
'''(Dec 11, 2003)'''
  +
----
  +
Duali ver 0.2.0 is now released. It mainly consists of several bug-fixes and clean-ups. A more detailed changelog is available [http://lists.arabeyes.org/archives/announce/2003/December/msg00000.html here].<br /><br />
  +
----
  +
'''(Dec 09, 2003)'''
  +
----
  +
[[User:Samy‎|Samy]] found an interesting [http://bugs.arabeyes.org/cgi-bin/bugzilla/show_bug.cgi?id=80 bug] that is now fixed. A new release is warranted now.<br /><br />
  +
----
  +
'''(Oct 05, 2003)'''
  +
----
  +
A port in Gentoo Linux for Duali 0.1.1 is now available (as of Sep. 23, 2003), see bug [http://bugs.gentoo.org/show_bug.cgi?id=26908 #26908].<br /><br />
  +
----
  +
'''(Sep 01, 2003)'''
  +
----
  +
Duali ver. 0.1.1 is now officially accepted in [http://www.debian.org/ Debian] (unstable). You can check the package page [http://packages.debian.org/unstable/text/duali.html here]. Many thanks to [[User:negm|Mr. Negm]] for sponsoring it.<br /><br />
  +
----
  +
'''(Aug 24, 2003)'''
  +
----
  +
The Debian packages for Duali have been uploaded and are being processed. This is the ITP bug [http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=202940 #202940].<br /><br />
  +
----
  +
'''(Aug 19, 2003)'''
  +
----
  +
Submitted bug report to create a [http://www.gentoo.org/ Gentoo] port for Duali - bug [http://bugs.gentoo.org/show_bug.cgi?id=26908 #26908].
  +
  +
[http://www.debian.org/ Debian] packages have also been prepared and await [[User:negm|Ayman Negm]]'s evaluation to sponsor it - temporary debs can be found [https://www.arabeyes.org/~elzubeir/duali/ here].<br /><br />
  +
  +
----
  +
'''(Jul 06, 2003)'''
  +
----
  +
pyduali 0.1.1 released. Please refer to this [http://lists.arabeyes.org/archives/developer/2003/July/msg00013.html post] for more details.<br /><br />
  +
----
  +
'''(Apr 22, 2003)'''
  +
----
  +
Duali ver 0.1b released. New features include:
  +
  +
* Multiple encoding support
  +
* Configuration file
  +
  +
Download [http://prdownloads.sourceforge.net/arabeyes/duali-0.1b.tar.bz2?download here].
  +
----
  +
'''(Apr 08, 2003)'''
  +
----
  +
Duali 0.1a is released and can be downloaded [http://sourceforge.net/project/showfiles.php?group_id=34866&release_id=151670 here]. <br /><br /> For more information, please read this [http://lists.arabeyes.org/archives/announce/2003/April/msg00000.html post].<br /><br />
  +
----
  +
'''(Mar 14, 2003)'''
  +
----
  +
  +
Finally, Duali now has an official logo, designed by [[User:negm|Abdelmalek Lahmar]]. Many thanks!
  +
  +
Tentative alpha release date reset to April 1, 2003.
  +
----
  +
'''(Mar 06, 2003)'''
  +
----
  +
  +
After some communication with Tim Buckwalter, some reconsiderations are being made to the overall algorithm (the use of roots vs. stems as a basis for the dictionary).
  +
  +
The new tentative alpha release of duali is set to March 25, 2003.
  +
----
  +
'''(Nov 10, 2002)'''
  +
----
  +
  +
Mohammed Kebdani has semi-finalized the way by which all the tables for the forms and derivatives will be drawn out.
  +
  +
For more information, please read this [http://lists.arabeyes.org/archives/doc/2002/November/msg00031.html post]. Or follow-up on the progress [http://noc-webserver.iam.net.ma/~kebdani1/duali/ here].<br /><br />
  +
  +
----
  +
'''(Oct 22, 2002)'''
  +
----
  +
  +
Mr. Kebdani and [[User:Elzubeir‎|Elzubeir]] will use a different dictionary as the main source of input.
  +
  +
For more information, please read this [http://lists.arabeyes.org/archives/doc/2002/October/msg00063.html post].
  +
----
  +
'''(Sep 07, 2002)'''
  +
----
  +
  +
After receiving a full Arabic dictionary (book), several issues came to light.
  +
  +
A general call for volunteers (among other things) can be found in this [http://lists.arabeyes.org/archives/doc/2002/September/msg00000.html post].
  +
----
  +
'''(Aug 11, 2002)'''
  +
----
  +
  +
The status report is outlined in this [http://lists.arabeyes.org/archives/developer/2002/August/msg00048.html post].
  +
  +
More details on problems are in this [http://lists.arabeyes.org/archives/developer/2002/August/msg00103.html post].
  +
----
  +
'''(Jun 01, 2002)'''
  +
----
  +
  +
Imported [https://gitlab.com/arabeyes-dev/duali/ initial code] to CVS. However, please note that the imported code is broken.
  +
  +
Duali is now under the implementation phase.<br /><br />
  +
  +
----
  +
'''(Apr 17, 2002)'''
  +
----
  +
  +
Started on stripping words to their root -- still not accurate enough (will use morpho3 as a basis).
  +
  +
Will get back to it once school gives me a break.<br /><br />
  +
  +
----
  +
'''(Jan 31, 2002)'''
  +
----
  +
  +
[[User:Chahine‎|Chahine Hamila]] and [[User:Elzubeir‎|Mohammed Elzubeir]] have taken up the project and started working on the definition of the involved heuristics. <br />
  +
[[Category:Migrated pages]]
  +
[[Category:English]]

Latest revision as of 08:05, 21 February 2017


Duali


Start date Jan 31, 2002
Maintainer Mohammed Elzubeir
Contributors Mohammed Yousif
Status Inactive
Mailing list Develop



Duali, named after the legendary founder of Arabic grammar (Abul Aswad al Du'ali - d. 688), is an Arabic spell-checker designed around the Arabic language (and extensible to non-Arabic languages as well).

Notes

  • Duali is written under the BSD license.


Links


Screenshots


Downloads



ID | Due Date | Priority | State | Assigned To | Description
16 | 2003-04-16 | Normal | Done | Mohammed Elzubeir | Switch db library
79 | 2003-08-01 | Normal | Pending | Mohammed Elzubeir | Start porting pyduali to C

Contribute by choosing an unassigned todo ("Assigned To"=None) or a shared todo ("Assigned To"=All).

History Log


(Mar 28, 2004)


M.Yousif joins the project and puts the port to C++ in 4th gear. More details of progress can be found here.


(Dec 11, 2003)


Duali ver 0.2.0 is now released. It mainly consists of several bug-fixes and clean-ups. A more detailed changelog is available here.


(Dec 09, 2003)


Samy found an interesting bug that is now fixed. A new release is warranted now.


(Oct 05, 2003)


A port in Gentoo Linux for Duali 0.1.1 is now available (as of Sep. 23, 2003), see bug #26908.


(Sep 01, 2003)


Duali ver. 0.1.1 is now officially accepted in Debian (unstable). You can check the package page here. Many thanks to Mr. Negm for sponsoring it.


(Aug 24, 2003)


The Debian packages for Duali have been uploaded and are being processed. This is the ITP bug #202940.


(Aug 19, 2003)


Submitted bug report to create a Gentoo port for Duali - bug #26908.

Debian packages have also been prepared and await Ayman Negm's evaluation to sponsor it - temporary debs can be found here.


(Jul 06, 2003)


pyduali 0.1.1 released. Please refer to this post for more details.


(Apr 22, 2003)


Duali ver 0.1b released. New features include:

  • Multiple encoding support
  • Configuration file

Download here.


(Apr 08, 2003)


Duali 0.1a is released and can be downloaded here.

For more information, please read this post.


(Mar 14, 2003)


Finally, Duali now has an official logo, designed by Abdelmalek Lahmar. Many thanks!

Tentative alpha release date reset to April 1, 2003.


(Mar 06, 2003)


After some communication with Tim Buckwalter, some reconsiderations are being made to the overall algorithm (the use of roots vs. stems as a basis for the dictionary).

The new tentative alpha release of duali is set to March 25, 2003.


(Nov 10, 2002)


Mohammed Kebdani has semi-finalized the way by which all the tables for the forms and derivatives will be drawn out.

For more information, please read this post. Or follow-up on the progress here.


(Oct 22, 2002)


Mr. Kebdani and Elzubeir will use a different dictionary as the main source of input.

For more information, please read this post.


(Sep 07, 2002)


After receiving a full Arabic dictionary (book), several issues came to light.

A general call for volunteers (among other things) can be found in this post.


(Aug 11, 2002)


The status report is outlined in this post.

More details on problems are in this post.


(Jun 01, 2002)


Imported initial code to CVS. However, please note that the imported code is broken.

Duali is now under the implementation phase.


(Apr 17, 2002)


Started on stripping words to their root -- still not accurate enough (will use morpho3 as a basis).

Will get back to it once school gives me a break.


(Jan 31, 2002)


Chahine Hamila and Mohammed Elzubeir have taken up the project and started working on the definition of the involved heuristics.