Developing Arabic fonts
Arabic is considered a complex script, in contrast to simple scripts such as Latin and Cyrillic. Rendering Arabic text correctly requires more sophisticated font technologies, the so-called "smart fonts": OpenType, Apple's AAT and SIL's Graphite.
Features of Arabic script
Arabic is a cursive script, written from right to left. With few exceptions, each letter has at least four basic forms: initial, medial, final and isolated. Each of these may take different shapes depending on the context and the calligraphic style.
In OpenType fonts, lookup tables define the various forms of each character. The initial, medial and final forms are handled by simple substitution tables.
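As a rough illustration of what such simple substitution tables contain, the Python sketch below builds one mapping per positional feature and prints it in Adobe feature-file syntax, which FontForge can import. The glyph names (beh, beh.init and so on) are assumptions made for this example, not names this tutorial prescribes.

```python
# Hypothetical glyph names: each base glyph maps to an unencoded positional variant.
FORMS = {
    "init": {"beh": "beh.init", "lam": "lam.init"},
    "medi": {"beh": "beh.medi", "lam": "lam.medi"},
    "fina": {"beh": "beh.fina", "lam": "lam.fina", "alef": "alef.fina"},
}


def feature_block(tag, mapping):
    """Return one single-substitution feature in feature-file syntax."""
    lines = [f"feature {tag} {{"]
    for base, variant in sorted(mapping.items()):
        lines.append(f"    sub {base} by {variant};")
    lines.append(f"}} {tag};")
    return "\n".join(lines)


def all_features(forms):
    """Join the init, medi and fina features into one feature-file fragment."""
    return "\n\n".join(feature_block(tag, forms[tag]) for tag in ("init", "medi", "fina"))


print(all_features(FORMS))
```

Note that alef appears only under fina: it never joins to the following letter, so it has no initial or medial variant. A real feature file would normally also declare the script inside each feature; that is left out here for brevity.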
Another feature is ligatures, though the name is misleading: unlike Latin ligatures, these are mandatory in Arabic. A ligature is a special glyph composed of two or more glyphs. Take lam-alef as an example: it is a glyph composed of lam (initial or medial form) and alef (final form). Looked at carefully, however, lam-alef is no exception to the usual joining behaviour; it is just a special form that lam takes only when followed by alef, and likewise a special form that alef takes only when preceded by lam. In OpenType, lam-alef and similar combinations can be handled either with ligature substitution or with the more complex contextual substitution.
Another feature of the Arabic script is the extensive use of diacritic marks, known as Harakat or Tashkil. These small vocalization marks lie above or below base glyphs, so a well-designed font must provide attachment points that define the spatial relation between each diacritic mark and its base glyph. Because Arabic diacritics may be stacked on top of each other over a single base glyph, mark-to-mark relations must be addressed as well.
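The arithmetic behind attachment points can be sketched in a few lines of Python. The coordinates and anchor names below are invented for illustration, and the sketch assumes that all glyphs are positioned from the same origin (advance widths are ignored), which a real layout engine would not do.

```python
from typing import NamedTuple


class Point(NamedTuple):
    x: int
    y: int


def attach(target_anchor: Point, mark_anchor: Point) -> Point:
    """Offset that moves a mark so that its anchor lands on the target anchor."""
    return Point(target_anchor.x - mark_anchor.x, target_anchor.y - mark_anchor.y)


# Hypothetical anchor coordinates, in font units.
BEH_TOP = Point(400, 600)       # attachment point above the base glyph beh
SHADDA_BASE = Point(150, 0)     # point on shadda that sits on the base anchor
SHADDA_TOP = Point(150, 300)    # point on shadda where a further mark may sit
FATHA_BASE = Point(120, 0)      # point on fatha that sits on whatever is below it

# Mark-to-base: place shadda on beh.
shadda_offset = attach(BEH_TOP, SHADDA_BASE)  # Point(250, 600)

# Mark-to-mark: place fatha on top of the already positioned shadda.
shadda_top_placed = Point(SHADDA_TOP.x + shadda_offset.x,
                          SHADDA_TOP.y + shadda_offset.y)  # Point(400, 900)
fatha_offset = attach(shadda_top_placed, FATHA_BASE)  # Point(280, 900)

print(shadda_offset, fatha_offset)
```

Without the mark-to-mark step, fatha would be attached to the base anchor as well and would collide with shadda.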
Arabic in Unicode
Unicode defines two kinds of Arabic code points: nominal characters and presentation forms. The nominal characters live in the Arabic block, U+0600 - U+06FF, which contains almost all Arabic code points, and in Arabic Supplement, U+0750 - U+077F, which contains some extra characters. There are also Arabic Presentation Forms-A and -B, U+FB50 - U+FDFF and U+FE70 - U+FEFF, which encode the different forms of Arabic letters and some ligatures. They are included only for compatibility with legacy encodings and should never be used to encode text directly. Presentation forms can be used when defining substitution tables, but that is not mandatory: you can add the contextual forms as unencoded glyphs in your font and tell the OpenType engine which glyph to use through the lookup tables. Note that including the presentation forms in your font does not mean that they will be used; you must still add the lookup tables.
Throughout this tutorial we will mainly use FontForge as the font editor. FontForge can edit both glyph outlines and OpenType tables, and it is scriptable (using either Python or its legacy scripting language).
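To make the difference between nominal characters and presentation forms concrete, here is a small Python sketch using only the standard library. It classifies a character by the block ranges quoted above and uses NFKC normalization to map a presentation form back to the characters it stands for. NFKC is a blunt tool, since it also rewrites other compatibility characters, so treat this as an illustration rather than a conversion recipe.

```python
import unicodedata

# Block ranges as quoted in the text above.
BLOCKS = [
    ("Arabic", 0x0600, 0x06FF),
    ("Arabic Supplement", 0x0750, 0x077F),
    ("Arabic Presentation Forms-A", 0xFB50, 0xFDFF),
    ("Arabic Presentation Forms-B", 0xFE70, 0xFEFF),
]


def block_of(ch):
    """Return the name of the Arabic block containing ch, or None."""
    cp = ord(ch)
    for name, first, last in BLOCKS:
        if first <= cp <= last:
            return name
    return None


def to_nominal(text):
    """Replace presentation forms by the nominal characters they stand for."""
    return unicodedata.normalize("NFKC", text)


legacy = "\uFEFB"  # ARABIC LIGATURE LAM WITH ALEF ISOLATED FORM
print(block_of(legacy))                                  # Arabic Presentation Forms-B
print([f"U+{ord(c):04X}" for c in to_nominal(legacy)])   # ['U+0644', 'U+0627']
```

Text should be stored in the second form (lam followed by alef); choosing the ligature glyph is left to the font's lookup tables.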
Start FontForge with no arguments and it will ask you for a font to open; just choose New. Alternatively, pass the -new option on the command line. FontForge does not encode a new font as Unicode by default, so go to Encoding -> Reencode and choose "ISO 10646-1 (Unicode, BMP)"; the Arabic Unicode code points will then be available for editing.
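The same setup can be scripted. The following is a minimal sketch using FontForge's Python module, which ships with FontForge itself, so the import is guarded. It assumes that "UnicodeBmp" is the scripting name of the "ISO 10646-1 (Unicode, BMP)" encoding, and the file name is made up for the example.

```python
# FontForge's Python module is only available where FontForge is installed.
try:
    import fontforge
except ImportError:
    fontforge = None


def new_unicode_font(path="MyArabicFont.sfd"):
    """Create an empty font, reencode it as Unicode BMP and save it.

    The default file name is hypothetical; pick whatever suits your project.
    """
    if fontforge is None:
        raise RuntimeError("run this with FontForge's Python, e.g. fontforge -script")
    font = fontforge.font()        # an empty font, like choosing New in the UI
    font.encoding = "UnicodeBmp"   # assumed name for "ISO 10646-1 (Unicode, BMP)"
    font.createChar(0x0627)        # create an empty slot for ARABIC LETTER ALEF
    font.save(path)
    return font
```

Saved as a .py file, this can be run with fontforge -script and the resulting .sfd opened in the FontForge UI for drawing.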
Basically, for a font to support the Arabic language, it should cover U+0621 - U+0652 (Arabic letters and diacritic marks) and U+0660 - U+0669 (Arabic digits, which Unicode calls "Arabic-Indic digits"). Refer to the Unicode charts for the details of each character.
Arabic characters are either dual-joining, right-joining, non-joining or transparent. Dual-joining characters have initial, medial, final and isolated forms, while right-joining characters have only final and isolated forms; the remaining characters have only one form.
We need to provide glyphs for each form of each character. For OpenType fonts it is not necessary to encode those glyphs as presentation forms; we can add them as unencoded glyphs, since they will be accessed through the GSUB table. However, if you want your font to be usable in legacy applications that rely on Arabic presentation forms for rendering contextual forms, you might consider encoding those glyphs there.
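A quick way to track progress against these ranges is a coverage check. The sketch below is plain Python; the ranges are taken as-is from the paragraph above, and the set of covered code points is a made-up example rather than data from a real font.

```python
def required_codepoints():
    """Code points named in the text: U+0621-U+0652 and U+0660-U+0669."""
    letters_and_marks = range(0x0621, 0x0652 + 1)
    digits = range(0x0660, 0x0669 + 1)
    return set(letters_and_marks) | set(digits)


def missing(covered):
    """Return the required code points that are not yet covered, sorted."""
    return sorted(required_codepoints() - set(covered))


# Hypothetical font that so far only has alef, beh and the ten digits.
covered = [0x0627, 0x0628, *range(0x0660, 0x066A)]
gaps = missing(covered)

print(len(required_codepoints()), len(gaps))          # 60 48
print(", ".join(f"U+{cp:04X}" for cp in gaps[:3]))    # U+0621, U+0622, U+0623
```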
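The way joining types determine which form is needed can be sketched as follows. The joining table covers only a handful of characters chosen for the example, and the function is a simplified model of what a shaping engine does, not the algorithm of any particular implementation. Text is processed in logical order, so the "previous" letter is the one to the right on screen.

```python
# Joining types for a small, illustrative subset of characters (not a full table).
DUAL, RIGHT, NONE, TRANSPARENT = "D", "R", "U", "T"
JOINING = {
    "\u0628": DUAL,          # beh
    "\u062A": DUAL,          # teh
    "\u0643": DUAL,          # kaf
    "\u0644": DUAL,          # lam
    "\u0645": DUAL,          # meem
    "\u0627": RIGHT,         # alef
    "\u062F": RIGHT,         # dal
    "\u0631": RIGHT,         # reh
    "\u0648": RIGHT,         # waw
    "\u0621": NONE,          # hamza
    "\u064E": TRANSPARENT,   # fatha
    "\u0651": TRANSPARENT,   # shadda
}


def _neighbour(text, i, step):
    """Joining type of the nearest non-transparent character in one direction."""
    i += step
    while 0 <= i < len(text):
        t = JOINING.get(text[i], NONE)
        if t != TRANSPARENT:
            return t
        i += step
    return NONE


def forms(text):
    """Return the positional form needed for each character, in logical order."""
    result = []
    for i, ch in enumerate(text):
        t = JOINING.get(ch, NONE)
        if t == TRANSPARENT:
            result.append("mark")
            continue
        # A letter joins backwards only if the previous letter can join forwards.
        joins_prev = t in (DUAL, RIGHT) and _neighbour(text, i, -1) == DUAL
        # Only dual-joining letters can join forwards.
        joins_next = t == DUAL and _neighbour(text, i, +1) in (DUAL, RIGHT)
        if joins_prev and joins_next:
            result.append("medi")
        elif joins_prev:
            result.append("fina")
        elif joins_next:
            result.append("init")
        else:
            result.append("isol")
    return result


# kaf, teh, alef, beh
print(forms("\u0643\u062A\u0627\u0628"))  # ['init', 'medi', 'fina', 'isol']
```

The last letter comes out isolated because the right-joining alef before it does not connect forwards, which is exactly why right-joining characters need no initial or medial glyphs.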
Creating and populating tables
In modern Arabic typography, lam-alef is considered the only mandatory ligature. There are basically two ways to support lam-alef in your font: ligature substitution and contextual substitution.
Almost all font designers use ligature substitution because of its simplicity; however, it is not always the best way. Many OpenType implementations do not support the ligature caret table, giving us no control over where the cursor is placed between ligature components, which makes editing text with many ligatures rather unpleasant. Here we will need a simple ligature substitution table, 'liga', FIXME
In my opinion, contextual substitution is superior to ligature substitution: it reduces the number of unnecessary glyphs in your font, avoids all the problems that arise from using ligatures, and brings your font closer to the spirit of Arabic calligraphy. But it requires careful analysis of Arabic calligraphy to properly define where each glyph starts and ends.
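The practical difference between the two approaches can be shown with a toy model in Python that operates on lists of glyph names. All glyph names here (lam_alef.isol, lam.init.prealef, alef.fina.postlam and so on) are hypothetical, and the functions only mimic the effect of the lookups; they are not how an OpenType engine is implemented.

```python
def ligature_sub(glyphs):
    """Ligature substitution: lam + alef collapse into a single glyph."""
    out, i = [], 0
    while i < len(glyphs):
        pair = glyphs[i:i + 2]
        if pair == ["lam.init", "alef.fina"]:
            out.append("lam_alef.isol")
            i += 2
        elif pair == ["lam.medi", "alef.fina"]:
            out.append("lam_alef.fina")
            i += 2
        else:
            out.append(glyphs[i])
            i += 1
    return out


def contextual_sub(glyphs):
    """Contextual substitution: both glyphs survive, each in a special shape."""
    out = list(glyphs)
    for i in range(len(glyphs) - 1):
        if glyphs[i] in ("lam.init", "lam.medi") and glyphs[i + 1] == "alef.fina":
            out[i] = glyphs[i] + ".prealef"
            out[i + 1] = "alef.fina.postlam"
    return out


word = ["beh.init", "lam.medi", "alef.fina"]
print(ligature_sub(word))    # ['beh.init', 'lam_alef.fina']
print(contextual_sub(word))  # ['beh.init', 'lam.medi.prealef', 'alef.fina.postlam']
```

With the contextual approach the glyph count matches the character count, so a cursor position between lam and alef exists without any ligature caret data. The price is that the two special shapes must be drawn so that they meet cleanly.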