5.  Common Guidelines and Recommendations

Here we review some guidelines and make recommendations that are independent of the tools you use to translate files. Whether you use a simple text editor or a more sophisticated tool like KBabel, please read the following sections, as they apply to both. The intent here is to explain what needs to be done to translate your files. A common structure and procedure, followed by all, is needed to achieve coherent translation practices.

5.1.  'PO' and 'POT' Files

Graphical Desktop Environments like KDE and GNOME, and the various applications that ship with them, are written in the C and C++ programming languages. Translating these environments means translating tens of thousands of user visible strings from their native language (i.e. English) to a target language (i.e. Arabic). This is NOT done by copying the source code and replacing all the English strings with Arabic ones; it's done by translating strings that have already been extracted into separate files in order to ease the overall process.

The user visible strings are, thus, extracted from the source code with tools from GNU's 'gettext' package. Once extracted, these strings are stored in a text file with a '.pot' extension. The '.pot' extension means that the file has NOT been translated yet (i.e. it's in its original form). If you want to start translating a 'POT' file, you first have to rename it, keeping the same name but giving it a '.po' extension. Let's say that you want to start translating the file 'filename.pot'; you would then:

$ mv filename.pot filename.po
     

Work on translating the file and, when satisfied with the amount of work you have accomplished, do the following [2]:

$ cvs remove filename.pot
$ cvs add filename.po
$ cvs commit
      

The '.po' extension means that translation has started on this file, without necessarily meaning that it is finished. Please remember never to translate 'POT' files directly. Either the files you are translating already have a '.po' extension or you will need to rename them as described above. You are not required to change the extension of 'POT' files that you will NOT be translating in the very near future. Renaming a file has come to mean that the person who renamed it owns it and will be working on it. If you have any doubts or questions, post to the 'doc' mailing list.

At run-time, the desktop environment will load the user visible strings from the target language translated 'PO' files and display them accordingly without any apparent change in the program behavior. If the translation is incomplete, the environment will simply display the original English strings where the translation is missing.

The sections that follow give some guidelines on how to translate 'PO' files. Remember that you can always use the 'doc' mailing list whenever you have any concerns (it would be unrealistic of us to pretend that these guidelines cover every issue a translator may face). Translators are also encouraged to look at already translated 'PO' files in the CVS repository to learn more about the various issues noted.

5.2.  'PO' Files Structure

5.2.1.  The Header

All files carry relevant information about who the translator(s) were, what encoding the file uses, and various dates (creation, revision, etc). This information is stored in the header, at the beginning of the file. It is of course editable, since it is liable to change.

Here's a sample header of a newly created file:

#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: desktop files\n"
"POT-Creation-Date: 2002-08-14 03:34+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
        

With all newly created files you are to remove the "#, fuzzy" line and add, at the beginning of the file, the string "# translation of filename.po to Arabic" (replace "filename.po" with the name of the file you are translating). Next add the copyright line "# Copyright (C) 2003 Free Software Foundation, Inc." and then add your name and email after the copyright line (this will be the translators list). Then change the "Last-Translator:", "Language-Team:", "Content-Type:" and "Content-Transfer-Encoding:" fields so that they look as shown below (don't bother with any other existing or non-existing fields for the time being). If you use a text editor to translate the files (as opposed to a translation application like KBabel), remember to also update the "PO-Revision-Date:" field every time you save your files.

Here's a sample of the header while it's being translated (or once finished):

# translation of desktop_kde-i18n.po to Arabic
# desktop.po - Arabic Translation.
# Copyright (C) 2001,2002,2003 Free Software Foundation, Inc.
# Name1 Surname1 <A_address/@domain_1.com>, 2001.
# Name2 Surname2 <B_address/@domain_2.net>, 2002
# Name3 Surname3 <C_address/@domain_3.org>, 2002,2003
#
msgid ""
msgstr ""
"Project-Id-Version: desktop_kde-i18n\n"
"POT-Creation-Date: 2003-03-02 20:47+0100\n"
"PO-Revision-Date: 2003-02-26 21:27+0200\n"
"Last-Translator: Name3 Surname3 <C_address/@domain_3.org>\n"
"Language-Team: Arabic <support at arabeyes.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Generator: KBabel 1.0.1\n"
        

The lines starting with the '#' character are comments. If you use a translation application (like KBabel), some header fields are automatically created like the "X-Generator:" field. If you are working on a 'PO' file already started by someone else, and you have made significant changes, then just add your name to the translators list and put your name and email in the "Last-Translator:" field (after checking that the person who was in the "Last-Translator:" field is already in the translators list, otherwise add him/her to that list). Check also that the above cited fields have the correct/complete values entered in them. If not, correct them.

Always remember to add your name to the translators list. It's very important to know who has worked on which file, and it ensures that you get the credit you deserve for your work :-)
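The header rules above can be checked mechanically. The following is only a minimal sketch, assuming the header is available as raw text in the form shown in the samples; the helper name `check_header` and the `PLACEHOLDERS` list are illustrative inventions, not part of any existing tool.

```python
import re

# Placeholder values taken from the newly created header sample (assumption:
# any of these left in a field means the field was never filled in).
PLACEHOLDERS = ("FULL NAME", "EMAIL@ADDRESS", "LL@li.org",
                "CHARSET", "ENCODING", "YEAR-MO-DA")

NEW_HEADER = r'''#, fuzzy
msgid ""
msgstr ""
"Project-Id-Version: desktop files\n"
"POT-Creation-Date: 2002-08-14 03:34+0200\n"
"PO-Revision-Date: YEAR-MO-DA HO:MI+ZONE\n"
"Last-Translator: FULL NAME <EMAIL@ADDRESS>\n"
"Language-Team: LANGUAGE <LL@li.org>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=CHARSET\n"
"Content-Transfer-Encoding: ENCODING\n"
'''


def check_header(header_text):
    """Return a list of problems found in a 'PO' header (hypothetical helper)."""
    problems = []
    lines = [line.strip() for line in header_text.splitlines()]
    if "#, fuzzy" in lines:
        problems.append('"#, fuzzy" line still present')
    for line in lines:
        # Header fields look like: "Field-Name: value\n" (backslash + n, literally).
        match = re.match(r'^"([A-Za-z-]+): (.*)\\n"$', line)
        if not match:
            continue
        field, value = match.groups()
        if any(placeholder in value for placeholder in PLACEHOLDERS):
            problems.append(f"{field} still holds a placeholder")
        if field == "Content-Type" and "charset=UTF-8" not in value:
            problems.append("Content-Type charset is not UTF-8")
    return problems


if __name__ == "__main__":
    for problem in check_header(NEW_HEADER):
        print(problem)
```

Run against the untouched header, the sketch lists every field that still needs attention; an empty list means none of the checks it knows about failed, which is not the same as the header being complete.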

5.2.2.  The Body & Strings

Beyond the header, a 'PO' file is a mere succession of 'msgid' and 'msgstr', preceded sometimes by comments about the exact location of where those strings appear within the source code[3]. The 'msgid' is already filled with the English string that needs to be translated. What translators need to do is to fill 'msgstr' with the Arabic translation of what appears in the 'msgid' string (do NOT delete the double quotes ' " ').

Sample excerpts from different 'PO' files follow:

#: finddialog.cpp:55
msgid "Galaxies"
msgstr "مجرّات"

msgid ""
"_: do not use a target symbol\n"
"No Symbol"
msgstr ""

#: src/acme.h:80
#, fuzzy
msgid "Eject key"
msgstr "مفتاح الإخراج"
        

The first 'msgid' is simply translated; the second string is not translated yet (that's why the 'msgstr' is empty). The third string is translated, but the translator has put a "#, fuzzy" statement before the block. The "fuzzy" indicator is the means by which a translator indicates that he/she is not sure of the term and that a second pass is required. The "#," prefix is of key importance; it's a special character sequence. It is important to note that the "#, fuzzy" statement must come before the 'msgid'.
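To make the three states concrete, here is a rough sketch that counts translated, untranslated and fuzzy entries. It assumes entries are separated by blank lines and does not treat the header entry specially; it is an illustration only, and 'msgfmt' remains the authoritative tool. The function name `classify_entries` is made up for this example.

```python
import re

SAMPLE = r'''#: finddialog.cpp:55
msgid "Galaxies"
msgstr "مجرّات"

msgid ""
"_: do not use a target symbol\n"
"No Symbol"
msgstr ""

#: src/acme.h:80
#, fuzzy
msgid "Eject key"
msgstr "مفتاح الإخراج"
'''


def classify_entries(po_text):
    """Count translated, untranslated and fuzzy entries (simplified sketch)."""
    counts = {"translated": 0, "untranslated": 0, "fuzzy": 0}
    # Assumption: entries are separated by at least one blank line.
    for block in re.split(r"\n\s*\n", po_text.strip()):
        fuzzy = False
        in_msgstr = False
        msgstr_parts = []
        for line in block.splitlines():
            line = line.strip()
            if line.startswith("#,") and "fuzzy" in line:
                fuzzy = True
            elif line.startswith("msgstr"):
                in_msgstr = True
                msgstr_parts.append(line[len("msgstr"):].strip().strip('"'))
            elif line.startswith("msgid"):
                in_msgstr = False
            elif in_msgstr and line.startswith('"'):
                # Continuation line of a multi-line msgstr.
                msgstr_parts.append(line.strip('"'))
        if not msgstr_parts:
            continue  # not a message entry
        if fuzzy:
            counts["fuzzy"] += 1
        elif "".join(msgstr_parts) == "":
            counts["untranslated"] += 1
        else:
            counts["translated"] += 1
    return counts


if __name__ == "__main__":
    print(classify_entries(SAMPLE))
```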

5.3.  Comments within 'msgid' ('_:')

'msgid' strings starting with '_:' include a comment meant to help the translator or to give him/her more information about the message to translate. The comment ends with a '\n' (new line) sequence. Do NOT translate the help comment; put in the 'msgstr' only the translation of what follows the comment's '\n' (new line) sequence, guided by what you understood the comment to be telling you.

Example:

#: cupsdconf.cpp:808 cupsdconf.cpp:831 cupsdconf.cpp:847
msgid ""
"_: Base\n"
"Root"
msgstr ""
"الجذر"
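
As an illustration of the rule above, the sketch below separates the help comment from the text that actually needs translating. It assumes the 'msgid' has already been decoded, so that the '\n' sequence is a real newline character; `split_context` is a hypothetical helper name.

```python
def split_context(msgid):
    """Split a decoded msgid into (comment, text) following the '_:' convention.

    Returns (None, msgid) when the string carries no help comment.
    """
    if msgid.startswith("_:"):
        comment, newline, text = msgid.partition("\n")
        if newline:
            # Drop the leading '_:' marker and surrounding spaces.
            return comment[2:].strip(), text
    return None, msgid


if __name__ == "__main__":
    comment, text = split_context("_: Base\nRoot")
    print("comment:", comment)
    print("translate only:", text)
```

Only the second element of the returned pair belongs in the 'msgstr', in translated form.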
      

5.4.  Special keywords within 'msgid' ('Comment=' or 'Name=' or others)

Translate only what comes after the '=' character while keeping the keyword intact (i.e. don't translate or modify "Comment" or "Name"); keep the keywords in English.

Example:

#: kde-i18n/vi/messages/entry.desktop:1
msgid "Name=Vietnamese"
msgstr "Name=الفييتنامية"
      

5.5.  HTML tags within 'msgid'

Some applications use Rich Text features (for paragraphs, colors, bold fonts, etc) to make their strings look nicer. This is accomplished with HTML tags such as the "<p>" tag. It is very important that the translated message keep all those tags. The same holds for C print sequences like "\n", "\t" or "%s". Translate only the text between the tags. Also, remember that when you are in Right-to-Left input mode (i.e. when you are typing Arabic), the "\n" sequence may look odd - "n\". In either mode, remember that you need to enter the "\" character first and then the "n", irrespective of how they look on screen.

Example:

#: kio/kio/global.cpp:451
msgid "</p><p><b>Details of the Request</b>:"
msgstr "</p><p><b>تفاصيل الطّلب</b>:"
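
A quick way to catch a dropped tag or format sequence is to compare what appears in the 'msgid' with what appears in the 'msgstr'. The sketch below is a rough approximation under stated assumptions: it works on the raw text as written in the 'PO' file (so "\n" is a backslash followed by "n"), and the pattern only covers simple tags, "\n", "\t" and one-character "%" placeholders. The names `TOKEN_RE` and `markup_preserved` are illustrative.

```python
import re
from collections import Counter

# Simple HTML-like tags, the escapes \n and \t, and placeholders such as %s or %n.
TOKEN_RE = re.compile(r"</?[A-Za-z][^>]*>|\\[nt]|%[a-z0-9]")


def markup_tokens(text):
    """Count every tag, escape and placeholder found in the raw text."""
    return Counter(TOKEN_RE.findall(text))


def markup_preserved(msgid, msgstr):
    """True when msgstr carries the same markup tokens as msgid, in any order."""
    return markup_tokens(msgid) == markup_tokens(msgstr)


if __name__ == "__main__":
    original = "</p><p><b>Details of the Request</b>:"
    translated = "</p><p><b>تفاصيل الطّلب</b>:"
    print(markup_preserved(original, translated))
```

The comparison ignores order on purpose, since a translated sentence may legitimately place a tag or placeholder at a different position.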
      

5.6.  Carriage Returns & Punctuation within 'msgid'

Keep the same punctuation in the translated messages as found in the original strings. Carriage returns (or new lines) also need to be included. So if there are multiple lines within a single 'msgid', the resulting 'msgstr' ought to match the original message in line count (lines beginning with "_:" ought to be ignored, as noted above, since they are mere comments). See the example in the section on comments within 'msgid'.

5.7.  Multi-listing 'msgstr' for plural forms

The plural forms feature offers the possibility to display the correct word form depending on the quantity. They are distinguished by the "%n" character.

Sample:

msgid "_n: one cat\n"
"%n cats"
msgstr "قطّة واحدة\n"
"قطّتان\n"
"%n قطط\n"
"%n قطّة"          
      

The first 'msgstr' entry is displayed when there's a single cat. The second is displayed when there are 2 cats. The third is displayed when there are 3 to 10 cats. The last entry is displayed when the number of cats is equal to or greater than 11. There is no need to reproduce the "_n:" in the translated message. Don't forget to use the "\n" sequence to separate the different entries.
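The selection rule just described can be sketched as follows. This only mirrors the four ranges given in the text, not the environment's actual implementation. The text does not say what happens for quantities below 1, so the sketch rejects them. The 'msgstr' is assumed to be decoded, with real newlines between the entries; both function names are hypothetical.

```python
MSGSTR = "قطّة واحدة\nقطّتان\n%n قطط\n%n قطّة"


def arabic_plural_index(n):
    """Map a quantity to one of the four entries described in the text."""
    if n < 1:
        # Assumption: quantities below 1 are not covered by the guideline.
        raise ValueError("quantity below 1 is not covered by this sketch")
    if n == 1:
        return 0
    if n == 2:
        return 1
    if n <= 10:
        return 2
    return 3


def pick_plural(msgstr, n):
    """Choose the entry matching n and substitute the %n placeholder."""
    forms = msgstr.split("\n")
    return forms[arabic_plural_index(n)].replace("%n", str(n))


if __name__ == "__main__":
    for quantity in (1, 2, 5, 11):
        print(quantity, "->", pick_plural(MSGSTR, quantity))
```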

5.8.  Shortcuts & underscores in 'msgid'

Characters preceded by an underscore character ("_") are called shortcuts or accelerators. A shortcut gives direct keyboard access to a menu entry instead of selecting it with the mouse or cursor. The accelerators need to be translated in order to fit an Arabic keyboard. That means that you will have to choose a letter from the word (or the sentence) as the accelerator. Try not to assign the same letter to two accelerators in the same menu (you can guess that from the location of the 'msgid', for example). As a rule of thumb, assign the first letter of the word as the accelerator; if that letter is already used as a shortcut in the current context, assign the second letter, and so on. Do please try to be consistent.

5.9.  Names and Acronyms within 'msgid'

Names (people, locations, some programs) can be transliterated to Arabic characters, others can even be translated totally or partly to Arabic. Acronyms like ASCII, HTTP, C++ etc should be kept in Latin characters. This is the same for program directives, class names, variables etc. Remember to read the Arabic Documentation Standards. When in doubt, post on the 'doc' list.

5.10.  Special Characters within 'msgid'

Some special characters like "&" need to be entered twice to tell the parser to keep them as is. Please take this into account when translating.

Sample:

#: rc.cpp:19
msgid "Color && Animation"
msgstr "اللّون و الحركة"
      

5.11.  Use of the Imperative Form for Actions

Menu actions like "Edit", "Save", "Quit" etc and other related cases must be translated to Arabic in the imperative form. This is done for standardization (to stay consistent with existing translations) and to give a more lively user/machine interaction. This is of EXTREME importance.

Sample:

#: ktouch.cpp:239
msgid "Save file..."
msgstr "إحفظ ملفّ..."
      

5.12.  UTF-8 Encoding

Please remember to ALWAYS save your files in UTF-8 encoding [4]. Otherwise, the file may be unreadable by others. UTF-8 is a global, all-encompassing encoding (part of the Unicode standard). UTF-8 enables the user to encode all the world's languages as well as various symbols (mathematical ones for instance) in a single file. Before Unicode and UTF-8, it was nearly impossible to exchange documents between localized systems. So it is VERY important that all work be saved in UTF-8 (Arabic messages posted to 'doc' should also be in UTF-8 only). KBabel, once configured properly (more on that later), will automatically save all your files in UTF-8. If you use a text editor to do your translations, on the other hand, please check that your editor of choice supports UTF-8 and that you are indeed saving your work in that encoding. A check can be done with the 'file' command:

$ file filename.po
      

This command will give you filename.po's encoding.

If you start to work on a file and you find that it isn't saved in UTF-8 or that it has an encoding problem (it displays boxes instead of regular characters, for example), immediately stop working on this file and report the problem to the 'doc' list. Once the problem is reported, and until it is resolved, do work on another file instead of waiting idly :-)
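As a complement to the 'file' command, the check can also be scripted. The sketch below simply tries to decode the file's bytes as UTF-8; the helper names are made up for the example. Note that a successful decode only shows that the bytes form valid UTF-8, not that the text is meaningful.

```python
def is_valid_utf8(data):
    """Return True when the given bytes decode cleanly as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


def file_is_utf8(path):
    """Read a file in binary mode and test its contents (hypothetical helper)."""
    with open(path, "rb") as handle:
        return is_valid_utf8(handle.read())


if __name__ == "__main__":
    import sys
    for name in sys.argv[1:]:
        print(name, "UTF-8" if file_is_utf8(name) else "NOT UTF-8")
```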

5.13.  Sanity checking (any errors ?)

In order to check that the 'PO' file that you've translated does not contain any syntax errors, run:

$ msgfmt -c --statistics filename.po
      

'msgfmt' being a program from GNU's 'gettext' package.

If there are any errors, you will be warned about them explicitly, with the appropriate line number(s), so you can easily locate and correct them (missing double quotes ' " ' are very common). Barring any errors, the 'msgfmt' command will give you statistics about what's been translated, what's not translated and how many fuzzy strings are in the file. Please report these statistics in the commit log (i.e. when committing the file(s) to the CVS repository).
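If you want to reuse the statistics in a commit log, the summary line can be picked apart with a small script. This sketch assumes the English summary wording that 'msgfmt' typically prints, along the lines of "12 translated messages, 3 fuzzy translations, 5 untranslated messages."; parts with a zero count may be absent, and other locales word the line differently. The function name is illustrative.

```python
import re


def parse_msgfmt_statistics(line):
    """Extract the three counters from an English msgfmt summary line."""
    stats = {"translated": 0, "fuzzy": 0, "untranslated": 0}
    pattern = r"(\d+) (translated|fuzzy|untranslated)\b"
    for count, kind in re.findall(pattern, line):
        stats[kind] = int(count)
    return stats


if __name__ == "__main__":
    summary = "12 translated messages, 3 fuzzy translations, 5 untranslated messages."
    stats = parse_msgfmt_statistics(summary)
    print("filename.po: {translated} translated, {fuzzy} fuzzy, "
          "{untranslated} untranslated".format(**stats))
```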

5.14.  Commit check-list

After translating all or part of a 'PO' file, you need to commit the file(s) back to the repository. Here's a check-list of things you MUST do before committing a file:

  1. The file has a '.po' extension.

  2. You have entered the correct information in the file header.

  3. The file is saved and is encoded with UTF-8.

  4. You checked that there are no errors with the 'msgfmt' command.

If one or more of these requirements are not fulfilled, then please refer to what has been discussed above in order to remedy any problems. Otherwise, your files are eligible to be put on CVS :-)

If you have translated several files at once, please do a single 'cvs commit' in order to send all those files together rather than committing them separately (saves bandwidth, fewer headaches, etc). While committing, and this is VERY IMPORTANT, remember to put the files' translation statistics in the commit log (how many strings, on a per file basis, are translated, untranslated and fuzzy). In case of a conflict (very unlikely), please refer to the CVS HOWTO.



[2] The CVS commit log ought to be something akin to 'renamed filename.pot to filename.po' (the translation statistics need to be included as well, as noted in the commit check-list). If you renamed more than one 'POT' file, please commit them all via a single command. In other words, after moving, adding and removing all the files, do 'cvs commit' and all will be picked up. If ever in doubt, refer to the CVS HOWTO 'Committing Multiple Files' section.

[3] These types of comments are initiated by "#:" at the beginning of the line.

[4] This document is UTF-8 encoded, set that encoding to view it properly.