PHP internationalization with gettext

When you develop a PHP web application, you often need it to be available in several languages, selected dynamically. One solution is to enable the gettext extension in PHP. If you are using a WAMP, XAMPP or MAMP distribution, the only thing you have to do is remove the semicolon in front of the line in php.ini that loads the gettext extension (gettext.so or php_gettext.dll, depending on the platform) and restart the web server.
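To confirm that the change in php.ini took effect, a quick check like the following may help. It is only a sketch: gettextStatus() is a hypothetical helper name, while extension_loaded() and php_ini_loaded_file() are standard PHP functions.

```php
<?php
// Report whether the gettext extension is available to this PHP runtime.
function gettextStatus(): string
{
    if (extension_loaded('gettext')) {
        return 'gettext is enabled';
    }
    // php_ini_loaded_file() returns the path of the loaded php.ini, or false if none.
    $ini = php_ini_loaded_file();
    return 'gettext is NOT enabled; check ' . ($ini !== false ? $ini : 'php.ini');
}

echo gettextStatus(), "\n";
```

It is best to run this through the web server rather than from the command line, since the two can load different php.ini files.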

You then use the gettext() function, or its alias _(), to retrieve translated strings at run time.

All the translated strings are stored in PO (Portable Object) files. PO files are plain text files that contain the translations; they can be created with xgettext or by hand in a plain text editor. In our case, however, we are going to use the Poedit program.
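For reference, a hand-written PO file is short. The sketch below writes a minimal one from PHP so that its structure is visible: a header entry with an empty msgid, followed by one msgid/msgstr pair. The Spanish translation and the output location in the temporary directory are illustrative assumptions, not part of the original example.

```php
<?php
// Illustrative catalog content; the Spanish translation is just an example.
// A nowdoc is used so that the \n sequences stay literal, as the PO format expects.
$po = <<<'PO'
msgid ""
msgstr ""
"Language: es_ES\n"
"Content-Type: text/plain; charset=UTF-8\n"

msgid "Hello World"
msgstr "Hola Mundo"

PO;

// Hypothetical output location for this sketch.
$file = sys_get_temp_dir() . '/messages.po';
file_put_contents($file, $po);
echo "Wrote $file\n";
```

A .po file is not read by PHP directly; it has to be compiled to a .mo file, either by saving it in Poedit or with the gettext command msgfmt messages.po -o messages.mo.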

With Poedit, we first create a new .po file, choose a language, and save it.

Then we click on the "Extract from sources" button:

Then, in the Catalog properties dialog, we add the path where the PHP files containing gettext('msg_id') or _('msg_id') are located. Be careful: in some versions of Poedit the option "Add files" does not work properly. You can always change the paths and files through the menu Catalog - Properties. Next, Poedit automatically extracts all message ids from the PHP sources. Then we translate all the strings and, finally, when we save the file, Poedit creates or updates the .mo file. In the following image you can see the result obtained from a PHP login page.

In our case, these files are named messages (messages.po and messages.mo). From Poedit's point of view these files are a catalog; from the point of view of gettext in the PHP sources, they are a domain.

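These files must be saved in a fixed directory structure: a base directory (locale in our case) with one subdirectory per language code, each containing an LC_MESSAGES subdirectory. For es_ES, that means locale/es_ES/LC_MESSAGES/messages.po and locale/es_ES/LC_MESSAGES/messages.mo.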

An example of PHP code would be:

<?php

session_start();

$language = isset($_GET['lang']) ? $_GET['lang'] : "";
if (empty($language)) {
  $language = "es_ES";
}
putenv("LANG=" . $language);
setlocale(LC_ALL, $language);

// Set the text domain as "messages"
$domain = "messages";
bindtextdomain($domain, "locale");
bind_textdomain_codeset($domain, "UTF-8");

textdomain($domain);

echo _('Hello World');

?>

In this example, the language code is expected in the lang GET parameter. If no language code is received, the default is es_ES.

The $language identifier must match the name of the corresponding subdirectory of locale and follow the same naming rules, as in the directory structure described above.
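Since the value of lang comes straight from the request and is then used in putenv() and in a directory lookup, it may be worth accepting only codes for which a catalog exists. The following is a minimal sketch under that assumption; resolveLanguage() and the $supported list are hypothetical and not part of the original example.

```php
<?php
// Hypothetical helper: return the requested code only if it is a known one.
function resolveLanguage($requested, array $supported, string $default): string
{
    // is_string() guards against array input such as ?lang[]=x
    if (is_string($requested) && in_array($requested, $supported, true)) {
        return $requested;
    }
    return $default;
}

// Assumed list; it should mirror the subdirectories under locale/.
$supported = ['es_ES', 'en_US', 'fr_FR'];

$language = resolveLanguage($_GET['lang'] ?? null, $supported, 'es_ES');
```

Anything that is not in the list, including an empty or missing parameter, falls back to the default language.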

In line 9, putenv() sets the value of the LANG environment variable; we may also set LC_ALL. This tells gettext which locale to use for the current request.

In line 10, setlocale() specifies the locale used by the application, which affects how PHP sorts strings, interprets date and time formats, and formats numeric values. The locale code must exist in your operating system. You can check this with the command locale -a on Linux or macOS; on Windows, check your regional settings. For that reason, on some machines this call does not work with a plain "es_ES" and you have to pass a value that the system actually provides: for example "es_ES.utf8" on Linux or macOS, because that value appears in the list returned by locale -a, or "esn", "esn_esn", "Spanish_Spain.UTF8", "esn_esn.UTF8" or "esn_esp.UTF8" on Windows.
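One way to cope with these differences is to hand setlocale() several candidate names at once. setlocale() accepts an array and applies the first name the system recognizes, returning that name, or false if none matched. The sketch below assumes a hypothetical helper, localeCandidates(), and only covers the Linux and macOS spellings mentioned above.

```php
<?php
// Hypothetical helper: common spellings of the same locale, most specific first.
function localeCandidates(string $language): array
{
    return [$language . '.utf8', $language . '.UTF-8', $language];
}

// setlocale() tries each name in order and stops at the first one that works.
$applied = setlocale(LC_ALL, localeCandidates('es_ES'));

if ($applied === false) {
    error_log('No es_ES locale is installed; check the output of locale -a');
} else {
    echo "Locale in use: $applied\n";
}
```

For Windows, the names listed above could be appended to the candidate array. Checking the return value is what reveals a missing locale, because setlocale() does not raise an error on its own.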

In lines 12 and 13, $domain is set to the name of the catalog that stores the translations, that is, messages.

In line 14, bindtextdomain() sets the path for the domain; the first parameter is the catalog name without the .mo extension, and the second parameter is the path to the base directory that contains the language subdirectories, each with its own LC_MESSAGES subdirectory. It is very common to name this directory locale, but you can use i18n or any other name.
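When a translation does not show up, a wrong base path is a frequent cause. The sketch below builds the path where, in the simple case, the compiled catalog is expected to be, and logs a message if the file is missing. catalogPath() is a hypothetical helper, and the use of __DIR__ to make the base path absolute is an assumption; the original example passes the relative path "locale".

```php
<?php
// Hypothetical helper: the .mo file expected for a given locale and domain.
function catalogPath(string $baseDir, string $locale, string $domain): string
{
    return $baseDir . '/' . $locale . '/LC_MESSAGES/' . $domain . '.mo';
}

// An absolute path does not depend on the current working directory.
$baseDir = __DIR__ . '/locale';
$path = catalogPath($baseDir, 'es_ES', 'messages');

if (!is_file($path)) {
    error_log("Catalog not found: $path");
}
```

The same $baseDir value can then be passed as the second argument of bindtextdomain().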

In line 15, bind_textdomain_codeset() specifies the character encoding in which the messages from the domain's catalog will be returned. Every domain used in the code must be bound beforehand.

In line 17, textdomain() sets the default domain that is searched when gettext() or _() is called.

In line 19, we use echo to show a translated string but, of course, we can use other functions such as sprintf() if we want to use placeholders.
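As a sketch of the placeholder approach, the translated string acts as the format and sprintf() fills in the values. The message ids, the function names welcome() and unreadNotice(), and the sample values are illustrative assumptions. The guard at the top exists only so that the sketch also runs where the gettext extension is not loaded.

```php
<?php
// Fallback for environments without gettext: return the message id unchanged.
if (!function_exists('_')) {
    function _($message)
    {
        return $message;
    }
}

// The whole sentence is the message id, so translators see it in context.
function welcome(string $user): string
{
    return sprintf(_('Welcome, %s!'), $user);
}

function unreadNotice(int $count): string
{
    return sprintf(_('You have %d unread messages.'), $count);
}

echo welcome('Ana'), "\n";
echo unreadNotice(3), "\n";
```

Keeping the placeholder inside the message id, instead of concatenating translated fragments, lets each translation place the value wherever its grammar requires.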

Keep in mind that if a message id is not found in the catalog, gettext returns the message id itself as the translated string. For that reason, if the base language of the message ids is, for example, en_US, we do not need an en_US subdirectory.

In a WAMP distribution, the steps explained above may not work. In that case, follow this link: http://www.extradrm.com/blog/?p=1035