March 14th, 2017

Lately, I noticed that DefaultTranslator cannot be used in console applications. The SetDefaultLang function found in LCLTranslator, which is used to switch between translation files, pulls in the entire LCL package. It turns out that SetDefaultLang wants to translate all of the forms in the project. Of course there are no forms in a console application, but the function does not know that.

So I decided to rip out the form updating code in LCLTranslator and to save the modified version in a unit called UnitTranslator. This new unit requires the LCLBase package instead of the full LCL package.

Anomalous Behaviour of SetDefaultLang in LCLTranslator

If one wanted to manually select a French translation adapted to Belgium of an application (called app) then it should be sufficient to call SetDefaultLang('fr_BE', ''). The function will search for a corresponding .po or .mo file in all the "usual places". This means looking for the file

If such a file is not found then it will look for a compiled .mo file in the same places. Assuming nothing has been found, then the locale will be reduced to the language ('fr' in this case) and the search will be repeated:

This behaviour makes sense: if a translation into the territorial variation of a language is not available, one might as well use a generic translation in that language. So far so good. But things can go awry.
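The reduction from a territorial locale to the bare language can be sketched as follows. This is only an illustration of the idea, not the code found in LCLTranslator; LanguageOnly is a hypothetical helper name.

```pascal
program FallbackDemo;
{$mode objfpc}{$H+}

// Hypothetical helper, not part of LCLTranslator: returns the language
// part of a locale such as 'fr_BE', or the input unchanged if it
// contains no '_'.
function LanguageOnly(const Locale: string): string;
var
  p: Integer;
begin
  p := Pos('_', Locale);
  if p > 0 then
    Result := Copy(Locale, 1, p - 1)
  else
    Result := Locale;
end;

begin
  // The first pass uses the full locale, the second pass the bare language.
  WriteLn('first search:  ', 'fr_BE');
  WriteLn('second search: ', LanguageOnly('fr_BE'));
end.
```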

Usually, the unit DefaultTranslator is added to applications that are supplied with translation files. It consists of the call SetDefaultLang('', '') in its initialization. When no language is specified, SetDefaultLang tries to find the system language and then performs the search described above. It looks to the environment variables 'LANG', 'LC_ALL' and 'LC_MESSAGES' in that order. On my system, the locale is 'fr_CA.UTF-8' which is a language (fr for French), a region (CA for Canada) and a Unicode encoding (UTF-8).

So what happens is that a search for the following files is conducted:

which yields nothing, and then the locale is reduced to 'fr' and the search is repeated. The function never searches for 'fr_CA', which I think is not the intended behaviour.

In UnitTranslator, which is my version of the code, I decided to create a function GetLang(const Lang: string): string, which queries the environment variables as in the original code but adds a test at the end to handle locales such as the one found on my system.

if (length(Result) > 5) and (Result[3] = '_') then setlength(Result, 5);

The reason for testing for the '_' is that on my Ubuntu system, the /usr/share/locale directory contains locales such as 'be@latin', which should not be amputated to 5 characters.
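To see what that test does, here is a small sketch that wraps the same line in a function and runs it on three inputs. The name TrimLocale is an assumption made for this illustration and does not come from UnitTranslator.

```pascal
program TrimLocaleDemo;
{$mode objfpc}{$H+}

// Hypothetical wrapper around the test shown above; the name TrimLocale
// is an assumption and does not come from UnitTranslator.
function TrimLocale(const Locale: string): string;
begin
  Result := Locale;
  if (Length(Result) > 5) and (Result[3] = '_') then
    SetLength(Result, 5);
end;

begin
  WriteLn(TrimLocale('fr_CA.UTF-8')); // encoding suffix dropped: fr_CA
  WriteLn(TrimLocale('be@latin'));    // third character is not '_': unchanged
  WriteLn(TrimLocale('fr'));          // not longer than 5 characters: unchanged
end.
```

Note that the test only recognises two-letter language codes. A locale with a three-letter language code has its '_' in the fourth position and would be left untouched.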

Since I was rewriting the code for the FindLocaleFileName function, I also decided to test for the existence of the 'locale' and 'languages' subdirectories before running a search for .po and .mo files. There is no point in looking for files in non-existent directories.
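A minimal sketch of such a check, using DirectoryExists from SysUtils. The helper name HasTranslationDir is hypothetical, and using the directory of the executable as the base directory is an assumption made for the example only.

```pascal
program DirCheckDemo;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: reports whether a translation subdirectory exists
// under BaseDir, so that a file search can be skipped when it does not.
// BaseDir is expected to end with a path delimiter.
function HasTranslationDir(const BaseDir, SubDir: string): Boolean;
begin
  Result := DirectoryExists(BaseDir + SubDir);
end;

var
  AppDir: string;
begin
  // Assumption: the subdirectories sit next to the executable.
  AppDir := ExtractFilePath(ParamStr(0));
  WriteLn('locale:    ', HasTranslationDir(AppDir, 'locale'));
  WriteLn('languages: ', HasTranslationDir(AppDir, 'languages'));
end.
```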

I also decided to combine, as it were, DefaultTranslator with UnitTranslator by including a call to SetDefaultLang('', '') in the latter unit's initialization code.

The unit can be found here. I am not too sure about the niceties of the copyright involved here. If I had written this unit without looking at the code from V.I. Volchenko et al, I would have used a BSD two or three clause type of licence. As far as I am concerned, anyone can use the code in any ethical way.

The Correct Environment Variable

Looking at my environment and locale variables, I see that 'LANGUAGE' might have been a better choice:

michel@hp:~$ printenv
...
LANG=fr_CA.UTF-8
GDM_LANG=fr_CA
...
LANGUAGE=fr_CA:fr
...
michel@hp:~$ locale
LANG=fr_CA.UTF-8
LANGUAGE=fr_CA:fr
LC_CTYPE="fr_CA.UTF-8"
LC_NUMERIC="fr_CA.UTF-8"
LC_TIME="fr_CA.UTF-8"
LC_COLLATE="fr_CA.UTF-8"
LC_MONETARY="fr_CA.UTF-8"
LC_MESSAGES="fr_CA.UTF-8"
LC_PAPER="fr_CA.UTF-8"
LC_NAME="fr_CA.UTF-8"
LC_ADDRESS="fr_CA.UTF-8"
LC_TELEPHONE="fr_CA.UTF-8"
LC_MEASUREMENT="fr_CA.UTF-8"
LC_IDENTIFICATION="fr_CA.UTF-8"
LC_ALL=

This is more or less confirmed by Rémi's answer (edited by Édouard Lopez) to a question on SuperUser. It looks like the user could have something like 'LANGUAGE=fr:de:en' which would mean that messages should be set to "French message where they exists, if not it will use German messages, and will fallback to English one if there isn't German nor French messages."

Trouble is, I do not know how standard this is. The question was about Debian systems. What about other Linux distributions? The locale(7) man page does not mention 'LANGUAGE'. The gettext(3) man page says that it is a 'GNU extension':

If the LANGUAGE environment variable is set to a nonempty value, and the locale is not the "C" locale, the value of LANGUAGE is assumed to contain a colon separated list of locale names. The functions will attempt to look up a translation of msgid in each of the locales in turn. This is a GNU extension.

Perhaps the correct approach is to assume that there should be a list of locales to search. The LANGUAGE variable should be the first looked at in establishing that list. If it is not present, then the other environment variables should be looked at.
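A rough sketch of that approach, under a few assumptions: BuildLocaleList is a hypothetical name, the values are passed in as parameters instead of being read from the environment so that the example is repeatable, and the "C" locale exception mentioned in the man page is ignored. The sample values are those shown for my system above.

```pascal
program LocaleListDemo;
{$mode objfpc}{$H+}

uses
  Classes;

// Hypothetical helper: fills List with the locales to search, in order.
// LanguageVar stands for the value of LANGUAGE (colon separated list);
// Fallback stands for whatever the other environment variables yield and
// is used only when LanguageVar provides nothing.
procedure BuildLocaleList(const LanguageVar, Fallback: string; List: TStrings);
var
  Rest, Item: string;
  p: Integer;
begin
  List.Clear;
  Rest := LanguageVar;
  while Rest <> '' do
  begin
    p := Pos(':', Rest);
    if p = 0 then
    begin
      Item := Rest;
      Rest := '';
    end
    else
    begin
      Item := Copy(Rest, 1, p - 1);
      Delete(Rest, 1, p);
    end;
    if Item <> '' then
      List.Add(Item);   // empty entries such as in 'fr::en' are skipped
  end;
  if (List.Count = 0) and (Fallback <> '') then
    List.Add(Fallback);
end;

var
  Locales: TStringList;
  i: Integer;
begin
  Locales := TStringList.Create;
  try
    // In real code the two values would come from the environment.
    BuildLocaleList('fr_CA:fr', 'fr_CA.UTF-8', Locales);
    for i := 0 to Locales.Count - 1 do
      WriteLn(Locales[i]);
  finally
    Locales.Free;
  end;
end.
```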

If only things were that simple. The LC_ALL variable, if defined, is a global override. If I understand all this information correctly, the following values

LANG="en_GB"
LC_MESSAGES="fr_CA.UTF-8"
LC_MEASUREMENT=
LC_ALL=
mean that 'en_GB' is to be used for 'LC_MEASUREMENT', while 'fr_CA.UTF-8' is to be used for messages. However, if 'LC_ALL' had been
LC_ALL="de"
then 'de' should be used for both 'LC_MEASUREMENT' and 'LC_MESSAGES'. But does this override apply to 'LANGUAGE'?
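Leaving the 'LANGUAGE' question aside, the precedence described above (first 'LC_ALL', then the variable for the category, then 'LANG') can be sketched as follows. EffectiveLocale is a hypothetical helper written for this illustration; it takes the values as parameters and does not read the environment.

```pascal
program CategoryDemo;
{$mode objfpc}{$H+}

// Hypothetical helper: picks the effective locale for one category from
// the values of LC_ALL, the category's own variable and LANG, in that
// order of precedence. An empty string stands for an unset or empty
// variable.
function EffectiveLocale(const LcAll, LcCategory, Lang: string): string;
begin
  if LcAll <> '' then
    Result := LcAll
  else if LcCategory <> '' then
    Result := LcCategory
  else
    Result := Lang;
end;

begin
  // Values from the example above, with LC_ALL empty.
  WriteLn('LC_MEASUREMENT: ', EffectiveLocale('', '', 'en_GB'));
  WriteLn('LC_MESSAGES:    ', EffectiveLocale('', 'fr_CA.UTF-8', 'en_GB'));
  // Same values with LC_ALL="de".
  WriteLn('LC_MEASUREMENT: ', EffectiveLocale('de', '', 'en_GB'));
  WriteLn('LC_MESSAGES:    ', EffectiveLocale('de', 'fr_CA.UTF-8', 'en_GB'));
end.
```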

And while on the subject of the 'LC_ALL' override, I noticed that it is the first environment variable looked at in the GetLanguageIDs procedure of the gettext unit; then 'LC_MESSAGES' is considered, and finally 'LANG'. All this is correct as far as I know. But in LCLTranslator the environment variable 'LANG' is checked before calling LazGetLanguageIDs, which in turn calls GetLanguageIDs. That would mean that the 'LC_ALL' override would be ignored! There is a comment in the code saying that this has something to do with Windows.

My head hurts.