Lately, I noticed that DefaultTranslator cannot be used in console
applications. SetDefaultLang, found in LCLTranslator and used to switch
between translation files, pulls in the entire LCL package. It turns out
that SetDefaultLang wants to translate all of the forms in the project.
Of course there are no forms in a console application, but the function
does not know that.
So I decided to rip out the form updating code in LCLTranslator and to
save that modified version in a unit called UnitTranslator. This new unit
requires the LCLBase package instead of the full LCL package.
Anomalous Behaviour of SetDefaultLang in LCLTranslator
If one wanted to manually select a French translation adapted to Belgium
of an application (called app), then it should be sufficient to call
SetDefaultLang('fr_BE', ''). The function will search for a corresponding
.po or .mo file in all the "usual places". This means looking for the
following files:
- {appdir}/languages/fr_BE/app.po
- {appdir}/locale/fr_BE/app.po
- {appdir}/locale/fr_BE/LC_MESSAGES/app.po
- /usr/share/locale/fr_BE/LC_MESSAGES/app.po on Unix systems
- {appdir}/app.po
- {appdir}/languages/app.fr_BE.po
- {appdir}/locale/app.fr_BE.po
- {appdir}/languages/fr/app.po
- ... [snip]
- {appdir}/locale/app.fr.po
This behaviour makes sense: if a translation into the territorial variant of a language is not available, the function might as well use a generic translation in that language. So far so good. But things can go awry.
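The fallback from a territorial variant to the generic language can be sketched as follows. This is a minimal illustration, not the LCLTranslator code: the names GenericLang, TryLang and FindTranslation are hypothetical, only two of the file name patterns listed above are tried, and only .po files are considered.

```pascal
program FallbackSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: returns the generic language part of a locale ID,
  e.g. 'fr_BE' -> 'fr'. An ID without '_' is returned unchanged. }
function GenericLang(const LangID: string): string;
var
  P: Integer;
begin
  P := Pos('_', LangID);
  if P > 1 then
    Result := Copy(LangID, 1, P - 1)
  else
    Result := LangID;
end;

{ Hypothetical helper: tries two of the listed patterns for one language
  ID and returns the first file that exists, or '' if neither does.
  AppDir is expected to end with a path delimiter. }
function TryLang(const AppDir, AppName, LangID: string): string;
begin
  Result := AppDir + 'languages' + PathDelim + LangID + PathDelim
            + AppName + '.po';
  if FileExists(Result) then
    Exit;
  Result := AppDir + 'locale' + PathDelim + AppName + '.' + LangID + '.po';
  if FileExists(Result) then
    Exit;
  Result := '';
end;

{ Tries the full ID first, then the generic language if it differs. }
function FindTranslation(const AppDir, AppName, LangID: string): string;
begin
  Result := TryLang(AppDir, AppName, LangID);
  if (Result = '') and (GenericLang(LangID) <> LangID) then
    Result := TryLang(AppDir, AppName, GenericLang(LangID));
end;

begin
  WriteLn(GenericLang('fr_BE'));  { prints fr }
  WriteLn('Found: "',
    FindTranslation(ExtractFilePath(ParamStr(0)), 'app', 'fr_BE'), '"');
end.
```

With an ID such as 'fr_BE', the sketch only falls back to 'fr' after both patterns have failed for the full ID, which matches the order of the list above.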
Usually, the unit DefaultTranslator is added to applications that are
supplied with translation files. It consists of the call
SetDefaultLang('', '') in its initialization. When no language is
specified, SetDefaultLang tries to find the system language and then
performs the search described above. It looks at the environment
variables 'LANG', 'LC_ALL' and 'LC_MESSAGES' in that order. On my system,
the locale is 'fr_CA.UTF-8', which is made up of a language (fr for
French), a region (CA for Canada) and a Unicode encoding (UTF-8).
So what happens is that a search for the following files is conducted:
- {appdir}/languages/fr_CA.UTF-8/app.?o
- ... [snip]
- {appdir}/locale/app.fr_CA.UTF-8.?o
- {appdir}/languages/fr/app.?o
- ... [snip]
- {appdir}/locale/app.fr.?o
In UnitTranslator, which is my version of the code, I decided to create a
function GetLang(const Lang: string): string which queries environment
variables as in the original code, but adds a test at the end to handle
locales such as the one found on my system by trimming 'fr_CA.UTF-8' to
'fr_CA'.
if (length(Result) > 5) and (Result[3] = '_') then setlength(Result, 5);
The reason for testing for the '_' is that on my Ubuntu system, the /usr/share/locale directory contains locales such as 'be@latin', which should not be truncated to 5 characters.
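To see the effect of that test in isolation, here is a small stand-alone sketch. The function name TrimLocale is hypothetical; in the unit itself the test sits at the end of GetLang.

```pascal
program TrimLocaleSketch;
{$mode objfpc}{$H+}

{ Hypothetical stand-alone version of the test: cuts a locale of the
  form ll_CC.encoding down to ll_CC and leaves everything else alone. }
function TrimLocale(const Locale: string): string;
begin
  Result := Locale;
  if (Length(Result) > 5) and (Result[3] = '_') then
    SetLength(Result, 5);
end;

begin
  WriteLn(TrimLocale('fr_CA.UTF-8'));  { fr_CA }
  WriteLn(TrimLocale('be@latin'));     { be@latin: third character is '@' }
  WriteLn(TrimLocale('fr'));           { fr: too short to trim }
end.
```

Note that the test assumes a two-letter language code. A locale with a three-letter code, such as 'ast_ES.UTF-8', has no '_' in third position and would pass through untrimmed.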
Since I was rewriting the code for the FindLocaleFileName function, I
also decided to test for the existence of the 'locale' and 'languages'
subdirectories before running a search for .po and .mo files.
There is no point in looking for files in non-existent directories.
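The directory test could look something like the sketch below. It is an illustration only: ExistingSubDir is a hypothetical name, and the actual unit may organise the check differently. DirectoryExists and IncludeTrailingPathDelimiter come from SysUtils.

```pascal
program DirCheckSketch;
{$mode objfpc}{$H+}

uses
  SysUtils;

{ Hypothetical helper: returns the full path of AppDir/SubDir if that
  directory exists, or '' so that the caller can skip the file search. }
function ExistingSubDir(const AppDir, SubDir: string): string;
begin
  Result := IncludeTrailingPathDelimiter(AppDir) + SubDir;
  if not DirectoryExists(Result) then
    Result := '';
end;

var
  AppDir, Dir: string;
begin
  AppDir := ExtractFilePath(ParamStr(0));

  Dir := ExistingSubDir(AppDir, 'languages');
  if Dir <> '' then
    WriteLn('would search ', Dir);

  Dir := ExistingSubDir(AppDir, 'locale');
  if Dir <> '' then
    WriteLn('would search ', Dir);
end.
```

One DirectoryExists call per subdirectory replaces a whole series of FileExists calls that are bound to fail when the directory is missing.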
I also decided to combine, as it were, DefaultTranslator with
UnitTranslator by including a call to SetDefaultLang('', '') in the
latter unit's initialization code.
The unit can be found here. I am not too sure about the niceties of the copyright involved here. If I had written this unit without looking at the code from V.I. Volchenko et al., I would have used a two- or three-clause BSD type of licence. As far as I am concerned, anyone can use the code in any ethical way.
The Correct Environment Variable
Looking at my environment and locale variables, I see that 'LANGUAGE' might have been a better choice.
This is more or less confirmed by Rémi's answer (edited by Édouard Lopez) to a question on Super User. It looks like the user could have something like 'LANGUAGE=fr:de:en', which would mean that messages should be set to "French message where they exists, if not it will use German messages, and will fallback to English one if there isn't German nor French messages."
Trouble is, I do not know how standard this is. The question was about
Debian systems. What about other Linux distributions? The man page for
locale(7) does not mention 'LANGUAGE'. The man page for gettext(3) says
that it is a 'GNU extension':
If the LANGUAGE environment variable is set to a nonempty value, and the locale is not the "C" locale, the value of LANGUAGE is assumed to contain a colon separated list of locale names. The functions will attempt to look up a translation of msgid in each of the locales in turn. This is a GNU extension.
Perhaps the correct approach is to assume that there should be a list of locales to search. The LANGUAGE variable should be the first looked at in establishing that list. If it is not present, then the other environment variables should be looked at.
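A minimal sketch of that approach, under stated assumptions: LocaleList is a hypothetical function, the fallback locale is assumed to have been resolved from the other environment variables beforehand, and the "C" locale exception mentioned in the man page is not handled.

```pascal
program LocaleListSketch;
{$mode objfpc}{$H+}

uses
  SysUtils, Classes;

{ Hypothetical helper: returns the locales to search, in order.
  Language is the raw value of LANGUAGE (a colon separated list, may be
  empty); Fallback is the locale to use when LANGUAGE yields nothing.
  The caller owns, and must free, the returned list. }
function LocaleList(const Language, Fallback: string): TStringList;
var
  Parts: TStringList;
  i: Integer;
begin
  Result := TStringList.Create;
  Parts := TStringList.Create;
  try
    Parts.StrictDelimiter := True;  { split on ':' only, not on spaces }
    Parts.Delimiter := ':';
    Parts.DelimitedText := Language;
    for i := 0 to Parts.Count - 1 do
      if Trim(Parts[i]) <> '' then
        Result.Add(Trim(Parts[i]));
  finally
    Parts.Free;
  end;
  if (Result.Count = 0) and (Fallback <> '') then
    Result.Add(Fallback);
end;

var
  List: TStringList;
begin
  { Literal values keep the output predictable; a real program would
    pass GetEnvironmentVariable('LANGUAGE') as the first argument. }
  List := LocaleList('fr:de:en', 'fr_CA.UTF-8');
  try
    WriteLn(List.CommaText);  { fr,de,en }
  finally
    List.Free;
  end;
end.
```

Each entry of the resulting list would then go through the file search in turn, stopping at the first translation found.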
If only things were that simple. The LC_ALL variable, if defined, is a global override. If I understand all this information correctly, the following values
LANG="en_GB"
LC_MESSAGES="fr_CA.UTF-8"
LC_MEASUREMENT=
LC_ALL=
mean that 'en_GB' is to be used for LC_MEASUREMENT, while 'fr_CA.UTF-8'
is to be used for messages. However, if 'LC_ALL' had been
LC_ALL="de"
then 'de' should be used for both 'LC_MEASUREMENT' and 'LC_MESSAGES'. But
does this override apply to 'LANGUAGE'?
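The precedence just described, LC_ALL over the category variable over LANG, can be written down as a small sketch. CategoryLocale is a hypothetical name, and the values are passed in as parameters rather than read from the environment so that the two cases above can be replayed.

```pascal
program LcAllSketch;
{$mode objfpc}{$H+}

{ Hypothetical helper: returns the locale in effect for one category.
  A non-empty LC_ALL wins; otherwise the category's own variable is
  used; LANG is the last resort. }
function CategoryLocale(const LcAll, LcCategory, Lang: string): string;
begin
  if LcAll <> '' then
    Result := LcAll
  else if LcCategory <> '' then
    Result := LcCategory
  else
    Result := Lang;
end;

begin
  { LC_ALL empty: each category is resolved on its own. }
  WriteLn(CategoryLocale('', '', 'en_GB'));              { en_GB }
  WriteLn(CategoryLocale('', 'fr_CA.UTF-8', 'en_GB'));   { fr_CA.UTF-8 }
  { LC_ALL="de" overrides both categories. }
  WriteLn(CategoryLocale('de', '', 'en_GB'));            { de }
  WriteLn(CategoryLocale('de', 'fr_CA.UTF-8', 'en_GB')); { de }
end.
```

The sketch deliberately leaves 'LANGUAGE' out. For what it is worth, the GNU gettext manual says that gettext gives 'LANGUAGE' preference over 'LC_ALL' and 'LANG' for message handling, which would suggest that the override does not apply to it.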
And while on the subject of the 'LC_ALL' override, I noticed that it is
the first environment variable looked at in the GetLanguageIDs routine of
the gettext unit; then 'LC_MESSAGES' is considered and finally 'LANG'.
All this is correct as far as I know. But in LCLTranslator the
environment variable 'LANG' is checked before calling LazGetLanguageIDs,
which in turn calls GetLanguageIDs.
That would mean that the 'LC_ALL' override would be ignored! There is a comment
in the code that this has something to do with Windows.
My head hurts.