Frequently Asked Questions
for GNU gettext

Questions

General

Problems building GNU gettext

Problems integrating GNU gettext

GNU gettext on Windows

Other

Answers

General

Where is the mailing list?

Three mailing lists are available:
The bug-gnu-gettext list is archived as part of the bug-gnu-utils archives. bug-gnu-gettext cannot be subscribed on its own; to receive its contents by mail, subscribe to bug-gnu-utils.

Where is the newest gettext source?

The newest gettext release is available on ftp.gnu.org and its mirrors, in http://ftp.gnu.org/gnu/gettext/.

Prereleases are announced on the autotools-announce mailing list. Note that prereleases are meant for testing and not meant for use in production environments. Please don't use the “gettextize” program of a prerelease on projects which you share with other programmers via CVS.

If you want to live on the bleeding edge, you can also use the development sources. Instructions for retrieving the gettext CVS are found here. Note that building from CVS requires special tools (autoconf, automake, m4, groff, bison, etc.) and requires that you pay attention to the README-alpha and autogen.sh files in the CVS.

I want to be notified of new gettext releases.

If you are interested in stable gettext releases, you can follow the info-gnu mailing list. It is also available as a newsgroup gmane.org.fsf.announce through gmane.org.

You can also periodically check the download location.

If you are interested in testing prereleases as well, you can subscribe to the autotools-announce mailing list.

Problems building GNU gettext

On Solaris, I get a build error “text relocations remain” in the libasprintf subdirectory

libtool (or more precisely, the version of libtool that was available at the time the gettext release was made) doesn't support linking C++ libraries with some versions of GCC. As a workaround, you can configure gettext with the option --disable-libasprintf.

“make install” fails

“make install DESTDIR=/some/tempdir” can fail with an error message relating to libgettextlib or libgettextsrc, or can silently fail to install libgettextsrc. On some platforms, this is due to limitations of libtool regarding DESTDIR. On other platforms, it is due to the way the system handles shared libraries, and libtool cannot work around it. Fortunately, on Linux and other glibc based systems, DESTDIR is supported if no different version of gettext is already installed (i.e. it works if you uninstall the older gettext before building and installing the newer one, or if you do a plain “make install” before “make install DESTDIR=/some/tempdir”). On other systems, when DESTDIR does not work, you can still do “make install” and copy the installed files to /some/tempdir afterwards.

If “make install” without DESTDIR fails, it's a bug which you are welcome to report to the usual bug report address.

Problems integrating GNU gettext

How do I make use of gettext() in my package?

It's not as difficult as it sounds. Here's the recipe for C or C++ based packages.
You can find detailed descriptions of how this all works in the GNU gettext manual, chapters “The Maintainer's View” and “Preparing Program Sources”.

I get a linker error “undefined reference to libintl_gettext”

This error means that the program uses the gettext() function after having included the <libintl.h> file from GNU gettext (which remaps it to libintl_gettext()), but at link time a function of this name could not be linked in. (It is expected to come from the libintl library, installed by GNU gettext.)

There are many possible reasons for this error, but in any case you should consider the -I, -L and -l options passed to the compiler. In packages using autoconf generated configure scripts, -I options come from the CFLAGS and CPPFLAGS variables (in Makefiles also DEFS and INCLUDES), -L options come from the LDFLAGS variable, and -l options come from the LIBS variable. The first things you should check are the values of these variables in your environment and in the package's config.status autoconfiguration result.

To find the cause of the error, a little analysis is needed. Does the program's final link command contain the option “-lintl”?

gettextize adds multiple references to the same directories/files to Makefile.am and configure.ac

If gettextize is used on a package, then the po/, intl/, m4/ directories of the package are removed, and then gettextize is invoked on the package again, it will re-add the po/, intl/, m4/ directories and change Makefile.am, configure.ac and ChangeLog accordingly. This is normal. The second use of gettextize here is an abuse of the program. gettextize is a wizard intended to transform a working source package into a working source package that uses the newest version of gettext. If you start out from a nonfunctional source package (it is nonfunctional since you have omitted some directories), you cannot expect that gettextize corrects it.

Often this question arises in packages that use CVS. See the section “CVS Issues / Integrating with CVS” of the GNU gettext documentation. This section mentions a program autopoint which is designed to reconstruct those files and directories created by gettextize that can be omitted from a CVS repository.

My program compiles and links fine, but doesn't output translated strings.

There are several possible reasons. Here is a checklist that allows you to determine the cause.
  1. Check that the environment variables LC_ALL, LC_MESSAGES, LC_CTYPE, LANG, LANGUAGE together specify a valid locale and language.
    To check this, run the commands
    $ gettext --version
    $ gettext --help
    You should see at least some output in your desired language. If not, either
  2. Check that your program contains a setlocale call.
    To check this, run your program under ltrace. For example,
    $ ltrace ./myprog
    ...
    setlocale(6, "")                  = "de_DE.UTF-8"
    If you have no ltrace, you can also do this check by running your program under the debugger. For example,
    $ gdb ./myprog
    (gdb) break main
    (gdb) run
    Breakpoint 1, main ()
    (gdb) break setlocale
    (gdb) continue
    Breakpoint 2, setlocale ()
    ;; OK, the breakpoint has been hit, setlocale() is being called.
    Either way, check that the return value of setlocale() is non-NULL. A NULL return value indicates a failure. 
  3. Check that your program contains a textdomain call, a bindtextdomain call referring to the same message domain, and then really calls the gettext, dgettext or dcgettext function.
    To check this, run the program under ltrace. For example,
    $ ltrace ./myprog
    ...
    textdomain("hello-c")                             = "hello-c"
    bindtextdomain("hello-c", "/opt/share"...) = "/opt/share"...
    dcgettext(0, 0x08048691, 5, 0x0804a200, 0x08048689) = 0x4001721f
    If you have no ltrace, you can also do this check by running your program under the debugger. For example,
    $ gdb ./myprog
    (gdb) break main
    (gdb) run
    Breakpoint 1, main ()
    (gdb) break textdomain
    (gdb) break bindtextdomain
    (gdb) break gettext
    (gdb) break dgettext
    (gdb) break dcgettext
    (gdb) continue
    Breakpoint 2, textdomain ()
    (gdb) continue
    Breakpoint 3, bindtextdomain ()
    (gdb) continue
    Breakpoint 6, dcgettext ()
    Note that here dcgettext() is called instead of the gettext() function mentioned in the source code; this is due to an optimization in <libintl.h>.
    When using libintl on a non-glibc system, you have to add a prefix “libintl_” to all the function names mentioned here, because that's what the functions are really named, under the hood.
    If gettext/dgettext/dcgettext is not called at all, the possible cause might be that some autoconf or Makefile macrology has turned off internationalization entirely (like the --disable-nls configuration option usually does).
  4. Check that the .mo file that contains the translation is really there where the program expects it.
    To check this, run the program under strace and look at the open() calls. For example,
    $ strace ./myprog 2>&1 | grep '^open('
    open("/etc/ld.so.preload", O_RDONLY)    = -1 ENOENT (No such file or directory)
    open("/etc/ld.so.cache", O_RDONLY)      = 5
    open("/lib/libc.so.6", O_RDONLY)        = 5
    open("/usr/lib/locale/locale-archive", O_RDONLY|O_LARGEFILE) = 5
    open("/usr/share/locale/locale.alias", O_RDONLY) = 5
    open("/opt/share/locale/de/LC_MESSAGES/hello-c.mo", O_RDONLY) = 5
    ...
    A nonnegative open() return value means that the file has been found.
    If you have no strace, you can also guess the .mo file's location: it is
    localedir/lang/LC_MESSAGES/domain.mo
    where domain is the argument passed to textdomain(), localedir is the second argument passed to bindtextdomain(), and lang is the language (LL) or language and territory (LL_CC), depending on the environment variables checked in step 1.
  5. Check that the .mo file contains a translation for the string that is being asked for.
    To do this, you need to convert the .mo file back to PO file format, through the command
    $ msgunfmt localedir/lang/LC_MESSAGES/domain.mo
    and look for an msgid that matches the given string.

GNU gettext on Windows

What does Woe32 mean?

“Woe32” denotes the Windows 32-bit operating systems for x86: Windows NT/2000/XP/Vista and Windows 95/98/ME. Microsoft uses the term “Win32” to denote these; this is a psychological trick in order to make everyone believe that these OSes are a “win” for the user. However, for most users and developers, they are a source of woes, which is why I call them “Woe32”.

How do I compile, link and run a program that uses the gettext() function?

When you use RedHat's cygwin environment, it's as on Unix:
When you use the Mingw environment (either from within cygwin, with CC="gcc -mno-cygwin", or from MSYS, with CC="gcc"), I don't know the details.

When you use the Microsoft Visual C/C++ (MSVC) compiler, you will likely use the precompiled Woe32 binaries. For running a program that uses gettext(), one needs the .bin.woe32.zip packages of gettext-runtime and libiconv. As a developer, you'll also need the xgettext and msgfmt programs that are contained in the .bin.woe32.zip package of gettext-tools. Then

Setting the LANG environment variable doesn't have any effect

If neither LC_ALL, LC_MESSAGES nor LANGUAGE is set, it's the LANG environment variable which determines the language into which gettext() translates the messages.

You can test your program by setting the LANG environment variable from outside the program. In a Windows command interpreter:
set LANG=de_DE
.\myprog.exe
Or in a Cygwin shell:
$ env LANG=de_DE ./myprog.exe

If this test fails, look at the question “My program compiles and links fine, but doesn't output translated strings.” above.

If this test succeeds, the problem is related to the way you set the environment variable. Here is a checklist:

Other

What does this mean: “'msgid' and 'msgstr' entries do not both end with '\n'”

It means that when the original string ends in a newline, your translation must also end in a newline. And if the original string does not end in a newline, then your translation should likewise not have a newline at the end.

German umlauts are displayed like “ge"andert” instead of “geändert”

This symptom occurs when the LC_CTYPE facet of the locale is not set; then gettext() doesn't know which character set to use, and converts all messages to ASCII, as far as possible.

If the program is doing

setlocale (LC_MESSAGES, "");

then change it to

setlocale (LC_CTYPE, "");
setlocale (LC_MESSAGES, "");

or do both of these in a single call:

setlocale (LC_ALL, "");

If the program is already doing

setlocale (LC_ALL, "");

then the symptom can still occur if the user has not set LANG, but instead has set LC_MESSAGES to a valid locale and has set LC_CTYPE to nothing or an invalid locale. The fix for the user is then to set LANG instead of LC_MESSAGES.

The LANGUAGE environment variable is ignored after I set LANG=en

This is because “en” is a language name, but not a valid locale name. The ABOUT-NLS file says:
In the LANGUAGE environment variable, but not in the LANG environment variable, LL_CC combinations can be abbreviated as LL to denote the language's main dialect.
Why is LANG=en not allowed? Because LANG is a setting for the entire locale, including monetary information, and this depends on the country: en_GB, en_AU, en_ZA all have different currencies.

I use accented characters in my source code. How do I tell the C/C++ compiler in which encoding it is (like xgettext's --from-code option)?

Short answer: If you want your program to be useful to other people, then don't use accented characters (or other non-ASCII characters) in string literals in the source code. Instead, use only ASCII for string literals, and use gettext() to retrieve their display-ready form.

Long explanation:
The reason is that the ISO C standard specifies that the character set at compilation time can be different from the character set at execution time.
The character encoding at compilation time is the one which determines how the source files are interpreted and also how string literals are stored in the compiled code. This character encoding is generally unspecified; for recent versions of GCC, it depends on the LC_CTYPE locale in effect during the compilation process.
The character encoding at execution time is the one which determines how standard functions like isprint(), wcwidth() etc. work and how strings written to standard output should be encoded. This character encoding is specified by POSIX to depend on the LC_CTYPE locale in effect when the program is executed; see also the description in the ABOUT-NLS file.
Strings in the compiled code are not magically converted between the time the program is compiled and the time it is run.

Therefore what could you do to get accented characters to work?

Can you ensure that the execution character set is the same as the compilation character set? Even if your program is to be used only in a single country, this is not realistically possible. For example, in Germany there are currently three character encodings in use: UTF-8, ISO-8859-15 and ISO-8859-1. Therefore you would have to explicitly convert the accented strings from the compilation character set to the execution character set at runtime, for example through iconv().

Can you ensure that the compilation character set is the one in which your source files are stored? This is not realistically possible either: For compilers other than GCC, there is no way to specify the compilation character set. So let's assume for a moment that everyone uses GCC; then you will specify the LC_CTYPE or LC_ALL environment variable in the Makefile. But for this you have to assume that everyone has a locale in a given encoding. Be it UTF-8 or ISO-8859-1 - this is not realistic. People often have no locale installed besides the one they use.

Use of wide strings L"..." doesn't help solve the problem, because on systems like FreeBSD or Solaris, the way wide string literals are stored in compiled code depends on the compilation character set, just as it does for narrow strings "...". Moreover, wide strings have problems of their own.

Use of ISO C 99 Unicode escapes "\uxxxx" doesn't help either because these characters are converted to the compilation character set at compile time; so again, since you can't guarantee that the compilation character set is not ASCII, you're risking compilation errors just as if the real character had been used in the source instead of the Unicode escape.

So, in summary, there is no way to make accented characters in string literals work in C/C++.

You might then wonder what xgettext's --from-code option is good for. The answer is
  1. For the comments in C/C++ source code. The compiler ignores them.
  2. For other programming languages like Java, for which the compiler converts all string literals to UTF-8.


GNU gettext FAQ
Bruno Haible <bruno@clisp.org>

Last modified: 24 February 2004