Info Node: (gettext.info)Normalizing


Normalizing Strings in Entries
==============================

There are many different ways of encoding a particular string into a
PO file entry, because there are so many different ways to split and
quote multi-line strings, and even to represent special characters by
backslashed escape sequences.  Some features of PO mode rely on the
ability of PO mode to scan an already existing PO file for a
particular string encoded into the `msgid' field of some entry.  Even
if PO mode has internally all the built-in machinery for implementing
this recognition easily, doing it fast is technically difficult.  To
facilitate a solution to this efficiency problem, we decided on a
canonical representation for strings.

A conventional representation of strings in a PO file is currently
under discussion, and PO mode experiments with a canonical
representation.  Having both `xgettext' and PO mode converge towards a
uniform way of representing equivalent strings would be useful, as the
internal normalization needed by PO mode could be automatically
satisfied when using `xgettext' from GNU `gettext'.  An explicit PO
mode normalization should then only be necessary for PO files imported
from elsewhere, or when the convention itself evolves.

So, to normalize at least those strings of a given PO file that need a
canonical representation, the following PO mode command is available:

`M-x po-normalize'
     Tidy the whole PO file by making entries more uniform.

The special command `M-x po-normalize', which has no associated keys,
revises all entries, ensuring that strings of both original and
translated entries use uniform internal quoting in the PO file.  It
also removes any crumb after the last entry.  This command may be
useful for PO files freshly imported from elsewhere, or if we ever
improve on the canonical quoting format we use.
This canonical format is not only meant for getting cleaner PO files,
but also for greatly speeding up `msgid' string lookup for some other
PO mode commands.

`M-x po-normalize' presently makes three passes over the entries.  The
first implements heuristics for converting PO files for GNU `gettext'
0.6 and earlier, in which `msgid' and `msgstr' fields used K&R style C
string syntax for multi-line strings.  These heuristics may fail for
comments not related to obsolete entries and ending with a backslash;
they also depend on subsequent passes for finalizing the proper
commenting of continued lines for obsolete entries.  This first pass
might disappear once all older PO files have been adjusted.  The
second and third passes normalize all `msgid' and `msgstr' strings
respectively.  They also clean out those trailing backslashes used by
XView's `msgfmt' for continued lines.

Having such an explicit normalizing command allows for importing PO
files from other sources, but also eases the evolution of the current
convention, an evolution driven mostly by aesthetic concerns, as of
now.  It is easy to make suggested adjustments at a later time, as the
normalizing command and, eventually, other GNU `gettext' tools should
greatly automate conformance.  A description of the canonical string
format is given below, for the particular benefit of those who do not
have Emacs handy and who would nevertheless want to handcraft their PO
files in nice ways.

Right now, in PO mode, strings are single line or multi-line.  A
string goes multi-line if and only if it has _embedded_ newlines, that
is, if it matches `[^\n]\n+[^\n]'.  So, we would have:

     msgstr "\n\nHello, world!\n\n\n"

but, replacing the space by a newline, this becomes:

     msgstr ""
     "\n"
     "\n"
     "Hello,\n"
     "world!\n"
     "\n"
     "\n"

We are deliberately using an exaggerated example here, to make the
point clearer.  Usually, multi-line strings do not look that bad.  It
is probable that we will implement the following suggestion.
We might lump together all initial newlines into the empty string, and
also all newlines introducing empty lines (that is, in a run of N > 1
newlines, the last N-1 newlines would go together on a separate
string), so that the previous example would appear as:

     msgstr "\n\n"
     "Hello,\n"
     "world!\n"
     "\n\n"

There are a few little points about string normalization that are not
yet decided; they will be documented in this manual once these
questions settle.
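
For readers without Emacs at hand, the layout rules described above
can be approximated by the following Python sketch.  It is not the PO
mode implementation: the function names (`quote', `format_entry',
`format_entry_lumped') are hypothetical, the escape table covers only
a minimal set of C escapes, and the lumped variant is only one
possible reading of the suggestion, which is itself not yet settled.

```python
import re

# Assumption: a minimal escape table; real PO files may need more escapes.
_ESCAPES = {"\\": "\\\\", '"': '\\"', "\n": "\\n", "\t": "\\t"}

# The multi-line test quoted in the text: an embedded run of newlines.
_EMBEDDED_NEWLINE = re.compile(r"[^\n]\n+[^\n]")


def quote(piece):
    """Return PIECE as a double-quoted PO string with escapes applied."""
    return '"' + "".join(_ESCAPES.get(c, c) for c in piece) + '"'


def _pieces(text):
    """Split TEXT so that each piece ends right after one newline."""
    return re.findall(r"[^\n]*\n|[^\n]+", text)


def format_entry(keyword, text):
    """Current layout: one quoted piece per newline when multi-line."""
    if not _EMBEDDED_NEWLINE.search(text):
        return f"{keyword} {quote(text)}"
    return "\n".join([f'{keyword} ""'] + [quote(p) for p in _pieces(text)])


def format_entry_lumped(keyword, text):
    """Suggested layout: runs of bare newlines share a single string."""
    if not _EMBEDDED_NEWLINE.search(text):
        return f"{keyword} {quote(text)}"
    merged = []
    for piece in _pieces(text):
        if piece == "\n" and merged and set(merged[-1]) == {"\n"}:
            merged[-1] += piece      # extend the current run of newlines
        else:
            merged.append(piece)
    # Initial newlines take the place of the empty string after the keyword.
    head = merged.pop(0) if merged[0].strip("\n") == "" else ""
    return "\n".join([f"{keyword} {quote(head)}"] + [quote(p) for p in merged])


if __name__ == "__main__":
    print(format_entry("msgstr", "\n\nHello,\nworld!\n\n\n"))
    print(format_entry_lumped("msgstr", "\n\nHello,\nworld!\n\n\n"))
```

Run as a script, this prints the two multi-line layouts of the example
string used above, first in the current form and then in the lumped
form.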
