Survey Tool Help Text

This is a help-text file for use with the survey tool. You can add a new row, where the Path is a regular expression for an XML path, and the Text to Insert is what you want to show up as help text, or modify existing text. The software that interprets this expects a particular format, so don't make arbitrary changes (see the end).

Path Text to Insert
//ldml/localeDisplayNames.*

Display Names

Languages, scripts (writing systems), territories (countries and regions), currencies, and time zones are represented in computers by internal codes, such as "fr" for the French language or "CA" for the country of Canada.

The ISO names and the "official" names are often not the best ones for CLDR. The goal is the most customary name used in your language, even if it is not the official name. For example, for the territory name in English you would use "Switzerland" instead of "Swiss Confederation", and use "United Kingdom" instead of "The United Kingdom of Great Britain and Northern Ireland". The best source for customary usage is to look at what common publications such as newspapers and magazines do. For example, to see how Congo is used in French, one might search http://www.google.com/search?q=Congo+site%3Alemonde.fr and other publications.

All names must be unique within a given category: thus one cannot use the same translated name for the following two codes; only one can be called "Congo":

Code Possible Pairs of Translations
CD Democratic Republic of the Congo or Congo - Kinshasa or Congo - formerly Zaire
CG Congo or Congo - Brazzaville or Congo

Avoid using commas and avoid inverting the name (eg "Congo, Democratic Republic of the"). The characters "(" and ")" are discouraged, since they will be confusing in combination with countries in locale names.
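
The rules above (unique names within a category, no commas, no parentheses) can be sketched as a simple check. This is only an illustration: check_names is a hypothetical helper, not part of the Survey Tool, and the messages it returns are made up.

```python
from collections import defaultdict

def check_names(names):
    """Return a list of problems found in a {code: translated name} mapping."""
    problems = []
    by_name = defaultdict(list)
    for code, name in names.items():
        by_name[name].append(code)
        if "," in name:
            problems.append(f"{code}: contains a comma")
        if "(" in name or ")" in name:
            problems.append(f"{code}: contains parentheses")
    # Any name shared by more than one code violates the uniqueness rule.
    for name, codes in by_name.items():
        if len(codes) > 1:
            problems.append(f"{', '.join(sorted(codes))}: share the name {name!r}")
    return problems

bad = check_names({"CD": "Congo", "CG": "Congo"})
good = check_names({"CD": "Congo - Kinshasa", "CG": "Congo - Brazzaville"})
```

Here bad reports the clash between CD and CG, while good is empty.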

//ldml/localeDisplayNames/(keys|types).*

Keys

The keys page lists the key names for translation. These are the keywords used to identify particular types of variants. The calendar types are typically used only with certain languages; however, they can be used with almost any language:

Locale Code Locale Name (English)
fr@calendar=buddhist French (Buddhist Calendar)
de@calendar=buddhist German (Buddhist Calendar)
... ...

The collation (sort order) types, on the other hand, are only used with certain locales (listed below):

Locale Code Locale Name (English)
de@collation=phonebook German (Phonebook Sort Order)
hi@collation=direct Hindi (Direct Sort Order)
zh@collation=pinyin Chinese (Pinyin Sort Order)
zh@collation=stroke Chinese (Stroke Sort Order)
zh@collation=gb2312han Chinese (Simplified Sort Order - GB2312)
zh@collation=big5han Chinese (Traditional Sort Order - Big5)
es@collation=traditional Spanish (Traditional Sort Order)

The last value (traditional) is the only one likely to be extended to other languages over time.
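
To show how a locale code with a key and type maps to a display name, here is a minimal sketch. The LANGUAGES and TYPES tables are hypothetical stand-ins for the translated data, and the ";" separator between multiple keywords is an assumption not stated in this help text.

```python
def parse_locale(code):
    """Split 'de@collation=phonebook' into ('de', {'collation': 'phonebook'})."""
    base, _, keywords = code.partition("@")
    pairs = {}
    for item in filter(None, keywords.split(";")):  # assumed keyword separator
        key, _, value = item.partition("=")
        pairs[key] = value
    return base, pairs

# Hypothetical English display data, just enough for the examples above.
LANGUAGES = {"de": "German", "fr": "French"}
TYPES = {
    ("calendar", "buddhist"): "Buddhist Calendar",
    ("collation", "phonebook"): "Phonebook Sort Order",
}

def display_name(code):
    base, pairs = parse_locale(code)
    name = LANGUAGES[base]
    variants = [TYPES[(key, value)] for key, value in pairs.items()]
    return f"{name} ({', '.join(variants)})" if variants else name
```

With this data, display_name("fr@calendar=buddhist") gives "French (Buddhist Calendar)".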

//ldml/localeDisplayNames/territories.*

Territories

Territories include both country names and regions: continents and subcontinents (defined by a UN standard). All of these must be unique: for example, you can't give the same name to South Africa (the country) and to Southern Africa (the southern region of the continent of Africa), even though there may be no distinction in your language between the terms for "South" and "Southern". Similarly, North America is the continent that extends down to Panama; Northern America is the region of the Americas north of Mexico.

The country name should be the most natural; you may have to adjust the name of the region. So you might say the equivalent of "South Region of Africa", or add clarifying language like "Amérique du Nord continentale" vs "Amérique du Nord". If you have any question as to the extent of any region, see Territory Containment.

  • A common question is whether to capitalize or not. With a new locale, use whatever is normal practice for what should occur in menus. For an existing locale, especially during the vetting period, follow what is used for the other items already translated.
    • If the capitalization convention as a whole for a language needs to be changed, that should be done before the data submission phase for the next release. Please file a bug to request that this be done.
//ldml/localeDisplayNames/languages.*

Languages

There are a lot of languages here (around 500), and you don't need to look at them all! Many are relatively obscure, and not worth translating in a first pass. Please also look at the following points.

  • Please use phrasing corresponding to the English "Baltic Language" for language collections. That is, use terms that would be appropriate to use for indicating that the target text is "a" Baltic Language, without terms that imply exclusion or multiplicity such as "other" (autre), etc. or "languages" (plural).
  • A common question is whether to capitalize or not. With a new locale, use whatever is normal practice for what should occur in menus. For an existing locale, especially during the vetting period, follow what is used for the other items already translated. This is the practice for scripts, territories and other types of items too.
    • If the capitalization convention as a whole for a language needs to be changed, that should be done before the data submission phase for the next release. Please file a bug to request that this be done.
//ldml/localeDisplayNames/languages.*\[@type="[^"]*_[^"]*"\].*

Compound Language Codes

Some language codes are more complex, of the form "en_AU" for Australian English. If you don't add a translation, then those will be represented by a format like "αγγλικά (Αυστραλία)". That is, the translation would be the native name for "English", followed by the native word for "Australia" in parentheses. If that format is ok, then you don't need to translate the more complex language code. The codes zh_Hant and zh_Hans (for Traditional and Simplified Chinese) on the other hand, should always be translated.

There are a few special cases:

  • "Iberian Portuguese" is the style of Portuguese used in Portugal (as opposed to Brazil)
  • Similarly "Iberian Spanish" is the style of Spanish used in Spain (as opposed to Latin America).
  • "Swiss High German" (Schweizer Hochdeutsch), also called "Swiss Standard German", has the code de_CH.
  • "Swiss German" (Schwyzerdütsch) has the code gsw.

A pattern is used to control how the translations for language and region codes are composed into a name when the compound code doesn't have a specific translation. See the section "localeDisplayPattern".

//ldml/localeDisplayNames/scripts.*

Scripts

Normally only a few scripts are really necessary to translate: those that are used in distinguishing the most common languages that are written in multiple ways. These are Hant and Hans (for traditional and simplified Chinese), Cyrillic, Arabic, and Latin.

  • A common question is whether to capitalize or not. With a new locale, use whatever is normal practice for what should occur in menus. For an existing locale, especially during the vetting period, follow what is used for the other items already translated.
    • If the capitalization convention as a whole for a language needs to be changed, that should be done before the data submission phase for the next release. Please file a bug to request that this be done.
.*/currencies.*

Currencies

This is a long list that contains the currency names and currency symbols for each country, plus historical codes. The coverage level option tries to pick out the ones that are most important to translate. Each currency code can be translated in two ways:

  • As a symbol for use in formatting amounts (such as "12 345,68 US$"), and
  • As a name, typically used to show a list of currencies (such as "dollar des États-Unis")
  • With names, the common question of whether to capitalize arises. With the introduction of pluralized units in CLDR 1.6, it is recognized that currency names may be used equally in menus and flowing text. Therefore, for a new locale, use whatever practice is best suited for use in either menus or flowing text (we recognize that the capitalization rule you adopt may have limitations, and we endeavour to add features to the Survey Tool in the future to alleviate some of them). For an existing locale, especially during the vetting period, follow what is used for the other items already translated.
//ldml/characters/exemplarCharacters.*

Exemplar Character Set

The exemplar character sets contain the commonly used letters for a given modern form of a language. These are used for testing and for determining the appropriate repertoire of letters for charset conversion or text comparison. The term "letter" is interpreted broadly, and includes characters used to form words, such as 是 or 가. If a sequence of characters is considered a "letter", it will be listed between { and }. For example, {ch}.

There are four categories:

  • The standard characters are those used in customary writing, such as [a-z] for English.
  • The auxiliary characters are additional characters used in foreign words found in typical magazines, newspapers, &c. For example, you could see the name Schröder in English in a magazine, so ö is in the set. However, it is very uncommon to see ł, so that isn't in the auxiliary set for English. Publication style guides, such as The Economist Style Guide for English, are useful for this.
  • The currency characters are additional characters used in currency symbols, like 'US$ 1,234'.
  • The index characters are used as an index for categories of items. Unlike the other sets, this set should contain either uppercase or lowercase characters, depending on what is typical for the language. Note that if the character set is large and has no fixed standard sorting (such as Chinese), the value [] should be used. A draft set of characters was mechanically generated, and will need adjustments: for example, characters or strings that never occur at the start of words are typically removed.
The exemplar set is not a complete set of letters used for a language: punctuation and other symbols are not included, nor uppercase letters (except for Turkish İ). The Survey Tool will flag certain fields with a warning icon if they use characters that are not in exemplar sets. In some cases, this is not truly an error, such as where "NaN" is used in numbers. In other cases, the possible actions for you are:
  • Fix the field value to not use the character.
  • Fix the exemplar sets, because the character actually is acceptable in your language in one of the above categories.
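
The kind of check that produces such a warning might look like the sketch below. It is not the Survey Tool's actual test: unexpected_characters is a hypothetical helper, and the AUXILIARY set is a made-up subset used only for illustration.

```python
def unexpected_characters(value, *exemplar_sets):
    """Return the letters in value that fall outside all exemplar sets."""
    allowed = set().union(*exemplar_sets)
    # Exemplar sets hold lowercase letters, so compare case-insensitively;
    # digits, spaces, and punctuation are not letters and are skipped.
    return sorted({ch for ch in value
                   if ch.isalpha() and ch.lower() not in allowed})

STANDARD = set("abcdefghijklmnopqrstuvwxyz")   # English standard set
AUXILIARY = set("áàéèöü")                      # hypothetical subset

ok = unexpected_characters("Schröder", STANDARD, AUXILIARY)
flagged = unexpected_characters("Wałęsa", STANDARD, AUXILIARY)
```

Here ok is empty because ö is in the auxiliary set, while flagged contains ł and ę.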

Any range of characters, such as "a b c d e" can be represented compactly as "a-e". For more information, please see Section 5.6 Character Elements in UTS#35: Locale Data Markup Language (LDML).
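
The compact range notation can be illustrated with a small helper. This is a sketch only: compact is a hypothetical function, it handles single characters (not multi-character "letters" such as {ch}), and it assumes that a run of three or more consecutive code points is worth writing as a range.

```python
def compact(chars):
    """Render single characters as a space-separated list with a-e style ranges."""
    points = sorted({ord(c) for c in chars})
    parts, i = [], 0
    while i < len(points):
        j = i
        # Extend the run while the next code point is consecutive.
        while j + 1 < len(points) and points[j + 1] == points[j] + 1:
            j += 1
        if j - i >= 2:                      # three or more: write a range
            parts.append(f"{chr(points[i])}-{chr(points[j])}")
        else:                               # one or two: list them individually
            parts.extend(chr(p) for p in points[i:j + 1])
        i = j + 1
    return " ".join(parts)
```

For example, compact("abcde") returns "a-e", and compact("abxyz") returns "a b x-z".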

.*/numbers.*

Numbers

Numbers are formatted using patterns, like "#,###.00". Different characters stand for different parts of the number: they don't have their normal meaning! In particular, you need to use '.' for the decimal point and ',' for the thousands (grouping) separator, even if they are not used that way in your language. Here are the special characters used in number patterns.

Number Format Symbols
Symbol Meaning
. Not a real period: instead, it will be replaced automatically by the character used for the decimal point in your language, listed under symbols/decimal
, Not a real comma: instead, it will be replaced by the "grouping" (thousands) separator in your language, listed under symbols/group
0 Replaced by a digit (or zero if there aren't enough digits).
# Replaced by a digit (or nothing if there aren't enough). Often used to show the position of the ",".
¤ This will be replaced by a currency symbol, such as $ or USD. Note: by default a space is placed between letters in a currency symbol and adjacent numbers. If this is not right for your language, file a bug to change it using the Feedback link.
...;... If your language uses different formats for negative numbers than just adding "-" at the front, you can put in two patterns. For example: #,##0.00¤;(#,##0.00¤) is used to make negative currencies appear like "(1'234,56£)" instead of "-1'234,56£"

For example, the pattern "#,###.00" when used to format the number 12345.678 could result in "12'345,68". That would happen if the grouping separator for your language is an apostrophe, and the decimal separator is a comma. Translators should not change the pattern of zeros (0) or hash marks (#); those will be reset by software. This is true also for currency formats. Even if your currency doesn't use any decimal points, the currency format will have them in the pattern. You need to modify the patterns when:

  • The grouping separator is not by thousands (eg Hindi).
  • The negative pattern doesn't simply add a minus sign. For example, if a negative number is formed by adding parentheses, then this would look like: #,##0.###;(#,##0.###). That is, the negative form gets added after a semicolon.
  • The currency symbol (¤) is used in a different position.
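
To make the role of each pattern character concrete, here is a simplified formatter. It is a sketch under stated assumptions, not the real CLDR or ICU algorithm: format_number is a hypothetical function, it supports only one grouping size (so it does not cover Hindi-style grouping), it always shows at least one integer digit, and it assumes half-even rounding.

```python
import re
from decimal import Decimal, ROUND_HALF_EVEN

def format_number(value, pattern, decimal_sep=".", group_sep=",", currency="¤"):
    number = Decimal(str(value))
    positive, _, negative = pattern.partition(";")
    if number < 0 and negative:
        sub, sign = negative, ""               # explicit negative subpattern
    else:
        sub, sign = positive, "-" if number < 0 else ""
    core = re.search(r"[#0,.]+", sub)          # the digit part of the subpattern
    prefix, suffix = sub[:core.start()], sub[core.end():]
    int_part, _, frac_part = core.group().partition(".")
    group = len(int_part) - int_part.rfind(",") - 1 if "," in int_part else 0
    min_int, min_frac, max_frac = int_part.count("0"), frac_part.count("0"), len(frac_part)

    # Round to the maximum number of fraction digits the pattern allows.
    rounded = abs(number).quantize(Decimal(1).scaleb(-max_frac), rounding=ROUND_HALF_EVEN)
    digits, _, frac = format(rounded, "f").partition(".")
    frac = frac.rstrip("0").ljust(min_frac, "0")   # keep only required zeros
    digits = digits.zfill(min_int)
    if group:
        chunks = []
        while len(digits) > group:
            chunks.insert(0, digits[-group:])
            digits = digits[:-group]
        digits = group_sep.join([digits] + chunks)
    body = digits + (decimal_sep + frac if frac else "")
    return (sign + prefix + body + suffix).replace("¤", currency)
```

With an apostrophe as grouping separator and a comma as decimal separator, format_number("12345.678", "#,###.00", ",", "'") returns "12'345,68".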
//ldml/dates/calendars/.*/(pattern|dateFormatItem|intervalFormats).*

Formats for Dates and Times

Dates and times are formatted using patterns, like "MM-dd". Each field, like the month or the hour, is represented by a sequence of one or more identical letters (A to Z, uppercase or lowercase). For example, one or more M's stand for the month. When the software formats a date for your language, a value will be substituted for each field, according to the following table.

Date Format Symbols
Symbol Meaning
G era (eg AD)*
y year
M / L month*
E day of the week (eg Tuesday).*
d day
h / H hour. h for 12 hour, H for 24.
m minute
s second
a am/pm. Only used with "h".
z / v time zone. Use v for full format dates, z for long format dates
'a' since letters have special meaning, if you want a real letter, you need to put it in single quotes. For a real single quote, use '' (that is, two adjacent ' characters).

* Some fields use M or MM for numeric (eg, 1 or 01); MMM for abbreviated (eg, Sept); and MMMM for full (eg, September)
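
The substitution described in the table can be sketched as follows. This covers only a subset of the symbols (no era or time zone), uses hypothetical English names, and abbreviates by taking the first three letters, which is an assumption; real locale data supplies its own abbreviations.

```python
import re
from datetime import datetime

# Hypothetical English names; a real locale supplies its own.
MONTHS = ["January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December"]
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"]
# Quoted literal | run of one repeated letter | any other literal text.
TOKEN = re.compile(r"'((?:[^']|'')*)'|(([A-Za-z])\3*)|([^A-Za-z']+)")

def format_field(dt, letter, width):
    if letter in "ML":                       # month: numeric, abbreviated, or full
        if width <= 2:
            return str(dt.month).zfill(width)
        return MONTHS[dt.month - 1] if width >= 4 else MONTHS[dt.month - 1][:3]
    if letter == "E":
        name = DAYS[dt.weekday()]
        return name if width >= 4 else name[:3]
    if letter == "y":
        return f"{dt.year % 100:02d}" if width == 2 else str(dt.year).zfill(width)
    if letter == "a":
        return "AM" if dt.hour < 12 else "PM"
    numeric = {"d": dt.day, "H": dt.hour, "h": dt.hour % 12 or 12,
               "m": dt.minute, "s": dt.second}
    return str(numeric[letter]).zfill(width)  # unsupported letters raise KeyError

def format_date(dt, pattern):
    out = []
    for m in TOKEN.finditer(pattern):
        if m.group(1) is not None:           # quoted literal; '' alone is a real quote
            out.append(m.group(1).replace("''", "'") or "'")
        elif m.group(2):
            out.append(format_field(dt, m.group(3), len(m.group(2))))
        else:
            out.append(m.group(4))
    return "".join(out)
```

For example, format_date(datetime(2008, 9, 13), "MMM d, yyyy") returns "Sep 13, 2008".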


//ldml/dates/calendars/.*/intervalFormats.*

Interval Formats

Interval formats are used for a range of dates or times specified by a start and end, such as "Sept 10-12" (meaning the 10th of September through the 12th of September). The pattern will be something like "MMM d–d", where some of the fields are repeated -- typically with some kind of punctuation mark separating the two fields, but some fields in the second part are omitted. The way this pattern is used is that the part up to the first repeated field is formatted with the first date, and the remainder is formatted with the second date. For example:

Interval Formatting
Format String Date 1 Date 2 Result
MMM d–d 2008-09-13 2008-09-15 Sept. 13–15
MMMM–MMMM, yyyy 2008-09-01 2008-11-30 September–November, 2008
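
The rule "the part up to the first repeated field is formatted with the first date" can be sketched as a pattern split. split_interval is a hypothetical helper; it only finds the split point and leaves the actual formatting of each half to a date formatter.

```python
import re

# Quoted literal text, or a run of one repeated pattern letter.
TOKEN = re.compile(r"'(?:[^']|'')*'|([A-Za-z])\1*")

def split_interval(pattern):
    """Split at the first field letter that repeats; returns (first, second)."""
    seen = set()
    for m in TOKEN.finditer(pattern):
        letter = m.group(1)
        if letter is None:                   # quoted literal text, not a field
            continue
        if letter in seen:
            return pattern[:m.start()], pattern[m.start():]
        seen.add(letter)
    return pattern, ""                       # no repeated field found
```

For "MMM d–d" this yields ("MMM d–", "d"): the first half is formatted with the first date and the second half with the second date.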

Each combination of fields can be used with dates that differ by different amounts. For example, a format for the fields "yMMMd" (year, abbreviated month, and day) could be used with two dates that differ by year, month, or day -- each type of difference might need a different pattern. For example:

Greatest Difference
Date 1 Date 2 Greatest Difference Format String Shares
2008-09-13 2009-09-15 year MMM d, yyyy – MMM d, yyyy nothing
2008-09-01 2008-11-30 month MMM d – MMM d, yyyy year
2008-09-01 2008-09-05 day MMM d–d, yyyy year and month

Look carefully at each of the examples to see the kinds of formats that would be used in your language.
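
Choosing a pattern by greatest difference can be sketched as below. The PATTERNS table simply copies the format strings from the table above for the "yMMMd" field combination; the function names are hypothetical.

```python
from datetime import date

# Interval patterns for "yMMMd", keyed by the greatest differing field.
PATTERNS = {
    "year": "MMM d, yyyy – MMM d, yyyy",
    "month": "MMM d – MMM d, yyyy",
    "day": "MMM d–d, yyyy",
}

def greatest_difference(start, end):
    """Name the largest calendar field in which the two dates differ."""
    for field in ("year", "month", "day"):
        if getattr(start, field) != getattr(end, field):
            return field
    return None                              # identical dates: no interval needed

def interval_pattern(start, end):
    return PATTERNS.get(greatest_difference(start, end))
```

For 2008-09-01 and 2008-09-05 the greatest difference is the day, so the pattern "MMM d–d, yyyy" is selected and the year and month are shared.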
//ldml/dates/calendars/.*Context.*

Stand-Alone vs. Format Styles

Some languages use two different forms of strings (stand-alone and format) depending on the context. Typically the stand-alone version is the nominative form of the word, and the format version is in the genitive.

Make sure that the correct forms are provided, especially for the months, and used in the patterns. That is, suppose that the language uses "Dezembro" for December when standing alone, but "Dezembru" when with a date (meaning the nth day of that month). Then the formats for months could be something like:

Stand-Alone vs Format Months
Format String Example1 Example2
LLL Dezembro Dez.
d MMM 1 Dezembru 1 Dez.
MMM d yy Dezembru 1 1953 1 Dez. 53

Similarly, suppose that your language formats months differently if they have vowels, eg "14 de gener de 2008" but "14 d'abril de 2008". In that case, the stand-alone and format versions of the months should be:

Format Month Stand-Alone Month
de gener gener
d'abril abril

These must be coordinated with the format strings, which can't have the extra "de" before the month:

Format String Date Result
LLL 2008-1-14 gener
2008-4-14 abril
d MMM 'de' yyyy 2008-1-14 14 de gener de 2008
2008-4-14 14 d'abril de 2008

That is, if your language uses two different forms, then make sure that there are two forms of the months or days where necessary, and adjust the date patterns to use the LLL or LLLL stand-alone form or MMM and MMMM format forms, as needed.

//ldml/dates/calendars/calendar.*timeFormatLength

Standard Time Formats

There are four standard time formats.

  • full should contain hour, minute, second, and long zone (vvvv).
  • long should contain hour, minute, second, and zone (z)
  • medium should contain hour, minute, second.
  • short should contain hour, minute.
//ldml/dates/calendars/calendar.*/quarters/.*

Quarters

The quarters of a year are used in formats such as "2006Q3", typically used for financial periods. If your language doesn't have a common term for this, you might use the equivalent of "Jan-Mar".

//ldml/dates/calendars/calendar.*/fields.*displayName.*

Date Field Labels

The date field labels are the names of date or time fields, such as "Month" or "Hour", suitable for labels in dialogs or menus.

//ldml/dates/calendars/calendar.*/fields.*relative.*

Relative Periods of Time

Relative fields of time are used to indicate a period relative to today, like "Yesterday" or "Tomorrow". Some languages don't have words or short phrases for some of these. For example, English does not have a word for "the day before yesterday" as some languages do, such as "Vorgestern" in German.

If your language doesn't have a natural term for one of these, please do not supply a translation: instead, pick the "inherited" value, such as "The day after tomorrow". The English phrase supplied here is just a placeholder to let you know what the field means, and is not part of the actual English locale data.

//ldml/dates/calendars/calendar.*/(a|p)m

AM and PM

Note that even if your language doesn't use am/pm in any patterns, strings for them need to be defined for testing. As long as the 24-hour symbol (H) is used in the patterns, they won't show up in formatted times and dates.

//ldml/dates/calendars/calendar.*dateTimeFormatLength.*

Date-Time Pattern

The date-time pattern is used to make a date + time out of separate date and time patterns. The date will be substituted for {1} and the time for {0}. It usually doesn't need to be changed.
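
As a small illustration of the placeholders, the sketch below combines an already formatted date and time. The combine function and the sample strings are hypothetical.

```python
def combine(date_time_pattern, date_text, time_text):
    # {0} is replaced by the time and {1} by the date, as described above.
    return date_time_pattern.replace("{1}", date_text).replace("{0}", time_text)

date_first = combine("{1} {0}", "Sep 13, 2008", "2:05 PM")
time_first = combine("{0}, {1}", "Sep 13, 2008", "2:05 PM")
```

The first call puts the date before the time; the second shows a language that would put the time first.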

.*narrow.*

Narrow Date Fields

The narrow date fields are the shortest possible names (in terms of width in common fonts), and are not guaranteed to be unique. Think of what you might find on a credit-card-sized wallet or checkbook calendar, such as in English for days of the week:

S M T W T F S

.*/eras.*

Eras

There are only two values for an era in a Gregorian calendar, "BC" and "AD". These values can be translated into other languages, like "a.C." and "d.C." for Spanish, but there are no other eras in the Gregorian calendar.

Other calendars have different numbers of eras. The names for eras are often specific to the given calendar, such as the Japanese era names. You typically only need to translate these if the calendar in question is in common use in one of the countries that uses your language.

.*/references.*

References

References are used to document more controversial cases. Whenever there is a disagreement between translators, or when the choice of translation might not be understood, you should add a reference.

  • Fill in a descriptive title for the reference, such as "The Economist Style Guide"
  • Click the Save button. You will see your new reference listed, and you can add it to other fields.
.*/exemplarCity.*

Time Zone Exemplar Cities

For generic references to time zones, the country is used if possible, composed with a pattern that in English appears as "{0} Time". Thus a time zone may appear as "Malaysia Time" or "Hora de Malasia". If the country has multiple time zones, then a city is used to distinguish which one, thus "Argentina (La Rioja) Time".

Thus cities normally only need to be translated if they are in a country with multiple time zones.

.*(M|m)etazone.*

Metazones

For some time zones, the survey tool will state that a particular metazone is in effect. A metazone is simply a grouping of time zones that share a common display name in customary usage. For example, Europe/Paris, Europe/Berlin, and many other time zones share a common display name "Central European Time", and have a common metazone Europe_Central. Use of a metazone allows us to translate this text only once while it can be used in many different time zones. The survey tool will show the default mappings for when a particular metazone was in use for a particular time zone. If you believe the mappings to be incorrect for your locality, please use the link to record any desired changes to the metazone mappings. Metazones have the same display fields as regular time zones, except that they have no exemplar city associated with them.

Often there are situations where a particular time zone has an abbreviation, but the abbreviation is so seldom used that most people would not recognize it. The "commonlyUsed" field for a metazone is used to indicate that abbreviations for a particular time zone or metazone are in common use in the locale. You have two choices:

  • If the GMT format would be understood better, set commonlyUsed to "false"
  • Otherwise, if the abbreviation is commonly understood, set commonlyUsed to "true".

For example: In English, PST is a commonly used abbreviation for "Pacific Standard Time", for the metazone America_Pacific. While NPT is an abbreviation for "Nepal Time", most English speakers would not recognize the meaning of "2:00 PM NPT". Thus, commonlyUsed should be true for America_Pacific (displaying, for example, "2:00 PM PST") and false for Asia/Katmandu (displaying, for example, "4:00 GMT+05:45").

//ldml/posix/messages.*

POSIX Yes and No

The POSIX yes and no strings should be whatever should count for "No" and "Yes" in your language, plus abbreviations. Don't worry about uppercase forms; those are handled automatically. Multiple forms can be entered separated by ":", such as "ne:n".
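
How such a colon-separated value might be consumed is sketched below. The helper names are hypothetical, and "ano:a" is a made-up "yes" value chosen to pair with the "ne:n" example; the case-insensitive comparison reflects the note that uppercase is handled automatically.

```python
def parse_forms(value):
    """Split 'ne:n' into its individual forms, ignoring case."""
    return [form.casefold() for form in value.split(":") if form]

def answer(text, yes_forms, no_forms):
    """Return True for yes, False for no, None if the input matches neither."""
    normalized = text.strip().casefold()
    if normalized in parse_forms(yes_forms):
        return True
    if normalized in parse_forms(no_forms):
        return False
    return None
```

For example, answer("N", "ano:a", "ne:n") returns False.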

//ldml/layout/in(List|Text).*

Casing Verification

These values can be used to help testing. If the value is set to anything but "mixed", then the items of that type will be checked against it, to help catch inconsistencies. For example, if your language usually has the names of territories in lowercase, then set the value for territories to be "lowercase-words". The values are:

Values Example
mixed This is a mixture of Titlecase and lowercase.
lowercase-words this is a mixture of titlecase and lowercase.
titlecase-words This Is A Mixture Of Titlecase And Lowercase
titlecase-firstword This is a mixture of titlecase and lowercase.

The layout/inList item has the same values, but a different use. It signals that if the items are put into a list (such as a menu on a computer), then they should be mechanically changed. For example, suppose that names of languages are normally lowercase, but when put into a menu they should normally have the first letter of the first word capitalized. If that's true, then you should set this value to titlecase-firstword.

If that value is wrong for any individual item, then you can override that particular item by adding an "alt" value. To do so, contact your administrator.
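
The casing values can be read as simple checks on the words of an item. The sketch below is one possible interpretation, not the tool's actual test: both functions are hypothetical, and words that don't start with a letter are ignored.

```python
def matches_casing(text, style):
    """Check an item against one of the casing values listed above."""
    words = [w for w in text.split() if w[0].isalpha()]
    if style == "mixed" or not words:
        return True
    if style == "lowercase-words":
        return all(w == w.lower() for w in words)
    if style == "titlecase-words":
        return all(w[0].isupper() for w in words)
    if style == "titlecase-firstword":
        return words[0][0].isupper() and all(w == w.lower() for w in words[1:])
    raise ValueError(f"unknown casing value: {style}")

def for_list(text):
    """Apply titlecase-firstword mechanically, as an inList value would request."""
    return text[:1].upper() + text[1:]
```

For example, for_list("français") returns "Français", which is what a menu would show if the inList value were titlecase-firstword.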

//ldml/delimiters/.*

Delimiters

Change this field if your language uses different quotation marks. The alternate forms are for embedded quotations, such as "He said 'Stop!'".

//ldml/dates/dateRangePattern.*

Ranges of Dates

Modify this field to control how a range of dates appears, eg "Oct 12 - Nov 9".

//ldml/dates/timeZoneNames/fallbackFormat

Country-Based Time Zone City Pattern

Modify this field to control the formatting of Country-Based time zone display when a country has multiple time zones, and the city is used to disambiguate them. In the pattern, {0} will be replaced by the city and {1} will be the country. This is normally not changed, except perhaps in languages that don't use spaces.

//ldml/dates/timeZoneNames/gmtFormat

GMT Pattern

Modify this field if the format for GMT time uses different letters, such as HUA+0200 for GMT+02:00, or if the letters GMT occur after the time. Make sure you include the {0}; that is where the actual time value will go!

//ldml/dates/timeZoneNames/hourFormat

GMT Hours Pattern

This field controls the format for the time used with the GMT Pattern. It contains two patterns separated by a ";". The first controls positive time values (and zero), and the second controls the negative values. So to get GMT+02.00 for positive values, and GMT-02.00 for negative values, you'd use +HH.mm;-HH.mm.
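
How the GMT Pattern and the GMT Hours Pattern fit together can be sketched as follows. format_gmt is a hypothetical function, and the default patterns shown are assumptions chosen for illustration.

```python
def format_gmt(offset_minutes, gmt_format="GMT{0}", hour_format="+HH:mm;-HH:mm"):
    # The first subpattern covers zero and positive offsets, the second negative ones.
    positive, negative = hour_format.split(";")
    pattern = positive if offset_minutes >= 0 else negative
    hours, minutes = divmod(abs(offset_minutes), 60)
    time = (pattern.replace("HH", f"{hours:02d}")
                   .replace("H", str(hours))
                   .replace("mm", f"{minutes:02d}"))
    return gmt_format.replace("{0}", time)
```

With hour_format set to "+HH.mm;-HH.mm", an offset of 120 minutes gives "GMT+02.00" and an offset of -120 minutes gives "GMT-02.00".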

//ldml/dates/timeZoneNames/regionFormat

Country-Based Time Zone Pattern

For generic references to time zones, the country is used if possible, composed with a pattern that in English appears as "{0} Time". Thus a time zone may appear as "Malaysia Time" or "Hora de Malasia". If the country has multiple time zones, then a city is used to distinguish which one, thus "Argentina (La Rioja) Time".

Some languages would normally have grammatical adjustments depending on what the name of the city is. For example, one might need "12:43 pm Tempo d'Australia" but "12:43 pm Tempo de Paris". In that case, there are two approaches:

  1. Use "{0}", which will give results like "12:43 pm Australia" and "12:43 pm Paris", or
  2. Use a "form-style" phrasing such as "Tempo de: {0}", which will give results like "12:43 pm Tempo de: Australia" and "12:43 pm Tempo de: Paris".
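
The composition of country, city, and pattern can be sketched as below. zone_display is a hypothetical function; the default "{1} ({0})" stands in for the Country-Based Time Zone City Pattern (city in {0}, country in {1}) and is an assumption.

```python
def zone_display(country, city=None, region_format="{0} Time",
                 fallback_format="{1} ({0})"):
    place = country
    if city is not None:
        # The country has several zones, so the city disambiguates.
        place = fallback_format.replace("{0}", city).replace("{1}", country)
    return region_format.replace("{0}", place)
```

For example, zone_display("Argentina", "La Rioja") returns "Argentina (La Rioja) Time", and the "form-style" pattern "Tempo de: {0}" avoids any grammatical adjustment of the inserted name.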
//ldml/dates/.*/days/.*

Days of the Week

This field is one of the days of the week, such as Sunday or Monday.

.*/timeZoneNames.*

Time Zones

In the standard used for time zones, a time zone is an area of a country that has consistent behavior in terms of its offset from Greenwich Mean Time. In particular, within that zone, the same daylight-savings (summer-time) behavior is observed, now and in the past and future (as far as is known). This means that time zones are fairly fine-grained, as you can see by consulting Territory Containment. The name of the time zone is taken from the most populous city, such as America/Denver. Here are some examples of time zones, and why they are distinct from America/Denver.

Time zone Reason
America/Chicago Chicago has a different standard offset from GMT (6 hours) than Denver (7 hours).
America/Phoenix While Phoenix has the same GMT offset as Denver, it doesn't have daylight savings time, while Denver does.
America/Edmonton Although Edmonton has the same offset and daylight savings behavior as Denver, it is in a different country.

Time zones can be displayed in a variety of ways, depending on the environment and program requirements. Here are some examples:

Sample Time Zone Formats
Named List Format Abbreviated With a Time
Country-Based United States (Los Angeles) Time 12:43 pm United States (Los Angeles) Time
Italy Time 12:43 pm Italy Time
Named Pacific Time PT 12:43 pm Pacific Time 12:43 pm PT
Central European Time CET 12:43 pm Central European Time 12:43 pm CET
Pacific Standard Time PST 12:43 pm Pacific Standard Time 12:43 pm PST
GMT GMT-8:00 12:43 pm GMT-8:00
GMT+2:00 12:43 pm GMT+2:00

These are composed from different pieces that you translate.

  • For the country-based formats, you'll be translating the country names anyway, but also city names where a country has multiple zones. You'll also be translating a pattern for the "Time" portion (or leaving it blank if that is better for your language).

  • For the named formats, you'll have the opportunity to translate specific names for that zone, or names common to groupings of time zones (called metazones) that span multiple time zones. You have 6 possible strings to translate: generic (Pacific Time), standard (Pacific Standard Time), daylight (Pacific Daylight Time), plus abbreviations of those. You only want to provide names (and especially abbreviations) where those are customarily understood by speakers of your language. Just because they are in English doesn't mean they should always be translated in your language.

  • For the GMT format, you'll be translating the term "GMT" (if necessary for your language), and the format for the hours (eg, +8:45 vs. +8.45).

//ldml/dates/.*/months/.*

Months of the Year

This field is one of the months of the year, such as January or February.

//ldml/fallback

Locale Fallbacks

You should add here a list of locales that would be most natural to use when no translation is available (this is called a fallback). This is especially useful for minority languages. For example, for Breton [br] the most natural language to fall back to might be French [fr], that is, to use French names for countries that aren't translated. Similarly, the fallback for Moldavian [mo] might be Romanian [ro].

Fallbacks should only be included if a substantial majority of people speaking the language in question would be likely to understand the fallback language. If there are no such languages, the fallback field should be left blank.

Fallbacks can take the script or region into account; the fallback for Northern Sámi (Finland) [se-FI] might be Finnish (Finland) [fi-FI], while the fallback for Northern Sámi [se] generally might be Norwegian [no].

The values you need to use are locale codes, not the names or translations; thus you would put in fr or fr_BE, not "French" or "français". If you don't know the codes for the languages in question, you can consult the survey tool Locales, or the BCP 47 registry.

Multiple fallback languages can be entered in order of priority, separated by spaces, for example: nl en.
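
How a fallback list might be used when a translation is missing is sketched below. The lookup function is hypothetical, and DATA holds made-up sample entries rather than actual CLDR data.

```python
# Illustrative sample data only: {locale: {territory code: name}}.
DATA = {
    "br": {"FR": "Frañs"},
    "fr": {"FR": "France", "CD": "Congo-Kinshasa"},
}
FALLBACKS = {"br": "fr"}   # space-separated locale codes, in order of priority

def lookup(key, locale, data=DATA, fallbacks=FALLBACKS):
    """Return (value, locale that supplied it), trying fallbacks in order."""
    chain = [locale] + fallbacks.get(locale, "").split()
    for candidate in chain:
        value = data.get(candidate, {}).get(key)
        if value is not None:
            return value, candidate
    return None, None
```

Here lookup("CD", "br") finds no Breton name and falls back to the French one, returning ("Congo-Kinshasa", "fr").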

//ldml/(units/unit|numbers/currencies/currency.*/displayName).*

Localized Units

Localized units provide more natural ways of expressing unit phrases that vary in plural form, such as "1 hour" vs "2 hours". While they cannot express all the intricacies of natural languages, they allow for more natural phrasing than constructions like "1 hour(s)".

Please review the draft rules that CLDR is using for plurals for your language, at Language Plural Rules, and the description there about the plural categories.

Each unit may have multiple plural forms, one for each category. These are composed with numbers using a unitPattern of the form "{0} {1}". A formatted number will be substituted in place of the "{0}", while the unit value will be substituted in place of the "{1}".

For example, for English if the unit is an hour and the number is 1234, then the number is looked up to get the rule category other. The number is then formatted into "1,234" and composed with the unitName for other and the unitPattern for other to get the final result. Examples are in the table below.

Locale Unit Number Formatted number Plural category unitName for category unitPattern for category Final Result
en hour 0 "0" other "hours" "{0} {1}" "0 hours"
en hour 1 "1" one "hour" "{0} {1}" "1 hour"
en hour 1234 "1,234" other "hours" "{0} {1}" "1,234 hours"
fr hour 0 "0" one "heure" "{0} {1}" "0 heure"
fr hour 1 "1" one "heure" "{0} {1}" "1 heure"
fr hour 1234 "1 234" other "heures" "{0} {1}" "1 234 heures"
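
The rows of the table can be reproduced with a short sketch. The plural rules here are simplified to what the table shows for English and French, and the names UNIT_NAMES, GROUPING, and format_unit are hypothetical.

```python
def plural_category(locale, n):
    # Simplified rules covering only the two locales in the table above.
    if locale == "fr":
        return "one" if 0 <= n < 2 else "other"
    return "one" if n == 1 else "other"

UNIT_NAMES = {
    ("en", "hour"): {"one": "hour", "other": "hours"},
    ("fr", "hour"): {"one": "heure", "other": "heures"},
}
GROUPING = {"en": ",", "fr": " "}   # grouping separators, as in the table

def format_unit(locale, unit, n, unit_pattern="{0} {1}"):
    category = plural_category(locale, n)
    number = f"{n:,}".replace(",", GROUPING[locale])
    name = UNIT_NAMES[(locale, unit)][category]
    return unit_pattern.replace("{0}", number).replace("{1}", name)
```

For example, format_unit("fr", "hour", 0) returns "0 heure", because French places 0 in the "one" category.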

There is one "default" unitPattern for each plural category, listed under the unit "one". If the particular unit needs a special unitPattern for a particular plural category, then one can also be added. That is, suppose that for a particular language, in the plural the number goes after the translation of hour instead of before. Then for the unit hour, and plural category other, the unitPattern can be different if needed.

The key is, if the examples look ok you shouldn't need to do anything.

To request a change in the plural rules, please file a request in a bug report.

//ldml/localeDisplayNames/localeDisplayPattern/localePattern.*

Locale Display Patterns


Locale display patterns are used to format a compound language (locale) name such as 'en_AU' or 'uz_Arab'. The pattern is something like "{0} ({1})". When the locale is formatted, the language is substituted for {0}, and the region or script for {1}.

For example, take "en_AU". First the language code 'en' is translated, such as to "anglais", then the country is translated, such as "Australie". The pattern is used to put those together, into something like "anglais (Australie)". This works the same way if there is a script; for example, "uz-Arab" => "ouzbek (arabe)".

If there is both a script and a region, then a list is formed using the separator, then {1} is replaced by that list, such as "uz-Arab-AF" => "ouzbek (arabe, Afghanistan)"

For certain compound language (locale) names, you can also supply specific translations. Thus for the whole locale 'en_AU', you can provide a translation like "Australian English".
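
The composition steps above can be sketched as follows. The tables hold only the French examples from the text, locale_name is a hypothetical function, and treating four-letter subtags as scripts is a simplifying assumption.

```python
LANGUAGES = {"en": "anglais", "uz": "ouzbek"}
SCRIPTS = {"Arab": "arabe"}
REGIONS = {"AU": "Australie", "AF": "Afghanistan"}

def locale_name(code, pattern="{0} ({1})", separator=", ", specific=None):
    normalized = code.replace("-", "_")
    if specific and normalized in specific:
        return specific[normalized]          # a specific translation wins
    language, *subtags = normalized.split("_")
    # Four-letter subtags are taken to be scripts, others regions.
    parts = [SCRIPTS[tag] if len(tag) == 4 else REGIONS[tag] for tag in subtags]
    name = LANGUAGES[language]
    if not parts:
        return name
    return pattern.replace("{0}", name).replace("{1}", separator.join(parts))
```

For example, locale_name("uz-Arab-AF") returns "ouzbek (arabe, Afghanistan)".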

//ldml/localeDisplayNames/codePatterns/codePattern.*

Code Patterns

Code patterns are used in lists where the name of the language, script, or region is not available -- the code (like "de" for German) will be substituted for the {0} placeholder. Thus if the language code 'zaz' is not translated in your language, you might see in a list something like:
  • English
  • French
  • Language: zaz
  • Spanish
The "Language: zaz" line is the result of substituting the code 'zaz' into the code pattern. You can choose the pattern that makes sense for your language; if the best choice is to just use the code alone, then use {0}.
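
The list above could be produced by a sketch like this one, where list_label is a hypothetical helper and NAMES is sample data.

```python
NAMES = {"en": "English", "fr": "French", "es": "Spanish"}

def list_label(code, names=NAMES, code_pattern="Language: {0}"):
    # Use the translated name if there is one, otherwise fall back to the pattern.
    return names.get(code) or code_pattern.replace("{0}", code)

labels = [list_label(code) for code in ["en", "fr", "zaz", "es"]]
```

With the pattern set to "{0}", the untranslated entry would appear as just "zaz".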

The text to insert can be fairly arbitrary HTML. The software that reads this table will search the first column (eg between <td> and </td>) and return the contents of the second column. We plan on adding a few variables also, for the current locale name, in particular. This file uses the survey tool style-sheet, so you can use those styles (and icons, like [stop] ) in the text to insert.
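
The lookup described above can be sketched as follows. This is an assumption-laden illustration: the real software may match differently, the rows are a small excerpt with headings standing in for the full help text, the XML paths in the usage are illustrative, and returning every matching row (rather than only the first) is a guess.

```python
import re

# (path regular expression, text to insert): a small excerpt of the table.
HELP_ROWS = [
    (r"//ldml/localeDisplayNames.*", "Display Names"),
    (r"//ldml/localeDisplayNames/territories.*", "Territories"),
    (r'//ldml/localeDisplayNames/languages.*\[@type="[^"]*_[^"]*"\].*',
     "Compound Language Codes"),
    (r".*/currencies.*", "Currencies"),
]

def help_for(path, rows=HELP_ROWS):
    """Return the help text of every row whose pattern matches the whole path."""
    return [text for pattern, text in rows if re.fullmatch(pattern, path)]

found = help_for('//ldml/localeDisplayNames/territories/territory[@type="CD"]')
```

Here found contains both the general "Display Names" text and the more specific "Territories" text.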

WARNING