Chart Messages

This is a help-text file for use with the survey tool and charts. You can add a new row, where the key is a key that the program knows about, and the Text to Insert is what you want to show up as help text, or modify existing text. The software that interprets this expects a particular format, so don't make arbitrary changes (see the end).

Key Text to Insert
territory_language_information The main goal for CLDR language data is to provide approximate figures for the literate, functional population for each language in each territory: that is, the population that is able to read and write each language, and is comfortable enough to use it with computers.

The GDP and Literacy figures are taken from the World Bank where available, otherwise supplemented by FactBook data and other sources. The GDP figures are "PPP (constant 2000 international $)". Much of the per-language data is taken from the Ethnologue, but is supplemented and processed using many other sources, including per-country census data. (The focus of the Ethnologue is native speakers, which includes people who are not literate, and excludes people who are functional second-language users.)

The literacy rate may be discounted to reflect the actual usage of the written form in normal daily life. Thus languages that are typically not written, such as Swiss German, will be given a low literacy rate, even though the whole population could write in Swiss German.

The percentages may add up to more than 100% due to multilingual populations, or may be less than 100% due to illiteracy or because the data has not yet been gathered or processed. Languages with a small population may be omitted.

Official status is supplied where available, formatted as {O}. Hovering with the mouse shows a short description.

  • Likely languages and scripts: To see (and verify) the likely languages and scripts for this subtag, click on the country code.
  • Reporting Defects: If you find errors or omissions in this data, please report the information using the bug or add new links below.
  • XML Source: supplementalData.xml (see the <territoryInfo>, <calendarData>, <weekData>, and <measurementData> elements)
language_territory_information

For information on the meaning of the different values, see Territory-Language Information.

  • Reporting Defects: If you find errors or omissions in this data, or want to add a new territory for a language, see the add new links below.
  • XML Source: supplementalData.xml (see the <territoryInfo> element)
detailed_territory_currency_information

The following table shows when currencies were in use in different countries. See also Decimal Digits and Rounding. The digits column shows the number of digits to use; if there is special rounding (such as for CH), that is in parentheses. The Countries column shows which countries the currency is or has been used in, officially.
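
The interaction between the digits and rounding values can be sketched as follows. This is a minimal illustration, not the chart software's code: it assumes the rounding value is an increment expressed in units of 10^-digits (so digits 2 with rounding 5 means the nearest 0.05), and both the function name round_currency and the half-even rounding mode are choices made here for the example.

```python
from decimal import Decimal, ROUND_HALF_EVEN

def round_currency(amount, digits, rounding=0):
    """Round amount to `digits` decimals, honoring a rounding increment.

    Assumption: `rounding` is given in units of 10**-digits; 0 or 1 means
    no special rounding. The half-even mode is an illustrative choice.
    """
    unit = Decimal(1).scaleb(-digits)          # e.g. 0.01 for digits=2
    increment = unit * (rounding if rounding > 1 else 1)
    steps = (Decimal(amount) / increment).quantize(Decimal(1), rounding=ROUND_HALF_EVEN)
    return steps * increment

print(round_currency("12.37", 2, 5))   # 12.35
print(round_currency("12.38", 2, 5))   # 12.40
print(round_currency("12.346", 2))     # 12.35
```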

  • Reporting Defects: If you find errors or omissions in this data, please report the information with a bug report.
  • XML Source: supplementalData.xml (see the <currencyData> element)
languages_and_scripts This table shows some information about the scripts commonly used with different languages. This information is not complete, and is being enhanced over time. The table is sorted by language; for the same information sorted by script, see Scripts and Languages. The following conventions are used in the table:
Column Comment
Language Where there isn't any information in Unicode CLDR as to which languages are written in a given script, the language code is given as Unknown or Invalid Language ("und").
ML The modern language column shows "O" if the language is not in customary modern use (currently following ISO 639-3 Types: Ancient, Extinct, Historical, or Constructed).
P The Primary column shows "N" if the language is neither an official nor a de facto official language of some country. For more information, see Language-Territory Information.
Script Where there isn't any information in Unicode CLDR as to which script is used by a language, the script code is given as Unknown or Invalid Script ("Zzzz").
MS The modern script column shows "N" if the script is not in customary modern use.
  • Reporting Defects: If you find errors or omissions in this data, please report the information with a bug report.
  • XML Source: supplementalData.xml (see the <languageData> element)
scripts_and_languages This table shows some information about the scripts commonly used with different languages. This information is not complete, and is being enhanced over time. The table is sorted by script; for the same information sorted by language, see Languages and Scripts. The following conventions are used in the table:
Column Comment
Language Where there isn't any information in Unicode CLDR as to which languages are written in a given script, the language code is given as Unknown or Invalid Language ("und").
ML The modern language column shows "O" if the language is not in customary modern use (currently following ISO 639-3 Types: Ancient, Extinct, Historical, or Constructed).
P The Primary column shows "N" if the language combination is neither an official nor a de facto official language of some country. For more information, see Language-Territory Information.
Script Where there isn't any information in Unicode CLDR as to which script is used by a language, the script code is given as Unknown or Invalid Script ("Zzzz").
MS The modern script column shows "N" if the script is not in customary modern use.
  • Reporting Defects: If you find errors or omissions in this data, please report the information with a bug report.
  • XML Source: supplementalData.xml (see the <languageData> element)
territory_containment_un_m_49

The Territory Containment table shows the organization of territories and regions according to UN M.49, starting with the World. (CLDR supplements this table with the QO code for outlying areas that would not otherwise be included.) The last column lists the timezone IDs for each country.

  • Reporting Defects: If you find errors or omissions in this data, please report the information with a bug report. However, such reports should be limited to cases where the information here deviates from UN M.49.
  • XML Source: supplementalData.xml (see the <territoryContainment> and <timezoneData> elements)
zone_tzid

The Zone-Tzid table shows the mapping from Windows timezone IDs to the standard TZIDs.

character_fallback_substitutions The Character Fallback Substitutions table shows recommended fallbacks for use when a charset or supported repertoire does not contain a desired character, using the data from characters.xml. There may be more than one possible fallback. When a character value is not in the desired repertoire, the recommended process is to try the following values in order and use the first one that is wholly in the desired repertoire:
  • toNFC(value)
  • other canonically equivalent sequences, if there are any
  • the explicit substitutes value from characters.xml (in order)
  • toNFKC(value)
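
The process above can be sketched in Python with the standard unicodedata module. This is only an illustration under stated assumptions: the names fallback and in_repertoire are hypothetical, the repertoire is modeled as a plain set of characters, the explicit substitutes are passed in by the caller rather than read from characters.xml, and the "other canonically equivalent sequences" step is approximated by trying the NFD form only.

```python
import unicodedata

def in_repertoire(text, repertoire):
    """True if every character of text is in the repertoire (a set of chars)."""
    return all(ch in repertoire for ch in text)

def fallback(value, repertoire, explicit_substitutes=()):
    """Return the first candidate wholly in the repertoire, or None."""
    candidates = [unicodedata.normalize("NFC", value)]
    # Simplification: only the NFD form stands in for "other canonically
    # equivalent sequences".
    candidates.append(unicodedata.normalize("NFD", value))
    # Explicit substitutes, in order (hypothetical values supplied by the caller).
    candidates.extend(explicit_substitutes)
    candidates.append(unicodedata.normalize("NFKC", value))
    for candidate in candidates:
        if in_repertoire(candidate, repertoire):
            return candidate
    return None

ascii_only = {chr(i) for i in range(128)}
print(fallback("\ufb01", ascii_only))            # fi  (NFKC of the fi ligature)
print(fallback("\u2460", ascii_only, ["(1)"]))   # (1) (explicit substitute wins over NFKC)
```

If no candidate fits the repertoire, the sketch returns None and leaves the choice of a last-resort replacement to the caller.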

The Explicit, NFC, and NFKC substitutes are shown in the chart by different colors. Note that the character fallbacks do lose information, and should not be used where there is a viable alternative, such as HTML escapes.

  • Reporting Defects: If you find errors or omissions in this data, please report the information with a bug report.
  • XML Source: characters.xml 
aliases

Aliases show how to map deprecated codes or aliases onto the ones that should be used to access CLDR data. Most other metadata is not shown in tables; the source data should be consulted. Codes are shown in brackets before or after the English name, e.g., "Vanuatu [VU]".

likely_subtags There are a number of situations where it is useful to be able to find the most likely language, script, or region, if that information is otherwise missing. For example:
  • Given the language "zh" and the region "TW", what is the most likely script?
  • Given the script "Thai" what is the most likely language or region?
  • Given the region TW, what is the most likely language and script?

Conversely, given a locale, it is useful to find out which fields (language, script, or region) may be superfluous, in the sense that they contain the likely tags. For example, "en_Latn" can be simplified down to "en" since "Latn" is the likely script for "en"; "ja_Jpan_JP" can be simplified down to "ja".

The likelySubtag supplemental data provides default information for computing these values. This data is based on the default content data, the population data, and the suppress-script data in [BCP47]. It is heuristically derived, and may change over time. The chart shows how the data "fills in" the missing fields in the source values to get the target values.
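
The "fill in" and "simplify" operations can be sketched as below. This is a simplified illustration, not the CLDR algorithm itself: the LIKELY table is a small hypothetical excerpt, the function names are invented for the example, and the lookup order (language_script_region, language_region, language_script, language) is an assumption about how the data is consulted.

```python
# Hypothetical excerpt; real values come from the likelySubtag data.
LIKELY = {
    "zh": "zh_Hans_CN",
    "zh_TW": "zh_Hant_TW",
    "und_Thai": "th_Thai_TH",
    "und_TW": "zh_Hant_TW",
    "en": "en_Latn_US",
    "ja": "ja_Jpan_JP",
}

def parse(tag):
    """Split 'lang[_Script][_REGION]' into (language, script, region)."""
    parts = tag.split("_")
    language, script, region = parts[0], "", ""
    for part in parts[1:]:
        if len(part) == 4 and part.isalpha():
            script = part
        else:
            region = part
    return language, script, region

def join(language, script, region):
    return "_".join(p for p in (language, script, region) if p)

def add_likely_subtags(tag):
    """Fill in missing fields from the first matching table entry."""
    language, script, region = parse(tag)
    keys = (join(language, script, region), join(language, "", region),
            join(language, script, ""), language)
    for key in keys:
        if key in LIKELY:
            l2, s2, r2 = parse(LIKELY[key])
            return join(l2 if language == "und" else language,
                        script or s2, region or r2)
    return tag

def remove_likely_subtags(tag):
    """Return the shortest form that still fills in to the same full tag."""
    language, script, region = parse(tag)
    full = add_likely_subtags(tag)
    for trial in (language, join(language, "", region), join(language, script, "")):
        if add_likely_subtags(trial) == full:
            return trial
    return tag

print(add_likely_subtags("zh_TW"))          # zh_Hant_TW
print(add_likely_subtags("und_Thai"))       # th_Thai_TH
print(remove_likely_subtags("en_Latn"))     # en
print(remove_likely_subtags("ja_Jpan_JP"))  # ja
```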

  • Reporting Defects: If you find errors or omissions in this data, please report the information with a bug report.
language_plural_rules

Languages vary in how they handle plurals of nouns or unit expressions ("hours", "meters", and so on). Some languages have two forms, like English; some languages have only a single form; and some languages have multiple forms (see Slovenian below). They also vary between cardinals (such as 1, 2, or 3) and ordinals (such as 1st, 2nd, or 3rd), and in ranges of cardinals (such as "1-2", used in expressions like "1-2 meters long"). CLDR uses short, mnemonic tags for these plural categories. For more information on these categories, see Plural Rules.

  • Examples: The symbol ~ (as in "1.7~2.1") has a special meaning: it is a range of numbers that includes the end points (1.7 and 2.1), and everything between that has exactly the same number of decimals as the end points (thus also 1.8, 1.9, and 2.0, but not 2 or 1.91 or 1.90). The samples are generated mechanically, and are not comprehensive: “0, 2~19, 101~119, …” could show up as the less-complete “0, 2~16, 101 …”.
  • Rules: The plural categories are computed based on machine-readable rules, using the syntax described in Language Plural Rules. In particular, they use the special variables and relations defined in Plural Rule Operands and following.
  • Reporting Defects: If you find errors or omissions in this data, please report the information with a bug report. But first read "Reporting Defects" on Plural Rules.
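
The meaning of the ~ notation in the examples can be illustrated with a short Python sketch. The function name expand_range is hypothetical, and the sketch assumes both end points are written with the same number of decimals, as described above.

```python
from decimal import Decimal

def expand_range(sample):
    """Expand 'start~end' into every value with the same number of decimals."""
    start_text, end_text = sample.split("~")
    start, end = Decimal(start_text), Decimal(end_text)
    # The exponent of the start value gives the step: -1 for "1.7", 0 for "2".
    step = Decimal(1).scaleb(start.as_tuple().exponent)
    values = []
    value = start
    while value <= end:
        values.append(str(value))
        value += step
    return values

print(expand_range("1.7~2.1"))  # ['1.7', '1.8', '1.9', '2.0', '2.1']
print(expand_range("2~5"))      # ['2', '3', '4', '5']
```

Decimal is used rather than float so that "2.0" keeps its trailing zero and stays distinct from "2".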
error_locale_header|error_index_header

Please review and correct them. Note that errors in sublocales are often fixed by fixing the main locale.

This list is only generated daily, and so may not reflect fixes you have made until tomorrow. (There were production problems in integrating it fully into the Survey tool. However, it should let you see the problems and make sure that they get taken care of.)

The table below gives a count for each of the following kinds of items. The focus is on correcting the problems, and getting enough votes for "minimal approval" (status=contributed -- high enough to get incorporated into most implementations).

  • Disputed: The item could reach minimal approval if enough of those voting on it switched their votes.
  • Conflicted: For this many items, the organization is losing a vote because of conflicts within the organization.
  • Error: The item has a serious error and must be corrected.
  • Warning: The item has a significant problem that should be corrected.
  • Missing Coverage: These items should be translated but are missing.
  • Missing Votes: These items have translations, but not enough votes for "minimal approval".

The text to insert can be fairly arbitrary HTML. The software that reads this table will search the first column (e.g., between <td> and </td>) and return the contents of the second column.
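
As a rough illustration of why the format matters, a lookup of this kind might look like the following Python sketch. It is not the actual reader: the function name help_text, the regular expression, and the treatment of "|" as a separator between alternative keys are all assumptions made for the example.

```python
import re

# Assumes each row is exactly <tr><td>key</td><td>text</td></tr>.
ROW = re.compile(r"<tr>\s*<td>(.*?)</td>\s*<td>(.*?)</td>\s*</tr>", re.DOTALL)

def help_text(html, key):
    """Return the second-column contents for the row whose first column has key."""
    for keys, text in ROW.findall(html):
        # Assumption: "a|b" in the first column lists alternative keys.
        if key in keys.strip().split("|"):
            return text.strip()
    return None

sample = (
    "<table>"
    "<tr><td>zone_tzid</td><td><p>The Zone-Tzid table</p></td></tr>"
    "<tr><td>error_locale_header|error_index_header</td><td>Please review.</td></tr>"
    "</table>"
)
print(help_text(sample, "zone_tzid"))           # <p>The Zone-Tzid table</p>
print(help_text(sample, "error_index_header"))  # Please review.
```

A pattern match like this breaks if a row gains extra cells or a nested table, which is one reason arbitrary structural changes to the file are risky.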

WARNING