The collation feature allows specifying the sort order and character
   classification behavior of data per-column, or even per-operation.
   This alleviates the restriction that the
   LC_COLLATE and LC_CTYPE settings
   of a database cannot be changed after its creation.
  
    Conceptually, every expression of a collatable data type has a
    collation.  (The built-in collatable data types are
    text, varchar, and char.
    User-defined base types can also be marked collatable, and of course
    a domain over a collatable data type is collatable.)  If the
    expression is a column reference, the collation of the expression is the
    defined collation of the column.  If the expression is a constant, the
    collation is the default collation of the data type of the
    constant.  The collation of a more complex expression is derived
    from the collations of its inputs, as described below.
   
The collation of an expression can be the “default” collation, which means the locale settings defined for the database. It is also possible for an expression's collation to be indeterminate. In such cases, ordering operations and other operations that need to know the collation will fail.
    When the database system has to perform an ordering or a character
    classification, it uses the collation of the input expression.  This
    happens, for example, with ORDER BY clauses
    and function or operator calls such as <.
    The collation to apply for an ORDER BY clause
    is simply the collation of the sort key.  The collation to apply for a
    function or operator call is derived from the arguments, as described
    below.  In addition to comparison operators, collations are taken into
    account by functions that convert between lower and upper case
    letters, such as lower, upper, and
    initcap; by pattern matching operators; and by
    to_char and related functions.
   
For a function or operator call, the collation that is derived by examining the argument collations is used at run time for performing the specified operation. If the result of the function or operator call is of a collatable data type, the collation is also used at parse time as the defined collation of the function or operator expression, in case there is a surrounding expression that requires knowledge of its collation.
    The collation derivation of an expression can be
    implicit or explicit.  This distinction affects how collations are
    combined when multiple different collations appear in an
    expression.  An explicit collation derivation occurs when a
    COLLATE clause is used; all other collation
    derivations are implicit.  When multiple collations need to be
    combined, for example in a function call, the following rules are
    used:
    
If any input expression has an explicit collation derivation, then all explicitly derived collations among the input expressions must be the same, otherwise an error is raised. If any explicitly derived collation is present, that is the result of the collation combination.
Otherwise, all input expressions must have the same implicit collation derivation or the default collation. If any non-default collation is present, that is the result of the collation combination. Otherwise, the result is the default collation.
If there are conflicting non-default implicit collations among the input expressions, then the combination is deemed to have indeterminate collation. This is not an error condition unless the particular function being invoked requires knowledge of the collation it should apply. If it does, an error will be raised at run-time.
For example, consider this table definition:
CREATE TABLE test1 (
    a text COLLATE "de_DE",
    b text COLLATE "es_ES",
    ...
);
Then in
SELECT a < 'foo' FROM test1;
    the < comparison is performed according to
    de_DE rules, because the expression combines an
    implicitly derived collation with the default collation.  But in
SELECT a < ('foo' COLLATE "fr_FR") FROM test1;
    the comparison is performed using fr_FR rules,
    because the explicit collation derivation overrides the implicit one.
    Furthermore, given
SELECT a < b FROM test1;
    the parser cannot determine which collation to apply, since the
    a and b columns have conflicting
    implicit collations.  Since the < operator
    does need to know which collation to use, this will result in an
    error.  The error can be resolved by attaching an explicit collation
    specifier to either input expression, thus:
SELECT a < b COLLATE "de_DE" FROM test1;
or equivalently
SELECT a COLLATE "de_DE" < b FROM test1;
On the other hand, the structurally similar case
SELECT a || b FROM test1;
    does not result in an error, because the || operator
    does not care about collations: its result is the same regardless
    of the collation.
   
The collation assigned to a function or operator's combined input expressions is also considered to apply to the function or operator's result, if the function or operator delivers a result of a collatable data type. So, in
SELECT * FROM test1 ORDER BY a || 'foo';
    the ordering will be done according to de_DE rules.
    But this query:
SELECT * FROM test1 ORDER BY a || b;
    results in an error, because even though the || operator
    doesn't need to know a collation, the ORDER BY clause does.
    As before, the conflict can be resolved with an explicit collation
    specifier:
SELECT * FROM test1 ORDER BY a || b COLLATE "fr_FR";
    A collation is an SQL schema object that maps an SQL name to locales
    provided by libraries installed in the operating system.  A collation
    definition has a provider that specifies which
    library supplies the locale data.  One standard provider name
    is libc, which uses the locales provided by the
    operating system C library.  These are the locales that most tools
    provided by the operating system use.  Another provider
    is icu, which uses the external
    ICU library.  ICU locales can only be
    used if support for ICU was configured when PostgreSQL was built.
   
    A collation object provided by libc maps to a
    combination of LC_COLLATE and LC_CTYPE
    settings, as accepted by the setlocale() system library call.  (As
    the name would suggest, the main purpose of a collation is to set
    LC_COLLATE, which controls the sort order.  But
    it is rarely necessary in practice to have an
    LC_CTYPE setting that is different from
    LC_COLLATE, so it is more convenient to collect
    these under one concept than to create another infrastructure for
    setting LC_CTYPE per expression.)  Also,
    a libc collation
    is tied to a character set encoding (see Section 23.3).
    The same collation name may exist for different encodings.
   
    A collation object provided by icu maps to a named
    collator provided by the ICU library.  ICU does not support
    separate “collate” and “ctype” settings, so
    they are always the same.  Also, ICU collations are independent of the
    encoding, so there is always only one ICU collation of a given name in
    a database.
   
    On all platforms, the collations named default,
    C, and POSIX are available.  Additional
    collations may be available depending on operating system support.
    The default collation selects the LC_COLLATE
    and LC_CTYPE values specified at database creation time.
    The C and POSIX collations both specify
    “traditional C” behavior, in which only the ASCII letters
    “A” through “Z”
    are treated as letters, and sorting is done strictly by character
    code byte values.
   
    Additionally, the SQL standard collation name ucs_basic
    is available for encoding UTF8.  It is equivalent
    to C and sorts by Unicode code point.
   
    If the operating system provides support for using multiple locales
    within a single program (newlocale and related functions),
    or if support for ICU is configured,
    then when a database cluster is initialized, initdb
    populates the system catalog pg_collation with
    collations based on all the locales it finds in the operating
    system at the time.
   
    To inspect the currently available locales, use the query SELECT
    * FROM pg_collation, or the command \dOS+
    in psql.
   
    For example, the operating system might
    provide a locale named de_DE.utf8.
    initdb would then create a collation named
    de_DE.utf8 for encoding UTF8
    that has both LC_COLLATE and
    LC_CTYPE set to de_DE.utf8.
    It will also create a collation with the .utf8
    tag stripped off the name.  So you could also use the collation
    under the name de_DE, which is less cumbersome
    to write and makes the name less encoding-dependent.  Note that,
    nevertheless, the initial set of collation names is
    platform-dependent.
   
    The default set of collations provided by libc map
    directly to the locales installed in the operating system, which can be
    listed using the command locale -a.  In case
    a libc collation is needed that has different values
    for LC_COLLATE and LC_CTYPE, or if new
    locales are installed in the operating system after the database system
    was initialized, then a new collation may be created using
    the CREATE COLLATION command.
    New operating system locales can also be imported en masse using
    the pg_import_system_collations() function.
   
    Within any particular database, only collations that use that
    database's encoding are of interest.  Other entries in
    pg_collation are ignored.  Thus, a stripped collation
    name such as de_DE can be considered unique
    within a given database even though it would not be unique globally.
    Use of the stripped collation names is recommended, since it will
    make one fewer thing you need to change if you decide to change to
    another database encoding.  Note however that the default,
    C, and POSIX collations can be used regardless of
    the database encoding.
   
PostgreSQL considers distinct collation objects to be incompatible even when they have identical properties. Thus for example,
SELECT a COLLATE "C" < b COLLATE "POSIX" FROM test1;
    will draw an error even though the C and POSIX
    collations have identical behaviors.  Mixing stripped and non-stripped
    collation names is therefore not recommended.
   
    With ICU, it is not sensible to enumerate all possible locale names.  ICU
    uses a particular naming system for locales, but there are many more ways
    to name a locale than there are actually distinct locales.
    initdb uses the ICU APIs to extract a set of distinct
    locales to populate the initial set of collations.  Collations provided by
    ICU are created in the SQL environment with names in BCP 47 language tag
    format, with a “private use”
    extension -x-icu appended, to distinguish them from
    libc locales.
   
Here are some example collations that might be created:
de-x-icuGerman collation, default variant
de-AT-x-icuGerman collation for Austria, default variant
        (There are also, say, de-DE-x-icu
        or de-CH-x-icu, but as of this writing, they are
        equivalent to de-x-icu.)
       
und-x-icu (for “undefined”)ICU “root” collation. Use this to get a reasonable language-agnostic sort order.
    Some (less frequently used) encodings are not supported by ICU.  When the
    database encoding is one of these, ICU collation entries
    in pg_collation are ignored.  Attempting to use one
    will draw an error along the lines of “collation "de-x-icu" for
    encoding "WIN874" does not exist”.
   
If the standard and predefined collations are not sufficient, users can create their own collation objects using the SQL command CREATE COLLATION.
    The standard and predefined collations are in the
    schema pg_catalog, like all predefined objects.
    User-defined collations should be created in user schemas.  This also
    ensures that they are saved by pg_dump.
   
New libc collations can be created like this:
CREATE COLLATION german (provider = libc, locale = 'de_DE');
     The exact values that are acceptable for the locale
     clause in this command depend on the operating system.  On Unix-like
     systems, the command locale -a will show a list.
    
     Since the predefined libc collations already include all collations
     defined in the operating system when the database instance is
     initialized, it is not often necessary to manually create new ones.
     Reasons might be if a different naming system is desired (in which case
     see also Section 23.2.2.3.3) or if the operating system has
     been upgraded to provide new locale definitions (in which case see
     also pg_import_system_collations()).
    
    ICU allows collations to be customized beyond the basic language+country
    set that is preloaded by initdb.  Users are encouraged
    to define their own collation objects that make use of these facilities to
    suit the sorting behavior to their requirements.
    See https://unicode-org.github.io/icu/userguide/locale/
    and https://unicode-org.github.io/icu/userguide/collation/api.html for
    information on ICU locale naming.  The set of acceptable names and
    attributes depends on the particular ICU version.
   
Here are some examples:
CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de-u-co-phonebk');CREATE COLLATION "de-u-co-phonebk-x-icu" (provider = icu, locale = 'de@collation=phonebook');German collation with phone book collation type
The first example selects the ICU locale using a “language tag” per BCP 47. The second example uses the traditional ICU-specific locale syntax. The first style is preferred going forward, but it is not supported by older ICU versions.
Note that you can name the collation objects in the SQL environment anything you want. In this example, we follow the naming style that the predefined collations use, which in turn also follow BCP 47, but that is not required for user-defined collations.
CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = 'und-u-co-emoji');CREATE COLLATION "und-u-co-emoji-x-icu" (provider = icu, locale = '@collation=emoji');Root collation with Emoji collation type, per Unicode Technical Standard #51
Observe how in the traditional ICU locale naming system, the root locale is selected by an empty string.
CREATE COLLATION latinlast (provider = icu, locale = 'en-u-kr-grek-latn');CREATE COLLATION latinlast (provider = icu, locale = 'en@colReorder=grek-latn');Sort Greek letters before Latin ones. (The default is Latin before Greek.)
CREATE COLLATION upperfirst (provider = icu, locale = 'en-u-kf-upper');CREATE COLLATION upperfirst (provider = icu, locale = 'en@colCaseFirst=upper');Sort upper-case letters before lower-case letters. (The default is lower-case letters first.)
CREATE COLLATION special (provider = icu, locale = 'en-u-kf-upper-kr-grek-latn');CREATE COLLATION special (provider = icu, locale = 'en@colCaseFirst=upper;colReorder=grek-latn');Combines both of the above options.
CREATE COLLATION numeric (provider = icu, locale = 'en-u-kn-true');CREATE COLLATION numeric (provider = icu, locale = 'en@colNumeric=yes');
        Numeric ordering, sorts sequences of digits by their numeric value,
        for example: A-21 < A-123
        (also known as natural sort).
       
    See Unicode
    Technical Standard #35
    and BCP 47 for
    details.  The list of possible collation types (co
    subtag) can be found in
    the CLDR
    repository.
   
    Note that while this system allows creating collations that “ignore
    case” or “ignore accents” or similar (using the
    ks key), in order for such collations to act in a
    truly case- or accent-insensitive manner, they also need to be declared as not
    deterministic in CREATE COLLATION;
    see Section 23.2.2.4.
    Otherwise, any strings that compare equal according to the collation but
    are not byte-wise equal will be sorted according to their byte values.
   
By design, ICU will accept almost any string as a locale name and match it to the closest locale it can provide, using the fallback procedure described in its documentation. Thus, there will be no direct feedback if a collation specification is composed using features that the given ICU installation does not actually support. It is therefore recommended to create application-level test cases to check that the collation definitions satisfy one's requirements.
The command CREATE COLLATION can also be used to create a new collation from an existing collation, which can be useful to be able to use operating-system-independent collation names in applications, create compatibility names, or use an ICU-provided collation under a more readable name. For example:
CREATE COLLATION german FROM "de_DE"; CREATE COLLATION french FROM "fr-x-icu";
A collation is either deterministic or nondeterministic. A deterministic collation uses deterministic comparisons, which means that it considers strings to be equal only if they consist of the same byte sequence. Nondeterministic comparison may determine strings to be equal even if they consist of different bytes. Typical situations include case-insensitive comparison, accent-insensitive comparison, as well as comparison of strings in different Unicode normal forms. It is up to the collation provider to actually implement such insensitive comparisons; the deterministic flag only determines whether ties are to be broken using bytewise comparison. See also Unicode Technical Standard 10 for more information on the terminology.
     To create a nondeterministic collation, specify the property
     deterministic = false to CREATE
     COLLATION, for example:
CREATE COLLATION ndcoll (provider = icu, locale = 'und', deterministic = false);
This example would use the standard Unicode collation in a nondeterministic way. In particular, this would allow strings in different normal forms to be compared correctly. More interesting examples make use of the ICU customization facilities explained above. For example:
CREATE COLLATION case_insensitive (provider = icu, locale = 'und-u-ks-level2', deterministic = false); CREATE COLLATION ignore_accents (provider = icu, locale = 'und-u-ks-level1-kc-true', deterministic = false);
All standard and predefined collations are deterministic, all user-defined collations are deterministic by default. While nondeterministic collations give a more “correct” behavior, especially when considering the full power of Unicode and its many special cases, they also have some drawbacks. Foremost, their use leads to a performance penalty. Also, certain operations are not possible with nondeterministic collations, such as pattern matching operations. Therefore, they should be used only in cases where they are specifically wanted.