# Reference Lists

Reference lists provide static lookup data that transformers and matchers can use for operations such as value mapping and token weighting. Each list is a reusable data set identified by its id.

# Using Predefined Reference Lists

Tilores provides predefined reference lists that are loaded from external sources. To use one, set the external field to a value of the form "filename@version":

{
  "referenceLists": [
    {
      "id": "countries",
      "external": "map-country-code@latest"
    }
  ]
}

# Available Predefined Lists

| Name | Description |
| --- | --- |
| map-country-code | Maps country names to ISO 3166-1 alpha-2 codes |
| map-country-language | Maps country codes to ISO 639-3 language codes |

# Version Format

The version in "filename@version" can be:

  • latest - Always uses the most recent version
  • A specific version tag (e.g., v0-1-0)
  • A specific major tag (e.g., v0)
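
The "filename@version" format can be illustrated with a small Python sketch. The function name `parse_external` and the regular expressions for version tags are assumptions inferred from the examples above (v0-1-0 and v0); they are not part of any Tilores API, and the actual tag grammar may be broader.

```python
import re
from typing import NamedTuple


class ExternalRef(NamedTuple):
    filename: str
    version: str
    kind: str  # "latest", "specific", or "major"


# Assumed tag shapes, inferred from the examples "v0-1-0" and "v0".
_SPECIFIC = re.compile(r"v\d+-\d+-\d+")
_MAJOR = re.compile(r"v\d+")


def parse_external(value: str) -> ExternalRef:
    """Split a "filename@version" string and classify the version part."""
    filename, sep, version = value.partition("@")
    if not sep or not filename or not version:
        raise ValueError(f"expected 'filename@version', got {value!r}")
    if version == "latest":
        kind = "latest"
    elif _SPECIFIC.fullmatch(version):
        kind = "specific"
    elif _MAJOR.fullmatch(version):
        kind = "major"
    else:
        raise ValueError(f"unrecognized version {version!r}")
    return ExternalRef(filename, version, kind)
```

For example, `parse_external("map-country-code@v0-1-0")` yields the filename `map-country-code` with a version classified as `specific`.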

# Overriding External Data

You can extend or override entries from an external reference list by providing inline rows. The first column is used as the merge key:

{
  "referenceLists": [
    {
      "id": "countries",
      "external": "map-country-code@latest",
      "rows": [
        ["custom country", "xx"],
        ["united states", "usa"]
      ]
    }
  ]
}

In this example:

  • "custom country" is added as a new entry
  • "united states" overrides the existing mapping from the external list
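
The merge behavior described above can be sketched in Python as follows. This is only an illustration of "first column as merge key": the function `merge_rows` is hypothetical, the external rows are a made-up excerpt, and details such as case sensitivity and row order in the real implementation are not specified here.

```python
def merge_rows(external_rows, inline_rows):
    """Merge inline rows into external rows, keyed on the first column."""
    merged = {row[0]: list(row) for row in external_rows}
    for row in inline_rows:
        # An existing key is replaced; an unknown key is added.
        merged[row[0]] = list(row)
    return list(merged.values())


# Hypothetical excerpt of the external list; the real contents may differ.
external = [["united states", "US"], ["germany", "DE"]]
inline = [["custom country", "xx"], ["united states", "usa"]]

result = merge_rows(external, inline)
print(result)
# [['united states', 'usa'], ['germany', 'DE'], ['custom country', 'xx']]
```

The "germany" row passes through unchanged because no inline row shares its key.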

# Inline Reference Lists

You can also define reference lists entirely inline without using external sources:

{
  "referenceLists": [
    {
      "id": "status-codes",
      "meta": {
        "header": [
          {"type": "token"},
          {"type": "mapping"}
        ]
      },
      "rows": [
        ["active", "A"],
        ["inactive", "I"],
        ["pending", "P"]
      ]
    }
  ]
}

# Reference List Structure

Each reference list has:

  • id - Unique identifier used to reference the list
  • meta.header - Column definitions with types (token, mapping, etc.)
  • rows - Array of value arrays, one per entry
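
As a rough illustration of this structure, the Python sketch below checks a reference list before it is added to a configuration. The function `validate_reference_list` is hypothetical, and the rule that every row must have exactly one value per header column is an assumption; the validation Tilores actually performs is not described here.

```python
from typing import List


def validate_reference_list(ref_list: dict) -> List[str]:
    """Return a list of problems found; an empty list means none were found."""
    problems = []
    if not ref_list.get("id"):
        problems.append("missing id")
    header = ref_list.get("meta", {}).get("header", [])
    for i, row in enumerate(ref_list.get("rows", [])):
        # Assumption: one value per declared column. Skipped when no header
        # is given, as in the examples that use an external list.
        if header and len(row) != len(header):
            problems.append(
                f"row {i} has {len(row)} values, header has {len(header)} columns"
            )
    return problems


status_codes = {
    "id": "status-codes",
    "meta": {"header": [{"type": "token"}, {"type": "mapping"}]},
    "rows": [["active", "A"], ["inactive", "I"], ["pending", "P"]],
}
print(validate_reference_list(status_codes))  # []
```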

# Column Types

Reference lists support different column types depending on how the data will be used:

## token

The token column type defines the lookup key. When a transformer or matcher looks up a value in a reference list, it searches the token column to find a match. This is typically the first column in a reference list.

## mapping

The mapping column type defines the output value for value mapping operations. When using the Map Value transformer, the input is looked up in the token column and the corresponding mapping column value is returned.

Example: A country name to country code mapping:

{
  "meta": {
    "header": [
      {"type": "token"},
      {"type": "mapping"}
    ]
  },
  "rows": [
    ["united states", "US"],
    ["germany", "DE"],
    ["france", "FR"]
  ]
}
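
To make the token-to-mapping lookup concrete, here is a minimal Python sketch that indexes the example list by its token column. It is not the Map Value transformer itself: `build_mapping` is a hypothetical helper, and returning `None` for an unmatched input is an assumption, since the transformer's behavior for missing tokens is not covered here.

```python
def build_mapping(ref_list: dict) -> dict:
    """Index the mapping column by the token column."""
    types = [col["type"] for col in ref_list["meta"]["header"]]
    token_idx = types.index("token")
    mapping_idx = types.index("mapping")
    return {row[token_idx]: row[mapping_idx] for row in ref_list["rows"]}


countries = {
    "meta": {"header": [{"type": "token"}, {"type": "mapping"}]},
    "rows": [["united states", "US"], ["germany", "DE"], ["france", "FR"]],
}

lookup = build_mapping(countries)
print(lookup.get("germany"))  # DE
print(lookup.get("spain"))    # None
```

Reading the column positions from the header types, rather than hard-coding columns 0 and 1, keeps the sketch valid if the columns are declared in a different order.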

## tokenfrequencyweight

The tokenfrequencyweight column type stores numeric weights for tokens. This is used by the Weighted Token matcher to assign importance to different tokens during matching.

Lower weights indicate more common tokens (less distinctive), while higher weights indicate rarer tokens (more distinctive for matching).

Example: A token weight list for name matching:

{
  "meta": {
    "header": [
      {"type": "token"},
      {"type": "tokenfrequencyweight"}
    ]
  },
  "rows": [
    ["john", 0.7],
    ["smith", 1.0],
    ["doe", 0.5]
  ]
}
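
The effect of such weights can be illustrated with a small Python sketch. This is not the algorithm of the Weighted Token matcher, which is not described here; it swaps in a simple weighted overlap score (weight of shared tokens divided by weight of all tokens) purely to show why a shared rare token counts for more than a shared common one. The function name and the `default_weight` for tokens missing from the list are assumptions.

```python
def weighted_overlap(tokens_a, tokens_b, weights, default_weight=1.0):
    """Weight of shared tokens divided by weight of all distinct tokens."""
    set_a, set_b = set(tokens_a), set(tokens_b)

    def total(tokens):
        # Tokens absent from the weight list get the assumed default weight.
        return sum(weights.get(t, default_weight) for t in tokens)

    union_weight = total(set_a | set_b)
    if union_weight == 0:
        return 0.0
    return total(set_a & set_b) / union_weight


# Weights taken from the example rows above.
weights = {"john": 0.7, "smith": 1.0, "doe": 0.5}

shared_john = weighted_overlap(["john", "smith"], ["john", "doe"], weights)
shared_smith = weighted_overlap(["john", "smith"], ["jane", "smith"], weights)
print(shared_smith > shared_john)  # True
```

With these weights, two names that share "smith" (weight 1.0) score higher than two names that share only "john" (weight 0.7).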