# Consistency

Often you want to improve your matchers via IDs that you have available in your system. For a person use case these might be things like social security number, passport number, driver's license number or customer id.

While for some of those, e.g. the customer id, it is fine (often even expected) that a single entity contains multiple of these IDs, others indicate an obvious wrong match (e.g. social security number).

Tilores allows you to configure rules for matching on ID values and to guarantee uniqueness within a single entity. The following example walks you through the steps that are required to setup such a rule.

# Step 1: Understand Your Data

Before you can setup rules you should understand your data:

  • Which fields are available?
  • For which fields do you need entity consistency?
  • What does your schema look like?

For ID fields we typically see one of the following four possible scenarios when it comes to the schema:

  • a single ID value, stored in a single field, no specific ID type
  • separate fields for different ID values, field name defines the ID type
  • a single field with a map where the ID type is the maps key and the value is the ID
  • a single field with an array of nested objects and specific attributes indicating the ID and its type such as (value and type)

For the purpose of this example, we assume that the data is following the last case previously described.

The data for a single record might look something like this (simplified):

{
  "id": "record-id",
  "name": "John Smith",
  "commonIDs": [
    {"type": "SSN", "value": "1234567890"},
    {"type": "DriveLicense", "value": "ABCDEFG"}
  ]
}

The data schema for the RecordInput would therefore look like this:

input RecordInput {
  id: ID!
  name: String!
  commonIDs: [CommonIDInput!]!
}

input CommonIDInput {
  type: String!
  value: String!
}

# Step 2: Extract the Values

Based on the previously defined schema, we want to match on the name and ensure consistency among the IDs.

For this we need to retrieve the name as well as the ID fields in the "Extract / Transform" page as follows:

name and id transformations
name and id transformations

The output for the IDs must either be a map or an array of arrays. Either of the two following formats is fine:

Map Output
Output
{
  "ssn": "1234567890",
  "type": "abcdefg"
}
[
  ["ssn", "1234567980"],
  ["type", "abcdefg"]
]

Since we're already working with array based data, we will transform into the array version using a simple jq-like pattern ([.commonIDs[] | [.type, .value]]) right in the extract step:

jq-like pattern for transforming the ids
jq-like pattern for transforming the ids

# Step 3: Setup Matching Rule

To ensure matching for equal names, we can setup a simple text matcher using the name output that we just created.

matcher configuration for name field
matcher configuration for name field

Afterwards create the matching rule and ensure that it has been added to the rule set for linking and searching.

rule configuration with name matcher
rule configuration with name matcher

ruleset configuration
ruleset configuration

# Step 4: Setup Consistency Rule

With this rule in place, every record with the name "John Smith" would now be added to the same entity ID. To ensure, that there is an ID consistency within a single entity, we need to setup the according rule.

We start by creating a matcher that allows us to match IDs.

configuration for the ID matcher
configuration for the ID matcher

Please ensure, that you selected the "Match on All Shared Types" strategy and that you disabled the "Must Exist" option. Disabling this option allows two records without an ID to match and is a requirement for the next step.

If one record does have an ID while another record does not have any IDs at all, we still want to consider this as a consistent match. To allow this we need to additionally configure an "Empty Aware" matcher. Notice how it wraps the matcher that we just created.

configuration for the empty aware matcher
configuration for the empty aware matcher

The next step is to setup the consistency rule:

consistency rule configuration
consistency rule configuration

and to enable the consistency rule set with that newly created rule:

consistency rule set configuration
consistency rule set configuration

# Performance Considerations

With the steps performed, you would properly separate entities with diverging IDs. However, this comes at some costs and should be used sparsely to guarantee consistency. Internally a new record will find all those entities as potential matches. Then Tilores will compare the new record against all existing records of the potential target entities. While this is optimized internally, this still might lead to a lot of comparisons needed in some situations (e.g. many distinct entities or entities with huge number of records).

# Consistency vs Matching Rules

If you're now wondering why we are not simply using matching rules instead of consistency rules, please have a look at the following three records:

A
B
C
{
  "id": "a",
  "name": "John Smith",
  "commonIDs": [
    {"type": "SSN", "value": "1234567890"}
  ]
}
{
  "id": "b",
  "name": "John Smith",
  "commonIDs": []
}
{
  "id": "c",
  "name": "John Smith",
  "commonIDs": [
    {"type": "SSN", "value": "0987654321"}
  ]
}

Matching rules in Tilores are always comparing record pairs and if a match exists between any two records, those records will end up in the same entity.

In the above example a will match with b and b will match with c. Hence, all three records will end up in the same entity.

Consistency rules work by checking the new record against all records of the potential target entity. If a single record will not match with the new record then the whole entity is not considered a potential match.