# API Reference

The Tilores API is highly customizable. This reference therefore focuses only on the basic, default API.

# GraphQL

# mutation: disassemble

The disassemble mutation helps you to remove records and/or edges.

Example request:

mutation {
  disassemble(input: {
    edges: [
      {
        a: "<uuid-record-a>"
        b: "<uuid-record-b>"
      }
    ]
    recordIDs: [
      "<uuid-record-c>"
      "<uuid-record-d>"
    ]
    createConnectionBan: false
    meta: {
      user: "<user-name>"
      reason: "some reason"
    }
  }) {
    triggered
  }
}

When providing edges, all edges between the two records will be removed, regardless of their type. The records themselves will not be removed.

When providing recordIDs, the records as well as their edges will be removed.

You must provide at least one record ID or one edge. You can provide edges and record IDs at the same time. If an edge or a record is the only link connecting two (or more) parts of an entity, the entity will be split into multiple entities.

All record IDs mentioned in edges and recordIDs must belong to the same entity.
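The constraints above can be checked on the client before the mutation is sent. The following is a minimal sketch, assuming the record IDs of the entity are already known (for example from an entity query). `build_disassemble_input` is a hypothetical helper and not part of the Tilores API.

```python
def build_disassemble_input(entity_record_ids, edges=(), record_ids=(),
                            create_connection_ban=False):
    """Build the `input` object for disassemble after basic validation.

    entity_record_ids: IDs of all records of ONE entity (assumed to be known).
    edges: iterable of (a, b) record ID pairs.
    record_ids: iterable of record IDs to remove.
    """
    edges = [tuple(edge) for edge in edges]
    record_ids = list(record_ids)
    if not edges and not record_ids:
        raise ValueError("provide at least one edge or one record ID")

    # Every record mentioned in edges and recordIDs must belong to the entity.
    mentioned = set(record_ids)
    for a, b in edges:
        mentioned.update((a, b))
    unknown = mentioned - set(entity_record_ids)
    if unknown:
        raise ValueError(f"records not in the entity: {sorted(unknown)}")

    return {
        "edges": [{"a": a, "b": b} for a, b in edges],
        "recordIDs": record_ids,
        "createConnectionBan": create_connection_ban,
    }
```

The returned dictionary mirrors the field names of the example request, so it can be serialized into the mutation by whichever GraphQL client you use.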

When an entity is split into multiple entities and createConnectionBan is true (the default is false), a connection ban prevents the newly created entities from ever merging again. Any record that would merge these entities will be dropped. Connection bans are inherited: when entity A, which has a connection ban towards entity B, is merged into entity C, then C will afterwards also have a connection ban towards B. The same is true when A is split.
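The inheritance rule can be illustrated with a small model. This is only a sketch of the behaviour described above, not the Tilores implementation: bans are modelled as unordered pairs of entity IDs, the function names are hypothetical, and it assumes that every part of a split entity inherits the ban.

```python
def inherit_on_merge(bans, merged, into):
    """Return the bans after entity `merged` has been merged into `into`.

    bans: set of frozenset({entity_a, entity_b}) pairs.
    """
    result = set()
    for pair in bans:
        renamed = frozenset(into if entity == merged else entity for entity in pair)
        if len(renamed) == 2:  # drop pairs that collapsed onto a single entity
            result.add(renamed)
    return result


def inherit_on_split(bans, original, parts):
    """Return the bans after entity `original` has been split into `parts`."""
    result = set()
    for pair in bans:
        if original in pair:
            (other,) = pair - {original}
            # Assumption: each resulting part keeps the ban towards `other`.
            result.update(frozenset({part, other}) for part in parts)
        else:
            result.add(pair)
    return result
```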

The user and the reason from the optional meta block currently only show up in the logs and should explain why and when a record or an edge was removed.

Example response:

{
  "data": {
    "disassemble": {
      "triggered": true
    }
  }
}

deletedEdges and deletedRecords represent the number of removed edges and records. deletedEdges may be higher than the number of edges explicitly removed, for example because removing a record also removes its edges.

entityIDs lists the IDs of all resulting entities. If the disassemble caused the entity to be deleted, this list will be empty. If the disassemble caused the entity to be updated, it will contain exactly one entry. Otherwise, it will contain at least two entries.
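A client can derive the outcome of a disassemble from the length of entityIDs alone. A minimal sketch of that mapping; the function name and the returned labels are hypothetical.

```python
def classify_disassemble_result(entity_ids):
    """Map the entityIDs list of a disassemble response to an outcome label."""
    if not entity_ids:
        return "deleted"   # no entity is left
    if len(entity_ids) == 1:
        return "updated"   # the entity still exists as a single entity
    return "split"         # two or more entities resulted
```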

If an error occurs during the disassemble, Tilores first retries it internally. If this fails again, it schedules the removal for a later retry and returns an error message to the API user. Usually, the API user can ignore that specific error because the disassemble will eventually be performed.

# mutation: removeConnectionBan

The removeConnectionBan mutation removes an existing connection ban between two or more entities.

Example request:

mutation {
  removeConnectionBan(input: {
    reference: "<uuid>"
    entityID: "<entity-id-a>"
    others: [
      "<entity-id-b>"
      "<entity-id-c>"
    ]
    meta: {
      user: "<user-name>"
      reason: "some reason"
    }
  }) {
    removed
  }
}

The reference must be a unique ID that can be reused if the same request needs to be processed again later due to a previous error.

The connection ban will always be removed between entityID and each entity ID from others. It will not affect possible connection bans among the entities listed in others.
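The following sketch shows how a client might build the input once and keep it for retries, and which entity pairs a request affects. Both helper names are hypothetical; the field names follow the example request.

```python
import uuid


def build_remove_connection_ban_input(entity_id, others, reference=None):
    """Build the `input` object; a fresh reference is generated only if none is given."""
    return {
        "reference": reference or str(uuid.uuid4()),
        "entityID": entity_id,
        "others": list(others),
    }


def affected_pairs(request_input):
    """Pairs whose ban is removed: entityID with each entry of others.

    Bans among the entries of others themselves are not touched.
    """
    return [(request_input["entityID"], other) for other in request_input["others"]]
```

Storing the returned dictionary and resending it unchanged after an error keeps the reference identical across attempts.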

The user and the reason from the meta block currently only show up in the logs and should explain why and when a connection ban was removed.

Example response:

{
  "data": {
    "removeConnectionBan": {
      "removed": true
    }
  }
}

Unless there was an error, the removed field will always return true.

If an error occurs during the removal, Tilores first retries it internally. If this fails again, it schedules the removal for a later retry and returns an error message to the API user. Usually, the API user can ignore that specific error because the connection ban removal will eventually be performed.

# mutation: submit

The submit mutation adds the provided records into Tilores.

Example request:

mutation {
  submit(input: {
    records: [
      {
        id: "my-id",
        myCustomField: "some-value"
      }
    ]
  }) {
    recordsAdded
  }
}

The structure for each value in records is dependent on your customized schema for the RecordInput.

Multiple records submitted in a single mutation will be automatically connected with each other using a STATIC edge. This is independent from the matching rules.

Example response:

{
  "data": {
    "submit": {
      "recordsAdded": 1
    }
  }
}

recordsAdded returns the number of records that have been successfully submitted into Tilores.

# mutation: submitWithPreview

The submitWithPreview mutation is similar to submit, with the following differences:

- As a response, it provides a preview of what the entities would look like if the provided records were ingested.
- It provides the option dryRun. If set to true, only the preview is returned, without actually ingesting the provided records into Tilores.

Example request:

mutation {
  submitWithPreview(input: {
    records: [
      {
        id: "my-id",
        myCustomField: "some-value"
      }
    ]
  }) {
    entities {
      id
      records {
        id
        myCustomField
      }
      edges
      duplicates
      hits
    }
  }
}

Example response:

{
  "data": {
    "submitWithPreview": {
      "entities": [
        {
          "id": "<uuid-entity>",
          "records": [
            {
              "id": "<some-already-existing-record-id>",
              "myCustomField": "some value"
            },
            {
              "id": "<newly-provided-record-id>",
              "myCustomField": "some other value"
            }
          ],
          "edges": [
            "<some-already-existing-record-id>:<newly-provided-record-id>:R1EXACT"
          ],
          "duplicates": {},
          "hits": {}
        }
      ]
    }
  }
}

If the submitted records do not end up in an existing entity, the returned preview will contain an entity whose ID differs from the ID of the final entity into which the records will be ingested.

If record updates were explicitly disabled for the instance, the preview returns the entities in which the records with the provided record IDs currently reside.

# query: entity

The entity query will search for the provided entity ID and return it.

Example request:

{
  entity(input: {
    id: "<uuid-entity>"
  }){
    entity {
      id
      records {
        id
        myCustomField
      }
      edges
      duplicates
      hits
      score
    }
  }
}

The id is the entity ID to search for.

Example response:

{
  "data": {
    "entity": {
      "entity": {
        "id": "<uuid-entity>",
        "records": [
          {
            "id": "<record-id-a>",
            "myCustomField": "some value"
          },
          {
            "id": "<record-id-b>",
            "myCustomField": "some other value"
          },
          {
            "id": "<record-id-c>",
            "myCustomField": "some value"
          }
        ],
        "edges": [
          "<record-id-a>:<record-id-b>:STATIC",
          "<record-id-a>:<record-id-b>:R1EXACT"
        ],
        "duplicates": {
          "<record-id-a>": [
            "<record-id-c>"
          ]
        },
        "hits": {},
        "score": "0.9814342"
      }
    }
  }
}

If an entity with the provided ID exists, then this entity will be returned in the entity field. Otherwise this field will be null.

Each entity has the following fields:

id is the entity ID.

records is a list of all records of that entity. The queryable fields depend on the Record from your custom schema.

edges lists all edges in that entity, which represents how the records are connected with each other.

duplicates lists all duplicates of that entity. When not using rule groups, the key of that map is the record ID of the original and the entries are the record IDs of the duplicates. When using rule groups, the key is additionally prefixed with the ID of the rule group and a colon (<rule-group-id>:<record-id-a>), while the values stay unchanged. In that case, the record ID of one duplicate can be present in the values of multiple keys.

The hits will always be empty for an entity search.

recordInsights provides filtering, statistics and aggregation on the entity records. Refer to record insights for more details.

edgeInsights provides statistics and aggregation on the entity's edges and duplicates. Refer to edge insights for more details.

score reflects the overall quality of matches within the entity. It is represented by a float value in the range (0.0, 1.0] (higher value means better matching quality).
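On the client side, the edges and duplicates fields usually need to be split into their parts. A minimal sketch, assuming that record IDs, rule IDs and rule group IDs contain no colon (the reference does not state this); both function names are hypothetical.

```python
def parse_edge(edge):
    """Split '<record-a>:<record-b>:<rule>' into its three parts.

    Assumes none of the parts contains a colon; otherwise a ValueError is raised.
    """
    a, b, rule = edge.split(":")
    return a, b, rule


def parse_duplicate_key(key, rule_groups=False):
    """Return (rule_group_id, record_id) for a key of the duplicates map.

    Without rule groups the key is the plain record ID and rule_group_id is None.
    """
    if not rule_groups:
        return None, key
    rule_group_id, record_id = key.split(":", 1)
    return rule_group_id, record_id
```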

# query: entity by record

The entityByRecord query will search for the provided record ID and return the entity it belongs to.

Example request:

{
  entityByRecord(input: {
    id: "<record-id-b>"
  }){
    entity {
      id
      records {
        id
        myCustomField
      }
      edges
      duplicates
      hits
    }
  }
}

The id is the record ID to search for.

Example response:

{
  "data": {
    "entityByRecord": {
      "entity": {
        "id": "<uuid-entity>",
        "records": [
          {
            "id": "<record-id-a>",
            "myCustomField": "some value"
          },
          {
            "id": "<record-id-b>",
            "myCustomField": "some other value"
          },
          {
            "id": "<record-id-c>",
            "myCustomField": "some value"
          }
        ],
        "edges": [
          "<record-id-a>:<record-id-b>:STATIC",
          "<record-id-a>:<record-id-b>:R1EXACT"
        ],
        "duplicates": {
          "<record-id-a>": [
            "<record-id-c>"
          ]
        },
        "hits": {}
      }
    }
  }
}

If a record with the provided ID exists, then its entity will be returned in the entity field of entityByRecord. Otherwise this field will be null.

The fields available for entityByRecord mirror those of entity when querying by entity ID.

# query: search

The search query will search for the provided values using the search rules.

Example request:

{
  search(input: {
    parameters: {
      myCustomField: "some-value"
    }
  }) {
    entities {
      id
      records {
        id
        myCustomField
      }
      edges
      duplicates
      hits
      score
      hitScore
    }
  }
}

The parameters define your custom search parameters as they are defined using the SearchParams type.

Optionally, you can provide the following parameters:

considerRecords activates the What-IF Machine.

page and pageSize limit the number of entities returned per request. By default, all relevant entities will be returned.

searchRules defines which rule set to use during search. Defaults to default. Valid options can be found in the UI or in the rule config under the searchRuleSetIDs section.

Example response:

{
  "data": {
    "search": {
      "entities": [
        {
          "id": "<uuid-entity>",
          "records": [
            {
              "id": "<record-id-a>",
              "myCustomField": "some value"
            },
            {
              "id": "<record-id-b>",
              "myCustomField": "some other value"
            },
            {
              "id": "<record-id-c>",
              "myCustomField": "some value"
            }
          ],
          "edges": [
            "<record-id-a>:<record-id-b>:STATIC",
            "<record-id-a>:<record-id-b>:R1EXACT"
          ],
          "duplicates": {
            "<record-id-a>": [
              "<record-id-c>"
            ]
          },
          "hits": {
            "<record-id-a>": [
              "R1EXACT"
            ]
          },
          "score": "0.9814342",
          "hitScore": "1.0"
        }
      ]
    }
  }
}

entities is either an empty list or a list of all found entities. Their structure equals the one described in query: entity, with two exceptions: hits maps each record ID that matched the search to the IDs of the rules with which it was found, and hitScore is available.

The hitScore indicates how closely the match aligns with the provided search parameter. It is represented by a float value in the range (0.0, 1.0] (higher value means better matching quality).
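A client that receives several entities may want to order them by hitScore and look at the records that were hit. A minimal sketch working on the parsed JSON of the entities list; the function names are hypothetical, and the score is converted with float() because the example response carries it as a string.

```python
def rank_by_hit_score(entities):
    """Return the entities ordered from the best to the worst hitScore."""
    return sorted(entities, key=lambda entity: float(entity["hitScore"]), reverse=True)


def hit_record_ids(entity):
    """Return the IDs of the records that matched the search parameters."""
    return sorted(entity["hits"])  # the keys of the hits map are record IDs
```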

# Record Insights

Provides filtering, statistics and aggregation on the entity records. This is available on the entity type.

# Available Functions

filter(conditions: [FilterCondition!]!): RecordInsights!: Returns a new RecordInsights object that only contains the records for which the FilterCondition applies.
- field [Required]: the field upon which to check the criteria.
- equals: ensures that the field's value is equal to the provided value.
- isNull: ensures that the field must have a null value.
- startsWith: ensures that the field's value starts with the provided text. Using startsWith on non-string fields will convert them into strings first. This may lead to unexpected, but correct results.
- endsWith: ensures that the field's value ends with the provided text. Using endsWith on non-string fields will convert them into strings first. This may lead to unexpected, but correct results.
- likeRegex: ensures that the field's value matches the provided regular expression. Using likeRegex on non-string fields will convert them into strings first. This may lead to unexpected, but correct results.
- lessThan: ensures that the field's value is less than the provided value. Using lessThan on non-numeric fields will raise an error.
- lessEquals: ensures that the field's value is less than or equal to the provided value. Using lessEquals on non-numeric fields will raise an error.
- greaterThan: ensures that the field's value is greater than the provided value. Using greaterThan on non-numeric fields will raise an error.
- greaterEquals: ensures that the field's value is greater than or equal to the provided value. Using greaterEquals on non-numeric fields will raise an error.
- after: ensures that the field's value is after the provided value. Using after on non-time fields will raise an error.
- since: ensures that the field's value is after or at the provided value. Using since on non-time fields will raise an error.
- before: ensures that the field's value is before the provided value. Using before on non-time fields will raise an error.
- until: ensures that the field's value is before or at the provided value. Using until on non-time fields will raise an error.
- invert: negates the results of the checks.
sort(criteria: [SortCriteria!]!): RecordInsights!: Returns a new RecordInsights object that contains the records ordered by the provided SortCriteria.
- field [Required]: the field to sort by.
- direction: defines whether to sort ascending or descending. Allowed values are ASC and DESC.
group(fields: [String!]!, caseSensitive: Boolean): [RecordInsights!]!: Returns a list of RecordInsights objects where the records have been grouped by the provided fields.
limit(count: Int!, offset: Int): RecordInsights!: Returns a new RecordInsights object that contains up to 'count' records.
count: Int!: Returns the number of records in the currently selected list.
countDistinct(fields: [String!]!, caseSensitive: Boolean): Int!: Returns the number of unique non-null values for the provided field(s).
first: Record: Returns the first record in the list or null for empty lists.
last: Record: Returns the last record in the list or null for empty lists.
values(field: String!): [Any]!: Returns all non-null values of the current records for the provided field.
valuesDistinct(field: String!, caseSensitive: Boolean): [Any]!: Returns all unique non-null values of the current records for the provided field.
frequencyDistribution(field: String!, top: Int, direction: SortDirection): [FrequencyDistributionEntry!]!: Returns how often a non-null value for the provided field is present.
- value: holds the value for which the percentage and frequency applies.
- frequency: is the number of records that have the value.
- percentage: is the percentage of records that have the value. For calculating the percentage only non-null values are considered.
confidence(field: String!, caseSensitive: Boolean): Float: Describes the probability of having the one truly correct value for the provided field. The resulting value is a float ranging from 0 to 1, representing a percentage. Null values are ignored in the calculation. Returns null if all values are null.
average(field: String!): Float: Returns the average value of the provided numeric field.
max(field: String!): Float: Returns the highest value of the provided numeric field.
median(field: String!): Float: Returns the median value of the provided numeric field.
min(field: String!): Float: Returns the lowest value of the provided numeric field.
sum(field: String!): Float: Returns the sum of the provided numeric field.
standardDeviation(field: String!): Float: Calculates the standard deviation for the provided numeric field.
newest(field: String!): Record: Returns the record for where the provided time field has the highest (most recent) value.
oldest(field: String!): Record: Returns the record for where the provided time field has the lowest (least recent) value.
flatten(field: String!): [Any]!: Merges the values of the provided array field into a single array.
flattenDistinct(field: String!, caseSensitive: Boolean): [Any]!: Merges the values of the provided array field into a single array where each value is unique.

# Examples

# Numerical Functions

query {
  entity(input: {id: "123"}) {
      records {
          price
      }
      recordInsights {
          average(field: "price")
          max(field: "price")
          median(field: "price")
          min(field: "price")
          standardDeviation(field: "price")
          sum(field: "price")
      }
  }
}

{
  "data": {
    "entity": {
      "records": [
        {"price": 10},
        {"price": 20},
        {"price": 30},
        {"price": 15},
        {"price": 25},
        {"price": null}
      ],
      "recordInsights": {
        "average": 20,
        "max": 30,
        "median": 20,
        "min": 10,
        "standardDeviation": 7.0710678118655,
        "sum": 100
      }
    }
  }
}

This query retrieves the price field for all records of the entity, as well as some statistics on that field. Specifically, it calculates the average, maximum, median, minimum, and sum of the prices, as well as their standard deviation.

The price field is assumed to be numeric, as all of these functions require a numeric field. If the price field values were not numeric, an error would be raised.

Null values are ignored in these calculations.
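The numbers in the example response can be reproduced locally with Python's statistics module. This sketch is only a cross-check of the example values, not the Tilores implementation. It shows that the example's standardDeviation is consistent with the population standard deviation (pstdev) computed over the non-null prices.

```python
import statistics

prices = [10, 20, 30, 15, 25, None]
values = [price for price in prices if price is not None]  # null values are ignored

summary = {
    "average": statistics.mean(values),
    "max": max(values),
    "median": statistics.median(values),
    "min": min(values),
    # Population standard deviation: divides by N, not by N - 1.
    "standardDeviation": statistics.pstdev(values),
    "sum": sum(values),
}
print(summary)
```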

# Filter and Sort

query {
  entity(input: {id: "123"}) {
      recordInsights {
          filter(conditions: [
              { field: "status", equals: "active" },
              { field: "created_at", after: "2022-01-01T00:00:00Z" }
          ]) {
              sort(criteria: [
                  { field: "created_at", direction: DESC },
                  { field: "priority" }
              ]) {
                  limit(count: 10) {
                      records {
                          id
                          name
                          status
                          priority
                          created_at
                      }
                  }
              }
          }
      }
  }
}

This query retrieves the RecordInsights for the entity with ID 123. It applies a filter to only include records where the status field equals "active" and the created_at field is after January 1st, 2022. It then sorts the resulting records by created_at in descending order and by priority in ascending order. It limits the result to 10 records, starting from the first one. Finally, it selects the id, name, status, priority, and created_at fields for each of the selected records.
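For comparison, the same selection can be expressed over a plain list of record dictionaries. This is a local sketch of the described semantics only. It assumes that all fields used are non-null and that the conditions are combined with a logical AND, and `filter_sort_limit` is a hypothetical name.

```python
from datetime import datetime


def parse_time(value):
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def filter_sort_limit(records, count=10):
    threshold = parse_time("2022-01-01T00:00:00Z")
    selected = [
        record for record in records
        if record["status"] == "active"
        and parse_time(record["created_at"]) > threshold  # "after" is strict
    ]
    # Python's sort is stable: sort by the secondary key first, then by the primary.
    selected.sort(key=lambda record: record["priority"])
    selected.sort(key=lambda record: parse_time(record["created_at"]), reverse=True)
    return selected[:count]
```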

# Group

query {
  entity(input: {id: "123"}) {
    recordInsights {
      group(fields: ["category"]) {
        records {
          id
          name
          category
        }
        count
        names: valuesDistinct(field: "name")
        categories: valuesDistinct(field: "category")
      }
    }
  }
}

Assuming the following records for the entity with ID "123":

[
  {
    "id": "1",
    "name": "Product A",
    "category": "Electronics"
  },
  {
    "id": "2",
    "name": "Product B",
    "category": "Electronics"
  },
  {
    "id": "3",
    "name": "Product C",
    "category": "Clothing"
  }
]

The response would be:

{
  "data": {
    "entity": {
      "recordInsights": {
        "group": [
          {
            "records": [
              {
                "id": "1",
                "name": "Product A",
                "category": "Electronics"
              },
              {
                "id": "2",
                "name": "Product B",
                "category": "Electronics"
              }
            ],
            "count": 2,
            "names": ["Product A","Product B"],
            "categories": ["Electronics"]
          },
          {
            "records": [
              {
                "id": "3",
                "name": "Product C",
                "category": "Clothing"
              }
            ],
            "count": 1,
            "names": ["Product C"],
            "categories": ["Clothing"]
          }
        ]
      }
    }
  }
}

This query groups the records by their category field, returning two groups: one for the "Electronics" category with a count of 2, and one for the "Clothing" category with a count of 1. The records field of each group contains the records in that group, and the count field is the total number of records in that group.
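The grouping can be mimicked locally to see which values each group contributes. A minimal sketch over plain dictionaries; `group_by` and `values_distinct` are hypothetical stand-ins for the group and valuesDistinct functions, and the order of the groups returned by the API may differ.

```python
def group_by(records, field):
    """Group records by the value of `field`, keeping first-seen order."""
    groups = {}
    for record in records:
        groups.setdefault(record[field], []).append(record)
    return list(groups.values())


def values_distinct(records, field):
    """Unique non-null values of `field` within the given records."""
    values = (record[field] for record in records if record[field] is not None)
    return list(dict.fromkeys(values))


records = [
    {"id": "1", "name": "Product A", "category": "Electronics"},
    {"id": "2", "name": "Product B", "category": "Electronics"},
    {"id": "3", "name": "Product C", "category": "Clothing"},
]
for group in group_by(records, "category"):
    print(len(group), values_distinct(group, "name"), values_distinct(group, "category"))
```

Because every group is evaluated on its own records, the distinct names and categories are computed per group, not across the whole entity.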

# Edge Insights

Provides statistics and aggregation on the entity's edges and duplicates. This is available on the entity type.

# Available Functions

count: Int!: Returns the number of edges in the provided list.
frequencyDistribution(top: Int, direction: SortDirection): [FrequencyDistributionEntry!]!: Returns how often a rule is present.
matrix(links: [String!]): [EdgeMatrixEntry!]!: Returns a matrix in which it is possible to see the links between each two records and due to which rule or duplicate they are linked.

# Examples

# Edge Matrix

query {
  entity(input: {id: "123"}) {
    edges
    duplicates
    edgeInsights {
      matrix(links: ["R1", "R2", "R3", "duplicate"]) {
        a
        b
        links
      }
    }
  }
}

The response would be:

{
  "data": {
    "entity": {
      "edges": [
        "1:2:R1",
        "1:2:R2",
        "1:2:R4",
        "1:3:R1"
      ],
      "duplicates": {
        "1": ["4"]
      },
      "edgeInsights": {
        "matrix": [
          {
            "a": "1",
            "b": "2",
            "links": {
              "R1": true,
              "R2": true,
              "R3": false,
              "duplicate": false
            }
          },
          {
            "a": "1",
            "b": "3",
            "links": {
              "R1": true,
              "R2": false,
              "R3": false,
              "duplicate": false
            }
          },
          {
            "a": "1",
            "b": "4",
            "links": {
              "R1": false,
              "R2": false,
              "R3": false,
              "duplicate": true
            }
          }
        ]
      }
    }
  }
}

As you can see, the output of the edge matrix contains the same information as edges and duplicates. However, depending on the use case, it might be easier to work with on the client side. Also note that the links parameter is optional. When it is omitted, the result does not contain the values for R3, as they are all false, but instead contains the values for R4, which were filtered out in the example.
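To make the relation between edges, duplicates and the matrix concrete, the matrix can be rebuilt on the client. This is a sketch under assumptions, not the server implementation: IDs contain no colon, each record pair appears in a consistent order, and `edge_matrix` is a hypothetical name.

```python
def edge_matrix(edges, duplicates, links=None):
    """Build matrix rows from edge strings and the duplicates map."""
    pairs = {}  # (a, b) -> set of link names present for that pair
    for edge in edges:
        a, b, rule = edge.split(":")
        pairs.setdefault((a, b), set()).add(rule)
    for original, duplicate_ids in duplicates.items():
        for duplicate_id in duplicate_ids:
            pairs.setdefault((original, duplicate_id), set()).add("duplicate")

    if links is None:
        # Without a filter, report every link type that occurs at least once.
        links = sorted(set().union(*pairs.values()))

    return [
        {"a": a, "b": b, "links": {name: name in present for name in links}}
        for (a, b), present in pairs.items()
    ]


edges = ["1:2:R1", "1:2:R2", "1:2:R4", "1:3:R1"]
duplicates = {"1": ["4"]}
matrix = edge_matrix(edges, duplicates, links=["R1", "R2", "R3", "duplicate"])
```

With the links filter from the example request, the three rows carry the same link values as the example response. Calling `edge_matrix(edges, duplicates)` without a filter reports R4 and omits R3.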

# What-IF Machine

During search, entity by ID and entity by record ID queries, you can provide optional filter criteria using the considerRecords field. This filters the entities that were originally found and only takes into account the records that match those filters. As a result all other properties (edges, duplicates and hits) are updated accordingly, as if the other records did not exist at all.

This may even lead to situations where you end up with multiple entities, a much smaller entity, or no results at all, despite filtering out only a single record. The result behaves exactly as if those records had never been added or had been removed.

It can be used with a variety of what-if scenarios. For example, you can observe the state of an entity at any given time in the past using the until or before filter condition. Alternatively, you could use it to see what will happen when records are deleted due to old age using the after or since condition. Another use case is to visualize how an entity would appear without the records from a specific source by using the equals and invert filters.
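Why filtering a single record can produce several entities becomes clear when the remaining records are regrouped by their edges. The following sketch illustrates the idea with connected components. It is a simplified model (duplicates and hits are ignored), not the Tilores implementation, and `rebuild_entities` is a hypothetical name.

```python
def rebuild_entities(record_ids, edges, keep):
    """Return the connected components left when only records accepted
    by the predicate `keep` are considered.

    edges: iterable of (record_a, record_b) pairs.
    """
    remaining = {record_id for record_id in record_ids if keep(record_id)}
    neighbours = {record_id: set() for record_id in remaining}
    for a, b in edges:
        if a in remaining and b in remaining:  # edges of ignored records vanish
            neighbours[a].add(b)
            neighbours[b].add(a)

    entities, seen = [], set()
    for start in sorted(remaining):
        if start in seen:
            continue
        component, stack = set(), [start]
        while stack:  # depth-first traversal of one component
            current = stack.pop()
            if current in component:
                continue
            component.add(current)
            stack.extend(neighbours[current] - component)
        seen |= component
        entities.append(sorted(component))
    return entities


chain = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e")]
parts = rebuild_entities("abcde", chain, keep=lambda record_id: record_id != "c")
```

In this chain, record c is the only link between the two halves, so ignoring it leaves two separate groups of records.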

Example for a time based filter:

{
  search(input: {
    parameters: {
      myCustomField: "some-value"
    }
    considerRecords: {
      field: "myTimestamp"
      before: "2023-05-12T13:23:00+02:00"
    }
  }) {
    entities {
      id
      records {
        id
        myCustomField
      }
      edges
      duplicates
      hits
    }
  }
}

This will return the entity with only the records whose custom field myTimestamp has a value before the provided timestamp. The result then depends on how the records are connected and which records were hit.

Example for a filter to exclude certain record IDs:

{
  search(input: {
    parameters: {
      myCustomField: "some-value"
    }
    considerRecords: [
      {
        field: "id"
        equals: "<record-id-a>"
        invert: true
      },
      {
        field: "id"
        equals: "<record-id-b>"
        invert: true
      }
    ]
  }) {
    entities {
      id
      records {
        id
        myCustomField
      }
      edges
      duplicates
      hits
    }
  }
}

This will ignore the records <record-id-a> and <record-id-b> before rebuilding the entity. Keep in mind that you might still end up with a lot fewer entities, e.g. if one of those two records was the only record that connected two larger clusters.

Many more filter options are available. Please use the GraphQL introspection feature to see what is possible.

# Metrics

Access

The scope tilores/query.metrics is required to use these queries. It can be requested when obtaining access tokens.

# Assembly Status

Feature Availability

This feature is currently only available when using SQS as raw data queue.

This query is useful when performing batch data ingestion, where you can check the status of the assembly process.

# State

- READY: the assembly process is idle (no records in the queue).
- IN_PROGRESS: the assembly process is active (records are queued for ingestion and/or are currently being ingested).

# Estimated Time Remaining

Shows estimated time left in minutes for the records in queue to be fully assembled.

When there are not enough data points to make the estimate then the result is null.

Estimation Accuracy

The estimated time can be highly inaccurate in the following cases:

- The first 10 minutes after a batch submission.
- Continuous record submission at variable rates.
- Record updates.

# Example

Request

query {
  metrics {
    assemblyStatus {
      state
      estimatedTimeRemaining
    }
  }
}

Response

{
  "data": {
    "metrics": {
      "assemblyStatus": {
        "state": "IN_PROGRESS",
        "estimatedTimeRemaining": 6
      }
    }
  }
}
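During a batch ingestion, a client would typically poll this query until the state is READY. A minimal sketch of such a loop: `fetch_status` is a hypothetical callable that you supply, which executes the query above and returns the parsed assemblyStatus object. The polling interval and limit are arbitrary defaults.

```python
import time


def wait_until_ready(fetch_status, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll the assembly status until it reports READY.

    fetch_status: callable returning a dict such as
        {"state": "IN_PROGRESS", "estimatedTimeRemaining": 6}
    Raises TimeoutError if READY is not reached within max_polls attempts.
    """
    for _ in range(max_polls):
        status = fetch_status()
        if status["state"] == "READY":
            return status
        sleep(poll_seconds)
    raise TimeoutError("assembly is still in progress")
```

Passing `sleep` as a parameter keeps the loop easy to test without real waiting.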

# Entity Event Stream

The entity event stream is not yet integrated into the GraphQL API, but it is very helpful for observing changes in entities. The stream is provided via AWS SQS or AWS Kinesis, depending on what was selected during deployment. An IAM user is required to access the data.

Every single data change in Tilores is published via that stream. The following events are currently available.

# Create Event

The create event is published when an entity is newly created, meaning a record was submitted but was not attached to an existing entity.

Example:

{
  "type": "CREATE",
  "timestamp": "2022-01-01T00:00:00.000000000Z",
  "data": {
    "entities": [{
      "id": "<new-entity-id>",
      "recordIDs": ["<new-record-id>"],
      "obsoleteRecordIDs": []
    }],
    "obsoleteEntities": []
  }
}

# Update Event

The update event will be published when a record is submitted and was matched with exactly one existing entity.

Example:

{
  "type": "UPDATE",
  "timestamp": "2022-01-01T00:00:00.000000000Z",
  "data": {
    "entities": [{
      "id": "<entity-id>",
      "recordIDs": ["<new-record-id>","<record-id>"],
      "obsoleteRecordIDs": []
    }],
    "obsoleteEntities": []
  }
}

# Merge Event

The merge event will be published when a record is submitted and was matched with more than one existing entity, resulting in all of these entities being merged.

Example:

{
  "type": "MERGE",
  "timestamp": "2022-01-01T00:00:00.000000000Z",
  "data": {
    "entities": [{
      "id": "<entity-id-a>",
      "recordIDs": ["<record-id-a>","<record-id-b>","<new-record-id>"],
      "obsoleteRecordIDs": []
    }],
    "obsoleteEntities": [
      {
        "id": "<entity-id-b>",
        "recordIDs": [],
        "obsoleteRecordIDs": []
      }
    ]
  }
}

# Split Event

The split event will be published when an edge or a record that connected at least two parts of an entity was deleted, resulting in at least two new entities.

Example:

{
  "type": "SPLIT",
  "timestamp": "2022-01-01T00:00:00.000000000Z",
  "data": {
    "entities": [
      {
        "id": "<entity-id-a>",
        "recordIDs": ["<record-id-a>"],
        "obsoleteRecordIDs": ["<record-id-a-b>"]
      },
      {
        "id": "<entity-id-b>",
        "recordIDs": ["<record-id-b>"],
        "obsoleteRecordIDs": []
      }
    ],
    "obsoleteEntities": []
  }
}

# Delete Event

The delete event will be published when all records from an entity have been removed.

Example:

{
  "type": "DELETE",
  "timestamp": "2022-01-01T00:00:00.000000000Z",
  "data": {
    "entities": [],
    "obsoleteEntities": [
      {
        "id": "<entity-id>",
        "recordIDs": [],
        "obsoleteRecordIDs": ["<deleted-record-id>"]
      }
    ]
  }
}
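All five event types share the same data layout, so a consumer can handle them with one routine. The following sketch maintains a local index from record ID to entity ID. It assumes the event has already been received and parsed from the stream, and `apply_event` is a hypothetical name.

```python
def apply_event(index, event):
    """Update a {record_id: entity_id} index from one entity event."""
    data = event["data"]
    # Records listed as obsolete no longer belong to any entity.
    for entity in data["obsoleteEntities"] + data["entities"]:
        for record_id in entity["obsoleteRecordIDs"]:
            index.pop(record_id, None)
    # Records of the current entities point at their (possibly new) entity.
    for entity in data["entities"]:
        for record_id in entity["recordIDs"]:
            index[record_id] = entity["id"]
    return index
```

For a MERGE event, the records of the obsolete entity are listed under the surviving entity, so they are re-pointed by the second loop.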

# Advanced Analytics

Tilores is designed for realtime use cases and optimized for querying individual entities. If you want to analyze the resulting entities using a SQL-like interface, you can use AWS Athena for this purpose. This requires that the advanced analytics are enabled.

Once enabled, Tilores will automatically snapshot the data that was modified within the last few minutes and create an entities table and a records table that can be used from within Athena.

Enabling the analytics module on an existing and populated instance is not recommended, but possible. This requires manual steps depending on the instance size. Please contact service@tilores.io.

# Entities Table

The entities table contains general information about the entities stored in Tilores. This includes some statistical values, such as number of records or number of edges.

The following table gives an overview of the available fields:

| Field Name | Data Type | Description |
| --- | --- | --- |
| entity_id | string | ID of the entity |
| version | int | version number of the entity, starting at 1 |
| type* | string | storage type hint for some of the fields (e.g. edges), either cbgc or elist |
| record_count | int | enriched number of records per entity |
| edge_count | int | enriched number of edges per entity |
| rule_edge_count | map<string,int> | enriched number of edges per rule |
| duplicate_count | int | enriched number of duplicates per entity |
| clique_count | int | number of cliques per entity; always 0 if type equals elist |
| records | array | list of all records in an entity |
| edges* | mixed | type dependent edge representation |
| duplicates* | mixed | type dependent duplicate representation |
| cliques* | mixed | type dependent cliques representation; always null if type equals elist |
| data_location | string | data location of the entity header and record data in S3 |
| create_timestamp | timestamp | timestamp of when an entity was created |
| update_timestamp | timestamp | timestamp of when an entity was last modified |
| date | date | partitioned field of the last modification date; can be used to optimize date specific queries |

Fields marked with a * are not guaranteed to be stable and might change without prior notice.

# Records Table

The records table contains details about the ingested records.

The following table gives an overview of the available fields:

| Field Name | Data Type | Description |
| --- | --- | --- |
| record_id | string | ID of the record |
| version | int | version number of the entity, starting at 0 |
| entity_id | string | ID of the record's entity |
| submit_timestamp | timestamp | timestamp when the record was received by Tilores |
| assemble_timestamp | timestamp | timestamp when the record was assembled into the entity |
| data | string | client specific record data; use json_parse or json_extract to receive individual fields |

# Example Queries

Below are some common query examples.

Top 10 largest entities:

SELECT
  entity_id,
  record_count,
  edge_count,
  duplicate_count,
  update_timestamp,
  records
FROM entities
ORDER BY record_count DESC, edge_count DESC
LIMIT 10

Preview of 10 records that were recently updated:

SELECT
  record_id,
  entity_id,
  submit_timestamp,
  data
FROM records
ORDER BY submit_timestamp DESC
LIMIT 10

Entities with the most distinct values for a specific field:

SELECT
  entity_id,
  count(distinct field) AS field_count,
  array_agg(distinct field) as field_values
FROM (
  SELECT
    entity_id,
    json_extract(data, '$.myCustomField') AS field -- Adjust the path to one of the fields as defined in the schema
  FROM records
)
GROUP BY entity_id
HAVING count(distinct field) > 0
ORDER BY count(distinct field) DESC
LIMIT 100

Record preview of the largest entity:

WITH e AS (
    SELECT
      *
    FROM entities
    ORDER BY record_count DESC, edge_count DESC
    LIMIT 1
)

SELECT
  e.entity_id,
  e.edge_count,
  r.record_id,
  json_parse(r.data) as data
FROM records r
JOIN e ON e.entity_id = r.entity_id

Please refer to the Athena SQL DML documentation for further details on how to query the tables.