#
API Reference
The Tilores API is highly customizable. This reference therefore covers only the basic, default API.
#
GraphQL
#
mutation: disassemble
The disassemble mutation helps you to remove records and/or edges.
Example request:
mutation {
disassemble(input: {
edges: [
{
a: "<uuid-record-a>"
b: "<uuid-record-b>"
}
]
recordIDs: [
"<uuid-record-c>"
"<uuid-record-d>"
]
createConnectionBan: false
meta: {
user: "<user-name>"
reason: "some reason"
}
}) {
triggered
}
}
When providing edges, all edges between the two records will be removed,
independent of their type. The records themselves will not be removed.
When providing recordIDs, the records as well as their edges will be removed.
You must provide at least one record ID or one edge. You can provide edges and record IDs in the same request. If an edge or a record is the only connecting link between two (or more) parts of an entity, then the entity will be split into multiple entities.
All record IDs mentioned in edges and recordIDs must belong to the same
entity.
When an entity is split into multiple entities and the setting
createConnectionBan is true (default is false), the newly created entities are
prevented from ever merging again. Any record that would merge
these entities will be dropped in that case. Connection bans are inherited:
when entity A, which has a connection ban to entity B, is
merged into entity C, then C will afterwards also have a connection ban
towards B. The same is true for splitting A.
The user and the reason from the optional meta block currently only
show up in the logs and should explain why and when a record or an edge was
removed.
Example response:
{
"data": {
"disassemble": {
"triggered": true
}
}
}
deletedEdges and deletedRecords represent the number of removed edges and
records. deletedEdges may have a higher number than the edges explicitly
removed.
entityIDs lists all IDs of the resulting entities. If the disassemble caused
the entity to be deleted, this entry will be empty. If the disassemble caused
the entity to be updated, it will contain exactly one entry. And otherwise it
will contain at least two entries.
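The three cases for entityIDs can be told apart on the client with a simple length check. The following is a minimal sketch, assuming the entityIDs list has been requested and extracted from the response; the function name and the returned labels are hypothetical and not part of the Tilores API.

```python
def classify_disassemble_result(entity_ids: list) -> str:
    """Map the number of resulting entity IDs to the documented outcome.

    Hypothetical helper: the labels "deleted", "updated" and "split" are
    illustrative names, not values returned by the API.
    """
    if not entity_ids:
        return "deleted"   # the entity no longer exists
    if len(entity_ids) == 1:
        return "updated"   # the entity still exists as a single entity
    return "split"         # the entity was split into two or more entities


print(classify_disassemble_result(["<entity-id-a>", "<entity-id-b>"]))  # split
```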
If an error occurs during the disassemble, Tilores first retries it internally. If the retry also fails, it schedules the removal for a later retry and returns an error message to the API user. Usually, the API user can ignore that specific error, because the disassemble will eventually be performed.
#
mutation: removeConnectionBan
The removeConnectionBan mutation removes an existing connection ban between two or more
entities.
Example request:
mutation {
removeConnectionBan(input: {
reference: "<uuid>"
entityID: "<entity-id-a>"
others: [
"<entity-id-b>"
"<entity-id-c>"
]
meta: {
user: "<user-name>"
reason: "some reason"
}
}) {
removed
}
}
The reference must be a unique ID that can be reused if the same request
needs to be processed again later due to a previous error.
The connection ban will always be removed between entityID and each entity ID
from others. It will not affect possible connection bans between the
entities listed in others.
The user and the reason from the meta block currently only show up
in the logs and should explain why and when the connection ban was removed.
Example response:
{
"data": {
"removeConnectionBan": {
"removed": true
}
}
}
Unless there was an error, the removed field will always return true.
If an error occurs during the removal, Tilores first retries it internally. If the retry also fails, it schedules the removal for a later retry and returns an error message to the API user. Usually, the API user can ignore that specific error, because the connection ban removal will eventually be performed.
#
mutation: submit
The submit mutation adds the provided records into Tilores.
Example request:
mutation {
submit(input: {
records: [
{
id: "my-id",
myCustomField: "some-value"
}
]
}) {
recordsAdded
}
}
The structure for each value in records is dependent on your customized schema
for the RecordInput.
Multiple records submitted in a single mutation will be automatically connected
with each other using a STATIC edge. This is independent from the
matching rules.
Example response:
{
"data": {
"submit": {
"recordsAdded": 1
}
}
}
recordsAdded returns the number of records that have successfully been
submitted into Tilores.
#
mutation: submitWithPreview
The submitWithPreview mutation is similar to submit, with the following differences:
- As a response, it provides a preview of what the entities would look like when/if the provided records were ingested.
- It provides the dryRun option. If set to true, only the preview is returned, without actually ingesting the provided records into Tilores.
Example request:
mutation {
submitWithPreview(input: {
records: [
{
id: "my-id",
myCustomField: "some-value"
}
]
}) {
entities {
id
records {
id
myCustomField
}
edges
duplicates
hits
}
}
}
Example response:
{
"data": {
"submitWithPreview": {
"entities": [
{
"id": "<uuid-entity>",
"records": [
{
"id": "<some-already-existing-record-id>",
"myCustomField": "some value"
},
{
"id": "<newly-provided-record-id>",
"myCustomField": "some other value"
}
],
"edges": [
"<some-already-existing-record-id>:<newly-provided-record-id>:R1EXACT"
],
"duplicates": {},
"hits": {}
}
]
}
}
}
If the submitted records do not end up in an existing entity, the returned preview will contain an entity whose ID differs from the ID of the entity into which the records will finally be ingested.
If record updates were explicitly disabled for the instance, the preview returns the entities in which the records with the provided record IDs currently reside.
#
query: entity
The entity query will search for the provided entity ID and return it.
Example request:
{
entity(input: {
id: "<uuid-entity>"
}){
entity {
id
records {
id
myCustomField
}
edges
duplicates
hits
score
}
}
}
The id is the entity ID to search for.
Example response:
{
"data": {
"entity": {
"entity": {
"id": "<uuid-entity>",
"records": [
{
"id": "<record-id-a>",
"myCustomField": "some value"
},
{
"id": "<record-id-b>",
"myCustomField": "some other value"
},
{
"id": "<record-id-c>",
"myCustomField": "some value"
}
],
"edges": [
"<record-id-a>:<record-id-b>:STATIC",
"<record-id-a>:<record-id-b>:R1EXACT"
],
"duplicates": {
"<record-id-a>": [
"<record-id-c>"
]
},
"hits": {},
"score": "0.9814342"
}
}
}
}
If an entity with the provided ID exists, then this entity will be returned in
the entity field. Otherwise this field will be null.
Each entity has the following fields:
- id is the entity ID.
- records is a list of all records of that entity. The queryable fields depend on the Record from your custom schema.
- edges lists all edges in that entity, which represent how the records are connected with each other.
- duplicates lists all duplicates of that entity. When not using rule groups, the key of that map will be the record ID of the original and the entries will be the record IDs of the duplicates. When using rule groups, the key will additionally be prefixed with the ID of the rule group and a colon (<rule-group-id>:<record-id-a>) and the values stay unchanged. In that case one record ID of a duplicate can be present in the values of multiple keys.
- hits will always be empty for an entity search.
- recordInsights provides filtering, statistics and aggregation on the entity records. Refer to Record Insights.
- edgeInsights provides statistics and aggregation on the entity's edges and duplicates. Refer to Edge Insights.
- score reflects the overall quality of matches within the entity. It is represented by a float value in the range (0.0, 1.0] (higher value means better matching quality).
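The edges and duplicates fields are plain strings and maps, so a client usually has to take them apart before using them. The following is a minimal sketch of that parsing. It assumes that record IDs, rule IDs and rule group IDs contain no colon themselves, which the format above suggests but does not guarantee. All function names are hypothetical.

```python
from collections import defaultdict
from typing import Optional


def parse_edge(edge: str) -> tuple:
    """Split "<record-a>:<record-b>:<rule-id>" into its three parts.

    Assumption: none of the three parts contains a colon.
    """
    a, b, rule = edge.split(":")
    return a, b, rule


def rules_by_pair(edges: list) -> dict:
    """Collect all rule IDs that connect each pair of records."""
    pairs = defaultdict(list)
    for edge in edges:
        a, b, rule = parse_edge(edge)
        pairs[(a, b)].append(rule)
    return dict(pairs)


def split_duplicate_key(key: str) -> tuple:
    """Return (rule_group_id, record_id) for a key of the duplicates map.

    Without rule groups the key is just the record ID, so the rule group
    part is None.
    """
    group: Optional[str] = None
    record_id = key
    if ":" in key:
        group, record_id = key.split(":", 1)
    return group, record_id


edges = ["rec-a:rec-b:STATIC", "rec-a:rec-b:R1EXACT"]
print(rules_by_pair(edges))  # {('rec-a', 'rec-b'): ['STATIC', 'R1EXACT']}
```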
#
query: entity by record
The entityByRecord query will search for the provided record ID and return the
entity it belongs to.
Example request:
{
entityByRecord(input: {
id: "<record-id-b>"
}){
entity {
id
records {
id
myCustomField
}
edges
duplicates
hits
}
}
}
The id is the record ID to search for.
Example response:
{
"data": {
"entityByRecord": {
"id": "<uuid-entity>",
"records": [
{
"id": "<record-id-a>",
"myCustomField": "some value"
},
{
"id": "<record-id-b>",
"myCustomField": "some other value"
},
{
"id": "<record-id-c>",
"myCustomField": "some value"
}
],
"edges": [
"<record-id-a>:<record-id-b>:STATIC",
"<record-id-a>:<record-id-b>:R1EXACT"
],
"duplicates": {
"<record-id-a>": [
"<record-id-c>"
]
},
"hits": {}
}
}
}
If a record with the provided ID exists, then its entity will be returned in the
entityByRecord field. Otherwise this field will be null.
The fields available for entityByRecord mirror those of the entity query.
#
query: search
The search query will search for the provided values using the
search rules.
Example request:
{
search(input: {
parameters: {
myCustomField: "some-value"
}
}) {
entities {
id
records {
id
myCustomField
}
edges
duplicates
hits
score
hitScore
}
}
}
The parameters define your custom search parameters as they are defined using
the SearchParams type.
Example response:
{
"data": {
"search": {
"entities": [
{
"id": "<uuid-entity>",
"records": [
{
"id": "<record-id-a>",
"myCustomField": "some value"
},
{
"id": "<record-id-b>",
"myCustomField": "some other value"
},
{
"id": "<record-id-c>",
"myCustomField": "some value"
}
],
"edges": [
"<record-id-a>:<record-id-b>:STATIC",
"<record-id-a>:<record-id-b>:R1EXACT"
],
"duplicates": {
"<record-id-a>": [
"<record-id-c>"
]
},
"hits": {
"<record-id-a>": [
"R1EXACT"
]
},
"score": "0.9814342",
"hitScore": "1.0"
}
]
}
}
}
entities is either an empty list or a list with all the found entities. Their
structure equals the one described in query: entity, with the exception that in hits you receive a list of record IDs that match the search, together with
the rule IDs with which they were found, and that the hitScore is available.
The hitScore indicates how closely the match aligns with the provided search
parameter. It is represented by a float value in the range (0.0, 1.0] (higher
value means better matching quality).
Optionally, you can provide the following parameters in your request:
- considerRecords activates the What-IF Machine.
- page and pageSize limit the number of entities returned per request. By default, all relevant entities will be returned. Paging often improves the search performance for cases where many entities will be returned.
- sort, with child property field and optional child property direction, sorts the resulting entities. By default, sorting is done by entity ID (field: "id") in ascending order. This can be changed to sort by the hit score (field: "hitScore"), which by default sorts in descending order. By providing the direction you can change the order to either ascending (direction: ASC) or descending (direction: DESC).
- searchRules defines which rule set to use during search. Defaults to default. Valid options can be found in the UI or in the rule config under the searchRuleSetIDs section.
Example request with sorting, paging and custom search rule set:
{
search(input: {
parameters: {
myCustomField: "some-value"
}
page: 1
pageSize: 5
sort: {
field: "hitScore"
direction: DESC
}
searchRules: "customSearchRuleSet"
}) {
entities {
id
records {
id
myCustomField
}
edges
duplicates
hits
score
hitScore
}
}
}
This will search for entities with the specified custom search rule set and return at most the 5 most relevant entities.
Paging, sorting and hit score calculation are performed before evaluating the
considerRecords options. Combining these options might lead to unexpected
results.
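When many entities match, a client can walk through the pages one after another. The following is a rough sketch of such a loop. The fetch_page callable is a hypothetical stand-in for whatever code sends the search query with the given page and pageSize and returns the entities list. The starting page number is an assumption taken from the example above (page: 1); verify it against your instance.

```python
from typing import Callable, Iterator


def iter_search_entities(
    fetch_page: Callable[[int, int], list],
    page_size: int = 5,
    first_page: int = 1,  # assumption: page numbering starts at 1
) -> Iterator[dict]:
    """Yield entities page by page until a page comes back short."""
    page = first_page
    while True:
        entities = fetch_page(page, page_size)
        yield from entities
        # A page with fewer entries than requested must be the last one.
        if len(entities) < page_size:
            return
        page += 1


# Usage with an in-memory stand-in for the real request:
fake_entities = [{"id": f"entity-{i}"} for i in range(12)]


def fake_fetch(page: int, size: int) -> list:
    start = (page - 1) * size
    return fake_entities[start:start + size]


print(len(list(iter_search_entities(fake_fetch))))  # 12
```

If the total number of matches is an exact multiple of the page size, this loop issues one extra request that returns an empty list before it stops.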
#
Record Insights
Provides filtering, statistics and aggregation on the entity records. This is available on the entity type.
Several of the functions below require a field to operate on. For nested data,
use a dot (.) to navigate through levels. Arrays can be accessed by their
zero-based index or with a wildcard (*).
Field Examples:
- myCustomField: property on the root level
- myStruct.myNestedField: nested property
- myList.1: second element from the provided list
- myList.0.myChildProperty: property of the first element from the list
- myList.*.myChildProperty: property of all elements from the list
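To make the path notation concrete, the following sketch resolves such a path against a record held as a Python dictionary. It only illustrates the semantics described above and is not how Tilores evaluates paths internally. The resolve function is hypothetical.

```python
from typing import Any


def resolve(value: Any, path: str) -> list:
    """Return every value reached by following a dotted path.

    Supports nested keys, zero-based list indexes and the "*" wildcard.
    """
    current = [value]
    for part in path.split("."):
        next_values = []
        for item in current:
            if part == "*" and isinstance(item, list):
                next_values.extend(item)          # wildcard: all elements
            elif isinstance(item, list) and part.isdigit():
                if int(part) < len(item):
                    next_values.append(item[int(part)])  # index access
            elif isinstance(item, dict) and part in item:
                next_values.append(item[part])    # nested property
        current = next_values
    return current


record = {
    "myCustomField": "x",
    "myStruct": {"myNestedField": 1},
    "myList": [{"myChildProperty": "a"}, {"myChildProperty": "b"}],
}
print(resolve(record, "myStruct.myNestedField"))    # [1]
print(resolve(record, "myList.1"))                  # [{'myChildProperty': 'b'}]
print(resolve(record, "myList.*.myChildProperty"))  # ['a', 'b']
```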
#
Available Functions
- filter(conditions: [FilterCondition!]!): RecordInsights!: Returns a new RecordInsights object that only contains the records for which the FilterCondition applies. FilterCondition fields:
  - field [Required]: the field upon which to check the criteria.
  - equals: ensures that the field's value is equal to the provided value.
  - isNull: ensures that the field must have a null value.
  - startsWith: ensures that the field's value starts with the provided text. Using startsWith on non-string fields will convert them into strings first. This may lead to unexpected, but correct results.
  - endsWith: ensures that the field's value ends with the provided text. Using endsWith on non-string fields will convert them into strings first. This may lead to unexpected, but correct results.
  - likeRegex: ensures that the field's value matches the provided regular expression. Using likeRegex on non-string fields will convert them into strings first. This may lead to unexpected, but correct results.
  - lessThan: ensures that the field's value is less than the provided value. Using lessThan on non-numeric fields will raise an error.
  - lessEquals: ensures that the field's value is less than or equal to the provided value. Using lessEquals on non-numeric fields will raise an error.
  - greaterThan: ensures that the field's value is greater than the provided value. Using greaterThan on non-numeric fields will raise an error.
  - greaterEquals: ensures that the field's value is greater than or equal to the provided value. Using greaterEquals on non-numeric fields will raise an error.
  - after: ensures that the field's value is after the provided value. Using after on non-time fields will raise an error.
  - since: ensures that the field's value is after or at the provided value. Using since on non-time fields will raise an error.
  - before: ensures that the field's value is before the provided value. Using before on non-time fields will raise an error.
  - until: ensures that the field's value is before or at the provided value. Using until on non-time fields will raise an error.
  - invert: negates the results of the checks.
- sort(criteria: [SortCriteria!]!): RecordInsights!: Returns a new RecordInsights object that contains the records ordered by the provided SortCriteria. SortCriteria fields:
  - field [Required]: the field to sort by.
  - direction: defines whether to sort ascending or descending. Allowed values are ASC and DESC.
- group(fields: [String!]!, caseSensitive: Boolean): [RecordInsights!]!: Returns a list of RecordInsights objects where the records have been grouped by the provided fields.
- limit(count: Int!, offset: Int): RecordInsights!: Returns a new RecordInsights object that contains up to 'count' records.
- count: Int!: Returns the number of records in the currently selected list.
- countDistinct(fields: [String!]!, caseSensitive: Boolean): Int!: Returns the number of unique non-null values for the provided field(s).
- first: Record: Returns the first record in the list or null for empty lists.
- last: Record: Returns the last record in the list or null for empty lists.
- values(field: String!): [Any]!: Returns all non-null values of the current records for the provided field.
- valuesDistinct(field: String!, caseSensitive: Boolean): [Any]!: Returns all unique non-null values of the current records for the provided field.
- frequencyDistribution(field: String!, top: Int, direction: SortDirection): [FrequencyDistributionEntry!]!: Returns how often a non-null value for the provided field is present. FrequencyDistributionEntry fields:
  - value: holds the value for which the percentage and frequency applies.
  - frequency: is the number of records that have the value.
  - percentage: is the percentage of records that have the value. For calculating the percentage only non-null values are considered.
- confidence(field: String!, caseSensitive: Boolean): Float: Describes the probability of having the one truly correct value for the provided path. The resulting value is a float ranging from 0 to 1 representing a percentage. Null values are ignored in the calculation. Returns null if all values are null.
- average(field: String!): Float: Returns the average value of the provided numeric field.
- max(field: String!): Float: Returns the highest value of the provided numeric field.
- median(field: String!): Float: Returns the median value of the provided numeric field.
- min(field: String!): Float: Returns the lowest value of the provided numeric field.
- sum(field: String!): Float: Returns the sum of the provided numeric field.
- standardDeviation(field: String!): Float: Calculates the standard deviation for the provided numeric field.
- newest(field: String!): Record: Returns the record for which the provided time field has the highest (most recent) value.
- oldest(field: String!): Record: Returns the record for which the provided time field has the lowest (least recent) value.
- flatten(field: String!): [Any]!: Merges the values of the provided array field into a single array.
- flattenDistinct(field: String!, caseSensitive: Boolean): [Any]!: Merges the values of the provided array field into a single array where each value is unique.
#
Examples
#
Numerical Functions
query {
entity(input: {id: "123"}) {
records {
price
}
recordInsights {
average(field: "price")
max(field: "price")
median(field: "price")
min(field: "price")
standardDeviation(field: "price")
sum(field: "price")
}
}
}
{
"data": {
"entity": {
"records": [
{"price": 10},
{"price": 20},
{"price": 30},
{"price": 15},
{"price": 25},
{"price": null}
],
"recordInsights": {
"average": 20,
"max": 30,
"median": 20,
"min": 10,
"standardDeviation": 7.0710678118655,
"sum": 100
}
}
}
}
This query retrieves the price field for all records of the entity,
as well as some statistics on that field.
Specifically, it calculates the average, maximum, median, minimum, and sum of
the prices, as well as their standard deviation.
The price field is assumed to be numeric, as all of these
functions require a numeric field. If the price field values were not numeric,
an error would be raised.
Null values are ignored in these calculations.
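The numbers in the example response can be reproduced locally, which is handy for checking expectations against your own data. The following sketch uses Python's statistics module on the same six prices and drops the null value first. It is a cross-check only and says nothing about how Tilores computes the values.

```python
import statistics

prices = [10, 20, 30, 15, 25, None]
values = [p for p in prices if p is not None]  # null values are ignored

print(statistics.mean(values))    # 20
print(max(values))                # 30
print(statistics.median(values))  # 20
print(min(values))                # 10
print(sum(values))                # 100
# Population standard deviation, approximately 7.0711
print(round(statistics.pstdev(values), 4))
```

The documented value of about 7.0711 matches the population standard deviation (pstdev). The sample standard deviation (stdev) of the same values would be about 7.9057.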
#
Filter and Sort
query {
entity(input: {id: "123"}) {
recordInsights {
filter(conditions: [
{ field: "status", equals: "active" },
{ field: "created_at", after: "2022-01-01T00:00:00Z" }
]) {
sort(criteria: [
{ field: "created_at", direction: DESC },
{ field: "priority" }
]) {
limit(count: 10) {
records {
id
name
status
priority
created_at
}
}
}
}
}
}
}
This query retrieves the RecordInsights for an entity with ID 123.
It applies a filter to only include records where the status field is "active"
and the created_at field is after January 1st, 2022 (00:00 UTC). It
then sorts the resulting records by created_at in descending order and
priority in ascending order. It limits the result to 10 records starting
from the first one. Finally, it selects the id, name, status, priority,
and created_at fields for each of the selected records.
#
Group
query {
entity(input: {id: "123"}) {
recordInsights {
group(fields: ["category"]) {
records {
id
name
category
}
count
names: valuesDistinct(field: "name")
categories: valuesDistinct(field: "category")
}
}
}
}
Assuming the following records for the entity with ID "123":
[
{
"id": "1",
"name": "Product A",
"category": "Electronics"
},
{
"id": "2",
"name": "Product B",
"category": "Electronics"
},
{
"id": "3",
"name": "Product C",
"category": "Clothing"
}
]
The response would be:
{
"data": {
"entity": {
"recordInsights": {
"group": [
{
"records": [
{
"id": "1",
"name": "Product A",
"category": "Electronics"
},
{
"id": "2",
"name": "Product B",
"category": "Electronics"
}
],
"count": 2,
"names": ["Product A","Product B"],
"categories": ["Electronics"]
},
{
"records": [
{
"id": "3",
"name": "Product C",
"category": "Clothing"
}
],
"count": 1,
"names": ["Product C"],
"categories": ["Clothing"]
}
]
}
}
}
}
This query groups the records by their category field, returning two groups:
one for the "Electronics" category with a count of 2, and one for the "Clothing"
category with a count of 1. The records field of each group contains the records
in that group, and the count field is the total number of records in that group.
#
Edge Insights
Provides statistics and aggregation on the entity's edges and duplicates. This is available on the entity type.
#
Available Functions
- count: Int!: Returns the number of edges in the provided list.
- frequencyDistribution(top: Int, direction: SortDirection): [FrequencyDistributionEntry!]!: Returns how often a rule is present.
- matrix(links: [String!]): [EdgeMatrixEntry!]!: Returns a matrix in which it is possible to see the links between each two records and due to which rule or duplicate they are linked.
#
Examples
#
Edge Matrix
query {
entity(input: {id: "123"}) {
edges
duplicates
edgeInsights {
matrix(links: ["R1", "R2", "R3", "duplicate"]) {
a
b
links
}
}
}
}
The response would be:
{
"data": {
"entity": {
"edges": [
"1:2:R1",
"1:2:R2",
"1:2:R4",
"1:3:R1"
],
"duplicates": {
"1": ["4"]
},
"edgeInsights": {
"matrix": [
{
"a": "1",
"b": "2",
"links": {
"R1": true,
"R2": true,
"R3": false,
"duplicate": false
}
},
{
"a": "1",
"b": "3",
"links": {
"R1": true,
"R2": false,
"R3": false,
"duplicate": false
}
},
{
"a": "1",
"b": "4",
"links": {
"R1": false,
"R2": false,
"R3": false,
"duplicate": true
}
}
]
}
}
}
}
As you can see, the output of the edge matrix contains the same information as
edges and duplicates. However, depending on the use case, it might be easier
to work with on the client side. Also note that the links parameter is
optional. When omitting it, the result would not contain the values for R3, as
they are all false, but would instead contain the values for R4, which were
filtered out by the links parameter in this example.
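Because the matrix carries the same information as edges and duplicates, it can also be derived on the client. The following is a minimal sketch that rebuilds the rows of the example above. It assumes colon-free IDs in the edge strings and plain record IDs as duplicates keys (no rule groups). The edge_matrix function is hypothetical and is not the server's implementation.

```python
def edge_matrix(edges, duplicates, links=None):
    """Build matrix rows from edge strings and the duplicates map."""
    found = {}  # (a, b) -> set of link names, in first-seen order
    for edge in edges:
        a, b, rule = edge.split(":")
        found.setdefault((a, b), set()).add(rule)
    for original, dups in duplicates.items():
        for dup in dups:
            found.setdefault((original, dup), set()).add("duplicate")

    # Without an explicit links filter, report every link that occurs.
    if links is None:
        links = sorted({name for names in found.values() for name in names})

    return [
        {"a": a, "b": b, "links": {name: name in present for name in links}}
        for (a, b), present in found.items()
    ]


edges = ["1:2:R1", "1:2:R2", "1:2:R4", "1:3:R1"]
duplicates = {"1": ["4"]}
for row in edge_matrix(edges, duplicates, ["R1", "R2", "R3", "duplicate"]):
    print(row)
```

Calling edge_matrix(edges, duplicates) without the links argument reports R1, R2, R4 and duplicate, so R3 disappears and R4 appears.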
#
What-IF Machine
During search, entity by ID and entity by record ID queries, you can provide
optional filter criteria using the considerRecords field. This filters the
entities that were originally found and only takes into account the records that
match those filters. As a result all other properties (edges, duplicates and
hits) are updated accordingly, as if the other records did not exist at all.
This may even lead to situations where you end up with multiple entities, a much smaller entity, or no results at all, despite filtering out only a single record. The result behaves exactly as if you had never added those records or had removed them.
It can be used with a variety of what-if scenarios. For example, you can observe
the state of an entity at any given time in the past using the until or before
filter condition. Alternatively, you could use it to see what will happen when
records are deleted due to old age using the after or since condition.
Another use case is to visualize how an entity would appear without the records
from a specific source by using the equals and invert filters.
Example for a time based filter:
{
search(input: {
parameters: {
myCustomField: "some-value"
}
considerRecords: {
field: "myTimestamp"
before: "2023-05-12T13:23:00+02:00"
}
}) {
entities {
id
records {
id
myCustomField
}
edges
duplicates
hits
}
}
}
This will return the entity with only those records whose custom field
myTimestamp has a value before 2023-05-12T13:23:00+02:00. The result then
depends on how the records are connected and which records were hit.
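Timestamps for time-based filters are easiest to build programmatically. The sketch below formats a timezone-aware datetime as an ISO 8601 / RFC 3339 string. That this is the format the API expects is an assumption, based on values such as "2022-01-01T00:00:00Z" used elsewhere in this reference.

```python
from datetime import datetime, timedelta, timezone

# 12 May 2023, 13:23 local time in a UTC+02:00 zone
cutoff = datetime(2023, 5, 12, 13, 23, tzinfo=timezone(timedelta(hours=2)))

# With an explicit offset
print(cutoff.isoformat())  # 2023-05-12T13:23:00+02:00

# The same instant expressed in UTC with the "Z" suffix
utc = cutoff.astimezone(timezone.utc)
print(utc.isoformat().replace("+00:00", "Z"))  # 2023-05-12T11:23:00Z
```

A timestamp carries either the "Z" suffix or a numeric offset such as "+02:00", never both.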
Example for a filter to exclude certain record IDs:
{
search(input: {
parameters: {
myCustomField: "some-value"
}
considerRecords: [
{
field: "id"
equals: "<record-id-a>"
invert: true
},
{
field: "id"
equals: "<record-id-b>"
invert: true
}
]
}) {
entities {
id
records {
id
myCustomField
}
edges
duplicates
hits
}
}
}
This will ignore the records <record-id-a> and <record-id-b> before
rebuilding the entity. Keep in mind that you still might end up with far fewer
entities, e.g. if one of those two records was the only record that connected
two larger clusters.
Many more filter options are available. Please use the GraphQL introspection feature to see what is possible.
#
Metrics
Access
The scope tilores/query.metrics is required to use these queries. It can be
requested when obtaining access tokens.
#
Assembly Status
Feature Availability
This feature is currently only available when using SQS as raw data queue.
This query is useful when performing batch data ingestion, where you can check the status of the assembly process.
#
State
- READY: the assembly process is idle (no records in queue).
- IN_PROGRESS: the assembly process is active (records are in queue to be ingested and/or are currently being ingested).
#
Estimated Time Remaining
Shows the estimated time left, in minutes, for the records in the queue to be fully assembled.
When there are not enough data points to make the estimate, the result is null.
Estimation Accuracy
The estimated time can be highly inaccurate in the following cases:
- The first 10 minutes after a batch submission.
- Continuous record submission at variable rates.
- Record updates.
#
Example
Request
query {
metrics {
assemblyStatus {
state
estimatedTimeRemaining
}
}
}
Response
{
"data": {
"metrics": {
"assemblyStatus": {
"state": "IN_PROGRESS",
"estimatedTimeRemaining": 6
}
}
}
}
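During a batch ingestion, this query is typically polled until the state becomes READY. The following is a rough sketch of such a loop. The fetch_status callable is a hypothetical stand-in for the code that sends the query above and returns the assemblyStatus object. The polling interval is an arbitrary choice.

```python
import time
from typing import Callable, Optional


def wait_until_ready(
    fetch_status: Callable[[], dict],
    poll_seconds: float = 30.0,
    max_polls: Optional[int] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll the assembly status until it is READY.

    Returns True once READY is seen, or False if max_polls is exhausted.
    """
    polls = 0
    while max_polls is None or polls < max_polls:
        status = fetch_status()
        if status["state"] == "READY":
            return True
        # estimatedTimeRemaining may be None while data points are missing.
        remaining = status.get("estimatedTimeRemaining")
        print("in progress, estimated minutes remaining:", remaining)
        polls += 1
        sleep(poll_seconds)
    return False
```

Passing a no-op sleep function makes the loop easy to test without waiting.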
#
Performance Recommendations
To optimize performance and reduce response size, use pagination and request only the fields necessary for your workflow. When working with entity-resolved data, prefer field aggregations over querying raw records directly, as raw data often contains duplicates. Aggregations return the relevant consolidated values, minimizing payload size and improving query efficiency.
For optimal performance with large response payloads, include the
Accept-Encoding: gzip header in your requests. This enables the server to
compress the response, reducing bandwidth usage and improving transfer speed.
#
Entity Event Stream
The entity event stream is not yet integrated into the GraphQL API, but it is very helpful for observing changes in entities. This stream is provided via AWS SQS or AWS Kinesis, depending on what is selected during deployment. An IAM user is required to access the data.
Every single data change in Tilores is published via that stream. The following events are currently available.
#
Create Event
The create event will be published when an entity is newly created, meaning a record was submitted but was not attached to an existing entity.
Example:
{
"type": "CREATE",
"timestamp": "2022-01-01T00:00:00.000000000Z",
"data": {
"entities": [{
"id": "<new-entity-id>",
"recordIDs": ["<new-record-id>"],
"obsoleteRecordIDs": []
}],
"obsoleteEntities": []
}
}
#
Update Event
The update event will be published when a record is submitted and was matched with exactly one existing entity.
Example:
{
"type": "UPDATE",
"timestamp": "2022-01-01T00:00:00.000000000Z",
"data": {
"entities": [{
"id": "<entity-id>",
"recordIDs": ["<new-record-id>","<record-id>"],
"obsoleteRecordIDs": []
}],
"obsoleteEntities": []
}
}
#
Merge Event
The merge event will be published when a record is submitted and was matched with more than one existing entity, resulting in all of these entities being merged.
Example:
{
"type": "MERGE",
"timestamp": "2022-01-01T00:00:00.000000000Z",
"data": {
"entities": [{
"id": "<entity-id-a>",
"recordIDs": ["<record-id-a>","<record-id-b>","<new-record-id>"],
"obsoleteRecordIDs": []
}],
"obsoleteEntities": [
{
"id": "<entity-id-b>",
"recordIDs": [],
"obsoleteRecordIDs": []
}
]
}
}
#
Split Event
The split event will be published when an edge or a record that connected at least two parts of an entity was deleted, resulting in at least two new entities.
Example:
{
"type": "SPLIT",
"timestamp": "2022-01-01T00:00:00.000000000Z",
"data": {
"entities": [
{
"id": "<entity-id-a>",
"recordIDs": ["<record-id-a>"],
"obsoleteRecordIDs": ["<record-id-a-b>"]
},
{
"id": "<entity-id-b>",
"recordIDs": ["<record-id-b>"],
"obsoleteRecordIDs": []
}
],
"obsoleteEntities": []
}
}
#
Delete Event
The delete event will be published when all records from an entity have been removed.
Example:
{
"type": "DELETE",
"timestamp": "2022-01-01T00:00:00.000000000Z",
"data": {
"entities": [],
"obsoleteEntities": [
{
"id": "<entity-id>",
"recordIDs": [],
"obsoleteRecordIDs": ["<deleted-record-id>"]
}
]
}
}
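The five event types above share one structure, so a consumer can process them with a single function. The following sketch keeps a local map from entity ID to record IDs up to date. It assumes that recordIDs holds the complete list of records of an entity after the event, which the update example suggests but the text does not state. The apply_event function is hypothetical.

```python
def apply_event(index: dict, event: dict) -> None:
    """Apply one entity event to a local {entity_id: set(record_ids)} map.

    Assumption: "recordIDs" is the full record list of the entity after
    the event, and entities listed in "obsoleteEntities" no longer exist.
    """
    data = event["data"]
    for obsolete in data["obsoleteEntities"]:
        index.pop(obsolete["id"], None)
    for entity in data["entities"]:
        index[entity["id"]] = set(entity["recordIDs"])


index = {}
apply_event(index, {
    "type": "CREATE",
    "data": {
        "entities": [{"id": "e1", "recordIDs": ["r1"], "obsoleteRecordIDs": []}],
        "obsoleteEntities": [],
    },
})
print(index)  # {'e1': {'r1'}}
```

The same function handles UPDATE, MERGE, SPLIT and DELETE events without inspecting the type field, because the differences are fully expressed by the entities and obsoleteEntities lists.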
#
Offload Event
If an event exceeds the SQS size limit (default 256 KB), it is offloaded to S3. A shortened event is published instead, containing a presigned URL to download the full message.
The full message has the same structure as its documented type and can be retrieved with a simple HTTP GET request to the provided URL.
You must download the full event before the URL expires (default 4 days). The expiry is configurable but only applies to newly published events.
Example:
{
"type": "OFFLOAD",
"timestamp": "2022-01-01T00:00:00.000000000Z",
"url": "https://example.s3.region.amazonaws.com/sqs-events/timestamp-v4uuid.json?authparams"
}
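A consumer should resolve offload events before any further processing, so that the rest of the code only ever sees full events. The following is a minimal sketch using only the Python standard library. The function names are hypothetical, and the raw message is assumed to be the JSON body received from the stream.

```python
import json
import urllib.request
from typing import Callable


def http_get(url: str) -> bytes:
    """Download the full event with a simple HTTP GET request."""
    with urllib.request.urlopen(url) as response:
        return response.read()


def resolve_event(raw_message: str, fetch: Callable[[str], bytes] = http_get) -> dict:
    """Return the full event, downloading it first if it was offloaded."""
    event = json.loads(raw_message)
    if event.get("type") != "OFFLOAD":
        return event
    # The presigned URL expires, so download it soon after receiving it.
    return json.loads(fetch(event["url"]))
```

The fetch parameter exists so that the download can be replaced in tests or routed through an HTTP client of your choice.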
#
Advanced Analytics
Tilores is designed for realtime use cases and optimized for querying individual entities. If you want to analyze the resulting entities using a SQL-like interface, you can use AWS Athena for this purpose. This requires that the advanced analytics are enabled.
Once enabled, Tilores will automatically snapshot the data that was modified
within the last few minutes and create an entities table and a records table that
can be used from within Athena.
Enabling the analytics module on an existing and populated instance is not recommended, but possible. This requires manual steps depending on the instance size. Please contact service@tilores.io.
#
Entities Table
The entities table contains general information about the entities stored in Tilores. This includes some statistical values, such as number of records or number of edges.
The following table gives an overview of the available fields:
Fields marked with a * are not guaranteed to be stable and might change
without prior notice.
#
Records Table
The records table contains details about the ingested records.
The following table gives an overview of the available fields:
#
Example Queries
Below are some common query examples.
Top 10 largest entities:
SELECT
entity_id,
record_count,
edge_count,
duplicate_count,
update_timestamp,
records
FROM entities
ORDER BY record_count DESC, edge_count DESC
LIMIT 10
Preview of 10 records that were recently updated:
SELECT
record_id,
entity_id,
submit_timestamp,
data
FROM records
ORDER BY submit_timestamp DESC
LIMIT 10
Entities with the most distinct values on a specific field:
SELECT
entity_id,
count(distinct field) AS field_count,
array_agg(distinct field) as field_values
FROM (
SELECT
entity_id,
json_extract(data, '$.myCustomField') AS field -- Adjust the path to one of the fields as defined in the schema
FROM records
)
GROUP BY entity_id
HAVING count(distinct field) > 0
ORDER BY count(distinct field) DESC
LIMIT 100
Record preview of the largest entity:
WITH e AS (
SELECT
*
FROM entities
ORDER BY record_count DESC, edge_count DESC
LIMIT 1
)
SELECT
e.entity_id,
e.edge_count,
r.record_id,
json_parse(r.data) as data
FROM records r
JOIN e ON e.entity_id = r.entity_id
Please refer to the Athena SQL DML documentation for further details on how to query the tables.