For robust profiling and prediction, Taltosh builds a factual event space. Each event is an atomic data item with a primary type and a set of properties (a GPS fix, a bank transaction, an IP communication, a ledger entry), optionally accompanied by additional data such as a timestamp and a payload. However, such columnar data is not always available. This is where Taltosh's proprietary NLP-based factual data extraction comes into play: it converts your unstructured data into a set of relevant events that can be categorized, aggregated, processed and easily analyzed.
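To make the event model concrete, here is a minimal sketch of such an atomic event. The field names are illustrative assumptions for this document, not Taltosh's actual schema:

```python
from dataclasses import dataclass, field
from typing import Any, Optional

# Hypothetical sketch of an atomic event: a primary type, a property map,
# and optional timestamp/payload. Field names are assumptions, not the
# real Taltosh schema.
@dataclass
class Event:
    type: str                        # primary type, e.g. "gps" or "bank_transaction"
    properties: dict[str, Any] = field(default_factory=dict)
    timestamp: Optional[str] = None  # ISO-8601 when known
    payload: Optional[str] = None    # original raw content, if any

e = Event(type="bank_transaction",
          properties={"amount": 120.0, "currency": "EUR"},
          timestamp="2023-01-01T09:30:00Z")
```

Events without a timestamp or payload simply leave those fields unset, which matches the "optionally equipped" wording above.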
Do you have a different kind of data that needs fact extraction? With an on-premise setup, Taltosh engineers quickly set up a fine-tuning engine for your corpus, or sets of corpora, with the appropriate transformation goals. If you work in a specific language, Taltosh can also optimize the workflow and extraction processes to deliver the most precise fact extraction from your unstructured data.
Step 1.
Unstructured data input
Content (unstructured text)
During the trial of Maxim Ye the judge exposed an information about the toxic relationship between Mr. Ye and his spouse Anna T.; Anna reported abuse on friday 2023-01-01 that was allegedly orchestrated by Mr. Ye happened at their home, Oregon Drive 3
Step 2.
Structured data output
Fact Set conversion
{
  "id": 3,
  "tags": ["spot", "drones"],
  "payload": "During the trial of Maxim Ye the judge exposed an information about the toxic relationship between Mr. Ye and his spouse Anna T.; Anna reported abuse on friday 2023-01-01 that was allegedly orchestrated by Mr. Ye happened at their home, Oregon Drive 3",
  "acquisition": {
    "timestamp": "unknown",
    "location": "trial",
    "type": "report"
  },
  "entity": [
    {
      "type": "person",
      "idref": "Maxim Ye",
      "uuid": "maxim_ye_0",
      "identifiers": [],
      "properties": [{}]
    },
    {
      "type": "person",
      "idref": "Anna T.",
      "uuid": "anna_t_0",
      "identifiers": [],
      "properties": [{}]
    },
    {
      "type": "location",
      "idref": "Oregon Drive 3",
      "uuid": "oregon_drive_3_0",
      "identifiers": [],
      "properties": [{}]
    }
  ],
  "activities": [
    {
      "id": 0,
      "type": "presence",
      "date_start": "2023-01-01",
      "date_precision": "precise",
      "tags": ["abuse"],
      "severity": "fact",
      "location": "oregon_drive_3_0",
      "participants": [
        { "entityref": "maxim_ye_0", "role": "emitter" },
        { "entityref": "anna_t_0", "role": "receiver" }
      ]
    }
  ],
  "links": [
    {
      "entity_a": "maxim_ye_0",
      "entity_b": "anna_t_0",
      "strength": "primary",
      "tags": ["marriage", "spouse"],
      "direction": "0"
    }
  ]
}
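A fact set like the one above is only useful downstream if its internal references resolve: every `entityref` in `participants` and every activity `location` must match a declared entity `uuid`. The following consistency check is a hypothetical sketch mirroring the field names of the example, not a documented Taltosh API:

```python
# Hypothetical consistency check for a fact-set record: collect all entity
# uuids, then report any activity reference that does not resolve to one.
def dangling_refs(fact_set: dict) -> list[str]:
    known = {e["uuid"] for e in fact_set.get("entity", [])}
    dangling = []
    for act in fact_set.get("activities", []):
        refs = [p["entityref"] for p in act.get("participants", [])]
        if "location" in act:
            refs.append(act["location"])
        dangling += [r for r in refs if r not in known]
    return dangling
```

For a consistent record the function returns an empty list; any returned uuid points at a reference the persistence step could not resolve.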
Step 3.
Flatten: artifacts, entities, events and links
Persistence
This step homogenizes the data and stores atomic entries of each data type in the respective storage for fast querying and easy visualization.
Entities: as a first step, incoming entities are looked up against existing ones; the algorithm either stores new entities with their properties or enriches existing ones. Additional enrichment also happens in this step, for example resolving GPS coordinates and geohashes for places and addresses, formatting certain data types and resolving references.
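The lookup-then-store-or-enrich behaviour can be sketched as an upsert. The in-memory dict below stands in for the entity store, and the flat `properties` map is a simplification of the example schema; both are assumptions for illustration:

```python
# Hypothetical entity upsert: store a new entity, or merge new properties
# into an existing record with the same uuid (the "enrich" branch).
store: dict[str, dict] = {}

def upsert_entity(entity: dict) -> dict:
    existing = store.get(entity["uuid"])
    if existing is None:
        store[entity["uuid"]] = dict(entity)          # store new entity
    else:
        existing.setdefault("properties", {}).update(  # enrich existing one
            entity.get("properties", {}))
    return store[entity["uuid"]]
```

A real implementation would also do fuzzy matching on names and identifiers rather than relying on an exact uuid hit, but the two branches are the same.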
Events: for each activity, an event is created, and for each entity contained in the activity an additional event is created. This way, querying an entity's activity timeline, or finding all entities that were active around a GPS coordinate, is efficient and fast.
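The fan-out described above, one event per activity plus one per participating entity, can be sketched like this (field names follow the JSON example; the function itself is a hypothetical illustration):

```python
# Hypothetical fan-out: emit an activity-level event, then one event per
# participant, so per-entity timeline queries become a single key lookup.
def fan_out(activity: dict) -> list[dict]:
    base = {"activity_id": activity["id"],
            "type": activity["type"],
            "date": activity.get("date_start")}
    events = [dict(base)]                    # the activity-level event
    for p in activity.get("participants", []):
        events.append({**base, "entity": p["entityref"], "role": p["role"]})
    return events
```

For the example activity with two participants this yields three events: one for the activity itself and one each for `maxim_ye_0` and `anna_t_0`.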
Links: each inferred and explicit link is created in the graph database, enabling quick, efficient and scalable relational mapping, lookup and distance queries on your data.
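To show what a distance query over such links means, here is a minimal sketch using a plain adjacency map and breadth-first search in place of a real graph database (the link fields match the JSON example; the rest is an assumption):

```python
from collections import deque

# Hypothetical hop-distance query over undirected entity links:
# returns the minimum number of links between a and b, or -1 if unreachable.
def hops(links: list[dict], a: str, b: str) -> int:
    adj: dict[str, set] = {}
    for l in links:
        adj.setdefault(l["entity_a"], set()).add(l["entity_b"])
        adj.setdefault(l["entity_b"], set()).add(l["entity_a"])
    seen, queue = {a}, deque([(a, 0)])
    while queue:
        node, dist = queue.popleft()
        if node == b:
            return dist
        for n in adj.get(node, ()):
            if n not in seen:
                seen.add(n)
                queue.append((n, dist + 1))
    return -1
```

A graph database answers the same question natively and at scale; the sketch only illustrates the semantics.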
Even if your case seems utterly complex, we might already have a case resolver waiting for you!