Data Hub & lineage
One hub for every shape of data — with the receipts attached.
Structured rows, unstructured documents and IoT telemetry rarely live together. In the Indentia Data Hub they do, joined by what they mean rather than by where they came from. Every record arrives with full lineage, so every answer comes with a paper trail.
How it combines
Three data shapes, one entity-keyed hub.
Reactive & proactive
Acts when the data does. Looks when it doesn't.
Some sources emit events the moment something changes: a chat message arrives, a sensor crosses a threshold, a record gets updated. The hub reacts immediately. Other sources (old databases, file shares, archive systems) never tell you anything. The hub scans those on a schedule, detects deltas, and pulls only what's new or changed. One model, two behaviours, no gaps.
- Reactive — webhooks, CDC streams, NATS events, IoT telemetry. New data is visible within seconds.
- Proactive — scheduled crawlers with delta detection. Only changed rows / files / objects come through.
- Cross-source joins — a customer's contract (structured), their support emails (unstructured) and their device telemetry (IoT) all link to the same entity.
- One query language — SPARQL over the unified graph. Lineage and data live side by side.
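To make the entity-keyed idea concrete, here is a minimal sketch in Python. It is an illustration only, not the hub's implementation: the `Record` type, the `customer:42` style identifiers and the source names are all hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass

@dataclass
class Record:
    entity_id: str   # hypothetical shared key, e.g. a customer identifier
    shape: str       # "structured", "unstructured" or "iot"
    source: str      # where the record came from
    payload: dict

def group_by_entity(records):
    """Index records by entity so every shape for one entity sits together."""
    hub = defaultdict(list)
    for record in records:
        hub[record.entity_id].append(record)
    return dict(hub)

records = [
    Record("customer:42", "structured", "crm.contracts", {"contract": "C-1001"}),
    Record("customer:42", "unstructured", "support-mailbox", {"subject": "Device offline"}),
    Record("customer:42", "iot", "telemetry-stream", {"device": "D-7", "temp_c": 81.5}),
    Record("customer:77", "structured", "crm.contracts", {"contract": "C-2002"}),
]

hub = group_by_entity(records)
shapes_for_42 = sorted(r.shape for r in hub["customer:42"])
```

The contract, the support email and the telemetry reading end up under one key even though they arrived from three different systems.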
Capabilities
What the hub does for the data.
One hub for all data shapes
Structured tables, unstructured documents and IoT telemetry land in the same hub. Joined by entity (an order, a sensor, a person, a contract) — not by file location.
Lineage on every record
Every record carries an OpenLineage chain back to its source: which file, which sensor, which transformation, which approval. Answer the regulator with a query, not a forensic exercise.
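A backward trace of this kind can be sketched as a walk over a small lineage map. The structure below is a simplified stand-in chosen for illustration, not the OpenLineage event schema, and every identifier in it is made up.

```python
# Each key is an output; its entry names the step that produced it and
# that step's inputs. Simplified stand-in, NOT the OpenLineage schema.
LINEAGE = {
    "report:q3-revenue": {
        "step": "approve:finance-signoff",
        "inputs": ["table:revenue_clean"],
    },
    "table:revenue_clean": {
        "step": "transform:dedupe",
        "inputs": ["file:invoices.csv", "sensor:meter-17"],
    },
}

def trace_to_sources(node, lineage):
    """Return (steps, sources): each step on the way back, and the root inputs."""
    steps, sources = [], set()
    stack, seen = [node], set()
    while stack:
        current = stack.pop()
        if current in seen:
            continue
        seen.add(current)
        entry = lineage.get(current)
        if entry is None:
            # Nothing in the map produced this node, so treat it as a source.
            sources.add(current)
            continue
        steps.append(entry["step"])
        stack.extend(entry["inputs"])
    return steps, sorted(sources)

steps, sources = trace_to_sources("report:q3-revenue", LINEAGE)
```

Here `steps` lists the approval and the transformation, and `sources` lists the file and the sensor the report ultimately rests on.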
Reactive ingestion
New events trigger pipelines automatically. A document drop, an IoT signal or a row change each fan out to the consumers that care, with backpressure so a fast producer cannot overwhelm a slow consumer.
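One common way to get fan-out with backpressure is a bounded queue per consumer. The sketch below uses Python's `asyncio.Queue` as a stand-in for whatever transport the hub uses; the consumer names, queue sizes and delays are assumptions for illustration.

```python
import asyncio

async def fan_out(events, consumer_queues):
    """Deliver every event to every consumer queue.

    `put` waits while a bounded queue is full. That wait is the
    backpressure: the producer slows to the pace of the slowest consumer.
    """
    for event in events:
        for queue in consumer_queues:
            await queue.put(event)
    for queue in consumer_queues:
        await queue.put(None)  # sentinel: no more events

async def consume(queue, handled, delay):
    """Drain one queue until the sentinel arrives."""
    while True:
        event = await queue.get()
        if event is None:
            break
        await asyncio.sleep(delay)  # simulate per-event work
        handled.append(event)

async def main():
    events = [f"event-{i}" for i in range(5)]
    queues = [asyncio.Queue(maxsize=2), asyncio.Queue(maxsize=2)]
    search_seen, audit_seen = [], []  # hypothetical consumers
    await asyncio.gather(
        fan_out(events, queues),
        consume(queues[0], search_seen, 0.0),
        consume(queues[1], audit_seen, 0.01),
    )
    return search_seen, audit_seen

search_seen, audit_seen = asyncio.run(main())
```

Both consumers receive all five events in order. The slower one never holds more than two events in its queue, because the producer waits instead of piling them up.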
Proactive scanning
For sources that don't emit events, the hub scans on a schedule — with delta detection so unchanged data doesn't get re-processed.
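Delta detection can be done in several ways. A minimal sketch, assuming content hashing is acceptable for the source, keeps a fingerprint per object from the last scan and forwards only keys that are new or whose fingerprint changed. File names and contents below are invented.

```python
import hashlib

def fingerprint(content: bytes) -> str:
    """Stable digest of an object's content."""
    return hashlib.sha256(content).hexdigest()

def detect_deltas(current, previous_fingerprints):
    """Compare this scan with the previous one.

    Returns (changed, new_fingerprints). `changed` holds keys that are
    new or whose content hash differs from the previous scan.
    """
    new_fingerprints = {
        key: fingerprint(content) for key, content in current.items()
    }
    changed = sorted(
        key
        for key, digest in new_fingerprints.items()
        if previous_fingerprints.get(key) != digest
    )
    return changed, new_fingerprints

# First scan: everything counts as new.
first_scan = {"a.csv": b"1,2,3", "b.csv": b"4,5,6"}
changed_1, state = detect_deltas(first_scan, {})

# Second scan: a.csv untouched, b.csv edited, c.csv added.
second_scan = {"a.csv": b"1,2,3", "b.csv": b"4,5,7", "c.csv": b"8,9"}
changed_2, state = detect_deltas(second_scan, state)
```

On the second scan only `b.csv` and `c.csv` come through. This sketch does not handle deletions; a real scanner would also compare the key sets.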
Data contracts
Each producer publishes a contract: schema, freshness, SLA. Breakages are caught at the boundary, not deep inside a downstream notebook.
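A boundary check of this kind might look like the sketch below. The `Contract` shape is hypothetical and covers only schema and freshness; SLA terms are left out because the text does not say how they are expressed.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass
class Contract:
    schema: dict        # field name -> expected Python type
    max_age: timedelta  # freshness bound

def violations(contract, row, produced_at, now):
    """List every way a row breaks the contract (empty list means it passes)."""
    problems = []
    for field, expected in contract.schema.items():
        if field not in row:
            problems.append(f"missing field: {field}")
        elif not isinstance(row[field], expected):
            problems.append(f"wrong type for {field}")
    if now - produced_at > contract.max_age:
        problems.append("stale: older than freshness bound")
    return problems

contract = Contract(
    schema={"order_id": str, "amount": float},
    max_age=timedelta(hours=1),
)
now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)

ok = violations(
    contract, {"order_id": "O-1", "amount": 9.5}, now - timedelta(minutes=5), now
)
bad = violations(contract, {"order_id": 17}, now - timedelta(hours=3), now)
```

The second row is rejected at ingestion with three named problems, rather than surfacing later as a confusing failure downstream.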
Lineage and data, same store
Lineage is RDF in the same knowledge graph as the data itself. Query "show me every report that depended on this dataset" with one SPARQL statement.
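As an illustration, such a query might look like the string below. It assumes lineage is modelled with the W3C PROV-O property `prov:wasDerivedFrom` and a hypothetical `ex:` namespace; the hub's actual vocabulary may differ. The function underneath performs the same transitive traversal in plain Python over a toy triple list, so the logic can be followed without a triple store.

```python
# Assumed vocabulary: PROV-O for derivation, a made-up ex: namespace for data.
IMPACT_QUERY = """
PREFIX prov: <http://www.w3.org/ns/prov#>
PREFIX ex:   <http://example.org/>
SELECT DISTINCT ?report WHERE {
  ?report a ex:Report ;
          prov:wasDerivedFrom+ ex:salesDataset .
}
"""

# Toy triples standing in for the knowledge graph.
TRIPLES = [
    ("ex:cleanSales", "prov:wasDerivedFrom", "ex:salesDataset"),
    ("ex:q3Report", "prov:wasDerivedFrom", "ex:cleanSales"),
    ("ex:churnReport", "prov:wasDerivedFrom", "ex:supportTickets"),
    ("ex:q3Report", "rdf:type", "ex:Report"),
    ("ex:churnReport", "rdf:type", "ex:Report"),
]

def reports_depending_on(dataset, triples):
    """Plain-Python equivalent of the one-or-more path in IMPACT_QUERY."""
    derived = {}  # source -> things derived directly from it
    for s, p, o in triples:
        if p == "prov:wasDerivedFrom":
            derived.setdefault(o, set()).add(s)
    reports = {s for s, p, o in triples if p == "rdf:type" and o == "ex:Report"}

    found, seen, stack = set(), set(), [dataset]
    while stack:
        node = stack.pop()
        for child in derived.get(node, ()):
            if child not in seen:
                seen.add(child)
                stack.append(child)
                if child in reports:
                    found.add(child)
    return sorted(found)

impacted = reports_depending_on("ex:salesDataset", TRIPLES)
```

The `+` in the query is a SPARQL 1.1 property path meaning "one or more hops", which is what lets a single statement follow the dependency through intermediate datasets.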
Available to
Once it's in the hub, it's everywhere it needs to be.
Search
Hybrid retrieval that joins structured rows with unstructured paragraphs and live signals.
Agents
Multi-step agents reason across all three shapes — with lineage attached to every claim.
Analytics & BI
Lineage-aware datasets feed dashboards, notebooks and forecasting models.
Audit & compliance
Trace any output backward to every source that touched it.
Lineage in practice
Trace any answer back to every source that shaped it.
A regulator asks "where did this number come from?". A controller asks "which contracts referenced this clause version?". A model owner asks "which datasets train this classifier?". With lineage co-located in the knowledge graph, those become one-line queries — not month-long forensic projects.