Structured data
Every warehouse, database and lake — behind one SQL endpoint.
Structured data sits in too many systems to query by hand. Indentia gives you a single, ludicrously fast SQL surface across all of them: a federated query engine behind a sovereign gateway, a self-service BI tool on top, and a metadata catalogue that finally explains what each column means and where it came from.
The stack
A small set of well-chosen open-source engines, wired into one platform.
Trino, Superset and DataHub are open source — Indentia integrates them through the sovereign query gateway and shares lineage with the rest of the platform. No new query language to learn, no vendor lock-in at the SQL layer.
Trino
Ludicrously fast. Massively parallel. Open source.
Trino is an open-source distributed SQL engine designed to query large data sets spread across one or more disparate sources. Indentia uses it as the workhorse behind the query gateway: work is pushed down to whichever engine runs it best, execution is parallelised across a coordinator and a fleet of workers, and the results come back as a single SQL response.
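As a rough illustration of the coordinator/worker split, here is a minimal Python sketch, not Trino's actual scheduler: a hypothetical coordinator cuts a table into splits, threads stand in for worker nodes computing a partial `SUM(amount) GROUP BY region`, and the coordinator merges the partial results. The names `make_splits`, `partial_sum` and `coordinate` are illustrative only.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def make_splits(rows, n_splits):
    """Divide rows into roughly equal contiguous splits (hypothetical split scheme)."""
    size = max(1, -(-len(rows) // n_splits))  # ceiling division, at least 1
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def partial_sum(split):
    """Worker step: partial SUM(amount) GROUP BY region over a single split."""
    acc = Counter()
    for region, amount in split:
        acc[region] += amount
    return acc


def coordinate(rows, n_workers=4):
    """Coordinator step: fan splits out to workers, then merge the partial aggregates."""
    splits = make_splits(rows, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partials = list(pool.map(partial_sum, splits))
    final = Counter()
    for partial in partials:
        final.update(partial)  # Counter.update adds counts per key
    return dict(final)


rows = [("eu", 10), ("us", 5), ("eu", 7), ("apac", 3), ("us", 1)]
print(coordinate(rows, n_workers=2))
```

Partial-then-final aggregation is what makes the work parallel: each worker only sees its own split, and merging per-group sums on the coordinator is cheap.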
One join across two systems
Customers in Postgres, orders in Snowflake, products in Iceberg: one SQL statement, and the engine decides where each piece runs.
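Conceptually, a join that spans two systems cannot run inside either one, so the engine reads the (already filtered) rows from each connector and joins them itself. A minimal sketch of that step as an in-memory hash join, assuming hypothetical `customers` rows from Postgres and `orders` rows from Snowflake; the catalog and schema names in the SQL comment are also assumptions.

```python
from collections import defaultdict

# Roughly equivalent SQL (catalog/schema names are hypothetical):
#   SELECT * FROM postgres.crm.customers c
#   JOIN snowflake.sales.orders o ON c.id = o.customer_id

# Hypothetical rows, standing in for what each connector returns.
customers_pg = [  # from Postgres
    {"id": 1, "name": "Acme"},
    {"id": 2, "name": "Globex"},
]
orders_sf = [  # from Snowflake
    {"order_id": 10, "customer_id": 1, "amount": 250},
    {"order_id": 11, "customer_id": 1, "amount": 100},
    {"order_id": 12, "customer_id": 3, "amount": 75},
]


def hash_join(build, probe, build_key, probe_key):
    """Inner equi-join: hash the build side (typically the smaller), stream the probe side."""
    table = defaultdict(list)
    for row in build:
        table[row[build_key]].append(row)
    for row in probe:
        # .get() avoids inserting empty lists for probe keys with no match
        for match in table.get(row[probe_key], []):
            yield {**match, **row}  # probe-side columns win on name collisions


for joined in hash_join(customers_pg, orders_sf, "id", "customer_id"):
    print(joined)
```

Order 12 references a customer that does not exist, so an inner join drops it.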
Right tool for the right query
Analytical scans go through Trino; OLTP point lookups go directly to the source. The gateway routes; you write standard SQL.
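The routing rule can be sketched as a small decision function. This is a hypothetical heuristic, not Indentia's documented logic: `PlannedQuery`, its fields and the `POINT_LOOKUP_MAX_ROWS` threshold are assumptions standing in for whatever the gateway derives from parsing and planning a statement.

```python
from dataclasses import dataclass


@dataclass
class PlannedQuery:
    """Hypothetical summary a gateway might derive from a parsed statement."""
    sources: frozenset     # catalogs the statement touches
    is_point_lookup: bool  # e.g. an equality predicate on a primary key
    estimated_rows: int    # planner's row estimate


POINT_LOOKUP_MAX_ROWS = 100  # assumed threshold, not a documented setting


def route(q: PlannedQuery) -> str:
    """Return 'source' to send the query straight to its single source, else 'trino'."""
    if (
        len(q.sources) == 1
        and q.is_point_lookup
        and q.estimated_rows <= POINT_LOOKUP_MAX_ROWS
    ):
        return "source"
    return "trino"


print(route(PlannedQuery(frozenset({"postgres"}), True, 1)))
```

Anything that touches more than one source has to go through the federated engine, since no single source can execute it.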
Capabilities
What you get on top of plain Trino.
Federated SQL across every source
One SQL endpoint over Postgres, Oracle, SQL Server, MySQL, Snowflake, BigQuery, Iceberg, Parquet on S3, and more. Query a Postgres table and a Snowflake table in the same statement: filters, projections and, where the connector supports it, aggregations are pushed down to each source, and the engine performs the cross-source join itself.
Sovereign query gateway
Every SQL statement passes through an Indentia query gateway that enforces tenancy, row-level ACLs and per-source rate limits. Each statement is auditable from query plan to bytes returned, even when the underlying engine is a third-party warehouse.
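Per-source rate limiting is commonly built from token buckets, one per source. A minimal sketch assuming that design; `TokenBucket`, `SourceRateLimiter` and the limits are hypothetical, and the clock is injectable so the behaviour can be tested without sleeping.

```python
import time


class TokenBucket:
    """Token bucket: refills `rate` tokens per second, holding at most `capacity`."""

    def __init__(self, rate: float, capacity: float, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.clock = clock
        self.tokens = capacity  # start full
        self.last = clock()

    def try_acquire(self, cost: float = 1.0) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False


class SourceRateLimiter:
    """One bucket per source catalog; limits are (rate, capacity) pairs."""

    def __init__(self, limits: dict, clock=time.monotonic):
        self.buckets = {
            src: TokenBucket(rate, cap, clock) for src, (rate, cap) in limits.items()
        }

    def admit(self, source: str) -> bool:
        bucket = self.buckets.get(source)
        # Unknown sources are rejected (fail closed).
        return bucket is not None and bucket.try_acquire()


limiter = SourceRateLimiter({"snowflake": (1.0, 2), "postgres": (50.0, 100)})
print(limiter.admit("snowflake"))
```

Rejecting unconfigured sources rather than admitting them is a fail-closed choice that suits a gateway enforcing tenancy.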
Self-service BI on top
Superset dashboards plug directly into the same query gateway. Business analysts pick a dataset; security and lineage come along for free.
A catalogue that knows the why
DataHub catalogues every dataset, column and dashboard with lineage, ownership, tags, glossary terms and quality checks. Search "monthly recurring revenue" — get the certified dataset, its owner, the joins behind it and which reports depend on it.
Lineage end to end
OpenLineage events from ingest jobs, dbt models, Superset queries and agents all land in DataHub. Trace any answer back to every source row that shaped it.
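For a sense of what lands in DataHub, here is a minimal sketch of an OpenLineage `RunEvent` built as a plain dict. The top-level fields follow the OpenLineage object model; the producer URI, the spec version in `schemaURL`, and the job and dataset names are assumptions for illustration.

```python
import json
import uuid
from datetime import datetime, timezone

# Hypothetical producer URI identifying the emitting component.
PRODUCER = "https://example.com/indentia/ingest"
# The spec version in this URL is an assumption; use the one your emitter targets.
SCHEMA_URL = "https://openlineage.io/spec/2-0-2/OpenLineage.json#/$defs/RunEvent"


def run_event(event_type, job_namespace, job_name, inputs, outputs, run_id=None):
    """Build a minimal OpenLineage RunEvent; inputs/outputs are (namespace, name) pairs."""
    return {
        "eventType": event_type,  # e.g. "START", "COMPLETE", "FAIL"
        "eventTime": datetime.now(timezone.utc).isoformat(),
        "run": {"runId": run_id or str(uuid.uuid4())},
        "job": {"namespace": job_namespace, "name": job_name},
        "inputs": [{"namespace": ns, "name": n} for ns, n in inputs],
        "outputs": [{"namespace": ns, "name": n} for ns, n in outputs],
        "producer": PRODUCER,
        "schemaURL": SCHEMA_URL,
    }


ev = run_event(
    "COMPLETE",
    "dbt",
    "mrr_monthly",
    inputs=[("postgres://db.example.com:5432", "crm.public.customers")],
    outputs=[("iceberg://lake", "analytics.mrr")],  # hypothetical namespace
)
print(json.dumps(ev, indent=2))
```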
Joinable with everything else
Structured rows share entity IRIs with the documents, conversations and IoT signals in the knowledge graph. A customer in your CRM, their support emails and their device telemetry all link to the same Person/Organization entity.
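Linking works when every system mints the same IRI for the same real-world entity. A deliberately simplified sketch, assuming a hypothetical IRI namespace and matching organisations on their email or web domain; real entity resolution needs fuzzier matching than this.

```python
from urllib.parse import quote

BASE = "https://kg.example.com/id/"  # hypothetical IRI namespace


def entity_iri(kind: str, key: str) -> str:
    """Mint a stable IRI from an entity type and a normalised business key."""
    return f"{BASE}{kind}/{quote(key.strip().lower(), safe='')}"


# Hypothetical records from three systems describing the same organisation.
crm_row = {"account": "Acme Corp", "domain": "ACME.com"}
email = {"from": "jane@acme.com"}
telemetry = {"device": "pump-17", "owner_domain": "acme.com "}

links = {
    "crm": entity_iri("Organization", crm_row["domain"]),
    "email": entity_iri("Organization", email["from"].split("@", 1)[1]),
    "telemetry": entity_iri("Organization", telemetry["owner_domain"]),
}
print(links)
```

Normalising the key (trimming, lower-casing) before minting is what lets differently formatted source values collapse onto one entity.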
Sources
Where the data lives.
Connect to the systems you already run. No copying, no shadow warehouse. The gateway federates over them and pushes work down to where it belongs.
- Warehouses — Snowflake, BigQuery, Redshift, Synapse — read-only via push-down SQL.
- Databases — Postgres, Oracle, SQL Server, MySQL, MariaDB, MongoDB.
- Lakehouse — Iceberg, Delta, Parquet on S3 / Garage / MinIO / Azure Blob.
- Streaming — Kafka, Pulsar, NATS — query in-flight events alongside historical data.
Consumers
Where the answers go.
Same identity, same ACLs, same lineage — whether the query is submitted by a person or an agent.
- Indentia agents — Autonomous Agents query the same SQL surface, reasoning across structured and unstructured data at once.
- Superset — Dashboards, alerts, exports — under the same identity and ACLs as the rest of the platform.
- Notebooks & tools — JDBC / ODBC clients, Jupyter, Tableau, Power BI — connect once, scope by role.
- External lakes — Materialize curated views into Iceberg for downstream analytics teams.
DataHub — the metadata catalogue
A catalogue that knows where every number came from.
DataHub indexes every dataset, column, dashboard and pipeline. It records ownership, tags, glossary terms, data-quality checks and full lineage — from raw source through dbt models and Superset dashboards. Indentia stores DataHub's metadata in the same knowledge graph as the rest of the platform, so a query about "what's behind this number" returns a tree, not a guess.
Discover
Search across every dataset, dashboard and pipeline in your estate.
Govern
Tag PII, certify trusted datasets, attach contracts and SLAs.
Trace
One click from a number on a dashboard to every source row that produced it.
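Under the hood, tracing a number back is a walk over lineage edges. A minimal dataset-level sketch with hypothetical node names; it does not use DataHub's API, just a breadth-first search from a chart to its raw sources.

```python
from collections import deque

# Hypothetical lineage edges: each node maps to the nodes it was derived from.
UPSTREAM = {
    "superset:chart/mrr": ["dbt:model/mrr_monthly"],
    "dbt:model/mrr_monthly": ["dbt:model/stg_subscriptions", "dbt:model/stg_customers"],
    "dbt:model/stg_subscriptions": ["snowflake:billing.subscriptions"],
    "dbt:model/stg_customers": ["postgres:crm.customers"],
}


def trace_upstream(node, upstream=UPSTREAM):
    """Breadth-first walk to every ancestor; returns (ancestors, raw sources)."""
    seen, queue = set(), deque([node])
    while queue:
        current = queue.popleft()
        for parent in upstream.get(current, []):
            if parent not in seen:  # guards against revisiting shared ancestors
                seen.add(parent)
                queue.append(parent)
    # Raw sources are ancestors with no further upstream edges.
    sources = {n for n in seen if not upstream.get(n)}
    return seen, sources


ancestors, sources = trace_upstream("superset:chart/mrr")
print(sorted(sources))
```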
Sovereign by construction
Your structured data stays in your own systems.
The gateway federates; it doesn't centralise. Work is pushed down to the source engines, so only the rows a query actually needs leave each source. Self-hosted and air-gapped deployments work without any SaaS dependency. Lineage, catalogue and audit live inside your perimeter, right alongside the data they describe.
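Push-down is what keeps data movement small. A toy sketch, with a hypothetical `ToySource` standing in for a connector, that compares how many rows leave the source when the filter runs at the source versus in the engine.

```python
from typing import Callable, Iterable, Optional

Row = dict
Predicate = Callable[[Row], bool]


class ToySource:
    """Stand-in for a connector; counts how many rows leave the source."""

    def __init__(self, rows):
        self.rows = rows
        self.rows_sent = 0

    def scan(self, predicate: Optional[Predicate] = None) -> Iterable[Row]:
        for row in self.rows:
            # With push-down, the source evaluates the filter before sending.
            if predicate is None or predicate(row):
                self.rows_sent += 1
                yield row


def query(source: ToySource, predicate: Predicate, push_down: bool):
    if push_down:
        return list(source.scan(predicate))
    # Without push-down, every row crosses the wire and the engine filters it.
    return [r for r in source.scan() if predicate(r)]


rows = [{"id": i, "country": "NL" if i % 10 == 0 else "DE"} for i in range(100)]
pushed, pulled = ToySource(rows), ToySource(rows)
is_nl = lambda r: r["country"] == "NL"
query(pushed, is_nl, push_down=True)
query(pulled, is_nl, push_down=False)
print(pushed.rows_sent, pulled.rows_sent)
```

Both paths return the same answer; only the volume of data that leaves the source differs.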