Guides

Stream Lineage

Visualize producers, topics, processors, and consumers as a graph

Overview

Stream Lineage renders an interactive, left-to-right graph of every producer, topic, processor (Managed Connect, Flink SQL, Tableflow), lake table, and consumer group running in your (business_unit, stage) context. It answers the question "who produces to and consumes from this topic, and what runs in between?" without leaving the self-service portal.

The graph is reconstructed on demand by querying Confluent Cloud directly — Managed Connect, Flink SQL v1, Tableflow, the Telemetry (Metrics) API, the Schema Registry, and the Kafka REST endpoint — and mapping the results to the open OpenLineage object model. Nothing is stored: every page load rebuilds the graph from authoritative sources (with a short cache window — see Caching).

Access

Stream Lineage is admin-only. Both the page (/ui/lineage) and the backing endpoint (GET /{business_unit}/{stage}/lineage) require either a BU admin, super-admin, or platform admin role.

Why admin-only?

On shared clusters, the lineage graph is cluster-wide — it returns every edge regardless of BU naming prefix. Restricting it to admins prevents tenants from seeing each other's producers, consumer groups, connectors, and Flink statements.

How It Works

On every uncached page load the backend issues parallel async calls to six Confluent sources and merges the results into a single OpenLineage event stream. The frontend then bootstraps a Cytoscape.js graph with a dagre layout.

Source Confluent API Contributes
connect Managed Connect REST Source & sink connector topic bindings
flink Flink SQL v1 + Compute Pools (fcpm/v2) RUNNING Flink statements → topic-to-topic edges (parsed from the SQL)
tableflow Tableflow v1 Topic → Delta / Iceberg table sync edges
dataflow Telemetry API /v2/metrics/dataflow/query Producer & consumer-group edges + per-topic throughput facet. Surfaces non-committing consumers (e.g. Spark Structured Streaming) the documented consumer_lag_offsets metric never returns.
dataflow_ksql Telemetry API (ksqlDB query metrics, gated on cluster presence) ksqlDB query → topic edges (skipped silently when no ksqlDB cluster exists)
schema_registry Schema Registry Per-topic value & key schemas (facet enrichment)
kafka_rest Kafka REST v3 Topic partition counts & replication factor (facet enrichment)

Each source runs independently. If one fails (auth, transport, 5xx), the others still contribute — see Partial Responses.

Reading the Graph

Data flows left to right. Two node families render distinctly:

Dataset nodes (pink chips)

Kafka topics, Delta tables, and Iceberg tables. Topic nodes carry partition / replication / schema facets when the enrichments succeed.

Job nodes (teal chips)

Anything that moves data: Connect connectors, Flink statements, Tableflow syncs, producer client IDs, and consumer groups. Job nodes carry source-specific facets (connector class, Flink SQL excerpt, table format, throughput, etc.).

Badge nodes (aggregated)

When several producers or consumer groups attach to a single topic, they collapse into one PRODUCERS · N or CONSUMER GROUPS · N badge. Click the badge to expand the individual nodes inline.

Toolbar & Interaction

Action Behaviour
Click a node Opens a right side-panel listing every OpenLineage facet attached to that node.
Click a badge Expands the aggregated producers / consumer groups into individual nodes.
Refresh Cache-busts the browser fetch. The server-side 60 s cache still applies; see Caching.
Export PNG Downloads the full graph (not just the viewport) at 2× scale.

Caching

Lineage responses are cached server-side for 60 seconds, keyed by (business_unit, stage). Repeated loads within the window share a single build — important because a full graph build can fan out a dozen or more API calls against Confluent.

The browser also gets Cache-Control: private, max-age=60. Newly created resources (topics, connectors, consumer groups) show up on the next build after the TTL expires.

Partial Responses

Source failures never return a 5xx. The endpoint always answers 200 OK with the events it could build, and adds two response headers when one or more sources failed:

X-Lineage-Partial: true
X-Lineage-Failed-Sources: dataflow,flink

A yellow banner above the graph enumerates the failed sources so you can tell at a glance whether you are looking at the full picture or a degraded view. The most common causes:

Failed source Likely cause Fix
dataflow / dataflow_ksql Admin Cloud API key is missing the MetricsViewer role. Grant MetricsViewer at org or env scope on the admin service account.
flink Compute pool listing returned 5xx or Flink SQL v1 rejected the credential. Verify the per-landing-zone flink_<region>_client_id / _secret entries in App Configuration.
connect At least one connector's config could not be parsed. Check application logs for the offending connector name; the entire source fails on a single parse error by design.
schema_registry / kafka_rest Per-context credentials missing or expired. Same credentials used by Topics & Schemas pages — verify those work first.

Flink not configured ≠ failure

A context with zero Flink compute pools, or pools whose region has no credentials in App Configuration, simply contributes no Flink edges. This is not a partial failure — no header is set, no banner appears.

Limitations

Sources

No ksqlDB or self-managed Connect

EON does not run ksqlDB or self-managed Connect clusters today. Statements / connectors running outside Confluent Managed Connect & Cloud Flink will not appear.

Consumers

Spark / non-committing consumers

Consumers that use assign() instead of subscribe() (e.g. Spark Structured Streaming) don't commit offsets and are invisible to the consumer_lag_offsets metric — they won't render as consumer-group nodes.

Flink

SQL parsing is regex-based

Complex Flink SQL patterns (CTEs, lateral joins, deeply nested subqueries) may not yield edges. The statement still renders as a job node — only its input / output detection is affected.

Metrics

10-minute lookback window

Producer and consumer edges come from the Telemetry API with a default 10-minute lookback. Clients that haven't produced or committed in that window won't show up. Tunable via CONFLUENT_METRICS_LOOKBACK_MINUTES.

Lineage

Topic-level only

Column / field-level lineage is not extracted. Schemas are attached as facets on topic nodes for inspection, but edges are always topic-to-job-to-topic.

Mobile

Desktop-only

Below 768 px the page shows an "open on desktop" placeholder. Large DAGs with badges are not usable on a phone screen.

Esc