Guides
Stream Lineage
Visualize producers, topics, processors, and consumers as a graph
Overview
Stream Lineage renders an interactive, left-to-right graph of every producer, topic, processor
(Managed Connect, Flink SQL, Tableflow), lake table, and consumer group running in your
(business_unit, stage) context. It answers the question
"who produces to and consumes from this topic, and what runs in between?" without leaving
the self-service portal.
The graph is reconstructed on demand by querying Confluent Cloud directly — Managed Connect, Flink SQL v1, Tableflow, the Telemetry (Metrics) API, the Schema Registry, and the Kafka REST endpoint — and mapping the results to the open OpenLineage object model. Nothing is stored: every page load rebuilds the graph from authoritative sources (with a short cache window — see Caching).
Two surfaces share the same backend:
-
The full-lineage page at
/ui/lineage— cluster-wide graph for the current context. -
The Stream Lineage card on the topic page
(
/ui/topics/{bu}/{stage}/{topic}) — a 1-hop subgraph centred on the focused topic. It calls the same endpoint with an extra?topic=<name>query param that pushes the topic filter server-side into the Metrics queries, so the response is small and fast.
Reference
Access
Stream Lineage is writer-gated: developers, BU admins, super-admins, and
platform admins can view it. Viewers and unauthenticated users are denied. The gate applies
to both the page (/ui/lineage) and the backing endpoint
(GET /{business_unit}/{stage}/lineage); the sidebar link
is also hidden for viewers so they normally never reach the page.
Why writer-gated?
How It Works
On every uncached page load the backend issues parallel async calls to seven Confluent sources and merges the results into a single OpenLineage event stream. The frontend then bootstraps a Cytoscape.js graph with a dagre layout.
When the request carries ?topic=<name> (the topic-page
card always does), the dataflow source AND-filters every
Metrics query on metric.topic so the API returns only rows
involving that topic. Cluster-wide callers (/ui/lineage) omit
the param and get the full graph.
| Source | Confluent API | Contributes |
|---|---|---|
connect |
Managed Connect REST | Connector name + direction (source/sink) per lcc-* id. Used to reclassify the matching dataflow producer / consumer-group edges into Connect source / Connect sink nodes — the topic bindings themselves come from telemetry, not the static connector config. |
flink |
Flink SQL v1 + Compute Pools (fcpm/v2) | RUNNING Flink statements → topic-to-topic edges (parsed from the SQL) |
tableflow |
Tableflow v1 | Topic → Delta / Iceberg table sync edges |
dataflow |
Telemetry API /v2/metrics/dataflow/query |
Producer & consumer-group edges + per-topic throughput facet. Three queries (bytes_in, bytes_out, consumer_lag_offsets) fired in parallel — matching exactly what the Confluent Cloud portal fires. Surfaces non-committing consumers (e.g. Spark Structured Streaming) the documented consumer_lag_offsets metric never returns. |
dataflow_ksql |
Telemetry API (ksqlDB query metrics, gated on cluster presence) | ksqlDB query → topic edges (skipped silently when no ksqlDB cluster exists) |
schema_registry |
Schema Registry | Per-topic value & key schemas (facet enrichment) |
kafka_rest |
Kafka REST v3 | Topic partition counts & replication factor (facet enrichment) |
Each source runs independently. If one fails (auth, transport, 5xx), the others still contribute — see Partial Responses.
Reading the Graph
Data flows left to right. Two node families render distinctly:
Dataset nodes
Kafka topics (pink), Delta tables and Iceberg tables (indigo). Topic nodes carry partition, replication, and schema facets when the enrichments succeed.
Job nodes
Anything that moves data, colour-coded by kind: producers and consumer groups (teal), bidirectional apps (cyan), Connect source connectors (amber), Connect sink connectors (orange), Flink statements (violet), ksqlDB queries (sky), Tableflow syncs (slate). Job nodes carry kind-specific facets (connector class, Flink SQL excerpt, table format, throughput, etc.).
Badge nodes (aggregated, currently disabled)
The graph supports collapsing groups of producers / consumer groups that share an identical topic-set into one PRODUCERS · N or CONSUMER GROUPS · N badge — the same rule the Confluent Cloud portal uses. Aggregation is intentionally turned off in the current build while its effect on real graphs is being evaluated; every producer and consumer group renders as an individual node. The toolbar's Expand all groups button is a no-op until aggregation is re-enabled.
Toolbar & Interaction
| Action | Behaviour |
|---|---|
| Click a node | Opens a right side-panel listing every OpenLineage facet attached to that node. |
| Period selector | Switches the telemetry lookback window (presets from 10 minutes up to 24 hours). The window is sent as ?start=&end= on the lineage fetch and drives the dataflow queries. The selection is not persisted — every page load resets to the surface's default: Last 10 minutes on /ui/lineage, Last hour on the topic card. Tight defaults keep both pages fast on entry; the Metrics API gets slow at long windows. |
| Search | Type-ahead search across node labels. Selecting a result centres and highlights the node. |
| Fit to screen | Re-runs the layout fit so the whole graph is visible in the canvas. |
| Hide internal resources | Hides Confluent-internal producers, consumer groups, and topics (e.g. _confluent-*) so the graph shows only user-owned flows. |
| Refresh | Cache-busts the browser fetch. The server-side 60 s cache still applies; see Caching. |
| Export PNG | Downloads the full graph (not just the viewport) at 2× scale. |
Caching
Lineage responses are cached server-side for 60 seconds via aiocache, keyed by
(business_unit, stage, window.start, window.end, topic).
Repeated loads within the window share a single build — important because a full graph build can
fan out a dozen or more API calls against Confluent. Cluster-wide (topic=None)
and topic-scoped responses occupy independent cache slots so they don't poison each other.
Picking a different period in the toolbar produces a new cache entry; the default key rolls
forward every minute as the window's end is snapped to the current minute.
The browser also gets Cache-Control: private, max-age=60.
Newly created resources (topics, connectors, consumer groups) show up on the next build after the
TTL expires.
Partial Responses
Source failures never return a 5xx. The endpoint always answers 200 OK
with the events it could build, and adds two response headers when one or more sources failed:
X-Lineage-Failed-Sources: dataflow,flink
A yellow banner above the graph enumerates the failed sources so you can tell at a glance whether you are looking at the full picture or a degraded view. The most common causes:
| Failed source | Likely cause | Fix |
|---|---|---|
dataflow / dataflow_ksql |
Admin Cloud API key is missing the MetricsViewer role. | Grant MetricsViewer at org or env scope on the admin service account. |
flink |
Compute pool listing returned 5xx or Flink SQL v1 rejected the credential. | Verify the per-landing-zone flink_<region>_client_id / _secret entries in App Configuration. |
connect |
At least one connector's config could not be parsed. | Check application logs for the offending connector name; the entire source fails on a single parse error by design. |
schema_registry / kafka_rest |
Per-context credentials missing or expired. | Same credentials used by Topics & Schemas pages — verify those work first. |
Flink not configured ≠ failure
Limitations
No ksqlDB or self-managed Connect
EON does not run ksqlDB or self-managed Connect clusters today. Statements / connectors running outside Confluent Managed Connect & Cloud Flink will not appear.
Spark Structured Streaming granularity
Non-committing consumers (e.g. Spark Structured Streaming, which uses
assign() and an external checkpoint store) are
surfaced via the dataflow bytes_out
rows — one consumer-group node per StreamingQuery (queryRunId),
with all Kafka sources in that query merged into a single node. Per-executor breakdown isn't
shown; "Spark application" isn't a Confluent concept.
SQL parsing is regex-based
Complex Flink SQL patterns (CTEs, lateral joins, deeply nested subqueries) may not yield edges. The statement still renders as a job node — only its input / output detection is affected.
Short default lookback
Producer and consumer edges come from the Telemetry API. Each surface defaults to a short
window — Last 10 minutes on /ui/lineage,
Last hour on the topic card — to keep page loads fast. Clients that haven't produced
or committed in that window won't show up; widen the window via the toolbar to look further
back. Direct API calls without ?start=&end= fall
back to the server-side default controlled by
CONFLUENT_METRICS_LOOKBACK_MINUTES.
Topic-level only
Column / field-level lineage is not extracted. Schemas are attached as facets on topic nodes for inspection, but edges are always topic-to-job-to-topic.
Desktop-only
Below 768 px the page shows an "open on desktop" placeholder. Large DAGs with badges are not usable on a phone screen.