Tableflow
Materialize Kafka topics into open table formats
Overview
Tableflow continuously materializes a Kafka topic into an open table format — Apache Iceberg or Delta Lake — stored in Azure Data Lake Storage Gen2 or Confluent Managed Storage. Once enabled, analytics engines can query the topic data directly without requiring a Kafka consumer.
Tableflow output can optionally be registered in an external data catalog (Databricks Unity Catalog) to make the materialized tables discoverable and queryable from tools such as Databricks SQL or Azure Databricks notebooks.
Prerequisites
Tableflow is not enabled by default. Before any team can use Tableflow, the Platform Team must activate it for the target Confluent environment. This is a one-time, environment-level setup that involves configuring the Confluent Azure Provider Integration and provisioning Tableflow API credentials in Azure App Configuration.
Contact Platform Team First
Topic Requirement
Each topic that should use Tableflow must have a Data Contract — a value schema registered in the Schema Registry. Tableflow relies on the schema to project Kafka records into columns. Topics without a registered schema cannot have Tableflow enabled.
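As a quick pre-check for the schema requirement, the sketch below builds the Schema Registry REST URL for a topic's latest value schema. It assumes the default subject naming (`<topic>-value`, i.e. TopicNameStrategy); the registry base URL and the topic name are placeholders, and authentication is left out.

```python
from urllib.parse import quote


def value_subject(topic: str) -> str:
    """Subject name of a topic's value schema, assuming TopicNameStrategy."""
    return f"{topic}-value"


def latest_schema_url(registry_url: str, topic: str) -> str:
    """URL of the latest registered value schema for the given topic."""
    subject = quote(value_subject(topic), safe="")
    return f"{registry_url.rstrip('/')}/subjects/{subject}/versions/latest"


# Placeholder registry URL and topic name, for illustration only.
print(latest_schema_url("https://schema-registry.example.com", "orders"))
```

A GET on that URL that returns HTTP 404 suggests the topic has no registered value schema under this naming strategy, so Tableflow could not be enabled for it yet.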
Creating a Tableflow Configuration
Tableflow is enabled per topic. Navigate to the topic's detail page and click Enable Tableflow in the Tableflow section. The four-step wizard guides you through the configuration:
Step 1 — Table Format
Choose the open table format for materialization:
| Format | Storage options | Notes |
|---|---|---|
| Apache Iceberg | Confluent Managed or Azure Data Lake Storage Gen2 | Recommended for broad tool compatibility |
| Delta Lake | Azure Data Lake Storage Gen2 only | Confluent Managed Storage is not supported for Delta |
Step 2 — Storage Backend
Choose where the materialized table data is stored:
- Confluent Managed Storage (Iceberg only) — Confluent provisions and manages the underlying object storage. No extra Azure configuration required.
- Azure Data Lake Storage Gen2 (BYOS) — Bring your own storage account and container. Requires the storage account and container to already exist, and the Confluent Azure Provider Integration's role assignment must cover the target storage account. You supply the Storage Account Name and Container Name. Storage region and provider integration are read from the environment configuration automatically.
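The format and storage rules from Steps 1 and 2 can be expressed as a small validation sketch. The identifiers `ICEBERG`, `DELTA`, `MANAGED`, and `BYOS` and the function name are illustrative assumptions, not values taken from the portal or from the Confluent API.

```python
from typing import List, Optional

# Which storage backends each table format supports (from the Step 1 table).
SUPPORTED_STORAGE = {
    "ICEBERG": {"MANAGED", "BYOS"},
    "DELTA": {"BYOS"},
}


def validate_storage(
    table_format: str,
    storage: str,
    storage_account: Optional[str] = None,
    container: Optional[str] = None,
) -> List[str]:
    """Return a list of problems; an empty list means the combination is valid."""
    allowed = SUPPORTED_STORAGE.get(table_format)
    if allowed is None:
        return [f"unknown table format: {table_format}"]

    errors = []
    if storage not in allowed:
        errors.append(f"{table_format} does not support {storage} storage")
    # BYOS needs both the storage account name and the container name.
    if storage == "BYOS" and not (storage_account and container):
        errors.append("BYOS requires a storage account name and a container name")
    return errors


print(validate_storage("DELTA", "MANAGED"))
print(validate_storage("ICEBERG", "BYOS", "mystorageaccount", "tableflow"))
```

The check only covers the combinations described above; whether the storage account exists and whether the provider integration's role assignment covers it still has to be verified in Azure.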
Step 3 — Advanced Settings
| Setting | Description | Default |
|---|---|---|
| Snapshot expiration | How long snapshots (Iceberg) or versions (Delta) are kept. Minimum 1 day; always keeps at least 10 snapshots. | 7 days |
| Error handling | NONE — suspend on first bad record. LOG — route bad records to a dead letter queue topic (requires Avro or Protobuf schema). | NONE |
| Dead letter queue topic | Only shown when error handling is LOG. Leave blank to use the default error_log topic (created automatically by Confluent if it does not exist). | error_log |
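The defaults and constraints in the table above can be captured in a short sketch. The class and field names are hypothetical; the schema type strings (`AVRO`, `PROTOBUF`, `JSON`) follow the Schema Registry's `schemaType` values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AdvancedSettings:
    snapshot_expiration_days: int = 7   # default from the table
    error_handling: str = "NONE"        # "NONE" or "LOG"
    dlq_topic: Optional[str] = None     # only relevant when error_handling is "LOG"

    def effective_dlq_topic(self) -> Optional[str]:
        """Dead letter queue topic in use, or None when error handling is NONE."""
        if self.error_handling != "LOG":
            return None
        return self.dlq_topic or "error_log"


def validate_advanced(settings: AdvancedSettings, schema_type: str) -> List[str]:
    """Return a list of problems; an empty list means the settings are valid."""
    errors = []
    if settings.snapshot_expiration_days < 1:
        errors.append("snapshot expiration must be at least 1 day")
    if settings.error_handling not in ("NONE", "LOG"):
        errors.append(f"unknown error handling mode: {settings.error_handling}")
    # LOG mode is only available for Avro and Protobuf schemas.
    if settings.error_handling == "LOG" and schema_type not in ("AVRO", "PROTOBUF"):
        errors.append("LOG error handling requires an Avro or Protobuf schema")
    return errors


settings = AdvancedSettings(error_handling="LOG")
print(settings.effective_dlq_topic())
print(validate_advanced(settings, "JSON"))
```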
Step 4 — Review & Confirm
Review all configuration details before submitting. Once confirmed, Confluent will begin materializing the topic. The status transitions to RUNNING when materialization is active.
Write Access Required
allow_modify must be enabled for the stage). Users with read-only (Viewer) access can see the current Tableflow status on a topic's detail page but cannot make changes.
External Catalog Integration
Once Tableflow is running, you can register its output in an external data catalog so the materialized tables are discoverable from tools like Databricks. Navigate to Tableflow in the sidebar and use the External Catalog Integration section.
Tableflow Config Must Exist First
Supported Catalogs
Currently only Databricks Unity Catalog is supported. AWS Glue and Snowflake catalog integrations are not yet available in the self-service portal.
Required Fields
| Field | Description |
|---|---|
| Display Name | A human-readable label for this integration (e.g. eon-unity) |
| Workspace URL | The Databricks workspace URL associated with the Unity Catalog (e.g. https://adb-123….azuredatabricks.net) |
| Catalog Name | The Unity Catalog name in Databricks where tables will be registered (e.g. eon_catalog) |
| OAuth Client ID | The Databricks service principal client ID that Confluent uses to authenticate against Unity Catalog |
| OAuth Client Secret | The corresponding secret. Write-only — never returned by the API. Leave blank on edit to preserve the existing secret. |
Databricks Service Principal Permissions
The OAuth service principal must be created in the Databricks workspace under Settings → Identity and Access → Service Principals with a client ID and client secret generated for OAuth. It must then be granted the following permissions before the integration can reach CONNECTED status:
| Scope | Permission / Role | Why it's needed |
|---|---|---|
| Unity Catalog | Data Editor role (includes USE CATALOG, USE SCHEMA, CREATE TABLE, SELECT, MODIFY) | Allows Confluent to create and manage a schema (named after the Kafka cluster ID) and register each Tableflow-enabled topic as an external table inside it |
| External Location | CREATE EXTERNAL TABLE | Allows Confluent to register new external tables at the storage path |
| External Location | READ FILES | Allows reading data files from the Tableflow storage location |
| External Location | WRITE FILES | Allows writing and updating table data files at the storage location |
What is an External Location? An External Location is a Databricks Unity Catalog object that maps a path in an external cloud storage account to a named, access-controlled resource. The External Location required here must point to the same Azure Data Lake Storage Gen2 storage account and container that was configured in the Tableflow Config (Step 2 of the Enable Tableflow wizard). Databricks uses it to verify that the service principal is allowed to read and write the table files that Confluent materializes at that path.
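If the permissions are granted with SQL rather than through the Databricks UI, the statements could be generated roughly as follows. This is a sketch that grants the individual privileges listed in the table instead of selecting the Data Editor preset, and it covers only those privileges. The external location name and the application ID are placeholders; a service principal is referenced in `GRANT` by its application ID in backticks.

```python
from typing import List

# Privileges named in the permissions table above.
CATALOG_PRIVILEGES = ["USE CATALOG", "USE SCHEMA", "CREATE TABLE", "SELECT", "MODIFY"]
EXTERNAL_LOCATION_PRIVILEGES = ["CREATE EXTERNAL TABLE", "READ FILES", "WRITE FILES"]


def grant_statements(catalog: str, external_location: str, principal: str) -> List[str]:
    """Build Databricks SQL GRANT statements for the Tableflow service principal."""
    return [
        f"GRANT {', '.join(CATALOG_PRIVILEGES)} "
        f"ON CATALOG `{catalog}` TO `{principal}`;",
        f"GRANT {', '.join(EXTERNAL_LOCATION_PRIVILEGES)} "
        f"ON EXTERNAL LOCATION `{external_location}` TO `{principal}`;",
    ]


# Placeholder names: replace with the real catalog, external location,
# and the service principal's application ID.
for statement in grant_statements("eon_catalog", "tableflow_location", "<application-id>"):
    print(statement)
```

Review the generated statements against your workspace's governance rules before running them; the integration may report missing privileges if the list above does not match what your setup requires.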
Who Can Manage Catalog Integrations
Creating, editing, and deleting External Catalog Integrations is restricted to administrators (daai-platform-admin group).
Non-admins can see the catalog status displayed on a topic's detail page but cannot make changes.
Costs
Tableflow pricing depends on the number of topics enabled, data volume, and storage choice. Because these variables differ significantly per use case, a reliable cost estimate has to be based on your actual requirements. Reach out to the Platform Team for a tailored estimate before enabling Tableflow at scale.
Limitations
Schema required
A topic must have a registered value schema (Data Contract) before Tableflow can be enabled. Schemaless topics are not supported.
LOG error handling requires Avro or Protobuf
The LOG error handling mode needs a structured schema to serialize failed records to the dead letter queue. JSON Schema topics must use NONE.
Delta Lake requires BYOS
Delta Lake only supports Azure Data Lake Storage Gen2 — Confluent Managed Storage is not available. Apache Iceberg supports both.
No on-demand suspend
Tableflow cannot be paused manually. SUSPENDED only appears when Confluent auto-suspends due to errors. Use Resume after resolving the issue.
One integration per catalog kind per cluster
Confluent allows at most one Unity Catalog integration per environment + cluster. Delete the existing one before creating a new one. Other catalog kinds (e.g. AWS Glue) each have their own independent slot.
Credentials rotate as a pair
client_id and client_secret must be provided together when updating. Leave both blank to keep existing credentials.
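The pairing rule can be illustrated with a small helper that decides what to send on an update. The function name and the shape of the returned dictionary are assumptions for illustration, not the portal's actual request format.

```python
from typing import Dict, Optional


def credential_update(client_id: Optional[str], client_secret: Optional[str]) -> Dict[str, str]:
    """Return the credential fields to send on update.

    Both blank: keep the existing credentials (nothing is sent).
    Both set: rotate the pair.
    Only one set: rejected, because the credentials rotate as a pair.
    """
    has_id = bool(client_id)
    has_secret = bool(client_secret)
    if has_id != has_secret:
        raise ValueError("client_id and client_secret must be provided together")
    if not has_id:
        return {}
    return {"client_id": client_id, "client_secret": client_secret}


print(credential_update("", ""))  # both blank: existing credentials are kept
print(sorted(credential_update("new-id", "new-secret")))
```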
Deleting a catalog integration does not remove Unity Catalog tables
Removing the integration stops Confluent from registering new materializations. Tables already registered in Databricks Unity Catalog are not deleted — they must be removed manually if no longer needed.