Tableflow

Materialize Kafka topics into open table formats

Overview

Tableflow continuously materializes a Kafka topic into an open table format — Apache Iceberg or Delta Lake — stored in Azure Data Lake Storage Gen2 or Confluent Managed Storage. Once enabled, analytics engines can query the topic data directly without requiring a Kafka consumer.

Tableflow output can optionally be registered in an external data catalog (Databricks Unity Catalog) to make the materialized tables discoverable and queryable from tools such as Databricks SQL or Azure Databricks notebooks.

Prerequisites

Tableflow is not enabled by default. Before any team can use Tableflow, the Platform Team must activate it for the target Confluent environment. This is a one-time, environment-level setup that involves configuring the Confluent Azure Provider Integration and provisioning Tableflow API credentials in Azure App Configuration.

Contact Platform Team First

To enable Tableflow in your environment, open a request with the Platform Team and specify the target environment (dev / qas / run). The Platform Team will configure the required resources. Until this is done, the "Enable Tableflow" button on topic pages will be disabled.

Topic Requirement

Each topic that should use Tableflow must have a Data Contract — a value schema registered in the Schema Registry. Tableflow relies on the schema to project Kafka records into columns. Topics without a registered schema cannot have Tableflow enabled.
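A pre-check for this requirement can be sketched as follows. This is a minimal illustration, not part of the portal: it assumes the topic uses the default TopicNameStrategy (value subject named `<topic>-value`) and relies on the Schema Registry REST endpoint `GET /subjects/{subject}/versions/latest`. The `get_status` callable is a hypothetical helper that performs the authenticated request and returns the HTTP status code.

```python
from typing import Callable


def value_subject(topic: str) -> str:
    """Subject name of a topic's value schema, assuming TopicNameStrategy."""
    return f"{topic}-value"


def has_value_schema(topic: str, get_status: Callable[[str], int]) -> bool:
    """Return True if Schema Registry has a latest version for the value subject.

    get_status is a hypothetical helper: it receives the request path and
    returns the HTTP status code of the Schema Registry response.
    """
    status = get_status(f"/subjects/{value_subject(topic)}/versions/latest")
    if status == 200:
        return True   # a value schema is registered
    if status == 404:
        return False  # subject not found: Tableflow cannot be enabled
    raise RuntimeError(f"unexpected Schema Registry status {status}")
```

If your topics use a different subject naming strategy, the subject name has to be derived accordingly.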

Creating a Tableflow Configuration

Tableflow is enabled per topic. Navigate to the topic's detail page and click Enable Tableflow in the Tableflow section. The four-step wizard guides you through the configuration:

Step 1 — Table Format

Choose the open table format for materialization:

| Format | Storage options | Notes |
| --- | --- | --- |
| Apache Iceberg | Confluent Managed or Azure Data Lake Storage Gen2 | Recommended for broad tool compatibility |
| Delta Lake | Azure Data Lake Storage Gen2 only | Confluent Managed Storage is not supported for Delta |

Step 2 — Storage Backend

Choose where the materialized table data is stored:

  • Confluent Managed Storage (Iceberg only) — Confluent provisions and manages the underlying object storage. No extra Azure configuration required.
  • Azure Data Lake Storage Gen2 (BYOS) — Bring your own storage account and container. Requires the storage account and container to already exist, and the Confluent Azure Provider Integration's role assignment must cover the target storage account. You supply the Storage Account Name and Container Name. Storage region and provider integration are read from the environment configuration automatically.
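The format and storage rules from Steps 1 and 2 can be expressed as a small validation routine. The sketch below is illustrative only: the identifiers `ICEBERG`, `DELTA`, `MANAGED`, and `BYOS`, the `StorageChoice` class, and the field names are assumptions made for this example and are not taken from the portal or the Confluent API.

```python
from dataclasses import dataclass
from typing import List, Optional

# Compatibility matrix from Step 1 (identifiers are illustrative).
SUPPORTED_STORAGE = {
    "ICEBERG": {"MANAGED", "BYOS"},
    "DELTA": {"BYOS"},
}


@dataclass
class StorageChoice:
    table_format: str
    storage: str
    storage_account_name: Optional[str] = None
    container_name: Optional[str] = None


def validate_storage(choice: StorageChoice) -> List[str]:
    """Return a list of problems; an empty list means the choice is valid."""
    allowed = SUPPORTED_STORAGE.get(choice.table_format)
    if allowed is None:
        return [f"unknown table format: {choice.table_format}"]
    errors = []
    if choice.storage not in allowed:
        errors.append(
            f"{choice.table_format} does not support {choice.storage} storage"
        )
    if choice.storage == "BYOS":
        # BYOS requires both values to be supplied by the user.
        if not choice.storage_account_name:
            errors.append("Storage Account Name is required for BYOS")
        if not choice.container_name:
            errors.append("Container Name is required for BYOS")
    return errors
```

For example, `validate_storage(StorageChoice("DELTA", "MANAGED"))` reports that Delta Lake cannot use Confluent Managed Storage. The check does not verify that the storage account exists or that the provider integration's role assignment covers it.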

Step 3 — Advanced Settings

| Setting | Description | Default |
| --- | --- | --- |
| Snapshot expiration | How long snapshots (Iceberg) or versions (Delta) are kept. Minimum 1 day; always keeps at least 10 snapshots. | 7 days |
| Error handling | NONE: suspend on the first bad record. LOG: route bad records to a dead letter queue topic (requires Avro or Protobuf schema). | NONE |
| Dead letter queue topic | Only shown when error handling is LOG. Leave blank to use the default error_log topic (created automatically by Confluent if it does not exist). | error_log |
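How these settings interact can be sketched in a few lines. The function below is a hypothetical helper, not the portal's implementation: the parameter names and the schema type strings (`AVRO`, `PROTOBUF`, `JSON`) are assumptions for illustration. It applies the documented defaults and the rule that LOG requires an Avro or Protobuf schema. The floor of 10 retained snapshots is enforced by Tableflow itself and is not modeled here.

```python
from typing import Dict, Union

DEFAULT_DLQ_TOPIC = "error_log"
LOG_CAPABLE_SCHEMAS = {"AVRO", "PROTOBUF"}


def resolve_advanced_settings(
    schema_type: str,
    retention_days: int = 7,
    error_handling: str = "NONE",
    dlq_topic: str = "",
) -> Dict[str, Union[int, str]]:
    """Validate Step 3 inputs and fill in the documented defaults."""
    if retention_days < 1:
        raise ValueError("snapshot expiration must be at least 1 day")
    if error_handling not in ("NONE", "LOG"):
        raise ValueError(f"unknown error handling mode: {error_handling}")
    if error_handling == "LOG" and schema_type not in LOG_CAPABLE_SCHEMAS:
        raise ValueError("LOG error handling requires an Avro or Protobuf schema")

    settings: Dict[str, Union[int, str]] = {
        "retention_days": retention_days,
        "error_handling": error_handling,
    }
    if error_handling == "LOG":
        # A blank topic name falls back to the default dead letter queue topic.
        settings["dlq_topic"] = dlq_topic.strip() or DEFAULT_DLQ_TOPIC
    return settings
```

The dead letter queue topic is only part of the result when error handling is LOG, mirroring the wizard, which hides the field otherwise.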

Step 4 — Review & Confirm

Review all configuration details before submitting. Once confirmed, Confluent will begin materializing the topic. The status transitions to RUNNING when materialization is active.

Write Access Required

Enabling, updating, and disabling Tableflow configurations requires write access to the environment (allow_modify must be enabled for the stage). Users with read-only (Viewer) access can see the current Tableflow status on a topic's detail page but cannot make changes.

External Catalog Integration

Once Tableflow is running, you can register its output in an external data catalog so the materialized tables are discoverable from tools like Databricks. Navigate to Tableflow in the sidebar and use the External Catalog Integration section.

Tableflow Config Must Exist First

An External Catalog Integration requires at least one active Tableflow configuration in the environment. If no Tableflow configs exist when you register the integration, the status will remain stuck at PENDING indefinitely. Enable Tableflow on at least one topic first.

Supported Catalogs

Currently only Databricks Unity Catalog is supported. AWS Glue and Snowflake catalog integrations are not yet available in the self-service portal.

Required Fields

| Field | Description |
| --- | --- |
| Display Name | A human-readable label for this integration (e.g. eon-unity) |
| Workspace URL | The Databricks workspace URL associated with the Unity Catalog (e.g. https://adb-123….azuredatabricks.net) |
| Catalog Name | The Unity Catalog name in Databricks where tables will be registered (e.g. eon_catalog) |
| OAuth Client ID | The Databricks service principal client ID that Confluent uses to authenticate against Unity Catalog |
| OAuth Client Secret | The corresponding secret. Write-only: never returned by the API. Leave blank on edit to preserve the existing secret. |

Databricks Service Principal Permissions

The OAuth service principal must be created in the Databricks workspace under Settings → Identity and Access → Service Principals with a client ID and client secret generated for OAuth. It must then be granted the following permissions before the integration can reach CONNECTED status:

| Scope | Permission / Role | Why it's needed |
| --- | --- | --- |
| Unity Catalog | Data Editor role (includes USE CATALOG, USE SCHEMA, CREATE TABLE, SELECT, MODIFY) | Allows Confluent to create and manage a schema (named after the Kafka cluster ID) and register each Tableflow-enabled topic as an external table inside it |
| External Location | CREATE EXTERNAL TABLE | Allows Confluent to register new external tables at the storage path |
| External Location | READ FILES | Allows reading data files from the Tableflow storage location |
| External Location | WRITE FILES | Allows writing and updating table data files at the storage location |

What is an External Location? An External Location is a Databricks Unity Catalog object that maps a path in an external cloud storage account to a named, access-controlled resource. The External Location required here must point to the same Azure Data Lake Storage Gen2 storage account and container that was configured in the Tableflow Config (Step 2 of the Enable Tableflow wizard). Databricks uses it to verify that the service principal is allowed to read and write the table files that Confluent materializes at that path.
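If you prefer to grant these permissions with SQL rather than through the Databricks UI, the statements can be generated as in the sketch below. This is an illustration under assumptions: it emits only the privileges listed in the permissions table, it assumes the service principal is referenced by its application (client) ID, and the function name and example values are hypothetical. Review the output with your Databricks administrator before running it.

```python
from typing import List

# Privileges as listed in the permissions table above.
CATALOG_PRIVILEGES = ["USE CATALOG", "USE SCHEMA", "CREATE TABLE", "SELECT", "MODIFY"]
LOCATION_PRIVILEGES = ["CREATE EXTERNAL TABLE", "READ FILES", "WRITE FILES"]


def grant_statements(principal: str, catalog: str, external_location: str) -> List[str]:
    """Build Unity Catalog GRANT statements for the Tableflow service principal."""
    return [
        f"GRANT {', '.join(CATALOG_PRIVILEGES)} "
        f"ON CATALOG `{catalog}` TO `{principal}`;",
        f"GRANT {', '.join(LOCATION_PRIVILEGES)} "
        f"ON EXTERNAL LOCATION `{external_location}` TO `{principal}`;",
    ]


if __name__ == "__main__":
    # Placeholder values for illustration only.
    for statement in grant_statements("<client-id>", "eon_catalog", "<external-location>"):
        print(statement)
```

Granting at the catalog level lets the schema and table privileges apply to the schema that Confluent creates inside the catalog.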

Who Can Manage Catalog Integrations

Creating, editing, and deleting External Catalog Integrations is restricted to administrators (daai-platform-admin group). Non-admins can see the catalog status displayed on a topic's detail page but cannot make changes.

Costs

Tableflow pricing depends on the number of topics enabled, data volume, and storage choice. Because these variables differ significantly between use cases, a reliable cost figure requires an estimate based on your actual requirements. Reach out to the Platform Team for a tailored estimate before enabling Tableflow at scale.

Limitations

Topics

Schema required

A topic must have a registered value schema (Data Contract) before Tableflow can be enabled. Schemaless topics are not supported.

Topics

LOG error handling requires Avro or Protobuf

The LOG error handling mode needs a structured schema to serialize failed records to the dead letter queue. JSON Schema topics must use NONE.

Storage

Delta Lake requires BYOS

Delta Lake only supports Azure Data Lake Storage Gen2 — Confluent Managed Storage is not available. Apache Iceberg supports both.

Runtime

No on-demand suspend

Tableflow cannot be paused manually. SUSPENDED only appears when Confluent auto-suspends due to errors. Use Resume after resolving the issue.

Catalog

One integration per catalog kind per cluster

Confluent allows at most one Unity Catalog integration per environment and cluster combination. Delete the existing integration before creating a new one. Other catalog kinds (e.g. AWS Glue) each have their own independent slot.

Catalog

Credentials rotate as a pair

client_id and client_secret must be provided together when updating. Leave both blank to keep existing credentials.
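The pairing rule can be illustrated with a small check. This is a hypothetical helper written for this guide, not the portal's code: it returns `None` when both fields are blank (keep the existing credentials), returns both values when both are supplied, and rejects a partial update.

```python
from typing import Dict, Optional


def credential_update(client_id: str = "", client_secret: str = "") -> Optional[Dict[str, str]]:
    """Return the credential fields to send, or None to keep the existing pair."""
    has_id = bool(client_id.strip())
    has_secret = bool(client_secret.strip())
    if has_id != has_secret:
        # Exactly one of the two fields was filled in.
        raise ValueError("client_id and client_secret must be provided together")
    if not has_id:
        return None  # both blank: existing credentials stay in place
    return {"client_id": client_id.strip(), "client_secret": client_secret}
```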

Catalog

Deleting a catalog integration does not remove Unity Catalog tables

Removing the integration stops Confluent from registering new materializations. Tables already registered in Databricks Unity Catalog are not deleted — they must be removed manually if no longer needed.
