📄️ Quickstart
Create a stream processor that reads data from Kafka in less than 5 minutes.
📄️ Learning SQLFlow Using the Bluesky Firehose
SQLFlow is a new stream processing engine powered by DuckDB. SQLFlow brings DuckDB to streaming data through a lightweight Python-powered service, so end users can run stateful stream processors and configure them entirely in SQL, tapping into the whole DuckDB ecosystem. SQLFlow currently supports Kafka and WebSockets as data sources. The WebSocket support makes it trivial to execute SQL against the Bluesky Social firehose:
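As a sketch of what this looks like, a SQLFlow handler runs ordinary DuckDB SQL against each micro-batch of incoming messages. The query below is illustrative only: the `batch` table name and the `kind` / `commit.collection` fields follow the shape of Bluesky Jetstream events, and may differ in your configuration (check the SQLFlow docs for the exact handler setup).

```sql
-- Count Bluesky events per collection within the current micro-batch.
-- 'batch' is assumed to be the table SQLFlow exposes for incoming
-- messages; field names mirror the Jetstream event schema.
SELECT
  commit.collection AS collection,
  count(*) AS num_events
FROM batch
WHERE kind = 'commit'
GROUP BY commit.collection
```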
📄️ Stream 70k Rows per second to ClickHouse using SQLFlow
This tutorial demonstrates how to stream messages from Kafka to ClickHouse using SQLFlow.
📄️ Streaming to Iceberg using SQLFlow
This blog post shows how SQLFlow can stream data into Iceberg. Iceberg is an emerging standard gaining traction for storing large amounts of data on immutable storage such as S3. The current Iceberg ecosystem is heavyweight and JVM-based. The goal of this post is to show how simple it is to stream data from Kafka to Iceberg tables using SQLFlow!
📄️ SQLFlow: Join Kafka With Postgres To Enrich Real-Time Data Streams
Combining real-time streaming data with operational data from systems like PostgreSQL is a common challenge. Whether you're enriching live transaction streams with user profiles, correlating sensor data with metadata, or performing dynamic lookups for analytics, the ability to join these two kinds of data sources is essential.
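This stream-table enrichment pattern reduces to a SQL join over each incoming batch. The sketch below is hypothetical: the `batch` table name, the `users` table, and all column names are illustrative, and querying Postgres from DuckDB is assumed to go through an attached Postgres database (e.g. DuckDB's `postgres` extension).

```sql
-- Hypothetical enrichment: join the streaming micro-batch against a
-- Postgres-backed users table. Table and column names are illustrative.
SELECT
  b.transaction_id,
  b.amount,
  u.name,
  u.tier
FROM batch AS b
JOIN pg.public.users AS u
  ON u.id = b.user_id
```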
📄️ Using SQLFlow to Sink 5000 Rows / Second from Kafka To Postgres
Modern data workflows often require streaming data from real-time sources like Kafka into reliable, queryable storage systems such as PostgreSQL. While this task is essential for many analytics and operational pipelines, implementing it can be complex, often involving custom code, multiple tools, and extensive configuration.
📄️ Stream Data from Kafka to S3 in Parquet Format using SQLFlow
In modern data architectures, efficiently moving streaming data from Kafka to S3 in a structured format like Parquet is crucial for analytics, machine learning, and long-term storage. SQLFlow simplifies this process by enabling declarative SQL-based stream processing, making it easy to transform and persist Kafka data to cloud storage.