Local Development
Want to get started with streaming SQL pipelines on your laptop? This tutorial walks through running SQLFlow locally—from validating a config file to streaming Kafka data and verifying output.
SQLFlow is like DuckDB for streaming data—just bring your SQL.
🔧 Prerequisites
- Python 3.9+
- Docker + Docker Compose
make
installedgit
installed
1. 🐍 Install Python Dependencies
Clone the SQLFlow repo and install required packages:
git clone https://github.com/turbolytics/sql-flow.git
cd sql-flow
# Install runtime and development dependencies
pip install -r requirements.txt
pip install -r requirements.dev.txt
# If you run into issues with librdkafka:
C_INCLUDE_PATH=/opt/homebrew/Cellar/librdkafka/2.3.0/include \
LIBRARY_PATH=/opt/homebrew/Cellar/librdkafka/2.3.0/lib \
pip install confluent-kafka
make setup-dev
2. ✅ Validate a Config with Test Data
Use the invoke
CLI to test your SQLFlow pipeline configuration locally with sample data:
python3 cmd/sql-flow.py dev invoke dev/config/examples/basic.agg.mem.yml dev/fixtures/simple.json
Expected output:
['{"city":"New York","city_count":28672}', '{"city":"Baltimore","city_count":28672}']
3. 📡 Start Kafka Locally
Start Kafka using Docker Compose:
make start-backing-services
# or manually:
docker-compose -f dev/kafka-single.yml up -d
4. 🚀 Publish Test Data to Kafka
Send test events into Kafka:
python3 cmd/publish-test-data.py --num-messages=10000 --topic="input-simple-agg-mem"
5. ▶️ Run SQLFlow Against Kafka
Start the SQLFlow engine locally to process events from Kafka:
SQLFLOW_KAFKA_BROKERS=localhost:9092 \
python3 cmd/sql-flow.py run dev/config/examples/basic.agg.mem.yml --max-msgs-to-process=10000
6. 🧪 Verify Output
Open a Kafka consumer to read results:
docker exec -it kafka1 kafka-console-consumer \
--bootstrap-server=kafka1:9092 \
--topic=output-simple-agg-mem
You should see real-time results like:
{"city":"San Francisco504","city_count":1}
{"city":"San Francisco735","city_count":1}
...
🎉 Success!
You’ve just run a real-time SQL pipeline on your laptop—using Kafka, DuckDB, and SQLFlow locally.
Need support or want to contribute?