Getting Started
Your First Pipeline in 5 Minutes
Connect a data source, build a medallion pipeline, and schedule it all from the browser. No infrastructure to manage, no YAML to write.
Quick Start
Try It Live: 5-Minute Walkthrough
Want to see the full flow before connecting your own data? Use our public sample API. The same example is exercised by our automated smoke test, so the happy path is verified on every release.
A. Add the sample data source
In your workspace, open Data Sources and add a REST API source with these settings:
B. Build a three-node pipeline
Open the canvas and add three nodes in a line: the source on the left, followed by a Silver transform and a Gold transform. The two transforms use this SQL, in order:
SELECT * FROM bronze.dogfood_orders
SELECT product_id, COUNT(*) AS order_count, SUM(total_amount) AS total_revenue FROM silver.dogfood_orders_cleaned GROUP BY product_id
Connect them with edges so data flows source to silver to gold. The platform stores the SQL on each node and runs validation at execute time.
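To see what the Gold query produces before running the pipeline, here is a minimal sketch that runs it against a few made-up rows in an in-memory SQLite database. The sample rows and the order_id column are assumptions for illustration, and SQLite only stands in for the platform's own engine, so the schema prefix is dropped from the table name.

```python
import sqlite3

# Hypothetical rows standing in for silver.dogfood_orders_cleaned.
sample_rows = [
    ("o-1", "p-1", 10.0),
    ("o-2", "p-1", 15.5),
    ("o-3", "p-2", 7.25),
]

conn = sqlite3.connect(":memory:")
# SQLite has no schemas, so the "silver." prefix is omitted here.
conn.execute(
    "CREATE TABLE dogfood_orders_cleaned "
    "(order_id TEXT, product_id TEXT, total_amount REAL)"
)
conn.executemany("INSERT INTO dogfood_orders_cleaned VALUES (?, ?, ?)", sample_rows)

# Same aggregation as the Gold node, with ORDER BY added for stable output.
gold = conn.execute(
    """
    SELECT product_id, COUNT(*) AS order_count, SUM(total_amount) AS total_revenue
    FROM dogfood_orders_cleaned
    GROUP BY product_id
    ORDER BY product_id
    """
).fetchall()

print(gold)  # [('p-1', 2, 25.5), ('p-2', 1, 7.25)]
```

Each product collapses to one row, which is the shape you should expect in the gold table.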
C. Execute and verify
Click Run Now on the monitoring page. All three layers run as one execution. You should see:
- The status header card flips to Running now, then Last run succeeded within roughly 30 seconds.
- The execution history table lists one entry with row counts for each layer.
- Open the Catalog. You will see three new tables under bronze, silver, and gold.
- Click the gold table and open Snapshots to see the Iceberg snapshot timeline. Each run creates a new snapshot.
The dogfood API is deterministic: the same date always returns the same orders. If you re-run the pipeline back-to-back, you will see the row count match exactly across runs. That makes it useful for testing schedule changes and Silver/Gold SQL edits without worrying about source drift.
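If you want to check for drift yourself, one approach is to compare the per-layer row counts of two runs. The sketch below assumes you have copied the counts from the execution history table into plain dictionaries; the function name and the numbers are hypothetical.

```python
def layers_that_drifted(first_run: dict[str, int], second_run: dict[str, int]) -> list[str]:
    """Return the layers whose row counts differ between two runs."""
    layers = sorted(set(first_run) | set(second_run))
    return [layer for layer in layers if first_run.get(layer) != second_run.get(layer)]


# Hypothetical counts from two back-to-back runs.
run_1 = {"bronze": 120, "silver": 118, "gold": 12}
run_2 = {"bronze": 120, "silver": 118, "gold": 12}

print(layers_that_drifted(run_1, run_2))  # []
```

An empty list means the runs match. After a Silver or Gold SQL edit, only the layers downstream of the edit should show up.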
Sign Up & Set Up Your Workspace
The onboarding wizard walks you through creating your organization and workspace in four steps. Everything is collected up front and submitted together at the end.
Sign in at optimaflo.io/sign-in with Google, Microsoft, Amazon, or email
Organization: name your organization, set a URL slug, and select your industry and team size
Workspace: create one or more workspaces (up to 5 during onboarding). Separate by environment, team, or project.
BYOC: configure your Bring Your Own Cloud deployment on GCP. Select your infrastructure tier and region, then the platform provisions everything. AWS and Azure support is coming soon.
Each workspace gets its own Apache Iceberg catalog, so data from different workspaces is fully isolated — even within the same organization.
Connect a Data Source
OptimaFlo currently supports Google Cloud Storage, BigQuery, and REST API connectors, with more on the way. The Ingestion Engineer guides you through the entire process conversationally; tell it what you want to connect and it handles authentication, browsing, file selection, validation, and schema inference.
Available Today
- Google Cloud Storage
- BigQuery
- REST API (any endpoint)
Coming Soon
- Amazon S3
- Redshift
- Snowflake
- PostgreSQL
- MySQL
- GraphQL
Open Data Sources in the sidebar and click Add Source. This opens the Ingestion Engineer
Tell the agent what you want to connect (e.g. "Connect my GCS bucket gs://company-data") — it authenticates via OAuth for cloud sources or asks for credentials for databases
The agent browses your buckets, folders, or tables and lets you select specific files or datasets to ingest
It validates the connection, infers your schema, and creates the data source record — ready for use in a pipeline
For GCS and BigQuery, authentication happens via a Google OAuth popup — no service account keys to manage. The platform handles token refresh automatically.
No data of your own? Use the public sample API at https://dogfood-ecommerce.fly.dev with API key dogfood-key-2026. See the Try It Live walkthrough above for the full configuration.
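As a rough picture of what the sample source configuration holds, the sketch below collects the values mentioned in this guide into one object and runs a few sanity checks. The field names auth_type, api_key_header, data_path, and flatten_json come from the troubleshooting notes further down. The class, the base_url and api_key field names, and the checks are assumptions, not the platform's actual schema.

```python
from dataclasses import dataclass


@dataclass
class RestSourceConfig:
    base_url: str              # hypothetical field name
    auth_type: str             # "api_key" or "bearer"
    api_key_header: str
    api_key: str               # hypothetical field name
    data_path: str
    flatten_json: bool = False

    def problems(self) -> list[str]:
        """Return human-readable issues; an empty list means the config looks sane."""
        issues = []
        if not self.base_url.startswith("https://"):
            issues.append("base_url should be an https URL")
        if self.auth_type not in ("api_key", "bearer"):
            issues.append("auth_type must be api_key or bearer")
        if self.auth_type == "api_key" and not self.api_key_header:
            issues.append("api_key_header is required for api_key auth")
        return issues


dogfood = RestSourceConfig(
    base_url="https://dogfood-ecommerce.fly.dev",
    auth_type="api_key",
    api_key_header="X-API-Key",
    api_key="dogfood-key-2026",
    data_path="orders",
)

print(dogfood.problems())  # []
```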
Build Your Pipeline
Three ways to build: drag-and-drop on the visual canvas, let the Data Engineer create a pipeline from a prompt, or let the Manager handle the entire workflow end-to-end.
- Add an Ingestion node linked to your data source
- Add Cleaning and Aggregation nodes for transformations
- Connect them with edges to define data flow
- Write SQL or let the Analytics Engineer generate it
- Open the Data Engineer panel in the canvas sidebar
- Describe what you want to transform
- Review the generated nodes and SQL
- Apply to canvas with one click
"Connect my GCS bucket gs://sales-data, clean the CSVs, deduplicate on order_id, and create a monthly revenue summary."
Run and Schedule
Execute your pipeline manually or set a schedule. OptimaFlo generates an Apache Airflow DAG behind the scenes; you never touch Airflow config directly.
Click Execute to run the pipeline immediately. The platform auto-saves before executing
Monitor progress in the execution panel and watch each layer complete from Ingestion through Aggregation
Open Settings to set a schedule (hourly, daily, weekly, monthly, quarterly, or yearly). The platform converts your selection to an Airflow cron expression
Use Backfills to re-process historical date ranges when you change transformation logic
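For a sense of what the schedule conversion looks like, the sketch below maps each schedule option to the cron expression that Airflow's built-in presets (@hourly, @daily, and so on) expand to. Whether OptimaFlo uses exactly these expressions is an assumption; the minute and hour it picks may differ. The function name is hypothetical.

```python
# Cron expressions matching Airflow's standard presets.
# Assumption: the platform's own mapping may use different run times.
SCHEDULE_TO_CRON = {
    "hourly": "0 * * * *",       # top of every hour
    "daily": "0 0 * * *",        # midnight every day
    "weekly": "0 0 * * 0",       # midnight on Sunday
    "monthly": "0 0 1 * *",      # midnight on the 1st of each month
    "quarterly": "0 0 1 */3 *",  # midnight on the 1st of every third month
    "yearly": "0 0 1 1 *",       # midnight on January 1st
}


def to_cron(schedule: str) -> str:
    """Translate a schedule name into a five-field cron expression."""
    try:
        return SCHEDULE_TO_CRON[schedule.lower()]
    except KeyError:
        raise ValueError(f"unsupported schedule: {schedule!r}") from None


print(to_cron("weekly"))  # 0 0 * * 0
```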
Backfills run sequentially by default to avoid Iceberg write conflicts. You can increase parallelism for independent tables.
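Conceptually, a sequential backfill splits the date range into windows and processes them one at a time, so no two runs write to the same table at once. The sketch below assumes daily windows and a hypothetical run_one callback; it illustrates the idea and is not the platform's scheduler.

```python
from datetime import date, timedelta
from typing import Callable


def backfill_dates(start: date, end: date) -> list[date]:
    """List every day from start to end, inclusive."""
    if end < start:
        raise ValueError("end must not be before start")
    return [start + timedelta(days=offset) for offset in range((end - start).days + 1)]


def run_backfill(start: date, end: date, run_one: Callable[[date], str]) -> list[str]:
    """Run one day at a time; each run finishes before the next one starts."""
    return [run_one(day) for day in backfill_dates(start, end)]


# Hypothetical stand-in for triggering a pipeline run for one day.
results = run_backfill(date(2026, 1, 30), date(2026, 2, 1), lambda d: f"ran {d.isoformat()}")
print(results)  # ['ran 2026-01-30', 'ran 2026-01-31', 'ran 2026-02-01']
```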
Core Concepts: Ingestion, Cleaning, Aggregation
The medallion architecture is the backbone of every OptimaFlo pipeline. Its three layers are Ingestion (Bronze), Cleaning (Silver), and Aggregation (Gold).
Ingestion
Raw Data: Your source data lands here untouched. Every record is preserved in Apache Iceberg tables with full history, ACID transactions, and time-travel.
- Zero transformations — exact copy of the source
- Schema detected automatically on ingestion
- Full history retention for compliance and replay
- Partitioned for query performance
Cleaning
Cleaned Data: Cleaned, deduplicated, and type-cast. The Analytics Engineer generates transformations from plain English, then you review and approve before anything runs.
- LLM-generated SQL from natural language
- Preview results before committing
- Deduplication, null handling, and type casting
- Validated and schema-enforced before execution
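To make deduplication, null handling, and type casting concrete, here is a small sketch of a cleaning query run against made-up rows in an in-memory SQLite database. The table name, the ingested_at column, and the choice to drop rows with a missing amount are all assumptions; the SQL the Analytics Engineer generates for your data will differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE bronze_orders "
    "(order_id TEXT, product_id TEXT, total_amount TEXT, ingested_at TEXT)"
)
conn.executemany(
    "INSERT INTO bronze_orders VALUES (?, ?, ?, ?)",
    [
        ("o-1", "p-1", "10.00", "2026-01-01T00:00:00"),
        ("o-1", "p-1", "10.00", "2026-01-01T01:00:00"),  # duplicate order_id
        ("o-2", "p-2", None, "2026-01-01T00:00:00"),     # missing amount
        ("o-3", "p-2", "7.25", "2026-01-01T00:00:00"),
    ],
)

cleaned = conn.execute(
    """
    SELECT order_id, product_id, CAST(total_amount AS REAL) AS total_amount
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY order_id ORDER BY ingested_at DESC
               ) AS rn
        FROM bronze_orders
        WHERE total_amount IS NOT NULL   -- null handling: drop incomplete rows
    )
    WHERE rn = 1                         -- deduplication: keep the newest copy
    ORDER BY order_id
    """
).fetchall()

print(cleaned)  # [('o-1', 'p-1', 10.0), ('o-3', 'p-2', 7.25)]
```

Four raw rows become two: the duplicate of o-1 is collapsed, o-2 is dropped for its missing amount, and the text amounts are cast to numbers.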
Aggregation
Business Metrics: Aggregated, business-ready metrics and star schemas. Feed dashboards, exports, and analyst queries from a single source of truth.
- Aggregations and business KPIs
- Star schema for analytics
- Incremental updates to minimize compute
- Direct connection to BI dashboards
If You Get Stuck
The handful of issues that catch new users most often.
The pipeline ran, but Silver or Gold has 0 rows
Open the table in Catalog and check the Bronze table first. If Bronze is also empty, the data source is not returning rows; verify the API URL, auth header, and data_path in Data Sources. If Bronze has rows but Silver does not, the Silver SQL is filtering everything out: open the Silver node, click Preview, and inspect the result.
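The same check can be written as a small decision function. The sketch below assumes you have the three row counts at hand; the function name and return labels are hypothetical, and the Gold branch extends the reasoning above by analogy.

```python
def diagnose_empty_layers(bronze_rows: int, silver_rows: int, gold_rows: int) -> str:
    """Point at the first layer in the chain that produced no rows."""
    if bronze_rows == 0:
        return "source"      # check API URL, auth header, and data_path
    if silver_rows == 0:
        return "silver_sql"  # the Silver query filters everything out
    if gold_rows == 0:
        return "gold_sql"    # by analogy: the Gold query filters everything out
    return "ok"


print(diagnose_empty_layers(bronze_rows=120, silver_rows=0, gold_rows=0))  # silver_sql
```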
REST API source returns 401 or 403
The auth header name is case-sensitive on some servers. For the dogfood API, the header must be X-API-Key exactly. Double-check the api_key_header field in the data source config. If you are using a Bearer token, set auth_type to bearer instead of api_key.
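The difference between the two auth types comes down to which HTTP header carries the secret. A minimal sketch, assuming a hypothetical helper named build_auth_headers; the Bearer format follows the standard Authorization header convention.

```python
def build_auth_headers(auth_type: str, secret: str, api_key_header: str = "X-API-Key") -> dict[str, str]:
    """Build the request headers for the given auth_type."""
    if auth_type == "api_key":
        # The header name is sent exactly as configured, including its casing.
        return {api_key_header: secret}
    if auth_type == "bearer":
        return {"Authorization": f"Bearer {secret}"}
    raise ValueError(f"unsupported auth_type: {auth_type!r}")


print(build_auth_headers("api_key", "dogfood-key-2026"))
# {'X-API-Key': 'dogfood-key-2026'}
```

If a 401 or 403 persists, compare these headers with what the API's own documentation expects.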
Schema looks empty or wrong
REST APIs that wrap the array under a key need data_path set to that key. The dogfood orders endpoint wraps under {orders: [...]}, so data_path must be orders. If you see one column called data containing JSON, set flatten_json to true.
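The sketch below shows the idea behind both settings on a made-up payload: data_path selects the array of records, and flattening turns nested objects into top-level columns. The dot-separated path handling, the underscore-joined column names, and the sample fields are assumptions; the platform's actual flattening rules may differ.

```python
from typing import Any


def extract_records(payload: Any, data_path: str) -> list:
    """Walk data_path (dot-separated, an assumption) down to the record array."""
    node = payload
    if data_path:
        for key in data_path.split("."):
            node = node[key]
    if not isinstance(node, list):
        raise ValueError(f"data_path {data_path!r} does not point at an array")
    return node


def flatten(record: dict, prefix: str = "") -> dict:
    """Turn nested objects into top-level columns joined with underscores."""
    flat = {}
    for key, value in record.items():
        name = f"{prefix}_{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, name))
        else:
            flat[name] = value
    return flat


# Hypothetical response shaped like {orders: [...]}.
payload = {"orders": [{"order_id": "o-1", "customer": {"country": "DE"}}]}
rows = [flatten(record) for record in extract_records(payload, "orders")]

print(rows)  # [{'order_id': 'o-1', 'customer_country': 'DE'}]
```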
Snapshots tab shows one entry but the pipeline ran multiple times
The snapshot timeline groups by calendar date. The (N snapshots) badge next to each date header shows the actual count for that day. Multiple hourly runs on the same day all nest under one date header, with one card per run underneath.
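The grouping works roughly like the sketch below, which buckets snapshot timestamps by calendar date and counts them. The timestamps and the function name are made up for illustration; this is not the Catalog's own code.

```python
from collections import defaultdict
from datetime import datetime


def group_snapshots_by_date(timestamps: list[str]) -> dict[str, list[str]]:
    """Bucket ISO-8601 snapshot timestamps under their calendar date."""
    groups: dict[str, list[str]] = defaultdict(list)
    for ts in sorted(timestamps):
        day = datetime.fromisoformat(ts).date().isoformat()
        groups[day].append(ts)
    return dict(groups)


# Hypothetical snapshots: three hourly runs on one day, one run the next day.
snapshots = [
    "2026-01-01T09:00:00",
    "2026-01-01T10:00:00",
    "2026-01-01T11:00:00",
    "2026-01-02T09:00:00",
]

for day, runs in group_snapshots_by_date(snapshots).items():
    print(f"{day} ({len(runs)} snapshots)")
# 2026-01-01 (3 snapshots)
# 2026-01-02 (1 snapshots)
```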
Run Now button is disabled
The button is disabled until the live execution status loads from the backend. Wait a second or two and it will enable. If it stays disabled, refresh the page and check the browser console for an authentication error.
Failed run with no obvious cause
Open the execution history table on the monitoring page. Failed rows expand to show the full error_message with the stack trace. If the error references an AppError code, look it up in the error taxonomy for the suggested fix.
What's Next
Deepen your knowledge with these guides.