Architecture
8 Layers. One Platform.
OptimaFlo replaces 6-10 data tools with a single, end-to-end platform built on Apache Iceberg, Airflow, and auto-scaling compute.
What It Replaces
Fivetran + Snowflake + dbt + Monte Carlo + Tableau + Mode + consultants
All 8 layers included. Runs in your own cloud account with bring-your-own-cloud (BYOC) deployment.
The 8 Platform Layers
From raw data ingestion to AI-generated dashboards.
Ingestion
Connect to cloud storage, data warehouses, and APIs. Apache Airflow orchestrates all ingestion with automatic schema detection. More connectors are being added regularly.
- GCS, BigQuery, and REST API connectors available today
- S3, Redshift, Snowflake, databases, and GraphQL coming soon
- Automatic schema detection and validation
- OAuth and credential management built-in
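As a rough illustration of what automatic schema detection can look like, the sketch below infers a column type from sample records by trying progressively wider parsers. It is not OptimaFlo's implementation: the function names, the type labels, and the assumption that values arrive as strings are all hypothetical.

```python
from datetime import datetime


def infer_type(values):
    """Return the narrowest type label that fits every non-empty value."""
    non_null = [v for v in values if v not in (None, "")]
    if not non_null:
        return "string"  # nothing to go on, so fall back to the widest type

    def all_parse(parser):
        try:
            for v in non_null:
                parser(v)
            return True
        except (ValueError, TypeError):
            return False

    # Try the narrowest type first and widen only when parsing fails.
    if all_parse(int):
        return "long"
    if all_parse(float):
        return "double"
    if all_parse(datetime.fromisoformat):
        return "timestamp"
    return "string"


def infer_schema(records):
    """Collect values per column, then infer one type per column."""
    columns = {}
    for record in records:
        for key, value in record.items():
            columns.setdefault(key, []).append(value)
    return {name: infer_type(values) for name, values in columns.items()}


# Hypothetical sample payload, with every value delivered as a string.
sample = [
    {"order_id": "1001", "amount": "19.99", "created_at": "2024-01-15T10:30:00", "note": "gift"},
    {"order_id": "1002", "amount": "5", "created_at": "2024-01-16T08:00:00", "note": ""},
]
schema = infer_schema(sample)
```

Widening from integer to float to timestamp to string keeps a single odd value from breaking a load, at the cost of a looser type for that column.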
Ingestion — Raw Storage
Raw data lands in Apache Iceberg tables with zero transformations. Every record is preserved with ACID transactions, full history, and time-travel.
- Apache Iceberg tables with ACID guarantees
- Schema evolution without table rewrites
- Full history retention and time-travel queries
- Partitioned by ingestion date for performance
Cleaning — Cleaned Data
LLM-generated SQL transformations clean, deduplicate, and type-cast your data. You review and approve before anything runs — no black-box magic.
- Natural language to SQL via the Analytics Engineer
- Preview results before committing changes
- Deduplication, null handling, type casting
- Validated and schema-enforced before execution
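To make the cleaning step concrete, here is a minimal sketch of the kind of SQL such a step might produce, run against an in-memory SQLite database so it is easy to try. The table, the columns, and the choice to replace a missing amount with 0 are assumptions for illustration, not output from the Analytics Engineer. It needs SQLite 3.25 or newer for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id TEXT, amount TEXT, ingested_at TEXT)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [
        ("1001", "19.99", "2024-01-15"),
        ("1001", "21.50", "2024-01-16"),  # later duplicate of order 1001
        ("1002", None, "2024-01-15"),     # missing amount
    ],
)

# Deduplicate (keep the latest row per order_id), fill nulls, and cast
# the text amount to a numeric type.
CLEAN_SQL = """
SELECT order_id,
       CAST(COALESCE(amount, '0') AS REAL) AS amount,
       ingested_at
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY order_id ORDER BY ingested_at DESC
           ) AS rn
    FROM raw_orders
)
WHERE rn = 1
ORDER BY order_id
"""

cleaned = conn.execute(CLEAN_SQL).fetchall()
```

Running the query as a plain SELECT first mirrors the preview-before-commit flow: nothing is written until the result has been reviewed.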
Aggregation — Business Metrics
Aggregated metrics, star schemas, and business KPIs. Incremental updates keep compute costs low while keeping data fresh.
- Aggregations and business-ready metrics
- Star schema for dimensional modeling
- Incremental updates to minimize compute
- Direct feed to dashboards and exports
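One common way to implement incremental updates is a watermark: each run folds in only the rows that arrived after the previous run. The sketch below shows that idea under stated assumptions. The row fields and the ISO-timestamp watermark are hypothetical, and the source does not say which mechanism OptimaFlo uses.

```python
from collections import defaultdict


def incremental_update(daily_totals, watermark, new_rows):
    """Fold only rows newer than the watermark into the running totals."""
    totals = defaultdict(float, daily_totals)
    latest = watermark
    for row in new_rows:
        if row["ingested_at"] <= watermark:
            continue  # already counted in a previous run
        totals[row["order_date"]] += row["amount"]
        latest = max(latest, row["ingested_at"])
    return dict(totals), latest


# First run: empty state, so every row is new.
totals, watermark = incremental_update({}, "", [
    {"order_date": "2024-01-15", "amount": 20.0, "ingested_at": "2024-01-15T01:00"},
    {"order_date": "2024-01-15", "amount": 5.0, "ingested_at": "2024-01-15T02:00"},
])

# Second run: one row was already processed, one is new.
totals, watermark = incremental_update(totals, watermark, [
    {"order_date": "2024-01-15", "amount": 5.0, "ingested_at": "2024-01-15T02:00"},
    {"order_date": "2024-01-16", "amount": 7.5, "ingested_at": "2024-01-16T01:00"},
])
```

ISO-8601 timestamps of the same format compare correctly as strings, which is why the watermark check needs no date parsing here.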
Dashboards & BI
Built-in semantic layer, charts, KPI tiles, and shareable dashboards. Query Aggregation tables directly without moving data to another tool.
- Semantic layer with reusable metric definitions
- Bar, line, area, pie, scatter, and KPI widgets
- Dashboard sharing and embedding
- The Analyst for ad-hoc queries and visualizations
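A reusable metric definition can be thought of as a named aggregate that is compiled to SQL wherever a chart needs it. The sketch below is a simplified, hypothetical model of that idea. The `Metric` class, `compile_metric`, and the `agg_orders` table are illustrative names, not part of the product's API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str        # column alias exposed to dashboards
    expression: str  # SQL aggregate expression
    table: str       # aggregation table the metric reads from


def compile_metric(metric, group_by=()):
    """Render one metric definition as a SQL query, optionally grouped."""
    dims = ", ".join(group_by)
    select = f"{dims}, " if dims else ""
    sql = f"SELECT {select}{metric.expression} AS {metric.name} FROM {metric.table}"
    if dims:
        sql += f" GROUP BY {dims}"
    return sql


revenue = Metric("total_revenue", "SUM(amount)", "agg_orders")
sql = compile_metric(revenue, group_by=("region",))
```

Defining the aggregate once means a KPI tile and a bar chart grouped by region read from the same definition instead of two hand-written queries.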
AI Analyst
Ask questions in plain English. The Analyst queries your data, generates visualizations, and adds them to dashboards. No SQL required from you.
- Natural language queries across all sources
- Automatic chart generation from query results
- Multi-source routing (Iceberg + BigQuery)
- Pin results directly to dashboards
Data Quality
Quality scoring across five dimensions, schema enforcement, and self-healing SQL across every medallion layer: raw, cleaned, and aggregated.
- Five-dimension quality scores (completeness, validity, uniqueness, consistency, timeliness)
- Schema enforcement blocks unsafe type changes
- Self-healing SQL auto-corrects errors at runtime
- Integrated with pipeline execution flow
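The five dimensions can each be expressed as the fraction of rows that pass a check. The sketch below shows one possible set of definitions. The specific rules (positive quantity, total equals quantity times unit price, updated within one day) and the field names are assumptions chosen for illustration, not the platform's scoring rules.

```python
from datetime import datetime, timedelta


def is_consistent(row):
    """Cross-field check; rows with missing inputs cannot be verified."""
    if None in (row["quantity"], row["unit_price"], row["total"]):
        return False
    return row["total"] == row["quantity"] * row["unit_price"]


def quality_scores(rows, now, max_age=timedelta(days=1)):
    """Score each dimension as the fraction of rows that pass its check."""
    n = len(rows)
    return {
        "completeness": sum(all(v is not None for v in r.values()) for r in rows) / n,
        "validity": sum(isinstance(r["quantity"], int) and r["quantity"] > 0 for r in rows) / n,
        "uniqueness": len({r["order_id"] for r in rows}) / n,
        "consistency": sum(is_consistent(r) for r in rows) / n,
        "timeliness": sum(now - r["updated_at"] <= max_age for r in rows) / n,
    }


now = datetime(2024, 1, 20, 12, 0)
rows = [
    {"order_id": 1, "quantity": 2, "unit_price": 500, "total": 1000, "updated_at": now - timedelta(hours=1)},
    {"order_id": 2, "quantity": 1, "unit_price": 300, "total": 999, "updated_at": now - timedelta(hours=2)},
    {"order_id": 2, "quantity": 1, "unit_price": None, "total": 300, "updated_at": now - timedelta(days=3)},
    {"order_id": 4, "quantity": -1, "unit_price": 100, "total": -100, "updated_at": now - timedelta(hours=1)},
]
scores = quality_scores(rows, now)
```

Amounts are kept as integer cents so the consistency check compares exact values rather than floating-point results.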
End-to-End Pipelines
The Manager takes a single natural language request and builds the complete pipeline: connects sources, generates SQL, creates the canvas, and deploys to Airflow.
- Single-prompt pipeline generation
- Automatic task decomposition
- Source connection, SQL, and scheduling
- Human-in-the-loop review at each step
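Task decomposition with review at each step can be modeled as a dependency graph that is walked in order, pausing for approval before every step. The sketch below uses Python's standard `graphlib` module (Python 3.9+). The step names and the `approve` callback are hypothetical stand-ins, not the Manager's actual interface.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical plan).
plan = {
    "connect_source": set(),
    "generate_sql": {"connect_source"},
    "build_canvas": {"generate_sql"},
    "deploy_to_airflow": {"build_canvas"},
}


def run_plan(plan, approve):
    """Walk the plan in dependency order, stopping at the first rejection."""
    completed = []
    for step in TopologicalSorter(plan).static_order():
        if not approve(step):
            break  # the reviewer rejected this step, so nothing downstream runs
        completed.append(step)
    return completed


approved_all = run_plan(plan, approve=lambda step: True)
stopped_early = run_plan(plan, approve=lambda step: step != "build_canvas")
```

Because the walk stops at the first rejected step, nothing downstream of an unapproved change is ever deployed.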
Auto-Scaling Engine Selection
OptimaFlo picks the right compute engine based on your data volume. No manual configuration — the platform measures table size and routes to the optimal engine automatically.
- No cluster to manage
- Millisecond startup time
- Parquet, CSV, JSON native support
- Native Iceberg read/write via PyIceberg
- Serverless — no clusters
- Columnar storage with auto-optimization
- Pay only for bytes scanned
- BigLake integration with Iceberg
- Horizontal scaling to petabytes
- PySpark and Spark SQL
- On-demand cluster lifecycle
- Native Iceberg read/write
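Size-based routing of this kind can be sketched as a small threshold table. Everything specific below is an assumption: the engine labels, the 10 GiB and 1 TiB cutoffs, and the function name are illustrative, since the source does not publish the actual thresholds.

```python
GIB = 1024 ** 3

# Hypothetical tiers: (exclusive upper bound in bytes, engine label).
ENGINE_TIERS = [
    (10 * GIB, "in_process"),
    (1024 * GIB, "serverless_warehouse"),
]
FALLBACK_ENGINE = "spark_cluster"  # anything larger scales out horizontally


def select_engine(table_bytes):
    """Pick the smallest tier whose upper bound exceeds the table size."""
    if table_bytes < 0:
        raise ValueError("table size cannot be negative")
    for upper_bound, engine in ENGINE_TIERS:
        if table_bytes < upper_bound:
            return engine
    return FALLBACK_ENGINE


small = select_engine(200 * 1024 ** 2)  # 200 MiB
medium = select_engine(50 * GIB)
large = select_engine(5 * 1024 * GIB)   # 5 TiB
```

Keeping the thresholds in a data table rather than in branching logic makes them easy to tune as workloads change.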
Apache Iceberg & Polaris Catalog
Every table in OptimaFlo is an Apache Iceberg table managed by a Polaris catalog. This means your data is stored in open Parquet files on your own cloud storage — no proprietary formats, no lock-in.
ACID Transactions
Every write is atomic — no partial data, no corruption, even on failure.
Time-Travel
Query any historical version of your table. Roll back changes instantly.
Schema Evolution
Add, rename, or drop columns without rewriting the entire table.
Open Format
Parquet files on your storage. Query from any engine — no lock-in.
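Time-travel and rollback both follow from one idea: every commit produces a new immutable snapshot, and the table simply points at one of them. The toy model below illustrates that concept in plain Python. It is a conceptual sketch, not Iceberg's implementation, since real snapshots reference immutable data files through metadata rather than copying rows, and the class and method names are hypothetical.

```python
class SnapshotTable:
    """Toy append-only table that keeps every committed snapshot."""

    def __init__(self):
        self._snapshots = [()]  # snapshot 0 is the empty table
        self._current = 0

    def append(self, rows):
        """Commit a new snapshot built on the current one; return its id."""
        self._snapshots.append(self._snapshots[self._current] + tuple(rows))
        self._current = len(self._snapshots) - 1
        return self._current

    def scan(self, snapshot_id=None):
        """Read the current snapshot, or any historical one by id."""
        sid = self._current if snapshot_id is None else snapshot_id
        return list(self._snapshots[sid])

    def rollback(self, snapshot_id):
        """Point the table back at an older snapshot without deleting any."""
        if not 0 <= snapshot_id < len(self._snapshots):
            raise ValueError("unknown snapshot id")
        self._current = snapshot_id


table = SnapshotTable()
first = table.append(["a"])
second = table.append(["b"])
before_rollback = table.scan()
table.rollback(first)
after_rollback = table.scan()
still_readable = table.scan(second)
```

Rollback only moves the pointer, which is why the later snapshot stays readable by id afterwards.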
One platform. 8 layers. Runs in your cloud.