Architecture

8 Layers. One Platform.

OptimaFlo replaces 6-10 data tools with a single, end-to-end platform built on Apache Iceberg, Airflow, and auto-scaling compute.

What It Replaces

Traditional Stack

6-10 separate tools

$25k to $100k/mo

Fivetran + Snowflake + dbt + Monte Carlo + Tableau + Mode + consultants

OptimaFlo

One unified platform

From $2.5k/mo

All 8 layers included. Runs in your cloud with BYOC.

The 8 Platform Layers

From raw data ingestion to AI-generated dashboards.

Layer 1

Ingestion

Ingestion & Orchestration

Connect to cloud storage, data warehouses, and APIs. Apache Airflow orchestrates all ingestion with automatic schema detection.

GCS, BigQuery, and REST API connectors available today
Automatic schema detection and validation
S3, Redshift, Snowflake, databases, and GraphQL coming soon
OAuth and credential management built-in
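As a rough illustration of what automatic schema detection involves, the sketch below infers one logical type per column from a sample of records. The function name infer_schema, the type names, and the merge rules are assumptions made for this example; they are not OptimaFlo's actual detection logic.

```python
from typing import Any


def _type_of(value: Any) -> str:
    """Map a Python value to a simple logical type name."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "integer"
    if isinstance(value, float):
        return "double"
    return "string"


def _merge(a: str, b: str) -> str:
    """Combine two observed types into one (hypothetical rules)."""
    if a == b:
        return a
    if a == "null":
        return b
    if b == "null":
        return a
    if {a, b} == {"integer", "double"}:
        return "double"  # widen integers to doubles
    return "string"  # any other mix falls back to string


def infer_schema(records: list[dict[str, Any]]) -> dict[str, str]:
    """Infer one logical type per column from sample records."""
    schema: dict[str, str] = {}
    for record in records:
        for column, value in record.items():
            schema[column] = _merge(schema.get(column, "null"), _type_of(value))
    return schema


sample = [
    {"id": 1, "amount": 10, "note": None},
    {"id": 2, "amount": 2.5, "note": "ok"},
]
detected = infer_schema(sample)
```

Here amount is seen first as an integer and then as a float, so it is widened to double, while note stays unresolved until a non-null value appears.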

Layer 2

Ingestion — Raw Storage

Raw Storage & Cataloging

Raw data lands in Apache Iceberg tables with zero transformations. Every record is preserved with ACID transactions, full history, and time-travel.

Apache Iceberg tables with ACID guarantees
Schema evolution without table rewrites
Full history retention and time-travel queries
Partitioned by ingestion date for performance

Layer 3

Cleaning — Cleaned Data

Data Transformation

LLM-generated SQL transformations clean, deduplicate, and type-cast your data. You review and approve before anything runs — no black-box magic.

Natural language to SQL via the Analytics Engineer
Preview results before committing changes
Deduplication, null handling, type casting
Validated and schema-enforced before execution
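To make the three cleaning steps concrete, here is a small hand-written example of the kind of SQL this layer deals with. It uses Python's built-in sqlite3 module as a stand-in engine so it runs anywhere; the table raw_orders, its columns, and the cleaning rules are hypothetical, and the SQL OptimaFlo generates for your data will differ.

```python
import sqlite3

# Hypothetical cleaning query: type casting, null handling, deduplication.
CLEAN_SQL = """
SELECT DISTINCT                                                  -- drop exact duplicates
    CAST(order_id AS INTEGER)                       AS order_id, -- text to integer
    COALESCE(NULLIF(TRIM(customer), ''), 'unknown') AS customer, -- blank or NULL to 'unknown'
    CAST(amount AS REAL)                            AS amount    -- text to float
FROM raw_orders
WHERE order_id IS NOT NULL                                       -- rows without a key are dropped
ORDER BY order_id
"""


def clean_orders(raw_rows):
    """Load raw text rows into an in-memory table and return the cleaned rows."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(
            "CREATE TABLE raw_orders (order_id TEXT, customer TEXT, amount TEXT)"
        )
        conn.executemany("INSERT INTO raw_orders VALUES (?, ?, ?)", raw_rows)
        return conn.execute(CLEAN_SQL).fetchall()
    finally:
        conn.close()


raw = [
    ("1", "alice", "10.50"),
    ("1", "alice", "10.50"),  # exact duplicate
    ("2", "  ", "7"),         # blank customer
    (None, "bob", "3"),       # missing key
]
cleaned = clean_orders(raw)
```

Running the query against a copy of the data first, as done here with an in-memory table, is one simple way to preview results before committing changes.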

Layer 4

Aggregation — Business Metrics

Business Logic & Aggregation

Aggregated metrics, star schemas, and business KPIs. Incremental updates keep compute costs low while keeping data fresh.

Aggregations and business-ready metrics
Star schema for dimensional modeling
Incremental updates to minimize compute
Direct feed to dashboards and exports
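The idea behind incremental updates can be sketched in a few lines: keep a watermark of what has already been aggregated, and fold in only the rows that arrived after it. The class name DailyRevenue, the event shape, and the use of an increasing event id as the watermark are assumptions for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class DailyRevenue:
    """Running revenue totals per day, plus the highest event id processed so far."""

    totals: dict[str, float] = field(default_factory=dict)
    watermark: int = 0  # hypothetical: event ids increase monotonically

    def update(self, events: list[tuple[int, str, float]]) -> int:
        """Fold in events newer than the watermark; return how many were processed."""
        processed = 0
        for event_id, day, amount in sorted(events):
            if event_id <= self.watermark:
                continue  # already aggregated in an earlier run
            self.totals[day] = self.totals.get(day, 0.0) + amount
            self.watermark = event_id
            processed += 1
        return processed


metric = DailyRevenue()
first_run = metric.update([(1, "day-1", 10.0), (2, "day-1", 5.0)])
# The second batch overlaps the first; only event 3 is new.
second_run = metric.update([(2, "day-1", 5.0), (3, "day-2", 7.0)])
```

Because the second run skips the event it has already seen, compute scales with the amount of new data rather than with the size of the full table.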

Layer 5

Dashboards & BI

Visualization & Reporting

Built-in semantic layer, charts, KPI tiles, and shareable dashboards. Query the Aggregation layer's tables directly without moving data to another tool.

Semantic layer with reusable metric definitions
Bar, line, area, pie, scatter, and KPI widgets
Dashboard sharing and embedding
The Analyst for ad-hoc queries and visualizations

Layer 6

AI Analyst

Ad-Hoc Analysis

Ask questions in plain English. The Analyst queries your data, generates visualizations, and adds them to dashboards. No SQL required from you.

Natural language queries across all sources
Automatic chart generation from query results
Multi-source routing (Iceberg + BigQuery)
Pin results directly to dashboards

Layer 7

Data Quality

Data Observability

Quality scoring across five dimensions, schema enforcement, and self-healing SQL across every medallion layer.

Five-dimension quality scores (completeness, validity, uniqueness, consistency, timeliness)
Schema enforcement blocks unsafe type changes
Self-healing SQL auto-corrects errors at runtime
Integrated with pipeline execution flow
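Two of the five dimensions, completeness and uniqueness, are simple enough to sketch directly. The formulas below are common textbook definitions, not necessarily how OptimaFlo computes its scores, and the function names and sample rows are hypothetical. Validity, consistency, and timeliness depend on business rules and are left out.

```python
from typing import Any


def completeness(rows: list[dict[str, Any]], columns: list[str]) -> float:
    """Fraction of expected cells that hold a non-null value."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0  # nothing expected, nothing missing
    filled = sum(1 for row in rows for col in columns if row.get(col) is not None)
    return filled / total


def uniqueness(rows: list[dict[str, Any]], key: str) -> float:
    """Number of distinct key values divided by the number of rows."""
    if not rows:
        return 1.0
    return len({row.get(key) for row in rows}) / len(rows)


rows = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},             # missing email
    {"id": 2, "email": "c@example.com"},  # duplicate id
    {"id": 3, "email": "d@example.com"},
]
completeness_score = completeness(rows, ["id", "email"])  # 7 of 8 cells filled
uniqueness_score = uniqueness(rows, "id")                 # 3 distinct ids in 4 rows
```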

Layer 8

End-to-End Pipelines

End-to-End Automation

The Manager takes a single natural language request and builds the complete pipeline: connects sources, generates SQL, creates the canvas, and deploys to Airflow.

Single-prompt pipeline generation
Automatic task decomposition
Source connection, SQL, and scheduling
Human-in-the-loop review at each step

Auto-Scaling Engine Selection

OptimaFlo picks the right compute engine based on your data volume. No manual configuration — the platform measures table size and routes to the optimal engine automatically.

DuckDB

< 100 GB

In-process analytical database. Runs inside the pipeline execution context with zero infrastructure overhead. Handles most workloads.

No cluster to manage
Millisecond startup time
Parquet, CSV, JSON native support
Iceberg read/write via PyIceberg

BigQuery

100 GB to 10 TB

Serverless warehouse for medium-to-large datasets. Pay-per-query pricing means you only pay for what you scan.

Serverless — no clusters
Columnar storage with auto-optimization
Pay only for bytes scanned
BigLake integration with Iceberg

Apache Spark

> 10 TB

Distributed processing for massive datasets. Dataproc clusters spin up on demand and shut down when the job completes.

Horizontal scaling to petabytes
PySpark and Spark SQL
On-demand cluster lifecycle
Native Iceberg read/write
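The routing rule described in this section (under 100 GB, 100 GB to 10 TB, over 10 TB) can be sketched as a single function. The function name select_engine, the use of decimal units, and the handling of the exact boundary values are assumptions; the platform's real selection logic may weigh more than table size.

```python
GB = 10**9   # assumption: decimal gigabytes
TB = 10**12  # assumption: decimal terabytes


def select_engine(table_bytes: int) -> str:
    """Pick a compute engine from the measured table size in bytes."""
    if table_bytes < 0:
        raise ValueError("table size cannot be negative")
    if table_bytes < 100 * GB:
        return "duckdb"    # in-process, no cluster
    if table_bytes <= 10 * TB:
        return "bigquery"  # serverless warehouse
    return "spark"         # distributed cluster


small = select_engine(5 * GB)
medium = select_engine(2 * TB)
large = select_engine(50 * TB)
```

In this sketch a table of exactly 100 GB or exactly 10 TB is routed to BigQuery; the page does not say which engine handles those boundary cases.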

Apache Iceberg & Polaris Catalog

Every table in OptimaFlo is an Apache Iceberg table managed by a Polaris catalog. This means your data is stored in open Parquet files on your own cloud storage — no proprietary formats, no lock-in.

ACID Transactions

Every write is atomic — no partial data, no corruption, even on failure.

Time-Travel

Query any historical version of your table. Roll back changes instantly.

Schema Evolution

Add, rename, or drop columns without rewriting the entire table.

Open Format

Parquet files on your storage. Query from any engine — no lock-in.
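Time-travel and rollback both follow from one idea: every commit produces a new immutable snapshot, and the table simply points at the current one. The toy model below illustrates that idea in plain Python. It is a teaching sketch with hypothetical names (SnapshotTable, append, scan, rollback), not Iceberg's implementation and not the PyIceberg API.

```python
from typing import Optional


class SnapshotTable:
    """Toy table that keeps every committed version as an immutable snapshot."""

    def __init__(self) -> None:
        self._snapshots: list[tuple[dict, ...]] = [()]  # snapshot 0 is the empty table
        self._current = 0

    def _check(self, snapshot_id: int) -> int:
        if not 0 <= snapshot_id < len(self._snapshots):
            raise ValueError(f"unknown snapshot {snapshot_id}")
        return snapshot_id

    def append(self, rows: list[dict]) -> int:
        """Commit rows as a new snapshot and return its id."""
        self._snapshots.append(self._snapshots[self._current] + tuple(rows))
        self._current = len(self._snapshots) - 1
        return self._current

    def scan(self, snapshot_id: Optional[int] = None) -> list[dict]:
        """Read the current snapshot, or a historical one if an id is given."""
        sid = self._current if snapshot_id is None else self._check(snapshot_id)
        return list(self._snapshots[sid])

    def rollback(self, snapshot_id: int) -> None:
        """Point the table back at an earlier snapshot without deleting history."""
        self._current = self._check(snapshot_id)


table = SnapshotTable()
s1 = table.append([{"id": 1}])
s2 = table.append([{"id": 2}])
latest = table.scan()        # both rows
as_of_s1 = table.scan(s1)    # only the first row
table.rollback(s1)
after_rollback = table.scan()
```

Rolling back only moves a pointer, which is why it can be near-instant regardless of table size; the later snapshot is still available to scan by id.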

Explore Further

Dive deeper into specific platform areas.

AI Agents

Your seven-member AI data team that automates your data workflow

Connectors & Sources

Every supported connector and how to configure them

BYOC Deployment

Deploy the full stack in your own GCP project

Data Quality

Scoring, schema enforcement, and self-healing SQL

See it in action

From raw data to dashboards, without the stack.
