Data Quality
Trust Your Data
Quality scoring across five dimensions, schema enforcement, and self-healing SQL across every pipeline layer. Surface data issues alongside pipeline execution.
Quality Dimensions
Four dimensions that define whether your data is ready for analytics.
- Null percentage per column
- Required field enforcement
- Row count vs. expected thresholds
- Duplicate row detection
- Primary key uniqueness validation
- Cross-column composite key checks
- Type conformance (dates, numbers, strings)
- Range and boundary checks
- Regex pattern matching
- Last ingestion timestamp
- Stale data alerting
- Pipeline execution frequency
Quality Scores
Tables are scored from 0 to 100 across completeness, uniqueness, validity, consistency, and freshness — combining into a single number you can track over time.
Data is clean, complete, and fresh. Safe for dashboards.
Minor issues. Review flagged columns before using in Gold.
Significant quality issues. Investigate before proceeding.
Automated Profiling
Profiling runs automatically when data is ingested. Every column gets statistical analysis without manual configuration.
Column-Level Statistics
Min, max, mean, median, standard deviation, null percentage, and distinct count for every column.
Type Distribution
Breakdown of inferred vs. actual types. Catches mixed-type columns (e.g., strings in a numeric field).
Value Frequency
Top values and their frequencies per column. Useful for spotting unexpected categories or outliers.
Sample Preview
View sample rows alongside statistics. Profiles run against your actual data, not a separate sample.
Schema Enforcement & Self-Healing SQL
The platform enforces safe schema evolution and auto-corrects SQL errors at runtime — fixing column references, type mismatches, and conversion errors automatically so pipelines recover without manual intervention.
Integrated Across Every Layer
Quality checks are embedded in the pipeline, not bolted on after.
Bronze Layer
IngestionProfile raw data on ingestion. Catch source issues (missing fields, type changes) before they propagate.
Silver Layer
TransformationValidate transformations produced correct output. Check that deduplication and cleaning worked.
Gold Layer
AnalyticsVerify aggregated metrics are within expected bounds. Prevent bad data from reaching dashboards.
Pipeline Execution
OrchestrationQuality checks run as part of the DAG. If a table fails validation, downstream nodes can be paused.
Profiling and scoring run on every ingestion. No separate tool to configure.