Pre-aggregates

Availability: Pre-aggregates are an Early Access feature available on Enterprise plans only.

Pre-aggregates let you define materialized summaries of your data directly in your dbt YAML. When a user runs a query in Lightdash, the system checks if the query can be answered from a pre-aggregate instead of querying your warehouse. If it matches, the query is served from the pre-computed results, making it significantly faster and reducing warehouse load. This is especially useful for dashboards with high traffic or expensive aggregations that don’t need real-time data. Any query that goes through the Lightdash semantic layer can hit a pre-aggregate — this includes the Lightdash app, the API, MCP, AI agents, the Embed SDK, and the React SDK.

Getting started

Define pre-aggregates in your dbt project and configure scheduling.

Monitoring and debugging

Track materialization status, debug query matching, and view hit/miss stats.

How it works

Pre-aggregates follow a four-step cycle:

Define — You add a pre_aggregates block to your dbt model YAML, specifying which dimensions and metrics to include.
Materialize — Lightdash runs the aggregation query against your warehouse and stores the results. This happens automatically on compile, on a cron schedule you define, or when you trigger it manually.
Match — When a user runs a query, Lightdash checks if every requested dimension, metric, and filter is covered by a pre-aggregate.
Serve — If a match is found, the query is served from the materialized data instead of hitting your warehouse.

Example

Suppose you have an orders table with thousands of rows, and you define a pre-aggregate with dimensions status and metrics total_amount (sum) and order_count (count), with a day granularity on order_date. Your warehouse data:

order_date	status	customer	amount
2024-01-15	shipped	Alice	$100
2024-01-15	shipped	Bob	$50
2024-01-15	pending	Charlie	$75
2024-01-16	shipped	Alice	$200
2024-01-16	pending	Charlie	$30
…	…	…	…

Lightdash materializes this into a pre-aggregate:

order_date_day	status	total_amount	order_count
2024-01-15	shipped	$150	2
2024-01-15	pending	$75	1
2024-01-16	shipped	$200	1
2024-01-16	pending	$30	1

Now when a user queries “total amount by status, grouped by month”, Lightdash re-aggregates from the daily pre-aggregate instead of scanning the full table:

order_date_month	status	total_amount
January 2024	shipped	$350
January 2024	pending	$105

This works because sum can be re-aggregated — summing daily sums gives the correct monthly sum.

Query matching

When a user runs a query, Lightdash automatically checks if a pre-aggregate can serve the results. A pre-aggregate matches when all of the following are true:

Every dimension in the query is included in the pre-aggregate
Every metric in the query is included in the pre-aggregate
Every dimension used in filters is included in the pre-aggregate
All metrics use supported metric types
The query does not contain custom dimensions, custom metrics, or table calculations
If the query uses a time dimension, the requested granularity must be equal to or coarser than the pre-aggregate’s granularity

A day pre-aggregate serves day, week, month, quarter, and year queries. A month pre-aggregate serves month, quarter, and year — but not day or week, since those need finer-grained data.

When multiple pre-aggregates match a query, Lightdash picks the smallest one (fewest dimensions, then fewest metrics as tiebreaker).

Dimensions from joined tables

Pre-aggregates support dimensions from joined tables. Reference them by their full name (for example, customers.first_name) in the dimensions list.

Supported metric types

Pre-aggregates support metrics that can be re-aggregated from pre-computed results:

sum
count
min
max
average

Current limitations

Not all metrics work this way. Consider count_distinct with the same daily pre-aggregate from above. If a daily pre-aggregate stores “2 distinct customers on 2024-01-15” and “1 distinct customer on 2024-01-16”, you can’t sum those to get the monthly distinct count — Alice ordered on both days and would be counted twice:

order_date_day	status	distinct_customers
2024-01-15	shipped	2 (Alice, Bob)
2024-01-16	shipped	1 (Alice)

Re-aggregating: 2 + 1 = 3, but the correct monthly answer is 2 (Alice, Bob). The pre-aggregate lost track of which customers were counted. We’re investigating supporting count_distinct through approximation algorithms. Follow this issue for updates. For similar reasons, the following metric types are also not supported:

sum_distinct, average_distinct
median, percentile
percent_of_total, percent_of_previous
running_total
Custom SQL metrics — Follow this issue
number, string, date, timestamp, boolean

For metrics that can’t be pre-aggregated, consider using caching instead.

Introduction

Explore and analyze

Build with AI

Build your semantic layer

Workspace and user management

Integrations

Lightdash SDK

Embedding

Self-hosting and deployment

Contact

Getting started

Monitoring and debugging

How it works

Example

Query matching

Dimensions from joined tables

Supported metric types

Current limitations

Introduction

Explore and analyze

Build with AI

Build your semantic layer

Workspace and user management

Integrations

Lightdash SDK

Embedding

Self-hosting and deployment

Contact

Getting started

Monitoring and debugging

​How it works

​Example

​Query matching

​Dimensions from joined tables

​Supported metric types

​Current limitations

How it works

Example

Query matching

Dimensions from joined tables

Supported metric types

Current limitations