Table Column Statistics

Provide column-level statistics (min/max values, distinct counts, and null checks) from Arrow Flight servers to optimize DuckDB query execution.

Arrow Flight servers can optionally provide column-level statistics to improve query execution performance. Statistics may include minimum and maximum values, distinct counts, and null checks.

These statistics enable DuckDB’s query optimizer to make better execution decisions, such as choosing optimal join strategies or filter orderings.
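As a rough illustration of the kinds of statistics mentioned above, the following sketch computes min/max values, a distinct count, and a null check for one column held in a plain Python list. The function name and the dictionary keys are hypothetical, chosen for this example only; they are not the field names the Airport extension expects on the wire.

```python
from typing import Any, Dict, Optional, Sequence


def column_statistics(values: Sequence[Optional[Any]]) -> Dict[str, Any]:
    """Compute simple statistics for one column (hypothetical field names)."""
    # Nulls are represented as None and excluded from min/max/distinct.
    non_null = [v for v in values if v is not None]
    return {
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "distinct_count": len(set(non_null)),
        # True when at least one value in the column is null.
        "has_null": len(non_null) != len(values),
    }


stats = column_statistics([3, None, 1, 3])
print(stats)
```

A real server would more likely take these values from its storage engine or catalog than recompute them for every request.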

Arrow Flight Server Implementation Notes

To enable statistics for a table, add a metadata key named can_produce_statistics with a non-empty string value to the table’s Arrow schema.
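A minimal sketch of setting this key is shown below. It operates on a plain metadata dictionary with bytes keys and values, which is the shape pyarrow uses for schema metadata. The helper name enable_statistics and the value "1" are assumptions made for illustration; the only requirement stated above is that the value is a non-empty string.

```python
from typing import Dict, Mapping, Optional

# Metadata key named in the text above.
STATS_KEY = b"can_produce_statistics"


def enable_statistics(
    metadata: Optional[Mapping[bytes, bytes]],
    value: bytes = b"1",  # assumption: any non-empty value should work
) -> Dict[bytes, bytes]:
    """Return a copy of the schema metadata with the statistics key set."""
    if not value:
        raise ValueError("can_produce_statistics must be a non-empty string")
    merged = dict(metadata or {})  # keep any existing metadata entries
    merged[STATS_KEY] = value
    return merged


print(enable_statistics({b"comment": b"orders table"}))

# With pyarrow installed, the result can be attached to a table's schema:
#   schema = schema.with_metadata(enable_statistics(schema.metadata))
```

Merging into the existing metadata instead of replacing it matters because pyarrow's with_metadata replaces the schema's metadata wholesale.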

When statistics are available, the Airport extension invokes a DoAction Arrow Flight RPC with the column_statistics action for each column of interest.