Table Column Statistics
Provide column-level statistics from Arrow Flight servers so that DuckDB can optimize query execution using min/max values, distinct counts, and null-presence information.
Arrow Flight servers can optionally provide column-level statistics to improve query execution performance. Statistics may include:
- Minimum and maximum values
- Number of distinct values
- Presence of null or non-null values
These statistics enable DuckDB’s query optimizer to make better execution decisions, such as choosing optimal join strategies or filter orderings.
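As an illustration of the statistics listed above, the following is a minimal sketch of how a server might derive them for a single in-memory column. The `ColumnStatistics` class and `compute_column_statistics` function are hypothetical names, not part of the Airport extension or Arrow Flight. The sketch represents nulls as `None` and assumes distinct values are counted over non-null entries only.

```python
from dataclasses import dataclass
from typing import Any, Optional, Sequence


@dataclass
class ColumnStatistics:
    """Hypothetical container for the statistics described above."""
    min_value: Optional[Any]
    max_value: Optional[Any]
    distinct_count: int
    has_null: bool
    has_not_null: bool


def compute_column_statistics(values: Sequence[Any]) -> ColumnStatistics:
    """Compute statistics for one column, where None stands for a null."""
    non_null = [v for v in values if v is not None]
    return ColumnStatistics(
        # min/max are undefined for a column that holds only nulls
        min_value=min(non_null) if non_null else None,
        max_value=max(non_null) if non_null else None,
        # assumption: nulls are not counted as a distinct value
        distinct_count=len(set(non_null)),
        has_null=len(non_null) < len(values),
        has_not_null=bool(non_null),
    )
```

A real server would more likely read these values from its storage layer or catalog than scan the data on every request.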
Arrow Flight Server Implementation Notes
To enable statistics for a table, add a metadata key named can_produce_statistics with a non-empty string value to the table’s Arrow schema.
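The metadata rule can be sketched as follows. This example models the Arrow schema metadata as a plain mapping of bytes to bytes, which is how Arrow stores schema-level key/value metadata. The helper names `advertise_statistics` and `can_produce_statistics` are hypothetical, and the value `b"1"` is an arbitrary choice, since any non-empty string satisfies the rule.

```python
from typing import Dict, Mapping, Optional

STATISTICS_KEY = b"can_produce_statistics"


def advertise_statistics(
    metadata: Optional[Mapping[bytes, bytes]],
) -> Dict[bytes, bytes]:
    """Return a copy of the schema metadata with the statistics key set."""
    updated = dict(metadata or {})
    # Any non-empty string enables statistics; "1" is an arbitrary choice.
    updated[STATISTICS_KEY] = b"1"
    return updated


def can_produce_statistics(metadata: Optional[Mapping[bytes, bytes]]) -> bool:
    """True when the key is present and its value is non-empty."""
    if not metadata:
        return False
    return len(metadata.get(STATISTICS_KEY, b"")) > 0
```

With PyArrow, the resulting mapping could be attached to a schema through `Schema.with_metadata`, which returns a new schema carrying the given metadata.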
When a table's schema carries this key, the Airport extension issues a DoAction Arrow Flight RPC with the column_statistics action, once for each column it needs statistics for.
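The sketch below shows one way a server could route the column_statistics action to a handler. It does not reproduce the extension's wire format: the text above does not describe the request or response payloads, so the JSON bodies, the `column_name` field, the `EXAMPLE_STATS` table, and the function names are all placeholders chosen for illustration.

```python
import json
from typing import Callable, Dict, Iterator, List

# Placeholder statistics keyed by column name (illustrative values only).
EXAMPLE_STATS: Dict[str, dict] = {
    "id": {"min": 1, "max": 100, "distinct_count": 100, "has_null": False},
}


def handle_column_statistics(body: bytes) -> Iterator[bytes]:
    """Answer one request; the JSON payload shape is a hypothetical stand-in."""
    request = json.loads(body)
    column = request["column_name"]
    stats = EXAMPLE_STATS.get(column)
    if stats is None:
        raise KeyError(f"no statistics for column: {column}")
    yield json.dumps(stats).encode("utf-8")


# Map each action type to the function that serves it.
ACTION_HANDLERS: Dict[str, Callable[[bytes], Iterator[bytes]]] = {
    "column_statistics": handle_column_statistics,
}


def do_action(action_type: str, body: bytes) -> List[bytes]:
    """Dispatch an action by type and collect its result messages."""
    handler = ACTION_HANDLERS.get(action_type)
    if handler is None:
        raise ValueError(f"unsupported action: {action_type}")
    return list(handler(body))
```

Because the action is invoked separately for each column, keeping the handler cheap (for example, by serving precomputed values) limits the overhead added to query planning.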