column_statistics Action
Implement the column_statistics DoAction to provide min/max values, distinct counts, and null statistics for query optimization.
The column_statistics action provides DuckDB with statistical information about a column in a table. These statistics enable the query optimizer to make better execution decisions, such as choosing optimal join strategies or filtering approaches. Implementing this action is optional and can be done selectively for specific tables.
Input Parameters
The action receives a single msgpack-serialized parameter:
    struct GetFlightColumnStatistics
    {
      std::string flight_descriptor;
      std::string column_name;
      std::string type;
      MSGPACK_DEFINE_MAP(flight_descriptor, column_name, type)
    };

The flight_descriptor field is the Arrow Flight serialized FlightDescriptor structure.
The column_name field is the name of the column whose statistics are requested.
The type field is the DuckDB data type name, e.g. VARCHAR or TIMESTAMP WITH TIME ZONE.
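Because the type field arrives as a plain string, a server will usually need to map it to some internal category before it can decide how to encode min and max. The following is a minimal sketch of such a mapping using only the C++ standard library. The `StatsValueKind` enum and the `ClassifyDuckDBType` function are hypothetical names introduced for illustration, the list of type names is a small subset, and the case normalization is an assumption (the examples above only show upper-case names).

```cpp
#include <algorithm>
#include <cctype>
#include <string>
#include <unordered_map>

// Hypothetical categories a server might use to choose how min/max are encoded.
enum class StatsValueKind { kString, kInteger, kFloating, kBoolean, kTemporal, kUnsupported };

// Hypothetical helper: maps the `type` string from the request to a category.
// Only a few DuckDB type names are listed; anything else is reported as unsupported.
StatsValueKind ClassifyDuckDBType(std::string type_name) {
    // Assumption: treat the name case-insensitively by upper-casing it first.
    std::transform(type_name.begin(), type_name.end(), type_name.begin(),
                   [](unsigned char c) { return static_cast<char>(std::toupper(c)); });

    static const std::unordered_map<std::string, StatsValueKind> kKinds = {
        {"VARCHAR", StatsValueKind::kString},
        {"INTEGER", StatsValueKind::kInteger},
        {"BIGINT", StatsValueKind::kInteger},
        {"DOUBLE", StatsValueKind::kFloating},
        {"BOOLEAN", StatsValueKind::kBoolean},
        {"DATE", StatsValueKind::kTemporal},
        {"TIMESTAMP", StatsValueKind::kTemporal},
        {"TIMESTAMP WITH TIME ZONE", StatsValueKind::kTemporal},
    };

    auto it = kKinds.find(type_name);
    return it == kKinds.end() ? StatsValueKind::kUnsupported : it->second;
}
```

Parameterized names such as DECIMAL(18,3) fall through to `kUnsupported` in this sketch. Since the action is optional, a server could simply decline to return statistics for those columns.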
Return Value
The action must return an Arrow RecordBatch serialized using the IPC format with a single row containing the following fields:
| Field Name | Type | Description |
|---|---|---|
| has_not_null | BOOLEAN | Indicate if the field contains a value that is not null. |
| has_null | BOOLEAN | Indicate if the field contains a value that is null. |
| distinct_count | UINT64 | Indicate the number of distinct values in the field. |
| min | Depends on column type | The minimum value of the field. |
| max | Depends on column type | The maximum value of the field. |
| max_string_length | UINT64 | The maximum length of a string value. |
| contains_unicode | BOOLEAN | Indicate if the field contains Unicode text. |
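To make the meaning of each field concrete, here is a minimal sketch that computes the seven values for an in-memory string column. It uses only the C++ standard library and stops short of building the Arrow RecordBatch. `StringColumnStats` and `ComputeStringColumnStats` are hypothetical names, and several choices are assumptions rather than documented behavior: nulls are excluded from distinct_count, min and max use byte-wise string comparison, max_string_length is measured in bytes, and contains_unicode is taken to mean "contains at least one non-ASCII byte".

```cpp
#include <algorithm>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory holder for the seven fields of the result row.
struct StringColumnStats {
    bool has_not_null = false;
    bool has_null = false;
    std::uint64_t distinct_count = 0;
    std::optional<std::string> min;  // stays empty when every value is null
    std::optional<std::string> max;  // stays empty when every value is null
    std::uint64_t max_string_length = 0;
    bool contains_unicode = false;
};

// Hypothetical helper: scans the column once; std::nullopt represents a NULL value.
StringColumnStats ComputeStringColumnStats(
    const std::vector<std::optional<std::string>>& values) {
    StringColumnStats stats;
    std::set<std::string> distinct;  // exact distinct set; nulls are not counted (assumption)

    for (const auto& value : values) {
        if (!value.has_value()) {
            stats.has_null = true;
            continue;
        }
        stats.has_not_null = true;
        distinct.insert(*value);

        // Byte-wise ordering of std::string (assumption about the intended ordering).
        if (!stats.min || *value < *stats.min) stats.min = *value;
        if (!stats.max || *value > *stats.max) stats.max = *value;

        // Length in bytes, not in characters (assumption).
        stats.max_string_length =
            std::max<std::uint64_t>(stats.max_string_length, value->size());

        // In UTF-8, any byte >= 0x80 belongs to a non-ASCII character.
        for (unsigned char byte : *value) {
            if (byte >= 0x80) stats.contains_unicode = true;
        }
    }

    stats.distinct_count = distinct.size();
    return stats;
}
```

For a column holding "pear", NULL, "apple", "pear", and "café", this sketch reports both has_null and has_not_null as true, a distinct_count of 3, "apple" as min, and "pear" as max. A real server would more likely take these values from its storage layer's metadata than from a full scan.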