column_statistics Action

Implement the column_statistics DoAction to provide min/max values, distinct counts, and null statistics for query optimization.

The column_statistics action provides DuckDB with statistical information about a column in a table. The query optimizer uses these statistics to make better execution decisions, such as choosing a join strategy or deciding how to apply filters. Implementing this action is optional, and a server can choose to provide statistics for only some of its tables.

Input Parameters

The action receives a single msgpack-serialized parameter:

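#include <msgpack.hpp>
#include <string>

// Parameters sent by DuckDB, packed as a msgpack map keyed by field name.
struct GetFlightColumnStatistics
{
  std::string flight_descriptor; // serialized Arrow Flight FlightDescriptor
  std::string column_name;       // column to return statistics for
  std::string type;              // DuckDB type name, e.g. VARCHAR

  MSGPACK_DEFINE_MAP(flight_descriptor, column_name, type)
};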

The flight_descriptor field contains a serialized Arrow Flight FlightDescriptor that identifies the table.

The type field is the DuckDB data type name of the column, e.g. VARCHAR or TIMESTAMP WITH TIME ZONE.
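Because the struct uses MSGPACK_DEFINE_MAP, msgpack-c packs it as a msgpack map keyed by the member names ("flight_descriptor", "column_name", "type"), with each std::string packed as a msgpack str. Most servers will simply unpack it with a msgpack library. The dependency-free sketch below only illustrates the shape of that payload; MsgpackReader and DecodeParams are hypothetical names, and the reader handles nothing beyond map headers and str/bin values, which is all this particular struct needs.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical minimal reader over a msgpack byte buffer (not a full decoder).
class MsgpackReader {
 public:
  explicit MsgpackReader(std::string data) : data_(std::move(data)) {}

  // Read an unsigned big-endian integer of n bytes.
  uint64_t ReadUint(std::size_t n) {
    Require(n);
    uint64_t v = 0;
    for (std::size_t i = 0; i < n; ++i) {
      v = (v << 8) | static_cast<uint8_t>(data_[pos_++]);
    }
    return v;
  }

  // Read a str (fixstr/str8/str16/str32) or bin (bin8/bin16/bin32) value.
  std::string ReadString() {
    uint8_t tag = static_cast<uint8_t>(ReadUint(1));
    std::size_t len;
    if ((tag & 0xe0) == 0xa0) {
      len = tag & 0x1f;                  // fixstr: length in low 5 bits
    } else if (tag == 0xd9 || tag == 0xc4) {
      len = ReadUint(1);                 // str8 / bin8
    } else if (tag == 0xda || tag == 0xc5) {
      len = ReadUint(2);                 // str16 / bin16
    } else if (tag == 0xdb || tag == 0xc6) {
      len = ReadUint(4);                 // str32 / bin32
    } else {
      throw std::runtime_error("expected msgpack str or bin");
    }
    Require(len);
    std::string out = data_.substr(pos_, len);
    pos_ += len;
    return out;
  }

  // Read a map header (fixmap/map16/map32) and return its entry count.
  std::size_t ReadMapSize() {
    uint8_t tag = static_cast<uint8_t>(ReadUint(1));
    if ((tag & 0xf0) == 0x80) return tag & 0x0f;  // fixmap
    if (tag == 0xde) return ReadUint(2);           // map16
    if (tag == 0xdf) return ReadUint(4);           // map32
    throw std::runtime_error("expected msgpack map");
  }

 private:
  // Throw if fewer than n bytes remain.
  void Require(std::size_t n) const {
    if (data_.size() - pos_ < n) throw std::runtime_error("truncated msgpack");
  }
  std::string data_;
  std::size_t pos_ = 0;
};

struct GetFlightColumnStatistics {
  std::string flight_descriptor;
  std::string column_name;
  std::string type;
};

// Hypothetical decoder: match keys by name so field order does not matter.
GetFlightColumnStatistics DecodeParams(const std::string& body) {
  MsgpackReader reader(body);
  std::size_t entries = reader.ReadMapSize();
  GetFlightColumnStatistics params;
  for (std::size_t i = 0; i < entries; ++i) {
    std::string key = reader.ReadString();
    // Only str/bin values are supported; any other value type throws.
    std::string value = reader.ReadString();
    if (key == "flight_descriptor") params.flight_descriptor = value;
    else if (key == "column_name") params.column_name = value;
    else if (key == "type") params.type = value;
  }
  return params;
}
```

Matching keys by name rather than position mirrors how a map-based encoding is meant to be read: the decoder does not depend on the order in which the fields were packed.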

Return Value

The action must return an Arrow RecordBatch, serialized in the Arrow IPC format, that contains a single row with the following fields:

| Field Name        | Type                     | Description                                                  |
|-------------------|--------------------------|--------------------------------------------------------------|
| has_not_null      | BOOLEAN                  | Whether the column contains at least one non-null value.     |
| has_null          | BOOLEAN                  | Whether the column contains at least one null value.         |
| distinct_count    | UINT64                   | The number of distinct values in the column.                 |
| min               | Depends on the column type | The minimum value in the column.                           |
| max               | Depends on the column type | The maximum value in the column.                           |
| max_string_length | UINT64                   | The maximum length of any string value in the column.        |
| contains_unicode  | BOOLEAN                  | Whether the column contains any Unicode (non-ASCII) text.    |
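To make the fields concrete, here is a sketch that computes them for an in-memory VARCHAR column. ColumnStatistics and ComputeStringStatistics are hypothetical names, and several details are assumptions of this sketch rather than documented behavior: NULLs are excluded from distinct_count, string length is measured in bytes, and "Unicode" means any non-ASCII byte. Packing the result into a one-row Arrow RecordBatch and writing it with an Arrow IPC writer is omitted to keep the example dependency-free.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <set>
#include <string>
#include <vector>

// Hypothetical in-memory form of the single result row. For a VARCHAR
// column, min and max are strings; other column types need other types.
struct ColumnStatistics {
  bool has_not_null = false;
  bool has_null = false;
  uint64_t distinct_count = 0;
  std::optional<std::string> min;  // empty if the column is all NULL
  std::optional<std::string> max;
  uint64_t max_string_length = 0;
  bool contains_unicode = false;
};

// Compute statistics for a VARCHAR column; std::nullopt represents NULL.
ColumnStatistics ComputeStringStatistics(
    const std::vector<std::optional<std::string>>& values) {
  ColumnStatistics stats;
  std::set<std::string> distinct;  // NULLs are not inserted (assumption)
  for (const auto& v : values) {
    if (!v) {
      stats.has_null = true;
      continue;
    }
    stats.has_not_null = true;
    distinct.insert(*v);
    // std::string compares characters as unsigned char, i.e. byte order.
    if (!stats.min || *v < *stats.min) stats.min = *v;
    if (!stats.max || *v > *stats.max) stats.max = *v;
    // Length in bytes, not code points (assumption of this sketch).
    if (v->size() > stats.max_string_length) stats.max_string_length = v->size();
    // Any byte >= 0x80 means the UTF-8 text is not plain ASCII.
    for (unsigned char c : *v) {
      if (c >= 0x80) {
        stats.contains_unicode = true;
        break;
      }
    }
  }
  stats.distinct_count = distinct.size();
  return stats;
}
```

Scanning every value is the simplest approach; if the underlying store already tracks min/max or distinct-count metadata, returning that instead is likely cheaper.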