column_statistics
Action
The get_flight_column_statitics
action is used to provide DuckDB with statitics about a field in a table.These statitics allow DuckDB to execute queries more efficienty.An Arrow Flight server can optionally choose to implement this action.
Input Parameters
There is a single msgpack
serialized parameter passed to the action.
struct GetFlightColumnStatistics
{
std::string flight_descriptor;
std::string column_name;
std::string type;
(flight_descriptor, column_name, type)
MSGPACK_DEFINE_MAP};
The flight_descriptor
field is the Arrow Flight serialized FlightDescriptor structure.The type
field is the DuckDB data type name, i.e. VARCHAR
, TIMESTAMP WITH TIME ZONE
.
Output Results
The response is a msgpack
encoded GetFlightColumnStatisticsResult
structure.
struct GetFlightColumnStatisticsResult
{
//! Whether or not the segment can contain NULL values
bool has_null;
//! Whether or not the segment can contain values that are not null
bool has_no_null;
// estimate that one may have even if distinct_stats==nullptr
idx_t distinct_count;
//! Numeric and String stats
;
GetFlightColumnStatisticsNumericStatsData numeric_stats;
GetFlightColumnStatisticsStringData string_stats
(has_null, has_no_null, distinct_count, numeric_stats, string_stats)
MSGPACK_DEFINE_MAP};
This in turn references a few additional structures. The first is for strings.
struct GetFlightColumnStatisticsStringData
{
std::string min;
std::string max;
(min, max)
MSGPACK_DEFINE_MAP};
Returning statistics for numeric types is a bit more complicated since there are different levels of precision and msgpack
doesn’t support 128-bit integers. So they’ve been split into a high and low 64-bit integer.
struct GetFlightColumnStatisticsNumericStatsData
{
//! Whether or not the value has a max value
bool has_min;
//! Whether or not the segment has a min value
bool has_max;
//! The minimum value of the segment
;
GetFlightColumnStatisticsNumericValue min//! The maximum value of the segment
;
GetFlightColumnStatisticsNumericValue max
(has_min, has_max, min, max)
MSGPACK_DEFINE_MAP};
struct GetFlightColumnStatisticsNumericValue
{
bool boolean;
int8_t tinyint;
int16_t smallint;
int32_t integer;
int64_t bigint;
uint8_t utinyint;
uint16_t usmallint;
uint32_t uinteger;
uint64_t ubigint;
uint64_t hugeint_high;
uint64_t hugeint_low;
float float_;
double double_;
(
MSGPACK_DEFINE_MAP, tinyint,
boolean, integer,
smallint, utinyint,
bigint, uinteger,
usmallint,
ubigint, hugeint_low,
hugeint_highfloat_, double_)
};
There is an experimental schema defined by Apache Arrow for statistics. In the future that schema may be adopted rather than the msgpack
schema.