Scalar Functions

The Airport extension enables an Arrow Flight server to provide scalar functions that can be called from SQL queries.

A scalar function returns a single result (of any DuckDB data type, including nested types) when given a set of parameters.

Example

SELECT geocode_address('1024 Lenox Ave, Miami Beach, Florida 33139')

When the function is invoked, its parameters are serialized and sent to the Arrow Flight server, and the results are returned to the DuckDB client.

Scalar functions are registered in a DuckDB catalog and schema when an Airport-provided database is attached. They can be explicitly called by referencing the database and schema names:

SELECT geocoder.usa.geocode_address('1024 Lenox Ave, Miami Beach, Florida 33139')

In this example, geocoder is the name of the attached database and usa is the name of the schema.

Efficiency / Parallelism

Airport scalar functions operate efficiently by processing entire DuckDB vectors at once. As of DuckDB version 1.2, the standard vector size is 2048 tuples. DuckDB can also invoke scalar functions from multiple threads, so the Arrow Flight server should handle parallel requests.
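Because several DuckDB threads may have requests in flight at once, any state shared between requests on the server needs to be protected. The following is a minimal sketch of that idea using only the Python standard library. The names CallCounter, handle_batch, and simulate_parallel_calls are hypothetical and are not part of Airport or Arrow Flight; the thread pool merely stands in for concurrent DuckDB threads.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import List


class CallCounter:
    """Hypothetical handler with state shared across concurrent requests."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.rows_seen = 0

    def handle_batch(self, values: List[str]) -> List[str]:
        # Per-request work touches only local data, so it needs no locking.
        results = [v.upper() for v in values]
        # Shared counters are updated under a lock because several
        # requests may be running at the same time.
        with self._lock:
            self.rows_seen += len(values)
        return results


def simulate_parallel_calls(
    handler: CallCounter, batches: List[List[str]], workers: int = 4
) -> List[List[str]]:
    """Stand-in for DuckDB threads: submit batches from a thread pool."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handler.handle_batch, batches))
```

Each simulated batch can be sized at 2048 values to mirror one DuckDB vector; the results come back in the same order as the submitted batches.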

Note

Support for limiting the parallelism of scalar function calls may be added in the future if it proves necessary. For now, the number of threads calling the function is likely limited to the number of CPU cores available to DuckDB.

Arrow Flight Server Implementation Notes

When the scalar function is invoked, a DoExchange Arrow Flight RPC operation is performed. Batches of rows are sent to the server, and results are returned for each batch.
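The batch-in, batch-out shape of this exchange can be sketched without any Arrow dependency. The example below is only an illustration under stated assumptions: batches are modelled as plain Python lists of row tuples rather than Arrow record batches, and the function name exchange is hypothetical. It shows the one property a scalar function must preserve: each input batch yields exactly one result batch with one value per input row, in the same order.

```python
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def exchange(
    batches: Iterable[List[Row]], fn: Callable[..., Any]
) -> Iterator[List[Any]]:
    """Yield one result batch for every input batch (hypothetical helper)."""
    for batch in batches:
        # Apply the scalar function to each row; the result batch has the
        # same length and ordering as the input batch.
        yield [fn(*row) for row in batch]
```

For example, list(exchange([[(1, 2), (3, 4)], [(5, 6)]], lambda a, b: a + b)) produces two result batches, one for each input batch. A real server would do the equivalent work inside its DoExchange handler, reading and writing Arrow record batches instead of lists.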

DuckDB Catalog Integration

For information about how to register an Arrow Flight as a scalar function, refer to Server Catalog Integration.

Supporting the ANY Type

Scalar function arguments in DuckDB can use the ANY type. Since Apache Arrow does not natively support a generic ANY column type, a workaround is used: if a field in the Arrow schema contains metadata with the key is_any_type and a non-empty value, that field is treated as having the DuckDB ANY type.
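The metadata rule above can be expressed as a small check. This is a sketch only: it assumes field metadata is available as a mapping with bytes keys and bytes values (which is how pyarrow exposes Field.metadata, or None when there is no metadata), and the helper name is_any_type is hypothetical.

```python
from typing import Mapping, Optional

ANY_TYPE_KEY = b"is_any_type"


def is_any_type(metadata: Optional[Mapping[bytes, bytes]]) -> bool:
    """Return True when the field metadata marks the field as DuckDB ANY."""
    if not metadata:
        # No metadata at all: the field keeps its declared Arrow type.
        return False
    # The key must be present and its value must be non-empty.
    return len(metadata.get(ANY_TYPE_KEY, b"")) > 0
```

Under these assumptions, is_any_type({b"is_any_type": b"1"}) is true, while a missing key or an empty value leaves the field with its declared Arrow type.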

Supplied gRPC Headers for the DoExchange Request

Header Name          Description
airport-operation    Set to scalar_function to indicate the operation being performed
return-chunks        Set to 1
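A server that handles several kinds of DoExchange requests could use these headers to recognize a scalar function call. The sketch below assumes the headers arrive as a mapping from lower-case header name to a list of string values, which may differ from what a given Flight server library provides. The function names are hypothetical.

```python
from typing import Mapping, Sequence

Headers = Mapping[str, Sequence[str]]


def is_scalar_function_call(headers: Headers) -> bool:
    """True when airport-operation is set to scalar_function."""
    return "scalar_function" in headers.get("airport-operation", ())


def wants_return_chunks(headers: Headers) -> bool:
    """True when return-chunks is set to 1."""
    return "1" in headers.get("return-chunks", ())
```

A request carrying both headers from the table would satisfy both checks; a request without airport-operation would be routed elsewhere.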