Scalar Functions

Create custom scalar user-defined functions (UDFs) in Arrow Flight servers that can be called directly from DuckDB SQL queries.

The Airport extension enables Arrow Flight servers to provide custom scalar functions that can be called directly within SQL queries.

A scalar function accepts a set of input parameters and returns a single result of any DuckDB data type, including nested and complex types.

Example

SELECT geocode_address('1024 Lenox Ave, Miami Beach, Florida 33139')

When a scalar function is invoked, its arguments are serialized and sent to the Arrow Flight server. The server executes the function and returns the results to the DuckDB client.

Scalar functions are automatically registered in the DuckDB catalog when an Airport-provided database is attached. Functions can be explicitly qualified with their database and schema names:

SELECT geocoder.usa.geocode_address('1024 Lenox Ave, Miami Beach, Florida 33139')

In this example, geocoder is the attached database name and usa is the schema name.

Efficiency and Parallelism

Airport scalar functions are efficient because each call processes an entire DuckDB vector as one batch rather than one row at a time. As of DuckDB version 1.2, the standard vector size is 2048 tuples. DuckDB may invoke scalar functions from multiple threads concurrently, so Arrow Flight servers must support parallel request handling.

Note

Future versions may introduce configurable parallelism limits for scalar function calls. Currently, the number of concurrent threads is typically bounded by the number of CPU cores available to DuckDB.
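The batching and concurrency described above can be illustrated with a minimal, standard-library-only sketch. It does not use DuckDB or Arrow Flight; it only models the shape of the workload. The names chunk_rows, normalize_address, and run_in_parallel are hypothetical, and the worker count of 4 is an arbitrary assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import islice

VECTOR_SIZE = 2048  # standard DuckDB vector size cited in the text


def chunk_rows(rows, size=VECTOR_SIZE):
    """Yield successive lists of at most `size` rows."""
    it = iter(rows)
    while batch := list(islice(it, size)):
        yield batch


def normalize_address(batch):
    """Hypothetical stand-in for a server-side scalar function.

    It keeps no shared mutable state, so concurrent calls are safe.
    """
    return [" ".join(addr.split()).upper() for addr in batch]


def run_in_parallel(rows, workers=4):
    """Process each vector on a worker thread, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = pool.map(normalize_address, chunk_rows(rows))
        return [value for batch in results for value in batch]


if __name__ == "__main__":
    addresses = [f"  {i}  lenox ave " for i in range(5000)]
    out = run_in_parallel(addresses)
    print(len(out), out[0])
```

The point of the sketch is that a handler which avoids shared mutable state can be called from several threads at once, which is the property a server needs when DuckDB issues concurrent requests.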

Arrow Flight Server Implementation Notes

Each scalar function invocation is carried out as an Arrow Flight DoExchange RPC. DuckDB sends batches of rows to the server, and the server returns results for each batch.
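As a rough model of this exchange, the following sketch maps each incoming batch to one result batch. It uses plain Python lists in place of Arrow record batches and is not the Arrow Flight API. The names exchange and address_length are hypothetical, and the sketch assumes that each result batch has the same number of rows as its input batch, since a scalar function produces one value per row.

```python
from typing import Callable, Iterable, Iterator, List

Batch = List[str]  # stand-in for an Arrow record batch with one column


def exchange(
    incoming: Iterable[Batch],
    func: Callable[[str], object],
) -> Iterator[List[object]]:
    """Model of the per-batch exchange: one result batch per input batch.

    Assumption: each result batch has as many rows as its input batch,
    so results line up with the rows that were sent.
    """
    for batch in incoming:
        yield [func(value) for value in batch]


def address_length(address: str) -> int:
    """Hypothetical scalar function used only for this illustration."""
    return len(address)


if __name__ == "__main__":
    batches = [["1024 Lenox Ave", "1 Main St"], ["5 Oak Rd"]]
    for result in exchange(batches, address_length):
        print(result)
```

Writing the loop as a generator keeps only one batch in memory at a time, which mirrors the streaming nature of the exchange.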

DuckDB Catalog Integration

For information about how to register an Arrow Flight as a scalar function, refer to Server Catalog Integration.

Supporting the ANY Type

Scalar function arguments in DuckDB can use the ANY type. Since Apache Arrow does not natively support a generic ANY column type, a workaround is used: if a field in the Arrow schema contains metadata with the key is_any_type and a non-empty value, that field is treated as having the DuckDB ANY type.
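The rule above can be expressed as a small check. This sketch models field metadata as a mapping of bytes keys to bytes values, which is an assumption about how a given Arrow library exposes it. The helper name is_any_type is hypothetical.

```python
from typing import Mapping, Optional

ANY_TYPE_KEY = b"is_any_type"  # metadata key named in the text


def is_any_type(metadata: Optional[Mapping[bytes, bytes]]) -> bool:
    """Return True when field metadata marks the field as DuckDB ANY.

    The rule from the text: the key must be present and its value non-empty.
    """
    if not metadata:
        return False
    return len(metadata.get(ANY_TYPE_KEY, b"")) > 0


if __name__ == "__main__":
    print(is_any_type({b"is_any_type": b"1"}))
    print(is_any_type({b"is_any_type": b""}))
```

Note that the check depends only on the value being non-empty, so any non-empty marker qualifies under the rule as stated.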

Supplied gRPC Headers for the DoExchange Request

| Header Name | Description |
| --- | --- |
| airport-operation | Set to scalar_function to indicate the operation being performed |
| return-chunks | Set to 1 |
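A server might use these headers to recognize a scalar function call before processing the exchange. The following is a minimal sketch under the assumption that the headers have already been extracted into a string-to-string mapping. How they are obtained depends on the Flight library in use, and the name is_scalar_function_call is hypothetical.

```python
from typing import Mapping

# Header values taken from the table in the text.
EXPECTED_HEADERS = {
    "airport-operation": "scalar_function",
    "return-chunks": "1",
}


def is_scalar_function_call(headers: Mapping[str, str]) -> bool:
    """Return True when every listed header has its expected value."""
    # Lowercase the names so the comparison ignores header-name casing.
    normalized = {name.lower(): value for name, value in headers.items()}
    return all(normalized.get(k) == v for k, v in EXPECTED_HEADERS.items())


if __name__ == "__main__":
    print(is_scalar_function_call(
        {"airport-operation": "scalar_function", "return-chunks": "1"}
    ))
```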