Scalar Functions
The Airport Extension enables the Arrow Flight Server to provide scalar functions callable within SQL queries.
A scalar function returns a single result (of any DuckDB data type, including nested types) when given a set of parameters.
Example
SELECT geocode_address('1024 Lenox Ave, Miami Beach, Florida 33139')
When invoked, the parameters to the function are serialized and sent to the Arrow Flight server, with the results returned to the DuckDB client.
Scalar functions are registered in a DuckDB catalog and schema when an Airport-provided database is attached. They can be explicitly called by referencing the database and schema names:
SELECT geocoder.usa.geocode_address('1024 Lenox Ave, Miami Beach, Florida 33139')
In this example geocoder
is the attached database name, and usa
is the name of the schema.
Efficiency / Parallelism
Airport scalar functions operate efficiently by processing entire DuckDB vectors at once. As of DuckDB version 1.2, the standard vector size is 2048 tuples. DuckDB can also invoke scalar functions from multiple threads, so the Arrow Flight server should handle parallel requests.
There may be support added to limit parallelism for calls to scalar functions in the future if deemed necessary, right now the number of threads calling the function is likely limited to the number of CPU cores available to DuckDB.
Arrow Flight Server Implementation Notes
When the scalar function is invoked, a DoExchange Arrow Flight RPC operation is performed. Batches of rows are sent to the server, and results are returned for each batch.
DuckDB Catalog Integration
For information about how to register an Arrow Flight as a scalar function refer to Server Catalog Integration.
Supporting the ANY
Type
Scalar function arguments in DuckDB can use the ANY
type. Since Apache Arrow does not natively support a generic ANY
column type, a workaround is used: if a field in the Arrow schema contains metadata with the key is_any_type
and a non-empty value, that field is treated as having the DuckDB ANY type.
Supplied gRPC Headers for DoExchange
request.
Header Name | Description |
---|---|
airport-operation |
Set to scalar_function to indicate the operation being performed |
return-chunks |
Set to 1 |