Airport for DuckDB
The Airport extension brings Arrow Flight support to DuckDB, enabling DuckDB to query, modify, and store data via Arrow Flight servers. A DuckDB extension is a plugin that expands DuckDB’s core functionality by adding new capabilities.
To understand the rationale behind the development of this extension, check out the motivation for creating the extension.
Getting started
The Airport extension is a DuckDB community extension. To install it, run the following SQL inside DuckDB:
INSTALL airport FROM community;
To load the extension you can then execute:
LOAD airport;
If you wish to build the extension from source, see these instructions.
What can I do with the Airport extension that I can’t do with DuckDB now?
With the Airport extension you can:
- Query data that DuckDB can’t normally access, either because it’s non-tabular or because it’s in an unsupported format. This can include external APIs; what is available depends on what the server exposes.
- Add custom scalar or table-returning SQL functions not available in DuckDB.
- Provide User Defined Functions (UDFs) for DuckDB that execute remotely.
- Serve data with fine-grained access control, filtering both rows and columns based on user permissions.
- Access and provide Data-as-a-Service.
What is Arrow Flight?
From the Apache Arrow Documentation:
Arrow Flight is an RPC framework for high-performance data services based on Apache Arrow and is built on top of gRPC and the Arrow IPC format.
Flight is organized around streams of Arrow record batches[1], being either downloaded from or uploaded to another service. A set of metadata methods offers discovery and introspection of streams, as well as the ability to implement application-specific methods.
Methods and message wire formats are defined by Protobuf, enabling interoperability with clients that may support gRPC and Arrow separately, but not Flight. However, Flight implementations include further optimizations to avoid overhead in usage of Protobuf (mostly around avoiding excessive memory copies).
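To make the record batches mentioned above more concrete, here is a minimal Python sketch that checks the two properties in their definition: every column has the same length, and every column matches the schema. It uses plain dictionaries and lists rather than the Arrow library. The function name `validate_record_batch`, the simplified name-to-type schema, and the sample data are illustrative assumptions, and real Arrow arrays also support nulls and richer types that this sketch ignores.

```python
def validate_record_batch(schema: dict, columns: dict) -> int:
    """Return the row count if `columns` forms a valid batch for `schema`.

    Hypothetical helper: `schema` maps column name -> Python type and
    `columns` maps column name -> list of values.
    Raises ValueError when names, lengths, or value types do not match.
    """
    # Column names must match the schema, in the same order.
    if list(columns) != list(schema):
        raise ValueError("column names do not match the schema")
    # All columns must have the same number of values.
    lengths = {len(values) for values in columns.values()}
    if len(lengths) > 1:
        raise ValueError("columns have different lengths")
    # Every value must have the type the schema declares for its column.
    for name, expected in schema.items():
        if not all(isinstance(v, expected) for v in columns[name]):
            raise ValueError(f"column {name!r} has values of the wrong type")
    return lengths.pop() if lengths else 0


schema = {"id": int, "name": str}
batch = {"id": [1, 2, 3], "name": ["a", "b", "c"]}
print(validate_record_batch(schema, batch))  # row count of the batch
```

A Flight stream is then a sequence of such batches that all share one schema.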
What is an Apache Arrow “Flight”?
An Apache Arrow flight (hereafter referred to simply as a “flight”) is a source or destination for data that is accessible via the Apache Arrow Flight RPC framework. Each flight has a schema and one or more endpoints, each of which may offer one or more locations.
You can think of a flight as similar to a collection of files that share the same schema or, more aptly, as a database table stored on a remote server. Apache Arrow Flight servers often provide many different flights.
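As a rough mental model of this structure, the following Python sketch uses plain dataclasses to show how a flight relates to its schema, endpoints, and locations. The class names echo the Flight protocol's message names, but the classes themselves, the `all_locations` helper, the `orders` flight, and the `grpc://` addresses are illustrative assumptions, not the pyarrow or Airport API.

```python
from dataclasses import dataclass


@dataclass
class FlightEndpoint:
    """One retrievable piece of a flight's data (illustrative model only)."""
    ticket: bytes     # opaque token a client presents to fetch this piece
    locations: tuple  # addresses of servers able to serve this ticket


@dataclass
class FlightInfo:
    """A flight: a single schema shared by all of its endpoints."""
    path: tuple       # descriptor path that names the flight
    schema: dict      # column name -> type name (simplified)
    endpoints: tuple  # one or more FlightEndpoint objects


def all_locations(info: FlightInfo) -> set:
    """Collect every distinct server location offered across the endpoints."""
    return {loc for ep in info.endpoints for loc in ep.locations}


# A hypothetical flight split into two parts; the second part is
# available from two different servers.
orders = FlightInfo(
    path=("sales", "orders"),
    schema={"order_id": "int64", "amount": "float64"},
    endpoints=(
        FlightEndpoint(b"orders-part-0", ("grpc://a.example:8815",)),
        FlightEndpoint(
            b"orders-part-1",
            ("grpc://a.example:8815", "grpc://b.example:8815"),
        ),
    ),
)

print(len(orders.endpoints))          # number of endpoints in this flight
print(sorted(all_locations(orders)))  # distinct servers that can serve it
```

In this model, a client would read the whole flight by fetching each endpoint's ticket from any one of that endpoint's locations and concatenating the resulting record batches.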
How does Airport work with DuckDB?
Airport is an extension written in C++ for DuckDB version 1.3.0 or later. It uses the Apache Arrow library.
How can I build an Arrow Flight Server?
Start by reading the basics of implementing an Arrow Flight Server.
Conference Presentations
Rusty Conover presented the Airport extension at DuckCon #6 in a talk titled “Airport For DuckDB: Letting DuckDB take flight.”
Footnotes
[1] A record batch is a collection of equal-length arrays that all match a schema.