# datafusion

**Repository Path**: mirrors/datafusion

## Basic Information

- **Project Name**: datafusion
- **Description**: DataFusion is an extensible query execution framework, written in Rust, that uses Apache Arrow as its in-memory format
- **Primary Language**: Rust
- **License**: Apache-2.0
- **Default Branch**: main
- **Homepage**: https://www.oschina.net/p/datafusion
- **GVP Project**: No

## Statistics

- **Stars**: 2
- **Forks**: 1
- **Created**: 2021-09-01
- **Last Updated**: 2026-02-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# Apache DataFusion

[![Crates.io][crates-badge]][crates-url]
[![Apache licensed][license-badge]][license-url]
[![Build Status][actions-badge]][actions-url]
![Commit Activity][commit-activity-badge]
[![Open Issues][open-issues-badge]][open-issues-url]
[![Pending PRs][pending-pr-badge]][pending-pr-url]
[![Discord chat][discord-badge]][discord-url]
[![Linkedin][linkedin-badge]][linkedin-url]
![Crates.io MSRV][msrv-badge]

[crates-badge]: https://img.shields.io/crates/v/datafusion.svg
[crates-url]: https://crates.io/crates/datafusion
[license-badge]: https://img.shields.io/badge/license-Apache%20v2-blue.svg
[license-url]: https://github.com/apache/datafusion/blob/main/LICENSE.txt
[actions-badge]: https://github.com/apache/datafusion/actions/workflows/rust.yml/badge.svg
[actions-url]: https://github.com/apache/datafusion/actions?query=branch%3Amain
[discord-badge]: https://img.shields.io/badge/Chat-Discord-purple
[discord-url]: https://discord.com/invite/Qw5gKqHxUM
[commit-activity-badge]: https://img.shields.io/github/commit-activity/m/apache/datafusion
[open-issues-badge]: https://img.shields.io/github/issues-raw/apache/datafusion
[open-issues-url]: https://github.com/apache/datafusion/issues
[pending-pr-badge]: https://img.shields.io/github/issues-search/apache/datafusion?query=is%3Apr+is%3Aopen+draft%3Afalse+review%3Arequired+status%3Asuccess&label=Pending%20PRs&logo=github
[pending-pr-url]:
https://github.com/apache/datafusion/pulls?q=is%3Apr+is%3Aopen+draft%3Afalse+review%3Arequired+status%3Asuccess+sort%3Aupdated-desc
[linkedin-badge]: https://img.shields.io/badge/Follow-Linkedin-blue
[linkedin-url]: https://www.linkedin.com/company/apache-datafusion/
[msrv-badge]: https://img.shields.io/crates/msrv/datafusion?label=Min%20Rust%20Version

[Website](https://datafusion.apache.org/) |
[API Docs](https://docs.rs/datafusion/latest/datafusion/) |
[Chat](https://discord.com/channels/885562378132000778/885562378132000781)

DataFusion is an extensible query engine written in [Rust] that uses [Apache Arrow] as its in-memory format.

This crate provides libraries and binaries for developers building fast and feature-rich database and analytic systems, customized for particular workloads. See [use cases] for examples.

The following related subprojects target end users:

- [DataFusion Python](https://github.com/apache/datafusion-python/) offers a Python interface for SQL and DataFrame queries.
- [DataFusion Comet](https://github.com/apache/datafusion-comet/) is an accelerator for Apache Spark based on DataFusion.

"Out of the box," DataFusion offers [SQL](https://datafusion.apache.org/user-guide/sql/index.html) and [DataFrame](https://datafusion.apache.org/user-guide/dataframe.html) APIs, excellent [performance], built-in support for CSV, Parquet, JSON, and Avro, extensive customization, and a great community.

DataFusion features a full query planner, a columnar, streaming, multi-threaded, vectorized execution engine, and partitioned data sources. You can customize DataFusion at almost all points including additional data sources, query languages, functions, custom operators and more. See the [Architecture] section for more details.


[rust]: http://rustlang.org
[apache arrow]: https://arrow.apache.org
[use cases]: https://datafusion.apache.org/user-guide/introduction.html#use-cases
[python bindings]: https://github.com/apache/datafusion-python
[performance]: https://benchmark.clickhouse.com/
[architecture]: https://datafusion.apache.org/contributor-guide/architecture.html

Here are links to important resources:

- [Project Site](https://datafusion.apache.org/)
- [Installation](https://datafusion.apache.org/user-guide/cli/installation.html)
- [Rust Getting Started](https://datafusion.apache.org/user-guide/example-usage.html)
- [Rust DataFrame API](https://datafusion.apache.org/user-guide/dataframe.html)
- [Rust API docs](https://docs.rs/datafusion/latest/datafusion)
- [Rust Examples](https://github.com/apache/datafusion/tree/main/datafusion-examples)
- [Python DataFrame API](https://arrow.apache.org/datafusion-python/)
- [Architecture](https://docs.rs/datafusion/latest/datafusion/index.html#architecture)

## What can you do with this crate?

DataFusion is great for building projects such as domain-specific query engines, new database platforms and data pipelines, query languages and more. It lets you start quickly from a fully working engine, and then customize those features specific to your needs. See the [list of known users](https://datafusion.apache.org/user-guide/introduction.html#known-users).

## Contributing to DataFusion

Please see the [contributor guide] and [communication] pages for more information.

[contributor guide]: https://datafusion.apache.org/contributor-guide
[communication]: https://datafusion.apache.org/contributor-guide/communication.html

## Crate features

This crate has several [features] which can be specified in your `Cargo.toml`.

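
As a rough illustration of how such features are selected, a dependency entry might look like the sketch below. The feature names (`avro`, `backtrace`, `parquet`, `sql`) are taken from the lists in this section. The version string is a placeholder and not a recommendation, and whether a trimmed-down feature set fits your project is an assumption you should check against the API docs.

```toml
[dependencies]
# Keep the default features and additionally enable two optional ones.
# "*" is only a placeholder; pin the release you actually intend to use.
datafusion = { version = "*", features = ["avro", "backtrace"] }

# Alternative sketch (use instead of the entry above, not alongside it):
# opt out of the defaults and re-enable only selected features.
# datafusion = { version = "*", default-features = false, features = ["parquet", "sql"] }
```

Opting out of defaults with `default-features = false` is standard Cargo behavior; any default feature you still need must then be listed explicitly.
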

[features]: https://doc.rust-lang.org/cargo/reference/features.html

Default features:

- `nested_expressions`: functions for working with nested types such as `array_to_string`
- `compression`: reading files compressed with `xz2`, `bzip2`, `flate2`, and `zstd`
- `crypto_expressions`: cryptographic functions such as `md5` and `sha256`
- `datetime_expressions`: date and time functions such as `to_timestamp`
- `encoding_expressions`: `encode` and `decode` functions
- `parquet`: support for reading the [Apache Parquet] format
- `sql`: support for SQL parsing and planning
- `regex_expressions`: regular expression functions, such as `regexp_match`
- `unicode_expressions`: include Unicode-aware functions such as `character_length`
- `unparser`: enables support to reverse LogicalPlans back into SQL
- `recursive_protection`: uses [recursive](https://docs.rs/recursive/latest/recursive/) for stack overflow protection.

Optional features:

- `avro`: support for reading the [Apache Avro] format
- `backtrace`: include backtrace information in error messages
- `parquet_encryption`: support for using [Parquet Modular Encryption]
- `serde`: enable arrow-schema's `serde` feature

[apache avro]: https://avro.apache.org/
[apache parquet]: https://parquet.apache.org/
[parquet modular encryption]: https://parquet.apache.org/docs/file-format/data-pages/encryption/

## DataFusion API Evolution and Deprecation Guidelines

Public methods in Apache DataFusion evolve over time: while we try to maintain a stable API, we also improve the API over time. As a result, we typically deprecate methods before removing them, according to the [deprecation guidelines].

[deprecation guidelines]: https://datafusion.apache.org/contributor-guide/api-health.html

## Dependencies and `Cargo.lock`

Following the [guidance] on committing `Cargo.lock` files, this project commits its `Cargo.lock` file.

CI uses the committed `Cargo.lock` file, and dependencies are updated regularly using [Dependabot] PRs.


[guidance]: https://blog.rust-lang.org/2023/08/29/committing-lockfiles.html
[dependabot]: https://docs.github.com/en/code-security/dependabot/working-with-dependabot