Apache Druid is a real-time database to power modern analytics applications.
Druid is designed for workflows where fast ad-hoc analytics, instant data visibility, or support for high concurrency is important. As such, Druid is often used to power UIs where an interactive, consistent user experience is desired.
Druid streams data from message buses such as Kafka and Amazon Kinesis, and batch-loads files from data lakes such as HDFS and Amazon S3. Druid supports most popular file formats for structured and semi-structured data.
Druid has been benchmarked to greatly outperform legacy solutions. Druid combines novel storage ideas, indexing structures, and both exact and approximate queries to return most results in under a second.
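As a rough illustration of an approximate query, the sketch below builds (but does not send) a request to Druid's SQL HTTP endpoint, `/druid/v2/sql`. The host and port assume a local quickstart Router, and the `clickstream` datasource and `user_id` column are hypothetical names chosen for this example.

```python
import json
import urllib.request

# Assumption: a Druid Router reachable at localhost:8888 (local quickstart setup).
DRUID_SQL_URL = "http://localhost:8888/druid/v2/sql"


def build_sql_request(sql: str, url: str = DRUID_SQL_URL) -> urllib.request.Request:
    """Build (but do not send) a POST request for Druid's SQL HTTP API."""
    body = json.dumps({"query": sql}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical datasource "clickstream" with a "user_id" column.
# APPROX_COUNT_DISTINCT asks for an approximate distinct count.
APPROX_SQL = 'SELECT APPROX_COUNT_DISTINCT("user_id") AS users FROM "clickstream"'

request = build_sql_request(APPROX_SQL)

# To run against a live cluster, send the request and parse the JSON rows:
#   with urllib.request.urlopen(request) as resp:
#       rows = json.load(resp)
```

Building the request separately from sending it keeps the example runnable without a cluster; against a real deployment, the URL and datasource would need to match that environment.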
Druid unlocks new types of queries and workflows for clickstream, APM, supply chain, network telemetry, digital marketing, risk/fraud, and many other types of data. Druid is purpose-built for rapid, ad-hoc queries on both real-time and historical data.
Druid can be deployed in any *NIX environment on commodity hardware, both in the cloud and on premises. Deploying Druid is easy: scaling up and down is as simple as adding and removing Druid services.
Druid is proven in production at the world’s leading companies at massive scale.
Find answers to some of the most common questions about Druid.
Get started with Druid in minutes. Load your own data and query it.
Get help from a wide network of community members about using Druid.
Druid Data Cookbook: Quantiles in Druid with Data Sketches
Hellmar Becker (Imply), Mar 20 2022
Druid Data Cookbook: Ingestion Transforms
Hellmar Becker (Imply), Feb 9 2022
Multi-dimensional range partitioning
Kashif Faraz (Imply), Feb 4 2022
Seeking the Perfect Apache Druid Rollup
Neil Buesing (Rill Data), Dec 16 2021