
Extensions

Druid implements an extension system that allows for adding functionality at runtime. Extensions are commonly used to add support for deep storages (like HDFS and S3), metadata stores (like MySQL and PostgreSQL), new aggregators, new input formats, and so on.

Production clusters generally use at least two extensions: one for deep storage and one for a metadata store. Many clusters also use additional extensions.

Core extensions

Core extensions are maintained by Druid committers.

Name | Description | Docs
---- | ----------- | ----
druid-avro-extensions | Support for data in Apache Avro data format. | link
druid-azure-extensions | Microsoft Azure deep storage. | link
druid-basic-security | Support for Basic HTTP authentication and role-based access control. | link
druid-bloom-filter | Support for providing Bloom filters in Druid queries. | link
druid-datasketches | Support for approximate counts and set operations with Apache DataSketches. | link
druid-google-extensions | Google Cloud Storage deep storage. | link
druid-hdfs-storage | HDFS deep storage. | link
druid-histogram | Approximate histograms and quantiles aggregator. Deprecated; use the DataSketches quantiles aggregator from the druid-datasketches extension instead. | link
druid-kafka-extraction-namespace | Apache Kafka-based namespaced lookup. Requires namespace lookup extension. | link
druid-kafka-indexing-service | Supervised exactly-once Apache Kafka ingestion for the indexing service. | link
druid-kinesis-indexing-service | Supervised exactly-once Kinesis ingestion for the indexing service. | link
druid-kerberos | Kerberos authentication for Druid processes. | link
druid-lookups-cached-global | A module for lookups that provides JVM-global eager caching. It provides JDBC and URI implementations for fetching lookup data. | link
druid-lookups-cached-single | Per-lookup caching module for use cases where a lookup needs to be isolated from the global pool of lookups. | link
druid-multi-stage-query | Support for the multi-stage query architecture for Apache Druid and the multi-stage query task engine. | link
druid-orc-extensions | Support for data in Apache ORC data format. | link
druid-parquet-extensions | Support for data in Apache Parquet data format. Requires druid-avro-extensions to be loaded. | link
druid-protobuf-extensions | Support for data in Protobuf data format. | link
druid-ranger-security | Support for access control through Apache Ranger. | link
druid-s3-extensions | Interfacing with data in AWS S3, and using S3 as deep storage. | link
druid-ec2-extensions | Interfacing with AWS EC2 for autoscaling MiddleManagers. | UNDOCUMENTED
druid-aws-rds-extensions | Support for AWS token-based access to AWS RDS DB Cluster. | link
druid-stats | Statistics-related module including variance and standard deviation. | link
mysql-metadata-storage | MySQL metadata store. | link
postgresql-metadata-storage | PostgreSQL metadata store. | link
simple-client-sslcontext | Simple SSLContext provider module to be used by Druid's internal HttpClient when talking to other Druid processes over HTTPS. | link
druid-pac4j | OpenID Connect authentication for Druid processes. | link
druid-kubernetes-extensions | Druid cluster deployment on Kubernetes without ZooKeeper. | link

Community extensions

Community extensions are not maintained by Druid committers, although we accept patches from community members using these extensions. They may not have been as extensively tested as the core extensions.

A number of community members have contributed their own extensions to Druid that are not packaged with the default Druid tarball. If you'd like to take on maintenance for a community extension, please post on dev@druid.apache.org to let us know!

You can download these community extensions with pull-deps by passing a coordinate of the form org.apache.druid.extensions.contrib:{EXTENSION_NAME}:{DRUID_VERSION} to the -c option.

Name | Description | Docs
---- | ----------- | ----
aliyun-oss-extensions | Aliyun OSS deep storage. | link
ambari-metrics-emitter | Ambari Metrics emitter. | link
druid-cassandra-storage | Apache Cassandra deep storage. | link
druid-cloudfiles-extensions | Rackspace Cloud Files deep storage and firehose. | link
druid-compressed-bigdecimal | Compressed Big Decimal type. | link
druid-distinctcount | DistinctCount aggregator. | link
druid-redis-cache | A cache implementation for Druid based on Redis. | link
druid-time-min-max | Min/Max aggregator for timestamp. | link
sqlserver-metadata-storage | Microsoft SQL Server metadata store. | link
graphite-emitter | Graphite metrics emitter. | link
statsd-emitter | StatsD metrics emitter. | link
kafka-emitter | Kafka metrics emitter. | link
druid-thrift-extensions | Support for Thrift ingestion. | link
druid-opentsdb-emitter | OpenTSDB metrics emitter. | link
materialized-view-selection, materialized-view-maintenance | Materialized View. | link
druid-moving-average-query | Support for Moving Average and other Aggregate Window Functions in Druid queries. | link
druid-influxdb-emitter | InfluxDB metrics emitter. | link
druid-momentsketch | Support for approximate quantile queries using the momentsketch library. | link
druid-tdigestsketch | Support for approximate sketch aggregators based on T-Digest. | link
gce-extensions | GCE Extensions. | link
prometheus-emitter | Exposes Druid metrics for Prometheus server collection (https://prometheus.io/). | link
kubernetes-overlord-extensions | Support for launching tasks in Kubernetes without MiddleManagers. | link

Promoting community extensions to core extensions

Please post on dev@druid.apache.org if you'd like an extension to be promoted to core. If we see a community extension actively supported by the community, we can promote it to core based on community feedback.

For information on how to create your own extension, see Creating extensions.

Loading extensions

Loading core extensions

Apache Druid bundles all core extensions out of the box. See the list of core extensions for your options. To load bundled extensions, add their names to the druid.extensions.loadList property in your common.runtime.properties file. For example, to load the postgresql-metadata-storage and druid-hdfs-storage extensions, use the configuration:

druid.extensions.loadList=["postgresql-metadata-storage", "druid-hdfs-storage"]

These extensions are located in the extensions directory of the distribution.

Druid bundles two sets of configurations: one for the quickstart and one for a clustered configuration. Make sure you are updating the correct common.runtime.properties for your setup.
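To see which configuration set carries which load list, a small helper along these lines may be useful. It is only a sketch: the function name show_loadlists is made up for illustration, and it assumes the configuration files live under a conf directory in the distribution, which you should adjust to your layout.

```shell
# Hypothetical helper (not part of Druid): print the loadList line from every
# common.runtime.properties file found under a configuration root.
# Assumption: the root defaults to "conf"; pass another directory if needed.
show_loadlists() {
  find "${1:-conf}" -type f -name 'common.runtime.properties' \
    -exec grep -H '^druid\.extensions\.loadList' {} +
}
```

Running show_loadlists from the distribution directory prints one line per file, prefixed with the file path, so you can confirm you are editing the file your deployment actually reads.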

Because of licensing, the mysql-metadata-storage extension does not include the required MySQL JDBC driver. For instructions on how to install this library, see the MySQL extension page.

Loading community extensions

You can also load community and third-party extensions not already bundled with Druid. To do this, first download the extension and then install it into your extensions directory. You can download extensions from their distributors directly, or if they are available from Maven, the included pull-deps can download them for you. To use pull-deps, specify the full Maven coordinate of the extension in the form groupId:artifactId:version. For example, for the (hypothetical) extension com.example:druid-example-extension:1.0.0, run:

java \
  -cp "lib/*" \
  -Ddruid.extensions.directory="extensions" \
  -Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies" \
  org.apache.druid.cli.Main tools pull-deps \
  --no-default-hadoop \
  -c "com.example:druid-example-extension:1.0.0"

You only have to install the extension once. Then, add "druid-example-extension" to druid.extensions.loadList in common.runtime.properties to instruct Druid to load the extension.
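As an illustration of the resulting setting, the sketch below formats a loadList line from a list of extension names. The helper loadlist_line is hypothetical, and the names combine the two core extensions from the earlier example with the hypothetical druid-example-extension.

```shell
# Hypothetical helper (not part of Druid): format extension names as a
# druid.extensions.loadList property line.
loadlist_line() {
  joined=""
  for name in "$@"; do
    # Quote each name and separate entries with ", ".
    joined="${joined:+$joined, }\"$name\""
  done
  printf 'druid.extensions.loadList=[%s]\n' "$joined"
}

# Prints the line to place in common.runtime.properties.
loadlist_line postgresql-metadata-storage druid-hdfs-storage druid-example-extension
```

The printed line replaces any existing druid.extensions.loadList entry in the file; keep only one such entry.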

Make sure that all extension-related configuration properties described in the Configuration reference are set correctly.

The Maven groupId for almost every community extension is org.apache.druid.extensions.contrib. The artifactId is the name of the extension, and the version is the latest Druid stable version.
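The coordinate rule above can be sketched as a pair of shell functions. This is an illustration under assumptions: the function names are made up, the groupId is the common contrib one (which a few extensions may not use), and the version you pass should match your own Druid installation. The pull-deps flags are the same ones used in the example earlier in this section.

```shell
# Hypothetical helper: build the Maven coordinate for a community extension.
# $1 = extension name (used as the artifactId), $2 = Druid version.
contrib_coordinate() {
  printf 'org.apache.druid.extensions.contrib:%s:%s\n' "$1" "$2"
}

# Hypothetical wrapper: run pull-deps for that coordinate.
# Assumption: invoked from the root of the Druid distribution.
pull_community_extension() {
  java \
    -cp "lib/*" \
    -Ddruid.extensions.directory="extensions" \
    -Ddruid.extensions.hadoopDependenciesDir="hadoop-dependencies" \
    org.apache.druid.cli.Main tools pull-deps \
    --no-default-hadoop \
    -c "$(contrib_coordinate "$1" "$2")"
}
```

For example, pull_community_extension druid-redis-cache followed by your Druid version as the second argument would fetch that extension into the extensions directory.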

Loading extensions from the classpath

If you add your extension jar to the classpath at runtime, Druid also loads it into the system. This mechanism is relatively easy to reason about, but Druid does not maintain class loader isolation for extensions loaded this way. You must therefore make sure that all jars on your classpath, including each extension's dependencies, are mutually compatible.
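One rough way to spot likely conflicts before putting extra jars on the classpath is to look for artifacts that appear more than once across the directories involved. The sketch below is a heuristic, not a Druid tool: the function name is made up, and it guesses that the version starts at the first hyphen followed by a digit in each jar file name, which does not hold for every artifact.

```shell
# Hypothetical helper: list artifact names that occur more than once among
# the jars directly inside the given directories.
# Heuristic: everything from the first "-<digit>" onward is treated as the version.
duplicate_artifacts() {
  find "$@" -maxdepth 1 -type f -name '*.jar' \
    | sed -E 's|.*/||; s/-[0-9].*\.jar$//; s/\.jar$//' \
    | sort \
    | uniq -d
}
```

Calling duplicate_artifacts with the lib directory and the directory holding your extension's jars prints one artifact name per line. Each name reported is a candidate for a version clash and is worth checking by hand; an empty result does not prove the jars are compatible.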

Copyright © 2022 Apache Software Foundation.
Except where otherwise noted, licensed under CC BY-SA 4.0.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.