Deep storage

Deep storage is where segments are stored. It is a storage mechanism that Apache Druid (incubating) does not provide itself. The deep storage infrastructure determines the durability of your data: as long as Druid processes can see this storage infrastructure and reach the segments stored on it, you will not lose data no matter how many Druid nodes you lose. If segments disappear from this storage layer, you will lose whatever data those segments represented.

Local Mount

A local mount can be used for storage of segments. This allows you to use your local file system or anything else that can be mounted locally, such as NFS or Ceph. This is the default deep storage implementation.

To use a local mount for deep storage, set the following configuration in your common configs.

| Property | Possible Values | Description | Default |
|---|---|---|---|
| druid.storage.type | local | | Must be set. |
| druid.storage.storageDirectory | | Directory for storing segments. | Must be set. |

Note that you should generally set druid.storage.storageDirectory to something different from druid.segmentCache.locations and druid.segmentCache.infoDir.
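
As a rough illustration of the settings above, a local-mount setup might look like the following sketch. The directory paths and the maxSize value are hypothetical placeholders, not recommended values, and the assumption here is that the druid.storage.* properties go in the common runtime properties while the druid.segmentCache.* properties are set for the processes that cache segments.

```properties
# Common configs: use a local mount for deep storage.
druid.storage.type=local
# Hypothetical path; any locally mounted directory (for example an NFS mount) could be used.
druid.storage.storageDirectory=/mnt/druid/deep-storage

# Segment cache settings, shown only to illustrate keeping them separate
# from the deep storage directory. Paths and maxSize are placeholders.
druid.segmentCache.locations=[{"path":"/var/druid/segment-cache","maxSize":300000000000}]
druid.segmentCache.infoDir=/var/druid/segment-cache/info_dir
```

Keeping the deep storage directory apart from the segment cache directories means that clearing the cache does not touch the durable copy of the segments.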

If you are using the Hadoop indexer in local mode, give it a local path as your output directory and it will work.

S3-compatible

See druid-s3-extensions extension documentation.

HDFS

See druid-hdfs-storage extension documentation.

Additional Deep Stores

For additional deep stores, please see our extensions list.

Copyright © 2019 Apache Software Foundation.
Except where otherwise noted, licensed under CC BY-SA 4.0.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.