
Kerberos

This Apache Druid extension enables authentication for Druid processes using Kerberos. It adds an Authenticator that protects HTTP endpoints using the Simple and Protected GSSAPI Negotiation Mechanism (SPNEGO). Make sure to include druid-kerberos in the extensions load list.

Configuration

Creating an Authenticator

druid.auth.authenticatorChain=["MyKerberosAuthenticator"]

druid.auth.authenticator.MyKerberosAuthenticator.type=kerberos

To use the Kerberos authenticator, add an authenticator with type kerberos to the authenticatorChain. The example above uses the name "MyKerberosAuthenticator" for the Authenticator.

Configuration of the named authenticator is assigned through properties with the form:

druid.auth.authenticator.<authenticatorName>.<authenticatorProperty>

The configuration examples in the rest of this document will use "kerberos" as the name of the authenticator being configured.

Properties

| Property | Possible Values | Description | Default | Required |
|---|---|---|---|---|
| druid.auth.authenticator.kerberos.serverPrincipal | HTTP/_HOST@EXAMPLE.COM | SPNEGO service principal used by Druid processes | empty | Yes |
| druid.auth.authenticator.kerberos.serverKeytab | /etc/security/keytabs/spnego.service.keytab | SPNEGO service keytab used by Druid processes | empty | Yes |
| druid.auth.authenticator.kerberos.authToLocal | RULE:[1:$1@$0](druid@EXAMPLE.COM)s/.*/druid DEFAULT | General rule for mapping principal names to local user names. Used when there is no explicit mapping for the principal name being translated. | DEFAULT | No |
| druid.auth.authenticator.kerberos.cookieSignatureSecret | secretString | Secret used to sign authentication cookies. It is advisable to set this explicitly if you run multiple Druid nodes on the same machine with different ports, since the cookie specification does not guarantee isolation by port. | Random value | No |
| druid.auth.authenticator.kerberos.authorizerName | Depends on available authorizers | Authorizer that requests should be directed to | empty | Yes |

Note that the SPNEGO principal used by the Druid processes must start with HTTP (as specified by RFC 4559) and must be of the form "HTTP/_HOST@REALM". The special string _HOST is automatically replaced with the value of the druid.host config.
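
Putting the properties above together, a complete authenticator configuration might look like the following sketch; the principal, keytab path, secret, and authorizer name are illustrative placeholders:

# Illustrative sketch: substitute your own principal, keytab, secret, and authorizer
druid.auth.authenticatorChain=["kerberos"]
druid.auth.authenticator.kerberos.type=kerberos
druid.auth.authenticator.kerberos.serverPrincipal=HTTP/_HOST@EXAMPLE.COM
druid.auth.authenticator.kerberos.serverKeytab=/etc/security/keytabs/spnego.service.keytab
druid.auth.authenticator.kerberos.cookieSignatureSecret=secretString
druid.auth.authenticator.kerberos.authorizerName=MyBasicAuthorizer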

druid.auth.authenticator.kerberos.excludedPaths

In older releases, the Kerberos authenticator had an excludedPaths property that allowed the user to specify a list of paths where authentication checks should be skipped. This property has been removed from the Kerberos authenticator because the path exclusion functionality is now handled across all authenticators/authorizers by setting druid.auth.unsecuredPaths, as described in the main auth documentation.
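
For example, a hypothetical configuration that skips authentication checks for a health endpoint might look like the line below; the path is illustrative, so consult the main auth documentation for the authoritative syntax:

# Hypothetical example; the path list is illustrative
druid.auth.unsecuredPaths=["/status/health"]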

Auth to Local Syntax

druid.auth.authenticator.kerberos.authToLocal allows you to set a general rule for mapping principal names to local user names. The syntax for mapping rules is RULE:[n:string](regexp)s/pattern/replacement/g. The integer n indicates how many components the target principal should have. If this matches, a string is formed from string, substituting the realm of the principal for $0 and the nth component of the principal for $n. For example, if the principal is druid/admin, then [2:$2$1suffix] results in the string admindruidsuffix. If this string matches regexp, then the s//[g] substitution command is run over the string. The optional g makes the substitution global over the string, instead of replacing only the first match. If required, multiple rules can be joined by a newline character and specified as a single string.
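
For illustration, the following hypothetical rule maps two-component principals such as druid/admin@EXAMPLE.COM to the local user druid, falling back to the default mapping for everything else. The rule builds the string druid@EXAMPLE.COM from [2:$1@$0], matches it against the regexp, and replaces it with druid:

# Illustrative rule: druid/<anything>@EXAMPLE.COM -> local user "druid"
druid.auth.authenticator.kerberos.authToLocal=RULE:[2:$1@$0](druid@EXAMPLE.COM)s/.*/druid/ DEFAULT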

Increasing HTTP Header size for large SPNEGO negotiate header

In an Active Directory environment, the SPNEGO token in the Authorization header includes PAC (Privilege Attribute Certificate) information, which lists all security groups for the user. When the user belongs to many security groups, the header can grow beyond what Druid can handle by default. In such cases, you can increase the maximum request header size Druid accepts by setting druid.server.http.maxRequestHeaderSize (default 8 KiB) and druid.router.http.maxRequestBufferSize (default 8 KiB).
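
For example, to raise both limits to 64 KiB (the specific value is an assumption; size it to the SPNEGO tokens in your environment):

# Illustrative values; tune to the size of your SPNEGO tokens
druid.server.http.maxRequestHeaderSize=65536
druid.router.http.maxRequestBufferSize=65536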

Configuring Kerberos Escalated Client

Druid internal processes communicate with each other using an escalated HTTP client. A Kerberos-enabled escalated HTTP client can be configured with the following properties:

| Property | Example Value | Description | Default | Required |
|---|---|---|---|---|
| druid.escalator.type | kerberos | Type of escalator client used for internal process communication. | n/a | Yes |
| druid.escalator.internalClientPrincipal | druid@EXAMPLE.COM | Principal user name used for internal process communication. | n/a | Yes |
| druid.escalator.internalClientKeytab | /etc/security/keytabs/druid.keytab | Path to the keytab file used for internal process communication. | n/a | Yes |
| druid.escalator.authorizerName | MyBasicAuthorizer | Authorizer that requests should be directed to. | n/a | Yes |
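
Taken together, a complete escalator configuration might look like the following sketch, using the example values from the table above:

# Sketch using the example values from the table above
druid.escalator.type=kerberos
druid.escalator.internalClientPrincipal=druid@EXAMPLE.COM
druid.escalator.internalClientKeytab=/etc/security/keytabs/druid.keytab
druid.escalator.authorizerName=MyBasicAuthorizer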

Accessing Druid HTTP endpoints when Kerberos security is enabled

  1. To access Druid HTTP endpoints via curl, first log in using the kinit command:

    kinit -k -t <path_to_keytab_file> user@REALM.COM
    
  2. Once the login succeeds, verify it using the klist command.

  3. You can then access Druid HTTP endpoints with curl as follows:

    curl --negotiate -u:anyUser -b ~/cookies.txt -c ~/cookies.txt -X POST -H'Content-Type: application/json' <HTTP_END_POINT>
    

    For example, to send a query from the file query.json to the Druid Broker, use this command:

    curl --negotiate -u:anyUser -b ~/cookies.txt -c ~/cookies.txt -X POST -H'Content-Type: application/json' http://broker-host:port/druid/v2/?pretty -d @query.json
    

    Note: The command above authenticates the user with the SPNEGO Negotiate mechanism on the first request and stores the authentication cookie in a file. Subsequent requests use the cookie for authentication.

Accessing the Coordinator or Overlord console from a web browser

To access the Coordinator or Overlord console from a browser, configure your browser for SPNEGO authentication as follows:

  1. Safari: no configuration required.
  2. Firefox: open Firefox and follow these steps:
    1. Go to about:config and search for network.negotiate-auth.trusted-uris.
    2. Double-click and add the following values: "http://druid-coordinator-hostname:ui-port" and "http://druid-overlord-hostname:port".
  3. Google Chrome: from the command line, run the following commands:
    1. google-chrome --auth-server-whitelist="druid-coordinator-hostname" --auth-negotiate-delegate-whitelist="druid-coordinator-hostname"
    2. google-chrome --auth-server-whitelist="druid-overlord-hostname" --auth-negotiate-delegate-whitelist="druid-overlord-hostname"
  4. Internet Explorer:
    1. Configure trusted websites to include "druid-coordinator-hostname" and "druid-overlord-hostname".
    2. Allow negotiation for the UI website.

Sending Queries programmatically

Many HTTP client libraries, such as Apache Commons HttpComponents, already support SPNEGO authentication. You can use any of these libraries to communicate with the Druid cluster.
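
As a minimal sketch using Apache HttpComponents 4.x: the broker URL, port, and query body below are illustrative placeholders, and the JVM is assumed to have a valid Kerberos ticket (for example from kinit) plus the usual client-side Kerberos setup (a JAAS login configuration and -Djavax.security.auth.useSubjectCredsOnly=false). This is not the only way to do it, just one common pattern.

import java.security.Principal;

import org.apache.http.auth.AuthSchemeProvider;
import org.apache.http.auth.AuthScope;
import org.apache.http.auth.Credentials;
import org.apache.http.client.config.AuthSchemes;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.config.Lookup;
import org.apache.http.config.RegistryBuilder;
import org.apache.http.entity.ContentType;
import org.apache.http.entity.StringEntity;
import org.apache.http.impl.auth.SPNegoSchemeFactory;
import org.apache.http.impl.client.BasicCredentialsProvider;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;

public class DruidSpnegoQuery
{
  public static void main(String[] args) throws Exception
  {
    // Register the SPNEGO scheme so the client answers "401 Negotiate" challenges.
    Lookup<AuthSchemeProvider> authSchemes = RegistryBuilder.<AuthSchemeProvider>create()
        .register(AuthSchemes.SPNEGO, new SPNegoSchemeFactory(true))
        .build();

    // SPNEGO obtains the real credentials from the Kerberos ticket cache (kinit);
    // HttpClient still requires a Credentials object, so supply a no-op placeholder.
    BasicCredentialsProvider credentials = new BasicCredentialsProvider();
    credentials.setCredentials(AuthScope.ANY, new Credentials()
    {
      @Override
      public Principal getUserPrincipal() { return null; }

      @Override
      public String getPassword() { return null; }
    });

    try (CloseableHttpClient client = HttpClients.custom()
        .setDefaultAuthSchemeRegistry(authSchemes)
        .setDefaultCredentialsProvider(credentials)
        .build()) {
      // Placeholder broker URL and query; replace with your own.
      HttpPost post = new HttpPost("http://broker-host:8082/druid/v2/?pretty");
      post.setEntity(new StringEntity(
          "{\"queryType\":\"timeBoundary\",\"dataSource\":\"wikipedia\"}",
          ContentType.APPLICATION_JSON));
      try (CloseableHttpResponse response = client.execute(post)) {
        System.out.println(EntityUtils.toString(response.getEntity()));
      }
    }
  }
}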

โ† Basic SecurityCached Lookup Module โ†’
  • Configuration
    • Creating an Authenticator
    • Properties
    • druid.auth.authenticator.kerberos.excludedPaths
    • Auth to Local Syntax
    • Increasing HTTP Header size for large SPNEGO negotiate header
  • Configuring Kerberos Escalated Client
  • Accessing Druid HTTP end points when kerberos security is enabled
  • Accessing Coordinator or Overlord console from web browser
  • Sending Queries programmatically

Technologyโ€‚ยทโ€‚Use Casesโ€‚ยทโ€‚Powered by Druidโ€‚ยทโ€‚Docsโ€‚ยทโ€‚Communityโ€‚ยทโ€‚Downloadโ€‚ยทโ€‚FAQ

โ€‚ยทโ€‚โ€‚ยทโ€‚โ€‚ยทโ€‚
Copyright ยฉ 2022 Apache Software Foundation.
Except where otherwise noted, licensed under CC BY-SA 4.0.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.