Apache Druid
  • Technology
  • Use Cases
  • Powered By
  • Docs
  • Community
  • Apache
  • Download

โ€บDruid SQL

Getting started

  • Introduction to Apache Druid
  • Quickstart (local)
  • Single server deployment
  • Clustered deployment

Tutorials

  • Load files natively
  • Load files using SQL ๐Ÿ†•
  • Load from Apache Kafka
  • Load from Apache Hadoop
  • Querying data
  • Roll-up
  • Theta sketches
  • Configuring data retention
  • Updating existing data
  • Compacting segments
  • Deleting data
  • Writing an ingestion spec
  • Transforming input data
  • Tutorial: Run with Docker
  • Kerberized HDFS deep storage
  • Convert ingestion spec to SQL
  • Jupyter Notebook tutorials

Design

  • Design
  • Segments
  • Processes and servers
  • Deep storage
  • Metadata storage
  • ZooKeeper

Ingestion

  • Ingestion
  • Data formats
  • Data model
  • Data rollup
  • Partitioning
  • Ingestion spec
  • Schema design tips
  • Stream ingestion

    • Apache Kafka ingestion
    • Apache Kafka supervisor
    • Apache Kafka operations
    • Amazon Kinesis

    Batch ingestion

    • Native batch
    • Native batch: input sources
    • Migrate from firehose
    • Hadoop-based

    SQL-based ingestion ๐Ÿ†•

    • Overview
    • Key concepts
    • API
    • Security
    • Examples
    • Reference
    • Known issues
  • Task reference
  • Troubleshooting FAQ

Data management

  • Overview
  • Data updates
  • Data deletion
  • Schema changes
  • Compaction
  • Automatic compaction

Querying

    Druid SQL

    • Overview and syntax
    • SQL data types
    • Operators
    • Scalar functions
    • Aggregation functions
    • Multi-value string functions
    • JSON functions
    • All functions
    • Druid SQL API
    • JDBC driver API
    • SQL query context
    • SQL metadata tables
    • SQL query translation
  • Native queries
  • Query execution
  • Troubleshooting
  • Concepts

    • Datasources
    • Joins
    • Lookups
    • Multi-value dimensions
    • Nested columns
    • Multitenancy
    • Query caching
    • Using query caching
    • Query context

    Native query types

    • Timeseries
    • TopN
    • GroupBy
    • Scan
    • Search
    • TimeBoundary
    • SegmentMetadata
    • DatasourceMetadata

    Native query components

    • Filters
    • Granularities
    • Dimensions
    • Aggregations
    • Post-aggregations
    • Expressions
    • Having filters (groupBy)
    • Sorting and limiting (groupBy)
    • Sorting (topN)
    • String comparators
    • Virtual columns
    • Spatial filters

Configuration

  • Configuration reference
  • Extensions
  • Logging

Operations

  • Web console
  • Java runtime
  • Security

    • Security overview
    • User authentication and authorization
    • LDAP auth
    • Password providers
    • Dynamic Config Providers
    • TLS support

    Performance tuning

    • Basic cluster tuning
    • Segment size optimization
    • Mixed workloads
    • HTTP compression
    • Automated metadata cleanup

    Monitoring

    • Request logging
    • Metrics
    • Alerts
  • API reference
  • High availability
  • Rolling updates
  • Using rules to drop and retain data
  • Working with different versions of Apache Hadoop
  • Misc

    • dump-segment tool
    • reset-cluster tool
    • insert-segment-to-db tool
    • pull-deps tool
    • Deep storage migration
    • Export Metadata Tool
    • Metadata Migration
    • Content for build.sbt

Development

  • Developing on Druid
  • Creating extensions
  • JavaScript functionality
  • Build from source
  • Versioning
  • Experimental features

Misc

  • Papers

Hidden

  • Apache Druid vs Elasticsearch
  • Apache Druid vs. Key/Value Stores (HBase/Cassandra/OpenTSDB)
  • Apache Druid vs Kudu
  • Apache Druid vs Redshift
  • Apache Druid vs Spark
  • Apache Druid vs SQL-on-Hadoop
  • Authentication and Authorization
  • Broker
  • Coordinator Process
  • Historical Process
  • Indexer Process
  • Indexing Service
  • MiddleManager Process
  • Overlord Process
  • Router Process
  • Peons
  • Approximate Histogram aggregators
  • Apache Avro
  • Microsoft Azure
  • Bloom Filter
  • DataSketches extension
  • DataSketches HLL Sketch module
  • DataSketches Quantiles Sketch module
  • DataSketches Theta Sketch module
  • DataSketches Tuple Sketch module
  • Basic Security
  • Kerberos
  • Cached Lookup Module
  • Apache Ranger Security
  • Google Cloud Storage
  • HDFS
  • Apache Kafka Lookups
  • Globally Cached Lookups
  • MySQL Metadata Store
  • ORC Extension
  • Druid pac4j based Security extension
  • Apache Parquet Extension
  • PostgreSQL Metadata Store
  • Protobuf
  • S3-compatible
  • Simple SSLContext Provider Module
  • Stats aggregator
  • Test Stats Aggregators
  • Druid AWS RDS Module
  • Kubernetes
  • Ambari Metrics Emitter
  • Apache Cassandra
  • Rackspace Cloud Files
  • DistinctCount Aggregator
  • Graphite Emitter
  • InfluxDB Line Protocol Parser
  • InfluxDB Emitter
  • Kafka Emitter
  • Materialized View
  • Moment Sketches for Approximate Quantiles module
  • Moving Average Query
  • OpenTSDB Emitter
  • Druid Redis Cache
  • Microsoft SQLServer
  • StatsD Emitter
  • T-Digest Quantiles Sketch module
  • Thrift
  • Timestamp Min/Max aggregators
  • GCE Extensions
  • Aliyun OSS
  • Prometheus Emitter
  • kubernetes
  • Cardinality/HyperUnique aggregators
  • Select
  • Firehose (deprecated)
  • Native batch (simple)
  • Realtime Process
Edit

SQL data types

Apache Druid supports two query languages: Druid SQL and native queries. This document describes the SQL language.

Columns in Druid are associated with a specific data type. This topic describes supported data types in Druid SQL.

Standard types

Druid natively supports five basic column types: "long" (64 bit signed int), "float" (32 bit float), "double" (64 bit float) "string" (UTF-8 encoded strings and string arrays), and "complex" (catch-all for more exotic data types like json, hyperUnique, and approxHistogram columns).

Timestamps (including the __time column) are treated by Druid as longs, with the value being the number of milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore, timestamps in Druid do not carry any timezone information, but only carry information about the exact moment in time they represent. See the Time functions section for more information about timestamp handling.

Casts between two SQL types with the same Druid runtime type (see below table) have no effect, other than exceptions noted in the table. Casts between two SQL types that have different Druid runtime types generate a runtime cast in Druid. If a value cannot be cast to the target type, as in CAST('foo' AS BIGINT), Druid either substitutes a default value (when druid.generic.useDefaultValueForNull = true, the default mode), or substitutes NULL (when druid.generic.useDefaultValueForNull = false). NULL values cast to non-nullable types are also substituted with a default value. For example, if druid.generic.useDefaultValueForNull = true, a null VARCHAR cast to BIGINT is converted to a zero.

The following table describes how Druid maps SQL types onto native types when running queries.

SQL typeDruid runtime typeDefault value*Notes
CHARSTRING''
VARCHARSTRING''Druid STRING columns are reported as VARCHAR. Can include multi-value strings as well.
DECIMALDOUBLE0.0DECIMAL uses floating point, not fixed point math
FLOATFLOAT0.0Druid FLOAT columns are reported as FLOAT
REALDOUBLE0.0
DOUBLEDOUBLE0.0Druid DOUBLE columns are reported as DOUBLE
BOOLEANLONGfalse
TINYINTLONG0
SMALLINTLONG0
INTEGERLONG0
BIGINTLONG0Druid LONG columns (except __time) are reported as BIGINT
TIMESTAMPLONG0, meaning 1970-01-01 00:00:00 UTCDruid's __time column is reported as TIMESTAMP. Casts between string and timestamp types assume standard SQL formatting, e.g. 2000-01-02 03:04:05, not ISO8601 formatting. For handling other formats, use one of the time functions.
DATELONG0, meaning 1970-01-01Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. 2000-01-02. For handling other formats, use one of the time functions.
OTHERCOMPLEXnoneMay represent various Druid column types such as hyperUnique, approxHistogram, etc.

* Default value applies if druid.generic.useDefaultValueForNull = true (the default mode). Otherwise, the default value is NULL for all types.

Multi-value strings

Druid's native type system allows strings to potentially have multiple values. These multi-value string dimensions are reported in SQL as VARCHAR typed, and can be syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special multi-value string functions, which can perform powerful array-aware operations.

Grouping by a multi-value expression observes the native Druid multi-value aggregation behavior, which is similar to the UNNEST functionality available in some other SQL dialects. Refer to the documentation on multi-value string dimensions for additional details.

Because multi-value dimensions are treated by the SQL planner as VARCHAR, there are some inconsistencies between how they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be incorrectly optimized by the Druid SQL planner: multi_val_dim = 'a' AND multi_val_dim = 'b' is optimized to false, even though it is possible for a single row to have both "a" and "b" as values for multi_val_dim. The SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior in native queries.

NULL values

The druid.generic.useDefaultValueForNull runtime property controls Druid's NULL handling mode. For the most SQL compliant behavior, set this to false.

When druid.generic.useDefaultValueForNull = true (the default mode), Druid treats NULLs and empty strings interchangeably, rather than according to the SQL standard. In this mode Druid SQL only has partial support for NULLs. For example, the expressions col IS NULL and col = '' are equivalent, and both evaluate to true if col contains an empty string. Similarly, the expression COALESCE(col1, col2) returns col2 if col1 is an empty string. While the COUNT(*) aggregator counts all rows, the COUNT(expr) aggregator counts the number of rows where expr is neither null nor the empty string. Numeric columns in this mode are not nullable; any null or missing values are treated as zeroes.

When druid.generic.useDefaultValueForNull = false, NULLs are treated more closely to the SQL standard. In this mode, numeric NULL is permitted, and NULLs and empty strings are no longer treated as interchangeable. This property affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time and query time. There is some overhead associated with the ability to handle NULLs; see the segment internals documentation for more details.

Boolean logic

The druid.expressions.useStrictBooleans runtime property controls Druid's boolean logic mode. For the most SQL compliant behavior, set this to true.

When druid.expressions.useStrictBooleans = false (the default mode), Druid uses two-valued logic.

When druid.expressions.useStrictBooleans = true, Druid uses three-valued logic for expressions evaluation, such as expression virtual columns or expression filters. However, even in this mode, Druid uses two-valued logic for filter types other than expression.

Nested columns

Druid supports storing nested data structures in segments using the native COMPLEX<json> type. See Nested columns for more information.

You can interact with nested data using JSON functions, which can extract nested values, parse from string, serialize to string, and create new COMPLEX<json> structures.

COMPLEX types have limited functionality outside the specialized functions that use them, so their behavior is undefined when:

  • Grouping on complex values.
  • Filtering directly on complex values, such as WHERE json is NULL.
  • Used as inputs to aggregators without specialized handling for a specific complex type.

In many cases, functions are provided to translate COMPLEX value types to STRING, which serves as a workaround solution until COMPLEX type functionality can be improved.

โ† Overview and syntaxOperators โ†’
  • Standard types
  • Multi-value strings
  • NULL values
  • Boolean logic
  • Nested columns

Technologyโ€‚ยทโ€‚Use Casesโ€‚ยทโ€‚Powered by Druidโ€‚ยทโ€‚Docsโ€‚ยทโ€‚Communityโ€‚ยทโ€‚Downloadโ€‚ยทโ€‚FAQ

โ€‚ยทโ€‚โ€‚ยทโ€‚โ€‚ยทโ€‚
Copyright ยฉ 2022 Apache Software Foundation.
Except where otherwise noted, licensed under CC BY-SA 4.0.
Apache Druid, Druid, and the Druid logo are either registered trademarks or trademarks of The Apache Software Foundation in the United States and other countries.