SQL data types
Apache Druid supports two query languages: Druid SQL and native queries. This document describes the SQL language.
Columns in Druid are associated with a specific data type. This topic describes supported data types in Druid SQL.
Druid natively supports five basic column types: "long" (64-bit signed int), "float" (32-bit float), "double" (64-bit float), "string" (UTF-8 encoded strings and string arrays), and "complex" (catch-all for more exotic data types like json, hyperUnique, and approxHistogram columns).
Timestamps (including the
__time column) are treated by Druid as longs, with the value being the number of
milliseconds since 1970-01-01 00:00:00 UTC, not counting leap seconds. Therefore, timestamps in Druid do not carry any
timezone information, but only carry information about the exact moment in time they represent. See the
Time functions section for more information about timestamp handling.
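To make the "milliseconds since the epoch" representation concrete, here is a minimal Python sketch. It is not Druid code; the helper names millis_to_utc and utc_to_millis are hypothetical and exist only for illustration.

```python
from datetime import datetime, timedelta, timezone

# 1970-01-01 00:00:00 UTC, the zero point for long timestamp values.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def millis_to_utc(millis: int) -> datetime:
    """Interpret a long as milliseconds since the epoch (hypothetical helper)."""
    return EPOCH + timedelta(milliseconds=millis)


def utc_to_millis(moment: datetime) -> int:
    """Convert a timezone-aware datetime back to epoch milliseconds."""
    return (moment - EPOCH) // timedelta(milliseconds=1)


# The long identifies one instant; a time zone only matters when it is displayed.
print(millis_to_utc(946782245000).isoformat())
```

The long itself carries no time zone. Any zone is applied only when the value is formatted or truncated.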
Casts between two SQL types with the same Druid runtime type (see the table below) have no effect, other than exceptions
noted in the table. Casts between two SQL types that have different Druid runtime types generate a runtime cast in
Druid. If a value cannot be cast to the target type, as in
CAST('foo' AS BIGINT), Druid either substitutes a default value (when
druid.generic.useDefaultValueForNull = true, the default mode), or substitutes NULL (when
druid.generic.useDefaultValueForNull = false). NULL values cast to non-nullable types are also substituted with a
default value. For example, if
druid.generic.useDefaultValueForNull = true, a null VARCHAR cast to BIGINT is converted
to a zero.
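The cast rules above can be modeled with a short Python sketch. This is an illustration of the described behavior, not Druid's implementation: cast_to_bigint and its use_default_value_for_null parameter are hypothetical names that mirror the runtime property, and Python's int() parsing stands in for Druid's own string-to-long conversion.

```python
from typing import Optional


def cast_to_bigint(value: Optional[str],
                   use_default_value_for_null: bool = True) -> Optional[int]:
    """Model CAST(<VARCHAR> AS BIGINT) as described in the text (hypothetical)."""
    try:
        result = int(value) if value is not None else None
    except ValueError:
        result = None  # the value cannot be cast to the target type

    if result is None and use_default_value_for_null:
        return 0  # default mode: substitute the default value (zero)
    return result  # otherwise a failed cast or NULL input stays NULL


print(cast_to_bigint("foo"))         # default mode
print(cast_to_bigint("foo", False))  # SQL-compatible mode
```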
The following table describes how Druid maps SQL types onto native types when running queries.
|SQL type||Druid runtime type||Default value*||Notes|
|VARCHAR||STRING||''||Druid STRING columns are reported as VARCHAR. Can include multi-value strings as well.|
|DECIMAL||DOUBLE||0.0||DECIMAL uses floating point, not fixed point math|
|FLOAT||FLOAT||0.0||Druid FLOAT columns are reported as FLOAT|
|DOUBLE||DOUBLE||0.0||Druid DOUBLE columns are reported as DOUBLE|
|BIGINT||LONG||0||Druid LONG columns (except __time) are reported as BIGINT|
|DATE||LONG||0, meaning 1970-01-01||Casting TIMESTAMP to DATE rounds down the timestamp to the nearest day. Casts between string and date types assume standard SQL formatting, e.g. 2000-01-02.|
|OTHER||COMPLEX||none||May represent various Druid column types such as hyperUnique, approxHistogram, etc.|
* Default value applies if
druid.generic.useDefaultValueForNull = true (the default mode). Otherwise, the default value is
NULL for all types.
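The DATE row says that casting TIMESTAMP to DATE rounds down to the nearest day. On the underlying long, that can be sketched as below. The sketch assumes UTC day boundaries and does not model other time zones; floor_to_day is a hypothetical helper, not a Druid function.

```python
MILLIS_PER_DAY = 24 * 60 * 60 * 1000


def floor_to_day(millis: int) -> int:
    """Round epoch milliseconds down to the start of the UTC day (hypothetical)."""
    # Python's % is non-negative for a positive divisor, so instants before
    # 1970 also round down (toward earlier time), not toward zero.
    return millis - millis % MILLIS_PER_DAY


# 2000-01-02 03:04:05 UTC rounds down to 2000-01-02 00:00:00 UTC.
print(floor_to_day(946782245000))
```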
Druid's native type system allows strings to potentially have multiple values. These
multi-value string dimensions are reported in SQL as
VARCHAR typed, and can be
syntactically used like any other VARCHAR. Regular string functions that refer to multi-value string dimensions are
applied to all values for each row individually. Multi-value string dimensions can also be treated as arrays via special
multi-value string functions, which can perform powerful array-aware operations.
Grouping by a multi-value expression observes the native Druid multi-value aggregation behavior, which is similar to
UNNEST functionality available in some other SQL dialects. Refer to the documentation on
multi-value string dimensions for additional details.
Because multi-value dimensions are treated by the SQL planner as
VARCHAR, there are some inconsistencies between how they are handled in Druid SQL and in native queries. For example, expressions involving multi-value dimensions may be incorrectly optimized by the Druid SQL planner:
multi_val_dim = 'a' AND multi_val_dim = 'b' is optimized to
false, even though it is possible for a single row to have both "a" and "b" as values for
multi_val_dim. The SQL behavior of multi-value dimensions will change in a future release to more closely align with their behavior in native queries.
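The following Python sketch shows why folding that predicate to false is wrong for multi-value rows. It assumes that an equality filter on a multi-value dimension matches when any of the row's values equals the literal; native_equals is a hypothetical stand-in for that filter, not a Druid API.

```python
from typing import List


def native_equals(row_values: List[str], literal: str) -> bool:
    """Assumed native semantics: match if any value in the row equals the literal."""
    return literal in row_values


# One row whose multi_val_dim holds both "a" and "b".
row = ["a", "b"]

# Evaluated per row, both equality checks hold, so the conjunction is true,
# even though a planner that treats the column as a single VARCHAR value
# would simplify the same predicate to false.
matches_both = native_equals(row, "a") and native_equals(row, "b")
print(matches_both)
```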
The druid.generic.useDefaultValueForNull runtime property controls Druid's NULL handling mode. For the most SQL compliant behavior, set this to
false.
When druid.generic.useDefaultValueForNull = true (the default mode), Druid treats NULLs and empty strings
interchangeably, rather than according to the SQL standard. In this mode Druid SQL only has partial support for NULLs.
For example, the expressions
col IS NULL and
col = '' are equivalent, and both evaluate to true if
col contains an empty string. Similarly, the expression
COALESCE(col1, col2) returns
col2 if col1 is an empty
string. While the
COUNT(*) aggregator counts all rows, the
COUNT(expr) aggregator counts the number of rows where
expr is neither null nor the empty string. Numeric columns in this mode are not nullable; any null or missing
values are treated as zeroes.
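The default-mode string behavior described above can be summarized in a small Python model. It is a sketch of the stated semantics only, not of Druid internals; is_null, coalesce, and count_expr are hypothetical helpers standing in for IS NULL, COALESCE, and COUNT(expr).

```python
from typing import Iterable, Optional


def is_null(value: Optional[str]) -> bool:
    """Default mode: NULL and the empty string are interchangeable."""
    return value is None or value == ""


def coalesce(*values: Optional[str]) -> Optional[str]:
    """Return the first argument that is neither NULL nor empty."""
    for value in values:
        if not is_null(value):
            return value
    return None  # model choice; indistinguishable from '' in this mode


def count_expr(values: Iterable[Optional[str]]) -> int:
    """COUNT(expr): count rows where expr is neither null nor the empty string."""
    return sum(1 for value in values if not is_null(value))


column = ["a", "", None, "b"]
print(len(column))          # what COUNT(*) would count: every row
print(count_expr(column))   # what COUNT(expr) would count
print(coalesce("", "x"))    # the empty first argument is skipped
```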
When druid.generic.useDefaultValueForNull = false, NULLs are treated more closely to the SQL standard. In this mode,
numeric NULL is permitted, and NULLs and empty strings are no longer treated as interchangeable. This property
affects both storage and querying, and must be set on all Druid service types to be available at both ingestion time
and query time. There is some overhead associated with the ability to handle NULLs; see
the segment internals documentation for more details.
The druid.expressions.useStrictBooleans runtime property controls Druid's boolean logic mode. For the most SQL compliant behavior, set this to true.
When druid.expressions.useStrictBooleans = false (the default mode), Druid uses two-valued logic.
When druid.expressions.useStrictBooleans = true, Druid uses three-valued logic for
expression evaluation, such as
expression virtual columns or expression filters.
However, even in this mode, Druid uses two-valued logic for filter types other than expression.
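For readers unfamiliar with three-valued logic, the sketch below implements the standard SQL truth tables in Python, with None standing for the unknown (NULL) truth value. It illustrates the general technique only. The function names are hypothetical, and the sketch does not model how Druid maps NULL inputs to true or false in two-valued mode.

```python
from typing import Optional

Tri = Optional[bool]  # True, False, or None (unknown)


def and3(a: Tri, b: Tri) -> Tri:
    """Three-valued AND: false dominates, then unknown."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a: Tri, b: Tri) -> Tri:
    """Three-valued OR: true dominates, then unknown."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def not3(a: Tri) -> Tri:
    """Three-valued NOT: unknown stays unknown."""
    return None if a is None else not a


print(and3(None, False), and3(None, True), not3(None))
```

In two-valued logic there is no unknown result: every boolean expression evaluates to either true or false.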
Druid supports storing nested data structures in segments using the native
COMPLEX<json> type. See Nested columns for more information.
You can interact with nested data using JSON functions, which can extract nested values, parse from string, serialize to string, and create new COMPLEX<json> values.
COMPLEX types have limited functionality outside the specialized functions that use them, so their behavior is undefined when:
- Grouping on complex values.
- Filtering directly on complex values, such as
WHERE json IS NULL.
- Used as inputs to aggregators without specialized handling for a specific complex type.
In many cases, functions are provided to translate
COMPLEX value types to
STRING, which serves as a workaround until
COMPLEX type functionality can be improved.