Tutorial: Get to know Query view

This tutorial demonstrates some useful features built into Query view in Apache Druid.

Query view lets you run Druid SQL queries and native (JSON-based) queries against ingested data. Try out the Introduction to Druid SQL tutorial to learn more about Druid SQL.

You can use Query view to test and tune queries before you use them in API requests—for example, to perform SQL-based ingestion. You can also ingest data directly in Query view.

The tutorial guides you through the steps to ingest sample data and query the ingested data using some Query view features.

Prerequisites

Before you follow the steps in this tutorial, download Druid as described in the quickstart and have it running on your local machine. You don't need to have loaded any data.

Run a demo query to ingest data

Druid includes demo queries that each demonstrate a different Druid feature—for example transforming data during ingestion and sorting ingested data. Each query has detailed comments to help you learn more.

In this section you load the demo queries and run a SQL task to ingest sample data into a table datasource. A sketch of the general shape of this kind of ingestion query follows the steps below.

  1. Navigate to the Druid console at http://localhost:8888 and click Query.

  2. Click the ellipsis at the bottom of the query window and select Load demo queries. Note that loading the demo queries replaces all of your current query tabs. The demo queries load in several tabs:

    [Screenshot: demo queries]

  3. Click the Demo 1 tab. This query ingests sample data into a datasource called kttm_simple. Click the Demo 1 tab heading again and note the options—you can rename, copy, and duplicate tabs.

  4. Click Run to ingest the data.

  5. When ingestion is complete, Druid displays the time it took to complete the insert query, and the new datasource kttm_simple displays in the left pane.
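
For reference, Demo 1 follows the general shape of a SQL-based ingestion query: read external data with EXTERN, then write it to a datasource with REPLACE. The sketch below is illustrative only; the input URI and the abbreviated column signature are assumptions based on the Druid quickstart sample data, and the actual demo query in your console is more complete.

    -- Minimal sketch of a SQL-based ingestion query (not the exact Demo 1 text).
    -- The URI and column list are assumptions; your demo query may differ.
    REPLACE INTO "kttm_simple" OVERWRITE ALL
    SELECT *
    FROM TABLE(
      EXTERN(
        '{"type": "http", "uris": ["https://static.imply.io/example-data/kttm-v2/kttm-v2-2019-08-25.json.gz"]}',
        '{"type": "json"}',
        '[{"name": "timestamp", "type": "string"}, {"name": "browser", "type": "string"}, {"name": "country", "type": "string"}]'
      )
    )
    PARTITIONED BY ALL TIME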

View and filter query results

In this section you run some queries against the new datasource and perform some operations on the query results.

  1. Click + to the right of the existing tabs to open a new query tab.

  2. Click the name of the datasource kttm_simple in the left pane to display some automatically generated queries:

    [Screenshot: auto queries]

  3. Click SELECT * FROM kttm_simple and run the query.

  4. In the query results pane, click Chrome anywhere it appears in the browser column, then click Filter on: browser = 'Chrome' to filter the results. The equivalent WHERE clause appears after these steps.
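
The point-and-click filter is shorthand for a WHERE clause you could type yourself. A minimal equivalent, assuming the kttm_simple datasource from the previous section:

    SELECT *
    FROM "kttm_simple"
    WHERE "browser" = 'Chrome'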

Run aggregate queries

Aggregate functions allow you to perform a calculation on a set of values and return a single value.
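
For example, a single query can compute several aggregates in one pass. The following sketch assumes the kttm_simple datasource ingested earlier and its session_length column:

    SELECT
      COUNT(*) AS "rows",
      MIN("session_length") AS "shortest_session",
      MAX("session_length") AS "longest_session"
    FROM "kttm_simple"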

In this section you run some queries using aggregate functions and perform some operations on the results, using shortcut features designed to help you build your query.

  1. Open a new query tab.

  2. Click kttm_simple in the left pane to display the generated queries.

  3. Click SELECT COUNT(*) AS "Count" FROM kttm_simple and run the query.

  4. After you run a query that contains an aggregate function, additional Query view options become available.

    Click the arrow to the left of the kttm_simple datasource to display the columns, then click the country column. Several options appear to apply country-based filters and aggregate functions to the query:

    [Screenshot: count distinct]

  5. Click Aggregate > COUNT(DISTINCT "country") to add this clause to the query. The query now appears as follows:

    SELECT COUNT(*) AS "Count",
       COUNT(DISTINCT "country") AS "dist_country"
    FROM "kttm_simple"
    GROUP BY ()
    

    Note that you can use column names such as dist_country in this example as shortcuts when building your query.

  6. Run the updated query:

    [Screenshot: aggregate query]

  7. Click Engine: auto (sql-native) to display the engine options—native for native (JSON-based) queries, sql-native for Druid SQL queries, and sql-msq-task for SQL-based ingestion.

    Select auto to let Druid select the most efficient engine based on your query input.

  8. From the engine menu you can also edit the query context and turn off some query defaults.

    Deselect Use approximate COUNT(DISTINCT) and rerun the query. The country count in the results decreases because the computation is now exact. See SQL aggregation functions for more information. A SQL sketch contrasting the exact and approximate counts follows these steps.

  9. Query view can provide information about a function, in case you aren't sure exactly what it does.

    Delete the contents of the query line COUNT(DISTINCT "country") AS "dist_country" and type COUNT(DISTINCT) to replace it. A help dialog for the function displays:

    [Screenshot: count distinct help]

    Click outside the help window to close it.

  10. You can perform actions on calculated columns in the results pane.

    Click the results column heading dist_country COUNT(DISTINCT "country") to see the available options:

    [Screenshot: result columns actions]

  11. Select Edit column and change the Output name to Distinct countries.
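
As a follow-up to step 8: whether COUNT(DISTINCT) is exact or approximate depends on the Use approximate COUNT(DISTINCT) setting, but you can also request the approximate computation explicitly with APPROX_COUNT_DISTINCT. A sketch contrasting the two, assuming the same datasource:

    SELECT
      -- exact when "Use approximate COUNT(DISTINCT)" is deselected
      COUNT(DISTINCT "country") AS "country_count",
      -- always approximate, regardless of the setting
      APPROX_COUNT_DISTINCT("country") AS "approx_country_count"
    FROM "kttm_simple"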

Generate an explain plan

In this section you generate an explain plan for a query. An explain plan shows the full query details and all of the operations Druid performs to execute it.

Druid optimizes queries of certain types—see SQL query translation for information on how to interpret an explain plan and use the details to improve query performance.
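
You can also generate a plan directly in SQL by prefixing a query with EXPLAIN PLAN FOR; the plan is then returned as the query result rather than opening the dialog. For example:

    EXPLAIN PLAN FOR
    SELECT *
    FROM "kttm_simple"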

  1. Open a new query tab.

  2. Click kttm_simple in the left pane to display the generated queries.

  3. Click SELECT * FROM kttm_simple and run the query.

  4. Click the ellipsis at the bottom of the query window and select Explain SQL query. The query plan opens in a new window:

    [Screenshot: query plan]

  5. Click Open in new tab. You can review the query details and modify it as required.

  6. Change the limit from 1001 to 2001:

    "Limit": 2001,
    

    and run the query to confirm that the updated query returns 2,001 results.

Try out a few more features

In this section you try out a few more useful Query view features.

Use calculator mode

Queries without a FROM clause run in calculator mode, which can be useful for understanding how functions work. See the Druid SQL functions reference for more information.

  1. Open a new query tab and enter the following:

    SELECT SQRT(49)
    
  2. Run the query to produce the result 7.
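
Calculator mode isn't limited to a single expression; you can evaluate several at once. A small sketch using standard Druid SQL scalar functions:

    SELECT
      POWER(2, 10) AS "kibi",               -- 1024
      ABS(-5) AS "magnitude",               -- 5
      CONCAT('Druid', ' ', 'SQL') AS "name" -- 'Druid SQL'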

Download query results

You can download query results in CSV, TSV, or newline-delimited JSON format.

  1. Open a new query tab and run a query, for example:

    SELECT DISTINCT platform
    FROM kttm_simple
    
  2. Above the results pane, click the down arrow and select Download results as… CSV.

View query history

In any query tab, click the ellipsis at the bottom of the query window and select Query history.

You can click the links on the left to view queries run at a particular date and time, and open a previously run query in a new query tab.

Further reading

For more information on ingestion and querying data, see the following topics:

  • Quickstart for information on getting started with Druid.
  • Tutorial: Querying data for example queries to run on Druid data.
  • Ingestion for an overview of ingestion and the ingestion methods available in Druid.
  • SQL-based ingestion for an overview of SQL-based ingestion.
  • SQL-based ingestion query examples for examples of SQL-based ingestion for various use cases.