Powered by Apache Druid
Numerous companies of various sizes run Druid in production. Some of them are listed below.
Adikteev is the leading mobile app re-engagement platform for performance-driven marketers, and is consistently ranked in the top 5 of the AppsFlyer Performance Index. By using Druid instead of relying on slow and stale dashboards, we have been able to achieve internal productivity gains, make better decisions faster, provide our external clients with strategic advice to improve the performance and effectiveness of their retargeting marketing campaigns, and notify clients quickly of potentially serious problems.
- How Adikteev helps customers succeed using self-service analytics (Margot Miller), Imply Blog, 20 Aug 2020.
AdsWizz is the leading global digital audio advertising solution provider, enabling the effective monetization and personalization of digital audio through a complete suite of advertising and analytics solutions.
We use Druid as the core analytics engine, which powers a multitude of real-time dashboards and widgets used by both customers and internal staff. Sub-second query response times, the removal of obsolete data cubes, ad-hoc analytics and the seamless integration of historical and real-time data are the main reasons we chose Druid. Advertisers, agencies and publishers can get insights on hundreds of metrics and dimensions specific to the advertising industry, with the ability to slice and dice the data according to their needs.
Druid powers slice-and-dice analytics on both historical and real-time metrics. It significantly reduces the latency of analytic queries and helps people get insights more interactively.
- How Druid enables analytics at Airbnb (Pala Muthiah, Jinyang Li), The Airbnb Tech Blog [Medium], 13 Nov 2018.
- How Airbnb Achieved Metric Consistency at Scale (Amit Pahwa, Cristian Figueroa, Donghan Zhang, Haim Grosman, John Bodley, Jonathan Parks, Maggie Zhu, Philip Weiss, Robert Chang, Shao Xie, Sylvia Tomiyama, Xiaohui Sun), The Airbnb Tech Blog [Medium], 30 Apr 2021.
Airbridge is a people-based attribution and omni-channel campaign analytics platform that helps marketers measure and optimize their marketing campaigns. Thanks to Druid's data aggregation technology, marketers using Airbridge receive real-time, granular reports on how their campaigns perform across a variety of devices, platforms, and channels.
Druid is widely used in different business units of Alibaba Group.
At Alibaba Search Group, Druid is used to power real-time analytics of users' interaction with its popular e-commerce site.
At its Local Services Business Unit, Druid is responsible for realtime application performance monitoring and alerting. More details can be found in this blog (Chinese).
Allegro is the most popular Polish shopping destination, with about 14 million users monthly and over 1.5 million items sold on the platform daily. We have been using Druid since 2016 as a business intelligence platform to power our clickstream analytics, marketing automation, anomaly detection, technical metrics and more. Our cluster (432 CPUs, 1300GB of RAM on historicals) processes billions of real-time events loaded from Kafka and hundreds of batch indexing jobs on a daily basis. It's a pleasure to work with Druid, an extraordinarily efficient and exceptionally stable platform with a vibrant community.
We leverage Apache Druid in conjunction with the Alphaa AI Super AI NLP Engine to implement Natural Language Query on billions of rows of data for Alphaa Cloud customers.
Today’s consumers move across screens with ease. But advertisers and media companies alike struggle to unify audiences and optimize media investment. Only Amobee brings it all together. We give you the power to bridge these silos with end-to-end solutions that help brands, agencies, and media companies optimize the consumer experience across linear TV, connected TV and digital, including social. You are in control, finding the audiences you seek and getting the performance that you need to drive meaningful results for your business. Our loyal customers have achieved dramatic performance improvements and efficiencies by holistically optimizing their full portfolio of media against strategic, in-market audiences. Together, these beloved household brands and premium media providers have made Amobee a world leading advertising & premium media management platform. Why? Like us, they believe that when all media investment can be mapped to business outcomes, everyone wins -- including the consumer.
Amobee provides advertising solutions across TV, digital and social platforms. Amobee has massive volumes of incoming data, and its customers require reporting analytics across unbounded combinations of use cases. Amobee uses Druid to meet these demanding customer needs with fast-response dashboards and offline reports in Amobee’s Platform UI, as well as queries from demanding automated systems. Specifically, we love that Druid allows many data streams to converge through its robust ingestion, parallel aggregation, redundant storage and concurrent query mechanisms.
Apollo uses Druid to power our "Graph Manager" SaaS service, which helps application developers manage, validate and secure their organization’s data graph. In addition to managing the evolution of their GraphQL schema, users can draw insights from the operations metadata and execution trace data that are sent to our service, indexed, and stored in Druid.
Druid is the major player in the real-time analytics pipeline at AppsFlyer, and it serves as the customer-facing analytics database for dashboard reporting.
Druid is used to power dynamic analytics and charting in Archive-It Reports. Reports help Archive-It partners understand what content they captured and why some content was not captured, and assist with quality assurance and crawl scoping to ensure they are building the highest-quality collections.
At Athena Health, we are creating a new performance management application for our clients, and one of its key components is Apache Druid.
- Automating CI/CD for Druid Clusters at Athena Health (Shyam Mudambi, Ramesh Kempanna, Karthik Urs), Imply [YouTube], 16 Apr 2020.
Atomx is a new media exchange that connects networks, DSPs, SSPs, and other parties. Atomx uses Druid for its advanced real-time reporting system. Using the Google Cloud modifications Atomx contributed to Druid, it can easily scale Druid along with its fast-growing platform.
Autonomic Transportation Mobility Cloud (TMC) is the industry's first open cloud platform for transportation and mobility data. One of our missions is to enable our customers to easily explore a tremendous amount of data and draw valuable insights in a timely fashion. To meet these needs, we deploy Apache Druid in our platform to aggregate and process time series data. Each Druid component runs as an individual microservice and can be easily deployed, managed, and scaled independently in our Kubernetes clusters.
At Avesta, we use Druid as a central component in our cloud data platform to provide real-time analytics solutions to our clients. We are using Druid for Customer Data Analytics and extending it for Industrial IoT and Market Automation use cases. Druid has not only proven itself to be resilient and performant but has also helped our clients save enormous amounts in cloud and licensing costs.
- Community Spotlight: staying true to open-source roots at Avesta (Dharam Gajera, Peter Marshall, Jelena Zanko), Imply Blog, 25 May 2021.
BIGO's big data team selected Druid as an OLAP engine to analyze data from its apps (Like, Bigolive, IMO, etc.). Typical analyses in Druid include APM, A/B testing, push, funnel models, etc. In addition, Superset is used for dashboards because of its deep integration with Druid.
Billy Mobile is a mobile advertising platform that excels in the performance-based optimisation segment. We use Druid to power our real-time analytics dashboards, in which our publishers, advertisers and staff can get insights into how their campaigns, offers and traffic are performing, with sub-second query times and minute granularity. We use a lambda-architecture approach, ingesting traffic in real time with Tranquility and Storm, plus a batch layer built on a tight integration with Hive and Hadoop, our Master Data Set. This way we can provide crucial fast access to data while making sure we have the right numbers.
Blis is the trusted leader in location-powered advertising and analytics, helping brands understand, reach and engage consumers globally to deliver measurable results. The operations team uses Druid to manage media purchases and make intelligent pricing decisions. The analytic insights team, which helps customers maximize their value from Blis, uses Druid to discover opportunities and tell data-driven stories about their campaigns. Engineering and product teams use Druid to keep an eye on revenue and margin, with the ability to drill into unexpected changes.
Blueshift is an AI-powered customer data activation platform that enables CRM and product marketers to intelligently manage their audiences and orchestrate large-scale personalized messaging campaigns. Blueshift offers in-product real-time campaign analytics powered by Druid. Additionally, we use the same analytics backend to power automatic traffic allocation using Bayesian bandits, perform cohort analysis over multiple dimensions, and generate internal reports that measure the impact and ROI of AI across different types of campaigns and industry verticals.
"We operate a multi-tenant Druid cluster in AWS. Additionally, we have integrated our data lake (Hive/HBase) with Druid to manage data backfills."
- Blueshift: Scaling real-time campaign analytics with Apache Druid (Anuraj Pandey), Imply blog, 8 Aug 2019.
- Data Engineering At Booking.com: a case study (Andreas Kretz), YouTube, 15 Mar 2019.
Branch uses Druid as its trusted analytics engine to power all of its data analysis needs. These range from the user-facing Branch Dashboard analytics that our partners rely on to gain insight into the performance of their links, to the data-driven business decisions we need to make internally to build a sustainable business.
British Telecom (BT)
- How Druid powers real-time analytics at BT (Pankaj Tiwari), Imply [YouTube], 17 Apr 2020.
Central Bank of the Republic of Turkey
The Central Bank of the Republic of Turkey addresses the challenge of analyzing and interpreting real-time tick data related to money market instruments using Apache Druid and Superset.
- Developing High Frequency Indicators Using Real-Time Tick Data on Apache Superset and Druid (Zekeriya Besiroglu, Emre Tokel, Kerem Başol, M. Yağmur Şahin), DataWorks Summit [YouTube], 21 Mar 2019.
- Druid at Charter Spectrum (Nate Vogel, Andy Amick), Imply [SlideShare], 24 Jan 2019.
CircleHD is an enterprise video enablement platform, used by businesses for training, sales enablement and digital learning experience. CircleHD uses Druid to power the analytics system that provides real-time insight into adoption, usage and engagement.
Cisco uses Druid to power a real-time analytics platform for network flow data.
- Under the hood of Cisco’s Tetration Analytics platform (Brandon Butler), Network World, 20 Jun 2016.
Condé Nast uses Druid to track billions of events across our two dozen brands, both in real time and historically. Druid helps power dashboards, site performance trackers, and many other internal applications.
Druid has helped push operational visibility to the next level. Operating multi-tenant services requires fine-grained visibility down to the individual tenant, user, or application behavior, where most traditional monitoring stacks fail to scale or become cost-prohibitive. Leveraging Druid as part of our stack means we don't shy away from high-cardinality data. As a result, our teams can not only quickly troubleshoot issues but also glean detailed understanding to help improve the product.
- Scaling Apache Druid for Real-Time Cloud Analytics at Confluent (Zohreh Karimi, Harini Rajendran), Confluent Blog, 8 Nov 2021.
Contiamo uses cutting-edge technology to enable agile development and deployment of data-driven frontends and automations. Druid powers various dashboards and frontends that deal with large amounts of event-based data.
Criteo uses Druid to provide its customers with user-centric analytics and reporting. With more than 1 billion unique users reached per month, 3 billion ads displayed per day, and 70% growth in 2014, Criteo's previous stack was hard pressed to keep up with the load. Replacing it with Druid helped us achieve linear scalability while letting our customers explore their data in new and interactive ways.
- Real Time Metrics on Tracker Calls (Camille Coueslant, Benoit Jehanno), Criteo Engineering, 18 Feb 2016.
CrunchMetrics has fully integrated their platform with Apache Druid, designed to quickly ingest massive quantities of event data, and provide low-latency queries on top of the data. With the new integration, customers can gain real-time, intelligent insights from streaming data. This development is in line with the current scalability requirements of 5G and the digital intensification we are all experiencing.
- CrunchMetrics Year in Review 2020: A Year That Was an Anomaly in Itself! (Rohit Maheshwari), CrunchMetrics website, 30 Dec 2020.
At Cuebook, Druid serves as the analytics database for our augmented analytics product.
- Augmented analytics on business metrics by Cuebook with Apache Druid (Sachin Bansal, Peter Marshall), Imply blog, 18 Aug 2021.
DBS uses Apache Druid to handle AML investigations for its compliance team. The AML (anti-money laundering) workflow generates alerts, which are tracked within Druid. Transactional data is exported from an RDBMS to S3 and ingested into Druid at regular intervals. Investigators can now slice and dice millions of records with low latency.
- Holistic AML compliance using Apache Druid (Arpit Dubey), Imply [YouTube], 15 Apr 2020.
Dataroid is a digital analytics and customer engagement platform that helps brands understand their users' interactions with their digital assets in detail and deliver a better experience with data-driven insights.
Druid powers Dataroid analytics dashboards allowing our customers to interactively deep dive into behavioral and performance data and explore hundreds of metrics in real time.
Datumo is a 100% remote Big Data software house from Poland. We are co-authors of Turnilo, an open-source data exploration tool for Druid. Our analytics platform, Storyteller, uses the power of Apache Druid and Turnilo to help our clients analyze large datasets interactively. Our biggest Druid deployment is responsible for real-time monitoring and data processing for a large video streaming platform (60,000 events per second).
At Deep.BI we track user habits, engagement, and product and content performance, processing up to terabytes of data (billions of events) daily. Our goal is to provide real-time insights based on custom metrics from a variety of self-created dimensions. To accomplish this, our system collects every user interaction. We use Apache Flink for event enrichment, custom transformations, aggregations and serving machine learning models. The processed data is then indexed by Apache Druid for real-time analytics and by our custom UI, built on top of Druid and Apache Cassandra, for delivery of the scores.
Our talk from the Flink Forward conference in Berlin, 2019:
- Real-time Stream Analytics and User Scoring Using Apache Druid, Flink & Cassandra at Deep.BI (Hisham Itani), Medium, 24 Mar 2020.
Deema Agency Delivers Results. As a leading digital advertising agency in Iran, we design, run and optimize performance- and awareness-based digital campaigns for businesses and help them achieve their goals. With state-of-the-art programmatic advertising technology, an extensive ad network, and advanced advertising platforms, we deliver your brand message to your target audience in Iran in the most effective way. We face the challenge of handling an intensive real-time data stream for analytics and for a better experience with data-driven insights. Druid powers our application-related metrics and helps us slice and dice analytics on both historical and real-time metrics.
Delta Projects uses Druid to power real-time audience and advertising analytics interfaces.
Didi Chuxing is the world’s largest mobile transportation platform, offering a full range of commuting options in 400 cities in China. Didi uses Druid as a core component of our real-time big data processing pipeline. Druid powers Didi's real-time monitoring system, which tracks hundreds of key business metrics. We are deeply impressed by Druid's fast aggregations and sub-second OLAP queries. With Druid, we can get insights from data in real time.
Dream11 consumes data from Apache Druid, Amazon Redshift, and Athena and builds models on Looker to report user concurrency, track user journeys, and build funnels based on interaction events.
- Data Highway — Dream11’s Inhouse Analytics Platform — The Burden and Benefits (Lavanya Pulijala), Dream11 Engineering [Medium], 27 Jan 2020.
Easemob Technologies, Inc. is the leading provider of an open IM platform and cloud computing customer services. We provide PaaS services for Instant Messaging and SaaS services for Mobile Customer Service.
eBay uses Druid to aggregate multiple data streams for real-time user behavior analytics, ingesting data at a very high rate (over 100,000 events/sec), with the ability to query or aggregate data by any combination of dimensions, and supporting over 100 concurrent queries without impacting ingest rate or query latencies.
- Monitoring at eBay with Druid (Garadi Mohan), eBay Big Data Engineering, 29 May 2019.
- embedded-druid: Leveraging Druid Capabilities in Stand-alone Applications (Ruchir Shah), eBay Big Data Engineering, 5 Feb 2016.
- A glance of Pulsar and Druid (Xiaoming Zhang), eBay Big Data Engineering, 7 Dec 2015.
Expedia built real-time customer segmentation into the Expedia Data Platform using Druid as its core component. It helps business and data science teams identify customers using many different criteria.
- Fast Approximate Counting Using Druid and Datasketch (Elan Halfin, Aravind Sethurathnam), Expedia Group Technology [Medium], 10 Dec 2020.
One of our key goals at FacilityConneX is to offer real-time insights that help our customers optimize their equipment or processes, reduce cost, or prevent incidents, ultimately improving our customers' business. This real-time requirement has always been a major technical challenge to scale within our SaaS environment. FacilityConneX has looked to Druid to help solve many of these challenging performance and growth issues.
Druid powers our application-related metrics and helps us slice and dice analytics on both historical and real-time metrics. It significantly reduces the latency of analytic queries and helps people get insights more interactively. With data in Druid, we can now do anomaly detection as well.
Finin is India’s first ever consumer-facing neobank. They're using Apache Druid as a backend for analytics, helping the company with clickstream analytics, user activity and behaviour as well as application performance management.
At Flipkart, business and analytics teams need constant visibility into how the users are interacting with the platform across all channels – website, mobile apps and m-site. They are using Druid to power their real-time clickstream analytics reports that enable their business teams to make critical decisions based on current traffic patterns and plan automated as well as manual interventions for traffic shaping.
Flurry is a mobile app analytics platform that empowers product, development and growth experts to build better apps that users love. Mobile data hits the Flurry backend at a huge rate, updates statistics across hundreds of dimensions, and becomes queryable immediately. Flurry harnesses Druid to ingest data and serve queries at this massive rate.
- Druid @ Flurry (Eric Tschetter), Sift Science [YouTube], 16 Dec 2016.
- Enrich API Brings Higher Match Rates with Multi-Field Enrichment Capabilities (Ken Michie), Full Contact, 27 Jun 2019.
Fyber is an app monetization company, developing a next-generation monetization platform for app game developers. Fyber uses Apache Druid for customer data analysis.
- Why GameAnalytics migrated to Apache Druid (Ramón Lastres Guerrero), Imply blog, 14 Feb 2019.
Glia uses Druid to provide application usage insights to its clients.
Druid powers a dashboard used internally to visualize real-time analytics on GumGum's Real Time Bidding platform. GumGum runs Druid on Amazon EC2 and can ingest up to 300,000 events per second at peak time.
- Lambda Architecture with Druid at GumGum (Vaibhav Puranik), WhyNoSql [Wordpress], 6 Nov 2015.
- Optimized Real-time Analytics using Spark Streaming and Apache Druid (Jatinder Assi), GumGum Tech Blog [Medium], 2 Jun 2020.
Hawk is the first independent European platform to offer a transparent and technological advertising experience across all screens.
Hawk's customers harness the power of Imply Druid to access an all-in-one solution to follow their campaign KPIs throughout their lifecycle; build their business dashboards and visualize the evolution of their KPIs; set up tailored reports; and accelerate the Time-To-Market thanks to a simplified data pipeline with Druid directly connected to their Kafka cluster and RDS database.
- Data Revolution at Hawk Powered by Imply (Laura Manzanilla), Imply blog, 12 Oct 2021.
Hexaglobe uses Druid for network and CDN analytics, and gleans intelligence from video services metrics for engineering and marketing purposes.
- How Druid can Help Analyse Video Services Metrics for QoS & Marketing Purposes (Pierre-Alexandre Entraygues, Anthony Courcoux), Imply [YouTube], 10 Dec 2021.
Druid powers Hubvisor's real-time insights into its customers' programmatic advertising auctions.
At Hulu, we use Druid to power our analytics platform that enables us to interactively deep dive into the behaviors of our users and applications in real-time.
ICSOC Co., Ltd.
ICSOC Co., Ltd. is the leading call center service provider in China. In our big data department, traditional approaches could not meet our requirements; eventually we found that Druid, designed around a lambda architecture, fully satisfies them. It simplified developing statistics and real-time products and greatly sped up OLAP queries.
Ibotta is a free cash back rewards and payments app. The company has partnered with more than 1,500 brands and retailers to pay consumers cash for their purchases. Ibotta has paid out over $600 million in cash rewards to more than 35 million users since its founding in 2012.
Imply uses Druid to power public demos and to power our internal monitoring and metrics.
- Pivot: An Open Source Data Exploration UI for Druid (Vadim Ogievetsky), Imply blog, 26 Oct 2015.
- A Tour Through the "Big Data" Zoo (Fangjin Yang), Imply blog, 4 Nov 2015.
- Architecting Distributed Databases for Failure (Fangjin Yang), Imply blog, 10 Dec 2015.
- Building a Streaming Analytics Stack with Apache Kafka and Druid (Fangjin Yang), Confluent blog, 14 Jun 2016.
- Compressing Longs in Druid (David Li), Imply blog, 7 Dec 2016.
- Securing Druid (Jon Wei), Imply blog, 13 Sep 2018.
- Druid query view: An elegant SQL interface for a more civilized age (Margaret Brewster), Imply blog, 16 Oct 2019.
Inke is an online live-streaming company principally engaged in live streaming and technology. We use Druid for business intelligence (BI) analytics.
Inmobi is a mobile advertising and discovery platform. We use Druid mainly for internal real-time reporting and analysis. We also use Caravel backed by Druid, which allows users to build interactive dashboards. Apart from that, we use Druid as a datastore for faster ingestion of large amounts of data and to query this data at sub-second latencies.
Innowatts uses Druid to work quickly with massive data sets, and differentiates its platform by aiming to read meters and forecast usage on the fly.
- Community Spotlight: Innowatts provides AI-driven analytics for the power industry (Daniel Hernandez, Matt Sarrel), Imply blog, 30 Sep 2020.
At ININ we're using Druid within a Lambda architecture to drive cloud based call center analytics. Many of our realtime dashboards, downloadable reports, and public APIs utilize Druid on the backend.
At ironSource, we use Druid to power the real-time capabilities of our in-app bidding solution, which programmatically auctions off publisher ad impressions to a range of ad sources. Using Druid enables us to process and query billions of data points in real time.
- Performance Tuning of Druid Cluster at High Scale (Elad Eldor), Imply [YouTube], 10 Dec 2021.
Italiaonline uses Druid for Internet trends and analytics management inside its new Data-Driven Contents Management System. Italiaonline is the first Italian internet company, with the two most visited web portals, Libero and Virgilio, and the most used email service in the country, @libero.it. Italiaonline features 16.8 million unique users per month*, 4.8 billion impressions per month**, 10.2 million active email accounts** and a 58% active reach*.
* Source: Audiweb View, powered by Nielsen, TDA 2H 2015; ** Internal data, December 2015
Jolata leverages Druid as the analytics data store for its real-time network performance management platform. Ingesting over 35 billion events per day, Jolata calculates a billion metrics every minute to visualize precise network metrics in real time and enable operators to quickly drill down and perform root cause analysis.
- Realtime Analytics Powered by Druid (Kiran Patchigolla), Medium, 11 Aug 2015.
Thanks to Druid, Kering empowers its sales teams (more than 20,000 people worldwide) to quickly understand their sales performance, compare them to their objectives and identify new sales opportunities within their customer portfolio.
Kering was already providing data on demand via a search engine (Apache Solr), and the implementation of Apache Druid has enabled us to enrich the offer by:
- Adding access to real-time updated KPIs (based on streaming ingestion)
- Providing a sub-second response time
- Offering linear scalability that allows the service to be deployed to tens of thousands of end users
- Allowing massive data updates in a very short period of time (using Parquet files and MapReduce) without any impact on the quality of service
KT NexR is the leading provider of the data analytics platform. We use Druid as a real-time analysis tool to help our customers to analyze multidimensional data in interactive ways.
LDMobile is a mobile DSP for RTB. We use Druid to aggregate metrics in order to offer our customers a real-time dashboard showing the performance indicators of their campaigns.
Libra AI Technologies & PaperGo
EMA (Explainable Marketing Analytics) is a marketing analytics assistant powered by Druid. She helps marketers and ecommerce owners leverage users' behavioral data and make data-driven decisions. The assistant has been implemented on top of the PaperGo loyalty platform and LibraAI's proprietary machine learning innovations and marketing analytics.
LifeBuzz is a popular web property that serves tens of millions of pageviews per day. We use Druid for all our advanced analytics needs, including in-house analysis and professional realtime analytics for our sponsored media partners.
LiquidM uses Druid for real-time drill-down reporting. LiquidM is also contributing back to the community by creating and maintaining a ruby client library for interacting with Druid located at http://github.com/liquidm/ruby-druid.
Lohika uses Druid to run analytics over large amounts of event-based data. They are using real-time ingestion with Apache Kafka to be able to better respond to changes in advertising campaigns. They are managing their cluster with Kubernetes.
- Apache Druid: Interactive Analytics at Scale (Volodymyr Iordanov), Lohika website, 18 May 2021.
- Streaming SQL and Druid (Arup Malakar), SF Big Analytics [YouTube], 30 Aug 2018.
Lynx Analytics helps B2B clients unlock social, behavioral and experience insights on hundreds of millions of customers. We build interactive dashboards powered by Druid to make the data science results easy to digest. Our clients love the flexibility and being able to explore the data in detail. The speed at which they can do that is in a big part thanks to Druid!
- Data Insights Engine @ MakeMyTrip (Aditya Banerjee), MakeMyTrip [Medium], 4 Jan 2019.
MAKESENS uses Druid to store and analyze battery data.
Marchex uses Druid to provide data for Marchex Call Analytics' new customer facing Speech Analytics dashboards. Druid's fast aggregation is critical for providing deep insights into call performance for its customers, enabling them to spot trends and improve performance of both marketing campaigns and call centers.
In 2011, Metamarkets originally created Druid to power its SaaS analytics platform. The technology platform processes trillions of real-time transactions monthly, across thousands of servers. With Druid, Metamarkets is able to provide insight to its customers using complex ad-hoc queries at a 95th percentile query time of less than 1 second.
- Introducing Druid: Real-Time Analytics at a Billion Rows Per Second
- Scaling the Druid Data Store
- Metamarkets Open Sources Druid
- Druid Query Optimization with FIFO: Lessons from Our 5000-Core Cluster
- Distributing Data in Druid at Petabyte Scale
Druid is the Millersoft platform of choice for operational analytics. The seamless integration of real-time and historical data sets within Druid has been a real boon for our data processing clients. The flexibility of the API via REST, SQL, and native queries against a single data source also means that our dashboards, ad-hoc queries and pivot tables are all consistent. The ability to drill down in Druid to the transactions underpinning the aggregations also means that we can reconcile the results directly against operational systems. Druid destroys legacy database cubes at the end of long data integration pipelines.
Mindhouse streams data from Apache Kafka into Druid and runs dashboards as well as ad-hoc SQL queries to gain insights from user behavior and pinpoint potential issues in their meditation app.
Druid has helped Mobiauto ingest and analyze millions of real time events generated on our online car sales website. We use this data to generate real time analytics used by the commercial team, and create APIs for our backend team that can be used to insert real time data to the website that helps our clients make informed decisions.
Druid is a critical component in Monetate's personalization platform, where it acts as the serving layer of a lambda architecture. As such, Druid powers numerous real-time dashboards that provide marketers valuable insights into campaign performance and customer behavior.
- Gone Monetate : Personalizing Marketing at 100K events/second
- Druid : Vagrant Up (and Tranquility!)
- Kinesis -> Druid : Options Analysis (to Push? to Pull? to Firehose? to Nay Nay?)
At mParticle we have deployed Druid across our entire reporting stack. Druid has enabled us to ingest over 8.5 billion events daily while supporting reliably low-latency queries. We have also given our support team and customers even greater insight into their data by exposing (controlled) queries to Druid via Looker.
- Druid @ MZ (Pushkar Priyadarshi, Igor Yurinok, Bikrant Nepune), Imply [YouTube], 5 Mar 2018. https://youtu.be/zCk2BV9mQ0Y
N3TWORK uses Druid for real-time analysis of its Internet of Interests social entertainment network. It uses Druid analytics both to optimize user experiences and to guide the evolution of its product.
Netflix engineers use Druid to aggregate multiple data streams, ingesting up to two terabytes per hour, with the ability to query data as it is being ingested. They use Druid to pinpoint anomalies within their infrastructure, endpoint activity and content flow.
- Merlino, Gian; Herman, Matt; Pasari, Vivek; Jain, Samarth. "Druid @ Netflix". YouTube, uploaded by Netflix Data, 14 Nov 2018. https://youtu.be/Qvhqe4yUKpw
- Netflix & Amazon Case Study
- Announcing Suro: Backbone of Netflix's Data Pipeline
- How Netflix uses Druid for Real-time Insights to Ensure a High-Quality Experience
Netsil is an observability and analytics company for modern cloud applications. The Netsil Application Operations Center (AOC) uses Druid for real-time queries on sharded data, along with support for dynamic and multi-valued attributes. The AOC processes live service interactions in large-scale production applications and also stores massive amounts of historical metrics data. Druid was able to support these and several other AOC requirements, allowing the AOC to be scalable and fault-tolerant.
You can learn more about the AOC at http://netsil.com/download/
Nielsen (Nielsen Marketing Cloud)
Nielsen Marketing Cloud uses Druid as its core real-time analytics tool to help its clients monitor, test, and improve their audience targeting capabilities. With Druid, Nielsen provides its clients with in-depth consumer insights leveraging world-class Nielsen audience data.
- How Nielsen Marketing Cloud Uses Druid for Audience and Marketing Performance Analysis
- Counting Unique Users in Real-Time: Here’s a Challenge for You! (Yakir Buskilla, Itai Yaffe), YouTube, uploaded by DataWorks Summit, 1 Apr 2019.
- Buskilla, Yakir; Yaffe, Itai. "Counting Unique Users in Real-Time: Here’s a Challenge for You!" SlideShare, uploaded by DataWorks Summit, 1 Apr 2019. https://www.slideshare.net/Hadoop_Summit/counting-unique-users-in-realtime-heres-a-challenge-for-you-139142580
- Data Retention and Deletion in Apache Druid
Nodex uses Druid in its real-time analytics pipeline to deliver insights and analytics across a wide range of recruitment CRM software, job boards, and recruitment career portals. We ingest large amounts of data and needed something capable of real-time metrics to offer KPIs to our clients, allowing them to make better business decisions based on actual usage data.
At Noon – The Social Learning Platform, on a daily basis we process close to 100M audio and sketch samples from more than 80K students to help measure the voice and sketch quality of our online classrooms. We built a real time analytics platform on Apache Druid and Apache Flink to provide realtime feedback on classroom quality & engagement metrics.
OCEAN-IT is a company established in South Korea in 2008 that builds and operates systems and databases in the public IT sector. We received a service request from Korea Electric Power Corporation (KEPCO) and have been performing system and data purification work for it. In particular, at KEPCO's request, OCEAN-IT collects real-time sensor data from boilers, turbines, and generators at four power plants in Korea as raw data, and refines it for real-time querying and analysis. Raw data is collected at 40,000 records per second, estimated at 1.15 billion records per day, and is searchable with a response time of 0.1 seconds.
Ona https://ona.io is a software engineering and design firm based in Nairobi, Kenya and Washington, DC. Our mission is to improve the effectiveness of humanitarian and development aid by empowering organizations worldwide with collective and actionable intelligence. We use Druid to power dashboards and disseminate global health and service delivery data pulled from diverse sources.
OneAPM http://oneapm.com is an IT service company focusing on Application Performance Management (APM). At OneAPM, Druid is used to power clients' interactive queries on performance data collected from their applications in real time.
Oppo is one of the world's largest mobile phone manufacturers. It uses Druid for real-time data analysis.
Optimizely uses Druid to power the results dashboard for Optimizely Personalization. Druid enables Optimizely to provide our customers with in-depth, customizable metrics in real time, allowing them to monitor, test and improve their Personalization campaigns with greater ease and flexibility than before.
The Druid production deployment at PayPal processes a very large volume of data and is used for internal exploratory analytics by business analytic teams. Here is what they have to say:
Around early February 2014, the PayPal Tracking Platform team, led by Suresh Kumar, stumbled upon an article about a new kid on the block in the real-time analytics world. At first glance it seemed like just another cool-looking new technology. But after reading a little deeper into the papers it referenced and a few blogs, it was clear that it was different. The fundamental approach to querying the data itself looked very different and refreshing.
Coincidentally, at the same time, the team was struggling to build a very high volume real-time data query system. Over at least the preceding year we had already explored Drill, Hive, Cassandra, TSDB, Shark, etc., and none of these technologies met our low-latency needs for very high volumes of data.
So we started a Druid prototype as an option, and within a couple of weeks it was looking like a very promising alternative. Very soon, with great help from the core Druid development team, our prototype was doing great.
We then ran the prototype against a large dataset of 7-10 billion records and measured query response times. The results were quite amazing.
Today our Druid implementation at PayPal processes a very large volume of data and is used for our internal exploratory analytics by business analytic teams.
What we liked most was the amazing support provided by the core Druid team. I have never seen an open source community provide such a high level of responsiveness for ANY issue related to Druid setup and tuning.
PayU is a payment service provider. We have several services that publish our transactional data into Kafka. We transform the data using Kafka Streams and write it, flattened and enriched, to a different topic. The Kafka indexing service then reads the data from that topic and writes it to Druid. We provide dashboards to our merchants that refresh in real time.
Play Games24x7 offers multiple casual as well as real-money cash games to its users. To maintain a healthy environment for their players, they constantly look out for any behavior that could lead to a win by unethical means. They use Druid to identify users who exhibit these patterns and mark them as fraudulent.
The Pollfish real-time analytics platform enables researchers to analyze survey data from over half-a-billion consumers across the world, informing advertising, branding, market research, content, and product strategies. Druid is used alongside Apache Kafka, Spark, Flink, Akka Streams, Finatra / Finagle microservices, Cassandra, PostgreSQL, Hive, and Scruid – an open source Scala library for Apache Druid.
Poshmark uses Druid for reporting, for monitoring key business metrics, and for real-time data exploration.
Apache Superset was built by Maxime Beauchemin when he was at Airbnb, to visualize data from Apache Druid. Druid enables highly interactive and responsive Superset dashboards, massively shortening the time from question to answer.
PubNative uses Druid for its real-time reports and analysis of millions of daily ad views, clicks, conversions and other events.
Quantiply uses Druid for the feature learning and governance layer of its Compliance and Regulatory (CoRe) platform.
Raygun is a full-stack software intelligence platform that monitors your applications for errors, crashes, and performance issues. They use Druid to run complex queries over large data sets. Having seen similar stories from others, including those who invested in other technologies such as the Hadoop ecosystem or key-value stores, Raygun began researching more purpose-built analytics databases. They settled on Druid because it offers several properties one wants in an analytics database.
We have successfully deployed Druid at Razorpay for our use cases and see continued growth in its footprint. We were able to achieve p90, p95 values of less than 5s and 10s respectively and our dashboard performances have improved by at least 10x when compared to Presto.
redBorder is an open source, scale-out cybersecurity analytics platform based on Druid. We hope its full-blown web interface, dashboard and report systems, and ready-to-use real-time pipeline encourage other Druid users to build a strong community around it. To see more, please visit redborder.org
Druid is a critical component in our advertising infrastructure, where it serves as the backend for our external reporting dashboards.
Retargetly is a Data Management Platform that enables publishers and advertisers to manage their first-party user data, mix it with second- and third-party data from other providers, and activate it in advertising campaigns (direct, programmatic, etc.). Druid enables us to show real-time audience insights. It also provides a lot of flexibility for low-latency ad-hoc queries. We provide default graphs and metrics to our clients, but they can also run their own interactive queries in real time.
Rill Data uses Druid to power its truly elastic, fully managed cloud service. Rill uses Druid to deliver operational intelligence to business stakeholders with zero DevOps overhead. Rill's team operates the first and longest continuously running Druid service.
We introduced Druid to our technology stack while developing our new supply chain & logistics visibility and intelligence solution. Combining Druid with other data storage, streaming & processing platforms & solutions for the IoT sensor data and 3rd party data helped us design an effective solution for our customers worldwide.
We use Druid as the low-latency backend for pivot & time-series dashboards. We typically feed data to Druid in daily batches. Data is used at Rovio to continually improve our games and provide incredible experiences for the millions of players who play our games every day.
Rubicon Project is the world’s largest independent sell-side advertising platform that helps premium websites and mobile apps sell ads easily and safely. Rubicon Project’s flagship reporting platform, Performance Analytics, is built on Apache Druid and is used by thousands of companies across the globe. Performance Analytics processed more than two trillion events per day during Black Friday. Druid delivers analytics from this data with average response times under 600 ms, helping publishers maximize the yield of their digital ad inventory across the open internet and building on our commitment to trust and transparency in the programmatic marketplace.
Sage + Archer
We are using Druid as our single source of truth for both realtime statistics and data analysis. Our clients can see and filter detailed metrics in our self-service DSP. The DSP is used for mobile and digital-out-of-home advertising campaigns. Druid is powering both the front-end and the optimization algorithms within the system.
Salesforce's Edge Intelligence team uses Apache Druid as a real-time analytical database to store application performance metrics extracted from log lines. Our customers within Salesforce, including engineers, product owners, customer service representatives, etc., use our service to define the way they want to ingest and query their data and obtain insights such as performance analysis, trend analysis, release comparison, issue triage, and troubleshooting. We chose Druid because it gives us the flexibility to define pre-aggregations, the ability to easily manage ingestion tasks, the ability to query data effectively, and the means to create a highly scalable architecture.
Scalac migrated to Apache Druid and Apache Kafka (from Apache Cassandra and Apache Spark) for their Blockchain Observability application. The analytical dashboard they created is used to visualize the data from different blockchains. It is built on React with Hooks, Redux, Saga on the front side – and Node.js, Apache Druid, and Apache Kafka on the back-end.
- How We Reduced Costs and Simplified Solution by Using Apache Druid (Adrian Juszczak), Scalac website, 4 Feb 2020.
At Shopee, we provide a sub-second OLAP engine service that supports a vast array of business needs, including user behavior analysis, product recommendation, sales data analysis, brand analysis, network performance analysis, core metrics analysis, advertising revenue analysis, application trajectory analysis, cross-border e-commerce analysis, content recommendation, and more. At the same time, by upgrading to a cloud-native architecture, evolving the Druid core, and refining the surrounding ecosystem, we continuously enhance the service's stability, performance, security, and usability, while also reducing labor and machine costs.
- Apache Druid in Shopee (in Chinese). Published on 2022-01-13 by yuanlihan, a contributor of Apache Druid.
- Apache Druid cloud native architecture evolution (in Chinese). Published on 2022-08-18 by Benedict Jin, a Committer and PMC Member of Apache Druid.
Shopify uses Druid, along with Apache Flink and Airflow, to create an interactive analytics platform for realtime sales data with enormous volume and variety.
Sift Science provides an online trust platform that online businesses use to prevent fraud and abuse. We use Druid as a tool to gain real-time insights about our data and machine learning models.
SigNoz is an open source observability platform. SigNoz uses distributed tracing to gain visibility into your systems, and its data pipeline is built on Kafka (to handle high ingestion rates and backpressure) and Apache Druid (a high-performance real-time analytics database), both proven in industry to handle scale.
Druid powers aggregations after slicing and dicing of high-dimensional trace data.
Weibo Advertising Platform deploys Druid as a real-time data tool for online advertising analytics and business intelligence (BI). Druid processes TBs of real-time data per day with latency within one minute.
The Weibo UVE (Unified Value Evaluation) team of the Advertising Platform uses Druid as the real-time analysis tool of its data insight system, which processes billions of events every day.
Druid is the primary data store used for ad-hoc analytics in Singular, enabling our customers to generate insights based on real-time and historical data.
SK Telecom is the leading telecommunications and platform solution company. Druid enables us to discover business insights interactively from telecommunications and manufacturing big data.
- Apache Druid on Kubernetes: Elastic scalable cloud-based system for real-time OLAP on high velocity data
Skimlinks is the leading commerce content monetization platform. Its technology automatically monetizes product links in commerce-related content.
Skyport Systems provides zero-effort, low-touch secure servers that help organizations to rapidly deploy and compartmentalize security-critical workloads. We use Druid as part of our analytics backend to provide real-time insight to our customers about their workload behavior.
Smart Bid is a unique marketing solution platform empowering advertising teams: a one-stop shop that takes advantage of proprietary technology to analyze and reach the right audience with the right creative at the right time. We use Druid, together with our machine learning algorithms, to gain real-time insights into real-time bidding.
Smyte provides an API and UI for detecting and blocking bad actors on the internet. Druid powers the analytics portion of our user interface providing insight into what users are doing on the website, and specifically which features are unique between different sets of users.
Societe Generale, one of Europe's leading financial services groups and a major player in the economy for over 150 years, supports 29 million clients every day with 138,000 staff in 62 countries.
Within the Societe Generale IT department, Apache Druid is used as a time series database to store performance metrics generated in real time by thousands of servers, databases, and middleware components. This data is stored in multiple Druid clusters in multiple regions (840+ vCPUs, 7,000+ GB of RAM, 300+ billion events) and is used for many purposes, such as dashboarding and predictive maintenance use cases.
We went through the journey of deploying Apache Druid clusters on Kubernetes and created a druid-operator. We use this operator to deploy Druid clusters at Splunk.
Streamlyzer uses Druid to power next-generation online video analytics for online video companies and publishers. Streamlyzer gathers information from our customers' real end users and provides visualized real-time analytics in dashboards showing how video content is delivered and how end users experience the streaming service.
Sugo is a company that focuses on real-time multidimensional analytics and mining on big data. We built our platform on Druid and developed our own extensions to make it more powerful.
SuperAwesome’s mission is to make the internet safer for kids. At the core of SuperAwesome’s analytics is Apache Druid, which helps us relay key insights and data points back to our stakeholders and customers, as well as use this data to power our products themselves. This all happens in a kid-safe way and enables us to deliver the best level of service to billions of children and teens every month.
- How we use Apache Druid’s real-time analytics to power kidtech at SuperAwesome
- Virtual Apache Druid Meetup featuring SuperAwesome
Sweet Couch was a place to discover unique products that are buyable online. Druid powered Sweet Couch Harvest, an open analytics platform for tracking the performance of online shops based in India. All end-user events were tracked and analyzed using Druid for business insights.
We are providing machine-learning-driven anomaly detection services to our operations teams, using Druid to cover all our storage and query needs. These anomalies currently help engineers during maintenance windows to correct for possible outages before they even happen. Our dataset includes over 3,000 core-network devices and over double that amount in wireless transmission equipment. We ingest and process over 25 million records every minute, but we’re just getting started with onboarding our platforms and services on Druid.
TalkingData is China’s largest independent Big Data service platform. TalkingData uses Druid with Atomcube, an extension for enhancement, to power analysis of online application and advertising data.
Apache Druid’s speed and flexibility allow us to provide interactive analytics to front-line, edge-of-business consumers to address hundreds of unique use-cases across several business units.
TelemetryDeck helps app and web developers improve their product by supplying immediate, accurate usage data while users use their app. And the best part: It's all anonymized so users' data stays private!
Tencent's SCRM product uses Druid for customer behavior analysis.
Time Warner Cable
TWC uses Druid for exploratory analytics.
TrafficGuard detects, mitigates and reports on ad fraud before it hits digital advertising budgets. Three formidable layers of protection block both general invalid traffic (GIVT) and sophisticated invalid traffic (SIVT) to ensure that digital advertising results in legitimate advertising engagement. With visibility of traffic across thousands of campaigns, TrafficGuard’s machine learning can identify emerging patterns, trends and indicators of fraud quickly and reliably.
Druid is a key component of our Big Data Operation Insight Platform. We use imply.io. Learn more about TrafficGuard’s comprehensive fraud mitigation at https://www.trafficguard.ai
Trendyol.com - Alibaba Group Company
Trendyol, which is the largest e-commerce company in Turkey, uses Druid for real-time analytics. It is mostly used for providing insights to their suppliers.
TripleLift uses Druid to provide insights into performance aspects of its native programmatic exchange for sales/business development opportunities, and to provide reporting used by advertisers and publishers.
TripStack uses Druid to help gain business insight into the tens of billions of flight combinations that TripStack processes every day.
Triton Digital is the global technology and services leader to the audio streaming and podcast industry. Operating in more than 80 countries, Triton Digital provides innovative technology that enables broadcasters, podcasters, and online music services to build their audience, maximize their revenue, and streamline their operations.
Triton Digital uses Druid to power their programmatic analytics (Rill Data, Oct 2021).
TrueCar is a leading automotive digital marketplace that enables car buyers to connect to our nationwide network of Certified Dealers. TrueCar uses Druid and Imply Cloud to make their dashboards real-time and detect anomalies while minimizing engineering and operational overhead. Druid enables TrueCar to unlock insights from digital interaction data, further empowering their data scientists and product teams to improve services with increased agility and deliver a higher-quality experience.
Trustpilot used Apache Druid to create an interactive application based on D3 that their entire company uses to put real-time numbers behind every business decision. They replaced siloed applications built with Looker, Amplitude, and others with Druid to better respond to requests and provide more functionality to their end users.
In order to continue empowering decision making as Twitch scaled, we turned to Druid and Imply to provide self-service analytics to both our technical and non-technical staff, allowing them to drill into high-level metrics in lieu of reading generated reports.
Unity's monetization business generates billions of in-game events in a multi-sided marketplace, which creates complexity, slowness, and overhead for reporting. To work around these issues, Unity deploys a Kafka, Spark, and Druid-based ingestion and aggregation pipeline.
Verizon’s network analytics platform leverages Druid as a real-time analytics engine to enable interactive analytics and performance metrics, support use cases like traffic capacity management using Netflow and network statistics, and provide a feature store for machine learning and service key performance indicators to monitor and quantify the health of Verizon’s global networks. We chose Druid because it enables us to achieve our mission with sub-second latency on large datasets.
At VideoAmp, Druid is a key component of our Big Data stack. It powers our real-time video advertising analytics at low granularity and huge scale. Druid has helped us minimize the time between event, insight, and action.
Vigiglobe turns the noise of Social Media into real-time Smart Content. To this end, Druid enables us to maintain high request throughput coupled with huge data absorption capacity.
ViralGains uses Druid for real-time analysis of millions of viral video views, shares, and conversations.
Druid powers Virool’s real-time analytics of over 1 billion raw events per day. We query this data to gain a deep understanding of all of our inventory sources, from exchanges to direct partners; everything is available with lightning-fast query times. Druid puts the power and flexibility of big data in each Viroolian’s hands.
Vserv is the leading authentic data platform for mobile marketing in India. We have successfully implemented Druid for analysis of digital marketing campaign data.
WeQ is a leading mobile performance and branding ad-tech company headquartered in Berlin. Druid is one of the core services that helps WeQ drive user acquisition and engagement to deliver mobile performance and branding campaigns.
We are using Druid for real-time analytics that deliver business insights and power our anti-fraud tools. We are ingesting several billion events per day using the Kafka real-time connector. Druid is a rock-solid and flexible foundation for our data system, delivering blazing-fast analytics.
WhyLabs is on a mission to build the interface between humans and AI applications. Our Data and AI Observability platform enables teams to monitor all of their ML models, regardless of scale, with zero configuration necessary. Druid allows WhyLabs to combine stream and batch processing to interactively explore and visualize statistically profiled datasets.
- Migrating Our ML Insights Platform To Druid (Drew Dahlke), Imply [YouTube], 10 Dec 2021.
We're serving pageview data via Druid and Pivot. Our internal customers are loving it, and we're working on allowing public access to sanitized data, both editing and pageview. We like Druid because it's open source, the folks who work on it have built a good community, and it's about five times faster than Hive for us right now without any tuning or optimization (and the Hadoop cluster is beefier than the Druid one). We just dumped lots of data into it, and Pivot was immediately useful to our analysts. We wrote a Puppet module that others might find helpful.
Wipro Limited is an Indian multinational corporation that provides information technology, consulting and business process services. Wipro is using Druid to track and monitor the performance of internal applications and to gain insights from data in real-time.
Wiremind develops solutions for optimizing sold capacity (passenger transportation, airfreight and event/sport ticketing) that blend UX, software and data science.
We process billions of rows of data to serve our machine learning models and ingest tens of millions of rows through batch ingestion each day. We chose Druid because it gives us the ability to query data effectively, the capacity to manage ingestion tasks easily, and the capability to split data for our diverse clients, all within a highly scalable architecture for our growth.
Xiaomi uses Druid as an analytics tool to analyze online advertising data.
Yahoo uses Druid to power various customer-facing audience and advertising analytics products.
- Druid Ecosystem at Yahoo
- Complementing Hadoop at Yahoo: Interactive Analytics with Druid
- Combining Druid and DataSketches for Real-time, Robust Behavioral Analytics
YeahMobi uses Druid to power a dashboard used for ad-tech analytics such as impression and conversion tracking, unique IP statistics, and aggregating metrics such as costs and revenues.
Yieldr uses Druid to power real-time web and mobile analytics for airlines across the globe.
Youku Tudou employs Druid for real-time advertising analysis of huge volumes of data.
China Youzan is a SaaS company principally engaged in retail science and technology. We use Druid for business intelligence (BI) analytics and application performance management (APM) metrics.
Zapr is leveraging Druid to analyze TV viewership data and powering analytical dashboards to report real time user behavior.
Zhihu is a Chinese question-and-answer website. At Zhihu, Druid is used to power clients' interactive queries, data reports, A/B testing, and performance monitoring. Almost 1T of data is ingested into our Druid cluster per day, and we depend heavily on the thetaSketch aggregator for computing cardinality and retention; we look forward to further improvements in DataSketches.
Zilingo's data collection infrastructure, processing pipeline, and analytics stack are based on Druid. Data is collected from various IoT devices and sensors, mobile and tablet devices, and third-party sources, and is streamed in near real time. This gives our customers a view of the supply chain with thousands of data points via dashboards, reports, and the ability to slice and dice data.
Zuoyebang is the most used K12 education platform: 7 out of every 10 K12 users use Zuoyebang. At the Zuoyebang Data Platform Group, we use Druid in advertising scenarios, mainly for ad display, click, billing, and related functions. Druid's performance and timeliness meet our OLAP query needs very well.