Druid supports “multi-value” string dimensions. These are generated when an input field contains an array of values instead of a single value. topN and groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, all values from matching rows will be used to generate one group per value. It’s possible for a query to return […]
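As a sketch (the datasource and dimension names here are hypothetical), a native groupBy query over a multi-value dimension such as `tags` looks like any other groupBy query:

```json
{
  "queryType": "groupBy",
  "dataSource": "events",
  "intervals": ["2021-01-01/2021-01-02"],
  "granularity": "all",
  "dimensions": ["tags"],
  "aggregations": [{ "type": "count", "name": "count" }]
}
```

A row whose `tags` value is `["a", "b"]` contributes to both the `"a"` group and the `"b"` group, which is why the per-group counts can sum to more than the total row count.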
In my previous article, I demonstrated how to perform a batch file load using Druid’s native batch ingestion. There, I only showed the handling of root-level elements of JSON and intentionally skipped the nested elements. That’s because nested JSON needs special handling for ingestion into Druid; it needs to […]
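As a preview, nested JSON fields can be pulled up to the root level with a `flattenSpec` inside the parse spec of the ingestion spec (the field name and JSONPath below are hypothetical):

```json
"parseSpec": {
  "format": "json",
  "flattenSpec": {
    "useFieldDiscovery": true,
    "fields": [
      { "type": "path", "name": "userCity", "expr": "$.user.address.city" }
    ]
  }
}
```

With `useFieldDiscovery` enabled, root-level fields are still picked up automatically, and each `path` entry adds one flattened column.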
In the case of time-series event data in a relational database, stored one event per row, if we need to calculate the number of events per hour, we’d select all rows within an overall interval, group those rows by hour, and count the rows in each hour group. If we have to perform this query many […]
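The per-hour count described above might look like the following in SQL (the table and column names are hypothetical, and `date_trunc` is PostgreSQL syntax):

```sql
SELECT date_trunc('hour', event_time) AS event_hour,
       COUNT(*) AS event_count
FROM events
WHERE event_time >= '2021-06-01'
  AND event_time <  '2021-06-02'
GROUP BY event_hour
ORDER BY event_hour;
```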
In my last few posts, I have discussed Druid cluster setup. ZooKeeper is required by Druid as an external dependency. In my upcoming posts, I will discuss Apache Kafka, which also requires ZooKeeper as a dependency.
ZooKeeper is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services.
For better reliability and high availability of the ZooKeeper service, we should set it up in cluster mode. In this post, I will discuss how to set up a ZooKeeper cluster with 3 nodes.
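A minimal three-node ensemble configuration might look like this (hostnames and paths are placeholders): the same `zoo.cfg` goes on all three nodes, and each node additionally gets a unique id written to `dataDir/myid`.

```properties
# conf/zoo.cfg — identical on all three nodes
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/lib/zookeeper
clientPort=2181
server.1=zk-node-1:2888:3888
server.2=zk-node-2:2888:3888
server.3=zk-node-3:2888:3888
```

For example, on node 1 the `myid` file would contain just `1`, on node 2 just `2`, and so on; the first port (2888) is for follower connections to the leader and the second (3888) for leader election.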
Druid relies on a distributed filesystem or binary object store for data storage. The most commonly used deep storage implementations are S3 (popular for those on AWS) and HDFS (popular if you already have a Hadoop deployment). In this post, I will show you how to configure non-Amazon S3 deep storage for a Druid cluster. For this, I will use Zenko CloudServer (formerly S3 Server) as the S3 deep storage.
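A sketch of the relevant `common.runtime.properties` settings (the bucket name and endpoint are placeholders, `accessKey1`/`verySecretKey1` are CloudServer’s default development credentials, and exact property names can vary between Druid versions):

```properties
druid.extensions.loadList=["druid-s3-extensions"]
druid.storage.type=s3
druid.storage.bucket=druid-deep-storage
druid.storage.baseKey=druid/segments
druid.s3.accessKey=accessKey1
druid.s3.secretKey=verySecretKey1
# point the S3 client at the local CloudServer instead of AWS
druid.s3.endpoint.url=http://127.0.0.1:8000
druid.s3.enablePathStyleAccess=true
```

Path-style access is typically needed for S3-compatible stores that don’t resolve bucket names as DNS subdomains.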
CloudServer (formerly known as Scality S3 Server) is an Amazon S3-compatible object storage server that lets you build and integrate S3-based applications faster and store your data anywhere. It is an open-source object storage project that enables on-premises S3-based application development and gives you a choice of where to deploy your data.
The Metadata Storage is an external dependency of Druid. Druid uses it to store various metadata about the system, but not to store the actual data. There are a number of tables used for various purposes.
Derby is the default metadata store for Druid; however, it is not suitable for production. MySQL and PostgreSQL are more production-suitable metadata stores, and Druid provides extensions to support both. But there may be cases where you already have Microsoft SQL Server installed in your local or production environment, and deploying an additional MySQL/PostgreSQL cluster is not feasible when you could use your existing SQL Server as the metadata storage. Druid has a wonderful community, and they have provided an extension to support this out of the box. In this post, I will show you how to configure Druid to use SQL Server as the metadata storage.
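As a sketch (host, database name, and credentials are placeholders), the community `sqlserver-metadata-storage` extension is enabled and pointed at SQL Server through `common.runtime.properties`:

```properties
druid.extensions.loadList=["sqlserver-metadata-storage"]
druid.metadata.storage.type=sqlserver
druid.metadata.storage.connector.connectURI=jdbc:sqlserver://localhost:1433;databaseName=druid
druid.metadata.storage.connector.user=druid
druid.metadata.storage.connector.password=druid
```

The Microsoft JDBC driver jar must also be placed on Druid’s classpath, since community extensions do not bundle proprietary drivers.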