Druid stores data in segments. Each segment is a single file, typically comprising up to a few million rows of data. Because each segment carries some memory and processing overhead, it can be beneficial to reduce the total number of segments. This tutorial demonstrates how to compact existing segments into fewer but larger segments using a Druid compaction task.
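As a rough illustration of what such a task looks like, here is a minimal sketch that builds a compaction task spec as a Python dict. The dataSource name, the interval, and the row limit are hypothetical values chosen for the example, and the field layout follows recent Druid releases (older releases use a different shape), so check it against the documentation for your version.

```python
import json


def build_compaction_task(data_source, interval, max_rows_per_segment=5_000_000):
    """Return a compaction task spec as a plain dict (assumed layout, recent Druid)."""
    return {
        "type": "compact",
        "dataSource": data_source,
        # Which existing segments to compact, selected by time interval.
        "ioConfig": {
            "type": "compact",
            "inputSpec": {"type": "interval", "interval": interval},
        },
        # A larger row limit per segment means fewer, bigger segments.
        "tuningConfig": {
            "type": "index_parallel",
            "partitionsSpec": {
                "type": "dynamic",
                "maxRowsPerSegment": max_rows_per_segment,
            },
        },
    }


# Hypothetical dataSource and interval, for illustration only.
compaction_task = build_compaction_task("compaction-tutorial", "2015-09-12/2015-09-13")
print(json.dumps(compaction_task, indent=2))
```

The printed JSON can be saved to a file and submitted to the Overlord task endpoint (`/druid/indexer/v1/task`) like any other task.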
Once you have ingested data into a dataSource for an interval and created Druid segments, you might want to make changes to that data. For example, if you want to add or remove columns from your existing segments, or change their rollup granularity, you have to reindex your data. The Kafka Indexing Service may also produce a large number of segments, depending on the topic partition and granularity configurations, and reindexing is one way to reduce that number. All of this can be done by reindexing the data using Hadoop batch ingestion or native batch ingestion. In this article, I will demonstrate how to reindex data in Druid using native batch ingestion.
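To make the idea concrete, here is a minimal sketch of a native batch reindexing spec built in Python. It assumes a recent Druid release in which existing segments are read through the `druid` input source; the dataSource names, interval, dimensions, and metric are all hypothetical, so treat this as a starting point rather than the exact spec used in the article.

```python
import json


def build_reindex_task(source, target, interval, dimensions, query_granularity):
    """Return a native batch task that re-reads existing segments (assumed layout)."""
    return {
        "type": "index_parallel",
        "spec": {
            "dataSchema": {
                "dataSource": target,
                # Rows read back from Druid carry their timestamp in __time.
                "timestampSpec": {"column": "__time", "format": "millis"},
                # Listing only some dimensions drops the other columns.
                "dimensionsSpec": {"dimensions": dimensions},
                "metricsSpec": [{"type": "count", "name": "count"}],
                # A coarser queryGranularity changes the rollup granularity.
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "DAY",
                    "queryGranularity": query_granularity,
                    "rollup": True,
                    "intervals": [interval],
                },
            },
            "ioConfig": {
                "type": "index_parallel",
                "inputSource": {
                    "type": "druid",
                    "dataSource": source,
                    "interval": interval,
                },
            },
            "tuningConfig": {"type": "index_parallel"},
        },
    }


# Hypothetical names and interval, for illustration only.
reindex_task = build_reindex_task(
    source="events",
    target="events_hourly",
    interval="2019-01-01/2019-01-02",
    dimensions=["country", "device"],
    query_granularity="HOUR",
)
print(json.dumps(reindex_task, indent=2))
```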
One of the most popular trends in the data world is stream analytics. Organizations are increasingly striving to build solutions that provide immediate access to key business intelligence insights through real-time data exploration. Using Apache Kafka and Druid, we can easily build an analytics stack that enables immediate exploration and visualization of event data. This tutorial demonstrates how to load data streams from a Kafka topic into Druid using the Druid Kafka indexing service.
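The Kafka indexing service is driven by a supervisor spec. Below is a minimal sketch of one, built in Python, assuming a recent Druid release, JSON messages, and a broker on `localhost:9092`. The topic name, dataSource name, timestamp column, and dimensions are hypothetical and would need to match your own events.

```python
import json


def build_kafka_supervisor(topic, data_source, bootstrap_servers, dimensions):
    """Return a Kafka supervisor spec as a plain dict (assumed layout, recent Druid)."""
    return {
        "type": "kafka",
        "spec": {
            "dataSchema": {
                "dataSource": data_source,
                # Assumes each event has a field named "timestamp".
                "timestampSpec": {"column": "timestamp", "format": "auto"},
                "dimensionsSpec": {"dimensions": dimensions},
                "granularitySpec": {
                    "type": "uniform",
                    "segmentGranularity": "HOUR",
                    "queryGranularity": "NONE",
                    "rollup": False,
                },
            },
            "ioConfig": {
                "topic": topic,
                "inputFormat": {"type": "json"},
                "consumerProperties": {"bootstrap.servers": bootstrap_servers},
                # Start from the earliest offset when no offsets are stored yet.
                "useEarliestOffset": True,
                "taskCount": 1,
                "replicas": 1,
                "taskDuration": "PT1H",
            },
            "tuningConfig": {"type": "kafka"},
        },
    }


# Hypothetical topic, dataSource, and dimensions, for illustration only.
supervisor_spec = build_kafka_supervisor(
    topic="clickstream",
    data_source="clickstream",
    bootstrap_servers="localhost:9092",
    dimensions=["user", "page", "country"],
)
print(json.dumps(supervisor_spec, indent=2))
```

Unlike batch tasks, a supervisor spec is submitted to the Overlord supervisor endpoint (`/druid/indexer/v1/supervisor`), and the supervisor then manages the indexing tasks on its own.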
In this article, I am going to demonstrate how to load data into Druid from a TSV file using Druid’s native batch ingestion with the TSV ParseSpec. I assume you already have a good understanding of the Druid architecture and have Druid installed and running. If not, see my previous post to quickly install and run Druid. TSV ParseSpec TSV ParseSpec […]
In this article, I am going to demonstrate how to load data into Druid from a CSV file using Druid’s native batch ingestion with the CSV ParseSpec. I assume you already have a good understanding of the Druid architecture and have Druid installed and running. If not, see my previous post to quickly install and run Druid using […]
Druid supports “multi-value” string dimensions. These are generated when an input field contains an array of values instead of a single value. topN and groupBy queries can group on multi-value dimensions. When grouping on a multi-value dimension, all values from matching rows will be used to generate one group per value. It’s possible for a query to return […]