In earlier versions of Druid, the S3 extension for deep storage used the jets3t library under the hood. Newer versions moved from jets3t to the native AWS library. In this post, I will discuss how to configure a newer version of Apache Druid to use Minio as deep storage through the new S3 extension.
Druid relies on a distributed filesystem or binary object store for data storage. The most commonly used deep storage implementations are S3 (popular for those on AWS) and HDFS (popular if you already have a Hadoop deployment). Here I will show you how to configure non-Amazon S3 deep storage for a Druid cluster, using Minio as the S3-compatible store.
Minio is a high-performance distributed object storage server, designed for large-scale private cloud infrastructure. The Amazon S3 API is the de facto standard for object storage, and Minio implements the Amazon S3 v2/v4 API. It is best suited for storing unstructured data such as photos, videos, log files, backups, and container/VM images. An object can range in size from a few KB to a maximum of 5 TB.
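Because Minio speaks the S3 API, pointing Druid at it mostly comes down to overriding the endpoint that the S3 extension talks to. As a rough sketch of what this looks like in `common.runtime.properties`: the property names below are the ones I understand the `druid-s3-extensions` module to use, so check them against the documentation for your Druid version. The bucket name, base key, credentials, and the `http://localhost:9000` endpoint are placeholders I made up for illustration, not values from a real setup.

```properties
# common.runtime.properties (sketch; every value here is a placeholder)

# Load the S3 extension, alongside any other extensions your cluster needs.
druid.extensions.loadList=["druid-s3-extensions"]

# Use S3-style deep storage for segments.
druid.storage.type=s3
druid.storage.bucket=druid-deep-storage
druid.storage.baseKey=segments

# Credentials for the Minio server (assumed values, replace with your own).
druid.s3.accessKey=YOUR_MINIO_ACCESS_KEY
druid.s3.secretKey=YOUR_MINIO_SECRET_KEY

# Point the AWS client at Minio instead of Amazon S3.
druid.s3.endpoint.url=http://localhost:9000
# Minio is typically addressed as http://host:port/bucket rather than
# through bucket subdomains, hence path-style access.
druid.s3.enablePathStyleAccess=true
```

The endpoint override and path-style access are the two settings that distinguish this from a plain Amazon S3 configuration. The rest is the same as it would be on AWS.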