by Anurag Srivastava, Aug 29, 2018, 7:15:06 PM | 4 minutes

Bucket Aggregation in Elasticsearch

Bucket aggregation is like the GROUP BY clause of an RDBMS query, where we group the result by a certain field. In Elasticsearch, we bucket documents on the basis of certain criteria. Metrics aggregations calculate a metric over a field, whereas bucket aggregations perform no calculation: they only create buckets of documents that match the given criteria. Bucket aggregations can also contain sub-aggregations.
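To make the point about sub-aggregations concrete, here is a minimal sketch of a request body that nests a metric inside a bucket aggregation. It is written as a Python dict so it can be serialized and sent as the body of a _search request. The helper name build_terms_with_avg and the aggregation names by_category and avg_views are my own illustrative choices; category_name and views are the fields used in the examples later in this post.

```python
import json


def build_terms_with_avg(bucket_field, metric_field, size=5):
    """Build a request body: a terms bucket with an avg sub-aggregation."""
    return {
        "size": 0,  # return no hits, only aggregation results
        "aggs": {
            "by_category": {  # illustrative aggregation name
                "terms": {"field": bucket_field, "size": size},
                # Sub-aggregation: evaluated separately inside each bucket.
                "aggs": {
                    "avg_views": {"avg": {"field": metric_field}}
                },
            }
        },
    }


body = build_terms_with_avg("category_name", "views")
print(json.dumps(body, indent=2))
```

The bucket aggregation decides which documents belong together, and the nested avg is then computed once per bucket rather than once for the whole index.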

There are many types of bucket aggregations, but I will focus on some of the common ones: the terms aggregation, the range aggregation, and the filter aggregation. So let's start.

Terms Aggregation:
In a terms aggregation, we bucket the data by the unique values of a field. For example:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories" : {
      "terms" : {
        "field" : "category_name",
        "size" : 5
      }
    }
  }
}

In the above request, we create buckets on blog categories using the terms aggregation. I have used size to limit the number of buckets to 5. The request returns the following result:

{
  "took": 32,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 19,
      "buckets": [
        {
          "key": "programming",
          "doc_count": 12
        },
        {
          "key": "devops",
          "doc_count": 9
        },
        {
          "key": "news",
          "doc_count": 8
        },
        {
          "key": "poetry",
          "doc_count": 5
        },
        {
          "key": "informational",
          "doc_count": 4
        }
      ]
    }
  }
}

In the same way, we can use the terms aggregation on any field to create one bucket per unique value of that field.
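As a rough mental model of what the terms aggregation computes, the sketch below groups a list of in-memory documents by a field, keeps the largest buckets, and reports how many documents were left out. The function terms_buckets and the sample documents are hypothetical and are not the data behind the result above. Elasticsearch does this work per shard, so this single-process sketch ignores the approximation that doc_count_error_upper_bound reports.

```python
from collections import Counter


def terms_buckets(docs, field, size):
    """Group docs by the value of `field` and keep the `size` largest buckets."""
    counts = Counter(doc[field] for doc in docs if field in doc)
    top = counts.most_common(size)  # highest doc_count first
    buckets = [{"key": key, "doc_count": n} for key, n in top]
    # Documents whose value did not make it into the returned buckets.
    other = sum(counts.values()) - sum(n for _, n in top)
    return {"sum_other_doc_count": other, "buckets": buckets}


# Hypothetical sample documents.
docs = [
    {"category_name": "programming"},
    {"category_name": "programming"},
    {"category_name": "programming"},
    {"category_name": "devops"},
    {"category_name": "devops"},
    {"category_name": "news"},
]

result = terms_buckets(docs, "category_name", size=2)
print(result)
```

With size set to 2, only the programming and devops buckets are returned, and the single news document is counted in sum_other_doc_count.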

Range Aggregation:
Using the range aggregation, we can bucket the data by ranges of a numeric field. For example, each blog has a number of views, so we can use the views field to bucket the blogs into ranges of views. See the example below:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories" : {
      "range" : {
        "field" : "views",
        "ranges": [
          { "key":"Less popular", "to": 50 },
          { "key":"popular","from": 50, "to": 100 },
          { "key":"Most popular","from": 100, "to": 200}
        ]
      }
    }
  }
}

In the above request, we create buckets with the range aggregation on the views field and specify the ranges we want: fewer than 50 views, 50 to 100 views, and 100 to 200 views. Each range includes its from value and excludes its to value. There is also a key field, which lets us provide a custom label for each range. Now let's see the result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "buckets": [
        {
          "key": "Less popular",
          "to": 50,
          "doc_count": 25
        },
        {
          "key": "popular",
          "from": 50,
          "to": 100,
          "doc_count": 12
        },
        {
          "key": "Most popular",
          "from": 100,
          "to": 200,
          "doc_count": 6
        }
      ]
    }
  }
}

In the above result, we have three buckets: Less popular, popular, and Most popular blogs. Blogs with 200 or more views fall outside all three ranges, so they are not counted in any bucket.
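The boundary behaviour is easy to get wrong, so here is a small in-memory sketch of how documents are assigned to ranges. The function range_buckets and the sample view counts are hypothetical; the ranges are the ones from the request above. The sketch assumes the range aggregation's rule that from is inclusive and to is exclusive.

```python
def range_buckets(docs, field, ranges):
    """Count docs per range; `from` is inclusive and `to` is exclusive."""
    buckets = []
    for r in ranges:
        lower = r.get("from", float("-inf"))  # open-ended when "from" is missing
        upper = r.get("to", float("inf"))     # open-ended when "to" is missing
        count = sum(
            1 for doc in docs
            if field in doc and lower <= doc[field] < upper
        )
        bucket = dict(r)  # copy key/from/to into the bucket
        bucket["doc_count"] = count
        buckets.append(bucket)
    return buckets


ranges = [
    {"key": "Less popular", "to": 50},
    {"key": "popular", "from": 50, "to": 100},
    {"key": "Most popular", "from": 100, "to": 200},
]

# Hypothetical view counts chosen to sit on the range boundaries.
docs = [{"views": v} for v in (10, 49, 50, 99, 100, 200, 950)]

buckets = range_buckets(docs, "views", ranges)
for b in buckets:
    print(b["key"], b["doc_count"])
```

A blog with exactly 50 views lands in popular, not Less popular, and the blogs with 200 and 950 views are not counted anywhere because no range covers them.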

Filter Aggregation:
We use the filter aggregation to narrow down the set of documents used for an aggregation. The filter selects the documents that match certain criteria, and a sub-aggregation is then applied to only those documents. See the example below:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories": {
      "filter": {
        "term": {
          "category_name.keyword": "DevOps"
        }
      },
      "aggs": {
        "avg_views": {
          "avg": {
            "field": "views"
          }
        }
      }
    }
  }
}

In the above request, I first filter the data to the DevOps category and then apply an avg aggregation to get the average of the blog views. Executing the request returns the following result:

{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 55,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count": 9,
      "avg_views": {
        "value": 916.55
      }
    }
  }
}

In the above result, we get the average views for the DevOps category: the 9 matching documents have an average of 916.55 views. In this way, we can apply the filter aggregation.
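The two steps, filter first and then aggregate, can be sketched in plain Python as follows. The function filter_avg and the sample documents are hypothetical; the exact-match comparison stands in for the term query on category_name.keyword, and the returned shape loosely mirrors the aggregation part of the response above.

```python
def filter_avg(docs, filter_field, filter_value, metric_field):
    """Keep docs where filter_field equals filter_value, then average metric_field."""
    # Step 1: the filter narrows the document set (exact match, like a term query).
    matched = [d for d in docs if d.get(filter_field) == filter_value]
    # Step 2: the metric runs only on the filtered documents.
    values = [d[metric_field] for d in matched if metric_field in d]
    avg = sum(values) / len(values) if values else None
    return {"doc_count": len(matched), "avg_views": {"value": avg}}


# Hypothetical sample documents.
docs = [
    {"category_name": "DevOps", "views": 100},
    {"category_name": "DevOps", "views": 300},
    {"category_name": "Programming", "views": 50},
]

result = filter_avg(docs, "category_name", "DevOps", "views")
print(result)
```

The Programming document never reaches the average, which is the whole purpose of wrapping the metric in a filter aggregation.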


Other Blogs on Elastic Stack:
Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, you can explore "Mastering Kibana 6.0" and "Kibana 7 Quick Start Guide" to get more insight into Kibana and how to configure ELK to create dashboards for key performance indicators.

About Author

Anurag Srivastava

Author | Blogger | Tech Lead | Elastic Stack | Innovator |
