by Anurag Srivastava, Aug 14, 2018, 4:47:56 PM | 4 minutes

Introduction to Elasticsearch Aggregations

Aggregations let us group our data and extract statistics from it. They give us insight into our data and can be applied to a wide range of problems. For example, we can use Elasticsearch aggregations to build a recommendation engine for a website.

Now, let us jump into Elasticsearch aggregations and learn how we can apply them to our data. There are mainly four types of aggregations in Elasticsearch:

  • Metric: These aggregations compute metrics over a set of documents. For example, on a numeric field we can get the average, max, min, etc.
  • Matrix: These aggregations work on multiple fields of the documents. They extract the values from those fields and produce a matrix that gives insight into them.
  • Bucketing: Bucketing aggregations are like GROUP BY in an RDBMS. They group the documents into buckets, and each bucket holds the documents that match its criteria.
  • Pipeline: These aggregations work on the output of other aggregations instead of on the documents themselves.

We will look at these aggregation types in detail. Let us start by understanding the syntax of aggregations:

"aggregations|aggs" : {
  "<name of aggregation>" : {
    "<type of aggregation>" : {
      <body of aggregation>
    }
  }
}

This is the simplest representation of an Elasticsearch aggregation. Now let us see what each line of the example means.

- The first line is the aggregation keyword, which can be either "aggregations" or "aggs".
- In the second line, we specify a name for the aggregation.
- In the third line, we specify the type of aggregation, such as terms or avg.
- Then we specify the actual aggregation body.
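
The nesting described above can be sketched in Python as a small helper that wraps an aggregation body in its name and type. This is only an illustration: the helper name build_aggregation and the aggregation name average_views are hypothetical, and a numeric field called views is assumed to exist in the index.

```python
import json


def build_aggregation(name, agg_type, body):
    """Wrap an aggregation body in the keyword/name/type nesting."""
    # "aggs" is the short form of the "aggregations" keyword.
    return {"aggs": {name: {agg_type: body}}}


# Hypothetical example: a metric aggregation that averages a numeric field.
request_body = build_aggregation("average_views", "avg", {"field": "views"})
print(json.dumps(request_body, indent=2))
```

The printed JSON is the request body that would be sent to the _search API. Changing the type from "avg" to "terms" would turn the same structure into a bucketing aggregation.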

Now let us look at the format of the data I am going to use for the aggregation:

{
  "_index": "bqstack",
  "_type": "blogs",
  "_id": "EwJnGWQBnhG38eKPq5Bo",
  "_score": 1,
  "_source": {
    "category_name": "Cars",
    "name": "Rocky Paul",
    "edit_approved": false,
    "email": "rocky.paul.9867@xyz.com",
    "edited_blog_content": null,
    "category_id": 35,
    "author_id": 75,
    "create_date": "2018-05-09T13:28:20.917Z",
    "preview_image": "blog_57.png",
    "approved": false,
    "views": 148,
    "@version": "1",
    "blog_content": """
<p><span class="storyText"><p class="MsoNormal"><span lang="EN-GB">The central government approved green licence plates for electric vehicles </span>
""",
    "tags": "",
    "id": 57,
    "blog_title": "Centre approves green licence plates for electric cars",
    "update_date": "2018-05-16T18:30:22.669Z",
    "category_image": "cars.jpg",
    "@timestamp": "2018-06-19T18:56:20.427Z"
  }
}

The above document is taken from the index bqstack and will be used to demonstrate Elasticsearch aggregations. Since this is an introductory post, I will explain the simplest form of an Elasticsearch aggregation. See the example below:

GET bqstack/_search?size=0
{
  "aggs": {
    "blog_categories" : {
      "terms" : {
        "field" : "category_name",
        "size" : 5
      }
    }
  }
}

In the above example we are doing the following:
- We pass size=0 to the _search API so that the matching documents are not listed in the response.
- The keyword "aggs" tells Elasticsearch that we are going to apply an aggregation. We can use "aggregations" instead of "aggs".
- I have named the aggregation "blog_categories" to keep the name meaningful, because we are going to bucket on category names.
- After the aggregation name, we specify the aggregation type "terms" and, inside it, the field to bucket on.
- I have also added "size": 5 because there are many categories and I am interested in the top 5 only.
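
To make the effect of the terms type and the size parameter more concrete, here is a rough in-memory sketch in Python that mimics the grouping on a plain list of values. It is not how Elasticsearch computes the result (Elasticsearch works per shard and the counts can be approximate), and the function name terms_buckets and the sample values are made up for illustration. The output keys follow the naming of the terms aggregation response.

```python
from collections import Counter


def terms_buckets(values, size):
    """Mimic the shape of a terms aggregation result for a list of field values."""
    counts = Counter(values)
    # Largest buckets first, as in the default terms ordering.
    top = counts.most_common(size)
    buckets = [{"key": key, "doc_count": count} for key, count in top]
    # Documents that fell into buckets beyond the requested size.
    other = sum(counts.values()) - sum(count for _, count in top)
    return {"buckets": buckets, "sum_other_doc_count": other}


# Made-up category values, one per document (not data from the bqstack index).
sample = ["cars", "news", "cars", "poetry", "news", "cars", "devops"]
result = terms_buckets(sample, size=2)
print(result)
```

With size=2 only the two largest groups are returned as buckets, and the documents from the remaining groups are only counted in sum_other_doc_count.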

After executing the GET request above, we get the following response:

{
  "took": 16,
  "timed_out": false,
  "_shards": {
    "total": 5,
    "successful": 5,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": 54,
    "max_score": 0,
    "hits": []
  },
  "aggregations": {
    "blog_categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 19,
      "buckets": [
        {
          "key": "programming",
          "doc_count": 11
        },
        {
          "key": "devops",
          "doc_count": 9
        },
        {
          "key": "news",
          "doc_count": 8
        },
        {
          "key": "poetry",
          "doc_count": 5
        },
        {
          "key": "informational",
          "doc_count": 4
        }
      ]
    }
  }
}
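
Once we have such a response, reading the buckets from client code is straightforward. The following is a minimal Python sketch that parses a trimmed copy of the response (only the aggregation part is kept) and prints each category with its document count. The variable names are my own, and in a real application the JSON text would come from the HTTP response rather than from a string literal.

```python
import json

# Trimmed copy of the response, keeping only the aggregation part.
response_text = """
{
  "aggregations": {
    "blog_categories": {
      "doc_count_error_upper_bound": 0,
      "sum_other_doc_count": 19,
      "buckets": [
        {"key": "programming", "doc_count": 11},
        {"key": "devops", "doc_count": 9},
        {"key": "news", "doc_count": 8},
        {"key": "poetry", "doc_count": 5},
        {"key": "informational", "doc_count": 4}
      ]
    }
  }
}
"""

response = json.loads(response_text)
# The result is stored under the name we gave the aggregation in the request.
agg = response["aggregations"]["blog_categories"]

# Map each bucket key to its document count.
counts = {bucket["key"]: bucket["doc_count"] for bucket in agg["buckets"]}
for key, doc_count in counts.items():
    print(f"{key}: {doc_count}")

top_total = sum(counts.values())
print("documents in the listed buckets:", top_total)
print("documents in other buckets:", agg["sum_other_doc_count"])
```

Note that the result is looked up by the aggregation name from the request, so a different name in the query means a different key in the response.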

In this way, we can create buckets for any field of the document. This was a basic introduction to aggregations. In my next post on aggregations, I will explain more complex examples that give better insight into our data.

Other Blogs on Elastic Stack:
Introduction to Elasticsearch

Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack 
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch 
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, you can explore "Mastering Kibana 6.0" and "Kibana 7 Quick Start Guide" to learn more about Kibana and how to configure ELK to create dashboards for key performance indicators.

About Author

Anurag Srivastava

Author | Blogger | Tech Lead | Elastic Stack | Innovator |
