by Anurag Srivastava, Feb 9, 2019, 6:34:22 PM | 4 minutes

Load CSV Data into Elasticsearch

In this blog, I am going to explain how you can import publicly available CSV data into Elasticsearch. The Elastic Stack (ELK) makes it easy to analyze any data and helps us create dashboards with key performance indicators. CSV data for different domains, such as healthcare, crime, and agriculture, is available on various government sites, where we can easily download it. I have often seen that people don't know how to import this CSV data into Elasticsearch, which is why in this blog I explain the process step by step.

After importing the data, you can use it for analysis or for creating dashboards. Here I am taking the example of 'Crimes - 2001 to present' from the data.gov website (https://catalog.data.gov/dataset?res_format=CSV). From this website, you can download different types of data in CSV format. The size of this CSV file is approximately 1.6 GB.

Now, let us start the process to import this data into Elasticsearch. You need to do the following:

- Download the CSV file (crimes_2001.csv) from the "https://catalog.data.gov/dataset?res_format=CSV" website. This file has the ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, and Location fields.

If you want to know the basics of Logstash, please refer to the "Introduction to Logstash" blog, where I have explained them.
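Before wiring up Logstash, it can help to peek at the file's header to confirm the column names you will list in the configuration. A minimal Python sketch — the sample rows below are hypothetical stand-ins for the real 1.6 GB file, trimmed to a few columns:

```python
import csv
import io

# Hypothetical sample mirroring the first lines of crimes_2001.csv;
# in practice you would open the real file and read only a few rows.
sample = io.StringIO(
    "ID,Case Number,Date,Block,IUCR,Primary Type\n"
    "10000092,HY189866,03/18/2015 07:44:00 PM,047XX W OHIO ST,041A,BATTERY\n"
)

reader = csv.DictReader(sample)   # first row becomes the field names
first_row = next(reader)

print(reader.fieldnames[:3])      # header columns
print(first_row["Primary Type"])
```

The header row gives you the exact strings to put in the `columns` list of the csv filter below.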

- Create a Logstash configuration file to read the CSV data and write it to Elasticsearch. You need to write the following configuration in the Logstash configuration file (crimes.conf):

input {
    file {
        path => "/home/user/Downloads/crimes_2001.csv"
        start_position => "beginning"
        sincedb_path => "/dev/null"  # optional: re-read the file from the start on every run
    }
}
filter {
    csv {
        columns => [
                "ID",
                "Case Number",
                "Date",
                "Block",
                "IUCR",
                "Primary Type",
                "Description",
                "Location Description",
                "Arrest",
                "Domestic",
                "Beat",
                "District",
                "Ward",
                "Community Area",
                "FBI Code",
                "X Coordinate",
                "Y Coordinate",
                "Year",
                "Updated On",
                "Latitude",
                "Longitude",
                "Location"
        ]
        separator => ","
    }
}
output {
    stdout {
        codec => rubydebug
    }
    elasticsearch {
        action => "index"
        hosts => ["127.0.0.1:9200"]
        index => "crimes"
    }
}
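Conceptually, the csv filter splits each line on the separator and zips the values with the configured column names. A rough Python sketch of that step (the sample line is illustrative; Logstash additionally handles quoting, type conversion, and event metadata):

```python
import csv

# Same column names as in the csv filter above.
COLUMNS = ["ID", "Case Number", "Date", "Block", "IUCR", "Primary Type",
           "Description", "Location Description", "Arrest", "Domestic",
           "Beat", "District", "Ward", "Community Area", "FBI Code",
           "X Coordinate", "Y Coordinate", "Year", "Updated On",
           "Latitude", "Longitude", "Location"]

def parse_line(line: str) -> dict:
    # csv.reader honors quoted fields, so the comma inside the quoted
    # "(lat, lon)" Location value is not treated as a separator.
    values = next(csv.reader([line]))
    return dict(zip(COLUMNS, values))

line = ('10000092,HY189866,03/18/2015 07:44:00 PM,047XX W OHIO ST,041A,'
        'BATTERY,AGGRAVATED: HANDGUN,STREET,false,false,1111,011,28,25,'
        '04B,1144606,1903566,2015,02/10/2018 03:50:01 PM,41.891398861,'
        '-87.744384567,"(41.891398861, -87.744384567)"')

event = parse_line(line)
print(event["Primary Type"])  # BATTERY
```

Each parsed dict corresponds to one Logstash event, which the elasticsearch output then indexes as one document.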


- After creating the Logstash configuration file, execute the configuration with the following command:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/crimes.conf 

This command creates a pipeline that reads the CSV data and writes it into Elasticsearch.

- You can verify that the "crimes" index was created in Elasticsearch by listing the indices in the browser:

http://localhost:9200/_cat/indices?v
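The `_cat/indices?v` endpoint returns a whitespace-aligned table with a header row. One way to check for the index programmatically is to zip the header with a data row — a small sketch over a hypothetical response (real output has more columns, and the document count will differ):

```python
# Hypothetical two lines of _cat/indices?v output, trimmed for brevity.
header = "health status index  uuid                 pri rep docs.count"
row    = "yellow open   crimes BTTXFWoB75utKkMR2zRC 1   1   6546944"

# Both rows split on runs of whitespace, so positional zip lines them up.
listing = dict(zip(header.split(), row.split()))
print(listing["index"], listing["status"])
```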


- If your "crimes" index is listed, you can see the data in Elasticsearch:

http://localhost:9200/crimes/_search?pretty

After opening the above URL, you can see the data in the following format:


{
        "_index" : "crimes",
        "_type" : "_doc",
        "_id" : "BTTXFWoB75utKkMR2zRC",
        "_score" : 1.0,
        "_source" : {
          "Case Number" : "HY190059",
          "Block" : "066XX S MARSHFIELD AVE",
          "FBI Code" : "26",
          "IUCR" : "4625",
          "X Coordinate" : "1166468",
          "Ward" : "15",
          "Y Coordinate" : "1860715",
          "Beat" : "0725",
          "Location Description" : "STREET",
          "Domestic" : "false",
          "Community Area" : "67",
          "Updated On" : "02/10/2018 03:50:01 PM",
          "Primary Type" : "OTHER OFFENSE",
         "Year of Crime" : "2015",
          "host" : "KELLGGNLPTP0305",
          "Date" : "03/18/2015 11:00:00 PM",
          "path" : "/home/user/Downloads/crimes.csv",
          "Arrest" : "true",
          "Longitude" : "-87.665319468",
          "id" : "10000094",
          "Description" : "PAROLE VIOLATION",
          "@timestamp" : "2019-04-13T08:37:05.351Z",
          "District" : "007",
          "Latitude" : "41.773371528",
          "@version" : "1"
        }
}

The Elasticsearch document above, from the 'crimes' index, represents a single record from the CSV file.
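When consuming this response from code, each record's fields live under the `_source` key of a hit (in a full `_search` response, hits sit under `hits.hits`). A sketch over one trimmed, hypothetical hit; note that the csv filter leaves every value as a string, so numeric fields would need an explicit cast (the filter's `convert` option can do this at ingest time):

```python
import json

# Trimmed, hypothetical hit modeled on the _search response above.
hit = json.loads("""
{
  "_index": "crimes",
  "_id": "BTTXFWoB75utKkMR2zRC",
  "_source": {
    "Case Number": "HY190059",
    "Primary Type": "OTHER OFFENSE",
    "Arrest": "true",
    "Year": "2015"
  }
}
""")

source = hit["_source"]
year = int(source["Year"])  # values arrive as strings; cast as needed
print(hit["_index"], source["Case Number"], year)
```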

In this way, you can push any CSV data into Elasticsearch and then perform search and analytics or create dashboards using that data. If you have any queries, please comment.

Other Blogs on Elastic Stack:
Introduction to Elasticsearch

Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack 
Elasticsearch Rest API
Basics of Data Search in Elasticsearch
Wildcard and Boolean Search in Elasticsearch
Configure Logstash to push MySQL data into Elasticsearch 
Metrics Aggregation in Elasticsearch
Bucket Aggregation in Elasticsearch
How to create Elasticsearch Cluster

If you found this article interesting, you can explore the "Mastering Kibana 6.0", "Kibana 7 Quick Start Guide", "Learning Kibana 7", and "Elasticsearch 7 Quick Start Guide" books to get more insight into Kibana and how we can configure the Elastic Stack to create dashboards for key performance indicators.

In case of any doubt, please leave your comments. You can also follow me on Twitter: https://twitter.com/anubioinfo



About Author

Anurag Srivastava

Author of “Mastering Kibana 6.x”, “Kibana 7 Quick Start Guide”, “Learning Kibana 7”, &“Elasticsearch 7 Quick Start Guide” books & AWS Certified Solutions Architect.


Comments (4)

  • Ram P
    Apr 12, 2019, 8:38:45 PM

    @anurag You may want to remove the column names from the quotes as logstash doesn't know how to filter your column names

  • Anurag Srivastava
    Apr 13, 2019, 8:30:12 AM

    @Ram: this is required to map the field names as if we remove the column from filter it will generate following source in Elasticsearch: "_source" : { "column14" : "25", "column21" : "-87.744384567", "@version" : "1", "column2" : "HY189866", "column12" : "011", "column8" : "STREET", "column20" : "41.891398861", "host" : "KELLGGNLPTP0305", "column15" : "04B", "column1" : "10000092", "column13" : "28", "column10" : "false", "column7" : "AGGRAVATED: HANDGUN", "column5" : "041A", "column3" : "03/18/2015 07:44:00 PM", "path" : "/home/user/Downloads/crimes.csv", "column17" : "1903566", "column9" : "false", "column19" : "02/10/2018 03:50:01 PM", "column6" : "BATTERY", "column16" : "1144606", "column22" : "(41.891398861, -87.744384567)", "@timestamp" : "2019-04-13T08:26:32.732Z", "column4" : "047XX W OHIO ST", "column11" : "1111", "column18" : "2015", "message" : """10000092,HY189866,03/18/2015 07:44:00 PM,047XX W OHIO ST,041A,BATTERY,AGGRAVATED: HANDGUN,STREET,false,false,1111,011,28,25,04B,1144606,1903566,2015,02/10/2018 03:50:01 PM,41.891398861,-87.744384567,"(41.891398861, -87.744384567)"""" }

  • Ram P
    Apr 13, 2019, 2:25:01 PM

    @anurag What I meant to say is to remove the quotes. You would need the column names for identification. Also, did you try auto_detect_column_names => true? By defining this, Logstash would take the first row as column names when the input is CSV.

  • Anurag Srivastava
    Apr 13, 2019, 3:27:09 PM

    @Ram: Thanks for the comment, I have just checked the auto_detect_column_names =>true and it is working.

