Load CSV Data into Elasticsearch

by Anurag Srivastava, Oct 15, 2018, 6:25:43 PM | 3 minutes

In this blog I am going to explain how you can import publicly available CSV data into Elasticsearch. The ELK Stack enables us to analyze almost any data and helps us create dashboards with key performance indicators. CSV data for different domains such as healthcare, crime, and agriculture is available on various government sites and can easily be downloaded. I have often seen that people don't know how to import this CSV data into Elasticsearch, which is why in this blog I explain the process step by step.

After importing the data you can use it for analysis or for creating dashboards. Here I am taking the example of the 'Crimes - 2001 to present' dataset from the data.gov website (https://catalog.data.gov/dataset?res_format=CSV). From this website you can download many different types of data in CSV format. The size of this CSV file is approximately 1.6 GB.

Now let's start the process of importing this data into Elasticsearch. You need to do the following:

- Download the CSV file (crimes_2001.csv) from the "https://catalog.data.gov/dataset?res_format=CSV" website. This file has ID, Case Number, Date, Block, IUCR, Primary Type, Description, Location Description, Arrest, Domestic, Beat, District, Ward, Community Area, FBI Code, X Coordinate, Y Coordinate, Year, Updated On, Latitude, Longitude, and Location fields. You can quickly verify the header row of the downloaded file, as shown below.
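
A minimal sanity check, assuming the file was saved as /home/user/Downloads/crimes_2001.csv (the same path used in the Logstash configuration later in this post):

# Print only the header row to confirm the column names
head -n 1 /home/user/Downloads/crimes_2001.csv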

- Create a Logstash configuration file for reading the CSV data and writing it to Elasticsearch. Put the following configuration into the Logstash configuration file (crimes.conf):

input {
    file {
        # Path to the downloaded CSV file
        path => "/home/user/Downloads/crimes_2001.csv"
        # Read the file from the beginning instead of only tailing new lines
        start_position => "beginning"
    }
}
filter {
    # Parse each line of the file into named fields
    csv {
        columns => [
                "ID",
                "Case Number",
                "Date",
                "Block",
                "IUCR",
                "Primary Type",
                "Description",
                "Location Description",
                "Arrest",
                "Domestic",
                "Beat",
                "District",
                "Ward",
                "Community Area",
                "FBI Code",
                "X Coordinate",
                "Y Coordinate",
                "Year",
                "Updated On",
                "Latitude",
                "Longitude",
                "Location"
        ]
        separator => ","
    }
}
output {
    # Print each parsed event to the console so you can watch the import
    stdout {
        codec => rubydebug
    }
    # Index each event into the "crimes" index in Elasticsearch
    elasticsearch {
        action => "index"
        hosts => ["127.0.0.1:9200"]
        index => "crimes"
    }
}
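
Before running the pipeline against the full 1.6 GB file, you can ask Logstash to validate the configuration and exit without processing any data. A quick sketch, assuming the same paths as above:

# Validate the pipeline configuration and exit without starting it
/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/crimes.conf --config.test_and_exit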


- After creating the Logstash configuration file, execute the configuration with the following command:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/crimes.conf 

This command creates the pipeline that reads the CSV data and writes it into Elasticsearch.

- You can verify that the "crimes" index has been created in Elasticsearch by listing the indices in a browser:

http://localhost:9200/_cat/indices?v
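
The same check works from the command line with curl; the _count API additionally shows how many documents have been indexed so far (the number keeps growing while Logstash is still reading the file):

# List all indices, including "crimes" once it has been created
curl "http://localhost:9200/_cat/indices?v"
# Count the documents indexed into "crimes" so far
curl "http://localhost:9200/crimes/_count?pretty"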


- If your index is listed, you can view the data in Elasticsearch:

http://localhost:9200/crimes/_search?pretty
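
You can also run a targeted query instead of fetching everything. As a sketch, the following match query assumes 'THEFT' occurs as a value in the "Primary Type" column of this dataset:

# Search the "crimes" index for records whose Primary Type matches THEFT
curl -H 'Content-Type: application/json' \
     "http://localhost:9200/crimes/_search?pretty" \
     -d '{ "query": { "match": { "Primary Type": "THEFT" } } }'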


In this way you can push any CSV data into Elasticsearch and then perform searches, run analytics, or create dashboards using that data. If you have any queries, please comment.

If you found this article interesting, you can explore "Mastering Kibana 6.0" to get more insight into Kibana and how to configure the ELK Stack to create dashboards for key performance indicators.
