by Anurag Srivastava, Jul 7, 2018, 8:51:30 AM | 5 minutes

Configure Logstash to push MySQL data into Elasticsearch

I am taking the example of the bqstack website, which is built on a MySQL database. Basically, I am going to configure Logstash with the JDBC input plugin to connect to the MySQL database. After connecting, Logstash will run a query to fetch records from the database and push them into an Elasticsearch index.

Once the data from MySQL is flowing into Elasticsearch, we can create dashboards in Kibana as per our requirements. I need to create the Logstash configuration file inside the /etc/logstash/conf.d/ directory, so let's create a file called blog.conf and write the code as follows:


# file: blog.conf
input {
  jdbc {
    # path to the MySQL JDBC driver jar (downloaded separately)
    jdbc_driver_library => "/usr/share/logstash/mysql-connector-java-5.1.23-bin.jar"
    jdbc_driver_class => "com.mysql.jdbc.Driver"
    # MySQL JDBC connection string to our database
    jdbc_connection_string => "jdbc:mysql://url-of-db:3306/db_name?zeroDateTimeBehavior=convertToNull"
    # the user we wish to execute our statement as
    jdbc_user => "username"
    jdbc_password => "password"
    # run the query every minute
    schedule => "* * * * *"
    # our query to fetch blog details
    statement => "SELECT blg.*, concat(au.first_name, ' ', au.last_name) as name, au.email as email, cc.category_name, cc.category_image FROM `blog_blogs` as blg left join auth_user as au on au.id = blg.author_id left join category_category as cc on cc.id = blg.category_id where blg.id > :sql_last_value order by blg.create_date"
    use_column_value => true
    tracking_column => "id"
    tracking_column_type => "numeric"
  }
}
output {
  elasticsearch {
    hosts => "http://127.0.0.1:9200"
    index => "bqstack"
    document_type => "blogs"
  }
}


The above Logstash configuration file has an input and an output section: under input we connect to the MySQL database to fetch the data, and under output we send that data to the Elasticsearch cluster. Inside the input section is a jdbc block. The first option, jdbc_driver_library, points to the JDBC driver library path; the JDBC input plugin does not ship with the driver, so we need to download it ourselves and provide its path in this parameter. Next is jdbc_driver_class, where we provide the driver class, followed by jdbc_connection_string. The connection string follows a fixed syntax: the database type, the URL of the database, the port, and the database name. After that we set the username and password parameters for the database.
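As a minimal sketch of fetching the driver and placing it where the configuration above expects it (the download URL, connector version, and file name here are assumptions; use whichever Connector/J version matches your MySQL server):

# download the MySQL Connector/J jar (example version/URL; adjust as needed)
wget https://repo1.maven.org/maven2/mysql/mysql-connector-java/5.1.23/mysql-connector-java-5.1.23.jar
# place it at the path referenced by jdbc_driver_library
sudo cp mysql-connector-java-5.1.23.jar /usr/share/logstash/mysql-connector-java-5.1.23-bin.jar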

Once these database connection parameters are set, we can configure the scheduler through the schedule parameter, and then the actual query to execute against the connected database. The schedule frequency is set using a specific syntax, which in the JDBC input plugin is quite similar to cron. For example:

"* * * * * " => runs every second
"30 2 * * *" => runs as 2:30AM
"10 22 * * *" => runs at 10:10PM


The statement parameter holds the query. In the query above, I compare the blog id with :sql_last_value, which is dynamic and refers to the id column of the table. After each query execution the value of sql_last_value is updated and written to a file; the next time the query runs, this value is read back from that file. We also provide the tracking column name and its data type: since id is a numeric value, tracking_column_type is set to "numeric".
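If you want to check the value being tracked, the JDBC input plugin persists it to a small metadata file, by default .logstash_jdbc_last_run in the home directory of the user running Logstash (the location can be changed with the last_run_metadata_path option):

# inspect the persisted sql_last_value (default location, configurable via last_run_metadata_path)
cat ~/.logstash_jdbc_last_run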

After making these configuration changes, we need to start Logstash using the following command on Ubuntu:

/usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/blog.conf --path.data=/tmp/bq


This is a one-time command; once Logstash is running, the scheduler kicks in and, as per our schedule entry, the query is executed every minute. We can also add a cron entry so that the Logstash configuration starts automatically after a system restart. Run the following command to open the crontab on Linux:


crontab -e

The above command opens the crontab file, where we can add the following entry to ensure that after a machine restart the Logstash configuration is executed and the JDBC input plugin scheduler starts running the queries to fetch data from the database.

@reboot /usr/share/logstash/bin/logstash -f /etc/logstash/conf.d/blog.conf --path.data=/tmp/bq

The crontab entry above starts with @reboot, which makes it run after every machine restart, followed by the command that executes the Logstash configuration.
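Once Logstash is running, a quick way to confirm that documents are actually reaching Elasticsearch is to query the bqstack index we defined in the output section:

# count the documents indexed so far
curl "http://127.0.0.1:9200/bqstack/_count?pretty"
# fetch a couple of documents to inspect the fields
curl "http://127.0.0.1:9200/bqstack/_search?size=2&pretty"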

In this way we can configure Logstash with the JDBC input plugin to read RDBMS data and push it into Elasticsearch. Once the data is in Elasticsearch, we can create visualizations and dashboards using Kibana.

If you found this article interesting, you can explore "Mastering Kibana 6.0" to get more insight into Kibana and how we can configure ELK to create dashboards for key performance indicators.

Other Blogs on Elasticsearch:

Introduction to Elasticsearch
Elasticsearch Installation and Configuration on Ubuntu 14.04
Log analysis with Elastic stack
Elasticsearch Rest API
Basics of Data Search in Elasticsearch




In case of any queries, please leave a comment.


Comments (3)

  • Divit Sharma
    Jul 7, 2018, 6:59:35 PM

    So the tracking column should be the one we use in :sql_last_value?

  • Divit Sharma
    Jul 7, 2018, 7:01:59 PM

    What does the option --path.data=/tmp/bq mean?

  • Anurag Srivastava
    Jul 7, 2018, 7:14:37 PM

    Hi Divit, yes, the id column is used for tracking with :sql_last_value. path.data is used as the data store path when you have multiple Logstash configurations; you need not worry about it, just provide a writable path. Let me know if you have any doubts.

