by jitender yadav, Jun 30, 2018, 12:07:47 PM | 2 minutes

How to count the number of words in an HTML string and find the read time in Python 3

In this post, we will learn how to count the number of words in a string that contains HTML tags, and how to estimate the read time of that string, in Python.

When you write a blog post or article in an HTML text editor, the editor produces a string with embedded HTML tags, which is saved in the database as is.

We need to show the reader the read time of a blog post or article, or the number of words in it. We can count the words in a string with HTML tags by first stripping the tags, as follows in Python.

We need HTMLParser from the html.parser module to strip the HTML tags, the math library for mathematical operations, and re for regular expression operations.

from html.parser import HTMLParser
import math
import re

Then we need to create a class that subclasses HTMLParser:

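class MLStripper(HTMLParser):
    """Class for stripping HTML tags."""

    def __init__(self):
        super().__init__()  # set up the parser's internal state
        self.strict = False
        self.convert_charrefs = True
        self.fed = []

    # Called by the parser with the text found between tags; store it in self.fed
    def handle_data(self, d):
        self.fed.append(d)

    # Return all collected text as a single string
    def get_data(self):
        return ''.join(self.fed)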
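
To see what the parser actually passes to handle_data, here is a small self-contained sketch. The class name TextCollector and the sample string are assumptions made for illustration; it mirrors the idea of the class above and does not replace it.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical helper: records each text chunk the parser reports."""

    def __init__(self):
        super().__init__()  # convert_charrefs defaults to True in Python 3.5+
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text found between tags.
        self.chunks.append(data)


collector = TextCollector()
collector.feed("<h1>This is a title</h1><p>Hello <b>world</b>!</p>")
collector.close()  # flush any text the parser is still buffering

# Only the text survives; the tags themselves never reach handle_data.
print(collector.chunks)
print(''.join(collector.chunks))
```

The tags are consumed by the parser, so joining the chunks gives the plain text of the document.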

Now write a function that takes an HTML string as input and returns a clean string of words without HTML tags:

def strip_tags(html):
    s = MLStripper()
    s.feed(html)  # parse the HTML; the text is collected through handle_data
    return s.get_data()
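
One thing to watch for: because the collected text is joined with an empty string, words from adjacent block elements can end up glued together (for example a heading followed directly by a paragraph). The sketch below shows one possible workaround, joining the chunks with a space. The names SpacedStripper and strip_tags_spaced are hypothetical and not part of the code above.

```python
from html.parser import HTMLParser


class SpacedStripper(HTMLParser):
    """Hypothetical variant that keeps text chunks for space-separated joining."""

    def __init__(self):
        super().__init__()
        self.fed = []

    def handle_data(self, d):
        self.fed.append(d)


def strip_tags_spaced(html):
    # Hypothetical helper: join chunks with a space instead of ''.
    s = SpacedStripper()
    s.feed(html)
    s.close()
    return ' '.join(s.fed)


sample = "<h1>Title</h1><p>Body text</p>"
words = strip_tags_spaced(sample).split()
print(len(words), words)
```

The trade-off is that a tag placed in the middle of a word, such as wor<b>ld</b>, would then be counted as two words, so which join to use depends on the kind of HTML your editor produces.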

Write functions for the word count and the read time:

def count_words(html_string):
    # html_string = """
    # <h1>This is a title</h1>
    # """
    word_string = strip_tags(html_string)
    count = len(word_string.split())  # with no argument, split() splits on any whitespace
    return count

def get_read_time(html_string):
    count = count_words(html_string)
    read_time_min = math.ceil(count / 200.0)  # assuming a reading speed of 200 wpm
    return int(read_time_min)
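
The rounding step can be checked on its own with a few word counts. This is a minimal sketch; read_time_from_count is a hypothetical helper that takes a word count directly, and it uses the same 200 wpm assumption as above.

```python
import math

WORDS_PER_MINUTE = 200  # same reading-speed assumption as in the post


def read_time_from_count(word_count, wpm=WORDS_PER_MINUTE):
    # Hypothetical helper: round up so a partial minute counts as a full minute.
    return int(math.ceil(word_count / wpm))


for n in (0, 1, 200, 201, 450):
    print(n, "words ->", read_time_from_count(n), "min")
```

Because of the ceiling, 201 words already counts as 2 minutes, and an empty string gives 0 minutes. If you would rather always show at least "1 minute", wrapping the result in max(1, ...) is one option.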

We can also count words using a regular expression:

def count_words(html_string):
    # html_string = """
    # <h1>This is a title</h1>
    # """
    word_string = strip_tags(html_string)
    words = re.findall(r'\w+', word_string)  # each run of letters, digits or underscores
    count = len(words)
    return count
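
The two counting approaches do not always agree, because \w+ treats apostrophes and hyphens as word boundaries while split() only breaks on whitespace. A short sketch on an assumed sample sentence shows the difference:

```python
import re

text = "It's a well-known fact."

by_whitespace = text.split()          # breaks only on whitespace
by_regex = re.findall(r'\w+', text)   # breaks on anything that is not a word character

print(len(by_whitespace), by_whitespace)
print(len(by_regex), by_regex)
```

Here split() reports 4 words and the regex reports 6, so for read-time estimates the whitespace version is usually closer to what a reader would call a word.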

I hope this helps.

About Author

jitender yadav

I am a tech enthusiast and always keen to learn new technologies.
