
How To Do Profanity Filter With Pure Python vs. REST API


In the current information age, where almost everybody can easily access the internet and social media, we need bad-word (profanity) filters more than ever to offer a safe haven for people to connect in a virtual space. This article explains how to build your own profanity filter using pure Python from scratch vs. using the existing and mature Bad Words API created by APILayer.

APILayer, an Austrian tech company that runs a marketplace of reliable application programming interfaces (APIs), builds and maintains the Bad Words API that we will use to perform the profanity filtering.

APILayer makes cutting-edge APIs affordable for developers, startups, and enterprises. APILayer provides a wide range of APIs, from data to machine learning, text processing, image processing, etc. Browse all available apilayer products here.

For other use cases in text analytics, browse our article collections here:

What Is A Profanity Filter Or Profanity Checker?

A profanity filter is software that scans user-generated content (UGC) and scrubs profanity from online forums, social networks, online stores, and other platforms.

Moderators decide which words to censor, such as swear words and words associated with hate speech or harassment. Although profanity filters have limited functionality and don’t assess the context of words, they are considered a good starting point for content moderation because they are easy to set up.

How To Do Profanity Check With Pure Python?

Why Python?

According to TIOBE, Python is the most popular programming language in 2022, with uses ranging from developing websites and applications to automating processes and conducting data analysis. Because Python is a general-purpose language, it can be used to develop a wide range of programs and is not tied to a single problem domain.

Using profanity library 

We can do profanity checking or filtering with pure Python using the profanity library.

profanity [1] is a Python library to check for (and clean) profanity in strings.

This library was created by Ben Friedland (@ben174 [2] ).

Installation 

You can easily install the profanity library with pip by running pip install profanity.

profanity library example

The following code is the basic example of use to detect and filter swear words from a sentence using the profanity library:
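With the library, the calls involved are `profanity.contains_profanity(text)` and `profanity.censor(text)`, after `from profanity import profanity`. To give a rough picture of what such a check amounts to, here is a from-scratch sketch that uses only the standard library. The word list (mild placeholders) and the function bodies are assumptions made for illustration, not the library's own code.

```python
import re

# Hypothetical placeholder word list (mild stand-ins for real profanity).
BAD_WORDS = ["darn", "heck"]

# \b keeps each match on whole words; IGNORECASE ignores letter case.
PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(w) for w in BAD_WORDS) + r")\b",
    re.IGNORECASE,
)

def contains_profanity(text):
    """Return True if any listed word occurs in the text."""
    return PATTERN.search(text) is not None

def censor(text, char="*"):
    """Replace each listed word with one mask character per letter."""
    return PATTERN.sub(lambda m: char * len(m.group()), text)

print(contains_profanity("Well, heck, that broke."))  # True
print(censor("Well, heck, that broke."))               # Well, ****, that broke.
```

The word-boundary anchors are a design choice of this sketch: without them, an innocent word such as "check" would be flagged because it contains a listed word.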

Is it case-sensitive?

The profanity library appears to handle text preprocessing well, so matching is not case-sensitive. Here is an example:
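A hand-rolled filter can get the same effect by folding case before the lookup. In this short sketch the word list and the function name are hypothetical.

```python
# Hypothetical placeholder word list, stored in lowercase.
BAD_WORDS = {"darn", "heck"}

def is_listed(word):
    """Compare case-insensitively by folding case before the lookup."""
    return word.casefold() in BAD_WORDS

for variant in ("heck", "HECK", "hEcK", "check"):
    print(variant, is_listed(variant))
```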

profanity wordlists

The profanity library ships with only a default list of 32 profane words. You can check them on this path:

Because the words are highly distasteful, we have censored them here:

You can add your own words to enrich the word list.
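Independently of the library, the merging step itself can be sketched as follows. The helper name and the sample words are hypothetical; the idea is simply to normalize both lists and drop duplicates and blank entries.

```python
def merge_word_lists(default_words, custom_words):
    """Combine two word lists into one lowercase, de-duplicated, sorted list."""
    merged = set()
    for word in list(default_words) + list(custom_words):
        cleaned = word.strip().casefold()
        if cleaned:  # skip blank entries
            merged.add(cleaned)
    return sorted(merged)

# Hypothetical placeholder words.
print(merge_word_lists(["darn", "heck"], ["Dang", " heck ", ""]))
# ['dang', 'darn', 'heck']
```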

 

Using better-profanity library

Another, better option is better-profanity. This library is inspired by the profanity library created by Ben Friedland, which we reviewed in the previous section.

better-profanity [3] is a Python library for blazingly fast cleaning of swear words (and their leetspeak) in strings. It is significantly faster than the original because it uses string comparison rather than regex.

This library even supports modified spellings or leet [4] (such as p0rn, h4NDjob, handj0b and b*tCh).
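One simple way to picture leetspeak handling is to map the substituted characters back to letters before the lookup. This is not the library's algorithm, only an illustrative sketch: the substitution table, the placeholder word list, and the function names are all assumptions.

```python
# Hypothetical placeholder word list.
BAD_WORDS = {"darn", "heck"}

# Assumed substitution table: common leet characters mapped back to letters.
LEET_MAP = str.maketrans({"0": "o", "1": "i", "3": "e", "4": "a",
                          "5": "s", "7": "t", "@": "a", "$": "s"})

def normalize(word):
    """Lowercase the word and undo simple one-to-one leet substitutions."""
    return word.casefold().translate(LEET_MAP)

def is_listed(word):
    """Check the normalized form against the word list."""
    return normalize(word) in BAD_WORDS

for candidate in ("h3ck", "D4RN", "d@rn", "hello"):
    print(candidate, is_listed(candidate))
```

A one-to-one table like this cannot cover ambiguous substitutions (for example, "1" standing for either "i" or "l"), which is one reason a curated library is the more robust choice.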

This library was created by Son Thanh Nguyen (@snguyenthanh [5]), and five other contributors.

Installation 

You can easily install the better-profanity library with pip by running pip install better_profanity.

If the library is already installed, pip reports that the requirement is already satisfied.

better-profanity library example

We take the example code from https://pypi.org/project/better-profanity/.

The following code is the basic example of use to detect and censor swear words from a sentence using the better-profanity library:

better-profanity more advanced examples

Censor doesn’t care about word dividers:

The .censor() function also hides words separated not just by spaces but also by other dividers, such as _, , and . (but not @, $, *, “ or ‘).
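The divider behaviour can be imitated with a regular expression that treats everything except letters, digits, and the excepted characters as a divider. The sketch below is a stand-alone illustration with a hypothetical word list; it also takes the mask character as a second argument, similar in spirit to .censor().

```python
import re

# Hypothetical placeholder word list.
BAD_WORDS = {"darn", "heck"}

# A "word" is a run of letters, digits, or the characters listed as
# non-dividers (@, $, *, " and '). Everything else, including _, divides words.
WORD = re.compile(r"""(?:[^\W_]|[@$*"'])+""")

def censor(text, char="*"):
    """Mask listed words wherever they appear between dividers."""
    def mask(match):
        word = match.group()
        return char * len(word) if word.casefold() in BAD_WORDS else word
    return WORD.sub(mask, text)

print(censor("...darn...hello_cat_heck,,,,123"))
# ...****...hello_cat_****,,,,123
```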

Censor swear words with custom characters:

The character passed as the second argument to .censor() is used to replace the swear words.

You can find more examples at https://pypi.org/project/better-profanity/

Wordlists

The better-profanity library ships with a default list of 916 profane words. It’s a broad word list, much improved over that of the original profanity library. You can check them on this path:

Because the words are highly distasteful, we have censored them here:

Limitations

The censor can easily be bypassed by adding any character(s) to a word, because the library compares each word character by character. For example:

Compare this with how the Bad Words API handles the same case:
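The same weakness is easy to reproduce in a hand-rolled exact-match filter. The sketch below uses a hypothetical word list and shows one possible mitigation, approximate matching with `difflib` from the standard library. This is a swapped-in technique for illustration and is not how either library or the Bad Words API works.

```python
import difflib

# Hypothetical placeholder word list.
BAD_WORDS = ["darn", "heck"]

def exact_match(word):
    """Plain lookup: any extra character makes the word slip through."""
    return word.casefold() in BAD_WORDS

def fuzzy_match(word, cutoff=0.8):
    """Return the closest listed word, or None if nothing is similar enough."""
    hits = difflib.get_close_matches(word.casefold(), BAD_WORDS, n=1, cutoff=cutoff)
    return hits[0] if hits else None

print(exact_match("heckk"))  # False: one extra letter defeats the exact lookup
print(fuzzy_match("heckk"))  # heck
print(fuzzy_match("hello"))  # None
```

Approximate matching trades bypasses for false positives: at this cutoff, the innocent word "check" is also reported as close to a listed word, so the threshold needs tuning against real data.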

 

How To Do Profanity Check With APILayer’s Bad Words API?

Bad Words API detects bad words or swear words by performing profanity checks on a given text.

Bad Words API is an advanced profanity filter based on English phonetics (how words sound). You can use it to find bad/swear words in a text, and it also censors the detected words by applying a mask to them.

It is an intelligent filter: it anticipates deliberate misspellings and also detects social media acronyms (see the “Some tricky use cases” subsection for a demo).

Bad Words API was built and maintained by APILayer, an Austrian technology company. To find out more about us and our variety of reliable programming interfaces and the affordable APIs we make for developers and startups, you can browse all our products here.

API Access Key & Authentication

First, create your APILayer account, and sign in.

Browse our API Marketplace with 80 powerful APIs, go straight to the AI & Machine Learning section, and choose Bad Words API.

Subscribe for free, choose your subscription plan, or contact us here if you have a special request.

Bad Words API uses API keys to authenticate requests. You can view and manage the API keys in your account dashboard.

API keys carry a variety of privileges, so be sure to protect them! Avoid disclosing your private API keys in publicly accessible places, such as GitHub, client-side code, and so forth.

All API requests must include a custom HTTP header called “apikey”. Implementation differs with each programming language. This post will discuss Python implementation.

API requests must always be sent over HTTPS. Calls made over plain HTTP will not succeed, and unauthenticated requests will also be rejected.
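These rules can be captured in a few small helpers. The sketch below assumes the key is kept in an environment variable; the variable name `APILAYER_API_KEY` and the helper names are hypothetical, chosen only for illustration.

```python
import os
from urllib.parse import urlsplit

def load_api_key(env_var="APILAYER_API_KEY"):
    """Read the key from an environment variable (hypothetical name) instead of hard-coding it."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first")
    return key

def build_headers(api_key):
    """Every request carries the key in the custom 'apikey' header."""
    return {"apikey": api_key}

def require_https(url):
    """Refuse plain-HTTP URLs, since the API only accepts HTTPS calls."""
    if urlsplit(url).scheme.lower() != "https":
        raise ValueError(f"Refusing non-HTTPS URL: {url}")
    return url
```

Keeping the key out of the source file means it cannot end up in a public repository by accident.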

What Bad Words API endpoints are available?

POST method 

Use the POST method to detect bad words, swear words, and profanity in a given text.

The parameters for a POST request are as follows:

Parameter: body (required)
Description: Text to perform profanity check on
Location: Body
Data type: string

Parameter: censor_character (required)
Description: The censor character to apply. Defaults to *.
Location: Query
Data type: string

 

Basic cURL example

Run this simple cURL command to use the Bad Words API to filter profanity on the given text:

 

Basic usage with Python

Go to the Bad Words API Live Demo.

Input your text to the “HTTP Body” text box, click the “Run Code” button, and then you can see the complete response on the output console below:

Here is the complete Python code:

You can fill the “payload” variable with any text you want, followed by .encode(“utf-8”) to encode it as UTF-8.
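As a minimal sketch of such a call, the version below uses `urllib` from the standard library rather than any third-party HTTP client. The endpoint URL is an assumption and should be confirmed against the Bad Words API documentation, as should the assumption that the response body is JSON. The function names are hypothetical.

```python
import json
import urllib.request
from urllib.parse import urlencode

# Assumed endpoint; confirm it against the Bad Words API documentation.
ENDPOINT = "https://api.apilayer.com/bad_words"

def build_request(text, api_key, censor_character="*"):
    """Build (but do not send) the POST request for a profanity check."""
    url = ENDPOINT + "?" + urlencode({"censor_character": censor_character})
    return urllib.request.Request(
        url,
        data=text.encode("utf-8"),    # body: the raw text, UTF-8 encoded
        headers={"apikey": api_key},  # authentication header
        method="POST",
    )

def check_text(text, api_key, censor_character="*"):
    """Send the request and decode the response (assumed to be JSON)."""
    request = build_request(text, api_key, censor_character)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))
```

Splitting the request construction from the network call makes it possible to inspect the URL, headers, and body without spending an API request.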

What does the output mean?

The output of Bad Words API is much more informative than the default output of the pure Python libraries (profanity and better-profanity). It gives you the position of each bad word, its deviation (typo) from the actual bad word and, if there is any deviation, the original bad word; it also shows the total number of bad words and, finally, the censored text.
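To show how such a response might be consumed, here is a small sketch that summarizes a response dictionary. The field names (`bad_words_total`, `bad_words_list`, `censored_content`, and the per-word `word`, `original`, `deviations`, `start`, `end`) are assumptions modelled on the description above, so check an actual response before relying on them. The sample dictionary is hand-written with placeholder words and is not real API output.

```python
def summarize(result):
    """Turn a response dictionary into human-readable lines."""
    lines = [f"Total bad words: {result.get('bad_words_total', 0)}"]
    for hit in result.get("bad_words_list", []):
        line = f"'{hit['word']}' at {hit['start']}-{hit['end']}"
        if hit.get("deviations", 0):
            # A non-zero deviation means the text differs from the listed word.
            line += f" (deviates from '{hit['original']}')"
        lines.append(line)
    lines.append(f"Censored: {result.get('censored_content', '')}")
    return lines

# Hand-written sample with placeholder words; not real API output.
sample = {
    "bad_words_total": 1,
    "bad_words_list": [
        {"word": "hekk", "original": "heck", "deviations": 1, "start": 9, "end": 13},
    ],
    "censored_content": "What the **** is this?",
}

for line in summarize(sample):
    print(line)
```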

Some tricky use cases

The API anticipates deliberate misspellings and also detects social media acronyms.

For example: 

fck will get caught, but frck will not.

Shet will be caught, as it should be, but shot will not.

Bad Words API can even handle characters added to bypass the filter:

You can view the full source code of this article here [6]!

What Are The Common Industrial Implementations For Bad Words API?

1. eCommerce websites

You can use the Bad Words API to filter inappropriate comments or text from your website. You can also use it to automatically approve comments that do not contain any bad words, which is a great help for hardworking site moderators.

2. Chat rooms, online forums, social platforms, and dating services

One of the best uses of the Bad Words API is in online chat rooms and forums. The most disturbing things on these sites are hate speech, abuse, bad words, and other toxic behaviors and illegal activities.

These sites can also be moderated using our API. When someone posts a comment with offensive words, the API detects it, allowing you to keep the conversation civil.

3. Online games

Online gaming is becoming increasingly popular. These games also have a lot of social features, like online chat.

Many of these games are played by children, often with no restrictions on what can be said in chat. Unless you stop it, abusive language is widely used in online games.

Please read this article for more information about the possible industrial implementations for Bad Words API:

https://blog.apilayer.com/bad-words-detection-api-use-cases-and-examples/

 

Are You Ready To Create A Healthier Internet Using Bad Words API?

When it comes to moderating toxic behaviors online, you shouldn’t have to do it alone. APILayer’s Bad Words API does all the heavy lifting for you!

So if you have a community site, a forum, or even a personal blog, you can use Bad Words API, which automatically detects bad words and censors them without interrupting the user experience. We need to create a healthier Internet for everyone, don’t we?

Head over and sign up for free to start empowering your apps with the bad words filtering API today!

 

References

[1] https://github.com/ben174/profanity

[2] https://github.com/ben174

[3] https://github.com/snguyenthanh/better_profanity

[4] https://en.wikipedia.org/wiki/Leet

[5] https://github.com/snguyenthanh

[6] https://github.com/MuhammadAzizulHakim/apilayerBlog-repo/tree/main/Article17%20-%20Bad%20Words%20API%20%2B%20Python
