In today’s information age, where almost anyone can access the internet and social media, profanity (bad words) filters matter more than ever for keeping virtual spaces safe for people to connect. This article explains how to build your own profanity filter in pure Python and compares that approach with the mature Bad Words API from APILayer.
APILayer, an Austrian tech company that runs a marketplace of reliable application programming interfaces (APIs), builds and maintains the Bad Words API that we will use for profanity filtering.
APILayer makes cutting-edge APIs affordable for developers, startups, and enterprises. It provides a wide range of APIs, covering data, machine learning, text processing, image processing, and more. Browse all available APILayer products here.
For other use cases in text analytics, browse our article collections here:
- How To Automatically Discover The Emotions In Tweets With Python
- How To Rewrite And Enhance Any Article Using Paraphraser AI
- 9 Ways To Do Advanced News Scraping With Python And APIs
What Is A Profanity Filter Or Profanity Checker?
A profanity filter is software that scans user-generated content (UGC) and removes or masks profanity in online forums, social networks, online stores, and similar platforms.
Moderators decide which words to censor, such as swear words and words associated with hate speech or harassment. Although profanity filters have limited functionality and do not assess the context of words, they are considered a good starting point for content moderation because they are easy to set up.
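To make the idea concrete, here is a minimal sketch of a word-list filter using only the Python standard library. The block list, the function names, and the same-length masking are all assumptions chosen for illustration (and kept deliberately mild); this is not how any particular library or API is implemented.

```python
import re

# Hypothetical, deliberately mild block list; a moderator-curated list
# would be much longer.
BLOCKED_WORDS = {"darn", "heck"}

# One case-insensitive pattern that matches any blocked word as a whole word.
_PATTERN = re.compile(
    r"\b(" + "|".join(re.escape(w) for w in sorted(BLOCKED_WORDS)) + r")\b",
    re.IGNORECASE,
)


def contains_blocked_word(text):
    """Return True if any blocked word appears in the text."""
    return _PATTERN.search(text) is not None


def censor(text, mask="*"):
    """Replace each blocked word with mask characters of the same length."""
    return _PATTERN.sub(lambda m: mask * len(m.group(0)), text)


print(censor("Well, DARN it. What the heck?"))
```

Because the pattern only matches whole words, a harmless word that merely contains a blocked one (such as "checkmate") is left alone. Like any simple word list, though, the sketch knows nothing about context.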
How To Do Profanity Check With Pure Python?
Why Python?
According to the TIOBE index, Python ranked as the most popular programming language in 2022. It is used for developing websites and applications, automating processes, and conducting data analysis. Because Python is a general-purpose language, it can be used to build a wide range of programs and is not specialized for a single problem domain.
Using profanity library
We can check for and filter profanity in pure Python using the profanity library.
profanity [1] is a Python library to check for (and clean) profanity in strings.
This library was created by Ben Friedland (@ben174 [2]).
Installation
You can easily install the profanity library with this pip command:
    pip install profanity
profanity library example
The following code is a basic example that detects and censors swear words in a sentence using the profanity library:

    from profanity import profanity

    # Check whether the sentence contains a listed word (returns a boolean).
    print(profanity.contains_profanity("You smell like shit."))
    # Return the sentence with the listed words masked.
    print(profanity.censor("You smell like shit."))
Is it case-sensitive?
The profanity library appears to normalize case before matching, so its checks are not case-sensitive. Here is an example:

    # Detection: the same sentence with different letter casing.
    print(profanity.contains_profanity("You smell like shit."))
    print(profanity.contains_profanity("You smell like SHIT."))
    print(profanity.contains_profanity("You smell like ShIt."))
    print(profanity.contains_profanity("You smell like sHiT."))

    # Censoring: the same sentence with different letter casing.
    print(profanity.censor("You smell like shit."))
    print(profanity.censor("You smell like SHIT."))
    print(profanity.censor("You smell like ShIt."))
    print(profanity.censor("You smell like sHiT."))
profanity wordlists
The profanity library ships with a default list of only 32 profane words. You can find it at a path like this one (shown here for an Anaconda installation on Windows):

    C:/Users/YOUR_USERNAME/anaconda3/Lib/site-packages/profanity/data/wordlist.txt

Because the words are highly distasteful, we have censored them here:
You can add your own words to enrich the word list.
Using better-profanity library
Another, better option is better-profanity. This library is inspired by the profanity library created by Ben Friedland, which we reviewed in the previous section.
better-profanity [3] is a Python library for blazingly fast cleaning of swear words (and their leetspeak variants) in strings. It is much faster than the original because it uses string comparison rather than regex.
It also supports modified spellings, or leet [4] (such as p0rn, h4NDjob, handj0b and b*tCh).
This library was created by Son Thanh Nguyen (@snguyenthanh [5]) and five other contributors.
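The leetspeak idea can be sketched with the standard library alone. The substitution table and the mild block list below are hypothetical, and this sketch normalizes the input word before a plain string comparison. That is one simple way to do it, and not necessarily how better-profanity handles leetspeak internally.

```python
# Hypothetical substitution table mapping common leetspeak characters
# back to letters; the mapping a real library uses may differ.
LEET_TABLE = str.maketrans(
    {"0": "o", "1": "i", "3": "e", "4": "a", "5": "s", "7": "t", "@": "a", "$": "s"}
)

# Hypothetical, deliberately mild block list.
BLOCKED_WORDS = {"darn", "heck"}


def normalize(word):
    """Lowercase the word and undo the leetspeak substitutions."""
    return word.lower().translate(LEET_TABLE)


def is_blocked(word):
    """Compare the normalized word against the block list (plain string lookup)."""
    return normalize(word) in BLOCKED_WORDS


print(is_blocked("D4rN"))  # True
print(is_blocked("h3ck"))  # True
print(is_blocked("deck"))  # False
```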
Installation
You can easily install the better-profanity library with this pip command:

    pip install better_profanity

If the package is already installed, pip reports that the requirement is already satisfied.
better-profanity library example
We take the example code from https://pypi.org/project/better-profanity/.
The following code is a basic example that detects and censors swear words in a sentence using the better-profanity library:

    from better_profanity import profanity

    if __name__ == "__main__":
        # Load the default word list before censoring.
        profanity.load_censor_words()

        text = "You p1ec3 of sHit."
        censored_text = profanity.censor(text)
        print(censored_text)
better-profanity more advanced examples
Censor ignores word dividers:
The .censor() function also catches words separated by dividers other than spaces, such as _, , and ., but not @, $, *, " or '.

    from better_profanity import profanity

    if __name__ == "__main__":
        # The swear words here are separated by dots, underscores, and commas.
        text = "...sh1t...hello_dog_fuck,,,,123"

        censored_text = profanity.censor(text)
        print(censored_text)
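The divider handling described above can be approximated as follows. This is a sketch with a hypothetical, mild block list: it splits the text into alternating runs of letters/digits and dividers, masks the blocked runs, and reassembles the rest unchanged. It treats every non-alphanumeric character as a divider, which is a simplification of the behavior described for better-profanity.

```python
import re

# Hypothetical, deliberately mild block list.
BLOCKED_WORDS = {"darn", "heck"}


def censor(text, mask="*"):
    """Mask blocked words regardless of which divider separates them."""
    # The capturing group keeps the dividers in the result list, so the
    # original punctuation can be put back exactly as it was.
    parts = re.split(r"([^A-Za-z0-9]+)", text)
    return "".join(
        mask * len(part) if part.lower() in BLOCKED_WORDS else part
        for part in parts
    )


print(censor("...darn...hello_dog_heck,,,,123"))
```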
Censor swear words with custom characters:
The character passed as the second argument to .censor() is used to replace the swear words.

    from better_profanity import profanity

    if __name__ == "__main__":
        text = "You p1ec3 of sHit."

        # Use '-' instead of the default mask character.
        censored_text = profanity.censor(text, '-')
        print(censored_text)
You can find more examples at https://pypi.org/project/better-profanity/
Wordlists
The better-profanity library ships with a default list of 916 profane words. It is a broad word list and a big improvement over the one in the profanity library. You can find it at a path like this one (shown here for an Anaconda installation on Windows):

    C:/Users/YOUR_USERNAME/anaconda3/Lib/site-packages/better_profanity/profanity_wordlist.txt

Because the words are highly distasteful, we have censored them here:
Limitations
The censor can be bypassed simply by adding extra characters to a word, because the library compares each word character by character. For example:

    profanity.censor('I just have sexx')

    profanity.censor('jerkk off')
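One way to narrow this particular gap locally is to collapse runs of repeated characters before comparing. The sketch below assumes a hypothetical, mild block list and is not part of either library. It also carries a trade-off: two different words that differ only by a doubled letter become indistinguishable, so it can produce false positives.

```python
import re

# Hypothetical, deliberately mild block list.
BLOCKED_WORDS = {"darn", "heck"}


def collapse_repeats(word):
    """Lowercase the word and shrink each run of a repeated character to one."""
    return re.sub(r"(.)\1+", r"\1", word.lower())


# Collapse the block list the same way so both sides are comparable.
COLLAPSED_BLOCKED = {collapse_repeats(w) for w in BLOCKED_WORDS}


def is_blocked(word):
    """Return True if the word matches a blocked word after collapsing repeats."""
    return collapse_repeats(word) in COLLAPSED_BLOCKED


print(is_blocked("darnn"))   # True
print(is_blocked("heeeck"))  # True
print(is_blocked("dart"))    # False
```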
Compare this with the Bad Words API, which handles this case more gracefully.
How To Do Profanity Check With APILayer’s Bad Words API?
Bad Words API detects bad words and swear words by performing a profanity check on a given text.
Bad Words API is an advanced profanity filter based on English phonetics (how words sound). You can use it to find bad or swear words in a text, and it also censors the detected words by applying a mask to them.
It is an intelligent filter. It can catch deliberate misspellings and also detects social media acronyms (see the “Some tricky use cases” subsection for a demo).
Bad Words API is built and maintained by APILayer, an Austrian technology company. To find out more about us and the reliable, affordable APIs we make for developers and startups, you can browse all our products here.
API Access Key & Authentication
First, create your APILayer account, and sign in.
Browse our API Marketplace with 80 powerful APIs, go straight to the AI & Machine Learning section, and choose Bad Words API.
Subscribe for free, choose your subscription plan, or contact us here if you have a special request.
Bad Words API uses API keys to authenticate requests. You can view and manage the API keys in your account dashboard.
API keys carry a variety of privileges, so be sure to protect them! Avoid disclosing your private API keys in places that are open to the public, such as GitHub or client-side code.
All API requests must include a custom HTTP header called “apikey”. The implementation differs in each programming language. This post covers the Python implementation.
Requests for APIs must always be sent over HTTPS. Calls made using plain HTTP will not succeed. Requests to APIs without authentication will also be rejected.
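The rules above (keep the key out of source code, send it in the “apikey” header, use HTTPS only) might be enforced with a couple of small helpers. This is a sketch: the environment variable name APILAYER_API_KEY and both function names are assumptions made for illustration, not part of the API.

```python
import os
from urllib.parse import urlsplit

# Hypothetical environment variable name; pick whatever fits your deployment.
API_KEY_ENV_VAR = "APILAYER_API_KEY"


def build_auth_headers(env_var=API_KEY_ENV_VAR):
    """Read the API key from the environment and build the request headers."""
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    # The API expects the key in a custom header called "apikey".
    return {"apikey": api_key}


def require_https(url):
    """Reject plain-HTTP URLs before any request is sent."""
    if urlsplit(url).scheme != "https":
        raise ValueError(f"API requests must use HTTPS: {url}")
    return url
```

Reading the key from the environment keeps it out of version control, so the script itself can be shared or committed safely.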
What Bad Words API endpoints are available?
POST method
Use the POST method to detect bad words, swear words, and profanity in a given text.
The parameters for a POST request are as follows:
Parameter | Required | Location | Data type | Description |
body | Yes | Body | string | Text to perform the profanity check on |
censor_character | Yes | Query | string | The censor character to apply. Defaults to *. |
Basic cURL example
Run this simple cURL command to use the Bad Words API to filter profanity on the given text:
    curl --location --request POST \
      "https://api.apilayer.com/bad_words?censor_character=censor_character" \
      --header 'apikey: YOUR_API_KEY' \
      --data-raw "You smell like shit!"
Basic usage with Python
Go to the Bad Words API Live Demo.
Enter your text in the “HTTP Body” text box, click the “Run Code” button, and then you can see the complete response in the output console below:
Here is the complete Python code:

    import requests

    # Replace the censor_character value with the mask character you want.
    url = "https://api.apilayer.com/bad_words?censor_character=censor_character"

    # The text to check, encoded as UTF-8 bytes.
    payload = "You%20smell%20like%20shit.".encode("utf-8")
    headers = {
        "apikey": "YOUR_API_KEY"
    }

    response = requests.request("POST", url, headers=headers, data=payload)

    status_code = response.status_code
    result = response.text

You can fill the “payload” variable with any text you want, followed by .encode("utf-8") to encode it as UTF-8 bytes.
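If you want to vary the text and the censor character, it can help to build the URL and body in one place. The sketch below only prepares the request pieces and sends nothing. The helper name build_request is hypothetical, and sending the text as plain UTF-8 (as in the cURL example) is an assumption about what the API accepts.

```python
from urllib.parse import urlencode

BASE_URL = "https://api.apilayer.com/bad_words"


def build_request(text, censor_character="*"):
    """Return the request URL and the UTF-8 encoded body for a profanity check."""
    # urlencode percent-escapes the mask character so that symbols such as
    # '#' cannot break the query string.
    query = urlencode({"censor_character": censor_character})
    return f"{BASE_URL}?{query}", text.encode("utf-8")


url, body = build_request("You smell like roses.", "#")
print(url)
print(body)
```

The returned pair can then be passed to requests.request("POST", url, headers=headers, data=body) exactly as in the code above.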
What does the output mean?
The output of Bad Words API is much more informative than the default output of the pure Python libraries (profanity and better-profanity). It reports the position of each bad word and how far it deviates from the actual bad word (for example, through a typo). If there is a deviation, it also gives the original bad word. Finally, it shows the total number of bad words and returns the censored text.
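As a rough illustration of working with such a response, the sketch below parses a JSON body and summarizes it. The sample body and every field name in it are hypothetical, written only to mirror the description above. Check them against the real output of the API before relying on them.

```python
import json

# Hypothetical response body. The field names are assumptions for
# illustration, not a confirmed schema of the Bad Words API.
SAMPLE_RESPONSE = """
{
  "content": "What the hekc",
  "censored_content": "What the ****",
  "bad_words_total": 1,
  "bad_words_list": [
    {"word": "hekc", "original": "heck", "deviations": 1, "start": 9, "end": 13}
  ]
}
"""


def summarize(response_text):
    """Turn the (assumed) JSON response into a few human-readable lines."""
    data = json.loads(response_text)
    lines = [f"{data['bad_words_total']} flagged word(s)"]
    for hit in data["bad_words_list"]:
        lines.append(
            f"'{hit['word']}' at {hit['start']}-{hit['end']} "
            f"(matches '{hit['original']}', deviations: {hit['deviations']})"
        )
    lines.append(f"Censored: {data['censored_content']}")
    return lines


for line in summarize(SAMPLE_RESPONSE):
    print(line)
```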
Some tricky use cases
The API catches deliberate misspellings and also detects social media acronyms.
For example:
fck is caught, but frck is not.
shet is caught, as it should be, but shot is not.
Bad Words API can even handle characters added to bypass the filter:
You can view the full source code of this article here [6]!
What Are The Common Industrial Implementations For Bad Words API?
1. eCommerce websites
You can use the Bad Words API to filter inappropriate comments or text from your website. You can also use it to automatically approve comments that do not contain any bad words. This is a great help for hardworking site moderators.
2. Chat rooms, online forums, social platforms, and dating services
One of the best uses of the Bad Words API is in online chat rooms and forums. The most disturbing problems on these sites are hate speech, abuse, bad words, and other toxic behaviors and illegal activities.
These sites can also be moderated using our API. When someone comments with offensive words, the API detects them, allowing you to keep the conversation civil.
3. Online games
Online gaming is becoming increasingly popular, and many games include social features such as online chat.
Many players are children, and chat is often unrestricted. Unless it is moderated, abusive language spreads widely in online games.
Please read this article for more information about the possible industrial implementations for Bad Words API:
Are You Ready To Create A Healthier Internet Using Bad Words API?
When it comes to moderating toxic behavior online, you shouldn’t have to do it alone. APILayer’s Bad Words API does the heavy lifting for you!
So if you run a community site, a forum, or even a personal blog, you can use Bad Words API to automatically detect bad words and censor them without interrupting the user experience. We need to create a healthier Internet for everyone, don’t we?
Head over and sign up for free to start empowering your apps with bad words filtering API today!
References
[1] https://github.com/ben174/profanity
[2] https://github.com/ben174
[3] https://github.com/snguyenthanh/better_profanity
[4] https://en.wikipedia.org/wiki/Leet
[5] https://github.com/snguyenthanh
[6] https://github.com/MuhammadAzizulHakim/apilayerBlog-repo/tree/main/Article17%20-%20Bad%20Words%20API%20%2B%20Python