Everyone will acknowledge that spam emails are a constant nuisance. Spam is a regular interruption in our daily lives, and we have to spend time opening and deleting it. In some cases, spam also poses a severe threat to our systems and can cripple our networks. Spamming in today’s digital era is a billion-dollar industry, and some companies even use it as a professional technique to promote their services.
To avoid such spam, internet users turn to various spam filtering techniques and spam-free channels. There are several categories of filtering options to choose from. Below we discuss the best spam filtering methods, which together cover the scope of current spam problems.
List-Based Anti-Spam Filters
This type of spam filtering works by checking senders against a preset list that marks them as either spammers or trusted senders. The lists are created by the user or by an organization’s system administrator. Only messages from trusted origins are allowed to reach your inbox.
Blacklist
With this anti-spam filtering method, a preset list of potentially spammy IP addresses and email addresses is built. The system blocks messages from senders on the list and lets messages from all other senders through. When a new message arrives, the spam filter cross-checks its sender against the list, and if the sender is on the blacklist, the message is marked as spam and rejected.
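The blacklist check described above can be sketched in a few lines of Python. This is only an illustration: the names `BLOCKED_SENDERS`, `BLOCKED_IPS`, `is_blacklisted` and `handle_message` are hypothetical, and the addresses come from ranges reserved for documentation.

```python
# Hypothetical blacklist data; a real filter would load this from a maintained list.
BLOCKED_SENDERS = {"promo@spam.example"}
BLOCKED_IPS = {"203.0.113.7"}


def is_blacklisted(sender: str, source_ip: str) -> bool:
    """Return True if the sender address or the source IP is on the blacklist."""
    return sender.strip().lower() in BLOCKED_SENDERS or source_ip in BLOCKED_IPS


def handle_message(sender: str, source_ip: str) -> str:
    """Reject blacklisted senders and let everyone else through."""
    return "reject" if is_blacklisted(sender, source_ip) else "deliver"


print(handle_message("Promo@Spam.example", "198.51.100.1"))
print(handle_message("alice@example.com", "198.51.100.1"))
```

Note that anything not on the list is delivered, which is why a blacklist only helps against senders that are already known.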
Whitelist
This anti-spam filter works almost exactly opposite to blacklisting. With this type of filter, the user keeps a list of senders who are allowed to send emails. Most spam filters offer this as an additional feature alongside the primary blacklisting method.
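As a rough sketch of a whitelist used as a supplement to a blacklist, the Python below checks the whitelist first. The precedence rule (a whitelisted sender is always delivered, even if also blacklisted) and the name `classify_sender` are assumptions made for illustration, not something a particular product is known to do.

```python
def classify_sender(sender: str, allowed: set, blocked: set) -> str:
    """Classify a message by sender, letting the whitelist override the blacklist."""
    address = sender.strip().lower()
    if address in allowed:
        # Assumption: trusted senders are delivered no matter what else is listed.
        return "deliver"
    if address in blocked:
        return "reject"
    # Unknown senders pass through when the whitelist is only a supplement.
    return "deliver"


allowed = {"alice@example.com"}
blocked = {"alice@example.com", "promo@spam.example"}
print(classify_sender("alice@example.com", allowed, blocked))
print(classify_sender("promo@spam.example", allowed, blocked))
```

A strict whitelist-only filter would instead return "reject" in the final branch, so that only listed senders get through.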
Greylist
This is a newer method among list-based filters. As the name implies, it sits somewhere between the two list-based methods above. It maintains neither a blacklist nor a whitelist; instead, it relies on the observation that many spammers send junk email in bulk only once and do not retry. A greylist treats every unknown sender as a potential spammer. It temporarily turns away mail from unknown senders; if the sending server attempts to deliver the mail a second time, the mail is allowed into your inbox.
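The retry logic behind greylisting can be sketched as follows. The `Greylist` class is hypothetical, and two details are assumptions rather than something stated above: senders are identified by the triple of source IP, sender address and recipient address, and a retry only counts after a minimum delay (`min_delay_seconds`).

```python
import time


class Greylist:
    """Defer the first delivery attempt from an unknown sender; accept a later retry."""

    def __init__(self, min_delay_seconds: float = 300.0):
        self.min_delay = min_delay_seconds
        self.first_seen = {}  # (ip, sender, recipient) -> time of first attempt

    def check(self, source_ip: str, sender: str, recipient: str, now: float = None) -> str:
        now = time.time() if now is None else now
        key = (source_ip, sender.lower(), recipient.lower())
        first = self.first_seen.get(key)
        if first is None:
            # First contact: remember it and ask the sending server to try again later.
            self.first_seen[key] = now
            return "defer"
        if now - first < self.min_delay:
            # Retried too quickly; keep deferring (assumed policy).
            return "defer"
        return "accept"


greylist = Greylist(min_delay_seconds=60)
print(greylist.check("198.51.100.1", "bob@example.org", "me@example.com", now=0))
print(greylist.check("198.51.100.1", "bob@example.org", "me@example.com", now=120))
```

Legitimate mail servers normally queue deferred mail and retry, so the cost to real senders is a delivery delay rather than a lost message.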
Content-Based Anti-Spam Filters
Instead of blocking or allowing emails from a particular IP address or email ID, content-based filters scan and evaluate the words and phrases in emails to assess their authenticity. They are also known as rule-based filters because they review the content of email messages against preset rules and policies, using artificial intelligence and machine learning tools, to decide whether the messages are spam.
Word-Based Filters
A word-based filter is a simple content-based filter that blocks emails containing certain predefined words. Some terms are common in spam messages but are rarely used in, or are unsuitable for, personal or business communication. The word list needs to be created and then updated regularly.
This type of filter tends to generate false positives, so the terms on the list have to be chosen carefully. For instance, if you choose to junk the word ‘discount’, you may block legitimate emails containing that term along with the spam.
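A minimal sketch of such a word-based filter, including the false positive on ‘discount’, might look like this. The word list and the function name `contains_blocked_word` are made up for illustration.

```python
import re

# Hypothetical block list; in practice it has to be maintained and updated regularly.
BLOCKED_WORDS = {"lottery", "jackpot", "discount"}


def contains_blocked_word(text: str, blocked: set = BLOCKED_WORDS) -> bool:
    """Return True if any whole word in the message is on the block list."""
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not words.isdisjoint(blocked)


print(contains_blocked_word("You have won the lottery jackpot!"))
# A legitimate message is caught too, because it mentions a discount.
print(contains_blocked_word("Your student discount has been applied to the invoice."))
print(contains_blocked_word("Minutes from Monday's meeting attached."))
```

Matching on whole words rather than substrings avoids some accidental hits, but it cannot tell a spam use of a word from a legitimate one.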
Heuristic Filters
This type of filter is a step up from simple word-based filters. Rather than looking at a single word, it takes many terms into consideration. Heuristic filters use algorithms to identify spam terms and score emails based on their content. Points are assigned to terms according to how often they are used in spam emails: suspicious words score high, while ordinary words that also appear in legitimate emails score only a few points. The filter adds up the points, and if the total reaches a pre-established threshold, it considers the email to be spam and sends it to quarantine.
Heuristic filters are efficient and add little delay to email delivery. However, they also tend to cause false positives. Smart attackers may even learn to bypass the filter by carefully choosing the terms in their messages.
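The scoring scheme used by heuristic filters can be sketched as below. The term weights in `TERM_SCORES` and the value of `SPAM_THRESHOLD` are invented for the example; real filters tune such values against actual mail.

```python
import re

# Hypothetical weights: suspicious terms score high, ordinary terms score low.
TERM_SCORES = {"winner": 3.0, "free": 2.0, "urgent": 1.5, "offer": 1.0, "meeting": 0.1}
SPAM_THRESHOLD = 5.0  # assumed cut-off


def spam_score(text: str) -> float:
    """Add up the points of every scored term that occurs in the message."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(TERM_SCORES.get(word, 0.0) for word in words)


def verdict(text: str) -> str:
    """Quarantine the message once its total score reaches the threshold."""
    return "quarantine" if spam_score(text) >= SPAM_THRESHOLD else "deliver"


print(verdict("URGENT: you are a winner, claim your free offer"))
print(verdict("Are you free for a meeting tomorrow?"))
```

The second message contains the word "free" but stays well under the threshold, which shows how scoring several terms together is more forgiving than blocking on a single word.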
Bayesian Filters
Bayesian filters are among the most advanced content-based filters. They use mathematical probability to determine the legitimacy of an email. Initially, the system administrator has to train the spam filter by manually flagging messages, showing it how to identify spam. Over time, the filter learns and builds two word lists, one for legitimate mail and one for spam. For instance, if a word has appeared many times in spam messages, a message containing it is more likely to be spam, so the filter adds the word to its spam list. The lists are built up continually over time. Later on, the filter works on its own, scanning emails and evaluating their contents against its word lists to determine their authenticity.
Even though a Bayesian filter is very effective and efficient in the long run, it demands a lot of patience from the system administrator, who has to manually flag and delete spam messages until the filter is capable of working efficiently on its own.
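To make the training and scoring steps concrete, here is a small naive Bayes sketch in Python. It is one common way to build a Bayesian filter, not necessarily how any particular product works. The class `NaiveBayesSpamFilter`, the labels "spam" and "ham" (legitimate mail), and the add-one smoothing are all assumptions chosen for the example.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list:
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesSpamFilter:
    def __init__(self):
        # One word list per class, built up from manually flagged messages.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text: str, label: str) -> None:
        """Record a manually flagged message; label is 'spam' or 'ham'."""
        self.message_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | text) from the word lists learned so far."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_messages = sum(self.message_counts.values())
        log_scores = {}
        for label in ("spam", "ham"):
            # Smoothed class prior, so an untrained class never yields log(0).
            log_p = math.log((self.message_counts[label] + 1) / (total_messages + 2))
            total_words = sum(self.word_counts[label].values())
            for word in tokenize(text):
                count = self.word_counts[label][word]
                # Add-one smoothing keeps unseen words from zeroing the product.
                log_p += math.log((count + 1) / (total_words + len(vocab) + 1))
            log_scores[label] = log_p
        # Clamp the difference so math.exp cannot overflow on long messages.
        diff = max(min(log_scores["ham"] - log_scores["spam"], 700.0), -700.0)
        return 1.0 / (1.0 + math.exp(diff))


spam_filter = NaiveBayesSpamFilter()
spam_filter.train("win a free prize now", "spam")
spam_filter.train("free prize claim now", "spam")
spam_filter.train("meeting agenda for monday", "ham")
spam_filter.train("lunch on monday", "ham")
print(spam_filter.spam_probability("claim your free prize") > 0.5)
print(spam_filter.spam_probability("monday meeting agenda") > 0.5)
```

With only four training messages the estimates are crude, which mirrors the point above: the filter becomes reliable only after a long period of manual flagging.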