Tips for Mitigating Spam and Other Abuse on Drupal Websites

by Matt Moen Wednesday, November 16, 2016

The CAN-SPAM Act, intended to mitigate unsolicited email and penalize those who engage in it, was passed in 2003, yet even as of this writing in 2016, unsolicited marketing hawking everything from meds to bootleg content to Nigerian prince scams continues to be an issue. Modern email systems, such as the cloud services from Google, Microsoft, and others, have virtually eliminated the garbage from my inbox. As purveyors of web services and hosting, however, we see those same actors still trying to get their messages through by submitting contact forms on websites.

Spam submitted to websites used to be aimed mostly at generating traffic through links posted in comments. These days, bots will hit any form they can find on your website (contact forms, lead generation forms, signups, etc.) on the assumption that each submission will generate a notification, so the spam will end up in someone's inbox.

The challenge in mitigating these unwanted submissions is that the forms exist to make it easy for visitors to submit, and most measures that would curtail the spam would also discourage legitimate visitors. So we must find the least intrusive measure that mitigates the specific attacks we are seeing.

Least Intrusive: Honeypot

The Drupal Honeypot module is probably the least intrusive approach, and it can be effective against the unsophisticated bots that produce a good portion of form-submission spam. The principle is that the bot wants to fill out every field on your form, both so the submission isn't rejected for validation errors and because every field is an opportunity to spam some links. But the bots aren't sophisticated: they don't read instructions, and they don't understand CSS. So what say we add a field to forms that is irresistible to these bots, but that humans won't fill out?

Honeypot does just that. It adds a field with a name of your choice, hidden by CSS so normal users never even see the protection, along with a label (also hidden by CSS) telling users of assistive technologies such as screen readers to leave the field empty. Bots can't help themselves and fill it in, and Honeypot rejects the submission.
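The mechanism is simple enough to sketch outside of Drupal. The Python below is a minimal, framework-agnostic sketch, not the Honeypot module's code; the field name `homepage_url`, the inline CSS, and the `is_bot_submission` and `handle_submission` helpers are all hypothetical.

```python
# Hypothetical field name: pick something a bot finds tempting to fill in.
HONEYPOT_FIELD = "homepage_url"

# Markup for the trap field. It is moved off-screen (rather than display: none)
# so sighted users never see it, while screen readers can still read the label.
HONEYPOT_HTML = f"""
<div style="position: absolute; left: -10000px;">
  <label for="{HONEYPOT_FIELD}">Leave this field blank</label>
  <input type="text" id="{HONEYPOT_FIELD}" name="{HONEYPOT_FIELD}" autocomplete="off">
</div>
"""


def is_bot_submission(form_data: dict[str, str]) -> bool:
    """Return True if the hidden trap field was filled in."""
    # Humans never see the field, so any non-blank value means a bot filled it.
    return form_data.get(HONEYPOT_FIELD, "").strip() != ""


def handle_submission(form_data: dict[str, str]) -> str:
    if is_bot_submission(form_data):
        return "rejected"
    # ...otherwise send the notification email, save the lead, etc.
    return "accepted"
```

Note that the check costs legitimate visitors nothing, which is exactly why it is the first line of defense; it only stops bots that fill in every field they find.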

Next Up: Text Analysis

Analyzing the actual content of submissions to determine how spammy they are is one of the keys to making server-side email spam protection work: spam follows simple, obvious patterns, and computers can easily be taught those patterns (or learn them themselves!) and squish the spam.
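To make "learn them themselves" concrete, here is a toy naive Bayes classifier, a classic technique for email spam filtering. This is not how Mollom or any particular service works internally; it is an illustrative sketch, and the class name and tiny training set are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase the text and split it into simple word tokens.
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayesSpamFilter:
    """Toy multinomial naive Bayes with add-one (Laplace) smoothing.

    Assumes at least one training example for each label before classifying.
    """

    LABELS = ("spam", "ham")

    def __init__(self) -> None:
        self.word_counts = {label: Counter() for label in self.LABELS}
        self.doc_counts = {label: 0 for label in self.LABELS}

    def train(self, text: str, label: str) -> None:
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def _log_score(self, tokens: list[str], label: str, vocab_size: int) -> float:
        total_docs = sum(self.doc_counts.values())
        # Log prior: how common this label is in the training data.
        score = math.log(self.doc_counts[label] / total_docs)
        counts = self.word_counts[label]
        total_words = sum(counts.values())
        for tok in tokens:
            # Add-one smoothing so an unseen word doesn't zero out the score.
            score += math.log((counts[tok] + 1) / (total_words + vocab_size))
        return score

    def classify(self, text: str) -> str:
        tokens = tokenize(text)
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        scores = {
            label: self._log_score(tokens, label, len(vocab)) for label in self.LABELS
        }
        return max(scores, key=scores.get)


if __name__ == "__main__":
    f = NaiveBayesSpamFilter()
    f.train("cheap meds online buy now", "spam")
    f.train("buy cheap bootleg movies now", "spam")
    f.train("can you send me a quote for the website redesign", "ham")
    f.train("question about your hosting plans and pricing", "ham")
    print(f.classify("buy cheap meds now"))  # spam
    print(f.classify("quote for hosting"))   # ham
```

A real filter trains on many thousands of messages and uses richer features (links, sender reputation, and so on), but the principle is the same: words that show up mostly in spam push a submission toward the spam label.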

In Drupal, the Mollom module provides this service. Set up an account and register your site on the Mollom website, install and configure the module in Drupal, and indicate which forms should be protected and how; Mollom will then use its toolkit to mitigate spam on those forms.

A few things are important to note. First, only forms with sufficiently open-ended text area fields can be analyzed for spam, so your contact form with a big ole "comments" field will definitely get analyzed, but a newsletter signup form will not. Second, Mollom is smart: in text analysis mode it imposes nothing on the user at all (an important part of our goal of not reducing legitimate submissions), and where text analysis isn't possible, it falls back to the CAPTCHA method (described below). Likewise, if a form does use text analysis and Mollom finds a submission spammy, rather than rejecting it outright, Mollom lets the user answer a CAPTCHA to continue.
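The fallback behavior described above boils down to a small decision function. This is a hypothetical sketch of that logic, not Mollom's API; `choose_protection` is an illustrative name, and `looks_spammy` stands in for whatever classifier the service actually uses.

```python
from typing import Callable


def choose_protection(
    has_open_text_field: bool,
    text: str,
    looks_spammy: Callable[[str], bool],
) -> str:
    """Return "accept" or "require_captcha" for a submission (hypothetical helper)."""
    if not has_open_text_field:
        # Nothing substantial to analyze (e.g. a newsletter signup): fall back to a CAPTCHA.
        return "require_captcha"
    if looks_spammy(text):
        # Don't reject outright: a human who was misjudged can still get through.
        return "require_captcha"
    # Clean text: no friction at all for the visitor.
    return "accept"
```

The design choice worth noting is that a "spammy" verdict escalates to a challenge instead of a rejection, so a false positive costs a legitimate visitor a few seconds rather than a lost submission.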

The Nuclear Option: CAPTCHAs

Quoth Wikipedia:

A CAPTCHA (a backronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge-response test used in computing to determine whether or not the user is human.

A CAPTCHA is just a puzzle meant to be (relatively) easy for humans but difficult for computers to solve. Most commonly, we see simple, plain-English questions with obvious answers ("What color is the sky?" or "What is one plus eight?"), or those distorted images where the visitor is supposed to read the numbers and/or letters. These rely on problems that are relatively hard for computers: natural language understanding and image recognition.
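A plain-English math question like the one above is easy to picture in code. The sketch below is a hypothetical illustration, not the Drupal CAPTCHA module's implementation; the function names are made up. The key design point is that the expected answer stays on the server (for example, in the user's session) rather than in a hidden form field, where a bot could simply read it.

```python
import random
from typing import Optional

WORDS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]


def make_math_challenge(rng: Optional[random.Random] = None) -> tuple[str, int]:
    """Return (question, expected_answer). Store the answer server-side only."""
    # SystemRandom draws from the OS, so bots can't predict the next question.
    rng = rng or random.SystemRandom()
    a = rng.randint(1, 9)
    b = rng.randint(1, 9)
    # Spell the numbers out so the question is plain English, not a bare formula.
    return f"What is {WORDS[a]} plus {WORDS[b]}?", a + b


def check_answer(submitted: str, expected: int) -> bool:
    """Accept a numeric answer, tolerating surrounding whitespace."""
    try:
        return int(submitted.strip()) == expected
    except ValueError:
        return False
```

Questions this simple are exactly the kind that a more sophisticated bot can defeat, which is the trade-off discussed next.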

The downside is that CAPTCHAs require the user to do something. That something can be simple, which usually means it's easier for a more sophisticated bot to defeat, or it can border on annoying, like those image CAPTCHAs where the letters or digits are barely discernible from the noise. Either way, CAPTCHAs will reduce your legitimate submissions, so they are a last resort.

The Drupal CAPTCHA module is a framework that lets admins use a number of different CAPTCHA methods, from the built-in math questions to third-party services like Google's reCAPTCHA.

reCAPTCHA

reCAPTCHA is worth calling out specifically. The original service combined spam protection with a mission to digitize books: it showed users images of scanned words and had them type what they saw. Because one of the words in each challenge was already known, the system could check the answer against it and let users through based on some threshold logic, while at the same time collecting human input on words that OCR software couldn't read.

While the mission was novel, the problem remained: users still had to complete an annoying image challenge. More recently, reCAPTCHA released a new version that analyzes user behavior via JavaScript and initially presents only a checkbox saying "I'm not a robot." If the user seems likely to be a robot, it presents an image-matching challenge in which users must choose the images from a set that match a statement (for example, "Select all images that contain food."). This is an easier ask than character recognition on a distorted image.

Even better, in many cases the user won't be asked to complete any additional challenge at all, based on reCAPTCHA's analysis of their behavior on the page. That's about as unobtrusive as you can get while still requiring some user action or acknowledgement.

reCAPTCHA is available as a submodule for the Drupal CAPTCHA framework.
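For the curious, the server-side half of reCAPTCHA is a single HTTP call: the widget adds a token to the form in the `g-recaptcha-response` field, and your server POSTs that token, along with your secret key, to Google's `siteverify` endpoint, which returns JSON with a `success` flag. The Drupal submodule does this for you; the standard-library Python below is only a sketch of that exchange, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request
from typing import Optional

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def build_verify_request(
    secret: str, token: str, remote_ip: Optional[str] = None
) -> urllib.request.Request:
    """Build the POST request that asks Google whether a token is valid."""
    params = {"secret": secret, "response": token}
    if remote_ip:
        params["remoteip"] = remote_ip  # optional: the visitor's IP address
    data = urllib.parse.urlencode(params).encode()
    return urllib.request.Request(VERIFY_URL, data=data)  # data present => POST


def parse_verify_response(body: bytes) -> bool:
    """Return True only if the JSON response explicitly reports success."""
    return json.loads(body).get("success") is True


def verify_recaptcha(secret: str, token: str, remote_ip: Optional[str] = None) -> bool:
    req = build_verify_request(secret, token, remote_ip)
    try:
        with urllib.request.urlopen(req, timeout=5) as resp:
            return parse_verify_response(resp.read())
    except (OSError, ValueError):
        # Network errors or malformed JSON: fail closed and treat as unverified.
        return False
```

Failing closed means an outage at Google blocks submissions rather than letting bots through; a site that values every lead might reasonably choose the opposite.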

Abuse by Non-bots

A final consideration is abuse submitted by humans. Increasingly, we see spammers using the equivalent of services like Amazon's Mechanical Turk to have people submit spam to website forms by hand. There is no truly effective way to block these submissions: the people sending them aren't bots, so they have all the same ability to get past these protections as your intended audience. When content is sufficiently spammy, Mollom or a similar service may block it outright instead of falling back to CAPTCHA verification, but that's not the typical case.

Another place we've seen this is with customers who have open-ended payment processes such as donation forms. Capturing those donations is obviously a high priority, so obstacles to completing the transaction are to be avoided, and we want to allow visitors to donate any amount. But we've found that thieves use these forms to test stolen credit cards, donating small amounts to find out which cards are valid. This results in a flood of transactions that must be voided, or that might otherwise result in costly chargebacks for the website owner.

Address verification and other fraud prevention services on your payment gateway are crucial to preventing these attacks. You'll still be charged transaction fees, but at least you won't have to manually reverse completed fraudulent transactions. We've also implemented specific measures to defeat such attempts, such as requiring a minimum amount, limiting submissions by IP address, and responding to patterns in the fraudulent transactions. These are by necessity custom solutions, and we haven't found a good way to generalize them into something broadly useful.
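Two of those measures, a minimum amount and a per-IP limit, are simple to express. This is a hypothetical, in-memory sketch: the $5 minimum, three attempts per hour, and the `DonationGuard` class are illustrative values and names, not what we deployed. A real site would keep the counters in shared storage such as the database so the limits hold across web servers; Drupal core's flood control API is worth a look for a similar purpose.

```python
import time
from collections import defaultdict, deque
from typing import Callable, Optional

MIN_DONATION_CENTS = 500     # hypothetical $5.00 minimum
MAX_ATTEMPTS_PER_WINDOW = 3  # hypothetical limit per IP address
WINDOW_SECONDS = 3600        # hypothetical one-hour sliding window


class DonationGuard:
    """Reject tiny donations and throttle repeated attempts from one IP."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self.clock = clock  # injectable so the window logic can be tested
        self.attempts: dict[str, deque] = defaultdict(deque)

    def check(self, ip: str, amount_cents: int) -> Optional[str]:
        """Return a rejection reason, or None if the attempt may proceed."""
        now = self.clock()
        recent = self.attempts[ip]
        # Drop timestamps that have slid out of the window.
        while recent and now - recent[0] >= WINDOW_SECONDS:
            recent.popleft()
        if len(recent) >= MAX_ATTEMPTS_PER_WINDOW:
            return "rate_limited"
        # Count every attempt, including small probes rejected below.
        recent.append(now)
        if amount_cents < MIN_DONATION_CENTS:
            return "below_minimum"
        return None
```

Counting below-minimum attempts toward the limit matters here, because small probe amounts are exactly what card testers send.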