While we’re mostly interested in the human visitors to our blogs, it can be fascinating to see what some of the non-human visitors are getting up to, including an assortment of bad bots.
What a 404 error is
Before we get to the main part of this post, I need to quickly explain what a 404 error is. If you try to go to a page that doesn’t exist on a website, you’ll get a 404 error message that basically says nope, there’s nothing there.
As an example, if I try to navigate to mentalhealthathome.org/doofus/, since there is no “doofus” page on my site, the following error message will show up:
A 404 error log
This isn’t something that you can keep track of without a plugin, and you can only use plugins if you’re self-hosted or on the WP.com business plan or higher. Anyway, I use a plugin called Redirection, and one of the things it does is keep a 404 log.
Every time someone tries to go to something on my site that doesn’t exist, it keeps track. It will capture 404 errors from things like me typing in mentalhealthathome.org/doofus, but it will also keep track of bots that come by searching for things that aren’t there.
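If you’re curious what a 404 log boils down to, here’s a minimal sketch in Python. It is not how Redirection actually works under the hood; the list of existing pages and the `handle_request` function are made up purely to illustrate the idea of answering with a 404 and counting the miss.

```python
from collections import Counter

# Hypothetical set of pages that exist on a site (an assumption for this sketch).
EXISTING_PAGES = {"/", "/about/", "/blog/"}

# The "404 log": how many times each missing path has been requested.
not_found_log = Counter()


def handle_request(path: str) -> int:
    """Return an HTTP status code for the requested path, logging any misses."""
    if path in EXISTING_PAGES:
        return 200  # OK: the page exists
    not_found_log[path] += 1  # record the miss
    return 404  # Not Found: nothing there


# One real page, then the same missing page requested twice.
statuses = [handle_request(p) for p in ["/about/", "/doofus/", "/doofus/"]]
print(statuses)                     # [200, 404, 404]
print(not_found_log.most_common())  # [('/doofus/', 2)]
```

A real plugin would also record things like the time of each request and who made it, but the basic idea is the same: anything that comes back as a 404 gets written down.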
That’s where all of this starts to get interesting.
Weird stuff bots come looking for
While there might be the odd 404 error on my site generated by an actual person looking for something (or me making an error with my links), most of what shows up in my error logs is from bots. A lot of them come looking for images and things that used to exist on my site and were included in older versions of the sitemap, but have since been deleted.
There’s also some weird stuff. For all of these examples, assume that there’s a mentalhealthathome.org in front of each of them.
The first one sounds like they were searching for the secret porn corner of my website. I’m not sure what’s going on with the last one; presumably it’s non-English alphabet characters that my plugin can’t understand.
Bad bots, bad bots, whatcha gonna do…
As far as I can gather, the rest are from bad bots cruising around looking for websites to hack.
This isn’t something I know a ton about, but in terms of the very basics, a bot is simply something that moves around the web doing automated things. There are bots that do constructive things, like allowing search engines to see what’s going on around the web so they can actually function as a search engine. There are also bad bots.
The examples above come from bots that are looking for a particular kind of doorway to try to gain access to a website. These examples show up in my 404 log because they’re doorways that don’t exist on my site.
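For anyone who wants to sort through their own 404 log, here’s a rough sketch of how the entries could be split into “probably a bot poking at doorways” and “probably just old content that’s been deleted.” The hint lists and sample paths are made up for illustration and aren’t taken from my log; real probing is far more varied, so treat this as a starting point rather than a reliable detector.

```python
# Made-up path fragments for illustration only; adjust for your own log.
PROBE_HINTS = (".env", ".php", "phpmyadmin", ".git")
STALE_HINTS = (".jpg", ".png", ".gif", "/wp-content/uploads/")


def classify(path: str) -> str:
    """Make a rough guess at why a missing path was requested."""
    lowered = path.lower()
    if any(hint in lowered for hint in PROBE_HINTS):
        return "probable probe"
    if any(hint in lowered for hint in STALE_HINTS):
        return "probably stale content"
    return "unknown"


# Hypothetical 404 log entries.
sample_paths = ["/.env", "/phpMyAdmin/index.php",
                "/wp-content/uploads/old-photo.jpg", "/doofus/"]
for path in sample_paths:
    print(path, "->", classify(path))
```

Checking the probe hints first is a deliberate choice: a path that looks like both a doorway and an old file gets flagged as the more suspicious of the two.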
While I knew in theory that sketchy bots wander around the internet, what’s surprised me since I started using this plugin is how often these sketchy bots are swinging by.
The nice thing about having a WordPress.com plan is that they take care of security, and the essential bits and pieces are held by WP rather than our individual blogs. That means there’s nothing actually breakable or break-in-able on our sites. There’s more on WordPress security here. For people who are self-hosted, it’s up to you and your host to make sure everything’s up to date security-wise for your site.
So rather than be concerned, I’ll just be fascinated. And maybe I should create an xxxss page just for the hell of it.
P.S. – Some asshat just published this post on their site, not as a reblog, but just plain old copyright infringement. They’re not bright, though, because they didn’t remove the internal links, so I got a pingback. Copyright infringement and stupidity aren’t an attractive combination.