Bad Bot Visitors to Your Blog

While we’re mostly interested in the human visitors to our blogs, it can be fascinating to see what some of the non-human visitors are getting up to, including an assortment of bad bots.

What a 404 error is

Before we get to the main part of this post, I need to quickly explain what a 404 error is. If you try to go to a page that doesn’t exist on a website, you’ll get a 404 error message that basically says nope, there’s nothing there.

As an example, if I try to navigate to mentalhealthathome.org/doofus/, since there is no “doofus” page on my site, the following error message will show up:

screenshot of Mental Health @ Home 404 error page: Oops! That page can't be found
mentalhealthathome.org/doofus/
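
For the curious, 404 is one of the standard HTTP status codes that a web server sends back with each response. Here's a minimal Python sketch, using only the standard library, that looks up the official wording behind the number. It isn't tied to WordPress or to my site in any way.

```python
from http import HTTPStatus

# Look up the standard meaning of status code 404.
status = HTTPStatus(404)
print(status.value, status.phrase)
```

The friendlier "Oops!" wording on an error page is just whatever the site's theme chooses to display on top of that code.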

A 404 error log

This isn’t something that you can keep track of without a plugin, and you can only use plugins if you’re self-hosted or on the WP.com Business plan or higher. Anyway, I use a plugin called Redirection, and one of the things it does is keep a 404 log.

Every time someone tries to go to something on my site that doesn’t exist, it keeps track. It will capture 404 errors from things like me typing in mentalhealthathome.org/doofus, but it will also keep track of bots that come by searching for things that aren’t there.
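
If you're self-hosted and have access to your server's raw access log, you can get a rough version of the same tally without a plugin. The sketch below is only an illustration: it assumes log lines in the widely used Common Log Format, the sample lines and IP addresses are made up, and it has nothing to do with how the Redirection plugin stores its own log.

```python
import re
from collections import Counter

# Pulls the requested path and status code out of a Common Log Format line.
# Assumption: the server writes this format; adjust the pattern if yours differs.
LOG_LINE = re.compile(
    r'"(?:GET|POST|HEAD) (?P<path>\S+) [^"]*" (?P<status>\d{3})\b'
)

def count_404s(lines):
    """Return a Counter of requested paths that produced a 404 status."""
    hits = Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match and match.group("status") == "404":
            hits[match.group("path")] += 1
    return hits

# Made-up sample lines for illustration only.
sample = [
    '203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /doofus/ HTTP/1.1" 404 512',
    '203.0.113.9 - - [10/Oct/2020:13:56:01 +0000] "GET /vuln.htm HTTP/1.1" 404 512',
    '203.0.113.9 - - [10/Oct/2020:13:56:02 +0000] "GET /vuln.htm HTTP/1.1" 404 512',
    '203.0.113.7 - - [10/Oct/2020:13:57:10 +0000] "GET / HTTP/1.1" 200 8042',
]

# Print the most frequently requested missing paths first.
for path, count in count_404s(sample).most_common():
    print(count, path)
```

Sorting by count puts the paths that get hit repeatedly at the top, and those tend to be the bot traffic.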

That’s where all of this starts to get interesting.

Weird stuff bots come looking for

While there might be the odd 404 error on my site generated by an actual person looking for something (or me making an error with my links), most of what shows up in my error logs is from bots. A lot of them come looking for images and other things that used to exist on my site and were included in older versions of the sitemap, but have since been deleted.

There’s also some weird stuff. For all of these examples, assume that there’s a mentalhealthathome.org in front of each of them.

  • /xxxss
  • /wp-content/plugins/apikey/apikey.php?test=hello
  • /data/admin/allowurl.txt
  • /vendor/phpunit/phpunit/LICENSE
  • /wp-content/plugins/viral-optins/api/uploader/file-uploader.php
  • /vuln.htm
  • /fckeditor/editor/filemanager/connectors/php/upload.php?Type=Media
  • /wp-content/plugins/hd-webplayer/playlist.php
  • /wp-content/plugins/jekyll-exporter/vendor/phpunit/phpunit/build.xml
  • /wp-content/plugins/cherry-plugin/admin/import-export/upload.php
  • /Keiths.php
  • /�½��ļ���.rar

The first one sounds like they were searching for the secret porn corner of my website. I’m not sure what’s going on with the last one; presumably it’s non-English alphabet characters that my plugin can’t understand.

Bad bots, bad bots, whatcha gonna do…

As far as I can gather, the rest are from bad bots cruising around looking for websites to hack.

This isn’t something I know a ton about, but in terms of the very basics, a bot is simply something that moves around the web doing automated things. There are bots that do constructive things, like allowing search engines to see what’s going on around the web so they can actually function as a search engine. There are also bad bots.

The examples above come from bots that are looking for a particular kind of doorway to try to gain access to a website. These examples show up in my 404 log because they’re doorways that don’t exist on my site.
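
As a rough sketch of how you might sort these out of a 404 log automatically, the Python below flags paths that contain a few telltale substrings. The hint list is my own guess based on the examples above, not a real blocklist, and the function name is hypothetical. It only makes sense for paths that already came back as 404, since a working WordPress site has plenty of legitimate .php URLs.

```python
# Illustrative substrings only, drawn from the example paths in this post.
# Assumption: a real filter would need a far more careful list than this.
SUSPICIOUS_HINTS = (
    "/wp-content/plugins/",
    "phpunit",
    "upload",
    "fckeditor",
    ".php",
)

def looks_like_probe(path):
    """Return True if the path contains any of the hint substrings."""
    lowered = path.lower()
    return any(hint in lowered for hint in SUSPICIOUS_HINTS)

paths = [
    "/doofus/",
    "/vendor/phpunit/phpunit/LICENSE",
    "/wp-content/plugins/cherry-plugin/admin/import-export/upload.php",
    "/Keiths.php",
]

# Keep only the paths that match at least one hint.
flagged = [p for p in paths if looks_like_probe(p)]
for p in flagged:
    print(p)
```

A substring check like this is crude, but it's enough to separate a mistyped link like /doofus/ from the requests that are hunting for a specific plugin file.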

While I knew in theory that sketchy bots wander around the internet, what’s surprised me since I started using this plugin is how often these sketchy bots are swinging by.

The nice thing about having a WordPress.com plan is that they take care of security, and the essential bits and pieces are held by WP rather than our individual blogs. That means there’s nothing actually breakable or break-in-able on our sites. There’s more on WordPress security here. For people who are self-hosted, it’s up to you and your host to make sure everything’s up to date security-wise for your site.

So rather than be concerned, I’ll just be fascinated. And maybe I should create an xxxss page just for the hell of it.

P.S. – Some asshat just published this post on their site, not as a reblog, but just plain old copyright infringement. They’re not bright, though, because they didn’t remove the internal links, so I got a pingback. Copyright infringement and stupidity aren’t an attractive combination.

For tips on blogging basics, check out the New Blogger’s Guide to WordPress.

The Up Your Blogging Game page has more advanced info to take your blogging to the next level.

32 thoughts on “Bad Bot Visitors to Your Blog”

  1. I don’t check for that (or maybe I don’t have access). I have managed to send most of the garbage comments directly to trash by noting their keywords. I hate looking through spam and it takes so much longer with all that in there. Some of the spam has been beating the filter by including only one link. This is super grrrr!

  2. Older Child is interested in how bots skew/drive views and comments on social media. We are sad that exploiting vulnerabilities is anyone’s goal because behind every bot’s attempt to steal is the human who programmed it to do so. Neat information, Ashley.

    1. The bot situation on social media is quite fascinating, although not in a good way. I get followed on Twitter all the time by apparent bots for no obvious reason, but it sounds like they build their networks so they’re already established when the bot programmer wants to start spreading information.

  3. Thanks for the info. There are tons of scammers out there… lots of phishing schemes. It’s just crazy but yes I’m glad WP takes care of it 🙂

  4. /Keiths.php?? There’s some narcissistic bots out there 😂 It’s interesting because I know of the 404 error but not of seeing what errors have been prompted by others/bots searching my site. xx

    1. Yeah I wonder who Keith is… Probably some pimply-faced teenager who’s doing malicious things when he’s in his room because his parents have grounded him!

  5. Hi Ashley, I left a couple of comments on your blog yesterday but received a message to say they couldn’t be embedded into your blog? I am not sure if it means the comment was lost or simply eaten by spam? Have you seen them? I am making a copy of this comment as apparently Melanie has experienced the same thing and says I have to copy it and post it again?

    1. That’s so bizarre. There was nothing in my spam or trash. I’ve never come across that error before. And just now I tried embedding a YouTube video in the comments for one of my posts, and the embed worked without a problem. WordPress really needs to hire a blogger to just blog like a normal person all day long so they can identify all of this stuff.

    1. It looked like a sketchy site rather than an actual blog. And I suppose they didn’t give it enough thought to check for internal links. I found out they were hosted on Bluehost. It seems like they’re a major hosting provider, so I was surprised how difficult it was to find a contact email on their site to report a copyright infringement.

  6. This is such a fascinating subject. Why do they do it? What do they gain? And I love that one might be searching for your secretly hidden porn site! Hilarious
