While we’re mostly interested in the human visitors to our blogs, it can be fascinating to see what some of the non-human visitors are getting up to, including an assortment of bad bots.
What a 404 error is
Before we get to the main part of this post, I need to quickly explain what a 404 error is. If you try to go to a page that doesn’t exist on a website, you’ll get a 404 error message that basically says nope, there’s nothing there.
As an example, if I try to navigate to mentalhealthathome.org/doofus/, since there is no “doofus” page on my site, the following error message will show up:
A 404 error log
This isn’t something that you can keep track of without a plugin, and you can only use plugins if you’re self-hosted or on the WP.com business or pro plan. Anyway, I use a plugin called Redirection, and one of the things it does is keep a 404 log.
Every time someone tries to go to something on my site that doesn’t exist, it keeps track. It will capture 404 errors from things like me typing in mentalhealthathome.org/doofus, but it will also keep track of bots that come by searching for things that aren’t there.
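If you’re curious what a 404 log boils down to, here’s a minimal Python sketch of the idea. It is not how the Redirection plugin actually works; the list of existing pages and the function names are made up for illustration. It just counts every request for a path that doesn’t exist.

```python
from collections import Counter

# Hypothetical set of pages that exist on the site (an assumption for illustration).
EXISTING_PAGES = {"/", "/about", "/blog"}


def normalize(path):
    """Treat /doofus and /doofus/ as the same path."""
    return path.rstrip("/") or "/"


def handle_request(path, log):
    """Return an HTTP status code, recording any miss in the 404 log."""
    path = normalize(path)
    if path in EXISTING_PAGES:
        return 200
    log[path] += 1  # the page doesn't exist, so note the attempt
    return 404


log_404 = Counter()
for requested in ["/about/", "/doofus/", "/doofus", "/Keiths.php"]:
    handle_request(requested, log_404)

# Most frequently requested missing paths first.
print(log_404.most_common())
```

Whether the request comes from a person mistyping a link or from a bot, it lands in the same log, which is why the log ends up being a handy window into bot behaviour.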
That’s where all of this starts to get interesting.
Weird stuff bots come looking for
While there might be the odd 404 error on my site generated by an actual person looking for something (or me making an error with my links), most of what shows up in my error logs is from bots. A lot of them come looking for images and things that used to exist on my site and were included in older versions of the sitemap, but have since been deleted.
There’s also some weird stuff. For all of these examples, assume that there’s a mentalhealthathome.org in front of each of them.
The first one sounds like they were searching for the secret porn corner of my website. I’m not sure what’s going on with the last one; presumably, it’s non-English alphabet characters that my plugin can’t understand.
Bad bots, bad bots, whatcha gonna do…
As far as I can gather, the rest are from bad bots cruising around looking for websites to hack.
This isn’t something I know a ton about, but in terms of the very basics, a bot is simply something that moves around the web doing automated things. There are bots that do constructive things, like allowing search engines to see what’s going on around the web so they can actually function as a search engine. There are also bad bots.
The examples above come from bots that are looking for a particular kind of doorway to try to gain access to a website. These examples show up in my 404 log because they’re doorways that don’t exist on my site.
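As a rough illustration, here’s a Python sketch of how you might pick the probable doorway-hunters out of a list of missing paths. The markers below are my own guesses at the sort of thing these bots ask for, not a real security rule set, and a real bot will try plenty of paths that this wouldn’t catch.

```python
# Hypothetical markers (an assumption for illustration); real probing bots
# try many other paths too.
SUSPICIOUS_MARKERS = (".php", ".env", "wp-login", "xmlrpc")


def looks_like_probe(path):
    """Return True if a missing path contains one of the suspicious markers."""
    lowered = path.lower()
    return any(marker in lowered for marker in SUSPICIOUS_MARKERS)


# Example missing paths; the image name is made up for illustration.
missing_paths = ["/doofus/", "/Keiths.php", "/old-image.jpg"]
probes = [p for p in missing_paths if looks_like_probe(p)]
print(probes)
```

A request for a deleted image or a mistyped page slips through as harmless, while something like a stray .php file gets flagged as a likely break-in attempt.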
While I knew in theory that sketchy bots wander around the internet, what’s surprised me since I started using this plugin is how often these sketchy bots are swinging by.
The nice thing about having a WordPress.com plan is that they take care of security, and the essential bits and pieces are held by WP rather than our individual blogs. That means there’s nothing actually breakable or break-in-able on our sites. There’s more on WordPress security here. For people who are self-hosted, it’s up to you and your host to make sure everything’s up to date security-wise for your site.
So rather than be concerned, I’ll just be fascinated. And maybe I should create an xxxss page just for the hell of it.
26 thoughts on “Bad Bot Visitors to Your Blog”
I don’t check for that (or maybe I don’t have access). I have managed to send most of the garbage comments directly to trash by noting their keywords. I hate looking through spam and it takes so much longer with all that in there. Some of the spam has been beating the filter by including only one link. This is super grrrr!
Older Child is interested in how bots skew/drive views and comments on social media. We are sad that exploiting vulnerabilities is anyone’s goal because behind every bot’s attempt to steal is the human who programmed it to do so. Neat information, Ashley.
The bot situation on social media is quite fascinating, although not in a good way. I get followed on Twitter all the time by apparent bots for no obvious reason, but it sounds like they build their networks so they’re already established when the bot programmer wants to start spreading information.
And by information, I mean misinformation
Thanks for the info. There are tons of scammers out there… lots of phishing schemes. It’s just crazy but yes I’m glad WP takes care of it 🙂
/Keiths.php?? There’s some narcissistic bots out there 😂 It’s interesting because I know of the 404 error but not of seeing what errors have been prompted by others/bots searching my site. xx
Yeah, I wonder who Keith is… Probably some pimply-faced teenager who’s doing malicious things when he’s in his room because his parents have grounded him!
Thank you Ashley, you answered some things that I have been experiencing on my blog.
I do get those messages about website has been deleted. I was wondering how they can log into WordPress without an active blog.
Those are often because someone has changed the URL of their blog but hasn’t updated their gravatar.
Hi Ashley, I left a couple of comments on your blog yesterday but received a message to say they couldn’t be embedded into your blog? I am not sure if it means the comment was lost or simply eaten by spam? Have you seen them? I am making a copy of this comment as apparently Melanie has experienced the same thing and says I have to copy it and post it again?
That’s so bizarre. There was nothing in my spam or trash. I’ve never come across that error before. And just now I tried embedding a Youtube video in the comments for one of my posts, and that embed wasn’t a problem for it. WordPress really needs to hire a blogger to just blog like a normal person all day long so they can identify all of this stuff.
Um, ok, this one published direct to the blog … so bizarre.
I love how stupid these ‘bloggers’ are that just repost! Do they not think you’ll notice? And do you know, it’s just as easy for them to reblog and give credit?
It looked like a sketchy site rather than an actual blog. And I suppose they didn’t give it enough thought to check for internal links. I found out they were hosted on Bluehost. It seems like they’re a major hosting provider, so I was surprised how difficult it was to find a contact email on their site to report a copyright infringement.
I’m glad you reported it 🙂
Helpful. Many times when I try to reach a blogger, I get the standard pop-up … blog has been deleted, etc. As you rightly put it, the blogger forgets to update their URL in Gravatar.
And I don’t think they even realize it.
True. No way we can keep them informed either, unless they take part in the discussion.
This is such a fascinating subject. Why do they do it? What do they gain? And I love that one might be searching for your secretly hidden porn site! Hilarious
This made me smile. Go for those pages…
Yeah, that’s a frustrating thing. I had a bad experience with that on my previous blog.