david9688526  
#1 Posted : Friday, December 22, 2023 9:13:59 PM(UTC)

Rank: Newbie

Groups: Developers
Joined: 4/17/2020(UTC)
Posts: 1

Thanks: 1 times
I've been noticing that my AbleCommerce site has been using a lot of CPU on my server for several days now. I finally looked into it today.
When viewing SQL Profiler, I was seeing a ton of queries from my AbleCommerce site.
I then went and looked at Reports -> System -> Who Is Online. There were only 8 users listed, but two of them were clearly bots. One had over 44,000 Page Views in a single day. Another had 3,900 views over the course of 40 minutes. Both had multiple page views per second.

Something that is confusing to me is that this "user" had a wide variety of IP addresses and browsers listed.
When I looked up the IP addresses, they were associated with AWS (Amazon Web Services). When I blocked those IP ranges through my firewall, the bot appeared to just shift to different ranges.

I remembered this happening in the past and found this old thread from 2015 in the old forums that I started:
http://forums.ablecommer...&p=81617&e=81617

Based on that thread, I queried my ac_PageViews table. The two users have 48,486 page views today. A "user" yesterday had 140,025.

AbleCommerce needs an automated way to deal with this.
Are there any recommendations for addressing this? It has been 8 years since I first brought this up, and it doesn't seem to have been addressed.

David

Edited by user Tuesday, December 26, 2023 12:54:22 PM(UTC)  | Reason: fixed typo

david9688526  
#2 Posted : Friday, December 22, 2023 9:43:39 PM(UTC)

Rank: Newbie

Groups: Developers
Joined: 4/17/2020(UTC)
Posts: 1

Thanks: 1 times
I'm continuing to dig into this (an exciting Friday night!) and noticing that some of the offenders are valid bots. For them, the problem seems to occur when they crawl URLs with "ShopBy" in the UriQuery. Does anyone know the proper robots.txt syntax for blocking any use of "ShopBy="?

Edited by user Friday, December 22, 2023 9:45:47 PM(UTC)  | Reason: added info

david9688526  
#3 Posted : Friday, December 22, 2023 9:53:29 PM(UTC)

Rank: Newbie

Groups: Developers
Joined: 4/17/2020(UTC)
Posts: 1

Thanks: 1 times
I've added the following to my robots.txt

Disallow: /*?*ShopBy= # disallow anything including ShopBy in querystring

I based this on this thread:
https://stackoverflow.co...with-specific-parameters

I'm aware this will only help with bots that respect robots.txt, but I think good bots may be a significant source of the hammering.
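
For reference, a Disallow line only takes effect inside a User-agent group, so the full file ends up looking roughly like this (a minimal sketch; the separate Amazonbot group is just an illustration of singling out one crawler, and not every bot honors Crawl-delay):

User-agent: *
Disallow: /*?*ShopBy=    # disallow anything including ShopBy in querystring

User-agent: Amazonbot
Disallow: /*?*ShopBy=    # specific groups replace the * group, so repeat the rule
Crawl-delay: 10          # optional; ignored by some crawlers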

Thoughts?
david9688526  
#4 Posted : Friday, December 22, 2023 10:41:56 PM(UTC)

Rank: Newbie

Groups: Developers
Joined: 4/17/2020(UTC)
Posts: 1

Thanks: 1 times
Amazonbot was responsible for 57,000 of the visits in the last 24 hours. It hasn't picked up the robots.txt changes yet, but I'm hoping it will in the next few hours. I also blocked a few more IP ranges that were obviously bots but whose UserAgent strings didn't reflect that. The site is still being hammered. I will check things in the morning to see where we are.

Edited by user Friday, December 22, 2023 10:42:49 PM(UTC)  | Reason: added info

david9688526  
#5 Posted : Saturday, December 23, 2023 9:32:28 AM(UTC)

Rank: Newbie

Groups: Developers
Joined: 4/17/2020(UTC)
Posts: 1

Thanks: 1 times
Here are some useful queries I've created to help me figure out what is happening.

--Traffic Counts by UserAgent for the past 24 hours
SELECT Count(PageViewID) as PageViews, UserAgent from ac_PageViews
where ActivityDate > DateAdd(hh, -24, GetDate())
GROUP BY UserAgent
ORDER BY Count(PageViewID) DESC

--Traffic Counts by UserAgent, RemoteIP for the past 24 hours -Useful for blocking IPs from bad bots through Firewall
SELECT Count(PageViewID) as PageViews, UserAgent, RemoteIP from ac_PageViews
where ActivityDate > DateAdd(hh, -24, GetDate())
GROUP BY UserAgent, RemoteIP
ORDER BY Count(PageViewID) DESC

--Show PageViews from the Last 2 hours -- Details, Useful for understanding what URLs they are hitting
SELECT ActivityDate, UserId, UriStem, UriQuery, UserAgent, RemoteIP
FROM ac_PageViews
Where ActivityDate > DateAdd(hh, -2, GetDate())
--and UserAgent like '%Amazonbot%'
ORDER BY ActivityDate DESC
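
One more I sketched out along the same lines: grouping traffic into /24 blocks, which makes it easier to pick ranges to block at the firewall when a bot keeps rotating IPs. This assumes RemoteIP holds plain dotted IPv4 strings; IPv6 addresses would need different parsing.

--Traffic Counts by /24 block for the past 24 hours -- Useful when a bot rotates IPs within a range
SELECT Count(PageViewID) as PageViews,
       PARSENAME(RemoteIP, 4) + '.' + PARSENAME(RemoteIP, 3) + '.' + PARSENAME(RemoteIP, 2) as IpBlock
from ac_PageViews
where ActivityDate > DateAdd(hh, -24, GetDate())
and RemoteIP like '%.%.%.%'    -- crude IPv4-only filter
GROUP BY PARSENAME(RemoteIP, 4) + '.' + PARSENAME(RemoteIP, 3) + '.' + PARSENAME(RemoteIP, 2)
ORDER BY Count(PageViewID) DESC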

For me, the top 10 results from the first query above are:
PageViews UserAgent
67613 --Mozilla/5.0 ... (Amazonbot/0.1; +https://developer.amazon.com/support/amazonbot)
18405 --Mozilla/5.0 ...bingbot/2.0; +http://www.bing.com/bingbot.htm) Chrome/116.0.1938.76 Safari/537.36
11971 --Mozilla/5.0 ... AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36
11838 --Mozilla/5.0 ... AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.169 Safari/537.36
11827 --Mozilla/5.0 ... AppleWebKit/537.36 (KHTML, like Gecko) Chrome/74.0.3729.157 Safari/537.36
11810 --Mozilla/5.0 ... AppleWebKit/537.36 (KHTML, like Gecko) Chrome/96.0.4664.110 Safari/537.36
6234 --Mozilla/5.0 (compatible; DotBot/1.2; +https://opensiteexplorer.org/dotbot; help@moz.com)
5102 --serpstatbot/2.1 (advanced backlink tracking bot; https://serpstatbot.com/; abuse@serpstatbot.com)
5086 --Mozilla/5.0 (compatible; GeedoBot; +http://www.geedo.com/bot.html)
3802 --Mozilla/5.0 (Linux; Android 7.0;) ... (compatible; PetalBot;+https://webmaster.petalsearch.com/site/petalbot)

I snipped the UserAgent strings above to improve readability in the forums.

So, 67,613 hits from Amazonbot is the main issue. After 11 hours, it still hasn't picked up the changes to robots.txt.
ray22901031  
#6 Posted : Saturday, December 23, 2023 6:09:39 PM(UTC)

Rank: Advanced Member

Groups: Authorized User, Developers
Joined: 2/17/2019(UTC)
Posts: 827

Thanks: 3 times
Was thanked: 13 time(s) in 13 post(s)
If you're waiting for AbleCommerce to fix this problem, newsflash: they won't. It isn't their concern. AbleCommerce is an e-commerce platform, not a firewall. I suggest you pay the $20 monthly and get the basic Cloudflare firewall that will take care of this once and for all.

As mentioned in an earlier post, Magento, which has thousands of developers and millions of users, has the identical issue, and people there have to rely on an external firewall. Do you really think AbleCommerce is going to jump on this?

The solution is quite simple: pay the extra $20.

It may not be what you want to hear, but I'm being realistic.

-Ray
david9688526  
#7 Posted : Tuesday, December 26, 2023 1:47:56 PM(UTC)

Rank: Newbie

Groups: Developers
Joined: 4/17/2020(UTC)
Posts: 1

Thanks: 1 times
Adding the ShopBy rule I posted above does seem to have largely fixed the problem. It took a couple of days for the bots to pick up the change. My server now averages about 20% CPU usage, whereas before it averaged around 50%.

I think the main concern I have is that AbleCommerce doesn't provide any way for site admins to be aware of this issue. I was only aware of it because I saw an unusual load on my server and dug into it. It seems like there should be a widget on the Dashboard that warns about excessive traffic from individual "users" and ideally identifies Bots as part of that.

Once I was aware of the problem, it didn't take me too long to resolve it. But currently, I'm not aware of anywhere in AbleCommerce where you can see excessive traffic.
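
In the meantime, something like this could be scheduled to flag runaway "users" before the CPU graph does. Just a rough sketch against ac_PageViews; the 1,000-per-hour threshold is an arbitrary number to tune.

--Flag any "user" with an unusually high number of page views in the past hour
SELECT UserId, Count(PageViewID) as PageViews
from ac_PageViews
where ActivityDate > DateAdd(hh, -1, GetDate())
GROUP BY UserId
HAVING Count(PageViewID) > 1000
ORDER BY Count(PageViewID) DESC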

Also, if this were some kind of malicious attack, I'd be fine paying Cloudflare $20/month to filter traffic. But since this was a valid bot (Amazonbot) that adheres to bot rules, it seems like this needs to be better addressed, as it could happen to any site that makes extensive use of Product Templates.
ray22901031  
#8 Posted : Tuesday, December 26, 2023 1:58:26 PM(UTC)

Rank: Advanced Member

Groups: Authorized User, Developers
Joined: 2/17/2019(UTC)
Posts: 827

Thanks: 3 times
Was thanked: 13 time(s) in 13 post(s)
Cloudflare will give you all the warnings and all the reporting you need. In fact, the reporting is superior because it looks at many things, not just IPs. You must also remember that blocking at the robots.txt level only works for bots that actually read the file and don't ignore it.

Also, blocking with robots.txt means the bots have already started to use resources at the server level.

It is better to block or limit bots by their AS number than with wildcard rules, which are more resource-intensive and rob resources from your server.

We use the Business edition, so I have a lot more options than the Basic plan, but the first thing I would do, even with Basic, is rate-limit the search page to 4 searches per minute per IP. It will take time, since you need to go through the server logs and look at things from many angles. The bottom line is that no resources are ever used on the server, because the requests never make it to the server.
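
If it helps anyone setting this up: in Cloudflare's rule expression language the match would look something like the lines below (field names are from their expression syntax; the exact path and the choice of block vs. challenge vs. rate limit are things to verify for your own site).

Match any request whose query string contains ShopBy:
    http.request.uri.query contains "ShopBy"

Or scoped to a single page (the /search path here is just an example):
    http.request.uri.path eq "/search" and http.request.uri.query contains "ShopBy"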

I hope this helps, and happy holidays,
-Ray


