Controlaccess.botBlocker
A robot, or a client downloading a whole website, can have a big impact on a dynamic website because the load it generates is much higher than average. One way to deal with this is to keep a moving total of the recent hits from each IP address and to return an HTTP error code when that total exceeds a threshold value. If the error code is 503 Service Unavailable, well-behaved clients are expected to retry the URL after a short delay. It remains to be seen whether there are many, or indeed any, well-behaved clients.
Since this routine is called for every hit through mainresponder, it's important to keep the overhead down. This is an interesting challenge. My solution is a table within temp with a subtable for each second of the day, which in turn contains an entry for each IP address counting the hits from that address in that second.
Computing the moving total then means reading at most one subtable per second of the interval. Since we are only interested in a short time interval, we garbage collect the main table as we go, discarding subtables that are older than the interval. Hopefully this spreads the overhead evenly.
The script, which can be found at frontierTailor.data.installAddress.config.mainresponder.callbacks.controlaccess.botBlocker, must be called from a controlaccess table entry, which you need to create. Two globals, config.bayly.mainresponderCallbacks.maxHitsInterval and config.bayly.mainresponderCallbacks.maxHitsPerInterval, also need to be defined manually. I started with a maximum of 30 hits in 10 seconds, but will tune this as I monitor the script's behaviour.
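The table design can be sketched outside Frontier. The Python below is only an illustration of the idea, not the UserTalk script itself: the class name HitCounter, its method names, and the use of integer seconds as bucket keys are all assumptions made for the example. The defaults of 10 seconds and 30 hits are the starting values mentioned above.

```python
from collections import defaultdict


class HitCounter:
    """Per-second buckets of per-IP hit counts, pruned on every hit."""

    def __init__(self, interval=10, max_hits=30):
        self.interval = interval  # length of the moving window, in seconds
        self.max_hits = max_hits  # hits allowed per IP within the window
        self.buckets = {}         # second -> {ip: hits seen in that second}

    def hit(self, ip, now):
        """Record a hit at integer second `now`; return True if it should be refused."""
        bucket = self.buckets.setdefault(now, defaultdict(int))
        bucket[ip] += 1
        total = 0
        # Walk a copy of the keys so old buckets can be deleted during the pass.
        for second in list(self.buckets):
            if now - second >= self.interval:
                del self.buckets[second]  # too old: garbage collect
            else:
                total += self.buckets[second].get(ip, 0)
        return total > self.max_hits


counter = HitCounter(interval=10, max_hits=30)
refused = [counter.hit("192.0.2.1", now=100) for _ in range(31)]
print(refused.count(True))  # 1: only the 31st hit inside the window is refused
```

Because pruning happens on every hit, the table never holds more buckets than there are seconds in the window, so each call does a small, bounded amount of work.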
There are lots of ways to extend this idea.
- Don't return an error; simply have the thread wait for a small period of time. This might slow robots on their first pass, but may not work well if they are doing a verification pass.
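As a rough sketch of the delay idea, again in Python rather than UserTalk: the function names throttle_delay and handle, and the step and cap values, are hypothetical and chosen only for illustration. The pause grows with the number of hits over the limit and is capped so that a thread is never held for long.

```python
import time


def throttle_delay(total_hits, max_hits, step=0.5, cap=5.0):
    """Seconds to pause: zero at or under the limit, then growing with the excess, up to cap."""
    excess = total_hits - max_hits
    if excess <= 0:
        return 0.0
    return min(cap, excess * step)


def handle(total_hits, max_hits, sleep=time.sleep):
    """Delay the calling thread instead of refusing the request; return the delay used."""
    delay = throttle_delay(total_hits, max_hits)
    if delay > 0:
        sleep(delay)
    return delay
```

One trade-off to keep in mind is that a sleeping request still occupies a server thread, so the cap probably needs to stay small on a busy server.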
«BotBlocker : 02/01/12, 14:40:19 by DAB
    «A way to ensure that no single client monopolises the use of the dynamic server
    «First attempt 02/01/12, 14:41:30 by DAB
        «look for the number of hits from each IP in a (definable) period; if it exceeds a configurable
        «maximum, bail out with a server unavailable error.
    «Table design is optimised for answering the question
        «how many hits from this IP in the last n seconds?
        «and keeping the total table size to a minimum
if not defined(temp.mainresponderdenyIP)
    new(tabletype, @temp.mainresponderdenyIP)
    user.webserver.stats.botBlockerSinceStart = 0 «02/04/22, 10:25:40 by DAB
if defined(config.bayly.mainresponderCallbacks.newHitsThreshold) «08/06/23, 11:23:43 by DAB
    if sizeOf(system.compiler.threads) > config.bayly.mainresponderCallbacks.newHitsThreshold
        responseBody = webserver.util.buildErrorPage ("503 SERVICE UNAVAILABLE", "Access temporarily denied due to excessive load on our server.")
        if not defined(user.webserver.stats.overloaded)
            user.webserver.stats.overloaded=0
        user.webserver.stats.overloaded++
        scriptError ("!return") « and bail out to prevent mainResponder.respond from overwriting our response
«if client == tcp.mydottedID() «04/05/02, 16:42:13 by DAB look for spoofed IP addresses
    «responseBody = webserver.util.buildErrorPage ("403 Forbidden", "Access denied because of spoofed IP address")
    «if not defined(user.webserver.stats.botBlocker)
        « user.webserver.stats.botBlocker=0
    «user.webserver.stats.botBlocker++ «02/01/29, 17:12:07 by DAB
if defined(user.webserver.botblockerPermanent) and defined(user.webserver.botblockerPermanent.[client]) «known bad robots
    «user.webserver.botblockerPermanent.[client]++
    responseBody = webserver.util.buildErrorPage ("403 Forbidden", "Access denied to referer spammer")
    if not defined(user.webserver.stats.botBlockerPermanent)
        user.webserver.stats.botBlockerPermanent=0
    user.webserver.stats.botBlockerPermanent++ «02/01/29, 17:12:07 by DAB
    scriptError ("!return") « and bail out to prevent mainResponder.respond from overwriting our response 04/12/05, 09:39:01 by DAB
if defined(user.webserver.botBlockerDomainBlackList) and defined(requestHeaders.referer)
    local (domain, flBlacklisted=false) «08/07/25, 11:59:59 by DAB
    «local (domain, nodesMatched=0)
    local (urlbits = string.urlSplit(requestHeaders.referer))
    domain = string.nthField(urlbits[2], ":", 1) «05/10/27, 09:40:45 by DAB
    local (cntNodes = string.countFields(domain, "."))
    local (adr = @user.webserver.botBlockerDomainBlackList)
    local (nodesMatched=0) «08/07/25, 12:02:10 by DAB
    for ix = cntNodes downto 1
        adr = @adr^.[string.nthField(domain, ".", ix)]
        if sizeOf(parentOf(adr^)) == 0 «08/07/25, 12:01:40 by DAB
    if nodesMatched == cntNodes «08/07/25, 12:01:44 by DAB
    if flBlacklisted «08/07/25, 12:02:34 by DAB
        if not defined(user.webserver.botblockerReferers.[domain])
            user.webserver.botblockerReferers.[domain] = 0
        user.webserver.botblockerReferers.[domain]++
        if not defined(user.webserver.botblockerPermanent.[client]) «07/01/06, 13:19:39 by DAB
            new(tabletype, @user.webserver.botblockerPermanent.[client])
        if not defined(user.webserver.botblockerPermanent.[client].[domain]) «07/01/06, 13:19:42 by DAB
            user.webserver.botblockerPermanent.[client].[domain] = 0
        user.webserver.botblockerPermanent.[client].[domain]++ «07/01/06, 13:20:18 by DAB
        responseBody = webserver.util.buildErrorPage ("403 Forbidden", "Access denied to referer spammer")
        if not defined(user.webserver.stats.botBlockerPermanent)
            user.webserver.stats.botBlockerPermanent=0
        user.webserver.stats.botBlockerPermanent++ «02/01/29, 17:12:07 by DAB
local (now = string(clock.now()))
local (adrTbl = @temp.mainresponderdenyIP.[now])
local (adr = @adrTbl^.[client])
adr^++ « increment counter for this (attempted) hit
bundle «count prior entries in the table; garbage collect here too
    try «we could use semaphores, but better to bail out on errors and let next pass clean up
        for ix = sizeof(temp.mainresponderdenyIP) downto 1
            adr = @temp.mainresponderdenyIP[ix]
            if number(date(now) - number(date(nameof(adr^)))) > config.bayly.mainresponderCallbacks.maxHitsInterval
            try {cnt = cnt + adr^.[client]} «the entry may not exist, so ignore errors
if cnt > config.bayly.mainresponderCallbacks.maxHitsPerInterval
    «code = 403 «if we get here, the client is a bot or a site stealer most likely
    code = 503 «if we get here, the client is a bot or a site stealer most likely; change 403 to 503 02/01/29, 17:07:28 by DAB
    responseBody = webserver.util.buildErrorPage ("503 SERVICE UNAVAILABLE", "Access temporarily denied due to excessive load from your IP Address")
    if not defined(user.webserver.stats.botBlocker) {user.webserver.stats.botBlocker=0} «02/01/29, 17:12:01 by DAB
    user.webserver.stats.botBlocker++ «02/01/29, 17:12:07 by DAB
    user.webserver.stats.botBlockerSinceStart++ «02/04/22, 10:25:28 by DAB
    scriptError ("!return") « and bail out to prevent mainResponder.respond from overwriting our response
Relative to Frontier version 9.7b10