Baylys
Just practicing to pass the Turing test.

Controlaccess.botBlocker

A robot, or a client downloading a whole website, can have a big impact on a dynamic website because the load it generates is much higher than average. One way to deal with this is to keep a moving total of the recent hits from each IP address and to return an HTTP error code if the number of hits exceeds a threshold value. If the error code is 503 Service Unavailable, well-behaved clients are expected to retry the URL after a short period of time. It remains to be seen whether there are many, or indeed any, well-behaved clients.

Since this routine is called for every hit through mainresponder, it's important to keep the overhead down. This is an interesting challenge. My solution is a table structure within temp with a subtable for each second in which a hit arrives, which in turn holds one counter per client IP address recording the number of hits from that address in that second.

Computing the moving total then means reading only the subtables that fall within the interval, about one per second. Since we are interested in a small time interval, we garbage collect the main table as we go by discarding subtables older than the interval. Hopefully this spreads the garbage collection overhead evenly across hits.
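The bucket-per-second idea can be sketched outside Frontier as well. The following is a minimal illustration in Python, not the UserTalk script itself: the class name HitWindow, its method record, and the use of integer seconds are assumptions made for the example. It counts one hit, sums the client's hits over the recent buckets, and discards stale buckets in the same pass.

```python
from collections import defaultdict


class HitWindow:
    """Per-second hit buckets: second -> {ip: hits in that second}."""

    def __init__(self, interval_seconds, max_hits):
        self.interval = interval_seconds  # plays the role of maxHitsInterval
        self.max_hits = max_hits          # plays the role of maxHitsPerInterval
        self.buckets = {}

    def record(self, ip, now):
        """Count one hit from ip at integer second now; return True if allowed."""
        bucket = self.buckets.setdefault(now, defaultdict(int))
        bucket[ip] += 1

        total = 0
        for second in list(self.buckets):
            if now - second > self.interval:
                del self.buckets[second]  # garbage collect stale buckets as we go
            else:
                total += self.buckets[second].get(ip, 0)
        return total <= self.max_hits


# Hypothetical usage: 30 hits per 10 seconds, the starting values mentioned in the text.
window = HitWindow(interval_seconds=10, max_hits=30)
allowed = window.record("192.0.2.1", now=100)
```

A caller would answer with a 503 whenever record returns False. Because each call touches only the buckets still inside the interval, the cost per hit stays small however long the server has been running.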

The script, which can be found at frontierTailor.data.installAddress.config.mainresponder.callbacks.controlaccess.botBlocker, must be called from a controlaccess table entry, which you need to create. Two globals, config.bayly.mainresponderCallbacks.maxHitsInterval and config.bayly.mainresponderCallbacks.maxHitsPerInterval, also need to be defined manually. I started with a maximum of 30 hits in 10 seconds, but will tune this as I monitor the script's behaviour.

There are lots of ways to extend this idea.

  • Don't return an error; simply have the thread wait for a small period of time. This might slow robots on their first pass, but would not work well if they are doing a verification pass.
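As a rough illustration of the delay idea, here is a small Python sketch (not UserTalk, and not part of the script below). The function names, the half-second step, and the five-second cap are all assumptions chosen for the example; the text only asks for "a small period of time".

```python
import time


def throttle_delay(hits_in_window, max_hits, step_seconds=0.5, cap_seconds=5.0):
    """Return how long to hold the request thread; 0.0 when under the limit."""
    excess = hits_in_window - max_hits
    if excess <= 0:
        return 0.0
    # Grow the delay with the excess, but never hold a thread longer than the cap.
    return min(excess * step_seconds, cap_seconds)


def respond_with_delay(hits_in_window, max_hits, sleep=time.sleep):
    """Sleep for the computed delay instead of returning an error code."""
    delay = throttle_delay(hits_in_window, max_hits)
    if delay > 0:
        sleep(delay)
    return delay
```

The cap matters because each waiting request still occupies a server thread, so an unbounded delay could make the overload worse rather than better.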

«BotBlocker : 02/01/12, 14:40:19 by DAB
  «A way to ensure that no single client monopolises the use of the dynamic server
  «First attempt 02/01/12, 14:41:30 by DAB
  «look for number of hits from each IP in a (definable) period; if it exceeds a configurable
  «maximum, bail out with a server unavailable error.
  «Table design is optimised for answering the question
  «how many hits from this IP in the last n seconds?
  «and keeping the total table size to a minimum
if not defined(temp.mainresponderdenyIP)
  new(tabletype, @temp.mainresponderdenyIP)
  user.webserver.stats.botBlockerSinceStart = 0 «02/04/22, 10:25:40 by DAB
if defined(config.bayly.mainresponderCallbacks.newHitsThreshold) «08/06/23, 11:23:43 by DAB
  if sizeOf(system.compiler.threads) > config.bayly.mainresponderCallbacks.newHitsThreshold
    code = 503
    responseBody = webserver.util.buildErrorPage ("503 SERVICE UNAVAILABLE", "Access temporarily denied due to excessive load on our server.")
    if not defined(user.webserver.stats.overloaded)
      user.webserver.stats.overloaded=0
    user.webserver.stats.overloaded++
    scriptError ("!return") « and bail out to prevent mainResponder.respond from overwriting our response
«if client == tcp.mydottedID() «04/05/02, 16:42:13 by DAB look for spoofed IP addresses
  «code = 403 «forbidden
  «responseBody = webserver.util.buildErrorPage ("403 Forbidden", "Access denied because of spoofed IP address")
  «if not defined(user.webserver.stats.botBlocker)
    «user.webserver.stats.botBlocker=0
  «user.webserver.stats.botBlocker++ «02/01/29, 17:12:07 by DAB
  «return true
if defined(user.webserver.botblockerPermanent) and defined(user.webserver.botblockerPermanent.[client]) «known bad robots
  «user.webserver.botblockerPermanent.[client]++
  code = 403 «forbidden
  responseBody = webserver.util.buildErrorPage ("403 Forbidden", "Access denied to referer spammer")
  if not defined(user.webserver.stats.botBlockerPermanent)
    user.webserver.stats.botBlockerPermanent=0
  user.webserver.stats.botBlockerPermanent++ «02/01/29, 17:12:07 by DAB
  «return true
  scriptError ("!return") « and bail out to prevent mainResponder.respond from overwriting our response 04/12/05, 09:39:01 by DAB
if defined(user.webserver.botBlockerDomainBlackList) and defined(requestHeaders.referer)
  local (domain, flBlacklisted=false) «08/07/25, 11:59:59 by DAB
  «local (domain, nodesMatched=0)
  try
    local (urlbits = string.urlSplit(requestHeaders.referer))
    domain = string.nthField(urlbits[2], ":", 1) «05/10/27, 09:40:45 by DAB
    local (cntNodes = string.countFields(domain, "."))
    local (adr = @user.webserver.botBlockerDomainBlackList)
    local (nodesMatched=0) «08/07/25, 12:02:10 by DAB
    local (ix)
    for ix = cntNodes downto 1
      adr = @adr^.[string.nthField(domain, ".", ix)]
      if defined(adr^)
        nodesMatched++
      else
        if sizeOf(parentOf(adr^)) == 0 «08/07/25, 12:01:40 by DAB
          flBlacklisted = true
        break
    if nodesMatched == cntNodes «08/07/25, 12:01:44 by DAB
      flBlacklisted = true
  else
    window.msg(tryError)
  if flBlacklisted «08/07/25, 12:02:34 by DAB
    «if nodesMatched >= 2 \
    if not defined(user.webserver.botblockerReferers.[domain])
      user.webserver.botblockerReferers.[domain] = 0
    user.webserver.botblockerReferers.[domain]++
    if not defined(user.webserver.botblockerPermanent.[client]) «07/01/06, 13:19:39 by DAB
      new(tabletype, @user.webserver.botblockerPermanent.[client])
    if not defined(user.webserver.botblockerPermanent.[client].[domain]) «07/01/06, 13:19:42 by DAB
      user.webserver.botblockerPermanent.[client].[domain] = 0
    user.webserver.botblockerPermanent.[client].[domain]++ «07/01/06, 13:20:18 by DAB
    code = 403 «forbidden
    responseBody = webserver.util.buildErrorPage ("403 Forbidden", "Access denied to referer spammer")
    if not defined(user.webserver.stats.botBlockerPermanent)
      user.webserver.stats.botBlockerPermanent=0
    user.webserver.stats.botBlockerPermanent++ «02/01/29, 17:12:07 by DAB
    scriptError ("!return")
local (now = string(clock.now()))
local (adrTbl = @temp.mainresponderdenyIP.[now])
if not defined(adrTbl^)
  new(tabletype, adrTbl)
local (adr = @adrTbl^.[client])
if not defined(adr^)
  adr^ = 0
adr^++ « increment counter for this (attempted) hit
local (cnt= 0)
bundle «cnt prior entries in the table; garbage collect here too
  local (ix)
  try «we could use semaphores, but better to bail out on errors and let next pass clean up
    for ix = sizeof(temp.mainresponderdenyIP) downto 1
      adr = @temp.mainresponderdenyIP[ix]

      if number(date(now) - number(date(nameof(adr^)))) > config.bayly.mainresponderCallbacks.maxHitsInterval
        delete(adr)
        continue

      try {cnt = cnt + adr^.[client]} «the entry may not exist, so ignore errors
if cnt > config.bayly.mainresponderCallbacks.maxHitsPerInterval
  «code = 403 «if we get here, the client is a bot or a site stealer most likely
  code = 503 «if we get here, the client is a bot or a site stealer most likely; change 403 to 503 02/01/29, 17:07:28 by DAB
  responseBody = webserver.util.buildErrorPage ("503 SERVICE UNAVAILABLE", "Access temporarily denied due to excessive load from your IP Address")
  if not defined(user.webserver.stats.botBlocker) {user.webserver.stats.botBlocker=0} «02/01/29, 17:12:01 by DAB
  user.webserver.stats.botBlocker++ «02/01/29, 17:12:07 by DAB
  user.webserver.stats.botBlockerSinceStart++ «02/04/22, 10:25:28 by DAB
  scriptError ("!return") « and bail out to prevent mainResponder.respond from overwriting our response
 
return (true)
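
The referer check in the script appears to treat user.webserver.botBlockerDomainBlackList as nested tables keyed by domain label, starting from the top-level domain, and walks the referer's labels from right to left. The Python sketch below (not UserTalk) shows one reading of that lookup. The function name is_blacklisted and the sample data are hypothetical, and the guard against an empty top-level table is an addition in this sketch, not something taken from the script.

```python
def is_blacklisted(domain, blacklist):
    """Walk the labels of domain from right to left through nested dicts."""
    node = blacklist
    for label in reversed(domain.split(".")):
        child = node.get(label)
        if child is None:
            # No entry for this label. Treat the domain as blacklisted only if
            # the last table reached is empty, i.e. a listed domain that also
            # covers its subdomains. The root check is an assumption of this
            # sketch so that an empty blacklist blocks nothing.
            return node is not blacklist and len(node) == 0
        node = child
    return True  # every label matched an entry, as in nodesMatched == cntNodes


# Hypothetical sample data: spam.com and bad.example.com are listed.
sample = {"com": {"spam": {}, "example": {"bad": {}}}}
blocked = is_blacklisted("www.spam.com", sample)
```

With this layout a single entry for a domain also covers every host beneath it, which keeps the blacklist short when a spammer rotates subdomains.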

Relative to Frontier version 9.7b10