
FTP search mirror setup

The cost estimates here assume a server that indexes all the FTP servers indexed by ftpsearch.ntnu.no. Indexing a smaller number of FTP servers requires fewer resources, and the server then does not need to run on a dedicated machine.

The following is an estimate of the cost of a dedicated machine configured the same way as the current incarnation of ftpsearch.ntnu.no. With over 550,000 requests/day, the machine will be somewhat overloaded:

$300
2 MS160SE cards.
$1800
320 MB memory:
  • 128 MB system memory
  • 96 MB + 96 MB memory on the cards
$3400
3 x 4 GB disks for data:
  • 1 x 4 GB for running data set
  • 1 x 4 GB for next data set being constructed
  • 1 x 4 GB for ls listings, logs, ...
$1500
Pentium 100 machine with
  • space for 2 MS160SE cards (2 full length ISA slots each, for a total of 4 full length ISA slots)
  • PCI Ethernet controller
  • PCI SCSI controller
  • 1 GB SCSI system disk
  • only NetBSD or FreeBSD compatible hardware
  • VGA ISA or PCI screen card
    A small screen card can fit into an ISA slot next to an MS160SE card, avoiding the waste of one ISA slot.
  • VGA monitor, keyboard, power supply and other standard items.
-----
$7000 (currency is USD)
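
As a quick check of the figures above, the following Python sketch adds up the listed component prices and converts the quoted daily request count into an average rate. The variable names are illustrative only, and the per-second figure is a plain average that says nothing about peak load.

```python
# Component prices (USD) as listed above for the MS160SE-based machine.
components = {
    "2 MS160SE cards": 300,
    "320 MB memory": 1800,
    "3 x 4 GB data disks": 3400,
    "Pentium 100 machine": 1500,
}
total_usd = sum(components.values())

# Average request rate implied by 550,000 requests/day.
REQUESTS_PER_DAY = 550_000
SECONDS_PER_DAY = 24 * 60 * 60
avg_requests_per_second = REQUESTS_PER_DAY / SECONDS_PER_DAY

print(f"total: ${total_usd}")
print(f"average load: {avg_requests_per_second:.2f} requests/second")
```

The sum agrees with the stated total, and the daily load works out to roughly six to seven requests per second on average.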

The following is an estimate of the cost of a dedicated machine that might be able to handle about twice the traffic load mentioned above, without using special-purpose hardware:

$4000
512 MB system memory
$8000
7 x 4 GB disks for data:
  • 2 x 4 GB for running data set
  • 2 x 4 GB for next data set being constructed
  • 1 x 4 GB for ls listings, logs, ...
  • 1 x 4 GB for temporary files during suffix table generation
  • 1 x 4 GB system disk
$4000
Machine (e.g. PC with 200 MHz Pentium Pro Processor running FreeBSD 2.2).
-----
$17000 (currency is USD)

Without the MS160SE card, you need suffix tables. This substantially increases the memory or disk requirements of the data set generation.

With suffix tables you need 160 MB extra memory and 3 extra 4 GB disks. Data set generation also takes longer, e.g. 6 hours instead of 3 on an otherwise idle machine.
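
The layout of the suffix tables is not described here. As a rough illustration only, the Python sketch below assumes the common form: a sorted array of suffix start offsets over the concatenated file names, searched by binary search. The names `build_suffix_table` and `find_names` are hypothetical, and the naive construction shown would be far too slow for a real data set.

```python
def build_suffix_table(text: str) -> list[int]:
    """Return every suffix start offset of text, sorted by suffix."""
    # Naive construction: fine for a demo, far too slow for real data.
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_names(text: str, table: list[int], pattern: str) -> list[str]:
    """Return the file names (lines of text) that contain pattern."""
    # Binary search for the first suffix whose prefix is >= pattern.
    lo, hi = 0, len(table)
    while lo < hi:
        mid = (lo + hi) // 2
        start = table[mid]
        if text[start:start + len(pattern)] < pattern:
            lo = mid + 1
        else:
            hi = mid
    # All suffixes starting with pattern are adjacent in the table.
    names = set()
    while lo < len(table) and text.startswith(pattern, table[lo]):
        offset = table[lo]
        line_start = text.rfind("\n", 0, offset) + 1
        line_end = text.find("\n", offset)
        if line_end == -1:
            line_end = len(text)
        names.add(text[line_start:line_end])
        lo += 1
    return sorted(names)


# Made-up sample data; one file name per line.
file_names = ["emacs-19.34.tar.gz", "gcc-2.7.2.tar.gz", "README"]
text = "\n".join(file_names)
table = build_suffix_table(text)
print(find_names(text, table, "tar"))
```

In this sketch the table holds one offset per indexed character, so it is several times larger than the text it indexes. This gives a feel for why suffix tables raise the memory and disk requirements.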

Using suffix tables instead of the MS160SE cards thus has the following disadvantages:

  • Longer data set generation time.
  • Some regular expressions can be handled by the MS160SE card, but not by suffix tables, e.g. the expression
          ^[a-g]([0-9]\.)+g[u-z]$
          
    Here we cannot extract any good substrings to use in combination with the suffix table, so we end up with a brute-force search anyway. It then becomes very difficult to come close to the 0.8 seconds the MS160SE card needs (searching 120 MB of unique file names). Regular expressions of this type rarely occur.
  • More disk space and RAM are needed.
and the advantages:
  • Faster for substring searches and normal regular expressions.
  • Not dependent upon an ISA bus and MS160SE device driver.
  • Can run on more platforms. We have tried
    • NetBSD 1.1/i386,
    • NetBSD 1.2/sparc,
    • FreeBSD 2.2-BETA/i386,
    • FreeBSD 3.0-current/i386,
    • SunOS 5.5/sparc,
    • AIX 4.1/rs6000, with locally installed BIND 4.9.3,
    • OSF1 V3.2/alpha, with locally installed BIND 4.9.5.
  • Can be combined with using fallback to MS160SE cards on some regular expressions to yield the best overall performance.
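
The trade-off described in the lists above can be sketched in Python as follows. This is an illustration under stated assumptions, not the actual search code: `search` is a hypothetical helper, the required literal is supplied by hand rather than extracted from the expression, and a plain `in` test stands in for the suffix table lookup.

```python
import re


def search(names, regex, required_literal=None):
    """Return (matching names, number of names the regex was run on).

    If a substring that every match must contain is known, only names
    containing it are tested (standing in for a suffix table lookup).
    Otherwise every name is scanned by brute force.
    """
    compiled = re.compile(regex)
    if required_literal is not None:
        candidates = [n for n in names if required_literal in n]
    else:
        candidates = list(names)
    matches = [n for n in candidates if compiled.search(n)]
    return matches, len(candidates)


# Made-up sample file names.
names = ["a1.gz", "b2.3.gv", "emacs-19.34.tar.gz",
         "gcc-2.7.2.tar.Z", "h1.gz", "README"]

# A normal expression: the literal ".tar." narrows the candidates first.
easy = search(names, r"\.tar\.(gz|Z)$", required_literal=".tar.")

# The expression from the text: no selective literal, so all names are scanned.
hard = search(names, r"^[a-g]([0-9]\.)+g[u-z]$")

print(easy)
print(hard)
```

With the literal, the regular expression runs on only two of the six names. For the expression from the text it has to run on all six, and this is the case where falling back to the MS160SE cards would pay off.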



tegge@idt.ntnu.no
Last modified: Mon Feb 10 04:58:00 MET 1997