Technical

In-depth technical articles and information about search engines and directories. Find resources on theme search engines, term vector technology, and sites with authority or hub status. Also covers the use of query strings and the cloaking of web sites and web pages.

Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text

We describe the design, prototyping and evaluation of ARC, a system for automatically compiling a list of authoritative Web resources on any (sufficiently broad) topic.

 

 

Efficient Crawling Through URL Ordering

This paper studies in what order a crawler should visit the URLs it has seen, in order to obtain more "important" pages first. Obtaining important pages rapidly can be very useful when a crawler cannot visit the entire Web in a reasonable amount of time.

 

 

Mercator: A Scalable, Extensible Web Crawler

Technical paper describing Mercator and the architecture of a scalable web crawler.

 

 

The Anatomy of a Search Engine

Research paper "The Anatomy of a Large-Scale Hypertextual Web Search Engine" by Sergey Brin and Lawrence Page, the founders of Google. In it they present Google, a prototype of a large-scale search engine that makes heavy use of the structure present in hypertext.

 

 

Efficient Computation of PageRank

This paper discusses efficient techniques for computing PageRank, a ranking metric for hypertext documents. It also discusses several methods for analyzing the convergence of PageRank based on the induced ordering of the pages.

 

 

Compaq Systems Research Center

The Systems Research Center (SRC) is one of Compaq's Corporate Research labs.

 

 

Cornell News: HITS web search

A Cornell University computer scientist has developed a new method of searching the World Wide Web that uses the way sites are linked to one another, rather than their text content, to find the most valuable sites on a given topic.

 

 

Hilltop: A Search Engine based on Expert Documents

In this paper we propose a new approach to authoritative ranking, which we call Hilltop. Our approach is based on the same assumptions as the other connectivity algorithms, namely that the number and quality of the sources referring to a page are a good measure of the page's quality.

 

 

How Search Engines Rank Web Pages

Explains how search engines rank web pages by determining relevancy through analysis of keyword location and frequency, and through other methods.

 

 

Robots, spiders and other user agents: a resource for WebMasters

 


Spider Hunter: Learn to write cloaking scripts and track spiders

Free cloaking scripts, spider lists, forums, and tons of information.

 

 

The Term Vector Database

The Term Vector Database allows fast access to indexing terms for Web pages.

 

 

Web Robots Pages

Web Robots FAQs, Robots Exclusion, a list of robots, the Robots Mailing List, articles and papers, and related sites.

 

 

WTMS: A System for Collecting and Analysing Topic-Specific Web Information

 

 

Apache 1.3 URL Rewriting Guide

This document supplements the mod_rewrite reference documentation. It describes how one can use Apache's mod_rewrite to solve typical URL-based problems webmasters are usually confronted with in practice.

 

 

Apache module mod_rewrite

This module uses a rule-based rewriting engine (based on a regular-expression parser) to rewrite requested URLs on the fly.

 

 

Jon Kleinberg's Homepage

Jon Kleinberg researches algorithms that exploit the combinatorial structure of networks and information. This includes techniques for analyzing and modeling link structure in the World Wide Web.

 

 

Stanford Database Group Publication Server

Here you can find the full texts and bibliographic data of a number of electronic publications of the Stanford InfoLab.

 

 

RFC 1630 - Universal Resource Identifiers in WWW

Universal Resource Identifiers in WWW: A Unifying Syntax for the Expression of Names and Addresses of Objects on the Network as used in the World-Wide Web

 

 

