Site Statistics

Your account comes with HTTP-Analyze preinstalled and configured.

HTTP-Analyze is a log analyzer for web servers. It analyzes the logfile of
a web server and creates a comprehensive summary report from the
information found there. http-analyze has been optimized to process large
logfiles as fast as possible.

In easier-to-understand terms, HTTP-Analyze is a very powerful traffic
analyzer that quickly and efficiently delivers statistics on the traffic
that your web pages have generated. It has a user-friendly graphical user
interface (GUI) that produces your traffic reports at the click of a mouse
button.

Below we explain in more detail how this powerful software works with your
web site and define the terms used in the results you'll receive.

The web server is a program running on a networked machine, waiting for
connections from the outside world and serving documents in response to
requests from browsers.

To communicate, the server and the browser use an asynchronous
communication method called HTTP (Hypertext Transfer Protocol). It works
as follows:

1. The user starts the browser and types in a URL.
2. The browser connects to the given host and requests the specified
   document.
3. The web server handles the request and sends out a response. If the
   document exists, the web server delivers it. If it does not exist or
   if access is not permitted, the web server sends back an error message
   instead.
4. The document delivered in answer to this request may contain inline
   objects. Inline objects are simply URLs pointing to another resource:
   a document, an image, an applet, a video/audio stream, or any other
   addressable HTML object.
5. The browser then requests all inline objects of the current page from
   the server using steps 2 and 3 above, before it can display the
   content of that page.
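
As a rough illustration of the inline-object step described above, the
following Python sketch lists the URLs a browser would have to request
after receiving a page. The tag list, the helper names, and the
www.example.com URLs are assumptions made for this example; they are not
part of HTTP-Analyze or of any particular browser.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Tag/attribute pairs treated as inline objects (an illustrative subset).
INLINE_ATTRS = {"img": "src", "script": "src", "embed": "src",
                "audio": "src", "video": "src"}


class InlineObjectCollector(HTMLParser):
    """Collects absolute URLs of the inline objects found in one page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.urls = []

    def handle_starttag(self, tag, attrs):
        wanted = INLINE_ATTRS.get(tag)  # None for tags we do not track
        for name, value in attrs:
            if name == wanted and value:
                self.urls.append(urljoin(self.base_url, value))


def inline_objects(base_url, html):
    """Return the inline object URLs referenced by the given HTML text."""
    collector = InlineObjectCollector(base_url)
    collector.feed(html)
    return collector.urls


page_url = "http://www.example.com/index.html"
page = '<html><body><img src="logo.gif"><img src="/img/photo.jpg"></body></html>'

# One request for the page itself plus one request per inline object.
requests = [page_url] + inline_objects(page_url, page)
print(len(requests))
```

Each URL in `requests` would cause its own entry in the server's logfile,
which is why a single page usually accounts for several hits.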

This communication method is called asynchronous because the browser sends
out many requests for inline objects at once, using different
communication channels, without waiting for a response from the server
before sending the next request.

Since the browser's requests are often handled by different server
processes or different threads of a server process, the logfile entries
for a document and its inline objects bear no fixed relationship to one
another.

For example, the order in which the server logs the successful
transmission of the document itself and the inline images contained
therein is not predictable and depends on the type of documents, objects,
server speed, system and network load, and many other parameters.
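
The effect can be imitated with a small simulation. This is only a sketch
under stated assumptions: the random delays stand in for server and
network load, and the function names and URLs are made up for the example.
It does not model any real web server.

```python
import random
import threading
import time


def serve(url, log, lock):
    """Pretend to transmit one object, then write its log entry."""
    time.sleep(random.uniform(0, 0.01))  # stand-in for load and object size
    with lock:
        log.append(url)


def simulate(urls):
    """Handle all requests in parallel and return the resulting log order."""
    log, lock = [], threading.Lock()
    threads = [threading.Thread(target=serve, args=(url, log, lock))
               for url in urls]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return log


requested = ["/index.html", "/logo.gif", "/photo.jpg"]
print(simulate(requested))  # same entries on every run, order may differ
```

Every request ends up in the log, but the position of `/index.html`
relative to its images can change from run to run.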

What is logged?

Each and every response from the server - whether it indicates success, an
error, or even a timeout (i.e. no response) - gets logged in the server's
logfile. Since the server was hit by a request, such a response is called
a Hit. In other words, the total number of hits must equal the total
number of lines in the logfile minus the number of corrupt and empty
lines. A typical logfile entry in the Common Logfile Format looks like:

hostname - - [01/Feb/1998:10:10:00 +0100] "GET /index.html HTTP/1.0" 200 4839

The hostname field contains the fully qualified domain name (FQDN) of the
site accessing your server (see »Special Cases« below). The next two
fields usually contain a minus ('-') to indicate that those fields are
empty. The date is surrounded by square brackets ('[' and ']'). The next
field contains the request: the request method ('GET' for example), the
name of the requested document (URL), and the protocol specification
('HTTP/1.0').

The following field contains the server's response code ('200' stands for
'OK', while '404' would mean 'Document not found', for example). The last
field contains the size of the document. Some servers log the number of
bytes actually transferred, while other servers log the size of the
document, which makes a difference if the user interrupts the transfer
before the document could be transmitted completely.
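
To make the field layout concrete, here is a minimal Python sketch that
splits one Common Logfile Format line into its fields and counts hits as
the number of lines that can be parsed. The regular expression and the
function names are assumptions for illustration, and the names `ident` and
`user` for the two usually empty fields follow common usage. This is not
the parser http-analyze itself uses.

```python
import re

# Common Logfile Format: host ident user [date] "request" status size
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<date>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line):
    """Return a dict of fields, or None for an empty or corrupt line."""
    match = CLF_PATTERN.match(line.strip())
    if match is None:
        return None
    entry = match.groupdict()
    entry["status"] = int(entry["status"])
    # Some servers log '-' instead of a size when no data was sent.
    entry["size"] = 0 if entry["size"] == "-" else int(entry["size"])
    return entry


def count_hits(lines):
    """Hits = lines in the logfile minus corrupt and empty lines."""
    return sum(1 for line in lines if parse_clf(line) is not None)


sample = ('hostname - - [01/Feb/1998:10:10:00 +0100] '
          '"GET /index.html HTTP/1.0" 200 4839')
entry = parse_clf(sample)
print(entry["host"], entry["request"], entry["status"], entry["size"])
print(count_hits([sample, "", "not a log line"]))
```

The pattern is anchored only at the start of the line, so entries that
carry additional fields after the size still match.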

There are two other logfile formats, the Combined and the Extended Logfile
Format. These formats add the user-agent (browser type) and the referrer
URL (the page containing the link to the requested document, if the
request was generated by following a link) to the logfile entry. They
append these two fields to the Common Logfile Format (CLF) in one of two
usual ways:

CLF Mozilla/2.0 (X11; IRIX 6.3; IP22) http://foo/bar.html
CLF "http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"

Note that in the second form, the user-agent and the referrer URL are
surrounded by double quotes, which makes them ambiguous in certain cases,
such as erroneous referrer URLs that themselves contain double quotes.
Therefore, the first form should be preferred if possible.
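
The two layouts can be told apart with a few lines of Python. This is a
hedged sketch: the function name and the pattern are invented for the
example, the input is assumed to be only the text that follows the CLF
part of the entry, and the unquoted form is split on the assumption that
the referrer URL contains no spaces.

```python
import re

# Second form: "referrer" "user-agent", both in double quotes.
QUOTED_TAIL = re.compile(r'^"(?P<referrer>.*)" "(?P<agent>.*)"$')


def parse_extra_fields(tail):
    """Split the text after the CLF part into (user_agent, referrer)."""
    tail = tail.strip()
    quoted = QUOTED_TAIL.match(tail)
    if quoted:
        return quoted.group("agent"), quoted.group("referrer")
    # First form: the user-agent may contain spaces, so the referrer URL
    # is taken to be the last space-separated token.
    agent, _, referrer = tail.rpartition(" ")
    return agent, referrer


print(parse_extra_fields(
    'Mozilla/2.0 (X11; IRIX 6.3; IP22) http://foo/bar.html'))
print(parse_extra_fields(
    '"http://foo/bar.html" "Mozilla/2.0 (X11; IRIX 6.3; IP22)"'))
```

Both calls return the same pair. A referrer that itself contains `" "`
would be split in the wrong place by the quoted pattern, which is the kind
of ambiguity mentioned above.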

The entries shown above are the only information the server records in the
logfile. Much more information may be transferred from the browser to the
server, but although this additional information is available to CGI
scripts running on your server, it is not written to the logfile.
Therefore, http-analyze can only show you a summary of the information in
the logfile - nothing more, nothing less.

Special Cases

Caching in the browser:

As soon as a page has been saved in a browser's disk cache, the browser
might send out conditional requests for documents or inline objects. A
conditional request asks the web server to send a document/object only if
it has been modified since the last time the page was requested (if the
page is still in the browser's cache). This way, network traffic is
reduced somewhat, since documents must be transferred only if they have
changed recently. If such a conditional request arrives, the server will
respond with a Code 304 (Not Modified) status to indicate that the
document hasn't changed or with a Code 200 (OK) status if it has changed
in the meantime. Since the browser may be configured (and usually is so by
default) to send out such conditional requests only once per session and
otherwise unconditionally use the copy from the cache, you may not even
see a Code 304 response if this user visits your site again in the same
session. Conditional requests are then sent out only if the user
terminates the browser session and later restarts the browser.
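
The server's side of a conditional request can be sketched in a few lines
of Python. The function name is hypothetical, and the sketch assumes the
browser sends the date of its cached copy as an HTTP date string in an
If-Modified-Since header; real servers apply further rules that are left
out here.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def response_code(last_modified, if_modified_since=None):
    """Return 200 or 304 for a request to an existing document.

    last_modified: timezone-aware modification time of the document.
    if_modified_since: header value sent by the browser, or None for an
    unconditional request.
    """
    if if_modified_since is None:
        return 200
    try:
        cached_at = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # unreadable date: treat the request as unconditional
    if cached_at.tzinfo is None:
        cached_at = cached_at.replace(tzinfo=timezone.utc)
    # Unchanged since the browser cached it: no need to send it again.
    return 304 if last_modified <= cached_at else 200


doc_time = datetime(1998, 2, 1, 9, 0, 0, tzinfo=timezone.utc)
print(response_code(doc_time))                                   # 200
print(response_code(doc_time, "Sun, 01 Feb 1998 09:10:00 GMT"))  # 304
print(response_code(doc_time, "Sat, 31 Jan 1998 09:10:00 GMT"))  # 200
```

Only the first and the last request would transfer the document; the
second one produces a Code 304 entry in the logfile.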

Caching in a proxy server:

Organizations with a large number of users - such as companies,
universities, or online providers - often use a so-called proxy server,
mainly for two reasons:
- Often such organizations have a firewall to protect their internal
  network against intruders. This means that their network is logically
  separated from the rest of the Internet and that they have to use a
  proxy server, which is able to communicate with both the inside and the
  outside of their local network.
- To reduce network load somewhat, the proxy server acts as a local copy
  machine: As soon as a page is loaded into a browser through such a
  proxy server, the proxy saves a copy of this page in its disk cache,
  much like a browser does in the scenario above. This way, documents
  requested very often by users in the same local network need to be
  transferred to the proxy only once; the proxy then answers future
  requests for the same page from its local cache instead of connecting
  to the web server the document originated from.
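
The second point can be illustrated with a toy proxy in Python. The class,
the function names, and the example URL are assumptions for this sketch,
and it deliberately ignores cache expiry and conditional requests, which
real proxies also handle.

```python
class CachingProxy:
    """Toy proxy that keeps one copy of every document it has fetched."""

    def __init__(self, fetch_from_origin):
        self.fetch_from_origin = fetch_from_origin
        self.cache = {}

    def get(self, url):
        # Contact the origin server only for documents not yet cached.
        if url not in self.cache:
            self.cache[url] = self.fetch_from_origin(url)
        return self.cache[url]


origin_log = []  # stands in for the logfile of the original web server


def origin_server(url):
    origin_log.append(url)
    return "content of " + url


proxy = CachingProxy(origin_server)
for _user in range(50):
    proxy.get("http://www.example.com/index.html")

print(len(origin_log))  # 1
```

Fifty users behind the proxy read the page, yet the original server logs a
single hit coming from the proxy's host.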

Both forms of caching make it technically impossible to count visitors or
to track their way through your web site. All you see in the logfile of
your server is a few initial hits from the proxy or browser and probably
some Code 304 responses resulting from conditional requests sent out by
the proxy or browser, depending on its preference settings.

Definition of Terms

The statistics report contains, among others, the following information:
- the number of hits, 304's, files, pageviews, sessions, data sent (in KB)
- the amount of data requested, transferred, and saved by cache (in KB)
- the number of unique URLs, sites, and sessions per month
- the number of all response codes other than 200 (OK)
- the average hits per weekday and for last week
- the maximum/average hits per day and per hour
- the number of hits, files, 304's, sites, data sent by day
- the top 5 days, 24 hours, 5 minutes and 5 seconds of the summary period
- the top 30 most commonly accessed URLs (hits, 304's, data sent)
- the 10 least frequently accessed URLs (hits, 304's, data sent)
- the top 30 client domains accessing your server most often
- the top 30 browser types
- the top 30 referrer hosts
- the overview/detailed list of all files requested
- the overview/detailed list of all sites by domain and reverse domain
- the overview/detailed list of all browser types
- the overview/detailed list of all referrer URLs

The following table summarizes the meaning of all terms in the statistics
report which are not self-explanatory:

| Term | Color | Meaning |
|---|---|---|
| Hits | | A hit is any response from the server on behalf of a request sent from a browser. This includes any response from the server, not only text files or documents. If, for example, an HTML page has two images embedded, the server generates three hits if this page is requested: one hit for the HTML page itself and two hits for the two inline images. |
| Files | | If the user requests a document and the server successfully sends back a file for this request, this is counted as a Code 200 (OK) response. Any such response is counted as a file. Again, "file" here means any kind of file. |
| Code 304 | | A Code 304 (Not Modified) response is generated by the server if a document hasn't been updated since the last time it was requested by the user and therefore there was no need to actually send the files for this document. This happens if the browser (or a caching proxy server between the browser and your web server) still has an up-to-date copy of the page in its local storage (cache) and therefore can display the page without requesting the actual content. This technique is used to reduce network traffic, but it also causes an inaccuracy in the statistics reports regarding the number of visitors, because the browser or proxy usually sends only one such conditional request per user session if it still holds an up-to-date copy of the file. However, the ratio between files and 304's reflects the efficiency of overall caching mechanisms for at least those hits which made their way to the server. |
| Pageviews | | Pageviews are all files which either have a text file suffix (.html, .text) or which are directory index files. This number allows you to estimate the number of "real" documents transmitted by your server. If defined correctly, the analyzer rates text files (documents) as pageviews. Pageviews do not include images, CGI scripts, Java applets, or any other HTML objects, except files ending with one of the pre-defined pageview suffixes, such as .html or .text. |
| Other responses | ¹ | There are many more responses than only Code 200 (OK) and Code 304 (Not Modified), especially in the coming standard, the HTTP 1.1 protocol specification. For example, the server could generate a Code 302 (Redirected) response if a page has moved, a Code 401 (Unauthorized Request) response if access to the document is denied, or a Code 404 (Not Found) response if the requested page does not exist on this server. |
| KBytes transferred | | This is the amount of data sent during the whole summary period as reported by the server. Note that some servers log the size of a document instead of the actual number of bytes transferred. In most cases this is the same, but if a user interrupts the transmission by pressing the browser's stop button before the page has been received completely, some servers (for example all Netscape web servers) do not log the amount of data transferred but the amount of data which would have been transferred if the user had loaded the page completely. |
| KBytes requested | ¹ | This is the amount of data requested during the whole summary period. http-analyze computes this number by summing up the values of KBytes transferred and KBytes saved by cache (see below). |
| KBytes saved by cache | ¹ | The amount of data saved by various caching mechanisms such as in proxy servers or in browsers. This value is computed by multiplying the number of Code 304 (Not Modified) responses per file by the size of the corresponding file. Note: Because http-analyze can determine the size of a file only if the file has been requested at least once in the same summary period, the values for KBytes saved by cache and KBytes requested are just approximations of the real values. |
| Unique URLs | | Unique URLs are the number of all different, valid URLs requested in a given summary period. This shows you the number of all different files requested at least once in the corresponding summary period. |
| Unique sites | | This is the sum of all unique hosts accessing the server during a given time-window. The time-window is hardwired to the length of the current month. This means that if a host accesses your server very often, it gets counted only once during the whole month. Only the sum of the unique hosts per month is listed in the statistics report. |
| Sessions | | Similar to unique sites, this is the number of unique hosts accessing the server during a given time-window. This time-window is one day by default for backward compatibility, but it can be changed with the option -u or the Session directive in the configuration file. For example, if the time-window is two hours, all accesses from a certain host in less than 2 hours after the first access from this host are lumped together into one session. All following accesses more than 2 hours apart from the first access will be counted as a new session. This way you may get an estimated number of how many sessions are started on different sites to access your server. |

¹ Shown only on the total summary page.
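
Two of the computed values in the table, KBytes saved by cache and
Sessions, can be worked through with a short Python sketch. The function
names and the input tuples are assumptions made for this example, and an
access exactly one time-window after the first one is treated as a new
session, a boundary case the description above leaves open. This is not
http-analyze's own code.

```python
from collections import defaultdict


def kbytes_saved_by_cache(entries):
    """entries: iterable of (url, status, size_in_bytes) tuples."""
    size_of = {}                     # size known only after a Code 200
    not_modified = defaultdict(int)  # number of Code 304 responses per URL
    for url, status, size in entries:
        if status == 200:
            size_of[url] = size
        elif status == 304:
            not_modified[url] += 1
    # Files never sent in this period count as 0, hence an approximation.
    saved = sum(count * size_of.get(url, 0)
                for url, count in not_modified.items())
    return saved / 1024


def count_sessions(accesses, window_seconds):
    """accesses: iterable of (host, unix_timestamp), sorted by timestamp."""
    session_start = {}
    sessions = 0
    for host, timestamp in accesses:
        start = session_start.get(host)
        if start is None or timestamp - start >= window_seconds:
            session_start[host] = timestamp  # first access of a new session
            sessions += 1
    return sessions


entries = [("/index.html", 200, 4096), ("/index.html", 304, 0),
           ("/index.html", 304, 0), ("/never-sent.html", 304, 0)]
print(kbytes_saved_by_cache(entries))  # 8.0

accesses = [("a", 0), ("b", 600), ("a", 3600), ("a", 7300), ("a", 7400)]
print(count_sessions(accesses, 2 * 3600))  # 3
```

In the session example, host `a` starts a second session at 7300 seconds
because that access lies more than two hours after its first one, while
the access at 7400 seconds falls into the session started at 7300.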