All the query examples that we have seen so far share a common
characteristic: the values of each output record were built upon
the values of a single input record.
Sometimes, however, we might need to aggregate multiple
input records together and perform some operation on groups
of input records.
To accomplish this task, the Log Parser SQL-Like language has a
special set of functions that can be used to perform basic
calculations on multiple records. These aggregate functions (also
referred to as "SQL functions") include SUM, COUNT,
MAX, MIN, and AVG.
To show a classic example of the use of aggregate functions,
assume that given an IIS W3C log file, we want to calculate the
total number of bytes sent by the IIS server during the whole
period recorded in the log file.
Considering that the number of bytes sent by the IIS server for
each HTTP request is logged in the "sc-bytes" field, our command
will look like the following example:
C:\>LogParser -i:IISW3C -o:NAT "SELECT SUM(sc-bytes) FROM ex040528.log"Since the SELECT clause of this query makes use of the SUM aggregate function, the query will automatically aggregate all the input records, and calculate the sum of all the values of the "sc-bytes" field across all the input records; the output of this command will then look like the following output:
SUM(sc-bytes) ------------- 242834732As the example shows, the result of the query is a single output record, containing a single value calculated across all the input records.
As another example, we might want to calculate how many requests
have been logged in the log file.
Considering that each log file entry represents a single HTTP
request, this task can be accomplished by simply counting how many
input records are logged in the file:
C:\>LogParser -i:IISW3C -o:NAT "SELECT COUNT(*) FROM ex040528.log"The example above makes use of the COUNT aggregate function. When used with the special "*" argument, the COUNT function returns the total number of input records processed by the query.
If we want to calculate how many requests satisfy a particular
condition, for example how many requests were for an ASP page, we
can add a WHERE clause to the query,
and the COUNT function will only count input records satisfying the
WHERE condition:
SELECT COUNT(*) FROM ex040528.log WHERE EXTRACT_EXTENSION(cs-uri-stem) LIKE 'asp'
In the examples above, we have been using aggregate functions to
calculate a value across all the input records; sometimes,
however, we might want to calculate values across groups of
input records.
As an example, we might want to calculate the total number of bytes sent by the IIS server for each URL. To perform this task, we need to divide all the input records into groups according to the URL requested, and then use the SUM aggregate function separately on each group.
This can be accomplished by using another building block of the
Log Parser SQL language: the GROUP
BY clause.
The GROUP BY clause is used to specify which fields we want the
group subdivision to be based on; after the input records have been
divided into these groups, all the aggregate functions in the
SELECT clause will be calculated separately on each of these
groups, and the query will return an output record for each group
created.
Using the GROUP BY clause, our example query and its output will
look like this:
SELECT cs-uri-stem, COUNT(*) FROM ex040528.log GROUP BY cs-uri-stem
cs-uri-stem COUNT(*) ---------------------- -------- /Home/default.asp 5 /Home/images/bckgd.gif 419 /Docs/expl.htm 12 /Docs/main.htm 26 /login/frmx.dll 1
To make another example, assume that we want to calculate how
many requests have been served for each page type (ASP, html, CSS,
etc.).
First of all, we need to create separate groups according to the
extension of the URL; after this group subdivision has been done,
we can calculate a COUNT(*) on each group:
SELECT EXTRACT_EXTENSION(cs-uri-stem) AS PageType, COUNT(*) FROM ex040528.log GROUP BY PageTypeThe output will look like:
PageType COUNT(ALL *) -------- ------------ htm 115 css 22 gif 585 exe 25 nsf 142 swf 11 jpg 77 html 1 dll 1 asp 5 js 11 class 5
If we sort the output above according to the number of requests
for each group, we will be creating a list showing the most
requested page types first:
SELECT EXTRACT_EXTENSION(cs-uri-stem) AS PageType, COUNT(*) AS PageTypeHits FROM ex040528.log GROUP BY PageType ORDER BY PageTypeHits DESCThe output will look like:
PageType PageTypeHits -------- ------------ gif 585 nsf 142 htm 115 jpg 77 exe 25 css 22 js 11 swf 11 asp 5 class 5 dll 1 html 1
Groups can also be built on multiple fields, thus creating a
hierarchy of groups.
For example, consider the following query:
SELECT EXTRACT_EXTENSION(cs-uri-stem) AS PageType, sc-status, COUNT(*) FROM ex040528.log GROUP BY PageType, sc-statusThis query creates groups according to the requested page type, and within each of these groups, sub-groups are created according to the HTTP status sent by the IIS server for the group page type; the aggregate function "COUNT" will then be calculated on each sub-group.
PageType sc-status PageTypeHits -------- --------- ------------ htm 304 79 css 304 10 gif 304 450 exe 200 25 nsf 200 129 swf 200 3 gif 404 12 css 404 9 htm 200 34 css 200 3 jpg 200 17 gif 200 123 jpg 304 60 swf 304 8 nsf 403 3 html 404 1 dll 500 1 asp 200 5 js 304 7 class 304 4 js 200 4 htm 404 2 class 200 1 nsf 304 9 nsf 302 1
It's important to note a particular language constraint derived
from the use of the GROUP BY clause.
Whenever a query contains a GROUP BY clause, its SELECT clause can only contain any of the
following:
SELECT 'hello', TO_UPPERCASE(cs-uri-stem), COUNT(*), SUM(sc-bytes) FROM ex040528.log GROUP BY cs-uri-stemIn fact, the SELECT clause in the example above contains:
SELECT date, COUNT(*), SUM(sc-bytes) FROM ex040528.log GROUP BY cs-uri-stemThe SELECT clause in the example above contains a field-expression ("date") that does not appear in the GROUP BY clause.
SELECT TO_UPPERCASE(cs-uri-stem), COUNT(*), SUM(sc-bytes) FROM ex040528.log GROUP BY SUBSTR(TO_UPPERCASE(cs-uri-stem), 0, 5)The SELECT clause in the example above contains a field-expression ("TO_UPPERCASE(cs-uri-stem)") that is not derived from any field-expression in the GROUP BY clause; in this case, it's actually the field-expression in the GROUP BY clause that is derived from a field-expression in the SELECT clause.
SELECT SUBSTR(TO_UPPERCASE(cs-uri-stem), 0, 5), COUNT(*), SUM(sc-bytes) FROM ex040528.log GROUP BY SUBSTR(TO_UPPERCASE(cs-uri-stem), 0, 5)
© 2004 Microsoft Corporation. All rights reserved.