Eliminating Duplicate Values

When working with information from logs, it is often desired to retrieve a list of some values where each element in the list appears only once, regardless of the number of times the same value appears in the original data.

As an example, consider the following query, which extracts all the domain accounts that have logged on a computer from the "Security" event log:

SELECT RESOLVE_SID(Sid) AS Account
FROM \\TESTMACHINE1\Security
WHERE EventID IN (540; 528)
The output of this query is a list of all the domain accounts appearing in each "Logon" event:
Account
------------------------------------------------
NT AUTHORITY\LOCAL SERVICE
NT AUTHORITY\NETWORK SERVICE
NT AUTHORITY\NETWORK SERVICE
NT AUTHORITY\NETWORK SERVICE
TESTDOMAIN\TESTUSER1
NT AUTHORITY\LOCAL SERVICE
NT AUTHORITY\LOCAL SERVICE
TESTDOMAIN\TESTUSER1
TESTDOMAIN\TESTUSER2
NT AUTHORITY\LOCAL SERVICE
TESTDOMAIN\TESTUSER1
If we are interested in retrieving a list in which each account name appears only once, we could use the DISTINCT keyword in the SELECT clause as follows:
SELECT DISTINCT RESOLVE_SID(Sid) AS Account
FROM \\TESTMACHINE1\Security
WHERE EventID IN (540; 528)
And obtain:
Account
------------------------------------------------
NT AUTHORITY\LOCAL SERVICE
NT AUTHORITY\NETWORK SERVICE
TESTDOMAIN\TESTUSER1
TESTDOMAIN\TESTUSER2

The DISTINCT keyword is used to indicate that the output of a query should consist of unique records; duplicate output records are discarded.

As another example, we might want to retrieve a list of all the browsers used to request pages from our IIS server, with each browser appearing only once in the list:

SELECT DISTINCT cs(User-Agent)
FROM <1>
cs(User-Agent)
--------------------------------------------------------------------
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1)
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+98)
Mozilla/4.05+[en]
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0;+T312461;+Q312461)
Mozilla/4.0+(compatible;+MSIE+5.01;+Windows+NT+5.0)
Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.0)
Microsoft+Data+Access+Internet+Publishing+Provider+Cache+Manager
Mozilla/2.0+(compatible;+MS+FrontPage+4.0)
MSFrontPage/4.0
Microsoft+Data+Access+Internet+Publishing+Provider+DAV

It is also possible to use the DISTINCT keyword inside the COUNT aggregate function, in order to retrieve the total number of different values appearing in the data.

For example, the following query returns the total number of different browsers and the total number of different client IP addresses that requested pages from our IIS server:

SELECT   COUNT(DISTINCT cs(User-Agent)) AS Browsers, 
		COUNT(DISTINCT c-ip) AS Clients
FROM <1>
Browsers Clients
-------- -------
356	3379

Tip: In the Log Parser SQL-Like language, the DISTINCT keyword can be used inside aggregate functions only when the GROUP BY clause is not used.


© 2004 Microsoft Corporation. All rights reserved.