Calculating Text File Statistics

 

 

If you take a look at the archives for the Hey, Scripting Guy! column (which many people consider to be one of the better daily scripting columns published on TechNet), one item that jumps out at you is this: something that people seem to do over and over again is calculate statistics for text files. That is, how many lines are in my file, how many words are in my file, how many characters are in my file, and so on. For various reasons, it’s never enough just to have a text file; people need to know everything there is to know about that text file.

 

Calculating text file statistics is relatively easy in VBScript (a little cumbersome, mind you, but relatively easy). Which leads to an obvious question: how easy is it to calculate text file statistics using Windows PowerShell? Let’s find out for ourselves.

 

To begin with, let’s assume we have a text file named C:\Scripts\Alice.txt, a file that contains the following information:

 

Curiouser and curiouser!' cried Alice (she was so much surprised, that for the moment
she quite forgot how to speak good English); 'now I'm opening out like the largest
telescope that ever was! Good-bye, feet!' (for when she looked down at her feet, they
seemed to be almost out of sight, they were getting so far off). 'Oh, my poor little
feet, I wonder who will put on your shoes and stockings for you now, dears? I'm sure
_I_ shan't be able! I shall be a great deal too far off to trouble myself about you:
you must manage the best way you can; --but I must be kind to them,' thought Alice ,
'or perhaps they won't walk the way I want to go! Let me see: I'll give them a new
pair of boots every Christmas.'

 

We’d like to know how many words are in this file, how many lines are in this file, and how many characters are in this file. How hard is that going to be? As it turns out, not very hard at all:

 

Get-Content c:\scripts\alice.txt | Measure-Object -word -line -character

 

No, we didn’t leave anything out: one little line of code really can return all sorts of useful information about a text file. In order to get that information we simply use the Get-Content cmdlet to read the contents of the file C:\Scripts\Alice.txt. However, rather than display those contents to the screen (which is the default behavior of Get-Content), we instead “pipe” that information to the Measure-Object cmdlet. As the name implies, Measure-Object is designed to “measure” property values; for example, given a set of numbers, Measure-Object can calculate the sum and the average of those numbers, as well as report back the highest and lowest values in that set.
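To make the numeric side of Measure-Object concrete, here is a minimal sketch. The $numbers and $stats variables and the values in them are made up purely for illustration; the -Sum, -Average, -Maximum, and -Minimum parameters are standard Measure-Object parameters.

```powershell
# A hypothetical set of numbers, used only for illustration.
$numbers = 2, 4, 6, 8, 10

# Ask Measure-Object for the sum, the average, and the highest and lowest values.
$stats = $numbers | Measure-Object -Sum -Average -Maximum -Minimum

$stats.Count     # 5
$stats.Sum       # 30
$stats.Average   # 6
$stats.Maximum   # 10
$stats.Minimum   # 2
```

Leave off any of those parameters and the corresponding property simply comes back empty.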

 

Of course, we didn’t pass Measure-Object a set of numbers. Instead, we passed it the contents of a text file; that’s why we tacked on the parameters -word (show me the number of words in the file); -line (show me the number of lines); and -character (show me the number of characters). In return, here’s what Measure-Object reports back:

 

Lines                         Words                    Characters Property
-----                         -----                    ---------- --------
    1                           137                           708

 

Pretty cool, right?
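If you would rather work with the individual numbers than look at a table, one option is to store the result in a variable and read its Lines, Words, and Characters properties. The following is only a sketch: the temporary file name stats-sample.txt and its two lines of text are hypothetical, created here just so the example runs on its own; in practice you would point $path at your own file.

```powershell
# Hypothetical sample file, created only so this sketch is self-contained.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'stats-sample.txt'
Set-Content -Path $path -Value 'Good-bye, feet!', 'Oh, my poor little feet'

# Keep the measurement object instead of letting it print to the screen.
$stats = Get-Content $path | Measure-Object -Word -Line -Character

# Each statistic is available as a property of the result.
$stats.Lines        # 2
$stats.Words        # 7
$stats.Characters   # 38

# Clean up the sample file.
Remove-Item $path
```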

 

Here’s another parameter you might find useful: -ignorewhitespace. By default, Measure-Object counts each blank space in a file as a character. In some cases that’s fine; at other times, however, you might want to ignore blank spaces. Do you want to ignore blank spaces? That’s fine; just tack the -ignorewhitespace parameter onto the end of your command, like so:

 

Get-Content c:\scripts\alice.txt | Measure-Object -word -line -character -ignorewhitespace

 

Now take a look at the number of characters found in the file:

 

Lines                         Words                    Characters Property
-----                         -----                    ---------- --------
    1                           137                           572

 

Obviously a big difference: 572 characters instead of 708.
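The same effect is easy to see on a much smaller scale. In this sketch the $text variable and its value are hypothetical; the string holds three letters separated by two blank spaces.

```powershell
# Hypothetical sample: 3 letters and 2 blank spaces.
$text = 'a b c'

# By default, blank spaces count as characters.
$withSpaces = ($text | Measure-Object -Character).Characters

# With -IgnoreWhiteSpace, only the letters are counted.
$withoutSpaces = ($text | Measure-Object -Character -IgnoreWhiteSpace).Characters

$withSpaces      # 5
$withoutSpaces   # 3
```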

 

Incidentally, you aren’t limited to calculating statistics on text files; Measure-Object works equally well with variables. For example, suppose we assign a text value to a variable named $a:

 

$a = "This is a two-line value `n stored in a variable."

 

How many words, lines, and characters are in $a? Well, let’s try the following command and see for ourselves:

 

$a | Measure-Object -word -line -character

 

According to Measure-Object, it’s the following:

 

Lines                         Words                    Characters Property
-----                         -----                    ---------- --------
    2                             9                            48
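Wondering where the 48 comes from? As a rough cross-check (a sketch, not the only way to do this), you can compare the character count against the string’s own Length property. The $stats variable is introduced here just for illustration.

```powershell
$a = "This is a two-line value `n stored in a variable."

$stats = $a | Measure-Object -Word -Line -Character

# Without -IgnoreWhiteSpace, every character in the string is counted,
# including the blank spaces and the newline inserted by `n.
$stats.Characters               # 48
$a.Length                       # 48
$stats.Characters -eq $a.Length # True
```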

 

And people think that statistics are hard. They’re not, at least not when you have Windows PowerShell.