Calculating Text File Statistics
If you take a look at the archives for the Hey, Scripting Guy! column (which many people consider to be one of the better daily scripting columns published on TechNet), one item jumps out at you: something people seem to do over and over again is calculate statistics for text files. That is, how many lines are in my file, how many words are in my file, how many characters are in my file, etc. For various reasons, it’s never enough just to have a text file; people need to know everything there is to know about that text file.
Calculating text file statistics is relatively easy in VBScript (a little cumbersome, mind you, but relatively easy). Which leads to an obvious question: how easy is it to calculate text file statistics using Windows PowerShell? Let’s find out for ourselves.
To begin with, let’s assume we have a text file named C:\Scripts\Alice.txt, a file that contains the following information:
Curiouser
and
curiouser!' cried
she quite forgot how to
speak good English); 'now I'm opening out like the largest
telescope that ever was!
Good-bye, feet!' (for when she looked down at her feet, they
seemed to be almost out of
sight, they were getting so far off). 'Oh, my poor little
feet, I wonder who will put
on your shoes and stockings for you now, dears? I'm sure
_I_ shan't be able! I shall
be a great deal too far off to trouble myself about you:
you must manage the best
way you can; --but I must be kind to them,' thought
'or perhaps they won't walk
the way I want to go! Let me see: I'll give them a new
pair of boots every
Christmas.'
We’d like to know how many words are in this file, how many lines are in this file, and how many characters are in this file. How hard is that going to be? As it turns out, not very hard at all:
Get-Content c:\scripts\alice.txt | Measure-Object -word -line -character
No, we didn’t leave anything out: one little line of code really can return all sorts of useful information about a text file. To get that information we simply use the Get-Content cmdlet to read the contents of the file C:\Scripts\Alice.txt. However, rather than display those contents on the screen (which is the default behavior of Get-Content), we instead “pipe” that information to the Measure-Object cmdlet. As the name implies, Measure-Object is designed to “measure” property values; for example, given a set of numbers, Measure-Object can calculate the sum and the average of those numbers, as well as report back the highest and lowest values in that set.
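To make the numeric case concrete, here is a minimal sketch. The variable names $numbers and $stats, and the values themselves, are made up for illustration; the -Sum, -Average, -Maximum, and -Minimum parameters are the ones Measure-Object provides for numeric input:

```powershell
# A hypothetical set of numbers, used only for illustration.
$numbers = 2, 4, 6, 8

# Ask Measure-Object for the sum, the average, and the highest and lowest values.
$stats = $numbers | Measure-Object -Sum -Average -Maximum -Minimum

# Each statistic comes back as a property on the returned object.
$stats.Count     # how many values were measured
$stats.Sum       # total of all the values
$stats.Average   # arithmetic mean
$stats.Maximum   # highest value
$stats.Minimum   # lowest value
```

Leave off any of those parameters and the corresponding property simply comes back empty.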
Of course, we didn’t pass Measure-Object a set of numbers. Instead, we passed it the contents of a text file; that’s why we tacked on the parameters -word (show me the number of words in the file); -line (show me the number of lines); and -character (show me the number of characters). In return, here’s what Measure-Object reports back:
Lines Words Characters Property
----- ----- ---------- --------
    1   137        708
Pretty cool, right?
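You don’t have to read those numbers off the screen, either. As a rough sketch, the counts can be stored in a variable and read back one property at a time. To keep the example self-contained it writes a small two-line sample file to a temporary location rather than using C:\Scripts\Alice.txt; the sample text and the variable names $path and $stats are assumptions made for illustration:

```powershell
# Create a throwaway sample file (hypothetical content, not Alice.txt).
$path = [System.IO.Path]::GetTempFileName()
Set-Content -Path $path -Value 'line one', 'second line here'

# Measure the file and keep the result instead of displaying it.
$stats = Get-Content $path | Measure-Object -Word -Line -Character

# Read the individual counts from the returned object.
$stats.Lines        # number of lines
$stats.Words        # number of words
$stats.Characters   # number of characters (line breaks are not counted)

# Clean up the sample file.
Remove-Item $path
```

To measure a real file, swap $path for that file’s path in the Get-Content call and drop the lines that create and remove the sample.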
Here’s another parameter you might find useful: -ignorewhitespace. By default, Measure-Object counts each blank space in a file as a character. In some cases that’s fine; at other times, however, you might want to ignore blank spaces. Do you want to ignore blank spaces? That’s fine; just tack the -ignorewhitespace parameter onto the end of your command, like so:
Get-Content c:\scripts\alice.txt | Measure-Object -word -line -character -ignorewhitespace
Now take a look at the number of characters found in the file:
Lines Words Characters Property
----- ----- ---------- --------
    1   137        572
Obviously a big difference.
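The same effect is easy to see on a much smaller scale. In this sketch the string and the variable names are invented for illustration; it counts the characters in a short string twice, once with blank spaces and once without:

```powershell
# A hypothetical sample string containing two blank spaces.
$text = 'one two three'

# Count every character, blank spaces included.
$withSpaces = $text | Measure-Object -Character

# Count again, this time skipping white space.
$withoutSpaces = $text | Measure-Object -Character -IgnoreWhiteSpace

$withSpaces.Characters      # 13
$withoutSpaces.Characters   # 11
```

The two results differ by exactly the number of blank spaces in the string.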
Incidentally, you aren’t limited to calculating statistics on text files; Measure-Object works equally well with variables. For example, suppose we assign a text value to a variable named $a:
$a = "This is a two-line value `n stored in a variable."
How many words, lines, and characters are in $a? Well, let’s try the following command and see for ourselves:
$a | Measure-Object -word -line -character
According to Measure-Object, it’s the following: