Tactical Advice

.NET Framework Helps IT Find Data More Easily

This story appears in the March 2008 issue of BizTech Magazine.

With just a few lines of code, you can extract data from text files, including log files, using regular-expression capture groups. If you’ve used regular expressions to search for matching text, extracting text using the .NET Framework will feel intuitive. If you haven’t worked with regular expressions before, or (like me) you need a reference to remember all the symbols, check out the Microsoft Developer Network’s regular-expression reference for help.

Finding Matching Lines

Imagine that you need to parse a log file (we’ll use C:\Windows\WgaNotify.log as an example, because it’s present on most computers) and list every file that was successfully copied. The WgaNotify.log file resembles the following:

[WgaNotify.log]
0.109: ========================================================
0.109: 2006/04/27 06:54:09.218 (local)
0.109: Failed To Enable SE_SHUTDOWN_PRIVILEGE
1.359: Starting AnalyzeComponents
1.359: AnalyzePhaseZero used 0 ticks
1.359: No c:\windows\INF\updtblk.inf file.
23.328: Copied file:  C:\WINDOWS\system32\LegitCheckControl.dll
23.578: Copied file (delayed):  C:\WINDOWS\system32\SETE.tmp
25.156:  Return Code = 0
25.156: Starting process:  C:\WINDOWS\system32\wgatray.exe /b

As you can see, only the two lines that contain the phrase “Copied file” hold useful information, and the rest can be ignored. You could use the following code in a console application, which requires the System.IO and System.Text.RegularExpressions namespaces, to display just those lines:

' Visual Basic
' Requires: Imports System.IO and Imports System.Text.RegularExpressions
Dim r As New Regex("Copied file")   ' Build the expression once, outside the loop

Using inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
    ' Read each line of the log file
    Dim inLine As String = inFile.ReadLine()
    While inLine IsNot Nothing
        ' Display the line only if it matches the regular expression
        If r.IsMatch(inLine) Then
            Console.WriteLine(inLine)
        End If
        inLine = inFile.ReadLine()
    End While
End Using

// C#
// Requires: using System; using System.IO; using System.Text.RegularExpressions;
Regex r = new Regex(@"Copied file");   // Build the expression once, outside the loop

using (StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log"))
{
    string inLine;

    // Read each line of the log file
    while ((inLine = inFile.ReadLine()) != null)
    {
        // Display the line only if it matches the regular expression
        if (r.IsMatch(inLine))
            Console.WriteLine(inLine);
    }
}

Running this console application would match the lines that contain information about the files copied and display the following:

23.328: Copied file:  C:\WINDOWS\system32\LegitCheckControl.dll
23.578: Copied file (delayed):  C:\WINDOWS\system32\SETE.tmp

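If all you need to do is display matching lines in a text file, use FindStr. The following command displays the same output as the previous code sample (the /C: switch makes FindStr treat the quoted phrase, space included, as a single literal search string):
FindStr /C:"Copied file" C:\Windows\WgaNotify.log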

Capturing Specific Data

To extract portions of matching lines, specify capture groups by surrounding a portion of your regular expression with parentheses. For example, the regular expression "Copied file:\s*(.*$)" matches the phrase “Copied file:” followed by any white space (the “\s*” symbol) and places everything after that, up to the end of the line, into a group. Remember, “.*” matches any sequence of characters, and “$” matches the end of the line.
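As a minimal sketch of that expression in action, the following applies it to two lines taken from the sample log. The class and method names (CaptureGroupSketch, CaptureAfterPrefix) are illustrative only and do not come from the .NET Framework.

```csharp
using System;
using System.Text.RegularExpressions;

public static class CaptureGroupSketch
{
    // Returns the text captured after "Copied file:", or null when the line does not match.
    // (Hypothetical helper name, used here for illustration only.)
    public static string CaptureAfterPrefix(string line)
    {
        Match m = Regex.Match(line, @"Copied file:\s*(.*$)");
        return m.Success ? m.Groups[1].Value : null;
    }

    public static void Main()
    {
        string[] samples =
        {
            @"23.328: Copied file:  C:\WINDOWS\system32\LegitCheckControl.dll",
            @"23.578: Copied file (delayed):  C:\WINDOWS\system32\SETE.tmp"
        };

        foreach (string s in samples)
            Console.WriteLine(CaptureAfterPrefix(s) ?? "(no match)");
    }
}
```

The first line yields the full path. The second does not match at all, because “(delayed)” sits between “file” and the colon, which is why the samples that follow use the slightly looser pattern "Copied file.*:\s+(.*$)".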

To match a pattern and capture a portion of the match, follow these steps:

  1. Create a regular expression, and enclose in parentheses the portion of the pattern you want to capture. This creates a group.
  2. Create an instance of the System.Text.RegularExpressions.Match class by calling the Regex.Match method (either the static method or, as in the samples here, the method on a Regex instance).
  3. Retrieve the matched data by accessing the elements of the Match.Groups array. Element 0 holds the entire match; the first group is element 1, the second group is element 2, and so on.
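The three steps above can be sketched roughly as follows, using the static Regex.Match method and one line from the sample log. GroupIndexSketch and GetGroup are hypothetical names chosen for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupIndexSketch
{
    // Step 1: the parentheses mark the part of the pattern to capture.
    private const string Pattern = @"Copied file.*:\s+(.*$)";

    // Steps 2 and 3: run the match, then read one element of Match.Groups.
    // Returns null when the line does not match. (Hypothetical helper.)
    public static string GetGroup(string line, int index)
    {
        Match m = Regex.Match(line, Pattern);
        return m.Success ? m.Groups[index].Value : null;
    }

    public static void Main()
    {
        string line = @"23.578: Copied file (delayed):  C:\WINDOWS\system32\SETE.tmp";

        Console.WriteLine("Groups[0]: " + GetGroup(line, 0));   // the entire match
        Console.WriteLine("Groups[1]: " + GetGroup(line, 1));   // the first capture group
    }
}
```

Here element 0 is everything from “Copied file” to the end of the line, while element 1 is just the path, which is why the samples in this article read from index 1 onward.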

The following example expands on the previous code sample to extract and display the filenames from the WgaNotify.log file:

' Visual Basic
' Create the regular expression once, outside the loop
Dim r As New Regex("Copied file.*:\s+(.*$)")

Using inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
    ' Read each line of the log file
    Dim inLine As String = inFile.ReadLine()
    While inLine IsNot Nothing
        ' Display the group only if the line matches the regular expression
        Dim m As Match = r.Match(inLine)
        If m.Success Then
            Console.WriteLine(m.Groups(1).Value)
        End If
        inLine = inFile.ReadLine()
    End While
End Using

// C#
// Create the regular expression once, outside the loop
Regex r = new Regex(@"Copied file.*:\s+(.*$)");

using (StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log"))
{
    string inLine;

    // Read each line of the log file
    while ((inLine = inFile.ReadLine()) != null)
    {
        // Display the group only if the line matches the regular expression
        Match m = r.Match(inLine);
        if (m.Success)
            Console.WriteLine(m.Groups[1].Value);
    }
}

This code does a bit better, displaying just the filenames of the copied files:

C:\WINDOWS\system32\LegitCheckControl.dll
C:\WINDOWS\system32\SETE.tmp

Capturing Multiple Groups

You can also separate the folder and filename by matching multiple groups in a single line. The following slightly updated sample creates separate capture groups for the folder name and the filename, and then displays both values. Notice that the regular expression now contains two groups (indicated by two sets of parentheses), and the call to Console.WriteLine now references elements 1 and 2 of the Match.Groups array (element 0 holds the entire match).

' Visual Basic
' Create the regular expression once, outside the loop
Dim r As New Regex("Copied file.*:\s+(.*\\)(.*$)")

Using inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
    ' Read each line of the log file
    Dim inLine As String = inFile.ReadLine()
    While inLine IsNot Nothing
        ' Display the groups only if the line matches the regular expression
        Dim m As Match = r.Match(inLine)
        If m.Success Then
            Console.WriteLine("Folder: " & m.Groups(1).Value & ", File: " & m.Groups(2).Value)
        End If
        inLine = inFile.ReadLine()
    End While
End Using

// C#
// Create the regular expression once, outside the loop
Regex r = new Regex(@"Copied file.*:\s+(.*\\)(.*$)");

using (StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log"))
{
    string inLine;

    // Read each line of the log file
    while ((inLine = inFile.ReadLine()) != null)
    {
        // Display the groups only if the line matches the regular expression
        Match m = r.Match(inLine);
        if (m.Success)
            Console.WriteLine("Folder: " + m.Groups[1].Value + ", File: " + m.Groups[2].Value);
    }
}

The end result is that the console application captures the folder and filename separately, and outputs the following formatted data:

Folder: C:\WINDOWS\system32\, File: LegitCheckControl.dll
Folder: C:\WINDOWS\system32\, File: SETE.tmp
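The split lands on the last backslash because “.*” is greedy: the first group takes as much of the path as it can while still leaving a backslash at its end. The sketch below illustrates this with a deeper path. The log line in it is invented for the example (it is not from WgaNotify.log), and FolderFileSketch and TrySplit are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FolderFileSketch
{
    // Splits a "Copied file" line into folder and filename.
    // Returns false (and sets both outputs to null) when the line does not match.
    public static bool TrySplit(string line, out string folder, out string file)
    {
        Match m = Regex.Match(line, @"Copied file.*:\s+(.*\\)(.*$)");
        folder = m.Success ? m.Groups[1].Value : null;
        file = m.Success ? m.Groups[2].Value : null;
        return m.Success;
    }

    public static void Main()
    {
        // Invented sample line with a nested folder, for illustration only.
        string line = @"1.000: Copied file:  C:\WINDOWS\system32\drivers\example.sys";

        string folder, file;
        if (TrySplit(line, out folder, out file))
            Console.WriteLine("Folder: " + folder + ", File: " + file);
    }
}
```

One consequence of this pattern is that a line whose path contains no backslash will not match at all, so a caller may want to handle the false return value explicitly.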

Using Named Capture Groups

You can make your regular expressions easier to read by naming the capture groups. To name a group, add “?<name>” immediately after the open parenthesis. You can then access the named groups using Match.Groups["name"]. The following example demonstrates using named groups with the Match.Result method, which allows you to format the results of a regular expression match. It produces exactly the same output as the previous code sample, but the code is easier to read.

' Visual Basic
' Create the regular expression once, outside the loop
Dim r As New Regex("Copied file.*:\s+(?<folder>.*\\)(?<file>.*$)")

Using inFile As StreamReader = File.OpenText("C:\Windows\wganotify.log")
    ' Read each line of the log file
    Dim inLine As String = inFile.ReadLine()
    While inLine IsNot Nothing
        ' Display the groups only if the line matches the regular expression
        Dim m As Match = r.Match(inLine)
        If m.Success Then
            Console.WriteLine(m.Result("Folder: ${folder}, File: ${file}"))
        End If
        inLine = inFile.ReadLine()
    End While
End Using

// C#
// Create the regular expression once, outside the loop
Regex r = new Regex(@"Copied file.*:\s+(?<folder>.*\\)(?<file>.*$)");

using (StreamReader inFile = File.OpenText(@"C:\Windows\wganotify.log"))
{
    string inLine;

    // Read each line of the log file
    while ((inLine = inFile.ReadLine()) != null)
    {
        // Display the groups only if the line matches the regular expression
        Match m = r.Match(inLine);
        if (m.Success)
            Console.WriteLine(m.Result("Folder: ${folder}, File: ${file}"));
    }
}
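If you would rather read the named groups directly than format them with Match.Result, the Match.Groups indexer also accepts the group name. The following is a minimal sketch of that approach; NamedGroupSketch and Describe are hypothetical names used only for this illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class NamedGroupSketch
{
    // Same pattern as the named-group sample, compiled once and reused.
    private static readonly Regex CopiedFile =
        new Regex(@"Copied file.*:\s+(?<folder>.*\\)(?<file>.*$)");

    // Builds the formatted string by reading each group by name.
    // Returns null when the line does not match. (Hypothetical helper.)
    public static string Describe(string line)
    {
        Match m = CopiedFile.Match(line);
        if (!m.Success)
            return null;

        string folder = m.Groups["folder"].Value;
        string file = m.Groups["file"].Value;
        return "Folder: " + folder + ", File: " + file;
    }

    public static void Main()
    {
        string line = @"23.328: Copied file:  C:\WINDOWS\system32\LegitCheckControl.dll";
        Console.WriteLine(Describe(line) ?? "(no match)");
    }
}
```

Reading groups by name is handy when you need the values as separate strings, for example to store them or compare them, rather than only to print them.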

The .NET Framework supports using capture groups with regular expressions to extract specific data from log files. Using capture groups, you can parse complex text files and isolate just the information you need. First, create a Regex object (part of the System.Text.RegularExpressions namespace) using a regular expression that includes one or more capture groups in parentheses. Then, call the Regex.Match method to compare the regular expression to the input string. Access your capture groups using the Match.Groups array, or format and output the capture groups by calling Match.Result.

 

PowerShell offers very similar functionality. For more information, read “Regular Expressions in Monad” at http://www.leeholmes.com/blog/RegularExpressionsInMonad.aspx.

Tony Northrup is a developer, security consultant and author with more than 10 years of professional experience developing applications for Microsoft Windows.
