Tactical Advice

How to Develop a Sound Data Loss Prevention Strategy

Pattern matching and document tagging are often used in parallel to create the most effective approaches to data loss prevention.
This story appears in the Spring 2012 issue of BizTech Magazine.

No matter what your line of business, you no doubt have some data in your possession that’s critical to you, your customers or other stakeholders. Inadvertently releasing that information could lead to financial loss, reputational damage or even criminal sanctions. How, then, are you ensuring that you maintain control over your company’s sensitive data?

Data loss prevention (DLP) systems help keep tabs on an organization’s sensitive data by building an inventory and then maintaining control over the flow of information both inside and outside of the network. Adoption has been rapid over the past few years, and most organizations now either have a DLP system or are considering deploying one.

What Does DLP Technology Do?

DLP products play three critical roles in the enterprise. First, they help build an inventory of the sensitive information in an environment. Organizations commonly have only vague ideas about where this data resides, and some lack even a clear definition of what constitutes sensitive data.

Many organizations go to great lengths to protect their centralized stores of sensitive information, such as that stored in an ERP system or on enterprise databases, only to be shocked when they discover that users are copying that data into shadow systems stored on notebook computers. DLP can help identify those unsanctioned sensitive data repositories and either eradicate them or implement effective controls around them.

The second role that DLP plays is to monitor the flow of sensitive information throughout an organization. DLP products can tag sensitive information and then document how that data is transferred across networks and between systems. This record helps the organization identify business processes that work with sensitive information and implement appropriate security controls around them.

Finally, DLP allows organizations to proactively block uses of sensitive information that violate their security policies. When the DLP system identifies a flow of sensitive data, it can optionally terminate the network connection, redact the sensitive data or apply additional security controls, such as encryption. These actions take place in real time, preventing an unintended leak of sensitive data.
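The blocking and redaction responses can be illustrated with a small Python sketch. It assumes, purely for illustration, that the sensitive data is a hyphenated Social Security number and that a message is a plain string; the function name `enforce` and the action labels are hypothetical and do not come from any DLP product. Encryption is left out because it would require a key-management design the article does not describe.

```python
import re

# Assumption: the only sensitive pattern is a hyphenated SSN (xxx-xx-xxxx).
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def enforce(message, action="redact"):
    """Apply a hypothetical policy action to an outbound message.

    Returns the message to send, or None if the message is blocked.
    """
    if not SSN_PATTERN.search(message):
        return message  # nothing sensitive found, pass through unchanged
    if action == "block":
        return None  # stands in for terminating the connection
    if action == "redact":
        return SSN_PATTERN.sub("[REDACTED]", message)
    raise ValueError(f"unknown action: {action}")


if __name__ == "__main__":
    print(enforce("Employee SSN: 123-45-6789"))
    print(enforce("Employee SSN: 123-45-6789", action="block"))
```

A real product would make this decision inline on network traffic or file operations; the sketch only shows the shape of the decision.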

Identifying Sensitive Data

There are two major techniques that DLP systems use to identify sensitive information: pattern matching and document tagging. These are commonly used in parallel to create the most effective approaches to DLP.

With pattern matching approaches, the DLP system uses regular expressions to identify information that might be sensitive. This technique is typically used to identify sensitive numbers that follow a regular pattern, such as Social Security numbers (using the format xxx-xx-xxxx) and credit card numbers (using the format xxxx-xxxx-xxxx-xxxx).
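As a rough illustration, the two formats mentioned above could be expressed as regular expressions in Python. This is a minimal sketch, not how any particular DLP product works; the names `SSN_PATTERN`, `CARD_PATTERN` and `find_candidates` are made up for the example, and only the hyphenated formats are covered.

```python
import re

# Hyphenated formats from the text: xxx-xx-xxxx and xxxx-xxxx-xxxx-xxxx.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")
CARD_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")


def find_candidates(text):
    """Return substrings that look like SSNs or credit card numbers."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "card": CARD_PATTERN.findall(text),
    }


if __name__ == "__main__":
    sample = "SSN 123-45-6789, card 4111-1111-1111-1111"
    print(find_candidates(sample))
```

Each match is only a candidate: the patterns describe the shape of the number, not whether it is real.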

Pattern matching is a good way to identify this type of information, but it is highly prone to false positives. For example, if an unformatted nine-digit number is on a system, a pure pattern-matching system has no way of telling whether it is a Social Security number, a dollar amount or a CUSIP number used to identify financial securities.

For this reason, many DLP products add contextual information to their pattern matching algorithms. In fact, this is one of the major ways that DLP manufacturers differentiate their products. Examples of ways that DLP can use context to improve the accuracy of pattern matching include:

  • Prioritizing formatted data over unformatted data (for example, the hyphens in 123-45-6789 make it much more likely to be a Social Security number than 123456789);
  • Looking at the header row in spreadsheets to determine whether it contains clues to the field’s nature;
  • Using field-specific knowledge to eliminate false positives. (For example, credit card numbers contain a checksum digit calculated using the openly available Luhn algorithm; sixteen-digit numbers that fail the Luhn check are not valid credit card numbers and may be ignored. Similarly, there are no valid Social Security numbers with “00” in the middle position.)
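Two of the context checks listed above are simple enough to sketch in Python: the Luhn checksum for card numbers and the rule that the middle group of a Social Security number cannot be “00”. The function names and the "high"/"low"/"none" confidence labels are illustrative assumptions, not part of any vendor’s scoring scheme.

```python
def luhn_valid(number):
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as adding the two digits of the product
        total += digit
    return total % 10 == 0


def plausible_ssn(candidate):
    """Reject nine-digit candidates whose middle group is 00."""
    digits = "".join(ch for ch in candidate if ch.isdigit())
    return len(digits) == 9 and digits[3:5] != "00"


def ssn_confidence(candidate):
    """Hypothetical scoring: formatted candidates rank above unformatted ones."""
    if not plausible_ssn(candidate):
        return "none"
    return "high" if "-" in candidate else "low"


if __name__ == "__main__":
    print(luhn_valid("4111-1111-1111-1111"))
    print(ssn_confidence("123-45-6789"), ssn_confidence("123456789"))
```

Passing these checks does not prove a number is sensitive; failing them is simply a cheap way to discard candidates that cannot be valid.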

Pattern matching might also be used to search for specific keywords in documents. For example, a company with a rigorous classification policy might configure its DLP to monitor for documents leaving the organization with the phrase “Company Confidential” in the header.
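A keyword check of this kind might look like the following Python sketch. It assumes the document has already been converted to plain text and treats the first few lines as the header, which is a simplification; real document formats store headers in format-specific ways. The function name and the `header_lines` parameter are hypothetical.

```python
def has_confidential_marking(text, phrase="Company Confidential", header_lines=5):
    """Return True if `phrase` appears in the first `header_lines` lines.

    Assumption: the header is approximated by the first few lines of text.
    The comparison ignores letter case.
    """
    header = "\n".join(text.splitlines()[:header_lines])
    return phrase.lower() in header.lower()


if __name__ == "__main__":
    memo = "COMPANY CONFIDENTIAL\nQ3 pricing plan\n..."
    print(has_confidential_marking(memo))
```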

With document tagging, security administrators build an inventory of specific documents that contain sensitive information. The DLP system then has two possible approaches for detecting the attempted sharing of those documents outside of the organization. In the first approach, document fingerprinting, the system computes a cryptographic hash of the file and then compares that digital fingerprint with the fingerprints of all documents leaving the organization. Because changing even one byte of a file produces a different hash, fingerprinting detects only exact copies. In the second approach, the system stores the entire content of the sensitive files and then watches for data leaving the organization that matches that content.
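Both approaches can be sketched in Python. The fingerprinting half uses SHA-256 from the standard library. The content-matching half uses overlapping five-word sequences ("shingles"), which is a technique chosen for this illustration; the article does not say how products compare stored content. All class and function names are hypothetical.

```python
import hashlib


def fingerprint(data):
    """Return the SHA-256 hex digest of a document's bytes."""
    return hashlib.sha256(data).hexdigest()


class FingerprintRegistry:
    """Inventory of fingerprints for documents tagged as sensitive."""

    def __init__(self):
        self._known = set()

    def register(self, data):
        self._known.add(fingerprint(data))

    def is_registered(self, data):
        return fingerprint(data) in self._known


def shingles(text, size=5):
    """Return the set of overlapping `size`-word sequences in `text`."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}


def overlap_ratio(sensitive_text, outbound_text, size=5):
    """Fraction of the sensitive document's shingles found in outbound text."""
    source = shingles(sensitive_text, size)
    if not source:
        return 0.0
    return len(source & shingles(outbound_text, size)) / len(source)


if __name__ == "__main__":
    registry = FingerprintRegistry()
    original = b"Merger terms: confidential draft"
    registry.register(original)
    print(registry.is_registered(original))         # exact copy
    print(registry.is_registered(original + b" "))  # one byte added
```

The hash comparison is cheap but misses any edited copy, while the content comparison can catch excerpts at the cost of storing and indexing the sensitive text.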

The Endpoint and the Network

There are two major environments monitored by DLP products: endpoints and networks. Many organizations start with one approach or the other and then eventually expand their DLP implementation to include both environments.

Host-based DLP solutions target the weakest link in the security chain: the notebooks, desktops and servers that serve as endpoints. These products can help identify those unknown data stores that contain sensitive information through the use of an agent-based approach. In these cases, a software agent residing on an organization’s endpoints takes an inventory of sensitive information and monitors for policy violations. Most host-based DLP products also provide users with the ability to digitally “shred” unwanted data and securely encrypt sensitive data that they do not wish to delete.
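The inventory step of such an agent can be approximated with a short Python sketch that walks a directory tree and counts pattern matches per file. It assumes plain-text files and a single hyphenated Social Security number pattern; a real agent would also parse office documents, archives and databases, and would run continuously. The function name is hypothetical.

```python
import re
from pathlib import Path

# Assumption: one illustrative pattern, the hyphenated SSN format.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def inventory_sensitive_files(root):
    """Map each file under `root` that contains SSN-like strings to a hit count."""
    findings = {}
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        try:
            # Undecodable bytes are dropped rather than treated as errors.
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it in this sketch
        hits = len(SSN_PATTERN.findall(text))
        if hits:
            findings[str(path)] = hits
    return findings


if __name__ == "__main__":
    for name, count in inventory_sensitive_files(".").items():
        print(f"{count:4d}  {name}")
```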

Network-based DLP products sit at the perimeter of an organization’s network and scan all outbound network traffic for potential policy violations. These systems don’t have the ability to build an inventory of sensitive information, but they do provide a last line of defense capable of stopping the flow of sensitive information before it leaves a network.

Adopting a data loss prevention strategy requires selecting appropriate data identification techniques and then building a monitoring environment that can detect when the identified data is used in violation of the organization’s security policies.

About the Author

Mike Chapple

Mike Chapple is an IT professional and assistant professor of computer applications at the University of Notre Dame. He is a frequent contributor to BizTech magazine, SearchSecurity and About.com as well as the author of over a dozen books including the CISSP Study Guide, Information Security Illuminated and SQL Server 2008 for Dummies.

