How to Develop a Sound Data Loss Prevention Strategy

Pattern matching and document tagging are often used in parallel to create the most effective approaches to data loss prevention.
This story appears in the Spring 2012 issue of BizTech Magazine.
No matter what your line of business, you no doubt have some data in your possession that’s critical to you, your customers or other stakeholders. Inadvertently releasing that information could lead to financial loss, reputational damage or even criminal sanctions. How, then, are you ensuring that you maintain control over your company’s sensitive data?

Data loss prevention (DLP) systems help keep tabs on an organization’s sensitive data by building an inventory and then maintaining control over the flow of information both inside and outside of the network. DLP adoption has grown rapidly over the past few years, and most organizations now either have a DLP system in place or are considering deploying one.

What Does DLP Technology Do?

There are three critical roles that DLP products play in the enterprise. First, they help build an inventory of the sensitive information in an environment. It’s common for organizations to have only vague ideas about where this data resides, if they even have a clear definition of what constitutes sensitive data.

Many organizations go to great lengths to protect their centralized stores of sensitive information, such as that stored in an ERP system or on enterprise databases, only to be shocked when they discover that users are copying that data into shadow systems stored on notebook computers. DLP can help identify those unsanctioned sensitive data repositories and either eradicate them or implement effective controls around them.

The second role that DLP plays is to monitor the flow of sensitive information throughout an organization. DLP products can tag sensitive information and then document how that data is transferred across networks and between systems. This can help identify business processes that work with sensitive information and implement appropriate security controls.

Finally, DLP allows organizations to proactively block any use of sensitive information that violates their security policies. When the DLP system identifies a flow of sensitive data, it can optionally terminate the network connection, redact the sensitive data or apply additional security controls, such as encryption. These actions take place in real time, preventing an unintended leak of sensitive data.
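
To make the blocking and redaction options concrete, here is a minimal Python sketch. It assumes a single, simplified rule (hyphen-formatted Social Security numbers) and a hypothetical enforce function; a commercial DLP product applies far richer policies, and the encryption option is not shown.

```python
import re

# Assumption: one simplified rule that flags hyphen-formatted SSNs.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")

def enforce(message, action="redact"):
    """Apply a policy action to an outbound message (hypothetical helper).

    Returns the message to send, or None if the transfer is blocked.
    """
    if not SSN_PATTERN.search(message):
        return message  # nothing sensitive found, allow unchanged
    if action == "block":
        return None  # stands in for terminating the connection
    if action == "redact":
        return SSN_PATTERN.sub("[REDACTED]", message)
    raise ValueError(f"unknown action: {action}")

print(enforce("Applicant SSN is 123-45-6789."))
```

With the "block" action, the same message would be dropped entirely rather than rewritten.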

Identifying Sensitive Data

There are two major techniques that DLP systems use to identify sensitive information: pattern matching and document tagging. These are commonly used in parallel to create the most effective approaches to DLP.

With pattern matching approaches, the DLP system uses regular expressions to identify information that might be sensitive. This technique is typically used to identify sensitive numbers that follow a regular pattern, such as Social Security numbers (using the format xxx-xx-xxxx) and credit card numbers (using the format xxxx-xxxx-xxxx-xxxx).
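
As a rough illustration of this technique, the Python sketch below searches text for the two formats just described. The pattern names and the find_candidates helper are assumptions made for this example and are not taken from any particular DLP product.

```python
import re

# Illustrative patterns for the two formats described above.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")            # xxx-xx-xxxx
CARD_PATTERN = re.compile(r"\b\d{4}-\d{4}-\d{4}-\d{4}\b")     # xxxx-xxxx-xxxx-xxxx

def find_candidates(text):
    """Return strings in text that look like formatted SSNs or card numbers."""
    return {
        "ssn": SSN_PATTERN.findall(text),
        "card": CARD_PATTERN.findall(text),
    }

sample = "Employee 123-45-6789 paid with 4111-1111-1111-1111 on 2012-03-01."
print(find_candidates(sample))
```

Every match here is only a candidate: the patterns describe a shape, not a meaning.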

Pattern matching is a good way to identify this type of information, but it is highly prone to false positives. For example, if an unformatted nine-digit number is on a system, a pure pattern-matching system has no way of telling whether it is a Social Security number, a dollar amount or a CUSIP number used to identify financial securities.

For this reason, many DLP products add contextual information to their pattern matching algorithms. In fact, this is one of the major ways that DLP manufacturers differentiate their products. Examples of ways that DLP can use context to improve the accuracy of pattern matching include:

  • Prioritizing formatted data over unformatted data (for example, the hyphens in 123-45-6789 make it much more likely to be a Social Security number than 123456789);
  • Looking at the header row in spreadsheets to determine whether it contains clues to the field’s nature;
  • Using field-specific knowledge to eliminate false positives. (For example, credit card numbers contain a checksum digit calculated using the openly available Luhn algorithm; sixteen-digit numbers that fail the Luhn check are not valid credit card numbers and may be ignored. Similarly, there are no valid Social Security numbers with “00” in the middle position.)
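
The checks in the list above can be sketched in a few lines of Python. The function names and the "high"/"low" confidence labels are hypothetical, chosen only for this example; the Luhn routine follows the standard published algorithm, and the Social Security check applies only the “00” rule mentioned above.

```python
import re

def luhn_valid(number):
    """Return True if the digits in number pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    if not digits:
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0

def ssn_confidence(candidate):
    """Rate a nine-digit candidate as 'high', 'low' or 'none' (hypothetical scale)."""
    match = re.fullmatch(r"(\d{3})-?(\d{2})-?(\d{4})", candidate)
    if match is None or match.group(2) == "00":
        return "none"  # wrong shape, or the invalid "00" middle group
    # Hyphenated candidates are more likely to be real SSNs.
    return "high" if "-" in candidate else "low"

print(luhn_valid("4111-1111-1111-1111"))
print(ssn_confidence("123-45-6789"), ssn_confidence("123456789"))
```

A candidate that fails luhn_valid can be dropped before it ever raises an alert, and a "low" rating could be combined with other clues, such as a spreadsheet header, before acting.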

Pattern matching might also be used to search for specific keywords in documents. For example, a company with a rigorous classification policy might configure its DLP to monitor for documents leaving the organization with the phrase “Company Confidential” in the header.

With document tagging, security administrators build an inventory of specific documents that contain sensitive information. The DLP system then has two possible approaches for detecting the attempted sharing of those documents outside of the organization. In the first approach, known as document fingerprinting, the system computes a cryptographic hash of each sensitive file and then compares those digital fingerprints with the fingerprints of all documents leaving the organization. In the second approach, the system stores the entire content of the sensitive files and then watches for data leaving the organization that matches that content.
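
Both approaches can be sketched briefly in Python. This is an illustration under stated assumptions, not a description of how any product works: SHA-256 is assumed as the hash, the FingerprintRegistry class is hypothetical, and the content comparison uses overlapping five-word runs as one simple stand-in for content matching.

```python
import hashlib

def fingerprint(data: bytes) -> str:
    """Compute a cryptographic hash of a file's bytes (SHA-256 assumed)."""
    return hashlib.sha256(data).hexdigest()

class FingerprintRegistry:
    """Hypothetical inventory of fingerprints for tagged documents."""

    def __init__(self):
        self._known = set()

    def register(self, data: bytes) -> None:
        self._known.add(fingerprint(data))

    def is_sensitive(self, data: bytes) -> bool:
        return fingerprint(data) in self._known

def shingles(text, size=5):
    """Return the set of overlapping runs of `size` consecutive words."""
    words = text.lower().split()
    return {" ".join(words[i:i + size]) for i in range(len(words) - size + 1)}

def shares_content(outbound, sensitive, size=5):
    """True if outbound text reuses any run of `size` words from the sensitive text."""
    return bool(shingles(outbound, size) & shingles(sensitive, size))

registry = FingerprintRegistry()
registry.register(b"Q3 merger plan")
print(registry.is_sensitive(b"Q3 merger plan"))
print(registry.is_sensitive(b"Q3 merger plan."))
```

The second call shows the trade-off between the approaches: changing a single character changes the hash, so a fingerprint catches only exact copies, while content matching can catch excerpts at the cost of storing and comparing the text itself.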

The Endpoint and the Network 

There are two major environments monitored by DLP products: endpoints and networks. Many organizations start with one approach or the other and then eventually expand their DLP implementation to include both environments.

Host-based DLP solutions target the weakest link in the security chain: the notebooks, desktops and servers that serve as endpoints. These products can help identify those unknown data stores that contain sensitive information through the use of an agent-based approach. In these cases, a software agent residing on an organization’s endpoints takes an inventory of sensitive information and monitors for policy violations. Most host-based DLP products also provide users with the ability to digitally “shred” unwanted data and securely encrypt sensitive data that they do not wish to delete.

Network-based DLP products sit at the perimeter of an organization’s network and scan all outbound network traffic for potential policy violations. These systems don’t have the ability to build an inventory of sensitive information, but they do provide a last line of defense capable of stopping the flow of sensitive information before it leaves a network.

Adopting a data loss prevention strategy for an organization requires selecting appropriate data identification strategies and then building a monitoring environment that can effectively watch for identified data that is being used in violation of an organization’s security policies.

About the Author

Mike Chapple

Mike Chapple is an IT professional and assistant professor of computer applications at the University of Notre Dame. He is a frequent contributor to BizTech Magazine, SearchSecurity and About.com, as well as the author of more than a dozen books, including the CISSP Study Guide, Information Security Illuminated and SQL Server 2008 for Dummies.
