
Big Data Needs Push Firm to Explore Storage Options

With millions of acres of timberland in its care, a unique investment management firm overhauls back-end systems to create a high-availability infrastructure.

Karen D. Schwartz


Karen Schwartz is a freelance technology writer based in the Washington, D.C., area.

The Campbell Group manages more than 3 million acres of timberland, though probably not in the way most people would think.

The 30-year-old Portland, Ore., company is a management and investment firm that trades in woodlands rather than finances. It manages the timber on land owned by its customers, tracking timber portfolios to ensure their risk-return payoff stays favorable.

All those acres, and the trees on them, translate into a major data storage challenge for the 300-employee company, which has offices in 30 locations throughout the United States. Currently, the company houses about 15 terabytes of geographic information system (GIS) data.

It maintains thousands upon thousands of discrete data-intensive geospatial files — managed using ESRI ArcGIS and stowed in Microsoft SQL Server and Oracle databases — that contain satellite and aerial images, survey measurements and more.

No slow growth here. The company’s data stores keep multiplying, says IT Manager Jeffrey Groff. Why? Because the Campbell Group archives all past data, even as it creates new data. For instance, if a team overseeing a land holding needs to build a new logging road, the data set for that parcel grows. When the company gets a new client, its records expand sharply. That’s why the infrastructure must be flexible enough to scale quickly.

“We get treated like a small business because of our number of employees, but we are an enterprise business in terms of our data requirements,” Groff says.

Campbell Group employees must be able to quickly and reliably access the data. But high availability can be tricky with GIS, Groff notes, because GIS images are relatively large and consume a lot of space.

“If they aren’t put in the right spot, they will serve up to people across the nation very slowly. We needed to make sure that bottlenecks were as small as possible.”

Those were the exact issues the company faced not so long ago, hindering its growth and agility. The storage setup at the time consisted of almost two dozen aging servers connected to an array of disks that had been added piecemeal as the data stores grew. Campbell Group housed most of the servers at its main data center in Portland, with a few located at a second facility in Diboll, Texas. The time and energy required to manage the haphazard and decentralized storage system cut into other IT responsibilities.

“Performance had degraded, we were running out of storage, and there were times when the system simply failed,” Groff says. “There were many nights and weekends where I had to come in and get everything back running before morning.”

35 terabytes
The increase in the Campbell Group’s storage needs (from 5TB to 40TB) over the three years it spent digitizing its archival data

Without the ability to service existing clients effectively and bring on new clients efficiently, Campbell’s executives knew the company couldn’t grow. Clearly, things had to change.

“We knew we needed to build a foundation that was the right path, and we needed to get away from the Band-Aid approach,” Groff says. “We needed to stop buying things cheaply to fill gaps and find something with good performance that could scale quickly and was really solid.”

Clearing a Path

After evaluating the existing data store and projecting growth for the next several years, Groff laid out a plan to overhaul the storage infrastructure at the Campbell Group. The new environment, rolled out in December 2010, standardized on EMC gear: a Celerra NS-120 unified storage array that can scale to 120 drives and a Clariion CX4 Series networked storage system.

“We were fortunate because my management was receptive and realistic about what we needed. It wasn’t a hard sell to the director of technology,” Groff says. “I just laid out what the company needed in terms of performance, storage and growth for the future. They knew it was the only way to address our growing pains.”

The NS-120 and Clariion CX4 environment can be connected to multiple storage networks via network-attached storage (NAS), Fibre Channel storage area networks (SANs) and Celerra Multi-Path File System (MPFS). Groff chose a Fibre Channel SAN because it enables both high availability and quick scalability. It also supports advanced storage functions such as data deduplication and virtual provisioning.

To ready the data center for the SAN implementation, the IT team consolidated 20 servers down to five and virtualized using Citrix XenServer. The servers connect to the SAN, which serves up logical units of storage to the virtual servers.

Standing Tall

The Campbell Group experienced immediate results after implementing the new SAN, Groff says. Flexibility and manageability improved dramatically, and performance and uptime were no longer issues.


“We’re back to one single source of management,” Groff says. “We can provision storage as needed for our virtual environment from one location, and our IT staff can go back to doing the other parts of their jobs.”

Soon after Groff installed the SAN unit in the main Portland data center, he worked with his CDW account manager, Kim Fiala, to create a disaster recovery site at one of CDW’s own data centers in Illinois. A second, mirrored EMC setup is installed at the disaster recovery center in Chicago, creating full redundancy.

“We are redundant in two different ways now,” Groff says. “In the main data center, if one piece of the storage system goes down, we can still maintain performance and uptime because the device has the ability to move itself to another storage processor if it encounters a problem. And if our entire main data center goes down, everything is immediately available via the SAN at the Chicago colocation site.”

Groff uses EMC MirrorView to accomplish full redundancy between the two SANs. MirrorView allows synchronous and asynchronous mirroring along with array-based remote replication to copy data to a second site.

The scalability of the new SAN system already has proved its mettle. Even though it’s barely a year old, Groff already is preparing to add more drives. “We’re confident that this unit will last us 10 years, and that’s exactly what the company needed to keep growing.”

Although the IT team is confident now in the infrastructure, there were definitely some lessons learned along the way, Groff says. First, before even researching the options, it’s critical to determine all the elements of the storage environment that need to be replaced and then to set a vision for the future based on data growth projections.

“You can’t buy any type of storage without getting a base of where you are storagewise and where you think you’ll be several years out,” Groff says. “With a purchase of this size, you want to be as spot-on as possible.” That information is necessary not only when determining how much storage capacity to buy but also when setting the project budget.
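As a rough illustration of the baseline-and-projection exercise Groff describes, the sketch below derives an average annual growth factor from two capacity measurements and projects it forward. The 5TB and 40TB figures come from this story. The function names, the compound-growth model and the idea that past growth continues at the same rate are assumptions made for illustration, not the Campbell Group's actual method.

```python
def annual_growth_factor(start_tb: float, end_tb: float, years: int) -> float:
    """Average yearly multiplier that turns start_tb into end_tb over `years`."""
    return (end_tb / start_tb) ** (1 / years)


def project_capacity(current_tb: float, factor: float, years: int) -> list[float]:
    """Projected capacity for year 0 through `years`, assuming constant growth."""
    return [current_tb * factor ** year for year in range(years + 1)]


if __name__ == "__main__":
    # Figures reported in the story: 5TB grew to 40TB over three years.
    factor = annual_growth_factor(5, 40, 3)
    print(f"Average annual growth factor: {factor:.2f}")

    # Hypothetical planning horizon of three more years at the same rate.
    for year, tb in enumerate(project_capacity(40, factor, 3)):
        print(f"Year {year}: about {tb:.0f}TB")
```

Growth that came from a one-time effort, such as digitizing an archive, may not continue at the same pace, so a projection like this is best treated as an upper-bound starting point for budgeting rather than a forecast.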

Don’t shortchange training either. Moving to a new storage infrastructure can be jolting for an IT staff used to doing things differently, Groff points out. “Training is well worth the time.”

Finally, when it comes to choosing the product, reliability and scalability should be top of mind, Groff says. No matter how much preplanning and analysis the IT team does, it can’t foresee every possible future need, he says. “Make sure to get something that can scale without a lot of hassle because changes happen that you can’t predict.”


Robbie McClaran