Improving The Performance Of Cloud Services With Clustered Storage

2014-02-13 by Jamin Andrews

Conetix, a web solutions provider, has recently rolled out its clustered storage platform, in which pools of storage devices are connected together into clusters to deliver the capacity, resiliency and availability required of the cloud.

The key criteria for any storage system are performance, scalability and reliability. While all of these can be achieved, doing so in a cost-effective manner without compromise has been difficult with traditional storage systems. These traditional systems can be very costly to upgrade, especially if you require both increased capacity and performance.

With the advent of clustered storage platforms these barriers are removed, opening the doors to high-performance, scalable systems at a cost-effective price. At Conetix, one of the great benefits of being at the frontier of technology is being able to roll out some pretty unique gear. We have always invested heavily in good technology, which in turn helps deliver fast and reliable systems.

To solve the storage challenge, we have rolled out a clustered storage platform, a design whereby pools of storage devices are connected together into clusters to deliver the capacity, resiliency and availability required of the cloud. Earlier on, we designed our storage strategy around a combination of direct attached storage and storage area networks (SAN), but this proved to be expensive and locked us in to a single solution.

The technology we have based this on is Parallels Cloud Storage. This allows us to use off-the-shelf servers and distribute the storage blocks across all systems for a combined compute / storage cluster platform.

By migrating to this clustered storage, we can achieve greater speeds and greater redundancy at a lower cost than expanding our existing SAN. This results in greater protection against disk failure while maintaining high performance for our customers.

System Design

As per Parallels’ recommendations, we’re using a three-replica design. This means that every block of data is stored on at least three separate storage nodes. As a result, multiple nodes in the cluster can drop out and the storage will still remain online. This is separate to our backups, of course, which add another layer of data integrity to ensure data isn’t lost.
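To make the replica idea concrete, here is a minimal sketch of why three copies of each block tolerate the loss of any two nodes. It is not how Parallels Cloud Storage actually places data: the round-robin placement, the node names and the helper functions are all assumptions made up for illustration.

```python
from itertools import combinations

REPLICAS = 3  # the three-replica design described above


def place_block(block_id, nodes, replicas=REPLICAS):
    """Pick `replicas` distinct nodes for a block (hypothetical round-robin placement)."""
    if replicas > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    start = block_id % len(nodes)
    return {nodes[(start + i) % len(nodes)] for i in range(replicas)}


def block_available(placement, failed):
    """A block stays readable while at least one of its replicas is on a live node."""
    return bool(placement - failed)


def survives_any_failure(nodes, num_blocks, failures):
    """Check every combination of `failures` failed nodes against every block."""
    placements = [place_block(b, nodes) for b in range(num_blocks)]
    return all(
        all(block_available(p, set(failed)) for p in placements)
        for failed in combinations(nodes, failures)
    )


# Hypothetical five-node cluster; the names are placeholders.
nodes = ["node-a", "node-b", "node-c", "node-d", "node-e"]
print(survives_any_failure(nodes, 100, 2))  # True: any two nodes can be lost
print(survives_any_failure(nodes, 100, 3))  # False: three failures can hit all copies of a block
```

The general rule the sketch shows is that a block with three replicas on distinct nodes only becomes unavailable when all three of those nodes are down at once.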

The other important part of the rollout is ensuring the network infrastructure is up to the task. As anyone who’s worked with large networks will know, there’s a lot of complexity that can quickly bring things undone. Adding redundancy and “stacking” the switches then adds yet another layer on top of this. As with the servers, we have stuck with Dell kit here to ensure compatibility and because we’re familiar with the Dell configuration and performance.

Testing to Ensure Stability and Performance

Few customers are willing to take risks when it comes to new technologies. Cloud computing is no exception.

Before rolling out any new platform, we run through a number of tests and benchmarks to ensure the stability and performance remain as expected. Testing a server failure in a test environment over and over using different scenarios has given us great confidence in the system. It has also meant that we have been able to document our disaster recovery procedures so that in the event of a real failure, we have proven methods of recovery.

Initial benchmarks look very healthy. We’ve run a few tests and so far we’re able to peak at around 2.5GB/s write, with latency remaining acceptable up to around 2GB/s. This is with a 64-thread read/write workload, which is far more than we expect to see in typical scenarios.
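For readers unfamiliar with multi-threaded storage benchmarks, the following is a rough sketch of what a 64-thread write test measures. It is not the tool or method used for the figures above, which the article does not name; the function names, file names and sizes are assumptions for illustration, and a dedicated benchmarking tool would be the better choice for real measurements.

```python
import os
import tempfile
import time
from concurrent.futures import ThreadPoolExecutor


def write_worker(directory, worker_id, total_bytes, chunk_size):
    """Write `total_bytes` to a per-worker file and return the bytes written."""
    chunk = b"\0" * chunk_size
    written = 0
    path = os.path.join(directory, f"worker-{worker_id}.dat")
    with open(path, "wb") as f:
        while written < total_bytes:
            n = min(chunk_size, total_bytes - written)
            f.write(chunk[:n])
            written += n
        f.flush()
        os.fsync(f.fileno())  # push data to the storage layer before stopping the clock
    return written


def measure_write_throughput(directory, threads=64, bytes_per_thread=1 << 18, chunk_size=1 << 16):
    """Run `threads` concurrent writers and return (total bytes, bytes per second)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=threads) as pool:
        futures = [
            pool.submit(write_worker, directory, i, bytes_per_thread, chunk_size)
            for i in range(threads)
        ]
        total = sum(f.result() for f in futures)
    elapsed = max(time.perf_counter() - start, 1e-9)  # guard against a zero interval
    return total, total / elapsed


if __name__ == "__main__":
    # Small sizes so the sketch runs quickly; a real test would write far more data.
    with tempfile.TemporaryDirectory() as scratch:
        total, rate = measure_write_throughput(scratch)
        print(f"wrote {total} bytes at {rate / 1e6:.1f} MB/s")
```

Aggregate throughput is simply the total bytes written by all threads divided by the wall-clock time, which is why adding concurrent writers can raise the headline figure until the storage or network saturates.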

Even when deliberately degrading the cluster by switching a node off, we’re not seeing any significant change in performance. This is exactly what we wanted to see!

Monitoring System Performance

In the age of the cloud, where a business can go down permanently very quickly or have great difficulty recovering from a disaster, customers are ultra-paranoid about their technology investments and the providers delivering those services. At Conetix, we take these concerns very seriously.

We keep a very close eye on how the system performs via our internal monitoring systems. We use a combination of commercial software (such as PRTG) and our own custom-developed system. This gives us not only fault-tolerant monitoring but also differing data to help track down the cause of any issue.

Rest assured, with three replicas of your data on our cluster, combined with a 4-hour warranty response from Dell, we can ensure the cluster does not go offline. Our alerting has multiple escalation points and notifies multiple staff members. Monitoring has always been a strong focus at Conetix, and we always strive to ensure any faults or performance variances are quickly dealt with.

“No Compromise” Solution for Our Customers

We’re really excited by this system and the opportunities that it offers our clients. Combining high-performance Dell systems with high levels of redundancy is a “no compromise” solution that has previously been very costly to implement.

Progressively over 2014, we’ll be rolling both new and existing nodes into our cluster to increase performance and reliability even further. We’re also testing Solid State Drive (SSD) acceleration, which will allow us to reach beyond 100,000 IOPS.

Author

Jamin Andrews

Conetix

Having worked in the IT industry for almost 20 years, I have worked with many of Australia's leading IT professionals who have inspired and driven me to excel in this industry. I founded Conetix Premier Web Solutions in 1998 and the company has recorded substantial growth year on year. At Conetix we treat your business as our own and strive to consistently provide you with the excellent service and support you deserve. Our priority is to maintain all the backend techy stuff so you can focus on what’s important – your business and customers.
