Jul 20, 2009 Legacy

(gs) Updates: Storage Transition COMPLETE and More!

Gen 2 Upgrade: 95 Percent Complete!

It’s time for some more news about the GRID. In this update, we’ll cover:

  • Gen 2 Upgrade: 95% complete
  • Gen 1 storage is in the dumpster!
  • More GRID news
  • Anti-Spam Update: better stopgaps; RBL reputations continue to improve
  • Recent System Incidents

Gen 2 Upgrade Progress

We’ve reached roughly 95% completion of the full Gen 2 transition for Clusters 1 and 2! As a reminder, this transition is happening in two phases: first storage, then the full-system architecture. We are now very near the end of the entire upgrade.

Our engineers’ priority now is removing single points of failure by bringing our Gen 2 OS Redundancy online, followed immediately by an upgrade of the backup architecture that will provide quicker recovery and failover, matching Clusters 3, 4 and 5. Full OS Redundancy should be finished within a week, with the backup architecture upgrade to follow.

Gen 1 Storage Is in the Dumpster!

The transition to Gen 2 storage on Clusters 1 & 2 is now finished! We have successfully migrated all customer sites to our Gen 2 storage systems powered by Sun, a major milestone for (gs) Grid-Server. This means the past incidents caused by Gen 1 storage will not recur. More importantly, we can now begin decommissioning Gen 1 storage, unifying the storage platform across the entire GRID.

More GRID News!

Performance enhancements

Over the past few months we’ve added a substantial amount of storage performance to our clusters, and customers are clearly noticing the increase in speed and reliability. Thank you for your feedback. We never stop making refinements to the GRID.

Better Storage Segment quotas

A recent System Incident reminded us that no system is perfect and that the unexpected can create difficult situations. As we outlined in our Incident Review of #867, we’re changing the formula for allocating sites onto our Gen 2 Storage segments. We are confident these changes will give us a better safety net, so we catch issues before you notice them.

Sun Solaris 10 U7

The latest version of Sun Solaris 10 has been deployed to one of our forward-facing (gs) Storage segments (Cluster 2, Segment 07) and to several of our backup segments. This OS release includes several enhancements from which we are already seeing benefits.

MySQL SmartPool Issues

System monitoring and forum discussions have shown that some of our customers have been experiencing database-related problems. We have started working on a solution that we think our customers are going to enjoy. In the meantime, if you have any issues with your databases, please contact our Support Department; we can provide you with a temporary solution.

Anti-Spam Update

We have been busy adding further stopgaps for better detection of outgoing spam from the GRID.

If we suspect an email you’re sending is spam, you’ll get a friendly notification pointing you to help resources. The spammers get shut out, and you get a helping hand. Our total outgoing mail volume has dropped by 20%, and that drop accounts for a large share of the spam that was previously leaving our network and getting the GRID into trouble. Thanks to this significant reduction, our reputation with RBLs (real-time blackhole lists) has been improving, and we expect it to keep getting better.

This new system still needs further refinement, but initial observations show it to be very effective. We will keep refining our systems until we are satisfied that you are satisfied!

Recent System Incidents

We recently had a couple of System Incidents that we want you to know about.

  1. Incident #867: Email access and web latency on Cluster 02
  2. Incident #864: Brief Availability Issues on Cluster 05

Both of these incidents occurred on Clusters running Gen 2 technology. It is important to note that neither root cause was directly related to the platform itself. Here’s what went wrong:

  • Incident #867 was caused by a rare form of file-system fragmentation we had not previously encountered, which resulted from the rapid movement of data off of the Gen 1 storage. For further details, please see the incident review linked above.
  • Incident #864 was caused by a faulty network interface, which has since been fixed. We have also built corrective measures into our design process so that this segment has additional redundancy.

Thank you for reading. Remember, we’re always here to listen. Leave us a comment or join our (mt) User Forums to participate in our community discussions.
