RAID System performance surprises
When running MySQL in 24/7 environments we typically hope for uniform component performance, or at least want to be able to control it. Usually this is indeed the case: a CPU, for example, performs the same day and night (unless system management software decides to lower the CPU frequency due to overheating).
This is almost the case with hard drives too. Performance can differ depending on where data is stored on the disk, the number of remapped sectors and so on. There is also database and file system fragmentation; however, these tend to accumulate in a predictable fashion.
If you have a RAID controller this may well not be the case. To protect your data, a RAID controller may run a number of background algorithms which can affect performance dramatically.
For example, with the PERC5 (LSI MegaRAID), a pretty typical Dell controller installed in the PowerEdge 1950, 2950 etc., you should be aware of a few things:
Battery Learning and Charging: The built-in battery has to go through a learning cycle every 3 months or so, and this cycle takes about 7 hours according to the docs. During this time the battery-backed cache is disabled and the system operates with a write-through cache, possibly slowing down write performance several times.
Patrol Read: This feature is meant to discover bad sectors before it is too late, and it does so by periodically running read checks on the disks. When it wakes up it will take some IO resources (30% by default), which will affect your performance to some degree.
Consistency Checks: This is another thing which I've seen initiated by the controller (though I'm not sure on this one). It pretty much checks that the disks are in sync, and it can also slow down performance dramatically.
So what can you do about these?
First, none of this should come as a surprise to you, discovered only when your server stops performing just as you planned an investor showcase or other important event. Learn what "cron jobs" your RAID card has and see how they can be controlled; maybe you can schedule them during the least busy intervals or something similar.
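To make "least busy intervals" concrete, here is a minimal Python sketch that picks the quietest contiguous window from a per-hour load profile. The function name `quietest_window` and the sample numbers are made up for illustration; in practice the profile would come from your own monitoring, and how you then tell the controller about the chosen time depends on its management tool.

```python
from typing import Sequence


def quietest_window(hourly_load: Sequence[float], duration_hours: int) -> int:
    """Return the start index of the contiguous window with the lowest total load.

    The profile is treated as circular, so a window may wrap past midnight.
    """
    n = len(hourly_load)
    if not 0 < duration_hours <= n:
        raise ValueError("duration_hours must be between 1 and len(hourly_load)")
    best_start, best_total = 0, float("inf")
    for start in range(n):
        total = sum(hourly_load[(start + i) % n] for i in range(duration_hours))
        if total < best_total:
            best_start, best_total = start, total
    return best_start


# Hypothetical average writes/sec for each hour of the day (index 0 = midnight).
load = [120, 80, 60, 50, 55, 90, 200, 400, 650, 700, 720, 710,
        690, 700, 705, 680, 640, 600, 520, 430, 350, 280, 200, 150]

# 7 hours matches the documented length of the battery learning cycle.
start = quietest_window(load, 7)
print(f"Quietest 7-hour window starts at {start:02d}:00")
```

A brute-force scan is plenty for 24 data points; the only subtlety is letting the window wrap around midnight, since the quiet period on most servers does.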
You should also be ready for degraded and rebuild RAID modes, when one of the disks fails and you replace it with another one which needs to be rebuilt. This means you should already leave some slack in the system. That slack will often be enough for a consistency check and patrol read, but not for the battery-backed cache being temporarily disabled.
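The slack argument can be put into rough numbers. The sketch below is a deliberately simplistic model, not a measurement: it assumes peak load is expressed as a fraction of what the array can handle, takes the 30% patrol read default from above, and uses a hypothetical 3x slowdown to stand in for "several times" slower writes in write-through mode. Both function names are invented for this example.

```python
def fits_with_background_io(peak_util: float, background_share: float) -> bool:
    """True if peak load still fits when a background task takes a share of IO."""
    return peak_util <= 1.0 - background_share


def fits_with_write_slowdown(peak_write_util: float, slowdown: float) -> bool:
    """True if peak write load still fits when writes become `slowdown` times slower."""
    return peak_write_util * slowdown <= 1.0


peak = 0.60  # hypothetical: the array is 60% busy at peak

# Patrol read at its default 30% share still leaves room.
print("patrol read OK:", fits_with_background_io(peak, 0.30))

# A write-bound workload with the cache in write-through mode does not fit
# under the assumed 3x slowdown.
print("write-through OK:", fits_with_write_slowdown(peak, 3.0))
```

With these assumed numbers the first check passes and the second fails, which is the same asymmetry described above: ordinary slack absorbs a background read task but not a multi-fold drop in write throughput.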
Another thing you can do, of course, is switch to another server and take this one down for maintenance if the learning process can't be scheduled for a time when it is not intrusive. To do this properly, however, you need to know when it is about to happen.
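As a rough illustration of "knowing when it is about to happen", the following Python sketch projects the next learning cycle from the last one and checks it against a planned event. It assumes a fixed 90-day interval (standing in for "every 3 months or so") and the 7-hour duration mentioned above; the dates and function names are hypothetical. The controller's own management tool is the authoritative source for the real schedule, so treat this only as a calendar sanity check.

```python
from datetime import datetime, timedelta

LEARN_INTERVAL = timedelta(days=90)  # assumption: "every 3 months or so"
LEARN_DURATION = timedelta(hours=7)  # "about 7 hours according to the docs"


def next_learn_window(last_start: datetime, now: datetime) -> tuple[datetime, datetime]:
    """Project the first learning cycle that has not already finished by `now`."""
    start = last_start
    while start + LEARN_DURATION <= now:
        start += LEARN_INTERVAL
    return start, start + LEARN_DURATION


def overlaps(window: tuple[datetime, datetime],
             event_start: datetime, event_end: datetime) -> bool:
    """True if the learning window and the event share any time."""
    w_start, w_end = window
    return w_start < event_end and event_start < w_end


# Hypothetical dates for illustration only.
last_learn = datetime(2024, 1, 10, 2, 0)
today = datetime(2024, 3, 1, 12, 0)
window = next_learn_window(last_learn, today)
print("Next learning cycle:", window[0], "to", window[1])

demo_start = datetime(2024, 4, 9, 8, 0)
demo_end = datetime(2024, 4, 9, 12, 0)
if overlaps(window, demo_start, demo_end):
    print("Warning: learning cycle overlaps the planned event")
```

If the projected window collides with something important, that is the cue to either reschedule the cycle or plan the switch to another server ahead of time.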