If you have decent NetApp monitoring, you should of course be monitoring the thing that matters, and for most applications that is the latency of requests on a volume (or LUN).

That is easy enough to get: with LogicMonitor it's graphed and alerted on automatically, for every volume. But when there is an issue, the focus changes to why there is latency. Usually the cause is that the disks in the aggregate are IO bound.

Assuming there is no need for a reallocate (the disks are evenly loaded; I'll write a separate article about how to determine that), how can you tell what level of disk busyness is acceptable? Visualizing the relationship between disk busyness and latency is what this post is about.

Disks may be rated to 250 IOPS, but that doesn't mean latency won't be affected well before that level. As the disks approach 100% busy, the latency of each request increases. This makes sense: the only way to keep a disk busy is to have work queued up waiting for it, and queued requests take longer to be answered. A plot of how latency increases with workload in your own environment is a good thing to have.

A quick example of how to follow this through:

Check whether latency is really an issue. (People often blame storage performance when it is not the cause.) In this case, the volume is clearly experiencing periods of slow performance.

The next step in real troubleshooting here would be to verify whether the disks are balanced or whether the aggregate needs a reallocate.
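To make the busy-versus-latency relationship concrete, here is a minimal Python sketch. It is not how LogicMonitor or NetApp compute anything. The first function uses the textbook M/M/1 queueing approximation (mean response time = service time / (1 - utilization)), which is only a rough model of a real disk. Treating a 250 IOPS rating as a 4 ms service time is also an assumption. The second function groups observed samples into busy-percentage buckets, which is one simple way to build the kind of plot described above. The function names and the sample values are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from statistics import mean


def mm1_latency_ms(service_time_ms: float, busy_fraction: float) -> float:
    """Mean response time under a simple M/M/1 queue: S / (1 - utilization).

    This is a textbook approximation, not a model of any specific disk.
    """
    if not 0.0 <= busy_fraction < 1.0:
        raise ValueError("busy_fraction must be in [0, 1)")
    return service_time_ms / (1.0 - busy_fraction)


def latency_by_busy_bucket(samples, bucket_width=10):
    """Group (disk_busy_percent, latency_ms) samples into busy-% buckets.

    Returns {bucket_start: mean latency in ms}, ordered by bucket_start.
    """
    buckets = defaultdict(list)
    for busy_pct, latency_ms in samples:
        start = int(busy_pct // bucket_width) * bucket_width
        # Fold a reading of exactly 100% into the top bucket.
        start = min(start, 100 - bucket_width)
        buckets[start].append(latency_ms)
    return {start: mean(values) for start, values in sorted(buckets.items())}


if __name__ == "__main__":
    # Assumption: a disk rated at 250 IOPS services one request in about 4 ms.
    service_time_ms = 1000 / 250
    for busy in (0.25, 0.5, 0.75, 0.9):
        latency = mm1_latency_ms(service_time_ms, busy)
        print(f"{busy:.0%} busy -> about {latency:.1f} ms per request")

    # Hypothetical samples: (disk busy %, volume latency in ms).
    samples = [(12, 4.1), (18, 4.6), (55, 9.0), (58, 8.0), (91, 38.0), (100, 60.0)]
    for start, avg in latency_by_busy_bucket(samples).items():
        print(f"{start}-{start + 10}% busy: mean latency {avg:.1f} ms")
```

Under this model, latency has already doubled at 50% busy and is ten times the idle service time at 90% busy, which is why a rated IOPS figure says little about acceptable latency. In practice you would feed `latency_by_busy_bucket` with samples exported from your own monitoring rather than the made-up values shown here.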