<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>vmweaver.com &#187; Virtualization</title>
	<atom:link href="http://vmweaver.com/index.php/tag/virtualization/feed/" rel="self" type="application/rss+xml" />
	<link>http://vmweaver.com</link>
	<description>Mindless ramblings of a geek...</description>
	<lastBuildDate>Thu, 06 Oct 2011 20:42:50 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.1.2</generator>
		<item>
		<title>Zen and the Mystical Art of Candidacy Analysis</title>
		<link>http://vmweaver.com/index.php/2009/03/zen-and-the-mystical-art-of-candidacy-analysis/</link>
		<comments>http://vmweaver.com/index.php/2009/03/zen-and-the-mystical-art-of-candidacy-analysis/#comments</comments>
		<pubDate>Thu, 12 Mar 2009 03:54:34 +0000</pubDate>
		<dc:creator>Mark A. Weaver</dc:creator>
				<category><![CDATA[Technology]]></category>
		<category><![CDATA[Virtualization]]></category>
		<category><![CDATA[candidacy analysis]]></category>

		<guid isPermaLink="false">http://vmweaver.com/?p=116</guid>
		<description><![CDATA[Okay so maybe there is no Zen, nor is it really &#8220;mystical&#8221;.  One of the MANY challenges we faced while trying to implement a brand-spanking-new virtualization infrastructure was to put together some type of guidance for our server admin teams to help them determine if existing systems would &#8220;live&#8221; happily in our new environment. After reading recommendations from [...]]]></description>
			<content:encoded><![CDATA[<p>Okay so maybe there is no Zen, nor is it really &#8220;mystical&#8221;. </p>
<p>One of the MANY challenges we faced while trying to implement a brand-spanking-new virtualization infrastructure was to put together some type of guidance for our server admin teams to help them determine if existing systems would &#8220;live&#8221; happily in our new environment.</p>
<p>After reading recommendations from VMware, digging through Microsoft Management Packs for Operations Manager, etc., it didn&#8217;t seem that any of those really fit our needs nor did they make much sense.</p>
<p>So we happen to use Operations Manager to collect performance data on our Windows-based servers, but there is a TON of data.</p>
<p>We also had to make sure that the tools we were using (Operations Manager) didn&#8217;t aggregate our data.  We NEEDED to have granular data for our entire 3 month look-back period.  We just didn&#8217;t like the idea of averaging averages, so we have data points every 5 or 15 minutes (depending on the counter) for 90 days.  One of the early reports we developed let admins specify the peak usage times so that the performance analysis could be tailored for specific systems.</p>
<p>Now we have all of this data but how do we look at it?  How do we know what&#8217;s important?  What recommendations from VMware and MS do we use and which ones do we throw away?</p>
<p>Several of the deficiencies we saw with most tools are:</p>
<ol>
<li>They average entire periods of time (a day, a week, a month) with no regard for peak usage times</li>
<li>They use aggregated data (kind of like above)</li>
<li>They assume that &#8220;CPU %&#8221; means the same thing for all systems</li>
</ol>
<p>So now we have to look only at peak times for specific servers.  Wow..that can be a VERY daunting task unless you make SOME assumptions.  That&#8217;s exactly what we did.</p>
<p>We know that not ALL systems will have the same usage patterns, but generally our systems will be the busiest during normal business hours at our corporate data center.  So, that would be Monday &#8211; Friday, 8am-5pm CST.  Well, we also have people using our systems on the east and west coasts, so if we move our &#8220;peak&#8221; time to 7am-7pm CST, we should cover most of the usage we wish to capture.</p>
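<p>For what it is worth, here is a rough sketch of how that peak window could be applied to a pile of samples. It assumes each sample is a (timestamp, value) pair whose timestamp is already in local (CST) time. The names <code>is_peak</code> and <code>peak_samples</code> are made up for illustration and are not anything Operations Manager provides.</p>

```python
from datetime import datetime

# Assumed peak window from the post: Monday-Friday, 7am-7pm local (CST) time.
PEAK_WEEKDAYS = range(0, 5)   # Monday=0 ... Friday=4
PEAK_HOURS = range(7, 19)     # 07:00 up to, but not including, 19:00


def is_peak(timestamp):
    """Return True when a sample falls inside the assumed peak window."""
    return timestamp.weekday() in PEAK_WEEKDAYS and timestamp.hour in PEAK_HOURS


def peak_samples(samples):
    """Keep only the (timestamp, value) pairs recorded during peak hours."""
    return [(ts, value) for ts, value in samples if is_peak(ts)]


samples = [
    (datetime(2009, 3, 9, 6, 45), 12.0),    # Monday, before the window
    (datetime(2009, 3, 9, 10, 0), 74.0),    # Monday, inside the window
    (datetime(2009, 3, 13, 18, 55), 71.0),  # Friday, inside the window
    (datetime(2009, 3, 14, 10, 0), 9.0),    # Saturday, outside the window
]
print([value for _, value in peak_samples(samples)])  # [74.0, 71.0]
```

<p>A per-system window (the report that let admins pick their own peak times) would just swap in different values for the two ranges.</p>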
<p>You may be asking why not just take all data points and be done with it&#8230; I know *WE* did.</p>
<p>It would certainly have been easier, but far less accurate.  To demonstrate this, think about this:</p>
<ul>
<li>A system has a CPU utilization of 75% all the time during peak usage times</li>
<li>This same system has a CPU utilization of 10% during off-peak times</li>
<li>This system has a CPU utilization of 10% during the ENTIRE weekend</li>
</ul>
<p>Okay, so if we take the mean average (the normal practice) we get something in the range of&#8230;well.. I am not gonna do the math, but it is WAY less than 75%.  We THEN thought, well&#8230; why not just take the highest or maximum measurement for the entire period and plan for that?  That doesn&#8217;t work out so well either if you have systems that don&#8217;t do ANYTHING until a virus scan kicks off and pegs the CPU for 30 minutes.  That greatly throws our performance footprint off.  So now we have to find a happy sweet spot for evaluating performance.</p>
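<p>For anyone who does want the math, here is a quick back-of-the-envelope sketch. It assumes the 7am-7pm, Monday-Friday peak window from earlier (60 peak hours out of a 168-hour week) and a flat 75% / 10% split. The function name is just for illustration.</p>

```python
HOURS_PER_WEEK = 7 * 24        # 168 hours
PEAK_HOURS_PER_WEEK = 5 * 12   # Monday-Friday, 7am-7pm


def weekly_mean_cpu(peak_pct, off_peak_pct, peak_hours=PEAK_HOURS_PER_WEEK):
    """Time-weighted mean CPU percent over one full week."""
    off_peak_hours = HOURS_PER_WEEK - peak_hours
    total = peak_hours * peak_pct + off_peak_hours * off_peak_pct
    return total / HOURS_PER_WEEK


mean = weekly_mean_cpu(75.0, 10.0)
print(f"Weekly mean: {mean:.1f}%")  # Weekly mean: 33.2%
```

<p>So a box that sits at 75% all through the business day looks like a 33% box if you average the whole week.</p>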
<p>After mulling through all of our options and thinking about what types of data were REALLY helpful in this process, we decided to do a &#8220;Top 20% Average&#8221;.  I know it sounds a little strange, but it is pretty straightforward. How did we come up with this number?  Well, we had to pick a number, threw some others around, and this one seemed to work, so we stuck with it.  The calculation is pretty simple:</p>
<ol>
<li>Take all of your data points</li>
<li>Sort them High to Low</li>
<li>Take the top 20% of your values and average them</li>
</ol>
<p>This gives us a nice look at what our systems do when they are operating at or near their highest loads.  The benefit of having a Top 20% Average and its normal average is that the greater the difference between the two, the &#8220;more spiky&#8221; the performance is.  As the numbers get closer and closer we can see that our systems are more consistently busy during our peak usage time.</p>
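<p>Here is a minimal sketch of the Top 20% Average and the &#8220;spikiness&#8221; gap described above. It is not the report query we actually ran, just the three steps written out. Rounding the cut-off up (so a short series still keeps at least one point) is an assumption, and the function names and sample values are made up.</p>

```python
def top_percent_average(values, percent=20):
    """Sort high to low, keep the top `percent` of the points, average them."""
    if not values:
        raise ValueError("need at least one data point")
    ordered = sorted(values, reverse=True)
    # Integer ceiling of len * percent / 100, with a floor of one point.
    # Rounding up is an assumption; the post does not say how to round.
    count = max(1, (len(ordered) * percent + 99) // 100)
    top = ordered[:count]
    return sum(top) / len(top)


def plain_average(values):
    """Ordinary mean of all the points."""
    return sum(values) / len(values)


cpu = [5, 5, 6, 7, 8, 10, 12, 60, 90, 95]   # made-up peak-time CPU samples
t20 = top_percent_average(cpu)               # mean of 95 and 90
avg = plain_average(cpu)
print(f"T20 Average: {t20:.1f}  Average: {avg:.1f}  gap: {t20 - avg:.1f}")
```

<p>A big gap between the two numbers means a spiky system. A small gap means it is steadily busy during the peak window.</p>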
<p>The two most important things we need to be looking at are CPU and Memory utilization since those are often the most limiting resources in our environment.  Of less importance to us were Disk IO and Network IO because we were really looking at &#8220;low hanging fruit&#8221; types of systems and NORMALLY if a system is disk or network heavy, CPU utilization will also shoot up.</p>
<p>So now we have this notion of Top 20% Average (T20 Average) and Average, but what are we actually analyzing?</p>
<h2><span style="text-decoration: underline;">Not So Normal CPU Usage</span></h2>
<p>Well, we can&#8217;t just look at CPU Utilization Percent!  Why, you may ask? Well CLEARLY the following are not equivalent:</p>
<ul>
<li>1 CPU &#8211; 700MHz system running 75% utilization</li>
<li>4 CPU &#8211; 2.2 GHz system running 20% utilization</li>
</ul>
<p>Many tools used to evaluate this type of data look at Total CPU Percent Utilization.  This would mean our 1 CPU system has a larger processor footprint than our 4 CPU system.  This doesn&#8217;t really make sense.</p>
<p>One of the interesting strategies VMware presents in some of their documentation is this idea of a &#8220;normalized&#8221; CPU.  Basically, this means calculating how many megahertz a system actually uses instead of using a percentage (which is relative to how many horses are under the hood).  While this isn&#8217;t entirely accurate, it does help us create high-water marks for a normalized calculation.</p>
<p>In the above example, our 1 CPU system would have a normalized CPU usage of:</p>
<ul>
<li>1 CPU x 700MHz  x .75 = <strong>525MHz</strong></li>
</ul>
<p>Our 4 CPU system would normalize to:</p>
<ul>
<li>4 CPU x 2200MHz x .20 = <strong>1760MHz</strong></li>
</ul>
<p>It is MUCH easier to compare the 2 of them now and to see which one will probably have a larger processor footprint.  I do realize that it doesn&#8217;t really take into account performance of multi-threaded applications and such, but it does a pretty good job.</p>
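<p>The same arithmetic as a tiny sketch, in case it helps. The function name and parameters are only illustrative, and utilization is passed as a percent rather than a fraction.</p>

```python
def normalized_cpu_mhz(cpu_count, clock_mhz, utilization_pct):
    """CPUs x clock speed x utilization, expressed in MHz."""
    return cpu_count * clock_mhz * utilization_pct / 100


print(normalized_cpu_mhz(1, 700, 75))    # 525.0
print(normalized_cpu_mhz(4, 2200, 20))   # 1760.0
```

<p>Feed it the T20 Average utilization instead of a single reading and you get a normalized high-water mark for the system.</p>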
<p>So, how do we get all that info to calculate it? Well, we also happen to have SMS or SCCM available to collect configuration information about our systems.  This enabled us to create some custom database views that contain system information regarding CPU speeds, number of CPUs, Hyper-Threading configuration, etc.  We needed to have info on Hyper-Threading enabled hardware because Windows actually reports those &#8220;HT&#8221; processors as physical processors.  Unfortunately an HT processor doesn&#8217;t really have the same performance as a TRUE physical processor.</p>
<p>To get around this, we figure an HT processor is worth about half of a physical processor.  So if our 4 CPU system above is really a 2 CPU system with HT, it would look like this:</p>
<ul>
<li>2 CPU x 2200MHz x .20 = <strong>880MHz</strong></li>
<li>2 CPU x .5 (because they are HT)  x 2200MHz x .20 = <strong>440MHz</strong></li>
<li>Total Normalized CPU usage is <strong>1320MHz</strong></li>
</ul>
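<p>Sketched out, the HT adjustment above is just a weighting on the logical processors. The 0.5 weight is our own rule of thumb from this post, not a number from VMware or Microsoft, and the function name is hypothetical.</p>

```python
HT_WEIGHT = 0.5  # rule of thumb from the post: an HT processor counts as half


def normalized_cpu_mhz_ht(physical_cpus, ht_cpus, clock_mhz, utilization_pct):
    """Normalized MHz where each Hyper-Threading processor counts as half a CPU."""
    effective_cpus = physical_cpus + HT_WEIGHT * ht_cpus
    return effective_cpus * clock_mhz * utilization_pct / 100


# 2 physical CPUs plus 2 HT processors, 2200MHz, 20% utilization
print(normalized_cpu_mhz_ht(2, 2, 2200, 20))  # 1320.0
```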
<p>I hope I did an okay job of explaining it.  If not, please let me know.</p>
<h2><span style="text-decoration: underline;">Memory and Commitment</span></h2>
<p>Memory is VERY easy to evaluate compared to CPU numbers as it is already normalized (kinda).  We will use the same methodologies for memory as we did for CPU with respect to Average Values and peak times.  When looking at memory counters, we are interested in finding out how much memory our systems are using, not how much is free, or the percent free or anything like that.</p>
<p>As Microsoft calls it, we need the &#8220;<strong>Committed Bytes in Use</strong>&#8221; counter.   I don&#8217;t think Operations Manager collects this info by default, so you may need to start collecting it to do this type of analysis.</p>
<p>Other than using the right counter it is pretty much the same as CPU.</p>
<h2><span style="text-decoration: underline;">Disk and Network Stuff</span></h2>
<p>While we DO care about network and disk throughput, they typically will be weighted much less than CPU and memory utilization when looking at the overall analysis.  If you are looking at virtualizing some beefy SQL or Exchange systems, then these will become MUCH more important, but probably still behind memory and CPU.</p>
<p>We use the same Average calculations as we do for CPU and Memory, but we look at the disk counter &#8220;Disk Bytes per Second&#8221; for this analysis and &#8220;Total Bytes per second&#8221; (I think) for the network counter.  It MAY be important to eliminate loopback adapters in the query for network data, but most of the time our data comes back with numbers that are WAY below our thresholds for candidacy on even relatively busy systems.</p>
<h2><span style="text-decoration: underline;">Whew!!!</span></h2>
<p>So, I made it through this in one fell swoop.  I am SURE that it isn&#8217;t written perfectly, but I wanted to get my thoughts down on this topic.  If you have any questions about it or recommendations&#8230;.. PLEASE comment and open a dialogue.</p>
<p>Thanks for making it through this&#8230; and good luck.</p>
<p>&#8211; Mark</p>
]]></content:encoded>
			<wfw:commentRss>http://vmweaver.com/index.php/2009/03/zen-and-the-mystical-art-of-candidacy-analysis/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

