Powershell and DFSR

April 7th, 2009 Mark A. Weaver

Sorry for the long delay between posts, but work has been absolutely crazy.

Anyway, one of the recent tasks I have been working on is finding a way to check DFSR to make sure that our remote sites are properly replicating data back to our corporate datacenter.  Part of this new infrastructure relies heavily on Microsoft DFSR and all the cool stuff it brings (in Windows Server 2003 R2).

Our support teams have been asking for ways to ensure that data has completely synchronized to our corporate datacenter every night.  Unfortunately, there isn’t an easy way to determine this scriptomatically.  Well, leave it to me to try some different things and put SOMETHING in place to do this.

Basically we have remote sites replicating during non-business hours back to a central “hub” DFSR server.  We then back up this “hub” server with our corporate backup infrastructure.  This is a WHOLE lot easier than getting users in remote sites to swap tapes or whatever and send them offsite, etc.

The only way I have been able to determine the state of replication is to query the “backlog” of the remote site DFSR servers.  This should tell us how many files are sitting there awaiting replication. DFSRDIAG is a tool that can help us enumerate these files, but then we have to parse out the data.  We also need to know which replication partner, which replicated folder, and which replication group these remote sites belong to.

One way to enumerate that info is through a WMI query.  From the DFSR “hub” server you can enumerate all DFSR connections, groups, folders, etc. by running some queries against the “root\MicrosoftDFS” namespace.  This is different from standard WMI queries because the default namespace (root\cimv2) does not contain any DFSR configurations.

Once we connect to this namespace, it is a fairly trivial task to cycle through all the connection partners, replication groups, and replicated folders.

We can then run the “DFSRDIAG” tool to see how many files are in the backlog.
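To make the parsing step concrete, here is a minimal sketch of pulling the count out of dfsrdiag’s text output.  It assumes the count appears on a line containing “Backlog File Count:” followed by the number, which is the same pattern the script below matches on.  The function name Get-BacklogCount and the sample lines are hypothetical, not real captured output.

```powershell
# Hypothetical helper: extract the backlog count from dfsrdiag output lines.
# Assumes the count sits on a line such as "Backlog File Count: 12".
# If no such line is present (e.g. the members are in sync), 0 is returned.
function Get-BacklogCount([string[]]$Lines)
{
    $count = 0
    foreach ($line in $Lines)
    {
        if ($line -ilike "*Backlog File count*")
        {
            # Everything after the colon is the number of files waiting
            $count = [int]$line.Split(":")[1].Trim()
        }
    }
    return $count
}

# Illustrative sample lines only
$sample = @(
    "Backlog File Count: 12",
    "Backlog File Names (first 12 files)"
)
Get-BacklogCount $sample                                       # 12
Get-BacklogCount @("No Backlog - member is in sync with partner")   # 0
```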

Once we have determined how many files are out there for each replicated folder, we then write a custom event log entry and have our monitoring tools pick those up.

For this script I have set a threshold of 10 files before writing an “error” event log entry; a backlog of 1 to 9 files is logged as a warning instead.  This can easily be changed based on your specific needs, though.

You should also be able to easily customize the eventIDs and source information by modifying the values assigned to those variables.
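The threshold logic boils down to mapping a backlog count to one of the script’s event IDs.  This hypothetical Get-BacklogEventId function is a sketch of that mapping, assuming the default threshold of 10 and the IDs 9500 through 9502 used in the script; change the parameter or the numbers to suit your own monitoring rules.

```powershell
# Hypothetical helper mirroring the script's thresholds:
#   0 files                        -> 9500 (Information)
#   1 to ($ErrorLevel - 1) files   -> 9501 (Warning)
#   $ErrorLevel files or more      -> 9502 (Error)
function Get-BacklogEventId([int]$Count, [int]$ErrorLevel = 10)
{
    if ($Count -eq 0) { return 9500 }
    elseif ($Count -lt $ErrorLevel) { return 9501 }
    else { return 9502 }
}

Get-BacklogEventId 0                   # 9500
Get-BacklogEventId 4                   # 9501
Get-BacklogEventId 10                  # 9502
Get-BacklogEventId 25 -ErrorLevel 50   # 9501
```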

For actually writing to the event log, I am “borrowing” some code my colleague Mike put together.

Anyway, I think the script is fairly self-explanatory.  If you need additional info or have questions, please let me know.

Thanks and happy scripting…

– Mark

## Check-DFSR.ps1 script
## Written by: Mark A. Weaver
## Site: http://www.vmweaver.com
## Version: 2.0
## Date: 5/7/2009
## Purpose: This script will query the local root\MicrosoftDFS WMI namespace for DFS replication groups and folders.
##          It will then run the dfsrdiag utility to determine the number of files in the backlog between
##          this server and each enabled partner in the replication group.
##
##          This script was written for the specific use of being run on a centralized DFSR server
##          which acts as the HUB for remote office backups.
##
##          Monitoring rules can be set up to collect and report on the events being generated.
##
##          Event information is written to the Application log using the EventIDs listed under "Main" below.
## Input: None
#############################
## Updates:
##  20090408 Weaver: Fixed issue where multiple events are generated throughout the execution
##  20090408 Weaver: Added BacklogFileCount to event message
##  20090409 Weaver: Fixed list of replication connections issue due to change in replication topology
##  20090507 Weaver: Added functionality to return results from all partners in the replication group
##
##
######################################################################
######################################################################
# Write-Event powershell function
# Written by Mike Hays
# http://blog.mike-hays.net
#
#
 
function Write-Event(
	[string]$Source = $(throw "An event Source must be specified."),
	[int]$EventId = $(throw "An Event ID must be specified."),
	[System.Diagnostics.EventLogEntryType] $EventType = $(throw "Event EventType must be specified. (Error, Warning, Information, SuccessAudit, FailureAudit)"),
	[string]$Message = $(throw "An event Message must be specified."),
	$EventLog
)
{
	#Uncommon event logs can be specified (even custom ones), but since that isn't generally
	#the desired result, I prevent that here by validating the $EventLog parameter
	$acceptedEventLogs = "Application", "System"
	if ($EventLog -eq $null)
	{
		$EventLog = "Application"
	}
	elseif (!($acceptedEventLogs -icontains $EventLog))
	{
		Write-Host "This function supports writing to the following event logs:" $acceptedEventLogs
		Write-Host "Defaulting to Application Eventlog"
		$EventLog = "Application"
	}

	#Create a .NET object that is connected to the Eventlog
	$event = New-Object -TypeName System.Diagnostics.EventLog -ArgumentList $EventLog
	#Define the Source property
	$event.Source = $Source
	#Write the event to the log
	$event.WriteEntry($Message, $EventType, $EventId)
}
 
######################################################################
######################################################################
## Main 
## Errors written:
##   Log File: Application
##   Source: Check-DFSR Script
##   ID: 9500 - Lists fully replicated replication folders
##   ID: 9501 - Lists replication folders with 1 to ($BacklogErrorLevel - 1) files waiting
##   ID: 9502 - Lists replication folders with $BacklogErrorLevel or more files waiting
##   ID: 9503 - If a connection is not pingable, this event is written.
 
$BacklogErrorLevel = 10 
 
$ComputerName = $env:ComputerName
## Query DFSR groups from the local MicrosoftDFS WMI namespace.
$DFSRGroupWMIQuery = "SELECT * FROM DfsrReplicationGroupConfig"
$RGroups = Get-WmiObject -Namespace "root\MicrosoftDFS" -Query $DFSRGroupWMIQuery
 
 
## Setup my variables
$ping = New-Object System.Net.NetworkInformation.Ping
$SuccessAudit = $Null
$WarningAudit = $Null
$ErrorAudit = $Null
$EventSource = "Check-DFSR Script"
$SuccessEventID = 9500
$WarningEventID = 9501
$ErrorEventID = 9502
$NoPingEventID = 9503
 
foreach ($Group in $RGroups)
{
	## Cycle through all Replication groups found
	$DFSRGFoldersWMIQuery = "SELECT * FROM DfsrReplicatedFolderConfig WHERE ReplicationGroupGUID='" + $Group.ReplicationGroupGUID + "'"
	$RGFolders = Get-WmiObject -Namespace "root\MicrosoftDFS" -Query $DFSRGFoldersWMIQuery
 
	## Grab all connections associated with a Replication Group
	$DFSRConnectionWMIQuery = "SELECT * FROM DfsrConnectionConfig WHERE ReplicationGroupGUID='" + $Group.ReplicationGroupGUID + "'"
	$RGConnections = Get-WmiObject -Namespace "root\MicrosoftDFS" -Query $DFSRConnectionWMIQuery	
	foreach ($Connection in $RGConnections)
	{
 
		$ConnectionName = $Connection.PartnerName.Trim()
		$IsInBound = $Connection.Inbound
		$IsEnabled = $Connection.Enabled
 
		## Do not attempt to look at connections that are Disabled
		if ($IsEnabled -eq $True)
		{  
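			## If the connection is not ping-able, do not attempt to query it for Backlog info.
			## Reset $Reply first: if Send() throws (e.g. the name does not resolve), the
			## previous connection's reply would otherwise be reused.
			$Reply = $null
			$Reply = $ping.Send($ConnectionName)
			if ($Reply.Status -eq "Success")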
			{
 
 
				## Cycle through the Replication Folders that are part of the replication group and run DFSRDIAG tool to determine the backlog on the connection partners.
				foreach ($Folder in $RGFolders)
				{
					$RGName = $Group.ReplicationGroupName
					$RFName = $Folder.ReplicatedFolderName
 
					## Determine if the current connection is inbound or not, and set the sending/receiving members accordingly
					if ($IsInBound -eq $True)
					{
						$SendingMember = $ConnectionName
						$ReceivingMember = $ComputerName
					}
					else
					{
						$SendingMember = $ComputerName
						$ReceivingMember = $ConnectionName
					}
						$Out = $RGName + ":" + $RFName + " - S:" + $SendingMember + " R:" + $ReceivingMember
						Write-Host $Out
						## Run dfsrdiag directly (rather than through Invoke-Expression) so group/folder names
						## containing spaces or apostrophes are passed intact; the output lines land in $Backlog
						$Backlog = & dfsrdiag.exe Backlog "/RGName:$RGName" "/RFName:$RFName" "/SendingMember:$SendingMember" "/ReceivingMember:$ReceivingMember"
 
						$BackLogFilecount = 0
						foreach ($item in $Backlog)
						{
							if ($item -ilike "*Backlog File count*")
							{
								$BacklogFileCount = [int]$Item.Split(":")[1].Trim()
							}
 
						}
 
 
						if ($BacklogFileCount -eq 0)
						{
							#Update Success Audit 
							$SuccessAudit += $RGName + ":" + $RFName + " is in sync with 0 files in the backlog from "+ $SendingMember + " to " + $ReceivingMember +".`n"					
 
						}
						elseif ($BacklogFilecount -lt $BacklogErrorLevel)
						{
							#Update Warning Audit
							$WarningAudit += $RGName + ":" + $RFName + " has " + $BacklogFileCount + " files in the backlog from " + $SendingMember + " to " + $ReceivingMember + ".`n"
						}
						else
						{
							#Update Error Audit
							$ErrorAudit += $RGName + ":" + $RFName + " has " + $BacklogFilecount + " files in the backlog from " + $SendingMember + " to " + $ReceivingMember + ".`n"
						}
						#Write-Host + $Folder.ReplicatedFolderName "- " $BackLogFilecount -foregroundcolor $FGColor
					}
				}
				else
				{ 
				Write-Host $ConnectionName "is not pingable" 
				$NoPingMessage = "Server """ + $ConnectionName + """ could not be reached.`nPlease verify it is on the network and pingable."
				Write-Event $EventSource $NoPingEventID "Warning" $NoPingMessage "Application"
				}
			}
 
	}
 
}
## Write my events to the local Application log.
 
if ($SuccessAudit -ne $Null)
{
	Write-Event $EventSource $SuccessEventID "Information" $SuccessAudit "Application"
}
 
if ($WarningAudit -ne $Null)
{
	Write-Event $EventSource $WarningEventID "Warning" $WarningAudit "Application"
}
 
if ($ErrorAudit -ne $Null)
{
	Write-Event $EventSource $ErrorEventID "Error" $ErrorAudit "Application"
}
Categories: Powershell, Scripting

Zen and the Mystical Art of Candidacy Analysis

March 11th, 2009 Mark A. Weaver

Okay so maybe there is no Zen, nor is it really “mystical”. 

One of the MANY challenges we faced while trying to implement a brand-spanking-new virtualization infrastructure was to put together some type of guidance for our server admin teams to help them determine if existing systems would “live” happily in our new environment.

After reading recommendations from VMware, digging through Microsoft Management Packs for Operations Manager, etc., it didn’t seem that any of those really fit our needs nor did they make much sense.

So we happen to use Operations Manager to collect performance data on our Windows-based servers, but there is a TON of data.

We also had to make sure that the tools we were using (Operations Manager) didn’t aggregate our data.  We NEEDED to have granular data for our entire 3 month look-back period.  We just didn’t like the idea of averaging averages, so we have data points every 5 or 15 minutes (depending on the counter) for 90 days.  One of the early reports we developed let admins specify the peak usage times so that the performance analysis could be tailored for specific systems.

Now we have all of this data but how do we look at it?  How do we know what’s important?  What recommendations from VMware and MS do we use and which ones do we throw away?

Several of the deficiencies we saw with most tools are:

  1. They average entire periods of time (a day, a week, a month) with no regard for peak usage times
  2. They use aggregated data (kind of like above)
  3. They assume that “CPU %” means the same thing for all systems

So now we have to look only at peak times for specific servers.  Wow... that can be a VERY daunting task unless you make SOME assumptions.  That’s exactly what we did.

We know that not ALL systems will have the same usage patterns, but generally our systems will be the busiest during normal business hours at our corporate data center.  So, that would be Monday – Friday, 8am-5pm CST.  Well, we also have people using our systems on the east and west coasts, so if we move our “peak” time to 7am-7pm CST, we should cover most of the usage we wish to capture.
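Here is one way the peak-window filter could look, sketched under a few assumptions: each sample is an object with Time and Value properties, the timestamps are already in CST, and the window runs Monday through Friday from 7:00 up to (but not including) 19:00.  The function name Select-PeakSamples and the sample data are hypothetical; [pscustomobject] needs PowerShell 3.0 or later.

```powershell
# Hypothetical helper: keep only samples inside the assumed peak window
# (Monday-Friday, $StartHour up to but not including $EndHour).
function Select-PeakSamples($Samples, [int]$StartHour = 7, [int]$EndHour = 19)
{
    $Samples | Where-Object {
        $day = $_.Time.DayOfWeek
        $isWeekday = ($day -ne [DayOfWeek]::Saturday) -and ($day -ne [DayOfWeek]::Sunday)
        $isWeekday -and ($_.Time.Hour -ge $StartHour) -and ($_.Time.Hour -lt $EndHour)
    }
}

# Illustrative samples (CPU %)
$samples = @(
    [pscustomobject]@{ Time = [datetime]'2009-03-09 08:15'; Value = 70 },  # Monday morning
    [pscustomobject]@{ Time = [datetime]'2009-03-09 22:00'; Value = 12 },  # Monday night
    [pscustomobject]@{ Time = [datetime]'2009-03-14 10:00'; Value = 9 }    # Saturday
)
Select-PeakSamples $samples    # only the Monday 08:15 sample survives
```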

You may be asking why not just take all data points and be done with it… I know *WE* did.

It would certainly have been easier, but far less accurate.  To see why, consider this example:

  • A system has a CPU utilization of 75% all the time during peak usage times
  • This same system has a CPU utilization of 10% during off-peak times
  • This system has a CPU utilization of 10% during the ENTIRE weekend

Okay, so if we take the mean average (the normal practice) we get something in the range of…well.. I am not gonna do the math, but it is WAY less than 75%.  We THEN thought, well… why not just take the highest or maximum measurement for the entire period and plan for that?  That doesn’t work out so well either if you have systems that don’t do ANYTHING until a virus scan kicks off and pegs the CPU for 30 minutes.  This greatly throws our performance footprint off.  So now we have to find a happy sweet-spot for evaluating performance.
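For the curious, here is that math, using an assumed week of hourly samples that matches the three bullets (with peak defined as weekdays 7am-7pm).  The numbers are illustrative, not real data.

```powershell
# Assumed week of hourly CPU % samples (illustrative only):
#   60 peak hours (Mon-Fri, 7am-7pm)  at 75%
#   60 off-peak weekday hours         at 10%
#   48 weekend hours                  at 10%
$week = @(75) * 60 + @(10) * 60 + @(10) * 48

# Plain mean over all 168 hours
$mean = ($week | Measure-Object -Average).Average
[math]::Round($mean, 1)    # 33.2
```

The mean lands around 33%, less than half of the 75% the system actually runs at while people are using it.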

After mulling through all of our options and thinking about what types of data were REALLY helpful in this process, we decided to do a “Top 20% Average”.  I know it sounds a little strange, but it is pretty straight-forward. How did we come up with this number?  Well, we had to pick a number and threw some others around and this one seems to work, so we stuck with it.  The calculation is pretty simple:

  1. Take all of your data points
  2. Sort them High to Low
  3. Take the top 20% of your values and average them

This gives us a nice look at what our systems do when they are operating at or near their highest loads.  The benefit of having both a Top 20% Average and a normal Average is that the greater the difference between the two, the “more spiky” the performance is.  As the numbers get closer and closer, we can see that our systems are more consistently busy during our peak usage time.
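A minimal sketch of the Top 20% Average calculation follows.  The function name Get-TopPercentAverage is hypothetical, and rounding the top-20% sample count up (so small data sets keep at least one value) is an assumption, since the steps above don’t say how to round.  Comparing its result with the plain average gives the “spikiness” signal described above.

```powershell
# Hypothetical helper: sort samples high to low, keep the top $Percent
# of them, and average what is left.
function Get-TopPercentAverage([double[]]$Values, [double]$Percent = 20)
{
    if ($Values.Count -eq 0) { return $null }
    # Size of the top slice, rounded up, never less than one sample
    $take = [math]::Max(1, [int][math]::Ceiling($Values.Count * $Percent / 100))
    $top = $Values | Sort-Object -Descending | Select-Object -First $take
    return ($top | Measure-Object -Average).Average
}

# Illustrative CPU % data sets (assumed, not real measurements)
$steady = @(40) * 100                # consistently busy
$spiky  = @(5) * 90 + @(90) * 10     # mostly idle, occasional bursts

"Steady: average {0}, T20 {1}" -f ($steady | Measure-Object -Average).Average, (Get-TopPercentAverage $steady)
"Spiky:  average {0}, T20 {1}" -f ($spiky | Measure-Object -Average).Average, (Get-TopPercentAverage $spiky)
```

The steady set gives 40 for both numbers, while the spiky set gives an average of 13.5 against a T20 Average of 47.5; the wider the gap, the spikier the workload.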

The two most important things we need to be looking at are CPU and Memory utilization since those are often the most limiting resources in our environment.  Of less importance to us were Disk IO and Network IO because we were really looking at “low hanging fruit” types of systems and NORMALLY if a system is disk or network heavy, CPU utilization will also shoot up.

So now we have this notion of Top 20% Average (T20 Average) and Average, but what are we actually analyzing?

Not So Normal CPU Usage

Well, we can’t just look at CPU Utilization Percent!  Why, you may ask? Well CLEARLY the following are not equivalent:

  • 1 CPU – 700MHz system running 75% utilization
  • 4 CPU – 2.2 GHz system running 20% utilization

Many tools used to evaluate this type of data look at Total CPU Percent Utilization.  This would mean our 1 CPU system has a larger processor footprint than our 4 CPU system.  This doesn’t really make sense.

One of the interesting strategies VMware presents in some of their documentation is this idea of a “normalized” CPU.  Basically this calculates how many megahertz a system uses instead of using a percentage (which is relative to how many horses are under the hood).  While this isn’t entirely accurate, it does help us create high-water marks for a normalized calculation.

In the above example, our 1CPU system would have a normalized CPU usage of:

  • 1 CPU x 700MHz x .75 = 525MHz

Our 4CPU system would normalize to:

  • 4 CPU x 2200MHz x .20 = 1760MHz

It is MUCH easier to compare the 2 of them now and to see which one will probably have a larger processor footprint.  I do realize that it doesn’t really take into account performance of multi-threaded applications and such, but it does a pretty good job.
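The normalization is just a multiplication, but it is handy as a helper when crunching numbers for many servers.  Get-NormalizedCpuMHz is a hypothetical name, and utilization is passed as a fraction (0.75 for 75%).

```powershell
# Hypothetical helper: normalized CPU usage in MHz =
#   number of CPUs x clock speed (MHz) x utilization (as a fraction)
function Get-NormalizedCpuMHz([int]$CpuCount, [double]$ClockMHz, [double]$Utilization)
{
    return $CpuCount * $ClockMHz * $Utilization
}

Get-NormalizedCpuMHz 1 700 0.75     # 525
Get-NormalizedCpuMHz 4 2200 0.20    # 1760
```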

So, how do we get all that info to calculate it? Well, we also happen to have SMS or SCCM available to collect configuration information about our systems.  This enables us to create some custom database views that contain system information regarding CPU speeds, number of CPUs, Hyper-Threading configuration, etc.  We needed info on Hyper-Threading enabled hardware because Windows reports those “HT” logical processors as if they were physical processors.  Unfortunately, an HT processor doesn’t really have the same performance as a TRUE physical processor.

To get around this, we figure an HT processor is worth about half of a physical processor.  So if our 4CPU system above is really a 2CPU system with HT, it would look like this:

  • 2 CPU x 2200MHz x .20 = 880MHz
  • 2 CPU x .5 (because they are HT) x 2200MHz x .20 = 440MHz
  • Total Normalized CPU usage is 1320MHz
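Extending that idea to Hyper-Threading, here is a sketch that weights each HT logical processor at half of a physical one, per the rule of thumb above.  The function name and the HtWeight parameter are hypothetical, and the 0.5 default is the rough estimate from the text, not a measured value.  It folds the two lines above into one multiplication, which gives the same total.

```powershell
# Hypothetical helper: physical processors count fully, while each
# Hyper-Threading logical processor counts as $HtWeight of a physical one.
function Get-HtNormalizedCpuMHz(
    [int]$PhysicalCpus,
    [int]$HtCpus,
    [double]$ClockMHz,
    [double]$Utilization,
    [double]$HtWeight = 0.5
)
{
    $effectiveCpus = $PhysicalCpus + ($HtCpus * $HtWeight)
    return $effectiveCpus * $ClockMHz * $Utilization
}

# Windows reports 4 processors, but 2 of them are HT logical processors
Get-HtNormalizedCpuMHz 2 2 2200 0.20    # 1320 (880 + 440)
```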

I hope I did an okay job of explaining it.  If not, please let me know.

Memory and Commitment

Memory is VERY easy to evaluate compared to CPU numbers as it is already normalized (kinda).  We will use the same methodologies for memory as we did for CPU with respect to Average Values and peak times.  When looking at memory counters, we are interested in finding out how much memory our systems are using, not how much is free, or the percent free or anything like that.

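As Microsoft calls it, we need the Memory “Committed Bytes” counter (not “% Committed Bytes In Use”, which is a percentage of the commit limit rather than an amount of memory).  I don’t think Operations Manager collects this info by default, so you may need to start collecting it to do this type of analysis.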

Other than using the right counter it is pretty much the same as CPU.

Disk and Network Stuff

While we DO care about network and disk throughput, they typically will be weighted much less than CPU and memory utilization when looking at the overall analysis.  If you are looking at virtualizing some beefy SQL or Exchange systems, then these will become MUCH more important, though probably still behind CPU and memory.

We use the same Average calculations as we do for CPU and Memory, but for this analysis we look at the disk counter “Disk Bytes/sec” and the Network Interface counter “Bytes Total/sec”.  It MAY be important to eliminate loopback adapters in the query for network data, but most of the time our data comes back with numbers that are WAY below our thresholds for candidacy on even relatively busy systems.

Whew!!!

So, I made it through this in one fell swoop.  I am SURE that it isn’t written perfectly, but I wanted to get my thoughts down on this topic.  If you have any questions about it or recommendations….. PLEASE comment and open a dialogue.

Thanks for making it through this… and good luck.

– Mark

Powershell Power! (Part Deux)

February 23rd, 2009 Mark A. Weaver

Okay, so this is the second installment of my little tutorial-thingy for Powershell.  After much thought on where I want this to go, I figured the next logical step would be to talk about setting up your Powershell (“PoSh”) environment.

So, let’s put together a little laundry-list of tools you may want to procure.  Just download the bits and save them for later.  We will go through some of these in more detail.

  1. Powershell installation bits (requires at least .NET 2.0 Framework)
  2. PowerTab from ThePowerShellGuy (Most EXCELLENT Site for all things PoSh)
  3. Quest PowerGUI (a powerful FREE GUI and IDE for PoSh).  I won’t cover its installation here, but it is pretty straightforward.

That will probably be enough to get moving.

The Powershell install should be fairly straightforward.  Just take all the default options and let her roll.

NOTE: If you want to install a newer version of Powershell, you will need to uninstall the previous version first.  This is kinda sucky, but not too bad.  To uninstall it, go to Add/Remove Programs and look for it; you may not see it unless the “Show Updates” option is turned on.  It will probably show up as “Windows PowerShell”.

To install PowerTab, unzip the downloaded file to the folder where you want it to live permanently.  I normally pick something like “C:\Program Files\PowerTab” so it is where all my other programs live.

Now open a command shell and start Powershell by typing in “powershell” and hitting Enter.

This may take a few seconds to launch, but if successful you should see your new and shiny Powershell prompt.  The first thing to do, before anything else, is make it so we can actually run scripts.

Out-of-the-box, Powershell is delivered secure.  SO secure, in fact, that you can NOT run ANY scripts without making it a little less secure.  I will probably discuss this in more detail later or pressure my buddy Mike to write about it on his blog (this is more up his alley than mine).

Anyway…at your PoSh prompt type in the cmdlet:

Set-ExecutionPolicy RemoteSigned

If successful, you should be kicked back to your prompt.  Now we can move forward with the PowerTab install.

CD to the directory you installed PowerTab to and run the “Setup.CMD” file to start the installer.  I normally just take all the defaults and let it go.  This will probably take a few minutes to finish up.

Now to show you a little about what PowerTab does for you….

If you type “Get-” and then hit the TAB key you should see a popup window with all sorts of fun things.  Basically PowerTab is like tab completion on STEROIDS.  It is very helpful for discovering what is available and such.  I hope you will take this opportunity to explore Powershell a bit.

You can get a list of all commands (called Cmdlets) by typing in “Get-Command”.

I guess that is a fairly good place to break and talk about the “Cmdlets” (pronounced “command-lets”).

Cmdlets are the meat-and-potatoes of Powershell.  They are to Powershell what “cd” and “dir” are to the normal Windows command shell (cmd.exe).  Cmdlet names are constructed of 2 parts: a VERB and a NOUN.  If you look at the output of the “Get-Command” cmdlet, you will see lots of verbs on the left side of the “dash” in the cmdlet name: “Get”, “Out”, “Write”, etc.  The right side of the cmdlet name is the noun, or the thing that the verb acts on.

As you can see there are common Verbs and Nouns.  You can, in fact, do a “get-command -verb get” to list all commands that have the verb “Get”.  You can do the same for Nouns.  Go ahead and try that out on a few.
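A couple of quick examples of slicing commands by verb and by noun.  The Group-Object pipeline at the end just counts cmdlets per verb; the commands and counts you see will vary with your Powershell version and loaded modules.

```powershell
# Every command whose verb is "Get" (Get-ChildItem, Get-Help, Get-Member, ...)
Get-Command -Verb Get

# Every command whose noun is "Item" (Get-Item, New-Item, Remove-Item, ...)
Get-Command -Noun Item

# Count cmdlets per verb to see which verbs are most common
Get-Command -CommandType Cmdlet |
    Group-Object -Property Verb |
    Sort-Object -Property Count -Descending |
    Select-Object -First 5 -Property Name, Count
```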

There are several cmdlets you should be using as you are learning the ins and outs of Powershell.  Probably the MOST helpful is, well “Get-Help”.

Get-Help is one I use on a VERY regular basis.  The general format for this one is:

  • “Get-Help <Cmdlet>” : Gives basic info on the cmdlet
  • “Get-Help <Cmdlet> -Detailed” : Gives MUCH more info about the cmdlet, including some examples of using it
  • “Get-Help <Cmdlet> -Full” : Gives ALL the help info for the cmdlet, including full parameter details
  • “Get-Help <Cmdlet> -Examples” : Just gives you the usage examples

Go ahead and try it now.  Do a “get-command”, pick one that looks interesting to you and “get-help” on it.  This was crucial for me in learning Powershell.

While I bring this part to a close I will challenge you to write something…well ANYTHING really.  The best way for me in learning was to pick a task and do it in Powershell.  The thought process for me was “Well, I could do this in vbScript in like 5 minutes, or I could take 30 minutes and do it in Powershell and learn a ton.”  If I have the time to do it in Powershell, I am doing it.

Some simple tasks would be to start using Powershell as your “normal” command shell.  This will get you used to using the cmdlets to navigate the environment and do things.  Try NOT to use the default aliases that are “replacements” for the standard “cmd.exe” commands, such as “type”, “dir”, etc.  I have been using the Powershell equivalents instead: “dir” becomes “Get-ChildItem” or simply its alias “gci”.
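If you are not sure which cmdlet an alias maps to, Get-Alias will tell you, and it can also work in reverse to list every alias for a given cmdlet.

```powershell
# Which cmdlet does "gci" point to? (the Definition column shows Get-ChildItem)
Get-Alias gci

# The reverse: every alias defined for Get-ChildItem
Get-Alias -Definition Get-ChildItem
```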

The other thing I will challenge you to look at is the cmdlet “Get-Member”.  This will help you in this worthwhile venture into the wonderful world of Powershell.
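To get a head start on that homework, try piping an object to Get-Member.  Get-Date is used here only because it returns a familiar object (a System.DateTime); the output of any cmdlet works.

```powershell
# Pipe any object to Get-Member to list its type name, properties, and methods
Get-Date | Get-Member

# Narrow the list down to just the properties
Get-Date | Get-Member -MemberType Property
```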

I think the next session we will cover the notion that Powershell is “Object-Based”, which will be a good lead in from your “homework” on the “Get-Member” cmdlet.

Anyway, until next time…. happy Scripting!

– Mark