Thursday, January 10, 2013

Executing tasks in parallel in PowerShell

Let's say we need to write a script that tests whether 10 different webpages are available.
This could be done by retrieving the webpages one at a time, but each request takes time, often somewhere between 0.5 and 2 seconds per page.

Now imagine that you need to test 50 or 100 pages. At 0.5 to 2 seconds per page, 100 pages checked one after the other would take roughly 50 to 200 seconds.

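The following function, Run-Threaded, executes the same unit of work (a scriptblock) against several targets in parallel. It opens a runspace pool, starts one PowerShell instance per target with BeginInvoke, and then collects the output of each instance as it finishes.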
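Before looking at the whole function, it may help to see the building block it relies on in isolation. The minimal sketch below starts one script asynchronously with BeginInvoke, polls IsCompleted, and collects the output with EndInvoke. The script text and the argument 'runspace' are placeholders, not part of the original post:

```powershell
# Minimal sketch: run one script asynchronously and collect its output.
# The script text and the argument 'runspace' are placeholders.
$ps = [powershell]::Create()
[void]$ps.AddScript('param($name) "Hello, $name"').AddArgument('runspace')

$handle = $ps.BeginInvoke()          # returns an IAsyncResult without waiting
while (-not $handle.IsCompleted) {   # the caller is free to do other work here
    Start-Sleep -Milliseconds 50
}
$result = $ps.EndInvoke($handle)     # the script's output, as a PSDataCollection
$ps.Dispose()                        # release the runspace created for $ps
$result
```

Run-Threaded does the same thing for many targets at once, except that the instances share a runspace pool instead of each creating its own runspace.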


Function Run-Threaded {
    Param ($scriptBlock, $targets, $throttleLimit = 4)
    ## Create a runspace pool with the default session state and the current host.
    ## The pool runs at most $throttleLimit scripts at once; the rest wait in its queue.
    $sessionState = [System.Management.Automation.Runspaces.InitialSessionState]::CreateDefault()
    $runspacePool = [runspacefactory]::CreateRunspacePool(1, $throttleLimit, $sessionState, $host)
    $runspacePool.Open()
    ## Keep track of the PowerShell instances ("threads")
    $threads = @()
    ## Start one instance per target; @() keeps $handles an array even for a single target
    $handles = @(foreach ($target in $targets) {
        $powershell = [powershell]::Create().AddScript($scriptBlock).AddArgument($target)
        $powershell.RunspacePool = $runspacePool
        $powershell.BeginInvoke()
        $threads += $powershell
    })
    ## Poll until every instance has finished, collecting output as each one completes
    $output = @()
    do {
        $finished = $true
        for ($i = 0; $i -lt $handles.Count; $i++) {
            $handle = $handles[$i]
            if ($null -ne $handle) {
                if ($handle.IsCompleted) {
                    $output += $threads[$i].EndInvoke($handle)
                    $threads[$i].Dispose()
                    $handles[$i] = $null
                } else {
                    $finished = $false
                }
            }
        }
        if (-not $finished) { Start-Sleep -Milliseconds 150 }
    } until ($finished)
    $output
    ## Clean up the pool
    $runspacePool.Close()
    $runspacePool.Dispose()
}
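One possible variation, sketched under the assumption that you want results in the same order as $targets and want errors raised inside the runspaces to be visible: instead of polling, call EndInvoke on each handle in turn, since EndInvoke blocks until that particular instance has finished. The function name Invoke-Parallel is hypothetical and not a built-in cmdlet:

```powershell
# Hypothetical variant of Run-Threaded: results come back in submission order,
# and errors from inside each runspace are surfaced as warnings.
function Invoke-Parallel {
    param (
        [scriptblock]$ScriptBlock,
        [object[]]$Targets,
        [int]$ThrottleLimit = 4
    )
    $pool = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit)
    $pool.Open()
    try {
        # Queue one PowerShell instance per target; the pool runs at most
        # $ThrottleLimit of them at a time and keeps the rest waiting.
        $jobs = foreach ($target in $Targets) {
            $ps = [powershell]::Create().AddScript($ScriptBlock.ToString()).AddArgument($target)
            $ps.RunspacePool = $pool
            [pscustomobject]@{ Target = $target; PowerShell = $ps; Handle = $ps.BeginInvoke() }
        }
        foreach ($job in $jobs) {
            try {
                # EndInvoke blocks until this instance is done, so no polling loop is needed.
                $job.PowerShell.EndInvoke($job.Handle)
            }
            catch {
                Write-Warning "Target $($job.Target) failed: $_"
            }
            # Non-terminating errors end up in the instance's error stream.
            foreach ($err in $job.PowerShell.Streams.Error) {
                Write-Warning "Target $($job.Target) reported: $err"
            }
            $job.PowerShell.Dispose()
        }
    }
    finally {
        $pool.Close()
        $pool.Dispose()
    }
}
```

Run-Threaded returns results in completion order, while this variant returns them in submission order. A slow target only delays when later results are emitted, not when they run, because the pool keeps executing the queued instances in the background.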

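The following small scriptblock ($myScriptBlock) tries to download whatever is located at the address http://$target.
As output, it creates a PowerShell object that contains the URL, the download time in seconds, and the size of the downloaded page. Note that the size is the length of the returned string, i.e. a count of characters; it equals the number of bytes only when each byte decodes to one character.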



$myScriptBlock = {
    Param ($target)
    ## Time the download of the page
    $downloadTime = (Measure-Command { $webPage = (New-Object Net.WebClient).DownloadString("http://$target") }).TotalSeconds
    ## Length of the downloaded string (characters)
    $downloadSize = $webPage.Length
    ## Build the result object
    $obj = New-Object PSObject
    $obj | Add-Member NoteProperty URL -Value $target
    $obj | Add-Member NoteProperty 'DownloadTimeInSeconds' -Value $downloadTime
    $obj | Add-Member NoteProperty 'SizeOfPageDownloadedInBytes' -Value $downloadSize
    $obj
}
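Because the goal is to test whether pages are available, it can be useful to report failures explicitly: in the scriptblock above, a failed download raises an error inside the runspace, where Run-Threaded never inspects it, so a failure is easy to miss. The sketch below catches the failure and records it in the output object. The name $myCheckedScriptBlock and the Available and Error properties are additions for illustration, and a Stopwatch replaces Measure-Command so that timing and download can share one try block:

```powershell
# Sketch: a variant of $myScriptBlock that reports unreachable sites instead of failing.
# $myCheckedScriptBlock, Available and Error are illustrative additions, not from the original.
$myCheckedScriptBlock = {
    Param ($target)
    $webClient = New-Object System.Net.WebClient
    try {
        $timer = [System.Diagnostics.Stopwatch]::StartNew()
        $webPage = $webClient.DownloadString("http://$target")
        $timer.Stop()
        [pscustomobject]@{
            URL                         = $target
            Available                   = $true
            DownloadTimeInSeconds       = $timer.Elapsed.TotalSeconds
            SizeOfPageDownloadedInBytes = $webPage.Length   # character count, as in the original
            Error                       = $null
        }
    }
    catch {
        # DNS failures, timeouts and HTTP errors all end up here
        [pscustomobject]@{
            URL                         = $target
            Available                   = $false
            DownloadTimeInSeconds       = $null
            SizeOfPageDownloadedInBytes = $null
            Error                       = $_.Exception.Message
        }
    }
    finally {
        $webClient.Dispose()
    }
}
```

It can be passed to Run-Threaded in place of $myScriptBlock; unreachable sites then show up in the table with Available set to False instead of disappearing into the runspace's error stream.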

Using the few lines of code below, together with the scriptblock and function above, you can download all the pages listed in the array $targets in parallel, then retrieve and display the results along with the elapsed time:


$targets = "www.microsoft.com", "www.cnn.com", "www.ebay.com", "www.slashdot.org", "www.eb.dk", "www.bbc.com", "www.apple.com", "www.bing.com", "www.lego.com", "www.htc.com", "www.samsung.com", "www.megabloks.com"
$time = (Measure-Command {$result = Run-Threaded -scriptBlock $myScriptBlock -targets $targets -throttleLimit 20}).totalSeconds
$result | Format-Table -AutoSize

$sumTime = ($result | Measure-Object -Sum DownloadTimeInSeconds).sum
Write-Host "Total runtime: $time seconds" -fore green
Write-Host "Sum of runtime: $sumTime seconds" -fore cyan

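The output should look something like this (Format-Table uses the machine's regional decimal separator, hence the commas, while strings expanded for Write-Host always use a period):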

URL               DownloadTimeInSeconds SizeOfPageDownloadedInBytes
---               --------------------- ---------------------------
www.microsoft.com             0,3908426                        1020
www.apple.com                 0,1885553                       11248
www.bbc.com                   0,4792315                      114846
www.htc.com                   0,4955355                       10127
www.samsung.com               0,3469335                       71112
www.eb.dk                     1,0516336                      263063
www.megabloks.com             0,6676591                       38997
www.bing.com                  1,0628155                       30023
www.lego.com                  0,9899592                       31447
www.cnn.com                   2,2071444                      118118
www.ebay.com                   2,750856                       93185
www.slashdot.org              3,4376653                      100297

Total runtime: 3.8353139 seconds
Sum of runtime: 14.0688315 seconds
 


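The code example can easily be adapted to collect the status of 100 databases or 1,000 servers in a fraction of the time a sequential loop would need. In the run above, the whole batch finished in about 3.8 seconds, only slightly longer than the slowest single download (www.slashdot.org, about 3.4 seconds), while the individual download times add up to about 14.1 seconds. Note, however, that this speed comes at the expense of CPU and memory: every concurrent runspace uses resources, so choose $throttleLimit to match what the machine (and the targets) can handle.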

An alternative to the function above would be the Start-Job, Stop-Job, Get-Job and Wait-Job cmdlets that are native to PowerShell. Unfortunately, those cmdlets are sometimes too slow and too resource-heavy, but they do offer a simple interface and additional possibilities.
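For comparison, here is a minimal sketch of the same fan-out using the job cmdlets. Collecting the output takes Receive-Job, and Remove-Job cleans up afterwards. $work and the three target names are placeholders standing in for $myScriptBlock and a real target list:

```powershell
# Sketch: the same fan-out using PowerShell's background job cmdlets.
# $work and the target names are placeholders, not from the original post.
$work    = { param($target) "Checked $target" }
$targets = 'alpha', 'beta', 'gamma'

# Start one background job per target
$jobs = foreach ($target in $targets) {
    Start-Job -ScriptBlock $work -ArgumentList $target
}

$results = $jobs | Wait-Job | Receive-Job   # wait for all jobs, then collect their output
$jobs | Remove-Job                          # delete the finished jobs
$results
```

Start-Job itself has no throttle parameter, so every target gets its own job (in Windows PowerShell, its own background process) right away, and the objects returned by Receive-Job are deserialized copies rather than live objects. That process-per-job model is a large part of why this approach is heavier than a runspace pool.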

