I need a break from six straight hours of exam studying.  Coffee.  Inhale… here goes…

UPDATED – 10/11/2015 – 12:06 AM EDT


Back in the heyday of KiXtart and VBScript, I hated sorting string-based data with the standard array tools: custom bubble sorts, the built-in Dictionary (hash) object, and so on.  As soon as a friend of mine turned me on to SQL Server, ADO and data services, I was hooked.  But I probably went a little too far in that direction, just out of over-excitement (new-toy syndrome).

Sometimes it’s not as practical or efficient as hash/dictionary sorting, but when it fits the situation, it rocks.  The differentiator is usually the virtual column structure (schema) and the overall size of the data being sorted.  For example, three virtual columns and 100 rows are probably perfectly fine for hash tables and arrays.  Hundreds of virtual columns and hundreds of thousands of rows, and I’m reaching for ADO.

Anyhow, PowerShell continues to win my favor every day, and here’s just one tiny example.

Feeding an array of delimited strings into a function and spitting out the same array structure, but sorted on any [virtual/logical] column I define.

In plain English, let’s say I have an array such as the one below.  I realize it violates the guideline I provided above (three columns, three rows), but it’s an example and my brain is fried, so…

$emplist = @("John,Doe,Manager", "Steve,Powers,Bionics", "Donna,Clampette,VP")

The array now looks like this

"John,Doe,Manager"
"Steve,Powers,Bionics"
"Donna,Clampette,VP"

Then I want to apply a virtual column structure to this as

$columns = "FName,LName,Title"

and then use that to apply a sort operation over the array, such as:

$emplist2 = Sort-RecordSet $emplist $columns "LName"

I should get back the following array sorted by the LName (last name) virtual column:

"Donna,Clampette,VP"
"John,Doe,Manager"
"Steve,Powers,Bionics"

Here’s one option…

function Sort-RecordSet {
  param (
    [parameter(Mandatory=$True, Position=0)] [array] $DataSet,
    [parameter(Mandatory=$True, Position=1)] [string] $Columns,
    [parameter(Mandatory=$True, Position=2)] [string] $SortBy,
    [parameter(Mandatory=$False)] [int] $MaxColumnSize = 255
  )
  $adVarChar = 200
  $fields = $Columns.Split(",")
  $rs = New-Object -ComObject "ADOR.RecordSet"

  # define the virtual columns before opening the recordset
  foreach ($fn in $fields) {
    $rs.Fields.Append($fn, $adVarChar, $MaxColumnSize)
  }
  $rs.Open()
  # load each delimited string into a new row
  foreach ($ds in $DataSet) {
    $row = $ds.Split(",")
    $rs.AddNew()
    for ($i = 0; $i -lt $row.Length; $i++) {
      $rs.Fields.Item($fields[$i]).Value = $row[$i]
    }
    $rs.Update()
  }
  # sort once, after all rows are loaded, then walk the
  # recordset and rebuild the delimited-string array
  $rs.Sort = $SortBy
  $rs.MoveFirst()
  $result = @()
  do {
    $fs = @()
    for ($i = 0; $i -lt $fields.Length; $i++) {
      $fs += $rs.Fields.Item($i).Value
    }
    $result += ($fs -join ',')
    $rs.MoveNext()
  } until ($rs.EOF)
  $rs.Close()
  $rs = $null
  return $result
}

Thoughts

Data Type Mapping inefficiency

I’m taking a huge shortcut out of convenience by forcing everything into a VarChar() data type.  For very large data sets, that might cause performance impacts.  A more efficient approach would be to map a granular column=datatype construct and parse that out during the Append() iteration block.  That would trim the memory allocation to fit the data being inserted per column.  That still isn’t ideal (vis-à-vis LISP’s mapcar() or apply()).

One example of what I’m blabbering about would be:

$columns = "FName=string,LName=string,Title=string,Age=int"

Of course, that would involve a few more lines of code inside the function block: parse to a second level, then apply a switch() branch to handle the datatype assignment.  Optionally, just hand over the string itself and not map anything via switch().  Whatever.
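Here’s a rough sketch of what that parsing could look like, assuming the column=datatype convention above.  The constants are the standard ADO DataTypeEnum values; the size choices are just illustrative defaults.

```powershell
# Hypothetical sketch: parse "Name=type" pairs and map each to an ADO type.
$adVarChar = 200   # ADO DataTypeEnum: variable-length string
$adInteger = 3     # ADO DataTypeEnum: 4-byte signed integer

$columns = "FName=string,LName=string,Title=string,Age=int"

foreach ($col in $columns.Split(",")) {
    $name, $type = $col.Split("=")
    switch ($type) {
        "int"    { $adoType = $adInteger; $size = 4 }
        "string" { $adoType = $adVarChar; $size = 255 }
        default  { $adoType = $adVarChar; $size = 255 }
    }
    # inside Sort-RecordSet this would feed the Append() call:
    # $rs.Fields.Append($name, $adoType, $size)
    Write-Output "$name -> type $adoType, size $size"
}
```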

Input / Output Mapping

Another dent in the efficiency score comes from building the output array to return at the end of the function block.  Returning a native ADO recordset would save quite a few cycles and reduce memory-allocation overhead.

Caveat Shmaveat

Using the example above, it’s obvious that this is a CSV structure.  In such cases, other means could have been employed, including native ADO Text/CSV handling.  But what if the source text uses some other Martian or Neptunian syntax, with a tilde (~) or @ in place of commas?  Who knows.
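In that case, the delimiter could be threaded through as a parameter instead of hard-coding the comma.  A tiny standalone sketch (the `$delimiter` variable would become a `-Delimiter` parameter on the function, which is my own invention, not something in the code above):

```powershell
# Hypothetical sketch: split on an arbitrary delimiter instead of a comma.
$delimiter = "~"
$emplist = @("John~Doe~Manager", "Steve~Powers~Bionics", "Donna~Clampette~VP")

foreach ($ds in $emplist) {
    $row = $ds.Split($delimiter)   # same row shape as the comma version
    Write-Output ($row -join $delimiter)
}
```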

Context and Reality

This entire concept isn’t intended to be a long-term “solution”.  If you’re building something to solve a short-term challenge, scripting is ideal.  If you’re building something for the long-term, particularly if it’s going to be operationally “critical” to your business operations, take the time to architect, design, build, test and deploy a proper “application”.

UPDATE:

I forgot to mention that this could be sorted using built-in PowerShell array sorting methods by first parsing the delimited strings.  Another option would be using a mapcar() process (from LISP) to pluck the nth position elements, sort them, and then re-apply the results to the outer set.  I’m probably suffering from sleep deprivation (still).
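For the record, the built-in route is pretty compact.  A sketch using ConvertFrom-Csv and Sort-Object against the same sample data from above:

```powershell
# Built-in alternative: parse the delimited strings into objects,
# sort on a named property, then re-join into delimited strings.
$emplist = @("John,Doe,Manager", "Steve,Powers,Bionics", "Donna,Clampette,VP")

$sorted = $emplist |
    ConvertFrom-Csv -Header "FName","LName","Title" |
    Sort-Object -Property LName |
    ForEach-Object { ($_.FName, $_.LName, $_.Title) -join ',' }

$sorted
# "Donna,Clampette,VP"
# "John,Doe,Manager"
# "Steve,Powers,Bionics"
```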

function Mapcar {
  param (
    [parameter(Mandatory=$True, Position=0)] [array] $MDArray,
    [parameter(Mandatory=$True, Position=1)] [int] $Position
  )
  $result = @()
  foreach ($subset in $MDArray) {
    $result += @($subset[$Position])
  }
  return $result
}

Try that on the following…

$test = @((1,2,3), (4,5,6), ("A", "B", "C"))
Mapcar $test 1

returns: (2,5,"B")

cheers
