FoxDeploy.com

Saw this super interesting read online over the weekend:

Command line tools can be 235x faster than Hadoop

In this post, the author posits that he can crunch numbers from the Linux command line MUCH faster than Hadoop can!

If he can do that, surely we can beat the Hadoop cluster too. Then I started wondering how I would replicate this in PowerShell, and thus this challenge was born…

Challenge

  • Download the repo here (2 GB!), unzip it, and keep the first 10 folders
  • This equates to ~3.5 GB, which is roughly the same data size as in the original post
  • Be sure to parse only the first 10 folders 🙂

    You can delete RebelSite, Twic and WorldChampionships

  • Iterate through all of the chess record files (*.pgn) in those folders and parse out each record. We need to return a total count of black wins, white wins, and draws. To read a PGN:

We are only…
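The counting step described in the challenge can be sketched as follows. This is a minimal Python sketch, not the PowerShell solution the challenge asks for. It assumes each game records its outcome in a standard PGN `[Result "..."]` tag pair (`1-0` for a white win, `0-1` for a black win, `1/2-1/2` for a draw, `*` for an unknown result). The helper names `count_results` and `count_pgn_tree`, and reading "first 10 folders" as the first 10 subdirectories in sorted name order, are illustrative assumptions.

```python
import os

# Assumed mapping from PGN Result tag values to outcomes. The PGN standard also
# allows "*" (unknown or unfinished game), which is deliberately not counted.
RESULTS = {"1-0": "white", "0-1": "black", "1/2-1/2": "draw"}


def count_results(lines):
    """Tally white wins, black wins and draws from an iterable of PGN lines."""
    counts = {"white": 0, "black": 0, "draw": 0}
    for line in lines:
        line = line.strip()
        # Each game header should contain one tag pair such as: [Result "1-0"]
        if line.startswith("[Result "):
            # Drop the tag name, then strip the surrounding quotes and bracket.
            value = line[len("[Result "):].strip('"]')
            outcome = RESULTS.get(value)
            if outcome is not None:
                counts[outcome] += 1
    return counts


def count_pgn_tree(root, max_folders=10):
    """Sum the results of every *.pgn file under the first max_folders subfolders of root.

    Assumption: "first 10 folders" means the first 10 subdirectories in sorted name order.
    """
    folders = sorted(
        d for d in os.listdir(root) if os.path.isdir(os.path.join(root, d))
    )[:max_folders]
    totals = {"white": 0, "black": 0, "draw": 0}
    for folder in folders:
        for dirpath, _dirnames, filenames in os.walk(os.path.join(root, folder)):
            for name in filenames:
                if not name.lower().endswith(".pgn"):
                    continue
                # latin-1 maps every byte to a character, so unusual encodings
                # cannot raise UnicodeDecodeError mid-scan. The file handle is
                # iterated line by line rather than loaded whole.
                path = os.path.join(dirpath, name)
                with open(path, encoding="latin-1") as fh:
                    for outcome, n in count_results(fh).items():
                        totals[outcome] += n
    return totals
```

For example, `count_pgn_tree("ChessData")` (a hypothetical path to the unzipped repo) returns a dict with `white`, `black` and `draw` keys. Streaming each file line by line keeps memory use flat no matter how large the data set grows.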

