Here are a couple of scenarios I’ve seen or heard play out about a dozen times over the past few months:

“I installed Windows 10 on a computer at work and now it keeps restarting all the time.”

“I installed Windows 10 on a computer at work and [insert application name] keeps hanging or crashes.”

Then begins the usual 20 questions game:

  1. Clean install or in-place upgrade?
  2. Is this happening on every instance of the same model?
  3. Every user, or only certain users?
  4. Are the computers domain-joined or not?
  5. …and so on…

Square One

First off, I don’t ever recommend starting your Windows 10 testing using an in-place upgrade of a domain-joined business computer.  It’s not because I don’t believe in the in-place upgrade; I do.  It’s because you’d be starting off with way too many unknown variables in the equation.

Start simple and clean.  That means picking a machine and performing a clean installation.  Even if you back up data from it beforehand, do not restore it.  At least, not yet.  The goal of the initial testing phase should be to determine how well Windows 10 performs on a pristine configuration.  Do not join it to any Active Directory domain.  Do not install ANY third-party applications.  The bare minimum installation should only include device drivers, and only if proven necessary AFTER trying the device without them.

Why did I emphasize “AFTER” in the preceding sentence?  Because twice in the past month I’ve encountered users who performed a clean install, immediately loaded several vendor drivers, and then ran into problems.  When I asked why they installed them right away, rather than confirming the need first, they said “because we always need them with Windows 8.1”.  As it turned out, some of those vendor drivers were the cause of their problems.

Square Two

Nobody talks about “square two” much, do they? In order to have a “square one”, there would logically have to be a “square two”, otherwise it would just be called “square”.  Right?  Anyhow.

Following on from the previous section, the next logical phase of testing would be to gradually add layers of functionality until you arrive at a configuration state which matches your business needs.  The following is only a short list of example things to layer on.  These should be applied and tested completely before adding another layer.

Bare Metal

  • Joining to an AD domain
  • Move into an OU with GPOs applied**
  • Install management client (ConfigMgr, etc.)
  • Install antivirus product
  • Install standard/common business applications
    • Without custom configuration
    • With custom configuration

In-Place Upgrade

  • At least three of a selected model, for comparison

** Refer to my previous article on Group Policy.
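
To make the "one layer at a time" idea concrete, here is a minimal sketch in Python.  It is only an illustration: the `first_failing_layer` function, the layer names, and the simulated stability check are all hypothetical stand-ins for whatever manual or scripted steps you actually use.

```python
from typing import Callable, Optional, Sequence


def first_failing_layer(
    layers: Sequence[str],
    apply_layer: Callable[[str], None],
    is_stable: Callable[[], bool],
) -> Optional[str]:
    """Apply layers one at a time and return the first layer after which
    the machine is no longer stable, or None if every layer passed."""
    for layer in layers:
        apply_layer(layer)
        if not is_stable():
            # Stop here: adding more layers would only muddy the water.
            return layer
    return None


# Simulated run: pretend the antivirus layer is the one that breaks things.
applied = []
layers = [
    "Join AD domain",
    "Move into OU with GPOs",
    "Install management client",
    "Install antivirus product",
    "Install business applications",
]
culprit = first_failing_layer(
    layers,
    apply_layer=applied.append,
    is_stable=lambda: "Install antivirus product" not in applied,
)
print(culprit)
```

The point of stopping at the first failure is that you know exactly which change preceded it, which is the whole reason for not piling everything on at once.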

Patterns

As with anything that exists in the world of computing, patterns are the core of all that is.  Patterns of code.  Patterns of communication.  Patterns of usage.  Patterns of success and Patterns of failure.

Customer says “It’s crashing on all our machines!”   After a few minutes of narrowing down what “all” means, it turns out it’s just one particular model.  That points toward possible driver issues or bad configuration settings.

Customer says “It’s crashing on random devices.”  After discussion, it turns out that the pattern is a particular application they installed, which was not installed on the ones working just fine.

Look for patterns.
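
If you keep even a rough inventory of which machines crash and which don't, the pattern hunt can be sketched in a few lines of Python.  This is a simplified illustration, not a real tool: the `suspect_traits` function, the trait labels, and the sample data are made up for the example, and it assumes you have already collected the model and installed applications for each machine.

```python
from typing import Dict, List, Set


def suspect_traits(machines: List[Dict]) -> Set[str]:
    """Return traits found on every crashing machine and on no healthy one.

    Each machine is a dict with a "traits" set (model, apps, and so on)
    and a "crashes" boolean.
    """
    crashing = [set(m["traits"]) for m in machines if m["crashes"]]
    healthy = [set(m["traits"]) for m in machines if not m["crashes"]]
    if not crashing:
        return set()
    common_to_crashing = set.intersection(*crashing)
    seen_on_healthy = set().union(*healthy)
    return common_to_crashing - seen_on_healthy


# Hypothetical inventory: "random devices" that all share one application.
machines = [
    {"traits": {"model:A", "app:Scanner"}, "crashes": True},
    {"traits": {"model:B", "app:Scanner"}, "crashes": True},
    {"traits": {"model:A"}, "crashes": False},
    {"traits": {"model:B", "app:Mail"}, "crashes": False},
]
print(sorted(suspect_traits(machines)))
```

An empty result doesn't mean there is no pattern, only that no single recorded trait explains it; that usually means you need to record more traits (driver versions, GPOs, user accounts).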

Bread Crumbs

Windows, like most operating systems, maintains a lot of telemetry data for various purposes.  Among the sources are the Windows event logs and log files.  But don’t overlook the potential of third-party application logs either.  Quite often, when a particular application crashes or locks up, and nothing can be correlated in the Windows event logs, evidence can be found in the application’s own log files.

In addition, check with product vendors (both hardware and software) for possible “verbose” settings in their software and firmware.  These can expose more detailed log output, which can help pinpoint issues more quickly.

The first question I ask most technicians, or anyone else, when they report a problem with their computer is, “what do the logs show?”  Most often, the answer is “I haven’t checked yet.”  This is pretty much the same as calling 911 to report someone hurt, being asked “are they still alive?”, and answering, “I haven’t checked yet.”

Detecting Mines

One common approach that has worked well since Windows 95 is the Safe Boot option.  This is particularly relevant to startup and login issues, but can have a more pervasive benefit as well.

Use the MSCONFIG command to configure the machine for its next boot.  This can make it possible to suspend the loading of certain default applications and services, and allow for discrete loading of drivers during startup.  The trick is often working fast enough after a brief successful login to use the command before the annoying issue forces an automatic crash or restart.

This helped me diagnose a problem with a particular HP laptop back on the release date for Windows 10 (build 10240).  After suffering through repeated crash/restart cycles every 30 seconds (literally), I managed to prop the door open by using MSCONFIG and safe boot.  After several iterations of loading individual drivers, I discovered the root cause was a bad video driver.  After reporting that to HP and Microsoft, HP released an updated driver from the source vendor to correct the issue.

With application crashing and lock-up issues, the trick is to home in on the time window in the Event Logs.  This is often more of an art than a science, as it’s not uncommon for a seemingly unrelated issue to have a major impact on stability.  Filter the log down to an hour before and an hour after the event.  Look for ANYTHING that might be related, and start with the most obvious entries, working back to the seemingly less-obvious.
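
The windowing step is easy to script once the events are exported.  Here is a rough Python sketch, assuming the events have already been loaded into a list of dicts with a `time` and a `message`; the `events_near` function and the sample records are invented for illustration.  Sorting by closeness to the incident is my own choice here, just one way of deciding what to read first.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def events_near(
    events: List[Dict],
    incident: datetime,
    window: timedelta = timedelta(hours=1),
) -> List[Dict]:
    """Keep events within +/- window of the incident, closest first."""
    start, end = incident - window, incident + window
    nearby = [e for e in events if start <= e["time"] <= end]
    return sorted(nearby, key=lambda e: abs(e["time"] - incident))


# Hypothetical exported events around an application error at 11:30.
incident = datetime(2015, 8, 1, 11, 30)
events = [
    {"time": datetime(2015, 8, 1, 10, 0), "message": "Service failed to start"},
    {"time": datetime(2015, 8, 1, 11, 0), "message": "Update check completed"},
    {"time": datetime(2015, 8, 1, 11, 29), "message": "Application error"},
    {"time": datetime(2015, 8, 1, 13, 0), "message": "User logged off"},
]
for event in events_near(events, incident):
    print(event["time"], event["message"])
```

If the default window turns up nothing useful, call it again with a wider one, for example `events_near(events, incident, timedelta(hours=2))`.  In this sample data that is what it takes to surface the service failure from 90 minutes earlier.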

As an example, in one case, an application kept displaying an error upon launch.  Click OK and it would close.  Couldn’t get into the application at all.  Scanning the event logs around the time of the issue, I found nothing that stood out.  Checking the vendor’s product log files, there were some clues, but nothing obvious enough to say “a-ha!”.  Something about ‘required resources were not available’ or similar to that.

Looking farther back in the Application event log, the application had reported failing to start a custom service about an hour and a half earlier.  The customer had an antivirus program that was blocking the .exe used by the service configuration, upon which the application depended for launching.  Once the .exe was excluded (along with its folder location), the service started, and the application returned to working condition.

Final Note – Bare Metal vs. Upgrade

There is a third option: the ICD, or Windows Imaging and Configuration Designer, which is part of the Windows Assessment and Deployment Kit (ADK) for Windows 10.  The ICD is intended for building a logical transformation package to reconfigure an existing Windows 10 computer to meet a specified set of requirements.

What for?  This is ideally suited for devices purchased with Windows 10 already installed.  The ICD Provisioning Package can be used to remove, add, or modify applications, services, and configuration settings, and join the device to a particular workgroup or domain as well.

Patience, and coffee.  These are your friends.

The End
