Expert Tips for SCCM Log Analysis

1. Locate cmtrace.exe (or another suitable “active” log viewer)
2. Open cmtrace.exe and click “yes” to register it as the default log viewer
3. Consume precisely 5 quarts of a strong, caffeinated liquid substance
4. Browse to location (folder) with log files and double-click desired log file
5. Rub eyelids approximately 12 times, making sure to yawn fully and loudly
6. Stare at log details and look for any lines colored in red.
7. Ignore red lines that don’t actually report an error, but merely mention that they’re looking for one
8. Ignore yellow lines that don’t actually report a warning, but merely mention that they’re looking for warnings
9. Rub eyelids 12 more times.
10. Announce to whoever interrupts you that you’re busy reviewing log files (the louder the better)
11. Open another log file (selected at random)
12. Stare intently at one line, without scrolling
13. Rub chin, squint, and nod slightly. You may also say “hmmmm”
14. Scroll and repeat step 12
15. Repeat steps 10 through 13, approximately 5 more times.
16. Open browser and begin searching for fragments of error messages along with “sccm error log…”
17. Inhale deeply, exhale loudly.
18. Consume more caffeinated liquids
19. Rub eyes some more
20. Lunch break.
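Steps 7 and 8 can also be handled in code. Below is a minimal Python sketch, assuming the log uses the CMTrace-style entry format in which each entry carries a `type` attribute (assumed here to mean 1 = informational, 2 = warning, 3 = error). Filtering on that field rather than on the words in the message skips entries that merely talk about looking for errors, and pulling out hex result codes gives you ready-made search terms for step 16. `parse_entries`, `problems`, `error_codes` and the sample text are hypothetical, for illustration only; real logs may contain entries this regex does not match.

```python
import re
from dataclasses import dataclass
from typing import Iterable, List

# Assumed CMTrace-style entry layout; the message may span several lines:
# <![LOG[message]LOG]!><time="..." date="..." component="..." context="" type="N" thread="..." file="...">
ENTRY_RE = re.compile(
    r'<!\[LOG\[(?P<message>.*?)\]LOG\]!>'
    r'<time="(?P<time>[^"]*)" date="(?P<date>[^"]*)" '
    r'component="(?P<component>[^"]*)"[^>]*?type="(?P<type>\d)"',
    re.DOTALL,
)

# Assumed meaning of the type attribute.
SEVERITY = {"1": "info", "2": "warning", "3": "error"}

# 32-bit hex result codes such as 0x80004005, handy as search terms.
HEX_CODE_RE = re.compile(r"0x[0-9A-Fa-f]{8}")


@dataclass
class LogEntry:
    date: str
    time: str
    component: str
    severity: str
    message: str


def parse_entries(text: str) -> List[LogEntry]:
    """Return every entry in `text` that matches the assumed format."""
    return [
        LogEntry(m["date"], m["time"], m["component"],
                 SEVERITY.get(m["type"], "unknown"), m["message"])
        for m in ENTRY_RE.finditer(text)
    ]


def problems(text: str, levels: Iterable[str] = ("warning", "error")) -> List[LogEntry]:
    """Keep entries whose severity field (not their wording) is in `levels`."""
    wanted = set(levels)
    return [e for e in parse_entries(text) if e.severity in wanted]


def error_codes(entries: Iterable[LogEntry]) -> List[str]:
    """Collect the distinct hex codes found in entry messages, sorted."""
    return sorted({code for e in entries for code in HEX_CODE_RE.findall(e.message)})


# Hypothetical two-entry log: the first mentions "errors" but is informational.
sample = (
    '<![LOG[Looking for errors in policy]LOG]!><time="10:00:01.000+300" '
    'date="04-24-2017" component="PolicyAgent" context="" type="1" thread="100" file="">\n'
    '<![LOG[Failed to download policy (0x80004005)]LOG]!><time="10:00:02.000+300" '
    'date="04-24-2017" component="PolicyAgent" context="" type="3" thread="100" file="">\n'
)

if __name__ == "__main__":
    for entry in problems(sample):
        print(entry.severity, entry.component, entry.message)
    print(error_codes(problems(sample)))
```

Run against the sample, only the `type="3"` entry is printed (`error PolicyAgent Failed to download policy (0x80004005)`), followed by `['0x80004005']`. The "Looking for errors" line is dropped even though it contains the word. Point `problems` at the contents of a real log file (read as text) to try it on your own data.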

(Seriously) 5 Most Common SCCM Issues

Joking aside (for a few minutes anyway)…

Here are the five (5) most common root causes of SCCM site issues that I’ve seen over the past year while working as a consultant. For context, the sites involved looked like this:

  • Site scale: 500 (smallest) to 180,000 (largest)
  • Site types: CAS (5%), Primary alone (85%), Primary with Secondaries (5%), None (5%), aka “new install”
  • Avg staffing: 12-24 (IT dept), 1 (SCCM admin)
  • Avg coffee consumption: 1 cup per 30 minutes
  • Avg sleep: 5.2 hours

1 – Lack of planning before installing the environment

In the past year alone, I’ve run across almost a dozen sites that had a CAS and didn’t need one, or had Secondary sites and didn’t need them, and so on. Some didn’t have an FSP (fallback status point) and could’ve used one. Some weren’t using the appropriate credentials for client installation, network access, and so on. And lately, many seem to have pinned their plans on outdated platforms, such as Windows Server 2008 R2 or SQL Server 2012. At least keep them patched (e.g. SQL Server 2012 SP3 CU9).

2 – Lack of monitoring and following-up on warnings/errors

Of the last 24 customer engagements I’ve been involved with, roughly 60% do not keep a daily watch over site issues (sites, components, clients, content distribution, deployments, etc.).  Of those that do monitor, about half ignore lingering warnings which impact site performance.

3 – Lack of cohesive management

This varies by the scale/size of the organization (at least in my world). Often it’s a matter of job roles and organizational divisions. For example, DBAs controlling the SQL Server environment without allowing SCCM admins any direct access (very bad). Or AD admins who drag their feet (or push back) on requests for schema extensions, keeping AD accounts “clean”, and so on. Or network admins who fight back against using PXE, no matter what the rationale. In many cases, it rolls up to team managers who don’t work well together, so resolving conflicts and barriers is difficult, especially when the CTO or CIO prefers to avoid dealing with it. My advice: deal with it! The good of the company outweighs your stupid personal disagreements.

4 – Lack of keeping up on updates

Whether it’s Windows Server, SQL Server, the ADK, MDT or Configuration Manager itself, all of these require persistent support and oversight. Keep them patched. But more importantly, READ THE PATCH details first. Understand what’s being “fixed” or “modified” (or deprecated), as well as the “known issues”. You can save yourself a shit-ton (that’s a scientific measurement, by the way) of headaches and support costs by not installing blindly without understanding. However, do not avoid patching simply out of fear and doubt. You work in IT, which means “change” is inevitable and continuous. It’s why the “soft” in “software” exists (trust me, Babbage wasn’t kidding around).

5 – Inefficient use of features

This one alone could be broken out into sub-categories actually, and now that I mentioned it, I will…

a – Ignoring features which are not fully understood (not doing research)

b – Continuing to use outdated methods (disk imaging, for one, like Acronis or Ghost)

c – Ignoring other System Center capabilities (SCOM, Orchestrator, etc.)

d – Not following “best practices” (excessive permissions on common accounts, incorrect client installation settings)

e – Paying for 3rd-party products which SCCM (or other System Center) capabilities could provide (depends upon the individual requirements of course)

f – Ignoring 3rd-party products out of fear of the unknown (FUD)

g – Ignoring new features added with each build (current branch), such as Azure, OMS, UA, and mobile device features

h – [my peeve] Inefficient mapping of tools to processes. For example, ignoring Group Policy in favor of doing everything in SCCM or via scripts, or continuing to use familiar solutions even when newer and better (cheaper, faster, more efficient, more reliable) solutions are available.

i – Insufficient use of Internet search tools (Google, Bing, etc.)

Did I miss anything?

5 Tips for Fixing Broken SCCM DMZ services

The following five (5) tips should help even the most seasoned SCCM expert determine the root cause for problematic DMZ environments.

Reasons you’re having trouble with your SCCM DMZ

1 – You don’t actually have a DMZ

2 – The DMZ doesn’t contain an SCCM site system, nor an AD forest trust, nor any network connections back into the internal network. You might also not have any SCCM clients that operate in the DMZ.

3 – You have no idea what “SCCM” or “DMZ” are.  And you don’t really care.

4 – You work in the Finance department.

5 – Why are you reading this?

Sorry – I needed a break from mind-numbing emails and phone calls today.

CMWT 2017.04.24.01 Released

I’m trying something different this time, so I will let you tell me if it’s better or worse than what I was doing.

What I was doing: Uploading raw files to the GitHub repo, and uploading a .ZIP to a separate repo under the same account.

What I’m trying now: Uploading raw files to the GitHub repo and letting everyone download the entire stack using the GitHub “Clone or download” feature. The Download option makes a .ZIP of the entire mess, so it seems like a better option (so far).

What’s new in 2017.04.24.01

  • Bug fixes to AD users, groups and SCCM device details
  • AD user page now allows adding to AD groups
  • AD groups are filtered using the _protectedgroups.txt file (you can edit this to your liking)

More info here

SCCM Collection Queries by Server Role

Rather than spew forth a bunch of sample queries, I’ll just hand you a virtual fishing rod, a case of imaginary beer, and point you to the make-believe boat.  This little procedure came in handy today with a customer I was helping.  I hope it helps you as well…

  • Device Collections
    • Create Device Collection
      • Name: Servers – WDS Servers (example)
        • Limiting collection: (whatever has servers with clients)
        • Use incremental updates for this collection (check)
        • Add Rule > Query-Rule
          • Name: 1 (or whatever you want, I’m lazy)
          • Edit Query Statement:
            • Omit duplicate rows (check)
            • Criteria tab
              • “Select” button (click it)
              • Class = Server Feature
              • Attribute = Name (click OK)
                • Is Equal To (leave as-is)
              • Click the “Value” button
                • Select an appropriate Feature Name
                • Enjoy a cold one!
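Behind the wizard, the query rule is stored as WQL, so once you have one working rule you can reproduce it for other roles. The Python sketch below assembles that text for a single feature name. It rests on assumptions: the class and column names (such as `SMS_G_System_SERVER_FEATURE` and the `SMS_R_System` column list) are modeled on typical console-generated collection queries, “Omit duplicate rows” is assumed to correspond to `select distinct`, and Server Feature data is assumed to be collected by hardware inventory. Compare the output against the “Show Query Language” view in the Query Statement Properties dialog before using it. The helper `server_feature_query` and the `ROLE_COLLECTIONS` mapping are hypothetical.

```python
# Resource columns typically selected by collection query rules (assumed; compare
# with the "Show Query Language" view of your own rule).
COLLECTION_COLUMNS = (
    "SMS_R_System.ResourceID",
    "SMS_R_System.ResourceType",
    "SMS_R_System.Name",
    "SMS_R_System.SMSUniqueIdentifier",
    "SMS_R_System.ResourceDomainORWorkgroup",
    "SMS_R_System.Client",
)


def server_feature_query(feature_name: str, omit_duplicates: bool = True) -> str:
    """Build WQL selecting devices whose inventoried server features include feature_name."""
    if '"' in feature_name or "\\" in feature_name:
        # Keep the sketch simple: refuse values that would need WQL escaping.
        raise ValueError("feature name must not contain quotes or backslashes")
    # Assumed: "Omit duplicate rows" maps to "select distinct".
    select = "select distinct " if omit_duplicates else "select "
    return (
        select + ",".join(COLLECTION_COLUMNS)
        + " from SMS_R_System"
        + " inner join SMS_G_System_SERVER_FEATURE"
        + " on SMS_G_System_SERVER_FEATURE.ResourceID = SMS_R_System.ResourceId"
        + f' where SMS_G_System_SERVER_FEATURE.Name = "{feature_name}"'
    )


# Hypothetical collection-name -> feature-name pairs; take real names from the "Value" list.
ROLE_COLLECTIONS = {
    "Servers – WDS Servers": "Windows Deployment Services",
    "Servers – DHCP Servers": "DHCP Server",
}

if __name__ == "__main__":
    for collection, feature in ROLE_COLLECTIONS.items():
        print(f"{collection}:\n  {server_feature_query(feature)}\n")
```

The printed statements can be pasted into the query editor’s language view, one collection per role. The feature names in the mapping are only examples; the exact strings depend on what inventory actually reports for your servers.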

SCCM Upgrade Scenarios

From SCCM 2012 SP1 to 1702

  • Resolve SCCM site monitor issues
  • Resolve hardware and/or software deficiencies (storage, horsepower, etc.)
  • If not virtualized, virtualize now. Anyone who insists physical is the only way to go must be eliminated (quietly, of course)
  • Upgrade Windows Server**
  • Upgrade SQL Server**
  • Upgrade to 1606
  • In-console update to 1702

From SCCM 2012 R2 to 1702

  • Same as above

From SCCM 1511

  • Why are you still on 1511?  Did you think “IT” implied sitting around and never keeping up with current technology?  What were you thinking? You’re so close, don’t stop now!

From SCCM 2007

  • Install a parallel site hierarchy on 1606
  • In-console update to 1702
  • Migrate DPs, packages, etc. from 2007 to 1702 environment
  • Attach suitable explosives to 2007
  • Maintain safe distance
  • Detonate

From LANDesk, Tivoli, Kaseya, etc.

  • Locate person who selected that platform
  • Cover this person thoroughly in fresh steak juice, garnish with fresh chunks of steak
  • Drop person into tank filled with starving crocodiles
  • Capture and upload video to Liveleak.com (don’t waste it)
  • Oh yeah, install an SCCM 1606 site
  • In-console update to 1702

** Follow Microsoft guidelines for ‘supported operating systems’ and ‘supported SQL versions’ under ‘supported configurations’. Pay careful attention to what “supported version” means as you transition from 2012 or 2012 R2 up to current branch, as well.