What the %#@$ is soooooooo damned difficult about managing AD?

Seriously.

I’m not talking about the technical side.  I’m talking about the organizational, role-based, human-related, glue-sniffing, weed-smoking, pill-popping mindset that goes into “who is doing what” when it comes to AD in a given organization.  The more I see, the more I’m reminded of the nighttime foxhole scene in Apocalypse Now.

Here’s the problem I see ALL THE TIME:  There’s no one person in charge, from a technical standpoint.  There’s often a swarm of people involved with bits and pieces of AD, but they often answer to different teams and managers.  The user accounts person is in a different world from the GPO person, from the ADFS/Trust person, from the schema control person, from the computer accounts person, from the DNS and DHCP persons, and so on.

Consider this semi-fictitious conversation…

Manager: “Users in the Sales group are complaining that their login times are way too long”

Engineer: “That’s Jimmy’s problem.  He needs to look at the image.”

Manager: “I’m not an expert, but that same image is used in R&D, Engineering and Finance, and they don’t have any problems like that.”

Engineer: “Then it’s Tiffany’s problem.  She needs to look at their network.”

Manager: “I’m not an expert, but they’re at the same location, on the same subnet as Engineering.”

Engineer: “Then it’s Bob’s problem.  He needs to look at their user accounts.”

Manager: “When Ken Footinass logs on from an Engineering desktop, it doesn’t take as long.”

Engineer: “Then it’s Juan’s problem.  He needs to look at Group Policy for those machines.”

Manager: “If you deflect me one more time, I will choke you with this pencil.”

See where this is going?

AD is typically a co-managed service in most environments that have more than a dozen IT staff.  Small shops are in a different boat: putting out fires with insufficient staff, so status quo is infinite.  Just change all the “Manager” and “Engineer” names in the above discussion to one person, and it’s the same.

In larger shops, AD is typically divided up as follows:

  • User accounts, group memberships, passwords = help desk
  • Group policy = InfoSec
  • Computer accounts = oooh.  whoever builds computers, after that nobody cares
  • Sites and Subnets = uhhhh, “ain’t you?” or “I think that’s…”
  • DNS and DHCP maintenance (not the setup) = “uhhhh…Network team?”
  • Forest/Domain trusts = InfoSec?  No wait, maybe it’s networking.  No, hold on…
  • Application interfaces (apps that depend on AD aspects like sites, DNS, containers and schema attributes, etc.) = “That’s not our problem, that’s the app owners’.” (wrong, their apps will impact AD as well)

I didn’t even touch on the tentacles Group Policy creates.  Firewall settings?  Uhh.. Network team.  Folder redirection settings?  Uhhhhh…. Is it lunch time yet?

It doesn’t matter what industry is involved.  Government. Private. Defense.  However, I would say that the Defense world is a bit tighter in most respects on who does what.  But execution varies by each contractor as well.  The STIG approach will only get you so far.

Some of the blood-pressure-increasing statements I hear sound like:

  • “DNS issues shouldn’t cause problems with clients.” (wrong)
  • “DNS issues won’t cause problems with Configuration Manager.” (wronger!)
  • “DHCP has nothing to do with DNS.” (sighs. takes out shotgun and starts cleaning it)
  • “We don’t have anyone in charge of GPOs or documentation.” (checks the mechanical parts to make sure the shotgun will work. squints to look through the chambers)
  • “We share the management of AD and GPOs as best as we can.” (starts loading shells into the chambers, while humming a Christmas song)
  • “We don’t know how accurate our AD accounts really are, but we don’t really care that much about it either.” (adds more shells, lifts shotgun into firing position. Sets beer can on table)
  • “We use OUs for grouping things but not for GPOs.  We do everything in the Default Domain Policy.” (dials 911 and asks for SWAT and a few ambulances to be sent. Puts cell phone on silent mode)
  • “AD? Nobody cares about that.  It’s just for user accounts.” (adjusts NVGs and turns out the lights…)

Here are some basic tips I hope will make a dent (possibly in someone’s head):

  1. Assign someone to be the single point of authority for all AD issues.  Make it a hierarchical team.  Everything that touches or smells like Active Directory has to go through this team. No exceptions.
  2. Document the shit out of everything. (That’s DTSE in mil-speak)  Backups don’t explain “why”.
  3. OUs are primarily for linking GPOs. Not for organizing your garage.
    1. Avoid OU names with spaces and special characters.  Unless you literally hate anyone who writes scripts and applications.
    2. Keep OU names short.  Unless you hate anyone who writes scripts and applications or has to import/export data.
  4. Group Policy settings should NEVER be made in the Default Domain Policy unless they *cannot* be done in separate GPOs.
    1. Rule 1 should be “in order to obtain permission to modify the Default Domain Policy, the requester must be kicked in the face and crotch at least 6 times each.”  If they’re still conscious afterwards, then they may proceed.
  5. GPOs should be created sparingly.  Very…. very….. very… sparingly.
    1. Creating a new GPO should require a physical challenge, followed by a mental challenge.
    2. GPOs should adhere to a rigorous testing process, using a separate test environment.
    3. GPOs should ALWAYS be documented (who, what, when, where, and why). DTSE.
    4. Every GPO should be associated with an “owner” who can explain why it exists and what it does.  Otherwise, it should be unlinked.
  6. Sites and Subnets MATTER… A LOT.
    1. Take the time to verify they are configured correctly.
  7. Assign someone (or a team) to oversee IT asset inventory.  Computers / accounts, groups and group memberships, it all matters a lot.
    1. This person/team should have access to at least view data in every system that contains device inventory records of some kind.  If they have to wait on others, it will fail eventually.
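The OU naming tips (no spaces, no special characters, keep it short) are easy to check mechanically.  Here’s a minimal Python sketch, assuming you’ve already exported your OU names to a plain list (the export itself isn’t shown).  The function names, the 20-character limit, and the allowed-character set are assumptions for illustration, not official rules; tune them to your shop.

```python
import re

# Assumed limits for illustration only; pick values that suit your environment.
MAX_OU_NAME_LENGTH = 20
# Anything other than letters, digits, underscore, hyphen (and space, which is
# reported separately) counts as a "special character" in this sketch.
SPECIAL_CHARS = re.compile(r"[^A-Za-z0-9_ -]")


def check_ou_name(name):
    """Return a list of complaints about one OU name (empty list = it passes)."""
    problems = []
    if " " in name:
        problems.append("contains spaces")
    if SPECIAL_CHARS.search(name):
        problems.append("contains special characters")
    if len(name) > MAX_OU_NAME_LENGTH:
        problems.append(f"longer than {MAX_OU_NAME_LENGTH} characters")
    return problems


def lint_ou_names(names):
    """Map each offending OU name to its list of problems."""
    report = {}
    for name in names:
        problems = check_ou_name(name)
        if problems:
            report[name] = problems
    return report


if __name__ == "__main__":
    # Made-up sample names, not from a real directory.
    sample = ["Workstations", "Sales & Marketing Laptops (Old)", "Servers-Prod"]
    for ou, problems in lint_ou_names(sample).items():
        print(f"{ou}: {', '.join(problems)}")
```

Run something like this against an export before anyone builds scripts on top of the OU structure, not after.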

Remember, if you don’t do it, and it continues to get sloppy, you’re just adding more justification to the pile for management to replace you.  Either with another human, or (more likely) an automated process.  Either way, you go bye-bye.  Even worse, I’ve seen management just give up on waiting and push a change that makes things worse for everyone (except them, of course).  So keeping your house clean keeps the bugs out and makes it livable.

DNS

I used to think DNS was appreciated and taken seriously.  Not so much anymore.  More and more I’m seeing engineers who think it’s nothing but a name look-up service.  Nothing more.  They ignore event log errors and warnings about name resolution failures.  They ignore the relationship DHCP scopes have with DNS and name resolution, especially when it comes to mobile devices and WiFi networks.

“How can DNS cause network issues?” I hear this often.

Let’s say you don’t have a reliable DNS record scavenging process in place.  Maybe your DHCP leases are a bit longer than they should be (with respect to how devices roam about).  Now you have a bunch of machines that you can’t ping, because the address DNS hands back for the name no longer matches what the remote device is actually assigned.  Plan, Test, Deploy, Verify === wash/rinse/and repeat.
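To make that failure mode concrete, here’s a minimal Python sketch, assuming you’ve already exported your DNS A records and your current DHCP leases into simple lists.  The export step, the record layout, and the names `DnsRecord`, `Lease`, and `find_stale_records` are all hypothetical.  It flags any DNS record whose IP address is currently leased to a different hostname, which is the case where you ping a name and reach the wrong box (or nothing).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DnsRecord:
    hostname: str
    ip: str


@dataclass(frozen=True)
class Lease:
    hostname: str
    ip: str


def find_stale_records(dns_records, leases):
    """Return DNS records whose IP is currently leased to a different host."""
    # Who actually holds each address right now, according to DHCP.
    holder_by_ip = {lease.ip: lease.hostname.lower() for lease in leases}
    stale = []
    for record in dns_records:
        holder = holder_by_ip.get(record.ip)
        # No lease for this IP: DHCP data alone can't judge it, so skip it.
        if holder is not None and holder != record.hostname.lower():
            stale.append(record)
    return stale


if __name__ == "__main__":
    # Made-up sample data: laptop-01 roamed away and laptop-02 got its address.
    dns = [
        DnsRecord("laptop-01", "10.0.0.15"),
        DnsRecord("laptop-02", "10.0.0.15"),
        DnsRecord("printer-01", "10.0.0.40"),
    ]
    leases = [Lease("laptop-02", "10.0.0.15")]
    for record in find_stale_records(dns, leases):
        print(f"stale: {record.hostname} -> {record.ip}")
```

This only catches records that collide with an active lease.  Records pointing at addresses nobody holds still need scavenging to clean them up.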

Group Policy

One of the biggest frustrations I’ve run into is a shop that has dozens of GPOs and not a clue why they exist, or what they do.  No documentation.  No one left who can explain what they were intended for, or what they actually do. Then they start having issues and freak out over the thought of troubleshooting them.

Then, rather than trying to clean up, the next engineer starts adding and changing settings in the existing “mystery GPO” to co-opt it for other purposes.
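One way to keep a GPO from turning into a mystery is a tiny inventory record per GPO.  This is a minimal Python sketch of the who/what/when/where/why idea, not a real Group Policy API: the `GpoRecord` fields and the `needs_attention` helper are hypothetical names, and you’d fill the records from whatever export or spreadsheet you already keep.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class GpoRecord:
    name: str                # what: the GPO's display name
    owner: Optional[str]     # who: the person who can explain it
    created: Optional[date]  # when it was created, if anyone knows
    linked_ous: List[str]    # where: the OUs it is linked to
    purpose: str = ""        # why it exists


def needs_attention(records):
    """Return names of linked GPOs that have no owner or no stated purpose."""
    return [
        r.name
        for r in records
        if r.linked_ous and (not r.owner or not r.purpose.strip())
    ]


if __name__ == "__main__":
    # Made-up sample records for illustration.
    inventory = [
        GpoRecord("Workstation Screen Lock", "Juan", date(2016, 1, 4),
                  ["Workstations"], "Lock screens after 15 idle minutes"),
        GpoRecord("Mystery GPO", None, None, ["Sales"]),
        GpoRecord("Retired Printer Map", None, None, []),
    ]
    for name in needs_attention(inventory):
        print(f"No owner or purpose on a linked GPO: {name}")
```

Anything this flags is a candidate for finding an owner or getting unlinked.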

Don’t be afraid to start clean.  If you can’t do a new domain, start a new root-level OU and build out from there.  Create a parallel structure and populate it gradually as you scale the testing.  Eventually, you will have a clean, stable, and documented environment.  The old environment, when no longer used, can safely be loaded into a white van, bound in duct tape, interviewed by Chris Hansen, and dumped in a landfill somewhere at night.

Event Logs

Read them. Monitor them.  Take action on them.  They are your friend.

If someone walks up, emails, calls, or IMs with “my machine is having a problem”, I start with “when did it start to happen?”  They answer with a date and time.  I respond “what do the logs show around that time?”

Long silence.  Followed by “uhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhhh…”

I respond “(click) hangs up” (unless it’s my boss or his boss)

Don’t forget about log files, either.  Those sneaky little bastards hide in every crack of a machine, but they matter just as much.
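The “what do the logs show around that time?” question is easy to script.  Here’s a minimal Python sketch, assuming the events have already been exported into (timestamp, source, message) tuples.  The tuple layout, the `events_near` name, and the 15-minute default window are assumptions for illustration.

```python
from datetime import datetime, timedelta


def events_near(events, reported_at, window_minutes=15):
    """Return events within +/- window_minutes of reported_at, oldest first."""
    window = timedelta(minutes=window_minutes)
    # Each event is a (timestamp, source, message) tuple; e[0] is the timestamp.
    nearby = [e for e in events if abs(e[0] - reported_at) <= window]
    return sorted(nearby, key=lambda e: e[0])


if __name__ == "__main__":
    # Made-up sample events, not real log output.
    events = [
        (datetime(2017, 3, 1, 13, 0), "Service", "Unrelated afternoon event"),
        (datetime(2017, 3, 1, 9, 10), "GroupPolicy", "Policy processing was slow"),
        (datetime(2017, 3, 1, 9, 0), "DNS", "Name resolution timed out"),
    ]
    for stamp, source, message in events_near(events, datetime(2017, 3, 1, 9, 5)):
        print(stamp.isoformat(), source, message)
```

Start with a narrow window and widen it only if nothing turns up.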

That’s all I have right now.  Now please, go smack the next person who is letting your AD fall apart.
