VMware VMworld 2013 Day 1 Announcements

Unless you’ve had your head in the sand, you probably know it’s the 10th annual VMworld, and as usual it’s time to get your ear to the ground and check out the latest announcements.

Over on the VMworld Blog they’ve announced the latest technologies coming out of this year’s conference on day 1, but in a nutshell they are:

The Software-Defined Data Center Today
VMware NSX
VMware Virtual SAN
VMware vCloud Suite 5.5 and VMware vSphere with Operations Management 5.5
Delivering the Software-Defined Data Center

For more information about what’s new in 5.5, Jason Nash has a write-up here and Chris Wahl and Julian Wood both have a deeper description here and here.

There’s also a What’s New PDF up on the VMware site already: http://www.vmware.com/files/pdf/vsphere/VMware-vSphere-Platform-Whats-New.pdf

DR/BC Site, SSO & AD Authentication

I’m in the midst of testing DR/BC at the moment with SRM replicating machines down to our BC site. We’ve upgraded everything to 5.1.1a across the board and since moving to SSO we’ve had our fair share of issues. Some we’ve resolved, but one particularly important one involved not being able to authenticate with our offsite domain controller at the BC site when we pull the plug on the metro line, isolating that site from the rest of the network.

I was able to log in to the offsite VC via the Web Client using my domain credentials absolutely fine, but when I logged in as admin@system-domain and changed the Identity Source to the offsite DC via the SSO config section, I found that I couldn’t log in using domain credentials. The message I received was ‘Authentication Failed’.

I also noticed I had ‘Failed to initialize start-up services’ and a message advising me to install a vCenter Server system when I logged in. It was apparent that SSO wasn’t installed or authenticating correctly.

[Screenshot: ‘Failed to initialize start-up services’ error on the BC vCenter]

So I took a step back and had a think. Initially I’d been following Derek Seaman’s guide to installing SSO and the plethora of SSL certs, and when it came to choosing the vCenter Single Sign On Type I had been selecting ‘create the primary node for a new vCenter Single Sign On installation’. Derek’s advice is as follows:

Even if you don’t want multiple SSO instances now, you may want them in the future. You don’t need to configure additional ones from the outset, so there’s no harm in leaving the door open for future expansion. Thus I selected the second option, as shown below.

[Screenshot: vCenter Single Sign On type selection]

However, after searching for more information and recommendations I came across a post by Duncan Epping over on Yellow Bricks about his thoughts when it comes to SSO. The phrase ‘KISS’ has never rung truer:

Justin King already mentioned this in his blog series on SSO (parts 1, 2, 3 and 4) as a suggestion, but let’s drive it home! Although it might seem like it defeats the purpose I would recommend the following in almost every single scenario one can imagine: Basic SSO deployment, local to vCenter Server instance. Really, the KISS principle applies here. (Keep It Simple SSO!)

Now, I am most certainly not any kind of vExpert and I am in no way proclaiming that Derek’s information is incorrect; his guides have been invaluable to me as well as thousands of other vNerds and his blog is a constant source of awesomeness. But, as with most things, YMMV, and so it was time to give it another go. Armed with this newfound knowledge I set about reinstalling SSO one more time, and on this attempt I chose the following:

[Screenshot: basic vCenter Single Sign On deployment selected]

Afterwards I completed the Inventory Service installation, then vCenter and finally the web client. Then the moment of truth: I logged in as admin@system-domain and saw that the offsite VC was now listed in the available systems. Eureka! The next step was to get this VC authenticating with the offsite DC.

At this stage I figured I needed a coffee so I rebooted the DC and VC and went for a refill. When the servers had finished booting I logged back in as admin@system-domain and removed the existing Identity Source and added the new details for the offsite DC. This time I paid special attention to the requirements and used the attribute editor in ADUC to retrieve the correct DN for both the users and groups. I also changed the authentication type to require a username and password and it all went in fine.
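As an aside, if you’d rather not dig through the attribute editor in ADUC, dsquery from any domain-joined machine will hand you the same DNs. The account, group and domain names below are made up – substitute your own:

    C:\> dsquery user -samid jbloggs
    "CN=Joe Bloggs,OU=IT Staff,DC=bcsite,DC=example,DC=local"
    C:\> dsquery group -name "vCenter Admins"
    "CN=vCenter Admins,OU=Groups,DC=bcsite,DC=example,DC=local"

The OU portion of those DNs is the sort of thing that goes into the ‘Base DN for users’ and ‘Base DN for groups’ fields when adding the Identity Source.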

So there you have it. Keep it simple!

All that’s left to do now is to pull that plug and make sure I can login when we’ve isolated the BC site. Wish me luck!

SRM 5.1 – 404 errors and Pairing Issues

I’ve spent a good hour or two trying to work this one out, so to make sure it’s documented and I don’t forget, here’s the skinny.

We’re having some SSO issues at our BC site and I’ve spent the morning reinstalling all the vSphere and SRM components and checking everything along the way: FQDNs, IPs, the SSO username and password – everything. It’s still not working as intended and there’s a ‘failed to initialize services‘ banner message we’re troubleshooting at the moment.

In the meantime, one of the virtual hard disks that one of our many Exchange DBs sits on had run out of space and needed expanding by a few GB. In this situation we disable replication on the disk in question, increase it a little and then expand it in the OS. Except when I tried to connect to SRM after the aforementioned reinstall I was presented with a “Lost connection to SRM server, the remote server returned an error 404” message.

Not ideal.

So the troubleshooting began and I ended up checking out the log files on the SRM server where the SRM application is installed (we use VR and so have the appliance installed as well) – in case you need to know, they’re located in C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs and you might have to display hidden files and folders.
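If you just want a quick look without trawling through Explorer, a couple of lines at a command prompt on the SRM server will do – assuming the default log location above and that the log you’re after is the vmware-dr one:

    cd "C:\ProgramData\VMware\VMware vCenter Site Recovery Manager\Logs"
    dir /O-D
    findstr /i /c:"Registration with the local VC" vmware-dr.log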

If you see this message: Registration with the local VC server is not valid then you have two choices. You can take a look at this KB over at VMware and try to re-establish the credentials between vCenter Server and the SRM server, or you can take option two, as I did, and reinstall the SRM application. When you do uninstall, DO NOT check the box that asks you to remove the database, because you’ll essentially be wiping your SRM install clean – leaving the database in place ensures all of your SRM data remains intact. DO make sure you have your SQL credentials to hand, though.

When you’re done, reboot the SRM servers and the appliances, and reconfigure the pairing between the two sites again.

Happy days.


CloudPhysics, Log Analysis & Insights – The New Awesome

Now is a great time for log, performance and insight analysis in VMware.

VMware vCenter Log Insight has recently been updated to version 1.0.4, proving this application is going from strength to strength. If you’re looking for a way to capture the massive amount of data that emanates from your environment and turn it into something that makes sense then this is what you need – it really is a no-brainer. Check out the product information page for more of an overview and to download the trial. And you can’t go far wrong with checking out Sam McGeown’s guide to installing and configuring Log Insight over on DefinIT.
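As a quick taster of getting data into it, pointing an ESXi 5.x host’s syslog at a Log Insight instance only takes a couple of commands from the host’s shell – the collector hostname below is made up, so swap in your own, and you may also need to open the syslog firewall ruleset as shown:

    esxcli system syslog config set --loghost='udp://loginsight.example.local:514'
    esxcli system syslog reload
    esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true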

And then we have CloudPhysics. If, like me, you spend most of your day administering and tweaking your vSphere environment to get the best performance possible then this app is nothing short of godly. At a basic level CloudPhysics is a VM distributed via OVA that sits in your environment and quietly gathers data and sends it on to your dashboard, which is hosted at https://app.cloudphysics.com/login
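You can deploy the OVA through the vSphere client in the usual way, or via ovftool if you prefer the command line – a rough sketch only, with the file name, datastore, network and host below all placeholders:

    ovftool --datastore=datastore01 --network="VM Network" cloudphysics-observer.ova vi://root@esx01.example.local/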

After you’ve installed the Observer app and created an account on the CloudPhysics site, you’ll need to let it poll for a small amount of time – probably the same amount of time it takes to drink a cup of coffee. Then you’ll be presented with a number of ‘cards’ displaying your parsed vSphere data.

[Screenshot: CloudPhysics cards]

You can also create your own cards using the handy card builder wizard, opening up massive potential for all sorts of data display. Probably the best aspect of the default layout is the Knowledge Base Advisor.

I won’t ruin the surprise but needless to say you should check it out.

Unknown Virtual Machines After Host Reboot

One of our hosts became inaccessible over the weekend as I was migrating VMs between hosts. After a number of attempts to gracefully reboot the host I was left with no other choice than to reboot -f and wait patiently while it rebooted. Sadly HA didn’t have an opportunity to migrate machines off, so I had to log in to the host via the C# client and power all the VMs on again. Thankfully there were no machines on there that caused disruption and total downtime was about 10 minutes. Not ideal, though.
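For future reference, before going for the big red button it’s worth trying to restart the management agents from the shell or DCUI first (it’s the first thing KB 1003490 below walks through) – something along these lines:

    /etc/init.d/hostd restart
    /etc/init.d/vpxa restart
    # or restart the whole set of management agents in one go:
    services.sh restart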

Anyway, as a result of the reboot I noticed I had a VM that had been renamed to Unknown VM. I had an idea of what it could be, but after checking the events on the host I ended up being a little confused, as the VM I thought it was had migrated to the new datastore. Not a problem though, and if you find yourself in this situation the following should help out:

  1. SSH into your host as root
  2. Run the following: cat /etc/vmware/hostd/vmInventory.xml – this will output the list of VMs currently registered to your host. Compare this with the list of VMs actually running on your host.
  3. Right-click the Unknown VM entry and click Remove From Inventory.
  4. Browse to the appropriate datastore for the virtual machine and open the folder.
  5. Right-click the *.vmx file and click Add to Inventory.
  6. Power on the virtual machine.

Now, in my case the VM had actually completed the migration and appeared further down the list of running VMs so I followed steps 1 – 3.
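Incidentally, if the host is still disconnected from vCenter you can do the whole re-registration from the same SSH session with vim-cmd rather than the C# client – a rough equivalent of steps 2 to 6, with the VM IDs and paths below just placeholders:

    vim-cmd vmsvc/getallvms        # list registered VMs with their IDs
    vim-cmd vmsvc/unregister 42    # remove the unknown entry, using its ID from the list
    vim-cmd solo/registervm /vmfs/volumes/datastore01/myvm/myvm.vmx    # re-register the real .vmx
    vim-cmd vmsvc/getallvms        # grab the new ID the VM was given
    vim-cmd vmsvc/power.on 43      # power it on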

For reference here are the VMware KBs I used for troubleshooting:

Restarting the Management agents on an ESXi or ESX host (1003490)
Identifying Fibre Channel, iSCSI, and NFS storage issues on ESX/ESXi hosts (1003659)
Inaccessible virtual machines are named as Unknown VM (2172)
A virtual machine cannot be powered on and shows as unknown (1008752)