Monday, June 23, 2014

XenServer 6.2 Unable to find partition containing kernel

Sometimes XenServer can be a real pain in the neck.

Here's what I ran into a little earlier:

I upgrade my VM's kernel and reboot. Wait for a while, but the VM doesn't appear to come up. So I go to my Logs and lo and behold:



Ugh, wat?

OK, let's see if we can start it manually, worst case scenario is the command line providing us with more info, right?

[root@xen]# xe vm-start vm="My VM's Name"
The bootloader returned an error
vm: .... (My VM's Name)
msg: Unable to find partition containing kernel

Oh, how informative. Luckily XenServer provides us with an off-line VM boot editor. That should do the trick:

[root@xen]# xe-edit-bootloader -n "My VM's Name" -p 1
Plugging VBD: 
Creating dom0 VBD: ...
add map ...1 (252:5): 0 1048576 linear /dev/sm/backend/.../... 2048
add map ...2 (252:6): 0 33554432 linear /dev/sm/backend/.../... 1050624
add map ...3 (252:7): 0 1886386176 linear /dev/sm/backend/.../... 34605056
Waiting for /dev/mapper/...1: .....Device /dev/mapper/...1 not found.
You must specify the correct partition number with -p
Unplugging VBD: . done

This should have worked. It's usually as good as logging into the VM and editing the GRUB menu.lst yourself. What on earth happened here? Well, I'm pretty sure I know my systems and my partitions so I'm sure the /boot partition is the first one!

What now? Well, for sure we'll need to boot from a rescue disk. So, make sure your NFS ISO library is online or you have a storage already handy that has a rescue CD .iso. No? We don't? Well, no reason to panic. We'll just create a directory named "/rescue/ISOs", download a Debian Live CD there and create an SR with the name label "RESCUE" that points to that directory:

[root@xen]# mkdir -p /rescue/ISOs
[root@xen]# cd /rescue/ISOs
[root@xen]# wget http://cdimage.debian.org/debian-cd/current-live/amd64/iso-hybrid/debian-live-7.5.0-amd64-rescue.iso
[root@xen]# xe sr-create name-label=RESCUE type=iso device-config:legacy_mode=true device-config:location=/rescue/ISOs content-type=iso
303bd588-c675-2bc0-8982-8a691141226a 

Please note that you should create this SR on a partition with plenty of disk space. The Dom0's default partition is just too small, so I do not recommend keeping any rescue ISOs on it.

Cool. Now let's select that ISO as our DVD disk on our VM and finally boot off of it. The latter task can be done by going to VM -> Start/Shut Down -> Start in Recovery Mode.

Now that everything's started up, let's see what's going on. As I already mentioned, my /boot partition is my first partition so let's mount it and inspect the grub.conf file:

root@debian:~# cat /proc/partitions
major minor  #blocks  name

 202        0  960495616 xvda
 202        1     524288 xvda1
 202        2   16777216 xvda2
 202        3  943193088 xvda3
  11        0     696320 sr0
   7        0     548992 loop0 
root@debian:~# mkdir /a
root@debian:~# mount /dev/xvda1 /a
root@debian:~# vi /a/grub/grub.conf 
# grub.conf generated by anaconda
#
# Note that you do not have to rerun grub after making changes to this file
# NOTICE:  You have a /boot partition.  This means that
#          all kernel and initrd paths are relative to /boot/, eg.
#          root (hd0,0)
#          kernel /vmlinuz-version ro root=/dev/xvda3
#          initrd /initrd-[generic-]version.img
#boot=/dev/xvda
default=1
timeout=5
splashimage=(hd0,0)/grub/splash.xpm.gz
hiddenmenu
title Oracle Linux Server (3.8.13-35.1.2.el6uek.x86_64)
        root (hd0,0)
        kernel /vmlinuz-3.8.13-35.1.2.el6uek.x86_64 ro root=UUID=e9b3edd9-15e0-4cfd-a8fa-7dc24f6aeefa rd_NO_LUKS LANG=en_US.UTF-8 rd_NO_MD console=hvc0  KEYTABLE=us SYSFONT=latarcyrheb-sun16  rd_NO_LVM rd_NO_DM rhgb quiet
        initrd /initramfs-3.8.13-35.1.2.el6uek.x86_64.img

Now, we have two options really:

a) Solving this through XenServer. We would have to issue:

xe vm-param-set uuid=vm-uuid PV-bootloader-args="--kernel=/vmlinuz-3.8.13-35.1.2.el6uek.x86_64 --ramdisk=/initramfs-3.8.13-35.1.2.el6uek.x86_64.img"
xe vm-param-set uuid=vm-uuid PV-args="root=root-device  ro quiet"

Since I would have to change this every time I upgrade the kernel and I really want to find out what went wrong, I'll pass on this option for now.

b) Trying to debug what's wrong with grub.conf.

Since the kernel is recent enough to support XenServer, debugging it should just be a matter of patience.

Here's what's usually wrong with grub.conf/menu.lst:
i  ) The root (hdx,y) is wrong:
          x should point to the number of the hard drive where our boot partition is located;
          y should point to the partition number of our boot partition.
          Both are counted from 0, so (hd0,0) is the first partition of the first drive.
   In this case, root (hd0,0) is correct.
ii ) The paths of vmlinuz-... and/or initramfs-... are wrong.
         The paths should be relative to the root directory of the partition that holds them. For example, if the boot directory is on a dedicated partition, they should be /vmlinuz-... and /initramfs-..., but if the boot directory is on the same partition as the Linux root (/) directory, they should be /boot/vmlinuz-... and /boot/initramfs-...
iii) The root directive is wrong.
          Here I mean the root= directive that defines the Linux root (/) directory, and not the boot partition, which has already been declared with the root (hdx,y) statement. It could be root=/dev/xvda3 for instance. In my case it is root=UUID=e9b3edd9-15e0-4cfd-a8fa-7dc24f6aeefa.

The three cases above can be easily examined with a simple ls command on the /a directory and a blkid /dev/xvda3 or ls -l /dev/disk/by-uuid to find if the UUID of the device that hosts our root (/) directory is the correct one.
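The path check from case ii can also be scripted instead of eyeballing the ls output. Here is a minimal Python sketch, assuming a Python 3 interpreter is available on the rescue system and the boot partition is mounted as above. The function names are made up for illustration and are not part of GRUB or XenServer.

```python
import os
import re

def referenced_boot_files(grub_conf_text):
    """Return the kernel and initrd paths named in a GRUB legacy config."""
    paths = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("#"):
            continue  # skip comments such as the anaconda header
        match = re.match(r"(kernel|initrd)\s+(\S+)", stripped)
        if match:
            paths.append(match.group(2))
    return paths

def missing_boot_files(grub_conf_text, mount_point):
    """Return the referenced paths that do not exist below mount_point."""
    missing = []
    for path in referenced_boot_files(grub_conf_text):
        # Paths in grub.conf are relative to the root of the boot partition.
        candidate = os.path.join(mount_point, path.lstrip("/"))
        if not os.path.isfile(candidate):
            missing.append(path)
    return missing
```

Calling missing_boot_files(open("/a/grub/grub.conf").read(), "/a") should return an empty list when every kernel and initrd named in the file is really there.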

Another possibility is that the partition has been corrupted, in which case we just umount /a and run fsck /dev/xvda1.

In my case, as you can see, the error was in the default setting. Default signifies which item in the menu will boot after the user-interaction timeout occurs. The count starts from 0 and not from 1, so default=1 pointed at a second menu entry that does not exist in this file, and my system couldn't boot. Changing it to 0 did the trick. What happened is that the OS provider had decided to change the default kernel order, making a mess out of my server.
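This kind of mistake is easy to catch automatically. The following is a rough Python sketch that counts the title entries and checks whether the default index points at one of them. It only understands the simple GRUB legacy syntax shown in this post, the function name is hypothetical, and the bootloader XenServer actually uses may treat edge cases differently.

```python
import re

def check_default_entry(grub_conf_text):
    """Return (default_index, titles, is_valid) for a GRUB legacy config."""
    default_index = 0  # GRUB legacy uses entry 0 when no default is set
    titles = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue
        match = re.match(r"default[\s=]+(\d+)$", stripped)
        if match:
            default_index = int(match.group(1))
        elif re.match(r"title\s", stripped):
            titles.append(stripped.split(None, 1)[1])
    # Entries are counted from 0, so the index must be below the entry count.
    return default_index, titles, default_index < len(titles)

sample = """\
default=1
timeout=5
title Oracle Linux Server (3.8.13-35.1.2.el6uek.x86_64)
        root (hd0,0)
"""
index, titles, ok = check_default_entry(sample)
print(index, len(titles), ok)  # 1 1 False
```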

Save changes, shut down, reboot. Should be fine.

Friday, June 6, 2014

Subnetting made easy Part 2

Class A:
0.0.0.0     - 127.255.255.255 default subnet mask: 255.0.0.0 (or /8, 8 network bits and 24 host bits)
Class B:
128.0.0.0 - 191.255.255.255 default subnet mask: 255.255.0.0 (or /16, 16 network bits and 16 host bits)
Class C:
192.0.0.0 - 223.255.255.255 default subnet mask: 255.255.255.0 (or /24, 24 network bits and 8 host bits)
Class D: (Used for multicasting, not used for IP addressing)
224.0.0.0 - 239.255.255.255
Class E: (Unused)
240.0.0.0 - 255.255.255.255
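
Since the class depends only on the first part of the address, the table above boils down to a handful of comparisons. A small Python sketch (the function name is just for illustration):

```python
def address_class(address):
    """Return the classful class letter (A-E) of a dotted-quad IPv4 address."""
    first_octet = int(address.split(".")[0])
    if not 0 <= first_octet <= 255:
        raise ValueError("not a valid IPv4 first octet: %d" % first_octet)
    if first_octet <= 127:
        return "A"
    if first_octet <= 191:
        return "B"
    if first_octet <= 223:
        return "C"
    if first_octet <= 239:
        return "D"
    return "E"

for sample in ("10.0.0.0", "130.130.0.0", "213.213.213.0"):
    print(sample, address_class(sample))
```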

Subnetting based on the number of hosts:

Example A:

Say that our assigned network is 213.213.213.0 and we need to divide it into networks that have 30 hosts each.

Let's consult a table for our decimal to binary conversion to find what 30 is in binary. We work with 8 bits and we start from 0, so we work from 2^0 to 2^7:

2^7  2^6  2^5  2^4  2^3  2^2  2^1  2^0
128   64   32   16    8    4    2    1
  0    0    0    1    1    1    1    0

That means we need 5 bits to represent the number 30 (2^4 is the 5th bit, since we start from 0) and therefore we need 5 bits for our hosts. 2^5=32, and the number of actual usable hosts is 2^5-2=30.
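
If you would rather not fill in the conversion table by hand, the same lookup can be done in a couple of lines of Python. This is only a convenience sketch; to_bits is a made-up helper name.

```python
def to_bits(number, width=8):
    """Return number as a binary string, zero-padded to at least width digits."""
    return format(number, "0%db" % width)

print(to_bits(30))        # 00011110
print((30).bit_length())  # 5
```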

213.213.213.0 is a Class C network.

Class C subnet mask: 255.255.255.0 (/24) or in binary:   11111111 11111111 11111111 00000000

Our subnet mask will be the default mask plus as many extra network bits as we can turn on while still leaving enough host bits for the number of hosts we want.

Class C subnet mask: 11111111 11111111 11111111 00000000

We need 5 host bits, so the last 5 bits stay 0; the remaining 3 bits of the last part will be assigned as network bits by turning them to 1.

Our subnet mask:     11111111 11111111 11111111 11100000
In other words:           255      255      255      224   (/27, since we have 27 network bits enabled)

So now, the number of addresses in each network is 2 to the power of the remaining host bits, in our case 2^5=32. We added 3 network bits, so 2^3=8 networks.

The number of usable hosts in each network, though, excludes the network address and the broadcast address, so it is 2 to the power of our remaining host bits, minus 2.

So, in our case 2^5-2=32-2=30. We will have 30 hosts, exactly as many as we needed, in each of our 8 networks.
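
These numbers can be double-checked with Python's standard ipaddress module. A minimal sketch for this example, where host_bits is simply the value we worked out by hand:

```python
import ipaddress

network = ipaddress.ip_network("213.213.213.0/24")
host_bits = 5                      # enough for 30 usable hosts
new_prefix = 32 - host_bits        # 27 network bits

subnets = list(network.subnets(new_prefix=new_prefix))
first = subnets[0]

print(first.netmask)               # 255.255.255.224
print(len(subnets))                # 8
print(first.num_addresses)         # 32
print(first.num_addresses - 2)     # 30
```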

A faster way to find this number is with what Cisco refers to as the interesting part. The interesting part is
the part of the subnet mask which is not 255. We subtract this number from 256 and that will be our IP address increment.

In our case the interesting part is the fourth part of the subnet mask, so 256-224=32.

So our second network starts at 213.213.213.32. Just write the increments and fill the spaces in-between.

213.213.213.0
213.213.213.32
213.213.213.64
213.213.213.96
etc.
So our networks are:

213.213.213.0 - 213.213.213.31
213.213.213.32 - 213.213.213.63
213.213.213.64 - 213.213.213.95
etc.
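
The increment trick is mechanical enough to write down as a loop. Below is a small Python sketch that only handles the case where the interesting part is the fourth part of the mask, as in this example; the function name and its arguments are made up for illustration.

```python
def fourth_octet_ranges(first_three_octets, mask_octet):
    """List (first, last) address pairs for a mask whose fourth part is mask_octet."""
    block_size = 256 - mask_octet  # the IP address increment
    ranges = []
    for start in range(0, 256, block_size):
        end = start + block_size - 1
        ranges.append(("%s.%d" % (first_three_octets, start),
                       "%s.%d" % (first_three_octets, end)))
    return ranges

for low, high in fourth_octet_ranges("213.213.213", 224):
    print(low, "-", high)
```

The first and last address of every range are the network and broadcast addresses, which is why only 30 of the 32 addresses in each range are usable.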
.....

Example B:

We are given the network 130.130.0.0 and we want to divide it into networks with 600 hosts each.

Let's consult a table for our decimal to binary conversion to find how 600 is represented:

The sum of our 8-bit table (128 to 1) amounts to 255 (obviously). 8 bits are not enough. We need more entries:

512  256  128   64   32   16    8    4    2    1
  1    0    0    1    0    1    1    0    0    0


So we need 10 bits for our hosts. 2^10=1024 addresses, 2^10-2=1022 usable hosts.

This is a Class B network so its network mask looks like this:

11111111 11111111 00000000 00000000


We need 10 host bits, so we'll need to turn the last 10 bits to 0; the rest will be assigned as network bits by turning them to 1. And now our netmask looks like this:

11111111 11111111 11111100 00000000 (/22)
     255      255      252        0

As we already said, we need 10 bits for our hosts: 2^10=1024 addresses, 2^10-2=1022 usable hosts.
We added 6 network bits, so 2^6=64 networks.

The interesting part is now the third part of the mask: 256-252=4. So these are our networks:

130.130.0.0
130.130.4.0
130.130.8.0
etc.

and our IP ranges are:

130.130.0.0 - 130.130.3.255
130.130.4.0 - 130.130.7.255
etc.
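
When the increment lands in the third part of the address, as it does here, it is easy to slip on the range boundaries. A short sketch with Python's standard ipaddress module prints them for comparison; the /22 prefix is the one derived above.

```python
import ipaddress

network = ipaddress.ip_network("130.130.0.0/16")
subnets = list(network.subnets(new_prefix=22))

print(len(subnets))  # 64
for subnet in subnets[:3]:
    print(subnet.network_address, "-", subnet.broadcast_address)
# 130.130.0.0 - 130.130.3.255
# 130.130.4.0 - 130.130.7.255
# 130.130.8.0 - 130.130.11.255
```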
.....

Example C:

We have the network 10.0.0.0 and we want to split it into networks with 96 hosts each.

Let's consult a table for our decimal to binary conversion to find how 96 is represented:

128   64  ...
  0    1  ...

No need to calculate any further: 128 is larger than 96, so 64 is the highest bit we need, and it is the 7th bit counting from the right.
So we need 7 bits for our hosts. 2^7=128 addresses, 128-2=126 usable addresses.

This is a Class A network so its netmask looks like this:

11111111 00000000 00000000 00000000

We need to have 7 host bits, so our netmask should look like this:

11111111 11111111 11111111 10000000 (/25)
     255      255      255      128

So 2^7=128 addresses, 128-2=126 usable addresses, 2^17=131072 networks.

256-128=128. Our IP ranges are:

10.0.0.0 - 10.0.0.127
10.0.0.128 - 10.0.0.255
10.0.1.0 - 10.0.1.127
etc.
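
With this many networks we don't want to write them all out. As a rough sketch, the count can be computed from the number of added network bits, and Python's standard ipaddress module can hand back just the first few ranges:

```python
import ipaddress
from itertools import islice

network = ipaddress.ip_network("10.0.0.0/8")
new_prefix = 25

# 25 - 8 = 17 added network bits
subnet_count = 2 ** (new_prefix - network.prefixlen)
print(subnet_count)  # 131072

# subnets() yields networks one at a time, so take only the first three
first_three = list(islice(network.subnets(new_prefix=new_prefix), 3))
for subnet in first_three:
    print(subnet.network_address, "-", subnet.broadcast_address)
# 10.0.0.0 - 10.0.0.127
# 10.0.0.128 - 10.0.0.255
# 10.0.1.0 - 10.0.1.127
```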

Now that we've gone through the basics and learned how to subnet, we need to note:
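It's always a good idea to add one to the number of hosts when subnetting based on hosts, just in case our number sits on or right below a binary boundary (e.g. 128, 64, 32, 16, 8, 4, 2, 1). Two addresses in every network are reserved, so n host bits only give us 2^n-2 usable hosts.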
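
The add-one rule can be expressed as a one-line calculation: the number of host bits is the number of bits needed to write hosts+1 in binary. A small Python sketch, with helper names made up for illustration:

```python
def host_bits_for(required_hosts):
    """Host bits needed so that 2**bits - 2 >= required_hosts."""
    return (required_hosts + 1).bit_length()

def usable_hosts(host_bits):
    """Addresses left after removing the network and broadcast address."""
    return 2 ** host_bits - 2

for hosts in (30, 31, 32, 96, 600):
    bits = host_bits_for(hosts)
    print(hosts, bits, usable_hosts(bits))
# 30 5 30
# 31 6 62
# 32 6 62
# 96 7 126
# 600 10 1022
```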

Let's go over an example again with our new method:

Say that our assigned network is 213.213.213.0 and we need to divide it into networks that have 32 hosts each.

We need to add one so instead of calculating for 32, we calculate for 32+1=33:

Let's consult a table for our decimal to binary conversion to find what 33 is in binary.

128   64   32   16    8    4    2    1
  0    0    1    0    0    0    0    1

We need 6 bits to represent the number 33. 2^6=64, and the number of actual usable hosts is 2^6-2=62.

213.213.213.0 is a Class C network.

Class C subnet mask: 255.255.255.0 (/24) or in binary:   11111111 11111111 11111111 00000000

Our subnet mask will be the default mask plus as many extra network bits as we can turn on while still leaving enough host bits for the number of hosts we want.

Class C subnet mask: 11111111 11111111 11111111 00000000

Our subnet mask:     11111111 11111111 11111111 11000000
In other words:           255      255      255      192   (/26, since we have 26 network bits enabled)

We added 2 network bits, so 2^2=4 networks.

The number of usable hosts in each network excludes the network address and the broadcast address, so it is 2 to the power of our remaining host bits, minus 2. With 6 host bits, 2^6=64, and the number of actual usable hosts is 2^6-2=62.

The interesting part is again the fourth part of the subnet mask, and in our case 256-192=64.

So our second network starts at 213.213.213.64. Just write the increments and fill the spaces in-between.

213.213.213.0
213.213.213.64
213.213.213.128
213.213.213.192

So our networks are:

213.213.213.0 - 213.213.213.63
213.213.213.64 - 213.213.213.127
213.213.213.128 - 213.213.213.191
213.213.213.192 - 213.213.213.255

If we had done the calculation without adding one and settled on 5 host bits because 2^5=32, we would have provisioned for only 30 usable hosts per network (2^5-2=30), two short of the 32 we need.