I showed the diagram from the Intel manual that illustrates how the VMM/Hypervisor interacts with the guest OS. In summary, the processor moves into and out of the guest OS through transitions called VM Entry and VM Exit. To a degree this is very similar to how a system call works. On a 64-bit machine, when we execute the 'syscall' instruction, the processor switches to kernel mode, and the kernel knows which system service the user-mode application requested because the system service number is passed in the EAX register. In a similar fashion, when a VM Exit occurs, the processor switches to VMM/Hypervisor mode and the VMM/Hypervisor does whatever is necessary to present a virtualized environment to the guest OS.
If we look at the disassembly of NtReadFile from ntdll.dll, we can see that it executes 'syscall' as follows.
0:013> u ntdll!ntreadfile
ntdll!NtReadFile:
000007fe`747d2e40 4c8bd1 mov r10,rcx
000007fe`747d2e43 b804000000 mov eax,4
000007fe`747d2e48 0f05 syscall
Then the natural question is: what about virtualization? Do we also generate similar code in the guest for VM Exit or VM Entry? The answer is 'No'. The way it works is somewhat different, and you can find the detailed information in chapters 25-27 of the Intel Software Developer's Manual, but let me briefly describe how it works here.
Basically we tell the processor that the VMM/Hypervisor wants to gain control when the guest OS executes certain instructions. Or we specify that when the guest OS touches certain parts of memory, we want to gain control so that we can provide the appropriate information to the guest OS.
For instance, we might want to control the time inside the guest OS, and one way an OS obtains time information is via the TSC. For this we want to gain control when the guest OS executes the 'rdtsc' or 'rdtscp' instruction. The processor consults a data structure called the virtual-machine control structure (VMCS) while it runs the guest, and all we need to do is program this data structure. Once we do that, the processor switches to VMM/Hypervisor mode whenever the guest OS executes the given instruction. When we gain control, the processor also tells us the reason why the VM Exit occurred and the guest addresses at the time of the exit, so that we can use that information. This makes it a lot easier to create a virtualized environment compared to binary patching technology.
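To make the 'rdtsc' example concrete, here is a minimal user-mode sketch in C. It is not real hypervisor code: `toy_vmcs`, `toy_vcpu` and all function names are hypothetical stand-ins that I made up, and a real VMM would access the VMCS with the VMREAD/VMWRITE instructions instead of a plain struct. The bit position of "RDTSC exiting" (bit 12 of the primary processor-based controls) and the exit reason number 16 are taken from the Intel manual as I understand it, so please verify them against your revision.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed from the Intel SDM; verify against your manual revision. */
#define RDTSC_EXITING_BIT  (1u << 12)  /* primary processor-based control */
#define EXIT_REASON_RDTSC  16u         /* basic exit reason for rdtsc */

/* Hypothetical stand-in for the few VMCS fields this sketch needs. */
struct toy_vmcs {
    uint32_t proc_based_controls;      /* which guest actions cause a VM Exit */
    uint32_t exit_reason;              /* written by the processor on VM Exit */
    uint32_t exit_instruction_length;  /* length of the exiting instruction */
    uint64_t guest_rip;                /* guest instruction pointer */
};

/* Hypothetical per-virtual-CPU state kept by the VMM. */
struct toy_vcpu {
    struct toy_vmcs vmcs;
    uint64_t rax, rdx;      /* guest registers the handler fills in */
    uint64_t virtual_tsc;   /* the time the VMM wants the guest to see */
};

/* "Program the data structure": ask for a VM Exit on every guest rdtsc. */
void enable_rdtsc_exiting(struct toy_vmcs *vmcs)
{
    assert(vmcs != NULL);
    vmcs->proc_based_controls |= RDTSC_EXITING_BIT;
}

/* VMM side: look at the exit reason and emulate the instruction.
 * Returns 1 if the exit was handled, 0 otherwise. */
int handle_vm_exit(struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
    switch (vcpu->vmcs.exit_reason & 0xFFFFu) {  /* low 16 bits: basic reason */
    case EXIT_REASON_RDTSC:
        /* rdtsc returns the counter in EDX:EAX */
        vcpu->rax = vcpu->virtual_tsc & 0xFFFFFFFFu;
        vcpu->rdx = vcpu->virtual_tsc >> 32;
        /* skip the emulated instruction before resuming the guest */
        vcpu->vmcs.guest_rip += vcpu->vmcs.exit_instruction_length;
        return 1;
    default:
        return 0;
    }
}

/* Simulates what the processor does when the guest runs rdtsc.
 * Returns 0 if the instruction runs natively, 1 if the VMM handled it. */
int guest_executes_rdtsc(struct toy_vcpu *vcpu)
{
    assert(vcpu != NULL);
    if (!(vcpu->vmcs.proc_based_controls & RDTSC_EXITING_BIT))
        return 0;                               /* no VM Exit requested */
    vcpu->vmcs.exit_reason = EXIT_REASON_RDTSC;
    vcpu->vmcs.exit_instruction_length = 2;     /* rdtsc encodes as 0F 31 */
    return handle_vm_exit(vcpu);
}
```

Before `enable_rdtsc_exiting` is called, `guest_executes_rdtsc` returns 0 and the VMM never sees the instruction. After it is called, the handler runs and the guest gets whatever value the VMM put in `virtual_tsc`. Note that the guest code itself is never modified.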
Next, let me briefly describe another piece of hardware support for virtualization.
That is the support for memory address translation. If you think about it, we cannot let a guest OS control the entire physical memory, because that would allow it to view pages belonging to other guest OSs. Hence, the VMM/Hypervisor has to control the guest's memory accesses. How do we do that? Earlier this was done with a software page table that was hidden from the guest OS. What does this mean? When an application in the guest OS reads from memory, the address it uses is a virtual address, so it has to be translated to a physical address. However, the physical address the guest OS believes it is using is actually a fake physical address from the VMM/Hypervisor's point of view. Hence, we need one more translation, from that guest physical address to the real hardware physical address, so that the guest ends up accessing the correct memory. This additional translation was done in software, and the technique is normally called 'shadow paging'. Obviously, maintaining all the hidden page tables and keeping them in sync with the guest's own page tables consumed more memory and ran slower than native execution. So hardware vendors such as Intel and AMD added hardware support that performs this second translation behind the scenes, so that software no longer has to do it. Intel calls this technology Extended Page Tables (EPT) and AMD calls it Nested Page Tables (NPT).
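The two translation steps described above can be sketched with toy page tables in C. Everything here is a simplification of my own: real page tables have multiple levels and permission bits, and with EPT the hardware also translates every guest page-table access during the walk, which this sketch ignores. The names `toy_table`, `translate` and `build_shadow_table` are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT 12u
#define PAGE_MASK  0xFFFull
#define NUM_PAGES  8u
#define NOT_MAPPED UINT64_MAX

/* Toy single-level table: index is a page number, value is the page
 * number it maps to (or NOT_MAPPED). */
struct toy_table {
    uint64_t entry[NUM_PAGES];
};

void table_init(struct toy_table *t)
{
    assert(t != NULL);
    for (size_t i = 0; i < NUM_PAGES; i++)
        t->entry[i] = NOT_MAPPED;
}

/* One translation step: replace the page number, keep the page offset. */
uint64_t translate(const struct toy_table *t, uint64_t addr)
{
    assert(t != NULL);
    uint64_t page = addr >> PAGE_SHIFT;
    if (page >= NUM_PAGES || t->entry[page] == NOT_MAPPED)
        return NOT_MAPPED;
    return (t->entry[page] << PAGE_SHIFT) | (addr & PAGE_MASK);
}

/* EPT/NPT style: the hardware chains both tables on every access.
 * guest_pt belongs to the guest OS, ept belongs to the VMM. */
uint64_t guest_virtual_to_host_physical(const struct toy_table *guest_pt,
                                        const struct toy_table *ept,
                                        uint64_t guest_virtual)
{
    uint64_t guest_physical = translate(guest_pt, guest_virtual);
    if (guest_physical == NOT_MAPPED)
        return NOT_MAPPED;                 /* would be a guest page fault */
    return translate(ept, guest_physical); /* NOT_MAPPED here would be an
                                              EPT violation (VM Exit) */
}

/* Shadow paging style: the VMM precomputes a hidden table that maps
 * guest virtual pages straight to host physical pages, and has to redo
 * this whenever the guest changes its own page table. */
void build_shadow_table(const struct toy_table *guest_pt,
                        const struct toy_table *ept,
                        struct toy_table *shadow)
{
    assert(guest_pt != NULL && ept != NULL && shadow != NULL);
    for (size_t i = 0; i < NUM_PAGES; i++) {
        uint64_t guest_page = guest_pt->entry[i];
        if (guest_page == NOT_MAPPED || guest_page >= NUM_PAGES)
            shadow->entry[i] = NOT_MAPPED;
        else
            shadow->entry[i] = ept->entry[guest_page];
    }
}
```

Both approaches give the same final address. The difference is who does the work: with shadow paging the VMM has to rebuild the hidden table in software, while with EPT/NPT the processor walks both tables by itself.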
Here is the diagram that I stole from one of Intel's presentation slides: Intel Virtualization Technology Roadmap and VT-d Support in Xen. I think the diagram does a great job of illustrating the point. As you can see, there is now a new EPT base register that points to the EPT page table.
In addition, hardware vendors added Virtual Processor Identifiers (VPIDs), which tag TLB entries with the virtual processor they belong to. With these tags the processor does not have to throw out the TLB entries of other virtual processors on every VM Entry and VM Exit, and keeping those cached mappings improves overall performance.
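Here is a small C sketch of the idea behind VPID tagging. It is a toy model only: a real TLB is a hardware structure that software cannot see, and `toy_tlb`, `tlb_insert`, `tlb_lookup` and `tlb_flush_all` are hypothetical names. The one hardware fact I rely on is that a VPID is a 16-bit value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SIZE 4u

/* One cached translation, tagged with the virtual processor it belongs to. */
struct tlb_entry {
    int      valid;
    uint16_t vpid;           /* VPIDs are 16-bit identifiers */
    uint64_t virtual_page;
    uint64_t physical_page;
};

struct toy_tlb {
    struct tlb_entry entry[TLB_SIZE];
    unsigned next;           /* simple round-robin replacement */
};

void tlb_insert(struct toy_tlb *tlb, uint16_t vpid,
                uint64_t virtual_page, uint64_t physical_page)
{
    assert(tlb != NULL);
    struct tlb_entry *e = &tlb->entry[tlb->next];
    e->valid = 1;
    e->vpid = vpid;
    e->virtual_page = virtual_page;
    e->physical_page = physical_page;
    tlb->next = (tlb->next + 1u) % TLB_SIZE;
}

/* A hit needs both the page number and the VPID tag to match, so entries
 * of different virtual processors can sit side by side.
 * Returns 1 on a hit and stores the result in *physical_page. */
int tlb_lookup(const struct toy_tlb *tlb, uint16_t vpid,
               uint64_t virtual_page, uint64_t *physical_page)
{
    assert(tlb != NULL && physical_page != NULL);
    for (size_t i = 0; i < TLB_SIZE; i++) {
        const struct tlb_entry *e = &tlb->entry[i];
        if (e->valid && e->vpid == vpid && e->virtual_page == virtual_page) {
            *physical_page = e->physical_page;
            return 1;
        }
    }
    return 0;
}

/* Without tags, this is what has to happen on every switch between
 * guest and VMM: all cached translations are lost. */
void tlb_flush_all(struct toy_tlb *tlb)
{
    assert(tlb != NULL);
    for (size_t i = 0; i < TLB_SIZE; i++)
        tlb->entry[i].valid = 0;
}
```

Two virtual processors can cache a translation for the same virtual page without interfering with each other, because a lookup only hits on entries with a matching tag. Without the tag, the only safe option is `tlb_flush_all` on every switch.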
So that's all for now and here is the summary of this post:
- How to configure the processor to hand control back to the VMM/Hypervisor when the guest OS executes certain instructions
- Hardware support for the hidden page table that translates guest physical addresses to machine physical addresses
- Keeping address mapping data in the TLB by using virtual processor IDs
I hope this was useful to those who stop by my page. Thank you.