In my last blog posting I talked about how you can load and execute the Second Stage Boot Loader of your own Operating System. In today’s blog posting I first want to show you how you can remove the dependency on floppy disks. Afterwards we will switch our CPU into the x64 Long Mode, which is necessary to be able to execute our x64-based OS Kernel.
Reading Data through ATA PIO
Over the last few blog postings, you have already learned how to read from floppy disks through the BIOS interrupt 0x13. Floppy disks are great for the first steps in OS development, because you can interact with them quite nicely through the BIOS. But if you want to run your OS on a physical computer, you need to have a physical floppy drive (there are USB-based versions available). Besides that, you are running your OS on a very ancient technology.
Therefore, I did some research to find a better option for a storage device, which can be interfaced in a quite effortless way. One of my prerequisites was that the storage device can be interfaced through I/O Ports instead of BIOS interrupts. As soon as we are switching the CPU into x64 Long Mode, you have no BIOS interrupts available anymore, and therefore you can only communicate with your hardware through their various I/O Ports.
After some time, I settled on the ATA PIO interface. With that interface you can read from and write to an IDE-based hard disk. Yes, you read correctly: the hard drive must be an IDE-attached hard disk. This is not a big problem when you deal with a Virtual Machine, because this is just a configuration option within your virtual disk file.
But when we talk about physical hardware it can get a little bit more complicated, because these days hard disks are normally connected through a SATA controller – or even through an NVMe controller. When I test out my OS on physical hardware, I use a 10-year-old Lenovo W510 notebook (Quad-Core CPU with 8 GB RAM). It also has one internal SATA slot where you can put in your hard disk. To be able to deal with this hard disk through ATA PIO, you must switch the SATA Controller into the so-called Compatibility Mode. Then your hard disk just acts like a traditional IDE-based hard disk. This compatibility mode even works with SATA SSDs!
The output of the build process of my OS is still a 1.44 MB large FAT12 formatted floppy image. You can use the command dd to copy that image in raw format to your physical hard drive. With this approach, I can now run my OS on physical hardware directly from an SSD drive which is interfaced through ATA PIO. It does not give you the native speed of the SSD drive, but we do not care about speed in these first baby steps. You can access the ATA PIO interface through the I/O ports ranging from 0x1F0 to 0x1F7:
- 0x1F0: Access to the read data or the data to be written
- 0x1F2: Sector count that must be read or written
- 0x1F3 to 0x1F5: Logical address of the disk sector that must be read or written
- 0x1F7: Command Port (reading this port returns the status flags)
  - 0x20: Read
  - 0x30: Write
The following listing shows the assembly code that is necessary to read a given disk sector through ATA PIO into main memory. The register BX must contain the number of sectors to be read, the register ECX must contain the starting LBA address, and the register ES:EDI must contain the destination memory address.
    ;================================================
    ; This function reads a sector through ATA PIO.
    ; BX: Number of sectors to read
    ; ECX: Starting LBA
    ; EDI: Destination Address
    ;================================================
    ReadSector:
        ; Sector count
        MOV DX, 0x1F2
        MOV AL, BL
        OUT DX, AL

        ; LBA - Low Byte
        MOV DX, 0x1F3
        MOV AL, CL
        OUT DX, AL

        ; LBA - Middle Byte
        MOV DX, 0x1F4
        MOV AL, CH
        OUT DX, AL

        ; LBA - High Byte
        BSWAP ECX
        MOV DX, 0x1F5
        MOV AL, CH
        OUT DX, AL

        ; Read Command
        MOV DX, 0x1F7
        MOV AL, 0x20 ; Read Command
        OUT DX, AL

    .ReadNextSector:
        CALL Check_ATA_BSY
        CALL Check_ATA_DRQ

        ; Read the sector of 512 bytes into ES:EDI
        ; EDI is incremented by 512 bytes automatically
        MOV DX, 0x1F0
        MOV CX, 256
        REP INSW

        ; Decrease the number of sectors to read and compare it to 0
        DEC BX
        CMP BX, 0
        JNE .ReadNextSector

        RET
The code at the label .ReadNextSector also calls the utility functions Check_ATA_BSY and Check_ATA_DRQ. These two functions poll the status flags of the ATA PIO interface: the first one waits until the BSY flag is cleared, and the second one waits until the DRQ flag is set. The following listing shows their implementation.
    ;================================================
    ; This function checks the ATA PIO BSY flag.
    ;================================================
    Check_ATA_BSY:
        MOV DX, 0x1F7
        IN AL, DX
        TEST AL, 0x80
        JNZ Check_ATA_BSY
        RET

    ;================================================
    ; This function checks the ATA PIO DRQ flag.
    ;================================================
    Check_ATA_DRQ:
        MOV DX, 0x1F7
        IN AL, DX
        TEST AL, 0x08
        JZ Check_ATA_DRQ
        RET
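To make the register sequence easier to follow, here is a small Python model of what ReadSector and the two polling functions do. It is only an illustrative sketch: the names `ata_read_port_writes` and `is_ready_for_data` are made up for this example, and nothing in it touches real hardware.

```python
# Illustrative model of the port writes issued by ReadSector.
# All names below are hypothetical helpers, not part of the OS source.

ATA_STATUS_BSY = 0x80  # bit 7 of the status byte: the drive is busy
ATA_STATUS_DRQ = 0x08  # bit 3 of the status byte: data can be transferred
ATA_CMD_READ = 0x20    # read command written to port 0x1F7


def ata_read_port_writes(sector_count, lba):
    """Return the (port, value) pairs that ReadSector writes, in order."""
    return [
        (0x1F2, sector_count & 0xFF),  # sector count
        (0x1F3, lba & 0xFF),           # LBA bits 0-7
        (0x1F4, (lba >> 8) & 0xFF),    # LBA bits 8-15
        (0x1F5, (lba >> 16) & 0xFF),   # LBA bits 16-23
        (0x1F7, ATA_CMD_READ),         # read command
    ]


def is_ready_for_data(status):
    """True when BSY is clear and DRQ is set, the state both polling loops wait for."""
    return (status & ATA_STATUS_BSY) == 0 and (status & ATA_STATUS_DRQ) != 0


if __name__ == "__main__":
    # The root directory read from the boot sector: 14 sectors starting at LBA 19.
    for port, value in ata_read_port_writes(14, 19):
        print(f"OUT 0x{port:03X}, 0x{value:02X}")
```

Once `is_ready_for_data` would return true for the status byte, the assembly code transfers 256 words (512 bytes) from port 0x1F0.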
With these functions you are now able to read a given file from a hard disk. The following listing shows the function LoadFileIntoMemory that is used now from the boot sector code to read a given file from a FAT12 partition into memory.
    ;=================================
    ; Loads a given file into memory.
    ;=================================
    LoadFileIntoMemory:
    .LoadRootDirectory:
        ; Load the Root Directory into memory.
        ; It starts at the LBA 19, and consists of 14 sectors.
        MOV BX, 0xE ; 14 sectors to be read (the whole BX register is used as counter by ReadSector)
        MOV ECX, 0x13 ; The LBA is 19
        MOV EDI, ROOTDIRECTORY_AND_FAT_OFFSET ; Destination address
        CALL ReadSector ; Loads the complete Root Directory into memory

    .FindFileInRootDirectory:
        ; Now we have to find our file in the Root Directory
        MOV CX, [bpbRootEntries] ; The number of root directory entries
        MOV DI, ROOTDIRECTORY_AND_FAT_OFFSET ; Address of the Root directory
    .Loop:
        PUSH CX
        MOV CX, 11 ; We compare 11 characters (8.3 convention)
        MOV SI, FileName ; Compare against the file name
        PUSH DI
        REP CMPSB ; Test for string match
        POP DI
        JE .LoadFAT ; When we have a match, we load the FAT
        POP CX
        ADD DI, 32 ; When we don't have a match, we go to next root directory entry (+ 32 bytes)
        LOOP .Loop
        JMP Failure ; The file image wasn't found in the root directory

    .LoadFAT:
        ; Store the first FAT cluster of the file to be read in the variable "Cluster"
        MOV DX, WORD [DI + 0x001A] ; Add 26 bytes to the current entry of the root directory, so that we get the start cluster
        MOV WORD [Cluster], DX ; Store the 2 bytes of the start cluster (byte 26 & 27 of the root directory entry) in the variable "Cluster"

        ; Load the FATs into memory.
        ; It starts at the LBA 1 (directly after the boot sector), and consists of 18 sectors (2 x 9).
        MOV BX, 0x12 ; 18 sectors to be read
        MOV ECX, 0x1 ; The LBA is 1
        MOV EDI, ROOTDIRECTORY_AND_FAT_OFFSET ; Offset in memory at which we want to load the FATs
        CALL ReadSector ; Call the load routine
        MOV EDI, [Loader_Offset] ; Address where the first cluster should be stored

    .LoadImage:
        ; Print out the current offset where the cluster is loaded into memory
        ; This introduces a short delay, which is somehow needed by the ATA PIO code...?
        MOV AX, DI
        CALL PrintDecimal
        MOV SI, CRLF
        CALL PrintLine

        ; Load the current cluster of the file into memory
        MOV AX, WORD [Cluster] ; Current FAT cluster to read
        ADD AX, 0x1F ; Add 31 sectors to the retrieved FAT cluster to get the LBA address of the cluster
        MOVZX ECX, AX ; LBA (zero-extended, so that the upper half of ECX is well defined)
        MOV BX, 1 ; 1 sector to be read
        CALL ReadSector ; Read the cluster into memory

        ; Compute the next cluster that we have to load from disk
        MOV AX, WORD [Cluster] ; identify current cluster
        MOV CX, AX ; copy current cluster
        MOV DX, AX ; copy current cluster
        SHR DX, 0x0001 ; divide by two
        ADD CX, DX ; sum for (3/2)
        MOV BX, ROOTDIRECTORY_AND_FAT_OFFSET ; location of FAT in memory
        ADD BX, CX ; index into FAT
        MOV DX, WORD [BX] ; read two bytes from FAT
        TEST AX, 0x0001
        JNZ .LoadRootDirectoryOddCluster

    .LoadRootDirectoryEvenCluster:
        AND DX, 0000111111111111b ; Take the lowest 12 bits
        JMP .LoadRootDirectoryDone

    .LoadRootDirectoryOddCluster:
        SHR DX, 0x0004 ; Take the highest 12 bits

    .LoadRootDirectoryDone:
        MOV WORD [Cluster], DX ; store new cluster
        CMP DX, 0x0FF0 ; Test for end of file
        JB .LoadImage

    .LoadRootDirectoryEnd:
        ; Restore the stack (discard the CX value pushed in .Loop), so that we can do a RET
        POP BX
        RET
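The 12-bit FAT arithmetic at the end of this listing is easier to verify in a high-level language. The following Python sketch mirrors the same steps (multiply the cluster by 1.5, read two bytes, keep the upper or lower 12 bits). The function names and the tiny six-byte FAT are invented for illustration and are not part of the OS source.

```python
# Sketch of the FAT12 cluster chain logic used in LoadFileIntoMemory.
# Function names and the sample FAT are hypothetical.

END_OF_CHAIN = 0x0FF0  # same threshold as the CMP DX, 0x0FF0 in the listing


def next_fat12_cluster(fat, cluster):
    """Return the FAT12 entry stored for `cluster`."""
    offset = cluster + (cluster >> 1)            # cluster * 3 / 2
    word = fat[offset] | (fat[offset + 1] << 8)  # little-endian 16-bit read
    if cluster & 1:
        return word >> 4                         # odd cluster: highest 12 bits
    return word & 0x0FFF                         # even cluster: lowest 12 bits


def cluster_to_lba(cluster):
    """The data area starts at LBA 33 and cluster numbers start at 2, hence + 31."""
    return cluster + 31


def cluster_chain(fat, first_cluster):
    """Follow the chain until an entry at or above 0x0FF0 is reached."""
    chain = []
    cluster = first_cluster
    while cluster < END_OF_CHAIN:
        chain.append(cluster)
        cluster = next_fat12_cluster(fat, cluster)
    return chain


if __name__ == "__main__":
    # Entries 0 and 1 are reserved, cluster 2 points to 3, cluster 3 ends the file.
    sample_fat = bytes([0xF0, 0xFF, 0xFF, 0x03, 0xF0, 0xFF])
    for cluster in cluster_chain(sample_fat, 2):
        print(f"cluster {cluster} -> LBA {cluster_to_lba(cluster)}")
```

The offset 31 follows from the layout described in the listing: 1 boot sector, 18 FAT sectors, and 14 root directory sectors occupy the LBAs 0 to 32.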
The boot sector code has also changed because now we must call the above-mentioned function during the startup. In addition, the boot sector code loads 2 additional files into memory for execution:
1. KLDR16.BIN
This is the Second Stage Boot Loader that is implemented in x16 Real Mode, which still has access to the BIOS. After getting the necessary information from the BIOS, it switches the CPU into x64 Long Mode, where it executes the x64 based KLDR64.BIN file.
2. KLDR64.BIN
This is the Third Stage Boot Loader that is implemented in x64 Long Mode. It currently just prints out the date and time that we have retrieved from the BIOS. In the next release of my OS, it will read the x64-based OS kernel KERNEL.BIN through ATA PIO from the FAT12 partition into memory and execute it. The only purpose of this additional boot loader file is to load the KERNEL.BIN file to the physical memory address 0x100000 and execute it. This task must be done in KLDR64.BIN, because the CPU is then already in x64 Long Mode, where we can access higher memory addresses like 0x100000. This is not practical in KLDR16.BIN, because the CPU is at that point in time still in x16 Real Mode, which can address only slightly more than the first 1 MB of memory. The implementation of this functionality will be covered in the next blog posting. The following listing shows the rewritten boot sector code.
    Main:
        ; Setup the DS and ES register
        XOR AX, AX
        MOV DS, AX
        MOV ES, AX

        ; Prepare the stack
        ; Otherwise we can't call a function...
        MOV AX, 0x7000
        MOV SS, AX
        MOV BP, 0x8000
        MOV SP, BP

        ; Print out a boot message
        MOV SI, BootMessage
        CALL PrintLine

        ; Load the KLDR64.BIN file into memory
        MOV CX, 11
        LEA SI, [SecondStageFileName64]
        LEA DI, [FileName]
        REP MOVSB
        MOV WORD [Loader_Offset], KAOSLDR64_OFFSET
        CALL LoadFileIntoMemory

        ; Load the KLDR16.BIN file into memory
        MOV CX, 11
        LEA SI, [SecondStageFileName16]
        LEA DI, [FileName]
        REP MOVSB
        MOV WORD [Loader_Offset], KAOSLDR16_OFFSET
        CALL LoadFileIntoMemory

        ; Execute the KLDR16.BIN file...
        CALL KAOSLDR16_OFFSET
As you can see, both files are read into memory, and finally we continue our code execution at the memory address 0x2000 where the KLDR16.BIN resides.
BIOS Information Block and A20 Line
The first step in the KLDR16.BIN code execution is to retrieve all the necessary information from the BIOS. At this point in time, we only retrieve the current date and time from the BIOS and store them in a memory area that I call the BIOS Information Block – the BIB. In the future we will enhance the BIB with additional information from the BIOS – like the Memory Map and information about the supported graphics modes. The information from the BIOS Information Block will later be used and processed by the x64-based OS kernel. The following listing shows how the current date and time are stored in the BIB.
    ;=================================================
    ; This function retrieves the date from the BIOS.
    ;=================================================
    GetDate:
        ; Get the current date from the BIOS
        MOV AH, 0x4
        INT 0x1A

        ; Century
        PUSH CX
        MOV AL, CH
        CALL Bcd2Decimal
        MOV [Year1], AX
        POP CX

        ; Year
        MOV AL, CL
        CALL Bcd2Decimal
        MOV [Year2], AX

        ; Month
        MOV AL, DH
        CALL Bcd2Decimal
        MOV WORD [ES:DI + BiosInformationBlock.Month], AX

        ; Day
        MOV AL, DL
        CALL Bcd2Decimal
        MOV WORD [ES:DI + BiosInformationBlock.Day], AX

        ; Calculate the whole year (e.g. "20" * 100 + "22" = 2022)
        MOV AX, [Year1]
        MOV BX, 100
        MUL BX
        MOV BX, [Year2]
        ADD AX, BX
        MOV WORD [ES:DI + BiosInformationBlock.Year], AX

        RET

    ;=================================================
    ; This function retrieves the time from the BIOS.
    ;=================================================
    GetTime:
        ; Get the current time from the BIOS
        MOV AH, 0x2
        INT 0x1A

        ; Hour
        PUSH CX
        MOV AL, CH
        CALL Bcd2Decimal
        MOV WORD [ES:DI + BiosInformationBlock.Hour], AX
        POP CX

        ; Minute
        MOV AL, CL
        CALL Bcd2Decimal
        MOV WORD [ES:DI + BiosInformationBlock.Minute], AX

        ; Second
        MOV AL, DH
        CALL Bcd2Decimal
        MOV WORD [ES:DI + BiosInformationBlock.Second], AX

        RET
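The helper function Bcd2Decimal is not shown in this posting. The BIOS returns the date and time values as packed BCD bytes (one decimal digit per nibble), so I assume the helper performs a conversion like the one in the following Python sketch. The function names are hypothetical.

```python
# Assumed behaviour of Bcd2Decimal, modelled in Python for illustration.

def bcd_to_decimal(value):
    """Convert one packed BCD byte (two decimal digits) into an integer."""
    return (value >> 4) * 10 + (value & 0x0F)


def full_year(century_bcd, year_bcd):
    """Combine century and year the same way GetDate does: century * 100 + year."""
    return bcd_to_decimal(century_bcd) * 100 + bcd_to_decimal(year_bcd)


if __name__ == "__main__":
    # CH = 0x20 (century) and CL = 0x22 (year) as returned by INT 0x1A, AH = 0x4
    print(full_year(0x20, 0x22))
```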
Before we switch the CPU into the x64 Long Mode, we also must enable the so-called A20 Line. As long as this line is disabled, address bit 20 is forced to zero and memory accesses above 1 MB wrap around, so we must enable it to be able to access all memory areas. Unfortunately, there are many different methods to enable this line, depending on the hardware in use. You can check out the complexity of it in the source code of the Linux Kernel. The following listing shows one method that currently works for me.
    ;=============================================
    ; This function enables the A20 gate
    ;=============================================
    EnableA20:
        CLI ; Disables interrupts
        PUSH AX ; Save AX on the stack
        MOV AL, 2
        OUT 0x92, AL
        POP AX ; Restore the value of AX from the stack
        STI ; Enable the interrupts again
        RET
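Why the A20 Line matters can be shown with a little address arithmetic. The following Python sketch models how a Real Mode segment:offset pair is turned into a physical address, with and without the A20 Line. The function name is made up for this example.

```python
# Illustrative model of the A20 gate: with A20 disabled, address bit 20 is
# forced to zero, so addresses just above 1 MB wrap around to low memory.

def real_mode_physical_address(segment, offset, a20_enabled):
    linear = (segment << 4) + offset  # Real Mode: segment * 16 + offset
    if not a20_enabled:
        linear &= ~(1 << 20)          # address line 20 is held at 0
    return linear


if __name__ == "__main__":
    for enabled in (False, True):
        address = real_mode_physical_address(0xFFFF, 0x0010, enabled)
        print(f"A20 enabled={enabled}: FFFF:0010 -> 0x{address:06X}")
```

With the A20 Line disabled, FFFF:0010 ends up at the physical address 0 instead of 0x100000.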
Virtual Memory on an x64 System
After we have done all these individual steps, we are finally able to switch our CPU into the x64 Long Mode to gain access to the whole available system memory and to be able to execute 64-bit instructions. But before we do that, we must talk about Virtual Memory on an x64 system.
Every time we have accessed main memory up to this point, we have dealt with so-called Physical Memory Addresses. A physical memory address, like 0x2000 where the KLDR16.BIN file resides, is the physical location within the installed RAM module. A Virtual Memory Address, on the other hand, abstracts memory addresses from the underlying RAM modules. The x16 Real Mode has no idea about virtual memory: it only works with physical memory and therefore with physical memory addresses.
But as soon as you switch your CPU into x32 Protected Mode with paging enabled, or into the x64 Long Mode (where paging is mandatory), your CPU only deals with virtual memory addresses. Every memory address that you provide in machine code is treated as a virtual memory address. But physical memory can still only be accessed with physical memory addresses. Therefore, you need a component which translates a given virtual memory address into a physical memory address. That component is called the Memory Management Unit (MMU) and is part of the CPU.
The translation between a virtual and a physical memory address is based on a so-called Translation Method. The cool thing about virtual memory addresses is that they add an additional layer of abstraction. With that abstraction layer the CPU can enforce different memory policies, based on what the running OS configures. Here are some examples:
- A running process can’t access the memory region from a different process because each process has a different physical memory region assigned.
- A user mode process can’t directly access Kernel data structures because the Kernel memory space is not accessible from a user mode process.
- Some parts of the main memory can be marked as read-only. When some machine code tries to write to these memory regions, a CPU fault will be triggered.
The idea of virtual memory is to divide the whole available system memory into regions called Pages. A traditional page on an x32/x64 system is normally 4 KB large – 4096 bytes. In addition, Intel CPUs also offer larger page sizes – like Large Pages (2 MB) and Huge Pages (1 GB). A page is always accessed through a virtual memory address. With the provided translation function, the virtual memory address is mapped to a physical memory address – the so-called Page Frame. The page frame has the same size as the page.
Two concurrently running processes can access the same virtual memory address, but through the translation function this virtual address is mapped for each process to a different physical page frame. That’s the power of virtual memory. The following picture illustrates this very important concept.
The question is now how a page is mapped to a physical frame and where that mapping is stored. The CPU uses so-called Page Tables to store the mapping/translation information. Each running process has an individual set of page tables assigned. The address of the currently active top-level page table is stored in a special CPU register called CR3 – the Control Register 3. As soon as a process switch occurs in the OS, the Kernel must load the address of the new active page table into that register. Afterwards the virtual addresses of the newly active running process are mapped to different physical page frames.
The x64 CPU architecture uses a 4-level hierarchy to store the mapping information in various page tables and each page table has here a size of 4 KB. Each entry in a page table has a fixed size of 8 bytes (64 bits), and therefore you can store 512 entries in a page table. The whole 64-bit long virtual memory address acts here as page table indexes for the various levels – as seen in the following picture.
Each page table index is 9 bits long. This makes sense, because each page table has 512 entries, and 2^9 = 512. With 9 bits we can address each of these individual entries. The remaining bits 48 to 63 are not used for the translation, because the current x64 CPU architecture only supports 48-bit virtual addresses. But it is very important that these remaining bits have a correct value: they must all have the same bit value as bit 47 (Sign Extension). Addresses that follow this rule are so-called Canonical Memory Addresses. With only 48 bits in use, the x64 architecture gives us 256 TB of addressable virtual memory. Intel Ice Lake CPUs introduced 5-Level Paging, where the bits 48 – 56 are also in use. This extends the addressable virtual memory to 128 PB (petabytes). The x64 architecture calls the 4 levels in the page table hierarchy as follows:
- Level 4: Page Map Level 4 Table (PML4T)
- Level 3: Page Directory Pointer Table (PDPT)
- Level 2: Page Directory Table (PDT)
- Level 1: Page Table (PT)
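The sign extension rule for Canonical Memory Addresses can be written down in a few lines. The following Python sketch is just an illustration with a hypothetical function name: it checks whether all upper bits of a 64-bit address repeat the highest translated bit (bit 47 with 4-level paging).

```python
# Illustrative check for canonical addresses. `address` is expected to be an
# unsigned 64-bit value; the function name is made up for this example.

def is_canonical(address, address_bits=48):
    """True if bits address_bits..63 all equal bit (address_bits - 1)."""
    upper = address >> (address_bits - 1)          # the sign bit and everything above it
    all_ones = (1 << (64 - address_bits + 1)) - 1  # the same bit range, completely set
    return upper == 0 or upper == all_ones


if __name__ == "__main__":
    for address in (0x00007FFFFFFFFFFF, 0x0000800000000000, 0xFFFF800000000000):
        print(f"0x{address:016X} canonical: {is_canonical(address)}")
    print(f"4-level paging: {2**48 // 2**40} TB of virtual address space")
    print(f"5-level paging: {2**57 // 2**50} PB of virtual address space")
```

The address 0x0000800000000000 is rejected with 4-level paging, because bit 47 is set while the bits 48 to 63 are zero.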
When you now have a virtual memory address, the CPU must perform a so-called Page Table Walk to translate the virtual memory address to a physical memory address. During the page table walk, parts of the given virtual memory address are used as lookup indexes into the various page tables. A page table walk consists of the following steps:
- The physical memory address of the Page Map Level 4 Table is read from the CR3 register.
- Bits 39 – 47 of the virtual memory address are used to determine the index into the Page Map Level 4 Table.
- The entry of the specific index is read and returns the physical memory address of the Page Directory Pointer Table.
- The Page Directory Pointer Table is read based on the provided physical memory address from the last step.
- Bits 30 – 38 of the virtual memory address are used to determine the index into the Page Directory Pointer Table.
- The entry of the specific index is read and returns the physical memory address of the Page Directory Table.
- The Page Directory Table is read based on the provided physical memory address from the last step.
- Bits 21 – 29 of the virtual memory address are used to determine the index into the Page Directory Table.
- The entry of the specific index is read and returns the physical memory address of the Page Table.
- The Page Table is read based on the provided physical memory address from the last step.
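- Bits 12 – 20 of the virtual memory address are used to determine the index into the Page Table. The entry at that index returns the physical memory address of the final page frame, and bits 0 – 11 of the virtual memory address are the offset within that frame.
The following table describes the format of a 64-bit page table entry.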
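The bit ranges used during the page table walk can be extracted with simple shift and mask operations. The following Python sketch, with a made-up function name, splits a virtual memory address into the four 9-bit table indexes and the 12-bit offset within the page.

```python
# Illustrative split of a virtual address into its page table indexes.

def split_virtual_address(address):
    return {
        "pml4": (address >> 39) & 0x1FF,  # bits 39-47
        "pdpt": (address >> 30) & 0x1FF,  # bits 30-38
        "pdt": (address >> 21) & 0x1FF,   # bits 21-29
        "pt": (address >> 12) & 0x1FF,    # bits 12-20
        "offset": address & 0xFFF,        # bits 0-11
    }


if __name__ == "__main__":
    # 0x3000 is the address where KLDR64.BIN is loaded in this posting.
    print(split_virtual_address(0x3000))
    # The highest address below 2 MB.
    print(split_virtual_address(0x1FFFFF))
```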
Bit(s) | Description |
0 | Is the page currently in memory? |
1 | Is it allowed to write to the page? |
2 | If not set, only Kernel mode code can access this page. |
3 | Writes are going directly to memory. |
4 | No cache is used for this page. |
5 | The CPU sets this bit when the page is accessed. |
6 | The CPU sets this bit when a write occurs on this page. |
7 | Must be 0 in Level 1 and Level 4. When set it creates 1 GB page in Level 3, and 2 MB pages in Level 2. |
8 | The page isn’t flushed from caches on an address space switch. |
9 – 11 | Freely usable by the OS. |
12 – 51 | Physical address of the frame of the next page table in the next level below. |
52 – 62 | Freely usable by the OS. |
63 | If set, it forbids code execution from this page |
It is very important to recall here that the bits 12 – 51 store the *physical address* of the page frame that must be accessed in the next level below of the page table hierarchy.
Note: The bits 0 – 11 are of course also part of the physical address of the page frame, but the physical address must be always aligned at a 4 KB boundary (4 KB are 0x1000 in hexadecimal and in binary 1000000000000), and therefore these lower 12 bits are always set to zero and can be used as flags as mentioned in the table above.
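Because the lower 12 bits of an aligned physical address are always zero, address and flags can share one 64-bit value. The following Python sketch shows how such an entry could be composed and taken apart again. PAGE_PRESENT and PAGE_WRITE follow bits 0 and 1 of the table above; the other names are assumptions made for this example.

```python
# Illustrative encoding of a 64-bit page table entry.

PAGE_PRESENT = 1 << 0      # bit 0: the page is in memory
PAGE_WRITE = 1 << 1        # bit 1: writes are allowed
PAGE_USER = 1 << 2         # bit 2: user mode code may access the page
PAGE_NO_EXECUTE = 1 << 63  # bit 63: code execution is forbidden
ADDRESS_MASK = 0x000FFFFFFFFFF000  # bits 12-51


def make_entry(physical_address, flags):
    """Combine a 4 KB aligned physical address with flag bits."""
    if physical_address & 0xFFF:
        raise ValueError("physical address must be aligned at a 4 KB boundary")
    return (physical_address & ADDRESS_MASK) | flags


def entry_address(entry):
    """Extract the physical address stored in bits 12-51."""
    return entry & ADDRESS_MASK


def entry_has(entry, flag):
    """Check whether a given flag bit is set in the entry."""
    return bool(entry & flag)


if __name__ == "__main__":
    entry = make_entry(0xA000, PAGE_PRESENT | PAGE_WRITE)
    print(f"entry   = 0x{entry:016X}")
    print(f"address = 0x{entry_address(entry):X}")
    print(f"user    = {entry_has(entry, PAGE_USER)}")
```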
Therefore, a page table walk performs a lot of physical memory accesses. As you know, accessing physical memory is very slow compared to the speed of the CPU. That’s the reason why a CPU includes a so-called Translation Lookaside Buffer (TLB). The TLB caches the recent translations of virtual memory addresses to physical memory addresses to speed up page table walks. Of course, the TLB must be flushed by the OS as soon as an address space switch occurs. The TLB flush is normally performed by writing a new value into the CR3 control register.
Switching the CPU into x64 Long Mode
In the last section I gave you a quick overview about virtual memory on an x64 system. As you have learned, every memory address acts as a virtual memory address. Therefore, it is very important that you have a working page table hierarchy in place when you switch the CPU into the x64 Long Mode. Otherwise, your OS would crash, because even the memory address in the instruction pointer register RIP is also treated as a virtual memory address that must be translated to a physical memory address.
Therefore, the first step is to create a simple page table hierarchy in physical memory (we are still in x16 Real Mode!) where a virtual memory address maps to the same physical memory address (a virtual memory address has the same value as its physical memory address). This is called an Identity Mapping. We will identity map the first 2 MB of physical memory. The identity mapping is just done temporarily so that we can switch the CPU into x64 Long Mode. When the real OS Kernel is loaded afterwards into memory and is executed, it will set up the final page table structures in memory. The following picture shows the page table hierarchy structure that we will create in the next step in memory.
As you can see from the picture, the PML4T, the PDPT, and the PDT have only a valid page table entry at the first entry 0. This makes sense, because the bits 21 – 47 are always zero when we deal with a memory address below 2 MB. The maximum value below 2 MB is in decimal 2097151 (2 MB minus 1 byte) and is represented in binary as follows:
0000000000000000 000000000 000000000 000000000 111111111 111111111111
Sign Extension   PML4      PDPT      PDT       PT        Offset
When we perform a page table walk with this limited range of virtual memory addresses (0 – 2 MB), we must *always* access the first entry (zero-based!) in the PML4T, the PDPT, and in the PDT – based on the bit pattern from above. Which entry you must access in the PT depends on the bits 12 – 20 of the virtual memory address. One PT covers a memory region of 2 MB (512 entries x 4 KB), and therefore we must fill each of the 512 entries in the PT. Each entry points to the physical page frame that has the same address as the virtual page (entry 0 points to 0x0000, entry 1 to 0x1000, and so on). The following assembly code shows how to create this page table hierarchy in memory where the PML4T starts at the physical memory address 0x9000.
    SwitchToLongMode:
        MOV EDI, 0x9000

        ; Zero out the 16KiB buffer.
        ; Since we are doing a rep stosd, count should be bytes/4.
        PUSH DI ; REP STOSD alters DI.
        MOV ECX, 0x1000
        XOR EAX, EAX
        CLD
        REP STOSD
        POP DI ; Get DI back.

        ; Build the Page Map Level 4 (PML4)
        ; es:di points to the Page Map Level 4 table.
        LEA EAX, [ES:DI + 0x1000] ; Put the address of the Page Directory Pointer Table in to EAX.
        OR EAX, PAGE_PRESENT | PAGE_WRITE ; Or EAX with the flags - present flag, writable flag.
        MOV [ES:DI], EAX ; Store the value of EAX as the first PML4E.

        ; Build the Page Directory Pointer Table (PDP)
        LEA EAX, [ES:DI + 0x2000] ; Put the address of the Page Directory in to EAX.
        OR EAX, PAGE_PRESENT | PAGE_WRITE ; Or EAX with the flags - present flag, writable flag.
        MOV [ES:DI + 0x1000], EAX ; Store the value of EAX as the first PDPTE.

        ; Build the Page Directory (PD)
        LEA EAX, [ES:DI + 0x3000] ; Put the address of the Page Table in to EAX.
        OR EAX, PAGE_PRESENT | PAGE_WRITE ; Or EAX with the flags - present flag, writable flag.
        MOV [ES:DI + 0x2000], EAX ; Store the value of EAX as the first PDE.

        PUSH DI ; Save DI for the time being.
        LEA DI, [DI + 0x3000] ; Point DI to the page table.
        MOV EAX, PAGE_PRESENT | PAGE_WRITE ; Move the flags into EAX - and point it to 0x0000.

        ; Build the Page Table (PT)
    .LoopPageTable:
        MOV [ES:DI], EAX
        ADD EAX, 0x1000
        ADD DI, 8
        CMP EAX, 0x200000 ; If we did all 2MiB, end.
        JB .LoopPageTable

        POP DI ; Restore DI.
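To convince yourself that these four tables really identity map the first 2 MB, you can simulate them. The following Python sketch builds the same table layout in a byte array and performs a page table walk in software. It is a simplified model with hypothetical function names that only looks at the present flag and ignores all other flags.

```python
import struct

# Simplified simulation of the identity mapping created by SwitchToLongMode.
# build_identity_tables and translate are made-up names for this example.

PAGE_PRESENT = 0x1
PAGE_WRITE = 0x2
PML4_BASE = 0x9000
ADDRESS_MASK = 0x000FFFFFFFFFF000  # bits 12-51 of a page table entry


def build_identity_tables():
    memory = bytearray(PML4_BASE + 0x4000)  # already zeroed, like REP STOSD does
    flags = PAGE_PRESENT | PAGE_WRITE

    def store(address, value):
        struct.pack_into("<Q", memory, address, value)  # one 8-byte entry

    store(PML4_BASE, (PML4_BASE + 0x1000) | flags)           # PML4T[0] -> PDPT
    store(PML4_BASE + 0x1000, (PML4_BASE + 0x2000) | flags)  # PDPT[0]  -> PDT
    store(PML4_BASE + 0x2000, (PML4_BASE + 0x3000) | flags)  # PDT[0]   -> PT
    for index in range(512):                                 # PT[i]    -> frame i
        store(PML4_BASE + 0x3000 + index * 8, (index * 0x1000) | flags)
    return memory


def translate(memory, cr3, virtual_address):
    """Walk the 4 levels; return the physical address or None if not mapped."""
    table = cr3
    for shift in (39, 30, 21, 12):
        index = (virtual_address >> shift) & 0x1FF
        (entry,) = struct.unpack_from("<Q", memory, table + index * 8)
        if not entry & PAGE_PRESENT:
            return None
        table = entry & ADDRESS_MASK
    return table | (virtual_address & 0xFFF)


if __name__ == "__main__":
    memory = build_identity_tables()
    for address in (0x3000, 0x7C00, 0x1FFFFF, 0x200000):
        print(f"0x{address:06X} -> {translate(memory, PML4_BASE, address)}")
```

Every address below 2 MB translates to itself, while 0x200000 is not mapped, because the second entry of the PDT is empty.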
After we have set up the necessary page table hierarchy in memory, we will perform the switch into the x64 Long Mode. To enable the x64 Long Mode, you must work with the so-called Extended Feature Enable Register (EFER). Please check out this article for more details about it. The following listing shows the necessary code.
        ; Disable IRQs
        MOV AL, 0xFF ; Out 0xFF to 0xA1 and 0x21 to disable all IRQs.
        OUT 0xA1, AL
        OUT 0x21, AL

        NOP
        NOP

        LIDT [IDT] ; Load a zero length IDT so that any NMI causes a triple fault.

        ; Enter long mode.
        MOV EAX, 10100000b ; Set the PAE and PGE bit.
        MOV CR4, EAX

        MOV EDX, EDI ; Point CR3 at the PML4.
        MOV CR3, EDX

        MOV ECX, 0xC0000080 ; Read from the EFER MSR.
        RDMSR
        OR EAX, 0x00000100 ; Set the LME bit.
        WRMSR

        MOV EBX, CR0 ; Activate long mode -
        OR EBX, 0x80000001 ; - by enabling paging and protection simultaneously.
        MOV CR0, EBX

        LGDT [GDT.Pointer] ; Load GDT.Pointer defined below.

        JMP CODE_SEG:LongMode ; Load CS with 64 bit segment and flush the instruction cache

    [BITS 64]
    LongMode:
        MOV AX, DATA_SEG
        MOV DS, AX
        MOV ES, AX
        MOV FS, AX
        MOV GS, AX
        MOV SS, AX

        ; Setup the stack
        MOV RAX, QWORD 0x70000
        MOV RSP, RAX
        MOV RBP, RSP
        XOR RBP, RBP

        ; Execute the KLDR64.BIN
        JMP 0x3000
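The magic numbers in this listing are just individual control bits. The following Python sketch names them and recomputes the values that the code writes into CR4, EFER, and CR0. The constant and function names are chosen for this example; the bit positions are the ones used by the listing.

```python
# Illustrative breakdown of the control register values used for the switch.

CR4_PAE = 1 << 5        # Physical Address Extension
CR4_PGE = 1 << 7        # Page Global Enable
EFER_MSR = 0xC0000080   # MSR number of the Extended Feature Enable Register
EFER_LME = 1 << 8       # Long Mode Enable
CR0_PE = 1 << 0         # Protection Enable
CR0_PG = 1 << 31        # Paging


def long_mode_register_values(cr0_before=0, efer_before=0):
    """Return the values written to CR4, EFER and CR0, given the old CR0/EFER."""
    return {
        "cr4": CR4_PAE | CR4_PGE,             # MOV CR4, EAX overwrites the register
        "efer": efer_before | EFER_LME,       # RDMSR / OR / WRMSR keeps the other bits
        "cr0": cr0_before | CR0_PG | CR0_PE,  # MOV / OR / MOV keeps the other bits
    }


if __name__ == "__main__":
    for name, value in long_mode_register_values().items():
        print(f"{name}: 0x{value:08X}")
```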
Phew, we are now in x64 Long Mode! The code execution continues at the label LongMode, where we first set up the segment registers and the stack. Afterwards we jump to the virtual memory address 0x3000 (physical memory addresses are now gone!) to continue the code execution. That’s the memory location where the KLDR64.BIN file resides. This virtual memory address is identity mapped to the physical memory address 0x3000. The KLDR64.BIN file currently just prints out a welcome message and the information obtained from the BIOS Information Block that we set up earlier. In the next blog posting I will cover in more detail how to write to the screen in x64 Long Mode, because from now on you have no access to the BIOS interrupts anymore!
Summary
This was a very long blog posting today, because we have covered a lot of stuff – especially around virtual memory on an x64 system. This knowledge is crucial because we must set up a whole 4-level page table hierarchy to be able to enable the x64 Long Mode. As soon as you are in the x64 Long Mode, your CPU will treat each memory address as a virtual memory address. Therefore, we also have identity mapped the first 2 MB of memory to be able to continue the code execution in x64 Long Mode.
In the next blog posting we will concentrate in more detail on the KLDR64.BIN file, which finally loads the real x64 OS Kernel into memory and executes it. Stay tuned.
Thanks for your time,
-Klaus