Tuesday, September 30, 2014

Registers (x86)

As I mentioned towards the end of my last post on stacks, my next post was likely going to be about registers. Well, here we are! I had originally planned on covering both x86 and x64 registers in a single post, but this posed two problems. The first is that it would have been a very long post! The second, and considerably larger, problem is that I don't yet know x64 assembly/architecture well enough to feel confident writing a post about it. The good news is that once I've brushed up enough on x64's architecture to write a detailed post, I can simply jump right in, as I did all the dirty work right here in this post! Happy days.

Memory Hierarchy

First off, it's important to discuss and understand what a register is. Before we get into that however, let's have a look at my favorite image of the memory hierarchy:


(thanks to COMPUTER SCIENCE E-1 for this great image)

It's safe to say there are probably thousands of images regarding the memory hierarchy throughout CS books, documents, and presentations, but this one takes the cake for me! It's about as good as it visually gets for a hierarchy image, and although it doesn't display a few key points, I can do that myself right here in this post.

What are the key points I'm talking about that are missing from this image? Well, going from bottom to top, we're going from slowest to fastest, and going from top to bottom, we're going from fastest to slowest (in regards to read and write access time). It's also imperative to understand that the faster the storage, the more expensive it is, and the slower the storage, the less expensive it is (in regards to USD). With that now known, you can imagine that reading/writing a removable media device (such as a USB drive) is slower than reading/writing your hard drive, but is less expensive.

As this is a post strictly about registers, I won't go into the complexities and intricacies of each part of the hierarchy, and will instead focus on the registers themselves. As far as access time goes, let's compare registers and the hard drive as an example:

Registers - 1-2ns (nanoseconds)

Hard Drive - 5-20ms (milliseconds)

-- It's all dependent on the architecture of the processor, really. These are rough #'s.
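
To put those two ranges side by side, here's a quick back-of-the-envelope calculation in Python. It only uses the rough figures quoted above (converted to nanoseconds); the function and variable names are made up for this sketch.

```python
# Rough access times from above, expressed in nanoseconds.
REGISTER_NS = (1, 2)                      # 1-2 ns
HARD_DRIVE_NS = (5_000_000, 20_000_000)   # 5-20 ms

def slowdown_range(fast, slow):
    """Return the (smallest, largest) ratio of slow access time to fast access time."""
    return slow[0] / fast[1], slow[1] / fast[0]

low, high = slowdown_range(REGISTER_NS, HARD_DRIVE_NS)
print(f"A hard drive access is roughly {low:,.0f}x to {high:,.0f}x slower")
```

In other words, with these rough numbers we're talking millions of times slower, which is why the CPU wants its working data in registers.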

Cue the amazing Grace Hopper!


Why are registers so fast? Registers are circuits which are built (literally wired) into the CPU core itself, right alongside the Arithmetic logic unit (ALU), which is widely considered the fundamental building block of a CPU. With that said, we really can't get any closer to where the work happens, which means there's next to no data transfer overhead and barely any clock cycles are required. Also, a CPU's instruction set tends to work with registers more than it does with actual memory locations.

Speaking of clock cycles, here's a chart displaying the cycles regarding the memory hierarchy:


(thanks to HLNAND for this great image)

As we can see, a register only takes one clock cycle.

What is a register?
 
Now that we understand the basic fundamentals behind the memory hierarchy and where the register resides on the hierarchy, we can discuss what a register actually is! In its most basic definition, a register is used to store small pieces of data that the processor is actively working on. There are many different registers and categories of registers, all of which essentially do something different, however you can generally break registers down into two types. The first type is the General purpose register (GPR), which holds the data and addresses that instructions operate on (addition, subtraction, multiplication, etc., with the arithmetic itself being carried out by the ALU). Once the arithmetic (or other manipulation of data) is finished, what happens to the result is entirely up to the instructions. It can stay in the register for the next instruction to use, be stored back into memory, etc.

The second type is the Special purpose register (SPR), which as the name implies has a special meaning and specific purpose. For example, on the IA-32 architecture the Stack Pointer (ESP) register is treated as a SPR in addition to being counted as a GPR. This register holds the address of the top of the stack, which is the most recently pushed item. As new items are pushed, they sit on top of the older ones, so the most recent item always resides at the top of the stack.

It's important to note that register values can change on every clock tick, so the values stored in a specific register may not be the same as they were prior to the tick. This matters when an interrupt fires: register values are copied to a stack and stay on that stack while an Interrupt Service Routine (ISR) is being executed by the CPU. Once the interrupt is properly handled, the original register values are loaded back from the stack so the CPU can continue with the instructions it was previously working on.

What I described above is a form of context switching: the CPU saves the state of what it was doing, jumps to the ISR, and restores the saved state afterwards. Although unrelated, it's interesting to note that in some special cases regarding 0x101 bug checks, depending on what actually caused the bug check, you may need to have knowledge of context switching to properly debug.
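
The save-and-restore step described above can be sketched as a toy model in Python. This is only an illustration under simplifying assumptions: real hardware and operating systems split the saving work between the CPU and the handler, and `service_interrupt` and `toy_isr` are hypothetical names I made up for the example.

```python
def service_interrupt(registers, stack, isr):
    """Save registers on the stack, run the ISR, then restore the saved values."""
    stack.append(dict(registers))     # copy the current register values onto the stack
    isr(registers)                    # the ISR is free to overwrite registers
    registers.clear()
    registers.update(stack.pop())     # load the original values back from the stack

def toy_isr(registers):
    """A pretend ISR that clobbers a couple of registers while it runs."""
    registers["eax"] = 0xDEADBEEF
    registers["ecx"] = 0

registers = {"eax": 0x818F4920, "ecx": 0x818FB9C0, "edx": 0x000002D0}
stack = []
before = dict(registers)

service_interrupt(registers, stack, toy_isr)
print(registers == before, len(stack))
```

The interrupted code never notices that the ISR used eax and ecx, because the values it cares about are back in place by the time it resumes.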

With all of the above said, there's about a dozen different ways I could go at this point. I could go on to talk about the register file, the many different and various categories of registers, etc. However, let's jump ahead to register renaming as that's a pretty important topic.

Register Renaming

Register renaming is essentially a technique used in pipelined processors that deals with data dependencies between instructions by renaming their register operands. The way renaming works is that the hardware replaces the architectural register names (the user-accessible registers, or more simply 'the registers' as we know them) with a new name for the destination operand of each instruction.

Thanks to register renaming, we can also successfully perform what is known as Out-of-order execution (OOE). How exactly does it allow OOE to be performed? Register renaming eliminates name dependencies between instructions, while preserving true dependencies. True dependencies occur when an instruction depends on the result of a previous instruction.

With the above explained, now is a good time to explain data hazards. Data hazards occur when instructions that exhibit data dependence modify data in different stages of a pipeline. Ignoring a data hazard can lead to what is known as a race condition, which is when the result depends on the order in which the operations happened to complete rather than the intended order. We have three main data hazards:

  • Read-after-write (RAW), also known as a true dependency.
  • Write-after-read (WAR), also known as an anti-dependency.
  • Write-after-write (WAW), also known as an output dependency.

1. What is RAW? Let's take two instruction locations (l1, l2). A prime RAW example is when l2 tries to read a source before l1 writes to it. l2 is attempting to refer to a result that hasn't been calculated or retrieved yet by l1.

2. What is WAR? Let's once again take l1 & l2. l2 tries to write a destination before it is read by l1. This is only a problem when the instructions execute concurrently (or out of order). If they execute strictly sequentially, l1 always reads before l2 writes, and there is no hazard.

3. What is WAW? Taking l1 & l2 one last time, l2 tries to write an operand before it is written by l1.
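
To make the three hazards a bit more concrete, here's a minimal Python sketch that classifies the hazards between an earlier instruction (l1) and a later one (l2). It's a toy model I'm assuming for illustration, not how a real CPU detects hazards: each instruction is just a `(destination, sources)` tuple of register names, and `hazards` is a made-up helper.

```python
def hazards(first, second):
    """Classify hazards between an earlier (first) and a later (second) instruction.

    Each instruction is a (destination, sources) tuple of register names.
    """
    dest1, srcs1 = first
    dest2, srcs2 = second
    found = set()
    if dest1 in srcs2:
        found.add("RAW")   # second reads what first writes
    if dest2 in srcs1:
        found.add("WAR")   # second overwrites what first still has to read
    if dest1 == dest2:
        found.add("WAW")   # both write the same register
    return found

l1 = ("eax", ("ebx", "ecx"))                    # eax = ebx + ecx
raw = hazards(l1, ("edx", ("eax", "esi")))      # l2 reads eax
war = hazards(l1, ("ebx", ("edi", "esi")))      # l2 writes ebx, which l1 reads
waw = hazards(l1, ("eax", ("edi", "esi")))      # l2 also writes eax
print(raw, war, waw)
```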

With register renaming, since we're ultimately maintaining a status bit for each value that indicates whether or not it has been completed, it allows the execution of two instruction operations to be performed out of order when there are no true data dependencies between them. This removes WAR/WAW, and of course leaves RAW intact as discussed above.
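
Here's a small Python sketch of the renaming idea itself. It's heavily simplified and the names (`rename`, the `p0`, `p1`, ... physical registers) are assumptions for illustration; real hardware uses a fixed pool of physical registers and frees them again, which this ignores. Sources are looked up in the current mapping, and every destination gets a brand new name.

```python
from itertools import count

def rename(program):
    """Give every destination a fresh physical register name.

    program is a list of (destination, sources) tuples of architectural names.
    """
    mapping = {}                              # architectural name -> current physical name
    fresh = (f"p{n}" for n in count())

    def lookup(reg):
        if reg not in mapping:                # first use: value came from before the program
            mapping[reg] = next(fresh)
        return mapping[reg]

    renamed = []
    for dest, srcs in program:
        new_srcs = tuple(lookup(s) for s in srcs)   # read the current mappings first
        mapping[dest] = next(fresh)                 # then allocate a new name for the result
        renamed.append((mapping[dest], new_srcs))
    return renamed

program = [
    ("eax", ("ebx", "ecx")),   # i1
    ("edx", ("eax", "esi")),   # i2: RAW on eax (true dependency on i1)
    ("ebx", ("edi", "esi")),   # i3: WAR on ebx (i1 reads it)
    ("eax", ("edi", "ecx")),   # i4: WAW on eax (i1 writes it)
]
renamed = rename(program)
for dest, srcs in renamed:
    print(dest, "<-", srcs)
```

After renaming, i3 and i4 write to names that nobody else uses, so the WAR and WAW hazards are gone. i2 still reads the name that i1 writes, so the RAW dependency is left intact, which is exactly what we want.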


(Excerpt from the following .pdf)

x86 Registers

In WinDbg, by using the r command we can go ahead and dump the registers from the context of the thread that caused the crash. For example:

 kd> r  
 eax=818f4920 ebx=86664d90 ecx=818fb9c0 edx=000002d0 esi=818f493c edi=00000000  
 eip=818c07dd esp=a1e72cb0 ebp=a1e72ccc iopl=0     nv up ei pl nz na pe nc  
 cs=0008 ss=0010 ds=0023 es=0023 fs=0030 gs=0000       efl=00000206  

x86 (IA-32) has eight GPRs, which are:

  • EAX
  • EBX
  • ECX
  • EDX
  • ESI
  • EDI
  • EBP
  • ESP

Great, so there's our eight GPRs. Now we can go ahead and break them down:

  • EAX - The 'A' in the EAX register implies it's the accumulator register for operands and results data.
  • EBX - The 'B' in the EBX register implies it's the base register, which is used as a pointer to data in the DS segment. DS is the current data segment.
  • ECX - The 'C' in the ECX register implies it's the counter for loop and string operations.
  • EDX - The 'D' in the EDX register implies it's the data register, which also serves as the I/O pointer.
  • ESI - The 'SI' in the ESI register implies it's the Source Index, which is a pointer to data in the segment pointed to by the DS register.
  • EDI - The 'DI' in the EDI register implies it's the Destination Index, which is a pointer to data (or a destination) in the segment pointed to by the ES register. It's essentially the counterpart to the ESI register.
  • EBP - The 'BP' in the EBP register implies it's the Base Pointer, which is the pointer to data on the stack.
  • ESP - The 'SP' in the ESP register implies it's the Stack Pointer, which points to the top of the stack, i.e. the location of the last item put on the stack.

Whew, alright! So there's just a few things I'd like to quickly explain as well as we haven't covered all of the bases:

  • EAX - The 'AX' in the EAX register is used to address only the lower 16 bits of the register. If we were to reference all 32 bits, we'd use all of EAX, and not just AX.
  • EIP - I didn't forget this guy, don't worry! The 'IP' in the EIP register implies it's the Instruction Pointer, which can also be called the 'program counter'. It contains the offset in the current code segment for the next instruction that will be executed. It's also interesting to note that EIP cannot be accessed directly by software. Instead, it is controlled implicitly by control-transfer instructions such as JMP, CALL, JC, and RET. Aside from control-transfer instructions, interrupts and exceptions also change EIP.
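
As a quick illustration of the AX point above, here's a Python sketch that pulls the lower 16 bits out of the eax value from the register dump. I've also included AH and AL (the upper and lower bytes of AX), which I didn't cover above; `sub_registers` is just a made-up helper name.

```python
def sub_registers(eax):
    """Split a 32-bit EAX value into its AX, AH, and AL views."""
    ax = eax & 0xFFFF          # lower 16 bits of EAX
    ah = (ax >> 8) & 0xFF      # upper byte of AX
    al = ax & 0xFF             # lower byte of AX
    return ax, ah, al

ax, ah, al = sub_registers(0x818F4920)          # eax=818f4920 from the dump above
print(f"ax={ax:04x} ah={ah:02x} al={al:02x}")   # prints "ax=4920 ah=49 al=20"
```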

Okay, so I can't mention control-transfer instructions and then not explain them. I mean, I could... but I wouldn't be happy. Control-transfer instructions specifically control the flow of program execution. There are quite a few control-transfer instructions, but I will discuss the ones I mentioned above, all of which change EIP:

  • JMP - Jump. JMP transfers program control to a different point in the instruction stream without recording any return information. The destination operand specifies the address of the instruction being jumped to. This operand can be an immediate value, a GPR, or a memory location. The JMP instruction can be used to execute four different types of jumps:
    1. Near jump - A jump to an instruction within the segment currently pointed to by the CS register. It can also at times be referred to as an intrasegment jump.
    2. Short jump - A type of near jump in which the jump range is limited to -128 to +127 bytes from the current EIP value.
    3. Far jump - A jump to an instruction that's located in a different segment than the current code segment, but at the same privilege level. It can also at times be referred to as an intersegment jump.
    4. Task switch - A jump to an instruction that's located in a different task. Note that a task switch can only be accomplished in protected-mode. I don't want to go too far off on a tangent here, but protected-mode is necessary to explain, and as it's a bit large, I'll explain it below in a short while.
  • CALL - Call procedure. CALL pushes the return address (the address of the instruction following the CALL) onto the hardware supported stack in memory, and then performs an unconditional jump to the code location indicated by the label operand. Unlike the simple jump instructions I listed above, the call instruction saves the location to return to when the subroutine completes.
  • JC - Jump if carry flag is set. This is a conditional jump, so control is only transferred to the destination if the carry flag (CF) is set.
  • RET - Return. This instruction pops the return address off the stack, which is usually placed on the stack by a call instruction that we discussed above. It then performs an unconditional jump to the retrieved code location. For example:

 call <label>  
 ret  

I mentioned intersegment/intrasegment jumps above. Intersegment jumps can transfer control to a statement in a different code segment, while intrasegment jumps are always between statements in the same code segment.
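
To show how CALL and RET work together with the stack and EIP, here's a toy Python model. It's a sketch under simplifying assumptions (near calls only, a dictionary standing in for memory), and the `ToyCpu` class and the addresses used are hypothetical.

```python
class ToyCpu:
    """A tiny model of EIP, ESP, and a stack, just enough for near CALL and RET."""

    def __init__(self, eip, esp):
        self.eip = eip
        self.esp = esp
        self.memory = {}                          # address -> 32-bit value

    def call(self, target, instruction_length):
        return_address = self.eip + instruction_length
        self.esp -= 4                             # the stack grows downwards
        self.memory[self.esp] = return_address    # push the return address
        self.eip = target                         # unconditional jump to the target

    def ret(self):
        self.eip = self.memory[self.esp]          # pop the return address into EIP
        self.esp += 4

cpu = ToyCpu(eip=0x1000, esp=0x8000)
cpu.call(target=0x2000, instruction_length=5)
after_call = (cpu.eip, cpu.esp)
cpu.ret()
after_ret = (cpu.eip, cpu.esp)
print([hex(v) for v in after_call], [hex(v) for v in after_ret])
```

After the call, EIP is at the subroutine and ESP has moved down by four bytes. After the ret, EIP is at the instruction following the call and ESP is back where it started.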

Now that we have the above control-transfer instructions explained, let's discuss protected-mode as I mentioned above. Before further discussing protected-mode and its counterpart, real-mode, we'll need to do a bit of a history lesson.

Way before my time, back in the late 70's (1976-78), Intel's 16-bit 8086 processor was developed and released. It was 16-bit because its internal registers and internal/external data buses were 16 bits wide. A 20-bit external address bus meant this beast could address a whopping 1 MB of memory! 1 MB may not seem like anything these days, but it was actually considered more than overkill around this time. However, because the internal registers were only 16 bits wide, a single segment (the most memory that could be addressed without changing a segment register) was limited to a mere 64 KB.

There were two problems here:

1. Programming over 64 KB boundaries meant adjusting segment registers.

2. As time went on, applications were being released that made this mere 64 KB seem like the measly number it is today.
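
The first problem is easier to see with the real-mode address calculation, which I haven't spelled out above: the 8086 forms a 20-bit physical address as segment × 16 + offset. Here's a minimal Python sketch of that (`physical_address` is a made-up helper, and the segment values are arbitrary examples).

```python
def physical_address(segment, offset):
    """8086 real-mode address: segment * 16 + offset, wrapped to 20 bits."""
    if not (0 <= segment <= 0xFFFF and 0 <= offset <= 0xFFFF):
        raise ValueError("segment and offset are 16-bit values")
    return ((segment << 4) + offset) & 0xFFFFF

last_in_segment = physical_address(0x1000, 0xFFFF)   # highest offset a 16-bit register can hold
next_byte = physical_address(0x2000, 0x0000)         # reaching it needs a new segment value
print(hex(last_in_segment), hex(next_byte))          # prints "0x1ffff 0x20000"
```

The byte right after the end of the 64 KB window can't be reached by bumping the offset any further, so the program has to adjust a segment register instead.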

What was the saving grace? Intel's 80286 processor was released in 1982! Well, what's so great about this processor that solved the above two problems? The 80286 processor had two operating modes, as opposed to the 8086 which only had one. The operating modes were:

1. Real-Mode (backwards compatible 8086 mode).

2. Protected-Mode.

The 80286 processor had a 24-bit address bus, which could address up to 16 MB! That's 15 MB more than its predecessor. That's enough of the good stuff though, so let's discuss the problems, which were considerably big ones:

- DOS apps couldn't easily be ported to protected-mode, given that most (if not all) DOS apps were developed in a way that made them incompatible.

- The 80286 processor couldn't revert back to the backwards compatible real-mode without a CPU reset. Later on however in 1984, IBM added circuitry that allowed a special series of instructions to reset the CPU and carry on in real-mode without restarting the whole machine. This method, while certainly a great feat, posed quite the performance penalty. Later on it was discovered that initiating a triple fault was a much faster and cleaner way to do it, but there was still yet to be a 'painless' method of transition.

With the above said, its successor was released, which is the 80386. The 80386 had a 32-bit address bus, which allowed for 4 GB of memory access. 1 MB > 16 MB > 4 GB, quite the increase! In addition, segment offsets were widened to 32 bits, which meant a single segment could span the full 4 GB address space without a need to switch between multiple segments.
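
The 1 MB > 16 MB > 4 GB progression falls straight out of the address bus widths, since n address bits can select 2^n bytes. A quick Python check (the helper name is made up):

```python
def addressable_bytes(address_bits):
    """Number of distinct byte addresses an address bus of this width can select."""
    return 2 ** address_bits

MB = 2 ** 20
sizes = {cpu: addressable_bytes(bits) // MB
         for cpu, bits in [("8086", 20), ("80286", 24), ("80386", 32)]}
print(sizes)   # prints "{'8086': 1, '80286': 16, '80386': 4096}"
```

4096 MB is of course the 4 GB mentioned above.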

Whew! With all of this known, what is protected-mode actually so great for at this point? Protected-mode allows for virtual memory, paging, the ability (as of the 80386) to finally painlessly switch back to real-mode without a CPU reset, etc.

How do we actually get into protected-mode? The Global Descriptor Table (GDT) needs to be created with a minimum of three entries and loaded with the LGDT instruction. These three entries are a null descriptor, a code segment descriptor, and a data segment descriptor. Afterwards, the PE bit needs to be set in the CR0 register and a far JMP needs to be made to clear the prefetch input queue (PIQ).

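 ; Assumes the GDT is already loaded (lgdt) and that its code segment  
 ; descriptor is entry 1, i.e. selector 08h (an assumption for this example)  
 cli          ; Interrupts off while switching modes  
 mov eax, cr0  
 or eax, 1    ; Set PE bit (bit 0 of CR0)  
 mov cr0, eax  
 ; Far JMP loads CS with the code segment selector and clears the PIQ  
 jmp 08h:pm  
 pm:  
 ; Now in protected-mode :)  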
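
For the curious, here's roughly what the three GDT entries mentioned above look like as raw 64-bit values. This is a sketch in Python, assuming the common 'flat' setup (base 0, limit 0xFFFFF with 4 KB granularity, ring 0), which is my assumption and not the only valid layout. `gdt_descriptor` is a hypothetical helper that packs the fields into the descriptor layout.

```python
def gdt_descriptor(base, limit, access, flags):
    """Pack a segment descriptor into its 64-bit GDT entry."""
    return (
        (limit & 0xFFFF)                    # limit bits 0-15
        | ((base & 0xFFFFFF) << 16)         # base bits 0-23
        | ((access & 0xFF) << 40)           # access byte (present, ring, type)
        | (((limit >> 16) & 0xF) << 48)     # limit bits 16-19
        | ((flags & 0xF) << 52)             # granularity and size flags
        | (((base >> 24) & 0xFF) << 56)     # base bits 24-31
    )

gdt = [
    0,                                        # entry 0: null descriptor
    gdt_descriptor(0, 0xFFFFF, 0x9A, 0xC),    # entry 1: code segment (assumed flat, ring 0)
    gdt_descriptor(0, 0xFFFFF, 0x92, 0xC),    # entry 2: data segment (assumed flat, ring 0)
]
for index, entry in enumerate(gdt):
    print(f"selector {index * 8:#04x}: {entry:#018x}")
```

Each entry is 8 bytes, so the selector for an entry is its index times 8, which is why the code segment in this layout ends up at selector 0x08.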

So we got to most of the registers from the x86 register dump excerpt, but we're missing these ones:


 nv up ei pl nz na pe nc  

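These are the current contents of the EFLAGS register, which is the status register for x86, as decoded by WinDbg. EFLAGS holds more flags than this, but the eight shown here are:

  • nv - No overflow (overflow flag clear).
  • up - Up (direction flag clear).
  • ei - Enable interrupt (interrupt flag set).
  • pl - Plus (sign flag clear, i.e. the last result was positive or zero).
  • nz - Not zero (zero flag clear).
  • na - Not auxiliary carry (auxiliary carry flag clear).
  • pe - Parity even (parity flag set).
  • nc - No carry (carry flag clear).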
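
These abbreviations come straight from the bits of the efl value in the dump (efl=00000206). Here's a small Python sketch that decodes them. The bit positions are the standard EFLAGS ones, and the set/clear abbreviations are WinDbg's flag codes as I know them; `decode_eflags` is a made-up helper.

```python
# (bit position, abbreviation when clear, abbreviation when set)
FLAGS = [
    (11, "nv", "ov"),   # overflow flag
    (10, "up", "dn"),   # direction flag
    (9,  "di", "ei"),   # interrupt flag
    (7,  "pl", "ng"),   # sign flag
    (6,  "nz", "zr"),   # zero flag
    (4,  "na", "ac"),   # auxiliary carry flag
    (2,  "po", "pe"),   # parity flag
    (0,  "nc", "cy"),   # carry flag
]

def decode_eflags(efl):
    """Return the flag abbreviations for an EFLAGS value, in the order WinDbg shows them."""
    return " ".join(when_set if (efl >> bit) & 1 else when_clear
                    for bit, when_clear, when_set in FLAGS)

print(decode_eflags(0x00000206))   # prints "nv up ei pl nz na pe nc"
```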

Disassembly Example

Let's now dump a stack from a random x86 crash dump to show an example of some of the registers we talked about, some assembly code, and instructions:

 ChildEBP RetAddr   
 a1e72ccc 81a8eec4 nt!KeBugCheckEx+0x1e  
 a1e72cf0 81a1e85f nt!PspCatchCriticalBreak+0x73  
 a1e72d20 81a1e806 nt!PspTerminateAllThreads+0x2c  
 a1e72d54 8185c986 nt!NtTerminateProcess+0x1c1  
 a1e72d54 77725d14 nt!KiSystemServicePostCall  
 0028f0d0 00000000 0x77725d14  

Let's go ahead and disassemble the code at nt!PspTerminateAllThreads+0x2c, which is inside the nt!PspTerminateAllThreads kernel function:

 nt!PspTerminateAllThreads+0x2c:  
 81a1e85f 8b450c     mov   eax,dword ptr [ebp+0Ch]  // Load the 32-bit value stored at memory address ebp+0Ch into the eax register.
 81a1e862 8b4048     mov   eax,dword ptr [eax+48h]  // Load the 32-bit value stored at memory address eax+48h into the eax register.
 81a1e865 8945f0     mov   dword ptr [ebp-10h],eax  // Store the contents of the eax register at memory address ebp-10h.
 81a1e868 6a00      push  0  // Push a 32-bit zero on the stack.
 81a1e86a 8bc7      mov   eax,edi  // Copy the contents of the edi register into the eax register.
 81a1e86c c745fc22010000 mov   dword ptr [ebp-4],122h  // Store the 32-bit value 122h at memory address ebp-4.
 81a1e873 e8bf010000   call  nt!PspGetPreviousProcessThread (81a1ea37) // Call the function nt!PspGetPreviousProcessThread.
 81a1e878 8b5d14     mov   ebx,dword ptr [ebp+14h]  // Load the 32-bit value stored at memory address ebp+14h into the ebx register.
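
One more thing worth pointing out in the disassembly above: the call at 81a1e873 is encoded as e8 followed by a 32-bit little-endian displacement, which is relative to the address of the next instruction. Here's a short Python sketch that works out the target from the raw bytes (`call_target` is a made-up helper and only handles this one encoding):

```python
def call_target(address, encoded):
    """Resolve the target of a near CALL (opcode E8 followed by a little-endian rel32)."""
    if len(encoded) != 5 or encoded[0] != 0xE8:
        raise ValueError("expected a 5-byte E8 rel32 call")
    displacement = int.from_bytes(encoded[1:], "little", signed=True)
    next_instruction = address + len(encoded)          # this is also the return address
    return (next_instruction + displacement) & 0xFFFFFFFF

target = call_target(0x81A1E873, bytes.fromhex("e8bf010000"))
print(hex(target))   # prints "0x81a1ea37"
```

That matches the (81a1ea37) WinDbg shows next to the call, and the next instruction address (81a1e878) is exactly what gets pushed on the stack as the return address.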

Hope you enjoyed reading!

References

Intel 80386 Reference Programmer's Manual
X86 Assembly/High-Level Languages
A Guide to Programming Intel IA32 PC Architecture
Segment Registers: Real mode vs. Protected mode
