mikeash.com: Friday Q & A 2017-06-30: Dissecting objc_msgSend on ARM64


We're back! During the week of WWDC, I spoke at CocoaConf Next Door, and one of my talks involved a dissection of objc_msgSend's ARM64 implementation. I thought that turning it into an article would make for a nice return to blogging for Friday Q&A.

Overview
Every Objective-C object has a class, and every Objective-C class has a list of methods. Each method has a selector, a function pointer to the implementation, and some metadata. The job of objc_msgSend is to take the object and selector that are passed in, look up the corresponding method's function pointer, and then jump to that function pointer.

Looking up a method can be extremely complicated. If a method isn't found on a class, the search needs to continue in the superclasses. If no method is found at all, it needs to call into the runtime's message forwarding code. If this is the very first message being sent to a particular class, it has to call that class's +initialize method.

Looking up a method also needs to be extremely fast in the common case, since it's done for every method call. This, of course, is in conflict with the complicated lookup process.

Objective-C's solution to this conflict is the method cache. Each class has a cache which stores methods as pairs of selectors and function pointers, known in Objective-C as IMPs. They're organized as a hash table so lookups are fast. When looking up a method, the runtime first consults the cache. If the method isn't in the cache, it follows the slow, complicated procedure, and then places the result into the cache so that the next time can be fast.

objc_msgSend is written in assembly. There are two reasons for this: one is that it's not possible to write a function in C which preserves unknown arguments and jumps to an arbitrary function pointer. The language just doesn't have the necessary features to express such a thing. The other reason is that it's extremely important for objc_msgSend to be fast, so every last instruction of it is written by hand so it can go as fast as possible.

Naturally, you don't want to write the whole complicated message lookup procedure in assembly language. It's not necessary, either, because things are going to be slow no matter what from the moment you start going through it. The message send code can be divided into two parts: there's the fast path in objc_msgSend itself, which is written in assembly, and the slow path implemented in C. The assembly part looks up the method in the cache and jumps to it if it's found. If the method is not in the cache, it calls into the C code to handle things.

Therefore, when looking at objc_msgSend itself, it does the following:

  1. Get the class of the object passed in.
  2. Use the selector passed in to look up the method in that class's cache.
  3. If it's not in the cache, call into the C code.
  4. Jump to the IMP for the method.
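The steps above can be modeled in plain C, with one big caveat: C cannot forward unknown arguments and jump to the result, so this sketch only returns the IMP to its caller. Every name here (toy_class, toy_lookup, and so on) is hypothetical, and the cache is a deliberately naive direct-mapped table rather than the runtime's real structure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef int (*IMP)(void);        /* stand-in for a method implementation */
typedef uintptr_t SEL;           /* selectors modeled as nonzero integers */

enum { TOY_CACHE_SIZE = 8 };     /* a power of two, chosen for illustration */

struct toy_entry { SEL sel; IMP imp; };

struct toy_class {
    struct toy_entry cache[TOY_CACHE_SIZE];  /* sel == 0 marks an empty slot */
    IMP (*slow_lookup)(SEL sel);             /* stand-in for the slow path in C */
    int slow_calls;                          /* how many times the slow path ran */
};

struct toy_object { struct toy_class *isa; };

static IMP toy_lookup(struct toy_object *self, SEL sel) {
    assert(self != NULL && sel != 0);
    struct toy_class *cls = self->isa;                       /* 1. get the class */
    struct toy_entry *e = &cls->cache[sel & (TOY_CACHE_SIZE - 1)];
    if (e->sel == sel)                                       /* 2. cache hit */
        return e->imp;
    cls->slow_calls++;                                       /* 3. miss: slow path */
    IMP imp = cls->slow_lookup(sel);
    e->sel = sel;                                            /*    remember the result */
    e->imp = imp;
    return imp;                                              /* 4. caller invokes the IMP */
}

/* Example method and slow path used to exercise the sketch. */
static int toy_method(void) { return 42; }
static IMP toy_slow_lookup(SEL sel) { (void)sel; return toy_method; }
```

Looking up the same selector twice on one object runs the slow path only once. The second lookup is served from the cache, which is the whole point of the design.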

How does it do all of that? Let's see!

Instruction by Instruction
objc_msgSend has a few different paths it can take depending on the circumstances. It has special code for handling things like messages to nil, tagged pointers, and hash table collisions. I'll start by looking at the most common, straight-line case, where a message is sent to a non-nil, non-tagged pointer and the method is found in the cache without any need to scan. I'll note the various branching-off points as we go through them, and once we're done with the common path I'll circle back and look at all of the others.

I'll list each instruction or group of instructions followed by a description of what it does and why. Just remember to look up to find the instruction that any given piece of text is discussing.

Each instruction is preceded by its offset from the beginning of the function. This serves as a counter, and lets you identify jump targets.

ARM64 has 31 integer registers which are 64 bits wide. They're referred to with the notation x0 through x30. It's also possible to access the lower 32 bits of each register as if it were a separate register, using w0 through w30. Registers x0 through x7 are used to pass the first eight parameters to a function. That means that objc_msgSend receives the self parameter in x0 and the selector _cmd parameter in x1.

Let's begin!

      0x0000 cmp    x0, #0x0
      0x0004 b.le   0x6c

This performs a signed comparison of self with 0 and jumps elsewhere if the value is less than or equal to zero. A value of zero is nil, so this handles the special case of messages to nil. This also handles tagged pointers. Tagged pointers on ARM64 are indicated by setting the high bit of the pointer. (This is an interesting contrast with x86-64, where it's the low bit.) If the high bit is set, the value is negative when interpreted as a signed integer. For the common case of self being a normal pointer, the branch is not taken.

      0x0008 ldr    x13, [x0]

This loads self's isa by loading the 64-bit quantity pointed to by x0, which contains self. The x13 register now contains the isa.
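The sign test performed by the cmp and b.le pair can be sketched in C. The function name is made up for illustration, and the cast assumes the usual two's complement conversion that every mainstream compiler provides.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* True when the branch at 0x0004 would be taken: self is nil (zero) or
   has its top bit set, which marks a tagged pointer on ARM64. */
static bool leaves_fast_path(uint64_t self) {
    return (int64_t)self <= 0;   /* one signed compare covers both cases */
}
```

A single comparison filters out both special cases, which is why the check costs the common path only two instructions.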

      0x000c and    x16, x13, #0xffffffff8

ARM64 can use non-pointer isas. Traditionally, the isa points to the object's class, but a non-pointer isa takes advantage of spare bits by cramming other information into the isa as well. This instruction performs a logical AND to mask off all of the extra bits, and leaves the actual class pointer in x16.
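As a rough C equivalent of that masking step, here is a minimal sketch. The mask value is the one from the instruction above; the function name, and the sample bit patterns used to try it out, are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define ISA_MASK 0xffffffff8ULL   /* immediate used by the and at 0x000c */

/* Strips the extra bits of a non-pointer isa, leaving the class pointer. */
static uint64_t class_from_isa(uint64_t isa) {
    return isa & ISA_MASK;
}
```

An isa that is already a plain, 8-byte-aligned class pointer below 2^36 passes through unchanged, so the same instruction works for both kinds of isa.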

      0x0010 ldp    x10, x11, [x16, #0x10]

This is my favorite instruction in objc_msgSend. It loads the class's cache information into x10 and x11. The ldp instruction loads two registers' worth of data from memory into the registers named in the first two arguments. The third argument describes where to load the data from, in this case at offset 16 from x16, which is the area of the class that holds the cache information. The cache itself looks like this:

      typedef uint32_t mask_t;

      struct cache_t {
          struct bucket_t *_buckets;  // pointer to the hash table
          mask_t _mask;               // table size minus one
          mask_t _occupied;           // number of entries in use
      };

Following the ldp instruction, x10 contains the value of _buckets, and x11 contains _occupied in its high 32 bits and _mask in its low 32 bits.

_occupied specifies how many entries the hash table contains, and plays no role in objc_msgSend. _mask is important: it describes the size of the hash table as a convenient AND-able mask. Its value is always a power of two minus 1, or in binary terms something that looks like 000000001111111 with a variable number of 1s at the end. This value is needed to figure out the lookup index for a selector, and to wrap around the end when searching the table.

      0x0014 and    w12, w1, w11

This instruction computes the starting hash table index for the selector passed in as _cmd. x1 contains _cmd, so w1 contains the bottom 32 bits of _cmd. w11 contains _mask as mentioned above. This instruction ANDs the two together and places the result into w12. The result is the equivalent of computing _cmd % table_size but without the expensive modulo operation.
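The equivalence between the AND and the modulo is easy to check in C. This is a small sketch with a hypothetical function name; it relies only on the mask being a power of two minus one, as described above.

```c
#include <assert.h>
#include <stdint.h>

/* Models `and w12, w1, w11`: truncate the selector to 32 bits, then mask. */
static uint32_t cache_index(uint64_t sel, uint32_t mask) {
    assert((mask & (mask + 1)) == 0);   /* mask must be 2^n - 1 */
    return (uint32_t)sel & mask;
}
```

Because the table size divides 2^32, dropping the upper half of the selector does not change the remainder, so cache_index(sel, mask) always equals sel % (mask + 1).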

      0x0018 add    x12, x10, x12, lsl #4

The index is not enough. To start loading data from the table, we need the actual address to load from. This instruction computes that address by adding the table index to the table pointer. It shifts the table index left by 4 bits first, which multiplies it by 16, because each table bucket is 16 bytes. x12 now contains the address of the first bucket to search.
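In C, ordinary pointer arithmetic performs this scaling for you. The sketch below spells out the shift explicitly to mirror the instruction; the struct is a simplified stand-in for the runtime's bucket, with both fields modeled as 64-bit integers.

```c
#include <assert.h>
#include <stdint.h>

struct bucket { uint64_t sel; uint64_t imp; };   /* two 8-byte fields: 16 bytes */

/* Models `add x12, x10, x12, lsl #4`. */
static struct bucket *bucket_address(struct bucket *buckets, uint32_t index) {
    assert(sizeof(struct bucket) == 16);
    uintptr_t base = (uintptr_t)buckets;                       /* x10 */
    return (struct bucket *)(base + ((uintptr_t)index << 4));  /* x10 + index * 16 */
}
```

The result is the same address as &buckets[index]. The assembly simply has to do by hand what the C compiler derives from the element size.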

      0x001c ldp    x9, x17, [x12]

Our friend ldp makes another appearance. This time it's loading from the pointer in x12, which points to the bucket to search. Each bucket contains a selector and an IMP. x9 now contains the selector for the current bucket, and x17 contains the IMP.

      0x0020 cmp    x9, x1
      0x0024 b.ne   0x2c

These instructions compare the bucket's selector in x9 with _cmd in x1. If they're not equal, this bucket does not contain an entry for the selector we're looking for, and in that case the second instruction jumps to offset 0x2c, which handles non-matching buckets. If the selectors do match, we've found the entry we're looking for, and execution continues with the next instruction.

      0x0028 br     x17

This performs an unconditional jump to x17, which contains the IMP loaded from the current bucket. From here, execution continues in the actual implementation of the target method, and this is the end of objc_msgSend's fast path. All of the argument registers have been left undisturbed, so the target method receives all of the passed-in arguments just as if it had been called directly.

When everything is cached and all the stars align, this path can execute in less than 3 nanoseconds on modern hardware.

That's the fast path. How about the rest of the code? Let's continue with the code for a non-matching bucket.

      0x002c cbz    x9, __objc_msgSend_uncached

x9 contains the selector loaded from the bucket. This instruction compares it with zero and jumps to __objc_msgSend_uncached if it's zero. A zero selector indicates an empty bucket, and an empty bucket means that the search has failed. The target method isn't in the cache, and it's time to fall back to the C code that performs a more comprehensive lookup. __objc_msgSend_uncached handles that. Otherwise, the bucket doesn't match but is not empty, and the search continues.

      0x0030 cmp    x12, x10
      0x0034 b.eq   0x40

This instruction compares the current bucket address in x12 with the beginning of the hash table in x10. If they match, it jumps to code that wraps the search back to the end of the hash table. We haven't seen it yet, but the hash table search being performed here actually runs backwards. The search examines decreasing indexes until it hits the beginning of the table, then it starts over at the end. I'm not sure why it works this way rather than the more common approach of increasing addresses that wrap to the beginning, but it's a safe bet that it's because it ends up being faster this way.

Offset 0x40 handles the wraparound case. Otherwise, execution proceeds to the next instruction.

      0x0038 ldp    x9, x17, [x12, #-0x10]!

Another ldp, once again loading a cache bucket. This time, it loads from offset -0x10 relative to the address of the current cache bucket. The exclamation point at the end of the address reference is an interesting feature. It indicates a register write-back, which means that the register is updated with the newly computed address. In this case, it's effectively doing x12 -= 16 in addition to loading the new bucket, which makes x12 point to that new bucket.
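The combined decrement and load behaves like a pre-decrement followed by a dereference in C. The following sketch uses a hypothetical helper and a simplified bucket with two 64-bit fields.

```c
#include <assert.h>
#include <stdint.h>

struct bucket { uint64_t sel; uint64_t imp; };

/* Models `ldp x9, x17, [x12, #-0x10]!`: step the cursor back one bucket,
   keep the updated cursor, and load both fields from the new address. */
static struct bucket load_previous(struct bucket **cursor) {
    *cursor -= 1;        /* write-back: x12 -= 16 */
    return **cursor;     /* sel goes to x9, imp goes to x17 */
}
```

Doing both in one instruction saves a separate subtraction on every step of the scan.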

      0x003c b      0x20

Now that the new bucket is loaded, execution can resume with the code that checks to see if the current bucket is a match. This loops back up to the instruction labeled 0x0020 above, and runs through all of that code again with the new values. If it continues to find non-matching buckets, this code keeps running until it finds a match, finds an empty bucket, or hits the beginning of the table.

      0x0040 add    x12, x12, w11, uxtw #4

This is the target for when the search wraps. x12 contains a pointer to the current bucket, which in this case is also the first bucket. w11 contains the table mask, which is the size of the table minus one. This adds the two together, while also shifting w11 left by 4 bits, which multiplies it by 16. The result is that x12 now points to the last bucket in the table, and the search can resume from there.

      0x0044 ldp    x9, x17, [x12]

The now-familiar ldp loads the new bucket into x9 and x17.

      0x0048 cmp    x9, x1
      0x004c b.ne   0x54
      0x0050 br     x17

This code checks to see if the bucket matches and, if so, jumps to the bucket's IMP. It's a duplicate of the code at 0x0020 above.

      0x0054 cbz    x9, __objc_msgSend_uncached

Just like before, if the bucket is empty then it's a cache miss, and execution proceeds into the comprehensive lookup code implemented in C.

      0x0058 cmp    x12, x10
      0x005c b.eq   0x68

This checks for wraparound again, and jumps to 0x68 if we've hit the beginning of the table a second time. In this case, it jumps into the comprehensive lookup code implemented in C:

      0x0068 b      __objc_msgSend_uncached

This is something that should never actually happen. The table grows as entries are added to it, and it's never 100% full. Hash tables become inefficient when they're too full because collisions become too common.

Why is this here? A comment in the source code explains:

      Clone scanning loop to miss instead of hang when cache is corrupt. The slow path may detect any corruption and halt later.

I doubt that this is common, but evidently the folks at Apple have seen memory corruption which caused the cache to be filled with bad entries, and jumping into the C code improves the diagnostics.

The existence of this check should have minimal impact on code that doesn't suffer from this corruption. Without it, the original loop could be reused, which would save a bit of instruction cache space, but the effect is minimal. This wraparound handler is not the common case anyway. It will only be invoked for selectors that get sorted near the beginning of the hash table, and then only if there's a collision and all of the prior entries are occupied.

      0x0060 ldp    x9, x17, [x12, #-0x10]!
      0x0064 b      0x48

The remainder of this loop is the same as before. Load the next bucket into x9 and x17, update the bucket pointer in x12, and go back to the top of the loop.
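Putting the whole scan together, the two loops can be approximated in C as follows. This is a sketch of the control flow described above, not the runtime's code: the bucket layout is simplified to two 64-bit fields, and returning 0 stands in for the branch to __objc_msgSend_uncached.

```c
#include <assert.h>
#include <stdint.h>

struct bucket { uint64_t sel; uint64_t imp; };   /* sel == 0 marks an empty bucket */

/* Returns the cached imp, or 0 where the assembly would fall back to C. */
static uint64_t cache_lookup(struct bucket *buckets, uint32_t mask, uint64_t sel) {
    assert((mask & (mask + 1)) == 0);                       /* mask must be 2^n - 1 */
    struct bucket *b = buckets + ((uint32_t)sel & mask);    /* 0x0014 to 0x001c */

    /* First loop (0x0020 to 0x003c): scan downward toward the first bucket. */
    for (;;) {
        if (b->sel == sel) return b->imp;   /* hit: br x17 */
        if (b->sel == 0)   return 0;        /* empty bucket: cache miss */
        if (b == buckets)  break;           /* reached the start: wrap */
        b--;
    }

    b = buckets + mask;                     /* 0x0040: continue at the last bucket */

    /* Cloned loop (0x0048 to 0x0064): a second wrap means a corrupt cache. */
    for (;;) {
        if (b->sel == sel) return b->imp;
        if (b->sel == 0)   return 0;
        if (b == buckets)  return 0;        /* 0x0068: give up and take the slow path */
        b--;
    }
}
```

The duplicated loop is what guarantees termination: even a table with no empty buckets and no match is scanned at most twice before the function gives up.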

That's the end of the main body of objc_msgSend. What remains are the special cases for nil and tagged pointers.

Tagged Pointer Handler
You'll recall that the very first instructions checked for those and jumped to offset 0x6c to handle them. Let's continue from there:

      0x006c b.eq   0xa4

We've arrived here because self is less than or equal to zero. Less than zero indicates a tagged pointer, and zero is nil. The two cases are handled completely differently, so the first thing the code does here is check whether self is nil or not. If self is equal to zero, this instruction branches to 0xa4, which is where the nil handler lives. Otherwise, it's a tagged pointer, and execution continues with the next instruction.

Before we move on, let's briefly discuss how tagged pointers work. Tagged pointers support multiple classes. The top four bits of the tagged pointer (on ARM64) indicate which class the "object" is. They are essentially the tagged pointer's isa. Of course, four bits isn't nearly enough to hold a class pointer. Instead, there's a special table which stores the available tagged pointer classes. The class of a tagged pointer "object" is found by looking up the index in that table which corresponds to the top four bits.

This isn't the whole story. Tagged pointers (at least on ARM64) also support extended classes. When the top four bits are all set to 1, the next eight bits are used to index into an extended tagged pointer class table. This allows the runtime to support more tagged pointer classes, at the cost of having less storage for them.

Let's go on.

      0x0070 mov    x10, #-0x1000000000000000

This sets x10 to an integer value with the top four bits set and all other bits set to zero. This will serve as a mask to extract the tag bits from self.

      0x0074 cmp    x0, x10
      0x0078 b.hs   0x90

This checks for an extended tagged pointer. If self is greater than or equal to the value in x10 when compared as unsigned integers, that means the top four bits are all set. In that case, branch to 0x90, which handles extended classes. Otherwise, use the primary tagged pointer table.
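The unsigned comparison against the mask can be sketched in C like this. The function name is hypothetical; the constant is the value placed in x10 at offset 0x0070.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TAG_MASK 0xf000000000000000ULL   /* same bits as -0x1000000000000000 */

/* Models cmp x0, x10 followed by b.hs (unsigned "higher or same"). */
static bool is_extended_tagged(uint64_t self) {
    return self >= TAG_MASK;
}
```

Any value at or above the mask necessarily has all four top bits set, so one unsigned compare replaces a shift plus an equality test.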

      0x007c adrp   x10, _objc_debug_taggedpointer_classes@PAGE
      0x0080 add    x10, x10, _objc_debug_taggedpointer_classes@PAGEOFF

This little song and dance loads the address of _objc_debug_taggedpointer_classes, which is the primary tagged pointer table. ARM64 requires two instructions to load the address of a symbol. This is a standard technique on RISC-like architectures. Pointers on ARM64 are 64 bits wide, and instructions are only 32 bits wide. It's not possible to fit an entire pointer into one instruction.

x86 doesn't suffer from this problem, since it has variable-length instructions. It can just use a 10-byte instruction, where two bytes identify the instruction itself and the target register, and eight bytes hold the pointer value.

On a machine with fixed-length instructions, you load the value in pieces. In this case, only two pieces are needed. The adrp instruction loads the top part of the value, and the add then adds in the bottom part.
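To make the two pieces concrete, here is a rough C model of how the address is assembled. It assumes the 4 KB page granularity that adrp uses and the fact that adrp works relative to the page of the current instruction. The function names and the sample addresses used to exercise it are invented for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define LOW12 0xfffULL

static uint64_t page_of(uint64_t addr)     { return addr & ~LOW12; }  /* @PAGE */
static uint64_t page_off_of(uint64_t addr) { return addr &  LOW12; }  /* @PAGEOFF */

/* Models adrp followed by add for an instruction located at pc. */
static uint64_t materialize(uint64_t pc, uint64_t target) {
    /* The page distance is what gets encoded in the adrp instruction. */
    int64_t page_delta = (int64_t)(page_of(target) - page_of(pc));
    assert(page_delta >= -(1LL << 32) && page_delta < (1LL << 32));  /* about +/- 4 GB */
    uint64_t x10 = page_of(pc) + (uint64_t)page_delta;   /* adrp: top part */
    return x10 + page_off_of(target);                    /* add: bottom 12 bits */
}
```

Since the page part is relative to the code's own location, the pair keeps working no matter where the library is loaded in memory.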

      0x0084 lsr    x11, x0, #60

The tagged class index is in the top four bits of x0. To use it as an index, it has to be shifted right by 60 bits so that it becomes an integer in the range 0-15. This instruction performs that shift and places the index into x11.

      0x0088 ldr    x16, [x10, x11, lsl #3]

This uses the index in x11 to load the entry from the table that x10 points to. The x16 register now contains the class of this tagged pointer.

      0x008c b      0x10

With the class in x16, we can now branch back to the main code. The code starting at offset 0x10 assumes that the class pointer is loaded into x16 and performs dispatch from there. The tagged pointer handler can therefore just branch back to that code rather than duplicating the logic here.
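The shift and the indexed load amount to a plain array lookup in C. In this sketch, the table parameter is a stand-in for _objc_debug_taggedpointer_classes, and classes are modeled as opaque pointers.

```c
#include <assert.h>
#include <stdint.h>

/* Models lsr x11, x0, #60 followed by ldr x16, [x10, x11, lsl #3]. */
static void *tagged_class(uint64_t self, void *const classes[16]) {
    uint64_t index = self >> 60;   /* top four bits become 0-15 */
    assert(index < 16);
    return classes[index];         /* 8-byte entries, hence the lsl #3 */
}
```

For example, a tagged pointer whose top hex digit is 0xa selects entry 10 of the table.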

      0x0090 adrp   x10, _objc_debug_taggedpointer_ext_classes@PAGE
      0x0094 add    x10, x10, _objc_debug_taggedpointer_ext_classes@PAGEOFF

The extended tagged class handler looks similar. These two instructions load the pointer to the extended table.

      0x0098 ubfx   x11, x0, #52, #8

This instruction loads the extended class index. It extracts 8 bits starting from bit 52 in self into x11.

      0x009c ldr    x16, [x10, x11, lsl #3]

Just like before, this index is used to look up the class in the table and load it into x16.

      0x00a0 b      0x10

With the class in x16, it can branch back into the main code.
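The bitfield extraction done by ubfx is a shift followed by a mask. Here is a minimal C sketch of it; the helper names are made up, and the general helper is restricted to widths below 64 to keep the shift well defined.

```c
#include <assert.h>
#include <stdint.h>

/* Unsigned bitfield extract: `width` bits of `value` starting at bit `lsb`. */
static uint64_t ubfx(uint64_t value, unsigned lsb, unsigned width) {
    assert(width >= 1 && width <= 63 && lsb + width <= 64);
    return (value >> lsb) & ((1ULL << width) - 1);
}

/* Models `ubfx x11, x0, #52, #8`: the extended tag sits in bits 52-59. */
static uint64_t extended_tag_index(uint64_t self) {
    return ubfx(self, 52, 8);
}
```

With eight bits of index, the extended table can hold up to 256 classes, compared with the 16 entries addressable by the primary four-bit tag.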

That's nearly everything. All that remains is the nil handler.

nil Handler
Finally, we get to the nil handler. Here it is, in its entirety.

      0x00a4 mov    x1, #0x0
      0x00a8 movi   d0, #0000000000000000
      0x00ac movi   d1, #0000000000000000
      0x00b0 movi   d2, #0000000000000000
      0x00b4 movi   d3, #0000000000000000
      0x00b8 ret

The nil handler is completely different from the rest of the code. There's no class lookup or method dispatch. All it does for nil is return 0 to the caller.

This task is a bit complicated by the fact that objc_msgSend doesn't know what kind of return value the caller expects. Is this method returning one integer, or two, or a floating-point value, or nothing at all?

Fortunately, all of the registers used for return values can be safely overwritten even if they're not being used for this particular call's return value. Integer return values are stored in x0 and x1, and floating-point return values are stored in vector registers v0 through v3. Multiple registers are used for returning smaller structs.

This code clears x1 and v0 through v3. The d0 through d3 registers refer to the bottom half of the corresponding v registers, and storing into them clears the top half, so the effect of the four movi instructions is to clear those four registers. After doing this, it returns control to the caller.

You might wonder why this code doesn't clear x0. The answer to that is simple: x0 holds self, which in this case is nil, so it's already zero! You can save an instruction by not clearing x0 since it already holds the value we want.

What about larger struct returns that don't fit into registers? This requires a little cooperation from the caller. Large struct returns are performed by having the caller allocate enough memory for the return value, and then passing the address of that memory in x8. The function then writes to that memory to return a value. objc_msgSend can't clear this memory, because it doesn't know how big the return value is. To solve this, the compiler generates code which fills the memory with zeroes before calling objc_msgSend.
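The caller-side cooperation can be illustrated in C. This is only a model of the idea: send_to_nil is a hypothetical stand-in for a message send that ends up in the nil handler, and the explicit result pointer plays the role of the address passed in x8.

```c
#include <assert.h>
#include <string.h>

struct big { double values[8]; };   /* too large to come back in registers */

/* Stand-in for a send that hits the nil handler: it returns without
   ever touching the caller's result buffer. */
static void send_to_nil(struct big *result) {
    (void)result;
}

static struct big call_with_zeroed_result(void) {
    struct big result;
    memset(&result, 0, sizeof result);   /* the zero fill done before the call */
    send_to_nil(&result);
    return result;
}
```

Because the buffer is zeroed before the call, the caller sees an all-zero struct even though the callee never wrote anything to it.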

That's the end of the nil handler, and of objc_msgSend as a whole.

Conclusion
It's always interesting to dive into framework internals. objc_msgSend in particular is a work of art, and delightful to read through.

That's it for today. Come back next time for more squishy goodness. Friday Q&A is driven by reader input, so if you have something you'd like to see discussed here, send it in!

