Five Things That Make Go Fast (5/5): Goroutine Stack Management

Goroutine stack management

In the previous section I discussed how goroutines reduce the overhead of managing many, sometimes hundreds of thousands, of concurrent threads of execution. There is another side to the goroutine story, stack management, which leads me to my final topic.

This is a diagram of the memory layout of a process. The key thing we are interested in is the location of the heap and the stack.

Gocon-2014-39

Traditionally inside the address space of a process, the heap is at the bottom of memory, just above the program (text) and grows upwards. The stack is located at the top of the virtual address space, and grows downwards.

Because the heap and stack overwriting each other would be catastrophic, the operating system usually arranges to place an area of unwritable memory between the stack and the heap, so that if they did collide, the program would abort. This is called a guard page, and it effectively limits the stack size of a process, usually to the order of several megabytes.

Gocon-2014-40

We’ve discussed that threads share the same address space, so each thread must have its own stack.

Gocon-2014-41

Because it is hard to predict the stack requirements of a particular thread, a large amount of memory is reserved for each thread’s stack along with a guard page. The hope is that this is more than will ever be needed and the guard page will never be hit. The downside is that as the number of threads in your program increases, the amount of available address space is reduced.
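As a rough illustration of that cost, here is a back-of-the-envelope sketch. The 8 MiB stack reservation and 4 KiB guard page are assumptions picked for the example (real values depend on the operating system and its configuration), and `reserved` is a hypothetical helper, not part of any library.

```go
package main

import "fmt"

const MiB = 1 << 20

// reserved returns the address space, in bytes, set aside for n thread
// stacks, each stackSize bytes plus one guard region of guard bytes.
func reserved(n, stackSize, guard int64) int64 {
	return n * (stackSize + guard)
}

func main() {
	// Assumed values for illustration only.
	const stack, guard = 8 * MiB, 4096

	for _, n := range []int64{10, 100, 1000} {
		fmt.Printf("%5d threads reserve about %d MiB\n", n, reserved(n, stack, guard)/MiB)
	}
}
```

Under these assumptions a thousand threads reserve several gigabytes of address space, more than a 32-bit process has in total, even if most of that memory is never touched.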

We’ve seen that the Go runtime schedules a large number of goroutines onto a small number of threads, but what about the stack requirements of those goroutines?

Gocon-2014-42

Instead of using guard pages, the Go compiler inserts a check as part of every function call to test whether there is sufficient stack for the function to run. If there is not, the runtime can allocate more stack space. Because of this check, a goroutine’s initial stack can be made much smaller, which in turn permits Go programmers to treat goroutines as cheap resources.
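A small program makes both halves of that claim visible. This is a minimal sketch: `depth` and `spawn` are hypothetical helpers written for this example, and the counts are arbitrary. It starts a large number of goroutines, then runs one very deep recursion whose stack has to grow well beyond its initial size.

```go
package main

import (
	"fmt"
	"sync"
)

// depth recurses n levels deep. Every level needs its own stack frame, so a
// large n forces the runtime to give the goroutine more stack as it goes.
func depth(n int) int {
	if n == 0 {
		return 0
	}
	return 1 + depth(n-1)
}

// spawn starts n goroutines that each do a little work and returns how many
// of them ran to completion.
func spawn(n int) int {
	var (
		wg   sync.WaitGroup
		mu   sync.Mutex
		done int
	)
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = depth(10)
			mu.Lock()
			done++
			mu.Unlock()
		}()
	}
	wg.Wait()
	return done
}

func main() {
	fmt.Println(spawn(100000))  // many goroutines, each starting with a small stack
	fmt.Println(depth(1000000)) // one goroutine whose stack must grow repeatedly
}
```

Neither call needs any stack size to be declared up front; the check inserted at each function call is what lets the runtime hand out more stack only when a goroutine actually needs it.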

This is a slide that shows how stacks are managed in Go 1.2.

Gocon-2014-43

When G calls H there is not enough space for H to run, so the runtime allocates a new stack segment from the heap, then runs H on that new segment. When H returns, the segment is returned to the heap before control returns to G.

This method of managing the stack works well in general, but for certain types of code, usually recursive code, it can cause the inner loop of your program to straddle one of these stack boundaries.

For example, in the inner loop of your program, function G may call H many times in a loop. Each time this will cause a stack split. This is known as the hot split problem.
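The effect can be modelled with a toy simulation. This is only a sketch of the idea, not the runtime's real data structure: `segmentedStack`, the 4096-byte segment size, and the frame sizes are all assumptions chosen so that G's frame sits right at the end of a segment.

```go
package main

import "fmt"

// segmentedStack is a toy model of a segmented stack: a list of
// fixed-size segments, allocated and released on demand.
type segmentedStack struct {
	segSize int   // bytes per segment (assumed value)
	used    []int // bytes in use in each live segment
	splits  int   // number of times a new segment was allocated
}

func newSegmentedStack(segSize int) *segmentedStack {
	return &segmentedStack{segSize: segSize, used: []int{0}}
}

// call models the check made on function entry: if the frame does not fit
// in the current segment, a new segment is allocated for it.
func (s *segmentedStack) call(frame int) {
	top := len(s.used) - 1
	if s.used[top]+frame > s.segSize {
		s.used = append(s.used, 0)
		s.splits++
		top++
	}
	s.used[top] += frame
}

// ret pops a frame and releases the segment if it is now empty.
func (s *segmentedStack) ret(frame int) {
	top := len(s.used) - 1
	s.used[top] -= frame
	if s.used[top] == 0 && top > 0 {
		s.used = s.used[:top]
	}
}

// hotLoop models G calling H n times while G's frame nearly fills a segment.
func hotLoop(n int) int {
	s := newSegmentedStack(4096)
	s.call(4000) // G
	for i := 0; i < n; i++ {
		s.call(200) // H does not fit, so a segment is allocated
		s.ret(200)  // H returns, so the segment is released again
	}
	s.ret(4000)
	return s.splits
}

func main() {
	fmt.Println(hotLoop(1000)) // prints 1000: one split per iteration
}
```

Every iteration pays for an allocation and a release, which is exactly the cost you do not want inside an inner loop.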

Gocon-2014-44

To solve hot splits, Go 1.3 has adopted a new stack management method.

Instead of adding and removing additional stack segments, if the stack of a goroutine is too small, a new, larger stack will be allocated. The old stack’s contents are copied to the new stack, then the goroutine continues with its new larger stack. After the first call to H the stack will be large enough that the check for available stack space will always succeed. This resolves the hot split problem.
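A toy model shows why the cost is paid only once. As before this is a sketch under stated assumptions rather than the runtime's implementation: `copyingStack` is a hypothetical type, the sizes are arbitrary, and doubling is assumed here as the growth policy since the text above only says the new stack is larger.

```go
package main

import "fmt"

// copyingStack is a toy model of a contiguous stack that grows by
// allocating a larger stack and copying the old contents across.
type copyingStack struct {
	size   int // current capacity in bytes, must be greater than zero
	used   int // bytes in use
	grows  int // number of times a larger stack was allocated
	copied int // total bytes copied from old stacks to new ones
}

// call models the check made on function entry. On overflow the capacity is
// doubled (an assumed policy) until the frame fits, and the live bytes are
// counted as copied.
func (s *copyingStack) call(frame int) {
	if s.used+frame > s.size {
		newSize := s.size
		for s.used+frame > newSize {
			newSize *= 2
		}
		s.copied += s.used
		s.size = newSize
		s.grows++
	}
	s.used += frame
}

// ret pops a frame. The stack keeps its capacity, so nothing is released.
func (s *copyingStack) ret(frame int) { s.used -= frame }

// hotLoop models G calling H n times, starting from a stack that G nearly fills.
func hotLoop(n int) (grows, copied int) {
	s := &copyingStack{size: 4096}
	s.call(4000) // G
	for i := 0; i < n; i++ {
		s.call(200) // only the first call to H triggers a grow
		s.ret(200)
	}
	s.ret(4000)
	return s.grows, s.copied
}

func main() {
	grows, copied := hotLoop(1000)
	fmt.Println(grows, copied) // prints "1 4000"
}
```

The first call to H costs one allocation and one copy of G's frame; the remaining 999 calls pass the check without any further work.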

Gocon-2014-45

Values, Inlining, Escape Analysis, Goroutines, and segmented/copying stacks. These are the five features that I chose to speak about today, but they are by no means the only things that make Go a fast programming language, just as there are more than three reasons that people cite as their reason to learn Go.

Gocon-2014-46

As powerful as these five features are individually, they do not exist in isolation.

For example, the way the runtime multiplexes goroutines onto threads would not be nearly as efficient without growable stacks. Inlining reduces the cost of the stack size check by combining smaller functions into larger ones. Escape analysis reduces the pressure on the garbage collector by automatically moving allocations from the heap to the stack. Escape analysis also provides better cache locality. Without growable stacks, escape analysis might place too much pressure on the stack.

 

Original article link. Published 2014/06/07.