Introduction to coroutine

最新推荐文章于 2025-03-06 00:58:19 发布

delphiwcdj

最新推荐文章于 2025-03-06 00:58:19 发布

阅读量1.5k

点赞数

分类专栏： C/C++ GNU/Linux Coroutine Go in Action

本文链接：https://blog.csdn.net/delphiwcdj/article/details/52792944

版权

C/C++ 同时被 3 个专栏收录

194 篇文章

订阅专栏

GNU/Linux

114 篇文章

订阅专栏

Go in Action

14 篇文章

订阅专栏

本文介绍了协程的概念，对比了不同类型的线程和切换方式，详细讲解了Go语言中goroutine的调度器设计，包括系统调用、工作窃取等策略，并探讨了协程的优缺点及其在实际应用中的优化。

摘要生成于 C知道，由 DeepSeek-R1 满血版支持，前往体验 >

导言：本文是在小组内的一个分享，介绍协程实现的几种方法和优化策略，对比GoLang中goroutine实现方式及调度器的设计，与常见后台服务器设计模式对比，使用协程的优劣分析。

some questions

Q1: multitasking ?

time-sharing (1960s，voluntarily/hardware interrupt to relinquish the CPU)
real-time (real-time constraint)

Multiple processes（heavyweight）are allowed to share processors (CPUs) and other system resources.
Threads（lightweight）are scheduled preemptively and are described as lightweight processes because switching between threads does not involve changing the memory context. It can make use of machines with multiple processors.
other ?

Q2: context switch ?

Most commonly, within some scheduling scheme, one process must be switched out of the CPU so another process can run. This context switch can be triggered by the process making itself unrunnable, such as by waiting for an I/O or synchronization operation to complete. On a pre-emptive multitasking system, the scheduler may also switch out processes which are still runnable. To prevent other processes from being starved of CPU time, preemptive schedulers often configure a timer interrupt to fire when a process exceeds its time slice. This interrupt ensures that the scheduler will gain control to perform a context switch.

In computing, a context switch is the process of storing and restoring the state (more specifically, the execution context) of a process or thread so that execution can be resumed from the same point at a later time. This enables multiple processes to share a single CPU and is an essential feature of a multitasking operating system.

thread switch/process switch/task switch
mode switch (switching between user mode and kernel mode)
register switch
stack frame switch
address space switch (changing virtual memory to physical memory map)

Q3: the cost of context switches ?

Context switches are usually computationally intensive. Switching from one process to another requires a certain amount of time for doing the administration – saving and loading registers and memory maps, updating various tables and lists, etc. For example, in the Linux kernel, context switching involves switching registers, stack pointer, and program counter.

Q4: context switching can be selective and store only those registers that need storing ?

context switching between threads in the same process is typically faster than context switching between processes. Thread switching is relatively cheap: it requires a context switch (saving and restoring registers and stack pointer), but does not change virtual memory and is thus cache-friendly (leaving TLB valid).
a context switch between threads requires system calls (involving the OS kernel), which can cost more than thousand CPU cycles on x86 CPUs. By contrast, transferring control among them requires only fewer than hundred CPU cycles because it does not involve system calls as it is done within a single thread.
Making coroutines fast (reuse of stacks and a lightweight swapcontext implementation.)

Q5: pre-emptive multitasking (time slice/instruction cycle) VS co-operative multitasking (non-preemptive multitasking) ?

Preemptive multitasking involves the use of an interrupt mechanism which suspends the currently executing process and invokes a scheduler to determine which process should execute next. Therefore, all processes will get some amount of CPU time at any given time.

As a cooperatively multitasked system relies on each process regularly giving up time to other processes on the system, one poorly designed program can consume all of the CPU time for itself, either by performing extensive calculations or by busy waiting; both would cause the whole system to hang. In a server environment, this is a hazard that makes the entire environment unacceptably fragile. A cooperative multitasking system wherein processes or tasks must be explicitly programmed to yield when they do not need system resources.

user thread

Coroutines（a language-level construct）are computer program components that generalize subroutines for nonpreemptive multitasking, by allowing multiple entry points for suspending and resuming execution at certain locations. Coroutines are well-suited for implementing more familiar program components such as cooperative tasks, exceptions, event loop, iterators, infinite lists and pipes.

According to Donald Knuth, the term coroutine was coined by Melvin Conway in 1958. Coroutines originated as an assembly language method, but are supported in some high-level programming languages. Early examples include Simula(1965) and Modula-2(1978). More recent examples are Ruby(1995), Lua(1993), and Go(2009).

var q := new queue

coroutine produce
    loop
        while q is not full
            create some new items
            add the items to q
        yield to consume

coroutine consume
    loop
        while q is not empty
            remove some items from q
            use the items
        yield to produce

yield：In computer science, yield is an action that occurs in a computer program during multithreading, of forcing a processor to relinquish control of the current running thread, and sending it to the end of the running queue, of the same scheduling priority.

switch-case

Stack-based coroutines
Coroutines in C

// have ten successive calls to the function return the numbers 0 through 9. 
#include <stdio.h>

int function(void) 
{
    static int i = 0;
    for (; i < 10; ) {
        return i++;
    }

    return 0;
}

int function2(void) {
    static int i, state = 0;
    switch (state) {
    case 0: goto LABEL0;
    case 1: goto LABEL1;
    }
LABEL0: /* start of function */
    for (i = 0; i < 10; i++) {
        state = 1; /* so we will come back to LABEL1 */
        return i;
LABEL1:; /* resume control straight after the return */
    }

    return -1;
}

int function3(void) 
{
    static int i, state = 0;
    switch (state) {
    case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = 1; /* so we will come back to "case 1" */
            return i;
        case 1:; /* resume control straight after the return */
        }
    }

    return -1;
}

int function4(void) 
{
    static int i, state = 0;
    switch (state) {
    case 0: /* start of function */
        for (i = 0; i < 10; i++) {
            state = __LINE__ + 2; /* so we will come back to "case __LINE__" */
            return i;
        case __LINE__:; /* resume control straight after the return */
        }
    }

    return -1;
}

#define Begin() static int state=0; switch(state) { case 0:
#define Yield(x) do { state=__LINE__; return x; case __LINE__:; } while (0)
#define End() }
int function5(void) 
{
    static int i;
    Begin();
    for (i = 0; i < 10; i++)
        Yield(i);
    End();

    return -1;
}

int main(int argc, char *argv[])
{
    printf("function() return [%d]\n", function());
    printf("function() return [%d]\n", function());

    printf("function2() return [%d]\n", function2());
    printf("function2() return [%d]\n", function2());

    printf("function3() return [%d]\n", function3());
    printf("function3() return [%d]\n", function3());

    printf("function4() return [%d]\n", function4());
    printf("function4() return [%d]\n", function4());

    printf("function5() return [%d]\n", function5());
    printf("function5() return [%d]\n", function5());

    return 0;
}

利用switch-case的分支跳转特性，以及预编译的__LINE__宏，实现了一种隐式状态机，最终实现了yield语义。

non-local jump

早期C语言标准不支持嵌套函数声明（Boost.Lambda和C++11的lambda已经支持），因此goto只能完成在同一个函数中的跳转。而setjmp()/longjmp()可以从嵌套函数中跳出。

#include <stdio.h>
#include <stdlib.h>
#include <setjmp.h>

static jmp_buf env;

static void func(int rvar, int vvar)
{
    printf("func(): rvar[%d] vvar[%d]\n", rvar, vvar);
    longjmp(env, 1);
}

int main(int argc, char *argv[])
{
    register int rvar = 1;          
    volatile int vvar = 2;          

    if (setjmp(env) == 0) {
        rvar = 3;
        vvar = 4;
        func(rvar, vvar);

    } else {
        printf("after longjmp(): rvar[%d] vvar[%d]\n", rvar, vvar);
    }

    return 0;
}
/* output: ?
 * func(): rvar[3] vvar[4]
 * after longjmp(): rvar[3] vvar[4]
 */

在调用setjmp()时，env会保存当前进程的程序计数器（PC）、栈指针寄存器（SP）以及进程的其他信息，这些信息可以保证longjump()调用完成后从初始的位置继续执行。但是因为setjmp()实现无法保证保存所有寄存器和临时栈信息，因此只能用于简单的语境中。

C99 defines setjmp()/longjmp() to provide non-local jumps but it does not require that longjmp() preserves the current stack frame. Therefore, jumping into a function which was exited via a call to longjmp() is undefined.

context control

C library functions(ucontext.h)提供了四个函数（setcontext/getcontext/makecontext/swapcontext）实现上下文的控制。

typedef struct ucontext {
    unsigned long int uc_flags;
    struct ucontext   *uc_link;
    stack_t           uc_stack;
    mcontext_t        uc_mcontext;
    sigset_t          uc_sigmask;
    ...
} ucontext_t;

// example
char stack[SIGSTKSZ];
getcontext(&context);
context.uc_link          = &main_context;
context.uc_stack.ss_sp   = stack;
context.uc_stack.ss_size = sizeof(stack);
// void makecontext(ucontext_t *ucp, void (*func)(), int argc, ...);
makecontext(&context, (void (*)(void)) func, 2, &context, &main_context2);

#include <stdio.h>
#include <ucontext.h>
#include <unistd.h>

int main(int argc, const char *argv[]){
    ucontext_t context;

    getcontext(&context);
    puts("Hello world");
    sleep(1);
    setcontext(&context);

    return 0;
}
/* output: ?
 *
 */

Since POSIX.1-2003 ucontext_t is deprecated and was removed in POSIX.1-2008. POSIX Threads recommended.

The third argument of makecontext() specifies the number of integer arguments that follow which will require function pointer cast if func will accept those arguments which is undefined in C99. The arguments in the var-arg list are required to be integers, passing pointers in var-arg list is not guaranteed to work, especially it will fail for architectures where pointers are larger than integers.

Boost.Context

Boost.Context is a foundational library that provides a sort of cooperative multitasking on a single thread.

Boost.Context must be built for the particular compiler(s) and CPU architecture(s)s being targeted. Boost.Context includes assembly code and, therefore, requires GNU AS for supported POSIX systems, MASM for Windows/x86 systems and ARMasm for Windows/arm systems.

// Each instance of fcontext_t represents a context (CPU registers and stack space). Together with its related functions jump_fcontext() and make_fcontext() it provides a execution control transfer mechanism similar interface like ucontext_t. fcontext_t and its functions are located in boost::context and the functions are declared as extern "C".  
#include <boost/context/all.hpp>

struct stack_t
{
    void        *sp;
    std::size_t size;
};

struct fcontext_t
{
    < platform specific >

    stack_t  fc_stack;
};

intptr_t jump_fcontext( fcontext_t * ofc, fcontext_t const* nfc, intptr_t vp, bool preserve_fpu = true);
fcontext_t * make_fcontext( void * sp, std::size_t size, void(* fn)(intptr_t) );

Performance of context switch

Platform	ucontext_t	fcontext_t with fpu (Floating-point unit)	fcontext_t without fpu	boost::function
AMD Athlon 64 DualCore 4400+ (32bit Linux)	846 cycles	65 cycles	43 cycles	15 cycles
Intel Core2 Q6700 (64bit Linux)	1481 cycles	172 cycles	63 cycles	25 cycles

ucontext_t preserves signal mask between context switches which involves system calls consuming a lot of CPU cycles (ucontext_t is slower by perfomance_link[factor 13x] relative to fcontext_t).

goroutine

In Go, each concurrently executing activity is called a goroutine.

f();      // call f() wait for it to return
go f();   // create a new goroutine that calls f() don't wait

Go scheduler

Why create a userspace scheduler when the operating system can schedule threads for you? The Go scheduler (Go 1.1 is the new scheduler)

model	des	traits
N:1	several userspace threads are run on one OS thread	This has the advantage of being very quick to context switch but cannot take advantage of multi-core systems
1:1	one thread of execution matches one OS thread	It takes advantage of all of the cores on the machine, but context switching is slow because it has to trap through the OS
M:N	It schedules an arbitrary number of `goroutines` onto an arbitrary number of OS threads	Get quick context switches and take advantage of all the cores in your system (The main `disadvantage` of this approach is the complexity it adds to the scheduler)

Here are 2 threads (M), each holding a context (P), each running a goroutine (G). In order to run goroutines, a thread must hold a context. The number of contexts is set on startup to the value of the GOMAXPROCS environment variable or through the runtime function GOMAXPROCS().
这里写图片描述

syscall

a thread giving up its context so that another thread can run it. The scheduler makes sure there are enough threads to run all contexts.
这里写图片描述

Stealing work

When a context runs out, it will try to steal about half of the runqueue from another context. This makes sure there is always work to do on each of the contexts, which in turn makes sure that all threads are working at their maximum capacity.
这里写图片描述

Difference

traits	process/thread	goroutine
stacks	a fixed-size block of memroy, often as large as 2MB.	growable stacks, not fixed, typically 2KB, it grows and shrinks as needed.
scheduling	OS kernel scheduler, every few milliseconds, a hardware timer interrupts the processor, which causes a kernel function called the scheduler to be invoked.	the Go runtime contains its own scheduler that uses a technique known as m:n scheduling, because it schedules m goroutines on n OS threads. The Go scheduler is not invoked periodically by a hardware timer, but implicitly by certain Go language constructs.
GOMAXPROCS	NULL	determine how many OS threads may be actively executing Go code simultaneously.
identity	thread-local storage	a simpler style of programming, explicit

go 1.2之前，goroutine只会在两种情况下被调度: runtime.GoSched()和systemcall。但是从1.2开始，为了避免饿死其它goroutine，在发生任意函数调用的时候，都有机会触发scheduler。所以从1.2开始如果goroutine中是纯计算，没有任何系统调用，scheduler仍然有机会介入，不会永远独占CPU。(Refer: preemptive scheduling)

TDF

TDF是腾讯内部的一个基于协程的自研后台开发框架，使用ucontext，N:1模型。

ready_list = [A, B, C]
while (!ready_list.empty()) {
    for co in ready_list {
        co.process()
        if (co.finished())
        ready_list.remove(co)
    }
}

内部协程调度模型
这里写图片描述

内部事件触发流程
这里写图片描述

协程优化
Just using an off-the-shelf library like libcoroutine isn’t as fast as you might think. The improvement comes from two optimizations: reuse of stacks and a lightweight swapcontext implementation. (Refer: Making coroutines fast)

这里写图片描述

conclusion

Scheduling can be done at the kernel level or user level, and multitasking can be done preemptively or cooperatively.

type	switch mode	scheduling	concept
process/thread	kernel level	preemptive	stack-based structure, one function will be caller and the other will be callee
coroutine	user level	cooperative	stop thinking of one process as the caller and the other as the callee, resume control from just after the return statement

同步等待处理模型

这里写图片描述
* 优点：可以按照人的正常思维实现业务逻辑，开发效率高。
* 缺点：同步接口会导致排队和毛刺现象，并发处理能力弱。

异步多路复用处理模型

这里写图片描述
* 优点：并发处理能力强。
* 缺点：需要处理各种回调，业务代码实现复杂，开发效率低。

协程(同步编程异步执行)处理模型

进程和线程是抢占式切换，不知道会在什么时刻发生切换，操作系统为了切换到另外的线程再切回到原来的位置，需要保存大部分的寄存器和其他信息。而协程在网络IO函数里发现可能堵塞的情况下进行主动切换，只需要保存函数调用路径中可能发生变化的寄存器（ESP、EIP以及callee-saved寄存器），因此协程之间的切换更高效。

优点：
- without race conditions
- light-weight context switch
- understandable (First do this, then do that)
缺点：
- responsiveness (can not use blocking system calls)
- coroutine crashes a process
- explicit scheduling/starving problem (Preemptive multithreading is generally considered the superior approach, as it allows the operating system to determine when a context switch should occur.)