ThinkChat2.0新版上线,更智能更精彩,支持会话、画图、阅读、搜索等,送10W Token,即刻开启你的AI之旅 广告
# 三十二 ostream 继续以一个hello world程序为例,但是这次使用ostream ``` #include <iostream> int main() { std::cout << "Hello, world!\n"; } ``` 几乎所有关于c++的书籍都会提到&lt;&lt;操作支持很多数据类型。这些支持是在ostream中完成的。通过反汇编之后的代码,可以看到ostream中的&lt;&lt;操作被调用: ``` $SG37112 DB ’Hello, world!’, 0aH, 00H _main PROC push OFFSET $SG37112 push OFFSET ?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A ; std::cout call ??$?6U?$char_traits@D@std@@@std@@YAAAV?$basic_ostream@DU? $char_traits@D@std@@@0@AAV10@PBD@Z ; std::operator<<<std::char_traits<char> > add esp, 8 xor eax, eax ret 0 _main ENDP ``` 对示例程序做如下修改: ``` #include <iostream> int main() { std::cout << "Hello, " << "world!\n"; } ``` 同样的,从许多C++书籍中可以知道,ostream的输出操作的运算结果可以用作下一次输出(即ostream的输出操作返回ostream对象)。 ``` $SG37112 DB ’world!’, 0aH, 00H $SG37113 DB ’Hello, ’, 00H _main PROC push OFFSET $SG37113 ; ’Hello, ’ push OFFSET ?cout@std@@3V?$basic_ostream@DU?$char_traits@D@std@@@1@A ; std::cout call ??$?6U?$char_traits@D@std@@@std@@YAAAV?$basic_ostream@DU? $char_traits@D@std@@@0@AAV10@PBD@Z ; std::operator<<<std::char_traits<char> > add esp, 8 push OFFSET $SG37112 ; ’world!’ push eax ; result of previous function call ??$?6U?$char_traits@D@std@@@std@@YAAAV?$basic_ostream@DU? $char_traits@D@std@@@0@AAV10@PBD@Z ; std::operator<<<std::char_traits<char> > add esp, 8 xor eax, eax ret 0 _main ENDP ``` 如果用函数f()替换&lt;&lt;运算符,示例代码等价于: ``` f(f(std::cout, "Hello, "), "world!") ``` 通过GCC生成的代码和MSVC的代码基本相同。 引用: 在c++中,引用和指针一样,但是使用的时候更安全,因为在使用引用的时候几乎不会发生错误。例如,引用必须始终指向一个相应类型的对象,而不能为NULL。甚至于,引用不能被改变,不能将一个对象的引用重新赋值以指向另一个对象。 如果我们尝试修改指针的例子(9),将指针替换为引用: ``` void f2 (int x, int y, int & sum, int & product) { sum=x+y; product=x*y; }; ``` 可以想到,编译后的代码和使用指针生成的代码一致。 ``` _x$ = 8 ; size = 4 _y$ = 12 ; size = 4 _sum$ = 16 ; size = 4 _product$ = 20 ; size = 4 ?f2@@YAXHHAAH0@Z PROC ; f2 mov ecx, DWORD PTR _y$[esp-4] mov eax, DWORD PTR _x$[esp-4] lea edx, DWORD PTR [eax+ecx] imul eax, ecx mov ecx, DWORD PTR _product$[esp-4] push esi mov esi, DWORD PTR _sum$[esp] mov DWORD PTR [esi], edx mov DWORD PTR [ecx], eax pop esi ret 0 ?f2@@YAXHHAAH0@Z ENDP ; f2 ``` ## STL 注意:本章所有这些例子只在32位环境下进行了验证,没有在x64环境下验证 ``` std::string ``` 内部实现 许多string库的实现结构包含一个指向字符串缓冲区的指针,一个包含当前字符串长度的变量以及一个表示当前字符串缓冲区大小的变量。为了能够将缓冲区指针传递给使用ASCII字符串的函数,通常string缓冲区中的字符串以0结尾。 C++标准中没有规定std::string应该如何实现,因此通常是按照上述方式实现的。 按照规定,std::string应该是一个模板而不是类,以便能够支持不同的字符类型,如char、wchar_t等。 对于std::string,MSVC和GCC中的内部实现存在差异,下面依次进行说明 ## MSVC MSVC的实现中,字符串存储在适当的位置,不一定位于指针指向的缓冲区(如果字符串的长度小于16个字符)。 这意味着短的字符串在32位环境下至少占据16+4+4=24字节的空间,在64位环境下至少占据16+8+8=32字节,当字符串长度大于16字符时,相应的需要增加字符串自身的长度。 ``` #include <string> #include <stdio.h> struct std_string { union { char buf[16]; char* ptr; } u; size_t size; // AKA ’Mysize’ in MSVC size_t capacity; // AKA ’Myres’ in MSVC }; void dump_std_string(std::string s) { struct std_string *p=(struct std_string*)&s; printf ("[%s] size:%d capacity:%d\n", p->size>16 ? p->u.ptr : p->u.buf, p->size, p-> capacity); }; int main() { std::string s1="short string"; std::string s2="string longer that 16 bytes"; dump_std_string(s1); dump_std_string(s2); // that works without using c_str() printf ("%s\n", &s1); printf ("%s\n", s2); }; ``` 通过源代码可以清晰的看到这些。 如果字符串长度小于16个符号,存储字符的缓冲区不需要在堆上分配。实际上非常适宜这样做,因为大量的字符串确实都较短。显然,微软的开发人员认为16个字符是好的临界点。 在main函数尾部,虽然没有使用c_str()方法,但是如果编译运行上面的代码,所有字符串都将打印在控制台上。 当字符串的长度小于16个字符时,存储字符串的缓冲区位于std::string对象的开始位置,printf函数将指针当做指向以0结尾的字符数组,因此上述代码可以正常运行。 第二个超过16字符的字符串的打印方式比较危险,通常程序员犯的错误是忘记写c_str()。这在很长的一段时间不会引起人的注意,直到一个很长的字符串出现,然后程序崩溃。而上述代码可以工作,因为指向字符串缓冲区的指针位于结构体的开始。 ## GCC GCC的实现中,增加了一个引用计数, 一个有趣的事实是一个指向std::string类实例的指针并不是指向结构体的起始位置,而是指向缓冲区的指针,在libstdc++-v3\include\bits\basic_string.h,中我们可以看到这主要是为了方便调试。 &gt; The reason you want _M_data pointing to the character %array and not the _Rep is so that the debugger can see the string contents. (Probably we should add a non-inline member to get the _Rep for the debugger to use, so users can check the actual string length.) 在我的例子中将考虑这一点: ``` #include <string> #include <stdio.h> struct std_string { size_t length; size_t capacity; size_t refcount; }; void dump_std_string(std::string s) { char *p1=*(char**)&s; // GCC type checking workaround struct std_string *p2=(struct std_string*)(p1-sizeof(struct std_string)); printf ("[%s] size:%d capacity:%d\n", p1, p2->length, p2->capacity); }; int main() { std::string s1="short string"; std::string s2="string longer that 16 bytes"; dump_std_string(s1); dump_std_string(s2); // GCC type checking workaround: printf ("%s\n", *(char**)&s1); printf ("%s\n", *(char**)&s2); }; ``` 由于GCC有较强的类型检查,因此需要技巧来隐藏类似之前的错误,即使不使用c_str(),printf也能够正常工作。 更复杂的例子 ``` #include <string> #include <stdio.h> int main() { std::string s1="Hello, "; std::string s2="world!\n"; std::string s3=s1+s2; printf ("%s\n", s3.c_str()); } $SG39512 DB ’Hello, ’, 00H $SG39514 DB ’world!’, 0aH, 00H $SG39581 DB ’%s’, 0aH, 00H _s2$ = -72 ; size = 24 _s3$ = -48 ; size = 24 _s1$ = -24 ; size = 24 _main PROC sub esp, 72 ; 00000048H push 7 push OFFSET $SG39512 lea ecx, DWORD PTR _s1$[esp+80] mov DWORD PTR _s1$[esp+100], 15 ; 0000000fH mov DWORD PTR _s1$[esp+96], 0 mov BYTE PTR _s1$[esp+80], 0 call ?assign@?$basic_string@DU?$char_traits@D@std@@V? $allocator@D@2@@std@@QAEAAV12@PBDI@Z ; std::basic_string<char,std::char_traits<char>,std:: allocator<char> >::assign push 7 push OFFSET $SG39514 lea ecx, DWORD PTR _s2$[esp+80] mov DWORD PTR _s2$[esp+100], 15 ; 0000000fH mov DWORD PTR _s2$[esp+96], 0 mov BYTE PTR _s2$[esp+80], 0 call ?assign@?$basic_string@DU?$char_traits@D@std@@V? $allocator@D@2@@std@@QAEAAV12@PBDI@Z ; std::basic_string<char,std::char_traits<char>,std:: allocator<char> >::assign lea eax, DWORD PTR _s2$[esp+72] push eax lea eax, DWORD PTR _s1$[esp+76] push eax lea eax, DWORD PTR _s3$[esp+80] push eax call ??$?HDU?$char_traits@D@std@@V?$allocator@D@1@@std@@YA?AV?$basic_string@DU? $char_traits@D@std@@V?$allocator@D@2@@0@ABV10@0@Z ; std::operator+<char,std::char_traits<char >,std::allocator<char> > ; inlined c_str() method: cmp DWORD PTR _s3$[esp+104], 16 ; 00000010H lea eax, DWORD PTR _s3$[esp+84] cmovae eax, DWORD PTR _s3$[esp+84] push eax push OFFSET $SG39581 call _printf add esp, 20 ; 00000014H cmp DWORD PTR _s3$[esp+92], 16 ; 00000010H jb SHORT $LN119@main push DWORD PTR _s3$[esp+72] call ??3@YAXPAX@Z ; operator delete add esp, 4 $LN119@main: cmp DWORD PTR _s2$[esp+92], 16 ; 00000010H mov DWORD PTR _s3$[esp+92], 15 ; 0000000fH mov DWORD PTR _s3$[esp+88], 0 mov BYTE PTR _s3$[esp+72], 0 jb SHORT $LN151@main push DWORD PTR _s2$[esp+72] call ??3@YAXPAX@Z ; operator delete add esp, 4 $LN151@main: cmp DWORD PTR _s1$[esp+92], 16 ; 00000010H mov DWORD PTR _s2$[esp+92], 15 ; 0000000fH mov DWORD PTR _s2$[esp+88], 0 mov BYTE PTR _s2$[esp+72], 0 jb SHORT $LN195@main push DWORD PTR _s1$[esp+72] call ??3@YAXPAX@Z ; operator delete add esp, 4 $LN195@main: xor eax, eax add esp, 72 ; 00000048H ret 0 _main ENDP ``` 编译器并不是静态构造string对象,存储数据的缓冲区是否一定要在堆中呢?通常以0结尾的ASCII字符串存储在数据节中,然后运行时,通过赋值方法完成s1和s2两个string对象的构造。通过+操作符,s3完成string对象的构造。 可以注意到上述代码中并没有c_str()方法的调用,这是因为由于函数太小,编译器将其内联了,如果一个字符串小于16个字符,eax寄存器中存放指向缓冲区的指针,否则,存放指向堆中字符串缓冲区的指针。 然后,我们看到了三个析构函数的调用,当字符串长度超过16字符时,析构函数将被调用,在堆中的缓冲区会被释放。此外,由于三个std::string对象都存储在栈中,当函数结束时,他们将被自动释放。 可以得到一个结论,短的字符串对象处理起来更快,因为堆访问操作较少。 GCC生成的代码甚至更简单(正如我之前提到的,GCC并不将短的字符串存储在结构体中) ``` .LC0: .string "Hello, " .LC1: .string "world!\n" main: push ebp mov ebp, esp push edi push esi push ebx and esp, -16 sub esp, 32 lea ebx, [esp+28] lea edi, [esp+20] mov DWORD PTR [esp+8], ebx lea esi, [esp+24] mov DWORD PTR [esp+4], OFFSET FLAT:.LC0 mov DWORD PTR [esp], edi call _ZNSsC1EPKcRKSaIcE mov DWORD PTR [esp+8], ebx mov DWORD PTR [esp+4], OFFSET FLAT:.LC1 mov DWORD PTR [esp], esi call _ZNSsC1EPKcRKSaIcE mov DWORD PTR [esp+4], edi mov DWORD PTR [esp], ebx call _ZNSsC1ERKSs mov DWORD PTR [esp+4], esi mov DWORD PTR [esp], ebx call _ZNSs6appendERKSs ; inlined c_str(): mov eax, DWORD PTR [esp+28] mov DWORD PTR [esp], eax call puts mov eax, DWORD PTR [esp+28] lea ebx, [esp+19] mov DWORD PTR [esp+4], ebx sub eax, 12 mov DWORD PTR [esp], eax call _ZNSs4_Rep10_M_disposeERKSaIcE mov eax, DWORD PTR [esp+24] mov DWORD PTR [esp+4], ebx sub eax, 12 mov DWORD PTR [esp], eax call _ZNSs4_Rep10_M_disposeERKSaIcE mov eax, DWORD PTR [esp+20] mov DWORD PTR [esp+4], ebx sub eax, 12 mov DWORD PTR [esp], eax call _ZNSs4_Rep10_M_disposeERKSaIcE lea esp, [ebp-12] xor eax, eax pop ebx pop esi pop edi pop ebp ret ``` 可以看到传递给析构函数的并不是一个对象的指针,而是在对象所在位置的前12个字节的位置,也就是结构体的真正起始位置。 ## std::string 作为全局变量使用 有经验的c++程序员会说,可以定义一个STL类型的全局变量。 是的,确实如此 ``` #include <stdio.h> #include <string> std::string s="a string"; int main() { printf ("%s\n", s.c_str()); }; MSVC: $SG39512 DB ’a string’, 00H $SG39519 DB ’%s’, 0aH, 00H _main PROC cmp DWORD PTR ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A +20, 16 ; 00000010H mov eax, OFFSET ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A ; s cmovae eax, DWORD PTR ?s@@3V?$basic_string@DU?$char_traits@D@std@@V? $allocator@D@2@@std@@A push eax push OFFSET $SG39519 call _printf add esp, 8 xor eax, eax ret 0 _main ENDP ??__Es@@YAXXZ PROC ; ‘dynamic initializer for ’s’’, COMDAT push 8 push OFFSET $SG39512 mov ecx, OFFSET ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A ; s call ?assign@?$basic_string@DU?$char_traits@D@std@@V? $allocator@D@2@@std@@QAEAAV12@PBDI@Z ; std::basic_string<char,std::char_traits<char>,std:: allocator<char> >::assign push OFFSET ??__Fs@@YAXXZ ; ‘dynamic atexit destructor for ’s’’ call _atexit pop ecx ret 0 ??__Es@@YAXXZ ENDP ; ‘dynamic initializer for ’s’’ ??__Fs@@YAXXZ PROC ; ‘dynamic atexit destructor for ’s’’, COMDAT push ecx cmp DWORD PTR ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A +20, 16 ; 00000010H jb SHORT $LN23@dynamic push esi mov esi, DWORD PTR ?s@@3V?$basic_string@DU?$char_traits@D@std@@V? $allocator@D@2@@std@@A lea ecx, DWORD PTR $T2[esp+8] call ??0?$_Wrap_alloc@V?$allocator@D@std@@@std@@QAE@XZ ; std::_Wrap_alloc<std:: allocator<char> >::_Wrap_alloc<std::allocator<char> > push OFFSET ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A ; s lea ecx, DWORD PTR $T2[esp+12] call ??$destroy@PAD@?$_Wrap_alloc@V?$allocator@D@std@@@std@@QAEXPAPAD@Z ; std:: _Wrap_alloc<std::allocator<char> >::destroy<char *> lea ecx, DWORD PTR $T1[esp+8] call ??0?$_Wrap_alloc@V?$allocator@D@std@@@std@@QAE@XZ ; std::_Wrap_alloc<std:: allocator<char> >::_Wrap_alloc<std::allocator<char> > push esi call ??3@YAXPAX@Z ; operator delete add esp, 4 pop esi $LN23@dynamic: mov DWORD PTR ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A +20, 15 ; 0000000fH mov DWORD PTR ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A +16, 0 mov BYTE PTR ?s@@3V?$basic_string@DU?$char_traits@D@std@@V?$allocator@D@2@@std@@A, 0 pop ecx ret 0 ??__Fs@@YAXXZ ENDP ; ‘dynamic atexit destructor for ’s’’ ``` 实际上,main函数之前,CRT中将调用一个特殊的函数调用所有全局变量的构造函数。除此之外,CRT中还将通过atexit()注册另一个函数,该函数中将调用全局变量的析构函数。 GCC生成的代码如下: ``` main: push ebp mov ebp, esp and esp, -16 sub esp, 16 mov eax, DWORD PTR s mov DWORD PTR [esp], eax call puts xor eax, eax leave ret .LC0: .string "a string" _GLOBAL__sub_I_s: sub esp, 44 lea eax, [esp+31] mov DWORD PTR [esp+8], eax mov DWORD PTR [esp+4], OFFSET FLAT:.LC0 mov DWORD PTR [esp], OFFSET FLAT:s call _ZNSsC1EPKcRKSaIcE mov DWORD PTR [esp+8], OFFSET FLAT:__dso_handle mov DWORD PTR [esp+4], OFFSET FLAT:s mov DWORD PTR [esp], OFFSET FLAT:_ZNSsD1Ev call __cxa_atexit add esp, 44 ret .LFE645: .size _GLOBAL__sub_I_s, .-_GLOBAL__sub_I_s .section .init_array,"aw" .align 4 .long _GLOBAL__sub_I_s .globl s .bss .align 4 .type s, @object .size s, 4 s: .zero 4 .hidden __dso_handle ``` GCC并没有创建单独的函数来玩全局变量的析构,而是依次将全局变量的析构函数传递给atexit()函数 **std::list** std::list是标准的双向链表,每一个元素包含两个指针,分别前一个和后一个元素。 这意味着每个元素的的内存空间大小要相应的扩大2个字。(在32位环境下为8字节,64位环境下为16字节) std::list也是一个循环链表,也就是说最后一个元素包含一个指向第一个元素的指针,反之亦然。 C++ STL只是为你希望做为链表使用的结构体增加前向和后向指针。 下面我们构造一个包含两个简单变量的结构体,将其存储在链表中。 虽然C++标准中并没有规定如何实现list,但是MSVC和GCC的实现比较相似,因此下面仅通过一段源代码来展示。 ``` #include <stdio.h> #include <list> #include <iostream> struct a { int x; int y; }; struct List_node { struct List_node* _Next; struct List_node* _Prev; int x; int y; }; void dump_List_node (struct List_node *n) { printf ("ptr=0x%p _Next=0x%p _Prev=0x%p x=%d y=%d\n", n, n->_Next, n->_Prev, n->x, n->y); }; void dump_List_vals (struct List_node* n) { struct List_node* current=n; for (;;) { dump_List_node (current); current=current->_Next; if (current==n) // end break; }; }; void dump_List_val (unsigned int *a) { #ifdef _MSC_VER // GCC implementation doesn’t have "size" field printf ("_Myhead=0x%p, _Mysize=%d\n", a[0], a[1]); #endif dump_List_vals ((struct List_node*)a[0]); }; int main() { std::list<struct a> l; printf ("* empty list:\n"); dump_List_val((unsigned int*)(void*)&l); struct a t1; t1.x=1; t1.y=2; l.push_front (t1); t1.x=3; t1.y=4; l.push_front (t1); t1.x=5; t1.y=6; l.push_back (t1); printf ("* 3-elements list:\n"); dump_List_val((unsigned int*)(void*)&l); std::list<struct a>::iterator tmp; printf ("node at .begin:\n"); tmp=l.begin(); dump_List_node ((struct List_node *)*(void**)&tmp); printf ("node at .end:\n"); tmp=l.end(); dump_List_node ((struct List_node *)*(void**)&tmp); printf ("* let’s count from the begin:\n"); std::list<struct a>::iterator it=l.begin(); printf ("1st element: %d %d\n", (*it).x, (*it).y); it++; printf ("2nd element: %d %d\n", (*it).x, (*it).y); it++; printf ("3rd element: %d %d\n", (*it).x, (*it).y); it++; printf ("element at .end(): %d %d\n", (*it).x, (*it).y); printf ("* let’s count from the end:\n"); std::list<struct a>::iterator it2=l.end(); printf ("element at .end(): %d %d\n", (*it2).x, (*it2).y); it2--; printf ("3rd element: %d %d\n", (*it2).x, (*it2).y); it2--; printf ("2nd element: %d %d\n", (*it2).x, (*it2).y); it2--; printf ("1st element: %d %d\n", (*it2).x, (*it2).y); printf ("removing last element...\n"); l.pop_back(); dump_List_val((unsigned int*)(void*)&l); }; ``` ### 34.2.1 GCC 我们以GCC为例开始,当我们运行这个例子,可以看到屏幕上打印出了大量输出,我们依次对其进行分析。 ``` * empty list: ptr=0x0028fe90 _Next=0x0028fe90 _Prev=0x0028fe90 x=3 y=0 ``` 我们可以看到一个空的链表,它由一个元素组成,其中x和y中包含有垃圾数据。前向和后向指针均指向自身。 这一时刻的.begin和.end迭代器两者相等。 当我们压入3个元素之后,链表的内部状态将变为: ``` * 3-elements list: ptr=0x000349a0 _Next=0x00034988 _Prev=0x0028fe90 x=3 y=4 ptr=0x00034988 _Next=0x00034b40 _Prev=0x000349a0 x=1 y=2 ptr=0x00034b40 _Next=0x0028fe90 _Prev=0x00034988 x=5 y=6 ptr=0x0028fe90 _Next=0x000349a0 _Prev=0x00034b40 x=5 y=6 ``` 最后一个元素依然位于0x28fe90,只有当链表被销毁的时候它才会移动。x和y中依然包含有垃圾数据。虽然和链表中的最后一个元素中的x和y拥有相同的值,但这并不能说明这些值有意义。 下面这幅图说明了链表中的三个元素如何在内存中存储 变量l始终指向第一个链表节点 .begin()和.end()迭代器不指向任何东西,也并不出现在内存中,但是当适当的方法被调用的时候将返回指向这些节点的指针。(这句话是说.begin()和.end()并不是变量,而是函数,该函数返回迭代器。) 链表中包含一个垃圾元素在链表的实现中是非常流行的方式,如果不包含它,很多操作都将变得复杂和低效。 迭代器实质上是一个指向链表节点的指针。list.begin()和list.end()仅仅返回迭代器。 ``` node at .begin: ptr=0x000349a0 _Next=0x00034988 _Prev=0x0028fe90 x=3 y=4 node at .end: ptr=0x0028fe90 _Next=0x000349a0 _Prev=0x00034b40 x=5 y=6 ``` 实际上链表是循环的非常有用:包含一个指向第一个链表元素的指针,类似l变量,这将有利于快速获取指向最后一个元素的指针,而不必遍历整个链表。这样也有利于快速的在链表尾部插入元素。 --和++操作符只是将当前的迭代器的值设置为current_node-&gt;prev、current_node-&gt;next。反向迭代器只是以相反的方向做相似的工作。 迭代器的*操作符返回指向节点结构指针,也就是用户结构的起始位置,如,指向第一个结构体元素x的指针。 List的插入和删除比较简单,只需要分配新节点同时给指针设置有效值。 这就是元素被删除后,迭代器会失效的原因,它依然指向已经被释放的节点。当然,迭代器指向的被释放的节点的数据是不能再使用的。 GCC的实现中没有存储链表的大小,由于没有其他方式获取这些信息,.size()方法需要遍历整个链表统计元素个数,因此执行速度较慢。这一操作的时间复杂度为O(n),即消耗时间多少和链表中元素个数成正比。 Listing34.7: GCC 4.8.1 -O3 -fno-inline-small-functions ``` main proc near push ebp mov ebp, esp push esi push ebx and esp, 0FFFFFFF0h sub esp, 20h lea ebx, [esp+10h] mov dword ptr [esp], offset s ; "* empty list:" mov [esp+10h], ebx mov [esp+14h], ebx call puts mov [esp], ebx call _Z13dump_List_valPj ; dump_List_val(uint *) lea esi, [esp+18h] mov [esp+4], esi mov [esp], ebx mov dword ptr [esp+18h], 1 ; X for new element mov dword ptr [esp+1Ch], 2 ; Y for new element call _ZNSt4listI1aSaIS0_EE10push_frontERKS0_ ; std::list<a,std::allocator<a >>::push_front(a const&) mov [esp+4], esi mov [esp], ebx mov dword ptr [esp+18h], 3 ; X for new element mov dword ptr [esp+1Ch], 4 ; Y for new element call _ZNSt4listI1aSaIS0_EE10push_frontERKS0_ ; std::list<a,std::allocator<a >>::push_front(a const&) mov dword ptr [esp], 10h mov dword ptr [esp+18h], 5 ; X for new element mov dword ptr [esp+1Ch], 6 ; Y for new element call _Znwj ; operator new(uint) cmp eax, 0FFFFFFF8h jz short loc_80002A6 mov ecx, [esp+1Ch] mov edx, [esp+18h] mov [eax+0Ch], ecx mov [eax+8], edx loc_80002A6: ; CODE XREF: main+86 mov [esp+4], ebx mov [esp], eax call _ZNSt8__detail15_List_node_base7_M_hookEPS0_ ; std::__detail:: _List_node_base::_M_hook(std::__detail::_List_node_base*) mov dword ptr [esp], offset a3ElementsList ; "* 3-elements list:" call puts mov [esp], ebx call _Z13dump_List_valPj ; dump_List_val(uint *) mov dword ptr [esp], offset aNodeAt_begin ; "node at .begin:" call puts mov eax, [esp+10h] mov [esp], eax call _Z14dump_List_nodeP9List_node ; dump_List_node(List_node *) mov dword ptr [esp], offset aNodeAt_end ; "node at .end:" call puts mov [esp], ebx call _Z14dump_List_nodeP9List_node ; dump_List_node(List_node *) mov dword ptr [esp], offset aLetSCountFromT ; "* let’s count from the begin:" call puts mov esi, [esp+10h] mov eax, [esi+0Ch] mov [esp+0Ch], eax mov eax, [esi+8] mov dword ptr [esp+4], offset a1stElementDD ; "1st element: %d %d\n" mov dword ptr [esp], 1 mov [esp+8], eax call __printf_chk mov esi, [esi] ; operator++: get ->next pointer mov eax, [esi+0Ch] mov [esp+0Ch], eax mov eax, [esi+8] mov dword ptr [esp+4], offset a2ndElementDD ; "2nd element: %d %d\n" mov dword ptr [esp], 1 mov [esp+8], eax call __printf_chk mov esi, [esi] ; operator++: get ->next pointer mov eax, [esi+0Ch] mov [esp+0Ch], eax mov eax, [esi+8] mov dword ptr [esp+4], offset a3rdElementDD ; "3rd element: %d %d\n" mov dword ptr [esp], 1 mov [esp+8], eax call __printf_chk mov eax, [esi] ; operator++: get ->next pointer mov edx, [eax+0Ch] mov [esp+0Ch], edx mov eax, [eax+8] mov dword ptr [esp+4], offset aElementAt_endD ; "element at .end(): %d %d\n" mov dword ptr [esp], 1 mov [esp+8], eax call __printf_chk mov dword ptr [esp], offset aLetSCountFro_0 ; "* let’s count from the end:" call puts mov eax, [esp+1Ch] mov dword ptr [esp+4], offset aElementAt_endD ; "element at .end(): %d %d\n" mov dword ptr [esp], 1 mov [esp+0Ch], eax mov eax, [esp+18h] mov [esp+8], eax call __printf_chk mov esi, [esp+14h] mov eax, [esi+0Ch] mov [esp+0Ch], eax mov eax, [esi+8] mov dword ptr [esp+4], offset a3rdElementDD ; "3rd element: %d %d\n" mov dword ptr [esp], 1 mov [esp+8], eax call __printf_chk mov esi, [esi+4] ; operator--: get ->prev pointer mov eax, [esi+0Ch] mov [esp+0Ch], eax mov eax, [esi+8] mov dword ptr [esp+4], offset a2ndElementDD ; "2nd element: %d %d\n" mov dword ptr [esp], 1 mov [esp+8], eax call __printf_chk mov eax, [esi+4] ; operator--: get ->prev pointer mov edx, [eax+0Ch] mov [esp+0Ch], edx mov eax, [eax+8] mov dword ptr [esp+4], offset a1stElementDD ; "1st element: %d %d\n" mov dword ptr [esp], 1 mov [esp+8], eax call __printf_chk mov dword ptr [esp], offset aRemovingLastEl ; "removing last element..." call puts mov esi, [esp+14h] mov [esp], esi call _ZNSt8__detail15_List_node_base9_M_unhookEv ; std::__detail:: _List_node_base::_M_unhook(void) mov [esp], esi ; void * call _ZdlPv ; operator delete(void *) mov [esp], ebx call _Z13dump_List_valPj ; dump_List_val(uint *) mov [esp], ebx call _ZNSt10_List_baseI1aSaIS0_EE8_M_clearEv ; std::_List_base<a,std:: allocator<a>>::_M_clear(void) lea esp, [ebp-8] xor eax, eax pop ebx pop esi pop ebp retn main endp ``` Listing 34.8: 所有输出 ``` * empty list: ptr=0x0028fe90 _Next=0x0028fe90 _Prev=0x0028fe90 x=3 y=0 * 3-elements list: ptr=0x000349a0 _Next=0x00034988 _Prev=0x0028fe90 x=3 y=4 ptr=0x00034988 _Next=0x00034b40 _Prev=0x000349a0 x=1 y=2 ptr=0x00034b40 _Next=0x0028fe90 _Prev=0x00034988 x=5 y=6 ptr=0x0028fe90 _Next=0x000349a0 _Prev=0x00034b40 x=5 y=6 node at .begin: ptr=0x000349a0 _Next=0x00034988 _Prev=0x0028fe90 x=3 y=4 node at .end: ptr=0x0028fe90 _Next=0x000349a0 _Prev=0x00034b40 x=5 y=6 * let’s count from the begin: 1st element: 3 4 2nd element: 1 2 3rd element: 5 6 element at .end(): 5 6 * let’s count from the end: element at .end(): 5 6 3rd element: 5 6 2nd element: 1 2 1st element: 3 4 removing last element... ptr=0x000349a0 _Next=0x00034988 _Prev=0x0028fe90 x=3 y=4 ptr=0x00034988 _Next=0x0028fe90 _Prev=0x000349a0 x=1 y=2 ptr=0x0028fe90 _Next=0x000349a0 _Prev=0x00034988 x=5 y=6 ``` ## 34.2.2 MSVC MSVC实现形式基本相同,但是它存储了当前list的大小。这意味着.size()方法非常迅速,只需要从内存中读取一个值即可。从另一个方面来说,必须在插入和删除操作后更新size变量的值。 MSVC组织节点的方式也稍有不同。 GCC将垃圾元素放在链表的尾部,而MSVC放在链表起始位置。 Listing 34.9: MSVC 2012 /Fa2.asm /Ox /GS- /Ob1 ``` _l$ = -16 ; size = 8 _t1$ = -8 ; size = 8 _main PROC sub esp, 16 ; 00000010H push ebx push esi push edi push 0 push 0 lea ecx, DWORD PTR _l$[esp+36] mov DWORD PTR _l$[esp+40], 0 ; allocate first "garbage" element call ?_Buynode0@?$_List_alloc@$0A@U?$_List_base_types@Ua@@V? $allocator@Ua@@@std@@@std@@@std@@QAEPAU?$_List_node@Ua@@PAX@2@PAU32@0@Z ; std::_List_alloc<0, std::_List_base_types<a,std::allocator<a> > >::_Buynode0 mov edi, DWORD PTR __imp__printf mov ebx, eax push OFFSET $SG40685 ; ’* empty list:’ mov DWORD PTR _l$[esp+32], ebx call edi ; printf lea eax, DWORD PTR _l$[esp+32] push eax call ?dump_List_val@@YAXPAI@Z ; dump_List_val mov esi, DWORD PTR [ebx] add esp, 8 lea eax, DWORD PTR _t1$[esp+28] push eax push DWORD PTR [esi+4] lea ecx, DWORD PTR _l$[esp+36] push esi mov DWORD PTR _t1$[esp+40], 1 ; data for a new node mov DWORD PTR _t1$[esp+44], 2 ; data for a new node ; allocate new node call ??$_Buynode@ABUa@@@?$_List_buy@Ua@@V?$allocator@Ua@@@std@@@std@@QAEPAU? $_List_node@Ua@@PAX@1@PAU21@0ABUa@@@Z ; std::_List_buy<a,std::allocator<a> >::_Buynode<a const &> mov DWORD PTR [esi+4], eax mov ecx, DWORD PTR [eax+4] mov DWORD PTR _t1$[esp+28], 3 ; data for a new node mov DWORD PTR [ecx], eax mov esi, DWORD PTR [ebx] lea eax, DWORD PTR _t1$[esp+28] push eax push DWORD PTR [esi+4] lea ecx, DWORD PTR _l$[esp+36] push esi mov DWORD PTR _t1$[esp+44], 4 ; data for a new node ; allocate new node call ??$_Buynode@ABUa@@@?$_List_buy@Ua@@V?$allocator@Ua@@@std@@@std@@QAEPAU? $_List_node@Ua@@PAX@1@PAU21@0ABUa@@@Z ; std::_List_buy<a,std::allocator<a> >::_Buynode<a const &> mov DWORD PTR [esi+4], eax mov ecx, DWORD PTR [eax+4] mov DWORD PTR _t1$[esp+28], 5 ; data for a new node mov DWORD PTR [ecx], eax lea eax, DWORD PTR _t1$[esp+28] push eax push DWORD PTR [ebx+4] lea ecx, DWORD PTR _l$[esp+36] push ebx mov DWORD PTR _t1$[esp+44], 6 ; data for a new node ; allocate new node call ??$_Buynode@ABUa@@@?$_List_buy@Ua@@V?$allocator@Ua@@@std@@@std@@QAEPAU? $_List_node@Ua@@PAX@1@PAU21@0ABUa@@@Z ; std::_List_buy<a,std::allocator<a> >::_Buynode<a const &> mov DWORD PTR [ebx+4], eax mov ecx, DWORD PTR [eax+4] push OFFSET $SG40689 ; ’* 3-elements list:’ mov DWORD PTR _l$[esp+36], 3 mov DWORD PTR [ecx], eax call edi ; printf lea eax, DWORD PTR _l$[esp+32] push eax call ?dump_List_val@@YAXPAI@Z ; dump_List_val push OFFSET $SG40831 ; ’node at .begin:’ call edi ; printf push DWORD PTR [ebx] ; get next field of node $l$ variable points to call ?dump_List_node@@YAXPAUList_node@@@Z ; dump_List_node push OFFSET $SG40835 ; ’node at .end:’ call edi ; printf push ebx ; pointer to the node $l$ variable points to! call ?dump_List_node@@YAXPAUList_node@@@Z ; dump_List_node push OFFSET $SG40839 ; ’* let’’s count from the begin:’ call edi ; printf mov esi, DWORD PTR [ebx] ; operator++: get ->next pointer push DWORD PTR [esi+12] push DWORD PTR [esi+8] push OFFSET $SG40846 ; ’1st element: %d %d’ call edi ; printf mov esi, DWORD PTR [esi] ; operator++: get ->next pointer push DWORD PTR [esi+12] push DWORD PTR [esi+8] push OFFSET $SG40848 ; ’2nd element: %d %d’ call edi ; printf mov esi, DWORD PTR [esi] ; operator++: get ->next pointer push DWORD PTR [esi+12] push DWORD PTR [esi+8] push OFFSET $SG40850 ; ’3rd element: %d %d’ call edi ; printf mov eax, DWORD PTR [esi] ; operator++: get ->next pointer add esp, 64 ; 00000040H push DWORD PTR [eax+12] push DWORD PTR [eax+8] push OFFSET $SG40852 ; ’element at .end(): %d %d’ call edi ; printf push OFFSET $SG40853 ; ’* let’’s count from the end:’ call edi ; printf push DWORD PTR [ebx+12] ; use x and y fields from the node $l$ variable points to push DWORD PTR [ebx+8] push OFFSET $SG40860 ; ’element at .end(): %d %d’ call edi ; printf mov esi, DWORD PTR [ebx+4] ; operator--: get ->prev pointer push DWORD PTR [esi+12] push DWORD PTR [esi+8] push OFFSET $SG40862 ; ’3rd element: %d %d’ call edi ; printf mov esi, DWORD PTR [esi+4] ; operator--: get ->prev pointer push DWORD PTR [esi+12] push DWORD PTR [esi+8] push OFFSET $SG40864 ; ’2nd element: %d %d’ call edi ; printf mov eax, DWORD PTR [esi+4] ; operator--: get ->prev pointer push DWORD PTR [eax+12] push DWORD PTR [eax+8] push OFFSET $SG40866 ; ’1st element: %d %d’ call edi ; printf add esp, 64 ; 00000040H push OFFSET $SG40867 ; ’removing last element...’ call edi ; printf mov edx, DWORD PTR [ebx+4] add esp, 4 ; prev=next? ; it is the only element, "garbage one"? ; if yes, do not delete it! cmp edx, ebx je SHORT $LN349@main mov ecx, DWORD PTR [edx+4] mov eax, DWORD PTR [edx] mov DWORD PTR [ecx], eax mov ecx, DWORD PTR [edx] mov eax, DWORD PTR [edx+4] push edx mov DWORD PTR [ecx+4], eax call ??3@YAXPAX@Z ; operator delete add esp, 4 mov DWORD PTR _l$[esp+32], 2 $LN349@main: lea eax, DWORD PTR _l$[esp+28] push eax call ?dump_List_val@@YAXPAI@Z ; dump_List_val mov eax, DWORD PTR [ebx] add esp, 4 mov DWORD PTR [ebx], ebx mov DWORD PTR [ebx+4], ebx cmp eax, ebx je SHORT $LN412@main $LL414@main: mov esi, DWORD PTR [eax] push eax call ??3@YAXPAX@Z ; operator delete add esp, 4 mov eax, esi cmp esi, ebx jne SHORT $LL414@main $LN412@main: push ebx call ??3@YAXPAX@Z ; operator delete add esp, 4 xor eax, eax pop edi pop esi pop ebx add esp, 16 ; 00000010H ret 0 _main ENDP ``` 与GCC不同,MSVC代码在函数开始,通过Buynode函数分配垃圾元素,这个方法也用于后续节点的申请(GCC将最早的节点分配在栈上)。 Listing 34.10 完整输出 ``` * empty list: _Myhead=0x003CC258, _Mysize=0 ptr=0x003CC258 _Next=0x003CC258 _Prev=0x003CC258 x=6226002 y=4522072 * 3-elements list: _Myhead=0x003CC258, _Mysize=3 ptr=0x003CC258 _Next=0x003CC288 _Prev=0x003CC2A0 x=6226002 y=4522072 ptr=0x003CC288 _Next=0x003CC270 _Prev=0x003CC258 x=3 y=4 ptr=0x003CC270 _Next=0x003CC2A0 _Prev=0x003CC288 x=1 y=2 ptr=0x003CC2A0 _Next=0x003CC258 _Prev=0x003CC270 x=5 y=6 node at .begin: ptr=0x003CC288 _Next=0x003CC270 _Prev=0x003CC258 x=3 y=4 node at .end: ptr=0x003CC258 _Next=0x003CC288 _Prev=0x003CC2A0 x=6226002 y=4522072 * let’s count from the begin: 1st element: 3 4 2nd element: 1 2 3rd element: 5 6 element at .end(): 6226002 4522072 * let’s count from the end: element at .end(): 6226002 4522072 3rd element: 5 6 2nd element: 1 2 1st element: 3 4 removing last element... _Myhead=0x003CC258, _Mysize=2 ptr=0x003CC258 _Next=0x003CC288 _Prev=0x003CC270 x=6226002 y=4522072 ptr=0x003CC288 _Next=0x003CC270 _Prev=0x003CC258 x=3 y=4 ptr=0x003CC270 _Next=0x003CC258 _Prev=0x003CC288 x=1 y=2 ``` ## 34.2.3 C++ 11 std::forward_list 和std::list相似,但是包含一个指向下一个节点的指针。它所占用的内存空间更少,但是没有提供双向遍历链表的能力。 ## 34.3 std::vector 我将std::vector称作c数组的安全封装。在内部实现上,它和std::string相似,包含指向缓冲区的指针,指向数组尾部的指针,以及指向缓冲区尾部的指针。 std::vector中的元素在内存中连续存放,和通常中的数组一样。在C++11包含一个新方法.data(),它返回指向缓冲区的指针,这和std::string中的.c_str()相同。 在堆中分配的缓冲区大小将超过数组自身大小。 MSVC和GCC的实现相似,只是结构体中元素名稍有不同,因此下面的代码在两种编译器上都能工作。下面是用于转储std::vector结构的类C代码。 ``` #include <stdio.h> #include <vector> #include <algorithm> #include <functional> struct vector_of_ints { // MSVC names: int *Myfirst; int *Mylast; int *Myend; // GCC structure is the same, names are: _M_start, _M_finish, _M_end_of_storage }; void dump(struct vector_of_ints *in) { printf ("_Myfirst=%p, _Mylast=%p, _Myend=%p\n", in->Myfirst, in->Mylast, in->Myend); size_t size=(in->Mylast-in->Myfirst); size_t capacity=(in->Myend-in->Myfirst); printf ("size=%d, capacity=%d\n", size, capacity); for (size_t i=0; i<size; i++) printf ("element %d: %d\n", i, in->Myfirst[i]); }; int main() { std::vector<int> c; dump ((struct vector_of_ints*)(void*)&c); c.push_back(1); dump ((struct vector_of_ints*)(void*)&c); c.push_back(2); dump ((struct vector_of_ints*)(void*)&c); c.push_back(3); dump ((struct vector_of_ints*)(void*)&c); c.push_back(4); dump ((struct vector_of_ints*)(void*)&c); c.reserve (6); dump ((struct vector_of_ints*)(void*)&c); c.push_back(5); dump ((struct vector_of_ints*)(void*)&c); c.push_back(6); dump ((struct vector_of_ints*)(void*)&c); printf ("%d\n", c.at(5)); // bounds checking printf ("%d\n", c[8]); // operator[], no bounds checking }; 如果编译器是MSVC,下面是输出样例。 _Myfirst=00000000, _Mylast=00000000, _Myend=00000000 size=0, capacity=0 _Myfirst=0051CF48, _Mylast=0051CF4C, _Myend=0051CF4C size=1, capacity=1 element 0: 1 _Myfirst=0051CF58, _Mylast=0051CF60, _Myend=0051CF60 size=2, capacity=2 element 0: 1 element 1: 2 _Myfirst=0051C278, _Mylast=0051C284, _Myend=0051C284 size=3, capacity=3 element 0: 1 element 1: 2 element 2: 3 _Myfirst=0051C290, _Mylast=0051C2A0, _Myend=0051C2A0 size=4, capacity=4 element 0: 1 element 1: 2 element 2: 3 element 3: 4 _Myfirst=0051B180, _Mylast=0051B190, _Myend=0051B198 size=4, capacity=6 element 0: 1 element 1: 2 element 2: 3 element 3: 4 _Myfirst=0051B180, _Mylast=0051B194, _Myend=0051B198 size=5, capacity=6 element 0: 1 element 1: 2 element 2: 3 element 3: 4 element 4: 5 _Myfirst=0051B180, _Mylast=0051B198, _Myend=0051B198 size=6, capacity=6 element 0: 1 element 1: 2 element 2: 3 element 3: 4 element 4: 5 element 5: 6 6 6619158 ``` 可以看到,在main函数的头部也没有分配空间。当第一次push_back()调用结束后,缓冲区被分配。同时在每一次调用push_back()后,数组的大小和缓冲区容量都增大了。缓冲区地址也变化了,因为push_back()函数每次都会在堆中重新分配缓冲区。这是个耗时的操作,这就是为什么提前预测数组大小,同时调用.reserve()方法预留空间比较重要了。最后数据是垃圾数据,没有数组元素位于这个位置,因此打印出了随机的数据。这也说明了std::vector的[]操作并不校验数组下标是否越界。然而.at()方法会做相应检查,当出现越界时会抛出std::out_of_range。 让我们来看代码: Listing 34.11: MSVC 2012 /GS- /Ob1 ``` $SG52650 DB ’%d’, 0aH, 00H $SG52651 DB ’%d’, 0aH, 00H _this$ = -4 ; size = 4 __Pos$ = 8 ; size = 4 ?at@?$vector@HV?$allocator@H@std@@@std@@QAEAAHI@Z PROC ; std::vector<int,std::allocator<int> >:: at, COMDAT ; _this$ = ecx push ebp mov ebp, esp push ecx mov DWORD PTR _this$[ebp], ecx mov eax, DWORD PTR _this$[ebp] mov ecx, DWORD PTR _this$[ebp] mov edx, DWORD PTR [eax+4] sub edx, DWORD PTR [ecx] sar edx, 2 cmp edx, DWORD PTR __Pos$[ebp] ja SHORT $LN1@at push OFFSET ??_C@_0BM@NMJKDPPO@invalid?5vector?$DMT?$DO?5subscript?$AA@ call DWORD PTR __imp_?_Xout_of_range@std@@YAXPBD@Z $LN1@at: mov eax, DWORD PTR _this$[ebp] mov ecx, DWORD PTR [eax] mov edx, DWORD PTR __Pos$[ebp] lea eax, DWORD PTR [ecx+edx*4] $LN3@at: mov esp, ebp pop ebp ret 4 ?at@?$vector@HV?$allocator@H@std@@@std@@QAEAAHI@Z ENDP ; std::vector<int,std::allocator<int> >:: at _c$ = -36 ; size = 12 $T1 = -24 ; size = 4 $T2 = -20 ; size = 4 $T3 = -16 ; size = 4 $T4 = -12 ; size = 4 $T5 = -8 ; size = 4 $T6 = -4 ; size = 4 _main PROC push ebp mov ebp, esp sub esp, 36 ; 00000024H mov DWORD PTR _c$[ebp], 0 ; Myfirst mov DWORD PTR _c$[ebp+4], 0 ; Mylast mov DWORD PTR _c$[ebp+8], 0 ; Myend lea eax, DWORD PTR _c$[ebp] push eax call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 mov DWORD PTR $T6[ebp], 1 lea ecx, DWORD PTR $T6[ebp] push ecx lea ecx, DWORD PTR _c$[ebp] call ?push_back@?$vector@HV?$allocator@H@std@@@std@@QAEX$$QAH@Z ; std::vector<int,std ::allocator<int> >::push_back lea edx, DWORD PTR _c$[ebp] push edx call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 mov DWORD PTR $T5[ebp], 2 lea eax, DWORD PTR $T5[ebp] push eax lea ecx, DWORD PTR _c$[ebp] call ?push_back@?$vector@HV?$allocator@H@std@@@std@@QAEX$$QAH@Z ; std::vector<int,std ::allocator<int> >::push_back lea ecx, DWORD PTR _c$[ebp] push ecx call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 mov DWORD PTR $T4[ebp], 3 lea edx, DWORD PTR $T4[ebp] push edx lea ecx, DWORD PTR _c$[ebp] call ?push_back@?$vector@HV?$allocator@H@std@@@std@@QAEX$$QAH@Z ; std::vector<int,std ::allocator<int> >::push_back lea eax, DWORD PTR _c$[ebp] push eax call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 mov DWORD PTR $T3[ebp], 4 lea ecx, DWORD PTR $T3[ebp] push ecx lea ecx, DWORD PTR _c$[ebp] call ?push_back@?$vector@HV?$allocator@H@std@@@std@@QAEX$$QAH@Z ; std::vector<int,std ::allocator<int> >::push_back lea edx, DWORD PTR _c$[ebp] push edx call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 push 6 lea ecx, DWORD PTR _c$[ebp] call ?reserve@?$vector@HV?$allocator@H@std@@@std@@QAEXI@Z ; std::vector<int,std:: allocator<int> >::reserve lea eax, DWORD PTR _c$[ebp] push eax call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 mov DWORD PTR $T2[ebp], 5 lea ecx, DWORD PTR $T2[ebp] push ecx lea ecx, DWORD PTR _c$[ebp] call ?push_back@?$vector@HV?$allocator@H@std@@@std@@QAEX$$QAH@Z ; std::vector<int,std ::allocator<int> >::push_back lea edx, DWORD PTR _c$[ebp] push edx call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 mov DWORD PTR $T1[ebp], 6 lea eax, DWORD PTR $T1[ebp] push eax lea ecx, DWORD PTR _c$[ebp] call ?push_back@?$vector@HV?$allocator@H@std@@@std@@QAEX$$QAH@Z ; std::vector<int,std ::allocator<int> >::push_back lea ecx, DWORD PTR _c$[ebp] push ecx call ?dump@@YAXPAUvector_of_ints@@@Z ; dump add esp, 4 push 5 lea ecx, DWORD PTR _c$[ebp] call ?at@?$vector@HV?$allocator@H@std@@@std@@QAEAAHI@Z ; std::vector<int,std:: allocator<int> >::at mov edx, DWORD PTR [eax] push edx push OFFSET $SG52650 ; ’%d’ call DWORD PTR __imp__printf add esp, 8 mov eax, 8 shl eax, 2 mov ecx, DWORD PTR _c$[ebp] mov edx, DWORD PTR [ecx+eax] push edx push OFFSET $SG52651 ; ’%d’ call DWORD PTR __imp__printf add esp, 8 lea ecx, DWORD PTR _c$[ebp] call ?_Tidy@?$vector@HV?$allocator@H@std@@@std@@IAEXXZ ; std::vector<int,std:: allocator<int> >::_Tidy xor eax, eax mov esp, ebp pop ebp ret 0 _main ENDP ``` 我们看到.at()方法如何进行辩解检查,当发生错误时将抛出异常。最后一个printf()调用只是从内存中读取数据,并没有做任何检查。 有人可能会问,为什么没有用类似std::string中size和capacity的变量,我猜想可能是为了让边界检查更快,但是我不确定。 GCC生成的代码基本上相同,但是.at()方法被内联了。 Listing34.12: GCC 4.8.1 -fno-inline-small-functions –O1 ``` main proc near push ebp mov ebp, esp push edi push esi push ebx and esp, 0FFFFFFF0h sub esp, 20h mov dword ptr [esp+14h], 0 mov dword ptr [esp+18h], 0 mov dword ptr [esp+1Ch], 0 lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov dword ptr [esp+10h], 1 lea eax, [esp+10h] mov [esp+4], eax lea eax, [esp+14h] mov [esp], eax call _ZNSt6vectorIiSaIiEE9push_backERKi ; std::vector<int,std::allocator<int >>::push_back(int const&) lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov dword ptr [esp+10h], 2 lea eax, [esp+10h] mov [esp+4], eax lea eax, [esp+14h] mov [esp], eax call _ZNSt6vectorIiSaIiEE9push_backERKi ; std::vector<int,std::allocator<int >>::push_back(int const&) lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov dword ptr [esp+10h], 3 lea eax, [esp+10h] mov [esp+4], eax lea eax, [esp+14h] mov [esp], eax call _ZNSt6vectorIiSaIiEE9push_backERKi ; std::vector<int,std::allocator<int >>::push_back(int const&) lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov dword ptr [esp+10h], 4 lea eax, [esp+10h] mov [esp+4], eax lea eax, [esp+14h] mov [esp], eax call _ZNSt6vectorIiSaIiEE9push_backERKi ; std::vector<int,std::allocator<int >>::push_back(int const&) lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov ebx, [esp+14h] mov eax, [esp+1Ch] sub eax, ebx cmp eax, 17h ja short loc_80001CF mov edi, [esp+18h] sub edi, ebx sar edi, 2 mov dword ptr [esp], 18h call _Znwj ; operator new(uint) mov esi, eax test edi, edi jz short loc_80001AD lea eax, ds:0[edi*4] mov [esp+8], eax ; n mov [esp+4], ebx ; src mov [esp], esi ; dest call memmove loc_80001AD: ; CODE XREF: main+F8 mov eax, [esp+14h] test eax, eax jz short loc_80001BD mov [esp], eax ; void * call _ZdlPv ; operator delete(void *) loc_80001BD: ; CODE XREF: main+117 mov [esp+14h], esi lea eax, [esi+edi*4] mov [esp+18h], eax add esi, 18h mov [esp+1Ch], esi loc_80001CF: ; CODE XREF: main+DD lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov dword ptr [esp+10h], 5 lea eax, [esp+10h] mov [esp+4], eax lea eax, [esp+14h] mov [esp], eax call _ZNSt6vectorIiSaIiEE9push_backERKi ; std::vector<int,std::allocator<int >>::push_back(int const&) lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov dword ptr [esp+10h], 6 lea eax, [esp+10h] mov [esp+4], eax lea eax, [esp+14h] mov [esp], eax call _ZNSt6vectorIiSaIiEE9push_backERKi ; std::vector<int,std::allocator<int >>::push_back(int const&) lea eax, [esp+14h] mov [esp], eax call _Z4dumpP14vector_of_ints ; dump(vector_of_ints *) mov eax, [esp+14h] mov edx, [esp+18h] sub edx, eax cmp edx, 17h ja short loc_8000246 mov dword ptr [esp], offset aVector_m_range ; "vector::_M_range_check" call _ZSt20__throw_out_of_rangePKc ; std::__throw_out_of_range(char const*) loc_8000246: ; CODE XREF: main+19C mov eax, [eax+14h] mov [esp+8], eax mov dword ptr [esp+4], offset aD ; "%d\n" mov dword ptr [esp], 1 call __printf_chk mov eax, [esp+14h] mov eax, [eax+20h] mov [esp+8], eax mov dword ptr [esp+4], offset aD ; "%d\n" mov dword ptr [esp], 1 call __printf_chk mov eax, [esp+14h] test eax, eax jz short loc_80002AC mov [esp], eax ; void * call _ZdlPv ; operator delete(void *) jmp short loc_80002AC ; --------------------------------------------------------------------------- mov ebx, eax mov edx, [esp+14h] test edx, edx jz short loc_80002A4 mov [esp], edx ; void * call _ZdlPv ; operator delete(void *) loc_80002A4: ; CODE XREF: main+1FE mov [esp], ebx call _Unwind_Resume ; --------------------------------------------------------------------------- loc_80002AC: ; CODE XREF: main+1EA ; main+1F4 mov eax, 0 lea esp, [ebp-0Ch] pop ebx pop esi pop edi pop ebp locret_80002B8: ; DATA XREF: .eh_frame:08000510 ; .eh_frame:080005BC retn main endp ``` .reserve()方法也被内联了。如果缓冲区大小小于新的size,则调用new()申请新缓冲区,调用memmove()拷贝缓冲区内容,然后调用delete()释放旧的缓冲区。 让你给我们看看通过GCC编译后程序的输出 ``` _Myfirst=0x(nil), _Mylast=0x(nil), _Myend=0x(nil) size=0, capacity=0 _Myfirst=0x8257008, _Mylast=0x825700c, _Myend=0x825700c size=1, capacity=1 element 0: 1 _Myfirst=0x8257018, _Mylast=0x8257020, _Myend=0x8257020 size=2, capacity=2 element 0: 1 element 1: 2 _Myfirst=0x8257028, _Mylast=0x8257034, _Myend=0x8257038 size=3, capacity=4 element 0: 1 element 1: 2 element 2: 3 _Myfirst=0x8257028, _Mylast=0x8257038, _Myend=0x8257038 size=4, capacity=4 element 0: 1 element 1: 2 element 2: 3 element 3: 4 _Myfirst=0x8257040, _Mylast=0x8257050, _Myend=0x8257058 size=4, capacity=6 element 0: 1 element 1: 2 element 2: 3 element 3: 4 _Myfirst=0x8257040, _Mylast=0x8257054, _Myend=0x8257058 size=5, capacity=6 element 0: 1 element 1: 2 element 2: 3 element 3: 4 element 4: 5 _Myfirst=0x8257040, _Mylast=0x8257058, _Myend=0x8257058 size=6, capacity=6 element 0: 1 element 1: 2 element 2: 3 element 3: 4 element 4: 5 element 5: 6 6 0 ``` 我们可以看到缓冲区大小的增长不同于MSVC中。 简单的实验说明MSVC中缓冲区每次扩大当前大小的50%,而GCC中则每次扩大100%。 ## 34.4 std::map and std::set 二叉树是另一个基本的数据结构。它是一个树,但是每个节点最多包含两个指向其他节点的指针。每个节点包含键和/或值。 二叉树在键值字典的实现中是常用数据结构。 二叉树至少包含三个重要的属性: 所有的键的存储是排序的。 任何类型的键都容易存储。二叉树并不知道键的类型,因此需要键比较算法。 查找键的速度相比于链表和数组要快。 下面是一个非常简单的例子,我们在二叉树中存储下面这些数据0,1,2,3,5,6,9,10,11,12,20,99,100,101,107,1001,1010. 所有小于根节点键的键存储在树的左侧,所有大于根节点键的键存储在树的右侧。 因此,查找算法就很直接,当要查找的值小于当前节点的键的值,则查找左侧;如果大于当前节点的键的值,则查找右侧;当值相等时,则停止查找。这就是为什么通过键比较函数,算法可以查找数值、文本串等。 所有键的值都是唯一的。 如果这样,为了在n个键的平衡二叉树中查找一个键将需要log2 n步。1000个键需要10步,10000个键需要13步。但是这要求树总是平衡的,即键应该均匀的分布在树的不同层里。插入和删除操作需要保持树的平衡状态。 有一些流行的平衡算法可用,包括AVL树和红黑树。红黑树通过给节点增加一个color值来简化平衡过程,因此每个节点可能是红或者黑。 GCC和MSVC中的std::map和std::set模板的实现均采用了红黑树。 std::set只包含键。std::map是set的扩展,它在每个节点中包含一个值。 ## 34.4.1 MSVC ``` #include <map> #include <set> #include <string> #include <iostream> // struct is not packed! struct tree_node { struct tree_node *Left; struct tree_node *Parent; struct tree_node *Right; char Color; // 0 - Red, 1 - Black char Isnil; //std::pair Myval; unsigned int first; // called Myval in std::set const char *second; // not present in std::set }; struct tree_struct { struct tree_node *Myhead; size_t Mysize; }; void dump_tree_node (struct tree_node *n, bool is_set, bool traverse) { printf ("ptr=0x%p Left=0x%p Parent=0x%p Right=0x%p Color=%d Isnil=%d\n", n, n->Left, n->Parent, n->Right, n->Color, n->Isnil); if (n->Isnil==0) { if (is_set) printf ("first=%d\n", n->first); else printf ("first=%d second=[%s]\n", n->first, n->second); } if (traverse) { if (n->Isnil==1) dump_tree_node (n->Parent, is_set, true); else { if (n->Left->Isnil==0) dump_tree_node (n->Left, is_set, true); if (n->Right->Isnil==0) dump_tree_node (n->Right, is_set, true); }; }; }; const char* ALOT_OF_TABS="\t\t\t\t\t\t\t\t\t\t\t"; void dump_as_tree (int tabs, struct tree_node *n, bool is_set) { if (is_set) printf ("%d\n", n->first); else printf ("%d [%s]\n", n->first, n->second); if (n->Left->Isnil==0) { printf ("%.*sL-------", tabs, ALOT_OF_TABS); dump_as_tree (tabs+1, n->Left, is_set); }; if (n->Right->Isnil==0) { printf ("%.*sR-------", tabs, ALOT_OF_TABS); dump_as_tree (tabs+1, n->Right, is_set); }; }; void dump_map_and_set(struct tree_struct *m, bool is_set) { printf ("ptr=0x%p, Myhead=0x%p, Mysize=%d\n", m, m->Myhead, m->Mysize); dump_tree_node (m->Myhead, is_set, true); printf ("As a tree:\n"); printf ("root----"); dump_as_tree (1, m->Myhead->Parent, is_set); }; int main() { // map std::map<int, const char*> m; m[10]="ten"; m[20]="twenty"; m[3]="three"; m[101]="one hundred one"; m[100]="one hundred"; m[12]="twelve"; m[107]="one hundred seven"; m[0]="zero"; m[1]="one"; m[6]="six"; m[99]="ninety-nine"; m[5]="five"; m[11]="eleven"; m[1001]="one thousand one"; m[1010]="one thousand ten"; m[2]="two"; m[9]="nine"; printf ("dumping m as map:\n"); dump_map_and_set ((struct tree_struct *)(void*)&m, false); std::map<int, const char*>::iterator it1=m.begin(); printf ("m.begin():\n"); dump_tree_node ((struct tree_node *)*(void**)&it1, false, false); it1=m.end(); printf ("m.end():\n"); dump_tree_node ((struct tree_node *)*(void**)&it1, false, false); // set std::set<int> s; s.insert(123); s.insert(456); s.insert(11); s.insert(12); s.insert(100); s.insert(1001); printf ("dumping s as set:\n"); dump_map_and_set ((struct tree_struct *)(void*)&s, true); std::set<int>::iterator it2=s.begin(); printf ("s.begin():\n"); dump_tree_node ((struct tree_node *)*(void**)&it2, true, false); it2=s.end(); printf ("s.end():\n"); dump_tree_node ((struct tree_node *)*(void**)&it2, true, false); }; Listing 34.13: MSVC 2012 dumping m as map: ptr=0x0020FE04, Myhead=0x005BB3A0, Mysize=17 ptr=0x005BB3A0 Left=0x005BB4A0 Parent=0x005BB3C0 Right=0x005BB580 Color=1 Isnil=1 ptr=0x005BB3C0 Left=0x005BB4C0 Parent=0x005BB3A0 Right=0x005BB440 Color=1 Isnil=0 first=10 second=[ten] ptr=0x005BB4C0 Left=0x005BB4A0 Parent=0x005BB3C0 Right=0x005BB520 Color=1 Isnil=0 first=1 second=[one] ptr=0x005BB4A0 Left=0x005BB3A0 Parent=0x005BB4C0 Right=0x005BB3A0 Color=1 Isnil=0 first=0 second=[zero] ptr=0x005BB520 Left=0x005BB400 Parent=0x005BB4C0 Right=0x005BB4E0 Color=0 Isnil=0 first=5 second=[five] ptr=0x005BB400 Left=0x005BB5A0 Parent=0x005BB520 Right=0x005BB3A0 Color=1 Isnil=0 first=3 second=[three] ptr=0x005BB5A0 Left=0x005BB3A0 Parent=0x005BB400 Right=0x005BB3A0 Color=0 Isnil=0 first=2 second=[two] ptr=0x005BB4E0 Left=0x005BB3A0 Parent=0x005BB520 Right=0x005BB5C0 Color=1 Isnil=0 first=6 second=[six] ptr=0x005BB5C0 Left=0x005BB3A0 Parent=0x005BB4E0 Right=0x005BB3A0 Color=0 Isnil=0 first=9 second=[nine] ptr=0x005BB440 Left=0x005BB3E0 Parent=0x005BB3C0 Right=0x005BB480 Color=1 Isnil=0 first=100 second=[one hundred] ptr=0x005BB3E0 Left=0x005BB460 Parent=0x005BB440 Right=0x005BB500 Color=0 Isnil=0 first=20 second=[twenty] ptr=0x005BB460 Left=0x005BB540 Parent=0x005BB3E0 Right=0x005BB3A0 Color=1 Isnil=0 first=12 second=[twelve] ptr=0x005BB540 Left=0x005BB3A0 Parent=0x005BB460 Right=0x005BB3A0 Color=0 Isnil=0 first=11 second=[eleven] ptr=0x005BB500 Left=0x005BB3A0 Parent=0x005BB3E0 Right=0x005BB3A0 Color=1 Isnil=0 first=99 second=[ninety-nine] ptr=0x005BB480 Left=0x005BB420 Parent=0x005BB440 Right=0x005BB560 Color=0 Isnil=0 first=107 second=[one hundred seven] ptr=0x005BB420 Left=0x005BB3A0 Parent=0x005BB480 Right=0x005BB3A0 Color=1 Isnil=0 first=101 second=[one hundred one] ptr=0x005BB560 Left=0x005BB3A0 Parent=0x005BB480 Right=0x005BB580 Color=1 Isnil=0 first=1001 second=[one thousand one] ptr=0x005BB580 Left=0x005BB3A0 Parent=0x005BB560 Right=0x005BB3A0 Color=0 Isnil=0 first=1010 second=[one thousand ten] As a tree: root----10 [ten] L-------1 [one] L-------0 [zero] R-------5 [five] L-------3 [three] L-------2 [two] R-------6 [six] R-------9 [nine] R-------100 [one hundred] L-------20 [twenty] L-------12 [twelve] L-------11 [eleven] R-------99 [ninety-nine] R-------107 [one hundred seven] L-------101 [one hundred one] R-------1001 [one thousand one] R-------1010 [one thousand ten] m.begin(): ptr=0x005BB4A0 Left=0x005BB3A0 Parent=0x005BB4C0 Right=0x005BB3A0 Color=1 Isnil=0 first=0 second=[zero] m.end(): ptr=0x005BB3A0 Left=0x005BB4A0 Parent=0x005BB3C0 Right=0x005BB580 Color=1 Isnil=1 dumping s as set: ptr=0x0020FDFC, Myhead=0x005BB5E0, Mysize=6 ptr=0x005BB5E0 Left=0x005BB640 Parent=0x005BB600 Right=0x005BB6A0 Color=1 Isnil=1 ptr=0x005BB600 Left=0x005BB660 Parent=0x005BB5E0 Right=0x005BB620 Color=1 Isnil=0 first=123 ptr=0x005BB660 Left=0x005BB640 Parent=0x005BB600 Right=0x005BB680 Color=1 Isnil=0 first=12 ptr=0x005BB640 Left=0x005BB5E0 Parent=0x005BB660 Right=0x005BB5E0 Color=0 Isnil=0 first=11 ptr=0x005BB680 Left=0x005BB5E0 Parent=0x005BB660 Right=0x005BB5E0 Color=0 Isnil=0 first=100 ptr=0x005BB620 Left=0x005BB5E0 Parent=0x005BB600 Right=0x005BB6A0 Color=1 Isnil=0 first=456 ptr=0x005BB6A0 Left=0x005BB5E0 Parent=0x005BB620 Right=0x005BB5E0 Color=0 Isnil=0 first=1001 As a tree: root----123 L-------12 L-------11 R-------100 R-------456 R-------1001 s.begin(): ptr=0x005BB640 Left=0x005BB5E0 Parent=0x005BB660 Right=0x005BB5E0 Color=0 Isnil=0 first=11 s.end(): ptr=0x005BB5E0 Left=0x005BB640 Parent=0x005BB600 Right=0x005BB6A0 Color=1 Isnil=1 ``` 结构体没有打包,因此所有的char类型都占用4个字节。 对于std::map,first和second变量可以看做是一个std::pair型变量。而在std::set中,std::pair只有一个变量。 当前树的节点个数总被保存着,这与MSVC中的std::list实现方式一样。 和std::list一样,迭代器只是指向节点的指针。.begin()迭代器指向最小的键。.begin()指针没有存储在某个位置(和list一样),树中最小的key总是可以找到。当节点有前一个和后一个节点时,- -和++操作会将迭代器移动到前一个或者后一个节点。关于这些操作的算法在文献七中进行了描述。 .end()迭代器指向根节点,它的Isnil的值为1,即这个节点没有键和值。 ## 34.4.2 GCC ``` #include <stdio.h> #include <map> #include <set> #include <string> #include <iostream> struct map_pair { int key; const char *value; }; struct tree_node { int M_color; // 0 - Red, 1 - Black struct tree_node *M_parent; struct tree_node *M_left; struct tree_node *M_right; }; struct tree_struct { int M_key_compare; struct tree_node M_header; size_t M_node_count; }; void dump_tree_node (struct tree_node *n, bool is_set, bool traverse, bool dump_keys_and_values) { printf ("ptr=0x%p M_left=0x%p M_parent=0x%p M_right=0x%p M_color=%d\n", n, n->M_left, n->M_parent, n->M_right, n->M_color); void *point_after_struct=((char*)n)+sizeof(struct tree_node); if (dump_keys_and_values) { if (is_set) printf ("key=%d\n", *(int*)point_after_struct); else { struct map_pair *p=(struct map_pair *)point_after_struct; printf ("key=%d value=[%s]\n", p->key, p->value); }; }; if (traverse==false) return; if (n->M_left) dump_tree_node (n->M_left, is_set, traverse, dump_keys_and_values); if (n->M_right) dump_tree_node (n->M_right, is_set, traverse, dump_keys_and_values); }; const char* ALOT_OF_TABS="\t\t\t\t\t\t\t\t\t\t\t"; void dump_as_tree (int tabs, struct tree_node *n, bool is_set) { void *point_after_struct=((char*)n)+sizeof(struct tree_node); if (is_set) printf ("%d\n", *(int*)point_after_struct); else { struct map_pair *p=(struct map_pair *)point_after_struct; printf ("%d [%s]\n", p->key, p->value); } if (n->M_left) { printf ("%.*sL-------", tabs, ALOT_OF_TABS); dump_as_tree (tabs+1, n->M_left, is_set); }; if (n->M_right) { printf ("%.*sR-------", tabs, ALOT_OF_TABS); dump_as_tree (tabs+1, n->M_right, is_set); }; }; void dump_map_and_set(struct tree_struct *m, bool is_set) { printf ("ptr=0x%p, M_key_compare=0x%x, M_header=0x%p, M_node_count=%d\n", m, m->M_key_compare, &m->M_header, m->M_node_count); dump_tree_node (m->M_header.M_parent, is_set, true, true); printf ("As a tree:\n"); printf ("root----"); dump_as_tree (1, m->M_header.M_parent, is_set); }; int main() { // map std::map<int, const char*> m; m[10]="ten"; m[20]="twenty"; m[3]="three"; m[101]="one hundred one"; m[100]="one hundred"; m[12]="twelve"; m[107]="one hundred seven"; m[0]="zero"; m[1]="one"; m[6]="six"; m[99]="ninety-nine"; m[5]="five"; m[11]="eleven"; m[1001]="one thousand one"; m[1010]="one thousand ten"; m[2]="two"; m[9]="nine"; printf ("dumping m as map:\n"); dump_map_and_set ((struct tree_struct *)(void*)&m, false); std::map<int, const char*>::iterator it1=m.begin(); printf ("m.begin():\n"); dump_tree_node ((struct tree_node *)*(void**)&it1, false, false, true); it1=m.end(); printf ("m.end():\n"); dump_tree_node ((struct tree_node *)*(void**)&it1, false, false, false); // set std::set<int> s; s.insert(123); s.insert(456); s.insert(11); s.insert(12); s.insert(100); s.insert(1001); printf ("dumping s as set:\n"); dump_map_and_set ((struct tree_struct *)(void*)&s, true); std::set<int>::iterator it2=s.begin(); printf ("s.begin():\n"); dump_tree_node ((struct tree_node *)*(void**)&it2, true, false, true); it2=s.end(); printf ("s.end():\n"); dump_tree_node ((struct tree_node *)*(void**)&it2, true, false, false); }; dumping m as map: ptr=0x0028FE3C, M_key_compare=0x402b70, M_header=0x0028FE40, M_node_count=17 ptr=0x007A4988 M_left=0x007A4C00 M_parent=0x0028FE40 M_right=0x007A4B80 M_color=1 key=10 value=[ten] ptr=0x007A4C00 M_left=0x007A4BE0 M_parent=0x007A4988 M_right=0x007A4C60 M_color=1 key=1 value=[one] ptr=0x007A4BE0 M_left=0x00000000 M_parent=0x007A4C00 M_right=0x00000000 M_color=1 key=0 value=[zero] ptr=0x007A4C60 M_left=0x007A4B40 M_parent=0x007A4C00 M_right=0x007A4C20 M_color=0 key=5 value=[five] ptr=0x007A4B40 M_left=0x007A4CE0 M_parent=0x007A4C60 M_right=0x00000000 M_color=1 key=3 value=[three] ptr=0x007A4CE0 M_left=0x00000000 M_parent=0x007A4B40 M_right=0x00000000 M_color=0 key=2 value=[two] ptr=0x007A4C20 M_left=0x00000000 M_parent=0x007A4C60 M_right=0x007A4D00 M_color=1 key=6 value=[six] ptr=0x007A4D00 M_left=0x00000000 M_parent=0x007A4C20 M_right=0x00000000 M_color=0 key=9 value=[nine] ptr=0x007A4B80 M_left=0x007A49A8 M_parent=0x007A4988 M_right=0x007A4BC0 M_color=1 key=100 value=[one hundred] ptr=0x007A49A8 M_left=0x007A4BA0 M_parent=0x007A4B80 M_right=0x007A4C40 M_color=0 key=20 value=[twenty] ptr=0x007A4BA0 M_left=0x007A4C80 M_parent=0x007A49A8 M_right=0x00000000 M_color=1 key=12 value=[twelve] ptr=0x007A4C80 M_left=0x00000000 M_parent=0x007A4BA0 M_right=0x00000000 M_color=0 key=11 value=[eleven] ptr=0x007A4C40 M_left=0x00000000 M_parent=0x007A49A8 M_right=0x00000000 M_color=1 key=99 value=[ninety-nine] ptr=0x007A4BC0 M_left=0x007A4B60 M_parent=0x007A4B80 M_right=0x007A4CA0 M_color=0 key=107 value=[one hundred seven] ptr=0x007A4B60 M_left=0x00000000 M_parent=0x007A4BC0 M_right=0x00000000 M_color=1 key=101 value=[one hundred one] ptr=0x007A4CA0 M_left=0x00000000 M_parent=0x007A4BC0 M_right=0x007A4CC0 M_color=1 key=1001 value=[one thousand one] ptr=0x007A4CC0 M_left=0x00000000 M_parent=0x007A4CA0 M_right=0x00000000 M_color=0 key=1010 value=[one thousand ten] As a tree: root----10 [ten] L-------1 [one] L-------0 [zero] R-------5 [five] L-------3 [three] L-------2 [two] R-------6 [six] R-------9 [nine] R-------100 [one hundred] L-------20 [twenty] L-------12 [twelve] L-------11 [eleven] R-------99 [ninety-nine] R-------107 [one hundred seven] L-------101 [one hundred one] R-------1001 [one thousand one] R-------1010 [one thousand ten] m.begin(): ptr=0x007A4BE0 M_left=0x00000000 M_parent=0x007A4C00 M_right=0x00000000 M_color=1 key=0 value=[zero] m.end(): ptr=0x0028FE40 M_left=0x007A4BE0 M_parent=0x007A4988 M_right=0x007A4CC0 M_color=0 dumping s as set: ptr=0x0028FE20, M_key_compare=0x8, M_header=0x0028FE24, M_node_count=6 ptr=0x007A1E80 M_left=0x01D5D890 M_parent=0x0028FE24 M_right=0x01D5D850 M_color=1 key=123 ptr=0x01D5D890 M_left=0x01D5D870 M_parent=0x007A1E80 M_right=0x01D5D8B0 M_color=1 key=12 ptr=0x01D5D870 M_left=0x00000000 M_parent=0x01D5D890 M_right=0x00000000 M_color=0 key=11 ptr=0x01D5D8B0 M_left=0x00000000 M_parent=0x01D5D890 M_right=0x00000000 M_color=0 key=100 ptr=0x01D5D850 M_left=0x00000000 M_parent=0x007A1E80 M_right=0x01D5D8D0 M_color=1 key=456 ptr=0x01D5D8D0 M_left=0x00000000 M_parent=0x01D5D850 M_right=0x00000000 M_color=0 key=1001 As a tree: root----123 L-------12 L-------11 R-------100 R-------456 R-------1001 s.begin(): ptr=0x01D5D870 M_left=0x00000000 M_parent=0x01D5D890 M_right=0x00000000 M_color=0 key=11 s.end(): ptr=0x0028FE24 M_left=0x01D5D870 M_parent=0x007A1E80 M_right=0x01D5D8D0 M_color=0 ``` GCC的实现也很类似。唯一一的不同点就是没有Isnil元素,因此占用的空间要小于MSVC的实现。跟节点也是作为.end()迭代器指向的位置,同时也不包含键和值。 ## 34.4.3 重新平衡的演示 下面的例子将向我们展示插入节点后,树如何重新平衡。 ``` #include <stdio.h> #include <map> #include <set> #include <string> #include <iostream> struct map_pair { int key; const char *value; }; struct tree_node { int M_color; // 0 - Red, 1 - Black struct tree_node *M_parent; struct tree_node *M_left; struct tree_node *M_right; }; struct tree_struct { int M_key_compare; struct tree_node M_header; size_t M_node_count; }; const char* ALOT_OF_TABS="\t\t\t\t\t\t\t\t\t\t\t"; void dump_as_tree (int tabs, struct tree_node *n) { void *point_after_struct=((char*)n)+sizeof(struct tree_node); printf ("%d\n", *(int*)point_after_struct); if (n->M_left) { printf ("%.*sL-------", tabs, ALOT_OF_TABS); dump_as_tree (tabs+1, n->M_left); }; if (n->M_right) { printf ("%.*sR-------", tabs, ALOT_OF_TABS); dump_as_tree (tabs+1, n->M_right); }; }; void dump_map_and_set(struct tree_struct *m) { printf ("root----"); dump_as_tree (1, m->M_header.M_parent); }; int main() { std::set<int> s; s.insert(123); s.insert(456); printf ("123, 456 are inserted\n"); dump_map_and_set ((struct tree_struct *)(void*)&s); s.insert(11); s.insert(12); printf ("\n"); printf ("11, 12 are inserted\n"); dump_map_and_set ((struct tree_struct *)(void*)&s); s.insert(100); s.insert(1001); printf ("\n"); printf ("100, 1001 are inserted\n"); dump_map_and_set ((struct tree_struct *)(void*)&s); s.insert(667); s.insert(1); s.insert(4); s.insert(7); printf ("\n"); printf ("667, 1, 4, 7 are inserted\n"); dump_map_and_set ((struct tree_struct *)(void*)&s); printf ("\n"); }; 123, 456 are inserted root----123 R-------456 11, 12 are inserted root----123 L-------11 R-------12 R-------456 100, 1001 are inserted root----123 L-------12 L-------11 R-------100 R-------456 R-------1001 667, 1, 4, 7 are inserted root----12 L-------4 L-------1 R-------11 L-------7 R-------123 L-------100 R-------667 L-------456 R-------1001 ```