Studying the malloc Source Code

5.1 Boundary Tags

1

  • size_t is 4 bytes on 32-bit platforms; on 64-bit platforms it is 4 or 8 bytes
  • An allocated chunk must be aligned to 2*SIZE_SZ (two size_t)
  • The macros that follow handle chunk address alignment
  • ==On 32-bit platforms chunk addresses are 8-byte aligned; on 64-bit platforms they are 8- or 16-byte aligned==

2

A structure is used to describe these chunks:

  • prev_size: when the previous chunk is free, this holds the previous chunk's size; if the previous chunk is in use, the field is meaningless
  • size: records the size of the current chunk plus a few attribute bits of the current and previous chunks: whether the previous chunk is in use, whether the current chunk was allocated via mmap, and whether the current chunk belongs to the main arena
  • fd, bk: two pointers that exist while the chunk is free, used to link it into a bin's list; once the chunk is allocated, this space belongs to the application
  • fd_nextsize, bk_nextsize: used while the chunk sits in a large bin. fd_nextsize points to the first free chunk of the next ==larger== size; bk_nextsize points to the first free chunk of the next ==smaller== size. When the chunk is allocated, this space also belongs to the application
    • Fast Bins: manage small chunks (typically under 64 bytes) in singly linked lists; allocation and freeing are fast
    • Small Bins: manage medium chunks (typically under 512 bytes) in doubly linked lists
    • Large Bins: manage larger chunks (typically over 512 bytes) in doubly linked lists, sorted by size
    • Unsorted Bin: temporarily holds freed chunks, which are later moved into the small bins or large bins according to size

An allocated chunk:

3

A freed chunk:

4

As shown, in an allocated chunk the fd and bk fields become space the application can use.

  • Chunks allocated via mmap have the second-lowest bit M (IS_MMAPPED) set in their size field. Because they are allocated one by one, each such chunk must carry its own trailing size field and is independent of the heap.
  • The P (PREV_INUSE) bit is stored in the unused lowest bit of the chunk size and marks whether the previous chunk is in use: 1 in use, 0 free. When it is 0, the current chunk records the previous chunk's size, from which its position can be derived. The first chunk always has this bit set, to prevent invalid accesses.
  • The special top chunk does not use a trailing size field, since there is no next contiguous chunk that would need to index it. After initialization, top always exists. If it shrinks below MINSIZE bytes, it is replenished.

5

  • chunk2mem: from a chunk address, compute the address returned to the user (the mem address)
  • mem2chunk: from a mem address, recover the chunk address
  • mem addresses are likewise aligned to 2*SIZE_SZ (the first two fields take exactly 2*SIZE_SZ)
  • The macros aligned_OK and misaligned_chunk(p) check whether an address is 2*SIZE_SZ aligned
  • MIN_CHUNK_SIZE defines the minimum chunk size: 16 bytes on 32-bit platforms, 24 or 32 bytes on 64-bit platforms
  • MINSIZE is the minimum allocation size
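The conversion macros above are plain pointer arithmetic; a standalone sketch (the macro names mirror glibc's, but this is a simplified model, not the real definitions):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE_SZ (sizeof(size_t))
#define MALLOC_ALIGNMENT (2 * SIZE_SZ)
#define MALLOC_ALIGN_MASK (MALLOC_ALIGNMENT - 1)

/* mem points just past the prev_size and size fields (2*SIZE_SZ in total). */
#define chunk2mem(p)   ((void *)((char *)(p) + 2 * SIZE_SZ))
#define mem2chunk(mem) ((void *)((char *)(mem) - 2 * SIZE_SZ))

/* True when the address is 2*SIZE_SZ aligned. */
#define aligned_OK(m)  (((uintptr_t)(m) & MALLOC_ALIGN_MASK) == 0)
```

The two conversions are exact inverses, which is why mem inherits the chunk's 2*SIZE_SZ alignment.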

6

These macros convert the size requested by the user into the size that actually needs to be allocated:

  • On Linux x86_64, with SIZE_SZ = 8 bytes, a free chunk needs space for at least 4 size_t fields (prev_size, size, fd, bk), i.e. MINSIZE (32B)

  • Note that the conversion not only rounds for alignment but also adds SIZE_SZ. The reason:

    For an in-use chunk, the next chunk's prev_size is meaningless and may be used by the current chunk, so in_use_size = (requested size + 16 (prev_size + size) - 8 (borrowed from the next chunk's prev_size)), aligned to 16B

    And because a free chunk and an in-use chunk occupy the same memory, the larger of the two is taken as the actual allocation, so chunk_size = max(in_use_size, 32 (MINSIZE))

  • If the chunk is allocated directly by mmap, there is no previous or next chunk, so the next chunk's prev_size cannot be borrowed and the overhead is 2*SIZE_SZ
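The rule above can be sketched as a small helper (a simplified model of glibc's request2size, assuming SIZE_SZ = 8 and ignoring the overflow trap of checked_request2size):

```c
#include <assert.h>
#include <stddef.h>

#define SIZE_SZ 8UL
#define MALLOC_ALIGN_MASK (2 * SIZE_SZ - 1)
#define MINSIZE (4 * SIZE_SZ)  /* prev_size + size + fd + bk */

/* Add SIZE_SZ of overhead (the size field; prev_size is borrowed from
   the next chunk), round up to 2*SIZE_SZ, never go below MINSIZE. */
static size_t request2size(size_t req) {
    if (req + SIZE_SZ + MALLOC_ALIGN_MASK < MINSIZE)
        return MINSIZE;
    return (req + SIZE_SZ + MALLOC_ALIGN_MASK) & ~MALLOC_ALIGN_MASK;
}
```

So a 24-byte request fits in a 32-byte chunk (24 + 8 = 32), while 25 bytes already needs 48.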

7

Chunks are always split at aligned addresses (8 bytes by default, but configurable: alignment = 2^n with integer n >= 3), so the low 3 bits of chunk->size are always 0 (e.g. 8 in binary is 1000) and are reused to store other information:

  • Bit 0 is the P flag, recording whether the previous chunk is in use: 1 in use, 0 free
  • Bit 1 is the M flag, marking whether this chunk was allocated by mmap: 1 yes, 0 no
  • Bit 2 is the A flag, marking whether this chunk belongs to a non-main arena: 1 yes, 0 no
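A minimal sketch of how the three flag bits coexist with the size (the macro names mirror glibc's, but this is a standalone model):

```c
#include <assert.h>
#include <stddef.h>

#define PREV_INUSE     0x1  /* bit 0: previous chunk is in use */
#define IS_MMAPPED     0x2  /* bit 1: chunk was allocated by mmap */
#define NON_MAIN_ARENA 0x4  /* bit 2: chunk belongs to a non-main arena */
#define SIZE_BITS (PREV_INUSE | IS_MMAPPED | NON_MAIN_ARENA)

/* The real size is obtained by masking off the three flag bits. */
static size_t chunksize(size_t size_field) {
    return size_field & ~(size_t)SIZE_BITS;
}
```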

8

  • What prev_size records:
    • If the previous chunk is free, prev_size holds its size; this is how the address of the previous free chunk is derived from the current chunk pointer, which is exactly what the macro prev_chunk(p) does
    • If the previous chunk is in use, prev_size is meaningless
  • size records this chunk's size; the next chunk's address is this chunk's address plus its size. Since the low three bits of size hold control flags, the real size must be extracted first. This is how next_chunk(p) works
  • chunksize(p) yields the chunk's real size by masking off the control bits in size
  • chunk_at_offset(p, s) treats the address p+s as a chunk
    • p is the current chunk's address
    • Uses:
      • getting an adjacent chunk
      • merging free chunks
      • walking the heap
  • Note: there can be several consecutive in-use chunks, but never several consecutive free chunks (those are merged into one larger free chunk)
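The traversal macros can be modeled with a stripped-down chunk header (a hypothetical, simplified layout; real glibc chunks also carry fd/bk etc.):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified chunk header: just the two boundary-tag fields. */
typedef struct chunk {
    size_t prev_size;  /* size of the previous chunk, if it is free */
    size_t size;       /* this chunk's size; low 3 bits are flags */
} chunk;

#define SIZE_BITS 0x7UL
#define chunksize(p)         ((p)->size & ~SIZE_BITS)
#define next_chunk(p)        ((chunk *)((char *)(p) + chunksize(p)))
#define prev_chunk(p)        ((chunk *)((char *)(p) - (p)->prev_size))
#define chunk_at_offset(p,s) ((chunk *)((char *)(p) + (s)))
```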

8

These macros check/set/clear the in-use flag of the current chunk. Because a chunk's in-use information is stored in the next chunk, the next chunk's address must be obtained first.

9

These three macros check/set/clear the in-use flag bits in the size field of a given chunk.

10

  • set_head_size(p, s) sets the current chunk's size field while preserving the flag bits
  • set_head(p, s) sets the current chunk's size field, ignoring the flag bits (s is written as-is)
  • set_foot(p, s) sets the next chunk's prev_size to the current chunk's size

5.2 Binned Memory Management

For free chunks, ptmalloc uses binned management, placing them in 4 kinds of bins:

  • fast bins
    • A high-speed cache for small chunks. When a chunk below the fast-bin threshold (64 bytes by default on 32-bit) is freed, it first goes into the fast bins. When a small allocation is requested, the fast bins are checked first; a suitable chunk there is returned directly, speeding up allocation
  • unsorted bin
    • As the name suggests, a staging area. It temporarily holds just-freed chunks, which are later moved into the small/large bins. A request looks at this bin first; a hit is returned directly, otherwise the small/large bins are searched
  • small bins
    • Hold fixed-size chunks, 64 bins in total; the minimum chunk size is 16 or 32 bytes and adjacent bins differ by 8/16 bytes. On 64-bit, for example, Small Bins[0] manages 32-byte chunks, Small Bins[1] manages 48-byte chunks, and so on
  • large bins
    • Hold free chunks of at least 512 or 1024 bytes, in doubly linked lists

5.2.1 Small Bins

Chunk_size = 2 * SIZE_SZ * index

  • On platforms where SIZE_SZ is 4B (32-bit), small-bin chunk sizes form an arithmetic sequence with common difference 8B; the largest chunk is 504B and the smallest is 16B, so there are actually 62 bins: 16B, 24B, 32B, ..., 504B. On platforms where SIZE_SZ is 8B (64-bit), the common difference is 16B, the largest chunk is 1008B and the smallest is 32B, again 62 bins.

  • ptmalloc maintains 62 circular doubly linked lists; all free chunks on one list have the same size

  • The figure below shows ptmalloc's organization on a 32-bit platform

    Binned management
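The size formula can be checked with a tiny sketch (assuming the 32-bit case, SIZE_SZ = 4; index 2 is the first usable bin):

```c
#include <assert.h>
#include <stddef.h>

#define SIZE_SZ 4UL

/* Small bin with index i holds chunks of exactly 2*SIZE_SZ*i bytes. */
static size_t smallbin_chunk_size(unsigned idx) {
    return 2 * SIZE_SZ * idx;
}

/* Inverse: which small bin serves a given chunk size (sz / 8 here). */
static unsigned smallbin_index(size_t sz) {
    return (unsigned)(sz / (2 * SIZE_SZ));
}
```

Index 2 gives the 16B minimum and index 63 the 504B maximum, matching the 62 usable bins above.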

5.2.2 Large Bins

  • On platforms where SIZE_SZ is 4B (32-bit), free chunks >= 512B, or where SIZE_SZ is 8B (64-bit), free chunks >= 1024B, are managed by the large bins, 63 bins in total. Chunk sizes within them do not form a single arithmetic sequence; instead the bins fall into 6 groups, each group an arithmetic sequence of its own. The group sizes are 32, 16, 8, 4, 2, 1 bins, with common differences 64B, 512B, 4096B, 32768B, 262144B (the last group is a single bin with no difference)

  • For example, on a platform with SIZE_SZ = 4, the first group of large bins starts at a chunk size of 512B, spans 32 bins with a common difference of 64B, so Chunk_size = 512 + 64 * index

    The second group starts where the first ends, so Chunk_size = 512 + 64 * 32 + 512 * index, and so on

  • Since both the large bins and the small bins follow regular patterns, the two can live in one array of 128 chunk headers, with the small bins in the front part

  • These macros compute, from an array index, the chunk size of that bin (small bins) or its chunk size range (large bins), and conversely compute from a required allocation size the index of the bin the chunk belongs to

  • bin_index(sz) computes the bin index for a required size by dispatching to smallbin_index(sz) or largebin_index(sz)

  • smallbin_index(sz) is computed as:

    • if SIZE_SZ is 4B, sz/8
    • if SIZE_SZ is 8B, sz/16
  • The large-bin computation is a bit more involved

  • With SIZE_SZ = 4B, chunk sizes map to bin indexes as follows:

  • On platforms where SIZE_SZ is 4B, bin[0] and bin[1] do not exist, because the smallest chunk is 16B (fd, bk, prev_size, size). There are 62 small bins and 63 large bins, 125 bins in total. NBINS is defined as 128; in fact bin[0] and bin[127] do not exist, and bin[1] is the list head of the unsorted bin.
  • For a user-requested size, first compute the chunk size with checked_request2size(req, sz), then compute the owning bin with bin_index(sz)

  • bin_at(m, i) obtains a bin's list head from its index. A free chunk is linked into a list through its fd and bk; a bin's list head only needs fd and bk, since size and prev_size are meaningless for the head. To save space and speed things up, each bin therefore reserves space only for fd and bk
  • bin_at(m, i): the definition shows that bin[0] does not exist
  • On platforms where SIZE_SZ is 4B, the first 4B of bin[1] store fd and the next 4B store bk. bin_at returns a malloc_chunk pointer: since fd's offset inside malloc_chunk is offsetof(struct malloc_chunk, fd) = 8 (prev_size and size come first, 8B together), fd's address minus 8 is the malloc_chunk's address
  • The prev_size and size fields of a bin's list-head chunk must never be modified: that memory is shared with the fd and bk of neighboring bins' list heads, and writing it would corrupt those doubly linked lists
  • next_bin(b) obtains the next bin's address: the current bin's address advanced by the size of fd plus bk
  • A bin head's fd points to the first chunk in the bin and bk to the last; first(b) fetches the first available chunk and last(b) the last one, and the two macros are used to traverse a bin
  • unlink(P, BK, FD) removes a chunk from its free list. Note that a free chunk in a large bin may sit on two doubly linked lists, and unlink must remove it from both
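A minimal model of the unlink operation on the fd/bk list (ignoring the nextsize lists and glibc's integrity checks):

```c
#include <assert.h>
#include <stddef.h>

typedef struct chunk {
    struct chunk *fd;
    struct chunk *bk;
} chunk;

/* Remove p from a circular doubly linked list by splicing its
   neighbours together, as glibc's unlink macro does for fd/bk. */
static void unlink_chunk(chunk *p) {
    chunk *fd = p->fd, *bk = p->bk;
    bk->fd = fd;
    fd->bk = bk;
}
```

glibc additionally verifies `fd->bk == p && bk->fd == p` before splicing; corrupting these pointers is the basis of the classic unlink attack.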

5.2.3 Unsorted Bin

  • Can be viewed as a cache in front of the small and large bins; there is only one such bin, managed as an unsorted doubly linked list

  • Every reclaimed chunk goes into this bin first. If no suitable chunk is found there, all of its chunks are moved into the bins they belong to, and allocation then proceeds from the small/large bins. bin[1] of the bins array stores the unsorted bin's list head

  • The first macro makes bin[1] the unsorted bin's list head. The top chunk is initialized to it as well, i.e. temporarily treated as an unsorted chunk; this is only an initial value, and the chunk cannot actually be used by top to serve allocations. The top chunk belongs to no bin, but some ptmalloc checks require top to belong to one

5.2.4 Fast Bins

  • These bins exist to speed up allocation

  • On platforms where SIZE_SZ is 4 or 8 bytes, requests below 64 or 128 bytes first check these bins for a suitable chunk (exact match); if one exists, it is returned directly

  • They can be viewed as a cache of the first 7 small bins. With SIZE_SZ = 4/8B, the chunk sizes per bin are 16/32, 24/48, ...

  • The table below shows, on a 32-bit platform, how requested sizes, chunk sizes, and fast bins correspond

    Fast bins behave as LIFO stacks (last in, first out)

  • A fast bin's address is obtained from its index

  • fastbin_index(sz) yields a fast bin's index within the fast bins array

    • with SIZE_SZ = 4/8, divide sz by 8/16 and subtract 2 (bin[0] and bin[1] do not exist)

  • Depending on SIZE_SZ, MAX_FAST_SIZE is 80B or 160B
  • NFASTBINS is 10 (the size of the fast bins array)
  • FASTBIN_CONSOLIDATION_THRESHOLD is a check: when a freed chunk, merged with its neighbours, exceeds 64KB, fragmentation is considered high and all chunks in the fast bins are consolidated to reduce its impact on the system

  • DEFAULT_MXFAST defines the default maximum chunk size in the fast bins: 64B/128B for SIZE_SZ = 4/8B
  • set_max_fast(s) sets the global variable global_max_fast to DEFAULT_MXFAST
  • get_max_fast() reads that global variable
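The index rule can be sketched directly (a standalone model; 64-bit case, SIZE_SZ = 8):

```c
#include <assert.h>
#include <stddef.h>

#define SIZE_SZ 8UL

/* Divide the chunk size by 2*SIZE_SZ and subtract 2, because the two
   smallest slots do not exist. glibc does the division with a shift:
   (sz >> 4) - 2 on 64-bit, (sz >> 3) - 2 on 32-bit. */
static unsigned fastbin_index(size_t sz) {
    return (unsigned)(sz / (2 * SIZE_SZ)) - 2;
}
```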

5.3 Core Structures

ptmalloc manages each arena with malloc_state and its tunable parameters with struct malloc_par.

5.3.1 malloc_state

struct malloc_state is defined as follows:

struct malloc_state {
/* Serialize access. */
mutex_t mutex;
/* Flags (formerly in max_fast). */
int flags;
#if THREAD_STATS
/* Statistics for locking. Only used if THREAD_STATS is defined. */
long stat_lock_direct, stat_lock_loop, stat_lock_wait;
#endif
/* Fastbins */
mfastbinptr fastbinsY[NFASTBINS];
/* Base of the topmost chunk -- not otherwise kept in a bin */
mchunkptr top;
/* The remainder from the most recent split of a small request */
mchunkptr last_remainder;
/* Normal bins packed as described above */
mchunkptr bins[NBINS * 2 - 2];
/* Bitmap of bins */
unsigned int binmap[BINMAPSIZE];
/* Linked list */
struct malloc_state *next;
#ifdef PER_THREAD
/* Linked list for free arenas. */
struct malloc_state *next_free;
#endif
/* Memory allocated from the system in this arena. */
INTERNAL_SIZE_T system_mem;
INTERNAL_SIZE_T max_system_mem;
};
  • mutex serializes access to the arena: when multiple threads touch the same arena, the first to take the mutex may use it, releasing the mutex for other threads once its allocation finishes
  • flags records the arena's flags: bit 0 records whether the arena has at least one fast-bin chunk, bit 1 whether the arena can return contiguous virtual address space
/*
FASTCHUNKS_BIT held in max_fast indicates that there are probably
some fastbin chunks. It is set true on entering a chunk into any
fastbin, and cleared only in malloc_consolidate.
The truth value is inverted so that have_fastchunks will be true
upon startup (since statics are zero-filled), simplifying
initialization checks.
*/
#define FASTCHUNKS_BIT (1U)
#define have_fastchunks(M) (((M)->flags & FASTCHUNKS_BIT) == 0)
#ifdef ATOMIC_FASTBINS
#define clear_fastchunks(M) catomic_or (&(M)->flags, FASTCHUNKS_BIT)
#define set_fastchunks(M) catomic_and (&(M)->flags, ~FASTCHUNKS_BIT)
#else
#define clear_fastchunks(M) ((M)->flags |= FASTCHUNKS_BIT)
#define set_fastchunks(M) ((M)->flags &= ~FASTCHUNKS_BIT)
#endif
  • The macros above set or clear the fast-chunk flag, bit 0 of flags; 0 means the arena has fast chunks, 1 means it has none

  • In a freshly initialized malloc_state, flags is 0, claiming the arena has fast chunks even though it actually has none; any attempt to allocate from the fast bins then returns NULL

  • The first time malloc_consolidate() merges the fast bins' chunks, if max_fast > 0 it calls clear_fastchunks, marking the arena as having no fast chunks (malloc_consolidate() merges every chunk in the fast bins)

  • The clear_fastchunks macro is called only inside malloc_consolidate()

  • Whenever a fast chunk enters the fast bins, the set_fastchunks macro records that the arena's fast bins contain fast chunks

/*
NONCONTIGUOUS_BIT indicates that MORECORE does not return contiguous
regions. Otherwise, contiguity is exploited in merging together,
when possible, results from consecutive MORECORE calls.
The initial value comes from MORECORE_CONTIGUOUS, but is
changed dynamically if mmap is ever used as an sbrk substitute.
*/
#define NONCONTIGUOUS_BIT (2U)
#define contiguous(M) (((M)->flags & NONCONTIGUOUS_BIT) == 0)
#define noncontiguous(M) (((M)->flags & NONCONTIGUOUS_BIT) != 0)
#define set_noncontiguous(M) ((M)->flags |= NONCONTIGUOUS_BIT)
#define set_contiguous(M) ((M)->flags &= ~NONCONTIGUOUS_BIT)
  • Bit 1 of flags being 0 means MORECORE returns contiguous virtual address space; 1 means noncontiguous
  • For the main arena, MORECORE is effectively sbrk(), which is contiguous
  • Non-main arenas allocate large blocks of virtual memory via mmap() and carve them up to emulate the main arena; since mmap regions are usually noncontiguous, non-main arenas default to noncontiguous virtual address space
  • malloc_state declares several lock-statistics variables; THREAD_STATS (a compile-time option under which ptmalloc collects per-thread allocation statistics such as bytes allocated and allocation counts) is undefined by default, so lock usage is not tracked
  • fastbinsY is an array of 10 (NFASTBINS) elements holding the fast chunks' list-head pointers, so the fast bins comprise at most 10 singly linked lists of fast chunks
  • top is a chunk pointer to the arena's top chunk
  • last_remainder is a chunk pointer: when the arena last served a small request by splitting a chunk, the leftover became a chunk pointed to by last_remainder
  • bins stores the list heads of the unsorted bin, small bins, and large bins: 62 small bins plus 63 large bins gives 125, bin[1] heads the unsorted bin, and bin[0] and bin[127] do not exist, for 126 bins in all. The bins array holds 254 (NBINS*2 - 2) mchunkptr pointers: since size, prev_size, fd_nextsize, and bk_nextsize are meaningless for a list head, on a platform with SIZE_SZ = 8 only 126*2*8 = 2016 bytes are needed, while the array occupies (128*2-2)*8 = 2032 bytes, so the last 16 bytes are wasted
  • binmap is an int array in which ptmalloc uses one bit per bin to record whether that bin holds any free chunk
/*
Binmap
To help compensate for the large number of bins, a one-level index
structure is used for bin-by-bin searching. `binmap' is a
bitvector recording whether bins are definitely empty so they can
be skipped over during during traversals. The bits are NOT always
cleared as soon as bins are empty, but instead only
when they are noticed to be empty during traversal in malloc.
*/
/* Conservatively use 32 bits per map word, even if on 64bit system */
#define BINMAPSHIFT 5
#define BITSPERMAP (1U << BINMAPSHIFT)
#define BINMAPSIZE (NBINS / BITSPERMAP)
#define idx2block(i) ((i) >> BINMAPSHIFT)
#define idx2bit(i) ((1U << ((i) & ((1U << BINMAPSHIFT)-1))))
#define mark_bin(m,i) ((m)->binmap[idx2block(i)] |= idx2bit(i))
#define unmark_bin(m,i) ((m)->binmap[idx2block(i)] &= ~(idx2bit(i)))
#define get_binmap(m,i) ((m)->binmap[idx2block(i)] & idx2bit(i))
  • binmap has 128 bits in total, 16 bytes, the size of 4 ints; it is split by int into 4 blocks of 32 bits each. Given a bin index, the macro idx2block computes which block the bin's bit lives in
  • idx2bit builds a mask whose bit i is 1 and all other bits 0, e.g. idx2bit(3): 0000 1000 (low bit to high bit, counting from bit 0)
  • mark_bin sets bin i's bit in binmap to 1, unmark_bin clears it to 0, and get_binmap fetches bin i's bit
  • next links the arenas into a singly linked list
  • next_free links free arenas into a singly linked list; the field exists only when PER_THREAD is defined
  • system_mem records how much memory the arena currently has allocated from the system
  • max_system_mem records the peak amount of memory the arena has had allocated from the system
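The binmap macros can be exercised standalone (same constants as the snippet above, with NBINS = 128 and a free-standing bitmap standing in for malloc_state.binmap):

```c
#include <assert.h>

#define NBINS 128
#define BINMAPSHIFT 5
#define BITSPERMAP (1U << BINMAPSHIFT)
#define BINMAPSIZE (NBINS / BITSPERMAP)
#define idx2block(i) ((i) >> BINMAPSHIFT)
#define idx2bit(i) ((1U << ((i) & ((1U << BINMAPSHIFT) - 1))))

/* Standalone bitmap, standing in for malloc_state.binmap. */
static unsigned int binmap[BINMAPSIZE];

static void mark_bin(int i)       { binmap[idx2block(i)] |=  idx2bit(i); }
static void unmark_bin(int i)     { binmap[idx2block(i)] &= ~idx2bit(i); }
static unsigned get_binmap(int i) { return binmap[idx2block(i)] & idx2bit(i); }
```

Bin 70, for example, lands in block 2 (70 >> 5) at bit 6 (70 & 31).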

5.3.2 malloc_par

malloc_par is defined as:

struct malloc_par {
/* Tunable parameters */
unsigned long trim_threshold;
INTERNAL_SIZE_T top_pad;
INTERNAL_SIZE_T mmap_threshold;
#ifdef PER_THREAD
INTERNAL_SIZE_T arena_test;
INTERNAL_SIZE_T arena_max;
#endif
/* Memory map support */
int n_mmaps;
int n_mmaps_max;
int max_n_mmaps;
/* the mmap_threshold is dynamic, until the user sets
it manually, at which point we need to disable any
dynamic behavior. */
int no_dyn_threshold;
/* Cache malloc_getpagesize */
unsigned int pagesize;
/* Statistics */
INTERNAL_SIZE_T mmapped_mem;
INTERNAL_SIZE_T max_mmapped_mem;
INTERNAL_SIZE_T max_total_mem; /* only kept for NO_THREADS */
/* First address handed out by MORECORE/sbrk. */
char* sbrk_base;
};
  • trim_threshold is the shrink threshold, 128KB by default; when an arena's top chunk exceeds it, the excess is freed back to the system
  • Because the mmap allocation threshold adjusts dynamically, free may raise the shrink threshold to twice the mmap threshold
    • on 64-bit, the mmap threshold can reach 32MB, so the shrink threshold can reach 64MB
    • on 32-bit, the mmap threshold can reach 512KB, so the shrink threshold can reach 1MB
  • The shrink threshold can be set with mallopt()
  • top_pad is extra padding added on allocation, 0 by default. (In glibc's ptmalloc, the top_pad field controls how much extra memory the allocator requests when initializing or extending the heap; the default of 0 means no extra padding is added)
  • mmap_threshold is the mmap allocation threshold, 128KB by default; at most 512KB on 32-bit and 32MB on 64-bit. Since dynamic adjustment of the threshold is on by default, the field changes at runtime but never exceeds the maximum
  • arena_test and arena_max serve the PER_THREAD optimization; arena_test defaults to 2 on 32-bit and 8 on 64-bit
    • once the number of arenas reaches arena_test, the system computes the maximum arena count from the current configuration and fixes it at that value
    • arena_max defaults to 0, meaning the maximum arena count is decided via arena_test; once the number of arenas reaches arena_max, no new arenas are created and existing ones are reused instead
    • both values can be tuned with mallopt()
  • n_mmaps is the number of blocks the process currently has allocated with mmap()
  • n_mmaps_max is the maximum number of blocks the process may allocate with mmap(), 65536 by default, adjustable with mallopt()
  • max_n_mmaps is the peak number of blocks the process has allocated with mmap; it never exceeds n_mmaps_max, whose value the user adjusts to influence it indirectly. The field exists because mstats() needs it for its statistics output
  • no_dyn_threshold says whether dynamic adjustment of the mmap threshold is enabled; the default 0 means enabled
  • pagesize is the system page size, 4KB by default
  • mmapped_mem and max_mmapped_mem both track the amount of memory allocated by mmap and are usually equal; max_mmapped_mem is used by mstats()
    • max_total_mem tracks the process's total allocated memory in the single-threaded case
  • sbrk_base is the start address of the heap
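Several of these tunables can be changed at runtime through mallopt(); a small glibc-specific example (M_TRIM_THRESHOLD and M_MMAP_THRESHOLD are standard mallopt options; glibc returns 1 on success):

```c
#include <malloc.h>

/* Raise both thresholds to 256KB. This only tweaks mp_'s tunables;
   it allocates nothing by itself. */
int tune_allocator(void) {
    int ok = 1;
    ok &= mallopt(M_TRIM_THRESHOLD, 256 * 1024);
    ok &= mallopt(M_MMAP_THRESHOLD, 256 * 1024);
    return ok; /* 1 on success with glibc */
}
```

Note that setting M_MMAP_THRESHOLD manually disables the dynamic threshold adjustment (no_dyn_threshold is set).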

5.3.3 Arena Initialization

/* There are several instances of this struct ("arenas") in this
malloc. If you are adapting this malloc in a way that does NOT use
a static or mmapped malloc_state, you MUST explicitly zero-fill it
before using. This malloc relies on the property that malloc_state
is initialized to all zeroes (as is true of C statics). */
static struct malloc_state main_arena;
/* There is only one instance of the malloc parameters. */
static struct malloc_par mp_;
/* Maximum size of memory handled in fastbins. */
static INTERNAL_SIZE_T global_max_fast;
  • main_arena: the main arena; every process has exactly one
  • mp_: the single global malloc_par instance, used for parameter management and statistics, e.g.
    • mp_ holds the allocator's tunable parameters, such as mmap_threshold (the threshold for allocating with mmap) and trim_threshold (the threshold for shrinking the heap)
    • it also records statistics, such as current memory usage and the number of allocated blocks
  • the global variable global_max_fast is the maximum chunk size in the fast bins

The main_arena initialization function:

/*
Initialize a malloc_state struct.
This is called only from within malloc_consolidate, which needs
be called in the same contexts anyway. It is never called directly
outside of malloc_consolidate because some optimizing compilers try
to inline it at all call points, which turns out not to be an
optimization at all. (Inlining it in malloc_consolidate is fine though.)
*/
#if __STD_C
static void malloc_init_state(mstate av)
#else
static void malloc_init_state(av) mstate av;
#endif
{
int i;
mbinptr bin;
/* Establish circular links for normal bins */
for (i = 1; i < NBINS; ++i) {
bin = bin_at(av,i);
bin->fd = bin->bk = bin;
}
// mark non-main arenas as noncontiguous
#if MORECORE_CONTIGUOUS
if (av != &main_arena)
#endif
set_noncontiguous(av);
// initialize the main arena
if (av == &main_arena)
set_max_fast(DEFAULT_MXFAST);
// set the flag bit
av->flags |= FASTCHUNKS_BIT;
// initialize the top chunk
av->top = initial_top(av);
}
  • The arena initialization function assumes the arena instance av is a global static variable or has had all fields zeroed
  • Initialization walks all bins and empties each bin's free list by pointing its fd and bk back at the bin itself
  • Since av defaults to 0, the arena defaults to contiguous virtual address space, but only the main arena can actually provide it, so non-main arenas must be marked as allocating noncontiguous virtual address space
  • When initializing the main arena, global_max_fast must be set; because the main arena is initialized first, this guarantees global_max_fast is initialized only once. Hence a nonzero global_max_fast means the main arena has been initialized
  • Finally the top chunk is initialized

ptmalloc parameter initialization:

/* Set up basic state so that _int_malloc et al can work. */
static void
ptmalloc_init_minimal (void)
{
#if DEFAULT_TOP_PAD != 0
mp_.top_pad = DEFAULT_TOP_PAD;
#endif
// mmap parameters
mp_.n_mmaps_max = DEFAULT_MMAP_MAX;
mp_.mmap_threshold = DEFAULT_MMAP_THRESHOLD;
// memory shrink threshold
mp_.trim_threshold = DEFAULT_TRIM_THRESHOLD;
// page size
mp_.pagesize = malloc_getpagesize;
// number of per-thread arenas
#ifdef PER_THREAD
# define NARENAS_FROM_NCORES(n) ((n) * (sizeof(long) == 4 ? 2 : 8))
mp_.arena_test = NARENAS_FROM_NCORES (1);
narenas = 1;
#endif
}
  • Mainly initializes the fields of the global mp_ to their defaults
  • If the compile option PER_THREAD is defined, the arena count is derived from the number of CPU cores:
    • 2 arenas per core on 32-bit
    • 8 arenas per core on 64-bit

glibc-2.23

__libc_malloc

__libc_malloc (size_t bytes)
{
mstate ar_ptr; // the arena state
void *victim; // the allocated block, eventually returned to the caller

void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook); // __malloc_hook is a global hook function pointer letting users inject their own logic into malloc
// atomic_forced_read: makes the read of __malloc_hook atomic, avoiding races between threads
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0)); // RETURN_ADDRESS(0) fetches this function's return address, for debugging and tracing

arena_get (ar_ptr, bytes); // grab an arena

victim = _int_malloc (ar_ptr, bytes);
/* Retry with another arena only if we were able to find a usable arena
before. */
// if the first attempt failed and the arena is non-NULL, retry from another arena
if (!victim && ar_ptr != NULL)
{
LIBC_PROBE (memory_malloc_retry, 1, bytes);
ar_ptr = arena_get_retry (ar_ptr, bytes); // try another usable arena
victim = _int_malloc (ar_ptr, bytes); // call the allocation function again
}

// if the arena is non-NULL, release its mutex
if (ar_ptr != NULL)
(void) mutex_unlock (&ar_ptr->mutex);

assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
ar_ptr == arena_for_chunk (mem2chunk (victim)));
return victim; // return the pointer to the allocated block
}
  • Hook function pointers:
    • Hook: a hook is an interception mechanism that lets a program insert its own code at certain key points of a system or application
    • Hook function pointer: a pointer to a user-defined callback; when the corresponding event fires, the system calls the function it points to, running the user's logic
  • Atomic operations:
    • An atomic operation is indivisible: it cannot be interrupted partway, it either happens entirely or not at all, and no intermediate state is visible to other operations. Atomics are used mainly in multithreaded or concurrent code to avoid race conditions and inconsistent data

Regarding:

  void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook); // __malloc_hook is a global hook function pointer letting users inject their own logic into malloc
// atomic_forced_read: makes the read of __malloc_hook atomic, avoiding races between threads
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0)); // RETURN_ADDRESS(0) fetches this function's return address, for debugging and tracing

hook is initialized from __malloc_hook.

__malloc_hook:

void *weak_variable (*__malloc_hook)
(size_t __size, const void *) = malloc_hook_ini;

This initializes __malloc_hook, pointing it at a function named malloc_hook_ini.

malloc_hook_ini:

static void* malloc_hook_ini(size_t sz, const void* caller) {
__malloc_hook = NULL; // disable the current hook to avoid recursion; this guarantees that across repeated __libc_malloc calls the hook runs only the first time
ptmalloc_init(); // initialize the allocator
return __libc_malloc(sz); // call the original malloc path
}

This in turn calls ptmalloc_init(), an initialization function:

void ptmalloc_init (void) {
if (__malloc_initialized >= 0) // __malloc_initialized is a global recording whether the allocator is initialized; >= 0 means it already is, so return; < 0 means not yet, so continue
return;
__malloc_initialized = 0; // mark the allocator as initializing, so other threads do not initialize it again

thread_arena = &main_arena; // point the thread-local arena at the main arena

__malloc_initialized = 1; // mark initialization as complete
}

Summary:

The first call has to perform all of the initialization above; subsequent calls skip it and go straight to _int_malloc.

As the code shows, right after arena_get (ar_ptr, bytes); control reaches _int_malloc, the core function of the ptmalloc allocator that performs the actual allocation logic. So next comes the _int_malloc source.

_int_malloc

static void *
_int_malloc (mstate av, size_t bytes)
{
INTERNAL_SIZE_T nb; /* normalized request size */
// nb is the adjusted request size
unsigned int idx; /* associated bin index */
// idx is the bin index for the request size
mbinptr bin; /* associated bin */
// bin points at the bin for the current request size

mchunkptr victim; /* inspected/selected chunk */
// victim: pointer to the chunk currently being inspected or selected
INTERNAL_SIZE_T size; /* its size */
// size of the current chunk
int victim_index; /* its bin index */
// the bin index the current chunk belongs to
mchunkptr remainder; /* remainder from a split */
// pointer to the leftover part after a split
unsigned long remainder_size; /* its size */
// size of the leftover part

unsigned int block; /* bit map traverser */
// block walks the bitmap word by word
unsigned int bit; /* bit map traverser */
// bit walks the bits within a bitmap word
unsigned int map; /* current word of binmap */
// map is the binmap word currently being examined

mchunkptr fwd; /* misc temp for linking */
// temporary pointer used while linking
mchunkptr bck; /* misc temp for linking */
// another temporary pointer used while linking

const char *errstr = NULL;
Then:

/*
Convert request size to internal form by adding SIZE_SZ bytes
overhead plus possibly more to obtain necessary alignment and/or
to obtain a size of at least MINSIZE, the smallest allocatable
size. Also, checked_request2size traps (returning 0) request sizes
that are so large that they wrap around zero when padded and
aligned.
*/

checked_request2size (bytes, nb); // convert the requested size into the actual internal size

Next:

  /* There are no usable arenas.  Fall back to sysmalloc to get a chunk from
mmap. */
if (__glibc_unlikely (av == NULL)) // check whether a usable arena exists
{
// no arena available
void *p = sysmalloc (nb, av);
if (p != NULL)
alloc_perturb (p, bytes);
return p;
}

Next comes the fast bins path.

fast bins

/*
If the size qualifies as a fastbin, first check corresponding bin.
This code is safe to execute even if av is not yet initialized, so we
can try it without checking, which saves some time on this fast path.
*/

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ())) // does the request qualify for the fast bins?
{
idx = fastbin_index (nb); // compute the fast-bin index from the request size
mfastbinptr *fb = &fastbin (av, idx); // get the fastbin pointer
mchunkptr pp = *fb;
do // try to take a chunk off the fastbin
{
victim = pp;
if (victim == NULL)
break;
}
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim))
!= victim); // atomic CAS: if the current value equals the expected value (victim), store the new value (victim->fd) and return the old one; otherwise return the current value
if (victim != 0) // did we get a valid chunk?
{
if (__builtin_expect (fastbin_index (chunksize (victim)) != idx, 0)) // is the fastbin chunk's size legal?
{
errstr = "malloc(): memory corruption (fast)";
errout:
malloc_printerr (check_action, errstr, chunk2mem (victim), av);
return NULL;
}
check_remalloced_chunk (av, victim, nb); // debug check of the re-allocated chunk; see its definition below
void *p = chunk2mem (victim); // convert the chunk into a user-visible memory address
alloc_perturb (p, bytes);
return p;
}
}

The definition of check_remalloced_chunk:

#if !MALLOC_DEBUG

# define check_chunk(A, P)
# define check_free_chunk(A, P)
# define check_inuse_chunk(A, P)
# define check_remalloced_chunk(A, P, N)
# define check_malloced_chunk(A, P, N)
# define check_malloc_state(A)

#else

# define check_chunk(A, P) do_check_chunk (A, P) // check that chunk P's state is as expected
# define check_free_chunk(A, P) do_check_free_chunk (A, P) // check the state of a freed chunk P
# define check_inuse_chunk(A, P) do_check_inuse_chunk (A, P) // check the state of an in-use chunk P
# define check_remalloced_chunk(A, P, N) do_check_remalloced_chunk (A, P, N) // check the state of a re-allocated chunk P
# define check_malloced_chunk(A, P, N) do_check_malloced_chunk (A, P, N) // check the state of a newly allocated chunk P
# define check_malloc_state(A) do_check_malloc_state (A) // check that arena A's state is consistent
  • MALLOC_DEBUG is a macro controlling whether debugging is enabled
    • undefined: all the debug-check macros are defined empty and do nothing
    • defined: the macros call the actual checking functions

smallbin

/*
If a small request, check regular bin. Since these "smallbins"
hold one size each, no searching within bins is necessary.
(For a large request, we need to wait until unsorted chunks are
processed to find best fit. But for small ones, fits are exact
anyway, so we can check now, which is faster.)
*/

if (in_smallbin_range (nb)) // does the request fall in smallbin range?
{
idx = smallbin_index (nb); // compute the smallbin index
bin = bin_at (av, idx); // get the smallbin pointer

if ((victim = last (bin)) != bin) // try to allocate from the smallbin
{
if (victim == 0) /* initialization check */
malloc_consolidate (av); // the arena is not initialized yet; this initializes it
else
{
bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim)) // check the doubly linked list's integrity
{
errstr = "malloc(): smallbin double linked list corrupted";
goto errout;
}
set_inuse_bit_at_offset (victim, nb); // mark the chunk as in use
// the next two lines update the doubly linked list pointers
bin->bk = bck;
bck->fd = bin;

if (av != &main_arena) // not in the main arena: mark the chunk as non-main-arena
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk (av, victim, nb); // debug check
void *p = chunk2mem (victim); // convert to a user address
alloc_perturb (p, bytes);
return p;
}
}
}

largebin

/*
If this is a large request, consolidate fastbins before continuing.
While it might look excessive to kill all fastbins before
even seeing if there is space available, this avoids
fragmentation problems normally associated with fastbins.
Also, in practice, programs tend to have runs of either small or
large requests, but less often mixtures, so consolidation is not
invoked all that often in most programs. And the programs that
it is called frequently in otherwise tend to fragment.
*/

else
{
idx = largebin_index (nb); // compute the largebin index
if (have_fastchunks (av)) // if the arena has fastbin chunks, consolidate them
malloc_consolidate (av);
}

The main loop

for (;; )
{
int iters = 0; // iteration counter, to prevent an endless loop

unsorted bin

 while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av)) // take chunks off the unsorted bin; unsorted_chunks(av) is the list head, victim the chunk currently being tried
{
bck = victim->bk;
if (__builtin_expect (victim->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (victim->size > av->system_mem, 0)) // sanity check: a size at most 2*SIZE_SZ or above the arena's total memory is invalid
malloc_printerr (check_action, "malloc(): memory corruption",
chunk2mem (victim), av);
size = chunksize (victim); // the chunk's real size

/*
If a small request, try to use last remainder if it is the
only chunk in unsorted bin. This helps promote locality for
runs of consecutive small requests. This is the only
exception to best-fit, and applies only when there is
no exact fit for a small chunk.
*/

if (in_smallbin_range (nb) &&
bck == unsorted_chunks (av) &&
victim == av->last_remainder &&
(unsigned long) (size) > (unsigned long) (nb + MINSIZE)) // a small request, the unsorted bin holds only one chunk (last_remainder), and that chunk is larger than the request plus MINSIZE
{
/* split and reattach remainder */
// split the chunk
remainder_size = size - nb;
remainder = chunk_at_offset (victim, nb);
// update the unsorted bin
unsorted_chunks (av)->bk = unsorted_chunks (av)->fd = remainder;
av->last_remainder = remainder;
remainder->bk = remainder->fd = unsorted_chunks (av);
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}

// set the chunk's size and status bits
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);

check_malloced_chunk (av, victim, nb);
// return the allocated memory address
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* remove from unsorted list */
// the chunk did not satisfy the shortcut; take it off the unsorted bin
unsorted_chunks (av)->bk = bck;
bck->fd = unsorted_chunks (av);

/* Take now instead of binning if exact fit */

// handle an exact fit
if (size == nb)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}

/* place chunk in bin */ // insert the chunk into its proper bin

if (in_smallbin_range (size)) // smallbin range
{
victim_index = smallbin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;
}
else // largebin range: insert into the large bins
{
victim_index = largebin_index (size);
bck = bin_at (av, victim_index);
fwd = bck->fd;

/* maintain large bins in sorted order */
if (fwd != bck)
{
/* Or with inuse bit to speed comparisons */
size |= PREV_INUSE;
/* if smaller than smallest, bypass loop below */
assert ((bck->bk->size & NON_MAIN_ARENA) == 0);
if ((unsigned long) (size) < (unsigned long) (bck->bk->size))
{
fwd = bck;
bck = bck->bk;

victim->fd_nextsize = fwd->fd;
victim->bk_nextsize = fwd->fd->bk_nextsize;
fwd->fd->bk_nextsize = victim->bk_nextsize->fd_nextsize = victim;
}
else
{
assert ((fwd->size & NON_MAIN_ARENA) == 0);
while ((unsigned long) size < fwd->size)
{
fwd = fwd->fd_nextsize;
assert ((fwd->size & NON_MAIN_ARENA) == 0);
}

if ((unsigned long) size == (unsigned long) fwd->size)
/* Always insert in the second position. */
fwd = fwd->fd;
else
{
victim->fd_nextsize = fwd;
victim->bk_nextsize = fwd->bk_nextsize;
fwd->bk_nextsize = victim;
victim->bk_nextsize->fd_nextsize = victim;
}
bck = fwd->bk;
}
}
else
victim->fd_nextsize = victim->bk_nextsize = victim;
}

mark_bin (av, victim_index); // mark the bin as non-empty in the binmap
// update the doubly linked list
victim->bk = bck;
victim->fd = fwd;
fwd->bk = victim;
bck->fd = victim;

#define MAX_ITERS 10000
if (++iters >= MAX_ITERS) // cap the number of iterations
break;
}

Searching the large bins

/*
If a large request, scan through the chunks of current bin in
sorted order to find smallest that fits. Use the skip list for this.
*/

if (!in_smallbin_range (nb))
{
bin = bin_at (av, idx); // get the list head

/* skip scan if empty or largest chunk is too small */ // skip if the list is empty or its first (largest) chunk is smaller than the request
if ((victim = first (bin)) != bin &&
(unsigned long) (victim->size) >= (unsigned long) (nb))
{
// find a chunk of suitable size
victim = victim->bk_nextsize;
while (((unsigned long) (size = chunksize (victim)) <
(unsigned long) (nb)))
victim = victim->bk_nextsize;

/* Avoid removing the first entry for a size so that the skip
list does not have to be rerouted. */
// if the chunk found is not the last one and the next chunk has the same size, skip to that one
if (victim != last (bin) && victim->size == victim->fd->size)
victim = victim->fd;

remainder_size = size - nb; // compute the leftover size
unlink (av, victim, bck, fwd); // remove the chunk we found

/* Exhaust */
// handling the leftover
if (remainder_size < MINSIZE)
{
set_inuse_bit_at_offset (victim, size);
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
} // if the leftover is < MINSIZE, mark the whole chunk as in use
/* Split */
else
{
remainder = chunk_at_offset (victim, nb); // split
/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
bck = unsorted_chunks (av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))
{
errstr = "malloc(): corrupted unsorted chunks";
goto errout;
}
remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder; // the lines above insert the leftover into the unsorted bin
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size); // set the size and status of both pieces
}
// return the memory address
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

Logic:

  1. Get the head pointer of the largebin.
  2. Search the sorted list for a chunk of a suitable size.
  3. Unlink the chunk that was found.
  4. If the remainder is large enough, split the chunk and insert the remainder into the unsorted bin.
  5. Return the memory address.

binmap

 /*
Search for a chunk by scanning bins, starting with next largest
bin. This search is strictly by best-fit; i.e., the smallest
(with ties going to approximately the least recently used) chunk
that fits is selected.

The bitmap avoids needing to check that most blocks are nonempty.
The particular case of skipping all bins during warm-up phases
when no chunks have been returned yet is faster than it might look.
*/
/*
* 通过扫描 bins 来搜索合适的 chunk,从下一个较大的 bin 开始。
* 这种搜索严格遵循最佳拟合策略;即选择最小的(如果有多个大小相同的 chunk,则选择最近最少使用的)chunk。
*
* 位图(binmap)避免了检查大多数块是否为空的需要。
* 在没有返回任何 chunk 的热启动阶段,跳过所有 bins 的情况比看起来要快。
*/

++idx;//从下一个bin开始搜索
bin = bin_at (av, idx);//获取当前bin的指针
block = idx2block (idx);//计算当前bin所在块索引
map = av->binmap[block];//获取当前块的binmap
bit = idx2bit (idx);//获取当前bin对应的位

for (;; )
{
/* Skip rest of block if there are no more set bits in this block. */
if (bit > map || bit == 0)
{
do//若超出bins的范围就跳到使用top chunk
{
if (++block >= BINMAPSIZE) /* out of bins */
goto use_top;
}
while ((map = av->binmap[block]) == 0);

bin = bin_at (av, (block << BINMAPSHIFT));
bit = 1;
}

/* Advance to bin with set bit. There must be one. */
while ((bit & map) == 0)//找到第一个非空bin
{
bin = next_bin (bin);
bit <<= 1;
assert (bit != 0);
}

/* Inspect the bin. It is likely to be non-empty */
victim = last (bin);//获取bin最后一个chunk

/* If a false alarm (empty bin), clear the bit. */
if (victim == bin)//若bin为空
{
av->binmap[block] = map &= ~bit; /* Write through *///清除对应的位
bin = next_bin (bin);
bit <<= 1;
}

else//若不为空
{
size = chunksize (victim);//获取大小

/* We know the first chunk in this bin is big enough to use. */
assert ((unsigned long) (size) >= (unsigned long) (nb));

remainder_size = size - nb;//剩余部分

/* unlink */
unlink (av, victim, bck, fwd);

/* Exhaust */
if (remainder_size < MINSIZE)//剩余<MINSIZE
{
set_inuse_bit_at_offset (victim, size);//标记整个chunk为已使用
if (av != &main_arena)
victim->size |= NON_MAIN_ARENA;
}

/* Split */
else//分割
{
remainder = chunk_at_offset (victim, nb);

/* We cannot assume the unsorted list is empty and therefore
have to perform a complete insert here. */
//将剩余部分插入unsorted bin
bck = unsorted_chunks (av);
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))
{
errstr = "malloc(): corrupted unsorted chunks 2";
goto errout;
}
remainder->bk = bck;
remainder->fd = fwd;
bck->fd = remainder;
fwd->bk = remainder;

/* advertise as last remainder */
if (in_smallbin_range (nb))
av->last_remainder = remainder;
if (!in_smallbin_range (remainder_size))
{
remainder->fd_nextsize = NULL;
remainder->bk_nextsize = NULL;
}
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);
set_foot (remainder, remainder_size);
}//设置大小状态
check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;//返回地址
}
}

Summary:

  1. Start searching from the next larger bin.
  2. Use the bitmap to skip empty bins, making the search more efficient.
  3. Find a suitable chunk in a non-empty bin.
  4. If the chunk's size fits exactly, allocate it directly.
  5. If the chunk is too large, split it and insert the remainder into the unsorted bin.
  6. Return the allocated memory address.

top chunk

use_top:
/*
If large enough, split off the chunk bordering the end of memory
(held in av->top). Note that this is in accord with the best-fit
search rule. In effect, av->top is treated as larger (and thus
less well fitting) than any other available chunk since it can
be extended to be as large as necessary (up to system
limitations).

We require that av->top always exists (i.e., has size >=
MINSIZE) after initialization, so if it would otherwise be
exhausted by current request, it is replenished. (The main
reason for ensuring it exists is that we may need MINSIZE space
to put in fenceposts in sysmalloc.)
*/

victim = av->top;//获取top chunk
size = chunksize (victim);//获取top的大小

if ((unsigned long) (size) >= (unsigned long) (nb + MINSIZE))//top chunk足够大
{
remainder_size = size - nb;//计算剩余
remainder = chunk_at_offset (victim, nb);//分割
av->top = remainder;
set_head (victim, nb | PREV_INUSE |
(av != &main_arena ? NON_MAIN_ARENA : 0));
set_head (remainder, remainder_size | PREV_INUSE);//设置剩余部分头部信息

check_malloced_chunk (av, victim, nb);
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;//返回分配的内存地址
}

/* When we are using atomic ops to free fast chunks we can get
here for all block sizes. */
else if (have_fastchunks (av))//合并fastbins后再次尝试分配
{
malloc_consolidate (av);//合并fastbins
/* restore original bin index */
if (in_smallbin_range (nb))
idx = smallbin_index (nb);
else
idx = largebin_index (nb);
}

/*
Otherwise, relay to handle system-dependent cases
*/
else//上述都不满足,则调用系统级内存分配函数
{
void *p = sysmalloc (nb, av);//调用系统级内存分配函数
if (p != NULL)
alloc_perturb (p, bytes);
return p;
}
}
}

Summary:

  1. Allocate memory from the top chunk.
  2. Otherwise consolidate the fastbins and try again.
  3. Otherwise call the system-level allocation routine (sbrk or mmap).

__libc_free

void
__libc_free (void *mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */

//检查是否有用户自定义逻辑
void (*hook) (void *, const void *)
= atomic_forced_read (__free_hook);
if (__builtin_expect (hook != NULL, 0))
{
(*hook)(mem, RETURN_ADDRESS (0));
return;
}

if (mem == 0) /* free(0) has no effect (free(0)无任何效果)*/
return;//如果传入指针是NULL直接返回

p = mem2chunk (mem);//将用户指针转换为chunk指针

if (chunk_is_mmapped (p)) //看是否是mmap分配的内存块 /* release mmapped memory. */
{
/* see if the dynamic brk/mmap threshold needs adjusting */
//动态调整相关
if (!mp_.no_dyn_threshold
&& p->size > mp_.mmap_threshold
&& p->size <= DEFAULT_MMAP_THRESHOLD_MAX)
{
mp_.mmap_threshold = chunksize (p);
mp_.trim_threshold = 2 * mp_.mmap_threshold;
LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
mp_.mmap_threshold, mp_.trim_threshold);
}
munmap_chunk (p);//mmap对应的释放函数
return;
}

ar_ptr = arena_for_chunk (p);
_int_free (ar_ptr, p, 0);//调用释放函数
}

_int_free

Initialization:

static void
_int_free (mstate av, mchunkptr p, int have_lock)
{
INTERNAL_SIZE_T size; /* its size */
mfastbinptr *fb; /* associated fastbin(相关的fastbin) */
mchunkptr nextchunk; /* next contiguous chunk (紧邻的下一个chunk)*/
INTERNAL_SIZE_T nextsize; /* its size */
int nextinuse; /* true if nextchunk is used(若下个chunk正在被使用,就为true) */
INTERNAL_SIZE_T prevsize; /* size of previous contiguous chunk(紧邻的上个chunk大小) */
mchunkptr bck; /* misc temp for linking (用于临时链接的临时变量)*/
mchunkptr fwd; /* misc temp for linking */

const char *errstr = NULL;
int locked = 0;

size = chunksize (p);//获取当前chunk大小

Security checks:

  /* Little security check which won't hurt performance: the
allocator never wrapps around at the end of the address space.
Therefore we can exclude some size values which might appear
here by accident or by "design" from some intruder. */
/* 一个不会影响性能的小型安全检查:分配器永远不会在地址空间的末尾回绕。因此,我们可以排除一些可能偶然出现或被“设计”出来的大小值,这些值可能来自某些入侵者。 */
if (__builtin_expect ((uintptr_t) p > (uintptr_t) -size, 0)
|| __builtin_expect (misaligned_chunk (p), 0))//指针是否合法 || 地址是否符合对齐要求
{
errstr = "free(): invalid pointer";
errout:
if (!have_lock && locked)
(void) mutex_unlock (&av->mutex);
malloc_printerr (check_action, errstr, chunk2mem (p), av);
return;
}
/* We know that each chunk is at least MINSIZE bytes in size or a
multiple of MALLOC_ALIGNMENT. */
if (__glibc_unlikely (size < MINSIZE || !aligned_OK (size)))//内存块大小<MINSIZE || 大小是否符合对齐要求
{
errstr = "free(): invalid size";
goto errout;
}

check_inuse_chunk(av, p);//检查内存块是否正在使用

fastbin

if ((unsigned long)(size) <= (unsigned long)(get_max_fast ())//看是否在fastbin范围内

#if TRIM_FASTBINS
/*
If TRIM_FASTBINS set, don't place chunks
bordering top into fastbins
*/
/*
如果定义了 TRIM_FASTBINS,不要将紧邻 top chunk 的 chunk 放入 fastbins。
*/
&& (chunk_at_offset(p, size) != av->top)
#endif
) {

if (__builtin_expect (chunk_at_offset (p, size)->size <= 2 * SIZE_SZ, 0)//检查下一个chunk大小是不是<=2*SIZE_SZ
|| __builtin_expect (chunksize (chunk_at_offset (p, size))//或大于分配区总内存
>= av->system_mem, 0))
{
/* We might not have a lock at this point and concurrent modifications
of system_mem might have let to a false positive. Redo the test
after getting the lock. */
if (have_lock
|| ({ assert (locked == 0);//获取锁,重新检查
mutex_lock(&av->mutex);
locked = 1;
chunk_at_offset (p, size)->size <= 2 * SIZE_SZ
|| chunksize (chunk_at_offset (p, size)) >= av->system_mem;
}))
{
errstr = "free(): invalid next size (fast)";
goto errout;
}
if (! have_lock)//释放锁
{
(void)mutex_unlock(&av->mutex);
locked = 0;
}
}

free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);

set_fastchunks(av);//设置标志,表示分配区中有fastbin
unsigned int idx = fastbin_index(size);//根据chunk大小计算对应fastbin索引
fb = &fastbin (av, idx);//获取分配区第idx个fastbin的指针

/* Atomically link P to its fastbin: P->FD = *FB; *FB = P; */
//原子操作将chunk插入fastbin
mchunkptr old = *fb, old2;
unsigned int old_idx = ~0u;
do
{
/* Check that the top of the bin is not the record we are going to add
(i.e., double free). */
/* 检查 fastbin 的顶部是否是当前要插入的 chunk(即双重释放)。 */
if (__builtin_expect (old == p, 0))
{
errstr = "double free or corruption (fasttop)";
goto errout;
}
/* Check that size of fastbin chunk at the top is the same as
size of the chunk that we are adding. We can dereference OLD
only if we have the lock, otherwise it might have already been
deallocated. See use of OLD_IDX below for the actual check. */
//检查fastbin顶部chunk大小是否与当前chunk大小一致
if (have_lock && old != NULL)
old_idx = fastbin_index(chunksize(old));
p->fd = old2 = old;
}
while ((old = catomic_compare_and_exchange_val_rel (fb, p, old2)) != old2);

if (have_lock && old != NULL && __builtin_expect (old_idx != idx, 0))//检查fastbin顶部chunk大小
{
errstr = "invalid fastbin entry (free)";
goto errout;
}
}

Further checks

/*
Consolidate other non-mmapped chunks as they arrive.
*/
//检查是否要对非mmap内存块合并
else if (!chunk_is_mmapped(p)) {
if (! have_lock) {
(void)mutex_lock(&av->mutex);//尝试获取分配区的锁
locked = 1;
}

nextchunk = chunk_at_offset(p, size);//获取下个chunk地址

/* Lightweight tests: check whether the block is already the
top block. */
if (__glibc_unlikely (p == av->top))//检查当前内存块是不是分配区的topchunk
{
errstr = "double free or corruption (top)";
goto errout;
}
/* Or whether the next chunk is beyond the boundaries of the arena. */
//检查下个chunk是不是超出分配区的边界
if (__builtin_expect (contiguous (av)
&& (char *) nextchunk
>= ((char *) av->top + chunksize(av->top)), 0))
{
errstr = "double free or corruption (out)";
goto errout;
}
/* Or whether the block is actually not marked used. */
//检查下个块的 prev_inuse(前一块使用中)标志位是否已设置
if (__glibc_unlikely (!prev_inuse(nextchunk)))
{
errstr = "double free or corruption (!prev)";
goto errout;
}

//检查下个块的大小(<=2*SIZE_SZ 或 >=分配区总内存)
nextsize = chunksize(nextchunk);
if (__builtin_expect (nextchunk->size <= 2 * SIZE_SZ, 0)
|| __builtin_expect (nextsize >= av->system_mem, 0))
{
errstr = "free(): invalid next size (normal)";
goto errout;
}

free_perturb (chunk2mem(p), size - 2 * SIZE_SZ);//释放内存块内容

Coalescing

/* consolidate backward */
//向背后那个chunk合并
if (!prev_inuse(p)) //若前一个chunk空闲,则将当前chunk与前个chunk合并
{
prevsize = p->prev_size;
size += prevsize;//更新size
p = chunk_at_offset(p, -((long) prevsize));//更新当前chunk指针为前一个chunk地址
unlink(av, p, bck, fwd);//将前个chunk移除链表
}

if (nextchunk != av->top) {
/* get and clear inuse bit */
nextinuse = inuse_bit_at_offset(nextchunk, nextsize);//获取下个chunk的使用位

/* consolidate forward */
//向前面的chunk合并
if (!nextinuse) {
unlink(av, nextchunk, bck, fwd);
size += nextsize;
} else
clear_inuse_bit_at_offset(nextchunk, 0);

unsorted bin

      /*
Place the chunk in unsorted chunk list. Chunks are
not placed into regular bins until after they have
been given one chance to be used in malloc.
*/

bck = unsorted_chunks(av);//将chunk放入unsorted bin
fwd = bck->fd;
if (__glibc_unlikely (fwd->bk != bck))//检查双向链表有没被破坏
{
errstr = "free(): corrupted unsorted chunks";
goto errout;
}
//将释放的内存块p插入unsorted bin链表
p->fd = fwd;
p->bk = bck;
if (!in_smallbin_range(size))//如果是在largebin范围内就设置特有的两个指针
{
p->fd_nextsize = NULL;
p->bk_nextsize = NULL;
}
bck->fd = p;
fwd->bk = p;

set_head(p, size | PREV_INUSE);
set_foot(p, size);//设置头,脚部信息

check_free_chunk(av, p);//检查内存块状态
}

/*
If the chunk borders the current high end of memory,
consolidate into top
*/
//如果内存块与top chunk相邻,就合并到top chunk
else {
size += nextsize;
set_head(p, size | PREV_INUSE);
av->top = p;
check_chunk(av, p);
}

Trimming the heap

    /*
If freeing a large space, consolidate possibly-surrounding
chunks. Then, if the total unused topmost memory exceeds trim
threshold, ask malloc_trim to reduce top.

Unless max_fast is 0, we don't know if there are fastbins
bordering top, so we cannot tell for sure whether threshold
has been reached unless fastbins are consolidated. But we
don't want to consolidate on each free. As a compromise,
consolidation is performed if FASTBIN_CONSOLIDATION_THRESHOLD
is reached.
*/

if ((unsigned long)(size) >= FASTBIN_CONSOLIDATION_THRESHOLD) //当前释放的大小超过FASTBIN_CONSOLIDATION_THRESHOLD进入分支
{
if (have_fastchunks(av))
malloc_consolidate(av);//合并fastbin中内存块

if (av == &main_arena)//看是否要修剪top chunk
{
#ifndef MORECORE_CANNOT_TRIM
if ((unsigned long)(chunksize(av->top)) >=
(unsigned long)(mp_.trim_threshold))//看top chunk是不是超过了修剪阈值
systrim(mp_.top_pad, av);
#endif
} else//对于非主分配区
{
/* Always try heap_trim(), even if the top chunk is not
large, because the corresponding heap might go away. */
/* 即使顶部块不大,也尝试调用 heap_trim(),因为对应的堆可能会被释放。 */
heap_info *heap = heap_for_ptr(top(av));

assert(heap->ar_ptr == av);
heap_trim(heap, mp_.top_pad);
}
}

if (! have_lock) {
assert (locked);
(void)mutex_unlock(&av->mutex);
}//释放锁
}
/*
If the chunk was allocated via mmap, release via munmap().
*/

else {
munmap_chunk (p);
}//释放由mmap分配的内存
}

glibc-2.27

__libc_malloc

void *
__libc_malloc (size_t bytes)
{
mstate ar_ptr;
void *victim;

void *(*hook) (size_t, const void *)
= atomic_forced_read (__malloc_hook);//钩子函数
if (__builtin_expect (hook != NULL, 0))
return (*hook)(bytes, RETURN_ADDRESS (0));
#if USE_TCACHE//编译时宏,用于启用线程缓存机制
/* int_free also calls request2size, be careful to not pad twice. */
size_t tbytes;
checked_request2size (bytes, tbytes);//将请求大小转换为实际分配的大小
size_t tc_idx = csize2tidx (tbytes);

MAYBE_INIT_TCACHE ();

DIAG_PUSH_NEEDS_COMMENT;
if (tc_idx < mp_.tcache_bins
/*&& tc_idx < TCACHE_MAX_BINS*/ /* to appease gcc */
&& tcache
&& tcache->entries[tc_idx] != NULL)
{
return tcache_get (tc_idx);
}
DIAG_POP_NEEDS_COMMENT;
#endif

if (SINGLE_THREAD_P)//单线程
{
victim = _int_malloc (&main_arena, bytes);
assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
&main_arena == arena_for_chunk (mem2chunk (victim)));
return victim;
}

arena_get (ar_ptr, bytes);//多线程

victim = _int_malloc (ar_ptr, bytes);
/* Retry with another arena only if we were able to find a usable arena
before. */
if (!victim && ar_ptr != NULL)
{
LIBC_PROBE (memory_malloc_retry, 1, bytes);
ar_ptr = arena_get_retry (ar_ptr, bytes);
victim = _int_malloc (ar_ptr, bytes);
}

if (ar_ptr != NULL)
__libc_lock_unlock (ar_ptr->mutex);

assert (!victim || chunk_is_mmapped (mem2chunk (victim)) ||
ar_ptr == arena_for_chunk (mem2chunk (victim)));
return victim;
}
libc_hidden_def (__libc_malloc)

Compared with 2.23, the biggest change is the addition of the tcache; the following looks at this part of the source in detail.

tcache

tcache structs

#if USE_TCACHE

/* We overlay this structure on the user-data portion of a chunk when
the chunk is stored in the per-thread cache. */
/* 当一个内存块(chunk)存储在线程本地缓存(per-thread cache)中时,
我们将这个结构体覆盖在内存块的用户数据部分。 */
typedef struct tcache_entry//用于管理空闲chunk的结构体
{
struct tcache_entry *next;//指向下一个空闲chunk
} tcache_entry;

/* There is one of these for each thread, which contains the
per-thread cache (hence "tcache_perthread_struct"). Keeping
overall size low is mildly important. Note that COUNTS and ENTRIES
are redundant (we could have just counted the linked list each
time), this is for performance reasons. */
/* 每个线程都有一个这样的结构体,它包含了线程本地缓存
(因此命名为 "tcache_perthread_struct")。保持整体大小较小是较为重要的。
注意,COUNTS 和 ENTRIES 是冗余的(我们本可以每次都遍历链表来计数),
这是出于性能考虑。 */
typedef struct tcache_perthread_struct//每个线程的teache数据结构
{
char counts[TCACHE_MAX_BINS];//记录每个bin中空闲chunk数量,TCACHE_MAX_BINS:tcache中最大bin数量
tcache_entry *entries[TCACHE_MAX_BINS];//每个元素是指向tcache_entry链表的指针
} tcache_perthread_struct;

static __thread bool tcache_shutting_down = false;//用于标记当前线程的tcache是否正在关闭,初始false
static __thread tcache_perthread_struct *tcache = NULL;//tcache的核心数据结构,存储当前线程的所有缓存信息,初始值NULL,表示tcache未初始化

tcache_put

/* Caller must ensure that we know tc_idx is valid and there's room
for more chunks. */
/* 调用者必须确保 tc_idx 是有效的,并且有空间存放更多的内存块。 */ //用于将内存块放入tcache
static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)//chunk:内存块指针,tc_idx:内存块对应的tcache bin索引
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);//强制转换类型以便作为链表的结点
assert (tc_idx < TCACHE_MAX_BINS);//看看索引是否有效
//插入链表
e->next = tcache->entries[tc_idx];//将当前内存块指向链表的头部
tcache->entries[tc_idx] = e;//将链表头部更新为当前内存块
++(tcache->counts[tc_idx]);//更新计数
}

tcache_get

/* Caller must ensure that we know tc_idx is valid and there's
available chunks to remove. */
/* 调用者必须确保 tc_idx 是有效的,并且有可用的内存块可以移除。 */
static __always_inline void *
tcache_get (size_t tc_idx)//用于从tcache中获取内存块
{
tcache_entry *e = tcache->entries[tc_idx];//获取链表头部内存块地址
assert (tc_idx < TCACHE_MAX_BINS);//检查索引是否有效
assert (tcache->entries[tc_idx] > 0);//检查是否有可用内存块
tcache->entries[tc_idx] = e->next;//更新链表(相当于从链表移除内存块)
--(tcache->counts[tc_idx]);//更新计数
return (void *) e;//返回内存块
}

_int_malloc

fastbin

 /*
If the size qualifies as a fastbin, first check corresponding bin.
This code is safe to execute even if av is not yet initialized, so we
can try it without checking, which saves some time on this fast path.
*/

#define REMOVE_FB(fb, victim, pp) \
do \
{ \
victim = pp; \
if (victim == NULL) \
break; \
} \
while ((pp = catomic_compare_and_exchange_val_acq (fb, victim->fd, victim)) \
!= victim);
//原子操作,保证安全从fastbin中移除内存块

if ((unsigned long) (nb) <= (unsigned long) (get_max_fast ()))
{
idx = fastbin_index (nb);
mfastbinptr *fb = &fastbin (av, idx);
mchunkptr pp;
victim = *fb;

if (victim != NULL)
{
if (SINGLE_THREAD_P)//单线程直接移除(更新fastbin链表头)
*fb = victim->fd;
else//多线程,用上面那个宏安全移除
REMOVE_FB (fb, pp, victim);
if (__glibc_likely (victim != NULL))
{
size_t victim_idx = fastbin_index (chunksize (victim));
if (__builtin_expect (victim_idx != idx, 0))
malloc_printerr ("malloc(): memory corruption (fast)");
check_remalloced_chunk (av, victim, nb);
#if USE_TCACHE//新加的,若启用tcache且其没满,就将fastbin中内存块移到tcache中
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;

/* While bin not empty and tcache not full, copy chunks. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = *fb) != NULL)
{
if (SINGLE_THREAD_P)
*fb = tc_victim->fd;
else
{
REMOVE_FB (fb, pp, tc_victim);
if (__glibc_unlikely (tc_victim == NULL))
break;
}
tcache_put (tc_victim, tc_idx);
}
}
#endif
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}
}

Compared with 2.23:

  • The tcache mechanism was added for performance: when a chunk is taken from a fastbin, if the tcache is not full and the fastbin still holds chunks of the same size, those chunks are moved into the tcache.
  • A single-thread check was added; the single-threaded and multi-threaded paths remove chunks in different ways.

smallbins

/*
If a small request, check regular bin. Since these "smallbins"
hold one size each, no searching within bins is necessary.
(For a large request, we need to wait until unsorted chunks are
processed to find best fit. But for small ones, fits are exact
anyway, so we can check now, which is faster.)
*/

if (in_smallbin_range (nb))//移除内存块
{
idx = smallbin_index (nb);
bin = bin_at (av, idx);

if ((victim = last (bin)) != bin)
{
bck = victim->bk;
if (__glibc_unlikely (bck->fd != victim))
malloc_printerr ("malloc(): smallbin double linked list corrupted");
set_inuse_bit_at_offset (victim, nb);
bin->bk = bck;
bck->fd = bin;

if (av != &main_arena)//非主分配区
set_non_main_arena (victim);
check_malloced_chunk (av, victim, nb);
#if USE_TCACHE//新加的
/* While we're here, if we see other chunks of the same size,
stash them in the tcache. */
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
{
mchunkptr tc_victim;

/* While bin not empty and tcache not full, copy chunks over. */
while (tcache->counts[tc_idx] < mp_.tcache_count
&& (tc_victim = last (bin)) != bin)
{
if (tc_victim != 0)
{
bck = tc_victim->bk;
set_inuse_bit_at_offset (tc_victim, nb);
if (av != &main_arena)
set_non_main_arena (tc_victim);
bin->bk = bck;
bck->fd = bin;

tcache_put (tc_victim, tc_idx);//移入tcache
}
}
}
#endif
void *p = chunk2mem (victim);
alloc_perturb (p, bytes);
return p;
}
}

Compared with 2.23:

  • tcache was introduced: after a chunk is allocated from a small bin, if chunks of the same size remain and the tcache has room, they are moved into the tcache.
  • The old path that called malloc_consolidate when the small bins were still uninitialized was removed.

Note that tcache_put performs no integrity checks on the small bin chunks it stashes.

largebin

Unchanged.

The main loop

#if USE_TCACHE//初始化tcache相关变量
INTERNAL_SIZE_T tcache_nb = 0;
size_t tc_idx = csize2tidx (nb);
if (tcache && tc_idx < mp_.tcache_bins)
tcache_nb = nb;
int return_cached = 0;

tcache_unsorted_count = 0;
#endif

for (;; )
{
int iters = 0;

Compared with 2.23:

  • Initialization of tcache-related variables was added.

unsorted bin

The code is long, so only the newly added parts are shown.

#if USE_TCACHE
/* Fill cache first, return to user only if cache fills.
We may return one of these chunks later. */
if (tcache_nb
&& tcache->counts[tc_idx] < mp_.tcache_count)//满足tcache
{
tcache_put (victim, tc_idx);//放到tcache中
return_cached = 1;
continue;
}
else
{
#endif

While traversing the unsorted bin, if a chunk meets the tcache conditions (tcache_nb is nonzero and the tcache bin is not yet full), the chunk is put into the tcache.

#if USE_TCACHE
/* If we've processed as many chunks as we're allowed while
filling the cache, return one of the cached ones. */
++tcache_unsorted_count;//计数已经处理的内存块
if (return_cached
&& mp_.tcache_unsorted_limit > 0
&& tcache_unsorted_count > mp_.tcache_unsorted_limit)//判断填没填满
{
return tcache_get (tc_idx);
}
#endif

In short: while processing chunks from the unsorted bin, once enough chunks have been processed (to fill the tcache), a chunk is returned to the user straight from the tcache.

__libc_free

void
__libc_free (void *mem)
{
mstate ar_ptr;
mchunkptr p; /* chunk corresponding to mem */

void (*hook) (void *, const void *)
= atomic_forced_read (__free_hook);
if (__builtin_expect (hook != NULL, 0))
{
(*hook)(mem, RETURN_ADDRESS (0));
return;
}

if (mem == 0) /* free(0) has no effect */
return;

p = mem2chunk (mem);

if (chunk_is_mmapped (p)) /* release mmapped memory. */
{
/* See if the dynamic brk/mmap threshold needs adjusting.
Dumped fake mmapped chunks do not affect the threshold. */
if (!mp_.no_dyn_threshold
&& chunksize_nomask (p) > mp_.mmap_threshold
&& chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX
&& !DUMPED_MAIN_ARENA_CHUNK (p))
{
mp_.mmap_threshold = chunksize (p);
mp_.trim_threshold = 2 * mp_.mmap_threshold;
LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
mp_.mmap_threshold, mp_.trim_threshold);
}
munmap_chunk (p);
return;
}

MAYBE_INIT_TCACHE ();//newly added

ar_ptr = arena_for_chunk (p);
_int_free (ar_ptr, p, 0);
}
libc_hidden_def (__libc_free)

The main change is the added MAYBE_INIT_TCACHE () call, which checks whether the tcache has been initialized and initializes it if not.

_int_free

#if USE_TCACHE
{
size_t tc_idx = csize2tidx (size);//根据内存块大小计算索引

if (tcache
&& tc_idx < mp_.tcache_bins
&& tcache->counts[tc_idx] < mp_.tcache_count)//检查tcache是否可用
{
tcache_put (p, tc_idx);//放入tcache
return;
}
}
#endif

Compared with 2.23:

  • On free, if the tcache is available and not full, the chunk is placed into the tcache first; the tcache has the highest priority.

fastbin

In 2.26 and later, on free:

  • If the tcache is not full, the freed chunk goes into the tcache; once it is full, chunks go into the fastbin.
  • Chunks already sitting in the fastbin are not migrated.

glibc-2.31

tcache

Struct

A new key member was added

typedef struct tcache_entry
{
struct tcache_entry *next;//组成单向链表
/* This field exists to detect double frees. */
struct tcache_perthread_struct *key;//检查是否double free
} tcache_entry;
  • A key member was added to detect double frees.

tcache_put

static __always_inline void
tcache_put (mchunkptr chunk, size_t tc_idx)
{
tcache_entry *e = (tcache_entry *) chunk2mem (chunk);

/* Mark this chunk as "in the tcache" so the test in _int_free will
detect a double free. */
e->key = tcache;

e->next = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e;
++(tcache->counts[tc_idx]);
}

The added line e->key = tcache; marks the chunk as having been freed into the tcache, so that a later free of the same chunk can detect a double free.

tcache_get

static __always_inline void *
tcache_get (size_t tc_idx)
{
tcache_entry *e = tcache->entries[tc_idx];
tcache->entries[tc_idx] = e->next;
--(tcache->counts[tc_idx]);
e->key = NULL;
return (void *) e;
}

The added line e->key = NULL; clears the key field, marking the chunk as "taken out of the tcache".

__libc_malloc

Added:

_Static_assert (PTRDIFF_MAX <= SIZE_MAX / 2,
"PTRDIFF_MAX is not more than half of SIZE_MAX");

A static assertion (a compile-time check) that PTRDIFF_MAX is no more than half of SIZE_MAX. Pointer subtraction yields a ptrdiff_t, so if a single allocation could exceed PTRDIFF_MAX bytes, subtracting two pointers inside it would overflow; the assertion guarantees that rejecting sizes above PTRDIFF_MAX is enough to rule this out.

_int_malloc

Added:

if (!checked_request2size (bytes, &nb))
{
__set_errno (ENOMEM);
return NULL;
}

Checks that the requested size is valid; if not, sets ENOMEM and returns NULL.

fastbin

Unchanged.

smallbin

Unchanged.

largebin

Unchanged.

The main loop

Added:

for (;; )
{
int iters = 0;
while ((victim = unsorted_chunks (av)->bk) != unsorted_chunks (av))//遍历unsorted bin
{
bck = victim->bk;//获取当前块的前一个块
size = chunksize (victim);//获取当前块大小
mchunkptr next = chunk_at_offset (victim, size);//获取下一个块

if (__glibc_unlikely (size <= 2 * SIZE_SZ)
|| __glibc_unlikely (size > av->system_mem))//检查当前块大小是否有效
malloc_printerr ("malloc(): invalid size (unsorted)");
if (__glibc_unlikely (chunksize_nomask (next) < 2 * SIZE_SZ)//检查下个块大小是否有效
|| __glibc_unlikely (chunksize_nomask (next) > av->system_mem))
malloc_printerr ("malloc(): invalid next size (unsorted)");
if (__glibc_unlikely ((prev_size (next) & ~(SIZE_BITS)) != size))//检查下个块的prev_size是否与当前块大小匹配
malloc_printerr ("malloc(): mismatching next->prev_size (unsorted)");
if (__glibc_unlikely (bck->fd != victim)
|| __glibc_unlikely (victim->fd != unsorted_chunks (av)))//检查双向链表是否损坏
malloc_printerr ("malloc(): unsorted double linked list corrupted");
if (__glibc_unlikely (prev_inuse (next)))//检查下个块是否被使用
malloc_printerr ("malloc(): invalid next->prev_inuse (unsorted)");

Just additional sanity checks on the chunks and the doubly linked list.

unsorted bin

if ((unsigned long) size
== (unsigned long) chunksize_nomask (fwd))
/* Always insert in the second position. */
fwd = fwd->fd;
else
{
victim->fd_nextsize = fwd;
victim->bk_nextsize = fwd->bk_nextsize;
if (__glibc_unlikely (fwd->bk_nextsize->fd_nextsize != fwd))
malloc_printerr ("malloc(): largebin double linked list corrupted (nextsize)");
fwd->bk_nextsize = victim;
victim->bk_nextsize->fd_nextsize = victim;
}
bck = fwd->bk;
if (bck->fd != fwd)
malloc_printerr ("malloc(): largebin double linked list corrupted (bk)");
}

Newly added:

if (__glibc_unlikely (fwd->bk_nextsize->fd_nextsize != fwd))//检查链表完整性
malloc_printerr ("malloc(): largebin double linked list corrupted (nextsize)");

if (bck->fd != fwd)//检查连表完整性
malloc_printerr ("malloc(): largebin double linked list corrupted (bk)");

Just a few additional integrity checks.

top chunk

Added:

if (__glibc_unlikely (size > av->system_mem))
malloc_printerr ("malloc(): corrupted top size");

Checks whether the top chunk's size exceeds the total memory obtained from the system; if it does, the allocator's metadata has been corrupted and an error is raised.

__libc_free

Unchanged.

_int_free

Changes:

#if USE_TCACHE
{
size_t tc_idx = csize2tidx (size);
if (tcache != NULL && tc_idx < mp_.tcache_bins)
{
/* Check to see if it's already in the tcache. */
tcache_entry *e = (tcache_entry *) chunk2mem (p);

/* This test succeeds on double free. However, we don't 100%
trust it (it also matches random payload data at a 1 in
2^<size_t> chance), so verify it's not an unlikely
coincidence before aborting. */
if (__glibc_unlikely (e->key == tcache))
{
tcache_entry *tmp;
LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
for (tmp = tcache->entries[tc_idx];
tmp;
tmp = tmp->next)
if (tmp == e)
malloc_printerr ("free(): double free detected in tcache 2");
/* If we get here, it was a coincidence. We've wasted a
few cycles, but don't abort. */
}

if (tcache->counts[tc_idx] < mp_.tcache_count)
{
tcache_put (p, tc_idx);
return;
}
}
}
#endif

Before putting a chunk into the tcache, free now checks for a double free; the chunk is only placed into the tcache if the check passes.

glibc-2.35

tcache

Struct

typedef struct tcache_entry
{
struct tcache_entry *next;
/* This field exists to detect double frees. */
uintptr_t key;
} tcache_entry;

The data type of key changed to an unsigned integer (uintptr_t).

Code to initialize the key was also added (starting from 2.34; previously the key was the tcache pointer itself, i.e. heap_base+0x10):

/* The value of tcache_key does not really have to be a cryptographically
secure random number. It only needs to be arbitrary enough so that it does
not collide with values present in applications. If a collision does happen
consistently enough, it could cause a degradation in performance since the
entire list is checked to check if the block indeed has been freed the
second time. The odds of this happening are exceedingly low though, about 1
in 2^wordsize. There is probably a higher chance of the performance
degradation being due to a double free where the first free happened in a
different thread; that's a case this check does not cover. */
static void
tcache_key_initialize (void)
{
if (__getrandom (&tcache_key, sizeof(tcache_key), GRND_NONBLOCK)
!= sizeof (tcache_key))
{
tcache_key = random_bits ();
#if __WORDSIZE == 64
tcache_key = (tcache_key << 32) | random_bits ();
#endif
}
}

Randomizing the key prevents attacks that rely on forging a fixed, known key value.

tcache_put

e->next = PROTECT_PTR (&e->next, tcache->entries[tc_idx]);

The next pointer is now protected (mangled); the scheme is:

#define PROTECT_PTR(pos, ptr) \
((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))

So the next field no longer stores the next chunk's address directly, but a mangled form of it.

tcache_get

tcache->entries[tc_idx] = REVEAL_PTR (e->next);

Used together with the macro above to demangle the pointer.

__libc_malloc

The hooks were removed.

_int_malloc

fastbin

     pp = REVEAL_PTR (victim->fd);                                     \
if (__glibc_unlikely (pp != NULL && misaligned_chunk (pp))) \
malloc_printerr ("malloc(): unaligned fastbin chunk detected"); \
}

Decrypts the pointer, then checks its alignment.

if (__glibc_unlikely (misaligned_chunk (victim)))
malloc_printerr ("malloc(): unaligned fastbin chunk detected 2");

if (SINGLE_THREAD_P)
*fb = REVEAL_PTR (victim->fd);

Alignment check, then pointer decryption.

Inside the tcache path:

	      if (__glibc_unlikely (misaligned_chunk (tc_victim)))
		malloc_printerr ("malloc(): unaligned fastbin chunk detected 3");
	      if (SINGLE_THREAD_P)
		*fb = REVEAL_PTR (tc_victim->fd);

Alignment is checked, then the pointer is demangled and the fastbin list head updated.

__libc_free

The hook mechanism was removed.

      /* See if the dynamic brk/mmap threshold needs adjusting.
	 Dumped fake mmapped chunks do not affect the threshold.  */
      if (!mp_.no_dyn_threshold
	  && chunksize_nomask (p) > mp_.mmap_threshold
	  && chunksize_nomask (p) <= DEFAULT_MMAP_THRESHOLD_MAX)
	{
	  mp_.mmap_threshold = chunksize (p);
	  mp_.trim_threshold = 2 * mp_.mmap_threshold;
	  LIBC_PROBE (memory_mallopt_free_dyn_thresholds, 2,
		      mp_.mmap_threshold, mp_.trim_threshold);
	}
      munmap_chunk (p);
    }
  else
    {
      MAYBE_INIT_TCACHE ();

      /* Mark the chunk as belonging to the library again.  */
      (void)tag_region (chunk2mem (p), memsize (p));

      ar_ptr = arena_for_chunk (p);
      _int_free (ar_ptr, p, 0);
    }

The new part is the `tag_region` call, which marks the chunk's memory as belonging to the library again (memory-tagging support).

_int_free

Changes:

if (__glibc_unlikely (e->key == tcache_key))
  {
    tcache_entry *tmp;
    size_t cnt = 0;
    LIBC_PROBE (memory_tcache_double_free, 2, e, tc_idx);
    for (tmp = tcache->entries[tc_idx];
	 tmp;
	 tmp = REVEAL_PTR (tmp->next), ++cnt) /* demangle each next pointer */
      {
	if (cnt >= mp_.tcache_count) /* is the list longer than allowed? */
	  malloc_printerr ("free(): too many chunks detected in tcache");
	if (__glibc_unlikely (!aligned_OK (tmp))) /* alignment check */
	  malloc_printerr ("free(): unaligned chunk detected in tcache 2");
	if (tmp == e) /* double-free check */
	  malloc_printerr ("free(): double free detected in tcache 2");
	/* If we get here, it was a coincidence.  We've wasted a
	   few cycles, but don't abort.  */
      }
  }

The mangling and demangling macros:

/* Safe-Linking:
   Use randomness from ASLR (mmap_base) to protect single-linked lists
   of Fast-Bins and TCache.  That is, mask the "next" pointers of the
   lists' chunks, and also perform allocation alignment checks on them.
   This mechanism reduces the risk of pointer hijacking, as was done with
   Safe-Unlinking in the double-linked lists of Small-Bins.
   It assumes a minimum page size of 4096 bytes (12 bits).  Systems with
   larger pages provide less entropy, although the pointer mangling
   still works.  */
#define PROTECT_PTR(pos, ptr) \
  ((__typeof (ptr)) ((((size_t) pos) >> 12) ^ ((size_t) ptr)))
#define REVEAL_PTR(ptr)  PROTECT_PTR (&ptr, ptr)

Mangling: shift the storage address `pos` right by 12 bits and XOR it with `ptr`.

Demangling applies the same XOR with the slot's own address, which cancels the mask.

glibc-2.41

tcache

Structure

Unchanged.

tcache_put

Unchanged.

tcache_get

static __always_inline void *
tcache_get_n (size_t tc_idx, tcache_entry **ep)
{
  tcache_entry *e;
  if (ep == &(tcache->entries[tc_idx])) /* the list head is stored plain */
    e = *ep;
  else
    e = REVEAL_PTR (*ep); /* interior next pointers must be demangled */

  if (__glibc_unlikely (!aligned_OK (e)))
    malloc_printerr ("malloc(): unaligned tcache chunk detected");

  /* Unlink e by updating the slot that pointed to it.  */
  if (ep == &(tcache->entries[tc_idx])) /* head slot: store demangled */
    *ep = REVEAL_PTR (e->next);
  else /* interior slot: re-mangle for the new location */
    *ep = PROTECT_PTR (ep, REVEAL_PTR (e->next));

  --(tcache->counts[tc_idx]);
  e->key = 0;
  return (void *) e;
}

Compared with 2.35, the get path can now both mangle and demangle pointers (so it can unlink from any position in the list, not just the head), improving safety.

__libc_malloc

The tcache fast path changed:

#if USE_TCACHE
  bool err = tcache_try_malloc (bytes, &victim); /* try to serve from tcache */

  if (err)
    return NULL;

  if (victim)
    return tag_new_usable (victim); /* tag the block as usable */
#endif

Most of the former inline logic was moved into `tcache_try_malloc`:

static __always_inline bool
tcache_try_malloc (size_t bytes, void **memptr)
{
  /* int_free also calls request2size, be careful to not pad twice.  */
  size_t tbytes = checked_request2size (bytes); /* request size -> chunk size */
  if (tbytes == 0) /* conversion failed (overflow) */
    {
      __set_errno (ENOMEM);
      return true;
    }

  size_t tc_idx = csize2tidx (tbytes); /* chunk size -> tcache bin index */

  MAYBE_INIT_TCACHE (); /* initialize the tcache if needed */

  if (tcache_available (tc_idx)) /* is a cached chunk of this size present? */
    *memptr = tcache_get (tc_idx);
  else
    *memptr = NULL;

  return false;
}

_int_malloc

Unchanged.

__libc_free

Unchanged.

_int_free

static inline void
_int_free (mstate av, mchunkptr p, int have_lock)
{
  INTERNAL_SIZE_T size; /* its size */

  size = chunksize (p);

  _int_free_check (av, p, size); /* sanity checks */

#if USE_TCACHE /* tcache-enabled path */
  if (tcache_free (p, size))
    return;
#endif

  _int_free_chunk (av, p, size, have_lock); /* the actual free logic */
}

The logic is now split across helper functions: `_int_free_check` for the sanity checks, `tcache_free` for the tcache path, and `_int_free_chunk` for the actual free.


malloc源码学习
http://yyyffff.github.io/2025/02/09/malloc源码学习/
Author: yyyffff · Published: 2025-02-09