Kernel Memory Management notes

More study notes.

Pages: Physical memory is managed in pages, the unit the CPU’s MMU works with, usually 4K or 8K. The kernel tracks every physical page with a struct page (<linux/mm_types.h>; <linux/mm.h> in older kernels), which holds the status flags, the count of references to that page (see page_count(); zero = unused), and the kernel’s virtual address of the page. The kernel also needs to know who owns each page (user, dynamically allocated kernel data, static kernel code, page cache, etc.).

Zones: Some physical address ranges can only be used for certain things, so the kernel groups pages into zones. ZONE_DMA: some DMA devices can only address a small range, e.g. ISA devices can only access the first 16MB. ZONE_HIGHMEM: there may be more physical memory than can be mapped into the kernel’s virtual address space, so these pages are not permanently mapped; on 32-bit x86 this is all memory above 896MB. ZONE_NORMAL: everything else, for normal use.
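
To make the boundaries concrete, here is a small userspace sketch that classifies a physical address, assuming the 32-bit x86 layout (ZONE_DMA below 16MB, ZONE_HIGHMEM from 896MB up). The names zone_kind and zone_for_phys_addr are made up for illustration; they are not kernel identifiers.

```c
#include <stdint.h>

/* Hypothetical names; models the 32-bit x86 zone boundaries only. */
enum zone_kind { ZONE_KIND_DMA, ZONE_KIND_NORMAL, ZONE_KIND_HIGHMEM };

#define SKETCH_MB (1024ULL * 1024ULL)

static enum zone_kind zone_for_phys_addr(uint64_t phys)
{
    if (phys < 16 * SKETCH_MB)      /* reachable by ISA DMA */
        return ZONE_KIND_DMA;
    if (phys < 896 * SKETCH_MB)     /* permanently mapped by the kernel */
        return ZONE_KIND_NORMAL;
    return ZONE_KIND_HIGHMEM;       /* mapped only on demand */
}
```

Other architectures draw these lines differently (many 64-bit systems have no high memory at all), so the constants are only valid under the stated assumption.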

struct page * alloc_pages(mask, pow) allocates 2^pow contiguous physical pages. Convert the result to a logical (kernel virtual) address with void * page_address(struct page * phys). See also get_zeroed_page(mask), which returns the virtual address of one zeroed page, and free_pages(vaddr, order).
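
Since the page allocator only hands out power-of-two page counts, a byte size has to be rounded up to an order first. The kernel has get_order() for this; the sketch below is a userspace model of the same arithmetic, assuming 4K pages. order_for_size and SKETCH_PAGE_SIZE are hypothetical names.

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL  /* assumption: 4K pages */

/* Smallest order such that 2^order pages can hold `size` bytes. */
static unsigned int order_for_size(size_t size)
{
    /* Round the byte count up to whole pages. */
    size_t pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;
    unsigned int order = 0;

    /* Grow the power of two until it covers the page count. */
    while (((size_t)1 << order) < pages)
        order++;
    return order;
}
```

Note the waste this implies: a request for 3 pages gets order 2, i.e. 4 pages.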

kmalloc()/kfree(): kmalloc() returns physically contiguous memory for the kernel at byte granularity; kfree() releases it. Its flags argument combines action modifiers (no sleep, no retry, etc.), zone modifiers (DMA, high) and type flags (atomic for interrupt handlers, DMA, user space…).

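vmalloc()/vfree(): vmalloc() returns memory that is contiguous in kernel virtual memory but not necessarily physically contiguous. It sets up page table entries to map the scattered physical pages into one contiguous range, and it may sleep. The extra mappings put more pressure on the TLB, so it is used only for very large allocations that kmalloc() is unlikely to satisfy.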

Slab allocator: a system-managed allocator used to allocate specific structures and cache them for reuse. This replaces custom free lists. Some of the caching can be per CPU, removing the need for locks. If the allocator is NUMA-aware, it can allocate memory from the same node as the requesting CPU. Objects can be colored so that they do not all map to the same cache lines.
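
For reference, this is roughly what a hand-rolled free list looks like, i.e. the pattern the slab allocator replaces. It is a userspace model on top of malloc(), not the kernel API; obj_cache and the cache_* functions are hypothetical names, and it has none of the per-CPU, NUMA or coloring features.

```c
#include <stdlib.h>

/* Hypothetical userspace model of an object cache. */
struct obj_cache {
    size_t obj_size;   /* size of each object, at least sizeof(void *) */
    void  *free_list;  /* singly linked list of released objects */
};

static void cache_init(struct obj_cache *c, size_t obj_size)
{
    /* A free object stores the "next" pointer inside itself. */
    c->obj_size = obj_size < sizeof(void *) ? sizeof(void *) : obj_size;
    c->free_list = NULL;
}

/* Reuse a cached object if one is available, else fall back to malloc(). */
static void *cache_alloc(struct obj_cache *c)
{
    void *obj = c->free_list;

    if (obj) {
        c->free_list = *(void **)obj;  /* pop the head */
        return obj;
    }
    return malloc(c->obj_size);
}

/* Return an object to the cache instead of freeing it. */
static void cache_free(struct obj_cache *c, void *obj)
{
    *(void **)obj = c->free_list;      /* push as new head */
    c->free_list = obj;
}

/* Release everything still sitting in the cache. */
static void cache_destroy(struct obj_cache *c)
{
    while (c->free_list) {
        void *next = *(void **)c->free_list;
        free(c->free_list);
        c->free_list = next;
    }
}
```

The most recently freed object is handed out first, which is also the one most likely to still be in the CPU cache.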

Static stack allocation: Unlike user space, kernel threads have small, fixed-size stacks; the default is usually 2 physically contiguous pages (8K ~ 16K). Linux 2.6 added an option for 1-page stacks; interrupt handlers no longer fit on those, so a one-page, per-processor interrupt stack was added. Recursion and alloca() must be avoided. Overrunning the stack will damage thread_info, which sits at the end of the stack.

High memory: On 32-bit x86, physical pages past 896MB have no permanent mapping in the kernel’s address space (the 3GB~4GB range) and are mapped only when needed. Permanent mapping is done in process context via void * kmap(struct page * p), which may sleep. Temporary mapping is done via void * kmap_atomic(struct page * p, enum km_type type); it never sleeps, so it is usable from interrupt handlers, and it disables preemption because the mapping is bound to a CPU. Later kernels dropped the type argument.

Per-CPU allocation: On SMP, CPU-local variables let you avoid locking. While accessing the variable, your thread must not move to another CPU, so use a primitive that turns off preemption.

Legacy method: Declare the variable as an array with one entry per CPU, myType myVar[NR_CPUS], and find your current CPU with get_cpu()/put_cpu(). get_cpu() turns off preemption so you stay on the same CPU until put_cpu() (smp_processor_id() returns the CPU number without disabling preemption).
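
A userspace model of the array approach, to show the shape of the data. The CPU index is passed in by hand here; in the kernel it would come from get_cpu(). The padding to a 64-byte cache line is my own addition (the plain array does not do it for you), and all names and sizes below are assumptions.

```c
#include <stddef.h>

#define SKETCH_NR_CPUS 4      /* assumption: stand-in for NR_CPUS */
#define SKETCH_CACHE_LINE 64  /* assumption: 64-byte cache lines */

/* One slot per CPU, padded so that two CPUs never share a cache line. */
struct hit_slot {
    unsigned long count;
    char pad[SKETCH_CACHE_LINE - sizeof(unsigned long)];
};

static struct hit_slot hits[SKETCH_NR_CPUS];

/* Each CPU only touches its own slot, so no lock is needed. */
static void count_hit(unsigned int cpu)
{
    hits[cpu].count++;
}

/* Readers sum all slots; the total is approximate while CPUs keep counting. */
static unsigned long total_hits(void)
{
    unsigned long sum = 0;

    for (size_t i = 0; i < SKETCH_NR_CPUS; i++)
        sum += hits[i].count;
    return sum;
}
```

The writer side is lock-free only because preemption is off between get_cpu() and put_cpu(); without that, the thread could migrate between reading the index and updating the slot.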

percpu: Simpler, more systematic and more powerful. Cache alignment is handled implicitly to avoid cache line thrashing. Treat the section between getting the variable and releasing it as critical: you cannot sleep until the variable is released.

Static declarations may not work in loadable kernel modules (this was a limitation of older kernels); use the dynamic version there.

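#include <linux/percpu.h>
DEFINE_PER_CPU(myType, myVar);
// DECLARE_PER_CPU(myType, myVar); for the header file

get_cpu_var(myVar)++;  // turns preemption off, accesses this CPU's copy
// Critical section with preemption off.
put_cpu_var(myVar);    // turns preemption back on

// Updates var of another CPU without changing preemption:
per_cpu(myVar, cpu)++;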

Dynamic: the dynamically allocated version, optionally forcing a specific alignment.

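// alloc_percpu(myType) aligns to __alignof__(myType). To pick the alignment yourself:
// ptr = __alloc_percpu(sizeof(myType), __alignof__(myType));
myType __percpu *ptr = alloc_percpu(myType);  // NULL on failure
myType *myVar = get_cpu_ptr(ptr);  // turns preemption off
// Critical section with preemption off.
put_cpu_ptr(ptr);                  // turns preemption back on

// Accesses the copy of another CPU without changing preemption:
myType *other = per_cpu_ptr(ptr, cpu);

free_percpu(ptr);  // when the data is no longer needed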


