A Legacy Bug and the Linux Memory-Management 'Bloodbath' It Caused
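I recently dealt with a truly ancient legacy problem: memory belonging to the DPI kernel module was being overwritten for no apparent reason, which caused a kernel panic. The kernel is Linux 3.18 on x86_64. Because we ship a stripped-down system, many debugging facilities have been removed: SLAB_DEBUG is disabled and KASAN is not supported. The data in question is mostly lookup data that is never modified after initialization, so I came up with a plan: once DPI has finished initializing, mark the memory pages it uses as read-only, then use the stack trace from the resulting fault to find the culprit.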
Based on the analysis above, the work breaks down into the following steps:
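Following that plan, I extracted the relevant code from the production module and wrote a very simple test case, assuming that would be the end of it. Reality had other ideas, and things did not go as expected. Running insmod test.ko several times produced the following output: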
    [ 659.486243] lookup_address ffff8800692e4000
    [ 659.486248] level not 4K 2
    [ 660.142577] lookup_address ffff880046436000
    [ 660.142582] level not 4K 2
    [ 660.530890] lookup_address ffff8800461a0000
    [ 660.530896] level not 4K 2
    [ 660.873884] lookup_address ffff88012369a000
    [ 660.873889] level not 4K 2

Why is the level not PG_LEVEL_4K? We allocated a single page, yet the mapping level is PG_LEVEL_2M. Changing the attributes of that entry would make the whole 2M region read-only. To get to the bottom of this, we have to walk through how memory management is initialized:
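To make the "level 2" in the log concrete, here is a minimal userspace sketch of how a mapping level translates into the range of addresses that share one page-table entry. The enum ordering mirrors the kernel's enum pg_level on x86; the helpers level_size(), affected_start() and affected_end() are hypothetical names introduced only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same ordering as the kernel's enum pg_level on x86, so 2 == PG_LEVEL_2M. */
enum pg_level { PG_LEVEL_NONE, PG_LEVEL_4K, PG_LEVEL_2M, PG_LEVEL_1G };

/* Bytes mapped by a single page-table entry at the given level. */
uint64_t level_size(enum pg_level level)
{
    switch (level) {
    case PG_LEVEL_4K: return 1ULL << 12;
    case PG_LEVEL_2M: return 1ULL << 21;
    case PG_LEVEL_1G: return 1ULL << 30;
    default:          return 0;
    }
}

/* Hypothetical helper: first address covered by the entry that maps addr. */
uint64_t affected_start(uint64_t addr, enum pg_level level)
{
    uint64_t size = level_size(level);

    assert(size != 0); /* PG_LEVEL_NONE means nothing is mapped */
    return addr & ~(size - 1);
}

/* Hypothetical helper: one past the last address covered by that entry. */
uint64_t affected_end(uint64_t addr, enum pg_level level)
{
    return affected_start(addr, level) + level_size(level);
}
```

For the first address in the log, ffff8800692e4000 at level 2, the entry covers ffff880069200000 up to ffff880069400000, so clearing the write bit on it would affect 512 pages instead of the single page that was allocated.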
    start_kernel()
        |---->setup_arch(&command_line);
            |---->init_mem_mapping();
                |---->memory_map_top_down();
                    |---->init_range_memory_mapping();
                        |---->init_memory_mapping();

    /*
     * Setup the direct mapping of the physical memory at PAGE_OFFSET.
     * This runs before bootmem is initialized and gets pages directly from
     * the physical memory. To access them they are temporarily mapped.
     */
    unsigned long __init_refok init_memory_mapping(unsigned long start,
                                                   unsigned long end)
    {
        struct map_range mr[NR_RANGE_MR];
        unsigned long ret = 0;
        int nr_range, i;

        pr_info("init_memory_mapping: [mem %#010lx-%#010lx]\n",
                start, end - 1);

        memset(mr, 0, sizeof(mr));
        nr_range = split_mem_range(mr, 0, start, end);

        for (i = 0; i < nr_range; i++)
            ret = kernel_physical_mapping_init(mr[i].start, mr[i].end,
                                               mr[i].page_size_mask);

        add_pfn_range_mapped(start >> PAGE_SHIFT, ret >> PAGE_SHIFT);

        return ret >> PAGE_SHIFT;
    }

    static int __meminit split_mem_range(struct map_range *mr, int nr_range,
                                         unsigned long start,
                                         unsigned long end)
    {
        ...... /* part of the code omitted */

        /* big page (2M) range */
        start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
    #ifdef CONFIG_X86_32
        end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
    #else /* CONFIG_X86_64 */
        end_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
        if (end_pfn > round_down(limit_pfn, PFN_DOWN(PMD_SIZE)))
            end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
    #endif

        if (start_pfn < end_pfn) {
            nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
                               page_size_mask & (1<<PG_LEVEL_2M));
            pfn = end_pfn;
        }

    #ifdef CONFIG_X86_64
        /* big page (1G) range */
        start_pfn = round_up(pfn, PFN_DOWN(PUD_SIZE));
        end_pfn = round_down(limit_pfn, PFN_DOWN(PUD_SIZE));
        if (start_pfn < end_pfn) {
            nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
                               page_size_mask &
                               ((1<<PG_LEVEL_2M)|(1<<PG_LEVEL_1G)));
            pfn = end_pfn;
        }

        /* tail is not big page (1G) alignment */
        start_pfn = round_up(pfn, PFN_DOWN(PMD_SIZE));
        end_pfn = round_down(limit_pfn, PFN_DOWN(PMD_SIZE));
        if (start_pfn < end_pfn) {
            nr_range = save_mr(mr, nr_range, start_pfn, end_pfn,
                               page_size_mask & (1<<PG_LEVEL_2M));
            pfn = end_pfn;
        }
    #endif

        ...... /* part of the code omitted */
    }

As split_mem_range() shows, when the kernel builds the direct mapping of physical memory it uses huge pages wherever it can. That explains why the page we allocated is mapped at PG_LEVEL_2M; in principle PG_LEVEL_1G mappings can show up as well. With the cause identified, how do we fix it? BPF came to mind: it injects bytecode into the kernel and, for safety, marks the memory holding that code read-only, so it must have run into the same problem. RTFSC:
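The splitting logic is easier to follow with concrete numbers, so here is a simplified userspace model of it for x86_64, working in page frame numbers. It is a sketch under assumptions, not the kernel code: split_pfn_range(), struct range and enum map_size are hypothetical names, the 4K head and tail handling fills in parts that the excerpt omits based on my reading of the kernel, both 2M and 1G pages are assumed to be enabled in page_size_mask, and the kernel's later merging of adjacent ranges is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGES_PER_2M 512ULL            /* PFN_DOWN(PMD_SIZE) */
#define PAGES_PER_1G (512ULL * 512ULL) /* PFN_DOWN(PUD_SIZE) */
#define MAX_RANGES   5

enum map_size { MAP_4K, MAP_2M, MAP_1G };

struct range {
    uint64_t start_pfn, end_pfn; /* half-open: [start_pfn, end_pfn) */
    enum map_size size;
};

uint64_t pfn_round_up(uint64_t x, uint64_t a)   { return (x + a - 1) / a * a; }
uint64_t pfn_round_down(uint64_t x, uint64_t a) { return x / a * a; }
uint64_t pfn_min(uint64_t a, uint64_t b)        { return a < b ? a : b; }

/* Append [s, e) if it is non-empty; returns the new number of ranges. */
size_t add_range(struct range *out, size_t n, uint64_t s, uint64_t e,
                 enum map_size size)
{
    if (s >= e)
        return n;
    assert(n < MAX_RANGES);
    out[n].start_pfn = s;
    out[n].end_pfn = e;
    out[n].size = size;
    return n + 1;
}

/* Split [pfn, limit) into 4K head, 2M, 1G, 2M tail and 4K tail pieces. */
size_t split_pfn_range(uint64_t pfn, uint64_t limit, struct range *out)
{
    size_t n = 0, prev;
    uint64_t s, e;

    /* 4K head up to the first 2M boundary (assumed from the omitted code). */
    e = pfn_min(pfn_round_up(pfn, PAGES_PER_2M), limit);
    prev = n; n = add_range(out, n, pfn, e, MAP_4K);
    if (n != prev) pfn = e;

    /* 2M pages up to the first 1G boundary. */
    s = pfn_round_up(pfn, PAGES_PER_2M);
    e = pfn_min(pfn_round_up(pfn, PAGES_PER_1G),
                pfn_round_down(limit, PAGES_PER_2M));
    prev = n; n = add_range(out, n, s, e, MAP_2M);
    if (n != prev) pfn = e;

    /* 1G pages for the 1G-aligned middle. */
    s = pfn_round_up(pfn, PAGES_PER_1G);
    e = pfn_round_down(limit, PAGES_PER_1G);
    prev = n; n = add_range(out, n, s, e, MAP_1G);
    if (n != prev) pfn = e;

    /* 2M pages for the tail that is not 1G aligned. */
    s = pfn_round_up(pfn, PAGES_PER_2M);
    e = pfn_round_down(limit, PAGES_PER_2M);
    prev = n; n = add_range(out, n, s, e, MAP_2M);
    if (n != prev) pfn = e;

    /* 4K tail for whatever is left (assumed from the omitted code). */
    return add_range(out, n, pfn, limit, MAP_4K);
}
```

Calling split_pfn_range(0x100, 0x40200, out), that is physical memory from 1 MiB to 1 GiB + 2 MiB, yields three pieces: a 4K head, then two 2M ranges. Only the first 1 MiB of that span is mapped with 4K pages, which is why an arbitrary page from the allocator is very likely to sit inside a large mapping.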
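    sys_bpf()
        |---->bpf_prog_load()
            |---->bpf_prog_select_runtime()
                |---->bpf_int_jit_compile()
                    |---->set_memory_ro()
                        |---->change_page_attr_clear()
                            |---->__change_page_attr_set_clr()
                                |---->__change_page_attr()
                                    |---->lookup_address_cpa()
                                    |---->split_large_page() /* ! PG_LEVEL_4K */

The call flow above shows that the bpf() system call eventually reaches split_large_page(), which handles the case where the target memory is mapped by a large page. The x86 platform wraps all of this in a family of functions, so we changed our implementation to use set_memory_ro(). We had thought ourselves clever by modifying the PTE attributes directly, and fell straight into the pit.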
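The idea behind splitting can be illustrated with a toy userspace model: one 2M entry is replaced by 512 4K entries that inherit its flags, and only then is the write bit cleared on the single page of interest. This is a conceptual sketch only, not the kernel's split_large_page(); struct pmd_model and all three functions are hypothetical, and the model ignores TLB flushing, locking and page-table allocation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ENTRIES  512          /* 4K pages per 2M mapping */
#define SIZE_4K  4096ULL
#define FLAG_RW  0x2ULL       /* same bit position as _PAGE_RW on x86 */

/* Toy stand-in for one PMD slot and the PTE table it may point to. */
struct pmd_model {
    bool     large;           /* true: a single 2M mapping */
    uint64_t phys;            /* 2M-aligned physical base address */
    uint64_t flags;           /* flags of the 2M mapping */
    uint64_t pte[ENTRIES];    /* valid once large == false */
};

/* Replace the 2M mapping with 512 4K mappings that inherit its flags. */
void split_large(struct pmd_model *pmd)
{
    int i;

    if (!pmd->large)
        return;
    for (i = 0; i < ENTRIES; i++)
        pmd->pte[i] = (pmd->phys + (uint64_t)i * SIZE_4K) | pmd->flags;
    pmd->large = false;
}

/* Make exactly one 4K page read-only, splitting first if needed. */
void set_ro_4k(struct pmd_model *pmd, unsigned int idx)
{
    assert(idx < ENTRIES);
    split_large(pmd);
    pmd->pte[idx] &= ~FLAG_RW;
}

/* Report whether the 4K page at idx is writable in the model. */
bool page_writable(const struct pmd_model *pmd, unsigned int idx)
{
    assert(idx < ENTRIES);
    if (pmd->large)
        return (pmd->flags & FLAG_RW) != 0;
    return (pmd->pte[idx] & FLAG_RW) != 0;
}
```

With a model entry based at 0x69200000, page index 0xe4 corresponds to the offset of the first address in the earlier log. After set_ro_4k(&pmd, 0xe4) only that page loses its write bit, while its neighbours stay writable.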
    /*
     * The set_memory_* API can be used to change various attributes of a virtual
     * address range. The attributes include:
     * Cachability   : UnCached, WriteCombining, WriteBack
     * Executability : eXeutable, NoteXecutable
     * Read/Write    : ReadOnly, ReadWrite
     * Presence      : NotPresent
     */
    int set_memory_uc(unsigned long addr, int numpages);
    int set_memory_wc(unsigned long addr, int numpages);
    int set_memory_wb(unsigned long addr, int numpages);
    int set_memory_x(unsigned long addr, int numpages);
    int set_memory_nx(unsigned long addr, int numpages);
    int set_memory_ro(unsigned long addr, int numpages);
    int set_memory_rw(unsigned long addr, int numpages);
    int set_memory_np(unsigned long addr, int numpages);
    int set_memory_4k(unsigned long addr, int numpages);

The road of learning never ends, especially when it comes to the kernel. RTFSC!
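Since every function in this family takes a start address plus a page count, a buffer given as (start, length) has to be converted first. The following userspace sketch shows that arithmetic, assuming 4K base pages; struct page_span and span_for() are hypothetical names, and the kernel call appears only in a comment because it cannot run outside a module.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_PAGE_SHIFT 12
#define DEMO_PAGE_SIZE  (1ULL << DEMO_PAGE_SHIFT)
#define DEMO_PAGE_MASK  (~(DEMO_PAGE_SIZE - 1))

/* Arguments in the shape the set_memory_* functions expect. */
struct page_span {
    uint64_t addr;     /* page-aligned start address */
    int      numpages; /* number of 4K pages covering the buffer */
};

/* Hypothetical helper: pages needed to cover [start, start + len). */
struct page_span span_for(uint64_t start, uint64_t len)
{
    struct page_span s;
    uint64_t end;

    assert(len > 0);
    end = (start + len + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK;
    s.addr = start & DEMO_PAGE_MASK;
    s.numpages = (int)((end - s.addr) >> DEMO_PAGE_SHIFT);
    return s;
}

/*
 * Inside a kernel module the result would then be used roughly as:
 *     set_memory_ro(span.addr, span.numpages);
 * and undone with set_memory_rw() using the same arguments.
 */
```

Protection works at page granularity, so anything else that shares those pages becomes read-only too. A buffer that starts 0x10 bytes into a page and is 4096 bytes long spans two pages, and both would be write-protected.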