Page cache and free
We often use free to check a server's memory usage, but its output can be confusing, as follows:
[Figure: sample output of the free command]
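First, let's look at what each number means and how it is computed:
The second line of free's output (Mem): this line shows the total physical memory (total), used memory (used), free memory (free), shared memory (shared), buffers (size of the buffers), and cached (size of the cache). The total, free, buffers, and cached fields are read from /proc/meminfo, and used = total - free. The shared column is obsolete and can be ignored.
The third line of free's output (-/+ buffers/cache):
Its first value (used), 210236, is the total memory used by the system itself, that is, excluding buffers and cache. It equals Mem used - Mem buffers - Mem cached.
Its second value (free), 814956, is the memory currently available to the system. It equals Mem total - buffers/cache used, which is the same as Mem free + Mem buffers + Mem cached.
The fourth line of free's output (Swap): this line shows the total, used, and free swap space.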
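The arithmetic for the -/+ buffers/cache row can be sketched in a few lines of C. The struct and function names below are illustrative, not taken from the free source code, and every value is in kB.

```c
/* One "Mem:" row of free, in kB. Field names are illustrative. */
struct mem_row {
    unsigned long total;
    unsigned long used;     /* total - free */
    unsigned long free;
    unsigned long buffers;
    unsigned long cached;
};

/* The derived "-/+ buffers/cache" row, in kB. */
struct adjusted_row {
    unsigned long used;     /* used, excluding buffers and cache */
    unsigned long free;     /* free, plus buffers and cache */
};

/* Apply the two formulas described for the third line of free. */
struct adjusted_row adjust_for_cache(struct mem_row m)
{
    struct adjusted_row a;

    a.used = m.used - m.buffers - m.cached;
    a.free = m.free + m.buffers + m.cached;
    return a;
}
```

With a hypothetical Mem row of total 1025192, used 725192, free 300000, buffers 100000, and cached 414956, this yields used 210236 and free 814956, the two values quoted above. Only those two results come from the original output; the breakdown of the Mem row is assumed for illustration.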
As noted, free reads its data from /proc/meminfo.
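As a rough illustration of how a userspace tool could pick one field out of that file, here is a minimal sketch that looks up a key in a /proc/meminfo-style text buffer. The function name and signature are hypothetical and do not reflect how free itself is implemented.

```c
#include <stdio.h>
#include <string.h>

/*
 * Look up one "Key:   value kB" line in a /proc/meminfo-style text buffer.
 * Returns 1 and stores the value (in kB) on success, 0 if the key is absent
 * or has no number after it. Illustrative helper, not part of procps.
 */
int meminfo_lookup(const char *text, const char *key, unsigned long *value_kb)
{
    size_t key_len = strlen(key);
    const char *line = text;

    while (line && *line) {
        /* The key must sit at the start of a line and be followed by ':'. */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':')
            return sscanf(line + key_len + 1, "%lu", value_kb) == 1;

        line = strchr(line, '\n');
        if (line)
            line++;     /* step past the newline to the next line */
    }
    return 0;
}
```

Matching only at the start of a line keeps "Cached" from being confused with "SwapCached". In practice the buffer would be filled by reading /proc/meminfo with fopen and fread.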
Below is the kernel implementation of /proc/meminfo:
static int meminfo_read_proc(char *page, char **start, off_t off,
                             int count, int *eof, void *data)
{
    struct sysinfo i;
    int len;
    unsigned long committed;
    unsigned long allowed;
    struct vmalloc_info vmi;
    long cached;

    /*
     * display in kilobytes.
     */
#define K(x) ((x) << (PAGE_SHIFT - 10))
    si_meminfo(&i);
    si_swapinfo(&i);
    committed = atomic_read(&vm_committed_space);
    allowed = ((totalram_pages - hugetlb_total_pages())
        * sysctl_overcommit_ratio / 100) + total_swap_pages;

    cached = global_page_state(NR_FILE_PAGES) -
            total_swapcache_pages - i.bufferram;
    if (cached < 0)
        cached = 0;

    get_vmalloc_info(&vmi);

    /*
     * Tagged format, for easy grepping and expansion.
     */
    len = sprintf(page,
        "MemTotal: %8lu kB\n"
        "MemFree: %8lu kB\n"
        "Buffers: %8lu kB\n"
        "Cached: %8lu kB\n"
        "SwapCached: %8lu kB\n"
        ......
        K(i.totalram),
        K(i.freeram),
        K(i.bufferram),
        K(cached),
        K(total_swapcache_pages),
        ......
#undef K
}
struct sysinfo {
    long uptime;                /* Seconds since boot */
    unsigned long loads[3];     /* 1, 5, and 15 minute load averages */
    unsigned long totalram;     /* Total usable main memory size */
    unsigned long freeram;      /* Available memory size */
    unsigned long sharedram;    /* Amount of shared memory */
    unsigned long bufferram;    /* Memory used by buffers */
    unsigned long totalswap;    /* Total swap space size */
    unsigned long freeswap;     /* swap space still available */
    unsigned short procs;       /* Number of current processes */
    unsigned short pad;         /* explicit padding for m68k */
    unsigned long totalhigh;    /* Total high memory size */
    unsigned long freehigh;     /* Available high memory size */
    unsigned int mem_unit;      /* Memory unit size in bytes */
    char _f[20-2*sizeof(long)-sizeof(int)]; /* Padding: libc5 uses this.. */
};
In the free output, Buffers corresponds to sysinfo.bufferram. The kernel counts it in page frames, and the macro K converts it to KB for output.
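To make the conversion concrete, here is a small sketch of the same shift that K performs. It assumes 4 KiB pages (PAGE_SHIFT of 12), which is common but architecture dependent; the constant and function name are illustrative.

```c
/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12. Other architectures differ. */
#define EXAMPLE_PAGE_SHIFT 12

/* Same arithmetic as the kernel's K(x): a page count converted to kB. */
unsigned long pages_to_kb(unsigned long pages)
{
    /* One page is 2^PAGE_SHIFT bytes, and 1 kB is 2^10 bytes. */
    return pages << (EXAMPLE_PAGE_SHIFT - 10);
}
```

Under that assumption one page is 4 kB, so 25000 buffer pages would be reported as 100000 kB.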
void si_meminfo(struct sysinfo *val)
{
    val->totalram = totalram_pages;                     /* total ram pages */
    val->sharedram = 0;
    val->freeram = global_page_state(NR_FREE_PAGES);    /* free mem pages */
    val->bufferram = nr_blockdev_pages();               /* block devices used pages */
    val->totalhigh = totalhigh_pages;
    val->freehigh = nr_free_highpages();
    val->mem_unit = PAGE_SIZE;
}

long nr_blockdev_pages(void)
{
    struct block_device *bdev;
    long ret = 0;

    spin_lock(&bdev_lock);
    list_for_each_entry(bdev, &all_bdevs, bd_list) {
        ret += bdev->bd_inode->i_mapping->nrpages;
    }
    spin_unlock(&bdev_lock);
    return ret;
}
nr_blockdev_pages computes the number of page frames used by block devices: it walks every block device and adds up the page frames each one uses. Page frames used by regular files are not included.
cached = global_page_state(NR_FILE_PAGES) - total_swapcache_pages - i.bufferram;
static inline unsigned long global_page_state(enum zone_stat_item item)
{
    long x = atomic_long_read(&vm_stat[item]);
#ifdef CONFIG_SMP
    if (x < 0)
        x = 0;
#endif
    return x;
}
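The size of Cached is the kernel's total page cache minus the swap cache and the page frames occupied by block devices. In other words, Cached is the page cache occupied by regular files. In earlier kernels, both add_to_page_cache and __add_to_swap_cache call pagecache_acct to increment the kernel variable nr_pagecache. The former serves the page cache, which the kernel uses when reading block devices and regular files; the latter serves the swap cache, which the kernel uses when reading the swap partition.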
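The Cached calculation, including the clamp to zero seen in meminfo_read_proc, can be written as a standalone function. This is a userspace sketch with an illustrative name; the inputs are page counts that the kernel would obtain from its own counters.

```c
/*
 * Cached = all file-backed pages - swap cache pages - block device pages,
 * clamped at zero because the counters are sampled without a common lock.
 */
long cached_pages(long file_pages, long swapcache_pages, long blockdev_pages)
{
    long cached = file_pages - swapcache_pages - blockdev_pages;

    return cached < 0 ? 0 : cached;
}
```

For example, with hypothetical counts of 130000 file pages, 1000 swap cache pages, and 25000 block device pages, 104000 pages would be reported as Cached.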
Page cache
To speed up file reads and writes, the Linux kernel provides the page cache. To speed up reads and writes of block devices, the kernel also provides the buffer cache. In the 2.4 kernel the two were separate, which caused double buffering, because file I/O is ultimately turned into block device I/O. In 2.6 the buffer cache was merged into the page cache, and the corresponding pages are called buffer pages.
When a file is read or written, if its blocks are contiguous on disk, its pages in the page cache are ordinary pages. If its blocks are not contiguous on disk, or if it is a device file, its pages in the page cache are buffer pages. Compared with an ordinary page, a buffer page carries several buffer_head structures (how many depends on the block size). In addition, when an individual block such as the superblock is read or written directly, the corresponding page in the page cache is also a buffer page. The two kinds of pages differ slightly in form, but in both cases the data is eventually wrapped in bio structures, submitted to the generic block layer, and scheduled for I/O in the same way.
/*
 * Block buffer head descriptor
 */
struct buffer_head {
    /* buffer state bitmap, e.g. BH_Uptodate */
    unsigned long b_state;
    /* next buffer in the same page (circular list of the page's buffers) */
    struct buffer_head *b_this_page;
    /* the page this bh is mapped to; NULL if the buffer is independent of the page cache */
    struct page *b_page;
    /* start block number */
    sector_t b_blocknr;
    /* block length (size of mapping) */
    size_t b_size;
    /* pointer to data within the page */
    char *b_data;
    /* backing device */
    struct block_device *b_bdev;
    /* I/O completion callback, invoked by the kernel when the I/O finishes */
    bh_end_io_t *b_end_io;
    /* reserved for b_end_io; typically used by journaling file systems */
    void *b_private;
    /* associated with another mapping */
    struct list_head b_assoc_buffers;
    /* address space (mapping) this buffer is associated with */
    struct address_space *b_assoc_map;
    /* reference count: users using this buffer_head */
    atomic_t b_count;
};
Since kernel 2.6, buffer_head has little other purpose: it mainly maintains the mapping between page frames and data blocks on the block device.
Buffer page
Whenever the kernel needs to access an individual block, a buffer page is involved and the corresponding buffer head is checked.
The kernel creates buffer pages in two common cases:
(1) When the data blocks of a file page being read or written are not adjacent. This happens because the file system allocated non-contiguous blocks to the file, or because the file has holes. See the block_read_full_page function (fs/buffer.c):
/*
 * Read a full page from the block device
 */
int block_read_full_page(struct page *page, get_block_t *get_block)
{
    struct inode *inode = page->mapping->host;
    sector_t iblock, lblock;
    struct buffer_head *bh, *head, *arr[MAX_BUF_PER_PAGE];
    unsigned int blocksize;
    int nr, i;
    int fully_mapped = 1;

    BUG_ON(!PageLocked(page));
    blocksize = 1 << inode->i_blkbits;
    if (!page_has_buffers(page))    /* no buffers yet: create empty ones */
        create_empty_buffers(page, blocksize, 0);
    /* first buffer attached to the page */
    head = page_buffers(page);

    /* compute the block numbers to read */
    iblock = (sector_t)page->index << (PAGE_CACHE_SHIFT - inode->i_blkbits);
    lblock = (i_size_read(inode)+blocksize-1) >> inode->i_blkbits;
    bh = head;
    nr = 0;
    i = 0;

    /* walk all buffers of the page */
    do {
        if (buffer_uptodate(bh))    /* buffer already matches the device, nothing to do */
            continue;

        if (!buffer_mapped(bh)) {   /* not mapped yet */
            int err = 0;

            fully_mapped = 0;
            if (iblock < lblock) {  /* block lies within the file */
                WARN_ON(bh->b_size != blocksize);
                /* look up the logical block's location on disk */
                err = get_block(inode, iblock, bh, 0);
                if (err)
                    SetPageError(page);
            }
            if (!buffer_mapped(bh)) {   /* the block is a hole: zero-fill it */
                zero_user_page(page, i * blocksize, blocksize,
                        KM_USER0);
                if (!err)
                    set_buffer_uptodate(bh);
                continue;
            }
            /*
             * get_block() might have updated the buffer
             * synchronously
             */
            if (buffer_uptodate(bh))    /* get_block updated the buffer, move to the next block */
                continue;
        }
        /* buffer is mapped but stale: queue it in the temporary array */
        arr[nr++] = bh;
    } while (i++, iblock++, (bh = bh->b_this_page) != head);

    if (fully_mapped)
        SetPageMappedToDisk(page);

    if (!nr) {  /* all buffers are up to date */
        /*
         * All buffers are uptodate - we can set the page uptodate
         * as well. But not if get_block() returned an error.
         */
        if (!PageError(page))   /* set the page's uptodate flag, then return */
            SetPageUptodate(page);
        unlock_page(page);
        return 0;
    }

    /* Stage two: lock the buffers */
    for (i = 0; i < nr; i++) {  /* lock the buffers */
        bh = arr[i];
        lock_buffer(bh);
        mark_buffer_async_read(bh);
    }

    /*
     * Stage 3: start the IO. Check for uptodateness
     * inside the buffer lock in case another process reading
     * the underlying blockdev brought it uptodate (the sct fix).
     */
    for (i = 0; i < nr; i++) {  /* walk the buffers in the page that need updating */
        bh = arr[i];
        if (buffer_uptodate(bh))    /* another process read the data before we took the lock */
            end_buffer_async_read(bh, 1);
        else
            submit_bh(READ, bh);    /* submit the I/O request */
    }
    return 0;
}
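The buffer heads are used here mainly to establish the mapping between the page frame and its data blocks, because the page's blocks are not contiguous on disk and the fields of the page descriptor, struct page, cannot express that information.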
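The two block-number computations at the top of block_read_full_page (iblock and lblock) are plain shifts and can be tried in userspace. The sketch below assumes 4 KiB pages; the constant and function names are illustrative and do not exist in the kernel.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, standing in for PAGE_CACHE_SHIFT. */
#define EX_PAGE_SHIFT 12

/* Number of the first file block held by page `page_index` (iblock). */
uint64_t first_block_of_page(uint64_t page_index, unsigned blkbits)
{
    return page_index << (EX_PAGE_SHIFT - blkbits);
}

/* Number of blocks needed to hold `i_size` bytes, rounded up (lblock). */
uint64_t blocks_in_file(uint64_t i_size, unsigned blkbits)
{
    uint64_t blocksize = (uint64_t)1 << blkbits;

    return (i_size + blocksize - 1) >> blkbits;
}
```

With 1 KiB blocks (blkbits of 10), page 3 starts at block 12, and a 5000-byte file spans 5 blocks. Any block in a page whose number is not below that count lies past the end of the file, which is why the kernel function skips get_block for it and zero-fills the buffer.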
This function calls create_empty_buffers to create a fresh set of buffers and associate them with the page:
/*
 * Create a fresh set of buffers to be associated with the page
 */
void create_empty_buffers(struct page *page,
            unsigned long blocksize, unsigned long b_state)
{
    struct buffer_head *bh, *head, *tail;

    /* allocate the required number of buffer heads as a linked list; returns the first one */
    head = alloc_page_buffers(page, blocksize, 1);
    /* set the state of every buffer head and close the list into a ring */
    bh = head;
    do {
        bh->b_state |= b_state;
        tail = bh;
        bh = bh->b_this_page;
    } while (bh);
    tail->b_this_page = head;

    /* set the buffers' state according to the page's state */
    spin_lock(&page->mapping->private_lock);
    if (PageUptodate(page) || PageDirty(page)) {
        bh = head;
        do {    /* update the state of each buffer head */
            if (PageDirty(page))
                set_buffer_dirty(bh);
            if (PageUptodate(page))
                set_buffer_uptodate(bh);
            bh = bh->b_this_page;
        } while (bh != head);
    }
    /* attach the buffers to the page */
    attach_page_buffers(page, head);
    spin_unlock(&page->mapping->private_lock);
}
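The step in create_empty_buffers that closes the list into a ring can be reproduced outside the kernel. In this minimal sketch, struct ring_node is a hypothetical stand-in for buffer_head and its next field plays the role of b_this_page.

```c
#include <stddef.h>

/* Hypothetical stand-in for buffer_head; `next` plays the role of b_this_page. */
struct ring_node {
    int id;
    struct ring_node *next;
};

/* Turn a NULL-terminated list into a ring. `head` must not be NULL. */
void close_ring(struct ring_node *head)
{
    struct ring_node *node = head;
    struct ring_node *tail;

    /* Walk to the last node, remembering it as the tail. */
    do {
        tail = node;
        node = node->next;
    } while (node);

    /* Link the tail back to the head, as the kernel code does. */
    tail->next = head;
}

/* Count the nodes in a ring by walking until we are back at `head`. */
size_t ring_length(const struct ring_node *head)
{
    const struct ring_node *node = head;
    size_t n = 0;

    do {
        n++;
        node = node->next;
    } while (node != head);
    return n;
}
```

After close_ring, a walk can start at any node and stop when it reaches that node again, which is the loop condition block_read_full_page uses.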
create_empty_buffers calls alloc_page_buffers to create a linked list of buffer heads, which at this point is not yet circular:
struct buffer_head *alloc_page_buffers(struct page *page, unsigned long size,
        int retry)
{
    struct buffer_head *bh, *head;
    long offset;

try_again:
    head = NULL;
    offset = PAGE_SIZE;
    while ((offset -= size) >= 0) {
        bh = alloc_buffer_head(GFP_NOFS);
        if (!bh)
            goto no_grow;

        bh->b_bdev = NULL;
        bh->b_this_page = head;
        bh->b_blocknr = -1;
        head = bh;

        bh->b_state = 0;
        atomic_set(&bh->b_count, 0);
        bh->b_private = NULL;
        bh->b_size = size;

        /* Link the buffer to its page */
        set_bh_page(bh, page, offset);

        init_buffer(bh, NULL, NULL);
    }
    return head;
    ......
}
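The loop in alloc_page_buffers walks the page from its highest offset down to zero and pushes each new buffer head at the front of the list, so the returned head ends up describing offset 0. The following userspace sketch reproduces only that ordering; the function name and the array-based output are assumptions made for illustration.

```c
#include <stddef.h>

/*
 * Fill offsets[] with the in-page offsets of the buffers in list order
 * (head first) for a page of page_size bytes split into block_size-byte
 * buffers. Writes at most `max` entries and returns how many were written.
 */
size_t buffer_offsets(long page_size, long block_size, long *offsets, size_t max)
{
    size_t count = 0;
    size_t i;
    long offset = page_size;

    /* Same loop shape as alloc_page_buffers: the highest offset comes first. */
    while ((offset -= block_size) >= 0 && count < max)
        offsets[count++] = offset;

    /* Each new head is pushed at the front, so list order is the reverse. */
    for (i = 0; i < count / 2; i++) {
        long tmp = offsets[i];

        offsets[i] = offsets[count - 1 - i];
        offsets[count - 1 - i] = tmp;
    }
    return count;
}
```

For a 4096-byte page and 1024-byte blocks this gives four buffers at offsets 0, 1024, 2048, and 3072; with 4096-byte blocks there is a single buffer at offset 0.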
alloc_page_buffers calls set_bh_page to set b_data:
void set_bh_page(struct buffer_head *bh,
        struct page *page, unsigned long offset)
{
    bh->b_page = page;
    BUG_ON(offset >= PAGE_SIZE);
    if (PageHighMem(page))
        /*
         * This catches illegal uses and preserves the offset:
         */
        bh->b_data = (char *)(0 + offset);
    else
        bh->b_data = page_address(page) + offset;
}
(2) When accessing an individual disk block, for example when reading the superblock or an inode block. See ext2_fill_super (fs/ext2/super.c), which is called when an ext2 file system is mounted.
[Figure: relationship between a buffer page and its buffer heads]
In summary, for a regular file, a page whose blocks are contiguous has no buffer heads, while a page whose blocks are not contiguous does; see the do_mpage_readpage function. For a block device, the page always has buffer heads, whether an individual data block is read or the device is read as a device file; see the block_read_full_page and __bread functions.
Reposted from: https://www.cnblogs.com/chenliyang/p/6543165.html