In high-speed NIC drivers, the usual way to achieve high performance is to carve memory directly out of the Buddy allocator in units of 2^N pages (hugepages also come from Buddy; they merely use large page-table entries when mapped to userspace, so we will not distinguish them here), then IO-map it to the NIC and organize it into multiple queues using RingBuffer arrays or DMA chains. The X710 is a fairly high-end NIC, so it should have these capabilities, and its implementation cannot stray far from these basic methods. The main function for allocating memory from Buddy is __get_free_pages (different kernel versions add macros and variants, but they are all much the same: memory is always allocated as 2^N pages, with N passed in as the order parameter).
Quick analysis:
From the output below we can quickly infer that the receive path does allocate pages straight from Buddy, although through a "variant" wrapper, alloc_pages_node (which must ultimately call into the Buddy allocator, since it takes an order parameter; we won't go into further detail here).
$ grep -rHn pages
src/Makefile:107: @echo "Copying manpages..."
src/kcompat.h:5180:#ifndef dev_alloc_pages
src/kcompat.h:5181:#define dev_alloc_pages(_order) alloc_pages_node(NUMA_NO_NODE, (GFP_ATOMIC | __GFP_COLD | __GFP_COMP | __GFP_MEMALLOC), (_order))
src/kcompat.h:5184:#define dev_alloc_page() dev_alloc_pages(0)
src/kcompat.h:5620: __free_pages(page, compound_order(page));
src/i40e_txrx.c:1469: page = dev_alloc_pages(i40e_rx_pg_order(rx_ring));
src/i40e_txrx.c:1485: __free_pages(page, i40e_rx_pg_order(rx_ring));
src/i40e_txrx.c:1858: * Also address the case where we are pulling data in on pages only
src/i40e_txrx.c:1942: * For small pages, @truesize will be a constant value, half the size
src/i40e_txrx.c:1951: * For larger pages, @truesize will be the actual space used by the
src/i40e_txrx.c:1955: * space for a buffer. Each region of larger pages will be used at
src/i40e_lan_hmc.c:295: * This will allocate memory for PDs and backing pages and populate
src/i40e_lan_hmc.c:394: /* remove the backing pages from pd_idx1 to i */
// src/i40e_txrx.c
/**
 * i40e_alloc_mapped_page - recycle or make a new page
 * @rx_ring: ring to use
 * @bi: rx_buffer struct to modify
 *
 * Returns true if the page was successfully allocated or
 * reused.
 **/
static bool i40e_alloc_mapped_page(struct i40e_ring *rx_ring,
				   struct i40e_rx_buffer *bi)
{
	struct page *page = bi->page;
	dma_addr_t dma;

	/* since we are recycling buffers we should seldom need to alloc */
	if (likely(page)) {
		rx_ring->rx_stats.page_reuse_count++;
		return true;
	}

	/* alloc new page for storage */
	page = dev_alloc_pages(i40e_rx_pg_order(rx_ring));
	if (unlikely(!page)) {
		rx_ring->rx_stats.alloc_page_failed++;
		return false;
	}

	/* map page for use */
	dma = dma_map_page_attrs(rx_ring->dev, page, 0,
				 i40e_rx_pg_size(rx_ring),
				 DMA_FROM_DEVICE,
				 I40E_RX_DMA_ATTR);

	/* if mapping failed free memory back to system since
	 * there isn't much point in holding memory we can't use
	 */
	if (dma_mapping_error(rx_ring->dev, dma)) {
		__free_pages(page, i40e_rx_pg_order(rx_ring));
		rx_ring->rx_stats.alloc_page_failed++;
		return false;
	}

	bi->dma = dma;
	bi->page = page;
	bi->page_offset = i40e_rx_offset(rx_ring);

	/* initialize pagecnt_bias to 1 representing we fully own page */
	bi->pagecnt_bias = 1;

	return true;
}
Past experience says that to go any further we would need the NIC's Data Sheet and Programming Guide, and Intel clearly has no plans to release them any time soon. A careful analysis would be hard, probably a month or more of work, so this line of investigation stops here for now. In any case, nothing so far contradicts the earlier memory-analysis conclusions.
https://sourceforge.net/projects/e1000/files/i40e%20stable/2.3.6/

Changelog for i40e-linux-2.3.6
===========================================================================
- Fix mac filter removal timing issue
- Sync i40e_ethtool.c with upstream
- Fixes for TX hangs
- Some fixes for reset of VFs
- Fix build error with packet split disabled
- Fix memory leak related to filter programming status
- Add and modify branding strings
- Fix kdump failure
- Implement an ethtool private flag to stop LLDP in FW
- Add delay after EMP reset for firmware to recover
- Fix incorrect default ITR values on driver load
- Fixes for programming cloud filters
- Some performance improvements
- Enable XPS with QoS on newer kernels
- Enable support for VF VLAN tag stripping control
- Build fixes to force perl to load specific ./SpecSetup.pm file
- Fix the updating of pci.ids
- Use 16 byte descriptors by default
- Fixes for DCB
- Don't close client in debug mode
- Add change MTU log in VF driver
- Fix for adding multiple ethtool filters on the same location
- Add new branding strings for OCP XXV710 devices
- Remove X722 Support for Destination IP Cloud Filter
- Allow turning off offloads when the VF has VLAN set
Searching the kernel source repository turns up the patch that supposedly fixed this memory leak, though in practice it did not:
https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/commit/drivers/net/ethernet/intel/i40e/i40e_txrx.c?id=2b9478ffc550f17c6cd8c69057234e91150f5972

author Alexander Duyck <alexander.h.duyck@intel.com> 2017-10-04 08:44:43 -0700
committer Jeff Kirsher <jeffrey.t.kirsher@intel.com> 2017-10-10 08:04:36 -0700
commit 2b9478ffc550f17c6cd8c69057234e91150f5972 (patch)
tree 3c3478f6c489db75c980a618a44dbd0dc80fc3ef /drivers/net/ethernet/intel/i40e/i40e_txrx.c
parent e836e3211229d7307660239cc957f2ab60e6aa00 (diff)
download net-2b9478ffc550f17c6cd8c69057234e91150f5972.tar.gz

i40e: Fix memory leak related filter programming status
It looks like we weren't correctly placing the pages from buffers that had
been used to return a filter programming status back on the ring. As a
result they were being overwritten and tracking of the pages was lost.

This change works to correct that by incorporating part of
i40e_put_rx_buffer into the programming status handler code. As a result we
should now be correctly placing the pages for those buffers on the
re-allocation list instead of letting them stay in place.

Fixes: 0e626ff7ccbf ("i40e: Fix support for flow director programming status")
Reported-by: Anders K. Pedersen <akp@cohaesio.com>
Signed-off-by: Alexander Duyck <alexander.h.duyck@intel.com>
Tested-by: Anders K Pedersen <akp@cohaesio.com>
Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>

diff --git a/drivers/net/ethernet/intel/i40e/i40e_txrx.c b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
index 1519dfb..2756131 100644
--- a/drivers/net/ethernet/intel/i40e/i40e_txrx.c
+++ b/drivers/net/ethernet/intel/i40e/i40e_txrx.c
@@ -1038,6 +1038,32 @@ reset_latency:
 }
 
 /**
+ * i40e_reuse_rx_page - page flip buffer and store it back on the ring
+ * @rx_ring: rx descriptor ring to store buffers on
+ * @old_buff: donor buffer to have page reused
+ *
+ * Synchronizes page for reuse by the adapter
+ **/
+static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
+ struct i40e_rx_buffer *old_buff)
+{
+ struct i40e_rx_buffer *new_buff;
+ u16 nta = rx_ring->next_to_alloc;
+
+ new_buff = &rx_ring->rx_bi[nta];
+
+ /* update, and store next to alloc */
+ nta++;
+ rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
+
+ /* transfer page from old buffer to new buffer */
+ new_buff->dma = old_buff->dma;
+ new_buff->page = old_buff->page;
+ new_buff->page_offset = old_buff->page_offset;
+ new_buff->pagecnt_bias = old_buff->pagecnt_bias;
+}
+
+/**
  * i40e_rx_is_programming_status - check for programming status descriptor
  * @qw: qword representing status_error_len in CPU ordering
  *
@@ -1071,15 +1097,24 @@ static void i40e_clean_programming_status(struct i40e_ring *rx_ring,
 					  union i40e_rx_desc *rx_desc,
 					  u64 qw)
 {
- u32 ntc = rx_ring->next_to_clean + 1;
+ struct i40e_rx_buffer *rx_buffer;
+	u32 ntc = rx_ring->next_to_clean;
 	u8 id;
 
 	/* fetch, update, and store next to clean */
+	rx_buffer = &rx_ring->rx_bi[ntc++];
 	ntc = (ntc < rx_ring->count) ? ntc : 0;
 	rx_ring->next_to_clean = ntc;
 
 	prefetch(I40E_RX_DESC(rx_ring, ntc));
 
+	/* place unused page back on the ring */
+ i40e_reuse_rx_page(rx_ring, rx_buffer);
+ rx_ring->rx_stats.page_reuse_count++;
+
+ /* clear contents of buffer_info */
+ rx_buffer->page = NULL;
+
 	id = (qw & I40E_RX_PROG_STATUS_DESC_QW1_PROGID_MASK) >>
 		  I40E_RX_PROG_STATUS_DESC_QW1_PROGID_SHIFT;
 
@@ -1639,32 +1674,6 @@ static bool i40e_cleanup_headers(struct i40e_ring *rx_ring, struct sk_buff *skb,
 }
 
 /**
- * i40e_reuse_rx_page - page flip buffer and store it back on the ring
- * @rx_ring: rx descriptor ring to store buffers on
- * @old_buff: donor buffer to have page reused
- *
- * Synchronizes page for reuse by the adapter
- **/
-static void i40e_reuse_rx_page(struct i40e_ring *rx_ring,
- struct i40e_rx_buffer *old_buff)
-{
- struct i40e_rx_buffer *new_buff;
- u16 nta = rx_ring->next_to_alloc;
-
- new_buff = &rx_ring->rx_bi[nta];
-
- /* update, and store next to alloc */
- nta++;
- rx_ring->next_to_alloc = (nta < rx_ring->count) ? nta : 0;
-
- /* transfer page from old buffer to new buffer */
- new_buff->dma = old_buff->dma;
- new_buff->page = old_buff->page;
- new_buff->page_offset = old_buff->page_offset;
- new_buff->pagecnt_bias = old_buff->pagecnt_bias;
-}
-
-/**
  * i40e_page_is_reusable - check if any reuse is possible
  * @page: page struct to check
  *
Then look at the feedback about this driver on the kernel mailing list: plenty of people have hit this problem, so we are not alone:
https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+Linux+4.12%5C%2B+memory+leak+on+router+with+i40e+NICs%22&o=newest

...Upgraded and looks like problem is not solved with that patch
Currently running system with https://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git/ kernel.
Still about 0.5GB of memory is leaking somewhere.
Also can confirm that the latest kernel where memory is not
leaking (with i40e driver, intel 710 cards) is 4.11.12.
With kernel 4.11.12 - after hour no change in memory usage.
Also checked that with ixgbe instead of i40e with same net.git kernel there
is no memleak - after hour same memory usage - so for 100% this is i40e
driver problem.....