Packet Reception Series: IP Protocol Processing Flow (Part 1)
Main topic: how the IP layer processes a packet on the receive path.
Kernel version: 2.6.37
Author: zhangskd @ csdn blog
IP header

The IP header:
struct iphdr {
#if defined(__LITTLE_ENDIAN_BITFIELD)
    __u8    ihl:4,
            version:4;
#elif defined(__BIG_ENDIAN_BITFIELD)
    __u8    version:4,  /* protocol version; 4 for IPv4 */
            ihl:4;      /* header length in 32-bit words; 5 (20 bytes) without options */
#else
#error "Please fix <asm/byteorder.h>"
#endif
    __u8    tos;        /* type of service: 6-bit DSCP plus 2-bit ECN */
    __be16  tot_len;    /* total length of the IP packet; at most 65535 */
    __be16  id;         /* identification; all fragments of one IP packet share it */
    __be16  frag_off;   /* 3 flag bits plus a 13-bit fragment offset */
    __u8    ttl;        /* time to live; commonly 64 hops */
    __u8    protocol;   /* L4 protocol number */
    __sum16 check;      /* header checksum; the payload is not covered */
    __be32  saddr;      /* source IP address */
    __be32  daddr;      /* destination IP address */
};
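To make the field layout concrete, here is a minimal userspace sketch that decodes the same fields from a raw byte buffer. It does not use the kernel's struct iphdr or its bitfields; the names ipv4_fields, ipv4_parse, rd16 and rd32 are hypothetical and exist only for this illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical userspace mirror of struct iphdr, with every field in host byte order. */
struct ipv4_fields {
    uint8_t  version;
    uint8_t  ihl;       /* header length in 32-bit words */
    uint8_t  tos;
    uint16_t tot_len;
    uint16_t id;
    uint8_t  flags;     /* the 3 flag bits of frag_off */
    uint16_t frag_off;  /* the 13-bit fragment offset */
    uint8_t  ttl;
    uint8_t  protocol;
    uint16_t check;
    uint32_t saddr;
    uint32_t daddr;
};

/* Read a big-endian (network order) 16-bit value. */
static uint16_t rd16(const uint8_t *p)
{
    return (uint16_t)((p[0] << 8) | p[1]);
}

/* Read a big-endian (network order) 32-bit value. */
static uint32_t rd32(const uint8_t *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

/* Decode the fixed 20-byte part of an IPv4 header.
 * Returns 0 on success, -1 if the buffer is shorter than 20 bytes. */
int ipv4_parse(const uint8_t *buf, size_t len, struct ipv4_fields *out)
{
    if (len < 20)
        return -1;
    out->version  = buf[0] >> 4;    /* high nibble on the wire */
    out->ihl      = buf[0] & 0x0F;  /* low nibble on the wire */
    out->tos      = buf[1];
    out->tot_len  = rd16(buf + 2);
    out->id       = rd16(buf + 4);
    out->flags    = buf[6] >> 5;
    out->frag_off = rd16(buf + 6) & 0x1FFF;
    out->ttl      = buf[8];
    out->protocol = buf[9];
    out->check    = rd16(buf + 10);
    out->saddr    = rd32(buf + 12);
    out->daddr    = rd32(buf + 16);
    return 0;
}
```

Shifting and masking bytes by hand avoids the endianness-dependent bitfield order that the kernel definition handles with the #if blocks.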
ip_rcv
[Figure: selected skb fields at the time ip_rcv() is called]
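ip_rcv() is the entry point of the IP layer. It does the following:
Drops packets whose L2 destination address is not this host's. Receiving such a packet means the NIC is in promiscuous mode; sniffers handle these packets themselves.
Checks the skb's reference count. If it is greater than 1, the skb is also in use elsewhere, so a clone is made and returned; otherwise the original skb is returned.
Validates the packet:
    The data room must be at least the size of the basic IP header (20 bytes).
    The header length must be at least 20 bytes (ihl >= 5) and the version must be 4.
    The data room must hold the complete IP header, including any IP options.
    The IP header checksum must be correct.
    The packet must not be truncated (skb->len >= total length), and the total length must not be smaller than the header length (ihl * 4).
If L2 padded the frame (the minimum Ethernet frame length is 64 bytes), trims the IP packet back to its original size to remove the padding. If the receiving NIC has already computed a checksum, that checksum is invalidated at this point so that L4 recomputes it.
Finally, invokes the netfilter NF_INET_PRE_ROUTING hooks. If the hooks let the packet through, ip_rcv_finish() continues the processing.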
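The validity checks listed above can be mimicked in userspace on a plain byte buffer. The following is only a sketch of the same sequence of tests, not the kernel code: ipv4_validate, ipv4_header_csum and the verdict enum are hypothetical names, and the buffer length stands in for skb->len.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical verdicts; the kernel instead jumps to inhdr_error or drop. */
enum ipv4_verdict { IPV4_OK = 0, IPV4_HDR_ERROR, IPV4_TRUNCATED };

/* One's-complement sum over the header, checksum field included.
 * Returns 0 when the stored checksum is valid. */
static uint16_t ipv4_header_csum(const uint8_t *hdr, size_t hdr_len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < hdr_len; i += 2)
        sum += (uint32_t)((hdr[i] << 8) | hdr[i + 1]);
    while (sum >> 16)                       /* fold the carries back in */
        sum = (sum & 0xFFFF) + (sum >> 16);
    return (uint16_t)~sum;
}

/* Apply the checks in the same order as the list above.
 * On success, *ip_len receives the true IP length so that the caller
 * can drop any L2 padding beyond it. */
enum ipv4_verdict ipv4_validate(const uint8_t *buf, size_t buf_len, size_t *ip_len)
{
    size_t hdr_len, tot_len;

    if (buf_len < 20)                       /* room for the basic header */
        return IPV4_HDR_ERROR;
    hdr_len = (size_t)(buf[0] & 0x0F) * 4;
    if (hdr_len < 20 || (buf[0] >> 4) != 4) /* ihl >= 5 and version == 4 */
        return IPV4_HDR_ERROR;
    if (buf_len < hdr_len)                  /* room for header plus options */
        return IPV4_HDR_ERROR;
    if (ipv4_header_csum(buf, hdr_len) != 0)
        return IPV4_HDR_ERROR;
    tot_len = ((size_t)buf[2] << 8) | buf[3];
    if (buf_len < tot_len)                  /* packet was truncated */
        return IPV4_TRUNCATED;
    if (tot_len < hdr_len)                  /* bogus total length */
        return IPV4_HDR_ERROR;
    *ip_len = tot_len;
    return IPV4_OK;
}
```

A caller would treat everything past *ip_len as link-layer padding, which is the role pskb_trim_rcsum() plays in the kernel.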
/* Main IP Receive routine. */
int ip_rcv(struct sk_buff *skb, struct net_device *dev, struct packet_type *pt, struct net_device *orig_dev)
{
    struct iphdr *iph;
    u32 len;

    /* When the interface is in promisc mode, drop all the crap that it receives,
     * do not try to analyse it.
     * skb->pkt_type is set to PACKET_OTHERHOST when the frame's L2 destination
     * address differs from the address of the receiving interface. The NIC itself
     * discards such frames unless it is in promiscuous mode. Sniffers handle these
     * packets on their own, so the IP layer can ignore them.
     */
    if (skb->pkt_type == PACKET_OTHERHOST)
        goto drop;

    IP_UPD_PO_STATS_BH(dev_net(dev), IPSTATS_MIB_IN, skb->len);

    /* If the skb's reference count is greater than 1, it is also in use elsewhere,
     * so a clone is returned. Otherwise the original skb is returned.
     */
    if ((skb = skb_share_check(skb, GFP_ATOMIC)) == NULL) {
        IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
        goto out;
    }

    /* Make sure data room >= basic IP header */
    if (!pskb_may_pull(skb, sizeof(struct iphdr)))
        goto inhdr_error;

    iph = ip_hdr(skb);

    /*
     * RFC1122: 3.2.1.2 MUST silently discard any IP frame that fails the checksum.
     * Is the datagram acceptable?
     * 1. Length at least the size of an ip header
     * 2. Version of 4
     * 3. Checksums correctly. [Speed optimisation for later, skip loopback checksums]
     * 4. Doesn't have a bogus length
     */

    /* Header length at least 20 bytes (ihl >= 5), version must be IPv4 */
    if (iph->ihl < 5 || iph->version != 4)
        goto inhdr_error;

    /* Data room must hold the full IP header, including IP options */
    if (!pskb_may_pull(skb, iph->ihl * 4))
        goto inhdr_error;

    iph = ip_hdr(skb);

    /* Verify the IP header checksum */
    if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
        goto inhdr_error;

    len = ntohs(iph->tot_len); /* total length of the IP packet */

    /* L2 may pad the frame to reach the minimum frame length, so a valid packet
     * has skb->len >= len. The minimum Ethernet frame length is 64 bytes.
     */
    if (skb->len < len) { /* the packet was truncated */
        IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INTRUNCATEDPKTS);
        goto drop;
    } else if (len < (iph->ihl * 4))
        goto inhdr_error;

    /* Our transport medium may have padded the buffer out. Now we know it is
     * IP we can trim to the true length of the frame.
     * Note this now means skb->len holds ntohs(iph->tot_len).
     */
    /* If L2 padded the frame, trim the IP packet back to its original size.
     * If the receiving NIC already computed a checksum, invalidate it so that
     * L4 recomputes it.
     */
    if (pskb_trim_rcsum(skb, len)) {
        IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INDISCARDS);
        goto drop;
    }

    /* Remove any debris in the socket control block */
    memset(IPCB(skb), 0, sizeof(struct inet_skb_parm));

    /* Must drop socket now because of tproxy. */
    skb_orphan(skb);

    /* Invoke the netfilter NF_INET_PRE_ROUTING hooks */
    return NF_HOOK(NFPROTO_IPV4, NF_INET_PRE_ROUTING, skb, dev, NULL, ip_rcv_finish);

inhdr_error:
    IP_INC_STATS_BH(dev_net(dev), IPSTATS_MIB_INHDRERRORS);
drop:
    kfree_skb(skb);
out:
    return NET_RX_DROP;
}

If the skb's reference count is greater than 1, it is also in use elsewhere, so skb_share_check() clones it and returns the clone; otherwise it returns the original skb.
/**
 * skb_share_check - check if buffer is shared and if so clone it
 * @skb: buffer to check
 * @pri: priority for memory allocation
 *
 * If the buffer is shared the buffer is cloned and the old copy drops a
 * reference. A new clone with a single reference is returned.
 * If the buffer is not shared the original buffer is returned. When being called
 * from interrupt status or with spinlocks held pri must be GFP_ATOMIC.
 * NULL is returned on a memory allocation failure.
 */
static inline struct sk_buff *skb_share_check(struct sk_buff *skb, gfp_t pri)
{
    /* If pri allows waiting, the caller must be in a context that may sleep;
     * otherwise might_sleep() prints a warning with a stack backtrace.
     */
    might_sleep_if(pri & __GFP_WAIT);

    if (skb_shared(skb)) { /* true when skb->users != 1 */
        struct sk_buff *nskb = skb_clone(skb, pri);
        kfree_skb(skb);
        skb = nskb;
    }
    return skb;
}

/**
 * skb_orphan - orphan a buffer
 * @skb: buffer to orphan
 * If a buffer currently has an owner then we call the owner's destructor
 * function and make the @skb unowned. The buffer continues to exist
 * but is no longer charged to its former owner.
 */
static inline void skb_orphan(struct sk_buff *skb)
{
    if (skb->destructor)
        skb->destructor(skb);
    skb->destructor = NULL;
    skb->sk = NULL;
}
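The share-check pattern can be tried out in userspace with a toy reference-counted buffer. This is a simplified model under stated assumptions: struct buf, buf_new, buf_put and buf_share_check are hypothetical names, and the "clone" here copies the payload, whereas the kernel's skb_clone() duplicates only the sk_buff descriptor and shares the packet data.

```c
#include <stdlib.h>
#include <string.h>

/* Toy stand-in for an sk_buff: a reference count plus a small payload. */
struct buf {
    int users;                  /* plays the role of skb->users */
    size_t len;
    unsigned char data[64];
};

/* Allocate a buffer holding a copy of src, with a single reference. */
struct buf *buf_new(const void *src, size_t len)
{
    struct buf *b;

    if (len > sizeof b->data)
        return NULL;
    b = calloc(1, sizeof *b);
    if (b == NULL)
        return NULL;
    b->users = 1;
    b->len = len;
    memcpy(b->data, src, len);
    return b;
}

/* Drop one reference; free the buffer when the last one goes away. */
void buf_put(struct buf *b)
{
    if (--b->users == 0)
        free(b);
}

/* If someone else also holds the buffer, hand back a private copy and
 * drop our reference on the shared one; otherwise return it unchanged.
 * Returns NULL if the copy cannot be allocated. */
struct buf *buf_share_check(struct buf *b)
{
    if (b->users != 1) {
        struct buf *nb = buf_new(b->data, b->len);
        buf_put(b);
        return nb;
    }
    return b;
}
```

After the call the caller owns a buffer with exactly one reference, so it can modify the buffer without affecting any other holder.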
ip_rcv_finish
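
ip_rcv_finish() does the following:
Looks up the route to decide where the packet goes, and sets skb_dst()->input() accordingly: ip_local_deliver() for local delivery, ip_forward() for forwarding.
Updates the statistics of the Traffic Control (QoS) layer.
Processes the IP options: checks that they are well formed, then stores them in IPCB(skb)->opt.
Finally, calls skb_dst()->input(), which either passes the packet up to L4 or forwards it, depending on the destination IP address.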
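The route lookup followed by a call through skb_dst()->input() is a function-pointer dispatch. The sketch below shows the idea with toy types. Everything in it is hypothetical (struct pkt, struct dst, route_input and the return codes); in particular, the lookup just compares against one local address, whereas the real routing lookup is far more involved.

```c
#include <stdint.h>

struct pkt;

/* Toy routing result: only the input handler is modelled. */
struct dst {
    int (*input)(struct pkt *p);
};

struct pkt {
    uint32_t daddr;     /* destination address, host byte order */
    struct dst *dst;    /* filled in by the route lookup */
};

enum { DELIVERED_LOCAL = 1, FORWARDED = 2 };

/* Stand-ins for ip_local_deliver() and ip_forward(). */
static int local_deliver(struct pkt *p) { (void)p; return DELIVERED_LOCAL; }
static int forward(struct pkt *p) { (void)p; return FORWARDED; }

static struct dst dst_local = { local_deliver };
static struct dst dst_fwd = { forward };

/* Hypothetical route lookup: anything not addressed to local_addr is forwarded. */
void route_input(struct pkt *p, uint32_t local_addr)
{
    p->dst = (p->daddr == local_addr) ? &dst_local : &dst_fwd;
}

/* Same shape as the kernel helper: call whatever handler the lookup chose. */
int dst_input(struct pkt *p)
{
    return p->dst->input(p);
}
```

Because the handler is chosen once during the lookup, the code that finishes receive processing does not need to know whether the packet is local or transit traffic.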
static int ip_rcv_finish(struct sk_buff *skb)
{
    const struct iphdr *iph = ip_hdr(skb);
    struct rtable *rt;

    /*
     * Initialise the virtual path cache for the packet.
     * It describes how the packet travels inside linux networking.
     */
    if (skb_dst(skb) == NULL) {
        /* Look up the route to decide where the packet goes */
        int err = ip_route_input_noref(skb, iph->daddr, iph->saddr, iph->tos, skb->dev);
        if (unlikely(err)) {
            if (err == -EHOSTUNREACH) /* no route to host */
                IP_INC_STATS_BH(dev_net(skb->dev), IPSTATS_MIB_INADDRERRORS);
            else if (err == -ENETUNREACH) /* network is unreachable */
                IP_INC_STATS_BH(dev_net(skb->dev), IPSTATS_MIB_INNOROUTES);
            else if (err == -EXDEV) /* cross-device link */
                NET_INC_STATS_BH(dev_net(skb->dev), LINUX_MIB_IPRPFILTER);
            goto drop; /* destination unreachable, drop the packet */
        }
    }

    /* Update the statistics of the Traffic Control (QoS) layer */
#ifdef CONFIG_NET_CLS_ROUTE
    if (unlikely(skb_dst(skb)->tclassid)) {
        struct ip_rt_acct *st = this_cpu_ptr(ip_rt_acct);
        u32 idx = skb_dst(skb)->tclassid;
        st[idx & 0xFF].o_packets++;
        st[idx & 0xFF].o_bytes += skb->len;
        st[(idx >> 16) & 0xFF].i_packets++;
        st[(idx >> 16) & 0xFF].i_bytes += skb->len;
    }
#endif

    /* Process the IP options: ip_options_compile() is called to check that the
     * options are well formed, and the result is stored in IPCB(skb)->opt.
     */
    if (iph->ihl > 5 && ip_rcv_options(skb))
        goto drop;

    rt = skb_rtable(skb);
    if (rt->rt_type == RTN_MULTICAST) {
        IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INMCAST, skb->len);
    } else if (rt->rt_type == RTN_BROADCAST)
        IP_UPD_PO_STATS_BH(dev_net(rt->dst.dev), IPSTATS_MIB_INBCAST, skb->len);

    /* skb_dst(skb)->input() was set in ip_route_input_noref(). It is either
     * ip_local_deliver() or ip_forward(), depending on the packet's destination address.
     */
    return dst_input(skb);

drop:
    kfree_skb(skb);
    return NET_RX_DROP;
}

/* Input packet from network to transport. */
static inline int dst_input(struct sk_buff *skb)
{
    return skb_dst(skb)->input(skb);
}
Reposted from: https://www.cnblogs.com/aiwz/p/6333289.html