MIPS KVM Trap&Emulate Implementation in Linux
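Basic Principles
Trap & Emulate is a full-virtualization scheme implemented purely in software, making essentially no use of hardware virtualization features. This article focuses on the core of its memory virtualization: the handling of TLB misses. The basic principle is:
Every TLB miss causes the guest to exit to the VMM, which then emulates the appropriate behavior.
The implementation works as follows: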
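- In the current kernel version (4.9), the implementation defines two kinds of TLB:
  - Guest TLB: maintained by the guest OS and used to map GVA->GPA. It is just a region of memory that emulates a TLB; this virtual TLB does not exist in hardware. The guest OS never translates through it directly; it is used only to generate Shadow Host TLB entries. When the guest OS maintains its Guest TLB, its TLB read/write instructions are trapped and emulated by the virtual machine monitor (VMM), so the guest OS believes it has successfully read or written the hardware TLB.
  - Shadow Host TLB: the physical TLB actually used by the host, maintained by the VMM (i.e., the host). For the guest, it maps GVA->HPA; for the host, it maps HVA->HPA.
- The guest ultimately translates GVA to HPA through entries in the Shadow Host TLB; the GPA->HPA part of the mapping comes from a linear mapping table.
- In the guest, every TLB-related operation triggers an exception.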
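To make the two-level translation concrete, here is a minimal C sketch of how a Guest TLB lookup (GVA->GPA) composes with a linear GFN->PFN table (GPA->HPA) into the GVA->HPA result that a Shadow Host TLB entry caches. It is a simplified model, not kernel code: `struct guest_tlb_entry`, `guest_tlb_translate`, `gfn_to_host_pfn`, `gva_to_hpa` and `INVALID_PFN` are hypothetical names, 4 KiB pages and 32-bit addresses are assumed, and PageMask and the global bit are ignored. Each MIPS TLB entry maps an even/odd page pair, so the lowest VPN bit selects the half.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT  12            /* assumption: 4 KiB pages */
#define VPN2_MASK   0xffffe000u   /* one entry maps an even/odd page pair (8 KiB) */
#define ASID_MASK   0xffu
#define INVALID_PFN UINT32_MAX

/* Hypothetical, simplified Guest TLB entry (not the kernel's struct kvm_mips_tlb). */
struct guest_tlb_entry {
    uint32_t hi;       /* VPN2 | ASID */
    uint32_t gfn[2];   /* guest frame numbers of the even and odd page */
    int      valid[2]; /* V bit of each half */
};

/* GVA -> GFN through the Guest TLB (the mapping the guest OS maintains). */
int guest_tlb_translate(const struct guest_tlb_entry *tlb, int n,
                        uint32_t gva, uint32_t asid, uint32_t *gfn)
{
    uint32_t key = (gva & VPN2_MASK) | (asid & ASID_MASK);

    assert(tlb != NULL || n == 0);
    for (int i = 0; i < n; i++) {
        if (tlb[i].hi != key)
            continue;
        int odd = (gva >> PAGE_SHIFT) & 1;  /* lowest VPN bit picks the half */
        if (!tlb[i].valid[odd])
            return -1;                      /* a TLB invalid in the real flow */
        *gfn = tlb[i].gfn[odd];
        return 0;
    }
    return -1;                              /* a TLB miss in the real flow */
}

/* GFN -> host PFN through a linear table (the role guest_pmap plays). */
int gfn_to_host_pfn(const uint32_t *pmap, uint32_t npages, uint32_t gfn,
                    uint32_t *pfn)
{
    if (gfn >= npages || pmap[gfn] == INVALID_PFN)
        return -1;
    *pfn = pmap[gfn];
    return 0;
}

/* GVA -> HPA: the composed translation a Shadow Host TLB entry caches. */
int gva_to_hpa(const struct guest_tlb_entry *tlb, int n, const uint32_t *pmap,
               uint32_t npages, uint32_t gva, uint32_t asid, uint64_t *hpa)
{
    uint32_t gfn, pfn;

    if (guest_tlb_translate(tlb, n, gva, asid, &gfn) != 0 ||
        gfn_to_host_pfn(pmap, npages, gfn, &pfn) != 0)
        return -1;
    *hpa = ((uint64_t)pfn << PAGE_SHIFT) | (gva & ((1u << PAGE_SHIFT) - 1));
    return 0;
}
```

For example, with one entry whose EntryHi is 0x00402005 (VPN2 0x00402000, ASID 5) and whose odd page is guest frame 0x11, and a linear table mapping frame 0x11 to host frame 0x2345, `gva_to_hpa()` translates GVA 0x00403abc under ASID 5 to HPA 0x2345abc.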
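When the guest OS accesses memory and the access misses in the Shadow Host TLB, the resulting exception exits to the VMM, and one of the following happens:
- The VMM first searches the Guest TLB. If it finds a matching entry, it converts the GPA from that entry into an HPA, writes the result into the Shadow Host TLB (the hardware TLB), and returns to the guest OS.
- If no entry is found in the Guest TLB, the VMM injects a TLB Load/Store exception into the guest OS, which is responsible for refilling its Guest TLB, and returns to the guest OS. The Shadow Host TLB is filled only later, when the retried access misses again and the VMM now finds the entry in the Guest TLB.
- When the guest is next scheduled, it enters the TLB Load/Store exception (general exception) entry and eventually reaches the page-fault path, which refills the Guest TLB.
- Refilling the TLB executes instructions such as TLBWR, which trigger another VM exit. The VMM traps this exception, writes the entry into the virtual Guest TLB, flushes the corresponding Shadow Host TLB entries, and returns to the guest OS.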
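The sequence above can be replayed with a toy model that tracks only whether the Guest TLB and the Shadow Host TLB hold an entry for a single page. This is an illustrative sketch under heavy assumptions (one page, no ASIDs, the guest handler is called directly instead of being resumed by the scheduler), and every name in it is hypothetical. It shows why the first access to a page that is not yet in the Guest TLB costs three VM exits: the initial miss, the trapped TLBWR, and the second miss that finally fills the Shadow Host TLB.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one guest page; all names are hypothetical. */
struct toy_vm {
    bool guest_tlb_has_entry;   /* the virtual Guest TLB (lives in VMM memory) */
    bool shadow_tlb_has_entry;  /* the Shadow Host TLB (real hardware TLB) */
    bool guest_pt_has_mapping;  /* guest page table, read by the guest handler */
    int  vm_exits;              /* traps into the VMM */
    int  injected_exceptions;   /* TLB exceptions injected into the guest */
};

/* VMM: the guest executed TLBWR; write the Guest TLB, flush the shadow entry. */
static void vmm_emulate_tlbwr(struct toy_vm *vm)
{
    vm->vm_exits++;
    vm->guest_tlb_has_entry = true;
    vm->shadow_tlb_has_entry = false;
}

/* Guest: injected TLB exception -> page-fault path -> refill with TLBWR. */
static void guest_tlb_exception_handler(struct toy_vm *vm)
{
    assert(vm->guest_pt_has_mapping);   /* the toy only models valid accesses */
    vmm_emulate_tlbwr(vm);              /* TLBWR is privileged, so it traps */
}

/* VMM: entry point for a hardware TLB miss taken while the guest runs. */
static void vmm_handle_tlb_miss(struct toy_vm *vm)
{
    vm->vm_exits++;
    if (vm->guest_tlb_has_entry) {
        vm->shadow_tlb_has_entry = true;    /* GPA -> HPA, fill hardware TLB */
    } else {
        vm->injected_exceptions++;          /* let the guest refill its TLB */
        guest_tlb_exception_handler(vm);    /* really runs once resumed */
    }
}

/* Guest load/store: re-executed until the shadow TLB can translate it. */
int guest_access(struct toy_vm *vm)
{
    int attempts = 0;

    while (!vm->shadow_tlb_has_entry) {
        attempts++;
        vmm_handle_tlb_miss(vm);
    }
    return attempts + 1;  /* the final execution hits in the shadow TLB */
}
```

Starting from empty TLBs, `guest_access()` returns 3 (miss, miss, hit) with `vm_exits` equal to 3 and one injected exception; a second access to the same page hits without any exit.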
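Code Implementation
kvm_mips_handle_tlbmiss
The common entry point for TLB miss exceptions. It handles both TLB load misses (EXCCODE_TLBL) and TLB store misses (EXCCODE_TLBS).
Main logic:
- Look up the Guest TLB for a matching entry.
- If one exists and is valid for the faulting address, call kvm_mips_handle_mapped_seg_tlb_fault to fill the Shadow Host TLB (the physical TLB); if it exists but is invalid, inject a TLB invalid exception (kvm_mips_emulate_tlbinv_ld / kvm_mips_emulate_tlbinv_st).
- If none exists, inject the exception matching the exception code into the guest, e.g. a TLB load miss exception via kvm_mips_emulate_tlbmiss_ld.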
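The decision logic in the list above can be sketched as a pure function that only classifies the fault, which makes each outcome easy to test in isolation. The enum and function names are hypothetical; the comments map each outcome to the kernel function the real handler calls. The even/odd selection assumes that `TLB_IS_VALID()` checks the EntryLo half selected by the address's lowest VPN bit, and, as in the handler, the exception code is only checked when an exception has to be injected.

```c
#include <assert.h>
#include <stdint.h>

#define CAUSEB_EXCCODE 2   /* Cause.ExcCode occupies bits 6..2 */
#define EXCCODE_TLBL   2   /* TLB exception on load or instruction fetch */
#define EXCCODE_TLBS   3   /* TLB exception on store */
#define PAGE_SHIFT     12  /* assumption: 4 KiB pages */

/* Hypothetical outcome labels for the dispatch in kvm_mips_handle_tlbmiss. */
enum tlbmiss_action {
    ACT_FILL_SHADOW,     /* kvm_mips_handle_mapped_seg_tlb_fault */
    ACT_INJECT_MISS_LD,  /* kvm_mips_emulate_tlbmiss_ld */
    ACT_INJECT_MISS_ST,  /* kvm_mips_emulate_tlbmiss_st */
    ACT_INJECT_INV_LD,   /* kvm_mips_emulate_tlbinv_ld */
    ACT_INJECT_INV_ST,   /* kvm_mips_emulate_tlbinv_st */
    ACT_FAIL,            /* EMULATE_FAIL */
};

/*
 * index:         Guest TLB lookup result (negative means not found)
 * valid0/valid1: V bits of the even/odd half of the found entry
 */
enum tlbmiss_action classify_tlbmiss(uint32_t cause, int index,
                                     int valid0, int valid1,
                                     uint32_t badvaddr)
{
    uint32_t exccode = (cause >> CAUSEB_EXCCODE) & 0x1f;

    if (index >= 0) {
        int odd = (badvaddr >> PAGE_SHIFT) & 1;
        if (odd ? valid1 : valid0)
            return ACT_FILL_SHADOW;          /* valid: fill the shadow TLB */
        if (exccode == EXCCODE_TLBL)
            return ACT_INJECT_INV_LD;        /* present but invalid */
        if (exccode == EXCCODE_TLBS)
            return ACT_INJECT_INV_ST;
        return ACT_FAIL;
    }
    if (exccode == EXCCODE_TLBL)
        return ACT_INJECT_MISS_LD;           /* not in the Guest TLB */
    if (exccode == EXCCODE_TLBS)
        return ACT_INJECT_MISS_ST;
    return ACT_FAIL;
}
```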
The code is as follows:
```c
enum emulation_result kvm_mips_handle_tlbmiss(u32 cause,
					      u32 *opc,
					      struct kvm_run *run,
					      struct kvm_vcpu *vcpu)
{
	enum emulation_result er = EMULATE_DONE;
	/* Read the exception code */
	u32 exccode = (cause >> CAUSEB_EXCCODE) & 0x1f;
	/* The address that caused the TLB miss (a GVA) */
	unsigned long va = vcpu->arch.host_cp0_badvaddr;
	int index;

	kvm_debug("kvm_mips_handle_tlbmiss: badvaddr: %#lx\n",
		  vcpu->arch.host_cp0_badvaddr);

	/*
	 * KVM would not have got the exception if this entry was valid in the
	 * shadow host TLB. Check the Guest TLB, if the entry is not there then
	 * send the guest an exception. The guest exc handler should then inject
	 * an entry into the guest TLB.
	 */
	/* Look up the Guest TLB (a virtual TLB held in memory, not real hardware) */
	index = kvm_mips_guest_tlb_lookup(vcpu,
		      (va & VPN2_MASK) |
		      (kvm_read_c0_guest_entryhi(vcpu->arch.cop0) &
		       KVM_ENTRYHI_ASID));
	/* Not found */
	if (index < 0) {
		/* Inject an exception into the guest based on the exception code */
		if (exccode == EXCCODE_TLBL) {
			/* TLB load miss */
			er = kvm_mips_emulate_tlbmiss_ld(cause, opc, run, vcpu);
		} else if (exccode == EXCCODE_TLBS) {
			/* TLB store miss */
			er = kvm_mips_emulate_tlbmiss_st(cause, opc, run, vcpu);
		} else {
			kvm_err("%s: invalid exc code: %d\n", __func__,
				exccode);
			er = EMULATE_FAIL;
		}
	} else {
		/* Found a matching entry in the Guest TLB */
		struct kvm_mips_tlb *tlb = &vcpu->arch.guest_tlb[index];

		/*
		 * Check if the entry is valid, if not then setup a TLB invalid
		 * exception to the guest
		 */
		if (!TLB_IS_VALID(*tlb, va)) {
			if (exccode == EXCCODE_TLBL) {
				er = kvm_mips_emulate_tlbinv_ld(cause, opc, run,
								vcpu);
			} else if (exccode == EXCCODE_TLBS) {
				er = kvm_mips_emulate_tlbinv_st(cause, opc, run,
								vcpu);
			} else {
				kvm_err("%s: invalid exc code: %d\n", __func__,
					exccode);
				er = EMULATE_FAIL;
			}
		} else {
			/* The entry is valid: fill the Shadow Host TLB */
			kvm_debug("Injecting hi: %#lx, lo0: %#lx, lo1: %#lx into shadow host TLB\n",
				  tlb->tlb_hi, tlb->tlb_lo[0], tlb->tlb_lo[1]);
			/*
			 * OK we have a Guest TLB entry, now inject it into the
			 * shadow host TLB
			 */
			/* The Host TLB is actually filled in kvm_mips_handle_mapped_seg_tlb_fault() */
			if (kvm_mips_handle_mapped_seg_tlb_fault(vcpu, tlb)) {
				kvm_err("%s: handling mapped seg tlb fault for %lx, index: %u, vcpu: %p, ASID: %#lx\n",
					__func__, va, index, vcpu,
					read_c0_entryhi());
				er = EMULATE_FAIL;
			}
		}
	}

	return er;
}
```

kvm_mips_emulate_tlbmiss_ld
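Injects a TLB load exception into the guest. In essence, it sets ST0_EXL in the guest's CP0 Status register (saving the old PC to EPC if EXL was clear) and points the PC at the general exception entry; when the guest is next scheduled, it jumps straight to that entry, which completes the injection.
Note that the listing below is actually kvm_mips_emulate_tlbinv_ld, which injects a TLB invalid exception on a load. kvm_mips_emulate_tlbmiss_ld has the same structure, except that when EXL is clear it sends the guest to the TLB refill vector (offset 0x000), as the MIPS architecture does for a TLB refill, rather than to the general exception vector at offset 0x180.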
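The injection steps can be modeled on a plain struct standing in for the guest's emulated CP0 registers. This is an assumption-laden illustration, not the kernel's code: `struct guest_cpu`, `inject_general_exception` and `GUEST_EXC_BASE` are hypothetical (the base value merely stands in for `KVM_GUEST_KSEG0`), and the Shadow Host TLB flush the kernel performs afterwards is omitted. It shows the key rule: EPC and Cause.BD are written only when EXL is clear, so a nested exception does not overwrite the original return address.

```c
#include <assert.h>
#include <stdint.h>

#define ST0_EXL        (1u << 1)    /* Status.EXL */
#define CAUSEF_BD      (1u << 31)   /* Cause.BD: fault in a branch delay slot */
#define CAUSEB_EXCCODE 2
#define EXCCODE_TLBL   2
#define GUEST_EXC_BASE 0x40000000u  /* assumption: stands in for KVM_GUEST_KSEG0 */

/* Hypothetical subset of the guest's emulated CP0 state plus its PC. */
struct guest_cpu {
    uint32_t status, cause, epc, badvaddr, entryhi;
    uint32_t pc;
};

/* Deliver an exception through the general exception vector (offset 0x180). */
void inject_general_exception(struct guest_cpu *g, uint32_t host_cause,
                              uint32_t exccode, uint32_t badvaddr,
                              uint32_t entryhi)
{
    if (!(g->status & ST0_EXL)) {
        g->epc = g->pc;                 /* where the guest's ERET returns */
        g->status |= ST0_EXL;           /* guest is now "in an exception" */
        if (host_cause & CAUSEF_BD)
            g->cause |= CAUSEF_BD;
        else
            g->cause &= ~CAUSEF_BD;
    }
    /* With EXL already set, EPC and BD are left untouched (nested case). */
    g->pc = GUEST_EXC_BASE + 0x180;
    g->cause = (g->cause & ~0xffu) | (exccode << CAUSEB_EXCCODE);
    g->badvaddr = badvaddr;
    g->entryhi = entryhi;
}
```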
```c
enum emulation_result kvm_mips_emulate_tlbinv_ld(u32 cause,
						 u32 *opc,
						 struct kvm_run *run,
						 struct kvm_vcpu *vcpu)
{
	struct mips_coproc *cop0 = vcpu->arch.cop0;
	struct kvm_vcpu_arch *arch = &vcpu->arch;
	unsigned long entryhi =
		(vcpu->arch.host_cp0_badvaddr & VPN2_MASK) |
		(kvm_read_c0_guest_entryhi(cop0) & KVM_ENTRYHI_ASID);

	/*
	 * Check ST0_EXL in the guest CP0 Status register. EXL indicates that
	 * the CPU is already handling an exception; a TLB miss taken while it
	 * is set does not re-enter the refill handler but is delivered as a
	 * general exception (the TLB load exception, roughly comparable to a
	 * page fault on x86). If EXL is clear, set up the state needed to
	 * inject the TLB load exception into the guest.
	 */
	if ((kvm_read_c0_guest_status(cop0) & ST0_EXL) == 0) {
		/* save old pc */
		kvm_write_c0_guest_epc(cop0, arch->pc);
		/* Set ST0_EXL */
		kvm_set_c0_guest_status(cop0, ST0_EXL);

		if (cause & CAUSEF_BD)
			kvm_set_c0_guest_cause(cop0, CAUSEF_BD);
		else
			kvm_clear_c0_guest_cause(cop0, CAUSEF_BD);

		kvm_debug("[EXL == 0] delivering TLB INV @ pc %#lx\n",
			  arch->pc);

		/* set pc to the exception entry point */
		/*
		 * Point the PC at the general exception entry so that, when the
		 * guest is next scheduled, it enters the general exception
		 * handler, which eventually reaches the page-fault path (the
		 * principle is described in an earlier article).
		 */
		arch->pc = KVM_GUEST_KSEG0 + 0x180;
	} else {
		/* ST0_EXL is already set: only the PC needs to change */
		kvm_debug("[EXL == 1] delivering TLB MISS @ pc %#lx\n",
			  arch->pc);
		arch->pc = KVM_GUEST_KSEG0 + 0x180;
	}

	/* Set the ExcCode field to TLBL */
	kvm_change_c0_guest_cause(cop0, (0xff),
				  (EXCCODE_TLBL << CAUSEB_EXCCODE));

	/* setup badvaddr, context and entryhi registers for the guest */
	kvm_write_c0_guest_badvaddr(cop0, vcpu->arch.host_cp0_badvaddr);
	/* XXXKYMA: is the context register used by linux??? */
	kvm_write_c0_guest_entryhi(cop0, entryhi);
	/* Blow away the shadow host TLBs */
	kvm_mips_flush_host_tlb(1);

	return EMULATE_DONE;
}
```

kvm_mips_handle_mapped_seg_tlb_fault
Its main task is to fill the corresponding Host TLB entry from a Guest TLB entry.
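How one guest EntryLo becomes a host EntryLo can be sketched as follows, assuming the MIPS32 EntryLo layout (PFN from bit 6, C in bits 5..3, D in bit 2, V in bit 1, G in bit 0) and 4 KiB pages. Only D and V are copied from the guest; the PFN is replaced by the host frame from the linear table, the cache attribute by a host-chosen value, and G is dropped, mirroring how the kernel code combines `ENTRYLO_D`, `ENTRYLO_V` and `_page_cachable_default`. EntryHi keeps the guest's VPN2 but carries an ASID chosen by the host. The function names are hypothetical and the commpage special case is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ENTRYLO_PFN_SHIFT 6            /* MIPS32 EntryLo: PFN starts at bit 6 */
#define ENTRYLO_PFN_MASK  0x3fffffc0u
#define ENTRYLO_C_SHIFT   3            /* cache attribute, bits 5..3 */
#define ENTRYLO_D         (1u << 2)    /* dirty (writable) */
#define ENTRYLO_V         (1u << 1)    /* valid */
#define VPN2_MASK_4K      0xffffe000u  /* assumption: 4 KiB pages */
#define ASID_MASK         0xffu

/* Guest EntryLo -> guest frame number (4 KiB pages). */
static uint32_t entrylo_to_gfn(uint32_t lo)
{
    return (lo & ENTRYLO_PFN_MASK) >> ENTRYLO_PFN_SHIFT;
}

/*
 * Build the host EntryLo for one half of a guest entry: host PFN from the
 * linear table, host cache attribute, D and V copied from the guest.
 * Returns 0 on success, -1 if the gfn lies outside the table.
 */
int shadow_entrylo(uint32_t guest_lo, const uint32_t *pmap, uint32_t npages,
                   uint32_t host_cca, uint32_t *host_lo)
{
    uint32_t gfn = entrylo_to_gfn(guest_lo);

    assert(pmap != NULL && host_lo != NULL);
    if (gfn >= npages)
        return -1;                                  /* "Invalid gfn" case */
    *host_lo = ((pmap[gfn] << ENTRYLO_PFN_SHIFT) & ENTRYLO_PFN_MASK) |
               ((host_cca & 0x7u) << ENTRYLO_C_SHIFT) |
               (guest_lo & (ENTRYLO_D | ENTRYLO_V)); /* G is not copied */
    return 0;
}

/* Host EntryHi: guest VPN2 combined with a host-chosen ASID. */
uint32_t shadow_entryhi(uint32_t guest_hi, uint32_t host_asid)
{
    return (guest_hi & VPN2_MASK_4K) | (host_asid & ASID_MASK);
}
```

For instance, a guest EntryLo naming guest frame 5 with D, V and G set, where the linear table maps frame 5 to host frame 0x1234 and the host cache attribute is 3, yields the host EntryLo 0x48d1e (G cleared).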
```c
int kvm_mips_handle_mapped_seg_tlb_fault(struct kvm_vcpu *vcpu,
					 struct kvm_mips_tlb *tlb)
{
	unsigned long entryhi = 0, entrylo0 = 0, entrylo1 = 0;
	struct kvm *kvm = vcpu->kvm;
	kvm_pfn_t pfn0, pfn1;
	gfn_t gfn0, gfn1;
	long tlb_lo[2];
	int ret;

	tlb_lo[0] = tlb->tlb_lo[0];
	tlb_lo[1] = tlb->tlb_lo[1];

	/*
	 * The commpage address must not be mapped to anything else if the guest
	 * TLB contains entries nearby, or commpage accesses will break.
	 */
	if (!((tlb->tlb_hi ^ KVM_GUEST_COMMPAGE_ADDR) &
			VPN2_MASK & (PAGE_MASK << 1)))
		tlb_lo[(KVM_GUEST_COMMPAGE_ADDR >> PAGE_SHIFT) & 1] = 0;

	/* Get the guest frame numbers (gfn) */
	gfn0 = mips3_tlbpfn_to_paddr(tlb_lo[0]) >> PAGE_SHIFT;
	gfn1 = mips3_tlbpfn_to_paddr(tlb_lo[1]) >> PAGE_SHIFT;
	if (gfn0 >= kvm->arch.guest_pmap_npages ||
	    gfn1 >= kvm->arch.guest_pmap_npages) {
		kvm_err("%s: Invalid gfn: [%#llx, %#llx], EHi: %#lx\n",
			__func__, gfn0, gfn1, tlb->tlb_hi);
		kvm_mips_dump_guest_tlbs(vcpu);
		return -1;
	}

	/*
	 * Establish the gfn -> pfn mapping: look it up in the memslots and
	 * record it in the guest_pmap linear mapping array for later use.
	 */
	if (kvm_mips_map_page(kvm, gfn0) < 0)
		return -1;

	if (kvm_mips_map_page(kvm, gfn1) < 0)
		return -1;

	/* Get the host page frame numbers (pfn) */
	pfn0 = kvm->arch.guest_pmap[gfn0];
	pfn1 = kvm->arch.guest_pmap[gfn1];

	/* Get attributes from the Guest TLB */
	/* Build EntryLo0/EntryLo1 (and EntryHi below) to be written into the Host TLB */
	entrylo0 = mips3_paddr_to_tlbpfn(pfn0 << PAGE_SHIFT) |
		((_page_cachable_default >> _CACHE_SHIFT) << ENTRYLO_C_SHIFT) |
		(tlb_lo[0] & ENTRYLO_D) |
		(tlb_lo[0] & ENTRYLO_V);
	entrylo1 = mips3_paddr_to_tlbpfn(pfn1 << PAGE_SHIFT) |
		((_page_cachable_default >> _CACHE_SHIFT) << ENTRYLO_C_SHIFT) |
		(tlb_lo[1] & ENTRYLO_D) |
		(tlb_lo[1] & ENTRYLO_V);

	kvm_debug("@ %#lx tlb_lo0: 0x%08lx tlb_lo1: 0x%08lx\n", vcpu->arch.pc,
		  tlb->tlb_lo[0], tlb->tlb_lo[1]);

	preempt_disable();
	entryhi = (tlb->tlb_hi & VPN2_MASK) | (KVM_GUEST_KERNEL_MODE(vcpu) ?
					       kvm_mips_get_kernel_asid(vcpu) :
					       kvm_mips_get_user_asid(vcpu));
	/* Write the entry into the Host TLB, i.e. the physical TLB */
	ret = kvm_mips_host_tlb_write(vcpu, entryhi, entrylo0, entrylo1,
				      tlb->tlb_mask);
	preempt_enable();

	return ret;
}
```

kvm_mips_map_page
Establishes the gfn -> pfn mapping: it looks up the correspondence in the memslots and records it in the guest_pmap linear mapping array.
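The lazy, cached lookup performed by kvm_mips_map_page can be sketched in user space with a toy memslot table. Everything here is hypothetical (`struct toy_memslot`, `struct toy_kvm`, `toy_gfn_to_pfn`, `toy_map_page`), the SRCU locking is left out, and the memslot walk only stands in for the real `gfn_to_pfn()`. The `slot_lookups` counter shows the point of guest_pmap: only the first fault on a guest frame walks the memslots.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define INVALID_PFN UINT64_MAX

/* Hypothetical memslot: guest frames [base_gfn, base_gfn + npages)
 * are backed by consecutive host frames starting at host_pfn. */
struct toy_memslot {
    uint64_t base_gfn, npages, host_pfn;
};

struct toy_kvm {
    const struct toy_memslot *slots;
    int nslots;
    uint64_t *pmap;      /* linear table indexed by gfn, like guest_pmap */
    uint64_t npages;
    int slot_lookups;    /* counts memslot walks, to show the caching */
};

/* Stand-in for gfn_to_pfn(): walk the memslots. */
static uint64_t toy_gfn_to_pfn(struct toy_kvm *kvm, uint64_t gfn)
{
    kvm->slot_lookups++;
    for (int i = 0; i < kvm->nslots; i++) {
        const struct toy_memslot *s = &kvm->slots[i];
        if (gfn >= s->base_gfn && gfn < s->base_gfn + s->npages)
            return s->host_pfn + (gfn - s->base_gfn);
    }
    return INVALID_PFN;
}

/* Same shape as kvm_mips_map_page(): fill pmap[gfn] on first use only. */
int toy_map_page(struct toy_kvm *kvm, uint64_t gfn)
{
    assert(gfn < kvm->npages);        /* the caller checks the table size */
    if (kvm->pmap[gfn] != INVALID_PFN)
        return 0;                     /* a mapping already exists */

    uint64_t pfn = toy_gfn_to_pfn(kvm, gfn);
    if (pfn == INVALID_PFN)
        return -EFAULT;               /* no memslot covers this gfn */
    kvm->pmap[gfn] = pfn;             /* record the mapping */
    return 0;
}
```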
```c
static int kvm_mips_map_page(struct kvm *kvm, gfn_t gfn)
{
	int srcu_idx, err = 0;
	kvm_pfn_t pfn;

	/* A mapping already exists */
	if (kvm->arch.guest_pmap[gfn] != KVM_INVALID_PAGE)
		return 0;

	srcu_idx = srcu_read_lock(&kvm->srcu);
	/* Translate gfn to pfn via the memslots (standard KVM interface) */
	pfn = gfn_to_pfn(kvm, gfn);

	if (is_error_noslot_pfn(pfn)) {
		kvm_err("Couldn't get pfn for gfn %#llx!\n", gfn);
		err = -EFAULT;
		goto out;
	}

	/* Record the mapping */
	kvm->arch.guest_pmap[gfn] = pfn;
out:
	srcu_read_unlock(&kvm->srcu, srcu_idx);
	return err;
}
```

Original article: https://happyseeker.github.io/kernel/2017/01/11/Mips-KVM-Trap&E-implement.html