How Nova Collects Node Hardware Resource Statistics
Introduction
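When we use cloud platform services built on top of OpenStack, the overview page usually reserves a prominent spot for the cluster's current resource usage: the total, used, and remaining amounts of CPU, memory, and disk. Whenever we scale the cluster out, the totals on the overview page also grow automatically. We all know that the Nova service in OpenStack manages these compute resources, but have you ever wondered how Nova obtains the numbers in the first place?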
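How Nova collects resource statistics
Collecting resource statistics is an internal mechanism of the Nova service. Because the results matter to later operations (such as creating virtual machines or creating disks), we can infer that this mechanism must run before the service starts handling other work.
Starting from this simple analysis, plus some necessary debugging, we arrive at the following.
The mechanism is triggered from the nova.service.WSGIService.start method (nova-compute itself runs as an RPC service, and nova.service.Service.start makes the same call).
In that method, the call self.manager.pre_start_hook() is what fetches the resource information. It resolves directly to nova.compute.manager.pre_start_hook, shown below: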
    def pre_start_hook(self):
        """After the service is initialized, but before we fully bring
        the service up by listening on RPC queues, make sure to update
        our available resources (and indirectly our available nodes).
        """
        self.update_available_resource(nova.context.get_admin_context())

    ...

    @periodic_task.periodic_task
    def update_available_resource(self, context):
        """See driver.get_available_resource()

        Periodic process that keeps that the compute host's understanding of
        resource availability and usage in sync with the underlying hypervisor.

        :param context: security context
        """
        new_resource_tracker_dict = {}
        nodenames = set(self.driver.get_available_nodes())
        for nodename in nodenames:
            rt = self._get_resource_tracker(nodename)
            rt.update_available_resource(context)
            new_resource_tracker_dict[nodename] = rt

        # Delete orphan compute node not reported by driver but still in db
        compute_nodes_in_db = self._get_compute_nodes_in_db(context,
                                                            use_slave=True)

        for cn in compute_nodes_in_db:
            if cn.hypervisor_hostname not in nodenames:
                LOG.audit(_("Deleting orphan compute node %s") % cn.id)
                cn.destroy()

        self._resource_tracker_dict = new_resource_tracker_dict

In the code above, rt.update_available_resource() resolves directly to nova.compute.resource_tracker.update_available_resource(), shown below:
    def update_available_resource(self, context):
        """Override in-memory calculations of compute node resource usage based
        on data audited from the hypervisor layer.

        Add in resource claims in progress to account for operations that have
        declared a need for resources, but not necessarily retrieved them from
        the hypervisor layer yet.
        """
        LOG.audit(_("Auditing locally available compute resources"))
        resources = self.driver.get_available_resource(self.nodename)

        if not resources:
            # The virt driver does not support this function
            LOG.audit(_("Virt driver does not support "
                        "'get_available_resource' Compute tracking is disabled."))
            self.compute_node = None
            return
        resources['host_ip'] = CONF.my_ip

        # TODO(berrange): remove this once all virt drivers are updated
        # to report topology
        if "numa_topology" not in resources:
            resources["numa_topology"] = None

        self._verify_resources(resources)

        self._report_hypervisor_resource_view(resources)

        return self._update_available_resource(context, resources)

In the code above, self._update_available_resource synchronizes the database record with the actual resource usage on the compute node; we will not expand on it here. self.driver.get_available_resource() is the call that retrieves the node's hardware resource information, and it actually resolves to:
    class LibvirtDriver(driver.ComputeDriver):

        def get_available_resource(self, nodename):
            """Retrieve resource information.

            This method is called when nova-compute launches, and
            as part of a periodic task that records the results in the DB.

            :param nodename: will be put in PCI device
            :returns: dictionary containing resource info
            """
            # Temporary: convert supported_instances into a string, while keeping
            # the RPC version as JSON. Can be changed when RPC broadcast is removed
            stats = self.get_host_stats(refresh=True)
            stats['supported_instances'] = jsonutils.dumps(
                stats['supported_instances'])
            return stats

        def get_host_stats(self, refresh=False):
            """Return the current state of the host.

            If 'refresh' is True, run update the stats first.
            """
            return self.host_state.get_host_stats(refresh=refresh)

        def _get_vcpu_total(self):
            """Get available vcpu number of physical computer.

            :returns: the number of cpu core instances can be used.
            """
            if self._vcpu_total != 0:
                return self._vcpu_total

            try:
                # The third element of getInfo() is the host CPU count
                total_pcpus = self._conn.getInfo()[2]
            except libvirt.libvirtError:
                LOG.warn(_LW("Cannot get the number of cpu, because this "
                             "function is not implemented for this platform. "))
                return 0

            if CONF.vcpu_pin_set is None:
                self._vcpu_total = total_pcpus
                return self._vcpu_total

            available_ids = hardware.get_vcpu_pin_set()
            if sorted(available_ids)[-1] >= total_pcpus:
                raise exception.Invalid(_("Invalid vcpu_pin_set config, "
                                          "out of hypervisor cpu range."))
            self._vcpu_total = len(available_ids)
            return self._vcpu_total

        ...

    class HostState(object):
        """Manages information about the compute node through libvirt."""

        def __init__(self, driver):
            super(HostState, self).__init__()
            self._stats = {}
            self.driver = driver
            self.update_status()

        def get_host_stats(self, refresh=False):
            """Return the current state of the host.

            If 'refresh' is True, run update the stats first.
            """
            if refresh or not self._stats:
                self.update_status()
            return self._stats

        def update_status(self):
            """Retrieve status info from libvirt."""
            ...
            data["vcpus"] = self.driver._get_vcpu_total()
            data["memory_mb"] = self.driver._get_memory_mb_total()
            data["local_gb"] = disk_info_dict['total']
            data["vcpus_used"] = self.driver._get_vcpu_used()
            data["memory_mb_used"] = self.driver._get_memory_mb_used()
            data["local_gb_used"] = disk_info_dict['used']
            data["hypervisor_type"] = self.driver._get_hypervisor_type()
            data["hypervisor_version"] = self.driver._get_hypervisor_version()
            data["hypervisor_hostname"] = self.driver._get_hypervisor_hostname()
            data["cpu_info"] = self.driver._get_cpu_info()
            data['disk_available_least'] = _get_disk_available_least()
            ...

Note the docstring of get_available_resource: it matches our initial inference exactly. From here on we follow only vcpus as an example of the statistics flow. self.driver._get_vcpu_total resolves to LibvirtDriver._get_vcpu_total (already shown in the code above). If the vcpu_pin_set option is not set, the resulting _vcpu_total is self._conn.getInfo()[2]. Here self._conn can be thought of as the libvirt adapter: it represents an abstract connection to the underlying virtualization tools such as kvm and qemu. getInfo() is a thin wrapper around libvirtmod.virNodeGetInfo; it returns a list whose third element is the number of CPUs. This is as far as we need to go, because the next layer down is libvirt's C code rather than Python.
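The default branch described above can be illustrated without a hypervisor. The following is a minimal sketch, not Nova's code: FakeConnection and total_pcpus are hypothetical names, and the sample values are made up. The only thing carried over is the layout of the list that getInfo() returns, where index 2 holds the CPU count.

```python
class FakeConnection:
    """Hypothetical stand-in for a libvirt connection object."""

    def getInfo(self):
        # Same field order as libvirt's node info list:
        # [model, memory_mb, cpus, mhz, numa_nodes, sockets, cores, threads]
        # The values are invented for illustration.
        return ["x86_64", 16384, 8, 2400, 1, 1, 4, 2]


def total_pcpus(conn):
    """Return the host CPU count, i.e. the third element of getInfo()."""
    return conn.getInfo()[2]


if __name__ == "__main__":
    print(total_pcpus(FakeConnection()))
```

With a real libvirt connection the call has the same shape; only the object behind conn differs.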
On the other hand, if vcpu_pin_set is configured, hardware.get_vcpu_pin_set parses the option into a set of usable CPU indices, and taking the length of that set gives the final number of vcpus.
That is the complete logic Nova uses to collect a node's hardware resource statistics, with vcpus as the example.
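To make the pinned branch concrete, here is a simplified sketch of the idea. It assumes a spec format of comma-separated ids, ranges such as 0-3, and exclusions prefixed with ^. The functions parse_pin_set and pinned_vcpu_total are hypothetical helpers written for this example, not the actual hardware.get_vcpu_pin_set implementation.

```python
def parse_pin_set(spec):
    """Parse a spec such as "0-3,^2,6" into a set of CPU ids.

    Hypothetical, simplified parser: supports single ids, "a-b" ranges,
    and "^"-prefixed exclusions.
    """
    included, excluded = set(), set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        target = included
        if part.startswith("^"):
            target, part = excluded, part[1:]
        if "-" in part:
            start, end = (int(x) for x in part.split("-", 1))
            target.update(range(start, end + 1))
        else:
            target.add(int(part))
    return included - excluded


def pinned_vcpu_total(spec, total_pcpus):
    """Count usable CPUs, rejecting ids outside the host's CPU range."""
    ids = parse_pin_set(spec)
    if not ids:
        raise ValueError("vcpu_pin_set selects no CPUs")
    # CPU ids are zero-based, so the largest valid id is total_pcpus - 1.
    if max(ids) >= total_pcpus:
        raise ValueError("vcpu_pin_set is out of hypervisor cpu range")
    return len(ids)


if __name__ == "__main__":
    print(sorted(parse_pin_set("0-3,^2,6")))
    print(pinned_vcpu_total("0-3,^2,6", 8))
```

The range check mirrors the comparison in _get_vcpu_total: because ids start at zero, an id equal to the host CPU count is already out of range.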