
How Nova Collects Hardware Resource Statistics for Compute Nodes

Published: 2025/6/15


Introduction

When we use cloud platforms built on OpenStack, the overview page usually has a prominent area showing the cluster's current resource usage: the total, used, and remaining amounts of CPU, memory, disk, and so on. Whenever we scale out the cluster, the totals on the overview page grow automatically. We all know that the Nova service in OpenStack manages these compute resources, but have you ever wondered how Nova obtains them?

How Nova collects resource statistics

Resource collection is an internal Nova mechanism. Given how much later operations (such as creating virtual machines or disks) depend on its results, we can infer that it must run early, before the service takes on any other work.

With that simple analysis and some debugging, we find that the mechanism is triggered in the nova.service.WSGIService.start method:

    def start(self):
        """Start serving this service using loaded configuration.

        Also, retrieve updated port number in case '0' was passed in, which
        indicates a random port should be used.

        :returns: None

        """
        if self.manager:
            self.manager.init_host()
            self.manager.pre_start_hook()
            if self.backdoor_port is not None:
                self.manager.backdoor_port = self.backdoor_port
        self.server.start()
        if self.manager:
            self.manager.post_start_hook()
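The ordering in that method is the key point, so here is a minimal sketch that only records the order in which the hooks run. RecordingManager, RecordingServer, and start_service are hypothetical stand-ins written for this illustration, not Nova classes.

```python
class RecordingManager:
    """Hypothetical stand-in for a Nova manager; records which hooks ran."""

    def __init__(self):
        self.calls = []

    def init_host(self):
        self.calls.append("init_host")

    def pre_start_hook(self):
        # In the flow traced here, this is where resources are first collected.
        self.calls.append("pre_start_hook")

    def post_start_hook(self):
        self.calls.append("post_start_hook")


class RecordingServer:
    """Hypothetical stand-in for the server object started by the service."""

    def __init__(self, calls):
        self.calls = calls

    def start(self):
        self.calls.append("server.start")


def start_service(manager, server):
    """Mirror the call order of the start() method quoted above."""
    manager.init_host()
    manager.pre_start_hook()
    server.start()
    manager.post_start_hook()


manager = RecordingManager()
start_service(manager, RecordingServer(manager.calls))
print(manager.calls)
```

The resource collection hook runs before the server starts accepting requests, which is consistent with the inference made earlier.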

Here, self.manager.pre_start_hook() is the call that fetches the resource information. It resolves to nova.compute.manager.ComputeManager.pre_start_hook, shown below:

    def pre_start_hook(self):
        """After the service is initialized, but before we fully bring
        the service up by listening on RPC queues, make sure to update
        our available resources (and indirectly our available nodes).
        """
        self.update_available_resource(nova.context.get_admin_context())

    ...

    @periodic_task.periodic_task
    def update_available_resource(self, context):
        """See driver.get_available_resource()

        Periodic process that keeps that the compute host's understanding of
        resource availability and usage in sync with the underlying hypervisor.

        :param context: security context
        """
        new_resource_tracker_dict = {}
        nodenames = set(self.driver.get_available_nodes())
        for nodename in nodenames:
            rt = self._get_resource_tracker(nodename)
            rt.update_available_resource(context)
            new_resource_tracker_dict[nodename] = rt

        # Delete orphan compute node not reported by driver but still in db
        compute_nodes_in_db = self._get_compute_nodes_in_db(context,
                                                            use_slave=True)

        for cn in compute_nodes_in_db:
            if cn.hypervisor_hostname not in nodenames:
                LOG.audit(_("Deleting orphan compute node %s") % cn.id)
                cn.destroy()

        self._resource_tracker_dict = new_resource_tracker_dict
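The periodic task above does two things: it refreshes one tracker per node the driver reports, and it removes database records for nodes the driver no longer reports. A simplified sketch of that bookkeeping follows, using plain dicts and sets in place of Nova's tracker objects and database; FakeTracker and refresh_trackers are hypothetical names introduced only for this example.

```python
class FakeTracker:
    """Hypothetical per-node tracker; counts how often it was refreshed."""

    def __init__(self, nodename):
        self.nodename = nodename
        self.updates = 0

    def update_available_resource(self):
        self.updates += 1


def refresh_trackers(reported_nodes, trackers, db_nodes):
    """Refresh one tracker per reported node and drop orphaned DB records.

    reported_nodes: node names the driver currently reports
    trackers: dict of existing trackers, keyed by node name
    db_nodes: set of node names currently stored in the database
    Returns (new tracker dict, surviving DB node names).
    """
    nodenames = set(reported_nodes)
    new_trackers = {}
    for name in nodenames:
        tracker = trackers.get(name) or FakeTracker(name)
        tracker.update_available_resource()
        new_trackers[name] = tracker
    # Records in the DB that the driver no longer reports are orphans.
    surviving = {name for name in db_nodes if name in nodenames}
    return new_trackers, surviving


trackers, db_nodes = refresh_trackers(
    reported_nodes=["node-a", "node-b"],
    trackers={},
    db_nodes={"node-a", "node-b", "node-gone"},
)
print(sorted(trackers), sorted(db_nodes))
```

Rebuilding the tracker dict on every pass, rather than mutating it in place, means trackers for nodes that have disappeared are dropped without extra cleanup code.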

The rt.update_available_resource() call in the code above resolves to nova.compute.resource_tracker.ResourceTracker.update_available_resource(), shown below:

    def update_available_resource(self, context):
        """Override in-memory calculations of compute node resource usage based
        on data audited from the hypervisor layer.

        Add in resource claims in progress to account for operations that have
        declared a need for resources, but not necessarily retrieved them from
        the hypervisor layer yet.
        """
        LOG.audit(_("Auditing locally available compute resources"))
        resources = self.driver.get_available_resource(self.nodename)

        if not resources:
            # The virt driver does not support this function
            LOG.audit(_("Virt driver does not support "
                        "'get_available_resource' Compute tracking is disabled."))
            self.compute_node = None
            return
        resources['host_ip'] = CONF.my_ip

        # TODO(berrange): remove this once all virt drivers are updated
        # to report topology
        if "numa_topology" not in resources:
            resources["numa_topology"] = None

        self._verify_resources(resources)

        self._report_hypervisor_resource_view(resources)

        return self._update_available_resource(context, resources)

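In the code above, self._update_available_resource synchronizes the database record with the resources actually in use on the compute node; we will not expand on it here. self.driver.get_available_resource() is the call that obtains the node's hardware resource information. With the libvirt driver, it resolves to: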

    class LibvirtDriver(driver.ComputeDriver):

        def get_available_resource(self, nodename):
            """Retrieve resource information.

            This method is called when nova-compute launches, and
            as part of a periodic task that records the results in the DB.

            :param nodename: will be put in PCI device
            :returns: dictionary containing resource info
            """
            # Temporary: convert supported_instances into a string, while keeping
            # the RPC version as JSON. Can be changed when RPC broadcast is removed
            stats = self.get_host_stats(refresh=True)
            stats['supported_instances'] = jsonutils.dumps(
                stats['supported_instances'])
            return stats

        def get_host_stats(self, refresh=False):
            """Return the current state of the host.

            If 'refresh' is True, run update the stats first.
            """
            return self.host_state.get_host_stats(refresh=refresh)

        def _get_vcpu_total(self):
            """Get available vcpu number of physical computer.

            :returns: the number of cpu core instances can be used.
            """
            if self._vcpu_total != 0:
                return self._vcpu_total

            try:
                total_pcpus = self._conn.getInfo()[2]
            except libvirt.libvirtError:
                LOG.warn(_LW("Cannot get the number of cpu, because this "
                             "function is not implemented for this platform. "))
                return 0

            if CONF.vcpu_pin_set is None:
                self._vcpu_total = total_pcpus
                return self._vcpu_total

            available_ids = hardware.get_vcpu_pin_set()
            if sorted(available_ids)[-1] >= total_pcpus:
                raise exception.Invalid(_("Invalid vcpu_pin_set config, "
                                          "out of hypervisor cpu range."))
            self._vcpu_total = len(available_ids)
            return self._vcpu_total

        .....

    class HostState(object):
        """Manages information about the compute node through libvirt."""

        def __init__(self, driver):
            super(HostState, self).__init__()
            self._stats = {}
            self.driver = driver
            self.update_status()

        def get_host_stats(self, refresh=False):
            """Return the current state of the host.

            If 'refresh' is True, run update the stats first.
            """
            if refresh or not self._stats:
                self.update_status()
            return self._stats

        def update_status(self):
            """Retrieve status info from libvirt."""
            ...
            data["vcpus"] = self.driver._get_vcpu_total()
            data["memory_mb"] = self.driver._get_memory_mb_total()
            data["local_gb"] = disk_info_dict['total']
            data["vcpus_used"] = self.driver._get_vcpu_used()
            data["memory_mb_used"] = self.driver._get_memory_mb_used()
            data["local_gb_used"] = disk_info_dict['used']
            data["hypervisor_type"] = self.driver._get_hypervisor_type()
            data["hypervisor_version"] = self.driver._get_hypervisor_version()
            data["hypervisor_hostname"] = self.driver._get_hypervisor_hostname()
            data["cpu_info"] = self.driver._get_cpu_info()
            data['disk_available_least'] = _get_disk_available_least()
            ...
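HostState.get_host_stats only re-collects when refresh is true or nothing has been cached yet. The following sketch isolates that caching behaviour. CachedHostState and collect are hypothetical names, and the stats values are made up for illustration.

```python
class CachedHostState:
    """Simplified model of the refresh-or-cache behaviour of HostState."""

    def __init__(self, collect):
        # `collect` is a hypothetical callable that returns a stats dict.
        self._collect = collect
        self._stats = {}
        self.update_status()

    def update_status(self):
        self._stats = self._collect()

    def get_host_stats(self, refresh=False):
        # Re-collect when asked to, or when nothing has been cached yet.
        if refresh or not self._stats:
            self.update_status()
        return self._stats


calls = []


def collect():
    """Pretend to query the hypervisor and remember that we were called."""
    calls.append(1)
    return {"vcpus": 8, "memory_mb": 16384}


state = CachedHostState(collect)
state.get_host_stats()              # served from the cache
state.get_host_stats(refresh=True)  # forces a new collection
print(len(calls))
```

Because get_available_resource always passes refresh=True, every startup and every periodic run re-queries the hypervisor rather than reusing cached values.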

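Note the docstring of get_available_resource: it matches our initial inference exactly. From here on we follow the flow for vcpus only. self.driver._get_vcpu_total resolves to LibvirtDriver._get_vcpu_total (shown in the code above). If the vcpu_pin_set option is not set, _vcpu_total is simply self._conn.getInfo()[2]. Here self._conn can be thought of as an adapter for libvirt, an abstract connection to underlying virtualization tools such as KVM and QEMU. getInfo() is a thin wrapper around libvirtmod.virNodeGetInfo, and it returns a list whose third element is the number of CPUs on the host. That is far enough for our purposes: below this point we are in libvirt's C code rather than Python.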
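To make the index 2 lookup concrete, here is a small sketch that names each position of the list returned by getInfo(). The field order follows my understanding of the libvirt Python binding and should be checked against the libvirt documentation. StubConnection is a hypothetical stand-in that returns made-up values, so the example runs without libvirt installed.

```python
# Assumed field order of the list returned by libvirt's Python getInfo().
NODE_INFO_FIELDS = (
    "model", "memory_mb", "cpus", "mhz",
    "nodes", "sockets", "cores", "threads",
)


class StubConnection:
    """Hypothetical stand-in for a libvirt connection; values are made up."""

    def getInfo(self):
        return ["x86_64", 16384, 8, 2400, 1, 1, 4, 2]


def describe_node(conn):
    """Pair each getInfo() value with its field name."""
    return dict(zip(NODE_INFO_FIELDS, conn.getInfo()))


info = StubConnection().getInfo()
print(info[2])                                   # the value Nova reads
print(describe_node(StubConnection())["cpus"])   # same value, by name
```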

On the other hand, if vcpu_pin_set is configured, hardware.get_vcpu_pin_set parses it into a set of usable CPU indexes, and the size of that set gives the final vcpus count.
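The two branches can be sketched as follows. parse_pin_set is a simplified, hypothetical parser written for this example and is not Nova's hardware.get_vcpu_pin_set. It assumes a spec made of comma-separated entries, where "a-b" is an inclusive range and "^n" excludes a single index. vcpu_total is likewise a hypothetical helper.

```python
def parse_pin_set(spec):
    """Parse a spec such as "0-3,^2,6" into a set of CPU indexes.

    Simplified, hypothetical parser: comma-separated entries, "a-b" for an
    inclusive range, "^n" to exclude a single index.
    """
    included, excluded = set(), set()
    for entry in spec.split(","):
        entry = entry.strip()
        if not entry:
            continue
        if entry.startswith("^"):
            excluded.add(int(entry[1:]))
        elif "-" in entry:
            start, end = (int(part) for part in entry.split("-", 1))
            included.update(range(start, end + 1))
        else:
            included.add(int(entry))
    return included - excluded


def vcpu_total(total_pcpus, pin_set_spec=None):
    """Follow the two branches of _get_vcpu_total described above."""
    if pin_set_spec is None:
        # No pin set configured: every host CPU counts.
        return total_pcpus
    available_ids = parse_pin_set(pin_set_spec)
    if max(available_ids) >= total_pcpus:
        raise ValueError("pin set is outside the host CPU range")
    # Pin set configured: only the listed CPUs count.
    return len(available_ids)


print(vcpu_total(8))
print(vcpu_total(8, "0-3,^2,6"))
```

The range check compares the highest index against total_pcpus because CPU indexes start at 0, so index 8 would be invalid on an 8-CPU host.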

That is the whole logical flow by which Nova collects a node's hardware resource statistics, using vcpus as the example.
