
Python tag counts: an example of batch-counting each class in XML annotation tags with Python

Published: 2025/3/15  python  13  豆豆

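This article, collected and compiled by 生活随笔, presents "Python tag counts: an example of batch-counting each class in XML annotation tags with Python". The editor found it useful and shares it here for reference.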

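This article walks through a Python script that batch-analyzes XML annotation files and counts how many objects of each class they contain, shared here for reference.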
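For orientation, here is a minimal sketch of the per-file step, assuming Pascal VOC-style annotations in which every `<object>` element has a `<name>` child (the layout that the script's `root.findall('object')` and `obj.find('name')` calls expect). The sample annotation and the `count_classes` helper are hypothetical, and `collections.Counter` stands in for manual dictionary bookkeeping:

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical Pascal VOC-style annotation, for illustration only
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <object><name>dog</name></object>
  <object><name>person</name></object>
  <object><name>dog</name></object>
</annotation>"""


def count_classes(xml_text: str) -> Counter:
    """Count how many <object> entries of each <name> appear in one annotation."""
    root = ET.fromstring(xml_text)
    # findall('object') only matches direct children of the root <annotation>
    return Counter(obj.findtext('name') for obj in root.findall('object'))


print(count_classes(SAMPLE_XML))  # Counter({'dog': 2, 'person': 1})
```

Per-file `Counter` objects can be merged with `+` or `Counter.update`, which plays the same role as `collect_result` in the script.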


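I needed a script to count the objects of each class and took the chance to practice multiprocessing. It is for my own use, so here is the code straight away: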

```python
# -*- coding: utf-8 -*-
# @Time : 2019/06/10 18:56
# @Author : TuanZhangSama
import os
import logging
import xml.etree.ElementTree as ET
from multiprocessing import Pool, freeze_support, cpu_count


def get_all_xml_path(xml_dir: str, exts=('.xml',)):
    # Walk the folder recursively and collect every XML file
    result = []
    # maindir: directory being searched; subdirs: its sub-folders; file_name_list: its files
    for maindir, subdirs, file_name_list in os.walk(xml_dir):
        for filename in file_name_list:
            ext = os.path.splitext(filename)[1]  # file extension, e.g. '.xml'
            if ext in exts:
                result.append(os.path.join(maindir, filename))
    return result


def analysis_xml(xml_path: str):
    # The image must sit next to its XML with the same base name
    jpg_path = os.path.splitext(xml_path)[0] + '.jpg'
    # Missing or truncated images are deleted (together with their XML) and not counted
    if not is_valid_jpg(jpg_path):
        return {}
    # imghdr was removed in Python 3.13, so check the JPEG start-of-image marker FF D8 instead
    with open(jpg_path, 'rb') as f:
        if f.read(2) != b'\xff\xd8':
            print(jpg_path, 'is not a JPEG file')
    root = ET.parse(xml_path).getroot()
    result_dict = {}
    for obj in root.findall('object'):
        obj_name = obj.find('name').text
        result_dict[obj_name] = result_dict.get(obj_name, 0) + 1
    return result_dict


def analysis_xmls_batch(xmls_path_list: list):
    # Work done by one process: analyse its share of the XML files
    result_list = []
    for xml_path in xmls_path_list:
        result_list.append(analysis_xml(xml_path))
    return result_list


def collect_result(result_list: list):
    # Merge the per-file dicts into one {class name: total count} dict
    all_result_dict = {}
    for result_dict in result_list:
        for key, value in result_dict.items():
            all_result_dict[key] = all_result_dict.get(key, 0) + value
    return all_result_dict


def main(xml_dir: str, result_save_path: str = None):
    r'''Count the objects of each class over all samples, using the XML files.
    Samples whose image is incomplete, or that have an XML but no image, are
    deleted outright. Uses all CPU cores by default.

    Parameters
    ----------
    xml_dir : str
        Folder containing the XML files. It is searched recursively, so the XML
        files only need to be somewhere below this directory. Each image must be
        in the same directory as its XML file.
    result_save_path : str
        Path of the log file for the analysis results. Default None: no log file.
    '''
    if result_save_path is not None:
        assert isinstance(result_save_path, str), '{} is illegal path'.format(result_save_path)
        # Write INFO-level logs to the given file
        logging.basicConfig(filename=result_save_path, filemode='w', level=logging.INFO)
    xmls_path = get_all_xml_path(xml_dir)
    worker_num = cpu_count()
    print('your CPU num is', worker_num)
    length = float(len(xmls_path)) / float(worker_num)
    # Compute split indices so the file list is divided as evenly as possible
    indices = [int(round(i * length)) for i in range(worker_num + 1)]
    # The sub-list of files each process will handle
    sublists = [xmls_path[indices[i]:indices[i + 1]] for i in range(worker_num)]
    pool = Pool(processes=worker_num)
    all_process_result_list = []
    for i in range(worker_num):
        all_process_result_list.append(pool.apply_async(analysis_xmls_batch, args=(sublists[i],)))
    pool.close()
    pool.join()
    print('analysis done!')
    _temp_list = []
    for async_result in all_process_result_list:
        _temp_list.extend(async_result.get())
    result = collect_result(_temp_list)
    logging.info(result)
    print(result)


def is_valid_jpg(jpg_file):
    """Check whether a JPG file was downloaded completely (ends with the EOI marker FF D9).

    Side effect: a missing or truncated image is deleted together with its XML.
    """
    xml_file = os.path.splitext(jpg_file)[0] + '.xml'
    if not os.path.exists(jpg_file):
        print(jpg_file, 'does not exist')
        os.remove(xml_file)
        return False
    with open(jpg_file, 'rb') as fr:
        try:
            fr.seek(-2, os.SEEK_END)
        except OSError:  # file shorter than 2 bytes
            fr.seek(0)
        complete = fr.read() == b'\xff\xd9'
    if complete:
        return True
    # Delete only after the file handle is closed (Windows refuses to remove open files)
    os.remove(jpg_file)
    os.remove(xml_file)
    print(jpg_file, 'is an incomplete image')
    logging.error('%s is an incomplete image', jpg_file)
    return False


if __name__ == '__main__':
    freeze_support()  # needed only for frozen Windows executables; a no-op otherwise
    test_dir = '/home/chiebotgpuhq/Share/winshare/origin'
    save_path = '/home/chiebotgpuhq/MyCode/python/pytorch/mmdetection-master/result.log'
    main(test_dir, save_path)
```
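The manual index arithmetic in `main()` is one way to balance the work across processes; `multiprocessing.Pool.map` already splits its input into chunks on its own, and `collections.Counter` can replace the `dict.get` bookkeeping in `collect_result`. The sketch below shows that alternative design under stated assumptions: Pascal VOC-style `<object>/<name>` annotations, hypothetical helper names `count_file` and `count_dir`, and no image checks or file deletion, so it only reports counts.

```python
import os
import tempfile
import xml.etree.ElementTree as ET
from collections import Counter
from multiprocessing import Pool


def count_file(xml_path: str) -> Counter:
    """Per-file task (hypothetical helper): count <object>/<name> entries."""
    root = ET.parse(xml_path).getroot()
    return Counter(obj.findtext('name') for obj in root.findall('object'))


def count_dir(xml_dir: str, workers: int = 0) -> Counter:
    """Count classes over every .xml file below xml_dir (hypothetical helper).

    workers=0 lets Pool use all CPU cores; workers=1 runs in-process.
    Unlike the script above, no files are checked or deleted.
    """
    paths = [os.path.join(d, f)
             for d, _, files in os.walk(xml_dir)
             for f in files if f.endswith('.xml')]
    if workers == 1:
        per_file = [count_file(p) for p in paths]
    else:
        # Pool.map chops `paths` into chunks itself, so no manual index math is needed
        with Pool(processes=workers or None) as pool:
            per_file = pool.map(count_file, paths)
    total = Counter()
    for c in per_file:
        total.update(c)  # Counter.update adds counts rather than replacing them
    return total


if __name__ == '__main__':
    # Build a throwaway directory with two hypothetical annotation files
    with tempfile.TemporaryDirectory() as tmp:
        samples = {'a.xml': ['dog', 'cat'], 'b.xml': ['dog']}
        for name, classes in samples.items():
            objs = ''.join(f'<object><name>{c}</name></object>' for c in classes)
            with open(os.path.join(tmp, name), 'w') as f:
                f.write(f'<annotation>{objs}</annotation>')
        print(count_dir(tmp))
```

Passing `workers=1` keeps all the work in the current process, which gives ordinary tracebacks while debugging. For very many small files, `pool.map(..., chunksize=N)` sets the chunk size explicitly instead of relying on the default.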


I hope this article helps with your Python programming.
