django model filter_django中探索如何提高查询数据效率
平時用到django時候挺多, 最近有在搞數據庫表分區的實踐, 所以順便嘗試在django如何提高查詢效率, 也就嘗試在django中加入線程和協成的代碼, 看一下效果
相關工具
django + pycharm + mysql + postman創建django項目, 具體不多說了
1401 django-admin.py startproject db1405 cd db1413 python manage.py startapp cs我獲得文件夾樹
接著我們在cs/models.py編輯一個People模型
from django.db import models class People(models.Model):name = models.CharField(max_length=10, help_text='姓名')sex = models.SmallIntegerField(default=0, help_text="性別")age = models.SmallIntegerField(default=0, help_text='年齡')address = models.CharField(default='', max_length=128, help_text='地址')然后我們生成對應表
1416 python manage.py makemigrations1417 python manage.py migrate cs接下來我們創建了600w的假數據,
在/cs/management/commands/create_people.py 創建一個django的腳本
from django.core.management.base import BaseCommand from cs.models import People import randomclass Command(BaseCommand):""""""def handle(self, *args, **options):spe = 500objs = list()for i in range(6000000):age = random.randint(1, 70)sex = random.randint(0, 1)objs.append(People(name=str(i), age=age, sex=sex))if i % 500 != 0:continuePeople.objects.bulk_create(objs)objs = list()print(i)然后運行, 生成數據
python manage.py create_people這樣我們600w的測試數據就創建完成了
from django.shortcuts import render from django.views import View from django.http.response import JsonResponse from cs.models import People import gevent from gevent.pool import Pool import asyncio from concurrent.futures import ThreadPoolExecutor, as_completed from django.db.transaction import atomic# Create your views here. import timeclass DbViews(View):def get_count(self, age):return age, People.objects.filter(age=age).count()async def gets(self, age):count = People.objects.filter(age=age).countreturn count()async def get_async_count(self, age):print("***age***", age, time.time())age_count = await self.gets(age)print("***age***", age, time.time())return age, age_countdef gevent_pool(self, ages):ages_detail = dict()pool = Pool()ret = list()for age in ages:t = pool.spawn(self.get_count, age)ret.append(t)gevent.joinall(ret)for t in ret:age, age_count = t.valueages_detail.update({age: age_count})return ages_detaildef cur_get(self, ages):ages_detail = dict()for age in ages:age_count = People.objects.filter(age=age).count()ages_detail.update({age: age_count})return ages_detaildef thead_pool(self, ages):ages_detail = dict()thead_poll = ThreadPoolExecutor(max_workers=3)ret = list()for age in ages:t = thead_poll.submit(self.get_count, age)ret.append(t)for t in as_completed(ret):age, age_count = t.result()ages_detail.update({age: age_count})return ages_detaildef async_get(self, ages):ages_detail = dict()loop = asyncio.new_event_loop()asyncio.set_event_loop(loop)ret = list()for age in ages:t = asyncio.ensure_future(self.get_async_count(age))ret.append(t)done, pending = loop.run_until_complete(asyncio.wait(ret))for t in done:age, age_count = t.result()ages_detail.update({age: age_count})return ages_detaildef hclock(self):people = People.objects.select_for_update().filter(id__in=[3,4])with atomic():for i in people:time.sleep(10)print("*******",i.id)i.age += 1i.save()time.sleep(10)def get(self, request):ages = [10, 67, 29]ages_detail = self.hclock()ages_detail = dict()return JsonResponse(ages_detail)這里是全部view代碼, 下面我們會一個說明
然后我們在/db/urls.py配置一個接口路徑people
from django.conf.urls import url from django.contrib import admin from cs.views import DbViewsurlpatterns = [url('people/', DbViews.as_view(), name='cs') ]啟動web服務
我們假定我們的需求, 我們需要獲取10, 67, 29歲三種年齡的人數, 我們先使用查詢三次數據庫的正常方式
def get_count(self, age):return age, People.objects.filter(age=age).count()def cur_get(self, ages):ages_detail = dict()for age in ages:age_count = People.objects.filter(age=age).count()ages_detail.update({age: age_count})return ages_detaildef get(self, request):ages = [10, 67, 29]ages_detail = self.cur_get(ages)我們得到了
消耗了2419ms
接下來我們使用gevent 方式
def gevent_pool(self, ages):ages_detail = dict()pool = Pool()ret = list()for age in ages:t = pool.spawn(self.get_count, age)ret.append(t)gevent.joinall(ret)for t in ret:age, age_count = t.valueages_detail.update({age: age_count})return ages_detaildef get(self, request):ages = [10, 67, 29]ages_detail = self.gevent_pool(ages)return JsonResponse(ages_detail)我們得到了
很明顯的請求的時間沒有多大差別, 這可能models的count不支持的協成操作, 在搞過gevent下載, 可以使用monkey更改一下底層的一些東西
于是加入了
from gevent import monkey monkey.patch_all()但是出現了錯誤
During handling of the above exception, another exception occurred:Traceback (most recent call last):File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/utils/autoreload.py", line 227, in wrapperfn(*args, **kwargs)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/management/commands/runserver.py", line 125, in inner_runself.check(display_num_errors=True)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/management/base.py", line 359, in checkinclude_deployment_checks=include_deployment_checks,File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/management/base.py", line 346, in _run_checksreturn checks.run_checks(**kwargs)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/checks/registry.py", line 81, in run_checksnew_errors = check(app_configs=app_configs)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/checks/urls.py", line 16, in check_url_configreturn check_resolver(resolver)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/core/checks/urls.py", line 26, in check_resolverreturn check_method()File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/urls/resolvers.py", line 254, in checkfor pattern in self.url_patterns:File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/utils/functional.py", line 35, in __get__res = instance.__dict__[self.name] = self.func(instance)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/urls/resolvers.py", line 405, in url_patternspatterns = getattr(self.urlconf_module, "urlpatterns", self.urlconf_module)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/utils/functional.py", line 35, in __get__res = instance.__dict__[self.name] = self.func(instance)File "/Users/lixiang/projects/py3/lib/python3.6/site-packages/django/urls/resolvers.py", line 398, in urlconf_modulereturn import_module(self.urlconf_name)File "/Users/lixiang/projects/py3/lib/python3.6/importlib/__init__.py", line 126, in import_modulereturn _bootstrap._gcd_import(name[level:], package, level)File "<frozen importlib._bootstrap>", line 994, in _gcd_importFile "<frozen importlib._bootstrap>", line 971, in _find_and_loadFile "<frozen importlib._bootstrap>", line 152, in __exit__File "<frozen importlib._bootstrap>", line 107, in release RuntimeError: cannot release un-acquired lock看來django不讓這樣更改, 我們有嘗試了async這種方式
async def get_async_count(self, age):print("***age***", age, time.time())age_count = await self.gets(age)print("***age***", age, time.time())def async_get(self, ages):ages_detail = dict()loop = asyncio.new_event_loop()asyncio.set_event_loop(loop)ret = list()for age in ages:t = asyncio.ensure_future(self.get_async_count(age))ret.append(t)done, pending = loop.run_until_complete(asyncio.wait(ret))for t in done:age, age_count = t.result()ages_detail.update({age: age_count})return ages_detaildef get(self, request):ages = [10, 67, 29]ages_detail = self.async_get(ages)return JsonResponse(ages_detail)得到結果
還是沒有什么差異, 看來協成沒有生效, 于是我們在調用協程時候打印了一下
***age*** 10 1560943042.4748821 ***age*** 10 1560943043.3534641 ***age*** 67 1560943043.3534899 ***age*** 67 1560943044.1488252 ***age*** 29 1560943044.148851 INFO 2019-06-19 11:17:24,987 basehttp 3413 123145324937216 "GET /people/ HTTP/1.1" 200 39 ('GET /people/ HTTP/1.1', '200', '39') ***age*** 29 1560943044.9875998很明顯執行的時候是阻塞了, 看來orm的方法執行時無法使用協成
現在我們使用一下線程
def thead_pool(self, ages):ages_detail = dict()thead_poll = ThreadPoolExecutor(max_workers=3)ret = list()for age in ages:t = thead_poll.submit(self.get_count, age)ret.append(t)for t in as_completed(ret):age, age_count = t.result()ages_detail.update({age: age_count})return ages_detail def get(self, request):ages = [10, 67, 29]ages_detail = self.thead_pool(ages)return JsonResponse(ages_detail)得到結果
這里響應時間是857ms, 比以上的大約2500ms方法快了大約1600ms,看來在django中對于多個查詢來說, 使用線程池是比較有效的提高訪問效率的方式
我們對于600w的數據也可以使用分區提高效率, 我們先把原來數據表cs_people刪除, 按age分區在重新導入數據
drop table cs_people create table cs_people(`id` INT AUTO_INCREMENT,`name` varchar(10),`sex` smallint,`age` smallint,`address` varchar(128),PRIMARY KEY (`id`, `age`))PARTITION by range COLUMNS(age)(partition p0 values less than(10),partition p1 values less than(20),partition p2 values less than(30),partition p3 values less than(40),partition p4 values less than(50),partition p6 values less than(60),partition p7 values less than MAXVALUE);我們可以查看分區情況:
select PARTITION_NAME, TABLE_ROWS from INFORMATION_SCHEMA.PARTITIONS WHERE TABLE_NAME='cs_people';我們還可以查看索引情況
show index from cs_people我們在生成數據的腳本中添加了批量創建的腳本, 使用了django自動bulk_create所以創建很快
其實在創建分區的時候我們已經將age設置為索引了, 一會我們在說只索引的情況
我們看一下各種方式的結果, 因為協程方式沒有生效,我們就不討論了
正常方式, 串行三次請求, 時間524ms(這個地方忘記保存截圖了)
線程池請求, 時間215ms
很明顯這兩種方式有很大的提升, 這是因為mysql將按照不同age存在到了不到分區表里, 我們在查詢不同age時, mysql可以智能的到不同的小分區中查找,一下子將查詢集縮小了數倍, 這里有索引的一些功勞,我們接下來在看看只有索引的情況下,效果是怎么樣的
我們刪除分區表穿件索引表,這里和分區創建表的時候就不在使用migrate的了
創建索引
查看索引
正常方式 81ms
線程池,84ms
結果發現對于600w的數據,分區會有些額外開銷, 造成了訪問時間多余只構建索引的情況, 而且正常和線程池這兩種方式時間大致相同
但是后來我增加了只構建索引這種情況的數據量,到達了1200w
線程池: 73ms
正常, 111ms
這樣看來相對數據量大情況還是線程池效率比較高
總結
以上是生活随笔為你收集整理的django model filter_django中探索如何提高查询数据效率的全部內容,希望文章能夠幫你解決所遇到的問題。
- 上一篇: python将字典导入excel_pyt
- 下一篇: feign直接走熔断_四、Spring