Implementing multi-table joins in MongoDB
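Contents
- Implementing multi-table joins in MongoDB
- 1. Iterating over the other collections and inserting into the current one
- 2. Optimization approaches
- 2.1. MongoDB's $lookup, i.e. the aggregation feature
- 2.2. MapReduce: distributed multi-table join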
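Implementing multi-table joins in MongoDB
How do you join collections that each hold tens of millions of documents?
1. Iterating over the other collections and inserting into the current one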
    from pymongo import MongoClient

    client = MongoClient("mongodb://192.168.123.64:27017/")
    db = client["gd_raw_data"]
    temp = db["temp"]

    # Collections to join onto temp; each match is stored as "<name>_info".
    joined_names = [
        "prplregistex", "repairfee", "prplcitemcar", "lossthirdparty_lossmain",
        "lossthirdparty", "lossmain", "citemkind", "check",
    ]

    # no_cursor_timeout stops the server from expiring the cursor during the long scan.
    cursor = temp.find({}, no_cursor_timeout=True)
    try:
        for doc in cursor:
            registno = doc["registno"]
            print("Claim report number: {}".format(registno))
            # One find_one round trip per joined collection, for every document.
            fields = {
                name + "_info": db[name].find_one({"registno": registno})
                for name in joined_names
            }
            temp.update_one({"registno": registno}, {"$set": fields})
    finally:
        # A cursor opened with no_cursor_timeout must be closed explicitly.
        cursor.close()
        client.close()

On my PC (6th-generation i7), this multi-table join over 17 million documents takes 125 hours, i.e. about 5 days and 5 nights, and the server tends to hang partway through.
2. Optimization approaches
Either use multiple threads or distribute the work.
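As one possible shape for the multithreaded option, the sketch below fans the per-document lookups out over a thread pool. It is an illustration under assumptions, not the code behind the timings in this post: `enrich`, `join_all`, `lookups`, and `save` are hypothetical names, and the lookups are plain callables so the idea can be tried without a server. With pymongo, each callable would wrap a `find_one` on one collection, and a single `MongoClient` can be shared across threads.

```python
from concurrent.futures import ThreadPoolExecutor


def enrich(registno, lookups):
    """Hypothetical helper: run every lookup for one registno.

    Returns the fields that would be passed to $set.
    """
    return {name + "_info": find(registno) for name, find in lookups.items()}


def join_all(registnos, lookups, save, workers=8):
    """Hypothetical helper: enrich many documents in parallel.

    save(registno, fields) is responsible for persisting each result.
    """
    def work(registno):
        save(registno, enrich(registno, lookups))
        return registno

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() yields results in input order; a worker's exception is
        # re-raised when its result is read.
        return list(pool.map(work, registnos))


# Assumed wiring with pymongo (db is a Database, names a list of collections):
#   lookups = {n: (lambda r, c=db[n]: c.find_one({"registno": r})) for n in names}
#   save = lambda r, f: db["temp"].update_one({"registno": r}, {"$set": f})
```

Because the work is dominated by network round trips, threads can overlap the waiting; how much that helps will depend on the server and on whether `registno` is indexed.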
2.1. MongoDB's $lookup, i.e. the aggregation feature
Before running this, be sure to create an index on the join field (registno here) in each collection being joined.
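A minimal sketch of that index step from Python, assuming a pymongo `Database` object is passed in as `db`. The helper name `ensure_join_indexes` is made up for illustration; `create_index` is pymongo's collection method, and calling it again for an index that already exists is harmless.

```python
def ensure_join_indexes(db, collection_names, field="registno"):
    """Hypothetical helper: index `field` (ascending) in each collection.

    `db` is assumed to behave like a pymongo Database.
    Returns the index names reported by create_index.
    """
    return [db[name].create_index(field) for name in collection_names]


# Assumed usage; the host and database name are placeholders:
#   from pymongo import MongoClient
#   db = MongoClient("mongodb://localhost:27017/")["mydb"]
#   ensure_join_indexes(db, ["lida", "prpldriver", "prplinjured",
#                            "prplinsured", "regist"])
```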
    db.getCollection("prplcmain").aggregate(
        [
            // Each $lookup matches on registno and stores the matches in the "as" array.
            {"$lookup": {"from": "lida", "localField": "registno", "foreignField": "registno", "as": "carinfo"}},
            {"$lookup": {"from": "prpldriver", "localField": "registno", "foreignField": "registno", "as": "prpldriver"}},
            {"$lookup": {"from": "prplinjured", "localField": "registno", "foreignField": "registno", "as": "prplinjured"}},
            {"$lookup": {"from": "prplinsured", "localField": "registno", "foreignField": "registno", "as": "prplinsured"}},
            {"$lookup": {"from": "regist", "localField": "registno", "foreignField": "registno", "as": "regist"}},
            // Write the joined documents to the "total" collection.
            {"$out": "total"}
        ],
        {"allowDiskUse": true}
    );

On the same hardware, this finishes in under 2 hours.
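The same pipeline can also be assembled from Python, which avoids repeating the `$lookup` stage by hand. This is a sketch: `build_lookup_pipeline` is a hypothetical helper, not part of pymongo, and it only builds the list of stages. The collection names and the `total` output collection are the ones from the shell command.

```python
def build_lookup_pipeline(sources, key="registno", out="total"):
    """Hypothetical helper: one $lookup stage per source, then $out.

    `sources` maps a foreign collection name to the output array field.
    """
    stages = [
        {"$lookup": {"from": name, "localField": key,
                     "foreignField": key, "as": alias}}
        for name, alias in sources.items()
    ]
    stages.append({"$out": out})
    return stages


pipeline = build_lookup_pipeline({
    "lida": "carinfo",
    "prpldriver": "prpldriver",
    "prplinjured": "prplinjured",
    "prplinsured": "prplinsured",
    "regist": "regist",
})

# Assumed usage with a pymongo Database `db`:
#   db["prplcmain"].aggregate(pipeline, allowDiskUse=True)
```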
2.2. MapReduce: distributed multi-table join
I have not studied this approach thoroughly yet.
https://stackoverflow.com/questions/38882184/join-two-collections-with-mapreduce-in-mongodb
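For intuition only, here is a pure-Python sketch of the general map-reduce join pattern: each collection is mapped to (join key, partial document) pairs, and the reduce step merges all partials that share a key. It does not call MongoDB's `mapReduce` command and says nothing about its performance; the function names and sample records are invented for illustration.

```python
from collections import defaultdict


def map_phase(collection_name, docs, key="registno"):
    """Emit (join key, partial document) pairs for one collection."""
    for doc in docs:
        yield doc[key], {collection_name: [doc]}


def reduce_phase(pairs):
    """Merge all partials sharing a key into one joined document."""
    joined = defaultdict(lambda: defaultdict(list))
    for key, partial in pairs:
        for name, docs in partial.items():
            joined[key][name].extend(docs)
    return {key: dict(parts) for key, parts in joined.items()}


# Invented sample records standing in for two collections.
regist = [{"registno": "A1", "date": "2019-01-01"}]
prpldriver = [{"registno": "A1", "name": "x"}, {"registno": "B2", "name": "y"}]

pairs = list(map_phase("regist", regist)) + list(map_phase("prpldriver", prpldriver))
result = reduce_phase(pairs)
```

In a distributed setting the map phase would run separately over each collection (or shard) and only the reduce step would need to see all pairs for a given key.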