Seeing Through Fog Without Seeing Fog: Deep Multimodal Sensor Fusion in Unseen Adverse Weather (translated)
Abstract:
The fusion of multimodal sensor streams, such as camera, lidar, and radar measurements, plays a critical role in object detection for autonomous vehicles, which base their decision making on these inputs. While existing methods exploit redundant information under good environmental conditions, they fail in adverse weather, where the sensory streams can be asymmetrically distorted. These rare "edge-case" scenarios are not represented in available datasets, and existing fusion architectures are not designed to handle them. To address this challenge, we present a novel multimodal dataset acquired over more than 10,000 km of driving in northern Europe. Although this dataset is the first large multimodal dataset in adverse weather, with 100k labels for lidar, camera, radar, and gated NIR sensors, it does not facilitate training, as extreme weather is rare. To this end, we present a deep fusion network for robust fusion that does not require a large corpus of labeled training data covering all asymmetric distortions. Departing from proposal-level fusion, we propose a single-shot model that adaptively fuses features, driven by measurement entropy. We validate the proposed method, trained on clean data, on our extensive validation dataset. Code and data are available at https://github.com/princeton-computational-imaging/SeeingThroughFog.
1. Introduction

Object detection is a fundamental computer vision problem in autonomous robots, including self-driving vehicles and autonomous drones. Such applications require 2D or 3D bounding boxes of scene objects in challenging real-world scenarios, including complex cluttered scenes, highly varying illumination, and adverse weather conditions. The most promising autonomous vehicle systems rely on redundant inputs from multiple sensor modalities [58,6,73], including camera, lidar, radar, and emerging sensors such as FIR [29]. A growing body of work on object detection using convolutional neural networks has enabled accurate 2D and 3D box estimation from such multimodal data, typically relying on camera and lidar data [64,11,56,71,66,42,35]. While these existing methods, and the autonomous systems that perform decision making on their outputs, perform well under normal imaging conditions, they fail in adverse weather and imaging conditions. This is because existing training datasets are biased towards clear weather conditions, and detector architectures are designed to rely only on the redundant information in the undistorted sensory streams. However, they are not designed for harsh scenarios that distort the sensor streams asymmetrically, see Figure 1. Extreme weather conditions are statistically rare. For example, thick fog is observable only during 0.01% of typical driving in North America, and even in foggy regions, dense fog with visibility below 50 m occurs only up to 15 times a year [61]. Figure 2 shows the distribution of real driving data acquired over four weeks in Sweden, covering 10,000 km driven in winter conditions. The naturally biased distribution validates that harsh weather scenarios are only rarely, or even not at all, represented in available datasets [65,19,58]. Unfortunately, domain adaptation methods [44,28,41] also do not offer an ad-hoc solution, as they require target samples, and adverse weather-distorted data are underrepresented in general. Moreover, existing methods are limited to image data and do not extend to multisensor data, e.g. including lidar point-cloud data.
Existing fusion methods have been proposed mostly for lidar-camera setups [64,11,42,35,12], as a result of the limited sensor inputs in existing training datasets [65,19,58]. These methods do not only struggle with sensor distortions in adverse weather due to the bias of the training data. Either they perform late fusion through filtering after independently processing the individual sensor streams [12], or they fuse proposals [35] or high-level feature vectors [64]. The network architecture of these approaches is designed with the assumption that the data streams are consistent and redundant, i.e. an object appearing in one sensory stream also appears in the other. However, in harsh weather conditions, such as fog, rain, snow, or extreme lighting conditions, including low-light or low-reflectance objects, multimodal sensor configurations can fail asymmetrically. For example, conventional RGB cameras provide unreliable noisy measurements in low-light scene areas, while scanning lidar sensors provide reliable depth using active illumination. In rain and snow, small particles affect the color image and lidar depth estimates equally through backscatter. Conversely, in foggy or snowy conditions, state-of-the-art pulsed lidar systems are restricted to less than 20 m range due to backscatter, see Figure 3. While relying on lidar measurements might be a solution for night driving, it is not for adverse weather conditions.
In this work, we propose a multimodal fusion method for object detection in adverse weather, including fog, snow, and harsh rain, without having large annotated training datasets available for these scenarios. Specifically, we handle asymmetric measurement corruptions in camera, lidar, radar, and gated NIR sensor streams by departing from existing proposal-level fusion methods: we propose an adaptive single-shot deep fusion architecture which exchanges features in intertwined feature extractor blocks. This deep early fusion is steered by measured entropy. The proposed adaptive fusion allows us to learn models that generalize across scenarios. To validate our approach, we address the bias in existing datasets by introducing a novel multimodal dataset acquired over three months in northern Europe. This dataset is the first large multimodal driving dataset in adverse weather, with 100k labels for lidar, camera, radar, gated NIR sensor, and FIR sensor. Although the weather bias still prohibits training, this data allows us to validate that the proposed method generalizes robustly to unseen weather conditions with asymmetric sensor corruptions, while being trained on clean data. Specifically, we make the following contributions:
- We introduce a multimodal adverse weather dataset covering camera, lidar, radar, gated NIR, and FIR sensor data. The dataset contains rare scenarios, such as heavy fog, heavy snow, and severe rain, during more than 10,000 km of driving in northern Europe.
- We propose a deep multimodal fusion network which departs from proposal-level fusion and instead fuses adaptively, driven by measurement entropy.
- We assess the model on the proposed dataset, validating that it generalizes to unseen asymmetric distortions. The approach outperforms state-of-the-art fusion methods by more than 8% AP in hard scenarios independent of weather, including light fog, dense fog, snow, and clear conditions, and it runs in real-time.
2. Related Work

Detection in Adverse Weather Conditions Over the last decade, seminal work on automotive datasets [5,14,19,16,65,9] has provided a fertile ground for automotive object detection [11,8,64,35,40,20], depth estimation [18,39,21], lane detection [26], traffic-light detection [32], road scene segmentation [5,2], and end-to-end driving models [4,65]. Although existing datasets fuel this research area, they are biased towards good weather conditions due to geographic location [65] and captured season [19], and thus lack severe distortions introduced by rare fog, severe snow, and rain. A number of recent works explore camera-only approaches in such adverse conditions [51,7,1]. However, these datasets are very small, with fewer than 100 captured images [51], and limited to camera-only vision tasks. In contrast, existing autonomous driving applications rely on multimodal sensor stacks, including camera, radar, lidar, and emerging sensors, such as gated NIR imaging [22,23], and have to be evaluated on thousands of hours of driving. In this work, we fill this gap and introduce a large-scale evaluation set in order to develop a fusion model for such multimodal inputs that is robust to unseen distortions.
Data Preprocessing in Adverse Weather A large body of work explores methods for the removal of sensor distortions before processing. Especially fog and haze removal from conventional intensity image data has been explored extensively [67,70,33,53,36,7,37,46]. Fog results in a distance-dependent loss in contrast and color. Fog removal methods have not only been suggested for display applications [25], they have also been proposed as preprocessing to improve the performance of downstream semantic tasks [51]. Existing fog and haze removal methods rely on scene priors on the latent clear image and depth to solve the ill-posed recovery. These priors are either hand-crafted [25] and used for depth and transmission estimation separately, or they are learned jointly as part of trainable end-to-end models [37,31,72]. Existing methods for fog and visibility estimation [57,59] have been proposed for camera driver-assistance systems. Image restoration approaches have also been applied to deraining [10] or deblurring [36].
Domain Adaptation Another line of research tackles the shift of unlabeled data distributions by domain adaptation [60,28,50,27,69,62]. Such methods could be applied to adapt clear labeled scenes to demanding adverse weather scenes [28] or through the adaptation of feature representations [60]. Unfortunately, both of these approaches struggle to generalize because, in contrast to existing domain transfer methods, weather-distorted data in general, not only labeled data, is underrepresented. Moreover, existing methods do not handle multimodal data.
Multisensor Fusion Multisensor feeds in autonomous vehicles are typically fused to exploit varying cues in the measurements [43], simplify path-planning [15], allow for redundancy in the presence of distortions [47], or solve joint vision tasks, such as 3D object detection [64]. Existing sensing systems for fully-autonomous driving include lidar, camera, and radar sensors. As large automotive datasets [65,19,58] cover limited sensory inputs, existing fusion methods have been proposed mostly for lidar-camera setups [64,55,11,35,42]. Methods such as AVOD [35] and MV3D [11] incorporate multiple views from camera and lidar to detect objects. They rely on the fusion of pooled regions of interest and hence perform late feature fusion following popular region proposal architectures [49]. In a different line of research, Qi et al. [48] and Xu et al. [64] propose a pipeline model that requires a valid detection output for the camera image and a 3D feature vector extracted from the lidar point-cloud. Kim et al. [34] propose a gating mechanism for camera-lidar fusion. In all existing methods, the sensor streams are processed separately in the feature extraction stage, and we show that this prohibits learning redundancies and, in fact, performs worse than a single sensor stream in the presence of asymmetric measurement distortions.
3. Multimodal Adverse Weather Dataset

To assess object detection in adverse weather, we have acquired a large-scale automotive dataset providing 2D and 3D detection bounding boxes for multimodal data with a fine classification of weather, illumination, and scene type in rare adverse weather situations. Table 1 compares our dataset to recent large-scale automotive datasets, such as the Waymo [58], NuScenes [6], KITTI [19], and BDD [68] datasets. In contrast to [6] and [68], our dataset contains experimental data not only in light weather conditions but also in heavy snow, rain, and fog. A detailed description of the annotation procedures and label specifications is given in the supplemental material. With this cross-weather annotation of multimodal sensor data and broad geographical sampling, it is the only existing dataset that allows for the assessment of our multimodal fusion approach. In the future, we envision researchers developing and evaluating multimodal fusion methods in weather conditions not covered in existing datasets.
In Figure 2, we plot the weather distribution of the proposed dataset. The statistics were obtained by manually annotating all synchronized frames at a frame rate of 0.1 Hz. We guided human annotators to distinguish light from dense fog when the visibility fell below 1 km [45] and 100 m, respectively. If fog occurred together with precipitation, the scenes were labeled as either snowy or rainy depending on the environmental road conditions. For our experiments, we combined snow and rainy conditions. Note that the statistics validate the rarity of scenes in heavy adverse weather, which is in agreement with [61] and demonstrates the difficulty and critical nature of obtaining such data in the assessment of truly self-driving vehicles, i.e. without the interaction of remote operators outside of geo-fenced areas. We found that extreme adverse weather conditions occur only locally and change very quickly.
The individual weather conditions result in asymmetrical perturbations of various sensor technologies, leading to asymmetric degradation, i.e. instead of all sensor outputs being affected uniformly by a deteriorating environmental condition, some sensors degrade more than others, see Figure 3. For example, conventional passive cameras perform well in daytime conditions, but their performance degrades in night-time conditions or challenging illumination settings such as low sun illumination. Meanwhile, active scanning sensors such as lidar and radar are less affected by ambient light changes due to active illumination and a narrow bandpass on the detector side. On the other hand, active lidar sensors are highly degraded by scattering media such as fog, snow, or rain, limiting the maximal perceivable distance at fog visibilities below 50 m to roughly 25 m, see Figure 3. Millimeter-wave radar does not strongly scatter in fog [24], but currently provides only low azimuthal resolution. Recent gated imagers have shown robust perception in adverse weather [23] and provide high spatial resolution, but lack color information compared to standard imagers. With these sensor-specific weaknesses and strengths, multimodal data can be crucial for robust detection methods.
3.1. Multimodal Sensor Setup

For acquisition, we have equipped a test vehicle with sensors covering the visible, mm-wave, NIR, and FIR bands, see Figure 2. We measure intensity, depth, and weather condition.

Stereo Camera As visible-wavelength RGB cameras, we use a stereo pair of two front-facing high-dynamic-range automotive RCCB cameras, consisting of two OnSemi AR0230 imagers with a resolution of 1920×1024, a baseline of 20.3 cm, and 12 bit quantization. The cameras run at 30 Hz and are synchronized for stereo imaging. Using Lensagon B5M8018C optics with a focal length of 8 mm, a field-of-view of 39.6°×21.7° is obtained.
Gated Camera We capture gated images in the NIR band at 808 nm using a BrightwayVision BrightEye camera operating at 120 Hz with a resolution of 1280×720 and a bit depth of 10 bit. The camera provides a field-of-view of 31.1°×17.8°, similar to the stereo camera. Gated imagers rely on a time-synchronized camera and flood-lit flash laser sources [30]. The laser emits a variable narrow pulse, and the camera captures the laser echo after an adjustable delay. This significantly reduces backscatter from particles in adverse weather [3]. Furthermore, the high imager speed enables capturing multiple overlapping slices with different range-intensity profiles, encoding extractable depth information in between multiple slices [23]. Following [23], we capture 3 broad slices for depth estimation and additionally 3-4 narrow slices together with their passive correspondence at a system sampling rate of 10 Hz.
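For intuition only (this is a generic property of gated imaging rather than a detail taken from the paper), the range interval covered by one gated slice follows from the laser delay and gate duration; the sketch below assumes idealized rectangular pulse and gate profiles.

```python
SPEED_OF_LIGHT = 3.0e8  # m/s

def gated_slice_bounds(delay_s, gate_s, pulse_s):
    """Approximate near/far range imaged by one gated slice: a return is
    integrated only if its round trip falls inside the delay+gate window."""
    r_near = SPEED_OF_LIGHT * delay_s / 2.0
    r_far = SPEED_OF_LIGHT * (delay_s + gate_s + pulse_s) / 2.0
    return r_near, r_far

# e.g. a 0.5 us delay, 0.2 us gate, and 0.1 us pulse image roughly 75 m to 120 m
print(gated_slice_bounds(0.5e-6, 0.2e-6, 0.1e-6))
```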
Radar For radar sensing, we use a proprietary frequency-modulated continuous wave (FMCW) radar at 77 GHz with 1° angular resolution and distances up to 200 m. The radar provides position-velocity detections at 15 Hz.
Lidar On the roof of the car, we mount two laser scanners from Velodyne, namely the HDL64 S3D and the VLP32C. Both operate at 903 nm and can provide dual returns (strongest and last) at 10 Hz. While the Velodyne HDL64 S3D provides 64 equally distributed scanning lines with an angular resolution of 0.4°, the Velodyne VLP32C offers 32 non-linearly distributed scanning lines. The HDL64 S3D and VLP32C scanners achieve ranges of 100 m and 120 m, respectively.
FIR Camera Thermal images are captured with an Axis Q1922 FIR camera at 30 Hz. The camera offers a resolution of 640×480 with a pixel pitch of 17 μm and a noise-equivalent temperature difference (NETD) < 100 mK.
Environmental Sensors We measure environmental information with an Airmar WX150 weather station that provides temperature, wind speed, and humidity, and a proprietary road friction sensor. All sensors are time-synchronized and ego-motion corrected using a proprietary inertial measurement unit (IMU). The system provides a sampling rate of 10 Hz.
3.2. Recordings

Real-world Recordings All experimental data has been captured during two test drives in February and December 2019 in Germany, Sweden, Denmark, and Finland, for two weeks each, covering a distance of 10,000 km under different weather and illumination conditions. A total of 1.4 million frames at a frame rate of 10 Hz have been collected. Every 100th frame was manually labeled to balance scene type coverage. The resulting annotations contain 5.5k clear weather frames, 1k captures in dense fog, 1k captures in light fog, and 4k captures in snow/rain. Given the extensive capture effort, this demonstrates that training data in harsh conditions is rare. We tackle this by training only on clear data and testing on adverse data. The train and test regions do not have any geographic overlap. Instead of partitioning by frame, we partition our dataset based on independent recordings (5-60 min in length) from different locations. These recordings originate from 18 different major cities illustrated in Figure 2 and several smaller cities along the route.
Controlled Condition Recordings To collect image and range data under controlled conditions, we also provide measurements acquired in a fog chamber. Details on the fog chamber setup can be found in [17,13]. We have captured 35k frames at a frame rate of 10 Hz and labeled a subset of 1.5k frames under two different illumination conditions (day/night) and three fog densities with meteorological visibilities V of 30 m, 40 m, and 50 m. Details are given in the supplemental material, where we also compare to a simulated dataset using the forward model from [51].
4. Adaptive Deep Fusion

In this section, we describe the proposed adaptive deep fusion architecture that allows for multimodal fusion in the presence of unseen asymmetric sensor distortions. We devise our architecture under the real-time processing constraints required for self-driving vehicles and autonomous drones. Specifically, we propose an efficient single-shot fusion architecture.
4.1. Adaptive Multimodal Single-Shot Fusion

The proposed network architecture is shown in Figure 4. It consists of multiple single-shot detection branches, each analyzing one sensor modality.
Data Representation The camera branch uses conventional three-plane RGB inputs, while for the lidar and radar branches, we depart from recent bird's eye-view (BeV) projection [35] schemes or raw point-cloud representations [64]. BeV projection or point-cloud inputs do not allow for deep early fusion, as the feature representations in the early layers are inherently different from the camera features. Hence, existing BeV fusion methods can only fuse features in a lifted space, after matching region proposals, but not earlier. Figure 4 visualizes the proposed input data encoding, which aids deep multimodal fusion. Instead of using a naive depth-only input encoding, we provide depth, height, and pulse intensity as input to the lidar network. For the radar network, we assume that the radar is scanning in a 2D plane orthogonal to the image plane and parallel to the horizontal image dimension. Hence, we consider the radar invariant along the vertical image axis and replicate the scan along the vertical axis. Gated images are transformed into the image plane of the RGB camera using a homography mapping, see supplemental material. The proposed input encoding allows for a position- and intensity-dependent fusion with pixel-wise correspondences between different streams. We encode missing measurement samples with zero value.
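The sketch below illustrates how such an image-plane encoding could be assembled; the pinhole projection, image size, and channel ordering are assumptions for illustration rather than the authors' exact implementation.

```python
import numpy as np

H, W = 1024, 1920  # image size assumed from the stereo camera resolution

def encode_lidar(points_cam, intensity, K):
    """Splat lidar points (camera coordinates) into depth/height/intensity planes.
    Pixels without a measurement keep the zero value, as described above."""
    lidar_map = np.zeros((H, W, 3), dtype=np.float32)
    in_front = points_cam[:, 2] > 0                 # keep points in front of the camera
    pts, inten = points_cam[in_front], intensity[in_front]
    proj = (K @ pts.T).T                            # pinhole projection
    u = np.round(proj[:, 0] / proj[:, 2]).astype(int)
    v = np.round(proj[:, 1] / proj[:, 2]).astype(int)
    ok = (u >= 0) & (u < W) & (v >= 0) & (v < H)
    u, v, pts, inten = u[ok], v[ok], pts[ok], inten[ok]
    lidar_map[v, u, 0] = pts[:, 2]                  # depth channel
    lidar_map[v, u, 1] = pts[:, 1]                  # height channel
    lidar_map[v, u, 2] = inten                      # pulse-intensity channel
    return lidar_map

def encode_radar(radar_scan_row):
    """Replicate a horizontal radar scan of shape (W, C) along the vertical
    image axis, since the radar is treated as invariant along that axis."""
    return np.tile(radar_scan_row[None], (H, 1, 1))
```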
Feature Extraction As the feature extraction stack in each stream, we use a modified VGG [54] backbone. Similar to [35,11], we reduce the number of channels by half and cut the network at the conv4 layer. Inspired by [40,38], we use six feature layers from conv4-10 as input to the SSD detection layers. The feature maps decrease in size, implementing a feature pyramid for detections at different scales. As shown in Figure 4, the activations of different feature extraction stacks are exchanged. To steer fusion towards the most reliable information, we provide the sensor entropy to each feature exchange block. We first convolve the entropy, apply a sigmoid, multiply with the concatenated input features from all sensors, and finally concatenate the input entropy. The convolution of the entropy and the application of the sigmoid generate a multiplication matrix in the interval [0,1]. This scales the concatenated features for each sensor individually based on the available information. Regions with low entropy can be attenuated, while entropy-rich regions can be amplified in the feature extraction. Doing so allows us to adaptively fuse features in the feature extraction stack itself, which we motivate in depth in the next section.
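A minimal PyTorch-style sketch of one entropy-steered exchange block is given below; the layer sizes and the exact position inside the VGG/SSD stacks are assumptions, since the text specifies the mechanism only as convolve, sigmoid, multiply, and concatenate.

```python
import torch
import torch.nn as nn

class EntropyExchangeBlock(nn.Module):
    """Gates concatenated multi-sensor features with a sigmoid map computed
    from the per-sensor entropy channels, then re-attaches the entropy."""
    def __init__(self, channels_per_sensor, num_sensors):
        super().__init__()
        total = channels_per_sensor * num_sensors
        self.entropy_conv = nn.Conv2d(num_sensors, total, kernel_size=3, padding=1)

    def forward(self, features, entropy):
        # features: list of (B, C, H, W) activations, one per sensor stream
        # entropy:  (B, num_sensors, H, W) local entropy maps
        stacked = torch.cat(features, dim=1)
        gate = torch.sigmoid(self.entropy_conv(entropy))  # values in [0, 1]
        fused = stacked * gate                            # attenuate / amplify regions
        return torch.cat([fused, entropy], dim=1)         # concatenate input entropy
```

In this sketch, each sensor branch would consume its share of the fused tensor and continue its own convolutional stack, mirroring the exchange arrows in Figure 4.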
4.2. Entropy-steered Fusion

To steer the deep fusion towards redundant and reliable information, we introduce an entropy channel in each sensor stream, instead of directly inferring the adverse weather type and strength as in [57,59]. We estimate the local measurement entropy for each 8-bit binarized stream I with pixel values i ∈ [0,255] in the proposed image-space data representation. Each stream is split into patches of size M×N = 16 px × 16 px, resulting in a w×h = 1920 px × 1024 px entropy map. The multimodal entropy maps for two different scenarios are shown in Figure 5: the left scenario shows a scene containing a vehicle, cyclist, and pedestrians in a controlled fog chamber. The passive RGB camera and lidar suffer from backscatter and attenuation with decreasing fog visibilities, while the gated camera suppresses backscatter through gating. Radar measurements are also not substantially degraded in fog. The right scenario in Figure 5 shows a static outdoor scene under varying ambient lighting. In this scenario, active lidar and radar are not affected by changes in ambient illumination. For the gated camera, the ambient illumination disappears, leaving only the actively illuminated areas, while the passive RGB camera degenerates with decreasing ambient light.
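The entropy equation itself did not survive extraction; the text is consistent with the standard Shannon entropy -Σ p_i log p_i of the 8-bit intensity histogram computed per 16 px × 16 px patch, which the sketch below implements (the patch handling details are assumptions).

```python
import numpy as np

def local_entropy_map(stream_8bit, patch=16):
    """Per-patch Shannon entropy -sum_i p_i * log2(p_i) of an 8-bit stream,
    broadcast back to full resolution (one value per 16x16 block)."""
    h, w = stream_8bit.shape
    ent = np.zeros((h, w), dtype=np.float32)
    for y in range(0, h, patch):
        for x in range(0, w, patch):
            block = stream_8bit[y:y + patch, x:x + patch]
            hist = np.bincount(block.ravel(), minlength=256).astype(np.float64)
            p = hist / hist.sum()
            p = p[p > 0]                                  # skip empty bins
            ent[y:y + patch, x:x + patch] = -np.sum(p * np.log2(p))
    return ent
```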
The steering process is learned purely on clear weather data, which contains the different illumination settings present in day to night-time conditions. No real adverse weather patterns are presented during training. Further, we drop sensor streams randomly with probability 0.5 and set the corresponding entropy to a constant zero value.
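As a rough sketch of the sensor dropout described above (where in the pipeline it is applied, and whether whole streams or individual samples are dropped, are assumptions):

```python
import torch

def drop_sensor_streams(features, entropy, p_drop=0.5):
    """Randomly zero whole sensor streams and their entropy channels during
    training, so the fusion cannot rely on any single modality being present."""
    out = []
    for k, feat in enumerate(features):
        if torch.rand(1).item() < p_drop:
            out.append(torch.zeros_like(feat))   # dropped stream
            entropy[:, k] = 0.0                  # constant zero entropy
        else:
            out.append(feat)
    return out, entropy
```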
4.3. Loss Functions and Training Details

The number of anchor boxes in different feature layers and their sizes play an important role during training and are given in the supplemental material. In total, each anchor box with class label y_i and probability p_i is trained using the cross-entropy loss with softmax. The loss is split up for positive and negative anchor boxes with a matching threshold of 0.5. For each positive anchor box, the bounding box coordinates x are regressed using a Huber loss H(x). The total number of negative anchors is restricted to 5× the number of positive examples using hard example mining [40,52]. All networks are trained from scratch with a constant learning rate and L2 weight decay of 0.0005.
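The loss equations referenced above were lost in extraction; for reference, the standard softmax cross-entropy and Huber loss they describe take the following form (the Huber threshold δ is an assumption, commonly set to 1):

```latex
% Softmax cross-entropy over anchor class scores s_j with labels y_i
L_{\mathrm{cls}} = -\sum_i y_i \log p_i,
\qquad p_i = \frac{\exp(s_i)}{\sum_j \exp(s_j)}

% Huber loss for box-coordinate regression
H(x) =
\begin{cases}
\tfrac{1}{2}x^2, & |x| \le \delta, \\
\delta\left(|x| - \tfrac{1}{2}\delta\right), & |x| > \delta.
\end{cases}
```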
5. Assessment

In this section, we validate the proposed fusion model on unseen experimental test data. We compare the method against existing detectors for single sensory inputs and fusion methods, as well as domain adaptation methods. Due to the weather bias of training data acquisition, we only use the clear weather portion of the proposed dataset for training. We assess the detection performance using our novel multimodal weather dataset as a test set; see the supplemental data for test and training split details.
We validate the proposed approach, which we dub Deep Entropy Fusion, in Table 2 on real adverse weather data. We report Average Precision (AP) for three different difficulty levels (easy, moderate, hard) and evaluate on cars following the KITTI evaluation framework [19] at various fog densities, snow disturbances, and clear weather conditions. We compare the proposed model against recent state-of-the-art lidar-camera fusion models, including AVOD-FPN [35] and Frustum PointNets [48], and against variants of the proposed method with alternative fusion or sensory inputs. As baseline variants, we implement two fusion and four single-sensor detectors. In particular, we compare against late fusion with image, lidar, gated, and radar features concatenated just before bounding-box regression (Fusion SSD), and early fusion by concatenating all sensory data at the very beginning of one feature extraction stack (Concat SSD). The Fusion SSD network shares the same structure as the proposed model, but without the feature exchange and the adaptive fusion layer. Moreover, we compare the proposed model against an identical SSD branch with single sensory input (Image-only SSD, Gated-only SSD, Lidar-only SSD, Radar-only SSD). All models were trained with identical hyper-parameters and anchors.
Evaluated on adverse weather scenarios, the detection performance decreases for all methods. Note that assessment metrics can increase simultaneously as scene complexity changes between the weather splits. For example, when fewer vehicles participate in road traffic or the distance between vehicles increases in icy conditions, fewer vehicles are occluded. While the performance for image and gated data is almost steady, it decreases substantially for lidar data, while it increases for radar data. The decrease in lidar performance can be explained by the strong backscatter, see the Supplemental Material. As a maximum of 100 measurement targets limits the performance of the radar input, the reported improvements result from simpler scenes.
Overall, the large reduction in lidar performance in foggy conditions affects the lidar-only detection rate by a drop of 45.38% AP. Furthermore, it also has a strong impact on the camera-lidar fusion models AVOD, Concat SSD, and Fusion SSD. Learned redundancies no longer hold, and these methods even fall below image-only methods.
Two-stage methods, such as Frustum PointNet [48], drop quickly. However, they asymptotically achieve higher results compared to AVOD, because the statistical priors learned for the first stage are based on Image-only SSD, which limits its performance to image-domain priors. AVOD is limited by several assumptions that hold for clear weather, such as the importance sampling of boxes filled with lidar data during training, achieving the lowest fusion performance overall. Moreover, as the fog density increases, the proposed adaptive fusion model outperforms all other methods. Especially under severe distortions, the proposed adaptive fusion layer results in significant margins over the model without it (Deep Fusion). Overall, the proposed method outperforms all baseline approaches. In dense fog, it improves by a margin of 9.69% compared to the next-best feature-fusion variant.
For completeness, we also compare the proposed model to recent domain adaptation methods. First, we adapt our Image-only SSD features from clear weather to adverse weather following [60]. Second, we investigate style transfer from clear weather to adverse weather utilizing [28] and generate adverse weather training samples from clear weather input. Note that these methods have an unfair advantage over all other compared approaches, as they have seen adverse weather scenarios sampled from our validation set. Note also that domain adaptation methods cannot be directly applied in general, as they need target images from a specific domain; therefore, they do not offer a solution for rare edge cases with limited data. Furthermore, [28] does not model distortions including fog or snow, see the experiments in the Supplemental Material. We note that synthetic data augmentation following [51] or image-to-image reconstruction methods that remove adverse weather effects [63] do not affect the reported margins of the proposed multimodal deep entropy fusion.
6. Conclusion and Future Work

In this paper, we address a critical problem in autonomous driving: multi-sensor fusion in scenarios where annotated data is sparse and difficult to obtain due to natural weather bias. To assess multimodal fusion in adverse weather, we introduce a novel adverse weather dataset covering camera, lidar, radar, gated NIR, and FIR sensor data. The dataset contains rare scenarios, such as heavy fog, heavy snow, and severe rain, during more than 10,000 km of driving in northern Europe. We propose a real-time deep multimodal fusion network which departs from proposal-level fusion and instead fuses adaptively, driven by measurement entropy. Exciting directions for future research include the development of end-to-end models enabling failure detection, and adaptive sensor control, such as noise-level or power-level control in lidar sensors.