Solo-Queuing 6D Pose Estimation from Zero: A Rookie Reads Papers | Learning Analysis-by-Synthesis for 6D Pose Estimation in RGB-D Images...

Published: 2024/1/17

Finally, we've reached the first paper that uses a CNN for pose estimation. I could cry.
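This paper builds on Learning 6D Object Pose Estimation using 3D Object Coordinates (2014), which we read earlier. In the 2014 paper, the authors used a random forest to predict two things for every pixel: its likely coordinates in the object's own coordinate system (the 3D object coordinates) and the object it likely belongs to. Pose estimates were then assembled from these pixel-level predictions, and finally an energy function measured the error between the result rendered from a pose hypothesis and the actual observation, so that the pose could be optimized. What this paper mainly does is keep the random forest from the earlier work and replace the energy function with a CNN: the CNN compares the result rendered from the model with the actual observation to produce an energy value, and that energy value is then used to refine the pose.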

Analysis-by-Synthesis: compare the observation with the output of a forward process, such as a rendered image of the object of interest in a particular pose.
We propose an approach that "learns to compare", while taking these difficulties (occlusion, complicated sensor noise) into account. This is done by describing the posterior density of a particular object pose with a CNN that compares an observed and rendered image.

The Pose Estimation Task:
Our goal is to estimate the pose \(H\) of a rigid object from a set of observations denoted by \(x\). Each pose \(H=(R,T)\) is a combination of two components. The rotational component \(R\) is a \(3\times3\) matrix describing the rotation around the center of the object. The translational component \(T\) is a 3D vector corresponding to the position of the object center in the camera coordinate system.
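
A pose \(H=(R,T)\) can be read as a rigid transform: assuming object coordinates are measured from the object center, a point \(p\) on the model appears in the camera frame at \(Rp+T\). The sketch below illustrates that convention in plain Python; the `Pose` class and the `rot_z` helper are hypothetical names for illustration, not part of the paper.

```python
import math
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]
Mat3 = Tuple[Vec3, Vec3, Vec3]


@dataclass(frozen=True)
class Pose:
    """Hypothetical container for a rigid pose H = (R, T)."""
    R: Mat3  # 3x3 rotation about the object center, stored row by row
    T: Vec3  # position of the object center in camera coordinates

    def to_camera(self, p: Vec3) -> Vec3:
        """Map an object-frame point p to the camera frame: R p + T."""
        return tuple(
            sum(self.R[r][c] * p[c] for c in range(3)) + self.T[r]
            for r in range(3)
        )


def rot_z(angle: float) -> Mat3:
    """Rotation matrix for a rotation of `angle` radians about the z axis."""
    c, s = math.cos(angle), math.sin(angle)
    return ((c, -s, 0.0), (s, c, 0.0), (0.0, 0.0, 1.0))


# Object rotated 90 degrees about z, with its center 0.5 units in front of the camera.
H = Pose(R=rot_z(math.pi / 2), T=(0.0, 0.0, 0.5))
print(H.to_camera((0.1, 0.0, 0.0)))  # approximately (0.0, 0.1, 0.5)
```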

Probabilistic Model: The posterior distribution of the pose \(H\) given the observations \(x\) is modeled as a Gibbs distribution: \(p(H|x;\theta)=\frac{\exp(-E(H,x;\theta))}{\int\exp(-E(\hat{H},x;\theta))\,d\hat{H}}\), where \(E(H,x;\theta)\) is the so-called energy function. It maps a pose \(H\) and the observed images \(x\) to a real number and is parametrized by the vector \(\theta\). We implement it as a CNN that directly outputs the energy value, so \(\theta\) holds the weights of the CNN.
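
The denominator integrates over every possible pose, which has no closed form when \(E\) is a CNN. If the pose space is replaced by a finite set of candidate poses (an assumption made only for illustration), the Gibbs distribution becomes a softmax over negative energies, so a lower energy means a higher posterior probability. A minimal sketch; `gibbs_posterior` is a hypothetical helper:

```python
import math
from typing import List


def gibbs_posterior(energies: List[float]) -> List[float]:
    """Normalize exp(-E) over a finite set of candidate poses.

    Subtracting the minimum energy before exponentiating does not change the
    result (the constant cancels in the ratio) but avoids overflow/underflow.
    """
    e_min = min(energies)
    weights = [math.exp(-(e - e_min)) for e in energies]
    z = sum(weights)
    return [w / z for w in weights]


# Three hypothetical candidate poses with energies E(H_k, x; theta).
probs = gibbs_posterior([0.0, 1.0, 5.0])
print([round(p, 3) for p in probs])
```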

Convolutional Neural Network: We first render the object in pose \(H\) to obtain rendered images \(r(H)\). The CNN compares \(x\) with \(r(H)\) and outputs a value \(f(x,r(H);\theta)\), and we define the energy as \(E(H,x;\theta)=f(x,r(H);\theta)\). The network is trained to assign low energy values when observed images and renderings agree closely. We feed all rendered and observed images into the CNN as separate input channels, considering only a square window around the center of the object in pose \(H\). For performance reasons, windows larger than 100x100 pixels are downsampled to that size. The paper then gives the CNN architecture settings, which are not covered in these notes.
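
The input preparation described above (a square window around the object center in the image, downsampling anything larger than 100x100 pixels, observed and rendered images stacked as channels) can be sketched as follows. This is an illustration under assumptions: images are plain nested lists, the window center and size in pixels are given, pixels outside the image are zero-padded, and nearest-neighbor resampling is used because these notes do not say which resampling the paper uses. The function names are hypothetical, and the CNN itself is not shown.

```python
from typing import List

Image = List[List[float]]  # one channel, indexed [row][col]


def crop_window(img: Image, cy: int, cx: int, size: int) -> Image:
    """Square window of side `size` centered at (cy, cx); out-of-image pixels are 0."""
    h, w = len(img), len(img[0])
    top, left = cy - size // 2, cx - size // 2
    return [
        [img[r][c] if 0 <= r < h and 0 <= c < w else 0.0
         for c in range(left, left + size)]
        for r in range(top, top + size)
    ]


def downsample_nearest(win: Image, max_size: int = 100) -> Image:
    """Nearest-neighbor resize to max_size x max_size, only if the window is larger."""
    n = len(win)
    if n <= max_size:
        return win
    return [
        [win[r * n // max_size][c * n // max_size] for c in range(max_size)]
        for r in range(max_size)
    ]


def make_cnn_input(observed: List[Image], rendered: List[Image],
                   cy: int, cx: int, size: int) -> List[Image]:
    """Crop every observed and rendered channel and stack them as separate inputs."""
    return [downsample_nearest(crop_window(ch, cy, cx, size))
            for ch in observed + rendered]


# Toy example: one observed and one rendered 300x300 channel, 160-pixel window.
obs = [[[float(r + c) for c in range(300)] for r in range(300)]]
ren = [[[0.0] * 300 for _ in range(300)]]
x = make_cnn_input(obs, ren, cy=150, cx=150, size=160)
print(len(x), len(x[0]), len(x[0][0]))  # 2 100 100
```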

Maximum Likelihood Training: In training, we want to find an optimal set of parameters \(\theta^*\) based on labeled training data \(L=\{(x_1,H_1),\dots,(x_n,H_n)\}\), where \(x_i\) denotes the observations of the \(i\)-th training image and \(H_i\) the corresponding ground-truth pose. Applying the maximum likelihood paradigm, we define \(\theta^*=\arg\max_\theta\sum_{i=1}^{n}\ln p(H_i|x_i;\theta)\). We train with stochastic gradient descent, which needs the partial derivative of the log-likelihood with respect to each parameter \(\theta_j\): \(\frac{\partial}{\partial \theta_j}\ln p(H_i|x_i;\theta)=-\frac{\partial}{\partial \theta_j}E(H_i,x_i;\theta)+\mathbb{E}\left[\frac{\partial}{\partial \theta_j}E(H,x_i;\theta)\,\middle|\,x_i;\theta\right]\), where \(\mathbb{E}[\,\cdot\,|x_i;\theta]\) denotes the conditional expected value under the posterior distribution \(p(H|x_i;\theta)\). Following this gradient to increase the log-likelihood lowers the energy of the ground-truth pose (first term) while raising the energy of poses the current model considers likely (second term, which comes from differentiating the log of the normalizing integral).
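
The gradient follows from \(\ln p(H_i|x_i;\theta)=-E(H_i,x_i;\theta)-\ln\int\exp(-E(\hat{H},x_i;\theta))\,d\hat{H}\): differentiating the second term yields \(+\mathbb{E}[\partial E/\partial\theta_j\,|\,x_i;\theta]\). The sketch below checks this numerically on a toy model that is not the paper's CNN: a hypothetical linear energy \(E(H_k,x;\theta)=\theta\cdot\phi_k\) over three candidate poses with made-up feature vectors \(\phi_k\), so the expectation becomes a finite weighted sum.

```python
import math
from typing import List

Vector = List[float]


def energies(theta: Vector, feats: List[Vector]) -> List[float]:
    """Toy linear energy E(H_k, x; theta) = theta . phi_k for each candidate pose k."""
    return [sum(t * f for t, f in zip(theta, phi)) for phi in feats]


def log_likelihood(theta: Vector, feats: List[Vector], i: int) -> float:
    """ln p(H_i | x; theta) = -E_i - ln sum_k exp(-E_k), computed stably."""
    e = energies(theta, feats)
    e_min = min(e)
    log_z = -e_min + math.log(sum(math.exp(-(ek - e_min)) for ek in e))
    return -e[i] - log_z


def grad_log_likelihood(theta: Vector, feats: List[Vector], i: int) -> Vector:
    """-dE(H_i)/dtheta + E_posterior[dE/dtheta]; for this toy, dE_k/dtheta = phi_k."""
    e = energies(theta, feats)
    e_min = min(e)
    w = [math.exp(-(ek - e_min)) for ek in e]
    z = sum(w)
    expected = [sum(wk * phi[j] for wk, phi in zip(w, feats)) / z
                for j in range(len(theta))]
    return [-feats[i][j] + expected[j] for j in range(len(theta))]


# Central finite-difference check with hypothetical features for 3 candidate poses.
theta = [0.3, -0.7]
feats = [[1.0, 0.5], [0.2, 2.0], [-1.0, 0.0]]
i, eps = 0, 1e-6
analytic = grad_log_likelihood(theta, feats, i)
numeric = []
for j in range(len(theta)):
    tp, tm = theta[:], theta[:]
    tp[j] += eps
    tm[j] -= eps
    numeric.append((log_likelihood(tp, feats, i) - log_likelihood(tm, feats, i)) / (2 * eps))
print(analytic, numeric)  # the two gradients should agree closely
```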
Sampling: The expected value is approximated with a set of pose samples: \(\mathbb{E}\left[\frac{\partial}{\partial\theta_j}E(H,x_i;\theta)\,\middle|\,x_i;\theta\right]\approx \frac{1}{N}\sum_{k=1}^{N}\frac{\partial}{\partial\theta_j}E(H_k,x_i;\theta)\), where \(H_1,\dots,H_N\) are pose samples drawn independently from the posterior \(p(H|x_i;\theta)\) with the current parameters \(\theta\). The samples are generated with the Metropolis algorithm, which produces a sequence of samples \(H_t\) by repeating two steps: 1. Draw a new proposed sample \(H'\) from a proposal distribution \(Q(H'|H_t)\), which has to be symmetric, i.e. \(Q(H'|H_t)=Q(H_t|H')\). 2. Accept or reject the proposed sample according to the acceptance probability \(A(H'|H_t)=\min\left(1,\frac{p(H'|x;\theta)}{p(H_t|x;\theta)}\right)\). If the proposed sample is accepted, set \(H_{t+1}=H'\); otherwise set \(H_{t+1}=H_t\). Because the normalizing integral cancels in this ratio, \(A(H'|H_t)=\min\left(1,\exp\left(E(H_t,x;\theta)-E(H',x;\theta)\right)\right)\), so only energy values (CNN outputs) are needed.
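
Since the acceptance test only needs the energy difference \(E(H_t,x;\theta)-E(H',x;\theta)\), each Metropolis step costs one energy evaluation (one CNN forward pass in the paper's setting). The sketch below runs the two steps on a hypothetical one-dimensional "pose" \(h\) with the toy energy \(E(h)=h^2/2\), whose posterior is the standard normal, using a symmetric Gaussian proposal; averaging a function over the chain approximates its posterior expectation, as in the Monte Carlo approximation above. It illustrates the algorithm and is not the paper's code.

```python
import math
import random
from typing import Callable, List


def metropolis(energy: Callable[[float], float], h0: float, n: int,
               step: float, rng: random.Random) -> List[float]:
    """Metropolis chain targeting p(h) proportional to exp(-energy(h)).

    The Gaussian proposal is symmetric, Q(h'|h) = Q(h|h'), so the acceptance
    probability is min(1, p(h')/p(h)) = min(1, exp(E(h) - E(h'))); the
    normalizing integral never has to be computed.
    """
    h, e = h0, energy(h0)
    samples = []
    for _ in range(n):
        h_new = rng.gauss(h, step)                        # 1. propose h' ~ Q(h'|h_t)
        e_new = energy(h_new)
        if rng.random() < math.exp(min(0.0, e - e_new)):  # 2. accept with probability A
            h, e = h_new, e_new                           # accepted: h_{t+1} = h'
        samples.append(h)                                 # rejected: h_{t+1} = h_t
    return samples


rng = random.Random(0)
chain = metropolis(lambda h: 0.5 * h * h, h0=0.0, n=20000, step=1.0, rng=rng)
# Monte Carlo estimate of E[h^2]; the exact value for this toy energy is 1.
print(sum(h * h for h in chain) / len(chain))
```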
Proposal Distribution: We define \(Q(H'|H_t)\) implicitly by describing a sampling procedure and ensuring that it is symmetric. The translational component \(T'\) of the proposed sample is drawn directly from a 3D isotropic normal distribution \(\mathcal{N}(T_t,\Sigma_T)\) centered at the translational component \(T_t\) of the current sample \(H_t\). The rotational component \(R'\) of the proposed sample \(H'\) is generated by applying a random rotation \(\hat{R}\) to the rotational component \(R_t\) of the current sample: \(R'=\hat{R}R_t\). Here \(\hat{R}\) is the rotation matrix corresponding to an Euler vector \(e\) (an axis-angle vector whose direction is the rotation axis and whose length is the rotation angle), which is drawn from a 3D zero-centered isotropic normal distribution \(e\sim\mathcal{N}(0,\Sigma_R)\).
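
The proposal can be sketched as below, assuming isotropic covariances \(\Sigma_T=\sigma_T^2I\) and \(\Sigma_R=\sigma_R^2I\) (the \(\sigma\) values in the example are made up) and converting the Euler vector to a matrix with Rodrigues' formula. Symmetry holds because \(T'-T_t\) is zero-mean normal and because \(e\) and \(-e\) are equally likely, with \(\hat{R}(-e)=\hat{R}(e)^\top\) undoing the rotation. The function names are hypothetical.

```python
import math
import random
from typing import List, Tuple

Mat3 = List[List[float]]


def rodrigues(e: List[float]) -> Mat3:
    """Rotation matrix for the Euler (axis-angle) vector e: axis e/|e|, angle |e|."""
    angle = math.sqrt(sum(v * v for v in e))
    if angle < 1e-12:
        return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    kx, ky, kz = (v / angle for v in e)
    c, s = math.cos(angle), math.sin(angle)
    t = 1.0 - c
    # R = cos(a) I + sin(a) [k]_x + (1 - cos(a)) k k^T
    return [
        [c + kx * kx * t, kx * ky * t - kz * s, kx * kz * t + ky * s],
        [ky * kx * t + kz * s, c + ky * ky * t, ky * kz * t - kx * s],
        [kz * kx * t - ky * s, kz * ky * t + kx * s, c + kz * kz * t],
    ]


def matmul(a: Mat3, b: Mat3) -> Mat3:
    """Product of two 3x3 matrices."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def propose(R_t: Mat3, T_t: List[float], sigma_t: float, sigma_r: float,
            rng: random.Random) -> Tuple[Mat3, List[float]]:
    """Draw T' ~ N(T_t, sigma_t^2 I) and R' = R_hat(e) R_t with e ~ N(0, sigma_r^2 I)."""
    T_new = [rng.gauss(v, sigma_t) for v in T_t]
    e = [rng.gauss(0.0, sigma_r) for _ in range(3)]
    return matmul(rodrigues(e), R_t), T_new


rng = random.Random(1)
identity = rodrigues([0.0, 0.0, 0.0])
R_new, T_new = propose(identity, [0.0, 0.0, 0.5], sigma_t=0.01, sigma_r=0.05, rng=rng)
print(R_new, T_new)
```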
Initialization and Burn-in Phase: To find a good initialization, we run our inference procedure with the current parameter set. We then run the Metropolis algorithm for a total of 130 iterations and discard the samples from the first 30 iterations, which are considered the burn-in phase.
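
Putting the schedule together on a toy problem: a hypothetical scalar pose \(h\), observation \(x\), and energy \(E(h,x;\theta)=\theta(h-x)^2/2\) with \(\theta>0\), whose posterior is \(\mathcal{N}(x,1/\theta)\), so the exact expectation of \(\partial E/\partial\theta=(h-x)^2/2\) is \(1/(2\theta)\). A coarse grid search for the lowest energy stands in for the paper's inference procedure; the chain then runs 130 iterations, the first 30 are discarded as burn-in, and the remaining 100 samples provide the Monte Carlo term of the gradient. This is a sketch under those assumptions, not the authors' implementation.

```python
import math
import random
from typing import Callable, List


def chain_after_burn_in(energy: Callable[[float], float], h_init: float,
                        rng: random.Random, n_total: int = 130,
                        n_burn_in: int = 30, step: float = 0.5) -> List[float]:
    """Run n_total Metropolis iterations from h_init and drop the burn-in samples."""
    h, e = h_init, energy(h_init)
    kept = []
    for t in range(n_total):
        h_new = rng.gauss(h, step)                        # symmetric Gaussian proposal
        e_new = energy(h_new)
        if rng.random() < math.exp(min(0.0, e - e_new)):  # Metropolis acceptance
            h, e = h_new, e_new
        if t >= n_burn_in:                                # skip the first n_burn_in samples
            kept.append(h)
    return kept


# Hypothetical toy model: scalar pose h, observation x, energy theta * (h - x)^2 / 2.
theta, x, h_gt = 2.0, 0.3, 0.8


def energy(h: float) -> float:
    return theta * (h - x) ** 2 / 2


def d_energy_d_theta(h: float) -> float:
    return (h - x) ** 2 / 2


# Initialization: lowest-energy pose on a coarse grid (a stand-in for the inference step).
h_init = min((k / 10 for k in range(-20, 21)), key=energy)

samples = chain_after_burn_in(energy, h_init, random.Random(0))
mc_term = sum(d_energy_d_theta(h) for h in samples) / len(samples)  # exact: 1 / (2 * theta)
grad = -d_energy_d_theta(h_gt) + mc_term  # estimate of d/dtheta ln p(h_gt | x; theta)
print(len(samples), h_init, mc_term, grad)
```

With \(\theta=2\), \(x=0.3\) and ground truth \(h=0.8\), the exact gradient is \(-0.125+0.25=0.125\); the printed estimate fluctuates around it because only 100 correlated samples are used.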

Reposted from: https://www.cnblogs.com/LeeGoHigh/p/10512135.html
