Unsupervised Learning of Visual Features by Contrasting Cluster Assignments
Self-supervised learning, semi-supervised learning, pretraining, self-training, robust representations, etc. are some of the hottest terms right now in Computer Vision and Deep Learning, and the recent progress in self-supervised learning is astounding. In this direction, researchers at FAIR have come up with a new paper that introduces a method for learning robust image representations.
Introduction
One of the most important goals of self-supervised learning is to learn robust representations without using labels. Recent works try to achieve this goal by combining two elements: a contrastive loss and image transformations. Basically, we want our model to learn robust representations, not just high-level features, and to achieve a certain level of invariance to image transformations.
The contrastive loss explicitly compares pairs of image representations. It pushes apart representations that come from different images while pulling together representations that come from different transformations, or views, of the same image.
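To make the push/pull idea concrete, here is a minimal NumPy sketch of an InfoNCE-style contrastive loss over a batch. It assumes two augmented views per image, L2-normalized embeddings and a temperature of 0.1; the function name `info_nce` and the temperature value are illustrative assumptions, not taken from the paper. Row i of `z1` and row i of `z2` form the positive pair, and every other row of `z2` acts as a negative, so the number of comparisons grows with the square of the batch size.

```python
import numpy as np


def info_nce(z1: np.ndarray, z2: np.ndarray, tau: float = 0.1) -> float:
    """InfoNCE-style loss where (z1[i], z2[i]) are two views of image i.

    z1, z2: arrays of shape (B, D). Rows are L2-normalized first, so each
    dot product is a cosine similarity.
    """
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / tau                              # (B, B): every pair is compared
    logits = logits - logits.max(axis=1, keepdims=True)   # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positives are on the diagonal: maximizing their log-probability pulls
    # matching views together and pushes the other images' views away.
    return float(-np.mean(np.diag(log_probs)))


rng = np.random.default_rng(0)
z = rng.normal(size=(8, 16))
aligned = info_nce(z, z + 0.01 * rng.normal(size=z.shape))  # the two views agree
mismatched = info_nce(z, z[::-1])                           # the pairing is broken
```

With well-aligned views the diagonal dominates and the loss is small; breaking the pairing makes it large. Doing this against every other image in a large dataset is what becomes impractical.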
Computing all pairwise comparisons on a large dataset is not practical. There are two ways to overcome this constraint. First, instead of comparing all pairs, we can approximate the loss by restricting the comparison to a fixed number of random images. Second, instead of approximating the loss, we can approximate the task: for example, instead of discriminating between every pair of images, we discriminate between groups of images with similar features.
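The first workaround, approximating the loss with a fixed number of random negatives, can be sketched as follows. The feature bank `bank`, the sample size `n_neg` and the function name are hypothetical; the point is only that each anchor is compared against `n_neg` sampled vectors instead of all N images.

```python
import numpy as np


def sampled_contrastive_loss(anchor, positive, bank, n_neg=16, tau=0.1, rng=None):
    """Contrastive loss for one anchor using n_neg randomly sampled negatives.

    anchor, positive: (D,) L2-normalized vectors (two views of one image).
    bank: (N, D) L2-normalized vectors of other images (assumed not to
          contain the anchor's own image).
    Each call costs O(n_neg * D) instead of O(N * D).
    """
    rng = np.random.default_rng() if rng is None else rng
    idx = rng.choice(len(bank), size=n_neg, replace=False)
    negatives = bank[idx]                                     # (n_neg, D)
    logits = np.concatenate(([anchor @ positive], negatives @ anchor)) / tau
    logits = logits - logits.max()                            # numerical stability
    # Index 0 holds the positive: cross-entropy with that index as the target.
    return float(np.log(np.exp(logits).sum()) - logits[0])
```

The quality of this approximation depends on `n_neg`: a larger sample is closer to the full loss but costs more per step.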
Clustering is a good example of this kind of approximation. However, clustering alone can’t solve the problem as it has its own limitations. For example, the objective of clustering doesn’t scale well with the dataset as it requires a pass over the entire dataset to form image codes during training.
Proposal
To overcome the limitations listed above, the authors proposed the following:
Online Clustering Loss: The authors propose a scalable loss function that works with both large and small batch sizes and does not require extra components such as a memory bank or a momentum encoder. Theoretically, it can be scaled to an unlimited amount of data.
Multi-Crop Strategy: A new augmentation technique that uses a mix of views with different resolutions in place of two full-resolution views, without increasing the memory or compute requirements much.
Method
The ultimate goal of this exercise is to learn visual features online, without supervision. To achieve this, the authors propose an online, clustering-based self-supervised learning method.
But how is it different from typical clustering approaches?
Typical clustering methods like DeepCluster are offline because they generally rely on two steps. In the first step, we cluster the image features of the entire dataset, and in the second step, we predict the clusters, or codes, for different image views. Because these methods require multiple passes over the dataset, they are unsuitable for online learning. Let us see how the authors tackle these problems step by step.
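For contrast, the offline two-step recipe can be sketched with a plain k-means step. This is a simplified stand-in written for illustration (the function names, `k` and the iteration count are assumptions), not DeepCluster's actual implementation. Step 1 needs the features of every image in the dataset before step 2 can use a single pseudo-label, which is exactly what makes the approach offline.

```python
import numpy as np


def kmeans(features, k, n_iter=20, seed=0):
    """Step 1 (offline): cluster the features of the *entire* dataset."""
    rng = np.random.default_rng(seed)
    centroids = features[rng.choice(len(features), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign every point to its nearest centroid (squared Euclidean distance).
        d = ((features[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(axis=1)
        for j in range(k):
            members = features[labels == j]
            if len(members):                  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(axis=0)
    return centroids, labels


def offline_pseudo_labels(all_features, k):
    """Step 2 then trains the network to predict these labels for each view."""
    _, labels = kmeans(all_features, k)
    return labels
```

Every time the features change, the whole dataset has to be re-encoded and re-clustered, which is the pass the online method avoids.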
Online Clustering
1. We have a set of image transformations T. Each image x_n is transformed into an augmented view x_nt by applying a transformation t sampled from T.
2. The augmented view x_nt is then mapped to a vector representation by applying a non-linear mapping f_θ, and the resulting feature is projected onto the unit sphere: z_nt = f_θ(x_nt) / ‖f_θ(x_nt)‖₂.
3. We then compute the code q_nt for this vector z_nt by mapping it to a set of K trainable prototype vectors {c_1, c_2, …, c_K}. The matrix whose columns are these vectors is denoted by C.
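The online clustering steps above can be sketched in a few lines. The encoder is a hypothetical stand-in (a random two-layer MLP on flattened pixels), and the "transformation" is just a random horizontal flip plus Gaussian noise; in the paper f_θ is a ConvNet and T is a set of standard image augmentations. What carries over is the shape of the computation: normalize z_nt onto the unit sphere, then score it against the K prototypes stored as the columns of C (kept unit-norm here). The codes themselves are obtained from such scores by the online equipartition procedure described further down.

```python
import numpy as np

rng = np.random.default_rng(0)
D_IN, D_HID, D_OUT, K = 3 * 8 * 8, 64, 16, 10   # toy sizes (assumptions)

# Hypothetical stand-ins for the encoder parameters θ and the prototypes C.
W1 = rng.normal(scale=0.1, size=(D_IN, D_HID))
W2 = rng.normal(scale=0.1, size=(D_HID, D_OUT))
C = rng.normal(size=(D_OUT, K))
C /= np.linalg.norm(C, axis=0, keepdims=True)   # unit-norm prototype columns


def transform(x, rng):
    """Sample t from a toy T: random horizontal flip plus Gaussian noise."""
    if rng.random() < 0.5:
        x = x[:, :, ::-1]                       # flip a (channels, H, W) image
    return x + rng.normal(scale=0.05, size=x.shape)


def encode(x):
    """Non-linear mapping f_θ followed by projection onto the unit sphere."""
    h = np.maximum(x.reshape(-1) @ W1, 0.0)     # ReLU hidden layer
    f = h @ W2
    return f / np.linalg.norm(f)


def prototype_scores(z):
    """Similarity of z to each of the K prototypes, i.e. z^T C."""
    return z @ C                                # shape (K,)
```

Because both z and the prototype columns have unit norm, each score is a cosine similarity in [-1, 1].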
Swapped Prediction Problem
We talked about image transformation, feature vector projection, and code computation (q), but we haven’t discussed why we are doing it this way. As said earlier, one of the goals of this whole exercise is to learn visual features online without any supervision. We want our models to learn robust representations that are consistent across different image views.
The authors propose to enforce consistency between codes from different augmentations of the same image. This is inspired by contrastive learning but the difference is that instead of directly comparing the feature vectors, we would compare the cluster assignment for different image views. How?
Once we have computed the features z_t and z_s from two different augmentations of the same image, we compute the codes q_t and q_s by mapping the feature vectors to the K prototypes. The authors then propose a swapped prediction problem with the following loss function:

$$L(\mathbf{z}_t, \mathbf{z}_s) = \ell(\mathbf{z}_t, \mathbf{q}_s) + \ell(\mathbf{z}_s, \mathbf{q}_t) \tag{1}$$
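Each term on the right-hand side of this equation is a cross-entropy loss that measures the fit between a feature z and a code q. The intuition is that if the two features capture the same information, it should be possible to predict the code of one view from the feature of the other view. This is similar to contrastive learning, but here we compare codes instead of comparing features directly. If we expand one of the terms on the right-hand side, it looks like this:

$$\ell(\mathbf{z}_t, \mathbf{q}_s) = -\sum_{k} \mathbf{q}_s^{(k)} \log \mathbf{p}_t^{(k)}, \quad \text{where} \quad \mathbf{p}_t^{(k)} = \frac{\exp\left(\frac{1}{\tau}\mathbf{z}_t^\top \mathbf{c}_k\right)}{\sum_{k'} \exp\left(\frac{1}{\tau}\mathbf{z}_t^\top \mathbf{c}_{k'}\right)} \tag{2}$$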
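Here the softmax operation is applied to the dot products of z with the prototypes in C, and τ is the temperature parameter. Taking this loss over all the images and pairs of data augmentations leads to the following loss function for the swapped prediction problem:

$$-\frac{1}{N}\sum_{n=1}^{N}\sum_{s,t\sim\mathcal{T}}\left[\frac{1}{\tau}\mathbf{z}_{nt}^\top \mathbf{C}\mathbf{q}_{ns} + \frac{1}{\tau}\mathbf{z}_{ns}^\top \mathbf{C}\mathbf{q}_{nt} - \log\sum_{k=1}^{K}\exp\left(\frac{\mathbf{z}_{nt}^\top \mathbf{c}_k}{\tau}\right) - \log\sum_{k=1}^{K}\exp\left(\frac{\mathbf{z}_{ns}^\top \mathbf{c}_k}{\tau}\right)\right]$$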
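The swapped prediction loss can be sketched in NumPy as below. It assumes the codes `q_t` and `q_s` are already available as (B, K) arrays whose rows sum to 1 and treats them as fixed targets; the helper names and `tau=0.1` are illustrative assumptions. Each term is the softmax cross-entropy ℓ(z, q) from the text, averaged over the batch.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def cross_entropy_term(z, q, C, tau=0.1):
    """ℓ(z, q) = -Σ_k q^(k) log p^(k), with p = softmax(z^T C / τ).

    z: (B, D) unit-norm features, q: (B, K) codes (rows sum to 1),
    C: (D, K) prototypes. Returns the mean over the batch.
    """
    log_p = log_softmax(z @ C / tau, axis=1)
    return float(-(q * log_p).sum(axis=1).mean())


def swapped_loss(z_t, z_s, q_t, q_s, C, tau=0.1):
    """L(z_t, z_s) = ℓ(z_t, q_s) + ℓ(z_s, q_t): each view predicts the other's code."""
    return cross_entropy_term(z_t, q_s, C, tau) + cross_entropy_term(z_s, q_t, C, tau)
```

If each feature sits on the prototype its code points to, the loss is close to zero; codes that point elsewhere make it large.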
This loss function is jointly minimized with respect to the prototypes C and the parameters θ of the image encoder f used to produce the features z_nt.
Online Codes computation
When we started this discussion, we talked about offline vs online clustering, but we haven’t looked at how this method is online.
In order to make this method online, the authors propose to compute codes using only the image features within a batch. The codes are computed using the prototypes C such that all the examples in a batch are equally partitioned by the prototypes. The equipartition constraint is very important here, as it ensures that the codes for different images in a batch are distinct, thus preventing the trivial solution where every image has the same code.
Given B feature vectors Z = [z_1, z_2, …, z_B], we want to map them to the prototypes C = [c_1, …, c_K]. This mapping, i.e. the codes, is represented by Q = [q_1, …, q_B], and Q is optimized to maximize the similarity between the features and the prototypes, i.e.

$$\max_{\mathbf{Q} \in \mathcal{Q}} \operatorname{Tr}\left(\mathbf{Q}^\top \mathbf{C}^\top \mathbf{Z}\right) + \varepsilon H(\mathbf{Q}) \tag{3}$$
where H(Q) = -Σ_ij Q_ij log Q_ij is the entropy function and ε is a parameter that controls the smoothness of the mapping. The above expression is an entropy-regularized optimal transport problem (more about it later): given the features and the prototypes, we want to find the optimal codes. Note that equipartition itself is enforced by the constraints on Q described next; the entropy term smooths the assignment, and the paper notes that too high an ε leads to a trivial solution in which every sample is assigned uniformly to all prototypes.
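Also, since we are working with mini-batches, the constraint is imposed per mini-batch: Q is restricted to the transportation polytope

$$\mathcal{Q} = \left\{ \mathbf{Q} \in \mathbb{R}_{+}^{K \times B} \;\middle|\; \mathbf{Q}\mathbf{1}_B = \frac{1}{K}\mathbf{1}_K,\; \mathbf{Q}^\top \mathbf{1}_K = \frac{1}{B}\mathbf{1}_B \right\}$$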
where 1_K denotes the vector of ones in dimension K. These constraints enforce that on average each prototype is selected at least (B / K) times in the batch.
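The objective being maximized (trace term plus ε times the entropy) and the mini-batch constraints described above can be checked numerically. This sketch stores Q as a K×B matrix, uses H(Q) = -Σ_ij Q_ij log Q_ij, and assumes ε = 0.05 as a default; the helper names and the tolerance are illustrative assumptions.

```python
import numpy as np


def entropy(Q):
    """H(Q) = -Σ_ij Q_ij log Q_ij, treating 0 log 0 as 0."""
    Qp = Q[Q > 0]
    return float(-(Qp * np.log(Qp)).sum())


def ot_objective(Q, C, Z, eps=0.05):
    """Tr(Q^T C^T Z) + eps * H(Q), with Q (K, B), C (D, K), Z (D, B)."""
    return float(np.trace(Q.T @ C.T @ Z) + eps * entropy(Q))


def in_transportation_polytope(Q, tol=1e-6):
    """Check Q >= 0, Q 1_B = (1/K) 1_K and Q^T 1_K = (1/B) 1_B."""
    K, B = Q.shape
    return bool(
        np.all(Q >= 0)
        and np.allclose(Q.sum(axis=1), 1.0 / K, atol=tol)   # each prototype gets B/K samples' worth
        and np.allclose(Q.sum(axis=0), 1.0 / B, atol=tol)   # each sample carries equal mass
    )
```

For example, a uniform Q with every entry 1/(KB) passes the check, while a Q that puts all of its mass on the first row (every sample assigned to prototype 0) fails it, because that row sums to 1 instead of 1/K. That collapsed assignment is exactly the trivial solution the equipartition constraint rules out.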
Once a solution Q* to (3) is found, there are two options. First, we can directly use the soft codes. Second, we can obtain discrete codes by rounding the solution. The authors found that discrete codes work well when codes are computed offline on the full dataset. However, in the online setting, where we work with mini-batches, discrete codes perform worse than continuous ones. One explanation is that the rounding needed to obtain discrete codes is a more aggressive optimization step than gradient updates: while it makes the model converge rapidly, it leads to a worse solution. The soft codes Q* take the form of a normalized exponential matrix:

$$\mathbf{Q}^* = \operatorname{Diag}(\mathbf{u}) \exp\left(\frac{\mathbf{C}^\top \mathbf{Z}}{\varepsilon}\right) \operatorname{Diag}(\mathbf{v})$$
Here u and v are renormalization vectors in R^K and R^B, computed with a small number of matrix multiplications using the iterative Sinkhorn-Knopp algorithm.
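The Sinkhorn-Knopp iteration can be sketched in NumPy as follows. It starts from exp(C^T Z / ε) and alternately rescales rows and columns so that row sums become 1/K and column sums 1/B; the accumulated rescalings play the role of u and v. The defaults `eps=0.05` and `n_iters=3` are assumptions in the spirit of the "small number" of iterations mentioned above, and multiplying the result by B (so that each sample's code is a probability vector usable as a cross-entropy target) is a convention assumed here. `hard_codes` shows the discrete alternative: rounding each soft code to a one-hot vector.

```python
import numpy as np


def sinkhorn_codes(scores, eps=0.05, n_iters=3):
    """Soft codes from prototype scores via Sinkhorn-Knopp.

    scores: (B, K) matrix of z_b^T c_k for one batch, i.e. (C^T Z)^T.
    Returns a (B, K) array whose rows are probability vectors (one soft code
    per sample) and whose columns sum to about B/K (equipartition).
    """
    # Subtracting the max only rescales Q by a constant, which the
    # normalizations below remove; it prevents overflow in exp.
    Q = np.exp((scores - scores.max()) / eps).T     # (K, B)
    Q /= Q.sum()
    K, B = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True)           # each prototype (row) sums to 1 ...
        Q /= K                                      # ... then to 1/K
        Q /= Q.sum(axis=0, keepdims=True)           # each sample (column) sums to 1 ...
        Q /= B                                      # ... then to 1/B
    return (Q * B).T                                # rows of the result sum to 1


def hard_codes(soft):
    """Round soft codes to one-hot codes (argmax per sample)."""
    return np.eye(soft.shape[1])[soft.argmax(axis=1)]
```

For instance, if all eight samples in a batch score highest on prototype 0 (every row of `scores` is [1, 0, 0, 0]), `sinkhorn_codes` returns 0.25 for every entry: instead of collapsing onto one prototype, the batch is spread evenly across all four.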
Side note: Thanks to Amit Chaudhary for pointing out the relevant resources for the transportation polytope and the Sinkhorn-Knopp algorithm. You can read about these two in detail here and here.
Working with small batches
When the number B of batch features is too small compared to the number of prototypes K, it is impossible to equally partition the batch into the K prototypes. Therefore, when working with small batches, the authors use features from the previous batches to augment the size of Z in (3), and only the codes of the batch features are used in the training loss.
The authors propose to store around 3K features, i.e., in the same range as the number of code vectors. This means that they only keep features from the last 15 batches with a batch size of 256, while contrastive methods typically need to store the last 65K instances obtained from the last 250 batches.
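A minimal sketch of the small-batch workaround, assuming a FIFO queue that holds the features of the last 15 batches: queued features are stacked under the current batch, codes are computed for all of them with the current prototypes, and only the rows belonging to the current batch are returned for the loss. `FeatureQueue`, `codes_for_batch` and the compact `sinkhorn` helper (a Sinkhorn-Knopp sketch with ε = 0.05 and three iterations) are hypothetical names and settings, not the authors' code.

```python
from collections import deque

import numpy as np


def sinkhorn(scores, eps=0.05, n_iters=3):
    """Compact Sinkhorn-Knopp: (N, K) scores -> (N, K) soft codes, rows sum to 1."""
    Q = np.exp((scores - scores.max()) / eps).T     # (K, N)
    Q /= Q.sum()
    K, N = Q.shape
    for _ in range(n_iters):
        Q /= Q.sum(axis=1, keepdims=True) * K       # row sums -> 1/K
        Q /= Q.sum(axis=0, keepdims=True) * N       # column sums -> 1/N
    return (Q * N).T


class FeatureQueue:
    """Keeps the features of the last `max_batches` batches (hypothetical helper)."""

    def __init__(self, max_batches=15):
        self.batches = deque(maxlen=max_batches)    # the oldest batch drops out automatically

    def codes_for_batch(self, z_batch, C, eps=0.05):
        """Compute codes over [current batch; queued features], return only the batch's codes."""
        stacked = np.vstack([z_batch, *self.batches]) if self.batches else z_batch
        q = sinkhorn(stacked @ C, eps=eps)          # scores use the *current* prototypes
        self.batches.appendleft(z_batch.copy())     # enqueue after use
        return q[: len(z_batch)]
```

With a batch of 8 and 30 prototypes, equipartition is impossible within a single batch; once the queue is full, codes are computed over 8 × 16 = 128 rows while the loss still uses only the current 8.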
All of the above relates to online clustering only. You haven’t discussed the new augmentation strategy anywhere. Trying to keep the blog post short? Huh!
Multi-crop: Augmenting views with smaller images
It is well known that random crops help in both supervised and self-supervised learning. Comparing random crops of an image plays a central role, because it captures information about the relations between parts of a scene or an object.
Perfect. Let’s take crops of sizes 4x4, 8x8, 16x16, 32x32, ….. Enough data to make the bloody network learn, ha!
Well, you can do that, but increasing the number of crops quadratically increases the memory and compute requirements. To address this, the authors proposed a new multi-crop strategy where they use:
1. Two standard-resolution crops.
2. V additional low-resolution crops that cover only small parts of the image.
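Sampling the two kinds of crops could look like this. The resolutions (224 and 96 pixels) and V = 6 mirror the 2×224 + 6×96 configuration reported in the paper, but the scale ranges, square crops and nearest-neighbour resizing are simplifying assumptions; a real pipeline would use an image library plus the usual color and flip augmentations. `pixel_budget` is a hypothetical helper that uses the pixel count as a rough proxy for compute.

```python
import numpy as np


def random_resized_crop(img, out_size, scale, rng):
    """Crop a random square whose area fraction is drawn from `scale`, then resize.

    img: (H, W, channels) array. Nearest-neighbour resizing keeps this dependency-free.
    """
    H, W = img.shape[:2]
    area = rng.uniform(*scale) * H * W
    side = int(np.clip(round(np.sqrt(area)), 1, min(H, W)))
    top = rng.integers(0, H - side + 1)
    left = rng.integers(0, W - side + 1)
    crop = img[top:top + side, left:left + side]
    idx = (np.arange(out_size) * side / out_size).astype(int)   # nearest-neighbour grid
    return crop[idx][:, idx]


def multi_crop(img, rng, n_low=6, high=224, low=96,
               high_scale=(0.14, 1.0), low_scale=(0.05, 0.14)):
    """Two standard-resolution crops followed by n_low small low-resolution crops."""
    views = [random_resized_crop(img, high, high_scale, rng) for _ in range(2)]
    views += [random_resized_crop(img, low, low_scale, rng) for _ in range(n_low)]
    return views


def pixel_budget(sizes):
    """Total number of pixels processed per image for square crops of the given sizes."""
    return sum(s * s for s in sizes)
```

For example, `pixel_budget([224] * 2 + [96] * 6)` is 155,648 pixels, compared with 100,352 for the usual two full-resolution crops and 401,408 for eight of them, which is why the extra small crops are relatively cheap.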
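The loss function in (1) is then generalized as:

$$L(\mathbf{z}_{t_1}, \mathbf{z}_{t_2}, \ldots, \mathbf{z}_{t_{V+2}}) = \sum_{i \in \{1, 2\}} \sum_{v=1}^{V+2} \mathbf{1}_{v \neq i}\, \ell(\mathbf{z}_{t_v}, \mathbf{q}_{t_i})$$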
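The generalized loss can be sketched by reusing the same cross-entropy term ℓ: codes come only from the two standard-resolution views (indices 0 and 1 here), and every other view, including the other full-resolution one, is asked to predict them. The inputs are assumed to be precomputed: `zs` holds one (B, D) feature matrix per view and `qs` the (B, K) codes of the two standard views; the function names, `tau=0.1` and batch averaging are illustrative assumptions.

```python
import numpy as np


def log_softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)     # numerical stability
    return x - np.log(np.exp(x).sum(axis=axis, keepdims=True))


def ce(z, q, C, tau=0.1):
    """ℓ(z, q): cross-entropy between code q and softmax(z^T C / τ), batch-averaged."""
    return float(-(q * log_softmax(z @ C / tau, axis=1)).sum(axis=1).mean())


def multicrop_loss(zs, qs, C, tau=0.1):
    """Σ_{i∈{0,1}} Σ_{v≠i} ℓ(z_v, q_i) over V+2 views; codes only from views 0 and 1."""
    total = 0.0
    for i, q_i in enumerate(qs):           # qs = [q_0, q_1] from the standard crops
        for v, z_v in enumerate(zs):       # all V + 2 views predict that code
            if v != i:
                total += ce(z_v, q_i, C, tau)
    return total
```

With only the two standard views (V = 0) this reduces to the swapped loss in (1); each additional low-resolution view adds two more prediction terms.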
The codes are computed using only the standard-resolution crops. Intuitively, computing codes for all crops would increase the computation time. Moreover, crops taken over a very small area do not add much information, and this very limited, partial information can degrade the overall performance.
Results
The authors performed a bunch of experiments. I won’t list all the training details here; you can read about them directly in the paper. One important thing to note is that most of the hyperparameters were taken directly from the SimCLR paper, along with the LARS optimizer with a cosine learning-rate schedule and the MLP projection head. I am listing some of the results here.
[Figure: Transfer learning on downstream tasks]
Conclusion
I liked this paper a lot. IMHO, this is one of the best papers on SSL to date. Not only does it address the problems associated with the instance discrimination task and contrastive learning, it also proposes a very creative way forward. The biggest strength of this method is that it is online.
Original article: https://medium.com/@nainaakash012/unsupervised-learning-of-visual-features-by-contrasting-cluster-assignments-fbedc8b9c3db