Single-Cell Data Integration Methods | Comprehensive Integration of Single-Cell Data
Code and tutorials: https://satijalab.org/seurat/
Underlying algorithms
CCA
Canonical Correlation Analysis | R Data Analysis Examples
MNN
The Mutual Nearest Neighbor Method in Functional Nonparametric Regression
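To make the CCA entry above more concrete, here is a minimal sketch in plain Python. It is not Seurat's code. It assumes a simplified variant in which the covariance within each dataset is treated as diagonal, so the first pair of canonical vectors reduces to the leading singular vectors of the cross-product matrix X^T Y, found here by power iteration. The function names (`standardize_columns`, `first_canonical_pair`) and the toy matrices are made up for illustration.

```python
import math

def standardize_columns(matrix):
    """Center each column and scale it to unit variance.
    Rows are genes, columns are cells. Assumes no column is constant."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for c in range(n_cols):
        col = [matrix[r][c] for r in range(n_rows)]
        mean = sum(col) / n_rows
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n_rows - 1))
        for r in range(n_rows):
            out[r][c] = (col[r] - mean) / sd
    return out

def normalize(vec):
    """Scale a vector to unit Euclidean length."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / norm for v in vec]

def first_canonical_pair(x, y, iterations=200):
    """Leading singular vectors (u, v) of M = X^T Y via power iteration.
    u has one loading per cell of dataset A, v one per cell of dataset B."""
    x, y = standardize_columns(x), standardize_columns(y)
    n_genes, n_a, n_b = len(x), len(x[0]), len(y[0])
    m = [[sum(x[g][i] * y[g][j] for g in range(n_genes)) for j in range(n_b)]
         for i in range(n_a)]
    # Arbitrary non-uniform starting vector (an assumption, not a tuned choice).
    v = normalize([float(j + 1) for j in range(n_b)])
    for _ in range(iterations):
        u = normalize([sum(m[i][j] * v[j] for j in range(n_b)) for i in range(n_a)])
        v = normalize([sum(m[i][j] * u[i] for i in range(n_a)) for j in range(n_b)])
    return u, v

# Toy data: 4 shared genes. Dataset A has 3 cells (two of type 1, one of type 2);
# dataset B has 2 cells (one of each type) measured on a shifted scale.
dataset_a = [[5, 4, 0],
             [4, 5, 1],
             [0, 1, 5],
             [1, 0, 4]]
dataset_b = [[9, 3],
             [8, 2],
             [2, 8],
             [3, 9]]

u, v = first_canonical_pair(dataset_a, dataset_b)
print(u, v)
```

In this toy case, cells of the same type receive loadings of the same sign in both datasets, which is the sense in which CCA gives the two datasets a shared coordinate. The overall sign of `u` and `v` is arbitrary.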
Comprehensive Integration of Single-Cell Data
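I really did not expect the integration method in Seurat v3 to be published in Cell itself.
Sure enough: a big-name lab + a frontier field = unlimited possibilities.
The bioRxiv preprint was posted on November 02, 2018, and the paper was formally published in Cell on June 06, 2019.
The idea for the method probably already existed by the end of 2017, when I had only just started working on single cell.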
Single-cell transcriptomics has transformed our ability to characterize cell states, but deep biological understanding requires more than a taxonomic listing of clusters.
As new methods arise to measure distinct cellular modalities, a key analytical challenge is to integrate these datasets to better understand cellular identity and function.
Here, we develop a strategy to “anchor” diverse datasets together, enabling us to integrate single-cell measurements not only across scRNA-seq technologies, but also across different modalities.
After demonstrating improvement over existing methods for integrating scRNA-seq data, we anchor scRNA-seq experiments with scATAC-seq to explore chromatin differences in closely related interneuron subsets and project protein expression measurements onto a bone marrow atlas to characterize lymphocyte populations.
Lastly, we harmonize in situ gene expression and scRNA-seq datasets, allowing transcriptome-wide imputation of spatial gene expression patterns.
Our work presents a strategy for the assembly of harmonized references and transfer of information across datasets.
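Highlight 1: an anchoring approach integrates many kinds of data, across platforms and across modalities.
Highlight 2: it can also integrate scATAC-seq data.
Highlight 3: analysis of spatial gene expression patterns.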
Major single-cell breakthroughs to date:
- immunophenotype (Stoeckius et al., 2017; Peterson et al., 2017),
- genome sequence (Navin et al., 2011; Vitak et al., 2017),
- lineage origins (Raj et al., 2018; Spanjaard et al., 2018; Alemany et al., 2018),
- DNA methylation landscape (Luo et al., 2018; Kelsey et al., 2017),
- chromatin accessibility (Cao et al., 2018; Lake et al., 2018; Preissl et al., 2018),
- spatial positioning
Two major problems in single-cell data integration:
These questions are well suited to established fields in statistical learning.
The second problem is analogous to reference assembly (Li et al., 2010) and mapping (Langmead et al., 2009) for genomic DNA sequences.
identify shared subpopulations across datasets
- canonical correlation analysis (CCA)
- mutual nearest neighbors (MNNs)
Problems with the second kind of integration:
- only a subset of cell types are shared across datasets
- significant technical variation masks shared biological signal.
This paper addresses three problems:
- reference assembly
- transfer learning for transcriptomic, epigenomic, proteomic,
- spatially resolved single-cell data
Core idea
Through the identification of cell pairwise correspondences between single cells across datasets, termed “anchors,” we can transform datasets into a shared space, even in the presence of extensive technical and/or biological differences.
This enables the construction of harmonized atlases at the tissue or organismal scale, as well as effective transfer of discrete or continuous data from a reference onto a query dataset.
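The transfer of discrete or continuous data described above can be sketched as a weighted vote over anchors. This is only a toy version under my own assumptions: anchors are given as (query index, reference index) pairs, and each query cell weights every anchor with a Gaussian kernel on its distance to the anchor's query cell. The paper's actual weighting scheme is more involved, and names such as `anchor_weights`, `transfer_labels`, `transfer_values`, and `bandwidth` are hypothetical.

```python
import math

def anchor_weights(query_cell, anchors, query_cells, bandwidth=1.0):
    """Normalized Gaussian weights of one query cell with respect to every anchor.
    Assumes the cell is close enough to at least one anchor that the weights do not all underflow to zero."""
    raw = []
    for q_idx, _ref_idx in anchors:
        d2 = sum((x - y) ** 2 for x, y in zip(query_cell, query_cells[q_idx]))
        raw.append(math.exp(-d2 / (2 * bandwidth ** 2)))
    total = sum(raw)
    return [w / total for w in raw]

def transfer_labels(query_cells, anchors, ref_labels, bandwidth=1.0):
    """Discrete transfer: each query cell takes the label with the largest summed anchor weight."""
    predictions = []
    for cell in query_cells:
        weights = anchor_weights(cell, anchors, query_cells, bandwidth)
        votes = {}
        for w, (_q, r) in zip(weights, anchors):
            votes[ref_labels[r]] = votes.get(ref_labels[r], 0.0) + w
        predictions.append(max(votes, key=votes.get))
    return predictions

def transfer_values(query_cells, anchors, ref_values, bandwidth=1.0):
    """Continuous transfer: weighted average of the reference values attached to the anchors."""
    imputed = []
    for cell in query_cells:
        weights = anchor_weights(cell, anchors, query_cells, bandwidth)
        imputed.append(sum(w * ref_values[r] for w, (_q, r) in zip(weights, anchors)))
    return imputed

# Toy query dataset (2-D embedding) with two anchors into a two-cell reference.
query_cells = [(0.0, 0.0), (0.3, 0.1), (5.0, 5.0), (4.8, 5.2)]
anchors = [(0, 0), (2, 1)]        # (query index, reference index)
ref_labels = ["T cell", "B cell"]  # discrete annotation on the reference
ref_values = [1.0, 9.0]            # e.g. a protein level measured only in the reference

labels = transfer_labels(query_cells, anchors, ref_labels)
values = transfer_values(query_cells, anchors, ref_values)
print(labels)  # ['T cell', 'T cell', 'B cell', 'B cell']
print(values)
```

The same weights serve both cases: a vote for categorical labels and an average for continuous measurements, which loosely mirrors the "discrete or continuous" wording of the quoted passage.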
Some single-cell background knowledge
false negatives (“drop-outs”) due to transcript abundance and protocol-specific biases
expression derived from fluorescence in situ hybridization (FISH) exhibits probe-specific noise due to sequence specificity and background binding
Results
Identifying Anchor Correspondences across Single-Cell Datasets
Basic assumption: we assume that there are correspondences between datasets and that at least a subset of cells represent a shared biological state.
Constructing Integrated Atlases at the Scale of Organs and Organisms
Evaluates how accurately different tools integrate data across platforms and across subtypes.
Leveraging Anchor Correspondences to Classify Cell States
Moves on to integrating case and control samples and classifying cell states.
Projecting Cellular States across Modalities
Integrates scATAC-seq.
Transferring Continuous and Multimodal Data across Experiments
Predicting Protein Expression in Human Bone Marrow Cells
CITE-seq; predicting protein expression.
Spatial Mapping of Single-Cell Sequencing Data in the Mouse Cortex
Spatial mapping in the mouse cerebral cortex.
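What's my problem?
I realized long ago that this was an important and valuable problem, but I was fighting alone: I never really distilled the problem, never thought about it or understood it deeply, and never tried to use statistical thinking to solve it.
The big names clearly saw the value of this problem early on, gathered people to discuss and think it through, proposed their own solution systematically with statistical methods, and ultimately published the results in a top journal on the strength of their ability and reputation.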
What is holding me back and keeping me going in circles?
Reposted from: https://www.cnblogs.com/leezx/p/11244731.html