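9 Frontier Papers | An Overview of Tumor Genomics and Multi-omics Approaches
Paper 1: Genomic basis for RNA alterations in cancer
This paper was covered in the previous post; this post adds extensive interpretation and annotation (grey italic text in the original layout) to help with reading the original.
Accepted: 2019-12-11, Nature
Authors: PCAWG Transcriptome Core Group
Link: doi.org/10.1038/s41586-020-1970-0
Abstract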
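Transcript alterations often result from somatic changes in cancer genomes. (Read this sentence carefully: it is the theoretical basis for jointly analyzing transcriptome and genome data in cancer. In other words, the molecular mechanism behind transcriptomic changes lies in the genome.)
Various forms of RNA alteration have been described in cancer, including overexpression, altered splicing and gene fusions. (Most of these fall within the scope of transcriptome sequencing, but in practice we often focus only on expression quantification, differential gene identification and functional annotation such as GO/KEGG.)
However, because of heterogeneity between patients and between tumor types, and because the patient cohorts analyzed by both transcriptome and whole-genome sequencing are relatively small, these alterations are difficult to attribute to underlying genomic changes.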
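(Heterogeneity here mainly refers to: 1. genetic differences between individuals, and 2. differences between sampled regions, sampling times and cell subtypes within tumor tissue. These differences strongly affect sample design and the data analysis strategy in cancer studies. For example:
① A tumor WES study needs not only the patient's tumor tissue but also the patient's own non-tumor tissue as a matched control. The latter is used to filter out that patient's germline variants (the DNA sequence variants each of us is born with and that differ from other people's). Without it, there is no way to tell whether a mutation is somatic (acquired) or present from birth (inherited or de novo), i.e. somatic mutation vs. germline mutation. Public databases do not record a given patient's germline variants, so matched normal tissue from the same patient is always required.
② Somatic mutations are in most cases random. A single tumor can therefore usually be divided into cell subpopulations that carry different driver mutations and hence show different transcript expression (this is spatial heterogeneity; studies of tumor progression or evolution may also involve temporal heterogeneity, for example metastatic tissue from late-stage disease, or resistant tissue that emerges after one year on a targeted drug).
Overall, though, we assume that the different organs, tissues and cells of one individual carry the same genetic material. Germline variants are therefore unaffected by the choice of material: adjacent normal tissue, whole blood, leukocytes or even buccal cells all work. Sampling tumor (somatic) material is far more complicated and must take into account matched normal tissue as well as temporal and spatial heterogeneity.
Because of this heterogeneity, diverse sampling and a larger total sample size (a larger patient cohort) are needed to obtain more reliable predictions of molecular mechanisms.)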
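Here we present, to our knowledge, the most comprehensive catalogue of cancer-associated gene alterations to date, obtained by characterizing tumor transcriptomes from 1,188 donors of the Pan-Cancer Analysis of Whole Genomes (PCAWG) Consortium of the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA).
Using matched whole-genome sequencing data, we (further) associated several categories of RNA alterations with germline and somatic DNA alterations, and identified probable genetic mechanisms.
(We found that) somatic copy-number alterations were the major drivers of variation in total gene expression and allele-specific expression (ASE).
We identified 649 associations of somatic SNVs with gene expression in cis, of which 68.4% involved the flanking non-coding regions of the gene.
We (also) found 1,900 splicing alterations associated with somatic mutations, including the formation of exons within introns in proximity to Alu elements.
In addition, 82% of gene fusions were associated with structural variants, including 75 of a new class of "bridged" fusions, in which a third genomic location connects two genes.
(In summary) we observed that transcriptomic alteration signatures differ between cancer types and are associated with variation in DNA mutational signatures. This compendium of RNA alterations in their genomic context provides a rich resource for identifying genes and mechanisms that are functionally implicated in cancer.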
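Introduction
To study alterations of cancer genomes more broadly (particularly in non-coding regions), the PCAWG project was set up to analyze the large number of whole-genome samples contributed to the ICGC and TCGA projects.
(Previously) the individual projects had not used consistent methods for several key analyses. A major focus of the 16 PCAWG working groups was therefore the uniform analysis of the PCAWG data. For example, the PCAWG technical working group led the collection of raw data and the realignment of whole-genome sequencing data, and implemented the core somatic mutation calling pipeline. Other PCAWG working groups focused on uniform analyses of copy-number alterations, structural variants, germline variants, mutational signatures and driver gene identification.
Here we report the joint analysis by the PCAWG transcriptome working group of available matched transcriptome and genome profiling from 1,188 samples across 27 tumor types (between 6 and 154 samples per tumor type, see the figure below; mean: 44). (Given the various kinds of heterogeneity, "matched" here probably means that a homogenate of the same piece of tissue from the same patient, which removes for instance the spatial heterogeneity within that piece, was used for both RNA and DNA extraction, followed by transcriptome and genome sequencing respectively, i.e. strictly matched. Tumor tissue, including a small number of metastases, necessarily had both RNA and DNA sequenced; adjacent or other normal/healthy tissue may have had only DNA sequenced, to filter the patient's germline variants, or RNA as well, for differential transcriptome analysis.) To our knowledge this is the largest resource to date of RNA phenotypes and their underlying genetic changes in cancer (Extended Data Fig. 1, Methods, Supplementary Results, Supplementary Table 23).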
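Extended Data Fig. 1 | Pan-cancer expression profiles of 1,188 PCAWG donors
a, Tumor and normal RNA-seq data from 27 histotypes. Total sample numbers are shown to the right of the bars. Grey bars denote matched healthy samples.
b, Numbers of female and male donors.
c, Total number of tumors and matched healthy samples from the PCAWG study. One group of tumors (dark purple) is metastatic.
S1_covars.xlsx / All_samples_cohort
Supplementary_Tables (download): https://pan.baidu.com/s/10fTsnVYlk30T9pKIq05cHg
Extraction code: ysx4
In summary, we demonstrate the importance of transcriptome data for understanding how the different dimensions of specific DNA alterations contribute to carcinogenesis, and we map the landscape of cancer-associated RNA alterations.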
Cancer-specific germline cis-eQTLs
To investigate the mechanisms underlying the different types of RNA alteration, we first focused on changes in gene expression levels (Extended Data Fig. 2).
(Expression quantitative trait locus (eQTL) analysis is a common, classic way of linking two omics layers, the transcriptome and the genome/exome, and belongs to multi-omics research.
An eQTL is a genetic locus that affects the expression level of a gene (most are single-nucleotide polymorphisms, SNPs) and therefore carries biological meaning. GTEx is arguably the most comprehensive eQTL database to date. The analysis examines the strength of association between SNPs and gene expression levels, together with the distance between SNP and gene, to find the genes a SNP regulates.
Cis-eQTLs vs. trans-eQTLs. Expression quantitative trait loci (eQTLs) are genetic variants that influence expression levels of mRNA transcripts. Cis-eQTLs commonly refer to genetic variations that act on local genes, and trans-eQTLs are those that act on distant genes and genes residing on different chromosomes.
a) cis-eQTL, b) trans-eQTL, c) mediated trans-eQTL with a single cis-mediator, and d) mediated trans-eQTL with multiple cis-mediators. Source: Center for Statistical Science, Tsinghua University, https://doi.org/10.1186/s12859-019-2651-6, BMC Bioinformatics
Identification of eQTLs can help advance our understanding of genetics and regulatory mechanisms of gene expression in various organisms. Consistent findings suggest that many genes are regulated by nearby SNPs, and the identified cis-eQTLs are typically close to transcription start sites (TSSs). In contrast to cis-eQTLs, trans-eQTL identification is much more challenging because a greater number of SNP-gene pairs are tested for trans-association. In order to achieve the same power, analysis of trans-eQTLs requires a much larger sample size and/or effect than that in the cis-eQTL analysis. However, trans-eQTLs tend to have weaker effects than cis-eQTLs.
Mediation diagram of the trans-association between rs2239804 and RPL34
Several methods have been developed to improve trans-eQTL detection, such as reducing the multiple-testing burden based on pairwise partial correlations from the gene expression data to increase power, and constructing or selecting variables to control for unmeasured confounders that may lead to spurious association.
eQTL analysis needs at least three files: (1) a sample information file with each sample's age, sex, ancestry and so on; (2) a gene expression file giving the expression level of each gene in each sample; and (3) a genotype file giving each sample's genotypes.)
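The note above describes the inputs to an eQTL analysis and the idea of testing nearby SNPs against a gene's expression. A minimal sketch of that idea follows. It assumes genotypes are coded as alternate-allele dosages (0/1/2), uses a ±100 kb window around the TSS, and fits a plain one-variable regression. All SNP IDs, gene names and values are hypothetical toy data, and sample covariates (the first input file) are deliberately left out, so this is not the pipeline used in the paper.

```python
from math import sqrt

def cis_window_snps(snp_pos, tss, window=100_000):
    """Return IDs of SNPs located within +/- window bp of the TSS."""
    return [sid for sid, pos in snp_pos.items() if abs(pos - tss) <= window]

def ols_slope_r(x, y):
    """Simple linear regression of y on x: returns (slope, Pearson r)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return sxy / sxx, sxy / sqrt(sxx * syy)

# Hypothetical toy inputs standing in for the genotype and expression files.
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
genotypes = {  # SNP -> sample -> dosage of the alternate allele (0, 1 or 2)
    "rs_toy1": {"s1": 0, "s2": 0, "s3": 1, "s4": 1, "s5": 2, "s6": 2},
    "rs_toy2": {"s1": 2, "s2": 0, "s3": 1, "s4": 0, "s5": 1, "s6": 2},
}
snp_pos = {"rs_toy1": 1_050_000, "rs_toy2": 1_500_000}
expression = {  # gene -> sample -> normalized expression
    "GENE_X": {"s1": 1.0, "s2": 1.2, "s3": 2.0, "s4": 2.2, "s5": 3.0, "s6": 3.2},
}
tss = {"GENE_X": 1_000_000}

def cis_eqtl(gene):
    """Test every SNP in the cis window of one gene; returns SNP -> (slope, r)."""
    results = {}
    for sid in cis_window_snps(snp_pos, tss[gene]):
        x = [genotypes[sid][s] for s in samples]
        y = [expression[gene][s] for s in samples]
        results[sid] = ols_slope_r(x, y)
    return results

res = cis_eqtl("GENE_X")
```

In a real analysis the regression would also include covariates such as sex, ancestry components and hidden expression factors, and the P values would be corrected for the number of SNP-gene pairs tested.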
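Extended Data Fig. 2 | Overview of the different sources of genetic variation considered in the analysis
a, To analyze cis-regulation, standard eQTL methods were used to test mono-allelic single-nucleotide germline variants (SNVs, blue) individually for association with total gene expression. (Blue dots: SNVs found at exactly the same genomic position in several samples; the example in the figure has 3 such positions.)
Because somatic SNVs have low recurrence in the cohort (red dots: SNVs that never share an identical genomic position across samples; the example in the figure has 0 shared positions), somatic SNVs were aggregated into burden categories according to their position relative to the gene under study (for example promoter, 5′ UTR or intron), such as the "Local somatic SNV burden" in the figure.
Local SNV burden was then tested with the eQTL approach for association with ASE globally across all genes, and with total expression at the level of each gene. Trans effects were estimated by testing total gene expression for association with mutational and epigenetic signatures.
The window size was 1 Mb for all somatic cis-eQTL analyses and 100 kb for the ASE and germline cis-eQTL analyses.
b, Overview of the different datasets and their contribution to the analyses described in a. Arrows indicate dependencies between the individual analyses.
① Germline genotypes were derived from matched healthy whole-genome sequencing (WGS) samples.
② Allele-specific SCNAs (somatic copy-number alterations), mutational signatures and local SNV burden were derived from tumor WGS compared with unaffected WGS samples (i.e. normal-tumor pairs).
③ ASE and total expression (FPKM) were derived from tumor and normal RNA-seq data.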
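We initially considered common germline variants (minor allele frequency, MAF ≥ 1%) proximal to individual genes (±100 kb) and mapped expression quantitative trait loci (eQTLs) across the cohort (Extended Data Fig. 3, Supplementary Table 1).
This pan-cancer analysis identified 3,532 genes with an eQTL (false discovery rate, FDR ≤ 5%; hereafter eGenes) (Supplementary Table 2), enriched in regions proximal to transcription start sites (TSSs) (Extended Data Fig. 3).
Supplementary_Tables / S2_eGenes_v2.xlsx / pan-analysis
Extended Data Fig. 3 | Lead variants of germline eQTLs
(Each row of three panels is one tumor type. The three plots are typical outputs of an eQTL analysis, mainly covering P values, the number of lead SNPs and their distance from the TSS.)
To identify cancer-specific regulatory variants, we compared our eQTLs with those from the Genotype-Tissue Expression project (GTEx, whose data generally come from healthy tissues), using a previously described strategy to assess eQTL replication and examining the marginal significance of lead eQTL variants in GTEx tissues (P ≤ 0.01, Bonferroni-adjusted).
Although most lead variants were also detected in GTEx samples (3,110 of 3,532 eQTL variants), we identified 422 eQTLs (422/3,532 ≈ 11.9%) with no counterpart in GTEx tissues, which suggests cancer-specific regulation (Extended Data Fig. 4, Supplementary Table 3). The corresponding lead eQTL variants were enriched in heterochromatic regions (Fig. 1a, the second significance asterisk on the right of the figure).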
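Fig. 1 | Germline and somatic SNVs associated with gene expression
a, Epigenetics Roadmap enrichment analysis, showing the average fold change of Roadmap factors across cell lines for PCAWG-specific eQTLs from the pan-analysis and for eQTLs that replicate in GTEx tissues.
*: P < 0.05/25, one-sided Wilcoxon rank-sum test for PCAWG-specific eQTLs, corrected for the number of Roadmap factors used (25). Data are mean and standard deviation.
(The other panels are discussed later.)
Overall, this analysis shows that the germline framework of gene expression regulation is largely preserved in cancer tissues.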
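Somatic cis-eQTLs in non-coding regions
Previous studies have described non-coding mutations in cancer, particularly in promoter regions, and their regulatory effect on gene expression. Here we examined possible somatic DNA changes across the whole genome that underlie changes in gene expression.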
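Extended Data Fig. 5 | Cis-mutational somatic burden
a, Total somatic mutational load per cancer type. The median number of SNVs ranges from 1,139 in thyroid adenocarcinoma to 72,804 in skin melanoma.
(The same kind of plot can be used to show the distribution of total somatic mutational load for any classification or grouping of tumor samples.)
(x axis) Shared aliquots (the number of sample aliquots that share an SNV)
b, Number of recurrent somatic SNVs shared by an increasing number of patients. A small subset (≥86 SNVs) was detected in more than 1% of the cohort (≥12 patients).
(This plot can be derived by counting over a variant-level, per-sample SNV matrix (heat map) or over VCF files.)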
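As a rough illustration of the counting behind panel b, the sketch below tallies how many samples share each exact variant and then summarizes how many variants are shared by 1, 2, 3, ... samples. The per-sample call sets are hypothetical toy data; real input would be parsed from VCF files, which is not shown here.

```python
from collections import Counter

def samples_per_variant(calls):
    """calls: sample -> set of (chrom, pos, ref, alt). Returns variant -> number of samples carrying it."""
    return Counter(v for variants in calls.values() for v in set(variants))

def recurrence_table(calls):
    """Returns number_of_sharing_samples -> number_of_variants."""
    return Counter(samples_per_variant(calls).values())

def recurrent_variants(calls, min_samples):
    """Variants seen in at least min_samples samples, sorted for stable output."""
    counts = samples_per_variant(calls)
    return sorted(v for v, n in counts.items() if n >= min_samples)

# Hypothetical toy calls for three patients.
A = ("chr1", 1000, "C", "T")
B = ("chr2", 2000, "G", "A")
C = ("chr3", 3000, "T", "G")
calls = {"p1": {A, B}, "p2": {A, C}, "p3": {A, B}}

table = recurrence_table(calls)
shared_by_two_or_more = recurrent_variants(calls, min_samples=2)
```

For the cohort in the paper, the 1% threshold in the legend corresponds to `min_samples=12` (1% of 1,188 is 11.88).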
We estimated local mutation burden by aggregating SNVs in 2-kb intervals adjacent to genes (flanking regions), together with SNVs located in exons and introns (Extended Data Figs. 2, 5, 6).
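Extended Data Fig. 6 | Somatic mutation rate and burden frequency by type of tested region
a, Number of mutated regions tested per gene with a somatic mutation burden frequency ≥ 1%.
b, Mutation rate per kilobase.
c, Burden frequency by type of tested interval (flanking, exonic, intronic).
d, Distribution of distances (bp) from leading intervals (FDR ≤ 5%) to their nearest (left and right) intervals at which the association P value drops by at least one order of magnitude (99% of the distribution is shown).
e, Breakdown of all tested genomic regions (burden frequency ≥ 1%, n = 1,049,102) and of the 567 genomic regions underlying the somatic cis-eQTLs observed at an FDR of 5%. In the figure, Intronic: eGene intron; Exonic: eGene exon; Flank.: 2-kb flanking regions within 1 Mb of the eGene start and end; Flank.intergenic: flanking regions at genomic positions without gene annotation; Flank.intronic: flanking regions overlapping an intron of a nearby gene; Flank.others: flanking regions partially overlapping some annotation of a nearby gene.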
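Next, we decomposed the expression variation of individual genes, taking into account common mutational burden in cis as well as cis germline variants and somatic copy-number alterations (SCNAs). This showed that SCNAs are the major driver of expression variation (17%), followed by somatic SNVs in gene flanking regions (1.8%) and germline variants (1.3%) (Fig. 1b).
Fig. 1b
b, Variance component analysis of gene expression levels, showing the average proportion of variance explained by different germline and somatic factors for different sets of genes, including the average effect of all factors:
1) all genetic factors (germline and somatic); 2) somatic copy-number alterations; 3) somatic variants in flanking regions; 4) population structure; 5) cis germline effects; 6) somatic intronic and exonic mutation effects.
(So somatic intronic and exonic mutations explain very little, whereas copy-number alterations, non-coding regions and cis germline variants account for most of the explained variance.)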
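We also tested for associations between all common mutational burdens and gene expression across the genome. We identified 649 genes with a somatic eQTL (FDR ≤ 5%) (Supplementary Table 5). Of these, 11 associations lay in introns or exons of the corresponding eGene, including genes with a known role in the pathogenesis of specific cancers, such as CDK12 in ovarian cancer and IRF4 in chronic lymphocytic leukemia (Extended Data Figs. 7, 8).
Supplementary_Tables / S5_somatic_egenes_rev.xlsx / eQTL_results_FDR5%
Extended Data Fig. 7 | Manhattan plots of seven somatic eGenes associated with a genic lead burden
Extended Data Fig. 8 | Scatter plots for eight somatic eGenes, showing the effect of the lead weighted burden on the gene expression residuals (see Methods in the original paper) of these genes. a, CDK12. b, PI4KA. c, IRF4. d, AICDA. e, C11orf73. f, BCL2. g, SGK1. h, TEKT5
Most eQTLs (68.4%) were associated with flanking non-coding mutational burden (Extended Data Fig. 6e, see above). (So although the non-coding regions of the genome do not directly encode the proteins that carry out cellular functions, they are very important for regulating gene expression.)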
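Next, we considered the eQTLs located in flanking regions (n = 556) and tested for enrichment of cell-type-specific annotations from the Epigenetics Roadmap. Thirteen annotations were enriched (FDR ≤ 10%) (Extended Data Fig. 9, Supplementary Table 6), including poised promoters, weak and active enhancers and heterochromatin, but notably not transcription factor binding sites (Supplementary Table 7). (These Roadmap annotations are probably distributed as annotated BED files, which can be combined with this paper's data and tools such as bedtools for further statistics and association analysis.) The enrichment of transcriptionally inactive regions may be due to the increased mutation rate in these regions (Extended Data Fig. 9), which has been reported previously in cancer.
Extended Data Fig. 9 | Roadmap epigenome marks overlapping flanking intervals that carry somatic mutational burden
We also examined the functional characteristics of somatic eGenes and observed an enrichment of somatic eQTLs in bivalent promoters of cancer testis genes (P = 0.04, Fisher's exact test), such as TEKT5 (ref. 18 in the original paper) (Fig. 1c, Extended Data Fig. 8h).
Fig. 1c
c, Manhattan plot showing nominal P values of association for TEKT5 (in grey), considering flanking, intronic and exonic intervals. The lead somatic burden is associated with increased TEKT5 expression (P = 1.61 × 10^-6) and overlaps an upstream bivalent promoter (red dots; annotated in 81 Roadmap cell lines, including 8 embryonic stem cell lines, 9 embryonic-stem-cell-derived lines and 5 induced pluripotent stem cell lines).
In addition, we found a global enrichment (FDR ≤ 10%) of Gene Ontology (GO) categories related to cell differentiation and developmental processes (Supplementary Table 8). Overall, the somatic eQTL analysis revealed mostly non-coding regions associated with changes in local gene expression; like the cancer-specific germline eQTLs, these were enriched in transcriptionally inactive regions such as heterochromatin.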
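Fig. 1d, 1e
d, Summary of significant associations between mutational signatures (Sig) and gene expression.
Top: total number of associated genes per signature (FDR ≤ 10%).
Bottom: GO categories or Reactome pathways enriched among the genes associated with each signature (FDR ≤ 10%; significance is color-coded as −log10-transformed adjusted P values).
e, Standardized effect sizes on the presence of allelic expression imbalance (AEI), considering only SCNAs, germline eQTLs, and coding and non-coding mutations. Data are effect-size estimates with their standard errors.
-- The translation of this paper stops here; see the original for more. What follows mainly covers abstracts, methods and selected figure legends --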
Fig. 2 | Position-specific effect of somatic mutations on alternative splicing
a, Top: proportion of mutations near exon–intron junctions and branch sites that are associated with exon-skipping events. Mutations with associated splicing changes are those for which the percentage spliced in-derived |z-score| is ≥ 3 (dark blue in the figure). Asterisks: intron positions significantly enriched for splicing changes relative to background based on a permutation test. *P < 0.05, **P < 0.01, ***P < 0.001. Bottom: sequence motifs of regions.
Fig. 2b, 2c
b, Example of an exonization event in the tumor suppressor gene STK11. Top: RNA-seq read coverage over part of the gene, shown in red for the donor carrying the alternative allele and in grey for a random donor with the reference allele. The cassette exon event is shown at the bottom.
c, Enrichment of SINE elements in SAVs (Splicing-associated variants,剪接相關變異) compared to sequence background (BG). Shown for SINE elements overlapping in sense (middle) and antisense (right) directions.
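The legend of panel a relies on a "percentage spliced in" (PSI) derived z-score with a cutoff of |z| ≥ 3. A minimal sketch of that kind of rule is given below. It assumes PSI is computed from inclusion and exclusion read counts and that the mutation carrier is compared against samples without the mutation. How the paper actually derives the z-score is described in its Methods and may differ; all numbers here are hypothetical.

```python
from statistics import mean, stdev

def psi(inclusion_reads, exclusion_reads):
    """Percentage spliced in as a fraction; None if the event has no supporting reads."""
    total = inclusion_reads + exclusion_reads
    return inclusion_reads / total if total else None

def psi_zscore(carrier_psi, background_psis):
    """z-score of the carrier's PSI relative to samples without the mutation."""
    return (carrier_psi - mean(background_psis)) / stdev(background_psis)

def is_splicing_associated(z, cutoff=3.0):
    """Apply the |z| >= 3 rule quoted in the figure legend."""
    return abs(z) >= cutoff

# Hypothetical exon-skipping event: non-carriers include the exon about 90% of the time.
background = [0.90, 0.92, 0.88, 0.91, 0.89]
z_skipped = psi_zscore(psi(30, 70), background)    # carrier mostly skips the exon
z_unchanged = psi_zscore(psi(91, 9), background)   # carrier looks like the background
```

Here the first carrier would be flagged as having a splicing-associated variant and the second would not.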
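Fig. 3 | Structural rearrangements associated with RNA fusions
a, Numbers of all detected fusions and of novel fusions, and their overlap with Cancer Gene Census genes. b, Schematic of bridged fusions. A bridged fusion is a composite fusion formed by a third genomic segment that connects two genes. In each case only one possible order of genomic rearrangement is depicted; breakpoints are highlighted as lightning bolts.
Fig. 4 | Global view of DNA and RNA alterations affecting tumors
a, Median number of the different alterations across histotypes. Histotypes are ordered by hierarchical clustering based on the pattern of different types of alteration. Alt., alternative; non-syn, non-synonymous. Cancer-type abbreviations are listed in Supplementary Table 23.
b, c, Circular representations of the selected genes significantly co-occurred with B2M (b) and PCBP2 (c). Connecting lines indicate the specific types of co-occurrence of alteration pairs. Inner histograms show the frequency of the different DNA/RNA alteration types in different colors.
d, All 74 Cancer Gene Census genes from the Catalogue of Somatic Mutations in Cancer (COSMIC), or PCAWG driver genes, that are frequently and heterogeneously altered at the RNA and DNA levels. Yellow bars: proportion of samples altered at the DNA level; green bars: proportion of samples altered at the RNA level. (The two show opposite trends. One way to read this: if a gene is already mutated in a tumor, whether it is expressed becomes a secondary factor that is no longer under selection during cancer evolution, and vice versa. Some genes act as drivers through mutation (such as TP53), while others are "passively expressed" (such as GAS7), i.e. their expression changes are downstream of the regulatory effects of driver mutations.) Middle column: proportion of each alteration type observed for the gene.
e, The enrichment of cancer genes within our list of significantly recurrent genes.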
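Paper 2: An integrated multi-omics analysis identifies prognostic molecular subtypes of non-muscle-invasive bladder cancer
Accepted by: Nature Communications
Year/authors: 2021 / Department of Molecular Medicine, Aarhus University Hospital, Denmark
Link: doi.org/10.1038/s41467-021-22465-w
Abstract
The molecular landscape of non-muscle-invasive bladder cancer (NMIBC) is characterized by large biological heterogeneity and variable clinical outcomes. Here we performed an integrative multi-omics analysis of patients diagnosed with NMIBC (n = 834). Transcriptomic analysis identified four classes (1, 2a, 2b and 3) that reflect tumor biology and disease aggressiveness. Both transcriptome-based subtyping and the level of chromosomal instability provided independent prognostic value beyond established prognostic clinicopathological parameters. High chromosomal instability, p53-pathway disruption and APOBEC-related mutations were significantly associated with transcriptomic class 2a and poor outcome. RNA-derived immune cell infiltration was associated with chromosomally unstable tumors and enriched in class 2b. Spatial proteomics analysis confirmed the higher infiltration of class 2b tumors and demonstrated an association between higher immune cell infiltration and lower recurrence rates. Finally, the independent prognostic value of the transcriptomic classes was documented in 1,228 validation samples using a single-sample classification tool. The classifier provides a framework for biomarker discovery and for optimizing treatment and surveillance in next-generation clinical trials.
Methods
We reanalyzed RNA-Seq data from 438 tumors included in a previous study together with new RNA-Seq data from 97 tumors.
Based on the discovery samples, we created a BCG cohort of 55 patients (clinically, after surgery for high-risk NMIBC, adjuvant intravesical instillation of Bacillus Calmette–Guérin (BCG) is given to eradicate residual disease and thereby reduce the frequency of recurrence and progression) who met the following criteria: (1) the indication for BCG treatment was high-grade disease; (2) the patient received at least six series of BCG; (3) BCG treatment was started within 12 months after TURB (transurethral resection of the bladder), so that BCG was given for the analyzed tumor. Using the many features available in our dataset, the BCG cohort was used to study time to BCG failure. BCG failure-free survival was defined as the time from BCG treatment to the first high-grade tumor or the first progression to MIBC.
Selected figure legends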
Fig. 1 Transcriptomic classes in NMIBC.
a Consensus matrix for four clusters. Samples are in both rows and columns and pairwise values range from 0 (samples never cluster together; white) to 1 (samples always cluster together; dark blue). (A sample-by-sample consensus matrix; the samples fall into four clusters.)
b Comparison between the three UROMOL2016 transcriptomic classes and the UROMOL2021 four-cluster solution (76% of tumors in UROMOL2016 class 1 remained class 1, 92% of tumors in UROMOL2016 class 2 remained class 2a/2b and 67% of tumors in UROMOL2016 class 3 remained class 3). (A comparison of how samples were classified before and after.)
c Kaplan–Meier plot of progression-free survival (PFS) for 530 patients stratified by transcriptomic class. (Progression-free survival curves by transcriptomic class; the four curves correspond to the four classes.)
d Kaplan–Meier plot of recurrence-free survival (RFS) for 511 patients stratified by transcriptomic class. (As above, for recurrence-free survival.)
e, f Clinicopathological information and selected gene expression signatures for all patients stratified by transcriptomic class. Samples are ordered after increasing silhouette score within each class (lowest to highest class correlation). CIS carcinoma in situ, EORTC European Organisation for Research and Treatment of Cancer, EAU European Association of Urology, MIBC muscle-invasive bladder cancer, EMT epithelial-mesenchymal transition. (An open question from the annotator: is a signature such as EMT perhaps obtained by summing the expression values of the gene set in each sample?)
g RNA-based immune score and immune-related gene expression signatures for all patients stratified by transcriptomic class.
h Regulon activity profiles for 23 transcription factors. Samples are ordered after increasing silhouette score within each class (lowest to highest class correlation). Regulons (rows) are hierarchically clustered.
i Regulon activity profiles for potential regulators associated with chromatin remodeling. The most-upregulated regulons within each class are shown. Regulons are hierarchically clustered. P-values were calculated using two-sided Fisher’s exact test for categorical variables, Kruskal–Wallis rank-sum test for continuous variables and two-sided log-rank test for comparing survival curves. Source data are provided as a Source data file.
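The consensus matrix in panel a can be thought of as a co-clustering frequency. A minimal sketch under stated assumptions: clustering has already been repeated on random subsamples by some algorithm (not shown), each run reports a label for every sample it included, and the consensus value for a pair is the number of runs in which the two samples share a cluster divided by the number of runs in which both were sampled. The runs below are hypothetical, and the paper's actual clustering settings are not reproduced.

```python
def consensus_matrix(runs, n_samples):
    """runs: list of dicts mapping sample index -> cluster label for one resampled run.

    Returns an n_samples x n_samples matrix with values in [0, 1].
    """
    together = [[0] * n_samples for _ in range(n_samples)]
    sampled = [[0] * n_samples for _ in range(n_samples)]
    for labels in runs:
        for i in labels:
            for j in labels:
                sampled[i][j] += 1
                if labels[i] == labels[j]:
                    together[i][j] += 1
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0
         for j in range(n_samples)]
        for i in range(n_samples)
    ]

# Hypothetical results of three resampled clustering runs over four samples (0..3).
runs = [
    {0: "a", 1: "a", 2: "b"},
    {0: "x", 1: "x", 3: "y"},
    {0: "k", 2: "m", 3: "m"},
]
M = consensus_matrix(runs, 4)
```

Cluster labels only need to be consistent within a run, which is why each run above uses its own label names. A clean block structure of ones and zeros in the matrix is what supports a given number of clusters.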
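Fig. 2 Copy-number alterations in NMIBC
a Genome-wide copy-number landscape of 473 tumors stratified by genomic class (GC) 1–3. Gains (gain + high balanced gain) and losses (loss + high balanced loss) are summarized to the left of the chromosome band panel. EORTC European Organisation for Research and Treatment of Cancer, EAU European Association of Urology, MIBC muscle-invasive bladder cancer.
b Kaplan–Meier plot of progression-free survival (PFS) for 426 patients stratified by genomic class.
c Kaplan–Meier plot of recurrence-free survival (RFS) for 399 patients stratified by genomic class.
d Kaplan–Meier plot of PFS for patients with an EORTC high-risk score (n = 163) stratified by genomic class. P-values were calculated using two-sided log-rank test. Source data are provided as a Source data file.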
Fig. 3 Genomic alterations associated with transcriptomic classes.
a Genomic classes (GCs) compared to transcriptomic classes (n = 303). (A cross-tabulation of the two omics-based classifications, with a statistical test.)
b 12-gene qPCR-based progression risk score compared to GCs. Colors indicate transcriptomic classes.
c Kaplan–Meier plot of progression-free survival (PFS) for 154 patients (including only class 2a and 2b tumors) stratified by GC.
d Number of RNA-derived mutations according to transcriptomic classes.
e Landscape of genomic alterations according to transcriptomic classes. Samples are ordered after the combined contribution of the APOBEC-related mutational signatures. Panels: RNA-derived mutational load, relative contribution of four RNA-derived mutational signatures (inferred from 441 tumors having more than 100 single nucleotide variations), selected RNA-derived mutated genes, copy number alterations in selected disease driver genes (derived from SNP arrays). Asterisks indicate p-values below 0.05. Daggers indicate BH-adjusted p-values below 0.05.
f Comparison of RNA-derived single nucleotide variations to whole-exome sequencing (WES) data from 38 patients for 11,016 mutations in all genes, 280 mutations in the genes most frequently mutated or differentially affected between the classes (n = 82, Supplementary Fig. 5b) and 93 mutations in 19 selected bladder cancer genes (Fig. 3e). Only mutations with > 10 reads in tumor and germline DNA were considered and a mutation was called observed when the frequency of the alternate allele was above 2%.
g Genomic alterations significantly enriched in one transcriptomic class vs. all others.
h Overview of p53 pathway alterations for all tumors with available copy number data and RNA-Seq data (n = 303).
i Amount of genome altered according to p53 pathway alteration. ("Intact" in the figure denotes an unaltered pathway.)
j Number of mutations according to mutations in DNA-damage response (DDR) genes (including TP53, ATM, BRCA1, ERCC2, ATR, MDC1).
k RNA-based immune score according to GCs.
l RNA-derived mutational load according to GCs.
m Relative contribution of the APOBEC-related mutational signatures according to transcriptomic class.
(Statistical tests used:) P-values were calculated using two-sided Fisher’s exact test for categorical variables, Kruskal–Wallis rank-sum test for continuous variables and two-sided log-rank test for comparing survival curves. For all boxplots, the center line represents the median, box hinges represent first and third quartiles and whiskers represent ± 1.5× interquartile range. Source data are provided as a Source data file.
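The rule quoted in panel f (more than 10 reads in tumor and germline DNA, alternate allele frequency above 2%) can be written as a small filter. This is a sketch of the stated thresholds only. It assumes the allele frequency is alternate reads divided by total reads at the site in the tumor WES data, which the legend does not spell out, and the function name and return labels are made up for illustration.

```python
def classify_site(tumor_depth, germline_depth, alt_reads, min_depth=10, min_vaf=0.02):
    """Classify one RNA-derived mutation against WES evidence.

    Returns "not_considered" if coverage is too low in either DNA sample,
    otherwise "observed" or "not_observed" depending on the alternate allele fraction.
    """
    if tumor_depth <= min_depth or germline_depth <= min_depth:
        return "not_considered"
    vaf = alt_reads / tumor_depth
    return "observed" if vaf > min_vaf else "not_observed"

# Hypothetical sites: (tumor depth, germline depth, alternate reads in tumor).
sites = [(100, 50, 3), (100, 50, 2), (10, 50, 5), (200, 40, 0)]
calls = [classify_site(*site) for site in sites]
```

Note that both thresholds are strict inequalities here, matching the wording "> 10 reads" and "above 2%".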
Fig. 4 Spatial proteomics analysis of tumor immune contexture. a Multiplex immunofluorescence staining with Panel 1 (CD3, CD8, and FOXP3) of tumors with high- and low immune infiltration with magnifications of T helper cells (CD3+, CD8− and FOXP3−), a cytotoxic T lymphocyte (CTL; CD3+, CD8+, FOXP3−) and a regulatory T cell (Treg; CD3+, CD8− and FOXP3+). Yellow dashed lines divide the tumor tissue into parenchymal and stromal regions. Scale bar: 20 μm. All protein measurements were performed once for each distinct sample. b Spatial organization of immune cell infiltration and antigen recognition/escape mechanisms (MHC class 1 and PD-L1) with associated data for genomic class, transcriptomic class, and recurrence rate. The immune cells and immune evasion markers are defined as the percentage of positive cells in the different regions (stroma and parenchyma) and normalized using z-scores, $z = (x - \mu)/\sigma$. Columns are sorted by the degree of immune infiltration into the tumor parenchyma in descending order from left to right. c Immune infiltration stratified by transcriptomic class. Immune infiltration is defined as the percentage of total cells in the parenchyma classified as immune cells. The p-value was calculated using two-sided Wilcoxon rank-sum test. d Immune infiltration stratified by recurrence rate. The p-value was calculated by the one-sided Jonckheere–Terpstra test for trend. e Kaplan–Meier plot of recurrence-free survival (RFS) for patients with tumors with few genomic alterations (GC1 + 2) stratified by immune infiltration. P-value was calculated using two-sided log-rank test. f Distribution of CK5/6 and GATA3 positive carcinoma cells stratified by transcriptomic class. Each column represents a patient. The p-value reflects the difference in CK5/6 expression across classes and was calculated by chi-squared test.
For boxplots, the center line represents the median, box hinges represent first and third quartiles and whiskers represent ± 1.5× interquartile range. Source data are provided as a Source data file.
Fig. 5 Prediction models and summary characteristics of classes. a Overview of hazard ratios calculated from univariate Cox regressions of progression-free survival using clinical and molecular features. Black dots indicate hazard ratios and horizontal lines show 95% confidence intervals (CI). Asterisks indicate p-values below 0.05 and the sample sizes, n, used to derive statistics are written to the right. CIS carcinoma in situ, EORTC European Organisation for Research and Treatment of Cancer, EAU European Association of Urology. b Receiver operating characteristic (ROC) curves for predicting progression within 5 years using logistic regression models (n = 301, events = 19). Asterisks indicate significant model improvement compared to the EORTC model (Likelihood ratio test, BH-adjusted p-value below 0.05). AUC area under the curve, CI confidence interval. c Summary characteristics of the transcriptomic classes. Molecular features associated with the classes are mentioned, and suggestions for therapeutic options with potential clinical benefit are listed. MIBC muscle-invasive bladder cancer, EMT epithelial-mesenchymal transition, CTLs cytotoxic T lymphocytes. Source data are provided as a Source data file.
Fig. 6 Validation of transcriptomic classes in independent cohorts. a Summary of classification results and stage distribution for all tumors, tumors with microarray data and tumors with RNA-Seq data (1228 tumors were classified in total and 1225 of these were assigned to a class). b Association of tumor stage, tumor grade and FGFR3 and TP53 mutation status with transcriptomic classes. P-values were calculated using two-sided Fisher’s exact test. c Kaplan–Meier plot of progression-free survival (PFS) for 511 patients stratified by transcriptomic class. The p-value was calculated using two-sided log-rank test. d Association of regulon activities (active vs. repressed status) with transcriptomic classes in the UROMOL cohort (including samples with positive silhouette scores, n = 505) and transcriptomic classes in the independent cohorts (pooled). The heatmap illustrates BH-adjusted p-values from two-sided Fisher’s exact tests. e Pathway enrichment scores within transcriptomic classes in the UROMOL cohort (including samples with positive silhouette scores, n = 505) and transcriptomic classes in the independent cohorts (pooled). Asterisks indicate significant association between pathway and class (one class vs. all other classes, two-sided Wilcoxon rank-sum test, BH-adjusted p-value below 0.05). Triangles indicate direction swaps of pathway enrichment in the independent cohorts compared to the UROMOL cohort. GSVA gene set variation analysis. Source data are provided as a Source data file.
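Paper 3: Multi-omics analysis reveals the prognostic value of tumor mutation burden in hepatocellular carcinoma
Date: 2021
Journal: Cancer Cell Int (IF = 6.5)
Link: doi.org/10.1186/s12935-021-02049-w
(Note: the paper appears to analyze the transcriptome data and the WES data separately, without any association analysis between the two.)
Abstract
Background: Hepatocellular carcinoma (HCC) is the sixth most common malignancy worldwide and is highly invasive. Tumor mutation burden (TMB) is an indicator of responsiveness to immunotherapy in many tumors. However, the role of TMB in the tumor immune microenvironment (TIME) remains unclear.
Methods: Mutation data were analyzed with the "maftools" package. Weighted gene co-expression network analysis (WGCNA) was used to determine candidate modules and significant genes correlated with TMB value. Differential analysis between TMB subgroups was performed with the R package "limma". Gene Ontology (GO) enrichment analysis was implemented with the "clusterProfiler", "enrichplot" and "ggplot2" packages. A risk score signature was built through systematic bioinformatics analyses. Kaplan–Meier (KM) survival curves and receiver operating characteristic (ROC) curves were then analyzed to assess prognostic validity. To depict the comprehensive context of the TIME, the XCELL, TIMER, QUANTISEQ, MCPcounter, EPIC, CIBERSORT and CIBERSORT-ABS algorithms were used. The potential role of the risk score in immune checkpoint blockade (ICB) immunotherapy was further explored. HTRA3 expression was measured by quantitative real-time PCR.
Results: TMB value was positively correlated with older age, male sex and early T status. A total of 75 genes at the intersection of TMB-related genes and differentially expressed genes (DEGs) were identified and were enriched in extracellular matrix-related pathways. A risk score based on three hub genes significantly affected overall survival (OS) time, immune cell infiltration and ICB-related hub targets. The prognostic effect of the risk score was validated in an external test set. A nomogram combining risk score and clinical variables was constructed for clinical use. Further work confirmed HTRA3 as a prognostic factor in HCC. Finally, TP53 mutation was associated with the risk score but did not affect risk-score-based prognosis prediction.
Conclusions: A comprehensive analysis of TMB may provide new insights into mutation-driven mechanisms of tumorigenesis and further contribute to personalized immunotherapy and prognosis prediction in HCC.
Keywords: Tumor mutation burden, Hepatocellular carcinoma, Tumor immune microenvironment, Immunotherapy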
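A risk score "based on three hub genes" is usually a weighted sum of expression values, with patients then split into high- and low-risk groups. The sketch below shows that generic construction under loud assumptions: the coefficients are invented, only HTRA3 is named in the abstract (GENE_B and GENE_C are placeholders), and splitting at the median is a common convention rather than something the abstract confirms.

```python
from statistics import median

# Hypothetical coefficients; in practice they would come from a fitted Cox model.
COEFS = {"HTRA3": 0.5, "GENE_B": -0.2, "GENE_C": 0.3}

def risk_score(expr, coefs=COEFS):
    """Weighted sum of the signature genes' expression values for one patient."""
    return sum(weight * expr[gene] for gene, weight in coefs.items())

def split_by_median(scores):
    """Label each patient 'high' if the score is above the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}

# Hypothetical expression values for four patients.
patients = {
    "p1": {"HTRA3": 2.0, "GENE_B": 1.0, "GENE_C": 0.0},
    "p2": {"HTRA3": 0.0, "GENE_B": 5.0, "GENE_C": 0.0},
    "p3": {"HTRA3": 4.0, "GENE_B": 0.0, "GENE_C": 2.0},
    "p4": {"HTRA3": 1.0, "GENE_B": 0.0, "GENE_C": 0.0},
}
scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = split_by_median(scores)
```

The two groups would then be compared with Kaplan–Meier curves and a log-rank test, as in the figures summarized below.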
Fig. 1 Landscape of somatic mutation profiles in HCC samples. A Mutation information of each gene in each sample was shown in the waterfall plot, where different colors with specific annotations at the bottom meant the various mutation types. The barplot above the legend exhibited the number of mutation burden. B Cohort summary plot displaying distribution of variants according to variant classification, type and SNV class. Bottom part (from left to right) indicates mutation load for each sample, variant classification type. A stacked barplot shows top ten mutated genes. C Rainfall plot of TCGA HCC sample TCGA-UB-A7MB-01A-11D-A33Q-10. Each point is a mutation color coded according to SNV class. D Transition and transversion plot displaying distribution of SNVs in HCC classified into six transition and transversion events. Stacked bar plot (bottom) shows distribution of mutation spectra for every sample in the MAF file. E The coincident and exclusive associations across mutated genes. The correlation of TMB with age (F), gender (G) and T status (H)
Fig. 2 Construction of weighted gene co-expression network of HCC samples.
A Sample dendrogram and clinical-traits heatmap was plotted. B Selection of the soft threshold made the index of scale-free topologies reach 0.90 and analysis of the average connectivity of 1–20 soft threshold power. C TMB-related genes with similar expression patterns were merged into the same module using a dynamic tree-cutting algorithm, creating a hierarchical clustering tree. D Heatmap of the correlations between the modules and TMB value (traits). Within every square, the number on the top refers to the coefficient between the TMB level and corresponding module, and the bottom is the P value
Fig. 3 Differential analysis of gene expression data in high- and low-TMB groups and enrichment pathway annotation. A Volcano plot was delineated to visualize the DEGs. Red represented upregulated and green represented downregulated. B Heatmap of top 40 DEGs was drawn to reveal different distribution of expression state, where the colors of red to blue represented alterations from high expression to low expression. C Venn diagram of the hub genes from WGCNA blue module and DEGs. Pathway enrichment analyses of TMB hub genes. D Gene Ontology (GO) enrichment analysis of naïve B cells-related genes: biological processes (BP), cellular components (CC) and molecular function (MF). E KEGG enrichment analysis of naïve B cells-related genes.
Fig. 4 Validation of the prognostic risk signature in discovery group. A Heatmap presents the expression pattern of three hub genes in each patient. B Distribution of multi-genes signature risk score. C The survival status and interval of HCC patients. D Kaplan–Meier curve analysis presenting difference of overall survival between the high-risk and low-risk groups. E Distribution of somatic mutation count. F Univariate Cox regression analyses of overall survival. G Multivariate Cox regression analyses of overall survival.
Fig. 5 Clinical significance of the prognostic risk signature. A Heatmap presents the distribution of clinical feature and corresponding risk score in each sample. Rate of clinical variables subtypes in high or low risk score groups. B Age, C Gender, D WHO grade, E clinical stage, F T status, G N status and H M status
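Paper 4: Pan-cancer genome and transcriptome analyses of 1,699 pediatric leukemias and solid tumors
Accepted by: Nature (Letter)
Year: 2018
Link: doi:10.1038/nature25795
Abstract
Analysis of molecular aberrations across multiple cancer types, known as pan-cancer analysis, identifies commonalities and differences in key biological processes that are dysregulated in cancer cells from diverse lineages. Pan-cancer analyses have been performed for adult cancers (refs 1–4 in the original paper) but not for pediatric cancers, which commonly occur in developing mesodermic rather than adult epithelial tissues. Here we present a pan-cancer study of somatic alterations, including single-nucleotide variants, small insertions or deletions, structural variations, copy-number alterations, gene fusions and internal tandem duplications, in 1,699 pediatric leukemias and solid tumors across six histotypes, with whole-genome, whole-exome and transcriptome sequencing data processed under a uniform analytical framework. We report 142 driver genes in pediatric cancers, of which only 45% match those found in adult pan-cancer studies; copy-number alterations and structural variants constituted the majority (62%) of events. Eleven genome-wide mutational signatures were identified, including one attributed to ultraviolet-light exposure in eight aneuploid leukemias. Transcription of the mutant allele was detectable for 34% of protein-coding mutations, and 20% exhibited allele-specific expression. These data provide a comprehensive genomic architecture for pediatric cancers and emphasize the need for pediatric cancer-specific development of precision therapies.
Paired tumor and normal samples from 1,699 patients with pediatric cancers enrolled in Children's Oncology Group clinical trials were analyzed, including 689 B-lineage acute lymphoblastic leukemias (B-ALL), 267 T-ALL, 210 acute myeloid leukemias (AML), 316 neuroblastomas (NBL), 128 Wilms tumors and 89 osteosarcomas (Extended Data Fig. 1a–c). All tumor specimens were obtained at initial diagnosis, and 98.5% of patients were 20 years of age or younger (see Methods, Extended Data Fig. 1d).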
Extended Data Figure 1 | Cohort description and workflow. a, Venn diagram of samples analysed by whole-exome (WES), whole genome (CGI) and whole transcriptome (RNA-seq) sequencing in this cohort. b, c, Sample-level sequencing status of the entire cohort (b) and those with WGS data (c, SNP6 for T-ALL). d, Age distribution for each histotype. Median, first and third quartiles are indicated by horizontal bars. Sample sizes are indicated in parentheses. Percentage of cases with age over 20 years are indicated. e, Analytical workflow. The tumour/normal BAM files of WES data were analysed by our in-house pipeline followed by manual quality control. The mutation annotation format files generated by CGI were downloaded from TARGET Data Matrix (see Methods) and analysed by a pipeline developed for this dataset, including SNVs, indels and structural variants. CNA and LOH were analysed using read counts of germline SNPs in the mutation annotation format files. Manual quality control was also performed. For RNA-seq data, the FASTQ files were re-mapped and fusions and ITDs were analysed with CICERO. The resultant mutations were analysed by GRIN (SNVs, indels, CNAs, structural variants and fusions) and MutSigCV (SNVs and indels) to discover 142 recurrently mutated genes. f, One representative sample with chromothripsis for each histotype. CNAs are shown in the inner circle, orange indicates copy gain and blue indicates copy loss. Intra- and interchromosomal rearrangements are shown as green and purple curves, respectively.
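Somatic mutation rate and signatures. Sample sizes for each histotype are shown in parentheses. Mutation rate of non-coding SNVs from WGS (a) and of coding SNVs from WGS and WES (b). Red lines indicate the median. a and b are scaled to the total number of samples with WGS (n = 651) and with WGS or WES (n = 1,639), respectively. c, Mutational signatures identified from WGS and T-ALL WES data and their contribution in each histotype. d, Mutation spectrum of representative samples in each histotype. Hypermutators (more than three standard deviations (SD) above the mean rate of the corresponding histotype) are marked with asterisks. e, Mean and s.d. of MAF of each signature in each histotype.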
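The hypermutator rule in panel d (more than three standard deviations above the mean rate of the same histotype) can be sketched as below. The sketch assumes the mean and the sample standard deviation are computed on raw mutation rates within each histotype, with the candidate sample included. The paper may compute them differently (for example on log-transformed rates), and the sample names and rates are hypothetical.

```python
from statistics import mean, stdev

def hypermutators(rates_by_histotype, n_sd=3.0):
    """rates_by_histotype: histotype -> {sample: mutation rate}.

    Returns the samples whose rate exceeds mean + n_sd * SD within their own histotype.
    """
    flagged = []
    for rates in rates_by_histotype.values():
        values = list(rates.values())
        if len(values) < 2:
            continue  # a standard deviation needs at least two samples
        cutoff = mean(values) + n_sd * stdev(values)
        flagged.extend(s for s, r in rates.items() if r > cutoff)
    return flagged

# Hypothetical rates (mutations per Mb): twenty ordinary tumors and one outlier.
toy = {
    "B-ALL": {f"t{i}": 1.0 for i in range(1, 21)},
    "NBL": {"n1": 2.0, "n2": 2.5, "n3": 3.0},
}
toy["B-ALL"]["t21"] = 50.0
flagged = hypermutators(toy)
```

One practical caveat: with the outlier included in the mean and SD, a single sample cannot exceed three sample standard deviations unless the group has at least 11 samples, so small histotypes can never produce a flag under this rule.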
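Fig. 2 | Candidate driver genes in pediatric cancer. a, Top 100 recurrently mutated genes: case counts for each histotype are shown in the same colors as the legend. Asterisks indicate genes not reported in previous adult pan-cancer analyses. b, Statistically significant pairwise relationships (P < 0.05; two-sided Fisher's exact test) for co-occurrence (red) or exclusivity (blue) within each histotype. Gene pairs with Q < 0.05, which accounts for the false discovery rate, are labeled dark red (co-occurring) or dark blue (exclusive). Significance detected only in WGS + WES samples is marked with asterisks. Numbers of mutated samples are shown in parentheses.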
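The co-occurrence and exclusivity test in panel b works on a 2×2 table of samples: both genes mutated, only the first, only the second, neither. Below is a minimal pure-Python sketch of a two-sided Fisher's exact test that sums the probabilities of all tables no more likely than the observed one, plus a helper that reads the direction from the odds ratio. The counts are hypothetical, and correction for multiple testing (the Q values in the legend) is not shown.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact P value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    total = row1 + row2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / comb(total, col1)

    p_observed = prob(a)
    lowest, highest = max(0, col1 - row2), min(row1, col1)
    # The small tolerance guards against floating-point ties.
    return sum(prob(x) for x in range(lowest, highest + 1)
               if prob(x) <= p_observed * (1 + 1e-7))

def direction(a, b, c, d):
    """'co-occurring' if the odds ratio is above 1, 'exclusive' if below, else 'none'."""
    if a * d > b * c:
        return "co-occurring"
    if a * d < b * c:
        return "exclusive"
    return "none"

# Hypothetical counts: both mutated, gene A only, gene B only, neither.
p_value = fisher_exact_two_sided(8, 2, 1, 5)
label = direction(8, 2, 1, 5)
```

For this toy table the P value is 400/11440, about 0.035, so the pair would pass the P < 0.05 threshold as co-occurring before any false discovery rate correction.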
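Fig. 3 | Biological processes with somatic alterations in pediatric cancer
a, Percentage of tumors with at least one driver alteration in each histotype. Tumors analyzed by WGS may have point mutations (light grey), CNAs or structural variants (dark grey), or both (black). For T-ALL, CNAs were derived from SNP arrays. b, Percentage of tumors with somatic alterations in 21 biological pathways in each histotype; histotypes are ordered as in a. The colored portion of each pathway indicates the percentage of variants in genes absent from three TCGA pan-cancer studies. c, Occurrence of mutations by histotype in the RAS, tyrosine kinase and PI3K pathways.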
Extended Data Figure 4 | Example driver mutations.
a, Diverse mutation types of STAG2. Variants are coloured by histotype as in Fig. 2. Circles and half-moons represent mutations and structural alterations, respectively. Bottom panel shows RNA-seq for an SNV at the −8 position of STAG2 exon 7, which created a de novo splice site resulting in an out-of-frame transcript. b–d, Truncating mutations by deletion or ITD. e, Cohesin complex detected by HotNet2 analysis. f, Samples with mutations in cohesin complex. g–k, Selected examples of singleton oncogenic activation caused by high level amplifications including CDK4 (g), PDGFRA (h), and YAP1 (i) with FPKM and histotype-wise ranks indicated, as well as recurrent co-amplification of MYCN-ALK in two NBL samples (j, k). l, Recurrent MAP3K4 mutation with structural model in N lobe (m). Location of the mutation p.G1366R is indicated by a magenta sphere and the alteration side chain is modelled as a stick. Known activating alterations (p.I1361M and p.M1415I) are shown as teal spheres. GADD45 binding (A1), kinase inhibitor (A2), and kinase domains (B1, B2) are indicated in l. n, ITD in UBTF. o, Fusion of FEV. p, q, Mutations in novel driver genes NIPBL and LEMD3.
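Article 5: Longitudinal molecular trajectories of diffuse glioma in adults
Date: 20 November 2019
Journal: Nature
Link: doi.org/10.1038/s41586-019-1775-1
Abstract
The evolutionary processes that drive universal therapeutic resistance in adult patients with diffuse glioma remain unclear. Here, we analysed temporally separated DNA-sequencing data and matched clinical annotation from 222 adult patients with glioma. By analysing mutations and copy numbers across the three major subtypes of diffuse glioma, we found that driver genes detected at the initial stage of disease were retained at recurrence, whereas there was little evidence of recurrence-specific gene alterations. Treatment with alkylating agents resulted in a hypermutator phenotype at different rates across the glioma subtypes, and hypermutation was not associated with differences in overall survival. Acquired aneuploidy was frequently detected in recurrent gliomas that were characterized by IDH mutation but without co-deletion of chromosome arms 1p/19q, and further converged with acquired alterations in the cell cycle and poor outcomes. The clonal architecture of each tumour remained similar over time, but the presence of subclonal selection was associated with decreased survival. Finally, levels of immunoediting did not differ between initial and recurrent gliomas. Collectively, our results suggest that the strongest selective pressures occur during early glioma development and that current therapies shape this evolution in a largely stochastic manner.
Methods
The GLASS dataset comprises unpublished and published sequencing data, as listed in Supplementary Table 1. The cohort contains exomes from 436 glioma samples (200 patients) and whole genomes from 165 glioma samples (78 patients), with exome and whole-genome data overlapping for 78 glioma samples (38 patients). Matched germline sequencing was available for all patients. The dataset includes 257 sets of at least two time-separated tumour samples, 17 independent recurrences, and 19 patients with at least two geographically distinct tumour portions. More specifically, it includes exome or whole-genome sequencing data for 211 primary gliomas, 234 first recurrences, 32 second recurrences, 11 third recurrences and 1 fourth recurrence (Supplementary Table 7).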
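Article 6: Glioma through the looking GLASS: molecular evolution of diffuse gliomas and the Glioma Longitudinal Analysis Consortium
Date: 2018
Journal: Neuro-Oncology
Link: doi:10.1093/neuonc/noy020
Abstract
Adult diffuse gliomas are a diverse group of brain tumours that inflict a high emotional toll on patients and their families. The Cancer Genome Atlas and similar projects have provided a comprehensive understanding of the somatic alterations and molecular subtypes of glioma at diagnosis. However, gliomas undergo significant cellular and molecular evolution during disease progression. We review current knowledge of genomic and epigenetic abnormalities in primary tumours and after disease recurrence, highlight the gaps in the literature, and explain why a new multi-institutional effort is needed to bridge these knowledge gaps and how the Glioma Longitudinal Analysis Consortium (GLASS) aims to systematically catalogue the longitudinal changes in gliomas. The GLASS initiative will provide essential insights into the evolution of glioma toward a lethal phenotype, with the potential to reveal targetable vulnerabilities and, ultimately, to improve outcomes for a patient population in need.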
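Article 7: Feasibility of whole genome and transcriptome profiling in pediatric and young adult cancers
(Focus on clinical diagnosis and application)
Journal: Nature Communications
Date/authors: 2022 / Department of Pediatrics, Memorial Sloan Kettering Cancer Center
Link: doi.org/10.1038/s41467-022-30233-7
Abstract
The utility of cancer whole genome and transcriptome sequencing (cWGTS) in oncology is increasingly recognized. However, implementation of cWGTS is challenged by the need to deliver results within clinically relevant timeframes, concerns about assay sensitivity, and the reporting and prioritization of findings. In a prospective research study, we developed a workflow that reports comprehensive cWGTS results in 9 days. Comparison of cWGTS with diagnostic panel assays demonstrates the potential of cWGTS to capture all clinically reported mutations with comparable sensitivity in a single workflow. Benchmarking identifies a minimum of 80× as the optimal depth for clinical WGS sequencing. Integration of germline, somatic DNA and RNA-seq data enables data-driven variant prioritization and reporting, with oncogenic findings reported in 54% more patients than standard of care. These results establish key technical considerations for the implementation of cWGTS as an integrated test in clinical oncology.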
Fig. 1 End-to-end cWGTS workflow. a Schematic representation of the end-to-end cWGTS workflow, with information on median-time duration (in hours) for each step, as determined by a time trial over four consecutive batches containing n = 16 tumors and representation of dedicated resources necessary to execute the workflow. b Comparison of best-reported turnaround times in literature, from sample collection to results ready for tumor board review. For our study, we show an orange bar denoting median time for n = 16 samples with minimum and maximum times denoted with the error bar. These samples were processed post optimization.
Fig. 2 Analytical validity of cWGTS for clinical biomarkers. a The left barplot depicts the proportion of patients with therapy-informing, oncogenic, or no relevant findings reported by MSK-IMPACT as defined by OncoKb (Levels 1–4). The right barplot shows the breakdown (0,1,2) of the highest level of OncoKb annotation in the study cohort. b Barplot demonstrating breakdown of the highest OncoKb level by the number of informative biomarkers in study cohort. c Barplot demonstrating breakdown of the highest OncoKb level by disease class. d Scatterplot shows the comparison of variant allele frequency (VAF) of MSK-IMPACT variants as reported by MSK-IMPACT (x axis) and absolute VAF estimates by pileup in WGS data (y axis) (Pearson correlation). Discrepant mutations are observed along the x axis. Mutations are color-coded by call status, where Both is called in both assays and ITH is mutations that were not called in higher-depth resequencing and/or had proportion test p-value < 0.05. e Barplot demonstrating breakdown of MSK-IMPACT mutations, observed in both WGS and MSK-IMPACT or only MSK-IMPACT (ITH). f Validation of oncogenic fusions reported by MSK-IMPACT/MSK-Fusion in cWGTS. The asterisk indicates that the SS18-SSX1 that was reported by MSK-Fusion was reported as SS18-SSX2 by RNA-seq and supported by spanning reads in WGS. Main oncotree disease code listed underneath for each patient (ARMS alveolar rhabdomyosarcoma, CHS chondrosarcoma, DLGT diffuse leptomeningeal glioneural tumor, DSRCT desmoplastic small round-cell tumor, ES Ewing sarcoma, MBL medulloblastoma, MFH undifferentiated pleomorphic sarcoma/malignant fibrous histiocytoma/high-grade spindle-cell sarcoma, RCSNOS round-cell sarcoma, NOS, SYNS synovial sarcoma, US undifferentiated sarcoma, USPC undifferentiated sarcoma of the peritoneal cavity). Source data for panels a–e and f are provided in Supplementary Data 4 and 6.
Fig. 3 Assessment of optimal coverage for WGS. a Barplots demonstrating sensitivity of variant detection and 95% confidence intervals (error bars) by coverage depth (100x, 80x, 60x, and 30–40x) from left to right for: 1. clinically relevant events detected by MSK-IMPACT and WGS (n = 220), 2. genome-wide SNVs, 3. genome-wide indels, and 4. genome-wide SVs. Only data from samples with original median coverage >100x (n = 32) are shown. Red dots indicate overall sensitivity of all mutations across all BAMs at the same subsampling level. b Histograms of variant allele frequencies for each subsampling level for a representative sample in the study cohort (H135973), showing loss in sensitivity to detect subclonal mutations at lower sequencing depth of coverage. c Scatterplot of effective local coverage vs VAF in subsampled BAMs for the clinically relevant calls from MSK-IMPACT. Variants called in subsampled BAMs are shown with circles, while the missed variants are denoted with X’s. Trendline shows the cumulative binomial distribution for obtaining at least 2 variant reads, given the effective coverage and variant allele fraction. Source data for panels a, c are provided at the data repository. Raw data for panel b can be accessed at the dbGAP study.
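The trendline in panel c (the probability of obtaining at least 2 variant reads, given the effective coverage and variant allele fraction) follows directly from the binomial distribution. The sketch below is a minimal illustration, assuming reads are sampled independently and that coverage is an integer read count; the function name `detection_probability` and the example depths and VAF are hypothetical choices, not values taken from the study.

```python
from math import comb


def detection_probability(coverage, vaf, min_alt_reads=2):
    """P(X >= min_alt_reads) for X ~ Binomial(coverage, vaf).

    coverage: effective local coverage as an integer number of reads.
    vaf: variant allele fraction between 0 and 1.
    """
    # Probability of seeing fewer than min_alt_reads variant reads.
    p_below = sum(
        comb(coverage, k) * vaf**k * (1 - vaf) ** (coverage - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - p_below


# Illustrative comparison for a subclonal variant at 5% VAF.
for depth in (30, 60, 80, 100):
    print(depth, round(detection_probability(depth, 0.05), 3))
```

Under this simple model, a variant at 5% VAF has a probability of about 0.45 of yielding two or more supporting reads at 30x and about 0.91 at 80x, which is consistent with the caption's point that subclonal mutations are lost first as depth decreases. Real callers apply additional filters, so the observed sensitivity is expected to be lower than this upper bound.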
Fig. 4 Additional relevant findings detected by cWGTS as compared with standard of care. a Heatmap of additional relevant findings by cWGTS colored by what technology (WES, WGS, and RNA-seq) may detect each event. Columns represent patients, while rows are clinical event types. The asterisks for Germline indicate pathogenicity supported by mutational signatures. b (top) Stacked-bar breakdown of patients with clinically relevant findings by assay. The blue areas (solid or meshed) represent patients with relevant findings from targeted sequencing (RNA and DNA), while the orange areas (solid or meshed) are for patients with findings from cWGTS. The blue/orange mesh indicates patients that had relevant findings from both targeted sequencing and WGTS. (bottom) Stacked-bar breakdown of findings specific to cWGTS from the patients in the orange section (solid or meshed) from top. The relevant findings are colored by event type. SV, structural variant. TMB, tumor mutation burden. MSI, microsatellite instability. Small Mut, small mutations, including substitutions and insertion/deletions. Viral, viral integration. Source data for panels a, b are provided in Supplementary Data 3.
Fig. 5 Integration of DNA and RNA findings for variant annotation. a Top panel shows absolute copy number on the y axis and the structural variants (SVs) that result in PAX3-FOXO3 fusion in patient H134768. Lower panel displays RNA fusion product created by the corresponding genomic SVs. b tSNE clustering of methylation data from rhabdomyosarcoma samples color-coded by disease subtype (ARMS: alveolar, ERMS: embryonal, SCRMS: spindle cell, and SRMS: sclerosing). The patient harboring the PAX3-FOXO3 fusion clusters with the ARMS samples. c Top panel shows the chromoplexy event among chromosomes 6, 9, and 18, resulting in the localization of NFIB enhancer to the MYB locus in patient H133676. Lower panel displays H3K27me3 chromatin marks from Drier et al., Nature Genetics 2016. d Boxplot shows the MYB expression in transcripts per million (TPM) across the cohort. Center line indicates the median and whiskers extend within +/−1.5x the interquartile range (IQR) from the box. The patient with MYB-NFIB event (H133676) is highlighted in orange, demonstrating that the SV event in panel c associates with overexpression of MYB, validating the SV as an enhancer-hijacking event. e Diagram of SV events targeting TP53 gene body in osteosarcoma patients (n = 12, the 13th patient’s event breakpoints fall outside of the gene body). SVs are shown as arrows with absolute copy number on the y axis (gray dots) overlaid over the exonic structure of TP53 (TRA: translocation, DUP: duplication, DEL: deletion, INV: inversion). f Boxplot shows the comparison of TP53 expression in RNA between TP53-rearranged samples and those without any rearrangement with a center line indicating the median and whiskers extending within +/−1.5x the IQR (two-sided Mann–Whitney U test, p = 1.645e-03). Raw data for panel a–c can be accessed at the dbGAP study. Source data for panel e are provided in Supplementary Data 9. Source data for panels d, f are provided at the data repository.
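The boxplot convention used in panels d and f (whiskers within 1.5x the IQR of the box) can be made concrete with a short sketch. It assumes the common Tukey definition and the "inclusive" quartile method of Python's `statistics.quantiles`; plotting libraries may compute quartiles slightly differently, and the function name `tukey_whiskers` and the toy expression values are hypothetical.

```python
from statistics import quantiles


def tukey_whiskers(values):
    """Return quartiles, whisker ends and outliers under the 1.5 x IQR rule."""
    q1, median, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        # Whiskers stop at the most extreme data points inside the fences.
        "whisker_low": min(v for v in values if v >= low_fence),
        "whisker_high": max(v for v in values if v <= high_fence),
        "outliers": sorted(v for v in values if v < low_fence or v > high_fence),
    }


# Toy TPM values (made up): nine typical samples and one strong overexpresser.
tpm = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0, 9.0, 100.0]
print(tukey_whiskers(tpm))
```

A sample with an enhancer-hijacking event would be expected to appear among the `outliers` of such a summary, which is the kind of visual evidence panel d relies on.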
Fig. 6 Genome-wide distribution and patterns of somatic mutations for four different patients. a Neuroblastoma patient (H135421) harboring a pathogenic germline MUTYH variant (c.924+3A>C). b Immature teratoma patient (H135466) with a pathogenic germline PMS2 mutation (c.538-1G>C). c Malignant peripheral nerve sheath tumor patient (H135073) harboring a germline PMS2 variant of unknown significance (VUS) (p.W841*). For each patient, the top panel is a Circos plot showing the different types of somatic mutations along the genome. The outermost ring shows the intermutation distance for all SNVs color-coded by the pyrimidine partner of the mutated base. The middle ring shows small insertions (green) and deletions (red). The innermost ring shows copy number changes, and the arcs show SVs. Middle panel is a barplot showing the absolute number of mutations attributed to the five mutational signatures with the highest exposure in the tumor. Bottom panel is a barplot showing the 96 trinucleotide contexts of SNVs. d Genome-wide distribution and patterns of somatic mutations identified in the patient outside the cohort with recurrent osteosarcoma (H201472). WGS results show the sample is hypermutated, with enrichment in SBS26, T > C mutations, repeat-mediated deletions, and MSI unstable. The patient was found to be harboring a pathogenic PMS2 variant (p.D699H) (repeat deletion: repeat-mediated deletion, m-homology: microhomology-mediated deletion, deletion other: all other deletions, TRA translocation, DUP duplication, DEL deletion, INV inversion). Raw data for this figure can be accessed at the dbGAP study.
Fig. 7 Genome-wide mutational burden in the context of immunotherapy. a Distribution of coding tumor mutational burden (TMB) as assessed by WGS across the cohort (n = 114), colored by treatment status of the patient at the time of sampling. Dotted line indicates median-coding TMB (SNVs and indels) as previously reported by the Zero Childhood Cancer study. Patients are grouped by disease category (NB: neuroblastoma, CNS: central nervous system, C: carcinoma, WT: Wilms tumor, Germ: germ cell tumor, H: hepatoblastoma, O: other). Carcinoma patients C1 and C2 who responded to immunotherapy are labeled. b Distribution of structural variant (SV) (right) and gene fusion (left) burden across the samples with both WGS and RNA-seq available (n = 101). Patient C2 had a poor-quality RNA sample, so clonal fusions from another time point from the same patient are shown. c (top) Genome-wide distribution and patterns of somatic mutations for tumor C1 (H135022), patient with metastatic adrenocortical carcinoma, depicting high SV burden. Circos plots are shown as described in Fig. 6. PET imaging shows resolution of a large pulmonary metastatic lesion (red arrow) following treatment with nivolumab and ipilimumab. d Genome-wide distribution and patterns of somatic mutations for H135462, a 14-year-old with relapsed refractory poorly differentiated clear-cell carcinoma with high TMB and SV burden. Circos plots are shown as described in Fig. 6. PET imaging shows resolution of multiple metastatic lesions (red arrows) following treatment with pembrolizumab. Source data for panels a and b are provided at the data repository. Raw data for panel c, d can be accessed at the dbGAP study.
Fig. 8 Comparison of WGS data from matched fresh frozen tumor tissue and cfDNA. a Coverage values ordered by estimated tumor content in cfDNA. b Estimates of tumor content. c Barplots showing the proportion of de novo mutation calls in cfDNA that are present in the matched fresh frozen tumor broken down by variant type. cfDNA samples with no high-confidence SVs denoted with an asterisk. d Genome-wide distribution and mutation patterns of matched fresh frozen (left) and cfDNA (right) samples for H158182. Circos plots are shown as described in Fig. 6. e Individual-level clonality analysis for H158182. (left) Scatterplot of cancer cell fraction (CCF) values for all substitutions color-coded by the estimated cluster. (middle) Phylogenetic tree representation of clusters annotated with clinically relevant variants. (right) Clone-level mutational signature analysis showing the proportion of mutations attributed to each mutational signature with total numbers of mutations in each cluster shown on the right. Whereas drivers associated with these clones could not be determined, cfDNA-specific SNV calls recapitulated mutation signatures in the FF sample, and were enriched for platinum-associated mutational signatures pointing to the existence of therapy-exposed tumor subclones in circulation. (repeat deletion: repeat-mediated deletion, m-homology: microhomology-mediated deletion, deletion other: all other deletions, TRA: translocation, DUP: duplication, DEL: deletion, INV: inversion). Source data for panels a, b are provided in Supplementary Data 11. Source data for panel c are provided at the data repository. Raw data for panels d, e can be accessed at the dbGAP study.
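The quantity plotted in panel c (the proportion of de novo cfDNA calls that are also present in the matched fresh frozen tumor, per variant type) can be sketched with plain set operations. This is a simplified illustration, not the study's method: it assumes each call is represented as a `(variant_type, key)` tuple and that calls match only when the keys are identical, whereas real SV comparison usually needs a breakpoint tolerance. The function name `concordance_by_type` and the toy calls are hypothetical.

```python
from collections import defaultdict


def concordance_by_type(cfdna_calls, tumor_calls):
    """Fraction of cfDNA calls also found in the tumor, per variant type.

    Each call is a (variant_type, key) tuple, e.g. ("SNV", "chr1:12345:A>T").
    Exact key matching is an assumption made for this sketch.
    """
    tumor = set(tumor_calls)
    total = defaultdict(int)
    shared = defaultdict(int)
    for call in set(cfdna_calls):  # de-duplicate before counting
        variant_type = call[0]
        total[variant_type] += 1
        if call in tumor:
            shared[variant_type] += 1
    return {t: shared[t] / total[t] for t in total}


# Toy calls (made up) for one matched tumor/cfDNA pair.
tumor_calls = [("SNV", "chr1:100:A>T"), ("SNV", "chr2:200:C>G"), ("indel", "chr3:300:delAT")]
cfdna_calls = [("SNV", "chr1:100:A>T"), ("SNV", "chr9:900:G>A"), ("indel", "chr3:300:delAT")]
print(concordance_by_type(cfdna_calls, tumor_calls))
```

Variant types with no cfDNA calls are simply absent from the result, which mirrors the caption's separate marking of cfDNA samples without high-confidence SVs.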
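Article 8: Glioma progression is shaped by genetic evolution and microenvironment interactions
Source: 2022, Cell
Link: doi.org/10.1016/j.cell.2022.04.038
Abstract
The factors driving therapy resistance in diffuse glioma remain poorly understood. To identify treatment-associated cellular and genetic changes, we analyzed RNA and/or DNA sequencing data from the tumor pairs of 304 adult patients with isocitrate dehydrogenase (IDH)-wild-type and IDH-mutant glioma. Tumors recurred in distinct manners that depended on IDH mutation status and were attributable to changes in histological feature composition, somatic alterations, and microenvironment interactions. In both glioma subtypes, hypermutation and acquired CDKN2A deletions were associated with an increase in proliferating tumor cells at recurrence, reflecting active tumor growth. IDH-wild-type tumors were more invasive at recurrence, and their tumor cells showed increased expression of neuronal signaling programs, reflecting a possible role for neuronal interactions in promoting glioma progression. Mesenchymal transition was associated with a myeloid cell state defined by specific ligand-receptor interactions with tumor cells. Collectively, these recurrence-associated phenotypes are potential targets for altering disease progression.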
Figure 1. Longitudinal cellular heterogeneity in glioma
(A) Each column represents an initial (I) and recurrent (R) tumor pair. Pairs are arranged based on the combined representation of the proneural and mesenchymal subtypes in their initial tumors. The first track indicates whole-exome (WXS) or whole-genome sequencing (WGS) data availability. The next three tracks indicate bulk subtype signature representation. Stacked bar plots indicate cell-state composition based on the single-cell-based deconvolution method, CIBERSORTx.
(B) Sankey plot indicating whether the highest-scoring transcriptional subtype changed at recurrence. Numbers in parentheses indicate the number of samples of each subtype: proneural (Pro.), classical (Class.), and mesenchymal (Mes.).
(C) Average cell-state composition of transcriptional subtypes (left) and initial and recurrent tumors by IDH status (right).
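Article 9: Spatially resolved multi-omics deciphers bidirectional tumor-host interdependence in glioblastoma
Date: 2022
Journal: Cancer Cell
Link: doi.org/10.1016/j.ccell.2022.05.009
Abstract
Glioblastomas are malignant tumors of the central nervous system, characterized by subclonal diversity and dynamic adaptation amid developmental hierarchies. The source of dynamic reorganization within the spatial context of these tumors remains elusive. In this study, we characterized glioblastomas by spatially resolved transcriptomics, metabolomics, and proteomics. By deciphering regionally shared transcriptional programs across patients, we infer that glioblastoma is organized by spatial segregation of lineage states and adapts to inflammatory and/or metabolic stimuli, reminiscent of the reactive transformation of mature astrocytes. Integration of metabolic imaging and imaging mass cytometry revealed locoregional tumor-host interdependence, resulting in spatially exclusive adaptive transcriptional programs. Inferred copy-number alterations emphasize a spatially cohesive organization of subclones associated with reactive transcriptional programs, confirming that environmental stress gives rise to selection pressure. A model in which glioblastoma stem cells were implanted into human and rodent neocortical tissue to mimic various environments confirmed that transcriptional states originate from dynamic adaptation to those environments.
Figure 1. Overview of methods and cohort. (A) Workflow and cohort description of the spatial datasets (left) and overview of the analytical methods used (right). (B) t-distributed stochastic neighbor embedding (tSNE) plot of all integrated spatially resolved transcriptomic spots. Colors reflect individual specimens and patients. Numbers denote anonymized patient sample IDs. Letters indicate the anatomical origin of the tissue: T, tumor; TI, tumor infiltration; TC, tumor core; C, cortex. (C) Overview of the workflow for predicting tumor cell content (top) and tSNE plot (bottom). Colors indicate the predicted percentage of tumor cell content. (D) Examples of histologically defined regions at different resolutions. (E) Dot plot of the percentage of malignant spots in the stRNA-seq dataset, estimated by ANN. At the bottom, a bar plot illustrates the distribution of tissue regions across samples.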
Download link for the 9 papers above (extraction code: ysx4)
https://pan.baidu.com/s/1BRC6B0GW2UxELN1pSPFzAg