Downloading Data from the TCGA Database
(2014-12-11 21:40:40)
Tags: TCGA database | Category: biology
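The TCGA database belongs to The Cancer Genome Atlas (TCGA) project, which systematically analyzes the full range of genetic alterations in different cancers. Website: http://cancergenome.nih.gov/
Click “Launch Data Portal” to reach the download page, then select the cancer data you need and download it. After you click “Download Data”, four different ways of filtering and downloading the data are offered.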
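Next, consider how gene-level and isoform-level expression is computed in RNA-SeqV2. When you select RNA-SeqV2 level 3 data, note that two different methods are used to estimate gene expression and isoform expression; see link 1, link 2, and this link. For details on RNA-SeqV2 itself, see link 3. In the .rsem.genes.results file, raw_count is “the number of reads mapping to this gene” and scaled_estimate is the τ value (“tau value”). In the rsem.genes.normalized_results file, normalized_count is the upper-quartile (75th percentile) normalized RSEM count estimate. Link 4 may also be worth reading: it lists in detail how TCGA RNA-seq data are processed. Link 5 points to link 6, which describes exactly how RSEM performs its calculation. The following two links indicate that the scaled estimate value should be acceptable as a measure of gene or isoform expression: link 7 and a further link.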
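As a rough illustration of the quantities described above, the sketch below rescales scaled_estimate (tau) values to transcripts per million and applies an upper-quartile normalization to raw counts. The function names, the scaling target of 1000, and the choice to take the 75th percentile over non-zero counts only are assumptions made for illustration; they are not confirmed by the text above and may differ from what the TCGA pipeline actually does.

```python
from statistics import quantiles


def scaled_estimate_to_tpm(scaled_estimates):
    """Rescale RSEM tau values (fractions of the transcript pool) to transcripts per million."""
    return [tau * 1e6 for tau in scaled_estimates]


def upper_quartile_normalize(raw_counts, target=1000.0):
    """Divide each count by the 75th percentile of the non-zero counts, then scale.

    Both `target` and the exclusion of zero counts are illustrative assumptions.
    """
    nonzero = [count for count in raw_counts if count > 0]
    if len(nonzero) < 2:
        raise ValueError("need at least two non-zero counts")
    # quantiles(..., n=4) returns the three quartile cut points;
    # index 2 is the 75th percentile.
    upper_quartile = quantiles(nonzero, n=4, method="inclusive")[2]
    return [count / upper_quartile * target for count in raw_counts]


if __name__ == "__main__":
    # Toy values, not real TCGA data.
    print(scaled_estimate_to_tpm([2.5e-06, 1.0e-05]))
    print(upper_quartile_normalize([0, 10, 20, 30, 40, 50]))
```

Because tau values are fractions, multiplying by one million gives a scale that is comparable across samples, which is why the scaled estimate is a reasonable expression measure.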
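TCGA holds a great deal of data, but access to the raw RNA-seq data apparently has to be applied for, and applicants must meet certain conditions. Link 9 gives an overview of what the data look like. Read lengths are typically 48, 50, or 75, paired-end.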
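In the data I downloaded from TCGA, isoforms carry IDs such as uc002icp.3, which should be a UCSC gene transcript ID. I then downloaded the hg19 GTF but could not find this ID in it. I suspected that the hg19 annotation version was different (hg19June2011build), but still could not find the transcript ID. Some useful data can be found at link 9, whose information comes from link 10.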
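Link 10 hosts the file “hg19_M_rCRS.fa”, which is the assembled chromosome sequence: the 24 hg19 chromosomes from UCSC plus one chrM sequence.
For some Chinese-language introductions to TCGA, see link 11.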
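Finally, the hg19 genome that TCGA uses for mapping, together with the gene annotation file it references, is available at link 12. The GAF file for hg19June2011build at that link contains all of the genes (20,806), the corresponding UCSC gene transcript IDs, and of course the sequences between them.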
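Checking whether an identifier such as uc002icp.3 occurs in an annotation file can be sketched as below. The sketch assumes only that the file is tab-delimited plain text with optional `#` comment lines (true of GTF, and assumed here for the GAF file as well); it does not rely on any particular column layout. The function name `find_lines_with_id` is hypothetical.

```python
import re


def find_lines_with_id(path, identifier):
    """Yield (line_number, fields) for lines containing `identifier` as a whole token."""
    # The lookarounds keep "uc002icp.3" from matching inside "uc002icp.30".
    pattern = re.compile(r"(?<!\w)" + re.escape(identifier) + r"(?!\w)")
    with open(path, encoding="utf-8") as handle:
        for line_number, line in enumerate(handle, start=1):
            if line.startswith("#"):
                continue  # skip comment/header lines
            if pattern.search(line):
                yield line_number, line.rstrip("\n").split("\t")
```

For example, `list(find_lines_with_id("annotation.gaf", "uc002icp.3"))` would return every matching line of a file named annotation.gaf (a placeholder name); an empty list suggests that the annotation version does not contain that transcript ID.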
For guidance on using TCGA data, you can refer to the following.

The tools provided by this site may help: http://www.cbioportal.org/public-portal/ (a Chinese-language reference for it: http://www.howsci.com/integrative-analysis-of-complex-cancer-genomics-and-clinical-profiles-using-the-cbioportalal.html). The cBio Cancer Genomics Portal provides visualization, analysis and download of large-scale cancer genomics data sets. The portal is developed and maintained by the Computational Biology Center at Memorial Sloan-Kettering Cancer Center.

Every sample in TCGA is independent. Two samples whose barcodes are identical except for the sample type (06 versus 01) may nevertheless come from different tissues of the same patient. For example, of the two breast cancer samples TCGA-BH-A1FE-01 and TCGA-BH-A1FE-06, 01 denotes the primary tumor sample, taken from the breast, and 06 denotes a metastatic sample, which on checking turned out to come from the ovary. For how to look this up, see the following email:

Each sample in TCGA is a separate sample. I think the best place to look for site information of each sample is the pathology report. You can find the pathology report file name (they are pdf files) in the biospecimen_sample file in the pathology_report_file_name column. Once you have the file name, you can search for it using the Bulk Download tool at https://tcga-data.nci.nih.gov/tcga/findArchives.htm. You will want to copy and paste the pathology report file name in the File Name field and click the Find button. This will give you the directory that contains the pdf file. You can then click the View Files link and it will display all the pathology files in that directory. You will need to search for the pdf file you are interested in and you can open it for viewing there.

Let me give you an example. Sample TCGA-D3-A1Q6-06A has pathology report TCGA-D3-A1Q6.5BA4EDD7-8462-4028-8CB9-8FE2DDC51D3E.pdf, and sample TCGA-D3-A1Q6-07A has pathology report TCGA-D3-A1Q6.7E74D698-CA50-40D9-8A06-6CECFD8580DA.pdf.

Using the Bulk Download tool, you will find that both of these files are in directory nationwidechildrens.org_SKCM.pathology_reports.Level_1.180.9.0. You can View Files directly in the Bulk Download tool, or you can go to the Open-Access HTTP Directory at https://tcga-data.nci.nih.gov/tcgafiles/ftp_auth/distro_ftpusers/anonymous/tumor/ and drill down to the pathology reports for the disease you are interested in and find the pdf files in there. To drill down you would select the disease you are interested in (e.g., skcm) and then select bcr, nationwidechildrens.org, pathology_reports, reports, and then select the directory that you found using the Bulk Download tool. In that directory you will find both pdf files of interest.

If you look at the pdf file for sample TCGA-D3-A1Q6-06A, you will see in the hand-written notes at the top right, “Site: subcutaneous tissue.” This matches diagnosis A, found at right upper arm. If you look at the pdf file for sample TCGA-D3-A1Q6-07A, you will see the hand-written notes “Site: lymph nodes, axillary.” This matches diagnosis D, lymph nodes.

This is probably the best place to find any details about the location of the sample in question. Note however, that normal samples (such as the BRCA normal sample TCGA-BH-A18V-11) do not have pathology reports. Normal solid tissue samples are typically normal tissue collected adjacent to the tumor sample.
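The barcode comparison described above (same patient, different sample type) can be sketched in a few lines. The function names are hypothetical. The codes 01, 06, and 11 follow the examples in the text; the label for 07 is recalled from the TCGA code tables and should be treated as an assumption.

```python
from collections import defaultdict

# 01, 06 and 11 appear in the text above; the label for 07 is an assumption.
SAMPLE_TYPES = {
    "01": "primary solid tumor",
    "06": "metastatic",
    "07": "additional metastatic",
    "11": "solid tissue normal",
}


def parse_barcode(barcode):
    """Split a TCGA sample barcode into (participant ID, sample type code, vial)."""
    parts = barcode.split("-")
    if len(parts) < 4 or parts[0] != "TCGA":
        raise ValueError(f"not a TCGA sample barcode: {barcode!r}")
    participant = "-".join(parts[:3])
    # The fourth field holds the two-digit sample type, optionally followed by a vial letter.
    sample_type, vial = parts[3][:2], parts[3][2:]
    return participant, sample_type, vial


def group_by_participant(barcodes):
    """Group sample barcodes by participant so related samples sit side by side."""
    grouped = defaultdict(list)
    for barcode in barcodes:
        participant, sample_type, _ = parse_barcode(barcode)
        label = SAMPLE_TYPES.get(sample_type, "unknown")
        grouped[participant].append((barcode, label))
    return dict(grouped)


if __name__ == "__main__":
    samples = ["TCGA-BH-A1FE-01", "TCGA-BH-A1FE-06", "TCGA-D3-A1Q6-06A"]
    for participant, entries in group_by_participant(samples).items():
        print(participant, entries)
```

Grouping only shows which samples share a patient; the tissue site of each sample still has to be read from the pathology report, as the email explains.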