GSE98149 (包含H3K9me3的全部階段,H3K4me3和H3K27me3的zygote、E6.5 Epi、E6.5 Exe、E7.5 Epi、E7.5 Exe、E8.5 embryo、Esc) Title:Reprogramming of H3K9me3-dependent heterochromatin during mammalian early embryo development [ChIP-seq] Organism:Mus musculus
for ((i=594;i<=670;i++));dowget ftp://ftp-trace.ncbi.nlm.nih.gov/sra/sra-instant/reads/ByStudy/sra/SRP/SRP105/SRP105176/SRR5479$i/SRR5479$i.sra;done &
[/code]```code# sratookit: .sra 文件 → fastq文件ls *sra |while read id;do/home/chen/sratoolkit.2.8.2-ubuntu64/bin/fastq-dump --gzip --split-3 $id;done &
# 下載小鼠參考基因組的 indexwget -c "ftp://ftp.ccb.jhu.edu/pub/data/bowtie2_indexes/mm10.zip" &# 解壓unzip mm10.zip &
[/code]## 2.2 質(zhì)量控制(data_assess)```code# Fastqc 進(jìn)行質(zhì)控ls *fq | while read id; do fastqc -t 4 $id; done &# multiqc:質(zhì)控結(jié)果批量查看multiqc *fastqc.zip --export &
[/code]```code## trimmomatic # 安裝 trimmomaticwget -c http://www.usadellab.org/cms/uploads/supplementary/Trimmomatic/Trimmomatic-0.38.zip &unzip Trimmomatic-0.38.zip# 數(shù)據(jù)清理# -threads 設(shè)置多線程運(yùn)行java -jar "/data/chen/biosoft/Trimmomatic-0.38/trimmomatic-0.38.jar" PE -threads 2 -phred33 \# 2個(gè)輸入文件${name}_1.fq.gz ${name}_2.trim.fq.gz \# 4個(gè)輸出文件${name}_R1.clean.fq.gz ${name}_R1.unpaired.fq.gz \${name}_R2.clean.fq.gz ${name}_R2.unpaired.fq.gz \# ILLUMINACLIP:去接頭# "$adapter"/Exome.fa :adapter 序列的 fasta 文件# 2:16 個(gè)堿基長(zhǎng)度的種子序列中可以有 2 個(gè)錯(cuò)配# 30:采用回文模式時(shí)匹配得分至少為30 (約50個(gè)堿基)# 10:采用簡(jiǎn)單模式時(shí)匹配得分至少為10 (約17個(gè)堿基)ILLUMINACLIP:"$adapter"/Exome.fa:2:30:10 \# LEADING:3,從序列的開頭開始去掉質(zhì)量值小于 3 的堿基;# TRAILING:3,從序列的末尾開始去掉質(zhì)量值小于 3 的堿基;# SLIDINGWINDOW:4:15,從 5' 端開始以 4 bp 的窗口計(jì)算堿基平均質(zhì)量,# 如果此平均值低于 15,則從這個(gè)位置截?cái)?read;# HEADCROP:<length> 在reads的首端切除指定的長(zhǎng)度;# MINLEN:36, 如果 reads 長(zhǎng)度小于 36 bp 則扔掉整條 read。LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 HEADCROP:10 MINLEN:36
[/code]
## 2.3 比對(duì)到參考基因組(mapping_analysis)Bowtie2 或 BWA```code# bowtie2 [options]* -x <bt2-idx> {-1 <m1> -2 <m2> | -U <r>} -S [<hit>]# -p/--threads NTHREADS 設(shè)置線程數(shù). Default: 1# -q reads 是 fastq 格式的# -x <bt2-idx> index 路徑# -1 <m1> 雙末端測(cè)序的 _1.fastq 路徑。可以為多個(gè)文件,并用逗號(hào)分開;多個(gè)文件必須和 -2 <m2> 中制定的文件一一對(duì)應(yīng)。# -2 <m2> 雙末端測(cè)序的 _2.fastq 路徑.# -U <r> 非雙末端測(cè)序的 fastq 路徑。可以為多個(gè)文件,并用逗號(hào)分開。# -S <hit> 輸出 Sam 格式文件。# -3/--trim3 <int> 剪掉3'端<int>長(zhǎng)度的堿基,再用于比對(duì)。(default: 0).# 用fastqc看了看數(shù)據(jù)質(zhì)量,發(fā)現(xiàn)3端質(zhì)量有點(diǎn)問題,我就用了-3 5 --local參數(shù),# --local 如果fq文件是沒有經(jīng)過 trim 的,可以用局部比對(duì)執(zhí)行 soft-clipping,加上參數(shù)--local 。該模式下對(duì)read進(jìn)行局部比對(duì), 從而, read 兩端的一些堿基不比對(duì),從而使比對(duì)得分滿足要求.bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479594_1.fastq -2 SRR5479594_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/MII_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479595_1.fastq -2 SRR5479595_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/MII_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479597_1.fastq -2 SRR5479597_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479598_1.fastq -2 SRR5479598_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479596_1.fastq -2 SRR5479596_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/sperm_input.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479599_1.fastq -2 SRR5479599_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_input.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479605_1.fastq -2 SRR5479605_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479606_1.fastq -2 SRR5479606_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/zygote_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479607_1.fastq -2 SRR5479607_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/2cell_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479608_1.fastq -2 SRR5479608_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/2cell_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479609_1.fastq -2 SRR5479609_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/4cell_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479610_1.fastq -2 SRR5479610_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/4cell_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479611_1.fastq -2 SRR5479611_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/8cell_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479612_1.fastq -2 SRR5479612_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/8cell_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479613_1.fastq -2 SRR5479613_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/morula_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479614_1.fastq -2 SRR5479614_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/morula_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479615_1.fastq -2 SRR5479615_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ICM_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479616_1.fastq -2 SRR5479616_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ICM_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479617_1.fastq -2 SRR5479617_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TE_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479618_1.fastq -2 SRR5479618_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TE_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479623_1.fastq -2 SRR5479623_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ESC_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479624_1.fastq -2 SRR5479624_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/ESC_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479625_1.fastq -2 SRR5479625_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479626_1.fastq -2 SRR5479626_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479627_1.fastq -2 SRR5479627_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/TSC_rep3.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479628_1.fastq -2 SRR5479628_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_input.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479634_1.fastq -2 SRR5479634_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479635_1.fastq -2 SRR5479635_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Epi_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479636_1.fastq -2 SRR5479636_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_input.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479642_1.fastq -2 SRR5479642_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479643_1.fastq -2 SRR5479643_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E65Exe_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479644_1.fastq -2 SRR5479644_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_input.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479650_1.fastq -2 SRR5479650_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479651_1.fastq -2 SRR5479651_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479652_1.fastq -2 SRR5479652_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep3.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479653_1.fastq -2 SRR5479653_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Epi_rep4.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479654_1.fastq -2 SRR5479654_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_input.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479661_1.fastq -2 SRR5479661_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479662_1.fastq -2 SRR5479662_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E75Exe_rep2.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479663_1.fastq -2 SRR5479663_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_input.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479669_1.fastq -2 SRR5479669_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_rep1.sam &bowtie2 -p 6 -q -x /data/chen/Data/reference/index/mm10/mm10 -1 SRR5479670_1.fastq -2 SRR5479670_2.fastq -S /data/chen/Data/GSE98149/chip-seq/h3k9me3/alignment/E85Epi_rep2.sam &
[/code]


## 2.4 搜峰(Peak_calling)### **MACS2**peaks calling:尋找可能的結(jié)合位點(diǎn),即基因組中大量reads富集的區(qū)域。#### 2.4.1 MACS2 核心: callpeak 用法```code# Example for regular peak callingmacs2 callpeak -t ChIP.bam -c Control.bam -f BAM -g hs -n test -B -q 0.01# Example for broad peak callingmacs2 callpeak -t ChIP.bam -c Control.bam --broad -g hs --broad-cutoff 0.1
[/code]```code# 批量callpeakmacs2 callpeak -c IgGold.bam -t suz12.bam -q 0.05 -f BAM -g mm -n suz12 2> suz12.macs2.log &macs2 callpeak -c IgGold.bam -t ring1B.bam -q 0.05 -f BAM -g mm -n ring1B 2> ring1B.macs2.log &macs2 callpeak -c IgGold.bam -t cbx7.bam -q 0.05 -f BAM -g mm -n cbx7 2> cbx7.macs2.log &macs2 callpeak -c IgGold.bam -t RYBP.bam -q 0.01 -f BAM -g mm -n RYBP 2>RYBP.macs2.log &
[/code]**-t** /–treatment FILENAME——處理組輸入
This is the only REQUIRED parameter for MACS. File can be in any supported
format specified by –format option. Check –format for detail. If you have more
than one alignment files, you can specify them as ` -t A B C ` . MACS will
pool up all these files together.**-c** /–control——對(duì)照組輸入
The control or mock data file. Please follow the same direction as for
-t/–treatment.**-f** /–format FORMAT——-t和-c提供文件的格式,目前MACS能夠識(shí)別的格式有 “ELAND”, “BED”,
“ELANDMULTI”, “ELANDEXPORT”, “ELANDMULTIPET” (雙端測(cè)序), “SAM”, “BAM”, “BOWTIE”,
“BAMPE”, “BEDPE”. 除”BAMPE”, “BEDPE”需要特別聲明外,其他格式都可以用 AUTO自動(dòng)檢測(cè)。如果不提供這項(xiàng),就是自動(dòng)檢測(cè)選擇。
Format of tag file, can be “ELAND”, “BED”, “ELANDMULTI”, “ELANDEXPORT”,
“ELANDMULTIPET” (for pair-end tags), “SAM”, “BAM”, “BOWTIE”, “BAMPE” or
“BEDPE”. Default is “AUTO” which will allow MACS to decide the format
automatically. “AUTO” is also usefule when you combine different formats of
files. Note that MACS can’t detect “BAMPE” or “BEDPE” format with “AUTO”, and
you have to implicitly specify the format for “BAMPE” and “BEDPE”.**-g** /–gsize——基因組大小,默認(rèn)提供了hs, mm, ce,
dm選項(xiàng),不在其中的話,比如說擬南芥,就需要自己提供了(擬南芥根據(jù)NCBI顯示是119,667,750,也就是1.2e8)。
PLEASE assign this parameter to fit your needs!
It’s the mappable genome size or effective genome size which is defined as the
genome size which can be sequenced. Because of the repetitive features on the
chromsomes, the actual mappable genome size will be smaller than the original
size, about 90% or 70% of the genome size. The default hs – 2.7e9 is
recommended for UCSC human hg18 assembly. Here are all precompiled parameters
for effective genome size:
hs: 2.7e9 (人類是2.7e9,也就是2.7G)
mm: 1.87e9
ce: 9e7
dm: 1.2e8**-n** /–name——輸出文件的前綴名。表示實(shí)驗(yàn)的名字, 請(qǐng)取一個(gè)有意義的名字。
The name string of the experiment. MACS will use this string NAME to create
output files like ‘NAME_peaks.xls’, ‘NAME_negative_peaks.xls’,
‘NAME_peaks.bed’ , ‘NAME_summits.bed’, ‘NAME_model.r’ and so on. So please
avoid any confliction between these filenames and your existing files.**-B** /–bdg
會(huì)保存更多的信息在bedGraph文件中,如fragment pileup, control lambda, -log10pvalue and
-log10qvalue scores。
If this flag is on, MACS will store the fragment pileup, control lambda,
-log10pvalue and -log10qvalue scores in bedGraph files. The bedGraph files
will be stored in current directory named NAME+’_treat_pileup.bdg’ for
treatment data, NAME+’_control_lambda.bdg’ for local lambda values from
control, NAME+’_treat_pvalue.bdg’ for Poisson pvalue scores (in -log10(pvalue)
form), and NAME+’_treat_qvalue.bdg’ for q-value scores from
Benjamini–Hochberg–Yekutieli procedure [
http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests
](http://en.wikipedia.org/wiki/False_discovery_rate#Dependent_tests)**-q** /–qvalue——q值,也就是最小的PDR閾值, 默認(rèn)是0.05。q值是根據(jù)p值利用BH計(jì)算,也就是多重試驗(yàn)矯正后的結(jié)果。
The qvalue (minimum FDR) cutoff to call significant regions. Default is 0.01.
For broad marks, you can try 0.05 as cutoff. Q-values are calculated from
p-values using Benjamini-Hochberg procedure.**-p** /–pvalue——p值,指定 p 值后 MACS2 就不會(huì)用 q 值了。
The pvalue cutoff. If -p is specified, MACS2 will use pvalue instead of
qvalue.**-m**
/–mfold——和MFOLD有關(guān),而MFOLD和MACS預(yù)構(gòu)建模型有關(guān),默認(rèn)是5:50,MACS會(huì)先尋找100多個(gè)peak區(qū)構(gòu)建模型,一般不用改,因?yàn)槟悴欢?
This parameter is used to select the regions within MFOLD range of high-
confidence enrichment ratio against background to build model. The regions
must be lower than upper limit, and higher than the lower limit of fold
enrichment. DEFAULT:5,50 means using all regions not too low (>5) and not too
high (<50) to build paired-peaks model. If MACS can not find more than 100
regions to build model, it will use the –extsize parameter to continue the
peak detection ONLY if –fix-bimodal is set.#### 2.4.2 callpeak 結(jié)果文件說明```code# (在當(dāng)前目錄下)統(tǒng)計(jì) *bed 的行數(shù)(peak數(shù))wc -l *bed2384 cbx7_summits.bed8342 ring1B_summits.bed0 RYBP_summits.bed7619 suz12_summits.bed# 在文件a中統(tǒng)計(jì) hello 出現(xiàn)的行數(shù):# grep hello a | wc -l# wc(word count)#-c 統(tǒng)計(jì)字節(jié)數(shù)。#-l 統(tǒng)計(jì)行數(shù)。line#-m 統(tǒng)計(jì)字符數(shù)。這個(gè)標(biāo)志不能與 -c 標(biāo)志一起使用。#-w 統(tǒng)計(jì)字?jǐn)?shù)。一個(gè)字被定義為由空白、跳格或換行字符分隔的字符串。#-L 打印最長(zhǎng)行的長(zhǎng)度。
[/code]callpeak會(huì)得到如下結(jié)果文件:**NAME_summits.bed** :Browser Extensible Data,記錄每個(gè)peak的peak
summits,話句話說就是記錄極值點(diǎn)的位置。MACS建議用該文件尋找結(jié)合位點(diǎn)的motif。能夠直接載入U(xiǎn)CSC
browser,用其他軟件分析時(shí)需要去掉第一行。**NAME_peaks.xls** :以表格形式存放peak信息,雖然后綴是xls,但其實(shí)能用文本編輯器打開,和bed格式類似,但是 **以1為基**
,而bed文件是以0為基.也就是說xls的坐標(biāo)都要減一才是bed文件的坐標(biāo)。**NAME_peaks.narrowPeak** , **NAME_peaks.broadPeak** 類似。后面4列表示為, integer score
for display, fold-change,-log10pvalue,-log10qvalue,relative summit position to
peak start。內(nèi)容和NAME_peaks.xls基本一致,適合用于導(dǎo)入R進(jìn)行分析。**NAME_model.r** :能通過 ` $ Rscript NAME_model.r ` 作圖,得到是基于你提供數(shù)據(jù)的peak模型。**.bdg** :能夠用 UCSC genome browser 轉(zhuǎn)換成更小的 bigWig 文件。#### 2.4.3 bdg file → wig file> 為了方便在IGV上查看ChIP-seq的結(jié)果和后期的可視化展示,需要把macs2的結(jié)果轉(zhuǎn)化為bw提供給IGV。一共分為三步:第一步:使用 bdgcmp 得到 **FE** 或者 **logLR 轉(zhuǎn)化后的文件** (Run MACS2 bdgcmp to generate
fold-enrichment and logLR track)```codemacs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_FE.bdg -m FEmacs2 bdgcmp -t H3K36me1_EE_rep1_treat_pileup.bdg -c H3K36me1_EE_rep1_control_lambda.bdg -o H3K36me1_EE_rep1_logLR.bdg -m logLR -p 0.00001# 參數(shù)解釋# -m FE 計(jì)算富集倍數(shù)降低噪音# -p 為了避免log的時(shí)候input值為0時(shí)發(fā)生error,給予一個(gè)很小的值
[/code]第二步: **預(yù)處理** 文件,下載對(duì)應(yīng) **參考基因組染色體長(zhǎng)度** 文件使用conda安裝以下三個(gè)處理軟件:
bedGraphToBigWig
bedClip
bedtools下載染色體長(zhǎng)度文件: [
http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz
](http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/chromInfo.txt.gz)
并解壓(針對(duì)human,其余物種的皆可以按照類似網(wǎng)址下載)寫一個(gè)小小的sh腳本方便一步轉(zhuǎn)化name.sh:```code#!/bin/bash# check commands: slopBed, bedGraphToBigWig and bedClipwhich bedtools &>/dev/null || { echo "bedtools not found! Download bedTools: <http://code.google.com/p/bedtools/>"; exit 1; }which bedGraphToBigWig &>/dev/null || { echo "bedGraphToBigWig not found! Download: <http://hgdownload.cse.ucsc.edu/admin/exe/>"; exit 1; }which bedClip &>/dev/null || { echo "bedClip not found! Download: <http://hgdownload.cse.ucsc.edu/admin/exe/>"; exit 1; }# end of checkingif [ $# -lt 2 ];thenecho "Need 2 parameters! <bedgraph> <chrom info>"exitfiF=$1G=$2bedtools slop -i ${F} -g ${G} -b 0 | bedClip stdin ${G} ${F}.clipLC_COLLATE=C sort -k1,1 -k2,2n ${F}.clip > ${F}.sort.clipbedGraphToBigWig ${F}.sort.clip ${G} ${F/bdg/bw}rm -f ${F}.clip ${F}.sort.clip
[/code]chmod +x name.sh第三步:生成 **bw** 文件```code./name.sh H3K36me1_EE_rep1_FE.bdg hg19.len./name.sh H3K36me1_EE_rep1_logLR.bdg hg19.len
[/code]最后得到產(chǎn)物,至于的使用哪一個(gè)作為輸入文件大家就根據(jù)需要來吧H3K36me1EErep1_FE.bw
H3K36me1EErep1_logLR.bw## 2.5 峰注釋(Peak_anno)### ChIPseeker> ChIPseeker的功能分為三類:
> 注釋:提取peak附近最近的基因,注釋peak所在區(qū)域。
> 比較:估計(jì)ChIP peak數(shù)據(jù)集中重疊部分的顯著性;整合GEO數(shù)據(jù)集,以便于將當(dāng)前結(jié)果和已知結(jié)果比較。
> 可視化:peak的覆蓋情況;TSS區(qū)域結(jié)合的peak的平均表達(dá)譜和熱圖;基因組注釋;TSS距離;peak和基因的重疊。
```code# 加載ChIPseeker、基因組注釋包和bed數(shù)據(jù)biocLite("ChIPseeker")biocLite("org.Mm.eg.db")biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene")library("ChIPseeker")# 下載source ("https://bioconductor.org/biocLite.R")biocLite("ChIPseeker")biocLite("org.Mm.eg.db")biocLite("TxDb.Mmusculus.UCSC.mm10.knownGene")biocLite("clusterProfiler")biocLite("ReactomePA")biocLite("DOSE")#載入library("ChIPseeker")library("org.Mm.eg.db")library("TxDb.Mmusculus.UCSC.mm10.knownGene")txdb <- TxDb.Mmusculus.UCSC.mm10.knownGenelibrary("clusterProfiler")# 讀入bed文件ring1B <- readPeakFile("F:/Data/ChIP/ring1B_peaks.narrowPeak")cbx7 <- readPeakFile("F:/Data/ChIP/cbx7_peaks.narrowPeak")RYBP <- readPeakFile("F:/Data/ChIP/RYBP2_summits.bed")suz12 <- readPeakFile("f:/Data/ChIP/suz12_peaks.narrowPeak")
[/code]Chip peaks coverage plot查看peak在全基因組的位置covplot(ring1B,chrs=c(“chr17”, “chr18”)) #specific chrring1Bsuz12cbx7RYBPring1B(chr17&18)● Heatmap of ChIP binding to TSS regionspromoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrix <- getTagMatrix(ring1B, windows=promoter)
tagHeatmap(tagMatrix, xlim=c(-3000, 3000), color=”red”)Average Profile of ChIP peaks binding to TSS region● Confidence interval estimated by bootstrap methodplotAvgProf(tagMatrix, xlim=c(-3000, 3000), conf = 0.95, resample = 1000)peak的注釋peak的注釋用annotatePeak**,TSS (transcription start site) region
可以自己設(shè)定,默認(rèn)是(-3000,3000),TxDb 是指某個(gè)物種的基因組,例如TxDb.Hsapiens.UCSC.hg38.knownGene,
TxDb.Hsapiens.UCSC.hg19.knownGene for human genome hg38 and hg19,
TxDb.Mmusculus.UCSC.mm10.knownGene and TxDb.Mmusculus.UCSC.mm9.knownGene for
mouse mm10 and mm9.peakAnno <- annotatePeak(ring1B, tssRegion=c(-3000, 3000),
TxDb=txdb, annoDb=”org.Mm.eg.db”)可視化 Pie and Bar plotplotAnnoBar(peakAnno)
vennpie(peakAnno)
upsetplot(peakAnno)餅圖:條形圖:upsetplot: upset技術(shù)適用于多于5個(gè)集合的表示情況。可視化TSS區(qū)域的TF binding lociplotDistToTSS(peakAnno,
title=”Distribution of transcription factor-binding loci\nrelative to TSS”)多個(gè)peak的比較多個(gè)peak set注釋時(shí),先構(gòu)建list,然后用lapply.
list(name1=bed_file1,name2=bed_file2)RYBP的數(shù)據(jù)有問題,這里加上去,會(huì)一直報(bào)錯(cuò)。peaks <- list(cbx7=cbx7,ring1B=ring1B,suz12=suz12)
promoter <- getPromoters(TxDb=txdb, upstream=3000, downstream=3000)
tagMatrixList <- lapply(peaks, getTagMatrix, windows=promoter)
plotAvgProf(tagMatrixList, xlim=c(-3000, 3000))
plotAvgProf(tagMatrixList, xlim=c(-3000, 3000), conf=0.95,resample=500,
facet=”row”)
tagHeatmap(tagMatrixList, xlim=c(-3000, 3000), color=NULL)ChIP peak annotation comparisionpeakAnnoList <- lapply(peaks, annotatePeak, TxDb=txdb,
tssRegion=c(-3000, 3000), verbose=FALSE)
plotAnnoBar(peakAnnoList)
plotDistToTSS(peakAnnoList)Overlap of peaks and annotated genesgenes= lapply(peakAnnoList, function(i) as.data.frame(i)$geneId)
vennplot(genes)