# bulk ATAC-seq

```
https://www.bilibili.com/video/BV1C7411C7ez

https://mp.weixin.qq.com/mp/appmsgalbum?__biz=MzAxMDkxODM1Ng==&action=getalbum&album_id=3825619502127398912&subscene=&scenenote=https%3A%2F%2Fmp.weixin.qq.com%2Fs%3F__biz%3DMzAxMDkxODM1Ng%3D%3D%26mid%3D2247540277%26idx%3D1%26sn%3Db819604a31817864946caa499ed7459d%26key%3Ddaf9bdc5abc4e8d0de2d73f4892456cf32b499b4ea92519ad6c950ef1fc249cbe45fdb98ff8c123a25cd7703a8e3de48b49ddd759640b7e1fbe3ef16f6cf08eec26b79f07f0f03e3733159a7720cd2236c1b6ea8849004b7b0eb420b50e3c87d2abc78b6b26c2c86539e603e0000f17dffedff28206694301903aefed566facc%26ascene%3D1%26uin%3DMTc5NzA0NTcyMA%253D%253D%26devicetype%3DWindows%2B11%2Bx64%26version%3D63090c33%26lang%3Dzh_CN%26countrycode%3DCN%26exportkey%3Dn_ChQIAhIQbJU5m6igruRJ6ldletPEVBLfAQIE97dBBAEAAAAAAL%252FkAcU26FMAAAAOpnltbLcz9gKNyK89dVj0L75sCTSInqlwAZaJS%252F5VqX5Muxs0e8FMfzfOBo12Ubh8uX85KPKZ5IldxpmzI1apAz73tpJdrrxxf0%252BWz451%252Bu3IFZQRKshnFWuvklApnc6tvWaIcBoomSKkfktFHGTtBQYwzJ3eAcK71gWhBYvo1eysfXtmxYsmklGi6dWiAy3cD1UDXROhGZAdeWrKSZ4MfsG57tOsjmJssVHudyZ1ErNYmnXzVa8Ayoh7beGgoMZgjx7AoYke3Pk%253D%26acctmode%3D0%26pass_ticket%3D7Ah3R9bRL6k2akcxy2ybiQTF%252Fgh8OSxQxrahCM13N39OMQdH%252Bubn%252BTXY4MC4Ec5c%26wx_header%3D1%26fasttmpl_type%3D0%26fasttmpl_fullversion%3D7712138-zh_CN-zip%26fasttmpl_flag%3D1&nolastread=1&sessionid=-2017737171&scene=21#wechat_redirect
```

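## Experimental principle

ATAC-seq (Assay for Transposase-Accessible Chromatin using sequencing) is a high-throughput sequencing technique for studying chromatin accessibility. By detecting open chromatin regions across the genome, it reveals the location and activity of regulatory elements such as promoters and enhancers, which helps explain how gene expression is regulated.

General workflow (reference: Disease-associated astrocyte epigenetic memory promotes CNS pathology):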

Sequencing libraries were prepared largely as described previously (refs. 4, 82, 83). After isolation of nuclei, transposition was performed using the kit (Illumina, FC-121-1030). DNA was then amplified using NEBNext High Fidelity 2× PCR Master Mix (New England Biolabs, M0541S) for 5 cycles. DNA quantity was then measured using a Viia 7 Real-Time PCR System (Thermo Fisher Scientific) and the number of cycles required to achieve 1/3 of maximal SYBR Green fluorescence was determined and libraries were amplified accordingly. TruSeq adapters (universal: Ad1\_noMX and barcoded: Ad2.1-Ad2.24) were used according to the Buenrostro protocol. Libraries were purified using MinElute PCR Purification Kit (Qiagen, 28006) followed by double-sided Agencourt AMPure XP bead purification (Beckman Coulter, A63881) to remove primer dimers and large DNA fragments. Libraries were analysed on a 2100 Bioanalyzer (Agilent Technologies) and High Sensitivity DNA Kit (Agilent Technologies, 5067-4626). Libraries were sequenced by Genewiz on an Illumina HiSeq 4000 by 2×75 bp paired end sequencing.

Paired end ATAC-seq reads were first assessed using FASTQC. Reads were trimmed using cutadapt to remove adapters and low quality reads below 30. Paired-end reads were then aligned against the GRCm38/mm10 mouse genome assembly with Bowtie (v2.3.0; ref. 84) in local mode, sensitive settings, and a maximum fragment size of 2,000. Duplicated reads were marked using Picard (v.2.5.0). Alignments were filtered with SAMtools (v1.3) to exclude reads with mapping quality <30, not properly paired, duplicated, aligned to mitochondrial genome, and/or aligned to ENCODE blacklist regions. Alignments with an insertion size of >100 bp were removed to enrich for nucleosome-free reads. ATAC-seq peaks were called for each replicate using MACS2, using --format BAMPE and --keep-dup all. IDR (v2.0.2) was used to determine consistency of peak detection between individual replicates and peaks with a threshold below 0.10 were merged between replicates for downstream analysis. For differential peaks, merged peaks were mapped to specific genic regions using bedtools intersect and reads were counted using subread featureCounts (v.1.6.2) to produce a count matrix. DESeq2 was then used to find differential peaks.
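
The fragment-size step in the methods above (dropping alignments with an insert size above 100 bp) can be sketched with plain awk on SAM text. This is a minimal sketch and not the authors' code: the function name `nfr_filter` and the file names in the usage comment are hypothetical, and it assumes the insert size is taken from the TLEN field (column 9) of each SAM record.

```bash
# Keep SAM header lines plus records whose absolute template length
# (TLEN, column 9) is between 1 and max_len bp. max_len defaults to 100.
nfr_filter() {
    awk -v max_len="${1:-100}" 'BEGIN { FS = OFS = "\t" }
        /^@/ { print; next }                # header lines pass through unchanged
        {
            tlen = ($9 < 0) ? -$9 : $9      # TLEN is negative for the rightmost mate
            if (tlen > 0 && tlen <= max_len) print
        }'
}

# Hypothetical usage (file names are placeholders):
# samtools view -h KO1.filtered.bam | nfr_filter 100 | samtools view -b -o KO1.nfr.bam -
```

Both mates of a pair carry the same absolute TLEN, so a pair is kept or dropped as a unit.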

## Preparation

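```bash
# Create and activate a dedicated environment
conda create -y -n atacseq
conda activate atacseq

# Install the tools used in this workflow.
# conda-forge is listed first so that it takes priority over bioconda.
conda install -y -n atacseq -c conda-forge -c bioconda \
    fastqc multiqc cutadapt trim-galore bowtie2 samtools bedtools picard deeptools macs3 subread
```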
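
After installation it can help to confirm that every executable is actually on `PATH` before starting. The helper below is a small sketch; the function name `missing_tools` is made up for illustration, and the executable names in the usage line are assumptions based on the packages installed above (for example, the `subread` package provides `featureCounts` and `deeptools` provides `bamCoverage`).

```bash
# Print the name of every requested tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
missing_tools() {
    local missing=0 tool
    for tool in "$@"; do
        if ! command -v "$tool" > /dev/null 2>&1; then
            echo "$tool"
            missing=1
        fi
    done
    return "$missing"
}
```

A possible call inside the activated environment: `missing_tools fastqc multiqc cutadapt trim_galore bowtie2 samtools bedtools picard bamCoverage macs3 featureCounts`. No output means everything was found.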

```bash
cd ~/ATAC_data
# The raw sequencing data (fq.gz files) go in 00.rawdata
ls 00.rawdata

# Create the working directories
mkdir -p {01.qc,02.trim_nextera,03.align,04.bam,05.bw,06.peak,07.counts,08.diffbind,09.annotation,10.ref,11.scripts,12.logs}

# Create the sample sheet
nano samples.tsv

# Enter the following (columns must be separated by tabs, not spaces):
sample	group	r1	r2
KO1	KO	00.rawdata/KO1_1.fq.gz	00.rawdata/KO1_2.fq.gz
KO2	KO	00.rawdata/KO2_1.fq.gz	00.rawdata/KO2_2.fq.gz
KO3	KO	00.rawdata/KO3_1.fq.gz	00.rawdata/KO3_2.fq.gz
WT1	WT	00.rawdata/WT1_1.fq.gz	00.rawdata/WT1_2.fq.gz
WT2	WT	00.rawdata/WT2_1.fq.gz	00.rawdata/WT2_2.fq.gz
WT3	WT	00.rawdata/WT3_1.fq.gz	00.rawdata/WT3_2.fq.gz

# Save with Ctrl+O, then Enter; exit with Ctrl+X

cat samples.tsv
```
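
The later steps read this sheet with `IFS=$'\t'`, so a row typed with spaces instead of tabs, or a mistyped path, only shows up as a failure much later. A quick check along the following lines may catch that early. This is a sketch: the function name `check_samples` is hypothetical, and it assumes the four-column layout shown above.

```bash
# Validate a sample sheet: every data row needs exactly 4 tab-separated
# fields, and the FASTQ paths in columns 3 and 4 must exist.
check_samples() {
    local sheet="$1" status=0 n=1 sample group r1 r2 extra fq
    {
        read -r _header                         # skip the header line
        while IFS=$'\t' read -r sample group r1 r2 extra || [ -n "$sample" ]; do
            n=$((n + 1))
            [ -z "$sample" ] && continue        # ignore blank lines
            if [ -z "$r2" ] || [ -n "$extra" ]; then
                echo "line $n: expected 4 tab-separated fields" >&2
                status=1
                continue
            fi
            for fq in "$r1" "$r2"; do
                if [ ! -f "$fq" ]; then
                    echo "line $n ($sample): missing file $fq" >&2
                    status=1
                fi
            done
        done
    } < "$sheet"
    return "$status"
}
```

Run from `~/ATAC_data`, `check_samples samples.tsv && echo "sample sheet OK"` prints the confirmation only when every row passes.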

## Data filtering and quality control

### FastQC

What each plot in the report means: <https://mp.weixin.qq.com/s/YxuKrUvqqlNa3ZFk-MO7mQ>

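```bash
# Run quality control on the raw FASTQ files
mkdir -p 01.qc/raw_fastqc

# FastQC: with all files in one call, -t 4 processes up to four files in parallel
fastqc -t 4 -o 01.qc/raw_fastqc 00.rawdata/*.fq.gz

# MultiQC: aggregate the FastQC reports into one summary
multiqc 01.qc/raw_fastqc -o 01.qc/raw_fastqc

# Read the HTML report. In this dataset the Adapter Content module shows Nextera adapters.
```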
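
Besides reading the HTML pages, the per-module verdicts can be listed on the command line. The sketch below assumes the FastQC reports have been unpacked (for example by running `fastqc --extract`, or by unzipping the `*_fastqc.zip` archives), so that each report directory contains a `summary.txt` with tab-separated status, module name and file name. The function name `fastqc_flagged` is hypothetical.

```bash
# List every FastQC module flagged WARN or FAIL under a directory of
# unpacked reports (<dir>/<name>_fastqc/summary.txt).
# Output columns: file name, status, module.
fastqc_flagged() {
    local dir="${1:-01.qc/raw_fastqc}" f
    for f in "$dir"/*_fastqc/summary.txt; do
        [ -f "$f" ] || continue             # no unpacked reports found
        awk -F '\t' '$1 != "PASS" { print $3 "\t" $1 "\t" $2 }' "$f"
    done
}
```

For example, `fastqc_flagged 01.qc/raw_fastqc | grep "Adapter Content"` would show which files were flagged for adapters.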

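Raw reads from a high-throughput sequencing run usually contain adapter sequences, low-quality bases and other sequencing biases. Feeding them straight into downstream analysis can lower the alignment rate and distort the results, so quality control (QC) and filtering are an essential step of the workflow. Several tools are available for this. Trim Galore is a common choice: it wraps Cutadapt for adapter removal and FastQC for quality assessment, and it is simple to run and reliable. fastp is another frequently used tool for filtering and QC.

### Trim Galore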

```bash
# Write the adapter-trimming script
mkdir -p 01.qc/trim_fastqc_nextera 02.trim_nextera 12.logs

nano 11.scripts/01.trim_galore_nextera.sh

# Paste the following into the file:

#!/usr/bin/env bash
# -e is left out on purpose so that one failed sample does not stop the loop
set -uo pipefail

cd ~/ATAC_data || exit 1

mkdir -p 01.qc/trim_fastqc_nextera 02.trim_nextera 12.logs

tail -n +2 samples.tsv | while IFS=$'\t' read -r sample group r1 r2; do
    echo "[$(date)] Trimming ${sample} ..."

    # --nextera trims the Nextera adapter; use --illumina or --small_rna
    # instead if FastQC reports a different adapter.
    # Do not put a comment after a trailing backslash: it breaks the line continuation.
    if trim_galore \
        --paired \
        --quality 20 \
        --length 20 \
        --stringency 3 \
        --nextera \
        --fastqc \
        --fastqc_args "-o 01.qc/trim_fastqc_nextera" \
        --dont_gzip \
        -o 02.trim_nextera \
        "$r1" "$r2" \
        > "12.logs/${sample}.nextera_trim.log" 2>&1
    then
        echo "[$(date)] Finished ${sample}"
    else
        echo "[$(date)] ERROR in ${sample}" | tee -a 12.logs/trim_nextera.failed.log
    fi
done

# Save with Ctrl+O, then Enter; exit with Ctrl+X

chmod +x 11.scripts/01.trim_galore_nextera.sh
bash 11.scripts/01.trim_galore_nextera.sh
```

{% hint style="info" %}
Pay attention to the adapter-trimming parameters here: <https://mp.weixin.qq.com/s/MKMLJpMMAoirDmY_QqDHCg>

Confirm whether the adapter in your data is the Illumina universal, Nextera, or small RNA adapter.
{% endhint %}
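
One way to double-check the adapter type outside of FastQC is to count how many reads contain each candidate adapter prefix. The sketch below uses the three sequences that Trim Galore trims for `--illumina` (`AGATCGGAAGAGC`), `--nextera` (`CTGTCTCTTATA`) and `--small_rna` (`TGGAATTCTCGG`). The function name `count_adapters` and the default of 100,000 reads are arbitrary choices for illustration.

```bash
# Count reads containing each candidate adapter among the first n reads
# of a gzipped FASTQ file (n defaults to 100000).
count_adapters() {
    local fq="$1" n="${2:-100000}"
    gzip -cd -- "$fq" | awk -v n="$n" '
        NR % 4 == 2 {                       # sequence line of each FASTQ record
            reads++
            if (index($0, "AGATCGGAAGAGC")) illumina++
            if (index($0, "CTGTCTCTTATA"))  nextera++
            if (index($0, "TGGAATTCTCGG"))  smallrna++
            if (reads >= n) exit            # stop early; END still runs
        }
        END {
            printf "reads\t%d\nillumina\t%d\nnextera\t%d\nsmall_rna\t%d\n",
                   reads, illumina, nextera, smallrna
        }'
}
```

For instance, `count_adapters 00.rawdata/KO1_1.fq.gz` prints one count per adapter type; the type with by far the largest count is the one to pass to Trim Galore.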

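```bash
# The script writes uncompressed output (--dont_gzip); compress it afterwards if you want to save space:
find 02.trim_nextera -name "*.fq" -print0 | xargs -0 -n 1 -P 4 gzip

multiqc 01.qc/trim_fastqc_nextera -o 01.qc/trim_fastqc_nextera


# Build a sample sheet that points to the trimmed files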
(
echo -e "sample\tgroup\tr1\tr2"
tail -n +2 samples.tsv | while IFS=$'\t' read -r sample group r1 r2; do
    r1_base=$(basename "$r1" .fq.gz)
    r2_base=$(basename "$r2" .fq.gz)
    echo -e "${sample}\t${group}\t02.trim_nextera/${r1_base}_val_1.fq.gz\t02.trim_nextera/${r2_base}_val_2.fq.gz"
done
) > samples.trim.tsv

cat samples.trim.tsv
```
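
Because trimming was run with `--paired`, the two trimmed files of each sample should contain the same number of reads. A quick consistency check could look like the sketch below; the function names `count_reads` and `check_pairs` are hypothetical, and the check assumes standard four-line FASTQ records.

```bash
# Count reads in a FASTQ file (gzipped or plain): number of lines / 4.
count_reads() {
    case "$1" in
        *.gz) gzip -cd -- "$1" ;;
        *)    cat -- "$1" ;;
    esac | awk 'END { printf "%d\n", NR / 4 }'
}

# Print "<sample> <R1 reads> <R2 reads>" (tab-separated) for every row of
# the sheet. Returns non-zero if any sample has unequal counts.
check_pairs() {
    local sheet="${1:-samples.trim.tsv}" status=0 sample group r1 r2 n1 n2
    {
        read -r _header                     # skip the header line
        while IFS=$'\t' read -r sample group r1 r2; do
            n1=$(count_reads "$r1")
            n2=$(count_reads "$r2")
            printf '%s\t%s\t%s\n' "$sample" "$n1" "$n2"
            [ "$n1" = "$n2" ] || status=1
        done
    } < "$sheet"
    return "$status"
}
```

Calling `check_pairs samples.trim.tsv` from `~/ATAC_data` lists the counts per sample; a mismatch usually points to an interrupted trimming or compression step.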

## Alignment

### Getting the index

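Download the prebuilt index for your species from <https://bowtie-bio.sourceforge.net/bowtie2/manual.shtml>, extract it, and place it in 10.ref/bowtie2\_index/.

### Bowtie2 alignment

```bash
# -x takes the index prefix. GRCh38_noalt_as is the human index;
# replace it with the index that matches your species.
bowtie2 \
    --very-sensitive \
    -X 2000 \
    --no-mixed \
    --no-discordant \
    -p 24 \
    -x 10.ref/bowtie2_index/GRCh38_noalt_as \
    -1 02.trim_nextera/KO1_1_val_1.fq.gz \
    -2 02.trim_nextera/KO1_2_val_2.fq.gz \
    -S 03.align/KO1.sam \
    2> 12.logs/KO1.bowtie2.log

# Inspect the alignment summary in the log
cat 12.logs/KO1.bowtie2.log
```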
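
Once several samples have been aligned, the key number in each log is the final line of the Bowtie2 summary, which has the form `NN.NN% overall alignment rate`. A small helper can collect it across logs. This is a sketch; the function name `alignment_rates` is hypothetical and it assumes the `<sample>.bowtie2.log` naming used above.

```bash
# Print "<sample> <overall alignment rate>" (tab-separated) for each Bowtie2 log.
alignment_rates() {
    local log
    for log in "$@"; do
        awk -v name="$(basename "$log" .bowtie2.log)" \
            '/overall alignment rate/ { print name "\t" $1 }' "$log"
    done
}
```

Example call: `alignment_rates 12.logs/*.bowtie2.log`.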

{% hint style="info" %}
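`#-x: prefix of the Bowtie2 index files (built with bowtie2-build or downloaded prebuilt)`

`#--local: local alignment mode, which may soft-clip read ends; not used in the command above, which runs in the default end-to-end mode`

`#--very-sensitive: same as -D 20 -R 3 -N 0 -L 20 -i S,1,0.50`

`#-X: maximum fragment length for valid paired-end alignments`

`#--no-mixed: do not align mates individually when the pair cannot be aligned together`

`#--no-discordant: do not report discordant pair alignments`

`#-p: number of threads`

`#-1: file with read 1 of the pairs`

`#-2: file with read 2 of the pairs`

`#-S: path of the output SAM file`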
{% endhint %}
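
To align all samples rather than just KO1, the same flags can be wrapped in a loop over the trimmed sample sheet. The following is a sketch under a few assumptions: the function name `align_all` and the default of 8 threads are arbitrary, the sheet has the four-column layout of `samples.trim.tsv`, and the index prefix is passed in by the caller instead of being hard-coded.

```bash
# Align every sample listed in a sample sheet (sample, group, r1, r2).
# Arguments: sheet path, Bowtie2 index prefix (as given to -x), threads.
align_all() {
    local sheet="$1" index_prefix="$2" threads="${3:-8}"
    local sample group r1 r2
    mkdir -p 03.align 12.logs
    {
        read -r _header                     # skip the header line
        while IFS=$'\t' read -r sample group r1 r2; do
            echo "[$(date)] Aligning ${sample} ..."
            # stdin is redirected so bowtie2 cannot consume the sample sheet
            bowtie2 --very-sensitive -X 2000 --no-mixed --no-discordant \
                -p "$threads" -x "$index_prefix" \
                -1 "$r1" -2 "$r2" \
                -S "03.align/${sample}.sam" \
                2> "12.logs/${sample}.bowtie2.log" < /dev/null \
                || echo "[$(date)] ERROR in ${sample}" >&2
        done
    } < "$sheet"
}
```

A possible call from `~/ATAC_data`: `align_all samples.trim.tsv <index prefix> 24`, where the index prefix is the same value you would pass to `-x`.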