5. Sample sheet

A CSV file describing the samples, their conditions, and their replicate numbers is required when creating a PM4NGS project.

PM4NGS will copy the sample sheet file to the folder data/{{dataset_name}} with the standard name sample_table.csv.

5.1. Single-end example

This is an example sample sheet for single-end sequencing data:

sample_name	file	condition	replicate
SRR4011416	/net/rawdata/SRR4011416.fastq.gz	Exp_O2_growth_no_rifampicin	1
SRR4011417	/net/rawdata/SRR4011417.fastq.gz	Exp_O2_growth_no_rifampicin	2
SRR4011421	/net/rawdata/SRR4011421.fastq.gz	Exp_O2_growth_rifampicin	1
SRR4011425	/net/rawdata/SRR4011425.fastq.gz	Exp_O2_growth_rifampicin	2
SRR4011418	/net/rawdata/SRR4011418.fastq.gz	Stat_02_growth_no_rifampicin	1
SRR4011419	/net/rawdata/SRR4011419.fastq.gz	Stat_02_growth_no_rifampicin	2

Table source: sample_sheet_single_end.csv
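A sheet like the one above can be checked before project creation by grouping its rows by condition with the Python standard library. This is only a sketch: the function name `group_by_condition` is hypothetical and not part of PM4NGS, and the delimiter is passed explicitly because it is an assumption (the table above is shown with tab-separated columns; pass `,` if your file is comma-separated).

```python
import csv
import io
from collections import defaultdict


def group_by_condition(handle, delimiter="\t"):
    """Return {condition: [sample_name, ...]} for a sample sheet.

    Hypothetical helper; the delimiter is an assumption, adjust it to your file.
    """
    groups = defaultdict(list)
    for row in csv.DictReader(handle, delimiter=delimiter):
        groups[row["condition"]].append(row["sample_name"])
    return dict(groups)


# Three rows taken from the single-end example above.
sheet = (
    "sample_name\tfile\tcondition\treplicate\n"
    "SRR4011416\t/net/rawdata/SRR4011416.fastq.gz\tExp_O2_growth_no_rifampicin\t1\n"
    "SRR4011417\t/net/rawdata/SRR4011417.fastq.gz\tExp_O2_growth_no_rifampicin\t2\n"
    "SRR4011421\t/net/rawdata/SRR4011421.fastq.gz\tExp_O2_growth_rifampicin\t1\n"
)
groups = group_by_condition(io.StringIO(sheet))
for condition, samples in groups.items():
    print(condition, samples)
```

Printing the groups makes it easy to spot a misspelled condition, which would otherwise show up as an extra group with a single sample.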

5.2. Paired-end example

For paired-end sequencing data, use the pipe character | to separate the forward and reverse fastq files:

sample_name	file	condition	replicate
SRR2126784	http://myserver.net/SRR2126784_1.fastq.gz|http://myserver.net/SRR2126784_2.fastq.gz	PRE_NACT	1
SRR2126785	http://myserver.net/SRR2126785_1.fastq.gz|http://myserver.net/SRR2126785_2.fastq.gz	PRE_NACT	1
SRR2126786	http://myserver.net/SRR2126786_1.fastq.gz|http://myserver.net/SRR2126786_2.fastq.gz	PRE_NACT	1
SRR2126787	http://myserver.net/SRR2126787_1.fastq.gz|http://myserver.net/SRR2126787_2.fastq.gz	PRE_NACT	1
SRR3383790	http://myserver.net/SRR3383790_1.fastq.gz|http://myserver.net/SRR3383790_2.fastq.gz	PRE_NACT	1

Source: sample_sheet_paired_end.csv

If the --copy-rawdata option is used, PM4NGS will copy or download the raw fastq files to the data/{{dataset_name}}/ directory during project creation.
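The pipe-separated file field and the copy-or-download distinction can be illustrated with a short Python sketch. The helpers `split_file_field` and `is_url` are hypothetical names used for illustration only; they are not PM4NGS functions, and the set of URL schemes treated as remote is an assumption.

```python
from urllib.parse import urlparse


def split_file_field(file_field):
    """Split the file column into one (single-end) or two (paired-end) entries."""
    files = [part.strip() for part in file_field.split("|")]
    if len(files) not in (1, 2) or not all(files):
        raise ValueError(f"Expected one or two fastq files, got: {file_field!r}")
    return files


def is_url(location):
    """True when the entry looks like a remote URL rather than a local path.

    Assumption: only http, https and ftp are treated as remote.
    """
    return urlparse(location).scheme in ("http", "https", "ftp")


# File field from the first row of the paired-end example above.
field = "http://myserver.net/SRR2126784_1.fastq.gz|http://myserver.net/SRR2126784_2.fastq.gz"
forward, reverse = split_file_field(field)
print(forward, "download" if is_url(forward) else "copy")
print(reverse, "download" if is_url(reverse) else "copy")
```

A single-end entry such as /net/rawdata/SRR4011416.fastq.gz passes through the same function and comes back as a one-element list.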

5.3. Processing data from the NCBI SRA

For data in the NCBI SRA database, the file column should be empty. PM4NGS will download the files during the pre-processing quality control step.

sample_name	condition	replicate
SRR7549105	ES	1
SRR7549106	ES	2
SRR7549109	MEF	1
SRR7549110	MEF	2
SRR7549114	rB	1
SRR7549113	rB	2

Source: sample_sheet_sra.csv
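To see which samples in a sheet would rely on the SRA download, the rows with no file value can be listed. The sketch below is illustrative only: `sra_rows` is a hypothetical helper, the delimiter is an assumption, and a row is treated as an SRA sample both when its file value is blank and when the file column is absent from the header.

```python
import csv
import io


def sra_rows(handle, delimiter="\t"):
    """Return the sample names whose file value is missing or blank.

    Hypothetical helper; these are the samples expected to come from the SRA.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    return [
        row["sample_name"]
        for row in reader
        if not (row.get("file") or "").strip()
    ]


# Two samples from the SRA example above, with the file column left empty.
sheet = (
    "sample_name\tfile\tcondition\treplicate\n"
    "SRR7549105\t\tES\t1\n"
    "SRR7549109\t\tMEF\t1\n"
)
accessions = sra_rows(io.StringIO(sheet))
print(accessions)
```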

5.4. Sample sheet column names and description

Note

Column names are required and are case-sensitive.

Columns

sample_name: Sample name. It can differ from the sample file name.
file: This is the absolute path or URL to the raw fastq file.

For paired-end data, the two files should be separated with the Unix pipe character |, for example SRR4053795_1.fastq.gz|SRR4053795_2.fastq.gz. Both files must exist.

The data files will be copied to the folder data/{{dataset_name}}/.
condition: Conditions to group the samples. Use only alphanumeric characters.

For RNASeq projects, differential gene expression will be computed by comparing these conditions. If there are more than two conditions, all comparisons will be generated. There must be at least two conditions.

For ChIPSeq projects, differential binding events will be detected by comparing these conditions. If there are more than two conditions, all comparisons will be generated. There must be at least two conditions.

For ChIPexo projects the samples of the same condition will be grouped for the peak calling with MACE.
replicate: Replicate number for samples.
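The column rules above can be turned into a small pre-flight check. The following Python sketch is not PM4NGS code: `validate_sheet` and `pairwise_comparisons` are hypothetical names, the delimiter is an assumption, and the condition pattern accepts underscores as well as letters and digits because the example sheets in this section use them. The order in which PM4NGS itself generates comparisons is not specified here; the sketch simply enumerates every unordered pair.

```python
import csv
import io
import re
from itertools import combinations

REQUIRED_COLUMNS = ("sample_name", "file", "condition", "replicate")
# Assumption: underscores are accepted in addition to letters and digits,
# since the example sheets in this section use them in condition names.
CONDITION_PATTERN = re.compile(r"[A-Za-z0-9_]+")


def validate_sheet(handle, delimiter="\t", min_conditions=2):
    """Return (errors, conditions) for a sample sheet (hypothetical helper)."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    header = reader.fieldnames or []
    # Case-sensitive check: "Condition" does not satisfy "condition".
    errors = [f"missing column: {name}" for name in REQUIRED_COLUMNS if name not in header]
    if errors:
        return errors, []
    conditions = []
    for line_number, row in enumerate(reader, start=2):
        condition = row["condition"] or ""
        if not CONDITION_PATTERN.fullmatch(condition):
            errors.append(f"line {line_number}: invalid condition {condition!r}")
        elif condition not in conditions:
            conditions.append(condition)
        if not (row["replicate"] or "").isdigit():
            errors.append(f"line {line_number}: replicate is not a number")
    if len(conditions) < min_conditions:
        errors.append(f"found {len(conditions)} condition(s), need at least {min_conditions}")
    return errors, conditions


def pairwise_comparisons(conditions):
    """All unordered pairs of conditions, one pair per comparison."""
    return list(combinations(conditions, 2))


# Three conditions taken from the SRA example in this section.
sheet = (
    "sample_name\tfile\tcondition\treplicate\n"
    "SRR7549105\t\tES\t1\n"
    "SRR7549109\t\tMEF\t1\n"
    "SRR7549114\t\trB\t1\n"
)
errors, conditions = validate_sheet(io.StringIO(sheet))
comparisons = pairwise_comparisons(conditions)
print(errors)
print(comparisons)
```

With three conditions the sketch yields three comparisons; in general, n conditions give n(n-1)/2 pairs, which is worth keeping in mind before adding many conditions to a single sheet.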