5. Sample sheet
A CSV file describing the samples, their conditions, and their replicate numbers is required during PM4NGS project creation.
PM4NGS will copy the sample sheet file to the folder data/{{dataset_name}} with the standard name sample_table.csv.
5.1. Single-end example
This is an example sample sheet for single-end sequencing data:
sample_name | file | condition | replicate |
---|---|---|---|
SRR4011416 | /net/rawdata/SRR4011416.fastq.gz | Exp_O2_growth_no_rifampicin | 1 |
SRR4011417 | /net/rawdata/SRR4011417.fastq.gz | Exp_O2_growth_no_rifampicin | 2 |
SRR4011421 | /net/rawdata/SRR4011421.fastq.gz | Exp_O2_growth_rifampicin | 1 |
SRR4011425 | /net/rawdata/SRR4011425.fastq.gz | Exp_O2_growth_rifampicin | 2 |
SRR4011418 | /net/rawdata/SRR4011418.fastq.gz | Stat_02_growth_no_rifampicin | 1 |
SRR4011419 | /net/rawdata/SRR4011419.fastq.gz | Stat_02_growth_no_rifampicin | 2 |
Table source: sample_sheet_single_end.csv
5.2. Paired-end example
For paired-end sequencing data, use the pipe character | to separate the forward and reverse fastq files:
sample_name | file | condition | replicate |
---|---|---|---|
SRR2126784 | http://myserver.net/SRR2126784_1.fastq.gz|http://myserver.net/SRR2126784_2.fastq.gz | PRE_NACT | 1 |
SRR2126785 | http://myserver.net/SRR2126785_1.fastq.gz|http://myserver.net/SRR2126785_2.fastq.gz | PRE_NACT | 1 |
SRR2126786 | http://myserver.net/SRR2126786_1.fastq.gz|http://myserver.net/SRR2126786_2.fastq.gz | PRE_NACT | 1 |
SRR2126787 | http://myserver.net/SRR2126787_1.fastq.gz|http://myserver.net/SRR2126787_2.fastq.gz | PRE_NACT | 1 |
SRR3383790 | http://myserver.net/SRR3383790_1.fastq.gz|http://myserver.net/SRR3383790_2.fastq.gz | PRE_NACT | 1 |
Source: sample_sheet_paired_end.csv
PM4NGS will copy or download the raw fastq files to the data/{{dataset_name}}/ directory during project creation if the --copy-rawdata option is used.
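As an illustration of the pipe convention described above, here is a minimal Python sketch that splits one file cell into its fastq entries. The helper name split_file_field is hypothetical and is not part of PM4NGS; it only shows how a single-end cell yields one entry and a paired-end cell yields two.

```python
def split_file_field(value):
    """Return the fastq paths or URLs stored in one `file` cell.

    Hypothetical helper for illustration; not a PM4NGS function.
    """
    # Split on the pipe and drop empty pieces, so an empty cell gives [].
    parts = [p.strip() for p in value.split("|") if p.strip()]
    if len(parts) > 2:
        raise ValueError(f"expected at most two files, got {len(parts)}")
    return parts


single = split_file_field("/net/rawdata/SRR4011416.fastq.gz")
paired = split_file_field(
    "http://myserver.net/SRR2126784_1.fastq.gz"
    "|http://myserver.net/SRR2126784_2.fastq.gz"
)
print(len(single), len(paired))  # prints: 1 2
```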
5.3. Processing data from the NCBI SRA
For data in the NCBI SRA database, the file column should be empty. PM4NGS will download the files during the pre-processing quality control step.
sample_name | file | condition | replicate |
---|---|---|---|
SRR7549105 | | ES | 1 |
SRR7549106 | | ES | 2 |
SRR7549109 | | MEF | 1 |
SRR7549110 | | MEF | 2 |
SRR7549114 | | rB | 1 |
SRR7549113 | | rB | 2 |
Source: sample_sheet_sra.csv
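To make the empty file column concrete, the following sketch builds the raw text of an SRA-style sample sheet with Python's standard csv module. It assumes the sheet is comma-separated, as the term CSV suggests; the function name sra_sample_sheet is hypothetical and not part of PM4NGS.

```python
import csv
import io

COLUMNS = ["sample_name", "file", "condition", "replicate"]


def sra_sample_sheet(samples):
    """Build CSV text from (run accession, condition, replicate) tuples.

    Hypothetical helper; assumes a comma-separated sample sheet.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for accession, condition, replicate in samples:
        writer.writerow({
            "sample_name": accession,
            "file": "",  # left empty for SRA runs
            "condition": condition,
            "replicate": replicate,
        })
    return buffer.getvalue()


text = sra_sample_sheet([("SRR7549105", "ES", 1), ("SRR7549106", "ES", 2)])
print(text)
```

In the printed text, each data row has two consecutive commas after the accession, which is how the empty file column appears in the raw file.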
5.4. Sample sheet column names and description
Note
Column names are required and are case-sensitive.
Columns
sample_name: Sample name. It can differ from the sample file name.
file: This is the absolute path or URL to the raw fastq file.
For paired-end data, the two files should be separated using the Unix pipe |, as in SRR4053795_1.fastq.gz|SRR4053795_2.fastq.gz. Both files must exist.
The data files will be copied to the folder data/{{dataset_name}}/.
condition: Conditions used to group the samples. Use only alphanumeric characters.
For RNASeq projects, differential gene expression will be computed by comparing these conditions. If there are multiple conditions, all comparisons will be generated. There must be at least two conditions.
For ChIPSeq projects, differential binding events will be detected by comparing these conditions. If there are multiple conditions, all comparisons will be generated. There must be at least two conditions.
For ChIPexo projects the samples of the same condition will be grouped for the peak calling with MACE.
replicate: Replicate number for samples.
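The column rules above can be checked before creating a project. The sketch below is one possible pre-flight check, not PM4NGS's own validation: the function name check_sample_sheet is hypothetical, the sheet is assumed to be comma-separated, underscores are accepted in condition names because the examples in this section use them, and replicate values are assumed to be whole numbers.

```python
import csv
import io
import re

REQUIRED_COLUMNS = ["sample_name", "file", "condition", "replicate"]
# Assumption: underscores are allowed, since the example sheets use them.
CONDITION_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def check_sample_sheet(text, min_conditions=2):
    """Return a list of problems found in comma-separated sample sheet text.

    Hypothetical helper for illustration; not part of PM4NGS.
    """
    reader = csv.DictReader(io.StringIO(text))
    header = reader.fieldnames or []
    # Column names are compared exactly, so the match is case-sensitive.
    missing = [c for c in REQUIRED_COLUMNS if c not in header]
    if missing:
        return [f"missing column(s): {', '.join(missing)}"]

    problems = []
    conditions = set()
    for line_no, row in enumerate(reader, start=2):
        condition = (row["condition"] or "").strip()
        replicate = (row["replicate"] or "").strip()
        if not CONDITION_PATTERN.match(condition):
            problems.append(f"line {line_no}: invalid condition {condition!r}")
        else:
            conditions.add(condition)
        if not replicate.isdigit():
            problems.append(f"line {line_no}: invalid replicate {replicate!r}")

    # RNASeq and ChIPSeq projects need at least two conditions to compare.
    if len(conditions) < min_conditions:
        problems.append(f"at least {min_conditions} conditions are required")
    return problems


sheet = (
    "sample_name,file,condition,replicate\n"
    "SRR7549105,,ES,1\n"
    "SRR7549109,,MEF,1\n"
)
print(check_sample_sheet(sheet))  # prints: []
```

The min_conditions parameter is kept adjustable because the two-condition requirement is stated only for RNASeq and ChIPSeq projects.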