5. Sample sheet

A CSV file describing the samples, their conditions, and their replicate numbers is required when creating a PM4NGS project.

PM4NGS will copy the sample sheet file to the folder data/{{dataset_name}} with the standard name sample_table.csv.

5.1. Single-end example

This is an example sample sheet for single-end sequencing data:

sample_name  file                              condition                     replicate
SRR4011416   /net/rawdata/SRR4011416.fastq.gz  Exp_O2_growth_no_rifampicin   1
SRR4011417   /net/rawdata/SRR4011417.fastq.gz  Exp_O2_growth_no_rifampicin   2
SRR4011421   /net/rawdata/SRR4011421.fastq.gz  Exp_O2_growth_rifampicin      1
SRR4011425   /net/rawdata/SRR4011425.fastq.gz  Exp_O2_growth_rifampicin      2
SRR4011418   /net/rawdata/SRR4011418.fastq.gz  Stat_02_growth_no_rifampicin  1
SRR4011419   /net/rawdata/SRR4011419.fastq.gz  Stat_02_growth_no_rifampicin  2

Table source: sample_sheet_single_end.csv
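As an illustration, a sheet like the one above could be generated with Python's standard csv module. This is only a sketch: the helper name build_sample_sheet is hypothetical, and the comma delimiter is an assumption based on the file being described as CSV, so it is exposed as a parameter.

```python
import csv
import io

# Column names as listed in this section; they are case-sensitive.
COLUMNS = ["sample_name", "file", "condition", "replicate"]


def build_sample_sheet(rows, delimiter=","):
    """Return sample sheet text for a list of dicts keyed by COLUMNS.

    The delimiter is an assumption (comma); change it if your sheets differ.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=COLUMNS, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    {"sample_name": "SRR4011416", "file": "/net/rawdata/SRR4011416.fastq.gz",
     "condition": "Exp_O2_growth_no_rifampicin", "replicate": 1},
    {"sample_name": "SRR4011421", "file": "/net/rawdata/SRR4011421.fastq.gz",
     "condition": "Exp_O2_growth_rifampicin", "replicate": 1},
]

sheet = build_sample_sheet(rows)
print(sheet)
```

Writing the text to a file of your choice and passing that file to PM4NGS at project creation would be the next step.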

5.2. Paired-end example

For paired-end sequencing data, the pipe character | should be used to separate the forward and reverse fastq files:

sample_name  file                                                                                 condition  replicate
SRR2126784   http://myserver.net/SRR2126784_1.fastq.gz|http://myserver.net/SRR2126784_2.fastq.gz  PRE_NACT   1
SRR2126785   http://myserver.net/SRR2126785_1.fastq.gz|http://myserver.net/SRR2126785_2.fastq.gz  PRE_NACT   1
SRR2126786   http://myserver.net/SRR2126786_1.fastq.gz|http://myserver.net/SRR2126786_2.fastq.gz  PRE_NACT   1
SRR2126787   http://myserver.net/SRR2126787_1.fastq.gz|http://myserver.net/SRR2126787_2.fastq.gz  PRE_NACT   1
SRR3383790   http://myserver.net/SRR3383790_1.fastq.gz|http://myserver.net/SRR3383790_2.fastq.gz  PRE_NACT   1

Source: sample_sheet_paired_end.csv

PM4NGS will copy or download the raw fastq files to the data/{{dataset_name}}/ directory during project creation if the --copy-rawdata option is used.
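The pipe convention for the file column can be sketched in Python as follows. The function split_read_files is a hypothetical helper written for illustration, not part of PM4NGS; it only shows how a single field maps to one or two fastq files.

```python
def split_read_files(file_field):
    """Split a `file` column value into its fastq entries.

    Returns one entry for single-end data and two (forward, reverse)
    for paired-end data. Hypothetical helper, for illustration only.
    """
    files = [part.strip() for part in file_field.split("|") if part.strip()]
    if len(files) > 2:
        raise ValueError(f"expected at most two files, got {len(files)}")
    return files


paired = split_read_files(
    "http://myserver.net/SRR2126784_1.fastq.gz|"
    "http://myserver.net/SRR2126784_2.fastq.gz"
)
single = split_read_files("/net/rawdata/SRR4011416.fastq.gz")

print(len(paired), len(single))
```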

5.3. Processing data from the NCBI SRA

For data in the NCBI SRA database, the file column should be empty. PM4NGS will download the files during the pre-processing quality control step.

sample_name  file  condition  replicate
SRR7549105         ES         1
SRR7549106         ES         2
SRR7549109         MEF        1
SRR7549110         MEF        2
SRR7549114         rB         1
SRR7549113         rB         2

Source: sample_sheet_sra.csv
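To make the empty file column concrete, the sketch below reads a sheet and lists the samples whose file field is blank. It assumes a comma-delimited file and, as in the example above, that sample_name holds the SRA run accession; sra_accessions is a hypothetical helper, not a PM4NGS function.

```python
import csv
import io


def sra_accessions(sheet_text, delimiter=","):
    """Return sample names whose `file` column is empty.

    Assumption: for such rows, sample_name is the SRA run accession.
    """
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter=delimiter)
    return [
        row["sample_name"]
        for row in reader
        if not (row["file"] or "").strip()
    ]


# Comma-delimited version of the first rows of the table above.
sheet_text = (
    "sample_name,file,condition,replicate\n"
    "SRR7549105,,ES,1\n"
    "SRR7549106,,ES,2\n"
    "SRR7549109,,MEF,1\n"
)

print(sra_accessions(sheet_text))
```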

5.4. Sample sheet column names and description

Note

Column names are required and are case-sensitive.

Columns

  • sample_name: Sample names. These can differ from the sample file names.

  • file: This is the absolute path or URL to the raw fastq file.

    For paired-end data, the two files should be separated with the Unix pipe character |, as in SRR4053795_1.fastq.gz|SRR4053795_2.fastq.gz. Both files must exist.

    The data files will be copied to the folder data/{{dataset_name}}/.

  • condition: Conditions to group the samples. Use only alphanumeric characters.

    For RNASeq projects, differential gene expression will be computed by comparing these conditions. If there are more than two conditions, all comparisons will be generated. There must be at least two conditions.

    For ChIPSeq projects, differential binding events will be detected by comparing these conditions. If there are more than two conditions, all comparisons will be generated. There must be at least two conditions.

    For ChIPexo projects the samples of the same condition will be grouped for the peak calling with MACE.

  • replicate: Replicate number for samples.
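The column rules above can be checked before creating a project. The following is a minimal sketch, not PM4NGS's own validation: validate_sample_sheet is a hypothetical helper, the comma delimiter is assumed, underscores are accepted in condition names because the examples in this section use them, and "all comparisons" is interpreted as every unordered pair of conditions.

```python
import csv
import io
import re
from itertools import combinations

REQUIRED_COLUMNS = ["sample_name", "file", "condition", "replicate"]
# Assumption: letters, digits and underscores are acceptable in conditions.
CONDITION_PATTERN = re.compile(r"^[A-Za-z0-9_]+$")


def validate_sample_sheet(sheet_text, delimiter=","):
    """Check header and conditions; return the pairwise comparisons."""
    reader = csv.DictReader(io.StringIO(sheet_text), delimiter=delimiter)
    header = reader.fieldnames or []
    # Exact, case-sensitive match on the column names.
    missing = [name for name in REQUIRED_COLUMNS if name not in header]
    if missing:
        raise ValueError(f"missing columns: {missing}")

    conditions = []
    for row in reader:
        condition = row["condition"]
        if not CONDITION_PATTERN.match(condition):
            raise ValueError(f"invalid condition name: {condition!r}")
        if condition not in conditions:
            conditions.append(condition)

    # Required for RNASeq and ChIPSeq projects according to this section.
    if len(conditions) < 2:
        raise ValueError("at least two conditions are required")
    return list(combinations(conditions, 2))


sheet_text = (
    "sample_name,file,condition,replicate\n"
    "SRR7549105,,ES,1\n"
    "SRR7549109,,MEF,1\n"
    "SRR7549114,,rB,1\n"
)

comparisons = validate_sample_sheet(sheet_text)
print(comparisons)
```

With three conditions the sketch yields three pairs; a header such as Sample_Name would be rejected because the match is case-sensitive.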