Increase number of reads during subset step

assigned to @jsabban

added Doing label

closed

reopened

number of reads	read length (bases)	size in bytes	genome size (bases)	coverage
50 000 000	150	37 500 000 000	3 000 000 000	5
5 000 000	150	3 750 000 000	3 000 000 000	1
500 000	150	375 000 000	5 000 000	30

The size in bytes of the FASTQ file will be calculated from the number of reads in the subset and the read length.
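The "size in bytes" column of the table above can be checked with a minimal Groovy sketch. The function name `estimateFastqBytes` and its `bytesPerBase` parameter are hypothetical, introduced only for illustration; the factor of 5 is the multiplier used in this issue's own snippet, which does not state what it accounts for.

```groovy
// Hypothetical helper: estimated FASTQ size for a subset of reads.
// bytesPerBase = 5 mirrors the "* 5" multiplier used in this issue (assumption:
// it is a rough per-base size estimate; the issue does not explain it).
BigInteger estimateFastqBytes(long nReads, int readLength, int bytesPerBase = 5) {
    // BigInteger avoids integer overflow for large read counts
    return BigInteger.valueOf(nReads) * readLength * bytesPerBase
}

// The three rows of the table: reads, read length -> size in bytes
assert estimateFastqBytes(50000000L, 150) == 37500000000G
assert estimateFastqBytes(5000000L, 150) == 3750000000G
assert estimateFastqBytes(500000L, 150) == 375000000G
```

Using `BigInteger` matches the `toBigInteger()` call in the issue's code, since 50 000 000 × 150 × 5 does not fit in a 32-bit integer.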

Some lines to calculate the number of reads needed, according to read length and file weight:

// Pick the subset read count for the sequencer, then derive the file size threshold.
if ( params.sequencer =~ /NovaSeq.*/ ) {
    // Runs with many samples use the dedicated subset size
    if ( params.n_samples >= params.large_sampling_threshold ) {
        params.nova_subset_seq = params.large_indexing_nova_subset_seq
    }
    // Threshold in bytes = number of reads * read length * 5
    params.bytes_subset_seq = params.nova_subset_seq.toBigInteger() * params.read_length * 5
    params.subset_seq = params.nova_subset_seq
} else {
    params.bytes_subset_seq = params.miseq_subset_seq.toBigInteger() * params.read_length * 5
    params.subset_seq = params.miseq_subset_seq
}
System.out.println "File size threshold for subset: " + params.bytes_subset_seq + " bytes."
System.out.println "Number of reads for subset: " + params.subset_seq + "."

This conflicts with the SEQTK_SAMPLE process configuration:

withName: SEQTK_SAMPLE {
    ext.args2 = params.subset_seq
}

params.subset_seq is not defined

New idea: do not try to split FASTQ files according to their size, but run the seqtk_sample process anyway. If a FASTQ file contains fewer reads than the read_number threshold, it will be output unchanged.
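The behaviour this idea relies on can be illustrated with a small Groovy sketch of reservoir sampling. This is an illustration only, not seqtk's actual code: the function name `sampleReads` is hypothetical, and reads are modelled as plain strings rather than FASTQ records.

```groovy
// Hypothetical illustration of reservoir sampling (Algorithm R).
// Assumption: reads are plain strings; real FASTQ records span four lines.
List<String> sampleReads(Iterable<String> reads, int n, long seed) {
    def rng = new Random(seed)
    List<String> reservoir = []
    int seen = 0
    for (String read : reads) {
        if (seen < n) {
            // Fill the reservoir with the first n reads
            reservoir << read
        } else {
            // Replace a random slot with probability n / (seen + 1)
            int j = rng.nextInt(seen + 1)
            if (j < n) {
                reservoir[j] = read
            }
        }
        seen++
    }
    return reservoir
}

// Fewer reads than the threshold: every read is kept, in the original order
def small = ['r1', 'r2', 'r3']
assert sampleReads(small, 5, 100L) == small

// More reads than the threshold: exactly n reads are kept
def large = (1..1000).collect { "r${it}".toString() }
assert sampleReads(large, 5, 100L).size() == 5
```

With this scheme no size check is needed before sampling: an input below the threshold simply never reaches the replacement branch.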

mentioned in commit 40661eb5

removed Doing label

closed
