Remultiplexing

What if you have a set of FASTQ files where each FASTQ file (or FASTQ file pair) is a single cell? This can be difficult to feed into programs that expect barcodes to be associated with each cell rather than having each cell be in a separate file.

Therefore, splitcode has a remultiplexing option where you can simply supply a batch file containing all the individual FASTQ files and splitcode will assign “fake” barcodes for each file supplied.

To do this, first create a batch.txt file as follows:

cell1  a_R1.fastq.gz a_R2.fastq.gz
cell1  b_R1.fastq.gz b_R2.fastq.gz
cell2  c_R1.fastq.gz c_R2.fastq.gz
cell3  d_R1.fastq.gz d_R2.faastq.gz

Now simply run splitcode with the --remultiplex option as follows:

splitcode --remultiplex -o out_R1.fastq.gz,out_R2.fastq.gz --outb=barcodes.fastq.gz batch.txt

The output files out_R1.fastq.gz and out_R2.fastq.gz will contain all the input FASTQ files from batch.txt concatenated together. The barcodes.fastq.gz file will contain unique barcodes corresponding to each cell, as specified in the batch.txt file. Note that cell1 appears twice in the batch.txt file; therefore both pairs of files (beginning with a_ and b_) will have their reads associated with the same barcode. The barcodes will be 16-bp in length and will look like AAAAAAAAAAAAAAAA (for cell1), AAAAAAAAAAAAAAAC (for cell2), and AAAAAAAAAAAAAAAG (for cell3).

There are multiple ways to output the barcodes that are made from remultiplexing:

  • You can supply the --outb option to have the barcodes supplied in a separate file (as was done above).

  • You can supply the --bc-names option to put the barcode after the BC:Z: SAM tag in the read header.

  • You can supply the --com-names option to put the unique numerical ID after the BI:i: SAM tag in the read header.

  • You can supply the --pipe option so that the barcode is interleaved with the reads.

  • You can omit the --outb option which will cause, by default, the barcode to appear at the very beginning of each read in the first (R1) read file.

  • You can use the --bclen option to change the barcodes to be of lengths other than 16-bp.