Getting started
Installation
To use splitcode, you can install it via conda:
conda install -c bioconda splitcode
Or, to install it from source:
git clone https://github.com/pachterlab/splitcode
cd splitcode
mkdir build
cd build
cmake ..
make
make install
Alternatively, you can download the binaries for Mac and Linux here: https://github.com/pachterlab/splitcode/releases
Note
make install will not work unless you have permission to write to the system folders. In this case, after running the make step in the build directory, you can find the splitcode binary at src/splitcode and use it directly.
Graphical User Interface (GUI)
To use splitcode’s GUI, please visit https://pachterlab.github.io/splitcode/
Note
This GUI simply serves as a sandbox to try out and test certain features.
Command-line structure
The command-line structure for running splitcode is as follows:
splitcode [arguments] fastq-files
A list of options can be viewed by running splitcode -h.
The arguments you supply tell splitcode what to do with your FASTQ files. Most often, you will supply a config file that specifies how your reads should be processed, along with an output option.
Overview
Barcodes
A permutation of tags identified within a read forms a unique barcode. This generated barcode can thus be used to demultiplex reads based on the identified tags. The barcode is 16 base pairs in length. Supplying --mapping=mapfile.txt will output a file named mapfile.txt that maps each generated barcode to its tags (and their order).
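To make the idea of a barcode per tag permutation concrete, here is a minimal Python sketch. It is not splitcode's implementation: the sequential numbering of permutations, the A/C/G/T base-4 encoding, and the function names id_to_barcode and assign_barcodes are all assumptions made for illustration. The only details taken from the text above are that the barcode is 16 base pairs long and that tag order matters.

```python
def id_to_barcode(index, length=16, alphabet="ACGT"):
    """Encode a non-negative integer as a fixed-length nucleotide string.

    Assumption: base-4 encoding with A=0, C=1, G=2, T=3 (illustrative only).
    """
    bases = []
    for _ in range(length):
        index, remainder = divmod(index, 4)
        bases.append(alphabet[remainder])
    # Digits were produced least-significant first, so reverse them.
    return "".join(reversed(bases))


def assign_barcodes(tag_permutations):
    """Give each distinct ordered tuple of tag IDs its own 16-bp barcode.

    Assumption: barcodes are handed out sequentially in order of first
    appearance. Tag order matters, so (id1, id2) and (id2, id1) differ.
    """
    mapping = {}
    for permutation in tag_permutations:
        key = tuple(permutation)
        if key not in mapping:
            mapping[key] = id_to_barcode(len(mapping))
    return mapping


if __name__ == "__main__":
    observed = [("id1", "id2"), ("id2", "id1"), ("id1", "id2")]
    for tags, barcode in assign_barcodes(observed).items():
        # Print a barcode-to-tags table, loosely analogous to a mapping file.
        print(barcode, ",".join(tags))
```

In this sketch, the table printed at the end plays the role that mapfile.txt plays for splitcode: it lets you recover which tags, in which order, each barcode stands for.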
Extraction
Sometimes important technical sequences are unknown (such as in the case of UMIs) and we need to pick them out from reads based on their absolute location within reads or based on their location relative to a tag. It is possible to isolate such sequences by using an extraction expression.
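The two ways of locating an unknown sequence described above (by absolute position, or relative to a tag) can be sketched in Python as follows. This only illustrates the concept and does not use splitcode's extraction expression syntax; the function names extract_at_position and extract_after_tag, and the example read, are hypothetical.

```python
def extract_at_position(seq, start, length):
    """Return `length` bases beginning at 0-based offset `start`.

    Returns None if the read is too short to contain the full span.
    """
    piece = seq[start:start + length]
    return piece if len(piece) == length else None


def extract_after_tag(seq, tag, length):
    """Return the `length` bases immediately following the first exact
    occurrence of `tag`, or None if the tag is absent or the read ends early.
    """
    hit = seq.find(tag)
    if hit == -1:
        return None
    return extract_at_position(seq, hit + len(tag), length)


if __name__ == "__main__":
    read = "GGGATCGCCCAAA"  # hypothetical read containing the tag ATCG
    print(extract_at_position(read, 0, 3))      # first three bases of the read
    print(extract_after_tag(read, "ATCG", 3))   # three bases right after the tag
```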
Output
Basic usage
Here, we demonstrate a basic usage example of splitcode where we search for the sequence ATCG and replace it with TTTT.
First, create a config file named config.txt with the following contents:
id tag sub
id1 ATCG TTTT
Next, let’s create a sample FASTQ file called intro.fastq with the following contents:
@read1
GGGATCGCCC
+
!!!!!!!!!!
@read2
ATCGTTTTTT
+
!!!!!!!!!!
Then, run the following:
splitcode -c config.txt --nFastqs=1 --pipe intro.fastq
The resulting output will be as follows:
@read1
GGGTTTTCCC
+
!!!KKKK!!!
@read2
TTTTTTTTTT
+
KKKK!!!!!!
As you can see from the output, the sequence ATCG has been replaced with TTTT. Also note that the quality scores are set to K: every new nucleotide that splitcode inserts will always have this quality score. The --nFastqs=1 argument means that we’re only considering one FASTQ file as part of a set of reads. If we had two FASTQ files as part of our set of reads (as is the case with paired-end reads), we’d set that value to 2. The --pipe argument means that we’re writing the results directly to standard output. If we wanted to write to a file called output.fastq, we would not use that argument; instead, we would supply -o output.fastq.
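To check your understanding of the output above, the following Python sketch reproduces the same transformation on the two example reads. It is not how splitcode works internally: it assumes exact, non-overlapping matches replaced left to right, and the function name substitute_tag is hypothetical. The quality character K for inserted bases is taken from the example output.

```python
def substitute_tag(seq, qual, tag, sub, new_qual="K"):
    """Replace every exact, non-overlapping occurrence of `tag` in `seq`
    with `sub`, giving each inserted base the quality character `new_qual`.
    """
    out_seq, out_qual = [], []
    pos = 0
    while True:
        hit = seq.find(tag, pos)
        if hit == -1:
            break
        # Copy the untouched stretch before the match, then the substitution.
        out_seq.append(seq[pos:hit])
        out_qual.append(qual[pos:hit])
        out_seq.append(sub)
        out_qual.append(new_qual * len(sub))
        pos = hit + len(tag)
    # Copy whatever remains after the last match.
    out_seq.append(seq[pos:])
    out_qual.append(qual[pos:])
    return "".join(out_seq), "".join(out_qual)


if __name__ == "__main__":
    reads = [
        ("@read1", "GGGATCGCCC", "!!!!!!!!!!"),
        ("@read2", "ATCGTTTTTT", "!!!!!!!!!!"),
    ]
    for name, seq, qual in reads:
        new_seq, new_qual = substitute_tag(seq, qual, "ATCG", "TTTT")
        print(name, new_seq, "+", new_qual, sep="\n")
```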