.. _Nesting guide: Nesting config files ==================== Introduction ------------ Sometimes, you might want to perform operations on sequences that have been extracted, modified, or corrected with splitcode (i.e. go through another round of splitcoding). For example, one might want to extract sequences then correct them. This is possible by using a nested config file. Essentially, simply place ``@nest`` at the bottom of your config file and add config options after it. A simple example ---------------- Say we have the following FASTQ file (input.fastq): .. code-block:: text @read AAATTTTGGGGG + KKKKKKKKKKKK Let's say we want to replace the **AAA** at the beginning with **GGGGG** and then, in a second step, replace **GGGGG** with **CCC**. In other words the sequence goes from **AAATTTTGGGGG** to **GGGGGTTTTGGGGG** to **CCCTTTTCCC**. We would specify the following config file (config.txt): .. code-block:: text ids tags subs X AAA GGGGG @nest ids tags subs Y GGGGG CCC When we run the following command: .. code-block:: text splitcode -c config.txt --pipe input.fastq THe following will be printed out: .. code-block:: text @read CCCTTTTCCC + KKKKKKKKKK An extraction example --------------------- OK, we'll use the same input.fastq as the previous example, where the sample read sequence is **AAATTTTGGGGG**. .. code-block:: text :caption: input.fastq @read AAATTTTGGGGG + KKKKKKKKKKKK Let's say we want to do the following operations: 1) Extract 4 bp's after encountering AAA, 2) Error-correct the extracted sequence to the following scheme (AAAA becomes TTT; TTTT becomes GGG; CCCC becomes AAA; GGGG becomes CCC). We set up the following config.txt file: .. code-block:: text @extract {X} ids tags X AAA @nest ids tags subs locations Y1 AAAA TTT 0 Y2 TTTT GGG 0 Y3 CCCC AAA 0 Y4 GGGG CCC 0 OK, we set ``locations`` to be ``0``. Why? Because of ``@nest``, at the next level, the extracted sequence will become file #0 and the input read will become file #1. So, when we run: .. code-block:: text splitcode -c config.txt -o out_R1.fq,out_R2.fq input.fastq We'll get the following outputs: .. code-block:: text :caption: out_R1.fq @read GGG + KKK .. code-block:: text :caption: out_R2.fq @read AAATTTTGGGGG + KKKKKKKKKKKK Note that we specified two output files because, again, due to ``@nest``, at the next level, the extracted sequence (from the first level) became file #0 and the input read became file #1. Error-correcting extracted sequences to a list of barcodes ---------------------------------------------------------- OK, building off the above, let's reuse the input.fastq read sequence file: .. code-block:: text :caption: input.fastq @read AAATTTTGGGGG + KKKKKKKKKKKK Let's say we have the following barcodes list (b.txt): .. code-block:: text :caption: b.txt ATAT TCGA GAGG TATT And let's set the following config file, allowing for one mismatch via the ``distances`` column, and correcting it to its original sequence via the ``.`` value in ``subs``. We provide ``b.txt$`` (with the ``$`` after the file name to specify that each sequence in that file should be its own unique tag). .. code-block:: text @extract {X} ids tags subs X AAA GGGGG @nest ids tags subs locations distances Y b.txt$ . 0 1 So, when we run: .. code-block:: text splitcode -c config.txt -o out_R1.fq,out_R2.fq input.fastq We'll get the following outputs: .. code-block:: text :caption: out_R1.fq @read TATT + KKKK .. code-block:: text :caption: out_R2.fq @read AAATTTTGGGGG + KKKKKKKKKKKK As you can see the **TTTT** that was extracted from input.fastq was corrected (via one hamming distance) to **TATT**. You can do other stuff too, e.g. if you set ``minFinds`` to 1, the extracted sequences that did not match anything in ``b.txt`` will not be outputted.