GFAK CLI usage

Below are some examples of how to use the gfak command-line tool and its subcommands to manipulate GFA files.

Importing a GFA file in GFA1 and converting it to GFA2

./gfak convert -S 2.0 data/v1.gfa

Produces the following output:

H	VN:Z:2.0
O	path1	1+ 4+ 5+ 2+
O	path2	1+ 4+ 5+ 6+ 3+
O	ref	1+ 2+ 3+
S	1	1	G
E	8	1+	2+	1$	1$	0	0	0M
E	9	1+	4+	1$	1$	0	0	0M
S	2	1	T
E	10	2+	3+	1$	1$	0	0	0M
S	3	1	G
S	4	1	C
E	11	4+	5+	1$	1$	0	0	0M
S	5	1	C
E	12	5+	2+	1$	1$	0	0	0M
E	13	5+	6+	1$	1$	0	0	0M
S	6	1	T
E	14	6+	3+	1$	1$	0	0	0M
./gfak convert -S 2.0 data/v1.gfa | md5sum: 268e075f19c7600304b51247b11e5f0f
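One visible difference in the output above is that GFA2 `S` lines carry an explicit length column (`S	1	1	G`), which GFA1 `S` lines lack. The sketch below shows only that header and segment rewrite. It is a hypothetical helper (`gfa1_to_gfa2_segments` is not part of gfak), and it deliberately drops `L`, `C` and `P` lines, because turning those into `E` and `O` records needs overlap bookkeeping that is not shown here.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT gfak's converter: rewrite GFA1 H and S lines in
# GFA2 form. GFA2 segments carry an explicit length column:
#   GFA1: S <name> <sequence> [tags]
#   GFA2: S <name> <length> <sequence> [tags]
# L, C and P lines are dropped by this sketch.
gfa1_to_gfa2_segments() {
  awk 'BEGIN { FS = OFS = "\t" }
    $1 == "H" { sub(/VN:Z:1\.0/, "VN:Z:2.0"); print; next }
    $1 == "S" {
      len = length($3)
      if ($3 == "*") {              # no sequence: fall back to an LN:i: tag
        len = 0
        for (i = 4; i <= NF; i++) if ($i ~ /^LN:i:/) len = substr($i, 6)
      }
      out = "S" OFS $2 OFS len OFS $3
      for (i = 4; i <= NF; i++) out = out OFS $i   # keep optional tags
      print out
    }'
}

# Usage: gfa1_to_gfa2_segments < data/v1.gfa
```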

Importing a GFA file in GFA2 and converting it to GFA1

    ./gfak convert -S 1.0 data/gfa_2.gfa
H	VN:Z:1.0
P	1p	12-,11+,32+,28-,20-,16+	140M,22M,140M,22M,81M,70M
P	2p	12-,8+,32-,31-,20-,16+,23-,16+	140M,22M,140M,22M,81M,70M,22M,70M
S	8	AAAGATAGAAAAGTGAGTGTAT
C	8	+	32	-	11	11M
S	11	AAAGATAGAAATACACGATGCG
C	11	+	32	+	11	11M
S	12	TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
C	12	-	11	+	0	11M
C	12	-	8	+	0	11M
S	16	AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
C	16	+	23	-	59	11M
S	20	GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
C	20	-	16	+	0	11M
C	20	-	16	+	0	11M
S	23	GTGTAATTTCTTTCAGTTCTCT
C	23	-	16	+	0	11M
S	28	GAATATCTGTTAGTGAGTGTAT
C	28	-	20	-	0	11M
S	31	GAATATCTGTTTACACGATGCG
C	31	-	20	-	0	11M
S	32	TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
C	32	+	28	-	129	11M
C	32	-	31	-	0	11M
    ./gfak convert -S 1.0 data/gfa_2.gfa | md5sum: d7bb881a8880850acb2977efa28c7979
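Going the other way, the GFA1 output drops the segment length column and turns each space-separated `O` path into a comma-separated `P` line. A minimal sketch of just those two rewrites follows. `gfa2_to_gfa1_basic` is a hypothetical name, and unlike gfak it writes `*` (the GFA1 "no overlaps given" placeholder) in the overlap column instead of computing CIGARs such as `140M,22M,...`; `E`, `F` and other GFA2-only lines are dropped.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT gfak's converter: GFA2 -> GFA1 for H, S and O lines.
gfa2_to_gfa1_basic() {
  awk 'BEGIN { FS = OFS = "\t" }
    $1 == "H" { sub(/VN:Z:2\.0/, "VN:Z:1.0"); print; next }
    $1 == "S" {
      out = "S" OFS $2 OFS $4        # drop the GFA2 length column ($3)
      for (i = 5; i <= NF; i++) out = out OFS $i
      print out; next
    }
    $1 == "O" {
      path = $3
      gsub(/ /, ",", path)           # GFA1 P lines use comma-separated steps
      print "P", $2, path, "*"       # "*" = overlaps not specified
      next
    }'
  # E, F, G, U and other records are dropped by this sketch.
}

# Usage: gfa2_to_gfa1_basic < data/gfa_2.gfa
```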

Sorting a GFA1 file

    ./gfak sort data/test.gfa
H	VN:Z:1.0
S	1	CGATGCAA
S	2	TGCAAAGTAC
S	3	TGCAACGTATAGACTTGTCAC	RC:i:4
S	4	GCATATA
S	5	CGATGATA
S	6	ATGA
L	1	+	2	+	5M
L	3	+	2	+	0M
L	3	+	4	-	1M1D2M1S
L	4	-	5	+	0M
C	5	+	6	+	2	4M
./gfak sort data/test.gfa | md5sum: 6dd44a9a0cc7308c7d6b92e8f0d9e648
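Each example ends with a `command | md5sum: <hash>` line giving the expected checksum of that command's output. A small helper for checking one of those lines is sketched below; `check_md5` is a hypothetical name, and it assumes GNU coreutils `md5sum` (on macOS, `md5 -q` is the rough equivalent).

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of gfak: run a command and compare the MD5 of
# its stdout with an expected hash, as in the "| md5sum:" lines of this file.
check_md5() {
  local expected=$1
  shift
  local actual
  # md5sum prints "<hash>  -" for stdin; keep only the hash field.
  actual=$("$@" | md5sum | cut -d ' ' -f 1)
  if [ "$actual" = "$expected" ]; then
    echo "OK"
  else
    echo "MISMATCH: expected $expected, got $actual" >&2
    return 1
  fi
}

# Usage (needs a built ./gfak and the repository's data/ directory):
#   check_md5 6dd44a9a0cc7308c7d6b92e8f0d9e648 ./gfak sort data/test.gfa
```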

Sorting a GFA2 file

    ./gfak sort data/gfa_2.gfa
H	VN:Z:2.0
S	8	22	AAAGATAGAAAAGTGAGTGTAT
S	11	22	AAAGATAGAAATACACGATGCG
S	12	140	TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
S	16	70	AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
S	20	81	GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
S	23	22	GTGTAATTTCTTTCAGTTCTCT
S	28	22	GAATATCTGTTAGTGAGTGTAT
S	31	22	GAATATCTGTTTACACGATGCG
S	32	140	TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
F	11	1+	0	22$	129	151	11M
F	12	1-	0	140$	0	140	11M
F	12	2-	0	140$	0	140	11M
F	16	1+	0	70$	350	420$	11M
F	16	2+	0	70$	350	420	11M
F	16	2+	0	70$	420	490$	11M
F	20	1-	0	81$	280	361	11M
F	20	2-	0	81$	280	361	11M
F	23	2-	0	22$	409	431	11M
F	28	1-	0	22$	269	291	11M
F	31	2-	0	22$	269	291	11M
F	32	1+	0	140$	140	280	11M
F	32	2-	0	140$	140	280	11M
F	8	2+	0	22$	129	151	11M
E	34	8+	32-	11	22$	129	140$	11M
E	35	11+	32+	11	22$	0	11	11M
E	36	12-	11+	0	11	0	11	11M
E	37	12-	8+	0	11	0	11	11M
E	38	16+	23-	59	70$	11	22$	11M
E	39	20-	16+	0	11	0	11	11M
E	40	20-	16+	0	11	0	11	11M
E	41	23-	16+	0	11	0	11	11M
E	42	28-	20-	0	11	70	81$	11M
E	43	31-	20-	0	11	70	81$	11M
E	44	32+	28-	129	140$	11	22$	11M
E	45	32-	31-	0	11	11	22$	11M
O	1p	12- 11+ 32+ 28- 20- 16+
O	2p	12- 8+ 32- 31- 20- 16+ 23- 16+
    ./gfak sort data/gfa_2.gfa | md5sum: fa3b92296d3a23f9db99e611815788d4
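In both sorted outputs the records are grouped by type: the header first, then segments, then fragments and edges (GFA2) or links and containments (GFA1), with paths last. The sketch below reproduces only that grouping and keeps the input order within each type via a stable sort. It is an illustration under that assumption, not gfak's sort, which also reorders lines within a type. The rank list and `sort_gfa_by_type` are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT gfak's sort: group GFA records by type in an
# assumed order, keeping the original line order within each type.
sort_gfa_by_type() {
  awk 'BEGIN {
         FS = OFS = "\t"
         n = split("H S F E G U O L C P", types, " ")
         for (i = 1; i <= n; i++) rank[types[i]] = i
       }
       { r = ($1 in rank) ? rank[$1] : 99   # unknown types (e.g. comments) go last
         print r, $0 }' |
    sort -s -t "$(printf '\t')" -k 1,1n |   # -s: stable, ties keep input order
    cut -f 2-                               # strip the rank column again
}

# Usage: sort_gfa_by_type < data/gfa_2.gfa
```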

Extracting FASTA records from a GFA file

    ./gfak extract data/gfa_2.gfa
>8
AAAGATAGAAAAGTGAGTGTAT
>11
AAAGATAGAAATACACGATGCG
>12
TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
>16
AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
>20
GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
>23
GTGTAATTTCTTTCAGTTCTCT
>28
GAATATCTGTTAGTGAGTGTAT
>31
GAATATCTGTTTACACGATGCG
>32
TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
./gfak extract data/gfa_2.gfa | sort | md5sum: 43bbe8fee3f67fd90b90ee885ddb15e3  
cat data/no_seqs.fa | sort | md5sum: 43bbe8fee3f67fd90b90ee885ddb15e3
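For a quick look at segment sequences without gfak, an awk one-liner over the `S` lines gives similar FASTA. This is a sketch, not `gfak extract`: `gfa_to_fasta` is a hypothetical name, and it tells GFA2 from GFA1 by checking whether the third column is a number (the GFA2 length), which assumes no GFA1 sequence consists only of digits. Segments whose sequence is the `*` placeholder are skipped.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT gfak extract: print each S line as a FASTA record.
gfa_to_fasta() {
  awk 'BEGIN { FS = "\t" }
    $1 == "S" {
      seq = ($3 ~ /^[0-9]+$/) ? $4 : $3   # GFA2 puts a length column before the sequence
      if (seq != "*") printf ">%s\n%s\n", $2, seq
    }'
}

# Usage: gfa_to_fasta < data/gfa_2.gfa
```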

Replacing sequence placeholders with FASTA records

    ./gfak fillseq -f data/no_seqs.fa data/no_seqs.gfa
H	VN:Z:2.0
O	1p	12- 11+ 32+ 28- 20- 16+
O	2p	12- 8+ 32- 31- 20- 16+ 23- 16+
S	8	22	AAAGATAGAAAAGTGAGTGTAT
F	8	2+	0	22$	129	151	11M
E	34	8+	32-	11	22$	129	140$	11M
S	11	22	AAAGATAGAAATACACGATGCG
F	11	1+	0	22$	129	151	11M
E	35	11+	32+	11	22$	0	11	11M
S	12	140	TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
F	12	1-	0	140$	0	140	11M
F	12	2-	0	140$	0	140	11M
E	36	12-	11+	0	11	0	11	11M
E	37	12-	8+	0	11	0	11	11M
S	16	70	AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
F	16	1+	0	70$	350	420$	11M
F	16	2+	0	70$	350	420	11M
F	16	2+	0	70$	420	490$	11M
E	38	16+	23-	59	70$	11	22$	11M
S	20	81	GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
F	20	1-	0	81$	280	361	11M
F	20	2-	0	81$	280	361	11M
E	39	20-	16+	0	11	0	11	11M
E	40	20-	16+	0	11	0	11	11M
S	23	22	GTGTAATTTCTTTCAGTTCTCT
F	23	2-	0	22$	409	431	11M
E	41	23-	16+	0	11	0	11	11M
S	28	22	GAATATCTGTTAGTGAGTGTAT
F	28	1-	0	22$	269	291	11M
E	42	28-	20-	0	11	70	81$	11M
S	31	22	GAATATCTGTTTACACGATGCG
F	31	2-	0	22$	269	291	11M
E	43	31-	20-	0	11	70	81$	11M
S	32	140	TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
F	32	1+	0	140$	140	280	11M
F	32	2-	0	140$	140	280	11M
E	44	32+	28-	129	140$	11	22$	11M
E	45	32-	31-	0	11	11	22$	11M
./gfak fillseq -f data/no_seqs.fa data/no_seqs.gfa | md5sum: caaf91eac390521d68d56bad57f7b3b3
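The fill step can be pictured as a lookup from FASTA record ID to sequence. The sketch below assumes the placeholder is `*` in the GFA2 sequence column (the placeholder both GFA specs use for a missing sequence); we have not checked how data/no_seqs.gfa marks it. `fill_gfa2_seqs` is a hypothetical helper, not gfak's implementation, and multi-line FASTA records are joined into one sequence.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT gfak fillseq: replace "*" in the GFA2 sequence
# column (field 4 of S lines) with the matching FASTA record.
fill_gfa2_seqs() {
  local fasta=$1 gfa=$2
  awk 'BEGIN { FS = OFS = "\t" }
    FILENAME == ARGV[1] {            # first file: load the FASTA
      if (/^>/) {
        name = substr($0, 2)
        sub(/[ \t].*/, "", name)     # keep only the record ID
        seq[name] = ""
      } else {
        seq[name] = seq[name] $0     # join wrapped sequence lines
      }
      next
    }
    $1 == "S" && $4 == "*" && ($2 in seq) { $4 = seq[$2] }
    { print }' "$fasta" "$gfa"
}

# Usage: fill_gfa2_seqs data/no_seqs.fa data/no_seqs.gfa
```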

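Bumping the ID space of a graph

In this example, every segment ID in data/test.gfa is increased by 9 (segment 1 becomes 10, segment 6 becomes 15).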

    ./gfak ids -s 9:9:9 data/test.gfa
H	VN:Z:1.0
S	10	CGATGCAA
L	10	+	11	+	5M
S	11	TGCAAAGTAC
S	12	TGCAACGTATAGACTTGTCAC	RC:i:4
L	12	+	11	+	0M
L	12	+	13	-	1M1D2M1S
S	13	GCATATA
L	13	-	14	+	0M
S	14	CGATGATA
C	14	+	15	+	2	4M
S	15	ATGA
diff <(./gfak ids -s 9:9:9 data/test.gfa) data/re_id.gfa
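For a GFA1 file with numeric segment names, the renumbering shown above amounts to adding a fixed offset to every segment reference. The sketch below does that for `S`, `L` and `C` lines only (P lines are passed through unchanged). It does not try to interpret the three fields of `-s 9:9:9`; `shift_gfa1_ids` and its single offset argument are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch, NOT gfak ids: add a fixed offset to numeric segment
# IDs in GFA1 S, L and C lines. P lines are not rewritten here.
shift_gfa1_ids() {
  local offset=$1
  awk -v off="$offset" 'BEGIN { FS = OFS = "\t" }
    $1 == "S"              { $2 += off }             # segment name
    $1 == "L" || $1 == "C" { $2 += off; $4 += off }  # from/to segments
    { print }'
}

# Usage: shift_gfa1_ids 9 < data/test.gfa
```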

Merging two GFA files

    ./gfak merge -S 2.0 data/test.gfa data/gfa_2.gfa
H	VN:Z:2.0
O	1p	12- 11+ 32+ 28- 20- 16+
O	2p	12- 8+ 32- 31- 20- 16+ 23- 16+
S	1	8	CGATGCAA
E	8	1+	2+	8$	8$	0	0	5M
S	2	10	TGCAAAGTAC
S	3	21	TGCAACGTATAGACTTGTCAC	RC:i:4
E	9	3+	2+	21$	21$	0	0	0M
E	10	3+	4-	21$	21$	0	0	1M1D2M1S
S	4	7	GCATATA
E	11	4-	5+	7$	7$	0	0	0M
S	5	8	CGATGATA
E	12	5+	6+	2	6	0	4$	4M
S	6	4	ATGA
S	8	22	AAAGATAGAAAAGTGAGTGTAT
F	8	2+	0	22$	129	151	11M
E	34	8+	32-	11	22$	129	140$	11M
S	11	22	AAAGATAGAAATACACGATGCG
F	11	1+	0	22$	129	151	11M
E	35	11+	32+	11	22$	0	11	11M
S	12	140	TTTCTATCTTTAATCGATAAAAGTAAAAAAATTGAGCAGTAGTATAAAATGAACTTGCGTTATAAAAAGGATTTTGTTATATTGTAGTAGTTGCTTGAATTATGACTAGATAATCAATGAGCTAATACGAGAATTTTAAT
F	12	1-	0	140$	0	140	11M
F	12	2-	0	140$	0	140	11M
E	36	12-	11+	0	11	0	11	11M
E	37	12-	8+	0	11	0	11	11M
S	16	70	AGAAATTACACACAAAGTTATACTATTTTTAGCAACATATTCACAGGTATTTGACATATAGAGAACTGAA
F	16	1+	0	70$	350	420$	11M
F	16	2+	0	70$	350	420	11M
F	16	2+	0	70$	420	490$	11M
E	38	16+	23-	59	70$	11	22$	11M
S	20	81	GTGTAATTTCTAATTATCCACAATTCTGAAAACTATAAATGTGCATAAGTGGATAACTTTTCCTTCTATAGAATATCTGTT
F	20	1-	0	81$	280	361	11M
F	20	2-	0	81$	280	361	11M
E	39	20-	16+	0	11	0	11	11M
E	40	20-	16+	0	11	0	11	11M
S	23	22	GTGTAATTTCTTTCAGTTCTCT
F	23	2-	0	22$	409	431	11M
E	41	23-	16+	0	11	0	11	11M
S	28	22	GAATATCTGTTAGTGAGTGTAT
F	28	1-	0	22$	269	291	11M
E	42	28-	20-	0	11	70	81$	11M
S	31	22	GAATATCTGTTTACACGATGCG
F	31	2-	0	22$	269	291	11M
E	43	31-	20-	0	11	70	81$	11M
S	32	140	TACACGATGCGAGCAATCAAATTTCATAACATCACCATGAGTTTGGTCCGAAGCATGAGTGTTTACAATGTTTGAATACCTTATACAGTTCTTATACATACTTTATAAATTATTTCCCAAGCTGTTTTGATACACTCACT
F	32	1+	0	140$	140	280	11M
F	32	2-	0	140$	140	280	11M
E	44	32+	28-	129	140$	11	22$	11M
E	45	32-	31-	0	11	11	22$	11M
./gfak merge -S 2.0 data/test.gfa data/gfa_2.gfa | md5sum: ca3b52673b63de931cd64a50669e7147
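Merging is simplest to reason about when the two inputs use disjoint segment names, as data/test.gfa (segments 1 to 6) and data/gfa_2.gfa (segments 8 and up) do here. A quick pre-merge check is sketched below. `shared_segment_ids` is a hypothetical helper, and we make no claim about how `gfak merge` itself handles colliding names; if it reports any, renumbering one graph first, as in the ID-bumping example above, may be one way to avoid the clash.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of gfak: print segment names (S lines,
# column 2) that appear in both GFA files, each reported once.
shared_segment_ids() {
  awk 'BEGIN { FS = "\t" }
    FILENAME == ARGV[1] { if ($1 == "S") seen[$2] = 1; next }
    $1 == "S" && ($2 in seen) && !($2 in printed) { print $2; printed[$2] = 1 }
  ' "$1" "$2"
}

# Usage: shared_segment_ids data/test.gfa data/gfa_2.gfa
```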