Hi @GDKO
First of all, thank you so much for developing this pipeline. It's very practical, and I had no problems with the installation and running.
I'm writing to ask you a couple of questions. I have a genome from an insect of the order Hemiptera, and I'd like to know if there are any horizontally transferred genes from bacteria and fungi to my insect. I ran it step by step as indicated in the wiki, but I'm wondering if I'm setting the parameters correctly. For example, for my groups, I set the file like this:
Ingroup:
33208: Metazoa
EGP:
7524: Hemiptera
- Is this configuration correct?
And the configurations file similar to:
---
max_threads: 30
# DB path
blast_db_path: nr # blast: change to the local blast_db path
fasta_path: uniprot_sprot.fasta # diamond: change to the local fasta path for sp, ur90, or custom database
mode: sp # use blast for blast database, use sp for swissprot database, ur90 for unif90 or custom database
data_type: AA # data type DNA, AA
## Algorithm options
# prepare
ai_cutoff: 0
ahs_cutoff: 0
outg_pct_cutoff: 80
selection: "ai or ahs" # select sequences based on which metrics, another example "(ai or ahs) and outg_pct"
percent_identity: 100 # select hits equal or below this number
cutoffextend: 20# when ingroup hit is found, we take this hit + n hits
number_hits_noingroup: 50 # when no ingroup hit is found, we take this number of hits
trimal: false
min_num_hits: 4 # select queries with at least that many blast hits
percentage_similar_hits: 0.7 # group queries based on this
# detect, classify, evaluate
fastml: true # Use fasttree instead of IQTree
node_support: 0 # nodes below that number will collapse
complex_per_ingroup: 20 # if D/(D+I) smaller than this then node is considered Ingroup
complex_per_donor: 80 # if D/(D+I) greater than this then node is considered Donor
complex_per_node: 90 # if node contains percent number of this category, it is assigned
# Program specific options
mafft_options: '--anysymbol --auto'
trimal_options: '-automated1'
#IQ-Tree
iqmodel: '-mset WAG,LG,JTT -AICc -mrate E,I,G,R'
ufbootstrap: 1000
iq_threads: 30
- My question is, should I adjust the ai_cutoff and ahs_cutoff values similar to the tutorial? What is your suggestion? I left them at 0.
Finally, after running each step I got the following:
(avp39) [enriquepet19961998@pitzer-login02 output_dir]$ ls -lh
total 707K
drwxr-xr-x 2 enriquepet19961998 PAS0471 32K Dec 19 16:11 alt_topology
-rw-r--r-- 1 enriquepet19961998 PAS0471 2.1K Dec 19 16:11 alt_topology_results.tsv
drwxr-xr-x 11 enriquepet19961998 PAS0471 4.0K Dec 19 15:28 classification
-rw-r--r-- 1 enriquepet19961998 PAS0471 232 Dec 19 15:28 classification_results.txt
-rw-r--r-- 1 enriquepet19961998 PAS0471 166K Dec 19 15:28 classification_tree_results.txt
drwxr-xr-x 2 enriquepet19961998 PAS0471 32K Dec 19 01:22 fastagroups
drwxr-xr-x 2 enriquepet19961998 PAS0471 32K Dec 19 01:33 fasttree
-rw-r--r-- 1 enriquepet19961998 PAS0471 177 Dec 19 01:34 fasttree_general_results.txt
drwxr-xr-x 2 enriquepet19961998 PAS0471 128K Dec 19 01:34 fasttree_nexus
-rw-r--r-- 1 enriquepet19961998 PAS0471 197K Dec 19 01:34 fasttree_tree_results.txt
-rw-r--r-- 1 enriquepet19961998 PAS0471 16K Dec 19 17:52 fasttree_tree_results.txt_k5.prox.txt
-rw-r--r-- 1 enriquepet19961998 PAS0471 35K Dec 19 01:22 groups.tsv
drwxr-xr-x 2 enriquepet19961998 PAS0471 32K Dec 19 01:26 mafftgroups
drwxr-xr-x 2 enriquepet19961998 PAS0471 4.0K Dec 19 01:21 tmp
- I understand that the final file for decision-making should be fasttree_tree_results.txt_k5.prox.txt?
(avp39) [enriquepet19961998@pitzer-login02 output_dir]$ wc -l fasttree_tree_results.txt_k5.prox.txt
82 fasttree_tree_results.txt_k5.prox.txt
Content:
HGT output_dir/mafftgroups/gp9.fa /fs/ess/test2/hgt/output_dir/fasttree/gp9.fa.fasttree Dinsect_Chr_1_g674.t1 NN-NN|X|H--NN 0.56
HGT output_dir/mafftgroups/gp9.fa /fs/ess/test2/hgt/output_dir/fasttree/gp9.fa.fasttree Dinsect_Chr_1_g674.t2 N-NNH|X|--NNN 0.56
HGT output_dir/mafftgroups/gp26.fa /fs/ess/test2/hgt/output_dir/fasttree/gp26.fa.fasttree Dinsect_Chr_8_g575.t1 NNNNN|X|HHNNN 0.6
HGT output_dir/mafftgroups/gp26.fa /fs/ess/test2/hgt/output_dir/fasttree/gp26.fa.fasttree Dinsect_Chr_8_g575.t3 NNNHH|X|NNNNN 0.6
HGT output_dir/mafftgroups/gp26.fa /fs/ess/test2/hgt/output_dir/fasttree/gp26.fa.fasttree Dinsect_Chr_8_g575.t2 NNNNH|X|HNNNN 0.6
HGT output_dir/mafftgroups/gp27.fa /fs/ess/test2/hgt/output_dir/fasttree/gp27.fa.fasttree Dinsect_Chr_6_g286.t1 NN-NN|X|NNNNN 0.92
HGT output_dir/mafftgroups/gp29.fa /fs/ess/test2/hgt/output_dir/fasttree/gp29.fa.fasttree Dinsect_Chr_1_g201.t1 NNNNN|X|NNN-- 0.84
HGT-NT output_dir/mafftgroups/gp42.fa /fs/ess/test2/hgt/output_dir/fasttree/gp42.fa.fasttree Dinsect_Chr_1_g1409.t1 NNNNN|X|NN--- 0.76
HGT-NT output_dir/mafftgroups/gp46.fa /fs/ess/test2/hgt/output_dir/fasttree/gp46.fa.fasttree Dinsect_Chr_6_g707.t1 NNH--|X|N---N 0.4
HGT-NT output_dir/mafftgroups/gp46.fa /fs/ess/test2/hgt/output_dir/fasttree/gp46.fa.fasttree Dinsect_Chr_3_g640.t1 --N--|X|N-N-- 0.44
Finally, when I run a blastp command on some sequences, for example Dinsect_Chr_1_g674.t1, blastp shows that it's identical to another insect:
Select seq gb|KAJ6626570.1| 1 member(s), 1 organism(s) flies Glycine betaine reductase ATRR [Pseudolycoriella hygida] 1598 1598 100% 0.0 59.15% 1271 KAJ6626570.1
- How should this result be interpreted? Several proteins from my insect, classified as HGT and with positive values, match those of other insects.
I would greatly appreciate your feedback.
Thanks so much.
Hi @GDKO
First of all, thank you so much for developing this pipeline. It's very practical, and I had no problems with the installation and running.
I'm writing to ask you a couple of questions. I have a genome from an insect of the order Hemiptera, and I'd like to know if there are any horizontally transferred genes from bacteria and fungi to my insect. I ran it step by step as indicated in the wiki, but I'm wondering if I'm setting the parameters correctly. For example, for my groups, I set the file like this:
And the configurations file similar to:
Finally, after running each step I got the following:
Content:
Finally, when I run a blastp command on some sequences, for example Dinsect_Chr_1_g674.t1, blastp shows that it's identical to another insect:
I would greatly appreciate your feedback.
Thanks so much.