TDG09

Installation

Download from here: http://www.homepages.ucl.ac.uk/~ucbtaut/

http://bit.ly/fVbnDr ⇒ tdg09.zip

unzip tdg09.zip

Detection of sites in the Influenza Virus Hemagglutinin HA1 chain

We would like to know which sites of the Hemagglutinin HA1 chain differ in importance between avian viruses (genes starting with Av_) and human viruses (genes starting with Hu_).

cd ./Tutorial/tdg09  # Folder of installation

Running TDG09 on this dataset can take 20 minutes or more, depending on the available computing power.

Launch TDG09 with one of the commands below.

On Linux or Mac OS X, you can run it like this:

./run.sh etc/H1.faa etc/H1.tree > tdg.out

On Windows, you need to type out the full command:

java -cp lib\commons-lang-2.4.jar;lib\flanagan.jar;lib\pal-1.5.1.jar;dist\tdg09.jar models.MainAvHu09 etc\H1.faa etc\H1.tree > tdg.out

The output is written to the file tdg.out. We need to reformat it with a few Unix tools (the four lines below form a single command).

/!\ On Windows, these Unix tools need Cygwin to work. If you don't have Cygwin, you can continue the tutorial by skipping the R part. The sites are given at the end.

grep "Site\|Parameters\|Log\-likelihood" tdg.out \
| tr '\n' ' ' | sed "s/Site: /\\`echo -e '\n\r'`/g" \
| awk '{$1=$1}1' OFS=" " \
| cut -d' ' -f1,4,7,10,13 > tdg2.out
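
To make the last two steps of this pipeline easier to follow, here is a minimal sketch that applies them to a single made-up record. The tokens f1 to f13 are placeholders for the whitespace-separated fields of one site after the grep, tr and sed steps; they are not real TDG09 output, whose exact layout is not shown in this tutorial.

```shell
# Placeholder record: f1..f13 stand in for the fields of one site
# (hypothetical tokens, not real TDG09 output). Spacing is deliberately uneven.
record='  f1 f2   f3 f4 f5  f6 f7 f8 f9 f10   f11 f12 f13  '

# awk rebuilds the line with single spaces between fields;
# cut then keeps columns 1, 4, 7, 10 and 13.
selected=$(printf '%s\n' "$record" | awk '{$1=$1}1' OFS=" " | cut -d' ' -f1,4,7,10,13)

echo "$selected"   # prints: f1 f4 f7 f10 f13
```

The awk step matters because cut treats every single space as a separator, so runs of spaces would otherwise shift the column numbers.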

Now we need to load the file tdg2.out into R.

# Launch R

R

Then enter these commands:

tdg.out <- read.csv('tdg2.out', sep=' ', header=FALSE)                                               # Load the file
tdg.out <- tdg.out[!is.na(tdg.out$V2),]                                                              # Remove conserved sites
tdg.out$lrt <- pchisq(2 * (tdg.out$V5 - tdg.out$V3), df=(tdg.out$V4 - tdg.out$V2), lower.tail=FALSE) # Likelihood ratio test p-value
tdg.out$fdr <- tdg.out$lrt * length(tdg.out$lrt) / rank(tdg.out$lrt)                                 # False discovery rate (FDR)
tdg.out[tdg.out$fdr < 0.20, "V1"]             # Print all sites under FDR=20% (very relaxed)
tdg.out[tdg.out$fdr < 0.05, "V1"]             # Print all sites under FDR=5% (medium)
tdg.out[tdg.out$fdr < 0.01, "V1"]             # Print all sites under FDR=1% (stringent)
 [1]   2   9  62 130 155 168 169 173 177 202 203 204 212 239 252 253 275 276 286
[20] 289 300 303 315 325 416 421 460 471
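
Written out as formulas, the R commands above amount to the following for each retained site i. The symbols are introduced here for illustration only: the log-likelihoods ℓ0 and ℓ1 are taken from columns V3 and V5, the parameter counts k0 and k1 from columns V2 and V4, and m is the number of sites left after removing the conserved ones. Reading V2/V3 as the simpler model and V4/V5 as the richer one is an assumption based on how the test is set up, not something stated by TDG09 itself.

```latex
\begin{align*}
\Lambda_i &= 2\,(\ell_{1,i} - \ell_{0,i}), \\
p_i &= \Pr\!\left(\chi^2_{k_{1,i} - k_{0,i}} \ge \Lambda_i\right), \\
\mathrm{fdr}_i &= \frac{p_i \, m}{\operatorname{rank}(p_i)}.
\end{align*}
```

The last line is the Benjamini-Hochberg ratio without its monotonicity step. R's built-in p.adjust(tdg.out$lrt, method="BH") also enforces monotonicity and caps the values at 1, so it may return slightly different numbers.
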
Jalview

Load the multiple alignment H1.faa (you will need to remove the first line, as the file is in Phylip format). ⇒ Sort by ID.
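
One possible way to remove that first line without editing the file by hand is sketched below. The helper name strip_first_line and the output file name H1_jalview.faa are made up for this example and are not part of TDG09 or Jalview; the sketch assumes the header occupies exactly one line.

```shell
# Hypothetical helper (not part of TDG09): print a file without its first line.
strip_first_line() {
    tail -n +2 "$1"
}

# Write a copy for Jalview if the tutorial alignment is present;
# the original etc/H1.faa is left untouched for TDG09.
if [ -f etc/H1.faa ]; then
    strip_first_line etc/H1.faa > H1_jalview.faa
fi
```

Working on a copy keeps the original alignment usable by run.sh, which expects the file as distributed.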

Load the tree H1.tree. Put a vertical line at the root of the tree to split it in two.

⇒ Visualise the position of these sites.
