(a) BLAST
What is the impact on
• the speed of the heuristic
• the number of false negatives
• the number of false positives
of the following changes in BLAST parameters:
(i) an increase/decrease in w, where w is the word length;
(ii) an increase/decrease in T, where T is the minimum score, under a pair-score matrix, that a word must reach against a word from the query sequence to be included in that query word's list of matching words;
(iii) an increase/decrease in S, where S is the threshold score that an alignment must reach after extension?
(b) The higher the level of accuracy required in DNA sequences, the more time-consuming the process of database formation becomes. What is done to reduce this time? Does this introduce errors? Explain how accuracy is then improved.
(a) Discuss the advantages and disadvantages of progressive alignment and of high-dimensional dynamic programming.
(b) Give an example of dynamic programming in bioinformatics other than sequence alignment. Explain how dynamic programming is applied and why it helps in your example.
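One possible example for part (b), offered only as an illustration, is RNA secondary-structure prediction by base-pair maximization (the Nussinov recurrence). The sketch below assumes Watson-Crick pairs only and, by default, no minimum hairpin-loop length; the function name `nussinov` and the `min_loop` parameter are hypothetical names chosen for this illustration. Dynamic programming helps here because the best pairing of a subsequence is built from the best pairings of shorter subsequences, each of which is computed once and stored in a table.

```python
def nussinov(seq, min_loop=0):
    """Maximum number of nested base pairs in seq (simplified Nussinov recurrence)."""
    # Assumption: only Watson-Crick pairs count; G-U wobble pairs are ignored.
    pairs = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
    n = len(seq)
    if n == 0:
        return 0
    # dp[i][j] = best pair count for the subsequence seq[i..j]
    dp = [[0] * n for _ in range(n)]
    for span in range(1, n):
        for i in range(n - span):
            j = i + span
            # Case 1 and 2: leave position i or position j unpaired.
            best = max(dp[i + 1][j], dp[i][j - 1])
            # Case 3: pair i with j, if the bases are complementary.
            if (seq[i], seq[j]) in pairs and j - i > min_loop:
                inner = dp[i + 1][j - 1] if i + 1 <= j - 1 else 0
                best = max(best, inner + 1)
            # Case 4: split the interval into two independent parts.
            for k in range(i + 1, j):
                best = max(best, dp[i][k] + dp[k + 1][j])
            dp[i][j] = best
    return dp[0][n - 1]


if __name__ == "__main__":
    print(nussinov("GGGAAAUCC"))
```

The table has O(n^2) entries and each takes O(n) work, so the sketch runs in O(n^3) time instead of enumerating every possible structure.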
(a) In order to calculate a multiple sequence alignment for N sequences, how many pairwise alignments have to be calculated?
(b) Align the following using “star alignment” showing all intermediate steps:
S1= ATTCGGATT
S2= ATCCGGATT
S3= ATGGAATTTT
S4= ATGTTGTT
S5= AGTCAGG
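The first step of star alignment, scoring every pair of sequences and choosing the centre, can be sketched as follows. The question does not specify a scoring scheme, so the match, mismatch and gap values used here (+1, -1, -1) are assumptions, and `nw_score` and `pick_center` are hypothetical helper names. The sketch only selects the centre sequence; merging the pairwise alignments into the final multiple alignment is left as in the question.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-1):
    """Global (Needleman-Wunsch) alignment score, keeping one row at a time."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pick_center(seqs):
    """Return (centre name, summed scores per sequence, number of pairs scored)."""
    totals = {name: 0 for name in seqs}
    n_pairs = 0
    for x, y in combinations(seqs, 2):
        score = nw_score(seqs[x], seqs[y])
        totals[x] += score
        totals[y] += score
        n_pairs += 1
    # The centre is the sequence with the highest summed similarity to all others.
    center = max(totals, key=totals.get)
    return center, totals, n_pairs


seqs = {
    "S1": "ATTCGGATT",
    "S2": "ATCCGGATT",
    "S3": "ATGGAATTTT",
    "S4": "ATGTTGTT",
    "S5": "AGTCAGG",
}

if __name__ == "__main__":
    center, totals, n_pairs = pick_center(seqs)
    print("pairs scored:", n_pairs)
    print("summed scores:", totals)
    print("centre:", center)
```

With a different scoring scheme the summed scores, and possibly the chosen centre, may change, so the scheme used should be stated alongside the intermediate steps.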
(a) You have a protein of unknown function from a bacterium. You have made a knockout mutant, but the bacteria die immediately without the corresponding gene. You have sequenced the protein. What steps would you take to guess the function of the protein? What kind of information would you look for?
(a) What is the difference between spotted and oligonucleotide microarrays?
(b) What is a probe? How are probes for microarrays designed?
(c) What is a probeset? What is probeset summarization and why do we need it?
(d) If a gene is shown to be induced four-fold in a microarray experiment, what would be the log2-transformed expression ratio?
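The log2 transformation in part (d) can be checked with a short sketch. The function name `log2_ratio` and the intensity values are hypothetical and serve only as an illustration; any pair of intensities in the same ratio gives the same result.

```python
import math


def log2_ratio(treated, control):
    """log2 of the expression ratio treated/control."""
    return math.log2(treated / control)


if __name__ == "__main__":
    # Hypothetical intensities: a four-fold induction and a four-fold repression.
    print(log2_ratio(400.0, 100.0))
    print(log2_ratio(100.0, 400.0))
```

On the log2 scale, induction and repression of the same magnitude are symmetric around zero, which is one reason the transformation is commonly applied to expression ratios.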
(a) Why do you have to normalize microarray data to compare two conditions? Explain two normalization techniques that can be used here.
(b) Describe and discuss specific problems likely to appear on a microarray. What measures can be taken, from a data analysis point of view, to reduce or eliminate such effects?
(a) What is the output obtained from an RNA-seq experiment? Why do you have to remove rRNA and tRNA before performing RNA-seq?
(b) Why is mapping of RNA-seq reads more difficult than mapping re-sequencing reads or ChIP-seq reads? Explain.
(c) What is a Phred quality score? Explain its use in an RNA-seq experiment.
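The relationship between a Phred score Q and the base-calling error probability P, Q = -10 log10(P), can be sketched as below. The function names are hypothetical, and the ASCII offset of 33 is an assumption (it corresponds to the Sanger-style FASTQ encoding; older files may use a different offset).

```python
import math


def phred_to_error_prob(q):
    """Error probability implied by Phred score q."""
    return 10 ** (-q / 10)


def error_prob_to_phred(p):
    """Phred score implied by error probability p."""
    return -10 * math.log10(p)


def decode_quality_string(qual, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores."""
    # Assumption: each character encodes (score + offset) as its ASCII code.
    return [ord(ch) - offset for ch in qual]


if __name__ == "__main__":
    scores = decode_quality_string("II5!")
    print(scores)
    print([phred_to_error_prob(q) for q in scores])
```

In practice such per-base scores are used to trim or filter low-quality reads before mapping.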
(a) Why must the inside of a mass spectrometer be kept at a high vacuum?
(b) How are molecular ions formed? What information can be obtained from the mass/charge value of a molecular ion?
(c) Define Ion Trap, ICR, Quadrupole and Octopole.
(a) A researcher is scanning a cDNA microarray and obtains an image with the following characteristics: most of the spots are visible and many are very bright; the background appears to be light gray. The researcher proceeds to the image processing and quantification stages and finds that most spots appear to be characterized by a high average intensity. Discuss what might have happened. What steps would you undertake in order to test your hypothesis and correct the situation?
(b) What experiment can be used as an alternative to microarray analysis? What are its advantages over the former?
(c) What is the difference between concordant, discordant and unmapped reads?
There are two hypothetical “one column” multiple sequence alignments. In the first alignment of N sequences, every residue is a tyrosine. In the second alignment, there are N-1 tyrosines and one proline.
(b) Given that the BLOSUM62 Y ↔ Y score is 7, calculate the score of the first alignment as a function of N. This is S_{Y^N}(N).
(c) Given that the BLOSUM62 Y ↔ P score is -4, calculate the score of the second alignment as a function of N. This is S_{Y^{N-1}P}(N).
(d) Evaluate and simplify the following expression, representing the fractional difference between the scores of the two alignments:
f(N) = [S_{Y^N}(N) - S_{Y^{N-1}P}(N)] / S_{Y^N}(N)
(e) Construct a plot of the expression you derived in part (d), and explain why this scoring behavior is incorrect.
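The column scores can be checked numerically with a short sketch. The question does not name the scoring scheme, so sum-of-pairs scoring (every pair of residues in the column is scored and the scores are added) is assumed here; the names `SCORES`, `sp_score` and `fractional_difference` are hypothetical, and only the two substitution scores quoted in the question are used.

```python
from fractions import Fraction
from itertools import combinations

# Only the two scores quoted in the question; keys are alphabetically sorted pairs.
SCORES = {("Y", "Y"): 7, ("P", "Y"): -4}


def pair_score(a, b):
    """Substitution score for an unordered residue pair."""
    return SCORES[tuple(sorted((a, b)))]


def sp_score(column):
    """Sum-of-pairs score of a single alignment column (assumed scheme)."""
    return sum(pair_score(a, b) for a, b in combinations(column, 2))


def fractional_difference(n):
    """f(N): relative drop in score when one of N tyrosines becomes a proline."""
    all_y = sp_score("Y" * n)
    one_p = sp_score("Y" * (n - 1) + "P")
    return Fraction(all_y - one_p, all_y)


if __name__ == "__main__":
    for n in (2, 5, 10, 50):
        print(n, sp_score("Y" * n), sp_score("Y" * (n - 1) + "P"),
              fractional_difference(n))
```

Tabulating or plotting `fractional_difference(n)` for increasing N shows how the relative penalty for the single proline changes as more sequences are added, which is the behaviour the last part asks about.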