Prepare a complete table of the three-letter abbreviations of GenBank divisions (PRI, ROD, MAM, BCT, etc.).
Access any flatfile from NCBI (the NCBI home page is http://www.ncbi.nlm.nih.gov ). Decode all the information given in the accessed file:
• What does the first line indicate?
• What is the nature of the sequence?
• Identify the version.
• Is the data you have accessed a coding sequence or an open reading frame? What are the start and stop codons?
• Does it have untranslated regions?
• Has it been linked to the protein database? If so, how many amino acids does the protein have, and what is its accession number?
• Has the information been published?
Calculate the dynamic programming matrices and the optimal local and global alignments for the DNA sequences
a: GAATTC and b: GATTA,
scoring +2 for a match,
-1 for a mismatch,
and using a linear gap penalty function W(L) = -2L
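A hand-computed matrix can be checked against a short script. The following is a minimal sketch, assuming the standard Needleman-Wunsch recurrence for the global case and the Smith-Waterman recurrence (cells floored at zero) for the local case. The function name `align` and its parameters are illustrative and not part of the assignment. Ties in the traceback are broken in favour of the diagonal move, so other alignments with the same optimal score may exist.

```python
def align(a, b, match=2, mismatch=-1, gap=-2, local=False):
    """Return (score, aligned_a, aligned_b, matrix) for sequences a and b.

    Uses a linear gap penalty W(L) = gap * L (gap is negative).
    local=False: global alignment; local=True: local alignment.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]

    # Global alignment penalises leading gaps; local alignment does not.
    if not local:
        for i in range(1, rows):
            H[i][0] = i * gap
        for j in range(1, cols):
            H[0][j] = j * gap

    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(H[i - 1][j - 1] + s,   # align a[i-1] with b[j-1]
                          H[i - 1][j] + gap,     # gap in b
                          H[i][j - 1] + gap)     # gap in a
            if local:
                H[i][j] = max(H[i][j], 0)
                if H[i][j] > best:
                    best, best_pos = H[i][j], (i, j)

    # Traceback starts at the bottom-right cell (global) or the best cell (local).
    i, j = best_pos if local else (len(a), len(b))
    score = H[i][j]
    out_a, out_b = [], []
    while i > 0 or j > 0:
        if local and H[i][j] == 0:
            break
        diag_ok = False
        if i > 0 and j > 0:
            s = match if a[i - 1] == b[j - 1] else mismatch
            diag_ok = H[i][j] == H[i - 1][j - 1] + s
        if diag_ok:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return score, "".join(reversed(out_a)), "".join(reversed(out_b)), H


if __name__ == "__main__":
    for mode in (False, True):
        score, top, bottom, matrix = align("GAATTC", "GATTA", local=mode)
        print("local" if mode else "global", "score:", score)
        print(top)
        print(bottom)
        for row in matrix:
            print(row)
```

Rows of the printed matrix correspond to prefixes of a and columns to prefixes of b, which is only one possible layout; a handwritten matrix may be transposed.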
The PAM matrices are considered nonreciprocal, meaning that the probability of changing an amino acid such as alanine to arginine is not equal to the probability of changing an arginine to an alanine. Why?
Retrieve the following information for the given mouse genes (PGK1, GAPDH, alpha-globin, insulin): Gene ID, number of exons and introns, CDS length and intron length, Protein ID, and amino acid sequence length. Present all the information in a tabular format. Sequences should be retrieved in both GenBank and FASTA formats.
For a given gene sequence, how do we find the 5' transcription start site? What is the percent similarity to the consensus initiator sequence responsible for transcription initiation? How do we identify and mark the binding site for TFIIB?
CS444: BIOINFORMATICS (Assignment 1 - Lab)
(To be made handwritten)
The following transcript was found to be abundant in a human patient’s blood sample.
>Example
ACTCTTCTGGTCCCCACAGACTCAGAGAGAACCCACCATGGTGCTGTCTCCTGCCGACAAGACCAACGTC
AAGGCCGCCTGGGGTAAGGTCGGCGCGCACGCTGGCGAGTATGGTGCGGAGGCCCTGGAGAGGATGTTCC
TGTCCTTCCCCACCACCAAGACCTACTTCCCGCACTTCGACCTGAGCCACGGCTCTGCCCAGGTTAAGGG
CCACGGCAAGAAGGTGGCCGACGCGCTGACCAACGCCGTGGCGCACGTGGACGACATGCCCAACGCGCTG
TCCGCCCTGAGCGACCTGCACGCGCACAAGCTTCGGGTGGACCCGGTCAACTTCAAGCTCCTAAGCCACT
GCCTGCTGGTGACCCTGGCCGCCCACCTCCCCGCCGAGTTCACCCCTGCGGTGCACGCCTCCCTGGACAA
GTTCCTGGCTTCTGTGAGCACCGTGCTGACCTCCAAATACCGTTAAGCTGGAGCCTCGGTGGCCATGCTT
CTTGCCCCTTGGGCCTCCCCCCAGCCCCTCCTCCCCTTCCTGCACCCGTACCCCCGTGGTCTTTGAATAA
AGTCTGAGTGGGCGGCA
Q1:
Which BLAST program should we use in this case?
Sol:
Q2:
What are the names and accession numbers of the top ten hits from your BLAST search?
Sol:
Q3:
What are the percent identities for the top five hits?
Sol:
Q4:
How many identical and non-identical nucleotides are there in your top hit compared to your last reported hit?
Sol:
Q5:
What is the “Official Symbol” and “Official Full Name” for this gene?
Sol:
Q6:
What is the “Lineage” for this gene?
Sol:
Q7:
What chromosome is this gene located on?
Sol:
Q8:
How many exons are annotated for this gene?
Sol:
Q9:
What is the function of the encoded protein?
Sol:
Q10:
Does the protein have a role in human disease(s)? If so, what diseases?
Sol:
CS444: BIOINFORMATICS (Assignment 1)
Q1: What is the complement to the DNA sequence given below?
5’-ACCAAACAAAGTTGGGTAAGGATAGATCAATCAATGATCATATTCTAGTACACTTAGGATTCAAGATCCT
ATTATCAGGGACAAGAGCAGGATTAGGGATATCCGAGATGGCCACACTTTTGAGGAGCTTAGCATTGTTC
AAAAGAAACAAGGACAAACCACCCATTACATCAGGATCCGGTGGAGCCATCAGAGGAATCAAACACATTA
TTATAGTACCAATTCCTGGAGATTCCTCAATTACCACTCGATCCAGACTACTGGACCGGTTGGTCAGGTT
AATTGGAAACCCGGATGTGAGCGGGCCCAAACTAACAGGGGCACTAATAGGTATATTATCCTTATTTGTG
GAGTCTCCAGGTCAATTGATTCAGAGGATCACCGATGACCCTGACGTTAGCATCAGGCTGTTAGAGGTTG
TTCAGAGTGACCAGTCACAATCTGGCCTTACCTTCGCATCAAGAGGTACCAACATGGAGGATGAGGCGGA
CCAATACTTTTCACATGATGATCCAAGCAGTAGTGATCAATCCAGGTCCGGATGGTTCGAGAACAAGGAA
ATCTCAGATATTGAAGTGCAAGACCCTGAGGGATTCAACATGATTCTGGGTACCATTCTAGCCCAGATCT
GGGTCTTGCTCGCAAAGGCGGTTACGGCCCCAGACACGGCAGCTGATTCGGAGCTAAGAAGGTGGATAAA
GTACACCCAACAAAGAAGGGTAGTTGGTGAATTTAGATTGGAGAGAAAATGGTTGGATGTGGTGAGGAAC
AGGATTGCCGAGGACCTCTCTTTACGCCGATTCATGGTGGCTCTAATCCTGGATATCAAGAGGACACCCG
GGAACAAACCTAGGATTGCTGAAATGATATGTGACATTGATACATATATCGTAGAGGCAGGATTAGCCAG
TTTTATCCTGACTATTAAGTTTGGGATAGAAACTATGTATCCTGCTCTTGGACTGCATGAATTTGCTGGT
GAGTTATCCACACTTGAGTCCTTGATGAATCTTTACCAGCAAATGGGAGAAACTGCACCCTACATGGTAA-3’
Q2: What is the mRNA sequence of the given DNA sequence in Q1?
Q3: What is the protein sequence formed from the mRNA sequence of Q2?
Q4: What will the encoded mRNA sequence be if every “AT” in the Q1 DNA sequence is mutated to “TA”?
Q5: What protein sequence is produced from the new mRNA sequence of Q4?
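The steps in Q1 to Q5 can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not a prescribed method:

- The given strand is treated as the coding (sense) strand, so the mRNA is the same sequence with T replaced by U.
- Translation starts at the first AUG and stops at the first in-frame stop codon, using the standard genetic code.
- The "AT" to "TA" mutation is applied to non-overlapping occurrences, scanning left to right.

All function names are hypothetical, and the short `demo` sequence is a made-up stand-in for the Q1 sequence. The real sequence would need its 5’- and -3’ labels and line breaks removed first.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order; '*' marks a stop.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def complement(dna):
    """Base-by-base complement (not reversed), for uppercase A/C/G/T input."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))


def transcribe(coding_dna):
    """mRNA for a coding-strand DNA sequence: same sequence, T replaced by U."""
    return coding_dna.replace("T", "U")


def translate(mrna):
    """Translate from the first AUG up to the first in-frame stop codon."""
    dna = mrna.replace("U", "T")
    start = dna.find("ATG")
    if start == -1:
        return ""  # no start codon found
    protein = []
    for i in range(start, len(dna) - 2, 3):
        amino_acid = CODON_TABLE[dna[i:i + 3]]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


def mutate_at_to_ta(dna):
    """Replace every non-overlapping 'AT' with 'TA', scanning left to right."""
    return dna.replace("AT", "TA")


if __name__ == "__main__":
    demo = "CCATGGCATTTTAAGG"  # hypothetical toy sequence, not the Q1 sequence
    print(complement(demo))
    print(transcribe(demo))
    print(translate(transcribe(demo)))
    mutated = mutate_at_to_ta(demo)
    print(transcribe(mutated))
    print(translate(transcribe(mutated)))
```

If the given strand is instead taken to be the template strand, the mRNA would be the reverse complement with U in place of T, so the assumption should be stated in the handwritten answer.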