Central Dogma and the Process of Protein Synthesis
The term “central dogma” was introduced by Francis Crick in 1958. [“central dogma” (n.d.)] After the structure of DNA was discovered, scientists became interested in the function of the molecule. The central dogma describes how genetic information is transferred and then converted into protein.
The central dogma is defined as a theory in genetics and molecular biology, subject to several exceptions, that genetic information is coded in self-replicating DNA and undergoes unidirectional transfer to messenger RNA in transcription, which acts as a template for protein synthesis in translation. [“central dogma” (n.d.)]
According to this definition, three stages are involved in the flow of information from DNA to RNA to protein. The steps are:
Replication: Replication is the first step, in which the DNA double helix unzips and each strand serves as a template for a new strand of DNA.
Transcription: Transcription is the second step, in which the information coded in DNA is copied (transcribed) into mRNA.
Translation: Translation is the third step, in which the information carried by the mRNA is deciphered (translated) into a sequence of amino acids. This process occurs on the ribosome.
DNA plays the major role in passing genetic information from one generation to the next.
RNA transfers the genetic information and reads the code through three kinds of molecule: messenger RNA, transfer RNA and ribosomal RNA.
Proteins are formed at the end of the process. They are then folded and carry out various functions in the body.
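The three steps described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function names (replicate, transcribe, translate) and the short DNA string are made up for the example, and the codon table is the standard genetic code written in the usual T, C, A, G ordering. Real transcription and translation involve promoters, start-codon selection and other details that are ignored here.

```python
from itertools import product

# Standard genetic code, codons enumerated in T, C, A, G order ("*" marks a stop codon).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def replicate(strand):
    """Replication: build the complementary strand (returned 5' to 3')."""
    return "".join(COMPLEMENT[base] for base in reversed(strand))


def transcribe(coding_strand):
    """Transcription: the mRNA matches the coding strand with U in place of T."""
    return coding_strand.replace("T", "U")


def translate(mrna):
    """Translation: read codons from position 0 until a stop codon is reached."""
    protein = []
    for i in range(0, len(mrna) - 2, 3):
        codon = mrna[i:i + 3].replace("U", "T")
        amino_acid = CODON_TABLE[codon]
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


# Hypothetical toy sequence, not taken from any database.
dna = "ATGGCCTAA"
mrna = transcribe(dna)
protein = translate(mrna)
```

For the toy sequence, the mRNA is AUGGCCUAA and the translated peptide is MA (methionine, alanine), with UAA acting as the stop codon.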
Biological databases help in storing, organizing and analyzing biological data. Biological data are present in the form of sequences or structures of proteins and nucleic acids.
Biological databases have two main functions: first, they let scientists retrieve biological information; second, they make that information available in computer-readable form. There are two types of database: sequence databases and structure databases. Sequence databases hold both nucleic acid and protein sequences, whereas structure databases hold data mainly about proteins.
A primary database provides information about sequences or structures.
A secondary database retrieves information from primary databases and makes it publicly available.
A composite database combines a variety of different primary database sources, which reduces the need to search multiple resources. [“Introduction to Biological Databases” (n.d.)]
Primary sequence databases are the most commonly used databases for scientific studies of any organism. Primary databases are populated with experimentally derived data, such as protein sequences, nucleotide sequences or macromolecular structures.
The three most widely used primary sequence databases are:
- DDBJ (DNA Data Bank of Japan): It is the key nucleotide sequence data bank in Asia and is officially certified to collect nucleotide sequences from researchers. It issues internationally recognized accession numbers to data submitters.
- EMBL (European Molecular Biology Laboratory): It is one of the most important resources for bioinformatics researchers. It provides services and develops and maintains various freely available scientific databases.
- GenBank (National Center for Biotechnology Information): This is a nucleotide sequence database. It is an annotated collection of publicly available DNA sequences and provides up-to-date data for the scientific community. GenBank is part of the INSDC (International Nucleotide Sequence Database Collaboration), whose members, GenBank, the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ), exchange data with one another.
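Records in these nucleotide databases are usually retrieved by accession number. As a minimal sketch, the helper below builds a request URL for the NCBI E-utilities efetch service, which can return a GenBank record in FASTA format. The function name efetch_fasta_url and the placeholder accession are assumptions made for illustration. The snippet only constructs the URL and does not contact NCBI.

```python
from urllib.parse import urlencode

# Base address of the NCBI E-utilities efetch service.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def efetch_fasta_url(accession, db="nuccore"):
    """Build an efetch URL that asks for one record as plain-text FASTA.

    `accession` is whatever identifier the submitter was issued;
    `db` is "nuccore" for nucleotide records or "protein" for proteins.
    """
    params = {
        "db": db,
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    return f"{EFETCH_BASE}?{urlencode(params)}"


# "EXAMPLE_ACCESSION" is a hypothetical placeholder, not a real accession number.
url = efetch_fasta_url("EXAMPLE_ACCESSION")
```

The resulting URL can be opened in a browser or passed to any HTTP client. Because the INSDC partners exchange data, the same accession should resolve at DDBJ and ENA as well, through their own interfaces.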
- c) List 3 protein structure databases with their respective URLs.
1 PDB, Protein Data Bank: It is a protein structure database that curates and annotates PDB data and makes it publicly available. It provides information about the 3D shapes of proteins, nucleic acids and other complex structures. It helps researchers and others explore the protein world, which in turn helps in understanding aspects of biomedicine and agriculture and the effects of proteins on health and disease.
RCSB PDB gets funds from the NIH, the National Science Foundation (DBI-1338415), and the Department of Energy.
2 Pfam: It is a large collection of protein domains and families, each represented by multiple sequence alignments and hidden Markov models (HMMs). It helps identify the domains of a protein. Domains are the functional regions of proteins and provide insight into their function.
Pfam provides full alignment information drawn from a variety of databases. Its data are based on UniProt reference proteomes. It helps in analyzing protein sequences and viewing their alignments and annotations along with groups of related entries. Pfam can be queried by accession or ID.
Pfam is freely available and accessible. It introduced the concept of a clan, which is a collection of related Pfam entries. Clans draw on information from several sources, such as primary literature, profile-profile comparisons, known structures, and other databases such as SCOP.
3 SCOP: SCOP, the Structural Classification of Proteins, provides a detailed and comprehensive description of the relationships between known protein structures. It also describes the evolutionary relationships between closely and distantly related proteins. Pairwise sequence comparison methods are used to find homologs of known structure by matching the sequence of a protein of unknown structure against the sequences of proteins with known structures, even when the similarity is low. SCOP therefore provides information on proteins of both known and unknown structure.
SCOP classifies proteins at four levels: family, superfamily, common fold and class.
SCOP has since been updated (updated 2018-04-08, stable release March 2018). It is now maintained as SCOPe (Structural Classification of Proteins, extended).
KEGG GENOME Database: It is a collection of KEGG organisms, which are organisms with complete genome sequences, together with selected viruses relevant to disease.
It also supports taxonomic classification, which allows search and analysis of specific organism groups. KEGG supports the hierarchical study of organism genomes and can be used to compare KEGG organisms with NCBI references.
KEGG GENOME is supplemented by MGENOME, which is a collection of metagenome sequences.
GOLD: The Genomes Online Database provides extensive information about genome and metagenome sequencing projects.
It collects genomes from sequencing projects around the world, together with their associated metadata.
MGI (Mouse Genome Informatics): MGI is an internationally available database.
It is the resource for the laboratory mouse, providing integrated genetic, genomic, and biological data. These data support the study of human health and disease.
Myosin is a motor protein, which plays a key role in muscle contraction.
It has a globular head region and a ribbon-like tail region. The head region has ATPase activity and interacts with actin filaments during muscle contraction.
Myosin VI is one type of myosin protein and is also known as MYO6. This molecular motor protein is involved in intracellular vesicle and organelle transport. It maintains the structural integrity of the hair cells of the inner ear. Mutations in myosin VI cause non-syndromic autosomal dominant hearing loss. [“myosin VI [Homo sapiens (human)]” (2018)]
- What are the major pathways involved and how many exons are present?
Major pathways involving the Myosin VI gene:
- Gap junction degradation, organism-specific biosystem(from REACTOME)
- Gap junction trafficking, organism-specific biosystem(from REACTOME)
- Gap junction trafficking and regulation, organism-specific biosystem(from REACTOME)
- Glutamate Binding, Activation of AMPA Receptors and Synaptic Plasticity, organism-specific biosystem(from REACTOME)
- Membrane Trafficking, organism-specific biosystem(from REACTOME)
- Neuronal System, organism-specific biosystem(from REACTOME)
- Neurotransmitter Receptor Binding And Downstream Transmission In The Postsynaptic Cell, organism-specific biosystem(from REACTOME)
- Stabilization and expansion of the E-cadherin adherens junction, organism-specific biosystem(from Pathway Interaction Database)
- Trafficking of AMPA receptors, organism-specific biosystem(from REACTOME)
- Transmission across Chemical Synapses, organism-specific biosystem(from REACTOME)
- Vesicle-mediated transport, organism-specific biosystem (from REACTOME)
[“myosin VI [Homo sapiens (human)]” (2018), https://www.ncbi.nlm.nih.gov/gene/4646]
- Analyze the report given in the results page and get the following information:
- Template used in building the model and sequence identity.
Template | Seq Identity | Description
5aj4.19.A | 16.85% | MITORIBOSOMAL PROTEIN MS26, MRPS2
2n11.1.A | 97.50% | Unconventional myosin-VI
2n12.1.A | 98.48% | Unconventional myosin-VI
2bki.1.A | 98.25% | UNCONVENTIONAL MYOSIN
4anj.1.A | 97.92% | UNCONVENTIONAL MYOSIN-VI, GREEN FLUORESCENT PROTEIN
2kia.1.A | 94.57% | Myosin-VI
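To make the "Seq Identity" column concrete, the sketch below shows one common way to compute percent identity from a pairwise alignment: count identical residues over the columns where neither sequence has a gap. This is an assumption about the definition; the exact formula used in the SWISS-MODEL report may differ. The function name and the two short aligned strings are hypothetical. The template identities are the values reported in the table above.

```python
def percent_identity(aligned_target, aligned_template):
    """Percent of identical residues over aligned (non-gap) columns."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    aligned_columns = 0
    matches = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == "-" or b == "-":
            continue  # skip columns where either sequence has a gap
        aligned_columns += 1
        if a == b:
            matches += 1
    if aligned_columns == 0:
        return 0.0
    return 100.0 * matches / aligned_columns


# Hypothetical toy alignment: 6 identical residues over 7 aligned columns.
identity = percent_identity("MKT-AYIA", "MKTQAYLA")

# Sequence identities (%) as reported for the templates in the results page.
TEMPLATE_IDENTITY = {
    "5aj4.19.A": 16.85,
    "2n11.1.A": 97.50,
    "2n12.1.A": 98.48,
    "2bki.1.A": 98.25,
    "4anj.1.A": 97.92,
    "2kia.1.A": 94.57,
}

# Template with the highest reported identity to the target.
best_template = max(TEMPLATE_IDENTITY, key=TEMPLATE_IDENTITY.get)
```

Sequence identity is only one criterion for choosing a template. Coverage of the target sequence and model quality estimates also matter, so the highest-identity template is not necessarily the one used for the final model.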
Provide the sequence alignment of the target and the template. (Check the model report provided in the results page).
We selected ‘protein blast’, pasted the protein sequence saved from Q2b, and then executed the search against the ‘swissprot’ database, excluding Homo sapiens.
- Pasted the results obtained from report:
Protein BLAST returned results for Myosin VI. The report shows the best 100 hits, drawn as red lines. The molecule type is amino acid. The report also shows the conserved domains.
It shows alignments with different organisms such as Pan troglodytes, Homo sapiens, Pongo abelii, Mandrillus leucophaeus, Pan paniscus and Macaca mulatta.
- Translation of the scientific names of five matching organisms into common names:
1 Pan troglodytes: chimpanzee
2 Pongo abelii: Sumatran orangutan
3 Mandrillus leucophaeus: drill
4 Pan paniscus: bonobo (pygmy chimpanzee)
5 Macaca mulatta: rhesus monkey
BLAST result page for Gallus gallus (chicken) and its corresponding protein.
For this, we took the protein (unconventional myosin-VI) from Gallus gallus (taxid: 9031),
then obtained its sequence in FASTA format.
Save the file.
Paste the sequence into SWISS-MODEL and build a model.
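Before pasting a saved FASTA file into a modelling server, it can help to check that the file parses cleanly. The sketch below reads FASTA text into a dictionary of header to sequence. The function name parse_fasta and the short example record are hypothetical. The real file would contain the full Gallus gallus myosin-VI sequence.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into {header: sequence}.

    A header line starts with ">"; the lines that follow, up to the next
    header, are joined into one sequence string.
    """
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            records[header].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


# Hypothetical two-line record; the residues are made up for illustration.
example = """>example_protein hypothetical record
MKTAYIAK
QRQISFVK
"""
sequences = parse_fasta(example)
```

For the example text, sequences maps "example_protein hypothetical record" to the 16-residue string MKTAYIAKQRQISFVK, which is the form that would be pasted into the sequence box.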
Comparing the human protein structure with the Gallus gallus structure:
We generated two SWISS-MODEL structures for the Myosin VI protein, one for human and the other for Gallus gallus.
The two structures are similar but not identical. For human we obtained 6 SWISS-MODEL models. For Gallus gallus we obtained 9.
Only two of the models contained a ligand.
The structures we compared have no ligands.
The human model shows a sequence identity of 98.25%.
The Gallus gallus model shows a sequence identity of 93.36%.
References:
“Diagram for Central Dogma” (n.d.). https://www.coursepics.com/lesson/central-dogma-theory/
“Central dogma” (n.d.). https://www.merriam-webster.com/medical/central%20dogma
“The Elaboration of the Central Dogma” (2014). https://www.nature.com/scitable/ebooks/the-elaboration-of-the-central-dogma-16553173
“Introduction to Biological Databases” (n.d.). https://iasri.res.in/ebook/win_school_aa/notes/Biological_Databases.pdf
Robert D. F., John T., Jaina M., Penny C. C., Stephen J. S., Hans R. H., Goran C., Kristoffer F., Sean R. E., Erik S. and Alex B. (2007). “The Pfam protein families database”. Nucleic Acids Research D281–D288. doi: 10.1093/nar/gkm960. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC2238907/
Loredana L. C., Bart A., Tim J. P. H., Steven E. B., Alexey G. M. and Cyrus C. (2000). “SCOP: a Structural Classification of Proteins database”. Nucleic Acids Research 257–259. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC102479/
“DATABASES” (n.d.). https://www.bioinformaticsworld.com/database.htm
“The Structures of Life” (n.d.). https://publications.nigms.nih.gov/structlife/glossary.html#structuralgenomics
“Myosin and diagram” (n.d.). https://www.ncbi.nlm.nih.gov/books/NBK21607/def-item/A7672/
“myosin VI [Homo sapiens (human)]” (2018). https://www.ncbi.nlm.nih.gov/gene/4646