Genomics Analysis of the HIV Protease


Overview of the Life Cycle of HIV

The gene sequence of a protein, and the amino acid sequence it directly encodes, can give tremendous insight into the protein's structure-function relationships. Where a significant number of genetic variants of the protein exist, a careful analysis of the conservation of residues within the sequence dramatically increases the amount of information that can be obtained.

Conservation can occur at two levels: 1] exact conservation, and 2] structural/functional conservation [See amino acid structures and properties], where amino acid side chains with similar properties are conserved at a particular position in the protein sequence. The nature of such structural/functional conservation can also give important insight into the particular aspect of the structure that is being utilized. For example, can only hydrophobic amino acids substitute at a position? Must they be large hydrophobes? Can hydrophilic amino acids replace one another, or must they conserve a negative or positive charge?
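These questions can be asked of each alignment column mechanically. The following Python sketch checks which property classes every residue in a column shares. The class memberships are a simplified assumption for illustration (for example, histidine is left out of the positive-charge class because it is largely uncharged at neutral pH); replace them with the classes in the amino acid structures and properties table.

```python
# Simplified side-chain property classes (an assumption for illustration;
# substitute the classes from your amino acid properties table).
PROPERTY_CLASSES = {
    "hydrophobic": set("AVLIMFW"),
    "large hydrophobic": set("LIMFW"),
    "hydrophilic": set("STNQDEKRH"),
    "positive charge": set("KR"),   # His omitted: mostly neutral at pH 7
    "negative charge": set("DE"),
}


def shared_properties(column):
    """Return the property classes shared by every residue in one alignment column.

    Gap characters ('-') are ignored; an all-gap column shares nothing.
    """
    residues = set(column) - {"-"}
    if not residues:
        return []
    return [name for name, members in PROPERTY_CLASSES.items() if residues <= members]


# Residues observed at three hypothetical positions.
print(shared_properties("ILLM"))  # ['hydrophobic', 'large hydrophobic']
print(shared_properties("ILV"))   # ['hydrophobic']
print(shared_properties("DE"))    # ['hydrophilic', 'negative charge']
```

A column that returns an empty list (for example, glycine mixed with alanine) is one where none of these simplified properties is strictly conserved.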

The experimental data that you will analyze in this problem set illustrate many important points about HIV and its protease [a target for much current drug design]. You are given the amino acid sequences of 26 HIV protease variants in FASTA format in the appended file DATA-HIV-Protease, together with the amino acid sequence of human pepsin [a stomach enzyme involved in protein degradation]. The FASTA data include the polyprotein overlaps at either end of the protease sequence. The reference protease sequence to use, HIV1B, is shown below:

You are also given a table of the structures and important properties of the 20 amino acid side chains that occur in most proteins [See amino acid structures and properties], and the genetic codon table, which tells you which three-base codons code for which amino acids.

Step 1:

Using the sequence of the protease from the HIV 1B1 isolate as the reference, construct a table of the amino acid variation at each position in the protease sequence. To do this, align the sequences with ClustalW: cut and paste the FASTA-format data from the data file into the appropriate data box on the ClustalW site. Be sure to enter your email address, and select color alignment in the color alignment box: this will make the results of the alignment easier to view.
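If the ClustalW alignment is saved in FASTA format (with '-' marking gaps), the variation table can also be tallied programmatically. This Python sketch assumes that all aligned sequences have the same length and that the reference is identified by the first word of its FASTA header; the toy alignment in the example is hypothetical, not taken from the data file.

```python
from collections import Counter


def read_fasta(text):
    """Parse FASTA text into a {name: sequence} dict (name = first word of header)."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            records[name] = []
        elif name is not None:
            records[name].append(line)
    return {key: "".join(parts) for key, parts in records.items()}


def variation_table(aligned, reference):
    """List (position, reference residue, residue counts) for each reference position.

    Positions are numbered along the ungapped reference; columns where the
    reference has a gap (insertions in other variants) are skipped.
    """
    ref_seq = aligned[reference]
    table = []
    position = 0
    for column, ref_residue in enumerate(ref_seq):
        if ref_residue == "-":
            continue
        position += 1
        counts = Counter(seq[column] for seq in aligned.values())
        table.append((position, ref_residue, dict(counts)))
    return table


# Hypothetical three-sequence alignment for illustration only.
example = """>ref
PQIT-LW
>var1
PQVT-LW
>var2
PQITALW
"""
for position, ref_residue, counts in variation_table(read_fasta(example), "ref"):
    print(position, ref_residue, counts)
```

Numbering along the ungapped reference keeps the positions consistent with the usual residue numbering of the reference protease, even when other variants carry insertions.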

Step 2:

Determine at which positions within the protease sequence there is absolute conservation or structural/functional conservation, noting which properties (from the table of amino acid structures and properties) are conserved.
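The distinction between identical and similar columns can be sketched in Python by testing each column against groups of similar residues. The groups below are intended to match the "strong" and "weak" similarity groups used by ClustalW; verify them against the ClustalW documentation before relying on them. This sketch also treats any column containing a gap as unconserved, which may differ from ClustalW's own handling.

```python
# Residue groups intended to mirror ClustalW's strong (':') and weak ('.')
# similarity groups; verify against the ClustalW documentation.
STRONG_GROUPS = ["STA", "NEQK", "NHQK", "NDEQ", "QHRK", "MILV", "MILF", "HY", "FYW"]
WEAK_GROUPS = ["CSA", "ATV", "SAG", "STNK", "STPA", "SGND",
               "SNDEQK", "NDEQHK", "NEQHRK", "FVLIM", "HFY"]


def conservation_symbol(column):
    """'*' identical, ':' strongly similar, '.' weakly similar, ' ' otherwise."""
    residues = set(column)
    if "-" in residues:          # gapped columns are treated as unconserved here
        return " "
    if len(residues) == 1:
        return "*"
    if any(residues <= set(group) for group in STRONG_GROUPS):
        return ":"
    if any(residues <= set(group) for group in WEAK_GROUPS):
        return "."
    return " "


def conservation_line(aligned_seqs):
    """Build a Clustal-style conservation line for equal-length aligned sequences."""
    return "".join(conservation_symbol(column) for column in zip(*aligned_seqs))


# Hypothetical aligned fragments.
print(repr(conservation_line(["DTGA", "DTGV", "ESGA"])))  # '::*.'
```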

For example, in the accompanying sequence alignment for a region of the thiol protease family, residues that are completely conserved are marked by *, and residues that are essentially conserved are marked by : (strongly similar) or . (weakly similar). Several types of structural/functional characteristics are indicated by color coding: red indicates hydrophobic character, green hydrophilic character, purple positive charge, and blue negative charge.


Step 3: Remembering that the HIV protease is an aspartic protease, identify the active site residues.
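Aspartic proteases generally carry their catalytic aspartate in an Asp-Thr/Ser-Gly (D-T/S-G) sequence motif, so a simple pattern search is a useful first pass at locating candidate active-site residues. The Python sketch below reports the 1-based position of the Asp in each D-[TS]-G match after removing gaps. The example fragment is the N-terminal 31 residues of a commonly used HIV-1 subtype B protease sequence; check it against the HIV1B reference given in this problem set.

```python
import re


def find_asp_motifs(sequence):
    """Return 1-based positions of the Asp in each D-[TS]-G motif.

    Gap characters are removed first so positions refer to the ungapped sequence.
    """
    ungapped = sequence.replace("-", "")
    return [match.start() + 1 for match in re.finditer(r"D[TS]G", ungapped)]


# N-terminal fragment of an HIV-1 subtype B protease (check against your reference).
print(find_asp_motifs("PQITLWQRPLVTIKIGGQLKEALLDTGADDT"))  # [25]
```

A motif hit is only a candidate: confirming it as the catalytic residue still relies on its conservation across all variants and its location in the structure.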

Step 4: Attempt to predict a function or location for each residue in the HIV protease based on the nature of its positional conservation of function, or lack thereof.
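One way to make the degree of conservation quantitative is the Shannon entropy of each alignment column, H = -Σ p log2 p over the residue frequencies p: 0 bits for an invariant position, rising as more different residues appear. This Python sketch illustrates that measure as an aid, not as part of the original procedure; ignoring gaps is a simplifying assumption. It ranks columns from most to least conserved.

```python
import math
from collections import Counter


def column_entropy(column):
    """Shannon entropy in bits, H = -sum(p * log2 p), of one column (gaps ignored)."""
    residues = [r for r in column if r != "-"]
    if not residues:
        return 0.0
    n = len(residues)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(residues).values())


def rank_positions(aligned_seqs):
    """Return (1-based column, entropy) pairs, most conserved (lowest entropy) first."""
    scores = [(i + 1, column_entropy(col)) for i, col in enumerate(zip(*aligned_seqs))]
    return sorted(scores, key=lambda pair: pair[1])


# Hypothetical aligned fragments.
print(rank_positions(["DAL", "DVI", "DAV"]))
```

Entropy alone does not distinguish an L/I/V column from an L/D/K column; pairing it with the property classes of the residues observed is what links conservation to a likely role.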

Step 5: Download the HIV protease structure file 1A30.pdb from the Protein Data Bank and, using a graphics program such as RasMol, Prekin/Mage, or VMD, locate the various conserved residues in the structure to assess the implications of your findings.
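Beyond inspecting the structure visually, distances between residues can be measured directly from the coordinate file. The Python sketch below reads ATOM records using the fixed-column PDB layout and measures the distance between two atoms. It assumes that the two protease chains in 1A30.pdb are labelled A and B and that the catalytic aspartate is numbered 25 in both; check the chain identifiers and residue numbering in the file, because some HIV protease entries may number the second chain differently.

```python
import math


def read_atoms(pdb_text):
    """Parse ATOM records from PDB-format text using its fixed column layout."""
    atoms = []
    for line in pdb_text.splitlines():
        if not line.startswith("ATOM"):
            continue
        atoms.append({
            "name": line[12:16].strip(),      # columns 13-16
            "res_name": line[17:20].strip(),  # columns 18-20
            "chain": line[21],                # column 22
            "res_seq": int(line[22:26]),      # columns 23-26
            "xyz": (float(line[30:38]), float(line[38:46]), float(line[46:54])),
        })
    return atoms


def find_atom(atoms, chain, res_seq, name):
    """Return the first atom matching chain, residue number and atom name."""
    for atom in atoms:
        if atom["chain"] == chain and atom["res_seq"] == res_seq and atom["name"] == name:
            return atom
    raise KeyError(f"atom {name} of residue {res_seq} in chain {chain} not found")


def distance(atom_a, atom_b):
    """Euclidean distance in angstroms between two parsed atoms."""
    return math.dist(atom_a["xyz"], atom_b["xyz"])


def asp_cg_distance(pdb_path, chain_a="A", chain_b="B", res_seq=25):
    """Distance between the Asp CG atoms of res_seq in two chains (assumed numbering)."""
    with open(pdb_path) as handle:
        atoms = read_atoms(handle.read())
    return distance(find_atom(atoms, chain_a, res_seq, "CG"),
                    find_atom(atoms, chain_b, res_seq, "CG"))
```

After downloading the file, asp_cg_distance("1A30.pdb") gives the separation of the two aspartate side chains under the numbering assumption above, which can be compared with what the graphics program shows.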

Step 6:

Using this information, attempt to identify the active site residues of the human stomach protease pepsin. [Note: the human enzyme is a single polypeptide chain containing two active site aspartate residues, whereas the HIV protease is a homodimer (the two chains are identical), with each chain contributing a single active site aspartate residue.]
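Because the single pepsin chain should contain two active-site regions, one crude way to look for them is to slide a short window from around the HIV protease active-site aspartate along the pepsin sequence and keep the two best-scoring, non-overlapping placements. This ungapped identity scan is a simplification chosen for illustration; an alignment with ClustalW is more sensitive. In the example, the query resembles the HIV-1 protease segment around Asp25 and the target is an invented string; in practice the query would come from the HIV1B reference and the target would be the pepsin sequence from the data file.

```python
def window_scan(query, target):
    """Count identities for every ungapped placement of query along target.

    Returns (identities, 1-based start in target) pairs.
    """
    width = len(query)
    hits = []
    for start in range(len(target) - width + 1):
        window = target[start:start + width]
        identities = sum(q == t for q, t in zip(query, window))
        hits.append((identities, start + 1))
    return hits


def best_non_overlapping(hits, width, k=2):
    """Keep the k highest-scoring placements that do not overlap one another."""
    chosen = []
    for score, start in sorted(hits, key=lambda hit: (-hit[0], hit[1])):
        if all(abs(start - other) >= width for _, other in chosen):
            chosen.append((score, start))
        if len(chosen) == k:
            break
    return sorted(chosen, key=lambda hit: hit[1])


# Illustrative query and invented target.
query = "LLDTGADD"
target = "GGLLDTGADDGGGGGGGGLIDSGTDDGG"
print(best_non_overlapping(window_scan(query, target), len(query)))  # [(8, 3), (5, 19)]
```

Asking for two non-overlapping hits reflects the note above: a single-chain aspartic protease is expected to carry two active-site regions, where each HIV protease chain carries one.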

Step 7: Using the codon table supplied, determine which positional changes in the HIV protease could have resulted from single base changes.
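The single-base-change check can be automated. This Python sketch builds the standard genetic code in TCAG order and asks whether any codon for one amino acid can be converted into any codon for another by a single substitution. It works only at the amino acid level: without the nucleotide sequences it cannot tell which codon a given virus actually used, so a True result means a single base change is possible, not that it occurred.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code, codons enumerated TTT, TTC, TTA, TTG, TCT, ... GGG; '*' = stop.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def codons_for(amino_acid):
    """All codons that encode the given one-letter amino acid."""
    return [codon for codon, aa in CODON_TABLE.items() if aa == amino_acid]


def single_base_change(aa_from, aa_to):
    """True if some codon of aa_from differs from some codon of aa_to at exactly one base."""
    for c1 in codons_for(aa_from):
        for c2 in codons_for(aa_to):
            if sum(b1 != b2 for b1, b2 in zip(c1, c2)) == 1:
                return True
    return False


print(single_base_change("D", "E"))  # True  (e.g. GAT -> GAA)
print(single_base_change("W", "F"))  # False (TGG needs two changes)
```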

Step 8: Using the regions adjacent to the protease sequence in the alignment file, deduce what you can about the specificity of the HIV protease.
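Because the HIV protease is excised from the polyprotein, the residues flanking it in each entry span substrate cleavage sites. The Python sketch below extracts the four residues on each side of the N- and C-terminal scissile bonds (the P4 to P1 | P1' to P4' positions in Schechter-Berger notation) and tallies the residues seen at each subsite across sequences. It assumes that the protease boundaries are supplied as 1-based, inclusive positions in the ungapped sequence, read off the alignment; the example sequence is invented.

```python
from collections import Counter


def cleavage_windows(sequence, prot_start, prot_end, flank=4):
    """Return 'P4P3P2P1|P1'P2'P3'P4'' strings for the protease's two cleavage sites.

    prot_start and prot_end are 1-based, inclusive positions of the protease
    within the ungapped sequence (an assumption about how boundaries are given).
    """
    start = prot_start - 1          # 0-based index of the first protease residue
    end = prot_end                  # 0-based index just past the last residue
    n_site = sequence[max(0, start - flank):start] + "|" + sequence[start:start + flank]
    c_site = sequence[max(0, end - flank):end] + "|" + sequence[end:end + flank]
    return n_site, c_site


def subsite_counts(windows):
    """Count residues at each subsite across equal-length windows."""
    columns = zip(*(window.replace("|", "") for window in windows))
    return [Counter(column) for column in columns]


# Invented sequence: protease at positions 5-12, flanked by four residues each side.
print(cleavage_windows("KLMNPQRSTVWYEFGH", 5, 12))  # ('KLMN|PQRS', 'TVWY|EFGH')
```

Subsites where the tallies show only one residue type, or only residues of one property class, are the natural starting point for a statement about specificity.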


Using PubMed, identify residues in the HIV protease that confer drug resistance, and map these onto the three-dimensional structure of the molecule. Postulate how such mutations might affect the activity of the protease, and design experiments to test these predictions.
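Drug-resistance mutations are conventionally written as reference residue, position, new residue (for example, a change from valine to alanine at position 82 is written V82A). The Python sketch below lists the substitutions in an aligned variant relative to the reference in that notation, optionally restricted to a set of positions; the positions would come from your PubMed search, and the sequences in the example are hypothetical.

```python
def list_mutations(reference, variant, positions=None):
    """List substitutions as '<ref><position><new>' strings, e.g. 'I3V'.

    Both sequences must come from the same alignment (equal length, '-' for gaps).
    Positions are numbered along the ungapped reference. Insertions (reference
    gaps) and deletions (variant gaps) are skipped in this sketch.
    """
    mutations = []
    position = 0
    for ref_residue, var_residue in zip(reference, variant):
        if ref_residue == "-":
            continue
        position += 1
        if var_residue in ("-", ref_residue):
            continue
        if positions is None or position in positions:
            mutations.append(f"{ref_residue}{position}{var_residue}")
    return mutations


# Hypothetical aligned sequences.
print(list_mutations("PQIT-LWQ", "PQVTALWR"))       # ['I3V', 'Q7R']
print(list_mutations("PQIT-LWQ", "PQVTALWR", {7}))  # ['Q7R']
```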

Using the sequence of one of the HIV proteases, "predict" the secondary structure of the protein using the various algorithms provided at

Compare the secondary structure predictions with the "reality" of the three-dimensional structure. Using these comparisons together with the sequence alignments, what can you deduce about the roles that various types of amino acids play in secondary structure?
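The comparison can be made quantitative with the three-state accuracy, Q3: the percentage of residues whose predicted state (helix H, strand E, coil C) matches the observed one. In this Python sketch the observed states are assumed to come from a DSSP-style assignment of the structure, reduced from eight states to three by one common convention (H, G, I to helix; E, B to strand; everything else to coil). Other reductions are also in use, and the example strings are hypothetical.

```python
def reduce_dssp(dssp_states):
    """Collapse 8-state DSSP codes to H/E/C (one common convention)."""
    mapping = {"H": "H", "G": "H", "I": "H", "E": "E", "B": "E"}
    return "".join(mapping.get(state, "C") for state in dssp_states)


def q3_accuracy(predicted, observed):
    """Percentage of positions where predicted and observed 3-state codes agree."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed strings must be the same length")
    if not observed:
        raise ValueError("empty sequences")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


# Hypothetical prediction and observed (DSSP-like) assignment.
observed = reduce_dssp("HHGTTEEB")
print(observed)                           # HHHCCEEE
print(q3_accuracy("HHHCCEEC", observed))  # 87.5
```

Looking at which residues fall in the mismatched positions, rather than only at the overall percentage, is what connects the comparison back to the roles of particular amino acids.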