Insilico Codon Bias Correction for Transgenic Biological Protein Sequences for Vaccine Production
Codon optimization is used primarily to enhance protein expression levels in a host species. Each species has its own codon usage bias, which reflects how frequently each codon occurs in that species. Using the host's usage profile helps tailor the synthesis of DNA vaccines so that the resulting vectors are highly active in the host cells. To optimize protein expression in a particular host, the coding sequence must be corrected for codon frequency bias so that it matches the host's codon landscape rather than the profile of the donor organism. In this work, we applied two approaches to optimizing codon usage in protein-coding sequences. The first adopts a substitution-based method that replaces less frequent codons with comparatively higher-frequency codons from the specific codon usage bias tables. The second finds and replaces the maximal exact protein matches between the unoptimized sequence and the host proteome. We evaluated our work by optimizing the HA gene of the avian influenza H1N1 virus to maximize protein expression before synthesizing the DNA vaccine. Our method produced optimized sequences with 17% higher GC content, which is closer to eukaryotic sequence profiles than to viral usage profiles, allowing for better expression in avian host cells. © 2020 IEEE.
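
The substitution-based approach and the GC-content measure mentioned above can be illustrated with a minimal Python sketch. It is not the implementation used in this work: the table `HOST_USAGE`, its frequency values, and the function names `optimize` and `gc_content` are hypothetical, the table covers only four amino acids, and the sketch simply picks the single most frequent synonymous codon, which is a simplification of the substitution rule described here.

```python
# Hypothetical host codon usage: relative frequency of each synonymous codon.
# The values are illustrative only and are not taken from a real usage table.
HOST_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.40, "AAG": 0.60},
    "E": {"GAA": 0.45, "GAG": 0.55},
    "L": {"TTA": 0.10, "CTC": 0.40, "CTG": 0.50},
}

# Reverse lookup from codon to the amino acid it encodes.
CODON_TO_AA = {
    codon: aa for aa, codons in HOST_USAGE.items() for codon in codons
}


def optimize(seq):
    """Replace each known codon with the host's most frequent synonym."""
    seq = seq.upper()
    end = len(seq) - len(seq) % 3  # last index covered by complete codons
    out = []
    for i in range(0, end, 3):
        codon = seq[i:i + 3]
        aa = CODON_TO_AA.get(codon)
        if aa is None:
            out.append(codon)  # codon not in the toy table: keep unchanged
        else:
            usage = HOST_USAGE[aa]
            out.append(max(usage, key=usage.get))
    out.append(seq[end:])  # keep any incomplete trailing codon as-is
    return "".join(out)


def gc_content(seq):
    """Fraction of bases in seq that are G or C (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


original = "ATGAAATTAGAA"  # encodes M-K-L-E with low-frequency codons
optimized = optimize(original)
print(optimized, gc_content(original), gc_content(optimized))
```

Because every substitution is synonymous, the encoded protein is unchanged; only the codon choice, and with it the GC content, shifts toward the assumed host profile.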