
*What would evolution do? Using AI to study codon usage patterns*

Prof. Rachel Kolodny (Department of Computer Science, University of Haifa)

Evolution selects one codon sequence to encode each protein from among exponentially many alternatives. Codon usage is not uniform; rather, it is under selection, e.g., to improve translation and co-translational folding, and it differs among organisms, even between homologous proteins. We study this selection by developing a transformer-based deep network to predict the evolutionarily selected codons in four organisms: the eukaryotes *S. cerevisiae* and *S. pombe*, and the bacteria *E. coli* and *B. subtilis*. We test our predictions on a large and varied set of proteins from the four organisms, which does not include close homologues of the training-set proteins. The network is not only trained on all four organisms, but is also trained to mimic codon usage from the codons of a homologous protein.

Our network predicts the evolutionarily selected codons better than the frequency-based baseline in many cases. This shows that there are learnable correlation patterns among codons, and that these can be exploited to improve performance. The improvement is most significant for highly expressed proteins in all four organisms. The improvement also increases for longer proteins in *S. cerevisiae* and *B. subtilis*, and it is more pronounced for specific molecular functions. We also find that it is hard to exploit the codons of a homologous protein to further improve prediction accuracy, especially when sequence identity is not very high. Interestingly, it is easier to learn to predict the codons accurately in the two bacterial organisms. Studying codon usage patterns with contemporary AI methods offers a new perspective on the forces that shape codon usage, and it can be used to optimize a foreign codon sequence with the goal of maximizing protein expression.
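For readers unfamiliar with the comparison, a frequency-based baseline can be sketched as follows. The abstract does not define the baseline, so this is only one plausible reading: for each amino acid, always predict the codon that is most frequent in the organism's training sequences. The function names, the toy sequences, and the small excerpt of the genetic code are illustrative assumptions, not the authors' implementation.

```python
from collections import Counter, defaultdict

# Small excerpt of the standard genetic code (codon -> amino acid),
# kept short for illustration; a real analysis would use the full table.
CODON_TO_AA = {
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "AAA": "K", "AAG": "K",
    "GAA": "E", "GAG": "E",
    "ATG": "M",
}


def split_codons(cds):
    """Split a coding sequence into consecutive 3-letter codons."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def fit_baseline(training_cds):
    """Assumed baseline: the most frequent codon per amino acid."""
    counts = defaultdict(Counter)
    for cds in training_cds:
        for codon in split_codons(cds):
            aa = CODON_TO_AA.get(codon)
            if aa is not None:  # skip codons outside the excerpt
                counts[aa][codon] += 1
    return {aa: c.most_common(1)[0][0] for aa, c in counts.items()}


def baseline_accuracy(baseline, cds):
    """Fraction of codons in `cds` that the baseline predicts correctly."""
    codons = [c for c in split_codons(cds) if c in CODON_TO_AA]
    if not codons:
        return 0.0
    hits = sum(baseline.get(CODON_TO_AA[c]) == c for c in codons)
    return hits / len(codons)


# Hypothetical toy data: three training sequences and one held-out sequence.
train = ["ATGGCTAAAGAA", "ATGGCTAAGGAA", "ATGGCCAAAGAG"]
baseline = fit_baseline(train)
held_out = "ATGGCTAAAGAG"
print(baseline_accuracy(baseline, held_out))  # 0.75
```

A baseline of this kind treats every position independently, so any accuracy a network gains over it has to come from context, such as correlations among neighboring codons.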

Joint work with Tomer Sidi (Haifa) and Tamir Tuller (TAU).
