EternaBot is an RNA secondary structure design algorithm created by players of the Eterna project.

Eterna players have designed and experimentally tested over 700 sequences. Based on the experimental results, they have proposed a set of rules for robust RNA design.

EternaBot is built to design sequences according to those rules. It encodes each design rule as a scoring function, then searches for a sequence that maximizes the combination of the scoring functions.
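The idea of maximizing a combination of scoring functions can be sketched as follows. This is a minimal illustration, not EternaBot's actual search: the weighted sum, the single-base hill climbing, and the two toy rules (`g_content_rule`, `u_content_rule`) are all assumptions made for the example.

```python
import random

BASES = "ACGU"

def g_content_rule(sequence):
    # Hypothetical toy rule: prefer 25% G content.
    return 100.0 - abs(sequence.count("G") / len(sequence) - 0.25) * 100.0

def u_content_rule(sequence):
    # Hypothetical toy rule: prefer 25% U content.
    return 100.0 - abs(sequence.count("U") / len(sequence) - 0.25) * 100.0

def combined_score(sequence, scoring_functions, weights):
    # Assumption: the rules are combined as a weighted sum.
    return sum(w * f(sequence) for f, w in zip(scoring_functions, weights))

def hill_climb(start, scoring_functions, weights, steps=1000, seed=0):
    # Mutate one random position at a time and keep only improvements.
    rng = random.Random(seed)
    best = start
    best_score = combined_score(best, scoring_functions, weights)
    for _ in range(steps):
        pos = rng.randrange(len(best))
        candidate = best[:pos] + rng.choice(BASES) + best[pos + 1:]
        score = combined_score(candidate, scoring_functions, weights)
        if score > best_score:
            best, best_score = candidate, score
    return best, best_score

rules = [g_content_rule, u_content_rule]
weights = [0.6, 0.4]
start_sequence = "A" * 20
best_sequence, best_score = hill_climb(start_sequence, rules, weights)
```

A real design run would also have to check that the candidate sequence folds into the target structure, which this sketch leaves out.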

Please contact Joseph Yesselman (jyesselm at unl dot edu) for technical support and bug reports.

Rules were selected with least angle regression, and their weights were determined by linear regression. For each rule, the description quotes the design rule as originally stated by the participant, and the scoring function gives the pseudocode of the function coded from that rule. In the scoring functions, the "Number [Number]" notation gives the originally proposed parameter followed by the optimized parameter in brackets. Parameters were optimized with the downhill simplex algorithm provided by scipy.optimize.fmin, minimizing the average squared error between the predicted and actual structure mapping for the training set.
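As a rough illustration of the parameter optimization step, the sketch below fits the two parameters of a one-term rule with `scipy.optimize.fmin`, starting from the proposed values. The training pairs in `TRAINING` are made up for the example and are not Eterna data; the function names are hypothetical.

```python
from scipy.optimize import fmin

# Made-up training set: (UA pair fraction, measured score). Illustrative only.
TRAINING = [(0.30, 88.0), (0.42, 99.0), (0.55, 87.0), (0.70, 73.0)]

def rule_score(ua_fraction, params):
    # One-term rule: penalize distance from a target UA fraction.
    target, penalty = params
    return 100.0 - abs(ua_fraction - target) * penalty

def mean_squared_error(params):
    # Average squared error between predicted and measured scores.
    errors = [(rule_score(x, params) - y) ** 2 for x, y in TRAINING]
    return sum(errors) / len(errors)

proposed = [0.50, 100.0]  # the "Number" part of the notation
# Downhill simplex (Nelder-Mead) search started from the proposed values.
optimized = fmin(mean_squared_error, proposed, disp=False)  # the "[Number]" part
```

Because the simplex search starts from the proposed parameters and keeps its best vertex, the optimized parameters never fit the training set worse than the proposed ones.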

“ Let's try out the Strategy Market feature with some simple criteria...

50% of pairs are UA

Free energy = –1.5 * number of pairs [e.g. 48 kcal if there are 32 pairs]

Melting point between 77 and 97°C „

\( \begin{array}{l} score\, =\, 100\\ score\, =\, score\, -\, abs(\frac{(\text{number of UA pairs})}{(\text{total number of pairs})}\, -\, 0.50 [0.42]) * 100.00 [93.04]\\ score\, =\, score\, -\, abs(-1.50 [-1.87] * (\text{total number of pairs})\, -\, (\text{free energy})) * 1.00 [1.15]\\ score\, =\, score\, -\, max (\\ \qquad 77.00 [63.60]\, -\, (\text{melting point}),\\ \qquad (\text{melting point})\, -\, 97.00 [102.00],\\ \qquad 0\\ ) * 1.00 [0.94]\\ \end{array} \)

Weight: 0.092
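For readers who prefer code to pseudocode, this scoring function can be written as a small Python function. This is a sketch only: the function and parameter names are hypothetical, the defaults are the bracketed optimized values, and the free energy and melting point are assumed to be computed elsewhere by a folding package.

```python
def ua_energy_melting_score(num_ua_pairs, total_pairs, free_energy, melting_point,
                            ua_target=0.42, ua_weight=93.04,
                            energy_per_pair=-1.87, energy_weight=1.15,
                            melt_low=63.60, melt_high=102.00, melt_weight=0.94):
    score = 100.0
    # Penalize deviation of the UA pair fraction from its target.
    score -= abs(num_ua_pairs / total_pairs - ua_target) * ua_weight
    # Penalize deviation of the free energy from a per-pair expectation.
    score -= abs(energy_per_pair * total_pairs - free_energy) * energy_weight
    # Penalize melting points outside the [melt_low, melt_high] window.
    score -= max(melt_low - melting_point, melting_point - melt_high, 0.0) * melt_weight
    return score

# The originally proposed parameters can be passed explicitly.
original = dict(ua_target=0.50, ua_weight=100.00,
                energy_per_pair=-1.50, energy_weight=1.00,
                melt_low=77.00, melt_high=97.00, melt_weight=1.00)
example_score = ua_energy_melting_score(16, 32, -48.0, 80.0, **original)
```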

Original Proposal“ Melting Point between 97 and 107

Free Energy between –30 and –60

G bases of 22%

U bases of 13%

C bases of 20% „

\( \begin{array}{l} score\, =\, 100\\ score\, =\, score\, -\, abs((\text{number of Gs})/(\text{sequence length})\, -\, 0.22 [0.15])*100.00 [116.23]\\ score\, =\, score\, -\, abs((\text{number of Us})/(\text{sequence length})\, -\, 0.13 [0.07])*100.00 [129.23]\\ score\, =\, score\, -\, abs((\text{number of Cs})/(\text{sequence length})\, -\, 0.20 [0.19])*100.00 [117.62]\\ length\_weight\, =\, \frac{2.5}{\sqrt{2\pi}}e^{\frac{-((\text{sequence length})\, -\, 100)^2}{5000}}\\ \text{if} (\text{free energy}) < -60.00 [-68.34]\\ \qquad score\, =\, score\, -\, abs((\text{free energy})\, -\, -60.00 [-68.34])*1.00 [1.12]*length\_weight\\ \text{if} (\text{free energy}) > -30.00 [-30.20]\\ \qquad score\, =\, score\, -\, abs((\text{free energy})\, -\, -30.00 [-30.20])*1.00 [1.12]*length\_weight\\ \text{if} (\text{melting point}) < 97.00 [35.47]\\ \qquad score\, =\, score\, -\, abs((\text{melting point})\, -\, 97.00 [35.47])*1.00 [1.36]*length\_weight\\ \text{if} (\text{melting point}) > 107.00 [133.66]\\ \qquad score\, =\, score\, -\, abs((\text{melting point})\, -\, 107.00 [133.66])*1.00 [1.36]*length\_weight\\ \end{array} \)

Weight: 0.36
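A Python sketch of this scoring function follows. The names are hypothetical, the defaults are the bracketed optimized values, and the free energy and melting point are assumed to come from an external folding package. The sketch also assumes that the lower melting point threshold equals the value used inside its penalty term (97.00 [35.47]), by analogy with the free energy terms.

```python
import math

def length_weight(sequence_length):
    # Gaussian-shaped weight that peaks for sequences of length 100.
    return (2.5 / math.sqrt(2 * math.pi)
            * math.exp(-((sequence_length - 100) ** 2) / 5000))

def composition_score(sequence, free_energy, melting_point,
                      g_target=0.15, g_weight=116.23,
                      u_target=0.07, u_weight=129.23,
                      c_target=0.19, c_weight=117.62,
                      energy_low=-68.34, energy_high=-30.20, energy_weight=1.12,
                      melt_low=35.47, melt_high=133.66, melt_weight=1.36):
    n = len(sequence)
    score = 100.0
    # Base composition penalties.
    score -= abs(sequence.count("G") / n - g_target) * g_weight
    score -= abs(sequence.count("U") / n - u_target) * u_weight
    score -= abs(sequence.count("C") / n - c_target) * c_weight
    # Range penalties, scaled by the length weight.
    lw = length_weight(n)
    if free_energy < energy_low:
        score -= abs(free_energy - energy_low) * energy_weight * lw
    if free_energy > energy_high:
        score -= abs(free_energy - energy_high) * energy_weight * lw
    if melting_point < melt_low:
        score -= abs(melting_point - melt_low) * melt_weight * lw
    if melting_point > melt_high:
        score -= abs(melting_point - melt_high) * melt_weight * lw
    return score

# Originally proposed parameters, for comparison with the optimized defaults.
original = dict(g_target=0.22, g_weight=100.0, u_target=0.13, u_weight=100.0,
                c_target=0.20, c_weight=100.0,
                energy_low=-60.0, energy_high=-30.0, energy_weight=1.0,
                melt_low=97.0, melt_high=107.0, melt_weight=1.0)
example_sequence = "G" * 22 + "U" * 13 + "C" * 20 + "A" * 45
example_score = composition_score(example_sequence, -45.0, 100.0, **original)
```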

Original Proposal“ I make a wish for a strategy that says:

All GC-pairs in the in multiloopjunctions, have to turn in same direction. (Red nucleotide to the right and green nucleotide to the left.) Exception: the GC-pair connecting multiloop and neck,are allowed to turn in both directions, without being penalized.

I would like to give –2 point for each wrong turning GC-pair. „

\( \begin{array}{l} score\, =\, 100\\ score\, =\, score\, -\, (\text{number of GC pairs in wrong directions adjacent to multiloops except those in the first stack from 5' end}) * 1.00 [5.71]\\ score\, =\, score\, -\, (\text{number of non-GC pairs adjacent to multiloops}) * 2.00 [6.63] \end{array} \)

Weight: 0.21

Original Proposal“ plot_score = (number of white cells in the upper triangle of the pairwise probabilities plot) / (total number of cells in the upper triangle of the pairwise probabilities plot)

cap_score = ((number of GC pairs that are at the end of a stack) + 0.5 * (number of GC pairs that are 1 away from the end of a stack)) / (3 * total number of stacks)

gc_penalty = 2 if 80% or more of the design's pairs are GC pairs, 0 otherwise.

A design's total score is: (2 + plot_score + cap_score - gc_penalty) * 25

The +2 and *25 are just to make it come out to between 0 and 100. „

\( \begin{array}{l} score\, =\, plot\_score * 1.00 [0.88]\, +\, cap\_score * 1.00 [1.05]\\ score\, =\, score\, -\, (\text{gc_penalty with GC pair threshold on 0.80 [0.83]}) * 2.00 [2.10]\\ score\, =\, (score\, +\, 2) * 25 \end{array} \)

Weight: 0.12
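The sketch below shows one way this scoring function could be coded. Several details are assumptions: stacks are given as lists of two-letter base pairs, both ends of a stack count as "the end" (which makes the maximum per stack 3, matching the divisor), each pair is counted once by its distance to the nearest end, and `plot_score` is supplied by the caller because computing it requires a pairwise probabilities plot. All names are hypothetical.

```python
def is_gc(pair):
    # A pair is a two-letter string such as "GC", "CG", or "AU".
    return set(pair) == {"G", "C"}

def cap_score(stacks):
    total = 0.0
    for stack in stacks:
        n = len(stack)
        for i, pair in enumerate(stack):
            if not is_gc(pair):
                continue
            distance = min(i, n - 1 - i)  # distance to the nearest stack end
            if distance == 0:
                total += 1.0
            elif distance == 1:
                total += 0.5
    return total / (3 * len(stacks))

def plot_cap_score(plot_score, stacks,
                   plot_weight=0.88, cap_weight=1.05,
                   gc_threshold=0.83, gc_penalty=2.10):
    pairs = [pair for stack in stacks for pair in stack]
    gc_fraction = sum(1 for pair in pairs if is_gc(pair)) / len(pairs)
    score = plot_score * plot_weight + cap_score(stacks) * cap_weight
    if gc_fraction >= gc_threshold:
        score -= gc_penalty
    return (score + 2) * 25

# Originally proposed parameters.
original = dict(plot_weight=1.00, cap_weight=1.00, gc_threshold=0.80, gc_penalty=2.00)
mixed_stacks = [["GC", "AU", "AU", "CG"], ["GC", "CG", "GC", "CG"]]
all_gc_stacks = [["GC", "CG", "GC", "CG"]]
```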

Original Proposal“ I would like to ad a strategy for numbers of yellow nucleotides allowed pr. lengt of string (neckarea excluded):

If a string/arm is this number of nucleotides long, then allow this number of yellow adenine. For each yellow nucleotide below the minimum or above the maximum, penalize with –2.

String length (yellow nucleotides)

3 (1-2) String eg. the bulged cross and the asymmetry

4 (1-2)

5 (1-3)

6 (2-3)

7 (3-4)

8 (2-5)

9 (1-4)

This could be used to rule out some of the cub scouts and a few christmas threes. „

\( \begin{array}{l} score\, =\, 100\\ \text{for each stack}\\ \qquad score\, =\, score\, -\, max(\\ \qquad\qquad (\text{number of AU pairs})\, -\, (\text{upper bound on number of AU pairs}),\\ \qquad\qquad (\text{lower bound on number of AU pairs})\, -\, (\text{number of AU pairs}),\\ \qquad\qquad 0\\ \qquad ) * 2.00 [10.50] \end{array} \)

Weight: 0.22
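This scoring function can be sketched in Python as follows. The sketch assumes the lower and upper bounds are taken directly from the table in the quoted proposal, that stacks are given as lists of two-letter base pairs, and that stack lengths not listed in the table incur no penalty. The names are hypothetical, and the default penalty is the bracketed optimized value.

```python
# Stack length -> (lower bound, upper bound) on the number of AU pairs,
# copied from the table in the quoted proposal.
AU_BOUNDS = {3: (1, 2), 4: (1, 2), 5: (1, 3), 6: (2, 3),
             7: (3, 4), 8: (2, 5), 9: (1, 4)}

def is_au(pair):
    # A pair is a two-letter string such as "AU" or "UA".
    return set(pair) == {"A", "U"}

def au_count_score(stacks, penalty=10.50):
    score = 100.0
    for stack in stacks:
        bounds = AU_BOUNDS.get(len(stack))
        if bounds is None:
            continue  # assumption: unlisted stack lengths are not penalized
        lower, upper = bounds
        num_au = sum(1 for pair in stack if is_au(pair))
        # Penalize each AU pair above the upper or below the lower bound.
        score -= max(num_au - upper, lower - num_au, 0) * penalty
    return score

within_bounds = [["GC", "AU", "CG"]]
too_few_au = [["GC", "CG", "GC", "CG"]]
too_many_au = [["AU"] * 5]
```

For example, a five-pair stack made entirely of AU pairs exceeds its upper bound of 3 by two pairs, so it is penalized twice.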

Original Proposal