Mathematical Methods of Population Genetics

The mathematical methods of population genetics theory quantitatively characterize the dynamics of gene distributions in evolving populations [1-3]. There are two types of models: deterministic and stochastic. Deterministic models are based on the approximation of an infinitely large population size. In this case the fluctuations of gene frequencies can be neglected and the population dynamics can be described in terms of the mean gene frequencies. Stochastic models describe the probabilistic processes in finite-size populations. Here we review very briefly the main equations and mathematical methods of population genetics by considering the most representative examples.

Deterministic models

Let's consider a population of diploid¹⁾ organisms with several alleles²⁾ $A_1, A_2, \ldots, A_K$ in some locus³⁾. We assume that the organism fitness is determined mainly by the considered locus. Designating the number of organisms and the fitness of the gene pair $A_i A_j$ by $n_{ij}$ and $W_{ij}$, respectively, we can introduce the genotype and gene frequencies $P_{ij}$ and $P_i$, as well as the mean gene fitnesses $W_i$, in accordance with the expressions [1,2,4]:

$$P_{ij} = n_{ij}/n , \qquad P_i = \sum_j P_{ij} , \qquad W_i = P_i^{-1} \sum_j W_{ij} P_{ij} , \tag{1}$$

where $n$ is the population size and the index $i$ refers to the class of organisms $\{A_i A_j\}$, $j = 1, 2, \ldots, K$, which contain the gene $A_i$. The population is supposed to be panmictic⁴⁾: during reproduction the new gene combinations are chosen randomly throughout the whole population. For panmictic populations the Hardy-Weinberg principle can be approximately applied [1]:

$$P_{ij} = P_i P_j , \qquad i, j = 1, \ldots, K . \tag{2}$$

Eq. (2) implies that during mating the genotypes are formed in proportion to the corresponding gene frequencies.
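As a minimal sketch of definitions (1) and the Hardy-Weinberg relation (2), the Python snippet below computes the genotype frequencies, gene frequencies and mean gene fitnesses from a table of counts. The function names, the example counts and the fitness values are illustrative assumptions, not taken from the cited texts. The counts $n_{ij}$ are assumed to be stored as a symmetric table of ordered pairs (heterozygotes split evenly between $ij$ and $ji$), so that the $P_{ij}$ sum to one.

```python
def gene_statistics(counts, fitness):
    """Return (P_ij, P_i, W_i) as defined in Eq. (1).

    counts[i][j]  : n_ij, organisms carrying the ordered gene pair A_i A_j
                    (assumed symmetric: n_ij == n_ji)
    fitness[i][j] : W_ij, fitness of the gene pair A_i A_j
    Every allele is assumed to be present (P_i > 0).
    """
    k = len(counts)
    n = sum(sum(row) for row in counts)  # population size
    p_pair = [[counts[i][j] / n for j in range(k)] for i in range(k)]
    p_gene = [sum(p_pair[i]) for i in range(k)]
    w_gene = [
        sum(fitness[i][j] * p_pair[i][j] for j in range(k)) / p_gene[i]
        for i in range(k)
    ]
    return p_pair, p_gene, w_gene


def hardy_weinberg_deviation(p_pair, p_gene):
    """Largest |P_ij - P_i P_j|; zero when Eq. (2) holds exactly."""
    k = len(p_gene)
    return max(
        abs(p_pair[i][j] - p_gene[i] * p_gene[j])
        for i in range(k)
        for j in range(k)
    )


# Hypothetical two-allele population of 100 organisms in exact
# Hardy-Weinberg proportions for P_1 = 0.6, P_2 = 0.4.
counts = [[36, 24], [24, 16]]
fitness = [[1.0, 0.9], [0.9, 0.8]]

p_pair, p_gene, w_gene = gene_statistics(counts, fitness)
deviation = hardy_weinberg_deviation(p_pair, p_gene)
```

For real data the deviation is generally non-zero, which is why the text says the principle applies only approximately.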

The evolutionary dynamics of the population in terms of the gene frequencies $P_i$ can be described by the following differential equations [1,2,4]:

$$\frac{dP_i}{dt} = W_i P_i - \langle W \rangle P_i - \sum_j u_{ji} P_i + \sum_j u_{ij} P_j , \qquad i = 1, \ldots, K , \tag{3}$$

where $t$ is time; $\langle W \rangle = \sum_{ij} W_{ij} P_{ij}$ is the mean fitness in the population; $u_{ij}$ is the mutation rate of the transition $A_j \to A_i$, with $u_{ii} = 0$ ($i, j = 1, \ldots, K$). The first term on the right-hand side of Eqs. (3) characterizes the selection of the organisms in accordance with their fitnesses, the second term enforces the condition $\sum_i P_i = 1$, and the third and fourth terms describe the mutation transitions.
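The selection-mutation dynamics (3) can be explored numerically. The sketch below integrates the equations with a plain explicit Euler scheme under Hardy-Weinberg proportions (2), so that $W_i = \sum_j W_{ij} P_j$. The integration scheme, the step size, the function names and all numerical values are assumptions chosen for illustration; NumPy is assumed to be available.

```python
import numpy as np


def gene_frequency_rhs(p, w, u):
    """Right-hand side of Eq. (3), assuming Hardy-Weinberg proportions.

    p : gene frequencies P_i, shape (K,)
    w : pair fitnesses W_ij, shape (K, K)
    u : mutation rates, u[i, j] is the rate of A_j -> A_i, zero diagonal
    """
    w_gene = w @ p                     # W_i = sum_j W_ij P_j
    w_mean = p @ w_gene                # <W> = sum_ij W_ij P_i P_j
    selection = (w_gene - w_mean) * p  # first and second terms of (3)
    mutation_out = u.sum(axis=0) * p   # sum_j u_ji P_i
    mutation_in = u @ p                # sum_j u_ij P_j
    return selection - mutation_out + mutation_in


def integrate(p0, w, u, dt=0.01, steps=20000):
    """Explicit Euler integration of Eq. (3) up to time dt * steps."""
    p = np.asarray(p0, dtype=float)
    for _ in range(steps):
        p = p + dt * gene_frequency_rhs(p, w, u)
    return p


# Hypothetical three-allele example with additive fitnesses and a
# uniform mutation rate of 1e-3 between any two different alleles.
w = np.array([[1.0, 0.9, 0.8],
              [0.9, 0.8, 0.7],
              [0.8, 0.7, 0.6]])
u = 1e-3 * (np.ones((3, 3)) - np.eye(3))
p_final = integrate([0.1, 0.3, 0.6], w, u)
```

In this example the fittest allele $A_1$ ends up dominating the population, while mutations keep the other two alleles at small non-zero frequencies; the frequencies keep summing to one throughout, as the second term of (3) is meant to guarantee.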

Note that similar equations are used in the quasispecies model (for the deterministic case) [5].

Neglecting the mutations, we can analyze the dynamics of genes in the population by means of the equations:

$$\frac{dP_i}{dt} = W_i P_i - \langle W \rangle P_i , \qquad i = 1, \ldots, K . \tag{4}$$

Using (1), (2), (4), one can deduce (under the condition that the values $W_{ij}$ are constant) that the rate of increase of the mean fitness is proportional to the fitness variance $V = \sum_i P_i (W_i - \langle W \rangle)^2$ [1,3]:

$$\frac{d\langle W \rangle}{dt} = 2 \sum_i P_i (W_i - \langle W \rangle)^2 . \tag{5}$$

In accordance with (4), (5), the mean fitness $\langle W \rangle$ always increases until an equilibrium state ($dP_i/dt = 0$) is reached.
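The step from (1), (2), (4) to (5) can be written out explicitly. The following is a sketch of the standard argument; it assumes, in addition to the constancy of $W_{ij}$ stated above, that the fitnesses are symmetric, $W_{ij} = W_{ji}$.

```latex
\begin{align*}
\langle W \rangle &= \sum_{i,j} W_{ij} P_i P_j , \qquad
W_i = \sum_j W_{ij} P_j
  && \text{from (1), (2)} \\
\frac{d\langle W \rangle}{dt} &= 2 \sum_i W_i \frac{dP_i}{dt}
  && \text{since } W_{ij} = W_{ji} \text{ are constant} \\
&= 2 \sum_i W_i P_i \, (W_i - \langle W \rangle)
  && \text{by (4)} \\
&= 2 \sum_i P_i \, (W_i - \langle W \rangle)^2
  && \text{because } \sum_i P_i (W_i - \langle W \rangle) = 0 .
\end{align*}
```

The right-hand side is a sum of non-negative terms, so it vanishes only when every allele present in the population has $W_i = \langle W \rangle$, which by (4) is exactly the equilibrium condition.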

Equation (5) quantitatively characterizes the Fundamental Theorem of Natural Selection (R.A. Fisher, 1930), which in our case can be formulated as follows [3]:

In a sufficiently large panmictic population, where the organisms' fitness is determined by one locus and the selection pressure parameters are defined by the constant values W_ij, the mean fitness in a population increases, reaching a stationary value in some genetic equilibrium state. The increase rate of the mean fitness is proportional to the fitness variance; it becomes zero in an equilibrium state.
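The statement of the theorem can be checked numerically. The sketch below integrates the selection-only equations with small explicit Euler steps, compares the observed initial rate of change of the mean fitness with twice the fitness variance, and records the mean fitness along the trajectory. The fitness matrix, initial frequencies, step size and function names are arbitrary assumptions for illustration; NumPy is assumed to be available.

```python
import numpy as np


def mean_fitness(p, w):
    """<W> = sum_ij W_ij P_i P_j under Hardy-Weinberg proportions."""
    return float(p @ w @ p)


def fitness_variance(p, w):
    """V = sum_i P_i (W_i - <W>)^2 with W_i = sum_j W_ij P_j."""
    w_gene = w @ p
    return float(p @ (w_gene - p @ w_gene) ** 2)


def selection_step(p, w, dt):
    """One explicit Euler step of dP_i/dt = (W_i - <W>) P_i."""
    w_gene = w @ p
    return p + dt * (w_gene - p @ w_gene) * p


# Hypothetical symmetric, non-additive fitness matrix for three alleles.
w = np.array([[1.0, 0.7, 0.9],
              [0.7, 0.8, 1.1],
              [0.9, 1.1, 0.6]])
p = np.array([0.2, 0.3, 0.5])
dt = 0.01

# Finite-difference estimate of d<W>/dt versus the prediction 2V.
initial_rate = (mean_fitness(selection_step(p, w, dt), w) - mean_fitness(p, w)) / dt
predicted_rate = 2.0 * fitness_variance(p, w)

# Mean fitness along the trajectory up to t = 200.
history = [mean_fitness(p, w)]
for _ in range(20000):
    p = selection_step(p, w, dt)
    history.append(mean_fitness(p, w))
```

With a small step the two rates agree to within the discretization error, and the recorded mean fitness does not decrease along the trajectory.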

The described model is a simple example of the deterministic approach. A wide spectrum of analogous models has been developed and investigated, covering several gene loci, the age and sex structure of a population, inbreeding, migration, and population subdivision, especially in connection with the interpretation of concrete genetic data [1,3,4].

Stochastic models

Deterministic models provide effective methods for describing evolving populations. However, they use the approximation of an infinitely large population size, which is too strong for many real cases. To overcome this limitation, the probabilistic methods of population genetics were developed [1,3,4,6]. These methods include analysis by means of Markov chains (especially using generating functions) [4,7] and the diffusion approximation [1,3,4,6].

Below we sketch the main equations and some examples of the diffusion approximation. This approximation provides a non-trivial and effective method of population genetics.

We consider a population of diploid organisms with two alleles A₁ and A₂ in a certain locus. The population size n is supposed to be finite, but sufficiently large, so that the gene frequencies can be described by continuous values. We also suppose that the population size n is constant.

Let's introduce the function $\varphi = \varphi(X, t \mid P, 0)$, which characterizes the probability density of the frequency $X$ of the gene $A_1$ at time $t$ under the condition that the initial frequency (at $t = 0$) of this gene is equal to $P$. Under the assumption that the changes of the gene frequencies in one generation are small, the population dynamics can be described approximately by the following partial differential equations [1,3,4]:

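$$\frac{\partial \varphi}{\partial t} = - \frac{\partial (M_{\delta X} \varphi)}{\partial X} + \frac{1}{2} \frac{\partial^2 (V_{\delta X} \varphi)}{\partial X^2} , \tag{6}$$

$$\frac{\partial \varphi}{\partial t} = M_{\delta P} \frac{\partial \varphi}{\partial P} + \frac{1}{2} V_{\delta P} \frac{\partial^2 \varphi}{\partial P^2} , \tag{7}$$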

where $M_{\delta X}$, $M_{\delta P}$ and $V_{\delta X}$, $V_{\delta P}$ are the mean values and the variances of the changes of the frequencies $X$, $P$ during one generation; the time unit is one generation. Eq. (6) is the forward Kolmogorov differential equation (in physics it is called the Fokker-Planck equation); Eq. (7) is the backward Kolmogorov differential equation.

The first terms on the right-hand sides of Eqs. (6), (7) describe a systematic selection pressure, which is due to the fitness difference of the genes A₁ and A₂. The second terms characterize the random drift of the frequencies, which is due to the fluctuations in the finite-size population.

Using Eq. (6), one can determine the time evolution of the gene frequency distribution; Eq. (7) provides the means to estimate the probabilities of gene fixation.

Assuming that 1) the fitnesses of the genes $A_1$ and $A_2$ are equal to $1$ and $1-s$, respectively, and 2) the gene contributions to the fitnesses of the gene pairs $A_1 A_1$, $A_1 A_2$, and $A_2 A_2$ are additive, one can obtain that the values $M_{\delta X}$, $M_{\delta P}$ and $V_{\delta X}$, $V_{\delta P}$ are determined by the following expressions [1,3,4]:

$$M_{\delta X} = sX(1-X) , \qquad M_{\delta P} = sP(1-P) , \qquad V_{\delta X} = \frac{X(1-X)}{2n} , \qquad V_{\delta P} = \frac{P(1-P)}{2n} . \tag{8}$$
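To see where expressions of this form can come from, the sketch below computes the exact one-generation mean and variance of the frequency change under one specific sampling scheme and compares them with (8). The scheme is an assumption made for illustration and is not spelled out in the text: the genes are first reweighted by their fitnesses $1$ and $1-s$, and then the $2n$ gene copies of the next generation are drawn binomially (Wright-Fisher sampling). The function names and parameter values are hypothetical.

```python
def one_generation_moments(x, s, n):
    """Mean and variance of the change of the frequency X of gene A1
    over one generation, assuming selection followed by binomial
    sampling of 2n gene copies (Wright-Fisher sampling)."""
    # Selection: gene fitnesses 1 (A1) and 1 - s (A2) reweight the pool.
    x_selected = x / (x + (1.0 - s) * (1.0 - x))
    mean_change = x_selected - x
    # Variance of a binomial proportion with 2n draws.
    variance = x_selected * (1.0 - x_selected) / (2 * n)
    return mean_change, variance


def diffusion_coefficients(x, s, n):
    """Drift and variance terms of Eq. (8)."""
    return s * x * (1.0 - x), x * (1.0 - x) / (2 * n)


# Relative differences between the two for weak selection (s = 0.01).
s, n = 0.01, 500
relative_errors = []
for x in (0.1, 0.5, 0.9):
    mean_exact, var_exact = one_generation_moments(x, s, n)
    mean_diff, var_diff = diffusion_coefficients(x, s, n)
    relative_errors.append((
        abs(mean_exact - mean_diff) / mean_diff,
        abs(var_exact - var_diff) / var_diff,
    ))
```

Under this assumed scheme the relative differences are of the order of $s$, so (8) is a good approximation when selection is weak.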

If the evolution is purely neutral (s = 0), Eq. (6) takes the form:

$$\frac{\partial \varphi}{\partial t} = \frac{1}{4n} \frac{\partial^2 [X(1-X)\varphi]}{\partial X^2} . \tag{9}$$

This equation was solved analytically by M. Kimura [1,6]. The solution is rather complex. The main results can be summarized as follows: 1) only one gene ($A_1$ or $A_2$) is fixed in the final population; 2) the typical transition time from the initial gene frequency distribution to the final one is of the order of $2n$ generations. Note that these results agree with the results of a simple neutral evolution game.

Using Eq. (7), we can estimate the probability $u(P)$ of the fixation of the gene $A_1$ in the final population. Considering the infinite-time asymptotics, for the final population we can set $\partial \varphi / \partial t = 0$. The sought probability can be approximated by the value [1]: $u(P) = \varphi / (2n)$ (here $u(P) = \varphi \, dX$, where $dX = 1/(2n)$ is the minimal frequency change step in the population; see also [3] for a more rigorous consideration). Using this approximation and combining (7), (8), we obtain:

$$s \frac{du}{dP} + \frac{1}{4n} \frac{d^2 u}{dP^2} = 0 . \tag{10}$$

Solving this simple equation for the natural boundary conditions $u(1) = 1$, $u(0) = 0$, we obtain the probability of fixation of the gene $A_1$ in the final population [1,3,6]:

$$u(P) = \frac{1 - \exp(-4nsP)}{1 - \exp(-4ns)} . \tag{11}$$

This expression shows that if $4ns \ll 1$, neutral gene fixation takes place: $u(P) \approx P$; if $4ns \gg 1$, the advantageous gene $A_1$ is selected: $u(P) \approx 1$ (provided that $4nsP \gg 1$ as well). The population size $n_c \sim (4s)^{-1}$ is the boundary value demarcating the "neutral" and "selective" regions.
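The fixation probability and its two regimes are easy to evaluate numerically. The sketch below implements the formula for $u(P)$ and, as an independent check, solves the absorption problem of an exact Markov chain with $2n + 1$ states. The chain is a Wright-Fisher model with binomial sampling of $2n$ gene copies after selection, which is an assumption made here for comparison and not a model specified in the text. Function names and parameter values are hypothetical; NumPy is assumed to be available.

```python
import math

import numpy as np


def fixation_probability(p, s, n):
    """Diffusion result u(P) = [1 - exp(-4nsP)] / [1 - exp(-4ns)].

    The neutral limit s -> 0 gives u(P) = P.
    """
    a = 4.0 * n * s
    if abs(a) < 1e-12:
        return p
    # expm1(x) = exp(x) - 1 keeps precision when 4ns is small.
    return math.expm1(-a * p) / math.expm1(-a)


def wright_fisher_fixation(s, n):
    """Fixation probabilities of A1 for every initial copy number
    0..2n in an assumed Wright-Fisher chain with selection."""
    m = 2 * n
    t = np.zeros((m + 1, m + 1))  # t[i, j]: i copies -> j copies
    for i in range(m + 1):
        x = i / m
        x_sel = x / (x + (1.0 - s) * (1.0 - x))
        for j in range(m + 1):
            t[i, j] = math.comb(m, j) * x_sel**j * (1.0 - x_sel) ** (m - j)
    # States 0 and m are absorbing; for the interior states the fixation
    # probabilities satisfy (I - Q) u = (one-step probability of fixation).
    q = t[1:m, 1:m]
    u_inner = np.linalg.solve(np.eye(m - 1) - q, t[1:m, m])
    return np.concatenate(([0.0], u_inner, [1.0]))


# Hypothetical parameters: n = 50 organisms, s = 0.02, so 4ns = 4.
n, s = 50, 0.02
u_chain = wright_fisher_fixation(s, n)
u_diffusion = fixation_probability(10 / (2 * n), s, n)  # P = 0.1

# The two regimes discussed in the text, plus a single new copy.
u_neutral = fixation_probability(0.1, 1e-5, 50)            # 4ns << 1
u_selected = fixation_probability(0.5, 0.1, 1000)          # 4ns >> 1
u_single_copy = fixation_probability(1 / 2000, 0.1, 1000)  # 4nsP = 0.2
```

The last value illustrates the caveat on strong selection: a single advantageous copy in a large population still has a substantial chance of being lost, because $4nsP$ is small even though $4ns$ is large.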

Conclusion

The mathematical models of population genetics describe the gene frequency distributions in evolving populations. The deterministic methods are used to analyze the mean frequency dynamics; the stochastic methods take into account the fluctuations, which are due to the finite population size.

Glossary:

¹⁾ Diploid organism: An individual having two chromosome sets in each of its cells.

²⁾ Allele: One of the different forms of a gene that can exist at a single locus.

³⁾ Gene locus: The specific place on a chromosome where a gene is located.

⁴⁾ Panmictic population: Random-mating population.

References:

1. J.F. Crow, M. Kimura. "An introduction to population genetics theory". New York etc, Harper & Row. 1970.

2. T. Nagylaki. "Introduction to theoretical population genetics". Berlin etc, Springer Verlag. 1992.

3. Yu.M. Svirezhev, V.P. Pasekov. "Fundamentals of mathematical evolutionary genetics". Moscow, Nauka. 1982 (In Russian), Dordrecht, Kluwer Academic Publishers, 1990.

4. P.A.P. Moran. "The statistical processes of evolutionary theory", Oxford, Clarendon Press, 1962.

5. M. Eigen. Naturwissenschaften. 1971. Vol. 58. P. 465. M. Eigen, P. Schuster. "The Hypercycle: A principle of natural self-organization". Berlin, Springer. 1979.

6. M. Kimura. "The neutral theory of molecular evolution". Cambridge University Press. 1983.

7. S. Karlin. "A first course in stochastic processes". New York, London, Academic Press. 1968.

Fisher R. A. The Genetical Theory of Natural Selection, 2nd edition, Dover Publications, New York, 1958.

Author
V.G. Red'ko

Date
Sep 9, 1998
