Kolmogorov-Smirnov Two-sample Test

The Kolmogorov-Smirnov two-sample test is a test of the null hypothesis that two independent samples have been drawn from the same population (or from populations with the same distribution). The test uses the maximal difference between cumulative frequency distributions of two samples as the test statistic.

The KS test has the advantage of making no assumption about the distribution of the data. (Technically speaking, it is non-parametric and distribution-free.) Note, however, that this generality comes at some cost: other tests (for example, Student’s t-test) may be more sensitive if the data meet their requirements. In addition to calculating the D statistic, this page will report whether the data seem normal or lognormal. (If it is silent, assume normal data at your own risk!) It also lets you view the data graphically, which can help you understand how the data are distributed.
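As a rough illustration of the test statistic described above, the following sketch computes D as the largest absolute difference between the two empirical cumulative distribution functions. The function name `ks_two_sample_d` is a placeholder chosen for this example, and the sketch returns only D, not a p-value.

```python
from bisect import bisect_right


def ks_two_sample_d(sample_a, sample_b):
    """Return the two-sample KS statistic D (hypothetical helper name)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    d = 0.0
    # The empirical CDFs only change at observed values, so it is
    # enough to compare them at every pooled data point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)  # fraction of a that is <= x
        cdf_b = bisect_right(b, x) / len(b)  # fraction of b that is <= x
        d = max(d, abs(cdf_a - cdf_b))
    return d


print(ks_two_sample_d([1, 2, 3, 4], [3, 4, 5, 6]))  # 0.5
```

In practice a statistics package (for example R, listed under the resources below) would also supply the p-value for D.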

Resources: online tool, R

Posted in BioStatistics

Common ordination techniques by category

Informal techniques
Indirect gradient analysis


Distance-based approaches

  • Polar ordination, PO (Bray-Curtis ordination)
  • Principal Coordinates Analysis, PCoA (Metric multidimensional scaling)
  • Nonmetric Multidimensional Scaling, NMDS

Eigenanalysis-based approaches
Linear model

Posted in BioStatistics

Single-nucleotide polymorphism

A single-nucleotide polymorphism (SNP, pronounced "snip") is a DNA sequence variation occurring when a single nucleotide (A, T, C, or G) in the genome (or other shared sequence) differs between members of a species (or between paired chromosomes in an individual). For example, two sequenced DNA fragments from different individuals, AAGCCTA and AAGCTTA, contain a difference in a single nucleotide. In this case we say that there are two alleles: C and T. Almost all common SNPs have only two alleles.

Within a population, SNPs can be assigned a minor allele frequency — the lowest allele frequency at a locus that is observed in a particular population. This is simply the lesser of the two allele frequencies for single-nucleotide polymorphisms. There are variations between human populations, so a SNP allele that is common in one geographical or ethnic group may be much rarer in another.
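A minimal sketch of the minor allele frequency calculation for one biallelic SNP follows. The input format (one string of alleles per individual, such as "CT") and the function name are assumptions made for this example, not a standard file format or API.

```python
from collections import Counter


def minor_allele_frequency(genotypes):
    """Return the minor allele frequency at one biallelic SNP.

    genotypes: one string per individual listing its alleles, e.g. "CT"
    (an input format assumed for illustration only).
    """
    counts = Counter(allele for g in genotypes for allele in g)
    if not counts:
        raise ValueError("no alleles observed")
    if len(counts) > 2:
        raise ValueError("this sketch handles at most two alleles")
    if len(counts) == 1:
        return 0.0  # only one allele seen, so the locus is monomorphic
    # The lesser of the two allele frequencies.
    return min(counts.values()) / sum(counts.values())


sample = ["CC", "CT", "TT", "CC", "CT"]  # 6 copies of C, 4 copies of T
print(minor_allele_frequency(sample))  # 0.4
```

Running the same function on samples from two different populations would show how an allele that is common in one group can be rare in another.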

In the past, only variants with a minor allele frequency of greater than or equal to 1% (or 0.5%, etc.) were given the title “SNP”.[1] Some used “mutation” to refer to variations with low allele frequency. With the advent of a better understanding of evolution, this definition is no longer necessary; for example, a database such as dbSNP includes “SNPs” that have an allele frequency lower than 1%.

Single nucleotide polymorphisms may fall within coding sequences of genes, non-coding regions of genes, or in the intergenic regions between genes. SNPs within a coding sequence will not necessarily change the amino acid sequence of the protein that is produced, due to degeneracy of the genetic code. A SNP in which both forms lead to the same polypeptide sequence is termed synonymous (sometimes called a silent mutation); if a different polypeptide sequence is produced, it is nonsynonymous. A nonsynonymous change may be either missense or nonsense: a missense change results in a different amino acid, while a nonsense change results in a premature stop codon. SNPs that are not in protein-coding regions may still have consequences for gene splicing, transcription factor binding, or the sequence of non-coding RNA.
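The distinction between synonymous, missense, and nonsense changes can be sketched by translating the reference and variant codons with the standard genetic code. The function name and the extra "stop-lost" label (for a stop codon changed into an amino acid, a case the text above does not discuss) are assumptions of this example.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ...
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in BASES for b in BASES for c in BASES),
    AMINO_ACIDS,
))


def classify_coding_snp(ref_codon, alt_codon):
    """Classify a single-codon change (hypothetical helper name)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa == alt_aa:
        return "synonymous"
    if alt_aa == "*":
        return "nonsense"   # premature stop codon
    if ref_aa == "*":
        return "stop-lost"  # not covered in the text; added for completeness
    return "missense"       # a different amino acid


print(classify_coding_snp("GAA", "GAG"))  # Glu -> Glu
print(classify_coding_snp("GAA", "GAT"))  # Glu -> Asp
print(classify_coding_snp("TGG", "TGA"))  # Trp -> stop
```

The sketch assumes the codon is already known and in frame; finding the reading frame from a raw fragment is a separate problem.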

Genotyping provides a measurement of the genetic variation between members of a species. Single nucleotide polymorphisms (SNPs) are the most common type of genetic variation. A SNP is a single base pair mutation at a specific locus, usually consisting of two alleles (where the rare allele frequency is ≥ 1%). SNPs are often found to be the etiology of many human diseases and are becoming of particular interest in pharmacogenetics. Because SNPs are evolutionarily conserved, they have been proposed as markers for use in quantitative trait loci (QTL) analysis and in association studies in place of microsatellites. The use of SNPs is being extended in the HapMap project, which is attempting to provide the minimal set of SNPs needed to genotype the human genome. SNPs can also provide a genetic fingerprint for use in identity testing (Rapley & Harbron 2004).

Posted in Molecular markers

Genetic load or genetic burden

In population genetics, genetic load or genetic burden is a measure of the cost of lost alleles due to selection (selectional load) or mutation (mutational load). It is a value in the range 0 ≤ L < 1, where 0 represents no load. The concept was first formulated in 1937 by J. B. S. Haldane, independently formulated, named and applied to humans in 1950 by H. J. Muller[1], and elaborated further by Haldane in 1957.

Genetic load is the reduction in selective value for a population compared to what the population would have if all individuals had the most favored genotype.[3] It is normally stated in terms of fitness as the reduction in the mean fitness for a population compared to the maximum fitness.
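In symbols, this definition is commonly written L = (w_max - w_mean) / w_max, where w_max is the fitness of the most favored genotype and w_mean is the mean fitness of the population. The sketch below evaluates it for a list of genotype frequencies and fitnesses; the function name and the example numbers are illustrative only, and w_max is taken as the highest fitness among the genotypes supplied.

```python
def genetic_load(genotype_freqs, fitnesses):
    """Return L = (w_max - w_mean) / w_max (hypothetical helper name)."""
    if len(genotype_freqs) != len(fitnesses):
        raise ValueError("need one fitness value per genotype")
    if abs(sum(genotype_freqs) - 1.0) > 1e-9:
        raise ValueError("genotype frequencies must sum to 1")
    w_max = max(fitnesses)  # assumed to be the most favored genotype
    if w_max <= 0:
        raise ValueError("maximum fitness must be positive")
    # Mean fitness is the frequency-weighted average over genotypes.
    w_mean = sum(f * w for f, w in zip(genotype_freqs, fitnesses))
    return (w_max - w_mean) / w_max


# Made-up example: genotypes AA, Aa, aa at Hardy-Weinberg proportions
# for p = 0.5, with aa carrying a 20% fitness cost.
load = genetic_load([0.25, 0.5, 0.25], [1.0, 1.0, 0.8])
print(round(load, 4))  # 0.05
```

A population in which every individual had the most favored genotype would give a load of exactly 0.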

Related: mutation load

Mutation load arises when a mutation at a locus produces a new allele of either lesser or greater fitness. Either way, this lowers the average fitness of the population relative to the fittest genotype: a deleterious mutation has a lower relative fitness, lowering average fitness, while an advantageous mutation effectively lowers the relative fitness of the existing allele, and thus also lowers average fitness.

Selection load

Selection occurs when the fitnesses of particular alleles are unequal, hence selection always exerts a load. With directional selection, the allele frequencies will tend towards an equilibrium position with the fittest allele reaching a frequency in mutation-selection balance. As mutations are rare, this is effectively fixation.

Posted in Population genetics

Balancing selection and overdominance selection

Balancing selection refers to a number of selective processes by which multiple alleles (different versions of a gene) are actively maintained in the gene pool of a population at frequencies above that of gene mutation. This usually happens when the heterozygotes for the alleles under consideration have a higher adaptive value than the homozygote.[1] In this way genetic polymorphism is conserved.

There are three main types of natural selection: In directional selection the allele frequency for a trait continuously shifts in one direction. In stabilizing selection the frequency of the alleles of lower fitness decreases until they vanish. Balancing selection is similar but not identical to disruptive selection, where individuals of extreme trait values are favored against those with average trait values. These terms are used for quantitative characters controlled by a number of genes.

In overdominance selection, the individuals (or, more specifically, the allele combinations) that are selected for are those in the heterozygous form. This is also called heterozygote advantage because it is believed that there is some advantage to being heterozygous at that particular locus.
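How heterozygote advantage keeps both alleles in the population can be sketched with the standard one-locus, two-allele selection model. The sketch assumes fitnesses of 1 - s for AA, 1 for Aa, and 1 - t for aa, random mating, and an infinite population; under those assumptions the frequency of allele A settles at t / (s + t). The function names and the values of s and t are illustrative only.

```python
def next_allele_frequency(p, s, t):
    """One generation of selection on the frequency p of allele A."""
    q = 1.0 - p
    w_AA, w_Aa, w_aa = 1.0 - s, 1.0, 1.0 - t  # heterozygote is fittest
    w_mean = p * p * w_AA + 2 * p * q * w_Aa + q * q * w_aa
    # Share of A alleles among the survivors of selection.
    return p * (p * w_AA + q * w_Aa) / w_mean


def simulate(p0, s, t, generations):
    """Iterate the model from starting frequency p0 (hypothetical helper)."""
    p = p0
    for _ in range(generations):
        p = next_allele_frequency(p, s, t)
    return p


s, t = 0.1, 0.3                       # made-up selection coefficients
print(round(t / (s + t), 4))          # predicted equilibrium
print(round(simulate(0.1, s, t, 500), 4))  # from a rare starting point
print(round(simulate(0.9, s, t, 500), 4))  # from a common starting point
```

Both starting points end at the same intermediate frequency, which is the sense in which the polymorphism is actively maintained rather than lost.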

Posted in Adaptive evolution