Gravitation field algorithm and its application in gene cluster

biomed - Wang , Zheng Ming , Liu Gui-Xia , Zhou Chun-Guang , Liang Yan-Chun , Wang Yan

Découvre YouScribe en t'inscrivant gratuitement

Je m'inscris

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

11 pages

English

Obtenez un accès à la bibliothèque pour le consulter en ligne
En savoir plus

A propos
Informations
Extrait

Description

Searching optima is one of the most challenging tasks in clustering genes from available experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used by biotechnologists. All these algorithms are based on the imitation of natural phenomena. Results This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA) which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM) of planetary formation. GFA simulates the Gravitation field and outperforms GA and SA in some multimodal functions optimization problem. And GFA also can be used in the forms of unimodal functions. GFA clusters the dataset well from the Gene Expression Omnibus. Conclusions The mathematical proof demonstrates that GFA could be convergent in the global optimum by probability 1 in three conditions for one independent variable mass functions. In addition to these results, the fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.edu.cn/CSBG/GFA .

Informations

Publié par	biomed
Publié le	01 janvier 2010
Nombre de lectures	5
Langue	English

Extrait

Zheng et al. Algorithms for Molecular Biology 2010, 5:32
http://www.almob.org/content/5/1/32
RESEARCH Open Access
Gravitation field algorithm and its application in
gene cluster
† * * * †Ming Zheng , Gui-xia Liu , Chun-guang Zhou , Yan-chun Liang , Yan Wang
Abstract
Background: Searching optima is one of the most challenging tasks in clustering genes from available
experimental data or given functions. SA, GA, PSO and other similar efficient global optimization methods are used
by biotechnologists. All these algorithms are based on the imitation of natural phenomena.
Results: This paper proposes a novel searching optimization algorithm called Gravitation Field Algorithm (GFA)
which is derived from the famous astronomy theory Solar Nebular Disk Model (SNDM) of planetary formation. GFA
simulates the Gravitation field and outperforms GA and SA in some multimodal functions optimization problem.
And GFA also can be used in the forms of unimodal functions. GFA clusters the dataset well from the Gene
Expression Omnibus.
Conclusions: The mathematical proof demonstrates that GFA could be convergent in the global optimum by
probability 1 in three conditions for one independent variable mass functions. In addition to these results, the
fundamental optimization concept in this paper is used to analyze how SA and GA affect the global search and
the inherent defects in SA and GA. Some results and source code (in Matlab) are publicly available at http://ccst.jlu.
edu.cn/CSBG/GFA.
Background themselves, to find all the valleys of given functions. But
Two of the most challenging tasks of optimization algo- we still have a lot of parameters to consider, as known
rithms are to search the global optimum and to find all as the number of valleys and the valley distance etc, and
local optima of the space of solutions in clustering the performances of the modifications are not good
genes from available experimental data [1], e.g. the gene enough.
expression profiles, or given functions. In view of recent GA is a traditional searching-optimization algorithm,
technological developments for large-scale measure- this algorithm can search global optimal solution with
ments of DNA expression level, these two problems can probability criterion [5], but it can’t converge to the
glooften be formulated and many methods have been pro- bal optimal solution in the theory [6]. So GA always
posed. In particular, the heuristic searches are more pro- traps in local optima or genetic drift. Anyway, the
runmising than other kinds of searching approaches. These time of this algorithm is acceptable for most cases of
approaches include GA (genetic algorithm) [2], SA searching-optimization problems.
(simulated annealing) [3], PSO (Particle Swarm Optimi- SA is a generic probabilistic meta-algorithm for global
zation) [4] etc. But some inherent drawbacks, especially optimization problems, namely locating a good
approxithe inability to the multi-modal functions optimization, mation to the global optimum of a given function in a
can be found from the traditional heuristic search algo- large search space. If we search enough time with SA, it
rithms above. Each of these concepts allows for several will converge in the global optimal solution with
probmodifications, which multiplies the number of possible ability 1 [7]. But the biggest drawback of SA is that the
models for data analysis we can change the algorithm running-time of SA is so long that the efficiency is not
tolerant to us.
Recently, some other searching algorithms have been
* Correspondence: liugx@jlu.edu.cn; cgzhou@jlu.edu.cn; ycliang@jlu.edu.cn
proposed and discussed, such as PSO etc. These algo-† Contributed equally
College of Computer Science and Technology, Jilin University, Changchun, rithms can search solution like GA. But actually, most
130012, China
© 2010 Zheng et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons
Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in
any medium, provided the original work is properly cited.Zheng et al. Algorithms for Molecular Biology 2010, 5:32 Page 2 of 11
http://www.almob.org/content/5/1/32
of them will not converge to the global optimal solution positions of the dusts can be random or based on the
either. prior knowledge (i.e. there will be greater probability to
In this paper, we propose a new heuristic search exist peak values in some positions). Then we assign a
approach Gravitation Field Algorithm (GFA). It not only mass value to every dust. The mass values, which are
can handle unimodal functions optimization, which tra- described above, are associated with their positions, and
ditional heuristic searches can do the same, but also can are calculated by the mass function. So when the dusts
deal with multimodal functions optimization, which tra- move as we described below, they have a certain
probditional heuristic search algorithms can’t do. The experi- ability to find a new position, in which there is a new
ments of the benchmark functions demonstrate that. mass value that is bigger than the mass value of the
cenThe idea of GFA is derived from the modern widely tre dust. This initialization approach was used because
accepted theory of planetary formation– the Solar Neb- the extension of the space solution could be considered.
ular Disk Model (SNDM) [8] in the astronomy. And the density of the dusts is proportional to the
probThecomplexastronomytheorycanbeintroduced ability of existence of extreme value.
simply as follows: Strategy of division
Several billions years ago, there wasn’t any planet in To decompose the solution space, we divide the space
the Solar System; just dusts rounded the whole world. into pieces called groups. In one group, the special dust
Then the dusts assembled by their own gravitations. For called centre dust corresponds to the max mass value in
a long time, the rocks had come out. It was the thresh- the group. The other dusts called surrounding dusts
old time of the whole Solar System. From then, the whose mass values are smaller than the mass value of
rocks moved fast to assemble together, and the bigger the centre dust are in the power distance of the centre
rocks arrested the smaller rocks. And they became lar- dust. The power distance is the space of the group in
gerrocks.Finally,theplanetscameout,andtherocks which the centre dust pulls other surrounding dusts
around them were absorbed out. toward it. De-signing an effective strategy of division is
GFA is derived from the point of view of the hypoth- achallengingworkintheGFA.Whenthenumberof
esis theory described above. To start with, all the solu- in-dependent variables is greater than one, an
appropritions, which are the dusts in the algorithm model, are ate strategy of division always needs a smart method.
initialized randomly, or based on the prior knowledge; There are many criterions of division. For example, we
what’s more, we assign every dust (solution) a weight, can make a power weight strategy: the size of each
we call it mass, whose values are based on the mass group is decided by the mass value of the centre dust in
function generated from the space of the problem solu- the group, proportional or inversely proportional to the
tion;finally,theGFAbegins. The power of attraction, peak values. Fig. 1 was shown to explain this strategy.
which belongs to a certain dust and exists between Here, another simple average strategy of division in
every two dusts, pulls other dusts, which have the same the form of two independent variables is given. The task
influence to other dusts. Hence, the dusts assemble of this strategy is to find the method of division in
together, and the planets come out in the end–they are
the optima. If you want to find global optimal solution,
the planets assemble again, and the biggest planets will
come out. To give a penalty of that the highest mass
dust rules the whole space of solution, we propose a
distance which can reduce the effect of gravitation field.
Methods
Description of GFA
Before all the works start, we design a mass function on
the basis of special problem. The mass function is
similar to the fitness function in GA. Both of them are score
functions which are the criterions for a special solution.
We can make the mass function be proportional to, or
inversely proportional to, the extreme values of the
problem. It all depends. Figure 1 Graph of power weight strategy in solution space. The
Initialization round black nodes are the centre dusts. The rectangle black nodes
are surrounding dusts. The big white round charts with other nodesThe algorithm begins with the initial step. We generate
in them are the power distance domains of their internal centreand select n dusts d (i = 1, ..., n) in the mass functioni
dusts.
domain [x ,x ] to build the initial solution set. Thebegin endZheng et al. Algorithms for Molecular Biology 2010, 5:32 Page 3 of 11
http://www.almob.org/content/5/1/32
which the areas of all groups are all the same. First, we maybe we want to make a fast convergence. And big
decide the number of groups m based on the mass func- pace could accelerate the convergence.
tion and the prior knowledge. Then we factorize m with When the surrounding dusts move toward the centre
two maximum fac