% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.12 2000/01/17 08:06:15 noro Exp $
|
|
\section{Applications} |
|
\subsection{Distributed computation with homogeneous servers}
|
\label{section:homog} |
|
|
One of the aims of OpenXM is a parallel speedup by a distributed computation
with homogeneous servers. As the current specification of OpenXM does
not include communication between servers, one cannot expect
the maximal parallel speedup. However it is possible to execute
several types of distributed computation. For example, the product of
univariate polynomials in ${\bf Z}[x]$ can be computed efficiently
by FFT over small finite fields and the Chinese remainder theorem.
It can be easily parallelized:
|
|
\begin{tabbing}
Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $\deg(f_1), \deg(f_2) < 2^M$\\
Output : $f = f_1f_2$ \\
$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\
\> $2^{M+1}|m_i-1$ and $m=\prod m_i$ is sufficiently large. \\
Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
\> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
\> (The product is computed by FFT.)\\
return $\phi_m(\sum F_j)$\\
(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
\end{tabbing}
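
To make the Chinese remainder step concrete, the following is a small
sketch in Python; it is only a toy model of the above algorithm, not the
actual Risa/Asir implementation, and all function names are ours. The FFT
over each $m_i$ is replaced by schoolbook multiplication modulo $M_j$, and
the $L$ per-modulus computations are done sequentially where the real
client would delegate them to servers.

\begin{verbatim}
# Toy illustration of the CRT-based product (not OpenXM/Asir code).
# Polynomials are lists of integer coefficients, lowest degree first.

def poly_mul_mod(f1, f2, mod):
    # Schoolbook product reduced mod `mod`; a stand-in for the
    # FFT-based multiplication performed on each server.
    res = [0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            res[i + j] = (res[i + j] + a * b) % mod
    return res

def crt_lift(coeffs, Mj, m):
    # Lift coefficients known mod Mj to those of F_j with
    # F_j = f1*f2 mod Mj and F_j = 0 mod m/Mj.
    cof = m // Mj
    e = (cof * pow(cof, -1, Mj)) % m   # e = 1 mod Mj, e = 0 mod m/Mj
    return [c * e % m for c in coeffs]

def phi(a, m):
    # Symmetric remainder phi_m: the representative in (-m/2, m/2).
    a %= m
    return a - m if a > m // 2 else a

def parallel_product(f1, f2, groups):
    # groups = [P_1, ..., P_L]; each P_j would be handled by one server.
    m = 1
    for Pj in groups:
        for p in Pj:
            m *= p
    total = [0] * (len(f1) + len(f2) - 1)
    for Pj in groups:                  # done in parallel in the real setting
        Mj = 1
        for p in Pj:
            Mj *= p
        Fj = crt_lift(poly_mul_mod(f1, f2, Mj), Mj, m)
        total = [(a + b) % m for a, b in zip(total, Fj)]
    return [phi(c, m) for c in total]  # equals f1*f2 when m is large enough

# (2 + x)(4 + 3x) = 8 + 10x + 3x^2 over two "servers"; prints [8, 10, 3].
# The FFT-friendliness condition on the primes is irrelevant here.
print(parallel_product([2, 1], [4, 3], [[97], [101]]))
\end{verbatim}

The point is that each $F_j$ is congruent to $f_1f_2$ modulo $M_j$ and to $0$
modulo $m/M_j$, so the client only has to add the $F_j$ and apply $\phi_m$.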
|
|
Figure \ref{speedup}
shows the speedup factor under the above distributed computation
on Risa/Asir. For each $n$, two polynomials of degree $n$
with 3000-bit coefficients are generated and the product is computed.
The machine is a FUJITSU AP3000, a cluster of Sun workstations
connected by a high-speed network, and MPI over the network
is used to implement OpenXM.
\begin{figure}[htbp]
\epsfxsize=8.5cm
\epsffile{speedup.ps}
\caption{Speedup factor}
\label{speedup}
\end{figure}
|
|
The task of a client is the generation and partition of $P$, sending
and receiving of polynomials and the synthesis of the result. If the
number of servers is $L$ and the inputs are fixed, then the cost to
compute $F_j$ in parallel is $O(1/L)$, whereas the cost
to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
{\tt ox\_pop\_cmo()} are repeatedly applied on the client.
Therefore the speedup is limited and the upper bound of
the speedup factor depends on the ratio of
the computational cost and the communication cost for each unit operation.
Figure \ref{speedup} shows that
the speedup is satisfactory if the degree is large and $L$
is not large, say, up to 10 under the above environment.
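
The limitation can be made explicit by a rough model: if the whole
multiplication work costs $T_{\rm comp}$ and the communication with each
server costs $T_{\rm comm}$, then the elapsed time with $L$ servers behaves
like
\[
T(L) \approx \frac{T_{\rm comp}}{L} + T_{\rm comm}\, L ,
\]
which is minimized at $L = \sqrt{T_{\rm comp}/T_{\rm comm}}$; increasing $L$
beyond this point makes the total time grow again. This is only a crude
estimate, but it indicates why the speedup factor saturates for large $L$.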
|
If OpenXM provides operations for the broadcast and the reduction
such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of
sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(\log_2 L)$
and we can expect better results in such a case.
|
|
\subsubsection{Gr\"obner basis computation by various methods} |
\subsubsection{Competitive distributed computation by various strategies} |
|
|
Singular \cite{Singular} implements {\tt MP} interface for distributed |
SINGULAR \cite{Singular} implements {\it MP} interface for distributed |
computation and a competitive Gr\"obner basis computation is |
computation and a competitive Gr\"obner basis computation is |
illustrated as an example of distributed computation. However, |
illustrated as an example of distributed computation. |
interruption has not implemented yet and the looser process have to be |
Such a distributed computation is also possible on OpenXM. |
killed explicitly. As stated in Section \ref{secsession} OpenXM |
The following Risa/Asir function computes a Gr\"obner basis by |
provides such a function and one can safely reset the server and |
|
continue to use it. Furthermore, if a client provides synchronous I/O |
|
multiplexing by {\tt select()}, then a polling is not necessary. The |
|
following {\tt Risa/Asir} function computes a Gr\"obner basis by |
|
starting the computations simultaneously from the homogenized input and |
starting the computations simultaneously from the homogenized input and |
the input itself. The client watches the streams by {\tt ox\_select()} |
the input itself. The client watches the streams by {\tt ox\_select()} |
and The result which is returned first is taken. Then the remaining |
and the result which is returned first is taken. Then the remaining |
server is reset. |
server is reset. |
|
|
\begin{verbatim}