OpenXM/doc/issac2000/homogeneous-network.tex - diff

Diff for /OpenXM/doc/issac2000/homogeneous-network.tex between version 1.6 and 1.13

version 1.6, 2000/01/15 02:24:18

version 1.13, 2000/01/17 08:50:56

Line 1

% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.5 2000/01/15 00:20:45 takayama Exp $

% $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.12 2000/01/17 08:06:15 noro Exp $

\section{Applications}

\subsection{Distributed computation with homogeneous servers}

\label{section:homog}

Line 18 by FFT over small finite fields and Chinese remainder

Line 17 by FFT over small finite fields and Chinese remainder

It can be easily parallelized:

\begin{tabbing}

Input :\= $f_1, f_2 \in Z[x]$\\

Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $deg(f_1), deg(f_2) < 2^M$\\

\> such that $deg(f_1), deg(f_2) < 2^M$\\

Output : $f = f_1f_2$ \\

Output : $f = f_1f_2 \bmod p$\\

$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\

$P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\

\> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large. \\

Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\

for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\

Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\

\> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\

\> ($f_1, f_2$ are regarded as integral.\\

\> (The product is computed by FFT.)\\

\> The product is computed by FFT.)\\

return $\phi_m(\sum F_j)$\\

(For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)

(For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)

\end{tabbing}
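
As a small check of the recombination step, the following worked instance
may help. It is only an illustration: the primes, the partition and the
input polynomials are chosen here for convenience and are not taken from
the experiments. Take $M=1$, $P=\{5,13\}$ (so that $2^{M+1}=4$ divides
each $m_i-1$), $L=2$, $P_1=\{5\}$, $P_2=\{13\}$, $m=65$, and
$f_1=x-2$, $f_2=x+4$, so that $f_1f_2=x^2+2x-8$.
\[
F_1 = 26x^2+52x+52, \qquad
F_1 \equiv x^2+2x+2 \equiv f_1f_2 \bmod 5, \quad F_1 \equiv 0 \bmod 13
\]
\[
F_2 = 40x^2+15x+5, \qquad
F_2 \equiv x^2+2x+5 \equiv f_1f_2 \bmod 13, \quad F_2 \equiv 0 \bmod 5
\]
\[
F_1+F_2 = 66x^2+67x+57, \qquad
\phi_{65}(F_1+F_2) = x^2+2x-8 = f_1f_2
\]
Here $m=65$ is sufficiently large because every coefficient of $f_1f_2$
lies in $(-65/2,65/2)$.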

Figure \ref{speedup}

shows the speedup factor under the above distributed computation

on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$

on Risa/Asir. For each $n$, two polynomials of degree $n$

with 3000bit coefficients are generated and the product is computed.

The machine is Fujitsu AP3000,

The machine is FUJITSU AP3000,

a cluster of Sun connected with a high speed network and MPI over the

a cluster of Sun workstations connected with a high speed network

network is used to implement OpenXM.

and MPI over the network is used to implement OpenXM.

\begin{figure}[htbp]

\epsfxsize=8.5cm

\epsffile{speedup.ps}

Line 47 network is used to implement OpenXM.

Line 44 network is used to implement OpenXM.

\label{speedup}

\end{figure}

The task of a client is the generation and partition of $P$, sending

If the number of servers is $L$ and the inputs are fixed, then the cost to

and receiving of polynomials and the synthesis of the result. If the

number of servers is $L$ and the inputs are fixed, then the cost to

compute $F_j$ in parallel is $O(1/L)$, whereas the cost

to send and receive polynomials is $O(L)$

to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and

because we don't have the broadcast and the reduce

{\tt ox\_pop\_cmo()} are repeatedly applied on the client.

operations. Therefore the speedup is limited and the upper bound of

Therefore the speedup is limited and the upper bound of

the speedup factor depends on the ratio of

the computational cost and the communication cost.

the computational cost and the communication cost for each unit operation.

Figure \ref{speedup} shows that

the speedup is satisfactory if the degree is large and $L$

is not large, say, up to 10 under the above envionment.

is not large, say, up to 10 under the above environment.

If OpenXM provides the broadcast and the reduce operations, the cost of

If OpenXM provides operations for the broadcast and the reduction

sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(log_2L)$

such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of

and we will obtain better results in such a case.

sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(\log_2L)$

and we can expect better results in such a case.
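
A rough cost model makes this trade-off explicit. It is stated only as
an assumption and is not fitted to Figure \ref{speedup}: let $a$ denote
the total cost of computing all $F_j$ on a single server and $b$ the
cost of exchanging the polynomials with one server. With repeated
point-to-point transfers on the client,
\[
T_{\rm seq}(L) = \frac{a}{L} + bL, \qquad
\frac{dT_{\rm seq}}{dL} = 0 \ \Longleftrightarrow\ L = \sqrt{a/b}, \qquad
T_{\rm seq}\left(\sqrt{a/b}\right) = 2\sqrt{ab},
\]
so that adding servers beyond $\sqrt{a/b}$ increases the total time.
If the transfers are organized as a tree, as in a broadcast or a
reduction, the same model gives
\[
T_{\rm tree}(L) = \frac{a}{L} + b\log_2 L, \qquad
\frac{dT_{\rm tree}}{dL} = 0 \ \Longleftrightarrow\ L = \frac{a\ln 2}{b},
\]
and the optimal number of servers then grows linearly in $a/b$ rather
than as its square root.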

\subsubsection{Competitive distributed computation by various strategies}

Singular \cite{Singular} implements {\tt MP} interface for distributed

SINGULAR \cite{Singular} implements {\it MP} interface for distributed

computation and a competitive Gr\"obner basis computation is

illustrated as an example of distributed computation.

Such a distributed computation is also possible on OpenXM.

The following {\tt Risa/Asir} function computes a Gr\"obner basis by

The following Risa/Asir function computes a Gr\"obner basis by

starting the computations simultaneously from the homogenized input and

the input itself. The client watches the streams by {\tt ox\_select()}

and The result which is returned first is taken. Then the remaining

and the result which is returned first is taken. Then the remaining

server is reset.

\begin{verbatim}
