
Diff for /OpenXM/doc/issac2000/homogeneous-network.tex between version 1.2 and 1.4

version 1.2, 2000/01/02 07:32:12 version 1.4, 2000/01/11 05:17:11
Line 1 
Line 1 
 % $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.1 1999/12/23 10:25:08 takayama Exp $  % $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.3 2000/01/07 06:27:55 noro Exp $
   
 \section{Applications}  \section{Applications}
 \subsection{Homogeneous Network}  (Noro)  \subsection{Distributed computation with homogeneous servers}
   
 Interactive distributed computation?  OpenXM also aims at speedup by a distributed computation
   with homogeneous servers. Because the current specification of OpenXM does
   not include communication between servers, one cannot expect
   the maximal parallel speedup. However, several types of distributed
   computation can still be carried out, as follows.
   
   \subsubsection{Product of univariate polynomials}
   
   Shoup \cite{Shoup} showed that the product of univariate polynomials
   with large degrees and large coefficients can be computed efficiently
   by FFT over small finite fields and the Chinese remainder theorem.
   This method is easily parallelized:
   
   \begin{tabbing}
   Input :\= $f_1, f_2 \in Z[x]$\\
   \> such that $\deg(f_1), \deg(f_2) < 2^M$\\
   Output : $f = f_1f_2$\\
   $P \leftarrow$ \= $\{m_1,\ldots,m_N\}$ where each $m_i$ is a prime, \\
   \> $2^{M+1} \mid m_i-1$ and $m=\prod m_i $ is sufficiently large. \\
   Separate $P$ into disjoint subsets $P_1, \ldots, P_L$.\\
   for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
   Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
   \> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
   \> ($f_1, f_2$ are regarded as integral.\\
   \> The product is computed by FFT.)\\
   return $\phi_m(\sum F_j)$\\
   (For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
   \end{tabbing}
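
   The correctness of the last step can be checked as follows. This is a
   sketch: the bounds $B_1, B_2$ on the absolute values of the
   coefficients of $f_1, f_2$ are notation introduced here and are not part
   of the algorithm above. Since the $M_j$ are pairwise coprime and
   $m = \prod_j M_j$, we have $M_k \mid m/M_j$ for $k \neq j$, so
   \[
   \sum_{j=1}^{L} F_j \equiv F_k \equiv f_1f_2 \bmod M_k
   \quad (k=1,\ldots,L),
   \]
   and hence $\sum_j F_j \equiv f_1f_2 \bmod m$ by the Chinese remainder
   theorem. Each coefficient of $f_1f_2$ is a sum of at most $2^M$ products
   of coefficients, so its absolute value is at most $2^M B_1 B_2$. Under
   these assumptions, $m > 2^{M+1}B_1B_2$ is enough for $\phi_m$ to recover
   the integer coefficients of $f_1f_2$.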
   
   Figure \ref{speedup}
   shows the speedup factor of the above distributed computation
   on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$
   with 3000-bit coefficients are generated and their product is computed.
   The machine is a Fujitsu AP3000,
   a cluster of Sun machines connected by a high-speed network, and MPI over
   this network is used to implement OpenXM.
   \begin{figure}[htbp]
   \epsfxsize=8.5cm
   \epsffile{speedup.ps}
   \caption{Speedup factor}
   \label{speedup}
   \end{figure}
   
   The tasks of the client are the generation and partition of $P$, the
   sending and receiving of polynomials, and the synthesis of the result. If the
   number of servers is $L$ and the inputs are fixed, then the time to
   compute the $F_j$ in parallel is proportional to $1/L$, whereas the time
   for sending and receiving polynomials is proportional to $L$
   because broadcast and reduce operations are not available.
   Therefore the speedup is limited, and the upper bound of
   the speedup factor depends on the ratio of
   the computational cost to the communication cost.
   Figure \ref{speedup} shows that
   the speedup is satisfactory if the degree is large and the number of
   servers is not large, say, up to 10 in the above environment.
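
   The trade-off can be made explicit with a simple cost model. The constants
   $a$ and $b$ below are assumptions introduced for illustration and are not
   measured values, and $L$ is treated as a real variable. If the parallel
   computation takes time $a/L$ and the communication takes time $bL$, then
   \[
   T(L) = \frac{a}{L} + bL, \qquad
   S(L) = \frac{T(1)}{T(L)} = \frac{a+b}{a/L + bL}.
   \]
   Under this model $T(L)$ is minimized at $L = \sqrt{a/b}$, where
   \[
   S\left(\sqrt{a/b}\right) = \frac{a+b}{2\sqrt{ab}},
   \]
   so the attainable speedup grows only like the square root of the ratio
   $a/b$ of the computational cost to the communication cost.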
   
   \subsubsection{Gr\"obner basis computation by various methods}
   
   Singular \cite{Singular} implements an {\tt MP} interface for distributed
   computation, and a competitive Gr\"obner basis computation is
   illustrated as an example of distributed computation.  However,
   interruption has not been implemented yet and the loser process has to be
   killed explicitly. As stated in Section \ref{secsession}, OpenXM
   provides such a function, and one can safely reset the server and
   continue to use it.  Furthermore, if a client provides synchronous I/O
   multiplexing by {\tt select()}, then polling is not necessary.  The
   following {\tt Risa/Asir} function computes a Gr\"obner basis by
   starting the computations simultaneously from the homogenized input and
   from the input itself.  The client watches the streams by {\tt ox\_select()},
   and the result that is returned first is taken. Then the remaining
   server is reset.
   
   \begin{verbatim}
   /* G:set of polys; V:list of variables */
   /* O:type of order; P0,P1: id's of servers */
   def dgr(G,V,O,P0,P1)
   {
     P = [P0,P1]; /* server list */
     map(ox_reset,P); /* reset servers */
     /* P0 executes non-homogenized computation */
     ox_cmo_rpc(P0,"dp_gr_main",G,V,0,1,O);
     /* P1 executes homogenized computation */
     ox_cmo_rpc(P1,"dp_gr_main",G,V,1,1,O);
     map(ox_push_cmd,P,262); /* 262 = OX_popCMO */
     F = ox_select(P); /* wait for data */
     /* F[0] is a server's id which is ready */
     R = ox_get(F[0]);
     if ( F[0] == P0 ) {
       Win = "nonhomo"; Lose = P1;
     } else {
       Win = "homo"; Lose = P0;
     }
     ox_reset(Lose); /* reset the loser */
     return [Win,R];
   }
   \end{verbatim}
