
Diff for /OpenXM/doc/issac2000/homogeneous-network.tex between version 1.3 and 1.12

version 1.3, 2000/01/07 06:27:55 version 1.12, 2000/01/17 08:06:15
Line 1 
Line 1 
 % $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.2 2000/01/02 07:32:12 takayama Exp $  % $OpenXM: OpenXM/doc/issac2000/homogeneous-network.tex,v 1.11 2000/01/17 07:15:52 noro Exp $
   
 \section{Applications}  
 \subsection{Distributed computation with homogeneous servers}  \subsection{Distributed computation with homogeneous servers}
   \label{section:homog}
   
 OpenXM also aims at speedup by a distributed computation  One of the aims of OpenXM is a parallel speedup by a distributed computation
 with homogeneous servers. As the current specification of OpenXM does  with homogeneous servers. As the current specification of OpenXM does
 not include communication between servers, one cannot expect  not include communication between servers, one cannot expect
 the maximal parallel speedup. However it is possible to execute  the maximal parallel speedup. However it is possible to execute
Line 17  by FFT over small finite fields and Chinese remainder 
Line 17  by FFT over small finite fields and Chinese remainder 
 It can be easily parallelized:  It can be easily parallelized:
   
 \begin{tabbing}  \begin{tabbing}
 Input :\= $f_1, f_2 \in Z[x]$\\  Input :\= $f_1, f_2 \in {\bf Z}[x]$ such that $deg(f_1), deg(f_2) < 2^M$\\
 \> such that $deg(f_1), deg(f_2) < 2^M$\\  Output : $f = f_1f_2$ \\
 Output : $f = f_1f_2 \bmod p$\\  $P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is an odd prime, \\
 $P \leftarrow$ \= $\{m_1,\cdots,m_N\}$ where $m_i$ is a prime, \\  
 \> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large. \\  \> $2^{M+1}|m_i-1$ and $m=\prod m_i $ is sufficiently large. \\
 Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\  Separate $P$ into disjoint subsets $P_1, \cdots, P_L$.\\
 for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\  for \= $j=1$ to $L$ $M_j \leftarrow \prod_{m_i\in P_j} m_i$\\
 Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\  Compute $F_j$ such that $F_j \equiv f_1f_2 \bmod M_j$\\
 \> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\  \> and $F_j \equiv 0 \bmod m/M_j$ in parallel.\\
 \> ($f_1, f_2$ are regarded as integral.\\  \> (The product is computed by FFT.)\\
 \> The product is computed by FFT.)\\  
 return $\phi_m(\sum F_j)$\\  return $\phi_m(\sum F_j)$\\
 (For $a \in Z$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)  (For $a \in {\bf Z}$, $\phi_m(a) \in (-m/2,m/2)$ and $\phi_m(a)\equiv a \bmod m$)
 \end{tabbing}  \end{tabbing}
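The algorithm in the tabbing block above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the Risa/Asir implementation: the helper names (`poly_mul_mod`, `crt_lift`, `server_task`, `phi`, `distributed_mul`) are hypothetical, the per-prime product uses schoolbook multiplication in place of the FFT, and the "servers" are simulated by a sequential loop. Polynomials are coefficient lists with the lowest degree first.

```python
from math import prod

def poly_mul_mod(f1, f2, p):
    """Schoolbook product of two coefficient lists modulo p (stand-in for the FFT)."""
    out = [0] * (len(f1) + len(f2) - 1)
    for i, a in enumerate(f1):
        for j, b in enumerate(f2):
            out[i + j] = (out[i + j] + a * b) % p
    return out

def crt_lift(residues, moduli):
    """Coefficient-wise Chinese remaindering over pairwise coprime moduli."""
    M = prod(moduli)
    out = [0] * len(residues[0])
    for r, mi in zip(residues, moduli):
        c = M // mi
        e = c * pow(c, -1, mi)  # e = 1 mod mi and e = 0 mod the other moduli
        for k, rk in enumerate(r):
            out[k] = (out[k] + rk * e) % M
    return out, M

def server_task(f1, f2, group, m):
    """Work of one server: F_j = f1*f2 mod M_j and F_j = 0 mod m/M_j."""
    residues = [poly_mul_mod(f1, f2, p) for p in group]
    Fj, Mj = crt_lift(residues, group)
    cofactor = m // Mj
    inv = pow(cofactor, -1, Mj)
    return [(c * inv % Mj) * cofactor for c in Fj]

def phi(a, m):
    """Symmetric representative of a modulo an odd m, lying in (-m/2, m/2)."""
    a %= m
    return a - m if a > m // 2 else a

def distributed_mul(f1, f2, groups):
    """Client side: partition P into groups, collect every F_j, return phi_m(sum F_j)."""
    m = prod(p for g in groups for p in g)
    parts = [server_task(f1, f2, g, m) for g in groups]  # one call per server
    return [phi(sum(col), m) for col in zip(*parts)]
```

For example, `distributed_mul([3, -2, 5], [-7, 4, 1], [[97, 193], [257, 769]])` multiplies $3-2x+5x^2$ by $-7+4x+x^2$ with $P=\{97,193,257,769\}$ split into $L=2$ subsets. These primes are all congruent to 1 modulo 32, so they would also suit an FFT of length up to 32, although the sketch does not use that property. Requires Python 3.8 or later for `pow(c, -1, m)`.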
   
 Figure \ref{speedup}  Figure \ref{speedup}
 shows the speedup factor under the above distributed computation  shows the speedup factor under the above distributed computation
 on {\tt Risa/Asir}. For each $n$, two polynomials of degree $n$  on Risa/Asir. For each $n$, two polynomials of degree $n$
 with 3000bit coefficients are generated and the product is computed.  with 3000bit coefficients are generated and the product is computed.
 The machine is Fujitsu AP3000,  The machine is Fujitsu AP3000,
 a cluster of Sun connected with a high speed network and MPI over the  a cluster of Sun connected with a high speed network and MPI over the
Line 46  network is used to implement OpenXM.
Line 44  network is used to implement OpenXM.
 \label{speedup}  \label{speedup}
 \end{figure}  \end{figure}
   
 The task of a client is the generation and partition of $P$, sending  If the number of servers is $L$ and the inputs are fixed, then the cost to
 and receiving of polynomials and the synthesis of the result. If the  compute $F_j$ in parallel is $O(1/L)$, whereas the cost
 number of servers is $L$ and the inputs are fixed, then the time to  to send and receive polynomials is $O(L)$ if {\tt ox\_push\_cmo()} and
 compute $F_j$ in parallel is proportional to $1/L$, whereas the time  {\tt ox\_pop\_cmo()} are repeatedly applied on the client.
 for sending and receiving of polynomials is proportional to $L$  Therefore the speedup is limited and the upper bound of
 because we don't have the broadcast and the reduce  the speedup factor depends on the ratio of
 operations. Therefore the speedup is limited and the upper bound of  the computational cost and the communication cost for each unit operation.
 the speedup factor depends on the communication cost and the degree  Figure \ref{speedup} shows that
 of inputs. Figure \ref{speedup} shows that  the speedup is satisfactory if the degree is large and $L$
 the speedup is satisfactory if the degree is large and the number of  is not large, say, up to 10 under the above environment.
 servers is not large, say, up to 10.  If OpenXM provides operations for the broadcast and the reduction
   such as {\tt MPI\_Bcast} and {\tt MPI\_Reduce} respectively, the cost of
   sending $f_1$, $f_2$ and gathering $F_j$ may be reduced to $O(log_2L)$
   and we can expect better results in such a case.
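The cost argument above can be made concrete with a simple model. The constants $C$ (computation cost on one server) and $c$ (communication cost per server) are assumptions introduced here for illustration; they do not appear in the paper and no measured values are implied.

```latex
% Assumed model: computation C/L, communication cL by repeated push/pop.
\[
  T(L) = \frac{C}{L} + cL, \qquad
  S(L) = \frac{T(1)}{T(L)} = \frac{C + c}{C/L + cL}.
\]
% T(L) is minimized where dT/dL = -C/L^2 + c = 0:
\[
  L^{*} = \sqrt{C/c}, \qquad
  S(L^{*}) = \frac{C + c}{2\sqrt{Cc}}.
\]
% With broadcast and reduce operations the communication term becomes
% logarithmic in L:
\[
  T_{\mathrm{bcast}}(L) = \frac{C}{L} + c\log_2 L .
\]
```

Under this model the attainable speedup is bounded by a function of the ratio $C/c$ alone, which is consistent with the statement that the upper bound depends on the ratio of the computational cost to the communication cost.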
   
 \subsubsection{Order counting of an elliptic curve}  \subsubsection{Competitive distributed computation by various strategies}
   
 \subsubsection{Gr\"obner basis computation by various methods}  SINGULAR \cite{Singular} implements {\it MP} interface for distributed
   
 Singular \cite{Singular} implements {\tt MP} interface for distributed  
 computation and a competitive Gr\"obner basis computation is  computation and a competitive Gr\"obner basis computation is
 illustrated as an example of distributed computation.  However,  illustrated as an example of distributed computation.
 interruption has not implemented yet and the looser process have to be  Such a distributed computation is also possible on OpenXM.
 killed explicitly. As stated in Section \ref{secsession} OpenXM  The following Risa/Asir function computes a Gr\"obner basis by
 provides such a function and one can safely reset the server and  
 continue to use it.  Furthermore, if a client provides synchronous I/O  
 multiplexing by {\tt select()}, then a polling is not necessary.  The  
 following {\tt Risa/Asir} function computes a Gr\"obner basis by  
 starting the computations simultaneously from the homogenized input and  starting the computations simultaneously from the homogenized input and
 the input itself.  The client watches the streams by {\tt ox\_select()}  the input itself.  The client watches the streams by {\tt ox\_select()}
 and The result which is returned first is taken. Then the remaining  and the result which is returned first is taken. Then the remaining
 server is reset.  server is reset.
   
 \begin{verbatim}  \begin{verbatim}
