===================================================================
RCS file: /home/cvs/OpenXM_contrib/gmp/doc/Attic/projects.html,v
retrieving revision 1.1.1.1
retrieving revision 1.1.1.2
diff -u -p -r1.1.1.1 -r1.1.1.2
--- OpenXM_contrib/gmp/doc/Attic/projects.html 2000/09/09 14:12:20 1.1.1.1
+++ OpenXM_contrib/gmp/doc/Attic/projects.html 2003/08/25 16:06:11 1.1.1.2
@@ -13,9 +13,30 @@
+
+Copyright 2000, 2001, 2002 Free Software Foundation, Inc.
+This file is part of the GNU MP Library.
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as published
+by the Free Software Foundation; either version 2.1 of the License, or (at
+your option) any later version.
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+License for more details.
+You should have received a copy of the GNU Lesser General Public License
+along with the GNU MP Library; see the file COPYING.LIB. If not, write to
+the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
+MA 02111-1307, USA.
+
+
+
 This file lists projects suitable for volunteers.  Please see the
@@ -28,23 +49,8 @@ problems.)
 A C++ wrapper for GMP is highly desirable.  Several people have started
-writing one, but nobody has so far finished it.  For a wrapper to be useful,
-one needs to pay attention to efficiency.
-
-a += b
-by directly applying mpz_add.
-
 The current multiplication code uses Karatsuba, 3-way Toom-Cook, or
 Fermat FFT.  Several new developments are desirable:
@@ -62,14 +68,29 @@
mpf_mul
now, a low half partial product might be of use
+ in a future sub-quadratic REDC. On small sizes a partial product
+ will be faster simply through fewer cross-products, similar to the
+ way squaring is faster. But work by Thom Mulders shows that for
+ Karatsuba and higher order algorithms the advantage is progressively
+ lost, so for large sizes partial products turn out to be no faster.
+
+    Another possibility would be an optimized cube.  In the basecase that
+    should definitely be able to save cross-products in a similar fashion to
+    squaring, but some investigation might be needed for how best to adapt
+    the higher-order algorithms.  Not sure whether cubing or further small
+    powers have any particularly important uses though.
-    Write new and tune existing assembly routines.  The programs in mpn/tests
-    and the tune/speed.c program are useful for testing and timing the routines
-    you write.  See the README files in those directories for more information.
+    Write new and improve existing assembly routines.  The tests/devel
+    programs and the tune/speed.c and tune/many.pl programs are useful for
+    testing and timing the routines you write.  See the README files in those
+    directories for more information.
Please make sure your new routines are fast for these three situations:
+    Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems
+    that the most likely algorithm doesn't kick in until about 3000 limbs.
+
 Implement the functions of math.h for the GMP mpf layer!  Check the book
@@ -113,20 +137,25 @@
 functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
 cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
+
+    These are implemented in mpfr, redoing them in mpf might not be worth
+    bothering with, if the long term plan is to bring mpfr in as the new mpf.
+
-    The current code for computing square roots uses a Newton iteration
-    that relies on division.  It is possible to avoid division by
-    computing 1/sqrt(A) using this formula:
+    The current code uses divisions, which are reasonably fast, but it'd be
+    possible to use only multiplications by computing 1/sqrt(A) using this
+    formula:

         x[i+1] = x[i] * (3 - A*x[i]^2) / 2

-    The final result is then computed without division using,
+    And the final result

         sqrt(A) = A * x[n]

-    The resulting code should be substantially faster than the current code.
+    That final multiply might be the full size of the input (though it might
+    only need the high half of that), so there may or may not be any speedup
+    overall.
 If the routine becomes fast enough, perhaps square roots could be
 computed using this function.
-
 Implement some more pseudo random number generator algorithms.
-Today there's only Linear Congruential.
+
+    Some work can be saved when only the quotient is required in a division,
+    basically the necessary correction -0, -1 or -2 to the estimated
+    quotient can almost always be determined from only a few limbs of
+    multiply and subtract, rather than forming a complete remainder.  The
+    greatest savings are when the quotient is small compared to the dividend
+    and divisor.
+
+    Some code along these lines can be found in the current
+    mpn_tdiv_qr, though perhaps calculating bigger chunks of
+    remainder than might be strictly necessary.  That function in its
+    current form actually then always goes on to calculate a full remainder.
+    Burnikel and Ziegler describe a similar approach for the divide and
+    conquer case.
+
+    mpn_bdivmod and the redc in
+    mpz_powm should use some sort of divide and conquer
+    algorithm.  This would benefit mpz_divexact, and
+    mpn_gcd on large unequal size operands.  See "Exact
+    Division with Karatsuba Complexity" by Jebelean for a (brief)
+    description.
+
+    Failing that, some sort of DIVEXACT_THRESHOLD could be
+    added to control whether mpz_divexact uses
+    mpn_bdivmod or mpn_tdiv_qr, since the latter
+    is faster on large divisors.
+
+    For the REDC, basically all that's needed is Montgomery's algorithm done
+    in multi-limb integers.  R is multiple limbs, and the inverse and
+    operands are multi-precision.
+
For exact division the time to calculate a multi-limb inverse is not
+ amortized across many modular operations, but instead will probably
+ create a threshold below which the current style
+ mpn_bdivmod
is best. There's also Krandick and Jebelean,
+ "Bidirectional Exact Integer Division" to basically use a low to high
+ exact division for the low half quotient, and a quotient-only division
+ for the high half.
+
+
+    It will be noted that low-half and high-half multiplies, and a low-half
+    square, can be used.  These ought to each take as little as half the
+    time of a full multiply, or square, though work by Thom Mulders shows
+    the advantage is progressively lost as Karatsuba and higher algorithms
+    are applied.
+
Some sort of scheme for exceptions handling would be desirable.
+ Presently the only thing documented is that divide by zero in GMP
+ functions provokes a deliberate machine divide by zero (on those systems
+ where such a thing exists at least). The global gmp_errno
+ is not actually documented, except for the old gmp_randinit
+ function. Being currently just a plain global means it's not
+ thread-safe.
+
+
The basic choices for exceptions are returning an error code or having
+ a handler function to be called. The disadvantage of error returns is
+ they have to be checked, leading to tedious and rarely executed code,
+ and strictly speaking such a scheme wouldn't be source or binary
+ compatible. The disadvantage of a handler function is that a
+ longjmp
or similar recovery from it may be difficult. A
+ combination would be possible, for instance by allowing the handler to
+ return an error code.
+
+
+    Divide-by-zero, sqrt-of-negative, and similar operand range errors can
+    normally be detected at the start of functions, so exception handling
+    would have a clean state.  What's worth considering though is that the
+    GMP function detecting the exception may have been called via some third
+    party library or self contained application module, and hence have
+    various bits of state to be cleaned up above it.  It'd be highly
+    desirable for an exceptions scheme to allow for such cleanups.
+
+    The C++ destructor mechanism could help with cleanups both internally
+    and externally, but being a plain C library we don't want to depend on
+    that.
+
A C++ throw
might be a good optional extra exceptions
+ mechanism, perhaps under a build option. For GCC
+ -fexceptions
will add the necessary frame information to
+ plain C code, or GMP could be compiled as C++.
+
+
Out-of-memory exceptions are expected to be handled by the
+ mp_set_memory_functions
routines, rather than being a
+ prospective part of divide-by-zero etc. Some similar considerations
+ apply but what differs is that out-of-memory can arise deep within GMP
+ internals. Even fundamental routines like mpn_add_n
and
+ mpn_addmul_1
can use temporary memory (for instance on Cray
+ vector systems). Allowing for an error code return would require an
+ awful lot of checking internally. Perhaps it'd still be worthwhile, but
+ it'd be a lot of changes and the extra code would probably be rather
+ rarely executed in normal usages.
+
+
A longjmp
recovery for out-of-memory will currently, in
+ general, lead to memory leaks and may leave GMP variables operated on in
+ inconsistent states. Maybe it'd be possible to record recovery
+ information for use by the relevant allocate or reallocate function, but
+ that too would be a lot of changes.
+
+
One scheme for out-of-memory would be to note that all GMP allocations
+ go through the mp_set_memory_functions
routines. So if the
+ application has an intended setjmp
recovery point it can
+ record memory activity by GMP and abandon space allocated and variables
+ initialized after that point. This might be as simple as directing the
+ allocation functions to a separate pool, but in general would have the
+ disadvantage of needing application-level bookkeeping on top of the
+ normal system malloc
. An advantage however is that it
+ needs nothing from GMP itself and on that basis doesn't burden
+ applications not needing recovery. Note that there's probably some
+ details to be worked out here about reallocs of existing variables, and
+ perhaps about copying or swapping between "permanent" and "temporary"
+ variables.
+
+
Applications desiring a fine-grained error control, for instance a
+ language interpreter, would very possibly not be well served by a scheme
+ requiring longjmp
. Wrapping every GMP function call with a
+ setjmp
would be very inconvenient.
+
+
Stack overflow is another possible exception, but perhaps not one that
+ can be easily detected in general. On i386 GNU/Linux for instance GCC
+ normally doesn't generate stack probes for an alloca
, but
+ merely adjusts %esp
. A big enough alloca
can
+ miss the stack redzone and hit arbitrary data. GMP stack usage is
+ normally a function of operand size, knowing that might suffice for some
+ applications. Otherwise a fixed maximum usage can probably be obtained
+ by building with --enable-alloca=malloc-reentrant
(or
+ notreentrant
).
+
+
Actually recovering from stack overflow is of course another problem.
+ It might be possible to catch a SIGSEGV
in the stack
+ redzone and do something in a sigaltstack
, on systems which
+ have that, but recovery might otherwise not be possible. This is worth
+ bearing in mind because there's no point worrying about tight and
+ careful out-of-memory recovery if an out-of-stack is fatal.
+
+
 Add a test suite for old bugs.  These tests wouldn't loop or use
@@ -172,23 +333,40 @@
 seeds used, and perhaps to snapshot operands before performing each test,
 so any problem exposed can be reproduced.
-
-    Please send comments about this page to
-    tege@swox.com.
-    Copyright (C) 1999, 2000 Torbjörn Granlund.
-
+    It'd be nice to have some sort of tool for getting an overview of
+    performance.  Clearly a great many things could be done, but some
+    primary uses would be,
+
+    A combination of measuring some fundamental routines and some
+    representative application routines might satisfy these.
+
The tune/time.c routines would be the easiest way to get good
+ accurate measurements on lots of different systems. The high level
+ speed_measure
may or may not suit, but the basic
+ speed_starttime
and speed_endtime
would cover
+ lots of portability and accuracy questions.
+
+
+
+