OpenXM_contrib/gmp/doc/projects.html - diff

Return to projects.html CVS log

Up to [local] / OpenXM_contrib / gmp / doc

Diff for /OpenXM_contrib/gmp/doc/Attic/projects.html between version 1.1 and 1.1.1.2

-version 1.1, 2000/09/09 14:12:20
+version 1.1.1.2, 2003/08/25 16:06:11
 Line 13
 Line 13
 Line 13
    </h1>
  </center>
+ <font size=-1>
+ Copyright 2000, 2001, 2002 Free Software Foundation, Inc. <br><br>
+ This file is part of the GNU MP Library. <br><br>
+ The GNU MP Library is free software; you can redistribute it and/or modify
+ it under the terms of the GNU Lesser General Public License as published
+ by the Free Software Foundation; either version 2.1 of the License, or (at
+ your option) any later version. <br><br>
+ The GNU MP Library is distributed in the hope that it will be useful, but
+ WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+ or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+ License for more details. <br><br>
+ You should have received a copy of the GNU Lesser General Public License
+ along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
+ the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
+ MA 02111-1307, USA.
+ </font>
+ <hr>
+ <!-- NB. timestamp updated automatically by emacs -->
  <comment>
-   An up-to-date version of this file is available at
+   This file current as of 14 May 2002.  An up-to-date version is available at
    <a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
+   Please send comments about this page to
+   <a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
  </comment>
  <p> This file lists projects suitable for volunteers.  Please see the
-Line 28
+Line 49
 Line 28
 Line 49
      problems.)
  <ul>
- <li> <strong>C++ wrapper</strong>
+ <li> <strong>Faster multiplication</strong>
-   <p> A C++ wrapper for GMP is highly desirable.  Several people have started
-   writing one, but nobody has so far finished it.  For a wrapper to be useful,
-   one needs to pay attention to efficiency.
-   <ol>
-     <li> Write arithmetic functions for direct applications of most
-          elementary C++ types.  This might be necessary to avoid
-          ambiguities, but it is also desirable from an efficiency
-          standpoint.
-     <li> Avoid running constructors/destructors when not necessary.
-          For example, implement <code>a&nbsp+=&nbsp;b</code> by directly applying mpz_add.
-   </ol>
- <p> <li> <strong>Faster multiplication</strong>
    <p> The current multiplication code uses Karatsuba, 3-way Toom-Cook,
        or Fermat FFT.  Several new developments are desirable:
-Line 62
+Line 68
 Line 62
 Line 68
      <li> It's possible CPU dependent effects like cache locality will
           have a greater impact on speed than algorithmic improvements.
+     <li> Add support for partial products, either a given number of low limbs
+          or high limbs of the result.  A high partial product can be used by
+          <code>mpf_mul</code> now, a low half partial product might be of use
+          in a future sub-quadratic REDC.  On small sizes a partial product
+          will be faster simply through fewer cross-products, similar to the
+          way squaring is faster.  But work by Thom Mulders shows that for
+          Karatsuba and higher order algorithms the advantage is progressively
+          lost, so for large sizes partial products turn out to be no faster.
    </ol>
+   <p> Another possibility would be an optimized cube.  In the basecase that
+       should definitely be able to save cross-products in a similar fashion to
+       squaring, but some investigation might be needed for how best to adapt
+       the higher-order algorithms.  Not sure whether cubing or further small
+       powers have any particularly important uses though.
  <p> <li> <strong>Assembly routines</strong>
-   <p> Write new and tune existing assembly routines.  The programs in mpn/tests
+   <p> Write new and improve existing assembly routines.  The tests/devel
-   and the tune/speed.c program are useful for testing and timing the routines
+   programs and the tune/speed.c and tune/many.pl programs are useful for
-   you write.  See the README files in those directories for more information.
+   testing and timing the routines you write.  See the README files in those
+   directories for more information.
    <p> Please make sure your new routines are fast for these three situations:
    <ol>
-Line 106
+Line 127
 Line 106
 Line 127
    current code can mostly be reused.  It should be possible to share code
    between GCD and GCDEXT, and probably with Jacobi symbols too.
+   <p> Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems
+   that the most likely algorithm doesn't kick in until about 3000 limbs.
  <p> <li> <strong>Math functions for the mpf layer</strong>
    <p> Implement the functions of math.h for the GMP mpf layer!  Check the book
-Line 113
+Line 137
 Line 113
 Line 137
    functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos,
    cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
+   <p> These are implemented in mpfr, redoing them in mpf might not be worth
+   bothering with, if the long term plan is to bring mpfr in as the new mpf.
  <p> <li> <strong>Faster sqrt</strong>
-   <p> The current code for computing square roots use a Newton iteration that
+   <p> The current code uses divisions, which are reasonably fast, but it'd be
-   rely on division.  It is possible to avoid using division by computing
+   possible to use only multiplications by computing 1/sqrt(A) using this
-/sqrt(A) using this formula:
+   formula:
    <pre>
                     x   = x  (3 - A x )/2.
                      i+1   i         i  </pre>
-   The final result is then computed without division using,
+   And the final result
    <pre>
                       sqrt(A) = A x .
                                    n  </pre>
-   The resulting code should be substantially faster than the current code.
+   That final multiply might be the full size of the input (though it might
+   only need the high half of that), so there may or may not be any speedup
+   overall.
  <p> <li> <strong>Nth root</strong>
-Line 137
+Line 166
 Line 137
 Line 166
    <p> If the routine becomes fast enough, perhaps square roots could be computed
    using this function.
- <p> <li> <strong>More random number generators</strong>
+ <p> <li> <strong>Quotient-Only Division</strong>
-   <p> Implement some more pseudo random number generator algorithms.
+   <p> Some work can be saved when only the quotient is required in a division,
-       Today there's only Linear Congruential.
+       basically the necessary correction -0, -1 or -2 to the estimated
+       quotient can almost always be determined from only a few limbs of
+       multiply and subtract, rather than forming a complete remainder.  The
+       greatest savings are when the quotient is small compared to the dividend
+       and divisor.
+   <p> Some code along these lines can be found in the current
+       <code>mpn_tdiv_qr</code>, though perhaps calculating bigger chunks of
+       remainder than might be strictly necessary.  That function in its
+       current form actually then always goes on to calculate a full remainder.
+       Burnikel and Zeigler describe a similar approach for the divide and
+       conquer case.
+ <p> <li> <strong>Sub-Quadratic REDC and Exact Division</strong>
+   <p> <code>mpn_bdivmod</code> and the <code>redc</code> in
+       <code>mpz_powm</code> should use some sort of divide and conquer
+       algorithm.  This would benefit <code>mpz_divexact</code>, and
+       <code>mpn_gcd</code> on large unequal size operands.  See "Exact
+       Division with Karatsuba Complexity" by Jebelean for a (brief)
+       description.
+   <p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be
+       added to control whether <code>mpz_divexact</code> uses
+       <code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter
+       is faster on large divisors.
+   <p> For the REDC, basically all that's needed is Montgomery's algorithm done
+       in multi-limb integers.  R is multiple limbs, and the inverse and
+       operands are multi-precision.
+   <p> For exact division the time to calculate a multi-limb inverse is not
+       amortized across many modular operations, but instead will probably
+       create a threshold below which the current style
+       <code>mpn_bdivmod</code> is best.  There's also Krandick and Jebelean,
+       "Bidirectional Exact Integer Division" to basically use a low to high
+       exact division for the low half quotient, and a quotient-only division
+       for the high half.
+   <p> It will be noted that low-half and high-half multiplies, and a low-half
+       square, can be used.  These ought to each take as little as half the
+       time of a full multiply, or square, though work by Thom Mulders shows
+       the advantage is progressively lost as Karatsuba and higher algorithms
+       are applied.
+ <p> <li> <strong>Exceptions</strong>
+   <p> Some sort of scheme for exceptions handling would be desirable.
+       Presently the only thing documented is that divide by zero in GMP
+       functions provokes a deliberate machine divide by zero (on those systems
+       where such a thing exists at least).  The global <code>gmp_errno</code>
+       is not actually documented, except for the old <code>gmp_randinit</code>
+       function.  Being currently just a plain global means it's not
+       thread-safe.
+   <p> The basic choices for exceptions are returning an error code or having
+       a handler function to be called.  The disadvantage of error returns is
+       they have to be checked, leading to tedious and rarely executed code,
+       and strictly speaking such a scheme wouldn't be source or binary
+       compatible.  The disadvantage of a handler function is that a
+       <code>longjmp</code> or similar recovery from it may be difficult.  A
+       combination would be possible, for instance by allowing the handler to
+       return an error code.
+   <p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
+       normally be detected at the start of functions, so exception handling
+       would have a clean state.  What's worth considering though is that the
+       GMP function detecting the exception may have been called via some third
+       party library or self contained application module, and hence have
+       various bits of state to be cleaned up above it.  It'd be highly
+       desirable for an exceptions scheme to allow for such cleanups.
+   <p> The C++ destructor mechanism could help with cleanups both internally
+       and externally, but being a plain C library we don't want to depend on
+       that.
+   <p> A C++ <code>throw</code> might be a good optional extra exceptions
+       mechanism, perhaps under a build option.  For GCC
+       <code>-fexceptions</code> will add the necessary frame information to
+       plain C code, or GMP could be compiled as C++.
+   <p> Out-of-memory exceptions are expected to be handled by the
+       <code>mp_set_memory_functions</code> routines, rather than being a
+       prospective part of divide-by-zero etc.  Some similar considerations
+       apply but what differs is that out-of-memory can arise deep within GMP
+       internals.  Even fundamental routines like <code>mpn_add_n</code> and
+       <code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
+       vector systems).  Allowing for an error code return would require an
+       awful lot of checking internally.  Perhaps it'd still be worthwhile, but
+       it'd be a lot of changes and the extra code would probably be rather
+       rarely executed in normal usages.
+   <p> A <code>longjmp</code> recovery for out-of-memory will currently, in
+       general, lead to memory leaks and may leave GMP variables operated on in
+       inconsistent states.  Maybe it'd be possible to record recovery
+       information for use by the relevant allocate or reallocate function, but
+       that too would be a lot of changes.
+   <p> One scheme for out-of-memory would be to note that all GMP allocations
+       go through the <code>mp_set_memory_functions</code> routines.  So if the
+       application has an intended <code>setjmp</code> recovery point it can
+       record memory activity by GMP and abandon space allocated and variables
+       initialized after that point.  This might be as simple as directing the
+       allocation functions to a separate pool, but in general would have the
+       disadvantage of needing application-level bookkeeping on top of the
+       normal system <code>malloc</code>.  An advantage however is that it
+       needs nothing from GMP itself and on that basis doesn't burden
+       applications not needing recovery.  Note that there's probably some
+       details to be worked out here about reallocs of existing variables, and
+       perhaps about copying or swapping between "permanent" and "temporary"
+       variables.
+   <p> Applications desiring a fine-grained error control, for instance a
+       language interpreter, would very possibly not be well served by a scheme
+       requiring <code>longjmp</code>.  Wrapping every GMP function call with a
+       <code>setjmp</code> would be very inconvenient.
+   <p> Stack overflow is another possible exception, but perhaps not one that
+       can be easily detected in general.  On i386 GNU/Linux for instance GCC
+       normally doesn't generate stack probes for an <code>alloca</code>, but
+       merely adjusts <code>%esp</code>.  A big enough <code>alloca</code> can
+       miss the stack redzone and hit arbitrary data.  GMP stack usage is
+       normally a function of operand size, knowing that might suffice for some
+       applications.  Otherwise a fixed maximum usage can probably be obtained
+       by building with <code>--enable-alloca=malloc-reentrant</code> (or
+       <code>notreentrant</code>).
+   <p> Actually recovering from stack overflow is of course another problem.
+       It might be possible to catch a <code>SIGSEGV</code> in the stack
+       redzone and do something in a <code>sigaltstack</code>, on systems which
+       have that, but recovery might otherwise not be possible.  This is worth
+       bearing in mind because there's no point worrying about tight and
+       careful out-of-memory recovery if an out-of-stack is fatal.
  <p> <li> <strong>Test Suite</strong>
    <p> Add a test suite for old bugs.  These tests wouldn't loop or use
-Line 172
+Line 333
 Line 172
 Line 333
        seeds used, and perhaps to snapshot operands before performing
        each test, so any problem exposed can be reproduced.
- </ul>
- <hr>
+ <p> <li> <strong>Performance Tool</strong>
- <table width="100%">
+   <p> It'd be nice to have some sort of tool for getting an overview of
-   <tr>
+       performance.  Clearly a great many things could be done, but some
-     <td>
+       primary uses would be,
-       <font size=2>
-       Please send comments about this page to
-       <a href="mailto:tege@swox.com">tege@swox.com</a>.<br>
-       Copyright (C) 1999, 2000 Torbj�rn Granlund.
-       </font>
-     </td>
-     <td align=right>
-     </td>
-   </tr>
- </table>
+       <ol>
+         <li> Checking speed variations between compilers.
+         <li> Checking relative performance between systems or CPUs.
+       </ol>
+   <p> A combination of measuring some fundamental routines and some
+       representative application routines might satisfy these.
+   <p> The tune/time.c routines would be the easiest way to get good
+       accurate measurements on lots of different systems.  The high level
+       <code>speed_measure</code> may or may not suit, but the basic
+       <code>speed_starttime</code> and <code>speed_endtime</code> would cover
+       lots of portability and accuracy questions.
+ </ul>
+ <hr>
  </body>
  </html>
+ <!--
+ Local variables:
+ eval: (add-hook 'write-file-hooks 'time-stamp)
+ time-stamp-start: "This file current as of "
+ time-stamp-format: "%:d %3b %:y"
+ time-stamp-end: "\\."
+ time-stamp-line-limit: 50
+ End:
+ -->

FreeBSD-CVSweb <freebsd-cvsweb@FreeBSD.org>