===================================================================
RCS file: /home/cvs/OpenXM_contrib/gmp/doc/Attic/projects.html,v
retrieving revision 1.1
retrieving revision 1.1.1.2
diff -u -p -r1.1 -r1.1.1.2
--- OpenXM_contrib/gmp/doc/Attic/projects.html 2000/09/09 14:12:20 1.1
+++ OpenXM_contrib/gmp/doc/Attic/projects.html 2003/08/25 16:06:11 1.1.1.2
@@ -13,9 +13,30 @@
+
+Copyright 2000, 2001, 2002 Free Software Foundation, Inc.
+This file is part of the GNU MP Library.
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as published
+by the Free Software Foundation; either version 2.1 of the License, or (at
+your option) any later version.
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
+License for more details.
+You should have received a copy of the GNU Lesser General Public License
+along with the GNU MP Library; see the file COPYING.LIB. If not, write to
+the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
+MA 02111-1307, USA.
+
+
+
This file lists projects suitable for volunteers.  Please see the
@@ -28,23 +49,8 @@
problems.)
-A C++ wrapper for GMP is highly desirable.  Several people have started
-writing one, but nobody has so far finished it.  For a wrapper to be
-useful, one needs to pay attention to efficiency.
-
-a += b by directly applying mpz_add.
-
The current multiplication code uses Karatsuba, 3-way Toom-Cook, or
Fermat FFT.  Several new developments are desirable:
@@ -62,14 +68,29 @@
mpf_mul now, a low half partial product might be of use
+ in a future sub-quadratic REDC. On small sizes a partial product
+ will be faster simply through fewer cross-products, similar to the
+ way squaring is faster. But work by Thom Mulders shows that for
+ Karatsuba and higher order algorithms the advantage is progressively
+ lost, so for large sizes partial products turn out to be no faster.
+
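
To make the basecase saving concrete, here is a rough sketch (not GMP's
actual code) of a low half product built from the public mpn_mul_1 and
mpn_addmul_1 primitives.  It computes the product reduced mod B^n, where
B is the limb base, using roughly half the limb products of a full
basecase multiply:

    #include <gmp.h>

    /* Sketch only: low n limbs of {ap,n} times {bp,n} into {rp,n},
       i.e. the product reduced mod B^n.  Each row is truncated so no
       work above limb n-1 is done, about n*(n+1)/2 limb products
       instead of n^2.  Requires n >= 1.  */
    static void
    mullo_basecase (mp_ptr rp, mp_srcptr ap, mp_srcptr bp, mp_size_t n)
    {
      mp_size_t i;
      mpn_mul_1 (rp, ap, n, bp[0]);              /* row 0, full length */
      for (i = 1; i < n; i++)
        mpn_addmul_1 (rp + i, ap, n - i, bp[i]); /* row i, length n-i  */
      /* Carries returned by mpn_mul_1/mpn_addmul_1 belong at limb n
         and above, so they are simply discarded here.  */
    }
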
+Another possibility would be an optimized cube.  In the basecase that
+should definitely be able to save cross-products in a similar fashion to
+squaring, but some investigation might be needed for how best to adapt
+the higher-order algorithms.  Not sure whether cubing or further small
+powers have any particularly important uses though.
+
-Write new and tune existing assembly routines.  The programs in mpn/tests
-and the tune/speed.c program are useful for testing and timing the routines
-you write.  See the README files in those directories for more information.
+Write new and improve existing assembly routines.  The tests/devel
+programs and the tune/speed.c and tune/many.pl programs are useful for
+testing and timing the routines you write.  See the README files in those
+directories for more information.
Please make sure your new routines are fast for these three situations:
+Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems
+that the most likely algorithm doesn't kick in until about 3000 limbs.
+
Implement the functions of math.h for the GMP mpf layer!  Check the book
@@ -113,20 +137,25 @@
functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2,
cos, cosh, exp, log, log10, pow, sin, sinh, tan, tanh.
+
+These are implemented in mpfr, so redoing them in mpf might not be worth
+bothering with if the long term plan is to bring mpfr in as the new mpf.
+
-The current code for computing square roots use a Newton iteration that
-rely on division.  It is possible to avoid using division by computing
-1/sqrt(A) using this formula:
+The current code uses divisions, which are reasonably fast, but it'd be
+possible to use only multiplications by computing 1/sqrt(A) using this
+formula:

    x[i+1] = x[i] * (3 - A*x[i]^2) / 2

-The final result is then computed without division using,
+And the final result

    sqrt(A) = A * x[n]

-The resulting code should be substantially faster than the current code.
+That final multiply might be the full size of the input (though it might
+only need the high half of that), so there may or may not be any speedup
+overall.
If the routine becomes fast enough, perhaps square roots could be
computed using this function.
-
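
As a toy illustration of the iteration, here it is in plain C with
doubles (not mpf code; a real version would take its initial
approximation from a hardware float and double the working precision
each step):

    #include <stdio.h>

    /* Converge x to 1/sqrt(A) using only multiplications (the /2 is a
       shift for bignums), then recover sqrt(A) with one final multiply,
       sqrt(A) = A*x.  */
    int
    main (void)
    {
      double A = 2.0;
      double x = 1.0 / A;   /* rough seed; in practice from a hardware float */
      int i;
      for (i = 0; i < 8; i++)
        x = x * (3.0 - A * x * x) / 2.0;
      printf ("sqrt(%g) ~= %.16f\n", A, A * x);
      return 0;
    }
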
Implement some more pseudo random number generator algorithms.
-Today there's only Linear Congruential.
+
+Some work can be saved when only the quotient is required in a division;
+basically the necessary correction -0, -1 or -2 to the estimated
+quotient can almost always be determined from only a few limbs of
+multiply and subtract, rather than forming a complete remainder.  The
+greatest savings are when the quotient is small compared to the dividend
+and divisor.
+
+Some code along these lines can be found in the current mpn_tdiv_qr,
+though perhaps calculating bigger chunks of remainder than might be
+strictly necessary.  That function in its current form actually then
+always goes on to calculate a full remainder.  Burnikel and Ziegler
+describe a similar approach for the divide and conquer case.
+
+mpn_bdivmod and the redc in mpz_powm should use some sort of divide and
+conquer algorithm.  This would benefit mpz_divexact, and mpn_gcd on
+large unequal size operands.  See "Exact Division with Karatsuba
+Complexity" by Jebelean for a (brief) description.
+
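
For flavour, here is a word-sized sketch of the exact division idea
underlying mpn_bdivmod and mpz_divexact (illustration only, not GMP
code): for an odd divisor the inverse mod 2^64 comes from a few Newton
steps, and an exact quotient is then a single modular multiply, with no
remainders formed at any point.

    #include <stdio.h>
    #include <stdint.h>

    int
    main (void)
    {
      uint64_t d = 1234567ULL;          /* odd divisor                  */
      uint64_t q = 987654321ULL;        /* some quotient                */
      uint64_t n = d * q;               /* n exactly divisible by d     */

      uint64_t dinv = d;                /* correct to 3 bits for odd d  */
      int i;
      for (i = 0; i < 5; i++)
        dinv *= 2 - d * dinv;           /* each step doubles the correct bits */

      printf ("%llu\n", (unsigned long long) (n * dinv));   /* prints q */
      return 0;
    }
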
+
+Failing that, some sort of DIVEXACT_THRESHOLD could be added to control
+whether mpz_divexact uses mpn_bdivmod or mpn_tdiv_qr, since the latter
+is faster on large divisors.
+
+
+For the REDC, basically all that's needed is Montgomery's algorithm done
+in multi-limb integers.  R is multiple limbs, and the inverse and
+operands are multi-precision.
+
+
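
A single-limb sketch of the REDC step itself (illustration only,
assuming the modulus fits in 31 bits so the intermediate sum cannot
overflow); the multi-limb version would be the same computation with R a
power of the limb size and the multiplies done by mpn routines:

    #include <stdint.h>

    /* Montgomery reduction with R = 2^32: given ninv = -n^-1 mod R and
       t < n*R, return t*R^-1 mod n.  ninv can be found with the same
       Newton iteration shown in the exact division sketch above.
       Assumes n < 2^31 so the 64-bit sum cannot overflow.  */
    static uint32_t
    redc1 (uint64_t t, uint32_t n, uint32_t ninv)
    {
      uint32_t m = (uint32_t) t * ninv;   /* m = (t mod R) * (-n^-1) mod R */
      uint64_t u = t + (uint64_t) m * n;  /* low 32 bits of u are now zero */
      uint32_t r = (uint32_t) (u >> 32);  /* exact division by R           */
      return r >= n ? r - n : r;          /* at most one final subtract    */
    }
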
+For exact division the time to calculate a multi-limb inverse is not
+amortized across many modular operations, but instead will probably
+create a threshold below which the current style mpn_bdivmod is best.
+There's also Krandick and Jebelean, "Bidirectional Exact Integer
+Division", which basically uses a low to high exact division for the low
+half quotient, and a quotient-only division for the high half.
+
+
It will be noted that low-half and high-half multiplies, and a low-half
+square, can be used.  These ought to each take as little as half the
+time of a full multiply or square, though work by Thom Mulders shows
+the advantage is progressively lost as Karatsuba and higher algorithms
+are applied.
+
+
Some sort of scheme for exception handling would be desirable.
+ Presently the only thing documented is that divide by zero in GMP
+ functions provokes a deliberate machine divide by zero (on those systems
+ where such a thing exists at least). The global gmp_errno
+ is not actually documented, except for the old gmp_randinit
+ function. Being currently just a plain global means it's not
+ thread-safe.
+
+
The basic choices for exceptions are returning an error code or having
+a handler function to be called.  The disadvantage of error returns is
+they have to be checked, leading to tedious and rarely executed code,
+and strictly speaking such a scheme wouldn't be source or binary
+compatible.  The disadvantage of a handler function is that a longjmp
+or similar recovery from it may be difficult.  A combination would be
+possible, for instance by allowing the handler to return an error code.
+
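
Purely as an illustration of that combination, a handler-style interface
might look something like the following.  Every name here is invented;
nothing of the sort exists in GMP:

    #include <stdio.h>

    /* Hypothetical sketch only: an installable handler which may either
       recover itself (longjmp, abort, ...) or return an error code for
       the failing function to pass back to its caller.  */
    typedef enum { GMPX_OK, GMPX_DIV_BY_ZERO, GMPX_SQRT_OF_NEGATIVE } gmpx_err_t;
    typedef gmpx_err_t (*gmpx_handler_t) (gmpx_err_t err, const char *func);

    static gmpx_handler_t gmpx_handler;   /* would need to be per-thread */

    void gmpx_set_handler (gmpx_handler_t h) { gmpx_handler = h; }

    /* Example application handler: log, then let the error code return.  */
    static gmpx_err_t
    log_handler (gmpx_err_t err, const char *func)
    {
      fprintf (stderr, "gmp exception %d in %s\n", (int) err, func);
      return err;
    }
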
+
Divide-by-zero, sqrt-of-negative, and similar operand range errors can
+normally be detected at the start of functions, so exception handling
+would have a clean state.  What's worth considering though is that the
+GMP function detecting the exception may have been called via some third
+party library or self contained application module, and hence have
+various bits of state to be cleaned up above it.  It'd be highly
+desirable for an exceptions scheme to allow for such cleanups.
+
The C++ destructor mechanism could help with cleanups both internally
+and externally, but being a plain C library we don't want to depend on
+that.
+
+
A C++ throw might be a good optional extra exceptions mechanism, perhaps
+under a build option.  For GCC, -fexceptions will add the necessary
+frame information to plain C code, or GMP could be compiled as C++.
+
+
Out-of-memory exceptions are expected to be handled by the
+mp_set_memory_functions routines, rather than being a prospective part
+of divide-by-zero etc.  Some similar considerations apply but what
+differs is that out-of-memory can arise deep within GMP internals.  Even
+fundamental routines like mpn_add_n and mpn_addmul_1 can use temporary
+memory (for instance on Cray vector systems).  Allowing for an error
+code return would require an awful lot of checking internally.  Perhaps
+it'd still be worthwhile, but it'd be a lot of changes and the extra
+code would probably be rather rarely executed in normal usages.
+
+
A longjmp recovery for out-of-memory will currently, in general, lead to
+memory leaks and may leave GMP variables operated on in inconsistent
+states.  Maybe it'd be possible to record recovery information for use
+by the relevant allocate or reallocate function, but that too would be a
+lot of changes.
+
+
One scheme for out-of-memory would be to note that all GMP allocations
+go through the mp_set_memory_functions routines.  So if the application
+has an intended setjmp recovery point it can record memory activity by
+GMP and abandon space allocated and variables initialized after that
+point.  This might be as simple as directing the allocation functions to
+a separate pool, but in general would have the disadvantage of needing
+application-level bookkeeping on top of the normal system malloc.  An
+advantage however is that it needs nothing from GMP itself and on that
+basis doesn't burden applications not needing recovery.  Note that there
+are probably some details to be worked out here about reallocs of
+existing variables, and perhaps about copying or swapping between
+"permanent" and "temporary" variables.
+
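
A minimal sketch of that pool idea, assuming deferred frees are
acceptable.  Only mp_set_memory_functions is real GMP API; the pool and
tracked_* names are invented here, and as noted above any GMP variables
live at the time of the longjmp simply have to be abandoned:

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <setjmp.h>
    #include <gmp.h>

    /* Every block GMP requests is linked into a list; nothing is really
       released until the whole pool is discarded after recovery.  */
    union hdr { union hdr *next; long double align_; };
    static union hdr *pool;
    static jmp_buf recovery;

    static void *
    tracked_alloc (size_t size)
    {
      union hdr *b = malloc (sizeof (union hdr) + size);
      if (b == NULL)
        longjmp (recovery, 1);            /* out of memory: bail out */
      b->next = pool;
      pool = b;
      return b + 1;
    }
    static void *
    tracked_realloc (void *p, size_t old_size, size_t new_size)
    {
      void *q = tracked_alloc (new_size); /* lazy: new block plus a copy */
      memcpy (q, p, old_size < new_size ? old_size : new_size);
      return q;
    }
    static void
    tracked_free (void *p, size_t size)   /* freeing deferred to pool_discard */
    {
      (void) p; (void) size;
    }
    static void
    pool_discard (void)                   /* abandon everything GMP allocated */
    {
      while (pool)
        {
          union hdr *b = pool;
          pool = b->next;
          free (b);
        }
    }

    int
    main (void)
    {
      mp_set_memory_functions (tracked_alloc, tracked_realloc, tracked_free);
      if (setjmp (recovery) == 0)
        {
          mpz_t x;
          mpz_init_set_ui (x, 1);
          mpz_mul_2exp (x, x, 100000);    /* 2^100000, forces allocations */
          printf ("%lu bits\n", (unsigned long) mpz_sizeinbase (x, 2));
          mpz_clear (x);
        }
      else
        pool_discard ();                  /* recovery: drop GMP's memory */
      return 0;
    }
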
+
Applications desiring fine-grained error control, for instance a
+language interpreter, would very possibly not be well served by a scheme
+requiring longjmp.  Wrapping every GMP function call with a setjmp would
+be very inconvenient.
+
+
Stack overflow is another possible exception, but perhaps not one that
+can be easily detected in general.  On i386 GNU/Linux for instance GCC
+normally doesn't generate stack probes for an alloca, but merely adjusts
+%esp.  A big enough alloca can miss the stack redzone and hit arbitrary
+data.  GMP stack usage is normally a function of operand size, and
+knowing that might suffice for some applications.  Otherwise a fixed
+maximum usage can probably be obtained by building with
+--enable-alloca=malloc-reentrant (or notreentrant).
+
+
Actually recovering from stack overflow is of course another problem.
+It might be possible to catch a SIGSEGV in the stack redzone and do
+something in a sigaltstack, on systems which have that, but recovery
+might otherwise not be possible.  This is worth bearing in mind because
+there's no point worrying about tight and careful out-of-memory recovery
+if an out-of-stack is fatal.
+
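
For reference, the POSIX setup involved would be roughly the following
sketch.  Deciding whether the fault really was a stack overflow (say by
checking the fault address against the stack bounds) and recovering are
the hard parts and are not shown:

    #include <signal.h>
    #include <stdlib.h>
    #include <unistd.h>

    /* Catch SIGSEGV on an alternate stack, so the handler has room to
       run even when the normal stack is exhausted.  */
    static void
    segv_handler (int sig)
    {
      /* About all that's safely possible here is to report and exit,
         or siglongjmp to a prepared recovery point.  */
      (void) sig;
      _exit (1);
    }

    static void
    setup_stack_catch (void)
    {
      stack_t ss;
      struct sigaction sa;

      ss.ss_sp = malloc (SIGSTKSZ);
      ss.ss_size = SIGSTKSZ;
      ss.ss_flags = 0;
      sigaltstack (&ss, NULL);

      sa.sa_handler = segv_handler;
      sa.sa_flags = SA_ONSTACK;        /* run the handler on the alt stack */
      sigemptyset (&sa.sa_mask);
      sigaction (SIGSEGV, &sa, NULL);
    }
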
+
Add a test suite for old bugs.  These tests wouldn't loop or use
@@ -172,23 +333,40 @@
seeds used, and perhaps to snapshot operands before performing each
test, so any problem exposed can be reproduced.
-
-
-Please send comments about this page to
-tege@swox.com.
-Copyright (C) 1999, 2000 Torbjörn Granlund.
-
-
+It'd be nice to have some sort of tool for getting an overview of
+performance.  Clearly a great many things could be done, but some
+primary uses would be,
+
+A combination of measuring some fundamental routines and some
+representative application routines might satisfy these.
+
+
+The tune/time.c routines would be the easiest way to get good accurate
+measurements on lots of different systems.  The high level speed_measure
+may or may not suit, but the basic speed_starttime and speed_endtime
+would cover lots of portability and accuracy questions.
+
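
As a crude stand-in illustration of the kind of measurement loop
involved, using plain clock_gettime where the tune/ routines would
handle the portability and accuracy questions mentioned above, something
like this could time one routine at one operand size:

    #include <stdio.h>
    #include <time.h>
    #include <gmp.h>

    /* Rough timing of mpz_mul on random operands of the given bit size,
       returning seconds per call.  */
    static double
    time_mul (unsigned long bits, int reps)
    {
      mpz_t a, b, r;
      gmp_randstate_t rnd;
      struct timespec t0, t1;
      int i;

      gmp_randinit_default (rnd);
      mpz_init (a);  mpz_init (b);  mpz_init (r);
      mpz_urandomb (a, rnd, bits);
      mpz_urandomb (b, rnd, bits);

      clock_gettime (CLOCK_MONOTONIC, &t0);
      for (i = 0; i < reps; i++)
        mpz_mul (r, a, b);
      clock_gettime (CLOCK_MONOTONIC, &t1);

      mpz_clear (a);  mpz_clear (b);  mpz_clear (r);
      gmp_randclear (rnd);
      return ((t1.tv_sec - t0.tv_sec) + 1e-9 * (t1.tv_nsec - t0.tv_nsec)) / reps;
    }

    int
    main (void)
    {
      printf ("mpz_mul at 10000 bits: %g seconds\n", time_mul (10000, 1000));
      return 0;
    }
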
+
+
+