version 1.1, 2000/09/09 14:12:20 |
version 1.1.1.2, 2003/08/25 16:06:11 |
|
|
</h1> |
</h1> |
</center> |
</center> |
|
|
|
<font size=-1> |
|
Copyright 2000, 2001, 2002 Free Software Foundation, Inc. <br><br> |
|
This file is part of the GNU MP Library. <br><br> |
|
The GNU MP Library is free software; you can redistribute it and/or modify |
|
it under the terms of the GNU Lesser General Public License as published |
|
by the Free Software Foundation; either version 2.1 of the License, or (at |
|
your option) any later version. <br><br> |
|
The GNU MP Library is distributed in the hope that it will be useful, but |
|
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY |
|
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public |
|
License for more details. <br><br> |
|
You should have received a copy of the GNU Lesser General Public License |
|
along with the GNU MP Library; see the file COPYING.LIB. If not, write to |
|
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, |
|
MA 02111-1307, USA. |
|
</font> |
|
|
|
<hr> |
|
<!-- NB. timestamp updated automatically by emacs --> |
<comment> |
<comment> |
An up-to-date version of this file is available at |
This file current as of 14 May 2002. An up-to-date version is available at |
<a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>. |
<a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>. |
|
Please send comments about this page to |
|
<a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>. |
</comment> |
</comment> |
|
|
<p> This file lists projects suitable for volunteers. Please see the |
<p> This file lists projects suitable for volunteers. Please see the |
|
|
problems.) |
problems.) |
|
|
<ul> |
<ul> |
<li> <strong>C++ wrapper</strong> |
<li> <strong>Faster multiplication</strong> |
|
|
<p> A C++ wrapper for GMP is highly desirable. Several people have started |
|
writing one, but nobody has so far finished it. For a wrapper to be useful, |
|
one needs to pay attention to efficiency. |
|
|
|
<ol> |
|
<li> Write arithmetic functions for direct applications of most |
|
elementary C++ types. This might be necessary to avoid |
|
ambiguities, but it is also desirable from an efficiency |
|
standpoint. |
|
<li> Avoid running constructors/destructors when not necessary. |
|
For example, implement <code>a += b</code> by directly applying mpz_add. |
|
</ol> |
|
|
|
<p> <li> <strong>Faster multiplication</strong> |
|
|
|
<p> The current multiplication code uses Karatsuba, 3-way Toom-Cook, |
<p> The current multiplication code uses Karatsuba, 3-way Toom-Cook, |
or Fermat FFT. Several new developments are desirable: |
or Fermat FFT. Several new developments are desirable: |
|
|
|
|
<li> It's possible CPU dependent effects like cache locality will |
<li> It's possible CPU dependent effects like cache locality will |
have a greater impact on speed than algorithmic improvements. |
have a greater impact on speed than algorithmic improvements. |
|
|
|
<li> Add support for partial products, either a given number of low limbs |
|
or high limbs of the result. A high partial product can be used by |
|
<code>mpf_mul</code> now, a low half partial product might be of use |
|
in a future sub-quadratic REDC. On small sizes a partial product |
|
will be faster simply through fewer cross-products, similar to the |
|
way squaring is faster. But work by Thom Mulders shows that for |
|
Karatsuba and higher order algorithms the advantage is progressively |
|
lost, so for large sizes partial products turn out to be no faster. |
|
|
</ol> |
</ol> |
|
|
|
<p> Another possibility would be an optimized cube. In the basecase that |
|
should definitely be able to save cross-products in a similar fashion to |
|
squaring, but some investigation might be needed for how best to adapt |
|
the higher-order algorithms. Not sure whether cubing or further small |
|
powers have any particularly important uses though. |
|
|
<p> <li> <strong>Assembly routines</strong> |
<p> <li> <strong>Assembly routines</strong> |
|
|
<p> Write new and tune existing assembly routines. The programs in mpn/tests |
<p> Write new and improve existing assembly routines. The tests/devel |
and the tune/speed.c program are useful for testing and timing the routines |
programs and the tune/speed.c and tune/many.pl programs are useful for |
you write. See the README files in those directories for more information. |
testing and timing the routines you write. See the README files in those |
|
directories for more information. |
|
|
<p> Please make sure your new routines are fast for these three situations: |
<p> Please make sure your new routines are fast for these three situations: |
<ol> |
<ol> |
|
|
current code can mostly be reused. It should be possible to share code |
current code can mostly be reused. It should be possible to share code |
between GCD and GCDEXT, and probably with Jacobi symbols too. |
between GCD and GCDEXT, and probably with Jacobi symbols too. |
|
|
|
<p> Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems |
|
that the most likely algorithm doesn't kick in until about 3000 limbs. |
|
|
<p> <li> <strong>Math functions for the mpf layer</strong> |
<p> <li> <strong>Math functions for the mpf layer</strong> |
|
|
<p> Implement the functions of math.h for the GMP mpf layer! Check the book |
<p> Implement the functions of math.h for the GMP mpf layer! Check the book |
|
|
functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos, |
functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos, |
cosh, exp, log, log10, pow, sin, sinh, tan, tanh. |
cosh, exp, log, log10, pow, sin, sinh, tan, tanh. |
|
|
|
<p> These are implemented in mpfr, redoing them in mpf might not be worth |
|
bothering with, if the long term plan is to bring mpfr in as the new mpf. |
|
|
<p> <li> <strong>Faster sqrt</strong> |
<p> <li> <strong>Faster sqrt</strong> |
|
|
<p> The current code for computing square roots use a Newton iteration that |
<p> The current code uses divisions, which are reasonably fast, but it'd be |
rely on division. It is possible to avoid using division by computing |
possible to use only multiplications by computing 1/sqrt(A) using this |
1/sqrt(A) using this formula: |
formula: |
<pre> |
<pre> |
2 |
2 |
x = x (3 - A x )/2. |
x = x (3 - A x )/2. |
i+1 i i </pre> |
i+1 i i </pre> |
The final result is then computed without division using, |
And the final result |
<pre> |
<pre> |
sqrt(A) = A x . |
sqrt(A) = A x . |
n </pre> |
n </pre> |
The resulting code should be substantially faster than the current code. |
That final multiply might be the full size of the input (though it might |
|
only need the high half of that), so there may or may not be any speedup |
|
overall. |
|
|
<p> <li> <strong>Nth root</strong> |
<p> <li> <strong>Nth root</strong> |
|
|
|
|
<p> If the routine becomes fast enough, perhaps square roots could be computed |
<p> If the routine becomes fast enough, perhaps square roots could be computed |
using this function. |
using this function. |
|
|
<p> <li> <strong>More random number generators</strong> |
<p> <li> <strong>Quotient-Only Division</strong> |
|
|
<p> Implement some more pseudo random number generator algorithms. |
<p> Some work can be saved when only the quotient is required in a division, |
Today there's only Linear Congruential. |
basically the necessary correction -0, -1 or -2 to the estimated |
|
quotient can almost always be determined from only a few limbs of |
|
multiply and subtract, rather than forming a complete remainder. The |
|
greatest savings are when the quotient is small compared to the dividend |
|
and divisor. |
|
|
|
<p> Some code along these lines can be found in the current |
|
<code>mpn_tdiv_qr</code>, though perhaps calculating bigger chunks of |
|
remainder than might be strictly necessary. That function in its |
|
current form actually then always goes on to calculate a full remainder. |
|
Burnikel and Zeigler describe a similar approach for the divide and |
|
conquer case. |
|
|
|
<p> <li> <strong>Sub-Quadratic REDC and Exact Division</strong> |
|
|
|
<p> <code>mpn_bdivmod</code> and the <code>redc</code> in |
|
<code>mpz_powm</code> should use some sort of divide and conquer |
|
algorithm. This would benefit <code>mpz_divexact</code>, and |
|
<code>mpn_gcd</code> on large unequal size operands. See "Exact |
|
Division with Karatsuba Complexity" by Jebelean for a (brief) |
|
description. |
|
|
|
<p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be |
|
added to control whether <code>mpz_divexact</code> uses |
|
<code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter |
|
is faster on large divisors. |
|
|
|
<p> For the REDC, basically all that's needed is Montgomery's algorithm done |
|
in multi-limb integers. R is multiple limbs, and the inverse and |
|
operands are multi-precision. |
|
|
|
<p> For exact division the time to calculate a multi-limb inverse is not |
|
amortized across many modular operations, but instead will probably |
|
create a threshold below which the current style |
|
<code>mpn_bdivmod</code> is best. There's also Krandick and Jebelean, |
|
"Bidirectional Exact Integer Division" to basically use a low to high |
|
exact division for the low half quotient, and a quotient-only division |
|
for the high half. |
|
|
|
<p> It will be noted that low-half and high-half multiplies, and a low-half |
|
square, can be used. These ought to each take as little as half the |
|
time of a full multiply, or square, though work by Thom Mulders shows |
|
the advantage is progressively lost as Karatsuba and higher algorithms |
|
are applied. |
|
|
|
<p> <li> <strong>Exceptions</strong> |
|
|
|
<p> Some sort of scheme for exceptions handling would be desirable. |
|
Presently the only thing documented is that divide by zero in GMP |
|
functions provokes a deliberate machine divide by zero (on those systems |
|
where such a thing exists at least). The global <code>gmp_errno</code> |
|
is not actually documented, except for the old <code>gmp_randinit</code> |
|
function. Being currently just a plain global means it's not |
|
thread-safe. |
|
|
|
<p> The basic choices for exceptions are returning an error code or having |
|
a handler function to be called. The disadvantage of error returns is |
|
they have to be checked, leading to tedious and rarely executed code, |
|
and strictly speaking such a scheme wouldn't be source or binary |
|
compatible. The disadvantage of a handler function is that a |
|
<code>longjmp</code> or similar recovery from it may be difficult. A |
|
combination would be possible, for instance by allowing the handler to |
|
return an error code. |
|
|
|
<p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can |
|
normally be detected at the start of functions, so exception handling |
|
would have a clean state. What's worth considering though is that the |
|
GMP function detecting the exception may have been called via some third |
|
party library or self contained application module, and hence have |
|
various bits of state to be cleaned up above it. It'd be highly |
|
desirable for an exceptions scheme to allow for such cleanups. |
|
|
|
<p> The C++ destructor mechanism could help with cleanups both internally |
|
and externally, but being a plain C library we don't want to depend on |
|
that. |
|
|
|
<p> A C++ <code>throw</code> might be a good optional extra exceptions |
|
mechanism, perhaps under a build option. For GCC |
|
<code>-fexceptions</code> will add the necessary frame information to |
|
plain C code, or GMP could be compiled as C++. |
|
|
|
<p> Out-of-memory exceptions are expected to be handled by the |
|
<code>mp_set_memory_functions</code> routines, rather than being a |
|
prospective part of divide-by-zero etc. Some similar considerations |
|
apply but what differs is that out-of-memory can arise deep within GMP |
|
internals. Even fundamental routines like <code>mpn_add_n</code> and |
|
<code>mpn_addmul_1</code> can use temporary memory (for instance on Cray |
|
vector systems). Allowing for an error code return would require an |
|
awful lot of checking internally. Perhaps it'd still be worthwhile, but |
|
it'd be a lot of changes and the extra code would probably be rather |
|
rarely executed in normal usages. |
|
|
|
<p> A <code>longjmp</code> recovery for out-of-memory will currently, in |
|
general, lead to memory leaks and may leave GMP variables operated on in |
|
inconsistent states. Maybe it'd be possible to record recovery |
|
information for use by the relevant allocate or reallocate function, but |
|
that too would be a lot of changes. |
|
|
|
<p> One scheme for out-of-memory would be to note that all GMP allocations |
|
go through the <code>mp_set_memory_functions</code> routines. So if the |
|
application has an intended <code>setjmp</code> recovery point it can |
|
record memory activity by GMP and abandon space allocated and variables |
|
initialized after that point. This might be as simple as directing the |
|
allocation functions to a separate pool, but in general would have the |
|
disadvantage of needing application-level bookkeeping on top of the |
|
normal system <code>malloc</code>. An advantage however is that it |
|
needs nothing from GMP itself and on that basis doesn't burden |
|
applications not needing recovery. Note that there's probably some |
|
details to be worked out here about reallocs of existing variables, and |
|
perhaps about copying or swapping between "permanent" and "temporary" |
|
variables. |
|
|
|
<p> Applications desiring a fine-grained error control, for instance a |
|
language interpreter, would very possibly not be well served by a scheme |
|
requiring <code>longjmp</code>. Wrapping every GMP function call with a |
|
<code>setjmp</code> would be very inconvenient. |
|
|
|
<p> Stack overflow is another possible exception, but perhaps not one that |
|
can be easily detected in general. On i386 GNU/Linux for instance GCC |
|
normally doesn't generate stack probes for an <code>alloca</code>, but |
|
merely adjusts <code>%esp</code>. A big enough <code>alloca</code> can |
|
miss the stack redzone and hit arbitrary data. GMP stack usage is |
|
normally a function of operand size, knowing that might suffice for some |
|
applications. Otherwise a fixed maximum usage can probably be obtained |
|
by building with <code>--enable-alloca=malloc-reentrant</code> (or |
|
<code>notreentrant</code>). |
|
|
|
<p> Actually recovering from stack overflow is of course another problem. |
|
It might be possible to catch a <code>SIGSEGV</code> in the stack |
|
redzone and do something in a <code>sigaltstack</code>, on systems which |
|
have that, but recovery might otherwise not be possible. This is worth |
|
bearing in mind because there's no point worrying about tight and |
|
careful out-of-memory recovery if an out-of-stack is fatal. |
|
|
|
|
<p> <li> <strong>Test Suite</strong> |
<p> <li> <strong>Test Suite</strong> |
|
|
<p> Add a test suite for old bugs. These tests wouldn't loop or use |
<p> Add a test suite for old bugs. These tests wouldn't loop or use |
|
|
seeds used, and perhaps to snapshot operands before performing |
seeds used, and perhaps to snapshot operands before performing |
each test, so any problem exposed can be reproduced. |
each test, so any problem exposed can be reproduced. |
|
|
</ul> |
|
|
|
<hr> |
<p> <li> <strong>Performance Tool</strong> |
|
|
<table width="100%"> |
<p> It'd be nice to have some sort of tool for getting an overview of |
<tr> |
performance. Clearly a great many things could be done, but some |
<td> |
primary uses would be, |
<font size=2> |
|
Please send comments about this page to |
|
<a href="mailto:tege@swox.com">tege@swox.com</a>.<br> |
|
Copyright (C) 1999, 2000 Torbjörn Granlund. |
|
</font> |
|
</td> |
|
<td align=right> |
|
</td> |
|
</tr> |
|
</table> |
|
|
|
|
<ol> |
|
<li> Checking speed variations between compilers. |
|
<li> Checking relative performance between systems or CPUs. |
|
</ol> |
|
|
|
<p> A combination of measuring some fundamental routines and some |
|
representative application routines might satisfy these. |
|
|
|
<p> The tune/time.c routines would be the easiest way to get good |
|
accurate measurements on lots of different systems. The high level |
|
<code>speed_measure</code> may or may not suit, but the basic |
|
<code>speed_starttime</code> and <code>speed_endtime</code> would cover |
|
lots of portability and accuracy questions. |
|
|
|
|
|
</ul> |
|
<hr> |
|
|
</body> |
</body> |
</html> |
</html> |
|
|
|
<!-- |
|
Local variables: |
|
eval: (add-hook 'write-file-hooks 'time-stamp) |
|
time-stamp-start: "This file current as of " |
|
time-stamp-format: "%:d %3b %:y" |
|
time-stamp-end: "\\." |
|
time-stamp-line-limit: 50 |
|
End: |
|
--> |