<!-- version 1.1.1.2, 2003/08/25 16:06:11 (previous version 1.1.1.1, 2000/09/09 14:12:20) -->

</h1>
</center>

<font size=-1>
Copyright 2000, 2001, 2002 Free Software Foundation, Inc. <br><br>
This file is part of the GNU MP Library. <br><br>
The GNU MP Library is free software; you can redistribute it and/or modify
it under the terms of the GNU Lesser General Public License as published
by the Free Software Foundation; either version 2.1 of the License, or (at
your option) any later version. <br><br>
The GNU MP Library is distributed in the hope that it will be useful, but
WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public
License for more details. <br><br>
You should have received a copy of the GNU Lesser General Public License
along with the GNU MP Library; see the file COPYING.LIB. If not, write to
the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston,
MA 02111-1307, USA.
</font>

<hr>
<!-- NB. timestamp updated automatically by emacs -->
<comment>
This file current as of 14 May 2002. An up-to-date version is available at
<a href="http://www.swox.com/gmp/projects.html">http://www.swox.com/gmp/projects.html</a>.
Please send comments about this page to
<a href="mailto:bug-gmp@gnu.org">bug-gmp@gnu.org</a>.
</comment>

<p> This file lists projects suitable for volunteers. Please see the

problems.)

<ul>
<li> <strong>Faster multiplication</strong>

<p> The current multiplication code uses Karatsuba, 3-way Toom-Cook,
or Fermat FFT. Several new developments are desirable:

<li> It's possible that CPU-dependent effects like cache locality will
have a greater impact on speed than algorithmic improvements.

<li> Add support for partial products, either a given number of low limbs
or high limbs of the result. A high partial product can be used by
<code>mpf_mul</code> now; a low-half partial product might be of use
in a future sub-quadratic REDC. On small sizes a partial product
will be faster simply through fewer cross-products, similar to the
way squaring is faster. But work by Thom Mulders shows that for
Karatsuba and higher-order algorithms the advantage is progressively
lost, so for large sizes partial products turn out to be no faster.

</ol>
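<p> To make the cross-product saving of a low partial product concrete, here is a minimal Python sketch. It is not GMP code, and the helper names <code>to_limbs</code> and <code>mullo_basecase</code> and the 64-bit limb size are assumptions for illustration. It returns the low <code>n</code> limbs of a product using only the cross-products a_i*b_j with i+j below n. That is n(n+1)/2 limb multiplies instead of the n*n of a full basecase.

```python
LIMB_BITS = 64            # assumed limb size
B = 2 ** LIMB_BITS


def to_limbs(x, n):
    """Return the n least significant limbs of x, least significant first."""
    return [(x >> (LIMB_BITS * i)) % B for i in range(n)]


def mullo_basecase(a, b, n):
    """Return ((a * b) mod B**n, number of limb products used).

    Only cross-products a_i*b_j with i + j below n are formed; every other
    term is a multiple of B**n and cannot affect the low n limbs.
    """
    al = to_limbs(a, n)
    bl = to_limbs(b, n)
    acc = 0
    products = 0
    for i in range(n):
        for j in range(n - i):          # keeps i + j below n
            acc += al[i] * bl[j] * B ** (i + j)
            products += 1
    return acc % B ** n, products
```

<p> For example, with n = 4 this forms 10 limb products where a full 4x4 basecase forms 16. A high partial product doesn't work out as neatly. The omitted low cross-products can still carry into the retained high limbs, so it needs an error bound or some extra guard limbs.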

<p> Another possibility would be an optimized cube. In the basecase that
should definitely be able to save cross-products in a similar fashion to
squaring, but some investigation might be needed into how best to adapt
the higher-order algorithms. It's not clear whether cubing or further
small powers have any particularly important uses, though.

<p> <li> <strong>Assembly routines</strong>

<p> Write new and improve existing assembly routines. The tests/devel
programs and the tune/speed.c and tune/many.pl programs are useful for
testing and timing the routines you write. See the README files in those
directories for more information.

<p> Please make sure your new routines are fast for these three situations:
<ol>

current code can mostly be reused. It should be possible to share code
between GCD and GCDEXT, and probably with Jacobi symbols too.

<p> Paul Zimmermann has worked on sub-quadratic GCD and GCDEXT, but it seems
that the most likely algorithm doesn't kick in until about 3000 limbs.

<p> <li> <strong>Math functions for the mpf layer</strong>

<p> Implement the functions of math.h for the GMP mpf layer! Check the book

functions are desirable: acos, acosh, asin, asinh, atan, atanh, atan2, cos,
cosh, exp, log, log10, pow, sin, sinh, tan, tanh.

<p> These are implemented in mpfr; redoing them in mpf might not be worth
bothering with if the long-term plan is to bring mpfr in as the new mpf.

<p> <li> <strong>Faster sqrt</strong>

<p> The current code uses divisions, which are reasonably fast, but it'd be
possible to use only multiplications by computing 1/sqrt(A) using this
formula:
<pre>
    x_{i+1} = x_i (3 - A x_i^2)/2.
</pre>
The final result is then obtained with one more multiplication:
<pre>
    sqrt(A) = A x_n.
</pre>
That final multiply might be the full size of the input (though it might
only need the high half of that), so there may or may not be any speedup
overall.

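<p> Here is a minimal Python sketch of the iteration above in fixed-point arithmetic. It is not GMP code. The function name, the 8 guard bits and the starting value 2^-ceil(b/2) are assumptions chosen for illustration. x approximates 1/sqrt(A) and is held as an integer scaled by 2^p, so every division is by a power of two, i.e. a shift. A final correction loop, which also uses multiplications only, turns the approximation into an exact integer square root.

```python
def isqrt_via_rsqrt(a):
    """Return floor(sqrt(a)) using multiplications and shifts only.

    x approximates 1/sqrt(a) and is held as the fixed-point integer
    x * 2**p, so the divisions by 2 and by powers of 2 are shifts.
    """
    if 0 > a:
        raise ValueError("square root of a negative number")
    if a == 0:
        return 0
    b = a.bit_length()                # 2**(b-1) is at most a, a is below 2**b
    p = b + 8                         # fraction bits, with 8 guard bits (assumed)
    three = 3 * 2 ** (2 * p)          # the constant 3 at scale 2**(2p)
    # x_0 = 2**-ceil(b/2) is below 1/sqrt(a), and the iterates then rise.
    x = 2 ** (p - (b + 1) // 2)
    while True:
        # x_{i+1} = x_i (3 - A x_i^2)/2, rescaled back to 2**p
        x_next = (x * (three - a * x * x)) >> (2 * p + 1)
        if not x_next > x:            # no further progress at this precision
            break
        x = x_next
    r = (a * x) >> p                  # sqrt(A) = A x_n
    # Truncation may leave r slightly low; correct with multiplications only.
    while a >= (r + 1) * (r + 1):
        r += 1
    while r * r > a:
        r -= 1
    return r
```

<p> For instance <code>isqrt_via_rsqrt(10**40 + 1)</code> returns <code>10**20</code>. The sketch multiplies the whole of <code>a</code> by <code>x</code> at the end, which is the full-size final multiply mentioned above, even though only the high part of the product is kept.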
<p> <li> <strong>Nth root</strong>


<p> If the routine becomes fast enough, perhaps square roots could be computed
using this function.

<p> <li> <strong>Quotient-Only Division</strong>

<p> Some work can be saved when only the quotient is required in a division:
the necessary correction of -0, -1 or -2 to the estimated quotient can
almost always be determined from only a few limbs of multiply and
subtract, rather than by forming a complete remainder. The greatest
savings are when the quotient is small compared to the dividend and
divisor.

<p> Some code along these lines can be found in the current
<code>mpn_tdiv_qr</code>, though it perhaps calculates bigger chunks of
remainder than strictly necessary. That function in its current form
then always goes on to calculate a full remainder anyway. Burnikel and
Ziegler describe a similar approach for the divide-and-conquer case.

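<p> The correction step can be illustrated for a single quotient limb. The Python sketch below is illustrative only and is not how <code>mpn_tdiv_qr</code> is written. It uses 16-bit limbs so that the awkward cases actually turn up, and the helper names are hypothetical. The sketch takes Knuth's estimate from the top limbs. It then decides each -1 correction by multiplying the estimate into just the two high limbs of the divisor. It falls back to the full product only when those limbs can't settle the comparison.

```python
LIMB_BITS = 16            # small limbs for the demo; GMP limbs are machine words
B = 2 ** LIMB_BITS


def num_limbs(x):
    """Number of limbs needed for x (x at least 1)."""
    return -(-x.bit_length() // LIMB_BITS)


def product_exceeds(q, d, n, drop):
    """Return q*d > n, using only the high limbs when they settle it.

    Split d and n at B**drop.  Since q is below B, the dropped cross term
    q*d_lo is below B**(drop+1), so q*d lies in the half-open range
    [q*d_hi * B**drop, (q*d_hi + B) * B**drop).
    """
    shift = LIMB_BITS * drop
    qd_hi = q * (d >> shift)
    n_hi = n >> shift
    if qd_hi > n_hi:                  # q*d is at least (n_hi + 1)*B**drop
        return True
    if n_hi >= qd_hi + B:             # q*d below (qd_hi + B)*B**drop, at most n
        return False
    return q * d > n                  # undecided: fall back to the full product


def quotient_limb(n, d):
    """Return n // d for a normalized d (top limb at least B/2) and n below B*d.

    Knuth's estimate from the top limbs is at most 2 too big; each -1
    correction is decided mostly from the two high limbs of d.
    """
    k = num_limbs(d)
    top = LIMB_BITS * (k - 1)
    q_hat = min((n >> top) // (d >> top), B - 1)
    drop = max(k - 2, 0)
    while product_exceeds(q_hat, d, n, drop):
        q_hat -= 1
    return q_hat
```

<p> The fallback is reached only when the high part of n exceeds q*d_hi by less than one limb's worth. Otherwise the low limbs of d and n are never used.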
<p> <li> <strong>Sub-Quadratic REDC and Exact Division</strong>

<p> <code>mpn_bdivmod</code> and the <code>redc</code> in
<code>mpz_powm</code> should use some sort of divide-and-conquer
algorithm. This would benefit <code>mpz_divexact</code>, and
<code>mpn_gcd</code> on large operands of unequal size. See "Exact
Division with Karatsuba Complexity" by Jebelean for a (brief)
description.

<p> Failing that, some sort of <code>DIVEXACT_THRESHOLD</code> could be
added to control whether <code>mpz_divexact</code> uses
<code>mpn_bdivmod</code> or <code>mpn_tdiv_qr</code>, since the latter
is faster on large divisors.

<p> For the REDC, basically all that's needed is Montgomery's algorithm done
in multi-limb integers. R is multiple limbs, and the inverse and
operands are multi-precision.

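<p> Here is a minimal sketch of Montgomery's REDC with R taken as a whole number of limbs, so that the inverse and all the operands are multi-precision. It is written in Python for brevity and is not GMP code. The names <code>montgomery_setup</code> and <code>redc</code> and the 64-bit limb size are assumptions. The products here are ordinary full products; a sub-quadratic version would substitute Karatsuba or better multiplies.

```python
LIMB_BITS = 64                    # assumed limb size


def montgomery_setup(n_mod):
    """For an odd modulus N of k limbs, return (k, R, N') with R = B**k.

    N' = -N**-1 mod R is the multi-limb inverse; computing it is the
    up-front cost that many modular operations share.
    """
    k = -(-n_mod.bit_length() // LIMB_BITS)
    r = 2 ** (LIMB_BITS * k)
    n_prime = -pow(n_mod, -1, r) % r
    return k, r, n_prime


def redc(t, n_mod, k, r, n_prime):
    """Return t * R**-1 mod N, for t from 0 up to N*R - 1."""
    m = (t % r) * n_prime % r                # only the low half of this product is needed
    u = (t + m * n_mod) >> (LIMB_BITS * k)   # exact: t + m*N is a multiple of R
    if u >= n_mod:                           # u is below 2N, so one subtraction suffices
        u -= n_mod
    return u
```

<p> In a powm built this way the operands would be held in Montgomery form, a*R mod N. Each modular multiply then becomes one product followed by one <code>redc</code>.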
<p> For exact division the time to calculate a multi-limb inverse is not
amortized across many modular operations, and will instead probably
create a threshold below which the current style of
<code>mpn_bdivmod</code> is best. There's also Krandick and Jebelean,
"Bidirectional Exact Integer Division", which basically uses a
low-to-high exact division for the low half of the quotient and a
quotient-only division for the high half.

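<p> The two halves can be computed independently. If q*d = n exactly and d is odd, then q mod 2^h is n times the inverse of d modulo 2^h, and the high part of q is just floor(n / (d*2^h)). The Python sketch below puts the two together. It is not GMP's code, the function name is hypothetical, and Python's built-in division stands in for a quotient-only division.

```python
def exact_divide(n, d):
    """Return n // d, assuming d is nonzero and divides n exactly.

    Bidirectional sketch: the low h bits of the quotient come from a 2-adic
    inverse of d (low to high, no remainder formed); the high bits come from
    a truncating division, standing in for a quotient-only division.
    """
    if d == 0:
        raise ZeroDivisionError("exact_divide by zero")
    # Strip common factors of two so d is odd, hence invertible mod 2**h.
    while d % 2 == 0:
        d //= 2
        n //= 2
    q_bits = max(n.bit_length() - d.bit_length() + 1, 1)  # bound on quotient size
    h = (q_bits + 1) // 2                 # split the quotient roughly in half
    mod = 2 ** h
    q_low = n * pow(d, -1, mod) % mod     # q mod 2**h, because q*d == n
    q_high = n // (d * mod)               # floor(q / 2**h), since q == n/d exactly
    return q_high * mod + q_low
```

<p> The low half is roughly what a low-to-high scheme like the current <code>mpn_bdivmod</code> produces. In the bidirectional method, the high half would come from a quotient-only division using just the high limbs of n and d.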
<p> Note that low-half and high-half multiplies, and a low-half square, can
be used here. Each ought to take as little as half the time of a full
multiply or square, though work by Thom Mulders shows the advantage is
progressively lost as Karatsuba and higher algorithms are applied.

<p> <li> <strong>Exceptions</strong>

<p> Some sort of scheme for exception handling would be desirable.
Presently the only thing documented is that divide by zero in GMP
functions provokes a deliberate machine divide by zero (on those systems
where such a thing exists at least). The global <code>gmp_errno</code>
is not actually documented, except for the old <code>gmp_randinit</code>
function. Since it is currently just a plain global, it's not
thread-safe.

<p> The basic choices for exceptions are returning an error code or having
a handler function to be called. The disadvantage of error returns is
that they have to be checked, leading to tedious and rarely executed code,
and strictly speaking such a scheme wouldn't be source or binary
compatible. The disadvantage of a handler function is that a
<code>longjmp</code> or similar recovery from it may be difficult. A
combination would be possible, for instance by allowing the handler to
return an error code.

<p> Divide-by-zero, sqrt-of-negative, and similar operand range errors can
normally be detected at the start of functions, so exception handling
would have a clean state. What's worth considering, though, is that the
GMP function detecting the exception may have been called via some
third-party library or self-contained application module, and hence have
various bits of state to be cleaned up above it. It'd be highly
desirable for an exceptions scheme to allow for such cleanups.

<p> The C++ destructor mechanism could help with cleanups both internally
and externally, but since GMP is a plain C library we don't want to
depend on that.

<p> A C++ <code>throw</code> might be a good optional extra exceptions
mechanism, perhaps under a build option. With GCC,
<code>-fexceptions</code> will add the necessary frame information to
plain C code, or GMP could be compiled as C++.

<p> Out-of-memory exceptions are expected to be handled by the
<code>mp_set_memory_functions</code> routines, rather than being a
prospective part of divide-by-zero etc. Some similar considerations
apply, but what differs is that out-of-memory can arise deep within GMP
internals. Even fundamental routines like <code>mpn_add_n</code> and
<code>mpn_addmul_1</code> can use temporary memory (for instance on Cray
vector systems). Allowing for an error code return would require an
awful lot of checking internally. Perhaps it'd still be worthwhile, but
it'd be a lot of changes and the extra code would probably be rather
rarely executed in normal usage.

<p> A <code>longjmp</code> recovery for out-of-memory will currently, in
general, lead to memory leaks and may leave GMP variables operated on in
inconsistent states. Maybe it'd be possible to record recovery
information for use by the relevant allocate or reallocate function, but
that too would be a lot of changes.

<p> One scheme for out-of-memory would be to note that all GMP allocations
go through the <code>mp_set_memory_functions</code> routines. So if the
application has an intended <code>setjmp</code> recovery point it can
record memory activity by GMP and abandon space allocated and variables
initialized after that point. This might be as simple as directing the
allocation functions to a separate pool, but in general would have the
disadvantage of needing application-level bookkeeping on top of the
normal system <code>malloc</code>. An advantage, however, is that it
needs nothing from GMP itself and on that basis doesn't burden
applications not needing recovery. Note that there are probably some
details to be worked out here about reallocs of existing variables, and
perhaps about copying or swapping between "permanent" and "temporary"
variables.

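<p> The bookkeeping half of that scheme can be modelled in a few lines. The Python sketch below is a hypothetical model only. The class and method names are invented, and blocks are plain ids rather than memory. In C the equivalent logic would sit behind the three functions registered with <code>mp_set_memory_functions</code>. <code>mark</code> would be called at the <code>setjmp</code> and <code>recover</code> after the <code>longjmp</code>.

```python
class RecoveryPool:
    """Hypothetical bookkeeping behind the allocation functions.

    Blocks are modelled as integer ids.  Blocks allocated after mark() are
    recorded, so that after a longjmp-style abort back to the recovery point
    they can all be abandoned at once.  Reallocs of older blocks are not
    handled; that is one of the details still to be worked out.
    """

    def __init__(self):
        self.live = {}          # block id -> size, for every outstanding block
        self.since_mark = []    # ids allocated since the recovery point
        self.next_id = 0

    def mark(self):
        """Note the recovery point (the application's setjmp)."""
        self.since_mark = []

    def allocate(self, size):
        block = self.next_id
        self.next_id += 1
        self.live[block] = size
        self.since_mark.append(block)
        return block

    def free(self, block):
        del self.live[block]
        if block in self.since_mark:
            self.since_mark.remove(block)

    def recover(self):
        """Abandon all blocks allocated since mark(); return how many."""
        abandoned = len(self.since_mark)
        for block in self.since_mark:
            del self.live[block]
        self.since_mark = []
        return abandoned
```

<p> Blocks that existed before the recovery point survive a <code>recover</code>. Anything allocated afterwards and not yet freed is dropped in one step.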
<p> Applications desiring fine-grained error control, for instance a
language interpreter, would very possibly not be well served by a scheme
requiring <code>longjmp</code>. Wrapping every GMP function call with a
<code>setjmp</code> would be very inconvenient.

<p> Stack overflow is another possible exception, but perhaps not one that
can be easily detected in general. On i386 GNU/Linux, for instance, GCC
normally doesn't generate stack probes for an <code>alloca</code>, but
merely adjusts <code>%esp</code>. A big enough <code>alloca</code> can
miss the stack redzone and hit arbitrary data. GMP stack usage is
normally a function of operand size; knowing that might suffice for some
applications. Otherwise a fixed maximum usage can probably be obtained
by building with <code>--enable-alloca=malloc-reentrant</code> (or
<code>notreentrant</code>).

<p> Actually recovering from stack overflow is of course another problem.
It might be possible to catch a <code>SIGSEGV</code> in the stack
redzone and do something in a <code>sigaltstack</code>, on systems which
have that, but recovery might otherwise not be possible. This is worth
bearing in mind because there's no point worrying about tight and
careful out-of-memory recovery if an out-of-stack is fatal.

<p> <li> <strong>Test Suite</strong>

<p> Add a test suite for old bugs. These tests wouldn't loop or use

seeds used, and perhaps to snapshot operands before performing
each test, so any problem exposed can be reproduced.

<p> <li> <strong>Performance Tool</strong>

<p> It'd be nice to have some sort of tool for getting an overview of
performance. Clearly a great many things could be done, but some
primary uses would be:
<ol>
<li> Checking speed variations between compilers.
<li> Checking relative performance between systems or CPUs.
</ol>

<p> A combination of measuring some fundamental routines and some
representative application routines might satisfy these.

<p> The tune/time.c routines would be the easiest way to get good,
accurate measurements on lots of different systems. The high-level
<code>speed_measure</code> may or may not suit, but the basic
<code>speed_starttime</code> and <code>speed_endtime</code> would cover
lots of portability and accuracy questions.

</ul>
<hr>

</body>
</html>

<!--
Local variables:
eval: (add-hook 'write-file-hooks 'time-stamp)
time-stamp-start: "This file current as of "
time-stamp-format: "%:d %3b %:y"
time-stamp-end: "\\."
time-stamp-line-limit: 50
End:
-->