===================================================================
RCS file: /home/cvs/OpenXM_contrib/gmp/mpn/x86/k6/Attic/README,v
retrieving revision 1.1
retrieving revision 1.1.1.2
diff -u -p -r1.1 -r1.1.1.2
--- OpenXM_contrib/gmp/mpn/x86/k6/Attic/README	2000/09/09 14:12:42	1.1
+++ OpenXM_contrib/gmp/mpn/x86/k6/Attic/README	2003/08/25 16:06:27	1.1.1.2
@@ -1,4 +1,25 @@
+Copyright 2000, 2001 Free Software Foundation, Inc.
+This file is part of the GNU MP Library.
+
+The GNU MP Library is free software; you can redistribute it and/or modify
+it under the terms of the GNU Lesser General Public License as published by
+the Free Software Foundation; either version 2.1 of the License, or (at your
+option) any later version.
+
+The GNU MP Library is distributed in the hope that it will be useful, but
+WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY
+or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public
+License for more details.
+
+You should have received a copy of the GNU Lesser General Public License
+along with the GNU MP Library; see the file COPYING.LIB.  If not, write to
+the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA
+02111-1307, USA.
+
+
+
+

                       AMD K6 MPN SUBROUTINES


@@ -6,9 +27,10 @@
 This directory contains code optimized for AMD K6 CPUs, meaning K6, K6-2 and
 K6-3.

-The mmx and k62mmx subdirectories have routines using MMX instructions.  All
-K6s have MMX, the separate directories are just so that ./configure can omit
-them if the assembler doesn't support MMX.
+The mmx subdirectory has MMX code suiting plain K6, the k62mmx subdirectory
+has MMX code suiting K6-2 and K6-3.  All chips in the K6 family have MMX,
+the separate directories are just so that ./configure can omit them if the
+assembler doesn't support MMX.



@@ -28,38 +50,21 @@ Times for the loops, with all code and data in L1 cach
         mpn_sqr_basecase           4.7  cycles/crossproduct (approx)
                                 or 9.2  cycles/triangleproduct (approx)

+        mpn_l/rshift               3.0
+
         mpn_divrem_1              20.0
         mpn_mod_1                 20.0
         mpn_divexact_by3          11.0

-        mpn_l/rshift               3.0
+        mpn_copyi                  1.0
+        mpn_copyd                  1.0

-        mpn_copyi/copyd            1.0
-        mpn_com_n                  1.5-1.85  \
-        mpn_and/andn/ior/xor_n     1.5-1.75  | varying with
-        mpn_iorn/xnor_n            2.0-2.25  | data alignment
-        mpn_nand/nior_n            2.0-2.25  /
-
-        mpn_popcount              12.5
-        mpn_hamdist               13.0
-
-

 K6-2 and K6-3 have dual-issue MMX and get the following improvements.

         mpn_l/rshift               1.75


-        mpn_copyi/copyd            0.56 or 1.0  \
-                                                |
-        mpn_com_n                  1.0-1.2      | varying with
-        mpn_and/andn/ior/xor_n     1.2-1.5      | data alignment
-        mpn_iorn/xnor_n            1.5-2.0      |
-        mpn_nand/nior_n            1.75-2.0     /
-        mpn_popcount               9.0
-        mpn_hamdist               11.5
-
-
 Prefetching of sources hasn't yet given any joy.  With the 3DNow "prefetch"
 instruction, code seems to run slower, and with just "mov" loads it doesn't
 seem faster.  Results so far are inconsistent.  The K6 does a hardware
@@ -74,7 +79,7 @@ NOTES
 All K6 family chips have MMX, but only K6-2 and K6-3 have 3DNow.

 Plain K6 executes MMX instructions only in the X pipe, but K6-2 and K6-3 can
-execute them in both X and Y (and together).
+execute them in both X and Y (and in both together).

 Branch misprediction penalty is 1 to 4 cycles (Optimization Manual chapter 6
 table 12).
@@ -163,7 +168,7 @@ Addressing modes
   happens with forms like "0F opcode mod/rm" with mod/rm=00-xxx-100 since with
   mod=00 the sib determines whether there's a displacement.

-  This affects all MMX and 3DNow instructions, and others with an 0F prefix
+  This affects all MMX and 3DNow instructions, and others with an 0F prefix,
   like movzbl.  The modes affected are anything with an index and no
   displacement, or an index but no base, and this includes (%esp) which is
   really (,%esp,1).
@@ -188,7 +193,7 @@ Various
 - femms        3 cycles
 - jecxz        2 cycles taken, 13 not taken (optimization manual says 7 not taken)
 - divl         20 cycles back-to-back
-- imull        2 decode, 2 execute
+- imull        2 decode, 3 execute
 - mull         2 decode, 3 execute (optimization manual decoding sample)
 - prefetch     2 cycles
 - rcll/rcrl    implicit by one bit: 2 cycles