GhaSShee

Algorithms

# Continuous Fraction ## Algorith of Float2Fractional There is a function ~~~ countinuousFraction :: Float -> Fractional /* ** find rational approximation to given real number ** David Eppstein / UC Irvine / 8 Aug 1993 ** ** With corrections from Arno Formella, May 2008 ** ** usage: a.out r d ** r is real number to approx ** d is the maximum denominator allowed ** ** based on the theory of continued fractions ** if x = a1 + 1/(a2 + 1/(a3 + 1/(a4 + ...))) ** then best approximation is found by truncating this series ** (with some adjustments in the last term). ** ** Note the fraction can be recovered as the first column of the matrix ** ( a1 1 ) ( a2 1 ) ( a3 1 ) ... ** ( 1 0 ) ( 1 0 ) ( 1 0 ) ** Instead of keeping the sequence of continued fraction terms, ** we just keep the last partial product of these matrices. */ #include main(ac, av) int ac; char ** av; { double atof(); int atoi(); void exit(); long m[2][2]; double x, startx; long maxden; long ai; /* read command line arguments */ if (ac != 3) { fprintf(stderr, "usage: %s r d\n",av[0]); // AF: argument missing exit(1); } startx = x = atof(av[1]); maxden = atoi(av[2]); /* initialize matrix */ m[0][0] = m[1][1] = 1; m[0][1] = m[1][0] = 0; /* loop finding terms until denom gets too big */ while (m[1][0] * ( ai = (long)x ) + m[1][1] <= maxden) { long t; t = m[0][0] * ai + m[0][1]; m[0][1] = m[0][0]; m[0][0] = t; t = m[1][0] * ai + m[1][1]; m[1][1] = m[1][0]; m[1][0] = t; if(x==(double)ai) break; // AF: division by zero x = 1/(x - (double) ai); if(x>(double)0x7FFFFFFF) break; // AF: representation failure } /* now remaining x is between 0 and 1/ai */ /* approx as either 0 or 1/m where m is max that will fit in maxden */ /* first try zero */ printf("%ld/%ld, error = %e\n", m[0][0], m[1][0], startx - ((double) m[0][0] / (double) m[1][0])); /* now try other possibility */ ai = (maxden - m[1][1]) / m[1][0]; m[0][0] = m[0][0] * ai + m[0][1]; m[1][0] = m[1][0] * ai + m[1][1]; printf("%ld/%ld, error = %e\n", m[0][0], m[1][0], startx - ((double) m[0][0] / (double) m[1][0])); } ~~~ https://stackoverflow.com/questions/95727/how-to-convert-floats-to-human-readable-fractions ## Homotopy Method There is an alternative to Newton Method since it is difficult to find answer always, think of equations with exponentials. # sine definition ## Basic idea - http://stackoverflow.com/questions/26146190/how-is-sine-implemented-in-java - https://whiteboardfox.com/60544-8397-9933   $ 6 $ 次までのテーラー展開で $ 2 ^ {-58} $ の精度、
  さらに近似式により誤差をなくすように設計されている ~~~ 33 * Algorithm 34 * 1. Since sin(-x) = -sin(x), we need only to consider positive x. 35 * 2. if x < 2^-27 (hx < 0x3e400000 0), return x with inexact if x!=0. 36 * 3. sin(x) is approximated by a polynomial of degree 13 on 37 * [0,pi/4] 38 * 3 13 39 * sin(x) ~ x + S1*x + ... + S6*x 40 * where 41 * 42 * |sin(x) 2 4 6 8 10 12 | -58 43 * |----- - (1+S1*x +S2*x +S3*x +S4*x +S5*x +S6*x )| <= 2 44 * | x | 45 * 46 * 4. sin(x+y) = sin(x) + sin'(x')*y 47 * ~ sin(x) + (1-x*x/2)*y 48 * For better accuracy, let 49 * 3 2 2 2 2 50 * r = x *(S2+x *(S3+x *(S4+x *(S5+x *S6)))) 51 * then 3 2 52 * sin(x) = x + (S1*x + (x *(r-y/2)+y)) ~~~ ## In reality [sine() depends on the domain ](https://sourceware.org/git/?p=glibc.git;a=blob;f=sysdeps/ieee754/dbl-64/s_sin.c;hb=HEAD#l281) from [here](https://stackoverflow.com/questions/2284860/how-does-c-compute-sin-and-other-math-functions) ~~~ /* * IBM Accurate Mathematical Library * written by International Business Machines Corp. * Copyright (C) 2001-2017 Free Software Foundation, Inc. * * This program is free software; you can redistribute it and/or modify * it under the terms of the GNU Lesser General Public License as published by * the Free Software Foundation; either version 2.1 of the License, or * (at your option) any later version. * * This program is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the * GNU Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public License * along with this program; if not, see . */ /****************************************************************************/ /* */ /* MODULE_NAME:usncs.c */ /* */ /* FUNCTIONS: usin */ /* ucos */ /* slow */ /* slow1 */ /* slow2 */ /* sloww */ /* sloww1 */ /* sloww2 */ /* bsloww */ /* bsloww1 */ /* bsloww2 */ /* cslow2 */ /* FILES NEEDED: dla.h endian.h mpa.h mydefs.h usncs.h */ /* branred.c sincos32.c dosincos.c mpa.c */ /* sincos.tbl */ /* */ /* An ultimate sin and routine. Given an IEEE double machine number x */ /* it computes the correctly rounded (to nearest) value of sin(x) or cos(x) */ /* Assumption: Machine arithmetic operations are performed in */ /* round to nearest mode of IEEE 754 standard. */ /* */ /****************************************************************************/ #include #include #include "endian.h" #include "mydefs.h" #include "usncs.h" #include "MathLib.h" #include #include #include /* Helper macros to compute sin of the input values. */ #define POLYNOMIAL2(xx) ((((s5 * (xx) + s4) * (xx) + s3) * (xx) + s2) * (xx)) #define POLYNOMIAL(xx) (POLYNOMIAL2 (xx) + s1) /* The computed polynomial is a variation of the Taylor series expansion for sin(a): a - a^3/3! + a^5/5! - a^7/7! + a^9/9! + (1 - a^2) * da / 2 The constants s1, s2, s3, etc. are pre-computed values of 1/3!, 1/5! and so on. The result is returned to LHS and correction in COR. */ #define TAYLOR_SIN(xx, a, da, cor) \ ({ \ double t = ((POLYNOMIAL (xx) * (a) - 0.5 * (da)) * (xx) + (da)); \ double res = (a) + t; \ (cor) = ((a) - res) + t; \ res; \ }) /* This is again a variation of the Taylor series expansion with the term x^3/3! expanded into the following for better accuracy: bb * x ^ 3 + 3 * aa * x * x1 * x2 + aa * x1 ^ 3 + aa * x2 ^ 3 The correction term is dx and bb + aa = -1/3! */ #define TAYLOR_SLOW(x0, dx, cor) \ ({ \ static const double th2_36 = 206158430208.0; /* 1.5*2**37 */ \ double xx = (x0) * (x0); \ double x1 = ((x0) + th2_36) - th2_36; \ double y = aa * x1 * x1 * x1; \ double r = (x0) + y; \ double x2 = ((x0) - x1) + (dx); \ double t = (((POLYNOMIAL2 (xx) + bb) * xx + 3.0 * aa * x1 * x2) \ * (x0) + aa * x2 * x2 * x2 + (dx)); \ t = (((x0) - r) + y) + t; \ double res = r + t; \ (cor) = (r - res) + t; \ res; \ }) #define SINCOS_TABLE_LOOKUP(u, sn, ssn, cs, ccs) \ ({ \ int4 k = u.i[LOW_HALF] << 2; \ sn = __sincostab.x[k]; \ ssn = __sincostab.x[k + 1]; \ cs = __sincostab.x[k + 2]; \ ccs = __sincostab.x[k + 3]; \ }) #ifndef SECTION # define SECTION #endif extern const union { int4 i[880]; double x[440]; } __sincostab attribute_hidden; static const double sn3 = -1.66666666666664880952546298448555E-01, sn5 = 8.33333214285722277379541354343671E-03, cs2 = 4.99999999999999999999950396842453E-01, cs4 = -4.16666666666664434524222570944589E-02, cs6 = 1.38888874007937613028114285595617E-03; static const double t22 = 0x1.8p22; void __dubsin (double x, double dx, double w[]); void __docos (double x, double dx, double w[]); double __mpsin (double x, double dx, bool reduce_range); double __mpcos (double x, double dx, bool reduce_range); static double slow (double x); static double slow1 (double x); static double slow2 (double x); static double sloww (double x, double dx, double orig, bool shift_quadrant); static double sloww1 (double x, double dx, double orig, bool shift_quadrant); static double sloww2 (double x, double dx, double orig, int n); static double bsloww (double x, double dx, double orig, int n); static double bsloww1 (double x, double dx, double orig, int n); static double bsloww2 (double x, double dx, double orig, int n); int __branred (double x, double *a, double *aa); static double cslow2 (double x); /* Given a number partitioned into X and DX, this function computes the cosine of the number by combining the sin and cos of X (as computed by a variation of the Taylor series) with the values looked up from the sin/cos table to get the result in RES and a correction value in COR. */ static inline double __always_inline do_cos (double x, double dx, double *corp) { mynumber u; if (x < 0) dx = -dx; u.x = big + fabs (x); x = fabs (x) - (u.x - big) + dx; double xx, s, sn, ssn, c, cs, ccs, res, cor; xx = x * x; s = x + x * xx * (sn3 + xx * sn5); c = xx * (cs2 + xx * (cs4 + xx * cs6)); SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs); cor = (ccs - s * ssn - cs * c) - sn * s; res = cs + cor; cor = (cs - res) + cor; *corp = cor; return res; } /* A more precise variant of DO_COS. EPS is the adjustment to the correction COR. */ static inline double __always_inline do_cos_slow (double x, double dx, double eps, double *corp) { mynumber u; if (x <= 0) dx = -dx; u.x = big + fabs (x); x = fabs (x) - (u.x - big); double xx, y, x1, x2, e1, e2, res, cor; double s, sn, ssn, c, cs, ccs; xx = x * x; s = x * xx * (sn3 + xx * sn5); c = x * dx + xx * (cs2 + xx * (cs4 + xx * cs6)); SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs); x1 = (x + t22) - t22; x2 = (x - x1) + dx; e1 = (sn + t22) - t22; e2 = (sn - e1) + ssn; cor = (ccs - cs * c - e1 * x2 - e2 * x) - sn * s; y = cs - e1 * x1; cor = cor + ((cs - y) - e1 * x1); res = y + cor; cor = (y - res) + cor; cor = 1.0005 * cor + __copysign (eps, cor); *corp = cor; return res; } /* Given a number partitioned into X and DX, this function computes the sine of the number by combining the sin and cos of X (as computed by a variation of the Taylor series) with the values looked up from the sin/cos table to get the result in RES and a correction value in COR. */ static inline double __always_inline do_sin (double x, double dx, double *corp) { mynumber u; if (x <= 0) dx = -dx; u.x = big + fabs (x); x = fabs (x) - (u.x - big); double xx, s, sn, ssn, c, cs, ccs, cor, res; xx = x * x; s = x + (dx + x * xx * (sn3 + xx * sn5)); c = x * dx + xx * (cs2 + xx * (cs4 + xx * cs6)); SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs); cor = (ssn + s * ccs - sn * c) + cs * s; res = sn + cor; cor = (sn - res) + cor; *corp = cor; return res; } /* A more precise variant of DO_SIN. EPS is the adjustment to the correction COR. */ static inline double __always_inline do_sin_slow (double x, double dx, double eps, double *corp) { mynumber u; if (x <= 0) dx = -dx; u.x = big + fabs (x); x = fabs (x) - (u.x - big); double xx, y, x1, x2, c1, c2, res, cor; double s, sn, ssn, c, cs, ccs; xx = x * x; s = x * xx * (sn3 + xx * sn5); c = xx * (cs2 + xx * (cs4 + xx * cs6)); SINCOS_TABLE_LOOKUP (u, sn, ssn, cs, ccs); x1 = (x + t22) - t22; x2 = (x - x1) + dx; c1 = (cs + t22) - t22; c2 = (cs - c1) + ccs; cor = (ssn + s * ccs + cs * s + c2 * x + c1 * x2 - sn * x * dx) - sn * c; y = sn + c1 * x1; cor = cor + ((sn - y) + c1 * x1); res = y + cor; cor = (y - res) + cor; cor = 1.0005 * cor + __copysign (eps, cor); *corp = cor; return res; } /* Reduce range of X and compute sin of a + da. When SHIFT_QUADRANT is true, the routine returns the cosine of a + da by rotating the quadrant once and computing the sine of the result. */ static inline double __always_inline reduce_and_compute (double x, bool shift_quadrant) { double retval = 0, a, da; unsigned int n = __branred (x, &a, &da); int4 k = (n + shift_quadrant) % 4; switch (k) { case 2: a = -a; da = -da; /* Fall through. */ case 0: if (a * a < 0.01588) retval = bsloww (a, da, x, n); else retval = bsloww1 (a, da, x, n); break; case 1: case 3: retval = bsloww2 (a, da, x, n); break; } return retval; } static inline int4 __always_inline reduce_sincos_1 (double x, double *a, double *da) { mynumber v; double t = (x * hpinv + toint); double xn = t - toint; v.x = t; double y = (x - xn * mp1) - xn * mp2; int4 n = v.i[LOW_HALF] & 3; double db = xn * mp3; double b = y - db; db = (y - b) - db; *a = b; *da = db; return n; } /* Compute sin (A + DA). cos can be computed by passing SHIFT_QUADRANT as true, which results in shifting the quadrant N clockwise. */ static double __always_inline do_sincos_1 (double a, double da, double x, int4 n, bool shift_quadrant) { double xx, retval, res, cor; double eps = fabs (x) * 1.2e-30; int k1 = (n + shift_quadrant) & 3; switch (k1) { /* quarter of unit circle */ case 2: a = -a; da = -da; /* Fall through. */ case 0: xx = a * a; if (xx < 0.01588) { /* Taylor series. */ res = TAYLOR_SIN (xx, a, da, cor); cor = 1.02 * cor + __copysign (eps, cor); retval = (res == res + cor) ? res : sloww (a, da, x, shift_quadrant); } else { res = do_sin (a, da, &cor); cor = 1.035 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? __copysign (res, a) : sloww1 (a, da, x, shift_quadrant)); } break; case 1: case 3: res = do_cos (a, da, &cor); cor = 1.025 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? ((n & 2) ? -res : res) : sloww2 (a, da, x, n)); break; } return retval; } static inline int4 __always_inline reduce_sincos_2 (double x, double *a, double *da) { mynumber v; double t = (x * hpinv + toint); double xn = t - toint; v.x = t; double xn1 = (xn + 8.0e22) - 8.0e22; double xn2 = xn - xn1; double y = ((((x - xn1 * mp1) - xn1 * mp2) - xn2 * mp1) - xn2 * mp2); int4 n = v.i[LOW_HALF] & 3; double db = xn1 * pp3; t = y - db; db = (y - t) - db; db = (db - xn2 * pp3) - xn * pp4; double b = t + db; db = (t - b) + db; *a = b; *da = db; return n; } /* Compute sin (A + DA). cos can be computed by passing SHIFT_QUADRANT as true, which results in shifting the quadrant N clockwise. */ static double __always_inline do_sincos_2 (double a, double da, double x, int4 n, bool shift_quadrant) { double res, retval, cor, xx; double eps = 1.0e-24; int4 k = (n + shift_quadrant) & 3; switch (k) { case 2: a = -a; da = -da; /* Fall through. */ case 0: xx = a * a; if (xx < 0.01588) { /* Taylor series. */ res = TAYLOR_SIN (xx, a, da, cor); cor = 1.02 * cor + __copysign (eps, cor); retval = (res == res + cor) ? res : bsloww (a, da, x, n); } else { res = do_sin (a, da, &cor); cor = 1.035 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? __copysign (res, a) : bsloww1 (a, da, x, n)); } break; case 1: case 3: res = do_cos (a, da, &cor); cor = 1.025 * cor + __copysign (eps, cor); retval = ((res == res + cor) ? ((n & 2) ? -res : res) : bsloww2 (a, da, x, n)); break; } return retval; } /*******************************************************************/ /* An ultimate sin routine. Given an IEEE double machine number x */ /* it computes the correctly rounded (to nearest) value of sin(x) */ /*******************************************************************/ #ifdef IN_SINCOS static double #else double SECTION #endif __sin (double x) { double xx, res, t, cor; mynumber u; int4 k, m; double retval = 0; #ifndef IN_SINCOS SET_RESTORE_ROUND_53BIT (FE_TONEAREST); #endif u.x = x; m = u.i[HIGH_HALF]; k = 0x7fffffff & m; /* no sign */ if (k < 0x3e500000) /* if x->0 =>sin(x)=x */ { math_check_force_underflow (x); retval = x; } /*---------------------------- 2^-26 < |x|< 0.25 ----------------------*/ else if (k < 0x3fd00000) { xx = x * x; /* Taylor series. */ t = POLYNOMIAL (xx) * (xx * x); res = x + t; cor = (x - res) + t; retval = (res == res + 1.07 * cor) ? res : slow (x); } /* else if (k < 0x3fd00000) */ /*---------------------------- 0.25<|x|< 0.855469---------------------- */ else if (k < 0x3feb6000) { res = do_sin (x, 0, &cor); retval = (res == res + 1.096 * cor) ? res : slow1 (x); retval = __copysign (retval, x); } /* else if (k < 0x3feb6000) */ /*----------------------- 0.855469 <|x|<2.426265 ----------------------*/ else if (k < 0x400368fd) { t = hp0 - fabs (x); res = do_cos (t, hp1, &cor); retval = (res == res + 1.020 * cor) ? res : slow2 (x); retval = __copysign (retval, x); } /* else if (k < 0x400368fd) */ #ifndef IN_SINCOS /*-------------------------- 2.426265<|x|< 105414350 ----------------------*/ else if (k < 0x419921FB) { double a, da; int4 n = reduce_sincos_1 (x, &a, &da); retval = do_sincos_1 (a, da, x, n, false); } /* else if (k < 0x419921FB ) */ /*---------------------105414350 <|x|< 281474976710656 --------------------*/ else if (k < 0x42F00000) { double a, da; int4 n = reduce_sincos_2 (x, &a, &da); retval = do_sincos_2 (a, da, x, n, false); } /* else if (k < 0x42F00000 ) */ /* -----------------281474976710656 <|x| <2^1024----------------------------*/ else if (k < 0x7ff00000) retval = reduce_and_compute (x, false); /*--------------------- |x| > 2^1024 ----------------------------------*/ else { if (k == 0x7ff00000 && u.i[LOW_HALF] == 0) __set_errno (EDOM); retval = x / x; } #endif return retval; } /*******************************************************************/ /* An ultimate cos routine. Given an IEEE double machine number x */ /* it computes the correctly rounded (to nearest) value of cos(x) */ /*******************************************************************/ #ifdef IN_SINCOS static double #else double SECTION #endif __cos (double x) { double y, xx, res, cor, a, da; mynumber u; int4 k, m; double retval = 0; #ifndef IN_SINCOS SET_RESTORE_ROUND_53BIT (FE_TONEAREST); #endif u.x = x; m = u.i[HIGH_HALF]; k = 0x7fffffff & m; /* |x|<2^-27 => cos(x)=1 */ if (k < 0x3e400000) retval = 1.0; else if (k < 0x3feb6000) { /* 2^-27 < |x| < 0.855469 */ res = do_cos (x, 0, &cor); retval = (res == res + 1.020 * cor) ? res : cslow2 (x); } /* else if (k < 0x3feb6000) */ else if (k < 0x400368fd) { /* 0.855469 <|x|<2.426265 */ ; y = hp0 - fabs (x); a = y + hp1; da = (y - a) + hp1; xx = a * a; if (xx < 0.01588) { res = TAYLOR_SIN (xx, a, da, cor); cor = 1.02 * cor + __copysign (1.0e-31, cor); retval = (res == res + cor) ? res : sloww (a, da, x, true); } else { res = do_sin (a, da, &cor); cor = 1.035 * cor + __copysign (1.0e-31, cor); retval = ((res == res + cor) ? __copysign (res, a) : sloww1 (a, da, x, true)); } } /* else if (k < 0x400368fd) */ #ifndef IN_SINCOS else if (k < 0x419921FB) { /* 2.426265<|x|< 105414350 */ double a, da; int4 n = reduce_sincos_1 (x, &a, &da); retval = do_sincos_1 (a, da, x, n, true); } /* else if (k < 0x419921FB ) */ else if (k < 0x42F00000) { double a, da; int4 n = reduce_sincos_2 (x, &a, &da); retval = do_sincos_2 (a, da, x, n, true); } /* else if (k < 0x42F00000 ) */ /* 281474976710656 <|x| <2^1024 */ else if (k < 0x7ff00000) retval = reduce_and_compute (x, true); else { if (k == 0x7ff00000 && u.i[LOW_HALF] == 0) __set_errno (EDOM); retval = x / x; /* |x| > 2^1024 */ } #endif return retval; } /************************************************************************/ /* Routine compute sin(x) for 2^-26 < |x|< 0.25 by Taylor with more */ /* precision and if still doesn't accurate enough by mpsin or dubsin */ /************************************************************************/ static inline double __always_inline slow (double x) { double res, cor, w[2]; res = TAYLOR_SLOW (x, 0, cor); if (res == res + 1.0007 * cor) return res; __dubsin (fabs (x), 0, w); if (w[0] == w[0] + 1.000000001 * w[1]) return __copysign (w[0], x); return __copysign (__mpsin (fabs (x), 0, false), x); } /*******************************************************************************/ /* Routine compute sin(x) for 0.25<|x|< 0.855469 by __sincostab.tbl and Taylor */ /* and if result still doesn't accurate enough by mpsin or dubsin */ /*******************************************************************************/ static inline double __always_inline slow1 (double x) { double w[2], cor, res; res = do_sin_slow (x, 0, 0, &cor); if (res == res + cor) return res; __dubsin (fabs (x), 0, w); if (w[0] == w[0] + 1.000000005 * w[1]) return w[0]; return __mpsin (fabs (x), 0, false); } /**************************************************************************/ /* Routine compute sin(x) for 0.855469 <|x|<2.426265 by __sincostab.tbl */ /* and if result still doesn't accurate enough by mpsin or dubsin */ /**************************************************************************/ static inline double __always_inline slow2 (double x) { double w[2], y, y1, y2, cor, res; double t = hp0 - fabs (x); res = do_cos_slow (t, hp1, 0, &cor); if (res == res + cor) return res; y = fabs (x) - hp0; y1 = y - hp1; y2 = (y - y1) - hp1; __docos (y1, y2, w); if (w[0] == w[0] + 1.000000005 * w[1]) return w[0]; return __mpsin (fabs (x), 0, false); } /* Compute sin(x + dx) where X is small enough to use Taylor series around zero and (x + dx) in the first or third quarter of the unit circle. ORIG is the original value of X for computing error of the result. If the result is not accurate enough, the routine calls mpsin or dubsin. SHIFT_QUADRANT rotates the unit circle by 1 to compute the cosine instead of sine. */ static inline double __always_inline sloww (double x, double dx, double orig, bool shift_quadrant) { double y, t, res, cor, w[2], a, da, xn; mynumber v; int4 n; res = TAYLOR_SLOW (x, dx, cor); double eps = fabs (orig) * 3.1e-30; cor = 1.0005 * cor + __copysign (eps, cor); if (res == res + cor) return res; a = fabs (x); da = (x > 0) ? dx : -dx; __dubsin (a, da, w); eps = fabs (orig) * 1.1e-30; cor = 1.000000001 * w[1] + __copysign (eps, w[1]); if (w[0] == w[0] + cor) return __copysign (w[0], x); t = (orig * hpinv + toint); xn = t - toint; v.x = t; y = (orig - xn * mp1) - xn * mp2; n = (v.i[LOW_HALF] + shift_quadrant) & 3; da = xn * pp3; t = y - da; da = (y - t) - da; y = xn * pp4; a = t - y; da = ((t - a) - y) + da; if (n & 2) { a = -a; da = -da; } x = fabs (a); dx = (a > 0) ? da : -da; __dubsin (x, dx, w); eps = fabs (orig) * 1.1e-40; cor = 1.000000001 * w[1] + __copysign (eps, w[1]); if (w[0] == w[0] + cor) return __copysign (w[0], a); return shift_quadrant ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true); } /* Compute sin(x + dx) where X is in the first or third quarter of the unit circle. ORIG is the original value of X for computing error of the result. If the result is not accurate enough, the routine calls mpsin or dubsin. SHIFT_QUADRANT rotates the unit circle by 1 to compute the cosine instead of sine. */ static inline double __always_inline sloww1 (double x, double dx, double orig, bool shift_quadrant) { double w[2], cor, res; res = do_sin_slow (x, dx, 3.1e-30 * fabs (orig), &cor); if (res == res + cor) return __copysign (res, x); dx = (x > 0 ? dx : -dx); __dubsin (fabs (x), dx, w); double eps = 1.1e-30 * fabs (orig); cor = 1.000000005 * w[1] + __copysign (eps, w[1]); if (w[0] == w[0] + cor) return __copysign (w[0], x); return shift_quadrant ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true); } /***************************************************************************/ /* Routine compute sin(x+dx) (Double-Length number) where x in second or */ /* fourth quarter of unit circle.Routine receive also the original value */ /* and quarter(n= 1or 3)of x for computing error of result.And if result not*/ /* accurate enough routine calls mpsin1 or dubsin */ /***************************************************************************/ static inline double __always_inline sloww2 (double x, double dx, double orig, int n) { double w[2], cor, res; res = do_cos_slow (x, dx, 3.1e-30 * fabs (orig), &cor); if (res == res + cor) return (n & 2) ? -res : res; dx = x > 0 ? dx : -dx; __docos (fabs (x), dx, w); double eps = 1.1e-30 * fabs (orig); cor = 1.000000005 * w[1] + __copysign (eps, w[1]); if (w[0] == w[0] + cor) return (n & 2) ? -w[0] : w[0]; return (n & 1) ? __mpsin (orig, 0, true) : __mpcos (orig, 0, true); } /***************************************************************************/ /* Routine compute sin(x+dx) or cos(x+dx) (Double-Length number) where x */ /* is small enough to use Taylor series around zero and (x+dx) */ /* in first or third quarter of unit circle.Routine receive also */ /* (right argument) the original value of x for computing error of */ /* result.And if result not accurate enough routine calls other routines */ /***************************************************************************/ static inline double __always_inline bsloww (double x, double dx, double orig, int n) { double res, cor, w[2], a, da; res = TAYLOR_SLOW (x, dx, cor); cor = 1.0005 * cor + __copysign (1.1e-24, cor); if (res == res + cor) return res; a = fabs (x); da = (x > 0) ? dx : -dx; __dubsin (a, da, w); cor = 1.000000001 * w[1] + __copysign (1.1e-24, w[1]); if (w[0] == w[0] + cor) return __copysign (w[0], x); return (n & 1) ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true); } /***************************************************************************/ /* Routine compute sin(x+dx) or cos(x+dx) (Double-Length number) where x */ /* in first or third quarter of unit circle.Routine receive also */ /* (right argument) the original value of x for computing error of result.*/ /* And if result not accurate enough routine calls other routines */ /***************************************************************************/ static inline double __always_inline bsloww1 (double x, double dx, double orig, int n) { double w[2], cor, res; res = do_sin_slow (x, dx, 1.1e-24, &cor); if (res == res + cor) return (x > 0) ? res : -res; dx = (x > 0) ? dx : -dx; __dubsin (fabs (x), dx, w); cor = 1.000000005 * w[1] + __copysign (1.1e-24, w[1]); if (w[0] == w[0] + cor) return __copysign (w[0], x); return (n & 1) ? __mpcos (orig, 0, true) : __mpsin (orig, 0, true); } /***************************************************************************/ /* Routine compute sin(x+dx) or cos(x+dx) (Double-Length number) where x */ /* in second or fourth quarter of unit circle.Routine receive also the */ /* original value and quarter(n= 1or 3)of x for computing error of result. */ /* And if result not accurate enough routine calls other routines */ /***************************************************************************/ static inline double __always_inline bsloww2 (double x, double dx, double orig, int n) { double w[2], cor, res; res = do_cos_slow (x, dx, 1.1e-24, &cor); if (res == res + cor) return (n & 2) ? -res : res; dx = (x > 0) ? dx : -dx; __docos (fabs (x), dx, w); cor = 1.000000005 * w[1] + __copysign (1.1e-24, w[1]); if (w[0] == w[0] + cor) return (n & 2) ? -w[0] : w[0]; return (n & 1) ? __mpsin (orig, 0, true) : __mpcos (orig, 0, true); } /************************************************************************/ /* Routine compute cos(x) for 2^-27 < |x|< 0.25 by Taylor with more */ /* precision and if still doesn't accurate enough by mpcos or docos */ /************************************************************************/ static inline double __always_inline cslow2 (double x) { double w[2], cor, res; res = do_cos_slow (x, 0, 0, &cor); if (res == res + cor) return res; __docos (fabs (x), 0, w); if (w[0] == w[0] + 1.000000005 * w[1]) return w[0]; return __mpcos (x, 0, false); } #ifndef __cos weak_alias (__cos, cos) # ifdef NO_LONG_DOUBLE strong_alias (__cos, __cosl) weak_alias (__cos, cosl) # endif #endif #ifndef __sin weak_alias (__sin, sin) # ifdef NO_LONG_DOUBLE strong_alias (__sin, __sinl) weak_alias (__sin, sinl) # endif #endif ~~~ # Best Algorithm for Bit Reversal ( from MSB->LSB to LSB->MSB) in C ~~~ up vote 164 down vote favorite 128 What is the best algorithm to achieve the following: 0010 0000 => 0000 0100 The conversion is from MSB->LSB to LSB->MSB. All bits must be reversed; that is, this is not endianness-swapping. c algorithm bit-manipulation shareedit edited Apr 23 '09 at 10:48 asked Apr 14 '09 at 2:48 green_t 1,0133917 1 I think the appropriate name is a bitwise operation. – Lucas Apr 14 '09 at 2:52 1 I think you meant reversal, not rotation. – Juliano Apr 14 '09 at 2:53 Most ARM processors have a built-in operation for that. The ARM Cortex-M0 doesn't, and I found using a per-byte table to swap bits is the fastest approach. – starblue Jun 19 '14 at 21:09 Also see Sean Eron Anderson's Bit Twiddling Hacks. – jww Aug 9 '15 at 16:42 Please define "best" – Lee Taylor Dec 12 '15 at 23:38 add a comment 26 Answers active oldest votes up vote 373 down vote accepted NOTE: All algorithms below are in C, but should be portable to your language of choice (just don't look at me when they're not as fast :) ~~~ Options Low Memory (32-bit int, 32-bit machine)(from here): ~~~ unsigned int reverse(register unsigned int x) { x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); return((x >> 16) | (x << 16)); } From the famous Bit Twiddling Hacks page: Fastest (lookup table): static const unsigned char BitReverseTable256[] = { 0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0, 0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8, 0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4, 0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC, 0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2, 0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA, 0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6, 0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE, 0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1, 0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9, 0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5, 0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD, 0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3, 0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB, 0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7, 0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF }; unsigned int v; // reverse 32-bit value, 8 bits at time unsigned int c; // c will get v reversed // Option 1: c = (BitReverseTable256[v & 0xff] << 24) | (BitReverseTable256[(v >> 8) & 0xff] << 16) | (BitReverseTable256[(v >> 16) & 0xff] << 8) | (BitReverseTable256[(v >> 24) & 0xff]); // Option 2: unsigned char * p = (unsigned char *) &v; unsigned char * q = (unsigned char *) &c; q[3] = BitReverseTable256[p[0]]; q[2] = BitReverseTable256[p[1]]; q[1] = BitReverseTable256[p[2]]; q[0] = BitReverseTable256[p[3]]; ~~~ You can extend this idea to 64-bit ints, or trade off memory for speed (assuming your L1 Data Cache is large enough), and reverse 16-bits at a time with a 64K-entry lookup table. Others Simple ~~~ unsigned int v; // input bits to be reversed unsigned int r = v; // r will be reversed bits of v; first get LSB of v int s = sizeof(v) * CHAR_BIT - 1; // extra shift needed at end for (v >>= 1; v; v >>= 1) { r <<= 1; r |= v & 1; s--; } r <<= s; // shift when v's highest bits are zero Faster (32-bit processor) unsigned char b = x; b = ((b * 0x0802LU & 0x22110LU) | (b * 0x8020LU & 0x88440LU)) * 0x10101LU >> 16; Faster (64-bit processor) unsigned char b; // reverse this (8-bit) byte b = (b * 0x0202020202ULL & 0x010884422010ULL) % 1023; If you want to do this on a 32-bit int, just reverse the bits in each bytes, and reverse the order of the bytes. That is: unsigned int toReverse; unsigned int reversed; unsigned char inByte0 = (toReverse & 0xFF); unsigned char inByte1 = (toReverse & 0xFF00) >> 8; unsigned char inByte2 = (toReverse & 0xFF0000) >> 16; unsigned char inByte3 = (toReverse & 0xFF000000) >> 24; reversed = (reverseBits(inByte0) << 24) | (reverseBits(inByte1) << 16) | (reverseBits(inByte2) << 8) | (reverseBits(inByte3); ~~~ Results I benchmarked the two most promising solutions, the lookup table, and bitwise-AND (the first one). The test machine is a laptop w/ 4GB of DDR2-800 and a Core 2 Duo T7500 @ 2.4GHz, 4MB L2 Cache; YMMV. I used gcc 4.3.2 on 64-bit Linux. OpenMP (and the GCC bindings) were used for high-resolution timers. reverse.c ~~~ #include #include #include unsigned int reverse(register unsigned int x) { x = (((x & 0xaaaaaaaa) >> 1) | ((x & 0x55555555) << 1)); x = (((x & 0xcccccccc) >> 2) | ((x & 0x33333333) << 2)); x = (((x & 0xf0f0f0f0) >> 4) | ((x & 0x0f0f0f0f) << 4)); x = (((x & 0xff00ff00) >> 8) | ((x & 0x00ff00ff) << 8)); return((x >> 16) | (x << 16)); } int main() { unsigned int *ints = malloc(100000000*sizeof(unsigned int)); unsigned int *ints2 = malloc(100000000*sizeof(unsigned int)); for(unsigned int i = 0; i < 100000000; i++) ints[i] = rand(); unsigned int *inptr = ints; unsigned int *outptr = ints2; unsigned int *endptr = ints + 100000000; // Starting the time measurement double start = omp_get_wtime(); // Computations to be measured while(inptr != endptr) { (*outptr) = reverse(*inptr); inptr++; outptr++; } // Measuring the elapsed time double end = omp_get_wtime(); // Time calculation (in seconds) printf("Time: %f seconds\n", end-start); free(ints); free(ints2); return 0; } ~~~ reverse_lookup.c ~~~ #include #include #include static const unsigned char BitReverseTable256[] = { 0x00, 0x80, 0x40, 0xC0, 0x20, 0xA0, 0x60, 0xE0, 0x10, 0x90, 0x50, 0xD0, 0x30, 0xB0, 0x70, 0xF0, 0x08, 0x88, 0x48, 0xC8, 0x28, 0xA8, 0x68, 0xE8, 0x18, 0x98, 0x58, 0xD8, 0x38, 0xB8, 0x78, 0xF8, 0x04, 0x84, 0x44, 0xC4, 0x24, 0xA4, 0x64, 0xE4, 0x14, 0x94, 0x54, 0xD4, 0x34, 0xB4, 0x74, 0xF4, 0x0C, 0x8C, 0x4C, 0xCC, 0x2C, 0xAC, 0x6C, 0xEC, 0x1C, 0x9C, 0x5C, 0xDC, 0x3C, 0xBC, 0x7C, 0xFC, 0x02, 0x82, 0x42, 0xC2, 0x22, 0xA2, 0x62, 0xE2, 0x12, 0x92, 0x52, 0xD2, 0x32, 0xB2, 0x72, 0xF2, 0x0A, 0x8A, 0x4A, 0xCA, 0x2A, 0xAA, 0x6A, 0xEA, 0x1A, 0x9A, 0x5A, 0xDA, 0x3A, 0xBA, 0x7A, 0xFA, 0x06, 0x86, 0x46, 0xC6, 0x26, 0xA6, 0x66, 0xE6, 0x16, 0x96, 0x56, 0xD6, 0x36, 0xB6, 0x76, 0xF6, 0x0E, 0x8E, 0x4E, 0xCE, 0x2E, 0xAE, 0x6E, 0xEE, 0x1E, 0x9E, 0x5E, 0xDE, 0x3E, 0xBE, 0x7E, 0xFE, 0x01, 0x81, 0x41, 0xC1, 0x21, 0xA1, 0x61, 0xE1, 0x11, 0x91, 0x51, 0xD1, 0x31, 0xB1, 0x71, 0xF1, 0x09, 0x89, 0x49, 0xC9, 0x29, 0xA9, 0x69, 0xE9, 0x19, 0x99, 0x59, 0xD9, 0x39, 0xB9, 0x79, 0xF9, 0x05, 0x85, 0x45, 0xC5, 0x25, 0xA5, 0x65, 0xE5, 0x15, 0x95, 0x55, 0xD5, 0x35, 0xB5, 0x75, 0xF5, 0x0D, 0x8D, 0x4D, 0xCD, 0x2D, 0xAD, 0x6D, 0xED, 0x1D, 0x9D, 0x5D, 0xDD, 0x3D, 0xBD, 0x7D, 0xFD, 0x03, 0x83, 0x43, 0xC3, 0x23, 0xA3, 0x63, 0xE3, 0x13, 0x93, 0x53, 0xD3, 0x33, 0xB3, 0x73, 0xF3, 0x0B, 0x8B, 0x4B, 0xCB, 0x2B, 0xAB, 0x6B, 0xEB, 0x1B, 0x9B, 0x5B, 0xDB, 0x3B, 0xBB, 0x7B, 0xFB, 0x07, 0x87, 0x47, 0xC7, 0x27, 0xA7, 0x67, 0xE7, 0x17, 0x97, 0x57, 0xD7, 0x37, 0xB7, 0x77, 0xF7, 0x0F, 0x8F, 0x4F, 0xCF, 0x2F, 0xAF, 0x6F, 0xEF, 0x1F, 0x9F, 0x5F, 0xDF, 0x3F, 0xBF, 0x7F, 0xFF }; int main() { unsigned int *ints = malloc(100000000*sizeof(unsigned int)); unsigned int *ints2 = malloc(100000000*sizeof(unsigned int)); for(unsigned int i = 0; i < 100000000; i++) ints[i] = rand(); unsigned int *inptr = ints; unsigned int *outptr = ints2; unsigned int *endptr = ints + 100000000; // Starting the time measurement double start = omp_get_wtime(); // Computations to be measured while(inptr != endptr) { unsigned int in = *inptr; // Option 1: //*outptr = (BitReverseTable256[in & 0xff] << 24) | // (BitReverseTable256[(in >> 8) & 0xff] << 16) | // (BitReverseTable256[(in >> 16) & 0xff] << 8) | // (BitReverseTable256[(in >> 24) & 0xff]); // Option 2: unsigned char * p = (unsigned char *) &(*inptr); unsigned char * q = (unsigned char *) &(*outptr); q[3] = BitReverseTable256[p[0]]; q[2] = BitReverseTable256[p[1]]; q[1] = BitReverseTable256[p[2]]; q[0] = BitReverseTable256[p[3]]; inptr++; outptr++; } // Measuring the elapsed time double end = omp_get_wtime(); // Time calculation (in seconds) printf("Time: %f seconds\n", end-start); free(ints); free(ints2); return 0; } ~~~ I tried both approaches at several different optimizations, ran 3 trials at each level, and each trial reversed 100 million random unsigned ints. For the lookup table option, I tried both schemes (options 1 and 2) given on the bitwise hacks page. Results are shown below. Bitwise AND ~~~ mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -o reverse reverse.c mrj10@mjlap:~/code$ ./reverse Time: 2.000593 seconds mrj10@mjlap:~/code$ ./reverse Time: 1.938893 seconds mrj10@mjlap:~/code$ ./reverse Time: 1.936365 seconds mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -O2 -o reverse reverse.c mrj10@mjlap:~/code$ ./reverse Time: 0.942709 seconds mrj10@mjlap:~/code$ ./reverse Time: 0.991104 seconds mrj10@mjlap:~/code$ ./reverse Time: 0.947203 seconds mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -O3 -o reverse reverse.c mrj10@mjlap:~/code$ ./reverse Time: 0.922639 seconds mrj10@mjlap:~/code$ ./reverse Time: 0.892372 seconds mrj10@mjlap:~/code$ ./reverse Time: 0.891688 seconds Lookup Table (option 1) mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -o reverse_lookup reverse_lookup.c mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.201127 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.196129 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.235972 seconds mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -O2 -o reverse_lookup reverse_lookup.c mrj10@mjlap:~/code$ ./reverse_lookup Time: 0.633042 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 0.655880 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 0.633390 seconds mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -O3 -o reverse_lookup reverse_lookup.c mrj10@mjlap:~/code$ ./reverse_lookup Time: 0.652322 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 0.631739 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 0.652431 seconds Lookup Table (option 2) mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -o reverse_lookup reverse_lookup.c mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.671537 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.688173 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.664662 seconds mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -O2 -o reverse_lookup reverse_lookup.c mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.049851 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.048403 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.085086 seconds mrj10@mjlap:~/code$ gcc -fopenmp -std=c99 -O3 -o reverse_lookup reverse_lookup.c mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.082223 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.053431 seconds mrj10@mjlap:~/code$ ./reverse_lookup Time: 1.081224 seconds ~~~ Conclusion Use the lookup table, with option 1 (byte addressing is unsurprisingly slow) if you're concerned about performance. If you need to squeeze every last byte of memory out of your system (and you might, if you care about the performance of bit reversal), the optimized versions of the bitwise-AND approach aren't too shabby either. Caveat Yes, I know the benchmark code is a complete hack. Suggestions on how to improve it are more than welcome. Things I know about: I don't have access to ICC. This may be faster (please respond in a comment if you can test this out). A 64K lookup table may do well on some modern microarchitectures with large L1D. -mtune=native didn't work for -O2/-O3 (ld blew up with some crazy symbol redefinition error), so I don't believe the generated code is tuned for my microarchitecture. There may be a way to do this slightly faster with SSE. I have no idea how, but with fast replication, packed bitwise AND, and swizzling instructions, there's got to be something there. I know only enough x86 assembly to be dangerous; here's the code GCC generated on -O3 for option 1, so somebody more knowledgable than myself can check it out: 32-bit ~~~ .L3: movl (%r12,%rsi), %ecx movzbl %cl, %eax movzbl BitReverseTable256(%rax), %edx movl %ecx, %eax shrl $24, %eax mov %eax, %eax movzbl BitReverseTable256(%rax), %eax sall $24, %edx orl %eax, %edx movzbl %ch, %eax shrl $16, %ecx movzbl BitReverseTable256(%rax), %eax movzbl %cl, %ecx sall $16, %eax orl %eax, %edx movzbl BitReverseTable256(%rcx), %eax sall $8, %eax orl %eax, %edx movl %edx, (%r13,%rsi) addq $4, %rsi cmpq $400000000, %rsi jne .L3 ~~~ EDIT: I also tried using uint64_t's on my machine to see if there was any performance boost. Performance was about 10% faster than 32-bit, and was nearly identical whether you were just using 64-bit types to reverse bits on two 32-bit ints at a time, or whether you were actually reversing bits in half as many 64-bit values. The assembly code is shown below (for the former case, reversing bits for 2 32-bit ints at a time): ~~~ .L3: movq (%r12,%rsi), %rdx movq %rdx, %rax shrq $24, %rax andl $255, %eax movzbl BitReverseTable256(%rax), %ecx movzbq %dl,%rax movzbl BitReverseTable256(%rax), %eax salq $24, %rax orq %rax, %rcx movq %rdx, %rax shrq $56, %rax movzbl BitReverseTable256(%rax), %eax salq $32, %rax orq %rax, %rcx movzbl %dh, %eax shrq $16, %rdx movzbl BitReverseTable256(%rax), %eax salq $16, %rax orq %rax, %rcx movzbq %dl,%rax shrq $16, %rdx movzbl BitReverseTable256(%rax), %eax salq $8, %rax orq %rax, %rcx movzbq %dl,%rax shrq $8, %rdx movzbl BitReverseTable256(%rax), %eax salq $56, %rax orq %rax, %rcx movzbq %dl,%rax shrq $8, %rdx movzbl BitReverseTable256(%rax), %eax andl $255, %edx salq $48, %rax orq %rax, %rcx movzbl BitReverseTable256(%rdx), %eax salq $40, %rax orq %rax, %rcx movq %rcx, (%r13,%rsi) addq $8, %rsi cmpq $400000000, %rsi jne .L3 ~~~ shareedit edited Jun 4 '12 at 15:04 Shahbaz 27k552100 answered Apr 14 '09 at 3:11 Matt J 20.8k53452 6 It was an interesting exercise, if not all that fulfilling. If nothing else, I hope seeing the process is constructive to somebody else who may want to benchmark something more meritorious :) – Matt J Apr 14 '09 at 7:09 27 Good god! What a thorough answer! – Nathan Fellman Apr 14 '09 at 8:06 2 Why can't I mark an answer as favorite? :) This might be handy, great reference. – lacop Apr 14 '09 at 11:16 5 There needs to be a "best of" on stackoverflow for answers like this. – Jay Nov 1 '11 at 14:26 3 My...God! I think I've found...what may very well be...a TRUE speciman. I shall have to consult my documents, and do further research, but something tells me (God, help me), that this is by far the greatest, most thorough and useful answer Stack Overflow has yet. Even John Skeet would be both appalled and impressed! – zeboidlund Mar 5 '12 at 2:42 show 10 more comments # Draw Circle ~~~ // C code void drawcircle(int x0, int y0, int radius) { int x = radius; int y = 0; int err = 0; while (x >= y) { putpixel(x0 + x, y0 + y); // angle :: 0 ~ 4/pi putpixel(x0 + y, y0 + x); putpixel(x0 - y, y0 + x); putpixel(x0 - x, y0 + y); putpixel(x0 - x, y0 - y); putpixel(x0 - y, y0 - x); putpixel(x0 + y, y0 - x); putpixel(x0 + x, y0 - y); if (err <= 0) { y += 1; err += 2*y + 1; } if (err > 0) { x -= 1; err -= 2*x + 1; } } } ~~~ # Paint paint closed area = recursion 塗りつぶし = 再帰 (x,y) から始めて、色が　c1 色なら c2 色に塗りつぶす ~~~ procedure Paint(x,y,c1,c2: longint); begin if PixelColor(x,y)=C1 then begin SetPixelColor(x,y,C2); Paint(x+1,y); Paint(x,y+1); Paint(x-1,y); Paint(x,y-1); end; end: ~~~ # naming set theory on prime number when we name set use, ~~~ set[2], set[3], set[5], set[7], set[11], ... set[prime(1)], set[prime(2)], ... ~~~ then we can have arbitrary product of sets and then compress this database with radix tree or something if (n != noprime ; set[n]) it is product of sets and decomposing to prime number gets each sets. is there any advantage than directly pointing DB?