arithmetic - Calculate fractional part of square root without taking square root

Monday, 10 December 2018

arithmetic - Calculate fractional part of square root without taking square root

Let's say I have a number $x>0$ and I need to calculate the fractional part of its square root:

$f(x) = \sqrt x-\lfloor\sqrt x\rfloor$

If I have $\lfloor\sqrt x\rfloor$ available, is there a way I can achieve (or approximate) this without having to calculate any other square roots? I would like to avoid square roots for performance reasons as they are usually a rather slow operation to calculate.

As an example of what I am thinking about, here is one option:

$f(x)\approx\frac{x-\lfloor\sqrt x\rfloor^2}{(\lfloor\sqrt x\rfloor+1)^2-\lfloor\sqrt x\rfloor^2}$

As $x$ gets large, this approximation becomes better and better (less than 1% error if $\sqrt x\ge40$ ), but for small $x$ , it's quite terrible (unbounded error as $x$ goes to $0$ ).

Can I do better? Preferably an error of less than 1% for any $\sqrt x<10000$ .

The holy grail would be to find a formula that doesn't need any division either.

Aside:
In case anyone is interested why I need this: I'm trying to see if I can speed up Xiaolin Wu's anti-aliased circle drawing functions by calculating the needed variables incrementally (Bresenham-style)

Answer

If you are using IEEE 754 compliant floating point numbers, it may be that square root operations are faster than you might suppose, as the square root operation is directly supported in the standard and any compliant hardware is guaranteed to round correctly (unlike for sine and cosine) and is often using a hardware implementation of the Newton-Raphson method Alijah described.

That said, the algorithm you linked to only uses the fractional part of the square root to calculate pixel opacity, and consequently the final opacity value ranges only from 0 to 255. Because of the small range, floating point numbers may be overkill and a fixed-point integer representation might work better. If the range is truly only a byte and the maximum size of your radii aren't too large, you can use a look-up table with fixed-point input to skip the expensive square root calculation. A 16-bit fixed-point number would give you a 64KB look-up table, which isn't too bad.

You might also be able to avoid a division and square root operation for calculating the $45^\circ$ value (ffd in the algorithm) by using the Fast Inverse Square Root hack.

Now for the question of whether there is a method to calculate the fractional part of a square root knowing only the integer part, there is one approach that iteratively calculates a square root and minimizes divisions: Continued Fractions. The simplest approach I know of for what you are wanting to do (fast convergence) is detailed here and works as follows:

$a_0 = x - \lfloor\sqrt x\rfloor^2$

$b_0 = 2\lfloor\sqrt x\rfloor$

$a_n = a_0b_{n-1}$

$b_n = b_0b_{n-1} + a_{n-1}$

which gives you quadratically better and better approximations of the fractional part of $\sqrt x$ , and you divide $\frac{a_n}{b_n}$ when you've done enough iterations to ensure accuracy. If you are ultimately needing only byte sized opacity values, it should only take 3 or 4 iterations or so, and we save the division until the end, which is the only significant difference between this and the Newton-Raphson method other than the fact that it gives you the fractional part directly.

If you really want to pursue incrementally calculating variables as far as it can go, you can use Gosper's continued fraction algorithms (see especially the section on square roots) and calculate all the variables involved as continued fractions one term at a time. This allows you to avoid square root calculations and divisions other than bit shifts, as well as abort as soon as you know the correct pixel opacity (or whatever else you're calculating) without wasting time calculating digits you don't need, but it involves a serious overhaul to the algorithm you linked, and if I went into detail this answer would turn into a book.

So essentially, if you have memory to spare and the max length of your radii isn't huge and your output size is small, go with a look-up table with fixed-point numbers. If you want simplicity of implementation, go with floating point or fixed-point numbers. If you want to absolutely avoid square root calculation without a look-up table go with Newton-Raphson or the continued fraction variant on it. If you want to absolutely minimize wasted computation at the expense of some up-front overhead, go with Gosper's continued fraction algorithms.

Blog

Monday, 10 December 2018