I am implementing the quadratic sieve algorithm and I have reached a new bottleneck: the matrix processing step.
I have been reading quite a lot about this topic and found several approaches:
- Gaussian elimination: This is perhaps the most common approach to this problem. Its running time is $O(N^3)$ over GF(2) (see the sketch after this list).
- Method of Four Russians: It speeds up classic Gaussian elimination to $O(\frac{N^3}{\log N})$ by partitioning the matrix into small square blocks of size $\log N$. You can read more about it here (section 3.2).
- Block Wiedemann algorithm: It is used to distribute the matrix step across several machines in order to speed up the process. Here is a good example of such an implementation over GAPP; they use a slightly different algorithm for that.
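To make the first bullet concrete, here is a minimal sketch of the textbook cubic elimination over GF(2), where addition is simply XOR (class and method names are only illustrative):

```java
// Textbook O(N^3) Gaussian elimination over GF(2): the matrix is a plain
// boolean[][], addition is XOR. Returns the rank and leaves the matrix in
// row echelon form.
public class GF2Elimination {
    public static int eliminate(boolean[][] m) {
        int n = m.length, cols = m[0].length, rank = 0;
        for (int col = 0; col < cols && rank < n; col++) {
            // Find a row at or below 'rank' with a 1 in this column.
            int pivot = -1;
            for (int r = rank; r < n; r++) {
                if (m[r][col]) { pivot = r; break; }
            }
            if (pivot < 0) continue;                 // no pivot in this column
            boolean[] tmp = m[rank]; m[rank] = m[pivot]; m[pivot] = tmp;
            // Eliminate the 1s below the pivot: row XOR, entry by entry.
            for (int r = rank + 1; r < n; r++) {
                if (m[r][col]) {
                    for (int c = col; c < cols; c++) m[r][c] ^= m[rank][c];
                }
            }
            rank++;
        }
        return rank;
    }
}
```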
But none of the above was good enough for me, so I came up with my own algorithm.
It is based on Gaussian elimination, but instead of searching for a pivot in each column, I represent each column with a hash table and a linked list (this is how HashMap in Java is implemented), where each node of the list is stored in the hash table and holds the pivot (row) index in the original matrix. This lets me iterate over all the pivots of a column.
When I start to perform Gaussian elimination, it only takes $O(1)$ to find the first pivot of a column. Once I find such a pivot I store it in another hash table, so I won't pick it again for the next column I operate on. Next, I iterate over the rest of the pivots of the first column and XOR the rows to eliminate all the remaining 1s in that column. I use the same trick with the rows, so I keep a duplicate representation of the matrix: hash-table rows and hash-table columns. This speeds up the row XOR, because I only iterate over the pivots of the row. The XOR of a single node is $O(1)$, but it involves two hash operations and is by itself quite expensive. This is the part where the algorithm falls short, even though I estimate its performance as sub-quadratic on average.
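To make this clearer, here is a simplified sketch of the structure and the elimination loop, with Java's LinkedHashSet standing in for my hand-rolled hash table + linked list (class and method names are only illustrative):

```java
import java.util.*;

// Simplified sketch: each row and each column keeps an insertion-ordered hash
// set of the positions holding a 1. Membership tests and single toggles are
// O(1), but every toggle touches two hash structures (row view and column
// view), which is the expensive part.
class SparseGF2 {
    final List<LinkedHashSet<Integer>> rows = new ArrayList<>(); // row -> column indices with a 1
    final List<LinkedHashSet<Integer>> cols = new ArrayList<>(); // column -> row indices with a 1

    SparseGF2(int n) {
        for (int i = 0; i < n; i++) {
            rows.add(new LinkedHashSet<>());
            cols.add(new LinkedHashSet<>());
        }
    }

    // Flip one entry: two hash operations, one per view of the matrix.
    void toggle(int r, int c) {
        if (!rows.get(r).add(c)) rows.get(r).remove(c);
        if (!cols.get(c).add(r)) cols.get(c).remove(r);
    }

    void eliminate() {
        Set<Integer> usedPivots = new HashSet<>(); // rows already chosen as pivots
        for (int c = 0; c < cols.size(); c++) {
            // Take the first row with a 1 in this column that is not yet a pivot.
            int pivot = -1;
            for (int r : cols.get(c)) {
                if (!usedPivots.contains(r)) { pivot = r; break; }
            }
            if (pivot < 0) continue;
            usedPivots.add(pivot);
            // XOR the pivot row into every other row with a 1 in this column,
            // iterating only over the nonzero entries of the pivot row.
            List<Integer> pivotRow = new ArrayList<>(rows.get(pivot));
            for (int r : new ArrayList<>(cols.get(c))) {
                if (r == pivot) continue;
                for (int pc : pivotRow) toggle(r, pc);
            }
        }
    }
}
```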
I ended up implementing classic Gaussian elimination with bit operations, which lets me process square matrices of up to $10,000$ rows in a matter of minutes. But that is still not fast enough: my goal is to factor 100-digit numbers, which requires a square matrix with around $1,000,000$ rows, and that is impossible with my current approach.
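For reference, here is a minimal sketch of what I mean by elimination with bit operations: each row is packed into long words, so a single row XOR handles 64 columns at a time (names are again only illustrative):

```java
// Bit-packed Gaussian elimination over GF(2): each row is a long[], so one
// row XOR processes 64 columns per machine word. Still O(N^3) overall, just
// with a much smaller constant.
public class PackedGF2 {
    public static int eliminate(long[][] rows, int numCols) {
        int n = rows.length, words = rows[0].length, rank = 0;
        for (int col = 0; col < numCols && rank < n; col++) {
            int w = col >>> 6;                 // word index of this column
            long bit = 1L << (col & 63);       // bit mask within the word
            int pivot = -1;
            for (int r = rank; r < n; r++) {
                if ((rows[r][w] & bit) != 0) { pivot = r; break; }
            }
            if (pivot < 0) continue;
            long[] tmp = rows[rank]; rows[rank] = rows[pivot]; rows[pivot] = tmp;
            // Clear this column in every other row, 64 entries per XOR.
            for (int r = 0; r < n; r++) {
                if (r != rank && (rows[r][w] & bit) != 0) {
                    for (int k = w; k < words; k++) rows[r][k] ^= rows[rank][k];
                }
            }
            rank++;
        }
        return rank;
    }
}
```

Even at 64 entries per word the growth is still cubic, which is why a $1,000,000$-row matrix is out of reach for this approach.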
Are there any other methods that I have missed? What is the best non-parallel method for solving this problem?