Monthly Archives: February 2013

Sorting index and data arrays simultaneously

MIAs have to allow indexing in any arbitrary order. When considering SparseMIAs, this means that their non-zero indices must be re-sorted frequently. While this can be done easily, there is one important caveat—every time non-zeros indices are sorted, the corresponding non-zero data entries must also be sorted in the same way. At first glance, this seems somewhat trivial, but actually it isn’t if you’re wedded to using the C++ Standard Library for your sorting and algorithms. It also isn’t trivial if you’d like a fast solution that doesn’t consume extra memory. For some more background and quick analysis of the different options, see David Gleich’s post, which was inspired by the similar problem of sorting compressed-format sparse matrices.

The reason why this is difficult, is that the C++ Standard places some strict requirements on how iterators refer to their underlying containers. This allows its battle-hardened algorithms to use optimizations, such as moving sections of the containers to temporary buffers, that provide noticeable improvements to running-time.

However, these optimizations create a barrier to simple solutions, e.g., swapping the contents of one array whenever another array is swapped. Several solutions work around this problem, by using tuples of iterators. The most obvious is boost::zip_iterator, which unfortunately does not work with std::sort. David Gleich provides a solution in his post that works with std::sort. As he notes, usability, not performance was his requirement. Another solution is the Tuple Iterator written by Anthony Williams, which provides a tuple of iterators that conforms fully to the C++ Standard.

Both Williams and Gleich’s solutions are excellent and clever, but unfortunately they both also suffer from slow performance due to the amount of indirection involved to be able to work with C++ Standard algorithms. Since sorting is so central to LibMIA’s sparse functionality, performance is a big concern.

So, I thought I would share how LibMIA accomplishes this. First off, note that if a sorting algorithm was guaranteed to only swap, and not create temporary buffers, then as noted above, a good solution to this problem would be to simply swap entries in the data array every time the index array is swapped. To that end, LibMIA uses a modified version of algorithm guru Keith Schwartz’s IntroSort code. (As an aside, anyone even marginally interested in algorithms should take a look through Keith Schwartz’s incredible repository of open-source algorithms).

For those who don’t know, IntroSort is also the sorting algorithm behind std::sort, so it is a good choice. LibMIA’s modifies the Schwarz’s code by providing the user the option to use a swapping class  instead of the usual std::iter_swap. As well, LibMIA removes all references to the C++ Standard Library algorithms that was in the original code (for instance to perform heap sort), just because I was paranoid about the aforementioned temporary buffers, and I want to guarantee that only swapping will take place.

One example of a swapping class is a dual swapper, defined as:

template<class MainIterator , class FollowerIterator >
struct DualSwapper
{
const MainIterator _mainBegin;
const FollowerIterator _followBegin;
DualSwapper(MainIterator _mainIt,FollowerIterator _followIt): _mainBegin(_mainIt),_followBegin(_followIt){}
void operator()(MainIterator it1, MainIterator it2) const
{
std::iter_swap(_followBegin+(it1-_mainBegin),_followBegin+(it2-_mainBegin));
std::iter_swap(it1,it2);
}
};

The class acts as an alternative to std::iter_swap that will perform an extra swap. Upon construction, you have to provide it with the beginning iterators of both the index and the data array. It uses these beginning iterators as the means to swap the data container (or following container), every time a swap is performed on the index container (or main container).

Putting it all together, if we want to sort a data array based on how an index array is sorted, we can do so using the following code:

size_t N = //some size
std::vector<index_type> index_array(N);
typedef std::vector<index_type>::iterator index_iterator;
std::vector<data_type> data_array(N);
typedef std::vector<data_type>::iterator data_iterator;
//fill both arrays somehow
//
//Now perform sorting
Introsort(index_array.begin(),index_array.end(),std::less<index_type>(), DualSwapper<index_iterator,data_iterator>(index_array.begin(),data_array.begin()));

The running times of the above are very good, posting times slower to but comparable with std::sort of just the index_array. Those interested in the routine can find it in the LibMIAAlgorithm.h header file. Using it should be as simple as including the header. I hope it ends up being useful to someone else too!

Finally, thank you to David Gleich, Anthony Williams, and Keith Schwartz for generously sharing their knowledge and code online.

Cheers,

Adam Harrison