To understand this PR, it's advisable to read #100856. There I go into more detail on the test methodology and graphs.

The full benchmark results can be found here:

Here are the speedups and slowdowns for Zen3:

For hot-u64 we can see that the speedups are the most extreme for smaller sizes and level out at 30%.

For hot-string, which is relatively expensive to access, we see the speedup level out at 4%, with a 10-15% speedup for smaller sizes and a 3x speedup in the most extreme case, hot-string-descending-20. These results are in line with the reduced average comparison counts, and the extreme speedups are explained by the ability to detect fully or mostly descending inputs even for small inputs.

For smaller sizes there is also a noticeable number of slowdowns. I'd argue that the overall speedup is worth it here, but this can be tuned with the qualifies_for_branchless_sort heuristic.

The 1k type generally produces the noisiest results, so I'm not sure this is a real signal.

The cheap-to-access but expensive-to-compare type f128 shows a modest speedup of 10-20%, with descending inputs again showing the largest outliers: a 5x speedup for descending-20, which switches from insertion sort to the more sophisticated sort_small. But it might also be thanks to the tuned insert-right and insert-left functions.

The cold results don't look too hot for small sizes <= 20; these could be addressed by adding extra logic. But even then it's a tricky balance, where something like pattern analysis is really hard, if not impossible, to do without some slowdown for cold code.
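The qualifies_for_branchless_sort heuristic mentioned above gates which types take the branchless small-sort path. A minimal sketch of such a size-based gate is shown below; the threshold constant and the exact criteria are assumptions for illustration, not the standard library's actual implementation:

```rust
use std::mem;

// Sketch of a size-based gate for the branchless small-sort path.
// The threshold is an assumed value for illustration; the real
// heuristic may use different or additional criteria.
fn qualifies_for_branchless_sort<T>() -> bool {
    // Branchless sorting networks shine when elements are cheap to
    // move, i.e. small payloads such as integers. Large elements make
    // the extra unconditional moves more expensive than the branches
    // they avoid.
    mem::size_of::<T>() <= 2 * mem::size_of::<usize>()
}

fn main() {
    // A u64 is small enough; a 1 KiB payload (like the "1k" benchmark
    // type) is not.
    println!("{}", qualifies_for_branchless_sort::<u64>());
    println!("{}", qualifies_for_branchless_sort::<[u8; 1024]>());
}
```

Tuning here means adjusting the threshold: raising it trades more small-size slowdowns for wider applicability of the branchless path.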
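The extreme speedups on descending inputs come from detecting that the input is (mostly) a descending run and handling it directly instead of letting insertion sort degrade to its quadratic worst case. A hedged sketch of that idea, with hypothetical function names:

```rust
// Sketch: length of the strictly descending run at the start of the
// slice. Names and structure are illustrative, not the PR's code.
fn descending_run_len<T: Ord>(v: &[T]) -> usize {
    let mut len = 1;
    while len < v.len() && v[len - 1] > v[len] {
        len += 1;
    }
    len
}

// If the whole slice is descending, a single reverse sorts it in O(n);
// insertion sort on the same input would do O(n^2) work, which is why
// a case like descending-20 can show a multi-x speedup.
fn sort_with_descending_check<T: Ord>(v: &mut [T]) {
    if descending_run_len(v) == v.len() {
        v.reverse();
    } else {
        v.sort_unstable(); // fall back to the general path
    }
}

fn main() {
    let mut v: Vec<u32> = (0..20).rev().collect();
    sort_with_descending_check(&mut v);
    assert!(v.windows(2).all(|w| w[0] <= w[1]));
    println!("{:?}", v);
}
```

The "mostly descending" case can be handled similarly by reversing the detected run and then finishing with the regular small sort.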
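The "insert right and left" functions credited above are insertion helpers that sift one element into an already-sorted neighborhood. The following is a simplified sketch of the "insert right" flavor (the real tuned versions avoid redundant bounds checks and moves; names here are hypothetical):

```rust
// Sketch of an "insert right" style helper: v[..v.len() - 1] is
// assumed sorted, and the last element is sifted left into place.
// The tuned standard-library versions are more careful about moves;
// this swap-based form only illustrates the structure.
fn insert_tail<T: Ord>(v: &mut [T]) {
    let mut i = v.len() - 1;
    while i > 0 && v[i - 1] > v[i] {
        v.swap(i - 1, i);
        i -= 1;
    }
}

// Insertion sort expressed as repeated insert_tail over a growing
// sorted prefix.
fn insertion_sort<T: Ord>(v: &mut [T]) {
    for end in 2..=v.len() {
        insert_tail(&mut v[..end]);
    }
}

fn main() {
    let mut v = vec![3u32, 1, 4, 1, 5, 9, 2, 6];
    insertion_sort(&mut v);
    println!("{:?}", v);
}
```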