To understand this PR, it's advisable to read #100856. There I go into more detail on the test methodology and graphs.

The full benchmark results can be found here:

Here are the speedups and slowdowns for Zen3:

For hot-u64 we can see that the speedups are the most extreme for smaller sizes and level out at 30%.

For hot-string, which is relatively expensive to access, we see the speedup level out at 4%, with a 10-15% speedup for smaller sizes and a 3x speedup in the most extreme case, hot-string-descending-20. These results are in line with the reduced average comparison counts, and the extreme speedups are explained by the ability to detect fully or mostly descending inputs even for small inputs.

For smaller sizes there is also a noticeable number of slowdowns. I'd argue that the overall speedup is worth it here, but this can be tuned with the qualifies_for_branchless_sort heuristic.

The 1k type generally produces the noisiest results, so I'm not sure this is a real signal.

The cheap-to-access but expensive-to-compare type f128 shows a modest speedup of 10-20%, with descending inputs again showing the largest outliers: a 5x speedup for descending-20, which switches from insertion sort to the more sophisticated sort_small. But it might also be thanks to the tuned insert-right and insert-left functions.

The cold results don't look too hot for small sizes <= 20; these could be addressed by adding extra logic. But even then it's a tricky balance, where something like pattern analysis is really hard, if not impossible, to do without some slowdown for cold code.
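The qualifies_for_branchless_sort heuristic mentioned above gates which types take the branchless small-sort path. A minimal sketch of such a size-based gate is shown below; the threshold constant and the exact criteria are assumptions for illustration, not the standard library's actual implementation:

```rust
use std::mem;

// Sketch of a size-based gate for the branchless small-sort path.
// The threshold is an assumed value for illustration; the real
// heuristic may use different or additional criteria.
fn qualifies_for_branchless_sort<T>() -> bool {
    // Branchless sorting networks shine when elements are cheap to
    // move, i.e. small payloads such as integers. Large elements make
    // the extra unconditional moves more expensive than the branches
    // they avoid.
    mem::size_of::<T>() <= 2 * mem::size_of::<usize>()
}

fn main() {
    // A u64 is small enough; a 1 KiB payload (like the "1k" benchmark
    // type) is not.
    println!("{}", qualifies_for_branchless_sort::<u64>());
    println!("{}", qualifies_for_branchless_sort::<[u8; 1024]>());
}
```

Tuning here means adjusting the threshold: raising it trades more small-size slowdowns for wider applicability of the branchless path.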
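The extreme speedups on descending inputs come from detecting that the input is (mostly) a descending run and handling it directly instead of letting insertion sort degrade to its quadratic worst case. A hedged sketch of that idea, with hypothetical function names:

```rust
// Sketch: length of the strictly descending run at the start of the
// slice. Names and structure are illustrative, not the PR's code.
fn descending_run_len<T: Ord>(v: &[T]) -> usize {
    let mut len = 1;
    while len < v.len() && v[len - 1] > v[len] {
        len += 1;
    }
    len
}

// If the whole slice is descending, a single reverse sorts it in O(n);
// insertion sort on the same input would do O(n^2) work, which is why
// a case like descending-20 can show a multi-x speedup.
fn sort_with_descending_check<T: Ord>(v: &mut [T]) {
    if descending_run_len(v) == v.len() {
        v.reverse();
    } else {
        v.sort_unstable(); // fall back to the general path
    }
}

fn main() {
    let mut v: Vec<u32> = (0..20).rev().collect();
    sort_with_descending_check(&mut v);
    assert!(v.windows(2).all(|w| w[0] <= w[1]));
    println!("{:?}", v);
}
```

The "mostly descending" case can be handled similarly by reversing the detected run and then finishing with the regular small sort.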
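The "insert right and left" functions credited above are insertion helpers that sift one element into an already-sorted neighborhood. The following is a simplified sketch of the "insert right" flavor (the real tuned versions avoid redundant bounds checks and moves; names here are hypothetical):

```rust
// Sketch of an "insert right" style helper: v[..v.len() - 1] is
// assumed sorted, and the last element is sifted left into place.
// The tuned standard-library versions are more careful about moves;
// this swap-based form only illustrates the structure.
fn insert_tail<T: Ord>(v: &mut [T]) {
    let mut i = v.len() - 1;
    while i > 0 && v[i - 1] > v[i] {
        v.swap(i - 1, i);
        i -= 1;
    }
}

// Insertion sort expressed as repeated insert_tail over a growing
// sorted prefix.
fn insertion_sort<T: Ord>(v: &mut [T]) {
    for end in 2..=v.len() {
        insert_tail(&mut v[..end]);
    }
}

fn main() {
    let mut v = vec![3u32, 1, 4, 1, 5, 9, 2, 6];
    insertion_sort(&mut v);
    println!("{:?}", v);
}
```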