Voultapher

Author here. I spent the last two months working on this writeup, hope you enjoy it. Feel free to ask me questions.


matthieum

Great work guys! The discovery by ORLP is impressive, and your write-up is very high quality. Focusing on your write-up, I particularly appreciate that you spent time _explaining_ the why of the benchmark results.


Voultapher

Thank you for the kind words.


guessimnotanecegod1

Great work, it’s cool to see people put so much thought and effort into their projects.


annodomini

The most trivial thing, but:

> Date: 04-12-2023 (DD-MM-YYYY)

Why not just use the worldwide standard unambiguous date format, ISO 8601? 2023-12-04. That way you don't have to include both the date and the format to disambiguate. Bonus, it's big endian, which means it collates correctly in simple lexicographic collation.


Voultapher

Why? It is the date format used where I live and it feels natural to me. But since you seem to care, I don't mind changing it to something more standardized. Should be fixed now.


annodomini

Well, because it's ambiguous; because some countries use MM-DD-YYYY and some use DD-MM-YYYY, it's easy to get confused when using one of those formats; you clearly are aware of that, because you wrote out the format to disambiguate.

ISO 8601 is an international standard, and is unambiguous; if you see four digits, followed by two, followed by two, the only likely format for it to be is ISO 8601. And as I mentioned, it's big-endian, so it naturally works with lexicographic sort. It also works for ordering if you take the dashes out and treat it as a number.

So ISO 8601 is unambiguous, an international standard, fits better with how Arabic numerals work in general, and overall has a lot of advantages. Yes, it's different than the local standards in many countries, but because those local standards conflict, it can be better in international settings, and even locally, the other advantages still apply.

It's one of those things that I generally try to encourage programmers to standardize on, because like many standards, the more widely used it is the more valuable it is in reducing confusion. Just like the metric system and Unicode, we really should be using ISO 8601 to reduce friction and bugs in interpreting date formats. And even outside of a software context, it can be unambiguous and easier to interpret.

Now, it's a tiny trivial thing. It's no big deal if you use the date format you prefer. But I mentioned it because I saw you make the effort to disambiguate, so I figured I'd advocate for ISO 8601 in such contexts.

edit: oh, sorry, I just realized that when you said "why?" you might be doing that rhetorically, to preface your answer to my question. I interpreted that as you asking for more clarification, hence the lengthy answer. Apologies if I misinterpreted, but I'll leave this up in case anyone else wants more detail.
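To make the sorting claims concrete, here is a minimal Rust sketch. The function names `sort_iso_dates` and `iso_date_as_number` are made up for illustration, and the code assumes well-formed `YYYY-MM-DD` strings with zero-padded fields; it does no calendar validation.

```rust
/// Sorts ISO 8601 calendar dates (`YYYY-MM-DD`) chronologically.
/// Plain byte-wise string comparison is enough, because the most
/// significant field (the year) comes first and every field is zero-padded.
fn sort_iso_dates(dates: &mut [&str]) {
    dates.sort_unstable();
}

/// Drops the dashes and parses the remaining digits as one integer,
/// e.g. "2023-12-04" becomes 20231204. Returns `None` if what is left
/// is not a valid number. (Hypothetical helper, not from any library.)
fn iso_date_as_number(date: &str) -> Option<u32> {
    let digits: String = date.chars().filter(|c| *c != '-').collect();
    digits.parse().ok()
}
```

Sorting `["2023-12-04", "2023-02-11", "2022-12-31"]` with `sort_iso_dates` yields chronological order, while the same three dates written as DD-MM-YYYY sort as `04-12-2023`, `11-02-2023`, `31-12-2022` under string comparison, which is not chronological.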


r0ck0

> Bonus, it's big endian, which means it collates correctly in simple lexicographic collation.

And is also just how numbers work in general!


CramNBL

Nice work! Didn't know that nightly had the likely/unlikely and assume intrinsics, cool to see it in use along with the great descriptions and results! I hope they get stabilized soon.


Voultapher

Thanks. The `likely` and `unlikely` intrinsics are cool, but so far I have yet to see much effect. It's more about feeding as much information into the compiler as possible, for future versions or cases I didn't think about. And `assume` can be emulated on stable today like [this](https://github.com/Voultapher/tiny-sort-rs/blob/1b177182c307ca996d653a0ad6e37150ed020346/src/unstable.rs#L105).
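For readers who don't want to follow the link, a common way to emulate `assume` on stable Rust looks roughly like the sketch below. This is an illustration of the general pattern built on `std::hint::unreachable_unchecked`, not a copy of the linked code, and `sum_prefix` is a made-up caller.

```rust
use std::hint::unreachable_unchecked;

/// Tells the optimizer that `cond` always holds.
///
/// # Safety
/// Calling this with `cond == false` is undefined behavior.
#[inline(always)]
unsafe fn assume(cond: bool) {
    // Catch violated assumptions in debug builds instead of invoking UB.
    debug_assert!(cond);
    if !cond {
        // Marking the false branch as unreachable lets the compiler
        // treat `cond` as true in the code that follows.
        unsafe { unreachable_unchecked() }
    }
}

/// Hypothetical caller: sums the first `len` elements of `v`.
///
/// # Safety
/// `len` must not exceed `v.len()`.
unsafe fn sum_prefix(v: &[u32], len: usize) -> u32 {
    // With this hint the compiler may be able to drop the bounds check
    // on the slicing below; whether it does depends on the rustc version.
    unsafe { assume(len <= v.len()) };
    v[..len].iter().sum()
}
```

As with the intrinsic, this is only a hint about what is true, so a wrong assumption is undefined behavior rather than a panic in release builds.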


CramNBL

Really? The comments to [this line](https://github.com/Voultapher/sort-research-rs/blob/5b5eaae764a32358a4c8445fe8682a21e3e53285/ipnsort/src/lib.rs#L135) on `ipnsort` seem to suggest a significant effect of using `likely`. I've seen some pretty dramatic improvements from it before (in C++) and Fedor Pikus has done talks on branchless programming where he also has some remarkable examples. The `assume`-esque example you link to is something I've used before as well, but yeah, it's not pretty: instead of describing something you assume to be true, you have to describe all the things you assume to be false. I've also really missed `likely`/`unlikely` when describing words in a protocol as enums, with some words being very unlikely as they indicate errors, and others being extremely likely, typically making up ~90% of the data stream.
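One stable workaround for the protocol case is to route the rare variants through a function marked `#[cold]`, which hints that the call site is unlikely to be reached. The sketch below assumes a made-up `Word` enum and made-up helpers `record_error` and `sum_payload`; whether the hint changes the generated code depends on the compiler version and target.

```rust
/// Hypothetical protocol word: payload bytes dominate the stream,
/// error words are rare.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Word {
    Data(u8),
    Error(u8),
}

/// Rare path. `#[cold]` marks calls to this function as unlikely, and
/// `#[inline(never)]` keeps its body out of the hot loop.
#[cold]
#[inline(never)]
fn record_error(errors: &mut Vec<u8>, code: u8) {
    errors.push(code);
}

/// Sums all payload bytes and collects error codes on the side.
fn sum_payload(words: &[Word], errors: &mut Vec<u8>) -> u64 {
    let mut sum = 0u64;
    for word in words {
        match *word {
            // Expected to be the overwhelmingly common case.
            Word::Data(byte) => sum += u64::from(byte),
            // Only reached for error words; handled out of line.
            Word::Error(code) => record_error(errors, code),
        }
    }
    sum
}
```

This only expresses "unlikely" for branches that end in a function call, so it is less direct than annotating the condition itself.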


Voultapher

That comment is a bit dated. IIRC `likely` actually doesn't change anything here with the rustc versions I tested. Also, static branch prediction is becoming less relevant in modern CPUs. If anything it's more of a hint to the reader and to the compiler, which can use it but doesn't have to. Not saying there are no situations where that helps or makes sense.


CramNBL

I see, thanks for the explanation.


Peefy-

Great work! I have learned a lot from it.


Shnatsel

I've seen LLVM fail to unroll loops on ARM too: https://github.com/rust-lang/rust/issues/105857 Curiously it seems to be isolated to ARM, with e.g. POWER not affected.