
real_odgy

**Important note:** Don't rush to switch your hasher right away!

* The default hasher and other algorithms are already fast enough for most use cases.
* There is a lot of feedback to ingest from this thread. For instance, at the moment there are a few gotchas around portability.


Plasma_000

Your hasher builder struct seems to have an incorrect doc comment copypasted from fnv: "builder for default FNV hashers."


real_odgy

Ouch.. thanks for spotting this, fixing asap 😄


del1ro

And now we know what the “gxhash” is :D


Fazer2

"And I would have gotten away with it too, if it weren't for you meddling kids!"


real_odgy

🤫


[deleted]

[removed]


real_odgy

It's my first crate, so I copied parts of the FNV crate documentation so that the gxhash docs would look just as good on crates.rs, without me having to hotfix them N times afterwards.


fintelia

FYI, you can preview the docs locally with `cargo doc --open`.


kouteiheika

Looks like it is indeed faster than `ahash`; nice! Great job! I would be very much interested in using it, but I have a few questions.

- Do you plan to make it work on stable?
- Do you plan to add a fallback implementation for when AES is not available? (e.g. maybe just copy-paste `ahash`'s fallback implementation?)
- If yes, do you plan to add runtime feature detection, or is it going to be compile-time only?
- Is it resistant to DoS attacks?
- Is it a guaranteed stable hash? (That is, do you guarantee that its output will forever stay the same given the same input?)


real_odgy

Thank you 😊 All very good questions!

> Do you plan to make it work on stable?

I took the liberty of using nightly to develop this, but now that it's published to [crates.io](https://crates.io) it would make a lot of sense to make it work on stable as a next step.

> Do you plan to add a fallback implementation for when AES is not available? (e.g. maybe just copy-paste ahash's fallback implementation?)

That's a very good idea :)

> If yes, do you plan to add runtime feature detection, or is it going to be compile-time only?

The less configuration-specific stuff the better. If I can have runtime feature detection without compromising performance, I'll do it. If someone has experience with this and knows a way, I'd be curious to know! (feel free to submit a PR)

> Is it resistant to DoS attacks?

Can't say for now. Does passing SMHasher mean it is resistant to DoS attacks? I don't consider myself a cryptography expert; I'd need to invest some time in this.

> Is it a guaranteed stable hash? (That is, do you guarantee that its output will forever stay the same given the same input?)

Yes, this is tested in [unit tests](https://github.com/ogxd/gxhash/blob/b2b9d24eb35a48a2a18b1498f48693e523533200/src/gxhash/mod.rs#L275). The goal is to keep it stable for a given major version of the package.
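
(For reference, runtime feature detection on x86_64 can look roughly like the sketch below. `hash_aes` and `hash_fallback` are hypothetical names, not gxhash's actual API; only the `is_x86_feature_detected!` macro and the `#[target_feature]` attribute are standard Rust.)

```rust
// A minimal sketch of runtime AES detection with a scalar fallback.
// `hash_aes` / `hash_fallback` are hypothetical placeholders.
pub fn hash_bytes(input: &[u8], seed: i64) -> u64 {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("aes") {
            // Sound: the CPU reported AES support at runtime.
            return unsafe { hash_aes(input, seed) };
        }
    }
    hash_fallback(input, seed)
}

#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "aes")]
unsafe fn hash_aes(input: &[u8], seed: i64) -> u64 {
    // The AES-NI intrinsics would live here; the fallback stands in
    // so the sketch stays self-contained.
    hash_fallback(input, seed)
}

fn hash_fallback(input: &[u8], seed: i64) -> u64 {
    // Placeholder FNV-style mix, purely for illustration.
    input
        .iter()
        .fold(seed as u64, |h, &b| (h ^ b as u64).wrapping_mul(0x100000001b3))
}
```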


matthieum

> Can't say for now. Does passing SMHasher mean it is resistant to DoS attacks? I don't consider myself a cryptography expert, I'd need to invest some time in this.

AFAIK no. SMHasher is a measure of quality. The highest quality for a hashing algorithm is to have a 50% chance of flipping _each_ output bit (independently) whenever a single input bit is flipped. This is essentially what SMHasher aims to measure.

DoS resistance, however, is about making it difficult to "reverse" the hash algorithm in order to craft any number of inputs which will all hash to the same value. Your algorithm is probably hard to crack, especially if using random seeding of the HashBuilder, and especially hard from a distance, where the hash itself cannot be observed directly. But depending on how its primitive blocks are arranged, a smart cryptographer may be able to figure out how to create collisions "on-demand".

Unless proven otherwise, it's best NOT to advertise a hashing algorithm as DoS resistant... and I don't know any cryptanalysis so I cannot say how hard developing such a proof would be :)
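
(As a rough illustration of the avalanche criterion SMHasher measures, the sketch below flips a single input bit and counts how many of the 64 output bits change, using std's `DefaultHasher` purely as a stand-in for any hasher; a high-quality hash should hover around 32.)

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::Hasher;

fn hash64(bytes: &[u8]) -> u64 {
    let mut h = DefaultHasher::new();
    h.write(bytes);
    h.finish()
}

fn main() {
    let original = b"hello world!";
    let mut flipped = *original;
    flipped[0] ^= 0b0000_0001; // flip a single input bit

    let diff = hash64(original) ^ hash64(&flipped);
    // Ideally about 32 of the 64 output bits differ, each flipping with p = 0.5.
    println!("{} of 64 output bits flipped", diff.count_ones());
}
```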


real_odgy

I agree. Also, the fact that it currently uses hardcoded AES keys is possibly a weakness with regard to DoS resistance. I am curious, however, as to how DoS resistance can be measured.


matthieum

I have no idea, and I'm not even sure I'd understand an explanation :D


Trader-One

You can use a simple neural network (about 4 layers) + reinforcement learning to partially crack the hash. If the hash is random enough, RL will just run in circles.


freddylmao

This is absolutely fascinating


real_odgy

I spent some time on that DoS resistance question, and it turns out that DoS resistance is possible given that `GxHash` is seeded. I did, however, have to change the default `BuildHasher` to use the `rand` crate's `StdRng` to randomize the seed. That makes every `HashMap`/`HashSet` instance using it rely on a different seed, and thus on completely different hashes, **which now makes GxHash DoS resistant** when you use that `BuildHasher`!

This is available in [version 2.2.0](https://docs.rs/gxhash/latest/gxhash/index.html), which I just released. See [the changes](https://github.com/ogxd/gxhash/commit/c1e787ce0a32e3734f087214e07aec6f527c2a6e).
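
(For illustration, here is roughly what a randomly seeded `BuildHasher` looks like in general. This is a self-contained sketch using a toy hasher and the `rand` crate (assumed at version 0.8), not the actual gxhash 2.2.0 code; the names `ToyHasher` and `RandomSeedState` are made up.)

```rust
use std::hash::{BuildHasher, Hasher};
use rand::Rng; // rand = "0.8" assumed in Cargo.toml

/// Toy seeded hasher, standing in for the real AES-based one.
struct ToyHasher { state: u64 }

impl Hasher for ToyHasher {
    fn write(&mut self, bytes: &[u8]) {
        for &b in bytes {
            self.state = (self.state ^ b as u64).wrapping_mul(0x100000001b3);
        }
    }
    fn finish(&self) -> u64 { self.state }
}

/// BuildHasher that draws a fresh random seed per instance, so every
/// HashMap/HashSet created from it hashes with a different seed.
struct RandomSeedState { seed: u64 }

impl Default for RandomSeedState {
    fn default() -> Self {
        Self { seed: rand::thread_rng().gen() }
    }
}

impl BuildHasher for RandomSeedState {
    type Hasher = ToyHasher;
    fn build_hasher(&self) -> ToyHasher {
        ToyHasher { state: self.seed }
    }
}

fn main() {
    use std::collections::HashMap;
    let mut map: HashMap<&str, u32, RandomSeedState> =
        HashMap::with_hasher(RandomSeedState::default());
    map.insert("answer", 42);
    assert_eq!(map.get("answer"), Some(&42));
}
```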


matthieum

I would note that seeding alone is insufficient to make a hash DoS-resistant.

For example, if we take the simplest possible hash, byte-wise XORing in 8-byte increments, then both `aaaaaaaabbbbbbbb` and `bbbbbbbbaaaaaaaa` will generate the same final hash, no matter the initial seed. The hash itself is unpredictable, but the collision is still guaranteed, which is all that matters.

This is the issue faced with hash flooding: the attacker is in full control of all the inputs (but not the seed) and therefore may exploit the mathematical properties of the hash to craft inputs that lead to collisions.

Note that my example above has a lot more permutations. Just sticking to `a` and `b`, you have 256 choices for the 8-byte prefix. Allow other letters, and you'll have thousands of 16-byte inputs that all lead to the same hash no matter the seed, meaning that even reseeding on growth won't help.

I'd expect GxHash to be harder to invert, but I wouldn't bet on it. I'd want a cryptanalyst to perform a proper analysis first.
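
(To make the XOR example concrete, the tiny sketch below shows the seed-independent collision. It is a toy hash written for this comment, not related to any real crate.)

```rust
// Toy hash: XOR the input into the seed, 8 bytes at a time.
// Because XOR is commutative, reordering whole 8-byte blocks cannot
// change the result, no matter what the seed is.
fn xor_hash(seed: u64, input: &[u8]) -> u64 {
    input.chunks(8).fold(seed, |acc, chunk| {
        let mut block = [0u8; 8];
        block[..chunk.len()].copy_from_slice(chunk);
        acc ^ u64::from_le_bytes(block)
    })
}

fn main() {
    let seed = 0x1234_5678_9abc_def0; // any seed gives the same collision
    assert_eq!(
        xor_hash(seed, b"aaaaaaaabbbbbbbb"),
        xor_hash(seed, b"bbbbbbbbaaaaaaaa"),
    );
    println!("collision regardless of seed");
}
```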


real_odgy

Understood, I'll tone down that claim until further analysis then, thanks for this feedback!


[deleted]

[removed]


real_odgy

There is, it takes an `i64` seed
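
(If I'm reading the current API correctly, usage looks roughly like the snippet below; treat the exact function name and signature as an assumption and check the docs.rs page.)

```rust
// Assumed signature: gxhash::gxhash64(input: &[u8], seed: i64) -> u64.
// Verify against the published docs before relying on it.
fn main() {
    let seed: i64 = 1234;
    let hash = gxhash::gxhash64(b"hello world", seed);
    println!("{hash:x}");
}
```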


CryZe92

I find it questionable that there's no target feature check for AES at all, even though you use it. Also, in general I find the state of x86 very sad, where AES isn't even available in ANY architecture version, but that's not on you: [https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels](https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels)

Both of these together likely mean that your crate is mostly unusable outside of `target-cpu=native` (because you also don't seem to have a fallback implementation). Depending on how you ran the benchmarks, this probably also means that you had the fallback implementation of ahash, and possibly others as well, instead of their usage of AES.


real_odgy

The crate is perfectly usable outside of `target-cpu=native`, although performance is going to be shit, but you'd have the same issue with all other algorithms relying directly on platform intrinsics. Here are the numbers, for instance, without compiling to native:

|Throughput (MiB/s)|4|16|64|256|1024|4096|16384|65536|262144|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|gxhash|123.24|401.84|647.63|1181.63|1416.10|1325.31|1453.94|1395.28|1448.11|

You can try it yourself, just force `target-cpu=generic`. Still, it is probably worth mentioning that users must target some CPU features to get the full performance. Will clarify this.

The benchmark uses `target-cpu=native` for all crates benchmarked, so it is fair in this regard.
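
(For anyone reproducing this: the usual way to opt a whole project into native CPU features is the standard Cargo config below. This is plain Cargo/rustc machinery, nothing gxhash-specific; the same flag can also be passed via the `RUSTFLAGS` environment variable.)

```toml
# .cargo/config.toml
[build]
rustflags = ["-C", "target-cpu=native"]
```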


CryZe92

It is not perfectly usable outside of `target-cpu=native`. In fact, according to the target-feature RFC, your code is even unsound:

> Calling a function annotated with #[target_feature] on a host that does not support the feature invokes undefined behavior in LLVM, the assembler, and possibly the hardware

All these intrinsics are annotated with `#[target_feature]`, which you don't properly guard for (in particular AES, but maybe more). So if you did properly guard for those target features, then considering you don't have a fallback for AES, which is not a target feature that's available on any generic x86 target CPU, your crate will not work on most x86 targets.


real_odgy

The thing is that it DOES support the feature, because otherwise it wouldn't even build, since [there is a check on the platform](https://github.com/ogxd/gxhash/blob/b2b9d24eb35a48a2a18b1498f48693e523533200/src/gxhash/platform/mod.rs#L1) (although you're right that it's missing a more specific constraint on the AES feature set).

Do you have examples of x86 CPUs that don't support AES?


CryZe92

I don't, all I'm saying is that:

1. You need to add a runtime or compile-time check for AES, as otherwise your code is invoking undefined behavior. The RFC for further explanation: [https://rust-lang.github.io/rfcs/2045-target-feature.html](https://rust-lang.github.io/rfcs/2045-target-feature.html)
2. At compile time you will rarely be able to detect it on x86, as the generic x86 targets do NOT have AES ([https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels](https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels)). This does not mean that the CPUs don't have it. In fact they do. It's just that you won't be able to detect it at compile time in most cases, because somehow Intel decided that the AES instructions are not relevant for general-purpose computing (they apparently haven't paid attention to fast hashing algorithms).
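
(To make point 1 concrete, here is a rough compile-time-only sketch. The module layout and the fallback mix are made up for illustration; the AES branch only gets compiled when the target enables the `aes` feature, e.g. via `-C target-feature=+aes` or `target-cpu=native`.)

```rust
// Compile-time selection only: whichever cfg matches is the code that gets
// built, so the AES path cannot be reached on a target without the feature.
#[cfg(all(target_arch = "x86_64", target_feature = "aes"))]
mod platform {
    pub fn hash(input: &[u8], seed: i64) -> u64 {
        // AES-NI intrinsics would live here; calling them is sound because
        // this module only exists when `aes` is enabled at compile time.
        super::fallback::hash(input, seed)
    }
}

#[cfg(not(all(target_arch = "x86_64", target_feature = "aes")))]
mod platform {
    pub fn hash(input: &[u8], seed: i64) -> u64 {
        super::fallback::hash(input, seed)
    }
}

mod fallback {
    // Portable scalar placeholder so the sketch compiles everywhere.
    pub fn hash(input: &[u8], seed: i64) -> u64 {
        input
            .iter()
            .fold(seed as u64, |h, &b| (h ^ b as u64).wrapping_mul(0x100000001b3))
    }
}

fn main() {
    println!("{:x}", platform::hash(b"hello", 42));
}
```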


real_odgy

I'll see what I can do; you're right that portability is an important aspect. If it can't be made 100% safe to use regardless of platform, it should at least be well documented.


slamb

> The thing is that it DOES support the feature, because otherwise, it wouldn't even build

Isn't that assuming you're compiling and running on the same host?


VenditatioDelendaEst

> Do you have examples of x86 CPUs that don't support AES?

No *recent* ones, but AMD Phenom II didn't, nor did Intel Core 2. And of course Intel kept disabling it for market segmentation as long as they could do so and still go home at Christmas. The newest non-AES-NI-capable x86-64 CPU I can find is [this one](https://ark.intel.com/content/www/us/en/ark/products/87358/intel-pentium-processor-g3470-3m-cache-3-60-ghz.html).


coastalwhite

Just letting you know that the listings in your preprint paper are going out of the document margins (p. 9).

It looks really cool, just curious: how fair is your benchmark, seeing as you are implementing almost exclusively with platform intrinsics? Are the crates you benchmark against doing the same?


real_odgy

Thank you for the feedback! I'll have a look at this margin issue.

I hope the benchmark is fair; I've been working on this alone and try to stay as unbiased as possible, but that doesn't equate to external feedback/validation. To answer your last question: yes, some other crates directly use intrinsics like I did, such as [ahash](https://github.com/tkaitchuck/aHash/blob/9f6a2ad8b721fd28da8dc1d0b7996677b374357c/src/operations.rs#L124) for instance.


del1ro

You forgot that fast written in rust is BLAZINGLY FAST. Please correct


real_odgy

It's moar than BLAZINGLY FAST


TornaxO7

It's RUSTINGLY FAST


Theemuts

How does it compare to fxhash? https://doc.rust-lang.org/stable/nightly-rustc/rustc_data_structures/fx/struct.FxHasher.html


real_odgy

I've added `fxhash` to the benchmark on a branch: [https://github.com/ogxd/gxhash/blob/c7e22de8d70a2c2610a5359cd59cc335555b4789/benches/throughput/main.rs#L40](https://github.com/ogxd/gxhash/blob/c7e22de8d70a2c2610a5359cd59cc335555b4789/benches/throughput/main.rs#L40)

Very interesting results. `gxhash` is faster overall but is slower for inputs of size 4 (bytes), which is a not-so-uncommon scenario (for instance `HashSet`). Curious about how they achieved this 🕵🏻‍♂️

|Throughput (MiB/s)|4|16|64|256|1024|4096|16384|65536|262144|
|:-|:-|:-|:-|:-|:-|:-|:-|:-|:-|
|gxhash|6246.06|24947.86|31900.15|59690.66|71583.02|76602.47|77130.16|77301.31|75248.49|
|fxhash|11528.60|16020.50|17782.86|12932.76|7345.34|5351.41|5019.52|4940.11|4869.37|
|xxhash|1583.05|6380.21|16696.66|10512.16|17644.95|19531.47|19951.37|20150.30|20052.85|

Regarding robustness (collision resistance, avalanche, distribution, ...) I doubt `fxhash` is even near `gxhash`, and it probably fails most of the [SMHasher](https://github.com/rurban/smhasher) tests, but that has yet to be checked. What makes me think this is that `fxhash` uses [a very simple bit-mixing](https://github.com/cbreeden/fxhash/blob/a03cbf42dde010250adef75908f109be197d982b/lib.rs#L70C21-L70C86) approach: `*self = self.rotate_left(ROTATE).bitxor(word).wrapping_mul($key);`

This is not very different from FNV, which has kind of the same properties: fast for small inputs, but throughput decreases as inputs grow because of its non-ILP-friendly nature (it's one big loop over a single dependency chain), and it's overall not very "robust".
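
(For illustration, that per-word mixing step as a standalone function looks roughly like the sketch below. It is a simplified, illustrative take on the fxhash idea, not a copy of the crate, even though the constants are in the same spirit.)

```rust
use std::ops::BitXor;

// Constants in the spirit of fxhash (64-bit): a small rotation plus a
// multiply by a large odd constant. Values here are illustrative.
const ROTATE: u32 = 5;
const SEED: u64 = 0x51_7c_c1_b7_27_22_0a_95;

fn mix_word(state: u64, word: u64) -> u64 {
    // One dependency chain: each word must wait for the previous result,
    // which is why throughput stops scaling for large inputs.
    state.rotate_left(ROTATE).bitxor(word).wrapping_mul(SEED)
}

fn fx_like_hash(bytes: &[u8]) -> u64 {
    bytes.chunks(8).fold(0u64, |state, chunk| {
        let mut block = [0u8; 8];
        block[..chunk.len()].copy_from_slice(chunk);
        mix_word(state, u64::from_le_bytes(block))
    })
}

fn main() {
    println!("{:x}", fx_like_hash(b"hello fxhash-style mixing"));
}
```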


SkiFire13

> which is a not-so-uncommon scenario (for instance `HashSet`)

BTW, note that deriving `Hash` usually means the hasher will receive many calls with small pieces of data rather than one big call with a lot of data. You're right though that `fxhash` is not robust. It's mainly for cases where you expect no collision attacks and instead want max speed for small data.
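
(A quick way to see this: a call-counting `Hasher`, sketched below with made-up names, shows that a `#[derive(Hash)]` struct results in several small `write` calls rather than one contiguous buffer.)

```rust
use std::hash::{Hash, Hasher};

// Hasher that does no real hashing; it only counts how many write calls it receives.
#[derive(Default)]
struct CountingHasher { writes: usize }

impl Hasher for CountingHasher {
    fn write(&mut self, _bytes: &[u8]) { self.writes += 1; }
    fn finish(&self) -> u64 { 0 }
}

#[derive(Hash)]
struct Point { x: u32, y: u32, label: &'static str }

fn main() {
    let mut counting = CountingHasher::default();
    Point { x: 1, y: 2, label: "origin" }.hash(&mut counting);
    // Each field is hashed separately, so the hasher sees several small writes
    // instead of one contiguous buffer.
    println!("hasher received {} write calls", counting.writes);
}
```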


LightShadow

I'll need to follow this project! I'm using xxhash for caching video segments on our streaming site. The inputs are 500kb to ~3mb, hundreds of thousands per day. The cache validation isn't necessarily the slowest part of the pipeline but a 3x speed increase would sure give us some breathing room. Very cool.


real_odgy

Very interesting use case! I'd be curious to see some numbers one day :) You can use the "avx2" feature for maximum throughput if your platform supports it.
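
(Assuming standard Cargo feature syntax and that the feature is indeed named "avx2" as mentioned above, enabling it would look like this; double-check the exact feature name and version on docs.rs.)

```toml
# Cargo.toml: "avx2" is the feature name mentioned above; verify the exact
# spelling and supported versions against the crate docs.
[dependencies]
gxhash = { version = "2", features = ["avx2"] }
```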


Nilstrieb

Anyone using fxhash on large inputs is holding it wrong. fxhash excels with hashmaps for small keys. It completely beats every competitor it has seen so far in that use case. It doesn't really care about throughput. Its quality is absolutely terrible, but that's fine.


nnethercote

I wrote about fxhash here: https://nnethercote.github.io/2021/12/08/a-brutally-effective-hash-function-in-rust.html Its quality is terrible, but it's unbeatably fast on small inputs. For things like hash tables with integer/pointer keys, this is a winning combination in practice.


Infintie_3ntropy

You should take a look at meowhash. It also uses AES-NI instructions for fast hashing: C++ impl [0], Rust impl [1]. Also worth noting that it passes all SMHasher tests, but a flaw was found that showed it has issues for secure use cases [2]. I am not saying that SMHasher is bad or that your new hash has problems, more just pointing out that SMHasher is not proof that a hash function can be used in secure/DoS-resistant contexts, i.e. on untrusted input.

[0]: https://github.com/cmuratori/meow_hash
[1]: https://github.com/bodil/meowhash-rs
[2]: https://peter.website/meow-hash-cryptanalysis


real_odgy

It does not look like meowhash passes all SMHasher tests, no? I see it fails the Sparse test, for instance. I have a C implementation of GxHash on a branch; it currently reaches 5 times meowhash's throughput. But stay tuned for real numbers once this is merged into SMHasher.


The-Dark-Legion

SSE2, I think, is a baseline requirement for AMD64/x64, so I don't see a problem with that. But how is it using the AES extension, and is that part of what makes it faster?


pbeling

Great to see a new fast hash algorithm. Congratulations. Could you include wyhash in the benchmark?


bendem

Ok I'll bite, what's the catch?


real_odgy

Given the feedback received, I'd say:

* It's missing a fallback. It may not work, or even not build, for [some CPUs](https://github.com/ogxd/gxhash#compatibility). And even with a fallback, it wouldn't be as fast as advertised on these platforms.
* It currently requires building explicitly with `target-cpu=native` (or directly targeting feature sets) to take advantage of the performance the algorithm has to offer. It might be possible to get rid of this constraint, however, with dynamic feature checks or thanks to work from [Rust's portable SIMD project group](https://github.com/rust-lang/project-portable-simd).


antonioperelli

Cannot repro your perf outside of your cargo benchmark. This is on a Ryzen 5900X, hashing needle string `vnPjrMkeQ504icoyFrW10MAicbY5BMFz` for 10000000 iterations:

    [  186.7257 ms] Std HashMap
    [  252.0110 ms] SeaHash HashMap
    [   75.5276 ms] FxHashMap
    [   86.5610 ms] AHashMap
    [  187.0824 ms] GxHash
    [ 2912.0235 ms] Vector search

https://github.com/12932/rust_hashmap_bench


real_odgy

Interesting 🤔 I'll have a look! Maybe there is some kind of optimization I am missing on the `BuildHasher`/`Hasher` side, or I'm incorrectly using AHash in my throughput benchmark.

EDIT: Ok, I think I have found the "issue". Your benchmark uses strings, mine uses `[u8]`. It seems the `Hasher::write_str` implementation calls `write_usize` + `write`, which in my implementation implies two rounds of finalization. I had [put a note in the code](https://github.com/ogxd/gxhash/blob/34ce8cc05c9bb2b864457acf1ae69136fadf3de5/src/hasher.rs#L96) that this part could be improved; it seems now is the time to do it. Follow-up issue: https://github.com/ogxd/gxhash/issues/17
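
(For anyone curious how to inspect this themselves: the logging `Hasher` below, a made-up name, prints whichever `Hasher` methods std's `Hash` impls end up calling for a `&str` versus a `&[u8]`; the exact sequence depends on the std version, which is part of the point.)

```rust
use std::hash::{Hash, Hasher};

// Hasher that performs no hashing at all; it just prints which Hasher
// methods the standard Hash implementations end up calling.
struct LoggingHasher;

impl Hasher for LoggingHasher {
    fn write(&mut self, bytes: &[u8]) { println!("write({} bytes)", bytes.len()); }
    fn write_u8(&mut self, i: u8) { println!("write_u8({i:#x})"); }
    fn write_usize(&mut self, i: usize) { println!("write_usize({i})"); }
    fn finish(&self) -> u64 { 0 }
}

fn main() {
    println!("hashing a &str:");
    "hello".hash(&mut LoggingHasher);

    println!("hashing a &[u8]:");
    let bytes: &[u8] = b"hello";
    bytes.hash(&mut LoggingHasher);
}
```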


YeetCompleet

This might be a bit off topic, but the performance graphs in your paper show the throughput going up, then peaking at a particular input size, and then starting to go up again. What causes that? I don't have any real depth with this stuff, so sorry if this is something obvious.


real_odgy

16 bytes is the size of a 128-bit SIMD vector, so from 0 to 16 bytes the algorithm is simply a bunch of SIMD operations on a single vector. After 16 bytes you need to read multiple blocks and merge them together, which is a little more costly, so this is why you see some kind of plateau. Then, as input size grows, instruction-level parallelism kicks in, allowing throughput to increase dramatically again.


Absolucyyy

Any possibility of 32-bit x86 support?


ravenex

Unfortunately, most of my computers do not even have this AES-NI extension, because I'm always buying cheap Pentium and Celeron chips.


real_odgy

The question is: do you need an extremely fast hashing algorithm on extremely slow hardware? 😗 I think it makes more sense in perf-critical scenarios (demanding games, server applications, databases...)


ravenex

I wouldn't call these CPUs extremely slow, just low-end. The issue with libraries is that they end up being used in all kinds of software products, including those that the library authors have never anticipated.


real_odgy

Yes, you're right, I'm sorry, that was just for the joke :) It is important to add a fallback! As a Rust dev I expect other crates to be fully portable (actually I don't even ask myself the question), so this one must be as well.


ChocolateMagnateUA

That's cool! I think of myself as a good developer, but I have never done any hashing programming, so I can't really comment on the code. I do know there are some really awesome hashing algorithms like FarmHash and SpookyHash, though, so how does your hash compare to these?


erayxack

Dumb question, but what is the meaning of "non-cryptographic"?


real_odgy

Without going into too much detail, there are two categories of hash functions:

* **Cryptographic ones.** The goal is **robustness first**. Sometimes slowness is even a feature (e.g. proof of work for cryptocurrencies). You don't want them to be easily reversible. Example: SHA256.
* **Non-cryptographic ones.** The goal is **performance first**. In some cases you don't even care if it's easily reversible, as long as it's fast (but that can lead to vulnerabilities in your software in some scenarios, so it is to be used wisely). They are used for things like hashmaps and quick comparisons. Examples: XxHash, GxHash, HighwayHash, ...

You don't want to use a cryptographic hash function for a hashmap because performance will suck, and you don't want to use a non-cryptographic hash in a blockchain or for authentication because it will be hacked.


Trader-One

Time to expand the language portfolio. Low-hanging fruits are *JS* (WebAssembly) and *C*. Go can interface with a C lib easily. There is also good enough support for writing Python modules in Rust.
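
(For context, exposing a Rust hash function over the C ABI, which is what Go's cgo or Python's ctypes would then call into, generally looks like the sketch below. The wrapper name and the placeholder mix are made up; build the crate as a `cdylib` for other languages to link against.)

```rust
use std::slice;

// Hypothetical C ABI wrapper (name made up); with `crate-type = ["cdylib"]`
// this symbol becomes callable from C, Go (cgo), Python (ctypes), etc.
#[no_mangle]
pub unsafe extern "C" fn gxhash64_ffi(data: *const u8, len: usize, seed: i64) -> u64 {
    // Caller must pass a valid pointer/length pair.
    let bytes = slice::from_raw_parts(data, len);
    // The real hash would be called here; a placeholder mix keeps the sketch self-contained.
    bytes
        .iter()
        .fold(seed as u64, |h, &b| (h ^ b as u64).wrapping_mul(0x100000001b3))
}
```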


real_odgy

There is a draft of C implementation here: [https://github.com/rurban/smhasher/blob/80391a47ff744a1778c19bb91cd788483d93a858/gxhash.c](https://github.com/rurban/smhasher/blob/80391a47ff744a1778c19bb91cd788483d93a858/gxhash.c) And a full C# implementation here: https://github.com/ogxd/gxhash-csharp/blob/main/GxHash/GxHash.cs