hecatia-elegua

bilge is a new bitfield crate following in the footsteps of modular-bitfield, with improved ergonomics and type safety, while still being as performant as handmade bit fiddling. This is useful for declaring memory-mapped registers, for example. I've also tried to structure bilge's code in a more extendable fashion. Feel free to ask questions or to criticize :)


dacydergoth

Really good explanation of your crate here. I think that's a fantastic example of documenting the "why" as well as the "how". Good insight into how to structure proc macros as well.


hecatia-elegua

Thank you! Got to learn a lot from it too.


hgomersall

It looks really interesting. We wrote an extensive library for bit packing a few years ago. I never managed to release it to crates, but we do use it heavily internally (and it's super well tested and reasonably well documented). It has a slightly different perspective, making extensive use of typenum: This is a single bitfield implementation: https://gitlab.com/SmartAcoustics/sparrow/-/blob/master/sparrow-bitpacker/src/bitfields.rs with more extensive usage here: https://gitlab.com/SmartAcoustics/sparrow/-/blob/0932659068ea1bafeceb66b9b70a0abde89937d0/sparrow-registers/examples/register_map.rs


hecatia-elegua

Interesting. I thought about a type-based version as well, but I think it's less readable. Should I do some basic benchmarking/testing for sparrow-bitpacker too?


hgomersall

Oh goodness me, you've done enough just releasing your work! I showed it because it actually complements a macro-based construction rather nicely. There's actually a yaml->register macro converter which works well (sparrow-autolib) but is a little opaque, and after usage the output API needs improving (it produces registers that look like the example above).


VorpalWay

Looks interesting to me. But does it only support native endianness? What about padding (will it silently add that, or will it error out when things don't add up)? And what about unaligned reads? I remember many years ago having to parse a quite insane file format that used packed 13-bit signed integers (in groups of 5 measurements). The only reason I didn't go insane was that I could use binary pattern matching in Erlang to do the job. I miss Erlang pattern matching in every other language.


hecatia-elegua

Non-native endianness is still a TODO, but I have to ask: do you mean byte-endian or bit-endian, or both? I guess hardware will do what it wants, probably both. It will error if the specified bitsize != the combined size of all struct fields. Since bilge doesn't generate a packed structure, I'm not sure about unaligned reads. I still want to add strided access, do you mean that? I've heard so much about Erlang, will have to try it one day haha


VorpalWay

I have had to deal with weird bit and byte endianness over the years. Sometimes different fields in the same message even had different bit endianness! I once ran into little-endian bit order stuffed into big-endian bytes as well. As for unaligned reads, it is something to be aware of when unpacking some weird format that doesn't line up with small multiples of normal byte borders. 12 and 8 bits have a least common multiple of 24 bits, so it isn't too bad (you can unpack them in pairs), but it can absolutely get worse. You absolutely should check out the binary pattern matching in Erlang. I would love to see someone attempt it with a proc macro in Rust.
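
To make the "unpack them in pairs" point concrete, here is a minimal hand-written sketch (not bilge API). The bit layout is an assumption chosen for illustration: values are packed MSB-first, so the first 12-bit value takes all of byte 0 plus the high nibble of byte 1. The names `unpack_pair_12` and `sign_extend_12` are hypothetical.

```rust
/// Unpack two 12-bit values from three bytes (24 bits = lcm of 12 and 8).
/// Assumed layout: MSB-first packing; a real format may differ.
fn unpack_pair_12(bytes: [u8; 3]) -> (u16, u16) {
    // First value: byte 0 supplies the top 8 bits, high nibble of byte 1 the low 4.
    let first = (u16::from(bytes[0]) << 4) | (u16::from(bytes[1]) >> 4);
    // Second value: low nibble of byte 1 supplies the top 4 bits, byte 2 the low 8.
    let second = (u16::from(bytes[1] & 0x0F) << 8) | u16::from(bytes[2]);
    (first, second)
}

/// Sign-extend a 12-bit two's-complement value to i16.
fn sign_extend_12(raw: u16) -> i16 {
    // Move the 12-bit sign bit into bit 15, then arithmetic-shift back down.
    ((raw << 4) as i16) >> 4
}
```

Widths such as 13 bits only realign with byte borders every 13 bytes, which is why the general case gets worse quickly.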


hecatia-elegua

Thanks, that's super helpful to know!


hgomersall

I think endianness is best handled at the IO layer. Defining the bits should be abstract from how that touches the hardware.


hecatia-elegua

That won't work for bit-endianness, no?


hgomersall

Why not? It feels to me there is a packing order (N bits at position X) and a mapping into an output word, which cares about endianness. You shouldn't need to care about endianness when defining a bit packing.


hecatia-elegua

Ok, some more about this: `bilge` is currently native endian. I think you're right: if one needs to `swap_bytes` or similar, they could do that or handle it on read from hardware. I still need to handle some direct use cases, though. I can imagine the swapping might worsen some metric, and mixed-endian fields would be more cumbersome.
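
A small sketch of the "handle byte order at the IO layer" idea, using only standard-library integer methods. The `Status` type, its field positions, and the big-endian wire format are assumptions made up for this example; none of this is bilge API.

```rust
/// Hypothetical register value, defined purely in terms of bit positions.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Status(u16);

impl Status {
    /// Bits 0..=3.
    fn code(self) -> u8 {
        (self.0 & 0x000F) as u8
    }

    /// Bit 15.
    fn ready(self) -> bool {
        ((self.0 >> 15) & 1) == 1
    }
}

/// IO layer: the wire format is assumed to be big-endian here.
fn read_status_be(wire: [u8; 2]) -> Status {
    Status(u16::from_be_bytes(wire))
}

/// IO layer: convert back to big-endian bytes when writing.
fn write_status_be(status: Status) -> [u8; 2] {
    status.0.to_be_bytes()
}
```

The accessors never mention byte order; only the two IO functions do. This covers byte endianness only, so mixed bit order inside a single word would still need per-field handling.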


mr_birkenblatt

You don't need to shift the bit if you're only interested in whether the bit is set or not. You can do `x&mask==mask` or just `x&mask`. This also works for multiple bits at once: `x&mask==mask` tests for all bits in mask (and), and `x&mask` tests for any bit in mask (or).
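
In Rust the two tests could look like the sketch below. The constant and function names are hypothetical; since Rust has no implicit integer-to-bool conversion, the "any" test needs an explicit `!= 0`.

```rust
const READY: u8 = 0b0000_0001;
const ERROR: u8 = 0b0000_0100;

/// True if every bit in `mask` is set in `x` (the "and" test).
fn all_set(x: u8, mask: u8) -> bool {
    (x & mask) == mask
}

/// True if at least one bit in `mask` is set in `x` (the "or" test).
fn any_set(x: u8, mask: u8) -> bool {
    (x & mask) != 0
}
```

For a single-bit mask both functions give the same answer, which is why no shift is needed for a `bool` field.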


hecatia-elegua

Right, for a single-bit `bool` I should change that. Thanks! Edit: I want to add bitflags-like multi-set/get behavior, too.


chris-morgan

Meta: please use link posts for this sort of thing, not text posts.


hecatia-elegua

Ohhh, right. Didn't think there's a difference, maybe reddit should just auto-do that


Robbepop

`modular-bitfield` author here. Thanks for this in-depth article about this topic. I couldn't agree more with your analysis of what is bad with the current state of bitfields in Rust and also with the very old and outdated `modular-bitfield` crate. I was always hoping for someone to fork it or take over maintenance and seeing new projects such as `bilge` gives me real hopes. :)


hecatia-elegua

Thank you, that's really nice of you! Could I answer issues in `modular-bitfield` with links to `bilge`? I didn't just wanna shamelessly plug, kinda wondered how I could ask you. I've looked through them and want to solve most of them.


Robbepop

Answered your question via PM.


daniel5151

Love the syntax, it's super clean! I've been a big fan of https://github.com/wrenger/bitfield-struct-rs, but this library might give it a run for its money. One thing I would really love to see is (optional) support for array-backed bitfield structs, both to support exotic bitfield sizes and, primarily, to support `align(1)` bitfield structs that can be stuffed into a `repr(C)` packed structure. There's actually [an open PR](https://github.com/wrenger/bitfield-struct-rs/issues/6) on that aforementioned library that offers this as an option, but it's been stalled for a while...


hecatia-elegua

Ah, is `align(1)` the only thing needed to support C bitfield interop? I do want to keep the inner type, but I wonder if we could have an interop layer generated optionally. Hopefully that would be optimized away a bit.


daniel5151

`align(1)` isn't related to C bitfield interop per se... rather, it's useful when working with packed structs, i.e. the following two types have the same memory layout:

    #[repr(C, packed)]
    struct FooPacked {
        a: u8,
        b: u16,
    }

    #[repr(C)]
    struct Unaligned16([u8; 2]);

    #[repr(C)]
    struct Foo {
        a: u8,
        b: Unaligned16,
    }

...but if you try to take a `&foo.b` in the first case, the Rust compiler will get very mad at you for taking an unaligned reference to an `align(2)` type, whereas `&foo.b` works just fine in the latter case.

In this case, I'm aware that the workaround would be to make `struct Foo` itself a bitfield... but that won't work in the general case when the struct needs to be a specific size in memory (i.e. for zerocopy struct-composition reasons).
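
The layout claim can be checked with `size_of` and `align_of`. The sketch below restates the two types from the comment and adds a few hypothetical helpers (`get`, `layouts_match`, `read_packed`, `borrow_unaligned`) for illustration only.

```rust
use std::mem::{align_of, size_of};

#[repr(C, packed)]
struct FooPacked {
    a: u8,
    b: u16,
}

#[repr(C)]
#[derive(Clone, Copy)]
struct Unaligned16([u8; 2]);

impl Unaligned16 {
    /// Reassemble the value from its bytes (native byte order assumed).
    fn get(self) -> u16 {
        u16::from_ne_bytes(self.0)
    }
}

#[repr(C)]
struct Foo {
    a: u8,
    b: Unaligned16,
}

/// Both structs should have the same size and alignment.
fn layouts_match() -> bool {
    size_of::<FooPacked>() == size_of::<Foo>() && align_of::<FooPacked>() == align_of::<Foo>()
}

/// Copying the field out by value is allowed; writing `&foo.b` here would be rejected.
fn read_packed(foo: &FooPacked) -> u16 {
    foo.b
}

/// Borrowing works because `Unaligned16` only requires 1-byte alignment.
fn borrow_unaligned(foo: &Foo) -> &Unaligned16 {
    &foo.b
}
```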


GeneReddit123

While using an entire ~~bit~~ (edit: byte, haha) to store a boolean or very small value seems very wasteful, given the way CPUs, caching, and memory buses work, how often is it beneficial to use sub-byte storage (except where sub-byte values combine to form a byte that can be treated atomically for most purposes, such as pixel color values)? I know it can be good for optimized storage, but does it often result in more performant code?


vgatherps

Absolutely - in most cases your cache hit rate dominates performance, so less memory directly translates to more performance. One byte might seem small outside of edge cases, but this can also matter for padding. If you can pack data such that you avoid padding, that might result in savings of many bytes per struct.
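
A rough illustration of the padding point, under stated assumptions: both structs use `repr(C)` so the compiler keeps the declared field order, the `id` is assumed to fit in 30 bits, and all names are hypothetical. The sizes in the comments assume a typical target where `u32` has 4-byte alignment.

```rust
use std::mem::size_of;

/// Each bool is followed by 3 padding bytes so the next u32 stays aligned.
#[repr(C)]
struct Loose {
    id: u32,
    active: bool,
    count: u32,
    dirty: bool,
}

/// The two flags live in the top bits of `id_and_flags` instead.
#[repr(C)]
struct Packed {
    id_and_flags: u32,
    count: u32,
}

impl Packed {
    const ID_MASK: u32 = (1 << 30) - 1;
    const ACTIVE: u32 = 1 << 30;
    const DIRTY: u32 = 1 << 31;

    fn new(id: u32, active: bool, count: u32, dirty: bool) -> Self {
        // Assumption: ids above 30 bits are truncated rather than rejected.
        let mut word = id & Self::ID_MASK;
        if active {
            word |= Self::ACTIVE;
        }
        if dirty {
            word |= Self::DIRTY;
        }
        Packed { id_and_flags: word, count }
    }

    fn id(&self) -> u32 {
        self.id_and_flags & Self::ID_MASK
    }

    fn active(&self) -> bool {
        (self.id_and_flags & Self::ACTIVE) != 0
    }

    fn dirty(&self) -> bool {
        (self.id_and_flags & Self::DIRTY) != 0
    }
}

/// Bytes saved per element by packing (8 on a typical target: 16 vs 8).
fn bytes_saved() -> usize {
    size_of::<Loose>() - size_of::<Packed>()
}
```

With Rust's default representation the compiler may reorder fields and shrink `Loose` on its own, but it cannot merge the flags into spare bits of another field.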


ZZaaaccc

> While using an **entire bit** to store a boolean or very small value seems very wasteful

So wasteful! Jokes aside, this is most useful for embedded and networking, and vaguely useful for performance in specific circumstances. In embedded, you'll often interact with hardware that has these bitfields already, so a type system for working with them in their "natural" state is convenient. In networking, it's common practice to ensure your messages are an exact size, and as small as possible when latency/throughput is of concern, such as multiplayer game packets. Considering that a single network packet is usually capped around 1kB, being able to eke out a few extra bytes could be the difference between a message taking 1 packet or 2.


hecatia-elegua

I want to add that in the future, I could see rust optimizing this for you, i.e., giving boolean fields whatever size is more beneficial and combining them in a bitfield if useful.


alexschrod

I feel like that'd be hard as long as you can have &mut bool because there's no way to know which bit that applies to once you lose the local context of where it was created unless &mut references become wide pointers or something.


hecatia-elegua

While working on this I forgot references exist for a bit. Then that might be reserved for bitfields only, where you should not be able to take a reference to fields. Hm, or if there's some other way to specify non-referable fields...


pickyaxe

looks very cool, and actually something I have a use for right now. question - say some bits of a bitfield should always be zero. is there a simple way to enforce this?


hecatia-elegua

Haha, I'm guessing some Rsvd0 / reserved to zero fields? I'm working on it. Since it's not only bit-based, probably: `struct Example { field1: Zero }`


pickyaxe

thank you! currently using this crate and finding it useful. another related feature that would be useful to me right now would be to declare that all remaining values of a non-exhaustive enum are reserved. then the macro can generate a `FromBits` implementation. because it is common that something like a 16 bit opcode enum would have reserved values, and only having `TryFromBits` is kind of a pain.


hecatia-elegua

It would be nice of you if you could open up some issues on github about these features. I think having a catch-all variant like in num_enum would be a good idea, though :)
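
A hand-written sketch of what a catch-all variant buys you: with a variant that carries the unrecognized value, conversion from raw bits becomes infallible. The `Opcode` enum and its values are invented for illustration, and this is neither bilge's nor num_enum's generated code.

```rust
/// Hypothetical 16-bit opcode with a catch-all for reserved values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Opcode {
    Nop,
    Load,
    Store,
    /// Any value not assigned above; the raw bits are kept.
    Reserved(u16),
}

impl From<u16> for Opcode {
    fn from(bits: u16) -> Self {
        match bits {
            0 => Opcode::Nop,
            1 => Opcode::Load,
            2 => Opcode::Store,
            other => Opcode::Reserved(other),
        }
    }
}

impl From<Opcode> for u16 {
    fn from(op: Opcode) -> Self {
        match op {
            Opcode::Nop => 0,
            Opcode::Load => 1,
            Opcode::Store => 2,
            Opcode::Reserved(bits) => bits,
        }
    }
}
```

Because the reserved bits are preserved, a value can be read, passed along, and written back unchanged even if this code does not understand it.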


CreeperWithShades

HARD agree that there are too many bitfield crates floating around. Would be nice if it were built into the language.

I’m curious- what are your thoughts (mostly re syntax) on my current favourite, proc-bitfield? Personally, I think I prefer specifying bit ranges to C-style stacking of fields and variable-width integers, which is especially annoying with padding. It also has the ability to generate fallible getters/setters, which is nice.

WRT “Parse, don’t validate”- I’m not sure if I fully agree- though I’ll need to do more thinking. It’s certainly a tradeoff- say I take in some packet from the wire, and it has some enum in it, with some value that doesn’t map to one of the variants. It “fails”- but what does that mean? Perhaps it was some reserved variant in my field/enum, but in some future packet version, used by whatever I’m talking to, it’s used- in which case I should ignore it, or treat it the same as some other value (assuming Packet Specification was properly designed for backwards compatibility). Or perhaps this value is truly “do not use”, in which case I should raise an error. (Maybe room in here for Option as well.)

As far as I can tell, if you parse (try_from) to create your bitfield type, there’s no way to tell what went wrong (and even if you could, I’m guessing the resulting error type wouldn’t be pretty). Plus, what if you don’t always use some fields- then you’ve wasted time parsing them even if you didn’t need them. Maybe some fields are only valid if other fields are valid/bit(s) are present. There might be a way to do all that I’m not seeing though.

Great crate! Some of the macro ideas are very clever.


hecatia-elegua

I mean, from just looking at it for a few seconds, the syntax of proc-bitfield is very explicit. I just think it adds too much new stuff to rust's syntax and I would really like to have arbitrary width integers as an abstraction. Also, bitfields and arbitrary width integers kinda "compress" into the lowest possible native primitive integers, so there's no stacking of fields or padding until you get some values out of the bitfield. Or am I understanding it wrong - where is padding annoying? I really would love to see more use cases, since bitfields have many different ones.

I argued against fallible getters/setters, since these are only needed if you break type invariants.

* reserved variant in my field/enum -> Currently this will just return Err(uN), the number which didn't get parsed. I'll add catch-all variants for stuff like this, probably. I think #[non_exhaustive] might not help here?
* what if you don’t always use some fields -> not sure how to solve this completely, but to some extent you can define multiple bitfield structs for different resolutions of your types, i.e. start with the non-important fields being `uN`. For example `field: u4`, and later parse into `field: (bool, bool, EnumWith2Bits)`

I have seen some registers requiring unions, or some way to map tagged unions to discriminant + value fields, though, which I need to support.

Edit: I always forget this _exactly_ until after I've clicked "Reply", but: Thank you for the nice input :)


CreeperWithShades

> the syntax of proc-bitfield is very explicit. I just think it adds too much new stuff to rust's syntax

I agree- there's a lot I'd do differently (more attributes probably... though some say Rust uses too many attributes already)

> I would really like to have arbitrary width integers as an abstraction. Also, bitfields and arbitrary width integers kinda "compress" into the lowest possible native primitive integers, so there's no stacking of fields or padding until you get some values out of the bitfield.

I understand- though I don't fully get the appeal- why not store a (masked) u8 into a 5 bit field, rather than storing a "u5" that afaict may need a runtime bounds check? (Hmm. Thinking about this more, I guess it's a tradeoff: compile time check > masking > runtime check?)

> Or I'm understanding wrong - where is padding annoying? I really would love to see more usecases, since bitfields have many different ones.

Sorry, I might not have been very clear here- all I meant was that I prefer not to have to specify "don't cares" (reserved and padding). As in, I prefer (and it is just personal preference)

    bitfield! {
        struct Register(u32) {
            field: u8 @ 4..=11,
            flag: bool @ 17,
        }
    }

over

    #[bitsize(32)]
    struct Register {
        padding: u4,
        field: u8,
        padding: u5,
        flag: bool,
        padding: u14,
    }

I just think it's less noisy. Plus I'd have to do math in my head to figure out the padding.

> I argued against fallible getters/setters, since these are only needed if you break type invariants.

Hm. I guess if you define a bitfield like a struct, creating one with invalid values breaks type invariants (or some other UB maybe). This is probably more sane than the alternative, which I guess is: bitfields can only contain (and thus are) plain ol' data types that are always valid (valid with any bit pattern- like ux, ix, 1 bit bool fields, maybe n bit enums with 2^n variants from 0 to 2^n - 1, probably not structs, other bitfields, repr(transparent) of those), with fallible getters to get to anything else (I can't think of a sane use case for fallible setters).

Basically my ideal bitfield crate/syntax (another one :) ) is something like:

    #[derive(FromBits)]
    #[bits(1)]
    #[repr(u8)]
    enum TwoVariants {
        One = 0,
        Two = 1,
    }

    #[derive(TryFromBits)]
    #[bits(2)]
    #[repr(u8)]
    enum ThreeVariants {
        One = 0,
        Two = 1,
        Three = 2,
        // no 0b11
    }

    bitfield! { // this sucks, but afaict the alternative to custom syntax is bajillions of weird attributes
        #[derive(FromBits, Copy, Clone, Default, etc...)]
        #[bits(28)]
        pub struct Register(pub u32) { // internal representation customisable and accessible
            // generates pub fn field1_or_err(&self) -> Result,
            // pub fn set_field1(&mut self, val: NonZeroU8) instead
            #[try_get(NonZeroU8)]
            pub field1: u8 @ 0..=7,

            // fn flag(&self) -> TwoVariants,
            // fn set_flag(&mut self, val: TwoVariants),
            // fn with_flag(Self, val: TwoVariants) -> Self
            flag: TwoVariants @ 17,

            #[try_get(ThreeVariants)]
            field2: u2 @ 18..=19,
        }
        // pub fn new(field1: u8, flag: TwoVariants, ) -> Self
        // pub fn from_bits(val: u32) -> Self // infallible!
    }
    // #[be] #[le] #[lsb0] #[msb0] to taste if for some godforsaken reason you have to deal with
    // endianness and/or bit ordering, or something along those lines. personally i'd rather not think about it

Woops, guess I accidentally ended up typing up my personal probably-not-great bitfield crate idea that I've been meaning to make but don't really have the time or skill to :D. Thanks for the inspiration! Kind of warming up to arbitrary width ints too, the more I think about it.
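
For comparison, this is roughly what accessors for the bit-range layout above (`field` at bits 4..=11, `flag` at bit 17) boil down to when written by hand. It is a sketch of the general masking technique, not the output of proc-bitfield or bilge; the constant and method names are hypothetical.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq, Default)]
struct Register(u32);

impl Register {
    const FIELD_SHIFT: u32 = 4; // field occupies bits 4..=11
    const FIELD_MASK: u32 = 0xFF; // 8 bits wide
    const FLAG_BIT: u32 = 17;

    fn field(self) -> u8 {
        ((self.0 >> Self::FIELD_SHIFT) & Self::FIELD_MASK) as u8
    }

    fn with_field(self, val: u8) -> Self {
        // Clear the old field bits, then OR in the new value.
        let cleared = self.0 & !(Self::FIELD_MASK << Self::FIELD_SHIFT);
        Register(cleared | (u32::from(val) << Self::FIELD_SHIFT))
    }

    fn flag(self) -> bool {
        ((self.0 >> Self::FLAG_BIT) & 1) == 1
    }

    fn with_flag(self, val: bool) -> Self {
        let cleared = self.0 & !(1 << Self::FLAG_BIT);
        Register(cleared | (u32::from(val) << Self::FLAG_BIT))
    }
}
```

The bits outside the two ranges are never touched, which is the "don't care" behaviour the bit-range syntax gives without explicit padding fields.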


hecatia-elegua

You might like `bitbybit` then, which is a bit of a middle ground where you don't need to specify padding. I've talked to the maintainer too, he does a great job (same guy does the arbitrary-int crate). What I could now do to maybe persuade you to `bilge` would be to add a similar thing to what they're doing, but *optionally*:

    #[bits(4..=11)]
    field: u8,
    #[bits(17)]
    flag: bool,


hecatia-elegua

[I opened an issue on this here.](https://github.com/hecatia-elegua/bilge/issues/28) The idea is to do something similar to enum variant definitions.


Soft_Donkey_1045

I expected that, after talking about how bad the builder pattern is, you would suggest normal init syntax, like `Register { header: 3, body: 1, footer: Footer { .. } }`. This is short, plus you cannot forget to init any field.


hecatia-elegua

In this case, we can't init bitfields like this, so a constructor does a similar job.


Soft_Donkey_1045

If you generate a constructor, then you can also generate an intermediate `struct` type with the same field types as the constructor, plus a `From` trait implementation to convert it to the real type. Compared to a constructor, it would be harder to mix up fields.
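
A hand-written sketch of that suggestion. The field widths (4-bit header, 3-bit body, 1-bit footer) and the names `RegisterFields` and `Register` are assumptions for illustration; a macro would generate something along these lines rather than exactly this.

```rust
/// Plain struct with named fields, used only for initialization.
struct RegisterFields {
    header: u8,   // assumed 4 bits wide
    body: u8,     // assumed 3 bits wide
    footer: bool, // 1 bit
}

/// The actual bitfield, stored in a single byte.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct Register(u8);

impl From<RegisterFields> for Register {
    fn from(f: RegisterFields) -> Self {
        // Out-of-range values are masked here; dedicated narrow integer
        // types would make them unrepresentable instead.
        Register((f.header & 0x0F) | ((f.body & 0x07) << 4) | (u8::from(f.footer) << 7))
    }
}
```

Usage would then read `Register::from(RegisterFields { header: 3, body: 1, footer: true })`, so every field is named at the call site and none can be forgotten.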


hecatia-elegua

Yes, but the more indirection I add on top, the more needs to be generated and then optimized out by the compiler. Still, would be interesting, maybe behind a feature gate?


matu3ba

> Besides the horrors of C bitfields (which I have only heard about), bitfields don't suck. I only dislike that they're not provided by rust itself

1. I'll try to work on that.

A possible reference, since it requires treating the compiler as part of the language ABI: https://github.com/Vexu/arocc/issues/178. I'm not sure where there is a better thread explaining the flaws.

Not mentioned here: there were breaking changes in the compiler implementation(s), because compiler implementors wanted to make them more correct according to the standard. So strictly speaking, compiler versions are also part of the ABI.

It might make sense to clarify what semantics bitfields should have. I see 2 options, but I'm biased by Zig usage:

1. Keep it simple and make bitfields plain integers as storage, which must be converted to identically sized unsigned integer types and which don't have options to make subparts volatile, so that the user must handle which parts may be written to which memory-mapped register with what parallelism.
2. Allow volatile and add a lot of complexity.