this page was originally written for a different part of my site. i moved it here with minimal edits. it's still valuable information, so i didn't want to discard it
storing files with large regions of consecutive equal bits in less space. the original context for this was in storing long 1-bit audio files more efficiently such that they might fit in the flash memory of an arduino uno
the basis of this idea is simple: just count bits up. count how many of the same bit appear in a row and add that value to an array, switch the bit you're looking for, and repeat. things get a bit more complex when you begin looking into how to store that information effectively
...is a term that refers to how many bits are used to store a single sample of information. for most regular applications, it makes sense to use a bit depth that's a multiple of 8 bits. this is not a regular application. depending on how "homogeneous" the file you're compressing actually is, most homogeneous fields (sequences of identical bits) might not reach 255 bits, and it won't make sense to use a bit-depth of 8 or higher. for my project, it sometimes makes sense to use a bit-depth of 4. so how does that work?
this initially sounds like a problem, we can't store things in spaces smaller than one byte, right? that's not an incorrect assessment, but nothing's stopping us from cramming everything we want in one byte anyway.
we can cut our byte into halves. we'll store 4 bits in the upper bits, and 4 bits in the lower bits. to access these in isolation looks as follows:
shift the byte over
AAAABBBB
>> 4
=
0000AAAA
some implementations of bitshift might cause the value to be malformed, fix it with bitwise and:
1111AAAA
&
00001111 or 0xf
bitwise and to zero out the upper bits
AAAABBBB
&
00001111
=
0000BBBB
each "delta" represents the number of same bits in a row, that much is pretty clear. how do we represent them in bits though? the first obvious thing is just to store the count in each section of size bit-depth. this comes with a couple issues. we waste as many bits as our bit_depth anytime we have to store a delta larger than the max integer, we waste bit_depth many bits if the file starts with a 0, and again if it ends in a 1.
i haven't actually used this method yet, but it's been on my mind a while so i'll describe it here:
in each byte, store the bit this delta represents in the highest order bit. now you don't waste bits in all those cases i described before :)
you could store other information in that bit too, like instead of what bit this delta refers to, you could mark whether this value specifies some multiple of bit_depth-1 and store enormously lengthy homogeneous fields in just two bytes. you'd still be wasting space in those scenarios i mentioned before, but it could save huge amounts of total space on particular kinds of files
save space by counting. ez. huge W. who knew file compression for very particular kinds of files was possible just by using pre-school level logic and techniques. lol