#123
ProposedPlutus

Bitwise operations over BuiltinByteString

Created on  by Koz Ross

Abstract

We describe the semantics of a set of bitwise operations for Plutus BuiltinByteStrings. Specifically, we provide descriptions for:

  • Bit shifts and rotations
  • Counting the number of set bits (popcount)
  • Finding the first set bit

We base our work on similar operations described in CIP-58, but use the bit indexing scheme from the logical operations cip for the semantics. This is intended as follow-on work from both of these.

Motivation: why is this CIP necessary?

Bitwise operations, both over fixed-width and variable-width blocks of bits, have a range of uses. Indeed, we have already proposed CIP-122, with some example cases, and a range of primitive operations on BuiltinByteStrings designed to allow bitwise operations in the service of those example cases, as well as many others. These operations form a core of functionality, which is important and necessary, but not complete. We believe that the operations we describe in this CIP form a useful 'completion' of the work in CIP-122, based on similar work done in the earlier CIP-58.

To demonstrate why our proposed operations are useful, we re-use the cases provided in the CIP-122, and show why the operations we describe would be beneficial.

Case 1: integer set

For integer sets, the previous description lacks two important, and useful, operations:

  • Given an integer set, return its cardinality; and
  • Given an integer set, return its minimal member (or specify it is empty).

These operations have a range of uses. The first corresponds to the notion of Hamming weight, which can be used for operations ranging from representing boards in chess games to exponentiation by squaring to succinct data structures. Together with bitwise XOR, it can also compute the Hamming distance. The second operation also has a range of uses, ranging from succinct priority queues to integer normalization. It is also useful for rank-select dictionaries, a succinct structure that can act as the basis of a range of others, such as dictionaries, multisets and trees of different arity.

In all of the above, these operations need to be implemented efficiently to be useful. While we could use only bit reading to perform all of these, it is extremely inefficient: given an input of length nn, assuming that any bit distribution is equally probable, we need Θ(8n)\Theta(8 \cdot n) time in the average case. While it is impossible to do both of these operations in sub-linear time in general, the large constant factors this imposes (as well as the overhead of looping over bit indexes) is a cost we can ill afford on-chain, especially if the goal is to use these operations as 'building blocks' for something like a data structure.

Case 2: hashing

In our previously-described case, we stated what operations we would need for the Argon2 family of hashes specifically. However, Argon2 has a specific advantage in that the number of operations it requires are both relatively few, and the most complex of which (BLAKE2b hashing) already exists in Plutus Core as a primitive. However, other hash functions (and indeed, many other cryptographic primitives) rely on two other important instructions: bit shifts and bit rotations. As an example, consider SHA512, which is an important component in several cryptographic protocols (including Ed25519 signature verification): its implementation requires both shifts and rotations to work.

Like with Case 1, we can theoretically simulate both rotations and shifts using a combination of bit reads and bit writes to an empty BuiltinByteString. However, the cost of this is extreme: we would need to produce a list of index-value pairs of length equal to the Hamming weight of the input, only to then immediately discard it! To put this into some perspective, for an 8-byte input, performing a rotation involves allocating an expected 32 index-value pairs, using significantly more memory than the result. On-chain, we can't really afford this cost, especially in an operation intended to be used as part of larger constructions (as would be necessary here).

Specification

We describe the proposed operations in several stages. First, we give an overview of the proposed operations' signatures and costings; second, we describe the semantics of each proposed operation in detail, as well as some examples. Lastly, we provide laws that any implementation of the proposed operations should obey.

Throughout, we make use of the bit indexing scheme described in a CIP-122. We also re-use the notation x[i]x[i] to refer to the value of at bit index ii of xx, and the notation xix\\{i\\} to refer to the byte at byte index ii of xx: both are specified in CIP-122.

Operation semantics

Our proposed operations will have the following signatures:

  • bitwiseShift :: BuiltinByteString -> BuiltinInteger -> BuiltinByteString
  • bitwiseRotate :: BuiltinByteString -> BuiltinInteger -> BuiltinByteString
  • countSetBits :: BuiltinByteString -> BuiltinInteger
  • findFirstSetBit :: BuiltinByteString -> BuiltinInteger

We assume the following costing, for both memory and execution time:

Operation Execution time cost Memory cost
bitwiseShift Linear in the BuiltinByteString argument As execution time
bitwiseRotate Linear in the BuiltinByteString argument As execution time
countSetBits Linear in the argument Constant
findFirstSetBit Linear in the argument Constant

bitwiseShift

bitwiseShift takes two arguments; we name and describe them below.

  1. The BuiltinByteString to be shifted. This is the data argument.
  2. The shift, whose sign indicates direction and whose magnitude indicates the size of the shift. This is the shift argument, and has type BuiltinInteger.

Let bb refer to the data argument, whose length in bytes is nn, and let ii refer to the shift argument. Let the result of bitwiseShift called with bb and ii be brb_r, also of length nn.

For all j0,1,8n1j \in 0, 1, \ldots 8 \cdot n - 1, we have

br[j]={b[ji]if ji0,1,8n10otherwiseb_r[j] = \begin{cases} b[j - i] & \text{if } j - i \in 0, 1, \ldots 8 \cdot n - 1\\ 0 & \text{otherwise} \\ \end{cases}

Some examples of the intended behaviour of bitwiseShift follow. For brevity, we write BuiltinByteString literals as lists of hexadecimal values.

-- Shifting the empty bytestring does nothing
bitwiseShift [] 3 => []
-- Regardless of direction
bitwiseShift [] (-3) => []
-- Positive shifts move bits to higher indexes, cutting off high indexes and
-- filling low ones with zeroes
bitwiseShift [0xEB, 0xFC] 5 => [0x7F, 0x80]
-- Negative shifts move bits to lower indexes, cutting off low indexes and
-- filling high ones with zeroes
bitwiseShift [0xEB, 0xFC] (-5) => [0x07, 0x5F]
-- Shifting by the total number of bits or more clears all bytes
bitwiseShift [0xEB, 0xFC] 16 => [0x00, 0x00]
-- Regardless of direction
bitwiseShift [0xEB, 0xFC] (-16) => [0x00, 0x00]

bitwiseRotate

bitwiseRotate takes two arguments; we name and describe them below.

  1. The BuiltinByteString to be rotated. This is the data argument.
  2. The rotation, whose sign indicates direction and whose magnitude indicates the size of the rotation. This is the rotation argument, and has type BuiltinInteger.

Let bb refer to the data argument, whose length in bytes is nn, and let ii refer to the rotation argument. Let the result of bitwiseRotate called with bb and ii be brb_r, also of length nn.

For all j0,1,8n1j \in 0, 1, \ldots 8 \cdot n - 1, we have br=b[(ji) mod (8n)]b_r = b[(j - i) \text{ } \mathrm{mod} \text { } (8 \cdot n)].

Some examples of the intended behaviour of bitwiseRotate follow. For brevity, we write BuiltinByteString literals as lists of hexadecimal values.

-- Rotating the empty bytestring does nothing
bitwiseRotate [] 3 => []
-- Regardless of direction
bitwiseRotate [] (-1) => []
-- Positive rotations move bits to higher indexes, 'wrapping around' for high
-- indexes into low indexes
bitwiseRotate [0xEB, 0xFC] 5 => [0x7F, 0x9D]
-- Negative rotations move bits to lower indexes, 'wrapping around' for low
-- indexes into high indexes
bitwiseRotate [0xEB, 0xFC] (-5) => [0xE7, 0x5F]
-- Rotation by the total number of bits does nothing
bitwiseRotate [0xEB, 0xFC] 16 => [0xEB, 0xFC]
-- Regardless of direction
bitwiseRotate [0xEB, 0xFC] (-16) => [0xEB, 0xFC]
-- Rotation by more than the total number of bits is the same as the remainder
-- after division by number of bits
bitwiseRotate [0xEB, 0xFC] 21 =>[0x7F, 0x9D]
-- Regardless of direction, preserving sign
bitwiseRotate [0xEB, 0xFC] (-21) => [0xE7, 0x5F]

countSetBits

Let bb refer to countSetBits' only argument, whose length in bytes is nn, and let rr be the result of calling countSetBits on bb. Then we have

r=i=08n1b[i]r = \sum_{i=0}^{8 \cdot n - 1} b[i]

Some examples of the intended behaviour of countSetBits follow. For brevity, we write BuiltinByteString literals as lists of hexadecimal values.

-- The empty bytestring has no set bits
countSetBits [] => 0
-- Bytestrings with only zero bytes have no set bits
countSetBits [0x00, 0x00] => 0
-- Set bits are counted regardless of where they are
countSetBits [0x01, 0x00] => 1
countSetBits [0x00, 0x01] => 1

findFirstSetBit

Let bb refer to findFirstSetBit's only argument, whose length in bytes is nn, and let rr be the result of calling findFirstSetBit on bb. Then we have the following:

  1. r1,0,1,,8n1r \in -1, 0, 1, \ldots, 8 \cdot n - 1
  2. If for all i0,1,n1i \in 0, 1, \ldots n - 1, bi=0x00b\\{i\\} = \texttt{0x00}, then r=1r = -1; otherwise, r>1r > -1.
  3. If r>1r > -1, then b[r]=1b[r] = 1, and for all i0,1,,r1i \in 0, 1, \ldots, r - 1, b[i]=0b[i] = 0.

Some examples of the intended behaviour of findFirstSetBit follow. For brevity, we write BuiltinByteString literals as lists of hexadecimal values.

-- The empty bytestring has no first set bit
findFirstSetBit [] => -1
-- Bytestrings with only zero bytes have no first set bit
findFirstSetBit [0x00, 0x00] => -1
-- Only the first set bit matters, regardless what comes after it
findFirstSetBit [0x00, 0x02] => 1
findFirstSetBit [0xFF, 0xF2] => 1

Laws

Throughout, we use bitLen bs to indicate the number of bits in bs; that is, sizeOfByteString bs * 8. We also make reference to logical operations from a previous CIP as part of specifying these laws.

Shifts and rotations

We describe the laws for bitwiseShift and bitwiseRotate together, as they are similar. Firstly, we observe that bitwiseShift and bitwiseRotate both form a monoid homomorphism between natural number addition and function composition:

bitwiseShift bs 0 = bitwiseRotate bs 0 = bs
 
bitwiseShift bs (i + j) = bitwiseShift (bitwiseShift bs i) j
 
bitwiseRotate bs (i + j) = bitwiseRotate (bitwiseRotate bs i) j

However, bitwiseRotate's homomorphism is between integer addition and function composition: namely, i and j in the above law are allowed to have different signs. bitwiseShift's composition law only holds if i and j don't have opposite signs: that is, if they're either both non-negative or both non-positive.

Shifts by more than the number of bits in the data argument produce an empty BuiltinByteString:

-- n is non-negative
 
bitwiseShift bs (bitLen bs + n) = 
bitwiseShift bs (- (bitLen bs + n)) = 
replicateByteString (sizeOfByteString bs) 0x00

Rotations, on the other hand, exhibit 'modular roll-over':

-- n is non-negative
bitwiseRotate bs (binLen bs + n) = bitwiseRotate bs n
 
bitwiseRotate bs (- (bitLen bs + n)) = bitwiseRotate bs (- n)

Shifts clear bits at low indexes if the shift argument is positive, and at high indexes if the shift argument is negative:

-- 0 < n < bitLen bs, and 0 <= i < n
readBit (bitwiseShift bs n) i = False

readBit (bitwiseShift bs (- n)) (bitLen bs - i - 1)  = False

Rotations instead preserve all set and clear bits, but move them around:

-- 0 <= i < bitLen bs
readBit bs i = readBit (bitwiseRotate bs j) (modInteger (i + j) (bitLen bs))

countSetBits

countSetBits forms a monoid homomorphism between BuiltinByteString concatenation and natural number addition:

countSetBits "" = 0
 
countSetBits (x <> y) = countSetBits x + countSetBits y

countSetBits also demonstrates that bitwiseRotate indeed preserves the number of set (and thus clear) bits:

countSetBits bs = countSetBits (bitwiseRotate bs i)

There is also a relationship between the result of countSetBits on a given argument and its complement:

countSetBits bs = bitLen bs - countSetBits (bitwiseLogicalComplement bs)

Furthermore, countSetBits exhibits (or more precisely, gives evidence for) the inclusion-exclusion principle from combinatorics, but only under truncation semantics:

countSetBits (bitwiseLogicalXor False x y) = countSetBits (bitwiseLogicalOr
False x y) - countSetBits (bitwiseLogicalAnd False x y)

Lastly, countSetBits has a relationship to bitwise XOR, regardless of semantics:

countSetBits (bitwiseLogicalXor semantics x x) = 0

findFirstSetBit

BuiltinByteStrings where every byte is the same (provided they are not empty) have the same first bit as a singleton of that same byte:

-- 0 <= w8 <= 255, n >= 1
findFirstSetBit (replicateByteString n w8) = 
findFirstSetBit (replicateByteString 1 w8)

Additionally, findFirstSet has a relationship to bitwise XOR, regardless of semantics:

findFirstSetBit (bitwiseLogicalXor semantics x x) = -1

Any result of a findFirstSetBit operation that isn't -1 gives a valid bit index to a set bit, but any non-negative BuiltinInteger less than this will give an index to a clear bit:

-- bs is not all zero bytes or empty
readBit bs (findFirstSetBit bs) = True
 
-- 0 <= i < findFirstSet bs
readBit bs i = False

Rationale: how does this CIP achieve its goals?

Our four operations, along with their semantics, fulfil the requirements of both our cases, and are law-abiding, familiar, consistent and straightforward to implement. Furthermore, they relate directly to operations provided by CIP-122, as well as being identical to the equivalent operations in CIP-58. At the same time, some alternative choices could have been made:

  • Not implementing these operations at all, instead requiring higher-level APIs to implement them atop CIP-122 primitives;
  • Providing only some of these four operations;
  • Having findFirstSetBit return the bit length for an all-zero argument, instead of -1;
  • Including a way of finding the last set bit as well.

We discuss our choices with regards to all the above, and specify why we made the choices that we did.

While we chose to provide all of these operations as primitives, we could instead have required higher-level APIs to provide these on the basis of CIP-122 bit reads and writes. This is indeed possible:

  • bitwiseShift and bitwiseRotate is a loop over all bits in the data argument, which is used to construct a change list that sets appropriate bits (with the right offset), which then gets applied to a BuiltinByteString of the same length as the data argument, but full of zero bytes.
  • countSetBits is a fold over all bit indexes, incrementing an accumulator every time a set bit is found.
  • findFirstSetBit is a fold over all bit indexes, returning the first index with a set bit, or (-1) if none can be found.

However, especially for bitwiseShift and bitwiseRotate, this comes at significant cost. For shifts and rotations, we have to construct a change list argument whose length is proportional to the Hamming weight of the data argument. Assuming that all inputs are equally likely, this is proportional to the bit length of the data argument: such a change list is likely significantly larger than the result of the shift or rotation, making the memory cost of such operations far higher than it needs to be. Given the constraints on memory and execution time on-chain, at minimum, these two operations would have to be primitives to make them viable at all: Case 2-style implementations would have intolerably large memory costs otherwise, as such algorithms often make heavy use of both shifts and rotations.

The case for countSetBits and findFirstSetBit is less clear here, as they wouldn't require nearly as much of a memory cost if implemented atop CIP-122 primitives using folds. However, the cost would still be significant: countSetBits requires looping over every bit index in the argument, and findFirstSetBit (again, assuming any bit distribution in an argument is equally probable) requires looping over about half this many. This would make these operations unreasonable even over smaller inputs, which would make applications like rank-select dictionaries (which expect these operations to be fast and low-cost) unworkable. As succinct data structures are foremost in our minds when considering the current primitives, we believe it is important that countSetBits and findFirstSetBit are as efficient as they could be, hence their inclusion.

When designing our operations, we tried to keep them familiar to Haskellers, namely by having them behave like the similar operations from the Bits and FiniteBits type classes. In particular, bitwiseShift and bitwiseRotate act similarly given the same shift (or rotation) argument, as positive values shift left and negative ones shift right. The only exception is the choice for findFirstSetBit to return -1 when no set bits are found, which runs counter to the way countTrailingZeros from FiniteBits works, which instead returns the bit length. This is also different to how this operations works in hardware. Indeed, having findFirstSetBit work this way would not only be more familiar, it would also provide additional laws:

-- Not possible under our current definition
-- bitLen is the bit length of the argument
0 <= popcount bs <= bitLen bs - findFirstSet bs 

Under both definitions, the intent is the same: if the argument doesn't contain any set bits, produce an invalid index. The difference is that we choose to produce a negative invalid index, whereas the consensus is to produce a non-negative invalid index instead. However, one task that is likely to come up frequently when using findFirstSetBit is checking whether the index we were given was valid or not (essentially, whether the argument has any nonzero bytes in it). Under our definition, all that would be required is

let found = findFirstSetBit bs
  in if found < 0
     then weMissed
     else validIndex

This is a cheap operation in Plutus Core, requiring only comparing against a constant. However, if we used the more widely-used definition, we would instead have to do this:

let found = findFirstSetBit bs
    bitLen = 8 * sizeOfByteString bs
  in if found >= bitLen
     then weMissed
     else validIndex

This requires us to do considerably more work (finding the length of the argument, multiplying by 8, then compare against that result), and is also much more prone to error: users have to remember to use a >= comparison, as well as to multiply the argument length by 8. This is less of an issue with implementations of this operation in other languages, as their equivalent operations are designed for fixed-width arguments (indeed, FiniteBits requires this), which makes their bit length a constant. In our case, this isn't as simple, as BuiltinByteStrings have variable length, which would make the cost described above unavoidable. Our solution is both more efficient and less error-prone: all a user needs to remember is that invalid indexes from findFirstSetBit are negative. On this basis, we decided to vary from conventional approaches.

One notable omission from our operators is the equivalent of counting leading zeroes: namely, an operation that would find the last set bit. Typically, both a count of leading and trailing zeroes is provided in both hardware and software: this is the case for FiniteBits, as well as most hardware implementations. To relate this to our cases, specifically Case 1, this would allow us to efficiently find the largest element in an integer set. The reason we omit this operation is because, when compared with counting trailing zeroes, it is far less useful: while it can be used for computing fast integer square roots, it lacks many other uses. Counting trailing zeroes, on the other hand, is essential for rank-select dictionaries (specifically for the select operation), and also enables a range of other uses, which we have already mentioned. In order to limit the number of new primitives, we decided that counting leading zeroes can be omitted for now. However, our design doesn't preclude such an operation from being added later if a use case for it is found to be useful.

Path to Active

Acceptance Criteria

We consider the following criteria to be essential for acceptance:

  • A proof-of-concept implementation of the operations specified in this document, outside of the Plutus source tree. The implementation must be in GHC Haskell, without relying on the FFI.
  • The proof-of-concept implementation must have tests, demonstrating that it behaves as the specification requires.
  • The proof-of-concept implementation must demonstrate that it will successfully build, and pass its tests, using all GHC versions currently usable to build Plutus (8.10, 9.2 and 9.6 at the time of writing), across all Tier 1 platforms.

Ideally, the implementation should also demonstrate its performance characteristics by well-designed benchmarks.

Implementation Plan

MLabs has begun the implementation of the proof-of-concept as required in the acceptance criteria. Upon completion, we will send a pull request to Plutus with the implementation of the primitives for Plutus Core, mirroring the proof-of-concept.

This CIP is licensed under Apache-2.0.