Integer-ByteString conversions
Abstract
Plutus Core primitive operations to convert BuiltinInteger
to
BuiltinByteString
, and vice-versa. Furthermore, the BuiltinInteger
conversion allows different endianness for the encoding (most-significant-first
and most-significant-last), as well as padding with zeroes based on a requested
length if required.
Motivation: why is this CIP necessary?
Plutus Core creates a strong abstraction boundary between the concepts of
'number' (represented by BuiltinInteger
) and 'blob of bytes' (represented by
BuiltinByteString
), defining different sets of (largely non-overlapping)
operations for each. This is, in principle, a good practice, as these concepts
are distinct in (most of) the operations that make sense on them. However,
sometimes, being able to 'move between' these two 'worlds' is important: namely,
the ability to represent a given BuiltinInteger
as a BuiltinByteString
, as
well as to convert between this representation and the BuiltinInteger
it
represents. Currently, no such capability exists: while CIP-0058
proposed such a capability (among others), to date, this has not been
implemented into Plutus Core.
To see why such a capability would be beneficial, we give two motivating use cases.
Case 1: signing bids
Consider the following code snippet:
validBidTerms :: AuctionTerms -> CurrencySymbol -> BidTerms -> Bool
validBidTerms AuctionTerms {..} auctionID BidTerms {..}
| BidderInfo {..} <- bt'Bidder =
validBidderInfo bt'Bidder &&
-- The bidder pubkey hash corresponds to the bidder verification key.
verifyEd25519Signature at'SellerVK
(sellerSignatureMessage auctionID bi'BidderVK)
bt'SellerSignature &&
-- The seller authorized the bidder to participate
verifyEd25519Signature bi'BidderVK
(bidderSignatureMessage auctionID bt'BidPrice bi'bidderPKH)
bt'BidderSignature
-- The bidder authorized the bid
bidderSignatureMessage
:: CurrencySymbol
-> Integer
-> PubKeyHash
-> BuiltinByteString
bidderSignatureMessage auctionID bidPrice bidderPKH =
toByteString auctionID <>
toByteString bidPrice <>
toByteString bidderPKH
sellerSignatureMessage
:: CurrencySymbol
-> BuiltinByteString
-> BuiltinByteString
sellerSignatureMessage auctionID bidderVK =
toByteString auctionID <>
bidderVK
Here, we attempt to verify (using the Curve25519) that a bid at an auction was
signed by a particular bidder. The message to verify must include the bid
placed, represented using Integer
here (which translates to BuiltinInteger
onchain). However, the verifyEd25519Signature
primitive can only accept
BuiltinByteString
s as messages to verify. Thus, we have a problem: how to
include the placed bid into the bid message to be verified?
More generally, constructing messages to sign usually consists of
concatenating together some primitives represented as BuiltinByteString
s.
We currently have a way to do this for (some) strings using encodeUtf8
,
but no way to do this for BuiltinInteger
s.
Case 2: finite fields
Finite fields, also known as Galois fields, are a common
algebraic structure in cryptographic constructions. Many, if not most, common
constructions in cryptography use finite fields as their basis, including
Curve25519, Curve448 and the Pasta
curves, to name but a few. Elements in a finite field are
naturally representable as BuiltinInteger
s of bounded size onchain, but for
applications like the constructions specified above (and indeed, anything built
atop such constructions), we need to be able to perform the following tasks
efficiently:
- Verify that a particular value belongs to the field; and
- Perform bitwise (that is, non-numerical) operations on such values, possibly together with numerical ones.
Furthermore, Case 2 presents two further challenges: endianness and padding. Due to many cryptographic algorithms being designed for use over the network, their specifications assume a big-endian byte ordering in their implementations. Likewise, due to the finiteness of a finite field's elements, they can be encoded in a fixed-length form, which implementations make use of, both for convenience and efficiency.
While it is not outright impossible to perform conversions from BuiltinInteger
to BuiltinByteString
currently, it is unreasonably difficult and
resource-intensive: BuiltinInteger
to BuiltinByteString
involves a repeated
combination of division-with-remainder in a loop, while BuiltinByteString
to
BuiltinInteger
involves repeated multiplications by large constants and
accumulations. Aside from these both requiring looping (with the overheads this
imposes), both of these are effectively quadratic operations with current
primitives: the only means we have to accumulate BuiltinByteString
s is by
consing or appending (which are both quadratic due to BuiltinByteString
being
a counted array), and any BuiltinInteger
operation is linear in the size of
its arguments. This makes even Case 1 far more effort, both for the developer and
the node, than it should be, and Case 2 ranges from difficult to impossible once
we factor in the limited available primitive operations and the endianness and
padding problems.
We propose that two primitives be added to Plutus Core: one for converting
BuiltinInteger
s to BuiltinByteString
s, the other for converting
BuiltinByteString
s to BuiltinInteger
s. The first of these primitives would
allow for specifying an endianness for the result, as well as to perform padding
to a required length if necessary; the second primitive is able to operate on
padded or unpadded encodings, in either endianness.
Additionally, we state the following goals that any implementation of such primitives must have.
No metadata
The representation produced by the BuiltinInteger
to BuiltinByteString
conversion should be 'minimal', representing only the number being given to it,
and no other information besides. It would be tempting to, for example, encode
the endianness requested into the BuiltinByteString
, but ultimately, this
information could be added later by users if they want it, while removing it
would be trickier. Additionally, metadata-related concerns would complicate both
the specification and implementation of the primitives, for arguably marginal
benefit.
Internals-independence
Users of these primitives should not need to know how exactly
BuiltinInteger
s are represented to use them successfully. This is beneficial
to both users (as they now don't have to concern themselves with
platform-specific implementation issues) and Plutus Core maintainers (as changes
in the representation of BuiltinInteger
aren't going to affect these
primitives).
No support for negative numbers
While for fixed-size numbers, two's-complement is the default choice for negative number representations, for arbitrary-size numbers, there is no agreed-upon choice. Furthermore, indicating the 'negativity' of a number would require making representations larger or more complex regardless of which representation we chose, while also complicating both the primitives we want to define, and any user-defined operations on such representations, possibly in ways that users do not want. Lastly, for our cases, negative values are not really needed, and if the ability to encode negative numbers was necessary, users could still define whichever one(s) they needed themselves, with little effort or computational cost.
This CIP partially supercedes CIP-0058: specifically, the
specifications here replace the integerToByteString
and byteStringToInteger
primitives specified in CIP-0058, as improved, and more general, solutions.
Specification
We describe the specification of two Plutus Core primitives, which will have the following signatures:
builtinIntegerToByteString :: BuiltinBool -> BuiltinInteger -> BuiltinInteger -> BuiltinByteString
builtinByteStringToInteger :: BuiltinBool -> BuiltinByteString -> BuiltinInteger
To describe the semantics of these primitives, we first specify how we represent
a BuiltinInteger
as a BuiltinByteString
; after that, we describe the two
primitives, as well as giving some properties they must follow.
Representation
Our BuiltinByteString
representations of non-negative BuiltinInteger
s treat
the BuiltinInteger
being represented as a sequence of digits in base-256.
Thus, any byte in the BuiltinByteString
representation corresponds to a single
base-256 digit, whose digit value is equal to its value as an 8-bit unsigned
integer. For example, the byte 0x80
would have digit value 128, while the byte
0x03
would have digit value 3.
To determine place value, we define two possible arrangements of digits in such a representation: most-significant-first, and most-significant-last. In the most-significant-first representation, the first digit (that is, the byte at index 0) has the highest place value; in the most-significant-last representation, the first digit instead has the lowest place value. These correspond to the notions of big-endian and little-endian respectively.
For any positive BuiltinInteger
i
, let
be its base-256 form. Then, for the most-significant-first
representation, the BuiltinByteString
encoding for i
is the
BuiltinByteString
b
such that . For
the most-significant-last encoding, we instead have
.
For example, consider the number 123_456
. Its base-256 form is
64 * 256 ^ 0 + 226 * 256 ^ 1 + 1 * 256 ^ 2
Therefore, its most-significant-first representation would be
[ 0x01, 0xE2, 0x40 ]
while its most-significant-last representation would be
[ 0x40, 0xE2, 0x01 ]
For 0
, in line with the above definition, both its most-significant-first and
most-significant-last representation is []
(that is, the empty
BuiltinByteString
).
To represent any given non-negative BuiltinInteger
i
as above, we require
a minimum number of base-256 digits. For positive i
, this is
; for i = 0
, we define this to be
. We can choose to represent i
with more digits than this minimum, by the
use of padding. Let be the minimum number of digits to represent i
, and
let be a positive number: to represent i
using digits in the
most-significant-first encoding, we set the first bytes of the encoding as
0x0
; for the most-significant-last encoding, we set the last bytes of
the encoding as 0x0
instead.
To extend our previous example, a five-digit, most-significant-first
representation of 123_456
is
[ 0x00, 0x00, 0x01, 0xC2, 0x80 ]
while the most-significant-last representation would be
[ 0x80, 0xC2, 0x01, 0x00, 0x00 ]
We observe that these extra digits do not change what exact BuiltinInteger
is
represented, as any zero digit has zero place value.
builtinIntegerToByteString
We can now describe the semantics of the builtinIntegerToByteString
primitive. The builtinIntegerToByteString
function takes three arguments; we
specify (and name) them below:
- Whether the most-significant-first encoding should be used. This is the
endianness argument, which has type
BuiltinBool
. - The number of bytes required in the output (if such a requirement exists).
This is the length argument, which has type
BuiltinInteger
. - The
BuiltinInteger
to convert. This is the input.
If the input is negative, builtinIntegerToByteString
fails. In this case, the
resulting error message must specify at least the following information:
- That
builtinIntegerToByteString
failed due to a negative conversion attempt; and - What negative
BuiltinInteger
was passed as the input.
If the length argument is outside the closed interval ,
builtinIntegerToByteString
fails. In this case, the resulting error message
must specify at least the following information:
- That
builtinIntegerToByteString
failed due to an invalid length argument; and - What
BuiltinInteger
was passed as the length argument.
If the input is 0
, builtinIntegerToByteString
returns the
BuiltinByteString
consisting of a number of zero bytes equal to the length
argument.
If the input is positive, and the length argument is also positive, let d
be
the minimum number of digits required to represent the input (as per the
representation described above). If d
is greater than the length argument,
builtinIntegerToByteString
fails. In this case, the resulting error message
must specify at least the following information:
- That
builtinIntegerToByteString
failed due to the requested length being insufficient for the input; - What
BuiltinInteger
was passed as the length argument; and - What
BuiltinInteger
was passed as the input.
If d
is equal to, or greater, than the length argument,
builtinIntegerToByteString
returns the BuiltinByteString
encoding the input.
This will be the most-significant-first encoding if the endianness argument is
True
, or the most-significant-last encoding if the endianness argument is
False
. The resulting BuiltinByteString
will be padded to the length
specified by the padding argument if necessary.
If the input is positive, and the length argument is zero,
builtinIntegerToByteString
returns the BuiltinByteString
encoding the input.
Its length will be minimal (that is, no padding will be done). If the endianness
argument is True
, the result will use the most-significant-first encoding, and
if the endianness argument is False
, the result will use the
most-significant-last encoding.
We give some examples of the intended behaviour of builtinIntegerToByteString
below:
-- fails due to negative input
builtinIntegerToByteString False 0 (-1) -- => ERROR
-- endianness argument doesn't affect this case
builtinIntegerToByteString True 0 (-1) -- => ERROR
-- length argument doesn't affect this case
builtinIntegerToByteString False 100 (-1) -- => ERROR
-- zero case, no padding
builtinIntegerToByteString False 0 0 -- => []
-- endianness argument doesn't affect this case
builtinIntegerToByteString True 0 0 -- => []
-- length argument adds more zeroes, but endianness doesn't matter
builtinIntegerToByteString False 5 0 -- => [ 0x00, 0x00, 0x00, 0x00, 0x00 ]
builtinIntegerToByteString True 5 0 -- => [ 0x00, 0x00, 0x00, 0x00, 0x00 ]
-- length argument too large (2^29)
builtinIntegerToByteString False 536870912 0 -- => ERROR
-- endianness doesn't affect this case
builtinIntegerToByteString True 536870912 0 -- => ERROR
-- fails due to insufficient digits (404 needs 2)
builtinIntegerToByteString False 1 404 -- => ERROR
-- endianness argument doesn't affect this case
builtinIntegerToByteString True 1 404 -- => ERROR
-- zero length argument is exactly the same as requesting exactly the right
-- digit count
builtinIntegerToByteString False 2 404 -- => [ 0x94, 0x01 ]
builtinIntegerToByteString False 0 404 -- => [ 0x94, 0x01 ]
-- switching endianness argument reverses the result
builtinIntegerToByteString True 2 404 -- => [ 0x01, 0x94 ]
builtinIntegerToByteString True 0 404 -- => [ 0x01, 0x94 ]
-- padding for most-significant-last goes at the end
builtinIntegerToByteString False 5 404 -- => [ 0x94, 0x01, 0x00, 0x00, 0x00 ]
-- padding for most-significant-first goes at the start
builtinIntegerToByteString True 5 404 -- => [ 0x00, 0x00, 0x00, 0x01, 0x94 ]
We also describe properties that any implementation of builtinIntegerToByteString
must
have. Throughout, q
is not negative, p
is positive, d
is in the closed
interval , k
is in the closed interval ,
0 <= j < k
, and 1 <= r <= 255
. We also define singleton x = consByteString x emptyByteString
.
lengthOfByteString (builtinIntegerToByteString e d 0) = d
indexByteString (builtinIntegerToByteString e k 0) j = 0
lengthOfByteString (builtinIntegerToByteString e 0 p) > 0
builtinIntegerToByteString False 0 (multiplyInteger p 256) = consByteString 0 (builtinIntegerToByteString False 0 p)
builtinIntegerToByteString True 0 (multiplyInteger p 256) = appendByteString (builtinIntegerToByteString True 0 p) (singleton 0)
builtinIntegerToByteString False 0 (plusInteger (multiplyInteger q 256) r) = appendByteString (builtinIntegerToByteString False 0 r)
(builtinIntegerToByteString False 0 q)`builtinIntegerToByteString True 0 (plusInteger (multiplyInteger q 256) r) = appendByteString (builtinIntegerToByteString False 0 q) (builtinIntegerToByteString False 0 r)
builtinByteStringToInteger
The builtinByteStringToInteger
primitive takes two arguments. We specify, and
name, these below:
- Whether the input uses the most-significant-first encoding. This is the
stated endianness argument, which has type
BuiltinBool
. - The
BuiltinByteString
to convert. This is the input.
If the input is the empty BuiltinByteString
, builtinByteStringToInteger
returns 0. If the input is non-empty, builtinByteStringToInteger
produces the
BuiltinInteger
encoded by the input. The encoding is treated as
most-significant-first if the stated endianness argument is True
, and
most-significant-last if the stated endianness argument is False
. The input
encoding may be padded or not.
We give some examples of the intended behaviour of builtinByteStringToInteger
below:
-- empty input gives zero
builtinByteStringToInteger False emptyByteString => 0
-- stated endianness argument doesn't affect this case
builtinByteStringToInteger True emptyByteString => 0
-- if all the bytes are the same, stated endianness argument doesn't matter
builtinByteStringToInteger False (consByteString 0x01 (consByteString 0x01
emptyByteString) -- => 257
builtinByteStringToInteger True (consByteString 0x01 (consByteString 0x01
emptyByteString) -- => 257
-- most-significant-first padding is at the start
builtinByteStringToInteger True (consByteString 0x00 (consByteString 0x01
(consByteString 0x01 emptyByteString))) -- => 257
builtinByteStringToInteger False (consByteString 0x00 (consByteString 0x01
(consByteString 0x01 emptyByteString))) -- => 65792
-- most-significant-last padding is at the end
builtinByteStringToInteger False (consByteString 0x01 (consByteString 0x01
(consByteString 0x00 emptyByteString) -- => 257
builtinByteStringToInteger True (consByteString 0x01 (consByteString 0x01
(consByteString 0x00 emptyByteString) -- => 65792
We also describe properties that any builtinByteStringToInteger
implementation
must have. Throughout, q
is not negative and 0 <= w8 <= 255
.
builtinByteStringToInteger b (builtinIntegerToByteString b 0 q) = q
builtinByteStringToInteger b (consByteString w8 emptyByteString) = w8
builtinIntegerToByteString b (lengthOfByteString bs) (builtinByteStringToInteger b bs) = bs
Rationale: how does this CIP achieve its goals?
We believe that these operations address both of the described cases well, while
also meeting the goals stated at the start of this CIP. Our specified primitives
address both the problems of endianness and padding specified in Case 2, while
also ensuring that use cases like Case 1 (where bounding length isn't important)
are not made more difficult than necessary. The representation we have chosen is
metadata-free, doesn't depend on any representation choices (current or future)
of BuiltinInteger
, while also being flexible enough to satisfy both cases
where endianness and padding matter, and when they don't.
Alternative possibilities
As part of this proposal, we considered two alternative possibilities:
- Use the CIP-0058 versions of these operations; and
- Have a uniform treatment of the length argument for
builtinIntegerToByteString
(always minimum or always maximum).
CIP-0058 defines a sizeable collection of bitwise primitive
operations for Plutus Core, mostly for use over BuiltinByteString
s. As part of
these, it also defines conversion functions similar to
builtinIntegerToByteString
and builtinByteStringToInteger
, which are named
integerToByteString
and byteStringToInteger
respectively. Unlike the
operations specified in this CIP, the CIP-0058 operations do not address the
problems of either padding or endianness: more precisely, the representations
constructed are always minimally-sized, and use a big-endian encoding. While in
the context they are being presented in, these choices are defensible, they do
not adequately address Case 2, and in
particular, many cryptographic constructions used with finite fields. Users of
the CIP-0058 primitives who needed to ensure a minimum length of a converted
BuiltinInteger
would have to pad manually, which CIP-0058 gives no additional
support for; additionally, if a little-endian representation was required, the
BuiltinByteString
result would have to be reversed, which has quadratic cost
if using only Plutus Core primitives. Thus, we consider these implementations to
be a good attempt, but not suited to even their intended use, much less more
general applications.
An alternative possibility for the length argument of
builtinIntegerToByteString
would be to treat the argument as either a minimum,
or a maximum, rather than our more hybrid approach. Specifically, for any input
i
, let d
be the minimum number of digits required to represent i
as per
the description of our representation, and let k
be the length argument.
Then:
- The minimum length argument approach would produce a result of size ; that is, if the length argument is smaller than the minimum required digits, the minimum would be used instead.
- The maximum length argument approach would produce a result of size
k
, and would error if .
Both the minimum length argument, and the maximum length argument, approaches have merits. The maximum length argument approach in particular is useful for Case 2: in such a setting, we already know the maximum size of any element's representation, and if we somehow ended up with a larger representation than this, it would be a mistake, which the maximum length argument would catch immediately. For the minimum length argument approach, the advantage would be more for Case 1: where the length of the representation is not known (and the user isn't particularly concerned anyway). In such a situation, the user could pass any argument and know that the conversion would still work.
However, both of these approaches have disadvantages as well. The minimum
length argument approach would be more tedious to use with Case 2, as each
conversion would require a size check of the resulting
BuiltinByteString
. While this is not expensive, it is annoying, and given the
complexity of the constructions that would be built atop of any finite field
implementations, it feels unreasonable to require this from users. Likewise, the
maximum length argument approach is unreasonable for situations like Case 1: the only
way to establish how many digits would be required involves performing an
integer logarithm in base 256, which is inefficient and error-prone. Our hybrid
approach gives the benefits of both the minimum length argument and maximum
length argument approaches, without the downsides of either: we observe that,
for situations like Case 1, the length argument would be 0
in practically all
cases, which is a value that would not be useful in any situation where the maximum
length argument approach would be used. This observation allows our approach to work
equally well for both Case 1 and 2, with minimal friction.
Path to Active
Acceptance Criteria
We consider the following criteria to be essential for acceptance:
- A proof-of-concept implementation of the operation specified here must exist, outside of the Plutus source tree. The implementation must be in Haskell.
- The proof-of-concept implementation must have tests, demonstrating that it behaves as the specification requires, and that the representations it produces match the described representation in this CIP.
- The proof-of-concept implementation must demonstrate that it will successfully build, and pass its tests, using all GHC versions currently usable to build Plutus (8.10, 9.2 and 9.6 at the time of writing), across all Tier 1 platforms.
Ideally, the implementation should also demonstrate its performance characteristics by well-designed benchmarks.
Implementation Plan
MLabs have completed the implementation of the proof of concept as required (located here. This implementation has been merged into Plutus Core, and will be released in the upcoming V3 release.
Copyright
This CIP is licensed under the Apache-2.0 license.