# Wallet Checksums

## Abstract

We introduce a checksum algorithm to help users verify they are restoring the right wallet before the restoration actually takes place.

## Motivation: why is this CIP necessary?

Users occasionally enter the wrong mnemonic for their wallet. In this case, they simply see a 0 ADA wallet after syncing is over. This not only wastes the user's time, in the worst case it makes them think they either lost all their ADA or think there is a bug in the wallet implementation.

To solve this, we introduce a checksum that can be computed without having to perform wallet restoration.

## Specification

First, it's important to note that the method for generating a checksum is heavily dependent on the type of wallet (ex: BIP44, etc.). We outline an algorithm that works with most, but not all, types of wallet.

### Requirements for checksum

- Easily recomputed without access to mnemonic, private key or other similarly sensitive data
- Does not reveal anything about the wallet (irreversible -- cannot tell addresses, private key, etc. from just seeing the checksum)
- Negligible chance of collision
- Easy to memorize for the user
- Can be easily saved both digitally or on paper

### Implementation Outline

To satisfy (1), the checksum SHOULD be seeded from the public key for the wallet. Notably, in the BIP44 case, it should come from the bip44 account derivation level's public key.
**Note**: For HD wallets, the public key used SHOULD contain the chaincode also because we need to make sure that not just the public key, but all its child keys also, are properly generated.

To satisfy (2) and (3), the a hash of the public key is used

To satisfy (4) and (5), we generate for an *ImagePart* and a *TextPart*. The brain can roughly remember images allowing you to quickly dismiss checksums that look totally different. However, since images can sometimes be similar, a *TextPart* is also provided for double-checking. Additionally, if the user does not have access to a printer, the text part can be easily written down by hand on a piece of paper to satisfy (5).

## Rationale: how does this CIP achieve its goals?

We first provide a template for the code, explain the template and then provide the parameterization we use for Cardano

```
function calculateChecksum(publicKeyHash: string /* note: lowercase hex representation */) {
const hash = hash1(publicKeyHash);
const [a, b, c, d] = hash_to_32(hash); // get a 4 byte value from the hash
const alpha = `ABCDEJHKLNOPSTXZ`; // take 16 letters from the alphabet that are easy to distinguish
// construct the TextPart from a group of letters and a group of numbers
const letters = x => `${alpha[Math.floor(x / 16)]}${alpha[x % 16]}`;
const numbers = `${((c << 8) + d) % 10000}`.padStart(4, '0');
const id = `${letters(a)}${letters(b)}-${numbers}`;
return {
hash, // used to generate the ImagePart
id, // used as the TextPart
};
}
```

### TextPart rationale

For ease of perception it seems that short alphanumeric sequences are the best for humans to remember, especially when letters and numbers are separated and not mixed together.

#### Letter part

For letters, we render the bytes in hex, but replace the alphanumeric used in hex with this letter-only alphabet:

`A B C D E J H K L N O P S T X Z`

This alphabet satisfies the following requirements:

- Has exactly 16 letters (one-to-one mapping with 2 bytes in HEX)
- Does not contain characters that look too much like each other
- Minimizes occurrences of undesirable words in this list.

#### Number part

The last two bytes are compressed to a 4-digit number. For this we will simply take the last 4 digits of the 16-bit integer number constructed from 2 bytes as `((A << 8) + B) % 10000`

(zero-padded).

This above produces 10000 unique values across all possible values of A and B and giving maximum of 7 potential collisions per value and 6.5 average collisions per value, which is the minimum, given the fact that we reduce maximum potential number 65536 to 4 digits.
**Note**: resulting number is zero-padded to 4 digits.

### ImagePart rationale

For the image, we take the result of `hash1`

and use it as the seed for the blockies library.

This library in particular has the following benefits:

- Has been audited
- Used by other blockchains and therefore has common libraries for implementation

**Note**: This library internally re-hashes its input to a 128-bit entropy string

### Hash algorithms used in Byron + ITN

For `hash1`

, we use `blake2b512`

. Blake2b is a standardized hash function that is used in Cardano for other purposes like key derivations. Reusing blake2b means one less dependency. We use `512`

bytes of output to try and future-proof this algorithm (better to spread the entropy across more bits than needed than end up not capturing the full entropy in the future).

For `hash_to_32`

we use CRC32. We hash a second time for the following:

- The
*TextPart*is constructed from 4 bytes (2 for letters, 2 for numbers) and so we need to project the result of`hash1`

down to 4 bytes. - We don't want to simply take the last 4 bytes of
`hash1`

because that would reveal part of the input used to generate the*ImagePart*. Although strictly speaking this should not be of a concern (since the result of`hash1`

doesn't reveal any information about the original key), we take this as a precaution. `CRC32`

is used in the Byron implementation of Cardano as a checksum for addresses, meaning no additional dependency has to be added.

Although there is no specification for CRC32 and many variations exist, in Cardano we use the CRC-32-IEEE variation. You can find a C implementation here

### Hash algorithms used in Shelley mainnet

- For
`hash1`

, we still use`blake2b512`

but we now set the blake2b`personalization`

to the the utf-8 byte equivalent of`wallets checksum`

(exactly 16 utf-8 bytes in length) to avoid collision with any other standard that decides to hash a public key. - For
`hash_to_32`

, we no longer use`crc32`

for the following reasons:

- It has multiple competing implementations all called
`crc32`

(easily to get the wrong implementation library) - It requires building a lookup table, making it slower than other hashing algorithms for similar safety
- Cardano no longer uses
`crc32`

in the Shelley mainnet as addresses now use BIP173 - bech32 which has its own checksum algorithm.

Instead, we replace it with FNV-1a in 32-bit mode. FNV-1a is fast, easy to implement inline and gives good empirical distribution.

### When no public key is present

Note that a different construction is needed for wallet types which do not have a public key (such as a balance tracking application which simply manages a set of addresses). In the balanace tracking case, simply hashing the set of addresses used is possible, but it means that adding & removing an address would change the checksum (possibly unintuitive). Since the checksum is meant to represent the wallet itself, we also cannot run a checksum on the name of the wallet or any other user-inputted data.

## Path to Active

### Acceptance Criteria

- There exists a reference implementation with test vectors.
- Checksums are adopted by two or more wallets.
- Yoroi

### Implementation Plan

- Reference implementations:

## Copyright

This CIP is licensed under Apache-2.0