Looking at encoding/base32 today, I saw a chance to eliminate bounds checks. My previous adventure in bounds check elimination didn't have much luck, but this time there are 8 checks per main-loop iteration that we can easily expunge:
// from base32.Encode
for len(src) > 0 {
	...
	for i := 0; i < 8; i++ {
		if len(dst) > i {
			dst[i] = enc.encode[b[i]] // <-- bounds check on enc.encode
		}
	}
	...
}
Only minor magic is required. We need to convince Go's Bounds Check Elimination (BCE) that we are always in bounds. In this case, that takes two tweaks:
- Switch enc.encode from string to [32]byte
- Bitwise AND the index with 31 to cap its maximum value below the array length

It's a three-line change:
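To see why the combination works, here is a minimal standalone sketch (the names are mine, not the stdlib's): indexing a fixed-size `[32]byte` array with a value masked by `&31` gives the compiler a provable upper bound on the index, so it can omit the run-time check.

```go
package main

import "fmt"

// alphabet mirrors the standard base32 alphabet in a fixed-size array,
// so its length (32) is known at compile time.
var alphabet [32]byte

func init() {
	copy(alphabet[:], "ABCDEFGHIJKLMNOPQRSTUVWXYZ234567")
}

// encodeByte maps a 5-bit value to its base32 character. The &31 mask
// guarantees the index is in [0, 31], i.e. always within len(alphabet),
// so no bounds check is needed.
func encodeByte(v byte) byte {
	return alphabet[v&31]
}

func main() {
	fmt.Printf("%c %c %c\n", encodeByte(0), encodeByte(26), encodeByte(31)) // A 2 7
}
```

With a plain `string` field the length isn't a compile-time constant, which is why the diff below switches the type first and masks second.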
 type Encoding struct {
-	encode    string
+	encode    [32]byte
 	decodeMap [256]byte
 	padChar   rune
 }
@@ -37,8 +37,12 @@ const encodeHex = "0123456789ABCDEFGHIJKLMNOPQRSTUV"
 // NewEncoding returns a new Encoding defined by the given alphabet,
 // which must be a 32-byte string.
 func NewEncoding(encoder string) *Encoding {
 	e := new(Encoding)
-	e.encode = encoder
+	copy(e.encode[:], encoder)
 	e.padChar = StdPadding
 	for i := 0; i < len(e.decodeMap); i++ {
@@ -132,7 +136,7 @@ func (enc *Encoding) Encode(dst, src []byte) {
 	// Encode 5-bit blocks using the base32 alphabet
 	for i := 0; i < 8; i++ {
 		if len(dst) > i {
-			dst[i] = enc.encode[b[i]]
+			dst[i] = enc.encode[b[i]&31]
 		}
 	}
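You don't have to take the compiler's word for it: building with `-gcflags='-d=ssa/check_bce/debug=1'` makes it report every bounds check it keeps. A small sketch of my own (not stdlib code) showing both cases:

```go
// Build with: go build -gcflags='-d=ssa/check_bce/debug=1' bce.go
// The compiler prints "Found IsInBounds" for each retained check.
package main

var table [32]byte

func init() {
	for i := range table {
		table[i] = byte(i)
	}
}

// checked: v is a byte (0-255), so indexing a 32-entry array needs a
// run-time check; the compiler reports a bounds check here.
func checked(v byte) byte { return table[v] }

// masked: v&31 is provably below 32, so the check on table is eliminated.
func masked(v byte) byte { return table[v&31] }

func main() {
	println(checked(5) == masked(5))
}
```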
Purging those conditionals from an inner loop gives us:
name old time/op new time/op delta
EncodeToString-4 41.9µs ± 1% 37.2µs ± 1% -11.05% (p=0.000 n=9+10)
DecodeString-4 103µs ± 2% 98µs ± 3% -4.29% (p=0.000 n=10+10)
name old speed new speed delta
EncodeToString-4 196MB/s ± 1% 220MB/s ± 1% +12.42% (p=0.000 n=9+10)
DecodeString-4 128MB/s ± 2% 134MB/s ± 3% +4.49% (p=0.000 n=10+10)
So a 1.1x increase in encoding throughput: not amazing, but not bad for such a small code tweak. If accepted, the pull request would become part of Go 1.10.