Specify options to better handle boltdb fragmentation

### Background

When profiling my lnd node on testnet, I noticed rather high heap usage, most of it due to boltdb:
```
Showing nodes accounting for 67.91MB, 98.54% of 68.92MB total
Showing top 10 nodes out of 64
      flat  flat%   sum%        cum   cum%
   24.33MB 35.30% 35.30%    24.33MB 35.30%  github.com/coreos/bbolt.(*freelist).reindex
   14.36MB 20.84% 56.14%    14.36MB 20.84%  github.com/coreos/bbolt.pgids.merge
    8.04MB 11.67% 67.80%     8.04MB 11.67%  github.com/lightningnetwork/lnd/pool.NewReadBuffer.func1
    7.20MB 10.45% 78.26%     7.20MB 10.45%  github.com/coreos/bbolt.(*DB).allocate
```
Digging deeper, I saw that `UpdateEdgePolicy` was using a lot of memory in the `db.Update` call:
```
         0    22.07MB (flat, cum) 32.02% of Total
         .          .   1836:func (c *ChannelGraph) UpdateEdgePolicy(edge *ChannelEdgePolicy) error {
         .          .   1837:	c.cacheMu.Lock()
         .          .   1838:	defer c.cacheMu.Unlock()
         .          .   1839:
         .          .   1840:	var isUpdate1 bool
         .    22.07MB   1841:	err := c.db.Update(func(tx *bbolt.Tx) error {
         .          .   1842:		var err error
         .          .   1843:		isUpdate1, err = updateEdgePolicy(tx, edge)
         .          .   1844:		return err
         .          .   1845:	})
         .          .   1846:	if err != nil {
```
Digging even deeper, we see that ~7MB is allocated on every `Update` call because a new freelist is committed to disk each time:
```
ROUTINE ======================== github.com/coreos/bbolt.(*DB).allocate in /Users/nsa/go/pkg/mod/github.com/coreos/bbolt@v1.3.2/db.go
    7.20MB     7.20MB (flat, cum) 10.45% of Total
         .          .    914:	// Allocate a temporary buffer for the page.
         .          .    915:	var buf []byte
         .          .    916:	if count == 1 {
         .          .    917:		buf = db.pagePool.Get().([]byte)
         .          .    918:	} else {
    7.20MB     7.20MB    919:		buf = make([]byte, count*db.pageSize)
         .          .    920:	}
         .          .    921:	p := (*page)(unsafe.Pointer(&buf[0]))
         .          .    922:	p.overflow = uint32(count - 1)
         .          .    923:
         .          .    924:	// Use pages from the freelist if they are available.
```
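
As a rough back-of-the-envelope check on the listing above, the size of that buffer hints at how many free pages the freelist holds. The sketch below assumes 8 bytes per serialized page ID and a 4096-byte page size; both are assumptions about the on-disk layout, not values read from the profile, and the small page header is ignored. The helper names are made up for illustration.

```go
package main

import "fmt"

// Assumed values for this estimate (not taken from the profile):
// each free page ID occupies 8 bytes, and pages are 4096 bytes.
const (
	pgidSize = 8
	pageSize = 4096
)

// freePagesForBuffer estimates how many free page IDs fit in a freelist
// buffer of bufBytes bytes, ignoring the page header.
func freePagesForBuffer(bufBytes int) int {
	return bufBytes / pgidSize
}

// pagesForFreelist estimates how many contiguous pages are needed to
// serialize n free page IDs, rounding up to whole pages.
func pagesForFreelist(n int) int {
	return (n*pgidSize + pageSize - 1) / pageSize
}

func main() {
	// 7.20MB as reported by pprof, treated here as 7.20 * 1024 * 1024 bytes.
	buf := 720 * 1024 * 1024 / 100

	n := freePagesForBuffer(buf)
	fmt.Println("estimated free page IDs:", n)
	fmt.Println("pages needed to serialize them:", pagesForFreelist(n))
}
```

Under these assumptions the freelist holds on the order of 900k free page IDs, which is why rewriting it on every commit is expensive.
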
By disabling the committing of the freelist to disk and switching the underlying freelist to the hashmap-based implementation, memory usage drops: `merge` calls are cheaper with a hashmap, and 7MB is no longer allocated on every `Update` call. With freelist syncing disabled, we still have to reindex the freelist when opening the db, but for my 7MB freelist this time was negligible.
```
Showing nodes accounting for 41976.83kB, 98.79% of 42488.85kB total
Showing top 10 nodes out of 85
      flat  flat%   sum%        cum   cum%
24912.89kB 58.63% 58.63% 24912.89kB 58.63%  github.com/coreos/bbolt.(*freelist).reindex
 7168.88kB 16.87% 75.51%  7168.88kB 16.87%  github.com/lightningnetwork/lnd/htlcswitch.(*circuitMap).decodeCircuit
 2337.56kB  5.50% 81.01%  2337.56kB  5.50%  github.com/lightningnetwork/lnd/channeldb.newRejectCache
 2337.56kB  5.50% 86.51%  9506.44kB 22.37%  github.com/lightningnetwork/lnd/htlcswitch.(*circuitMap).restoreMemState.func1.1
 1646.53kB  3.88% 90.38%  1646.53kB  3.88%  github.com/lightningnetwork/lnd/pool.NewReadBuffer.func1
 1089.33kB  2.56% 92.95%  1089.33kB  2.56%  github.com/lightningnetwork/lnd/pool.NewWriteBuffer.func1
  902.59kB  2.12% 95.07%   902.59kB  2.12%  github.com/lightningnetwork/lnd/routing/chainview.(*BtcdFilteredChainView).chainFilterer
  557.26kB  1.31% 96.38%   557.26kB  1.31%  crypto/elliptic.initTable
  512.16kB  1.21% 97.59%   512.16kB  1.21%  crypto/aes.(*aesCipherGCM).NewGCM
  512.07kB  1.21% 98.79%   512.07kB  1.21%  github.com/lightningnetwork/lnd/channeldb.deserializeChanEdgePolicy
```

As a result, `UpdateEdgePolicy` calls go from 7MB to 513KB:
```
ROUTINE ======================== github.com/lightningnetwork/lnd/channeldb.(*ChannelGraph).UpdateEdgePolicy in /Users/nsa/go/src/github.com/lightningnetwork/lnd/channeldb/graph.go
         0   513.12kB (flat, cum) 0.064% of Total
         .          .   1836:func (c *ChannelGraph) UpdateEdgePolicy(edge *ChannelEdgePolicy) error {
         .          .   1837:	c.cacheMu.Lock()
         .          .   1838:	defer c.cacheMu.Unlock()
         .          .   1839:
         .          .   1840:	var isUpdate1 bool
         .   513.12kB   1841:	err := c.db.Update(func(tx *bbolt.Tx) error { // 21.55MB of in-use space?
         .          .   1842:		var err error
         .          .   1843:		isUpdate1, err = updateEdgePolicy(tx, edge)
         .          .   1844:		return err
         .          .   1845:	})
         .          .   1846:	if err != nil {
```

The options can be specified in `channeldb/db.go` when opening the db like so:
```
	// Skip writing the freelist to disk on every commit and use the
	// hashmap-based freelist.
	options := &bbolt.Options{
		NoFreelistSync: true,
		FreelistType:   "hashmap",
	}
	bdb, err := bbolt.Open(path, dbFilePermission, options)
	if err != nil {
		return nil, err
	}
```

### Your environment

* version of `lnd`: master
* which operating system (`uname -a` on *Nix): macOS
* version of `boltdb`: 1.3.2

### Steps to reproduce

Have a fragmented `channel.db`. You can check whether this is the case by running `bolt pages channel.db` and counting the number of free pages (it should be very high). This happens when boltdb frees up a lot of pages by deleting something. I did run `bolt compact` on my `channel.db`, and it got rid of *most* of the pages on my freelist, but:
1) a user shouldn't have to continually run `bolt compact` to reduce heap usage
2) it doesn't always work, see: https://github.com/boltdb/bolt/issues/640

This could easily be a boltdb flag in lnd to minimize heap usage.
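
A minimal sketch of what such a flag could look like, using only the standard `flag` package so it compiles on its own. The flag names `bolt.nofreelistsync` and `bolt.freelisttype` are hypothetical, `boltOptions` is a local stand-in for the two `bbolt.Options` fields discussed above, and lnd's actual config parsing may look quite different.

```go
package main

import (
	"flag"
	"fmt"
)

// boltOptions mirrors only the two bbolt.Options fields discussed in
// this issue. It is a stand-in so the sketch does not import bbolt.
type boltOptions struct {
	NoFreelistSync bool
	FreelistType   string
}

// parseBoltOptions parses the hypothetical flags from args and rejects
// freelist types other than "array" and "hashmap".
func parseBoltOptions(args []string) (boltOptions, error) {
	fs := flag.NewFlagSet("lnd", flag.ContinueOnError)
	noSync := fs.Bool("bolt.nofreelistsync", false,
		"skip writing the freelist to disk on commit")
	flType := fs.String("bolt.freelisttype", "array",
		"freelist backend: array or hashmap")

	if err := fs.Parse(args); err != nil {
		return boltOptions{}, err
	}
	if *flType != "array" && *flType != "hashmap" {
		return boltOptions{}, fmt.Errorf("unknown freelist type %q", *flType)
	}
	return boltOptions{NoFreelistSync: *noSync, FreelistType: *flType}, nil
}

func main() {
	opts, err := parseBoltOptions([]string{
		"-bolt.nofreelistsync", "-bolt.freelisttype=hashmap",
	})
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", opts)
}
```

The parsed values would then be copied into the `bbolt.Options` passed to `bbolt.Open`.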

### Related:
https://github.com/etcd-io/bbolt/pull/141


