Description
Background
When profiling my lnd node on testnet, I noticed rather high heap memory usage, most of it due to boltdb:
Showing nodes accounting for 67.91MB, 98.54% of 68.92MB total
Showing top 10 nodes out of 64
flat flat% sum% cum cum%
24.33MB 35.30% 35.30% 24.33MB 35.30% github.com/coreos/bbolt.(*freelist).reindex
14.36MB 20.84% 56.14% 14.36MB 20.84% github.com/coreos/bbolt.pgids.merge
8.04MB 11.67% 67.80% 8.04MB 11.67% github.com/lightningnetwork/lnd/pool.NewReadBuffer.func1
7.20MB 10.45% 78.26% 7.20MB 10.45% github.com/coreos/bbolt.(*DB).allocate
Digging deeper, I saw that UpdateEdgePolicy was using a lot of memory in the db.Update function:
0 22.07MB (flat, cum) 32.02% of Total
. . 1836:func (c *ChannelGraph) UpdateEdgePolicy(edge *ChannelEdgePolicy) error {
. . 1837: c.cacheMu.Lock()
. . 1838: defer c.cacheMu.Unlock()
. . 1839:
. . 1840: var isUpdate1 bool
. 22.07MB 1841: err := c.db.Update(func(tx *bbolt.Tx) error {
. . 1842: var err error
. . 1843: isUpdate1, err = updateEdgePolicy(tx, edge)
. . 1844: return err
. . 1845: })
. . 1846: if err != nil {
Digging even deeper, we see that ~7MB is being allocated for every Update call because it is committing a new freelist to disk every time:
ROUTINE ======================== github.com/coreos/bbolt.(*DB).allocate in /Users/nsa/go/pkg/mod/github.com/coreos/[email protected]/db.go
7.20MB 7.20MB (flat, cum) 10.45% of Total
. . 914: // Allocate a temporary buffer for the page.
. . 915: var buf []byte
. . 916: if count == 1 {
. . 917: buf = db.pagePool.Get().([]byte)
. . 918: } else {
7.20MB 7.20MB 919: buf = make([]byte, count*db.pageSize)
. . 920: }
. . 921: p := (*page)(unsafe.Pointer(&buf[0]))
. . 922: p.overflow = uint32(count - 1)
. . 923:
. . 924: // Use pages from the freelist if they are available.
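To make the size of that allocation concrete, here is a rough back-of-the-envelope sketch of how a large freelist turns into a multi-megabyte buffer on every commit. The constants and the rounding rule are assumptions about bbolt's on-disk freelist layout (a 16-byte page header, 8 bytes per page id, one extra slot once the count reaches 0xFFFF, and size/pageSize + 1 pages requested), and the free-page count is hypothetical rather than measured on my node.

```go
package main

import "fmt"

const (
	pageHeaderSize = 16 // bytes; assumed size of bbolt's page header
	pgidSize       = 8  // bytes; a page id is assumed to be a uint64
)

// freelistBytes estimates the serialized size of a freelist holding n page ids.
// Assumption: once n >= 0xFFFF the count no longer fits in the 16-bit header
// field, so one extra 8-byte slot is used to store it.
func freelistBytes(n int) int {
	if n >= 0xFFFF {
		n++
	}
	return pageHeaderSize + pgidSize*n
}

// bufferBytes estimates the temporary buffer allocated to write the freelist,
// assuming size/pageSize + 1 whole pages are requested per commit.
func bufferBytes(n, pageSize int) int {
	pages := freelistBytes(n)/pageSize + 1
	return pages * pageSize
}

func main() {
	const freePages = 900000 // hypothetical free-page count
	fmt.Println("freelist bytes:", freelistBytes(freePages))
	fmt.Println("buffer bytes:  ", bufferBytes(freePages, 4096))
}
```

Under these assumptions, a freelist on the order of 900,000 free pages needs a buffer of about 7MB, and that buffer is allocated again on every write transaction that syncs the freelist.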
By disabling the syncing of the freelist to disk and switching the underlying freelist to the hashmap-based implementation, we see the memory usage drop: merge calls are cheaper with a hashmap, and 7MB is no longer allocated on every Update call. With freelist syncing disabled, we still have to reindex the freelist upon opening the db, but for my 7MB freelist this time was negligible.
Showing nodes accounting for 41976.83kB, 98.79% of 42488.85kB total
Showing top 10 nodes out of 85
flat flat% sum% cum cum%
24912.89kB 58.63% 58.63% 24912.89kB 58.63% github.com/coreos/bbolt.(*freelist).reindex
7168.88kB 16.87% 75.51% 7168.88kB 16.87% github.com/lightningnetwork/lnd/htlcswitch.(*circuitMap).decodeCircuit
2337.56kB 5.50% 81.01% 2337.56kB 5.50% github.com/lightningnetwork/lnd/channeldb.newRejectCache
2337.56kB 5.50% 86.51% 9506.44kB 22.37% github.com/lightningnetwork/lnd/htlcswitch.(*circuitMap).restoreMemState.func1.1
1646.53kB 3.88% 90.38% 1646.53kB 3.88% github.com/lightningnetwork/lnd/pool.NewReadBuffer.func1
1089.33kB 2.56% 92.95% 1089.33kB 2.56% github.com/lightningnetwork/lnd/pool.NewWriteBuffer.func1
902.59kB 2.12% 95.07% 902.59kB 2.12% github.com/lightningnetwork/lnd/routing/chainview.(*BtcdFilteredChainView).chainFilterer
557.26kB 1.31% 96.38% 557.26kB 1.31% crypto/elliptic.initTable
512.16kB 1.21% 97.59% 512.16kB 1.21% crypto/aes.(*aesCipherGCM).NewGCM
512.07kB 1.21% 98.79% 512.07kB 1.21% github.com/lightningnetwork/lnd/channeldb.deserializeChanEdgePolicy
As a result, UpdateEdgePolicy calls go from 7MB to 513KB:
ROUTINE ======================== github.com/lightningnetwork/lnd/channeldb.(*ChannelGraph).UpdateEdgePolicy in /Users/nsa/go/src/github.com/lightningnetwork/lnd/channeldb/graph.go
0 513.12kB (flat, cum) 0.064% of Total
. . 1836:func (c *ChannelGraph) UpdateEdgePolicy(edge *ChannelEdgePolicy) error {
. . 1837: c.cacheMu.Lock()
. . 1838: defer c.cacheMu.Unlock()
. . 1839:
. . 1840: var isUpdate1 bool
. 513.12kB 1841: err := c.db.Update(func(tx *bbolt.Tx) error { // 21.55MB of in-use space?
. . 1842: var err error
. . 1843: isUpdate1, err = updateEdgePolicy(tx, edge)
. . 1844: return err
. . 1845: })
. . 1846: if err != nil {
The options can be specified in channeldb/db.go when opening the db like so:
options := &bbolt.Options{
	// Skip writing the freelist to disk on every commit.
	NoFreelistSync: true,
	// Use the hashmap-backed freelist.
	FreelistType: "hashmap",
}
bdb, err := bbolt.Open(path, dbFilePermission, options)
if err != nil {
	return nil, err
}
Your environment
- version of lnd: master
- which operating system (uname -a on *Nix): Mac
- version of boltdb: 1.3.2
Steps to reproduce
Have a fragmented channel.db. You can check if this is the case by running:
bolt pages channel.db
and counting the number of free pages (it should be very high). This happens when boltdb frees up a lot of pages by deleting something. I did run bolt compact on my channel.db and it did get rid of most of the pages on my freelist, but:
- a user shouldn't have to continually run bolt compact to have perf improvements on the heap
- it doesn't always work, see: Performance problems with large freelists boltdb/bolt#640
This could easily be a boltdb flag in lnd to minimize heap usage.
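As a rough illustration of what such a flag could look like, the sketch below parses two options and validates the freelist type. Everything here is hypothetical: the flag names, the dbOptions struct (a stand-in for the two bbolt.Options fields shown earlier), and the use of the standard flag package instead of lnd's own config parsing. The accepted values "array" and "hashmap" are meant to mirror bbolt's two freelist types.

```go
package main

import (
	"flag"
	"fmt"
	"os"
)

// dbOptions is a hypothetical stand-in for the two bbolt.Options fields
// discussed above; it is not lnd's or bbolt's actual type.
type dbOptions struct {
	NoFreelistSync bool
	FreelistType   string
}

// parseDBOptions reads the two hypothetical flags from args and rejects
// freelist types other than "array" and "hashmap".
func parseDBOptions(args []string) (dbOptions, error) {
	var opts dbOptions
	fs := flag.NewFlagSet("db", flag.ContinueOnError)
	fs.BoolVar(&opts.NoFreelistSync, "db.no-freelist-sync", false,
		"do not write the freelist to disk on commit")
	fs.StringVar(&opts.FreelistType, "db.freelist-type", "array",
		"freelist backend: array or hashmap")
	if err := fs.Parse(args); err != nil {
		return dbOptions{}, err
	}
	if opts.FreelistType != "array" && opts.FreelistType != "hashmap" {
		return dbOptions{}, fmt.Errorf("unknown freelist type %q", opts.FreelistType)
	}
	return opts, nil
}

func main() {
	opts, err := parseDBOptions(os.Args[1:])
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(2)
	}
	fmt.Printf("%+v\n", opts)
}
```

The parsed values would then be copied into the options passed to bbolt.Open. Defaulting to the current behaviour keeps the change opt-in for users who don't have a fragmented db.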