I believe I've identified a bug in the peer reconnection code. I have a node that has done nothing for the last month. A month ago I generated a few blocks and connected to it from a couple of other nodes, but then shut those other nodes down. Ever since, the main node has been attempting to reconnect to those 2 nodes, consuming more and more CPU each day until it is now using about 25% of the instance's CPU. I got an alert about low CPU credits this morning and found this to be the cause when I investigated.
When I run the node with `-debug` I see these lines repeated over and over again:
```
net: Trying to connect to 123.123.123.123:6700
net: trying connection 123.123.123.123:6700 lastseen=627.1hrs
net: connect() to 123.123.123.123:6700 failed after select(): Connection refused (111)
net: Connection not established
net: Trying to connect to 192.168.1.49:6700
net: trying connection 192.168.1.49:6700 lastseen=627.1hrs
net: connection to 192.168.1.49:6700 timeout
net: Connection not established
```
If I start the node with `-offline`, or if I delete the `peers.dat` file before starting, it does not exhibit this behavior and sits at 0% CPU, as I'd expect for a node that only has 50 blocks, is not mining, and has no actual peers.
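For now, the only workaround I've found is one of the following (a rough sketch; `chain1` is a placeholder for my chain name, and I'm assuming the default `~/.multichain` data directory):

```
# stop the daemon that's spinning on reconnects
multichain-cli chain1 stop

# workaround (a): throw away the stale peer database before restarting
rm ~/.multichain/chain1/peers.dat
multichaind chain1 -daemon

# workaround (b): start with networking disabled entirely
multichaind chain1 -offline -daemon
```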
This is with versions 2.0a5, 2.0a6, 2.0a7 and 2.0b1.
I also tested copying the `.multichain` directory to another instance running a newer version of Ubuntu (bionic instead of xenial) and to a non-VM computer, and I see the same behavior in both cases.
Here is a graph showing how the CPU usage has escalated over the last month.
How long will the node wait before it stops trying to connect to these dead peers? And why is there no way to view what is in the `peers.dat` file, or to clean out stale entries? I've searched around and haven't found any definitive answers about how this system works (or is supposed to work).
Thanks!
Nate