Hi
On Mon, 23 Aug 2021 at 09:49, <Simon.Westphahl@bmw.de> wrote:
Hi,

On 21 August 2021 21:04, Vitaliy Lotorev wrote:
> After two days, node_request 300-0000000094 is still hanging among the active requests (node 0000000113 is also shown in the web UI in the building state).
> To clean this up, I have to restart nodepool once a week.

A static node in the 'building' state will become ready when a node of that type becomes available. So it looks like there are no usable nodes of that type that can be assigned.

Is this only a problem for a specific label (centos7?) or a more general issue? It might be that this static node is never registered because it is unreachable or host key checking failed.
 
 It applies to any label.

Do you see any exceptions (e.g. StaticNodeError) or log messages like "Failed to connect to node"?
Indeed, I see paramiko.transport exceptions "Incompatible ssh peer (no acceptable host key)", but after each series of exceptions there is always a success message saying the node is registered (all expected nodes do get registered):
2021-08-23 19:54:23,791 DEBUG nodepool.CleanupWorker: Cleaning up held nodes...
2021-08-23 19:54:23,792 DEBUG nodepool.CleanupWorker: Cleaning up empty nodes...
2021-08-23 19:54:32,583 DEBUG nodepool.PoolWorker.static-vms-main: Active requests: ['300-0000000209', '300-0000000210', '300-0000000211', '300-0000000212']
2021-08-23 19:54:39,317 DEBUG nodepool.DeletedNodeWorker: Marking for deletion unlocked node 0000000291 (state: used, allocated_to: 300-0000000225)
2021-08-23 19:54:39,317 INFO nodepool.DeletedNodeWorker: Deleting used instance ewolly-pc from static-vms
2021-08-23 19:54:39,345 INFO nodepool.NodeDeleter: Deleting ZK node id=0000000291, state=deleting, external_id=ewolly-pc
2021-08-23 19:54:39,434 DEBUG nodepool.utils: Added ssh host key: ssh-ed25519
2021-08-23 19:54:39,451 DEBUG nodepool.utils: Added ssh host key: ecdsa-sha2-nistp256
2021-08-23 19:54:39,460 ERROR paramiko.transport: Exception: Incompatible ssh peer (no acceptable host key)
2021-08-23 19:54:39,460 ERROR paramiko.transport: Traceback (most recent call last):
2021-08-23 19:54:39,460 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2083, in run
2021-08-23 19:54:39,460 ERROR paramiko.transport:     self._handler_table[ptype](self, m)
2021-08-23 19:54:39,461 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2198, in _negotiate_keys
2021-08-23 19:54:39,461 ERROR paramiko.transport:     self._parse_kex_init(m)
2021-08-23 19:54:39,461 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2378, in _parse_kex_init
2021-08-23 19:54:39,461 ERROR paramiko.transport:     "Incompatible ssh peer (no acceptable host key)"
2021-08-23 19:54:39,461 ERROR paramiko.transport: paramiko.ssh_exception.SSHException: Incompatible ssh peer (no acceptable host key)
2021-08-23 19:54:39,461 ERROR paramiko.transport:
2021-08-23 19:54:39,470 ERROR paramiko.transport: Exception: Incompatible ssh peer (no acceptable host key)
2021-08-23 19:54:39,470 ERROR paramiko.transport: Traceback (most recent call last):
2021-08-23 19:54:39,471 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2083, in run
2021-08-23 19:54:39,471 ERROR paramiko.transport:     self._handler_table[ptype](self, m)
2021-08-23 19:54:39,471 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2198, in _negotiate_keys
2021-08-23 19:54:39,471 ERROR paramiko.transport:     self._parse_kex_init(m)
2021-08-23 19:54:39,471 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2378, in _parse_kex_init
2021-08-23 19:54:39,471 ERROR paramiko.transport:     "Incompatible ssh peer (no acceptable host key)"
2021-08-23 19:54:39,471 ERROR paramiko.transport: paramiko.ssh_exception.SSHException: Incompatible ssh peer (no acceptable host key)
2021-08-23 19:54:39,472 ERROR paramiko.transport:
2021-08-23 19:54:39,488 DEBUG nodepool.utils: Added ssh host key: ssh-rsa
2021-08-23 19:54:39,497 ERROR paramiko.transport: Exception: Incompatible ssh peer (no acceptable host key)
2021-08-23 19:54:39,497 ERROR paramiko.transport: Traceback (most recent call last):
2021-08-23 19:54:39,497 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2083, in run
2021-08-23 19:54:39,498 ERROR paramiko.transport:     self._handler_table[ptype](self, m)
2021-08-23 19:54:39,498 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2198, in _negotiate_keys
2021-08-23 19:54:39,498 ERROR paramiko.transport:     self._parse_kex_init(m)
2021-08-23 19:54:39,498 ERROR paramiko.transport:   File "/usr/local/lib/python3.7/site-packages/paramiko/transport.py", line 2378, in _parse_kex_init
2021-08-23 19:54:39,498 ERROR paramiko.transport:     "Incompatible ssh peer (no acceptable host key)"
2021-08-23 19:54:39,498 ERROR paramiko.transport: paramiko.ssh_exception.SSHException: Incompatible ssh peer (no acceptable host key)
2021-08-23 19:54:39,498 ERROR paramiko.transport:
2021-08-23 19:54:39,528 DEBUG nodepool.driver.static.StaticNodeProvider: Registered static node Node(hostname='ewolly-pc', username='zuulbot1', port=2222)
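The "Incompatible ssh peer (no acceptable host key)" error is what paramiko raises when client and server share no host-key algorithm during key exchange. Under the SSH negotiation rule (RFC 4253, section 7.1), the client picks the first algorithm on its own list that the server also supports. The "Added ssh host key" lines interleaved with the errors suggest, though this is an assumption and not confirmed here, that nodepool's key scan opens one connection per key type and restricts each to that single type, so probes for types the server does not offer would fail like this without blocking registration. A minimal stdlib model of that rule; the function names, probe list and server offer are hypothetical:

```python
def negotiate_host_key(client_prefs, server_offers):
    """Pick a host-key algorithm as RFC 4253 section 7.1 describes:
    the first entry on the client's list that the server also offers."""
    offered = set(server_offers)
    for alg in client_prefs:
        if alg in offered:
            return alg
    # No overlap: paramiko reports this case as
    # "Incompatible ssh peer (no acceptable host key)".
    raise ValueError("Incompatible ssh peer (no acceptable host key)")


def probe_each_type(key_types, server_offers):
    """Hypothetical scan: one negotiation per key type, restricting the
    client to that single type each time. Returns (found, failed)."""
    found, failed = [], []
    for key_type in key_types:
        try:
            found.append(negotiate_host_key([key_type], server_offers))
        except ValueError:
            failed.append(key_type)
    return found, failed


if __name__ == "__main__":
    # Assumed server offer; a real server's list depends on its sshd config.
    server = ["ssh-ed25519", "ecdsa-sha2-nistp256", "ssh-rsa"]
    probes = ["ssh-dss", "ssh-rsa", "ecdsa-sha2-nistp256", "ssh-ed25519"]
    found, failed = probe_each_type(probes, server)
    print("found:", found)    # found: ['ssh-rsa', 'ecdsa-sha2-nistp256', 'ssh-ed25519']
    print("failed:", failed)  # failed: ['ssh-dss']
```

In this model a failed probe only means that one key type was unavailable; every offered type is still collected, which would be consistent with the node being registered right after the errors.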


I've uploaded more logs, including successful connections, to https://pastebin.com/DZ52k6Th
I don't quite understand what causes these exceptions. I generate host keys with 'ssh-keyscan -p 2222 -t ed25519 <host>' and specify them in nodepool.yaml.
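`ssh-keyscan` prints each key as `<host> <type> <base64>`, and for a non-default port the host field looks like `[host]:2222`. Its `-t` option also accepts a comma-separated list (e.g. `-t ed25519,ecdsa,rsa`) if more than one type is wanted. Assuming nodepool.yaml should hold only the `<type> <base64>` part of each line (an assumption; the static driver documentation for `host-key` is the authority), a small hypothetical helper can strip the host field and any banner comments:

```python
def host_keys_from_keyscan(output):
    """Hypothetical helper: turn ssh-keyscan output into "<type> <base64>"
    strings, dropping the leading host field and any comment lines."""
    keys = []
    for line in output.splitlines():
        line = line.strip()
        # ssh-keyscan writes banner comments ("# host:port SSH-2.0-...")
        # to stderr; skip them in case stderr was merged into the input.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) < 3:
            continue  # not a key line
        _host, key_type, key_b64 = fields[:3]
        keys.append(f"{key_type} {key_b64}")
    return keys


if __name__ == "__main__":
    # Sample input with a made-up key blob, shaped like the output of
    # `ssh-keyscan -p 2222 -t ed25519 <host> 2>&1`.
    sample = (
        "# ewolly-pc:2222 SSH-2.0-OpenSSH_7.4\n"
        "[ewolly-pc]:2222 ssh-ed25519 AAAAC3NzaC1lZDI1NTE5AAAAIFAKEKEY\n"
    )
    print(host_keys_from_keyscan(sample))
```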

> Actually, this issue has been bothering me for more than a year (so it's not something that appeared in the latest nodepool versions).
> I hoped the recent commit [1] (included in nodepool 4.2.2) might fix my issue, but it has not changed anything.

We are also using the static provider quite extensively and don't observe any issues with nodes stuck in 'building' state.
To debug this we probably need some additional info, like the output of `nodepool list` and maybe the startup logs.
Here is the output for the centos7 label (all the nodes I expect to exist are in the 'ready' state, and some others have been in the 'building' state for hours):
$ docker-compose exec launcher nodepool list | grep -E 'ID|centos7'
| ID         | Provider   | Label    | Server ID                   | Public IPv4   | IPv6 | State    | Age         | Locked   |
| 0000000276 | static-vms | centos7  | None                        | None          | None | building | 00:05:36:50 | locked   |
| 0000000298 | static-vms | centos7  | ewolly-pc                   | 10.96.11.40   | None | ready    | 00:00:29:39 | unlocked |
| 0000000275 | static-vms | centos7  | None                        | None          | None | building | 00:05:36:50 | locked   |
| 0000000297 | static-vms | centos7  | monoceros-pc                | 10.96.11.102  | None | ready    | 00:00:36:00 | unlocked |
| 0000000278 | static-vms | centos7  | None                        | None          | None | building | 00:05:36:50 | locked   |
| 0000000277 | static-vms | centos7  | None                        | None          | None | building | 00:05:36:50 | locked   |
| 0000000299 | static-vms | centos7  | ewolly-pc                   | 10.96.11.40   | None | ready    | 00:00:01:43 | unlocked |
| 0000000294 | static-vms | centos7  | lynx-pc                     | 10.0.11.12    | None | ready    | 00:01:08:26 | unlocked |
| 0000000296 | static-vms | centos7  | nto7-centos7-ci1            | 10.96.11.95   | None | ready    | 00:00:36:58 | unlocked |
| 0000000295 | static-vms | centos7  | monoceros-pc                | 10.96.11.102  | None | ready    | 00:01:01:33 | unlocked |
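To spot stuck entries without reading the table by eye, the `nodepool list` output can be filtered for nodes that have been 'building' longer than some threshold. This is a hypothetical sketch that assumes the Age column is `DD:HH:MM:SS` (so `00:05:36:50` is 5 h 36 min 50 s) and that the first `|` row is the header; the one-hour default threshold is arbitrary.

```python
import sys


def parse_age(age):
    """Convert an assumed DD:HH:MM:SS age string to seconds."""
    days, hours, minutes, seconds = (int(part) for part in age.split(":"))
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def stuck_building(table, min_seconds=3600):
    """Return IDs of nodes in the 'building' state for at least min_seconds."""
    header = None
    stuck = []
    for line in table.splitlines():
        line = line.strip()
        if not line.startswith("|"):
            continue  # skip borders like "+----+" and blank lines
        cells = [cell.strip() for cell in line.strip("|").split("|")]
        if header is None:
            header = cells  # first "|" row holds the column names
            continue
        row = dict(zip(header, cells))
        if row.get("State") == "building" and parse_age(row["Age"]) >= min_seconds:
            stuck.append(row["ID"])
    return stuck


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python3 stuck_building.py list.txt
    with open(sys.argv[1]) as f:
        for node_id in stuck_building(f.read()):
            print(node_id)
```

For example, `docker-compose exec -T launcher nodepool list > list.txt` (`-T` disables the pseudo-TTY so the output redirects cleanly) followed by `python3 stuck_building.py list.txt`; the script name is hypothetical.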


-- 
Regards, Vitaliy