Moving clusters
From Molecular Modeling Wiki
Revision as of 13:23, 3 September 2008
Warning
Due to security problems, the SSH server keys will be regenerated on most cluster servers. As a result, you may receive a warning similar to the following the first time you log in afterwards:
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
9a:a3:68:49:29:f6:a0:f4:c1:64:a0:fd:98:67:b2:67.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending key in /root/.ssh/known_hosts:53
RSA host key for iridium has changed and you have requested strict checking.
Host key verification failed.
To get rid of this warning and get access to the cluster, do the following:
- Find the line that starts with "Offending key" in the text of the warning and note the line number after the colon (53 in the above example).
- Open the ~/.ssh/known_hosts file and delete the line whose number was given in the warning (in vi, press <5><3><shift-G><d><d> for the above example; substitute your own line number).
- Save the file and try to connect to the cluster.
- If the connection fails again with the same or similar warning, repeat all steps.
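The manual steps above can also be done from the command line. The following is only a sketch: `remove_offending_key` is a hypothetical helper name, not something installed on the clusters, and it assumes a POSIX shell with `sed` and `cp`. It deletes the given line from a known_hosts file and keeps a backup copy with a `.bak` suffix.

```shell
# Hypothetical helper: delete line number $2 from the known_hosts file $1.
# A backup of the original file is kept as "$1.bak".
remove_offending_key() {
    file=$1
    line=$2
    # Accept only a positive integer as the line number.
    case $line in
        ''|*[!0-9]*) echo "line number must be a positive integer" >&2; return 1 ;;
    esac
    [ "$line" -ge 1 ] || { echo "line number must be a positive integer" >&2; return 1; }
    # Copy the original aside, then rewrite the file without the offending line.
    cp -p -- "$file" "$file.bak" || return 1
    sed "${line}d" "$file.bak" > "$file"
}

# Example for the warning above (offending key on line 53):
#   remove_offending_key ~/.ssh/known_hosts 53
#
# Alternative that needs no line number: ask OpenSSH to drop all stored
# keys for the host named in the warning.
#   ssh-keygen -R iridium
```

Both example calls are left commented out so that pasting the block does not touch your real known_hosts file; run whichever one matches your warning.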
Current status
- centrum
  - Server is up and running.
- carol
  - Working on ...
- marge
  - ... to be moved.
- teogate
  - ... to be moved.
- althea
  - Server is up and running.
- lithium
  - Cluster is up and running, except for clients l07 and l22-l25, which need some repair (l22-l24 are probably OK but have to be tested; l25's memory module failed, so it is in for service).
- helium
  - Cluster is up and running.
- francium
  - Server is up and running.
  - More maintenance work is planned on the server, so it may be restarted several times; meanwhile you can access your data.
  - The cluster is being checked.
  - The nodes will not be started until the InfiniBand switch is returned from warranty repair, which may take two or three weeks.
- iridium
  - Cluster is up and running. Enable queues as needed.
- cobalt
  - Not running.
- argon
  - Cluster is up and running. Some nodes are broken and will have to be repaired (mostly dead power supplies).
- krypton
  - Cluster is up and running. Some nodes are broken and will have to be repaired (mostly dead power supplies).
- radon
  - Cluster is up and running. Some nodes are broken and will have to be repaired (mostly dead power supplies).
- palladium
  - Cluster is up and running except for private nodes p27 to p31, which need completely new cabling (we hope to get them connected on Friday). Some older nodes are broken and will not be repaired.
- vanad
  - Discontinued.
- titanium
  - Discontinued.
- niob
  - Discontinued.