Moving clusters

From Molecular Modeling Wiki


Revision as of 13:25, 3 September 2008

Warning

Due to security problems, the SSH server keys will be regenerated on most cluster servers. As a result, you may receive a warning similar to the following the first time you log in afterwards:

@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@    WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED!     @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
9a:a3:68:49:29:f6:a0:f4:c1:64:a0:fd:98:67:b2:67.
Please contact your system administrator.
Add correct host key in /root/.ssh/known_hosts to get rid of this message.
Offending key in /root/.ssh/known_hosts:53
RSA host key for iridium has changed and you have requested strict checking.
Host key verification failed.

To get rid of this warning and get access to the cluster, do the following:

  1. Find the row that starts with Offending key ... in the text of the warning and note the line number after the colon (53 in the above example).
  2. Open the ~/.ssh/known_hosts file and delete the line whose number was mentioned in the warning (in vi, press <5><3><shift-G><d><d> for the above example; substitute your own line number).
  3. Save the file and try to connect to the cluster again.
  4. If the connection fails again with the same or a similar warning, repeat all steps.
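
The manual steps above can also be scripted. The following is a minimal sketch, not an official tool of this wiki: the function names offending_line and remove_known_hosts_line are hypothetical, and the script assumes the warning text has been captured into a string and that the row has the form "Offending key in <file>:<number>" (optionally with a key type before "key"). It keeps a .bak copy of the file before changing it.

```python
import re
import shutil
from pathlib import Path
from typing import Optional


def offending_line(warning: str) -> Optional[int]:
    """Return the line number from the 'Offending ... key in <file>:<n>' row.

    Returns None when the warning text contains no such row.
    """
    match = re.search(
        r"^Offending (?:\S+ )?key in (\S+):(\d+)\s*$", warning, re.MULTILINE
    )
    return int(match.group(2)) if match else None


def remove_known_hosts_line(path: str, line_number: int) -> str:
    """Delete the 1-based line `line_number` from the file and return it.

    A backup copy named '<path>.bak' is written first (an assumption of
    this sketch, not something the instructions above require).
    """
    file = Path(path)
    lines = file.read_text().splitlines(keepends=True)
    if not 1 <= line_number <= len(lines):
        raise ValueError(f"{path} has no line {line_number}")
    shutil.copy2(file, str(file) + ".bak")
    removed = lines.pop(line_number - 1)
    file.write_text("".join(lines))
    return removed


if __name__ == "__main__":
    # Example warning row taken from the text above.
    row = "Offending key in /root/.ssh/known_hosts:53"
    print(offending_line(row))
```

With the number in hand, remove_known_hosts_line(os.path.expanduser("~/.ssh/known_hosts"), 53) would perform step 2 (this call needs import os). On systems with OpenSSH, running ssh-keygen -R iridium should achieve the same result by removing all stored keys for the host iridium.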


Current status

centrum
Server is up and running.
carol
Working on ...
marge
... to be moved.
teogate
... to be moved.
althea
Server is up and running.
lithium
Cluster is up and running, except for clients l07 and l22-l25, which need some repair (l22-l24 are probably OK but have to be tested; l25's memory module failed, so it is out for service).
helium
Cluster is up and running.
francium
Server is up and running.
There will be more maintenance work on the server, so it may be restarted several times; meanwhile you can access your data.
The cluster is being checked.
The nodes will not be started until the InfiniBand switch is returned from warranty repair, which can take two or three weeks.
iridium
Cluster is up and running. Enable queues as needed.
cobalt
Not running (Friday?).
argon
Cluster is up and running. Some nodes are broken and will have to be repaired (mostly dead power supplies).
krypton
Cluster is up and running. Some nodes are broken and will have to be repaired (mostly dead power supplies).
radon
Cluster is up and running. Some nodes are broken and will have to be repaired (mostly dead power supplies).
palladium
Cluster is up and running, except for nodes p27 to p31, which need completely new cabling (we hope to get them connected on Friday). Some older nodes are broken and will not be repaired.


vanad
Discontinued.
titanium
Discontinued.
niob
Discontinued.