Hi,
I have been building a simple two-node "cluster" to play around with OpenIFS. Each node has 4 cores. I'm able to run OpenIFS (the T21 test) just fine on each node separately with 4 processes. I'm also able to invoke the executable on node 2 from node 1 using mpirun, so I'm confident that the MPI connection/network settings etc. are configured correctly and that both nodes can talk to each other.
However, when I try to run OpenIFS with 8 processes across both nodes, it hangs with no output - not even a node file. I've tried the solutions to the "
" question in the FAQ but the problem remains. Are there any other common causes of this problem?
When I Ctrl-C the executable I can see from the stack trace that it always seems to be stuck in SUMPINI, but I don't know which line. Also, I only get back 4 copies of the stack trace, not 8 as I would expect from an 8-process invocation.
Other details about the system that might be relevant:
I'm running the executable as: mpirun -np 8 --hostfile machinefile -x LD_LIBRARY_PATH
The machinefile contains the IP addresses of both nodes.
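For anyone wanting to rule out the hostfile itself, here is a minimal sketch of a check on how many slots it declares. It assumes Open MPI (the -x option suggests it) and its "slots=N" hostfile syntax. The IP addresses and the slots values in EXAMPLE are placeholders, not the real machinefile contents, and the function names are made up for illustration.

```python
import re


def parse_hostfile(text):
    """Return a list of (host, slots) pairs; slots is None when not stated."""
    entries = []
    for raw in text.splitlines():
        # Drop trailing comments and skip blank lines.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        host = fields[0]
        slots = None
        for field in fields[1:]:
            match = re.fullmatch(r"slots=(\d+)", field)
            if match:
                slots = int(match.group(1))
        entries.append((host, slots))
    return entries


def declared_slots(entries):
    """Sum the stated slots, or return None if any host leaves them unstated."""
    if any(slots is None for _, slots in entries):
        return None
    return sum(slots for _, slots in entries)


# Hypothetical machinefile contents; the addresses are placeholders.
EXAMPLE = """\
# node 1
192.168.0.1 slots=4
# node 2
192.168.0.2 slots=4
"""

if __name__ == "__main__":
    entries = parse_hostfile(EXAMPLE)
    for host, slots in entries:
        print(host, slots)
    print("declared slots:", declared_slots(entries), "requested: 8")
```

If the declared total comes back as None, the file lists bare addresses only, and how many processes land on each node is then left to the MPI launcher's defaults.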
Any ideas?
Thanks!