bash: /usr/bin/hydra_pmi_proxy: No such file or directory
I am struggling to set up an MPI cluster, following the Setting Up an MPICH2 Cluster in Ubuntu tutorial. I have it running, and my machine file is this:

pythagoras:2  # spawn 2 processes on pythagoras
geomcomp      # spawn 1 process on geomcomp
The tutorial states:

"and run it (the parameter next to -n specifies the number of processes to spawn and distribute among the nodes): mpiu@ub0:~$ mpiexec -n 8 -f machinefile ./mpi_hello"
With -n 1 and -n 2 it runs fine, but with -n 3 it fails, as you can see below:
gsamaras@pythagoras:/mirror$ mpiexec -n 1 -f machinefile ./mpi_hello
hello processor 0 of 1
gsamaras@pythagoras:/mirror$ mpiexec -n 2 -f machinefile ./mpi_hello
hello processor 0 of 2
hello processor 1 of 2
gsamaras@pythagoras:/mirror$ mpiexec -n 3 -f machinefile ./mpi_hello
bash: /usr/bin/hydra_pmi_proxy: No such file or directory
(hangs here)
Maybe the parameter next to -n specifies the number of machines? I mean, the number of processes is stated in the machinefile, isn't it? Also, I have used 2 machines for the MPI cluster (I hope that is the case, since the output I am getting comes not only from the master node (i.e. pythagoras) but also from slave 1 (i.e. geomcomp)).
EDIT 1:
Well, I think the parameter next to -n does specify the number of processes, since the tutorial I linked to uses 4 machines and its machine file implies 8 processes. Why is the -n parameter needed at all, then? Whatever the reason is, I still can't see why the run fails with -n 3.
EDIT 2:
Following EDIT 1, -n 3 is logical, since the machinefile implies that 3 processes will be spawned.
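For reference, Hydra reads the machinefile top to bottom: a line of the form host:N contributes N process slots, and a bare hostname contributes one, so the file above advertises 3 slots in total. A small sketch that sums the slots a machinefile declares (the total_slots helper is hypothetical, not part of MPICH):

```shell
# Recreate the machinefile from the question
cat > machinefile <<'EOF'
pythagoras:2 # spawn 2 processes on pythagoras
geomcomp     # spawn 1 process on geomcomp
EOF

# Hypothetical helper: sum the slots a Hydra machinefile declares.
# "host:N" contributes N slots; a bare "host" contributes 1.
total_slots() {
    # drop comments and blank lines, then add up the :N suffixes (default 1)
    sed 's/#.*//' "$1" | awk 'NF > 0 {
        n = 1
        if (split($1, a, ":") == 2) n = a[2]
        sum += n
    } END { print sum + 0 }'
}

total_slots machinefile   # prints 3
```

This matches the observation above: -n 1 and -n 2 stay on pythagoras, while -n 3 is the first run that must reach geomcomp.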
EDIT 3:
I think the problem arises when it tries to spawn a process on the slave node (i.e. geomcomp).
EDIT 4:
pythagoras runs Debian 8, while geomcomp runs Debian 6. The machines are of the same architecture. The problem lies in geomcomp, since I tried mpiexec -n 1 ./mpi_hello there and it said that no daemon runs.
So, here is what I got on pythagoras:
gsamaras@pythagoras:~$ mpichversion
MPICH Version:      3.1
MPICH Release date: Thu Feb 20 11:41:13 CST 2014
MPICH Device:       ch3:nemesis
MPICH configure:    --build=x86_64-linux-gnu --prefix=/usr --includedir=${prefix}/include --mandir=${prefix}/share/man --infodir=${prefix}/share/info --sysconfdir=/etc --localstatedir=/var --libdir=${prefix}/lib/x86_64-linux-gnu --libexecdir=${prefix}/lib/x86_64-linux-gnu --disable-maintainer-mode --disable-dependency-tracking --enable-shared --prefix=/usr --enable-fc --disable-rpath --disable-wrapper-rpath --sysconfdir=/etc/mpich --libdir=/usr/lib/x86_64-linux-gnu --includedir=/usr/include/mpich --docdir=/usr/share/doc/mpich --with-hwloc-prefix=system --enable-checkpointing --with-hydra-ckpointlib=blcr
MPICH CC:           gcc -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -O2
MPICH CXX:          g++ -g -O2 -fstack-protector-strong -Wformat -Werror=format-security -g -O2 -fstack-protector-strong -Wformat -Werror=format-security
MPICH F77:          gfortran -g -O2 -fstack-protector-strong -g -O2 -fstack-protector-strong -O2
MPICH FC:           gfortran -g -O2 -fstack-protector-strong -g -O2 -fstack-protector-strong
gsamaras@pythagoras:~$ which mpiexec
/usr/bin/mpiexec
gsamaras@pythagoras:~$ which mpirun
/usr/bin/mpirun
whereas on geomcomp I got:
gsamaras@geomcomp:~$ mpichversion
-bash: mpichversion: command not found
gsamaras@geomcomp:~$ which mpiexec
/usr/bin/mpiexec
gsamaras@geomcomp:~$ which mpirun
/usr/bin/mpirun
I had installed mpich2, as the tutorial instructed. What should I do? I am working in /mirror on the master node, which is mounted on the slave node.
1. A relevant question is mpiexec.hydra - how to run MPI process on machines where locations of hydra_pmi_proxy are different?, but its setup is different from mine, although it might be the case here too.
2. Damn it, the only Hydra I know is a Greek island; what am I missing? :/
I'd say you've identified a genuine shortcoming of Hydra: there should be a way to tell it that the paths on the other nodes are different.
Where is MPICH installed on pythagoras? Where is MPICH installed on geomcomp?
In the simplest configuration, you have, for example, a common home directory, and you have installed MPICH into ${HOME}/soft/mpich.
Hydra might not be starting a "login shell" on the remote machine. If you add the MPICH installation path to your PATH environment variable, you'll have to do it in the file .bashrc (or whatever your shell's equivalent is).
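The login-shell point can be demonstrated without a cluster. A sketch, using a throwaway HOME so nothing real is touched (the soft/mpich prefix is only an illustrative install path): a login shell picks up a PATH exported in ~/.profile, while the non-login, non-interactive kind of shell a remote launcher may start does not — which is why the advice above puts the export in ~/.bashrc, the file bash does read for remote non-login shells over ssh:

```shell
# Throwaway HOME with a PATH export only in ~/.profile
export HOME=$(mktemp -d)
echo 'export PATH="$HOME/soft/mpich/bin:$PATH"' > "$HOME/.profile"

# Login shell (-l): reads ~/.profile, so the MPICH path is on PATH
login=$(bash -lc 'echo "$PATH"' | grep -c 'soft/mpich')

# Non-login, non-interactive shell: skips ~/.profile, path is missing
nonlogin=$(bash -c 'echo "$PATH"' | grep -c 'soft/mpich' || true)

echo "login=$login nonlogin=$nonlogin"   # login=1 nonlogin=0
```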
To test this, try 'ssh geomcomp mpichversion', 'ssh pythagoras mpichversion', and plain old 'mpichversion'. That should tell you how your environment is set up.
In any case, your environment is strange! Debian 8 and Debian 6, and what looks like different versions of MPICH... I think that, thanks to the ABI initiative, MPICH-3.1 and newer will work with MPICH-3.1, but if you have a version of MPICH that pre-dates the "MPICH2 to MPICH" conversion, there are no such guarantees.
And setting ABI aside, you've got one MPICH that expects the Hydra launcher (the Debian 8 version) and one that expects the MPD launcher (the Debian 6 version).
And even if you have recent enough packages, the only way things can work is if you have the same architecture on both machines. ABI, as Ken points out, does not mean support for heterogeneous environments.
Remove the distro packages and build MPICH yourself on both machines.
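A sketch of that build, done identically on both nodes into a common prefix (the 3.1.4 version number and download URL are illustrative; pick a current release from mpich.org, and adjust -j to your core count):

```shell
# Fetch and unpack an MPICH release (version is illustrative)
wget https://www.mpich.org/static/downloads/3.1.4/mpich-3.1.4.tar.gz
tar xf mpich-3.1.4.tar.gz
cd mpich-3.1.4

# Same prefix on every node; Hydra is the default process manager
./configure --prefix="$HOME/soft/mpich"
make -j4 && make install

# Make the install visible to remote (non-login) shells too
echo 'export PATH="$HOME/soft/mpich/bin:$PATH"' >> ~/.bashrc
```

Building the same release on both machines removes both mismatches at once: the ABI difference between the Debian 6 and Debian 8 packages, and the Hydra-vs-MPD launcher difference.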