[mpich-discuss] forrtl errors
Gus Correa
gus at ldeo.columbia.edu
Tue Oct 7 18:54:47 CDT 2008
Hi Christopher and list
In a parallel thread, answering your question about "mpif77 with
ifort", Rajeev suggested that you fix the MPICH2 configuration error
by changing F70=ifort to F77=ifort.
Besides, your configuration script has yet another typo: --enable-f70
instead of --enable-f77.
There is a non-negligible chance that this is part of the problem with
your I/O too.
The best way to go about it would be to do a "make distclean" in your
MPICH2 directory,
to wipe off any old mess, and rebuild a fresh MPICH2 with the right
configuration options.
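For example (just a sketch, based on the configure command you posted
below, with the two typos fixed; the source directory is a placeholder):

   cd /path/to/mpich2-source
   make distclean
   ./configure --prefix=/usr/local/mpi/mpich2 \
       --enable-f77 --enable-f90 --enable-cxx \
       --enable-sharedlibs=gcc --enable-fast=defopt \
       CC=icc CFLAGS=-m64 CXX=icpc CXXFLAGS=-m64 \
       F77=ifort FFLAGS=-m64 F90=ifort F90FLAGS=-m64
   make
   make install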
After you build MPICH2 fresh, it is a good idea to compile and run some
of the example programs (in the "examples" directory and its
subdirectories): cpi.c, f77/fpi.f, f90/pi3f90.f90, and cxx/cxxpi.cxx,
just to make sure the build was right, and that you can run the programs
on your cluster or "NOW".
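Something along these lines (using the install prefix from your
configure command; adjust the paths to your setup):

   cd /path/to/mpich2-source/examples
   /usr/local/mpi/mpich2/bin/mpicc cpi.c -o cpi
   /usr/local/mpi/mpich2/bin/mpiexec -n 4 ./cpi
   /usr/local/mpi/mpich2/bin/mpif77 f77/fpi.f -o fpi
   /usr/local/mpi/mpich2/bin/mpiexec -n 4 ./fpi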
Also, when you compile and run, it is better to use full path names to
the compiler wrappers (mpicc, etc.) and to mpiexec, to avoid any
confusion with other versions of MPI that may be hanging around on
your computer (or make sure your PATH variable is neatly organized).
This is a very common problem, as most Linux distributions and commercial
compilers flood our computers with a variety of MPI stuff.
Very often people want to use, say, MPICH2, but inadvertently compile
with the LAM MPI mpicc, then run with the mpiexec from MPICH-1, and
the like, because their PATH is not what they think it is.
(Check it with "which mpicc", "which mpiexec", etc.)
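For example (the bin directory below assumes the install prefix from
your configure command):

   which mpicc mpif77 mpiexec
   # if these don't point to /usr/local/mpi/mpich2/bin, call the
   # wrappers by full path, or put that directory first in your PATH:
   export PATH=/usr/local/mpi/mpich2/bin:$PATH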
Then you can remove any residue of previous compilations of your program
("make cleanall" if you have a proper makefile, or simply remove the
executable and any object files, pre-processed files, etc. by hand).
The next step is a fresh recompilation of your program, using the newly
and correctly built mpif77.
Finally, run the program again and see how it goes.
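Schematically (the program and file names here are placeholders for
yours):

   make cleanall                      # or: rm -f myprogram *.o
   /usr/local/mpi/mpich2/bin/mpif77 -m64 myprogram.f -o myprogram
   /usr/local/mpi/mpich2/bin/mpiexec -n 4 ./myprogram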
It is not guaranteed that this will work, but at least you can rule out
problems with how MPICH2 was built and with how you launch MPI programs
on your computers.
This is painful, but it is likely to prevent a misleading superposition
of different errors, and it may narrow your search.
My two cents,
Gus Correa
--
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------
Christopher Tanner wrote:
> Gus / All -
>
> I've NFS mounted the /home directory on all nodes. To ensure that the
> permissions are correct and the NFS export mechanism is correct, I
> ssh'ed to each node and made sure I could read and write files to the
> /home/<user> directory. Is this sufficient to ensure that mpiexec
> can read and write to the home directory?
>
> I'm launching the process from the /home/<user> directory, where the
> data files to be read/written are. The executable is in the NFS
> exported directory /usr/local/bin.
>
> Regarding the code itself making I/O errors, this is what I assumed
> initially. Since it occurred on two different applications, I'm
> assuming it's the MPI and not the application, but I could be wrong.
>
> Does anything here stand out as bad?
>
> I emailed out the output from my configure command -- hopefully this
> may shed some light on the issue.
>
> -------------------------------------------
> Chris Tanner
> Space Systems Design Lab
> Georgia Institute of Technology
> christopher.tanner at gatech.edu
> -------------------------------------------
>
>
>
> On Oct 7, 2008, at 3:37 PM, Gus Correa wrote:
>
>> Hi Christopher and list
>>
>> A number of different problems can generate I/O errors in a parallel
>> environment.
>> Some that I have come across (there are certainly more):
>>
>> 1) Permissions on the target directory. (Can you read and write there?)
>> 2) If you are running on separate hosts (a cluster or a "NOW"),
>> are you doing I/O to local disks/filesystems, or to an NFS mounted
>> directory?
>> 2.A) If local disks, are the presumed directories already created
>> there, and with the right permissions?
>> 2.B) If NFS, is the export/mount mechanism operating properly?
>> 3) On which directory do your processes start in each execution host?
>> The same as in the host where you launch the mpiexec command or on a
>> different directory? (See the mpiexec -wdir option and the example
>> after this list, assuming you are using the mpiexec that comes with
>> MPICH2. There are other mpiexec commands, though.)
>> 4) Code (Fortran code, I presume) that makes wrong assumptions about
>> file status,
>> e.g. "open(fin,file='myfile',status='old')" but 'myfile' doesn't exist
>> yet.
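>>
>> Regarding item 3, something like this sets the working directory
>> explicitly (the directory and program name are just examples):
>>
>> mpiexec -wdir /home/<user> -n 4 ./myprogram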
>>
>> Writing a very simple MPI test program where each process opens/creates,
>> writes, and closes a different file may help you sort this out.
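>>
>> A minimal sketch of such a test (the file names are arbitrary, and I
>> am assuming the wrappers from your install prefix):
>>
>> cat > tio.f << 'EOF'
>>       program tio
>>       include 'mpif.h'
>>       integer ierr, rank
>>       character*16 fname
>>       call MPI_INIT(ierr)
>>       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>       write(fname,'(a,i3.3)') 'tio.out.', rank
>> c     each rank creates, writes, and closes its own file
>>       open(10, file=fname, status='unknown')
>>       write(10,*) 'hello from rank ', rank
>>       close(10)
>>       call MPI_FINALIZE(ierr)
>>       end
>> EOF
>> /usr/local/mpi/mpich2/bin/mpif77 tio.f -o tio
>> /usr/local/mpi/mpich2/bin/mpiexec -n 4 ./tio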
>>
>> Also, I wonder if your precompiled commercial applications are using
>> the same MPICH2 that
>> you configured, or some other MPI version.
>>
>> I hope this helps,
>> Gus Correa
>>
>> --
>> ---------------------------------------------------------------------
>> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
>> Lamont-Doherty Earth Observatory - Columbia University
>> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>> Christopher Tanner wrote:
>>
>>> Hi All -
>>>
>>> I am receiving the same errors in multiple applications when I try
>>> to run them over MPICH2. They all read:
>>>
>>> forrtl: Input/output error
>>> forrtl: No such file or directory
>>> forrtl: severe ...
>>>
>>> This doesn't happen when I try to run any tests (e.g. mpiexec ...
>>> hostname), only when I run the applications. Additionally, it
>>> happens with pre-compiled applications (i.e. commercial
>>> applications) as well as applications compiled on the machine (i.e.
>>> open-source applications). At first I thought it was something to
>>> do with the application; now I'm starting to think I've done
>>> something wrong with MPICH2. Below is the configure command I used:
>>>
>>> ./configure --prefix=/usr/local/mpi/mpich2 --enable-f77 --enable-f90
>>> --enable-cxx --enable-sharedlibs=gcc --enable-fast=defopt CC=icc
>>> CFLAGS=-m64 CXX=icpc CXXFLAGS=-m64 F77=ifort FFLAGS=-m64 F90=ifort
>>> F90FLAGS=-m64
>>>
>>> Anyone have any clues? Thanks!
>>>
>>> -------------------------------------------
>>> Chris Tanner
>>> Space Systems Design Lab
>>> Georgia Institute of Technology
>>> christopher.tanner at gatech.edu
>>> -------------------------------------------
>>>
>>>
>>