[mpich-discuss] forrtl errors
Christopher Tanner
christopher.tanner at gatech.edu
Wed Oct 8 10:38:13 CDT 2008
Thanks for your help guys -- fixing the typo fixed the problem. Sorry
for such a rookie mistake.
-------------------------------------------
Chris Tanner
Space Systems Design Lab
Georgia Institute of Technology
christopher.tanner at gatech.edu
-------------------------------------------
On Oct 7, 2008, at 7:54 PM, Gus Correa wrote:
> Hi Christopher and list
>
> In a thread parallel to this, answering your question about "mpif77
> with ifort",
> Rajeev suggested that you fix the MPICH2 configuration error,
> by changing F70=ifort to F77=ifort.
> Besides, your configuration script has yet another typo: --enable-f70
> instead of --enable-f77.
> There is a non-negligible chance that this is part of the problem
> with your I/O too.
>
> The best way to go about it would be to do a "make distclean" in
> your MPICH2 directory,
> to wipe off any old mess, and rebuild a fresh MPICH2 with the right
> configuration options.
>
> After you build MPICH2 fresh, it is a good idea to compile and run
> some of their example programs,
> (in the "examples" directory and subdirectories): cpi.c, f77/fpi.f,
> f90/pi3f90.f90, and cxx/cxxpi.cxx,
> just to make sure the build was right, and that you can run the
> programs in your cluster or "NOW".
>
> Also, when you compile and run, better use full path names to the
> compiler wrappers (mpicc, etc),
> and to mpiexec, to avoid any confusion with other versions of MPI
> that may be hanging around on
> your computer (or make sure your PATH variable is neatly organized).
> This is a very common problem, as most Linux distributions and
> commercial
> compilers flood our computers with a variety of MPI stuff.
> Very often people want to use, say, MPICH2, but inadvertently
> compile with LAM MPI's mpicc,
> then run with mpiexec from MPICH-1, and the like,
> because their PATH is not what they think it is.
> (Check it with "which mpicc", "which mpiexec", etc.)
>
> Then, you can remove any residual of previous compilations of your
> program.
> (make cleanall, if you have a proper makefile, or simply remove the
> executable and any object
> files, pre-processed files, etc, by hand).
>
> The next step is a fresh recompilation of your program, using the
> newly and correctly built mpif77.
> Finally run the program again and see how it goes.
> Not guaranteed that it will work, but at least you can discard
> problems with how MPICH2 was built,
> and how you launch MPI programs on your computers.
> This is painful, but likely to prevent a misleading superposition of
> different errors,
> and may narrow your search.
>
> My two cents,
> Gus Correa
>
> --
> ---------------------------------------------------------------------
> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
> Lamont-Doherty Earth Observatory - Columbia University
> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
> Christopher Tanner wrote:
>
>> Gus / All -
>>
>> I've NFS mounted the /home directory on all nodes. To ensure that
>> the permissions and the NFS export mechanism are
>> correct, I ssh'ed to each node and made sure I could read and
>> write files in the /home/<user> directory. Is this sufficient to
>> ensure that mpiexec can read and write to the home directory?
>>
>> I'm launching the process from the /home/<user> directory, where
>> the data files to be read/written are. The executable is in the
>> NFS-exported directory /usr/local/bin.
>>
>> Regarding the code itself making I/O errors, this is what I
>> assumed initially. Since it occurred on two different
>> applications, I'm assuming it's the MPI and not the application,
>> but I could be wrong.
>>
>> Does anything here stand out as bad?
>>
>> I emailed out the output from my configure command -- hopefully
>> this may shed some light on the issue.
>>
>> -------------------------------------------
>> Chris Tanner
>> Space Systems Design Lab
>> Georgia Institute of Technology
>> christopher.tanner at gatech.edu
>> -------------------------------------------
>>
>>
>>
>> On Oct 7, 2008, at 3:37 PM, Gus Correa wrote:
>>
>>> Hi Christopher and list
>>>
>>> A number of different problems can generate I/O errors in a
>>> parallel environment.
>>> Some that I came across with (there are certainly more):
>>>
>>> 1) Permissions on the target directory. (Can you read and write
>>> there?)
>>> 2) If you are running on separate hosts (a cluster or a "NOW"),
>>> are you doing I/O to local disks/filesystems, or to an NFS-mounted
>>> directory?
>>> 2.A) If local disks, are the presumed directories already created
>>> there, and with the right permissions?
>>> 2.B) If NFS, is the export/mount mechanism operating properly?
>>> 3) On which directory do your processes start in each execution
>>> host?
>>> The same as in the host where you launch the mpiexec command or on
>>> a different directory? (See mpiexec -wdir option, assuming you
>>> are using the mpiexec that comes with MPICH2. There are other
>>> mpiexec commands, though.)
>>> 4) Code (Fortran code, I presume) that makes wrong assumptions
>>> about file status,
>>> e.g. "open(fin,file='myfile',status='old')" but 'myfile' doesn't
>>> exist yet.
>>>
>>> Writing a very simple MPI test program where each process
>>> opens/creates, writes, and closes
>>> a different file may help you sort this out.
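One way to sketch such a test: generate a tiny Fortran program where each rank writes its own file, then compile and run it with the MPICH2 wrappers. The install prefix is an assumption, and the compile/run steps are guarded so the script still completes where MPI is not installed:

```shell
MPICH2=/usr/local/mpi/mpich2/bin    # assumed install prefix
cat > ranktest.f <<'EOF'
      program ranktest
      include 'mpif.h'
      integer ierr, rank
      character*8 fname
      call MPI_INIT(ierr)
      call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
      write(fname,'(a4,i3.3)') 'out.', rank
      open(10, file=fname, status='unknown')
      write(10,*) 'hello from rank ', rank
      close(10)
      call MPI_FINALIZE(ierr)
      end
EOF
if [ -x "$MPICH2/mpif77" ]; then
    "$MPICH2/mpif77" -o ranktest ranktest.f
    "$MPICH2/mpiexec" -n 4 ./ranktest
    ls out.*                        # expect one file per rank if I/O works
fi
```

If this succeeds but the real applications still fail, the problem is likely in the applications' file assumptions or working directories rather than in the MPICH2 build.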
>>>
>>> Also, I wonder if your precompiled commercial applications are
>>> using the same MPICH2 that
>>> you configured, or some other MPI version.
>>>
>>> I hope this helps,
>>> Gus Correa
>>>
>>> --
>>> ---------------------------------------------------------------------
>>> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>>>
>>>
>>> Christopher Tanner wrote:
>>>
>>>> Hi All -
>>>>
>>>> I am receiving the same errors in multiple applications when I
>>>> try to run them over MPICH2. They all read:
>>>>
>>>> forrtl: Input/output error
>>>> forrtl: No such file or directory
>>>> forrtl: severe ...
>>>>
>>>> This doesn't happen when I try to run any tests (i.e.
>>>> mpiexec ... hostname), only whenever I run the applications.
>>>> Additionally, it happens with pre-compiled (i.e. commercial
>>>> applications) applications as well as applications compiled on
>>>> the machine (i.e. open-source applications). At first I thought
>>>> it was something to do with the application, now I'm starting
>>>> to think I've done something wrong with MPICH2. Below is the
>>>> configure command I used:
>>>>
>>>> ./configure --prefix=/usr/local/mpi/mpich2 --enable-f77
>>>> --enable-f90 --enable-cxx --enable-sharedlibs=gcc --enable-fast=defopt
>>>> CC=icc CFLAGS=-m64 CXX=icpc CXXFLAGS=-m64 F77=ifort FFLAGS=-m64
>>>> F90=ifort F90FLAGS=-m64
>>>>
>>>> Anyone have any clues? Thanks!
>>>>
>>>> -------------------------------------------
>>>> Chris Tanner
>>>> Space Systems Design Lab
>>>> Georgia Institute of Technology
>>>> christopher.tanner at gatech.edu
>>>> -------------------------------------------
>>>>
>>>>
>>>
>
More information about the mpich-discuss mailing list