[mpich-discuss] forrtl errors

Gus Correa gus at ldeo.columbia.edu
Tue Oct 7 18:54:47 CDT 2008


Hi Christopher and list

In a parallel thread, answering your question about "mpif77 with
ifort",
Rajeev suggested that you fix the MPICH2 configuration error
by changing F70=ifort to F77=ifort.
Besides, your configure script has yet another typo: --enable-f70
instead of --enable-f77.
There is a non-negligible chance that this is part of the problem with
your I/O too.

The best way to go about it is to do a "make distclean" in your
MPICH2 build directory,
to wipe out any old mess, and rebuild a fresh MPICH2 with the right
configuration options.
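
Something along these lines should do it (just a sketch: I am reusing
the options from your configure command quoted below, with the F77
spellings fixed, and "/path/to/mpich2-source" standing in for wherever
you unpacked the MPICH2 tarball):

   cd /path/to/mpich2-source
   make distclean
   ./configure --prefix=/usr/local/mpi/mpich2 \
       --enable-f77 --enable-f90 --enable-cxx \
       --enable-sharedlibs=gcc --enable-fast=defopt \
       CC=icc CFLAGS=-m64 CXX=icpc CXXFLAGS=-m64 \
       F77=ifort FFLAGS=-m64 F90=ifort F90FLAGS=-m64
   make
   make install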

After you build MPICH2 fresh, it is a good idea to compile and run some
of the example programs that come with it (in the "examples" directory
and its subdirectories): cpi.c, f77/fpi.f, f90/pi3f90.f90, and
cxx/cxxpi.cxx,
just to make sure the build was right, and that you can run the
programs on your cluster or "NOW".
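
For instance, for the Fortran 77 example (again just a sketch, assuming
the install prefix above and that your process manager is already set up):

   cd /path/to/mpich2-source/examples/f77
   /usr/local/mpi/mpich2/bin/mpif77 -o fpi fpi.f
   /usr/local/mpi/mpich2/bin/mpiexec -n 4 ./fpi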

Also, when you compile and run, it is better to use full path names to
the compiler wrappers (mpicc, etc.) and to mpiexec, to avoid any
confusion with other versions of MPI that may be hanging around on
your computer (or make sure your PATH variable is neatly organized).
This is a very common problem, as most Linux distributions and
commercial compilers flood our computers with a variety of MPI stuff.
Very often people want to use, say, MPICH2, but inadvertently compile
with the LAM MPI mpicc,
then run with the mpiexec from MPICH-1, and the like,
because their PATH is not what they think it is.
(Check it with "which mpicc", "which mpiexec", etc.)
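
For example (assuming the install prefix above; adjust if yours differs):

   which mpicc mpif77 mpiexec

All of them should point into /usr/local/mpi/mpich2/bin.
If they don't, either call the wrappers by full path or put that
directory first in your PATH:

   export PATH=/usr/local/mpi/mpich2/bin:$PATH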

Then you can remove any residue of previous compilations of your program
("make cleanall", if you have a proper makefile, or simply remove the
executable and any object files, preprocessed files, etc., by hand).

The next step is a fresh recompilation of your program, using the newly 
and correctly built mpif77.
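
For example (with "myprog" and "myprog.f" as placeholders for your
actual program):

   /usr/local/mpi/mpich2/bin/mpif77 -o myprog myprog.f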
 
Finally, run the program again and see how it goes.
There is no guarantee that it will work, but at least you can rule out
problems with how MPICH2 was built,
and with how you launch MPI programs on your computers.
This is painful, but likely to prevent a misleading superposition of
different errors,
and it may narrow your search.

My two cents,
Gus Correa

-- 
---------------------------------------------------------------------
Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
Lamont-Doherty Earth Observatory - Columbia University
P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
---------------------------------------------------------------------


Christopher Tanner wrote:

> Gus / All -
>
> I've NFS-mounted the /home directory on all nodes. To ensure that the
> permissions and the NFS export mechanism are correct, I ssh'ed to each
> node and made sure I could read and write files in the /home/<user>
> directory. Is this sufficient to ensure that mpiexec can read and
> write to the home directory?
>
> I'm launching the process from the /home/<user> directory, where the
> data files to be read/written are. The executable is in the
> NFS-exported directory /usr/local/bin.
>
> Regarding the code itself making I/O errors, this is what I assumed
> initially. Since it occurred in two different applications, I'm
> assuming it's MPI and not the application, but I could be wrong.
>
> Does anything here stand out as bad?
>
> I emailed the output of my configure command -- hopefully it will
> shed some light on the issue.
>
> -------------------------------------------
> Chris Tanner
> Space Systems Design Lab
> Georgia Institute of Technology
> christopher.tanner at gatech.edu
> -------------------------------------------
>
>
>
> On Oct 7, 2008, at 3:37 PM, Gus Correa wrote:
>
>> Hi Christopher and list
>>
>> A number of different problems can generate I/O errors in a parallel
>> environment.
>> Some that I have come across (there are certainly more):
>>
>> 1) Permissions on the target directory. (Can you read and write there?)
>> 2) If you are running on separate hosts (a cluster or a "NOW"),
>> are you doing I/O to local disks/filesystems, or to an NFS-mounted
>> directory?
>> 2.A) If local disks, are the presumed directories already created
>> there, and with the right permissions?
>> 2.B) If NFS, is the export/mount mechanism operating properly?
>> 3) In which directory do your processes start on each execution host?
>> The same as on the host where you launch the mpiexec command, or a
>> different directory? (See the mpiexec -wdir option, and the sketch
>> after this list, assuming you are using the mpiexec that comes with
>> MPICH2. There are other mpiexec commands, though.)
>> 4) Code (Fortran code, I presume) that makes wrong assumptions about
>> file status,
>> e.g. open(fin,file='myfile',status='old') but 'myfile' doesn't exist
>> yet.
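>>
>> For instance, to force every process to start in your data directory
>> (an illustration only; substitute your real directory, process count,
>> and program name):
>>
>>    /usr/local/mpi/mpich2/bin/mpiexec -wdir /home/<user> -n 4 ./myprog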
>>
>> Writing a very simple MPI test program where each process opens/creates,
>> writes, and closes a different file may help you sort this out.
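>>
>> A minimal sketch of such a test (untested, assuming mpif.h is found
>> by your mpif77; each rank creates, writes, and closes its own file):
>>
>>       program iotest
>>       implicit none
>>       include 'mpif.h'
>>       integer ierr, rank, iun
>>       character*16 fname
>> c     build a per-rank file name, e.g. testfile.0003 for rank 3
>>       call MPI_INIT(ierr)
>>       call MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
>>       iun = 10
>>       write(fname,'(a,i4.4)') 'testfile.', rank
>>       open(unit=iun, file=fname, status='unknown')
>>       write(iun,*) 'hello from rank ', rank
>>       close(iun)
>>       call MPI_FINALIZE(ierr)
>>       end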
>>
>> Also, I wonder if your precompiled commercial applications are using  
>> the same MPICH2 that
>> you configured, or some other MPI version.
>>
>> I hope this helps,
>> Gus Correa
>>
>> -- 
>> ---------------------------------------------------------------------
>> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
>> Lamont-Doherty Earth Observatory - Columbia University
>> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
>> ---------------------------------------------------------------------
>>
>>
>> Christopher Tanner wrote:
>>
>>> Hi All -
>>>
>>> I am receiving the same errors in multiple applications when I try  
>>> to  run them over MPICH2. They all read:
>>>
>>> forrtl: Input/output error
>>> forrtl: No such file or directory
>>> forrtl: severe ...
>>>
>>> This doesn't happen when I run any tests (i.e. mpiexec ... hostname),
>>> only when I run the applications. Additionally, it happens with
>>> pre-compiled (i.e. commercial) applications as well as with
>>> applications compiled on the machine (i.e. open-source applications).
>>> At first I thought it was something to do with the application; now
>>> I'm starting to think I've done something wrong with MPICH2. Below
>>> is the configure command I used:
>>>
>>> ./configure --prefix=/usr/local/mpi/mpich2 --enable-f77 --enable-f90 \
>>>     --enable-cxx --enable-sharedlibs=gcc --enable-fast=defopt \
>>>     CC=icc CFLAGS=-m64 CXX=icpc CXXFLAGS=-m64 \
>>>     F77=ifort FFLAGS=-m64 F90=ifort F90FLAGS=-m64
>>>
>>> Anyone have any clues? Thanks!
>>>
>>> -------------------------------------------
>>> Chris Tanner
>>> Space Systems Design Lab
>>> Georgia Institute of Technology
>>> christopher.tanner at gatech.edu
>>> -------------------------------------------
>>>
>>>
>>



