Error in independent mode

Roger Ting rogermht at iinet.net.au
Mon Apr 5 07:05:32 CDT 2004


Hi Rob,
         Thanks for replying to the message. I tried using two processors
to write to the file independently and simultaneously, and it worked fine.
But it does not work if I scale up to 3 processors.
        I did not use collective mode because I cannot see how each
processor could call the append operation simultaneously. As I understand
it, collective operations mean that all processors must call the same
function at the same point and at the same time. So if processor A wants to
append an entry now, but processor B will append an entry fifteen seconds
later, and I am using collective operations, processor A will be blocked
until processor B makes the collective call together with it. Hence the
whole application wastes 15 seconds when processor A could have appended
the entry and moved on. I know I could have used the serial API, but ideally
it would be better for both processors to update the key file
simultaneously, right? I thought that would be slightly faster than the
token approach, in which each processor can append an entry independently
but not simultaneously. This is the reason I haven't used collective
operations as suggested by the manual and by you.
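
The fifteen-second concern above can be illustrated with a small
simulation. This is only a sketch under stated assumptions: it uses Python
threads rather than MPI processes, it models the collective call as a
barrier (a simplification, since MPI does not require every collective
operation to synchronize), and the name measure_wait_for_a and the 0.2
second delay are hypothetical stand-ins.

```python
import threading
import time


def measure_wait_for_a(collective, delay_b=0.2):
    """Return how long simulated 'processor A' waits before it can append.

    Assumption: the collective call is modelled as a two-party barrier.
    """
    barrier = threading.Barrier(2)
    waited = {}

    def proc_a():
        start = time.perf_counter()
        if collective:
            barrier.wait()  # A blocks here until B also arrives
        waited["A"] = time.perf_counter() - start

    def proc_b():
        time.sleep(delay_b)  # B is not ready to append until later
        if collective:
            barrier.wait()

    threads = [threading.Thread(target=proc_a),
               threading.Thread(target=proc_b)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return waited["A"]


if __name__ == "__main__":
    print("collective wait for A :", measure_wait_for_a(True))
    print("independent wait for A:", measure_wait_for_a(False))
```

In the barrier model, A's wait grows with B's delay; in the independent
model, A proceeds immediately.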
          I am running the job on a Linux cluster with about 90 nodes, each
of which has 2 processors. The version of MPI I am using is mpich version
1.25 with the Intel compiler and Redhat version 7. I don't know the
arrangement of the filesystem etc. I am guessing that processes will be
spawned on each node, and that even though each node has a local file
system, there are also some storage nodes. Any idea why some nodes cannot
access the file?

Roger





----- Original Message ----- 
From: "Rob Ross" <rross at mcs.anl.gov>
To: "Roger Ting" <rogermht at iinet.net.au>
Cc: "paranetcdf" <parallel-netcdf at mcs.anl.gov>
Sent: Monday, April 05, 2004 11:54 AM
Subject: Re: Error in independent mode


> Hi Roger,
>
> You've caught me on vacation and RobL at a conference; sorry for the
> delay.
>
> What MPI are you using?
>
> In theory this should work just fine.  It's not an issue of correctness
> given your algorithm (see other message).  Sounds like one of your
> processes didn't have access to the file system or something?  Are you
> running all the processes on one machine?  Could you try that and see if
> it works?
>
> Thanks,
>
> Rob
>
> On Sat, 27 Mar 2004, Roger Ting wrote:
>
> > Hi,
> >     I get this error when I update a netCDF file in independent mode.
> >
> >     2: MPI_File_write error = Input/output error
> >      Unknow error occurs in writting file
> >
> > Does anyone have any idea what is going on? This happens when I scale
> > up to 5 processors.
> >       This sounds silly, but what happens is that I use a root processor
> > to keep track of the position in the netCDF file at which to append the
> > new entry. Whenever a processor wants to add an entry to the file, it
> > sends a message to the root processor. The root returns the latest
> > position. If there are five entries in the file, the root processor
> > returns 5, so the slave processor writes the entry at position six. The
> > slave is blocked until it gets the message from the root processor.
> > Therefore, I don't see how the I/O error happens. Each processor
> > should be guaranteed to write to a different position in the file.
> >
> > Thx
> >
> >
>
>
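
The root-processor scheme described in the quoted message of 27 Mar can be
sketched as a small simulation. This is a minimal illustration, not the
poster's code: it uses Python threads and queues in place of MPI processes
and messages, a plain list in place of the netCDF file, and the names
run_simulation, requests, replies, and record_file are all hypothetical.
Positions here are 0-based, so the count the root returns (5 when five
entries exist) is itself the slot of the sixth entry.

```python
import queue
import threading


def run_simulation(num_workers=4, entries_per_worker=3):
    """Simulate workers asking a root for the next free append position."""
    requests = queue.Queue()                       # worker -> root
    replies = [queue.Queue() for _ in range(num_workers)]  # root -> worker
    total = num_workers * entries_per_worker
    record_file = [None] * total                   # stand-in for the file

    def root():
        count = 0                                  # entries handed out so far
        for _ in range(total):
            worker_id = requests.get()             # wait for a request
            replies[worker_id].put(count)          # reply with current count
            count += 1                             # reserve that slot

    def worker(worker_id):
        for n in range(entries_per_worker):
            requests.put(worker_id)                # ask the root
            position = replies[worker_id].get()    # block until it answers
            record_file[position] = (worker_id, n)  # independent write

    threads = [threading.Thread(target=root)]
    threads += [threading.Thread(target=worker, args=(w,))
                for w in range(num_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return record_file


if __name__ == "__main__":
    for slot, entry in enumerate(run_simulation()):
        print(slot, entry)
```

Because the root serializes the requests, no two workers ever receive the
same position, which matches the reasoning in the quoted message that the
writes themselves should not collide.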



