[MPICH] MPI-IO, vector datatype

Rob Ross rross at mcs.anl.gov
Fri May 4 08:08:44 CDT 2007


Glad Rajeev figured it out.

Regards,

Rob

Russell L. Carter wrote:
> That appears to be it.  That's not conventional
> C++ semantics. Damn.  It all works perfectly now!
> 
> Now I know why there's that effort over at
> www.boost.org to sanify the MPI API.
> 
> Thank you all, especially Rajeev and Rob for all
> your help.
> 
> Best,
> Russell
> 
> 
> Rajeev Thakur wrote:
>> OK, I am no C++ expert, but if I define a datatype
>> MPI::Datatype newtype; and do
>>     newtype = filetype.Create_vector(...);
>>     newtype.Commit();
>>
>> and then pass newtype as the filetype to Set_view in both places, your
>> code works.
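>>
>> In other words, a minimal sketch of the fix (keeping the variable names
>> from the posted tester; Create_vector() in the C++ bindings is const and
>> returns a new datatype instead of modifying the object it is called on):
>>
>>     MPI::Datatype newtype =
>>         filetype.Create_vector(nblocks, blocksize, nprocs * blocksize);
>>     newtype.Commit();
>>     // ... and later, pass newtype rather than filetype to Set_view:
>>     f.Set_view(disp, MPI_INT, newtype, "native", MPI_INFO_NULL);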
>>
>> Rajeev
>>
>>
>>> -----Original Message-----
>>> From: Russell L. Carter [mailto:rcarter at esturion.net]
>>> Sent: Thursday, May 03, 2007 11:57 PM
>>> To: Rajeev Thakur
>>> Cc: 'Rob Ross'; mpich-discuss at mcs.anl.gov
>>> Subject: Re: [MPICH] MPI-IO, vector datatype
>>>
>>> All right, I've been thinking, ok, I'm an old fart,
>>> I can revert to assembly code. :-)
>>>
>>> I will provide you with the direct analogue in C tomorrow.
>>>
>>> Now that I think about it, it's quite possible that I was wrong
>>> about the C++ API not being a problem, as there may be a
>>> missing & (reference) operator somewhere.
>>>
>>> Best,
>>> Russell
>>>
>>> Rajeev Thakur wrote:
>>>> I don't see any bug in the program, so I am guessing it has
>>>> to do with C++, maybe even the C++ binding in MPICH2. Can you
>>>> run the C version of the program you downloaded from the book?
>>>> Rajeev
>>>>> -----Original Message-----
>>>>> From: Russell L. Carter [mailto:rcarter at esturion.net]
>>>>> Sent: Thursday, May 03, 2007 11:35 PM
>>>>> To: Rajeev Thakur
>>>>> Cc: 'Rob Ross'; mpich-discuss at mcs.anl.gov
>>>>> Subject: Re: [MPICH] MPI-IO, vector datatype
>>>>>
>>>>> Rajeev Thakur wrote:
>>>>>> Can you try writing to /tmp in case /home/rcarter is NFS.
>>>>> Yes indeed, NFS is problematic, no?  Generally it fails, as
>>>>> I discovered today.  Judging from the error messages, I probably
>>>>> need to enforce sync semantics.  But after running into these
>>>>> problems I settled back on testing with multiple processes on a
>>>>> single filesystem, using either the local Unix fs or the
>>>>> multiple-node global PVFS2 filesystems I have.
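>>>>>
>>>>> (For the record, the way to enforce those semantics within a single
>>>>> open file is the standard's sync-barrier-sync construct; a rough
>>>>> sketch in the C++ bindings, assuming an already-opened MPI::File f
>>>>> and the buffers from my test program.  NFS itself also generally
>>>>> needs to be mounted with attribute caching disabled ("noac") for
>>>>> ROMIO to behave reliably.)
>>>>>
>>>>>     f.Write_all(obuf, nints, MPI_INT, status);
>>>>>     f.Sync();                   // flush this rank's writes to the file
>>>>>     MPI::COMM_WORLD.Barrier();  // order all writes before any reads
>>>>>     f.Sync();                   // pick up data written by other ranks
>>>>>     f.Read_all(ibuf, nints, MPI_INT, status);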
>>>>>
>>>>> So yes, those last od dumps are from a single system, single
>>>>> filesystem.  Specifically a Linux 2.6 kernel with 2 CPUs and
>>>>> a lot of fast disk.
>>>>>
>>>>> I might add that I admin all these systems and have been doing
>>>>> this sort of stuff for 17 years, so any underlying (re)configuration
>>>>> that might help is not out of the question to try.
>>>>>
>>>>> But I don't think that's the problem.
>>>>>
>>>>> Best,
>>>>> Russell
>>>>>
>>>>>> Rajeev
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Russell L. Carter [mailto:rcarter at esturion.net]
>>>>>>> Sent: Thursday, May 03, 2007 9:03 PM
>>>>>>> To: Rob Ross
>>>>>>> Cc: Rajeev Thakur; mpich-discuss at mcs.anl.gov
>>>>>>> Subject: Re: [MPICH] MPI-IO, vector datatype
>>>>>>>
>>>>>>> Hi Rob,
>>>>>>>
>>>>>>> Rob Ross wrote:
>>>>>>>> Hi Russell,
>>>>>>>>
>>>>>>>> The "nblocks(1)" sets that variable to 1, yes? Sorry, C++ 
>>>>>>> isn't my thing.
>>>>>>>
>>>>>>> Well, I mentioned that I tried multiple values for nblocks: 1, 2,
>>>>>>> and 4, for instance.  Adding a command-line argument would only
>>>>>>> increase the line count, and I wanted to keep the code as small
>>>>>>> as possible, and it surely is.
>>>>>>>
>>>>>>> To get the wrong result, set nblocks to 2: nblocks(2).
>>>>>>>
>>>>>>> I'd like to emphasize that I have tried to change nothing about
>>>>>>> the algorithm in the read_all.c program featured on p. 65 of Using
>>>>>>> MPI-2.  Using that algorithm, I can't write a file and then
>>>>>>> read it with the same view.  My C++ code is written to make
>>>>>>> that especially clear.  The C++ code in mpicxx.h is just dead
>>>>>>> simple inline calls to the C API, so it's not a C++ problem.
>>>>>>>
>>>>>>> Maybe I'm wrong (cool, problem solved), and there's a working
>>>>>>> example somewhere?  That would be great.
>>>>>>>
>>>>>>> Best,
>>>>>>> Russell
>>>>>>>
>>>>>>>
>>>>>>>> A vector with a count of 1 is the same as a contig with a count
>>>>>>>> equal to the blocksize of the vector. This would explain what
>>>>>>>> you're seeing. The stride is only used if the count is greater
>>>>>>>> than 1.
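>>>>>>>>
>>>>>>>> For instance (a quick sketch with made-up count/blocklength/stride
>>>>>>>> values, in the C++ bindings since that is what your test uses):
>>>>>>>>
>>>>>>>>     // count == 1: the stride (8) is never used, so this describes
>>>>>>>>     // the same layout as MPI::INT.Create_contiguous(4).
>>>>>>>>     MPI::Datatype v1 = MPI::INT.Create_vector(1, 4, 8);
>>>>>>>>
>>>>>>>>     // count == 2: the stride now matters -- 4 ints, a hole of
>>>>>>>>     // 4 ints, then 4 more ints.
>>>>>>>>     MPI::Datatype v2 = MPI::INT.Create_vector(2, 4, 8);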
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Rob
>>>>>>>>
>>>>>>>> Russell L. Carter wrote:
>>>>>>>>>> It is easy to run on a single machine. With MPD, all you
>>>>>>>>>> need to do is
>>>>>>>>>> % mpd &
>>>>>>>>>> % mpiexec -n 2 a.out
>>>>>>>>> Works great.  No difference between pvfs2 and unix.
>>>>>>>>>
>>>>>>>>>> blocks of 4 ints each because you have defined INTS_PER_BLK=4.
>>>>>>>>> I'm guilty of a transcription error, crap.  Sorry about that,
>>>>>>>>> that's a stupid waste of time.  It should have been INTS_PER_BLK=8.
>>>>>>>>> With INTS_PER_BLK=4, I agree with your values, but the problem
>>>>>>>>> is still there.  I have found what appears to be the cause:
>>>>>>>>> the stride arg in the Create_vector method appears to be
>>>>>>>>> ignored.  It doesn't matter what I set it to, 0 on up to
>>>>>>>>> nprocs*blocksize, the block data for each proc is written
>>>>>>>>> out contiguously.
>>>>>>>>>
>>>>>>>>> If I set the view displacement to be myrank*nints,
>>>>>>>>> the file always looks like this, without any holes, for any
>>>>>>>>> number of blocks and any stride I set (nprocs is 2; negative
>>>>>>>>> values are rank 0, positive are rank 1):
>>>>>>>>>
>>>>>>>>> 0000000           0          -1          -2          -3
>>>>>>>>> 0000020          -4          -5          -6          -7
>>>>>>>>> 0000040          -8          -9         -10         -11
>>>>>>>>> 0000060         -12         -13         -14         -15
>>>>>>>>> 0000100         -16         -17         -18         -19
>>>>>>>>> 0000120         -20         -21         -22         -23
>>>>>>>>> 0000140         -24         -25         -26         -27
>>>>>>>>> 0000160         -28         -29         -30         -31
>>>>>>>>> 0000200           0           1           2           3
>>>>>>>>> 0000220           4           5           6           7
>>>>>>>>> 0000240           8           9          10          11
>>>>>>>>> 0000260          12          13          14          15
>>>>>>>>> 0000300          16          17          18          19
>>>>>>>>> 0000320          20          21          22          23
>>>>>>>>> 0000340          24          25          26          27
>>>>>>>>> 0000360          28          29          30          31
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> If I set the view displacements to
>>>>>>>>> blocksize*sizeof(int)*myrank, the file looks like this,
>>>>>>>>> for any stride (nblocks/proc is 2 here):
>>>>>>>>>
>>>>>>>>> 0000000           0          -1          -2          -3
>>>>>>>>> 0000020          -4          -5          -6          -7
>>>>>>>>> 0000040          -8          -9         -10         -11
>>>>>>>>> 0000060         -12         -13         -14         -15
>>>>>>>>> 0000100           0           1           2           3
>>>>>>>>> 0000120           4           5           6           7
>>>>>>>>> 0000140           8           9          10          11
>>>>>>>>> 0000160          12          13          14          15
>>>>>>>>> 0000200          16          17          18          19
>>>>>>>>> 0000220          20          21          22          23
>>>>>>>>> 0000240          24          25          26          27
>>>>>>>>> 0000260          28          29          30          31
>>>>>>>>>
>>>>>>>>> The further reduced code is appended.  As far as I can tell
>>>>>>>>> it should produce datatypes and views identical to those in the
>>>>>>>>> program on p. 65 of Using MPI-2.  It was my impression that that
>>>>>>>>> program was intended to read interleaved data; maybe it's not?
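>>>>>>>>>
>>>>>>>>> For reference, here is where each block should land under the view
>>>>>>>>> that algorithm sets up (a sketch of the arithmetic, assuming
>>>>>>>>> filetype = vector(nblocks, blocksize, nprocs*blocksize) and
>>>>>>>>> disp = blocksize*sizeof(int)*myrank as in my code below;
>>>>>>>>> block_offset is just an illustrative helper, not from the book):
>>>>>>>>>
>>>>>>>>>     // Byte offset of rank r's b-th block in the file:
>>>>>>>>>     //   offset(r, b) = (r + b * nprocs) * blocksize * sizeof(int)
>>>>>>>>>     // e.g. nprocs = 2, blocksize = 4 ints:
>>>>>>>>>     //   rank 0 -> bytes 0, 32, 64, ...; rank 1 -> bytes 16, 48, 80, ...
>>>>>>>>>     inline MPI::Offset block_offset(int r, int b, int nprocs, int blocksize)
>>>>>>>>>     {
>>>>>>>>>         return MPI::Offset(r + b * nprocs) * blocksize * sizeof(int);
>>>>>>>>>     }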
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Russell
>>>>>>>>>
>>>>>>>>> #include "mpi.h"
>>>>>>>>> #include <iostream>
>>>>>>>>> using namespace std;
>>>>>>>>>
>>>>>>>>> struct tester
>>>>>>>>> {
>>>>>>>>>     tester()
>>>>>>>>>         : myrank(MPI::COMM_WORLD.Get_rank()),
>>>>>>>>>           nprocs(MPI::COMM_WORLD.Get_size()),
>>>>>>>>>           bufsize(FILESIZE/nprocs), nints(bufsize/sizeof(int)),
>>>>>>>>>           nblocks(1), blocksize(nints/nblocks),
>>>>>>>>>           filetype(MPI::INT),
>>>>>>>>>           //fname("pvfs2:/mnt/pvfs/tst/testfile")
>>>>>>>>>           fname("/home/rcarter/mpibin/testfile")
>>>>>>>>>     {
>>>>>>>>>         std::ios::sync_with_stdio(false);
>>>>>>>>>         filetype.Create_vector(nblocks, blocksize, nprocs * blocksize);
>>>>>>>>>         filetype.Commit();
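>>>>>>>>>         // NB: as found later in the thread (see Rajeev's fix above),
>>>>>>>>>         // Create_vector() returns a new datatype rather than modifying
>>>>>>>>>         // filetype in place, so the result of the call above is discarded.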
>>>>>>>>>         obuf = new int[bufsize];
>>>>>>>>>         ibuf = new int[bufsize];
>>>>>>>>>     }
>>>>>>>>>     ~tester() {
>>>>>>>>>         delete[] obuf;
>>>>>>>>>         delete[] ibuf;
>>>>>>>>>     }
>>>>>>>>>     void write()
>>>>>>>>>     {
>>>>>>>>>         for (int i = 0; i < nints; ++i) {
>>>>>>>>>             if (myrank)
>>>>>>>>>                 obuf[i] = i;
>>>>>>>>>             else
>>>>>>>>>                 obuf[i] = -i;
>>>>>>>>>         }
>>>>>>>>>
>>>>>>>>>         MPI::File f = open_set_view(MPI_MODE_CREATE | MPI_MODE_WRONLY);
>>>>>>>>>         f.Write_all(obuf, nints, MPI_INT, status);
>>>>>>>>>         f.Close();
>>>>>>>>>     }
>>>>>>>>>     void read()
>>>>>>>>>     {
>>>>>>>>>         MPI::File f = open_set_view(MPI_MODE_RDONLY);
>>>>>>>>>         f.Read_all(ibuf, nints, MPI_INT, status);
>>>>>>>>>         f.Close();
>>>>>>>>>         for (int i = 0; i < nints; ++i) {
>>>>>>>>>             if (obuf[i] != ibuf[i]) {
>>>>>>>>>                 cerr << "myrank, i, obuf[i], ibuf[i]: " << myrank << " "
>>>>>>>>>                      << i << " " << obuf[i] << " " << ibuf[i] << endl;
>>>>>>>>>             }
>>>>>>>>>         }
>>>>>>>>>     }
>>>>>>>>> private:
>>>>>>>>>     static const int FILESIZE = 256;
>>>>>>>>>     int myrank, nprocs, bufsize, nints, nblocks, blocksize, *obuf, *ibuf;
>>>>>>>>>     MPI::Datatype filetype;
>>>>>>>>>     string fname;
>>>>>>>>>     MPI::Status status;
>>>>>>>>>
>>>>>>>>>     MPI::File open_set_view(int mode)
>>>>>>>>>     {
>>>>>>>>>         MPI::File f = MPI::File::Open(MPI::COMM_WORLD, fname.c_str(),
>>>>>>>>>                                       mode, MPI::INFO_NULL);
>>>>>>>>>         MPI::Offset disp = blocksize * sizeof(int) * myrank;
>>>>>>>>>         f.Set_view(disp, MPI_INT, filetype, "native", MPI_INFO_NULL);
>>>>>>>>>         return f;
>>>>>>>>>     }
>>>>>>>>> };
>>>>>>>>> int main()
>>>>>>>>> {
>>>>>>>>>     cerr << "Starting rwall.\n";
>>>>>>>>>     try {
>>>>>>>>>         MPI::Init();
>>>>>>>>>         tester t;
>>>>>>>>>         t.write();
>>>>>>>>>         MPI::COMM_WORLD.Barrier();
>>>>>>>>>         t.read();
>>>>>>>>>         MPI::Finalize();
>>>>>>>>>     } catch (exception &e) {
>>>>>>>>>         cerr << "\nCaught exception: " << e.what() << endl;
>>>>>>>>>         return -1;
>>>>>>>>>     } catch (MPI::Exception& e) {
>>>>>>>>>         cerr << "\nError:\n" << e.Get_error_string();
>>>>>>>>>         return -2;
>>>>>>>>>     }
>>>>>>>>>     cerr << "rwall end.\n";
>>>>>>>>>     return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>> -- 
>>>>>>> Russell L. Carter
>>>>>>> Esturion, LLC
>>>>>>> 2285 Sandia Drive
>>>>>>> Prescott, Arizona 86301
>>>>>>>
>>>>>>> rcarter at esturion.net
>>>>>>> 928 308-4154
>>>>>>>
>>>>>>>
>>>>> -- 
>>>>> Russell L. Carter
>>>>> Esturion, LLC
>>>>> 2285 Sandia Drive
>>>>> Prescott, Arizona 86301
>>>>>
>>>>> rcarter at esturion.net
>>>>> 928 308-4154
>>>>>
>>>>>
>>>
>>> -- 
>>> Russell L. Carter
>>> Esturion, LLC
>>> 2285 Sandia Drive
>>> Prescott, Arizona 86301
>>>
>>> rcarter at esturion.net
>>> 928 308-4154
>>>
>>>
>>
> 
> 



