[MPICH] MPI-IO, vector datatype
Russell L. Carter
rcarter at esturion.net
Thu May 3 23:35:22 CDT 2007
Rajeev Thakur wrote:
> Can you try writing to /tmp in case /home/rcarter is NFS.
Yes indeed, NFS is problematic, no? It generally fails, as
I discovered today. Judging from the error messages, I probably
need to enforce sync semantics. After running into these problems
I went back to testing with multiple processes on a single
filesystem, using the local Unix fs, or on the multi-node
global PVFS2 filesystems I have.
So yes, those last od dumps are from a single system and a single
filesystem: specifically, a Linux 2.6 kernel box with 2 CPUs and
a lot of fast disk.
I might add that I admin all these systems and have been doing
this sort of stuff for 17 years, so any underlying (re)configuration
that might help is not out of the question to try.
But I don't think that's the problem.
Best,
Russell
> Rajeev
>
>> -----Original Message-----
>> From: Russell L. Carter [mailto:rcarter at esturion.net]
>> Sent: Thursday, May 03, 2007 9:03 PM
>> To: Rob Ross
>> Cc: Rajeev Thakur; mpich-discuss at mcs.anl.gov
>> Subject: Re: [MPICH] MPI-IO, vector datatype
>>
>> Hi Rob,
>>
>> Rob Ross wrote:
>>> Hi Russell,
>>>
>>> The "nblocks(1)" sets that variable to 1, yes? Sorry, C++
>>> isn't my thing.
>>
>> Well, I mentioned that I tried multiple values for nblocks:
>> 1, 2, and 4, for instance. Adding a command-line argument would
>> only increase the lines of code, and I wanted to keep the code
>> as small as possible, which it surely is.
>>
>> To get the wrong result, set nblocks to 2: nblocks(2).
>>
>> I'd like to emphasize that I have tried to change nothing about
>> the algorithm in the read_all.c program featured on p. 65 of Using
>> MPI-2. Using that algorithm, I can't write a file and then
>> read it back with the same view. My C++ code is written to make
>> that especially clear. The C++ code in mpicxx.h is just dead
>> simple inline calls to the C API, so it's not a C++ problem.
>>
>> Maybe I'm wrong (cool, problem solved), and there's a working example
>> somewhere? That would be great.
>>
>> Best,
>> Russell
>>
>>
>>> A vector with a count of 1 is the same as a contig with a
>>> count equal to the blocksize of the vector. This would explain
>>> what you're seeing. The stride is only used if the count is
>>> greater than 1.
>>>
>>> Regards,
>>>
>>> Rob
>>>
>>> Russell L. Carter wrote:
>>>>> It is easy to run on a single machine. With MPD, all you
>>>>> need to do is
>>>>> % mpd &
>>>>> % mpiexec -n 2 a.out
>>>> Works great. No difference between pvfs2 and unix.
>>>>
>>>>> blocks of 4 ints each because you have defined INTS_PER_BLK=4.
>>>> I'm guilty of a transcription error, crap. Sorry about that;
>>>> that's a stupid waste of time. It should have been INTS_PER_BLK=8.
>>>> With INTS_PER_BLK=4, I agree with your values, but the problem
>>>> is still there. I have found what appears to be the problem:
>>>> the stride arg in the Create_vector method appears to be
>>>> ignored. No matter what I set it to, from 0 up to
>>>> nprocs*blocksize, the block data for each proc is written
>>>> out contiguously.
>>>>
>>>> If I set the view displacement to be myrank*nints,
>>>> the file always looks like this, without any holes, for any
>>>> number of blocks and any stride I set (nprocs is 2; negative
>>>> values are rank 0, positive are rank 1):
>>>>
>>>> 0000000 0 -1 -2 -3
>>>> 0000020 -4 -5 -6 -7
>>>> 0000040 -8 -9 -10 -11
>>>> 0000060 -12 -13 -14 -15
>>>> 0000100 -16 -17 -18 -19
>>>> 0000120 -20 -21 -22 -23
>>>> 0000140 -24 -25 -26 -27
>>>> 0000160 -28 -29 -30 -31
>>>> 0000200 0 1 2 3
>>>> 0000220 4 5 6 7
>>>> 0000240 8 9 10 11
>>>> 0000260 12 13 14 15
>>>> 0000300 16 17 18 19
>>>> 0000320 20 21 22 23
>>>> 0000340 24 25 26 27
>>>> 0000360 28 29 30 31
>>>>
>>>>
>>>> If I set the view displacement to
>>>> blocksize*sizeof(int)*myrank, the file looks like this
>>>> for any stride (nblocks per proc is 2 here):
>>>>
>>>> 0000000 0 -1 -2 -3
>>>> 0000020 -4 -5 -6 -7
>>>> 0000040 -8 -9 -10 -11
>>>> 0000060 -12 -13 -14 -15
>>>> 0000100 0 1 2 3
>>>> 0000120 4 5 6 7
>>>> 0000140 8 9 10 11
>>>> 0000160 12 13 14 15
>>>> 0000200 16 17 18 19
>>>> 0000220 20 21 22 23
>>>> 0000240 24 25 26 27
>>>> 0000260 28 29 30 31
>>>>
>>>> The further-reduced code is appended. As far as I can tell
>>>> it should produce datatypes and views identical to those of the
>>>> program on p. 65 of Using MPI-2. It was my impression that that
>>>> program was intended to read interleaved data; maybe it's not?
>>>>
>>>> Thanks,
>>>> Russell
>>>>
>>>> #include "mpi.h"
>>>> #include <iostream>
>>>> using namespace std;
>>>>
>>>> struct tester
>>>> {
>>>> tester()
>>>> : myrank(MPI::COMM_WORLD.Get_rank()),
>>>> nprocs(MPI::COMM_WORLD.Get_size()),
>>>> bufsize(FILESIZE/nprocs), nints(bufsize/sizeof(int)),
>>>> nblocks(1), blocksize(nints/nblocks),
>>>> filetype(MPI::INT),
>>>> //fname("pvfs2:/mnt/pvfs/tst/testfile")
>>>> fname("/home/rcarter/mpibin/testfile")
>>>> {
>>>> std::ios::sync_with_stdio(false);
>>>> filetype.Create_vector(nblocks, blocksize, nprocs * blocksize);
>>>> filetype.Commit();
>>>> obuf = new int[nints];  // nints ints, not bufsize (bytes) ints
>>>> ibuf = new int[nints];
>>>> }
>>>> ~tester() {
>>>> delete[] obuf;
>>>> delete[] ibuf;
>>>> }
>>>> void write()
>>>> {
>>>> for (int i = 0; i < nints; ++i) {
>>>> if (myrank)
>>>> obuf[i] = i;
>>>> else
>>>> obuf[i] = -i;
>>>> }
>>>>
>>>> MPI::File f = open_set_view(MPI_MODE_CREATE | MPI_MODE_WRONLY);
>>>> f.Write_all(obuf, nints, MPI_INT, status);
>>>> f.Close();
>>>> }
>>>> void read()
>>>> {
>>>> MPI::File f = open_set_view(MPI_MODE_RDONLY);
>>>> f.Read_all(ibuf, nints, MPI_INT, status);
>>>> f.Close();
>>>> for (int i = 0; i < nints; ++i) {
>>>> if (obuf[i] != ibuf[i]) {
>>>> cerr << "myrank, i, obuf[i], ibuf[i]: " << myrank << " "
>>>> << i << " " << obuf[i] << " " << ibuf[i] << endl;
>>>> }
>>>> }
>>>> }
>>>> private:
>>>> static const int FILESIZE = 256;
>>>> int myrank, nprocs, bufsize, nints, nblocks, blocksize, *obuf, *ibuf;
>>>> MPI::Datatype filetype;
>>>> string fname;
>>>> MPI::Status status;
>>>>
>>>> MPI::File open_set_view(int mode)
>>>> {
>>>> MPI::File f = MPI::File::Open(MPI::COMM_WORLD, fname.c_str(),
>>>> mode, MPI::INFO_NULL);
>>>> MPI::Offset disp = blocksize * sizeof(int) * myrank;
>>>> f.Set_view(disp, MPI_INT, filetype, "native", MPI_INFO_NULL);
>>>> return f;
>>>> }
>>>> };
>>>> int main()
>>>> {
>>>> cerr << "Starting rwall.\n";
>>>> try {
>>>> MPI::Init();
>>>> tester t;
>>>> t.write();
>>>> MPI::COMM_WORLD.Barrier();
>>>> t.read();
>>>> MPI::Finalize();
>>>> } catch (exception &e) {
>>>> cerr << "\nCaught exception: " << e.what() << endl;
>>>> return -1;
>>>> } catch (MPI::Exception& e) {
>>>> cerr << "\nError:\n" << e.Get_error_string();
>>>> return -2;
>>>> }
>>>> cerr << "rwall end.\n";
>>>> return 0;
>>>> }
>>>>
>>
--
Russell L. Carter
Esturion, LLC
2285 Sandia Drive
Prescott, Arizona 86301
rcarter at esturion.net
928 308-4154