[MPICH] Help With I/O

Matthew Chambers matthew.chambers at vanderbilt.edu
Wed Apr 18 13:24:23 CDT 2007


Oh, good point.  I didn't notice that num_elements_per_rank was a float.
Still, there are better, more transparent ways to do that, like using a
modulus check: "if (num_vector_elements % num_processes == 0) /*
num_elements_per_rank will be an integer */".  But you seem to have ignored
the rest of my post.  I said that it would be useful to see the debug output
from your for loop (I accepted your assertion that the program was OK until
then).  And what is the status on the buf variable being allocated?
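
For example, something along these lines (a minimal sketch with made-up
sizes, not code from your program):

#include <iostream>

int main()
{
    int num_vector_elements = 6;  // stand-in for unencoded_vector.size()
    int num_processes = 3;        // stand-in for MPI::COMM_WORLD.Get_size()

    // Integer division truncates; the modulus tells you whether it was exact.
    int num_elements_per_rank = num_vector_elements / num_processes;
    int remainder = num_vector_elements % num_processes;

    if (remainder == 0)
        std::cout << num_elements_per_rank << " elements per rank, no leftovers\n";
    else
        std::cout << "last rank must absorb " << remainder << " extra element(s)\n";
    return 0;
}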

 

  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Erich Peterson
Sent: Wednesday, April 18, 2007 12:39 PM
To: mpich-discuss at mcs.anl.gov
Subject: RE: [MPICH] Help With I/O

 

Hi, the reasoning behind the if statements on the last process is that I
might need to assign 1 or 2 more elements to the last process if vector_size
/ num_processes does not divide evenly. As far as debugging goes, I have
debugged it up to the for loop, and everything seems fine until then. I
think it has something to do with needing a lock on one or both of the file
accesses. The problem is I have very limited knowledge of MPI-IO, and I was
hoping someone who knows it could easily see what I am doing wrong with the
I/O part.
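
In case it helps show what I mean, this is roughly the split I'm after
(just a sketch with made-up numbers, not the real code):

#include <iostream>

// Block partition of n elements over p ranks; the last rank absorbs
// the leftover elements when n does not divide evenly.
int main()
{
    int n = 7;  // vector size (made up)
    int p = 3;  // number of processes (made up)
    int base = n / p;

    for (int rank = 0; rank < p; ++rank)
    {
        int start = rank * base;
        int end = (rank == p - 1) ? n - 1           // last rank takes the remainder
                                  : start + base - 1;
        std::cout << "rank " << rank << ": elements " << start
                  << " through " << end << "\n";
    }
    return 0;
}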
 
Thanks,
Erich





  _____  

From: matthew.chambers at vanderbilt.edu
To: mpich-discuss at mcs.anl.gov
Subject: Re: [MPICH] Help With I/O
Date: Wed, 18 Apr 2007 09:39:50 -0500

Hi Erich,

 

Frankly, the code leading up to that for loop is a bit scary.  I don't
really know anything about MPI I/O, but I can give you a few tips on your
C++:

- Unless you're using a horribly out-of-date compiler like MSVC 6, you
should use the standard header names <iostream>, <vector>, <ctime>, etc.

- If you are using a horribly out-of-date compiler like MSVC 6, you should
upgrade to the free MSVC++ 2005 Express Edition.

- In this case it's a cosmetic fix, but you should probably pass the
vector<bool> parameter by reference instead of by value.

- You seem to be doing some mind-boggling casting in order to determine if
num_elements_per_rank is too big to fit in an int (but only on your last
process?).  You might get rid of that voodoo by using size_t (usually at
least an unsigned 4-byte integer) for your position indexes instead;
vector<T>::size() returns vector<T>::size_type, which usually boils down to
size_t.  (See the sketch after this list.)
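
For instance (a sketch of the size_t idea; the variable names are made up):

#include <iostream>
#include <vector>

int main()
{
    std::vector<bool> flags(6, false);
    flags[0] = true;
    flags[2] = true;

    // vector<T>::size_type (typically size_t) compares against size()
    // without any casting.
    for (std::vector<bool>::size_type i = 0; i < flags.size(); ++i)
        if (flags[i])
            std::cout << "element " << i << " is set\n";
    return 0;
}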

 

Beyond that, I would need to see some debug output from your for loop.  For
example, what indexes are actually being passed to the I/O calls by each
process?  Does MPI::File::Read_at() allocate memory for the "buf" variable
you pass it?  If not, you haven't allocated any memory for it and that would
lead to a crash before you could say "new char."
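
In case it helps, here is the sort of thing I mean, written as a
replacement for the loop in your RecordRetrieval.cpp (just a sketch; it
assumes Read_at() fills a caller-supplied buffer, and I have not run it):

/* Sketch only: allocate the 20-byte record buffer before the loop,
   assuming Read_at() does not allocate it for you. */
char * buf = new char[20];

for (int i = local_start_position; i <= local_end_position; i++)
{
    if (unencoded_vector[i])
    {
        /* Debug output: which record and offset is this rank touching? */
        cerr << "rank " << my_rank << ": record " << i
             << ", offset " << i * 20 << endl;
        input_file.Read_at(i * 20, buf, 20, MPI_CHAR, input_file_status);
        output_file.Write_shared(buf, 20, MPI_CHAR, output_file_status);
    }
}

delete[] buf;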

 

Good luck,

Matt Chambers

 

  _____  

From: owner-mpich-discuss at mcs.anl.gov
[mailto:owner-mpich-discuss at mcs.anl.gov] On Behalf Of Erich Peterson
Sent: Wednesday, April 18, 2007 3:02 AM
To: mpich-discuss at mcs.anl.gov
Subject: [MPICH] Help With I/O

 

Hi all, I'm trying to write a little routine as part of my graduate
project. What I'm trying to do is pass a vector of bools into the QueryData
method. In that method I split the vector into equal parts among the
processes, have each process open a data file that has 20-byte records (no
newlines), read a record from the file if the corresponding vector element
is true (basically, if vector[0] == true, it means grab the first record of
the file), and lastly output that record into a new file.
 
I have been able to determine that it is failing in the for loop. The error
is:
 
[Student at cluster1 erich_test_area]$ mpiexec -n 3
/mnt/pvfs2/acxiom/erich_test_area/RecordRetrieval
terminate called after throwing an instance of 'MPI::Exception'
terminate called after throwing an instance of 'MPI::Exception'
terminate called after throwing an instance of 'MPI::Exception'
rank 0 in job 1  cluster1_33602   caused collective abort of all ranks
  exit status of rank 0: killed by signal 6 

If someone sees what is wrong, could you please tell me or edit the code?
Thanks!
 
Main.cpp:

#include "RecordRetrieval.h"
#include <vector.h>
int main()
{
   vector<bool> vec;
   vec.push_back(true);
   vec.push_back(false);
   vec.push_back(true);
   vec.push_back(false);
   vec.push_back(true);
   vec.push_back(false);
   RecordRetrieval rec;
   rec.QueryData(vec, "test.dat");
   return 0;
}

RecordRetrieval.cpp:
 
#include "RecordRetrieval.h"
#include "mpi.h"
#include "time.h"
#include "iostream.h"
void RecordRetrieval::QueryData(vector<bool> unencoded_vector, char * filename)
{
    int num_processes;
    int num_vector_elements;
    float num_elements_per_rank;
    int local_start_position;
    int local_end_position;
    char * buf;
    int my_rank;
    MPI::File input_file;
    MPI::Status input_file_status;
    MPI::File output_file;
    MPI::Status output_file_status;
    //MPI::Offset filesize;
    char output_filename[30];
    size_t i;
    struct tm tim;
    time_t now;

    now = time(NULL);
    tim = *(localtime(&now));
    i = strftime(output_filename, 30, "%m_%d_%Y_%H_%M_%S", &tim);

    /* Let the system do what it needs to start up MPI */
    MPI::Init();
    /* Get my process rank */
    my_rank = MPI::COMM_WORLD.Get_rank();
    /* Find out how many processes are being used */
    num_processes = MPI::COMM_WORLD.Get_size();
    num_vector_elements = unencoded_vector.size();

    num_elements_per_rank = num_vector_elements / num_processes;
    local_start_position = my_rank * (int)num_elements_per_rank;
    if (my_rank == num_processes - 1)
    {
        if (num_elements_per_rank * num_processes ==
            (int)num_elements_per_rank * num_processes)
        {
            local_end_position = local_start_position +
                ((int)num_elements_per_rank - 1);
        }
        else
        {
            local_end_position = (local_start_position +
                    (int)num_elements_per_rank - 1) +
                (((int)num_elements_per_rank * num_processes) -
                 ((int)num_elements_per_rank * num_processes));
        }
    }
    else
    {
        local_end_position = local_start_position +
            ((int)num_elements_per_rank - 1);
    }

    input_file = MPI::File::Open(MPI::COMM_WORLD, filename,
                                 MPI::MODE_RDONLY, MPI::INFO_NULL);
    output_file = MPI::File::Open(MPI::COMM_WORLD, output_filename,
                                  MPI::MODE_CREATE | MPI::MODE_WRONLY,
                                  MPI::INFO_NULL);
    // filesize = input_file.Get_size();

    for (int i = local_start_position; i < local_end_position + 1; i++)
    {
        if (unencoded_vector[i])
        {
            input_file.Read_at(i * 20, buf, 20, MPI_CHAR, input_file_status);
            output_file.Write_shared(buf, 20, MPI_CHAR, output_file_status);
        }
    }
    cout << "Error";

    input_file.Close();
    output_file.Close();
    MPI::Finalize();
}

