[MPICH] slow IOR when using fileview

Wei-keng Liao wkliao at ece.northwestern.edu
Sat Jun 30 02:02:45 CDT 2007


I am experiencing slow IOR performance on a Cray XT3 when using the 
fileview option. I extracted the code into a simpler version (attached). 
The code compares two collective writes: MPI_File_write_all and 
MPI_File_write_at_all. The former uses an MPI fileview and the latter 
uses an explicit file offset. In both cases, each process writes 10 MB 
to a shared file, contiguously, non-overlapping, and non-interleaved. 
On the Cray XT3 with the Lustre file system, the former is dramatically 
slower than the latter. Here is the output when using 8 processes:

2: MPI_File_write_all() time = 4.72 sec
3: MPI_File_write_all() time = 4.74 sec
6: MPI_File_write_all() time = 4.77 sec
1: MPI_File_write_all() time = 4.79 sec
7: MPI_File_write_all() time = 4.81 sec
0: MPI_File_write_all() time = 4.83 sec
5: MPI_File_write_all() time = 4.85 sec
4: MPI_File_write_all() time = 4.89 sec
2: MPI_File_write_at_all() time = 0.02 sec
1: MPI_File_write_at_all() time = 0.02 sec
3: MPI_File_write_at_all() time = 0.02 sec
0: MPI_File_write_at_all() time = 0.02 sec
6: MPI_File_write_at_all() time = 0.02 sec
4: MPI_File_write_at_all() time = 0.02 sec
7: MPI_File_write_at_all() time = 0.02 sec
5: MPI_File_write_at_all() time = 0.02 sec

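The two cases boil down to the following (condensed from the attached 
code, with the same buffer, count, and non-interleaved layout in both; 
only the way each rank's file region is described differs):

/* case 1: describe each rank's 10 MB block with a subarray fileview,
 * then call the collective write relative to that view */
MPI_Type_create_subarray(2, globalSizes, localSizes, startIndices,
                         MPI_ORDER_C, MPI_BYTE, &fileType);
MPI_Type_commit(&fileType);
MPI_File_set_view(fh, 0, MPI_BYTE, fileType, "native", MPI_INFO_NULL);
MPI_File_write_all(fh, buf, LEN, MPI_BYTE, &status);

/* case 2: keep the default fileview and give the byte offset explicitly */
MPI_File_write_at_all(fh, (MPI_Offset)LEN*rank, buf, LEN, MPI_BYTE, &status);
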
I tried the same code on other machines and different file systems 
(e.g., PVFS), and the timings for the two cases were very close to each 
other. If anyone has access to a Cray XT3 machine, could you please try 
it and let me know? Thanks.

Wei-keng
-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define LEN 10485760

/*----< main() >------------------------------------------------------------*/
int main(int argc, char **argv) {
    int          i, rank, np, globalSizes[2], localSizes[2], startIndices[2];
    char        *buf, filename[1024];  /* buf is allocated on the heap below */
    double       timing;
    MPI_Offset   offset;
    MPI_File     fh;
    MPI_Datatype fileType;
    MPI_Status   status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    if (argc != 2) {
        fprintf(stderr,"Usage: %s filename\n",argv[0]);
        MPI_Finalize();
        exit(1);
    }
    /* allocate the 10 MB write buffer on the heap (a stack array of this
       size can exceed the default stack limit on some systems) */
    buf = (char*) malloc(LEN);
    for (i=0; i<LEN; i++) buf[i] = '0'+rank;

    /* MPI collective write with file setview -----------------------------*/
    sprintf(filename, "%s.view",argv[1]);
    MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

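    /* view the file as an np x LEN array of bytes; each rank writes row
       "rank", so the blocks are contiguous and non-interleaved */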
     globalSizes[0] = np;    globalSizes[1] = LEN;
      localSizes[0] = 1;      localSizes[1] = LEN;
    startIndices[0] = rank; startIndices[1] = 0;

    MPI_Type_create_subarray(2, globalSizes, localSizes, startIndices,
                             MPI_ORDER_C, MPI_BYTE, &fileType);
    MPI_Type_commit(&fileType);

    MPI_File_set_view(fh, 0, MPI_BYTE, fileType, "native", MPI_INFO_NULL);

    MPI_Barrier(MPI_COMM_WORLD);
    timing = MPI_Wtime();
    MPI_File_write_all(fh, buf, LEN, MPI_BYTE, &status);
    timing = MPI_Wtime() - timing;
    printf("%d: MPI_File_write_all() time = %.2f sec\n",rank,timing);

    MPI_File_close(&fh);
    MPI_Type_free(&fileType);

    /* MPI collective write without file setview --------------------------*/
    sprintf(filename, "%s.noview",argv[1]);

    MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    offset = (MPI_Offset)LEN * rank;  /* cast avoids 32-bit overflow for large np */

    MPI_Barrier(MPI_COMM_WORLD);
    timing = MPI_Wtime();
    MPI_File_write_at_all(fh, offset, buf, LEN, MPI_BYTE, &status);
    timing = MPI_Wtime() - timing;
    printf("%d: MPI_File_write_at_all() time = %.2f sec\n",rank,timing);

    MPI_File_close(&fh);

    free(buf);
    MPI_Finalize();
    return 0;
}
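
To build and run the attached test, compile it with the MPI C compiler 
wrapper (e.g. mpicc, or typically cc on the XT3) and launch it with 8 
processes (e.g. mpiexec -n 8, or yod/aprun on the XT3), passing an 
output path on the Lustre file system as the only argument. With 8 
processes it writes two 80 MB files, <filename>.view and 
<filename>.noview.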
