[MPICH] slow IOR when using fileview
Wei-keng Liao
wkliao at ece.northwestern.edu
Sat Jun 30 02:02:45 CDT 2007
I am experiencing slow IOR performance on a Cray XT3 when using the
fileview option. I have extracted the code into a simpler test program
(attached). The code compares two collective writes: MPI_File_write_all
and MPI_File_write_at_all. The former uses an MPI fileview and the latter
uses an explicit file offset. In both cases, each process writes 10 MB to
a shared file, contiguously, non-overlapping, and non-interleaved. On the
Cray XT3 with a Lustre file system, the former is dramatically slower
than the latter. Here is the output of a run with 8 processes:
2: MPI_File_write_all() time = 4.72 sec
3: MPI_File_write_all() time = 4.74 sec
6: MPI_File_write_all() time = 4.77 sec
1: MPI_File_write_all() time = 4.79 sec
7: MPI_File_write_all() time = 4.81 sec
0: MPI_File_write_all() time = 4.83 sec
5: MPI_File_write_all() time = 4.85 sec
4: MPI_File_write_all() time = 4.89 sec
2: MPI_File_write_at_all() time = 0.02 sec
1: MPI_File_write_at_all() time = 0.02 sec
3: MPI_File_write_at_all() time = 0.02 sec
0: MPI_File_write_at_all() time = 0.02 sec
6: MPI_File_write_at_all() time = 0.02 sec
4: MPI_File_write_at_all() time = 0.02 sec
7: MPI_File_write_at_all() time = 0.02 sec
5: MPI_File_write_at_all() time = 0.02 sec
I tried the same code on other machines and different file systems (e.g.,
PVFS), and the timings for the two cases were very close to each other.
If anyone has access to a Cray XT3 machine, could you please try it and
let me know?
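In case it helps narrow things down, an untested variant of the fileview
case would be to describe the same per-rank region with an explicit byte
displacement and a plain MPI_BYTE filetype, instead of the subarray type
(LEN, rank, fh, buf, and status as in the attached program):

    /* untested sketch: same contiguous, non-interleaved layout as the
     * subarray fileview, expressed as a byte displacement plus a
     * contiguous MPI_BYTE filetype */
    MPI_Offset disp = (MPI_Offset)LEN * rank;
    MPI_File_set_view(fh, disp, MPI_BYTE, MPI_BYTE, "native", MPI_INFO_NULL);
    MPI_File_write_all(fh, buf, LEN, MPI_BYTE, &status);

If that variant runs at the MPI_File_write_at_all speed, the slowdown is
presumably tied to the handling of the derived filetype rather than to
the collective write itself.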
Thanks.
Wei-keng
-------------- next part --------------
#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

#define LEN 10485760   /* 10 MB written by each process */

/*----< main() >------------------------------------------------------------*/
int main(int argc, char **argv) {
    int i, rank, np, globalSizes[2], localSizes[2], startIndices[2];
    char *buf, filename[1024];
    double timing;
    MPI_Offset offset;
    MPI_File fh;
    MPI_Datatype fileType;
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &np);

    if (argc != 2) {
        fprintf(stderr, "Usage: %s filename\n", argv[0]);
        MPI_Finalize();
        exit(1);
    }

    /* allocate the 10 MB write buffer on the heap (too large for the
       default stack on many systems) and fill it with a rank-specific
       character */
    buf = (char*) malloc(LEN);
    for (i=0; i<LEN; i++) buf[i] = '0' + rank;

    /* MPI collective write with file set view -----------------------------*/
    sprintf(filename, "%s.view", argv[1]);
    MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* 2D subarray filetype: np rows of LEN bytes, each rank owns one row */
    globalSizes[0]  = np;   globalSizes[1]  = LEN;
    localSizes[0]   = 1;    localSizes[1]   = LEN;
    startIndices[0] = rank; startIndices[1] = 0;
    MPI_Type_create_subarray(2, globalSizes, localSizes, startIndices,
                             MPI_ORDER_C, MPI_BYTE, &fileType);
    MPI_Type_commit(&fileType);
    MPI_File_set_view(fh, 0, MPI_BYTE, fileType, "native", MPI_INFO_NULL);

    MPI_Barrier(MPI_COMM_WORLD);
    timing = MPI_Wtime();
    MPI_File_write_all(fh, buf, LEN, MPI_BYTE, &status);
    timing = MPI_Wtime() - timing;
    printf("%d: MPI_File_write_all() time = %.2f sec\n", rank, timing);

    MPI_File_close(&fh);
    MPI_Type_free(&fileType);

    /* MPI collective write without file set view --------------------------*/
    sprintf(filename, "%s.noview", argv[1]);
    MPI_File_open(MPI_COMM_WORLD, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY,
                  MPI_INFO_NULL, &fh);

    /* same layout as above, expressed as an explicit byte offset */
    offset = (MPI_Offset)LEN * rank;

    MPI_Barrier(MPI_COMM_WORLD);
    timing = MPI_Wtime();
    MPI_File_write_at_all(fh, offset, buf, LEN, MPI_BYTE, &status);
    timing = MPI_Wtime() - timing;
    printf("%d: MPI_File_write_at_all() time = %.2f sec\n", rank, timing);

    MPI_File_close(&fh);
    free(buf);
    MPI_Finalize();
    return 0;
}
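A note for anyone trying to reproduce this: the program above does not
check MPI return codes, and MPI-IO file handles default to
MPI_ERRORS_RETURN, so a failed open or write would be silent. One
optional addition (not part of the original test) is to make MPI-IO
errors fatal right after MPI_Init:

    /* optional: abort on any MPI-IO error instead of silently returning
     * an error code; applies to files opened afterwards */
    MPI_File_set_errhandler(MPI_FILE_NULL, MPI_ERRORS_ARE_FATAL);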