[mpich-discuss] adio lustre patch: calculation of "avail_cb_nodes"
pascal.deveze at bull.net
Thu Jan 14 08:22:40 CST 2010
Hi,
This mail reports what I found when investigating how "avail_cb_nodes" is
calculated in the subroutine ADIOI_LUSTRE_Get_striping_info() in the file
ad_lustre_aggregate.c.
The goal of this calculation is to make the best choice for the number of
processes that will write to the common file during the "two phase IO write".
For Lustre, to avoid extent lock conflicts, each OST must always be accessed
by the same fixed set of one or more clients, and each client must access
only one OST.
If that last condition cannot be met, then each client must access the
minimum number of OSTs.
The parameter "nprocs_for_coll" is the number of processes that might write,
so "avail_cb_nodes" must be less than or equal to that number.
The parameter "stripe_count" gives the number of OSTs that store the common
file.
The parameter "CO" gives the maximum number of clients each OST is allowed
to serve (by default it is set to 1).
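
The two conditions above can be put into numbers with a minimal sketch. It
ASSUMES a round-robin layout in which stripe i is stored on OST
(i % stripe_count) and is handled by aggregator (i % avail_cb_nodes); that
mapping and the helper names osts_per_client() and clients_per_ost() are
illustrative assumptions, not taken from ad_lustre_aggregate.c or from the
attached patch.

```c
#include <assert.h>

/* Greatest common divisor (Euclid); both arguments must be positive. */
static int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical helper: number of distinct OSTs that one aggregator touches.
 * ASSUMPTION: stripe i lives on OST (i % stripe_count) and is handled by
 * aggregator (i % avail_cb_nodes). */
static int osts_per_client(int stripe_count, int avail_cb_nodes)
{
    assert(stripe_count > 0 && avail_cb_nodes > 0);
    return stripe_count / gcd(stripe_count, avail_cb_nodes);
}

/* Hypothetical helper: number of distinct aggregators that touch one OST,
 * under the same round-robin assumption. This is the quantity that "CO"
 * is meant to bound. */
static int clients_per_ost(int stripe_count, int avail_cb_nodes)
{
    assert(stripe_count > 0 && avail_cb_nodes > 0);
    return avail_cb_nodes / gcd(stripe_count, avail_cb_nodes);
}
```

Under that assumption, osts_per_client(18, 3) is 6 while osts_per_client(18, 6)
is 3, and clients_per_ost(15, 4) is 4, i.e. every one of the 4 aggregators
hits each OST.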
I ran tests with different values of "nprocs_for_coll", "stripe_count"
(hint called striping_factor) and "CO" (hint called romio_lustre_co_ratio).
I got strange values of "avail_cb_nodes", e.g.:
- With stripe_count=18, nprocs_for_coll=6, CO=1 the calculation gives
  avail_cb_nodes=3.
  The value 6 would be better.
- With stripe_count=15, nprocs_for_coll=4, CO=1 the calculation gives
  avail_cb_nodes=4.
  The right value is 3 (4 is bad because an OST will be accessed by all
  processes).
- With stripe_count=28, nprocs_for_coll=57, CO=3 the calculation gives
  avail_cb_nodes=42.
  The right value is 56 (42 is bad because a client will access 2
  different OSTs).
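
The "right" values in the three cases above can be reproduced by a simple
brute-force search. This is only a sketch and NOT the algorithm of the
attached patch: the function name pick_avail_cb_nodes() is hypothetical, the
round-robin mapping (stripe i on OST i % stripe_count, handled by aggregator
i % n) is an assumption, and so is the tie-break that prefers the largest
count among equally good candidates.

```c
#include <assert.h>

/* Greatest common divisor (Euclid); both arguments must be positive. */
static int gcd(int a, int b)
{
    while (b != 0) {
        int t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Hypothetical brute-force selection of the number of aggregators.
 * Among all n in 1..nprocs_for_coll for which no OST is served by more
 * than co aggregators, keep the n whose aggregators each touch the fewest
 * OSTs; on a tie, keep the largest n (ASSUMED tie-break). */
static int pick_avail_cb_nodes(int stripe_count, int nprocs_for_coll, int co)
{
    int n;
    int best = 1;                  /* n = 1 always satisfies co >= 1 */
    int best_osts = stripe_count;  /* a single aggregator touches every OST */

    assert(stripe_count > 0 && nprocs_for_coll > 0 && co > 0);

    for (n = 1; n <= nprocs_for_coll; n++) {
        int g = gcd(stripe_count, n);
        int clients_per_ost = n / g;
        int osts_per_client = stripe_count / g;

        if (clients_per_ost > co)
            continue;              /* an OST would serve too many clients */
        if (osts_per_client <= best_osts) {
            best = n;              /* "<=" lets a later, larger n win ties */
            best_osts = osts_per_client;
        }
    }
    return best;
}
```

With these assumptions, pick_avail_cb_nodes(18, 6, 1) yields 6,
pick_avail_cb_nodes(15, 4, 1) yields 3 and pick_avail_cb_nodes(28, 57, 3)
yields 56, matching the values argued for above.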
I propose a new algorithm to calculate "avail_cb_nodes" in the first
attached file.
I also attach a small program, "avail_cb_nodes.c", that you can use to see
where the differences are:
"avail_cb_nodes 18 6 1" will give the result for stripe_count=18,
nprocs_for_coll=6, CO=1.
If one of the parameters is 0, "avail_cb_nodes" enters a loop and displays
many combinations.
You will see a lot of differences.
I hope this will help.
I also see one issue concerning when "avail_cb_nodes" is calculated. Today,
ADIOI_LUSTRE_Get_striping_info() is called for each "two phase IO write",
and it will also be called for each "two phase IO read" once that becomes
available.
I think this could be done only once (or perhaps again each time the
parameter "romio_lustre_co_ratio" is changed). "avail_cb_nodes" could be a
field in the struct ADIOI_hints_struct (below the co_ratio field) or in the
struct ADIOI_FileD.
I do not see any reason to redo that calculation each time. Am I missing
something?
Best regards,
Pascal
(See attached file: patch-for-avail_cb_nodes.txt)
(See attached file: avail_cb_nodes.c)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: patch-for-avail_cb_nodes.txt
Type: application/octet-stream
Size: 4318 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100114/c9c380b7/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: avail_cb_nodes.c
Type: application/octet-stream
Size: 4237 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/mpich-discuss/attachments/20100114/c9c380b7/attachment-0001.obj>