[Fwd: Re: [Swift-devel] Re: swift-falkon problem... plots to explain plateaus...]
Ioan Raicu
iraicu at cs.uchicago.edu
Tue Apr 1 10:43:16 CDT 2008
Mihael Hategan wrote:
> On Tue, 2008-04-01 at 10:26 -0500, Ioan Raicu wrote:
>
>> Michael Wilde wrote:
>>
>>> We're only working on the BG/P system, and GPFS is the only shared
>>> filesystem there.
>>>
>> There is PVFS, but that performed even worse in our tests.
>>
>>> GPFS access, however, remains a big scalability issue. Frequent small
>>> accesses to GPFS in our measurements really slow down the workflow. We
>>> did a lot of micro-benchmark tests.
>>>
>> Yes! The BG/P's GPFS probably performs the worst out of all GPFSes I
>> have worked on, in terms of small granular accesses. For example,
>> reading 1 byte files, invoking a trivial script (i.e. exit 0), etc...
>> all perform extremely poorly, to the point that we need to move away from
>> GPFS almost completely. For example, the things that we eventually need
>> to avoid on GPFS for the BG/P are:
>> invoking wrapper.sh
>> mkdir
>> any logging to GPFS
>>
>
> Doing nothing can be incredibly fast.
>
What I meant is that we need to move these operations to the local file
system, i.e. RAM. We have run applications on the BG/P via Falkon only,
and implemented a caching strategy that caches all scripts, binaries, and
input data in RAM. Once the task execution (all from RAM) completes and
has written its output to RAM, there is a single copy operation of the
output data from RAM to GPFS. We control how frequently this copy
operation occurs, so we can scale essentially linearly with this
approach. The hope is that we can eventually work this kind of
functionality into wrapper.sh, or into Swift itself. So, in reply to
your statement: we would like to preserve the functionality of
wrapper.sh, but move as much of it as possible from the shared file
system to the local RAM disk.
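
To make the strategy above concrete, here is a minimal sketch in Python.
It is an illustration only, not Falkon's or Swift's actual code: the
class RamStager, its methods fetch/run/flush, the flush_every parameter,
and the "output" subdirectory are all hypothetical names chosen for the
example. It assumes the shared directory stands in for GPFS and the RAM
directory for something like /dev/shm.

```python
import shutil
from pathlib import Path


class RamStager:
    """Cache inputs on a node-local RAM disk and batch output copies.

    Hypothetical sketch; names and layout are assumptions, not Falkon's API.
    """

    def __init__(self, shared_dir, ram_dir, flush_every=4):
        self.shared = Path(shared_dir)      # stands in for a GPFS directory
        self.ram = Path(ram_dir)            # stands in for /dev/shm or /tmp
        self.cache = self.ram / "cache"     # scripts, binaries, input data
        self.out = self.ram / "out"         # task outputs, buffered in RAM
        self.flush_every = flush_every      # tasks per copy-back to shared FS
        self.pending = 0
        self.cache.mkdir(parents=True, exist_ok=True)
        self.out.mkdir(parents=True, exist_ok=True)

    def fetch(self, name):
        """Return a RAM-local copy of shared/name, copying it at most once."""
        local = self.cache / name
        if not local.exists():
            shutil.copy2(self.shared / name, local)
        return local

    def run(self, task, input_names):
        """Run task(inputs, out_dir) entirely against RAM-local paths."""
        inputs = [self.fetch(n) for n in input_names]
        task(inputs, self.out)
        self.pending += 1
        if self.pending >= self.flush_every:
            self.flush()

    def flush(self):
        """Move all buffered outputs to the shared file system in one pass."""
        dest = self.shared / "output"
        dest.mkdir(exist_ok=True)
        for f in sorted(self.out.iterdir()):
            shutil.move(str(f), str(dest / f.name))
        self.pending = 0
```

The point of flush_every is the same knob described above: the shared
file system is touched once per input (on first fetch) and once per
batch of outputs, instead of several times per task.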
Ioan
>
>> There are probably others.
>>
>>> Zhao, can you gather a set of these tests into a small suite and post
>>> numbers so the Swift developers can get an understanding of the
>>> system's GPFS access performance?
>>>
>>> Also note: the only local filesystem is RAM disk on /tmp or /dev/shm.
>>> (Ioan and Zhao should confirm if they verified that /tmp is on RAM).
>>>
>> Yes, there are no local disks on either BG/P or SiCortex. Both machines
>> have /tmp and /dev/shm mounted as ram disks.
>>
>> Ioan
>>
>>> - Mike
>>>
>>> On 4/1/08 5:05 AM, Ben Clifford wrote:
>>>
>>>> On Tue, 1 Apr 2008, Ben Clifford wrote:
>>>>
>>>>
>>>>>> With this fixed, the total time in wrapper.sh including the app is
>>>>>> now about 15 seconds, with 3 being in the app-wrapper itself. The
>>>>>> time seems about evenly spread over the several wrapper.sh
>>>>>> operations, which is not surprising when 500 wrappers hit NFS all
>>>>>> at once.
>>>>>>
>>>>> Does this machine have a higher (/different) performance shared file
>>>>> system such as PVFS or GPFS? We spent some time in November laying
>>>>> out the filesystem to be sympathetic to GPFS to help avoid
>>>>> bottlenecks like you are seeing here. It would be kinda sad if
>>>>> either it isn't available or you aren't using it even though it is
>>>>> available.
>>>>>
>>>>> From what I can tell from the web, PVFS and/or GPFS are available on
>>>>> all
>>>>>
>>>> of the Argonne Blue Gene machines. Is this true? I don't want to
>>>> provide more scalability support for NFS-on-bluegene if it is.
>>>>
>>>>
>
>
>
--
===================================================
Ioan Raicu
Ph.D. Candidate
===================================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
===================================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dev.globus.org/wiki/Incubator/Falkon
http://dsl-wiki.cs.uchicago.edu/index.php/Main_Page
===================================================
===================================================