[Swift-devel] feature request

Michael Wilde wilde at mcs.anl.gov
Tue Apr 21 11:15:07 CDT 2009


[meant to cc this to swift-devel]

Two separate questions here:

1) what to do:

Test coasters on a set of agreed upon (and growing list of) sites:

Start with these:

localhost:
TG: teraport, ucanl, mercury, sdsc, abe, queenbee, ranger
OSG: red.unl.edu, some wisc.edu site (pick 3 for now)
Local: HNL cluster (gwynn)

That's a good list to start with for 0.9.  We can easily grow it once you
get that far.

If I missed a high-priority target, please let me know.

Others in the wings: Jazz, MCS kBT cluster, TG bigred and purdue condor;
many more OSG sites.
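
To work through a hand-picked subset of this list rather than everything in
coaster/, something like the sketch below might help. It assumes one site
definition file per target and that ./run-site exits non-zero when the site
test fails (I have not verified that). RUN_SITE is a hypothetical override
used only in this sketch, and the file names in the usage comment are made up.

```shell
#!/bin/sh
# Sketch: run a chosen set of site definitions and tally the results.
# ASSUMPTION: run-site returns a non-zero exit status on failure.
# RUN_SITE is a hypothetical hook so a different runner can be substituted.
RUN_SITE=${RUN_SITE:-./run-site}

run_sites() {
    passed=""
    failed=""
    for site in "$@"; do
        # Discard the test output; only the exit status is used here.
        if "$RUN_SITE" "$site" >/dev/null 2>&1; then
            passed="$passed $site"
        else
            failed="$failed $site"
        fi
    done
    echo "worked:$passed"
    echo "failed:$failed"
}

# Usage (hypothetical file names), from tests/sites:
#   sh this-script coaster/coaster-local.xml coaster/some-other-site.xml
if [ $# -gt 0 ]; then
    run_sites "$@"
fi
```

Each site runs in turn, so one slow site delays the rest.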


2) what failed

I haven't looked at your log yet, but be aware that you need a proxy to run
coasters even on localhost (for secure messaging with GSI).

Best to start with localhost testing on communicado, not surveyor.
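
As a rough sketch of the proxy point: a pre-flight check before starting a
coaster test could look like the following. It assumes the Globus client
tools (grid-proxy-info, grid-proxy-init) are on the PATH. PROXY_INFO is a
hypothetical override added only for this sketch, not something Swift reads.

```shell
#!/bin/sh
# Sketch: report whether a GSI proxy is available before running a site test.
# PROXY_INFO is a hypothetical hook; by default it names grid-proxy-info.
PROXY_INFO=${PROXY_INFO:-grid-proxy-info}

proxy_status() {
    if ! command -v "$PROXY_INFO" >/dev/null 2>&1; then
        # The proxy tool itself is not installed or not on the PATH.
        echo "missing-tool"
    elif "$PROXY_INFO" -exists >/dev/null 2>&1; then
        # grid-proxy-info -exists returns 0 only when a valid proxy exists.
        echo "ok"
    else
        echo "no-proxy"
    fi
}

case $(proxy_status) in
    ok)           echo "Proxy found; go ahead with ./run-site" ;;
    no-proxy)     echo "No valid proxy; run grid-proxy-init first" ;;
    missing-tool) echo "$PROXY_INFO is not on the PATH" ;;
esac
```

This only checks that a proxy exists, not how long it remains valid.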


On 4/21/09 10:54 AM, Zhao Zhang wrote:
> Dear All
> 
> I am trying to run Swift on the local site. I checked out the latest Swift 
> code and built it. I started the test as shown below, and it failed.
> I am not clear about my test goals here: am I making sure coasters work 
> on all the sites we have, or am I testing that the existing
> coaster can be used by any user?
> 
> zhao
> 
> zzhang at login6.surveyor:/home/falkon/swift_coaster/cog/modules/swift/tests/sites> 
> ./run-site coaster/coaster-local.xml
> testing site configuration: coaster/coaster-local.xml
> Removing files from previous runs
> Running test 061-cattwo at Tue Apr 21 10:51:21 CDT 2009
> Swift svn swift-r2865 cog-r2388
> 
> RunID: 20090421-1051-er55uyr8
> Progress:
> Multiple entries found for cat on localhost. Using the first one
> Progress:  Submitted:1
> Failed to transfer wrapper log from 
> 061-cattwo-20090421-1051-er55uyr8/info/i on localhost
> Progress:  Submitted:1
> Failed to transfer wrapper log from 
> 061-cattwo-20090421-1051-er55uyr8/info/k on localhost
> Progress:  Submitted:1
> Failed to transfer wrapper log from 
> 061-cattwo-20090421-1051-er55uyr8/info/m on localhost
> Execution failed:
>        Exception in cat:
> Arguments: [061-cattwo.1.in, 061-cattwo.2.in]
> Host: localhost
> Directory: 061-cattwo-20090421-1051-er55uyr8/jobs/m/cat-mt8ngp9j
> stderr.txt:
> 
> stdout.txt:
> 
> ----
> 
> Caused by:
>        Could not submit job
> Caused by:
>        Could not start coaster service
> Caused by:
>        Task ended before registration was received.
> STDOUT:
> STDERR: which: no gmd5sum in 
> (/home/falkon/swift_coaster/cog/modules/swift/dist/swift-svn/bin:/home/zzhang/ruby-1.8.7-p72/bin/bin:/home/zzhang/chirp/bin:/home/zzhang/gridftp/bin:/home/zzhang/gridftp/sbin:/home/zzhang/xar/bin:/home/falkon/swift_scratch/cog/modules/swift/dist/swift-svn/bin:/home/falkon/falkon/bin:/home/falkon/falkon/service:/home/falkon/falkon/worker:/home/falkon/falkon/client:/home/falkon/falkon/monitor:/home/falkon/falkon/webserver:/home/falkon/falkon/ploticus/src:/home/falkon/falkon/apache-ant-1.7.0:/home/falkon/falkon/apache-ant-1.7.0/bin:/usr/lib/jvm/java:/usr/lib/jvm/java/bin:/home/falkon/falkon/container:/home/falkon/falkon/container/bin:/bin:/usr/sbin:/etc:/usr/X11R6/bin:/usr/bin:/sbin:/usr/local/bin:/bgsys/drivers/ppcfloor/bin:/bgsys/drivers/ppcfloor/comm/bin:/dbhome/bgpdb2c/sqllib/lib:/opt/ibmcmp/vac/bg/9.0/bin:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:/software/common/apps/mpiscripts:/software/common/apps/projects-list/bin:/software/common/adm/softenv/bin:/home/zzhang/bin/linux-sles10-ppc64:/home/zzhang/bin:.:/software/common/apps/misc-scripts:/bgsys/drivers/ppcfloor/bin:/usr/lib/mit/bin:/usr/lib/mit/sbin)
> 
> 
> 
> Caused by:
>        Job failed with an exit code of 1
> Cleaning up...
> Done
> SWIFT RETURN CODE NON-ZERO - test 061-cattwo
> 
> 
> Michael Wilde wrote:
>> Zhao, based on prior discussion of coaster testing on this list, we 
>> all agree it's high priority.
>>
>> Can you set aside everything else and focus on it, and let us know at 
>> the end of the day if you've been able to run the existing site tests 
>> as a starting point?
>>
>> Then you need to test running coasters manually, then (I assume, Ben 
>> and Mihael) create additional site tests that exercise coasters?
>>
>> Mihael, you should provide a list of coaster-specific aspects to test.
>> The job-time management aspects come to mind, as do coaster cleanup 
>> and termination.
>>
>>
>> On 4/21/09 3:15 AM, Ben Clifford wrote:
>>> At the weekend, I put out a release candidate for Swift 0.9 with a 
>>> 7-day test period.
>>>
>>> Performing coaster testing on that release candidate using the 
>>> existing Swift site tests, and with additional sites, is something 
>>> that would be useful to the release process and has a clearly defined 
>>> short timescale - it is something that should happen before the weekend.
>>>
>>> On Thu, 16 Apr 2009, Ben Clifford wrote:
>>>
>>>> On Wed, 15 Apr 2009, Michael Wilde wrote:
>>>>
>>>>> Zhao, based on Ben's suggestion and our earlier discussion, can you 
>>>>> locate and
>>>>> try the Swift test suite, and then look in detail at, and try, the 
>>>>> per-site
>>>>> suite.
>>>>>
>>>>> I'm looking for an assessment of what it takes to create a branch of 
>>>>> the suite
>>>>> to test coasters in the same manner that the sites are tested.
>>>> Put swift on your path and get a proxy.
>>>>
>>>> Then:
>>>>  cd tests/sites
>>>>  ./run-all coaster/
>>>>
>>>> This will start running the tests in the coaster/ subdirectory.
>>>>
>>>> Each of the files in there is a site definition. One site test is 
>>>> run for each of those. When/if all of them have exited, the list of 
>>>> sites that worked and the list of sites that did not work is output.
>>>>
>>>> If you want to run a single site, say ./run-site 
>>>> coaster/coaster-local.xml
>>>> (for example)
>>>>
>>>> To add new sites, put a file in the coaster/ subdirectory 
>>>> containing an appropriate site definition.
>>>>
>>>> In order to make site tests that many people can run, I usually make 
>>>> a remote work directory and chmod a+rwxt on that remote work 
>>>> directory so no matter who runs, Swift will not encounter permission 
>>>> problems.
>>>>
>>>>> And what it takes to run that on a regular basis to find problems 
>>>>> in sites and
>>>>> Swift *before* our users find them.
>>>> The main problem with the site tests is that they generally need 
>>>> credentials, and it's unclear what the right way to handle long term 
>>>> testing credentials is.
>>>>
>>>>
>>
> 



