[Swift-devel] Re: swift-falkon problem

Zhao Zhang zhaozhang at uchicago.edu
Tue Mar 18 12:00:26 CDT 2008


Hi, Mike

Are you running runam4?  I think there is a valid range for the variable 
we chose, so in my test I made sure that each input had an output. Not 
all input data will produce results.

zhao

Michael Wilde wrote:
> Moving forward on this:
>
> Zhao's update to the falkon worker agent "bgexec" fixed the problem of 
> not finding wrapper.sh on the worker node.
>
> With the new bgexec in place, the workflow ran successfully for runs 
> of 1 job and 25 jobs.
>
> In a run of 100 jobs I start to see problems:
>
> - 89 of 100 jobs produced output data files on shared/
> - 89 info files, 60 success files
> - 29 output files made it back to the swift run directory (amdi.*)
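>
> For reference, roughly how I tallied these (the amdi.* pattern is from 
> my app; the info/success file locations are guesses about the wrapper 
> layout, not exact paths):
>
>   # on the site side, under the swiftwork directory for this run:
>   find shared -name 'amdi.*' | wc -l    # output data files in shared/
>   find . -name '*-info' | wc -l         # wrapper info files
>   find . -name '*-success' | wc -l      # wrapper success markers
>   # on the submit side, in the swift run directory:
>   ls amdi.* | wc -l                     # outputs staged back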
>
> All the logs and the server-side runtime directory are on the CI FS at
> ~benc/swift-logs/wilde/run313
>
> I am debugging this, but if you could take a look, Ben, that would be 
> great.
>
> I will test the jobs locally to ensure that all 100 parameters yield 
> successful output. But the app - a shell around a C program - should 
> yield a zero-length file when the job fails and a single decimal 
> number when it succeeds.
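>
> The local check will be something like this (a sketch only; "runam4" as 
> the program name and a plain 1..100 integer sweep are assumptions about 
> my app's parameters):
>
>   #!/bin/sh
>   # run the app over all 100 parameter values and flag any run that
>   # produces an empty (i.e. failed) output file
>   for i in $(seq 1 100); do
>     ./runam4 $i > out.$i
>     [ -s out.$i ] || echo "param $i: zero-length output (failed)"
>   done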
>
> This is still running with locally mounted NFS for data access.
> I will try the ssh approach after I rule out problems in my app.
>
> After mis-judging the previous problem as an NFS coherence issue, I 
> don't want to be hasty in prejudging this one.
>
> - Mike
>
>
>
> On 3/17/08 3:19 PM, Michael Wilde wrote:
>> Sorry - another mis-diagnosis and incorrect conclusion on my part.
>>
>> Zhao just told me that we have out-of-date falkon worker code on the 
>> sicortex that is not chdir'ing to the cwd arg of the falkon request.
>>
>> That explains what I'm seeing. It's being fixed now and checked in.
>>
>> -- 
>>
>> To answer your questions though:
>>
>> I'm running swift on a Linux box, bblogin.mcs.anl.gov
>>
>> It mounts the sicortex under /sicortex-homes
>>
>> I run swift from /sicortex-homes/wilde/amiga/run
>>
>> My sites file says:
>>
>> <pool handle="sico">
>>       <gridftp url="local://localhost"/>
>>       <execution provider="deef"
>>           url="http://140.221.37.30:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/>
>>       <workdirectory>/home/wilde/swiftwork</workdirectory>
>> </pool>
>>
>> and /home/wilde/swiftwork on bblogin is a symlink to 
>> /sicortex-homes/wilde/swiftwork
>>
>> so that when swift writes files to the sicortex dir (e.g. when it 
>> creates shared/*) it's using the same pathname that the worker side 
>> will use when the job runs.  I.e., even though the mount points differ 
>> between the swift host and the worker host, the symlink makes the 
>> workdir appear under the same name on both sides.
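>>
>> Concretely, the setup is just (a sketch; run once on bblogin):
>>
>>   # make the worker-side pathname resolve on the submit host too
>>   ln -s /sicortex-homes/wilde/swiftwork /home/wilde/swiftwork
>>   # sanity check: the link should point at the sicortex mount
>>   readlink /home/wilde/swiftwork   # -> /sicortex-homes/wilde/swiftwork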
>>
>> If NFS adheres to its close-to-open coherence semantics, then I think 
>> this should work.
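>>
>> (To spell out the guarantee we're leaning on: close-to-open coherence 
>> only promises that a file fully written and closed on one NFS client 
>> is seen intact by another client that opens it afterwards. A sketch of 
>> the pattern, using my workdir paths and a made-up file name:
>>
>>   # on bblogin (submit side): write the file and close it
>>   echo 3.14 > /home/wilde/swiftwork/shared/amdi.1
>>   # on a sicortex worker, strictly after the close above:
>>   cat /home/wilde/swiftwork/shared/amdi.1   # open-after-close sees the data
>>
>> If the reader opens the file before the writer has closed it, or keeps 
>> a stale handle open, NFS promises nothing.)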
>>
>> My scp-provider question is probably still worth answering and trying 
>> if this doesn't work.
>>
>> - Mike
>>
>>
>>
>>
>> On 3/17/08 2:57 PM, Ben Clifford wrote:
>>> what does your filesystem layout look like?
>>>
>>> Where are you running swift? And where are you putting your 
>>> sicortex site directory? On an NFS that is also accessible from 
>>> your submit machine? If so, what path?
>>>
>>
>


