[Swift-devel] Re: swift-falkon problem

Michael Wilde wilde at mcs.anl.gov
Tue Mar 18 09:05:39 CDT 2008


Moving forward on this:

Zhao's update to the falkon worker agent "bgexec" fixed the problem of 
not finding wrapper.sh on the worker node.

With the new bgexec in place, the workflow ran successfully for runs of 
1 job and 25 jobs.

In a run of 100 jobs I start to see problems:

- 89 of 100 jobs produced output data files on shared/
- 89 info files, 60 success files
- 29 output files made it back to the swift run directory
   (amdi.*)
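
A minimal sketch of how the counts above could be turned into per-job lists, to see exactly which jobs dropped out at each stage. The file-naming scheme (a job ID followed by a fixed suffix) and the stage names are assumptions made for illustration, not something confirmed by the run directory layout.

```python
from pathlib import Path


def ids_with_suffix(directory, suffix):
    """Return job IDs for files in `directory` named <job-id><suffix>.

    The <job-id><suffix> naming is a hypothetical convention.
    """
    return {
        p.name.removesuffix(suffix)
        for p in Path(directory).iterdir()
        if p.is_file() and p.name.endswith(suffix)
    }


def missing_by_stage(expected, stages):
    """Map each stage name to the sorted job IDs that never reached it.

    `expected` is the full set of job IDs; `stages` maps a stage name
    to the set of job IDs that left a file at that stage.
    """
    return {name: sorted(expected - seen) for name, seen in stages.items()}
```

Comparing the lists for adjacent stages (for example info versus success) narrows down where a given job stopped.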

All the logs and the server-side runtime directory are on the CI FS at
~benc/swift-logs/wilde/run313

I am debugging this, but if you could take a look, Ben, that would be great.

I will test the jobs locally to ensure that all 100 parameters yield 
successful output. But the app - a shell around a C program - should 
yield a zero-length file when the job fails and a single decimal number 
when it succeeds.
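
The output convention described above can be checked mechanically. This is only a sketch: the function names are hypothetical, and "single decimal number" is assumed to mean an optionally signed number with an optional fractional part (no exponent), possibly followed by a newline.

```python
import re
from pathlib import Path

# Assumed shape of a "single decimal number": optional sign,
# digits with an optional fractional part, no exponent.
_DECIMAL = re.compile(r"[+-]?(\d+(\.\d*)?|\.\d+)")


def classify_output(text):
    """Classify the contents of one output file.

    "failed"     -> zero-length file
    "succeeded"  -> exactly one decimal number (surrounding whitespace ignored)
    "unexpected" -> anything else
    """
    if text == "":
        return "failed"
    if _DECIMAL.fullmatch(text.strip()):
        return "succeeded"
    return "unexpected"


def classify_file(path):
    """Read `path` as text and classify its contents."""
    return classify_output(Path(path).read_text())
```

Any file classified as "unexpected" would point to a problem outside the app's own success/failure convention.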

This is still running with locally mounted NFS for data access.
I will try the ssh approach after I rule out problems in my app.

After misjudging the previous problem as an NFS coherence issue, I don't
want to be hasty in prejudging this one.

- Mike



On 3/17/08 3:19 PM, Michael Wilde wrote:
> Sorry - another mis-diagnosis and incorrect conclusion on my part.
> 
> Zhao just told me that we have out of date falkon worker code on the 
> sicortex that is not chdir'ing to the cwd arg of the falkon request.
> 
> That explains what I'm seeing. It's being fixed now and checked in.
> 
> -- 
> 
> To answer your questions though:
> 
> I'm running swift on a Linux box, bblogin.mcs.anl.gov
> 
> It mounts the sicortex under /sicortex-homes
> 
> I run swift from /sicortex-homes/wilde/amiga/run
> 
> My sites file says:
> 
> <pool handle="sico">
>       <gridftp  url="local://localhost"/>
>       <execution provider="deef"
>                  url="http://140.221.37.30:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/>
>       <workdirectory>/home/wilde/swiftwork</workdirectory>
> </pool>
> 
> and /home/wilde/swiftwork on bblogin is a symlink to 
> /sicortex-homes/wilde/swiftwork
> 
> so that when swift writes files to the sicortex dir (e.g. when it creates
> shared/*) it's using the same pathname that the worker side will use when
> the job runs.  I.e., even though the mount points differ between the swift
> host and the worker host, symlinks make the workdir appear under the same
> name on both sides.
> 
> If NFS adheres to its close-to-open coherence semantics, this should,
> I think, work.
> 
> My scp-provider question is probably still worth answering and trying if
> this doesn't work.
> 
> - Mike
> 
> 
> 
> 
> On 3/17/08 2:57 PM, Ben Clifford wrote:
>> what does your filesystem layout look like?
>>
>> Where are you running swift? And where are you putting your sicortex
>> site directory? On an NFS that is also accessible from your submit 
>> machine? If so, what path?
>>
> 


