[Swift-devel] Re: swift-falkon problem
Zhao Zhang
zhaozhang at uchicago.edu
Tue Mar 18 12:00:26 CDT 2008
Hi, Mike
Are you running runam4? There is a valid range for the variable we
chose, so in my test I made sure each input produced an output. In
general, not all input data will have results.
zhao
Michael Wilde wrote:
> Moving forward on this:
>
> Zhao's update to the falkon worker agent "bgexec" fixed the problem of
> not finding wrapper.sh on the worker node.
>
> With the new bgexec in place, the workflow ran successfully for runs
> of 1 job and 25 jobs.
>
> In a run of 100 jobs I start to see problems:
>
> - 89 of 100 jobs produced output data files on shared/
> - 89 info files, 60 success files
> - 29 output files made it back to the swift run directory
> (amdi.*)
>
> All the logs and the server-side runtime directory are on the CI FS at
> ~benc/swift-logs/wilde/run313
>
> I am debugging this, but if you could take a look Ben that would be
> great.
>
> I will test the jobs locally to ensure that all 100 parameters yield
> successful output. But the app - a shell around a C program - should
> yield a zero-length file when the job fails and a single decimal
> number when it succeeds.
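Given the failure signature Mike describes (a zero-length file when the job fails, a single decimal number when it succeeds), a quick audit of the run directory can separate the two cases. A hypothetical sketch follows; the `amdi.*` pattern comes from the message, and the function name is invented:

```shell
# Sketch (names assumed): count outputs in a Swift run directory and
# flag the zero-length ones, which this app writes when a job fails.
audit_outputs() {
    rundir=$1
    total=$(find "$rundir" -maxdepth 1 -name 'amdi.*' | wc -l | tr -d ' ')
    empty=$(find "$rundir" -maxdepth 1 -name 'amdi.*' -empty | wc -l | tr -d ' ')
    echo "$total outputs, $empty zero-length (failed)"
}
```

Comparing that count against the 89/60/29 numbers above would show whether the missing files failed in the app or were lost in staging.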
>
> This is still running with locally mounted NFS for data access.
> I will try the ssh approach after I rule out problems in my app.
>
> After mis-judging the previous problem as an NFS coherence issue, I
> don't want to be hasty in prejudging this one.
>
> - Mike
>
>
>
> On 3/17/08 3:19 PM, Michael Wilde wrote:
>> Sorry - another mis-diagnosis and incorrect conclusion on my part.
>>
>> Zhao just told me that we have out-of-date Falkon worker code on the
>> SiCortex that is not chdir'ing to the cwd argument of the Falkon
>> request.
>>
>> That explains what I'm seeing. It's being fixed now and checked in.
>>
>> --
>>
>> To answer your questions though:
>>
>> I'm running Swift on a Linux box, bblogin.mcs.anl.gov
>>
>> It mounts the SiCortex under /sicortex-homes
>>
>> I run Swift from /sicortex-homes/wilde/amiga/run
>>
>> My sites file says:
>>
>> <pool handle="sico">
>>   <gridftp url="local://localhost"/>
>>   <execution provider="deef"
>>     url="http://140.221.37.30:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/>
>>   <workdirectory>/home/wilde/swiftwork</workdirectory>
>> </pool>
>>
>> and /home/wilde/swiftwork on bblogin is a symlink to
>> /sicortex-homes/wilde/swiftwork
>>
>> so that when Swift writes files to the SiCortex dir (e.g. when it
>> creates shared/*) it's using the same pathname that the worker side
>> will use when the job runs. I.e., even though the mount points differ
>> between the Swift host and the worker host, symlinks make the workdir
>> appear under the same name on both sides.
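The aliasing arrangement described above amounts to one symlink on the submit host. A sketch, with the paths taken from the message and the helper name invented:

```shell
# Sketch: give the sites.xml workdirectory the same pathname on the
# submit host as on the workers by symlinking through the NFS mount.
make_workdir_alias() {
    target=$1   # real location, e.g. /sicortex-homes/wilde/swiftwork
    link=$2     # pathname used in sites.xml, e.g. /home/wilde/swiftwork
    mkdir -p "$target"
    ln -s "$target" "$link"
}
```

After this, any path under /home/wilde/swiftwork resolves to the same files on both sides of the mount.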
>>
>> If NFS adheres to its close-to-open coherence semantics, then I
>> think this should work.
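Close-to-open coherence guarantees only this much: data a client flushes at close() is visible to any client that open()s the file afterwards. Swift's staging pattern fits, since the wrapper closes each output before anything else opens it. A trivial local sketch of that ordering (just the pattern, not an NFS test):

```shell
# The writer's redirection closes the file when the command exits;
# the reader's open happens strictly after that close, which is the
# ordering close-to-open semantics keeps coherent across NFS clients.
write_result() { printf '%s\n' "$2" > "$1"; }
read_result()  { cat "$1"; }
```

If a reader ever opened the file before the writer's close, NFS would make no visibility promise, and missing or empty outputs like those above could result.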
>>
>> My scp-provider question is probably still worth answering and
>> trying if this doesn't work.
>>
>> - Mike
>>
>>
>>
>>
>> On 3/17/08 2:57 PM, Ben Clifford wrote:
>>> what does your filesystem layout look like?
>>>
>>> Where are you running Swift? And where are you putting your
>>> SiCortex site directory? On an NFS that is also accessible from
>>> your submit machine? If so, what path?
>>>
>>
>