[Swift-devel] Re: Problems with coaster data provider

Mihael Hategan hategan at mcs.anl.gov
Tue Jul 20 12:22:07 CDT 2010


That is odd. It looks like all characters for the swift wrapper are in
uppercase.
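
One way to check that observation against the error text, as a rough sketch only: the helper name `is_uppercased_copy` and the two sample strings below are hypothetical and are not part of Swift or CoG. It tests whether a received payload equals the original script with every letter uppercased, which is what the `#!/BIN/BASH` in the error suggests.

```python
def is_uppercased_copy(original: str, received: str) -> bool:
    """Return True if `received` is `original` with all letters uppercased
    and the two are not simply identical."""
    return received == original.upper() and received != original


# Hypothetical sample: the first lines of a bash wrapper script, and the
# same lines as they appear in the "Unknown command" error message.
original = "#!/bin/bash\n# this script must be invoked inside of bash, not plain sh\n"
received = "#!/BIN/BASH\n# THIS SCRIPT MUST BE INVOKED INSIDE OF BASH, NOT PLAIN SH\n"

print(is_uppercased_copy(original, received))
```

If this holds for the whole wrapper, the content itself is intact and only the case was changed somewhere along the transfer path.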


On Tue, 2010-07-20 at 09:20 -0600, wilde at mcs.anl.gov wrote:
> I tried the coaster data provider from MCS host vanquish to crush (2 of the compute servers) via ssh:local and get the error:
> 
> "org.globus.cog.abstraction.impl.file.FileResourceException: org.globus.cog.karajan.workflow.service.ProtocolException: Unknown command: #!/BIN/BASH" (Full error text below).
> 
> Has anyone else tried the coaster data provider?
> 
> My sites file has the single pool:
> 
>   <pool handle="crush">
>     <execution provider="coaster" url="crush.mcs.anl.gov" jobmanager="ssh:local"/>
>     <profile namespace="globus" key="workersPerNode">8</profile>
>     <profile namespace="globus" key="maxTime">3500</profile>
>     <profile namespace="globus" key="slots">1</profile>
>     <profile namespace="globus" key="nodeGranularity">1</profile>
>     <profile namespace="globus" key="maxNodes">1</profile>
> 
>     <profile key="jobThrottle" namespace="karajan">.07</profile>
>     <profile namespace="karajan" key="initialScore">10000</profile>
> 
>     <filesystem provider="coaster" url="ssh://crush.mcs.anl.gov" />
>     <workdirectory>/home/wilde/swiftwork/crush</workdirectory>
>   </pool>
> 
> Is that the correct url= value?
> 
> I set these properties:
> 
> wrapperlog.always.transfer=false
> sitedir.keep=true
> execution.retries=0
> status.mode=provider
> 
> The run command, svn version, and full error text on stdout/err are:
> 
> vanquish$ swift -tc.file tc -sites.file crushds.xml -config cf catsn.swift -n=1
> Swift svn swift-r3449 cog-r2816
> 
> RunID: 20100720-1006-z1vio8i1
> Progress:
> Progress:  Failed:1
> Execution failed:
> 	Could not initialize shared directory on crush
> Caused by:
> 	org.globus.cog.abstraction.impl.file.FileResourceException: org.globus.cog.karajan.workflow.service.ProtocolException: Unknown command: #!/BIN/BASH
> # THIS SCRIPT MUST BE INVOKED INSIDE OF BASH, NOT PLAIN SH
> # NOTE THAT THIS SCRIPT MODIFIES $IFS
> 
> INFOSECTION() {
> 
> ...full text of _swiftwrap shows up here, in upper case...
> 
> # ENSURE WE EXIT WITH A 0 AFTER A SUCCESSFUL EXECUTION
> EXIT 0
> 
> # LOCAL VARIABLES: 
> # MODE: SH
> # SH-BASIC-OFFSET: 8
> # END:
> 
> Cleaning up...
> Shutting down service at https://140.221.8.62:59300
> Got channel MetaChannel: 2039421489[205498061: {}] -> GSSSChannel-0494700354(1)[205498061: {}]
> + Done
> vanquish$
> 
> The _swiftwrap file was created in the workdirectory shared/ subdir but has length zero. So presumably it was about to be transferred but the transfer failed.
> 
> vanquish$ ls -lR /home/wilde/swiftwork/crush/*8i1
> /home/wilde/swiftwork/crush/catsn-20100720-1006-z1vio8i1:
> total 2
> drwxr-sr-x 2 wilde mcsz 3 Jul 20 10:06 shared/
> 
> /home/wilde/swiftwork/crush/catsn-20100720-1006-z1vio8i1/shared:
> total 1
> -rw-r--r-- 1 wilde mcsz 0 Jul 20 10:06 _swiftwrap
> vanquish$ 
> 
> ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> 
> > Most of the problems that were obvious with coaster file staging should
> > be fixed now. I ran a few tests for 1024 cat jobs on TP with ssh:pbs
> > with 2-8 workers/node (such that "concurrent" workers are tested) and it
> > consistently seemed fine.
> > 
> > I also quickly made a fake provider and I am getting a rate of about 100
> > j/s. So that does not seem to contradict my previous suspicion.
> > 
> > On Mon, 2010-07-12 at 11:52 -0500, Michael Wilde wrote:
> > > Here's my view on these:
> > > 
> > > > 2. test/fix coaster file staging
> > > 
> > > This would be useful for both real apps and (I think) for CDM testing. I would do this first.
> > > 
> > > I would then add:
> > > 
> > > 5. Adjustments needed, if any, on multicore handling in PBS and SGE provider.
> > > 
> > > 6. Adjustments and fixes for reliability and logging, if needed, in Condor-G provider.
> > > 
> > > I expect that 5 & 6 would be small tasks, and they are not yet clearly defined. I think that other people could do them.
> > > 
> > > Maybe add:
> > > 
> > > 7. -tui fixes. Seems not to be working so well on recent tests; several of the screens, including the source-code view, seem not to be working.
> > > 
> > > Then:
> > > 
> > > > 1. make swift core faster
> > > 
> > > I would do this second; I think you said you need about 7-10 days to try things and see what can be done, maybe more after that if the exploration suggests things that will take much (re)coding?
> > > 
> > > > 3. standalone coaster service
> > > 
> > > The current manual coasters is proving useful.
> > > 
> > > > 4. swift shell
> > > 
> > > Let's defer (4) for now; if we can instead run swift repeatedly and either have the coaster worker pool re-connect quickly to each new swift, or quickly start new pools within the same cluster job(s), that would suffice for now.
> > > 
> > > Justin, do you want to weigh in on these?
> > > 
> > > Thanks,
> > > 
> > > Mike
> > > 
> > > 
> > > > The idea is that some recent changes may have shifted the existing
> > > > priorities. So think of this from the perspective of
> > > > user/application/publication goals rather than what you think would be
> > > > "nice to have".
> > > > 
> > > > Mihael
> > > > 
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > >
> 




