[Swift-devel] Re: Problems with coaster data provider

Mihael Hategan hategan at mcs.anl.gov
Tue Jul 20 12:29:25 CDT 2010


On Tue, 2010-07-20 at 12:22 -0500, Mihael Hategan wrote:
> That is odd. It looks like all characters for the swift wrapper are in
> uppercase.

Actually it looks like something is off there.

Btw, coaster provider staging is different from the coaster data
provider. If you want the former, set use.provider.staging=true in
swift.properties.
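
A minimal sketch of that setting, for reference. Only the
use.provider.staging key and its value come from the note above; the
comment text is an illustrative gloss, and the sketch assumes the other
properties in the file are left unchanged:

```properties
# swift.properties (fragment)
# Ask Swift to use coaster provider staging, i.e. let the execution
# provider stage the files, rather than the coaster data provider.
use.provider.staging=true
```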

> 
> 
> On Tue, 2010-07-20 at 09:20 -0600, wilde at mcs.anl.gov wrote:
> > I tried the coaster data provider from MCS host vanquish to crush (2 of the compute servers) via ssh:local and get the error:
> > 
> > "org.globus.cog.abstraction.impl.file.FileResourceException: org.globus.cog.karajan.workflow.service.ProtocolException: Unknown command: #!/BIN/BASH" (Full error text below).
> > 
> > Has anyone else tried the coaster data provider?
> > 
> > My sites file has the single pool:
> > 
> >   <pool handle="crush">
> >     <execution provider="coaster" url="crush.mcs.anl.gov" jobmanager="ssh:local"/>
> >     <profile namespace="globus" key="workersPerNode">8</profile>
> >     <profile namespace="globus" key="maxTime">3500</profile>
> >     <profile namespace="globus" key="slots">1</profile>
> >     <profile namespace="globus" key="nodeGranularity">1</profile>
> >     <profile namespace="globus" key="maxNodes">1</profile>
> > 
> >     <profile key="jobThrottle" namespace="karajan">.07</profile>
> >     <profile namespace="karajan" key="initialScore">10000</profile>
> > 
> >     <filesystem provider="coaster" url="ssh://crush.mcs.anl.gov" />
> >     <workdirectory>/home/wilde/swiftwork/crush</workdirectory>
> >   </pool>
> > 
> > Is that the correct url= value?
> > 
> > I set these properties:
> > 
> > wrapperlog.always.transfer=false
> > sitedir.keep=true
> > execution.retries=0
> > status.mode=provider
> > 
> > The run command, svn version, and full error text on stdout/err is:
> > 
> > vanquish$ swift -tc.file tc -sites.file crushds.xml -config cf catsn.swift -n=1
> > Swift svn swift-r3449 cog-r2816
> > 
> > RunID: 20100720-1006-z1vio8i1
> > Progress:
> > Progress:  Failed:1
> > Execution failed:
> > 	Could not initialize shared directory on crush
> > Caused by:
> > 	org.globus.cog.abstraction.impl.file.FileResourceException: org.globus.cog.karajan.workflow.service.ProtocolException: Unknown command: #!/BIN/BASH
> > # THIS SCRIPT MUST BE INVOKED INSIDE OF BASH, NOT PLAIN SH
> > # NOTE THAT THIS SCRIPT MODIFIES $IFS
> > 
> > INFOSECTION() {
> > 
> > ...full text of _swiftwrap shows up here, in upper case...
> > 
> > # ENSURE WE EXIT WITH A 0 AFTER A SUCCESSFUL EXECUTION
> > EXIT 0
> > 
> > # LOCAL VARIABLES: 
> > # MODE: SH
> > # SH-BASIC-OFFSET: 8
> > # END:
> > 
> > Cleaning up...
> > Shutting down service at https://140.221.8.62:59300
> > Got channel MetaChannel: 2039421489[205498061: {}] -> GSSSChannel-0494700354(1)[205498061: {}]
> > + Done
> > vanquish$
> > 
> > The _swiftwrap file was created in the workdirectory shared/ subdir but has length zero. So presumably it was about to be transferred but the transfer failed.
> > 
> > vanquish$ ls -lR /home/wilde/swiftwork/crush/*8i1
> > /home/wilde/swiftwork/crush/catsn-20100720-1006-z1vio8i1:
> > total 2
> > drwxr-sr-x 2 wilde mcsz 3 Jul 20 10:06 shared/
> > 
> > /home/wilde/swiftwork/crush/catsn-20100720-1006-z1vio8i1/shared:
> > total 1
> > -rw-r--r-- 1 wilde mcsz 0 Jul 20 10:06 _swiftwrap
> > vanquish$ 
> > 
> > ----- "Mihael Hategan" <hategan at mcs.anl.gov> wrote:
> > 
> > > Most of the problems that were obvious with coaster file staging should
> > > be fixed now. I ran a few tests for 1024 cat jobs on TP with ssh:pbs
> > > with 2-8 workers/node (such that "concurrent" workers are tested) and it
> > > consistently seemed fine.
> > > 
> > > I also quickly made a fake provider and I am getting a rate of about 100
> > > j/s. So that does not seem to contradict my previous suspicion.
> > > 
> > > On Mon, 2010-07-12 at 11:52 -0500, Michael Wilde wrote:
> > > > Here's my view on these:
> > > > 
> > > > > 2. test/fix coaster file staging
> > > > 
> > > > This would be useful for both real apps and (I think) for CDM testing. I would do this first.
> > > > 
> > > > I would then add:
> > > > 
> > > > 5. Adjustments needed, if any, on multicore handling in the PBS and SGE providers.
> > > > 
> > > > 6. Adjustments and fixes for reliability and logging, if needed, in the Condor-G provider.
> > > > 
> > > > I expect that 5 & 6 would be small tasks, and they are not yet clearly defined. I think that other people could do them.
> > > > 
> > > > Maybe add:
> > > > 
> > > > 7. -tui fixes. Seems not to be working so well on recent tests; several of the screens, including the source-code view, seem not to be working.
> > > > 
> > > > Then:
> > > > 
> > > > > 1. make swift core faster
> > > > 
> > > > I would do this second; I think you said you need about 7-10 days to try things and see what can be done, maybe more after that if the exploration suggests things that will take much (re)coding?
> > > > 
> > > > > 3. standalone coaster service
> > > > 
> > > > The current manual coasters is proving useful.
> > > > 
> > > > > 4. swift shell
> > > > 
> > > > Let's defer (4) for now; if we can instead run swift repeatedly and either have the coaster worker pool re-connect quickly to each new swift, or quickly start new pools within the same cluster job(s), that would suffice for now.
> > > > 
> > > > Justin, do you want to weigh in on these?
> > > > 
> > > > Thanks,
> > > > 
> > > > Mike
> > > > 
> > > > 
> > > > > The idea is that some recent changes may have shifted the existing
> > > > > priorities. So think of this from the perspective of
> > > > > user/application/publication goals rather than what you think would be
> > > > > "nice to have".
> > > > > 
> > > > > Mihael
> > > > > 
> > > > > _______________________________________________
> > > > > Swift-devel mailing list
> > > > > Swift-devel at ci.uchicago.edu
> > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > > >
> > 
> 
> 




