[Swift-user] Error message on Cray XE6
Michael Wilde
wilde at mcs.anl.gov
Sat Apr 14 10:51:07 CDT 2012
OK, here's a workaround for this problem:
You need to add this line to the swift command bin/swift in your Swift release.
After:
updateOptions "$SWIFT_HOME" "swift.home"
Add:
updateOptions "$USER_HOME" "user.home"
This is near line 92 in the version I tested, Swift trunk swift-r5739 cog-r3368.
Then you can do:
USER_HOME=/lustre/beagle/wilde swift -config cf -tc.file tc -sites.file pbs.xml catsn.swift -n=1
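For reference, here is a minimal sketch of what the added line does. It assumes updateOptions simply appends a -D<property>=<value> flag to the Java options when the value is non-empty; the actual helper in bin/swift may differ between releases. The OPTIONS variable name and the /opt/swift install path are illustrative assumptions, not taken from the release.

```shell
#!/bin/sh
# Sketch only: approximates what an updateOptions helper in bin/swift might do.
# The real function body may differ between Swift/CoG releases.
OPTIONS=""

updateOptions() {
    # $1 = value, $2 = Java system property name.
    # Nothing is added when the value is empty or unset.
    if [ -n "$1" ]; then
        OPTIONS="$OPTIONS -D$2=$1"
    fi
}

SWIFT_HOME=/opt/swift            # hypothetical install location
USER_HOME=/lustre/beagle/wilde   # path from the example command above

updateOptions "$SWIFT_HOME" "swift.home"
updateOptions "$USER_HOME" "user.home"

# Show the flags that would be handed to the JVM.
echo "java$OPTIONS ..."
```

If USER_HOME is left unset, no -Duser.home flag is added and Java falls back to its default user.home, so the change should not affect sites where /home is visible everywhere.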
Lorenzo, if you are using "module load swift", we'll need to update that module. Alternatively, you can copy the Swift release directory that "module load" points you to, modify the swift command in the copy, and put that modified release first in your PATH.
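The copy-and-modify route could be scripted roughly as follows. This is a sketch under assumptions: patch_swift_copy is a hypothetical helper name, the paths in the usage comment are placeholders, and it relies on bin/swift containing the updateOptions "$SWIFT_HOME" "swift.home" line exactly as quoted above.

```shell
#!/bin/sh
# Sketch only: copy a Swift release and add the user.home line to bin/swift.
# patch_swift_copy is a hypothetical helper, not part of any Swift release.
patch_swift_copy() {
    src=$1    # release directory that "module load swift" points to
    dest=$2   # new, writable location; must not exist yet

    cp -R "$src" "$dest" || return 1

    # Re-emit bin/swift, adding the new line right after the swift.home line.
    awk '{ print }
         /updateOptions "\$SWIFT_HOME" "swift.home"/ {
             print "updateOptions \"$USER_HOME\" \"user.home\""
         }' "$src/bin/swift" > "$dest/bin/swift" || return 1

    chmod +x "$dest/bin/swift"
}

# Example usage (placeholder paths, adjust to your site):
#   patch_swift_copy /path/to/module/swift /lustre/beagle/$USER/swift-patched
#   PATH=/lustre/beagle/$USER/swift-patched/bin:$PATH
#   export PATH
```

Putting the patched bin directory at the front of PATH means the modified swift command is found before the one supplied by the module.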
We'll work out a way to get something like this into the production module and trunk. I don't know of other systems that are currently affected by this, but I'm sure they will come up.
- Mike
----- Original Message -----
> From: "Michael Wilde" <wilde at mcs.anl.gov>
> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> Cc: swift-user at ci.uchicago.edu
> Sent: Saturday, April 14, 2012 10:13:40 AM
> Subject: Re: [Swift-user] Error message on Cray XE6
> stackoverflow says this should work:
>
> java -Duser.home=<new_location> <your_program>
>
> Need to get that in via the swift command.
>
> - Mike
>
>
> ----- Original Message -----
> > From: "Michael Wilde" <wilde at mcs.anl.gov>
> > To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > Cc: "Lorenzo Pesce" <lpesce at uchicago.edu>,
> > swift-user at ci.uchicago.edu
> > Sent: Saturday, April 14, 2012 10:10:00 AM
> > Subject: Re: [Swift-user] Error message on Cray XE6
> > I just tried both setting HOME=/lustre/beagle/wilde and setting
> > user.home to the same thing. Neither works. I think user.home is
> > coming from the Java property, and that doesn't seem to be influenced
> > by the HOME env var. I was about to look if Java can be asked to
> > change home. Maybe by setting a command line arg to Java.
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > To: "Michael Wilde" <wilde at mcs.anl.gov>
> > > Cc: "Lorenzo Pesce" <lpesce at uchicago.edu>,
> > > swift-user at ci.uchicago.edu
> > > Sent: Saturday, April 14, 2012 10:02:14 AM
> > > Subject: Re: [Swift-user] Error message on Cray XE6
> > > That is an easy fix I believe. I know where the code is so I will
> > > change and test.
> > >
> > > In the meantime, could you try something? Try setting
> > > user.home=<someplace.on.lustre>
> > > in your config file and try again.
> > >
> > > On Apr 14, 2012, at 9:58, Michael Wilde <wilde at mcs.anl.gov> wrote:
> > >
> > > > /home is no longer mounted by the compute nodes, per the
> > > > post-maintenance summary:
> > > >
> > > > "External filesystem dependencies minimized: Compute nodes and the
> > > > scheduler should now continue to process and complete jobs without
> > > > the threat of interference of external filesystem outages.
> > > > /gpfs/pads is only available on login1 through login5; /home is on
> > > > login and mom nodes only."
> > > >
> > > > So we need to (finally) remove Swift's dependence on $HOME/.globus
> > > > and $HOME/.globus/scripts in particular.
> > > >
> > > > I suggest - since the swift command already needs to write to "." -
> > > > that we create a scripts/ directory in "." instead of $HOME/.globus.
> > > > And this should be used by any provider that would have previously
> > > > created files below .globus.
> > > >
> > > > I'll echo this to swift-devel and start a thread there to discuss.
> > > > It's possible there's already a property to cause scripts/ to be
> > > > created elsewhere. If not, I think we should make one. I think we
> > > > should group the scripts created by a run into the current dir, along
> > > > with the swift log, _concurrent, and (in the conventions I use in my
> > > > run scripts) swiftwork/.
> > > >
> > > > Lorenzo, hopefully we can at least get you a workaround for this
> > > > soon.
> > > >
> > > > You *might* be able to trick swift into doing this by setting
> > > > HOME=/lustre/beagle/$USER. I already tried a symlink under .globus
> > > > and that didn't work, as /home is not even readable by the compute
> > > > nodes, which in this case need to run the coaster worker (.pl)
> > > > script.
> > > >
> > > > - Mike
> > > >
> > > >
> > > > ----- Original Message -----
> > > >> From: "Lorenzo Pesce" <lpesce at uchicago.edu>
> > > >> To: "Jonathan Monette" <jonmon at mcs.anl.gov>
> > > >> Cc: swift-user at ci.uchicago.edu
> > > >> Sent: Saturday, April 14, 2012 8:15:39 AM
> > > >> Subject: Re: [Swift-user] Error message on Cray XE6
> > > >> In principle the access to the /home filesystem should still be
> > > >> there.
> > > >>
> > > >> The only thing I did was to change the cf file to remove some errors
> > > >> I had in it, so that might also be the source of the problem. This is
> > > >> what it looks like now:
> > > >> (BTW, the comments are not mine, I run swift only from lustre)
> > > >>
> > > >>
> > > >> # Whether to transfer the wrappers from the compute nodes
> > > >> # I like to launch from my home dir, but keep everything on
> > > >> # lustre
> > > >> wrapperlog.always.transfer=false
> > > >>
> > > >> #Indicates whether the working directory on the remote site
> > > >> # should be left intact even when a run completes successfully
> > > >> sitedir.keep=true
> > > >>
> > > >> #try only once
> > > >> execution.retries=1
> > > >>
> > > >> # Attempt to run as much as possible, i.e., ignore non-fatal errors
> > > >> lazy.errors=true
> > > >>
> > > >> # to reduce filesystem access
> > > >> status.mode=provider
> > > >>
> > > >> use.provider.staging=false
> > > >>
> > > >> provider.staging.pin.swiftfiles=false
> > > >>
> > > >> foreach.max.threads=100
> > > >>
> > > >> provenance.log=false
> > > >>
> > > >>
> > > >>
> > > >>
> > > >> On Apr 14, 2012, at 12:10 AM, Jonathan Monette wrote:
> > > >>
> > > >>> The perl script is the worker script that is submitted with PBS. I
> > > >>> have not tried to run on Beagle since the maintenance period has
> > > >>> ended so I am not exactly sure why the error popped up. One reason
> > > >>> could be that the home file system is no longer mounted on the
> > > >>> compute nodes. I know they spoke about that being a possibility but
> > > >>> not sure they implemented that during the maintenance period. Do you
> > > >>> know if the home file system is still mounted on the compute nodes?
> > > >>>
> > > >>> On Apr 13, 2012, at 17:18, Lorenzo Pesce <lpesce at uchicago.edu>
> > > >>> wrote:
> > > >>>
> > > >>>> Hi --
> > > >>>> I haven't seen this one before:
> > > >>>>
> > > >>>> Can't open perl script
> > > >>>> "/home/lpesce/.globus/coasters/cscript7176272791806289394.pl":
> > > >>>> No such file or directory
> > > >>>>
> > > >>>> The config of the cray has changed, might this have anything to do
> > > >>>> with it?
> > > >>>> I have no idea what perl script it is talking about and why it is
> > > >>>> looking in home.
> > > >>>>
> > > >>>> Thanks a lot,
> > > >>>>
> > > >>>> Lorenzo
> > > >>>>
> > > >>>>
> > > >>>>
> > > >>>> _______________________________________________
> > > >>>> Swift-user mailing list
> > > >>>> Swift-user at ci.uchicago.edu
> > > >>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> > > >>
> > > >
> > > > --
> > > > Michael Wilde
> > > > Computation Institute, University of Chicago
> > > > Mathematics and Computer Science Division
> > > > Argonne National Laboratory
> > > >
> >
> > --
> > Michael Wilde
> > Computation Institute, University of Chicago
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory