[Swift-devel] Re: Can you test 0.92 branch for 0.92.1 release?

Jonathan Monette jon.monette at gmail.com
Mon Apr 11 12:46:48 CDT 2011


My PBS error 254 was indeed caused by the app's output file not being created
by the script, so the stage-out failed.
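
A minimal sketch of how a wrapper could catch this kind of failure early,
assuming a POSIX shell. The function name run_and_check and its argument
order are made up for illustration; this is not the actual wrapper script
discussed in this thread and not part of Swift. It runs the app, then checks
that the expected output file exists, so a missing file is reported by name
instead of showing up later as a stage-out failure.

```sh
#!/bin/sh
# Hypothetical helper (illustrative name, not from Swift or the thread).
# Usage: run_and_check OUTFILE COMMAND [ARGS...]
run_and_check() {
    outfile=$1
    shift
    # Run the app exactly as given; "$1" is now the command name.
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "wrapper: '$1' exited with code $rc" >&2
        return "$rc"
    fi
    # An exit code of 0 is not enough: the output file must also exist.
    if [ ! -e "$outfile" ]; then
        echo "wrapper: '$1' exited 0 but did not create $outfile" >&2
        return 1
    fi
    return 0
}
```

A wrapper built this way would call something like
run_and_check "$out" some_app "$in" "$out" and exit with the returned code,
so the failing app and the missing file are named on stderr.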

On Wed, Apr 6, 2011 at 12:31 PM, Jonathan Monette <jon.monette at gmail.com> wrote:

> Ok.  I found the app.  It is a wrapper script I have that just makes sure
> that the app I call returns exit code 0 and not some other exit code.  Some
> of the apps run and complete but not all of them.  I can only assume it is
> still returning an error code, so I have to track this down.  One thing that
> should be changed is that when error 254 occurs, the message should specify
> the name of the app that failed (or the job or something).  This will at
> least help track down why and where.
>
>
> On Tue, Apr 5, 2011 at 3:14 PM, Jonathan Monette <jon.monette at gmail.com> wrote:
>
>> Yes.  I will certainly do that.  And those are the usual suspects that I
>> have seen for error 254, but the app I believe is failing does not have any
>> of those properties.  I am re-running the script with some changes that
>> will hopefully shed more light on where it fails.  PADS is in maintenance
>> mode.  There are several jobs in the queue and it looks like none are even
>> running.
>>
>>
>> On Tue, Apr 5, 2011 at 3:06 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>>
>>> Jon,
>>>
>>> PBS Error 254 may be something like app in tc.data is not executable, or
>>> app script calls something not found or not executable, or that makes it
>>> return non-zero. It falls in that class of error that I just railed about in
>>> Bug 321: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=321
>>>
>>> It's not clear to me that the same root problem manifests in exactly the
>>> same error codes and messages under various providers and configurations,
>>> which is another problem that the fix(es) to Bug 321 should deal with.
>>>
>>> When you fix your 254, could you report back to swift-devel what it was,
>>> and either file as a new bug or update Bug 321?
>>>
>>> - Mike
>>>
>>>
>>> ----- Original Message -----
>>> > Correct. Based on how I was looping, I was receiving the same cache
>>> > error that Allan was receiving. Also, I never thought of this, but my
>>> > Montage scripts were running very slowly in the trunk at some point (I
>>> > am assuming this was the point at which the twice-each bug was
>>> > introduced and everything was being done twice). Under the 0.92 branch
>>> > my small workflows complete. My large workflows error out with PBS
>>> > error 254, I believe. I cannot remember the error code for certain, but
>>> > I believe it was this one. But this is not due to the twice-each bug.
>>> >
>>> >
>>> > On Tue, Apr 5, 2011 at 2:50 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>>> >
>>> >
>>> > Just to clarify: we detected this bug by diagnosing the error that
>>> > Allan was getting in his SCEC workflow, trying to add a file to a
>>> > local cache that was already there.
>>> >
>>> > I never verified if the same bug was causing failures in Montage, but
>>> > Jon reported Apr 4 12:04 AM that the small Montage was working under
>>> > the fixed 0.92 branch and that the large Montage run was still to be
>>> > tested.
>>> >
>>> > - Mike
>>> >
>>> >
>>> >
>>> >
>>> > ----- Original Message -----
>>> > > got it thanks...to be clear i wasn't going to try to run the whole
>>> > > montage script :P but this is easier than extracting the faulty
>>> > > loop :)
>>> > >
>>> > >
>>> > > On Tue, Apr 5, 2011 at 12:37 PM, Jonathan Monette <jon.monette at gmail.com> wrote:
>>> > >
>>> > >
>>> > > Yes. That is the one I remember seeing. That is much easier than
>>> > > what my Montage scripts are doing.
>>> > >
>>> > > On Tue, Apr 5, 2011 at 2:33 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>>> > >
>>> > >
>>> > > Yes, I had posted variations of the following to the list:
>>> > >
>>> > > zz3.swift:
>>> > >
>>> > > int arr[];
>>> > >
>>> > > arr[0]=1;
>>> > > arr[1]=2;
>>> > >
>>> > > foreach a in arr {
>>> > > trace("for", a);
>>> > > }
>>> > >
>>> > > zz6.swift:
>>> > >
>>> > >
>>> > > int arr[];
>>> > >
>>> > > foreach a,i in [0:9] {
>>> > > arr[i] = i;
>>> > > }
>>> > >
>>> > > trace("arr",arr);
>>> > >
>>> > > foreach a,i in arr {
>>> > > trace("for", a,i);
>>> > > }
>>> > >
>>> > >
>>> > > com$ PATH=/home/wilde/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/:$PATH
>>> > > com$ which swift
>>> > > ~/swift/src/0.92/cog/modules/swift/dist/swift-svn/bin/swift
>>> > > com$ cd swift/lab
>>> > > com$ swift zz3.swift
>>> > > Swift svn swift-r4252 (swift modified locally) cog-r3088 (cog modified locally)
>>> > >
>>> > > RunID: 20110404-1344-j98f22id
>>> > > Progress:
>>> > > SwiftScript trace: for, 2
>>> > > SwiftScript trace: for, 1
>>> > > Final status:
>>> > > com$ PATH=~/swift/rev/swift-0.92/bin:$PATH
>>> > > com$ swift zz3.swift
>>> > > Swift svn swift-r4157 cog-r3056
>>> > >
>>> > > RunID: 20110404-1344-ensm4te8
>>> > > Progress:
>>> > > SwiftScript trace: for, 1
>>> > > SwiftScript trace: for, 2
>>> > > SwiftScript trace: for, 2
>>> > > SwiftScript trace: for, 1
>>> > > Final status:
>>> > > com$ swift zz6.swift
>>> > > Swift svn swift-r4157 cog-r3056
>>> > >
>>> > > RunID: 20110404-1344-i7y6q1i1
>>> > > Progress:
>>> > > SwiftScript trace: arr, arr.$[]/10
>>> > > SwiftScript trace: for, 3, 3
>>> > > SwiftScript trace: for, 2, 2
>>> > > SwiftScript trace: for, 4, 4
>>> > > SwiftScript trace: for, 5, 5
>>> > > SwiftScript trace: for, 3, 3
>>> > > SwiftScript trace: for, 5, 5
>>> > > SwiftScript trace: for, 9, 9
>>> > > SwiftScript trace: for, 4, 4
>>> > > SwiftScript trace: for, 1, 1
>>> > > SwiftScript trace: for, 7, 7
>>> > > SwiftScript trace: for, 7, 7
>>> > > SwiftScript trace: for, 6, 6
>>> > > SwiftScript trace: for, 9, 9
>>> > > SwiftScript trace: for, 6, 6
>>> > > SwiftScript trace: for, 1, 1
>>> > > SwiftScript trace: for, 2, 2
>>> > > SwiftScript trace: for, 0, 0
>>> > > SwiftScript trace: for, 8, 8
>>> > > SwiftScript trace: for, 0, 0
>>> > > SwiftScript trace: for, 8, 8
>>> > > Final status:
>>> > > com$
>>> > >
>>> > > ----- Original Message -----
>>> > > > The script I posted might be too complex to use to replicate the
>>> > > > twice-each bug. However, didn't Mike post a simple loop script that
>>> > > > was looping twice when the bug was initially found?
>>> > > >
>>> > > >
>>> > > > On Tue, Apr 5, 2011 at 2:17 PM, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:
>>> > > >
>>> > > >
>>> > > >
>>> > > > Sarah,
>>> > > >
>>> > > >
>>> > > > I do not have the test you are asking for yet. I am looking at the
>>> > > > test suite and will start on Beagle soon.
>>> > > >
>>> > > >
>>> > > > Ketan
>>> > > >
>>> > > > On Apr 5, 2011, at 2:13 PM, Sarah Kenny wrote:
>>> > > >
>>> > > >
>>> > > > i'm currently working on a swift script to replicate the bug for
>>> > > > .92 which i will then commit to svn in the test suite. if you, mike,
>>> > > > or ketan already have this let me know (i'm trying to hack the
>>> > > > script jon posted to the list) and i'll use yours...david said he
>>> > > > doesn't have one.
>>> > > >
>>> > > > as i said, my plan was to test on ranger, abe and a couple of (uci)
>>> > > > local workstations.
>>> > > >
>>> > > > ~sk
>>> > > >
>>> > > >
>>> > > > On Tue, Apr 5, 2011 at 12:10 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>>> > > >
>>> > > >
>>> > > > David, Sarah, Ketan,
>>> > > >
>>> > > > Can you all report back to the devel list on your progress on
>>> > > > testing the release? I.e., what systems are you testing, and which
>>> > > > of those tests are complete? When will the rest be done, and hence
>>> > > > when are we ready to tag and release the fix?
>>> > > >
>>> > > > I asked who will create the test to confirm that the twice-each bug
>>> > > > is fixed, but no one responded. Which of the three of you feel you
>>> > > > know how to do this? Is this being tested in your new tests?
>>> > > >
>>> > > > Ketan tells me that in the 0.92+ interim release I made for Beagle
>>> > > > it looks like the resume feature is not working. I was aware that
>>> > > > such a bug was reported in trunk, but in the original 0.92 Cray
>>> > > > version (under /home/wilde/swift/rev) resume *was* working. Does the
>>> > > > test suite test the resume feature at the moment?
>>> > > >
>>> > > > Lastly, who will tag and upload the new release, remove or change
>>> > > > the red warning in the download page, and announce 0.92.1 on
>>> > > > swift-user?
>>> > > >
>>> > > > - Mike
>>> > > >
>>> > > > ----- Original Message -----
>>> > > > > Thanks, David. Please cc all discussion of this sort to
>>> > > > > swift-devel.
>>> > > > >
>>> > > > > I assume SVN is working for you now? (It was working for me, from
>>> > > > > communicadao, around 9AM this morning).
>>> > > > >
>>> > > > > - Mike
>>> > > > >
>>> > > > >
>>> > > > > ----- Original Message -----
>>> > > > > > It appears that there may be a problem with svn.ci.uchicago.edu.
>>> > > > > > I am unable to connect from an SVN client or through the web
>>> > > > > > interface - both attempts just hang indefinitely. I have sent an
>>> > > > > > email to support (ticket 12539), but just wanted to give you guys
>>> > > > > > a heads up that there may be an issue there. I will try to run
>>> > > > > > the tests again in the morning.
>>> > > > > >
>>> > > > > > David
>>> > > > > >
>>> > > > > >
>>> > > > > > On Mon, Apr 4, 2011 at 2:42 PM, Michael Wilde <wilde at mcs.anl.gov> wrote:
>>> > > > > >
>>> > > > > >
>>> > > > > > David, Sarah,
>>> > > > > >
>>> > > > > > How quickly could you re-divide the Swift site test plan between
>>> > > > > > you and confirm back to swift-devel that we are ready to tag and
>>> > > > > > release the branch as 0.92.1?
>>> > > > > >
>>> > > > > > Before we do that, you need to add a test to the test suite that
>>> > > > > > can replicate the twice-each bug and verify that it's detected in
>>> > > > > > 0.92 and corrected in 0.92.1.
>>> > > > > >
>>> > > > > > Can you possibly do this by noon tomorrow?
>>> > > > > >
>>> > > > > > Can you post a checklist of tests with names of who's going to
>>> > > > > > run them?
>>> > > > > >
>>> > > > > > Depending on what you can commit to, I will see if I, Ketan,
>>> > > > > > and/or Justin can help take various sites as well. I feel we
>>> > > > > > really need to do this quickly so we have a stable trusted
>>> > > > > > release out there.
>>> > > > > >
>>> > > > > >
>>> > > > > >
>>> > > > > > Thanks,
>>> > > > > >
>>> > > > > > Mike
>>> > > > > >
>>> > > > > > --
>>> > > > > > Michael Wilde
>>> > > > > > Computation Institute, University of Chicago
>>> > > > > > Mathematics and Computer Science Division
>>> > > > > > Argonne National Laboratory
>>> > > > >
>>> > > > > _______________________________________________
>>> > > > > Swift-devel mailing list
>>> > > > > Swift-devel at ci.uchicago.edu
>>> > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>> > > >
>>> > > > --
>>> > > > Any intelligent fool can make things bigger and more complex... It
>>> > > > takes a touch of genius - and a lot of courage to move in the
>>> > > > opposite
>>> > > > direction.
>>> > > > - Albert Einstein


-- 
Any intelligent fool can make things bigger and more complex... It takes a
touch of genius - and a lot of courage to move in the opposite direction.
- Albert Einstein