From chad at uchicago.edu Thu Nov 1 18:33:29 2007
From: chad at uchicago.edu (Chad Glendenin)
Date: Thu, 1 Nov 2007 18:33:29 -0500
Subject: [Swift-user] Running on teraport
Message-ID: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>
I just got an account on teraport, and I'm trying to see if I can run
a Swift 0.3 workflow from my laptop to teraport, but it's not
working. Right now, I'm just trying to run 'hostname' to verify that
it's running in the right place. I added this line to tc.data:
teraport hostname /bin/hostname INSTALLED INTEL32::LINUX null
but with tabs instead of spaces.
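One way to be sure the separators really are tabs is to generate the line with printf rather than typing it. This is only a minimal sketch using the same six fields as the entry above; the variable name tc_line is purely illustrative, and the resulting line would still have to be appended to your own tc.data.

```shell
#!/bin/sh
# Build the tc.data entry with literal tab characters between the six fields.
# tc_line is a hypothetical variable name used only for this illustration.
tc_line=$(printf 'teraport\thostname\t/bin/hostname\tINSTALLED\tINTEL32::LINUX\tnull')

# Print the entry; redirect or append this output to tc.data as needed.
printf '%s\n' "$tc_line"
```

Viewing the result with od -c shows each tab as \t, which makes stray spaces easy to spot.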
In sites.xml, I just uncommented the teraport entry and changed the
storage and work directories from Tibi's home directory to my own,
like this:
/home/chad/tmp/swift
The script is basically the same as "hello world," but with
'hostname' instead of 'echo'.
When I try to run it, I get the following:
Execution failed:
Missing argument minor for sys:element(url, storage, major,
minor, patch)
Is that a problem with the sites.xml entry? What am I forgetting?
Thanks,
ccg
From hategan at mcs.anl.gov Thu Nov 1 18:38:01 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 01 Nov 2007 18:38:01 -0500
Subject: [Swift-user] Running on teraport
In-Reply-To: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>
References: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>
Message-ID: <1193960281.9812.9.camel@blabla.mcs.anl.gov>
Chad,
Paste the whole sites.xml here or run with debug (-d) and paste the
output. Or both.
Mihael
On Thu, 2007-11-01 at 18:33 -0500, Chad Glendenin wrote:
> I just got an account on teraport, and I'm trying to see if I can run
> a Swift 0.3 workflow from my laptop to teraport, but it's not
> working. Right now, I'm just trying to run 'hostname' to verify that
> it's running in the right place. I added this line to tc.data:
>
> teraport hostname /bin/hostname INSTALLED INTEL32::LINUX null
>
> but with tabs instead of spaces.
>
> In sites.xml, I just uncommented the teraport entry and changed the
> storage and work directories from Tibi's home directory to my own,
> like this:
>
>
>
>
>
> /home/chad/tmp/swift
>
>
> The script is basically the same as "hello world," but with
> 'hostname' instead of 'echo'.
>
> When I try to run it, I get the following:
>
> Execution failed:
> Missing argument minor for sys:element(url, storage, major,
> minor, patch)
>
> Is that a problem with the sites.xml entry? What am I forgetting?
>
> Thanks,
> ccg
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
From chad at uchicago.edu Thu Nov 1 19:43:24 2007
From: chad at uchicago.edu (Chad Glendenin)
Date: Thu, 1 Nov 2007 19:43:24 -0500
Subject: [Swift-user] Running on teraport
In-Reply-To: <1193960281.9812.9.camel@blabla.mcs.anl.gov>
References: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>
<1193960281.9812.9.camel@blabla.mcs.anl.gov>
Message-ID:
Thanks for pointing out -d. That told me the line number of the
problem, which just turned out to be an easily fixed typo.
ccg
On Nov 1, 2007, at 6:38 PM, Mihael Hategan wrote:
> Chad,
>
> Paste the whole sites.xml here or run with debug (-d) and paste the
> output. Or both.
>
> Mihael
From deng at mcs.anl.gov Tue Nov 6 11:46:56 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Tue, 6 Nov 2007 11:46:56 -0600
Subject: [Swift-user] Job bundles
Message-ID:
Hi,
I am using Swift to run a workflow on login-abe.ncsa.teragrid.org at
NCSA. Abe is allocated on a per-node basis. Each node has 8
computing cores. My jobs are all serial. What happens is that only
one job runs, on one core, per node. Is there a way to bundle jobs so
that 8 of them
could run simultaneously on a node? I have tried to use
8
1
in the sites.xml file. But doing that seems to run the same job eight
times on a node.
Thanks,
Yuqing
From iraicu at cs.uchicago.edu Tue Nov 6 11:53:30 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 11:53:30 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References:
Message-ID: <4730AA1A.2020809@cs.uchicago.edu>
Hi,
I had a similar discussion with Nika, and here was the summary of what I
told her (in the context of GRAM4):
GRAM4 has two elements that dictate how many nodes and processes it gets
(in the XML RSL).
processCount
hostCount
The hostCount is the number of nodes you want, say 1. The processCount is
the number of processes that you want to start in total. To compute the
number of processes per node, you would simply take
processCount/hostCount. Now, the catch is that all the commands and
arguments to the particular GRAM4 call will have to be the same, so you
won't be able to tell GRAM that you want, say, 8 processes per node
but with a different process in each of those 8. You will have to
have this kind of logic internally in your app. If Swift works the way
I think it works, I don't think you will be able to use multiple
processors unless at least one of the following is true:
1) the application is already multi-threaded, and implicitly can use
multiple cores
2) the LRM allows the partitioning of the SMP machine into smaller
pieces; for example, with an 8-processor node, if it lets you submit 8 jobs
that each need only 1 processor, and it launches those 8 jobs on the
same node, then you are fine... the parallelism will be done
automatically by the LRM, as long as you ask for only 1 process at a
time; on the TG at least, I don't think this is how things work: when
you get a node, regardless of how many processors it has, you get
full access to all processors, not just the ones you asked for.
3) the bundling component in Swift somehow should be able to control how
many concurrent jobs it should perform; by default, I suppose it
serializes the entire bundle, but you could imagine having a parameter
that allows you to increase the parallelism if you know the application
is not CPU bound for example
Choices #1 and #2 are the easiest, as you don't have to do anything
special from Swift's point of view. Choice #3 requires that Swift
handle the parallelism. GRAM4 as far as I know will not handle this.
Maybe the Swift team can shed more light on option #3, if there is such
an option.
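To make option #3 a little more concrete, here is a rough sketch of what a node-level wrapper might do. Nothing in this thread says that Swift or GRAM4 provides this; run_bundle is a hypothetical helper that starts each serial command in the background and then waits for all of them, so 8 commands handed to it would run side by side on an 8-core node.

```shell
#!/bin/sh
# run_bundle CMD...: hypothetical helper, not part of Swift or GRAM4.
# Each argument is one complete shell command line. All of them are
# started in the background, then the function blocks until every one
# of them has exited.
run_bundle() {
    for cmd in "$@"; do
        sh -c "$cmd" &
    done
    wait
}

# Example call (the application name and file names are placeholders):
#   run_bundle './myapp in1 > out1' './myapp in2 > out2'
```

This starts everything at once and does not cap concurrency, so the caller would have to pass no more commands than the node has cores.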
Ioan
Yuqing Deng wrote:
> Hi,
>
> I am using swift to run workflow on login-abe.ncsa.teragrid.org at
> ncsa. Abe is allocated on node basis. Each of the node has 8
> computing cores. My jobs are all serial. What happens is that only
> one jobs runs on one core per node. It there a way to bundle jobs so
> that 8 of them
> could run simultaneously on a node? I have tried to use
> 8
> 1
> in the sites.xml file. But doing that seems to run the same job eight
> times on a node.
>
> Thanks,
>
> Yuqing
--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web: http://www.cs.uchicago.edu/~iraicu
http://dsl.cs.uchicago.edu/
============================================
============================================
From benc at hawaga.org.uk Tue Nov 6 12:15:06 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 18:15:06 +0000 (GMT)
Subject: [Swift-user] Job bundles
In-Reply-To: <4730AA1A.2020809@cs.uchicago.edu>
References:
<4730AA1A.2020809@cs.uchicago.edu>
Message-ID:
On Tue, 6 Nov 2007, Ioan Raicu wrote:
> 2) the LRM allows the partitioning of the SMP machine into smaller pieces; for
> example, with 8 processor node, if it lets you submit 8 jobs that only need 1
> processor, and it will launch 8 different jobs on the same node, then you are
> fine... the parallelism will be done automatically by the LRM, as long as you
> ask for only 1 process at a time; on the TG at least, I don't think this is
> how things work, and when you get a node, regardless of how many processors it
> has, you get full access to all processors, not just the ones you asked for.
PBS allows the specification of multiple processes per node, like this
(grabbed from google)
> qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
It looks like abe runs PBS.
So I think you could specify a globus profile key in the sites.xml,
perhaps something like this:
8
I haven't tried this myself, but I'd be interested to hear your results.
--
From iraicu at cs.uchicago.edu Tue Nov 6 12:26:02 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 12:26:02 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References:
<4730AA1A.2020809@cs.uchicago.edu>
Message-ID: <4730B1BA.4000404@cs.uchicago.edu>
Right, it's not that PBS doesn't support it, it's more of a policy thing.
On the TeraGrid, my experience has been that when PBS (or whatever LRM
is being used) allocates CPUs, it always allocates at the machine level,
not at the CPU level. That means, if you have an 8 processor machine,
and you get 1 processor on that machine, then you get (and are charged
for) the whole machine as you have exclusive rights to this machine for
the duration of your reservation. I have seen this behave differently
in other environments, such as TeraPort, where PBS was allocating at the
processor level, and not the machine level. This is why I said that I
think Swift would need to somehow handle this at the worker node
scripts, and not rely necessarily on the LRM doing this.
Ioan
Ben Clifford wrote:
> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>
>
>> 2) the LRM allows the partitioning of the SMP machine into smaller pieces; for
>> example, with 8 processor node, if it lets you submit 8 jobs that only need 1
>> processor, and it will launch 8 different jobs on the same node, then you are
>> fine... the parallelism will be done automatically by the LRM, as long as you
>> ask for only 1 process at a time; on the TG at least, I don't think this is
>> how things work, and when you get a node, regardless of how many processors it
>> has, you get full access to all processors, not just the ones you asked for.
>>
>
>
> PBS allows the specification of multiple processes per node, like this
> (grabbed from google)
>
>
>> qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
>>
>
> It looks like abe runs PBS.
>
> So I think you could specify a globus profile key in the sites.xml,
> perhaps something like this:
>
> 8
>
> I haven't tried this myself, but I'd be interested to hear your results.
>
From benc at hawaga.org.uk Tue Nov 6 12:29:53 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 18:29:53 +0000 (GMT)
Subject: [Swift-user] Job bundles
In-Reply-To: <4730B1BA.4000404@cs.uchicago.edu>
References:
<4730AA1A.2020809@cs.uchicago.edu>
<4730B1BA.4000404@cs.uchicago.edu>
Message-ID:
That's what the ppn parameter specifies to PBS.
On Tue, 6 Nov 2007, Ioan Raicu wrote:
> Right, its not that PBS doesn't support it, its more of a policy thing. On
> the TeraGrid, my experience has been that when PBS (or whatever LRM is being
> used) allocates CPUs, it always allocates at the machine level, not at the CPU
> level. That means, if you have an 8 processor machine, and you get 1
> processor on that machine, then you get (and are charged for) the whole
> machine as you have exclusive rights to this machine for the duration of your
> reservation. I have seen this behave differently in other environments, such
> as TeraPort, where PBS was allocating at the processor level, and not the
> machine level. This is why I said that I think Swift would need to somehow
> handle this at the worker node scripts, and not rely necessarily on the LRM
> doing this.
> Ioan
From iraicu at cs.uchicago.edu Tue Nov 6 12:36:48 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 12:36:48 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References:
<4730AA1A.2020809@cs.uchicago.edu>
<4730B1BA.4000404@cs.uchicago.edu>
Message-ID: <4730B440.8040602@cs.uchicago.edu>
Here is what I get at the UC/ANL TG site:
qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
iraicu@tg-viz-login2:~> showq -u iraicu
active jobs------------------------
JOBID     USERNAME  STATE    PROCS  REMAINING  STARTTIME
1574623   iraicu    Running  2      00:29:55   Tue Nov 6 12:34:23
1574621   iraicu    Running  2      00:29:21   Tue Nov 6 12:33:49
2 active jobs    4 of 242 processors in use by local jobs (1.65%)
                 20 of 121 nodes active (16.53%)
eligible jobs----------------------
JOBID     USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME
0 eligible jobs
blocked jobs-----------------------
JOBID     USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME
0 blocked jobs
Total jobs: 2
Notice that both jobs have 2 processors allocated! These same commands
on TeraPort would have yielded one allocation with 1 processor and
another with 2 processors. This is what I meant by "it's a policy thing":
PBS can be configured to ignore the ppn field.
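Another way to check what a job actually received is to look at the node file from inside the job. The sketch below assumes a Torque/PBS-style node file, with one line per allocated processor slot, whose path PBS puts in $PBS_NODEFILE; slots_per_host is a hypothetical helper name, and what the file contains at a site that ignores ppn is not something I have verified.

```shell
#!/bin/sh
# slots_per_host FILE: hypothetical helper. Prints "<count> <hostname>"
# for each host named in a PBS-style node file, assuming every line of
# the file names one allocated processor slot.
slots_per_host() {
    sort "$1" | uniq -c | awk '{ print $1, $2 }'
}

# Inside a PBS job this might be called as:
#   slots_per_host "$PBS_NODEFILE"
```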
Ioan
Ben Clifford wrote:
> That's what the ppn parameter specifies to PBS.
From benc at hawaga.org.uk Tue Nov 6 12:57:46 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 18:57:46 +0000 (GMT)
Subject: [Swift-user] Job bundles
In-Reply-To: <4730B440.8040602@cs.uchicago.edu>
References:
<4730AA1A.2020809@cs.uchicago.edu>
<4730B1BA.4000404@cs.uchicago.edu>
<4730B440.8040602@cs.uchicago.edu>
Message-ID:
Yeah, I see the same, though the TG UC docs suggest it should work.
I can't log into abe to see what happens there but it would be interesting
to know.
On Tue, 6 Nov 2007, Ioan Raicu wrote:
> Here is what I get at the UC/ANL TG site:
> qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
> qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
>
> iraicu at tg-viz-login2:~> showq -u iraicu
>
> active jobs------------------------
> JOBID USERNAME STATE PROCS REMAINING STARTTIME
>
> 1574623 iraicu Running 2 00:29:55 Tue Nov 6 12:34:23
> 1574621 iraicu Running 2 00:29:21 Tue Nov 6 12:33:49
>
> 2 active jobs 4 of 242 processors in use by local jobs (1.65%)
> 20 of 121 nodes active (16.53%)
>
> eligible jobs----------------------
> JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
>
>
> 0 eligible jobs
> blocked jobs-----------------------
> JOBID USERNAME STATE PROCS WCLIMIT QUEUETIME
>
>
> 0 blocked jobs
> Total jobs: 2
>
> Notice that both jobs have 2 processors allocated! These same commands on
> TeraPort would have yielded one allocation with 1 processor and another with 2
> processors. This is what I meant by "it a policy thing", because PBS can be
> configured to ignore the ppn field.
>
> Ioan
From iraicu at cs.uchicago.edu Tue Nov 6 13:10:16 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 13:10:16 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References:
<4730AA1A.2020809@cs.uchicago.edu>
<4730B1BA.4000404@cs.uchicago.edu>
<4730B440.8040602@cs.uchicago.edu>
Message-ID: <4730BC18.30204@cs.uchicago.edu>
If the docs say that PBS should support this option, maybe write help at tg
to ask them why it doesn't work as the docs say.
Ioan
Ben Clifford wrote:
> yeah, I see same. though the TG UC docs suggest it should work.
>
> I can't log into abe to see what happens there but it would be interesting
> to know.
From deng at mcs.anl.gov Tue Nov 6 13:35:46 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Tue, 6 Nov 2007 13:35:46 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References:
<4730AA1A.2020809@cs.uchicago.edu>
Message-ID:
On 11/6/07, Ben Clifford wrote:
>
>
> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>
> > 2) the LRM allows the partitioning of the SMP machine into smaller pieces; for
> > example, with 8 processor node, if it lets you submit 8 jobs that only need 1
> > processor, and it will launch 8 different jobs on the same node, then you are
> > fine... the parallelism will be done automatically by the LRM, as long as you
> > ask for only 1 process at a time; on the TG at least, I don't think this is
> > how things work, and when you get a node, regardless of how many processors it
> > has, you get full access to all processors, not just the ones you asked for.
>
>
> PBS allows the specification of multiple processes per node, like this
> (grabbed from google)
>
> > qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
>
> It looks like abe runs PBS.
>
I just tried it. The jobs are scheduled on different nodes on abe.
> So I think you could specify a globus profile key in the sites.xml,
> perhaps something like this:
>
> 8
>
I tried last Wednesday and got some really strange error message.
I think the correct way to set ppn number is to use the count rsl key word.
Yuqing
From deng at mcs.anl.gov Tue Nov 6 15:26:26 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Tue, 6 Nov 2007 15:26:26 -0600
Subject: [Swift-user] problem with purdue condor pool
Message-ID:
Hi, here's another problem:
The following command works at purdue:
globusrun -o -r tg-gatekeeper.purdue.teragrid.org/jobmanager-condor
'&(executable=/bin/hostname) (jobtype=single)'
However, when I use swift with the following in sites.xml
I get this (full error in attachment):
MolDyn-1-loops-20071106-1520-gzzclbaa
Caused by: Cannot submit job: The AXIS engine could not find a target
service to invoke! targetService is null
Swift finished - workflow had errors
The same thing happens with jobmanager-pbs at purdue. Only
jobmanager-fork works with swift, but there is no problem with
globusrun.
Thanks,
Yuqing
-------------- next part --------------
A non-text attachment was scrubbed...
Name: error
Type: application/octet-stream
Size: 17491 bytes
Desc: not available
URL:
From benc at hawaga.org.uk Tue Nov 6 15:52:38 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 21:52:38 +0000 (GMT)
Subject: [Swift-user] problem with purdue condor pool
In-Reply-To:
References:
Message-ID:
On Tue, 6 Nov 2007, Yuqing Deng wrote:
> The following command works at purdue:
> globusrun -o -r tg-gatekeeper.purdue.teragrid.org/jobmanager-condor
> '&(executable=/bin/hostname) (jobtype=single)'
That is using GRAM version 2.
> However, when I use swift with the following in site.xml
> url="tg-gatekeeper.purdue.teragrid.org/jobmanager-condor" major="4"
> minor="0" />
That tells it to use GRAM v4. That is a totally different piece of
software.
Change major=4 to major=2 to tell it to use GRAM v2.
--
From deng at mcs.anl.gov Wed Nov 7 10:36:48 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Wed, 7 Nov 2007 10:36:48 -0600
Subject: [Swift-user] problem with purdue condor pool
In-Reply-To:
References:
Message-ID:
Thanks. GRAM version 2 works with purdue condor pool.
They have three major kinds of machines in the pool:
Linux/X86_64, Linux/X86_32 and WINNT51/INTEL.
I only tested Linux/X86_64 but Linux/X86_32 should work too.
How do I use them all? Just add a separate entry to the tc.data file
for the apps built for each OS/ARCH?
For WINNT, I need to build the apps first.
Yuqing
On 11/6/07, Ben Clifford wrote:
>
> On Tue, 6 Nov 2007, Yuqing Deng wrote:
> > The following command works at purdue:
> > globusrun -o -r tg-gatekeeper.purdue.teragrid.org/jobmanager-condor
> > '&(executable=/bin/hostname) (jobtype=single)'
>
> That is using GRAM version 2.
>
> > However, when I use swift with the following in site.xml
> > > url="tg-gatekeeper.purdue.teragrid.org/jobmanager-condor" major="4"
> > minor="0" />
>
> That tells it to use GRAM v4. That is a totally different piece of
> software.
>
> Change major=4 to major=2 to tell it to use GRAM v2.
>
> --
>
>
From benc at hawaga.org.uk Wed Nov 7 11:20:24 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 7 Nov 2007 17:20:24 +0000 (GMT)
Subject: [Swift-user] problem with purdue condor pool
In-Reply-To:
References:
Message-ID:
On Wed, 7 Nov 2007, Yuqing Deng wrote:
> They have three major different kinds machines in the pool:
> Linux/X86_64, Linux/X86_32 and WINNT51/INTEL.
> I only tested Linux/X86_64 but Linux/X86_32 should work too.
> How do I use them all? Just give a different entry in to tc.data file
> for apps built with different OS/ARCH?
>
> For WINNT, I need to build the apps first.
I think that WINNT won't work because there is no Windows version of our
worker script (libexec/wrapper.sh)
For the linux machines, there are two different ways you could go:
i) multiple-site model:
define two sites, one called purdue-64 and one called purdue-32.
for purdue-64, specify a profile entry in sites.xml that restricts that
site to only 64 bit nodes; and for purdue-32, specify a profile entry that
restricts that site to only 32 bit nodes.
See http://www.purdue.teragrid.org/content/view/11/25/
I think maybe something like:
then specify tc.data entries with 64bit binaries for purdue-64 and
32 bit binaries for purdue-32.
Swift will treat the 64bit and 32bit pieces of the condor pool as two
separate sites. This will have disadvantages for rate control and file
staging but will allow you to have some executables compiled for only 64
bit, some for only 32 bit and some for both.
ii) one site model:
Make a script to replace each application. For example, replace myapp with
a script that looks at the architecture and chooses which executable to
run. Then point your tc.data file at that script, instead of at your
application code. Swift will send jobs to the condor pool, and when a job
starts running on a particular worker node, the script will decide which
is the correct executable to use.
This is better for file staging and site scoring, because it treats the
site as one site; but means that you have to have each application
available in both 32 and 64 bits; that may or may not be a problem,
depending on what you are running.
Here's an example script I got from mike's home directory for choosing
between two programs based on architecture:
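#!/bin/sh
# Count the lines of `uname -a` output that mention ia64 (0 or 1).
ARCH_TEST=`uname -a | grep -c ia64`
if [ "$ARCH_TEST" -eq 0 ]; then # i686
    # "$@" keeps each argument intact, even if it contains spaces
    /home/wilde/pegasus/src/tools/kickstart/kickstart.i686 "$@"
elif [ "$ARCH_TEST" -eq 1 ]; then # ia64
    /home/wilde/pegasus/src/tools/kickstart/kickstart.ia64 "$@"
fi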
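
The one-site model above could be adapted to the x86_64 / x86_32 split in
the purdue pool roughly as follows. This is only a sketch under
assumptions: APP_DIR and the myapp.x86_64 / myapp.i686 file names are
hypothetical, and it assumes `uname -m` prints x86_64 on the 64 bit nodes
and i386..i686 on the 32 bit ones.

```shell
#!/bin/sh
# Hypothetical install location; adjust to wherever the two builds live.
APP_DIR=${APP_DIR:-/path/to/myapp/bin}

# Map a machine name (as printed by `uname -m`) to a build suffix.
# Returns non-zero for architectures we have no build for.
pick_build() {
    case "$1" in
        x86_64)              echo x86_64 ;;
        i386|i486|i586|i686) echo i686 ;;
        *)                   return 1 ;;
    esac
}

# Run the matching build, passing every argument through unchanged.
run_myapp() {
    build=$(pick_build "$(uname -m)") || {
        echo "unsupported architecture: $(uname -m)" >&2
        return 1
    }
    exec "$APP_DIR/myapp.$build" "$@"
}

# In a real wrapper script the last line would be:  run_myapp "$@"
```

tc.data would then point at this wrapper instead of at either binary.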
--
From wilde at mcs.anl.gov Thu Nov 8 12:20:28 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 08 Nov 2007 12:20:28 -0600
Subject: [Swift-user] Passing strings with blanks to apps in swift
Message-ID: <4733536C.3080701@mcs.anl.gov>
In angle i need to pass a string of IDs as a single parameter to an
application:
angle "7171717 76 76" --more stuff --here
Can you point me to an example of how to pass/quote this correctly, so
that the command line is invoked exactly as above (in this case with
argc=5, and argv[1]="7171717 76 76")?
Thanks.
From benc at hawaga.org.uk Thu Nov 8 12:29:29 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 8 Nov 2007 18:29:29 +0000 (GMT)
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To: <4733536C.3080701@mcs.anl.gov>
References: <4733536C.3080701@mcs.anl.gov>
Message-ID:
My initial answer would be "don't do that". Almost anything in unix gets
screwed by spaces. I know there are definitely problems in some places in
the swift/globus stack with doing this.
On Thu, 8 Nov 2007, Michael Wilde wrote:
> In angle i need to pass a string of IDs as a single parameter to an
> application:
> angle "7171717 76 76" --more stuff --here
>
> Can you point me to an example of how to pass/quote this correctly, so that
> the command line is invoked exactly as above (in this case with argc=5, and
> argv[1]="7171717 76 76" ?
>
> Thanks.
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
>
From hategan at mcs.anl.gov Thu Nov 8 12:43:08 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 08 Nov 2007 12:43:08 -0600
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To: <4733536C.3080701@mcs.anl.gov>
References: <4733536C.3080701@mcs.anl.gov>
Message-ID: <1194547388.11817.0.camel@blabla.mcs.anl.gov>
If you put the string in quotes it should work. One string is one
element in argv.
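
The "one string is one element in argv" point can be seen in plain sh.
This is not Swift syntax, only a local illustration; show_args is a
made-up helper. Note that the shell's $# does not count the program
name, so the C-level argc=5 in the question corresponds to $#=4 here.

```shell
#!/bin/sh
# Print how many arguments were received, then each one on its own line.
show_args() {
    echo "argc=$#"
    for arg in "$@"; do
        echo "[$arg]"
    done
}

ids="7171717 76 76"

# Quoted: the IDs arrive as a single argument.
quoted=$(show_args "$ids" --more stuff --here)

# Unquoted: the shell splits the IDs into three separate arguments.
unquoted=$(show_args $ids --more stuff --here)

echo "$quoted"
echo "$unquoted"
```

The first report starts with argc=4 and shows [7171717 76 76] as one
element; the second starts with argc=6.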
From benc at hawaga.org.uk Thu Nov 8 12:56:46 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 8 Nov 2007 18:56:46 +0000 (GMT)
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To: <1194547388.11817.0.camel@blabla.mcs.anl.gov>
References: <4733536C.3080701@mcs.anl.gov>
<1194547388.11817.0.camel@blabla.mcs.anl.gov>
Message-ID:
On Thu, 8 Nov 2007, Mihael Hategan wrote:
> If you put the string in quotes it should work. One string is one
> element in argv.
some of the GRAM2 jobmanagers don't deal with that - I know condor
doesn't.
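
One possible workaround, which nobody in this thread proposed and which
is only a sketch: pass the IDs joined by a separator that survives the
job manager, and let a small wrapper on the worker node turn it back
into spaces before starting the application. The comma separator and the
decode_ids / run_angle names are assumptions made for illustration.

```shell
#!/bin/sh
# Turn "7171717,76,76" back into "7171717 76 76".
# The comma separator is an arbitrary, hypothetical choice.
decode_ids() {
    printf '%s\n' "$1" | tr ',' ' '
}

# Hypothetical wrapper body: decode the first argument, pass the rest
# through untouched, and hand everything to the real application.
run_angle() {
    ids=$(decode_ids "$1")
    shift
    angle "$ids" "$@"
}
```

With this, the job would be submitted as `7171717,76,76 --more stuff --here`
and the application would still see the IDs as one argument with blanks.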
From hategan at mcs.anl.gov Thu Nov 8 13:03:10 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 08 Nov 2007 13:03:10 -0600
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To:
References: <4733536C.3080701@mcs.anl.gov>
<1194547388.11817.0.camel@blabla.mcs.anl.gov>
Message-ID: <1194548590.13231.0.camel@blabla.mcs.anl.gov>
On Thu, 2007-11-08 at 18:56 +0000, Ben Clifford wrote:
>
> On Thu, 8 Nov 2007, Mihael Hategan wrote:
>
> > If you put the string in quotes it should work. One string is one
> > element in argv.
>
> some of the GRAM2 jobmanagers don't deal with that - I know condor
> doesn't.
Don't use the condor job manager then.
From hategan at mcs.anl.gov Thu Nov 8 13:06:16 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 08 Nov 2007 13:06:16 -0600
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To: <1194548590.13231.0.camel@blabla.mcs.anl.gov>
References: <4733536C.3080701@mcs.anl.gov>
<1194547388.11817.0.camel@blabla.mcs.anl.gov>
<1194548590.13231.0.camel@blabla.mcs.anl.gov>
Message-ID: <1194548776.13602.1.camel@blabla.mcs.anl.gov>
On Thu, 2007-11-08 at 13:03 -0600, Mihael Hategan wrote:
> On Thu, 2007-11-08 at 18:56 +0000, Ben Clifford wrote:
> >
> > On Thu, 8 Nov 2007, Mihael Hategan wrote:
> >
> > > If you put the string in quotes it should work. One string is one
> > > element in argv.
> >
> > some of the GRAM2 jobmanagers don't deal with that - I know condor
> > doesn't.
>
> Don't use the condor job manager then.
Actually file an enhancement request with CoG to quote things when the
condor job manager is used explicitly. Waiting for it to be fixed in
GRAM 2 and then deployed on sites is silly.
Mihael
From benc at hawaga.org.uk Thu Nov 8 13:08:43 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 8 Nov 2007 19:08:43 +0000 (GMT)
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To: <1194548776.13602.1.camel@blabla.mcs.anl.gov>
References: <4733536C.3080701@mcs.anl.gov>
<1194547388.11817.0.camel@blabla.mcs.anl.gov>
<1194548590.13231.0.camel@blabla.mcs.anl.gov>
<1194548776.13602.1.camel@blabla.mcs.anl.gov>
Message-ID:
or don't use spaces, just like any other time in unix.
From hategan at mcs.anl.gov Thu Nov 8 13:26:26 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 08 Nov 2007 13:26:26 -0600
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To:
References: <4733536C.3080701@mcs.anl.gov>
<1194547388.11817.0.camel@blabla.mcs.anl.gov>
<1194548590.13231.0.camel@blabla.mcs.anl.gov>
<1194548776.13602.1.camel@blabla.mcs.anl.gov>
Message-ID: <1194549987.15077.2.camel@blabla.mcs.anl.gov>
File the report. "Don't use spaces" does not nicely encapsulate a
reproducible solution. Sooner or later others will run into this problem
despite any "don't use spaces" we put in the documentation, simply
because the decision to not use spaces belongs to the application domain
not those who bridge Swift with the applications.
From wilde at mcs.anl.gov Mon Nov 12 17:57:58 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 12 Nov 2007 17:57:58 -0600
Subject: [Swift-user] Questions on sites.xml entries
Message-ID: <4738E886.6050308@mcs.anl.gov>
In sites.xml:
Is storage used, or can it be set to null or some filler value?
Are major and minor used? If so, is there a way to determine the correct
setting for a given server, over the net?
major=2 => use pre-ws-gram
major=4 => use ws-gram
minor => ignored
correct?
From benc at hawaga.org.uk Mon Nov 12 17:59:08 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 12 Nov 2007 23:59:08 +0000 (GMT)
Subject: [Swift-user] Questions on sites.xml entries
In-Reply-To: <4738E886.6050308@mcs.anl.gov>
References: <4738E886.6050308@mcs.anl.gov>
Message-ID:
On Mon, 12 Nov 2007, Michael Wilde wrote:
> Is storage used, or can it be set to null or some filler value?
it isn't used as far as I'm aware.
> Is major and minor used? If so there a way to determine the correct setting
> for a given server, over the net?
I don't think so.
> url="tg-login1.sdsc.teragrid.org:2119/jobmanager-pbs"
> major="2" minor="4"/>
>
> major=2 => use pre-ws-gram
> major=4 => use ws-gram
> minor => ignored
>
> correct?
yes.
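
The mapping confirmed above, written out as a small sh helper. The
function name gram_flavor is made up for illustration and is not part of
Swift; values other than 2 and 4 were not discussed in this thread, so
the sketch just reports them as unknown.

```shell
#!/bin/sh
# Translate the "major" attribute of a sites.xml jobmanager entry into
# the GRAM flavour it selects. "minor" is ignored, so it is not a parameter.
gram_flavor() {
    case "$1" in
        2) echo pre-ws-gram ;;
        4) echo ws-gram ;;
        *) echo unknown; return 1 ;;
    esac
}
```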
--
From skenny at uchicago.edu Wed Nov 14 14:11:28 2007
From: skenny at uchicago.edu (skenny at uchicago.edu)
Date: Wed, 14 Nov 2007 14:11:28 -0600 (CST)
Subject: [Swift-user] no registered callback handler
Message-ID: <20071114141128.AWA04405@m4500-02.uchicago.edu>
hi all, i'm getting this error regardless of the site that i
submit to (i've tried uc/anl and teraport). initially i was
trying my own script but then tried 'hello world' and am
getting the same thing...
however, when i run my own script it does seem to get as far
as transferring the input file to the remote site; but then
fails on trying to run the actual job.
any ideas?
RunID: 20071114-1407-g84ac350
echo started
2007.11.14 14:07:21.795 CST: [ERROR] Parsing profiles on line
187 Illegal character ' 'at position 65 :Illegal character ' '
2007.11.14 14:07:21.798 CST: [ERROR] Parsing profiles on line
212 Illegal character ' 'at position 5 :Illegal character ' '
2007.11.14 14:07:21.806 CST: [ERROR] Parsing profiles on line
248 Illegal character ' 'at position 5 :Illegal character ' '
2007.11.14 14:07:21.807 CST: [ERROR] Parsing profiles on line
273 Illegal character ' 'at position 5 :Illegal character ' '
Failed to clean up job
java.lang.IllegalStateException: No registered callback handler for org.globus.gsi.gssapi.GlobusGSSCredentialImpl@1fc0f04
    at org.globus.cog.abstraction.impl.execution.gt2.CallbackHandlerManager.decreaseUsageCount(CallbackHandlerManager.java:33)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.cleanup(JobSubmissionTaskHandler.java:482)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:148)
    at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
    at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
    at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
    at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
    at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
    at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
    at java.lang.Thread.run(Thread.java:595)
From wilde at mcs.anl.gov Wed Nov 14 15:17:36 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 14 Nov 2007 15:17:36 -0600
Subject: [Swift-user] no registered callback handler
In-Reply-To: <20071114141128.AWA04405@m4500-02.uchicago.edu>
References: <20071114141128.AWA04405@m4500-02.uchicago.edu>
Message-ID: <473B65F0.9000702@mcs.anl.gov>
looks to me like possible errors in tc.data may be causing the initial
illegal char messages. if you used maxwalltime= did you put 00:20:00
values in double-quotes: "00:30" say?
not sure if this is causing the later gssapi message.
send your sites.xml and tc.data file for a closer look
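
To see what is actually at the reported positions, a line can be printed
with its whitespace made visible. This is a sketch: show_line is a
made-up helper, and which file the "Parsing profiles on line 187"
messages refer to is a guess (tc.data is assumed here).

```shell
#!/bin/sh
# Print one line of a file in sed's unambiguous form:
# tabs appear as \t and the end of the line is marked with $.
# (Long lines are folded by sed, each fold ending in a backslash.)
show_line() {
    file=$1
    lineno=$2
    sed -n "${lineno}l" "$file"
}

# Example:  show_line tc.data 187
```

A blank where a tab is expected will then show up as a plain space
instead of \t.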
From benc at hawaga.org.uk Wed Nov 14 16:00:03 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 14 Nov 2007 22:00:03 +0000 (GMT)
Subject: [Swift-user] no registered callback handler
In-Reply-To: <473B65F0.9000702@mcs.anl.gov>
References: <20071114141128.AWA04405@m4500-02.uchicago.edu>
<473B65F0.9000702@mcs.anl.gov>
Message-ID:
On Wed, 14 Nov 2007, Michael Wilde wrote:
> looks to me like possible errors in tc.data may be causing the initial illegal
> char messages. if you used maxwalltime= did you put 00:20:00 values in
> double-quotes: "00:30" say?
>
> not sure if this is causing the later gssapi message.
I think they are probably unrelated.
--
From skenny at uchicago.edu Wed Nov 14 16:07:43 2007
From: skenny at uchicago.edu (skenny at uchicago.edu)
Date: Wed, 14 Nov 2007 16:07:43 -0600 (CST)
Subject: [Swift-user] no registered callback handler
Message-ID: <20071114160743.AWA25191@m4500-02.uchicago.edu>
here's my sites file for uc/anl teragrid:
/app/osg_app
/home/skenny/data
/tmp
/tmp
osg
120
ia32-compute
/home/skenny/sidgrid_out
and here is my entry in tc.data for each of the 2 scripts i'm
testing on:
ANLUCTERAGRID32 echo /bin/echo INSTALLED INTEL32::LINUX null
UCTERAPORT ffmpeg_sh /gpfs1/osg_data/sidgrid_tools/transcode/bin/ffmpeg_sh INSTALLED INTEL64::LINUX null