From chad at uchicago.edu Thu Nov 1 18:33:29 2007
From: chad at uchicago.edu (Chad Glendenin)
Date: Thu, 1 Nov 2007 18:33:29 -0500
Subject: [Swift-user] Running on teraport
Message-ID: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>

I just got an account on teraport, and I'm trying to see if I can run
a Swift 0.3 workflow from my laptop to teraport, but it's not
working. Right now, I'm just trying to run 'hostname' to verify that
it's running in the right place. I added this line to tc.data:

teraport hostname /bin/hostname INSTALLED INTEL32::LINUX null

but with tabs instead of spaces.

In sites.xml, I just uncommented the teraport entry and changed the
storage and work directories from Tibi's home directory to my own,
like this:

/home/chad/tmp/swift

The script is basically the same as "hello world," but with
'hostname' instead of 'echo'.

When I try to run it, I get the following:

Execution failed:
Missing argument minor for sys:element(url, storage, major, minor, patch)

Is that a problem with the sites.xml entry? What am I forgetting?

Thanks,
ccg

From hategan at mcs.anl.gov Thu Nov 1 18:38:01 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 01 Nov 2007 18:38:01 -0500
Subject: [Swift-user] Running on teraport
In-Reply-To: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>
References: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>
Message-ID: <1193960281.9812.9.camel@blabla.mcs.anl.gov>

Chad,

Paste the whole sites.xml here or run with debug (-d) and paste the
output. Or both.

Mihael

On Thu, 2007-11-01 at 18:33 -0500, Chad Glendenin wrote:
> I just got an account on teraport, and I'm trying to see if I can run
> a Swift 0.3 workflow from my laptop to teraport, but it's not
> working. Right now, I'm just trying to run 'hostname' to verify that
> it's running in the right place. I added this line to tc.data:
>
> teraport hostname /bin/hostname INSTALLED INTEL32::LINUX null
>
> but with tabs instead of spaces.
>
> In sites.xml, I just uncommented the teraport entry and changed the
> storage and work directories from Tibi's home directory to my own,
> like this:
>
> /home/chad/tmp/swift
>
> The script is basically the same as "hello world," but with
> 'hostname' instead of 'echo'.
>
> When I try to run it, I get the following:
>
> Execution failed:
> Missing argument minor for sys:element(url, storage, major,
> minor, patch)
>
> Is that a problem with the sites.xml entry? What am I forgetting?
>
> Thanks,
> ccg
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>

From chad at uchicago.edu Thu Nov 1 19:43:24 2007
From: chad at uchicago.edu (Chad Glendenin)
Date: Thu, 1 Nov 2007 19:43:24 -0500
Subject: [Swift-user] Running on teraport
In-Reply-To: <1193960281.9812.9.camel@blabla.mcs.anl.gov>
References: <7D0F4CC1-073B-4A29-B962-A52CB7AA9557@uchicago.edu>
 <1193960281.9812.9.camel@blabla.mcs.anl.gov>
Message-ID:

Thanks for pointing out -d. That told me the line number of the
problem, which just turned out to be an easily fixed typo.

ccg

On Nov 1, 2007, at 6:38 PM, Mihael Hategan wrote:

> Chad,
>
> Paste the whole sites.xml here or run with debug (-d) and paste the
> output. Or both.
>
> Mihael
>
> On Thu, 2007-11-01 at 18:33 -0500, Chad Glendenin wrote:
>> I just got an account on teraport, and I'm trying to see if I can run
>> a Swift 0.3 workflow from my laptop to teraport, but it's not
>> working. Right now, I'm just trying to run 'hostname' to verify that
>> it's running in the right place. I added this line to tc.data:
>>
>> teraport hostname /bin/hostname INSTALLED INTEL32::LINUX null
>>
>> but with tabs instead of spaces.
>>
>> In sites.xml, I just uncommented the teraport entry and changed the
>> storage and work directories from Tibi's home directory to my own,
>> like this:
>>
>> /home/chad/tmp/swift
>>
>> The script is basically the same as "hello world," but with
>> 'hostname' instead of 'echo'.
>>
>> When I try to run it, I get the following:
>>
>> Execution failed:
>> Missing argument minor for sys:element(url, storage, major,
>> minor, patch)
>>
>> Is that a problem with the sites.xml entry? What am I forgetting?
>>
>> Thanks,
>> ccg
>>
>

From deng at mcs.anl.gov Tue Nov 6 11:46:56 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Tue, 6 Nov 2007 11:46:56 -0600
Subject: [Swift-user] Job bundles
Message-ID:

Hi,

I am using swift to run a workflow on login-abe.ncsa.teragrid.org at
ncsa. Abe is allocated on a per-node basis. Each node has 8
computing cores. My jobs are all serial. What happens is that only
one job runs on one core per node. Is there a way to bundle jobs so
that 8 of them could run simultaneously on a node? I have tried to use
8
1
in the sites.xml file. But doing that seems to run the same job eight
times on a node.

Thanks,

Yuqing

From iraicu at cs.uchicago.edu Tue Nov 6 11:53:30 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 11:53:30 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References:
Message-ID: <4730AA1A.2020809@cs.uchicago.edu>

Hi,

I had a similar discussion with Nika, and here is the summary of what I
told her (in the context of GRAM4):

GRAM4 has two elements that dictate how many nodes and processes it gets
(in the XML RSL):

processCount
hostCount

The hostCount is the number of nodes you want, say 1. The processCount is
the number of processes that you want to start in total.
To compute the number of processes per node, you would simply take
processCount/hostCount.

Now, the catch is that all the commands and arguments to the particular
GRAM4 call will have to be the same, so you won't be able to tell GRAM
that you want, say, 8 processes per node but a different process for each
of those 8. You will have to have this kind of logic internally in your
app.

If Swift works the way I think it works, I don't think you will be able
to use multiple processors unless at least one of the following is true:

1) the application is already multi-threaded, and implicitly can use
   multiple cores

2) the LRM allows the partitioning of the SMP machine into smaller
   pieces; for example, with an 8-processor node, if it lets you submit 8
   jobs that only need 1 processor, and it will launch 8 different jobs
   on the same node, then you are fine... the parallelism will be done
   automatically by the LRM, as long as you ask for only 1 process at a
   time; on the TG at least, I don't think this is how things work, and
   when you get a node, regardless of how many processors it has, you get
   full access to all processors, not just the ones you asked for.

3) the bundling component in Swift somehow should be able to control how
   many concurrent jobs it should perform; by default, I suppose it
   serializes the entire bundle, but you could imagine having a parameter
   that allows you to increase the parallelism if you know the
   application is not CPU bound, for example

Choices #1 and #2 are the easiest, as you don't have to do anything
special from Swift's point of view. Choice #3 requires that Swift handle
the parallelism. GRAM4, as far as I know, will not handle this. Maybe the
Swift team can shed more light on option #3, if there is such an option.

Ioan

Yuqing Deng wrote:
> Hi,
>
> I am using swift to run workflow on login-abe.ncsa.teragrid.org at
> ncsa. Abe is allocated on node basis. Each of the node has 8
> computing cores. My jobs are all serial.
> What happens is that only
> one jobs runs on one core per node. It there a way to bundle jobs so
> that 8 of them
> could run simultaneously on a node? I have tried to use
> 8
> 1
> in the sites.xml file. But doing that seems to run the same job eight
> times on a node.
>
> Thanks,
>
> Yuqing
>

--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E. 58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================

From benc at hawaga.org.uk Tue Nov 6 12:15:06 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 18:15:06 +0000 (GMT)
Subject: [Swift-user] Job bundles
In-Reply-To: <4730AA1A.2020809@cs.uchicago.edu>
References: <4730AA1A.2020809@cs.uchicago.edu>
Message-ID:

On Tue, 6 Nov 2007, Ioan Raicu wrote:

> 2) the LRM allows the partitioning of the SMP machine into smaller pieces; for
> example, with 8 processor node, if it lets you submit 8 jobs that only need 1
> processor, and it will launch 8 different jobs on the same node, then you are
> fine... the parallelism will be done automatically by the LRM, as long as you
> ask for only 1 process at a time; on the TG at least, I don't think this is
> how things work, and when you get a node, regardless of how many processors it
> has, you get full access to all processors, not just the ones you asked for.

PBS allows the specification of multiple processes per node, like this
(grabbed from google)

> qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs

It looks like abe runs PBS.
So I think you could specify a globus profile key in the sites.xml,
perhaps something like this:

8

I haven't tried this myself, but I'd be interested to hear your results.

--

From iraicu at cs.uchicago.edu Tue Nov 6 12:26:02 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 12:26:02 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References: <4730AA1A.2020809@cs.uchicago.edu>
Message-ID: <4730B1BA.4000404@cs.uchicago.edu>

Right, it's not that PBS doesn't support it, it's more of a policy thing.
On the TeraGrid, my experience has been that when PBS (or whatever LRM is
being used) allocates CPUs, it always allocates at the machine level, not
at the CPU level. That means that if you have an 8-processor machine and
you get 1 processor on that machine, then you get (and are charged for)
the whole machine, as you have exclusive rights to this machine for the
duration of your reservation. I have seen this behave differently in
other environments, such as TeraPort, where PBS was allocating at the
processor level, and not the machine level. This is why I said that I
think Swift would need to somehow handle this in the worker node scripts,
and not necessarily rely on the LRM doing this.

Ioan
From benc at hawaga.org.uk Tue Nov 6 12:29:53 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 18:29:53 +0000 (GMT)
Subject: [Swift-user] Job bundles
In-Reply-To: <4730B1BA.4000404@cs.uchicago.edu>
References: <4730AA1A.2020809@cs.uchicago.edu>
 <4730B1BA.4000404@cs.uchicago.edu>
Message-ID:

That's what the ppn parameter specifies to PBS.

On Tue, 6 Nov 2007, Ioan Raicu wrote:

> Right, its not that PBS doesn't support it, its more of a policy thing. On
> the TeraGrid, my experience has been that when PBS (or whatever LRM is being
> used) allocates CPUs, it always allocates at the machine level, not at the CPU
> level. That means, if you have an 8 processor machine, and you get 1
> processor on that machine, then you get (and are charged for) the whole
> machine as you have exclusive rights to this machine for the duration of your
> reservation. I have seen this behave differently in other environments, such
> as TeraPort, where PBS was allocating at the processor level, and not the
> machine level.
From iraicu at cs.uchicago.edu Tue Nov 6 12:36:48 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 12:36:48 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References: <4730AA1A.2020809@cs.uchicago.edu>
 <4730B1BA.4000404@cs.uchicago.edu>
Message-ID: <4730B440.8040602@cs.uchicago.edu>

Here is what I get at the UC/ANL TG site:

qsub -I -l nodes=1:ppn=1:ia32-compute,walltime=0:30:00 -A TG-CCR070008T
qsub -I -l nodes=1:ppn=2:ia32-compute,walltime=0:30:00 -A TG-CCR070008T

iraicu at tg-viz-login2:~> showq -u iraicu

active jobs------------------------
JOBID     USERNAME   STATE     PROCS   REMAINING   STARTTIME

1574623   iraicu     Running       2    00:29:55   Tue Nov 6 12:34:23
1574621   iraicu     Running       2    00:29:21   Tue Nov 6 12:33:49

2 active jobs    4 of 242 processors in use by local jobs (1.65%)
                 20 of 121 nodes active (16.53%)

eligible jobs----------------------
JOBID     USERNAME   STATE     PROCS   WCLIMIT     QUEUETIME

0 eligible jobs

blocked jobs-----------------------
JOBID     USERNAME   STATE     PROCS   WCLIMIT     QUEUETIME

0 blocked jobs

Total jobs: 2

Notice that both jobs have 2 processors allocated, even though the first
one asked for ppn=1! These same commands on TeraPort would have yielded
one allocation with 1 processor and another with 2 processors. This is
what I meant by "a policy thing", because PBS can be configured to ignore
the ppn field.

Ioan

Ben Clifford wrote:
> That's what the ppn parameter specifies to PBS.
>
> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>
>> Right, its not that PBS doesn't support it, its more of a policy thing. On
>> the TeraGrid, my experience has been that when PBS (or whatever LRM is being
>> used) allocates CPUs, it always allocates at the machine level, not at the CPU
>> level. That means, if you have an 8 processor machine, and you get 1
>> processor on that machine, then you get (and are charged for) the whole
>> machine as you have exclusive rights to this machine for the duration of your
>> reservation.

--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E.
58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================

From benc at hawaga.org.uk Tue Nov 6 12:57:46 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 18:57:46 +0000 (GMT)
Subject: [Swift-user] Job bundles
In-Reply-To: <4730B440.8040602@cs.uchicago.edu>
References: <4730AA1A.2020809@cs.uchicago.edu>
 <4730B1BA.4000404@cs.uchicago.edu>
 <4730B440.8040602@cs.uchicago.edu>
Message-ID:

Yeah, I see the same, though the TG UC docs suggest it should work.

I can't log into abe to see what happens there, but it would be
interesting to know.
From iraicu at cs.uchicago.edu Tue Nov 6 13:10:16 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Tue, 06 Nov 2007 13:10:16 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References: <4730AA1A.2020809@cs.uchicago.edu>
 <4730B1BA.4000404@cs.uchicago.edu>
 <4730B440.8040602@cs.uchicago.edu>
Message-ID: <4730BC18.30204@cs.uchicago.edu>

If the docs say that PBS should support this option, maybe write help at tg
to ask them why it doesn't work as the docs say.

Ioan

Ben Clifford wrote:
> yeah, I see same. though the TG UC docs suggest it should work.
>
> I can't log into abe to see what happens there but it would be interesting
> to know.

--
============================================
Ioan Raicu
Ph.D. Student
============================================
Distributed Systems Laboratory
Computer Science Department
University of Chicago
1100 E.
58th Street, Ryerson Hall
Chicago, IL 60637
============================================
Email: iraicu at cs.uchicago.edu
Web:   http://www.cs.uchicago.edu/~iraicu
       http://dsl.cs.uchicago.edu/
============================================
============================================

From deng at mcs.anl.gov Tue Nov 6 13:35:46 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Tue, 6 Nov 2007 13:35:46 -0600
Subject: [Swift-user] Job bundles
In-Reply-To:
References: <4730AA1A.2020809@cs.uchicago.edu>
Message-ID:

On 11/6/07, Ben Clifford wrote:
>
> On Tue, 6 Nov 2007, Ioan Raicu wrote:
>
> > 2) the LRM allows the partitioning of the SMP machine into smaller pieces; for
> > example, with 8 processor node, if it lets you submit 8 jobs that only need 1
> > processor, and it will launch 8 different jobs on the same node, then you are
> > fine... the parallelism will be done automatically by the LRM, as long as you
> > ask for only 1 process at a time; on the TG at least, I don't think this is
> > how things work, and when you get a node, regardless of how many processors it
> > has, you get full access to all processors, not just the ones you asked for.
>
> PBS allows the specification of multiple processes per node, like this
> (grabbed from google)
>
> > qsub -l walltime=15:00,nodes=1:ppn=1 script.pbs
>
> It looks like abe runs PBS.
>

I just tried it. The jobs are scheduled on different nodes on abe.

> So I think you could specify a globus profile key in the sites.xml,
> perhaps something like this:
>
> 8
>

I tried that last Wednesday and got a really strange error message.
I think the correct way to set the ppn number is to use the count RSL
keyword.
Yuqing

From deng at mcs.anl.gov Tue Nov 6 15:26:26 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Tue, 6 Nov 2007 15:26:26 -0600
Subject: [Swift-user] problem with purdue condor pool
Message-ID:

Hi, here's another problem:

The following command works at purdue:

globusrun -o -r tg-gatekeeper.purdue.teragrid.org/jobmanager-condor
'&(executable=/bin/hostname) (jobtype=single)'

However, when I use swift with the following in sites.xml

I get this (full error in attachment):

MolDyn-1-loops-20071106-1520-gzzclbaa
Caused by:
Cannot submit job: The AXIS engine could not find a target service to
invoke! targetService is null
Swift finished - workflow had errors

The same thing happens with jobmanager-pbs at purdue. Only
jobmanager-fork works with swift, but there is no problem with
globusrun.

Thanks,

Yuqing
-------------- next part --------------
A non-text attachment was scrubbed...
Name: error
Type: application/octet-stream
Size: 17491 bytes
Desc: not available
URL:

From benc at hawaga.org.uk Tue Nov 6 15:52:38 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Tue, 6 Nov 2007 21:52:38 +0000 (GMT)
Subject: [Swift-user] problem with purdue condor pool
In-Reply-To:
References:
Message-ID:

On Tue, 6 Nov 2007, Yuqing Deng wrote:

> The following command works at purdue:
> globusrun -o -r tg-gatekeeper.purdue.teragrid.org/jobmanager-condor
> '&(executable=/bin/hostname) (jobtype=single)'

That is using GRAM version 2.

> However, when I use swift with the following in site.xml
> url="tg-gatekeeper.purdue.teragrid.org/jobmanager-condor" major="4"
> minor="0" />

That tells it to use GRAM v4. That is a totally different piece of
software.

Change major=4 to major=2 to tell it to use GRAM v2.

--

From deng at mcs.anl.gov Wed Nov 7 10:36:48 2007
From: deng at mcs.anl.gov (Yuqing Deng)
Date: Wed, 7 Nov 2007 10:36:48 -0600
Subject: [Swift-user] problem with purdue condor pool
In-Reply-To:
References:
Message-ID:

Thanks. GRAM version 2 works with purdue condor pool.
They have three major kinds of machines in the pool:
Linux/X86_64, Linux/X86_32 and WINNT51/INTEL.
I only tested Linux/X86_64, but Linux/X86_32 should work too.
How do I use them all? Just give a different entry in the tc.data file
for apps built for each OS/ARCH?

For WINNT, I need to build the apps first.

Yuqing

From benc at hawaga.org.uk Wed Nov 7 11:20:24 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Wed, 7 Nov 2007 17:20:24 +0000 (GMT)
Subject: [Swift-user] problem with purdue condor pool
In-Reply-To:
References:
Message-ID:

On Wed, 7 Nov 2007, Yuqing Deng wrote:

> They have three major different kinds machines in the pool:
> Linux/X86_64, Linux/X86_32 and WINNT51/INTEL.
> I only tested Linux/X86_64 but Linux/X86_32 should work too.
> How do I use them all? Just give a different entry in to tc.data file
> for apps built with different OS/ARCH?
>
> For WINNT, I need to build the apps first.

I think that WINNT won't work because there is no Windows version of our
worker script (libexec/wrapper.sh).

For the linux machines, there are two different ways you could go:

i) multiple-site model:

define two sites, one called purdue-64 and one called purdue-32.

for purdue-64, specify a profile entry in sites.xml that restricts that
site to only 64 bit nodes; and for purdue-32, specify a profile entry
that restricts that site to only 32 bit nodes.
See http://www.purdue.teragrid.org/content/view/11/25/ I think maybe something like: then specify tc.data entries with 64bit binaries for purdue-64 and 32 bit binaries for purdue-32. Swift will treat the 64bit and 32bit pieces of the condor pool as two separate sites. This will have disadvantages for rate control and file staging but will allow you to have some executables compiled for only 64 bit, some for only 32 bit and some for both. ii) one site model: Make a script to replace each application. For example, replace myapp with a script that looks at the architecture and chooses which executable to run on. Then point your tc.data file at that script, instead of at your application code. Swift will send jobs to the condor pool, and when a job starts running on a particular worker node, the script will decide which is the correct executable to use. This is better for file staging and site scoring, because it treats the site as one site; but means that you have to have each application available in both 32 and 64 bits; that may or may not be a problem, depending on what you are running. 
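For the pool described in this thread, where the two Linux flavours are X86_64 and X86_32, a wrapper for the one site model (ii) might look roughly like the sketch below. It is only an illustration: the paths in APP_64 and APP_32 and the function name choose_binary are made up for the example, and it assumes that `uname -m` reports x86_64 or i?86 on the worker nodes.

```shell
#!/bin/sh
# Hypothetical install locations; adjust to wherever the two builds live.
APP_64=/path/to/myapp.x86_64
APP_32=/path/to/myapp.i686

# choose_binary MACHINE: print the executable matching a `uname -m` string,
# or fail with a message on stderr if the architecture is not recognised.
choose_binary() {
    case "$1" in
        x86_64)
            printf '%s\n' "$APP_64" ;;
        i[3-6]86)
            printf '%s\n' "$APP_32" ;;
        *)
            echo "unsupported architecture: $1" >&2
            return 1 ;;
    esac
}

# In the real wrapper the last line would be the one below. It is left
# commented out so the sketch can be tried without the binaries present.
#   exec "$(choose_binary "$(uname -m)")" "$@"
```

tc.data would then point at this wrapper rather than at either binary. The commented-out exec line is what would actually launch the application, and "$@" hands the original arguments through unchanged.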
Here's an example script I got from mike's home directory for choosing
between two programs based on architecture:

#!/bin/sh
# grep -c counts matching lines: 1 if uname reports ia64, 0 otherwise.
ARCH_TEST=`uname -a | grep -c ia64`
if [ "$ARCH_TEST" -eq 0 ]; then
# i686
/home/wilde/pegasus/src/tools/kickstart/kickstart.i686 "$@"
elif [ "$ARCH_TEST" -eq 1 ]; then
# ia64
/home/wilde/pegasus/src/tools/kickstart/kickstart.ia64 "$@"
fi

--

From wilde at mcs.anl.gov Thu Nov 8 12:20:28 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 08 Nov 2007 12:20:28 -0600
Subject: [Swift-user] Passing strings with blanks to apps in swift
Message-ID: <4733536C.3080701@mcs.anl.gov>

In angle i need to pass a string of IDs as a single parameter to an
application:
angle "7171717 76 76" --more stuff --here

Can you point me to an example of how to pass/quote this correctly, so
that the command line is invoked exactly as above (in this case with
argc=5, and argv[1]="7171717 76 76")?

Thanks.

From benc at hawaga.org.uk Thu Nov 8 12:29:29 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Thu, 8 Nov 2007 18:29:29 +0000 (GMT)
Subject: [Swift-user] Passing strings with blanks to apps in swift
In-Reply-To: <4733536C.3080701@mcs.anl.gov>
References: <4733536C.3080701@mcs.anl.gov>
Message-ID:

My initial answer would be "don't do that". Almost anything in unix gets
screwed by spaces. I know there are definitely problems in some places in
the swift/globus stack with doing this.

On Thu, 8 Nov 2007, Michael Wilde wrote:

> In angle i need to pass a string of IDs as a single parameter to an
> application:
> angle "7171717 76 76" --more stuff --here
>
> Can you point me to an example of how to pass/quote this correctly, so that
> the command line is invoked exactly as above (in this case with argc=5, and
> argv[1]="7171717 76 76")?
>
> Thanks.
> _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From hategan at mcs.anl.gov Thu Nov 8 12:43:08 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 08 Nov 2007 12:43:08 -0600 Subject: [Swift-user] Passing strings with blanks to apps in swift In-Reply-To: <4733536C.3080701@mcs.anl.gov> References: <4733536C.3080701@mcs.anl.gov> Message-ID: <1194547388.11817.0.camel@blabla.mcs.anl.gov> If you put the string in quotes it should work. One string is one element in argv. On Thu, 2007-11-08 at 12:20 -0600, Michael Wilde wrote: > In angle i need to pass a string of IDs as a single parameter to an > application: > angle "7171717 76 76" --more stuff --here > > Can you point me to an example of how to pass/quote this correctly, so > that the command line is invoked exactly as above (in this case with > argc=5, and argv[1]="7171717 76 76" ? > > Thanks. > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From benc at hawaga.org.uk Thu Nov 8 12:56:46 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 Nov 2007 18:56:46 +0000 (GMT) Subject: [Swift-user] Passing strings with blanks to apps in swift In-Reply-To: <1194547388.11817.0.camel@blabla.mcs.anl.gov> References: <4733536C.3080701@mcs.anl.gov> <1194547388.11817.0.camel@blabla.mcs.anl.gov> Message-ID: On Thu, 8 Nov 2007, Mihael Hategan wrote: > If you put the string in quotes it should work. One string is one > element in argv. some of the GRAM2 jobmanagers don't deal with that - I know condor doesn't. 
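The point about argv can be checked locally before GRAM is involved at all. The sketch below is plain POSIX shell, not Swift, and the helper names count_args, forward_quoted and forward_unquoted are invented for the illustration. It shows that a quoted string stays one argument, and that a wrapper script keeps it intact only if it forwards its arguments with "$@".

```shell
#!/bin/sh
# count_args: print the number of arguments received and the first one.
count_args() {
    printf '%s|%s\n' "$#" "$1"
}

# forward_quoted: pass arguments on exactly as received.
forward_quoted() {
    count_args "$@"
}

# forward_unquoted: unquoted $* is re-split on whitespace, so an argument
# containing blanks falls apart into several.
forward_unquoted() {
    count_args $*
}

forward_quoted   "7171717 76 76" --more stuff --here   # prints 4|7171717 76 76
forward_unquoted "7171717 76 76" --more stuff --here   # prints 6|7171717
```

The shell's $# does not count the program name, so the 4 printed here corresponds to argc=5 in the question. Whether every layer between Swift and the application preserves the quoting in the same way is a separate matter, as the replies in this thread discuss.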
> > On Thu, 2007-11-08 at 12:20 -0600, Michael Wilde wrote: > > In angle i need to pass a string of IDs as a single parameter to an > > application: > > angle "7171717 76 76" --more stuff --here > > > > Can you point me to an example of how to pass/quote this correctly, so > > that the command line is invoked exactly as above (in this case with > > argc=5, and argv[1]="7171717 76 76" ? > > > > Thanks. > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From hategan at mcs.anl.gov Thu Nov 8 13:03:10 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 08 Nov 2007 13:03:10 -0600 Subject: [Swift-user] Passing strings with blanks to apps in swift In-Reply-To: References: <4733536C.3080701@mcs.anl.gov> <1194547388.11817.0.camel@blabla.mcs.anl.gov> Message-ID: <1194548590.13231.0.camel@blabla.mcs.anl.gov> On Thu, 2007-11-08 at 18:56 +0000, Ben Clifford wrote: > > On Thu, 8 Nov 2007, Mihael Hategan wrote: > > > If you put the string in quotes it should work. One string is one > > element in argv. > > some of the GRAM2 jobmanagers don't deal with that - I know condor > doesn't. Don't use the condor job manager then. > > > > > On Thu, 2007-11-08 at 12:20 -0600, Michael Wilde wrote: > > > In angle i need to pass a string of IDs as a single parameter to an > > > application: > > > angle "7171717 76 76" --more stuff --here > > > > > > Can you point me to an example of how to pass/quote this correctly, so > > > that the command line is invoked exactly as above (in this case with > > > argc=5, and argv[1]="7171717 76 76" ? > > > > > > Thanks. 
> > > _______________________________________________ > > > Swift-user mailing list > > > Swift-user at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > From hategan at mcs.anl.gov Thu Nov 8 13:06:16 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 08 Nov 2007 13:06:16 -0600 Subject: [Swift-user] Passing strings with blanks to apps in swift In-Reply-To: <1194548590.13231.0.camel@blabla.mcs.anl.gov> References: <4733536C.3080701@mcs.anl.gov> <1194547388.11817.0.camel@blabla.mcs.anl.gov> <1194548590.13231.0.camel@blabla.mcs.anl.gov> Message-ID: <1194548776.13602.1.camel@blabla.mcs.anl.gov> On Thu, 2007-11-08 at 13:03 -0600, Mihael Hategan wrote: > On Thu, 2007-11-08 at 18:56 +0000, Ben Clifford wrote: > > > > On Thu, 8 Nov 2007, Mihael Hategan wrote: > > > > > If you put the string in quotes it should work. One string is one > > > element in argv. > > > > some of the GRAM2 jobmanagers don't deal with that - I know condor > > doesn't. > > Don't use the condor job manager then. Actually file an enhancement request with CoG to quote things when the condor job manager is used explicitly. Waiting for it to be fixed in GRAM 2 and then deployed on sites is silly. Mihael > > > > > > > > > On Thu, 2007-11-08 at 12:20 -0600, Michael Wilde wrote: > > > > In angle i need to pass a string of IDs as a single parameter to an > > > > application: > > > > angle "7171717 76 76" --more stuff --here > > > > > > > > Can you point me to an example of how to pass/quote this correctly, so > > > > that the command line is invoked exactly as above (in this case with > > > > argc=5, and argv[1]="7171717 76 76" ? > > > > > > > > Thanks. 
> > > > _______________________________________________ > > > > Swift-user mailing list > > > > Swift-user at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > > _______________________________________________ > > > Swift-user mailing list > > > Swift-user at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From benc at hawaga.org.uk Thu Nov 8 13:08:43 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 8 Nov 2007 19:08:43 +0000 (GMT) Subject: [Swift-user] Passing strings with blanks to apps in swift In-Reply-To: <1194548776.13602.1.camel@blabla.mcs.anl.gov> References: <4733536C.3080701@mcs.anl.gov> <1194547388.11817.0.camel@blabla.mcs.anl.gov> <1194548590.13231.0.camel@blabla.mcs.anl.gov> <1194548776.13602.1.camel@blabla.mcs.anl.gov> Message-ID: or don't use strings, just like any other time in unix. On Thu, 8 Nov 2007, Mihael Hategan wrote: > On Thu, 2007-11-08 at 13:03 -0600, Mihael Hategan wrote: > > On Thu, 2007-11-08 at 18:56 +0000, Ben Clifford wrote: > > > > > > On Thu, 8 Nov 2007, Mihael Hategan wrote: > > > > > > > If you put the string in quotes it should work. One string is one > > > > element in argv. > > > > > > some of the GRAM2 jobmanagers don't deal with that - I know condor > > > doesn't. > > > > Don't use the condor job manager then. > > Actually file an enhancement request with CoG to quote things when the > condor job manager is used explicitly. Waiting for it to be fixed in > GRAM 2 and then deployed on sites is silly. 
> > Mihael > > > > > > > > > > > > > > On Thu, 2007-11-08 at 12:20 -0600, Michael Wilde wrote: > > > > > In angle i need to pass a string of IDs as a single parameter to an > > > > > application: > > > > > angle "7171717 76 76" --more stuff --here > > > > > > > > > > Can you point me to an example of how to pass/quote this correctly, so > > > > > that the command line is invoked exactly as above (in this case with > > > > > argc=5, and argv[1]="7171717 76 76" ? > > > > > > > > > > Thanks. > > > > > _______________________________________________ > > > > > Swift-user mailing list > > > > > Swift-user at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > > > > > _______________________________________________ > > > > Swift-user mailing list > > > > Swift-user at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > > > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > From hategan at mcs.anl.gov Thu Nov 8 13:26:26 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 08 Nov 2007 13:26:26 -0600 Subject: [Swift-user] Passing strings with blanks to apps in swift In-Reply-To: References: <4733536C.3080701@mcs.anl.gov> <1194547388.11817.0.camel@blabla.mcs.anl.gov> <1194548590.13231.0.camel@blabla.mcs.anl.gov> <1194548776.13602.1.camel@blabla.mcs.anl.gov> Message-ID: <1194549987.15077.2.camel@blabla.mcs.anl.gov> File the report. "Don't use spaces" does not nicely encapsulate a reproducible solution. Sooner or later others will run into this problem despite any "don't use spaces" we put in the documentation, simply because the decision to not use spaces belongs to the application domain not those who bridge Swift with the applications. 
From wilde at mcs.anl.gov Mon Nov 12 17:57:58 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 12 Nov 2007 17:57:58 -0600
Subject: [Swift-user] Questions on sites.xml entries
Message-ID: <4738E886.6050308@mcs.anl.gov>

In sites.xml:

Is storage used, or can it be set to null or some filler value?

Are major and minor used? If so, is there a way to determine the correct
setting for a given server, over the net?

major=2 => use pre-ws-gram
major=4 => use ws-gram
minor => ignored

correct?

From benc at hawaga.org.uk Mon Nov 12 17:59:08 2007
From: benc at hawaga.org.uk (Ben Clifford)
Date: Mon, 12 Nov 2007 23:59:08 +0000 (GMT)
Subject: [Swift-user] Questions on sites.xml entries
In-Reply-To: <4738E886.6050308@mcs.anl.gov>
References: <4738E886.6050308@mcs.anl.gov>
Message-ID:

On Mon, 12 Nov 2007, Michael Wilde wrote:

> Is storage used, or can it be set to null or some filler value?

it isn't used as far as I'm aware.

> Are major and minor used? If so, is there a way to determine the correct
> setting for a given server, over the net?

I don't think so.

> url="tg-login1.sdsc.teragrid.org:2119/jobmanager-pbs"
> major="2" minor="4"/>
>
> major=2 => use pre-ws-gram
> major=4 => use ws-gram
> minor => ignored
>
> correct?

yes.
--

From skenny at uchicago.edu Wed Nov 14 14:11:28 2007
From: skenny at uchicago.edu (skenny at uchicago.edu)
Date: Wed, 14 Nov 2007 14:11:28 -0600 (CST)
Subject: [Swift-user] no registered callback handler
Message-ID: <20071114141128.AWA04405@m4500-02.uchicago.edu>

hi all, i'm getting this error regardless of the site that i
submit to (i've tried uc/anl and teraport). initially was
trying my own script but then tried 'hello world' and am
getting the same thing...

however, when i run my own script it does seem to get as far
as transferring the input file to the remote site; but then
fails on trying to run the actual job.

any ideas?

RunID: 20071114-1407-g84ac350
echo started
2007.11.14 14:07:21.795 CST: [ERROR] Parsing profiles on line 187 Illegal character ' 'at position 65 :Illegal character ' '
2007.11.14 14:07:21.798 CST: [ERROR] Parsing profiles on line 212 Illegal character ' 'at position 5 :Illegal character ' '
2007.11.14 14:07:21.806 CST: [ERROR] Parsing profiles on line 248 Illegal character ' 'at position 5 :Illegal character ' '
2007.11.14 14:07:21.807 CST: [ERROR] Parsing profiles on line 273 Illegal character ' 'at position 5 :Illegal character ' '
Failed to clean up job
java.lang.IllegalStateException: No registered callback handler for org.globus.gsi.gssapi.GlobusGSSCredentialImpl@1fc0f04
        at org.globus.cog.abstraction.impl.execution.gt2.CallbackHandlerManager.decreaseUsageCount(CallbackHandlerManager.java:33)
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.cleanup(JobSubmissionTaskHandler.java:482)
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:148)
        at org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92)
        at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54)
        at org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83)
        at edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431)
        at edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643)
        at edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668)
        at java.lang.Thread.run(Thread.java:595)

From wilde at mcs.anl.gov Wed Nov 14 15:17:36 2007
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 14 Nov 2007 15:17:36 -0600
Subject: [Swift-user] no registered callback handler
In-Reply-To: <20071114141128.AWA04405@m4500-02.uchicago.edu>
References: <20071114141128.AWA04405@m4500-02.uchicago.edu>
Message-ID: <473B65F0.9000702@mcs.anl.gov>

looks to me like possible errors in tc.data may be causing the initial
illegal char messages. if you used maxwalltime= did you put 00:20:00
values in double-quotes: "00:30" say?

not sure if this is causing the later gssapi message.

send your sites.xml and tc.data file for a closer look

On 11/14/07 2:11 PM, skenny at uchicago.edu wrote:
> hi all, i'm getting this error regardless of the site that i
> submit to (i've tried uc/anl and teraport). initially was
> trying my own script but then tried 'hello world' and am
> getting the same thing...
>
> however, when i run my own script it does seem to get as far
> as transferring the input file to the remote site; but then
> fails on trying to run the actual job.
>
> any ideas?
> > RunID: 20071114-1407-g84ac350 > echo started > 2007.11.14 14:07:21.795 CST: [ERROR] Parsing profiles on line > 187 Illegal character ' 'at position 65 :Illegal character ' ' > 2007.11.14 14:07:21.798 CST: [ERROR] Parsing profiles on line > 212 Illegal character ' 'at position 5 :Illegal character ' ' > 2007.11.14 14:07:21.806 CST: [ERROR] Parsing profiles on line > 248 Illegal character ' 'at position 5 :Illegal character ' ' > 2007.11.14 14:07:21.807 CST: [ERROR] Parsing profiles on line > 273 Illegal character ' 'at position 5 :Illegal character ' ' > Failed to clean up job > java.lang.IllegalStateException: No registered callback > handler for org.globus.gsi.gssapi.GlobusGSSCredentialImpl at 1fc0f04 > at > org.globus.cog.abstraction.impl.execution.gt2.CallbackHandlerManager.decreaseUsageCount(CallbackHandlerManager.java:33) > at > org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.cleanup(JobSubmissionTaskHandler.java:482) > at > org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submitSingleJob(JobSubmissionTaskHandler.java:148) > at > org.globus.cog.abstraction.impl.execution.gt2.JobSubmissionTaskHandler.submit(JobSubmissionTaskHandler.java:92) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:54) > at > org.globus.cog.karajan.scheduler.submitQueue.NonBlockingSubmit.run(NonBlockingSubmit.java:83) > at > edu.emory.mathcs.backport.java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:431) > at > edu.emory.mathcs.backport.java.util.concurrent.FutureTask.run(FutureTask.java:166) > at > edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:643) > at > edu.emory.mathcs.backport.java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:668) > at java.lang.Thread.run(Thread.java:595) > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > 
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From benc at hawaga.org.uk Wed Nov 14 16:00:03 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 14 Nov 2007 22:00:03 +0000 (GMT) Subject: [Swift-user] no registered callback handler In-Reply-To: <473B65F0.9000702@mcs.anl.gov> References: <20071114141128.AWA04405@m4500-02.uchicago.edu> <473B65F0.9000702@mcs.anl.gov> Message-ID: On Wed, 14 Nov 2007, Michael Wilde wrote: > looks to me like possible errors in tc.data may be causing the initial illegal > char messages. if you used maxwalltime= did you put 00:20:00 values in > double-quotes: "00:30" say? > > not sure if this is causing the later gssapi message. I think they are probably unrelated. -- From skenny at uchicago.edu Wed Nov 14 16:07:43 2007 From: skenny at uchicago.edu (skenny at uchicago.edu) Date: Wed, 14 Nov 2007 16:07:43 -0600 (CST) Subject: [Swift-user] no registered callback handler Message-ID: <20071114160743.AWA25191@m4500-02.uchicago.edu> here's my sites file for uc/anl teragrid: /app/osg_app /home/skenny/data /tmp /tmp osg 120 ia32-compute /home/skenny/sidgrid_out and here is my entry in tc.data for each of the 2 scripts i'm testing on: ANLUCTERAGRID32 echo /bin/echo INSTALLED INTEL32::LINUX null UCTERAPORT ffmpeg_sh /gpfs1/osg_data/sidgrid_tools/transcode/bin/ffmpeg_sh INSTALLED INTEL64::LINUX null ---- Original message ---- >Date: Wed, 14 Nov 2007 15:17:36 -0600 >From: Michael Wilde >Subject: Re: [Swift-user] no registered callback handler >To: skenny at uchicago.edu >Cc: swift-user at ci.uchicago.edu > >looks to me like possible errors in tc.data may be causing the initial >illegal char messages. if you used maxwalltime= did you put 00:20:00 >values in double-quotes: "00:30" say? > >not sure if this is causing the later gssapi message. 