From hategan at mcs.anl.gov Thu Mar 1 10:19:20 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 01 Mar 2007 10:19:20 -0600 Subject: [Swift-devel] the big red swift site In-Reply-To: <45E6FAA6.3090800@mcs.anl.gov> References: <1172688330.26982.0.camel@blabla.mcs.anl.gov> <45E5CF82.5060702@mcs.anl.gov> <1172689571.27806.3.camel@blabla.mcs.anl.gov> <1172719238.12526.1.camel@blabla.mcs.anl.gov> <45E6FAA6.3090800@mcs.anl.gov> Message-ID: <1172765960.20077.8.camel@blabla.mcs.anl.gov> On Thu, 2007-03-01 at 10:09 -0600, Beth Cerny Patino wrote: > Hi Mihael, > Do you altered the site to scale in the main content area? Yes. Made it larger since things were too squeezed in. > If so, it > does not display properly on my screen (I've attached a screen grab) - > the side content goes below the main content. Yes. A narrow enough browser window will cause that. Do you know a way that keeps things relative as much as possible, yet the right side properly positioned even on smaller windows? > It will be best to switch it back or use style2.css so that it looks > like this http://www.ci.uchicago.edu/test/swift/index2.php. But you will > want to let Ian know that you prefer the page to scale. > > Beth > > > > Mihael Hategan wrote: > > On Wed, 2007-02-28 at 13:06 -0600, Mihael Hategan wrote: > > > > > >> I'm going to change a few things, like remove certain sentences and > >> links opening in new windows. I find the latter justified in too few > >> cases and annoying in most. > >> I'll also try to integrate the docs. > >> So if it's ok with everybody, lock on www in about 45 minutes. > >> > > > > Done. > > > > > >> Mihael > >> > >> > >>> is beth on this list, btw? > >>> > >>> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > >> > > > > > From nefedova at mcs.anl.gov Thu Mar 1 14:36:57 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Thu, 01 Mar 2007 14:36:57 -0600 Subject: [Swift-devel] variable command line options Message-ID: <6.0.0.22.2.20070301142203.05661af0@mail.mcs.anl.gov> Hi: The application I am working with accepts a variable number of arguments: could be 10, could be 15 (i.e. some of them are optional). I need to call this application N times with 17 input command line parameters, M times with 18 parameters, and X times with 20 parameters. Do I need to define 3 functions, one for each set of parameters ? Or there is a better way ? Input parameters is a combination of strings, filenames and stdin input. Thanks! Nika From bcerny at mcs.anl.gov Thu Mar 1 10:09:10 2007 From: bcerny at mcs.anl.gov (Beth Cerny Patino) Date: Thu, 01 Mar 2007 10:09:10 -0600 Subject: [Swift-devel] the big red swift site In-Reply-To: <1172719238.12526.1.camel@blabla.mcs.anl.gov> References: <1172688330.26982.0.camel@blabla.mcs.anl.gov> <45E5CF82.5060702@mcs.anl.gov> <1172689571.27806.3.camel@blabla.mcs.anl.gov> <1172719238.12526.1.camel@blabla.mcs.anl.gov> Message-ID: <45E6FAA6.3090800@mcs.anl.gov> Hi Mihael, Do you altered the site to scale in the main content area? If so, it does not display properly on my screen (I've attached a screen grab) - the side content goes below the main content. It will be best to switch it back or use style2.css so that it looks like this http://www.ci.uchicago.edu/test/swift/index2.php. But you will want to let Ian know that you prefer the page to scale. 
Beth Mihael Hategan wrote: > On Wed, 2007-02-28 at 13:06 -0600, Mihael Hategan wrote: > > >> I'm going to change a few things, like remove certain sentences and >> links opening in new windows. I find the latter justified in too few >> cases and annoying in most. >> I'll also try to integrate the docs. >> So if it's ok with everybody, lock on www in about 45 minutes. >> > > Done. > > >> Mihael >> >> >>> is beth on this list, btw? >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > -------------- next part -------------- A non-text attachment was scrubbed... Name: swifthome.gif Type: image/gif Size: 91445 bytes Desc: not available URL: From benc at hawaga.org.uk Fri Mar 2 05:35:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 2 Mar 2007 11:35:51 +0000 (GMT) Subject: [Swift-devel] commit message verbosity Message-ID: Might be useful to make commit messages more descriptive of problem being fixed if it doesn't have a bug open. I also wonder about the need to maintain duplicate information in CHANGES manually rather than pulling them out of svn log - certainly multi-commit activities need description, but for single-commit activities, the svn log is more authoritative. -- From benc at hawaga.org.uk Fri Mar 2 05:41:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 2 Mar 2007 11:41:56 +0000 (GMT) Subject: [Swift-devel] sites.xml Message-ID: In bug 32, Mihael commented: > The handshake failure is caused by the fact that the url is not valid. > GT4 does not use /jobmanager-xyz in the URL, and therefore the one you > use is trying to access a non-WSRF URL in the container (which causes > the GSI handshake to fail, because it gets some 404 page back). > This brings up a problem. The GT2 style job manager specification is > clearly not portable. In cog, resources (aka services) have an > additional jobManager attribute which can be used to portably specify > job managers, and the providers take care of translating that into > whatever the implementation requires. However, this would require a > modification on the structure of the sites.xml file. I think it should > be OK to add a jobManager attribute, while still allowing the > /jobmanager-xyz thing for classic GRAM resources, but not for WS-GRAM. There's no real backwards compatibility reason to stick with anything like the sites.xml format that we have now. Some of the stuff is out of place in there already and eventually it might warrant a big tidyup. Also, WS-GRAM (at least in theory) doesn't use URLs with a job-manager - it uses an opaque EPR in the form of an XML blob. Alas, these are so appallingly unusable that for the most part people have settled on rigid structure for the EPRs when that rigid structure shouldn't be there. Roll on the day when someone actually fixes that. Come back GT3. All is forgiven. VDS already has a way to specify GRAM4 resources but I can't remember. 
Maybe its similar to the one used by condor - they use this format for the jobmanager string (it wrapped on paste - newlines are spaces): grid_resource = gt4 [https://]IPaddress[:port][/wsrf/services/ManagedJobFactoryService] scheduler-string (according to http://www.cs.wisc.edu/condor/manual/v6.8/5_3Grid_Universe.html#SECTION00632400000000000000) -- From hategan at mcs.anl.gov Fri Mar 2 09:18:42 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 02 Mar 2007 09:18:42 -0600 Subject: [Swift-devel] sites.xml In-Reply-To: References: Message-ID: <1172848722.4297.3.camel@blabla.mcs.anl.gov> On Fri, 2007-03-02 at 11:41 +0000, Ben Clifford wrote: > In bug 32, Mihael commented: > > > The handshake failure is caused by the fact that the url is not valid. > > GT4 does not use /jobmanager-xyz in the URL, and therefore the one you > > use is trying to access a non-WSRF URL in the container (which causes > > the GSI handshake to fail, because it gets some 404 page back). > > > This brings up a problem. The GT2 style job manager specification is > > clearly not portable. In cog, resources (aka services) have an > > additional jobManager attribute which can be used to portably specify > > job managers, and the providers take care of translating that into > > whatever the implementation requires. However, this would require a > > modification on the structure of the sites.xml file. I think it should > > be OK to add a jobManager attribute, while still allowing the > > /jobmanager-xyz thing for classic GRAM resources, but not for WS-GRAM. > > There's no real backwards compatibility reason to stick with anything like > the sites.xml format that we have now. Aside, perhaps, from the fact that the documentation is already written for that. > Some of the stuff is out of place > in there already and eventually it might warrant a big tidyup. > > Also, WS-GRAM (at least in theory) doesn't use URLs with a job-manager - > it uses an opaque EPR in the form of an XML blob. Alas, these are so > appallingly unusable that for the most part people have settled on rigid > structure for the EPRs when that rigid structure shouldn't be there. Roll > on the day when someone actually fixes that. Come back GT3. All is > forgiven. > > VDS already has a way to specify GRAM4 resources but I can't remember. > Maybe its similar to the one used by condor - they use this format for the > jobmanager string (it wrapped on paste - newlines are spaces): > > grid_resource = gt4 > [https://]IPaddress[:port][/wsrf/services/ManagedJobFactoryService] > scheduler-string > > (according to > http://www.cs.wisc.edu/condor/manual/v6.8/5_3Grid_Universe.html#SECTION00632400000000000000) > From benc at hawaga.org.uk Fri Mar 2 09:56:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 2 Mar 2007 15:56:23 +0000 (GMT) Subject: [Swift-devel] sites.xml In-Reply-To: References: <1172848722.4297.3.camel@blabla.mcs.anl.gov> Message-ID: On Fri, 2 Mar 2007, Tiberiu Stef-Praun wrote: > For instance, I did not know where to search for GLOBUS::queue=ABC. > Apparently these parameters are connected to RSL. There are lots of > cases where these connections need to be made explicit for > new/unexperienced users. Yes. There's a question of how much we expect people to know about stuff that isn't Swift itself. We can try to make this code easy to start off with, but at some point to do 'advanced gram things' people have to take time off swift and go learn how to do advanced gram things before coming back to do those things in Swift. 
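To make the queue example concrete (a sketch only; the element and attribute names below follow the VDS-style site catalog and should be checked against the current sites.xml schema), the GLOBUS::queue=ABC setting Tibi mentions is written as a profile on the site entry:

  <profile namespace="globus" key="queue">ABC</profile>

For a classic /jobmanager-xyz resource this gets passed through to GRAM as the RSL clause (queue=ABC), which is the RSL connection referred to above; presumably a portable jobManager setting for WS-GRAM would travel the same way. 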
-- From tiberius at ci.uchicago.edu Fri Mar 2 09:52:01 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 2 Mar 2007 09:52:01 -0600 Subject: [Swift-devel] sites.xml In-Reply-To: <1172848722.4297.3.camel@blabla.mcs.anl.gov> References: <1172848722.4297.3.camel@blabla.mcs.anl.gov> Message-ID: Given the deluge of information available these days, our goal should move from providing the information to locating where the information is already provided. That means providing the idea, and linking to the full documentation. For instance, I did not know where to search for GLOBUS::queue=ABC. Apparently these parameters are connected to RSL. There are lots of cases where these connections need to be made explicit for new/unexperienced users. Tibi On 3/2/07, Mihael Hategan wrote: > On Fri, 2007-03-02 at 11:41 +0000, Ben Clifford wrote: > > In bug 32, Mihael commented: > > > > > The handshake failure is caused by the fact that the url is not valid. > > > GT4 does not use /jobmanager-xyz in the URL, and therefore the one you > > > use is trying to access a non-WSRF URL in the container (which causes > > > the GSI handshake to fail, because it gets some 404 page back). > > > > > This brings up a problem. The GT2 style job manager specification is > > > clearly not portable. In cog, resources (aka services) have an > > > additional jobManager attribute which can be used to portably specify > > > job managers, and the providers take care of translating that into > > > whatever the implementation requires. However, this would require a > > > modification on the structure of the sites.xml file. I think it should > > > be OK to add a jobManager attribute, while still allowing the > > > /jobmanager-xyz thing for classic GRAM resources, but not for WS-GRAM. > > > > There's no real backwards compatibility reason to stick with anything like > > the sites.xml format that we have now. > > Aside, perhaps, from the fact that the documentation is already written > for that. > > > Some of the stuff is out of place > > in there already and eventually it might warrant a big tidyup. > > > > Also, WS-GRAM (at least in theory) doesn't use URLs with a job-manager - > > it uses an opaque EPR in the form of an XML blob. Alas, these are so > > appallingly unusable that for the most part people have settled on rigid > > structure for the EPRs when that rigid structure shouldn't be there. Roll > > on the day when someone actually fixes that. Come back GT3. All is > > forgiven. > > > > VDS already has a way to specify GRAM4 resources but I can't remember. > > Maybe its similar to the one used by condor - they use this format for the > > jobmanager string (it wrapped on paste - newlines are spaces): > > > > grid_resource = gt4 > > [https://]IPaddress[:port][/wsrf/services/ManagedJobFactoryService] > > scheduler-string > > > > (according to > > http://www.cs.wisc.edu/condor/manual/v6.8/5_3Grid_Universe.html#SECTION00632400000000000000) > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. 
Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From foster at mcs.anl.gov Fri Mar 2 10:13:58 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Fri, 02 Mar 2007 10:13:58 -0600 Subject: [Swift-devel] sites.xml In-Reply-To: References: <1172848722.4297.3.camel@blabla.mcs.anl.gov> Message-ID: <45E84D46.3060104@mcs.anl.gov> For some sites (e.g., TeraGrid) a fair bit of this info is available via MDS. Could we make use of that source, rather than (as I think we do??) replicating info on sites.xml files? Ben Clifford wrote: > On Fri, 2 Mar 2007, Tiberiu Stef-Praun wrote: > > >> For instance, I did not know where to search for GLOBUS::queue=ABC. >> Apparently these parameters are connected to RSL. There are lots of >> cases where these connections need to be made explicit for >> new/unexperienced users. >> > > Yes. There's a question of how much we expect people to know about stuff > that isn't Swift itself. We can try to make this code easy to start off > with, but at some point to do 'advanced gram things' people have to take > time off swift and go learn how to do advanced gram things before coming > back to do those things in Swift. > > -- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Fri Mar 2 10:30:02 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 2 Mar 2007 16:30:02 +0000 (GMT) Subject: [Swift-devel] 0.1rc2 In-Reply-To: References: Message-ID: On Wed, 28 Feb 2007, Ben Clifford wrote: > http://www.ci.uchicago.edu/swift/tests/vdsk-0.1rc2.tar.gz > I haven't tried it yet. > > As before, try it, see what breaks, and if it survives 24h from this > message then it turns into 0.1 release. at least two people (me and Nika) have done some testing with this and it seems ok. so in a few hours (unless someone is sitting on some secret disaster bug) I'll put this out at the 0.1 release. -- From yongzh at cs.uchicago.edu Fri Mar 2 11:30:19 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 2 Mar 2007 11:30:19 -0600 (CST) Subject: [Swift-devel] remote file/directory stuff (bug 22) In-Reply-To: <1172604377.25936.2.camel@blabla.mcs.anl.gov> References: <1172521676.27811.9.camel@blabla.mcs.anl.gov> <1172604377.25936.2.camel@blabla.mcs.anl.gov> Message-ID: Hi, Can you elaborate on this issue a little bit so that we can make a unanimous decision: 1. what was the problem exactly 2. what are you proposing 3. to what extent does the proposal solve the problem 4. what is the implication to the mapping interface Thanks. Yong. On Tue, 27 Feb 2007, Mihael Hategan wrote: > If you can make this translate into something like vdl:(in| > out)appmapping(var, path, dest), preferably after the stagein/stageout > directives, I can probably make it work. > > On Tue, 2007-02-27 at 19:22 +0000, Ben Clifford wrote: > > > > On Mon, 26 Feb 2007, Mihael Hategan wrote: > > > > > Right. This would be the "application mapper". 
> > > Now, there are a few things here: > > > We may also want to do the same to the input, because some even more > > > twisted apps will not even accept that as a parameter. So: > > > (file k, file m, file n) myapp(file l) { > > > app{ > > > l>"input.txt"; > > > myapp; > > > k<"output.crd" > > > m<"output.prd" > > > n<"output.rtf" > > > } > > > } > > > > so something like the above syntax, sufficient to address Nika's cases, is > > probably the way to go for bug 22, without necessarily implementing more > > complicated stuff like arrays below. > > > > That should give us a feel for how the concept works in practice too. > > > > > This may become a little trickier when inputs (or even outputs) are > > > arrays, so we may need nicer schemes: > > > (file o) myapp(file i[]){ > > > app{ > > > i[x=*] > "input"+$1; (or something like that) > > > myapp; > > > ... > > > } > > > } > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Fri Mar 2 11:32:50 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 02 Mar 2007 11:32:50 -0600 Subject: [Swift-devel] ideas? Message-ID: <6.0.0.22.2.20070302111759.055b4d00@mail.mcs.anl.gov> Hi, I am working now on a next stage of Molecular Dynamics application (where it gets interesting). 1. application one generates bla*.prt files and random seed files (123673245.rand, 738869.rand, etc) - equal number of those. The number of *.prt files generated is close to 100 (the exact number depends on the input) 2. I need to call another application for each of the prt file that uses the prt file's name pattern as an input, and also the random seed: For example, for bla_2_3.prt I need to call my app this way: ./my_app bla_2_3 seed:123673245 start:2 end:3 (different random seed for each .prt file) I am wondering if you have any suggestions on how better to approach this. I can probably do it all in the wrapper, but I am not sure if I can do it in swift (maybe just part of it could be done in swift?). I'll keep all the string operations in my wrapper for now. Thanks, Nika From benc at hawaga.org.uk Fri Mar 2 11:58:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 2 Mar 2007 17:58:14 +0000 (GMT) Subject: [Swift-devel] remote file/directory stuff (bug 22) In-Reply-To: References: <1172521676.27811.9.camel@blabla.mcs.anl.gov> <1172604377.25936.2.camel@blabla.mcs.anl.gov> Message-ID: On Fri, 2 Mar 2007, Yong Zhao wrote: > Can you elaborate on this issue a little bit so that we can make a > unanimous decision: > > 1. what was the problem exactly Some programs that we run in swift do not use the traditional VDS-like API of being told on the commandline the names of the files that they must input and output to. Instead, they make up some of the names themselves. For example, one of Nika's programs has the syntax: ./program inputfilename and places its outputs in inputfile.stuff, inputfile.abc, inputfile.foo > 2. what are you proposing To extend the syntax of the app {} block to permit specification of the above interface, with a syntax something like: (stuffoutfile s, abcoutfile a, foooutfile f) myproc(inputfile i) app { program @i; s < @strcat(@inputfile,".stuff") a < @strcat(@inputfile,".abc") f < @strcat(@inputfile,".foo") } } Meaning that rather than Swift specifying the remote name for s, a and f, instead the app block specifies where those three files are. 
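For concreteness, a call to that procedure might look like the following (a hypothetical sketch: the execute-side lines above are only proposed syntax, @strcat is not implemented yet, and the declarations below just use ordinary submit-side mappings with made-up filenames):

type inputfile {};
type stuffoutfile {};
type abcoutfile {};
type foooutfile {};

inputfile in <"run1.in">;
stuffoutfile so <"results/run1.stuff">;
abcoutfile ao <"results/run1.abc">;
foooutfile fo <"results/run1.foo">;

(so, ao, fo) = myproc(in);
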
These will be staged back into the submit-side location defined in the existing mappers. > 3. to what extent does the proposal solve the problem It should solve Nika's immediate problem, I think. > 4. what is the implication to the mapping interface A longer term perspective is that this is the beginning of longer work to implement fuller execute-side mappers (which have also been called application mappers in some threads). So it is mapping, but on the execute side. It fits in in a fairly straightforward way with mapping on the submit side, which is what we have now. Submit side mapping maps between submit-side data and SwiftScript variables/structures, so that the user can arrange his submit-side data in a way that he wants (rather than swift compelling it to be in a particular format) Execute side mapping maps between SwiftScript variables and execute side data, so that data can be laid out on the execute side in the way that the program wants it (rather than swift compelling it to be in a particular format) With the present implementation, this amounts to being able to specify different paths and filenames on the submit and execute side for each data file. In the longer term, it might also be useful in defining things like how to map data on a submit-side database to some format on the execute side for processing. If we have only submit side mappers, then we can map data between a submit side database and SwiftScript structures, but not map between those structures and the execute side... -- From benc at hawaga.org.uk Fri Mar 2 12:32:29 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 2 Mar 2007 18:32:29 +0000 (GMT) Subject: [Swift-devel] variable command line options In-Reply-To: <6.0.0.22.2.20070301142203.05661af0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070301142203.05661af0@mail.mcs.anl.gov> Message-ID: On Thu, 1 Mar 2007, Veronika V. Nefedova wrote: > The application I am working with accepts a variable number of arguments: > could be 10, could be 15 (i.e. some of them are optional). I need to call this > application N times with 17 input command line parameters, M times with 18 > parameters, and X times with 20 parameters. Do I need to define 3 functions, > one for each set of parameters ? Or there is a better way ? Input parameters > is a combination of strings, filenames and stdin input. it is possible to make SwiftScript procedures have parameters with default values. that isn't exactly what you are asking for, but might be related: http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2784846 But that doesn't let you change the parameters going to the application itself. Can you give a couple of example command-lines and explain what the differences are? -- From nefedova at mcs.anl.gov Fri Mar 2 12:47:12 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 02 Mar 2007 12:47:12 -0600 Subject: [Swift-devel] variable command line options In-Reply-To: References: <6.0.0.22.2.20070301142203.05661af0@mail.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070302123612.05663c80@mail.mcs.anl.gov> At 12:32 PM 3/2/2007, Ben Clifford wrote: >On Thu, 1 Mar 2007, Veronika V. Nefedova wrote: > > > The application I am working with accepts a variable number of arguments: > > could be 10, could be 15 (i.e. some of them are optional). I need to > call this > > application N times with 17 input command line parameters, M times with 18 > > parameters, and X times with 20 parameters. Do I need to define 3 > functions, > > one for each set of parameters ? 
Or there is a better way ? Input > parameters > > is a combination of strings, filenames and stdin input. > >it is possible to make SwiftScript procedures have parameters with default >values. that isn't exactly what you are asking for, but might be related: > >http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2784846 > yes, I am already doing that since my apps have a lot of default parameters. >But that doesn't let you change the parameters going to the application >itself. > >Can you give a couple of example command-lines and explain what the >differences are? The input parameters to the application are keyword-driven: This is how the app could be called: charmm pstep:700 dirname:solv_chg_a8 system:solv_m001 stitle:m001 cpddata:../alamines rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm gaff:m001_am1 vac: restart:NONE inputdir:../inputs faster:off rwater:15 chem:chem minstep:0 rforce:0 ligcrd:lyz stage:chg urandseed:762192 < solv.inp > solv_chg_a8.out or this way: charmm pstep:700 dirname:solv_repu0_0.2_a1 system:solv_m001 stitle:m001 cpddata:../alamines rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm gaff:m001_am1 vac: restart:NONE inputdir:../inputs faster:off rwater:15 chem:chem minstep:0 rforce:0 ligcrd:lyz stage:repu rcut1:0 rcut2:0.2 urandseed:1522709 < solv.inp > solv_repu0_0.2_a1.out Or this way: charmm system:solv_m001 title:solv stitle:m001 rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm gaff:m001_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater:15 nstep:100 minstep:100 skipstep:100 startstep:10000 < equil_solv.inp > equil_solv.out Some of these parameters are constants (default), but majority are not. And the number of those that change vary from call to call. Nika >-- From yongzh at cs.uchicago.edu Fri Mar 2 12:49:28 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 2 Mar 2007 12:49:28 -0600 (CST) Subject: [Swift-devel] remote file/directory stuff (bug 22) In-Reply-To: References: <1172521676.27811.9.camel@blabla.mcs.anl.gov> <1172604377.25936.2.camel@blabla.mcs.anl.gov> Message-ID: I like this stated fact: "Meaning that rather than Swift specifying the remote name for s, a and f, instead the app block specifies where those three files are." It pushes application logic further info the app block, which is where it is supposed to be, and at the workflow level it is much cleaner. Do you have a nice example about input layout? Yong. On Fri, 2 Mar 2007, Ben Clifford wrote: > > > On Fri, 2 Mar 2007, Yong Zhao wrote: > > > Can you elaborate on this issue a little bit so that we can make a > > unanimous decision: > > > > 1. what was the problem exactly > > Some programs that we run in swift do not use the traditional VDS-like API > of being told on the commandline the names of the files that they must > input and output to. Instead, they make up some of the names themselves. > > For example, one of Nika's programs has the syntax: > > ./program inputfilename > > and places its outputs in inputfile.stuff, inputfile.abc, inputfile.foo > > > 2. what are you proposing > > To extend the syntax of the app {} block to permit specification of the > above interface, with a syntax something like: > > (stuffoutfile s, abcoutfile a, foooutfile f) myproc(inputfile i) > app { > program @i; > s < @strcat(@inputfile,".stuff") > a < @strcat(@inputfile,".abc") > f < @strcat(@inputfile,".foo") > } > } > > Meaning that rather than Swift specifying the remote name for s, a and f, > instead the app block specifies where those three files are. 
> > These will be staged back into the submit-side location defined in the > existing mappers. > > > 3. to what extent does the proposal solve the problem > > It should solve Nika's immediate problem, I think. > > > 4. what is the implication to the mapping interface > > A longer term perspective is that this is the beginning of longer work to > implement fuller execute-side mappers (which have also been called > application mappers in some threads). > > So it is mapping, but on the execute side. It fits in in a fairly > straightforward way with mapping on the submit side, which is what we have > now. > > Submit side mapping maps between submit-side data and SwiftScript > variables/structures, so that the user can arrange his submit-side data in > a way that he wants (rather than swift compelling it to be in a particular > format) > > Execute side mapping maps between SwiftScript variables and execute side > data, so that data can be laid out on the execute side in the way that the > program wants it (rather than swift compelling it to be in a particular > format) > > With the present implementation, this amounts to being able to specify > different paths and filenames on the submit and execute side for each data > file. > > In the longer term, it might also be useful in defining things like how to > map data on a submit-side database to some format on the execute side for > processing. If we have only submit side mappers, then we can map data > between a submit side database and SwiftScript structures, but not map > between those structures and the execute side... > > -- > From yongzh at cs.uchicago.edu Fri Mar 2 12:58:42 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 2 Mar 2007 12:58:42 -0600 (CST) Subject: [Swift-devel] variable command line options In-Reply-To: <6.0.0.22.2.20070302123612.05663c80@mail.mcs.anl.gov> References: <6.0.0.22.2.20070301142203.05661af0@mail.mcs.anl.gov> <6.0.0.22.2.20070302123612.05663c80@mail.mcs.anl.gov> Message-ID: I had some sort of discussion with Mihael about that. If you have an application, for instance, like "ls", you have tons of options, but you don't use them all, so you might end up defining different procedures for different groups of parameters, which may not be desirable. So what we talked about was to have optional mapping at the command line, using some syntax like the unix man page for options, for instance: (....)myproc(string lv, string tv) app{ myapp ["-l" lv] ["-t" tv]; } so if lv has value, the whole option for -l goes in, otherwise, it is ignored Yong. On Fri, 2 Mar 2007, Veronika V. Nefedova wrote: > At 12:32 PM 3/2/2007, Ben Clifford wrote: > > > >On Thu, 1 Mar 2007, Veronika V. Nefedova wrote: > > > > > The application I am working with accepts a variable number of arguments: > > > could be 10, could be 15 (i.e. some of them are optional). I need to > > call this > > > application N times with 17 input command line parameters, M times with 18 > > > parameters, and X times with 20 parameters. Do I need to define 3 > > functions, > > > one for each set of parameters ? Or there is a better way ? Input > > parameters > > > is a combination of strings, filenames and stdin input. > > > >it is possible to make SwiftScript procedures have parameters with default > >values. that isn't exactly what you are asking for, but might be related: > > > >http://www.ci.uchicago.edu/swift/guides/tutorial.php#id2784846 > > > > yes, I am already doing that since my apps have a lot of default parameters. 
> > >But that doesn't let you change the parameters going to the application > >itself. > > > >Can you give a couple of example command-lines and explain what the > >differences are? > > > The input parameters to the application are keyword-driven: > > This is how the app could be called: > > charmm pstep:700 dirname:solv_chg_a8 system:solv_m001 stitle:m001 > cpddata:../alamines rtffile:parm03_gaff_all.rtf > paramfile:parm03_gaffnb_all.prm gaff:m001_am1 vac: restart:NONE > inputdir:../inputs faster:off rwater:15 chem:chem minstep:0 rforce:0 > ligcrd:lyz stage:chg urandseed:762192 < solv.inp > solv_chg_a8.out > > or this way: > > charmm pstep:700 dirname:solv_repu0_0.2_a1 system:solv_m001 stitle:m001 > cpddata:../alamines rtffile:parm03_gaff_all.rtf > paramfile:parm03_gaffnb_all.prm gaff:m001_am1 vac: restart:NONE > inputdir:../inputs faster:off rwater:15 chem:chem minstep:0 rforce:0 > ligcrd:lyz stage:repu rcut1:0 rcut2:0.2 urandseed:1522709 < solv.inp > > solv_repu0_0.2_a1.out > > Or this way: > > charmm system:solv_m001 title:solv stitle:m001 rtffile:parm03_gaff_all.rtf > paramfile:parm03_gaffnb_all.prm gaff:m001_am1 nwater:400 ligcrd:lyz > rforce:0 iseed:3131887 rwater:15 nstep:100 minstep:100 skipstep:100 > startstep:10000 < equil_solv.inp > equil_solv.out > > > Some of these parameters are constants (default), but majority are not. And > the number of those that change vary from call to call. > > Nika > > > >-- > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Fri Mar 2 17:21:38 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 2 Mar 2007 23:21:38 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> Message-ID: I was a similar exception a couple of days ago on something i was playing with but moved onto other stuff and never got round to looking closer - the -debug option didn't give a stack trace and I never started digging through the code. On Fri, 2 Mar 2007, Chad Glendenin wrote: > I'm still trying to run a workflow that worked with VDL rc3. With Swift > snapshot 070301, the command-line argument works again, but now I get a Java > exception: > > $ swift minmax.kml -list=filelist.txt > Swift V 0.0405 > RunID: flkjse5yhp1c0 > Execution failed: > java.lang.NumberFormatException: null > > Here's the code I'm trying to run. > > ////// > type file_t {}; > type filelist_t { > file_t file; > } > > (file_t output, file_t error) minmax (file_t input) { > app { > wrapper @input stdout=@output stderr=@error; > } > } > > //filelist_t dataset[] ; > filelist_t dataset[] ; > foreach f in dataset { > file_t out transform="\1.out">; > file_t err transform="\1.err">; > (out, err) = minmax(f.file); > } > ////// > > Here's my input file, 'filelist.txt': > > file > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000340.tar > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000341.tar > > Any idea what's causing the exception? > > By the way, the only reason I'm trying to upgrade is so that I can get > disk-space management to work (" key="storagesize">40000000000" in sites.xml). 
My understanding is > that if I use VDL rc3, it could overrun the available scratch space and crash: > I have 13.3 TB of data to process, but the UC/ANL TG site only has 1.1 TB of > scratch space available. So if there's a way to enable scratch-space > management in rc3, that should be good enough for now. (I'm a Swift user, not > a developer, so I'd rather not be on the bleeding edge if I don't have to.) > > Thanks, > ccg > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From benc at hawaga.org.uk Fri Mar 2 18:15:34 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 3 Mar 2007 00:15:34 +0000 (GMT) Subject: [Swift-devel] swift 0.1 Message-ID: I just put up v0.1 of Swift. This is a small development release intended for users who prefer to use a released version rather than follow the nightly builds. You can download from: http://www.ci.uchicago.edu/swift/packages/vdsk-0.1.tar.gz There's a changes file available at: http://www.ci.uchicago.edu/trac/swift/browser/tags/v0.1rc1/CHANGES.txt?format=raw Please send feedback to the swift-user or swift-devel at ci.uchicago.edu lists. (note that if you are a nightly-build follower, this code is approximately equivalent to a build from a few days ago - bugfixes made in the past day or so are not present in v0.1) -- From hategan at mcs.anl.gov Fri Mar 2 18:35:49 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 02 Mar 2007 18:35:49 -0600 Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> Message-ID: <1172882149.16258.5.camel@blabla.mcs.anl.gov> The log4j file should include DEBUG on the loader (which should give the full stack trace of such). This is as of r367. On Fri, 2007-03-02 at 23:21 +0000, Ben Clifford wrote: > I was a similar exception a couple of days ago on something i was playing > with but moved onto other stuff and never got round to looking closer - > the -debug option didn't give a stack trace and I never started digging > through the code. > > On Fri, 2 Mar 2007, Chad Glendenin wrote: > > > I'm still trying to run a workflow that worked with VDL rc3. With Swift > > snapshot 070301, the command-line argument works again, but now I get a Java > > exception: > > > > $ swift minmax.kml -list=filelist.txt > > Swift V 0.0405 > > RunID: flkjse5yhp1c0 > > Execution failed: > > java.lang.NumberFormatException: null > > > > Here's the code I'm trying to run. > > > > ////// > > type file_t {}; > > type filelist_t { > > file_t file; > > } > > > > (file_t output, file_t error) minmax (file_t input) { > > app { > > wrapper @input stdout=@output stderr=@error; > > } > > } > > > > //filelist_t dataset[] ; > > filelist_t dataset[] ; > > foreach f in dataset { > > file_t out > transform="\1.out">; > > file_t err > transform="\1.err">; > > (out, err) = minmax(f.file); > > } > > ////// > > > > Here's my input file, 'filelist.txt': > > > > file > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000340.tar > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000341.tar > > > > Any idea what's causing the exception? > > > > By the way, the only reason I'm trying to upgrade is so that I can get > > disk-space management to work (" > key="storagesize">40000000000" in sites.xml). 
My understanding is > > that if I use VDL rc3, it could overrun the available scratch space and crash: > > I have 13.3 TB of data to process, but the UC/ANL TG site only has 1.1 TB of > > scratch space available. So if there's a way to enable scratch-space > > management in rc3, that should be good enough for now. (I'm a Swift user, not > > a developer, so I'd rather not be on the bleeding edge if I don't have to.) > > > > Thanks, > > ccg > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Mar 2 18:52:16 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 02 Mar 2007 18:52:16 -0600 Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: <1172882149.16258.5.camel@blabla.mcs.anl.gov> References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> <1172882149.16258.5.camel@blabla.mcs.anl.gov> Message-ID: <1172883136.17280.0.camel@blabla.mcs.anl.gov> On Fri, 2007-03-02 at 18:35 -0600, Mihael Hategan wrote: > The log4j file should include DEBUG on the loader (which should give the > full stack trace of such). This is as of r367. Actually no. This is of r527. Sorry. > > On Fri, 2007-03-02 at 23:21 +0000, Ben Clifford wrote: > > I was a similar exception a couple of days ago on something i was playing > > with but moved onto other stuff and never got round to looking closer - > > the -debug option didn't give a stack trace and I never started digging > > through the code. > > > > On Fri, 2 Mar 2007, Chad Glendenin wrote: > > > > > I'm still trying to run a workflow that worked with VDL rc3. With Swift > > > snapshot 070301, the command-line argument works again, but now I get a Java > > > exception: > > > > > > $ swift minmax.kml -list=filelist.txt > > > Swift V 0.0405 > > > RunID: flkjse5yhp1c0 > > > Execution failed: > > > java.lang.NumberFormatException: null > > > > > > Here's the code I'm trying to run. > > > > > > ////// > > > type file_t {}; > > > type filelist_t { > > > file_t file; > > > } > > > > > > (file_t output, file_t error) minmax (file_t input) { > > > app { > > > wrapper @input stdout=@output stderr=@error; > > > } > > > } > > > > > > //filelist_t dataset[] ; > > > filelist_t dataset[] ; > > > foreach f in dataset { > > > file_t out > > transform="\1.out">; > > > file_t err > > transform="\1.err">; > > > (out, err) = minmax(f.file); > > > } > > > ////// > > > > > > Here's my input file, 'filelist.txt': > > > > > > file > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000340.tar > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000341.tar > > > > > > Any idea what's causing the exception? > > > > > > By the way, the only reason I'm trying to upgrade is so that I can get > > > disk-space management to work (" > > key="storagesize">40000000000" in sites.xml). My understanding is > > > that if I use VDL rc3, it could overrun the available scratch space and crash: > > > I have 13.3 TB of data to process, but the UC/ANL TG site only has 1.1 TB of > > > scratch space available. So if there's a way to enable scratch-space > > > management in rc3, that should be good enough for now. (I'm a Swift user, not > > > a developer, so I'd rather not be on the bleeding edge if I don't have to.) 
> > > > > > Thanks, > > > ccg > > > > > > _______________________________________________ > > > Swift-user mailing list > > > Swift-user at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Mar 2 19:03:14 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 02 Mar 2007 19:03:14 -0600 Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: <1172883136.17280.0.camel@blabla.mcs.anl.gov> References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> <1172882149.16258.5.camel@blabla.mcs.anl.gov> <1172883136.17280.0.camel@blabla.mcs.anl.gov> Message-ID: <1172883794.17361.2.camel@blabla.mcs.anl.gov> Add a skip="0" parameter to the csv_mapper: filelist_t dataset[] ; I'll fix this in SVN in the mean time. Mihael On Fri, 2007-03-02 at 18:52 -0600, Mihael Hategan wrote: > On Fri, 2007-03-02 at 18:35 -0600, Mihael Hategan wrote: > > The log4j file should include DEBUG on the loader (which should give the > > full stack trace of such). This is as of r367. > > Actually no. This is of r527. Sorry. > > > > > On Fri, 2007-03-02 at 23:21 +0000, Ben Clifford wrote: > > > I was a similar exception a couple of days ago on something i was playing > > > with but moved onto other stuff and never got round to looking closer - > > > the -debug option didn't give a stack trace and I never started digging > > > through the code. > > > > > > On Fri, 2 Mar 2007, Chad Glendenin wrote: > > > > > > > I'm still trying to run a workflow that worked with VDL rc3. With Swift > > > > snapshot 070301, the command-line argument works again, but now I get a Java > > > > exception: > > > > > > > > $ swift minmax.kml -list=filelist.txt > > > > Swift V 0.0405 > > > > RunID: flkjse5yhp1c0 > > > > Execution failed: > > > > java.lang.NumberFormatException: null > > > > > > > > Here's the code I'm trying to run. > > > > > > > > ////// > > > > type file_t {}; > > > > type filelist_t { > > > > file_t file; > > > > } > > > > > > > > (file_t output, file_t error) minmax (file_t input) { > > > > app { > > > > wrapper @input stdout=@output stderr=@error; > > > > } > > > > } > > > > > > > > //filelist_t dataset[] ; > > > > filelist_t dataset[] ; > > > > foreach f in dataset { > > > > file_t out > > > transform="\1.out">; > > > > file_t err > > > transform="\1.err">; > > > > (out, err) = minmax(f.file); > > > > } > > > > ////// > > > > > > > > Here's my input file, 'filelist.txt': > > > > > > > > file > > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000340.tar > > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000341.tar > > > > > > > > Any idea what's causing the exception? > > > > > > > > By the way, the only reason I'm trying to upgrade is so that I can get > > > > disk-space management to work (" > > > key="storagesize">40000000000" in sites.xml). My understanding is > > > > that if I use VDL rc3, it could overrun the available scratch space and crash: > > > > I have 13.3 TB of data to process, but the UC/ANL TG site only has 1.1 TB of > > > > scratch space available. 
So if there's a way to enable scratch-space > > > > management in rc3, that should be good enough for now. (I'm a Swift user, not > > > > a developer, so I'd rather not be on the bleeding edge if I don't have to.) > > > > > > > > Thanks, > > > > ccg > > > > > > > > _______________________________________________ > > > > Swift-user mailing list > > > > Swift-user at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Fri Mar 2 19:07:38 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 3 Mar 2007 01:07:38 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: <1172882149.16258.5.camel@blabla.mcs.anl.gov> References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> <1172882149.16258.5.camel@blabla.mcs.anl.gov> Message-ID: On Fri, 2 Mar 2007, Mihael Hategan wrote: > The log4j file should include DEBUG on the loader (which should give the > full stack trace of such). This is as of r367. it doesn't do so in the r526 build I have: swift -debug fmri.swift Recompilation suppressed. Using sites file: /Users/benc/work/vdl2/cog/modules/vdsk/dist/vdsk-1.0/bin/../etc/sites.xml Using tc.data: /Users/benc/work/vdl2/cog/modules/vdsk/dist/vdsk-1.0/bin/../etc/tc.data Swift V 0.0405 RunID: vu60fptpjhv70 Execution failed: java.lang.NumberFormatException: null I rsynced directory from my laptop to ci:~benc/swift-fmri if you want to look at what I have around. Logs look like Chad's, pretty much. > > On Fri, 2007-03-02 at 23:21 +0000, Ben Clifford wrote: > > I was a similar exception a couple of days ago on something i was playing > > with but moved onto other stuff and never got round to looking closer - > > the -debug option didn't give a stack trace and I never started digging > > through the code. > > > > On Fri, 2 Mar 2007, Chad Glendenin wrote: > > > > > I'm still trying to run a workflow that worked with VDL rc3. With Swift > > > snapshot 070301, the command-line argument works again, but now I get a Java > > > exception: > > > > > > $ swift minmax.kml -list=filelist.txt > > > Swift V 0.0405 > > > RunID: flkjse5yhp1c0 > > > Execution failed: > > > java.lang.NumberFormatException: null > > > > > > Here's the code I'm trying to run. 
> > > > > > ////// > > > type file_t {}; > > > type filelist_t { > > > file_t file; > > > } > > > > > > (file_t output, file_t error) minmax (file_t input) { > > > app { > > > wrapper @input stdout=@output stderr=@error; > > > } > > > } > > > > > > //filelist_t dataset[] ; > > > filelist_t dataset[] ; > > > foreach f in dataset { > > > file_t out > > transform="\1.out">; > > > file_t err > > transform="\1.err">; > > > (out, err) = minmax(f.file); > > > } > > > ////// > > > > > > Here's my input file, 'filelist.txt': > > > > > > file > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000340.tar > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000341.tar > > > > > > Any idea what's causing the exception? > > > > > > By the way, the only reason I'm trying to upgrade is so that I can get > > > disk-space management to work (" > > key="storagesize">40000000000" in sites.xml). My understanding is > > > that if I use VDL rc3, it could overrun the available scratch space and crash: > > > I have 13.3 TB of data to process, but the UC/ANL TG site only has 1.1 TB of > > > scratch space available. So if there's a way to enable scratch-space > > > management in rc3, that should be good enough for now. (I'm a Swift user, not > > > a developer, so I'd rather not be on the bleeding edge if I don't have to.) > > > > > > Thanks, > > > ccg > > > > > > _______________________________________________ > > > Swift-user mailing list > > > Swift-user at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From hategan at mcs.anl.gov Fri Mar 2 19:09:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 02 Mar 2007 19:09:18 -0600 Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> <1172882149.16258.5.camel@blabla.mcs.anl.gov> Message-ID: <1172884159.17361.4.camel@blabla.mcs.anl.gov> Right. Forgot to commit the added debugging statements. r529. Mihael On Sat, 2007-03-03 at 01:07 +0000, Ben Clifford wrote: > > On Fri, 2 Mar 2007, Mihael Hategan wrote: > > > The log4j file should include DEBUG on the loader (which should give the > > full stack trace of such). This is as of r367. > > it doesn't do so in the r526 build I have: > > swift -debug fmri.swift > Recompilation suppressed. > Using sites file: > /Users/benc/work/vdl2/cog/modules/vdsk/dist/vdsk-1.0/bin/../etc/sites.xml > Using tc.data: > /Users/benc/work/vdl2/cog/modules/vdsk/dist/vdsk-1.0/bin/../etc/tc.data > Swift V 0.0405 > RunID: vu60fptpjhv70 > Execution failed: > java.lang.NumberFormatException: null > > > I rsynced directory from my laptop to ci:~benc/swift-fmri if you want to > look at what I have around. > > Logs look like Chad's, pretty much. > > > > > On Fri, 2007-03-02 at 23:21 +0000, Ben Clifford wrote: > > > I was a similar exception a couple of days ago on something i was playing > > > with but moved onto other stuff and never got round to looking closer - > > > the -debug option didn't give a stack trace and I never started digging > > > through the code. > > > > > > On Fri, 2 Mar 2007, Chad Glendenin wrote: > > > > > > > I'm still trying to run a workflow that worked with VDL rc3. 
With Swift > > > > snapshot 070301, the command-line argument works again, but now I get a Java > > > > exception: > > > > > > > > $ swift minmax.kml -list=filelist.txt > > > > Swift V 0.0405 > > > > RunID: flkjse5yhp1c0 > > > > Execution failed: > > > > java.lang.NumberFormatException: null > > > > > > > > Here's the code I'm trying to run. > > > > > > > > ////// > > > > type file_t {}; > > > > type filelist_t { > > > > file_t file; > > > > } > > > > > > > > (file_t output, file_t error) minmax (file_t input) { > > > > app { > > > > wrapper @input stdout=@output stderr=@error; > > > > } > > > > } > > > > > > > > //filelist_t dataset[] ; > > > > filelist_t dataset[] ; > > > > foreach f in dataset { > > > > file_t out > > > transform="\1.out">; > > > > file_t err > > > transform="\1.err">; > > > > (out, err) = minmax(f.file); > > > > } > > > > ////// > > > > > > > > Here's my input file, 'filelist.txt': > > > > > > > > file > > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000340.tar > > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000341.tar > > > > > > > > Any idea what's causing the exception? > > > > > > > > By the way, the only reason I'm trying to upgrade is so that I can get > > > > disk-space management to work (" > > > key="storagesize">40000000000" in sites.xml). My understanding is > > > > that if I use VDL rc3, it could overrun the available scratch space and crash: > > > > I have 13.3 TB of data to process, but the UC/ANL TG site only has 1.1 TB of > > > > scratch space available. So if there's a way to enable scratch-space > > > > management in rc3, that should be good enough for now. (I'm a Swift user, not > > > > a developer, so I'd rather not be on the bleeding edge if I don't have to.) > > > > > > > > Thanks, > > > > ccg > > > > > > > > _______________________________________________ > > > > Swift-user mailing list > > > > Swift-user at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From hategan at mcs.anl.gov Fri Mar 2 19:25:46 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 02 Mar 2007 19:25:46 -0600 Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: <1172883794.17361.2.camel@blabla.mcs.anl.gov> References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> <1172882149.16258.5.camel@blabla.mcs.anl.gov> <1172883136.17280.0.camel@blabla.mcs.anl.gov> <1172883794.17361.2.camel@blabla.mcs.anl.gov> Message-ID: <1172885146.17361.6.camel@blabla.mcs.anl.gov> On Fri, 2007-03-02 at 19:03 -0600, Mihael Hategan wrote: > Add a skip="0" parameter to the csv_mapper: > filelist_t dataset[] ; > > I'll fix this in SVN in the mean time. Should be fixed now. > > Mihael > > On Fri, 2007-03-02 at 18:52 -0600, Mihael Hategan wrote: > > On Fri, 2007-03-02 at 18:35 -0600, Mihael Hategan wrote: > > > The log4j file should include DEBUG on the loader (which should give the > > > full stack trace of such). This is as of r367. > > > > Actually no. This is of r527. Sorry. 
> > > > > > > > On Fri, 2007-03-02 at 23:21 +0000, Ben Clifford wrote: > > > > I was a similar exception a couple of days ago on something i was playing > > > > with but moved onto other stuff and never got round to looking closer - > > > > the -debug option didn't give a stack trace and I never started digging > > > > through the code. > > > > > > > > On Fri, 2 Mar 2007, Chad Glendenin wrote: > > > > > > > > > I'm still trying to run a workflow that worked with VDL rc3. With Swift > > > > > snapshot 070301, the command-line argument works again, but now I get a Java > > > > > exception: > > > > > > > > > > $ swift minmax.kml -list=filelist.txt > > > > > Swift V 0.0405 > > > > > RunID: flkjse5yhp1c0 > > > > > Execution failed: > > > > > java.lang.NumberFormatException: null > > > > > > > > > > Here's the code I'm trying to run. > > > > > > > > > > ////// > > > > > type file_t {}; > > > > > type filelist_t { > > > > > file_t file; > > > > > } > > > > > > > > > > (file_t output, file_t error) minmax (file_t input) { > > > > > app { > > > > > wrapper @input stdout=@output stderr=@error; > > > > > } > > > > > } > > > > > > > > > > //filelist_t dataset[] ; > > > > > filelist_t dataset[] ; > > > > > foreach f in dataset { > > > > > file_t out > > > > transform="\1.out">; > > > > > file_t err > > > > transform="\1.err">; > > > > > (out, err) = minmax(f.file); > > > > > } > > > > > ////// > > > > > > > > > > Here's my input file, 'filelist.txt': > > > > > > > > > > file > > > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000340.tar > > > > > /scratch20/chad/src/workflow/driventurb_3d_direct_plt_cnt_000341.tar > > > > > > > > > > Any idea what's causing the exception? > > > > > > > > > > By the way, the only reason I'm trying to upgrade is so that I can get > > > > > disk-space management to work (" > > > > key="storagesize">40000000000" in sites.xml). My understanding is > > > > > that if I use VDL rc3, it could overrun the available scratch space and crash: > > > > > I have 13.3 TB of data to process, but the UC/ANL TG site only has 1.1 TB of > > > > > scratch space available. So if there's a way to enable scratch-space > > > > > management in rc3, that should be good enough for now. (I'm a Swift user, not > > > > > a developer, so I'd rather not be on the bleeding edge if I don't have to.) 
> > > > > > > > > > Thanks, > > > > > ccg > > > > > > > > > > _______________________________________________ > > > > > Swift-user mailing list > > > > > Swift-user at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Fri Mar 2 20:01:29 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 3 Mar 2007 02:01:29 +0000 (GMT) Subject: [Swift-devel] more remote side mapping thoughts Message-ID: I was chatting with Nika earlier and she suggested that it might be nice to write something like this: (I trimmed it a bit) (file prt_files[]) GENERATOR (string s1, string s2, string s3) { app { fe_pl "--nosite" "--gaff" @s1 "--stitle" @s2 "--minstep" @s3; prt_files ; } } to mean "fe_pl generates some (unknown) number of files, all of the form *prt*, and those files should map to the array specified as the return parameter". It seems sort-of intuitive. -- From yongzh at cs.uchicago.edu Fri Mar 2 20:25:14 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 2 Mar 2007 20:25:14 -0600 (CST) Subject: [Swift-devel] more remote side mapping thoughts In-Reply-To: References: Message-ID: If this is the case, then the ones we talked before should be: (file c, file p, file t) antechamber (file i){ app { myapp .... c; p; t; } } Yong. On Sat, 3 Mar 2007, Ben Clifford wrote: > > I was chatting with Nika earlier and she suggested that it might be nice > to write something like this: (I trimmed it a bit) > > (file prt_files[]) GENERATOR (string s1, string s2, string s3) { > app { > fe_pl "--nosite" "--gaff" @s1 "--stitle" @s2 "--minstep" @s3; > prt_files ; > } > } > > to mean "fe_pl generates some (unknown) number of files, all of the form > *prt*, and those files should map to the array specified as the return > parameter". > > It seems sort-of intuitive. > > -- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From tiberius at ci.uchicago.edu Fri Mar 2 20:49:33 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 2 Mar 2007 20:49:33 -0600 Subject: [Swift-devel] Documentation for hierarchical mappers ? Message-ID: It would be nice to have an example of the hierarchical mapper that Mihael was arguing at some point in time for file named from various substrings: aabb....txt where ,, are the substrings. It seems that we will not have very soon string concatenation, so at least this specialized mapper would do. The current SIDGrid workflow is not really what we are advertising for swift.... Hint: Let's make an example with 3 substrings (I have filenames made from 3 substrings ) that are generated in 3 loops. 
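A rough sketch of the example I'm after, assuming a strcat-style function (or a mapper that takes the substrings as separate parameters) eventually exists; the mapper name, the array-literal syntax and myapp are all placeholders, not tested code:

type file_t {};

(file_t o) combine (string a, string b, string c) {
    app {
        myapp @a @b @c stdout=@o;
    }
}

string xs[] = ["aa", "bb"];
string ys[] = ["01", "02"];
string zs[] = ["p", "q"];

foreach x in xs {
    foreach y in ys {
        foreach z in zs {
            file_t out <single_file_mapper; file=@strcat(x, y, z, ".txt")>;
            out = combine(x, y, z);
        }
    }
}

That gives one output file per combination of the three substrings, named from the three loop variables, which is the pattern described above. 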
Tibi -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From benc at hawaga.org.uk Sat Mar 3 05:38:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 3 Mar 2007 11:38:51 +0000 (GMT) Subject: [Swift-devel] Documentation for hierarchical mappers ? In-Reply-To: References: Message-ID: On Fri, 2 Mar 2007, Tiberiu Stef-Praun wrote: > It seems that we will not have very soon string concatenation, so at > least this specialized mapper would do. string concat is not that far away - its one of the small set of 0.2 deliverables so should be appearing in the next week or so. This is what needs to happen, I think: i) text->xml parser needs to support pass-through of multiparameter expressions (I might have that done in the next hour) (me) ii) xml->kml compiler needs to generate multiple arg karajan invocations (me or yong) iii) vdl:strcat function needs to be implemented, but as I think there's already a karajan strcat with the right semantics it should be straightforward to implement vdl:strcat as a wrapper. (mihael, maybe yong?) I'm working on i) at the moment. -- From benc at hawaga.org.uk Sat Mar 3 05:53:40 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 3 Mar 2007 11:53:40 +0000 (GMT) Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: <1172885146.17361.6.camel@blabla.mcs.anl.gov> References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> <1172882149.16258.5.camel@blabla.mcs.anl.gov> <1172883136.17280.0.camel@blabla.mcs.anl.gov> <1172883794.17361.2.camel@blabla.mcs.anl.gov> <1172885146.17361.6.camel@blabla.mcs.anl.gov> Message-ID: On Fri, 2 Mar 2007, Mihael Hategan wrote: > On Fri, 2007-03-02 at 19:03 -0600, Mihael Hategan wrote: > > Add a skip="0" parameter to the csv_mapper: > > filelist_t dataset[] ; > > > > I'll fix this in SVN in the mean time. > > Should be fixed now. some of the nightly tests say this now: Execution failed: org.griphyn.vdl.mapping.InvalidMappingParameterException: Missing required mapping parameter: location I guess the extra hardcore checking in r531 is too hardcore for our examples ;-) -- From tiberius at ci.uchicago.edu Sat Mar 3 10:54:04 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Sat, 3 Mar 2007 10:54:04 -0600 Subject: [Swift-devel] Documentation for hierarchical mappers ? In-Reply-To: References: Message-ID: Great, thanks for the update on the string concatenation. Tibi On 3/3/07, Ben Clifford wrote: > > > On Fri, 2 Mar 2007, Tiberiu Stef-Praun wrote: > > > It seems that we will not have very soon string concatenation, so at > > least this specialized mapper would do. > > string concat is not that far away - its one of the small set of 0.2 > deliverables so should be appearing in the next week or so. This is what > needs to happen, I think: > > i) text->xml parser needs to support pass-through of multiparameter > expressions (I might have that done in the next hour) > (me) > > ii) xml->kml compiler needs to generate multiple arg karajan invocations > (me or yong) > > iii) vdl:strcat function needs to be implemented, but as I think there's > already a karajan strcat with the right semantics it should be > straightforward to implement vdl:strcat as a wrapper. > (mihael, maybe yong?) > > I'm working on i) at the moment. > > -- > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. 
Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Sat Mar 3 11:45:21 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 03 Mar 2007 11:45:21 -0600 Subject: [Swift-devel] Documentation for hierarchical mappers ? In-Reply-To: References: Message-ID: <1172943922.30472.0.camel@blabla.mcs.anl.gov> On Sat, 2007-03-03 at 11:38 +0000, Ben Clifford wrote: > > On Fri, 2 Mar 2007, Tiberiu Stef-Praun wrote: > > > It seems that we will not have very soon string concatenation, so at > > least this specialized mapper would do. > > string concat is not that far away - its one of the small set of 0.2 > deliverables so should be appearing in the next week or so. This is what > needs to happen, I think: > > i) text->xml parser needs to support pass-through of multiparameter > expressions (I might have that done in the next hour) > (me) > > ii) xml->kml compiler needs to generate multiple arg karajan invocations > (me or yong) > > iii) vdl:strcat function needs to be implemented, but as I think there's > already a karajan strcat with the right semantics it should be > straightforward to implement vdl:strcat as a wrapper. > (mihael, maybe yong?) Yeah: element(strcat, [...] concat(each(...)) ) > > I'm working on i) at the moment. > From hategan at mcs.anl.gov Sat Mar 3 12:25:26 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 03 Mar 2007 12:25:26 -0600 Subject: [Swift-devel] Re: [Swift-user] New Java exception when moving from rc3 In-Reply-To: References: <9B4531C0-72F7-461A-80CB-B7D050A6734A@uchicago.edu> <1172882149.16258.5.camel@blabla.mcs.anl.gov> <1172883136.17280.0.camel@blabla.mcs.anl.gov> <1172883794.17361.2.camel@blabla.mcs.anl.gov> <1172885146.17361.6.camel@blabla.mcs.anl.gov> Message-ID: <1172946327.31967.1.camel@blabla.mcs.anl.gov> On Sat, 2007-03-03 at 11:53 +0000, Ben Clifford wrote: > > On Fri, 2 Mar 2007, Mihael Hategan wrote: > > > On Fri, 2007-03-02 at 19:03 -0600, Mihael Hategan wrote: > > > Add a skip="0" parameter to the csv_mapper: > > > filelist_t dataset[] ; > > > > > > I'll fix this in SVN in the mean time. > > > > Should be fixed now. > > some of the nightly tests say this now: > > Execution failed: > org.griphyn.vdl.mapping.InvalidMappingParameterException: Missing > required mapping parameter: location > > I guess the extra hardcore checking in r531 is too hardcore for our > examples ;-) Yeah. Some mappers did something like value = param.getValue(); if (value != null)... Fixed hopefully. > From benc at ci.uchicago.edu Sat Mar 3 13:51:33 2007 From: benc at ci.uchicago.edu (Ben Clifford) Date: Sat, 3 Mar 2007 14:51:33 -0500 (EST) Subject: [Swift-devel] Re: app {} environment In-Reply-To: References: Message-ID: On Sat, 3 Mar 2007, Ben Clifford wrote: > I guess something like profile keys might be useful in this space at some > point, though. wait, we have those in the schema already - also unused, though. -- From benc at ci.uchicago.edu Sat Mar 3 13:44:33 2007 From: benc at ci.uchicago.edu (Ben Clifford) Date: Sat, 3 Mar 2007 14:44:33 -0500 (EST) Subject: [Swift-devel] app {} environment Message-ID: The XML intermediate language definition for ApplicationBinding has a definition for an appenv element: Specifies the environment in which the application runs which at present is never generated by the text->xml compiler, and looks like its never used by the xml->kml compiler either. I guess something like profile keys might be useful in this space at some point, though. 
-- From benc at ci.uchicago.edu Sat Mar 3 13:16:34 2007 From: benc at ci.uchicago.edu (Ben Clifford) Date: Sat, 3 Mar 2007 14:16:34 -0500 (EST) Subject: [Swift-devel] generated java code for schema Message-ID: The .java files for org.griphyn.vdl.model look like they are compiled from VDL.xsd by the compileSchema build target. Unless there's compelling reason not to, these .java files shouldn't be in the SVN, and that build target should get called as part of the build. -- From benc at ci.uchicago.edu Sat Mar 3 14:41:00 2007 From: benc at ci.uchicago.edu (Ben Clifford) Date: Sat, 3 Mar 2007 15:41:00 -0500 (EST) Subject: [Swift-devel] functionm in Karajan.stg Message-ID: There's a template called functionm in Karajan.stg. I don't see it referred to anywhere - in fact, I don't see the string 'functionm' anywhere in the source tree apart from the one occurence in Karajan.stg. Is it dead code? -- From hategan at mcs.anl.gov Sun Mar 4 20:39:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 04 Mar 2007 20:39:19 -0600 Subject: [Swift-devel] generated java code for schema In-Reply-To: References: Message-ID: <1173062359.12030.1.camel@blabla.mcs.anl.gov> On Sat, 2007-03-03 at 14:16 -0500, Ben Clifford wrote: > The .java files for org.griphyn.vdl.model look like they are compiled > from VDL.xsd by the compileSchema build target. Unless there's compelling > reason not to, these .java files shouldn't be in the SVN, and that build > target should get called as part of the build. I thought you removed them from SVN. I'm pretty sure I didn't add them back. > From benc at hawaga.org.uk Mon Mar 5 05:46:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 5 Mar 2007 11:46:25 +0000 (GMT) Subject: [Swift-devel] generated java code for schema In-Reply-To: <1173062359.12030.1.camel@blabla.mcs.anl.gov> References: <1173062359.12030.1.camel@blabla.mcs.anl.gov> Message-ID: On Sun, 4 Mar 2007, Mihael Hategan wrote: > On Sat, 2007-03-03 at 14:16 -0500, Ben Clifford wrote: > > The .java files for org.griphyn.vdl.model look like they are compiled > > from VDL.xsd by the compileSchema build target. Unless there's compelling > > reason not to, these .java files shouldn't be in the SVN, and that build > > target should get called as part of the build. > > I thought you removed them from SVN. I'm pretty sure I didn't add them > back. I think that was the stuff that ANTLR generated - this was a different set, generated from the XML - the same but for the next language along the pipeline... -- From benc at hawaga.org.uk Mon Mar 5 05:47:41 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 5 Mar 2007 11:47:41 +0000 (GMT) Subject: [Swift-devel] generated java code for schema In-Reply-To: References: <1173062359.12030.1.camel@blabla.mcs.anl.gov> Message-ID: On Mon, 5 Mar 2007, Ben Clifford wrote: > > I thought you removed them from SVN. I removed them yesterday, by the wya, in r542. -- From benc at hawaga.org.uk Mon Mar 5 06:36:57 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 5 Mar 2007 12:36:57 +0000 (GMT) Subject: [Swift-devel] source location passthrough Message-ID: A bunch of error reporting might be made more useful if we pass the source file location (line, at least) through the various intermediate languages so that errors like this: $ swift fmri.swift Swift V 0.0405 RunID: 8zr1mg5964q90 Execution failed: java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 have a bit more context. Bit of a hassle to implement, though. 
-- From wilde at mcs.anl.gov Mon Mar 5 07:28:45 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Mon, 05 Mar 2007 07:28:45 -0600 Subject: [Swift-devel] source location passthrough In-Reply-To: References: Message-ID: <45EC1B0D.6070908@mcs.anl.gov> I think that this is a great idea. Can you file it in bugz so we can sketch a rough design and estimate its cost in a later release? 0.3 or beyond, likely. - Mike Ben Clifford wrote, On 3/5/2007 6:36 AM: > A bunch of error reporting might be made more useful if we pass the source > file location (line, at least) through the various intermediate languages > so that errors like this: > > $ swift fmri.swift > Swift V 0.0405 > RunID: 8zr1mg5964q90 > Execution failed: > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > > have a bit more context. > > Bit of a hassle to implement, though. > -- Mike Wilde Computation Institute, University of Chicago Math & Computer Science Division Argonne National Laboratory Argonne, IL 60439 USA tel 630-252-7497 fax 630-252-1997 From yongzh at cs.uchicago.edu Mon Mar 5 09:14:00 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Mon, 5 Mar 2007 09:14:00 -0600 (CST) Subject: [Swift-devel] functionm in Karajan.stg In-Reply-To: References: Message-ID: just some backups On Sat, 3 Mar 2007, Ben Clifford wrote: > > There's a template called functionm in Karajan.stg. I don't see it > referred to anywhere - in fact, I don't see the string 'functionm' > anywhere in the source tree apart from the one occurence in Karajan.stg. > Is it dead code? > > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Mon Mar 5 09:36:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 05 Mar 2007 09:36:45 -0600 Subject: [Swift-devel] source location passthrough In-Reply-To: References: Message-ID: <1173109005.20917.7.camel@blabla.mcs.anl.gov> Right. We talked about this a while ago. You can some magic in the .kml files. The _line attribute. This would go for each element. The _filename attribute. This is lexically inherited (i.e. in the XML tree, a node's file name is taken from the closest node as the tree is traversed towards the root that has a _filename attribute). On Mon, 2007-03-05 at 12:36 +0000, Ben Clifford wrote: > A bunch of error reporting might be made more useful if we pass the source > file location (line, at least) through the various intermediate languages > so that errors like this: > > $ swift fmri.swift > Swift V 0.0405 > RunID: 8zr1mg5964q90 > Execution failed: > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > > have a bit more context. > > Bit of a hassle to implement, though. > From yongzh at cs.uchicago.edu Mon Mar 5 10:52:55 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Mon, 5 Mar 2007 10:52:55 -0600 (CST) Subject: [Swift-devel] Documentation for hierarchical mappers ? In-Reply-To: References: Message-ID: Hi Ben, When you finish the text part, let me know the xml format and I'll incorporate that into the xml->kml part. Thanks. Yong. On Sat, 3 Mar 2007, Ben Clifford wrote: > > > On Fri, 2 Mar 2007, Tiberiu Stef-Praun wrote: > > > It seems that we will not have very soon string concatenation, so at > > least this specialized mapper would do. > > string concat is not that far away - its one of the small set of 0.2 > deliverables so should be appearing in the next week or so. 
This is what > needs to happen, I think: > > i) text->xml parser needs to support pass-through of multiparameter > expressions (I might have that done in the next hour) > (me) > > ii) xml->kml compiler needs to generate multiple arg karajan invocations > (me or yong) > > iii) vdl:strcat function needs to be implemented, but as I think there's > already a karajan strcat with the right semantics it should be > straightforward to implement vdl:strcat as a wrapper. > (mihael, maybe yong?) > > I'm working on i) at the moment. > > -- > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Mon Mar 5 10:57:38 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Mon, 5 Mar 2007 10:57:38 -0600 (CST) Subject: [Swift-devel] Re: app {} environment In-Reply-To: References: Message-ID: right, there were there for future uses so we don't have to modify the schema too often. On Sat, 3 Mar 2007, Ben Clifford wrote: > > On Sat, 3 Mar 2007, Ben Clifford wrote: > > > I guess something like profile keys might be useful in this space at some > > point, though. > > wait, we have those in the schema already - also unused, though. > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Mon Mar 5 11:12:33 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Mon, 5 Mar 2007 11:12:33 -0600 (CST) Subject: [Swift-devel] generated java code for schema In-Reply-To: References: <1173062359.12030.1.camel@blabla.mcs.anl.gov> Message-ID: right, there is also a library generated from the VDL.xsd and renamed and put in lib. We need to automate these carefully if we decide to take them out. Yong. On Mon, 5 Mar 2007, Ben Clifford wrote: > > > On Sun, 4 Mar 2007, Mihael Hategan wrote: > > > On Sat, 2007-03-03 at 14:16 -0500, Ben Clifford wrote: > > > The .java files for org.griphyn.vdl.model look like they are compiled > > > from VDL.xsd by the compileSchema build target. Unless there's compelling > > > reason not to, these .java files shouldn't be in the SVN, and that build > > > target should get called as part of the build. > > > > I thought you removed them from SVN. I'm pretty sure I didn't add them > > back. > > I think that was the stuff that ANTLR generated - this was a different > set, generated from the XML - the same but for the next language along the > pipeline... > -- > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Mon Mar 5 11:18:47 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 5 Mar 2007 17:18:47 +0000 (GMT) Subject: [Swift-devel] generated java code for schema In-Reply-To: References: <1173062359.12030.1.camel@blabla.mcs.anl.gov> Message-ID: In r542 I removed lib/vdldefinitions.jar from the svn, because I saw that the compileSchema target was modifying that. On Mon, 5 Mar 2007, Yong Zhao wrote: > right, there is also a library generated from the VDL.xsd and renamed and > put in lib. We need to automate these carefully if we decide to take them > out. > > Yong. 
> > On Mon, 5 Mar 2007, Ben Clifford wrote: > > > > > > > On Sun, 4 Mar 2007, Mihael Hategan wrote: > > > > > On Sat, 2007-03-03 at 14:16 -0500, Ben Clifford wrote: > > > > The .java files for org.griphyn.vdl.model look like they are compiled > > > > from VDL.xsd by the compileSchema build target. Unless there's compelling > > > > reason not to, these .java files shouldn't be in the SVN, and that build > > > > target should get called as part of the build. > > > > > > I thought you removed them from SVN. I'm pretty sure I didn't add them > > > back. > > > > I think that was the stuff that ANTLR generated - this was a different > > set, generated from the XML - the same but for the next language along the > > pipeline... > > -- > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From nefedova at mcs.anl.gov Mon Mar 5 12:15:57 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Mon, 05 Mar 2007 12:15:57 -0600 Subject: [Swift-devel] wiki editing Message-ID: <6.0.0.22.2.20070305121002.0592be20@mail.mcs.anl.gov> Hi, I wanted to add a couple of links down from the Application Status wiki page http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ApplicationStatus but I got access denied ( something like this: http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ApplicationStatus/DETAILS, http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ApplicationStatus/scripts, etc). Could I be given such permissions to add pages to the wiki, or I should contact support every time I need to do this? Thank you very much, Nika From nefedova at mcs.anl.gov Tue Mar 6 11:57:53 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 11:57:53 -0600 Subject: [Swift-devel] filenames with "." Message-ID: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> Hi, I have Swift complaining about this line: file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm Could not compile SwiftScript source: line 311:17: unexpected token: .2 It looks like it doesn't like multiple "." in the file name. Is it possible to fix it ? I have a lot of files (input/output) that look like that and it will be a big mess if I start renaming those... Thanks! Nika From yongzh at cs.uchicago.edu Tue Mar 6 11:59:15 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 6 Mar 2007 11:59:15 -0600 (CST) Subject: [Swift-devel] filenames with "." In-Reply-To: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> Message-ID: '.' is the path separator in swift, so you have to replace those with underscore or something. On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > Hi, > > I have Swift complaining about this line: > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > It looks like it doesn't like multiple "." in the file name. Is it possible > to fix it ? I have a lot of files (input/output) that look like that and it > will be a big mess if I start renaming those... > > Thanks! 
> > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Tue Mar 6 12:01:35 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 12:01:35 -0600 Subject: [Swift-devel] filenames with "." In-Reply-To: References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306120005.03323e60@mail.mcs.anl.gov> Sigh... So I should just replace "." in logical file name? Can I keep dots in the physical file name ? Nika At 11:59 AM 3/6/2007, Yong Zhao wrote: >'.' is the path separator in swift, so you have to replace those with >underscore or something. > >On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > > > Hi, > > > > I have Swift complaining about this line: > > > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > > > It looks like it doesn't like multiple "." in the file name. Is it possible > > to fix it ? I have a lot of files (input/output) that look like that and it > > will be a big mess if I start renaming those... > > > > Thanks! > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Tue Mar 6 11:59:29 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 06 Mar 2007 11:59:29 -0600 Subject: [Swift-devel] filenames with "." In-Reply-To: References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> Message-ID: <1173203969.27240.0.camel@blabla.mcs.anl.gov> On Tue, 2007-03-06 at 11:59 -0600, Yong Zhao wrote: > '.' is the path separator in swift, so you have to replace those with > underscore or something. That refers to the variable identifier, not the file name right? > > On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > > > Hi, > > > > I have Swift complaining about this line: > > > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > > > It looks like it doesn't like multiple "." in the file name. Is it possible > > to fix it ? I have a lot of files (input/output) that look like that and it > > will be a big mess if I start renaming those... > > > > Thanks! > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Tue Mar 6 12:04:10 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 6 Mar 2007 12:04:10 -0600 (CST) Subject: [Swift-devel] filenames with "." In-Reply-To: <6.0.0.22.2.20070306120005.03323e60@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> <6.0.0.22.2.20070306120005.03323e60@mail.mcs.anl.gov> Message-ID: Right. it is more a logical variable name instead of filename. It has nothing to do with how you name your physical name. On Tue, 6 Mar 2007, Veronika V. 
Nefedova wrote: > Sigh... > So I should just replace "." in logical file name? Can I keep dots in the > physical file name ? > > Nika > > At 11:59 AM 3/6/2007, Yong Zhao wrote: > >'.' is the path separator in swift, so you have to replace those with > >underscore or something. > > > >On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > > > > > Hi, > > > > > > I have Swift complaining about this line: > > > > > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > > > > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > > > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > > > > > It looks like it doesn't like multiple "." in the file name. Is it possible > > > to fix it ? I have a lot of files (input/output) that look like that and it > > > will be a big mess if I start renaming those... > > > > > > Thanks! > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > From nefedova at mcs.anl.gov Tue Mar 6 12:04:15 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 12:04:15 -0600 Subject: [Swift-devel] filenames with "." In-Reply-To: References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306120245.03324ec0@mail.mcs.anl.gov> What symbols can I have in the filename ? "-", "+", "=", etc ? What is *not* allowed? NIka At 11:59 AM 3/6/2007, Yong Zhao wrote: >'.' is the path separator in swift, so you have to replace those with >underscore or something. > >On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > > > Hi, > > > > I have Swift complaining about this line: > > > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > > > It looks like it doesn't like multiple "." in the file name. Is it possible > > to fix it ? I have a lot of files (input/output) that look like that and it > > will be a big mess if I start renaming those... > > > > Thanks! > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From itf at mcs.anl.gov Tue Mar 6 12:03:20 2007 From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=) Date: Tue, 6 Mar 2007 18:03:20 +0000 Subject: [Swift-devel] filenames with "." In-Reply-To: <6.0.0.22.2.20070306120005.03323e60@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> <6.0.0.22.2.20070306120005.03323e60@mail.mcs.anl.gov> Message-ID: <251098058-1173204263-cardhu_blackberry.rim.net-762091144-@bwe047-cell00.bisx.prod.on.blackberry> That seems a rather unreasonable restriction long term? Ian Sent via BlackBerry from T-Mobile -----Original Message----- From: "Veronika V. Nefedova" Date: Tue, 06 Mar 2007 12:01:35 To:Yong Zhao Cc:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] filenames with "." Sigh... So I should just replace "." in logical file name? Can I keep dots in the physical file name ? Nika At 11:59 AM 3/6/2007, Yong Zhao wrote: >'.' is the path separator in swift, so you have to replace those with >underscore or something. > >On Tue, 6 Mar 2007, Veronika V. 
Nefedova wrote: > > > Hi, > > > > I have Swift complaining about this line: > > > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > > > It looks like it doesn't like multiple "." in the file name. Is it possible > > to fix it ? I have a lot of files (input/output) that look like that and it > > will be a big mess if I start renaming those... > > > > Thanks! > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From yongzh at cs.uchicago.edu Tue Mar 6 12:08:50 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 6 Mar 2007 12:08:50 -0600 (CST) Subject: [Swift-devel] filenames with "." In-Reply-To: <6.0.0.22.2.20070306120245.03324ec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> <6.0.0.22.2.20070306120245.03324ec0@mail.mcs.anl.gov> Message-ID: It conforms to the usual ones in programming languages: letters, underscore, and digits Yong. On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > What symbols can I have in the filename ? "-", "+", "=", etc ? What is > *not* allowed? > > NIka > > At 11:59 AM 3/6/2007, Yong Zhao wrote: > >'.' is the path separator in swift, so you have to replace those with > >underscore or something. > > > >On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > > > > > Hi, > > > > > > I have Swift complaining about this line: > > > > > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > > > > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > > > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > > > > > It looks like it doesn't like multiple "." in the file name. Is it possible > > > to fix it ? I have a lot of files (input/output) that look like that and it > > > will be a big mess if I start renaming those... > > > > > > Thanks! > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > From yongzh at cs.uchicago.edu Tue Mar 6 12:10:20 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 6 Mar 2007 12:10:20 -0600 (CST) Subject: [Swift-devel] filenames with "." In-Reply-To: <251098058-1173204263-cardhu_blackberry.rim.net-762091144-@bwe047-cell00.bisx.prod.on.blackberry> References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> <6.0.0.22.2.20070306120005.03323e60@mail.mcs.anl.gov> <251098058-1173204263-cardhu_blackberry.rim.net-762091144-@bwe047-cell00.bisx.prod.on.blackberry> Message-ID: This is not a restriction, just like in java or in c, you can't put . into an identifier, as that would mean to access its member: a.b means I want to access the member item b of a. Yong. On Tue, 6 Mar 2007, [UTF-8] Ian Foster wrote: > That seems a rather unreasonable restriction long term? > > Ian > > Sent via BlackBerry from T-Mobile > > -----Original Message----- > From: "Veronika V. Nefedova" > Date: Tue, 06 Mar 2007 12:01:35 > To:Yong Zhao > Cc:swift-devel at ci.uchicago.edu > Subject: Re: [Swift-devel] filenames with "." 
> > Sigh... > So I should just replace "." in logical file name? Can I keep dots in the > physical file name ? > > Nika > > At 11:59 AM 3/6/2007, Yong Zhao wrote: > >'.' is the path separator in swift, so you have to replace those with > >underscore or something. > > > >On Tue, 6 Mar 2007, Veronika V. Nefedova wrote: > > > > > Hi, > > > > > > I have Swift complaining about this line: > > > > > > file solv_repu_0.2_0.3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">; > > > > > > [315] wiggum /sandbox/ydeng/alamines > swift swift-MolDyn-free.dtm > > > Could not compile SwiftScript source: line 311:17: unexpected token: .2 > > > > > > It looks like it doesn't like multiple "." in the file name. Is it possible > > > to fix it ? I have a lot of files (input/output) that look like that and it > > > will be a big mess if I start renaming those... > > > > > > Thanks! > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From leggett at ci.uchicago.edu Tue Mar 6 09:29:24 2007 From: leggett at ci.uchicago.edu (Ti Leggett) Date: Tue, 6 Mar 2007 09:29:24 -0600 Subject: [Swift-devel] Re: wiki editing In-Reply-To: <6.0.0.22.2.20070305121002.0592be20@mail.mcs.anl.gov> References: <6.0.0.22.2.20070305121002.0592be20@mail.mcs.anl.gov> Message-ID: <7102B7CB-F854-4597-8437-28BCB59C3FCC@ci.uchicago.edu> The owners of that particular wiki web should be able to grant you access. I believe the relevant people are on the swift-devel mailing list. If they can't, or don't know how, let us know. On Mar 5, 2007, at Mon,Mar 5, 12:15 PM, Veronika V. Nefedova wrote: > Hi, > > I wanted to add a couple of links down from the Application Status > wiki page http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ > ApplicationStatus but I got access denied ( something like this: > http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ApplicationStatus/ > DETAILS, http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ > ApplicationStatus/scripts, etc). Could I be given such permissions > to add pages to the wiki, or I should contact support every time I > need to do this? > > Thank you very much, > > Nika > From tiberius at ci.uchicago.edu Tue Mar 6 12:31:37 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Tue, 6 Mar 2007 12:31:37 -0600 Subject: [Swift-devel] Re: wiki editing In-Reply-To: <7102B7CB-F854-4597-8437-28BCB59C3FCC@ci.uchicago.edu> References: <6.0.0.22.2.20070305121002.0592be20@mail.mcs.anl.gov> <7102B7CB-F854-4597-8437-28BCB59C3FCC@ci.uchicago.edu> Message-ID: I helped Nika out, she was passing the full link path instead of using a simple WikiName. Using full URL links confused the Wiki, because those pages were not created yet. On 3/6/07, Ti Leggett wrote: > The owners of that particular wiki web should be able to grant you > access. I believe the relevant people are on the swift-devel mailing > list. If they can't, or don't know how, let us know. > > On Mar 5, 2007, at Mon,Mar 5, 12:15 PM, Veronika V. 
Nefedova wrote: > > > Hi, > > > > I wanted to add a couple of links down from the Application Status > > wiki page http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ > > ApplicationStatus but I got access denied ( something like this: > > http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ApplicationStatus/ > > DETAILS, http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ > > ApplicationStatus/scripts, etc). Could I be given such permissions > > to add pages to the wiki, or I should contact support every time I > > need to do this? > > > > Thank you very much, > > > > Nika > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From nefedova at mcs.anl.gov Tue Mar 6 12:36:46 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 12:36:46 -0600 Subject: [Swift-devel] Re: wiki editing In-Reply-To: References: <6.0.0.22.2.20070305121002.0592be20@mail.mcs.anl.gov> <7102B7CB-F854-4597-8437-28BCB59C3FCC@ci.uchicago.edu> Message-ID: <6.0.0.22.2.20070306123636.0332d9c0@mail.mcs.anl.gov> Thanks, Tibi! N At 12:31 PM 3/6/2007, Tiberiu Stef-Praun wrote: >I helped Nika out, she was passing the full link path instead of using >a simple WikiName. Using full URL links confused the Wiki, because >those pages were not created yet. > >On 3/6/07, Ti Leggett wrote: >>The owners of that particular wiki web should be able to grant you >>access. I believe the relevant people are on the swift-devel mailing >>list. If they can't, or don't know how, let us know. >> >>On Mar 5, 2007, at Mon,Mar 5, 12:15 PM, Veronika V. Nefedova wrote: >> >> > Hi, >> > >> > I wanted to add a couple of links down from the Application Status >> > wiki page http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ >> > ApplicationStatus but I got access denied ( something like this: >> > http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ApplicationStatus/ >> > DETAILS, http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ >> > ApplicationStatus/scripts, etc). Could I be given such permissions >> > to add pages to the wiki, or I should contact support every time I >> > need to do this? >> > >> > Thank you very much, >> > >> > Nika >> > >> >>_______________________________________________ >>Swift-devel mailing list >>Swift-devel at ci.uchicago.edu >>http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >-- >Tiberiu (Tibi) Stef-Praun, PhD >Research Staff, Computation Institute >5640 S. Ellis Ave, #405 >University of Chicago >http://www-unix.mcs.anl.gov/~tiberius/ From benc at hawaga.org.uk Tue Mar 6 12:37:53 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 6 Mar 2007 18:37:53 +0000 (GMT) Subject: [Swift-devel] filenames with "." In-Reply-To: <251098058-1173204263-cardhu_blackberry.rim.net-762091144-@bwe047-cell00.bisx.prod.on.blackberry> References: <6.0.0.22.2.20070306115336.03326ec0@mail.mcs.anl.gov> <6.0.0.22.2.20070306120005.03323e60@mail.mcs.anl.gov> <251098058-1173204263-cardhu_blackberry.rim.net-762091144-@bwe047-cell00.bisx.prod.on.blackberry> Message-ID: On Tue, 6 Mar 2007, Ian Foster wrote: > That seems a rather unreasonable restriction long term? They're variable names, scoped at most only to the local program; they aren't LFNs which sit in a global namespace. 
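Concretely, for the line that tripped Nika's compile, only the identifier needs its dots replaced; the quoted string inside the mapper is the name on disk and can keep them as-is:

// identifier uses underscores only; the mapped physical name is unchanged
file solv_repu_0_2_0_3_a0_m001_wham <"solv_repu_0.2_0.3_a0_m001.wham">;

So none of the physical input/output files have to be renamed, just the SwiftScript variables that refer to them.
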
-- From nefedova at mcs.anl.gov Tue Mar 6 14:14:41 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 14:14:41 -0600 Subject: [Swift-devel] workflow hung? Message-ID: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> Hi, I am testing an extend Molecular Dynamics workflow -- and it seems to be hung after the first 3 steps of the workflow. The fourth step consists of 68 jobs that could/should be ran simultaneously. All these jobs have the same executable, but different command line parameters. Input files for these 68 jobs come from step 3 of the workflow (plus 2 additional files - one common to all jobs, and one unique for every job). I see all these files on my localhost present. The log finishes with staging out of the results of step 3 (successful) and then nothing happens. No files are being staged in for step 4. This is the snapshot of the dtm file (CHARMM is step 3, CHARMM2 is step 4) - the place where it all hung: (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, trj_file_m001, crd_min_f ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, prm_file_m001, crd_file _m001, water_file, "system:solv_m001", "stitle:m001", "rtffile:parm03_gaff_all.rtf", "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); file prt_file_m001 <"solv_chg_a0.prt">; file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, solv_chg_a0_m001_out, solv_chg_a0_m001 _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, prm_file_m001, psf_file _m001, crd_eq_file_m001, prt_file_m001, "dirname:solv_chg_a0_m001", "system:solv_m00 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1", "stage:chg", "urandseed:4880701"); The complete dtm file is on wiggum in: /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably missing something here, but can't see what... Please let me know where to look for the errors, Thanks, Nika From hategan at mcs.anl.gov Tue Mar 6 14:14:35 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 06 Mar 2007 14:14:35 -0600 Subject: [Swift-devel] workflow hung? In-Reply-To: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> Message-ID: <1173212075.609.0.camel@blabla.mcs.anl.gov> What's the run id (or the log file)? Mihael On Tue, 2007-03-06 at 14:14 -0600, Veronika V. Nefedova wrote: > Hi, > > I am testing an extend Molecular Dynamics workflow -- and it seems to be > hung after the first 3 steps of the workflow. The fourth step consists of > 68 jobs that could/should be ran simultaneously. All these jobs have the > same executable, but different command line parameters. Input files for > these 68 jobs come from step 3 of the workflow (plus 2 additional files - > one common to all jobs, and one unique for every job). I see all these > files on my localhost present. > The log finishes with staging out of the results of step 3 (successful) and > then nothing happens. No files are being staged in for step 4. 
> This is the snapshot of the dtm file (CHARMM is step 3, CHARMM2 is step 4) > - the place where it all hung: > > (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, trj_file_m001, > crd_min_f > ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, prm_file_m001, > crd_file > _m001, water_file, "system:solv_m001", "stitle:m001", > "rtffile:parm03_gaff_all.rtf", > "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); > file prt_file_m001 <"solv_chg_a0.prt">; > file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; > file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; > file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; > file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; > > (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, solv_chg_a0_m001_out, > solv_chg_a0_m001 > _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, prm_file_m001, > psf_file > _m001, crd_eq_file_m001, prt_file_m001, "dirname:solv_chg_a0_m001", > "system:solv_m00 > 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", > "paramfile:parm03_gaffnb_all.prm", > "gaff:m001_am1", "stage:chg", "urandseed:4880701"); > > The complete dtm file is on wiggum in: > /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably missing > something here, but can't see what... > > Please let me know where to look for the errors, > > Thanks, > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Tue Mar 6 14:18:54 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 14:18:54 -0600 Subject: [Swift-devel] workflow hung? In-Reply-To: <1173212075.609.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> <1173212075.609.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306141816.057d22a0@mail.mcs.anl.gov> /sandbox/ydeng/alamines/swift-MolDyn-free-7bputznxmlga1.log on wiggum At 02:14 PM 3/6/2007, Mihael Hategan wrote: >What's the run id (or the log file)? > >Mihael > >On Tue, 2007-03-06 at 14:14 -0600, Veronika V. Nefedova wrote: > > Hi, > > > > I am testing an extend Molecular Dynamics workflow -- and it seems to be > > hung after the first 3 steps of the workflow. The fourth step consists of > > 68 jobs that could/should be ran simultaneously. All these jobs have the > > same executable, but different command line parameters. Input files for > > these 68 jobs come from step 3 of the workflow (plus 2 additional files - > > one common to all jobs, and one unique for every job). I see all these > > files on my localhost present. > > The log finishes with staging out of the results of step 3 (successful) > and > > then nothing happens. No files are being staged in for step 4. 
> > This is the snapshot of the dtm file (CHARMM is step 3, CHARMM2 is step 4) > > - the place where it all hung: > > > > (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, trj_file_m001, > > crd_min_f > > ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, > prm_file_m001, > > crd_file > > _m001, water_file, "system:solv_m001", "stitle:m001", > > "rtffile:parm03_gaff_all.rtf", > > "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); > > file prt_file_m001 <"solv_chg_a0.prt">; > > file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; > > file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; > > file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; > > file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; > > > > (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, solv_chg_a0_m001_out, > > solv_chg_a0_m001 > > _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, > prm_file_m001, > > psf_file > > _m001, crd_eq_file_m001, prt_file_m001, "dirname:solv_chg_a0_m001", > > "system:solv_m00 > > 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", > > "paramfile:parm03_gaffnb_all.prm", > > "gaff:m001_am1", "stage:chg", "urandseed:4880701"); > > > > The complete dtm file is on wiggum in: > > /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably missing > > something here, but can't see what... > > > > Please let me know where to look for the errors, > > > > Thanks, > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From nefedova at mcs.anl.gov Tue Mar 6 14:20:51 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 14:20:51 -0600 Subject: [Swift-devel] workflow hung? In-Reply-To: <6.0.0.22.2.20070306141816.057d22a0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> <1173212075.609.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306141816.057d22a0@mail.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306142023.057d3ec0@mail.mcs.anl.gov> I haven't cancelled it -- it is still hanging out there (; Nika At 02:18 PM 3/6/2007, Veronika V. Nefedova wrote: >/sandbox/ydeng/alamines/swift-MolDyn-free-7bputznxmlga1.log > >on wiggum > >At 02:14 PM 3/6/2007, Mihael Hategan wrote: >>What's the run id (or the log file)? >> >>Mihael >> >>On Tue, 2007-03-06 at 14:14 -0600, Veronika V. Nefedova wrote: >> > Hi, >> > >> > I am testing an extend Molecular Dynamics workflow -- and it seems to be >> > hung after the first 3 steps of the workflow. The fourth step consists of >> > 68 jobs that could/should be ran simultaneously. All these jobs have the >> > same executable, but different command line parameters. Input files for >> > these 68 jobs come from step 3 of the workflow (plus 2 additional files - >> > one common to all jobs, and one unique for every job). I see all these >> > files on my localhost present. >> > The log finishes with staging out of the results of step 3 >> (successful) and >> > then nothing happens. No files are being staged in for step 4. 
>> > This is the snapshot of the dtm file (CHARMM is step 3, CHARMM2 is step 4) >> > - the place where it all hung: >> > >> > (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, trj_file_m001, >> > crd_min_f >> > ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, >> prm_file_m001, >> > crd_file >> > _m001, water_file, "system:solv_m001", "stitle:m001", >> > "rtffile:parm03_gaff_all.rtf", >> > "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); >> > file prt_file_m001 <"solv_chg_a0.prt">; >> > file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; >> > file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; >> > file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; >> > file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; >> > >> > (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, solv_chg_a0_m001_out, >> > solv_chg_a0_m001 >> > _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, >> prm_file_m001, >> > psf_file >> > _m001, crd_eq_file_m001, prt_file_m001, "dirname:solv_chg_a0_m001", >> > "system:solv_m00 >> > 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", >> > "paramfile:parm03_gaffnb_all.prm", >> > "gaff:m001_am1", "stage:chg", "urandseed:4880701"); >> > >> > The complete dtm file is on wiggum in: >> > /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably missing >> > something here, but can't see what... >> > >> > Please let me know where to look for the errors, >> > >> > Thanks, >> > >> > Nika >> > >> > _______________________________________________ >> > Swift-devel mailing list >> > Swift-devel at ci.uchicago.edu >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > >_______________________________________________ >Swift-devel mailing list >Swift-devel at ci.uchicago.edu >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Tue Mar 6 14:19:51 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 06 Mar 2007 14:19:51 -0600 Subject: [Swift-devel] workflow hung? In-Reply-To: <6.0.0.22.2.20070306142023.057d3ec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> <1173212075.609.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306141816.057d22a0@mail.mcs.anl.gov> <6.0.0.22.2.20070306142023.057d3ec0@mail.mcs.anl.gov> Message-ID: <1173212391.829.0.camel@blabla.mcs.anl.gov> Excellent! Quickly type "v", "Enter" and post the output :) On Tue, 2007-03-06 at 14:20 -0600, Veronika V. Nefedova wrote: > I haven't cancelled it -- it is still hanging out there (; > > Nika > > At 02:18 PM 3/6/2007, Veronika V. Nefedova wrote: > >/sandbox/ydeng/alamines/swift-MolDyn-free-7bputznxmlga1.log > > > >on wiggum > > > >At 02:14 PM 3/6/2007, Mihael Hategan wrote: > >>What's the run id (or the log file)? > >> > >>Mihael > >> > >>On Tue, 2007-03-06 at 14:14 -0600, Veronika V. Nefedova wrote: > >> > Hi, > >> > > >> > I am testing an extend Molecular Dynamics workflow -- and it seems to be > >> > hung after the first 3 steps of the workflow. The fourth step consists of > >> > 68 jobs that could/should be ran simultaneously. All these jobs have the > >> > same executable, but different command line parameters. Input files for > >> > these 68 jobs come from step 3 of the workflow (plus 2 additional files - > >> > one common to all jobs, and one unique for every job). I see all these > >> > files on my localhost present. > >> > The log finishes with staging out of the results of step 3 > >> (successful) and > >> > then nothing happens. No files are being staged in for step 4. 
> >> > This is the snapshot of the dtm file (CHARMM is step 3, CHARMM2 is step 4) > >> > - the place where it all hung: > >> > > >> > (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, trj_file_m001, > >> > crd_min_f > >> > ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, > >> prm_file_m001, > >> > crd_file > >> > _m001, water_file, "system:solv_m001", "stitle:m001", > >> > "rtffile:parm03_gaff_all.rtf", > >> > "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); > >> > file prt_file_m001 <"solv_chg_a0.prt">; > >> > file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; > >> > file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; > >> > file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; > >> > file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; > >> > > >> > (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, solv_chg_a0_m001_out, > >> > solv_chg_a0_m001 > >> > _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, > >> prm_file_m001, > >> > psf_file > >> > _m001, crd_eq_file_m001, prt_file_m001, "dirname:solv_chg_a0_m001", > >> > "system:solv_m00 > >> > 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", > >> > "paramfile:parm03_gaffnb_all.prm", > >> > "gaff:m001_am1", "stage:chg", "urandseed:4880701"); > >> > > >> > The complete dtm file is on wiggum in: > >> > /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably missing > >> > something here, but can't see what... > >> > > >> > Please let me know where to look for the errors, > >> > > >> > Thanks, > >> > > >> > Nika > >> > > >> > _______________________________________________ > >> > Swift-devel mailing list > >> > Swift-devel at ci.uchicago.edu > >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > > > >_______________________________________________ > >Swift-devel mailing list > >Swift-devel at ci.uchicago.edu > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From nefedova at mcs.anl.gov Tue Mar 6 14:23:56 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 14:23:56 -0600 Subject: [Swift-devel] workflow hung? In-Reply-To: <1173212391.829.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> <1173212075.609.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306141816.057d22a0@mail.mcs.anl.gov> <6.0.0.22.2.20070306142023.057d3ec0@mail.mcs.anl.gov> <1173212391.829.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306142338.0515eb80@mail.mcs.anl.gov> v Registered futures: file am1_file_m001 F/am1_file_m001 Closed file rtf_file_m001 F/rtf_file_m001 Closed file psf_file_m001 F/psf_file_m001 Closed file prt_file_m001 - F/prt_file_m001 Open ---- At 02:19 PM 3/6/2007, Mihael Hategan wrote: >Excellent! Quickly type "v", "Enter" and post the output :) > >On Tue, 2007-03-06 at 14:20 -0600, Veronika V. Nefedova wrote: > > I haven't cancelled it -- it is still hanging out there (; > > > > Nika > > > > At 02:18 PM 3/6/2007, Veronika V. Nefedova wrote: > > >/sandbox/ydeng/alamines/swift-MolDyn-free-7bputznxmlga1.log > > > > > >on wiggum > > > > > >At 02:14 PM 3/6/2007, Mihael Hategan wrote: > > >>What's the run id (or the log file)? > > >> > > >>Mihael > > >> > > >>On Tue, 2007-03-06 at 14:14 -0600, Veronika V. Nefedova wrote: > > >> > Hi, > > >> > > > >> > I am testing an extend Molecular Dynamics workflow -- and it seems > to be > > >> > hung after the first 3 steps of the workflow. The fourth step > consists of > > >> > 68 jobs that could/should be ran simultaneously. 
All these jobs > have the > > >> > same executable, but different command line parameters. Input > files for > > >> > these 68 jobs come from step 3 of the workflow (plus 2 additional > files - > > >> > one common to all jobs, and one unique for every job). I see all these > > >> > files on my localhost present. > > >> > The log finishes with staging out of the results of step 3 > > >> (successful) and > > >> > then nothing happens. No files are being staged in for step 4. > > >> > This is the snapshot of the dtm file (CHARMM is step 3, CHARMM2 is > step 4) > > >> > - the place where it all hung: > > >> > > > >> > (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, > trj_file_m001, > > >> > crd_min_f > > >> > ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, > > >> prm_file_m001, > > >> > crd_file > > >> > _m001, water_file, "system:solv_m001", "stitle:m001", > > >> > "rtffile:parm03_gaff_all.rtf", > > >> > "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); > > >> > file prt_file_m001 <"solv_chg_a0.prt">; > > >> > file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; > > >> > file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; > > >> > file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; > > >> > file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; > > >> > > > >> > (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, solv_chg_a0_m001_out, > > >> > solv_chg_a0_m001 > > >> > _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, > > >> prm_file_m001, > > >> > psf_file > > >> > _m001, crd_eq_file_m001, prt_file_m001, "dirname:solv_chg_a0_m001", > > >> > "system:solv_m00 > > >> > 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", > > >> > "paramfile:parm03_gaffnb_all.prm", > > >> > "gaff:m001_am1", "stage:chg", "urandseed:4880701"); > > >> > > > >> > The complete dtm file is on wiggum in: > > >> > /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably missing > > >> > something here, but can't see what... > > >> > > > >> > Please let me know where to look for the errors, > > >> > > > >> > Thanks, > > >> > > > >> > Nika > > >> > > > >> > _______________________________________________ > > >> > Swift-devel mailing list > > >> > Swift-devel at ci.uchicago.edu > > >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >> > > > > > > > > > >_______________________________________________ > > >Swift-devel mailing list > > >Swift-devel at ci.uchicago.edu > > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From hategan at mcs.anl.gov Tue Mar 6 14:26:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 06 Mar 2007 14:26:12 -0600 Subject: [Swift-devel] workflow hung? In-Reply-To: <6.0.0.22.2.20070306142338.0515eb80@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> <1173212075.609.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306141816.057d22a0@mail.mcs.anl.gov> <6.0.0.22.2.20070306142023.057d3ec0@mail.mcs.anl.gov> <1173212391.829.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306142338.0515eb80@mail.mcs.anl.gov> Message-ID: <1173212773.1083.1.camel@blabla.mcs.anl.gov> There are multiple declarations of prt_file_m001. Mihael On Tue, 2007-03-06 at 14:23 -0600, Veronika V. Nefedova wrote: > v > > Registered futures: > file am1_file_m001 F/am1_file_m001 Closed > file rtf_file_m001 F/rtf_file_m001 Closed > file psf_file_m001 F/psf_file_m001 Closed > file prt_file_m001 - F/prt_file_m001 Open > ---- > > At 02:19 PM 3/6/2007, Mihael Hategan wrote: > >Excellent! 
Quickly type "v", "Enter" and post the output :) > > > >On Tue, 2007-03-06 at 14:20 -0600, Veronika V. Nefedova wrote: > > > I haven't cancelled it -- it is still hanging out there (; > > > > > > Nika > > > > > > At 02:18 PM 3/6/2007, Veronika V. Nefedova wrote: > > > >/sandbox/ydeng/alamines/swift-MolDyn-free-7bputznxmlga1.log > > > > > > > >on wiggum > > > > > > > >At 02:14 PM 3/6/2007, Mihael Hategan wrote: > > > >>What's the run id (or the log file)? > > > >> > > > >>Mihael > > > >> > > > >>On Tue, 2007-03-06 at 14:14 -0600, Veronika V. Nefedova wrote: > > > >> > Hi, > > > >> > > > > >> > I am testing an extend Molecular Dynamics workflow -- and it seems > > to be > > > >> > hung after the first 3 steps of the workflow. The fourth step > > consists of > > > >> > 68 jobs that could/should be ran simultaneously. All these jobs > > have the > > > >> > same executable, but different command line parameters. Input > > files for > > > >> > these 68 jobs come from step 3 of the workflow (plus 2 additional > > files - > > > >> > one common to all jobs, and one unique for every job). I see all these > > > >> > files on my localhost present. > > > >> > The log finishes with staging out of the results of step 3 > > > >> (successful) and > > > >> > then nothing happens. No files are being staged in for step 4. > > > >> > This is the snapshot of the dtm file (CHARMM is step 3, CHARMM2 is > > step 4) > > > >> > - the place where it all hung: > > > >> > > > > >> > (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, > > trj_file_m001, > > > >> > crd_min_f > > > >> > ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, > > > >> prm_file_m001, > > > >> > crd_file > > > >> > _m001, water_file, "system:solv_m001", "stitle:m001", > > > >> > "rtffile:parm03_gaff_all.rtf", > > > >> > "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); > > > >> > file prt_file_m001 <"solv_chg_a0.prt">; > > > >> > file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; > > > >> > file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; > > > >> > file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; > > > >> > file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; > > > >> > > > > >> > (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, solv_chg_a0_m001_out, > > > >> > solv_chg_a0_m001 > > > >> > _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, > > > >> prm_file_m001, > > > >> > psf_file > > > >> > _m001, crd_eq_file_m001, prt_file_m001, "dirname:solv_chg_a0_m001", > > > >> > "system:solv_m00 > > > >> > 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", > > > >> > "paramfile:parm03_gaffnb_all.prm", > > > >> > "gaff:m001_am1", "stage:chg", "urandseed:4880701"); > > > >> > > > > >> > The complete dtm file is on wiggum in: > > > >> > /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably missing > > > >> > something here, but can't see what... > > > >> > > > > >> > Please let me know where to look for the errors, > > > >> > > > > >> > Thanks, > > > >> > > > > >> > Nika > > > >> > > > > >> > _______________________________________________ > > > >> > Swift-devel mailing list > > > >> > Swift-devel at ci.uchicago.edu > > > >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > >> > > > > > > > > > > > > >_______________________________________________ > > > >Swift-devel mailing list > > > >Swift-devel at ci.uchicago.edu > > > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From nefedova at mcs.anl.gov Tue Mar 6 14:36:56 2007 From: nefedova at mcs.anl.gov (Veronika V. 
Nefedova) Date: Tue, 06 Mar 2007 14:36:56 -0600 Subject: [Swift-devel] workflow hung? In-Reply-To: <1173212773.1083.1.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070306135902.03309080@mail.mcs.anl.gov> <1173212075.609.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306141816.057d22a0@mail.mcs.anl.gov> <6.0.0.22.2.20070306142023.057d3ec0@mail.mcs.anl.gov> <1173212391.829.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306142338.0515eb80@mail.mcs.anl.gov> <1173212773.1083.1.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306143423.05160d60@mail.mcs.anl.gov> Right. But they are all different.... Ah, I see. If those 68 jobs were sequential - that wouldn't be a problems, correct ? But since they are all simultaneous - it hungs. Got it. Thanks for spotting this one out! Nika At 02:26 PM 3/6/2007, Mihael Hategan wrote: >There are multiple declarations of prt_file_m001. > >Mihael > >On Tue, 2007-03-06 at 14:23 -0600, Veronika V. Nefedova wrote: > > v > > > > Registered futures: > > file am1_file_m001 F/am1_file_m001 Closed > > file rtf_file_m001 F/rtf_file_m001 Closed > > file psf_file_m001 F/psf_file_m001 Closed > > file prt_file_m001 - F/prt_file_m001 Open > > ---- > > > > At 02:19 PM 3/6/2007, Mihael Hategan wrote: > > >Excellent! Quickly type "v", "Enter" and post the output :) > > > > > >On Tue, 2007-03-06 at 14:20 -0600, Veronika V. Nefedova wrote: > > > > I haven't cancelled it -- it is still hanging out there (; > > > > > > > > Nika > > > > > > > > At 02:18 PM 3/6/2007, Veronika V. Nefedova wrote: > > > > >/sandbox/ydeng/alamines/swift-MolDyn-free-7bputznxmlga1.log > > > > > > > > > >on wiggum > > > > > > > > > >At 02:14 PM 3/6/2007, Mihael Hategan wrote: > > > > >>What's the run id (or the log file)? > > > > >> > > > > >>Mihael > > > > >> > > > > >>On Tue, 2007-03-06 at 14:14 -0600, Veronika V. Nefedova wrote: > > > > >> > Hi, > > > > >> > > > > > >> > I am testing an extend Molecular Dynamics workflow -- and it > seems > > > to be > > > > >> > hung after the first 3 steps of the workflow. The fourth step > > > consists of > > > > >> > 68 jobs that could/should be ran simultaneously. All these jobs > > > have the > > > > >> > same executable, but different command line parameters. Input > > > files for > > > > >> > these 68 jobs come from step 3 of the workflow (plus 2 additional > > > files - > > > > >> > one common to all jobs, and one unique for every job). I see > all these > > > > >> > files on my localhost present. > > > > >> > The log finishes with staging out of the results of step 3 > > > > >> (successful) and > > > > >> > then nothing happens. No files are being staged in for step 4. 
> > > > >> > This is the snapshot of the dtm file (CHARMM is step 3, > CHARMM2 is > > > step 4) > > > > >> > - the place where it all hung: > > > > >> > > > > > >> > (stdt_m001, psf_file_m001, crd_eq_file_m001, rst_file_m001, > > > trj_file_m001, > > > > >> > crd_min_f > > > > >> > ile_m001) = CHARMM (gaff_rft, gaff_prm, stdn, rtf_file_m001, > > > > >> prm_file_m001, > > > > >> > crd_file > > > > >> > _m001, water_file, "system:solv_m001", "stitle:m001", > > > > >> > "rtffile:parm03_gaff_all.rtf", > > > > >> > "paramfile:parm03_gaffnb_all.prm", "gaff:m001_am1"); > > > > >> > file prt_file_m001 <"solv_chg_a0.prt">; > > > > >> > file solv_chg_a0_m001_wham <"solv_chg_a0_m001.wham">; > > > > >> > file solv_chg_a0_m001_crd <"solv_chg_a0_m001.crd">; > > > > >> > file solv_chg_a0_m001_out <"solv_chg_a0_m001.out">; > > > > >> > file solv_chg_a0_m001_done <"solv_chg_a0_m001_done">; > > > > >> > > > > > >> > (solv_chg_a0_m001_wham, solv_chg_a0_m001_crd, > solv_chg_a0_m001_out, > > > > >> > solv_chg_a0_m001 > > > > >> > _done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m001, > > > > >> prm_file_m001, > > > > >> > psf_file > > > > >> > _m001, crd_eq_file_m001, prt_file_m001, > "dirname:solv_chg_a0_m001", > > > > >> > "system:solv_m00 > > > > >> > 1", "stitle:m001", "rtffile:parm03_gaff_all.rtf", > > > > >> > "paramfile:parm03_gaffnb_all.prm", > > > > >> > "gaff:m001_am1", "stage:chg", "urandseed:4880701"); > > > > >> > > > > > >> > The complete dtm file is on wiggum in: > > > > >> > /sandbox/ydeng/alamines/swift-MolDyn-free.dtm. I am probably > missing > > > > >> > something here, but can't see what... > > > > >> > > > > > >> > Please let me know where to look for the errors, > > > > >> > > > > > >> > Thanks, > > > > >> > > > > > >> > Nika > > > > >> > > > > > >> > _______________________________________________ > > > > >> > Swift-devel mailing list > > > > >> > Swift-devel at ci.uchicago.edu > > > > >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > >> > > > > > > > > > > > > > > > >_______________________________________________ > > > > >Swift-devel mailing list > > > > >Swift-devel at ci.uchicago.edu > > > > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > From nefedova at mcs.anl.gov Tue Mar 6 16:02:55 2007 From: nefedova at mcs.anl.gov (Veronika V. 
Nefedova) Date: Tue, 06 Mar 2007 16:02:55 -0600 Subject: [Swift-devel] strange exceptions Message-ID: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> Hi, I got some very strange exceptions while running the workflow: 2007-03-06 15:20:50,741 INFO vdl:execute2 Creating temporary directory swift-MolDyn-free-kh02i75488k02/chrm-libtl78i on TG-NCSA 2007-03-06 15:20:50,823 DEBUG vdl:execute2 Application exception: Failed to check if /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a directory task:dir:make @ vdl-int.k, line: 340 vdl:execute2 @ execute-default.k, line: 22 vdl:execute @ swift-MolDyn-free.kml, line: 217 charmm3 @ swift-MolDyn-free.kml, line: 4619 vdl:mains @ swift-MolDyn-free.kml, line: 3395 Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: Failed to check if /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a directory Caused by: org.globus.ftp.exception.ServerException: Custom message: Could not create MlsxEntry [Nested exception message: Custom message: Expected multiline reply] [Nested exception is org.globus.ftp.exception.FTPException: Custom message: Expected multiline reply] My files are on wiggum in /sandbox/ydeng/alamines. The log is swift-MolDyn-free-kh02i75488k02.log Thanks, Nika From hategan at mcs.anl.gov Tue Mar 6 16:04:50 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 06 Mar 2007 16:04:50 -0600 Subject: [Swift-devel] strange exceptions In-Reply-To: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> Message-ID: <1173218690.4996.0.camel@blabla.mcs.anl.gov> Does the workflow fail? On Tue, 2007-03-06 at 16:02 -0600, Veronika V. Nefedova wrote: > Hi, > > I got some very strange exceptions while running the workflow: > > 2007-03-06 15:20:50,741 INFO vdl:execute2 Creating temporary directory > swift-MolDyn-free-kh02i75488k02/chrm-libtl78i on TG-NCSA > 2007-03-06 15:20:50,823 DEBUG vdl:execute2 Application exception: Failed to > check if > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > directory > task:dir:make @ vdl-int.k, line: 340 > vdl:execute2 @ execute-default.k, line: 22 > vdl:execute @ swift-MolDyn-free.kml, line: 217 > charmm3 @ swift-MolDyn-free.kml, line: 4619 > vdl:mains @ swift-MolDyn-free.kml, line: 3395 > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > Failed to check if > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > directory > Caused by: org.globus.ftp.exception.ServerException: Custom message: Could > not create MlsxEntry [Nested exception message: Custom message: Expected > multiline reply] [Nested exception is > org.globus.ftp.exception.FTPException: Custom message: Expected multiline > reply] > > > My files are on wiggum in /sandbox/ydeng/alamines. The log is > swift-MolDyn-free-kh02i75488k02.log > > Thanks, > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Tue Mar 6 16:09:33 2007 From: nefedova at mcs.anl.gov (Veronika V. 
Nefedova) Date: Tue, 06 Mar 2007 16:09:33 -0600 Subject: [Swift-devel] strange exceptions In-Reply-To: <1173218690.4996.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> <1173218690.4996.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306160922.0515ad60@mail.mcs.anl.gov> It looks like it... At 04:04 PM 3/6/2007, Mihael Hategan wrote: >Does the workflow fail? > >On Tue, 2007-03-06 at 16:02 -0600, Veronika V. Nefedova wrote: > > Hi, > > > > I got some very strange exceptions while running the workflow: > > > > 2007-03-06 15:20:50,741 INFO vdl:execute2 Creating temporary directory > > swift-MolDyn-free-kh02i75488k02/chrm-libtl78i on TG-NCSA > > 2007-03-06 15:20:50,823 DEBUG vdl:execute2 Application exception: > Failed to > > check if > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > directory > > task:dir:make @ vdl-int.k, line: 340 > > vdl:execute2 @ execute-default.k, line: 22 > > vdl:execute @ swift-MolDyn-free.kml, line: 217 > > charmm3 @ swift-MolDyn-free.kml, line: 4619 > > vdl:mains @ swift-MolDyn-free.kml, line: 3395 > > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > > Failed to check if > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > directory > > Caused by: org.globus.ftp.exception.ServerException: Custom message: > Could > > not create MlsxEntry [Nested exception message: Custom message: Expected > > multiline reply] [Nested exception is > > org.globus.ftp.exception.FTPException: Custom message: Expected multiline > > reply] > > > > > > My files are on wiggum in /sandbox/ydeng/alamines. The log is > > swift-MolDyn-free-kh02i75488k02.log > > > > Thanks, > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Tue Mar 6 16:09:11 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 06 Mar 2007 16:09:11 -0600 Subject: [Swift-devel] strange exceptions In-Reply-To: <6.0.0.22.2.20070306160922.0515ad60@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> <1173218690.4996.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306160922.0515ad60@mail.mcs.anl.gov> Message-ID: <1173218951.5339.0.camel@blabla.mcs.anl.gov> On Tue, 2007-03-06 at 16:09 -0600, Veronika V. Nefedova wrote: > It looks like it... Can you be more specific? > > At 04:04 PM 3/6/2007, Mihael Hategan wrote: > >Does the workflow fail? > > > >On Tue, 2007-03-06 at 16:02 -0600, Veronika V. 
Nefedova wrote: > > > Hi, > > > > > > I got some very strange exceptions while running the workflow: > > > > > > 2007-03-06 15:20:50,741 INFO vdl:execute2 Creating temporary directory > > > swift-MolDyn-free-kh02i75488k02/chrm-libtl78i on TG-NCSA > > > 2007-03-06 15:20:50,823 DEBUG vdl:execute2 Application exception: > > Failed to > > > check if > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > > directory > > > task:dir:make @ vdl-int.k, line: 340 > > > vdl:execute2 @ execute-default.k, line: 22 > > > vdl:execute @ swift-MolDyn-free.kml, line: 217 > > > charmm3 @ swift-MolDyn-free.kml, line: 4619 > > > vdl:mains @ swift-MolDyn-free.kml, line: 3395 > > > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > > > Failed to check if > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > > directory > > > Caused by: org.globus.ftp.exception.ServerException: Custom message: > > Could > > > not create MlsxEntry [Nested exception message: Custom message: Expected > > > multiline reply] [Nested exception is > > > org.globus.ftp.exception.FTPException: Custom message: Expected multiline > > > reply] > > > > > > > > > My files are on wiggum in /sandbox/ydeng/alamines. The log is > > > swift-MolDyn-free-kh02i75488k02.log > > > > > > Thanks, > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From nefedova at mcs.anl.gov Tue Mar 6 16:15:46 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 16:15:46 -0600 Subject: [Swift-devel] strange exceptions In-Reply-To: <1173218951.5339.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> <1173218690.4996.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306160922.0515ad60@mail.mcs.anl.gov> <1173218951.5339.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306161335.0331bec0@mail.mcs.anl.gov> I do not see any output neither on the remote host nor on the localhost. And I have this on stdout (where I submitted the workflow): chrm failed chrm failed chrm failed ....... At 04:09 PM 3/6/2007, Mihael Hategan wrote: >On Tue, 2007-03-06 at 16:09 -0600, Veronika V. Nefedova wrote: > > It looks like it... > >Can you be more specific? > > > > > At 04:04 PM 3/6/2007, Mihael Hategan wrote: > > >Does the workflow fail? > > > > > >On Tue, 2007-03-06 at 16:02 -0600, Veronika V. 
Nefedova wrote: > > > > Hi, > > > > > > > > I got some very strange exceptions while running the workflow: > > > > > > > > 2007-03-06 15:20:50,741 INFO vdl:execute2 Creating temporary directory > > > > swift-MolDyn-free-kh02i75488k02/chrm-libtl78i on TG-NCSA > > > > 2007-03-06 15:20:50,823 DEBUG vdl:execute2 Application exception: > > > Failed to > > > > check if > > > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > > > directory > > > > task:dir:make @ vdl-int.k, line: 340 > > > > vdl:execute2 @ execute-default.k, line: 22 > > > > vdl:execute @ swift-MolDyn-free.kml, line: 217 > > > > charmm3 @ swift-MolDyn-free.kml, line: 4619 > > > > vdl:mains @ swift-MolDyn-free.kml, line: 3395 > > > > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > > > > Failed to check if > > > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > > > directory > > > > Caused by: org.globus.ftp.exception.ServerException: Custom message: > > > Could > > > > not create MlsxEntry [Nested exception message: Custom message: > Expected > > > > multiline reply] [Nested exception is > > > > org.globus.ftp.exception.FTPException: Custom message: Expected > multiline > > > > reply] > > > > > > > > > > > > My files are on wiggum in /sandbox/ydeng/alamines. The log is > > > > swift-MolDyn-free-kh02i75488k02.log > > > > > > > > Thanks, > > > > > > > > Nika > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From hategan at mcs.anl.gov Tue Mar 6 16:15:41 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 06 Mar 2007 16:15:41 -0600 Subject: [Swift-devel] strange exceptions In-Reply-To: <6.0.0.22.2.20070306161335.0331bec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> <1173218690.4996.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306160922.0515ad60@mail.mcs.anl.gov> <1173218951.5339.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306161335.0331bec0@mail.mcs.anl.gov> Message-ID: <1173219341.5682.0.camel@blabla.mcs.anl.gov> Do you get the shell prompt back? Is that gridftp server actually working? Mihael On Tue, 2007-03-06 at 16:15 -0600, Veronika V. Nefedova wrote: > I do not see any output neither on the remote host nor on the localhost. > And I have this on stdout (where I submitted the workflow): > > chrm failed > chrm failed > chrm failed > ....... > > > > At 04:09 PM 3/6/2007, Mihael Hategan wrote: > >On Tue, 2007-03-06 at 16:09 -0600, Veronika V. Nefedova wrote: > > > It looks like it... > > > >Can you be more specific? > > > > > > > > At 04:04 PM 3/6/2007, Mihael Hategan wrote: > > > >Does the workflow fail? > > > > > > > >On Tue, 2007-03-06 at 16:02 -0600, Veronika V. 
Nefedova wrote: > > > > > Hi, > > > > > > > > > > I got some very strange exceptions while running the workflow: > > > > > > > > > > 2007-03-06 15:20:50,741 INFO vdl:execute2 Creating temporary directory > > > > > swift-MolDyn-free-kh02i75488k02/chrm-libtl78i on TG-NCSA > > > > > 2007-03-06 15:20:50,823 DEBUG vdl:execute2 Application exception: > > > > Failed to > > > > > check if > > > > > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > > > > directory > > > > > task:dir:make @ vdl-int.k, line: 340 > > > > > vdl:execute2 @ execute-default.k, line: 22 > > > > > vdl:execute @ swift-MolDyn-free.kml, line: 217 > > > > > charmm3 @ swift-MolDyn-free.kml, line: 4619 > > > > > vdl:mains @ swift-MolDyn-free.kml, line: 3395 > > > > > Caused by: org.globus.cog.abstraction.impl.file.FileResourceException: > > > > > Failed to check if > > > > > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i is a > > > > > directory > > > > > Caused by: org.globus.ftp.exception.ServerException: Custom message: > > > > Could > > > > > not create MlsxEntry [Nested exception message: Custom message: > > Expected > > > > > multiline reply] [Nested exception is > > > > > org.globus.ftp.exception.FTPException: Custom message: Expected > > multiline > > > > > reply] > > > > > > > > > > > > > > > My files are on wiggum in /sandbox/ydeng/alamines. The log is > > > > > swift-MolDyn-free-kh02i75488k02.log > > > > > > > > > > Thanks, > > > > > > > > > > Nika > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > From nefedova at mcs.anl.gov Tue Mar 6 16:22:24 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 06 Mar 2007 16:22:24 -0600 Subject: [Swift-devel] strange exceptions In-Reply-To: <1173219341.5682.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070306155919.032d8e00@mail.mcs.anl.gov> <1173218690.4996.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306160922.0515ad60@mail.mcs.anl.gov> <1173218951.5339.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070306161335.0331bec0@mail.mcs.anl.gov> <1173219341.5682.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070306162037.0515a340@mail.mcs.anl.gov> Yes, I got the prompt back. And all the previous stages of the workflow had no problems with files staged in/out. The output from that last stage was never produced... I'll check the parameters to see if that could be the problem. Nika At 04:15 PM 3/6/2007, Mihael Hategan wrote: >Do you get the shell prompt back? > >Is that gridftp server actually working? > >Mihael > >On Tue, 2007-03-06 at 16:15 -0600, Veronika V. Nefedova wrote: > > I do not see any output neither on the remote host nor on the localhost. > > And I have this on stdout (where I submitted the workflow): > > > > chrm failed > > chrm failed > > chrm failed > > ....... > > > > > > > > At 04:09 PM 3/6/2007, Mihael Hategan wrote: > > >On Tue, 2007-03-06 at 16:09 -0600, Veronika V. Nefedova wrote: > > > > It looks like it... > > > > > >Can you be more specific? > > > > > > > > > > > At 04:04 PM 3/6/2007, Mihael Hategan wrote: > > > > >Does the workflow fail? > > > > > > > > > >On Tue, 2007-03-06 at 16:02 -0600, Veronika V. 
Nefedova wrote: > > > > > > Hi, > > > > > > > > > > > > I got some very strange exceptions while running the workflow: > > > > > > > > > > > > 2007-03-06 15:20:50,741 INFO vdl:execute2 Creating temporary > directory > > > > > > swift-MolDyn-free-kh02i75488k02/chrm-libtl78i on TG-NCSA > > > > > > 2007-03-06 15:20:50,823 DEBUG vdl:execute2 Application exception: > > > > > Failed to > > > > > > check if > > > > > > > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i > is a > > > > > > directory > > > > > > task:dir:make @ vdl-int.k, line: 340 > > > > > > vdl:execute2 @ execute-default.k, line: 22 > > > > > > vdl:execute @ swift-MolDyn-free.kml, line: 217 > > > > > > charmm3 @ swift-MolDyn-free.kml, line: 4619 > > > > > > vdl:mains @ swift-MolDyn-free.kml, line: 3395 > > > > > > Caused by: > org.globus.cog.abstraction.impl.file.FileResourceException: > > > > > > Failed to check if > > > > > > > > > /home/ac/nefedova/SWIFT/swift-MolDyn-free-kh02i75488k02/chrm-libtl78i > is a > > > > > > directory > > > > > > Caused by: org.globus.ftp.exception.ServerException: Custom > message: > > > > > Could > > > > > > not create MlsxEntry [Nested exception message: Custom message: > > > Expected > > > > > > multiline reply] [Nested exception is > > > > > > org.globus.ftp.exception.FTPException: Custom message: Expected > > > multiline > > > > > > reply] > > > > > > > > > > > > > > > > > > My files are on wiggum in /sandbox/ydeng/alamines. The log is > > > > > > swift-MolDyn-free-kh02i75488k02.log > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 7 16:30:25 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 07 Mar 2007 16:30:25 -0600 Subject: [Swift-devel] submitting jobs to the queue Message-ID: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> Hi, I've noticed one very strange behavior. For example, I have 68 jobs to be submitted to the remote host simultaneously. Swift submits at first just 26 jobs. I checked that several times - its always 26 jobs. Then, when at least one job out of those 26 is finished - swift goes ahead and submits the rest (all of those left - 42 in my case). Is it a bug or a feature? Nika From hategan at mcs.anl.gov Wed Mar 7 16:36:14 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 Mar 2007 16:36:14 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> Message-ID: <1173306974.4767.6.camel@blabla.mcs.anl.gov> On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > Hi, > > I've noticed one very strange behavior. For example, I have 68 jobs to be > submitted to the remote host simultaneously. Swift submits at first just 26 > jobs. I checked that several times - its always 26 jobs. Then, when at > least one job out of those 26 is finished - swift goes ahead and submits > the rest (all of those left - 42 in my case). > Is it a bug or a feature? Feature. Although it should probably be tamed down in the one site case. Each site has a score that changes based on how it behaves. If a site completes jobs ok, it gets a higher score in time. If jobs fail on it, it gets a lower score. 
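A minimal sketch of that kind of per-site scoring, with invented constants and names (this is not the actual Swift/Karajan scheduler code, only an illustration of the policy described here):

    // Illustration only: successes raise a site's score, failures lower it,
    // and the number of jobs the site is allowed to hold grows with the score.
    class SiteScore {
        private double score = 1.0;                   // modest initial "probing" score
        private static final double MAX_SCORE = 100.0;

        void jobCompleted() { score = Math.min(MAX_SCORE, score * 1.1); }
        void jobFailed()    { score = Math.max(0.1, score / 2); }

        // e.g. score 1.0 allows a handful of concurrent jobs, score 100 a few hundred
        int allowedConcurrentJobs() { return (int) Math.ceil(score * 4); }
    }

The exact growth and decay factors above are made up; the point is only that the load sent to a site tracks its score over time.
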
Now, let's consider the following scenario: 2 sites, one fast one slow. With no scores and no limitations, half of the jobs would go to one, and half to the other. The workflow finishes when the slow site finishes half the jobs. What happens however, is that Swift limits the number of initial jobs, and does "probing". This allows it to infer some stuff about the sites by the time it gets to submit lots of jobs. It should yield better performance on larger workflows with imbalanced sites, which is, I'm guessing, our main scenario. > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Wed Mar 7 16:58:36 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 07 Mar 2007 16:58:36 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173306974.4767.6.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> OK, Here is my another question. Teragrid allows the user to have 385 jobs in a queue. If I run my complete workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e. close to 20K). How do I set the limit for the number of submitted jobs to the queue to 385 ? I remember that condor had a specific parameter to condor_submit that was managing exactly that... Nika At 04:36 PM 3/7/2007, Mihael Hategan wrote: >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > Hi, > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > submitted to the remote host simultaneously. Swift submits at first > just 26 > > jobs. I checked that several times - its always 26 jobs. Then, when at > > least one job out of those 26 is finished - swift goes ahead and submits > > the rest (all of those left - 42 in my case). > > Is it a bug or a feature? > >Feature. Although it should probably be tamed down in the one site case. >Each site has a score that changes based on how it behaves. If a site >completes jobs ok, it gets a higher score in time. If jobs fail on it, >it gets a lower score. > >Now, let's consider the following scenario: 2 sites, one fast one slow. >With no scores and no limitations, half of the jobs would go to one, and >half to the other. The workflow finishes when the slow site finishes >half the jobs. >What happens however, is that Swift limits the number of initial jobs, >and does "probing". This allows it to infer some stuff about the sites >by the time it gets to submit lots of jobs. It should yield better >performance on larger workflows with imbalanced sites, which is, I'm >guessing, our main scenario. > > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Wed Mar 7 17:19:21 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 Mar 2007 17:19:21 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> Message-ID: <1173309561.6469.0.camel@blabla.mcs.anl.gov> On Wed, 2007-03-07 at 16:58 -0600, Veronika V. 
Nefedova wrote: > OK, Here is my another question. > Teragrid allows the user to have 385 jobs in a queue. If I run my complete > workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e. > close to 20K). How do I set the limit for the number of submitted jobs to > the queue to 385 ? I remember that condor had a specific parameter to > condor_submit that was managing exactly that... Is this 385 jobs per site? > > Nika > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > Hi, > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > submitted to the remote host simultaneously. Swift submits at first > > just 26 > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > least one job out of those 26 is finished - swift goes ahead and submits > > > the rest (all of those left - 42 in my case). > > > Is it a bug or a feature? > > > >Feature. Although it should probably be tamed down in the one site case. > >Each site has a score that changes based on how it behaves. If a site > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > >it gets a lower score. > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > >With no scores and no limitations, half of the jobs would go to one, and > >half to the other. The workflow finishes when the slow site finishes > >half the jobs. > >What happens however, is that Swift limits the number of initial jobs, > >and does "probing". This allows it to infer some stuff about the sites > >by the time it gets to submit lots of jobs. It should yield better > >performance on larger workflows with imbalanced sites, which is, I'm > >guessing, our main scenario. > > > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From nefedova at mcs.anl.gov Wed Mar 7 17:27:06 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 07 Mar 2007 17:27:06 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173309561.6469.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> <1173309561.6469.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070307172505.04ecfd10@mail.mcs.anl.gov> Right. Teragrid at NCSA has the limit of 384 queued or running jobs per user. Nika At 05:19 PM 3/7/2007, Mihael Hategan wrote: >On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote: > > OK, Here is my another question. > > Teragrid allows the user to have 385 jobs in a queue. If I run my complete > > workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e. > > close to 20K). How do I set the limit for the number of submitted jobs to > > the queue to 385 ? I remember that condor had a specific parameter to > > condor_submit that was managing exactly that... > >Is this 385 jobs per site? > > > > > Nika > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > Hi, > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs > to be > > > > submitted to the remote host simultaneously. Swift submits at first > > > just 26 > > > > jobs. 
I checked that several times - its always 26 jobs. Then, when at > > > > least one job out of those 26 is finished - swift goes ahead and > submits > > > > the rest (all of those left - 42 in my case). > > > > Is it a bug or a feature? > > > > > >Feature. Although it should probably be tamed down in the one site case. > > >Each site has a score that changes based on how it behaves. If a site > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > >it gets a lower score. > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > >With no scores and no limitations, half of the jobs would go to one, and > > >half to the other. The workflow finishes when the slow site finishes > > >half the jobs. > > >What happens however, is that Swift limits the number of initial jobs, > > >and does "probing". This allows it to infer some stuff about the sites > > >by the time it gets to submit lots of jobs. It should yield better > > >performance on larger workflows with imbalanced sites, which is, I'm > > >guessing, our main scenario. > > > > > > > > > > > Nika > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From hategan at mcs.anl.gov Wed Mar 7 18:20:54 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 Mar 2007 18:20:54 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070307172505.04ecfd10@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> <1173309561.6469.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307172505.04ecfd10@mail.mcs.anl.gov> Message-ID: <1173313254.6939.5.camel@blabla.mcs.anl.gov> So this limit would have to be a per-site limit. There is no such thing right now. You can limit the total number of concurrent jobs, but it's not exposed through swift.properties. In libexec/scheduler.xml, you can try adding the following thing inside ...: Mihael On Wed, 2007-03-07 at 17:27 -0600, Veronika V. Nefedova wrote: > Right. Teragrid at NCSA has the limit of 384 queued or running jobs per user. > > Nika > > At 05:19 PM 3/7/2007, Mihael Hategan wrote: > >On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote: > > > OK, Here is my another question. > > > Teragrid allows the user to have 385 jobs in a queue. If I run my complete > > > workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e. > > > close to 20K). How do I set the limit for the number of submitted jobs to > > > the queue to 385 ? I remember that condor had a specific parameter to > > > condor_submit that was managing exactly that... > > > >Is this 385 jobs per site? > > > > > > > > Nika > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > Hi, > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs > > to be > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > just 26 > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > least one job out of those 26 is finished - swift goes ahead and > > submits > > > > > the rest (all of those left - 42 in my case). > > > > > Is it a bug or a feature? > > > > > > > >Feature. 
Although it should probably be tamed down in the one site case. > > > >Each site has a score that changes based on how it behaves. If a site > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > >it gets a lower score. > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > >With no scores and no limitations, half of the jobs would go to one, and > > > >half to the other. The workflow finishes when the slow site finishes > > > >half the jobs. > > > >What happens however, is that Swift limits the number of initial jobs, > > > >and does "probing". This allows it to infer some stuff about the sites > > > >by the time it gets to submit lots of jobs. It should yield better > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > >guessing, our main scenario. > > > > > > > > > > > > > > Nika > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > From foster at mcs.anl.gov Wed Mar 7 22:49:28 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Wed, 07 Mar 2007 22:49:28 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173313254.6939.5.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> <1173309561.6469.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307172505.04ecfd10@mail.mcs.anl.gov> <1173313254.6939.5.camel@blabla.mcs.anl.gov> Message-ID: <45EF95D8.2000003@mcs.anl.gov> I think that all of these issues will go away soon, when we start using the dynamic provisioning code that Ioan is working on. So I wonder if they are worth worrying about too much? Ian. Mihael Hategan wrote: > So this limit would have to be a per-site limit. > There is no such thing right now. You can limit the total number of > concurrent jobs, but it's not exposed through swift.properties. > > In libexec/scheduler.xml, you can try adding the following thing inside > ...: > > > > Mihael > > On Wed, 2007-03-07 at 17:27 -0600, Veronika V. Nefedova wrote: > >> Right. Teragrid at NCSA has the limit of 384 queued or running jobs per user. >> >> Nika >> >> At 05:19 PM 3/7/2007, Mihael Hategan wrote: >> >>> On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote: >>> >>>> OK, Here is my another question. >>>> Teragrid allows the user to have 385 jobs in a queue. If I run my complete >>>> workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e. >>>> close to 20K). How do I set the limit for the number of submitted jobs to >>>> the queue to 385 ? I remember that condor had a specific parameter to >>>> condor_submit that was managing exactly that... >>>> >>> Is this 385 jobs per site? >>> >>> >>>> Nika >>>> >>>> At 04:36 PM 3/7/2007, Mihael Hategan wrote: >>>> >>>>> On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: >>>>> >>>>>> Hi, >>>>>> >>>>>> I've noticed one very strange behavior. For example, I have 68 jobs >>>>>> >>> to be >>> >>>>>> submitted to the remote host simultaneously. Swift submits at first >>>>>> >>>>> just 26 >>>>> >>>>>> jobs. I checked that several times - its always 26 jobs. Then, when at >>>>>> least one job out of those 26 is finished - swift goes ahead and >>>>>> >>> submits >>> >>>>>> the rest (all of those left - 42 in my case). 
>>>>>> Is it a bug or a feature? >>>>>> >>>>> Feature. Although it should probably be tamed down in the one site case. >>>>> Each site has a score that changes based on how it behaves. If a site >>>>> completes jobs ok, it gets a higher score in time. If jobs fail on it, >>>>> it gets a lower score. >>>>> >>>>> Now, let's consider the following scenario: 2 sites, one fast one slow. >>>>> With no scores and no limitations, half of the jobs would go to one, and >>>>> half to the other. The workflow finishes when the slow site finishes >>>>> half the jobs. >>>>> What happens however, is that Swift limits the number of initial jobs, >>>>> and does "probing". This allows it to infer some stuff about the sites >>>>> by the time it gets to submit lots of jobs. It should yield better >>>>> performance on larger workflows with imbalanced sites, which is, I'm >>>>> guessing, our main scenario. >>>>> >>>>> >>>>>> Nika >>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>>> >>>> >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Mar 7 23:03:08 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 07 Mar 2007 23:03:08 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <45EF95D8.2000003@mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> <1173309561.6469.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307172505.04ecfd10@mail.mcs.anl.gov> <1173313254.6939.5.camel@blabla.mcs.anl.gov> <45EF95D8.2000003@mcs.anl.gov> Message-ID: <1173330188.17565.4.camel@blabla.mcs.anl.gov> On Wed, 2007-03-07 at 22:49 -0600, Ian Foster wrote: > I think that all of these issues will go away soon, when we start > using the dynamic provisioning code that Ioan is working on. In theory, yes. Practice has a tendency to come up with new problems though. I would not bet all my money on something complex that's not yet there. Mihael > So I wonder if they are worth worrying about too much? > > Ian. > > Mihael Hategan wrote: > > So this limit would have to be a per-site limit. > > There is no such thing right now. You can limit the total number of > > concurrent jobs, but it's not exposed through swift.properties. > > > > In libexec/scheduler.xml, you can try adding the following thing inside > > ...: > > > > > > > > Mihael > > > > On Wed, 2007-03-07 at 17:27 -0600, Veronika V. Nefedova wrote: > > > > > Right. Teragrid at NCSA has the limit of 384 queued or running jobs per user. > > > > > > Nika > > > > > > At 05:19 PM 3/7/2007, Mihael Hategan wrote: > > > > > > > On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote: > > > > > > > > > OK, Here is my another question. > > > > > Teragrid allows the user to have 385 jobs in a queue. 
If I run my complete > > > > > workflow (244 molecules), on stage four I'll have 80 times 244 jobs (i.e. > > > > > close to 20K). How do I set the limit for the number of submitted jobs to > > > > > the queue to 385 ? I remember that condor had a specific parameter to > > > > > condor_submit that was managing exactly that... > > > > > > > > > Is this 385 jobs per site? > > > > > > > > > > > > > Nika > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > > > > > > > On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs > > > > > > > > > > > to be > > > > > > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > > > > > > > > > just 26 > > > > > > > > > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > > > least one job out of those 26 is finished - swift goes ahead and > > > > > > > > > > > submits > > > > > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > Is it a bug or a feature? > > > > > > > > > > > > > Feature. Although it should probably be tamed down in the one site case. > > > > > > Each site has a score that changes based on how it behaves. If a site > > > > > > completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > > > it gets a lower score. > > > > > > > > > > > > Now, let's consider the following scenario: 2 sites, one fast one slow. > > > > > > With no scores and no limitations, half of the jobs would go to one, and > > > > > > half to the other. The workflow finishes when the slow site finishes > > > > > > half the jobs. > > > > > > What happens however, is that Swift limits the number of initial jobs, > > > > > > and does "probing". This allows it to infer some stuff about the sites > > > > > > by the time it gets to submit lots of jobs. It should yield better > > > > > > performance on larger workflows with imbalanced sites, which is, I'm > > > > > > guessing, our main scenario. > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > -- > > Ian Foster, Director, Computation Institute > Argonne National Laboratory & University of Chicago > Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 > Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 > Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. > Globus Alliance: www.globus.org. From yongzh at cs.uchicago.edu Thu Mar 8 09:12:50 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Thu, 8 Mar 2007 09:12:50 -0600 (CST) Subject: [Swift-devel] Re: [Swift-user] A question about resource selection strategy in swift In-Reply-To: References: Message-ID: currently it is based on some heuristics. each site gets a score based on its responsiveness, and jobs are dispatched proportionally according to the scores the sites get. Yong. On Wed, 7 Mar 2007, Ming Wu wrote: > Hi, > > When I submit a workflow to an environment having multiple sites, how > swift distributes the tasks in a workflow onto those multiple > resources? 
In other words, what strategies does swift use to transform > an abstract workflow to an executable workflow? > > Thanks > > Ming > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > From nefedova at mcs.anl.gov Fri Mar 9 11:06:21 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 09 Mar 2007 11:06:21 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173306974.4767.6.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> Hi, Mihael: Is it possible to remove this feature in the one site case ? For example, the queue is now almost empty on TG, but I have to wait for 1.5 hours for the rest of my jobs to be submitted (thats the average running time of my job) - and the queue might be full by that time... Nika At 04:36 PM 3/7/2007, Mihael Hategan wrote: >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > Hi, > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > submitted to the remote host simultaneously. Swift submits at first > just 26 > > jobs. I checked that several times - its always 26 jobs. Then, when at > > least one job out of those 26 is finished - swift goes ahead and submits > > the rest (all of those left - 42 in my case). > > Is it a bug or a feature? > >Feature. Although it should probably be tamed down in the one site case. >Each site has a score that changes based on how it behaves. If a site >completes jobs ok, it gets a higher score in time. If jobs fail on it, >it gets a lower score. > >Now, let's consider the following scenario: 2 sites, one fast one slow. >With no scores and no limitations, half of the jobs would go to one, and >half to the other. The workflow finishes when the slow site finishes >half the jobs. >What happens however, is that Swift limits the number of initial jobs, >and does "probing". This allows it to infer some stuff about the sites >by the time it gets to submit lots of jobs. It should yield better >performance on larger workflows with imbalanced sites, which is, I'm >guessing, our main scenario. > > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Fri Mar 9 11:11:03 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 11:11:03 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> Message-ID: <1173460263.4225.7.camel@blabla.mcs.anl.gov> Yes, although we need to come up with a nicer way to do it. In libexec/scheduler.xml, change to value="large number" (not literally). Mihael On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > Hi, Mihael: > > Is it possible to remove this feature in the one site case ? For example, > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > the rest of my jobs to be submitted (thats the average running time of my > job) - and the queue might be full by that time... 
> > Nika > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > Hi, > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > submitted to the remote host simultaneously. Swift submits at first > > just 26 > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > least one job out of those 26 is finished - swift goes ahead and submits > > > the rest (all of those left - 42 in my case). > > > Is it a bug or a feature? > > > >Feature. Although it should probably be tamed down in the one site case. > >Each site has a score that changes based on how it behaves. If a site > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > >it gets a lower score. > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > >With no scores and no limitations, half of the jobs would go to one, and > >half to the other. The workflow finishes when the slow site finishes > >half the jobs. > >What happens however, is that Swift limits the number of initial jobs, > >and does "probing". This allows it to infer some stuff about the sites > >by the time it gets to submit lots of jobs. It should yield better > >performance on larger workflows with imbalanced sites, which is, I'm > >guessing, our main scenario. > > > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From tiberius at ci.uchicago.edu Fri Mar 9 11:13:11 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 9 Mar 2007 11:13:11 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> Message-ID: Even better: Can we have a knob that could be used by the user to increase the aggressiveness of the job submission rate ? Some users might enjoy that (being more in control). Tibi On 3/9/07, Veronika V. Nefedova wrote: > Hi, Mihael: > > Is it possible to remove this feature in the one site case ? For example, > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > the rest of my jobs to be submitted (thats the average running time of my > job) - and the queue might be full by that time... > > Nika > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > Hi, > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > submitted to the remote host simultaneously. Swift submits at first > > just 26 > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > least one job out of those 26 is finished - swift goes ahead and submits > > > the rest (all of those left - 42 in my case). > > > Is it a bug or a feature? > > > >Feature. Although it should probably be tamed down in the one site case. > >Each site has a score that changes based on how it behaves. If a site > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > >it gets a lower score. > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > >With no scores and no limitations, half of the jobs would go to one, and > >half to the other. 
The workflow finishes when the slow site finishes > >half the jobs. > >What happens however, is that Swift limits the number of initial jobs, > >and does "probing". This allows it to infer some stuff about the sites > >by the time it gets to submit lots of jobs. It should yield better > >performance on larger workflows with imbalanced sites, which is, I'm > >guessing, our main scenario. > > > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From tiberius at ci.uchicago.edu Fri Mar 9 11:21:51 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 9 Mar 2007 11:21:51 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173460263.4225.7.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> Message-ID: Knob means "while in progress" Is that doable ? (Probably extending your rudimentary debugger would do it). How about the following extension: can we easily create hooks (webservices) into a running swift engine, that would allow this manipulation with an external client (the knob driver) ? Having more interactivity with a running workflow is something that might be appealing for long-running or never-ending workflows, and would differentiate us from others in a nice way. You would not believe how many people are working on workflows: everybody and their brother at the OSG meeting had some offering labeled "workflow". (I'm exaggerating a bit here) Tibi On 3/9/07, Mihael Hategan wrote: > Yes, although we need to come up with a nicer way to do it. > In libexec/scheduler.xml, change value="4"/> to value="large number" (not literally). > > Mihael > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > Hi, Mihael: > > > > Is it possible to remove this feature in the one site case ? For example, > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > the rest of my jobs to be submitted (thats the average running time of my > > job) - and the queue might be full by that time... > > > > Nika > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > Hi, > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > submitted to the remote host simultaneously. Swift submits at first > > > just 26 > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > the rest (all of those left - 42 in my case). > > > > Is it a bug or a feature? > > > > > >Feature. Although it should probably be tamed down in the one site case. > > >Each site has a score that changes based on how it behaves. If a site > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > >it gets a lower score. 
> > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > >With no scores and no limitations, half of the jobs would go to one, and > > >half to the other. The workflow finishes when the slow site finishes > > >half the jobs. > > >What happens however, is that Swift limits the number of initial jobs, > > >and does "probing". This allows it to infer some stuff about the sites > > >by the time it gets to submit lots of jobs. It should yield better > > >performance on larger workflows with imbalanced sites, which is, I'm > > >guessing, our main scenario. > > > > > > > > > > > Nika > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From yongzh at cs.uchicago.edu Fri Mar 9 11:27:09 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 9 Mar 2007 11:27:09 -0600 (CST) Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> Message-ID: I have been thinking that the system should be smarter in dealing with such issues, without relying too much on a user's manual intervention. For job submission rate, or transfer rate, if we observe abnormality, for instance: ftp errors due to high transfer rate, the system should be able to slow down automatically. I am not quite sure about how to detect that jobs go through quickly to a scheduler, but if that is the case, the submission rate should be increased automatically. Yong. On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > Knob means "while in progress" > Is that doable ? (Probably extending your rudimentary debugger would do it). > How about the following extension: can we easily create hooks > (webservices) into a running swift engine, that would allow this > manipulation with an external client (the knob driver) ? > Having more interactivity with a running workflow is something that > might be appealing for long-running or never-ending workflows, and > would differentiate us from others in a nice way. You would not > believe how many people are working on workflows: everybody and their > brother at the OSG meeting had some offering labeled "workflow". (I'm > exaggerating a bit here) > > Tibi > > On 3/9/07, Mihael Hategan wrote: > > Yes, although we need to come up with a nicer way to do it. > > In libexec/scheduler.xml, change > value="4"/> to value="large number" (not literally). > > > > Mihael > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > Hi, Mihael: > > > > > > Is it possible to remove this feature in the one site case ? For example, > > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > > the rest of my jobs to be submitted (thats the average running time of my > > > job) - and the queue might be full by that time... > > > > > > Nika > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. 
Nefedova wrote: > > > > > Hi, > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > just 26 > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > > the rest (all of those left - 42 in my case). > > > > > Is it a bug or a feature? > > > > > > > >Feature. Although it should probably be tamed down in the one site case. > > > >Each site has a score that changes based on how it behaves. If a site > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > >it gets a lower score. > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > >With no scores and no limitations, half of the jobs would go to one, and > > > >half to the other. The workflow finishes when the slow site finishes > > > >half the jobs. > > > >What happens however, is that Swift limits the number of initial jobs, > > > >and does "probing". This allows it to infer some stuff about the sites > > > >by the time it gets to submit lots of jobs. It should yield better > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > >guessing, our main scenario. > > > > > > > > > > > > > > Nika > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > -- > Tiberiu (Tibi) Stef-Praun, PhD > Research Staff, Computation Institute > 5640 S. Ellis Ave, #405 > University of Chicago > http://www-unix.mcs.anl.gov/~tiberius/ > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From tiberius at ci.uchicago.edu Fri Mar 9 11:29:38 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 9 Mar 2007 11:29:38 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> Message-ID: Oh, definitely, we do want the system to be smarter. The extra control capability would be in addition to that (for impatient users / control freaks ). On 3/9/07, Yong Zhao wrote: > I have been thinking that the system should be smarter in dealing with > such issues, without relying too much on a user's manual intervention. For > job submission rate, or transfer rate, if we observe abnormality, for > instance: ftp errors due to high transfer rate, the system should be able > to slow down automatically. I am not quite sure about how to detect that > jobs go through quickly to a scheduler, but if that is the case, the > submission rate should be increased automatically. > > Yong. > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > Knob means "while in progress" > > Is that doable ? (Probably extending your rudimentary debugger would do it). 
> > How about the following extension: can we easily create hooks > > (webservices) into a running swift engine, that would allow this > > manipulation with an external client (the knob driver) ? > > Having more interactivity with a running workflow is something that > > might be appealing for long-running or never-ending workflows, and > > would differentiate us from others in a nice way. You would not > > believe how many people are working on workflows: everybody and their > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > exaggerating a bit here) > > > > Tibi > > > > On 3/9/07, Mihael Hategan wrote: > > > Yes, although we need to come up with a nicer way to do it. > > > In libexec/scheduler.xml, change > > value="4"/> to value="large number" (not literally). > > > > > > Mihael > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > Hi, Mihael: > > > > > > > > Is it possible to remove this feature in the one site case ? For example, > > > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > > > the rest of my jobs to be submitted (thats the average running time of my > > > > job) - and the queue might be full by that time... > > > > > > > > Nika > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > Hi, > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > just 26 > > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > > > the rest (all of those left - 42 in my case). > > > > > > Is it a bug or a feature? > > > > > > > > > >Feature. Although it should probably be tamed down in the one site case. > > > > >Each site has a score that changes based on how it behaves. If a site > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > >it gets a lower score. > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > > >With no scores and no limitations, half of the jobs would go to one, and > > > > >half to the other. The workflow finishes when the slow site finishes > > > > >half the jobs. > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > -- > > Tiberiu (Tibi) Stef-Praun, PhD > > Research Staff, Computation Institute > > 5640 S. 
Ellis Ave, #405 > > University of Chicago > > http://www-unix.mcs.anl.gov/~tiberius/ > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Fri Mar 9 11:28:22 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 11:28:22 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> Message-ID: <1173461302.4225.17.camel@blabla.mcs.anl.gov> On Fri, 2007-03-09 at 11:21 -0600, Tiberiu Stef-Praun wrote: > Knob means "while in progress" > Is that doable ? (Probably extending your rudimentary debugger would do it). > How about the following extension: can we easily create hooks > (webservices) into a running swift engine, that would allow this > manipulation with an external client (the knob driver) ? Possible it is. Easy, I'm not sure. The problem consists of two things: - making the engine support such knobs, and this depends on the exact functionality needed. Updating scheduler properties is probably straightforward. - the interface (a service, a GUI, a TUI, etc.) > Having more interactivity with a running workflow is something that > might be appealing for long-running or never-ending workflows, and > would differentiate us from others in a nice way. You would not > believe how many people are working on workflows: everybody and their > brother at the OSG meeting had some offering labeled "workflow". That's not unbelivable. If you want to run non-trivial applications on a big thing like OSG, you need some kind of workflow tool. > (I'm > exaggerating a bit here) > > Tibi > > On 3/9/07, Mihael Hategan wrote: > > Yes, although we need to come up with a nicer way to do it. > > In libexec/scheduler.xml, change > value="4"/> to value="large number" (not literally). > > > > Mihael > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > Hi, Mihael: > > > > > > Is it possible to remove this feature in the one site case ? For example, > > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > > the rest of my jobs to be submitted (thats the average running time of my > > > job) - and the queue might be full by that time... > > > > > > Nika > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > Hi, > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > just 26 > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > > the rest (all of those left - 42 in my case). > > > > > Is it a bug or a feature? > > > > > > > >Feature. Although it should probably be tamed down in the one site case. > > > >Each site has a score that changes based on how it behaves. If a site > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > >it gets a lower score. 
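As a purely illustrative sketch of the two pieces Mihael lists (engine support for knobs, plus some interface on top), the engine side could be little more than a way to update scheduler properties on a live run. Nothing like this interface exists in the Swift discussed here; the names are invented.

    // Illustrative only: a minimal run-time "knob" hook; not a real Swift API.
    interface SchedulerKnobs {
        // Adjust a scheduler property (e.g. the job throttle) while the workflow runs.
        void setProperty(String name, String value);

        // Read the current value back, so an external client can display it.
        String getProperty(String name);
    }

A web service, GUI, or TUI front end would then just forward user requests to these two calls.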
> > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > >With no scores and no limitations, half of the jobs would go to one, and > > > >half to the other. The workflow finishes when the slow site finishes > > > >half the jobs. > > > >What happens however, is that Swift limits the number of initial jobs, > > > >and does "probing". This allows it to infer some stuff about the sites > > > >by the time it gets to submit lots of jobs. It should yield better > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > >guessing, our main scenario. > > > > > > > > > > > > > > Nika > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From hategan at mcs.anl.gov Fri Mar 9 11:35:42 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 11:35:42 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> Message-ID: <1173461742.4225.23.camel@blabla.mcs.anl.gov> On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote: > I have been thinking that the system should be smarter in dealing with > such issues, without relying too much on a user's manual intervention. For > job submission rate, or transfer rate, if we observe abnormality, for > instance: ftp errors due to high transfer rate, the system should be able > to slow down automatically. I am not quite sure about how to detect that > jobs go through quickly to a scheduler, but if that is the case, the > submission rate should be increased automatically. It is increased automatically. But the problem is at the start. Do you send many jobs to a site without knowing anything about it? The site-selector that Luiz worked on would split the jobs equally to sites on the first round. That may be bad if you have highly asymmetrical sites. > > Yong. > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > Knob means "while in progress" > > Is that doable ? (Probably extending your rudimentary debugger would do it). > > How about the following extension: can we easily create hooks > > (webservices) into a running swift engine, that would allow this > > manipulation with an external client (the knob driver) ? > > Having more interactivity with a running workflow is something that > > might be appealing for long-running or never-ending workflows, and > > would differentiate us from others in a nice way. You would not > > believe how many people are working on workflows: everybody and their > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > exaggerating a bit here) > > > > Tibi > > > > On 3/9/07, Mihael Hategan wrote: > > > Yes, although we need to come up with a nicer way to do it. > > > In libexec/scheduler.xml, change > > value="4"/> to value="large number" (not literally). > > > > > > Mihael > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > Hi, Mihael: > > > > > > > > Is it possible to remove this feature in the one site case ? 
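To make the "slow down automatically on errors" idea concrete, one common shape for such a policy is additive increase, multiplicative decrease. The sketch below is only an illustration of that general technique and is not how Swift adjusts its rates.

    // Toy AIMD-style throttle: illustrative only, not Swift's implementation.
    class AdaptiveTransferThrottle {
        private double allowed = 4.0;                // concurrent transfers currently allowed
        private static final double MAX = 64.0;

        int allowedTransfers() {
            return (int) allowed;
        }

        void transferSucceeded() {
            allowed = Math.min(MAX, allowed + 0.25); // ramp up slowly while transfers succeed
        }

        void transferFailed() {
            allowed = Math.max(1.0, allowed / 2.0);  // back off quickly on ftp errors
        }
    }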
For example, > > > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > > > the rest of my jobs to be submitted (thats the average running time of my > > > > job) - and the queue might be full by that time... > > > > > > > > Nika > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > Hi, > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > just 26 > > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > > > the rest (all of those left - 42 in my case). > > > > > > Is it a bug or a feature? > > > > > > > > > >Feature. Although it should probably be tamed down in the one site case. > > > > >Each site has a score that changes based on how it behaves. If a site > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > >it gets a lower score. > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > > >With no scores and no limitations, half of the jobs would go to one, and > > > > >half to the other. The workflow finishes when the slow site finishes > > > > >half the jobs. > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > -- > > Tiberiu (Tibi) Stef-Praun, PhD > > Research Staff, Computation Institute > > 5640 S. Ellis Ave, #405 > > University of Chicago > > http://www-unix.mcs.anl.gov/~tiberius/ > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From nefedova at mcs.anl.gov Fri Mar 9 11:40:27 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 09 Mar 2007 11:40:27 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070309113809.032df8a0@mail.mcs.anl.gov> Oh, actually I am curious (-; Is it possible to specify somewhere the gridftp parameters (like number of parallel streams ) to increase the transfer rates ? Or swift decides on that based on the file's size ? Also, when swift is staging files in/out - does it use the concurrent transfers (all files at once), or each file is transferred separately ? 
Nika At 11:27 AM 3/9/2007, Yong Zhao wrote: >I have been thinking that the system should be smarter in dealing with >such issues, without relying too much on a user's manual intervention. For >job submission rate, or transfer rate, if we observe abnormality, for >instance: ftp errors due to high transfer rate, the system should be able >to slow down automatically. I am not quite sure about how to detect that >jobs go through quickly to a scheduler, but if that is the case, the >submission rate should be increased automatically. > >Yong. > >On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > Knob means "while in progress" > > Is that doable ? (Probably extending your rudimentary debugger would do > it). > > How about the following extension: can we easily create hooks > > (webservices) into a running swift engine, that would allow this > > manipulation with an external client (the knob driver) ? > > Having more interactivity with a running workflow is something that > > might be appealing for long-running or never-ending workflows, and > > would differentiate us from others in a nice way. You would not > > believe how many people are working on workflows: everybody and their > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > exaggerating a bit here) > > > > Tibi > > > > On 3/9/07, Mihael Hategan wrote: > > > Yes, although we need to come up with a nicer way to do it. > > > In libexec/scheduler.xml, change > > value="4"/> to value="large number" (not literally). > > > > > > Mihael > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > Hi, Mihael: > > > > > > > > Is it possible to remove this feature in the one site case ? For > example, > > > > the queue is now almost empty on TG, but I have to wait for 1.5 > hours for > > > > the rest of my jobs to be submitted (thats the average running time > of my > > > > job) - and the queue might be full by that time... > > > > > > > > Nika > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > Hi, > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 > jobs to be > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > just 26 > > > > > > jobs. I checked that several times - its always 26 jobs. Then, > when at > > > > > > least one job out of those 26 is finished - swift goes ahead > and submits > > > > > > the rest (all of those left - 42 in my case). > > > > > > Is it a bug or a feature? > > > > > > > > > >Feature. Although it should probably be tamed down in the one site > case. > > > > >Each site has a score that changes based on how it behaves. If a site > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > >it gets a lower score. > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one > slow. > > > > >With no scores and no limitations, half of the jobs would go to > one, and > > > > >half to the other. The workflow finishes when the slow site finishes > > > > >half the jobs. > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > >guessing, our main scenario. 
> > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > -- > > Tiberiu (Tibi) Stef-Praun, PhD > > Research Staff, Computation Institute > > 5640 S. Ellis Ave, #405 > > University of Chicago > > http://www-unix.mcs.anl.gov/~tiberius/ > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >_______________________________________________ >Swift-devel mailing list >Swift-devel at ci.uchicago.edu >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Fri Mar 9 11:38:42 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 11:38:42 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173461742.4225.23.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> <1173461742.4225.23.camel@blabla.mcs.anl.gov> Message-ID: <1173461922.4225.25.camel@blabla.mcs.anl.gov> The other thing that could be done is giving sites a different initial score. On Fri, 2007-03-09 at 11:35 -0600, Mihael Hategan wrote: > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote: > > I have been thinking that the system should be smarter in dealing with > > such issues, without relying too much on a user's manual intervention. For > > job submission rate, or transfer rate, if we observe abnormality, for > > instance: ftp errors due to high transfer rate, the system should be able > > to slow down automatically. I am not quite sure about how to detect that > > jobs go through quickly to a scheduler, but if that is the case, the > > submission rate should be increased automatically. > > It is increased automatically. But the problem is at the start. Do you > send many jobs to a site without knowing anything about it? The > site-selector that Luiz worked on would split the jobs equally to sites > on the first round. That may be bad if you have highly asymmetrical > sites. > > > > > Yong. > > > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > Knob means "while in progress" > > > Is that doable ? (Probably extending your rudimentary debugger would do it). > > > How about the following extension: can we easily create hooks > > > (webservices) into a running swift engine, that would allow this > > > manipulation with an external client (the knob driver) ? > > > Having more interactivity with a running workflow is something that > > > might be appealing for long-running or never-ending workflows, and > > > would differentiate us from others in a nice way. You would not > > > believe how many people are working on workflows: everybody and their > > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > > exaggerating a bit here) > > > > > > Tibi > > > > > > On 3/9/07, Mihael Hategan wrote: > > > > Yes, although we need to come up with a nicer way to do it. 
> > > > In libexec/scheduler.xml, change > > > value="4"/> to value="large number" (not literally). > > > > > > > > Mihael > > > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > > Hi, Mihael: > > > > > > > > > > Is it possible to remove this feature in the one site case ? For example, > > > > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > > > > the rest of my jobs to be submitted (thats the average running time of my > > > > > job) - and the queue might be full by that time... > > > > > > > > > > Nika > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > > just 26 > > > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > Is it a bug or a feature? > > > > > > > > > > > >Feature. Although it should probably be tamed down in the one site case. > > > > > >Each site has a score that changes based on how it behaves. If a site > > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > > >it gets a lower score. > > > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > > > >With no scores and no limitations, half of the jobs would go to one, and > > > > > >half to the other. The workflow finishes when the slow site finishes > > > > > >half the jobs. > > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > -- > > > Tiberiu (Tibi) Stef-Praun, PhD > > > Research Staff, Computation Institute > > > 5640 S. 
Ellis Ave, #405 > > > University of Chicago > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Fri Mar 9 11:43:36 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 9 Mar 2007 11:43:36 -0600 (CST) Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173461742.4225.23.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> <1173461742.4225.23.camel@blabla.mcs.anl.gov> Message-ID: right, for first batches, a user supplied hint would be more appropriate. Yong. On Fri, 9 Mar 2007, Mihael Hategan wrote: > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote: > > I have been thinking that the system should be smarter in dealing with > > such issues, without relying too much on a user's manual intervention. For > > job submission rate, or transfer rate, if we observe abnormality, for > > instance: ftp errors due to high transfer rate, the system should be able > > to slow down automatically. I am not quite sure about how to detect that > > jobs go through quickly to a scheduler, but if that is the case, the > > submission rate should be increased automatically. > > It is increased automatically. But the problem is at the start. Do you > send many jobs to a site without knowing anything about it? The > site-selector that Luiz worked on would split the jobs equally to sites > on the first round. That may be bad if you have highly asymmetrical > sites. > > > > > Yong. > > > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > Knob means "while in progress" > > > Is that doable ? (Probably extending your rudimentary debugger would do it). > > > How about the following extension: can we easily create hooks > > > (webservices) into a running swift engine, that would allow this > > > manipulation with an external client (the knob driver) ? > > > Having more interactivity with a running workflow is something that > > > might be appealing for long-running or never-ending workflows, and > > > would differentiate us from others in a nice way. You would not > > > believe how many people are working on workflows: everybody and their > > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > > exaggerating a bit here) > > > > > > Tibi > > > > > > On 3/9/07, Mihael Hategan wrote: > > > > Yes, although we need to come up with a nicer way to do it. > > > > In libexec/scheduler.xml, change > > > value="4"/> to value="large number" (not literally). > > > > > > > > Mihael > > > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > > Hi, Mihael: > > > > > > > > > > Is it possible to remove this feature in the one site case ? For example, > > > > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > > > > the rest of my jobs to be submitted (thats the average running time of my > > > > > job) - and the queue might be full by that time... 
> > > > > > > > > > Nika > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > > just 26 > > > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > Is it a bug or a feature? > > > > > > > > > > > >Feature. Although it should probably be tamed down in the one site case. > > > > > >Each site has a score that changes based on how it behaves. If a site > > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > > >it gets a lower score. > > > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > > > >With no scores and no limitations, half of the jobs would go to one, and > > > > > >half to the other. The workflow finishes when the slow site finishes > > > > > >half the jobs. > > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > -- > > > Tiberiu (Tibi) Stef-Praun, PhD > > > Research Staff, Computation Institute > > > 5640 S. Ellis Ave, #405 > > > University of Chicago > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > From hategan at mcs.anl.gov Fri Mar 9 11:41:42 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 11:41:42 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070309113809.032df8a0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309113809.032df8a0@mail.mcs.anl.gov> Message-ID: <1173462102.4225.29.camel@blabla.mcs.anl.gov> On Fri, 2007-03-09 at 11:40 -0600, Veronika V. Nefedova wrote: > Oh, actually I am curious (-; Is it possible to specify somewhere the > gridftp parameters (like number of parallel streams ) Not at this time. There is some preliminary support for tgcp files that can be used to set buffer sizes. > to increase the > transfer rates ? Or swift decides on that based on the file's size ? It doesn't. It uses normal transfers. 
> Also, > when swift is staging files in/out - does it use the concurrent transfers > (all files at once), or each file is transferred separately ? The transfers happen concurrently, but there are throttles (look at swift.properties) that limit the number of concurrent transfers globally. > > Nika > > At 11:27 AM 3/9/2007, Yong Zhao wrote: > >I have been thinking that the system should be smarter in dealing with > >such issues, without relying too much on a user's manual intervention. For > >job submission rate, or transfer rate, if we observe abnormality, for > >instance: ftp errors due to high transfer rate, the system should be able > >to slow down automatically. I am not quite sure about how to detect that > >jobs go through quickly to a scheduler, but if that is the case, the > >submission rate should be increased automatically. > > > >Yong. > > > >On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > Knob means "while in progress" > > > Is that doable ? (Probably extending your rudimentary debugger would do > > it). > > > How about the following extension: can we easily create hooks > > > (webservices) into a running swift engine, that would allow this > > > manipulation with an external client (the knob driver) ? > > > Having more interactivity with a running workflow is something that > > > might be appealing for long-running or never-ending workflows, and > > > would differentiate us from others in a nice way. You would not > > > believe how many people are working on workflows: everybody and their > > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > > exaggerating a bit here) > > > > > > Tibi > > > > > > On 3/9/07, Mihael Hategan wrote: > > > > Yes, although we need to come up with a nicer way to do it. > > > > In libexec/scheduler.xml, change > > > value="4"/> to value="large number" (not literally). > > > > > > > > Mihael > > > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > > Hi, Mihael: > > > > > > > > > > Is it possible to remove this feature in the one site case ? For > > example, > > > > > the queue is now almost empty on TG, but I have to wait for 1.5 > > hours for > > > > > the rest of my jobs to be submitted (thats the average running time > > of my > > > > > job) - and the queue might be full by that time... > > > > > > > > > > Nika > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 > > jobs to be > > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > > just 26 > > > > > > > jobs. I checked that several times - its always 26 jobs. Then, > > when at > > > > > > > least one job out of those 26 is finished - swift goes ahead > > and submits > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > Is it a bug or a feature? > > > > > > > > > > > >Feature. Although it should probably be tamed down in the one site > > case. > > > > > >Each site has a score that changes based on how it behaves. If a site > > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > > >it gets a lower score. > > > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one > > slow. > > > > > >With no scores and no limitations, half of the jobs would go to > > one, and > > > > > >half to the other. 
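For reference, the global throttles Mihael points at live in swift.properties. The property names below are the ones documented in later Swift releases and may not match this 2007 snapshot exactly; the values are only examples.

    # swift.properties (sketch; names per later Swift documentation, values illustrative)
    throttle.transfers=4         # concurrent file transfers, across all sites
    throttle.file.operations=8   # concurrent file operations (mkdir, rm, ...)
    throttle.submit=4            # concurrent job submissions

Per-stream GridFTP tuning (parallel streams, buffer sizes) is a separate matter, which is what the remark above about preliminary tgcp-file support refers to.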
The workflow finishes when the slow site finishes > > > > > >half the jobs. > > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > -- > > > Tiberiu (Tibi) Stef-Praun, PhD > > > Research Staff, Computation Institute > > > 5640 S. Ellis Ave, #405 > > > University of Chicago > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > >_______________________________________________ > >Swift-devel mailing list > >Swift-devel at ci.uchicago.edu > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Mar 9 11:46:42 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 11:46:42 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> <1173461742.4225.23.camel@blabla.mcs.anl.gov> Message-ID: <1173462402.5815.3.camel@blabla.mcs.anl.gov> Yeah. I think that's it. The ability to control the initial score. And possibly automate that a little by considering a total score that gets divided by the number of sites. That would limit the number of jobs sent initially to all the sites (and this could be a much larger number). In the one site case, that larger number would belong exclusively to the one site. On Fri, 2007-03-09 at 11:43 -0600, Yong Zhao wrote: > right, for first batches, a user supplied hint would be more appropriate. > > Yong. > > On Fri, 9 Mar 2007, Mihael Hategan wrote: > > > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote: > > > I have been thinking that the system should be smarter in dealing with > > > such issues, without relying too much on a user's manual intervention. For > > > job submission rate, or transfer rate, if we observe abnormality, for > > > instance: ftp errors due to high transfer rate, the system should be able > > > to slow down automatically. I am not quite sure about how to detect that > > > jobs go through quickly to a scheduler, but if that is the case, the > > > submission rate should be increased automatically. > > > > It is increased automatically. But the problem is at the start. Do you > > send many jobs to a site without knowing anything about it? 
The > > site-selector that Luiz worked on would split the jobs equally to sites > > on the first round. That may be bad if you have highly asymmetrical > > sites. > > > > > > > > Yong. > > > > > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > Knob means "while in progress" > > > > Is that doable ? (Probably extending your rudimentary debugger would do it). > > > > How about the following extension: can we easily create hooks > > > > (webservices) into a running swift engine, that would allow this > > > > manipulation with an external client (the knob driver) ? > > > > Having more interactivity with a running workflow is something that > > > > might be appealing for long-running or never-ending workflows, and > > > > would differentiate us from others in a nice way. You would not > > > > believe how many people are working on workflows: everybody and their > > > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > > > exaggerating a bit here) > > > > > > > > Tibi > > > > > > > > On 3/9/07, Mihael Hategan wrote: > > > > > Yes, although we need to come up with a nicer way to do it. > > > > > In libexec/scheduler.xml, change > > > > value="4"/> to value="large number" (not literally). > > > > > > > > > > Mihael > > > > > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > > > Hi, Mihael: > > > > > > > > > > > > Is it possible to remove this feature in the one site case ? For example, > > > > > > the queue is now almost empty on TG, but I have to wait for 1.5 hours for > > > > > > the rest of my jobs to be submitted (thats the average running time of my > > > > > > job) - and the queue might be full by that time... > > > > > > > > > > > > Nika > > > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 jobs to be > > > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > > > just 26 > > > > > > > > jobs. I checked that several times - its always 26 jobs. Then, when at > > > > > > > > least one job out of those 26 is finished - swift goes ahead and submits > > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > > Is it a bug or a feature? > > > > > > > > > > > > > >Feature. Although it should probably be tamed down in the one site case. > > > > > > >Each site has a score that changes based on how it behaves. If a site > > > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > > > >it gets a lower score. > > > > > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one slow. > > > > > > >With no scores and no limitations, half of the jobs would go to one, and > > > > > > >half to the other. The workflow finishes when the slow site finishes > > > > > > >half the jobs. > > > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > > > >guessing, our main scenario. 
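A small sketch of the "total score divided by the number of sites" idea from Mihael's message above (illustrative only; no such option exists in the Swift discussed here): each site would start with totalScore / N, so a single-site run keeps the whole allowance.

    // Illustrative: spread a configurable total initial score across the known sites.
    import java.util.List;
    import java.util.Map;
    import java.util.stream.Collectors;

    class InitialScores {
        // With one site the entire totalScore goes to it; with N sites each gets totalScore / N.
        static Map<String, Double> assign(List<String> sites, double totalScore) {
            double perSite = totalScore / sites.size();
            return sites.stream()
                        .collect(Collectors.toMap(site -> site, site -> perSite));
        }
    }

Each site's starting job limit would then follow from its initial score, exactly as it later follows from the earned score.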
> > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Swift-devel mailing list > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > -- > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > Research Staff, Computation Institute > > > > 5640 S. Ellis Ave, #405 > > > > University of Chicago > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > From nefedova at mcs.anl.gov Fri Mar 9 12:08:19 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 09 Mar 2007 12:08:19 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173462402.5815.3.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> <1173461742.4225.23.camel@blabla.mcs.anl.gov> <1173462402.5815.3.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070309115740.021a1cf0@mail.mcs.anl.gov> So how the initial score is determined ? By the waiting time in the queue ? Or does it send any probing job (or qstat ) to check the queue availability ? If we have two sites - one has an empty queue, another has a full queue - how the submission of jobs will be handled to both sites? Nika At 11:46 AM 3/9/2007, Mihael Hategan wrote: >Yeah. I think that's it. The ability to control the initial score. And >possibly automate that a little by considering a total score that gets >divided by the number of sites. That would limit the number of jobs sent >initially to all the sites (and this could be a much larger number). In >the one site case, that larger number would belong exclusively to the >one site. > >On Fri, 2007-03-09 at 11:43 -0600, Yong Zhao wrote: > > right, for first batches, a user supplied hint would be more appropriate. > > > > Yong. > > > > On Fri, 9 Mar 2007, Mihael Hategan wrote: > > > > > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote: > > > > I have been thinking that the system should be smarter in dealing with > > > > such issues, without relying too much on a user's manual > intervention. For > > > > job submission rate, or transfer rate, if we observe abnormality, for > > > > instance: ftp errors due to high transfer rate, the system should > be able > > > > to slow down automatically. I am not quite sure about how to detect > that > > > > jobs go through quickly to a scheduler, but if that is the case, the > > > > submission rate should be increased automatically. > > > > > > It is increased automatically. But the problem is at the start. Do you > > > send many jobs to a site without knowing anything about it? The > > > site-selector that Luiz worked on would split the jobs equally to sites > > > on the first round. That may be bad if you have highly asymmetrical > > > sites. > > > > > > > > > > > Yong. 
> > > > > > > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > > > Knob means "while in progress" > > > > > Is that doable ? (Probably extending your rudimentary debugger > would do it). > > > > > How about the following extension: can we easily create hooks > > > > > (webservices) into a running swift engine, that would allow this > > > > > manipulation with an external client (the knob driver) ? > > > > > Having more interactivity with a running workflow is something that > > > > > might be appealing for long-running or never-ending workflows, and > > > > > would differentiate us from others in a nice way. You would not > > > > > believe how many people are working on workflows: everybody and their > > > > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > > > > exaggerating a bit here) > > > > > > > > > > Tibi > > > > > > > > > > On 3/9/07, Mihael Hategan wrote: > > > > > > Yes, although we need to come up with a nicer way to do it. > > > > > > In libexec/scheduler.xml, change > > > > > value="4"/> to value="large number" (not literally). > > > > > > > > > > > > Mihael > > > > > > > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > Is it possible to remove this feature in the one site case ? > For example, > > > > > > > the queue is now almost empty on TG, but I have to wait for > 1.5 hours for > > > > > > > the rest of my jobs to be submitted (thats the average > running time of my > > > > > > > job) - and the queue might be full by that time... > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > I've noticed one very strange behavior. For example, I > have 68 jobs to be > > > > > > > > > submitted to the remote host simultaneously. Swift > submits at first > > > > > > > > just 26 > > > > > > > > > jobs. I checked that several times - its always 26 jobs. > Then, when at > > > > > > > > > least one job out of those 26 is finished - swift goes > ahead and submits > > > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > > > Is it a bug or a feature? > > > > > > > > > > > > > > > >Feature. Although it should probably be tamed down in the > one site case. > > > > > > > >Each site has a score that changes based on how it behaves. > If a site > > > > > > > >completes jobs ok, it gets a higher score in time. If jobs > fail on it, > > > > > > > >it gets a lower score. > > > > > > > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one > fast one slow. > > > > > > > >With no scores and no limitations, half of the jobs would go > to one, and > > > > > > > >half to the other. The workflow finishes when the slow site > finishes > > > > > > > >half the jobs. > > > > > > > >What happens however, is that Swift limits the number of > initial jobs, > > > > > > > >and does "probing". This allows it to infer some stuff about > the sites > > > > > > > >by the time it gets to submit lots of jobs. It should yield > better > > > > > > > >performance on larger workflows with imbalanced sites, which > is, I'm > > > > > > > >guessing, our main scenario. 
> > > > > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > Swift-devel mailing list > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > -- > > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > > Research Staff, Computation Institute > > > > > 5640 S. Ellis Ave, #405 > > > > > University of Chicago > > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > >_______________________________________________ >Swift-devel mailing list >Swift-devel at ci.uchicago.edu >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From nefedova at mcs.anl.gov Fri Mar 9 12:24:58 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 09 Mar 2007 12:24:58 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <1173313254.6939.5.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> <1173309561.6469.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307172505.04ecfd10@mail.mcs.anl.gov> <1173313254.6939.5.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070309122152.042b2030@mail.mcs.anl.gov> So this maxSimultaneousJobs parameter would control the number of jobs I have on any particular site, correct ? For example, if I have now say 50 running jobs and then submit another workflow that has this parameter set - would it submit an additional 384 jobs, or only enough amount to make the total number of jobs from 2 workflows to equal to 384 ? Nika At 06:20 PM 3/7/2007, Mihael Hategan wrote: >So this limit would have to be a per-site limit. >There is no such thing right now. You can limit the total number of >concurrent jobs, but it's not exposed through swift.properties. > >In libexec/scheduler.xml, you can try adding the following thing inside >...: > > > >Mihael > >On Wed, 2007-03-07 at 17:27 -0600, Veronika V. Nefedova wrote: > > Right. Teragrid at NCSA has the limit of 384 queued or running jobs per > user. > > > > Nika > > > > At 05:19 PM 3/7/2007, Mihael Hategan wrote: > > >On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote: > > > > OK, Here is my another question. > > > > Teragrid allows the user to have 385 jobs in a queue. If I run my > complete > > > > workflow (244 molecules), on stage four I'll have 80 times 244 jobs > (i.e. > > > > close to 20K). How do I set the limit for the number of submitted > jobs to > > > > the queue to 385 ? I remember that condor had a specific parameter to > > > > condor_submit that was managing exactly that... > > > > > >Is this 385 jobs per site? > > > > > > > > > > > Nika > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > Hi, > > > > > > > > > > > > I've noticed one very strange behavior. 
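The property Mihael suggested adding to libexec/scheduler.xml was stripped by the archive along with the rest of the angle-bracketed markup. Going by the maxSimultaneousJobs name Nika uses above, the addition was presumably along these lines; the markup and the value are reconstructed, not verbatim, with 384 taken from the TeraGrid/NCSA per-user limit mentioned in the thread.

    <!-- libexec/scheduler.xml (reconstructed sketch; goes inside the existing
         scheduler element, which the archive also stripped) -->
    <property name="maxSimultaneousJobs" value="384"/>
    <!-- caps the total number of concurrent jobs; per Mihael this is a global
         limit rather than a per-site one -->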
For example, I have 68 > jobs > > > to be > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > just 26 > > > > > > jobs. I checked that several times - its always 26 jobs. Then, > when at > > > > > > least one job out of those 26 is finished - swift goes ahead and > > > submits > > > > > > the rest (all of those left - 42 in my case). > > > > > > Is it a bug or a feature? > > > > > > > > > >Feature. Although it should probably be tamed down in the one site > case. > > > > >Each site has a score that changes based on how it behaves. If a site > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > >it gets a lower score. > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one > slow. > > > > >With no scores and no limitations, half of the jobs would go to > one, and > > > > >half to the other. The workflow finishes when the slow site finishes > > > > >half the jobs. > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Fri Mar 9 12:32:24 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 12:32:24 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070309115740.021a1cf0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309110255.048c6c20@mail.mcs.anl.gov> <1173460263.4225.7.camel@blabla.mcs.anl.gov> <1173461742.4225.23.camel@blabla.mcs.anl.gov> <1173462402.5815.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309115740.021a1cf0@mail.mcs.anl.gov> Message-ID: <1173465144.7422.4.camel@blabla.mcs.anl.gov> On Fri, 2007-03-09 at 12:08 -0600, Veronika V. Nefedova wrote: > So how the initial score is determined ? The initial score is currently 1. The limit to the number of jobs is score*jobThrottle + 2. > By the waiting time in the queue ? > Or does it send any probing job (or qstat ) to check the queue availability > ? That is a possibility, but it may require some assumptions about the exact queuing system that is installed. I heard that MDS should provide this information. I have though yet been unable to see any details on that. Perhaps somebody has some pointers. > If we have two sites - one has an empty queue, another has a full queue - > how the submission of jobs will be handled to both sites? Initially both will get the same score and the same amount of jobs. When a job completes successfully, the score for that site is increased. In the above case, the one with the empty queue will finish the jobs, which will increase its score and cause it to get more jobs, while the one with the full queue will still only have the initial jobs. > > Nika > > At 11:46 AM 3/9/2007, Mihael Hategan wrote: > >Yeah. I think that's it. The ability to control the initial score. 
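[Spelling out the cap formula quoted above -- limit = score * jobThrottle + 2, with the initial score at 1. The numbers below are purely illustrative; the thread does not state the default jobThrottle of this build, so they are not meant to reproduce the 26 initial jobs Nika observed, only to show how raising the throttle lifts the initial batch in the one-site case.]

    # limit = score * jobThrottle + 2, initial score = 1 (per Mihael above).
    def job_limit(score, job_throttle):
        return int(score * job_throttle) + 2

    print(job_limit(1, 4))     # 6   -> small initial probing batch
    print(job_limit(10, 4))    # 42  -> the cap grows as the site earns score
    print(job_limit(1, 100))   # 102 -> a larger jobThrottle lifts the first batch too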
And > >possibly automate that a little by considering a total score that gets > >divided by the number of sites. That would limit the number of jobs sent > >initially to all the sites (and this could be a much larger number). In > >the one site case, that larger number would belong exclusively to the > >one site. > > > >On Fri, 2007-03-09 at 11:43 -0600, Yong Zhao wrote: > > > right, for first batches, a user supplied hint would be more appropriate. > > > > > > Yong. > > > > > > On Fri, 9 Mar 2007, Mihael Hategan wrote: > > > > > > > On Fri, 2007-03-09 at 11:27 -0600, Yong Zhao wrote: > > > > > I have been thinking that the system should be smarter in dealing with > > > > > such issues, without relying too much on a user's manual > > intervention. For > > > > > job submission rate, or transfer rate, if we observe abnormality, for > > > > > instance: ftp errors due to high transfer rate, the system should > > be able > > > > > to slow down automatically. I am not quite sure about how to detect > > that > > > > > jobs go through quickly to a scheduler, but if that is the case, the > > > > > submission rate should be increased automatically. > > > > > > > > It is increased automatically. But the problem is at the start. Do you > > > > send many jobs to a site without knowing anything about it? The > > > > site-selector that Luiz worked on would split the jobs equally to sites > > > > on the first round. That may be bad if you have highly asymmetrical > > > > sites. > > > > > > > > > > > > > > Yong. > > > > > > > > > > On Fri, 9 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > > > > > Knob means "while in progress" > > > > > > Is that doable ? (Probably extending your rudimentary debugger > > would do it). > > > > > > How about the following extension: can we easily create hooks > > > > > > (webservices) into a running swift engine, that would allow this > > > > > > manipulation with an external client (the knob driver) ? > > > > > > Having more interactivity with a running workflow is something that > > > > > > might be appealing for long-running or never-ending workflows, and > > > > > > would differentiate us from others in a nice way. You would not > > > > > > believe how many people are working on workflows: everybody and their > > > > > > brother at the OSG meeting had some offering labeled "workflow". (I'm > > > > > > exaggerating a bit here) > > > > > > > > > > > > Tibi > > > > > > > > > > > > On 3/9/07, Mihael Hategan wrote: > > > > > > > Yes, although we need to come up with a nicer way to do it. > > > > > > > In libexec/scheduler.xml, change > > > > > > value="4"/> to value="large number" (not literally). > > > > > > > > > > > > > > Mihael > > > > > > > > > > > > > > On Fri, 2007-03-09 at 11:06 -0600, Veronika V. Nefedova wrote: > > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > > > Is it possible to remove this feature in the one site case ? > > For example, > > > > > > > > the queue is now almost empty on TG, but I have to wait for > > 1.5 hours for > > > > > > > > the rest of my jobs to be submitted (thats the average > > running time of my > > > > > > > > job) - and the queue might be full by that time... > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > I've noticed one very strange behavior. 
For example, I > > have 68 jobs to be > > > > > > > > > > submitted to the remote host simultaneously. Swift > > submits at first > > > > > > > > > just 26 > > > > > > > > > > jobs. I checked that several times - its always 26 jobs. > > Then, when at > > > > > > > > > > least one job out of those 26 is finished - swift goes > > ahead and submits > > > > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > > > > Is it a bug or a feature? > > > > > > > > > > > > > > > > > >Feature. Although it should probably be tamed down in the > > one site case. > > > > > > > > >Each site has a score that changes based on how it behaves. > > If a site > > > > > > > > >completes jobs ok, it gets a higher score in time. If jobs > > fail on it, > > > > > > > > >it gets a lower score. > > > > > > > > > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one > > fast one slow. > > > > > > > > >With no scores and no limitations, half of the jobs would go > > to one, and > > > > > > > > >half to the other. The workflow finishes when the slow site > > finishes > > > > > > > > >half the jobs. > > > > > > > > >What happens however, is that Swift limits the number of > > initial jobs, > > > > > > > > >and does "probing". This allows it to infer some stuff about > > the sites > > > > > > > > >by the time it gets to submit lots of jobs. It should yield > > better > > > > > > > > >performance on larger workflows with imbalanced sites, which > > is, I'm > > > > > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > > > Research Staff, Computation Institute > > > > > > 5640 S. 
Ellis Ave, #405 > > > > > > University of Chicago > > > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > >_______________________________________________ > >Swift-devel mailing list > >Swift-devel at ci.uchicago.edu > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Fri Mar 9 12:35:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 12:35:12 -0600 Subject: [Swift-devel] submitting jobs to the queue In-Reply-To: <6.0.0.22.2.20070309122152.042b2030@mail.mcs.anl.gov> References: <6.0.0.22.2.20070307162552.04ebb090@mail.mcs.anl.gov> <1173306974.4767.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307165326.03c8abe0@mail.mcs.anl.gov> <1173309561.6469.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070307172505.04ecfd10@mail.mcs.anl.gov> <1173313254.6939.5.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309122152.042b2030@mail.mcs.anl.gov> Message-ID: <1173465312.7422.8.camel@blabla.mcs.anl.gov> On Fri, 2007-03-09 at 12:24 -0600, Veronika V. Nefedova wrote: > So this maxSimultaneousJobs parameter would control the number of jobs I > have on any particular site, correct ? No. That controls the number of concurrent jobs in one swift scheduler instance (right now that == one swift invocation). > For example, if I have now say 50 > running jobs and then submit another workflow that has this parameter set - > would it submit an additional 384 jobs, or only enough amount to make the > total number of jobs from 2 workflows to equal to 384 ? The maxSimulatneousJobs throttle is applied after the scoring stuff. You would have to make sure you increase the jobThrottle I mentioned and then set maxSimultaneousJobs. > > Nika > > At 06:20 PM 3/7/2007, Mihael Hategan wrote: > >So this limit would have to be a per-site limit. > >There is no such thing right now. You can limit the total number of > >concurrent jobs, but it's not exposed through swift.properties. > > > >In libexec/scheduler.xml, you can try adding the following thing inside > >...: > > > > > > > >Mihael > > > >On Wed, 2007-03-07 at 17:27 -0600, Veronika V. Nefedova wrote: > > > Right. Teragrid at NCSA has the limit of 384 queued or running jobs per > > user. > > > > > > Nika > > > > > > At 05:19 PM 3/7/2007, Mihael Hategan wrote: > > > >On Wed, 2007-03-07 at 16:58 -0600, Veronika V. Nefedova wrote: > > > > > OK, Here is my another question. > > > > > Teragrid allows the user to have 385 jobs in a queue. If I run my > > complete > > > > > workflow (244 molecules), on stage four I'll have 80 times 244 jobs > > (i.e. > > > > > close to 20K). How do I set the limit for the number of submitted > > jobs to > > > > > the queue to 385 ? I remember that condor had a specific parameter to > > > > > condor_submit that was managing exactly that... > > > > > > > >Is this 385 jobs per site? > > > > > > > > > > > > > > Nika > > > > > > > > > > At 04:36 PM 3/7/2007, Mihael Hategan wrote: > > > > > >On Wed, 2007-03-07 at 16:30 -0600, Veronika V. Nefedova wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I've noticed one very strange behavior. For example, I have 68 > > jobs > > > > to be > > > > > > > submitted to the remote host simultaneously. Swift submits at first > > > > > > just 26 > > > > > > > jobs. I checked that several times - its always 26 jobs. 
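[The ordering Mihael describes above -- maxSimultaneousJobs is applied after the score-based throttle -- can be pictured as two independent caps, the tighter of which wins. A rough sketch under that reading (illustrative only; the real scheduler is not written like this, and the per-site formula is the one discussed earlier in the thread). Note that, as Mihael says, maxSimultaneousJobs counts jobs from this one Swift invocation only, not other workflows you may have queued.]

    # Two independent limits, per the description above:
    #   1) per-site, score-based cap:  score * jobThrottle + 2
    #   2) a global maxSimultaneousJobs cap for the whole Swift run
    # A job is submitted only if it fits under both.
    def can_submit(running_on_site, site_score, job_throttle,
                   running_total, max_simultaneous_jobs):
        site_cap = site_score * job_throttle + 2
        return (running_on_site < site_cap and
                running_total < max_simultaneous_jobs)

    # e.g. TeraGrid's 384-jobs-per-user policy: even with a huge site score,
    # the global cap holds the total at 384.
    print(can_submit(383, 1000, 4, 383, 384))   # True  -> one more job fits
    print(can_submit(384, 1000, 4, 384, 384))   # False -> global cap reached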
Then, > > when at > > > > > > > least one job out of those 26 is finished - swift goes ahead and > > > > submits > > > > > > > the rest (all of those left - 42 in my case). > > > > > > > Is it a bug or a feature? > > > > > > > > > > > >Feature. Although it should probably be tamed down in the one site > > case. > > > > > >Each site has a score that changes based on how it behaves. If a site > > > > > >completes jobs ok, it gets a higher score in time. If jobs fail on it, > > > > > >it gets a lower score. > > > > > > > > > > > >Now, let's consider the following scenario: 2 sites, one fast one > > slow. > > > > > >With no scores and no limitations, half of the jobs would go to > > one, and > > > > > >half to the other. The workflow finishes when the slow site finishes > > > > > >half the jobs. > > > > > >What happens however, is that Swift limits the number of initial jobs, > > > > > >and does "probing". This allows it to infer some stuff about the sites > > > > > >by the time it gets to submit lots of jobs. It should yield better > > > > > >performance on larger workflows with imbalanced sites, which is, I'm > > > > > >guessing, our main scenario. > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Fri Mar 9 14:55:52 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 09 Mar 2007 14:55:52 -0600 Subject: [Swift-devel] dot error Message-ID: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> Hi, Not sure what happened -- but I was unable to produce (or display) the correct png file out of dot file that was generated by swift (after the workflow was done). I put my dot file on evitable in ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot Then I ran the dot command to generate the png file: $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot Then I tried to display it in the browser and got this: The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" cannot be displayed, because it contains errors. Any idea what is wrong here? Thanks, Nika From hategan at mcs.anl.gov Fri Mar 9 15:25:55 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 15:25:55 -0600 Subject: [Swift-devel] dot error In-Reply-To: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> Message-ID: <1173475555.13614.0.camel@blabla.mcs.anl.gov> Did dot complain about anything? On Fri, 2007-03-09 at 14:55 -0600, Veronika V. Nefedova wrote: > Hi, > > Not sure what happened -- but I was unable to produce (or display) the > correct png file out of dot file that was generated by swift (after the > workflow was done). > > I put my dot file on evitable in ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > Then I ran the dot command to generate the png file: > $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot > > Then I tried to display it in the browser and got this: > The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" cannot be > displayed, because it contains errors. > > Any idea what is wrong here? 
> > Thanks, > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Fri Mar 9 15:31:48 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 09 Mar 2007 15:31:48 -0600 Subject: [Swift-devel] dot error In-Reply-To: <1173475555.13614.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> <1173475555.13614.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> Nope! [nefedova at evitable ~]$ dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot [nefedova at evitable ~]$ cp graph_big.png public_html/ [nefedova at evitable ~]$ which dot /usr/bin/dot At 03:25 PM 3/9/2007, Mihael Hategan wrote: >Did dot complain about anything? > >On Fri, 2007-03-09 at 14:55 -0600, Veronika V. Nefedova wrote: > > Hi, > > > > Not sure what happened -- but I was unable to produce (or display) the > > correct png file out of dot file that was generated by swift (after the > > workflow was done). > > > > I put my dot file on evitable in > ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > > Then I ran the dot command to generate the png file: > > $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot > > > > Then I tried to display it in the browser and got this: > > The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" cannot be > > displayed, because it contains errors. > > > > Any idea what is wrong here? > > > > Thanks, > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Fri Mar 9 15:31:05 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 15:31:05 -0600 Subject: [Swift-devel] dot error In-Reply-To: <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> <1173475555.13614.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> Message-ID: <1173475865.13614.6.camel@blabla.mcs.anl.gov> Identify from ImageMagick quickly eats up all the memory when I try to run it on that file. I'm tempted to conclude that something might be broken with dot. Can you try producing a PostScript file instead? Mihael On Fri, 2007-03-09 at 15:31 -0600, Veronika V. Nefedova wrote: > Nope! > > [nefedova at evitable ~]$ dot -ograph_big.png -Tpng > swift-MolDyn-free-01dqgqzbhns11.dot > [nefedova at evitable ~]$ cp graph_big.png public_html/ > [nefedova at evitable ~]$ which dot > /usr/bin/dot > > > At 03:25 PM 3/9/2007, Mihael Hategan wrote: > >Did dot complain about anything? > > > >On Fri, 2007-03-09 at 14:55 -0600, Veronika V. Nefedova wrote: > > > Hi, > > > > > > Not sure what happened -- but I was unable to produce (or display) the > > > correct png file out of dot file that was generated by swift (after the > > > workflow was done). > > > > > > I put my dot file on evitable in > > ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > > > Then I ran the dot command to generate the png file: > > > $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > Then I tried to display it in the browser and got this: > > > The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" cannot be > > > displayed, because it contains errors. > > > > > > Any idea what is wrong here? 
> > > > > > Thanks, > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From nefedova at mcs.anl.gov Fri Mar 9 18:08:03 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 09 Mar 2007 18:08:03 -0600 Subject: [Swift-devel] dot error In-Reply-To: <1173484840.19054.2.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> <1173475555.13614.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> <1173475865.13614.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153648.0553a930@mail.mcs.anl.gov> <1173479931.15803.8.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309165000.032ff910@mail.mcs.anl.gov> <1173484840.19054.2.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070309180625.05412d40@mail.mcs.anl.gov> It is a simple graph: a-b-c-d (where a,b,c are single nodes, and d is 68 parallel nodes). Could the number of parallel nodes be a problem? Nika At 06:00 PM 3/9/2007, Mihael Hategan wrote: >No clue. Looks like a pretty big graph. I can somewhat view things, but >it behaves weirdly. > >On Fri, 2007-03-09 at 16:50 -0600, Veronika V. Nefedova wrote: > > Thank you! > > > > Done. Its in my home dir on evitable. > > > > At 04:38 PM 3/9/2007, you wrote: > > >No walk. Only a step. Instead of -Tpng, -Tps. And the name eventually > > >changed to graph_big.ps. Ok, it was two steps. > > > > > >Mihael > > > > > >On Fri, 2007-03-09 at 15:37 -0600, Veronika V. Nefedova wrote: > > > > You'd have to walk me through this. I do not know how to produce the ps > > > > file out of dot file... > > > > Sorry! > > > > > > > > At 03:31 PM 3/9/2007, you wrote: > > > > >Identify from ImageMagick quickly eats up all the memory when I try to > > > > >run it on that file. > > > > >I'm tempted to conclude that something might be broken with dot. > > > > >Can you try producing a PostScript file instead? > > > > > > > > > >Mihael > > > > > > > > > >On Fri, 2007-03-09 at 15:31 -0600, Veronika V. Nefedova wrote: > > > > > > Nope! > > > > > > > > > > > > [nefedova at evitable ~]$ dot -ograph_big.png -Tpng > > > > > > swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > [nefedova at evitable ~]$ cp graph_big.png public_html/ > > > > > > [nefedova at evitable ~]$ which dot > > > > > > /usr/bin/dot > > > > > > > > > > > > > > > > > > At 03:25 PM 3/9/2007, Mihael Hategan wrote: > > > > > > >Did dot complain about anything? > > > > > > > > > > > > > >On Fri, 2007-03-09 at 14:55 -0600, Veronika V. Nefedova wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > Not sure what happened -- but I was unable to produce (or > > > display) the > > > > > > > > correct png file out of dot file that was generated by swift > > > (after the > > > > > > > > workflow was done). > > > > > > > > > > > > > > > > I put my dot file on evitable in > > > > > > > ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > Then I ran the dot command to generate the png file: > > > > > > > > $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > > > > > > > Then I tried to display it in the browser and got this: > > > > > > > > The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" > > > > > cannot be > > > > > > > > displayed, because it contains errors. > > > > > > > > > > > > > > > > Any idea what is wrong here? 
> > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Swift-devel mailing list > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Fri Mar 9 20:05:20 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 20:05:20 -0600 Subject: [Swift-devel] dot error In-Reply-To: <6.0.0.22.2.20070309180625.05412d40@mail.mcs.anl.gov> References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> <1173475555.13614.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> <1173475865.13614.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153648.0553a930@mail.mcs.anl.gov> <1173479931.15803.8.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309165000.032ff910@mail.mcs.anl.gov> <1173484840.19054.2.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309180625.05412d40@mail.mcs.anl.gov> Message-ID: <1173492320.19899.1.camel@blabla.mcs.anl.gov> If it's a valid .dot specification, dot shouldn't blow. If it's not a valid dot specification, dot should complain, and we should fix swift to produce valid .dot files. We need to figure out which one it is. Mihael On Fri, 2007-03-09 at 18:08 -0600, Veronika V. Nefedova wrote: > It is a simple graph: a-b-c-d (where a,b,c are single nodes, and d is 68 > parallel nodes). Could the number of parallel nodes be a problem? > > Nika > > At 06:00 PM 3/9/2007, Mihael Hategan wrote: > >No clue. Looks like a pretty big graph. I can somewhat view things, but > >it behaves weirdly. > > > >On Fri, 2007-03-09 at 16:50 -0600, Veronika V. Nefedova wrote: > > > Thank you! > > > > > > Done. Its in my home dir on evitable. > > > > > > At 04:38 PM 3/9/2007, you wrote: > > > >No walk. Only a step. Instead of -Tpng, -Tps. And the name eventually > > > >changed to graph_big.ps. Ok, it was two steps. > > > > > > > >Mihael > > > > > > > >On Fri, 2007-03-09 at 15:37 -0600, Veronika V. Nefedova wrote: > > > > > You'd have to walk me through this. I do not know how to produce the ps > > > > > file out of dot file... > > > > > Sorry! > > > > > > > > > > At 03:31 PM 3/9/2007, you wrote: > > > > > >Identify from ImageMagick quickly eats up all the memory when I try to > > > > > >run it on that file. > > > > > >I'm tempted to conclude that something might be broken with dot. > > > > > >Can you try producing a PostScript file instead? > > > > > > > > > > > >Mihael > > > > > > > > > > > >On Fri, 2007-03-09 at 15:31 -0600, Veronika V. Nefedova wrote: > > > > > > > Nope! > > > > > > > > > > > > > > [nefedova at evitable ~]$ dot -ograph_big.png -Tpng > > > > > > > swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > [nefedova at evitable ~]$ cp graph_big.png public_html/ > > > > > > > [nefedova at evitable ~]$ which dot > > > > > > > /usr/bin/dot > > > > > > > > > > > > > > > > > > > > > At 03:25 PM 3/9/2007, Mihael Hategan wrote: > > > > > > > >Did dot complain about anything? > > > > > > > > > > > > > > > >On Fri, 2007-03-09 at 14:55 -0600, Veronika V. Nefedova wrote: > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > Not sure what happened -- but I was unable to produce (or > > > > display) the > > > > > > > > > correct png file out of dot file that was generated by swift > > > > (after the > > > > > > > > > workflow was done). 
> > > > > > > > > > > > > > > > > > I put my dot file on evitable in > > > > > > > > ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > Then I ran the dot command to generate the png file: > > > > > > > > > $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > > > > > > > > > Then I tried to display it in the browser and got this: > > > > > > > > > The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" > > > > > > cannot be > > > > > > > > > displayed, because it contains errors. > > > > > > > > > > > > > > > > > > Any idea what is wrong here? > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > Swift-devel mailing list > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From tiberius at ci.uchicago.edu Fri Mar 9 21:01:29 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 9 Mar 2007 21:01:29 -0600 Subject: [Swift-devel] dot error In-Reply-To: <1173492320.19899.1.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> <1173475555.13614.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> <1173475865.13614.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153648.0553a930@mail.mcs.anl.gov> <1173479931.15803.8.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309165000.032ff910@mail.mcs.anl.gov> <1173484840.19054.2.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309180625.05412d40@mail.mcs.anl.gov> <1173492320.19899.1.camel@blabla.mcs.anl.gov> Message-ID: Bad dot specification On 3/9/07, Mihael Hategan wrote: > If it's a valid .dot specification, dot shouldn't blow. > If it's not a valid dot specification, dot should complain, and we > should fix swift to produce valid .dot files. > > We need to figure out which one it is. > > Mihael > > On Fri, 2007-03-09 at 18:08 -0600, Veronika V. Nefedova wrote: > > It is a simple graph: a-b-c-d (where a,b,c are single nodes, and d is 68 > > parallel nodes). Could the number of parallel nodes be a problem? > > > > Nika > > > > At 06:00 PM 3/9/2007, Mihael Hategan wrote: > > >No clue. Looks like a pretty big graph. I can somewhat view things, but > > >it behaves weirdly. > > > > > >On Fri, 2007-03-09 at 16:50 -0600, Veronika V. Nefedova wrote: > > > > Thank you! > > > > > > > > Done. Its in my home dir on evitable. > > > > > > > > At 04:38 PM 3/9/2007, you wrote: > > > > >No walk. Only a step. Instead of -Tpng, -Tps. And the name eventually > > > > >changed to graph_big.ps. Ok, it was two steps. > > > > > > > > > >Mihael > > > > > > > > > >On Fri, 2007-03-09 at 15:37 -0600, Veronika V. Nefedova wrote: > > > > > > You'd have to walk me through this. I do not know how to produce the ps > > > > > > file out of dot file... > > > > > > Sorry! > > > > > > > > > > > > At 03:31 PM 3/9/2007, you wrote: > > > > > > >Identify from ImageMagick quickly eats up all the memory when I try to > > > > > > >run it on that file. > > > > > > >I'm tempted to conclude that something might be broken with dot. > > > > > > >Can you try producing a PostScript file instead? > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > >On Fri, 2007-03-09 at 15:31 -0600, Veronika V. Nefedova wrote: > > > > > > > > Nope! 
> > > > > > > > > > > > > > > > [nefedova at evitable ~]$ dot -ograph_big.png -Tpng > > > > > > > > swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > [nefedova at evitable ~]$ cp graph_big.png public_html/ > > > > > > > > [nefedova at evitable ~]$ which dot > > > > > > > > /usr/bin/dot > > > > > > > > > > > > > > > > > > > > > > > > At 03:25 PM 3/9/2007, Mihael Hategan wrote: > > > > > > > > >Did dot complain about anything? > > > > > > > > > > > > > > > > > >On Fri, 2007-03-09 at 14:55 -0600, Veronika V. Nefedova wrote: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > Not sure what happened -- but I was unable to produce (or > > > > > display) the > > > > > > > > > > correct png file out of dot file that was generated by swift > > > > > (after the > > > > > > > > > > workflow was done). > > > > > > > > > > > > > > > > > > > > I put my dot file on evitable in > > > > > > > > > ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > Then I ran the dot command to generate the png file: > > > > > > > > > > $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > > > > > > > > > > > Then I tried to display it in the browser and got this: > > > > > > > > > > The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" > > > > > > > cannot be > > > > > > > > > > displayed, because it contains errors. > > > > > > > > > > > > > > > > > > > > Any idea what is wrong here? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Fri Mar 9 21:10:09 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 09 Mar 2007 21:10:09 -0600 Subject: [Swift-devel] dot error In-Reply-To: References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> <1173475555.13614.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> <1173475865.13614.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153648.0553a930@mail.mcs.anl.gov> <1173479931.15803.8.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309165000.032ff910@mail.mcs.anl.gov> <1173484840.19054.2.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309180625.05412d40@mail.mcs.anl.gov> <1173492320.19899.1.camel@blabla.mcs.anl.gov> Message-ID: <1173496209.22230.1.camel@blabla.mcs.anl.gov> On Fri, 2007-03-09 at 21:01 -0600, Tiberiu Stef-Praun wrote: > Bad dot specification :) Labels should be quoted. That still doesn't explain why dot doesn't complain. > > > On 3/9/07, Mihael Hategan wrote: > > If it's a valid .dot specification, dot shouldn't blow. > > If it's not a valid dot specification, dot should complain, and we > > should fix swift to produce valid .dot files. > > > > We need to figure out which one it is. > > > > Mihael > > > > On Fri, 2007-03-09 at 18:08 -0600, Veronika V. 
Nefedova wrote: > > > It is a simple graph: a-b-c-d (where a,b,c are single nodes, and d is 68 > > > parallel nodes). Could the number of parallel nodes be a problem? > > > > > > Nika > > > > > > At 06:00 PM 3/9/2007, Mihael Hategan wrote: > > > >No clue. Looks like a pretty big graph. I can somewhat view things, but > > > >it behaves weirdly. > > > > > > > >On Fri, 2007-03-09 at 16:50 -0600, Veronika V. Nefedova wrote: > > > > > Thank you! > > > > > > > > > > Done. Its in my home dir on evitable. > > > > > > > > > > At 04:38 PM 3/9/2007, you wrote: > > > > > >No walk. Only a step. Instead of -Tpng, -Tps. And the name eventually > > > > > >changed to graph_big.ps. Ok, it was two steps. > > > > > > > > > > > >Mihael > > > > > > > > > > > >On Fri, 2007-03-09 at 15:37 -0600, Veronika V. Nefedova wrote: > > > > > > > You'd have to walk me through this. I do not know how to produce the ps > > > > > > > file out of dot file... > > > > > > > Sorry! > > > > > > > > > > > > > > At 03:31 PM 3/9/2007, you wrote: > > > > > > > >Identify from ImageMagick quickly eats up all the memory when I try to > > > > > > > >run it on that file. > > > > > > > >I'm tempted to conclude that something might be broken with dot. > > > > > > > >Can you try producing a PostScript file instead? > > > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > > > >On Fri, 2007-03-09 at 15:31 -0600, Veronika V. Nefedova wrote: > > > > > > > > > Nope! > > > > > > > > > > > > > > > > > > [nefedova at evitable ~]$ dot -ograph_big.png -Tpng > > > > > > > > > swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > [nefedova at evitable ~]$ cp graph_big.png public_html/ > > > > > > > > > [nefedova at evitable ~]$ which dot > > > > > > > > > /usr/bin/dot > > > > > > > > > > > > > > > > > > > > > > > > > > > At 03:25 PM 3/9/2007, Mihael Hategan wrote: > > > > > > > > > >Did dot complain about anything? > > > > > > > > > > > > > > > > > > > >On Fri, 2007-03-09 at 14:55 -0600, Veronika V. Nefedova wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > Not sure what happened -- but I was unable to produce (or > > > > > > display) the > > > > > > > > > > > correct png file out of dot file that was generated by swift > > > > > > (after the > > > > > > > > > > > workflow was done). > > > > > > > > > > > > > > > > > > > > > > I put my dot file on evitable in > > > > > > > > > > ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > > Then I ran the dot command to generate the png file: > > > > > > > > > > > $dot -ograph_big.png -Tpng swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > > > > > > > > > > > > > Then I tried to display it in the browser and got this: > > > > > > > > > > > The image "http://www.ci.uchicago.edu/~nefedova/graph_big.png" > > > > > > > > cannot be > > > > > > > > > > > displayed, because it contains errors. > > > > > > > > > > > > > > > > > > > > > > Any idea what is wrong here? 
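[The diagnosis above -- "labels should be quoted" -- is worth illustrating: names containing dashes, dots, or spaces are not legal unquoted dot identifiers, so a generator should always quote its labels. The sketch below writes a small, valid .dot file from Python with quoted labels; the node names are hypothetical and nothing here is taken from Nika's actual file.]

    # Write a small, valid .dot file with quoted labels (hypothetical names).
    jobs = ["pre-process", "charge.calc", "md-stage 1"]
    with open("graph_small.dot", "w") as f:
        f.write("digraph workflow {\n")
        for i, name in enumerate(jobs):
            # Quote every label: names with '-', '.', or spaces are not
            # legal as bare dot identifiers.
            f.write('  n%d [label="%s"];\n' % (i, name))
        for i in range(len(jobs) - 1):
            f.write("  n%d -> n%d;\n" % (i, i + 1))
        f.write("}\n")
    # Render with, e.g.:  dot -Tps -ograph_small.ps graph_small.dot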
> > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From hategan at mcs.anl.gov Sat Mar 10 19:44:33 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 19:44:33 -0600 Subject: [Swift-devel] jobchart Message-ID: <1173577473.15768.20.camel@blabla.mcs.anl.gov> There's a new tool in bin. It's a spin off Jens' "show-id" tool. After careful analysis of show-id, it became apparent that a lot of the difficulty was in gathering and organizing the data, rather than in generating the plots. This one's written in python and lacks the command line options to control sizes, but includes the logic in Jens' tool that automatically scale things. It does not show individual stage-ins and stage-outs. I'll have to think of a way to represent those on the plot without making it messy. It needs the logs to contain debugging info from individual tasks: log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG I've updated this in SVN, but if you want to run it on older builds, you need the above in log4j.properties. I attached a sample output. Mihael -------------- next part -------------- A non-text attachment was scrubbed... Name: helloworld-1bef51518fyz0.log.png Type: image/png Size: 136145 bytes Desc: not available URL: From wilde at mcs.anl.gov Sat Mar 10 22:09:05 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Sat, 10 Mar 2007 22:09:05 -0600 Subject: [Swift-devel] jobchart In-Reply-To: <1173577473.15768.20.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> Message-ID: <45F380E1.3070301@mcs.anl.gov> That is beautiful! Nice work, Mihael. - Mike Mihael Hategan wrote, On 3/10/2007 7:44 PM: > There's a new tool in bin. > > It's a spin off Jens' "show-id" tool. > After careful analysis of show-id, it became apparent that a lot of the > difficulty was in gathering and organizing the data, rather than in > generating the plots. This one's written in python and lacks the command > line options to control sizes, but includes the logic in Jens' tool that > automatically scale things. > > It does not show individual stage-ins and stage-outs. I'll have to think > of a way to represent those on the plot without making it messy. > It needs the logs to contain debugging info from individual tasks: > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > I've updated this in SVN, but if you want to run it on older builds, you > need the above in log4j.properties. > > I attached a sample output. 
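[For readers without the scrubbed attachment: the chart described above is essentially a Gantt-style bar per job, with the phases of each job laid out along a shared time axis. The sketch below is not the jobchart tool itself and does not parse Swift logs; the per-job timestamps are made up, and it only shows how such a plot could be drawn once those times have been extracted.]

    import matplotlib.pyplot as plt

    # (job name, submit time, start time, end time) in seconds -- made-up
    # data standing in for timestamps pulled out of a swift log.
    jobs = [("job-1", 0.0, 1.2, 3.4),
            ("job-2", 0.5, 2.0, 4.1),
            ("job-3", 0.5, 2.5, 4.9)]

    fig, ax = plt.subplots()
    for y, (name, submit, start, end) in enumerate(jobs):
        ax.barh(y, start - submit, left=submit, color="orange")  # queued/submitted
        ax.barh(y, end - start, left=start, color="green")       # running
    ax.set_yticks(range(len(jobs)))
    ax.set_yticklabels([j[0] for j in jobs])
    ax.set_xlabel("seconds since workflow start")
    plt.savefig("jobchart_sketch.png")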
> > Mihael > > > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Mike Wilde Computation Institute, University of Chicago Math & Computer Science Division Argonne National Laboratory Argonne, IL 60439 USA tel 630-252-7497 fax 630-252-1997 From hategan at mcs.anl.gov Sat Mar 10 22:32:54 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 22:32:54 -0600 Subject: [Swift-devel] jobchart In-Reply-To: <45F380E1.3070301@mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <45F380E1.3070301@mcs.anl.gov> Message-ID: <1173587575.24215.6.camel@blabla.mcs.anl.gov> I updated it a bit. Two interesting ones (warning: ~700kB files): http://www-unix.mcs.anl.gov/~hategan/helloworld-i4lb1xpvedgs0.log.png and http://www-unix.mcs.anl.gov/~hategan/helloworld-okdn8oj4qg411.log.png The first one has the gradual throttling disabled. The second one has it set to a low value. Granted, this is running /bin/sleep 2 (ignore the fact that the label says "echo"), but the fact that lack of throttling can cause resource saturation and slightly worse performance is interesting. I still have to figure out what, besides checking the exit code file, causes the long delays after the job is done. I'm guessing it's some CPU intensive stuff that doesn't parallelize very well on my laptop. Mihael On Sat, 2007-03-10 at 22:09 -0600, Mike Wilde wrote: > That is beautiful! Nice work, Mihael. > > - Mike > > Mihael Hategan wrote, On 3/10/2007 7:44 PM: > > There's a new tool in bin. > > > > It's a spin off Jens' "show-id" tool. > > After careful analysis of show-id, it became apparent that a lot of the > > difficulty was in gathering and organizing the data, rather than in > > generating the plots. This one's written in python and lacks the command > > line options to control sizes, but includes the logic in Jens' tool that > > automatically scale things. > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > of a way to represent those on the plot without making it messy. > > It needs the logs to contain debugging info from individual tasks: > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > I've updated this in SVN, but if you want to run it on older builds, you > > need the above in log4j.properties. > > > > I attached a sample output. > > > > Mihael > > > > > > ------------------------------------------------------------------------ > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Sat Mar 10 22:51:47 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Sat, 10 Mar 2007 22:51:47 -0600 (CST) Subject: [Swift-devel] jobchart In-Reply-To: <1173577473.15768.20.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> Message-ID: nice graph, what's preprocessing and postprocessing involved in this case? Yong. On Sat, 10 Mar 2007, Mihael Hategan wrote: > There's a new tool in bin. > > It's a spin off Jens' "show-id" tool. > After careful analysis of show-id, it became apparent that a lot of the > difficulty was in gathering and organizing the data, rather than in > generating the plots. 
This one's written in python and lacks the command > line options to control sizes, but includes the logic in Jens' tool that > automatically scale things. > > It does not show individual stage-ins and stage-outs. I'll have to think > of a way to represent those on the plot without making it messy. > It needs the logs to contain debugging info from individual tasks: > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > I've updated this in SVN, but if you want to run it on older builds, you > need the above in log4j.properties. > > I attached a sample output. > > Mihael > From hategan at mcs.anl.gov Sat Mar 10 22:55:13 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 22:55:13 -0600 Subject: [Swift-devel] jobchart In-Reply-To: References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> Message-ID: <1173588913.25191.3.camel@blabla.mcs.anl.gov> Preprocessing is the time between "Running job..." log message in vdl-int.k and when the task status gets changed to "Submitted". Postprocessing is between when the task status is changed to "Completed" and the "Completed job" log message in vdl-int.k, and includes the check for the exit code file. Mihael On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > nice graph, what's preprocessing and postprocessing involved in this case? > > Yong. > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > There's a new tool in bin. > > > > It's a spin off Jens' "show-id" tool. > > After careful analysis of show-id, it became apparent that a lot of the > > difficulty was in gathering and organizing the data, rather than in > > generating the plots. This one's written in python and lacks the command > > line options to control sizes, but includes the logic in Jens' tool that > > automatically scale things. > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > of a way to represent those on the plot without making it messy. > > It needs the logs to contain debugging info from individual tasks: > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > I've updated this in SVN, but if you want to run it on older builds, you > > need the above in log4j.properties. > > > > I attached a sample output. > > > > Mihael > > > From yongzh at cs.uchicago.edu Sat Mar 10 23:02:13 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Sat, 10 Mar 2007 23:02:13 -0600 (CST) Subject: [Swift-devel] jobchart In-Reply-To: <1173588913.25191.3.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> Message-ID: I see, so this does not involve stage-in and outs. Why did postprocessing take that much time in the other two graphs, where in this case it is minimal? Yong. On Sat, 10 Mar 2007, Mihael Hategan wrote: > Preprocessing is the time between "Running job..." log message in > vdl-int.k and when the task status gets changed to "Submitted". > > Postprocessing is between when the task status is changed to "Completed" > and the "Completed job" log message in vdl-int.k, and includes the check > for the exit code file. > > Mihael > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > Yong. > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > There's a new tool in bin. > > > > > > It's a spin off Jens' "show-id" tool. 
> > > After careful analysis of show-id, it became apparent that a lot of the > > > difficulty was in gathering and organizing the data, rather than in > > > generating the plots. This one's written in python and lacks the command > > > line options to control sizes, but includes the logic in Jens' tool that > > > automatically scale things. > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > of a way to represent those on the plot without making it messy. > > > It needs the logs to contain debugging info from individual tasks: > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > need the above in log4j.properties. > > > > > > I attached a sample output. > > > > > > Mihael > > > > > > > From hategan at mcs.anl.gov Sat Mar 10 23:02:03 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 23:02:03 -0600 Subject: [Swift-devel] jobchart In-Reply-To: References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> Message-ID: <1173589323.25508.1.camel@blabla.mcs.anl.gov> On Sat, 2007-03-10 at 23:02 -0600, Yong Zhao wrote: > I see, so this does not involve stage-in and outs. Why did postprocessing > take that much time in the other two graphs, where in this case it is > minimal? I think that's a cpu that does too many things at once. But I welcome other explanations. > > Yong. > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > Preprocessing is the time between "Running job..." log message in > > vdl-int.k and when the task status gets changed to "Submitted". > > > > Postprocessing is between when the task status is changed to "Completed" > > and the "Completed job" log message in vdl-int.k, and includes the check > > for the exit code file. > > > > Mihael > > > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > > > Yong. > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > There's a new tool in bin. > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > difficulty was in gathering and organizing the data, rather than in > > > > generating the plots. This one's written in python and lacks the command > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > automatically scale things. > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > of a way to represent those on the plot without making it messy. > > > > It needs the logs to contain debugging info from individual tasks: > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > need the above in log4j.properties. > > > > > > > > I attached a sample output. 
> > > > > > > > Mihael > > > > > > > > > > > > From hategan at mcs.anl.gov Sat Mar 10 23:14:41 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 23:14:41 -0600 Subject: [Swift-devel] jobchart In-Reply-To: <1173589323.25508.1.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> <1173589323.25508.1.camel@blabla.mcs.anl.gov> Message-ID: <1173590081.25508.8.camel@blabla.mcs.anl.gov> On Sat, 2007-03-10 at 23:02 -0600, Mihael Hategan wrote: > On Sat, 2007-03-10 at 23:02 -0600, Yong Zhao wrote: > > I see, so this does not involve stage-in and outs. Why did postprocessing > > take that much time in the other two graphs, where in this case it is > > minimal? > > I think that's a cpu that does too many things at once. But I welcome > other explanations. Actually I'm kinda confused. Running bash with 300 parallel "/bin/sleep 2" takes 2.4 seconds and doing it with plain Karajan takes 5. Some thinking needs to happen there. > > > > > Yong. > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > Preprocessing is the time between "Running job..." log message in > > > vdl-int.k and when the task status gets changed to "Submitted". > > > > > > Postprocessing is between when the task status is changed to "Completed" > > > and the "Completed job" log message in vdl-int.k, and includes the check > > > for the exit code file. > > > > > > Mihael > > > > > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > > > > > Yong. > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > There's a new tool in bin. > > > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > > difficulty was in gathering and organizing the data, rather than in > > > > > generating the plots. This one's written in python and lacks the command > > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > > automatically scale things. > > > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > > of a way to represent those on the plot without making it messy. > > > > > It needs the logs to contain debugging info from individual tasks: > > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > > need the above in log4j.properties. > > > > > > > > > > I attached a sample output. 
> > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Sat Mar 10 23:16:46 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 23:16:46 -0600 Subject: [Swift-devel] jobchart In-Reply-To: <1173590081.25508.8.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> <1173589323.25508.1.camel@blabla.mcs.anl.gov> <1173590081.25508.8.camel@blabla.mcs.anl.gov> Message-ID: <1173590206.25508.10.camel@blabla.mcs.anl.gov> On Sat, 2007-03-10 at 23:14 -0600, Mihael Hategan wrote: > On Sat, 2007-03-10 at 23:02 -0600, Mihael Hategan wrote: > > On Sat, 2007-03-10 at 23:02 -0600, Yong Zhao wrote: > > > I see, so this does not involve stage-in and outs. Why did postprocessing > > > take that much time in the other two graphs, where in this case it is > > > minimal? > > > > I think that's a cpu that does too many things at once. But I welcome > > other explanations. > > Actually I'm kinda confused. Running bash with 300 parallel "/bin/sleep > 2" takes 2.4 seconds and doing it with plain Karajan takes 5. Also, running 300 parallel wait=2000 in karajan takes 2.08 seconds. So the bulk must be in the implementation of execute() and/or the local provider. > > Some thinking needs to happen there. > > > > > > > > > Yong. > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > Preprocessing is the time between "Running job..." log message in > > > > vdl-int.k and when the task status gets changed to "Submitted". > > > > > > > > Postprocessing is between when the task status is changed to "Completed" > > > > and the "Completed job" log message in vdl-int.k, and includes the check > > > > for the exit code file. > > > > > > > > Mihael > > > > > > > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > > > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > > > > > > > Yong. > > > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > > > There's a new tool in bin. > > > > > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > > > difficulty was in gathering and organizing the data, rather than in > > > > > > generating the plots. This one's written in python and lacks the command > > > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > > > automatically scale things. > > > > > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > > > of a way to represent those on the plot without making it messy. > > > > > > It needs the logs to contain debugging info from individual tasks: > > > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > > > need the above in log4j.properties. > > > > > > > > > > > > I attached a sample output. 
> > > > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Sat Mar 10 23:24:44 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 23:24:44 -0600 Subject: [Swift-devel] jobchart In-Reply-To: <1173590206.25508.10.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> <1173589323.25508.1.camel@blabla.mcs.anl.gov> <1173590081.25508.8.camel@blabla.mcs.anl.gov> <1173590206.25508.10.camel@blabla.mcs.anl.gov> Message-ID: <1173590684.28080.0.camel@blabla.mcs.anl.gov> On Sat, 2007-03-10 at 23:16 -0600, Mihael Hategan wrote: > On Sat, 2007-03-10 at 23:14 -0600, Mihael Hategan wrote: > > On Sat, 2007-03-10 at 23:02 -0600, Mihael Hategan wrote: > > > On Sat, 2007-03-10 at 23:02 -0600, Yong Zhao wrote: > > > > I see, so this does not involve stage-in and outs. Why did postprocessing > > > > take that much time in the other two graphs, where in this case it is > > > > minimal? > > > > > > I think that's a cpu that does too many things at once. But I welcome > > > other explanations. > > > > Actually I'm kinda confused. Running bash with 300 parallel "/bin/sleep > > 2" takes 2.4 seconds and doing it with plain Karajan takes 5. > > Also, running 300 parallel wait=2000 in karajan takes 2.08 seconds. So > the bulk must be in the implementation of execute() and/or the local > provider. Note: not counting the jvm startup: print(time(parallelfor(i, range(1, 300), wait(delay=2000))) > > > > > Some thinking needs to happen there. > > > > > > > > > > > > > Yong. > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > Preprocessing is the time between "Running job..." log message in > > > > > vdl-int.k and when the task status gets changed to "Submitted". > > > > > > > > > > Postprocessing is between when the task status is changed to "Completed" > > > > > and the "Completed job" log message in vdl-int.k, and includes the check > > > > > for the exit code file. > > > > > > > > > > Mihael > > > > > > > > > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > > > > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > > > > > > > > > Yong. > > > > > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > > > > > There's a new tool in bin. > > > > > > > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > > > > difficulty was in gathering and organizing the data, rather than in > > > > > > > generating the plots. This one's written in python and lacks the command > > > > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > > > > automatically scale things. > > > > > > > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > > > > of a way to represent those on the plot without making it messy. > > > > > > > It needs the logs to contain debugging info from individual tasks: > > > > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > > > > need the above in log4j.properties. 
> > > > > > > > > > > > > > I attached a sample output. > > > > > > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Sat Mar 10 23:35:35 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Sat, 10 Mar 2007 23:35:35 -0600 (CST) Subject: [Swift-devel] jobchart In-Reply-To: <1173590684.28080.0.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> <1173589323.25508.1.camel@blabla.mcs.anl.gov> <1173590081.25508.8.camel@blabla.mcs.anl.gov> <1173590206.25508.10.camel@blabla.mcs.anl.gov> <1173590684.28080.0.camel@blabla.mcs.anl.gov> Message-ID: In Swift there is dir creation, exit code checking, cleanup etc. but that is still a lot of extra time (25s vs 3s), kinda weird Yong. On Sat, 10 Mar 2007, Mihael Hategan wrote: > On Sat, 2007-03-10 at 23:16 -0600, Mihael Hategan wrote: > > On Sat, 2007-03-10 at 23:14 -0600, Mihael Hategan wrote: > > > On Sat, 2007-03-10 at 23:02 -0600, Mihael Hategan wrote: > > > > On Sat, 2007-03-10 at 23:02 -0600, Yong Zhao wrote: > > > > > I see, so this does not involve stage-in and outs. Why did postprocessing > > > > > take that much time in the other two graphs, where in this case it is > > > > > minimal? > > > > > > > > I think that's a cpu that does too many things at once. But I welcome > > > > other explanations. > > > > > > Actually I'm kinda confused. Running bash with 300 parallel "/bin/sleep > > > 2" takes 2.4 seconds and doing it with plain Karajan takes 5. > > > > Also, running 300 parallel wait=2000 in karajan takes 2.08 seconds. So > > the bulk must be in the implementation of execute() and/or the local > > provider. > > Note: not counting the jvm startup: > print(time(parallelfor(i, range(1, 300), wait(delay=2000))) > > > > > > > > > Some thinking needs to happen there. > > > > > > > > > > > > > > > > > Yong. > > > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > > > Preprocessing is the time between "Running job..." log message in > > > > > > vdl-int.k and when the task status gets changed to "Submitted". > > > > > > > > > > > > Postprocessing is between when the task status is changed to "Completed" > > > > > > and the "Completed job" log message in vdl-int.k, and includes the check > > > > > > for the exit code file. > > > > > > > > > > > > Mihael > > > > > > > > > > > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > > > > > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > > > > > > > > > > > Yong. > > > > > > > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > > > > > > > There's a new tool in bin. > > > > > > > > > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > > > > > difficulty was in gathering and organizing the data, rather than in > > > > > > > > generating the plots. This one's written in python and lacks the command > > > > > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > > > > > automatically scale things. 
> > > > > > > > > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > > > > > of a way to represent those on the plot without making it messy. > > > > > > > > It needs the logs to contain debugging info from individual tasks: > > > > > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > > > > > need the above in log4j.properties. > > > > > > > > > > > > > > > > I attached a sample output. > > > > > > > > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From hategan at mcs.anl.gov Sat Mar 10 23:41:22 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 10 Mar 2007 23:41:22 -0600 Subject: [Swift-devel] jobchart In-Reply-To: References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> <1173589323.25508.1.camel@blabla.mcs.anl.gov> <1173590081.25508.8.camel@blabla.mcs.anl.gov> <1173590206.25508.10.camel@blabla.mcs.anl.gov> <1173590684.28080.0.camel@blabla.mcs.anl.gov> Message-ID: <1173591682.28080.8.camel@blabla.mcs.anl.gov> On Sat, 2007-03-10 at 23:35 -0600, Yong Zhao wrote: > In Swift there is dir creation, exit code checking, cleanup etc. but that > is still a lot of extra time (25s vs 3s), kinda weird Cleanup happens only at the end of the whole workflow. And it's a handful of jobs. In this case (one site) it's only one rm -rf. And the scheduler is also not the culprit. If you look at the second graph, the one with throttling, and if you look at the scheduler job submit line (black) in bulk submits, you see that it pretty much a vertical line, which means that the scheduler (having only one thread) processes the jobs almost instantaneously. However, it's not the 25s vs. 3s that is concerning, because it's hard to isolate problems when there are many things happening, but the 5 vs. 2.08 seconds. > > Yong. > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > On Sat, 2007-03-10 at 23:16 -0600, Mihael Hategan wrote: > > > On Sat, 2007-03-10 at 23:14 -0600, Mihael Hategan wrote: > > > > On Sat, 2007-03-10 at 23:02 -0600, Mihael Hategan wrote: > > > > > On Sat, 2007-03-10 at 23:02 -0600, Yong Zhao wrote: > > > > > > I see, so this does not involve stage-in and outs. Why did postprocessing > > > > > > take that much time in the other two graphs, where in this case it is > > > > > > minimal? > > > > > > > > > > I think that's a cpu that does too many things at once. But I welcome > > > > > other explanations. > > > > > > > > Actually I'm kinda confused. Running bash with 300 parallel "/bin/sleep > > > > 2" takes 2.4 seconds and doing it with plain Karajan takes 5. > > > > > > Also, running 300 parallel wait=2000 in karajan takes 2.08 seconds. So > > > the bulk must be in the implementation of execute() and/or the local > > > provider. > > > > Note: not counting the jvm startup: > > print(time(parallelfor(i, range(1, 300), wait(delay=2000))) > > > > > > > > > > > > > Some thinking needs to happen there. 
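The bash figure being compared here (2.4 seconds for 300 parallel sleeps) is quoted without the exact command; a minimal sketch of how such a baseline might be measured, assuming a plain bash job-control loop rather than anything taken from the thread:

    # start 300 concurrent /bin/sleep 2 processes, then time how long
    # it takes for all of them to finish
    time ( for i in $(seq 1 300); do /bin/sleep 2 & done; wait )

With no per-task overhead this reports a little over 2 seconds of wall-clock time, which is what makes the 5-second plain-Karajan run and the larger Swift-side numbers worth digging into.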
> > > > > > > > > > > > > > > > > > > > > Yong. > > > > > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > > > > > Preprocessing is the time between "Running job..." log message in > > > > > > > vdl-int.k and when the task status gets changed to "Submitted". > > > > > > > > > > > > > > Postprocessing is between when the task status is changed to "Completed" > > > > > > > and the "Completed job" log message in vdl-int.k, and includes the check > > > > > > > for the exit code file. > > > > > > > > > > > > > > Mihael > > > > > > > > > > > > > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > > > > > > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > > > > > > > > > > > > > Yong. > > > > > > > > > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > > > > > > > > > There's a new tool in bin. > > > > > > > > > > > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > > > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > > > > > > difficulty was in gathering and organizing the data, rather than in > > > > > > > > > generating the plots. This one's written in python and lacks the command > > > > > > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > > > > > > automatically scale things. > > > > > > > > > > > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > > > > > > of a way to represent those on the plot without making it messy. > > > > > > > > > It needs the logs to contain debugging info from individual tasks: > > > > > > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > > > > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > > > > > > need the above in log4j.properties. > > > > > > > > > > > > > > > > > > I attached a sample output. > > > > > > > > > > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From foster at mcs.anl.gov Sun Mar 11 09:57:55 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Sun, 11 Mar 2007 09:57:55 -0500 Subject: [Swift-devel] jobchart In-Reply-To: <1173587575.24215.6.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <45F380E1.3070301@mcs.anl.gov> <1173587575.24215.6.camel@blabla.mcs.anl.gov> Message-ID: <45F418F3.6080007@mcs.anl.gov> Mihael: Is it easy to do a version of this that shows what each specific CPU is doing? That would be helpful in terms of understanding bottlenecks. I guess it is difficult when a separate GRAM submission is performed per task. It would be easy when using DeeF. Ian. Mihael Hategan wrote: > I updated it a bit. > Two interesting ones (warning: ~700kB files): > http://www-unix.mcs.anl.gov/~hategan/helloworld-i4lb1xpvedgs0.log.png > > and > > http://www-unix.mcs.anl.gov/~hategan/helloworld-okdn8oj4qg411.log.png > > The first one has the gradual throttling disabled. The second one has it > set to a low value. 
> Granted, this is running /bin/sleep 2 (ignore the fact that the label > says "echo"), but the fact that lack of throttling can cause resource > saturation and slightly worse performance is interesting. > I still have to figure out what, besides checking the exit code file, > causes the long delays after the job is done. I'm guessing it's some CPU > intensive stuff that doesn't parallelize very well on my laptop. > > Mihael > > On Sat, 2007-03-10 at 22:09 -0600, Mike Wilde wrote: > >> That is beautiful! Nice work, Mihael. >> >> - Mike >> >> Mihael Hategan wrote, On 3/10/2007 7:44 PM: >> >>> There's a new tool in bin. >>> >>> It's a spin off Jens' "show-id" tool. >>> After careful analysis of show-id, it became apparent that a lot of the >>> difficulty was in gathering and organizing the data, rather than in >>> generating the plots. This one's written in python and lacks the command >>> line options to control sizes, but includes the logic in Jens' tool that >>> automatically scale things. >>> >>> It does not show individual stage-ins and stage-outs. I'll have to think >>> of a way to represent those on the plot without making it messy. >>> It needs the logs to contain debugging info from individual tasks: >>> log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG >>> >>> I've updated this in SVN, but if you want to run it on older builds, you >>> need the above in log4j.properties. >>> >>> I attached a sample output. >>> >>> Mihael >>> >>> >>> ------------------------------------------------------------------------ >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Mar 11 12:56:28 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 11 Mar 2007 11:56:28 -0600 Subject: [Swift-devel] jobchart In-Reply-To: <45F418F3.6080007@mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <45F380E1.3070301@mcs.anl.gov> <1173587575.24215.6.camel@blabla.mcs.anl.gov> <45F418F3.6080007@mcs.anl.gov> Message-ID: <1173635788.2588.3.camel@blabla.mcs.anl.gov> On Sun, 2007-03-11 at 09:57 -0500, Ian Foster wrote: > Mihael: > > Is it easy to do a version of this that shows what each specific CPU > is doing? This is all running locally. One CPU for Swift and /bin/sleep. It's not really representative of "real" workflow runs. I was only trying to test "jobchart". It's intriguing nonetheless :) > That would be helpful in terms of understanding bottlenecks. I guess > it is difficult when a separate GRAM submission is performed per task. > It would be easy when using DeeF. > > Ian. > > Mihael Hategan wrote: > > I updated it a bit. 
> > Two interesting ones (warning: ~700kB files): > > http://www-unix.mcs.anl.gov/~hategan/helloworld-i4lb1xpvedgs0.log.png > > > > and > > > > http://www-unix.mcs.anl.gov/~hategan/helloworld-okdn8oj4qg411.log.png > > > > The first one has the gradual throttling disabled. The second one has it > > set to a low value. > > Granted, this is running /bin/sleep 2 (ignore the fact that the label > > says "echo"), but the fact that lack of throttling can cause resource > > saturation and slightly worse performance is interesting. > > I still have to figure out what, besides checking the exit code file, > > causes the long delays after the job is done. I'm guessing it's some CPU > > intensive stuff that doesn't parallelize very well on my laptop. > > > > Mihael > > > > On Sat, 2007-03-10 at 22:09 -0600, Mike Wilde wrote: > > > > > That is beautiful! Nice work, Mihael. > > > > > > - Mike > > > > > > Mihael Hategan wrote, On 3/10/2007 7:44 PM: > > > > > > > There's a new tool in bin. > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > difficulty was in gathering and organizing the data, rather than in > > > > generating the plots. This one's written in python and lacks the command > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > automatically scale things. > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > of a way to represent those on the plot without making it messy. > > > > It needs the logs to contain debugging info from individual tasks: > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > need the above in log4j.properties. > > > > > > > > I attached a sample output. > > > > > > > > Mihael > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > -- > > Ian Foster, Director, Computation Institute > Argonne National Laboratory & University of Chicago > Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 > Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 > Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. > Globus Alliance: www.globus.org. From hategan at mcs.anl.gov Sun Mar 11 13:59:02 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 11 Mar 2007 12:59:02 -0600 Subject: [Swift-devel] jobchart In-Reply-To: <1173589323.25508.1.camel@blabla.mcs.anl.gov> References: <1173577473.15768.20.camel@blabla.mcs.anl.gov> <1173588913.25191.3.camel@blabla.mcs.anl.gov> <1173589323.25508.1.camel@blabla.mcs.anl.gov> Message-ID: <1173639542.1365.1.camel@blabla.mcs.anl.gov> On Sat, 2007-03-10 at 23:02 -0600, Mihael Hategan wrote: > On Sat, 2007-03-10 at 23:02 -0600, Yong Zhao wrote: > > I see, so this does not involve stage-in and outs. Why did postprocessing > > take that much time in the other two graphs, where in this case it is > > minimal? There's another bit. If the app fails (too many open files), it gets restarted. That is all hidden behind post-processing. 
Such failures should eventually appear properly on the chart. > > I think that's a cpu that does too many things at once. But I welcome > other explanations. > > > > > Yong. > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > Preprocessing is the time between "Running job..." log message in > > > vdl-int.k and when the task status gets changed to "Submitted". > > > > > > Postprocessing is between when the task status is changed to "Completed" > > > and the "Completed job" log message in vdl-int.k, and includes the check > > > for the exit code file. > > > > > > Mihael > > > > > > On Sat, 2007-03-10 at 22:51 -0600, Yong Zhao wrote: > > > > nice graph, what's preprocessing and postprocessing involved in this case? > > > > > > > > Yong. > > > > > > > > On Sat, 10 Mar 2007, Mihael Hategan wrote: > > > > > > > > > There's a new tool in bin. > > > > > > > > > > It's a spin off Jens' "show-id" tool. > > > > > After careful analysis of show-id, it became apparent that a lot of the > > > > > difficulty was in gathering and organizing the data, rather than in > > > > > generating the plots. This one's written in python and lacks the command > > > > > line options to control sizes, but includes the logic in Jens' tool that > > > > > automatically scale things. > > > > > > > > > > It does not show individual stage-ins and stage-outs. I'll have to think > > > > > of a way to represent those on the plot without making it messy. > > > > > It needs the logs to contain debugging info from individual tasks: > > > > > log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG > > > > > > > > > > I've updated this in SVN, but if you want to run it on older builds, you > > > > > need the above in log4j.properties. > > > > > > > > > > I attached a sample output. > > > > > > > > > > Mihael > > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Sun Mar 11 17:49:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 11 Mar 2007 16:49:18 -0600 Subject: [Swift-devel] changelogs Message-ID: <1173653358.14112.5.camel@blabla.mcs.anl.gov> Tibi's wish is here. There is now a changelog aggregator in the CoG builds with will interleave changelogs from various modules and write one out in the build directory. Mihael From wilde at mcs.anl.gov Mon Mar 12 14:31:04 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Mon, 12 Mar 2007 14:31:04 -0500 Subject: [Swift-devel] RAM needs for running a swift workflow Message-ID: <45F5AA78.9040806@mcs.anl.gov> Does anyone have any data on how much memory is used for simple swift workflows, and how that scales up? I ask because if we have numerous students in a workshop running swift on the same submit host, we need to know how many it can handle before it consumes all RAM and starts paging/swapping. A good thing to measure in the future for our application workflows (albeit not the most critical for most applications). Alternatively - with the current command set, is there a practical way to have many users share the same JVM to run small independent workflows? (I assume not, and that this might be a feature to consider in the distant future. Would there be much savings for sharing a JVM, or is more of the memory overhead per-workflow than per-JVM?) 
- Mike From hategan at mcs.anl.gov Mon Mar 12 14:35:48 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 12 Mar 2007 14:35:48 -0500 Subject: [Swift-devel] RAM needs for running a swift workflow In-Reply-To: <45F5AA78.9040806@mcs.anl.gov> References: <45F5AA78.9040806@mcs.anl.gov> Message-ID: <1173728148.12331.2.camel@blabla.mcs.anl.gov> We need to figure out the exact numbers, but running multiple workflows inside one JVM is better than running multiple JVMs. However, there is no reasonable existing way of doing that right now. It could be explored though. On Mon, 2007-03-12 at 14:31 -0500, Mike Wilde wrote: > Does anyone have any data on how much memory is used for simple swift > workflows, and how that scales up? > > I ask because if we have numerous students in a workshop running swift on the > same submit host, we need to know how many it can handle before it consumes > all RAM and starts paging/swapping. > > A good thing to measure in the future for our application workflows (albeit > not the most critical for most applications). > > Alternatively - with the current command set, is there a practical way to have > many users share the same JVM to run small independent workflows? (I assume > not, and that this might be a feature to consider in the distant future. Would > there be much savings for sharing a JVM, or is more of the memory overhead > per-workflow than per-JVM?) > > - Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Mon Mar 12 14:44:59 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Mon, 12 Mar 2007 14:44:59 -0500 (CDT) Subject: [Swift-devel] RAM needs for running a swift workflow In-Reply-To: <45F5AA78.9040806@mcs.anl.gov> References: <45F5AA78.9040806@mcs.anl.gov> Message-ID: I have some numbers using the big_diamond workflow: The format is of following: Mem iteration_in_workflow time_to_typecheck_the_graph number_of_nodes_in_wokflow big.kml: 1GB 5698 - 24m54s - 165243 512M 2832 - 13m4s - 82129 256M 1393 - 6m29s - 40398 128M 677 - 3m22s - 19634 64m 317 - 1m38s - 9194 32m 137 - 47s - 3974 Yong. On Mon, 12 Mar 2007, Mike Wilde wrote: > Does anyone have any data on how much memory is used for simple swift > workflows, and how that scales up? > > I ask because if we have numerous students in a workshop running swift on the > same submit host, we need to know how many it can handle before it consumes > all RAM and starts paging/swapping. > > A good thing to measure in the future for our application workflows (albeit > not the most critical for most applications). > > Alternatively - with the current command set, is there a practical way to have > many users share the same JVM to run small independent workflows? (I assume > not, and that this might be a feature to consider in the distant future. Would > there be much savings for sharing a JVM, or is more of the memory overhead > per-workflow than per-JVM?) > > - Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From wilde at mcs.anl.gov Mon Mar 12 14:55:20 2007 From: wilde at mcs.anl.gov (Mike Wilde) Date: Mon, 12 Mar 2007 14:55:20 -0500 Subject: [Swift-devel] RAM needs for running a swift workflow In-Reply-To: References: <45F5AA78.9040806@mcs.anl.gov> Message-ID: <45F5B028.7020305@mcs.anl.gov> Thanks, Yong. 
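A rough reading of Yong's table above (the heap sizes listed are presumably the -Xmx ceilings the JVM was started with): dividing heap by the number of graph nodes it could hold gives

    32 MB  /   3,974 nodes  ~ 8.2 KB per node
    128 MB /  19,634 nodes  ~ 6.7 KB per node
    1 GB   / 165,243 nodes  ~ 6.3 KB per node

i.e. on the order of 6-8 KB of heap per node in the workflow graph, with the higher per-node cost at small heaps hinting at a fixed baseline overhead on top of the per-node cost.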
So, it looks like a safe number is that it should be <32MB RAM for a small workflow (eg, 10s of nodes max). perhaps much less. Looks like RAM goes up near-linear with #nodes in workflow. But we dont know what the "floor" of fixed-overhead for Swift, Karajan and CoG is. At 32M, 32 users would use 1GB of RAM for user JVMs, theoretically. This sounds pretty reasonable and affordable, if a 2GB host could handle 20+ student users without paging. Mike Yong Zhao wrote, On 3/12/2007 2:44 PM: > I have some numbers using the big_diamond workflow: > The format is of following: > > Mem > iteration_in_workflow time_to_typecheck_the_graph number_of_nodes_in_wokflow > > big.kml: > 1GB > 5698 - 24m54s - 165243 > > 512M > 2832 - 13m4s - 82129 > > 256M > 1393 - 6m29s - 40398 > > 128M > 677 - 3m22s - 19634 > > 64m > 317 - 1m38s - 9194 > > 32m > 137 - 47s - 3974 > > Yong. > > On Mon, 12 Mar 2007, Mike Wilde wrote: > >> Does anyone have any data on how much memory is used for simple swift >> workflows, and how that scales up? >> >> I ask because if we have numerous students in a workshop running swift on the >> same submit host, we need to know how many it can handle before it consumes >> all RAM and starts paging/swapping. >> >> A good thing to measure in the future for our application workflows (albeit >> not the most critical for most applications). >> >> Alternatively - with the current command set, is there a practical way to have >> many users share the same JVM to run small independent workflows? (I assume >> not, and that this might be a feature to consider in the distant future. Would >> there be much savings for sharing a JVM, or is more of the memory overhead >> per-workflow than per-JVM?) >> >> - Mike >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > -- Mike Wilde Computation Institute, University of Chicago Math & Computer Science Division Argonne National Laboratory Argonne, IL 60439 USA tel 630-252-7497 fax 630-252-1997 From nefedova at mcs.anl.gov Tue Mar 13 16:46:43 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 13 Mar 2007 16:46:43 -0500 Subject: [Swift-devel] mapper problem or ...? Message-ID: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> Hi, I have a question: I am using a fixed_array_mapper to pass some 68 files as an input to my application called GENERATOR. I need to use the mapper since the number of input files is unknown before the workflow starts. 
Here is how I use it: file whamfiles_m002[] , solv_repu_0_0DOT2_b1_m002_wham">; These files are all generated by stage four of my workflow, each file is mapped to a physical filename, for example: file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; and this particular file is produced this way: (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", "stitle:m002", "rtffile:parm03_gaff_all.rtf", "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", "urandseed:5395098", "dirname:solv_chg_a0_m002"); Then I call my application (the last stage of my workflow, stage five) (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR (@whamfiles_m002, "m002"); And then when I start my workflow, the GENERATOR starts right away. I am not sure why. Does the mapper look for the physical files on the disk and when finds them - starts right away ? I do have the needed files in the directory from my previous runs. Or there is something else wrong here ? 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 RunID: b0n2liektep92 pre_ch started <---------- thats the first stage generator_cat started <----------- not supposed to start now! generator_cat started My complete dtm file is in /home/nefedova/swift.dtm on terminable.ci.uchicago, but its pretty big... Thanks, Nika From hategan at mcs.anl.gov Tue Mar 13 16:54:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 13 Mar 2007 16:54:12 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> Message-ID: <1173822852.20823.4.camel@blabla.mcs.anl.gov> Oh my :) @whamfiles_m002 is known by the system at all times. That means GENERATOR does not need to wait for the actual files to be there since it knows very well what @whamfiles_m002 is (the list of names). You should try this instead: ... ... GENERATOR(whamfiles, str) { app { generator @whamfiles, str; } } ... = GENERATOR(whamfiles_m002, "m002") Mihael On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > Hi, > > I have a question: > > I am using a fixed_array_mapper to pass some 68 files as an input to my > application called GENERATOR. I need to use the mapper since the number of > input files is unknown before the workflow starts. 
Here is how I use it: > file whamfiles_m002[] solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > These files are all generated by stage four of my workflow, each file is > mapped to a physical filename, for example: > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > and this particular file is produced this way: > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > Then I call my application (the last stage of my workflow, stage five) > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR > (@whamfiles_m002, "m002"); > > And then when I start my workflow, the GENERATOR starts right away. I am > not sure why. Does the mapper look for the physical files on the disk and > when finds them - starts right away ? I do have the needed files in the > directory from my previous runs. Or there is something else wrong here ? > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > RunID: b0n2liektep92 > pre_ch started <---------- thats the first stage > generator_cat started <----------- not supposed to start now! > generator_cat started > > My complete dtm file is in /home/nefedova/swift.dtm on > terminable.ci.uchicago, but its pretty big... > > Thanks, > > Nika > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Tue Mar 13 17:23:02 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 13 Mar 2007 17:23:02 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <1173822852.20823.4.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> I think I am confused. Sorry! what will be the type of 'whamfiles' ? If its a string - will the swift know to brake it down to filenames and stage them all in ? Also - is there a mapper (or whatever) that can map the list of *logical* file names to an array ? (thats what I was trying to do). Thanks! Nika At 04:54 PM 3/13/2007, Mihael Hategan wrote: >Oh my :) >@whamfiles_m002 is known by the system at all times. That means >GENERATOR does not need to wait for the actual files to be there since >it knows very well what @whamfiles_m002 is (the list of names). > >You should try this instead: >... >... GENERATOR(whamfiles, str) { > app { > generator @whamfiles, str; > } >} > >... = GENERATOR(whamfiles_m002, "m002") > >Mihael > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > Hi, > > > > I have a question: > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my > > application called GENERATOR. 
I need to use the mapper since the number of > > input files is unknown before the workflow starts. Here is how I use it: > > file whamfiles_m002[] > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, get > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > These files are all generated by stage four of my workflow, each file is > > mapped to a physical filename, for example: > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > and this particular file is produced this way: > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > Then I call my application (the last stage of my workflow, stage five) > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR > > (@whamfiles_m002, "m002"); > > > > And then when I start my workflow, the GENERATOR starts right away. I am > > not sure why. Does the mapper look for the physical files on the disk and > > when finds them - starts right away ? I do have the needed files in the > > directory from my previous runs. Or there is something else wrong here ? > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > RunID: b0n2liektep92 > > pre_ch started <---------- thats the first stage > > generator_cat started <----------- not supposed to start now! > > generator_cat started > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > terminable.ci.uchicago, but its pretty big... > > > > Thanks, > > > > Nika > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Tue Mar 13 17:37:52 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 13 Mar 2007 17:37:52 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> Message-ID: <1173825472.21214.5.camel@blabla.mcs.anl.gov> On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > I think I am confused. Sorry! whamfiles_m002 is a file array Swift will know how to break it into files and stage all of them in if you pass it as an array to the atomic procedure. @whamfiles_m002 is a string of space separated names. It gets passed to the application as one single argument. Mihael > what will be the type of 'whamfiles' ? If its a string - will the swift > know to brake it down to filenames and stage them all in ? > Also - is there a mapper (or whatever) that can map the list of *logical* > file names to an array ? (thats what I was trying to do). > > Thanks! > > Nika > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > >Oh my :) > >@whamfiles_m002 is known by the system at all times. 
That means > >GENERATOR does not need to wait for the actual files to be there since > >it knows very well what @whamfiles_m002 is (the list of names). > > > >You should try this instead: > >... > >... GENERATOR(whamfiles, str) { > > app { > > generator @whamfiles, str; > > } > >} > > > >... = GENERATOR(whamfiles_m002, "m002") > > > >Mihael > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > Hi, > > > > > > I have a question: > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my > > > application called GENERATOR. I need to use the mapper since the number of > > > input files is unknown before the workflow starts. Here is how I use it: > > > file whamfiles_m002[] > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > get > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > These files are all generated by stage four of my workflow, each file is > > > mapped to a physical filename, for example: > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > and this particular file is produced this way: > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > Then I call my application (the last stage of my workflow, stage five) > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR > > > (@whamfiles_m002, "m002"); > > > > > > And then when I start my workflow, the GENERATOR starts right away. I am > > > not sure why. Does the mapper look for the physical files on the disk and > > > when finds them - starts right away ? I do have the needed files in the > > > directory from my previous runs. Or there is something else wrong here ? > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > RunID: b0n2liektep92 > > > pre_ch started <---------- thats the first stage > > > generator_cat started <----------- not supposed to start now! > > > generator_cat started > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > terminable.ci.uchicago, but its pretty big... > > > > > > Thanks, > > > > > > Nika > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From hategan at mcs.anl.gov Tue Mar 13 17:51:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 13 Mar 2007 17:51:45 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> Message-ID: <1173826305.21214.8.camel@blabla.mcs.anl.gov> On Tue, 2007-03-13 at 17:23 -0500, Veronika V. 
Nefedova wrote: > Also - is there a mapper (or whatever) that can map the list of *logical* > file names to an array ? (thats what I was trying to do). The logical file names are your variables. whamfiles_m002[2] can be considered a structured logical file name. There is no distinct logical filename in Swift other than that. > > Thanks! > > Nika > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > >Oh my :) > >@whamfiles_m002 is known by the system at all times. That means > >GENERATOR does not need to wait for the actual files to be there since > >it knows very well what @whamfiles_m002 is (the list of names). > > > >You should try this instead: > >... > >... GENERATOR(whamfiles, str) { > > app { > > generator @whamfiles, str; > > } > >} > > > >... = GENERATOR(whamfiles_m002, "m002") > > > >Mihael > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > Hi, > > > > > > I have a question: > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my > > > application called GENERATOR. I need to use the mapper since the number of > > > input files is unknown before the workflow starts. Here is how I use it: > > > file whamfiles_m002[] > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > get > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > These files are all generated by stage four of my workflow, each file is > > > mapped to a physical filename, for example: > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > and this particular file is produced this way: > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > Then I call my application (the last stage of my workflow, stage five) > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR > > > (@whamfiles_m002, "m002"); > > > > > > And then when I start my workflow, the GENERATOR starts right away. I am > > > not sure why. Does the mapper look for the physical files on the disk and > > > when finds them - starts right away ? I do have the needed files in the > > > directory from my previous runs. Or there is something else wrong here ? > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > RunID: b0n2liektep92 > > > pre_ch started <---------- thats the first stage > > > generator_cat started <----------- not supposed to start now! > > > generator_cat started > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > terminable.ci.uchicago, but its pretty big... 
> > > > > > Thanks, > > > > > > Nika > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From hategan at mcs.anl.gov Tue Mar 13 17:56:02 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 13 Mar 2007 17:56:02 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> Message-ID: <1173826562.21214.14.camel@blabla.mcs.anl.gov> On a third thought. This looks like, eventually, you are trying to do the same thing that Yong did with the dependent mappers earlier. I think he would have more insight on the topic. On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > I think I am confused. Sorry! > what will be the type of 'whamfiles' ? If its a string - will the swift > know to brake it down to filenames and stage them all in ? > Also - is there a mapper (or whatever) that can map the list of *logical* > file names to an array ? (thats what I was trying to do). > > Thanks! > > Nika > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > >Oh my :) > >@whamfiles_m002 is known by the system at all times. That means > >GENERATOR does not need to wait for the actual files to be there since > >it knows very well what @whamfiles_m002 is (the list of names). > > > >You should try this instead: > >... > >... GENERATOR(whamfiles, str) { > > app { > > generator @whamfiles, str; > > } > >} > > > >... = GENERATOR(whamfiles_m002, "m002") > > > >Mihael > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > Hi, > > > > > > I have a question: > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my > > > application called GENERATOR. I need to use the mapper since the number of > > > input files is unknown before the workflow starts. Here is how I use it: > > > file whamfiles_m002[] > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > get > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > These files are all generated by stage four of my workflow, each file is > > > mapped to a physical filename, for example: > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > and this particular file is produced this way: > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > Then I call my application (the last stage of my workflow, stage five) > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = GENERATOR > > > (@whamfiles_m002, "m002"); > > > > > > And then when I start my workflow, the GENERATOR starts right away. 
I am > > > not sure why. Does the mapper look for the physical files on the disk and > > > when finds them - starts right away ? I do have the needed files in the > > > directory from my previous runs. Or there is something else wrong here ? > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > RunID: b0n2liektep92 > > > pre_ch started <---------- thats the first stage > > > generator_cat started <----------- not supposed to start now! > > > generator_cat started > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > terminable.ci.uchicago, but its pretty big... > > > > > > Thanks, > > > > > > Nika > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From nefedova at mcs.anl.gov Tue Mar 13 18:07:13 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 13 Mar 2007 18:07:13 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <1173826562.21214.14.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> ok, here is in short what I need to do: at some point in the workflow N files are produced (in my case its 68, but it could be any number). These files produced each by a separate job (i.e. N jobs produce N files). The next job in the workflow needs to take those N files as an input. Question: how do I pass these unknown number of files as an input to an application ? The array_mapper didn't work (or i didn't use it correctly). Thanks! Nika At 05:56 PM 3/13/2007, Mihael Hategan wrote: >On a third thought. This looks like, eventually, you are trying to do >the same thing that Yong did with the dependent mappers earlier. I think >he would have more insight on the topic. > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > I think I am confused. Sorry! > > what will be the type of 'whamfiles' ? If its a string - will the swift > > know to brake it down to filenames and stage them all in ? > > Also - is there a mapper (or whatever) that can map the list of *logical* > > file names to an array ? (thats what I was trying to do). > > > > Thanks! > > > > Nika > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > >Oh my :) > > >@whamfiles_m002 is known by the system at all times. That means > > >GENERATOR does not need to wait for the actual files to be there since > > >it knows very well what @whamfiles_m002 is (the list of names). > > > > > >You should try this instead: > > >... > > >... GENERATOR(whamfiles, str) { > > > app { > > > generator @whamfiles, str; > > > } > > >} > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > >Mihael > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > Hi, > > > > > > > > I have a question: > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my > > > > application called GENERATOR. I need to use the mapper since the > number of > > > > input files is unknown before the workflow starts. 
Here is how I > use it: > > > > file whamfiles_m002[] solv_chg_a0_m002_wham, > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, you > > > get > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > These files are all generated by stage four of my workflow, each > file is > > > > mapped to a physical filename, for example: > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > and this particular file is produced this way: > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > Then I call my application (the last stage of my workflow, stage five) > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = > GENERATOR > > > > (@whamfiles_m002, "m002"); > > > > > > > > And then when I start my workflow, the GENERATOR starts right away. > I am > > > > not sure why. Does the mapper look for the physical files on the > disk and > > > > when finds them - starts right away ? I do have the needed files in the > > > > directory from my previous runs. Or there is something else wrong > here ? > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > RunID: b0n2liektep92 > > > > pre_ch started <---------- thats the first stage > > > > generator_cat started <----------- not supposed to start now! > > > > generator_cat started > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > terminable.ci.uchicago, but its pretty big... > > > > > > > > Thanks, > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From hategan at mcs.anl.gov Tue Mar 13 18:12:51 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 13 Mar 2007 18:12:51 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> Message-ID: <1173827571.21499.3.camel@blabla.mcs.anl.gov> On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > ok, here is in short what I need to do: > > at some point in the workflow N files are produced (in my case its 68, but > it could be any number). These files produced each by a separate job (i.e. > N jobs produce N files). > The next job in the workflow needs to take those N files as an input. > > Question: how do I pass these unknown number of files as an input to an > application ? The array_mapper didn't work (or i didn't use it correctly). 
In this case you need some other kind of mapper that can deal with unknown numbers of items. The default mapper (i.e. specifying no mapper) should work. So you need to do: file whamfiles_002[]; foreach v,k in someinput { whamfiles_002[k] = job(v); } ... = GENERATOR(whamfiles_002); > > Thanks! > > Nika > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > >On a third thought. This looks like, eventually, you are trying to do > >the same thing that Yong did with the dependent mappers earlier. I think > >he would have more insight on the topic. > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > I think I am confused. Sorry! > > > what will be the type of 'whamfiles' ? If its a string - will the swift > > > know to brake it down to filenames and stage them all in ? > > > Also - is there a mapper (or whatever) that can map the list of *logical* > > > file names to an array ? (thats what I was trying to do). > > > > > > Thanks! > > > > > > Nika > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > >Oh my :) > > > >@whamfiles_m002 is known by the system at all times. That means > > > >GENERATOR does not need to wait for the actual files to be there since > > > >it knows very well what @whamfiles_m002 is (the list of names). > > > > > > > >You should try this instead: > > > >... > > > >... GENERATOR(whamfiles, str) { > > > > app { > > > > generator @whamfiles, str; > > > > } > > > >} > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > >Mihael > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > > Hi, > > > > > > > > > > I have a question: > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an input to my > > > > > application called GENERATOR. I need to use the mapper since the > > number of > > > > > input files is unknown before the workflow starts. Here is how I > > use it: > > > > > file whamfiles_m002[] > solv_chg_a0_m002_wham, > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > you > > > > get > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > These files are all generated by stage four of my workflow, each > > file is > > > > > mapped to a physical filename, for example: > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > and this particular file is produced this way: > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > Then I call my application (the last stage of my workflow, stage five) > > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = > > GENERATOR > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right away. > > I am > > > > > not sure why. 
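Mihael's suggestion is hard to read once folded into the quoted thread, so here it is reflowed (his sketch verbatim rather than tested code; someinput and job are placeholders standing for Nika's actual loop source and the procedure that produces each .wham file):

    file whamfiles_002[];

    foreach v,k in someinput {
        whamfiles_002[k] = job(v);
    }

    ... = GENERATOR(whamfiles_002);

The point is that whamfiles_002 uses the default mapper, so the array can hold however many files the foreach ends up producing, and passing the array itself (rather than @whamfiles_002) is what makes GENERATOR wait for, and stage in, the actual files.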
Does the mapper look for the physical files on the > > disk and > > > > > when finds them - starts right away ? I do have the needed files in the > > > > > directory from my previous runs. Or there is something else wrong > > here ? > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > RunID: b0n2liektep92 > > > > > pre_ch started <---------- thats the first stage > > > > > generator_cat started <----------- not supposed to start now! > > > > > generator_cat started > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > terminable.ci.uchicago, but its pretty big... > > > > > > > > > > Thanks, > > > > > > > > > > Nika > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > From nefedova at mcs.anl.gov Tue Mar 13 18:28:07 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Tue, 13 Mar 2007 18:28:07 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <1173827571.21499.3.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> Hmmm. So here is how my files are produced (inside double loop over $s and $name): file $s9prt <"$name.prt">; file $s9wham <"$s9.wham">; file $s9crd <"$s9.crd">; file $s9out <"$s9.out">; file $s9done <"$s9donefile">; ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", "$rcut2"); so if I change the mapping of the needed output file ($s9wham), everything should work? file whamfiles_$s[$i] <"$s9.wham">; i=`expr $i + 1` and call the function: (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", "$rcut2"); Nika At 06:12 PM 3/13/2007, Mihael Hategan wrote: >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > ok, here is in short what I need to do: > > > > at some point in the workflow N files are produced (in my case its 68, but > > it could be any number). These files produced each by a separate job (i.e. > > N jobs produce N files). > > The next job in the workflow needs to take those N files as an input. > > > > Question: how do I pass these unknown number of files as an input to an > > application ? The array_mapper didn't work (or i didn't use it correctly). > >In this case you need some other kind of mapper that can deal with >unknown numbers of items. The default mapper (i.e. specifying no mapper) >should work. > >So you need to do: > >file whamfiles_002[]; > >foreach v,k in someinput { > whamfiles_002[k] = job(v); >} > >... = GENERATOR(whamfiles_002); > > > > > Thanks! > > > > Nika > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > >On a third thought. This looks like, eventually, you are trying to do > > >the same thing that Yong did with the dependent mappers earlier. 
I think > > >he would have more insight on the topic. > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > I think I am confused. Sorry! > > > > what will be the type of 'whamfiles' ? If its a string - will the swift > > > > know to brake it down to filenames and stage them all in ? > > > > Also - is there a mapper (or whatever) that can map the list of > *logical* > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > Thanks! > > > > > > > > Nika > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > >Oh my :) > > > > >@whamfiles_m002 is known by the system at all times. That means > > > > >GENERATOR does not need to wait for the actual files to be there since > > > > >it knows very well what @whamfiles_m002 is (the list of names). > > > > > > > > > >You should try this instead: > > > > >... > > > > >... GENERATOR(whamfiles, str) { > > > > > app { > > > > > generator @whamfiles, str; > > > > > } > > > > >} > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > >Mihael > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > > > Hi, > > > > > > > > > > > > I have a question: > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an > input to my > > > > > > application called GENERATOR. I need to use the mapper since the > > > number of > > > > > > input files is unknown before the workflow starts. Here is how I > > > use it: > > > > > > file whamfiles_m002[] > > solv_chg_a0_m002_wham, > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, files, > > > you > > > > > get > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > These files are all generated by stage four of my workflow, each > > > file is > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > and this particular file is produced this way: > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > Then I call my application (the last stage of my workflow, > stage five) > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = > > > GENERATOR > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right > away. > > > I am > > > > > > not sure why. Does the mapper look for the physical files on the > > > disk and > > > > > > when finds them - starts right away ? I do have the needed > files in the > > > > > > directory from my previous runs. Or there is something else wrong > > > here ? 
> > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > RunID: b0n2liektep92 > > > > > > pre_ch started <---------- thats the first stage > > > > > > generator_cat started <----------- not supposed to start now! > > > > > > generator_cat started > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > terminable.ci.uchicago, but its pretty big... > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > From yongzh at cs.uchicago.edu Tue Mar 13 18:35:29 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 13 Mar 2007 18:35:29 -0500 (CDT) Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> Message-ID: I think that should do, you put the output from an individual app and put it into an array. We have the same pattern in the fmri workflow: apply a reorient to a set of images, and then average the produced image set. Yong. On Tue, 13 Mar 2007, Veronika V. Nefedova wrote: > Hmmm. So here is how my files are produced (inside double loop over $s and > $name): > > file $s9prt <"$name.prt">; > file $s9wham <"$s9.wham">; > file $s9crd <"$s9.crd">; > file $s9out <"$s9.out">; > file $s9done <"$s9donefile">; > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft, > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", "$rcut2"); > > so if I change the mapping of the needed output file ($s9wham), everything > should work? > > file whamfiles_$s[$i] <"$s9.wham">; > i=`expr $i + 1` > > and call the function: > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > "$rcut2"); > > Nika > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > > ok, here is in short what I need to do: > > > > > > at some point in the workflow N files are produced (in my case its 68, but > > > it could be any number). These files produced each by a separate job (i.e. > > > N jobs produce N files). > > > The next job in the workflow needs to take those N files as an input. > > > > > > Question: how do I pass these unknown number of files as an input to an > > > application ? The array_mapper didn't work (or i didn't use it correctly). > > > >In this case you need some other kind of mapper that can deal with > >unknown numbers of items. The default mapper (i.e. specifying no mapper) > >should work. > > > >So you need to do: > > > >file whamfiles_002[]; > > > >foreach v,k in someinput { > > whamfiles_002[k] = job(v); > >} > > > >... = GENERATOR(whamfiles_002); > > > > > > > > Thanks! 
> > > > > > Nika > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > >On a third thought. This looks like, eventually, you are trying to do > > > >the same thing that Yong did with the dependent mappers earlier. I think > > > >he would have more insight on the topic. > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > I think I am confused. Sorry! > > > > > what will be the type of 'whamfiles' ? If its a string - will the swift > > > > > know to brake it down to filenames and stage them all in ? > > > > > Also - is there a mapper (or whatever) that can map the list of > > *logical* > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > Thanks! > > > > > > > > > > Nika > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > >Oh my :) > > > > > >@whamfiles_m002 is known by the system at all times. That means > > > > > >GENERATOR does not need to wait for the actual files to be there since > > > > > >it knows very well what @whamfiles_m002 is (the list of names). > > > > > > > > > > > >You should try this instead: > > > > > >... > > > > > >... GENERATOR(whamfiles, str) { > > > > > > app { > > > > > > generator @whamfiles, str; > > > > > > } > > > > > >} > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > >Mihael > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an > > input to my > > > > > > > application called GENERATOR. I need to use the mapper since the > > > > number of > > > > > > > input files is unknown before the workflow starts. 
Here is how I > > > > use it: > > > > > > > file whamfiles_m002[] > > > solv_chg_a0_m002_wham, > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > files, > > > > you > > > > > > get > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > These files are all generated by stage four of my workflow, each > > > > file is > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > and this particular file is produced this way: > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > Then I call my application (the last stage of my workflow, > > stage five) > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = > > > > GENERATOR > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right > > away. > > > > I am > > > > > > > not sure why. Does the mapper look for the physical files on the > > > > disk and > > > > > > > when finds them - starts right away ? I do have the needed > > files in the > > > > > > > directory from my previous runs. Or there is something else wrong > > > > here ? > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > RunID: b0n2liektep92 > > > > > > > pre_ch started <---------- thats the first stage > > > > > > > generator_cat started <----------- not supposed to start now! > > > > > > > generator_cat started > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > terminable.ci.uchicago, but its pretty big... > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Tue Mar 13 18:41:21 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 13 Mar 2007 18:41:21 -0500 Subject: [Swift-devel] mapper problem or ...? 
In-Reply-To: <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> Message-ID: <1173829281.21674.9.camel@blabla.mcs.anl.gov> On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote: > Hmmm. So here is how my files are produced (inside double loop over $s and > $name): > > file $s9prt <"$name.prt">; > file $s9wham <"$s9.wham">; > file $s9crd <"$s9.crd">; > file $s9out <"$s9.out">; > file $s9done <"$s9donefile">; > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft, > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", "$rcut2"); > > so if I change the mapping of the needed output file ($s9wham), everything > should work? > > file whamfiles_$s[$i] <"$s9.wham">; That one won't work. You need to let Swift map whamfiles_$s[] to what it wants. So you can't map individual items in an array differently. I believe that you rely on the fact that whamfiles_xzy maps to the same file names as some other variables. This won't work. You need to use the same variable. The file names are irrelevant if the program doesn't make sense for Swift. So think about it this way: mentally remove all the mapper declarations from the Swift program. If after that, the program makes sense, then you should be good to go. If it doesn't then it's likely it won't work. Remember, mapping is not something that can be used to hack things because the workflow structure has nothing to do with the mappers and Swift ignores mappers when figuring out the data flow. (dependent mappers notwithstanding) > i=`expr $i + 1` > > and call the function: > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > "$rcut2"); > > Nika > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > > ok, here is in short what I need to do: > > > > > > at some point in the workflow N files are produced (in my case its 68, but > > > it could be any number). These files produced each by a separate job (i.e. > > > N jobs produce N files). > > > The next job in the workflow needs to take those N files as an input. > > > > > > Question: how do I pass these unknown number of files as an input to an > > > application ? The array_mapper didn't work (or i didn't use it correctly). > > > >In this case you need some other kind of mapper that can deal with > >unknown numbers of items. The default mapper (i.e. specifying no mapper) > >should work. > > > >So you need to do: > > > >file whamfiles_002[]; > > > >foreach v,k in someinput { > > whamfiles_002[k] = job(v); > >} > > > >... = GENERATOR(whamfiles_002); > > > > > > > > Thanks! > > > > > > Nika > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > >On a third thought. This looks like, eventually, you are trying to do > > > >the same thing that Yong did with the dependent mappers earlier. I think > > > >he would have more insight on the topic. 
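Putting Mihael's two replies together (declare the array once with no per-element mapping, fill its elements inside the loop, then hand the whole array to the consumer), a minimal sketch of the intended pattern could look like the following. It imitates the SwiftScript conventions visible in the snippets quoted in this thread; the two-argument CHARMM3 and GENERATOR signatures, the .prt input names and the output file name are placeholders rather than Nika's actual code, and the exact syntax may differ in the Swift version being used here (0.0405).

// Hypothetical, trimmed-down signatures; the real CHARMM3 and GENERATOR
// procedures take many more parameters (see the calls quoted above).
(file wham, file out) CHARMM3 (file prt, string stage) {
    app { charmm3 @prt, stage; }
}

(file summary) GENERATOR (file whams[], string name) {
    app { generator @whams, name; }
}

// Inputs that exist before the run starts can still be enumerated explicitly.
file prts[] <fixed_array_mapper; files="solv_chg_a0_m002.prt, solv_chg_a1_m002.prt">;

// The outputs are declared once, as a single array with no per-element
// mapper, and are filled element by element inside the loop.
file whamfiles_m002[];
file outs[];

foreach p, k in prts {
    (whamfiles_m002[k], outs[k]) = CHARMM3(p, "chg");
}

// Because GENERATOR consumes the whole array, Swift will not start it until
// every CHARMM3 call that assigns an element has completed.
file result <"m002_summary.wham">;
result = GENERATOR(whamfiles_m002, "m002");

The dependency between the loop and GENERATOR is carried by the whamfiles_m002 variable itself, not by the physical file names, which is the point Mihael makes above.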
> > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > I think I am confused. Sorry! > > > > > what will be the type of 'whamfiles' ? If its a string - will the swift > > > > > know to brake it down to filenames and stage them all in ? > > > > > Also - is there a mapper (or whatever) that can map the list of > > *logical* > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > Thanks! > > > > > > > > > > Nika > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > >Oh my :) > > > > > >@whamfiles_m002 is known by the system at all times. That means > > > > > >GENERATOR does not need to wait for the actual files to be there since > > > > > >it knows very well what @whamfiles_m002 is (the list of names). > > > > > > > > > > > >You should try this instead: > > > > > >... > > > > > >... GENERATOR(whamfiles, str) { > > > > > > app { > > > > > > generator @whamfiles, str; > > > > > > } > > > > > >} > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > >Mihael > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > > > > Hi, > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an > > input to my > > > > > > > application called GENERATOR. I need to use the mapper since the > > > > number of > > > > > > > input files is unknown before the workflow starts. Here is how I > > > > use it: > > > > > > > file whamfiles_m002[] > > > solv_chg_a0_m002_wham, > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > files, > > > > you > > > > > > get > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > These files are all generated by stage four of my workflow, each > > > > file is > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > and this particular file is produced this way: > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, solv_chg_a0_m002_out, > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", "system:solv_m002", > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", "stage:chg", > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > Then I call my application (the last stage of my workflow, > > stage five) > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > solv_repu_0DOT9_1_m002DOTwham, solv_repu_0_0DOT2_m002DOTwham ) = > > > > GENERATOR > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right > > away. > > > > I am > > > > > > > not sure why. Does the mapper look for the physical files on the > > > > disk and > > > > > > > when finds them - starts right away ? I do have the needed > > files in the > > > > > > > directory from my previous runs. 
Or there is something else wrong > > > > here ? > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > RunID: b0n2liektep92 > > > > > > > pre_ch started <---------- thats the first stage > > > > > > > generator_cat started <----------- not supposed to start now! > > > > > > > generator_cat started > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > terminable.ci.uchicago, but its pretty big... > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 14 09:32:09 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 14 Mar 2007 09:32:09 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <1173829281.21674.9.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> <1173829281.21674.9.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070314091135.04922d80@mail.mcs.anl.gov> Ok, now I think you hit the area in your explanations that I always had a problem with. So here is my understanding of things: if I have two apps that I need to chain together, I need to do this: file a <"a.txt">; file b <"b.txt">; file c <"c.txt">; a = APP1 (b); c = APP2 (a); I.e. the chaining of the programs happens on a 'logical' file level (a,b,c rather then a.txt, b.txt, c.txt). Is that a correct understanding? I acted on this understanding and my workflow has been working fine (till now -- but thats another story). Having create all this logical files was a *big* pain (as I couldn't have the same logical names as physical filenames due to a different file naming conventions in swift: no multiple ".", etc). It really would've been much easier for my workflow to have just this: a1.txt = APP1 (b.txt); a2.txt = APP2 (b.txt); c.txt = APP3 (a1.txt, a2.txt); as my applications produce an enormous amount of intermediate files with some specific naming conventions. Now back to my original problem - constructing and passing to my next application a collection of files. If I didn't have to do any mappers, it would've been just as easy as (for example): c.txt = APP3 (a*.txt); Does it make sense at all ? Thanks, Nika At 06:41 PM 3/13/2007, Mihael Hategan wrote: >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote: > > Hmmm. So here is how my files are produced (inside double loop over $s and > > $name): > > > > file $s9prt <"$name.prt">; > > file $s9wham <"$s9.wham">; > > file $s9crd <"$s9.crd">; > > file $s9out <"$s9.out">; > > file $s9done <"$s9donefile">; > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft, > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > "$rcut2"); > > > > so if I change the mapping of the needed output file ($s9wham), everything > > should work? 
> > > > file whamfiles_$s[$i] <"$s9.wham">; > >That one won't work. >You need to let Swift map whamfiles_$s[] to what it wants. So you can't >map individual items in an array differently. > >I believe that you rely on the fact that whamfiles_xzy maps to the same >file names as some other variables. This won't work. You need to use the >same variable. The file names are irrelevant if the program doesn't make >sense for Swift. >So think about it this way: mentally remove all the mapper declarations >from the Swift program. If after that, the program makes sense, then you >should be good to go. If it doesn't then it's likely it won't work. >Remember, mapping is not something that can be used to hack things >because the workflow structure has nothing to do with the mappers and >Swift ignores mappers when figuring out the data flow. > >(dependent mappers notwithstanding) > > > i=`expr $i + 1` > > > > and call the function: > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", > "$rcut1", > > "$rcut2"); > > > > Nika > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > > > ok, here is in short what I need to do: > > > > > > > > at some point in the workflow N files are produced (in my case its > 68, but > > > > it could be any number). These files produced each by a separate > job (i.e. > > > > N jobs produce N files). > > > > The next job in the workflow needs to take those N files as an input. > > > > > > > > Question: how do I pass these unknown number of files as an input to an > > > > application ? The array_mapper didn't work (or i didn't use it > correctly). > > > > > >In this case you need some other kind of mapper that can deal with > > >unknown numbers of items. The default mapper (i.e. specifying no mapper) > > >should work. > > > > > >So you need to do: > > > > > >file whamfiles_002[]; > > > > > >foreach v,k in someinput { > > > whamfiles_002[k] = job(v); > > >} > > > > > >... = GENERATOR(whamfiles_002); > > > > > > > > > > > Thanks! > > > > > > > > Nika > > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > > >On a third thought. This looks like, eventually, you are trying to do > > > > >the same thing that Yong did with the dependent mappers earlier. I > think > > > > >he would have more insight on the topic. > > > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > > I think I am confused. Sorry! > > > > > > what will be the type of 'whamfiles' ? If its a string - will > the swift > > > > > > know to brake it down to filenames and stage them all in ? > > > > > > Also - is there a mapper (or whatever) that can map the list of > > > *logical* > > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > > > Thanks! > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > > >Oh my :) > > > > > > >@whamfiles_m002 is known by the system at all times. That means > > > > > > >GENERATOR does not need to wait for the actual files to be > there since > > > > > > >it knows very well what @whamfiles_m002 is (the list of names). > > > > > > > > > > > > > >You should try this instead: > > > > > > >... > > > > > > >... 
GENERATOR(whamfiles, str) { > > > > > > > app { > > > > > > > generator @whamfiles, str; > > > > > > > } > > > > > > >} > > > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > > > > > Hi, > > > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an > > > input to my > > > > > > > > application called GENERATOR. I need to use the mapper > since the > > > > > number of > > > > > > > > input files is unknown before the workflow starts. Here is > how I > > > > > use it: > > > > > > > > file whamfiles_m002[] > > > > solv_chg_a0_m002_wham, > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > > files, > > > > > you > > > > > > > get > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > > > These files are all generated by stage four of my workflow, > each > > > > > file is > > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > > and this particular file is produced this way: > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, > solv_chg_a0_m002_out, > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", > "system:solv_m002", > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", > "stage:chg", > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > > > Then I call my application (the last stage of my workflow, > > > stage five) > > > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, > solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, > solv_repu_0_0DOT2_m002DOTwham ) = > > > > > GENERATOR > > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right > > > away. > > > > > I am > > > > > > > > not sure why. Does the mapper look for the physical files > on the > > > > > disk and > > > > > > > > when finds them - starts right away ? I do have the needed > > > files in the > > > > > > > > directory from my previous runs. Or there is something else > wrong > > > > > here ? > > > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > > RunID: b0n2liektep92 > > > > > > > > pre_ch started <---------- thats the first stage > > > > > > > > generator_cat started <----------- not supposed to start > now! > > > > > > > > generator_cat started > > > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > > terminable.ci.uchicago, but its pretty big... 
> > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Swift-devel mailing list > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Wed Mar 14 10:20:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 14 Mar 2007 10:20:30 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070314091135.04922d80@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> <1173829281.21674.9.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314091135.04922d80@mail.mcs.anl.gov> Message-ID: <1173885630.22830.18.camel@blabla.mcs.anl.gov> On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote: > Ok, now I think you hit the area in your explanations that I always had a > problem with. So here is my understanding of things: > > if I have two apps that I need to chain together, I need to do this: > > file a <"a.txt">; > file b <"b.txt">; > file c <"c.txt">; > a = APP1 (b); > c = APP2 (a); Yep. But if you don't care about what file a is put in, you can skip mapping it. Although I gather it doesn't change things by much: file a; file b <"b.txt">; file c <"c.txt">; a = APP1 (b); c = APP2 (a); > > I.e. the chaining of the programs happens on a 'logical' file level (a,b,c > rather then a.txt, b.txt, c.txt). Is that a correct understanding? Yes. > I acted > on this understanding and my workflow has been working fine (till now -- > but thats another story). Having create all this logical files was a *big* > pain (as I couldn't have the same logical names as physical filenames due > to a different file naming conventions in swift: no multiple ".", etc). It > really would've been much easier for my workflow to have just this: > > a1.txt = APP1 (b.txt); > a2.txt = APP2 (b.txt); > c.txt = APP3 (a1.txt, a2.txt); > > as my applications produce an enormous amount of intermediate files with > some specific naming conventions. Swift needs to know about those. Any workflow system would need to know about those. There is no way to automatically determine what set of files an application invocation will need. It may be possible to determine what set of files an application invocation produces (although making it consistent may be difficult), but even in that case the matter of distinguishing which of those are meaningful for your workflow is not quite possible. > > Now back to my original problem - constructing and passing to my next > application a collection of files. If I didn't have to do any mappers, it > would've been just as easy as (for example): > > c.txt = APP3 (a*.txt); There isn't much difference between that and c = APP3(a), where a is an array. But I digress. > > Does it make sense at all ? Of course. However, I'm not convinced how well it would work, for reasons outlined above. So there is a number of operations and certain dependencies between them, where the operations are job submissions and file transfers (let's abstract low level, technology dependent things for now). 
These need to happen and are the result of executing a workflow (regardless of how it's expressed). They represent the application that you are trying to implement. If there is a way to infer all those operations and all the dependencies from your specification model, then it would be ok. So in the context of exploring a different way of expressing things, it would be helpful to have a clear illustration of both of them and the rules that get you from one to the other. Mihael > > Thanks, > > Nika > > > > At 06:41 PM 3/13/2007, Mihael Hategan wrote: > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote: > > > Hmmm. So here is how my files are produced (inside double loop over $s and > > > $name): > > > > > > file $s9prt <"$name.prt">; > > > file $s9wham <"$s9.wham">; > > > file $s9crd <"$s9.crd">; > > > file $s9out <"$s9.out">; > > > file $s9done <"$s9donefile">; > > > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, gaff_rft, > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > > "$rcut2"); > > > > > > so if I change the mapping of the needed output file ($s9wham), everything > > > should work? > > > > > > file whamfiles_$s[$i] <"$s9.wham">; > > > >That one won't work. > >You need to let Swift map whamfiles_$s[] to what it wants. So you can't > >map individual items in an array differently. > > > >I believe that you rely on the fact that whamfiles_xzy maps to the same > >file names as some other variables. This won't work. You need to use the > >same variable. The file names are irrelevant if the program doesn't make > >sense for Swift. > >So think about it this way: mentally remove all the mapper declarations > >from the Swift program. If after that, the program makes sense, then you > >should be good to go. If it doesn't then it's likely it won't work. > >Remember, mapping is not something that can be used to hack things > >because the workflow structure has nothing to do with the mappers and > >Swift ignores mappers when figuring out the data flow. > > > >(dependent mappers notwithstanding) > > > > > i=`expr $i + 1` > > > > > > and call the function: > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", > > "$rcut1", > > > "$rcut2"); > > > > > > Nika > > > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > > > > ok, here is in short what I need to do: > > > > > > > > > > at some point in the workflow N files are produced (in my case its > > 68, but > > > > > it could be any number). These files produced each by a separate > > job (i.e. > > > > > N jobs produce N files). > > > > > The next job in the workflow needs to take those N files as an input. > > > > > > > > > > Question: how do I pass these unknown number of files as an input to an > > > > > application ? The array_mapper didn't work (or i didn't use it > > correctly). > > > > > > > >In this case you need some other kind of mapper that can deal with > > > >unknown numbers of items. The default mapper (i.e. specifying no mapper) > > > >should work. > > > > > > > >So you need to do: > > > > > > > >file whamfiles_002[]; > > > > > > > >foreach v,k in someinput { > > > > whamfiles_002[k] = job(v); > > > >} > > > > > > > >... 
= GENERATOR(whamfiles_002); > > > > > > > > > > > > > > Thanks! > > > > > > > > > > Nika > > > > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > > > >On a third thought. This looks like, eventually, you are trying to do > > > > > >the same thing that Yong did with the dependent mappers earlier. I > > think > > > > > >he would have more insight on the topic. > > > > > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > > > I think I am confused. Sorry! > > > > > > > what will be the type of 'whamfiles' ? If its a string - will > > the swift > > > > > > > know to brake it down to filenames and stage them all in ? > > > > > > > Also - is there a mapper (or whatever) that can map the list of > > > > *logical* > > > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > >Oh my :) > > > > > > > >@whamfiles_m002 is known by the system at all times. That means > > > > > > > >GENERATOR does not need to wait for the actual files to be > > there since > > > > > > > >it knows very well what @whamfiles_m002 is (the list of names). > > > > > > > > > > > > > > > >You should try this instead: > > > > > > > >... > > > > > > > >... GENERATOR(whamfiles, str) { > > > > > > > > app { > > > > > > > > generator @whamfiles, str; > > > > > > > > } > > > > > > > >} > > > > > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an > > > > input to my > > > > > > > > > application called GENERATOR. I need to use the mapper > > since the > > > > > > number of > > > > > > > > > input files is unknown before the workflow starts. 
Here is > > how I > > > > > > use it: > > > > > > > > > file whamfiles_m002[] > > > > > solv_chg_a0_m002_wham, > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > > > files, > > > > > > you > > > > > > > > get > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > > > > > These files are all generated by stage four of my workflow, > > each > > > > > > file is > > > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > > > and this particular file is produced this way: > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, > > solv_chg_a0_m002_out, > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, gaff_rft, > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, crd_eq_file_m002, > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", > > "system:solv_m002", > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", > > "stage:chg", > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > > > > > Then I call my application (the last stage of my workflow, > > > > stage five) > > > > > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, > > solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, > > solv_repu_0_0DOT2_m002DOTwham ) = > > > > > > GENERATOR > > > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts right > > > > away. > > > > > > I am > > > > > > > > > not sure why. Does the mapper look for the physical files > > on the > > > > > > disk and > > > > > > > > > when finds them - starts right away ? I do have the needed > > > > files in the > > > > > > > > > directory from my previous runs. Or there is something else > > wrong > > > > > > here ? > > > > > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > > > RunID: b0n2liektep92 > > > > > > > > > pre_ch started <---------- thats the first stage > > > > > > > > > generator_cat started <----------- not supposed to start > > now! > > > > > > > > > generator_cat started > > > > > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > > > terminable.ci.uchicago, but its pretty big... > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > Swift-devel mailing list > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 14 11:02:01 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 14 Mar 2007 11:02:01 -0500 Subject: [Swift-devel] mapper problem or ...? 
In-Reply-To: <1173885630.22830.18.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> <1173829281.21674.9.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314091135.04922d80@mail.mcs.anl.gov> <1173885630.22830.18.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070314102609.046ce4a0@mail.mcs.anl.gov> Hi, Mihael: please see my comments below... Thanks, Nika At 10:20 AM 3/14/2007, Mihael Hategan wrote: >On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote: > > Ok, now I think you hit the area in your explanations that I always had a > > problem with. So here is my understanding of things: > > > > if I have two apps that I need to chain together, I need to do this: > > > > file a <"a.txt">; > > file b <"b.txt">; > > file c <"c.txt">; > > a = APP1 (b); > > c = APP2 (a); > >Yep. But if you don't care about what file a is put in, you can skip >mapping it. Although I gather it doesn't change things by much: >file a; >file b <"b.txt">; >file c <"c.txt">; > a = APP1 (b); > c = APP2 (a); I do care about "a.txt", but I do not care about "a". Thats the main point. more below... > > > > I.e. the chaining of the programs happens on a 'logical' file level (a,b,c > > rather then a.txt, b.txt, c.txt). Is that a correct understanding? > >Yes. > > > I acted > > on this understanding and my workflow has been working fine (till now -- > > but thats another story). Having create all this logical files was a *big* > > pain (as I couldn't have the same logical names as physical filenames due > > to a different file naming conventions in swift: no multiple ".", etc). It > > really would've been much easier for my workflow to have just this: > > > > a1.txt = APP1 (b.txt); > > a2.txt = APP2 (b.txt); > > c.txt = APP3 (a1.txt, a2.txt); > > > > as my applications produce an enormous amount of intermediate files with > > some specific naming conventions. > >Swift needs to know about those. Any workflow system would need to know >about those. There is no way to automatically determine what set of >files an application invocation will need. It may be possible to >determine what set of files an application invocation produces (although >making it consistent may be difficult), but even in that case the matter >of distinguishing which of those are meaningful for your workflow is not >quite possible. I do not agree. You can specify the files that you need (intermediate or final) on the left side of the function call - exactly the way it is done now (but use the actual file names) : (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt); where APP could be producing hundreds of a.txt files (a1.txt - a100.txt) and 10 c*.txt files (c1.txt -c10.txt). And only those 3 specified should be cared for. Or it could be done even this way: (a*.txt, c1.txt) = APP2 (b.txt); where I want to get all a*.txt files and only one c1.txt file Swift stages files just before the application starts. So it shouldn't affect the workflow system at all (to my understanding). Just the amount files that need to be staged in/out (alternatively, you can always zip all files together and have just one file staged in/out for any application). 
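To make the contrast concrete, here is the three-step chain from the earlier message written both ways: first in the mapper style used elsewhere in this thread, then in the filename-only shorthand proposed above. The APP1/APP2/APP3 definitions are minimal placeholders, and the second form is shown purely to illustrate the suggestion; it is not valid SwiftScript.

// Minimal placeholder procedures so the sketch is self-contained.
(file o) APP1 (file i) { app { app1 @i, @o; } }
(file o) APP2 (file i) { app { app2 @i, @o; } }
(file o) APP3 (file x, file y) { app { app3 @x, @y, @o; } }

// Current style: logical variables carry the data flow, and mappers bind
// each variable to a physical file name.
file b  <"b.txt">;
file a1 <"a1.txt">;
file a2 <"a2.txt">;
file c  <"c.txt">;

a1 = APP1(b);
a2 = APP2(b);
c  = APP3(a1, a2);

// Proposed shorthand (hypothetical, not supported): physical names are used
// directly and a wildcard stands for "all a*.txt files produced above".
//
//   a1.txt = APP1(b.txt);
//   a2.txt = APP2(b.txt);
//   c.txt  = APP3(a*.txt);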
Anyway, I am not saying all this is easy -- just suggesting some alternatives to the current system that requires (in case of my application) some tedious filename operations... more below... > > > > Now back to my original problem - constructing and passing to my next > > application a collection of files. If I didn't have to do any mappers, it > > would've been just as easy as (for example): > > > > c.txt = APP3 (a*.txt); >There isn't much difference between that and c = APP3(a), where a is an >array. But I digress. Ok. But how do I construct that array in a clean way ? I thought that a fixed_array_mapper would do that for me (if I pass a string of logical filenames to it, shouldn't it create an array of files for me ?). Thats the main point - I can't construct an array of logical filenames and pass it to my application without re-writing the already-working code. Or I am just missing something - and an answer is a one line code change ? (; more below... > > > > Does it make sense at all ? > >Of course. >However, I'm not convinced how well it would work, for reasons outlined >above. >So there is a number of operations and certain dependencies between >them, where the operations are job submissions and file transfers (let's >abstract low level, technology dependent things for now). These need to >happen and are the result of executing a workflow (regardless of how >it's expressed). They represent the application that you are trying to >implement. If there is a way to infer all those operations and all the >dependencies from your specification model, then it would be ok. So in >the context of exploring a different way of expressing things, it would >be helpful to have a clear illustration of both of them and the rules >that get you from one to the other. It does sound like a good topic for the discussion! (-; Thanks again, Nika >Mihael > > > > > Thanks, > > > > Nika > > > > > > > > At 06:41 PM 3/13/2007, Mihael Hategan wrote: > > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote: > > > > Hmmm. So here is how my files are produced (inside double loop over > $s and > > > > $name): > > > > > > > > file $s9prt <"$name.prt">; > > > > file $s9wham <"$s9.wham">; > > > > file $s9crd <"$s9.crd">; > > > > file $s9out <"$s9.out">; > > > > file $s9done <"$s9donefile">; > > > > > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > gaff_rft, > > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", > > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > > > "$rcut2"); > > > > > > > > so if I change the mapping of the needed output file ($s9wham), > everything > > > > should work? > > > > > > > > file whamfiles_$s[$i] <"$s9.wham">; > > > > > >That one won't work. > > >You need to let Swift map whamfiles_$s[] to what it wants. So you can't > > >map individual items in an array differently. > > > > > >I believe that you rely on the fact that whamfiles_xzy maps to the same > > >file names as some other variables. This won't work. You need to use the > > >same variable. The file names are irrelevant if the program doesn't make > > >sense for Swift. > > >So think about it this way: mentally remove all the mapper declarations > > >from the Swift program. If after that, the program makes sense, then you > > >should be good to go. If it doesn't then it's likely it won't work. 
> > >Remember, mapping is not something that can be used to hack things > > >because the workflow structure has nothing to do with the mappers and > > >Swift ignores mappers when figuring out the data flow. > > > > > >(dependent mappers notwithstanding) > > > > > > > i=`expr $i + 1` > > > > > > > > and call the function: > > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, > gaff_prm, > > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, > $s9prt, > > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", > > > "$rcut1", > > > > "$rcut2"); > > > > > > > > Nika > > > > > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > > > > > ok, here is in short what I need to do: > > > > > > > > > > > > at some point in the workflow N files are produced (in my case its > > > 68, but > > > > > > it could be any number). These files produced each by a separate > > > job (i.e. > > > > > > N jobs produce N files). > > > > > > The next job in the workflow needs to take those N files as an > input. > > > > > > > > > > > > Question: how do I pass these unknown number of files as an > input to an > > > > > > application ? The array_mapper didn't work (or i didn't use it > > > correctly). > > > > > > > > > >In this case you need some other kind of mapper that can deal with > > > > >unknown numbers of items. The default mapper (i.e. specifying no > mapper) > > > > >should work. > > > > > > > > > >So you need to do: > > > > > > > > > >file whamfiles_002[]; > > > > > > > > > >foreach v,k in someinput { > > > > > whamfiles_002[k] = job(v); > > > > >} > > > > > > > > > >... = GENERATOR(whamfiles_002); > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > Nika > > > > > > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > > > > >On a third thought. This looks like, eventually, you are > trying to do > > > > > > >the same thing that Yong did with the dependent mappers > earlier. I > > > think > > > > > > >he would have more insight on the topic. > > > > > > > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > > > > I think I am confused. Sorry! > > > > > > > > what will be the type of 'whamfiles' ? If its a string - will > > > the swift > > > > > > > > know to brake it down to filenames and stage them all in ? > > > > > > > > Also - is there a mapper (or whatever) that can map the list of > > > > > *logical* > > > > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > > >Oh my :) > > > > > > > > >@whamfiles_m002 is known by the system at all times. That > means > > > > > > > > >GENERATOR does not need to wait for the actual files to be > > > there since > > > > > > > > >it knows very well what @whamfiles_m002 is (the list of > names). > > > > > > > > > > > > > > > > > >You should try this instead: > > > > > > > > >... > > > > > > > > >... GENERATOR(whamfiles, str) { > > > > > > > > > app { > > > > > > > > > generator @whamfiles, str; > > > > > > > > > } > > > > > > > > >} > > > > > > > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. 
Nefedova wrote: > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an > > > > > input to my > > > > > > > > > > application called GENERATOR. I need to use the mapper > > > since the > > > > > > > number of > > > > > > > > > > input files is unknown before the workflow starts. Here is > > > how I > > > > > > > use it: > > > > > > > > > > file whamfiles_m002[] > > > > > > solv_chg_a0_m002_wham, > > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, many > > > > > files, > > > > > > > you > > > > > > > > > get > > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > > > > > > > These files are all generated by stage four of my > workflow, > > > each > > > > > > > file is > > > > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > > > > and this particular file is produced this way: > > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, > > > solv_chg_a0_m002_out, > > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, > gaff_rft, > > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, > crd_eq_file_m002, > > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", > > > "system:solv_m002", > > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", > > > "stage:chg", > > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > > > > > > > Then I call my application (the last stage of my workflow, > > > > > stage five) > > > > > > > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, > > > solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > > > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > > > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > > > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, > > > solv_repu_0_0DOT2_m002DOTwham ) = > > > > > > > GENERATOR > > > > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts > right > > > > > away. > > > > > > > I am > > > > > > > > > > not sure why. Does the mapper look for the physical files > > > on the > > > > > > > disk and > > > > > > > > > > when finds them - starts right away ? I do have the needed > > > > > files in the > > > > > > > > > > directory from my previous runs. Or there is something > else > > > wrong > > > > > > > here ? > > > > > > > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > > > > RunID: b0n2liektep92 > > > > > > > > > > pre_ch started <---------- thats the first > stage > > > > > > > > > > generator_cat started <----------- not supposed to > start > > > now! > > > > > > > > > > generator_cat started > > > > > > > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > > > > terminable.ci.uchicago, but its pretty big... 
> > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Wed Mar 14 11:20:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 14 Mar 2007 11:20:12 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070314102609.046ce4a0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> <1173829281.21674.9.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314091135.04922d80@mail.mcs.anl.gov> <1173885630.22830.18.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314102609.046ce4a0@mail.mcs.anl.gov> Message-ID: <1173889213.23052.6.camel@blabla.mcs.anl.gov> Ok, I see what you're saying. You're not suggesting "hiding" of dependencies. I guess it could be possible to come up with some syntatic sugar. We would then consider data like that to be singletons. Example: <"a.txt"> = APP1(<"b.txt">); <"c.txt"> = APP2(<"a.txt">); On Wed, 2007-03-14 at 11:02 -0500, Veronika V. Nefedova wrote: > Hi, Mihael: > > please see my comments below... > > Thanks, > > Nika > > At 10:20 AM 3/14/2007, Mihael Hategan wrote: > >On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote: > > > Ok, now I think you hit the area in your explanations that I always had a > > > problem with. So here is my understanding of things: > > > > > > if I have two apps that I need to chain together, I need to do this: > > > > > > file a <"a.txt">; > > > file b <"b.txt">; > > > file c <"c.txt">; > > > a = APP1 (b); > > > c = APP2 (a); > > > >Yep. But if you don't care about what file a is put in, you can skip > >mapping it. Although I gather it doesn't change things by much: > >file a; > >file b <"b.txt">; > >file c <"c.txt">; > > a = APP1 (b); > > c = APP2 (a); > > I do care about "a.txt", but I do not care about "a". Thats the main point. > > more below... > > > > > > > I.e. the chaining of the programs happens on a 'logical' file level (a,b,c > > > rather then a.txt, b.txt, c.txt). Is that a correct understanding? > > > >Yes. > > > > > I acted > > > on this understanding and my workflow has been working fine (till now -- > > > but thats another story). Having create all this logical files was a *big* > > > pain (as I couldn't have the same logical names as physical filenames due > > > to a different file naming conventions in swift: no multiple ".", etc). It > > > really would've been much easier for my workflow to have just this: > > > > > > a1.txt = APP1 (b.txt); > > > a2.txt = APP2 (b.txt); > > > c.txt = APP3 (a1.txt, a2.txt); > > > > > > as my applications produce an enormous amount of intermediate files with > > > some specific naming conventions. > > > >Swift needs to know about those. Any workflow system would need to know > >about those. There is no way to automatically determine what set of > >files an application invocation will need. 
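To make the syntactic-sugar idea above concrete: the <"a.txt"> form is only a proposal, but it would presumably expand to ordinary single-file declarations, roughly as follows (the expansion is a guess at the intent, not an implemented feature):

// proposed shorthand (hypothetical):
//   <"a.txt"> = APP1(<"b.txt">);
//   <"c.txt"> = APP2(<"a.txt">);

// roughly equivalent in the current language:
file b <"b.txt">;
file a <"a.txt">;
file c <"c.txt">;

a = APP1(b);
c = APP2(a);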
It may be possible to > >determine what set of files an application invocation produces (although > >making it consistent may be difficult), but even in that case the matter > >of distinguishing which of those are meaningful for your workflow is not > >quite possible. > > I do not agree. You can specify the files that you need (intermediate or > final) on the left side of the function call - exactly the way it is done > now (but use the actual file names) : > > (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt); > > where APP could be producing hundreds of a.txt files (a1.txt - a100.txt) > and 10 c*.txt files (c1.txt -c10.txt). And only those 3 specified should be > cared for. Or it could be done even this way: > > (a*.txt, c1.txt) = APP2 (b.txt); where I want to get all a*.txt files and > only one c1.txt file > > Swift stages files just before the application starts. So it shouldn't > affect the workflow system at all (to my understanding). Just the amount > files that need to be staged in/out (alternatively, you can always zip all > files together and have just one file staged in/out for any application). > > Anyway, I am not saying all this is easy -- just suggesting some > alternatives to the current system that requires (in case of my > application) some tedious filename operations... > > more below... > > > > > > > Now back to my original problem - constructing and passing to my next > > > application a collection of files. If I didn't have to do any mappers, it > > > would've been just as easy as (for example): > > > > > > c.txt = APP3 (a*.txt); > >There isn't much difference between that and c = APP3(a), where a is an > >array. But I digress. > > > Ok. But how do I construct that array in a clean way ? I thought that a > fixed_array_mapper would do that for me (if I pass a string of logical > filenames to it, shouldn't it create an array of files for me ?). Thats the > main point - I can't construct an array of logical filenames and pass it to > my application without re-writing the already-working code. Or I am just > missing something - and an answer is a one line code change ? (; > > more below... > > > > > > > Does it make sense at all ? > > > >Of course. > >However, I'm not convinced how well it would work, for reasons outlined > >above. > >So there is a number of operations and certain dependencies between > >them, where the operations are job submissions and file transfers (let's > >abstract low level, technology dependent things for now). These need to > >happen and are the result of executing a workflow (regardless of how > >it's expressed). They represent the application that you are trying to > >implement. If there is a way to infer all those operations and all the > >dependencies from your specification model, then it would be ok. So in > >the context of exploring a different way of expressing things, it would > >be helpful to have a clear illustration of both of them and the rules > >that get you from one to the other. > > It does sound like a good topic for the discussion! (-; > > Thanks again, > > Nika > > > >Mihael > > > > > > > > Thanks, > > > > > > Nika > > > > > > > > > > > > At 06:41 PM 3/13/2007, Mihael Hategan wrote: > > > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote: > > > > > Hmmm. 
So here is how my files are produced (inside double loop over > > $s and > > > > > $name): > > > > > > > > > > file $s9prt <"$name.prt">; > > > > > file $s9wham <"$s9.wham">; > > > > > file $s9crd <"$s9.crd">; > > > > > file $s9out <"$s9.out">; > > > > > file $s9done <"$s9donefile">; > > > > > > > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > > gaff_rft, > > > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, "$ss1", > > > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > > > > "$rcut2"); > > > > > > > > > > so if I change the mapping of the needed output file ($s9wham), > > everything > > > > > should work? > > > > > > > > > > file whamfiles_$s[$i] <"$s9.wham">; > > > > > > > >That one won't work. > > > >You need to let Swift map whamfiles_$s[] to what it wants. So you can't > > > >map individual items in an array differently. > > > > > > > >I believe that you rely on the fact that whamfiles_xzy maps to the same > > > >file names as some other variables. This won't work. You need to use the > > > >same variable. The file names are irrelevant if the program doesn't make > > > >sense for Swift. > > > >So think about it this way: mentally remove all the mapper declarations > > > >from the Swift program. If after that, the program makes sense, then you > > > >should be good to go. If it doesn't then it's likely it won't work. > > > >Remember, mapping is not something that can be used to hack things > > > >because the workflow structure has nothing to do with the mappers and > > > >Swift ignores mappers when figuring out the data flow. > > > > > > > >(dependent mappers notwithstanding) > > > > > > > > > i=`expr $i + 1` > > > > > > > > > > and call the function: > > > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, > > gaff_prm, > > > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, > > $s9prt, > > > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", > > > > "$rcut1", > > > > > "$rcut2"); > > > > > > > > > > Nika > > > > > > > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > > > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > > > > > > ok, here is in short what I need to do: > > > > > > > > > > > > > > at some point in the workflow N files are produced (in my case its > > > > 68, but > > > > > > > it could be any number). These files produced each by a separate > > > > job (i.e. > > > > > > > N jobs produce N files). > > > > > > > The next job in the workflow needs to take those N files as an > > input. > > > > > > > > > > > > > > Question: how do I pass these unknown number of files as an > > input to an > > > > > > > application ? The array_mapper didn't work (or i didn't use it > > > > correctly). > > > > > > > > > > > >In this case you need some other kind of mapper that can deal with > > > > > >unknown numbers of items. The default mapper (i.e. specifying no > > mapper) > > > > > >should work. > > > > > > > > > > > >So you need to do: > > > > > > > > > > > >file whamfiles_002[]; > > > > > > > > > > > >foreach v,k in someinput { > > > > > > whamfiles_002[k] = job(v); > > > > > >} > > > > > > > > > > > >... = GENERATOR(whamfiles_002); > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > >On a third thought. 
This looks like, eventually, you are > > trying to do > > > > > > > >the same thing that Yong did with the dependent mappers > > earlier. I > > > > think > > > > > > > >he would have more insight on the topic. > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > > > > > I think I am confused. Sorry! > > > > > > > > > what will be the type of 'whamfiles' ? If its a string - will > > > > the swift > > > > > > > > > know to brake it down to filenames and stage them all in ? > > > > > > > > > Also - is there a mapper (or whatever) that can map the list of > > > > > > *logical* > > > > > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > > > >Oh my :) > > > > > > > > > >@whamfiles_m002 is known by the system at all times. That > > means > > > > > > > > > >GENERATOR does not need to wait for the actual files to be > > > > there since > > > > > > > > > >it knows very well what @whamfiles_m002 is (the list of > > names). > > > > > > > > > > > > > > > > > > > >You should try this instead: > > > > > > > > > >... > > > > > > > > > >... GENERATOR(whamfiles, str) { > > > > > > > > > > app { > > > > > > > > > > generator @whamfiles, str; > > > > > > > > > > } > > > > > > > > > >} > > > > > > > > > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. Nefedova wrote: > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 files as an > > > > > > input to my > > > > > > > > > > > application called GENERATOR. I need to use the mapper > > > > since the > > > > > > > > number of > > > > > > > > > > > input files is unknown before the workflow starts. 
Here is > > > > how I > > > > > > > > use it: > > > > > > > > > > > file whamfiles_m002[] > > > > > > > solv_chg_a0_m002_wham, > > > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > many > > > > > > files, > > > > > > > > you > > > > > > > > > > get > > > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > > > > > > > > > These files are all generated by stage four of my > > workflow, > > > > each > > > > > > > > file is > > > > > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > > > > > and this particular file is produced this way: > > > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, > > > > solv_chg_a0_m002_out, > > > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, > > gaff_rft, > > > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, > > crd_eq_file_m002, > > > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", > > > > "system:solv_m002", > > > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", > > > > "stage:chg", > > > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > > > > > > > > > Then I call my application (the last stage of my workflow, > > > > > > stage five) > > > > > > > > > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, > > > > solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > > > > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > > > > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > > > > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, > > > > solv_repu_0_0DOT2_m002DOTwham ) = > > > > > > > > GENERATOR > > > > > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR starts > > right > > > > > > away. > > > > > > > > I am > > > > > > > > > > > not sure why. Does the mapper look for the physical files > > > > on the > > > > > > > > disk and > > > > > > > > > > > when finds them - starts right away ? I do have the needed > > > > > > files in the > > > > > > > > > > > directory from my previous runs. Or there is something > > else > > > > wrong > > > > > > > > here ? > > > > > > > > > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > > > > > RunID: b0n2liektep92 > > > > > > > > > > > pre_ch started <---------- thats the first > > stage > > > > > > > > > > > generator_cat started <----------- not supposed to > > start > > > > now! > > > > > > > > > > > generator_cat started > > > > > > > > > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > > > > > terminable.ci.uchicago, but its pretty big... 
> > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 14 11:31:35 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 14 Mar 2007 11:31:35 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <1173889213.23052.6.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> <1173829281.21674.9.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314091135.04922d80@mail.mcs.anl.gov> <1173885630.22830.18.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314102609.046ce4a0@mail.mcs.anl.gov> <1173889213.23052.6.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070314112719.0492eba0@mail.mcs.anl.gov> You got it! I also would like to see this happen, for example: (a*.txt, c1.txt) = APP2 (b.txt); and (s.txt) = APP3 (a*.txt); (and the like) Then one won't need to worry about any mappers (; Thanks! Nika At 11:20 AM 3/14/2007, Mihael Hategan wrote: >Ok, I see what you're saying. You're not suggesting "hiding" of >dependencies. > >I guess it could be possible to come up with some syntatic sugar. We >would then consider data like that to be singletons. >Example: ><"a.txt"> = APP1(<"b.txt">); ><"c.txt"> = APP2(<"a.txt">); > >On Wed, 2007-03-14 at 11:02 -0500, Veronika V. Nefedova wrote: > > Hi, Mihael: > > > > please see my comments below... > > > > Thanks, > > > > Nika > > > > At 10:20 AM 3/14/2007, Mihael Hategan wrote: > > >On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote: > > > > Ok, now I think you hit the area in your explanations that I always > had a > > > > problem with. So here is my understanding of things: > > > > > > > > if I have two apps that I need to chain together, I need to do this: > > > > > > > > file a <"a.txt">; > > > > file b <"b.txt">; > > > > file c <"c.txt">; > > > > a = APP1 (b); > > > > c = APP2 (a); > > > > > >Yep. But if you don't care about what file a is put in, you can skip > > >mapping it. Although I gather it doesn't change things by much: > > >file a; > > >file b <"b.txt">; > > >file c <"c.txt">; > > > a = APP1 (b); > > > c = APP2 (a); > > > > I do care about "a.txt", but I do not care about "a". Thats the main point. > > > > more below... > > > > > > > > > > I.e. the chaining of the programs happens on a 'logical' file level > (a,b,c > > > > rather then a.txt, b.txt, c.txt). Is that a correct understanding? > > > > > >Yes. > > > > > > > I acted > > > > on this understanding and my workflow has been working fine (till > now -- > > > > but thats another story). Having create all this logical files was > a *big* > > > > pain (as I couldn't have the same logical names as physical > filenames due > > > > to a different file naming conventions in swift: no multiple ".", > etc). 
It > > > > really would've been much easier for my workflow to have just this: > > > > > > > > a1.txt = APP1 (b.txt); > > > > a2.txt = APP2 (b.txt); > > > > c.txt = APP3 (a1.txt, a2.txt); > > > > > > > > as my applications produce an enormous amount of intermediate files > with > > > > some specific naming conventions. > > > > > >Swift needs to know about those. Any workflow system would need to know > > >about those. There is no way to automatically determine what set of > > >files an application invocation will need. It may be possible to > > >determine what set of files an application invocation produces (although > > >making it consistent may be difficult), but even in that case the matter > > >of distinguishing which of those are meaningful for your workflow is not > > >quite possible. > > > > I do not agree. You can specify the files that you need (intermediate or > > final) on the left side of the function call - exactly the way it is done > > now (but use the actual file names) : > > > > (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt); > > > > where APP could be producing hundreds of a.txt files (a1.txt - a100.txt) > > and 10 c*.txt files (c1.txt -c10.txt). And only those 3 specified > should be > > cared for. Or it could be done even this way: > > > > (a*.txt, c1.txt) = APP2 (b.txt); where I want to get all a*.txt files and > > only one c1.txt file > > > > Swift stages files just before the application starts. So it shouldn't > > affect the workflow system at all (to my understanding). Just the amount > > files that need to be staged in/out (alternatively, you can always zip all > > files together and have just one file staged in/out for any application). > > > > Anyway, I am not saying all this is easy -- just suggesting some > > alternatives to the current system that requires (in case of my > > application) some tedious filename operations... > > > > more below... > > > > > > > > > > Now back to my original problem - constructing and passing to my next > > > > application a collection of files. If I didn't have to do any > mappers, it > > > > would've been just as easy as (for example): > > > > > > > > c.txt = APP3 (a*.txt); > > >There isn't much difference between that and c = APP3(a), where a is an > > >array. But I digress. > > > > > > Ok. But how do I construct that array in a clean way ? I thought that a > > fixed_array_mapper would do that for me (if I pass a string of logical > > filenames to it, shouldn't it create an array of files for me ?). Thats > the > > main point - I can't construct an array of logical filenames and pass > it to > > my application without re-writing the already-working code. Or I am just > > missing something - and an answer is a one line code change ? (; > > > > more below... > > > > > > > > > > Does it make sense at all ? > > > > > >Of course. > > >However, I'm not convinced how well it would work, for reasons outlined > > >above. > > >So there is a number of operations and certain dependencies between > > >them, where the operations are job submissions and file transfers (let's > > >abstract low level, technology dependent things for now). These need to > > >happen and are the result of executing a workflow (regardless of how > > >it's expressed). They represent the application that you are trying to > > >implement. If there is a way to infer all those operations and all the > > >dependencies from your specification model, then it would be ok. 
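On the "c = APP3(a), where a is an array" remark: the consuming side of that already looks like the GENERATOR declaration elsewhere in this thread, with the procedure taking a file array and expanding it on the command line via @filenames. The executable name app3 and the mapping of c are placeholders; how best to construct the array itself is the question being worked out above.

(file c) APP3 (file a[]) {
    app {
        app3 @filenames(a) stdout=@filename(c)
    }
}

file a[];              // filled element by element in a foreach, as discussed earlier
file c <"c.txt">;

c = APP3(a);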
So in > > >the context of exploring a different way of expressing things, it would > > >be helpful to have a clear illustration of both of them and the rules > > >that get you from one to the other. > > > > It does sound like a good topic for the discussion! (-; > > > > Thanks again, > > > > Nika > > > > > > >Mihael > > > > > > > > > > > Thanks, > > > > > > > > Nika > > > > > > > > > > > > > > > > At 06:41 PM 3/13/2007, Mihael Hategan wrote: > > > > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote: > > > > > > Hmmm. So here is how my files are produced (inside double loop > over > > > $s and > > > > > > $name): > > > > > > > > > > > > file $s9prt <"$name.prt">; > > > > > > file $s9wham <"$s9.wham">; > > > > > > file $s9crd <"$s9.crd">; > > > > > > file $s9out <"$s9.out">; > > > > > > file $s9done <"$s9donefile">; > > > > > > > > > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > > > gaff_rft, > > > > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, > "$ss1", > > > > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > > > > > "$rcut2"); > > > > > > > > > > > > so if I change the mapping of the needed output file ($s9wham), > > > everything > > > > > > should work? > > > > > > > > > > > > file whamfiles_$s[$i] <"$s9.wham">; > > > > > > > > > >That one won't work. > > > > >You need to let Swift map whamfiles_$s[] to what it wants. So you > can't > > > > >map individual items in an array differently. > > > > > > > > > >I believe that you rely on the fact that whamfiles_xzy maps to the > same > > > > >file names as some other variables. This won't work. You need to > use the > > > > >same variable. The file names are irrelevant if the program > doesn't make > > > > >sense for Swift. > > > > >So think about it this way: mentally remove all the mapper > declarations > > > > >from the Swift program. If after that, the program makes sense, > then you > > > > >should be good to go. If it doesn't then it's likely it won't work. > > > > >Remember, mapping is not something that can be used to hack things > > > > >because the workflow structure has nothing to do with the mappers and > > > > >Swift ignores mappers when figuring out the data flow. > > > > > > > > > >(dependent mappers notwithstanding) > > > > > > > > > > > i=`expr $i + 1` > > > > > > > > > > > > and call the function: > > > > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, > > > gaff_prm, > > > > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, > > > $s9prt, > > > > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", > > > > > "$rcut1", > > > > > > "$rcut2"); > > > > > > > > > > > > Nika > > > > > > > > > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > > > > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. Nefedova wrote: > > > > > > > > ok, here is in short what I need to do: > > > > > > > > > > > > > > > > at some point in the workflow N files are produced (in my > case its > > > > > 68, but > > > > > > > > it could be any number). These files produced each by a > separate > > > > > job (i.e. > > > > > > > > N jobs produce N files). > > > > > > > > The next job in the workflow needs to take those N files as an > > > input. > > > > > > > > > > > > > > > > Question: how do I pass these unknown number of files as an > > > input to an > > > > > > > > application ? The array_mapper didn't work (or i didn't use it > > > > > correctly). 
> > > > > > > > > > > > > >In this case you need some other kind of mapper that can deal with > > > > > > >unknown numbers of items. The default mapper (i.e. specifying no > > > mapper) > > > > > > >should work. > > > > > > > > > > > > > >So you need to do: > > > > > > > > > > > > > >file whamfiles_002[]; > > > > > > > > > > > > > >foreach v,k in someinput { > > > > > > > whamfiles_002[k] = job(v); > > > > > > >} > > > > > > > > > > > > > >... = GENERATOR(whamfiles_002); > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > > >On a third thought. This looks like, eventually, you are > > > trying to do > > > > > > > > >the same thing that Yong did with the dependent mappers > > > earlier. I > > > > > think > > > > > > > > >he would have more insight on the topic. > > > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > > > > > > I think I am confused. Sorry! > > > > > > > > > > what will be the type of 'whamfiles' ? If its a string > - will > > > > > the swift > > > > > > > > > > know to brake it down to filenames and stage them all in ? > > > > > > > > > > Also - is there a mapper (or whatever) that can map the > list of > > > > > > > *logical* > > > > > > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > > > > >Oh my :) > > > > > > > > > > >@whamfiles_m002 is known by the system at all times. That > > > means > > > > > > > > > > >GENERATOR does not need to wait for the actual files to be > > > > > there since > > > > > > > > > > >it knows very well what @whamfiles_m002 is (the list of > > > names). > > > > > > > > > > > > > > > > > > > > > >You should try this instead: > > > > > > > > > > >... > > > > > > > > > > >... GENERATOR(whamfiles, str) { > > > > > > > > > > > app { > > > > > > > > > > > generator @whamfiles, str; > > > > > > > > > > > } > > > > > > > > > > >} > > > > > > > > > > > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. > Nefedova wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 > files as an > > > > > > > input to my > > > > > > > > > > > > application called GENERATOR. I need to use the mapper > > > > > since the > > > > > > > > > number of > > > > > > > > > > > > input files is unknown before the workflow starts. 
> Here is > > > > > how I > > > > > > > > > use it: > > > > > > > > > > > > file whamfiles_m002[] > > > > > > > > solv_chg_a0_m002_wham, > > > > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > > > many > > > > > > > files, > > > > > > > > > you > > > > > > > > > > > get > > > > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > > > > > > > > > > > These files are all generated by stage four of my > > > workflow, > > > > > each > > > > > > > > > file is > > > > > > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > > > > > > and this particular file is produced this way: > > > > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, > > > > > solv_chg_a0_m002_out, > > > > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, > > > gaff_rft, > > > > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, > > > crd_eq_file_m002, > > > > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", > > > > > "system:solv_m002", > > > > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", > > > > > "stage:chg", > > > > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > > > > > > > > > > > Then I call my application (the last stage of my > workflow, > > > > > > > stage five) > > > > > > > > > > > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, > > > > > solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > > > > > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > > > > > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > > > > > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, > > > > > solv_repu_0_0DOT2_m002DOTwham ) = > > > > > > > > > GENERATOR > > > > > > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR > starts > > > right > > > > > > > away. > > > > > > > > > I am > > > > > > > > > > > > not sure why. Does the mapper look for the physical > files > > > > > on the > > > > > > > > > disk and > > > > > > > > > > > > when finds them - starts right away ? I do have the > needed > > > > > > > files in the > > > > > > > > > > > > directory from my previous runs. Or there is something > > > else > > > > > wrong > > > > > > > > > here ? > > > > > > > > > > > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > > > > > > RunID: b0n2liektep92 > > > > > > > > > > > > pre_ch started <---------- thats the > first > > > stage > > > > > > > > > > > > generator_cat started <----------- not supposed to > > > start > > > > > now! > > > > > > > > > > > > generator_cat started > > > > > > > > > > > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > > > > > > terminable.ci.uchicago, but its pretty big... 
> > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Wed Mar 14 11:47:06 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 14 Mar 2007 11:47:06 -0500 Subject: [Swift-devel] mapper problem or ...? In-Reply-To: <6.0.0.22.2.20070314112719.0492eba0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070313160831.046eaec0@mail.mcs.anl.gov> <1173822852.20823.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313171824.0493cec0@mail.mcs.anl.gov> <1173826562.21214.14.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313180235.04929090@mail.mcs.anl.gov> <1173827571.21499.3.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070313181522.0491b840@mail.mcs.anl.gov> <1173829281.21674.9.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314091135.04922d80@mail.mcs.anl.gov> <1173885630.22830.18.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314102609.046ce4a0@mail.mcs.anl.gov> <1173889213.23052.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070314112719.0492eba0@mail.mcs.anl.gov> Message-ID: <1173890826.23146.5.camel@blabla.mcs.anl.gov> On Wed, 2007-03-14 at 11:31 -0500, Veronika V. Nefedova wrote: > You got it! > I also would like to see this happen, for example: > (a*.txt, c1.txt) = APP2 (b.txt); > and > (s.txt) = APP3 (a*.txt); > (and the like) That's slightly different. I think we need to digest these issues. > > Then one won't need to worry about any mappers (; > > Thanks! > > Nika > > > At 11:20 AM 3/14/2007, Mihael Hategan wrote: > >Ok, I see what you're saying. You're not suggesting "hiding" of > >dependencies. > > > >I guess it could be possible to come up with some syntatic sugar. We > >would then consider data like that to be singletons. > >Example: > ><"a.txt"> = APP1(<"b.txt">); > ><"c.txt"> = APP2(<"a.txt">); > > > >On Wed, 2007-03-14 at 11:02 -0500, Veronika V. Nefedova wrote: > > > Hi, Mihael: > > > > > > please see my comments below... > > > > > > Thanks, > > > > > > Nika > > > > > > At 10:20 AM 3/14/2007, Mihael Hategan wrote: > > > >On Wed, 2007-03-14 at 09:32 -0500, Veronika V. Nefedova wrote: > > > > > Ok, now I think you hit the area in your explanations that I always > > had a > > > > > problem with. So here is my understanding of things: > > > > > > > > > > if I have two apps that I need to chain together, I need to do this: > > > > > > > > > > file a <"a.txt">; > > > > > file b <"b.txt">; > > > > > file c <"c.txt">; > > > > > a = APP1 (b); > > > > > c = APP2 (a); > > > > > > > >Yep. But if you don't care about what file a is put in, you can skip > > > >mapping it. Although I gather it doesn't change things by much: > > > >file a; > > > >file b <"b.txt">; > > > >file c <"c.txt">; > > > > a = APP1 (b); > > > > c = APP2 (a); > > > > > > I do care about "a.txt", but I do not care about "a". Thats the main point. > > > > > > more below... > > > > > > > > > > > > > I.e. the chaining of the programs happens on a 'logical' file level > > (a,b,c > > > > > rather then a.txt, b.txt, c.txt). Is that a correct understanding? > > > > > > > >Yes. 
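For reference, a self-contained version of the two-step chain just confirmed. The app bodies and the executable names app1 and app2 are invented placeholders; only the shape of the declarations and assignments comes from the exchange above.

(file o) APP1 (file i) {
    app {
        app1 @filename(i) stdout=@filename(o)
    }
}

(file o) APP2 (file i) {
    app {
        app2 @filename(i) stdout=@filename(o)
    }
}

file b <"b.txt">;
file a;                // intermediate: no mapper, so Swift chooses the physical name
file c <"c.txt">;

a = APP1(b);
c = APP2(a);

If the physical name of the intermediate matters, map it explicitly (file a <"a.txt">;), exactly as in the a/b/c example being discussed.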
> > > > > > > > > I acted > > > > > on this understanding and my workflow has been working fine (till > > now -- > > > > > but thats another story). Having create all this logical files was > > a *big* > > > > > pain (as I couldn't have the same logical names as physical > > filenames due > > > > > to a different file naming conventions in swift: no multiple ".", > > etc). It > > > > > really would've been much easier for my workflow to have just this: > > > > > > > > > > a1.txt = APP1 (b.txt); > > > > > a2.txt = APP2 (b.txt); > > > > > c.txt = APP3 (a1.txt, a2.txt); > > > > > > > > > > as my applications produce an enormous amount of intermediate files > > with > > > > > some specific naming conventions. > > > > > > > >Swift needs to know about those. Any workflow system would need to know > > > >about those. There is no way to automatically determine what set of > > > >files an application invocation will need. It may be possible to > > > >determine what set of files an application invocation produces (although > > > >making it consistent may be difficult), but even in that case the matter > > > >of distinguishing which of those are meaningful for your workflow is not > > > >quite possible. > > > > > > I do not agree. You can specify the files that you need (intermediate or > > > final) on the left side of the function call - exactly the way it is done > > > now (but use the actual file names) : > > > > > > (a1.txt, a2.txt, a3.txt) = APP1 (b*.txt); > > > > > > where APP could be producing hundreds of a.txt files (a1.txt - a100.txt) > > > and 10 c*.txt files (c1.txt -c10.txt). And only those 3 specified > > should be > > > cared for. Or it could be done even this way: > > > > > > (a*.txt, c1.txt) = APP2 (b.txt); where I want to get all a*.txt files and > > > only one c1.txt file > > > > > > Swift stages files just before the application starts. So it shouldn't > > > affect the workflow system at all (to my understanding). Just the amount > > > files that need to be staged in/out (alternatively, you can always zip all > > > files together and have just one file staged in/out for any application). > > > > > > Anyway, I am not saying all this is easy -- just suggesting some > > > alternatives to the current system that requires (in case of my > > > application) some tedious filename operations... > > > > > > more below... > > > > > > > > > > > > > Now back to my original problem - constructing and passing to my next > > > > > application a collection of files. If I didn't have to do any > > mappers, it > > > > > would've been just as easy as (for example): > > > > > > > > > > c.txt = APP3 (a*.txt); > > > >There isn't much difference between that and c = APP3(a), where a is an > > > >array. But I digress. > > > > > > > > > Ok. But how do I construct that array in a clean way ? I thought that a > > > fixed_array_mapper would do that for me (if I pass a string of logical > > > filenames to it, shouldn't it create an array of files for me ?). Thats > > the > > > main point - I can't construct an array of logical filenames and pass > > it to > > > my application without re-writing the already-working code. Or I am just > > > missing something - and an answer is a one line code change ? (; > > > > > > more below... > > > > > > > > > > > > > Does it make sense at all ? > > > > > > > >Of course. > > > >However, I'm not convinced how well it would work, for reasons outlined > > > >above. 
> > > >So there is a number of operations and certain dependencies between > > > >them, where the operations are job submissions and file transfers (let's > > > >abstract low level, technology dependent things for now). These need to > > > >happen and are the result of executing a workflow (regardless of how > > > >it's expressed). They represent the application that you are trying to > > > >implement. If there is a way to infer all those operations and all the > > > >dependencies from your specification model, then it would be ok. So in > > > >the context of exploring a different way of expressing things, it would > > > >be helpful to have a clear illustration of both of them and the rules > > > >that get you from one to the other. > > > > > > It does sound like a good topic for the discussion! (-; > > > > > > Thanks again, > > > > > > Nika > > > > > > > > > >Mihael > > > > > > > > > > > > > > Thanks, > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > At 06:41 PM 3/13/2007, Mihael Hategan wrote: > > > > > >On Tue, 2007-03-13 at 18:28 -0500, Veronika V. Nefedova wrote: > > > > > > > Hmmm. So here is how my files are produced (inside double loop > > over > > > > $s and > > > > > > > $name): > > > > > > > > > > > > > > file $s9prt <"$name.prt">; > > > > > > > file $s9wham <"$s9.wham">; > > > > > > > file $s9crd <"$s9.crd">; > > > > > > > file $s9out <"$s9.out">; > > > > > > > file $s9done <"$s9donefile">; > > > > > > > > > > > > > > ($s9wham, $s9crd, $s9out, $s9done) = CHARMM3 (standn, gaff_prm, > > > > gaff_rft, > > > > > > > rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, $s9prt, > > "$ss1", > > > > > > > "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", "$rcut1", > > > > > > "$rcut2"); > > > > > > > > > > > > > > so if I change the mapping of the needed output file ($s9wham), > > > > everything > > > > > > > should work? > > > > > > > > > > > > > > file whamfiles_$s[$i] <"$s9.wham">; > > > > > > > > > > > >That one won't work. > > > > > >You need to let Swift map whamfiles_$s[] to what it wants. So you > > can't > > > > > >map individual items in an array differently. > > > > > > > > > > > >I believe that you rely on the fact that whamfiles_xzy maps to the > > same > > > > > >file names as some other variables. This won't work. You need to > > use the > > > > > >same variable. The file names are irrelevant if the program > > doesn't make > > > > > >sense for Swift. > > > > > >So think about it this way: mentally remove all the mapper > > declarations > > > > > >from the Swift program. If after that, the program makes sense, > > then you > > > > > >should be good to go. If it doesn't then it's likely it won't work. > > > > > >Remember, mapping is not something that can be used to hack things > > > > > >because the workflow structure has nothing to do with the mappers and > > > > > >Swift ignores mappers when figuring out the data flow. > > > > > > > > > > > >(dependent mappers notwithstanding) > > > > > > > > > > > > > i=`expr $i + 1` > > > > > > > > > > > > > > and call the function: > > > > > > > (whamfiles_$s[$i], $s9crd, $s9out, $s9done) = CHARMM3 (standn, > > > > gaff_prm, > > > > > > > gaff_rft, rtf_file_$s, prm_file_$s, psf_file_$s, crd_eq_file_$s, > > > > $s9prt, > > > > > > > "$ss1", "$s1", "$s2", "$s3", "$s4", "$s5", "$s7", "$s8", "$sprt", > > > > > > "$rcut1", > > > > > > > "$rcut2"); > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > At 06:12 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > >On Tue, 2007-03-13 at 18:07 -0500, Veronika V. 
Nefedova wrote: > > > > > > > > > ok, here is in short what I need to do: > > > > > > > > > > > > > > > > > > at some point in the workflow N files are produced (in my > > case its > > > > > > 68, but > > > > > > > > > it could be any number). These files produced each by a > > separate > > > > > > job (i.e. > > > > > > > > > N jobs produce N files). > > > > > > > > > The next job in the workflow needs to take those N files as an > > > > input. > > > > > > > > > > > > > > > > > > Question: how do I pass these unknown number of files as an > > > > input to an > > > > > > > > > application ? The array_mapper didn't work (or i didn't use it > > > > > > correctly). > > > > > > > > > > > > > > > >In this case you need some other kind of mapper that can deal with > > > > > > > >unknown numbers of items. The default mapper (i.e. specifying no > > > > mapper) > > > > > > > >should work. > > > > > > > > > > > > > > > >So you need to do: > > > > > > > > > > > > > > > >file whamfiles_002[]; > > > > > > > > > > > > > > > >foreach v,k in someinput { > > > > > > > > whamfiles_002[k] = job(v); > > > > > > > >} > > > > > > > > > > > > > > > >... = GENERATOR(whamfiles_002); > > > > > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > At 05:56 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > > > >On a third thought. This looks like, eventually, you are > > > > trying to do > > > > > > > > > >the same thing that Yong did with the dependent mappers > > > > earlier. I > > > > > > think > > > > > > > > > >he would have more insight on the topic. > > > > > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 17:23 -0500, Veronika V. Nefedova wrote: > > > > > > > > > > > I think I am confused. Sorry! > > > > > > > > > > > what will be the type of 'whamfiles' ? If its a string > > - will > > > > > > the swift > > > > > > > > > > > know to brake it down to filenames and stage them all in ? > > > > > > > > > > > Also - is there a mapper (or whatever) that can map the > > list of > > > > > > > > *logical* > > > > > > > > > > > file names to an array ? (thats what I was trying to do). > > > > > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > At 04:54 PM 3/13/2007, Mihael Hategan wrote: > > > > > > > > > > > >Oh my :) > > > > > > > > > > > >@whamfiles_m002 is known by the system at all times. That > > > > means > > > > > > > > > > > >GENERATOR does not need to wait for the actual files to be > > > > > > there since > > > > > > > > > > > >it knows very well what @whamfiles_m002 is (the list of > > > > names). > > > > > > > > > > > > > > > > > > > > > > > >You should try this instead: > > > > > > > > > > > >... > > > > > > > > > > > >... GENERATOR(whamfiles, str) { > > > > > > > > > > > > app { > > > > > > > > > > > > generator @whamfiles, str; > > > > > > > > > > > > } > > > > > > > > > > > >} > > > > > > > > > > > > > > > > > > > > > > > >... = GENERATOR(whamfiles_m002, "m002") > > > > > > > > > > > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > > > > > > > > > > > >On Tue, 2007-03-13 at 16:46 -0500, Veronika V. > > Nefedova wrote: > > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > > > I have a question: > > > > > > > > > > > > > > > > > > > > > > > > > > I am using a fixed_array_mapper to pass some 68 > > files as an > > > > > > > > input to my > > > > > > > > > > > > > application called GENERATOR. 
I need to use the mapper > > > > > > since the > > > > > > > > > > number of > > > > > > > > > > > > > input files is unknown before the workflow starts. > > Here is > > > > > > how I > > > > > > > > > > use it: > > > > > > > > > > > > > file whamfiles_m002[] > > > > > > > > > solv_chg_a0_m002_wham, > > > > > > > > > > > > > solv_chg_a1_m002_wham, solv_chg_a10_m002_wham, > > > > > many > > > > > > > > files, > > > > > > > > > > you > > > > > > > > > > > > get > > > > > > > > > > > > > the idea>, solv_repu_0_0DOT2_b1_m002_wham">; > > > > > > > > > > > > > > > > > > > > > > > > > > These files are all generated by stage four of my > > > > workflow, > > > > > > each > > > > > > > > > > file is > > > > > > > > > > > > > mapped to a physical filename, for example: > > > > > > > > > > > > > > > > > > > > > > > > > > file solv_chg_a0_m002_wham <"solv_chg_a0_m002.wham">; > > > > > > > > > > > > > and this particular file is produced this way: > > > > > > > > > > > > > (solv_chg_a0_m002_wham, solv_chg_a0_m002_crd, > > > > > > solv_chg_a0_m002_out, > > > > > > > > > > > > > solv_chg_a0_m002_done) = CHARMM2 (standn, gaff_prm, > > > > gaff_rft, > > > > > > > > > > > > > rtf_file_m002, prm_file_m002, psf_file_m002, > > > > crd_eq_file_m002, > > > > > > > > > > > > > solv_chg_a0_m002_prt, "prtfile:solv_chg_a0", > > > > > > "system:solv_m002", > > > > > > > > > > > > > "stitle:m002", "rtffile:parm03_gaff_all.rtf", > > > > > > > > > > > > > "paramfile:parm03_gaffnb_all.prm", "gaff:m002_am1", > > > > > > "stage:chg", > > > > > > > > > > > > > "urandseed:5395098", "dirname:solv_chg_a0_m002"); > > > > > > > > > > > > > > > > > > > > > > > > > > Then I call my application (the last stage of my > > workflow, > > > > > > > > stage five) > > > > > > > > > > > > > > > > > > > > > > > > > > (solv_chg_m002, solv_disp_m002, > > > > > > solv_repu_0DOT2_0DOT3_m002DOTwham, > > > > > > > > > > > > > solv_repu_0DOT3_0DOT4_m002DOTwham, > > > > > > > > solv_repu_0DOT4_0DOT5_m002DOTwham, > > > > > > > > > > > > > solv_repu_0DOT5_0DOT6_m002DOTwham, > > > > > > > > solv_repu_0DOT6_0DOT7_m002DOTwham, > > > > > > > > > > > > > solv_repu_0DOT7_0DOT8_m002DOTwham, > > > > > > > > solv_repu_0DOT8_0DOT9_m002DOTwham, > > > > > > > > > > > > > solv_repu_0DOT9_1_m002DOTwham, > > > > > > solv_repu_0_0DOT2_m002DOTwham ) = > > > > > > > > > > GENERATOR > > > > > > > > > > > > > (@whamfiles_m002, "m002"); > > > > > > > > > > > > > > > > > > > > > > > > > > And then when I start my workflow, the GENERATOR > > starts > > > > right > > > > > > > > away. > > > > > > > > > > I am > > > > > > > > > > > > > not sure why. Does the mapper look for the physical > > files > > > > > > on the > > > > > > > > > > disk and > > > > > > > > > > > > > when finds them - starts right away ? I do have the > > needed > > > > > > > > files in the > > > > > > > > > > > > > directory from my previous runs. Or there is something > > > > else > > > > > > wrong > > > > > > > > > > here ? > > > > > > > > > > > > > > > > > > > > > > > > > > 109] wiggum /sandbox/ydeng/alamines > Swift V 0.0405 > > > > > > > > > > > > > RunID: b0n2liektep92 > > > > > > > > > > > > > pre_ch started <---------- thats the > > first > > > > stage > > > > > > > > > > > > > generator_cat started <----------- not supposed to > > > > start > > > > > > now! > > > > > > > > > > > > > generator_cat started > > > > > > > > > > > > > > > > > > > > > > > > > > My complete dtm file is in /home/nefedova/swift.dtm on > > > > > > > > > > > > > terminable.ci.uchicago, but its pretty big... 
> > > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Fri Mar 16 08:49:45 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 16 Mar 2007 08:49:45 -0500 Subject: [Swift-devel] dot error In-Reply-To: <1173496209.22230.1.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070309145104.048c5010@mail.mcs.anl.gov> <1173475555.13614.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153112.032e73f0@mail.mcs.anl.gov> <1173475865.13614.6.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309153648.0553a930@mail.mcs.anl.gov> <1173479931.15803.8.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309165000.032ff910@mail.mcs.anl.gov> <1173484840.19054.2.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070309180625.05412d40@mail.mcs.anl.gov> <1173492320.19899.1.camel@blabla.mcs.anl.gov> <1173496209.22230.1.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070316084844.052fbc60@mail.mcs.anl.gov> would it be possible to fix it ? I really would need to produce a few pictures of the workflow... Thanks, Nika At 10:10 PM 3/9/2007, Mihael Hategan wrote: >On Fri, 2007-03-09 at 21:01 -0600, Tiberiu Stef-Praun wrote: > > Bad dot specification > >:) >Labels should be quoted. That still doesn't explain why dot doesn't >complain. > > > > > > > On 3/9/07, Mihael Hategan wrote: > > > If it's a valid .dot specification, dot shouldn't blow. > > > If it's not a valid dot specification, dot should complain, and we > > > should fix swift to produce valid .dot files. > > > > > > We need to figure out which one it is. > > > > > > Mihael > > > > > > On Fri, 2007-03-09 at 18:08 -0600, Veronika V. Nefedova wrote: > > > > It is a simple graph: a-b-c-d (where a,b,c are single nodes, and d > is 68 > > > > parallel nodes). Could the number of parallel nodes be a problem? > > > > > > > > Nika > > > > > > > > At 06:00 PM 3/9/2007, Mihael Hategan wrote: > > > > >No clue. Looks like a pretty big graph. I can somewhat view > things, but > > > > >it behaves weirdly. > > > > > > > > > >On Fri, 2007-03-09 at 16:50 -0600, Veronika V. Nefedova wrote: > > > > > > Thank you! > > > > > > > > > > > > Done. Its in my home dir on evitable. > > > > > > > > > > > > At 04:38 PM 3/9/2007, you wrote: > > > > > > >No walk. Only a step. Instead of -Tpng, -Tps. And the name > eventually > > > > > > >changed to graph_big.ps. Ok, it was two steps. > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > >On Fri, 2007-03-09 at 15:37 -0600, Veronika V. Nefedova wrote: > > > > > > > > You'd have to walk me through this. I do not know how to > produce the ps > > > > > > > > file out of dot file... > > > > > > > > Sorry! > > > > > > > > > > > > > > > > At 03:31 PM 3/9/2007, you wrote: > > > > > > > > >Identify from ImageMagick quickly eats up all the memory > when I try to > > > > > > > > >run it on that file. > > > > > > > > >I'm tempted to conclude that something might be broken > with dot. > > > > > > > > >Can you try producing a PostScript file instead? 
> > > > > > > > > > > > > > > > > >Mihael > > > > > > > > > > > > > > > > > >On Fri, 2007-03-09 at 15:31 -0600, Veronika V. Nefedova wrote: > > > > > > > > > > Nope! > > > > > > > > > > > > > > > > > > > > [nefedova at evitable ~]$ dot -ograph_big.png -Tpng > > > > > > > > > > swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > [nefedova at evitable ~]$ cp graph_big.png public_html/ > > > > > > > > > > [nefedova at evitable ~]$ which dot > > > > > > > > > > /usr/bin/dot > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > At 03:25 PM 3/9/2007, Mihael Hategan wrote: > > > > > > > > > > >Did dot complain about anything? > > > > > > > > > > > > > > > > > > > > > >On Fri, 2007-03-09 at 14:55 -0600, Veronika V. > Nefedova wrote: > > > > > > > > > > > > Hi, > > > > > > > > > > > > > > > > > > > > > > > > Not sure what happened -- but I was unable to > produce (or > > > > > > > display) the > > > > > > > > > > > > correct png file out of dot file that was generated > by swift > > > > > > > (after the > > > > > > > > > > > > workflow was done). > > > > > > > > > > > > > > > > > > > > > > > > I put my dot file on evitable in > > > > > > > > > > > ~nefedova/swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > > > Then I ran the dot command to generate the png file: > > > > > > > > > > > > $dot -ograph_big.png -Tpng > swift-MolDyn-free-01dqgqzbhns11.dot > > > > > > > > > > > > > > > > > > > > > > > > Then I tried to display it in the browser and got this: > > > > > > > > > > > > The image > "http://www.ci.uchicago.edu/~nefedova/graph_big.png" > > > > > > > > > cannot be > > > > > > > > > > > > displayed, because it contains errors. > > > > > > > > > > > > > > > > > > > > > > > > Any idea what is wrong here? > > > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > > > > > Swift-devel mailing list > > > > > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > From tiberius at ci.uchicago.edu Fri Mar 16 12:24:28 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 16 Mar 2007 12:24:28 -0500 Subject: [Swift-devel] Wishlist: array elements assignment Message-ID: -------- excerpts from the code ---- (file out) solver (file in){ app{ solve @filename(in) stdout=filename(out) } } (file solutions[]) problemset (file inputfiles[]){ // _input_ is the file item, _i_ is its index foreach file input,i in inputfiles { //FIXME: this one is an alternative that it would be nice, because I could build the output filenames from input file names on the fly file solutions[i]; solutions[i]=solver(input); } } file problems[]; //FIXME: alternative option for output file naming: declare them with a regexp-like mapper from the input files: file solutions[] solutions=problemset(problems); ------------------------- I'm open for alternatives I have N input files that I have to process, and generate N output files. 
I have complete control over the naming of the physical input and output files Currently, I was trying to do file X.gz generates file X.gz.out (hence the regexp mapper, as a means of doing string concatenation). See the FIXME comments for my thoughts -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From yongzh at cs.uchicago.edu Fri Mar 16 13:17:24 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 16 Mar 2007 13:17:24 -0500 (CDT) Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: References: Message-ID: Yeah, I actually thought about making the regexp_mapper to deal with an array of items. Yong. On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > -------- excerpts from the code ---- > (file out) solver (file in){ > app{ > solve @filename(in) stdout=filename(out) > } > } > > > (file solutions[]) problemset (file inputfiles[]){ > // _input_ is the file item, _i_ is its index > foreach file input,i in inputfiles { > //FIXME: this one is an alternative that it would be > nice, because I could build the output filenames from input file names > on the fly > file > solutions[i]; > solutions[i]=solver(input); > } > } > > file problems[]; > > //FIXME: alternative option for output file naming: declare them with > a regexp-like mapper from the input files: > file solutions[] source=@filenames(problems),match=(.*), transform="\1,out"> > > solutions=problemset(problems); > > ------------------------- > > I'm open for alternatives > I have N input files that I have to process, and generate N output files. > I have complete control over the naming of the physical input and output files > Currently, I was trying to do file X.gz generates file X.gz.out > (hence the regexp mapper, as a means of doing string concatenation). > > See the FIXME comments for my thoughts > > -- > Tiberiu (Tibi) Stef-Praun, PhD > Research Staff, Computation Institute > 5640 S. Ellis Ave, #405 > University of Chicago > http://www-unix.mcs.anl.gov/~tiberius/ > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Mar 16 13:41:33 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 16 Mar 2007 13:41:33 -0500 Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: References: Message-ID: <1174070493.444.1.camel@blabla.mcs.anl.gov> On Fri, 2007-03-16 at 13:17 -0500, Yong Zhao wrote: > Yeah, I actually thought about making the regexp_mapper to deal with an > array of items. What would be the result of the following: any a[]; foreach... { x <...>; x = ...; a[i] = x; } ? > > Yong. 
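Spelled out with the names from Tiberiu's excerpt, the pattern being asked about would look roughly like this. Whether a per-iteration regexp_mapper on the element interacts correctly with the (concurrently mapped) array is precisely the open question in this exchange, so treat it as a sketch of the idea rather than a tested recipe; the fixed_array_mapper input declaration is a placeholder.

(file out) solver (file in) {
    app {
        solve @filename(in) stdout=@filename(out)
    }
}

// placeholder input mapping
file problems[] <fixed_array_mapper; files="p1.gz, p2.gz, p3.gz">;
file solutions[];

foreach p, i in problems {
    // name this one output after its input: X.gz -> X.gz.out
    file s <regexp_mapper; source=@filename(p), match="(.*)", transform="\1.out">;
    s = solver(p);
    solutions[i] = s;
}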
> > On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > > > -------- excerpts from the code ---- > > (file out) solver (file in){ > > app{ > > solve @filename(in) stdout=filename(out) > > } > > } > > > > > > (file solutions[]) problemset (file inputfiles[]){ > > // _input_ is the file item, _i_ is its index > > foreach file input,i in inputfiles { > > //FIXME: this one is an alternative that it would be > > nice, because I could build the output filenames from input file names > > on the fly > > file > > solutions[i]; > > solutions[i]=solver(input); > > } > > } > > > > file problems[]; > > > > //FIXME: alternative option for output file naming: declare them with > > a regexp-like mapper from the input files: > > file solutions[] > source=@filenames(problems),match=(.*), transform="\1,out"> > > > > solutions=problemset(problems); > > > > ------------------------- > > > > I'm open for alternatives > > I have N input files that I have to process, and generate N output files. > > I have complete control over the naming of the physical input and output files > > Currently, I was trying to do file X.gz generates file X.gz.out > > (hence the regexp mapper, as a means of doing string concatenation). > > > > See the FIXME comments for my thoughts > > > > -- > > Tiberiu (Tibi) Stef-Praun, PhD > > Research Staff, Computation Institute > > 5640 S. Ellis Ave, #405 > > University of Chicago > > http://www-unix.mcs.anl.gov/~tiberius/ > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Fri Mar 16 13:55:10 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 16 Mar 2007 13:55:10 -0500 (CDT) Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: <1174070493.444.1.camel@blabla.mcs.anl.gov> References: <1174070493.444.1.camel@blabla.mcs.anl.gov> Message-ID: That is the way to do it right now. On Fri, 16 Mar 2007, Mihael Hategan wrote: > On Fri, 2007-03-16 at 13:17 -0500, Yong Zhao wrote: > > Yeah, I actually thought about making the regexp_mapper to deal with an > > array of items. > > What would be the result of the following: > any a[]; > foreach... { > x <...>; > x = ...; > a[i] = x; > } > > ? > > > > > Yong. > > > > On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > -------- excerpts from the code ---- > > > (file out) solver (file in){ > > > app{ > > > solve @filename(in) stdout=filename(out) > > > } > > > } > > > > > > > > > (file solutions[]) problemset (file inputfiles[]){ > > > // _input_ is the file item, _i_ is its index > > > foreach file input,i in inputfiles { > > > //FIXME: this one is an alternative that it would be > > > nice, because I could build the output filenames from input file names > > > on the fly > > > file > > > solutions[i]; > > > solutions[i]=solver(input); > > > } > > > } > > > > > > file problems[]; > > > > > > //FIXME: alternative option for output file naming: declare them with > > > a regexp-like mapper from the input files: > > > file solutions[] > > source=@filenames(problems),match=(.*), transform="\1,out"> > > > > > > solutions=problemset(problems); > > > > > > ------------------------- > > > > > > I'm open for alternatives > > > I have N input files that I have to process, and generate N output files. 
> > > I have complete control over the naming of the physical input and output files > > > Currently, I was trying to do file X.gz generates file X.gz.out > > > (hence the regexp mapper, as a means of doing string concatenation). > > > > > > See the FIXME comments for my thoughts > > > > > > -- > > > Tiberiu (Tibi) Stef-Praun, PhD > > > Research Staff, Computation Institute > > > 5640 S. Ellis Ave, #405 > > > University of Chicago > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From hategan at mcs.anl.gov Fri Mar 16 13:56:59 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 16 Mar 2007 13:56:59 -0500 Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: References: <1174070493.444.1.camel@blabla.mcs.anl.gov> Message-ID: <1174071419.612.0.camel@blabla.mcs.anl.gov> On Fri, 2007-03-16 at 13:55 -0500, Yong Zhao wrote: > That is the way to do it right now. Does it actually work? Wouldn't a be mapped by a concurrent mapper? > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > On Fri, 2007-03-16 at 13:17 -0500, Yong Zhao wrote: > > > Yeah, I actually thought about making the regexp_mapper to deal with an > > > array of items. > > > > What would be the result of the following: > > any a[]; > > foreach... { > > x <...>; > > x = ...; > > a[i] = x; > > } > > > > ? > > > > > > > > Yong. > > > > > > On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > -------- excerpts from the code ---- > > > > (file out) solver (file in){ > > > > app{ > > > > solve @filename(in) stdout=filename(out) > > > > } > > > > } > > > > > > > > > > > > (file solutions[]) problemset (file inputfiles[]){ > > > > // _input_ is the file item, _i_ is its index > > > > foreach file input,i in inputfiles { > > > > //FIXME: this one is an alternative that it would be > > > > nice, because I could build the output filenames from input file names > > > > on the fly > > > > file > > > > solutions[i]; > > > > solutions[i]=solver(input); > > > > } > > > > } > > > > > > > > file problems[]; > > > > > > > > //FIXME: alternative option for output file naming: declare them with > > > > a regexp-like mapper from the input files: > > > > file solutions[] > > > source=@filenames(problems),match=(.*), transform="\1,out"> > > > > > > > > solutions=problemset(problems); > > > > > > > > ------------------------- > > > > > > > > I'm open for alternatives > > > > I have N input files that I have to process, and generate N output files. > > > > I have complete control over the naming of the physical input and output files > > > > Currently, I was trying to do file X.gz generates file X.gz.out > > > > (hence the regexp mapper, as a means of doing string concatenation). > > > > > > > > See the FIXME comments for my thoughts > > > > > > > > -- > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > Research Staff, Computation Institute > > > > 5640 S. 
Ellis Ave, #405 > > > > University of Chicago > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From yongzh at cs.uchicago.edu Fri Mar 16 14:00:27 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 16 Mar 2007 14:00:27 -0500 (CDT) Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: <1174071419.612.0.camel@blabla.mcs.anl.gov> References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> Message-ID: a was mapped to concurrent mapper, but the assignment changes it to whatever is in x. On Fri, 16 Mar 2007, Mihael Hategan wrote: > On Fri, 2007-03-16 at 13:55 -0500, Yong Zhao wrote: > > That is the way to do it right now. > > Does it actually work? > Wouldn't a be mapped by a concurrent mapper? > > > > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > > > On Fri, 2007-03-16 at 13:17 -0500, Yong Zhao wrote: > > > > Yeah, I actually thought about making the regexp_mapper to deal with an > > > > array of items. > > > > > > What would be the result of the following: > > > any a[]; > > > foreach... { > > > x <...>; > > > x = ...; > > > a[i] = x; > > > } > > > > > > ? > > > > > > > > > > > Yong. > > > > > > > > On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > > > -------- excerpts from the code ---- > > > > > (file out) solver (file in){ > > > > > app{ > > > > > solve @filename(in) stdout=filename(out) > > > > > } > > > > > } > > > > > > > > > > > > > > > (file solutions[]) problemset (file inputfiles[]){ > > > > > // _input_ is the file item, _i_ is its index > > > > > foreach file input,i in inputfiles { > > > > > //FIXME: this one is an alternative that it would be > > > > > nice, because I could build the output filenames from input file names > > > > > on the fly > > > > > file > > > > > solutions[i]; > > > > > solutions[i]=solver(input); > > > > > } > > > > > } > > > > > > > > > > file problems[]; > > > > > > > > > > //FIXME: alternative option for output file naming: declare them with > > > > > a regexp-like mapper from the input files: > > > > > file solutions[] > > > > source=@filenames(problems),match=(.*), transform="\1,out"> > > > > > > > > > > solutions=problemset(problems); > > > > > > > > > > ------------------------- > > > > > > > > > > I'm open for alternatives > > > > > I have N input files that I have to process, and generate N output files. > > > > > I have complete control over the naming of the physical input and output files > > > > > Currently, I was trying to do file X.gz generates file X.gz.out > > > > > (hence the regexp mapper, as a means of doing string concatenation). > > > > > > > > > > See the FIXME comments for my thoughts > > > > > > > > > > -- > > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > > Research Staff, Computation Institute > > > > > 5640 S. 
Ellis Ave, #405 > > > > > University of Chicago > > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > From tiberius at ci.uchicago.edu Fri Mar 16 14:10:42 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 16 Mar 2007 14:10:42 -0500 Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> Message-ID: The syntactic solution is acceptable, but the workflow does not get executed, probably because it does not find any specific output file names that need to be generated. Can I force it to execute the problemset function ? I was thinking of adding a transformation at its end that would create a dummy file, but I'm not sure that this would trigger the execution of the for loop. Tibi ------------ code ---------- file solution) batch_lin_solver (file input){ app{ solver @filename(input) stdout=@filename(solution); } } (file solutions[]) problemset (file inputfiles[]){ // _input_ is the file item, _i_ is its index foreach file input,i in inputfiles { file solution; solution=batch_lin_solver(input); solutions[i]=solution; } } //file problems[]; file problems[]; file results[]; results=problemset(problems); On 3/16/07, Yong Zhao wrote: > a was mapped to concurrent mapper, but the assignment changes it to > whatever is in x. > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > On Fri, 2007-03-16 at 13:55 -0500, Yong Zhao wrote: > > > That is the way to do it right now. > > > > Does it actually work? > > Wouldn't a be mapped by a concurrent mapper? > > > > > > > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > > > > > On Fri, 2007-03-16 at 13:17 -0500, Yong Zhao wrote: > > > > > Yeah, I actually thought about making the regexp_mapper to deal with an > > > > > array of items. > > > > > > > > What would be the result of the following: > > > > any a[]; > > > > foreach... { > > > > x <...>; > > > > x = ...; > > > > a[i] = x; > > > > } > > > > > > > > ? > > > > > > > > > > > > > > Yong. 
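The batch_lin_solver/problemset code a few lines up is the pattern the thread settles on (see the "Problem solved" follow-up further down); its leading "(" and its <...> mapper clauses were also lost in archiving. A self-contained sketch of what it plausibly looked like (the per-element regexp_mapper and the *.gz input mapper are assumptions, not Tibi's exact text):

------------ sketch: the per-element mapping pattern (reconstruction) ---------
(file solution) batch_lin_solver (file input){
    app{
        solver @filename(input) stdout=@filename(solution);
    }
}

(file solutions[]) problemset (file inputfiles[]){
    // _input_ is the file item, _i_ is its index
    foreach file input,i in inputfiles {
        // map each output from its own input name (parameters assumed
        // from the X.gz -> X.gz.out convention described earlier)
        file solution <regexp_mapper;
            source=@filename(input), match="(.*)", transform="\1.out">;
        solution = batch_lin_solver(input);
        // assigning into solutions[i] re-binds that element from the
        // array's default (concurrent) mapping to the mapped file above
        solutions[i] = solution;
    }
}

// the input-array mapper was stripped too; filesys_mapper is only a guess
file problems[] <filesys_mapper; pattern="*.gz">;
file results[];

results = problemset(problems);
-------------------------

The unmapped results[] picks up the concurrent mapper by default, which is why the element-wise assignment matters, as Yong explains above.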
> > > > > > > > > > On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > > > > > -------- excerpts from the code ---- > > > > > > (file out) solver (file in){ > > > > > > app{ > > > > > > solve @filename(in) stdout=filename(out) > > > > > > } > > > > > > } > > > > > > > > > > > > > > > > > > (file solutions[]) problemset (file inputfiles[]){ > > > > > > // _input_ is the file item, _i_ is its index > > > > > > foreach file input,i in inputfiles { > > > > > > //FIXME: this one is an alternative that it would be > > > > > > nice, because I could build the output filenames from input file names > > > > > > on the fly > > > > > > file > > > > > > solutions[i]; > > > > > > solutions[i]=solver(input); > > > > > > } > > > > > > } > > > > > > > > > > > > file problems[]; > > > > > > > > > > > > //FIXME: alternative option for output file naming: declare them with > > > > > > a regexp-like mapper from the input files: > > > > > > file solutions[] > > > > > source=@filenames(problems),match=(.*), transform="\1,out"> > > > > > > > > > > > > solutions=problemset(problems); > > > > > > > > > > > > ------------------------- > > > > > > > > > > > > I'm open for alternatives > > > > > > I have N input files that I have to process, and generate N output files. > > > > > > I have complete control over the naming of the physical input and output files > > > > > > Currently, I was trying to do file X.gz generates file X.gz.out > > > > > > (hence the regexp mapper, as a means of doing string concatenation). > > > > > > > > > > > > See the FIXME comments for my thoughts > > > > > > > > > > > > -- > > > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > > > Research Staff, Computation Institute > > > > > > 5640 S. Ellis Ave, #405 > > > > > > University of Chicago > > > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From nefedova at mcs.anl.gov Fri Mar 16 14:25:24 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 16 Mar 2007 14:25:24 -0500 Subject: [Swift-devel] Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <200703161912.l2GJCLRa007182@amantadine.ncsa.uiuc.edu> References: <200703161912.l2GJCLRa007182@amantadine.ncsa.uiuc.edu> Message-ID: <6.0.0.22.2.20070316142419.048a6ec0@mail.mcs.anl.gov> Hi, Dave: Could you please give me a bit more information on what has happened ? Thanks, Nika At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: >FROM: McWilliams, David G >(Concerning ticket No. 137212) > >Veronika, > >The system administrator had to kill several of your globus processes today >because the load average on the node was over 25. Below is a list of processes >that were killed. Please let us know if you need help identifying the case of >the problem. 
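The cause shows up in the process listing that follows: pre-WS GRAM forks one globus-job-manager per submitted job, and each of those periodically forks a perl script to poll qstat, so a hundred or so queued jobs means a hundred or so resident processes on the login node. A rough way to watch that count for one user (a sketch, not the command NCSA ran):

$ ps -fu nefedova | grep -c '[g]lobus-job-manager'

The bracketed "g" keeps the grep itself out of the count.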
> >Regards, > >Dave McWilliams (217) 244-1144 consult at ncsa.uiuc.edu >NCSA Consulting Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ >-------------------------------------------------------------------------- > >nefedova 11084 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11165 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11166 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11277 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11295 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11379 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11402 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11480 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11522 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11602 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11668 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11715 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11785 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11813 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11892 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 11948 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12004 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12030 1 0 09:33 ? 
00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12133 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12172 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12253 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12256 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12443 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12460 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12504 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12534 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12657 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12668 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12773 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12806 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12892 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 12946 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13023 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13072 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13142 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13233 1 0 09:33 ? 
00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13245 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13352 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13379 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13474 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13488 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13618 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13663 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13743 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13820 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13887 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 13952 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14046 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14048 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14172 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14196 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14319 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14366 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14430 1 0 09:33 ? 
00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14539 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14572 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14703 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14725 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14832 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 14849 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15009 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15017 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15164 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15165 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15322 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15332 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15526 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15544 1 0 09:33 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15671 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15672 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15787 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15842 1 0 09:34 ? 
00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15981 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 15982 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16131 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16132 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16320 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16358 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16553 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16622 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16725 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16726 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16845 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16925 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 16989 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 17095 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 17262 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 17305 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 17375 1 0 09:34 ? 00:00:02 globus-job-manager -conf >/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn >jobmanager-pbs -machine-type unknown -publish-jobs >nefedova 31347 14172 4 13:35 ? 
00:00:00 /usr/bin/perl >/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m >pbs_gcc >-f /tmp/gram_1Drc0X -c poll >nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl >/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m >pbs_gcc >-f /tmp/gram_fSWNvt -c poll >nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl >/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m >pbs_gcc >-f /tmp/gram_a3whWx -c poll >nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl >/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m >pbs_gcc >-f /tmp/gram_zz0JuC -c poll >nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl >/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m >pbs_gcc >-f /tmp/gram_5jGFI1 -c poll >nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c >/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org >2>/dev/null >nefedova 31546 31512 0 13:35 ? 00:00:00 >/usr/local/pbs/ia64/bin/qstat -f >905629.tg-master.ncsa.teragrid.org >nefedova 31578 31555 0 13:35 ? 00:00:00 >/usr/local/pbs/ia64/bin/qstat -f >905638.tg-master.ncsa.teragrid.org >nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c >/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org 15001 3 >nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c >/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org 15001 3 From hategan at mcs.anl.gov Fri Mar 16 14:35:04 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 16 Mar 2007 14:35:04 -0500 Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> Message-ID: <1174073704.736.0.camel@blabla.mcs.anl.gov> On Fri, 2007-03-16 at 14:00 -0500, Yong Zhao wrote: > a was mapped to concurrent mapper, but the assignment changes it to > whatever is in x. I thought the mapper is associated with the root, and therefore the array. > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > On Fri, 2007-03-16 at 13:55 -0500, Yong Zhao wrote: > > > That is the way to do it right now. > > > > Does it actually work? > > Wouldn't a be mapped by a concurrent mapper? > > > > > > > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > > > > > On Fri, 2007-03-16 at 13:17 -0500, Yong Zhao wrote: > > > > > Yeah, I actually thought about making the regexp_mapper to deal with an > > > > > array of items. > > > > > > > > What would be the result of the following: > > > > any a[]; > > > > foreach... { > > > > x <...>; > > > > x = ...; > > > > a[i] = x; > > > > } > > > > > > > > ? > > > > > > > > > > > > > > Yong. 
> > > > > > > > > > On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > > > > > -------- excerpts from the code ---- > > > > > > (file out) solver (file in){ > > > > > > app{ > > > > > > solve @filename(in) stdout=filename(out) > > > > > > } > > > > > > } > > > > > > > > > > > > > > > > > > (file solutions[]) problemset (file inputfiles[]){ > > > > > > // _input_ is the file item, _i_ is its index > > > > > > foreach file input,i in inputfiles { > > > > > > //FIXME: this one is an alternative that it would be > > > > > > nice, because I could build the output filenames from input file names > > > > > > on the fly > > > > > > file > > > > > > solutions[i]; > > > > > > solutions[i]=solver(input); > > > > > > } > > > > > > } > > > > > > > > > > > > file problems[]; > > > > > > > > > > > > //FIXME: alternative option for output file naming: declare them with > > > > > > a regexp-like mapper from the input files: > > > > > > file solutions[] > > > > > source=@filenames(problems),match=(.*), transform="\1,out"> > > > > > > > > > > > > solutions=problemset(problems); > > > > > > > > > > > > ------------------------- > > > > > > > > > > > > I'm open for alternatives > > > > > > I have N input files that I have to process, and generate N output files. > > > > > > I have complete control over the naming of the physical input and output files > > > > > > Currently, I was trying to do file X.gz generates file X.gz.out > > > > > > (hence the regexp mapper, as a means of doing string concatenation). > > > > > > > > > > > > See the FIXME comments for my thoughts > > > > > > > > > > > > -- > > > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > > > Research Staff, Computation Institute > > > > > > 5640 S. Ellis Ave, #405 > > > > > > University of Chicago > > > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > From tiberius at ci.uchicago.edu Fri Mar 16 14:39:27 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 16 Mar 2007 14:39:27 -0500 Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> Message-ID: Problem solved, caused by typo. On 3/16/07, Tiberiu Stef-Praun wrote: > The syntactic solution is acceptable, but the workflow does not get > executed, probably because it does not find any specific output file > names that need to be generated. > Can I force it to execute the problemset function ? > I was thinking of adding a transformation at its end that would create > a dummy file, but I'm not sure that this would trigger the execution > of the for loop. 
> > Tibi > > ------------ code ---------- > > file solution) batch_lin_solver (file input){ > app{ > solver @filename(input) stdout=@filename(solution); > } > } > > (file solutions[]) problemset (file inputfiles[]){ > // _input_ is the file item, _i_ is its index > foreach file input,i in inputfiles { > file > solution; > solution=batch_lin_solver(input); > solutions[i]=solution; > } > } > > //file problems[]; > file problems[]; > file results[]; > > results=problemset(problems); > > > > > > > On 3/16/07, Yong Zhao wrote: > > a was mapped to concurrent mapper, but the assignment changes it to > > whatever is in x. > > > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > > > On Fri, 2007-03-16 at 13:55 -0500, Yong Zhao wrote: > > > > That is the way to do it right now. > > > > > > Does it actually work? > > > Wouldn't a be mapped by a concurrent mapper? > > > > > > > > > > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > > > > > > > On Fri, 2007-03-16 at 13:17 -0500, Yong Zhao wrote: > > > > > > Yeah, I actually thought about making the regexp_mapper to deal with an > > > > > > array of items. > > > > > > > > > > What would be the result of the following: > > > > > any a[]; > > > > > foreach... { > > > > > x <...>; > > > > > x = ...; > > > > > a[i] = x; > > > > > } > > > > > > > > > > ? > > > > > > > > > > > > > > > > > Yong. > > > > > > > > > > > > On Fri, 16 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > > > > > > > -------- excerpts from the code ---- > > > > > > > (file out) solver (file in){ > > > > > > > app{ > > > > > > > solve @filename(in) stdout=filename(out) > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > > > > > > > > (file solutions[]) problemset (file inputfiles[]){ > > > > > > > // _input_ is the file item, _i_ is its index > > > > > > > foreach file input,i in inputfiles { > > > > > > > //FIXME: this one is an alternative that it would be > > > > > > > nice, because I could build the output filenames from input file names > > > > > > > on the fly > > > > > > > file > > > > > > > solutions[i]; > > > > > > > solutions[i]=solver(input); > > > > > > > } > > > > > > > } > > > > > > > > > > > > > > file problems[]; > > > > > > > > > > > > > > //FIXME: alternative option for output file naming: declare them with > > > > > > > a regexp-like mapper from the input files: > > > > > > > file solutions[] > > > > > > source=@filenames(problems),match=(.*), transform="\1,out"> > > > > > > > > > > > > > > solutions=problemset(problems); > > > > > > > > > > > > > > ------------------------- > > > > > > > > > > > > > > I'm open for alternatives > > > > > > > I have N input files that I have to process, and generate N output files. > > > > > > > I have complete control over the naming of the physical input and output files > > > > > > > Currently, I was trying to do file X.gz generates file X.gz.out > > > > > > > (hence the regexp mapper, as a means of doing string concatenation). > > > > > > > > > > > > > > See the FIXME comments for my thoughts > > > > > > > > > > > > > > -- > > > > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > > > > Research Staff, Computation Institute > > > > > > > 5640 S. 
Ellis Ave, #405 > > > > > > > University of Chicago > > > > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > Tiberiu (Tibi) Stef-Praun, PhD > Research Staff, Computation Institute > 5640 S. Ellis Ave, #405 > University of Chicago > http://www-unix.mcs.anl.gov/~tiberius/ > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From benc at hawaga.org.uk Fri Mar 16 15:03:00 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 16 Mar 2007 20:03:00 +0000 (GMT) Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: <1174073704.736.0.camel@blabla.mcs.anl.gov> References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> <1174073704.736.0.camel@blabla.mcs.anl.gov> Message-ID: On Fri, 16 Mar 2007, Mihael Hategan wrote: > On Fri, 2007-03-16 at 14:00 -0500, Yong Zhao wrote: > > a was mapped to concurrent mapper, but the assignment changes it to > > whatever is in x. > > I thought the mapper is associated with the root, and therefore the > array. That's what I thought too... -- From hategan at mcs.anl.gov Fri Mar 16 15:09:55 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 16 Mar 2007 15:09:55 -0500 Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> <1174073704.736.0.camel@blabla.mcs.anl.gov> Message-ID: <1174075795.21278.8.camel@blabla.mcs.anl.gov> Tibi, are you sure that it actually works? Are you using any of the items in that array for another stage? On Fri, 2007-03-16 at 20:03 +0000, Ben Clifford wrote: > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > On Fri, 2007-03-16 at 14:00 -0500, Yong Zhao wrote: > > > a was mapped to concurrent mapper, but the assignment changes it to > > > whatever is in x. > > > > > I thought the mapper is associated with the root, and therefore the > > array. > > That's what I thought too... > From yongzh at cs.uchicago.edu Fri Mar 16 15:21:22 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Fri, 16 Mar 2007 15:21:22 -0500 (CDT) Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: <1174075795.21278.8.camel@blabla.mcs.anl.gov> References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> <1174073704.736.0.camel@blabla.mcs.anl.gov> <1174075795.21278.8.camel@blabla.mcs.anl.gov> Message-ID: I did this for my montage workflow and it worked nicely On Fri, 16 Mar 2007, Mihael Hategan wrote: > Tibi, are you sure that it actually works? Are you using any of the > items in that array for another stage? > > On Fri, 2007-03-16 at 20:03 +0000, Ben Clifford wrote: > > > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > > > On Fri, 2007-03-16 at 14:00 -0500, Yong Zhao wrote: > > > > a was mapped to concurrent mapper, but the assignment changes it to > > > > whatever is in x. 
> > > > > > > > I thought the mapper is associated with the root, and therefore the > > > array. > > > > That's what I thought too... > > > > From tiberius at ci.uchicago.edu Fri Mar 16 15:29:22 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Fri, 16 Mar 2007 15:29:22 -0500 Subject: [Swift-devel] Wishlist: array elements assignment In-Reply-To: <1174075795.21278.8.camel@blabla.mcs.anl.gov> References: <1174070493.444.1.camel@blabla.mcs.anl.gov> <1174071419.612.0.camel@blabla.mcs.anl.gov> <1174073704.736.0.camel@blabla.mcs.anl.gov> <1174075795.21278.8.camel@blabla.mcs.anl.gov> Message-ID: Yes, it does work Previously I had an * in the regexp mapper, so it would not pick up any filenames because of that. On 3/16/07, Mihael Hategan wrote: > Tibi, are you sure that it actually works? Are you using any of the > items in that array for another stage? > > On Fri, 2007-03-16 at 20:03 +0000, Ben Clifford wrote: > > > > On Fri, 16 Mar 2007, Mihael Hategan wrote: > > > > > On Fri, 2007-03-16 at 14:00 -0500, Yong Zhao wrote: > > > > a was mapped to concurrent mapper, but the assignment changes it to > > > > whatever is in x. > > > > > > > > I thought the mapper is associated with the root, and therefore the > > > array. > > > > That's what I thought too... > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From nefedova at mcs.anl.gov Fri Mar 16 16:53:56 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 16 Mar 2007 16:53:56 -0500 Subject: [Swift-devel] Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <200703162131.l2GLVvK5022487@amantadine.ncsa.uiuc.edu> References: <6.0.0.22.2.20070316142419.048a6ec0@mail.mcs.anl.gov> <200703162131.l2GLVvK5022487@amantadine.ncsa.uiuc.edu> Message-ID: <6.0.0.22.2.20070316165058.050cc140@mail.mcs.anl.gov> Hi, Galen: I was told that I could have 384 jobs in PBS at the same time. By my estimation I had no more then 136 jobs in the queue. What are other limits I should keep in mind ? Thanks, Veronika At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: >FROM: Arnold, Galen >(Concerning ticket No. 
137212) > >Veronika, > >We see many sequences in our globus-gatekeeper.log on tg-login1 like this > >Notice: 5: Authenticated globus user: >/DC=org/DC=doegrids/OU=People/CN=Veronika >Nefedova 137823 >Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 >Notice: 5: Requested service: jobmanager-pbs >Notice: 5: Authorized as local user: nefedova >Notice: 5: Authorized as local uid: 29202 >Notice: 5: and local gid: 11467 >Notice: 0: executing >/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager >Notice: 0: GRID_SECURITY_CONTEXT_FD=9 >Notice: 0: Child 16725 started >Notice: 5: Authenticated globus user: >/DC=org/DC=doegrids/OU=People/CN=Veronika >Nefedova 137823 >Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 >Notice: 5: Requested service: jobmanager-pbs >Notice: 5: Authorized as local user: nefedova >Notice: 5: Authorized as local uid: 29202 >Notice: 5: and local gid: 11467 >Notice: 0: executing >/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager >Notice: 0: GRID_SECURITY_CONTEXT_FD=9 >Notice: 0: Child 16726 started >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16769 >starting at Fri Mar 16 09:34:12 2007 > > >...your DN is also popular in the globus-gatekeeper.log. > >tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika Nefedova' >globus-gatekeeper.log |wc -l > 15215 > > >The large number of connects around the same time: > > grep --after-context=10 Veronika globus-gatekeeper.log | grep starting | > grep >'Mar 16 09:3' | wc -l > 65 >... >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=15872 >starting at Fri Mar 16 09:34:02 2007 >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16644 >starting at Fri Mar 16 09:34:11 2007 >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16769 >starting at Fri Mar 16 09:34:12 2007 >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16926 >starting at Fri Mar 16 09:34:14 2007 >... > > >...could be overwhelming the gatekeeper on tg-login1. > >-Galen > > >Veronika V. Nefedova writes: > >Hi, Dave: > > > >Could you please give me a bit more information on what has happened ? > > > >Thanks, > > > >Nika > > > >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > >>FROM: McWilliams, David G > >>(Concerning ticket No. 137212) > >> > >>Veronika, > >> > >>The system administrator had to kill several of your globus processes today > >>because the load average on the node was over 25. Below is a list of > processes > >>that were killed. Please let us know if you need help identifying the > case of > >>the problem. > >> > >>Regards, > >> > >>Dave McWilliams (217) 244-1144 consult at ncsa.uiuc.edu > >>NCSA Consulting Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > >>-------------------------------------------------------------------------- > >> > >>nefedova 11084 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11165 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11166 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11277 1 0 09:33 ? 
00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11295 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11379 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11402 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11480 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11522 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11602 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11668 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11715 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11785 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11813 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11892 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 11948 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12004 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12030 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12133 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12172 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12253 1 0 09:33 ? 
00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12256 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12443 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12460 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12504 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12534 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12657 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12668 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12773 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12806 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12892 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 12946 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13023 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13072 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13142 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13233 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13245 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13352 1 0 09:33 ? 
00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13379 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13474 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13488 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13618 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13663 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13743 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13820 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13887 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 13952 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14046 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14048 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14172 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14196 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14319 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14366 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14430 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14539 1 0 09:33 ? 
00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14572 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14703 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14725 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14832 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 14849 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15009 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15017 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15164 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15165 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15322 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15332 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15526 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15544 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15671 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15672 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15787 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15842 1 0 09:34 ? 
00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15981 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 15982 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16131 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16132 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16320 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16358 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16553 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16622 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16725 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16726 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16845 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16925 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 16989 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 17095 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 17262 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 17305 1 0 09:34 ? 00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 17375 1 0 09:34 ? 
00:00:02 globus-job-manager -conf > >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > pbs_gcc -rdn > >>jobmanager-pbs -machine-type unknown -publish-jobs > >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > >>pbs_gcc > >>-f /tmp/gram_1Drc0X -c poll > >>nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > >>pbs_gcc > >>-f /tmp/gram_fSWNvt -c poll > >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > >>pbs_gcc > >>-f /tmp/gram_a3whWx -c poll > >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > >>pbs_gcc > >>-f /tmp/gram_zz0JuC -c poll > >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > >>pbs_gcc > >>-f /tmp/gram_5jGFI1 -c poll > >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > >>/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org > >>2>/dev/null > >>nefedova 31546 31512 0 13:35 ? 00:00:00 > >>/usr/local/pbs/ia64/bin/qstat -f > >>905629.tg-master.ncsa.teragrid.org > >>nefedova 31578 31555 0 13:35 ? 00:00:00 > >>/usr/local/pbs/ia64/bin/qstat -f > >>905638.tg-master.ncsa.teragrid.org > >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org > 15001 3 > >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org > 15001 3 From nefedova at mcs.anl.gov Fri Mar 16 17:02:29 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 16 Mar 2007 17:02:29 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) Message-ID: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> Hi, Mihael: how do I set this throttling parameter ? Thanks, Nika >Date: Fri, 16 Mar 2007 15:53:57 -0600 >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) >To: nefedova at mcs.anl.gov >From: consult at ncsa.uiuc.edu >Cc: >X-Mailer: Perl5 Mail::Internet v1.74 >Sender: Nobody >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 >X-NCSA-MailScanner-Information: Please contact help at ncsa.uiuc.edu for more >information, amantadine.ncsa.uiuc.edu >X-NCSA-MailScanner: Found to be clean >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailgw.mcs.anl.gov > >FROM: Arnold, Galen >(Concerning ticket No. 137212) > >Veronika, > >If you can throttle the job submission so that there's more than 1 second >between them, that would probably help us out. > >-Galen > >Veronika V. Nefedova writes: > >Hi, Galen: > > > >I was told that I could have 384 jobs in PBS at the same time. By my > >estimation I had no more then 136 jobs in the queue. What are other limits > >I should keep in mind ? > > > >Thanks, > > > >Veronika > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > >>FROM: Arnold, Galen > >>(Concerning ticket No. 
137212) > >> > >>Veronika, > >> > >>We see many sequences in our globus-gatekeeper.log on tg-login1 like this > >> > >>Notice: 5: Authenticated globus user: > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > >>Nefedova 137823 > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > >>Notice: 5: Requested service: jobmanager-pbs > >>Notice: 5: Authorized as local user: nefedova > >>Notice: 5: Authorized as local uid: 29202 > >>Notice: 5: and local gid: 11467 > >>Notice: 0: executing > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > >>Notice: 0: Child 16725 started > >>Notice: 5: Authenticated globus user: > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > >>Nefedova 137823 > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > >>Notice: 5: Requested service: jobmanager-pbs > >>Notice: 5: Authorized as local user: nefedova > >>Notice: 5: Authorized as local uid: 29202 > >>Notice: 5: and local gid: 11467 > >>Notice: 0: executing > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > >>Notice: 0: Child 16726 started > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16769 > >>starting at Fri Mar 16 09:34:12 2007 > >> > >> > >>...your DN is also popular in the globus-gatekeeper.log. > >> > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika Nefedova' > >>globus-gatekeeper.log |wc -l > >> 15215 > >> > >> > >>The large number of connects around the same time: > >> > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep starting | > >> grep > >>'Mar 16 09:3' | wc -l > >> 65 > >>... > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=15872 > >>starting at Fri Mar 16 09:34:02 2007 > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16644 > >>starting at Fri Mar 16 09:34:11 2007 > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16769 > >>starting at Fri Mar 16 09:34:12 2007 > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16926 > >>starting at Fri Mar 16 09:34:14 2007 > >>... > >> > >> > >>...could be overwhelming the gatekeeper on tg-login1. > >> > >>-Galen > >> > >> > >>Veronika V. Nefedova writes: > >> >Hi, Dave: > >> > > >> >Could you please give me a bit more information on what has happened ? > >> > > >> >Thanks, > >> > > >> >Nika > >> > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > >> >>FROM: McWilliams, David G > >> >>(Concerning ticket No. 137212) > >> >> > >> >>Veronika, > >> >> > >> >>The system administrator had to kill several of your globus > processes today > >> >>because the load average on the node was over 25. Below is a list of > >> processes > >> >>that were killed. Please let us know if you need help identifying the > >> case of > >> >>the problem. > >> >> > >> >>Regards, > >> >> > >> >>Dave McWilliams (217) 244-1144 consult at ncsa.uiuc.edu > >> >>NCSA Consulting > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > >> >>--------------------------------------------------------------------- > ----- > >> >> > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 globus-job-manager -conf > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> pbs_gcc -rdn > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >>nefedova 11165 1 0 09:33 ? 
00:00:00 sh -c > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org > >> 15001 3 From hategan at mcs.anl.gov Sat Mar 17 09:18:40 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 17 Mar 2007 09:18:40 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> Message-ID: <1174141121.23152.1.camel@blabla.mcs.anl.gov> There is no direct rate limiter unfortunately. There is a submit throttle which tells the number of concurrent submissions. Setting that to 1 may work. On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > Hi, Mihael: > > how do I set this throttling parameter ? > > Thanks, > > Nika > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > >To: nefedova at mcs.anl.gov > >From: consult at ncsa.uiuc.edu > >Cc: > >X-Mailer: Perl5 Mail::Internet v1.74 > >Sender: Nobody > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > >X-NCSA-MailScanner-Information: Please contact help at ncsa.uiuc.edu for more > >information, amantadine.ncsa.uiuc.edu > >X-NCSA-MailScanner: Found to be clean > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at mailgw.mcs.anl.gov > > > >FROM: Arnold, Galen > >(Concerning ticket No. 137212) > > > >Veronika, > > > >If you can throttle the job submission so that there's more than 1 second > >between them, that would probably help us out. > > > >-Galen > > > >Veronika V. Nefedova writes: > > >Hi, Galen: > > > > > >I was told that I could have 384 jobs in PBS at the same time. By my > > >estimation I had no more then 136 jobs in the queue. What are other limits > > >I should keep in mind ? > > > > > >Thanks, > > > > > >Veronika > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > >>FROM: Arnold, Galen > > >>(Concerning ticket No. 137212) > > >> > > >>Veronika, > > >> > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 like this > > >> > > >>Notice: 5: Authenticated globus user: > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > >>Nefedova 137823 > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > >>Notice: 5: Requested service: jobmanager-pbs > > >>Notice: 5: Authorized as local user: nefedova > > >>Notice: 5: Authorized as local uid: 29202 > > >>Notice: 5: and local gid: 11467 > > >>Notice: 0: executing > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > >>Notice: 0: Child 16725 started > > >>Notice: 5: Authenticated globus user: > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > >>Nefedova 137823 > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > >>Notice: 5: Requested service: jobmanager-pbs > > >>Notice: 5: Authorized as local user: nefedova > > >>Notice: 5: Authorized as local uid: 29202 > > >>Notice: 5: and local gid: 11467 > > >>Notice: 0: executing > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > >>Notice: 0: Child 16726 started > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16769 > > >>starting at Fri Mar 16 09:34:12 2007 > > >> > > >> > > >>...your DN is also popular in the globus-gatekeeper.log. 
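The "submit throttle" Mihael refers to corresponds to the scheduler throttle settings read from swift.properties. Below is a minimal sketch of the relevant entries, assuming the property names documented in the Swift user guide (throttle.submit and throttle.host.submit) apply to this build; check etc/swift.properties in the Swift distribution for the exact names and defaults.

    # Hypothetical excerpt from swift.properties; property names are taken from
    # the Swift user guide and have not been verified against this 2007 build.
    #
    # Upper bound on the number of job submissions that may be in progress at
    # the same time, across all sites. Setting it to 1 serializes submissions.
    throttle.submit=1
    #
    # The same limit applied per site (assumed to be available in this build).
    throttle.host.submit=1

Note that these are concurrency limits rather than a true rate limiter: with throttle.submit=1 submissions go out one at a time, but two consecutive submissions can still start within the same second, which is why Mihael describes this only as something that "may work" for Galen's request of more than one second between submissions.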
> > >> > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika Nefedova' > > >>globus-gatekeeper.log |wc -l > > >> 15215 > > >> > > >> > > >>The large number of connects around the same time: > > >> > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep starting | > > >> grep > > >>'Mar 16 09:3' | wc -l > > >> 65 > > >>... > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=15872 > > >>starting at Fri Mar 16 09:34:02 2007 > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16644 > > >>starting at Fri Mar 16 09:34:11 2007 > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16769 > > >>starting at Fri Mar 16 09:34:12 2007 > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=16926 > > >>starting at Fri Mar 16 09:34:14 2007 > > >>... > > >> > > >> > > >>...could be overwhelming the gatekeeper on tg-login1. > > >> > > >>-Galen > > >> > > >> > > >>Veronika V. Nefedova writes: > > >> >Hi, Dave: > > >> > > > >> >Could you please give me a bit more information on what has happened ? > > >> > > > >> >Thanks, > > >> > > > >> >Nika > > >> > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > >> >>FROM: McWilliams, David G > > >> >>(Concerning ticket No. 137212) > > >> >> > > >> >>Veronika, > > >> >> > > >> >>The system administrator had to kill several of your globus > > processes today > > >> >>because the load average on the node was over 25. Below is a list of > > >> processes > > >> >>that were killed. Please let us know if you need help identifying the > > >> case of > > >> >>the problem. > > >> >> > > >> >>Regards, > > >> >> > > >> >>Dave McWilliams (217) 244-1144 consult at ncsa.uiuc.edu > > >> >>NCSA Consulting > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > >> >>--------------------------------------------------------------------- > > ----- > > >> >> > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 11402 1 0 09:33 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 11480 1 0 09:33 ? 
00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 16925 1 0 09:34 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 16989 1 0 09:34 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 17095 1 0 09:34 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 17262 1 0 09:34 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 17305 1 0 09:34 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 17375 1 0 09:34 ? 00:00:02 globus-job-manager -conf > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > >> pbs_gcc -rdn > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > > >> >>pbs_gcc > > >> >>-f /tmp/gram_1Drc0X -c poll > > >> >>nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > > >> >>pbs_gcc > > >> >>-f /tmp/gram_fSWNvt -c poll > > >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > > >> >>pbs_gcc > > >> >>-f /tmp/gram_a3whWx -c poll > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > > >> >>pbs_gcc > > >> >>-f /tmp/gram_zz0JuC -c poll > > >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m > > >> >>pbs_gcc > > >> >>-f /tmp/gram_5jGFI1 -c poll > > >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > > >> >>/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org > > >> >>2>/dev/null > > >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > >> >>905629.tg-master.ncsa.teragrid.org > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > >> >>905638.tg-master.ncsa.teragrid.org > > >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org > > >> 15001 3 > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org > > >> 15001 3 > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Sun Mar 18 08:01:34 2007 From: nefedova at mcs.anl.gov (Veronika V. 
Nefedova) Date: Sun, 18 Mar 2007 08:01:34 -0500 Subject: [Swift-devel] swift errors Message-ID: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> HI, When I was running the same workflow that worked before without any errors - I got this errors on my stdout: Exception occured in the exception handling code, so it cannot be properly propagated to the user java.lang.IllegalArgumentException: This socket does not seem to exist in the socket pool. at org.globus.ftp.dc.SocketPool.remove(SocketPool.java:78) at org.globus.ftp.dc.GridFTPTransferSinkThread.shutdown(GridFTPTransferSinkThread.java:87) at org.globus.ftp.dc.TransferSinkThread.run(TransferSinkThread.java:78) And then these (also a lot of them): java.lang.OutOfMemoryErrorjava.lang.OutOfMemoryError java.lang.OutOfMemoryError The is on wiggum in /sandbox/ydeng/alamines/swift-MolDyn-free-final-lor82qgcceac0.log and I am suing this swift code/verison: /sandbox/nefedova/SWIFT/vdsk-0.1rc2/ Nika From yongzh at cs.uchicago.edu Sun Mar 18 10:31:43 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Sun, 18 Mar 2007 10:31:43 -0500 (CDT) Subject: [Swift-devel] swift errors In-Reply-To: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> Message-ID: Hi Nika, The socket error is fine, it is due to some unsolved issue in the java GridFTP client, you can ignore that. For the memory, you need to set the memory option for JVM a bit larger, locate the execute swift in you swift_home/bin, and edit the OPTIONS OPTIONS="-Xms512m -Xmx512m" Yong. On Sun, 18 Mar 2007, Veronika V. Nefedova wrote: > HI, > > When I was running the same workflow that worked before without any errors > - I got this errors on my stdout: > > Exception occured in the exception handling code, so it cannot be properly > propagated to the user > java.lang.IllegalArgumentException: This socket does not seem to exist in > the socket pool. > at org.globus.ftp.dc.SocketPool.remove(SocketPool.java:78) > at > org.globus.ftp.dc.GridFTPTransferSinkThread.shutdown(GridFTPTransferSinkThread.java:87) > at > org.globus.ftp.dc.TransferSinkThread.run(TransferSinkThread.java:78) > > > And then these (also a lot of them): > java.lang.OutOfMemoryErrorjava.lang.OutOfMemoryError > java.lang.OutOfMemoryError > > The is on wiggum in > /sandbox/ydeng/alamines/swift-MolDyn-free-final-lor82qgcceac0.log and I am > suing this swift code/verison: /sandbox/nefedova/SWIFT/vdsk-0.1rc2/ > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From yongzh at cs.uchicago.edu Sun Mar 18 10:32:49 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Sun, 18 Mar 2007 10:32:49 -0500 (CDT) Subject: [Swift-devel] swift errors In-Reply-To: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> Message-ID: BTW, you may want to lower your transfer throttle in swift.properties. Yong. On Sun, 18 Mar 2007, Veronika V. Nefedova wrote: > HI, > > When I was running the same workflow that worked before without any errors > - I got this errors on my stdout: > > Exception occured in the exception handling code, so it cannot be properly > propagated to the user > java.lang.IllegalArgumentException: This socket does not seem to exist in > the socket pool. 
> at org.globus.ftp.dc.SocketPool.remove(SocketPool.java:78) > at > org.globus.ftp.dc.GridFTPTransferSinkThread.shutdown(GridFTPTransferSinkThread.java:87) > at > org.globus.ftp.dc.TransferSinkThread.run(TransferSinkThread.java:78) > > > And then these (also a lot of them): > java.lang.OutOfMemoryErrorjava.lang.OutOfMemoryError > java.lang.OutOfMemoryError > > The is on wiggum in > /sandbox/ydeng/alamines/swift-MolDyn-free-final-lor82qgcceac0.log and I am > suing this swift code/verison: /sandbox/nefedova/SWIFT/vdsk-0.1rc2/ > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Sun Mar 18 19:05:54 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 Mar 2007 00:05:54 +0000 (GMT) Subject: [Swift-devel] source location passthrough In-Reply-To: <45EC1B0D.6070908@mcs.anl.gov> References: <45EC1B0D.6070908@mcs.anl.gov> Message-ID: it is bug 40 http://bugzilla.mcs.anl.gov/swift/post_bug.cgi On Mon, 5 Mar 2007, Mike Wilde wrote: > I think that this is a great idea. Can you file it in bugz so we can sketch a > rough design and estimate its cost in a later release? 0.3 or beyond, likely. > > - Mike > > Ben Clifford wrote, On 3/5/2007 6:36 AM: > > A bunch of error reporting might be made more useful if we pass the source > > file location (line, at least) through the various intermediate languages so > > that errors like this: > > > > $ swift fmri.swift Swift V 0.0405 > > RunID: 8zr1mg5964q90 > > Execution failed: > > java.lang.IndexOutOfBoundsException: Index: 1, Size: 1 > > > > have a bit more context. > > > > Bit of a hassle to implement, though. > > > > From benc at hawaga.org.uk Sun Mar 18 19:06:30 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 19 Mar 2007 00:06:30 +0000 (GMT) Subject: [Swift-devel] source location passthrough In-Reply-To: References: <45EC1B0D.6070908@mcs.anl.gov> Message-ID: On Mon, 19 Mar 2007, Ben Clifford wrote: > http://bugzilla.mcs.anl.gov/swift/post_bug.cgi wrong! http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=40 -- From hategan at mcs.anl.gov Sun Mar 18 20:46:54 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 18 Mar 2007 20:46:54 -0500 Subject: [Swift-devel] swift errors In-Reply-To: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> Message-ID: <1174268814.5287.9.camel@blabla.mcs.anl.gov> On Sun, 2007-03-18 at 08:01 -0500, Veronika V. Nefedova wrote: > HI, > > When I was running the same workflow that worked before without any errors > - I got this errors on my stdout: > > Exception occured in the exception handling code, so it cannot be properly > propagated to the user > java.lang.IllegalArgumentException: This socket does not seem to exist in > the socket pool. > at org.globus.ftp.dc.SocketPool.remove(SocketPool.java:78) > at > org.globus.ftp.dc.GridFTPTransferSinkThread.shutdown(GridFTPTransferSinkThread.java:87) > at > org.globus.ftp.dc.TransferSinkThread.run(TransferSinkThread.java:78) > Grr. I'll try to see what can be done to remove that error. > > And then these (also a lot of them): > java.lang.OutOfMemoryErrorjava.lang.OutOfMemoryError > java.lang.OutOfMemoryError If you get those, you should ctrl+c the workflow and restart it with more memory. Edit swift/bin/swift and on the OPTIONS= line say: OPTIONS="-Xmx256M" (or whatever heap limit you want it to have - the default is 32M I think). 
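[Archive note: both replies in this thread amount to the same fix: edit the OPTIONS= line near the top of the bin/swift launcher script so the JVM is started with a larger heap, roughly

    # give the JVM a larger heap for big workflows
    OPTIONS="-Xms512m -Xmx512m"
    ...
    eval java ${OPTIONS} ${COG_OPTS} ${EXEC} ${CMDLINE}

The eval line is quoted verbatim later in this thread; the exact default heap size and surrounding script layout may differ between releases. Yong's suggestion to lower the transfer throttle refers to the throttle settings in swift.properties; in later releases these are keys such as throttle.transfers and throttle.submit, and the names used by this 0.1 release may differ.]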
> > The is on wiggum in > /sandbox/ydeng/alamines/swift-MolDyn-free-final-lor82qgcceac0.log and I am > suing this swift code/verison: /sandbox/nefedova/SWIFT/vdsk-0.1rc2/ > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From tiberius at ci.uchicago.edu Tue Mar 20 10:50:36 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Tue, 20 Mar 2007 10:50:36 -0500 Subject: [Swift-devel] req: int2string Message-ID: Would like to do something like this: foreach int i in 0..35 { file solution; solution = invoke($i); } Reason: I need to control (differentiate) the name of the outputs that are being generated in the loop such that they do not overwrite themselves when they are staged out. Any alternatives ? Tibi -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Tue Mar 20 04:57:55 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 20 Mar 2007 04:57:55 -0500 Subject: [Swift-devel] req: int2string In-Reply-To: References: Message-ID: <1174384675.16980.2.camel@blabla.mcs.anl.gov> try source=@string(i) On Tue, 2007-03-20 at 10:50 -0500, Tiberiu Stef-Praun wrote: > Would like to do something like this: > > foreach int i in 0..35 { > file > solution; > solution = invoke($i); > } > > Reason: > I need to control (differentiate) the name of the outputs that are > being generated in the loop such that they do not overwrite themselves > when they are staged out. > > Any alternatives ? > > Tibi > From yongzh at cs.uchicago.edu Tue Mar 20 11:12:02 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 20 Mar 2007 11:12:02 -0500 (CDT) Subject: [Swift-devel] req: int2string In-Reply-To: References: Message-ID: you can use a simple mapper solution; Yong. On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > Would like to do something like this: > > foreach int i in 0..35 { > file > solution; > solution = invoke($i); > } > > Reason: > I need to control (differentiate) the name of the outputs that are > being generated in the loop such that they do not overwrite themselves > when they are staged out. > > Any alternatives ? > > Tibi > > -- > Tiberiu (Tibi) Stef-Praun, PhD > Research Staff, Computation Institute > 5640 S. Ellis Ave, #405 > University of Chicago > http://www-unix.mcs.anl.gov/~tiberius/ > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From tiberius at ci.uchicago.edu Tue Mar 20 11:23:29 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Tue, 20 Mar 2007 11:23:29 -0500 Subject: [Swift-devel] req: int2string In-Reply-To: References: Message-ID: Given that I was about to use it in a loop, I have this uncertainty when the loop is unrolled. ; ; ; will it generate: 1.1.out 2.2.out or 1.1.out 2.1.out ? I though the simple_mapper would generate that numeral in between prefix and suffix, at least that is how I experienced it in the BRIC workflow. also, can you use a numeral in place of a string: int i .. wrote: > you can use a simple mapper > > solution; > > Yong. 
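[Archive note: the mapper expressions in this thread were written inside angle brackets and were stripped when the messages were archived, which is why lines above read "file solution;" or "you can use a simple mapper solution;" with nothing in between. The declaration under discussion has roughly this shape (an illustration only, not the authors' exact text; the prefix and suffix parameters of simple_mapper are the ones mentioned in the messages themselves):

    foreach int i in 0..35 {
      // map the output of iteration i to its own file name
      file <simple_mapper; prefix=i, suffix=".out"> solution;
      solution = invoke(i);
    }

How to hand the integer i to the mapper as a string, and whether a mapper may depend on a run-time value at all, is exactly what the rest of the thread works through.]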
> > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > Would like to do something like this: > > > > foreach int i in 0..35 { > > file > > solution; > > solution = invoke($i); > > } > > > > Reason: > > I need to control (differentiate) the name of the outputs that are > > being generated in the loop such that they do not overwrite themselves > > when they are staged out. > > > > Any alternatives ? > > > > Tibi > > > > -- > > Tiberiu (Tibi) Stef-Praun, PhD > > Research Staff, Computation Institute > > 5640 S. Ellis Ave, #405 > > University of Chicago > > http://www-unix.mcs.anl.gov/~tiberius/ > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From yongzh at cs.uchicago.edu Tue Mar 20 11:28:08 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Tue, 20 Mar 2007 11:28:08 -0500 (CDT) Subject: [Swift-devel] req: int2string In-Reply-To: References: Message-ID: It'll generate something in between if it is mapped for an array, but not for singletons. so in your case it is just: 1.out 2.out etc On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > Given that I was about to use it in a loop, I have this uncertainty > when the loop is unrolled. > ; > ; > ; > will it generate: > 1.1.out > 2.2.out > > or > > 1.1.out > 2.1.out > ? > > I though the simple_mapper would generate that numeral in between > prefix and suffix, at least that is how I experienced it in the BRIC > workflow. > > also, can you use a numeral in place of a string: > int i > .. > Tibi > > On 3/20/07, Yong Zhao wrote: > > you can use a simple mapper > > > > solution; > > > > Yong. > > > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > Would like to do something like this: > > > > > > foreach int i in 0..35 { > > > file > > > solution; > > > solution = invoke($i); > > > } > > > > > > Reason: > > > I need to control (differentiate) the name of the outputs that are > > > being generated in the loop such that they do not overwrite themselves > > > when they are staged out. > > > > > > Any alternatives ? > > > > > > Tibi > > > > > > -- > > > Tiberiu (Tibi) Stef-Praun, PhD > > > Research Staff, Computation Institute > > > 5640 S. Ellis Ave, #405 > > > University of Chicago > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > -- > Tiberiu (Tibi) Stef-Praun, PhD > Research Staff, Computation Institute > 5640 S. Ellis Ave, #405 > University of Chicago > http://www-unix.mcs.anl.gov/~tiberius/ > From tiberius at ci.uchicago.edu Tue Mar 20 11:30:49 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Tue, 20 Mar 2007 11:30:49 -0500 Subject: [Swift-devel] req: int2string In-Reply-To: References: Message-ID: Perfect Thanks On 3/20/07, Yong Zhao wrote: > It'll generate something in between if it is mapped for an array, but not > for singletons. so in your case it is just: > > 1.out > 2.out > etc > > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > Given that I was about to use it in a loop, I have this uncertainty > > when the loop is unrolled. 
> > ; > > ; > > ; > > will it generate: > > 1.1.out > > 2.2.out > > > > or > > > > 1.1.out > > 2.1.out > > ? > > > > I though the simple_mapper would generate that numeral in between > > prefix and suffix, at least that is how I experienced it in the BRIC > > workflow. > > > > also, can you use a numeral in place of a string: > > int i > > .. > > > Tibi > > > > On 3/20/07, Yong Zhao wrote: > > > you can use a simple mapper > > > > > > solution; > > > > > > Yong. > > > > > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > Would like to do something like this: > > > > > > > > foreach int i in 0..35 { > > > > file > > > > solution; > > > > solution = invoke($i); > > > > } > > > > > > > > Reason: > > > > I need to control (differentiate) the name of the outputs that are > > > > being generated in the loop such that they do not overwrite themselves > > > > when they are staged out. > > > > > > > > Any alternatives ? > > > > > > > > Tibi > > > > > > > > -- > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > Research Staff, Computation Institute > > > > 5640 S. Ellis Ave, #405 > > > > University of Chicago > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > -- > > Tiberiu (Tibi) Stef-Praun, PhD > > Research Staff, Computation Institute > > 5640 S. Ellis Ave, #405 > > University of Chicago > > http://www-unix.mcs.anl.gov/~tiberius/ > > > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From tiberius at ci.uchicago.edu Tue Mar 20 14:42:35 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Tue, 20 Mar 2007 14:42:35 -0500 Subject: [Swift-devel] req: int2string In-Reply-To: References: Message-ID: On 3/20/07, Yong Zhao wrote: > you can use a simple mapper > > solution; Execution failed: java.lang.ClassCastException :) > > Yong. > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > Would like to do something like this: > > > > foreach int i in 0..35 { > > file > > solution; > > solution = invoke($i); > > } > > > > Reason: > > I need to control (differentiate) the name of the outputs that are > > being generated in the loop such that they do not overwrite themselves > > when they are staged out. > > > > Any alternatives ? > > > > Tibi > > > > -- > > Tiberiu (Tibi) Stef-Praun, PhD > > Research Staff, Computation Institute > > 5640 S. Ellis Ave, #405 > > University of Chicago > > http://www-unix.mcs.anl.gov/~tiberius/ > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Tue Mar 20 14:44:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 20 Mar 2007 14:44:39 -0500 Subject: [Swift-devel] req: int2string In-Reply-To: References: Message-ID: <1174419879.22328.0.camel@blabla.mcs.anl.gov> @string(i) On Tue, 2007-03-20 at 14:42 -0500, Tiberiu Stef-Praun wrote: > On 3/20/07, Yong Zhao wrote: > > you can use a simple mapper > > > > solution; > > Execution failed: > java.lang.ClassCastException > > :) > > > > > Yong. 
> > > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > Would like to do something like this: > > > > > > foreach int i in 0..35 { > > > file > > > solution; > > > solution = invoke($i); > > > } > > > > > > Reason: > > > I need to control (differentiate) the name of the outputs that are > > > being generated in the loop such that they do not overwrite themselves > > > when they are staged out. > > > > > > Any alternatives ? > > > > > > Tibi > > > > > > -- > > > Tiberiu (Tibi) Stef-Praun, PhD > > > Research Staff, Computation Institute > > > 5640 S. Ellis Ave, #405 > > > University of Chicago > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > From tiberius at ci.uchicago.edu Tue Mar 20 14:45:50 2007 From: tiberius at ci.uchicago.edu (Tiberiu Stef-Praun) Date: Tue, 20 Mar 2007 14:45:50 -0500 Subject: [Swift-devel] req: int2string In-Reply-To: <1174419879.22328.0.camel@blabla.mcs.anl.gov> References: <1174419879.22328.0.camel@blabla.mcs.anl.gov> Message-ID: Could not compile SwiftScript source: line 19:53: unexpected token: @ :) On 3/20/07, Mihael Hategan wrote: > @string(i) > > On Tue, 2007-03-20 at 14:42 -0500, Tiberiu Stef-Praun wrote: > > On 3/20/07, Yong Zhao wrote: > > > you can use a simple mapper > > > > > > solution; > > > > Execution failed: > > java.lang.ClassCastException > > > > :) > > > > > > > > Yong. > > > > > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > Would like to do something like this: > > > > > > > > foreach int i in 0..35 { > > > > file > > > > solution; > > > > solution = invoke($i); > > > > } > > > > > > > > Reason: > > > > I need to control (differentiate) the name of the outputs that are > > > > being generated in the loop such that they do not overwrite themselves > > > > when they are staged out. > > > > > > > > Any alternatives ? > > > > > > > > Tibi > > > > > > > > -- > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > Research Staff, Computation Institute > > > > 5640 S. Ellis Ave, #405 > > > > University of Chicago > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > -- Tiberiu (Tibi) Stef-Praun, PhD Research Staff, Computation Institute 5640 S. Ellis Ave, #405 University of Chicago http://www-unix.mcs.anl.gov/~tiberius/ From hategan at mcs.anl.gov Tue Mar 20 14:46:47 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 20 Mar 2007 14:46:47 -0500 Subject: [Swift-devel] req: int2string In-Reply-To: <1174419879.22328.0.camel@blabla.mcs.anl.gov> References: <1174419879.22328.0.camel@blabla.mcs.anl.gov> Message-ID: <1174420007.22328.4.camel@blabla.mcs.anl.gov> Actually wait a second. This would be a dependent mapper. This can't be done unless the mapper code deals with it. On Tue, 2007-03-20 at 14:44 -0500, Mihael Hategan wrote: > @string(i) > > On Tue, 2007-03-20 at 14:42 -0500, Tiberiu Stef-Praun wrote: > > On 3/20/07, Yong Zhao wrote: > > > you can use a simple mapper > > > > > > solution; > > > > Execution failed: > > java.lang.ClassCastException > > > > :) > > > > > > > > Yong. 
> > > > > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > Would like to do something like this: > > > > > > > > foreach int i in 0..35 { > > > > file > > > > solution; > > > > solution = invoke($i); > > > > } > > > > > > > > Reason: > > > > I need to control (differentiate) the name of the outputs that are > > > > being generated in the loop such that they do not overwrite themselves > > > > when they are staged out. > > > > > > > > Any alternatives ? > > > > > > > > Tibi > > > > > > > > -- > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > Research Staff, Computation Institute > > > > 5640 S. Ellis Ave, #405 > > > > University of Chicago > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > From benc at hawaga.org.uk Tue Mar 20 14:47:31 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 20 Mar 2007 19:47:31 +0000 (GMT) Subject: [Swift-devel] req: int2string In-Reply-To: References: <1174419879.22328.0.camel@blabla.mcs.anl.gov> Message-ID: what is line 19? (and the line before and after?) On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > Could not compile SwiftScript source: line 19:53: unexpected token: @ > > :) > > > On 3/20/07, Mihael Hategan wrote: > > @string(i) > > > > On Tue, 2007-03-20 at 14:42 -0500, Tiberiu Stef-Praun wrote: > > > On 3/20/07, Yong Zhao wrote: > > > > you can use a simple mapper > > > > > > > > solution; > > > > > > Execution failed: > > > java.lang.ClassCastException > > > > > > :) > > > > > > > > > > > Yong. > > > > > > > > On Tue, 20 Mar 2007, Tiberiu Stef-Praun wrote: > > > > > > > > > Would like to do something like this: > > > > > > > > > > foreach int i in 0..35 { > > > > > file > > > > > > > solution; > > > > > solution = invoke($i); > > > > > } > > > > > > > > > > Reason: > > > > > I need to control (differentiate) the name of the outputs that are > > > > > being generated in the loop such that they do not overwrite themselves > > > > > when they are staged out. > > > > > > > > > > Any alternatives ? > > > > > > > > > > Tibi > > > > > > > > > > -- > > > > > Tiberiu (Tibi) Stef-Praun, PhD > > > > > Research Staff, Computation Institute > > > > > 5640 S. Ellis Ave, #405 > > > > > University of Chicago > > > > > http://www-unix.mcs.anl.gov/~tiberius/ > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 21 10:14:17 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 10:14:17 -0500 Subject: [Swift-devel] swift errors In-Reply-To: References: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321101118.04488930@mail.mcs.anl.gov> Hi, Yong, I found these OPTIONS entries in swift exec file: [59] wiggum /sandbox/nefedova/SWIFT/vdsk-0.1rc2/bin > grep OPTIONS swift OPTIONS= OPTIONS="$OPTIONS -D$2=$1" OPTIONS="$OPTIONS -Djava.endorsed.dirs=$SWIFT_HOME/lib/endorsed" eval java ${OPTIONS} ${COG_OPTS} ${EXEC} ${CMDLINE} eval java ${OPTIONS} ${COG_OPTS} -classpath ${LOCALCLASSPATH} ${EXEC} ${CMDLINE} Which one should be modified to increase the heap size ( I assume the first one? -- just want to be sure). 
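[Archive note: the exchange above ends with the point that a mapper whose parameters depend on a run-time value such as the loop index is a "dependent mapper", which the mapper code of this release did not handle, as Mihael's last message says. Once mapper parameters may be expressions, the intent is usually written along these lines (a sketch only; single_file_mapper and @strcat are names taken from later Swift documentation and are assumptions with respect to this 2007 code base):

    foreach int i in 0..35 {
      // one distinct output file per iteration, so stage-out does not overwrite
      file <single_file_mapper; file=@strcat("solution_", i, ".out")> solution;
      solution = invoke(i);
    }

Each iteration then maps its result to a distinct name, which is what the original request was after.]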
Thanks, Nika At 10:31 AM 3/18/2007, Yong Zhao wrote: >Hi Nika, > >The socket error is fine, it is due to some unsolved issue in the >java GridFTP client, you can ignore that. > >For the memory, you need to set the memory option for JVM a bit larger, >locate the execute swift in you swift_home/bin, and edit the OPTIONS >OPTIONS="-Xms512m -Xmx512m" > >Yong. > > >On Sun, 18 Mar 2007, Veronika V. Nefedova wrote: > > > HI, > > > > When I was running the same workflow that worked before without any errors > > - I got this errors on my stdout: > > > > Exception occured in the exception handling code, so it cannot be properly > > propagated to the user > > java.lang.IllegalArgumentException: This socket does not seem to exist in > > the socket pool. > > at org.globus.ftp.dc.SocketPool.remove(SocketPool.java:78) > > at > > > org.globus.ftp.dc.GridFTPTransferSinkThread.shutdown(GridFTPTransferSinkThread.java:87) > > at > > org.globus.ftp.dc.TransferSinkThread.run(TransferSinkThread.java:78) > > > > > > And then these (also a lot of them): > > java.lang.OutOfMemoryErrorjava.lang.OutOfMemoryError > > java.lang.OutOfMemoryError > > > > The is on wiggum in > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-lor82qgcceac0.log and I am > > suing this swift code/verison: /sandbox/nefedova/SWIFT/vdsk-0.1rc2/ > > > > Nika > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From yongzh at cs.uchicago.edu Wed Mar 21 10:23:28 2007 From: yongzh at cs.uchicago.edu (Yong Zhao) Date: Wed, 21 Mar 2007 10:23:28 -0500 (CDT) Subject: [Swift-devel] swift errors In-Reply-To: <6.0.0.22.2.20070321101118.04488930@mail.mcs.anl.gov> References: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> <6.0.0.22.2.20070321101118.04488930@mail.mcs.anl.gov> Message-ID: Yes the first line: OPTIONS= replace it with OPTIONS="-Xms512m -Xmx512m" Yong. On Wed, 21 Mar 2007, Veronika V. Nefedova wrote: > Hi, Yong, > > I found these OPTIONS entries in swift exec file: > > [59] wiggum /sandbox/nefedova/SWIFT/vdsk-0.1rc2/bin > grep OPTIONS swift > OPTIONS= > OPTIONS="$OPTIONS -D$2=$1" > OPTIONS="$OPTIONS -Djava.endorsed.dirs=$SWIFT_HOME/lib/endorsed" > eval java ${OPTIONS} ${COG_OPTS} ${EXEC} ${CMDLINE} > eval java ${OPTIONS} ${COG_OPTS} -classpath ${LOCALCLASSPATH} ${EXEC} > ${CMDLINE} > > Which one should be modified to increase the heap size ( I assume the first > one? -- just want to be sure). > > Thanks, > > Nika > > At 10:31 AM 3/18/2007, Yong Zhao wrote: > >Hi Nika, > > > >The socket error is fine, it is due to some unsolved issue in the > >java GridFTP client, you can ignore that. > > > >For the memory, you need to set the memory option for JVM a bit larger, > >locate the execute swift in you swift_home/bin, and edit the OPTIONS > >OPTIONS="-Xms512m -Xmx512m" > > > >Yong. > > > > > >On Sun, 18 Mar 2007, Veronika V. Nefedova wrote: > > > > > HI, > > > > > > When I was running the same workflow that worked before without any errors > > > - I got this errors on my stdout: > > > > > > Exception occured in the exception handling code, so it cannot be properly > > > propagated to the user > > > java.lang.IllegalArgumentException: This socket does not seem to exist in > > > the socket pool. 
> > > at org.globus.ftp.dc.SocketPool.remove(SocketPool.java:78) > > > at > > > > > org.globus.ftp.dc.GridFTPTransferSinkThread.shutdown(GridFTPTransferSinkThread.java:87) > > > at > > > org.globus.ftp.dc.TransferSinkThread.run(TransferSinkThread.java:78) > > > > > > > > > And then these (also a lot of them): > > > java.lang.OutOfMemoryErrorjava.lang.OutOfMemoryError > > > java.lang.OutOfMemoryError > > > > > > The is on wiggum in > > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-lor82qgcceac0.log and I am > > > suing this swift code/verison: /sandbox/nefedova/SWIFT/vdsk-0.1rc2/ > > > > > > Nika > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > From nefedova at mcs.anl.gov Wed Mar 21 10:27:41 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 10:27:41 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <1174141121.23152.1.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> Hi, Mihael: I have these properties modified in my scheduler.xml file: Are you suggesting to add also this inside ... : ? Do these set parameters guarantee me that: 1. I have no more then 384 jobs in a queue at any time and 2. Jobs are submitted to the queue with at least 1 sec delay (these are the requirements from TG NCSA). Thanks! Nika At 09:18 AM 3/17/2007, Mihael Hategan wrote: >There is no direct rate limiter unfortunately. There is a submit >throttle which tells the number of concurrent submissions. Setting that >to 1 may work. > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > Hi, Mihael: > > > > how do I set this throttling parameter ? > > > > Thanks, > > > > Nika > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > > >To: nefedova at mcs.anl.gov > > >From: consult at ncsa.uiuc.edu > > >Cc: > > >X-Mailer: Perl5 Mail::Internet v1.74 > > >Sender: Nobody > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > >X-NCSA-MailScanner-Information: Please contact help at ncsa.uiuc.edu for > more > > >information, amantadine.ncsa.uiuc.edu > > >X-NCSA-MailScanner: Found to be clean > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > mailgw.mcs.anl.gov > > > > > >FROM: Arnold, Galen > > >(Concerning ticket No. 137212) > > > > > >Veronika, > > > > > >If you can throttle the job submission so that there's more than 1 second > > >between them, that would probably help us out. > > > > > >-Galen > > > > > >Veronika V. Nefedova writes: > > > >Hi, Galen: > > > > > > > >I was told that I could have 384 jobs in PBS at the same time. By my > > > >estimation I had no more then 136 jobs in the queue. What are other > limits > > > >I should keep in mind ? > > > > > > > >Thanks, > > > > > > > >Veronika > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > >>FROM: Arnold, Galen > > > >>(Concerning ticket No. 
137212) > > > >> > > > >>Veronika, > > > >> > > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 > like this > > > >> > > > >>Notice: 5: Authenticated globus user: > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > >>Nefedova 137823 > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > >>Notice: 5: Requested service: jobmanager-pbs > > > >>Notice: 5: Authorized as local user: nefedova > > > >>Notice: 5: Authorized as local uid: 29202 > > > >>Notice: 5: and local gid: 11467 > > > >>Notice: 0: executing > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > >>Notice: 0: Child 16725 started > > > >>Notice: 5: Authenticated globus user: > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > >>Nefedova 137823 > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > >>Notice: 5: Requested service: jobmanager-pbs > > > >>Notice: 5: Authorized as local user: nefedova > > > >>Notice: 5: Authorized as local uid: 29202 > > > >>Notice: 5: and local gid: 11467 > > > >>Notice: 0: executing > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > >>Notice: 0: Child 16726 started > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16769 > > > >>starting at Fri Mar 16 09:34:12 2007 > > > >> > > > >> > > > >>...your DN is also popular in the globus-gatekeeper.log. > > > >> > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika > Nefedova' > > > >>globus-gatekeeper.log |wc -l > > > >> 15215 > > > >> > > > >> > > > >>The large number of connects around the same time: > > > >> > > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > starting | > > > >> grep > > > >>'Mar 16 09:3' | wc -l > > > >> 65 > > > >>... > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=15872 > > > >>starting at Fri Mar 16 09:34:02 2007 > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16644 > > > >>starting at Fri Mar 16 09:34:11 2007 > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16769 > > > >>starting at Fri Mar 16 09:34:12 2007 > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16926 > > > >>starting at Fri Mar 16 09:34:14 2007 > > > >>... > > > >> > > > >> > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > >> > > > >>-Galen > > > >> > > > >> > > > >>Veronika V. Nefedova writes: > > > >> >Hi, Dave: > > > >> > > > > >> >Could you please give me a bit more information on what has > happened ? > > > >> > > > > >> >Thanks, > > > >> > > > > >> >Nika > > > >> > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > >> >>FROM: McWilliams, David G > > > >> >>(Concerning ticket No. 137212) > > > >> >> > > > >> >>Veronika, > > > >> >> > > > >> >>The system administrator had to kill several of your globus > > > processes today > > > >> >>because the load average on the node was over 25. Below is a list of > > > >> processes > > > >> >>that were killed. Please let us know if you need help > identifying the > > > >> case of > > > >> >>the problem. > > > >> >> > > > >> >>Regards, > > > >> >> > > > >> >>Dave McWilliams (217) > 244-1144 consult at ncsa.uiuc.edu > > > >> >>NCSA Consulting > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > >> >>----------------------------------------------------------------- > ---- > > > ----- > > > >> >> > > > >> >>nefedova 11084 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11402 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11480 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11522 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11602 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11668 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11715 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11785 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11813 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11892 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 11948 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12004 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12030 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12133 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12172 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12253 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12256 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12443 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12460 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12504 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12534 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12657 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12668 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12773 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12806 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12892 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 12946 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13023 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13072 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13142 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13233 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13245 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13352 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13379 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13474 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13488 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13618 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13663 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13743 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13820 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13887 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 13952 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14046 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14048 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14172 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14196 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14319 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14366 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14430 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14539 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14572 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14703 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14725 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14832 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 14849 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15009 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15017 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15164 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15165 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15322 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15332 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15526 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15544 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15671 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15672 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15787 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15842 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15981 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 15982 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16131 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16132 1 0 09:34 ? 
00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16320 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16358 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16553 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16622 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16725 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16726 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16845 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16925 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 16989 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 17095 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 17262 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 17305 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 17375 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > >> pbs_gcc -rdn > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > .pl -m > > > >> >>pbs_gcc > > > >> >>-f /tmp/gram_1Drc0X -c poll > > > >> >>nefedova 31349 11295 4 13:35 ? 
00:00:00 /usr/bin/perl > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > .pl -m > > > >> >>pbs_gcc > > > >> >>-f /tmp/gram_fSWNvt -c poll > > > >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > .pl -m > > > >> >>pbs_gcc > > > >> >>-f /tmp/gram_a3whWx -c poll > > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > .pl -m > > > >> >>pbs_gcc > > > >> >>-f /tmp/gram_zz0JuC -c poll > > > >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > .pl -m > > > >> >>pbs_gcc > > > >> >>-f /tmp/gram_5jGFI1 -c poll > > > >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > > > >> >>/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org > > > >> >>2>/dev/null > > > >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > >> >>905629.tg-master.ncsa.teragrid.org > > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > >> >>905638.tg-master.ncsa.teragrid.org > > > >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > tg-master.ncsa.teragrid.org > > > >> 15001 3 > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > tg-master.ncsa.teragrid.org > > > >> 15001 3 > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From nefedova at mcs.anl.gov Wed Mar 21 10:27:59 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 10:27:59 -0500 Subject: [Swift-devel] swift errors In-Reply-To: References: <6.0.0.22.2.20070318075459.04daf8d0@mail.mcs.anl.gov> <6.0.0.22.2.20070321101118.04488930@mail.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321102745.0446c4e0@mail.mcs.anl.gov> Thanks, Yong! Nika At 10:23 AM 3/21/2007, Yong Zhao wrote: >Yes the first line: > >OPTIONS= > >replace it with > >OPTIONS="-Xms512m -Xmx512m" > >Yong. > >On Wed, 21 Mar 2007, Veronika V. Nefedova wrote: > > > Hi, Yong, > > > > I found these OPTIONS entries in swift exec file: > > > > [59] wiggum /sandbox/nefedova/SWIFT/vdsk-0.1rc2/bin > grep OPTIONS swift > > OPTIONS= > > OPTIONS="$OPTIONS -D$2=$1" > > OPTIONS="$OPTIONS -Djava.endorsed.dirs=$SWIFT_HOME/lib/endorsed" > > eval java ${OPTIONS} ${COG_OPTS} ${EXEC} ${CMDLINE} > > eval java ${OPTIONS} ${COG_OPTS} -classpath ${LOCALCLASSPATH} ${EXEC} > > ${CMDLINE} > > > > Which one should be modified to increase the heap size ( I assume the first > > one? -- just want to be sure). > > > > Thanks, > > > > Nika > > > > At 10:31 AM 3/18/2007, Yong Zhao wrote: > > >Hi Nika, > > > > > >The socket error is fine, it is due to some unsolved issue in the > > >java GridFTP client, you can ignore that. > > > > > >For the memory, you need to set the memory option for JVM a bit larger, > > >locate the execute swift in you swift_home/bin, and edit the OPTIONS > > >OPTIONS="-Xms512m -Xmx512m" > > > > > >Yong. > > > > > > > > >On Sun, 18 Mar 2007, Veronika V. 
Nefedova wrote: > > > > > > > HI, > > > > > > > > When I was running the same workflow that worked before without > any errors > > > > - I got this errors on my stdout: > > > > > > > > Exception occured in the exception handling code, so it cannot be > properly > > > > propagated to the user > > > > java.lang.IllegalArgumentException: This socket does not seem to > exist in > > > > the socket pool. > > > > at org.globus.ftp.dc.SocketPool.remove(SocketPool.java:78) > > > > at > > > > > > > > org.globus.ftp.dc.GridFTPTransferSinkThread.shutdown(GridFTPTransferSinkThread.java:87) > > > > at > > > > org.globus.ftp.dc.TransferSinkThread.run(TransferSinkThread.java:78) > > > > > > > > > > > > And then these (also a lot of them): > > > > java.lang.OutOfMemoryErrorjava.lang.OutOfMemoryError > > > > java.lang.OutOfMemoryError > > > > > > > > The is on wiggum in > > > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-lor82qgcceac0.log > and I am > > > > suing this swift code/verison: /sandbox/nefedova/SWIFT/vdsk-0.1rc2/ > > > > > > > > Nika > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > From hategan at mcs.anl.gov Wed Mar 21 10:31:16 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 10:31:16 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> Message-ID: <1174491076.7102.0.camel@blabla.mcs.anl.gov> On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > Hi, Mihael: > > I have these properties modified in my scheduler.xml file: > > > > > Are you suggesting to add also this inside ... : > > ? > > Do these set parameters guarantee me that: > > 1. I have no more then 384 jobs in a queue at any time > and > 2. Jobs are submitted to the queue with at least 1 sec delay No. They don't. But they may get you closer to that. > > (these are the requirements from TG NCSA). > > Thanks! > > Nika > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > >There is no direct rate limiter unfortunately. There is a submit > >throttle which tells the number of concurrent submissions. Setting that > >to 1 may work. > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > Hi, Mihael: > > > > > > how do I set this throttling parameter ? > > > > > > Thanks, > > > > > > Nika > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > > > >To: nefedova at mcs.anl.gov > > > >From: consult at ncsa.uiuc.edu > > > >Cc: > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > >Sender: Nobody > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > >X-NCSA-MailScanner-Information: Please contact help at ncsa.uiuc.edu for > > more > > > >information, amantadine.ncsa.uiuc.edu > > > >X-NCSA-MailScanner: Found to be clean > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > mailgw.mcs.anl.gov > > > > > > > >FROM: Arnold, Galen > > > >(Concerning ticket No. 137212) > > > > > > > >Veronika, > > > > > > > >If you can throttle the job submission so that there's more than 1 second > > > >between them, that would probably help us out. > > > > > > > >-Galen > > > > > > > >Veronika V. 
Nefedova writes: > > > > >Hi, Galen: > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same time. By my > > > > >estimation I had no more then 136 jobs in the queue. What are other > > limits > > > > >I should keep in mind ? > > > > > > > > > >Thanks, > > > > > > > > > >Veronika > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > >>FROM: Arnold, Galen > > > > >>(Concerning ticket No. 137212) > > > > >> > > > > >>Veronika, > > > > >> > > > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 > > like this > > > > >> > > > > >>Notice: 5: Authenticated globus user: > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > >>Nefedova 137823 > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > >>Notice: 5: Authorized as local user: nefedova > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > >>Notice: 5: and local gid: 11467 > > > > >>Notice: 0: executing > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > >>Notice: 0: Child 16725 started > > > > >>Notice: 5: Authenticated globus user: > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > >>Nefedova 137823 > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > >>Notice: 5: Authorized as local user: nefedova > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > >>Notice: 5: and local gid: 11467 > > > > >>Notice: 0: executing > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > >>Notice: 0: Child 16726 started > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16769 > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > >> > > > > >> > > > > >>...your DN is also popular in the globus-gatekeeper.log. > > > > >> > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika > > Nefedova' > > > > >>globus-gatekeeper.log |wc -l > > > > >> 15215 > > > > >> > > > > >> > > > > >>The large number of connects around the same time: > > > > >> > > > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > > starting | > > > > >> grep > > > > >>'Mar 16 09:3' | wc -l > > > > >> 65 > > > > >>... > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=15872 > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16644 > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16769 > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16926 > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > >>... > > > > >> > > > > >> > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > >> > > > > >>-Galen > > > > >> > > > > >> > > > > >>Veronika V. Nefedova writes: > > > > >> >Hi, Dave: > > > > >> > > > > > >> >Could you please give me a bit more information on what has > > happened ? > > > > >> > > > > > >> >Thanks, > > > > >> > > > > > >> >Nika > > > > >> > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > >> >>FROM: McWilliams, David G > > > > >> >>(Concerning ticket No. 
137212) > > > > >> >> > > > > >> >>Veronika, > > > > >> >> > > > > >> >>The system administrator had to kill several of your globus > > > > processes today > > > > >> >>because the load average on the node was over 25. Below is a list of > > > > >> processes > > > > >> >>that were killed. Please let us know if you need help > > identifying the > > > > >> case of > > > > >> >>the problem. > > > > >> >> > > > > >> >>Regards, > > > > >> >> > > > > >> >>Dave McWilliams (217) > > 244-1144 consult at ncsa.uiuc.edu > > > > >> >>NCSA Consulting > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > >> >>----------------------------------------------------------------- > > ---- > > > > ----- > > > > >> >> > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11402 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11480 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11522 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11602 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11668 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11715 1 0 09:33 ? 
00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11785 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11813 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11892 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 11948 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12004 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12030 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12133 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12172 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12253 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12256 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12443 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12460 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12504 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12534 1 0 09:33 ? 
00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12657 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12668 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12773 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12806 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12892 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 12946 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13023 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13072 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13142 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13233 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13245 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13352 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13379 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13474 1 0 09:33 ? 
00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13488 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13618 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13663 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13743 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13820 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13887 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 13952 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14046 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14048 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14172 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14196 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14319 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14366 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14430 1 0 09:33 ? 
00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14539 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14572 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14703 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14725 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14832 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 14849 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15009 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15017 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15164 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15165 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15322 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15332 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15526 1 0 09:33 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15544 1 0 09:33 ? 
00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15671 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15672 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15787 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15842 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15981 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 15982 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16131 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16132 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16320 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16358 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16553 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16622 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16725 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16726 1 0 09:34 ? 
00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16845 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16925 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 16989 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 17095 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 17262 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 17305 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 17375 1 0 09:34 ? 00:00:02 > > globus-job-manager -conf > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > >> pbs_gcc -rdn > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > .pl -m > > > > >> >>pbs_gcc > > > > >> >>-f /tmp/gram_1Drc0X -c poll > > > > >> >>nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > .pl -m > > > > >> >>pbs_gcc > > > > >> >>-f /tmp/gram_fSWNvt -c poll > > > > >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > .pl -m > > > > >> >>pbs_gcc > > > > >> >>-f /tmp/gram_a3whWx -c poll > > > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > .pl -m > > > > >> >>pbs_gcc > > > > >> >>-f /tmp/gram_zz0JuC -c poll > > > > >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > .pl -m > > > > >> >>pbs_gcc > > > > >> >>-f /tmp/gram_5jGFI1 -c poll > > > > >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org > > > > >> >>2>/dev/null > > > > >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > >> >>905629.tg-master.ncsa.teragrid.org > > > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > >> >>905638.tg-master.ncsa.teragrid.org > > > > >> >>nefedova 31629 31560 0 13:35 ? 
00:00:00 sh -c > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > tg-master.ncsa.teragrid.org > > > > >> 15001 3 > > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > tg-master.ncsa.teragrid.org > > > > >> 15001 3 > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > From hategan at mcs.anl.gov Wed Mar 21 10:32:23 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 10:32:23 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <1174491076.7102.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> Message-ID: <1174491143.7183.0.camel@blabla.mcs.anl.gov> On Wed, 2007-03-21 at 10:31 -0500, Mihael Hategan wrote: > On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > Hi, Mihael: > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > Are you suggesting to add also this inside ... : > > > > ? > > > > Do these set parameters guarantee me that: > > > > 1. I have no more then 384 jobs in a queue at any time That actually is guaranteed. > > and > > 2. Jobs are submitted to the queue with at least 1 sec delay > > No. They don't. But they may get you closer to that. > > > > > (these are the requirements from TG NCSA). > > > > Thanks! > > > > Nika > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > >There is no direct rate limiter unfortunately. There is a submit > > >throttle which tells the number of concurrent submissions. Setting that > > >to 1 may work. > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > Hi, Mihael: > > > > > > > > how do I set this throttling parameter ? > > > > > > > > Thanks, > > > > > > > > Nika > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > > > > >To: nefedova at mcs.anl.gov > > > > >From: consult at ncsa.uiuc.edu > > > > >Cc: > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > >Sender: Nobody > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > >X-NCSA-MailScanner-Information: Please contact help at ncsa.uiuc.edu for > > > more > > > > >information, amantadine.ncsa.uiuc.edu > > > > >X-NCSA-MailScanner: Found to be clean > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > mailgw.mcs.anl.gov > > > > > > > > > >FROM: Arnold, Galen > > > > >(Concerning ticket No. 137212) > > > > > > > > > >Veronika, > > > > > > > > > >If you can throttle the job submission so that there's more than 1 second > > > > >between them, that would probably help us out. > > > > > > > > > >-Galen > > > > > > > > > >Veronika V. Nefedova writes: > > > > > >Hi, Galen: > > > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same time. By my > > > > > >estimation I had no more then 136 jobs in the queue. What are other > > > limits > > > > > >I should keep in mind ? > > > > > > > > > > > >Thanks, > > > > > > > > > > > >Veronika > > > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > >>FROM: Arnold, Galen > > > > > >>(Concerning ticket No. 
137212) > > > > > >> > > > > > >>Veronika, > > > > > >> > > > > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 > > > like this > > > > > >> > > > > > >>Notice: 5: Authenticated globus user: > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > >>Nefedova 137823 > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > >>Notice: 5: and local gid: 11467 > > > > > >>Notice: 0: executing > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > >>Notice: 0: Child 16725 started > > > > > >>Notice: 5: Authenticated globus user: > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > >>Nefedova 137823 > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > >>Notice: 5: and local gid: 11467 > > > > > >>Notice: 0: executing > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > >>Notice: 0: Child 16726 started > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16769 > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > >> > > > > > >> > > > > > >>...your DN is also popular in the globus-gatekeeper.log. > > > > > >> > > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika > > > Nefedova' > > > > > >>globus-gatekeeper.log |wc -l > > > > > >> 15215 > > > > > >> > > > > > >> > > > > > >>The large number of connects around the same time: > > > > > >> > > > > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > > > starting | > > > > > >> grep > > > > > >>'Mar 16 09:3' | wc -l > > > > > >> 65 > > > > > >>... > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=15872 > > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16644 > > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16769 > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16926 > > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > > >>... > > > > > >> > > > > > >> > > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > > >> > > > > > >>-Galen > > > > > >> > > > > > >> > > > > > >>Veronika V. Nefedova writes: > > > > > >> >Hi, Dave: > > > > > >> > > > > > > >> >Could you please give me a bit more information on what has > > > happened ? > > > > > >> > > > > > > >> >Thanks, > > > > > >> > > > > > > >> >Nika > > > > > >> > > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > >> >>FROM: McWilliams, David G > > > > > >> >>(Concerning ticket No. 137212) > > > > > >> >> > > > > > >> >>Veronika, > > > > > >> >> > > > > > >> >>The system administrator had to kill several of your globus > > > > > processes today > > > > > >> >>because the load average on the node was over 25. Below is a list of > > > > > >> processes > > > > > >> >>that were killed. 
Please let us know if you need help > > > identifying the > > > > > >> case of > > > > > >> >>the problem. > > > > > >> >> > > > > > >> >>Regards, > > > > > >> >> > > > > > >> >>Dave McWilliams (217) > > > 244-1144 consult at ncsa.uiuc.edu > > > > > >> >>NCSA Consulting > > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > > >> >>----------------------------------------------------------------- > > > ---- > > > > > ----- > > > > > >> >> > > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11402 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11480 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11522 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11602 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11668 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11715 1 0 09:33 ? 
00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11785 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11813 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11892 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11948 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12004 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12030 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12133 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12172 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12253 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12256 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12443 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12460 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 12504 1 0 09:33 ? 
00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>[further identical globus-job-manager entries trimmed] > > > > > >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > > .pl -m > > > > > >> >>pbs_gcc > > > > > >> >>-f /tmp/gram_1Drc0X -c poll > > > > > >> >>nefedova 31349 11295 4 13:35 ?
00:00:00 /usr/bin/perl > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > > .pl -m > > > > > >> >>pbs_gcc > > > > > >> >>-f /tmp/gram_fSWNvt -c poll > > > > > >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > > .pl -m > > > > > >> >>pbs_gcc > > > > > >> >>-f /tmp/gram_a3whWx -c poll > > > > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > > .pl -m > > > > > >> >>pbs_gcc > > > > > >> >>-f /tmp/gram_zz0JuC -c poll > > > > > >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script > > > .pl -m > > > > > >> >>pbs_gcc > > > > > >> >>-f /tmp/gram_5jGFI1 -c poll > > > > > >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org > > > > > >> >>2>/dev/null > > > > > >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > >> >>905629.tg-master.ncsa.teragrid.org > > > > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > >> >>905638.tg-master.ncsa.teragrid.org > > > > > >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > tg-master.ncsa.teragrid.org > > > > > >> 15001 3 > > > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > tg-master.ncsa.teragrid.org > > > > > >> 15001 3 > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 21 10:37:04 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 10:37:04 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <1174491076.7102.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> Ok. Hmmm. I am about to submit a large run (50 molecules), which could have as many as 3500 jobs per tier. I really would like to be sure that I do not break TG. I want to play it as safe as possible, so I'd like to make sure that I set all the possible parameters to safeguard the run. Thanks, Nika At 10:31 AM 3/21/2007, Mihael Hategan wrote: >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > Hi, Mihael: > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > Are you suggesting to add also this inside ... : > > > > ? > > > > Do these set parameters guarantee me that: > > > > 1. I have no more then 384 jobs in a queue at any time > > and > > 2. Jobs are submitted to the queue with at least 1 sec delay > >No. They don't. But they may get you closer to that. > > > > > (these are the requirements from TG NCSA). > > > > Thanks! > > > > Nika > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > >There is no direct rate limiter unfortunately.
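(Editorial aside, not part of the original message: a quick back-of-the-envelope check, in Python, of what the limits quoted above mean for a run of this size. The three input numbers are taken directly from the thread, 3500 jobs per tier, 384 jobs allowed in PBS at once, and NCSA's request for more than one second between submissions; everything else is illustration.)

    # Rough scale check for the run described above (editorial sketch).
    jobs_per_tier = 3500    # "as many as 3500 jobs per tier"
    max_in_queue = 384      # "I could have 384 jobs in PBS at the same time"
    min_gap_s = 1.0         # NCSA: "more than 1 second between them"

    # Serial submission at one job per second already takes close to an hour
    # per tier, before any queue wait or execution time:
    print(jobs_per_tier * min_gap_s / 60)    # about 58 minutes of pure submission

    # A tier is also roughly nine times larger than the queue cap, so most of
    # the tier has to be held back on the client side rather than sit in PBS:
    print(jobs_per_tier / max_in_queue)      # about 9.1

So a safe configuration has to do two different things at once: keep the number of jobs sitting in PBS under the 384 cap, and keep individual submissions spaced out. As the quoted exchange continues below, the throttles address the first and only approximate the second.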
There is a submit > > >throttle which tells the number of concurrent submissions. Setting that > > >to 1 may work. > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > Hi, Mihael: > > > > > > > > how do I set this throttling parameter ? > > > > > > > > Thanks, > > > > > > > > Nika > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > > > > >To: nefedova at mcs.anl.gov > > > > >From: consult at ncsa.uiuc.edu > > > > >Cc: > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > >Sender: Nobody > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > >X-NCSA-MailScanner-Information: Please contact help at ncsa.uiuc.edu for > > > more > > > > >information, amantadine.ncsa.uiuc.edu > > > > >X-NCSA-MailScanner: Found to be clean > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > mailgw.mcs.anl.gov > > > > > > > > > >FROM: Arnold, Galen > > > > >(Concerning ticket No. 137212) > > > > > > > > > >Veronika, > > > > > > > > > >If you can throttle the job submission so that there's more than 1 > second > > > > >between them, that would probably help us out. > > > > > > > > > >-Galen > > > > > > > > > >Veronika V. Nefedova writes: > > > > > >Hi, Galen: > > > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same time. By my > > > > > >estimation I had no more then 136 jobs in the queue. What are other > > > limits > > > > > >I should keep in mind ? > > > > > > > > > > > >Thanks, > > > > > > > > > > > >Veronika > > > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > >>FROM: Arnold, Galen > > > > > >>(Concerning ticket No. 137212) > > > > > >> > > > > > >>Veronika, > > > > > >> > > > > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 > > > like this > > > > > >> > > > > > >>Notice: 5: Authenticated globus user: > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > >>Nefedova 137823 > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > >>Notice: 5: and local gid: 11467 > > > > > >>Notice: 0: executing > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > >>Notice: 0: Child 16725 started > > > > > >>Notice: 5: Authenticated globus user: > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > >>Nefedova 137823 > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > >>Notice: 5: and local gid: 11467 > > > > > >>Notice: 0: executing > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > >>Notice: 0: Child 16726 started > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16769 > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > >> > > > > > >> > > > > > >>...your DN is also popular in the globus-gatekeeper.log. 
> > > > > >> > > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika > > > Nefedova' > > > > > >>globus-gatekeeper.log |wc -l > > > > > >> 15215 > > > > > >> > > > > > >> > > > > > >>The large number of connects around the same time: > > > > > >> > > > > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > > > starting | > > > > > >> grep > > > > > >>'Mar 16 09:3' | wc -l > > > > > >> 65 > > > > > >>... > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=15872 > > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16644 > > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16769 > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > pid=16926 > > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > > >>... > > > > > >> > > > > > >> > > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > > >> > > > > > >>-Galen > > > > > >> > > > > > >> > > > > > >>Veronika V. Nefedova writes: > > > > > >> >Hi, Dave: > > > > > >> > > > > > > >> >Could you please give me a bit more information on what has > > > happened ? > > > > > >> > > > > > > >> >Thanks, > > > > > >> > > > > > > >> >Nika > > > > > >> > > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > >> >>FROM: McWilliams, David G > > > > > >> >>(Concerning ticket No. 137212) > > > > > >> >> > > > > > >> >>Veronika, > > > > > >> >> > > > > > >> >>The system administrator had to kill several of your globus > > > > > processes today > > > > > >> >>because the load average on the node was over 25. Below is a > list of > > > > > >> processes > > > > > >> >>that were killed. Please let us know if you need help > > > identifying the > > > > > >> case of > > > > > >> >>the problem. > > > > > >> >> > > > > > >> >>Regards, > > > > > >> >> > > > > > >> >>Dave McWilliams (217) > > > 244-1144 consult at ncsa.uiuc.edu > > > > > >> >>NCSA Consulting > > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > > >> >>------------------------------------------------------------- > ---- > > > ---- > > > > > ----- > > > > > >> >> > > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>nefedova 11295 1 0 09:33 ? 
00:00:02 > > > globus-job-manager -conf > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > -type > > > > > >> pbs_gcc -rdn > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > >> >>[the remaining globus-job-manager, globus-job-manager-script.pl and qstat entries in this quoted listing are identical to those quoted above and are trimmed] > > > > > >> >>nefedova 31629 31560 0 13:35 ?
00:00:00 sh -c > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > tg-master.ncsa.teragrid.org > > > > > >> 15001 3 > > > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > tg-master.ncsa.teragrid.org > > > > > >> 15001 3 > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > From hategan at mcs.anl.gov Wed Mar 21 10:46:20 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 10:46:20 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> Message-ID: <1174491980.7455.7.camel@blabla.mcs.anl.gov> I think these should be ok. Unfortunately I can't tell you what a solution to "as safe as possible" is because of two things: 1. The explanation of why your jobs got killed and the solution they proposed are ambiguous. They don't explain much. So the proposed solution may be insufficient or it may be superfluous. 2. We don't exactly have submission rate limiters. The closest thing is the submission concurrency limiter. Setting this to 1 should work, because this will ensure that at most one job manager will do the submission dance at a time. Mihael On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote: > Ok. Hmmm. I am about to submit a large run (50 molecules), which could have > as many as 3500 jobs per tier. I really would like to be sure that I do not > break TG. I want to play it as safe as possible, so I'd like to make sure > that I set all the possible parameters to safeguard the run. > > Thanks, > > Nika > > At 10:31 AM 3/21/2007, Mihael Hategan wrote: > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > > Hi, Mihael: > > > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > > > > > > Are you suggesting to add also this inside ... : > > > > > > ? > > > > > > Do these set parameters guarantee me that: > > > > > > 1. I have no more then 384 jobs in a queue at any time > > > and > > > 2. Jobs are submitted to the queue with at least 1 sec delay > > > >No. They don't. But they may get you closer to that. > > > > > > > > (these are the requirements from TG NCSA). > > > > > > Thanks! > > > > > > Nika > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > > >There is no direct rate limiter unfortunately. There is a submit > > > >throttle which tells the number of concurrent submissions. Setting that > > > >to 1 may work. > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > > Hi, Mihael: > > > > > > > > > > how do I set this throttling parameter ?
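(Another editorial aside, to make the distinction in Mihael's two points concrete. The sketch below is generic Python, not Swift or CoG/Karajan code; the class and parameter names are invented for illustration. A concurrency throttle set to 1 serializes the job managers' submission dance, which usually spreads submissions out, but only an explicit minimum-interval limiter would actually guarantee the one-second spacing NCSA asked for. Later Swift releases document the concurrency side as the throttle.submit and throttle.host.submit properties; whether the scheduler.xml file discussed above used the same names is not confirmed by this thread.)

    # Editorial sketch: concurrency limiter vs. rate limiter (not Swift code).
    import threading
    import time

    class ConcurrencyThrottle:
        """At most `limit` submissions in flight at once, which is what the
        submit throttle controls.  With limit=1 submissions are serialized,
        but no minimum gap between consecutive submissions is enforced."""
        def __init__(self, limit=1):
            self._slots = threading.Semaphore(limit)

        def submit(self, do_submit):
            with self._slots:
                do_submit()   # the GRAM "submission dance" happens here

    class RateLimiter:
        """At least `min_interval` seconds between consecutive submissions,
        the guarantee NCSA asked for and which, per the message above, Swift
        did not expose at the time."""
        def __init__(self, min_interval=1.0):
            self._min_interval = min_interval
            self._last = 0.0
            self._lock = threading.Lock()

        def submit(self, do_submit):
            with self._lock:
                wait = self._min_interval - (time.monotonic() - self._last)
                if wait > 0:
                    time.sleep(wait)
                self._last = time.monotonic()
            do_submit()

Setting the concurrency limit to 1 is therefore a reasonable approximation: each submission has to finish before the next one starts, so the gatekeeper's own latency, rather than a timer, is what provides the spacing.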
> > > > > > > > > > Thanks, > > > > > > > > > > Nika > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > > > > > >To: nefedova at mcs.anl.gov > > > > > >From: consult at ncsa.uiuc.edu > > > > > >Cc: > > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > > >Sender: Nobody > > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > > >X-NCSA-MailScanner-Information: Please contact help at ncsa.uiuc.edu for > > > > more > > > > > >information, amantadine.ncsa.uiuc.edu > > > > > >X-NCSA-MailScanner: Found to be clean > > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > > mailgw.mcs.anl.gov > > > > > > > > > > > >FROM: Arnold, Galen > > > > > >(Concerning ticket No. 137212) > > > > > > > > > > > >Veronika, > > > > > > > > > > > >If you can throttle the job submission so that there's more than 1 > > second > > > > > >between them, that would probably help us out. > > > > > > > > > > > >-Galen > > > > > > > > > > > >Veronika V. Nefedova writes: > > > > > > >Hi, Galen: > > > > > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same time. By my > > > > > > >estimation I had no more then 136 jobs in the queue. What are other > > > > limits > > > > > > >I should keep in mind ? > > > > > > > > > > > > > >Thanks, > > > > > > > > > > > > > >Veronika > > > > > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > >>FROM: Arnold, Galen > > > > > > >>(Concerning ticket No. 137212) > > > > > > >> > > > > > > >>Veronika, > > > > > > >> > > > > > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 > > > > like this > > > > > > >> > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > >>Nefedova 137823 > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > >>Notice: 0: executing > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > >>Notice: 0: Child 16725 started > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > >>Nefedova 137823 > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > >>Notice: 0: executing > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > >>Notice: 0: Child 16726 started > > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > pid=16769 > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > >> > > > > > > >> > > > > > > >>...your DN is also popular in the globus-gatekeeper.log. 
> > > > > > >> > > > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika > > > > Nefedova' > > > > > > >>globus-gatekeeper.log |wc -l > > > > > > >> 15215 > > > > > > >> > > > > > > >> > > > > > > >>The large number of connects around the same time: > > > > > > >> > > > > > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > > > > starting | > > > > > > >> grep > > > > > > >>'Mar 16 09:3' | wc -l > > > > > > >> 65 > > > > > > >>... > > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > pid=15872 > > > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > pid=16644 > > > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > pid=16769 > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > pid=16926 > > > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > > > >>... > > > > > > >> > > > > > > >> > > > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > > > >> > > > > > > >>-Galen > > > > > > >> > > > > > > >> > > > > > > >>Veronika V. Nefedova writes: > > > > > > >> >Hi, Dave: > > > > > > >> > > > > > > > >> >Could you please give me a bit more information on what has > > > > happened ? > > > > > > >> > > > > > > > >> >Thanks, > > > > > > >> > > > > > > > >> >Nika > > > > > > >> > > > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > >> >>FROM: McWilliams, David G > > > > > > >> >>(Concerning ticket No. 137212) > > > > > > >> >> > > > > > > >> >>Veronika, > > > > > > >> >> > > > > > > >> >>The system administrator had to kill several of your globus > > > > > > processes today > > > > > > >> >>because the load average on the node was over 25. Below is a > > list of > > > > > > >> processes > > > > > > >> >>that were killed. Please let us know if you need help > > > > identifying the > > > > > > >> case of > > > > > > >> >>the problem. > > > > > > >> >> > > > > > > >> >>Regards, > > > > > > >> >> > > > > > > >> >>Dave McWilliams (217) > > > > 244-1144 consult at ncsa.uiuc.edu > > > > > > >> >>NCSA Consulting > > > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > > > >> >>------------------------------------------------------------- > > ---- > > > > ---- > > > > > > ----- > > > > > > >> >> > > > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11277 1 0 09:33 ? 
00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11402 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11480 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11522 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11602 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11668 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11715 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11785 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11813 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11892 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 11948 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12004 1 0 09:33 ? 
00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12030 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12133 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12172 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12253 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12256 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12443 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12460 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12504 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12534 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12657 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12668 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12773 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12806 1 0 09:33 ? 
00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12892 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 12946 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13023 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13072 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13142 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13233 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13245 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13352 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13379 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13474 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13488 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13618 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13663 1 0 09:33 ? 
00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13743 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13820 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13887 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 13952 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14046 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14048 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14172 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14196 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14319 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14366 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14430 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14539 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14572 1 0 09:33 ? 
00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14703 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14725 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14832 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 14849 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15009 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15017 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15164 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15165 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15322 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15332 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15526 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15544 1 0 09:33 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15671 1 0 09:34 ? 
00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15672 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15787 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15842 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15981 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 15982 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16131 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16132 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16320 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16358 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16553 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16622 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16725 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16726 1 0 09:34 ? 
00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16845 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16925 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 16989 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 17095 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 17262 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 17305 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 17375 1 0 09:34 ? 00:00:02 > > > > globus-job-manager -conf > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > -type > > > > > > >> pbs_gcc -rdn > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-sc > > ript > > > > .pl -m > > > > > > >> >>pbs_gcc > > > > > > >> >>-f /tmp/gram_1Drc0X -c poll > > > > > > >> >>nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-sc > > ript > > > > .pl -m > > > > > > >> >>pbs_gcc > > > > > > >> >>-f /tmp/gram_fSWNvt -c poll > > > > > > >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-sc > > ript > > > > .pl -m > > > > > > >> >>pbs_gcc > > > > > > >> >>-f /tmp/gram_a3whWx -c poll > > > > > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-sc > > ript > > > > .pl -m > > > > > > >> >>pbs_gcc > > > > > > >> >>-f /tmp/gram_zz0JuC -c poll > > > > > > >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-sc > > ript > > > > .pl -m > > > > > > >> >>pbs_gcc > > > > > > >> >>-f /tmp/gram_5jGFI1 -c poll > > > > > > >> >>nefedova 31512 31347 0 13:35 ? 
00:00:00 sh -c > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > 905629.tg-master.ncsa.teragrid.org > > > > > > >> >>2>/dev/null > > > > > > >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > >> >>905629.tg-master.ncsa.teragrid.org > > > > > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > >> >>905638.tg-master.ncsa.teragrid.org > > > > > > >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > tg-master.ncsa.teragrid.org > > > > > > >> 15001 3 > > > > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > tg-master.ncsa.teragrid.org > > > > > > >> 15001 3 > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > >

From nefedova at mcs.anl.gov  Wed Mar 21 10:51:28 2007
From: nefedova at mcs.anl.gov (Veronika V. Nefedova)
Date: Wed, 21 Mar 2007 10:51:28 -0500
Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury)
In-Reply-To: <1174491980.7455.7.camel@blabla.mcs.anl.gov>
References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> <1174491980.7455.7.camel@blabla.mcs.anl.gov>
Message-ID: <6.0.0.22.2.20070321104830.0442dac0@mail.mcs.anl.gov>

OK. So if I set this submitThrottle to 1 it will submit jobs one at a time
(and it won't wait for the previous jobs to finish)? What will be an
indication for swift to go ahead and submit the next jobs (time delay?)?
If that's so - then I think I am ok.

Thanks again,

Nika

At 10:46 AM 3/21/2007, Mihael Hategan wrote:
>I think these should be ok. Unfortunately I can't tell you what a
>solution to "as safe as possible" is because of two things:
>1. The explanation of why your jobs got killed and the solution they
>proposed are ambiguous. They don't explain much. So the proposed
>solution may be insufficient or it may superfluous.
>2. We don't exactly have submission rate limiters. The closest thing is
>the submission concurrency limiter. Setting this to 1 should work,
>because this will ensure that at most one job manager will do the
>submission dance at a time.
>
>Mihael
>
>On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote:
> > Ok. Hmmm. I am about to submit a large run (50 molecules), which could have
> > as many as 3500 jobs per tier. I really would like to be sure that I do not
> > brake TG. I want to play as safe as possible thus I'd like to make sure
> > that I set all the possible parameters to safeguard the run ?
> >
> > Thanks,
> >
> > Nika
> >
> > At 10:31 AM 3/21/2007, Mihael Hategan wrote:
> > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote:
> > > > Hi, Mihael:
> > > >
> > > > I have these properties modified in my scheduler.xml file:
> > > >
> > > >
> > > >
> > > >
> > > > Are you suggesting to add also this inside ... :
> > > >
> > > > ?
> > > > Do these set parameters guarantee me that:
> > > >
> > > > 1. I have no more then 384 jobs in a queue at any time
> > > > and
> > > > 2. Jobs are submitted to the queue with at least 1 sec delay
> > >
> > >No.
They don't. But they may get you closer to that. > > > > > > > > > > > (these are the requirements from TG NCSA). > > > > > > > > Thanks! > > > > > > > > Nika > > > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > > > >There is no direct rate limiter unfortunately. There is a submit > > > > >throttle which tells the number of concurrent submissions. Setting > that > > > > >to 1 may work. > > > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > > > Hi, Mihael: > > > > > > > > > > > > how do I set this throttling parameter ? > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Nika > > > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > > > > > > >To: nefedova at mcs.anl.gov > > > > > > >From: consult at ncsa.uiuc.edu > > > > > > >Cc: > > > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > > > >Sender: Nobody > > > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > > > >X-NCSA-MailScanner-Information: Please contact > help at ncsa.uiuc.edu for > > > > > more > > > > > > >information, amantadine.ncsa.uiuc.edu > > > > > > >X-NCSA-MailScanner: Found to be clean > > > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > > > mailgw.mcs.anl.gov > > > > > > > > > > > > > >FROM: Arnold, Galen > > > > > > >(Concerning ticket No. 137212) > > > > > > > > > > > > > >Veronika, > > > > > > > > > > > > > >If you can throttle the job submission so that there's more > than 1 > > > second > > > > > > >between them, that would probably help us out. > > > > > > > > > > > > > >-Galen > > > > > > > > > > > > > >Veronika V. Nefedova writes: > > > > > > > >Hi, Galen: > > > > > > > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same > time. By my > > > > > > > >estimation I had no more then 136 jobs in the queue. What > are other > > > > > limits > > > > > > > >I should keep in mind ? > > > > > > > > > > > > > > > >Thanks, > > > > > > > > > > > > > > > >Veronika > > > > > > > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > >>FROM: Arnold, Galen > > > > > > > >>(Concerning ticket No. 
137212) > > > > > > > >> > > > > > > > >>Veronika, > > > > > > > >> > > > > > > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 > > > > > like this > > > > > > > >> > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > >>Nefedova 137823 > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > >>Notice: 0: executing > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > >>Notice: 0: Child 16725 started > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > >>Nefedova 137823 > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > >>Notice: 0: executing > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > >>Notice: 0: Child 16726 started > > > > > > > >>Notice: 6: > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > pid=16769 > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > >> > > > > > > > >> > > > > > > > >>...your DN is also popular in the globus-gatekeeper.log. > > > > > > > >> > > > > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep > 'CN=Veronika > > > > > Nefedova' > > > > > > > >>globus-gatekeeper.log |wc -l > > > > > > > >> 15215 > > > > > > > >> > > > > > > > >> > > > > > > > >>The large number of connects around the same time: > > > > > > > >> > > > > > > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > > > > > starting | > > > > > > > >> grep > > > > > > > >>'Mar 16 09:3' | wc -l > > > > > > > >> 65 > > > > > > > >>... > > > > > > > >>Notice: 6: > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > pid=15872 > > > > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > > > > >>Notice: 6: > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > pid=16644 > > > > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > > > > >>Notice: 6: > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > pid=16769 > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > >>Notice: 6: > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > pid=16926 > > > > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > > > > >>... > > > > > > > >> > > > > > > > >> > > > > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > > > > >> > > > > > > > >>-Galen > > > > > > > >> > > > > > > > >> > > > > > > > >>Veronika V. Nefedova writes: > > > > > > > >> >Hi, Dave: > > > > > > > >> > > > > > > > > >> >Could you please give me a bit more information on what has > > > > > happened ? > > > > > > > >> > > > > > > > > >> >Thanks, > > > > > > > >> > > > > > > > > >> >Nika > > > > > > > >> > > > > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > >> >>FROM: McWilliams, David G > > > > > > > >> >>(Concerning ticket No. 
137212) > > > > > > > >> >> > > > > > > > >> >>Veronika, > > > > > > > >> >> > > > > > > > >> >>The system administrator had to kill several of your globus > > > > > > > processes today > > > > > > > >> >>because the load average on the node was over 25. Below > is a > > > list of > > > > > > > >> processes > > > > > > > >> >>that were killed. Please let us know if you need help > > > > > identifying the > > > > > > > >> case of > > > > > > > >> >>the problem. > > > > > > > >> >> > > > > > > > >> >>Regards, > > > > > > > >> >> > > > > > > > >> >>Dave McWilliams (217) > > > > > 244-1144 consult at ncsa.uiuc.edu > > > > > > > >> >>NCSA Consulting > > > > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > > > > >> >>--------------------------------------------------------- > ---- > > > ---- > > > > > ---- > > > > > > > ----- > > > > > > > >> >> > > > > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11402 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11480 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11522 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11602 1 0 09:33 ? 
00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11668 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11715 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11785 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11813 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11892 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 11948 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12004 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12030 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12133 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12172 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12253 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12256 1 0 09:33 ? 
00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12443 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12460 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12504 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12534 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12657 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12668 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12773 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12806 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12892 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 12946 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13023 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13072 1 0 09:33 ? 
00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13142 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13233 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13245 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13352 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13379 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13474 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13488 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13618 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13663 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13743 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13820 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13887 1 0 09:33 ? 
00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 13952 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14046 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14048 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14172 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14196 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14319 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14366 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14430 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14539 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14572 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14703 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14725 1 0 09:33 ? 
00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14832 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 14849 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15009 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15017 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15164 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15165 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15322 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15332 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15526 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15544 1 0 09:33 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15671 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15672 1 0 09:34 ? 
00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15787 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15842 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15981 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 15982 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16131 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16132 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16320 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16358 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16553 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16622 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16725 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16726 1 0 09:34 ? 
00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16845 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16925 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 16989 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 17095 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 17262 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 17305 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 17375 1 0 09:34 ? 00:00:02 > > > > > globus-job-manager -conf > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > -type > > > > > > > >> pbs_gcc -rdn > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 > /usr/bin/perl > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > r-sc > > > ript > > > > > .pl -m > > > > > > > >> >>pbs_gcc > > > > > > > >> >>-f /tmp/gram_1Drc0X -c poll > > > > > > > >> >>nefedova 31349 11295 4 13:35 ? 00:00:00 > /usr/bin/perl > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > r-sc > > > ript > > > > > .pl -m > > > > > > > >> >>pbs_gcc > > > > > > > >> >>-f /tmp/gram_fSWNvt -c poll > > > > > > > >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 > /usr/bin/perl > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > r-sc > > > ript > > > > > .pl -m > > > > > > > >> >>pbs_gcc > > > > > > > >> >>-f /tmp/gram_a3whWx -c poll > > > > > > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 > /usr/bin/perl > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > r-sc > > > ript > > > > > .pl -m > > > > > > > >> >>pbs_gcc > > > > > > > >> >>-f /tmp/gram_zz0JuC -c poll > > > > > > > >> >>nefedova 31396 14832 4 13:35 ? 
00:00:00 > /usr/bin/perl > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > r-sc > > > ript > > > > > .pl -m > > > > > > > >> >>pbs_gcc > > > > > > > >> >>-f /tmp/gram_5jGFI1 -c poll > > > > > > > >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > 905629.tg-master.ncsa.teragrid.org > > > > > > > >> >>2>/dev/null > > > > > > > >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > > >> >>905629.tg-master.ncsa.teragrid.org > > > > > > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > > >> >>905638.tg-master.ncsa.teragrid.org > > > > > > > >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > > tg-master.ncsa.teragrid.org > > > > > > > >> 15001 3 > > > > > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > > tg-master.ncsa.teragrid.org > > > > > > > >> 15001 3 > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Wed Mar 21 11:00:38 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 11:00:38 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <6.0.0.22.2.20070321104830.0442dac0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> <1174491980.7455.7.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321104830.0442dac0@mail.mcs.anl.gov> Message-ID: <1174492838.8120.1.camel@blabla.mcs.anl.gov> On Wed, 2007-03-21 at 10:51 -0500, Veronika V. Nefedova wrote: > OK. So if I set this submitThrottle to 1 it will submit jobs one at time > (and it won't wait for the previous jobs to finish) ? Yes. > What will be an > indication for swift to go ahead and submit the next jobs (time delay?)? The fact that the submission of the previous job has been completed (ie, the job manager has put the job in the queue). > If > thats so - than I think I am ok. > > Thanks again, > > Nika > > At 10:46 AM 3/21/2007, Mihael Hategan wrote: > >I think these should be ok. Unfortunately I can't tell you what a > >solution to "as safe as possible" is because of two things: > >1. The explanation of why your jobs got killed and the solution they > >proposed are ambiguous. They don't explain much. So the proposed > >solution may be insufficient or it may superfluous. > >2. We don't exactly have submission rate limiters. The closest thing is > >the submission concurrency limiter. Setting this to 1 should work, > >because this will ensure that at most one job manager will do the > >submission dance at a time. > > > >Mihael > > > >On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote: > > > Ok. Hmmm. I am about to submit a large run (50 molecules), which could > > have > > > as many as 3500 jobs per tier. I really would like to be sure that I do > > not > > > brake TG. 
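To make the distinction drawn above concrete: a submit throttle of 1 only serializes submissions, so the next submission starts as soon as the previous job manager has queued its job, whereas the NCSA request is for a rate limit of at least one second between successive submissions. A minimal shell sketch of the two behaviors follows; submit_one and the job*.rsl names are placeholders for illustration only, not real Swift or Globus commands.

    # Submit throttle of 1: at most one submission in flight. The next
    # submission starts the moment the previous call returns, i.e. once the
    # job manager has put the job in the queue, so the gap between
    # submissions can still be far shorter than one second.
    for job in job*.rsl; do
        submit_one "$job"
    done

    # Rate limit: at least one second between successive submissions, which
    # is what the site asked for and what, per the discussion above, Swift
    # does not expose directly.
    for job in job*.rsl; do
        submit_one "$job"
        sleep 1
    done
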
I want to play as safe as possible thus I'd like to make sure > > > that I set all the possible parameters to safeguard the run ? > > > > > > Thanks, > > > > > > Nika > > > > > > At 10:31 AM 3/21/2007, Mihael Hategan wrote: > > > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > > > > Hi, Mihael: > > > > > > > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > > > > > > > > > > > > > > > > Are you suggesting to add also this inside ... : > > > > > > > > > > ? > > > > > > > > > > Do these set parameters guarantee me that: > > > > > > > > > > 1. I have no more then 384 jobs in a queue at any time > > > > > and > > > > > 2. Jobs are submitted to the queue with at least 1 sec delay > > > > > > > >No. They don't. But they may get you closer to that. > > > > > > > > > > > > > > (these are the requirements from TG NCSA). > > > > > > > > > > Thanks! > > > > > > > > > > Nika > > > > > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > > > > >There is no direct rate limiter unfortunately. There is a submit > > > > > >throttle which tells the number of concurrent submissions. Setting > > that > > > > > >to 1 may work. > > > > > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > how do I set this throttling parameter ? > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > > > > > > > >To: nefedova at mcs.anl.gov > > > > > > > >From: consult at ncsa.uiuc.edu > > > > > > > >Cc: > > > > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > > > > >Sender: Nobody > > > > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > > > > >X-NCSA-MailScanner-Information: Please contact > > help at ncsa.uiuc.edu for > > > > > > more > > > > > > > >information, amantadine.ncsa.uiuc.edu > > > > > > > >X-NCSA-MailScanner: Found to be clean > > > > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > > > > mailgw.mcs.anl.gov > > > > > > > > > > > > > > > >FROM: Arnold, Galen > > > > > > > >(Concerning ticket No. 137212) > > > > > > > > > > > > > > > >Veronika, > > > > > > > > > > > > > > > >If you can throttle the job submission so that there's more > > than 1 > > > > second > > > > > > > >between them, that would probably help us out. > > > > > > > > > > > > > > > >-Galen > > > > > > > > > > > > > > > >Veronika V. Nefedova writes: > > > > > > > > >Hi, Galen: > > > > > > > > > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same > > time. By my > > > > > > > > >estimation I had no more then 136 jobs in the queue. What > > are other > > > > > > limits > > > > > > > > >I should keep in mind ? > > > > > > > > > > > > > > > > > >Thanks, > > > > > > > > > > > > > > > > > >Veronika > > > > > > > > > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > > >>FROM: Arnold, Galen > > > > > > > > >>(Concerning ticket No. 
137212) > > > > > > > > >> > > > > > > > > >>Veronika, > > > > > > > > >> > > > > > > > > >>We see many sequences in our globus-gatekeeper.log on tg-login1 > > > > > > like this > > > > > > > > >> > > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > > >>Nefedova 137823 > > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > > >>Notice: 0: executing > > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > > >>Notice: 0: Child 16725 started > > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > > >>Nefedova 137823 > > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > > >>Notice: 0: executing > > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > > >>Notice: 0: Child 16726 started > > > > > > > > >>Notice: 6: > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > pid=16769 > > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > > >> > > > > > > > > >> > > > > > > > > >>...your DN is also popular in the globus-gatekeeper.log. > > > > > > > > >> > > > > > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep > > 'CN=Veronika > > > > > > Nefedova' > > > > > > > > >>globus-gatekeeper.log |wc -l > > > > > > > > >> 15215 > > > > > > > > >> > > > > > > > > >> > > > > > > > > >>The large number of connects around the same time: > > > > > > > > >> > > > > > > > > >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > > > > > > starting | > > > > > > > > >> grep > > > > > > > > >>'Mar 16 09:3' | wc -l > > > > > > > > >> 65 > > > > > > > > >>... > > > > > > > > >>Notice: 6: > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > pid=15872 > > > > > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > > > > > >>Notice: 6: > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > pid=16644 > > > > > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > > > > > >>Notice: 6: > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > pid=16769 > > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > > >>Notice: 6: > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > pid=16926 > > > > > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > > > > > >>... > > > > > > > > >> > > > > > > > > >> > > > > > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > > > > > >> > > > > > > > > >>-Galen > > > > > > > > >> > > > > > > > > >> > > > > > > > > >>Veronika V. Nefedova writes: > > > > > > > > >> >Hi, Dave: > > > > > > > > >> > > > > > > > > > >> >Could you please give me a bit more information on what has > > > > > > happened ? 
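The pipeline quoted above counts gatekeeper starts in one hard-coded window ('Mar 16 09:3'). Assuming the same log format shown in the excerpt (lines ending in "starting at Fri Mar 16 09:34:12 2007"), the burst can also be bucketed per minute, which makes the spike easier to see. This is only a sketch built on that excerpt, not a supported diagnostic:

    # Count gatekeeper starts per minute for the DN in question
    # (hour:minute only, which is fine for a single day's log).
    grep --after-context=10 'CN=Veronika Nefedova' globus-gatekeeper.log \
      | grep 'starting at' \
      | awk '{ print $(NF-1) }' \
      | cut -d: -f1,2 \
      | sort | uniq -c | sort -rn | head
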
> > > > > > > > >> > > > > > > > > > >> >Thanks, > > > > > > > > >> > > > > > > > > > >> >Nika > > > > > > > > >> > > > > > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > > >> >>FROM: McWilliams, David G > > > > > > > > >> >>(Concerning ticket No. 137212) > > > > > > > > >> >> > > > > > > > > >> >>Veronika, > > > > > > > > >> >> > > > > > > > > >> >>The system administrator had to kill several of your globus > > > > > > > > processes today > > > > > > > > >> >>because the load average on the node was over 25. Below > > is a > > > > list of > > > > > > > > >> processes > > > > > > > > >> >>that were killed. Please let us know if you need help > > > > > > identifying the > > > > > > > > >> case of > > > > > > > > >> >>the problem. > > > > > > > > >> >> > > > > > > > > >> >>Regards, > > > > > > > > >> >> > > > > > > > > >> >>Dave McWilliams (217) > > > > > > 244-1144 consult at ncsa.uiuc.edu > > > > > > > > >> >>NCSA Consulting > > > > > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > > > > > >> >>--------------------------------------------------------- > > ---- > > > > ---- > > > > > > ---- > > > > > > > > ----- > > > > > > > > >> >> > > > > > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > > > > > globus-job-manager -conf > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > > -type > > > > > > > > >> pbs_gcc -rdn > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > > > > > globus-job-manager -conf > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > > -type > > > > > > > > >> pbs_gcc -rdn > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > > > > > > globus-job-manager -conf > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > > -type > > > > > > > > >> pbs_gcc -rdn > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > > > > > > globus-job-manager -conf > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > > -type > > > > > > > > >> pbs_gcc -rdn > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > > > > > > globus-job-manager -conf > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > > -type > > > > > > > > >> pbs_gcc -rdn > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > > > > > > globus-job-manager -conf > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > > -type > > > > > > > > >> pbs_gcc -rdn > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > >> >>nefedova 11402 1 0 09:33 ? 00:00:02 > > > > > > globus-job-manager -conf > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf > > > > -type > > > > > > > > >> pbs_gcc -rdn > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > >> >>nefedova 11480 1 0 09:33 ? 
00:00:00 > > /usr/bin/perl > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > > r-sc > > > > ript > > > > > > .pl -m > > > > > > > > >> >>pbs_gcc > > > > > > > > >> >>-f /tmp/gram_a3whWx -c poll > > > > > > > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 > > /usr/bin/perl > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > > r-sc > > > > ript > > > > > > .pl -m > > > > > > > > >> >>pbs_gcc > > > > > > > > >> >>-f /tmp/gram_zz0JuC -c poll > > > > > > > > >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 > > /usr/bin/perl > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manage > > r-sc > > > > ript > > > > > > .pl -m > > > > > > > > >> >>pbs_gcc > > > > > > > > >> >>-f /tmp/gram_5jGFI1 -c poll > > > > > > > > >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > 905629.tg-master.ncsa.teragrid.org > > > > > > > > >> >>2>/dev/null > > > > > > > > >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > > > >> >>905629.tg-master.ncsa.teragrid.org > > > > > > > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > > > >> >>905638.tg-master.ncsa.teragrid.org > > > > > > > > >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > > > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > > > tg-master.ncsa.teragrid.org > > > > > > > > >> 15001 3 > > > > > > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > > > tg-master.ncsa.teragrid.org > > > > > > > > >> 15001 3 > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 21 15:03:08 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 15:03:08 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <1174492838.8120.1.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> <1174491980.7455.7.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321104830.0442dac0@mail.mcs.anl.gov> <1174492838.8120.1.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321150053.04517920@mail.mcs.anl.gov> It looks like setting the submitThrottle to 1 didn't really make any difference. For example, I have 50 jobs that are 15 minutes each. And I see that some of them were glued together (shown in the queue as 30 or 45 minute jobs): 912830.tg-master.ncs nefedova dque STDIN -- 1 1 -- 00:30 Q -- 912832.tg-master.ncs nefedova dque STDIN -- 1 1 -- 00:15 Q -- 912833.tg-master.ncs nefedova dque STDIN -- 1 1 -- 00:30 Q -- 912834.tg-master.ncs nefedova dque STDIN -- 1 1 -- 00:15 Q -- 912835.tg-master.ncs nefedova dque STDIN -- 1 1 -- 00:15 Q -- 912836.tg-master.ncs nefedova dque STDIN -- 1 1 -- 00:45 Q -- Nika At 11:00 AM 3/21/2007, Mihael Hategan wrote: >On Wed, 2007-03-21 at 10:51 -0500, Veronika V. Nefedova wrote: > > OK. 
So if I set this submitThrottle to 1 it will submit jobs one at time > > (and it won't wait for the previous jobs to finish) ? > >Yes. > > > What will be an > > indication for swift to go ahead and submit the next jobs (time delay?)? > >The fact that the submission of the previous job has been completed (ie, >the job manager has put the job in the queue). > > > If > > thats so - than I think I am ok. > > > > Thanks again, > > > > Nika > > > > At 10:46 AM 3/21/2007, Mihael Hategan wrote: > > >I think these should be ok. Unfortunately I can't tell you what a > > >solution to "as safe as possible" is because of two things: > > >1. The explanation of why your jobs got killed and the solution they > > >proposed are ambiguous. They don't explain much. So the proposed > > >solution may be insufficient or it may superfluous. > > >2. We don't exactly have submission rate limiters. The closest thing is > > >the submission concurrency limiter. Setting this to 1 should work, > > >because this will ensure that at most one job manager will do the > > >submission dance at a time. > > > > > >Mihael > > > > > >On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote: > > > > Ok. Hmmm. I am about to submit a large run (50 molecules), which could > > > have > > > > as many as 3500 jobs per tier. I really would like to be sure that > I do > > > not > > > > brake TG. I want to play as safe as possible thus I'd like to make sure > > > > that I set all the possible parameters to safeguard the run ? > > > > > > > > Thanks, > > > > > > > > Nika > > > > > > > > At 10:31 AM 3/21/2007, Mihael Hategan wrote: > > > > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > > > > > Hi, Mihael: > > > > > > > > > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Are you suggesting to add also this inside > ... : > > > > > > > > > > > > ? > > > > > > > > > > > > Do these set parameters guarantee me that: > > > > > > > > > > > > 1. I have no more then 384 jobs in a queue at any time > > > > > > and > > > > > > 2. Jobs are submitted to the queue with at least 1 sec delay > > > > > > > > > >No. They don't. But they may get you closer to that. > > > > > > > > > > > > > > > > > (these are the requirements from TG NCSA). > > > > > > > > > > > > Thanks! > > > > > > > > > > > > Nika > > > > > > > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > > > > > >There is no direct rate limiter unfortunately. There is a submit > > > > > > >throttle which tells the number of concurrent submissions. > Setting > > > that > > > > > > >to 1 may work. > > > > > > > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > > > how do I set this throttling parameter ? 
> > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster > (mercury) > > > > > > > > >To: nefedova at mcs.anl.gov > > > > > > > > >From: consult at ncsa.uiuc.edu > > > > > > > > >Cc: > > > > > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > > > > > >Sender: Nobody > > > > > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > > > > > >X-NCSA-MailScanner-Information: Please contact > > > help at ncsa.uiuc.edu for > > > > > > > more > > > > > > > > >information, amantadine.ncsa.uiuc.edu > > > > > > > > >X-NCSA-MailScanner: Found to be clean > > > > > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > > > > > mailgw.mcs.anl.gov > > > > > > > > > > > > > > > > > >FROM: Arnold, Galen > > > > > > > > >(Concerning ticket No. 137212) > > > > > > > > > > > > > > > > > >Veronika, > > > > > > > > > > > > > > > > > >If you can throttle the job submission so that there's more > > > than 1 > > > > > second > > > > > > > > >between them, that would probably help us out. > > > > > > > > > > > > > > > > > >-Galen > > > > > > > > > > > > > > > > > >Veronika V. Nefedova writes: > > > > > > > > > >Hi, Galen: > > > > > > > > > > > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same > > > time. By my > > > > > > > > > >estimation I had no more then 136 jobs in the queue. What > > > are other > > > > > > > limits > > > > > > > > > >I should keep in mind ? > > > > > > > > > > > > > > > > > > > >Thanks, > > > > > > > > > > > > > > > > > > > >Veronika > > > > > > > > > > > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > > > >>FROM: Arnold, Galen > > > > > > > > > >>(Concerning ticket No. 
137212) > > > > > > > > > >> > > > > > > > > > >>Veronika, > > > > > > > > > >> > > > > > > > > > >>We see many sequences in our globus-gatekeeper.log on > tg-login1 > > > > > > > like this > > > > > > > > > >> > > > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > > > >>Nefedova 137823 > > > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > > > >>Notice: 0: executing > > > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > > > >>Notice: 0: Child 16725 started > > > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > > > >>Nefedova 137823 > > > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > > > >>Notice: 0: executing > > > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > > > >>Notice: 0: Child 16726 started > > > > > > > > > >>Notice: 6: > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > pid=16769 > > > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >>...your DN is also popular in the globus-gatekeeper.log. > > > > > > > > > >> > > > > > > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep > > > 'CN=Veronika > > > > > > > Nefedova' > > > > > > > > > >>globus-gatekeeper.log |wc -l > > > > > > > > > >> 15215 > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >>The large number of connects around the same time: > > > > > > > > > >> > > > > > > > > > >> grep --after-context=10 Veronika > globus-gatekeeper.log | grep > > > > > > > starting | > > > > > > > > > >> grep > > > > > > > > > >>'Mar 16 09:3' | wc -l > > > > > > > > > >> 65 > > > > > > > > > >>... > > > > > > > > > >>Notice: 6: > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > pid=15872 > > > > > > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > > > > > > >>Notice: 6: > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > pid=16644 > > > > > > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > > > > > > >>Notice: 6: > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > pid=16769 > > > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > > > >>Notice: 6: > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > pid=16926 > > > > > > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > > > > > > >>... > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > > > > > > >> > > > > > > > > > >>-Galen > > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > >>Veronika V. 
Nefedova writes: > > > > > > > > > >> >Hi, Dave: > > > > > > > > > >> > > > > > > > > > > >> >Could you please give me a bit more information on > what has > > > > > > > happened ? > > > > > > > > > >> > > > > > > > > > > >> >Thanks, > > > > > > > > > >> > > > > > > > > > > >> >Nika > > > > > > > > > >> > > > > > > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > > > >> >>FROM: McWilliams, David G > > > > > > > > > >> >>(Concerning ticket No. 137212) > > > > > > > > > >> >> > > > > > > > > > >> >>Veronika, > > > > > > > > > >> >> > > > > > > > > > >> >>The system administrator had to kill several of your > globus > > > > > > > > > processes today > > > > > > > > > >> >>because the load average on the node was over 25. Below > > > is a > > > > > list of > > > > > > > > > >> processes > > > > > > > > > >> >>that were killed. Please let us know if you need help > > > > > > > identifying the > > > > > > > > > >> case of > > > > > > > > > >> >>the problem. > > > > > > > > > >> >> > > > > > > > > > >> >>Regards, > > > > > > > > > >> >> > > > > > > > > > >> >>Dave McWilliams (217) > > > > > > > 244-1144 consult at ncsa.uiuc.edu > > > > > > > > > >> >>NCSA Consulting > > > > > > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > > > > > > >> >>----------------------------------------------------- > ---- > > > ---- > > > > > ---- > > > > > > > ---- > > > > > > > > > ----- > > > > > > > > > >> >> > > > > > > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > > > > > > globus-job-manager -conf > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > r.conf > > > > > -type > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > > > > > > globus-job-manager -conf > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > r.conf > > > > > -type > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > > > > > > > globus-job-manager -conf > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > r.conf > > > > > -type > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > > > > > > > globus-job-manager -conf > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > r.conf > > > > > -type > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > > > > > > > globus-job-manager -conf > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > r.conf > > > > > -type > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > > > > > > > globus-job-manager -conf > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > r.conf > > > > > -type > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > >> >>nefedova 11402 1 0 09:33 ? 
00:00:02 globus-job-manager -conf /usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn jobmanager-pbs -machine-type unknown -publish-jobs
> > > > > > > > > >> >>[The listing continues with further globus-job-manager processes, all owned by nefedova, all started at 09:33 or 09:34, each showing about 00:00:02 of CPU time and the identical command line above. Their PIDs: 11480, 11522, 11602, 11668, 11715, 11785, 11813, 11892, 11948, 12004, 12030, 12133, 12172, 12253, 12256, 12443, 12460, 12504, 12534, 12657, 12668, 12773, 12806, 12892, 12946, 13023, 13072, 13142, 13233, 13245, 13352, 13379, 13474, 13488, 13618, 13663, 13743, 13820, 13887, 13952, 14046, 14048, 14172, 14196, 14319, 14366, 14430, 14539, 14572, 14703, 14725, 14832, 14849, 15009, 15017, 15164, 15165, 15322, 15332, 15526, 15544, 15671, 15672, 15787, 15842, 15981, 15982, 16131, 16132, 16320, 16358, 16553, 16622, 16725, 16726, 16845, 16925, 16989, 17095, 17262, 17305, 17375.]
> > > > > > > > > >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl /usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc -f /tmp/gram_1Drc0X -c poll
> > > > > > > > > >> >>nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl /usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc -f /tmp/gram_fSWNvt -c poll
> > > > > > > > > >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl /usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc -f /tmp/gram_a3whWx -c poll
> > > > > > > > > >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl /usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc -f /tmp/gram_zz0JuC -c poll
> > > > > > > > > >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl /usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc -f /tmp/gram_5jGFI1 -c poll
> > > > > > > > > >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c /usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org 2>/dev/null
> > > > > > > > > >> >>nefedova 31546 31512 0 13:35 ?
00:00:00 > > > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > > > > >> >>905629.tg-master.ncsa.teragrid.org > > > > > > > > > >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > > > > > > > > > >> >>/usr/local/pbs/ia64/bin/qstat -f > > > > > > > > > >> >>905638.tg-master.ncsa.teragrid.org > > > > > > > > > >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > > > > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > > > > tg-master.ncsa.teragrid.org > > > > > > > > > >> 15001 3 > > > > > > > > > >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > > > > > > > > > >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff > > > > > > > tg-master.ncsa.teragrid.org > > > > > > > > > >> 15001 3 > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > > Swift-devel mailing list > > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Wed Mar 21 15:05:01 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 15:05:01 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <6.0.0.22.2.20070321150053.04517920@mail.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> <1174491980.7455.7.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321104830.0442dac0@mail.mcs.anl.gov> <1174492838.8120.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321150053.04517920@mail.mcs.anl.gov> Message-ID: <1174507501.16893.0.camel@blabla.mcs.anl.gov> I'm not sure what clustering has to do with the submit throttle. On Wed, 2007-03-21 at 15:03 -0500, Veronika V. Nefedova wrote: > It looks like setting the submitThrottle to 1 didn't really make any > difference. For example, I have 50 jobs that are 15 minutes each. And I > see that some of them were glued together (shown in the queue as 30 or 45 > minute jobs): > > 912830.tg-master.ncs nefedova > dque STDIN -- 1 1 -- 00:30 Q -- > 912832.tg-master.ncs nefedova > dque STDIN -- 1 1 -- 00:15 Q -- > 912833.tg-master.ncs nefedova > dque STDIN -- 1 1 -- 00:30 Q -- > 912834.tg-master.ncs nefedova > dque STDIN -- 1 1 -- 00:15 Q -- > 912835.tg-master.ncs nefedova > dque STDIN -- 1 1 -- 00:15 Q -- > 912836.tg-master.ncs nefedova > dque STDIN -- 1 1 -- 00:45 Q -- > > Nika > > At 11:00 AM 3/21/2007, Mihael Hategan wrote: > >On Wed, 2007-03-21 at 10:51 -0500, Veronika V. Nefedova wrote: > > > OK. So if I set this submitThrottle to 1 it will submit jobs one at time > > > (and it won't wait for the previous jobs to finish) ? > > > >Yes. > > > > > What will be an > > > indication for swift to go ahead and submit the next jobs (time delay?)? > > > >The fact that the submission of the previous job has been completed (ie, > >the job manager has put the job in the queue). > > > > > If > > > thats so - than I think I am ok. > > > > > > Thanks again, > > > > > > Nika > > > > > > At 10:46 AM 3/21/2007, Mihael Hategan wrote: > > > >I think these should be ok. Unfortunately I can't tell you what a > > > >solution to "as safe as possible" is because of two things: > > > >1. The explanation of why your jobs got killed and the solution they > > > >proposed are ambiguous. They don't explain much. 
So the proposed > > > >solution may be insufficient or it may superfluous. > > > >2. We don't exactly have submission rate limiters. The closest thing is > > > >the submission concurrency limiter. Setting this to 1 should work, > > > >because this will ensure that at most one job manager will do the > > > >submission dance at a time. > > > > > > > >Mihael > > > > > > > >On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote: > > > > > Ok. Hmmm. I am about to submit a large run (50 molecules), which could > > > > have > > > > > as many as 3500 jobs per tier. I really would like to be sure that > > I do > > > > not > > > > > brake TG. I want to play as safe as possible thus I'd like to make sure > > > > > that I set all the possible parameters to safeguard the run ? > > > > > > > > > > Thanks, > > > > > > > > > > Nika > > > > > > > > > > At 10:31 AM 3/21/2007, Mihael Hategan wrote: > > > > > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Are you suggesting to add also this inside > > ... : > > > > > > > > > > > > > > ? > > > > > > > > > > > > > > Do these set parameters guarantee me that: > > > > > > > > > > > > > > 1. I have no more then 384 jobs in a queue at any time > > > > > > > and > > > > > > > 2. Jobs are submitted to the queue with at least 1 sec delay > > > > > > > > > > > >No. They don't. But they may get you closer to that. > > > > > > > > > > > > > > > > > > > > (these are the requirements from TG NCSA). > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > > > > > > >There is no direct rate limiter unfortunately. There is a submit > > > > > > > >throttle which tells the number of concurrent submissions. > > Setting > > > > that > > > > > > > >to 1 may work. > > > > > > > > > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > > > > > how do I set this throttling parameter ? > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster > > (mercury) > > > > > > > > > >To: nefedova at mcs.anl.gov > > > > > > > > > >From: consult at ncsa.uiuc.edu > > > > > > > > > >Cc: > > > > > > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > > > > > > >Sender: Nobody > > > > > > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > > > > > > >X-NCSA-MailScanner-Information: Please contact > > > > help at ncsa.uiuc.edu for > > > > > > > > more > > > > > > > > > >information, amantadine.ncsa.uiuc.edu > > > > > > > > > >X-NCSA-MailScanner: Found to be clean > > > > > > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > > > > > > mailgw.mcs.anl.gov > > > > > > > > > > > > > > > > > > > >FROM: Arnold, Galen > > > > > > > > > >(Concerning ticket No. 137212) > > > > > > > > > > > > > > > > > > > >Veronika, > > > > > > > > > > > > > > > > > > > >If you can throttle the job submission so that there's more > > > > than 1 > > > > > > second > > > > > > > > > >between them, that would probably help us out. > > > > > > > > > > > > > > > > > > > >-Galen > > > > > > > > > > > > > > > > > > > >Veronika V. 
Nefedova writes: > > > > > > > > > > >Hi, Galen: > > > > > > > > > > > > > > > > > > > > > >I was told that I could have 384 jobs in PBS at the same > > > > time. By my > > > > > > > > > > >estimation I had no more then 136 jobs in the queue. What > > > > are other > > > > > > > > limits > > > > > > > > > > >I should keep in mind ? > > > > > > > > > > > > > > > > > > > > > >Thanks, > > > > > > > > > > > > > > > > > > > > > >Veronika > > > > > > > > > > > > > > > > > > > > > >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > > > > >>FROM: Arnold, Galen > > > > > > > > > > >>(Concerning ticket No. 137212) > > > > > > > > > > >> > > > > > > > > > > >>Veronika, > > > > > > > > > > >> > > > > > > > > > > >>We see many sequences in our globus-gatekeeper.log on > > tg-login1 > > > > > > > > like this > > > > > > > > > > >> > > > > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > > > > >>Nefedova 137823 > > > > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > > > > >>Notice: 0: executing > > > > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > > > > >>Notice: 0: Child 16725 started > > > > > > > > > > >>Notice: 5: Authenticated globus user: > > > > > > > > > > >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > > > > > > > > > >>Nefedova 137823 > > > > > > > > > > >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > > > > > > > > > >>Notice: 5: Requested service: jobmanager-pbs > > > > > > > > > > >>Notice: 5: Authorized as local user: nefedova > > > > > > > > > > >>Notice: 5: Authorized as local uid: 29202 > > > > > > > > > > >>Notice: 5: and local gid: 11467 > > > > > > > > > > >>Notice: 0: executing > > > > > > > > > > >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > > > > > > > > > >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > > > > > > > > > >>Notice: 0: Child 16726 started > > > > > > > > > > >>Notice: 6: > > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > > pid=16769 > > > > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >>...your DN is also popular in the globus-gatekeeper.log. > > > > > > > > > > >> > > > > > > > > > > >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep > > > > 'CN=Veronika > > > > > > > > Nefedova' > > > > > > > > > > >>globus-gatekeeper.log |wc -l > > > > > > > > > > >> 15215 > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >>The large number of connects around the same time: > > > > > > > > > > >> > > > > > > > > > > >> grep --after-context=10 Veronika > > globus-gatekeeper.log | grep > > > > > > > > starting | > > > > > > > > > > >> grep > > > > > > > > > > >>'Mar 16 09:3' | wc -l > > > > > > > > > > >> 65 > > > > > > > > > > >>... 
> > > > > > > > > > >>Notice: 6: > > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > > pid=15872 > > > > > > > > > > >>starting at Fri Mar 16 09:34:02 2007 > > > > > > > > > > >>Notice: 6: > > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > > pid=16644 > > > > > > > > > > >>starting at Fri Mar 16 09:34:11 2007 > > > > > > > > > > >>Notice: 6: > > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > > pid=16769 > > > > > > > > > > >>starting at Fri Mar 16 09:34:12 2007 > > > > > > > > > > >>Notice: 6: > > > > /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > > > > > > > pid=16926 > > > > > > > > > > >>starting at Fri Mar 16 09:34:14 2007 > > > > > > > > > > >>... > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >>...could be overwhelming the gatekeeper on tg-login1. > > > > > > > > > > >> > > > > > > > > > > >>-Galen > > > > > > > > > > >> > > > > > > > > > > >> > > > > > > > > > > >>Veronika V. Nefedova writes: > > > > > > > > > > >> >Hi, Dave: > > > > > > > > > > >> > > > > > > > > > > > >> >Could you please give me a bit more information on > > what has > > > > > > > > happened ? > > > > > > > > > > >> > > > > > > > > > > > >> >Thanks, > > > > > > > > > > >> > > > > > > > > > > > >> >Nika > > > > > > > > > > >> > > > > > > > > > > > >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > > > > > > > > > >> >>FROM: McWilliams, David G > > > > > > > > > > >> >>(Concerning ticket No. 137212) > > > > > > > > > > >> >> > > > > > > > > > > >> >>Veronika, > > > > > > > > > > >> >> > > > > > > > > > > >> >>The system administrator had to kill several of your > > globus > > > > > > > > > > processes today > > > > > > > > > > >> >>because the load average on the node was over 25. Below > > > > is a > > > > > > list of > > > > > > > > > > >> processes > > > > > > > > > > >> >>that were killed. Please let us know if you need help > > > > > > > > identifying the > > > > > > > > > > >> case of > > > > > > > > > > >> >>the problem. > > > > > > > > > > >> >> > > > > > > > > > > >> >>Regards, > > > > > > > > > > >> >> > > > > > > > > > > >> >>Dave McWilliams (217) > > > > > > > > 244-1144 consult at ncsa.uiuc.edu > > > > > > > > > > >> >>NCSA Consulting > > > > > > > > > > Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > > > > > > > > > > >> >>----------------------------------------------------- > > ---- > > > > ---- > > > > > > ---- > > > > > > > > ---- > > > > > > > > > > ----- > > > > > > > > > > >> >> > > > > > > > > > > >> >>nefedova 11084 1 0 09:33 ? 00:00:02 > > > > > > > > globus-job-manager -conf > > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > > r.conf > > > > > > -type > > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > > >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > > > > > > > > globus-job-manager -conf > > > > > > > > > > >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manage > > r.conf > > > > > > -type > > > > > > > > > > >> pbs_gcc -rdn > > > > > > > > > > >> >>jobmanager-pbs -machine-type unknown -publish-jobs > > > > > > > > > > >> >>nefedova 11166 1 0 09:33 ? 
00:00:02 globus-job-manager -conf /usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn jobmanager-pbs -machine-type unknown -publish-jobs
> > > > > > > > > > >> >>[... remainder of the quoted process listing snipped: the other globus-job-manager entries, the perl globus-job-manager-script.pl poll processes, and the qstat/pbs_iff commands are identical to the listing quoted earlier in this thread ...]
> > > > > > > > > _______________________________________________
> > > > > > > > > Swift-devel mailing list
> > > > > > > > > Swift-devel at ci.uchicago.edu
> > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
From nefedova at mcs.anl.gov Wed Mar 21 15:17:07 2007
From: nefedova at mcs.anl.gov (Veronika V.
Nefedova) Date: Wed, 21 Mar 2007 15:17:07 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <1174507501.16893.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> <1174491980.7455.7.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321104830.0442dac0@mail.mcs.anl.gov> <1174492838.8120.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321150053.04517920@mail.mcs.anl.gov> <1174507501.16893.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321151400.044298e0@mail.mcs.anl.gov> I thought clustering happens when the jobs are being submitted at the same time (or really close to each other, less than a second apart). Or am I mistaken? If I am - when does the clustering happen? Is there a parameter that controls it? Thanks! Nika At 03:05 PM 3/21/2007, Mihael Hategan wrote: >I'm not sure what clustering has to do with the submit throttle. > >On Wed, 2007-03-21 at 15:03 -0500, Veronika V. Nefedova wrote: > > It looks like setting the submitThrottle to 1 didn't really make any > > difference. For example, I have 50 jobs that are 15 minutes each. And I > > see that some of them were glued together (shown in the queue as 30 or 45 > > minute jobs): > > > > 912830.tg-master.ncs nefedova > > dque STDIN -- 1 1 -- 00:30 Q -- > > 912832.tg-master.ncs nefedova > > dque STDIN -- 1 1 -- 00:15 Q -- > > 912833.tg-master.ncs nefedova > > dque STDIN -- 1 1 -- 00:30 Q -- > > 912834.tg-master.ncs nefedova > > dque STDIN -- 1 1 -- 00:15 Q -- > > 912835.tg-master.ncs nefedova > > dque STDIN -- 1 1 -- 00:15 Q -- > > 912836.tg-master.ncs nefedova > > dque STDIN -- 1 1 -- 00:45 Q -- > > > > Nika > > > > At 11:00 AM 3/21/2007, Mihael Hategan wrote: > > >On Wed, 2007-03-21 at 10:51 -0500, Veronika V. Nefedova wrote: > > > > OK. So if I set this submitThrottle to 1 it will submit jobs one at > time > > > > (and it won't wait for the previous jobs to finish) ? > > > > > >Yes. > > > > > > > What will be an > > > > indication for swift to go ahead and submit the next jobs (time > delay?)? > > > > > >The fact that the submission of the previous job has been completed (ie, > > >the job manager has put the job in the queue). > > > > > > > If > > > > thats so - than I think I am ok. > > > > > > > > Thanks again, > > > > > > > > Nika > > > > > > > > At 10:46 AM 3/21/2007, Mihael Hategan wrote: > > > > >I think these should be ok. Unfortunately I can't tell you what a > > > > >solution to "as safe as possible" is because of two things: > > > > >1. The explanation of why your jobs got killed and the solution they > > > > >proposed are ambiguous. They don't explain much. So the proposed > > > > >solution may be insufficient or it may superfluous. > > > > >2. We don't exactly have submission rate limiters. The closest > thing is > > > > >the submission concurrency limiter. Setting this to 1 should work, > > > > >because this will ensure that at most one job manager will do the > > > > >submission dance at a time. > > > > > > > > > >Mihael > > > > > > > > > >On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote: > > > > > > Ok. Hmmm. I am about to submit a large run (50 molecules), > which could > > > > > have > > > > > > as many as 3500 jobs per tier. I really would like to be sure that > > > I do > > > > > not > > > > > > brake TG. 
I want to play as safe as possible thus I'd like to > make sure > > > > > > that I set all the possible parameters to safeguard the run ? > > > > > > > > > > > > Thanks, > > > > > > > > > > > > Nika > > > > > > > > > > > > At 10:31 AM 3/21/2007, Mihael Hategan wrote: > > > > > > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Are you suggesting to add also this inside > > > ... : > > > > > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > Do these set parameters guarantee me that: > > > > > > > > > > > > > > > > 1. I have no more then 384 jobs in a queue at any time > > > > > > > > and > > > > > > > > 2. Jobs are submitted to the queue with at least 1 sec delay > > > > > > > > > > > > > >No. They don't. But they may get you closer to that. > > > > > > > > > > > > > > > > > > > > > > > (these are the requirements from TG NCSA). > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > > > > > > > >There is no direct rate limiter unfortunately. There is a > submit > > > > > > > > >throttle which tells the number of concurrent submissions. > > > Setting > > > > > that > > > > > > > > >to 1 may work. > > > > > > > > > > > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > > > > > > > how do I set this throttling parameter ? > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > > > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster > > > (mercury) > > > > > > > > > > >To: nefedova at mcs.anl.gov > > > > > > > > > > >From: consult at ncsa.uiuc.edu > > > > > > > > > > >Cc: > > > > > > > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > > > > > > > >Sender: Nobody > > > > > > > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > > > > > > > >X-NCSA-MailScanner-Information: Please contact > > > > > help at ncsa.uiuc.edu for > > > > > > > > > more > > > > > > > > > > >information, amantadine.ncsa.uiuc.edu > > > > > > > > > > >X-NCSA-MailScanner: Found to be clean > > > > > > > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > > > > > > > mailgw.mcs.anl.gov > > > > > > > > > > > > > > > > > > > > > >FROM: Arnold, Galen > > > > > > > > > > >(Concerning ticket No. 137212) > > > > > > > > > > > > > > > > > > > > > >Veronika, > > > > > > > > > > > > > > > > > > > > > >If you can throttle the job submission so that there's > more > > > > > than 1 > > > > > > > second > > > > > > > > > > >between them, that would probably help us out. 
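[To illustrate the distinction being discussed above: a minimal plain-Java sketch, with a made-up class name (ConcurrencyThrottle); this is not Swift or CoG source. A submit throttle of this kind only bounds how many submissions are in flight at once, so setting it to 1 still allows consecutive submissions to start back-to-back, well under a second apart.]

    import java.util.concurrent.Semaphore;

    // Sketch of a concurrency throttle: bounds how many submissions run at
    // once, but sets no minimum delay between consecutive submissions.
    public class ConcurrencyThrottle {
        private final Semaphore slots;

        public ConcurrencyThrottle(int maxConcurrent) {
            this.slots = new Semaphore(maxConcurrent);
        }

        // Runs one submission "dance". With maxConcurrent = 1, at most one
        // submission is in progress at a time, but the next one may begin
        // the instant the permit is released - there is no enforced spacing.
        public void submit(Runnable submission) throws InterruptedException {
            slots.acquire();
            try {
                submission.run();
            } finally {
                slots.release();
            }
        }
    }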
> > > > > > > > > > > > > > > > > > > > > >-Galen > > > > > > > > > > > From hategan at mcs.anl.gov Wed Mar 21 15:19:33 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 15:19:33 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <6.0.0.22.2.20070321151400.044298e0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070316170133.05600a60@mail.mcs.anl.gov> <1174141121.23152.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321102155.044728e0@mail.mcs.anl.gov> <1174491076.7102.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321103327.044339e0@mail.mcs.anl.gov> <1174491980.7455.7.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321104830.0442dac0@mail.mcs.anl.gov> <1174492838.8120.1.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321150053.04517920@mail.mcs.anl.gov> <1174507501.16893.0.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321151400.044298e0@mail.mcs.anl.gov> Message-ID: <1174508373.17463.1.camel@blabla.mcs.anl.gov> Clustering is done before the throttling. So the submit throttle will apply to the cluster not the individual jobs. On Wed, 2007-03-21 at 15:17 -0500, Veronika V. Nefedova wrote: > I thought clustering happens whne the jobs are being submitted at the same > time (or really close to each other, less then a second apart). Or I am > mistaken ? If I am - when does the clustering happens? Is there a parameter > that controls it? > > Thanks! > > Nika > > At 03:05 PM 3/21/2007, Mihael Hategan wrote: > >I'm not sure what clustering has to do with the submit throttle. > > > >On Wed, 2007-03-21 at 15:03 -0500, Veronika V. Nefedova wrote: > > > It looks like setting the submitThrottle to 1 didn't really make any > > > difference. For example, I have 50 jobs that are 15 minutes each. And I > > > see that some of them were glued together (shown in the queue as 30 or 45 > > > minute jobs): > > > > > > 912830.tg-master.ncs nefedova > > > dque STDIN -- 1 1 -- 00:30 Q -- > > > 912832.tg-master.ncs nefedova > > > dque STDIN -- 1 1 -- 00:15 Q -- > > > 912833.tg-master.ncs nefedova > > > dque STDIN -- 1 1 -- 00:30 Q -- > > > 912834.tg-master.ncs nefedova > > > dque STDIN -- 1 1 -- 00:15 Q -- > > > 912835.tg-master.ncs nefedova > > > dque STDIN -- 1 1 -- 00:15 Q -- > > > 912836.tg-master.ncs nefedova > > > dque STDIN -- 1 1 -- 00:45 Q -- > > > > > > Nika > > > > > > At 11:00 AM 3/21/2007, Mihael Hategan wrote: > > > >On Wed, 2007-03-21 at 10:51 -0500, Veronika V. Nefedova wrote: > > > > > OK. So if I set this submitThrottle to 1 it will submit jobs one at > > time > > > > > (and it won't wait for the previous jobs to finish) ? > > > > > > > >Yes. > > > > > > > > > What will be an > > > > > indication for swift to go ahead and submit the next jobs (time > > delay?)? > > > > > > > >The fact that the submission of the previous job has been completed (ie, > > > >the job manager has put the job in the queue). > > > > > > > > > If > > > > > thats so - than I think I am ok. > > > > > > > > > > Thanks again, > > > > > > > > > > Nika > > > > > > > > > > At 10:46 AM 3/21/2007, Mihael Hategan wrote: > > > > > >I think these should be ok. Unfortunately I can't tell you what a > > > > > >solution to "as safe as possible" is because of two things: > > > > > >1. The explanation of why your jobs got killed and the solution they > > > > > >proposed are ambiguous. They don't explain much. So the proposed > > > > > >solution may be insufficient or it may superfluous. > > > > > >2. We don't exactly have submission rate limiters. 
The closest > > thing is > > > > > >the submission concurrency limiter. Setting this to 1 should work, > > > > > >because this will ensure that at most one job manager will do the > > > > > >submission dance at a time. > > > > > > > > > > > >Mihael > > > > > > > > > > > >On Wed, 2007-03-21 at 10:37 -0500, Veronika V. Nefedova wrote: > > > > > > > Ok. Hmmm. I am about to submit a large run (50 molecules), > > which could > > > > > > have > > > > > > > as many as 3500 jobs per tier. I really would like to be sure that > > > > I do > > > > > > not > > > > > > > brake TG. I want to play as safe as possible thus I'd like to > > make sure > > > > > > > that I set all the possible parameters to safeguard the run ? > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > At 10:31 AM 3/21/2007, Mihael Hategan wrote: > > > > > > > >On Wed, 2007-03-21 at 10:27 -0500, Veronika V. Nefedova wrote: > > > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > > > > > I have these properties modified in my scheduler.xml file: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Are you suggesting to add also this inside > > > > ... : > > > > > > > > > > > > > > > > > > ? > > > > > > > > > > > > > > > > > > Do these set parameters guarantee me that: > > > > > > > > > > > > > > > > > > 1. I have no more then 384 jobs in a queue at any time > > > > > > > > > and > > > > > > > > > 2. Jobs are submitted to the queue with at least 1 sec delay > > > > > > > > > > > > > > > >No. They don't. But they may get you closer to that. > > > > > > > > > > > > > > > > > > > > > > > > > > (these are the requirements from TG NCSA). > > > > > > > > > > > > > > > > > > Thanks! > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > At 09:18 AM 3/17/2007, Mihael Hategan wrote: > > > > > > > > > >There is no direct rate limiter unfortunately. There is a > > submit > > > > > > > > > >throttle which tells the number of concurrent submissions. > > > > Setting > > > > > > that > > > > > > > > > >to 1 may work. > > > > > > > > > > > > > > > > > > > >On Fri, 2007-03-16 at 17:02 -0500, Veronika V. Nefedova wrote: > > > > > > > > > > > Hi, Mihael: > > > > > > > > > > > > > > > > > > > > > > how do I set this throttling parameter ? > > > > > > > > > > > > > > > > > > > > > > Thanks, > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > >Date: Fri, 16 Mar 2007 15:53:57 -0600 > > > > > > > > > > > >Subject: Re: globus jobs killed on NCSA's IA64 cluster > > > > (mercury) > > > > > > > > > > > >To: nefedova at mcs.anl.gov > > > > > > > > > > > >From: consult at ncsa.uiuc.edu > > > > > > > > > > > >Cc: > > > > > > > > > > > >X-Mailer: Perl5 Mail::Internet v1.74 > > > > > > > > > > > >Sender: Nobody > > > > > > > > > > > >X-Null-Tag: 2edd4a9833fa010df5441f1443ff58a9 > > > > > > > > > > > >X-NCSA-MailScanner-Information: Please contact > > > > > > help at ncsa.uiuc.edu for > > > > > > > > > > more > > > > > > > > > > > >information, amantadine.ncsa.uiuc.edu > > > > > > > > > > > >X-NCSA-MailScanner: Found to be clean > > > > > > > > > > > >X-Virus-Scanned: by amavisd-new-20030616-p10 (Debian) at > > > > > > > > > > mailgw.mcs.anl.gov > > > > > > > > > > > > > > > > > > > > > > > >FROM: Arnold, Galen > > > > > > > > > > > >(Concerning ticket No. 
137212) > > > > > > > > > > > > > > > > > > > > > > > >Veronika, > > > > > > > > > > > > > > > > > > > > > > > >If you can throttle the job submission so that there's > > more > > > > > > than 1 > > > > > > > > second > > > > > > > > > > > >between them, that would probably help us out. > > > > > > > > > > > > > > > > > > > > > > > >-Galen > > > > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Mar 21 16:50:34 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 16:50:34 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) Message-ID: <6.0.0.22.2.20070321164837.04b3eaf0@mail.mcs.anl.gov> Ok, NCSA wants to space job submissions to 5 seconds now... I am not sure if we can do that ? (like inserting 'sleep' before each submission) Nika >Date: Wed, 21 Mar 2007 16:17:59 -0500 >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) >To: nefedova at mcs.anl.gov >From: consult at ncsa.uiuc.edu > >FROM: Arnold, Galen >(Concerning ticket No. 137212) > >Veronika, > >Could you space these out a little more? > >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=12889 >starting at Wed Mar 21 15:52:23 2007 >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=12999 >starting at Wed Mar 21 15:52:25 2007 >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=13024 >starting at Wed Mar 21 15:52:27 2007 >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=13142 >starting at Wed Mar 21 15:52:29 2007 > > >...looks like they're at 2s intervals. If many of them get queued in a few >minutes, it starts to get ahead of that gatekeeper and jobmanager. Try 5s >next. > >-Galen > >Veronika V. Nefedova writes: > >Yes, Thank you. I'll set this throttling parameter in my workflow. > > > >Nika > > > >At 04:53 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > >>FROM: Arnold, Galen > >>(Concerning ticket No. 137212) > >> > >>Veronika, > >> > >>If you can throttle the job submission so that there's more than 1 second > >>between them, that would probably help us out. > >> > >>-Galen > >> > >>Veronika V. Nefedova writes: > >> >Hi, Galen: > >> > > >> >I was told that I could have 384 jobs in PBS at the same time. By my > >> >estimation I had no more then 136 jobs in the queue. What are other > limits > >> >I should keep in mind ? > >> > > >> >Thanks, > >> > > >> >Veronika > >> > > >> >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > >> >>FROM: Arnold, Galen > >> >>(Concerning ticket No. 
137212) > >> >> > >> >>Veronika, > >> >> > >> >>We see many sequences in our globus-gatekeeper.log on tg-login1 like > this > >> >> > >> >>Notice: 5: Authenticated globus user: > >> >>/DC=org/DC=doegrids/OU=People/CN=Veronika > >> >>Nefedova 137823 > >> >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > >> >>Notice: 5: Requested service: jobmanager-pbs > >> >>Notice: 5: Authorized as local user: nefedova > >> >>Notice: 5: Authorized as local uid: 29202 > >> >>Notice: 5: and local gid: 11467 > >> >>Notice: 0: executing > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > >> >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > >> >>Notice: 0: Child 16725 started > >> >>Notice: 5: Authenticated globus user: > >> >>/DC=org/DC=doegrids/OU=People/CN=Veronika > >> >>Nefedova 137823 > >> >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > >> >>Notice: 5: Requested service: jobmanager-pbs > >> >>Notice: 5: Authorized as local user: nefedova > >> >>Notice: 5: Authorized as local uid: 29202 > >> >>Notice: 5: and local gid: 11467 > >> >>Notice: 0: executing > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > >> >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > >> >>Notice: 0: Child 16726 started > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16769 > >> >>starting at Fri Mar 16 09:34:12 2007 > >> >> > >> >> > >> >>...your DN is also popular in the globus-gatekeeper.log. > >> >> > >> >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika > Nefedova' > >> >>globus-gatekeeper.log |wc -l > >> >> 15215 > >> >> > >> >> > >> >>The large number of connects around the same time: > >> >> > >> >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > starting | > >> >> grep > >> >>'Mar 16 09:3' | wc -l > >> >> 65 > >> >>... > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=15872 > >> >>starting at Fri Mar 16 09:34:02 2007 > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16644 > >> >>starting at Fri Mar 16 09:34:11 2007 > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16769 > >> >>starting at Fri Mar 16 09:34:12 2007 > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > pid=16926 > >> >>starting at Fri Mar 16 09:34:14 2007 > >> >>... > >> >> > >> >> > >> >>...could be overwhelming the gatekeeper on tg-login1. > >> >> > >> >>-Galen > >> >> > >> >> > >> >>Veronika V. Nefedova writes: > >> >> >Hi, Dave: > >> >> > > >> >> >Could you please give me a bit more information on what has happened ? > >> >> > > >> >> >Thanks, > >> >> > > >> >> >Nika > >> >> > > >> >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > >> >> >>FROM: McWilliams, David G > >> >> >>(Concerning ticket No. 137212) > >> >> >> > >> >> >>Veronika, > >> >> >> > >> >> >>The system administrator had to kill several of your globus > >> processes today > >> >> >>because the load average on the node was over 25. Below is a list of > >> >> processes > >> >> >>that were killed. Please let us know if you need help identifying the > >> >> case of > >> >> >>the problem. > >> >> >> > >> >> >>Regards, > >> >> >> > >> >> >>Dave McWilliams (217) > 244-1144 consult at ncsa.uiuc.edu > >> >> >>NCSA Consulting > >> Services http://www.ncsa.uiuc.edu/UserInfo/Consulting/ > >> >> >>------------------------------------------------------------------ > --- > >> ----- > >> >> >> > >> >> >>nefedova 11084 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11165 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11166 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11277 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11295 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11379 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11402 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11480 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11522 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11602 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11668 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11715 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11785 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11813 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11892 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 11948 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12004 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12030 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12133 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12172 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12253 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12256 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12443 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12460 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12504 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12534 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12657 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12668 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12773 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12806 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12892 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 12946 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13023 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13072 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13142 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13233 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13245 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13352 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13379 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13474 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13488 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13618 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13663 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13743 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13820 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13887 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 13952 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14046 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14048 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14172 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14196 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14319 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14366 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14430 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14539 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14572 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14703 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14725 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14832 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 14849 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15009 1 0 09:33 ? 
00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15017 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15164 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15165 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15322 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15332 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15526 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15544 1 0 09:33 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15671 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15672 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15787 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15842 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15981 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 15982 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16131 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16132 1 0 09:34 ? 
00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16320 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16358 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16553 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16622 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16725 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16726 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16845 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16925 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 16989 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 17095 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 17262 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 17305 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 17375 1 0 09:34 ? 00:00:02 > globus-job-manager -conf > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type > >> >> pbs_gcc -rdn > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs > >> >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script. > pl -m > >> >> >>pbs_gcc > >> >> >>-f /tmp/gram_1Drc0X -c poll > >> >> >>nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script. 
> pl -m > >> >> >>pbs_gcc > >> >> >>-f /tmp/gram_fSWNvt -c poll > >> >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script. > pl -m > >> >> >>pbs_gcc > >> >> >>-f /tmp/gram_a3whWx -c poll > >> >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script. > pl -m > >> >> >>pbs_gcc > >> >> >>-f /tmp/gram_zz0JuC -c poll > >> >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script. > pl -m > >> >> >>pbs_gcc > >> >> >>-f /tmp/gram_5jGFI1 -c poll > >> >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c > >> >> >>/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org > >> >> >>2>/dev/null > >> >> >>nefedova 31546 31512 0 13:35 ? 00:00:00 > >> >> >>/usr/local/pbs/ia64/bin/qstat -f > >> >> >>905629.tg-master.ncsa.teragrid.org > >> >> >>nefedova 31578 31555 0 13:35 ? 00:00:00 > >> >> >>/usr/local/pbs/ia64/bin/qstat -f > >> >> >>905638.tg-master.ncsa.teragrid.org > >> >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c > >> >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org > >> >> 15001 3 > >> >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c > >> >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org > >> >> 15001 3 From hategan at mcs.anl.gov Wed Mar 21 17:02:21 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 17:02:21 -0500 Subject: [Swift-devel] Fwd: Re: globus jobs killed on NCSA's IA64 cluster (mercury) In-Reply-To: <6.0.0.22.2.20070321164837.04b3eaf0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070321164837.04b3eaf0@mail.mcs.anl.gov> Message-ID: <1174514541.21757.2.camel@blabla.mcs.anl.gov> On Wed, 2007-03-21 at 16:50 -0500, Veronika V. Nefedova wrote: > Ok, NCSA wants to space job submissions to 5 seconds now... I am not sure > if we can do that ? The only thing left is to add a submission rate limiter, but this won't happen overnight. > (like inserting 'sleep' before each submission) > > Nika > > > >Date: Wed, 21 Mar 2007 16:17:59 -0500 > >Subject: Re: globus jobs killed on NCSA's IA64 cluster (mercury) > >To: nefedova at mcs.anl.gov > >From: consult at ncsa.uiuc.edu > > > >FROM: Arnold, Galen > >(Concerning ticket No. 137212) > > > >Veronika, > > > >Could you space these out a little more? > > > >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=12889 > >starting at Wed Mar 21 15:52:23 2007 > >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=12999 > >starting at Wed Mar 21 15:52:25 2007 > >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=13024 > >starting at Wed Mar 21 15:52:27 2007 > >Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper pid=13142 > >starting at Wed Mar 21 15:52:29 2007 > > > > > >...looks like they're at 2s intervals. If many of them get queued in a few > >minutes, it starts to get ahead of that gatekeeper and jobmanager. Try 5s > >next. > > > >-Galen > > > >Veronika V. Nefedova writes: > > >Yes, Thank you. I'll set this throttling parameter in my workflow. > > > > > >Nika > > > > > >At 04:53 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > >>FROM: Arnold, Galen > > >>(Concerning ticket No. 137212) > > >> > > >>Veronika, > > >> > > >>If you can throttle the job submission so that there's more than 1 second > > >>between them, that would probably help us out. 
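[To illustrate what such a "submission rate limiter" would amount to: a minimal plain-Java sketch with hypothetical names (SubmissionRateLimiter), not CoG provider code. Each submission would block until a minimum interval, e.g. the 5000 ms spacing NCSA asks for, has elapsed since the previous one - behaviour that neither the submit throttle nor clustering provides.]

    // Sketch of a minimum-interval limiter: acquire() returns only after at
    // least minIntervalMillis have elapsed since the last permitted
    // submission. Callers are deliberately serialized; a sketch, not tuned code.
    public class SubmissionRateLimiter {
        private final long minIntervalMillis;
        private long lastSubmission = 0;

        public SubmissionRateLimiter(long minIntervalMillis) {
            this.minIntervalMillis = minIntervalMillis;
        }

        public synchronized void acquire() throws InterruptedException {
            long wait = lastSubmission + minIntervalMillis - System.currentTimeMillis();
            if (wait > 0) {
                Thread.sleep(wait);
            }
            lastSubmission = System.currentTimeMillis();
        }
    }

[Usage sketch: calling limiter.acquire() immediately before each GRAM submission would turn the 2-second bursts reported in the gatekeeper log into evenly spaced submissions.]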
> > >> > > >>-Galen > > >> > > >>Veronika V. Nefedova writes: > > >> >Hi, Galen: > > >> > > > >> >I was told that I could have 384 jobs in PBS at the same time. By my > > >> >estimation I had no more then 136 jobs in the queue. What are other > > limits > > >> >I should keep in mind ? > > >> > > > >> >Thanks, > > >> > > > >> >Veronika > > >> > > > >> >At 04:31 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > >> >>FROM: Arnold, Galen > > >> >>(Concerning ticket No. 137212) > > >> >> > > >> >>Veronika, > > >> >> > > >> >>We see many sequences in our globus-gatekeeper.log on tg-login1 like > > this > > >> >> > > >> >>Notice: 5: Authenticated globus user: > > >> >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > >> >>Nefedova 137823 > > >> >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > >> >>Notice: 5: Requested service: jobmanager-pbs > > >> >>Notice: 5: Authorized as local user: nefedova > > >> >>Notice: 5: Authorized as local uid: 29202 > > >> >>Notice: 5: and local gid: 11467 > > >> >>Notice: 0: executing > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > >> >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > >> >>Notice: 0: Child 16725 started > > >> >>Notice: 5: Authenticated globus user: > > >> >>/DC=org/DC=doegrids/OU=People/CN=Veronika > > >> >>Nefedova 137823 > > >> >>Notice: 0: GRID_SECURITY_HTTP_BODY_FD=6 > > >> >>Notice: 5: Requested service: jobmanager-pbs > > >> >>Notice: 5: Authorized as local user: nefedova > > >> >>Notice: 5: Authorized as local uid: 29202 > > >> >>Notice: 5: and local gid: 11467 > > >> >>Notice: 0: executing > > >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager > > >> >>Notice: 0: GRID_SECURITY_CONTEXT_FD=9 > > >> >>Notice: 0: Child 16726 started > > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16769 > > >> >>starting at Fri Mar 16 09:34:12 2007 > > >> >> > > >> >> > > >> >>...your DN is also popular in the globus-gatekeeper.log. > > >> >> > > >> >>tg-login1:/usr/local/globus-2.4.3-gcc-r5/var # grep 'CN=Veronika > > Nefedova' > > >> >>globus-gatekeeper.log |wc -l > > >> >> 15215 > > >> >> > > >> >> > > >> >>The large number of connects around the same time: > > >> >> > > >> >> grep --after-context=10 Veronika globus-gatekeeper.log | grep > > starting | > > >> >> grep > > >> >>'Mar 16 09:3' | wc -l > > >> >> 65 > > >> >>... > > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=15872 > > >> >>starting at Fri Mar 16 09:34:02 2007 > > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16644 > > >> >>starting at Fri Mar 16 09:34:11 2007 > > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16769 > > >> >>starting at Fri Mar 16 09:34:12 2007 > > >> >>Notice: 6: /usr/local/globus-2.4.3-gcc-r5/sbin/globus-gatekeeper > > pid=16926 > > >> >>starting at Fri Mar 16 09:34:14 2007 > > >> >>... > > >> >> > > >> >> > > >> >>...could be overwhelming the gatekeeper on tg-login1. > > >> >> > > >> >>-Galen > > >> >> > > >> >> > > >> >>Veronika V. Nefedova writes: > > >> >> >Hi, Dave: > > >> >> > > > >> >> >Could you please give me a bit more information on what has happened ? > > >> >> > > > >> >> >Thanks, > > >> >> > > > >> >> >Nika > > >> >> > > > >> >> >At 02:12 PM 3/16/2007, consult at ncsa.uiuc.edu wrote: > > >> >> >>FROM: McWilliams, David G > > >> >> >>(Concerning ticket No. 
137212)
> > >> >> >>
> > >> >> >>Veronika,
> > >> >> >>
> > >> >> >>The system administrator had to kill several of your globus processes today
> > >> >> >>because the load average on the node was over 25. Below is a list of processes
> > >> >> >>that were killed. Please let us know if you need help identifying the case of
> > >> >> >>the problem.
> > >> >> >>
> > >> >> >>Regards,
> > >> >> >>
> > >> >> >>Dave McWilliams (217) 244-1144   consult at ncsa.uiuc.edu
> > >> >> >>NCSA Consulting Services   http://www.ncsa.uiuc.edu/UserInfo/Consulting/
> > >> >> >>---------------------------------------------------------------------------
> > >> >> >>
> > >> >> >>[some 90 globus-job-manager entries of the form below, all owned by nefedova
> > >> >> >>and started between 09:33 and 09:34, are omitted here; only the first one and
> > >> >> >>the distinct poll/qstat processes from the end of the listing are kept]
> > >> >> >>
> > >> >> >>nefedova 11084 1 0 09:33 ? 00:00:02 globus-job-manager -conf
> > >> >> >>/usr/local//globus-2.4.3-gcc-r5/etc/globus-job-manager.conf -type pbs_gcc -rdn
> > >> >> >>jobmanager-pbs -machine-type unknown -publish-jobs
> > >> >> >>nefedova 31347 14172 4 13:35 ? 00:00:00 /usr/bin/perl
> > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc
> > >> >> >>-f /tmp/gram_1Drc0X -c poll
> > >> >> >>nefedova 31349 11295 4 13:35 ? 00:00:00 /usr/bin/perl
> > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc
> > >> >> >>-f /tmp/gram_fSWNvt -c poll
> > >> >> >>nefedova 31352 11522 4 13:35 ? 00:00:00 /usr/bin/perl
> > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc
> > >> >> >>-f /tmp/gram_a3whWx -c poll
> > >> >> >>nefedova 31380 16845 4 13:35 ? 00:00:00 /usr/bin/perl
> > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc
> > >> >> >>-f /tmp/gram_zz0JuC -c poll
> > >> >> >>nefedova 31396 14832 4 13:35 ? 00:00:00 /usr/bin/perl
> > >> >> >>/usr/local//globus-2.4.3-gcc-r5/libexec/globus-job-manager-script.pl -m pbs_gcc
> > >> >> >>-f /tmp/gram_5jGFI1 -c poll
> > >> >> >>nefedova 31512 31347 0 13:35 ? 00:00:00 sh -c
> > >> >> >>/usr/local/pbs/ia64/bin/qstat -f 905629.tg-master.ncsa.teragrid.org
> > >> >> >>2>/dev/null
> > >> >> >>nefedova 31546 31512 0 13:35 ? 00:00:00
> > >> >> >>/usr/local/pbs/ia64/bin/qstat -f
> > >> >> >>905629.tg-master.ncsa.teragrid.org
> > >> >> >>nefedova 31578 31555 0 13:35 ? 00:00:00
> > >> >> >>/usr/local/pbs/ia64/bin/qstat -f
> > >> >> >>905638.tg-master.ncsa.teragrid.org
> > >> >> >>nefedova 31629 31560 0 13:35 ? 00:00:00 sh -c
> > >> >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org 15001 3
> > >> >> >>nefedova 31634 31578 0 13:35 ? 00:00:00 sh -c
> > >> >> >>/usr/local/torque-2.1.7/ia64/sbin/pbs_iff tg-master.ncsa.teragrid.org 15001 3
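(A quick way to see how close a login node is to this kind of pile-up -- just a sketch, assuming the job managers run under your own account and that ps/grep behave as on a typical Linux TeraGrid node -- is to count the manager processes directly:

    $ ps -u $USER -o pid=,args= | grep -c 'globus-job-manager -conf'

Pre-WS GRAM starts one globus-job-manager per submitted job, and each of those periodically forks perl/qstat children to poll PBS, so several hundred queued jobs translate into several hundred resident processes on the gateway node -- which is what produced the load-average complaint quoted above.)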
> > >
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

From nefedova at mcs.anl.gov Wed Mar 21 17:08:20 2007
From: nefedova at mcs.anl.gov (Veronika V. Nefedova)
Date: Wed, 21 Mar 2007 17:08:20 -0500
Subject: [Swift-devel] swift problem?
Message-ID: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov>

I've submitted a big job to TG NCSA today. At some point it filled up the
PBS queue completely - I had 384 jobs queued/running (that's the limit). And
I know that I had many more jobs waiting on my local machine to be
submitted to TG. Once the jobs started to leave the queue (i.e. were
finished) - no more jobs were submitted. So I now have only 372 jobs in the
queue while I should be having 384. Any ideas why this is happening?

I checked my log on wiggum:
/sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log

and found this error:

2007-03-21 15:51:35,963 INFO vdl:execute2 Running job chrm_long-8qmvzv8i
chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3,
system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf,
paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, restart:NONE,
faster:off, rwater:15, chem:chem, minstep:0, rforce:0, ligcrd:lyz,
stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in
swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on TG-NCSA
2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: It is
unknown if the job was submitted
        task:execute @ vdl-int.k, line: 352
        vdl:execute2 @ execute-default.k, line: 22
        vdl:execute @ swift-MolDyn-free-final.kml, line: 142
        charmm2 @ swift-MolDyn-free-final.kml, line: 155790
        vdl:mains @ swift-MolDyn-free-final.kml, line: 122678
Caused by: org.globus.gram.GramException: It is unknown if the job was
submitted

I am not sure if it's causing the job submission problems?
I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 (with some
options tweaked in scheduler.xml and the swift exec)
Thanks!

Nika

From hategan at mcs.anl.gov Wed Mar 21 17:16:38 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 21 Mar 2007 17:16:38 -0500
Subject: [Swift-devel] swift problem?
In-Reply-To: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov>
References: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov>
Message-ID: <1174515398.22205.4.camel@blabla.mcs.anl.gov>

I've never seen this error before, but it's coming from the GRAM
service. It's not the reason why more jobs were not submitted properly,
but it may be related to it. My guess is that something happened on the
server side that caused most jobs to not send notifications and some (or
one) to fail in that way, and Swift thinks most of these jobs are still
running.

Did the jobs get killed? Do the GRAM logs give any details?

Mihael

On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote:
> I've submitted a big job to TG NCSA today. At some point it filled up the
> PBS queue completely - I had 384 jobs queued/running (thats the limit). And
> I know that I had many more jobs waiting on my local machine to be
> submitted to TG. Once the jobs started to leave the queue (i.e. were
> finished) - no more jobs were submitted.
So I have now only 372 jobs in the > queue while I should be having 384. Any ideas why is it happening ? > > I checked my log on wiggum: > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log > > and found this error: > > 2007-03-21 15:51:35,963 INFO vdl:execute2 Running job chrm_long-8qmvzv8i > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, restart:NONE, > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, ligcrd:lyz, > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on TG-NCSA > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: It is > unknown if the job was submitted > task:execute @ vdl-int.k, line: 352 > vdl:execute2 @ execute-default.k, line: 22 > vdl:execute @ swift-MolDyn-free-final.kml, line: 142 > charmm2 @ swift-MolDyn-free-final.kml, line: 155790 > vdl:mains @ swift-MolDyn-free-final.kml, line: 122678 > Caused by: org.globus.gram.GramException: It is unknown if the job was > submitted > > I am not sure if its causing the job submission problems ? > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 (with some > options tweaked in scheduler.xml and swift exec) > Thanks! > > Nika > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Wed Mar 21 17:32:47 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 17:32:47 -0500 Subject: [Swift-devel] swift problem? In-Reply-To: <1174515398.22205.4.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov> <1174515398.22205.4.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321172906.052b5930@mail.mcs.anl.gov> I am not sure what I should look for. I have several hundreds of gram logs -- I checked a few of them and they looked normal (all approximately the same size). I also didn't see any stderr in my outputs (usually when the job is killed you get some kind of GRAM and/or PBS error in stderr.txt file)... The number of jobs in the queue are decreasing -- i.e. the jobs are finishing and nothing new is submitted... Nika At 05:16 PM 3/21/2007, Mihael Hategan wrote: >I've never seen this error before, but it's coming from the GRAM >service. It's not the reason why more jobs were not submitted properly, >but it may be related to it. My guess is that something happened on the >server side that caused most jobs to not send notifications and some (or >one) to fail in that way, and Swift thinks most of these jobs are still >running. > >Did the jobs get killed? Do the GRAM logs give any details? > >Mihael > >On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote: > > I've submitted a big job to TG NCSA today. At some point it filled up the > > PBS queue completely - I had 384 jobs queued/running (thats the limit). > And > > I know that I had many more jobs waiting on my local machine to be > > submitted to TG. Once the jobs started to leave the queue (i.e. were > > finished) - no more jobs were submitted. So I have now only 372 jobs in > the > > queue while I should be having 384. Any ideas why is it happening ? 
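(On the question of what to look for in the GRAM logs: assuming the usual GT2 layout, each pre-WS GRAM job manager writes a gram_job_mgr_<pid>.log file in the home directory on the gateway node, so a rough first pass over a few hundred of them is:

    $ ls -t $HOME/gram_job_mgr_*.log | head -5
    $ grep -liE 'error|fail' $HOME/gram_job_mgr_*.log

This is only a sketch -- the exact file names and the strings worth grepping for depend on the GRAM version installed at NCSA -- but it is usually enough to tell whether any manager recorded a failure around the time the queue stopped being refilled.)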
> > > > I checked my log on wiggum: > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log > > > > and found this error: > > > > 2007-03-21 15:51:35,963 INFO vdl:execute2 Running job chrm_long-8qmvzv8i > > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, > > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, > > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, restart:NONE, > > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, ligcrd:lyz, > > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in > > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on TG-NCSA > > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: It is > > unknown if the job was submitted > > task:execute @ vdl-int.k, line: 352 > > vdl:execute2 @ execute-default.k, line: 22 > > vdl:execute @ swift-MolDyn-free-final.kml, line: 142 > > charmm2 @ swift-MolDyn-free-final.kml, line: 155790 > > vdl:mains @ swift-MolDyn-free-final.kml, line: 122678 > > Caused by: org.globus.gram.GramException: It is unknown if the job was > > submitted > > > > I am not sure if its causing the job submission problems ? > > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 (with some > > options tweaked in scheduler.xml and swift exec) > > Thanks! > > > > Nika > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Mar 21 17:37:52 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 21 Mar 2007 17:37:52 -0500 Subject: [Swift-devel] swift problem? In-Reply-To: <6.0.0.22.2.20070321172906.052b5930@mail.mcs.anl.gov> References: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov> <1174515398.22205.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321172906.052b5930@mail.mcs.anl.gov> Message-ID: <1174516672.22999.2.camel@blabla.mcs.anl.gov> On Wed, 2007-03-21 at 17:32 -0500, Veronika V. Nefedova wrote: > I am not sure what I should look for. I have several hundreds of gram > logs -- I checked a few of them and they looked normal (all > approximately the same size). I also didn't see any stderr in my > outputs (usually when the job is killed you get some kind of GRAM > and/or PBS error in stderr.txt file)... > > The number of jobs in the queue are decreasing The fact that the number of jobs in the queue is decreasing doesn't mean that Swift knows about it. Can you add "log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG" in log4j.properties and try it again? Mihael > -- i.e. the jobs are finishing and nothing new is submitted... > > Nika > > At 05:16 PM 3/21/2007, Mihael Hategan wrote: > > I've never seen this error before, but it's coming from the GRAM > > service. It's not the reason why more jobs were not submitted > > properly, > > but it may be related to it. My guess is that something happened on > > the > > server side that caused most jobs to not send notifications and some > > (or > > one) to fail in that way, and Swift thinks most of these jobs are > > still > > running. > > > > Did the jobs get killed? Do the GRAM logs give any details? > > > > Mihael > > > > On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote: > > > I've submitted a big job to TG NCSA today. At some point it filled > > up the > > > PBS queue completely - I had 384 jobs queued/running (thats the > > limit). 
And > > > I know that I had many more jobs waiting on my local machine to > > be > > > submitted to TG. Once the jobs started to leave the queue (i.e. > > were > > > finished) - no more jobs were submitted. So I have now only 372 > > jobs in the > > > queue while I should be having 384. Any ideas why is it > > happening ? > > > > > > I checked my log on wiggum: > > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log > > > > > > and found this error: > > > > > > 2007-03-21 15:51:35,963 INFO vdl:execute2 Running job > > chrm_long-8qmvzv8i > > > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, > > > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, > > > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, > > restart:NONE, > > > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, > > ligcrd:lyz, > > > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in > > > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on > > TG-NCSA > > > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: > > It is > > > unknown if the job was submitted > > > task:execute @ vdl-int.k, line: 352 > > > vdl:execute2 @ execute-default.k, line: 22 > > > vdl:execute @ swift-MolDyn-free-final.kml, line: 142 > > > charmm2 @ swift-MolDyn-free-final.kml, line: 155790 > > > vdl:mains @ swift-MolDyn-free-final.kml, line: 122678 > > > Caused by: org.globus.gram.GramException: It is unknown if the job > > was > > > submitted > > > > > > I am not sure if its causing the job submission problems ? > > > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 > > (with some > > > options tweaked in scheduler.xml and swift exec) > > > Thanks! > > > > > > Nika > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From nefedova at mcs.anl.gov Wed Mar 21 18:14:39 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Wed, 21 Mar 2007 18:14:39 -0500 Subject: [Swift-devel] swift problem? In-Reply-To: <1174516672.22999.2.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov> <1174515398.22205.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321172906.052b5930@mail.mcs.anl.gov> <1174516672.22999.2.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070321181401.05257850@mail.mcs.anl.gov> You want me to cancel the whole job and then restart it? At 05:37 PM 3/21/2007, Mihael Hategan wrote: >On Wed, 2007-03-21 at 17:32 -0500, Veronika V. Nefedova wrote: > > I am not sure what I should look for. I have several hundreds of gram > > logs -- I checked a few of them and they looked normal (all > > approximately the same size). I also didn't see any stderr in my > > outputs (usually when the job is killed you get some kind of GRAM > > and/or PBS error in stderr.txt file)... > > > > The number of jobs in the queue are decreasing > >The fact that the number of jobs in the queue is decreasing doesn't mean >that Swift knows about it. >Can you add >"log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG" >in log4j.properties and try it again? > >Mihael > > > -- i.e. the jobs are finishing and nothing new is submitted... > > > > Nika > > > > At 05:16 PM 3/21/2007, Mihael Hategan wrote: > > > I've never seen this error before, but it's coming from the GRAM > > > service. It's not the reason why more jobs were not submitted > > > properly, > > > but it may be related to it. 
My guess is that something happened on > > > the > > > server side that caused most jobs to not send notifications and some > > > (or > > > one) to fail in that way, and Swift thinks most of these jobs are > > > still > > > running. > > > > > > Did the jobs get killed? Do the GRAM logs give any details? > > > > > > Mihael > > > > > > On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote: > > > > I've submitted a big job to TG NCSA today. At some point it filled > > > up the > > > > PBS queue completely - I had 384 jobs queued/running (thats the > > > limit). And > > > > I know that I had many more jobs waiting on my local machine to > > > be > > > > submitted to TG. Once the jobs started to leave the queue (i.e. > > > were > > > > finished) - no more jobs were submitted. So I have now only 372 > > > jobs in the > > > > queue while I should be having 384. Any ideas why is it > > > happening ? > > > > > > > > I checked my log on wiggum: > > > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log > > > > > > > > and found this error: > > > > > > > > 2007-03-21 15:51:35,963 INFO vdl:execute2 Running job > > > chrm_long-8qmvzv8i > > > > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, > > > > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, > > > > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, > > > restart:NONE, > > > > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, > > > ligcrd:lyz, > > > > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in > > > > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on > > > TG-NCSA > > > > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: > > > It is > > > > unknown if the job was submitted > > > > task:execute @ vdl-int.k, line: 352 > > > > vdl:execute2 @ execute-default.k, line: 22 > > > > vdl:execute @ swift-MolDyn-free-final.kml, line: 142 > > > > charmm2 @ swift-MolDyn-free-final.kml, line: 155790 > > > > vdl:mains @ swift-MolDyn-free-final.kml, line: 122678 > > > > Caused by: org.globus.gram.GramException: It is unknown if the job > > > was > > > > submitted > > > > > > > > I am not sure if its causing the job submission problems ? > > > > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 > > > (with some > > > > options tweaked in scheduler.xml and swift exec) > > > > Thanks! > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > From nefedova at mcs.anl.gov Thu Mar 22 08:43:55 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Thu, 22 Mar 2007 08:43:55 -0500 Subject: [Swift-devel] swift problem? In-Reply-To: <6.0.0.22.2.20070321181401.05257850@mail.mcs.anl.gov> References: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov> <1174515398.22205.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321172906.052b5930@mail.mcs.anl.gov> <1174516672.22999.2.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321181401.05257850@mail.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070322083224.056c88f0@mail.mcs.anl.gov> Ok, After I restarted the run, I have a similar behavior: the queue got saturated at first with 384 jobs, but then the number started to decline as the jobs get finished. I have now only 230 jobs (vs 384 max). 
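(The extra debug output being discussed here comes down to one line in the log4j.properties that the swift command reads -- in the vdsk-0.1rc2 tree that file normally lives under etc/, though the exact location is an assumption:

    # etc/log4j.properties (excerpt)
    # Log task state transitions, so the run log shows whether Swift ever
    # hears back that a GRAM job changed status.
    log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG

With that in place, the next swift-MolDyn-*.log should record the status changes Swift actually receives, which is what is needed to tell a lost GRAM notification apart from a job that never finished.)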
Another weird thing: I see that the jobs that finished - all 192 finished successfully but one (the one has this error: forrtl: error (78): process killed (SIGTERM) - probably it was killed for some reason). Anyway -- none of the results of the finished jobs were transferred back to my submit host. I should probably just kill the whole thing and start it fresh - the restart thing probably is not working properly (?). The only question is - should I modify anything in the settings to produce more of the debug output, etc ? Thanks, Nika At 06:14 PM 3/21/2007, Veronika V. Nefedova wrote: >You want me to cancel the whole job and then restart it? > >At 05:37 PM 3/21/2007, Mihael Hategan wrote: >>On Wed, 2007-03-21 at 17:32 -0500, Veronika V. Nefedova wrote: >> > I am not sure what I should look for. I have several hundreds of gram >> > logs -- I checked a few of them and they looked normal (all >> > approximately the same size). I also didn't see any stderr in my >> > outputs (usually when the job is killed you get some kind of GRAM >> > and/or PBS error in stderr.txt file)... >> > >> > The number of jobs in the queue are decreasing >> >>The fact that the number of jobs in the queue is decreasing doesn't mean >>that Swift knows about it. >>Can you add >>"log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG" >>in log4j.properties and try it again? >> >>Mihael >> >> > -- i.e. the jobs are finishing and nothing new is submitted... >> > >> > Nika >> > >> > At 05:16 PM 3/21/2007, Mihael Hategan wrote: >> > > I've never seen this error before, but it's coming from the GRAM >> > > service. It's not the reason why more jobs were not submitted >> > > properly, >> > > but it may be related to it. My guess is that something happened on >> > > the >> > > server side that caused most jobs to not send notifications and some >> > > (or >> > > one) to fail in that way, and Swift thinks most of these jobs are >> > > still >> > > running. >> > > >> > > Did the jobs get killed? Do the GRAM logs give any details? >> > > >> > > Mihael >> > > >> > > On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote: >> > > > I've submitted a big job to TG NCSA today. At some point it filled >> > > up the >> > > > PBS queue completely - I had 384 jobs queued/running (thats the >> > > limit). And >> > > > I know that I had many more jobs waiting on my local machine to >> > > be >> > > > submitted to TG. Once the jobs started to leave the queue (i.e. >> > > were >> > > > finished) - no more jobs were submitted. So I have now only 372 >> > > jobs in the >> > > > queue while I should be having 384. Any ideas why is it >> > > happening ? 
>> > > > >> > > > I checked my log on wiggum: >> > > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log >> > > > >> > > > and found this error: >> > > > >> > > > 2007-03-21 15:51:35,963 INFO vdl:execute2 Running job >> > > chrm_long-8qmvzv8i >> > > > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, >> > > > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, >> > > > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, >> > > restart:NONE, >> > > > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, >> > > ligcrd:lyz, >> > > > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in >> > > > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on >> > > TG-NCSA >> > > > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: >> > > It is >> > > > unknown if the job was submitted >> > > > task:execute @ vdl-int.k, line: 352 >> > > > vdl:execute2 @ execute-default.k, line: 22 >> > > > vdl:execute @ swift-MolDyn-free-final.kml, line: 142 >> > > > charmm2 @ swift-MolDyn-free-final.kml, line: 155790 >> > > > vdl:mains @ swift-MolDyn-free-final.kml, line: 122678 >> > > > Caused by: org.globus.gram.GramException: It is unknown if the job >> > > was >> > > > submitted >> > > > >> > > > I am not sure if its causing the job submission problems ? >> > > > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 >> > > (with some >> > > > options tweaked in scheduler.xml and swift exec) >> > > > Thanks! >> > > > >> > > > Nika >> > > > >> > > > >> > > > _______________________________________________ >> > > > Swift-devel mailing list >> > > > Swift-devel at ci.uchicago.edu >> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > >> > > > >_______________________________________________ >Swift-devel mailing list >Swift-devel at ci.uchicago.edu >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From hategan at mcs.anl.gov Thu Mar 22 09:18:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 22 Mar 2007 09:18:30 -0500 Subject: [Swift-devel] swift problem? In-Reply-To: <6.0.0.22.2.20070322083224.056c88f0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov> <1174515398.22205.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321172906.052b5930@mail.mcs.anl.gov> <1174516672.22999.2.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321181401.05257850@mail.mcs.anl.gov> <6.0.0.22.2.20070322083224.056c88f0@mail.mcs.anl.gov> Message-ID: <1174573110.5987.0.camel@blabla.mcs.anl.gov> On Thu, 2007-03-22 at 08:43 -0500, Veronika V. Nefedova wrote: > Ok, After I restarted the run, I have a similar behavior: > the queue got saturated at first with 384 jobs, but then the number started > to decline as the jobs get finished. I have now only 230 jobs (vs 384 max). > Another weird thing: I see that the jobs that finished - all 192 finished > successfully but one (the one has this error: forrtl: error (78): process > killed (SIGTERM) - probably it was killed for some reason). Anyway -- none > of the results of the finished jobs were transferred back to my submit host. That would indicate that Swift doesn't know that the jobs finished. Does a simple workflow still work on NCSA? > > I should probably just kill the whole thing and start it fresh - the > restart thing probably is not working properly (?). The only question is - > should I modify anything in the settings to produce more of the debug > output, etc ? > > Thanks, > > Nika > > At 06:14 PM 3/21/2007, Veronika V. 
Nefedova wrote: > >You want me to cancel the whole job and then restart it? > > > >At 05:37 PM 3/21/2007, Mihael Hategan wrote: > >>On Wed, 2007-03-21 at 17:32 -0500, Veronika V. Nefedova wrote: > >> > I am not sure what I should look for. I have several hundreds of gram > >> > logs -- I checked a few of them and they looked normal (all > >> > approximately the same size). I also didn't see any stderr in my > >> > outputs (usually when the job is killed you get some kind of GRAM > >> > and/or PBS error in stderr.txt file)... > >> > > >> > The number of jobs in the queue are decreasing > >> > >>The fact that the number of jobs in the queue is decreasing doesn't mean > >>that Swift knows about it. > >>Can you add > >>"log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEBUG" > >>in log4j.properties and try it again? > >> > >>Mihael > >> > >> > -- i.e. the jobs are finishing and nothing new is submitted... > >> > > >> > Nika > >> > > >> > At 05:16 PM 3/21/2007, Mihael Hategan wrote: > >> > > I've never seen this error before, but it's coming from the GRAM > >> > > service. It's not the reason why more jobs were not submitted > >> > > properly, > >> > > but it may be related to it. My guess is that something happened on > >> > > the > >> > > server side that caused most jobs to not send notifications and some > >> > > (or > >> > > one) to fail in that way, and Swift thinks most of these jobs are > >> > > still > >> > > running. > >> > > > >> > > Did the jobs get killed? Do the GRAM logs give any details? > >> > > > >> > > Mihael > >> > > > >> > > On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote: > >> > > > I've submitted a big job to TG NCSA today. At some point it filled > >> > > up the > >> > > > PBS queue completely - I had 384 jobs queued/running (thats the > >> > > limit). And > >> > > > I know that I had many more jobs waiting on my local machine to > >> > > be > >> > > > submitted to TG. Once the jobs started to leave the queue (i.e. > >> > > were > >> > > > finished) - no more jobs were submitted. So I have now only 372 > >> > > jobs in the > >> > > > queue while I should be having 384. Any ideas why is it > >> > > happening ? 
> >> > > > > >> > > > I checked my log on wiggum: > >> > > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log > >> > > > > >> > > > and found this error: > >> > > > > >> > > > 2007-03-21 15:51:35,963 INFO vdl:execute2 Running job > >> > > chrm_long-8qmvzv8i > >> > > > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, > >> > > > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, > >> > > > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, > >> > > restart:NONE, > >> > > > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, > >> > > ligcrd:lyz, > >> > > > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in > >> > > > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on > >> > > TG-NCSA > >> > > > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: > >> > > It is > >> > > > unknown if the job was submitted > >> > > > task:execute @ vdl-int.k, line: 352 > >> > > > vdl:execute2 @ execute-default.k, line: 22 > >> > > > vdl:execute @ swift-MolDyn-free-final.kml, line: 142 > >> > > > charmm2 @ swift-MolDyn-free-final.kml, line: 155790 > >> > > > vdl:mains @ swift-MolDyn-free-final.kml, line: 122678 > >> > > > Caused by: org.globus.gram.GramException: It is unknown if the job > >> > > was > >> > > > submitted > >> > > > > >> > > > I am not sure if its causing the job submission problems ? > >> > > > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 > >> > > (with some > >> > > > options tweaked in scheduler.xml and swift exec) > >> > > > Thanks! > >> > > > > >> > > > Nika > >> > > > > >> > > > > >> > > > _______________________________________________ > >> > > > Swift-devel mailing list > >> > > > Swift-devel at ci.uchicago.edu > >> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > >> > > > > > > >_______________________________________________ > >Swift-devel mailing list > >Swift-devel at ci.uchicago.edu > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From nefedova at mcs.anl.gov Thu Mar 22 09:34:05 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Thu, 22 Mar 2007 09:34:05 -0500 Subject: [Swift-devel] swift problem? In-Reply-To: <1174573110.5987.0.camel@blabla.mcs.anl.gov> References: <6.0.0.22.2.20070321165651.05291c90@mail.mcs.anl.gov> <1174515398.22205.4.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321172906.052b5930@mail.mcs.anl.gov> <1174516672.22999.2.camel@blabla.mcs.anl.gov> <6.0.0.22.2.20070321181401.05257850@mail.mcs.anl.gov> <6.0.0.22.2.20070322083224.056c88f0@mail.mcs.anl.gov> <1174573110.5987.0.camel@blabla.mcs.anl.gov> Message-ID: <6.0.0.22.2.20070322092932.04b2f130@mail.mcs.anl.gov> Yes, the first 3 stages of this workflow worked just fine -- I have some results transferred back to my submit host (from the original run, not the restart). None of the results from stage 4 were transferred back (both from the original run and the restart run). Stage 4 has about 3500 jobs and this is when the queue got saturated. Stages 1-3 never had more then 50 jobs at the same time... The workflow is still going... Nika At 09:18 AM 3/22/2007, Mihael Hategan wrote: >On Thu, 2007-03-22 at 08:43 -0500, Veronika V. Nefedova wrote: > > Ok, After I restarted the run, I have a similar behavior: > > the queue got saturated at first with 384 jobs, but then the number > started > > to decline as the jobs get finished. I have now only 230 jobs (vs 384 max). 
> > Another weird thing: I see that the jobs that finished - all 192 finished > > successfully but one (the one has this error: forrtl: error (78): process > > killed (SIGTERM) - probably it was killed for some reason). Anyway -- none > > of the results of the finished jobs were transferred back to my submit > host. > >That would indicate that Swift doesn't know that the jobs finished. Does >a simple workflow still work on NCSA? > > > > > I should probably just kill the whole thing and start it fresh - the > > restart thing probably is not working properly (?). The only question is - > > should I modify anything in the settings to produce more of the debug > > output, etc ? > > > > Thanks, > > > > Nika > > > > At 06:14 PM 3/21/2007, Veronika V. Nefedova wrote: > > >You want me to cancel the whole job and then restart it? > > > > > >At 05:37 PM 3/21/2007, Mihael Hategan wrote: > > >>On Wed, 2007-03-21 at 17:32 -0500, Veronika V. Nefedova wrote: > > >> > I am not sure what I should look for. I have several hundreds of gram > > >> > logs -- I checked a few of them and they looked normal (all > > >> > approximately the same size). I also didn't see any stderr in my > > >> > outputs (usually when the job is killed you get some kind of GRAM > > >> > and/or PBS error in stderr.txt file)... > > >> > > > >> > The number of jobs in the queue are decreasing > > >> > > >>The fact that the number of jobs in the queue is decreasing doesn't mean > > >>that Swift knows about it. > > >>Can you add > > >>"log4j.logger.org.globus.cog.abstraction.impl.common.task.TaskImpl=DEB > UG" > > >>in log4j.properties and try it again? > > >> > > >>Mihael > > >> > > >> > -- i.e. the jobs are finishing and nothing new is submitted... > > >> > > > >> > Nika > > >> > > > >> > At 05:16 PM 3/21/2007, Mihael Hategan wrote: > > >> > > I've never seen this error before, but it's coming from the GRAM > > >> > > service. It's not the reason why more jobs were not submitted > > >> > > properly, > > >> > > but it may be related to it. My guess is that something happened on > > >> > > the > > >> > > server side that caused most jobs to not send notifications and some > > >> > > (or > > >> > > one) to fail in that way, and Swift thinks most of these jobs are > > >> > > still > > >> > > running. > > >> > > > > >> > > Did the jobs get killed? Do the GRAM logs give any details? > > >> > > > > >> > > Mihael > > >> > > > > >> > > On Wed, 2007-03-21 at 17:08 -0500, Veronika V. Nefedova wrote: > > >> > > > I've submitted a big job to TG NCSA today. At some point it filled > > >> > > up the > > >> > > > PBS queue completely - I had 384 jobs queued/running (thats the > > >> > > limit). And > > >> > > > I know that I had many more jobs waiting on my local machine to > > >> > > be > > >> > > > submitted to TG. Once the jobs started to leave the queue (i.e. > > >> > > were > > >> > > > finished) - no more jobs were submitted. So I have now only 372 > > >> > > jobs in the > > >> > > > queue while I should be having 384. Any ideas why is it > > >> > > happening ? 
> > >> > > > > > >> > > > I checked my log on wiggum: > > >> > > > /sandbox/ydeng/alamines/swift-MolDyn-free-final-c2eygeq2do861.log > > >> > > > > > >> > > > and found this error: > > >> > > > > > >> > > > 2007-03-21 15:51:35,963 INFO vdl:execute2 Running job > > >> > > chrm_long-8qmvzv8i > > >> > > > chrm_long with arguments [pstep:40000, prtfile:solv_chg_a3, > > >> > > > system:solv_m018, stitle:m018, rtffile:parm03_gaff_all.rtf, > > >> > > > paramfile:parm03_gaffnb_all.prm, gaff:m018_am1, vac:, > > >> > > restart:NONE, > > >> > > > faster:off, rwater:15, chem:chem, minstep:0, rforce:0, > > >> > > ligcrd:lyz, > > >> > > > stage:chg, urandseed:4212951, dirname:solv_chg_a3_m018] in > > >> > > > swift-MolDyn-free-final-c2eygeq2do861/chrm_long-8qmvzv8i on > > >> > > TG-NCSA > > >> > > > 2007-03-21 15:51:38,162 DEBUG vdl:execute2 Application exception: > > >> > > It is > > >> > > > unknown if the job was submitted > > >> > > > task:execute @ vdl-int.k, line: 352 > > >> > > > vdl:execute2 @ execute-default.k, line: 22 > > >> > > > vdl:execute @ swift-MolDyn-free-final.kml, line: 142 > > >> > > > charmm2 @ swift-MolDyn-free-final.kml, line: 155790 > > >> > > > vdl:mains @ swift-MolDyn-free-final.kml, line: 122678 > > >> > > > Caused by: org.globus.gram.GramException: It is unknown if the job > > >> > > was > > >> > > > submitted > > >> > > > > > >> > > > I am not sure if its causing the job submission problems ? > > >> > > > I am using this swift code: /sandbox/nefedova/SWIFT/vdsk-0.1rc2 > > >> > > (with some > > >> > > > options tweaked in scheduler.xml and swift exec) > > >> > > > Thanks! > > >> > > > > > >> > > > Nika > > >> > > > > > >> > > > > > >> > > > _______________________________________________ > > >> > > > Swift-devel mailing list > > >> > > > Swift-devel at ci.uchicago.edu > > >> > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >> > > > > > >> > > > > > > > > > >_______________________________________________ > > >Swift-devel mailing list > > >Swift-devel at ci.uchicago.edu > > >http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From nefedova at mcs.anl.gov Fri Mar 23 08:05:44 2007 From: nefedova at mcs.anl.gov (Veronika V. Nefedova) Date: Fri, 23 Mar 2007 08:05:44 -0500 Subject: [Swift-devel] resume jobs? Message-ID: <6.0.0.22.2.20070323075755.0466fec0@mail.mcs.anl.gov> Hi, I think I forgot how to resume failed workflows. Is it how it should be done: $ swift -resume MolDyn-0zignmjbujqj0.0.rlog MolDyn.dtm & ? The command above just restarted the whole workflow from the very beginning while it was supposed to run just a couple of very last stages... The rlog contains the list of all already produced files so I am not sure why it started from scratch. Or I screwed up the syntax for -resume? swift --help gives only this: [-resume ] Resumes the execution using a log file Thanks, Nika From hategan at mcs.anl.gov Fri Mar 23 12:27:56 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 23 Mar 2007 12:27:56 -0500 Subject: [Swift-devel] resume jobs? In-Reply-To: <6.0.0.22.2.20070323075755.0466fec0@mail.mcs.anl.gov> References: <6.0.0.22.2.20070323075755.0466fec0@mail.mcs.anl.gov> Message-ID: <1174670876.10692.0.camel@blabla.mcs.anl.gov> You are doing it the correct way. I'll have to investigate. On Fri, 2007-03-23 at 08:05 -0500, Veronika V. Nefedova wrote: > Hi, > > I think I forgot how to resume failed workflows. Is it how it should be done: > > $ swift -resume MolDyn-0zignmjbujqj0.0.rlog MolDyn.dtm & > > ? 
> > The command above just restarted the whole workflow from the very beginning > while it was supposed to run just a couple of very last stages... The rlog > contains the list of all already produced files so I am not sure why it > started from scratch. Or I screwed up the syntax for -resume? > swift --help gives only this: > > [-resume ] > Resumes the execution using a log file > > Thanks, > > Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Mon Mar 26 21:55:57 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 Mar 2007 02:55:57 +0000 (GMT) Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) Message-ID: I got this from my friends in Argentina. ---------- Forwarded message ---------- Date: Mon, 26 Mar 2007 12:33:22 -0300 From: Esteban Mocskos To: Ben Clifford , benc at cs.uchicago.edu Cc: "Diego [iso-8859-1] Fern?ndez Slezak" , mcarri at dc.uba.ar, pturjanski at dc.uba.ar Subject: Re: ResourceMatcher Information Hi, Ben! I finished with the tutorial, but what I'm looking for is something like a "developer guide". Some kind of document with the design concepts and hints to understand the 17M of swift source code. At least, please send me the plug in API description for site selection. Thank you. See you. Esteban. > On Wed, 21 Mar 2007, Diego Fern?ndez Slezak wrote: > > Please send us some tips about Swift so that we can start looking at it. > > The website is: http://www.ci.uchicago.edu/swift/ > > There is a simple tutorial at: > http://www.ci.uchicago.edu/swift/guides/tutorial.php > > -- From benc at hawaga.org.uk Tue Mar 27 02:26:56 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 Mar 2007 07:26:56 +0000 (GMT) Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) In-Reply-To: References: Message-ID: specifically what they were looking at here was how some resource selection work that they've previously done might plug into swift, rather than doing its own job submission. On Tue, 27 Mar 2007, Ben Clifford wrote: > > I got this from my friends in Argentina. > > ---------- Forwarded message ---------- > Date: Mon, 26 Mar 2007 12:33:22 -0300 > From: Esteban Mocskos > To: Ben Clifford , benc at cs.uchicago.edu > Cc: "Diego [iso-8859-1] Fern?ndez Slezak" , > mcarri at dc.uba.ar, pturjanski at dc.uba.ar > Subject: Re: ResourceMatcher Information > > Hi, Ben! I finished with the tutorial, but what I'm looking for is something > like a "developer guide". Some kind of document with the design concepts and > hints to understand the 17M of swift source code. > At least, please send me the plug in API description for site selection. > Thank you. > See you. > Esteban. > > On Wed, 21 Mar 2007, Diego Fern?ndez Slezak wrote: > > > Please send us some tips about Swift so that we can start looking at it. > > > > The website is: http://www.ci.uchicago.edu/swift/ > > > > There is a simple tutorial at: > > http://www.ci.uchicago.edu/swift/guides/tutorial.php > > > > -- > > From itf at mcs.anl.gov Tue Mar 27 02:50:44 2007 From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=) Date: Tue, 27 Mar 2007 07:50:44 +0000 Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) In-Reply-To: References: Message-ID: <310549501-1174981871-cardhu_blackberry.rim.net-2285956-@bwe023-cell00.bisx.prod.on.blackberry> So the one thing they need to know is the resouirce selector callout? 
It seems to me that as we advise people on that, we need to take into account the impact of using DeeF on design. Ian Sent via BlackBerry from T-Mobile -----Original Message----- From: Ben Clifford Date: Tue, 27 Mar 2007 07:26:56 To:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] Re: ResourceMatcher Information (fwd) specifically what they were looking at here was how some resource selection work that they've previously done might plug into swift, rather than doing its own job submission. On Tue, 27 Mar 2007, Ben Clifford wrote: > > I got this from my friends in Argentina. > > ---------- Forwarded message ---------- > Date: Mon, 26 Mar 2007 12:33:22 -0300 > From: Esteban Mocskos > To: Ben Clifford , benc at cs.uchicago.edu > Cc: "Diego [iso-8859-1] Fern?ndez Slezak" , > mcarri at dc.uba.ar, pturjanski at dc.uba.ar > Subject: Re: ResourceMatcher Information > > Hi, Ben! I finished with the tutorial, but what I'm looking for is something > like a "developer guide". Some kind of document with the design concepts and > hints to understand the 17M of swift source code. > At least, please send me the plug in API description for site selection. > Thank you. > See you. > Esteban. > > On Wed, 21 Mar 2007, Diego Fern?ndez Slezak wrote: > > > Please send us some tips about Swift so that we can start looking at it. > > > > The website is: http://www.ci.uchicago.edu/swift/ > > > > There is a simple tutorial at: > > http://www.ci.uchicago.edu/swift/guides/tutorial.php > > > > -- > > _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Tue Mar 27 03:12:39 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 Mar 2007 08:12:39 +0000 (GMT) Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) In-Reply-To: <310549501-1174981871-cardhu_blackberry.rim.net-2285956-@bwe023-cell00.bisx.prod.on.blackberry> References: <310549501-1174981871-cardhu_blackberry.rim.net-2285956-@bwe023-cell00.bisx.prod.on.blackberry> Message-ID: On Tue, 27 Mar 2007, Ian Foster wrote: > So the one thing they need to know is the resouirce selector callout? essentially yes, though I suspect some explanation of what the rest of the code does would be useful... > It seems to me that as we advise people on that, we need to take into > account the impact of using DeeF on design. sure. and DeeF's design probably needs to be influenced by the requirements of its potential users. -- From itf at mcs.anl.gov Tue Mar 27 03:21:09 2007 From: itf at mcs.anl.gov (=?UTF-8?B?SWFuIEZvc3Rlcg==?=) Date: Tue, 27 Mar 2007 08:21:09 +0000 Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) In-Reply-To: References: <310549501-1174981871-cardhu_blackberry.rim.net-2285956-@bwe023-cell00.bisx.prod.on.blackberry> Message-ID: <2062420567-1174983696-cardhu_blackberry.rim.net-1169424102-@bwe023-cell00.bisx.prod.on.blackberry> Which potential users do you think are important for DeeF to be talking to? Sent via BlackBerry from T-Mobile -----Original Message----- From: Ben Clifford Date: Tue, 27 Mar 2007 08:12:39 To:Ian Foster Cc:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] Re: ResourceMatcher Information (fwd) On Tue, 27 Mar 2007, Ian Foster wrote: > So the one thing they need to know is the resouirce selector callout? essentially yes, though I suspect some explanation of what the rest of the code does would be useful... 
> It seems to me that as we advise people on that, we need to take into > account the impact of using DeeF on design. sure. and DeeF's design probably needs to be influenced by the requirements of its potential users. -- From itf at mcs.anl.gov Tue Mar 27 03:21:09 2007 From: itf at mcs.anl.gov (Ian Foster) Date: Tue, 27 Mar 2007 08:21:09 +0000 Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) In-Reply-To: References: <310549501-1174981871-cardhu_blackberry.rim.net-2285956-@bwe023-cell00.bisx.prod.on.blackberry> Message-ID: <2062420567-1174983696-cardhu_blackberry.rim.net-1169424102-@bwe023-cell00.bisx.prod.on.blackberry> Which potential users do you think are important for DeeF to be talking to? Sent via BlackBerry from T-Mobile -----Original Message----- From: Ben Clifford Date: Tue, 27 Mar 2007 08:12:39 To:Ian Foster Cc:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] Re: ResourceMatcher Information (fwd) On Tue, 27 Mar 2007, Ian Foster wrote: > So the one thing they need to know is the resource selector callout? essentially yes, though I suspect some explanation of what the rest of the code does would be useful... 
I really mean Swift-as-user-of-DeeF (in the sense of a Swift deployment where someone has chosen to connect DeeF rather than one of the other job submission plugins) - they're different projects, with different goals and expectations. -- From benc at hawaga.org.uk Tue Mar 27 03:41:50 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 27 Mar 2007 08:41:50 +0000 (GMT) Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) In-Reply-To: <2062420567-1174983696-cardhu_blackberry.rim.net-1169424102-@bwe023-cell00.bisx.prod.on.blackberry> References: <310549501-1174981871-cardhu_blackberry.rim.net-2285956-@bwe023-cell00.bisx.prod.on.blackberry> <2062420567-1174983696-cardhu_blackberry.rim.net-1169424102-@bwe023-cell00.bisx.prod.on.blackberry> Message-ID: though on the users-of-swift side of things, gadu are the only people I can think of who actually do any serious resource selection and are a candidate for using swift. On Tue, 27 Mar 2007, Ian Foster wrote: > Which potential users do you think are important for DeeF to be talking to? > > > Sent via BlackBerry from T-Mobile > > -----Original Message----- > From: Ben Clifford > Date: Tue, 27 Mar 2007 08:12:39 > To:Ian Foster > Cc:swift-devel at ci.uchicago.edu > Subject: Re: [Swift-devel] Re: ResourceMatcher Information (fwd) > > > > On Tue, 27 Mar 2007, Ian Foster wrote: > > > So the one thing they need to know is the resource selector callout? > > essentially yes, though I suspect some explanation of what the rest of the > code does would be useful... > > > It seems to me that as we advise people on that, we need to take into > > account the impact of using DeeF on design. > > sure. and DeeF's design probably needs to be influenced by the > requirements of its potential users. > > > From itf at mcs.anl.gov Tue Mar 27 04:21:21 2007 From: itf at mcs.anl.gov (Ian Foster) Date: Tue, 27 Mar 2007 09:21:21 +0000 Subject: [Swift-devel] Re: ResourceMatcher Information (fwd) In-Reply-To: References: <310549501-1174981871-cardhu_blackberry.rim.net-2285956-@bwe023-cell00.bisx.prod.on.blackberry><2062420567-1174983696-cardhu_blackberry.rim.net-1169424102-@bwe023-cell00.bisx.prod.on.blackberry> Message-ID: <1894637909-1174987308-cardhu_blackberry.rim.net-1093261511-@bwe053-cell00.bisx.prod.on.blackberry> Ok, so what requirements do you think Swift has? Sent via BlackBerry from T-Mobile -----Original Message----- From: Ben Clifford Date: Tue, 27 Mar 2007 08:27:50 To:Ian Foster Cc:swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] Re: ResourceMatcher Information (fwd) On Tue, 27 Mar 2007, Ian Foster wrote: > Which potential users do you think are important for DeeF to be talking to? 
This is too hard for email--let's discuss in person sometime soon. Ian Sent via BlackBerry from T-Mobile -----Original Message----- From: Mihael Hategan Date: Tue, 27 Mar 2007 08:09:34 To:itf at mcs.anl.gov Cc:Ben Clifford , swift-devel at ci.uchicago.edu Subject: Re: [Swift-devel] Re: ResourceMatcher Information (fwd) On Tue, 2007-03-27 at 07:50 +0000, Ian Foster wrote: > So the one thing they need to know is the resource selector callout? > > It seems to me that as we advise people on that, we need to take into account the impact of using DeeF on design. Shouldn't good design exclude implementation of lower layers from consideration when talking about higher layers? > > Ian > > Sent via BlackBerry from T-Mobile > > -----Original Message----- > From: Ben Clifford > Date: Tue, 27 Mar 2007 07:26:56 > To:swift-devel at ci.uchicago.edu > Subject: Re: [Swift-devel] Re: ResourceMatcher Information (fwd) > > > specifically what they were looking at here was how some resource > selection work that they've previously done might plug into swift, rather > than doing its own job submission. > > On Tue, 27 Mar 2007, Ben Clifford wrote: > > > > > I got this from my friends in Argentina. > > > > ---------- Forwarded message ---------- > > Date: Mon, 26 Mar 2007 12:33:22 -0300 > > From: Esteban Mocskos > > To: Ben Clifford , benc at cs.uchicago.edu > > Cc: "Diego Fernández Slezak" , > > mcarri at dc.uba.ar, pturjanski at dc.uba.ar > > Subject: Re: ResourceMatcher Information > > > > Hi, Ben! I finished with the tutorial, but what I'm looking for is something > > like a "developer guide": some kind of document with the design concepts and > > hints to understand the 17M of swift source code. > > At least, please send me the plug-in API description for site selection. > > Thank you. > > See you. > > Esteban. > > > On Wed, 21 Mar 2007, Diego Fernández Slezak wrote: > > > > Please send us some tips about Swift so that we can start looking at it. > > > > > > The website is: http://www.ci.uchicago.edu/swift/ > > > > > > There is a simple tutorial at: > > > http://www.ci.uchicago.edu/swift/guides/tutorial.php > > > > > > -- > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From benc at hawaga.org.uk Fri Mar 30 20:28:46 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sat, 31 Mar 2007 01:28:46 +0000 (GMT) Subject: [Swift-devel] language parser regression tests Message-ID: In r598 I committed a bunch of regression tests for the language parser. There are two sets of tests: The first set consists of a number of swift text source files and corresponding expected XML intermediate form files. When the tests run, they check that the input files produce the expected output files. I'll probably extend this to have expected .kml files too. The second set consists of swift source text that should not compile. The tests pass when the compiler reports an error compiling, and fail if the compiler reports successful compilation. As there is no expected XML/KML output, these tests do not have expected-xml files. The tests are not run in the nightly build at the moment, though I should probably make that so. These tests are something to be aware of if you are modifying the parser/compiler code. 
To run them at the moment, cd to tests/language and run ./run, with the swift bin directory on the path. To add tests, put a .dtm file in working/ (I should rename those to .swift) and the expected xml file in working-base/, or put a broken .dtm file in not-working/, depending on whether you want the first or second kind of test mentioned above. --
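For example, a rough sketch of what that layout and invocation look like (the file names q1.dtm, q1.xml and broken1.dtm are invented purely for illustration, and the exact comparison ./run performs may differ from this outline):

  # Layout assumed by the harness described above:
  #   tests/language/working/q1.dtm          - swift source that should compile
  #   tests/language/working-base/q1.xml     - XML intermediate form it is expected to produce
  #   tests/language/not-working/broken1.dtm - swift source that the compiler should reject
  #
  # Run the suite with the swift bin directory on the PATH:
  export PATH=/path/to/swift/bin:$PATH   # placeholder path; adjust to your checkout
  cd tests/language
  ./run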