From wilde at mcs.anl.gov Tue Feb 1 08:30:22 2011
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Tue, 1 Feb 2011 08:30:22 -0600 (CST)
Subject: [Swift-devel] Needs for site selection and job scheduling enhancements
In-Reply-To: <989264118.7584.1296566253439.JavaMail.root@zimbra.anl.gov>
Message-ID: <513619430.7751.1296570622752.JavaMail.root@zimbra.anl.gov>

Mihael,

Below is a proposal for Swift scheduling features that will need a fair amount of deliberation. This email is intended to start the process. I can move this to a Bugzilla enhancement to start the process, so long as you agree that the discussion makes sense to have.

Allan, Dan and I have been re-examining the SCEC workflow that Allan is working on.

Doing it efficiently on OSG raises scheduling aspects that Swift still doesn't handle well. We propose to address these issues in two phases:

I. Use simple workflows that group more work into single scripts to achieve the job affinities needed for reasonable performance. Provide scheduling hints to Swift.

II. Determine how Swift could automatically achieve the same scheduling decisions.

Phase II is pretty complex as far as we can tell, so let's defer its discussion.

To do phase I, we want to ask if any of the following capabilities could be added, and which ones are both reasonable and of "affordable" cost and make sense to try.

Most of these involve enabling a Swift script to specify scheduling "hints" on individual app() invocations.

1. Hint to bias a job to a specific set of sites (by pool name?)

   app myapp (file f, int v1, int v2) {
     myapp @f sitebias(UNL=v1,Clemson=v2,UChicago=v2);
   }

2. Hint to bias a job to site(s) that already have designated input parameter files cached.

   myapp @f1 @f2 filebias(f2=v3);

3. A prio hint to cause a job to be scheduled earlier than lower prio jobs

   myapp @f1 schedulebias(f2=v3);

We'd like to permit multiple hints to be specified on a single app call:

   myapp @f sitebias(UNL=v1,Clemson=v2,UChicago=v2) schedulebias(f2=v3);

And we might need a feature (perhaps a swift.properties setting) to tell Swift to defer initial scheduling decisions for N seconds or until J jobs have been queued by the script, so that a sufficiently large number of jobs are in the queue before scheduling decisions are made (probably delay for say a minute on a multi-hour script run).

In addition, we're wondering how easy (and desirable) it would be to add any/all of the following language extensions:

- select statement to work on string values and/or ranges
- elseif clause to achieve the above in a multi-branch if statement
- function pointers to select a function dynamically, e.g. from an array
- ability to set the app program name from a variable

These enhancements would enable us to manually code in the scheduling hints by providing multiple pool groups with different throttle settings and to manually force jobs to different pools.

If the easiest way to set the hints requested above on an individual job is to pass an env var on the command line, then that capability might be a useful alternative to setting env vars with the one-value-for-all method that we currently employ with the ENV profile. This could be considered as a useful enhancement separate from the question of how scheduling hints are set.

Lastly, in phase I we will be testing the performance of having the jobs "pull" files via wget in a pre-staged manner, within the application script. For Phase II we'd like to consider having Swift do that in the worker: have the coaster worker "pull" files in via wget or a similar command/function, asynchronously pre-staging files for jobs that have been queued/assigned to a site. But that can be deferred for a later discussion.
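One way to read the proposed sitebias hint is as a set of relative weights for a random site choice. The sketch below, in Python, is only an illustration under that assumption: `pick_site` and `site_shares` are hypothetical names, the weights are made up, and nothing here reflects how the Swift scheduler is actually implemented.

```python
import random


def site_shares(bias):
    """Return each site's share of jobs implied by its bias weight."""
    total = sum(bias.values())
    if total <= 0 or any(w < 0 for w in bias.values()):
        raise ValueError("bias weights must be non-negative with a positive sum")
    return {site: weight / total for site, weight in bias.items()}


def pick_site(bias, rng):
    """Pick one site with probability proportional to its bias weight."""
    shares = site_shares(bias)  # also validates the weights
    sites = sorted(shares)
    weights = [shares[s] for s in sites]
    return rng.choices(sites, weights=weights, k=1)[0]


if __name__ == "__main__":
    # Hypothetical weights standing in for sitebias(UNL=v1,Clemson=v2,UChicago=v2)
    bias = {"UNL": 2, "Clemson": 1, "UChicago": 1}
    print(site_shares(bias))
    print(pick_site(bias, random.Random(0)))
```

A weight of zero excludes a site entirely, which would also cover the "force jobs to a pool" case mentioned above.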
-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Tue Feb 1 12:02:36 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 Feb 2011 12:02:36 -0600 (CST)
Subject: [Swift-devel] Test suite group status and display
Message-ID: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov>

I'm looking for a way for users to see what tests are available, and which are reasonable to run in different settings and for various purposes. This should assist new users in initial testing and validation of their environment, as well as experienced users in knowing what tests are available and working (or not).

Would it be useful to have an option to nightly.sh to display all the directories below tests/ that represent valid test groups?

Something like: ./nightly.sh -g

com$ find providers -type d | grep -v svn
providers
providers/local-pbs
providers/local-pbs/pads
providers/local-pbs/queenbee
providers/local-cobalt
providers/local-cobalt/surveyor
providers/local-cobalt/intrepid
providers/ssh
providers/sge-local
providers/local
providers/local-pbs-coasters
providers/ssh-pbs-coasters
com$

Would it be useful to put a status file in each group dir to help identify which dirs are really groups, and what the current status of those tests is?

- Mike

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dsk at ci.uchicago.edu Tue Feb 1 12:55:54 2011
From: dsk at ci.uchicago.edu (Daniel S. Katz)
Date: Tue, 1 Feb 2011 12:55:54 -0600
Subject: [Swift-devel] Test suite group status and display
In-Reply-To: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov>
References: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov>
Message-ID:

Ideally, if these really are nightly tests, the swift web page would have a matrix of these tests vs day that would show green/red boxes, and would enable a user to click on a test and get to the code to be able to run that test himself.

Dan

On Feb 1, 2011, at 12:02 PM, Michael Wilde wrote:

> [...]

-- 
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-3700 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From wilde at mcs.anl.gov Tue Feb 1 15:34:55 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 Feb 2011 15:34:55 -0600 (CST)
Subject: [Swift-devel] Re: [Swift-user] pbs ppn count and stuff
In-Reply-To: <1295895095.31774.6.camel@blabla2.none>
Message-ID: <30688681.10766.1296596095833.JavaMail.root@zimbra.anl.gov>

Hi Mihael,

This issue is very timely - it came up in our meeting on the 0.92 release.

I don't understand the specifics of much of what you say below, regarding which of the many count parameters you are referring to, how this works with coasters, plain PBS and SGE (and Condor providers), and MPI issues.

I think a good step would be to help us (Sarah, Justin, and me) update the User Guide with all that a user needs to know to get node and processor counts specified correctly for the many different configurations of sites and Swift that are possible.

Some of my initial questions are below. Maybe this would be best discussed in a teleconference, but we can start by trying to clarify the issues using this email thread.

> On Mon, 2011-01-24 at 10:46 -0800, Mihael Hategan wrote:
> > So I think some of the problems with ppn are as follows:
> > 1. count in cog means number of processes. count in PBS means number of nodes.

What is "count in cog"? Presumably a pool attribute? How does it get specified both for coasters and non-coasters? Is this related to the xcount parameter in the GLOBUS profile in the Swift User Guide MPI example: GLOBUS::host_xcount=3 ?

> > 2. when the number of nodes requested was 1 but ppn > 1,

You mean the number of nodes that Swift requested in the PBS submit file? as in #PBS -l nodes=$nodes:ppn=$cores

> > the multiple job scheme was not enabled so, despite having multiple lines in PBS_NODEFILE, only one process would get started.
> > If count was > 1 then PBS would understand that count*ppn lines should be in PBS_NODEFILE, which would result in that number of processes be started. In other words there was no way to tell PBS to start 4 jobs on only one node.
> >
> > So:
> >
> > - I changed this to be consistent with 1. Count means number of processes to be started. This imposes the restriction that count % ppn = 0. If not, the pbs provider will throw an exception.

# of processes to be started is number of workers in coaster case?

> > - I also added mppnppn if USE_MPPWIDTH is enabled.

Where & how should USE_MPPWIDTH be specified?

> > This is in trunk.

Should it be retrofitted to 0.92?

Does it apply to SGE and the associated "pe" parallel environment issues?

How does it relate to workersPerNode and the various coaster settings that control size of node allocations?

How does it relate to issues of whether or not a site does node-packing, and whether or not a user wants to use node-packing (i.e. single-core jobs in most or all cases)?

I apologize that I can't formulate the question cleanly, but I'm finding the terminology and processor-count model between Swift, cog, coasters, and multiple schedulers with multiple modes to be so complex as to require a more detailed review of this entire issue, with a Swift end-user focus.

Let's start with a voice call and then bring the issue back to the devel list.
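The rule quoted above (count is the number of processes to start, and count % ppn must be 0) can be sketched in Python as follows. This is an illustration only: the function name `pbs_resource_line` is hypothetical, and the mapping nodes = count / ppn is inferred from the thread rather than taken from the pbs provider's source.

```python
def pbs_resource_line(count, ppn):
    """Translate a process count and a ppn value into a PBS resource line.

    count: total number of processes to start (the cog meaning of "count")
    ppn:   processes per node
    """
    if count < 1 or ppn < 1:
        raise ValueError("count and ppn must both be at least 1")
    if count % ppn != 0:
        # Mirrors the restriction described in the thread: count % ppn = 0.
        raise ValueError("count must be a multiple of ppn")
    nodes = count // ppn  # assumed mapping, not confirmed by the thread
    return "#PBS -l nodes={}:ppn={}".format(nodes, ppn)


if __name__ == "__main__":
    print(pbs_resource_line(4, 4))  # four processes packed onto one node
    print(pbs_resource_line(8, 4))  # eight processes spread over two nodes
```

Under this reading, "start 4 jobs on only one node" corresponds to count=4 with ppn=4.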
- Mike

> > Mihael
> >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Tue Feb 1 15:47:44 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 Feb 2011 13:47:44 -0800
Subject: [Swift-devel] Re: [Swift-user] pbs ppn count and stuff
In-Reply-To: <30688681.10766.1296596095833.JavaMail.root@zimbra.anl.gov>
References: <30688681.10766.1296596095833.JavaMail.root@zimbra.anl.gov>
Message-ID: <1296596864.3372.10.camel@blabla2.none>

On Tue, 2011-02-01 at 15:34 -0600, Michael Wilde wrote:
> Hi Mihael,
>
> [...]
>
> > On Mon, 2011-01-24 at 10:46 -0800, Mihael Hategan wrote:
> > > So I think some of the problems with ppn are as follows:
> > > 1. count in cog means number of processes. count in PBS means number of nodes.
>
> What is "count in cog"? Presumably a pool attribute?
> How does it get specified both for coasters and non-coasters? Is this related to the xcount parameter in the GLOBUS profile in the Swift User Guide MPI example: GLOBUS::host_xcount=3 ?

It's a task attribute. It means "start this many instances of the process".

> > > 2. when the number of nodes requested was 1 but ppn > 1,
>
> You mean the number of nodes that Swift requested in the PBS submit file?

Right.

> as in #PBS -l nodes=$nodes:ppn=$cores

No. As in #PBS -l nodes=1:ppn=n, with n > 1.

So one physical node with multiple processes on that node.

> > > the multiple job scheme was not enabled so, despite having multiple lines in PBS_NODEFILE, only one process would get started. If count was > 1 then PBS would understand that count*ppn lines should be in PBS_NODEFILE, which would result in that number of processes be started. In other words there was no way to tell PBS to start 4 jobs on only one node.
> > >
> > > So:
> > >
> > > - I changed this to be consistent with 1. Count means number of processes to be started. This imposes the restriction that count % ppn = 0. If not, the pbs provider will throw an exception.
>
> # of processes to be started is number of workers in coaster case?

Yes. Number of instances of the worker.pl process.

> > > - I also added mppnppn if USE_MPPWIDTH is enabled.
>
> Where & how should USE_MPPWIDTH be specified?

Justin added support for it, so I'm assuming there was a place where it was needed.

> > > This is in trunk.
>
> Should it be retrofitted to 0.92?

It's a pretty radical change. I will port one thing to 0.92, and that is enabling the multi-job handling when ppn>1.

> Does it apply to SGE and the associated "pe" parallel environment issues?

This is strictly about PBS.

> How does it relate to workersPerNode and the various coaster settings that control size of node allocations?

If you specify ppn > 1, then you need to have nodeGranularity=ppn. We should also change nodeGranularity to read coreGranularity.

> How does it relate to issues of whether or not a site does node-packing, and whether or not a user wants to use node-packing (i.e. single-core jobs in most or all cases)?

If a site feels like re-defining the notion of a node from the physical thing with multiple cores to a virtual thing with a single core, there's not much we can do about it. But it is not much different from considering the site to physically have 1-core nodes.

> I apologize that I can't formulate the question cleanly, but I'm finding the terminology and processor-count model between Swift, cog, coasters, and multiple schedulers with multiple modes to be so complex as to require a more detailed review of this entire issue, with a Swift end-user focus.

It's somewhat complex. But the way to look at it is that you pick one model (say the cog/globus one, which says count=number of processes) and stick with that. Then you translate that into the specifics of each site.

> Let's start with a voice call and then bring the issue back to the devel list.
>
> - Mike

From wilde at mcs.anl.gov Tue Feb 1 16:01:46 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 Feb 2011 16:01:46 -0600 (CST)
Subject: [Swift-devel] Re: [Swift-user] pbs ppn count and stuff
In-Reply-To: <1296596864.3372.10.camel@blabla2.none>
Message-ID: <1164390573.10875.1296597706575.JavaMail.root@zimbra.anl.gov>

Thanks for the quick response. One followup - when you say:

> > What is "count in cog"? ...
> It's a task attribute.
> It means "start this many instances of the process".

By task attribute I assume you mean a parameter to the Karajan task() element and the associated CoG execution providers? But there is no direct way to set it from Swift sites.xml? Or is there? Or it's just set in the process of translating Swift requests into jobs?

- Mike

----- Original Message -----
> [...]

-- 
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Tue Feb 1 16:38:47 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 Feb 2011 14:38:47 -0800
Subject: [Swift-devel] Re: [Swift-user] pbs ppn count and stuff
In-Reply-To: <30688681.10766.1296596095833.JavaMail.root@zimbra.anl.gov>
References: <30688681.10766.1296596095833.JavaMail.root@zimbra.anl.gov>
Message-ID: <1296599927.4011.0.camel@blabla2.none>

On Tue, 2011-02-01 at 15:34 -0600, Michael Wilde wrote:
> Let's start with a voice call and then bring the issue back
> to the devel list.

Can we do this on Thursday after 12:30 Chicago time?

Mihael

From hategan at mcs.anl.gov Tue Feb 1 16:40:53 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 Feb 2011 14:40:53 -0800
Subject: [Swift-devel] Re: [Swift-user] pbs ppn count and stuff
In-Reply-To: <1164390573.10875.1296597706575.JavaMail.root@zimbra.anl.gov>
References: <1164390573.10875.1296597706575.JavaMail.root@zimbra.anl.gov>
Message-ID: <1296600053.4011.2.camel@blabla2.none>

On Tue, 2011-02-01 at 16:01 -0600, Michael Wilde wrote:
> Thanks for the quick response. One followup - when you say:
>
> > > What is "count in cog"? ...
> > It's a task attribute. It means "start this many instances of the process".
>
> By task attribute I assume you mean a parameter to the Karajan task() element and the associated CoG execution providers? But there is no direct way to set it from Swift sites.xml? Or is there? Or it's just set in the process of translating Swift requests into jobs?

Good point. There is no direct way to express it in swift. The only way it transpires is through the coaster configuration. So we might as well keep that in terms of compute nodes.

Mihael

From hategan at mcs.anl.gov Tue Feb 1 17:13:16 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 01 Feb 2011 15:13:16 -0800
Subject: [Swift-devel] Re: Needs for site selection and job scheduling enhancements
In-Reply-To: <513619430.7751.1296570622752.JavaMail.root@zimbra.anl.gov>
References: <513619430.7751.1296570622752.JavaMail.root@zimbra.anl.gov>
Message-ID: <1296601996.4182.22.camel@blabla2.none>

I think we need to slow down a bit :)

On Tue, 2011-02-01 at 08:30 -0600, wilde at mcs.anl.gov wrote:
> Mihael,
>
> Below is a proposal for Swift scheduling features that will need a fair amount of deliberation. This email is intended to start the process. I can move this to a Bugzilla enhancement to start the process, so long as you agree that the discussion makes sense to have.
>
> Allan, Dan and I have been re-examining the SCEC workflow that Allan is working on.
>
> Doing it efficiently on OSG raises scheduling aspects that Swift still doesn't handle well. We propose to address these issues in two phases:

I think it may be useful to spell those out.

> I. Use simple workflows that group more work into single scripts to achieve the job affinities needed for reasonable performance. Provide scheduling hints to Swift.
>
> II. Determine how Swift could automatically achieve the same scheduling decisions.
>
> Phase II is pretty complex as far as we can tell, so let's defer its discussion.
>
> To do phase I, we want to ask if any of the following capabilities could be added, and which ones are both reasonable and of "affordable" cost and make sense to try.
>
> Most of these involve enabling a Swift script to specify scheduling "hints" on individual app() invocations.

Does the place where the hints are specified have any relevance? If we put the hints in the swift source we lose the "site independence" aspect.

> [...]
> And we might need a feature (perhaps a swift.properties setting) to tell Swift to defer initial scheduling decisions for N seconds or until J jobs have been queued by the script, so that a sufficiently large number of jobs are in the queue before scheduling decisions are made (probably delay for say a minute on a multi-hour script run).

How would that help? Given that the scheduling is probabilistic, that makes the distribution essentially the same whether you have N or N/2 jobs.

> In addition, we're wondering how easy (and desirable) it would be to add any/all of the following language extensions:
>
> - select statement to work on string values and/or ranges

What would be the semantics of this statement? Can you give examples?

> - elseif clause to achieve the above in a multi-branch if statement

Quite silly we don't support that already.

> - function pointers to select a function dynamically, e.g. from an array

Well, I do like the idea of higher order functions, but that's not quite the way we went with this in the start. Though I'm sure it could be added. However, I would be curious to see the kind of problem that one would solve with swift that would require this.

> - ability to set the app program name from a variable

Could you clarify that?

> These enhancements would enable us to manually code in the scheduling hints by providing multiple pool groups with different throttle settings and to manually force jobs to different pools.

I feel that to be a contrived way to avoid java code. Things are separated into components in order to isolate solutions to subproblems into loosely connected parts of the code. The idea that we'd implement scheduling features in the swift language seems to be the antithesis of that design principle.

> If the easiest way to set the hints requested above on an individual job is to pass an env var on the command line, then that capability might be a useful alternative to setting env vars with the one-value-for-all method that we currently employ with the ENV profile. This could be considered as a useful enhancement separate from the question of how scheduling hints are set.
>
> Lastly, in phase I we will be testing the performance of having the jobs "pull" files via wget in a pre-staged manner, within the application script. For Phase II we'd like to consider having Swift do that in the worker: have the coaster worker "pull" files in via wget or a similar command/function, asynchronously pre-staging files for jobs that have been queued/assigned to a site. But that can be deferred for a later discussion.

How is that different from the current worker staging mechanism (aside from changing protocols and tools)? I.e., what is the theoretical difference?
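The remark above that a probabilistic scheduler gives essentially the same distribution for N or N/2 jobs can be illustrated with a small Python calculation. It assumes a simplified model in which every job is assigned independently with fixed per-site probabilities; the function names and weights are hypothetical, and the real Swift scheduler may well behave differently.

```python
def expected_jobs(n_jobs, weights):
    """Expected number of jobs per site under fixed per-site probabilities."""
    total = sum(weights.values())
    return {site: n_jobs * w / total for site, w in weights.items()}


def expected_shares(n_jobs, weights):
    """Expected fraction of all jobs that each site receives."""
    counts = expected_jobs(n_jobs, weights)
    return {site: c / n_jobs for site, c in counts.items()}


if __name__ == "__main__":
    weights = {"UNL": 2, "Clemson": 1, "UChicago": 1}  # made-up site weights
    print(expected_jobs(100, weights))
    print(expected_jobs(50, weights))
    # The per-site fractions do not depend on how many jobs are queued.
    print(expected_shares(100, weights) == expected_shares(50, weights))
```

In this model, waiting for more jobs to queue changes the absolute counts but not the expected split across sites, which is the point of the objection.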
From wilde at mcs.anl.gov Tue Feb 1 19:23:51 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 1 Feb 2011 19:23:51 -0600 (CST)
Subject: [Swift-devel] Re: Needs for site selection and job scheduling enhancements
In-Reply-To: <1296601996.4182.22.camel@blabla2.none>
Message-ID: <782052521.11240.1296609831487.JavaMail.root@zimbra.anl.gov>

----- Original Message -----
> I think we need to slow down a bit :)

Indeed. I'll try to space out my emails better :)

I mainly wanted to get the ideas on the table before we forget them, and also let Dan, Allan, and others comment on them.

I'll follow up on your responses below in a later email. The main thing to do first is to characterize the IO/compute behavior in the application, show what Swift does currently, and postulate how that could be improved.

- Mike

> [...]
>

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dk0966 at cs.ship.edu Tue Feb 1 23:24:50 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Wed, 2 Feb 2011 00:24:50 -0500
Subject: [Swift-devel] Swiftconfig merge / changes
Message-ID:

On Tue, Feb 1, 2011 at 12:07 PM, Michael Wilde wrote:

> Also, I'm very eager to reconcile and unify into one coordinated plan
> the idea of merging the best aspects of Justin's ad-hoc configuration
> method, which he's documented on the SWFT wiki (see eg CoasterCookbook
> etc) and your swiftconfig/swiftrun mechanism.
>
> I like the fact that Justin's is based on sh rather than perl - that
> will make maintenance easier and reduce portability issues. But I also
> want to bring back various concepts from swiftconfig/run into this
> mechanism. If we can agree on a spec, perhaps that would be a good
> project for you to (resume) work on?

Here is what I see as the list of features we would need to add to the
shell script in order to merge features:

From swiftconfig:

- XML editor. It should have some basic knowledge of valid inputs (for
  example, when you're editing the execution provider setting, it will
  only allow you to select one that swift knows)
- Editor for adding/removing/modifying apps to the tc file and verifying
  formatting
- Using templates to generate new site configurations
- Using templates to generate new templates
- Importing existing configuration files into the template
  format/directory structure
- Creating and managing ssh configurations for various hosts

From swiftrun:

- A new swiftrun mechanism which reads from a template directory,
  generates the run.XXXX directory, links input data, and runs swift
- The concept of site groups. For example, creating a group called
  "MCS-coasters" which combines the individual configurations of
  thrash-coasters, thwomp-coasters, etc.
- Similar to above, application sets for handling different sets of apps

It's possible we may not need or want all of these. I just wanted to
point out what would be involved if we merged the main features as they
currently exist in the swiftconfig/swiftrun utilities. I also have a
list of suggestions for future improvements which include things like
revision control for modifications and a tagging/search system for
searching through previous swift runs.

The suggestion was to move away from perl due to portability issues. I
can understand that, but what do you think about doing it in Java? Bash
is nice, but in some ways it suffers from the same portability issues
that perl does. Here are a few issues I have run into so far. The
"swift" shell script we use calls /bin/sh. Sometimes this is bash. On
some linux systems it's actually a shell called ash, which is similar
but not always compatible. Even if you can assume there will always be a
/bin/bash, there are individual differences between bash versions. When
I was testing changes to usage stats, I ran into a really old version of
bash which caused it to fail. Then there are things like /dev/udp which
may or may not be turned on based on bash compilation options. Bash is
nice, but it's a little limited in what it can do by itself without
relying on a lot of external applications, which may not always be
installed on the system or work as expected.

The nice thing about doing it in Java is that you can bank on Java 1.5
or later being there regardless of whatever system quirks you may run
into. Java also includes XML handling libraries and has a nice regular
expression system which would make this a little easier to handle. It
will also be easier to manage as complexity increases in the future.

Shifting gears a bit: what I would _really_ like in the future is a
graphical interface on top of swiftconfig and swiftrun. I may be the
only one, but I haven't given up on this idea :-)

- Have one visual interface running locally which will let me do
  everything
- Easily and visually configure swift to submit jobs on a remote system
- Visually create a workflow which generates a swift script
- Select a swift script and be able to easily change parameters, queue,
  maxtime, # nodes, etc on the fly
- See a visual progression of what is happening. Something like a swing
  version of -tui

From jon.monette at gmail.com Wed Feb 2 00:04:22 2011
From: jon.monette at gmail.com (jon.monette at gmail.com)
Date: Wed, 02 Feb 2011 00:04:22 -0600
Subject: Re: [Swift-devel] Swiftconfig merge / changes
Message-ID: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com>

I agree switching swiftrun/swiftconfig to bash may not be the best idea,
for the reasons David has mentioned. However I do not believe switching
over to Java is the solution. In the end swiftrun is calling swift,
which is a shell script. Having a java process call a shell script which
in turn starts java processes may not be the best plan of action. May I
suggest using python? Not much changes between python versions and it
has very good xml libraries to work with. I do not believe there will be
very many compatibility issues in using python, but in all programming
there are always these issues. Python is also much easier to read in my
humble opinion.

----- Reply message -----
From: "David Kelly"
Date: Tue, Feb 1, 2011 11:24 pm
Subject: [Swift-devel] Swiftconfig merge / changes
To: "Michael Wilde"
Cc: "swift-devel"

From dk0966 at cs.ship.edu Wed Feb 2 02:44:51 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Wed, 2 Feb 2011 03:44:51 -0500
Subject: [Swift-devel] Swiftconfig merge / changes
In-Reply-To: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com>
References: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com>
Message-ID:

On Wed, Feb 2, 2011 at 1:04 AM, jon.monette at gmail.com wrote:

> Having a java process call a shell script which in turn starts java
> processes may not be the best plan of action.

That is a good point. Perhaps we should think about integrating what
swiftrun does/should do directly into swift. It may be better than
building up layers of scripts around it. I think it could be done in a
way that is backwards compatible. We would have to discuss the details,
but for example, if you specify -sites.file it will use that
exclusively. Otherwise you could pass it a -site and it will know to
look in $HOME/.swift for templates, to replace values in templates with
environment variables where requested, and so on. There's more to it,
but that's the basic idea.

Then swiftconfig just has to worry about managing the config files. It
would simplify the test suite. Swift would be more flexible, and maybe
it would reduce the number of people having to write custom shell
scripts to do things like this.

David

From jon.monette at gmail.com Wed Feb 2 11:24:51 2011
From: jon.monette at gmail.com (jon.monette at gmail.com)
Date: Wed, 02 Feb 2011 11:24:51 -0600
Subject: Re: [Swift-devel] Swiftconfig merge / changes
Message-ID: <4d499357.81a5e60a.5dfc.ffff8d93@mx.google.com>

Yeah, I agree merging swiftrun and swift together makes sense. I don't
think it will eliminate users making their own run scripts, but it will
certainly simplify the job of doing so.
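
A minimal sketch of the lookup David outlines above (an explicit -sites.file used exclusively, otherwise a -site name resolved against templates under $HOME/.swift with environment variables substituted in). It is written in Java only because that is one of the candidate languages in this thread. Everything specific here is an assumption for illustration: the `${NAME}` placeholder syntax, the one-template-per-site `<site>.xml` layout, and the names `SiteTemplateSketch`, `resolveSitesPath`, and `expand` are hypothetical and not part of Swift.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: the ${VAR} placeholder syntax and the <site>.xml template
// layout are assumptions, not Swift's actual behavior.
class SiteTemplateSketch {

    // Matches placeholders such as ${QUEUE} (hypothetical syntax).
    private static final Pattern PLACEHOLDER =
            Pattern.compile("\\$\\{([A-Za-z_][A-Za-z0-9_]*)\\}");

    // An explicit -sites.file wins; otherwise fall back to a template
    // named after the -site value under <home>/.swift.
    static String resolveSitesPath(String sitesFile, String site, String home) {
        if (sitesFile != null) {
            return sitesFile;
        }
        if (site == null) {
            throw new IllegalArgumentException("need -sites.file or -site");
        }
        return home + "/.swift/" + site + ".xml";
    }

    // Replaces each ${NAME} with its value from env, and fails on an unset
    // variable instead of emitting a half-filled sites file.
    static String expand(String template, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(template);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String value = env.get(m.group(1));
            if (value == null) {
                throw new IllegalArgumentException("unset variable: " + m.group(1));
            }
            // quoteReplacement keeps '$' and '\' in the value literal.
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> env = new HashMap<String, String>();
        env.put("QUEUE", "short");
        // Illustrative template fragment, not taken from a real sites file.
        String template = "<profile key=\"queue\">${QUEUE}</profile>";
        System.out.println(resolveSitesPath(null, "thrash-coasters", "/home/user"));
        System.out.println(expand(template, env));
    }
}
```

Failing on an unset variable is a choice made for this sketch; silently leaving the placeholder in place would be the other obvious option, at the cost of errors surfacing later when the sites file is parsed.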

----- Reply message -----
From: "David Kelly"
Date: Wed, Feb 2, 2011 2:44 am
Subject: [Swift-devel] Swiftconfig merge / changes
To: "jon.monette at gmail.com"
Cc: "Michael Wilde", "swift-devel"

From wilde at mcs.anl.gov Wed Feb 2 13:26:26 2011
From: wilde at mcs.anl.gov (wilde at mcs.anl.gov)
Date: Wed, 2 Feb 2011 13:26:26 -0600 (CST)
Subject: [Swift-devel] Swiftconfig merge / changes
In-Reply-To:
Message-ID: <2028817107.12817.1296674786980.JavaMail.root@zimbra.anl.gov>

I agree that anything swiftrun does which we want to retain should be
integrated into swift. Two things I think are important are:

- allow selection of a set of sites
- copy all relevant config information into the swift log (by default)

Regarding swiftconfig:

- I think we can code its functions in the lowest common denominator of
  /bin/sh capabilities
- the difficult part is organizing the many sites file variations into a
  manageable set of parameterized templates, possibly both provided by
  swift and also managed as a sites template library by users or groups.

Mike

----- Original Message -----
On Wed, Feb 2, 2011 at 1:04 AM, jon.monette at gmail.com < jon.monette at gmail.com > wrote:

Having a java process call a shell script which in turn starts java
processes may not be the best plan of action.

That is a good point. Perhaps we should think about integrating what
swiftrun does/should do directly into swift. It may be better than
building up layers of scripts around it. I think it could be done in a
way that is backwards compatible. We would have to discuss the details,
but for example, if you specify -sites.file it will use that
exclusively. Otherwise you could pass it a -site and it will know to
look in $HOME/.swift for templates, to replace values in templates with
environment variables where requested, and so on. There's more to it,
but that's the basic idea.

Then swiftconfig just has to worry about managing the config files. It
would simplify the test suite. Swift would be more flexible, and maybe
it would reduce the number of people having to write custom shell
scripts to do things like this.

David

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Wed Feb 2 14:06:00 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Wed, 2 Feb 2011 12:06:00 -0800
Subject: [Swift-devel] Swiftconfig merge / changes
In-Reply-To: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com>
References: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com>
Message-ID:

I did commit the latest version of meta.sh (sitetester) in python... so
we know where my preference lies :)

On Tue, Feb 1, 2011 at 10:04 PM, jon.monette at gmail.com < jon.monette at gmail.com> wrote:

> I agree switching swiftrun/swiftconfig to bash may not be the best
> idea, for the reasons David has mentioned. However I do not believe
> switching over to Java is the solution. In the end swiftrun is calling
> swift, which is a shell script. Having a java process call a shell
> script which in turn starts java processes may not be the best plan of
> action. May I suggest using python? Not much changes between python
> versions and it has very good xml libraries to work with. I do not
> believe there will be very many compatibility issues in using python,
> but in all programming there are always these issues. Python is also
> much easier to read in my humble opinion.
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From wilde at mcs.anl.gov Wed Feb 2 14:08:51 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 2 Feb 2011 14:08:51 -0600 (CST)
Subject: [Swift-devel] Re: [Swift-user] pbs ppn count and stuff
In-Reply-To: <1296599927.4011.0.camel@blabla2.none>
Message-ID: <861161623.12935.1296677331670.JavaMail.root@zimbra.anl.gov>

Would 2PM tomorrow work?

Justin, can you join this discussion? I'll set up a conf call once we
confirm a time.

I'm inserting below an email thread started by Matt at NCAR on this
topic. He also refers back to a very old thread in which the same issue
was raised.

- Mike

----- Forwarded Message -----
From: "Matthew Woitaszek"
To: "Allan Espinosa"
Cc: swift-user at ci.uchicago.edu
Sent: Thursday, November 4, 2010 10:06:48 AM
Subject: Re: [Swift-user] Coasters and PBS resource requests: nodes and ppn

Hi Allan,

Yep, that's it. When the coasters resource request comes in with just
"nodes=1", it gets interpreted by PBS as nodes=1:ppn=1, and thus PBS
puts other jobs on the node, too, until all 8 CPUs are allocated (e.g.,
8 1-cpu PBS jobs are running on it).
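
To make the arithmetic in the paragraph above concrete, here is a small sketch in Java. It only models the behavior as described in this message (a request with no ppn is treated as ppn=1, and the scheduler packs jobs onto a node until its cores are used up); `PbsRequestSketch` and its methods are hypothetical names, and nothing here calls PBS or Swift.

```java
// Sketch only: models the packing described in the message above,
// under the assumption that a missing ppn defaults to 1.
class PbsRequestSketch {

    // Builds the request string as PBS would effectively interpret it.
    static String effectiveRequest(int nodes, Integer ppn) {
        int p = (ppn == null) ? 1 : ppn.intValue();
        return "nodes=" + nodes + ":ppn=" + p;
    }

    // How many jobs of the given ppn fit on one node when packing is on.
    static int jobsPerNode(int coresPerNode, int ppn) {
        return coresPerNode / ppn;
    }

    public static void main(String[] args) {
        // No ppn given: treated as one core, so 8 such jobs share a node.
        System.out.println(effectiveRequest(1, null) + " -> "
                + jobsPerNode(8, 1) + " jobs per 8-core node");
        // All 8 cores requested: the job has the node to itself.
        System.out.println(effectiveRequest(1, 8) + " -> "
                + jobsPerNode(8, 8) + " job per 8-core node");
    }
}
```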

I'd like to find some way to make the request as:

nodes=1:ppn=8

along with workersPerNode=8, so that PBS allocates one node and all 8
processors, and then one Coasters job would put 8 workers on it,
matching the resource request with the use.

Matthew

On Wed, Nov 3, 2010 at 5:41 PM, Allan Espinosa < aespinosa at cs.uchicago.edu > wrote:

Hi Matthew,

Does this mean coasters will now submit nodes=1:ppn=1 and do node
packing? If there is no node packing being initiated by PBS, you can
just specify workersPerNode=8. But then what you request from PBS is
different from what you actually use.

-Allan

2010/11/3 Matthew Woitaszek < matthew.woitaszek at gmail.com >:
> Good afternoon,
>
> Is there a way to update PBS resource requests when using coasters to
> supply modified PBS resource strings such as "nodes=1:ppn=8"? (Or other
> arbitrary resource requests, such as node properties?)
>
> Of course, I'm just trying to get coasters to allocate all of the
> processors on an 8-core node, using either the "gt2:gt2:pbs" or
> "local:pbs" provider. Both submit jobs just fine. I found no
> discernible difference with the "host_types" Globus namespace
> variable, presuming I'm setting it right.
>
> The particular cluster I'm using allows node packing for users that
> run lots of single-processor tasks, so without ppn, it will assume
> nodes=1,ncpus=1 and thus pack 8 jobs on each node before moving on to
> the next node. (I know it won't be an issue at sites that make nodes
> exclusive. On this system, the queue default is "nodes=1:ppn=8", but
> because coasters explicitly specifies the number of nodes in its
> generated resource request, the ppn default seems to get lost!)
>
> I see that this has been discussed as far back as 2007, and I found
> Marcin and Mike's previous discussion of the topic at
>
> http://mail.ci.uchicago.edu/pipermail/swift-user/2010-March/001409.html
>
> but there didn't seem to be any definitive conclusion. Any suggestions
> would be appreciated!
>
> Matthew
>

--
Allan M. Espinosa < http://amespinosa.wordpress.com >
PhD student, Computer Science
University of Chicago < http://people.cs.uchicago.edu/~aespinosa >
_______________________________________________
Swift-user mailing list
Swift-user at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-user

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

----- Original Message -----
> On Tue, 2011-02-01 at 15:34 -0600, Michael Wilde wrote:
>
> > Let's start with a voice call and then bring the issue back to the
> > devel list.
>
> Can we do this on Thursday after 12:30 Chicago time?
>
> Mihael

From hategan at mcs.anl.gov Wed Feb 2 15:22:43 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 02 Feb 2011 13:22:43 -0800
Subject: [Swift-devel] Swiftconfig merge / changes
In-Reply-To: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com>
References: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com>
Message-ID: <1296681763.9069.1.camel@blabla2.none>

On Wed, 2011-02-02 at 00:04 -0600, jon.monette at gmail.com wrote:
> I agree switching swiftrun/swiftconfig to bash may not be the best
> idea, for the reasons David has mentioned. However I do not believe
> switching over to Java is the solution. In the end swiftrun is calling
> swift, which is a shell script. Having a java process call a shell
> script which in turn starts java processes may not be the best plan of
> action.

That doesn't need to be so. A java program can invoke the main() method
of swift.

> May I suggest using python? Not much changes between python versions
> and it has very good xml libraries to work with. I do not believe
> there will be very many compatibility issues in using python, but in
> all programming there are always these issues.
> Python is also much easier to read in my humble opinion.

From hategan at mcs.anl.gov Wed Feb 2 15:26:05 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 02 Feb 2011 13:26:05 -0800
Subject: [Swift-devel] Swiftconfig merge / changes
In-Reply-To: <1296681763.9069.1.camel@blabla2.none>
References: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com> <1296681763.9069.1.camel@blabla2.none>
Message-ID: <1296681965.9069.2.camel@blabla2.none>

On Wed, 2011-02-02 at 13:22 -0800, Mihael Hategan wrote:
> On Wed, 2011-02-02 at 00:04 -0600, jon.monette at gmail.com wrote:
> > I agree switching swiftrun/swiftconfig to bash may not be the best
> > idea, for the reasons David has mentioned. However I do not believe
> > switching over to Java is the solution. In the end swiftrun is
> > calling swift, which is a shell script. Having a java process call a
> > shell script which in turn starts java processes may not be the best
> > plan of action.
>
> That doesn't need to be so. A java program can invoke the main() method
> of swift.

In fact, this may very well turn out to fit into that swift shell model
that Mike mentioned that would keep coaster workers (and other JVM
persistent things) alive between swift runs.

> > May I suggest using python? Not much changes between python versions
> > and it has very good xml libraries to work with.
> > I do not believe there will be very many compatibility issues in
> > using python, but in all programming there are always these issues.
> > Python is also much easier to read in my humble opinion.

From jon.monette at gmail.com Wed Feb 2 15:31:38 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Wed, 02 Feb 2011 15:31:38 -0600
Subject: [Swift-devel] Swiftconfig merge / changes
In-Reply-To: <1296681965.9069.2.camel@blabla2.none>
References: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com> <1296681763.9069.1.camel@blabla2.none> <1296681965.9069.2.camel@blabla2.none>
Message-ID: <4D49CD3A.2040605@gmail.com>

I was not aware that a java program could call another java program's
main method. By not aware I mean I do not know how.

On 2/2/11 3:26 PM, Mihael Hategan wrote:
> On Wed, 2011-02-02 at 13:22 -0800, Mihael Hategan wrote:
>> On Wed, 2011-02-02 at 00:04 -0600, jon.monette at gmail.com wrote:
>>> I agree switching swiftrun/swiftconfig to bash may not be the best
>>> idea, for the reasons David has mentioned. However I do not believe
>>> switching over to Java is the solution. In the end swiftrun is
>>> calling swift, which is a shell script. Having a java process call a
>>> shell script which in turn starts java processes may not be the best
>>> plan of action.
>> That doesn't need to be so. A java program can invoke the main() method
>> of swift.
> In fact, this may very well turn out to fit into that swift shell model
> that Mike mentioned that would keep coaster workers (and other JVM
> persistent things) alive between swift runs.
>
>>> May I suggest using python? Not much changes between python versions
>>> and it has very good xml libraries to work with.
I do not believe >>> there will be very many comparability issues in using python but in >>> all programming there are always these issues. Python is also much >>> easier to read in my humble opinion. >>> >>> ----- Reply message ----- >>> From: "David Kelly" >>> Date: Tue, Feb 1, 2011 11:24 pm >>> Subject: [Swift-devel] Swiftconfig merge / changes >>> To: "Michael Wilde" >>> Cc: "swift-devel" >>> >>> >>> >>> >>> On Tue, Feb 1, 2011 at 12:07 PM, Michael Wilde >>> wrote: >>> >>> Also, Im very eager to reconcile and unify into one >>> coordinated plan the idea of merging the best aspects of >>> Justin's ad-hoc configuration methid, which he's documented on >>> the SWFT wiki (see eg CoasterCookbook etc) and your >>> swiftconfig/swiftrun mechanism. >>> >>> I like the fact that Justin's is based on sh rather then perl >>> - that will make maintenance easier and reduce portability >>> issues. But I also want to bring back various concepts from >>> swiftconfig/run into this mechanism. If we can agree on a >>> spec, perhaps that would be a good project for you to (resume) >>> work on? >>> >>> Here is what I see as the list of features we would need to add to the >>> shell script in order to merge features: >>> >>> From swiftconfig: >>> >>> - XML editor. 
It should have some basic knowledge of valid inputs (for >>> example, when you're editing the execution provider setting, it will >>> only allow you to select one that swift knows) >>> - Editor for adding/removing/modifying apps to the tc file and >>> verifying formatting >>> - Using templates to generate new site configurations >>> - Using templates to generate new templates >>> - Importing existing configuration files into the template >>> format/directory structure >>> - Creating and managing ssh configurations for various hosts >>> >>> From swiftrun: >>> >>> - A new swiftrun mechanism which reads from a template directory, >>> generates the run.XXXX directory, links input data, and runs swift >>> - The concept of site groups. For example, creating a group called >>> "MCS-coasters" which combines the individual configurations of >>> thrash-coasters, thwomp-coasters, etc. >>> - Similar to above, application sets for handling different sets of >>> apps >>> >>> It's possible we may not need or want all of these. I just wanted to >>> point out what would be involved if we merged the main features as >>> they currently exist in the swiftconfig/swiftrun utilities. I also >>> have a list of suggestions for future improvements which include >>> things like revision control for modifications and a tagging/search >>> system for searching through previous swift runs. >>> >>> The suggestion was moving away from perl due to portability issues. I >>> can understand that, but what do you think about doing it in Java? >>> Bash is nice, but in some ways it suffers from the same portability >>> issues that perl does. Here are a few issues I have run into so far. >>> The "swift" shell script we use calls /bin/sh. Sometimes this is bash. >>> On some linux systems it's actually a shell called ash, which is >>> similar but does not always compatible. If you can assume there will >>> always be a /bin/bash, there are individual differences between bash >>> versions. 
When I was testing changes to usage stats, I ran into a >>> really old version of bash which caused it to fail. Then there are >>> things like /dev/udp which may or may not be turned on >>> based on bash compilation options. Bash is nice, but it's a little >>> limited in what is can do by itself without relying on a lot of >>> external applications.. which may not be always be installed on the >>> system or work as expected. >>> >>> The nice thing about doing it in Java is that you can bank on Java 1.5 >>> or later being there regardless of whatever system quirks you may run >>> into. Java also includes XML handling libraries and has nice regular >>> expression system which would make this a little easier to handle. It >>> will also be easier to manage as complexity increases in the future. >>> >>> Shifting gears a bit.. what I would _really_ would like in the future >>> is a graphical interface on top of swiftconfig and swiftrun. I may be >>> the only one, but I haven't given up on this idea :-) >>> >>> - Have one visual interface running locally which will let me do >>> everything >>> - Easily and visually configure swift to submit jobs on a remote >>> system >>> - Visually create a workflow which generates a swift script >>> - Select a swift script and be able to easily change parameters, >>> queue, maxtime, # nodes, etc on the fly >>> - See a visual progression of what is happening. 
Something like a swing version of -tui

From hategan at mcs.anl.gov  Wed Feb  2 15:34:56 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Wed, 02 Feb 2011 13:34:56 -0800
Subject: [Swift-devel] Swiftconfig merge / changes
In-Reply-To: <4D49CD3A.2040605@gmail.com>
References: <4d48f3d9.431eec0a.5bab.ffffc2a2@mx.google.com> <1296681763.9069.1.camel@blabla2.none> <1296681965.9069.2.camel@blabla2.none> <4D49CD3A.2040605@gmail.com>
Message-ID: <1296682496.9069.3.camel@blabla2.none>

On Wed, 2011-02-02 at 15:31 -0600, Jonathan Monette wrote:
> I was not aware that a java program could call another java programs main method. By not aware I mean I do not know how.

The main method is a plain static method. So:

    ClassName.main(new String[] {"arg1", "arg2", ...});

> On 2/2/11 3:26 PM, Mihael Hategan wrote:
> > On Wed, 2011-02-02 at 13:22 -0800, Mihael Hategan wrote:
> >> On Wed, 2011-02-02 at 00:04 -0600, jon.monette at gmail.com wrote:
> >>> I agree switching the swiftrun/swiftconfig to using bash may not be the best idea due to the above reasons David has mentioned. However I do not believe switching over to using Java is the solution. In the end swiftrun is calling swift which is a shell script. Having a java process call a shell script which in turns starts java processes may not be the best plan of action.
> >> That doesn't need to be so. A java program can invoke the main() method of swift.
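A minimal, self-contained sketch of the point about calling main() directly. It assumes nothing about Swift's real entry class: `WrappedTool` is a hypothetical stand-in for whatever class carries the `main()` to be called, and `Launcher` is likewise invented for illustration. It only shows that `main` can be invoked like any other static method, inside the same JVM, without going through a shell script.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-in for the tool being wrapped (NOT Swift's real entry class).
class WrappedTool {
    // Records the arguments it was started with, so the call can be observed.
    static final List<String> SEEN = new ArrayList<String>();

    public static void main(String[] args) {
        SEEN.addAll(Arrays.asList(args));
    }
}

// Hypothetical wrapper that starts the tool in-process.
class Launcher {
    // Invokes the wrapped tool's main() as a plain static method call
    // and returns how many arguments the tool has recorded so far.
    static int launch(String... args) {
        WrappedTool.main(args);
        return WrappedTool.SEEN.size();
    }

    public static void main(String[] args) {
        int count = launch("arg1", "arg2");
        System.out.println("arguments recorded by wrapped tool: " + count);
    }
}
```

One caveat with this approach: if the called `main()` ends by calling `System.exit()`, the calling JVM exits as well, so a wrapper that needs to keep running afterwards would have to account for that.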
> > In fact, this may very well turn out to fit into that swift shell model that Mike mentioned that would keep coaster workers (and other JVM persistent things) alive between swift runs.

From wozniak at mcs.anl.gov  Wed Feb  2 16:15:02 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Wed, 2 Feb 2011 16:15:02 -0600 (Central Standard Time)
Subject: [Swift-devel] Test suite group status and display
In-Reply-To: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov>
References: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov>
Message-ID:

Yeah, this looks useful and should be easy to implement. Groups are currently very ad hoc.

On Tue, 1 Feb 2011, Michael Wilde wrote:

> I'm looking for a way for users to see what tests are available, and which are reasonable to run in different settings and for various purposes. This should assist new users in initial testing and validation of their environment, as well as experienced users in knowing what tests are available and working (or not).
>
> Would it be useful to have an option to nightly.sh to display all the directories below tests/ that represent valid test groups?
>
> Something like: ./nightly.sh -g
>
> com$ find providers -type d | grep -v svn
> providers
> providers/local-pbs
> providers/local-pbs/pads
> providers/local-pbs/queenbee
> providers/local-cobalt
> providers/local-cobalt/surveyor
> providers/local-cobalt/intrepid
> providers/ssh
> providers/sge-local
> providers/local
> providers/local-pbs-coasters
> providers/ssh-pbs-coasters
> com$
>
> Would it be useful to put a status file in each group dir to help identify what dirs are really groups, and what the current status of those tests are?
> > - Mike > > > > > -- Justin M Wozniak From aespinosa at cs.uchicago.edu Wed Feb 2 18:31:22 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Wed, 2 Feb 2011 18:31:22 -0600 Subject: [Swift-devel] Re: Needs for site selection and job scheduling enhancements In-Reply-To: <1296601996.4182.22.camel@blabla2.none> References: <513619430.7751.1296570622752.JavaMail.root@zimbra.anl.gov> <1296601996.4182.22.camel@blabla2.none> Message-ID: 2011/2/1 Mihael Hategan : > I think we need to slow down a bit :) > > On Tue, 2011-02-01 at 08:30 -0600, wilde at mcs.anl.gov wrote: >> Mihael, >> >> Below is a proposal for Swift scheduling features that will need a >> fair amount of deliberation. This email is intended to start the >> process. I can move this to a bugzila enhancement to start the >> process, so long as you agree that the discussion makes sense to have. >> >> Allan, Dan and I have been re-examining the SCEC workflow that Allan >> is working on. >> >> Doing it efficiently on OSG raises scheduling aspects that Swift still >> doesn't handle well. We propose to address these issues in two phases: > > I think it may be useful to spell those out. >> >> I. Use simple workflows that group more work into single scripts to >> achieve the job affinities needed for reasonable performance. Provide >> scheduling hints to Swift. >> >> II. Determine how Swift could automatically achieve the same >> scheduling decisions. >> >> Phase II is pretty complex as far as we can tell, so lets defer its >> discussion. >> >> To do phase I, we want to ask if any of the following capabilities >> could be added, and which ones are both reasonable and of "affordable" >> cost and make sense to try. >> >> Most of these involve enabling a Swift script to specify scheduling >> "hints" on individual app() invocations. > > Does the place where the hints are specified have any relevance? If we > put the hints in the swift source we lose the "site independence" > aspect. 

I wonder if site independence only works when your workflow is compute-intensive. What about a mechanism where you can checkpoint the site scores from other runs of a workflow? But those scores would apply to all the jobs on a site.

I guess we could make one site catalog per app entry in the transformation catalog and do the 'hinting' at that level.

>>
> [...]
>> And we might need a feature (perhaps a swift.properties setting) to tell Swift to defer initial scheduling decisions for N seconds or until J jobs have been queued by the script, so that a sufficiently large number of jobs are in the queue before scheduling decisions are made (probably delay for say a minute on a multi-hour script run).
>
> How would that help? Given that the scheduling is probabilistic, that makes the distribution essentially the same whether you have N or N/2 jobs.

Here is what I think the motivation for this feature is. Take a workflow whose jobs fall into m groups. The groups have {n_1, n_2, n_3, ..., n_m} jobs, and the jobs in each group share a common data set {d_1, d_2, ..., d_m}.

Then let us say that n_1 > n_2 > n_3 > ... > n_m. From here, we say that scheduling group m on multiple sites does not make sense, since only a few jobs share its data. It would be better to bundle the jobs in group m onto a single site. I wonder how you can factor that into the probabilistic scores.

>>
>> In addition, we're wondering how easy (and desirable) any/all of the following language extensions could be done:
>>
>> - select statement to work on string values and/or ranges
>
> What would be the semantics of this statement? Can you give examples?
>>
>> - elseif clause to achieve the above in a multi-branch if statement
>
> Quite silly we don't support that already.

At least officially in the documentation, it says we don't support it.
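
The grouping idea in the message above can be sketched as follows. This is only an illustration under stated assumptions, not how Swift's scheduler works: the class `GroupPlacement`, the `minJobsToSpread` threshold, and the site names are all hypothetical. Groups with many jobs may use every site, while a group with only a few jobs sharing one data set is pinned to a single site so that its data needs to be staged only once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: decides which sites each job group may run on.
class GroupPlacement {
    // jobsPerGroup[i] is n_i, the number of jobs sharing data set d_i.
    // Groups with at least minJobsToSpread jobs may use all sites;
    // smaller groups are pinned to one site, chosen round-robin.
    static Map<Integer, List<String>> place(int[] jobsPerGroup,
                                            List<String> sites,
                                            int minJobsToSpread) {
        Map<Integer, List<String>> plan = new LinkedHashMap<Integer, List<String>>();
        int cursor = 0; // next site to receive a pinned group
        for (int g = 0; g < jobsPerGroup.length; g++) {
            List<String> allowed = new ArrayList<String>();
            if (jobsPerGroup[g] >= minJobsToSpread) {
                allowed.addAll(sites);
            } else {
                allowed.add(sites.get(cursor % sites.size()));
                cursor++;
            }
            plan.put(g, allowed);
        }
        return plan;
    }

    public static void main(String[] args) {
        // Made-up group sizes and site names, for illustration only.
        List<String> sites = Arrays.asList("siteA", "siteB", "siteC");
        Map<Integer, List<String>> plan = place(new int[] {100, 40, 3, 2}, sites, 10);
        for (Map.Entry<Integer, List<String>> e : plan.entrySet()) {
            System.out.println("group " + e.getKey() + " -> " + e.getValue());
        }
    }
}
```

A rule like this needs to know a group's size before placing its first job, which may be part of why a short initial scheduling delay was proposed.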
>>
>> - function pointers to select a function dynamically, eg from an array
>
> Well, I do like the idea of higher order functions, but that's not quite the way we went with this in the start. Though I'm sure it could be added. However, I would be curious to see the kind of problem that one would solve with swift that would require this.
>
>> - ability to set the app program name from a variable
>
> Could you clarify that?
>>
>> These enhancements would enable us to manually code in the scheduling hints by providing multiple pool groups with different throttle settings and to manually force jobs to different pools.
>
> I feel that to be a contrived way to avoid java code. Things are separated into components in order to isolate solutions to subproblems into loosely connected parts of the code. The idea that we'd implement scheduling features in the swift language seems to be the antithesis of that design principle.
>>
>> If the easiest way to set the hints requested above on an individual job is to pass an env var on the command line, then that capability might be a useful alternative to setting env vars with one-value-for-all method that we currently employ with the ENV profile. This could be considered as a useful enhancement separate from the question of how scheduling hints are set.
>>
>> Lastly, in phase I we will be testing the performance of having the jobs "pull" files via wget in a pre-staged manner, within the application script. For Phase II we'd like to consider having Swift do that in the worker: Have the coaster worker "pull" files in via wget or similar command/function, asynchronously pre-staging files for jobs that have been queued/assigned to a site. But that can be deferred for a later discussion.
>
> How is that different from the current worker staging mechanism (aside from changing protocols and tools)? I.e., what is the theoretical difference?
>> > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Thu Feb 3 11:13:31 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 03 Feb 2011 09:13:31 -0800 Subject: [Swift-devel] Re: Needs for site selection and job scheduling enhancements In-Reply-To: References: <513619430.7751.1296570622752.JavaMail.root@zimbra.anl.gov> <1296601996.4182.22.camel@blabla2.none> Message-ID: <1296753211.13967.8.camel@blabla2.none> On Wed, 2011-02-02 at 18:31 -0600, Allan Espinosa wrote: > 2011/2/1 Mihael Hategan : > I wonder if site independence only works when your workflow is > compute-intensive. What about a mechanism where you can checkpoint > the site scores from other runs of a workflow? But that would be > available for all the jobs in a site. > > I guess we could make a 1 site catalog per 1 app entry in the > transformation catalog and do the 'hinting' at that level. Or augment the site catalog to contain app-specific biases. > > >> > > [...] > >> And we might need a feature (perhaps a swift.properties setting) to > >> tell Swift to defer initial scheduling decisions for N seconds or > >> until J jobs have been queued by the script, so that a sufficiently > >> large number of jobs are in the queue before scheduling decisions are > >> made (probably delay for say a minute on a multi-hour script run). > > > > How would that help? Given that the scheduling is probabilistic, that > > makes the distribution essentially the same whether you have N or N/2 > > jobs. > > Here is what I think the motivation for this feature: Given a > workflow with jobs grouped into m. Each group has {n_1, n_2, n_3, > ..., n_m} jobs. Each group has a common data {d_1, d_2, ..., d_m}. > > Then let us say that n_1 > n_2 > n_3 > ... > n_m . 
> From here, we say that scheduling group m on multiple sites does not make sense

That's a strong statement. If that one site is busy enough that additional jobs would take longer there without data staging than they would on a different site with staging, then spreading the group would make sense.

> since there is only a few jobs that share a data. it would be better to bundle the jobs in group m into a single site. I wonder how you can factor that in the probabilistic scores.

Bias based on data locality. We had a student that did some preliminary work there, but it never really made it in.

However, I now see what Mike meant, and that is a windowing algorithm for deciding that bias. But I don't think that's ultimately necessary. I think a probabilistic approach would work ok without the need for a delay.

> >>
> >> In addition, we're wondering how easy (and desirable) any/all of the following language extensions could be done:
> >>
> >> - select statement to work on string values and/or ranges
> >
> > What would be the semantics of this statement? Can you give examples?
> >>
> >> - elseif clause to achieve the above in a multi-branch if statement
> >
> > Quite silly we don't support that already.
>
> At least officially in the documentation, it says we don't support it.

I really have no idea whether this works or not, but if it doesn't it's silly.

From hategan at mcs.anl.gov  Thu Feb  3 11:04:22 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 03 Feb 2011 09:04:22 -0800
Subject: [Swift-devel] Re: [Swift-user] pbs ppn count and stuff
In-Reply-To: <861161623.12935.1296677331670.JavaMail.root@zimbra.anl.gov>
References: <861161623.12935.1296677331670.JavaMail.root@zimbra.anl.gov>
Message-ID: <1296752662.13967.0.camel@blabla2.none>

On Wed, 2011-02-02 at 14:08 -0600, Michael Wilde wrote:
> Would 2PM tomorrow work?

Works for me.

> Justin, can you join this discussion? I'll set up a conf call once we confirm a time.
> > Im inserting below an email thread started by Matt at NCAR on this > topic. He also refers back to a very old thread in which the same > issue was raised. > > - Mike > > > > ----- Forwarded Message ----- > From: "Matthew Woitaszek" > To: "Allan Espinosa" > Cc: swift-user at ci.uchicago.edu > Sent: Thursday, November 4, 2010 10:06:48 AM > Subject: Re: [Swift-user] Coasters and PBS resource requests: nodes and ppn > > > Hi Allan, > > Yep, that's it. When the coasters resource request comes in with just "nodes=1", it gets interpreted by PBS as nodes=1:ppn=1, and thus PBS puts other jobs on the node, too, until all 8 CPUs are allocated (e.g., 8 1-cpu PBS jobs are running on it). > > I'd like to find some way to make the request as: > nodes=1:ppn=8 > along with > workersPerNode=8 > so that PBS allocates one node and all 8 processors, and then one Coasters job would put 8 workers on it, matching the resource request with the use. > > Matthew > > > > > > On Wed, Nov 3, 2010 at 5:41 PM, Allan Espinosa < aespinosa at cs.uchicago.edu > wrote: > > > Hi Matthew, > > Does this mean, coasters will now submit nodes=1;ppn=1 and do node packing? > > If there is no node packing being initiated by PBS, you can just > specify workersPerNode=8 . But then what you request to PBS is now > different to what you actually use. > > -Allan > > 2010/11/3 Matthew Woitaszek < matthew.woitaszek at gmail.com >: > > > > > Good afternoon, > > > > Is there a way to update PBS resource requests when using coasters to supply > > modified PBS resource strings such as "nodes=1:ppn=8"? (Or other arbitrary > > resource requests, such as node properties?) > > > > Of course, I'm just trying to get coasters to allocate all of the processors > > on an 8-core node, using either the "gt2:gt2:pbs" or "local:pbs" provider. > > Both submit jobs just fine. I found no discernible difference with the > > "host_types" Globus namespace variable, presuming I'm setting it right. 
> > > > The particular cluster I'm using allows node packing for users that run lots > > of single-processor tasks, so without ppn, it will assume nodes=1,ncpus=1 > > and thus pack 8 jobs on each node before moving on to the next node. (I know > > it won't be an issue at sites that make nodes exclusive. On this system, the > > queue default is "nodes=1:ppn=8", but because coasters explicitly specifies > > the number of nodes in its generated resource request, the ppn default seems > > to get lost!) > > > > I see that this has been discussed as far back as 2007, and I found Marcin > > and Mike's previous discussion of the topic at > > > > http://mail.ci.uchicago.edu/pipermail/swift-user/2010-March/001409.html > > > > but there didn't seem to be any definitive conclusion. Any suggestions would > > be appreciated! > > > > Matthew > > > > -- > Allan M. Espinosa < http://amespinosa.wordpress.com > > PhD student, Computer Science > University of Chicago < http://people.cs.uchicago.edu/~aespinosa > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > > > ----- Original Message ----- > > On Tue, 2011-02-01 at 15:34 -0600, Michael Wilde wrote: > > > > > Lets start with a voice call and then bring the issue back to the > > > devel list. > > > > Can we do this on Thursday after 12:30 Chicago time? > > > > Mihael > From wilde at mcs.anl.gov Fri Feb 4 14:44:33 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 4 Feb 2011 14:44:33 -0600 (CST) Subject: [Swift-devel] Swift trunk broken? 
In-Reply-To: <858540826.21141.1296852228861.JavaMail.root@zimbra.anl.gov> Message-ID: <1631861994.21143.1296852273187.JavaMail.root@zimbra.anl.gov> Im getting strange errors from trunk at the moment from previously working scripts. Seemed to be unable to parse sites.xml. In the process of debugging this, I find that I cant get the simplest of swift scripts (a single trace statement) to run. Is anyone else encountering similar problems? Here's what I get: com$ cat hi.swift trace("hi"); com$ java -version java version "1.6.0_20" Java(TM) SE Runtime Environment (build 1.6.0_20-b02) Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) com$ swift -version Swift svn swift-r4061 cog-r3046 com$ swift hi.swift Swift svn swift-r4061 cog-r3046 RunID: 20110204-1439-j430gp9g Progress: time:0 Execution failed: 1 names specified; 0 arguments found Time: 1.179, rate: 13896 j/s com$ cat hi.xml hi com$ com$ cat *9g.log 2011-02-04 14:39:29,205-0600 DEBUG Loader Max heap: 238616576 2011-02-04 14:39:29,206-0600 DEBUG Loader kmlversion is >a1ce0de8-81e9-4226-987a-0bbcb40af008< 2011-02-04 14:39:29,207-0600 DEBUG Loader build version is >a1ce0de8-81e9-4226-987a-0bbcb40af008< 2011-02-04 14:39:29,207-0600 DEBUG Loader Recompilation suppressed. 2011-02-04 14:39:29,343-0600 INFO VDL2ExecutionContext Stack dump: Level 1 [iA = 0, iB = 0, bA = false, bB = false] vdl:instanceconfig = Swift configuration [] vdl:operation = run swift.home = /home/wilde/swift/rev/trunk/bin/.. 
PATH_SEPARATOR = / 2011-02-04 14:39:29,900-0600 INFO unknown Using sites file: /home/wilde/swift/rev/trunk/bin/../etc/sites.xml 2011-02-04 14:39:29,928-0600 INFO unknown Using tc.data: /home/wilde/swift/rev/trunk/bin/../etc/tc.data 2011-02-04 14:39:30,023-0600 INFO AbstractScheduler Setting resources to: {localhost=localhost} 2011-02-04 14:39:30,468-0600 INFO unknown Swift svn swift-r4061 cog-r3046 2011-02-04 14:39:30,469-0600 INFO unknown RUNID id=run:20110204-1439-j430gp9g 2011-02-04 14:39:30,511-0600 DEBUG VDL2ExecutionContext 1 names specified; 0 arguments found 1 names specified; 0 arguments found at org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 2011-02-04 14:39:30,522-0600 INFO ExecutionContext Detailed exception: 1 names specified; 0 arguments found at org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:619) 2011-02-04 14:39:30,522-0600 INFO Loader Swift finished with errors com$ -- Michael Wilde Computation Institute, University of Chicago Mathematics and 
Computer Science Division
Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Sat Feb 5 07:12:16 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sat, 5 Feb 2011 07:12:16 -0600 (CST)
Subject: [Swift-devel] [Bug 251] New: swift -version should include version and cog and swift branch info
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=251

           Summary: swift -version should include version and cog and swift branch info
           Product: Swift
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P3
         Component: SwiftScript language
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: wilde at mcs.anl.gov

swift -version should include the swift version (eg, 0.92 or trunk) and the cog and swift branch names used. Otherwise there is no way to distinguish between, e.g., a build of swift from the trunks or a specific branch.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.
You are watching the reporter.

From bugzilla-daemon at mcs.anl.gov Sun Feb 6 13:32:35 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 6 Feb 2011 13:32:35 -0600 (CST)
Subject: [Swift-devel] [Bug 253] New: Enhance coaster timeout processing in passive and persistent modes
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=253

           Summary: Enhance coaster timeout processing in passive and persistent modes
           Product: Swift
           Version: unspecified
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: enhancement
          Priority: P1
         Component: SwiftScript language
        AssignedTo: hategan at mcs.anl.gov
        ReportedBy: wilde at mcs.anl.gov

There is concern/evidence that coaster timeout processing is still not working well in at least passive persistent mode, and possibly ordinary persistent mode.
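As a rough illustration of the behavior being requested (an idle check that a configuration property can switch off), here is a minimal sketch. It is not the coaster source: the method and field names checkIdleTime and ignoreIdleTime are borrowed from the change quoted in this report, while the class name, the property key coaster.ignore.idle.time, and the timing fields are assumptions made up for the example.

```java
import java.util.Properties;

/**
 * Illustrative model only; this is not the CoG/Swift coaster source.
 * The class name, property key and timing fields are assumptions.
 */
class IdleCheckSketch {
    /** Hypothetical property key; not a confirmed swift.properties setting. */
    static final String IGNORE_IDLE_KEY = "coaster.ignore.idle.time";

    private final boolean ignoreIdleTime;
    private final long maxIdleMillis;
    private final long lastActiveMillis;
    private boolean shutDown;

    IdleCheckSketch(boolean ignoreIdleTime, long maxIdleMillis, long nowMillis) {
        this.ignoreIdleTime = ignoreIdleTime;
        this.maxIdleMillis = maxIdleMillis;
        this.lastActiveMillis = nowMillis;
    }

    /** Reads the flag from properties; a missing key keeps the idle timeout enabled. */
    static IdleCheckSketch fromProperties(Properties props, long maxIdleMillis, long nowMillis) {
        boolean ignore = Boolean.parseBoolean(props.getProperty(IGNORE_IDLE_KEY, "false"));
        return new IdleCheckSketch(ignore, maxIdleMillis, nowMillis);
    }

    /** Marks the worker as shut down only if timeouts are enabled and the idle limit is exceeded. */
    synchronized void checkIdleTime(long nowMillis) {
        if (ignoreIdleTime) {
            return;
        }
        if (nowMillis - lastActiveMillis > maxIdleMillis) {
            shutDown = true;
        }
    }

    synchronized boolean isShutDown() {
        return shutDown;
    }

    public static void main(String[] args) {
        // Passive/persistent style configuration: idle timeouts are ignored.
        Properties passive = new Properties();
        passive.setProperty(IGNORE_IDLE_KEY, "true");
        IdleCheckSketch kept = fromProperties(passive, 1000L, 0L);
        kept.checkIdleTime(60000L);
        System.out.println("timeouts ignored, shut down: " + kept.isShutDown());

        // Default configuration: the idle worker is shut down.
        IdleCheckSketch dropped = fromProperties(new Properties(), 1000L, 0L);
        dropped.checkIdleTime(60000L);
        System.out.println("timeouts enabled, shut down: " + dropped.isShutDown());
    }
}
```

Reading the flag once at construction keeps the check itself cheap; whether the real service should read it from swift.properties or from the provider mode is the open question in this report.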
The following IM thread between Mihael and Mike on Jan 17 5:05 PM CST describes the issues and initial diagnosis.

---
6:06:26 PM Michael Wilde: thing 2: the coaster timeout ignore feature
6:06:49 PM Michael Wilde: thing 3: make thing 2 turn off timing for passive as well, *or* based on a property
6:07:05 PM Michael Wilde: (all those are for SwiftR)
6:07:16 PM Michael Wilde: or, integrate fast back into trunk?
6:07:34 PM Michael Wilde: and then it's just thing 3 (out of 2 ; )
6:07:58 PM Mihael: what's 2?
6:08:50 PM Michael Wilde: the feature you added to disable worker timeouts when using persistent-coasters
6:09:22 PM Michael Wilde: Turns out SwiftR needs to disable those timers, but it's using only passive but not persistent
6:09:44 PM Michael Wilde: so I needed to manually set the ignoreTimeouts flag in CoasterService
6:11:30 PM Mihael: it's disabled in all cases
6:12:07 PM Michael Wilde: no, not as far as my tests show
6:12:18 PM Michael Wilde: I'm using just passive coasters,
6:12:35 PM Michael Wilde: and the workers die after a few minutes idle time
6:12:45 PM Mihael: that code is in the worker
6:12:46 PM Mihael: it got removed
6:12:56 PM Michael Wilde: the flag is set in the persistent-coaster service startup only, as far as I can tell
6:13:05 PM Michael Wilde: no, I'm talking about the java side
6:13:22 PM Michael Wilde: the service is telling the worker to shut down
6:13:25 PM Michael Wilde: after a few minutes
6:13:57 PM Michael Wilde: if I force on the flag that is set on persistent-coaster startup, then the workers don't time out
6:14:46 PM Mihael: shouldn't happen
6:14:47 PM Mihael: but I can check
6:14:57 PM Michael Wilde: k, thanks
6:15:10 PM Michael Wilde: so your intent was to disable *all* worker timeout?
6:15:34 PM Michael Wilde: leaving it on for normal auto coasters and off if anything is running passive or persistent seems to make sense
6:15:37 PM Michael Wilde: I think
6:16:26 PM Mihael: yes
6:16:29 PM Mihael: all worker timeout is disabled
6:16:44 PM Mihael: the workers still die when no service is present due to lack of heartbeat
6:16:46 PM Michael Wilde: that's not the behavior I am getting
6:16:50 PM Michael Wilde: ah
6:17:17 PM Michael Wilde: but no - when I logged them, they were getting a shutdown from the service
6:17:29 PM Michael Wilde: would that in *turn* be from lack of heartbeat?
6:18:25 PM Michael Wilde: This is the change I had to make to prevent the workers from quitting:
6:18:25 PM Mihael: no
6:18:26 PM Michael Wilde:

     private synchronized void checkIdleTime() {
    +    return;
    +    /*
         if (ignoreIdleTime) {
             return;
         }
    @@ -238,6 +240,7 @@
                 shutdown();
             }
         }
    +    */
     }

     public synchronized void suspend() {
    @@ -342,7 +345,8 @@
     }

     public boolean getIgnoreIdleTime() {
    -    return ignoreIdleTime;
    +    return true; // ignoreIdleTime; MW: set this based on a swift.properties property or ???
    +    // need to disable idle timeout for the R passive coaster config
     }

6:18:27 PM Mihael: probably not
6:18:56 PM Michael Wilde: This *seems* to work for me, although "seems" can be deceiving
6:19:00 PM Mihael: the persistent worker manager should not shut down workers
6:19:04 PM Mihael: there may be a bug there
6:19:22 PM Michael Wilde: but that's the point, I am not *running* the persistent mgr
6:19:36 PM Michael Wilde: I'm using normal coaster provider, in passive mode
6:20:05 PM Michael Wilde: so that ignoreIdleTimeout flag is not getting set
6:20:47 PM Michael Wilde: btw my change above to getIgnoreIdleTime doesn't work, that is not used it seems.
6:21:40 PM Mihael: ah, ok
6:21:44 PM Mihael: right
6:21:48 PM Mihael: I meant passive mode
6:22:00 PM Mihael: "passive worker manager"
6:22:27 PM Mihael:

    @Override
    protected void removeIdleBlocks() {
        // no removing of idle blocks here

6:22:45 PM Mihael: though they might get shut down by something else
6:22:50 PM Mihael: I will have to check
6:23:03 PM Michael Wilde: Ah, ok. so yes, that may be a bug, and it seems to get de-bugged if I force the immediate return from checkIdleTime.
6:23:07 PM Michael Wilde: k.
6:23:46 PM Michael Wilde: So, back to the fast branch issue: what's your judgement on that? Integrate it into trunk, or integrate selected changes into fast?
6:24:20 PM Michael Wilde: I was thinking to re-do my small tasks-per-sec tests between fast and trunk, and see how much fast helps
6:25:01 PM Michael Wilde: I can probably do the needed changes in a local copy of fast, unless integration of fast is more imminent
6:25:27 PM Mihael: we'll put that into trunk next

From bugzilla-daemon at mcs.anl.gov Sun Feb 6 18:40:04 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Sun, 6 Feb 2011 18:40:04 -0600 (CST)
Subject: [Swift-devel] [Bug 253] Enhance coaster timeout processing in passive and persistent modes
In-Reply-To:
References:
Message-ID: <20110207004004.8C4E21BD89@wind.mcs.anl.gov>

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=253

--- Comment #1 from Michael Wilde 2011-02-06 18:40:04 ---
I missed this message from Mihael:

> There was also the issue regarding coaster timeout in this IM thread
> below from Jan 17 5:05 PM CST. I can't recall if you fixed that
> already; I *think* you did but need to check.

I think this is the same as the problem Johnathan has been experiencing and which should be fixed in the stable branch and possibly in trunk.

The problem does indeed seem fixed in the 0.92 branch. Once we can run trunk again, I will verify that it's fixed there too.

From hategan at mcs.anl.gov Mon Feb 7 12:03:42 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 07 Feb 2011 10:03:42 -0800
Subject: [Swift-devel] Swift trunk broken?
In-Reply-To: <1631861994.21143.1296852273187.JavaMail.root@zimbra.anl.gov>
References: <1631861994.21143.1296852273187.JavaMail.root@zimbra.anl.gov>
Message-ID: <1297101822.14468.0.camel@blabla2.none>

Ooops. I didn't get any emails over the weekend (well, it seems I did, but my otherwise reliable email notification didn't work). So I'm a bit behind.

Mihael

On Fri, 2011-02-04 at 14:44 -0600, Michael Wilde wrote:
> I'm getting strange errors from trunk at the moment from previously working scripts. Seemed to be unable to parse sites.xml.
>
> In the process of debugging this, I find that I can't get the simplest of swift scripts (a single trace statement) to run.
>
> Is anyone else encountering similar problems?
>

From hategan at
mcs.anl.gov Mon Feb 7 12:54:49 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 07 Feb 2011 10:54:49 -0800
Subject: [Swift-devel] Swift trunk broken?
In-Reply-To: <1297101822.14468.0.camel@blabla2.none>
References: <1631861994.21143.1296852273187.JavaMail.root@zimbra.anl.gov> <1297101822.14468.0.camel@blabla2.none>
Message-ID: <1297104889.17005.1.camel@blabla2.none>

Should be fixed now.

From tim.g.armstrong at gmail.com Mon Feb 7 16:33:33 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Mon, 7 Feb 2011 16:33:33 -0600
Subject: [Swift-devel] Swift trunk broken?
In-Reply-To: <1297104889.17005.1.camel@blabla2.none>
References: <1631861994.21143.1296852273187.JavaMail.root@zimbra.anl.gov> <1297101822.14468.0.camel@blabla2.none> <1297104889.17005.1.camel@blabla2.none>
Message-ID:

I've run into a different problem...

Swift fails with the following exception: Execution failed: java.lang.NullPointerException at org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) at org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) at org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) at org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) at org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at 
java.lang.Thread.run(Thread.java:619)

Time: 0.535, rate: 30624 j/s

Nothing really appears in the logs:
2011-02-07 16:32:22,668-0600 DEBUG Loader Loader started

- Tim

From tim.g.armstrong at gmail.com Mon Feb 7 16:37:13 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Mon, 7 Feb 2011 16:37:13 -0600
Subject: [Swift-devel] Swift trunk broken?
In-Reply-To:
References: <1631861994.21143.1296852273187.JavaMail.root@zimbra.anl.gov> <1297101822.14468.0.camel@blabla2.none> <1297104889.17005.1.camel@blabla2.none>
Message-ID:

P.S. I am doing a clean build from the latest svn versions of swift and cog

On Mon, Feb 7, 2011 at 4:33 PM, Tim Armstrong wrote:
> I've run into a different problem...
> > Swift fails with the following exception: > > Execution failed: > java.lang.NullPointerException > at org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) > at org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) > at > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) > at > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) > at > org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) > > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > at > org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at > java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > > Time: 0.535, rate: 30624 j/s > > Nothing really appears in the logs: > 2011-02-07 16:32:22,668-0600 DEBUG Loader Loader started > > > - Tim > > > > On Mon, Feb 7, 2011 at 12:54 PM, Mihael Hategan wrote: > >> Should be fixed now. >> >> On Mon, 2011-02-07 at 10:03 -0800, Mihael Hategan wrote: >> > Ooops. I didn't get any emails over the weekend (well, it seems I did, >> > but my otherwise reliable email notification didn't work). So I'm a bit >> > behind. >> > >> > Mihael >> > >> > On Fri, 2011-02-04 at 14:44 -0600, Michael Wilde wrote: >> > > Im getting strange errors from trunk at the moment from previously >> working scripts. Seemed to be unable to parse sites.xml. >> > > >> > > In the process of debugging this, I find that I cant get the simplest >> of swift scripts (a single trace statement) to run. >> > > >> > > Is anyone else encountering similar problems? >> > > >> > > Here's what I get: >> > > >> > > com$ cat hi.swift >> > > trace("hi"); >> > > com$ java -version >> > > java version "1.6.0_20" >> > > Java(TM) SE Runtime Environment (build 1.6.0_20-b02) >> > > Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) >> > > com$ swift -version >> > > Swift svn swift-r4061 cog-r3046 >> > > >> > > com$ swift hi.swift >> > > Swift svn swift-r4061 cog-r3046 >> > > >> > > RunID: 20110204-1439-j430gp9g >> > > Progress: time:0 >> > > Execution failed: >> > > 1 names specified; 0 arguments found >> > > Time: 1.179, rate: 13896 j/s >> > > com$ cat hi.xml >> > > > > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" >> > > xmlns:xs="http://www.w3.org/2001/XMLSchema"> >> > > >> > > hi >> > > >> > > >> > > com$ >> > > >> > > com$ cat *9g.log >> > > 2011-02-04 14:39:29,205-0600 DEBUG Loader Max heap: 238616576 >> > > 2011-02-04 14:39:29,206-0600 DEBUG Loader kmlversion is >> >a1ce0de8-81e9-4226-987a-0bbcb40af008< >> > > 2011-02-04 
14:39:29,207-0600 DEBUG Loader build version is >> >a1ce0de8-81e9-4226-987a-0bbcb40af008< >> > > 2011-02-04 14:39:29,207-0600 DEBUG Loader Recompilation suppressed. >> > > 2011-02-04 14:39:29,343-0600 INFO VDL2ExecutionContext Stack dump: >> > > Level 1 >> > > [iA = 0, iB = 0, bA = false, bB = false] >> > > vdl:instanceconfig = Swift configuration [] >> > > vdl:operation = run >> > > swift.home = /home/wilde/swift/rev/trunk/bin/.. >> > > PATH_SEPARATOR = / >> > > >> > > >> > > 2011-02-04 14:39:29,900-0600 INFO unknown Using sites file: >> /home/wilde/swift/rev/trunk/bin/../etc/sites.xml >> > > 2011-02-04 14:39:29,928-0600 INFO unknown Using tc.data: >> /home/wilde/swift/rev/trunk/bin/../etc/tc.data >> > > 2011-02-04 14:39:30,023-0600 INFO AbstractScheduler Setting resources >> to: {localhost=localhost} >> > > 2011-02-04 14:39:30,468-0600 INFO unknown Swift svn swift-r4061 >> cog-r3046 >> > > >> > > 2011-02-04 14:39:30,469-0600 INFO unknown RUNID >> id=run:20110204-1439-j430gp9g >> > > 2011-02-04 14:39:30,511-0600 DEBUG VDL2ExecutionContext 1 names >> specified; 0 arguments found >> > > 1 names specified; 0 arguments found >> > > >> > > at >> org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) >> > > at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> > > at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) >> > > at >> org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) >> > > at >> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) >> > > at >> 
org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) >> > > at >> org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) >> > > at >> org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) >> > > at >> org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) >> > > at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) >> > > at >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) >> > > at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> > > at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> > > at java.lang.Thread.run(Thread.java:619) >> > > 2011-02-04 14:39:30,522-0600 INFO ExecutionContext Detailed >> exception: >> > > 1 names specified; 0 arguments found >> > > >> > > at >> org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) >> > > at >> org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) >> > > at >> org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) >> > > at >> org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) >> > > at >> org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) >> > > at >> org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) >> > > at >> 
org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) >> > > at >> org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) >> > > at >> org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) >> > > at >> org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) >> > > at >> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) >> > > at >> java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) >> > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) >> > > at >> java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) >> > > at >> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) >> > > at java.lang.Thread.run(Thread.java:619) >> > > 2011-02-04 14:39:30,522-0600 INFO Loader Swift finished with errors >> > > com$ >> > > >> > > >> > > >> > >> > >> > _______________________________________________ >> > Swift-devel mailing list >> > Swift-devel at ci.uchicago.edu >> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wozniak at mcs.anl.gov Mon Feb 7 18:57:26 2011 From: wozniak at mcs.anl.gov (Justin M Wozniak) Date: Mon, 7 Feb 2011 19:57:26 -0500 (Eastern Standard Time) Subject: [Swift-devel] Test suite group status and display In-Reply-To: References: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov> Message-ID: Yup, nightly.sh actually does use HTML green/red boxes and links to the script code. However, the results are not indexed or published. The tests are available. On Tue, 1 Feb 2011, Daniel S. 
Katz wrote:

> Ideally, if these really are nightly tests, the swift web page would
> have a matrix of these tests vs day that would show green/red boxes, and
> would enable a user to click on a test and get to the code to be able to
> run that test himself.
>
> Dan
>
>
> On Feb 1, 2011, at 12:02 PM, Michael Wilde wrote:
>
>> I'm looking for a way for users to see what tests are available,
>> and which are reasonable to run in different settings and for various
>> purposes. This should assist new users in initial testing and
>> validation of their environment, as well as experienced users in
>> knowing what tests are available and working (or not).
>>
>> Would it be useful to have an option to nightly.sh to display all the
>> directories below tests/ that represent valid test groups?
>>
>> Something like: ./nightly.sh -g
>>
>> com$ find providers -type d | grep -v svn
>> providers
>> providers/local-pbs
>> providers/local-pbs/pads
>> providers/local-pbs/queenbee
>> providers/local-cobalt
>> providers/local-cobalt/surveyor
>> providers/local-cobalt/intrepid
>> providers/ssh
>> providers/sge-local
>> providers/local
>> providers/local-pbs-coasters
>> providers/ssh-pbs-coasters
>> com$
>>
>> Would it be useful to put a status file in each group dir to help identify what dirs are really groups, and what the current status of those tests is?
>>
>> - Mike
>>
>>
>>
>>
>> --
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>>
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>

--
Justin M Wozniak

From dsk at ci.uchicago.edu Mon Feb 7 19:01:32 2011
From: dsk at ci.uchicago.edu (Daniel S.
Katz)
Date: Mon, 7 Feb 2011 19:01:32 -0600
Subject: [Swift-devel] Test suite group status and display
In-Reply-To:
References: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov>
Message-ID: <4A7AA585-ADB0-4A69-8FAE-3D64C76D942C@ci.uchicago.edu>

"not published" doesn't seem like it meets Mike's request of "a way for users to see what tests are available". Can the results be sent to a web page?

Dan


On Feb 7, 2011, at 6:57 PM, Justin M Wozniak wrote:

>
> Yup, nightly.sh actually does use HTML green/red boxes and links to the script code. However, the results are not indexed or published. The tests are available.
>
> On Tue, 1 Feb 2011, Daniel S. Katz wrote:
>
>> Ideally, if these really are nightly tests, the swift web page would have a matrix of these tests vs day that would show green/red boxes, and would enable a user to click on a test and get to the code to be able to run that test himself.
>>
>> Dan
>>
>>
>> On Feb 1, 2011, at 12:02 PM, Michael Wilde wrote:
>>
>>> I'm looking for a way for users to see what tests are available, and which are reasonable to run in different settings and for various purposes. This should assist new users in initial testing and validation of their environment, as well as experienced users in knowing what tests are available and working (or not).
>>>
>>> Would it be useful to have an option to nightly.sh to display all the directories below tests/ that represent valid test groups?
>>>
>>> Something like: ./nightly.sh -g
>>>
>>> com$ find providers -type d | grep -v svn
>>> providers
>>> providers/local-pbs
>>> providers/local-pbs/pads
>>> providers/local-pbs/queenbee
>>> providers/local-cobalt
>>> providers/local-cobalt/surveyor
>>> providers/local-cobalt/intrepid
>>> providers/ssh
>>> providers/sge-local
>>> providers/local
>>> providers/local-pbs-coasters
>>> providers/ssh-pbs-coasters
>>> com$
>>>
>>> Would it be useful to put a status file in each group dir to help identify what dirs are really groups, and what the current status of those tests is?
>>>
>>> - Mike
>>>
>>>
>>>
>>>
>>> --
>>> Michael Wilde
>>> Computation Institute, University of Chicago
>>> Mathematics and Computer Science Division
>>> Argonne National Laboratory
>>>
>>> _______________________________________________
>>> Swift-devel mailing list
>>> Swift-devel at ci.uchicago.edu
>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>
>>
>
> --
> Justin M Wozniak

--
Daniel S. Katz
University of Chicago
(773) 834-7186 (voice)
(773) 834-3700 (fax)
d.katz at ieee.org or dsk at ci.uchicago.edu
http://www.ci.uchicago.edu/~dsk/

From wozniak at mcs.anl.gov Mon Feb 7 19:19:28 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Mon, 7 Feb 2011 20:19:28 -0500 (Eastern Standard Time)
Subject: [Swift-devel] Test suite group status and display
In-Reply-To: <4A7AA585-ADB0-4A69-8FAE-3D64C76D942C@ci.uchicago.edu>
References: <661863645.9716.1296583356570.JavaMail.root@zimbra.anl.gov> <4A7AA585-ADB0-4A69-8FAE-3D64C76D942C@ci.uchicago.edu>
Message-ID:

Ok, so there are two user groups: users browsing the Swift web page looking for what Swift does, and developers using nightly.sh. I stress that nightly.sh is not used as a nightly script and should be renamed, although its core could be wrapped up as such.

Recent work has focused on getting nightly.sh to be a useful script for swift-devel types to 1) check that Swift works and 2) build up a test suite for various providers.
Sarah and David have been doing quite a bit of the latter recently.

HTML is produced as the output of these runs. If some additional infrastructure were in place, it would be possible to run this under cron, index the results wrt time, and post something to the web, as the HTML is pretty nice. However, we are currently focusing on getting Swift to run on the target sites and indexing the results wrt test sites (providers).

On Mon, 7 Feb 2011, Daniel S. Katz wrote:

> "not published" doesn't seem like it meets Mike's request of "a way for
> users to see what tests are available". Can the results be sent to a
> web page?
>
> Dan
>
>
> On Feb 7, 2011, at 6:57 PM, Justin M Wozniak wrote:
>
>>
>> Yup, nightly.sh actually does use HTML green/red boxes and links to the
>> script code. However, the results are not indexed or published. The
>> tests are available.
>>
>> On Tue, 1 Feb 2011, Daniel S. Katz wrote:
>>
>>> Ideally, if these really are nightly tests, the swift web page would
>>> have a matrix of these tests vs day that would show green/red boxes,
>>> and would enable a user to click on a test and get to the code to be
>>> able to run that test himself.
>>>
>>> Dan
>>>
>>>
>>> On Feb 1, 2011, at 12:02 PM, Michael Wilde wrote:
>>>
>>>> I'm looking for a way for users to see what tests are available,
>>>> and which are reasonable to run in different settings and for various
>>>> purposes. This should assist new users in initial testing and
>>>> validation of their environment, as well as experienced users in
>>>> knowing what tests are available and working (or not).
>>>>
>>>> Would it be useful to have an option to nightly.sh to display all the
>>>> directories below tests/ that represent valid test groups?
>>>>
>>>> Something like: ./nightly.sh -g
>>>>
>>>> com$ find providers -type d | grep -v svn
>>>> providers
>>>> providers/local-pbs
>>>> providers/local-pbs/pads
>>>> providers/local-pbs/queenbee
>>>> providers/local-cobalt
>>>> providers/local-cobalt/surveyor
>>>> providers/local-cobalt/intrepid
>>>> providers/ssh
>>>> providers/sge-local
>>>> providers/local
>>>> providers/local-pbs-coasters
>>>> providers/ssh-pbs-coasters
>>>> com$
>>>>
>>>> Would it be useful to put a status file in each group dir to help identify what dirs are really groups, and what the current status of those tests is?
>>>>
>>>> - Mike
>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Michael Wilde
>>>> Computation Institute, University of Chicago
>>>> Mathematics and Computer Science Division
>>>> Argonne National Laboratory
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>
>>>
>>
>> --
>> Justin M Wozniak
>
>

--
Justin M Wozniak

From wilde at mcs.anl.gov Mon Feb 7 20:00:38 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 7 Feb 2011 20:00:38 -0600 (CST)
Subject: [Swift-devel] Swift trunk broken?
In-Reply-To:
Message-ID: <667749620.29971.1297130438445.JavaMail.root@zimbra.anl.gov>

Tim, I saw the same problem. The similar problem I reported occurred when I tried a simpler script to test basic sanity. I thought they were related but apparently not.

- Mike

----- Original Message -----

P.S. I am doing a clean build from the latest svn versions of swift and cog

On Mon, Feb 7, 2011 at 4:33 PM, Tim Armstrong < tim.g.armstrong at gmail.com > wrote:

I've run into a different problem...
Swift fails with the following exception: Execution failed: java.lang.NullPointerException at org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) at org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) at org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) at org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) at org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at 
java.lang.Thread.run(Thread.java:619) Time: 0.535, rate: 30624 j/s Nothing really appears in the logs: 2011-02-07 16:32:22,668-0600 DEBUG Loader Loader started - Tim On Mon, Feb 7, 2011 at 12:54 PM, Mihael Hategan < hategan at mcs.anl.gov > wrote: Should be fixed now. On Mon, 2011-02-07 at 10:03 -0800, Mihael Hategan wrote: > Ooops. I didn't get any emails over the weekend (well, it seems I did, > but my otherwise reliable email notification didn't work). So I'm a bit > behind. > > Mihael > > On Fri, 2011-02-04 at 14:44 -0600, Michael Wilde wrote: > > Im getting strange errors from trunk at the moment from previously working scripts. Seemed to be unable to parse sites.xml. > > > > In the process of debugging this, I find that I cant get the simplest of swift scripts (a single trace statement) to run. > > > > Is anyone else encountering similar problems? > > > > Here's what I get: > > > > com$ cat hi.swift > > trace("hi"); > > com$ java -version > > java version "1.6.0_20" > > Java(TM) SE Runtime Environment (build 1.6.0_20-b02) > > Java HotSpot(TM) 64-Bit Server VM (build 16.3-b01, mixed mode) > > com$ swift -version > > Swift svn swift-r4061 cog-r3046 > > > > com$ swift hi.swift > > Swift svn swift-r4061 cog-r3046 > > > > RunID: 20110204-1439-j430gp9g > > Progress: time:0 > > Execution failed: > > 1 names specified; 0 arguments found > > Time: 1.179, rate: 13896 j/s > > com$ cat hi.xml > > > xmlns:xsi=" http://www.w3.org/2001/XMLSchema-instance " > > xmlns:xs=" http://www.w3.org/2001/XMLSchema "> > > > > hi > > > > > > com$ > > > > com$ cat *9g.log > > 2011-02-04 14:39:29,205-0600 DEBUG Loader Max heap: 238616576 > > 2011-02-04 14:39:29,206-0600 DEBUG Loader kmlversion is >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > 2011-02-04 14:39:29,207-0600 DEBUG Loader build version is >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > 2011-02-04 14:39:29,207-0600 DEBUG Loader Recompilation suppressed. 
> > 2011-02-04 14:39:29,343-0600 INFO VDL2ExecutionContext Stack dump: > > Level 1 > > [iA = 0, iB = 0, bA = false, bB = false] > > vdl:instanceconfig = Swift configuration [] > > vdl:operation = run > > swift.home = /home/wilde/swift/rev/trunk/bin/.. > > PATH_SEPARATOR = / > > > > > > 2011-02-04 14:39:29,900-0600 INFO unknown Using sites file: /home/wilde/swift/rev/trunk/bin/../etc/sites.xml > > 2011-02-04 14:39:29,928-0600 INFO unknown Using tc.data: /home/wilde/swift/rev/trunk/bin/../etc/tc.data > > 2011-02-04 14:39:30,023-0600 INFO AbstractScheduler Setting resources to: {localhost=localhost} > > 2011-02-04 14:39:30,468-0600 INFO unknown Swift svn swift-r4061 cog-r3046 > > > > 2011-02-04 14:39:30,469-0600 INFO unknown RUNID id=run:20110204-1439-j430gp9g > > 2011-02-04 14:39:30,511-0600 DEBUG VDL2ExecutionContext 1 names specified; 0 arguments found > > 1 names specified; 0 arguments found > > > > at org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at 
org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > > at java.lang.Thread.run(Thread.java:619) > > 2011-02-04 14:39:30,522-0600 INFO ExecutionContext Detailed exception: > > 1 names specified; 0 arguments found > > > > at org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > > at 
java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:619)
> > 2011-02-04 14:39:30,522-0600 INFO Loader Swift finished with errors
> > com$
> >
> >
> >
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From skenny at uchicago.edu Tue Feb 8 19:02:28 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Tue, 8 Feb 2011 17:02:28 -0800
Subject: [Swift-devel] (very) rough draft of changes in swift version
Message-ID:

hey all, i started this page as a draft of the changes for the new swift version (with the intention of migrating to the web page once we make the new release available). it's a first pass and doesn't include cog changes at the moment. this is derived from the svn log...feel free to make changes, as i'm sure i missed things that should be there (or added superfluous items)...as i said, very rough :)

http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ChangeLog

~sk
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Tue Feb 8 20:24:04 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 08 Feb 2011 18:24:04 -0800
Subject: [Swift-devel] (very) rough draft of changes in swift version
In-Reply-To:
References:
Message-ID: <1297218244.20393.0.camel@blabla2.none>

Wow. Nice.
I lost track of all the stuff that went in there.

On Tue, 2011-02-08 at 17:02 -0800, Sarah Kenny wrote:
> hey all, i started this page as a draft of the changes for the new
> swift version (with the intention of migrating to the web page once we
> make the new release available). it's a first pass and doesn't include
> cog changes at the moment. this is derived from the svn log...feel
> free to make changes, as i'm sure i missed things that should be there
> (or added superfluous items)...as i said, very rough :)
>
> http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ChangeLog
>
> ~sk
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

From wozniak at mcs.anl.gov Thu Feb 10 10:19:17 2011
From: wozniak at mcs.anl.gov (Justin M Wozniak)
Date: Thu, 10 Feb 2011 10:19:17 -0600 (CST)
Subject: [Swift-devel] (very) rough draft of changes in swift version
In-Reply-To: <1297218244.20393.0.camel@blabla2.none>
References: <1297218244.20393.0.camel@blabla2.none>
Message-ID:

Looks good! Sorry I missed the call...

On Tue, 8 Feb 2011, Mihael Hategan wrote:

> Wow. Nice. I lost track of all the stuff that went in there.
>
> On Tue, 2011-02-08 at 17:02 -0800, Sarah Kenny wrote:
>> hey all, i started this page as a draft of the changes for the new
>> swift version (with the intention of migrating to the web page once we
>> make the new release available). it's a first pass and doesn't include
>> cog changes at the moment.
this is derived from the svn log...feel
>> free to make changes, as i'm sure i missed things that should be there
>> (or added superfluous items)...as i said, very rough :)
>>
>> http://www.ci.uchicago.edu/wiki/bin/view/SWFT/ChangeLog
>>
>> ~sk
>> _______________________________________________
>> Swift-devel mailing list
>> Swift-devel at ci.uchicago.edu
>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

--
Justin M Wozniak

From wilde at mcs.anl.gov Thu Feb 10 10:19:27 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 Feb 2011 10:19:27 -0600 (CST)
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To: <967622245.40497.1297353512839.JavaMail.root@zimbra.anl.gov>
Message-ID: <599762557.40635.1297354767679.JavaMail.root@zimbra.anl.gov>

Based on discussions in the last 2 weekly meetings, I want to propose this for 0.92, based on Justin's page http://www.ci.uchicago.edu/wiki/bin/view/SWFT/CoastersCookbook under "Script snippets".

I'm proposing here a few name changes and specific file location conventions, but basically staying in the spirit of Justin's scripts.

gensites.sh -> etc/gensites
settings.sh -> merged into a local swift.properties file

The swift command "sites" will generate a sites.xml file in the current directory based on the user's "site settings" (obtained from swift.properties) and one or more template files that are provided in the Swift release and which can be augmented with the user's own template collection(s).

User selects a site template from $SWIFT/etc/sites

gensites templates >sites.xml

gensites -p sites.properties templates >sites.xml

gensites -L template.dir templates >sites.xml

The user typically runs gensites only once per run directory

Templates are searched for in the following locations:

- current directory
- etc/sites in the Swift directory
- $HOME/.swift/sites in the user's home directory
- any -L directories specified on the gensites command line

The user's sites-related properties are obtained from:

- sites.properties in the current directory
- the -p file specified on the gensites command line
- $HOME/.swift/sites.properties
- $HOME/.swift/swift.properties lines starting with #site

Obviously the fine details of this need to be nailed down; there's some room for discussion for how simple and/or flexible to make this. As long as the search and editing rules are clearly stated, I think users will be well served and happy.

Each template should have comments describing its use, nature, and user-changeable settings and defaults.

The command should flag required parameters that were not specified in the user's settings. The generated sites.xml file could contain comments stating where the options and templates came from.

I'm putting this text in:

http://www.ci.uchicago.edu/wiki/bin/view/SWFT/GenSites

and we can document specific site file templates in:

http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SiteFiles

- Mike

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Thu Feb 10 15:43:08 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 10 Feb 2011 15:43:08 -0600 (CST)
Subject: [Swift-devel] Fix needed for ppn for non-coaster pbs provider
Message-ID: <124038108.43151.1297374188489.JavaMail.root@zimbra.anl.gov>

The recent ppn changes need a small change to work in the case of the PBS provider running without coasters.
This causes it to put this in the .submit file:

#PBS -l ppn=8

...which PBS rejects. The line needs to be:

#PBS -l nodes=1:ppn=8

(as alluded to in the comments in PBSExecutor.java)

Fixing it as above when count is not specified seems to work on PADS. svn diff is below.

I did not commit this. Should I? To trunk, 0.92 branch, or both?

- Mike

login1$ cd /home/wilde/swift/src/0.92/cog/modules/provider-localscheduler/
login1$ svn diff
Index: src/org/globus/cog/abstraction/impl/scheduler/pbs/PBSExecutor.java
===================================================================
--- src/org/globus/cog/abstraction/impl/scheduler/pbs/PBSExecutor.java (revision 3046)
+++ src/org/globus/cog/abstraction/impl/scheduler/pbs/PBSExecutor.java (working copy)
@@ -68,7 +68,7 @@
 // 1. assuming count=1 when count is missing
 // 2. not specifying PPN when count is missing
 // ... are any better
- wr.write("#PBS -l ppn=" + ppn + "\n");
+ wr.write("#PBS -l nodes=1:ppn=" + ppn + "\n");
 }
 }
login1$

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Fri Feb 11 10:49:14 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 11 Feb 2011 10:49:14 -0600 (CST)
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To: <599762557.40635.1297354767679.JavaMail.root@zimbra.anl.gov>
Message-ID: <1967949324.45848.1297442954624.JavaMail.root@zimbra.anl.gov>

Sarah, David,

While there has been no response on the email list on this, Justin tells me that he's OK with it. He won't be able to work on it due to other priorities, though.

Can you both collaborate to implement it? It's very similar to what you did with swiftconfig, David.

Let's first resolve: should it be in shell, python, or perl? (I favor these in the order listed)

I think we need this asap. Can the two of you work together to make it happen?
Let's use this email thread and the wiki page to work out the details, on swift-devel.

The main steps are:

- create the initial command
- create and test a library of templates (drawing the templates from the test suite?)
- create the doc page content (wiki) for end users: http://www.ci.uchicago.edu/wiki/bin/view/SWFT/GenSites
  -- should describe the command(s) and conventions
  -- should describe the templates, perhaps with comments in each template
  -- gensites -T # lists all templates
  -- gensites -h # gives basic help
  -- gensites -h template # gives the help for specific template(s)
- add any support for genapps? (gentc.sh)

Do you want to do a voice call or txt chat to get this going, or should we do it all in email?

- Mike

----- Original Message -----
> Based on discussions in the last 2 weekly meetings, I want to propose
> this for 0.92, based on Justin's page
> http://www.ci.uchicago.edu/wiki/bin/view/SWFT/CoastersCookbook under
> "Script snippets".
>
> I'm proposing here a few name changes, specific file location
> conventions, but basically staying in the spirit of Justin's scripts.
>
> gensites.sh -> etc/gensites
> settings.sh -> merged into a local swift.properties file
>
> The swift command "sites" will generate a sites.xml file in the
> current directory based on the user's "site settings" (obtained from
> swift.properties) and one or more template files that are provided in
> the Swift release and which can be augmented with the user's own
> template collection(s).
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Fri Feb 11 17:03:43 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Fri, 11 Feb 2011 15:03:43 -0800
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To: <1967949324.45848.1297442954624.JavaMail.root@zimbra.anl.gov>
References: <599762557.40635.1297354767679.JavaMail.root@zimbra.anl.gov> <1967949324.45848.1297442954624.JavaMail.root@zimbra.anl.gov>
Message-ID:

i can do this most quickly in python if that's ok with others...david, could you work on gathering the (working) templates from the testing suite into a directory under etc/ (and commit to svn) where i can pull them from?
From wilde at mcs.anl.gov Fri Feb 11 20:19:59 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 11 Feb 2011 20:19:59 -0600 (CST)
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To:
Message-ID: <1846104324.48505.1297477199375.JavaMail.root@zimbra.anl.gov>

Great, thanks, Sarah.

How about using etc/sites (which exists) for the templates. No sub directories for now (I think we can move all the current subdirs to OLD/ )

- Mike
--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From dk0966 at cs.ship.edu Fri Feb 11 22:51:19 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Fri, 11 Feb 2011 23:51:19 -0500
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To: <1967949324.45848.1297442954624.JavaMail.root@zimbra.anl.gov>
References: <599762557.40635.1297354767679.JavaMail.root@zimbra.anl.gov> <1967949324.45848.1297442954624.JavaMail.root@zimbra.anl.gov>
Message-ID:

Would it be easier to modify the existing swiftconfig to adjust to the new format of templates?
The main steps that were outlined are nearly already completed with swiftconfig:

- The commands are already there
- A set of templates already exists, but would most likely be replaced with the ones verified by automated testing in the format Justin specified
- A good start for documentation using swiftconfig on a variety of configurations is at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/LearningSwift
- Documentation for commands and syntax is there, swiftconfig -h and swiftrun -h
- List all templates with swiftconfig -list templates (already knows the correct order of where to look for templates)
- Help for specific templates is a good idea. That would be pretty straightforward to add
- Support for applications and application groups is already there

If we started over it seems like we would be duplicating a lot of code.

From aespinosa at cs.uchicago.edu Sat Feb 12 15:59:11 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Sat, 12 Feb 2011 15:59:11 -0600
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID:

Moving thread to devel for brainstorming possible solutions:

1. implement a join() function: @join(array, ", ");
2. fix array_mapper

2011/2/12 Allan Espinosa :
> For an array output data structure, the two mappers behave differently
>
> type file;
>
> app(file o[])
>    split(file i){
>  split "-l" 1 @filename(i) "seqout.";
> }
>
> /*file out[] "seqout.ac",  // Does not work
>              "seqout.ad"]>;*/
> file out[] ; // Works
>
> file input <"seq.in">;
> out = split(input);
>

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From hategan at mcs.anl.gov Sat Feb 12 16:02:54 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 12 Feb 2011 14:02:54 -0800
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID: <1297548174.13877.0.camel@blabla2.none>

Somebody remind me why we have those both since they seem to be intended to do the same thing.

On Sat, 2011-02-12 at 15:59 -0600, Allan Espinosa wrote:
> Moving thread to devel for brainstorming possible solutions:
>
> 1. implement a join() function: @join(array, ", ");
> 2. fix array_mapper

From benc at hawaga.org.uk Sat Feb 12 16:40:07 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 12 Feb 2011 22:40:07 +0000 (GMT)
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To: <1297548174.13877.0.camel@blabla2.none>
References: <1297548174.13877.0.camel@blabla2.none>
Message-ID:

> Somebody remind me why we have those both since they seem to be intended
> to do the same thing.

Array mapper is bug 27, r750 and r764.

I think maybe at the time there were no array literals in arbitrary locations? (I think there didn't used to be early on...?)

Which would make it not quite a superset of the fixed_array_mapper at the time it was introduced, though it looks like it is now (except apparently it's broken, so it isn't...)
--

From hategan at mcs.anl.gov Sat Feb 12 16:52:40 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 12 Feb 2011 14:52:40 -0800
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To:
References: <1297548174.13877.0.camel@blabla2.none>
Message-ID: <1297551160.14139.4.camel@blabla2.none>

On Sat, 2011-02-12 at 22:40 +0000, Ben Clifford wrote:
> > Somebody remind me why we have those both since they seem to be intended
> > to do the same thing.
>
> Array mapper is bug 27, r750 and r764.
>
> I think maybe at the time there were no array literals in arbitrary
> locations? (I think there didn't used to be early on...?)
>
> Which would make it not quite a superset of the fixed_array_mapper at the
> time it was introduced, though it looks like it is now (except apparently
> its broken, so it isn't...)
>

Ben, your emails are usually epitomes of clarity. This one, not so much.

I know that we used the fixed array mapper to allow returning arrays from an app. Its isStatic() returns true, whereas array_mapper's isStatic() returns false. So perhaps it's a matter of making array_mapper "static".

From benc at hawaga.org.uk Sat Feb 12 17:10:16 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sat, 12 Feb 2011 23:10:16 +0000 (GMT)
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To: <1297551160.14139.4.camel@blabla2.none>
References: <1297548174.13877.0.camel@blabla2.none> <1297551160.14139.4.camel@blabla2.none>
Message-ID:

> I know that we used the fixed array mapper to allow returning arrays
> from an app. It's isStatic() returns true, whereas array_mapper's
> isStatic() returns false. So perhaps it's a matter of making
> array_mapper "static".

Perhaps. The array specifying the mappings is always known beforehand so I don't think that should break anything conceptually.
To elaborate on my previous comment in other email:

I think the example Allan gave, specifying a literal array as a parameter to the mapper, did not used to work - I think you could not specify such array literals arbitrarily. You could only specify them in an assignment statement. They were a specialised assignment syntax, rather than a simple "lvalue = rvalue".

At some point, I think I made array literals possible as first order values anywhere you could put an expression. That change meant (amongst other things) that the example that Allan gave became valid.

So when array_mapper was introduced, before real array literals, it was complementary to the fixed_array_mapper. The latter you could pass a literal string to, whilst the former you had to name an array that already existed somehow - you could not specify a literal array value in the mapper parameter. You could define "myarray" somewhere else and give it some elements, and then specify files=myarray, but you could not specify myarray=["one","two"].

The introduction of array literals made array_mapper able to take literal mappings in the mapping expression without needing to separately declare an array variable and populate it - at that point, fixed_array_mapper became pretty much redundant.

--

From hategan at mcs.anl.gov Sat Feb 12 17:31:04 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 12 Feb 2011 15:31:04 -0800
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To:
References: <1297548174.13877.0.camel@blabla2.none> <1297551160.14139.4.camel@blabla2.none>
Message-ID: <1297553464.14386.8.camel@blabla2.none>

fixed_array_mapper, through the syntax it uses for files= guarantees that the number of elements in the array, and their names, are known when an app is ready to run. Can the same be said about array_mapper?
I believe that it would not make sense to allow an app returning an array mapped with array_mapper to run before the respective files= array parameter to the mapper, as well as all its elements, are closed.

While that may work in theory, I'm afraid that the way things are implemented right now does not blend with that idea very well. The only place where isStatic() is used is in RootDataNode.checkInputs().

I believe that the array_mapper would work in Allan's scenario if, somehow, the closing of the files= parameter would also cause the mapped array to close.

On Sat, 2011-02-12 at 23:10 +0000, Ben Clifford wrote:
> I think the example allan gave, specifying a literal array as a parameter
> to the mapper, did not used to work - I think you could not specify such
> array literals arbitrarily. You could only specify them in an assignment
> statement. They were a specialised assignment syntax, rather than a simple
> "lvalue = rvalue".
>
> At some point, I think I made array literals possible as first order
> values anywhere you could put an expression. That change meant (amongst
> other things) that the example that Allan gave became valid.
>
> So when array_mapper was introduced, before real array literals, it was
> complementary to the fixed_array_mapper. The latter you could pass a
> literal string to, whilst the former you had to name an array that already
> existed somehow - you could not specify a literal array value in the
> mapper parameter. You could define "myarray" somewhere else and give it
> some elements, and then specify files=myarray, but you could not specify
> myarray=["one","two"].
>
> The introduction of array literals made array_mapper able to take literal
> mappings in the mapping expression without needing to separately declare
> an array variable and populate it - at that point, fixed_array_mapper
> became pretty much redundant.
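
The closing rule proposed in this message (an app writing into an array_mapper-mapped array should not run until the files= array is closed, and closing files= should close the mapped array) can be illustrated with a toy Python model. This is only a sketch of the idea under discussion: the class and function names are hypothetical and do not correspond to Swift's actual mapper or RootDataNode classes.

```python
class FutureArray:
    """Toy write-once array: filled incrementally, then closed."""

    def __init__(self):
        self.items = []
        self.closed = False
        self._on_close = []

    def append(self, value):
        if self.closed:
            raise RuntimeError("array is already closed")
        self.items.append(value)

    def on_close(self, callback):
        # Run the callback immediately if already closed, else remember it.
        if self.closed:
            callback(self)
        else:
            self._on_close.append(callback)

    def close(self):
        if self.closed:
            return
        self.closed = True
        for callback in self._on_close:
            callback(self)


class MappedArray:
    """Toy output array whose file names come from a `files` FutureArray."""

    def __init__(self, files):
        self.names = None
        self.closed = False
        # Closing `files` closes this mapped array as well.
        files.on_close(self._files_closed)

    def _files_closed(self, files):
        self.names = list(files.items)
        self.closed = True


def can_run_app(output):
    """An app may start only once its mapped output array is closed."""
    return output.closed
```

In this model the number and names of the output files are fixed at the moment files= closes, which is the same guarantee fixed_array_mapper gives up front through its literal string syntax.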
>

From benc at hawaga.org.uk Sat Feb 12 18:17:09 2011
From: benc at hawaga.org.uk (Ben Clifford)
Date: Sun, 13 Feb 2011 00:17:09 +0000 (GMT)
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To: <1297553464.14386.8.camel@blabla2.none>
References: <1297548174.13877.0.camel@blabla2.none> <1297551160.14139.4.camel@blabla2.none> <1297553464.14386.8.camel@blabla2.none>
Message-ID:

> I believe that it would not
> make sense to allow an app returning an array mapped with array_mapper
> to run before the respective files= array parameter to the mapper, as
> well as all its elements, are closed.

I agree with that. It's more general than what you describe here, though:

> I believe that the array_mapper would work in Allan's scenario if,
> somehow, the closing of the files= parameter would also cause the mapped
> array to close.

--

From hategan at mcs.anl.gov Sun Feb 13 21:31:08 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 13 Feb 2011 19:31:08 -0800
Subject: [Swift-devel] Swift trunk broken?
In-Reply-To: <667749620.29971.1297130438445.JavaMail.root@zimbra.anl.gov>
References: <667749620.29971.1297130438445.JavaMail.root@zimbra.anl.gov>
Message-ID: <1297654268.30292.5.camel@blabla2.none>

Fixed in cog r3051. It would however be useful to see the swift script that caused this.

On Mon, 2011-02-07 at 20:00 -0600, Michael Wilde wrote:
> Tim, I saw the same problem. The similar problem I reported occurred
> when I tried a simpler script to test basic sanity. I thought they
> were related but apparently not.
>
> - Mike
>
> ______________________________________________________________________
> P.S. I am doing a clean build from the latest svn
> versions of swift and cog
>
> On Mon, Feb 7, 2011 at 4:33 PM, Tim Armstrong
> wrote:
> I've run into a different problem...
> > Swift fails with the following exception: > > Execution failed: > java.lang.NullPointerException > at > org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) > at > org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) > at > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) > at > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) > at > org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) > > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > at > org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at java.util.concurrent.Executors > $RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask > $Sync.innerRun(FutureTask.java:303) > at > java.util.concurrent.FutureTask.run(FutureTask.java:138) > at java.util.concurrent.ThreadPoolExecutor > 
$Worker.runTask(ThreadPoolExecutor.java:886) > at java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:619) > > > Time: 0.535, rate: 30624 j/s > > Nothing really appears in the logs: > 2011-02-07 16:32:22,668-0600 DEBUG Loader Loader > started > > > - Tim > > > > > On Mon, Feb 7, 2011 at 12:54 PM, Mihael Hategan > wrote: > Should be fixed now. > > > On Mon, 2011-02-07 at 10:03 -0800, Mihael > Hategan wrote: > > Ooops. I didn't get any emails over the > weekend (well, it seems I did, > > but my otherwise reliable email notification > didn't work). So I'm a bit > > behind. > > > > Mihael > > > > On Fri, 2011-02-04 at 14:44 -0600, Michael > Wilde wrote: > > > Im getting strange errors from trunk at > the moment from previously working scripts. > Seemed to be unable to parse sites.xml. > > > > > > In the process of debugging this, I find > that I cant get the simplest of swift scripts > (a single trace statement) to run. > > > > > > Is anyone else encountering similar > problems? 
> > > > > > Here's what I get: > > > > > > com$ cat hi.swift > > > trace("hi"); > > > com$ java -version > > > java version "1.6.0_20" > > > Java(TM) SE Runtime Environment (build > 1.6.0_20-b02) > > > Java HotSpot(TM) 64-Bit Server VM (build > 16.3-b01, mixed mode) > > > com$ swift -version > > > Swift svn swift-r4061 cog-r3046 > > > > > > com$ swift hi.swift > > > Swift svn swift-r4061 cog-r3046 > > > > > > RunID: 20110204-1439-j430gp9g > > > Progress: time:0 > > > Execution failed: > > > 1 names specified; 0 arguments > found > > > Time: 1.179, rate: 13896 j/s > > > com$ cat hi.xml > > > xmlns="http://ci.uchicago.edu/swift/2009/02/swiftscript" > > > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > > > > xmlns:xs="http://www.w3.org/2001/XMLSchema"> > > > > > > > hi > > > > > > > > > com$ > > > > > > com$ cat *9g.log > > > 2011-02-04 14:39:29,205-0600 DEBUG Loader > Max heap: 238616576 > > > 2011-02-04 14:39:29,206-0600 DEBUG Loader > kmlversion is > >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > > 2011-02-04 14:39:29,207-0600 DEBUG Loader > build version is > >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > > 2011-02-04 14:39:29,207-0600 DEBUG Loader > Recompilation suppressed. > > > 2011-02-04 14:39:29,343-0600 INFO > VDL2ExecutionContext Stack dump: > > > Level 1 > > > [iA = 0, iB = 0, bA = false, bB = false] > > > vdl:instanceconfig = Swift > configuration [] > > > vdl:operation = run > > > swift.home > = /home/wilde/swift/rev/trunk/bin/.. 
> > > PATH_SEPARATOR = / > > > > > > > > > 2011-02-04 14:39:29,900-0600 INFO unknown > Using sites > file: /home/wilde/swift/rev/trunk/bin/../etc/sites.xml > > > 2011-02-04 14:39:29,928-0600 INFO unknown > Using > tc.data: /home/wilde/swift/rev/trunk/bin/../etc/tc.data > > > 2011-02-04 14:39:30,023-0600 INFO > AbstractScheduler Setting resources to: > {localhost=localhost} > > > 2011-02-04 14:39:30,468-0600 INFO unknown > Swift svn swift-r4061 cog-r3046 > > > > > > 2011-02-04 14:39:30,469-0600 INFO unknown > RUNID id=run:20110204-1439-j430gp9g > > > 2011-02-04 14:39:30,511-0600 DEBUG > VDL2ExecutionContext 1 names specified; 0 > arguments found > > > 1 names specified; 0 arguments found > > > > > > at > org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > at java.util.concurrent.Executors > 
$RunnableAdapter.call(Executors.java:441) > > > at java.util.concurrent.FutureTask > $Sync.innerRun(FutureTask.java:303) > > > at > java.util.concurrent.FutureTask.run(FutureTask.java:138) > > > at > java.util.concurrent.ThreadPoolExecutor > $Worker.runTask(ThreadPoolExecutor.java:886) > > > at > java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:908) > > > at > java.lang.Thread.run(Thread.java:619) > > > 2011-02-04 14:39:30,522-0600 INFO > ExecutionContext Detailed exception: > > > 1 names specified; 0 arguments found > > > > > > at > org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > at > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > at java.util.concurrent.Executors > $RunnableAdapter.call(Executors.java:441) > > > at java.util.concurrent.FutureTask > $Sync.innerRun(FutureTask.java:303) > > > at > 
java.util.concurrent.FutureTask.run(FutureTask.java:138) > > > at > java.util.concurrent.ThreadPoolExecutor > $Worker.runTask(ThreadPoolExecutor.java:886) > > > at > java.util.concurrent.ThreadPoolExecutor > $Worker.run(ThreadPoolExecutor.java:908) > > > at > java.lang.Thread.run(Thread.java:619) > > > 2011-02-04 14:39:30,522-0600 INFO Loader > Swift finished with errors > > > com$ > > > > > > > > > > > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > From wilde at mcs.anl.gov Sun Feb 13 22:04:57 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 13 Feb 2011 22:04:57 -0600 (CST) Subject: [Swift-devel] Swift trunk broken? In-Reply-To: <1297654268.30292.5.camel@blabla2.none> Message-ID: <602491213.50794.1297656297525.JavaMail.root@zimbra.anl.gov> Cool - will test. Here's an example of the failure. 
Tiny swift script, but a lengthy (OSG) sites.xml: login1$ swift -config swift.properties -sites.file coaster_osg.xml tsleep.swift Execution failed: java.lang.NullPointerException at org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) at org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) at org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) at org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) at org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) at org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) at org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) at java.util.concurrent.FutureTask.run(FutureTask.java:138) at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) at java.lang.Thread.run(Thread.java:662) Time: 0.734, rate: 22321 j/s login1$ cat tsleep.swift app sleep(string time) { sleep time; } /* Main program */ string t = "1.1"; foreach ai,i in [0:99] { sleep(t); } login1$ cat coaster_osg.xml passive 200.0 10.92 /opt/osg/data/engage/mp01/swift_scratch passive 200.0 45.0 /opt/osg/data/engage/mp01/swift_scratch passive 200.0 86.9 /afs/hep.wisc.edu/osg/data/engage/mp01/swift_scratch passive 200.0 68.46 /uscms_grid/data/engage/mp01/swift_scratch passive 200.0 9999.97 /osg/data/engage/mp01/swift_scratch passive 200.0 27.58 /osg/data/engage/mp01/swift_scratch passive 200.0 50.35 /usatlas/prodjob/share/engage-mp01/swift_scratch passive 200.0 31.7 /osg/data/engage/mp01/swift_scratch passive 200.0 0.22 /osg/storage/data/engage/mp01/swift_scratch passive 200.0 1.42 /osgremote/osg_data/engage/mp01/swift_scratch passive 200.0 3.18 /opt/pfgriddata/engage/mp01/swift_scratch passive 200.0 3.09 /osg/data/engage/mp01/swift_scratch passive 200.0 9999.97 /scratch/osg/engage/mp01/swift_scratch passive 200.0 2.19 /raid2/osg-data/engage/mp01/swift_scratch passive 200.0 2.17 /raid2/osg-data/engage/mp01/swift_scratch passive 200.0 4.07 /opt/osg/data/engage/mp01/swift_scratch passive 200.0 50.05 /usatlas/prodjob/share/engage-mp01/swift_scratch passive 200.0 49.69 /usatlas/prodjob/share/engage-mp01/swift_scratch passive 200.0 9999.97 /lustre/pg/data/engage/mp01/swift_scratch passive 200.0 2.73 /nfs/osg-data/engage/mp01/swift_scratch ----- Original Message ----- > Fixed in cog r3051. > > It would however be useful to see the swift script that caused this. > > On Mon, 2011-02-07 at 20:00 -0600, Michael Wilde wrote: > > Tim, I saw the same problem. The similar problem I reported occurred > > when I tried a simpler script to test basic sanity. I thought they > > were related but apparently not. 
> > > > > > - Mike > > > > > > > > ______________________________________________________________________ > > P.S. I am doing a clean build from the the latest svn > > versions of swift and cog > > > > On Mon, Feb 7, 2011 at 4:33 PM, Tim Armstrong > > wrote: > > I've run into a different problem... > > > > Swift fails with the following exception: > > > > Execution failed: > > java.lang.NullPointerException > > at > > org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) > > at > > org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) > > at > > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) > > at > > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) > > at > > org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) > > > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > > at > > org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at > > 
org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:441) > > at java.util.concurrent.FutureTask > > $Sync.innerRun(FutureTask.java:303) > > at > > java.util.concurrent.FutureTask.run(FutureTask.java:138) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.runTask(ThreadPoolExecutor.java:886) > > at java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:908) > > at java.lang.Thread.run(Thread.java:619) > > > > > > Time: 0.535, rate: 30624 j/s > > > > Nothing really appears in the logs: > > 2011-02-07 16:32:22,668-0600 DEBUG Loader Loader > > started > > > > > > - Tim > > > > > > > > > > On Mon, Feb 7, 2011 at 12:54 PM, Mihael Hategan > > wrote: > > Should be fixed now. > > > > > > On Mon, 2011-02-07 at 10:03 -0800, Mihael > > Hategan wrote: > > > Ooops. I didn't get any emails over the > > weekend (well, it seems I did, > > > but my otherwise reliable email > > > notification > > didn't work). So I'm a bit > > > behind. > > > > > > Mihael > > > > > > On Fri, 2011-02-04 at 14:44 -0600, Michael > > Wilde wrote: > > > > Im getting strange errors from trunk at > > the moment from previously working scripts. > > Seemed to be unable to parse sites.xml. > > > > > > > > In the process of debugging this, I find > > that I cant get the simplest of swift > > scripts > > (a single trace statement) to run. > > > > > > > > Is anyone else encountering similar > > problems? 
> > > > > > > > Here's what I get: > > > > > > > > com$ cat hi.swift > > > > trace("hi"); > > > > com$ java -version > > > > java version "1.6.0_20" > > > > Java(TM) SE Runtime Environment (build > > 1.6.0_20-b02) > > > > Java HotSpot(TM) 64-Bit Server VM (build > > 16.3-b01, mixed mode) > > > > com$ swift -version > > > > Swift svn swift-r4061 cog-r3046 > > > > > > > > com$ swift hi.swift > > > > Swift svn swift-r4061 cog-r3046 > > > > > > > > RunID: 20110204-1439-j430gp9g > > > > Progress: time:0 > > > > Execution failed: > > > > 1 names specified; 0 arguments > > found > > > > Time: 1.179, rate: 13896 j/s > > > > com$ cat hi.xml > > > > > xmlns="http://ci.uchicago.edu/swift/2009/02/swiftscript" > > > > > > xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" > > > > > > xmlns:xs="http://www.w3.org/2001/XMLSchema"> > > > > > > > > > > hi > > > > > > > > > > > > com$ > > > > > > > > com$ cat *9g.log > > > > 2011-02-04 14:39:29,205-0600 DEBUG > > > > Loader > > Max heap: 238616576 > > > > 2011-02-04 14:39:29,206-0600 DEBUG > > > > Loader > > kmlversion is > > >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > > > 2011-02-04 14:39:29,207-0600 DEBUG > > > > Loader > > build version is > > >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > > > 2011-02-04 14:39:29,207-0600 DEBUG > > > > Loader > > Recompilation suppressed. > > > > 2011-02-04 14:39:29,343-0600 INFO > > VDL2ExecutionContext Stack dump: > > > > Level 1 > > > > [iA = 0, iB = 0, bA = false, bB = false] > > > > vdl:instanceconfig = Swift > > configuration [] > > > > vdl:operation = run > > > > swift.home > > = /home/wilde/swift/rev/trunk/bin/.. 
> > > > PATH_SEPARATOR = / > > > > > > > > > > > > 2011-02-04 14:39:29,900-0600 INFO > > > > unknown > > Using sites > > file: > > /home/wilde/swift/rev/trunk/bin/../etc/sites.xml > > > > 2011-02-04 14:39:29,928-0600 INFO > > > > unknown > > Using > > tc.data: > > /home/wilde/swift/rev/trunk/bin/../etc/tc.data > > > > 2011-02-04 14:39:30,023-0600 INFO > > AbstractScheduler Setting resources to: > > {localhost=localhost} > > > > 2011-02-04 14:39:30,468-0600 INFO > > > > unknown > > Swift svn swift-r4061 cog-r3046 > > > > > > > > 2011-02-04 14:39:30,469-0600 INFO > > > > unknown > > RUNID id=run:20110204-1439-j430gp9g > > > > 2011-02-04 14:39:30,511-0600 DEBUG > > VDL2ExecutionContext 1 names specified; 0 > > arguments found > > > > 1 names specified; 0 arguments found > > > > > > > > at > > org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > > at > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > > at > > 
org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > > at > > > > java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:441) > > > > at > > > > java.util.concurrent.FutureTask > > $Sync.innerRun(FutureTask.java:303) > > > > at > > java.util.concurrent.FutureTask.run(FutureTask.java:138) > > > > at > > java.util.concurrent.ThreadPoolExecutor > > $Worker.runTask(ThreadPoolExecutor.java:886) > > > > at > > java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:908) > > > > at > > java.lang.Thread.run(Thread.java:619) > > > > 2011-02-04 14:39:30,522-0600 INFO > > ExecutionContext Detailed exception: > > > > 1 names specified; 0 arguments found > > > > > > > > at > > org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > > > at > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > > at > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > > at > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > > at > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > > at > > 
org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > > at > > > > java.util.concurrent.Executors > > $RunnableAdapter.call(Executors.java:441) > > > > at > > > > java.util.concurrent.FutureTask > > $Sync.innerRun(FutureTask.java:303) > > > > at > > java.util.concurrent.FutureTask.run(FutureTask.java:138) > > > > at > > java.util.concurrent.ThreadPoolExecutor > > $Worker.runTask(ThreadPoolExecutor.java:886) > > > > at > > java.util.concurrent.ThreadPoolExecutor > > $Worker.run(ThreadPoolExecutor.java:908) > > > > at > > java.lang.Thread.run(Thread.java:619) > > > > 2011-02-04 14:39:30,522-0600 INFO Loader > > Swift finished with errors > > > > com$ > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago > > Mathematics and Computer Science Division > > Argonne National Laboratory > > > > -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Sun Feb 13 22:25:34 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sun, 13 Feb 2011 22:25:34 -0600 (CST) Subject: [Swift-devel] Swift trunk broken? In-Reply-To: <602491213.50794.1297656297525.JavaMail.root@zimbra.anl.gov> Message-ID: <54574774.50810.1297657534675.JavaMail.root@zimbra.anl.gov> Great - seems to work with the latest trunk rev. Same sites file but with localhost added - sleep app ran on localhost only. Tim, can you see if trunk now supports SwiftR? Thanks! 
- Mike login1$ swift -config swift.properties -tc.file tc -sites.file tsites.xml tsleep.swift Swift svn swift-r4087 cog-r3051 RunID: 20110213-2222-g1t8t4xg Progress: time:0 Progress: time:1006 Submitting:21 Submitted:4 Active:75 Progress: time:2641 Active:99 Checking status:1 Progress: time:3656 Active:31 Checking status:1 Finished successfully:68 Final status: time:3768 Finished successfully:100 Time: 5.009, rate: 3270 j/s login1$ ----- Original Message ----- > Cool - will test. > > Here's an example of the failure. Tiny swift script, but a lengthy > (OSG) sites.xml: > > login1$ swift -config swift.properties -sites.file coaster_osg.xml > tsleep.swift > Execution failed: > java.lang.NullPointerException > at org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) > at org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) > at > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) > at > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) > at > org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) > at > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > at > org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > at > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > at > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > at > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > at > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > at > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > at > 
org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > at > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > at > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > at > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > > Time: 0.734, rate: 22321 j/s > login1$ cat tsleep.swift > app sleep(string time) { > sleep time; > } > > > /* Main program */ > > string t = "1.1"; > > foreach ai,i in [0:99] { > sleep(t); > } > login1$ cat coaster_osg.xml > > > > > > url="https://communicado.ci.uchicago.edu:62100" > jobmanager="local:local" /> > > passive > > 200.0 > 10.92 > > > /opt/osg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62101" > jobmanager="local:local" /> > > passive > > 200.0 > 45.0 > > > /opt/osg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62102" > jobmanager="local:local" /> > > passive > > 200.0 > 86.9 > > > /afs/hep.wisc.edu/osg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62103" > jobmanager="local:local" /> > > passive > > 200.0 > 68.46 > > > /uscms_grid/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62104" > jobmanager="local:local" /> > > passive > > 200.0 > 9999.97 > > > /osg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62105" > jobmanager="local:local" /> > > passive > > 200.0 > 27.58 > > > /osg/data/engage/mp01/swift_scratch > > > > 
> url="https://communicado.ci.uchicago.edu:62106" > jobmanager="local:local" /> > > passive > > 200.0 > 50.35 > > > /usatlas/prodjob/share/engage-mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62107" > jobmanager="local:local" /> > > passive > > 200.0 > 31.7 > > > /osg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62108" > jobmanager="local:local" /> > > passive > > 200.0 > 0.22 > > > /osg/storage/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62109" > jobmanager="local:local" /> > > passive > > 200.0 > 1.42 > > > /osgremote/osg_data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62110" > jobmanager="local:local" /> > > passive > > 200.0 > 3.18 > > > /opt/pfgriddata/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62111" > jobmanager="local:local" /> > > passive > > 200.0 > 3.09 > > > /osg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62112" > jobmanager="local:local" /> > > passive > > 200.0 > 9999.97 > > > /scratch/osg/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62113" > jobmanager="local:local" /> > > passive > > 200.0 > 2.19 > > > /raid2/osg-data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62114" > jobmanager="local:local" /> > > passive > > 200.0 > 2.17 > > > /raid2/osg-data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62115" > jobmanager="local:local" /> > > passive > > 200.0 > 4.07 > > > /opt/osg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62116" > jobmanager="local:local" /> > > passive > > 200.0 > 50.05 > > > /usatlas/prodjob/share/engage-mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62117" > jobmanager="local:local" /> > > passive > > 200.0 > 49.69 > > > /usatlas/prodjob/share/engage-mp01/swift_scratch > > > > > 
url="https://communicado.ci.uchicago.edu:62118" > jobmanager="local:local" /> > > passive > > 200.0 > 9999.97 > > > /lustre/pg/data/engage/mp01/swift_scratch > > > > > url="https://communicado.ci.uchicago.edu:62119" > jobmanager="local:local" /> > > passive > > 200.0 > 2.73 > > > /nfs/osg-data/engage/mp01/swift_scratch > > > > > > ----- Original Message ----- > > Fixed in cog r3051. > > > > It would however be useful to see the swift script that caused this. > > > > On Mon, 2011-02-07 at 20:00 -0600, Michael Wilde wrote: > > > Tim, I saw the same problem. The similar problem I reported > > > occurred > > > when I tried a simpler script to test basic sanity. I thought they > > > were related but apparently not. > > > > > > > > > - Mike > > > > > > > > > > > > ______________________________________________________________________ > > > P.S. I am doing a clean build from the the latest svn > > > versions of swift and cog > > > > > > On Mon, Feb 7, 2011 at 4:33 PM, Tim Armstrong > > > wrote: > > > I've run into a different problem... 
-- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From hategan at mcs.anl.gov Sun Feb 13
23:12:24 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sun, 13 Feb 2011 21:12:24 -0800
Subject: [Swift-devel] cluster mess table
Message-ID: <1297660344.31881.81.camel@blabla2.none>

Hi,

I started a table that Mike, Justin and I spoke about at
http://www.ci.uchicago.edu/wiki/bin/view/SWFT/NodeConfigurationTable

The purpose of it is to gather information about a wide range of clusters and see how we can come up with a set of parameters that are easy to use and that address the variety in cluster configurations.

It's somewhat empty right now, so please fill it in with clusters you know.

Mihael

From aespinosa at cs.uchicago.edu Mon Feb 14 02:11:43 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 14 Feb 2011 02:11:43 -0600
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID:

Making my question more concrete, how do you suggest I go about my workflow:

string seis_str[];
string peak_str[];

foreach var,i in vars {
  seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source,
      "_", rup.index, "_", i, ".grm");
  peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source,
      "_", rup.index, "_", i, ".bsa");
}

Seismogram seis[] ;
PeakValue peak[] ;

(seis, peak) = seispeak_agg(sub, vars, site, rup.size);

Should I hack around to force this into a fixed_array_mapper?

-Allan

2011/2/12 Allan Espinosa :
> Moving thread to devel for brainstorming possible solutions:
>
> 1. implement a join() function: @join(array, ", ");
> 2. fix array_mapper
>
> 2011/2/12 Allan Espinosa :
>> For an array output data structure, the two mappers behave differently
>>
>> type file;
>>
>> app(file o[])
>> split(file i){
>>   split "-l" 1 @filename(i) "seqout.";
>> }
>>
>> /*file out[] > "seqout.ac", // Does not work
>> "seqout.ad"]>;*/
>> file out[] ; // Works
>>
>> file input <"seq.in">;
>> out = split(input);

--
Allan M.
Espinosa
PhD student, Computer Science
University of Chicago

From tga at uchicago.edu Mon Feb 14 08:01:04 2011
From: tga at uchicago.edu (Tim Armstrong)
Date: Mon, 14 Feb 2011 08:01:04 -0600
Subject: [Swift-devel] Swift trunk broken?
In-Reply-To: <54574774.50810.1297657534675.JavaMail.root@zimbra.anl.gov>
References: <602491213.50794.1297656297525.JavaMail.root@zimbra.anl.gov> <54574774.50810.1297657534675.JavaMail.root@zimbra.anl.gov>
Message-ID:

Thanks Mihael,

So far it passes all of the SwiftR test suite - looks good :)

- Tim

On Sun, Feb 13, 2011 at 10:25 PM, Michael Wilde wrote:
> Great - seems to work with the latest trunk rev. Same sites file but with
> localhost added - sleep app ran on localhost only. Tim, can you see if trunk
> now supports SwiftR?
>
> Thanks!
>
> - Mike
>
> login1$ swift -config swift.properties -tc.file tc -sites.file tsites.xml tsleep.swift
> Swift svn swift-r4087 cog-r3051
>
> RunID: 20110213-2222-g1t8t4xg
> Progress: time:0
> Progress: time:1006 Submitting:21 Submitted:4 Active:75
> Progress: time:2641 Active:99 Checking status:1
> Progress: time:3656 Active:31 Checking status:1 Finished successfully:68
> Final status: time:3768 Finished successfully:100
> Time: 5.009, rate: 3270 j/s
> login1$
>
>
> ----- Original Message -----
> > Cool - will test.
> >
> > Here's an example of the failure.
Tiny swift script, but a lengthy > > (OSG) sites.xml: > > > > login1$ swift -config swift.properties -sites.file coaster_osg.xml > > tsleep.swift > > Execution failed: > > java.lang.NullPointerException > > at org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) > > at org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) > > at > > > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) > > at > > > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) > > at > > > org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) > > at > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > at > > > org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) > > at > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > at > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > at > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > at > > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > at > > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > at > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > at > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > at > > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > at > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > at > > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > at > > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > > at 
java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > at java.lang.Thread.run(Thread.java:662)
> >
> > Time: 0.734, rate: 22321 j/s
> > login1$ cat tsleep.swift
> > app sleep(string time) {
> >   sleep time;
> > }
> >
> > /* Main program */
> >
> > string t = "1.1";
> >
> > foreach ai,i in [0:99] {
> >   sleep(t);
> > }
> > login1$ cat coaster_osg.xml
> > [coaster_osg.xml: markup lost; surviving attributes and values, one pool entry per line]
> > url="https://communicado.ci.uchicago.edu:62100" jobmanager="local:local"  passive  200.0  10.92  /opt/osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62101" jobmanager="local:local"  passive  200.0  45.0  /opt/osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62102" jobmanager="local:local"  passive  200.0  86.9  /afs/hep.wisc.edu/osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62103" jobmanager="local:local"  passive  200.0  68.46  /uscms_grid/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62104" jobmanager="local:local"  passive  200.0  9999.97  /osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62105" jobmanager="local:local"  passive  200.0  27.58  /osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62106" jobmanager="local:local"  passive  200.0  50.35  /usatlas/prodjob/share/engage-mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62107" jobmanager="local:local"  passive  200.0  31.7  /osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62108" jobmanager="local:local"  passive  200.0  0.22  /osg/storage/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62109" jobmanager="local:local"  passive  200.0  1.42  /osgremote/osg_data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62110" jobmanager="local:local"  passive  200.0  3.18  /opt/pfgriddata/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62111" jobmanager="local:local"  passive  200.0  3.09  /osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62112" jobmanager="local:local"  passive  200.0  9999.97  /scratch/osg/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62113" jobmanager="local:local"  passive  200.0  2.19  /raid2/osg-data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62114" jobmanager="local:local"  passive  200.0  2.17  /raid2/osg-data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62115" jobmanager="local:local"  passive  200.0  4.07  /opt/osg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62116" jobmanager="local:local"  passive  200.0  50.05  /usatlas/prodjob/share/engage-mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62117" jobmanager="local:local"  passive  200.0  49.69  /usatlas/prodjob/share/engage-mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62118" jobmanager="local:local"  passive  200.0  9999.97  /lustre/pg/data/engage/mp01/swift_scratch
> > url="https://communicado.ci.uchicago.edu:62119" jobmanager="local:local"  passive  200.0  2.73  /nfs/osg-data/engage/mp01/swift_scratch
> >
> > ----- Original Message -----
> > > Fixed in cog r3051.
> > >
> > > It would however be useful to see the swift script that caused this.
> > >
> > > On Mon, 2011-02-07 at 20:00 -0600, Michael Wilde wrote:
> > > > Tim, I saw the same problem. The similar problem I reported occurred
> > > > when I tried a simpler script to test basic sanity. I thought they
> > > > were related but apparently not.
> > > >
> > > > - Mike
> > > >
> > > > ______________________________________________________________________
> > > > P.S. I am doing a clean build from the latest svn
> > > > versions of swift and cog
> > > >
> > > > On Mon, Feb 7, 2011 at 4:33 PM, Tim Armstrong wrote:
> > > > I've run into a different problem...
> > > > > > > > Swift fails with the following exception: > > > > > > > > Execution failed: > > > > java.lang.NullPointerException > > > > at > > > > > org.globus.cog.karajan.Optimizer.optimize0(Optimizer.java:36) > > > > at > > > > > org.globus.cog.karajan.Optimizer.optimize(Optimizer.java:28) > > > > at > > > > > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:207) > > > > at > > > > > org.globus.cog.karajan.util.serialization.XMLConverter.read(XMLConverter.java:192) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.ExecuteFile.post(ExecuteFile.java:128) > > > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.ExecuteFile.completed(ExecuteFile.java:155) > > > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > > at > > > > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > > at > > > > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > > at > > > > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > > 
at java.util.concurrent.Executors > > > > $RunnableAdapter.call(Executors.java:441) > > > > at java.util.concurrent.FutureTask > > > > $Sync.innerRun(FutureTask.java:303) > > > > at > > > > > java.util.concurrent.FutureTask.run(FutureTask.java:138) > > > > at java.util.concurrent.ThreadPoolExecutor > > > > $Worker.runTask(ThreadPoolExecutor.java:886) > > > > at java.util.concurrent.ThreadPoolExecutor > > > > $Worker.run(ThreadPoolExecutor.java:908) > > > > at java.lang.Thread.run(Thread.java:619) > > > > > > > > > > > > Time: 0.535, rate: 30624 j/s > > > > > > > > Nothing really appears in the logs: > > > > 2011-02-07 16:32:22,668-0600 DEBUG Loader Loader > > > > started > > > > > > > > > > > > - Tim > > > > > > > > > > > > > > > > > > > > On Mon, Feb 7, 2011 at 12:54 PM, Mihael Hategan > > > > wrote: > > > > Should be fixed now. > > > > > > > > > > > > On Mon, 2011-02-07 at 10:03 -0800, Mihael > > > > Hategan wrote: > > > > > Ooops. I didn't get any emails over the > > > > weekend (well, it seems I did, > > > > > but my otherwise reliable email > > > > > notification > > > > didn't work). So I'm a bit > > > > > behind. > > > > > > > > > > Mihael > > > > > > > > > > On Fri, 2011-02-04 at 14:44 -0600, > > > > > Michael > > > > Wilde wrote: > > > > > > Im getting strange errors from trunk > > > > > > at > > > > the moment from previously working > > > > scripts. > > > > Seemed to be unable to parse sites.xml. > > > > > > > > > > > > In the process of debugging this, I > > > > > > find > > > > that I cant get the simplest of swift > > > > scripts > > > > (a single trace statement) to run. > > > > > > > > > > > > Is anyone else encountering similar > > > > problems? 
> > > > > > > > > > > > Here's what I get: > > > > > > > > > > > > com$ cat hi.swift > > > > > > trace("hi"); > > > > > > com$ java -version > > > > > > java version "1.6.0_20" > > > > > > Java(TM) SE Runtime Environment (build > > > > 1.6.0_20-b02) > > > > > > Java HotSpot(TM) 64-Bit Server VM > > > > > > (build > > > > 16.3-b01, mixed mode) > > > > > > com$ swift -version > > > > > > Swift svn swift-r4061 cog-r3046 > > > > > > > > > > > > com$ swift hi.swift > > > > > > Swift svn swift-r4061 cog-r3046 > > > > > > > > > > > > RunID: 20110204-1439-j430gp9g > > > > > > Progress: time:0 > > > > > > Execution failed: > > > > > > 1 names specified; 0 arguments > > > > found > > > > > > Time: 1.179, rate: 13896 j/s > > > > > > com$ cat hi.xml > > > > > > > > > xmlns=" > http://ci.uchicago.edu/swift/2009/02/swiftscript" > > > > > > > > > > xmlns:xsi=" > http://www.w3.org/2001/XMLSchema-instance" > > > > > > > > > > xmlns:xs="http://www.w3.org/2001/XMLSchema > "> > > > > > > > > > > > > > > > > > hi > > > > > > > > > > > > > > > > > > com$ > > > > > > > > > > > > com$ cat *9g.log > > > > > > 2011-02-04 14:39:29,205-0600 DEBUG > > > > > > Loader > > > > Max heap: 238616576 > > > > > > 2011-02-04 14:39:29,206-0600 DEBUG > > > > > > Loader > > > > kmlversion is > > > > >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > > > > > 2011-02-04 14:39:29,207-0600 DEBUG > > > > > > Loader > > > > build version is > > > > >a1ce0de8-81e9-4226-987a-0bbcb40af008< > > > > > > 2011-02-04 14:39:29,207-0600 DEBUG > > > > > > Loader > > > > Recompilation suppressed. > > > > > > 2011-02-04 14:39:29,343-0600 INFO > > > > VDL2ExecutionContext Stack dump: > > > > > > Level 1 > > > > > > [iA = 0, iB = 0, bA = false, bB = > > > > > > false] > > > > > > vdl:instanceconfig = Swift > > > > configuration [] > > > > > > vdl:operation = run > > > > > > swift.home > > > > = /home/wilde/swift/rev/trunk/bin/.. 
> > > > > > PATH_SEPARATOR = / > > > > > > > > > > > > > > > > > > 2011-02-04 14:39:29,900-0600 INFO > > > > > > unknown > > > > Using sites > > > > file: > > > > > /home/wilde/swift/rev/trunk/bin/../etc/sites.xml > > > > > > 2011-02-04 14:39:29,928-0600 INFO > > > > > > unknown > > > > Using > > > > tc.data: > > > > > /home/wilde/swift/rev/trunk/bin/../etc/tc.data > > > > > > 2011-02-04 14:39:30,023-0600 INFO > > > > AbstractScheduler Setting resources to: > > > > {localhost=localhost} > > > > > > 2011-02-04 14:39:30,468-0600 INFO > > > > > > unknown > > > > Swift svn swift-r4061 cog-r3046 > > > > > > > > > > > > 2011-02-04 14:39:30,469-0600 INFO > > > > > > unknown > > > > RUNID id=run:20110204-1439-j430gp9g > > > > > > 2011-02-04 14:39:30,511-0600 DEBUG > > > > VDL2ExecutionContext 1 names specified; 0 > > > > arguments found > > > > > > 1 names specified; 0 arguments found > > > > > > > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > > > > at 
> > > > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > > > > at > > > > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > > > > at > > > > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > > > > at > > > > > > java.util.concurrent.Executors > > > > $RunnableAdapter.call(Executors.java:441) > > > > > > at > > > > > > java.util.concurrent.FutureTask > > > > $Sync.innerRun(FutureTask.java:303) > > > > > > at > > > > > java.util.concurrent.FutureTask.run(FutureTask.java:138) > > > > > > at > > > > java.util.concurrent.ThreadPoolExecutor > > > > $Worker.runTask(ThreadPoolExecutor.java:886) > > > > > > at > > > > java.util.concurrent.ThreadPoolExecutor > > > > $Worker.run(ThreadPoolExecutor.java:908) > > > > > > at > > > > java.lang.Thread.run(Thread.java:619) > > > > > > 2011-02-04 14:39:30,522-0600 INFO > > > > ExecutionContext Detailed exception: > > > > > > 1 names specified; 0 arguments found > > > > > > > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.SetVar.post(SetVar.java:43) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63) > > > > > > at > > > > > 
org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139) > > > > > > at > > > > > org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197) > > > > > > at > > > > > org.globus.cog.karajan.workflow.FlowElementWrapper.start(FlowElementWrapper.java:227) > > > > > > at > > > > > org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104) > > > > > > at > > > > > org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40) > > > > > > at > > > > > > java.util.concurrent.Executors > > > > $RunnableAdapter.call(Executors.java:441) > > > > > > at > > > > > > java.util.concurrent.FutureTask > > > > $Sync.innerRun(FutureTask.java:303) > > > > > > at > > > > > java.util.concurrent.FutureTask.run(FutureTask.java:138) > > > > > > at > > > > java.util.concurrent.ThreadPoolExecutor > > > > $Worker.runTask(ThreadPoolExecutor.java:886) > > > > > > at > > > > java.util.concurrent.ThreadPoolExecutor > > > > $Worker.run(ThreadPoolExecutor.java:908) > > > > > > at > > > > java.lang.Thread.run(Thread.java:619) > > > > > > 2011-02-04 14:39:30,522-0600 INFO > > > > > > Loader > > > > Swift finished with errors > > > > > > com$ > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > -- > > > > Michael Wilde > > > > Computation Institute, University of Chicago > > > > Mathematics and Computer Science Division > > > > Argonne National Laboratory > > > > > > > > > > > > -- > > Michael Wilde > > Computation Institute, University of Chicago 
> > Mathematics and Computer Science Division
> > Argonne National Laboratory
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

From bugzilla-daemon at mcs.anl.gov Mon Feb 14 10:12:08 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Mon, 14 Feb 2011 10:12:08 -0600 (CST)
Subject: [Swift-devel] [Bug 255] New: Extra field in tc file gives java exception in profile parsing
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=255

Summary: Extra field in tc file gives java exception in profile parsing
Product: Swift
Version: unspecified
Platform: PC
OS/Version: Mac OS
Status: NEW
Severity: minor
Priority: P3
Component: SwiftScript language
AssignedTo: skenny at uchicago.edu
ReportedBy: wilde at mcs.anl.gov

When there is one extra null field in tc.data, swift gives an ugly traceback rather than a clear message on what was wrong, in what file.
For example if you have:

localhost cat /bin/cat null null null GLOBUS::maxwalltime="00:01:00"

instead of:

localhost cat /bin/cat null null GLOBUS::maxwalltime="00:01:00"

you get:

Parsing profiles on line 1 Illegal character ' 'at position 5
org.globus.swift.catalog.util.ProfileParserException: Illegal character ' '
at org.globus.swift.catalog.util.ProfileParser.parse(ProfileParser.java:181)
at org.globus.swift.catalog.transformation.File.populateTC(File.java:1099)
at org.globus.swift.catalog.transformation.File.populateTC(File.java:1034)
(and many more lines)

The error should give the name of the file it was parsing, and should probably say there was an extra field. There's no need for the traceback on stdout/err.

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the reporter.

From skenny at uchicago.edu Mon Feb 14 10:37:11 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 14 Feb 2011 08:37:11 -0800
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To:
References: <599762557.40635.1297354767679.JavaMail.root@zimbra.anl.gov> <1967949324.45848.1297442954624.JavaMail.root@zimbra.anl.gov>
Message-ID:

sounds good to me david...i don't know perl so maybe my time would be better spent using it and updating the doc in sync with you working on the code (?)

i think first we want to separate swiftconfig from swiftrun...it would be good for users who are already used to running regular 'swift' to be able to transition to using swiftconfig (i realize this probably affects the doc more than your code, i just wanted to mention it). so, maybe the first step is switching out the sites templates?

On Fri, Feb 11, 2011 at 8:51 PM, David Kelly wrote:
> Would it be easier to modify the existing swiftconfig to adjust to the new
> format of templates?
>
> The main steps that were outlined are nearly already completed with
> swiftconfig
>
> - The commands are already there
> - A set of templates already exists, but would most likely be replaced with
>   the ones verified by automated testing in the format Justin specified
> - A good start for documentation using swiftconfig on a variety of
>   configurations is at
>   http://www.ci.uchicago.edu/wiki/bin/view/SWFT/LearningSwift
> - Documentation for commands and syntax is there, swiftconfig -h and
>   swiftrun -h
> - List all templates with swiftconfig -list templates (already knows the
>   correct order of where to look for templates)
> - Help for specific templates is a good idea. That would be pretty
>   straightforward to add
> - Support for applications and application groups is already there
>
> If we started over it seems like we would be duplicating a lot of code

From wilde at mcs.anl.gov Mon Feb 14 12:25:02 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 14 Feb 2011 12:25:02 -0600 (CST)
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To:
Message-ID: <270595120.53387.1297707902524.JavaMail.root@zimbra.anl.gov>

Sarah, David,

Yes, I think gathering the site templates is indeed the first step. Then parameterizing them, including provisions for removing lines (per the GetSites spec page).

And I agree that we should set swiftrun aside (and later bundle any needed parts of it into the swift command).

David, I reviewed the current swiftconfig, and I feel it can be greatly simplified by rewriting as a shell script.
There is some reasonable shell arg parsing logic we can lift from:

https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec/start-swift

(arg parsing is right after the usage() function around midway through the script)

- Mike

----- Original Message -----

sounds good to me david...i don't know perl so maybe my time would be better spent using it and updating the doc in sync with you working on the code (?)

i think first we want to separate swiftconfig from swiftrun...it would be good for users who are already used to running regular 'swift' to be able to transition to using swiftconfig (i realize this probably affects the doc more than your code, i just wanted to mention it). so, maybe the first step is switching out the sites templates?

On Fri, Feb 11, 2011 at 8:51 PM, David Kelly < dk0966 at cs.ship.edu > wrote:

Would it be easier to modify the existing swiftconfig to adjust to the new format of templates?

The main steps that were outlined are nearly already completed with swiftconfig

- The commands are already there
- A set of templates already exists, but would most likely be replaced with the ones verified by automated testing in the format Justin specified
- A good start for documentation using swiftconfig on a variety of configurations is at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/LearningSwift
- Documentation for commands and syntax is there, swiftconfig -h and swiftrun -h
- List all templates with swiftconfig -list templates (already knows the correct order of where to look for templates)
- Help for specific templates is a good idea.
That would be pretty straightforward to add
- Support for applications and application groups is already there

If we started over it seems like we would be duplicating a lot of code

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From hategan at mcs.anl.gov Mon Feb 14 12:44:09 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 14 Feb 2011 10:44:09 -0800
Subject: [Swift-devel] Re: fixed_array_mapper versus array_mapper on output
In-Reply-To:
References:
Message-ID: <1297709049.2980.1.camel@blabla2.none>

I made the array_mapper static in swift trunk r4089. It seems that the code in there assumes that the files= argument is a closed array anyway. I am not entirely convinced this would work ok, but please give it a try.

Another solution might be to add a @join function and then use the fixed array mapper.

Mihael

On Mon, 2011-02-14 at 02:11 -0600, Allan Espinosa wrote:
> Making my question more concrete, how do you suggest I go about my workflow:
>
> string seis_str[];
> string peak_str[];
>
> foreach var,i in vars {
>   seis_str[i] = @strcat(loc_sub, "/Seismogram_", site.name, "_", rup.source,
>       "_", rup.index, "_", i, ".grm");
>   peak_str[i] = @strcat(loc_sub, "/PeakVals_", site.name, "_", rup.source,
>       "_", rup.index, "_", i, ".bsa");
> }
>
> Seismogram seis[] ;
> PeakValue peak[] ;
>
> (seis, peak) = seispeak_agg(sub, vars, site, rup.size);
>
>
> Should I hack around to force this into a fixed_array_mapper?
>
> -Allan
>
> 2011/2/12 Allan Espinosa :
> > Moving thread to devel for brainstorming possible solutions:
> >
> > 1. implement a join() function: @join(array, ", ");
> > 2.
fix array_mapper > > > > 2011/2/12 Allan Espinosa : > >> For an array output data structure, the two mappers behave differently > >> > >> type file; > >> > >> app(file o[]) > >> split(file i){ > >> split "-l" 1 @filename(i) "seqout."; > >> } > >> > >> /*file out[] >> "seqout.ac", // Does not work > >> "seqout.ad"]>;*/ > >> file out[] ; // Works > >> > >> file input <"seq.in">; > >> out = split(input); > > From wilde at mcs.anl.gov Mon Feb 14 13:22:53 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 14 Feb 2011 13:22:53 -0600 (CST) Subject: [Swift-devel] Re: Fix needed for ppn for non-coaster pbs provider In-Reply-To: <124038108.43151.1297374188489.JavaMail.root@zimbra.anl.gov> Message-ID: <703750908.53830.1297711373385.JavaMail.root@zimbra.anl.gov> I just looked at the file containing this code: src/org/globus/cog/abstraction/impl/scheduler/pbs/PBSExecutor.java and its changed a fair bit more from 0.92 to trunk than I can deal with. Mihael, is this file in 0.92 more recent than the one in trunk, and does it and related changes need to get integrated back into trunk? Thanks, Mike ----- Original Message ----- > The recent ppn changes need a small change to work in the case of the > PBS provider running without coasters. This causes it to put this in > the .submit file: > > #PBS -l ppn=8 > > ...which PBS rejects. The line needs to be: > > #PBS -l nodes=1:ppn=8 > > (as alluded to in the comments in PBSExecutor.java) > > Fixing it as above when count is not specified seems to work on PADS. > > svn diff is below. > I did not commit this. Should I? To trunk, 0.92 branch, or both? 
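The rule described in the quoted message (emit nodes=1:ppn=N when no count is given, since a bare ppn=N is rejected) can be sketched as follows. This is only an illustration in Python, not the Java code in PBSExecutor.java; the function name pbs_resource_line and its parameters are hypothetical, and it assumes that a missing count should default to a single node, as the proposed change does.

```python
def pbs_resource_line(ppn, count=None):
    """Build the '#PBS -l' resource directive for a submit file.

    ppn   -- processes per node (hypothetical parameter name)
    count -- number of nodes, or None when the job does not specify one
    """
    if ppn < 1:
        raise ValueError("ppn must be a positive integer")
    # A bare "ppn=N" resource is rejected by PBS (per the message above),
    # so assume one node when no count was given.
    nodes = 1 if count is None else count
    return "#PBS -l nodes=%d:ppn=%d" % (nodes, ppn)


if __name__ == "__main__":
    print(pbs_resource_line(8))
```

Whether defaulting to one node is the right policy is exactly the open question raised in the comments of the diff, so this shows the behaviour of the proposed fix and nothing more.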
>
> - Mike
>
>
> login1$ cd /home/wilde/swift/src/0.92/cog/modules/provider-localscheduler/
> login1$ svn diff
> Index: src/org/globus/cog/abstraction/impl/scheduler/pbs/PBSExecutor.java
> ===================================================================
> --- src/org/globus/cog/abstraction/impl/scheduler/pbs/PBSExecutor.java (revision 3046)
> +++ src/org/globus/cog/abstraction/impl/scheduler/pbs/PBSExecutor.java (working copy)
> @@ -68,7 +68,7 @@
> // 1. assuming count=1 when count is missing
> // 2. not specifying PPN when count is missing
> // ... are any better
> - wr.write("#PBS -l ppn=" + ppn + "\n");
> + wr.write("#PBS -l nodes=1:ppn=" + ppn + "\n");
> }
> }
>
> login1$
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Mon Feb 14 14:01:26 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Mon, 14 Feb 2011 12:01:26 -0800
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To: <270595120.53387.1297707902524.JavaMail.root@zimbra.anl.gov>
References: <270595120.53387.1297707902524.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Mon, Feb 14, 2011 at 10:25 AM, Michael Wilde wrote:
> Sarah, David,
>
> Yes, I think gathering the site templates is indeed the first step. Then parameterizing them, including provisions for removing lines (per the GetSites spec page).
>
> And I agree that we should set swiftrun aside (and later bundle any needed parts of it into the swift command).
>
> David, I reviewed the current swiftconfig, and I feel it can be greatly simplified by rewriting as a shell script.
>

oh, if we're going to actually do a re-write rather than leverage the code david already wrote my preference would be for python over shell...david do you know python? if not then shell is *ok* with me, it'll just be a little slower and clumsier for me :P i thought if we were sticking with the perl i would not help write it but just write the doc, but if we're doing a re-write i'm guessing it will take the effort of both of us (?)

>
> There is some reasonable shell arg parsing logic we can lift from:
>
> https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec/start-swift
>
> (arg parsing is right after the usage() function around midway through the script)
>
> - Mike
>
>
> ----- Original Message -----
>
>
> sounds good to me david...i don't know perl so maybe my time would be better spent using it and updating the doc in sync with you working on the code (?) i think first we want to separate swiftconfig from swiftrun...it would be good if users who are already used to running regular 'swift' to be able to transition to using swiftconfig (i realize this probably affects the doc more than your code, i just wanted to mention it).
>
> so, maybe the first step is switching out the sites templates?
>
>
> On Fri, Feb 11, 2011 at 8:51 PM, David Kelly < dk0966 at cs.ship.edu > wrote:
>
>
> Would it be easier to modify the existing swiftconfig to adjust to the new format of templates?
>
>
> The main steps that were outlined are nearly already completed with swiftconfig
>
>
> - The commands are already there
> - A set of templates already exists, but would most likely be replaced with the ones verified by automated testing in the format Justin specified
> - A good start for documentation using swiftconfig on a variety of configurations is at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/LearningSwift
> - Documentation for commands and syntax is there, swiftconfig -h and swiftrun -h
> - List all templates with swiftconfig -list templates (already knows the correct order of where to look for templates)
> - Help for specific templates is a good idea. That would be pretty straightforward to add
> - Support for applications and application groups is already there
>
>
> If we started over it seems like we would be duplicating a lot of code
>
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Mon Feb 14 14:11:38 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 14 Feb 2011 14:11:38 -0600 (CST)
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To:
Message-ID: <380931837.54213.1297714298078.JavaMail.root@zimbra.anl.gov>

The proposed gensites command is intended to be very simple. It basically takes a template and user-set variables, and inserts the variables into the template.

The GenSites spec page proposed that templates can live in the release etc/sites dir, the user's $HOME/.swift/sites dir, or a -T directory from the cmd line. User settings can live in swift.properties in the current dir or $HOME/.swift. That's about it. I think this can be done simply and quickly in a shell script.

- Mike

----- Original Message -----

On Mon, Feb 14, 2011 at 10:25 AM, Michael Wilde < wilde at mcs.anl.gov > wrote:

Sarah, David,

Yes, I think gathering the site templates is indeed the first step. Then parameterizing them, including provisions for removing lines (per the GetSites spec page).

And I agree that we should set swiftrun aside (and later bundle any needed parts of it into the swift command).

David, I reviewed the current swiftconfig, and I feel it can be greatly simplified by rewriting as a shell script.

oh, if we're going to actually do a re-write rather than leverage the code david already wrote my preference would be for python over shell...david do you know python?
if not then shell is *ok* with me, it'll just be a little slower and clumsier for me :P i thought if we were sticking with the perl i would not help write it but just write the doc, but if we're doing a re-write i'm guessing it will take the effort of both of us (?)

There is some reasonable shell arg parsing logic we can lift from:

https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec/start-swift

(arg parsing is right after the usage() function around midway through the script)

- Mike

----- Original Message -----

sounds good to me david...i don't know perl so maybe my time would be better spent using it and updating the doc in sync with you working on the code (?) i think first we want to separate swiftconfig from swiftrun...it would be good if users who are already used to running regular 'swift' to be able to transition to using swiftconfig (i realize this probably affects the doc more than your code, i just wanted to mention it).

so, maybe the first step is switching out the sites templates?

On Fri, Feb 11, 2011 at 8:51 PM, David Kelly < dk0966 at cs.ship.edu > wrote:

Would it be easier to modify the existing swiftconfig to adjust to the new format of templates?

The main steps that were outlined are nearly already completed with swiftconfig

- The commands are already there
- A set of templates already exists, but would most likely be replaced with the ones verified by automated testing in the format Justin specified
- A good start for documentation using swiftconfig on a variety of configurations is at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/LearningSwift
- Documentation for commands and syntax is there, swiftconfig -h and swiftrun -h
- List all templates with swiftconfig -list templates (already knows the correct order of where to look for templates)
- Help for specific templates is a good idea. That would be pretty straightforward to add
- Support for applications and application groups is already there

If we started over it seems like we would be duplicating a lot of code

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dk0966 at cs.ship.edu Mon Feb 14 21:56:15 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Mon, 14 Feb 2011 22:56:15 -0500
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To: <380931837.54213.1297714298078.JavaMail.root@zimbra.anl.gov>
References: <380931837.54213.1297714298078.JavaMail.root@zimbra.anl.gov>
Message-ID:

To store their settings in swift.properties, can I safely create arbitrary values like gensites.workdirectory=/home/blah?

On Mon, Feb 14, 2011 at 3:11 PM, Michael Wilde wrote:
> The proposed gensites command is intended to be very simple. It basically takes a template and user-set variables, and inserts the variables into the template.
>
> The GenSites spec page proposed that templates can live in the release etc/sites dir, the user's $HOME/.swift/sites dir, or a -T directory from the cmd line. User settings can live in swift.properties in the current dir or $HOME/.swift. That's about it. I think this can be done simply and quickly in a shell script.
>
> - Mike
>
> ------------------------------
>
>
>
> On Mon, Feb 14, 2011 at 10:25 AM, Michael Wilde wrote:
>
>> Sarah, David,
>>
>> Yes, I think gathering the site templates is indeed the first step. Then parameterizing them, including provisions for removing lines (per the GetSites spec page).
>>
>> And I agree that we should set swiftrun aside (and later bundle any needed parts of it into the swift command).
>>
>> David, I reviewed the current swiftconfig, and I feel it can be greatly simplified by rewriting as a shell script.
>>
>
> oh, if we're going to actually do a re-write rather than leverage the code david already wrote my preference would be for python over shell...david do you know python? if not then shell is *ok* with me, it'll just be a little slower and clumsier for me :P i thought if we were sticking with the perl i would not help write it but just write the doc, but if we're doing a re-write i'm guessing it will take the effort of both of us (?)
>
>
>>
>> There is some reasonable shell arg parsing logic we can lift from:
>>
>> https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec/start-swift
>>
>> (arg parsing is right after the usage() function around midway through the script)
>>
>> - Mike
>>
>>
>> ----- Original Message -----
>>
>>
>> sounds good to me david...i don't know perl so maybe my time would be better spent using it and updating the doc in sync with you working on the code (?) i think first we want to separate swiftconfig from swiftrun...it would be good if users who are already used to running regular 'swift' to be able to transition to using swiftconfig (i realize this probably affects the doc more than your code, i just wanted to mention it).
>>
>> so, maybe the first step is switching out the sites templates?
>>
>>
>> On Fri, Feb 11, 2011 at 8:51 PM, David Kelly < dk0966 at cs.ship.edu > wrote:
>>
>>
>> Would it be easier to modify the existing swiftconfig to adjust to the new format of templates?
>>
>>
>> The main steps that were outlined are nearly already completed with swiftconfig
>>
>>
>> - The commands are already there
>> - A set of templates already exists, but would most likely be replaced with the ones verified by automated testing in the format Justin specified
>> - A good start for documentation using swiftconfig on a variety of configurations is at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/LearningSwift
>> - Documentation for commands and syntax is there, swiftconfig -h and swiftrun -h
>> - List all templates with swiftconfig -list templates (already knows the correct order of where to look for templates)
>> - Help for specific templates is a good idea. That would be pretty straightforward to add
>> - Support for applications and application groups is already there
>>
>>
>> If we started over it seems like we would be duplicating a lot of code
>>
>>
>>
>> --
>> Michael Wilde
>> Computation Institute, University of Chicago
>> Mathematics and Computer Science Division
>> Argonne National Laboratory
>>
>>
>
>
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at mcs.anl.gov Tue Feb 15 05:06:14 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 15 Feb 2011 05:06:14 -0600 (CST)
Subject: [Swift-devel] Proposed sites.xml management for 0.92 release
In-Reply-To:
Message-ID: <1796442543.56054.1297767974607.JavaMail.root@zimbra.anl.gov>

Yes, I think you can. If not, you can use lines like:

#site workdirectory=/home/blah

But I think site.workdir should do. Best to try it to verify. I'm pretty sure that properties just go into a namespace, and if no Java code looks at the name, it's ignored.

Two related points:

- for workdir in particular I suggest a default of $(pwd)/work.
A few other vars may have similar needs, but this is perhaps the only exception for now.

- we can do a very simple core gensites implementation that is just a tad beyond Justin's prototype, to get started. Get templates from only one place (the release's etc/sites dir) and settings from only one place (swift.properties in the current directory). Then we can add a few more options and paths to match the spec.

- the spec needs comment and review. We should distill the options down to a good match for what users need to do, without getting too complex. Typically two levels of templates ("swifts" and "mine") and two levels of settings ("my global settings" and "my (overriding) settings for this run directory").

Mike

----- Original Message -----

To store their settings in swift.properties, can I safely create arbitrary values like gensites.workdirectory=/home/blah?

On Mon, Feb 14, 2011 at 3:11 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:

The proposed gensites command is intended to be very simple. It basically takes a template and user-set variables, and inserts the variables into the template.

The GenSites spec page proposed that templates can live in the release etc/sites dir, the user's $HOME/.swift/sites dir, or a -T directory from the cmd line. User settings can live in swift.properties in the current dir or $HOME/.swift. That's about it. I think this can be done simply and quickly in a shell script.

- Mike

On Mon, Feb 14, 2011 at 10:25 AM, Michael Wilde < wilde at mcs.anl.gov > wrote:

Sarah, David,

Yes, I think gathering the site templates is indeed the first step. Then parameterizing them, including provisions for removing lines (per the GetSites spec page).

And I agree that we should set swiftrun aside (and later bundle any needed parts of it into the swift command).

David, I reviewed the current swiftconfig, and I feel it can be greatly simplified by rewriting as a shell script.

oh, if we're going to actually do a re-write rather than leverage the code david already wrote my preference would be for python over shell...david do you know python? if not then shell is *ok* with me, it'll just be a little slower and clumsier for me :P i thought if we were sticking with the perl i would not help write it but just write the doc, but if we're doing a re-write i'm guessing it will take the effort of both of us (?)

There is some reasonable shell arg parsing logic we can lift from:

https://svn.ci.uchicago.edu/svn/vdl2/SwiftApps/SwiftR/Swift/exec/start-swift

(arg parsing is right after the usage() function around midway through the script)

- Mike

----- Original Message -----

sounds good to me david...i don't know perl so maybe my time would be better spent using it and updating the doc in sync with you working on the code (?) i think first we want to separate swiftconfig from swiftrun...it would be good if users who are already used to running regular 'swift' to be able to transition to using swiftconfig (i realize this probably affects the doc more than your code, i just wanted to mention it).

so, maybe the first step is switching out the sites templates?

On Fri, Feb 11, 2011 at 8:51 PM, David Kelly < dk0966 at cs.ship.edu > wrote:

Would it be easier to modify the existing swiftconfig to adjust to the new format of templates?

The main steps that were outlined are nearly already completed with swiftconfig

- The commands are already there
- A set of templates already exists, but would most likely be replaced with the ones verified by automated testing in the format Justin specified
- A good start for documentation using swiftconfig on a variety of configurations is at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/LearningSwift
- Documentation for commands and syntax is there, swiftconfig -h and swiftrun -h
- List all templates with swiftconfig -list templates (already knows the correct order of where to look for templates)
- Help for specific templates is a good idea. That would be pretty straightforward to add
- Support for applications and application groups is already there

If we started over it seems like we would be duplicating a lot of code

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From aespinosa at cs.uchicago.edu Tue Feb 15 18:01:14 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Tue, 15 Feb 2011 18:01:14 -0600
Subject: [Swift-devel] resume files broken in trunk?
Message-ID:

Hi,

I noticed that the workflow I've been running before does not create the resume file properly after using swift-r4089 cog-r3051

-Allan

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From tim.g.armstrong at gmail.com Wed Feb 16 13:30:47 2011
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Wed, 16 Feb 2011 13:30:47 -0600
Subject: [Swift-devel] Re: Swift on Eureka
In-Reply-To: <1019521736.40464.1294526433159.JavaMail.root@zimbra.anl.gov>
References: <1019521736.40464.1294526433159.JavaMail.root@zimbra.anl.gov>
Message-ID:

I ran into this bug too but now have coasters working on Eureka. My solution was to generate a script to execute to cobalt with the argument embedded:

cat > $HOME/batch.sub < wrote:
> Thanks, Justin. cc'ing back to the list, Rob, and Sheri.
>
> Sheri, maybe you can run on PADS or Fusion till this is fixed?
>
> - Mike
>
> ----- Original Message -----
> > Hello
> > Right, Swift does not currently run on Eureka due to the following bug in Cobalt:
> >
> > http://trac.mcs.anl.gov/projects/cobalt/ticket/462
> >
> > I got about half of a work-around for this done...
> >
> > Justin
> >
> > On Fri, 7 Jan 2011, Michael Wilde wrote:
> >
> > > Hi Rob and Sheri,
> > >
> > > I don't know the status of Swift on Eureka, but I'm eager to see it running there, so we'll make sure it works.
> > >
> > > A long while back I tried Swift there, and at the time we had a minor bug in the Cobalt provider. Justin may have fixed that recently on the BG/P's. So I'm hoping it either works or has only some readily-fixable issues in the way.
> > >
> > > We'll try it and get back to you.
> > >
> > > In the mean time, Sheri, you might want to try a simple hello-world test on Eureka, and see if you can progress to replicating what John Dennis had done so far.
> > >
> > > It's best to send any errors you get to the swift-user list (which you should join) so that everyone on the Swift team is aware of any issues you encounter and can offer help.
> > >
> > > You should meet with Justin at Argonne (3rd floor, 240) who can serve as your Swift mentor.
> > >
> > > Sarah, David - let's add Eureka to the test matrix for release 0.92. Cobalt is very very close to PBS's interface, but there is a separate Swift execution provider that handles the differences.
> > >
> > > Regards,
> > >
> > > Mike
> > >
> > >
> > > ----- Original Message -----
> > >> Hi Mike,
> > >>
> > >> Sheri is going to take over some of the development work John Dennis was doing on using swift with the AMWG diag package.
> > >>
> > >> Our platform is Eureka. Is there a development version of Swift installed there?
> > >>
> > >> Rob
> > >
> > >
> >
> > --
> > Justin M Wozniak
>
> --
> Michael Wilde
> Computation Institute, University of Chicago
> Mathematics and Computer Science Division
> Argonne National Laboratory
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From bugzilla-daemon at mcs.anl.gov Wed Feb 16 13:54:16 2011
From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov)
Date: Wed, 16 Feb 2011 13:54:16 -0600 (CST)
Subject: [Swift-devel] [Bug 257] New: Race condition in Swiftscript execution
Message-ID:

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=257

Summary: Race condition in Swiftscript execution
Product: Swift
Version: unspecified
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: General
AssignedTo: hategan at mcs.anl.gov
ReportedBy: tim.g.armstrong at gmail.com

Created an attachment (id=289)
--> (http://bugzilla.mcs.anl.gov/swift/attachment.cgi?id=289)
Swift script and Swift logs for passing and failing cases.
I am running: Swift svn swift-r4087 cog-r3051

I have come across a race condition in a script that I have been using for a while. What it does is launch a dummy job (to ensure everything is started up), then sit in a serial loop, acting as a server listening on a named pipe for tasks to execute.

Every 5 or so times the script runs, the dummy job completes but the script fails to even enter the loop.

I have attached logs for both the usual (correct) behaviour, and the race condition. The side-by-side diff at the bottom illustrates clearly where the behaviour diverges.

= rserver.swift ============================================================
type file;
type RData;

app (external e, RData result, file stout, file sterr) runR (file shellscript, file RServerScript, RData rcall) {
  bash @shellscript @RServerScript @rcall @result stdout=@stout stderr=@sterr;
}

app ack (external e[]) {
  bashlocal "-c" @strcat("echo done > ",resultPipeName);
}

app passivate () {
  bash "-c" "echo dummy swift job;";
}

(external e[]) apply (string runDir) {
  RData rcalls[] ;
  RData results[] ;
  file stout[] ;
  file sterr[] ;
  file runRscript <"EvalRBatchPersistent.sh">;
  file rsScript <"SwiftRServer.R">;
  foreach c, i in rcalls {
    (e[i], results[i],stout[i], sterr[i]) = runR(runRscript,rsScript,c);
  }
}

passivate();

string pipedir = @arg("pipedir");
global string requestPipeName = @strcat(pipedir,"/requestpipe");
global string resultPipeName = @strcat(pipedir,"/resultpipe");

iterate serially {
  boolean done;
  string dir;
  dir = readData(requestPipeName); # Reads direct from this local pipe. Assumes Swift started in right dir.
  external wait[];
  wait = apply(dir);
  if (dir=="done") { done=true; } else { done=false;}
  fprintf(resultPipeName, "%kdone\n", wait);
} until (done);

= sidebyside ============================================================
Loader Max heap: 238616576    Loader Max heap: 238616576
Loader rserver.swift: source file is new. Recompiling.    Loader rserver.swift: source file is new. Recompiling.
Karajan Validation of XML intermediate file was successful    Karajan Validation of XML intermediate file was successful
VDL2ExecutionContext Stack dump:    VDL2ExecutionContext Stack dump:
iB = 0, bA = false, bB = false]    iB = 0, bA = false, bB = false]
configuration [cf]    configuration [cf]
unknown Using sites file: sites.xml    unknown Using sites file: sites.xml
unknown Using tc.data: tc    unknown Using tc.data: tc
AbstractScheduler Setting resources to: {local3=local3, fork=    AbstractScheduler Setting resources to: {local3=local3, fork=
unknown Swift svn swift-r4087 cog-r3051    unknown Swift svn swift-r4087 cog-r3051
unknown RUNID id=run:20110216-1244-7l6y6irf | unknown RUNID id=run:20110216-1246-vfu3jq4c
VDLFunction FUNCTION: arg()    VDLFunction FUNCTION: arg()
SetFieldValue Set: pipedir=/tmp/tga/SwiftR/swift.1237 | SetFieldValue Set: pipedir=/tmp/tga/SwiftR/swift.1447
> SetFieldValue Set: requestPipeName=/tmp/tga/SwiftR/swift.1447
> SetFieldValue Set: resultPipeName=/tmp/tga/SwiftR/swift.1447/
vdl:execute START thread=0-0 tr=bash    vdl:execute START thread=0-0 tr=bash
WeightedHostScoreScheduler CONTACT_SELECTED host=local0, scor <
apply STARTCOMPOUND thread=0-4-0-1 name=apply    apply STARTCOMPOUND thread=0-4-0-1 name=apply
> SetFieldValue Set: done=false
> WeightedHostScoreScheduler CONTACT_SELECTED host=local3, scor
GlobalSubmitQueue No global submit throttle set. Using defaul    GlobalSubmitQueue No global submit throttle set. Using defaul
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
LateBindingScheduler JobQueue: 0 | AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataN
vdl:initshareddir START host=local0 - Initializing shared dir | AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataN
AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataN <
AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataN <
SetFieldValue Set: swift#mapper#17003=cbatch.    SetFieldValue Set: swift#mapper#17003=cbatch.
SetFieldValue Set: swift#mapper#17005=.Rdata    SetFieldValue Set: swift#mapper#17005=.Rdata
SetFieldValue Set: swift#mapper#17007=0    SetFieldValue Set: swift#mapper#17007=0
SetFieldValue Set: swift#mapper#17008=rbatch.    SetFieldValue Set: swift#mapper#17008=rbatch.
SetFieldValue Set: swift#mapper#17010=.Rdata    SetFieldValue Set: swift#mapper#17010=.Rdata
SetFieldValue Set: swift#mapper#17011=0 <
SetFieldValue Set: swift#mapper#17012=stdout.    SetFieldValue Set: swift#mapper#17012=stdout.
SetFieldValue Set: swift#mapper#17014=.txt    SetFieldValue Set: swift#mapper#17014=.txt
SetFieldValue Set: swift#mapper#17016=0 | SetFieldValue Set: swift#mapper#17020=0
> SetFieldValue Set: swift#mapper#17011=0
SetFieldValue Set: swift#mapper#17017=stderr.    SetFieldValue Set: swift#mapper#17017=stderr.
SetFieldValue Set: swift#mapper#17019=.txt    SetFieldValue Set: swift#mapper#17019=.txt
SetFieldValue Set: swift#mapper#17020=0 | SetFieldValue Set: swift#mapper#17016=0
> AbstractDataNode Found data rcalls.$[]/1.[1]
> LateBindingScheduler JobQueue: 0
> vdl:initshareddir START host=local3 - Initializing shared dir
> LateBindingScheduler JobQueue: 0
> vdl:execute START thread=0-4-0-1-12-0-1 tr=bash
> WeightedHostScoreScheduler CONTACT_SELECTED host=local1, scor
> vdl:initshareddir START host=local1 - Initializing shared dir
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
vdl:initshareddir END host=local0 - Done initializing shared <
vdl:createdirset START jobid=bash-dls6916k host=local0 - Init <
vdl:createdirset END jobid=bash-dls6916k - Done initializing <
vdl:dostagein START jobid=bash-dls6916k - Staging in files <
vdl:dostagein END jobid=bash-dls6916k - Staging in finished <
GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity= <
JobSubmissionTaskHandler Submit: in: /tmp/tga/SwiftR/swift.12 <
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
vdl:dostageout START jobid=bash-dls6916k - Staging out files | LateBindingScheduler JobQueue: 0
vdl:dostageout END jobid=bash-dls6916k - Staging out finished | vdl:initshareddir END host=local1 - Done initializing shared
> vdl:initshareddir END host=local3 - Done initializing shared
> vdl:createdirset START jobid=bash-nq2a916k host=local3 - Init
> vdl:createdirset END jobid=bash-nq2a916k - Done initializing
> vdl:dostagein START jobid=bash-nq2a916k - Staging in files
> vdl:createdirset START jobid=bash-mq2a916k host=local1 - Init
> vdl:dostagein END jobid=bash-nq2a916k - Staging in finished
> vdl:createdirs START path=tmp/tga/SwiftR/requests.P31228/R000
> LateBindingScheduler JobQueue: 0
> vdl:createdirset END jobid=bash-mq2a916k - Done initializing
> vdl:dostagein START jobid=bash-mq2a916k - Staging in files
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity=
> JobSubmissionTaskHandler Submit: in: /tmp/tga/SwiftR/swift.14
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> vdl:dostagein END jobid=bash-mq2a916k - Staging in finished
> GridExec TASK_DEFINITION: Task(type=JOB_SUBMISSION, identity=
> JobSubmissionTaskHandler Submit: in: /tmp/tga/SwiftR/swift.14
> LateBindingScheduler JobQueue: 0
> vdl:dostageout START jobid=bash-nq2a916k - Staging out files
> vdl:dostageout END jobid=bash-nq2a916k - Staging out finished
LateBindingScheduler JobQueue: 0    LateBindingScheduler JobQueue: 0
vdl:execute END_SUCCESS thread=0-0 tr=bash    vdl:execute END_SUCCESS thread=0-0 tr=bash
> LateBindingScheduler JobQueue: 0
> vdl:dostageout START jobid=bash-mq2a916k - Staging out files
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 0
> LateBindingScheduler JobQueue: 1
> LateBindingScheduler JobQueue: 0
> vdl:dostageout END jobid=bash-mq2a916k - Staging out finished
> LateBindingScheduler JobQueue: 0
> vdl:execute END_SUCCESS thread=0-4-0-1-12-0-1 tr=bash
> apply ENDCOMPOUND thread=0-4-0-1
> apply STARTCOMPOUND thread=0-4-1-1 name=apply
> AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataN
> AbstractDataNode Found data org.griphyn.vdl.mapping.RootDataN
> SetFieldValue Set: swift#mapper#17003=cbatch.
> SetFieldValue Set: swift#mapper#17005=.Rdata
> SetFieldValue Set: swift#mapper#17007=0
> SetFieldValue Set: swift#mapper#17008=rbatch.
> SetFieldValue Set: swift#mapper#17010=.Rdata
> SetFieldValue Set: swift#mapper#17011=0
> SetFieldValue Set: swift#mapper#17012=stdout.
> SetFieldValue Set: swift#mapper#17014=.txt
> SetFieldValue Set: swift#mapper#17016=0
> SetFieldValue Set: swift#mapper#17017=stderr.
> SetFieldValue Set: swift#mapper#17019=.txt
> SetFieldValue Set: swift#mapper#17020=0

--
Configure bugmail: https://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are watching the assignee of the bug.

From dk0966 at cs.ship.edu Wed Feb 16 19:13:26 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Wed, 16 Feb 2011 20:13:26 -0500
Subject: [Swift-devel] Swift Website Wishlist
Message-ID:

Hello,

In a conference call today we started discussing some ideas on how we could improve the Swift website. What could we add that would make it more useful? If you have any ideas, please add them to the website wishlist at http://www.ci.uchicago.edu/wiki/bin/view/SWFT/WebsiteWishlist

Regards,
David
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From aespinosa at cs.uchicago.edu Thu Feb 17 13:49:52 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 13:49:52 -0600 Subject: [Swift-devel] deadlock on workflow: Message-ID: Version swift-r3835 cog-r2988 see attached: -- Allan M. Espinosa PhD student, Computer Science University of Chicago -------------- next part -------------- A non-text attachment was scrubbed... Name: deadlock Type: application/octet-stream Size: 24728 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: postproc-20110217-1212-adm01mh8.log.bz2 Type: application/x-bzip2 Size: 1627188 bytes Desc: not available URL: From jon.monette at gmail.com Thu Feb 17 15:13:14 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Thu, 17 Feb 2011 15:13:14 -0600 Subject: [Swift-devel] Workflow waiting on condition hang Message-ID: <4D5D8F6A.20906@gmail.com> Hello, My workflow seems to be hanging. This is trunk swift-r4107 and cog-r3051. Attached is a compressed log file and the jstack output for my workflow. The jstack file says it is waiting for a condition and my workflow hangs. Following my workflow, it always hangs at the same app even though all the files needed to run the app have been created. However, in the log file I get these weird PBS outputs that I have not seen before. ---------------------------------------- Begin PBS Epilogue Thu Feb 17 15:01:21 CST 2011 Job ID: 913461.svc.pads.ci.uchicago.edu Username: jonmon Group: ci-users Job Name: Block-0217-590238-000006 Session: 7544 Limits: ncpus=1,neednodes=1,nodes=1,size=1,walltime=00:40:00 Resources: cput=00:00:29,mem=24724kb,vmem=317560kb,walltime=00:01:36 Nodes: c27.pads.ci.uchicago.edu End PBS Epilogue Thu Feb 17 15:01:21 CST 2011 ---------------------------------------- Not sure what this is trying to tell me but this appears along with some other output I have not seen before. 
I am leaning towards an array not being closed, since I believe that is the most recent thing being changed to handle the array mappers, but I have no evidence (yet) to back this claim. -- Jon Computers are incredibly fast, accurate, and stupid. Human beings are incredibly slow, inaccurate, and brilliant. Together they are powerful beyond imagination. - Albert Einstein -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: jstack.out URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: m101_tutorial.tar.gz Type: application/x-gzip Size: 20335 bytes Desc: not available URL: From hategan at mcs.anl.gov Thu Feb 17 15:39:29 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 17 Feb 2011 13:39:29 -0800 Subject: [Swift-devel] Re: Workflow waiting on condition hang In-Reply-To: <4D5D8F6A.20906@gmail.com> References: <4D5D8F6A.20906@gmail.com> Message-ID: <1297978769.20789.2.camel@blabla2.none> On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote: > Hello, > My workflow seems to be hanging. This is trunk swift-r4107 and > cog-r3051. Attached is a compressed log file and the jstack output for > my workflow. The jstack file says it is waiting for a condition and my > workflow hangs. There's lots of stuff waiting because that's what they do when they don't have anything else to do. So I don't see a problem there. There are no jobs going to the coaster service, so clearly things aren't progressing. So now the question is: does this happen every time you run it or just sometimes? Also, please send the swift script. Mihael From hategan at mcs.anl.gov Thu Feb 17 15:42:28 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 17 Feb 2011 13:42:28 -0800 Subject: [Swift-devel] deadlock on workflow: In-Reply-To: References: Message-ID: <1297978948.21061.0.camel@blabla2.none> Ok. Your deadlock is genuine, but your version of swift seems old. 
Are you sure it wasn't fixed in the mean time? On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: > Version > > swift-r3835 cog-r2988 > > see attached: > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From aespinosa at cs.uchicago.edu Thu Feb 17 15:45:54 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 15:45:54 -0600 Subject: [Swift-devel] deadlock on workflow: In-Reply-To: <1297978948.21061.0.camel@blabla2.none> References: <1297978948.21061.0.camel@blabla2.none> Message-ID: The latest trunk breaks for another case (see my post on 'broken resume files'). So I can't reproduce this there (yet). 2011/2/17 Mihael Hategan : > Ok. Your deadlock is genuine, but your version of swift seems old. Are > you sure it wasn't fixed in the mean time? > > On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: >> Version >> >> swift-r3835 cog-r2988 >> >> see attached: >> >> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Thu Feb 17 15:55:50 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 17 Feb 2011 13:55:50 -0800 Subject: [Swift-devel] deadlock on workflow: In-Reply-To: References: <1297978948.21061.0.camel@blabla2.none> Message-ID: <1297979750.21061.2.camel@blabla2.none> Yeah. I'll take a look at that. But the other question is whether the deadlocking version is something that is worth fixing (i.e. a current stable branch or trunk). On Thu, 2011-02-17 at 15:45 -0600, Allan Espinosa wrote: > The latest trunk breaks for another case (see my post on 'broken > resume files'). So I can't reproduce this there (yet). > > 2011/2/17 Mihael Hategan : > > Ok. 
Your deadlock is genuine, but your version of swift seems old. Are > > you sure it wasn't fixed in the mean time? > > > > On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: > >> Version > >> > >> swift-r3835 cog-r2988 > >> > >> see attached: > >> > >> > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > From wilde at mcs.anl.gov Thu Feb 17 16:49:54 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Feb 2011 16:49:54 -0600 (CST) Subject: [Swift-devel] deadlock on workflow: In-Reply-To: <1297979750.21061.2.camel@blabla2.none> Message-ID: <1338073925.69910.1297982994955.JavaMail.root@zimbra.anl.gov> Allan, If you already stated this I missed it: are you able to run on 0.92? And does the deadlock occur there? Is resume working in 0.92? And, have you considered using explicit resume based on having your input mapper only return members of the dataset that are not yet completed? (I think Glen Hocky used that technique with good results in his latest Glass runs). - Mike ----- Original Message ----- > Yeah. I'll take a look at that. > > But the other question is whether the deadlocking version is something > that is worth fixing (i.e. a current stable branch or trunk). > > > On Thu, 2011-02-17 at 15:45 -0600, Allan Espinosa wrote: > > The latest trunk breaks for another case (see my post on 'broken > > resume files'). So I can't reproduce this there (yet). > > > > 2011/2/17 Mihael Hategan : > > > Ok. Your deadlock is genuine, but your version of swift seems old. > > > Are > > > you sure it wasn't fixed in the mean time? 
> > > > > > On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: > > >> Version > > >> > > >> swift-r3835 cog-r2988 > > >> > > >> see attached: > > >> > > >> > > >> > > >> _______________________________________________ > > >> Swift-devel mailing list > > >> Swift-devel at ci.uchicago.edu > > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Thu Feb 17 17:00:16 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 17:00:16 -0600 Subject: [Swift-devel] deadlock on workflow: In-Reply-To: References: <1297979750.21061.2.camel@blabla2.none> <1338073925.69910.1297982994955.JavaMail.root@zimbra.anl.gov> Message-ID: quick update: it looks like resume is working in 0.91: $ ~/swift/swift-0.91/bin/swift -config swift.properties -resume resumefile postproc.swift Swift svn swift-r3826 cog-r2988 RunID: 20110217-1657-erl4auu4 Progress: Progress: Finished in previous run:1 Progress: Finished in previous run:1 Progress: Finished in previous run:1 2011/2/17 Allan Espinosa : > Hi Mike, > > I haven't tested it yet. I will need my sites.xml definitions to not > use persistent+ passive coasters before being able to test it. > > What's the difference in 'explicit' resume? In my setup i have a > "resumefile" that i've been using for the past few months. > > -Allan > > 2011/2/17 Michael Wilde : >> Allan, >> >> If you already stated this I missed it: are you able to run on 0.92? And does the deadlock occur there? Is resume working in 0.92? 
>> >> And, have you considered using explicit resume based on having your input mapper only return members of the dataset that are not yet competed? (I think Glen Hocky used that technique with good results in his latest Glass runs). >> >> - Mike >> >> >> ----- Original Message ----- >>> Yeah. I'll take a look at that. >>> >>> But the other question is whether the deadlocking version is something >>> that is worth fixing (i.e. a current stable branch or trunk). >>> >>> >>> On Thu, 2011-02-17 at 15:45 -0600, Allan Espinosa wrote: >>> > The latest trunk breaks for another case (see my post on 'broken >>> > resume files'). So I can't reproduce this there (yet). >>> > >>> > 2011/2/17 Mihael Hategan : >>> > > Ok. Your deadlock is genuine, but your version of swift seems old. >>> > > Are >>> > > you sure it wasn't fixed in the mean time? >>> > > >>> > > On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: >>> > >> Version >>> > >> >>> > >> swift-r3835 cog-r2988 >>> > >> >>> > >> see attached: >>> > >> -- Allan M. Espinosa PhD student, Computer Science University of Chicago From wilde at mcs.anl.gov Thu Feb 17 17:11:04 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Feb 2011 17:11:04 -0600 (CST) Subject: [Swift-devel] deadlock on workflow: In-Reply-To: Message-ID: <2018980679.70011.1297984264120.JavaMail.root@zimbra.anl.gov> Allan, > 2011/2/17 Allan Espinosa : > > Hi Mike, > > > > I haven't tested it yet. I will need my sites.xml definitions to not > > use persistent+ passive coasters before being able to test it. Why is that? 0.92 supports persistent, passive coasters, doesn't it? > > > > What's the difference in 'explicit' resume? In my setup i have a > > "resumefile" that i've been using for the past few months. By "explicit resume" I meant not using the Swift resume feature, but instead, having your input mapper not return any input dataset members that it knows have already been processed successfully, by checking the output dataset. 
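As a rough illustration of the "explicit resume" idea described above (a sketch only, not code from this thread): the helper below lists the input files whose corresponding output is not yet considered done, so that only those members would be handed to the workflow. The function names, the `.in`/`.out` suffixes, the default "exists and is non-empty" test, and the `[index] path` line format are all assumptions made for the example; a real mapper would use whatever application-specific definition of "done" applies.

```python
import os


def pending_inputs(input_dir, output_dir, in_suffix=".in", out_suffix=".out",
                   is_done=None):
    """Return sorted input paths whose matching output is not yet done.

    All names and suffixes here are hypothetical, chosen for illustration.
    """
    if is_done is None:
        # Assumed default notion of "done": output exists and is non-empty.
        def is_done(path):
            return os.path.isfile(path) and os.path.getsize(path) > 0

    pending = []
    for name in sorted(os.listdir(input_dir)):
        if not name.endswith(in_suffix):
            continue
        # Strip the input suffix to find the matching output name.
        stem = name[:len(name) - len(in_suffix)]
        out_path = os.path.join(output_dir, stem + out_suffix)
        if not is_done(out_path):
            pending.append(os.path.join(input_dir, name))
    return pending


def mapper_lines(paths):
    """Format paths as one '[index] path' line each (assumed output format)."""
    return ["[%d] %s" % (i, p) for i, p in enumerate(paths)]


if __name__ == "__main__":
    import sys
    # Usage (hypothetical): python pending.py <input_dir> <output_dir>
    for line in mapper_lines(pending_inputs(sys.argv[1], sys.argv[2])):
        print(line)
```

Passing a custom `is_done` callable is where an application-specific check (for example, validating the contents of an output file) would go.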
Both styles of resume have their pros and cons. The advantage of this "explicit" resume approach is that the definition of "done" for a dataset member can be application-specific. And that it doesn't depend on the automated Swift feature, which likely needs more testing and hardening. The disadvantage is that you have to program it yourself, explicitly. - Mike > > > > -Allan > > > > 2011/2/17 Michael Wilde : > >> Allan, > >> > >> If you already stated this I missed it: are you able to run on > >> 0.92? And does the deadlock occur there? Is resume working in 0.92? > >> > >> And, have you considered using explicit resume based on having your > >> input mapper only return members of the dataset that are not yet > >> competed? (I think Glen Hocky used that technique with good results > >> in his latest Glass runs). > >> > >> - Mike > >> > >> > >> ----- Original Message ----- > >>> Yeah. I'll take a look at that. > >>> > >>> But the other question is whether the deadlocking version is > >>> something > >>> that is worth fixing (i.e. a current stable branch or trunk). > >>> > >>> > >>> On Thu, 2011-02-17 at 15:45 -0600, Allan Espinosa wrote: > >>> > The latest trunk breaks for another case (see my post on 'broken > >>> > resume files'). So I can't reproduce this there (yet). > >>> > > >>> > 2011/2/17 Mihael Hategan : > >>> > > Ok. Your deadlock is genuine, but your version of swift seems > >>> > > old. > >>> > > Are > >>> > > you sure it wasn't fixed in the mean time? > >>> > > > >>> > > On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: > >>> > >> Version > >>> > >> > >>> > >> swift-r3835 cog-r2988 > >>> > >> > >>> > >> see attached: > >>> > >> > > > -- > Allan M. 
Espinosa > PhD student, Computer Science > University of Chicago -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Thu Feb 17 16:54:14 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 16:54:14 -0600 Subject: [Swift-devel] deadlock on workflow: In-Reply-To: <1338073925.69910.1297982994955.JavaMail.root@zimbra.anl.gov> References: <1297979750.21061.2.camel@blabla2.none> <1338073925.69910.1297982994955.JavaMail.root@zimbra.anl.gov> Message-ID: Hi Mike, I haven't tested it yet. I will need my sites.xml definitions to not use persistent+ passive coasters before being able to test it. What's the difference in 'explicit' resume? In my setup i have a "resumefile" that i've been using for the past few months. -Allan 2011/2/17 Michael Wilde : > Allan, > > If you already stated this I missed it: are you able to run on 0.92? And does the deadlock occur there? Is resume working in 0.92? > > And, have you considered using explicit resume based on having your input mapper only return members of the dataset that are not yet competed? (I think Glen Hocky used that technique with good results in his latest Glass runs). > > - Mike > > > ----- Original Message ----- >> Yeah. I'll take a look at that. >> >> But the other question is whether the deadlocking version is something >> that is worth fixing (i.e. a current stable branch or trunk). >> >> >> On Thu, 2011-02-17 at 15:45 -0600, Allan Espinosa wrote: >> > The latest trunk breaks for another case (see my post on 'broken >> > resume files'). So I can't reproduce this there (yet). >> > >> > 2011/2/17 Mihael Hategan : >> > > Ok. Your deadlock is genuine, but your version of swift seems old. >> > > Are >> > > you sure it wasn't fixed in the mean time? 
>> > > >> > > On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: >> > >> Version >> > >> >> > >> swift-r3835 cog-r2988 >> > >> >> > >> see attached: >> > >> -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Thu Feb 17 17:19:17 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 17:19:17 -0600 Subject: [Swift-devel] deadlock on workflow: In-Reply-To: <2018980679.70011.1297984264120.JavaMail.root@zimbra.anl.gov> References: <2018980679.70011.1297984264120.JavaMail.root@zimbra.anl.gov> Message-ID: 2011/2/17 Michael Wilde : > Allan, > >> 2011/2/17 Allan Espinosa : >> > Hi Mike, >> > >> > I haven't tested it yet. I will need my sites.xml definitions to not >> > use persistent+ passive coasters before being able to test it. > > Why is that? 0.92 supports persistent, passive coasters, doesn't it? My mistake, I was referring to the 0.91 release (tarball package on the Swift webpage) > >> > >> > What's the difference in 'explicit' resume? In my setup i have a >> > "resumefile" that i've been using for the past few months. > > By "explicit resume" I meant not using the Swift resume feature, but instead, having your input mapper not return any input dataset members that it knows have already been processed successfully, by checking the output dataset. > > Both styles of resume have their pros and cons. The advantage of this "explicit" resume approach is that the definition of "done" for a dataset member can be application-specific. And that it doesn't depend on the automated Swift feature, which likely needs more testing and hardening. The disadvantage is that you have to program it yourself, explicitly. > > - Mike > >> > >> > -Allan >> > >> > 2011/2/17 Michael Wilde : >> >> Allan, >> >> >> >> If you already stated this I missed it: are you able to run on >> >> 0.92? And does the deadlock occur there? Is resume working in 0.92? 
>> >> >> >> And, have you considered using explicit resume based on having your >> >> input mapper only return members of the dataset that are not yet >> >> competed? (I think Glen Hocky used that technique with good results >> >> in his latest Glass runs). >> >> >> >> - Mike >> >> >> >> >> >> ----- Original Message ----- >> >>> Yeah. I'll take a look at that. >> >>> >> >>> But the other question is whether the deadlocking version is >> >>> something >> >>> that is worth fixing (i.e. a current stable branch or trunk). >> >>> >> >>> >> >>> On Thu, 2011-02-17 at 15:45 -0600, Allan Espinosa wrote: >> >>> > The latest trunk breaks for another case (see my post on 'broken >> >>> > resume files'). So I can't reproduce this there (yet). >> >>> > >> >>> > 2011/2/17 Mihael Hategan : >> >>> > > Ok. Your deadlock is genuine, but your version of swift seems >> >>> > > old. >> >>> > > Are >> >>> > > you sure it wasn't fixed in the mean time? >> >>> > > >> >>> > > On Thu, 2011-02-17 at 13:49 -0600, Allan Espinosa wrote: >> >>> > >> Version >> >>> > >> >> >>> > >> swift-r3835 cog-r2988 >> >>> > >> >> >>> > >> see attached: >> >>> > >> -- Allan M. Espinosa PhD student, Computer Science University of Chicago From aespinosa at cs.uchicago.edu Thu Feb 17 17:32:30 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 17:32:30 -0600 Subject: [Swift-devel] Re: resume files broken in trunk? 
In-Reply-To: References: Message-ID: Resume works in release-0.92: $ ~/swift/stable/bin/swift -config swift.properties -cdm.file fs.data -resume resumefile postproc.swift swift-r4110 cog-r3032 RunID: 20110217-1730-pcgv48qb Progress: Progress: uninitialized:1 Finished in previous run:2 Progress: uninitialized:3 Initializing:259 Finished in previous run:317 Progress: uninitialized:1 Initializing:19 Finished in previous run:609 Progress: Initializing:4 Finished in previous run:658 Progress: uninitialized:1 Finished in previous run:1656 Progress: Initializing:3 Finished in previous run:2985 Progress: Finished in previous run:3150 Progress: Initializing:74 Finished in previous run:4131 Progress: uninitialized:7 Finished in previous run:4450 2011/2/15 Allan Espinosa : > Hi, > > I noticed that the workflow I've been running before doest not create > the resumefile properly after using swift-r4089 cog-r3051 > > -Allan > > -- > Allan M. Espinosa > PhD student, Computer Science > University of Chicago > -- Allan M. Espinosa PhD student, Computer Science University of Chicago From hategan at mcs.anl.gov Thu Feb 17 17:40:00 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 17 Feb 2011 15:40:00 -0800 Subject: [Swift-devel] Re: resume files broken in trunk? In-Reply-To: References: Message-ID: <1297986000.21875.0.camel@blabla2.none> Right. So it's broken, somewhat unsurprisingly, in trunk. 
On Thu, 2011-02-17 at 17:32 -0600, Allan Espinosa wrote: > Resume works in release-0.92: > > $ ~/swift/stable/bin/swift -config swift.properties -cdm.file fs.data > -resume resumefile postproc.swift > > swift-r4110 cog-r3032 > > RunID: 20110217-1730-pcgv48qb > Progress: > Progress: uninitialized:1 Finished in previous run:2 > Progress: uninitialized:3 Initializing:259 Finished in previous run:317 > Progress: uninitialized:1 Initializing:19 Finished in previous run:609 > Progress: Initializing:4 Finished in previous run:658 > Progress: uninitialized:1 Finished in previous run:1656 > Progress: Initializing:3 Finished in previous run:2985 > Progress: Finished in previous run:3150 > Progress: Initializing:74 Finished in previous run:4131 > Progress: uninitialized:7 Finished in previous run:4450 > > > 2011/2/15 Allan Espinosa : > > Hi, > > > > I noticed that the workflow I've been running before doest not create > > the resumefile properly after using swift-r4089 cog-r3051 > > > > -Allan > > > > -- > > Allan M. Espinosa > > PhD student, Computer Science > > University of Chicago > > > > > From aespinosa at cs.uchicago.edu Thu Feb 17 18:02:47 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 18:02:47 -0600 Subject: [Swift-devel] Re: deadlock on workflow: In-Reply-To: References: Message-ID: ok, the deadlock is in branches/release-0.92 as well (swift-r4110 cog-r3032) -Allan 2011/2/17 Allan Espinosa : > Version > > swift-r3835 cog-r2988 > > see attached: -------------- next part -------------- A non-text attachment was scrubbed... Name: postproc-20110217-1735-qohncon6.log.bz2 Type: application/x-bzip2 Size: 3127599 bytes Desc: not available URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: deadlock Type: application/octet-stream Size: 24646 bytes Desc: not available URL: From wilde at mcs.anl.gov Thu Feb 17 18:14:24 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 17 Feb 2011 18:14:24 -0600 (CST) Subject: [Swift-devel] Re: deadlock on workflow: In-Reply-To: Message-ID: <227721707.70260.1297988064430.JavaMail.root@zimbra.anl.gov> Thanks, Allan - good catch. *That* makes it worth fixing I feel, or at least diagnosing its likelihood. Mihael, do you agree? - Mike ----- Original Message ----- > ok, the deadlock is in branches/release-0.92 as well (swift-r4110 > cog-r3032) > > -Allan > > 2011/2/17 Allan Espinosa : > > Version > > > > swift-r3835 cog-r2988 > > > > see attached: > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From aespinosa at cs.uchicago.edu Thu Feb 17 18:33:43 2011 From: aespinosa at cs.uchicago.edu (Allan Espinosa) Date: Thu, 17 Feb 2011 18:33:43 -0600 Subject: [Swift-devel] Re: deadlock on workflow: In-Reply-To: <227721707.70260.1297988064430.JavaMail.root@zimbra.anl.gov> References: <227721707.70260.1297988064430.JavaMail.root@zimbra.anl.gov> Message-ID: Just to describe the setup: persistent coasters + passive workers in PADS. I didn't do a passive-fy workflow before running this one. I'll try this with normal coasters on local:pbs (PADS) -Allan 2011/2/17 Michael Wilde : > Thanks, Allan - good catch. *That* makes it worth fixing I feel, or at least diagnosing its likelihood. Mihael, do you agree? 
> > - Mike > > ----- Original Message ----- >> ok, the deadlock is in branches/release-0.92 as well (swift-r4110 >> cog-r3032) >> >> -Allan >> >> 2011/2/17 Allan Espinosa : >> > Version >> > >> > swift-r3835 cog-r2988 >> > >> > see attached: >> From hategan at mcs.anl.gov Thu Feb 17 21:32:43 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 17 Feb 2011 19:32:43 -0800 Subject: [Swift-devel] Re: deadlock on workflow: In-Reply-To: <227721707.70260.1297988064430.JavaMail.root@zimbra.anl.gov> References: <227721707.70260.1297988064430.JavaMail.root@zimbra.anl.gov> Message-ID: <1297999963.23095.0.camel@blabla2.none> I agree. On Thu, 2011-02-17 at 18:14 -0600, Michael Wilde wrote: > Thanks, Allan - good catch. *That* makes it worth fixing I feel, or at least diagnosing its likelihood. Mihael, do you agree? > > - Mike > > ----- Original Message ----- > > ok, the deadlock is in branches/release-0.92 as well (swift-r4110 > > cog-r3032) > > > > -Allan > > > > 2011/2/17 Allan Espinosa : > > > Version > > > > > > swift-r3835 cog-r2988 > > > > > > see attached: > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Fri Feb 18 20:19:00 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 18 Feb 2011 18:19:00 -0800 Subject: [Swift-devel] coasters about half the jobs Message-ID: <1298081940.31362.26.camel@blabla2.none> There was a bug in the block allocation scheme that would cause blocks to be kept, in the long run, at about half of what would normally be necessary. This included shutting down perfectly good blocks that could be used for jobs. The effect was more dramatic when the maximum block size was 1. I committed a fix for this in the stable branch (cog r3052). If you've experienced the above, you could give this a try. 
It would also be helpful if you gave it a try anyway, just to check if things are going ok. Mihael From wilde at mcs.anl.gov Fri Feb 18 21:45:13 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 18 Feb 2011 21:45:13 -0600 (CST) Subject: [Swift-devel] coasters about half the jobs In-Reply-To: <1298081940.31362.26.camel@blabla2.none> Message-ID: <482871825.74668.1298087113370.JavaMail.root@zimbra.anl.gov> Just tried this on Beagle with a similar workload to the one that shows the original problem. I got: Progress: Stage in:2486 Submitting:14 Progress: Stage in:1712 Submitting:787 Submitted:1 queuedsize > 0 but no job dequeued. Queued: {} java.lang.Throwable at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253) Logs are in: login1$ cat out.pdb.all.00 Swift svn swift-r4061 (swift modified locally) cog-r3052 (cog modified locally) Output on stdout/err is below. Thanks! Mike RunID: 20110218-2137-v87vupcc Progress: SwiftScript trace: 10gs-1 SwiftScript trace: 1a1u-1 SwiftScript trace: 1m3g-1 SwiftScript trace: 1a1x-1 SwiftScript trace: 1a1m-1 SwiftScript trace: 1a12-1 SwiftScript trace: 1m62-1 SwiftScript trace: 1a22-1 SwiftScript trace: 121p-1 SwiftScript trace: 1a4p-1 SwiftScript trace: 1m6b-1 SwiftScript trace: 1m7b-1 SwiftScript trace: 1m9i-1 SwiftScript trace: 1mi1-1 SwiftScript trace: 1m6b-2 SwiftScript trace: 1a22-2 SwiftScript trace: 1mfg-1 SwiftScript trace: 1m9j-1 SwiftScript trace: 1a1w-1 SwiftScript trace: 1mdi-1 SwiftScript trace: 1mq1-1 SwiftScript trace: 1mp1-1 SwiftScript trace: 1mq0-1 SwiftScript trace: 1mk3-1 SwiftScript trace: 1mj4-1 SwiftScript trace: 1mil-1 SwiftScript trace: 1mr1-1 SwiftScript trace: 1nbq-1 SwiftScript trace: 1mr8-1 SwiftScript trace: 1mr1-2 SwiftScript trace: 1n4m-2 SwiftScript trace: 1n83-1 SwiftScript trace: 1mm2-1 SwiftScript trace: 1nd7-1 SwiftScript trace: 1nm8-1 SwiftScript trace: 1n4m-3 SwiftScript trace: 1nfi-2 SwiftScript trace: 1nou-2 
SwiftScript trace: 1nou-1 SwiftScript trace: 1nfi-1 SwiftScript trace: 1o5e-1 SwiftScript trace: 1o6u-2 SwiftScript trace: 1nty-1 SwiftScript trace: 1mx3-1 SwiftScript trace: 1n3u-2 SwiftScript trace: 1muz-1 SwiftScript trace: 1o86-1 SwiftScript trace: 1n3u-1 SwiftScript trace: 1oa8-1 SwiftScript trace: 1oc0-1 Progress: uninitialized:3 Progress: Initializing:1311 Selecting site:1189 Progress: Selecting site:2499 Initializing site shared directory:1 Progress: Selecting site:2340 Initializing site shared directory:1 Stage in:159 Progress: Stage in:2486 Submitting:14 Progress: Stage in:1712 Submitting:787 Submitted:1 queuedsize > 0 but no job dequeued. Queued: {} java.lang.Throwable at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) queuedsize > 0 but no job dequeued. Queued: {} java.lang.Throwable at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:253) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:521) at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109) login1$ finger kelly Logs are on CT net in /home/wilde/mp/mp04: cp ftdock-20110218-2137-v87vupcc.log out.pdb.all.00 ~/mp/mp04/ - Mike ----- Original Message ----- > There was a bug in the block allocation scheme that would cause blocks > to be kept, in the long run, at about half of what would normally be > necessary. This included shutting down perfectly good blocks that > could > be used for jobs. The effect was more dramatic when the maximum block > size was 1. 
> > I committed a fix for this in the stable branch (cog r3052). If you've > experienced the above, you could give this a try. It would also be > helpful if you gave it a try anyway, just to check if things are going > ok. > > Mihael > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From wilde at mcs.anl.gov Fri Feb 18 21:58:48 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 18 Feb 2011 21:58:48 -0600 (CST) Subject: [Swift-devel] coasters about half the jobs In-Reply-To: <482871825.74668.1298087113370.JavaMail.root@zimbra.anl.gov> Message-ID: <1587690424.74686.1298087928037.JavaMail.root@zimbra.anl.gov> It fails for 10- and 1-job runs as well. - Mike
From hategan at mcs.anl.gov Fri Feb 18 22:01:59 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 18 Feb 2011 20:01:59 -0800 Subject: [Swift-devel] coasters about half the jobs In-Reply-To: <1587690424.74686.1298087928037.JavaMail.root@zimbra.anl.gov> References: <1587690424.74686.1298087928037.JavaMail.root@zimbra.anl.gov> Message-ID: <1298088119.5261.0.camel@blabla2.none> Thanks. On Fri, 2011-02-18 at 21:58 -0600, Michael Wilde wrote: > It fails for 10- and 1-job runs as well.
From hategan at mcs.anl.gov Fri Feb 18 22:35:48 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 18 Feb 2011 20:35:48 -0800 Subject: [Swift-devel] coasters about half the jobs In-Reply-To: <1298088119.5261.0.camel@blabla2.none> References: <1587690424.74686.1298087928037.JavaMail.root@zimbra.anl.gov> <1298088119.5261.0.camel@blabla2.none> Message-ID: <1298090148.5261.1.camel@blabla2.none> And sorry about that. r3053 should fix that. On Fri, 2011-02-18 at 20:01 -0800, Mihael Hategan wrote: > Thanks. > > On Fri, 2011-02-18 at 21:58 -0600, Michael Wilde wrote: > > It fails for 10- and 1-job runs as well.
From wilde at mcs.anl.gov Sat Feb 19 08:42:56 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 19 Feb 2011 08:42:56 -0600 (CST) Subject: [Swift-devel] coasters about half the jobs In-Reply-To: <1298090148.5261.1.camel@blabla2.none> Message-ID: <330353490.75871.1298126576736.JavaMail.root@zimbra.anl.gov>

r3053 nicely fixes the problem with coaster blocks getting cancelled prematurely. But the scheduling behavior still shows a similar problem, in that only about half the cores are utilized.

I ran the same setup: foreach.max.threads of 50 to run about 2500 jobs at once. Throttle of 25.0 to throttle coasters similarly. 100 slots. Slots have a 27-hour walltime (100K seconds). App maxwalltime in tc.data of 1 hour.

The logs are on CI net at /home/wilde/mp/mp04:
ftdock-20110218-2307-xfdlhkd5.{log,stdout}

Pretty much the same execution pattern occurred:

> Stage in completes rapidly and jobs are started:

Progress: uninitialized:3
Progress: Selecting site:2499 Initializing site shared directory:1
Progress: Selecting site:1100 Initializing site shared directory:1 Stage in:1399
Progress: Stage in:2360 Submitting:140
Progress: Stage in:1630 Submitting:869 Submitted:1
Progress: Stage in:1630 Submitting:820 Submitted:50
Progress: Stage in:1625 Submitting:442 Submitted:433
Progress: Stage in:1625 Submitting:79 Submitted:796
Progress: Stage in:1368 Submitting:12 Submitted:1120
Progress: Stage in:1037 Submitting:58 Submitted:1357 Active:48
Progress: Stage in:302 Submitting:269 Submitted:1812 Active:117
Progress: Submitted:2331 Active:169
Progress: Submitted:2259 Active:241
Progress: Submitted:2211 Active:289

> This time, we get all coaster slots filled pretty quickly:

Progress: Submitted:219 Active:2281
Progress: Submitted:147 Active:2353
Progress: Submitted:100 Active:2400

> Then jobs start finishing:

Progress: Submitted:100 Active:2399 Checking status:1 Finished successfully:2
Progress: Submitted:100 Active:2399 Finished successfully:7
Progress: Submitted:100 Active:2399 Checking status:1 Finished successfully:14

> Workers stay filled until about 800 jobs finish. Then the worker
> utilization level starts dropping off, monotonically:

Progress: Submitted:95 Active:2399 Checking status:1 Finished successfully:747
Progress: Submitted:90 Active:2398 Finished successfully:760
Progress: Submitted:81 Active:2398 Checking status:1 Stage out:2 Finished successfully:773
Progress: Submitted:71 Active:2399 Checking status:1 Finished successfully:797
Progress: Submitted:68 Active:2398 Checking status:1 Stage out:1 Finished successfully:809
Progress: Submitted:63 Active:2397 Finished successfully:827
Progress: Submitted:64 Active:2392 Checking status:1 Finished successfully:852
Progress: Stage in:1 Submitted:64 Active:2385 Stage out:1 Finished successfully:869
Progress: Submitting:2 Submitted:59 Active:2379 Finished successfully:890
Progress: Submitted:62 Active:2372 Finished successfully:909

> The dropoff continues till I hit ^C on the run:

Progress: Stage in:1 Submitted:1174 Active:1024 Finished successfully:3591
Progress: Submitted:1174 Active:1024 Checking status:1 Finished successfully:3591
Progress: Submitted:1175 Active:1023 Checking status:1 Finished successfully:3593
Progress: Submitted:1175 Active:1023 Stage out:1 Finished successfully:3596
Progress: Submitted:1177 Active:1021 Checking status:1 Finished successfully:3601
Progress: Submitted:1178 Active:1020 Checking status:1 Finished successfully:3604
Shutting down worker
Shutting down worker

> Just before I stopped the run, I checked the number of running
> worker blocks in PBS a few times, and saw this:

login1$ qstat -u wilde | grep ' R ' | wc -l
99

I caught at least 1 job in a "C" state. It looks like 1 worker of 100 died, for separate reasons we can explore later. Or could the one worker termination have triggered the worker-underutilization anomaly?

With a duration of 100,000 seconds (about 27 hours), I would have expected the workers to stay up, at least in the absence of fatal node errors, which may be a possibility if that one worker died from OOM errors. I wonder if workers can report RAM pressure stats back to the service? For now, just as logging info; later, as a scheduling criterion?

I will try now to run with multi-job coaster blocks. If it works, I'll try with one big block and see how the scheduler handles that config.

- Mike
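The Progress lines quoted in this message can be reduced to a utilization figure with a few lines of script. What follows is a minimal sketch, not part of Swift: the function names `parse_progress` and `utilization` are hypothetical, and the total of 2400 cores is an assumption inferred from the "Active:2400" line logged when all slots were filled.

```python
import re

# Matches "Label:count" pairs such as "Submitted:100" or
# "Finished successfully:2"; labels may contain spaces.
PAIR = re.compile(r"([A-Za-z][A-Za-z ]*?):(\d+)")


def parse_progress(line):
    """Return {state: count} for one 'Progress:' line, or None otherwise."""
    if not line.startswith("Progress:"):
        return None
    body = line[len("Progress:"):]
    return {name.strip(): int(count) for name, count in PAIR.findall(body)}


def utilization(line, total_cores):
    """Fraction of cores running jobs, based on the 'Active' count."""
    counts = parse_progress(line)
    if counts is None:
        return None
    return counts.get("Active", 0) / total_cores


if __name__ == "__main__":
    # Two lines copied from the run above; 2400 is the assumed core total.
    for sample in (
        "Progress: Submitted:100 Active:2400",
        "Progress: Submitted:1178 Active:1020 Checking status:1 "
        "Finished successfully:3604",
    ):
        print(f"{utilization(sample, 2400):.3f}  {sample}")
```

Applied to the whole stdout file, the same function would give one utilization value per Progress line, which makes the drop from full occupancy to under half easier to see than in the raw log.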
From wilde at mcs.anl.gov Sat Feb 19 10:15:24 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Sat, 19 Feb 2011 10:15:24 -0600 (CST) Subject: [Swift-devel] coasters about half the jobs In-Reply-To: <330353490.75871.1298126576736.JavaMail.root@zimbra.anl.gov> Message-ID: <2048001520.75941.1298132124001.JavaMail.root@zimbra.anl.gov>

Mihael,

I need to correct one point I made in my previous message:

> But the scheduling behavior still shows a similar problem, in that
> only about half the cores are utilized.

As I was pasting the output I realized that the workers *were* getting filled to 100%, but then later the utilization dropped off and did not seem to recover.

In later experiments I tested with a single large coaster block of compute nodes instead of many small one-node blocks. This showed some interesting behavior (but, I think, much better utilization). There I got an oscillating pattern, where I would have all 2400 cores utilized, then what seemed like sinusoidal dips to about 2100 cores, then back up to 2400, etc. (I can't tell without plotting whether it's in fact a sinusoid.)

I'm a bit suspicious of some interaction in this run with the foreach.max.threads throttle, as that throttle is set only 100 jobs higher than the number of workers, and I see curious reporting of the value of "Submitted", which does not seem to stay at 100 as I would expect.

Since the large node blocks seem to work now, I'm going to try to get a science-production run going, and we can come back to the scheduling behavior later. I'll post the log from the big-block run shortly, and maybe you can see the pattern and issue from that.

Thanks,

Mike
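Whether the dips in the big-block run recur at regular intervals can be checked without plotting. The sketch below is only an illustration under stated assumptions: `dip_starts` and `dip_spacing` are hypothetical helper names, the sample series is made up and is not taken from the run, and the 100-core tolerance is an arbitrary choice. Evenly spaced dips would be consistent with a periodic pattern but would not by themselves show that it is a sinusoid.

```python
def dip_starts(active, full, tolerance):
    """Indices where the active count first falls below full - tolerance."""
    threshold = full - tolerance
    starts = []
    in_dip = False
    for index, value in enumerate(active):
        below = value < threshold
        if below and not in_dip:
            starts.append(index)  # a new excursion below the threshold begins
        in_dip = below
    return starts


def dip_spacing(starts):
    """Number of samples between the starts of consecutive dips."""
    return [later - earlier for earlier, later in zip(starts, starts[1:])]


if __name__ == "__main__":
    # Made-up "Active" counts for illustration only (not data from the run).
    series = [2400, 2390, 2250, 2100, 2260, 2400,
              2400, 2240, 2110, 2300, 2400]
    starts = dip_starts(series, full=2400, tolerance=100)
    print("dip starts:", starts)
    print("spacing between dips:", dip_spacing(starts))
```

The input series would be the "Active" counts extracted from successive Progress lines of the big-block log once it is posted.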
The effect was more dramatic when the > > > > > > maximum > > > > > > block > > > > > > size was 1. > > > > > > > > > > > > I committed a fix for this in the stable branch (cog r3052). > > > > > > If > > > > > > you've > > > > > > experienced the above, you could give this a try. It would > > > > > > also be > > > > > > helpful if you gave it a try anyway, just to check if things > > > > > > are > > > > > > going > > > > > > ok. > > > > > > > > > > > > Mihael > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > -- > > > > > Michael Wilde > > > > > Computation Institute, University of Chicago > > > > > Mathematics and Computer Science Division > > > > > Argonne National Laboratory > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory From jon.monette at gmail.com Sat Feb 19 14:54:11 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sat, 19 Feb 2011 14:54:11 -0600 Subject: [Swift-devel] Re: Workflow waiting on condition hang In-Reply-To: <1297978769.20789.2.camel@blabla2.none> 
References: <4D5D8F6A.20906@gmail.com> <1297978769.20789.2.camel@blabla2.none> Message-ID: <4D602DF3.6000306@gmail.com> Yes. It always seems to hang at the same place. Attached is my montage script. It hangs in the mFitBatch function at the mConcatFit app call. All other files have been created up to that step but that app never runs. On 2/17/11 3:39 PM, Mihael Hategan wrote: > On Thu, 2011-02-17 at 15:13 -0600, Jonathan Monette wrote: >> Hello, >> My workflow seems to be hanging. This is trunk swift-r4107 and >> cog-r3051. Attached is a compressed log file and the jstack output for >> my workflow. The jstack file says it is waiting for a condition and my >> workflow hangs. > There's lots of stuff waiting because that's what they do when they > don't have anything else to do. So I don't see a problem there. > > There are no jobs going to the coaster service, so clearly things aren't > progressing. > > So now the question is: does this happen every time you run it or just > some times? > > Also, please send the swift script. > > Mihael > > -------------- next part -------------- An embedded and charset-unspecified text was scrubbed... Name: montage.swift URL: From hategan at mcs.anl.gov Sat Feb 19 16:33:36 2011 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 19 Feb 2011 14:33:36 -0800 Subject: [Swift-devel] coasters about half the jobs In-Reply-To: <2048001520.75941.1298132124001.JavaMail.root@zimbra.anl.gov> References: <2048001520.75941.1298132124001.JavaMail.root@zimbra.anl.gov> Message-ID: <1298154816.17560.10.camel@blabla2.none> On Sat, 2011-02-19 at 10:15 -0600, Michael Wilde wrote: > Mihael, I need to correct one point I made below: > > > But the scheduling behavior still shows a similar problem, in that > > only about half the cores are utilized. > > As I was pasting the output I realized that the workers *were* getting > filled to 100%, but then later the utilization dropped off and did not > seem to recover. 
>
> In later experiments I tested with a single large coaster block of
> compute nodes instead of many small one-node blocks. This showed some
> interesting behavior (but I think much better utilization). There I
> got an oscillating pattern, where I would have all 2400 nodes utilized,
> then what seemed like sinusoidal dips to about 2100 cores, then back
> up to 2400, etc. (I can't tell without plotting if it's really in fact
> a sinusoid).

It's oscillating, but overall (assuming some non-trivial distribution of job durations) it should be close to a decaying sine that tends to a value somewhat less than the maximum number of workers.

This is a known "problem" when you have a delay between completion and the submission of new jobs. It causes transients around job waves as long as there are such waves (i.e. lots of jobs being run at once). We've seen this on the BGP, and Sarah has also seen it earlier on Ranger. The wider the job time distribution, the quicker the decay of the oscillatory pattern.

The solution is to make sure that the coaster service always has enough jobs queued. And enough here is, I would suggest, about twice the maximum number of workers.

So try this: set maximum workers (wpn*slots) to half the site throttle (or the site throttle and the foreach max threads to twice wpn*slots). This way, even if one wave of jobs completes at once, the coaster service will already have enough jobs queued to start immediately on the available workers.

Mihael

From aespinosa at cs.uchicago.edu Mon Feb 21 14:40:21 2011
From: aespinosa at cs.uchicago.edu (Allan Espinosa)
Date: Mon, 21 Feb 2011 14:40:21 -0600
Subject: [Swift-devel] [Patch] remove incorrect git-svn options
Message-ID:

patch:

diff --git a/libexec/svn-revision b/libexec/svn-revision
index 6d88fb8..f759d63 100755
--- a/libexec/svn-revision
+++ b/libexec/svn-revision
@@ -6,7 +6,7 @@ hereversion() {
         M=$(svn status | grep --invert-match '^\?' > /dev/null && echo "($1 modified locally)")
     elif [ -d ".git" ] && [ -x "$(which git)" ]; then
         R=$(git svn info | grep '^Revision' | sed "s/Revision: /$1-r/")
-        if git status -a >/dev/null ; then
+        if git status --porcelain | grep 'M ' >/dev/null ; then
             M="($1 modified locally)"
         fi
     else

Before:

$ ./libexec/svn-revision
error: unknown switch `a'
usage: git status [options] [--] ...
    -v, --verbose         be verbose
    -s, --short           show status concisely
    --porcelain           show porcelain output format
    -z, --null            terminate entries with NUL
    -u, --untracked-files[=]
                          show untracked files, optional modes: all, normal, no. (Default: all)
error: unknown switch `a'
usage: git status [options] [--] ...
    -v, --verbose         be verbose
    -s, --short           show status concisely
    --porcelain           show porcelain output format
    -z, --null            terminate entries with NUL
    -u, --untracked-files[=]
                          show untracked files, optional modes: all, normal, no. (Default: all)
swift-r4110 cog-r3032
$

After:

$ ./libexec/svn-revision
swift-r4110 (swift modified locally) cog-r3032

--
Allan M. Espinosa
PhD student, Computer Science
University of Chicago

From wilde at mcs.anl.gov Wed Feb 23 11:30:54 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 23 Feb 2011 11:30:54 -0600 (CST)
Subject: [Swift-devel] Example of a binary release build process from the OpenMX team
In-Reply-To:
Message-ID: <831748476.92132.1298482254052.JavaMail.root@zimbra.anl.gov>

Attached is a note from Michael Spiegel on how they build OpenMx releases. Just FYI.

OpenMx has a much harder job than us because they build binaries compiled for specific platforms. As a pure Java product, we avoid much of this. But the attached doc has some release workflow examples that may be useful to us, including building the doc set and updating their website with a notice about the new release.

They don't automate this; we have, I think, some automation of our process, but it needs to be dusted off, adjusted, and documented.
- Mike

----- Forwarded Message -----
From: "Michael Spiegel"
To: "OpenMx Developers"
Sent: Tuesday, February 22, 2011 3:54:00 PM
Subject: [[openmx-dev]] binary release instructions

These instructions are primarily for Ross. But I figured we should keep a record of this information somewhere. I do not recommend attempting to automate this process, unless you are very good at writing error handling routines for shell scripts. Invariably, something will go wrong in the build process. When running this process by hand, a person can stop and attempt to correct the problem. An automated script is likely to push the website into an inconsistent state.

--Michael

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

-------------- next part --------------
There are four possible build types when creating an OpenMx binary release. These four types are the cross-product of (a) building either a "dev" release from /trunk or a "stable" release from /tag, and (b) either building the online documentation or not. Never lose track of which build type you are currently creating.

The following definitions are used in these notes:

SERVER = "openmx.psyc.virginia.edu"
SERVER_ROOT = "/var/openmx/newrelease"
SVN_REPO = either the "trunk" directory or a "tags/stable-XX" directory

I. On your local machine, we run several tests on the code. When the tests have passed, we will commit some changes to the svn repository that indicate a new binary version has been released.

- In SVN_REPO, run "make install".
- In SVN_REPO, run "make test". Optionally use the CPUS=n argument. All these tests MUST PASS.
- In SVN_REPO, run "make check". Ignore the warning for missing documentation entries. All other warnings must be corrected.
- Go to http://openmx.psyc.virginia.edu/dev/timeline. Let R* be the revision number in the top left panel. The next revision number is R = R* + 1. The full version number of a "dev" release is VERSION = 999.0.0-R. The full version number of a stable release is VERSION = X.Y.Z-R.
- In SVN_REPO/Makefile, change the TARGET to OpenMx_[VERSION].tar.gz.
  In SVN_REPO/DESCRIPTION, change the "Version:" field and the "Date:" field.
  In SVN_REPO/CHANGES, rename the first line from "trunk" to "Release [VERSION] (Today's date)".
  In SVN_REPO/CHANGES, make sure to update the "=====" to the correct length. That eliminates a warning message.
  If you are building the documentation, then in docs/source/conf.py update the release and version information.
- Commit these changes to the SVN repository. Step (I) is completed.

II. ssh to SERVER and go to the directory SERVER_ROOT. Run the script ./cleanup.sh.

You are going to build the binary release on various machines. For each build, you will copy the result into a subdirectory of SERVER_ROOT. Each row of the following table has a hostname, a makefile target, the output file on the hostname, and the destination directory on SERVER.

hostname     make target   output file             destination directory
euterpe      build32       build/OpenMx**.tar.gz   macosx-intel-32-2.12
euterpe      build64       build/OpenMx**.tar.gz   macosx-intel-64-2.12
euterpe      buildppc      build/OpenMx**.tar.gz   macosx-intel-ppc-2.12
euterpe      srcbuild      build/OpenMx**.tar.gz   source
euterpe      pdf           build/OpenMx.pdf        docs-api-pdf
polymnia     build32       build/OpenMx**.tar.gz   macosx-intel-32-2.11
polymnia     build64       build/OpenMx**.tar.gz   macosx-intel-64-2.11
polymnia     buildppc      build/OpenMx**.tar.gz   macosx-intel-ppc-2.11
win R 2.11   winbuild32    build/OpenMx**.tar.gz   windows-32-2.11
win R 2.12   winbuild32    build/OpenMx**.tar.gz   windows-32-2.12

You will need to rely on me to generate the windows builds until somebody else on the team procures a reasonable development machine that is running windows.

If you are building the documentation, then first run "make html" on euterpe. Then enter the docs/ directory and run "make latex". Finally enter the docs/build/latex directory and run "make all-pdf". You want to scp the file docs/build/latex/OpenMx.pdf into the destination directory "docs-userguide-pdf". You want to recursively copy all the contents of docs/build/html into the destination directory "docs-userguide-html".

Now ssh into SERVER and go to SERVER_ROOT. Type "ls -l *" to see that each directory has some files. You want to edit the script "go.sh". Change the version number to the correct version number of this binary release. Next select the values you want for the TARGETDIR and MAKEDOCS variables. Comment out the values you don't want. Run ./go.sh and see if any errors are reported.

Finally, go to the website and test the installation of the binary release on your laptop. Then log on to the website and click on "Create Story" under "Publicist" in the left-hand panel. Create an announcement of the new binary release. You may copy/paste the top of the CHANGES file into this announcement.

From wilde at mcs.anl.gov Wed Feb 23 11:33:51 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 23 Feb 2011 11:33:51 -0600 (CST)
Subject: [Swift-devel] Userguide line for 0.91 needs updating
Message-ID: <285619035.92143.1298482431332.JavaMail.root@zimbra.anl.gov>

Sarah,

The doc link at:

http://www.ci.uchicago.edu/swift/docs/index.php
Swift User Guide single-page html

...points to the trunk userguide instead of the 0.91 version.

- Mike

From wilde at mcs.anl.gov Wed Feb 23 12:02:46 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 23 Feb 2011 12:02:46 -0600 (CST)
Subject: [Swift-devel] Wiki page on maintaining Swift Web content
Message-ID: <1058673345.92263.1298484166337.JavaMail.root@zimbra.anl.gov>

Sarah, David,

This page created by Justin will likely be useful:

http://www.ci.uchicago.edu/wiki/bin/view/SWFT/MaintainingSwiftWebContent

It would be great if you could update it as we go through the Swift-web update process.
- Mike

From skenny at uchicago.edu Thu Feb 24 13:31:22 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Thu, 24 Feb 2011 11:31:22 -0800
Subject: [Swift-devel] SwiftR user-ready?
Message-ID:

does this work? http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftR

if so i have someone who might be interested in using it, he's just trying to run bootstrapping on an 8-core mac and getting frustrated with snowfall...i'd seen svn activity on this so thought i'd see if it was in a state where you might want someone to play around with it :) however, the download failed for me:

[skenny at cosmo builds]$ wget http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz
--2011-02-24 11:21:45-- http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz
Resolving people.cs.uchicago.edu... 128.135.164.139
Connecting to people.cs.uchicago.edu|128.135.164.139|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-02-24 11:21:45 ERROR 404: Not Found

From wilde at mcs.anl.gov Thu Feb 24 13:56:54 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 24 Feb 2011 13:56:54 -0600 (CST)
Subject: [Swift-devel] SwiftR user-ready?
In-Reply-To:
Message-ID: <885057122.97707.1298577414458.JavaMail.root@zimbra.anl.gov>

Sarah, it does work, although it needs much more testing and hardening. Tim Armstrong, a CS PhD student working under Ian, is developing and enhancing it. At the moment Tim is getting a 400X speedup on 512-core cluster jobs on Eureka. He's tuning it further, and will test and validate on Beagle shortly.

It would be great if you can have your user try it and report problems and questions back to swift-user for now, where Tim and I can answer them. Later we may move or cc: the questions to an OpenMx list.

It would be good to know what the snowfall problems are so we can know more of the pitfalls of parallel R.

Tim and I are eager to get more users and validate that SwiftR works on many platforms.

Related: we'd like to try to replicate some of your large fMRI-SEM workflows under this. Could you suggest the best way to get started on that? (I.e., from a package of data and swift + shell scripts, Tim could transcribe the workflow to SwiftR.)

Thanks,

- Mike

----- Original Message -----

does this work? http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftR

if so i have someone who might be interested in using it, he's just trying to run bootstrapping on an 8-core mac and getting frustrated with snowfall...i'd seen svn activity on this so thought i'd see if it was in a state where you might want someone to play around with it :) however, the download failed for me:

[skenny at cosmo builds]$ wget http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz
--2011-02-24 11:21:45-- http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz
Resolving people.cs.uchicago.edu... 128.135.164.139
Connecting to people.cs.uchicago.edu |128.135.164.139|:80... connected.
HTTP request sent, awaiting response... 404 Not Found
2011-02-24 11:21:45 ERROR 404: Not Found

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From skenny at uchicago.edu Thu Feb 24 14:14:36 2011
From: skenny at uchicago.edu (Sarah Kenny)
Date: Thu, 24 Feb 2011 12:14:36 -0800
Subject: [Swift-devel] SwiftR user-ready?
In-Reply-To: <885057122.97707.1298577414458.JavaMail.root@zimbra.anl.gov>
References: <885057122.97707.1298577414458.JavaMail.root@zimbra.anl.gov>
Message-ID:

On Thu, Feb 24, 2011 at 11:56 AM, Michael Wilde wrote:

> Sarah, it does work
>

do you have the correct pointer for the tarball?
> > ------------------------------ > > does this work? http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftR > > if so i have someone who might be interested in using it, he's just trying > to run bootstrapping on an 8-core mac and getting frustrated with > snowfall...i'd seen svn activity on this so thought i'd see if it was in a > state where you might want someone to play around with it :) however, the > download failed for me: > > [skenny at cosmo builds]$ wget > http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz > --2011-02-24 11:21:45-- > http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz > Resolving people.cs.uchicago.edu... 128.135.164.139 > Connecting to people.cs.uchicago.edu|128.135.164.139|:80... connected. > HTTP request sent, awaiting response... 404 Not Found > 2011-02-24 11:21:45 ERROR 404: Not Found > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > -- > Michael Wilde > Computation Institute, University of Chicago > Mathematics and Computer Science Division > Argonne National Laboratory > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at mcs.anl.gov Thu Feb 24 16:47:51 2011 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 24 Feb 2011 16:47:51 -0600 (CST) Subject: [Swift-devel] SwiftR user-ready? In-Reply-To: Message-ID: <98097976.98934.1298587671555.JavaMail.root@zimbra.anl.gov> Sorry, I missed the note about the download failure. But pasting your wget into a ci machine it works for me. Was there a transient error, or did Tim fix it in the meantime? Can you try again now? - Mike login1$ wget http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz --2011-02-24 16:45:31-- http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz Resolving people.cs.uchicago.edu... 128.135.164.139 Connecting to people.cs.uchicago.edu|128.135.164.139|:80... connected. 
HTTP request sent, awaiting response... 200 OK Length: 23061464 (22M) [application/x-gzip] Saving to: `Swift_0.1.3.tar.gz.1' 100%[============================================================================================>] 23,061,464 8.50M/s in 2.6s 2011-02-24 16:45:35 (8.50 MB/s) - `Swift_0.1.3.tar.gz.1' saved [23061464/23061464] login1$ ----- Original Message ----- On Thu, Feb 24, 2011 at 11:56 AM, Michael Wilde < wilde at mcs.anl.gov > wrote: Sarah, it does work do you have the correct pointer for the tarball? does this work? http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftR if so i have someone who might be interested in using it, he's just trying to run bootstrapping on an 8-core mac and getting frustrated with snowfall...i'd seen svn activity on this so thought i'd see if it was in a state where you might want someone to play around with it :) however, the download failed for me: [skenny at cosmo builds]$ wget http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz --2011-02-24 11:21:45-- http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz Resolving people.cs.uchicago.edu... 128.135.164.139 Connecting to people.cs.uchicago.edu |128.135.164.139|:80... connected. HTTP request sent, awaiting response... 404 Not Found 2011-02-24 11:21:45 ERROR 404: Not Found _______________________________________________ Swift-devel mailing list Swift-devel at ci.uchicago.edu http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -- Michael Wilde Computation Institute, University of Chicago Mathematics and Computer Science Division Argonne National Laboratory -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.g.armstrong at gmail.com Thu Feb 24 16:57:06 2011 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Thu, 24 Feb 2011 16:57:06 -0600 Subject: [Swift-devel] SwiftR user-ready? 
In-Reply-To:
References: <885057122.97707.1298577414458.JavaMail.root@zimbra.anl.gov>
Message-ID:

Hi Sarah,

Sorry about the dead link, that was my mistake - it should work now.

It does indeed work. It's still very much in development, but the current version is quite dependable running on a single multi-core machine.

I haven't tested on a Mac yet, but it should work provided there is a Java 1.6+ VM and R 2.11+. It would be good to get confirmation that there aren't any funny mac-specific issues.

For your user it should be as simple as

> swiftInit(cores=8)

to start up swift on 8 cores, then to apply a function to a list of arguments

> arglists = list( 1, 2, 3, 4, 5) # arguments for invocations of bootstrap
> results <- swiftapply(bootstrap, arglists)

If they have a dataset used by all instances of bootstrap, the following can work:

> swiftExport(hugeDataSet)
> swiftapply(bootstrap, arglists)

If they need to use a library:

> swiftLibrary(MyLibrary)

- Tim

On Thu, Feb 24, 2011 at 2:14 PM, Sarah Kenny wrote:

>
>
> On Thu, Feb 24, 2011 at 11:56 AM, Michael Wilde wrote:
>
>> Sarah, it does work
>>
>
> do you have the correct pointer for the tarball?
>
>
>>
>> ------------------------------
>>
>> does this work? http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftR
>>
>> if so i have someone who might be interested in using it, he's just trying
>> to run bootstrapping on an 8-core mac and getting frustrated with
>> snowfall...i'd seen svn activity on this so thought i'd see if it was in a
>> state where you might want someone to play around with it :) however, the
>> download failed for me:
>>
>> [skenny at cosmo builds]$ wget
>> http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz
>> --2011-02-24 11:21:45--
>> http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz
>> Resolving people.cs.uchicago.edu... 128.135.164.139
>> Connecting to people.cs.uchicago.edu|128.135.164.139|:80... connected.
>> HTTP request sent, awaiting response...
404 Not Found >> 2011-02-24 11:21:45 ERROR 404: Not Found >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> >> >> >> -- >> Michael Wilde >> Computation Institute, University of Chicago >> Mathematics and Computer Science Division >> Argonne National Laboratory >> >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From tim.g.armstrong at gmail.com Thu Feb 24 16:58:20 2011 From: tim.g.armstrong at gmail.com (Tim Armstrong) Date: Thu, 24 Feb 2011 16:58:20 -0600 Subject: [Swift-devel] SwiftR user-ready? In-Reply-To: References: <885057122.97707.1298577414458.JavaMail.root@zimbra.anl.gov> Message-ID: P.S. I also pushed out a new release, the new URL is: http://people.cs.uchicago.edu/~tga/swiftR/Swift_0.1.4.tar.gz - Tim On Thu, Feb 24, 2011 at 4:57 PM, Tim Armstrong wrote: > Hi Sarah, > > Sorry about the dead link, that was my mistake - it should work now. > > It does indeed work. It's still very much in development, but the current > version is quite dependable running on a single multi-core machine. > > I haven't tested on a Mac yet, but it should work provided there is a Java > 1.6+ VM and R 2.11+. It would be good to get confirmation that there aren't > any funny mac-specific issues. 
> > For your user it should be as simple as > > swiftInit(cores=8) > to start up swift on 8 cores, then to apply a function to a list of > arguments > > arglists = list( 1, 2, 3, 4, 5) # arguments for invocations of > bootstrap > > results <- swiftapply(bootstrap, arglists) > > If they have a dataset used by all instances of bootstrap, the following > can work: > >swiftExport(hugeDataSet) > >swiftapply(bootstrap, arglists) > > If they need to use a library: > > > swiftLibrary(MyLibrary) > > - Tim > > > > On Thu, Feb 24, 2011 at 2:14 PM, Sarah Kenny wrote: > >> >> >> On Thu, Feb 24, 2011 at 11:56 AM, Michael Wilde wrote: >> >>> Sarah, it does work >>> >> >> do you have the correct pointer for the tarball? >> >> >>> >>> ------------------------------ >>> >>> does this work? http://www.ci.uchicago.edu/wiki/bin/view/SWFT/SwiftR >>> >>> if so i have someone who might be interested in using it, he's just >>> trying to run bootstrapping on an 8-core mac and getting frustrated with >>> snowfall...i'd seen svn activity on this so thought i'd see if it was in a >>> state where you might want someone to play around with it :) however, the >>> download failed for me: >>> >>> [skenny at cosmo builds]$ wget >>> http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz >>> --2011-02-24 11:21:45-- >>> http://people.cs.uchicago.edu/~tga/Swift_0.1.3.tar.gz >>> Resolving people.cs.uchicago.edu... 128.135.164.139 >>> Connecting to people.cs.uchicago.edu|128.135.164.139|:80... connected. >>> HTTP request sent, awaiting response... 
404 Not Found >>> 2011-02-24 11:21:45 ERROR 404: Not Found >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >>> >>> >>> -- >>> Michael Wilde >>> Computation Institute, University of Chicago >>> Mathematics and Computer Science Division >>> Argonne National Laboratory >>> >>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From jon.monette at gmail.com Sat Feb 26 00:22:05 2011 From: jon.monette at gmail.com (Jonathan Monette) Date: Sat, 26 Feb 2011 00:22:05 -0600 Subject: [Swift-devel] Error in Swift mapping Message-ID: <4D689C0D.2030504@gmail.com> Hello, I seem to have found an error in Swift. Here is the error that Swift reported. Execution failed: swift#mapper#17019 is closed with a value of proj_dir/proj_raw_image_3.fits And here is a portion from the log file. 
011-02-25 21:09:48,012-0600 DEBUG VDL2ExecutionContext java.lang.IllegalArgumentException: swift#mapper#17019 is closed with a value of proj_dir/proj_raw_image_3.fits
java.lang.IllegalArgumentException: swift#mapper#17019 is closed with a value of proj_dir/proj_raw_image_3.fits
Caused by: java.lang.IllegalArgumentException: swift#mapper#17019 is closed with a value of proj_dir/proj_raw_image_3.fits
    at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:338)
    at org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:218)
    at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:88)
    at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:50)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68)
    at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.futureModified(AbstractSequentialWithArguments.java:210)
    at org.griphyn.vdl.karajan.DSHandleFutureWrapper.notifyListeners(DSHandleFutureWrapper.java:71)
    at org.griphyn.vdl.karajan.DSHandleFutureWrapper.addModificationAction(DSHandleFutureWrapper.java:60)
    at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:199)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
    at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:48)
    at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
    at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
    at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72)
    at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
    at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72)
    at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
    at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:48)
    at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
    at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
    at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
    at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
    at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
    at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
    at java.util.concurrent.FutureTask.run(FutureTask.java:138)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:662)

From wilde at mcs.anl.gov Sat Feb 26 07:21:30 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 26 Feb 2011 07:21:30 -0600 (CST)
Subject: [Swift-devel] Error in Swift mapping
In-Reply-To: <4D689C0D.2030504@gmail.com>
Message-ID: <22209853.104170.1298726490620.JavaMail.root@zimbra.anl.gov>

Hi Jon,

Can you send us the version of Swift you used, any local mods to it, and the full log file and source script? If possible, please also send a command line that we could use to replicate it. Any chance you could replicate it with a self-contained script that we could re-execute?

Thanks,

Mike

----- Original Message -----
> Hello,
> I seem to have found an error in Swift. Here is the error that
> Swift reported.
>
> Execution failed:
> swift#mapper#17019 is closed with a value of
> proj_dir/proj_raw_image_3.fits
>
> And here is a portion from the log file.
>
> [...]

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From jon.monette at gmail.com Sat Feb 26 13:17:34 2011
From: jon.monette at gmail.com (Jonathan Monette)
Date: Sat, 26 Feb 2011 13:17:34 -0600
Subject: [Swift-devel] Error in Swift mapping
In-Reply-To: <22209853.104170.1298726490620.JavaMail.root@zimbra.anl.gov>
References: <22209853.104170.1298726490620.JavaMail.root@zimbra.anl.gov>
Message-ID: <4D6951CE.5070603@gmail.com>

The swift version is r4143. The cog version is r3056. Attached are the log file and the script. It is from my Montage stuff. This is the first time this error has appeared. I have no local mods to Swift. I will try to replicate the error in a small script, since it seems the mapping error occurs early on in my script.
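For anyone trying to build the small reproducer: the message "swift#mapper#17019 is closed with a value of ..." reads like a write-once violation, where something tries to set a data node that has already been closed. The sketch below only illustrates that reading. It is an assumption about the failure mode, and `WriteOnceNode`, `ClosedNodeError` and `demo` are hypothetical names, not Swift's `AbstractDataNode` or any real Swift API.

```python
class ClosedNodeError(ValueError):
    """Raised when a value is written to a node that is already closed."""


class WriteOnceNode:
    """Hypothetical single-assignment data node (not Swift's real class)."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.closed = False

    def set_value(self, value):
        # A closed node already holds its final value, so reject the write.
        if self.closed:
            raise ClosedNodeError(
                f"{self.name} is closed with a value of {self.value}"
            )
        self.value = value
        self.closed = True


def demo():
    """Write the same node twice and return the resulting error text."""
    node = WriteOnceNode("swift#mapper#17019")
    node.set_value("proj_dir/proj_raw_image_3.fits")  # first write succeeds
    try:
        node.set_value("proj_dir/proj_raw_image_3.fits")  # second write fails
    except ClosedNodeError as err:
        return str(err)
    return None
```

If that reading is right, a reproducer would probably need a script in which the same mapped variable is assigned on two paths; whether the Montage script does that is not known from this thread.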
On 2/26/11 7:21 AM, Michael Wilde wrote:
> Hi Jon,
>
> Can you send us the version of Swift you used, any local mods to it, and the full log file and source script? If possible, please also send a command line that we could use to replicate it. Any chance you could replicate it with a self-contained script that we could re-execute?
>
> Thanks,
>
> Mike
>
> [...]

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: m101_montage.log
URL:
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: montage.swift
URL:

From wilde at mcs.anl.gov Sat Feb 26 15:26:04 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Sat, 26 Feb 2011 15:26:04 -0600 (CST)
Subject: [Swift-devel] A suggested web content strategy for Swift
Message-ID: <2091987030.104718.1298755564473.JavaMail.root@zimbra.anl.gov>

Condensing thoughts from many team members (Sarah, Justin, Mihael, David, Ketan) and tying together various discussions on this topic, here's my suggestion on how to proceed.

1. Treat the Swift web as a standalone entity, whose backbone framework can be content- and version-managed outside of Swift SVN.

2. Treat Swift documents (User Guide, Tutorial, and eventual Reference Manual) as an svn-managed part of trunk and each release branch, pushed to version-specific URLs and linked into the web backbone manually each time a new release appears (or eventually, automatically).

3. Use a fast web content manager (Google Sites or Docs) to make preliminary versions of newly developed user content available fast; then migrate that content to the appropriate SVN-controlled document.

   a) Do the 3-4 pages we identified for the 0.92 release.
   b) Convert all such content on the Swift wiki to this format as a first step.

4. Use Google Sites to gather and mock up the structure and content of a revised Swift web site that addresses the main new-user and user-community-growing needs.

5. Decide on one of three strategies for SVN docs:

   a) Stay with docbook but clean up the format of the content (mostly indentation and tabbing issues) and conquer the problems of tool execution and the push-to-web process.

   b) Switch to Sphinx and reStructuredText, a la the Python documentation framework. This will have its own tool and push issues, but is likely to be easier on many counts:
      - the tools are "just" Python scripts
      - the format is very simple and amenable to production with plain text editors (and much easier on the eye for writing)
      - it has a much larger and growing user base, and more tool support, than docbook

   c) Use Google Docs *iff* we can svn-manage the content loss-free in some text format in SVN. Then we use Google Docs simply as a ubiquitous wysiwyg editor, but we push each revision back into SVN. This is less likely to be feasible but worth examining.

6. Push hard to consolidate our web content and get to the point where we can first enlist Gail Pieper and others to review and improve the content.

7. Then engage the Argonne/CI web team to help spruce up the look. They may suggest at that point that we move the site to WordPress, a la most of the Argonne and CI production sites, including the Globus Online site.

So I *think* that we are on exactly the right track with respect to most of these points. I suggest we do some fast, lightweight experiments to decide on how we want to manage the document content. One to three experiments (in the order below) might help guide us here:

Exp 1. Try using Google Docs (or perhaps Sites) as an editing tool for the user guide. See if we can save content to text, push to svn, push to web (and pdf) from there, and repeat the editing cycle. I would be OK if we had to sacrifice the page-per-chapter html format for now.

Exp 3. See what improvements we could make in the docbook content editing and management process.

What changes, if any, to this approach would people suggest?

- Mike

From dk0966 at cs.ship.edu Sun Feb 27 18:26:19 2011
From: dk0966 at cs.ship.edu (David Kelly)
Date: Sun, 27 Feb 2011 19:26:19 -0500
Subject: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To: <2091987030.104718.1298755564473.JavaMail.root@zimbra.anl.gov>
References: <2091987030.104718.1298755564473.JavaMail.root@zimbra.anl.gov>
Message-ID:

Personally I find it easiest to write documentation using Google Docs. We used Google Docs in a software engineering class, and I found the collaboration abilities to be very useful.

My preferences when writing documentation:

1) Google Docs
2) Word/OpenOffice
...
99) Manually entering octal values in a hex editor
...
9001) docbook

If we wanted to, I think it's possible to automate Google Docs -> SVN. There is a set of utilities called Google command line tools (googlecl). They are written in Python and allow you to do things like:

$ google docs get --title "Userguide 0.92"

http://code.google.com/p/googlecl/

I've installed it on my laptop and it seems to work well so far.
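The Google docs -> SVN automation mentioned above could be sketched roughly as follows. This is a sketch under assumptions, not a tested workflow: it assumes the `google` command from googlecl and the `svn` client are both on the PATH, and that `google docs get` leaves the downloaded file inside the SVN working copy (not verified here). The names `docs_to_svn_commands` and `render`, and the commit message, are made up for illustration. The code only builds the command lines and does not run them.

```python
import shlex


def docs_to_svn_commands(title, commit_message):
    """Return the commands for one download-and-commit cycle.

    The first command is the googlecl invocation quoted in the message;
    the other two are the standard svn status and commit subcommands.
    """
    return [
        ["google", "docs", "get", "--title", title],
        ["svn", "status"],
        ["svn", "commit", "-m", commit_message],
    ]


def render(commands):
    """Format the commands as copy-and-paste shell lines."""
    return [" ".join(shlex.quote(part) for part in cmd) for cmd in commands]


if __name__ == "__main__":
    cycle = docs_to_svn_commands("Userguide 0.92", "Sync user guide from Google docs")
    for line in render(cycle):
        print("$", line)
```

A file that is new to the working copy would also need an `svn add` before the commit, and Exp 1 would still have to show that the exported text survives the round trip without loss.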
David

On Sat, Feb 26, 2011 at 4:26 PM, Michael Wilde wrote:

> Condensing thoughts from many team members (Sarah, Justin, Mihael, David,
> Ketan) and tying together various discussions on this topic, here's my
> suggestion on how to proceed.
>
> [...]

From hategan at mcs.anl.gov Mon Feb 28 03:47:35 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 28 Feb 2011 01:47:35 -0800
Subject: [Swift-devel] Error in Swift mapping
In-Reply-To: <4D6951CE.5070603@gmail.com>
References: <22209853.104170.1298726490620.JavaMail.root@zimbra.anl.gov> <4D6951CE.5070603@gmail.com>
Message-ID: <1298886455.31181.0.camel@blabla2.none>

Trunk or stable branch?
On Sat, 2011-02-26 at 13:17 -0600, Jonathan Monette wrote:
> The swift version is r4143. The cog version is r3056. Attached is the
> log file and the script. It is from my Montage stuff. This is the
> first time this error has appeared. I have no local mods to Swift. I
> will try to replicate the error in a small script since it seems the
> mapping error occurs early on in my script.
>
> [...]

From hategan at mcs.anl.gov Mon Feb 28 03:49:14 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 28 Feb 2011 01:49:14 -0800
Subject: [Swift-devel] Error in Swift mapping
In-Reply-To: <1298886455.31181.0.camel@blabla2.none>
References: <22209853.104170.1298726490620.JavaMail.root@zimbra.anl.gov> <4D6951CE.5070603@gmail.com> <1298886455.31181.0.camel@blabla2.none>
Message-ID: <1298886554.31181.1.camel@blabla2.none>

Nevermind. It can't be trunk.

On Mon, 2011-02-28 at 01:47 -0800, Mihael Hategan wrote:
> Trunk or stable branch?
>
> [...]

From hategan at mcs.anl.gov Mon Feb 28 03:51:36 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 28 Feb 2011 01:51:36 -0800
Subject: [Swift-devel] Error in Swift mapping
In-Reply-To: <1298886455.31181.0.camel@blabla2.none>
References: <22209853.104170.1298726490620.JavaMail.root@zimbra.anl.gov> <4D6951CE.5070603@gmail.com> <1298886455.31181.0.camel@blabla2.none>
Message-ID: <1298886696.366.0.camel@blabla2.none>

Nevermind. It must be trunk.

On Mon, 2011-02-28 at 01:47 -0800, Mihael Hategan wrote:
> Trunk or stable branch?
>
> On Sat, 2011-02-26 at 13:17 -0600, Jonathan Monette wrote:
> > The swift version is r4143. The cog version is r3056. Attached is the
> > log file and the script. It is from my Montage stuff. This is the
> > first time this error has appeared. I have no local mods to Swift. I
> > will try to replicate the error in a small script since it seems the
> > mapping error occurs early on in my script.
> >
> > On 2/26/11 7:21 AM, Michael Wilde wrote:
> > > Hi Jon,
> > >
> > > Can you send us the version of Swift you used, any local mods to it, and the full log file and source script? If possible, a command line that we could replicate it with. Any chance you could replicate it with a self-contained script that we could re-execute?
> > >
> > > Thanks,
> > >
> > > Mike
> > >
> > > ----- Original Message -----
> > >> Hello,
> > >> I seem to have found an error in Swift. Here is the error that
> > >> Swift reported.
> > >>
> > >> Execution failed:
> > >> swift#mapper#17019 is closed with a value of
> > >> proj_dir/proj_raw_image_3.fits
> > >>
> > >> And here is a portion from the log file.
> > >>
> > >> 011-02-25 21:09:48,012-0600 DEBUG VDL2ExecutionContext
> > >> java.lang.IllegalArgumentException: swift#mapper#17019 is closed with a value of proj_dir/proj_raw_image_3.fits
> > >> java.lang.IllegalArgumentException: swift#mapper#17019 is closed with a value of proj_dir/proj_raw_image_3.fits
> > >> Caused by: java.lang.IllegalArgumentException: swift#mapper#17019 is closed with a value of proj_dir/proj_raw_image_3.fits
> > >> at org.griphyn.vdl.mapping.AbstractDataNode.setValue(AbstractDataNode.java:338)
> > >> at org.griphyn.vdl.mapping.RootDataNode.setValue(RootDataNode.java:218)
> > >> at org.griphyn.vdl.karajan.lib.SetFieldValue.deepCopy(SetFieldValue.java:88)
> > >> at org.griphyn.vdl.karajan.lib.SetFieldValue.function(SetFieldValue.java:50)
> > >> at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:68)
> > >> at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.futureModified(AbstractSequentialWithArguments.java:210)
> > >> at org.griphyn.vdl.karajan.DSHandleFutureWrapper.notifyListeners(DSHandleFutureWrapper.java:71)
> > >> at org.griphyn.vdl.karajan.DSHandleFutureWrapper.addModificationAction(DSHandleFutureWrapper.java:60)
> > >> at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:199)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > >> at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:48)
> > >> at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > >> at org.globus.cog.karajan.workflow.nodes.functions.AbstractFunction.post(AbstractFunction.java:28)
> > >> at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > >> at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72)
> > >> at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > >> at org.griphyn.vdl.karajan.lib.VDLFunction.post(VDLFunction.java:72)
> > >> at org.globus.cog.karajan.workflow.nodes.AbstractSequentialWithArguments.completed(AbstractSequentialWithArguments.java:196)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowNode.complete(FlowNode.java:214)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowContainer.post(FlowContainer.java:58)
> > >> at org.globus.cog.karajan.workflow.nodes.functions.Argument.post(Argument.java:48)
> > >> at org.globus.cog.karajan.workflow.nodes.Sequential.startNext(Sequential.java:29)
> > >> at org.globus.cog.karajan.workflow.nodes.Sequential.executeChildren(Sequential.java:20)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowContainer.execute(FlowContainer.java:63)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowNode.restart(FlowNode.java:139)
> > >> at org.globus.cog.karajan.workflow.nodes.FlowNode.start(FlowNode.java:197)
> > >> at org.globus.cog.karajan.workflow.events.EventBus.start(EventBus.java:104)
> > >> at org.globus.cog.karajan.workflow.events.EventTargetPair.run(EventTargetPair.java:40)
> > >> at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
> > >> at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
> > >> at java.util.concurrent.FutureTask.run(FutureTask.java:138)
> > >> at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
> > >> at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
> > >> at java.lang.Thread.run(Thread.java:662)
> > >>
> > >> _______________________________________________
> > >> Swift-devel mailing list
> > >> Swift-devel at ci.uchicago.edu
> > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>

From wilde at mcs.anl.gov Mon Feb 28 10:24:03 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 28 Feb 2011 10:24:03 -0600 (CST)
Subject: [Swift-devel] A suggested web content strategy for Swift
In-Reply-To:
Message-ID: <1201387603.107616.1298910243273.JavaMail.root@zimbra.anl.gov>

David wrote:

> If we wanted to, I think it's possible to automate Google docs -> SVN.
> There is a set of utilities called google command line tools. They are
> written in python and allow you to do things like:
>
> $ google docs get --title "Userguide 0.92"
>
> http://code.google.com/p/googlecl/

This would be a big plus for making Google sites/docs usable as our document editor, without losing the ability to do proper svn management.

Can the tools copy from Google to a textual format that can be svn'ed and then copy files back to a new online URL without loss of format?

Can you propose what a good svn-based management and release deployment process would be for, say, the Users Guide, Tutorial, and a future Reference Manual?

- Mike

----- Original Message -----
> Personally I find it easiest to write documentation by using Google
> docs. We used google docs in a software engineering class, and I found
> the collaboration abilities to be very useful.
>
> My preferences when writing documentation:
> 1) Google docs
> 2) Word/open office
> ...
> 99) Manually entering octal values in a hex editor
> ...
> 9001) docbook
>
> If we wanted to, I think it's possible to automate Google docs -> SVN.
> There is a set of utilities called google command line tools. They are
> written in python and allow you to do things like:
>
> $ google docs get --title "Userguide 0.92"
>
> http://code.google.com/p/googlecl/
>
> I've installed it on my laptop and it seems to work well so far.
>
> David
>
> On Sat, Feb 26, 2011 at 4:26 PM, Michael Wilde < wilde at mcs.anl.gov > wrote:
>
> Condensing thoughts from many team members (Sarah, Justin, Mihael,
> David, Ketan) and tying together various discussions on this topic,
> here's my suggestion on how to proceed.
>
> 1. Treat the Swift web as a standalone entity, whose backbone
> framework can be content- and version-managed outside of Swift SVN.
>
> 2. Treat Swift documents (User Guide, Tutorial, and eventual Reference
> Manual) as svn-managed parts of trunk and each release branch, pushed
> to version-specific URLs and linked into the web backbone manually
> each time a new release appears (or eventually, automatically).
>
> 3. Use a fast web content manager (Google sites or docs) to make
> preliminary versions of newly developed user content available fast;
> then migrate that content to the appropriate SVN-controlled document.
>
> a) Do the 3-4 pages we identified for the 0.92 release
> b) Convert all such content on the SWFT wiki to this format as a first
> step.
>
> 4. Use Google Sites to gather and mock up the structure and content of
> a revised Swift web site that addresses the main new-user and
> user-community-growing needs.
>
> 5. Decide on one of three strategies for SVN docs:
>
> a) stay with docbook but clean up the format of the content (mostly
> indentation and tabbing issues) and conquer the problems of tool
> execution and the push-to-web process.
>
> b) switch to Sphinx and reStructuredText, ala the Python
> documentation framework. This will have its own tool and push issues,
> but is likely to be easier on many counts:
> - the tools are "just" python scripts
> - the format is very simple and amenable to production with plain text
> editors
> (and much easier on the eye for writing)
> - it's got a much larger and faster-growing user base and tool support than
> docbook
>
> c) use Google docs *iff* we can svn-manage the content loss-free in
> some text format in SVN. Then we use Google Documents simply as a
> ubiquitous wysiwyg editor but we push each revision back into SVN.
> This is less likely to be feasible but worth examining.
>
> 6. Push hard to consolidate our web content and get to the point where
> we can first enlist Gail Pieper and others to review and improve the
> content.
>
> 7. Then engage the Argonne/CI web team to help spruce up the look.
> They may suggest at that point that we move the site to WordPress,
> ala most of the Argonne and CI production sites, including the Globus
> Online site.
>
> So I *think* that we are on exactly the right track with respect to
> most of these points. I suggest we do some fast, lightweight
> experiments to decide on how we want to manage the document content.
> One to three experiments (in the order below) might help guide us
> here:
>
> Exp 1. Try using Google docs (or perhaps sites) as an editing tool for
> the user guide. See if we can save content to text, push to svn, push
> to web (and pdf) from there, and repeat the editing cycle. I would be
> OK if we had to sacrifice the page-per-chapter html format for now.
>
> Exp 3. See what improvements we could make in the docbook content
> editing and management process.
>
> What changes, if any, to this approach would people suggest?
>
> - Mike
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel

--
Michael Wilde
Computation Institute, University of Chicago
Mathematics and Computer Science Division
Argonne National Laboratory

From wilde at mcs.anl.gov Mon Feb 28 11:19:11 2011
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 28 Feb 2011 11:19:11 -0600 (CST)
Subject: [Swift-devel] Welcome new team member Ketan Maheshwari
Message-ID: <1360528342.108076.1298913551874.JavaMail.root@zimbra.anl.gov>

Dear All,

Ketan Maheshwari is joining our group today in the dual role of Argonne Post-doctoral researcher and Beagle Catalyst. He'll be working on the ExM exascale many-task computing project, and helping to engage new science users on our new Cray Beagle system.

Ketan just graduated with his Ph.D. in computer science from the University of Nice in France, and moved here last week with his wife and daughter. His primary office is at Argonne and he'll be spending a fair amount of time at the CI and UChicago campus as well.

Please welcome Ketan and stop by to say hello. His desk is near 5141 at Argonne.

Regards,

Mike Wilde and Paul Dave

From hategan at mcs.anl.gov Mon Feb 28 13:10:24 2011
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 28 Feb 2011 11:10:24 -0800
Subject: [Swift-devel] Error in Swift mapping
In-Reply-To: <1298886696.366.0.camel@blabla2.none>
References: <22209853.104170.1298726490620.JavaMail.root@zimbra.anl.gov> <4D6951CE.5070603@gmail.com> <1298886455.31181.0.camel@blabla2.none> <1298886696.366.0.camel@blabla2.none>
Message-ID: <1298920224.1737.1.camel@blabla2.none>

Does this happen on every run?

I suspect it might be due to a future event somehow getting fired twice, but I can't see the code path that would cause this.

Mihael

On Mon, 2011-02-28 at 01:51 -0800, Mihael Hategan wrote:
> Nevermind.
> It must be trunk.
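
The hypothesis in the last message, that a future event fired twice could produce the "is closed with a value of" error, can be illustrated with a minimal Python sketch. This is not Swift or Karajan code: DataNode, Future, ClosedError, and demo are hypothetical names made up for illustration, and the sketch assumes only that a mapped variable is single-assignment, so a second write to an already closed node is rejected.

```python
class ClosedError(ValueError):
    """Raised when an already closed (assigned) node is written again."""


class DataNode:
    """Hypothetical single-assignment variable; not the real AbstractDataNode."""

    def __init__(self, name):
        self.name = name
        self.value = None
        self.closed = False

    def set_value(self, value):
        # A closed node already holds its final value, so any further write fails.
        if self.closed:
            raise ClosedError(
                f"{self.name} is closed with a value of {self.value}"
            )
        self.value = value
        self.closed = True


class Future:
    """Hypothetical future that notifies its listeners when a value arrives."""

    def __init__(self):
        self.listeners = []

    def add_listener(self, callback):
        self.listeners.append(callback)

    def fire(self, value):
        # Iterate over a copy so listeners may register others while notified.
        for callback in list(self.listeners):
            callback(value)


def demo():
    """Fire the same event twice and return the resulting error text."""
    node = DataNode("swift#mapper#17019")
    source = Future()
    source.add_listener(node.set_value)

    # The first notification assigns the value and closes the node.
    source.fire("proj_dir/proj_raw_image_3.fits")
    try:
        # A duplicate notification attempts a second assignment.
        source.fire("proj_dir/proj_raw_image_3.fits")
    except ClosedError as err:
        return str(err)
    return None


if __name__ == "__main__":
    print(demo())
```

If the real failure does come from a duplicated notification, it would likely be intermittent, which is why the question of whether it happens on every run matters for narrowing it down.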