From tfreeman at mcs.anl.gov Thu Aug 2 13:27:25 2007 From: tfreeman at mcs.anl.gov (Tim Freeman) Date: Thu, 2 Aug 2007 13:27:25 -0500 Subject: [Swift-devel] might be interesting: Tom Message-ID: <20070802132725.c8847f4d.tfreeman@mcs.anl.gov> http://tom.loria.fr/about.php Tim From wilde at mcs.anl.gov Thu Aug 2 17:38:27 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 02 Aug 2007 17:38:27 -0500 Subject: [Swift-devel] Error when running with compiled swift system Message-ID: <46B25CE3.3070005@mcs.anl.gov> Hi all, I'm able to run a small workflow fine with the 0.2 binary release, but when I try with a release that I built from a source checkout, I get the error below. It seems like a karajan element is missing in the vdl package, and I'll try to track that down, but if anyone recognizes what's wrong here please let me know. Thanks, Mike
32$ more awf*.log
::::::::::::::
awf2-3e0ckpa4u7pf2.log
::::::::::::::
2007-08-02 17:13:11,907 DEBUG Loader Recompilation suppressed.
2007-08-02 17:13:14,717 INFO unknown Using sites file: /home/wilde/swift/vdsk-0.2-dev/bin/../etc/sites.xml
2007-08-02 17:13:14,719 INFO unknown Using tc.data: /home/wilde/swift/vdsk-0.2-dev/bin/../etc/tc.data
2007-08-02 17:13:17,240 DEBUG VDL2ExecutionContext 'vdl:getarrayfieldvalue' is not defined.
'vdl:getarrayfieldvalue' is not defined.
sys:parallelfor @ awf2.kml, line: 64
vdl:mains @ awf2.kml, line: 61
at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.(FailureNotificationEvent.java:36)
at org.globus.cog.karajan.workflow.events.FailureNotificationEvent.(FailureNotificationEvent.java:42)
at org.globus.cog.karajan.workflow.FlowElementWrapper.failImmediately(FlowElementWrapper.java:212)
at org.globus.cog.karajan.workflow.events.EventBus.failElement(EventBus.java:187)
at org.globus.cog.karajan.workflow.events.EventBus.send(EventBus.java(Compiled Code))
at org.globus.cog.karajan.workflow.events.EventBus.sendHooked(EventBus.java(Inlined Compiled Code))
at org.globus.cog.karajan.workflow.events.EventWorker.run(EventWorker.java(Compiled Code))
32$
From benc at hawaga.org.uk Fri Aug 3 14:45:30 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 3 Aug 2007 19:45:30 +0000 (GMT) Subject: [Swift-devel] provider-deef Message-ID: I added a readme into the provider-deef module with an example command line of how to build and deploy it into your swift installation. If you rebuild your swift installation, it's quite likely that you'll need to redeploy provider-deef into your swift installation. You'll find the single command to do that in the provider-deef/README file. -- From hategan at mcs.anl.gov Fri Aug 3 15:24:00 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 03 Aug 2007 15:24:00 -0500 Subject: [Swift-devel] provider-deef In-Reply-To: References: Message-ID: <1186172640.11838.0.camel@blabla.mcs.anl.gov> I also removed all the gt4 jars and added a dependency on the gt4 provider instead. There are some differences in the jar versions (the one in the current gt4 provider being from gt4.0.3), so you may want to test this. On Fri, 2007-08-03 at 19:45 +0000, Ben Clifford wrote: > I added a readme into the provider-deef module with an example command line > of how to build and deploy it into your swift installation. > > If you rebuild your swift installation, it's quite likely that you'll need > to redeploy provider-deef into your swift installation. > > You'll find the single command to do that in the provider-deef/README > file. 
> > From wilde at mcs.anl.gov Fri Aug 3 17:59:51 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 03 Aug 2007 17:59:51 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46AF37D9.7000301@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> Message-ID: <46B3B367.6090701@mcs.anl.gov> I'm catching up on some of this week's email. I didn't see a follow-up to this, nor can I tell which two jobs Ian is referring to or where those came from. Can anyone clarify what this issue is here? Ian Foster wrote: > Hi, > > I am curious whether we found out why those two jobs (?) were failing at > the end of the big MolDyn run? > > Ian. > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From iraicu at cs.uchicago.edu Fri Aug 3 23:03:03 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 03 Aug 2007 23:03:03 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B3B367.6090701@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> Message-ID: <46B3FA77.90801@cs.uchicago.edu> Hi, Nika can probably be more specific, but the last time we ran the 244 molecule MolDyn, the workflow failed on the last few jobs, and the failures were application specific, not Swift or Falkon. I believe the specific issue that caused those jobs to fail has been resolved. We have made another attempt at the MolDyn 244 molecule run, and from what I can tell, it did not complete successfully again. We were supposed to have 20497 jobs...
1    1    1
1  244  244
1  244  244
68 244 16592
1  244  244
11 244 2684
1  244  244
1  244  244

20497
but we have:
20482 with exit code 0
1 with exit code -3
2 with exit code 253
I forgot to enable the debug at the workers, so I don't know what the STDOUT and STDERR was for these 3 jobs. Given that Swift retries 3 times a job before it fails the workflow, my guess is that these 3 jobs were really the same job failing 3 times. The failure occurred on 3 different machines, so I don't think it was machine related. Nika, can you tell from the various Swift logs what happened to these 3 jobs? Is this the same issue as we had on the last 244 mol run? It looks like we failed the workflow with 15 jobs to go. The graphs all look nice, similar to the last ones we had. If people really want to see them, I can generate them again. Otherwise, look at http://tg-viz-login1.uc.teragrid.org:51000/index.htm to see the last 10K samples of the experiment. Nika, after you try to figure out what happened, can you simply retry the workflow, maybe it will manage to finish the last 15 jobs. Depending on what problem we find, I think we might conclude that 3 retries is not enough, and we might want to have a higher number as the default when running with Falkon. If the error was an application error, then no matter how many retries we have, it won't make any difference. Ioan Michael Wilde wrote: > I'm catching up on some of this week's email. > > I didn't see a follow-up to this, nor can I tell which two jobs Ian is > referring to or where those came from. Can anyone clarify what this > issue is here? > > > Ian Foster wrote: >> Hi, >> >> I am curious whether we found out why those two jobs (?) were failing >> at the end of the big MolDyn run? >> >> Ian. 
>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From bugzilla-daemon at mcs.anl.gov Sat Aug 4 15:29:47 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sat, 4 Aug 2007 15:29:47 -0500 (CDT) Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter In-Reply-To: Message-ID: <20070804202947.D8E56164B3@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|NEW |ASSIGNED ------- Comment #4 from benc at hawaga.org.uk 2007-08-04 15:29 ------- this appears to have been fixed by recent internal values-in-DSHandle changes (around the r1000 mark). -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Sun Aug 5 08:27:50 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 5 Aug 2007 08:27:50 -0500 (CDT) Subject: [Swift-devel] [Bug 84] switch does not work with variable parameter In-Reply-To: Message-ID: <20070805132750.E0467164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=84 benc at hawaga.org.uk changed: What |Removed |Added ---------------------------------------------------------------------------- Status|ASSIGNED |RESOLVED Resolution| |FIXED -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Sun Aug 5 08:31:16 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Sun, 5 Aug 2007 08:31:16 -0500 (CDT) Subject: [Swift-devel] [Bug 86] New: recompilation should not be suppressed if compiler version has changed Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=86 Summary: recompilation should not be suppressed if compiler version has changed Product: Swift Version: unspecified Platform: Macintosh OS/Version: All Status: NEW Severity: enhancement Priority: P2 Component: General AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk CC: swift-devel at ci.uchicago.edu Given the frequency at which we make cross-version incompatible changes to our intermediate file formats and runtime libraries, swift should not suppress recompilation of intermediate files if the compiler version that generated the intermediate files is different from the present version. (This includes both older and newer versions, rather than a stricter 'newer compiler' test) Recompilation suppression in this situation has caused trouble for people a few times. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
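A minimal sketch of the check bug 86 describes -- illustrative Java only, not the actual Swift Loader code; the idea of a version stamp written next to the generated .kml, and all file and version names below, are assumptions:

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Illustrative sketch of the recompilation check described in bug 86.
 * Assumes (hypothetically) that the compiler records its own version string
 * in a small stamp file next to the generated .kml at compile time.
 */
public class RecompileCheck {

    static boolean needsRecompile(Path source, Path compiled, Path versionStamp,
                                  String currentCompilerVersion) throws IOException {
        if (!Files.exists(compiled) || !Files.exists(versionStamp)) {
            return true; // nothing cached yet, so compile
        }
        if (Files.getLastModifiedTime(source)
                 .compareTo(Files.getLastModifiedTime(compiled)) > 0) {
            return true; // the source is newer than the cached intermediate file
        }
        String stamped = new String(Files.readAllBytes(versionStamp)).trim();
        // Any mismatch -- older or newer compiler -- forces recompilation,
        // rather than a stricter "newer compiler only" test.
        return !currentCompilerVersion.equals(stamped);
    }

    public static void main(String[] args) throws IOException {
        boolean recompile = needsRecompile(Paths.get("awf2.swift"),
                                           Paths.get("awf2.kml"),
                                           Paths.get("awf2.kml.version"),
                                           "0.2-dev");
        System.out.println(recompile ? "recompiling" : "Recompilation suppressed.");
    }
}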
From hategan at mcs.anl.gov Mon Aug 6 10:42:57 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 06 Aug 2007 10:42:57 -0500 Subject: [Swift-devel] karmasphere Message-ID: <1186414977.2455.3.camel@blabla.mcs.anl.gov> http://labs.karmasphere.org/dp/ From nefedova at mcs.anl.gov Mon Aug 6 11:01:10 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 11:01:10 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B3FA77.90801@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> Message-ID: <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> Ok, here is what happened with the last 244-molecule run. 1. First of all, the new swift code (with loops etc) was used. The code's size is dramatically reduced:
-rw-r--r-- 1 nefedova users 13342526 2007-07-05 12:01 MolDyn-244.dtm
-rw-r--r-- 1 nefedova users    21898 2007-08-03 11:00 MolDyn-244-loops.swift
2. I do not have the log on the swift side (probably it was not produced because I put in the hack for output reduction and log output was suppressed -- it can be fixed easily) 3. There were 2 molecules that failed. That infamous m179 failed at the last step (3 re-tries). Yuqing -- it's the same molecule you said you fixed the antechamber code for. You told me to use the code in your home directory /home/ydeng/antechamber-1.27, I assumed it was on tg-uc. Is that correct? Or is it on another host? Anyway, I used the code from the directory above and it didn't work. The output is @tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1/shared. I could try to run this molecule again specifically in case it works for you. 4. The second molecule that failed is m050. It's quite a mystery why it failed: it finished the 4-th stage (those 68 charm jobs) successfully (I have the data in the shared directory on tg-uc) but then the 5-th stage has never started! I do not see any leftover directories from the 5-th stage for m050 (or any other stages for m050 for that matter). So it was not a job failure, but a job submission failure (since no directories were even created). It had to be a job called 'generator_cat' with a parameter 'm050'. Ioan - is it possible to track what happened to this job in the Falkon logs? 5. I can't restart the workflow since this bug/feature has not been fixed: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29 (as long as I use the hack for output reduction -- restarts do not work). Nika On Aug 3, 2007, at 11:03 PM, Ioan Raicu wrote: > Hi, > Nika can probably be more specific, but the last time we ran the > 244 molecule MolDyn, the workflow failed on the last few jobs, and > the failures were application specific, not Swift or Falkon. I > believe the specific issue that caused those jobs to fail has been > resolved. > > We have made another attempt at the MolDyn 244 molecule run, and > from what I can tell, it did not complete successfully again. We > were supposed to have 20497 jobs...
> 1    1    1
> 1  244  244
> 1  244  244
> 68 244 16592
> 1  244  244
> 11 244 2684
> 1  244  244
> 1  244  244
>
> 20497
>
> but we have:
> 20482 with exit code 0
> 1 with exit code -3
> 2 with exit code 253
>
> I forgot to enable the debug at the workers, so I don't know what > the STDOUT and STDERR was for these 3 jobs. Given that Swift > retries 3 times a job before it fails the workflow, my guess is > that these 3 jobs were really the same job failing 3 times. 
The > failure occurred on 3 different machines, so I don't think it was > machine related. Nika, can you tell from the various Swift logs > what happened to these 3 jobs? Is this the same issue as we had on > the last 244 mol run? It looks like we failed the workflow with 15 > jobs to go. > > The graphs all look nice, similar to the last ones we had. If > people really want to see them, I can generate them again. > Otherwise, look at http://tg-viz-login1.uc.teragrid.org:51000/ > index.htm to see the last 10K samples of the experiment. > > Nika, after you try to figure out what happened, can you simply > retry the workflow, maybe it will manage to finish the last 15 > jobs. Depending on what problem we find, I think we might conclude > that 3 retries is not enough, and we might want to have a higher > number as the default when running with Falkon. If the error was > an application error, then no matter how many retries we have, it > won't make any difference. > > Ioan > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Mon Aug 6 11:25:47 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 11:25:47 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> Message-ID: <46B74B8B.1080408@cs.uchicago.edu> Hi, Veronika Nefedova wrote: > Ok, here is what happened with the last 244-molecule run. > > 1. First of all, the new swift code (with loops etc) was used. The > code's size is dramatically reduced: > > -rw-r--r-- 1 nefedova users 13342526 2007-07-05 12:01 MolDyn-244.dtm > -rw-r--r-- 1 nefedova users 21898 2007-08-03 11:00 > MolDyn-244-loops.swift > > > 2. I do not have the log on the swift size (probably it was not > produced because I put in the hack for output reduction and log output > was suppressed -- it can be fixed easily) > > 3. There were 2 molecules that failed. That infamous m179 failed at > the last step (3 re-tries). Yuqing -- its the same molecule you said > you fixed the antechamber code for. You told me to use the code in > your home directory /home/ydeng/antechamber-1.27, I assumed it was on > tg-uc. Is that correct? Or its on another host? Anyway, I used the > code from the directory above and it didn't work. The output > is @tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1/shared. > I could try to run again this molecule specifically in case it works > for you. > > 4. The second molecule that failed is m050. Its quite a mystery why > it failed: it finished the 4-th stage (those 68 charm jobs) > successfully (I have the data in shared directory on tg-uc) but then > the 5-th stage has never started! I do not see any leftover > directories from the 5-th stage for m050 (or any other stages for m050 > for that matter). So it was not a job failure, but job submission > failure (since no directories were even created). It had to be a job > called 'generator_cat' with a parameter 'm050'. Ioan - is that > possible to rack what happened to this job in Falcon logs? > There were only 3 jobs that failed in the Falkon logs, so I presume that those were from (3) above. I also forgot to enable any debug logging, as the settings were from some older high throughput experiments, so I don't have a trace of all the task descriptions and STDOUT and STDERR. 
About the only thing I can think of is... can you summarize from the Swift log, how many submitted jobs there were, how many success and how many failed? At least maybe we can make sure that the Swift log is consistent with the Falkon logs. Could it be that a task actually fails (say it doesn't produce all the output files), but still returns an exit code of 0 (success)? If yes, then would Swift attempt the next task that needed the missing files and likely fail while executing due to not finding all the files? Now, you mention that it could be a job submission failure... but wouldn't this be explicit in the Swift logs, that it tried to submit and it failed? Here is the list of all tasks that Falkon knows of: http://tg-viz-login1.uc.teragrid.org:51000/service_logs/GenericPortalWS_taskPerf.txt Can you produce a similar list of tasks (from the Swift logs), if the task ID (urn:0-1-10-0-1186176957479), and the status (i.e. submitted, success, failed, etc)? I believe that the latest provisioner code you had (which I hope it did not get overwritten by SVN as I don't know if it was ever checked in, and I don't remember when it was changed, before or after the commit to SVN) should have printed at each submission to Falkon the task ID in the form it is above, and the status of the task at that point in time. Assuming this information is in the Swift log, you should be able to grep for these lines and produce a summary of all the tasks, that we can then cross-match with Falkon's logs. Which one is the Swift log for this latest run on viper? There are so many, and I can't tell which one it is. Ioan > 5. I can't restart the workflow since this bug/feature has not been > fixed: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29 (as long > as I use the hack for output reduction -- restarts do not work). > > Nika > > On Aug 3, 2007, at 11:03 PM, Ioan Raicu wrote: > >> Hi, >> Nika can probably be more specific, but the last time we ran the 244 >> molecule MolDyn, the workflow failed on the last few jobs, and the >> failures were application specific, not Swift or Falkon. I believe >> the specific issue that caused those jobs to fail has been resolved. >> >> We have made another attempt at the MolDyn 244 molecule run, and from >> what I can tell, it did not complete successfully again. We were >> supposed to have 20497 jobs... >> >> 1 1 1 >> 1 244 244 >> 1 244 244 >> 68 244 16592 >> 1 244 244 >> 11 244 2684 >> 1 244 244 >> 1 244 244 >> >> >> >> >> >> 20497 >> >> >> but we have: >> 20482 with exit code 0 >> 1 with exit code -3 >> 2 with exit code 253 >> >> I forgot to enable the debug at the workers, so I don't know what the >> STDOUT and STDERR was for these 3 jobs. Given that Swift retries 3 >> times a job before it fails the workflow, my guess is that these 3 >> jobs were really the same job failing 3 times. The failure occurred >> on 3 different machines, so I don't think it was machine related. >> Nika, can you tell from the various Swift logs what happened to these >> 3 jobs? Is this the same issue as we had on the last 244 mol run? >> It looks like we failed the workflow with 15 jobs to go. >> >> The graphs all look nice, similar to the last ones we had. If people >> really want to see them, I can generate them again. Otherwise, look >> at http://tg-viz-login1.uc.teragrid.org:51000/index.htm to see the >> last 10K samples of the experiment. >> >> Nika, after you try to figure out what happened, can you simply retry >> the workflow, maybe it will manage to finish the last 15 jobs. 
>> Depending on what problem we find, I think we might conclude that 3 >> retries is not enough, and we might want to have a higher number as >> the default when running with Falkon. If the error was an >> application error, then no matter how many retries we have, it won't >> make any difference. >> >> Ioan >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nefedova at mcs.anl.gov Mon Aug 6 11:31:34 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 11:31:34 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B74B8B.1080408@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> Message-ID: Ioan, I can't answer any of your questions -- read my point number 2 below ); Nika On Aug 6, 2007, at 11:25 AM, Ioan Raicu wrote: > Hi, > > Veronika Nefedova wrote: >> Ok, here is what happened with the last 244-molecule run. >> >> 1. First of all, the new swift code (with loops etc) was used. The >> code's size is dramatically reduced: >> >> -rw-r--r-- 1 nefedova users 13342526 2007-07-05 12:01 MolDyn-244.dtm >> -rw-r--r-- 1 nefedova users 21898 2007-08-03 11:00 MolDyn-244- >> loops.swift >> >> >> 2. I do not have the log on the swift size (probably it was not >> produced because I put in the hack for output reduction and log >> output was suppressed -- it can be fixed easily) >> >> 3. There were 2 molecules that failed. That infamous m179 failed >> at the last step (3 re-tries). Yuqing -- its the same molecule you >> said you fixed the antechamber code for. You told me to use the >> code in your home directory /home/ydeng/antechamber-1.27, I >> assumed it was on tg-uc. Is that correct? Or its on another host? >> Anyway, I used the code from the directory above and it didn't >> work. The output is @tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/ >> MolDyn-244-loos-bm66sjz1li5h1/shared. I could try to run again >> this molecule specifically in case it works for you. >> >> 4. The second molecule that failed is m050. Its quite a mystery >> why it failed: it finished the 4-th stage (those 68 charm jobs) >> successfully (I have the data in shared directory on tg-uc) but >> then the 5-th stage has never started! I do not see any leftover >> directories from the 5-th stage for m050 (or any other stages for >> m050 for that matter). So it was not a job failure, but job >> submission failure (since no directories were even created). It >> had to be a job called 'generator_cat' with a parameter 'm050'. >> Ioan - is that possible to rack what happened to this job in >> Falcon logs? >> > There were only 3 jobs that failed in the Falkon logs, so I presume > that those were from (3) above. I also forgot to enable any debug > logging, as the settings were from some older high throughput > experiments, so I don't have a trace of all the task descriptions > and STDOUT and STDERR. About the only thing I can think of is... > can you summarize from the Swift log, how many submitted jobs there > were, how many success and how many failed? At least maybe we can > make sure that the Swift log is consistent with the Falkon logs. > Could it be that a task actually fails (say it doesn't produce all > the output files), but still returns an exit code of 0 (success)? 
> If yes, then would Swift attempt the next task that needed the > missing files and likely fail while executing due to not finding > all the files? > > Now, you mention that it could be a job submission failure... but > wouldn't this be explicit in the Swift logs, that it tried to > submit and it failed? > > Here is the list of all tasks that Falkon knows of: http://tg-viz- > login1.uc.teragrid.org:51000/service_logs/GenericPortalWS_taskPerf.txt > > Can you produce a similar list of tasks (from the Swift logs), if > the task ID (urn:0-1-10-0-1186176957479), and the status (i.e. > submitted, success, failed, etc)? I believe that the latest > provisioner code you had (which I hope it did not get overwritten > by SVN as I don't know if it was ever checked in, and I don't > remember when it was changed, before or after the commit to SVN) > should have printed at each submission to Falkon the task ID in the > form it is above, and the status of the task at that point in > time. Assuming this information is in the Swift log, you should be > able to grep for these lines and produce a summary of all the > tasks, that we can then cross-match with Falkon's logs. Which one > is the Swift log for this latest run on viper? There are so many, > and I can't tell which one it is. > > Ioan >> 5. I can't restart the workflow since this bug/feature has not >> been fixed: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29 >> (as long as I use the hack for output reduction -- restarts do not >> work). >> >> Nika >> >> On Aug 3, 2007, at 11:03 PM, Ioan Raicu wrote: >> >>> Hi, >>> Nika can probably be more specific, but the last time we ran the >>> 244 molecule MolDyn, the workflow failed on the last few jobs, >>> and the failures were application specific, not Swift or Falkon. >>> I believe the specific issue that caused those jobs to fail has >>> been resolved. >>> >>> We have made another attempt at the MolDyn 244 molecule run, and >>> from what I can tell, it did not complete successfully again. We >>> were supposed to have 20497 jobs... >>> >>> 1 1 1 >>> 1 244 244 >>> 1 244 244 >>> 68 244 16592 >>> 1 244 244 >>> 11 244 2684 >>> 1 244 244 >>> 1 244 244 >>> >>> >>> >>> >>> >>> 20497 >>> >>> but we have: >>> 20482 with exit code 0 >>> 1 with exit code -3 >>> 2 with exit code 253 >>> >>> I forgot to enable the debug at the workers, so I don't know what >>> the STDOUT and STDERR was for these 3 jobs. Given that Swift >>> retries 3 times a job before it fails the workflow, my guess is >>> that these 3 jobs were really the same job failing 3 times. The >>> failure occurred on 3 different machines, so I don't think it was >>> machine related. Nika, can you tell from the various Swift logs >>> what happened to these 3 jobs? Is this the same issue as we had >>> on the last 244 mol run? It looks like we failed the workflow >>> with 15 jobs to go. >>> >>> The graphs all look nice, similar to the last ones we had. If >>> people really want to see them, I can generate them again. >>> Otherwise, look at http://tg-viz-login1.uc.teragrid.org:51000/ >>> index.htm to see the last 10K samples of the experiment. >>> >>> Nika, after you try to figure out what happened, can you simply >>> retry the workflow, maybe it will manage to finish the last 15 >>> jobs. Depending on what problem we find, I think we might >>> conclude that 3 retries is not enough, and we might want to have >>> a higher number as the default when running with Falkon. 
If the >>> error was an application error, then no matter how many retries >>> we have, it won't make any difference. >>> >>> Ioan >>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Mon Aug 6 11:34:48 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 6 Aug 2007 16:34:48 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> Message-ID: have you tried running those molecules individually? -- From iraicu at cs.uchicago.edu Mon Aug 6 11:34:42 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 11:34:42 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> Message-ID: <46B74DA2.5010107@cs.uchicago.edu> Aha, OK, it didn't click that (2) was referring to the Swift log that I was referring to. So, in that case, we can't do much else on this run, other than make sure we fix the infamous m179 molecule, turn on all debugging (and make sure its actually printing debug statements), and try the run again! Ioan Veronika Nefedova wrote: > Ioan, I can't answer any of your questions -- read my point number 2 > below ); > > Nika > > On Aug 6, 2007, at 11:25 AM, Ioan Raicu wrote: > >> Hi, >> >> Veronika Nefedova wrote: >>> Ok, here is what happened with the last 244-molecule run. >>> >>> 1. First of all, the new swift code (with loops etc) was used. The >>> code's size is dramatically reduced: >>> >>> -rw-r--r-- 1 nefedova users 13342526 2007-07-05 12:01 MolDyn-244.dtm >>> -rw-r--r-- 1 nefedova users 21898 2007-08-03 11:00 >>> MolDyn-244-loops.swift >>> >>> >>> 2. I do not have the log on the swift size (probably it was not >>> produced because I put in the hack for output reduction and log >>> output was suppressed -- it can be fixed easily) >>> >>> 3. There were 2 molecules that failed. That infamous m179 failed at >>> the last step (3 re-tries). Yuqing -- its the same molecule you said >>> you fixed the antechamber code for. You told me to use the code in >>> your home directory /home/ydeng/antechamber-1.27, I assumed it was >>> on tg-uc. Is that correct? Or its on another host? Anyway, I used >>> the code from the directory above and it didn't work. The output >>> is @tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1/shared. >>> I could try to run again this molecule specifically in case it works >>> for you. >>> >>> 4. The second molecule that failed is m050. Its quite a mystery why >>> it failed: it finished the 4-th stage (those 68 charm jobs) >>> successfully (I have the data in shared directory on tg-uc) but then >>> the 5-th stage has never started! I do not see any leftover >>> directories from the 5-th stage for m050 (or any other stages for >>> m050 for that matter). So it was not a job failure, but job >>> submission failure (since no directories were even created). It had >>> to be a job called 'generator_cat' with a parameter 'm050'. Ioan - >>> is that possible to rack what happened to this job in Falcon logs? >>> >> There were only 3 jobs that failed in the Falkon logs, so I presume >> that those were from (3) above. 
I also forgot to enable any debug >> logging, as the settings were from some older high throughput >> experiments, so I don't have a trace of all the task descriptions and >> STDOUT and STDERR. About the only thing I can think of is... can you >> summarize from the Swift log, how many submitted jobs there were, how >> many success and how many failed? At least maybe we can make sure >> that the Swift log is consistent with the Falkon logs. Could it be >> that a task actually fails (say it doesn't produce all the output >> files), but still returns an exit code of 0 (success)? If yes, then >> would Swift attempt the next task that needed the missing files and >> likely fail while executing due to not finding all the files? >> >> Now, you mention that it could be a job submission failure... but >> wouldn't this be explicit in the Swift logs, that it tried to submit >> and it failed? >> >> Here is the list of all tasks that Falkon knows of: >> http://tg-viz-login1.uc.teragrid.org:51000/service_logs/GenericPortalWS_taskPerf.txt >> >> Can you produce a similar list of tasks (from the Swift logs), if the >> task ID (urn:0-1-10-0-1186176957479), and the status (i.e. submitted, >> success, failed, etc)? I believe that the latest provisioner code >> you had (which I hope it did not get overwritten by SVN as I don't >> know if it was ever checked in, and I don't remember when it was >> changed, before or after the commit to SVN) should have printed at >> each submission to Falkon the task ID in the form it is above, and >> the status of the task at that point in time. Assuming this >> information is in the Swift log, you should be able to grep for these >> lines and produce a summary of all the tasks, that we can then >> cross-match with Falkon's logs. Which one is the Swift log for this >> latest run on viper? There are so many, and I can't tell which one >> it is. >> >> Ioan >>> 5. I can't restart the workflow since this bug/feature has not been >>> fixed: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29 (as long >>> as I use the hack for output reduction -- restarts do not work). >>> >>> Nika >>> >>> On Aug 3, 2007, at 11:03 PM, Ioan Raicu wrote: >>> >>>> Hi, >>>> Nika can probably be more specific, but the last time we ran the >>>> 244 molecule MolDyn, the workflow failed on the last few jobs, and >>>> the failures were application specific, not Swift or Falkon. I >>>> believe the specific issue that caused those jobs to fail has been >>>> resolved. >>>> >>>> We have made another attempt at the MolDyn 244 molecule run, and >>>> from what I can tell, it did not complete successfully again. We >>>> were supposed to have 20497 jobs... >>>> >>>> 1 1 1 >>>> 1 244 244 >>>> 1 244 244 >>>> 68 244 16592 >>>> 1 244 244 >>>> 11 244 2684 >>>> 1 244 244 >>>> 1 244 244 >>>> >>>> >>>> >>>> >>>> >>>> 20497 >>>> >>>> >>>> but we have: >>>> 20482 with exit code 0 >>>> 1 with exit code -3 >>>> 2 with exit code 253 >>>> >>>> I forgot to enable the debug at the workers, so I don't know what >>>> the STDOUT and STDERR was for these 3 jobs. Given that Swift >>>> retries 3 times a job before it fails the workflow, my guess is >>>> that these 3 jobs were really the same job failing 3 times. The >>>> failure occurred on 3 different machines, so I don't think it was >>>> machine related. Nika, can you tell from the various Swift logs >>>> what happened to these 3 jobs? Is this the same issue as we had on >>>> the last 244 mol run? It looks like we failed the workflow with 15 >>>> jobs to go. 
>>>> >>>> The graphs all look nice, similar to the last ones we had. If >>>> people really want to see them, I can generate them again. >>>> Otherwise, look at >>>> http://tg-viz-login1.uc.teragrid.org:51000/index.htm to see the >>>> last 10K samples of the experiment. >>>> >>>> Nika, after you try to figure out what happened, can you simply >>>> retry the workflow, maybe it will manage to finish the last 15 >>>> jobs. Depending on what problem we find, I think we might conclude >>>> that 3 retries is not enough, and we might want to have a higher >>>> number as the default when running with Falkon. If the error was >>>> an application error, then no matter how many retries we have, it >>>> won't make any difference. >>>> >>>> Ioan >>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Mon Aug 6 11:36:19 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 11:36:19 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> Message-ID: <46B74E03.1030403@cs.uchicago.edu> Hi, ANL/UC seems almost idle, I bet we could get 244 processors if we try it again soon! Ioan
iraicu at tg-viz-login1:~/java/Falkon_v0.8.1/service/logs/244-mol-08-03-07> showq
active jobs------------------------
JOBID     USERNAME  STATE    PROCS  REMAINING  STARTTIME
1479856   leggett   Running      2    3:27:15  Mon Aug  6 10:02:18
1479840   leggett   Running      2    1:05:51  Mon Aug  6 07:40:54

2 active jobs   4 of 260 processors in use by local jobs (1.54%)
2 of 130 nodes active (1.54%)

eligible jobs----------------------
JOBID     USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME
0 eligible jobs

blocked jobs-----------------------
JOBID     USERNAME  STATE    PROCS  WCLIMIT    QUEUETIME
0 blocked jobs

Total jobs: 2
Veronika Nefedova wrote: > Ioan, I can't answer any of your questions -- read my point number 2 > below ); > > Nika > > On Aug 6, 2007, at 11:25 AM, Ioan Raicu wrote: > >> Hi, >> >> Veronika Nefedova wrote: >>> Ok, here is what happened with the last 244-molecule run. >>> >>> 1. First of all, the new swift code (with loops etc) was used. The >>> code's size is dramatically reduced: >>> >>> -rw-r--r-- 1 nefedova users 13342526 2007-07-05 12:01 MolDyn-244.dtm >>> -rw-r--r-- 1 nefedova users 21898 2007-08-03 11:00 >>> MolDyn-244-loops.swift >>> >>> >>> 2. I do not have the log on the swift size (probably it was not >>> produced because I put in the hack for output reduction and log >>> output was suppressed -- it can be fixed easily) >>> >>> 3. There were 2 molecules that failed. That infamous m179 failed at >>> the last step (3 re-tries). Yuqing -- its the same molecule you said >>> you fixed the antechamber code for. You told me to use the code in >>> your home directory /home/ydeng/antechamber-1.27, I assumed it was >>> on tg-uc. Is that correct? Or its on another host? Anyway, I used >>> the code from the directory above and it didn't work. The output >>> is @tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1/shared. >>> I could try to run again this molecule specifically in case it works >>> for you. >>> >>> 4. The second molecule that failed is m050. Its quite a mystery why >>> it failed: it finished the 4-th stage (those 68 charm jobs) >>> successfully (I have the data in shared directory on tg-uc) but then >>> the 5-th stage has never started!
I do not see any leftover >>> directories from the 5-th stage for m050 (or any other stages for >>> m050 for that matter). So it was not a job failure, but job >>> submission failure (since no directories were even created). It had >>> to be a job called 'generator_cat' with a parameter 'm050'. Ioan - >>> is that possible to rack what happened to this job in Falcon logs? >>> >> There were only 3 jobs that failed in the Falkon logs, so I presume >> that those were from (3) above. I also forgot to enable any debug >> logging, as the settings were from some older high throughput >> experiments, so I don't have a trace of all the task descriptions and >> STDOUT and STDERR. About the only thing I can think of is... can you >> summarize from the Swift log, how many submitted jobs there were, how >> many success and how many failed? At least maybe we can make sure >> that the Swift log is consistent with the Falkon logs. Could it be >> that a task actually fails (say it doesn't produce all the output >> files), but still returns an exit code of 0 (success)? If yes, then >> would Swift attempt the next task that needed the missing files and >> likely fail while executing due to not finding all the files? >> >> Now, you mention that it could be a job submission failure... but >> wouldn't this be explicit in the Swift logs, that it tried to submit >> and it failed? >> >> Here is the list of all tasks that Falkon knows of: >> http://tg-viz-login1.uc.teragrid.org:51000/service_logs/GenericPortalWS_taskPerf.txt >> >> Can you produce a similar list of tasks (from the Swift logs), if the >> task ID (urn:0-1-10-0-1186176957479), and the status (i.e. submitted, >> success, failed, etc)? I believe that the latest provisioner code >> you had (which I hope it did not get overwritten by SVN as I don't >> know if it was ever checked in, and I don't remember when it was >> changed, before or after the commit to SVN) should have printed at >> each submission to Falkon the task ID in the form it is above, and >> the status of the task at that point in time. Assuming this >> information is in the Swift log, you should be able to grep for these >> lines and produce a summary of all the tasks, that we can then >> cross-match with Falkon's logs. Which one is the Swift log for this >> latest run on viper? There are so many, and I can't tell which one >> it is. >> >> Ioan >>> 5. I can't restart the workflow since this bug/feature has not been >>> fixed: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29 (as long >>> as I use the hack for output reduction -- restarts do not work). >>> >>> Nika >>> >>> On Aug 3, 2007, at 11:03 PM, Ioan Raicu wrote: >>> >>>> Hi, >>>> Nika can probably be more specific, but the last time we ran the >>>> 244 molecule MolDyn, the workflow failed on the last few jobs, and >>>> the failures were application specific, not Swift or Falkon. I >>>> believe the specific issue that caused those jobs to fail has been >>>> resolved. >>>> >>>> We have made another attempt at the MolDyn 244 molecule run, and >>>> from what I can tell, it did not complete successfully again. We >>>> were supposed to have 20497 jobs... 
>>>> >>>> 1 1 1 >>>> 1 244 244 >>>> 1 244 244 >>>> 68 244 16592 >>>> 1 244 244 >>>> 11 244 2684 >>>> 1 244 244 >>>> 1 244 244 >>>> >>>> >>>> >>>> >>>> >>>> 20497 >>>> >>>> >>>> but we have: >>>> 20482 with exit code 0 >>>> 1 with exit code -3 >>>> 2 with exit code 253 >>>> >>>> I forgot to enable the debug at the workers, so I don't know what >>>> the STDOUT and STDERR was for these 3 jobs. Given that Swift >>>> retries 3 times a job before it fails the workflow, my guess is >>>> that these 3 jobs were really the same job failing 3 times. The >>>> failure occurred on 3 different machines, so I don't think it was >>>> machine related. Nika, can you tell from the various Swift logs >>>> what happened to these 3 jobs? Is this the same issue as we had on >>>> the last 244 mol run? It looks like we failed the workflow with 15 >>>> jobs to go. >>>> >>>> The graphs all look nice, similar to the last ones we had. If >>>> people really want to see them, I can generate them again. >>>> Otherwise, look at >>>> http://tg-viz-login1.uc.teragrid.org:51000/index.htm to see the >>>> last 10K samples of the experiment. >>>> >>>> Nika, after you try to figure out what happened, can you simply >>>> retry the workflow, maybe it will manage to finish the last 15 >>>> jobs. Depending on what problem we find, I think we might conclude >>>> that 3 retries is not enough, and we might want to have a higher >>>> number as the default when running with Falkon. If the error was >>>> an application error, then no matter how many retries we have, it >>>> won't make any difference. >>>> >>>> Ioan >>>> >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From nefedova at mcs.anl.gov Mon Aug 6 11:43:10 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 11:43:10 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> Message-ID: <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> m050 worked just fine last time we ran 244 molecules. And it doesn't look like an application failure but rather a job submission failure (no leftover directories). m179 failed the last time we ran it but I thought it was fixed? Yuqing - could you please verify that I am using the right antechamber code on tg-uc. Nika On Aug 6, 2007, at 11:34 AM, Ben Clifford wrote: > have you tried running those molecules individually? > -- > From iraicu at cs.uchicago.edu Mon Aug 6 11:51:59 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 11:51:59 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> Message-ID: <46B751AF.9050502@cs.uchicago.edu> If there are no leftover directories, it could mean one of two things: 1) the job submission failed and the job never made it out to a remote resource to create directories and execute 2) the job went through to the remote resource, created directories, executed, finished with an exit code of 0, cleaned up directories, ... In both instances, there are no directories at the end. 
What we are missing is the logs to determine which of the two cases from above happened. Note that there were absolutely no Exceptions thrown in the Falkon logs, so I would be cautious to rule out (2) from above. Just for a sanity check, can you run those two molecules manually, just to double check that they work! Could we do a 2 molecule run through Swift with just the two suspect molecules to see if those 2 molecules complete OK? Then, let's try another run with 244 molecules soon, as most of ANL/UC is free! Ioan Veronika Nefedova wrote: > m050 worked just fine last time we ran 244 molecules. And it doesn't > look like an application failure but rather a job submission failure > (no leftover directories). > m179 failed the last time we ran it but I thought it was fixed? Yuqing > - could you please verify that I am using the right antechamber code > on tg-uc. > > Nika > > On Aug 6, 2007, at 11:34 AM, Ben Clifford wrote: > >> have you tried running those molecules individually? >> -- >> > > From nefedova at mcs.anl.gov Mon Aug 6 11:56:58 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 11:56:58 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B751AF.9050502@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> Message-ID: On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: > If there are no leftover directories, it could mean one of two things: > 1) the job submission failed and the job never made it out to a > remote resource to create directories and execute > 2) the job went through to the remote resource, created > directories, executed, finished with an exit code of 0, cleaned up > directories, ... > > In both instances, there are no directories at the end. What we > are missing is the logs to determine which of the two cases from > above happened. Note that there were absolutely no Exceptions > thrown in the Falkon logs, so I would be cautious to rule out (2) > from above. > There are no output files from any stages after the Stage 4 (in shared) for m050. So we can rule out any successful execution of Stage 5 and after... > Just for a sanity check, can you run those two molecules manually, > just to double check that they work! Could we do a 2 molecule run > through Swift with just the two suspect molecules to see if those 2 > molecules complete OK? Then, let's try another run with 244 > molecules soon, as most of ANL/UC is free! > > Ioan > > Veronika Nefedova wrote: >> m050 worked just fine last time we ran 244 molecules. And it >> doesn't look like an application failure but rather a job >> submission failure (no leftover directories). >> m179 failed the last time we ran it but I thought it was fixed? >> Yuqing - could you please verify that I am using the right >> antechamber code on tg-uc. >> >> Nika >> >> On Aug 6, 2007, at 11:34 AM, Ben Clifford wrote: >> >>> have you tried running those molecules individually? 
>>> -- >>> >> >> > From benc at hawaga.org.uk Mon Aug 6 12:04:41 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 6 Aug 2007 17:04:41 +0000 (GMT) Subject: [Swift-devel] log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG Message-ID: I saw Nika had this hacked into her personal copy of the swift source code, in etc/log4j.properties: > log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG Who claims ownership of this line? provider-deef? falkon? I guess provider-deef, in which case I think the provider-deef build system can put this in more sensibly without needing to hack the source by hand. -- From hategan at mcs.anl.gov Mon Aug 6 12:19:58 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 06 Aug 2007 12:19:58 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B751AF.9050502@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> Message-ID: <1186420798.6614.0.camel@blabla.mcs.anl.gov> On Mon, 2007-08-06 at 11:51 -0500, Ioan Raicu wrote: > If there are no leftover directories, it could mean one of two things: > 1) the job submission failed and the job never made it out to a remote > resource to create directories and execute > 2) the job went through to the remote resource, created directories, > executed, finished with an exit code of 0, cleaned up directories, ... > > In both instances, there are no directories at the end. What we are > missing is the logs to determine which of the two cases from above > happened. There would be the wrapper log. > Note that there were absolutely no Exceptions thrown in the > Falkon logs, so I would be cautious to rule out (2) from above. > > Just for a sanity check, can you run those two molecules manually, just > to double check that they work! Could we do a 2 molecule run through > Swift with just the two suspect molecules to see if those 2 molecules > complete OK? Then, let's try another run with 244 molecules soon, as > most of ANL/UC is free! > > Ioan > > Veronika Nefedova wrote: > > m050 worked just fine last time we ran 244 molecules. And it doesn't > > look like an application failure but rather a job submission failure > > (no leftover directories). > > m179 failed the last time we ran it but I thought it was fixed? Yuqing > > - could you please verify that I am using the right antechamber code > > on tg-uc. > > > > Nika > > > > On Aug 6, 2007, at 11:34 AM, Ben Clifford wrote: > > > >> have you tried running those molecules individually? 
> >> -- > >> > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Mon Aug 6 12:20:06 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 12:20:06 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B751AF.9050502@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> Message-ID: <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: > If there are no leftover directories, it could mean one of two things: > 1) the job submission failed and the job never made it out to a > remote resource to create directories and execute > 2) the job went through to the remote resource, created > directories, executed, finished with an exit code of 0, cleaned up > directories, ... > > In both instances, there are no directories at the end. What we > are missing is the logs to determine which of the two cases from > above happened. Note that there were absolutely no Exceptions > thrown in the Falkon logs, so I would be cautious to rule out (2) > from above. > > Just for a sanity check, can you run those two molecules manually, > just to double check that they work! Could we do a 2 molecule run > through Swift with just the two suspect molecules to see if those 2 > molecules complete OK? I started those 2 molecules via GRAM. I have no trust in m179 finishing completely since I didn't change anything. I hope for m050 to finish though... You can watch the swift log on viper in ~nefedova/alamines/MolDyn-2- loops-be9484k93kk21.log Nika > Then, let's try another run with 244 molecules soon, as most of ANL/ > UC is free! > > Ioan > > Veronika Nefedova wrote: >> m050 worked just fine last time we ran 244 molecules. And it >> doesn't look like an application failure but rather a job >> submission failure (no leftover directories). >> m179 failed the last time we ran it but I thought it was fixed? >> Yuqing - could you please verify that I am using the right >> antechamber code on tg-uc. >> >> Nika >> >> On Aug 6, 2007, at 11:34 AM, Ben Clifford wrote: >> >>> have you tried running those molecules individually? >>> -- >>> >> >> > From hategan at mcs.anl.gov Mon Aug 6 12:22:17 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 06 Aug 2007 12:22:17 -0500 Subject: [Swift-devel] log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG In-Reply-To: References: Message-ID: <1186420937.6614.3.camel@blabla.mcs.anl.gov> provider-deef/etc/log4j.properties.module already has that line. It should get integrated into the big log4j.properties file if provider-deef is build as a proper dependency to swift. On Mon, 2007-08-06 at 17:04 +0000, Ben Clifford wrote: > I saw Nika had this hacked into her personal copy of the swift source > code, in etc/log4j.properties: > > > log4j.logger.org.globus.cog.abstraction.impl.execution.deef=DEBUG > > Who claims ownership of this line? provider-deef? falkon? > > I guess provider-deef, in which case I think the provider-deef build > system can put this in more sensibly without needing to hack the source by > hand. 
> From nefedova at mcs.anl.gov Mon Aug 6 12:22:28 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 12:22:28 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B74DA2.5010107@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <46B74DA2.5010107@cs.uchicago.edu> Message-ID: BTW - the swift log thing was fixed thanks to Ben -- it was not the output reduction hack but in fact some discrepancies in log4j.properties file that were introduced during the latest SVN update. Nika On Aug 6, 2007, at 11:34 AM, Ioan Raicu wrote: > Aha, OK, it didn't click that (2) was referring to the Swift log > that I was referring to. So, in that case, we can't do much else > on this run, other than make sure we fix the infamous m179 > molecule, turn on all debugging (and make sure its actually > printing debug statements), and try the run again! > > Ioan > > Veronika Nefedova wrote: >> Ioan, I can't answer any of your questions -- read my point number >> 2 below ); >> >> Nika >> >> On Aug 6, 2007, at 11:25 AM, Ioan Raicu wrote: >> >>> Hi, >>> >>> Veronika Nefedova wrote: >>>> Ok, here is what happened with the last 244-molecule run. >>>> >>>> 1. First of all, the new swift code (with loops etc) was used. >>>> The code's size is dramatically reduced: >>>> >>>> -rw-r--r-- 1 nefedova users 13342526 2007-07-05 12:01 >>>> MolDyn-244.dtm >>>> -rw-r--r-- 1 nefedova users 21898 2007-08-03 11:00 >>>> MolDyn-244-loops.swift >>>> >>>> >>>> 2. I do not have the log on the swift size (probably it was not >>>> produced because I put in the hack for output reduction and log >>>> output was suppressed -- it can be fixed easily) >>>> >>>> 3. There were 2 molecules that failed. That infamous m179 >>>> failed at the last step (3 re-tries). Yuqing -- its the same >>>> molecule you said you fixed the antechamber code for. You told >>>> me to use the code in your home directory /home/ydeng/ >>>> antechamber-1.27, I assumed it was on tg-uc. Is that correct? Or >>>> its on another host? Anyway, I used the code from the directory >>>> above and it didn't work. The output is @tg-login1:/disks/ >>>> scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1/shared. >>>> I could try to run again this molecule specifically in case it >>>> works for you. >>>> >>>> 4. The second molecule that failed is m050. Its quite a mystery >>>> why it failed: it finished the 4-th stage (those 68 charm jobs) >>>> successfully (I have the data in shared directory on tg-uc) but >>>> then the 5-th stage has never started! I do not see any leftover >>>> directories from the 5-th stage for m050 (or any other stages >>>> for m050 for that matter). So it was not a job failure, but job >>>> submission failure (since no directories were even created). It >>>> had to be a job called 'generator_cat' with a parameter 'm050'. >>>> Ioan - is that possible to rack what happened to this job in >>>> Falcon logs? >>>> >>> There were only 3 jobs that failed in the Falkon logs, so I >>> presume that those were from (3) above. I also forgot to enable >>> any debug logging, as the settings were from some older high >>> throughput experiments, so I don't have a trace of all the task >>> descriptions and STDOUT and STDERR. About the only thing I can >>> think of is... can you summarize from the Swift log, how many >>> submitted jobs there were, how many success and how many failed? 
>>> At least maybe we can make sure that the Swift log is consistent >>> with the Falkon logs. Could it be that a task actually fails >>> (say it doesn't produce all the output files), but still returns >>> an exit code of 0 (success)? If yes, then would Swift attempt >>> the next task that needed the missing files and likely fail while >>> executing due to not finding all the files? >>> >>> Now, you mention that it could be a job submission failure... but >>> wouldn't this be explicit in the Swift logs, that it tried to >>> submit and it failed? >>> >>> Here is the list of all tasks that Falkon knows of: http://tg-viz- >>> login1.uc.teragrid.org:51000/service_logs/ >>> GenericPortalWS_taskPerf.txt >>> >>> Can you produce a similar list of tasks (from the Swift logs), if >>> the task ID (urn:0-1-10-0-1186176957479), and the status (i.e. >>> submitted, success, failed, etc)? I believe that the latest >>> provisioner code you had (which I hope it did not get overwritten >>> by SVN as I don't know if it was ever checked in, and I don't >>> remember when it was changed, before or after the commit to SVN) >>> should have printed at each submission to Falkon the task ID in >>> the form it is above, and the status of the task at that point in >>> time. Assuming this information is in the Swift log, you should >>> be able to grep for these lines and produce a summary of all the >>> tasks, that we can then cross-match with Falkon's logs. Which >>> one is the Swift log for this latest run on viper? There are so >>> many, and I can't tell which one it is. >>> >>> Ioan >>>> 5. I can't restart the workflow since this bug/feature has not >>>> been fixed: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29 >>>> (as long as I use the hack for output reduction -- restarts do >>>> not work). >>>> >>>> Nika >>>> >>>> On Aug 3, 2007, at 11:03 PM, Ioan Raicu wrote: >>>> >>>>> Hi, >>>>> Nika can probably be more specific, but the last time we ran >>>>> the 244 molecule MolDyn, the workflow failed on the last few >>>>> jobs, and the failures were application specific, not Swift or >>>>> Falkon. I believe the specific issue that caused those jobs to >>>>> fail has been resolved. >>>>> >>>>> We have made another attempt at the MolDyn 244 molecule run, >>>>> and from what I can tell, it did not complete successfully >>>>> again. We were supposed to have 20497 jobs... >>>>> >>>>> 1 1 1 >>>>> 1 244 244 >>>>> 1 244 244 >>>>> 68 244 16592 >>>>> 1 244 244 >>>>> 11 244 2684 >>>>> 1 244 244 >>>>> 1 244 244 >>>>> >>>>> >>>>> >>>>> >>>>> >>>>> 20497 >>>>> >>>>> but we have: >>>>> 20482 with exit code 0 >>>>> 1 with exit code -3 >>>>> 2 with exit code 253 >>>>> >>>>> I forgot to enable the debug at the workers, so I don't know >>>>> what the STDOUT and STDERR was for these 3 jobs. Given that >>>>> Swift retries 3 times a job before it fails the workflow, my >>>>> guess is that these 3 jobs were really the same job failing 3 >>>>> times. The failure occurred on 3 different machines, so I >>>>> don't think it was machine related. Nika, can you tell from >>>>> the various Swift logs what happened to these 3 jobs? Is this >>>>> the same issue as we had on the last 244 mol run? It looks >>>>> like we failed the workflow with 15 jobs to go. >>>>> >>>>> The graphs all look nice, similar to the last ones we had. If >>>>> people really want to see them, I can generate them again. >>>>> Otherwise, look at http://tg-viz-login1.uc.teragrid.org:51000/ >>>>> index.htm to see the last 10K samples of the experiment. 
>>>>> >>>>> Nika, after you try to figure out what happened, can you simply >>>>> retry the workflow, maybe it will manage to finish the last 15 >>>>> jobs. Depending on what problem we find, I think we might >>>>> conclude that 3 retries is not enough, and we might want to >>>>> have a higher number as the default when running with Falkon. >>>>> If the error was an application error, then no matter how many >>>>> retries we have, it won't make any difference. >>>>> >>>>> Ioan >>>>> >>>> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From nefedova at mcs.anl.gov Mon Aug 6 12:26:30 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 12:26:30 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186420798.6614.0.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <1186420798.6614.0.camel@blabla.mcs.anl.gov> Message-ID: <5350B4E7-78F8-4F1D-92FE-F9306F63F745@mcs.anl.gov> On Aug 6, 2007, at 12:19 PM, Mihael Hategan wrote: > On Mon, 2007-08-06 at 11:51 -0500, Ioan Raicu wrote: >> If there are no leftover directories, it could mean one of two >> things: >> 1) the job submission failed and the job never made it out to a >> remote >> resource to create directories and execute >> 2) the job went through to the remote resource, created directories, >> executed, finished with an exit code of 0, cleaned up >> directories, ... >> >> In both instances, there are no directories at the end. What we are >> missing is the logs to determine which of the two cases from above >> happened. > > There would be the wrapper log. > I checked the log. There were only 243 5th Stage jobs done (not 244): nefedova at tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos- bm66sjz1li5h1> grep generator_cat wrapper.log | wc 243 243 6561 The only failures in that log are the last stage jobs (3 repeats) for m179. Nothing for m050. Nika From iraicu at cs.uchicago.edu Mon Aug 6 13:48:52 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 13:48:52 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <5350B4E7-78F8-4F1D-92FE-F9306F63F745@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <1186420798.6614.0.camel@blabla.mcs.anl.gov> <5350B4E7-78F8-4F1D-92FE-F9306F63F745@mcs.anl.gov> Message-ID: <46B76D14.1000304@cs.uchicago.edu> So m050 finished OK? All files associated with this molecule were created correctly? We are really working blind without any logs. Are there any wrapper logs as Mihael suggested that could point into what happened to this molecule? If nothing else, just fix m179, and let's try it again. We should also switch over to your temp credentials that you got for ANL/UC. I can help you configure Falkon for this run, just let me know when you are ready! 
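A minimal wrapper-log check along those lines, using the run directory and file name already quoted earlier in this thread; the only assumption is that wrapper.log records one line per generator_cat invocation, which is what the 243-line grep result above suggests:

    cd /disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1
    grep -c generator_cat wrapper.log   # 244 would mean every 5th-stage job at least reached the wrapper
    grep m050 wrapper.log               # no output at all would point to a submission-side failure for m050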
Ioan Veronika Nefedova wrote: > > On Aug 6, 2007, at 12:19 PM, Mihael Hategan wrote: > >> On Mon, 2007-08-06 at 11:51 -0500, Ioan Raicu wrote: >>> If there are no leftover directories, it could mean one of two things: >>> 1) the job submission failed and the job never made it out to a remote >>> resource to create directories and execute >>> 2) the job went through to the remote resource, created directories, >>> executed, finished with an exit code of 0, cleaned up directories, ... >>> >>> In both instances, there are no directories at the end. What we are >>> missing is the logs to determine which of the two cases from above >>> happened. >> >> There would be the wrapper log. >> > > > I checked the log. There were only 243 5th Stage jobs done (not 244): > > nefedova at tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1> > grep generator_cat wrapper.log | wc > 243 243 6561 > > The only failures in that log are the last stage jobs (3 repeats) for > m179. Nothing for m050. > > Nika > From nefedova at mcs.anl.gov Mon Aug 6 14:11:58 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 14:11:58 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> Message-ID: <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> m050 and m179 finished just fine now via GRAM (thanks to Yuqing who fixed the m179 just in time!). We could start again the 244- molecule run to verify that nothing is wrong with the whole system. Nika On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: > > On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: > > > I started those 2 molecules via GRAM. I have no trust in m179 > finishing completely since I didn't change anything. I hope for > m050 to finish though... > You can watch the swift log on viper in ~nefedova/alamines/MolDyn-2- > loops-be9484k93kk21.log > > Nika > >> Then, let's try another run with 244 molecules soon, as most of >> ANL/UC is free! >> >> Ioan >> From yuqing.deng at gmail.com Mon Aug 6 13:24:09 2007 From: yuqing.deng at gmail.com (Yuqing Deng) Date: Mon, 6 Aug 2007 13:24:09 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> Message-ID: Nika, The fix is in one of the data files that are loaded by antechamber. The ACHOME environment viariable has to be set to /home/ydeng/antechamber-1.27/ too. Yuqing On 8/6/07, Veronika Nefedova wrote: > Ok, here is what happened with the last 244-molecule run. > > 1. First of all, the new swift code (with loops etc) was used. The code's > size is dramatically reduced: > > -rw-r--r-- 1 nefedova users 13342526 2007-07-05 12:01 MolDyn-244.dtm > -rw-r--r-- 1 nefedova users 21898 2007-08-03 11:00 > MolDyn-244-loops.swift > > > 2. I do not have the log on the swift size (probably it was not produced > because I put in the hack for output reduction and log output was suppressed > -- it can be fixed easily) > > 3. There were 2 molecules that failed. That infamous m179 failed at the > last step (3 re-tries). 
Yuqing -- its the same molecule you said you fixed > the antechamber code for. You told me to use the code in your home > directory /home/ydeng/antechamber-1.27, I assumed it was > on tg-uc. Is that correct? Or its on another host? Anyway, I used the code > from the directory above and it didn't work. The output > is @tg-login1:/disks/scratchgpfs1/iraicu/ModLyn/MolDyn-244-loos-bm66sjz1li5h1/shared. > I could try to run again this molecule specifically in case it works for > you. > > 4. The second molecule that failed is m050. Its quite a mystery why it > failed: it finished the 4-th stage (those 68 charm jobs) successfully (I > have the data in shared directory on tg-uc) but then the 5-th stage has > never started! I do not see any leftover directories from the 5-th stage for > m050 (or any other stages for m050 for that matter). So it was not a job > failure, but job submission failure (since no directories were even > created). It had to be a job called 'generator_cat' with a parameter 'm050'. > Ioan - is that possible to rack what happened to this job in Falcon logs? > > 5. I can't restart the workflow since this bug/feature has not been > fixed: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=29 > (as long as I use the hack for output reduction -- restarts do not work). > > Nika > > > On Aug 3, 2007, at 11:03 PM, Ioan Raicu wrote: > Hi, > Nika can probably be more specific, but the last time we ran the 244 > molecule MolDyn, the workflow failed on the last few jobs, and the failures > were application specific, not Swift or Falkon. I believe the specific > issue that caused those jobs to fail has been resolved. > > We have made another attempt at the MolDyn 244 molecule run, and from what > I can tell, it did not complete successfully again. We were supposed to > have 20497 jobs... > > > 111 > 1244244 > 1244244 > 6824416592 > 1244244 > 112442684 > 1244244 > 1244244 > > > > > > > 20497 > but we have: > 20482 with exit code 0 > 1 with exit code -3 > 2 with exit code 253 > > I forgot to enable the debug at the workers, so I don't know what the > STDOUT and STDERR was for these 3 jobs. Given that Swift retries 3 times a > job before it fails the workflow, my guess is that these 3 jobs were really > the same job failing 3 times. The failure occurred on 3 different machines, > so I don't think it was machine related. Nika, can you tell from the > various Swift logs what happened to these 3 jobs? Is this the same issue as > we had on the last 244 mol run? It looks like we failed the workflow with > 15 jobs to go. > > The graphs all look nice, similar to the last ones we had. If people > really want to see them, I can generate them again. Otherwise, look at > http://tg-viz-login1.uc.teragrid.org:51000/index.htm to see > the last 10K samples of the experiment. > > Nika, after you try to figure out what happened, can you simply retry the > workflow, maybe it will manage to finish the last 15 jobs. Depending on > what problem we find, I think we might conclude that 3 retries is not > enough, and we might want to have a higher number as the default when > running with Falkon. If the error was an application error, then no matter > how many retries we have, it won't make any difference. 
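If a larger retry count is the route taken, a sketch of the change; both the file name etc/swift.properties and the property name execution.retries are assumptions here and should be checked against what the installed release actually ships:

    # file and property names are assumptions -- verify against your install
    echo "execution.retries=5" >> etc/swift.properties

As the paragraph above says, though, extra retries only help with transient failures, not with a genuine application error.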
> > Ioan > > > From iraicu at cs.uchicago.edu Mon Aug 6 14:29:43 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 14:29:43 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> Message-ID: <46B776A7.7040005@cs.uchicago.edu> OK! Why don't we do one last run from my allocation, as everything is set up already and ready to go! Make sure to enable all debug logging. Falkon is up and running with all debug enabled! Falkon location is unchanged from the last experiment. Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/index.htm ANL/UC is not quite so idle as it was earlier, but I bet we could still get 150~200 processors! Ioan Veronika Nefedova wrote: > m050 and m179 finished just fine now via GRAM (thanks to Yuqing who > fixed the m179 just in time!). We could start again the 244- molecule > run to verify that nothing is wrong with the whole system. > > Nika > > On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: > >> >> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >> >> >> I started those 2 molecules via GRAM. I have no trust in m179 >> finishing completely since I didn't change anything. I hope for m050 >> to finish though... >> You can watch the swift log on viper in >> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >> >> Nika >> >>> Then, let's try another run with 244 molecules soon, as most of >>> ANL/UC is free! >>> >>> Ioan >>> > > From benc at hawaga.org.uk Mon Aug 6 14:38:21 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 6 Aug 2007 19:38:21 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <46B74DA2.5010107@cs.uchicago.edu> Message-ID: On Mon, 6 Aug 2007, Veronika Nefedova wrote: > the output reduction hack The output reducation hack really needs to go away pretty soon. That's mostly more mihael's side of the code than mine (though maybe need some language changes to deal with indicating what should be staged out or not) -- From nefedova at mcs.anl.gov Mon Aug 6 15:17:05 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 15:17:05 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B776A7.7040005@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> Message-ID: OK. There is something weird happening. 
I've got several such entries in my swift log: 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: Task failed task:execute @ vdl-int.k, line: 332 vdl:execute2 @ execute-default.k, line: 22 vdl:execute @ MolDyn-244-loops.kml, line: 20 antchmbr @ MolDyn-244-loops.kml, line: 2845 vdl:mains @ MolDyn-244-loops.kml, line: 2267 Looks like antechamber has failed (?). And the failure is only on a swfit side, it never made it across to Falcon (there are no remote directories created). But I see some of antechamber jobs have finished (in shared). Yuqing -- could the changes you've made be responsible for these failures (I do not see how it could though) ? Ioan, what do you see in your logs ion these tasks: 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-56-0-1186429255786) setting status to Failed 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-57-0-1186429255798) setting status to Failed 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-59-0-1186429255800) setting status to Failed 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-60-0-1186429255805) setting status to Failed 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-61-0-1186429255811) setting status to Failed 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-58-0-1186429255814) setting status to Failed Nika On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: > OK! > Why don't we do one last run from my allocation, as everything is > set up already and ready to go! Make sure to enable all debug > logging. Falkon is up and running with all debug enabled! > > Falkon location is unchanged from the last experiment. > Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/ > GenericPortal/core/WS/GPFactoryService > Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/ > index.htm > > ANL/UC is not quite so idle as it was earlier, but I bet we could > still get 150~200 processors! > > Ioan > > Veronika Nefedova wrote: >> m050 and m179 finished just fine now via GRAM (thanks to Yuqing >> who fixed the m179 just in time!). We could start again the 244- >> molecule run to verify that nothing is wrong with the whole system. >> >> Nika >> >> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >> >>> >>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>> >>> >>> I started those 2 molecules via GRAM. I have no trust in m179 >>> finishing completely since I didn't change anything. I hope for >>> m050 to finish though... >>> You can watch the swift log on viper in ~nefedova/alamines/ >>> MolDyn-2-loops-be9484k93kk21.log >>> >>> Nika >>> >>>> Then, let's try another run with 244 molecules soon, as most of >>>> ANL/UC is free! >>>> >>>> Ioan >>>> >> >> > From iraicu at cs.uchicago.edu Mon Aug 6 15:27:59 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 15:27:59 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> Message-ID: <46B7844F.6020801@cs.uchicago.edu> Everything is idle, there is no work to be done... 
iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail GenericPortalWS_perf_per_sec.txt 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 24 workers are registered but idle.... queue length 0, 57 jobs completed. Also, see below all 57 jobs, they all finished with an exit code of 0, in other words succesfully! How many jobs does Swift think it sent? Ioan iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat GenericPortalWS_taskPerf.txt //taskNum taskID workerID startTimeStamp execTimeStamp resultsQueueTimeStamp endTimeStamp waitQueueTime ex ecTime resultsQueueTime totalTime exitCode 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 560629 49780 338 15 50133 0 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 561899 561909 216 699 10 925 0 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 562150 562159 382 777 9 1168 0 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 1044916 1044926 62404 10200 10 72614 0 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 1047038 1047067 135 585 29 749 0 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 1053072 1053080 114 6643 8 6765 0 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 1054256 1054290 731 7205 34 7970 0 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 1054570 1054579 7943 303 9 8255 0 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 1056811 1056819 6765 3724 8 10497 0 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 1058691 1058719 8257 4108 28 12393 0 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 1059363 1059385 12373 659 22 13054 0 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 1060315 1060323 10497 3489 8 13994 0 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 1060589 1060596 13042 1214 7 14263 0 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 1060603 1060954 1061054 14265 351 100 14716 0 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 1060329 1061094 1061126 13993 765 32 14790 0 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 1061105 1065608 1065617 14414 4503 9 18926 0 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 1065622 1066307 1066315 18929 685 8 19622 0 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 1061045 1067540 1067563 14356 6495 23 20874 0 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 1066320 1069262 1069271 19625 2942 9 22576 0 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 1067551 1071003 1071011 20854 3452 8 24314 0 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 1071016 1071664 1071671 24316 648 7 24971 0 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 1069275 1071679 1071692 22577 2404 13 24994 0 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 1071687 1073978 1073988 24985 2291 10 27286 0 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 1073992 1075959 1075969 27286 1967 10 29263 0 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 1071699 1076704 1076713 24995 5005 9 
30009 0 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 1075972 1077451 1077459 29264 1479 8 30751 0 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 1076717 1080157 1080165 30007 3440 8 33455 0 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 1077464 1080270 1080286 30752 2806 16 33574 0 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 1080170 1080611 1080619 33457 441 8 33906 0 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 1080624 1080973 1080983 33907 349 10 34266 0 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 1080281 1081405 1081413 33566 1124 8 34698 0 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 1080986 1082989 1082996 34267 2003 7 36277 0 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 1083002 1083370 1083378 36279 368 8 36655 0 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 1081417 1084830 1084837 34696 3413 7 38116 0 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 1084843 1085854 1085879 37761 1011 25 38797 0 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 1085865 1089502 1089511 38780 3637 9 42426 0 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 1089515 1089966 1089974 42428 451 8 42887 0 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 1083383 1091316 1091324 36658 7933 8 44599 0 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 1091329 1092042 1092049 44237 713 7 44957 0 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 1092055 1094242 1094249 44960 2187 7 47154 0 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 1089979 1094418 1094428 42889 4439 10 47338 0 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 1094433 1095082 1095089 47331 649 7 47987 0 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 1095095 1096846 1096853 47991 1751 7 49749 0 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 1094256 1098214 1098221 47156 3958 7 51121 0 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 1096859 1098627 1098637 49752 1768 10 51530 0 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 1094037 1098903 1098910 46940 4866 7 51813 0 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 1099192 1100210 1100246 52071 1018 36 53125 0 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 1097371 1100555 1100562 50260 3184 7 53451 0 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 1097135 1100896 1100904 50026 3761 8 53795 0 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 1098640 1101106 1101127 51523 2466 21 54010 0 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 1099965 1101217 1101224 52842 1252 7 54101 0 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 1098227 1101820 1101828 51112 3593 8 54713 0 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 1097375 1104132 1104139 50262 6757 7 57026 0 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 1100221 1106449 1106458 53096 6228 9 59333 0 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 1098916 1106473 1106481 51797 7557 8 59362 0 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 1207793 1207801 71 644409 8 644488 0 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 1216404 1216425 98 652991 21 653110 0 Veronika Nefedova wrote: > OK. There is something weird happening. 
I've got several such entries > in my swift log: > > 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: Task > failed > task:execute @ vdl-int.k, line: 332 > vdl:execute2 @ execute-default.k, line: 22 > vdl:execute @ MolDyn-244-loops.kml, line: 20 > antchmbr @ MolDyn-244-loops.kml, line: 2845 > vdl:mains @ MolDyn-244-loops.kml, line: 2267 > > > Looks like antechamber has failed (?). And the failure is only on a > swfit side, it never made it across to Falcon (there are no remote > directories created). But I see some of antechamber jobs have finished > (in shared). > > Yuqing -- could the changes you've made be responsible for these > failures (I do not see how it could though) ? > > Ioan, what do you see in your logs ion these tasks: > > 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, > identity=urn:0-1-56-0-1186429255786) setting status to Failed > 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, > identity=urn:0-1-57-0-1186429255798) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > identity=urn:0-1-59-0-1186429255800) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > identity=urn:0-1-60-0-1186429255805) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > identity=urn:0-1-61-0-1186429255811) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > identity=urn:0-1-58-0-1186429255814) setting status to Failed > > Nika > > On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: > >> OK! >> Why don't we do one last run from my allocation, as everything is set >> up already and ready to go! Make sure to enable all debug logging. >> Falkon is up and running with all debug enabled! >> >> Falkon location is unchanged from the last experiment. >> Falkon Factory Service: >> http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService >> >> Web Server (graphs): >> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >> >> ANL/UC is not quite so idle as it was earlier, but I bet we could >> still get 150~200 processors! >> >> Ioan >> >> Veronika Nefedova wrote: >>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing who >>> fixed the m179 just in time!). We could start again the 244- >>> molecule run to verify that nothing is wrong with the whole system. >>> >>> Nika >>> >>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>> >>>> >>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>> >>>> >>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>> finishing completely since I didn't change anything. I hope for >>>> m050 to finish though... >>>> You can watch the swift log on viper in >>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >>>> >>>> Nika >>>> >>>>> Then, let's try another run with 244 molecules soon, as most of >>>>> ANL/UC is free! 
>>>>> >>>>> Ioan >>>>> >>> >>> >> > > From nefedova at mcs.anl.gov Mon Aug 6 15:44:16 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 15:44:16 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B7844F.6020801@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> Message-ID: <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> Swift thinks that it sent 248 jobs. nefedova at viper:~/alamines> grep "Running job " MolDyn-244-loops- dbui34oxjr4j2.log | wc 248 6931 56718 nefedova at viper:~/alamines> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: > Everything is idle, there is no work to be done... > > iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail > GenericPortalWS_perf_per_sec.txt > 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > > 24 workers are registered but idle.... queue length 0, 57 jobs > completed. > > Also, see below all 57 jobs, they all finished with an exit code of > 0, in other words succesfully! How many jobs does Swift think it > sent? 
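For cross-checking the two sides, a small sketch that assumes GenericPortalWS_taskPerf.txt keeps the column layout shown in this thread (a // comment header and the exit code in the last field), and reuses the Swift-side grep from above:

    awk '!/^\/\//{n[$NF]++} END{for (c in n) printf "exit code %s: %d task(s)\n", c, n[c]}' GenericPortalWS_taskPerf.txt
    grep -c "Running job " MolDyn-244-loops-dbui34oxjr4j2.log   # jobs Swift believes it submitted

A 248-versus-57 gap between the two totals would mean most of the submitted jobs never reached Falkon at all.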
> > Ioan > > iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat > GenericPortalWS_taskPerf.txt > //taskNum taskID workerID startTimeStamp execTimeStamp > resultsQueueTimeStamp endTimeStamp waitQueueTime ex > ecTime resultsQueueTime totalTime exitCode > 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 > 560629 49780 338 15 50133 0 > 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 561899 > 561909 216 699 10 925 0 > 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 562150 > 562159 382 777 9 1168 0 > 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 1044916 > 1044926 62404 10200 10 72614 0 > 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 > 1047038 1047067 135 585 29 749 0 > 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 > 1053072 1053080 114 6643 8 6765 0 > 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 > 1054256 1054290 731 7205 34 7970 0 > 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 > 1054570 1054579 7943 303 9 8255 0 > 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 > 1056811 1056819 6765 3724 8 10497 0 > 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 > 1058691 1058719 8257 4108 28 12393 0 > 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 > 1059363 1059385 12373 659 22 13054 0 > 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 > 1060315 1060323 10497 3489 8 13994 0 > 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 > 1060589 1060596 13042 1214 7 14263 0 > 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 1060603 > 1060954 1061054 14265 351 100 14716 0 > 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 1060329 > 1061094 1061126 13993 765 32 14790 0 > 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 1061105 > 1065608 1065617 14414 4503 9 18926 0 > 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 1065622 > 1066307 1066315 18929 685 8 19622 0 > 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 1061045 > 1067540 1067563 14356 6495 23 20874 0 > 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 1066320 > 1069262 1069271 19625 2942 9 22576 0 > 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 1067551 > 1071003 1071011 20854 3452 8 24314 0 > 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 1071016 > 1071664 1071671 24316 648 7 24971 0 > 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 1069275 > 1071679 1071692 22577 2404 13 24994 0 > 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 1071687 > 1073978 1073988 24985 2291 10 27286 0 > 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 1073992 > 1075959 1075969 27286 1967 10 29263 0 > 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 1071699 > 1076704 1076713 24995 5005 9 30009 0 > 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 1075972 > 1077451 1077459 29264 1479 8 30751 0 > 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 1076717 > 1080157 1080165 30007 3440 8 33455 0 > 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 1077464 > 1080270 1080286 30752 2806 16 33574 0 > 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 1080170 > 1080611 1080619 33457 441 8 33906 0 > 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 1080624 > 1080973 1080983 33907 349 10 34266 0 > 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 1080281 > 1081405 1081413 33566 1124 8 34698 0 > 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 
1080986 > 1082989 1082996 34267 2003 7 36277 0 > 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 1083002 > 1083370 1083378 36279 368 8 36655 0 > 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 1081417 > 1084830 1084837 34696 3413 7 38116 0 > 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 1084843 > 1085854 1085879 37761 1011 25 38797 0 > 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 1085865 > 1089502 1089511 38780 3637 9 42426 0 > 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 1089515 > 1089966 1089974 42428 451 8 42887 0 > 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 1083383 > 1091316 1091324 36658 7933 8 44599 0 > 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 1091329 > 1092042 1092049 44237 713 7 44957 0 > 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 1092055 > 1094242 1094249 44960 2187 7 47154 0 > 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 1089979 > 1094418 1094428 42889 4439 10 47338 0 > 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 1094433 > 1095082 1095089 47331 649 7 47987 0 > 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 1095095 > 1096846 1096853 47991 1751 7 49749 0 > 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 1094256 > 1098214 1098221 47156 3958 7 51121 0 > 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 1096859 > 1098627 1098637 49752 1768 10 51530 0 > 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 1094037 > 1098903 1098910 46940 4866 7 51813 0 > 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 1099192 > 1100210 1100246 52071 1018 36 53125 0 > 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 1097371 > 1100555 1100562 50260 3184 7 53451 0 > 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 1097135 > 1100896 1100904 50026 3761 8 53795 0 > 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 1098640 > 1101106 1101127 51523 2466 21 54010 0 > 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 1099965 > 1101217 1101224 52842 1252 7 54101 0 > 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 1098227 > 1101820 1101828 51112 3593 8 54713 0 > 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 1097375 > 1104132 1104139 50262 6757 7 57026 0 > 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 1100221 > 1106449 1106458 53096 6228 9 59333 0 > 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 1098916 > 1106473 1106481 51797 7557 8 59362 0 > 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 > 1207793 1207801 71 644409 8 644488 0 > 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 > 1216404 1216425 98 652991 21 653110 0 > > > > Veronika Nefedova wrote: >> OK. There is something weird happening. I've got several such >> entries in my swift log: >> >> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: >> Task failed >> task:execute @ vdl-int.k, line: 332 >> vdl:execute2 @ execute-default.k, line: 22 >> vdl:execute @ MolDyn-244-loops.kml, line: 20 >> antchmbr @ MolDyn-244-loops.kml, line: 2845 >> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >> >> >> Looks like antechamber has failed (?). And the failure is only on >> a swfit side, it never made it across to Falcon (there are no >> remote directories created). But I see some of antechamber jobs >> have finished (in shared). >> >> Yuqing -- could the changes you've made be responsible for these >> failures (I do not see how it could though) ? 
>> >> Ioan, what do you see in your logs ion these tasks: >> >> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-56-0-1186429255786) setting status to Failed >> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-57-0-1186429255798) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-59-0-1186429255800) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-60-0-1186429255805) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-61-0-1186429255811) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-58-0-1186429255814) setting status to Failed >> >> Nika >> >> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >> >>> OK! >>> Why don't we do one last run from my allocation, as everything is >>> set up already and ready to go! Make sure to enable all debug >>> logging. Falkon is up and running with all debug enabled! >>> >>> Falkon location is unchanged from the last experiment. >>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/ >>> GenericPortal/core/WS/GPFactoryService >>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/ >>> index.htm >>> >>> ANL/UC is not quite so idle as it was earlier, but I bet we could >>> still get 150~200 processors! >>> >>> Ioan >>> >>> Veronika Nefedova wrote: >>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing >>>> who fixed the m179 just in time!). We could start again the 244- >>>> molecule run to verify that nothing is wrong with the whole system. >>>> >>>> Nika >>>> >>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>> >>>>> >>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>> >>>>> >>>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>>> finishing completely since I didn't change anything. I hope for >>>>> m050 to finish though... >>>>> You can watch the swift log on viper in ~nefedova/alamines/ >>>>> MolDyn-2-loops-be9484k93kk21.log >>>>> >>>>> Nika >>>>> >>>>>> Then, let's try another run with 244 molecules soon, as most >>>>>> of ANL/UC is free! >>>>>> >>>>>> Ioan >>>>>> >>>> >>>> >>> >> >> > From hategan at mcs.anl.gov Mon Aug 6 15:52:31 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 06 Aug 2007 15:52:31 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> Message-ID: <1186433552.13450.0.camel@blabla.mcs.anl.gov> On Mon, 2007-08-06 at 15:17 -0500, Veronika Nefedova wrote: > OK. There is something weird happening. I've got several such entries > in my swift log: > > 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: > Task failed > task:execute @ vdl-int.k, line: 332 > vdl:execute2 @ execute-default.k, line: 22 > vdl:execute @ MolDyn-244-loops.kml, line: 20 > antchmbr @ MolDyn-244-loops.kml, line: 2845 > vdl:mains @ MolDyn-244-loops.kml, line: 2267 That doesn't say much. Any more details in the logs? > > > Looks like antechamber has failed (?). 
And the failure is only on a > swfit side, it never made it across to Falcon (there are no remote > directories created). But I see some of antechamber jobs have > finished (in shared). > > Yuqing -- could the changes you've made be responsible for these > failures (I do not see how it could though) ? > > Ioan, what do you see in your logs ion these tasks: > > 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-56-0-1186429255786) setting status to Failed > 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-57-0-1186429255798) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-59-0-1186429255800) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-60-0-1186429255805) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-61-0-1186429255811) setting status to Failed > 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-58-0-1186429255814) setting status to Failed > > Nika > > On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: > > > OK! > > Why don't we do one last run from my allocation, as everything is > > set up already and ready to go! Make sure to enable all debug > > logging. Falkon is up and running with all debug enabled! > > > > Falkon location is unchanged from the last experiment. > > Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/ > > GenericPortal/core/WS/GPFactoryService > > Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/ > > index.htm > > > > ANL/UC is not quite so idle as it was earlier, but I bet we could > > still get 150~200 processors! > > > > Ioan > > > > Veronika Nefedova wrote: > >> m050 and m179 finished just fine now via GRAM (thanks to Yuqing > >> who fixed the m179 just in time!). We could start again the 244- > >> molecule run to verify that nothing is wrong with the whole system. > >> > >> Nika > >> > >> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: > >> > >>> > >>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: > >>> > >>> > >>> I started those 2 molecules via GRAM. I have no trust in m179 > >>> finishing completely since I didn't change anything. I hope for > >>> m050 to finish though... > >>> You can watch the swift log on viper in ~nefedova/alamines/ > >>> MolDyn-2-loops-be9484k93kk21.log > >>> > >>> Nika > >>> > >>>> Then, let's try another run with 244 molecules soon, as most of > >>>> ANL/UC is free! > >>>> > >>>> Ioan > >>>> > >> > >> > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Mon Aug 6 15:57:24 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 15:57:24 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186433552.13450.0.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <1186433552.13450.0.camel@blabla.mcs.anl.gov> Message-ID: Nope, nothing more really... 
Several of these: 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-67-0-1186429255847) setting status to Failed 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-69-0-1186429255851) setting status to Failed 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-68-0-1186429255859) setting status to Failed 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: 0-1-70-0-1186429255863) setting status to Failed Nothing more specific... The log is huge. If you tell me what string to grep for - I might be able to find something relevant... NIka On Aug 6, 2007, at 3:52 PM, Mihael Hategan wrote: > On Mon, 2007-08-06 at 15:17 -0500, Veronika Nefedova wrote: >> OK. There is something weird happening. I've got several such entries >> in my swift log: >> >> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: >> Task failed >> task:execute @ vdl-int.k, line: 332 >> vdl:execute2 @ execute-default.k, line: 22 >> vdl:execute @ MolDyn-244-loops.kml, line: 20 >> antchmbr @ MolDyn-244-loops.kml, line: 2845 >> vdl:mains @ MolDyn-244-loops.kml, line: 2267 > > That doesn't say much. Any more details in the logs? > >> >> >> Looks like antechamber has failed (?). And the failure is only on a >> swfit side, it never made it across to Falcon (there are no remote >> directories created). But I see some of antechamber jobs have >> finished (in shared). >> >> Yuqing -- could the changes you've made be responsible for these >> failures (I do not see how it could though) ? >> >> Ioan, what do you see in your logs ion these tasks: >> >> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-56-0-1186429255786) setting status to Failed >> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-57-0-1186429255798) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-59-0-1186429255800) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-60-0-1186429255805) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-61-0-1186429255811) setting status to Failed >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-58-0-1186429255814) setting status to Failed >> >> Nika >> >> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >> >>> OK! >>> Why don't we do one last run from my allocation, as everything is >>> set up already and ready to go! Make sure to enable all debug >>> logging. Falkon is up and running with all debug enabled! >>> >>> Falkon location is unchanged from the last experiment. >>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/ >>> GenericPortal/core/WS/GPFactoryService >>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/ >>> index.htm >>> >>> ANL/UC is not quite so idle as it was earlier, but I bet we could >>> still get 150~200 processors! >>> >>> Ioan >>> >>> Veronika Nefedova wrote: >>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing >>>> who fixed the m179 just in time!). We could start again the 244- >>>> molecule run to verify that nothing is wrong with the whole system. >>>> >>>> Nika >>>> >>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>> >>>>> >>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>> >>>>> >>>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>>> finishing completely since I didn't change anything. 
I hope for >>>>> m050 to finish though... >>>>> You can watch the swift log on viper in ~nefedova/alamines/ >>>>> MolDyn-2-loops-be9484k93kk21.log >>>>> >>>>> Nika >>>>> >>>>>> Then, let's try another run with 244 molecules soon, as most of >>>>>> ANL/UC is free! >>>>>> >>>>>> Ioan >>>>>> >>>> >>>> >>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > From hategan at mcs.anl.gov Mon Aug 6 16:04:46 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 06 Aug 2007 16:04:46 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <1186433552.13450.0.camel@blabla.mcs.anl.gov> Message-ID: <1186434286.15849.0.camel@blabla.mcs.anl.gov> Try "[E|e]xception". On Mon, 2007-08-06 at 15:57 -0500, Veronika Nefedova wrote: > Nope, nothing more really... > Several of these: > > 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-67-0-1186429255847) setting status to Failed > 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-69-0-1186429255851) setting status to Failed > 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-68-0-1186429255859) setting status to Failed > 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-70-0-1186429255863) setting status to Failed > > Nothing more specific... > > The log is huge. If you tell me what string to grep for - I might be > able to find something relevant... > > NIka > > On Aug 6, 2007, at 3:52 PM, Mihael Hategan wrote: > > > On Mon, 2007-08-06 at 15:17 -0500, Veronika Nefedova wrote: > >> OK. There is something weird happening. I've got several such entries > >> in my swift log: > >> > >> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: > >> Task failed > >> task:execute @ vdl-int.k, line: 332 > >> vdl:execute2 @ execute-default.k, line: 22 > >> vdl:execute @ MolDyn-244-loops.kml, line: 20 > >> antchmbr @ MolDyn-244-loops.kml, line: 2845 > >> vdl:mains @ MolDyn-244-loops.kml, line: 2267 > > > > That doesn't say much. Any more details in the logs? > > > >> > >> > >> Looks like antechamber has failed (?). And the failure is only on a > >> swfit side, it never made it across to Falcon (there are no remote > >> directories created). But I see some of antechamber jobs have > >> finished (in shared). > >> > >> Yuqing -- could the changes you've made be responsible for these > >> failures (I do not see how it could though) ? 
> >> > >> Ioan, what do you see in your logs ion these tasks: > >> > >> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn: > >> 0-1-56-0-1186429255786) setting status to Failed > >> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn: > >> 0-1-57-0-1186429255798) setting status to Failed > >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > >> 0-1-59-0-1186429255800) setting status to Failed > >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > >> 0-1-60-0-1186429255805) setting status to Failed > >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > >> 0-1-61-0-1186429255811) setting status to Failed > >> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: > >> 0-1-58-0-1186429255814) setting status to Failed > >> > >> Nika > >> > >> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: > >> > >>> OK! > >>> Why don't we do one last run from my allocation, as everything is > >>> set up already and ready to go! Make sure to enable all debug > >>> logging. Falkon is up and running with all debug enabled! > >>> > >>> Falkon location is unchanged from the last experiment. > >>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/ > >>> GenericPortal/core/WS/GPFactoryService > >>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/ > >>> index.htm > >>> > >>> ANL/UC is not quite so idle as it was earlier, but I bet we could > >>> still get 150~200 processors! > >>> > >>> Ioan > >>> > >>> Veronika Nefedova wrote: > >>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing > >>>> who fixed the m179 just in time!). We could start again the 244- > >>>> molecule run to verify that nothing is wrong with the whole system. > >>>> > >>>> Nika > >>>> > >>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: > >>>> > >>>>> > >>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: > >>>>> > >>>>> > >>>>> I started those 2 molecules via GRAM. I have no trust in m179 > >>>>> finishing completely since I didn't change anything. I hope for > >>>>> m050 to finish though... > >>>>> You can watch the swift log on viper in ~nefedova/alamines/ > >>>>> MolDyn-2-loops-be9484k93kk21.log > >>>>> > >>>>> Nika > >>>>> > >>>>>> Then, let's try another run with 244 molecules soon, as most of > >>>>>> ANL/UC is free! 
> >>>>>> > >>>>>> Ioan > >>>>>> > >>>> > >>>> > >>> > >> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > From iraicu at cs.uchicago.edu Mon Aug 6 16:13:15 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 16:13:15 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> Message-ID: <46B78EEB.80800@cs.uchicago.edu> Falkon only has 57 tasks received, here they are: tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh pre_ch-vsk58efi stdout.txt stderr.txt . ./m179.mol2 ./m050.mol2 m179_am1 m050_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf m179_am1.crd m179_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf m050_am1.crd m050_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst solv_m050.trj solv_m050_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m050 title:solv stitle:m050 rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm gaff:m050_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 skipstep:100 startstep:10000 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst solv_m179.trj solv_m179_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m179 title:solv stitle:m179 rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm gaff:m179_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 skipstep:100 startstep:10000 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh pre_ch-38lc8efi stdout.txt stderr.txt . 
./m197.mol2 ./m129.mol2 ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./m070.mol2 ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./m001.mol2 ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./m160.mol2 ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./m175.mol2 ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./m106.mol2 ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./m125.mol2 ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./m067.mol2 ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./m108.mol2 ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./m186.mol2 ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./m037.mol2 ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./m196.mol2 ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./m060.mol2 ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./m179.mol2 ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./m191.mol2 ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./m063.mol2 ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./m121.mol2 ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./m039.mol2 ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./m226.mol2 ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./m180.mol2 ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./m054.mol2 ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./m229.mol2 ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./m043.mol2 ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./m210.mol2 ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./m091.mol2 ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./m093.mol2 ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./m016.mol2 ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./m212.mol2 ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./m139.mol2 ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./m153.mol2 ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./m081.mol2 ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./m028.mol2 ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./m135.mol2 ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./m156.mol2 ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./m009.mol2 ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./m027.mol2 ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./m030.mol2 ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./m031.mol2 ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./m133.mol2 ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./m005.mol2 ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./m149.mol2 ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./m089.mol2 ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./m146.mol2 ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./m086.mol2 ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./m116.mol2 ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./m231.mol2 ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./m233.mol2 ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./m055.mol2 ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./m242.mol2 ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 m163_am1 m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 m034_am1 m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 m130_am1 m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 m107_am1 m047_am1 m106_am1 m124_am1 m193_am1 m225_am1 m066_am1 m125_am1 m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 m049_am1 m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 m036_am1 m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 m068_am1 m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 m118_am1 m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 m179_am1 m123_am1 m178_am1 
m237_am1 m100_am1 m191_am1 m110_am1 m041_am1 m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 m051_am1 m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 m167_am1 m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 m052_am1 m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 m054_am1 m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 m042_am1 m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 m083_am1 m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 m150_am1 m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 m142_am1 m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 m212_am1 m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 m211_am1 m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 m152_am1 m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 m097_am1 m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 m098_am1 m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 m009_am1 m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 m158_am1 m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 m003_am1 m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 m006_am1 m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 m021_am1 m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 m088_am1 m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 m018_am1 m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 m057_am1 m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 m172_am1 m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 m182_am1 m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 m242_am1 m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf m197_am1.crd m197_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf m129_am1.crd m129_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf m069_am1.crd m069_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf m163_am1.crd m163_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-78lc8efi stdout.txt stderr.txt m128_am1 m128_am1.rtf m128_am1.crd m128_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf m035_am1.crd m035_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf m070_am1.crd m070_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh 
ARGUEMENTS shared/wrapper.sh antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf m221_am1.crd m221_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf m162_am1.crd m162_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf m198_am1.crd m198_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf m034_am1.crd m034_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf m001_am1.crd m001_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf m033_am1.crd m033_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf m220_am1.crd m220_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf m161_am1.crd m161_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf m032_am1.crd m032_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf m160_am1.crd m160_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf m130_am1.crd m130_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf m071_am1.crd m071_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf m199_am1.crd m199_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-n8lc8efi stdout.txt 
stderr.txt m002_am1 m002_am1.rtf m002_am1.crd m002_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf m175_am1.crd m175_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf m234_am1.crd m234_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf m107_am1.crd m107_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf m048_am1.crd m048_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf m124_am1.crd m124_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf m047_am1.crd m047_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf m106_am1.crd m106_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf m193_am1.crd m193_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf m225_am1.crd m225_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf m066_am1.crd m066_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf m125_am1.crd m125_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf m194_am1.crd m194_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf m176_am1.crd 
m176_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf m224_am1.crd m224_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf m235_am1.crd m235_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf m165_am1.crd m165_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf m067_am1.crd m067_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf m049_am1.crd m049_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf m126_am1.crd m126_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf m166_am1.crd m166_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf m108_am1.crd m108_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf m195_am1.crd m195_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf m038_am1.crd m038_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf m059_am1.crd m059_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf m186_am1.crd m186_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf m164_am1.crd m164_am1.prm 
/disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf m036_am1.crd m036_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf m223_am1.crd m223_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf m058_am1.crd m058_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo charmm -c bcc 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf m037_am1.crd m037_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo charmm -c bcc Veronika Nefedova wrote: > Swift thinks that it sent 248 jobs. > > nefedova at viper:~/alamines> grep "Running job " > MolDyn-244-loops-dbui34oxjr4j2.log | wc > 248 6931 56718 > nefedova at viper:~/alamines> > > On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: > >> Everything is idle, there is no work to be done... >> >> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >> GenericPortalWS_perf_per_sec.txt >> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >> >> 24 workers are registered but idle.... queue length 0, 57 jobs >> completed. >> >> Also, see below all 57 jobs, they all finished with an exit code of >> 0, in other words succesfully! How many jobs does Swift think it sent? 
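A quick way to reconcile the two numbers being compared here -- the "Running job " entries Swift logged (248) versus the tasks Falkon reports having received (57) -- is to count both logs directly. This is only a sketch; the paths are assumed from the prompt strings quoted in this thread and will differ for another run:

# Sketch only -- log locations assumed from the paths quoted above; adjust as needed.
SWIFT_LOG=/home/nefedova/alamines/MolDyn-244-loops-dbui34oxjr4j2.log
FALKON_SUMMARY=/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary

# Jobs Swift believes it submitted (248 in the run above)
grep -c "Running job " "$SWIFT_LOG"

# Tasks Falkon actually received (one EXECUTABLE entry per task; 57 above)
grep -c "EXECUTABLE" "$FALKON_SUMMARY"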
>> >> Ioan >> >> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >> GenericPortalWS_taskPerf.txt >> //taskNum taskID workerID startTimeStamp execTimeStamp >> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >> ecTime resultsQueueTime totalTime exitCode >> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >> 560629 49780 338 15 50133 0 >> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 561899 >> 561909 216 699 10 925 0 >> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 562150 >> 562159 382 777 9 1168 0 >> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 1044916 >> 1044926 62404 10200 10 72614 0 >> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >> 1047038 1047067 135 585 29 749 0 >> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >> 1053072 1053080 114 6643 8 6765 0 >> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >> 1054256 1054290 731 7205 34 7970 0 >> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >> 1054570 1054579 7943 303 9 8255 0 >> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >> 1056811 1056819 6765 3724 8 10497 0 >> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >> 1058691 1058719 8257 4108 28 12393 0 >> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >> 1059363 1059385 12373 659 22 13054 0 >> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >> 1060315 1060323 10497 3489 8 13994 0 >> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >> 1060589 1060596 13042 1214 7 14263 0 >> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 1060603 >> 1060954 1061054 14265 351 100 14716 0 >> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 1060329 >> 1061094 1061126 13993 765 32 14790 0 >> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 1061105 >> 1065608 1065617 14414 4503 9 18926 0 >> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 1065622 >> 1066307 1066315 18929 685 8 19622 0 >> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 1061045 >> 1067540 1067563 14356 6495 23 20874 0 >> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 1066320 >> 1069262 1069271 19625 2942 9 22576 0 >> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 1067551 >> 1071003 1071011 20854 3452 8 24314 0 >> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 1071016 >> 1071664 1071671 24316 648 7 24971 0 >> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 1069275 >> 1071679 1071692 22577 2404 13 24994 0 >> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 1071687 >> 1073978 1073988 24985 2291 10 27286 0 >> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 1073992 >> 1075959 1075969 27286 1967 10 29263 0 >> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 1071699 >> 1076704 1076713 24995 5005 9 30009 0 >> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 1075972 >> 1077451 1077459 29264 1479 8 30751 0 >> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 1076717 >> 1080157 1080165 30007 3440 8 33455 0 >> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 1077464 >> 1080270 1080286 30752 2806 16 33574 0 >> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 1080170 >> 1080611 1080619 33457 441 8 33906 0 >> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 1080624 >> 1080973 1080983 33907 349 10 34266 0 >> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 1080281 >> 1081405 1081413 33566 1124 8 
34698 0 >> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 1080986 >> 1082989 1082996 34267 2003 7 36277 0 >> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 1083002 >> 1083370 1083378 36279 368 8 36655 0 >> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 1081417 >> 1084830 1084837 34696 3413 7 38116 0 >> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 1084843 >> 1085854 1085879 37761 1011 25 38797 0 >> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 1085865 >> 1089502 1089511 38780 3637 9 42426 0 >> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 1089515 >> 1089966 1089974 42428 451 8 42887 0 >> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 1083383 >> 1091316 1091324 36658 7933 8 44599 0 >> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 1091329 >> 1092042 1092049 44237 713 7 44957 0 >> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 1092055 >> 1094242 1094249 44960 2187 7 47154 0 >> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 1089979 >> 1094418 1094428 42889 4439 10 47338 0 >> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 1094433 >> 1095082 1095089 47331 649 7 47987 0 >> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 1095095 >> 1096846 1096853 47991 1751 7 49749 0 >> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 1094256 >> 1098214 1098221 47156 3958 7 51121 0 >> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 1096859 >> 1098627 1098637 49752 1768 10 51530 0 >> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 1094037 >> 1098903 1098910 46940 4866 7 51813 0 >> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 1099192 >> 1100210 1100246 52071 1018 36 53125 0 >> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 1097371 >> 1100555 1100562 50260 3184 7 53451 0 >> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 1097135 >> 1100896 1100904 50026 3761 8 53795 0 >> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 1098640 >> 1101106 1101127 51523 2466 21 54010 0 >> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 1099965 >> 1101217 1101224 52842 1252 7 54101 0 >> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 1098227 >> 1101820 1101828 51112 3593 8 54713 0 >> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 1097375 >> 1104132 1104139 50262 6757 7 57026 0 >> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 1100221 >> 1106449 1106458 53096 6228 9 59333 0 >> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 1098916 >> 1106473 1106481 51797 7557 8 59362 0 >> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 1207793 >> 1207801 71 644409 8 644488 0 >> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 1216404 >> 1216425 98 652991 21 653110 0 >> >> >> >> Veronika Nefedova wrote: >>> OK. There is something weird happening. I've got several such >>> entries in my swift log: >>> >>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: >>> Task failed >>> task:execute @ vdl-int.k, line: 332 >>> vdl:execute2 @ execute-default.k, line: 22 >>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>> >>> >>> Looks like antechamber has failed (?). And the failure is only on a >>> swfit side, it never made it across to Falcon (there are no remote >>> directories created). But I see some of antechamber jobs have >>> finished (in shared). 
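One way to check whether the tasks Swift marked Failed ever reached Falkon is to compare the task identities recorded in the two logs. A rough sketch, assuming the same log locations as above and GNU grep; the file names are the ones quoted in this thread:

# Sketch only -- same assumed log locations as above.
SWIFT_LOG=/home/nefedova/alamines/MolDyn-244-loops-dbui34oxjr4j2.log
FALKON_TASKS=/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS_taskPerf.txt

# Task URNs that Swift set to Failed
grep "setting status to Failed" "$SWIFT_LOG" | grep -o "urn:[0-9-]*" | sort -u > swift_failed.urns

# Task URNs Falkon received (taskID is the second column; skip the // header line)
awk '!/^\/\//{print $2}' "$FALKON_TASKS" | sort -u > falkon_received.urns

# Failed tasks that never made it across to Falkon
comm -23 swift_failed.urns falkon_received.urns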
>>> >>> Yuqing -- could the changes you've made be responsible for these >>> failures (I do not see how it could though) ? >>> >>> Ioan, what do you see in your logs ion these tasks: >>> >>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>> >>> Nika >>> >>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>> >>>> OK! >>>> Why don't we do one last run from my allocation, as everything is >>>> set up already and ready to go! Make sure to enable all debug >>>> logging. Falkon is up and running with all debug enabled! >>>> >>>> Falkon location is unchanged from the last experiment. >>>> Falkon Factory Service: >>>> http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>> >>>> Web Server (graphs): >>>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>> >>>> ANL/UC is not quite so idle as it was earlier, but I bet we could >>>> still get 150~200 processors! >>>> >>>> Ioan >>>> >>>> Veronika Nefedova wrote: >>>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing >>>>> who fixed the m179 just in time!). We could start again the 244- >>>>> molecule run to verify that nothing is wrong with the whole system. >>>>> >>>>> Nika >>>>> >>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>> >>>>>> >>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>> >>>>>> >>>>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>>>> finishing completely since I didn't change anything. I hope for >>>>>> m050 to finish though... >>>>>> You can watch the swift log on viper in >>>>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>> >>>>>> Nika >>>>>> >>>>>>> Then, let's try another run with 244 molecules soon, as most of >>>>>>> ANL/UC is free! >>>>>>> >>>>>>> Ioan >>>>>>> >>>>> >>>>> >>>> >>> >>> >> > > From nefedova at mcs.anl.gov Mon Aug 6 16:15:58 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 16:15:58 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B78EEB.80800@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> Message-ID: Let attribute this all to NFS problems (maybe TG-UC was affected?) and start clean... 
Nika

On Aug 6, 2007, at 4:13 PM, Ioan Raicu wrote:

> Falkon only has 57 tasks received, here they are:
> [...]
>>>>>>>> Then, let's try another run with 244 molecules soon, as most
>>>>>>>> of ANL/UC is free!
>>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> >> >> > From nefedova at mcs.anl.gov Mon Aug 6 16:18:21 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 16:18:21 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186434286.15849.0.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <1186433552.13450.0.camel@blabla.mcs.anl.gov> <1186434286.15849.0.camel@blabla.mcs.anl.gov> Message-ID: I got 40 such entries: 2007-08-06 14:47:03,571 DEBUG TaskImpl Task(type=2, identity=urn: 0-1-66-0-1186429258767) setting status to Failed Exception in getFile and 20 such entries: 2007-08-06 14:46:58,559 DEBUG vdl:execute2 Application exception: Task failed The workflow just exited with no more new errors/entries in the log. The last few lines of the log: 2007-08-06 14:47:03,596 DEBUG TaskImpl Task(type=4, identity=urn: 0-1-55-0-1186429258834) setting status to Active 2007-08-06 14:47:03,596 DEBUG TaskImpl Task(type=4, identity=urn: 0-1-55-0-1186429258834) setting status to Completed 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, identity=urn: 0-1-62-0-1186429258791) setting status to Failed Exception in getFile 2007-08-06 14:47:03,705 DEBUG TaskImpl Task(type=4, identity=urn: 0-1-62-0-1186429258838) setting status to Active 2007-08-06 14:47:03,705 DEBUG TaskImpl Task(type=4, identity=urn: 0-1-62-0-1186429258838) setting status to Completed nefedova at viper:~/alamines> On Aug 6, 2007, at 4:04 PM, Mihael Hategan wrote: > Try "[E|e]xception". > > On Mon, 2007-08-06 at 15:57 -0500, Veronika Nefedova wrote: >> Nope, nothing more really... >> Several of these: >> >> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-67-0-1186429255847) setting status to Failed >> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-69-0-1186429255851) setting status to Failed >> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-68-0-1186429255859) setting status to Failed >> 2007-08-06 14:46:58,562 DEBUG TaskImpl Task(type=1, identity=urn: >> 0-1-70-0-1186429255863) setting status to Failed >> >> Nothing more specific... >> >> The log is huge. If you tell me what string to grep for - I might be >> able to find something relevant... >> >> NIka >> >> On Aug 6, 2007, at 3:52 PM, Mihael Hategan wrote: >> >>> On Mon, 2007-08-06 at 15:17 -0500, Veronika Nefedova wrote: >>>> OK. There is something weird happening. I've got several such >>>> entries >>>> in my swift log: >>>> >>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: >>>> Task failed >>>> task:execute @ vdl-int.k, line: 332 >>>> vdl:execute2 @ execute-default.k, line: 22 >>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>> >>> That doesn't say much. Any more details in the logs? >>> >>>> >>>> >>>> Looks like antechamber has failed (?). And the failure is only on a >>>> swfit side, it never made it across to Falcon (there are no remote >>>> directories created). But I see some of antechamber jobs have >>>> finished (in shared). 
>>>> >>>> Yuqing -- could the changes you've made be responsible for these >>>> failures (I do not see how it could though) ? >>>> >>>> Ioan, what do you see in your logs ion these tasks: >>>> >>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-56-0-1186429255786) setting status to Failed >>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-57-0-1186429255798) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-59-0-1186429255800) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-60-0-1186429255805) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-61-0-1186429255811) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-58-0-1186429255814) setting status to Failed >>>> >>>> Nika >>>> >>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>> >>>>> OK! >>>>> Why don't we do one last run from my allocation, as everything is >>>>> set up already and ready to go! Make sure to enable all debug >>>>> logging. Falkon is up and running with all debug enabled! >>>>> >>>>> Falkon location is unchanged from the last experiment. >>>>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/services/ >>>>> GenericPortal/core/WS/GPFactoryService >>>>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org:51000/ >>>>> index.htm >>>>> >>>>> ANL/UC is not quite so idle as it was earlier, but I bet we could >>>>> still get 150~200 processors! >>>>> >>>>> Ioan >>>>> >>>>> Veronika Nefedova wrote: >>>>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing >>>>>> who fixed the m179 just in time!). We could start again the 244- >>>>>> molecule run to verify that nothing is wrong with the whole >>>>>> system. >>>>>> >>>>>> Nika >>>>>> >>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>> >>>>>>> >>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>> >>>>>>> >>>>>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>>>>> finishing completely since I didn't change anything. I hope for >>>>>>> m050 to finish though... >>>>>>> You can watch the swift log on viper in ~nefedova/alamines/ >>>>>>> MolDyn-2-loops-be9484k93kk21.log >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>>> Then, let's try another run with 244 molecules soon, as most of >>>>>>>> ANL/UC is free! 
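[A rough sketch of the log triage discussed in this thread: grep the Swift log for exception strings, as Mihael suggests, and tally each distinct message. The log path is the one quoted above; the grep/sed patterns are illustrative assumptions about the log format, not part of the original messages.]

    # Swift log from this run (path as quoted in the thread)
    LOG=~nefedova/alamines/MolDyn-244-loops-dbui34oxjr4j2.log

    # Count the two failure kinds reported so far in this thread
    grep -c "Exception in getFile" "$LOG"
    grep -c "Application exception: Task failed" "$LOG"

    # More generally: strip timestamps and task urns, then tally every
    # distinct error message (case-insensitive "exception" match)
    grep -i "exception" "$LOG" \
      | sed -e 's/^[0-9 :,-]*//' -e 's/urn:[0-9-]*//g' \
      | sort | uniq -c | sort -rn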
>>>>>>>>
>>>>>>>> Ioan
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>> _______________________________________________
>>>> Swift-devel mailing list
>>>> Swift-devel at ci.uchicago.edu
>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>>>>
>>>
>>
>
From iraicu at cs.uchicago.edu Mon Aug 6 16:20:56 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Mon, 06 Aug 2007 16:20:56 -0500
Subject: [Swift-devel] Q about MolDyn
In-Reply-To: <46B78EEB.80800@cs.uchicago.edu>
References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu>
Message-ID: <46B790B8.3080004@cs.uchicago.edu>

Just to debug further.... I picked out 1 task at random from the Swift log...

iraicu at viper:/home/nefedova/alamines> cat MolDyn-244-loops-dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791"
2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn:0-1-62-0-1186429258791) setting status to Submitted
2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn:0-1-62-0-1186429258791) setting status to Active
2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, identity=urn:0-1-62-0-1186429258791) setting status to Failed Exception in getFile

but in my log, it is nowhere to be found...

iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791"

What does "setting status to Failed Exception in getFile" mean? Could this mean that it failed on the data staging part, and that it never made it to Falkon?

BTW, it looks as if there were really 539 jobs submitted...

iraicu at viper:/home/nefedova/alamines> grep "Submitted" MolDyn-244-loops-dbui34oxjr4j2.log | wc
539 5390 62835

but again, only 57 made it to Falkon, and there were no exceptions thrown anywhere to indicate that something unusual happened.

Ioan

Ioan Raicu wrote:
> Falkon only has 57 tasks received, here they are:
> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary
>
> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh
> pre_ch-vsk58efi stdout.txt stderr.txt .
./m179.mol2 ./m050.mol2 > m179_am1 m050_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf > m179_am1.crd m179_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m179_am1 > -fi mol2 -rn m179 -o m179_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf > m050_am1.crd m050_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m050_am1 > -fi mol2 -rn m050 -o m050_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp > parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp m050_am1.rtf > m050_am1.prm m050_am1.crd water_400.crd equil_solv.out_m050 > solv_m050.psf solv_m050_eq.crd solv_m050.rst solv_m050.trj > solv_m050_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh > system:solv_m050 title:solv stitle:m050 rtffile:parm03_gaff_all.rtf > paramfile:parm03_gaffnb_all.prm gaff:m050_am1 nwater:400 ligcrd:lyz > rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 skipstep:100 > startstep:10000 > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp > parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp m179_am1.rtf > m179_am1.prm m179_am1.crd water_400.crd equil_solv.out_m179 > solv_m179.psf solv_m179_eq.crd solv_m179.rst solv_m179.trj > solv_m179_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh > system:solv_m179 title:solv stitle:m179 rtffile:parm03_gaff_all.rtf > paramfile:parm03_gaffnb_all.prm gaff:m179_am1 nwater:400 ligcrd:lyz > rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 skipstep:100 > startstep:10000 > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > pre_ch-38lc8efi stdout.txt stderr.txt . 
./m197.mol2 ./m129.mol2 > ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./m070.mol2 > ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./m001.mol2 > ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./m160.mol2 > ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./m175.mol2 > ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./m106.mol2 > ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./m125.mol2 > ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./m067.mol2 > ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./m108.mol2 > ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./m186.mol2 > ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./m037.mol2 > ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./m196.mol2 > ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./m060.mol2 > ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./m179.mol2 > ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./m191.mol2 > ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./m063.mol2 > ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./m121.mol2 > ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./m039.mol2 > ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./m226.mol2 > ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./m180.mol2 > ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./m054.mol2 > ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./m229.mol2 > ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./m043.mol2 > ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./m210.mol2 > ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./m091.mol2 > ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./m093.mol2 > ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./m016.mol2 > ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./m212.mol2 > ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./m139.mol2 > ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./m153.mol2 > ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./m081.mol2 > ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./m028.mol2 > ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./m135.mol2 > ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./m156.mol2 > ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./m009.mol2 > ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./m027.mol2 > ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./m030.mol2 > ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./m031.mol2 > ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./m133.mol2 > ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./m005.mol2 > ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./m149.mol2 > ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./m089.mol2 > ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./m146.mol2 > ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./m086.mol2 > ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./m116.mol2 > ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./m231.mol2 > ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./m233.mol2 > ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./m055.mol2 > ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./m242.mol2 > ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 m163_am1 m128_am1 > m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 m034_am1 m001_am1 > m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 m130_am1 m071_am1 > m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 m107_am1 m047_am1 > m106_am1 m124_am1 m193_am1 m225_am1 m066_am1 m125_am1 m176_am1 > m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 m049_am1 m126_am1 > m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 m036_am1 m186_am1 > m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 m068_am1 m188_am1 > m119_am1 m196_am1 m187_am1 
m222_am1 m127_am1 m118_am1 m189_am1 > m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 m179_am1 m123_am1 > m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 m041_am1 m064_am1 > m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 m051_am1 m121_am1 > m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 m167_am1 m192_am1 > m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 m052_am1 m111_am1 > m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 m054_am1 m044_am1 > m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 m042_am1 m101_am1 > m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 m083_am1 m210_am1 > m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 m150_am1 m209_am1 > m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 m142_am1 m201_am1 > m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 m212_am1 m138_am1 > m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 m211_am1 m025_am1 > m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 m152_am1 m081_am1 > m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 m097_am1 m155_am1 > m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 m098_am1 m007_am1 > m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 m009_am1 m078_am1 > m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 m158_am1 m073_am1 > m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 m003_am1 m031_am1 > m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 m006_am1 m075_am1 > m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 m021_am1 m208_am1 > m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 m088_am1 m206_am1 > m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 m018_am1 m204_am1 > m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 m057_am1 m116_am1 > m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 m172_am1 m104_am1 > m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 m182_am1 m243_am1 > m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 m242_am1 m184_am1 > m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf > m197_am1.crd m197_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m197_am1 > -fi mol2 -rn m197 -o m197_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf > m129_am1.crd m129_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m129_am1 > -fi mol2 -rn m129 -o m129_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf > m069_am1.crd m069_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m069_am1 > -fi mol2 -rn m069 -o m069_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf > m163_am1.crd m163_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m163_am1 > -fi mol2 -rn m163 -o m163_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-78lc8efi stdout.txt stderr.txt m128_am1 m128_am1.rtf > m128_am1.crd m128_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m128_am1 > -fi mol2 -rn m128 -o m128_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf > m035_am1.crd m035_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m035_am1 > -fi mol2 -rn m035 -o m035_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-a8lc8efi 
stdout.txt stderr.txt m070_am1 m070_am1.rtf > m070_am1.crd m070_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m070_am1 > -fi mol2 -rn m070 -o m070_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf > m221_am1.crd m221_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m221_am1 > -fi mol2 -rn m221 -o m221_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf > m162_am1.crd m162_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m162_am1 > -fi mol2 -rn m162 -o m162_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf > m198_am1.crd m198_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m198_am1 > -fi mol2 -rn m198 -o m198_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf > m034_am1.crd m034_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m034_am1 > -fi mol2 -rn m034 -o m034_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf > m001_am1.crd m001_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m001_am1 > -fi mol2 -rn m001 -o m001_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf > m033_am1.crd m033_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m033_am1 > -fi mol2 -rn m033 -o m033_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf > m220_am1.crd m220_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m220_am1 > -fi mol2 -rn m220 -o m220_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf > m161_am1.crd m161_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m161_am1 > -fi mol2 -rn m161 -o m161_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf > m032_am1.crd m032_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m032_am1 > -fi mol2 -rn m032 -o m032_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf > m160_am1.crd m160_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m160_am1 > -fi mol2 -rn m160 -o m160_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf > m130_am1.crd m130_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m130_am1 > -fi mol2 -rn m130 -o m130_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf > m071_am1.crd m071_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m071_am1 > -fi mol2 -rn m071 -o m071_am1 -fo charmm -c bcc > 
128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf > m199_am1.crd m199_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m199_am1 > -fi mol2 -rn m199 -o m199_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf > m002_am1.crd m002_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m002_am1 > -fi mol2 -rn m002 -o m002_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf > m175_am1.crd m175_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m175_am1 > -fi mol2 -rn m175 -o m175_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf > m234_am1.crd m234_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m234_am1 > -fi mol2 -rn m234 -o m234_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf > m107_am1.crd m107_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m107_am1 > -fi mol2 -rn m107 -o m107_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf > m048_am1.crd m048_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m048_am1 > -fi mol2 -rn m048 -o m048_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf > m124_am1.crd m124_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m124_am1 > -fi mol2 -rn m124 -o m124_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf > m047_am1.crd m047_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m047_am1 > -fi mol2 -rn m047 -o m047_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf > m106_am1.crd m106_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m106_am1 > -fi mol2 -rn m106 -o m106_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf > m193_am1.crd m193_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m193_am1 > -fi mol2 -rn m193 -o m193_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf > m225_am1.crd m225_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m225_am1 > -fi mol2 -rn m225 -o m225_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf > m066_am1.crd m066_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m066_am1 > -fi mol2 -rn m066 -o m066_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf > m125_am1.crd m125_am1.prm > 
/disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m125_am1 > -fi mol2 -rn m125 -o m125_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf > m194_am1.crd m194_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m194_am1 > -fi mol2 -rn m194 -o m194_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf > m176_am1.crd m176_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m176_am1 > -fi mol2 -rn m176 -o m176_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf > m224_am1.crd m224_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m224_am1 > -fi mol2 -rn m224 -o m224_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf > m235_am1.crd m235_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m235_am1 > -fi mol2 -rn m235 -o m235_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf > m165_am1.crd m165_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m165_am1 > -fi mol2 -rn m165 -o m165_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf > m067_am1.crd m067_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m067_am1 > -fi mol2 -rn m067 -o m067_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf > m049_am1.crd m049_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m049_am1 > -fi mol2 -rn m049 -o m049_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf > m126_am1.crd m126_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m126_am1 > -fi mol2 -rn m126 -o m126_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf > m166_am1.crd m166_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m166_am1 > -fi mol2 -rn m166 -o m166_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf > m108_am1.crd m108_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m108_am1 > -fi mol2 -rn m108 -o m108_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf > m195_am1.crd m195_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m195_am1 > -fi mol2 -rn m195 -o m195_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf > m038_am1.crd m038_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m038_am1 > -fi mol2 -rn m038 -o m038_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-c9lc8efi 
stdout.txt stderr.txt m059_am1 m059_am1.rtf > m059_am1.crd m059_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m059_am1 > -fi mol2 -rn m059 -o m059_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf > m186_am1.crd m186_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m186_am1 > -fi mol2 -rn m186 -o m186_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf > m164_am1.crd m164_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m164_am1 > -fi mol2 -rn m164 -o m164_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf > m036_am1.crd m036_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m036_am1 > -fi mol2 -rn m036 -o m036_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf > m223_am1.crd m223_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m223_am1 > -fi mol2 -rn m223 -o m223_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf > m058_am1.crd m058_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m058_am1 > -fi mol2 -rn m058 -o m058_am1 -fo charmm -c bcc > 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf > m037_am1.crd m037_am1.prm > /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i m037_am1 > -fi mol2 -rn m037 -o m037_am1 -fo charmm -c bcc > > > > Veronika Nefedova wrote: >> Swift thinks that it sent 248 jobs. >> >> nefedova at viper:~/alamines> grep "Running job " >> MolDyn-244-loops-dbui34oxjr4j2.log | wc >> 248 6931 56718 >> nefedova at viper:~/alamines> >> >> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >> >>> Everything is idle, there is no work to be done... >>> >>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>> GenericPortalWS_perf_per_sec.txt >>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>> >>> 24 workers are registered but idle.... queue length 0, 57 jobs >>> completed. >>> >>> Also, see below all 57 jobs, they all finished with an exit code of >>> 0, in other words succesfully! How many jobs does Swift think it sent? 
>>> >>> Ioan >>> >>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>> GenericPortalWS_taskPerf.txt >>> //taskNum taskID workerID startTimeStamp execTimeStamp >>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>> ecTime resultsQueueTime totalTime exitCode >>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >>> 560629 49780 338 15 50133 0 >>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 561899 >>> 561909 216 699 10 925 0 >>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 562150 >>> 562159 382 777 9 1168 0 >>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 1044916 >>> 1044926 62404 10200 10 72614 0 >>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >>> 1047038 1047067 135 585 29 749 0 >>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >>> 1053072 1053080 114 6643 8 6765 0 >>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >>> 1054256 1054290 731 7205 34 7970 0 >>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >>> 1054570 1054579 7943 303 9 8255 0 >>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >>> 1056811 1056819 6765 3724 8 10497 0 >>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >>> 1058691 1058719 8257 4108 28 12393 0 >>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >>> 1059363 1059385 12373 659 22 13054 0 >>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >>> 1060315 1060323 10497 3489 8 13994 0 >>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >>> 1060589 1060596 13042 1214 7 14263 0 >>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 1060603 >>> 1060954 1061054 14265 351 100 14716 0 >>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 1060329 >>> 1061094 1061126 13993 765 32 14790 0 >>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 1061105 >>> 1065608 1065617 14414 4503 9 18926 0 >>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 1065622 >>> 1066307 1066315 18929 685 8 19622 0 >>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 1061045 >>> 1067540 1067563 14356 6495 23 20874 0 >>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 1066320 >>> 1069262 1069271 19625 2942 9 22576 0 >>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 1067551 >>> 1071003 1071011 20854 3452 8 24314 0 >>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 1071016 >>> 1071664 1071671 24316 648 7 24971 0 >>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 1069275 >>> 1071679 1071692 22577 2404 13 24994 0 >>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 1071687 >>> 1073978 1073988 24985 2291 10 27286 0 >>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 1073992 >>> 1075959 1075969 27286 1967 10 29263 0 >>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 1071699 >>> 1076704 1076713 24995 5005 9 30009 0 >>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 1075972 >>> 1077451 1077459 29264 1479 8 30751 0 >>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 1076717 >>> 1080157 1080165 30007 3440 8 33455 0 >>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 1077464 >>> 1080270 1080286 30752 2806 16 33574 0 >>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 1080170 >>> 1080611 1080619 33457 441 8 33906 0 >>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 1080624 >>> 1080973 1080983 33907 349 10 34266 0 >>> 31 urn:0-1-28-0-1186429255611 
192.5.198.71:50101 1046715 1080281 >>> 1081405 1081413 33566 1124 8 34698 0 >>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 1080986 >>> 1082989 1082996 34267 2003 7 36277 0 >>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 1083002 >>> 1083370 1083378 36279 368 8 36655 0 >>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 1081417 >>> 1084830 1084837 34696 3413 7 38116 0 >>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 1084843 >>> 1085854 1085879 37761 1011 25 38797 0 >>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 1085865 >>> 1089502 1089511 38780 3637 9 42426 0 >>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 1089515 >>> 1089966 1089974 42428 451 8 42887 0 >>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 1083383 >>> 1091316 1091324 36658 7933 8 44599 0 >>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 1091329 >>> 1092042 1092049 44237 713 7 44957 0 >>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 1092055 >>> 1094242 1094249 44960 2187 7 47154 0 >>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 1089979 >>> 1094418 1094428 42889 4439 10 47338 0 >>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 1094433 >>> 1095082 1095089 47331 649 7 47987 0 >>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 1095095 >>> 1096846 1096853 47991 1751 7 49749 0 >>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 1094256 >>> 1098214 1098221 47156 3958 7 51121 0 >>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 1096859 >>> 1098627 1098637 49752 1768 10 51530 0 >>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 1094037 >>> 1098903 1098910 46940 4866 7 51813 0 >>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 1099192 >>> 1100210 1100246 52071 1018 36 53125 0 >>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 1097371 >>> 1100555 1100562 50260 3184 7 53451 0 >>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 1097135 >>> 1100896 1100904 50026 3761 8 53795 0 >>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 1098640 >>> 1101106 1101127 51523 2466 21 54010 0 >>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 1099965 >>> 1101217 1101224 52842 1252 7 54101 0 >>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 1098227 >>> 1101820 1101828 51112 3593 8 54713 0 >>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 1097375 >>> 1104132 1104139 50262 6757 7 57026 0 >>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 1100221 >>> 1106449 1106458 53096 6228 9 59333 0 >>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 1098916 >>> 1106473 1106481 51797 7557 8 59362 0 >>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>> 1207793 1207801 71 644409 8 644488 0 >>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>> 1216404 1216425 98 652991 21 653110 0 >>> >>> >>> >>> Veronika Nefedova wrote: >>>> OK. There is something weird happening. I've got several such >>>> entries in my swift log: >>>> >>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application exception: >>>> Task failed >>>> task:execute @ vdl-int.k, line: 332 >>>> vdl:execute2 @ execute-default.k, line: 22 >>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>> >>>> >>>> Looks like antechamber has failed (?). 
And the failure is only on a >>>> swfit side, it never made it across to Falcon (there are no remote >>>> directories created). But I see some of antechamber jobs have >>>> finished (in shared). >>>> >>>> Yuqing -- could the changes you've made be responsible for these >>>> failures (I do not see how it could though) ? >>>> >>>> Ioan, what do you see in your logs ion these tasks: >>>> >>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>> >>>> Nika >>>> >>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>> >>>>> OK! >>>>> Why don't we do one last run from my allocation, as everything is >>>>> set up already and ready to go! Make sure to enable all debug >>>>> logging. Falkon is up and running with all debug enabled! >>>>> >>>>> Falkon location is unchanged from the last experiment. >>>>> Falkon Factory Service: >>>>> http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>> >>>>> Web Server (graphs): >>>>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>> >>>>> ANL/UC is not quite so idle as it was earlier, but I bet we could >>>>> still get 150~200 processors! >>>>> >>>>> Ioan >>>>> >>>>> Veronika Nefedova wrote: >>>>>> m050 and m179 finished just fine now via GRAM (thanks to Yuqing >>>>>> who fixed the m179 just in time!). We could start again the 244- >>>>>> molecule run to verify that nothing is wrong with the whole system. >>>>>> >>>>>> Nika >>>>>> >>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>> >>>>>>> >>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>> >>>>>>> >>>>>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>>>>> finishing completely since I didn't change anything. I hope for >>>>>>> m050 to finish though... >>>>>>> You can watch the swift log on viper in >>>>>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>>> Then, let's try another run with 244 molecules soon, as most of >>>>>>>> ANL/UC is free! 
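[The single-task spot check above -- grep one failed urn out of the Swift log, then look for it in Falkon's GenericPortalWS_taskPerf.txt -- can be looped over every failed task to see how many ever reached Falkon. A rough sketch, assuming both log files have been copied to one machine; the file names are the ones quoted in this thread.]

    SWIFT_LOG=MolDyn-244-loops-dbui34oxjr4j2.log   # from ~nefedova/alamines on viper
    FALKON_LOG=GenericPortalWS_taskPerf.txt        # from the Falkon service logs on tg-viz-login2

    # Every task id Swift marked Failed (both job and file-transfer tasks),
    # checked against the Falkon-side task log
    grep "setting status to Failed" "$SWIFT_LOG" \
      | grep -o "urn:[0-9-]*" | sort -u \
      | while read -r urn; do
          if grep -q "$urn" "$FALKON_LOG"; then
            echo "$urn reached Falkon"
          else
            echo "$urn never reached Falkon (failed on the Swift side)"
          fi
        done

[Any urn that never shows up on the Falkon side would point at the stage-in (getFile) path rather than the execution service.]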
>>>>>>>>
>>>>>>>> Ioan
>>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>>
>>>
>>
>>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel
>
From nefedova at mcs.anl.gov Mon Aug 6 16:29:09 2007
From: nefedova at mcs.anl.gov (Veronika Nefedova)
Date: Mon, 6 Aug 2007 16:29:09 -0500
Subject: [Swift-devel] Q about MolDyn
In-Reply-To: <46B790B8.3080004@cs.uchicago.edu>
References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu>
Message-ID: <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov>

Ioan, it was all due to NFS problems, I am convinced now...
I restarted the run, the log is ~nefedova/alamines/MolDyn-244-loops-hxl1glhtqsag0.log

Nika

On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote:

> Just to debug further.... I picked out 1 task at random from the Swift log...
> iraicu at viper:/home/nefedova/alamines> cat MolDyn-244-loops-dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791"
> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn:0-1-62-0-1186429258791) setting status to Submitted
> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn:0-1-62-0-1186429258791) setting status to Active
> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, identity=urn:0-1-62-0-1186429258791) setting status to Failed Exception in getFile
>
> but in my log, it is nowhere to be found...
> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791"
>
> What does "setting status to Failed Exception in getFile" mean? Could this mean that it failed on the data staging part, and that it never made it to Falkon?
>
> BTW, it looks as if there were really 539 jobs submitted...
>
> iraicu at viper:/home/nefedova/alamines> grep "Submitted" MolDyn-244-loops-dbui34oxjr4j2.log | wc
> 539 5390 62835
>
> but again, only 57 made it to Falkon, and there were no exceptions thrown anywhere to indicate that something unusual happened.
>
> Ioan
>
> Ioan Raicu wrote:
>> Falkon only has 57 tasks received, here they are:
>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary
>>
>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh
>> pre_ch-vsk58efi stdout.txt stderr.txt .
./m179.mol2 ./m050.mol2 >> m179_am1 m050_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf >> m179_am1.crd m179_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf >> m050_am1.crd m050_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp >> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >> m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd >> equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst >> solv_m050.trj solv_m050_min.crd /disks/scratchgpfs1/iraicu/ModLyn/ >> bin/charmm.sh system:solv_m050 title:solv stitle:m050 >> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >> gaff:m050_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater: >> 15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp >> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >> m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd >> equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst >> solv_m179.trj solv_m179_min.crd /disks/scratchgpfs1/iraicu/ModLyn/ >> bin/charmm.sh system:solv_m179 title:solv stitle:m179 >> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >> gaff:m179_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater: >> 15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> pre_ch-38lc8efi stdout.txt stderr.txt . 
./m197.mol2 ./ >> m129.mol2 ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./ >> m070.mol2 ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./ >> m001.mol2 ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./ >> m160.mol2 ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./ >> m175.mol2 ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./ >> m106.mol2 ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./ >> m125.mol2 ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./ >> m067.mol2 ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./ >> m108.mol2 ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./ >> m186.mol2 ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./ >> m037.mol2 ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./ >> m196.mol2 ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./ >> m060.mol2 ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./ >> m179.mol2 ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./ >> m191.mol2 ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./ >> m063.mol2 ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./ >> m121.mol2 ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./ >> m039.mol2 ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./ >> m226.mol2 ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./ >> m180.mol2 ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./ >> m054.mol2 ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./ >> m229.mol2 ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./ >> m043.mol2 ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./ >> m210.mol2 ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./ >> m091.mol2 ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./ >> m093.mol2 ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./ >> m016.mol2 ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./ >> m212.mol2 ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./ >> m139.mol2 ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./ >> m153.mol2 ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./ >> m081.mol2 ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./ >> m028.mol2 ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./ >> m135.mol2 ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./ >> m156.mol2 ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./ >> m009.mol2 ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./ >> m027.mol2 ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./ >> m030.mol2 ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./ >> m031.mol2 ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./ >> m133.mol2 ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./ >> m005.mol2 ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./ >> m149.mol2 ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./ >> m089.mol2 ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./ >> m146.mol2 ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./ >> m086.mol2 ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./ >> m116.mol2 ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./ >> m231.mol2 ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./ >> m233.mol2 ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./ >> m055.mol2 ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./ >> m242.mol2 ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 >> m163_am1 m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 >> m034_am1 m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 >> m130_am1 m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 >> m107_am1 m047_am1 m106_am1 m124_am1 m193_am1 m225_am1 m066_am1 >> m125_am1 m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 >> m049_am1 m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 >> 
m036_am1 m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 >> m068_am1 m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 >> m118_am1 m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 >> m179_am1 m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 >> m041_am1 m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 >> m051_am1 m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 >> m167_am1 m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 >> m052_am1 m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 >> m054_am1 m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 >> m042_am1 m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 >> m083_am1 m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 >> m150_am1 m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 >> m142_am1 m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 >> m212_am1 m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 >> m211_am1 m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 >> m152_am1 m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 >> m097_am1 m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 >> m098_am1 m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 >> m009_am1 m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 >> m158_am1 m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 >> m003_am1 m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 >> m006_am1 m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 >> m021_am1 m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 >> m088_am1 m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 >> m018_am1 m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 >> m057_am1 m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 >> m172_am1 m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 >> m182_am1 m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 >> m242_am1 m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> pre-antch.pl >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf >> m197_am1.crd m197_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf >> m129_am1.crd m129_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf >> m069_am1.crd m069_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf >> m163_am1.crd m163_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-78lc8efi stdout.txt stderr.txt m128_am1 m128_am1.rtf >> m128_am1.crd m128_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf >> m035_am1.crd m035_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ 
>> antechamber.sh -s 2 -i m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf >> m070_am1.crd m070_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf >> m221_am1.crd m221_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf >> m162_am1.crd m162_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf >> m198_am1.crd m198_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf >> m034_am1.crd m034_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf >> m001_am1.crd m001_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf >> m033_am1.crd m033_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf >> m220_am1.crd m220_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf >> m161_am1.crd m161_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf >> m032_am1.crd m032_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf >> m160_am1.crd m160_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf >> m130_am1.crd m130_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh 
ARGUEMENTS shared/wrapper.sh >> antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf >> m071_am1.crd m071_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf >> m199_am1.crd m199_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf >> m002_am1.crd m002_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf >> m175_am1.crd m175_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf >> m234_am1.crd m234_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf >> m107_am1.crd m107_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf >> m048_am1.crd m048_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf >> m124_am1.crd m124_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf >> m047_am1.crd m047_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf >> m106_am1.crd m106_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf >> m193_am1.crd m193_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf >> m225_am1.crd m225_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf >> m066_am1.crd m066_am1.prm 
/disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf >> m125_am1.crd m125_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf >> m194_am1.crd m194_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf >> m176_am1.crd m176_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf >> m224_am1.crd m224_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf >> m235_am1.crd m235_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf >> m165_am1.crd m165_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf >> m067_am1.crd m067_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf >> m049_am1.crd m049_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf >> m126_am1.crd m126_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf >> m166_am1.crd m166_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf >> m108_am1.crd m108_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf >> m195_am1.crd m195_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo >> charmm -c bcc >> 
128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf >> m038_am1.crd m038_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf >> m059_am1.crd m059_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf >> m186_am1.crd m186_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf >> m164_am1.crd m164_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf >> m036_am1.crd m036_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf >> m223_am1.crd m223_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf >> m058_am1.crd m058_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo >> charmm -c bcc >> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >> antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf >> m037_am1.crd m037_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >> antechamber.sh -s 2 -i m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo >> charmm -c bcc >> >> >> >> Veronika Nefedova wrote: >>> Swift thinks that it sent 248 jobs. >>> >>> nefedova at viper:~/alamines> grep "Running job " MolDyn-244-loops- >>> dbui34oxjr4j2.log | wc >>> 248 6931 56718 >>> nefedova at viper:~/alamines> >>> >>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>> >>>> Everything is idle, there is no work to be done... >>>> >>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>>> GenericPortalWS_perf_per_sec.txt >>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>> >>>> 24 workers are registered but idle.... queue length 0, 57 jobs >>>> completed. >>>> >>>> Also, see below all 57 jobs, they all finished with an exit code >>>> of 0, in other words succesfully! How many jobs does Swift >>>> think it sent? 
>>>> >>>> Ioan >>>> >>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>> GenericPortalWS_taskPerf.txt >>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>> ecTime resultsQueueTime totalTime exitCode >>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >>>> 560629 49780 338 15 50133 0 >>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 >>>> 561899 561909 216 699 10 925 0 >>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 >>>> 562150 562159 382 777 9 1168 0 >>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 >>>> 1044916 1044926 62404 10200 10 72614 0 >>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >>>> 1047038 1047067 135 585 29 749 0 >>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >>>> 1053072 1053080 114 6643 8 6765 0 >>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >>>> 1054256 1054290 731 7205 34 7970 0 >>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >>>> 1054570 1054579 7943 303 9 8255 0 >>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >>>> 1056811 1056819 6765 3724 8 10497 0 >>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >>>> 1058691 1058719 8257 4108 28 12393 0 >>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >>>> 1059363 1059385 12373 659 22 13054 0 >>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >>>> 1060315 1060323 10497 3489 8 13994 0 >>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >>>> 1060589 1060596 13042 1214 7 14263 0 >>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 1060603 >>>> 1060954 1061054 14265 351 100 14716 0 >>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 1060329 >>>> 1061094 1061126 13993 765 32 14790 0 >>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 1061105 >>>> 1065608 1065617 14414 4503 9 18926 0 >>>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 1065622 >>>> 1066307 1066315 18929 685 8 19622 0 >>>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 1061045 >>>> 1067540 1067563 14356 6495 23 20874 0 >>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 1066320 >>>> 1069262 1069271 19625 2942 9 22576 0 >>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 1067551 >>>> 1071003 1071011 20854 3452 8 24314 0 >>>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 1071016 >>>> 1071664 1071671 24316 648 7 24971 0 >>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 1069275 >>>> 1071679 1071692 22577 2404 13 24994 0 >>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 1071687 >>>> 1073978 1073988 24985 2291 10 27286 0 >>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 1073992 >>>> 1075959 1075969 27286 1967 10 29263 0 >>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 1071699 >>>> 1076704 1076713 24995 5005 9 30009 0 >>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 1075972 >>>> 1077451 1077459 29264 1479 8 30751 0 >>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 1076717 >>>> 1080157 1080165 30007 3440 8 33455 0 >>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 1077464 >>>> 1080270 1080286 30752 2806 16 33574 0 >>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 1080170 >>>> 1080611 1080619 33457 441 8 33906 0 >>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 1080624 >>>> 
1080973 1080983 33907 349 10 34266 0 >>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 1080281 >>>> 1081405 1081413 33566 1124 8 34698 0 >>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 1080986 >>>> 1082989 1082996 34267 2003 7 36277 0 >>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 1083002 >>>> 1083370 1083378 36279 368 8 36655 0 >>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 1081417 >>>> 1084830 1084837 34696 3413 7 38116 0 >>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 1084843 >>>> 1085854 1085879 37761 1011 25 38797 0 >>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 1085865 >>>> 1089502 1089511 38780 3637 9 42426 0 >>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 1089515 >>>> 1089966 1089974 42428 451 8 42887 0 >>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 1083383 >>>> 1091316 1091324 36658 7933 8 44599 0 >>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 1091329 >>>> 1092042 1092049 44237 713 7 44957 0 >>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 1092055 >>>> 1094242 1094249 44960 2187 7 47154 0 >>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 1089979 >>>> 1094418 1094428 42889 4439 10 47338 0 >>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 1094433 >>>> 1095082 1095089 47331 649 7 47987 0 >>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 1095095 >>>> 1096846 1096853 47991 1751 7 49749 0 >>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 1094256 >>>> 1098214 1098221 47156 3958 7 51121 0 >>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 1096859 >>>> 1098627 1098637 49752 1768 10 51530 0 >>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 1094037 >>>> 1098903 1098910 46940 4866 7 51813 0 >>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 1099192 >>>> 1100210 1100246 52071 1018 36 53125 0 >>>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 1097371 >>>> 1100555 1100562 50260 3184 7 53451 0 >>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 1097135 >>>> 1100896 1100904 50026 3761 8 53795 0 >>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 1098640 >>>> 1101106 1101127 51523 2466 21 54010 0 >>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 1099965 >>>> 1101217 1101224 52842 1252 7 54101 0 >>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 1098227 >>>> 1101820 1101828 51112 3593 8 54713 0 >>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 1097375 >>>> 1104132 1104139 50262 6757 7 57026 0 >>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 1100221 >>>> 1106449 1106458 53096 6228 9 59333 0 >>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 1098916 >>>> 1106473 1106481 51797 7557 8 59362 0 >>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>>> 1207793 1207801 71 644409 8 644488 0 >>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>>> 1216404 1216425 98 652991 21 653110 0 >>>> >>>> >>>> >>>> Veronika Nefedova wrote: >>>>> OK. There is something weird happening. 
I've got several such >>>>> entries in my swift log: >>>>> >>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>> exception: Task failed >>>>> task:execute @ vdl-int.k, line: 332 >>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>> >>>>> >>>>> Looks like antechamber has failed (?). And the failure is only >>>>> on a swfit side, it never made it across to Falcon (there are >>>>> no remote directories created). But I see some of antechamber >>>>> jobs have finished (in shared). >>>>> >>>>> Yuqing -- could the changes you've made be responsible for >>>>> these failures (I do not see how it could though) ? >>>>> >>>>> Ioan, what do you see in your logs ion these tasks: >>>>> >>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>> >>>>> Nika >>>>> >>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>> >>>>>> OK! >>>>>> Why don't we do one last run from my allocation, as everything >>>>>> is set up already and ready to go! Make sure to enable all >>>>>> debug logging. Falkon is up and running with all debug enabled! >>>>>> >>>>>> Falkon location is unchanged from the last experiment. >>>>>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/ >>>>>> services/GenericPortal/core/WS/GPFactoryService >>>>>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org: >>>>>> 51000/index.htm >>>>>> >>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we >>>>>> could still get 150~200 processors! >>>>>> >>>>>> Ioan >>>>>> >>>>>> Veronika Nefedova wrote: >>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>> Yuqing who fixed the m179 just in time!). We could start >>>>>>> again the 244- molecule run to verify that nothing is wrong >>>>>>> with the whole system. >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>> >>>>>>>> >>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>> >>>>>>>> >>>>>>>> I started those 2 molecules via GRAM. I have no trust in >>>>>>>> m179 finishing completely since I didn't change anything. I >>>>>>>> hope for m050 to finish though... >>>>>>>> You can watch the swift log on viper in ~nefedova/alamines/ >>>>>>>> MolDyn-2-loops-be9484k93kk21.log >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>>> Then, let's try another run with 244 molecules soon, as >>>>>>>>> most of ANL/UC is free! 
>>>>>>>>> >>>>>>>>> Ioan >>>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > From nefedova at mcs.anl.gov Mon Aug 6 17:25:52 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 17:25:52 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> Message-ID: <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> OK. I accidentally closed viper window where I started the workflow. The workflow was started with & so it was supposed to stay up even if I exited the shell. But apparently it didn't! This is the last entry in the log: 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote service instance... dummy function, this doesn't really do anything... (and it doesn't change ever since). What went wrong ? Why closing the shell actually killed the job? (ps shows no swift job) I checked 'history' and in fact the job was started with &: 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug MolDyn-244-loops.swift & I'll restart the workflow in 30 mins or so (from home) again. Sigh... Nika On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: > Ioan, its all was due to NFS problems, I am convinced now... > > I restarted the run, the log is ~nefedova/alamines/MolDyn-244-loops- > hxl1glhtqsag0.log > > Nika > > On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: > >> Just to debug further.... I picked out 1 task at random from the >> Swift log... >> iraicu at viper:/home/nefedova/alamines> cat MolDyn-244-loops- >> dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791" >> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn: >> 0-1-62-0-1186429258791) setting status to Submitted >> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn: >> 0-1-62-0-1186429258791) setting status to Active >> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, identity=urn: >> 0-1-62-0-1186429258791) setting status to Failed Exception in getFile >> >> but in my log, it is nowhere to be found... >> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" >> >> What does "setting status to Failed Exception in getFile" mean? >> Could this mean that it failed on the data staging part, and that >> it never made it to Falkon? >> >> BTW, it lloks as if there were really 539 jobs submitted... >> >> iraicu at viper:/home/nefedova/alamines> grep "Submitted" MolDyn-244- >> loops-dbui34oxjr4j2.log | wc >> 539 5390 62835 >> >> but again, only 57 made it to Falkon, and there were no exceptions >> thrown anywhere to indicate that something unusual happened. 
>> >> Ioan >> >> Ioan Raicu wrote: >>> Falkon only has 57 tasks received, here they are: >>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/ >>> service/logs/GenericPortalWS.txt.0.summary >>> >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> pre_ch-vsk58efi stdout.txt stderr.txt . ./m179.mol2 ./m050.mol2 >>> m179_am1 m050_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre- >>> antch.pl >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf >>> m179_am1.crd m179_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf >>> m050_am1.crd m050_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp >>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>> m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd >>> equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst >>> solv_m050.trj solv_m050_min.crd /disks/scratchgpfs1/iraicu/ >>> ModLyn/bin/charmm.sh system:solv_m050 title:solv stitle:m050 >>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >>> gaff:m050_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater: >>> 15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp >>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>> m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd >>> equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst >>> solv_m179.trj solv_m179_min.crd /disks/scratchgpfs1/iraicu/ >>> ModLyn/bin/charmm.sh system:solv_m179 title:solv stitle:m179 >>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >>> gaff:m179_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater: >>> 15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> pre_ch-38lc8efi stdout.txt stderr.txt . 
./m197.mol2 ./ >>> m129.mol2 ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./ >>> m070.mol2 ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./ >>> m001.mol2 ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./ >>> m160.mol2 ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./ >>> m175.mol2 ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./ >>> m106.mol2 ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./ >>> m125.mol2 ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./ >>> m067.mol2 ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./ >>> m108.mol2 ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./ >>> m186.mol2 ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./ >>> m037.mol2 ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./ >>> m196.mol2 ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./ >>> m060.mol2 ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./ >>> m179.mol2 ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./ >>> m191.mol2 ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./ >>> m063.mol2 ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./ >>> m121.mol2 ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./ >>> m039.mol2 ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./ >>> m226.mol2 ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./ >>> m180.mol2 ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./ >>> m054.mol2 ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./ >>> m229.mol2 ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./ >>> m043.mol2 ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./ >>> m210.mol2 ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./ >>> m091.mol2 ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./ >>> m093.mol2 ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./ >>> m016.mol2 ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./ >>> m212.mol2 ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./ >>> m139.mol2 ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./ >>> m153.mol2 ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./ >>> m081.mol2 ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./ >>> m028.mol2 ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./ >>> m135.mol2 ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./ >>> m156.mol2 ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./ >>> m009.mol2 ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./ >>> m027.mol2 ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./ >>> m030.mol2 ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./ >>> m031.mol2 ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./ >>> m133.mol2 ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./ >>> m005.mol2 ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./ >>> m149.mol2 ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./ >>> m089.mol2 ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./ >>> m146.mol2 ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./ >>> m086.mol2 ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./ >>> m116.mol2 ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./ >>> m231.mol2 ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./ >>> m233.mol2 ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./ >>> m055.mol2 ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./ >>> m242.mol2 ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 >>> m163_am1 m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 >>> m034_am1 m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 >>> m130_am1 m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 >>> m107_am1 m047_am1 m106_am1 m124_am1 m193_am1 m225_am1 m066_am1 >>> m125_am1 m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 >>> m049_am1 
m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 >>> m036_am1 m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 >>> m068_am1 m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 >>> m118_am1 m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 >>> m179_am1 m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 >>> m041_am1 m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 >>> m051_am1 m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 >>> m167_am1 m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 >>> m052_am1 m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 >>> m054_am1 m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 >>> m042_am1 m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 >>> m083_am1 m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 >>> m150_am1 m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 >>> m142_am1 m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 >>> m212_am1 m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 >>> m211_am1 m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 >>> m152_am1 m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 >>> m097_am1 m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 >>> m098_am1 m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 >>> m009_am1 m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 >>> m158_am1 m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 >>> m003_am1 m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 >>> m006_am1 m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 >>> m021_am1 m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 >>> m088_am1 m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 >>> m018_am1 m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 >>> m057_am1 m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 >>> m172_am1 m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 >>> m182_am1 m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 >>> m242_am1 m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> pre-antch.pl >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf >>> m197_am1.crd m197_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf >>> m129_am1.crd m129_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf >>> m069_am1.crd m069_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf >>> m163_am1.crd m163_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-78lc8efi stdout.txt stderr.txt m128_am1 m128_am1.rtf >>> m128_am1.crd m128_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> 
antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf >>> m035_am1.crd m035_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf >>> m070_am1.crd m070_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf >>> m221_am1.crd m221_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf >>> m162_am1.crd m162_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf >>> m198_am1.crd m198_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf >>> m034_am1.crd m034_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf >>> m001_am1.crd m001_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf >>> m033_am1.crd m033_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf >>> m220_am1.crd m220_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf >>> m161_am1.crd m161_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf >>> m032_am1.crd m032_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf >>> m160_am1.crd m160_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf >>> 
m130_am1.crd m130_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf >>> m071_am1.crd m071_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf >>> m199_am1.crd m199_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf >>> m002_am1.crd m002_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf >>> m175_am1.crd m175_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf >>> m234_am1.crd m234_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf >>> m107_am1.crd m107_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf >>> m048_am1.crd m048_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf >>> m124_am1.crd m124_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf >>> m047_am1.crd m047_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf >>> m106_am1.crd m106_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf >>> m193_am1.crd m193_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf >>> m225_am1.crd m225_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> 
antechamber.sh -s 2 -i m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf >>> m066_am1.crd m066_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf >>> m125_am1.crd m125_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf >>> m194_am1.crd m194_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf >>> m176_am1.crd m176_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf >>> m224_am1.crd m224_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf >>> m235_am1.crd m235_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf >>> m165_am1.crd m165_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf >>> m067_am1.crd m067_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf >>> m049_am1.crd m049_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf >>> m126_am1.crd m126_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf >>> m166_am1.crd m166_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf >>> m108_am1.crd m108_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo >>> 
charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf >>> m195_am1.crd m195_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf >>> m038_am1.crd m038_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf >>> m059_am1.crd m059_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf >>> m186_am1.crd m186_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf >>> m164_am1.crd m164_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf >>> m036_am1.crd m036_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf >>> m223_am1.crd m223_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf >>> m058_am1.crd m058_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo >>> charmm -c bcc >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>> antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf >>> m037_am1.crd m037_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>> antechamber.sh -s 2 -i m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo >>> charmm -c bcc >>> >>> >>> >>> Veronika Nefedova wrote: >>>> Swift thinks that it sent 248 jobs. >>>> >>>> nefedova at viper:~/alamines> grep "Running job " MolDyn-244-loops- >>>> dbui34oxjr4j2.log | wc >>>> 248 6931 56718 >>>> nefedova at viper:~/alamines> >>>> >>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>>> >>>>> Everything is idle, there is no work to be done... 
>>>>> >>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>>>> GenericPortalWS_perf_per_sec.txt >>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>> >>>>> 24 workers are registered but idle.... queue length 0, 57 jobs >>>>> completed. >>>>> >>>>> Also, see below all 57 jobs, they all finished with an exit >>>>> code of 0, in other words succesfully! How many jobs does >>>>> Swift think it sent? >>>>> >>>>> Ioan >>>>> >>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>> GenericPortalWS_taskPerf.txt >>>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>>> ecTime resultsQueueTime totalTime exitCode >>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >>>>> 560629 49780 338 15 50133 0 >>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 >>>>> 561899 561909 216 699 10 925 0 >>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 >>>>> 562150 562159 382 777 9 1168 0 >>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 >>>>> 1044916 1044926 62404 10200 10 72614 0 >>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >>>>> 1047038 1047067 135 585 29 749 0 >>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >>>>> 1053072 1053080 114 6643 8 6765 0 >>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >>>>> 1054256 1054290 731 7205 34 7970 0 >>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >>>>> 1054570 1054579 7943 303 9 8255 0 >>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >>>>> 1056811 1056819 6765 3724 8 10497 0 >>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >>>>> 1058691 1058719 8257 4108 28 12393 0 >>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >>>>> 1059363 1059385 12373 659 22 13054 0 >>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >>>>> 1060315 1060323 10497 3489 8 13994 0 >>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >>>>> 1060589 1060596 13042 1214 7 14263 0 >>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 >>>>> 1060603 1060954 1061054 14265 351 100 14716 0 >>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 >>>>> 1060329 1061094 1061126 13993 765 32 14790 0 >>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 >>>>> 1061105 1065608 1065617 14414 4503 9 18926 0 >>>>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 >>>>> 1065622 1066307 1066315 18929 685 8 19622 0 >>>>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 >>>>> 1061045 1067540 1067563 14356 6495 23 20874 0 >>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 >>>>> 1066320 1069262 1069271 19625 2942 9 22576 0 >>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 >>>>> 1067551 1071003 1071011 20854 3452 8 24314 0 >>>>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 >>>>> 1071016 1071664 1071671 24316 
648 7 24971 0 >>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 >>>>> 1069275 1071679 1071692 22577 2404 13 24994 0 >>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 >>>>> 1071687 1073978 1073988 24985 2291 10 27286 0 >>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 >>>>> 1073992 1075959 1075969 27286 1967 10 29263 0 >>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 >>>>> 1071699 1076704 1076713 24995 5005 9 30009 0 >>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 >>>>> 1075972 1077451 1077459 29264 1479 8 30751 0 >>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 >>>>> 1076717 1080157 1080165 30007 3440 8 33455 0 >>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 >>>>> 1077464 1080270 1080286 30752 2806 16 33574 0 >>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 >>>>> 1080170 1080611 1080619 33457 441 8 33906 0 >>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 >>>>> 1080624 1080973 1080983 33907 349 10 34266 0 >>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 >>>>> 1080281 1081405 1081413 33566 1124 8 34698 0 >>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 >>>>> 1080986 1082989 1082996 34267 2003 7 36277 0 >>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 >>>>> 1083002 1083370 1083378 36279 368 8 36655 0 >>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 >>>>> 1081417 1084830 1084837 34696 3413 7 38116 0 >>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 >>>>> 1084843 1085854 1085879 37761 1011 25 38797 0 >>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 >>>>> 1085865 1089502 1089511 38780 3637 9 42426 0 >>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 >>>>> 1089515 1089966 1089974 42428 451 8 42887 0 >>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 >>>>> 1083383 1091316 1091324 36658 7933 8 44599 0 >>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 >>>>> 1091329 1092042 1092049 44237 713 7 44957 0 >>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 >>>>> 1092055 1094242 1094249 44960 2187 7 47154 0 >>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 >>>>> 1089979 1094418 1094428 42889 4439 10 47338 0 >>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 >>>>> 1094433 1095082 1095089 47331 649 7 47987 0 >>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 >>>>> 1095095 1096846 1096853 47991 1751 7 49749 0 >>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 >>>>> 1094256 1098214 1098221 47156 3958 7 51121 0 >>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 >>>>> 1096859 1098627 1098637 49752 1768 10 51530 0 >>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 >>>>> 1094037 1098903 1098910 46940 4866 7 51813 0 >>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 >>>>> 1099192 1100210 1100246 52071 1018 36 53125 0 >>>>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 >>>>> 1097371 1100555 1100562 50260 3184 7 53451 0 >>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 >>>>> 1097135 1100896 1100904 50026 3761 8 53795 0 >>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 >>>>> 1098640 1101106 1101127 51523 2466 21 54010 0 >>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 >>>>> 1099965 1101217 1101224 52842 1252 7 54101 0 >>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 >>>>> 1098227 1101820 1101828 51112 3593 8 54713 0 
>>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 >>>>> 1097375 1104132 1104139 50262 6757 7 57026 0 >>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 >>>>> 1100221 1106449 1106458 53096 6228 9 59333 0 >>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 >>>>> 1098916 1106473 1106481 51797 7557 8 59362 0 >>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>>>> 1207793 1207801 71 644409 8 644488 0 >>>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>>>> 1216404 1216425 98 652991 21 653110 0 >>>>> >>>>> >>>>> >>>>> Veronika Nefedova wrote: >>>>>> OK. There is something weird happening. I've got several such >>>>>> entries in my swift log: >>>>>> >>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>>> exception: Task failed >>>>>> task:execute @ vdl-int.k, line: 332 >>>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>>> >>>>>> >>>>>> Looks like antechamber has failed (?). And the failure is only >>>>>> on a swfit side, it never made it across to Falcon (there are >>>>>> no remote directories created). But I see some of antechamber >>>>>> jobs have finished (in shared). >>>>>> >>>>>> Yuqing -- could the changes you've made be responsible for >>>>>> these failures (I do not see how it could though) ? >>>>>> >>>>>> Ioan, what do you see in your logs ion these tasks: >>>>>> >>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>>> >>>>>> Nika >>>>>> >>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>>> >>>>>>> OK! >>>>>>> Why don't we do one last run from my allocation, as >>>>>>> everything is set up already and ready to go! Make sure to >>>>>>> enable all debug logging. Falkon is up and running with all >>>>>>> debug enabled! >>>>>>> >>>>>>> Falkon location is unchanged from the last experiment. >>>>>>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/ >>>>>>> services/GenericPortal/core/WS/GPFactoryService >>>>>>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org: >>>>>>> 51000/index.htm >>>>>>> >>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we >>>>>>> could still get 150~200 processors! >>>>>>> >>>>>>> Ioan >>>>>>> >>>>>>> Veronika Nefedova wrote: >>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>>> Yuqing who fixed the m179 just in time!). We could start >>>>>>>> again the 244- molecule run to verify that nothing is wrong >>>>>>>> with the whole system. >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>>> >>>>>>>>> >>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> I started those 2 molecules via GRAM. 
I have no trust in >>>>>>>>> m179 finishing completely since I didn't change anything. I >>>>>>>>> hope for m050 to finish though... >>>>>>>>> You can watch the swift log on viper in ~nefedova/alamines/ >>>>>>>>> MolDyn-2-loops-be9484k93kk21.log >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>>> Then, let's try another run with 244 molecules soon, as >>>>>>>>>> most of ANL/UC is free! >>>>>>>>>> >>>>>>>>>> Ioan >>>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> >>>> >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From iraicu at cs.uchicago.edu Mon Aug 6 17:29:17 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 17:29:17 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> Message-ID: <46B7A0BD.7050807@cs.uchicago.edu> As far as I know, what you did will cause the background job to terminate when the shell exits, such as closing your window. What I do is have a wrapper "script1" that invokes "script2 &", which then lets me close the window without a problem. Ioan Veronika Nefedova wrote: > OK. I accidentally closed viper window where I started the workflow. > The workflow was started with & so it was supposed to stay up even if > I exited the shell. But apparently it didn't! > > This is the last entry in the log: > > 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote service > instance... dummy function, this doesn't really do anything... > > (and it doesn't change ever since). > > What went wrong ? Why closing the shell actually killed the job? (ps > shows no swift job) > I checked 'history' and in fact the job was started with &: > > 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug > MolDyn-244-loops.swift & > > I'll restart the workflow in 30 mins or so (from home) again. > > Sigh... > > Nika > > > On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: > >> Ioan, its all was due to NFS problems, I am convinced now... >> >> I restarted the run, the log is >> ~nefedova/alamines/MolDyn-244-loops-hxl1glhtqsag0.log >> >> Nika >> >> On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: >> >>> Just to debug further.... I picked out 1 task at random from the >>> Swift log... 
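A minimal sketch of the wrapper approach Ioan describes above, under the usual bash assumption that a job backgrounded from a non-interactive script is not in the login shell's job table and therefore does not receive SIGHUP when the terminal closes. The names script1.sh, script2.sh and moldyn.out are hypothetical, the nohup and output redirection are extra safety added here, and only the swift command line itself comes from Nika's history.

#!/bin/bash
# script1.sh -- run this from the interactive shell; it detaches the real
# work and returns immediately, so closing the window cannot kill the run.
nohup ./script2.sh > moldyn.out 2>&1 &
echo "workflow detached, pid $!"

#!/bin/bash
# script2.sh -- the actual Swift invocation (same command Nika used).
swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug MolDyn-244-loops.swift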
>>> iraicu at viper:/home/nefedova/alamines> cat >>> MolDyn-244-loops-dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791" >>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>> identity=urn:0-1-62-0-1186429258791) setting status to Submitted >>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>> identity=urn:0-1-62-0-1186429258791) setting status to Active >>> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, >>> identity=urn:0-1-62-0-1186429258791) setting status to Failed >>> Exception in getFile >>> >>> but in my log, it is nowhere to be found... >>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" >>> >>> What does "setting status to Failed Exception in getFile" mean? >>> Could this mean that it failed on the data staging part, and that it >>> never made it to Falkon? >>> >>> BTW, it lloks as if there were really 539 jobs submitted... >>> >>> iraicu at viper:/home/nefedova/alamines> grep "Submitted" >>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>> 539 5390 62835 >>> >>> but again, only 57 made it to Falkon, and there were no exceptions >>> thrown anywhere to indicate that something unusual happened. >>> >>> Ioan >>> >>> Ioan Raicu wrote: >>>> Falkon only has 57 tasks received, here they are: >>>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary >>>> >>>> >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> pre_ch-vsk58efi stdout.txt stderr.txt . ./m179.mol2 ./m050.mol2 >>>> m179_am1 m050_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf >>>> m179_am1.crd m179_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf >>>> m050_am1.crd m050_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp >>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>> m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd >>>> equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst >>>> solv_m050.trj solv_m050_min.crd >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m050 >>>> title:solv stitle:m050 rtffile:parm03_gaff_all.rtf >>>> paramfile:parm03_gaffnb_all.prm gaff:m050_am1 nwater:400 ligcrd:lyz >>>> rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 >>>> skipstep:100 startstep:10000 >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp >>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>> m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd >>>> equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst >>>> solv_m179.trj solv_m179_min.crd >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m179 >>>> title:solv stitle:m179 rtffile:parm03_gaff_all.rtf >>>> paramfile:parm03_gaffnb_all.prm gaff:m179_am1 nwater:400 ligcrd:lyz >>>> rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 >>>> skipstep:100 startstep:10000 >>>> 
128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> pre_ch-38lc8efi stdout.txt stderr.txt . ./m197.mol2 ./m129.mol2 >>>> ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./m070.mol2 >>>> ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./m001.mol2 >>>> ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./m160.mol2 >>>> ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./m175.mol2 >>>> ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./m106.mol2 >>>> ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./m125.mol2 >>>> ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./m067.mol2 >>>> ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./m108.mol2 >>>> ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./m186.mol2 >>>> ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./m037.mol2 >>>> ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./m196.mol2 >>>> ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./m060.mol2 >>>> ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./m179.mol2 >>>> ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./m191.mol2 >>>> ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./m063.mol2 >>>> ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./m121.mol2 >>>> ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./m039.mol2 >>>> ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./m226.mol2 >>>> ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./m180.mol2 >>>> ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./m054.mol2 >>>> ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./m229.mol2 >>>> ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./m043.mol2 >>>> ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./m210.mol2 >>>> ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./m091.mol2 >>>> ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./m093.mol2 >>>> ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./m016.mol2 >>>> ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./m212.mol2 >>>> ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./m139.mol2 >>>> ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./m153.mol2 >>>> ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./m081.mol2 >>>> ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./m028.mol2 >>>> ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./m135.mol2 >>>> ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./m156.mol2 >>>> ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./m009.mol2 >>>> ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./m027.mol2 >>>> ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./m030.mol2 >>>> ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./m031.mol2 >>>> ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./m133.mol2 >>>> ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./m005.mol2 >>>> ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./m149.mol2 >>>> ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./m089.mol2 >>>> ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./m146.mol2 >>>> ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./m086.mol2 >>>> ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./m116.mol2 >>>> ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./m231.mol2 >>>> ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./m233.mol2 >>>> ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./m055.mol2 >>>> ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./m242.mol2 >>>> ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 m163_am1 >>>> m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 m034_am1 >>>> m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 m130_am1 >>>> m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 m107_am1 >>>> m047_am1 m106_am1 
m124_am1 m193_am1 m225_am1 m066_am1 m125_am1 >>>> m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 m049_am1 >>>> m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 m036_am1 >>>> m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 m068_am1 >>>> m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 m118_am1 >>>> m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 m179_am1 >>>> m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 m041_am1 >>>> m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 m051_am1 >>>> m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 m167_am1 >>>> m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 m052_am1 >>>> m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 m054_am1 >>>> m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 m042_am1 >>>> m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 m083_am1 >>>> m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 m150_am1 >>>> m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 m142_am1 >>>> m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 m212_am1 >>>> m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 m211_am1 >>>> m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 m152_am1 >>>> m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 m097_am1 >>>> m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 m098_am1 >>>> m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 m009_am1 >>>> m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 m158_am1 >>>> m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 m003_am1 >>>> m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 m006_am1 >>>> m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 m021_am1 >>>> m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 m088_am1 >>>> m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 m018_am1 >>>> m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 m057_am1 >>>> m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 m172_am1 >>>> m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 m182_am1 >>>> m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 m242_am1 >>>> m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf >>>> m197_am1.crd m197_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf >>>> m129_am1.crd m129_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf >>>> m069_am1.crd m069_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf >>>> m163_am1.crd m163_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-78lc8efi stdout.txt stderr.txt m128_am1 m128_am1.rtf >>>> m128_am1.crd m128_am1.prm >>>> 
/disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf >>>> m035_am1.crd m035_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf >>>> m070_am1.crd m070_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf >>>> m221_am1.crd m221_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf >>>> m162_am1.crd m162_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf >>>> m198_am1.crd m198_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf >>>> m034_am1.crd m034_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf >>>> m001_am1.crd m001_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf >>>> m033_am1.crd m033_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf >>>> m220_am1.crd m220_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf >>>> m161_am1.crd m161_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf >>>> m032_am1.crd m032_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf >>>> m160_am1.crd m160_am1.prm >>>> 
/disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf >>>> m130_am1.crd m130_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf >>>> m071_am1.crd m071_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf >>>> m199_am1.crd m199_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf >>>> m002_am1.crd m002_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf >>>> m175_am1.crd m175_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf >>>> m234_am1.crd m234_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf >>>> m107_am1.crd m107_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf >>>> m048_am1.crd m048_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf >>>> m124_am1.crd m124_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf >>>> m047_am1.crd m047_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf >>>> m106_am1.crd m106_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf >>>> m193_am1.crd m193_am1.prm >>>> 
/disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf >>>> m225_am1.crd m225_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf >>>> m066_am1.crd m066_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf >>>> m125_am1.crd m125_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf >>>> m194_am1.crd m194_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf >>>> m176_am1.crd m176_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf >>>> m224_am1.crd m224_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf >>>> m235_am1.crd m235_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf >>>> m165_am1.crd m165_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf >>>> m067_am1.crd m067_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf >>>> m049_am1.crd m049_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf >>>> m126_am1.crd m126_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf >>>> m166_am1.crd m166_am1.prm >>>> 
/disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf >>>> m108_am1.crd m108_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf >>>> m195_am1.crd m195_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf >>>> m038_am1.crd m038_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf >>>> m059_am1.crd m059_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf >>>> m186_am1.crd m186_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf >>>> m164_am1.crd m164_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf >>>> m036_am1.crd m036_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf >>>> m223_am1.crd m223_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf >>>> m058_am1.crd m058_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf >>>> m037_am1.crd m037_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo charmm -c bcc >>>> >>>> >>>> >>>> Veronika Nefedova wrote: >>>>> Swift thinks that it sent 248 jobs. >>>>> >>>>> nefedova at viper:~/alamines> grep "Running job " >>>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>>> 248 6931 56718 >>>>> nefedova at viper:~/alamines> >>>>> >>>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>>>> >>>>>> Everything is idle, there is no work to be done... 
>>>>>> >>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>>>>> GenericPortalWS_perf_per_sec.txt >>>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> >>>>>> 24 workers are registered but idle.... queue length 0, 57 jobs >>>>>> completed. >>>>>> >>>>>> Also, see below all 57 jobs, they all finished with an exit code >>>>>> of 0, in other words succesfully! How many jobs does Swift think >>>>>> it sent? >>>>>> >>>>>> Ioan >>>>>> >>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>>> GenericPortalWS_taskPerf.txt >>>>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>>>> ecTime resultsQueueTime totalTime exitCode >>>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >>>>>> 560629 49780 338 15 50133 0 >>>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 >>>>>> 561899 561909 216 699 10 925 0 >>>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 >>>>>> 562150 562159 382 777 9 1168 0 >>>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 1044916 >>>>>> 1044926 62404 10200 10 72614 0 >>>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >>>>>> 1047038 1047067 135 585 29 749 0 >>>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >>>>>> 1053072 1053080 114 6643 8 6765 0 >>>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >>>>>> 1054256 1054290 731 7205 34 7970 0 >>>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >>>>>> 1054570 1054579 7943 303 9 8255 0 >>>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >>>>>> 1056811 1056819 6765 3724 8 10497 0 >>>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >>>>>> 1058691 1058719 8257 4108 28 12393 0 >>>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >>>>>> 1059363 1059385 12373 659 22 13054 0 >>>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >>>>>> 1060315 1060323 10497 3489 8 13994 0 >>>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >>>>>> 1060589 1060596 13042 1214 7 14263 0 >>>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 1060603 >>>>>> 1060954 1061054 14265 351 100 14716 0 >>>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 1060329 >>>>>> 1061094 1061126 13993 765 32 14790 0 >>>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 1061105 >>>>>> 1065608 1065617 14414 4503 9 18926 0 >>>>>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 1065622 >>>>>> 1066307 1066315 18929 685 8 19622 0 >>>>>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 1061045 >>>>>> 1067540 1067563 14356 6495 23 20874 0 >>>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 1066320 >>>>>> 1069262 1069271 19625 2942 9 22576 0 >>>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 1067551 >>>>>> 1071003 1071011 20854 3452 8 24314 0 >>>>>> 21 
urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 1071016 >>>>>> 1071664 1071671 24316 648 7 24971 0 >>>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 1069275 >>>>>> 1071679 1071692 22577 2404 13 24994 0 >>>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 1071687 >>>>>> 1073978 1073988 24985 2291 10 27286 0 >>>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 1073992 >>>>>> 1075959 1075969 27286 1967 10 29263 0 >>>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 1071699 >>>>>> 1076704 1076713 24995 5005 9 30009 0 >>>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 1075972 >>>>>> 1077451 1077459 29264 1479 8 30751 0 >>>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 1076717 >>>>>> 1080157 1080165 30007 3440 8 33455 0 >>>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 1077464 >>>>>> 1080270 1080286 30752 2806 16 33574 0 >>>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 1080170 >>>>>> 1080611 1080619 33457 441 8 33906 0 >>>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 1080624 >>>>>> 1080973 1080983 33907 349 10 34266 0 >>>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 1080281 >>>>>> 1081405 1081413 33566 1124 8 34698 0 >>>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 1080986 >>>>>> 1082989 1082996 34267 2003 7 36277 0 >>>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 1083002 >>>>>> 1083370 1083378 36279 368 8 36655 0 >>>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 1081417 >>>>>> 1084830 1084837 34696 3413 7 38116 0 >>>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 1084843 >>>>>> 1085854 1085879 37761 1011 25 38797 0 >>>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 1085865 >>>>>> 1089502 1089511 38780 3637 9 42426 0 >>>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 1089515 >>>>>> 1089966 1089974 42428 451 8 42887 0 >>>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 1083383 >>>>>> 1091316 1091324 36658 7933 8 44599 0 >>>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 1091329 >>>>>> 1092042 1092049 44237 713 7 44957 0 >>>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 1092055 >>>>>> 1094242 1094249 44960 2187 7 47154 0 >>>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 1089979 >>>>>> 1094418 1094428 42889 4439 10 47338 0 >>>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 1094433 >>>>>> 1095082 1095089 47331 649 7 47987 0 >>>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 1095095 >>>>>> 1096846 1096853 47991 1751 7 49749 0 >>>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 1094256 >>>>>> 1098214 1098221 47156 3958 7 51121 0 >>>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 1096859 >>>>>> 1098627 1098637 49752 1768 10 51530 0 >>>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 1094037 >>>>>> 1098903 1098910 46940 4866 7 51813 0 >>>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 1099192 >>>>>> 1100210 1100246 52071 1018 36 53125 0 >>>>>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 1097371 >>>>>> 1100555 1100562 50260 3184 7 53451 0 >>>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 1097135 >>>>>> 1100896 1100904 50026 3761 8 53795 0 >>>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 1098640 >>>>>> 1101106 1101127 51523 2466 21 54010 0 >>>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 1099965 >>>>>> 
1101217 1101224 52842 1252 7 54101 0 >>>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 1098227 >>>>>> 1101820 1101828 51112 3593 8 54713 0 >>>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 1097375 >>>>>> 1104132 1104139 50262 6757 7 57026 0 >>>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 1100221 >>>>>> 1106449 1106458 53096 6228 9 59333 0 >>>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 1098916 >>>>>> 1106473 1106481 51797 7557 8 59362 0 >>>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>>>>> 1207793 1207801 71 644409 8 644488 0 >>>>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>>>>> 1216404 1216425 98 652991 21 653110 0 >>>>>> >>>>>> >>>>>> >>>>>> Veronika Nefedova wrote: >>>>>>> OK. There is something weird happening. I've got several such >>>>>>> entries in my swift log: >>>>>>> >>>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>>>> exception: Task failed >>>>>>> task:execute @ vdl-int.k, line: 332 >>>>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>>>> >>>>>>> >>>>>>> Looks like antechamber has failed (?). And the failure is only >>>>>>> on a swfit side, it never made it across to Falcon (there are no >>>>>>> remote directories created). But I see some of antechamber jobs >>>>>>> have finished (in shared). >>>>>>> >>>>>>> Yuqing -- could the changes you've made be responsible for these >>>>>>> failures (I do not see how it could though) ? >>>>>>> >>>>>>> Ioan, what do you see in your logs ion these tasks: >>>>>>> >>>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>>>> >>>>>>>> OK! >>>>>>>> Why don't we do one last run from my allocation, as everything >>>>>>>> is set up already and ready to go! Make sure to enable all >>>>>>>> debug logging. Falkon is up and running with all debug enabled! >>>>>>>> >>>>>>>> Falkon location is unchanged from the last experiment. >>>>>>>> Falkon Factory Service: >>>>>>>> http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>>>>> >>>>>>>> Web Server (graphs): >>>>>>>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>>>> >>>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we >>>>>>>> could still get 150~200 processors! >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> Veronika Nefedova wrote: >>>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>>>> Yuqing who fixed the m179 just in time!). We could start again >>>>>>>>> the 244- molecule run to verify that nothing is wrong with the >>>>>>>>> whole system. 
>>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>>>>>>>> finishing completely since I didn't change anything. I hope >>>>>>>>>> for m050 to finish though... >>>>>>>>>> You can watch the swift log on viper in >>>>>>>>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>>> Then, let's try another run with 244 molecules soon, as most >>>>>>>>>>> of ANL/UC is free! >>>>>>>>>>> >>>>>>>>>>> Ioan >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > From iraicu at cs.uchicago.edu Mon Aug 6 18:04:57 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 18:04:57 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> Message-ID: <46B7A919.1050507@cs.uchicago.edu> OK, I restarted Falkon as well as there were 12K jobs trying to go through, and keeping the entire ANL/UC site busy, although there was no Swift on the other end to pick up the notifications... here is the new info: Falkon Factory Service: http://tg-viz-login2:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm Note that I changed the port #, its now 50020, so don't forget to change that before you start Swift... Ioan Veronika Nefedova wrote: > OK. I accidentally closed viper window where I started the workflow. > The workflow was started with & so it was supposed to stay up even if > I exited the shell. But apparently it didn't! > > This is the last entry in the log: > > 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote service > instance... dummy function, this doesn't really do anything... > > (and it doesn't change ever since). > > What went wrong ? Why closing the shell actually killed the job? (ps > shows no swift job) > I checked 'history' and in fact the job was started with &: > > 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug > MolDyn-244-loops.swift & > > I'll restart the workflow in 30 mins or so (from home) again. > > Sigh... > > Nika > > > On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: > >> Ioan, its all was due to NFS problems, I am convinced now... 
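A hedged extension of the grep counting used in this thread: since Swift logs the task identity on the same "setting status to Submitted" line, and Falkon records every received taskID in GenericPortalWS_taskPerf.txt, the two ID sets could be diffed directly to list the tasks that never reached Falkon. The log file names are the ones quoted above; swift-submitted.txt and falkon-received.txt are hypothetical names, and the comm-based comparison is only a sketch (the two logs live on viper and tg-viz-login2, so one list has to be copied over before comparing).

# on viper: URNs Swift believes it submitted
grep "setting status to Submitted" MolDyn-244-loops-dbui34oxjr4j2.log \
    | grep -o "urn:[0-9-]*" | sort -u > swift-submitted.txt
# on tg-viz-login2: URNs Falkon actually recorded
grep -o "urn:[0-9-]*" GenericPortalWS_taskPerf.txt | sort -u > falkon-received.txt
# after copying both lists to one host: tasks submitted but never received
comm -23 swift-submitted.txt falkon-received.txt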
>> >> I restarted the run, the log is >> ~nefedova/alamines/MolDyn-244-loops-hxl1glhtqsag0.log >> >> Nika >> >> On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: >> >>> Just to debug further.... I picked out 1 task at random from the >>> Swift log... >>> iraicu at viper:/home/nefedova/alamines> cat >>> MolDyn-244-loops-dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791" >>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>> identity=urn:0-1-62-0-1186429258791) setting status to Submitted >>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>> identity=urn:0-1-62-0-1186429258791) setting status to Active >>> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, >>> identity=urn:0-1-62-0-1186429258791) setting status to Failed >>> Exception in getFile >>> >>> but in my log, it is nowhere to be found... >>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" >>> >>> What does "setting status to Failed Exception in getFile" mean? >>> Could this mean that it failed on the data staging part, and that it >>> never made it to Falkon? >>> >>> BTW, it lloks as if there were really 539 jobs submitted... >>> >>> iraicu at viper:/home/nefedova/alamines> grep "Submitted" >>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>> 539 5390 62835 >>> >>> but again, only 57 made it to Falkon, and there were no exceptions >>> thrown anywhere to indicate that something unusual happened. >>> >>> Ioan >>> >>> Ioan Raicu wrote: >>>> Falkon only has 57 tasks received, here they are: >>>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary >>>> >>>> >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> pre_ch-vsk58efi stdout.txt stderr.txt . 
./m179.mol2 ./m050.mol2 >>>> m179_am1 m050_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf >>>> m179_am1.crd m179_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf >>>> m050_am1.crd m050_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp >>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>> m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd >>>> equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst >>>> solv_m050.trj solv_m050_min.crd >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m050 >>>> title:solv stitle:m050 rtffile:parm03_gaff_all.rtf >>>> paramfile:parm03_gaffnb_all.prm gaff:m050_am1 nwater:400 ligcrd:lyz >>>> rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 >>>> skipstep:100 startstep:10000 >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp >>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>> m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd >>>> equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst >>>> solv_m179.trj solv_m179_min.crd >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m179 >>>> title:solv stitle:m179 rtffile:parm03_gaff_all.rtf >>>> paramfile:parm03_gaffnb_all.prm gaff:m179_am1 nwater:400 ligcrd:lyz >>>> rforce:0 iseed:3131887 rwater:15 nstep:10000 minstep:100 >>>> skipstep:100 startstep:10000 >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> pre_ch-38lc8efi stdout.txt stderr.txt . 
./m197.mol2 ./m129.mol2 >>>> ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./m070.mol2 >>>> ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./m001.mol2 >>>> ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./m160.mol2 >>>> ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./m175.mol2 >>>> ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./m106.mol2 >>>> ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./m125.mol2 >>>> ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./m067.mol2 >>>> ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./m108.mol2 >>>> ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./m186.mol2 >>>> ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./m037.mol2 >>>> ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./m196.mol2 >>>> ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./m060.mol2 >>>> ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./m179.mol2 >>>> ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./m191.mol2 >>>> ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./m063.mol2 >>>> ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./m121.mol2 >>>> ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./m039.mol2 >>>> ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./m226.mol2 >>>> ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./m180.mol2 >>>> ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./m054.mol2 >>>> ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./m229.mol2 >>>> ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./m043.mol2 >>>> ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./m210.mol2 >>>> ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./m091.mol2 >>>> ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./m093.mol2 >>>> ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./m016.mol2 >>>> ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./m212.mol2 >>>> ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./m139.mol2 >>>> ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./m153.mol2 >>>> ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./m081.mol2 >>>> ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./m028.mol2 >>>> ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./m135.mol2 >>>> ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./m156.mol2 >>>> ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./m009.mol2 >>>> ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./m027.mol2 >>>> ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./m030.mol2 >>>> ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./m031.mol2 >>>> ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./m133.mol2 >>>> ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./m005.mol2 >>>> ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./m149.mol2 >>>> ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./m089.mol2 >>>> ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./m146.mol2 >>>> ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./m086.mol2 >>>> ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./m116.mol2 >>>> ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./m231.mol2 >>>> ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./m233.mol2 >>>> ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./m055.mol2 >>>> ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./m242.mol2 >>>> ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 m163_am1 >>>> m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 m034_am1 >>>> m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 m130_am1 >>>> m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 m107_am1 >>>> m047_am1 m106_am1 m124_am1 m193_am1 m225_am1 m066_am1 m125_am1 >>>> m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 m049_am1 >>>> 
m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 m036_am1 >>>> m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 m068_am1 >>>> m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 m118_am1 >>>> m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 m179_am1 >>>> m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 m041_am1 >>>> m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 m051_am1 >>>> m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 m167_am1 >>>> m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 m052_am1 >>>> m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 m054_am1 >>>> m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 m042_am1 >>>> m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 m083_am1 >>>> m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 m150_am1 >>>> m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 m142_am1 >>>> m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 m212_am1 >>>> m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 m211_am1 >>>> m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 m152_am1 >>>> m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 m097_am1 >>>> m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 m098_am1 >>>> m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 m009_am1 >>>> m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 m158_am1 >>>> m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 m003_am1 >>>> m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 m006_am1 >>>> m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 m021_am1 >>>> m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 m088_am1 >>>> m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 m018_am1 >>>> m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 m057_am1 >>>> m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 m172_am1 >>>> m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 m182_am1 >>>> m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 m242_am1 >>>> m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf >>>> m197_am1.crd m197_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf >>>> m129_am1.crd m129_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf >>>> m069_am1.crd m069_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf >>>> m163_am1.crd m163_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-78lc8efi stdout.txt stderr.txt m128_am1 m128_am1.rtf >>>> m128_am1.crd m128_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh 
ARGUEMENTS shared/wrapper.sh >>>> antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf >>>> m035_am1.crd m035_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf >>>> m070_am1.crd m070_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf >>>> m221_am1.crd m221_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf >>>> m162_am1.crd m162_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf >>>> m198_am1.crd m198_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf >>>> m034_am1.crd m034_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf >>>> m001_am1.crd m001_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf >>>> m033_am1.crd m033_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf >>>> m220_am1.crd m220_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf >>>> m161_am1.crd m161_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf >>>> m032_am1.crd m032_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf >>>> m160_am1.crd m160_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS 
shared/wrapper.sh >>>> antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf >>>> m130_am1.crd m130_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf >>>> m071_am1.crd m071_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf >>>> m199_am1.crd m199_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf >>>> m002_am1.crd m002_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf >>>> m175_am1.crd m175_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf >>>> m234_am1.crd m234_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf >>>> m107_am1.crd m107_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf >>>> m048_am1.crd m048_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf >>>> m124_am1.crd m124_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf >>>> m047_am1.crd m047_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf >>>> m106_am1.crd m106_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf >>>> m193_am1.crd m193_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> 
antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf >>>> m225_am1.crd m225_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf >>>> m066_am1.crd m066_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf >>>> m125_am1.crd m125_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf >>>> m194_am1.crd m194_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf >>>> m176_am1.crd m176_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf >>>> m224_am1.crd m224_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf >>>> m235_am1.crd m235_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf >>>> m165_am1.crd m165_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf >>>> m067_am1.crd m067_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf >>>> m049_am1.crd m049_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf >>>> m126_am1.crd m126_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf >>>> m166_am1.crd m166_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-a9lc8efi stdout.txt 
stderr.txt m108_am1 m108_am1.rtf >>>> m108_am1.crd m108_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf >>>> m195_am1.crd m195_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf >>>> m038_am1.crd m038_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf >>>> m059_am1.crd m059_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf >>>> m186_am1.crd m186_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf >>>> m164_am1.crd m164_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf >>>> m036_am1.crd m036_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf >>>> m223_am1.crd m223_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf >>>> m058_am1.crd m058_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo charmm -c bcc >>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>> antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf >>>> m037_am1.crd m037_am1.prm >>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>> m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo charmm -c bcc >>>> >>>> >>>> >>>> Veronika Nefedova wrote: >>>>> Swift thinks that it sent 248 jobs. >>>>> >>>>> nefedova at viper:~/alamines> grep "Running job " >>>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>>> 248 6931 56718 >>>>> nefedova at viper:~/alamines> >>>>> >>>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>>>> >>>>>> Everything is idle, there is no work to be done... 
>>>>>> >>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>>>>> GenericPortalWS_perf_per_sec.txt >>>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>> >>>>>> 24 workers are registered but idle.... queue length 0, 57 jobs >>>>>> completed. >>>>>> >>>>>> Also, see below all 57 jobs, they all finished with an exit code >>>>>> of 0, in other words succesfully! How many jobs does Swift think >>>>>> it sent? >>>>>> >>>>>> Ioan >>>>>> >>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>>> GenericPortalWS_taskPerf.txt >>>>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>>>> ecTime resultsQueueTime totalTime exitCode >>>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >>>>>> 560629 49780 338 15 50133 0 >>>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 >>>>>> 561899 561909 216 699 10 925 0 >>>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 >>>>>> 562150 562159 382 777 9 1168 0 >>>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 1044916 >>>>>> 1044926 62404 10200 10 72614 0 >>>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >>>>>> 1047038 1047067 135 585 29 749 0 >>>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >>>>>> 1053072 1053080 114 6643 8 6765 0 >>>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >>>>>> 1054256 1054290 731 7205 34 7970 0 >>>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >>>>>> 1054570 1054579 7943 303 9 8255 0 >>>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >>>>>> 1056811 1056819 6765 3724 8 10497 0 >>>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >>>>>> 1058691 1058719 8257 4108 28 12393 0 >>>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >>>>>> 1059363 1059385 12373 659 22 13054 0 >>>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >>>>>> 1060315 1060323 10497 3489 8 13994 0 >>>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >>>>>> 1060589 1060596 13042 1214 7 14263 0 >>>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 1060603 >>>>>> 1060954 1061054 14265 351 100 14716 0 >>>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 1060329 >>>>>> 1061094 1061126 13993 765 32 14790 0 >>>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 1061105 >>>>>> 1065608 1065617 14414 4503 9 18926 0 >>>>>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 1065622 >>>>>> 1066307 1066315 18929 685 8 19622 0 >>>>>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 1061045 >>>>>> 1067540 1067563 14356 6495 23 20874 0 >>>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 1066320 >>>>>> 1069262 1069271 19625 2942 9 22576 0 >>>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 1067551 >>>>>> 1071003 1071011 20854 3452 8 24314 0 >>>>>> 21 
urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 1071016 >>>>>> 1071664 1071671 24316 648 7 24971 0 >>>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 1069275 >>>>>> 1071679 1071692 22577 2404 13 24994 0 >>>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 1071687 >>>>>> 1073978 1073988 24985 2291 10 27286 0 >>>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 1073992 >>>>>> 1075959 1075969 27286 1967 10 29263 0 >>>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 1071699 >>>>>> 1076704 1076713 24995 5005 9 30009 0 >>>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 1075972 >>>>>> 1077451 1077459 29264 1479 8 30751 0 >>>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 1076717 >>>>>> 1080157 1080165 30007 3440 8 33455 0 >>>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 1077464 >>>>>> 1080270 1080286 30752 2806 16 33574 0 >>>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 1080170 >>>>>> 1080611 1080619 33457 441 8 33906 0 >>>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 1080624 >>>>>> 1080973 1080983 33907 349 10 34266 0 >>>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 1080281 >>>>>> 1081405 1081413 33566 1124 8 34698 0 >>>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 1080986 >>>>>> 1082989 1082996 34267 2003 7 36277 0 >>>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 1083002 >>>>>> 1083370 1083378 36279 368 8 36655 0 >>>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 1081417 >>>>>> 1084830 1084837 34696 3413 7 38116 0 >>>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 1084843 >>>>>> 1085854 1085879 37761 1011 25 38797 0 >>>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 1085865 >>>>>> 1089502 1089511 38780 3637 9 42426 0 >>>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 1089515 >>>>>> 1089966 1089974 42428 451 8 42887 0 >>>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 1083383 >>>>>> 1091316 1091324 36658 7933 8 44599 0 >>>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 1091329 >>>>>> 1092042 1092049 44237 713 7 44957 0 >>>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 1092055 >>>>>> 1094242 1094249 44960 2187 7 47154 0 >>>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 1089979 >>>>>> 1094418 1094428 42889 4439 10 47338 0 >>>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 1094433 >>>>>> 1095082 1095089 47331 649 7 47987 0 >>>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 1095095 >>>>>> 1096846 1096853 47991 1751 7 49749 0 >>>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 1094256 >>>>>> 1098214 1098221 47156 3958 7 51121 0 >>>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 1096859 >>>>>> 1098627 1098637 49752 1768 10 51530 0 >>>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 1094037 >>>>>> 1098903 1098910 46940 4866 7 51813 0 >>>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 1099192 >>>>>> 1100210 1100246 52071 1018 36 53125 0 >>>>>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 1097371 >>>>>> 1100555 1100562 50260 3184 7 53451 0 >>>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 1097135 >>>>>> 1100896 1100904 50026 3761 8 53795 0 >>>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 1098640 >>>>>> 1101106 1101127 51523 2466 21 54010 0 >>>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 1099965 >>>>>> 
1101217 1101224 52842 1252 7 54101 0 >>>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 1098227 >>>>>> 1101820 1101828 51112 3593 8 54713 0 >>>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 1097375 >>>>>> 1104132 1104139 50262 6757 7 57026 0 >>>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 1100221 >>>>>> 1106449 1106458 53096 6228 9 59333 0 >>>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 1098916 >>>>>> 1106473 1106481 51797 7557 8 59362 0 >>>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>>>>> 1207793 1207801 71 644409 8 644488 0 >>>>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>>>>> 1216404 1216425 98 652991 21 653110 0 >>>>>> >>>>>> >>>>>> >>>>>> Veronika Nefedova wrote: >>>>>>> OK. There is something weird happening. I've got several such >>>>>>> entries in my swift log: >>>>>>> >>>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>>>> exception: Task failed >>>>>>> task:execute @ vdl-int.k, line: 332 >>>>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>>>> >>>>>>> >>>>>>> Looks like antechamber has failed (?). And the failure is only >>>>>>> on a swfit side, it never made it across to Falcon (there are no >>>>>>> remote directories created). But I see some of antechamber jobs >>>>>>> have finished (in shared). >>>>>>> >>>>>>> Yuqing -- could the changes you've made be responsible for these >>>>>>> failures (I do not see how it could though) ? >>>>>>> >>>>>>> Ioan, what do you see in your logs ion these tasks: >>>>>>> >>>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>>>> >>>>>>>> OK! >>>>>>>> Why don't we do one last run from my allocation, as everything >>>>>>>> is set up already and ready to go! Make sure to enable all >>>>>>>> debug logging. Falkon is up and running with all debug enabled! >>>>>>>> >>>>>>>> Falkon location is unchanged from the last experiment. >>>>>>>> Falkon Factory Service: >>>>>>>> http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>>>>> >>>>>>>> Web Server (graphs): >>>>>>>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>>>> >>>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we >>>>>>>> could still get 150~200 processors! >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> Veronika Nefedova wrote: >>>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>>>> Yuqing who fixed the m179 just in time!). We could start again >>>>>>>>> the 244- molecule run to verify that nothing is wrong with the >>>>>>>>> whole system. 
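A quick way to check which of these Failed tasks ever reached Falkon is to look a single task identity up on both sides. The sketch below is illustrative only: the file names and the urn are the ones quoted in this thread, and, as the later grep in this thread suggests, a task that is absent from the Falkon task log most likely failed on the Swift side (for example during stage-in) before it was ever dispatched.

    # Trace one failed task through both logs (file names as quoted in this thread).
    TASK="urn:0-1-56-0-1186429255786"
    grep "$TASK" MolDyn-244-loops-dbui34oxjr4j2.log
    grep "$TASK" GenericPortalWS_taskPerf.txt || \
        echo "$TASK never reached Falkon"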
>>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> I started those 2 molecules via GRAM. I have no trust in m179 >>>>>>>>>> finishing completely since I didn't change anything. I hope >>>>>>>>>> for m050 to finish though... >>>>>>>>>> You can watch the swift log on viper in >>>>>>>>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>>> Then, let's try another run with 244 molecules soon, as most >>>>>>>>>>> of ANL/UC is free! >>>>>>>>>>> >>>>>>>>>>> Ioan >>>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > From nefedova at mcs.anl.gov Mon Aug 6 21:36:04 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 21:36:04 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B7A919.1050507@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> Message-ID: Whats up now? Everything has stopped, no errors on swift site... Do you have any errors now? Nika On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: > OK, I restarted Falkon as well as there were 12K jobs trying to go > through, and keeping the entire ANL/UC site busy, although there > was no Swift on the other end to pick up the notifications... > > here is the new info: > > Falkon Factory Service: http://tg-viz-login2:50020/wsrf/services/ > GenericPortal/core/WS/GPFactoryService > Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm > > Note that I changed the port #, its now 50020, so don't forget to > change that before you start Swift... > > Ioan > > Veronika Nefedova wrote: >> OK. I accidentally closed viper window where I started the >> workflow. The workflow was started with & so it was supposed to >> stay up even if I exited the shell. But apparently it didn't! >> >> This is the last entry in the log: >> >> 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote >> service instance... dummy function, this doesn't really do >> anything... >> >> (and it doesn't change ever since). >> >> What went wrong ? Why closing the shell actually killed the job? >> (ps shows no swift job) >> I checked 'history' and in fact the job was started with &: >> >> 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml - >> debug MolDyn-244-loops.swift & >> >> I'll restart the workflow in 30 mins or so (from home) again. >> >> Sigh... 
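One plausible explanation for the run dying when the shell was closed, even though it was started with '&': backgrounding a job does not protect it from the SIGHUP that an interactive shell delivers to its jobs when the login session ends. A sketch of the usual workarounds, reusing the command line quoted above (the output file name is arbitrary, and 'disown' assumes a bash-like shell):

    # Make the workflow immune to hangup signals and capture its output.
    nohup swift -tc.file tc-uc.data -sites.file sites-uc-64.xml \
        -debug MolDyn-244-loops.swift > swift-244.out 2>&1 &

    # Or, if it was already started with a plain '&', detach it before logging out:
    disown %1

    # Running the command inside 'screen' and detaching would also survive logout.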
>> >> Nika >> >> >> On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: >> >>> Ioan, its all was due to NFS problems, I am convinced now... >>> >>> I restarted the run, the log is ~nefedova/alamines/MolDyn-244- >>> loops-hxl1glhtqsag0.log >>> >>> Nika >>> >>> On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: >>> >>>> Just to debug further.... I picked out 1 task at random from the >>>> Swift log... >>>> iraicu at viper:/home/nefedova/alamines> cat MolDyn-244-loops- >>>> dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791" >>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn: >>>> 0-1-62-0-1186429258791) setting status to Submitted >>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn: >>>> 0-1-62-0-1186429258791) setting status to Active >>>> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, identity=urn: >>>> 0-1-62-0-1186429258791) setting status to Failed Exception in >>>> getFile >>>> >>>> but in my log, it is nowhere to be found... >>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" >>>> >>>> What does "setting status to Failed Exception in getFile" mean? >>>> Could this mean that it failed on the data staging part, and >>>> that it never made it to Falkon? >>>> >>>> BTW, it lloks as if there were really 539 jobs submitted... >>>> >>>> iraicu at viper:/home/nefedova/alamines> grep "Submitted" >>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>> 539 5390 62835 >>>> >>>> but again, only 57 made it to Falkon, and there were no >>>> exceptions thrown anywhere to indicate that something unusual >>>> happened. >>>> >>>> Ioan >>>> >>>> Ioan Raicu wrote: >>>>> Falkon only has 57 tasks received, here they are: >>>>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/ >>>>> service/logs/GenericPortalWS.txt.0.summary >>>>> >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh pre_ch-vsk58efi stdout.txt stderr.txt . 
./ >>>>> m179.mol2 ./m050.mol2 m179_am1 m050_am1 /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/pre-antch.pl >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-xsk58efi stdout.txt stderr.txt m179_am1 >>>>> m179_am1.rtf m179_am1.crd m179_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m179_am1 -fi mol2 -rn >>>>> m179 -o m179_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-ysk58efi stdout.txt stderr.txt m050_am1 >>>>> m050_am1.rtf m050_am1.crd m050_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m050_am1 -fi mol2 -rn >>>>> m050 -o m050_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh chrm-0tk58efi equil_solv.out_m050 stderr.txt >>>>> equil_solv.inp parm03_gaff_all.rtf parm03_gaffnb_all.prm >>>>> equil_solv.inp m050_am1.rtf m050_am1.prm m050_am1.crd >>>>> water_400.crd equil_solv.out_m050 solv_m050.psf >>>>> solv_m050_eq.crd solv_m050.rst solv_m050.trj >>>>> solv_m050_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>>>> charmm.sh system:solv_m050 title:solv stitle:m050 >>>>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >>>>> gaff:m050_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 >>>>> rwater:15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh chrm-zsk58efi equil_solv.out_m179 stderr.txt >>>>> equil_solv.inp parm03_gaff_all.rtf parm03_gaffnb_all.prm >>>>> equil_solv.inp m179_am1.rtf m179_am1.prm m179_am1.crd >>>>> water_400.crd equil_solv.out_m179 solv_m179.psf >>>>> solv_m179_eq.crd solv_m179.rst solv_m179.trj >>>>> solv_m179_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>>>> charmm.sh system:solv_m179 title:solv stitle:m179 >>>>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >>>>> gaff:m179_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 >>>>> rwater:15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh pre_ch-38lc8efi stdout.txt stderr.txt . 
./ >>>>> m197.mol2 ./m129.mol2 ./m069.mol2 ./m163.mol2 ./m128.mol2 ./ >>>>> m035.mol2 ./m070.mol2 ./m221.mol2 ./m162.mol2 ./m198.mol2 ./ >>>>> m034.mol2 ./m001.mol2 ./m220.mol2 ./m033.mol2 ./m161.mol2 ./ >>>>> m032.mol2 ./m160.mol2 ./m130.mol2 ./m071.mol2 ./m002.mol2 ./ >>>>> m199.mol2 ./m175.mol2 ./m234.mol2 ./m048.mol2 ./m107.mol2 ./ >>>>> m047.mol2 ./m106.mol2 ./m124.mol2 ./m193.mol2 ./m225.mol2 ./ >>>>> m066.mol2 ./m125.mol2 ./m176.mol2 ./m194.mol2 ./m224.mol2 ./ >>>>> m235.mol2 ./m067.mol2 ./m165.mol2 ./m049.mol2 ./m126.mol2 ./ >>>>> m166.mol2 ./m108.mol2 ./m195.mol2 ./m038.mol2 ./m059.mol2 ./ >>>>> m036.mol2 ./m186.mol2 ./m164.mol2 ./m117.mol2 ./m223.mol2 ./ >>>>> m058.mol2 ./m037.mol2 ./m188.mol2 ./m068.mol2 ./m119.mol2 ./ >>>>> m187.mol2 ./m196.mol2 ./m118.mol2 ./m127.mol2 ./m222.mol2 ./ >>>>> m189.mol2 ./m060.mol2 ./m236.mol2 ./m109.mol2 ./m177.mol2 ./ >>>>> m050.mol2 ./m179.mol2 ./m178.mol2 ./m123.mol2 ./m237.mol2 ./ >>>>> m110.mol2 ./m191.mol2 ./m100.mol2 ./m064.mol2 ./m041.mol2 ./ >>>>> m238.mol2 ./m063.mol2 ./m228.mol2 ./m051.mol2 ./m122.mol2 ./ >>>>> m169.mol2 ./m121.mol2 ./m190.mol2 ./m120.mol2 ./m062.mol2 ./ >>>>> m065.mol2 ./m039.mol2 ./m192.mol2 ./m167.mol2 ./m227.mol2 ./ >>>>> m040.mol2 ./m226.mol2 ./m168.mol2 ./m239.mol2 ./m052.mol2 ./ >>>>> m111.mol2 ./m180.mol2 ./m053.mol2 ./m112.mol2 ./m181.mol2 ./ >>>>> m240.mol2 ./m054.mol2 ./m044.mol2 ./m113.mol2 ./m230.mol2 ./ >>>>> m103.mol2 ./m229.mol2 ./m061.mol2 ./m042.mol2 ./m101.mol2 ./ >>>>> m170.mol2 ./m043.mol2 ./m102.mol2 ./m171.mol2 ./m151.mol2 ./ >>>>> m083.mol2 ./m210.mol2 ./m014.mol2 ./m023.mol2 ./m200.mol2 ./ >>>>> m092.mol2 ./m091.mol2 ./m150.mol2 ./m209.mol2 ./m022.mol2 ./ >>>>> m024.mol2 ./m093.mol2 ./m015.mol2 ./m084.mol2 ./m142.mol2 ./ >>>>> m201.mol2 ./m016.mol2 ./m085.mol2 ./m143.mol2 ./m202.mol2 ./ >>>>> m010.mol2 ./m212.mol2 ./m138.mol2 ./m026.mol2 ./m011.mol2 ./ >>>>> m095.mol2 ./m139.mol2 ./m154.mol2 ./m211.mol2 ./m025.mol2 ./ >>>>> m094.mol2 ./m153.mol2 ./m213.mol2 ./m080.mol2 ./m012.mol2 ./ >>>>> m152.mol2 ./m081.mol2 ./m140.mol2 ./m013.mol2 ./m082.mol2 ./ >>>>> m141.mol2 ./m028.mol2 ./m097.mol2 ./m155.mol2 ./m008.mol2 ./ >>>>> m214.mol2 ./m135.mol2 ./m029.mol2 ./m076.mol2 ./m098.mol2 ./ >>>>> m007.mol2 ./m156.mol2 ./m134.mol2 ./m215.mol2 ./m137.mol2 ./ >>>>> m079.mol2 ./m009.mol2 ./m078.mol2 ./m077.mol2 ./m096.mol2 ./ >>>>> m136.mol2 ./m027.mol2 ./m132.mol2 ./m158.mol2 ./m073.mol2 ./ >>>>> m217.mol2 ./m030.mol2 ./m159.mol2 ./m072.mol2 ./m218.mol2 ./ >>>>> m003.mol2 ./m031.mol2 ./m004.mol2 ./m219.mol2 ./m131.mol2 ./ >>>>> m074.mol2 ./m133.mol2 ./m006.mol2 ./m075.mol2 ./m157.mol2 ./ >>>>> m099.mol2 ./m005.mol2 ./m216.mol2 ./m090.mol2 ./m021.mol2 ./ >>>>> m208.mol2 ./m149.mol2 ./m020.mol2 ./m207.mol2 ./m148.mol2 ./ >>>>> m088.mol2 ./m089.mol2 ./m206.mol2 ./m147.mol2 ./m019.mol2 ./ >>>>> m205.mol2 ./m146.mol2 ./m087.mol2 ./m018.mol2 ./m204.mol2 ./ >>>>> m145.mol2 ./m086.mol2 ./m017.mol2 ./m144.mol2 ./m203.mol2 ./ >>>>> m057.mol2 ./m116.mol2 ./m232.mol2 ./m173.mol2 ./m105.mol2 ./ >>>>> m046.mol2 ./m231.mol2 ./m172.mol2 ./m104.mol2 ./m045.mol2 ./ >>>>> m174.mol2 ./m233.mol2 ./m244.mol2 ./m185.mol2 ./m182.mol2 ./ >>>>> m243.mol2 ./m055.mol2 ./m241.mol2 ./m183.mol2 ./m114.mol2 ./ >>>>> m056.mol2 ./m242.mol2 ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 >>>>> m069_am1 m163_am1 m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 >>>>> m198_am1 m034_am1 m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 >>>>> m160_am1 m130_am1 m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 >>>>> m048_am1 m107_am1 m047_am1 m106_am1 m124_am1 
m193_am1 m225_am1 >>>>> m066_am1 m125_am1 m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 >>>>> m165_am1 m049_am1 m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 >>>>> m059_am1 m036_am1 m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 >>>>> m058_am1 m068_am1 m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 >>>>> m127_am1 m118_am1 m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 >>>>> m050_am1 m179_am1 m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 >>>>> m110_am1 m041_am1 m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 >>>>> m122_am1 m051_am1 m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 >>>>> m065_am1 m167_am1 m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 >>>>> m239_am1 m052_am1 m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 >>>>> m240_am1 m054_am1 m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 >>>>> m061_am1 m042_am1 m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 >>>>> m151_am1 m083_am1 m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 >>>>> m091_am1 m150_am1 m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 >>>>> m084_am1 m142_am1 m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 >>>>> m010_am1 m212_am1 m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 >>>>> m154_am1 m211_am1 m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 >>>>> m012_am1 m152_am1 m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 >>>>> m028_am1 m097_am1 m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 >>>>> m076_am1 m098_am1 m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 >>>>> m079_am1 m009_am1 m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 >>>>> m132_am1 m158_am1 m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 >>>>> m218_am1 m003_am1 m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 >>>>> m133_am1 m006_am1 m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 >>>>> m090_am1 m021_am1 m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 >>>>> m089_am1 m088_am1 m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 >>>>> m087_am1 m018_am1 m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 >>>>> m203_am1 m057_am1 m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 >>>>> m231_am1 m172_am1 m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 >>>>> m185_am1 m182_am1 m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 >>>>> m056_am1 m242_am1 m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ >>>>> ModLyn/bin/pre-antch.pl >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-58lc8efi stdout.txt stderr.txt m197_am1 >>>>> m197_am1.rtf m197_am1.crd m197_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m197_am1 -fi mol2 -rn >>>>> m197 -o m197_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-48lc8efi stdout.txt stderr.txt m129_am1 >>>>> m129_am1.rtf m129_am1.crd m129_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m129_am1 -fi mol2 -rn >>>>> m129 -o m129_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-68lc8efi stdout.txt stderr.txt m069_am1 >>>>> m069_am1.rtf m069_am1.crd m069_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m069_am1 -fi mol2 -rn >>>>> m069 -o m069_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-88lc8efi stdout.txt stderr.txt m163_am1 >>>>> m163_am1.rtf m163_am1.crd m163_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m163_am1 -fi mol2 -rn >>>>> m163 -o m163_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-78lc8efi stdout.txt stderr.txt m128_am1 >>>>> m128_am1.rtf m128_am1.crd 
m128_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m128_am1 -fi mol2 -rn >>>>> m128 -o m128_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-98lc8efi stdout.txt stderr.txt m035_am1 >>>>> m035_am1.rtf m035_am1.crd m035_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m035_am1 -fi mol2 -rn >>>>> m035 -o m035_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-a8lc8efi stdout.txt stderr.txt m070_am1 >>>>> m070_am1.rtf m070_am1.crd m070_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m070_am1 -fi mol2 -rn >>>>> m070 -o m070_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-b8lc8efi stdout.txt stderr.txt m221_am1 >>>>> m221_am1.rtf m221_am1.crd m221_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m221_am1 -fi mol2 -rn >>>>> m221 -o m221_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-c8lc8efi stdout.txt stderr.txt m162_am1 >>>>> m162_am1.rtf m162_am1.crd m162_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m162_am1 -fi mol2 -rn >>>>> m162 -o m162_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-d8lc8efi stdout.txt stderr.txt m198_am1 >>>>> m198_am1.rtf m198_am1.crd m198_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m198_am1 -fi mol2 -rn >>>>> m198 -o m198_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-e8lc8efi stdout.txt stderr.txt m034_am1 >>>>> m034_am1.rtf m034_am1.crd m034_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m034_am1 -fi mol2 -rn >>>>> m034 -o m034_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-f8lc8efi stdout.txt stderr.txt m001_am1 >>>>> m001_am1.rtf m001_am1.crd m001_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m001_am1 -fi mol2 -rn >>>>> m001 -o m001_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-h8lc8efi stdout.txt stderr.txt m033_am1 >>>>> m033_am1.rtf m033_am1.crd m033_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m033_am1 -fi mol2 -rn >>>>> m033 -o m033_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-g8lc8efi stdout.txt stderr.txt m220_am1 >>>>> m220_am1.rtf m220_am1.crd m220_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m220_am1 -fi mol2 -rn >>>>> m220 -o m220_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-i8lc8efi stdout.txt stderr.txt m161_am1 >>>>> m161_am1.rtf m161_am1.crd m161_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m161_am1 -fi mol2 -rn >>>>> m161 -o m161_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-j8lc8efi stdout.txt stderr.txt m032_am1 >>>>> m032_am1.rtf m032_am1.crd m032_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m032_am1 -fi mol2 -rn >>>>> m032 -o m032_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-k8lc8efi 
stdout.txt stderr.txt m160_am1 >>>>> m160_am1.rtf m160_am1.crd m160_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m160_am1 -fi mol2 -rn >>>>> m160 -o m160_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-l8lc8efi stdout.txt stderr.txt m130_am1 >>>>> m130_am1.rtf m130_am1.crd m130_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m130_am1 -fi mol2 -rn >>>>> m130 -o m130_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-m8lc8efi stdout.txt stderr.txt m071_am1 >>>>> m071_am1.rtf m071_am1.crd m071_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m071_am1 -fi mol2 -rn >>>>> m071 -o m071_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-o8lc8efi stdout.txt stderr.txt m199_am1 >>>>> m199_am1.rtf m199_am1.crd m199_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m199_am1 -fi mol2 -rn >>>>> m199 -o m199_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-n8lc8efi stdout.txt stderr.txt m002_am1 >>>>> m002_am1.rtf m002_am1.crd m002_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m002_am1 -fi mol2 -rn >>>>> m002 -o m002_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-p8lc8efi stdout.txt stderr.txt m175_am1 >>>>> m175_am1.rtf m175_am1.crd m175_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m175_am1 -fi mol2 -rn >>>>> m175 -o m175_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-q8lc8efi stdout.txt stderr.txt m234_am1 >>>>> m234_am1.rtf m234_am1.crd m234_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m234_am1 -fi mol2 -rn >>>>> m234 -o m234_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-s8lc8efi stdout.txt stderr.txt m107_am1 >>>>> m107_am1.rtf m107_am1.crd m107_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m107_am1 -fi mol2 -rn >>>>> m107 -o m107_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-r8lc8efi stdout.txt stderr.txt m048_am1 >>>>> m048_am1.rtf m048_am1.crd m048_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m048_am1 -fi mol2 -rn >>>>> m048 -o m048_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-v8lc8efi stdout.txt stderr.txt m124_am1 >>>>> m124_am1.rtf m124_am1.crd m124_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m124_am1 -fi mol2 -rn >>>>> m124 -o m124_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-t8lc8efi stdout.txt stderr.txt m047_am1 >>>>> m047_am1.rtf m047_am1.crd m047_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m047_am1 -fi mol2 -rn >>>>> m047 -o m047_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-u8lc8efi stdout.txt stderr.txt m106_am1 >>>>> m106_am1.rtf m106_am1.crd m106_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m106_am1 -fi mol2 -rn >>>>> m106 -o m106_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE 
/bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-x8lc8efi stdout.txt stderr.txt m193_am1 >>>>> m193_am1.rtf m193_am1.crd m193_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m193_am1 -fi mol2 -rn >>>>> m193 -o m193_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-y8lc8efi stdout.txt stderr.txt m225_am1 >>>>> m225_am1.rtf m225_am1.crd m225_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m225_am1 -fi mol2 -rn >>>>> m225 -o m225_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-z8lc8efi stdout.txt stderr.txt m066_am1 >>>>> m066_am1.rtf m066_am1.crd m066_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m066_am1 -fi mol2 -rn >>>>> m066 -o m066_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-09lc8efi stdout.txt stderr.txt m125_am1 >>>>> m125_am1.rtf m125_am1.crd m125_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m125_am1 -fi mol2 -rn >>>>> m125 -o m125_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-29lc8efi stdout.txt stderr.txt m194_am1 >>>>> m194_am1.rtf m194_am1.crd m194_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m194_am1 -fi mol2 -rn >>>>> m194 -o m194_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-19lc8efi stdout.txt stderr.txt m176_am1 >>>>> m176_am1.rtf m176_am1.crd m176_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m176_am1 -fi mol2 -rn >>>>> m176 -o m176_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-39lc8efi stdout.txt stderr.txt m224_am1 >>>>> m224_am1.rtf m224_am1.crd m224_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m224_am1 -fi mol2 -rn >>>>> m224 -o m224_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-49lc8efi stdout.txt stderr.txt m235_am1 >>>>> m235_am1.rtf m235_am1.crd m235_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m235_am1 -fi mol2 -rn >>>>> m235 -o m235_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-69lc8efi stdout.txt stderr.txt m165_am1 >>>>> m165_am1.rtf m165_am1.crd m165_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m165_am1 -fi mol2 -rn >>>>> m165 -o m165_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-59lc8efi stdout.txt stderr.txt m067_am1 >>>>> m067_am1.rtf m067_am1.crd m067_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m067_am1 -fi mol2 -rn >>>>> m067 -o m067_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-79lc8efi stdout.txt stderr.txt m049_am1 >>>>> m049_am1.rtf m049_am1.crd m049_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m049_am1 -fi mol2 -rn >>>>> m049 -o m049_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-89lc8efi stdout.txt stderr.txt m126_am1 >>>>> m126_am1.rtf m126_am1.crd m126_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m126_am1 -fi mol2 -rn >>>>> m126 -o 
m126_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-99lc8efi stdout.txt stderr.txt m166_am1 >>>>> m166_am1.rtf m166_am1.crd m166_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m166_am1 -fi mol2 -rn >>>>> m166 -o m166_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-a9lc8efi stdout.txt stderr.txt m108_am1 >>>>> m108_am1.rtf m108_am1.crd m108_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m108_am1 -fi mol2 -rn >>>>> m108 -o m108_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-b9lc8efi stdout.txt stderr.txt m195_am1 >>>>> m195_am1.rtf m195_am1.crd m195_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m195_am1 -fi mol2 -rn >>>>> m195 -o m195_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-d9lc8efi stdout.txt stderr.txt m038_am1 >>>>> m038_am1.rtf m038_am1.crd m038_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m038_am1 -fi mol2 -rn >>>>> m038 -o m038_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-c9lc8efi stdout.txt stderr.txt m059_am1 >>>>> m059_am1.rtf m059_am1.crd m059_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m059_am1 -fi mol2 -rn >>>>> m059 -o m059_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-e9lc8efi stdout.txt stderr.txt m186_am1 >>>>> m186_am1.rtf m186_am1.crd m186_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m186_am1 -fi mol2 -rn >>>>> m186 -o m186_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-f9lc8efi stdout.txt stderr.txt m164_am1 >>>>> m164_am1.rtf m164_am1.crd m164_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m164_am1 -fi mol2 -rn >>>>> m164 -o m164_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-h9lc8efi stdout.txt stderr.txt m036_am1 >>>>> m036_am1.rtf m036_am1.crd m036_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m036_am1 -fi mol2 -rn >>>>> m036 -o m036_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-g9lc8efi stdout.txt stderr.txt m223_am1 >>>>> m223_am1.rtf m223_am1.crd m223_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m223_am1 -fi mol2 -rn >>>>> m223 -o m223_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-j9lc8efi stdout.txt stderr.txt m058_am1 >>>>> m058_am1.rtf m058_am1.crd m058_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m058_am1 -fi mol2 -rn >>>>> m058 -o m058_am1 -fo charmm -c bcc >>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>> wrapper.sh antch-k9lc8efi stdout.txt stderr.txt m037_am1 >>>>> m037_am1.rtf m037_am1.crd m037_am1.prm /disks/scratchgpfs1/ >>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m037_am1 -fi mol2 -rn >>>>> m037 -o m037_am1 -fo charmm -c bcc >>>>> >>>>> >>>>> >>>>> Veronika Nefedova wrote: >>>>>> Swift thinks that it sent 248 jobs. 
>>>>>> >>>>>> nefedova at viper:~/alamines> grep "Running job " MolDyn-244- >>>>>> loops-dbui34oxjr4j2.log | wc >>>>>> 248 6931 56718 >>>>>> nefedova at viper:~/alamines> >>>>>> >>>>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>>>>> >>>>>>> Everything is idle, there is no work to be done... >>>>>>> >>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>>>>>> GenericPortalWS_perf_per_sec.txt >>>>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>> >>>>>>> 24 workers are registered but idle.... queue length 0, 57 >>>>>>> jobs completed. >>>>>>> >>>>>>> Also, see below all 57 jobs, they all finished with an exit >>>>>>> code of 0, in other words succesfully! How many jobs does >>>>>>> Swift think it sent? >>>>>>> >>>>>>> Ioan >>>>>>> >>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>>>> GenericPortalWS_taskPerf.txt >>>>>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>>>>> ecTime resultsQueueTime totalTime exitCode >>>>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 >>>>>>> 560614 560629 49780 338 15 50133 0 >>>>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 >>>>>>> 561899 561909 216 699 10 925 0 >>>>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 >>>>>>> 562150 562159 382 777 9 1168 0 >>>>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 >>>>>>> 1044916 1044926 62404 10200 10 72614 0 >>>>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 >>>>>>> 1046453 1047038 1047067 135 585 29 749 0 >>>>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 >>>>>>> 1046429 1053072 1053080 114 6643 8 6765 0 >>>>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 >>>>>>> 1047051 1054256 1054290 731 7205 34 7970 0 >>>>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 >>>>>>> 1054267 1054570 1054579 7943 303 9 8255 0 >>>>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 >>>>>>> 1053087 1056811 1056819 6765 3724 8 10497 0 >>>>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 >>>>>>> 1054583 1058691 1058719 8257 4108 28 12393 0 >>>>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 >>>>>>> 1058704 1059363 1059385 12373 659 22 13054 0 >>>>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 >>>>>>> 1056826 1060315 1060323 10497 3489 8 13994 0 >>>>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 >>>>>>> 1059375 1060589 1060596 13042 1214 7 14263 0 >>>>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 >>>>>>> 1060603 1060954 1061054 14265 351 100 14716 0 >>>>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 >>>>>>> 1060329 1061094 1061126 13993 765 32 14790 0 >>>>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 >>>>>>> 1061105 1065608 1065617 14414 4503 9 18926 0 >>>>>>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 >>>>>>> 1065622 1066307 1066315 18929 685 8 19622 0 >>>>>>> 18 
urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 >>>>>>> 1061045 1067540 1067563 14356 6495 23 20874 0 >>>>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 >>>>>>> 1066320 1069262 1069271 19625 2942 9 22576 0 >>>>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 >>>>>>> 1067551 1071003 1071011 20854 3452 8 24314 0 >>>>>>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 >>>>>>> 1071016 1071664 1071671 24316 648 7 24971 0 >>>>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 >>>>>>> 1069275 1071679 1071692 22577 2404 13 24994 0 >>>>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 >>>>>>> 1071687 1073978 1073988 24985 2291 10 27286 0 >>>>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 >>>>>>> 1073992 1075959 1075969 27286 1967 10 29263 0 >>>>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 >>>>>>> 1071699 1076704 1076713 24995 5005 9 30009 0 >>>>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 >>>>>>> 1075972 1077451 1077459 29264 1479 8 30751 0 >>>>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 >>>>>>> 1076717 1080157 1080165 30007 3440 8 33455 0 >>>>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 >>>>>>> 1077464 1080270 1080286 30752 2806 16 33574 0 >>>>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 >>>>>>> 1080170 1080611 1080619 33457 441 8 33906 0 >>>>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 >>>>>>> 1080624 1080973 1080983 33907 349 10 34266 0 >>>>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 >>>>>>> 1080281 1081405 1081413 33566 1124 8 34698 0 >>>>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 >>>>>>> 1080986 1082989 1082996 34267 2003 7 36277 0 >>>>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 >>>>>>> 1083002 1083370 1083378 36279 368 8 36655 0 >>>>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 >>>>>>> 1081417 1084830 1084837 34696 3413 7 38116 0 >>>>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 >>>>>>> 1084843 1085854 1085879 37761 1011 25 38797 0 >>>>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 >>>>>>> 1085865 1089502 1089511 38780 3637 9 42426 0 >>>>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 >>>>>>> 1089515 1089966 1089974 42428 451 8 42887 0 >>>>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 >>>>>>> 1083383 1091316 1091324 36658 7933 8 44599 0 >>>>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 >>>>>>> 1091329 1092042 1092049 44237 713 7 44957 0 >>>>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 >>>>>>> 1092055 1094242 1094249 44960 2187 7 47154 0 >>>>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 >>>>>>> 1089979 1094418 1094428 42889 4439 10 47338 0 >>>>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 >>>>>>> 1094433 1095082 1095089 47331 649 7 47987 0 >>>>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 >>>>>>> 1095095 1096846 1096853 47991 1751 7 49749 0 >>>>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 >>>>>>> 1094256 1098214 1098221 47156 3958 7 51121 0 >>>>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 >>>>>>> 1096859 1098627 1098637 49752 1768 10 51530 0 >>>>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 >>>>>>> 1094037 1098903 1098910 46940 4866 7 51813 0 >>>>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 >>>>>>> 1099192 1100210 1100246 52071 1018 36 53125 0 >>>>>>> 48 
urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 >>>>>>> 1097371 1100555 1100562 50260 3184 7 53451 0 >>>>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 >>>>>>> 1097135 1100896 1100904 50026 3761 8 53795 0 >>>>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 >>>>>>> 1098640 1101106 1101127 51523 2466 21 54010 0 >>>>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 >>>>>>> 1099965 1101217 1101224 52842 1252 7 54101 0 >>>>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 >>>>>>> 1098227 1101820 1101828 51112 3593 8 54713 0 >>>>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 >>>>>>> 1097375 1104132 1104139 50262 6757 7 57026 0 >>>>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 >>>>>>> 1100221 1106449 1106458 53096 6228 9 59333 0 >>>>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 >>>>>>> 1098916 1106473 1106481 51797 7557 8 59362 0 >>>>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>>>>>> 1207793 1207801 71 644409 8 644488 0 >>>>>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>>>>>> 1216404 1216425 98 652991 21 653110 0 >>>>>>> >>>>>>> >>>>>>> >>>>>>> Veronika Nefedova wrote: >>>>>>>> OK. There is something weird happening. I've got several >>>>>>>> such entries in my swift log: >>>>>>>> >>>>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>>>>> exception: Task failed >>>>>>>> task:execute @ vdl-int.k, line: 332 >>>>>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>>>>> >>>>>>>> >>>>>>>> Looks like antechamber has failed (?). And the failure is >>>>>>>> only on a swfit side, it never made it across to Falcon >>>>>>>> (there are no remote directories created). But I see some of >>>>>>>> antechamber jobs have finished (in shared). >>>>>>>> >>>>>>>> Yuqing -- could the changes you've made be responsible for >>>>>>>> these failures (I do not see how it could though) ? >>>>>>>> >>>>>>>> Ioan, what do you see in your logs ion these tasks: >>>>>>>> >>>>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>>>>> >>>>>>>>> OK! >>>>>>>>> Why don't we do one last run from my allocation, as >>>>>>>>> everything is set up already and ready to go! Make sure to >>>>>>>>> enable all debug logging. Falkon is up and running with >>>>>>>>> all debug enabled! >>>>>>>>> >>>>>>>>> Falkon location is unchanged from the last experiment. 
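
Aside (a reading aid for the GenericPortalWS_taskPerf.txt dump above, not part of the original message): the columns are consistent with each other, e.g. for task 1, waitQueueTime + execTime + resultsQueueTime = 49780 + 338 + 15 = 50133 = totalTime, and endTimeStamp - startTimeStamp = 560629 - 510496 = 50133 as well. The last column is the exit code, which is what the per-run success tallies in this thread are read from. (The time unit is not stated in the log; it is presumably milliseconds.)
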
>>>>>>>>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/ >>>>>>>>> services/GenericPortal/core/WS/GPFactoryService >>>>>>>>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org: >>>>>>>>> 51000/index.htm >>>>>>>>> >>>>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we >>>>>>>>> could still get 150~200 processors! >>>>>>>>> >>>>>>>>> Ioan >>>>>>>>> >>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>>>>> Yuqing who fixed the m179 just in time!). We could start >>>>>>>>>> again the 244- molecule run to verify that nothing is >>>>>>>>>> wrong with the whole system. >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> I started those 2 molecules via GRAM. I have no trust in >>>>>>>>>>> m179 finishing completely since I didn't change anything. >>>>>>>>>>> I hope for m050 to finish though... >>>>>>>>>>> You can watch the swift log on viper in ~nefedova/ >>>>>>>>>>> alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>>> Then, let's try another run with 244 molecules soon, as >>>>>>>>>>>> most of ANL/UC is free! >>>>>>>>>>>> >>>>>>>>>>>> Ioan >>>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> >> > From iraicu at cs.uchicago.edu Mon Aug 6 21:51:16 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 21:51:16 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> Message-ID: <46B7DE24.7030600@cs.uchicago.edu> I have 7959 jobs completed with an exit code of 0, no failed jobs! All the Falkon logs point to the same 7959 number of jobs, and when they were all completed, no new jobs came in from Swift... How many jobs do you see submitted, and how many have been completed in the Swift logs? Everything looks 100% normal on Falkon's end. Ioan Veronika Nefedova wrote: > Whats up now? Everything has stopped, no errors on swift site... > Do you have any errors now? > > Nika > > On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: > >> OK, I restarted Falkon as well as there were 12K jobs trying to go >> through, and keeping the entire ANL/UC site busy, although there was >> no Swift on the other end to pick up the notifications... 
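
Aside (not part of the original messages): the question above, how many jobs Swift thinks it submitted versus how many Falkon completed, can be read straight out of the two logs named in this thread. A minimal sketch, using the log file names quoted above and the status strings that appear in them; the exit-code tally relies on exitCode being the last column of GenericPortalWS_taskPerf.txt:

    SWIFT_LOG=MolDyn-244-loops-dbui34oxjr4j2.log        # Swift log on viper (name from this thread)
    FALKON_LOG=GenericPortalWS_taskPerf.txt             # Falkon task log on tg-viz-login2

    grep -c "Running job "                "$SWIFT_LOG"  # jobs Swift reports as running
    grep -c "setting status to Submitted" "$SWIFT_LOG"  # tasks handed to providers (includes file transfers)
    grep -c "setting status to Failed"    "$SWIFT_LOG"  # tasks Swift marked Failed

    grep -v '^//' "$FALKON_LOG" | wc -l                               # tasks Falkon completed
    grep -v '^//' "$FALKON_LOG" | awk '{print $NF}' | sort | uniq -c  # tally by exit code (last column)
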
>> >> here is the new info: >> >> Falkon Factory Service: >> http://tg-viz-login2:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >> >> Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >> >> Note that I changed the port #, its now 50020, so don't forget to >> change that before you start Swift... >> >> Ioan >> >> Veronika Nefedova wrote: >>> OK. I accidentally closed viper window where I started the workflow. >>> The workflow was started with & so it was supposed to stay up even >>> if I exited the shell. But apparently it didn't! >>> >>> This is the last entry in the log: >>> >>> 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote service >>> instance... dummy function, this doesn't really do anything... >>> >>> (and it doesn't change ever since). >>> >>> What went wrong ? Why closing the shell actually killed the job? (ps >>> shows no swift job) >>> I checked 'history' and in fact the job was started with &: >>> >>> 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug >>> MolDyn-244-loops.swift & >>> >>> I'll restart the workflow in 30 mins or so (from home) again. >>> >>> Sigh... >>> >>> Nika >>> >>> >>> On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: >>> >>>> Ioan, its all was due to NFS problems, I am convinced now... >>>> >>>> I restarted the run, the log is >>>> ~nefedova/alamines/MolDyn-244-loops-hxl1glhtqsag0.log >>>> >>>> Nika >>>> >>>> On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: >>>> >>>>> Just to debug further.... I picked out 1 task at random from the >>>>> Swift log... >>>>> iraicu at viper:/home/nefedova/alamines> cat >>>>> MolDyn-244-loops-dbui34oxjr4j2.log | grep >>>>> "urn:0-1-62-0-1186429258791" >>>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>>>> identity=urn:0-1-62-0-1186429258791) setting status to Submitted >>>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>>>> identity=urn:0-1-62-0-1186429258791) setting status to Active >>>>> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, >>>>> identity=urn:0-1-62-0-1186429258791) setting status to Failed >>>>> Exception in getFile >>>>> >>>>> but in my log, it is nowhere to be found... >>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" >>>>> >>>>> What does "setting status to Failed Exception in getFile" mean? >>>>> Could this mean that it failed on the data staging part, and that >>>>> it never made it to Falkon? >>>>> >>>>> BTW, it lloks as if there were really 539 jobs submitted... >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "Submitted" >>>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>>> 539 5390 62835 >>>>> >>>>> but again, only 57 made it to Falkon, and there were no exceptions >>>>> thrown anywhere to indicate that something unusual happened. >>>>> >>>>> Ioan >>>>> >>>>> Ioan Raicu wrote: >>>>>> Falkon only has 57 tasks received, here they are: >>>>>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary >>>>>> >>>>>> >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> pre_ch-vsk58efi stdout.txt stderr.txt . 
./m179.mol2 ./m050.mol2 >>>>>> m179_am1 m050_am1 >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf >>>>>> m179_am1.crd m179_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf >>>>>> m050_am1.crd m050_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp >>>>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>>>> m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd >>>>>> equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst >>>>>> solv_m050.trj solv_m050_min.crd >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m050 >>>>>> title:solv stitle:m050 rtffile:parm03_gaff_all.rtf >>>>>> paramfile:parm03_gaffnb_all.prm gaff:m050_am1 nwater:400 >>>>>> ligcrd:lyz rforce:0 iseed:3131887 rwater:15 nstep:10000 >>>>>> minstep:100 skipstep:100 startstep:10000 >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp >>>>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>>>> m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd >>>>>> equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst >>>>>> solv_m179.trj solv_m179_min.crd >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m179 >>>>>> title:solv stitle:m179 rtffile:parm03_gaff_all.rtf >>>>>> paramfile:parm03_gaffnb_all.prm gaff:m179_am1 nwater:400 >>>>>> ligcrd:lyz rforce:0 iseed:3131887 rwater:15 nstep:10000 >>>>>> minstep:100 skipstep:100 startstep:10000 >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> pre_ch-38lc8efi stdout.txt stderr.txt . 
./m197.mol2 ./m129.mol2 >>>>>> ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./m070.mol2 >>>>>> ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./m001.mol2 >>>>>> ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./m160.mol2 >>>>>> ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./m175.mol2 >>>>>> ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./m106.mol2 >>>>>> ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./m125.mol2 >>>>>> ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./m067.mol2 >>>>>> ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./m108.mol2 >>>>>> ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./m186.mol2 >>>>>> ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./m037.mol2 >>>>>> ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./m196.mol2 >>>>>> ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./m060.mol2 >>>>>> ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./m179.mol2 >>>>>> ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./m191.mol2 >>>>>> ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./m063.mol2 >>>>>> ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./m121.mol2 >>>>>> ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./m039.mol2 >>>>>> ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./m226.mol2 >>>>>> ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./m180.mol2 >>>>>> ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./m054.mol2 >>>>>> ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./m229.mol2 >>>>>> ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./m043.mol2 >>>>>> ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./m210.mol2 >>>>>> ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./m091.mol2 >>>>>> ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./m093.mol2 >>>>>> ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./m016.mol2 >>>>>> ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./m212.mol2 >>>>>> ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./m139.mol2 >>>>>> ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./m153.mol2 >>>>>> ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./m081.mol2 >>>>>> ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./m028.mol2 >>>>>> ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./m135.mol2 >>>>>> ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./m156.mol2 >>>>>> ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./m009.mol2 >>>>>> ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./m027.mol2 >>>>>> ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./m030.mol2 >>>>>> ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./m031.mol2 >>>>>> ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./m133.mol2 >>>>>> ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./m005.mol2 >>>>>> ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./m149.mol2 >>>>>> ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./m089.mol2 >>>>>> ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./m146.mol2 >>>>>> ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./m086.mol2 >>>>>> ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./m116.mol2 >>>>>> ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./m231.mol2 >>>>>> ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./m233.mol2 >>>>>> ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./m055.mol2 >>>>>> ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./m242.mol2 >>>>>> ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 m163_am1 >>>>>> m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 m034_am1 >>>>>> m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 m130_am1 >>>>>> m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 m107_am1 >>>>>> m047_am1 m106_am1 m124_am1 
m193_am1 m225_am1 m066_am1 m125_am1 >>>>>> m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 m049_am1 >>>>>> m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 m036_am1 >>>>>> m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 m068_am1 >>>>>> m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 m118_am1 >>>>>> m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 m179_am1 >>>>>> m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 m041_am1 >>>>>> m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 m051_am1 >>>>>> m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 m167_am1 >>>>>> m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 m052_am1 >>>>>> m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 m054_am1 >>>>>> m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 m042_am1 >>>>>> m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 m083_am1 >>>>>> m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 m150_am1 >>>>>> m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 m142_am1 >>>>>> m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 m212_am1 >>>>>> m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 m211_am1 >>>>>> m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 m152_am1 >>>>>> m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 m097_am1 >>>>>> m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 m098_am1 >>>>>> m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 m009_am1 >>>>>> m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 m158_am1 >>>>>> m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 m003_am1 >>>>>> m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 m006_am1 >>>>>> m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 m021_am1 >>>>>> m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 m088_am1 >>>>>> m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 m018_am1 >>>>>> m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 m057_am1 >>>>>> m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 m172_am1 >>>>>> m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 m182_am1 >>>>>> m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 m242_am1 >>>>>> m184_am1 m115_am1 >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf >>>>>> m197_am1.crd m197_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf >>>>>> m129_am1.crd m129_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf >>>>>> m069_am1.crd m069_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf >>>>>> m163_am1.crd m163_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-78lc8efi stdout.txt stderr.txt 
m128_am1 m128_am1.rtf >>>>>> m128_am1.crd m128_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf >>>>>> m035_am1.crd m035_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf >>>>>> m070_am1.crd m070_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf >>>>>> m221_am1.crd m221_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf >>>>>> m162_am1.crd m162_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf >>>>>> m198_am1.crd m198_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf >>>>>> m034_am1.crd m034_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf >>>>>> m001_am1.crd m001_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf >>>>>> m033_am1.crd m033_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf >>>>>> m220_am1.crd m220_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf >>>>>> m161_am1.crd m161_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf >>>>>> m032_am1.crd m032_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 
: EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf >>>>>> m160_am1.crd m160_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf >>>>>> m130_am1.crd m130_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf >>>>>> m071_am1.crd m071_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf >>>>>> m199_am1.crd m199_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf >>>>>> m002_am1.crd m002_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf >>>>>> m175_am1.crd m175_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf >>>>>> m234_am1.crd m234_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf >>>>>> m107_am1.crd m107_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf >>>>>> m048_am1.crd m048_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf >>>>>> m124_am1.crd m124_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf >>>>>> m047_am1.crd m047_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf >>>>>> m106_am1.crd m106_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh 
-s 2 -i >>>>>> m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf >>>>>> m193_am1.crd m193_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf >>>>>> m225_am1.crd m225_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf >>>>>> m066_am1.crd m066_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf >>>>>> m125_am1.crd m125_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf >>>>>> m194_am1.crd m194_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf >>>>>> m176_am1.crd m176_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf >>>>>> m224_am1.crd m224_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf >>>>>> m235_am1.crd m235_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf >>>>>> m165_am1.crd m165_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf >>>>>> m067_am1.crd m067_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf >>>>>> m049_am1.crd m049_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf 
>>>>>> m126_am1.crd m126_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf >>>>>> m166_am1.crd m166_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf >>>>>> m108_am1.crd m108_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf >>>>>> m195_am1.crd m195_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf >>>>>> m038_am1.crd m038_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf >>>>>> m059_am1.crd m059_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf >>>>>> m186_am1.crd m186_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf >>>>>> m164_am1.crd m164_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf >>>>>> m036_am1.crd m036_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf >>>>>> m223_am1.crd m223_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf >>>>>> m058_am1.crd m058_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf >>>>>> m037_am1.crd m037_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo charmm -c bcc >>>>>> >>>>>> >>>>>> >>>>>> Veronika Nefedova 
wrote: >>>>>>> Swift thinks that it sent 248 jobs. >>>>>>> >>>>>>> nefedova at viper:~/alamines> grep "Running job " >>>>>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>>>>> 248 6931 56718 >>>>>>> nefedova at viper:~/alamines> >>>>>>> >>>>>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>>>>>> >>>>>>>> Everything is idle, there is no work to be done... >>>>>>>> >>>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>>>>>>> GenericPortalWS_perf_per_sec.txt >>>>>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> >>>>>>>> 24 workers are registered but idle.... queue length 0, 57 jobs >>>>>>>> completed. >>>>>>>> >>>>>>>> Also, see below all 57 jobs, they all finished with an exit >>>>>>>> code of 0, in other words succesfully! How many jobs does >>>>>>>> Swift think it sent? >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>>>>> GenericPortalWS_taskPerf.txt >>>>>>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>>>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>>>>>> ecTime resultsQueueTime totalTime exitCode >>>>>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >>>>>>>> 560629 49780 338 15 50133 0 >>>>>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 >>>>>>>> 561899 561909 216 699 10 925 0 >>>>>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 >>>>>>>> 562150 562159 382 777 9 1168 0 >>>>>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 >>>>>>>> 1044916 1044926 62404 10200 10 72614 0 >>>>>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >>>>>>>> 1047038 1047067 135 585 29 749 0 >>>>>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >>>>>>>> 1053072 1053080 114 6643 8 6765 0 >>>>>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >>>>>>>> 1054256 1054290 731 7205 34 7970 0 >>>>>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >>>>>>>> 1054570 1054579 7943 303 9 8255 0 >>>>>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >>>>>>>> 1056811 1056819 6765 3724 8 10497 0 >>>>>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >>>>>>>> 1058691 1058719 8257 4108 28 12393 0 >>>>>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >>>>>>>> 1059363 1059385 12373 659 22 13054 0 >>>>>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >>>>>>>> 1060315 1060323 10497 3489 8 13994 0 >>>>>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >>>>>>>> 1060589 1060596 13042 1214 7 14263 0 >>>>>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 >>>>>>>> 1060603 1060954 1061054 14265 351 100 14716 0 >>>>>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 >>>>>>>> 1060329 1061094 1061126 13993 765 32 14790 0 >>>>>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 >>>>>>>> 1061105 1065608 1065617 14414 4503 9 18926 0 >>>>>>>> 17 
urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 >>>>>>>> 1065622 1066307 1066315 18929 685 8 19622 0 >>>>>>>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 >>>>>>>> 1061045 1067540 1067563 14356 6495 23 20874 0 >>>>>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 >>>>>>>> 1066320 1069262 1069271 19625 2942 9 22576 0 >>>>>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 >>>>>>>> 1067551 1071003 1071011 20854 3452 8 24314 0 >>>>>>>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 >>>>>>>> 1071016 1071664 1071671 24316 648 7 24971 0 >>>>>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 >>>>>>>> 1069275 1071679 1071692 22577 2404 13 24994 0 >>>>>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 >>>>>>>> 1071687 1073978 1073988 24985 2291 10 27286 0 >>>>>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 >>>>>>>> 1073992 1075959 1075969 27286 1967 10 29263 0 >>>>>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 >>>>>>>> 1071699 1076704 1076713 24995 5005 9 30009 0 >>>>>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 >>>>>>>> 1075972 1077451 1077459 29264 1479 8 30751 0 >>>>>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 >>>>>>>> 1076717 1080157 1080165 30007 3440 8 33455 0 >>>>>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 >>>>>>>> 1077464 1080270 1080286 30752 2806 16 33574 0 >>>>>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 >>>>>>>> 1080170 1080611 1080619 33457 441 8 33906 0 >>>>>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 >>>>>>>> 1080624 1080973 1080983 33907 349 10 34266 0 >>>>>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 >>>>>>>> 1080281 1081405 1081413 33566 1124 8 34698 0 >>>>>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 >>>>>>>> 1080986 1082989 1082996 34267 2003 7 36277 0 >>>>>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 >>>>>>>> 1083002 1083370 1083378 36279 368 8 36655 0 >>>>>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 >>>>>>>> 1081417 1084830 1084837 34696 3413 7 38116 0 >>>>>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 >>>>>>>> 1084843 1085854 1085879 37761 1011 25 38797 0 >>>>>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 >>>>>>>> 1085865 1089502 1089511 38780 3637 9 42426 0 >>>>>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 >>>>>>>> 1089515 1089966 1089974 42428 451 8 42887 0 >>>>>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 >>>>>>>> 1083383 1091316 1091324 36658 7933 8 44599 0 >>>>>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 >>>>>>>> 1091329 1092042 1092049 44237 713 7 44957 0 >>>>>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 >>>>>>>> 1092055 1094242 1094249 44960 2187 7 47154 0 >>>>>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 >>>>>>>> 1089979 1094418 1094428 42889 4439 10 47338 0 >>>>>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 >>>>>>>> 1094433 1095082 1095089 47331 649 7 47987 0 >>>>>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 >>>>>>>> 1095095 1096846 1096853 47991 1751 7 49749 0 >>>>>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 >>>>>>>> 1094256 1098214 1098221 47156 3958 7 51121 0 >>>>>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 >>>>>>>> 1096859 1098627 1098637 49752 1768 10 51530 0 >>>>>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 >>>>>>>> 1094037 
1098903 1098910 46940 4866 7 51813 0 >>>>>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 >>>>>>>> 1099192 1100210 1100246 52071 1018 36 53125 0 >>>>>>>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 >>>>>>>> 1097371 1100555 1100562 50260 3184 7 53451 0 >>>>>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 >>>>>>>> 1097135 1100896 1100904 50026 3761 8 53795 0 >>>>>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 >>>>>>>> 1098640 1101106 1101127 51523 2466 21 54010 0 >>>>>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 >>>>>>>> 1099965 1101217 1101224 52842 1252 7 54101 0 >>>>>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 >>>>>>>> 1098227 1101820 1101828 51112 3593 8 54713 0 >>>>>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 >>>>>>>> 1097375 1104132 1104139 50262 6757 7 57026 0 >>>>>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 >>>>>>>> 1100221 1106449 1106458 53096 6228 9 59333 0 >>>>>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 >>>>>>>> 1098916 1106473 1106481 51797 7557 8 59362 0 >>>>>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>>>>>>> 1207793 1207801 71 644409 8 644488 0 >>>>>>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>>>>>>> 1216404 1216425 98 652991 21 653110 0 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Veronika Nefedova wrote: >>>>>>>>> OK. There is something weird happening. I've got several such >>>>>>>>> entries in my swift log: >>>>>>>>> >>>>>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>>>>>> exception: Task failed >>>>>>>>> task:execute @ vdl-int.k, line: 332 >>>>>>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>>>>>> >>>>>>>>> >>>>>>>>> Looks like antechamber has failed (?). And the failure is only >>>>>>>>> on a swfit side, it never made it across to Falcon (there are >>>>>>>>> no remote directories created). But I see some of antechamber >>>>>>>>> jobs have finished (in shared). >>>>>>>>> >>>>>>>>> Yuqing -- could the changes you've made be responsible for >>>>>>>>> these failures (I do not see how it could though) ? >>>>>>>>> >>>>>>>>> Ioan, what do you see in your logs ion these tasks: >>>>>>>>> >>>>>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>>>>>> >>>>>>>>>> OK! >>>>>>>>>> Why don't we do one last run from my allocation, as >>>>>>>>>> everything is set up already and ready to go! Make sure to >>>>>>>>>> enable all debug logging. Falkon is up and running with all >>>>>>>>>> debug enabled! 
>>>>>>>>>> >>>>>>>>>> Falkon location is unchanged from the last experiment. >>>>>>>>>> Falkon Factory Service: >>>>>>>>>> http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>>>>>>> >>>>>>>>>> Web Server (graphs): >>>>>>>>>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>>>>>> >>>>>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we >>>>>>>>>> could still get 150~200 processors! >>>>>>>>>> >>>>>>>>>> Ioan >>>>>>>>>> >>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>>>>>> Yuqing who fixed the m179 just in time!). We could start >>>>>>>>>>> again the 244- molecule run to verify that nothing is wrong >>>>>>>>>>> with the whole system. >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I started those 2 molecules via GRAM. I have no trust in >>>>>>>>>>>> m179 finishing completely since I didn't change anything. I >>>>>>>>>>>> hope for m050 to finish though... >>>>>>>>>>>> You can watch the swift log on viper in >>>>>>>>>>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>>>>>>>> >>>>>>>>>>>> Nika >>>>>>>>>>>> >>>>>>>>>>>>> Then, let's try another run with 244 molecules soon, as >>>>>>>>>>>>> most of ANL/UC is free! >>>>>>>>>>>>> >>>>>>>>>>>>> Ioan >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >>> >> > > From iraicu at cs.uchicago.edu Mon Aug 6 21:56:21 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 21:56:21 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> Message-ID: <46B7DF55.4080205@cs.uchicago.edu> One other thing, in the past, once it got past the first few stages, it would submit about 16500 jobs all at once, and then it would keep sending a few at a time for every few that were completed.... this time, it sent out about 6000 jobs all at once (making the queue go up to 7K+ jobs), but after that, it did not submit any new jobs, despite many jobs completing.... and eventually, the queue went to 0, and it went all idle.... this is very different than what we saw in previous runs! Whatever happened, it happened in the middle of the experiment, when it only sent the 6K jobs (instead of 16K it would normally send at this stage). 
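
Aside (a sketch, not something actually run in this thread): the single-task grep quoted in these messages (grepping MolDyn-244-loops-dbui34oxjr4j2.log and GenericPortalWS_taskPerf.txt for "urn:0-1-62-0-1186429258791") generalizes to a full cross-check of the two logs: list every task identity Swift submitted, every identity Falkon recorded, and print the ones that never arrived. File names are the ones quoted in the thread; note that the Swift side counts both execution and transfer tasks, so some of the "missing" identities may be transfers that died in staging ("Failed Exception in getFile") rather than jobs.

    grep "setting status to Submitted" MolDyn-244-loops-dbui34oxjr4j2.log \
      | sed 's/.*identity=\(urn:[^)]*\)).*/\1/' | sort -u > /tmp/swift-submitted.ids
    grep -o 'urn:[0-9-]*' GenericPortalWS_taskPerf.txt | sort -u > /tmp/falkon-seen.ids
    comm -23 /tmp/swift-submitted.ids /tmp/falkon-seen.ids   # submitted by Swift, never seen by Falkon
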
If there is no discrepancy between the # of jobs Swift think it sent Falkon and what Falkon received, then it is beyond me what happened. Ioan Veronika Nefedova wrote: > Whats up now? Everything has stopped, no errors on swift site... > Do you have any errors now? > > Nika > > On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: > >> OK, I restarted Falkon as well as there were 12K jobs trying to go >> through, and keeping the entire ANL/UC site busy, although there was >> no Swift on the other end to pick up the notifications... >> >> here is the new info: >> >> Falkon Factory Service: >> http://tg-viz-login2:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >> >> Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >> >> Note that I changed the port #, its now 50020, so don't forget to >> change that before you start Swift... >> >> Ioan >> >> Veronika Nefedova wrote: >>> OK. I accidentally closed viper window where I started the workflow. >>> The workflow was started with & so it was supposed to stay up even >>> if I exited the shell. But apparently it didn't! >>> >>> This is the last entry in the log: >>> >>> 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote service >>> instance... dummy function, this doesn't really do anything... >>> >>> (and it doesn't change ever since). >>> >>> What went wrong ? Why closing the shell actually killed the job? (ps >>> shows no swift job) >>> I checked 'history' and in fact the job was started with &: >>> >>> 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug >>> MolDyn-244-loops.swift & >>> >>> I'll restart the workflow in 30 mins or so (from home) again. >>> >>> Sigh... >>> >>> Nika >>> >>> >>> On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: >>> >>>> Ioan, its all was due to NFS problems, I am convinced now... >>>> >>>> I restarted the run, the log is >>>> ~nefedova/alamines/MolDyn-244-loops-hxl1glhtqsag0.log >>>> >>>> Nika >>>> >>>> On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: >>>> >>>>> Just to debug further.... I picked out 1 task at random from the >>>>> Swift log... >>>>> iraicu at viper:/home/nefedova/alamines> cat >>>>> MolDyn-244-loops-dbui34oxjr4j2.log | grep >>>>> "urn:0-1-62-0-1186429258791" >>>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>>>> identity=urn:0-1-62-0-1186429258791) setting status to Submitted >>>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>>>> identity=urn:0-1-62-0-1186429258791) setting status to Active >>>>> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, >>>>> identity=urn:0-1-62-0-1186429258791) setting status to Failed >>>>> Exception in getFile >>>>> >>>>> but in my log, it is nowhere to be found... >>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" >>>>> >>>>> What does "setting status to Failed Exception in getFile" mean? >>>>> Could this mean that it failed on the data staging part, and that >>>>> it never made it to Falkon? >>>>> >>>>> BTW, it lloks as if there were really 539 jobs submitted... >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "Submitted" >>>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>>> 539 5390 62835 >>>>> >>>>> but again, only 57 made it to Falkon, and there were no exceptions >>>>> thrown anywhere to indicate that something unusual happened. 
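
Aside on the earlier question of why the backgrounded swift run died when the viper window was closed: '&' only puts the process in the background; when the terminal goes away the shell normally delivers SIGHUP to its jobs and they exit. A sketch of a more robust way to launch the same run (the swift command line is copied from the thread; the output file name here is made up):

    nohup swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug \
          MolDyn-244-loops.swift > moldyn-244.out 2>&1 &

Running it under screen, or issuing 'disown' after starting it with '&', avoids the hangup as well.
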
>>>>> >>>>> Ioan >>>>> >>>>> Ioan Raicu wrote: >>>>>> Falkon only has 57 tasks received, here they are: >>>>>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/service/logs/GenericPortalWS.txt.0.summary >>>>>> >>>>>> >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> pre_ch-vsk58efi stdout.txt stderr.txt . ./m179.mol2 ./m050.mol2 >>>>>> m179_am1 m050_am1 >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf >>>>>> m179_am1.crd m179_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf >>>>>> m050_am1.crd m050_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp >>>>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>>>> m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd >>>>>> equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst >>>>>> solv_m050.trj solv_m050_min.crd >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m050 >>>>>> title:solv stitle:m050 rtffile:parm03_gaff_all.rtf >>>>>> paramfile:parm03_gaffnb_all.prm gaff:m050_am1 nwater:400 >>>>>> ligcrd:lyz rforce:0 iseed:3131887 rwater:15 nstep:10000 >>>>>> minstep:100 skipstep:100 startstep:10000 >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp >>>>>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp >>>>>> m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd >>>>>> equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst >>>>>> solv_m179.trj solv_m179_min.crd >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/charmm.sh system:solv_m179 >>>>>> title:solv stitle:m179 rtffile:parm03_gaff_all.rtf >>>>>> paramfile:parm03_gaffnb_all.prm gaff:m179_am1 nwater:400 >>>>>> ligcrd:lyz rforce:0 iseed:3131887 rwater:15 nstep:10000 >>>>>> minstep:100 skipstep:100 startstep:10000 >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> pre_ch-38lc8efi stdout.txt stderr.txt . 
./m197.mol2 ./m129.mol2 >>>>>> ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./m070.mol2 >>>>>> ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./m001.mol2 >>>>>> ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./m160.mol2 >>>>>> ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./m175.mol2 >>>>>> ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./m106.mol2 >>>>>> ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./m125.mol2 >>>>>> ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./m067.mol2 >>>>>> ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./m108.mol2 >>>>>> ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./m186.mol2 >>>>>> ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./m037.mol2 >>>>>> ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./m196.mol2 >>>>>> ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./m060.mol2 >>>>>> ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./m179.mol2 >>>>>> ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./m191.mol2 >>>>>> ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./m063.mol2 >>>>>> ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./m121.mol2 >>>>>> ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./m039.mol2 >>>>>> ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./m226.mol2 >>>>>> ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./m180.mol2 >>>>>> ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./m054.mol2 >>>>>> ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./m229.mol2 >>>>>> ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./m043.mol2 >>>>>> ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./m210.mol2 >>>>>> ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./m091.mol2 >>>>>> ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./m093.mol2 >>>>>> ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./m016.mol2 >>>>>> ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./m212.mol2 >>>>>> ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./m139.mol2 >>>>>> ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./m153.mol2 >>>>>> ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./m081.mol2 >>>>>> ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./m028.mol2 >>>>>> ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./m135.mol2 >>>>>> ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./m156.mol2 >>>>>> ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./m009.mol2 >>>>>> ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./m027.mol2 >>>>>> ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./m030.mol2 >>>>>> ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./m031.mol2 >>>>>> ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./m133.mol2 >>>>>> ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./m005.mol2 >>>>>> ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./m149.mol2 >>>>>> ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./m089.mol2 >>>>>> ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./m146.mol2 >>>>>> ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./m086.mol2 >>>>>> ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./m116.mol2 >>>>>> ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./m231.mol2 >>>>>> ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./m233.mol2 >>>>>> ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./m055.mol2 >>>>>> ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./m242.mol2 >>>>>> ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 m163_am1 >>>>>> m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 m034_am1 >>>>>> m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 m130_am1 >>>>>> m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 m107_am1 >>>>>> m047_am1 m106_am1 m124_am1 
m193_am1 m225_am1 m066_am1 m125_am1 >>>>>> m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 m049_am1 >>>>>> m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 m036_am1 >>>>>> m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 m068_am1 >>>>>> m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 m118_am1 >>>>>> m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 m179_am1 >>>>>> m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 m041_am1 >>>>>> m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 m051_am1 >>>>>> m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 m167_am1 >>>>>> m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 m052_am1 >>>>>> m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 m054_am1 >>>>>> m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 m042_am1 >>>>>> m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 m083_am1 >>>>>> m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 m150_am1 >>>>>> m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 m142_am1 >>>>>> m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 m212_am1 >>>>>> m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 m211_am1 >>>>>> m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 m152_am1 >>>>>> m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 m097_am1 >>>>>> m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 m098_am1 >>>>>> m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 m009_am1 >>>>>> m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 m158_am1 >>>>>> m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 m003_am1 >>>>>> m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 m006_am1 >>>>>> m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 m021_am1 >>>>>> m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 m088_am1 >>>>>> m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 m018_am1 >>>>>> m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 m057_am1 >>>>>> m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 m172_am1 >>>>>> m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 m182_am1 >>>>>> m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 m242_am1 >>>>>> m184_am1 m115_am1 >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/pre-antch.pl >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf >>>>>> m197_am1.crd m197_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf >>>>>> m129_am1.crd m129_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf >>>>>> m069_am1.crd m069_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf >>>>>> m163_am1.crd m163_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-78lc8efi stdout.txt stderr.txt 
m128_am1 m128_am1.rtf >>>>>> m128_am1.crd m128_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf >>>>>> m035_am1.crd m035_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf >>>>>> m070_am1.crd m070_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf >>>>>> m221_am1.crd m221_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf >>>>>> m162_am1.crd m162_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf >>>>>> m198_am1.crd m198_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf >>>>>> m034_am1.crd m034_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf >>>>>> m001_am1.crd m001_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf >>>>>> m033_am1.crd m033_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf >>>>>> m220_am1.crd m220_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf >>>>>> m161_am1.crd m161_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf >>>>>> m032_am1.crd m032_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 
: EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf >>>>>> m160_am1.crd m160_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf >>>>>> m130_am1.crd m130_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf >>>>>> m071_am1.crd m071_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf >>>>>> m199_am1.crd m199_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf >>>>>> m002_am1.crd m002_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf >>>>>> m175_am1.crd m175_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf >>>>>> m234_am1.crd m234_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf >>>>>> m107_am1.crd m107_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf >>>>>> m048_am1.crd m048_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf >>>>>> m124_am1.crd m124_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf >>>>>> m047_am1.crd m047_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf >>>>>> m106_am1.crd m106_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh 
-s 2 -i >>>>>> m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf >>>>>> m193_am1.crd m193_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf >>>>>> m225_am1.crd m225_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf >>>>>> m066_am1.crd m066_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf >>>>>> m125_am1.crd m125_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf >>>>>> m194_am1.crd m194_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf >>>>>> m176_am1.crd m176_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf >>>>>> m224_am1.crd m224_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf >>>>>> m235_am1.crd m235_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf >>>>>> m165_am1.crd m165_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf >>>>>> m067_am1.crd m067_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf >>>>>> m049_am1.crd m049_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-89lc8efi stdout.txt stderr.txt m126_am1 m126_am1.rtf 
>>>>>> m126_am1.crd m126_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf >>>>>> m166_am1.crd m166_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf >>>>>> m108_am1.crd m108_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf >>>>>> m195_am1.crd m195_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf >>>>>> m038_am1.crd m038_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf >>>>>> m059_am1.crd m059_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf >>>>>> m186_am1.crd m186_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf >>>>>> m164_am1.crd m164_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf >>>>>> m036_am1.crd m036_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf >>>>>> m223_am1.crd m223_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf >>>>>> m058_am1.crd m058_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo charmm -c bcc >>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh >>>>>> antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf >>>>>> m037_am1.crd m037_am1.prm >>>>>> /disks/scratchgpfs1/iraicu/ModLyn/bin/antechamber.sh -s 2 -i >>>>>> m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo charmm -c bcc >>>>>> >>>>>> >>>>>> >>>>>> Veronika Nefedova 
wrote: >>>>>>> Swift thinks that it sent 248 jobs. >>>>>>> >>>>>>> nefedova at viper:~/alamines> grep "Running job " >>>>>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>>>>> 248 6931 56718 >>>>>>> nefedova at viper:~/alamines> >>>>>>> >>>>>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>>>>>> >>>>>>>> Everything is idle, there is no work to be done... >>>>>>>> >>>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail >>>>>>>> GenericPortalWS_perf_per_sec.txt >>>>>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>> >>>>>>>> 24 workers are registered but idle.... queue length 0, 57 jobs >>>>>>>> completed. >>>>>>>> >>>>>>>> Also, see below all 57 jobs, they all finished with an exit >>>>>>>> code of 0, in other words succesfully! How many jobs does >>>>>>>> Swift think it sent? >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>>>>> GenericPortalWS_taskPerf.txt >>>>>>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>>>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>>>>>> ecTime resultsQueueTime totalTime exitCode >>>>>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 >>>>>>>> 560629 49780 338 15 50133 0 >>>>>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 >>>>>>>> 561899 561909 216 699 10 925 0 >>>>>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 >>>>>>>> 562150 562159 382 777 9 1168 0 >>>>>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 >>>>>>>> 1044916 1044926 62404 10200 10 72614 0 >>>>>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 >>>>>>>> 1047038 1047067 135 585 29 749 0 >>>>>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 >>>>>>>> 1053072 1053080 114 6643 8 6765 0 >>>>>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 >>>>>>>> 1054256 1054290 731 7205 34 7970 0 >>>>>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 >>>>>>>> 1054570 1054579 7943 303 9 8255 0 >>>>>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 >>>>>>>> 1056811 1056819 6765 3724 8 10497 0 >>>>>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 >>>>>>>> 1058691 1058719 8257 4108 28 12393 0 >>>>>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 >>>>>>>> 1059363 1059385 12373 659 22 13054 0 >>>>>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 >>>>>>>> 1060315 1060323 10497 3489 8 13994 0 >>>>>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 >>>>>>>> 1060589 1060596 13042 1214 7 14263 0 >>>>>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 >>>>>>>> 1060603 1060954 1061054 14265 351 100 14716 0 >>>>>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 >>>>>>>> 1060329 1061094 1061126 13993 765 32 14790 0 >>>>>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 >>>>>>>> 1061105 1065608 1065617 14414 4503 9 18926 0 >>>>>>>> 17 
urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 >>>>>>>> 1065622 1066307 1066315 18929 685 8 19622 0 >>>>>>>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 >>>>>>>> 1061045 1067540 1067563 14356 6495 23 20874 0 >>>>>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 >>>>>>>> 1066320 1069262 1069271 19625 2942 9 22576 0 >>>>>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 >>>>>>>> 1067551 1071003 1071011 20854 3452 8 24314 0 >>>>>>>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 >>>>>>>> 1071016 1071664 1071671 24316 648 7 24971 0 >>>>>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 >>>>>>>> 1069275 1071679 1071692 22577 2404 13 24994 0 >>>>>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 >>>>>>>> 1071687 1073978 1073988 24985 2291 10 27286 0 >>>>>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 >>>>>>>> 1073992 1075959 1075969 27286 1967 10 29263 0 >>>>>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 >>>>>>>> 1071699 1076704 1076713 24995 5005 9 30009 0 >>>>>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 >>>>>>>> 1075972 1077451 1077459 29264 1479 8 30751 0 >>>>>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 >>>>>>>> 1076717 1080157 1080165 30007 3440 8 33455 0 >>>>>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 >>>>>>>> 1077464 1080270 1080286 30752 2806 16 33574 0 >>>>>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 >>>>>>>> 1080170 1080611 1080619 33457 441 8 33906 0 >>>>>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 >>>>>>>> 1080624 1080973 1080983 33907 349 10 34266 0 >>>>>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 >>>>>>>> 1080281 1081405 1081413 33566 1124 8 34698 0 >>>>>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 >>>>>>>> 1080986 1082989 1082996 34267 2003 7 36277 0 >>>>>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 >>>>>>>> 1083002 1083370 1083378 36279 368 8 36655 0 >>>>>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 >>>>>>>> 1081417 1084830 1084837 34696 3413 7 38116 0 >>>>>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 >>>>>>>> 1084843 1085854 1085879 37761 1011 25 38797 0 >>>>>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 >>>>>>>> 1085865 1089502 1089511 38780 3637 9 42426 0 >>>>>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 >>>>>>>> 1089515 1089966 1089974 42428 451 8 42887 0 >>>>>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 >>>>>>>> 1083383 1091316 1091324 36658 7933 8 44599 0 >>>>>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 >>>>>>>> 1091329 1092042 1092049 44237 713 7 44957 0 >>>>>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 >>>>>>>> 1092055 1094242 1094249 44960 2187 7 47154 0 >>>>>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 >>>>>>>> 1089979 1094418 1094428 42889 4439 10 47338 0 >>>>>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 >>>>>>>> 1094433 1095082 1095089 47331 649 7 47987 0 >>>>>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 >>>>>>>> 1095095 1096846 1096853 47991 1751 7 49749 0 >>>>>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 >>>>>>>> 1094256 1098214 1098221 47156 3958 7 51121 0 >>>>>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 >>>>>>>> 1096859 1098627 1098637 49752 1768 10 51530 0 >>>>>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 >>>>>>>> 1094037 
1098903 1098910 46940 4866 7 51813 0 >>>>>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 >>>>>>>> 1099192 1100210 1100246 52071 1018 36 53125 0 >>>>>>>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 >>>>>>>> 1097371 1100555 1100562 50260 3184 7 53451 0 >>>>>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 >>>>>>>> 1097135 1100896 1100904 50026 3761 8 53795 0 >>>>>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 >>>>>>>> 1098640 1101106 1101127 51523 2466 21 54010 0 >>>>>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 >>>>>>>> 1099965 1101217 1101224 52842 1252 7 54101 0 >>>>>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 >>>>>>>> 1098227 1101820 1101828 51112 3593 8 54713 0 >>>>>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 >>>>>>>> 1097375 1104132 1104139 50262 6757 7 57026 0 >>>>>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 >>>>>>>> 1100221 1106449 1106458 53096 6228 9 59333 0 >>>>>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 >>>>>>>> 1098916 1106473 1106481 51797 7557 8 59362 0 >>>>>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 >>>>>>>> 1207793 1207801 71 644409 8 644488 0 >>>>>>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 >>>>>>>> 1216404 1216425 98 652991 21 653110 0 >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> Veronika Nefedova wrote: >>>>>>>>> OK. There is something weird happening. I've got several such >>>>>>>>> entries in my swift log: >>>>>>>>> >>>>>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>>>>>> exception: Task failed >>>>>>>>> task:execute @ vdl-int.k, line: 332 >>>>>>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>>>>>> >>>>>>>>> >>>>>>>>> Looks like antechamber has failed (?). And the failure is only >>>>>>>>> on a swfit side, it never made it across to Falcon (there are >>>>>>>>> no remote directories created). But I see some of antechamber >>>>>>>>> jobs have finished (in shared). >>>>>>>>> >>>>>>>>> Yuqing -- could the changes you've made be responsible for >>>>>>>>> these failures (I do not see how it could though) ? >>>>>>>>> >>>>>>>>> Ioan, what do you see in your logs ion these tasks: >>>>>>>>> >>>>>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>>>>>> >>>>>>>>>> OK! >>>>>>>>>> Why don't we do one last run from my allocation, as >>>>>>>>>> everything is set up already and ready to go! Make sure to >>>>>>>>>> enable all debug logging. Falkon is up and running with all >>>>>>>>>> debug enabled! 
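The cross-check asked for above -- whether the tasks that Swift marks "setting status to Failed" ever reached Falkon -- can be scripted rather than checked one urn at a time. This is only a sketch, not something from the thread: it assumes the log names quoted here (a MolDyn-244-loops-*.log on viper and Falkon's GenericPortalWS_taskPerf.txt) and the "identity=urn:...)" form shown in the Swift log excerpts above.

    # Sketch: pull out the urn identities that Swift marked Failed and see
    # whether Falkon ever recorded them. A urn that never appears in the
    # Falkon log suggests the task died on the submit/staging side and never
    # reached a worker.
    SWIFT_LOG=MolDyn-244-loops-dbui34oxjr4j2.log   # assumed path, taken from the thread
    FALKON_LOG=GenericPortalWS_taskPerf.txt        # assumed path, taken from the thread
    grep "setting status to Failed" "$SWIFT_LOG" |
      sed 's/.*identity=\(urn:[^)]*\)).*/\1/' | sort -u |
      while read -r urn; do
        if grep -q "$urn" "$FALKON_LOG"; then
          echo "$urn reached Falkon"
        else
          echo "$urn never reached Falkon"
        fi
      done

Any identity that shows up only on the Swift side points at data staging or submission rather than at the workers.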
>>>>>>>>>> >>>>>>>>>> Falkon location is unchanged from the last experiment. >>>>>>>>>> Falkon Factory Service: >>>>>>>>>> http://tg-viz-login2:50010/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>>>>>>> >>>>>>>>>> Web Server (graphs): >>>>>>>>>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>>>>>> >>>>>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we >>>>>>>>>> could still get 150~200 processors! >>>>>>>>>> >>>>>>>>>> Ioan >>>>>>>>>> >>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>>>>>> Yuqing who fixed the m179 just in time!). We could start >>>>>>>>>>> again the 244- molecule run to verify that nothing is wrong >>>>>>>>>>> with the whole system. >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> I started those 2 molecules via GRAM. I have no trust in >>>>>>>>>>>> m179 finishing completely since I didn't change anything. I >>>>>>>>>>>> hope for m050 to finish though... >>>>>>>>>>>> You can watch the swift log on viper in >>>>>>>>>>>> ~nefedova/alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>>>>>>>> >>>>>>>>>>>> Nika >>>>>>>>>>>> >>>>>>>>>>>>> Then, let's try another run with 244 molecules soon, as >>>>>>>>>>>>> most of ANL/UC is free! >>>>>>>>>>>>> >>>>>>>>>>>>> Ioan >>>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>> >>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >>> >> > > From nefedova at mcs.anl.gov Mon Aug 6 21:59:52 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 6 Aug 2007 21:59:52 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B7DE24.7030600@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <46B7DE24.7030600@cs.uchicago.edu> Message-ID: <41C1BA76-B8E1-4A8C-8063-DF6B02060FAA@mcs.anl.gov> Well, there are some discrepancies: nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- zhgo6be8tjhi1.log | wc 7959 244749 3241072 nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- zhgo6be8tjhi1.log | wc 17207 564648 7949388 nefedova at viper:~/alamines> I.e. almost half of the jobs haven't finished (according to swift) Nika On Aug 6, 2007, at 9:51 PM, Ioan Raicu wrote: > I have 7959 jobs completed with an exit code of 0, no failed jobs! 
> All the Falkon logs point to the same 7959 number of jobs, and when > they were all completed, no new jobs came in from Swift... > > How many jobs do you see submitted, and how many have been > completed in the Swift logs? > > Everything looks 100% normal on Falkon's end. > > Ioan > > Veronika Nefedova wrote: >> Whats up now? Everything has stopped, no errors on swift site... >> Do you have any errors now? >> >> Nika >> >> On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: >> >>> OK, I restarted Falkon as well as there were 12K jobs trying to >>> go through, and keeping the entire ANL/UC site busy, although >>> there was no Swift on the other end to pick up the notifications... >>> >>> here is the new info: >>> >>> Falkon Factory Service: http://tg-viz-login2:50020/wsrf/services/ >>> GenericPortal/core/WS/GPFactoryService >>> Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>> >>> Note that I changed the port #, its now 50020, so don't forget to >>> change that before you start Swift... >>> >>> Ioan >>> >>> Veronika Nefedova wrote: >>>> OK. I accidentally closed viper window where I started the >>>> workflow. The workflow was started with & so it was supposed to >>>> stay up even if I exited the shell. But apparently it didn't! >>>> >>>> This is the last entry in the log: >>>> >>>> 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote >>>> service instance... dummy function, this doesn't really do >>>> anything... >>>> >>>> (and it doesn't change ever since). >>>> >>>> What went wrong ? Why closing the shell actually killed the job? >>>> (ps shows no swift job) >>>> I checked 'history' and in fact the job was started with &: >>>> >>>> 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml - >>>> debug MolDyn-244-loops.swift & >>>> >>>> I'll restart the workflow in 30 mins or so (from home) again. >>>> >>>> Sigh... >>>> >>>> Nika >>>> >>>> >>>> On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: >>>> >>>>> Ioan, its all was due to NFS problems, I am convinced now... >>>>> >>>>> I restarted the run, the log is ~nefedova/alamines/MolDyn-244- >>>>> loops-hxl1glhtqsag0.log >>>>> >>>>> Nika >>>>> >>>>> On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: >>>>> >>>>>> Just to debug further.... I picked out 1 task at random from >>>>>> the Swift log... >>>>>> iraicu at viper:/home/nefedova/alamines> cat MolDyn-244-loops- >>>>>> dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791" >>>>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>>>>> identity=urn:0-1-62-0-1186429258791) setting status to Submitted >>>>>> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, >>>>>> identity=urn:0-1-62-0-1186429258791) setting status to Active >>>>>> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, >>>>>> identity=urn:0-1-62-0-1186429258791) setting status to Failed >>>>>> Exception in getFile >>>>>> >>>>>> but in my log, it is nowhere to be found... >>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>>> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" >>>>>> >>>>>> What does "setting status to Failed Exception in getFile" >>>>>> mean? Could this mean that it failed on the data staging >>>>>> part, and that it never made it to Falkon? >>>>>> >>>>>> BTW, it lloks as if there were really 539 jobs submitted... 
>>>>>> >>>>>> iraicu at viper:/home/nefedova/alamines> grep "Submitted" >>>>>> MolDyn-244-loops-dbui34oxjr4j2.log | wc >>>>>> 539 5390 62835 >>>>>> >>>>>> but again, only 57 made it to Falkon, and there were no >>>>>> exceptions thrown anywhere to indicate that something unusual >>>>>> happened. >>>>>> >>>>>> Ioan >>>>>> >>>>>> Ioan Raicu wrote: >>>>>>> Falkon only has 57 tasks received, here they are: >>>>>>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/ >>>>>>> service/logs/GenericPortalWS.txt.0.summary >>>>>>> >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh pre_ch-vsk58efi stdout.txt stderr.txt . ./ >>>>>>> m179.mol2 ./m050.mol2 m179_am1 m050_am1 /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/pre-antch.pl >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-xsk58efi stdout.txt stderr.txt m179_am1 >>>>>>> m179_am1.rtf m179_am1.crd m179_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m179_am1 -fi mol2 - >>>>>>> rn m179 -o m179_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-ysk58efi stdout.txt stderr.txt m050_am1 >>>>>>> m050_am1.rtf m050_am1.crd m050_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m050_am1 -fi mol2 - >>>>>>> rn m050 -o m050_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh chrm-0tk58efi equil_solv.out_m050 stderr.txt >>>>>>> equil_solv.inp parm03_gaff_all.rtf parm03_gaffnb_all.prm >>>>>>> equil_solv.inp m050_am1.rtf m050_am1.prm m050_am1.crd >>>>>>> water_400.crd equil_solv.out_m050 solv_m050.psf >>>>>>> solv_m050_eq.crd solv_m050.rst solv_m050.trj >>>>>>> solv_m050_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>>>>>> charmm.sh system:solv_m050 title:solv stitle:m050 >>>>>>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >>>>>>> gaff:m050_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 >>>>>>> rwater:15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh chrm-zsk58efi equil_solv.out_m179 stderr.txt >>>>>>> equil_solv.inp parm03_gaff_all.rtf parm03_gaffnb_all.prm >>>>>>> equil_solv.inp m179_am1.rtf m179_am1.prm m179_am1.crd >>>>>>> water_400.crd equil_solv.out_m179 solv_m179.psf >>>>>>> solv_m179_eq.crd solv_m179.rst solv_m179.trj >>>>>>> solv_m179_min.crd /disks/scratchgpfs1/iraicu/ModLyn/bin/ >>>>>>> charmm.sh system:solv_m179 title:solv stitle:m179 >>>>>>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm >>>>>>> gaff:m179_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 >>>>>>> rwater:15 nstep:10000 minstep:100 skipstep:100 startstep:10000 >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh pre_ch-38lc8efi stdout.txt stderr.txt . 
./ >>>>>>> m197.mol2 ./m129.mol2 ./m069.mol2 ./m163.mol2 ./m128.mol2 ./ >>>>>>> m035.mol2 ./m070.mol2 ./m221.mol2 ./m162.mol2 ./m198.mol2 ./ >>>>>>> m034.mol2 ./m001.mol2 ./m220.mol2 ./m033.mol2 ./m161.mol2 ./ >>>>>>> m032.mol2 ./m160.mol2 ./m130.mol2 ./m071.mol2 ./m002.mol2 ./ >>>>>>> m199.mol2 ./m175.mol2 ./m234.mol2 ./m048.mol2 ./m107.mol2 ./ >>>>>>> m047.mol2 ./m106.mol2 ./m124.mol2 ./m193.mol2 ./m225.mol2 ./ >>>>>>> m066.mol2 ./m125.mol2 ./m176.mol2 ./m194.mol2 ./m224.mol2 ./ >>>>>>> m235.mol2 ./m067.mol2 ./m165.mol2 ./m049.mol2 ./m126.mol2 ./ >>>>>>> m166.mol2 ./m108.mol2 ./m195.mol2 ./m038.mol2 ./m059.mol2 ./ >>>>>>> m036.mol2 ./m186.mol2 ./m164.mol2 ./m117.mol2 ./m223.mol2 ./ >>>>>>> m058.mol2 ./m037.mol2 ./m188.mol2 ./m068.mol2 ./m119.mol2 ./ >>>>>>> m187.mol2 ./m196.mol2 ./m118.mol2 ./m127.mol2 ./m222.mol2 ./ >>>>>>> m189.mol2 ./m060.mol2 ./m236.mol2 ./m109.mol2 ./m177.mol2 ./ >>>>>>> m050.mol2 ./m179.mol2 ./m178.mol2 ./m123.mol2 ./m237.mol2 ./ >>>>>>> m110.mol2 ./m191.mol2 ./m100.mol2 ./m064.mol2 ./m041.mol2 ./ >>>>>>> m238.mol2 ./m063.mol2 ./m228.mol2 ./m051.mol2 ./m122.mol2 ./ >>>>>>> m169.mol2 ./m121.mol2 ./m190.mol2 ./m120.mol2 ./m062.mol2 ./ >>>>>>> m065.mol2 ./m039.mol2 ./m192.mol2 ./m167.mol2 ./m227.mol2 ./ >>>>>>> m040.mol2 ./m226.mol2 ./m168.mol2 ./m239.mol2 ./m052.mol2 ./ >>>>>>> m111.mol2 ./m180.mol2 ./m053.mol2 ./m112.mol2 ./m181.mol2 ./ >>>>>>> m240.mol2 ./m054.mol2 ./m044.mol2 ./m113.mol2 ./m230.mol2 ./ >>>>>>> m103.mol2 ./m229.mol2 ./m061.mol2 ./m042.mol2 ./m101.mol2 ./ >>>>>>> m170.mol2 ./m043.mol2 ./m102.mol2 ./m171.mol2 ./m151.mol2 ./ >>>>>>> m083.mol2 ./m210.mol2 ./m014.mol2 ./m023.mol2 ./m200.mol2 ./ >>>>>>> m092.mol2 ./m091.mol2 ./m150.mol2 ./m209.mol2 ./m022.mol2 ./ >>>>>>> m024.mol2 ./m093.mol2 ./m015.mol2 ./m084.mol2 ./m142.mol2 ./ >>>>>>> m201.mol2 ./m016.mol2 ./m085.mol2 ./m143.mol2 ./m202.mol2 ./ >>>>>>> m010.mol2 ./m212.mol2 ./m138.mol2 ./m026.mol2 ./m011.mol2 ./ >>>>>>> m095.mol2 ./m139.mol2 ./m154.mol2 ./m211.mol2 ./m025.mol2 ./ >>>>>>> m094.mol2 ./m153.mol2 ./m213.mol2 ./m080.mol2 ./m012.mol2 ./ >>>>>>> m152.mol2 ./m081.mol2 ./m140.mol2 ./m013.mol2 ./m082.mol2 ./ >>>>>>> m141.mol2 ./m028.mol2 ./m097.mol2 ./m155.mol2 ./m008.mol2 ./ >>>>>>> m214.mol2 ./m135.mol2 ./m029.mol2 ./m076.mol2 ./m098.mol2 ./ >>>>>>> m007.mol2 ./m156.mol2 ./m134.mol2 ./m215.mol2 ./m137.mol2 ./ >>>>>>> m079.mol2 ./m009.mol2 ./m078.mol2 ./m077.mol2 ./m096.mol2 ./ >>>>>>> m136.mol2 ./m027.mol2 ./m132.mol2 ./m158.mol2 ./m073.mol2 ./ >>>>>>> m217.mol2 ./m030.mol2 ./m159.mol2 ./m072.mol2 ./m218.mol2 ./ >>>>>>> m003.mol2 ./m031.mol2 ./m004.mol2 ./m219.mol2 ./m131.mol2 ./ >>>>>>> m074.mol2 ./m133.mol2 ./m006.mol2 ./m075.mol2 ./m157.mol2 ./ >>>>>>> m099.mol2 ./m005.mol2 ./m216.mol2 ./m090.mol2 ./m021.mol2 ./ >>>>>>> m208.mol2 ./m149.mol2 ./m020.mol2 ./m207.mol2 ./m148.mol2 ./ >>>>>>> m088.mol2 ./m089.mol2 ./m206.mol2 ./m147.mol2 ./m019.mol2 ./ >>>>>>> m205.mol2 ./m146.mol2 ./m087.mol2 ./m018.mol2 ./m204.mol2 ./ >>>>>>> m145.mol2 ./m086.mol2 ./m017.mol2 ./m144.mol2 ./m203.mol2 ./ >>>>>>> m057.mol2 ./m116.mol2 ./m232.mol2 ./m173.mol2 ./m105.mol2 ./ >>>>>>> m046.mol2 ./m231.mol2 ./m172.mol2 ./m104.mol2 ./m045.mol2 ./ >>>>>>> m174.mol2 ./m233.mol2 ./m244.mol2 ./m185.mol2 ./m182.mol2 ./ >>>>>>> m243.mol2 ./m055.mol2 ./m241.mol2 ./m183.mol2 ./m114.mol2 ./ >>>>>>> m056.mol2 ./m242.mol2 ./m184.mol2 ./m115.mol2 m197_am1 >>>>>>> m129_am1 m069_am1 m163_am1 m128_am1 m035_am1 m070_am1 >>>>>>> m221_am1 m162_am1 m198_am1 m034_am1 m001_am1 m220_am1 >>>>>>> m033_am1 m161_am1 m032_am1 m160_am1 m130_am1 
m071_am1 >>>>>>> m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 m107_am1 >>>>>>> m047_am1 m106_am1 m124_am1 m193_am1 m225_am1 m066_am1 >>>>>>> m125_am1 m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 >>>>>>> m165_am1 m049_am1 m126_am1 m166_am1 m108_am1 m195_am1 >>>>>>> m038_am1 m059_am1 m036_am1 m186_am1 m164_am1 m223_am1 >>>>>>> m117_am1 m037_am1 m058_am1 m068_am1 m188_am1 m119_am1 >>>>>>> m196_am1 m187_am1 m222_am1 m127_am1 m118_am1 m189_am1 >>>>>>> m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 m179_am1 >>>>>>> m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 >>>>>>> m041_am1 m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 >>>>>>> m122_am1 m051_am1 m121_am1 m190_am1 m120_am1 m062_am1 >>>>>>> m039_am1 m065_am1 m167_am1 m192_am1 m227_am1 m040_am1 >>>>>>> m226_am1 m168_am1 m239_am1 m052_am1 m111_am1 m180_am1 >>>>>>> m053_am1 m112_am1 m181_am1 m240_am1 m054_am1 m044_am1 >>>>>>> m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 m042_am1 >>>>>>> m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 >>>>>>> m083_am1 m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 >>>>>>> m091_am1 m150_am1 m209_am1 m022_am1 m024_am1 m093_am1 >>>>>>> m015_am1 m084_am1 m142_am1 m201_am1 m016_am1 m085_am1 >>>>>>> m143_am1 m202_am1 m010_am1 m212_am1 m138_am1 m026_am1 >>>>>>> m011_am1 m095_am1 m139_am1 m154_am1 m211_am1 m025_am1 >>>>>>> m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 m152_am1 >>>>>>> m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 >>>>>>> m097_am1 m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 >>>>>>> m076_am1 m098_am1 m007_am1 m156_am1 m134_am1 m215_am1 >>>>>>> m137_am1 m079_am1 m009_am1 m078_am1 m077_am1 m096_am1 >>>>>>> m136_am1 m027_am1 m132_am1 m158_am1 m073_am1 m217_am1 >>>>>>> m030_am1 m159_am1 m072_am1 m218_am1 m003_am1 m031_am1 >>>>>>> m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 m006_am1 >>>>>>> m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 >>>>>>> m021_am1 m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 >>>>>>> m089_am1 m088_am1 m206_am1 m147_am1 m019_am1 m205_am1 >>>>>>> m146_am1 m087_am1 m018_am1 m204_am1 m145_am1 m086_am1 >>>>>>> m017_am1 m144_am1 m203_am1 m057_am1 m116_am1 m232_am1 >>>>>>> m173_am1 m105_am1 m046_am1 m231_am1 m172_am1 m104_am1 >>>>>>> m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 m182_am1 >>>>>>> m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 >>>>>>> m242_am1 m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/ >>>>>>> bin/pre-antch.pl >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-58lc8efi stdout.txt stderr.txt m197_am1 >>>>>>> m197_am1.rtf m197_am1.crd m197_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m197_am1 -fi mol2 - >>>>>>> rn m197 -o m197_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-48lc8efi stdout.txt stderr.txt m129_am1 >>>>>>> m129_am1.rtf m129_am1.crd m129_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m129_am1 -fi mol2 - >>>>>>> rn m129 -o m129_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-68lc8efi stdout.txt stderr.txt m069_am1 >>>>>>> m069_am1.rtf m069_am1.crd m069_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m069_am1 -fi mol2 - >>>>>>> rn m069 -o m069_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-88lc8efi stdout.txt stderr.txt m163_am1 >>>>>>> m163_am1.rtf m163_am1.crd m163_am1.prm /disks/scratchgpfs1/ >>>>>>> 
iraicu/ModLyn/bin/antechamber.sh -s 2 -i m163_am1 -fi mol2 - >>>>>>> rn m163 -o m163_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-78lc8efi stdout.txt stderr.txt m128_am1 >>>>>>> m128_am1.rtf m128_am1.crd m128_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m128_am1 -fi mol2 - >>>>>>> rn m128 -o m128_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-98lc8efi stdout.txt stderr.txt m035_am1 >>>>>>> m035_am1.rtf m035_am1.crd m035_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m035_am1 -fi mol2 - >>>>>>> rn m035 -o m035_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-a8lc8efi stdout.txt stderr.txt m070_am1 >>>>>>> m070_am1.rtf m070_am1.crd m070_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m070_am1 -fi mol2 - >>>>>>> rn m070 -o m070_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-b8lc8efi stdout.txt stderr.txt m221_am1 >>>>>>> m221_am1.rtf m221_am1.crd m221_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m221_am1 -fi mol2 - >>>>>>> rn m221 -o m221_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-c8lc8efi stdout.txt stderr.txt m162_am1 >>>>>>> m162_am1.rtf m162_am1.crd m162_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m162_am1 -fi mol2 - >>>>>>> rn m162 -o m162_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-d8lc8efi stdout.txt stderr.txt m198_am1 >>>>>>> m198_am1.rtf m198_am1.crd m198_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m198_am1 -fi mol2 - >>>>>>> rn m198 -o m198_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-e8lc8efi stdout.txt stderr.txt m034_am1 >>>>>>> m034_am1.rtf m034_am1.crd m034_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m034_am1 -fi mol2 - >>>>>>> rn m034 -o m034_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-f8lc8efi stdout.txt stderr.txt m001_am1 >>>>>>> m001_am1.rtf m001_am1.crd m001_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m001_am1 -fi mol2 - >>>>>>> rn m001 -o m001_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-h8lc8efi stdout.txt stderr.txt m033_am1 >>>>>>> m033_am1.rtf m033_am1.crd m033_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m033_am1 -fi mol2 - >>>>>>> rn m033 -o m033_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-g8lc8efi stdout.txt stderr.txt m220_am1 >>>>>>> m220_am1.rtf m220_am1.crd m220_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m220_am1 -fi mol2 - >>>>>>> rn m220 -o m220_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-i8lc8efi stdout.txt stderr.txt m161_am1 >>>>>>> m161_am1.rtf m161_am1.crd m161_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m161_am1 -fi mol2 - >>>>>>> rn m161 -o m161_am1 -fo charmm -c bcc >>>>>>> 
128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-j8lc8efi stdout.txt stderr.txt m032_am1 >>>>>>> m032_am1.rtf m032_am1.crd m032_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m032_am1 -fi mol2 - >>>>>>> rn m032 -o m032_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-k8lc8efi stdout.txt stderr.txt m160_am1 >>>>>>> m160_am1.rtf m160_am1.crd m160_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m160_am1 -fi mol2 - >>>>>>> rn m160 -o m160_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-l8lc8efi stdout.txt stderr.txt m130_am1 >>>>>>> m130_am1.rtf m130_am1.crd m130_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m130_am1 -fi mol2 - >>>>>>> rn m130 -o m130_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-m8lc8efi stdout.txt stderr.txt m071_am1 >>>>>>> m071_am1.rtf m071_am1.crd m071_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m071_am1 -fi mol2 - >>>>>>> rn m071 -o m071_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-o8lc8efi stdout.txt stderr.txt m199_am1 >>>>>>> m199_am1.rtf m199_am1.crd m199_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m199_am1 -fi mol2 - >>>>>>> rn m199 -o m199_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-n8lc8efi stdout.txt stderr.txt m002_am1 >>>>>>> m002_am1.rtf m002_am1.crd m002_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m002_am1 -fi mol2 - >>>>>>> rn m002 -o m002_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-p8lc8efi stdout.txt stderr.txt m175_am1 >>>>>>> m175_am1.rtf m175_am1.crd m175_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m175_am1 -fi mol2 - >>>>>>> rn m175 -o m175_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-q8lc8efi stdout.txt stderr.txt m234_am1 >>>>>>> m234_am1.rtf m234_am1.crd m234_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m234_am1 -fi mol2 - >>>>>>> rn m234 -o m234_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-s8lc8efi stdout.txt stderr.txt m107_am1 >>>>>>> m107_am1.rtf m107_am1.crd m107_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m107_am1 -fi mol2 - >>>>>>> rn m107 -o m107_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-r8lc8efi stdout.txt stderr.txt m048_am1 >>>>>>> m048_am1.rtf m048_am1.crd m048_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m048_am1 -fi mol2 - >>>>>>> rn m048 -o m048_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-v8lc8efi stdout.txt stderr.txt m124_am1 >>>>>>> m124_am1.rtf m124_am1.crd m124_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m124_am1 -fi mol2 - >>>>>>> rn m124 -o m124_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-t8lc8efi stdout.txt stderr.txt m047_am1 
>>>>>>> m047_am1.rtf m047_am1.crd m047_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m047_am1 -fi mol2 - >>>>>>> rn m047 -o m047_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-u8lc8efi stdout.txt stderr.txt m106_am1 >>>>>>> m106_am1.rtf m106_am1.crd m106_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m106_am1 -fi mol2 - >>>>>>> rn m106 -o m106_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-x8lc8efi stdout.txt stderr.txt m193_am1 >>>>>>> m193_am1.rtf m193_am1.crd m193_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m193_am1 -fi mol2 - >>>>>>> rn m193 -o m193_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-y8lc8efi stdout.txt stderr.txt m225_am1 >>>>>>> m225_am1.rtf m225_am1.crd m225_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m225_am1 -fi mol2 - >>>>>>> rn m225 -o m225_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-z8lc8efi stdout.txt stderr.txt m066_am1 >>>>>>> m066_am1.rtf m066_am1.crd m066_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m066_am1 -fi mol2 - >>>>>>> rn m066 -o m066_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-09lc8efi stdout.txt stderr.txt m125_am1 >>>>>>> m125_am1.rtf m125_am1.crd m125_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m125_am1 -fi mol2 - >>>>>>> rn m125 -o m125_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-29lc8efi stdout.txt stderr.txt m194_am1 >>>>>>> m194_am1.rtf m194_am1.crd m194_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m194_am1 -fi mol2 - >>>>>>> rn m194 -o m194_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-19lc8efi stdout.txt stderr.txt m176_am1 >>>>>>> m176_am1.rtf m176_am1.crd m176_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m176_am1 -fi mol2 - >>>>>>> rn m176 -o m176_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-39lc8efi stdout.txt stderr.txt m224_am1 >>>>>>> m224_am1.rtf m224_am1.crd m224_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m224_am1 -fi mol2 - >>>>>>> rn m224 -o m224_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-49lc8efi stdout.txt stderr.txt m235_am1 >>>>>>> m235_am1.rtf m235_am1.crd m235_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m235_am1 -fi mol2 - >>>>>>> rn m235 -o m235_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-69lc8efi stdout.txt stderr.txt m165_am1 >>>>>>> m165_am1.rtf m165_am1.crd m165_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m165_am1 -fi mol2 - >>>>>>> rn m165 -o m165_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-59lc8efi stdout.txt stderr.txt m067_am1 >>>>>>> m067_am1.rtf m067_am1.crd m067_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i 
m067_am1 -fi mol2 - >>>>>>> rn m067 -o m067_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-79lc8efi stdout.txt stderr.txt m049_am1 >>>>>>> m049_am1.rtf m049_am1.crd m049_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m049_am1 -fi mol2 - >>>>>>> rn m049 -o m049_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-89lc8efi stdout.txt stderr.txt m126_am1 >>>>>>> m126_am1.rtf m126_am1.crd m126_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m126_am1 -fi mol2 - >>>>>>> rn m126 -o m126_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-99lc8efi stdout.txt stderr.txt m166_am1 >>>>>>> m166_am1.rtf m166_am1.crd m166_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m166_am1 -fi mol2 - >>>>>>> rn m166 -o m166_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-a9lc8efi stdout.txt stderr.txt m108_am1 >>>>>>> m108_am1.rtf m108_am1.crd m108_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m108_am1 -fi mol2 - >>>>>>> rn m108 -o m108_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-b9lc8efi stdout.txt stderr.txt m195_am1 >>>>>>> m195_am1.rtf m195_am1.crd m195_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m195_am1 -fi mol2 - >>>>>>> rn m195 -o m195_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-d9lc8efi stdout.txt stderr.txt m038_am1 >>>>>>> m038_am1.rtf m038_am1.crd m038_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m038_am1 -fi mol2 - >>>>>>> rn m038 -o m038_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-c9lc8efi stdout.txt stderr.txt m059_am1 >>>>>>> m059_am1.rtf m059_am1.crd m059_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m059_am1 -fi mol2 - >>>>>>> rn m059 -o m059_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-e9lc8efi stdout.txt stderr.txt m186_am1 >>>>>>> m186_am1.rtf m186_am1.crd m186_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m186_am1 -fi mol2 - >>>>>>> rn m186 -o m186_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-f9lc8efi stdout.txt stderr.txt m164_am1 >>>>>>> m164_am1.rtf m164_am1.crd m164_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m164_am1 -fi mol2 - >>>>>>> rn m164 -o m164_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-h9lc8efi stdout.txt stderr.txt m036_am1 >>>>>>> m036_am1.rtf m036_am1.crd m036_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m036_am1 -fi mol2 - >>>>>>> rn m036 -o m036_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-g9lc8efi stdout.txt stderr.txt m223_am1 >>>>>>> m223_am1.rtf m223_am1.crd m223_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m223_am1 -fi mol2 - >>>>>>> rn m223 -o m223_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS 
shared/ >>>>>>> wrapper.sh antch-j9lc8efi stdout.txt stderr.txt m058_am1 >>>>>>> m058_am1.rtf m058_am1.crd m058_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m058_am1 -fi mol2 - >>>>>>> rn m058 -o m058_am1 -fo charmm -c bcc >>>>>>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/ >>>>>>> wrapper.sh antch-k9lc8efi stdout.txt stderr.txt m037_am1 >>>>>>> m037_am1.rtf m037_am1.crd m037_am1.prm /disks/scratchgpfs1/ >>>>>>> iraicu/ModLyn/bin/antechamber.sh -s 2 -i m037_am1 -fi mol2 - >>>>>>> rn m037 -o m037_am1 -fo charmm -c bcc >>>>>>> >>>>>>> >>>>>>> >>>>>>> Veronika Nefedova wrote: >>>>>>>> Swift thinks that it sent 248 jobs. >>>>>>>> >>>>>>>> nefedova at viper:~/alamines> grep "Running job " MolDyn-244- >>>>>>>> loops-dbui34oxjr4j2.log | wc >>>>>>>> 248 6931 56718 >>>>>>>> nefedova at viper:~/alamines> >>>>>>>> >>>>>>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: >>>>>>>> >>>>>>>>> Everything is idle, there is no work to be done... >>>>>>>>> >>>>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> >>>>>>>>> tail GenericPortalWS_perf_per_sec.txt >>>>>>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 >>>>>>>>> >>>>>>>>> 24 workers are registered but idle.... queue length 0, 57 >>>>>>>>> jobs completed. >>>>>>>>> >>>>>>>>> Also, see below all 57 jobs, they all finished with an exit >>>>>>>>> code of 0, in other words succesfully! How many jobs does >>>>>>>>> Swift think it sent? 
>>>>>>>>> >>>>>>>>> Ioan >>>>>>>>> >>>>>>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat >>>>>>>>> GenericPortalWS_taskPerf.txt >>>>>>>>> //taskNum taskID workerID startTimeStamp execTimeStamp >>>>>>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex >>>>>>>>> ecTime resultsQueueTime totalTime exitCode >>>>>>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 >>>>>>>>> 560614 560629 49780 338 15 50133 0 >>>>>>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 >>>>>>>>> 561200 561899 561909 216 699 10 925 0 >>>>>>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 >>>>>>>>> 561373 562150 562159 382 777 9 1168 0 >>>>>>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 >>>>>>>>> 1044916 1044926 62404 10200 10 72614 0 >>>>>>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 >>>>>>>>> 1046453 1047038 1047067 135 585 29 749 0 >>>>>>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 >>>>>>>>> 1046429 1053072 1053080 114 6643 8 6765 0 >>>>>>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 >>>>>>>>> 1047051 1054256 1054290 731 7205 34 7970 0 >>>>>>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 >>>>>>>>> 1054267 1054570 1054579 7943 303 9 8255 0 >>>>>>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 >>>>>>>>> 1053087 1056811 1056819 6765 3724 8 10497 0 >>>>>>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 >>>>>>>>> 1054583 1058691 1058719 8257 4108 28 12393 0 >>>>>>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 >>>>>>>>> 1058704 1059363 1059385 12373 659 22 13054 0 >>>>>>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 >>>>>>>>> 1056826 1060315 1060323 10497 3489 8 13994 0 >>>>>>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 >>>>>>>>> 1059375 1060589 1060596 13042 1214 7 14263 0 >>>>>>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 >>>>>>>>> 1060603 1060954 1061054 14265 351 100 14716 0 >>>>>>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 >>>>>>>>> 1060329 1061094 1061126 13993 765 32 14790 0 >>>>>>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 >>>>>>>>> 1061105 1065608 1065617 14414 4503 9 18926 0 >>>>>>>>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 >>>>>>>>> 1065622 1066307 1066315 18929 685 8 19622 0 >>>>>>>>> 18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 >>>>>>>>> 1061045 1067540 1067563 14356 6495 23 20874 0 >>>>>>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 >>>>>>>>> 1066320 1069262 1069271 19625 2942 9 22576 0 >>>>>>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 >>>>>>>>> 1067551 1071003 1071011 20854 3452 8 24314 0 >>>>>>>>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 >>>>>>>>> 1071016 1071664 1071671 24316 648 7 24971 0 >>>>>>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 >>>>>>>>> 1069275 1071679 1071692 22577 2404 13 24994 0 >>>>>>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 >>>>>>>>> 1071687 1073978 1073988 24985 2291 10 27286 0 >>>>>>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 >>>>>>>>> 1073992 1075959 1075969 27286 1967 10 29263 0 >>>>>>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 >>>>>>>>> 1071699 1076704 1076713 24995 5005 9 30009 0 >>>>>>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 >>>>>>>>> 1075972 1077451 1077459 29264 1479 8 30751 0 >>>>>>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 >>>>>>>>> 1076717 1080157 1080165 30007 3440 8 
33455 0 >>>>>>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 >>>>>>>>> 1077464 1080270 1080286 30752 2806 16 33574 0 >>>>>>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 >>>>>>>>> 1080170 1080611 1080619 33457 441 8 33906 0 >>>>>>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 >>>>>>>>> 1080624 1080973 1080983 33907 349 10 34266 0 >>>>>>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 >>>>>>>>> 1080281 1081405 1081413 33566 1124 8 34698 0 >>>>>>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 >>>>>>>>> 1080986 1082989 1082996 34267 2003 7 36277 0 >>>>>>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 >>>>>>>>> 1083002 1083370 1083378 36279 368 8 36655 0 >>>>>>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 >>>>>>>>> 1081417 1084830 1084837 34696 3413 7 38116 0 >>>>>>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 >>>>>>>>> 1084843 1085854 1085879 37761 1011 25 38797 0 >>>>>>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 >>>>>>>>> 1085865 1089502 1089511 38780 3637 9 42426 0 >>>>>>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 >>>>>>>>> 1089515 1089966 1089974 42428 451 8 42887 0 >>>>>>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 >>>>>>>>> 1083383 1091316 1091324 36658 7933 8 44599 0 >>>>>>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 >>>>>>>>> 1091329 1092042 1092049 44237 713 7 44957 0 >>>>>>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 >>>>>>>>> 1092055 1094242 1094249 44960 2187 7 47154 0 >>>>>>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 >>>>>>>>> 1089979 1094418 1094428 42889 4439 10 47338 0 >>>>>>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 >>>>>>>>> 1094433 1095082 1095089 47331 649 7 47987 0 >>>>>>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 >>>>>>>>> 1095095 1096846 1096853 47991 1751 7 49749 0 >>>>>>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 >>>>>>>>> 1094256 1098214 1098221 47156 3958 7 51121 0 >>>>>>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 >>>>>>>>> 1096859 1098627 1098637 49752 1768 10 51530 0 >>>>>>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 >>>>>>>>> 1094037 1098903 1098910 46940 4866 7 51813 0 >>>>>>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 >>>>>>>>> 1099192 1100210 1100246 52071 1018 36 53125 0 >>>>>>>>> 48 urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 >>>>>>>>> 1097371 1100555 1100562 50260 3184 7 53451 0 >>>>>>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 >>>>>>>>> 1097135 1100896 1100904 50026 3761 8 53795 0 >>>>>>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 >>>>>>>>> 1098640 1101106 1101127 51523 2466 21 54010 0 >>>>>>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 >>>>>>>>> 1099965 1101217 1101224 52842 1252 7 54101 0 >>>>>>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 >>>>>>>>> 1098227 1101820 1101828 51112 3593 8 54713 0 >>>>>>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 >>>>>>>>> 1097375 1104132 1104139 50262 6757 7 57026 0 >>>>>>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 >>>>>>>>> 1100221 1106449 1106458 53096 6228 9 59333 0 >>>>>>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 >>>>>>>>> 1098916 1106473 1106481 51797 7557 8 59362 0 >>>>>>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 >>>>>>>>> 563384 1207793 1207801 71 644409 8 644488 0 >>>>>>>>> 
57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 >>>>>>>>> 563413 1216404 1216425 98 652991 21 653110 0 >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>> OK. There is something weird happening. I've got several >>>>>>>>>> such entries in my swift log: >>>>>>>>>> >>>>>>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application >>>>>>>>>> exception: Task failed >>>>>>>>>> task:execute @ vdl-int.k, line: 332 >>>>>>>>>> vdl:execute2 @ execute-default.k, line: 22 >>>>>>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 >>>>>>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 >>>>>>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Looks like antechamber has failed (?). And the failure is >>>>>>>>>> only on a swfit side, it never made it across to Falcon >>>>>>>>>> (there are no remote directories created). But I see some >>>>>>>>>> of antechamber jobs have finished (in shared). >>>>>>>>>> >>>>>>>>>> Yuqing -- could the changes you've made be responsible for >>>>>>>>>> these failures (I do not see how it could though) ? >>>>>>>>>> >>>>>>>>>> Ioan, what do you see in your logs ion these tasks: >>>>>>>>>> >>>>>>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed >>>>>>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed >>>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed >>>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed >>>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed >>>>>>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: >>>>>>>>>> >>>>>>>>>>> OK! >>>>>>>>>>> Why don't we do one last run from my allocation, as >>>>>>>>>>> everything is set up already and ready to go! Make sure >>>>>>>>>>> to enable all debug logging. Falkon is up and running >>>>>>>>>>> with all debug enabled! >>>>>>>>>>> >>>>>>>>>>> Falkon location is unchanged from the last experiment. >>>>>>>>>>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/ >>>>>>>>>>> services/GenericPortal/core/WS/GPFactoryService >>>>>>>>>>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org: >>>>>>>>>>> 51000/index.htm >>>>>>>>>>> >>>>>>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet >>>>>>>>>>> we could still get 150~200 processors! >>>>>>>>>>> >>>>>>>>>>> Ioan >>>>>>>>>>> >>>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to >>>>>>>>>>>> Yuqing who fixed the m179 just in time!). We could start >>>>>>>>>>>> again the 244- molecule run to verify that nothing is >>>>>>>>>>>> wrong with the whole system. >>>>>>>>>>>> >>>>>>>>>>>> Nika >>>>>>>>>>>> >>>>>>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: >>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> I started those 2 molecules via GRAM. I have no trust >>>>>>>>>>>>> in m179 finishing completely since I didn't change >>>>>>>>>>>>> anything. I hope for m050 to finish though... 
>>>>>>>>>>>>> You can watch the swift log on viper in ~nefedova/ >>>>>>>>>>>>> alamines/MolDyn-2-loops-be9484k93kk21.log >>>>>>>>>>>>> >>>>>>>>>>>>> Nika >>>>>>>>>>>>> >>>>>>>>>>>>>> Then, let's try another run with 244 molecules soon, >>>>>>>>>>>>>> as most of ANL/UC is free! >>>>>>>>>>>>>> >>>>>>>>>>>>>> Ioan >>>>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>>>>> >>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>> >>>> >>> >> >> > From iraicu at cs.uchicago.edu Mon Aug 6 22:06:34 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 22:06:34 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B7DE24.7030600@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <46B7DE24.7030600@cs.uchicago.edu> Message-ID: <46B7E1BA.8090903@cs.uchicago.edu> Just resending the email, I don't think it made it through the first time... Ioan Raicu wrote: > I have 7959 jobs completed with an exit code of 0, no failed jobs! > All the Falkon logs point to the same 7959 number of jobs, and when > they were all completed, no new jobs came in from Swift... > > How many jobs do you see submitted, and how many have been completed > in the Swift logs? > > Everything looks 100% normal on Falkon's end. > > Ioan > > Veronika Nefedova wrote: >> Whats up now? Everything has stopped, no errors on swift site... >> Do you have any errors now? >> >> Nika >> >> On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: >> >>> OK, I restarted Falkon as well as there were 12K jobs trying to go >>> through, and keeping the entire ANL/UC site busy, although there was >>> no Swift on the other end to pick up the notifications... >>> >>> here is the new info: >>> >>> Falkon Factory Service: >>> http://tg-viz-login2:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>> >>> Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>> >>> Note that I changed the port #, its now 50020, so don't forget to >>> change that before you start Swift... 
>>> >>> Ioan >>> >>> Veronika Nefedova wrote: >>> From iraicu at cs.uchicago.edu Mon Aug 6 22:07:00 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 22:07:00 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B7DF55.4080205@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <46B7DF55.4080205@cs.uchicago.edu> Message-ID: <46B7E1D4.4050809@cs.uchicago.edu> Just resending this other email, I don't think it made it through the first time... Ioan Raicu wrote: > One other thing, in the past, once it got past the first few stages, > it would submit about 16500 jobs all at once, and then it would keep > sending a few at a time for every few that were completed.... this > time, it sent out about 6000 jobs all at once (making the queue go up > to 7K+ jobs), but after that, it did not submit any new jobs, despite > many jobs completing.... and eventually, the queue went to 0, and it > went all idle.... this is very different than what we saw in previous > runs! Whatever happened, it happened in the middle of the experiment, > when it only sent the 6K jobs (instead of 16K it would normally send > at this stage). If there is no discrepancy between the # of jobs > Swift think it sent Falkon and what Falkon received, then it is beyond > me what happened. > > Ioan > > Veronika Nefedova wrote: >> Whats up now? Everything has stopped, no errors on swift site... >> Do you have any errors now? >> >> Nika >> >> On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: >> >>> OK, I restarted Falkon as well as there were 12K jobs trying to go >>> through, and keeping the entire ANL/UC site busy, although there was >>> no Swift on the other end to pick up the notifications... >>> >>> here is the new info: >>> >>> Falkon Factory Service: >>> http://tg-viz-login2:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>> >>> Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>> >>> Note that I changed the port #, its now 50020, so don't forget to >>> change that before you start Swift... 
>>> Ioan >>> > From iraicu at cs.uchicago.edu Mon Aug 6 22:28:47 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 06 Aug 2007 22:28:47 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> Message-ID: <46B7E6EF.6080909@cs.uchicago.edu> It looks like viper (where Swift is running) is idle, and so is tg-viz-login2 (where Falkon is running). What looks evident to me is that the normal list of events for a successful task is: iraicu at viper:/home/nefedova/alamines> grep "urn:0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Submitted 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn:0-1-73-2-31-0-0-1186444341989 0 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn:0-1-73-2-31-0-0-1186444341989) setting status to Completed iraicu at viper:/home/nefedova/alamines> grep "setting status to Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc 17566 175660 2179412 iraicu at viper:/home/nefedova/alamines> grep "NotificationThread notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc 7959 55713 785035 iraicu at viper:/home/nefedova/alamines> grep "setting status to Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc 190968 1909680 24003796 Now, 17566 tasks were submitted, 7959 notifications were received from Falkon, and 190968 tasks were set to completed... Obviously this isn't right. Falkon only saw 7959 tasks, so I would argue that the # of notifications received is correct. The submitted # of tasks looks like the # I would have expected, but all the tasks did not make it to Falkon. The Falkon provider is what sits between the change of status to submitted, and the receipt of the notification, so I would say that is the first place we need to look for more details... there used to be some extra debug info in the Falkon provider that simply printed all the tasks that were actually being submitted to Falkon (as opposed to just the change of status within Karajan). I don't see those debug statements, I bet they got overwritten in the SVN update. What about the completed tasks, why are there so many (190K) completed tasks? Where did they come from? Yong, are you keeping up with these emails? Do you still have a copy of the latest Falkon provider that you edited just before you left? Can you just take a look through there to make sure nothing has been broken with the SVN updates? If you don't have time for this now (considering today was your first day on the new job), I'll dig through there and see if I can make some sense of what is happening!
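The grep totals above do not show which task URNs are missing on the Falkon side. One way to get that list is to join the two logs on the task URN; the sketch below was not run in this thread, it is just a minimal comparison assuming Falkon's GenericPortalWS_taskPerf.txt has been copied next to the Swift log (the "Task(type=1," filter keeps only execution tasks, since file transfers also log "setting status to Submitted"):

  # URNs Swift believes it submitted (execution tasks only)
  grep "Task(type=1," MolDyn-244-loops-zhgo6be8tjhi1.log \
      | grep "setting status to Submitted" \
      | grep -o "urn:[0-9-]*" | sort -u > swift_submitted.txt

  # URNs Falkon actually recorded in its per-task log
  grep -o "urn:[0-9-]*" GenericPortalWS_taskPerf.txt | sort -u > falkon_seen.txt

  # submitted on the Swift side but never seen by Falkon
  comm -23 swift_submitted.txt falkon_seen.txt > missing.txt
  wc -l swift_submitted.txt falkon_seen.txt missing.txt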
One last thing, Ben mentioned that the Falkon provider you saw in Nika's account was different than what was in SVN. Ben, did you at least look at modification dates? How old was one as opposed to the other? I hope we did not revert back to an older version that might have had some bug in it.... Ioan Veronika Nefedova wrote: > Well, there are some discrepancies: > > nefedova at viper:~/alamines> grep "Completed job" > MolDyn-244-loops-zhgo6be8tjhi1.log | wc > 7959 244749 3241072 > nefedova at viper:~/alamines> grep "Running job" > MolDyn-244-loops-zhgo6be8tjhi1.log | wc > 17207 564648 7949388 > nefedova at viper:~/alamines> > > I.e. almost half of the jobs haven't finished (according to swift) > > I also have some exceptions: > > 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, > identity=urn:0-1-101-2-37-0-0-1186444363341) setting status to Failed > Exception in getFile > (80 of those): > nefedova at viper:~/alamines> grep "ailed" > MolDyn-244-loops-zhgo6be8tjhi1.log | wc > 80 880 9705 > nefedova at viper:~/alamines> > > > Nika > > On Aug 6, 2007, at 9:36 PM, Veronika Nefedova wrote: > >> Whats up now? Everything has stopped, no errors on swift site... >> Do you have any errors now? >> >> Nika >> >> On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: >> >>> OK, I restarted Falkon as well as there were 12K jobs trying to go >>> through, and keeping the entire ANL/UC site busy, although there was >>> no Swift on the other end to pick up the notifications... >>> >>> here is the new info: >>> >>> Falkon Factory Service: >>> http://tg-viz-login2:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>> >>> Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>> >>> Note that I changed the port #, its now 50020, so don't forget to >>> change that before you start Swift... >>> >>> Ioan >>> > > From nikan at wideopenwest.com Mon Aug 6 22:07:39 2007 From: nikan at wideopenwest.com (Veronika Nefedova) Date: Mon, 6 Aug 2007 22:07:39 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> Message-ID: <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> Well, there are some discrepancies: nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- zhgo6be8tjhi1.log | wc 7959 244749 3241072 nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- zhgo6be8tjhi1.log | wc 17207 564648 7949388 nefedova at viper:~/alamines> I.e. almost half of the jobs haven't finished (according to swift) I also have some exceptions: 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception in getFile (80 of those): nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- zhgo6be8tjhi1.log | wc 80 880 9705 nefedova at viper:~/alamines> Nika On Aug 6, 2007, at 9:36 PM, Veronika Nefedova wrote: > Whats up now? 
Everything has stopped, no errors on swift site... > Do you have any errors now? > > Nika > > On Aug 6, 2007, at 6:04 PM, Ioan Raicu wrote: > >> OK, I restarted Falkon as well as there were 12K jobs trying to go >> through, and keeping the entire ANL/UC site busy, although there >> was no Swift on the other end to pick up the notifications... >> >> here is the new info: >> >> Falkon Factory Service: http://tg-viz-login2:50020/wsrf/services/ >> GenericPortal/core/WS/GPFactoryService >> Web server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >> >> Note that I changed the port #, its now 50020, so don't forget to >> change that before you start Swift... >> >> Ioan >> From hategan at mcs.anl.gov Tue Aug 7 01:02:44 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 07 Aug 2007 01:02:44 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> Message-ID: <1186466564.17222.2.camel@blabla.mcs.anl.gov> On Mon, 2007-08-06 at 17:25 -0500, Veronika Nefedova wrote: > OK. I accidentally closed viper window where I started the workflow. > The workflow was started with & so it was supposed to stay up even if > I exited the shell. But apparently it didn't! I don't think running it in the background prevents it from dying when the parent process dies. Using nohup will probably do what you want. > > This is the last entry in the log: > > 2007-08-06 17:16:59,483 INFO ResourcePool Destroying remote service > instance... dummy function, this doesn't really do anything... > > (and it doesn't change ever since). > > What went wrong ? Why closing the shell actually killed the job? (ps > shows no swift job) > I checked 'history' and in fact the job was started with &: > > 999 swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug > MolDyn-244-loops.swift & > > I'll restart the workflow in 30 mins or so (from home) again. > > Sigh... > > Nika > > > On Aug 6, 2007, at 4:29 PM, Veronika Nefedova wrote: > > > Ioan, its all was due to NFS problems, I am convinced now... > > > > I restarted the run, the log is ~nefedova/alamines/MolDyn-244-loops- > > hxl1glhtqsag0.log > > > > Nika > > > > On Aug 6, 2007, at 4:20 PM, Ioan Raicu wrote: > > > >> Just to debug further.... I picked out 1 task at random from the > >> Swift log... 
> >> iraicu at viper:/home/nefedova/alamines> cat MolDyn-244-loops- > >> dbui34oxjr4j2.log | grep "urn:0-1-62-0-1186429258791" > >> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn: > >> 0-1-62-0-1186429258791) setting status to Submitted > >> 2007-08-06 14:47:03,281 DEBUG TaskImpl Task(type=2, identity=urn: > >> 0-1-62-0-1186429258791) setting status to Active > >> 2007-08-06 14:47:03,704 DEBUG TaskImpl Task(type=2, identity=urn: > >> 0-1-62-0-1186429258791) setting status to Failed Exception in getFile > >> > >> but in my log, it is nowhere to be found... > >> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat > >> GenericPortalWS_taskPerf.txt | grep "urn:0-1-62-0-1186429258791" > >> > >> What does "setting status to Failed Exception in getFile" mean? > >> Could this mean that it failed on the data staging part, and that > >> it never made it to Falkon? > >> > >> BTW, it lloks as if there were really 539 jobs submitted... > >> > >> iraicu at viper:/home/nefedova/alamines> grep "Submitted" MolDyn-244- > >> loops-dbui34oxjr4j2.log | wc > >> 539 5390 62835 > >> > >> but again, only 57 made it to Falkon, and there were no exceptions > >> thrown anywhere to indicate that something unusual happened. > >> > >> Ioan > >> > >> Ioan Raicu wrote: > >>> Falkon only has 57 tasks received, here they are: > >>> tg-viz-login.uc.teragrid.org:/home/iraicu/java/Falkon_v0.8.1/ > >>> service/logs/GenericPortalWS.txt.0.summary > >>> > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> pre_ch-vsk58efi stdout.txt stderr.txt . ./m179.mol2 ./m050.mol2 > >>> m179_am1 m050_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/pre- > >>> antch.pl > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-xsk58efi stdout.txt stderr.txt m179_am1 m179_am1.rtf > >>> m179_am1.crd m179_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m179_am1 -fi mol2 -rn m179 -o m179_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-ysk58efi stdout.txt stderr.txt m050_am1 m050_am1.rtf > >>> m050_am1.crd m050_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m050_am1 -fi mol2 -rn m050 -o m050_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> chrm-0tk58efi equil_solv.out_m050 stderr.txt equil_solv.inp > >>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp > >>> m050_am1.rtf m050_am1.prm m050_am1.crd water_400.crd > >>> equil_solv.out_m050 solv_m050.psf solv_m050_eq.crd solv_m050.rst > >>> solv_m050.trj solv_m050_min.crd /disks/scratchgpfs1/iraicu/ > >>> ModLyn/bin/charmm.sh system:solv_m050 title:solv stitle:m050 > >>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm > >>> gaff:m050_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 rwater: > >>> 15 nstep:10000 minstep:100 skipstep:100 startstep:10000 > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> chrm-zsk58efi equil_solv.out_m179 stderr.txt equil_solv.inp > >>> parm03_gaff_all.rtf parm03_gaffnb_all.prm equil_solv.inp > >>> m179_am1.rtf m179_am1.prm m179_am1.crd water_400.crd > >>> equil_solv.out_m179 solv_m179.psf solv_m179_eq.crd solv_m179.rst > >>> solv_m179.trj solv_m179_min.crd /disks/scratchgpfs1/iraicu/ > >>> ModLyn/bin/charmm.sh system:solv_m179 title:solv stitle:m179 > >>> rtffile:parm03_gaff_all.rtf paramfile:parm03_gaffnb_all.prm > >>> gaff:m179_am1 nwater:400 ligcrd:lyz rforce:0 iseed:3131887 
rwater: > >>> 15 nstep:10000 minstep:100 skipstep:100 startstep:10000 > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> pre_ch-38lc8efi stdout.txt stderr.txt . ./m197.mol2 ./ > >>> m129.mol2 ./m069.mol2 ./m163.mol2 ./m128.mol2 ./m035.mol2 ./ > >>> m070.mol2 ./m221.mol2 ./m162.mol2 ./m198.mol2 ./m034.mol2 ./ > >>> m001.mol2 ./m220.mol2 ./m033.mol2 ./m161.mol2 ./m032.mol2 ./ > >>> m160.mol2 ./m130.mol2 ./m071.mol2 ./m002.mol2 ./m199.mol2 ./ > >>> m175.mol2 ./m234.mol2 ./m048.mol2 ./m107.mol2 ./m047.mol2 ./ > >>> m106.mol2 ./m124.mol2 ./m193.mol2 ./m225.mol2 ./m066.mol2 ./ > >>> m125.mol2 ./m176.mol2 ./m194.mol2 ./m224.mol2 ./m235.mol2 ./ > >>> m067.mol2 ./m165.mol2 ./m049.mol2 ./m126.mol2 ./m166.mol2 ./ > >>> m108.mol2 ./m195.mol2 ./m038.mol2 ./m059.mol2 ./m036.mol2 ./ > >>> m186.mol2 ./m164.mol2 ./m117.mol2 ./m223.mol2 ./m058.mol2 ./ > >>> m037.mol2 ./m188.mol2 ./m068.mol2 ./m119.mol2 ./m187.mol2 ./ > >>> m196.mol2 ./m118.mol2 ./m127.mol2 ./m222.mol2 ./m189.mol2 ./ > >>> m060.mol2 ./m236.mol2 ./m109.mol2 ./m177.mol2 ./m050.mol2 ./ > >>> m179.mol2 ./m178.mol2 ./m123.mol2 ./m237.mol2 ./m110.mol2 ./ > >>> m191.mol2 ./m100.mol2 ./m064.mol2 ./m041.mol2 ./m238.mol2 ./ > >>> m063.mol2 ./m228.mol2 ./m051.mol2 ./m122.mol2 ./m169.mol2 ./ > >>> m121.mol2 ./m190.mol2 ./m120.mol2 ./m062.mol2 ./m065.mol2 ./ > >>> m039.mol2 ./m192.mol2 ./m167.mol2 ./m227.mol2 ./m040.mol2 ./ > >>> m226.mol2 ./m168.mol2 ./m239.mol2 ./m052.mol2 ./m111.mol2 ./ > >>> m180.mol2 ./m053.mol2 ./m112.mol2 ./m181.mol2 ./m240.mol2 ./ > >>> m054.mol2 ./m044.mol2 ./m113.mol2 ./m230.mol2 ./m103.mol2 ./ > >>> m229.mol2 ./m061.mol2 ./m042.mol2 ./m101.mol2 ./m170.mol2 ./ > >>> m043.mol2 ./m102.mol2 ./m171.mol2 ./m151.mol2 ./m083.mol2 ./ > >>> m210.mol2 ./m014.mol2 ./m023.mol2 ./m200.mol2 ./m092.mol2 ./ > >>> m091.mol2 ./m150.mol2 ./m209.mol2 ./m022.mol2 ./m024.mol2 ./ > >>> m093.mol2 ./m015.mol2 ./m084.mol2 ./m142.mol2 ./m201.mol2 ./ > >>> m016.mol2 ./m085.mol2 ./m143.mol2 ./m202.mol2 ./m010.mol2 ./ > >>> m212.mol2 ./m138.mol2 ./m026.mol2 ./m011.mol2 ./m095.mol2 ./ > >>> m139.mol2 ./m154.mol2 ./m211.mol2 ./m025.mol2 ./m094.mol2 ./ > >>> m153.mol2 ./m213.mol2 ./m080.mol2 ./m012.mol2 ./m152.mol2 ./ > >>> m081.mol2 ./m140.mol2 ./m013.mol2 ./m082.mol2 ./m141.mol2 ./ > >>> m028.mol2 ./m097.mol2 ./m155.mol2 ./m008.mol2 ./m214.mol2 ./ > >>> m135.mol2 ./m029.mol2 ./m076.mol2 ./m098.mol2 ./m007.mol2 ./ > >>> m156.mol2 ./m134.mol2 ./m215.mol2 ./m137.mol2 ./m079.mol2 ./ > >>> m009.mol2 ./m078.mol2 ./m077.mol2 ./m096.mol2 ./m136.mol2 ./ > >>> m027.mol2 ./m132.mol2 ./m158.mol2 ./m073.mol2 ./m217.mol2 ./ > >>> m030.mol2 ./m159.mol2 ./m072.mol2 ./m218.mol2 ./m003.mol2 ./ > >>> m031.mol2 ./m004.mol2 ./m219.mol2 ./m131.mol2 ./m074.mol2 ./ > >>> m133.mol2 ./m006.mol2 ./m075.mol2 ./m157.mol2 ./m099.mol2 ./ > >>> m005.mol2 ./m216.mol2 ./m090.mol2 ./m021.mol2 ./m208.mol2 ./ > >>> m149.mol2 ./m020.mol2 ./m207.mol2 ./m148.mol2 ./m088.mol2 ./ > >>> m089.mol2 ./m206.mol2 ./m147.mol2 ./m019.mol2 ./m205.mol2 ./ > >>> m146.mol2 ./m087.mol2 ./m018.mol2 ./m204.mol2 ./m145.mol2 ./ > >>> m086.mol2 ./m017.mol2 ./m144.mol2 ./m203.mol2 ./m057.mol2 ./ > >>> m116.mol2 ./m232.mol2 ./m173.mol2 ./m105.mol2 ./m046.mol2 ./ > >>> m231.mol2 ./m172.mol2 ./m104.mol2 ./m045.mol2 ./m174.mol2 ./ > >>> m233.mol2 ./m244.mol2 ./m185.mol2 ./m182.mol2 ./m243.mol2 ./ > >>> m055.mol2 ./m241.mol2 ./m183.mol2 ./m114.mol2 ./m056.mol2 ./ > >>> m242.mol2 ./m184.mol2 ./m115.mol2 m197_am1 m129_am1 m069_am1 > >>> m163_am1 m128_am1 m035_am1 m070_am1 m221_am1 m162_am1 m198_am1 
> >>> m034_am1 m001_am1 m220_am1 m033_am1 m161_am1 m032_am1 m160_am1 > >>> m130_am1 m071_am1 m002_am1 m199_am1 m175_am1 m234_am1 m048_am1 > >>> m107_am1 m047_am1 m106_am1 m124_am1 m193_am1 m225_am1 m066_am1 > >>> m125_am1 m176_am1 m194_am1 m224_am1 m235_am1 m067_am1 m165_am1 > >>> m049_am1 m126_am1 m166_am1 m108_am1 m195_am1 m038_am1 m059_am1 > >>> m036_am1 m186_am1 m164_am1 m223_am1 m117_am1 m037_am1 m058_am1 > >>> m068_am1 m188_am1 m119_am1 m196_am1 m187_am1 m222_am1 m127_am1 > >>> m118_am1 m189_am1 m060_am1 m236_am1 m109_am1 m177_am1 m050_am1 > >>> m179_am1 m123_am1 m178_am1 m237_am1 m100_am1 m191_am1 m110_am1 > >>> m041_am1 m064_am1 m228_am1 m063_am1 m238_am1 m169_am1 m122_am1 > >>> m051_am1 m121_am1 m190_am1 m120_am1 m062_am1 m039_am1 m065_am1 > >>> m167_am1 m192_am1 m227_am1 m040_am1 m226_am1 m168_am1 m239_am1 > >>> m052_am1 m111_am1 m180_am1 m053_am1 m112_am1 m181_am1 m240_am1 > >>> m054_am1 m044_am1 m113_am1 m230_am1 m103_am1 m229_am1 m061_am1 > >>> m042_am1 m101_am1 m170_am1 m043_am1 m102_am1 m171_am1 m151_am1 > >>> m083_am1 m210_am1 m014_am1 m023_am1 m200_am1 m092_am1 m091_am1 > >>> m150_am1 m209_am1 m022_am1 m024_am1 m093_am1 m015_am1 m084_am1 > >>> m142_am1 m201_am1 m016_am1 m085_am1 m143_am1 m202_am1 m010_am1 > >>> m212_am1 m138_am1 m026_am1 m011_am1 m095_am1 m139_am1 m154_am1 > >>> m211_am1 m025_am1 m094_am1 m153_am1 m213_am1 m080_am1 m012_am1 > >>> m152_am1 m081_am1 m140_am1 m013_am1 m082_am1 m141_am1 m028_am1 > >>> m097_am1 m155_am1 m008_am1 m214_am1 m135_am1 m029_am1 m076_am1 > >>> m098_am1 m007_am1 m156_am1 m134_am1 m215_am1 m137_am1 m079_am1 > >>> m009_am1 m078_am1 m077_am1 m096_am1 m136_am1 m027_am1 m132_am1 > >>> m158_am1 m073_am1 m217_am1 m030_am1 m159_am1 m072_am1 m218_am1 > >>> m003_am1 m031_am1 m004_am1 m219_am1 m131_am1 m074_am1 m133_am1 > >>> m006_am1 m075_am1 m157_am1 m099_am1 m216_am1 m005_am1 m090_am1 > >>> m021_am1 m208_am1 m149_am1 m020_am1 m207_am1 m148_am1 m089_am1 > >>> m088_am1 m206_am1 m147_am1 m019_am1 m205_am1 m146_am1 m087_am1 > >>> m018_am1 m204_am1 m145_am1 m086_am1 m017_am1 m144_am1 m203_am1 > >>> m057_am1 m116_am1 m232_am1 m173_am1 m105_am1 m046_am1 m231_am1 > >>> m172_am1 m104_am1 m045_am1 m174_am1 m233_am1 m244_am1 m185_am1 > >>> m182_am1 m243_am1 m055_am1 m241_am1 m183_am1 m114_am1 m056_am1 > >>> m242_am1 m184_am1 m115_am1 /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> pre-antch.pl > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-58lc8efi stdout.txt stderr.txt m197_am1 m197_am1.rtf > >>> m197_am1.crd m197_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m197_am1 -fi mol2 -rn m197 -o m197_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-48lc8efi stdout.txt stderr.txt m129_am1 m129_am1.rtf > >>> m129_am1.crd m129_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m129_am1 -fi mol2 -rn m129 -o m129_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-68lc8efi stdout.txt stderr.txt m069_am1 m069_am1.rtf > >>> m069_am1.crd m069_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m069_am1 -fi mol2 -rn m069 -o m069_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-88lc8efi stdout.txt stderr.txt m163_am1 m163_am1.rtf > >>> m163_am1.crd m163_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m163_am1 -fi mol2 -rn m163 -o m163_am1 -fo > >>> 
charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-78lc8efi stdout.txt stderr.txt m128_am1 m128_am1.rtf > >>> m128_am1.crd m128_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m128_am1 -fi mol2 -rn m128 -o m128_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-98lc8efi stdout.txt stderr.txt m035_am1 m035_am1.rtf > >>> m035_am1.crd m035_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m035_am1 -fi mol2 -rn m035 -o m035_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-a8lc8efi stdout.txt stderr.txt m070_am1 m070_am1.rtf > >>> m070_am1.crd m070_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m070_am1 -fi mol2 -rn m070 -o m070_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-b8lc8efi stdout.txt stderr.txt m221_am1 m221_am1.rtf > >>> m221_am1.crd m221_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m221_am1 -fi mol2 -rn m221 -o m221_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-c8lc8efi stdout.txt stderr.txt m162_am1 m162_am1.rtf > >>> m162_am1.crd m162_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m162_am1 -fi mol2 -rn m162 -o m162_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-d8lc8efi stdout.txt stderr.txt m198_am1 m198_am1.rtf > >>> m198_am1.crd m198_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m198_am1 -fi mol2 -rn m198 -o m198_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-e8lc8efi stdout.txt stderr.txt m034_am1 m034_am1.rtf > >>> m034_am1.crd m034_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m034_am1 -fi mol2 -rn m034 -o m034_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-f8lc8efi stdout.txt stderr.txt m001_am1 m001_am1.rtf > >>> m001_am1.crd m001_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m001_am1 -fi mol2 -rn m001 -o m001_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-h8lc8efi stdout.txt stderr.txt m033_am1 m033_am1.rtf > >>> m033_am1.crd m033_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m033_am1 -fi mol2 -rn m033 -o m033_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-g8lc8efi stdout.txt stderr.txt m220_am1 m220_am1.rtf > >>> m220_am1.crd m220_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m220_am1 -fi mol2 -rn m220 -o m220_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-i8lc8efi stdout.txt stderr.txt m161_am1 m161_am1.rtf > >>> m161_am1.crd m161_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m161_am1 -fi mol2 -rn m161 -o m161_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-j8lc8efi stdout.txt stderr.txt m032_am1 m032_am1.rtf > >>> m032_am1.crd m032_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i 
m032_am1 -fi mol2 -rn m032 -o m032_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-k8lc8efi stdout.txt stderr.txt m160_am1 m160_am1.rtf > >>> m160_am1.crd m160_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m160_am1 -fi mol2 -rn m160 -o m160_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-l8lc8efi stdout.txt stderr.txt m130_am1 m130_am1.rtf > >>> m130_am1.crd m130_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m130_am1 -fi mol2 -rn m130 -o m130_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-m8lc8efi stdout.txt stderr.txt m071_am1 m071_am1.rtf > >>> m071_am1.crd m071_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m071_am1 -fi mol2 -rn m071 -o m071_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-o8lc8efi stdout.txt stderr.txt m199_am1 m199_am1.rtf > >>> m199_am1.crd m199_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m199_am1 -fi mol2 -rn m199 -o m199_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-n8lc8efi stdout.txt stderr.txt m002_am1 m002_am1.rtf > >>> m002_am1.crd m002_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m002_am1 -fi mol2 -rn m002 -o m002_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-p8lc8efi stdout.txt stderr.txt m175_am1 m175_am1.rtf > >>> m175_am1.crd m175_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m175_am1 -fi mol2 -rn m175 -o m175_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-q8lc8efi stdout.txt stderr.txt m234_am1 m234_am1.rtf > >>> m234_am1.crd m234_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m234_am1 -fi mol2 -rn m234 -o m234_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-s8lc8efi stdout.txt stderr.txt m107_am1 m107_am1.rtf > >>> m107_am1.crd m107_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m107_am1 -fi mol2 -rn m107 -o m107_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-r8lc8efi stdout.txt stderr.txt m048_am1 m048_am1.rtf > >>> m048_am1.crd m048_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m048_am1 -fi mol2 -rn m048 -o m048_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-v8lc8efi stdout.txt stderr.txt m124_am1 m124_am1.rtf > >>> m124_am1.crd m124_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m124_am1 -fi mol2 -rn m124 -o m124_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-t8lc8efi stdout.txt stderr.txt m047_am1 m047_am1.rtf > >>> m047_am1.crd m047_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m047_am1 -fi mol2 -rn m047 -o m047_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-u8lc8efi stdout.txt stderr.txt m106_am1 m106_am1.rtf > >>> m106_am1.crd m106_am1.prm 
/disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m106_am1 -fi mol2 -rn m106 -o m106_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-x8lc8efi stdout.txt stderr.txt m193_am1 m193_am1.rtf > >>> m193_am1.crd m193_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m193_am1 -fi mol2 -rn m193 -o m193_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-y8lc8efi stdout.txt stderr.txt m225_am1 m225_am1.rtf > >>> m225_am1.crd m225_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m225_am1 -fi mol2 -rn m225 -o m225_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-z8lc8efi stdout.txt stderr.txt m066_am1 m066_am1.rtf > >>> m066_am1.crd m066_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m066_am1 -fi mol2 -rn m066 -o m066_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-09lc8efi stdout.txt stderr.txt m125_am1 m125_am1.rtf > >>> m125_am1.crd m125_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m125_am1 -fi mol2 -rn m125 -o m125_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-29lc8efi stdout.txt stderr.txt m194_am1 m194_am1.rtf > >>> m194_am1.crd m194_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m194_am1 -fi mol2 -rn m194 -o m194_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-19lc8efi stdout.txt stderr.txt m176_am1 m176_am1.rtf > >>> m176_am1.crd m176_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m176_am1 -fi mol2 -rn m176 -o m176_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-39lc8efi stdout.txt stderr.txt m224_am1 m224_am1.rtf > >>> m224_am1.crd m224_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m224_am1 -fi mol2 -rn m224 -o m224_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-49lc8efi stdout.txt stderr.txt m235_am1 m235_am1.rtf > >>> m235_am1.crd m235_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m235_am1 -fi mol2 -rn m235 -o m235_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-69lc8efi stdout.txt stderr.txt m165_am1 m165_am1.rtf > >>> m165_am1.crd m165_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m165_am1 -fi mol2 -rn m165 -o m165_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-59lc8efi stdout.txt stderr.txt m067_am1 m067_am1.rtf > >>> m067_am1.crd m067_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m067_am1 -fi mol2 -rn m067 -o m067_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-79lc8efi stdout.txt stderr.txt m049_am1 m049_am1.rtf > >>> m049_am1.crd m049_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m049_am1 -fi mol2 -rn m049 -o m049_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-89lc8efi stdout.txt stderr.txt m126_am1 
m126_am1.rtf > >>> m126_am1.crd m126_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m126_am1 -fi mol2 -rn m126 -o m126_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-99lc8efi stdout.txt stderr.txt m166_am1 m166_am1.rtf > >>> m166_am1.crd m166_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m166_am1 -fi mol2 -rn m166 -o m166_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-a9lc8efi stdout.txt stderr.txt m108_am1 m108_am1.rtf > >>> m108_am1.crd m108_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m108_am1 -fi mol2 -rn m108 -o m108_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-b9lc8efi stdout.txt stderr.txt m195_am1 m195_am1.rtf > >>> m195_am1.crd m195_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m195_am1 -fi mol2 -rn m195 -o m195_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-d9lc8efi stdout.txt stderr.txt m038_am1 m038_am1.rtf > >>> m038_am1.crd m038_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m038_am1 -fi mol2 -rn m038 -o m038_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-c9lc8efi stdout.txt stderr.txt m059_am1 m059_am1.rtf > >>> m059_am1.crd m059_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m059_am1 -fi mol2 -rn m059 -o m059_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-e9lc8efi stdout.txt stderr.txt m186_am1 m186_am1.rtf > >>> m186_am1.crd m186_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m186_am1 -fi mol2 -rn m186 -o m186_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-f9lc8efi stdout.txt stderr.txt m164_am1 m164_am1.rtf > >>> m164_am1.crd m164_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m164_am1 -fi mol2 -rn m164 -o m164_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-h9lc8efi stdout.txt stderr.txt m036_am1 m036_am1.rtf > >>> m036_am1.crd m036_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m036_am1 -fi mol2 -rn m036 -o m036_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-g9lc8efi stdout.txt stderr.txt m223_am1 m223_am1.rtf > >>> m223_am1.crd m223_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m223_am1 -fi mol2 -rn m223 -o m223_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-j9lc8efi stdout.txt stderr.txt m058_am1 m058_am1.rtf > >>> m058_am1.crd m058_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m058_am1 -fi mol2 -rn m058 -o m058_am1 -fo > >>> charmm -c bcc > >>> 128.135.160.234 : EXECUTABLE /bin/sh ARGUEMENTS shared/wrapper.sh > >>> antch-k9lc8efi stdout.txt stderr.txt m037_am1 m037_am1.rtf > >>> m037_am1.crd m037_am1.prm /disks/scratchgpfs1/iraicu/ModLyn/bin/ > >>> antechamber.sh -s 2 -i m037_am1 -fi mol2 -rn m037 -o m037_am1 -fo > >>> charmm -c bcc > >>> > >>> > >>> > >>> Veronika Nefedova wrote: > >>>> Swift thinks that it 
sent 248 jobs. > >>>> > >>>> nefedova at viper:~/alamines> grep "Running job " MolDyn-244-loops- > >>>> dbui34oxjr4j2.log | wc > >>>> 248 6931 56718 > >>>> nefedova at viper:~/alamines> > >>>> > >>>> On Aug 6, 2007, at 3:27 PM, Ioan Raicu wrote: > >>>> > >>>>> Everything is idle, there is no work to be done... > >>>>> > >>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> tail > >>>>> GenericPortalWS_perf_per_sec.txt > >>>>> 3510.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3511.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3512.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3513.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3514.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3515.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3516.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3517.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3518.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> 3519.997 0 2 41 24 24 0 0 0.0 0.0 0.0 0.0 57.0 0.0 > >>>>> > >>>>> 24 workers are registered but idle.... queue length 0, 57 jobs > >>>>> completed. > >>>>> > >>>>> Also, see below all 57 jobs, they all finished with an exit > >>>>> code of 0, in other words succesfully! How many jobs does > >>>>> Swift think it sent? > >>>>> > >>>>> Ioan > >>>>> > >>>>> iraicu at tg-viz-login2:~/java/Falkon_v0.8.1/service/logs> cat > >>>>> GenericPortalWS_taskPerf.txt > >>>>> //taskNum taskID workerID startTimeStamp execTimeStamp > >>>>> resultsQueueTimeStamp endTimeStamp waitQueueTime ex > >>>>> ecTime resultsQueueTime totalTime exitCode > >>>>> 1 urn:0-0-1186428880921 192.5.198.70:50100 510496 560276 560614 > >>>>> 560629 49780 338 15 50133 0 > >>>>> 2 urn:0-1-1-0-1186428880939 192.5.198.70:50101 560984 561200 > >>>>> 561899 561909 216 699 10 925 0 > >>>>> 3 urn:0-1-2-0-1186428880941 192.5.198.70:50100 560991 561373 > >>>>> 562150 562159 382 777 9 1168 0 > >>>>> 4 urn:0-0-1186429254652 192.5.198.71:50100 972312 1034716 > >>>>> 1044916 1044926 62404 10200 10 72614 0 > >>>>> 5 urn:0-1-2-0-1186429255467 192.5.198.71:50101 1046318 1046453 > >>>>> 1047038 1047067 135 585 29 749 0 > >>>>> 6 urn:0-1-1-0-1186429255461 192.5.198.71:50100 1046315 1046429 > >>>>> 1053072 1053080 114 6643 8 6765 0 > >>>>> 7 urn:0-1-3-0-1186429255469 192.5.198.71:50101 1046320 1047051 > >>>>> 1054256 1054290 731 7205 34 7970 0 > >>>>> 8 urn:0-1-5-0-1186429255481 192.5.198.71:50101 1046324 1054267 > >>>>> 1054570 1054579 7943 303 9 8255 0 > >>>>> 9 urn:0-1-4-0-1186429255479 192.5.198.71:50100 1046322 1053087 > >>>>> 1056811 1056819 6765 3724 8 10497 0 > >>>>> 10 urn:0-1-6-0-1186429255484 192.5.198.71:50101 1046326 1054583 > >>>>> 1058691 1058719 8257 4108 28 12393 0 > >>>>> 11 urn:0-1-8-0-1186429255495 192.5.198.71:50101 1046331 1058704 > >>>>> 1059363 1059385 12373 659 22 13054 0 > >>>>> 12 urn:0-1-7-0-1186429255486 192.5.198.71:50100 1046329 1056826 > >>>>> 1060315 1060323 10497 3489 8 13994 0 > >>>>> 13 urn:0-1-9-0-1186429255502 192.5.198.71:50101 1046333 1059375 > >>>>> 1060589 1060596 13042 1214 7 14263 0 > >>>>> 14 urn:0-1-11-0-1186429255514 192.5.198.71:50101 1046338 > >>>>> 1060603 1060954 1061054 14265 351 100 14716 0 > >>>>> 15 urn:0-1-10-0-1186429255511 192.5.198.71:50100 1046336 > >>>>> 1060329 1061094 1061126 13993 765 32 14790 0 > >>>>> 16 urn:0-1-14-0-1186429255533 192.5.198.71:50100 1046691 > >>>>> 1061105 1065608 1065617 14414 4503 9 18926 0 > >>>>> 17 urn:0-1-13-0-1186429255535 192.5.198.71:50100 1046693 > >>>>> 1065622 1066307 1066315 18929 685 8 19622 0 > >>>>> 
18 urn:0-1-12-0-1186429255524 192.5.198.71:50101 1046689 > >>>>> 1061045 1067540 1067563 14356 6495 23 20874 0 > >>>>> 19 urn:0-1-15-0-1186429255539 192.5.198.71:50100 1046695 > >>>>> 1066320 1069262 1069271 19625 2942 9 22576 0 > >>>>> 20 urn:0-1-16-0-1186429255543 192.5.198.71:50101 1046697 > >>>>> 1067551 1071003 1071011 20854 3452 8 24314 0 > >>>>> 21 urn:0-1-18-0-1186429255559 192.5.198.71:50101 1046700 > >>>>> 1071016 1071664 1071671 24316 648 7 24971 0 > >>>>> 22 urn:0-1-17-0-1186429255557 192.5.198.71:50100 1046698 > >>>>> 1069275 1071679 1071692 22577 2404 13 24994 0 > >>>>> 23 urn:0-1-19-0-1186429255565 192.5.198.71:50101 1046702 > >>>>> 1071687 1073978 1073988 24985 2291 10 27286 0 > >>>>> 24 urn:0-1-20-0-1186429255572 192.5.198.71:50101 1046706 > >>>>> 1073992 1075959 1075969 27286 1967 10 29263 0 > >>>>> 25 urn:0-1-21-0-1186429255567 192.5.198.71:50100 1046704 > >>>>> 1071699 1076704 1076713 24995 5005 9 30009 0 > >>>>> 26 urn:0-1-22-0-1186429255587 192.5.198.71:50101 1046708 > >>>>> 1075972 1077451 1077459 29264 1479 8 30751 0 > >>>>> 27 urn:0-1-23-0-1186429255595 192.5.198.71:50100 1046710 > >>>>> 1076717 1080157 1080165 30007 3440 8 33455 0 > >>>>> 28 urn:0-1-25-0-1186429255599 192.5.198.71:50101 1046712 > >>>>> 1077464 1080270 1080286 30752 2806 16 33574 0 > >>>>> 29 urn:0-1-24-0-1186429255601 192.5.198.71:50100 1046713 > >>>>> 1080170 1080611 1080619 33457 441 8 33906 0 > >>>>> 30 urn:0-1-26-0-1186429255613 192.5.198.71:50100 1046717 > >>>>> 1080624 1080973 1080983 33907 349 10 34266 0 > >>>>> 31 urn:0-1-28-0-1186429255611 192.5.198.71:50101 1046715 > >>>>> 1080281 1081405 1081413 33566 1124 8 34698 0 > >>>>> 32 urn:0-1-27-0-1186429255616 192.5.198.71:50100 1046719 > >>>>> 1080986 1082989 1082996 34267 2003 7 36277 0 > >>>>> 33 urn:0-1-30-0-1186429255635 192.5.198.71:50100 1046723 > >>>>> 1083002 1083370 1083378 36279 368 8 36655 0 > >>>>> 34 urn:0-1-29-0-1186429255622 192.5.198.71:50101 1046721 > >>>>> 1081417 1084830 1084837 34696 3413 7 38116 0 > >>>>> 35 urn:0-1-32-0-1186429255652 192.5.198.71:50101 1047082 > >>>>> 1084843 1085854 1085879 37761 1011 25 38797 0 > >>>>> 36 urn:0-1-34-0-1186429255654 192.5.198.71:50101 1047085 > >>>>> 1085865 1089502 1089511 38780 3637 9 42426 0 > >>>>> 37 urn:0-1-33-0-1186429255656 192.5.198.71:50101 1047087 > >>>>> 1089515 1089966 1089974 42428 451 8 42887 0 > >>>>> 38 urn:0-1-31-0-1186429255642 192.5.198.71:50100 1046725 > >>>>> 1083383 1091316 1091324 36658 7933 8 44599 0 > >>>>> 39 urn:0-1-36-0-1186429255664 192.5.198.71:50100 1047092 > >>>>> 1091329 1092042 1092049 44237 713 7 44957 0 > >>>>> 40 urn:0-1-38-0-1186429255673 192.5.198.71:50100 1047095 > >>>>> 1092055 1094242 1094249 44960 2187 7 47154 0 > >>>>> 41 urn:0-1-35-0-1186429255658 192.5.198.71:50101 1047090 > >>>>> 1089979 1094418 1094428 42889 4439 10 47338 0 > >>>>> 42 urn:0-1-40-0-1186429255696 192.5.198.71:50101 1047102 > >>>>> 1094433 1095082 1095089 47331 649 7 47987 0 > >>>>> 43 urn:0-1-41-0-1186429255692 192.5.198.71:50101 1047104 > >>>>> 1095095 1096846 1096853 47991 1751 7 49749 0 > >>>>> 44 urn:0-1-39-0-1186429255686 192.5.198.71:50100 1047100 > >>>>> 1094256 1098214 1098221 47156 3958 7 51121 0 > >>>>> 45 urn:0-1-42-0-1186429255700 192.5.198.71:50101 1047107 > >>>>> 1096859 1098627 1098637 49752 1768 10 51530 0 > >>>>> 46 urn:0-1-37-0-1186429255681 192.5.198.67:50100 1047097 > >>>>> 1094037 1098903 1098910 46940 4866 7 51813 0 > >>>>> 47 urn:0-1-50-0-1186429255749 192.5.198.67:50101 1047121 > >>>>> 1099192 1100210 1100246 52071 1018 36 53125 0 > >>>>> 48 
urn:0-1-44-0-1186429255720 192.5.198.57:50101 1047111 > >>>>> 1097371 1100555 1100562 50260 3184 7 53451 0 > >>>>> 49 urn:0-1-43-0-1186429255705 192.5.198.66:50100 1047109 > >>>>> 1097135 1100896 1100904 50026 3761 8 53795 0 > >>>>> 50 urn:0-1-48-0-1186429255737 192.5.198.71:50101 1047117 > >>>>> 1098640 1101106 1101127 51523 2466 21 54010 0 > >>>>> 51 urn:0-1-51-0-1186429255755 192.5.198.55:50100 1047123 > >>>>> 1099965 1101217 1101224 52842 1252 7 54101 0 > >>>>> 52 urn:0-1-47-0-1186429255731 192.5.198.71:50100 1047115 > >>>>> 1098227 1101820 1101828 51112 3593 8 54713 0 > >>>>> 53 urn:0-1-45-0-1186429255723 192.5.198.57:50100 1047113 > >>>>> 1097375 1104132 1104139 50262 6757 7 57026 0 > >>>>> 54 urn:0-1-52-0-1186429255764 192.5.198.67:50101 1047125 > >>>>> 1100221 1106449 1106458 53096 6228 9 59333 0 > >>>>> 55 urn:0-1-46-0-1186429255743 192.5.198.67:50100 1047119 > >>>>> 1098916 1106473 1106481 51797 7557 8 59362 0 > >>>>> 56 urn:0-1-2-1-1186428881026 192.5.198.70:50101 563313 563384 > >>>>> 1207793 1207801 71 644409 8 644488 0 > >>>>> 57 urn:0-1-1-1-1186428881028 192.5.198.70:50100 563315 563413 > >>>>> 1216404 1216425 98 652991 21 653110 0 > >>>>> > >>>>> > >>>>> > >>>>> Veronika Nefedova wrote: > >>>>>> OK. There is something weird happening. I've got several such > >>>>>> entries in my swift log: > >>>>>> > >>>>>> 2007-08-06 14:46:58,565 DEBUG vdl:execute2 Application > >>>>>> exception: Task failed > >>>>>> task:execute @ vdl-int.k, line: 332 > >>>>>> vdl:execute2 @ execute-default.k, line: 22 > >>>>>> vdl:execute @ MolDyn-244-loops.kml, line: 20 > >>>>>> antchmbr @ MolDyn-244-loops.kml, line: 2845 > >>>>>> vdl:mains @ MolDyn-244-loops.kml, line: 2267 > >>>>>> > >>>>>> > >>>>>> Looks like antechamber has failed (?). And the failure is only > >>>>>> on a swfit side, it never made it across to Falcon (there are > >>>>>> no remote directories created). But I see some of antechamber > >>>>>> jobs have finished (in shared). > >>>>>> > >>>>>> Yuqing -- could the changes you've made be responsible for > >>>>>> these failures (I do not see how it could though) ? > >>>>>> > >>>>>> Ioan, what do you see in your logs ion these tasks: > >>>>>> > >>>>>> 2007-08-06 14:46:58,555 DEBUG TaskImpl Task(type=1, > >>>>>> identity=urn:0-1-56-0-1186429255786) setting status to Failed > >>>>>> 2007-08-06 14:46:58,556 DEBUG TaskImpl Task(type=1, > >>>>>> identity=urn:0-1-57-0-1186429255798) setting status to Failed > >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > >>>>>> identity=urn:0-1-59-0-1186429255800) setting status to Failed > >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > >>>>>> identity=urn:0-1-60-0-1186429255805) setting status to Failed > >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > >>>>>> identity=urn:0-1-61-0-1186429255811) setting status to Failed > >>>>>> 2007-08-06 14:46:58,558 DEBUG TaskImpl Task(type=1, > >>>>>> identity=urn:0-1-58-0-1186429255814) setting status to Failed > >>>>>> > >>>>>> Nika > >>>>>> > >>>>>> On Aug 6, 2007, at 2:29 PM, Ioan Raicu wrote: > >>>>>> > >>>>>>> OK! > >>>>>>> Why don't we do one last run from my allocation, as > >>>>>>> everything is set up already and ready to go! Make sure to > >>>>>>> enable all debug logging. Falkon is up and running with all > >>>>>>> debug enabled! > >>>>>>> > >>>>>>> Falkon location is unchanged from the last experiment. 
> >>>>>>> Falkon Factory Service: http://tg-viz-login2:50010/wsrf/ > >>>>>>> services/GenericPortal/core/WS/GPFactoryService > >>>>>>> Web Server (graphs): http://tg-viz-login2.uc.teragrid.org: > >>>>>>> 51000/index.htm > >>>>>>> > >>>>>>> ANL/UC is not quite so idle as it was earlier, but I bet we > >>>>>>> could still get 150~200 processors! > >>>>>>> > >>>>>>> Ioan > >>>>>>> > >>>>>>> Veronika Nefedova wrote: > >>>>>>>> m050 and m179 finished just fine now via GRAM (thanks to > >>>>>>>> Yuqing who fixed the m179 just in time!). We could start > >>>>>>>> again the 244- molecule run to verify that nothing is wrong > >>>>>>>> with the whole system. > >>>>>>>> > >>>>>>>> Nika > >>>>>>>> > >>>>>>>> On Aug 6, 2007, at 12:20 PM, Veronika Nefedova wrote: > >>>>>>>> > >>>>>>>>> > >>>>>>>>> On Aug 6, 2007, at 11:51 AM, Ioan Raicu wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> I started those 2 molecules via GRAM. I have no trust in > >>>>>>>>> m179 finishing completely since I didn't change anything. I > >>>>>>>>> hope for m050 to finish though... > >>>>>>>>> You can watch the swift log on viper in ~nefedova/alamines/ > >>>>>>>>> MolDyn-2-loops-be9484k93kk21.log > >>>>>>>>> > >>>>>>>>> Nika > >>>>>>>>> > >>>>>>>>>> Then, let's try another run with 244 molecules soon, as > >>>>>>>>>> most of ANL/UC is free! > >>>>>>>>>> > >>>>>>>>>> Ioan > >>>>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> > >>>>>> > >>>>>> > >>>>> > >>>> > >>>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >> > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Tue Aug 7 01:17:44 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 7 Aug 2007 06:17:44 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> Message-ID: On Mon, 6 Aug 2007, Veronika Nefedova wrote: > OK. I accidentally closed viper window where I started the workflow. The > workflow was started with & so it was supposed to stay up even if I exited the > shell. But apparently it didn't! Like Mihael said, it won't. The semantics we have for swift at the moment are the usual unix semantics of 'close the window, kill the process'. You can use screen (which I think you know how to use because you're running it...) to get the disconnect semantics that you want (i.e. same as for your mud client?) 
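A minimal sketch of the two usual ways to get those disconnect semantics, assuming a long-running swift invocation like the MolDyn one (the command line below is illustrative, not the exact one used in these runs):

  # Option 1: run inside a named screen session; detach with Ctrl-a d, reattach later
  screen -S moldyn
  swift MolDyn-244-loops.swift
  screen -r moldyn

  # Option 2: background the run but shield it from the hangup sent when the window closes
  nohup swift MolDyn-244-loops.swift > moldyn.out 2>&1 &

A bare '&' only backgrounds the process; it typically still receives SIGHUP when the terminal goes away, which is why the run died with the window.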
-- From nikan at wideopenwest.com Tue Aug 7 10:01:02 2007 From: nikan at wideopenwest.com (Veronika Nefedova) Date: Tue, 7 Aug 2007 10:01:02 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B7E6EF.6080909@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> Message-ID: <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> Mihael, do you have any clues on why this run has failed? Ioan - my answers to your questions are below... On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: > It looks like viper (where Swift is running) is idle, and so is tg- > viz-login2 (where Falkon is running). > What looks evident to me is that the normal list of events is for a > successful task: > iraicu at viper:/home/nefedova/alamines> grep "urn: > 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log > 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-73-2-31-0-0-1186444341989) setting status to Submitted > 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: > 0-1-73-2-31-0-0-1186444341989 0 > 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: > 0-1-73-2-31-0-0-1186444341989) setting status to Completed > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > 17566 175660 2179412 > > iraicu at viper:/home/nefedova/alamines> grep "NotificationThread > notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > 7959 55713 785035 > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > 190968 1909680 24003796 > > Now, 17566 tasks were submitted, 7959 notifiation were received > from Falkon, and 190968 tasks were set to completed... > > Obviously this isn't right. Falkon only saw 7959 tasks, so I would > argue that the # of notifications received is correct. The > submitted # of tasks looks like the # I would have expected, but > all the tasks did not make it to Falkon. The Falkon provider is > what sits between the change of status to submitted, and the > receipt of the notification, so I would say that is the first place > we need to look for more details... there used to some extra debug > info in the Falkon provider that simply printed all the tasks that > were actually being submitted to Falkon (as opposed to just the > change of status within Karajan). I don't see those debug > statements, I bet they got overwritten in the SVN update. > What about the completed tasks, why are there so many (190K) > completed tasks? Where did they come from? > "Task" doesn't mean job. It could be just data being staged in , etc. The first 2 are important -- (Submitted vs Completed). Since it differs, this is the problem... > Yong, are you keeping up with these emails? 
Do you still have a > copy of the latest Falkon provider that you edited just before you > left? Can you just take a look through there to make sure nothing > has been broken with the SVN updates? If you don't have time for > this now (considering today was your first day on the new job), > I'll dig through there and see if I can make some sense of what is > happening! > > One last thing, Ben mentioned that the Falkon provider you saw in > Nika's account was different than what was in SVN. Ben, did you at > least look at modification dates? How old was one as opposed to > the other? I hope we did not revert back to an older version that > might have had some bug in it.... > I had to update to the latest version of provider-deef from SVN since without the update nothing worked. The version I am at now is 1050. But this is exactly the same version of swift/deef I used for our Friday run (which 'worked' from Falcon/Swift point of view) Nika > Ioan > > Veronika Nefedova wrote: >> Well, there are some discrepancies: >> >> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >> zhgo6be8tjhi1.log | wc >> 7959 244749 3241072 >> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >> zhgo6be8tjhi1.log | wc >> 17207 564648 7949388 >> nefedova at viper:~/alamines> >> >> I.e. almost half of the jobs haven't finished (according to swift) >> >> I also have some exceptions: >> >> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >> in getFile >> (80 of those): >> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >> zhgo6be8tjhi1.log | wc >> 80 880 9705 >> nefedova at viper:~/alamines> >> >> >> Nika From hategan at mcs.anl.gov Tue Aug 7 10:12:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 07 Aug 2007 10:12:19 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> Message-ID: <1186499539.18053.0.camel@blabla.mcs.anl.gov> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: > Mihael, do you have any clues on why this run has failed? Ioan - my > answers to your questions are below... Either give me an account on viper or copy that log to a machine I have access to. > > On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: > > > It looks like viper (where Swift is running) is idle, and so is tg- > > viz-login2 (where Falkon is running). 
> > What looks evident to me is that the normal list of events is for a > > successful task: > > iraicu at viper:/home/nefedova/alamines> grep "urn: > > 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log > > 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: > > 0-1-73-2-31-0-0-1186444341989) setting status to Submitted > > 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: > > 0-1-73-2-31-0-0-1186444341989 0 > > 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: > > 0-1-73-2-31-0-0-1186444341989) setting status to Completed > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > 17566 175660 2179412 > > > > iraicu at viper:/home/nefedova/alamines> grep "NotificationThread > > notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > 7959 55713 785035 > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > 190968 1909680 24003796 > > > > Now, 17566 tasks were submitted, 7959 notifiation were received > > from Falkon, and 190968 tasks were set to completed... > > > > Obviously this isn't right. Falkon only saw 7959 tasks, so I would > > argue that the # of notifications received is correct. The > > submitted # of tasks looks like the # I would have expected, but > > all the tasks did not make it to Falkon. The Falkon provider is > > what sits between the change of status to submitted, and the > > receipt of the notification, so I would say that is the first place > > we need to look for more details... there used to some extra debug > > info in the Falkon provider that simply printed all the tasks that > > were actually being submitted to Falkon (as opposed to just the > > change of status within Karajan). I don't see those debug > > statements, I bet they got overwritten in the SVN update. > > What about the completed tasks, why are there so many (190K) > > completed tasks? Where did they come from? > > > > > "Task" doesn't mean job. It could be just data being staged in , etc. > The first 2 are important -- (Submitted vs Completed). Since it > differs, this is the problem... > > > > Yong, are you keeping up with these emails? Do you still have a > > copy of the latest Falkon provider that you edited just before you > > left? Can you just take a look through there to make sure nothing > > has been broken with the SVN updates? If you don't have time for > > this now (considering today was your first day on the new job), > > I'll dig through there and see if I can make some sense of what is > > happening! > > > > One last thing, Ben mentioned that the Falkon provider you saw in > > Nika's account was different than what was in SVN. Ben, did you at > > least look at modification dates? How old was one as opposed to > > the other? I hope we did not revert back to an older version that > > might have had some bug in it.... > > > > I had to update to the latest version of provider-deef from SVN since > without the update nothing worked. The version I am at now is 1050. 
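One way to make those three counts comparable is to split the status transitions by Task type, since, as Nika points out in this exchange, a Task can be a file staging operation rather than a job. A sketch against the TaskImpl lines quoted above (log name taken from the thread; treating type=1 as job execution and type=2 as a file operation is an assumption drawn from the getFile failures mentioned elsewhere in the thread):

  for s in Submitted Completed Failed; do
    echo "== $s =="
    grep "setting status to $s" MolDyn-244-loops-zhgo6be8tjhi1.log \
      | sed 's/.*Task(type=\([0-9]*\),.*/type \1/' \
      | sort | uniq -c
  done

Comparing only the type=1 counts for Submitted against Completed would show whether jobs, rather than transfers, are the ones going missing.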
> But this is exactly the same version of swift/deef I used for our > Friday run (which 'worked' from Falcon/Swift point of view) > > Nika > > > > Ioan > > > > Veronika Nefedova wrote: > >> Well, there are some discrepancies: > >> > >> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- > >> zhgo6be8tjhi1.log | wc > >> 7959 244749 3241072 > >> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- > >> zhgo6be8tjhi1.log | wc > >> 17207 564648 7949388 > >> nefedova at viper:~/alamines> > >> > >> I.e. almost half of the jobs haven't finished (according to swift) > >> > >> I also have some exceptions: > >> > >> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: > >> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception > >> in getFile > >> (80 of those): > >> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- > >> zhgo6be8tjhi1.log | wc > >> 80 880 9705 > >> nefedova at viper:~/alamines> > >> > >> > >> Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From iraicu at cs.uchicago.edu Tue Aug 7 11:18:16 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 07 Aug 2007 11:18:16 -0500 Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: References: <469BE095.4010608@cs.uchicago.edu> Message-ID: <46B89B48.1040209@cs.uchicago.edu> Hi Ben, I finally took the plunge, and wanted to see how SVN works :) Thanks for setting it up, I think its going to come in very useful, as several people are about to start editing the Falkon code... Catalin, Zhao, you, maybe Mihael, myself, etc... I did the svn co.... did a little house cleaning, removed some files, moved others, edited some scripts, did a clean, make, and run to test things.... and now I wanted to commit all my changes. I did: iraicu at viper:~/java/svn/falkon> svn commit svn: Commit failed (details follow): svn: Working copy '/home/iraicu/java/svn/falkon/service/build' is missing or not locked My compile scripts are set to remove the service/build directory, and it gets created new every time you compile the service. Is this a problem? My guess is that the service/build directory should not be in SVN, as it gets generated at compile time! Any hints on what I can do to commit my changes? Thanks, Ioan Ben Clifford wrote: > On Mon, 16 Jul 2007, Ioan Raicu wrote: > > >> Hey Ben, >> Here is the latest Falkon code base, including all compiled classes, scripts, >> libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, >> etc... its the entire branch that is needed containing all the different >> Falkon components. I would have preffered to clean things up a bit, but here >> it is, and I'll do the clean-up later... >> http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz >> > > I just imported this into the vdl2 subversion repo. > > Type: > > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > to get the checkout. > > I removed the embedded JRE (putting aside issues of whether we should big > binaries like that in the SVN, a quick glance at the JRE redistribution > licence looked like it was not something acceptable) > > If you edit files, you can commit them with: > > svn commit > > which will require you to feed in your CI password. 
> > Type svn update in the root directory of your checkout to pull down > changes that other people have made since your last checkout/update > (probably you'll find me making a bunch of those to tidy some things up) > > If you add files, you will need to: > > svn add myfile.java > > before committing it. > > This is the tarball as I received it, so has lots of built cruft in there > (.class files and things). > > I'll help work on tidying that up in the repository. > > Please commit any changes you have made since this tarball, and begin > making your releases from committed SVN code rather than from your own > private codebase - that way, people can talk about 'falkon built from > r972' and then everyone can look at the exact code version from SVN. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Aug 7 11:24:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 07 Aug 2007 11:24:12 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> References: <46AF37D9.7000301@mcs.anl.gov> <46B3B367.6090701@mcs.anl.gov> <46B3FA77.90801@cs.uchicago.edu> <1E69C5C1-ACB4-4540-93BD-3BBDC2AD6C1A@mcs.anl.gov> <46B74B8B.1080408@cs.uchicago.edu> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> Message-ID: <1186503852.18998.3.camel@blabla.mcs.anl.gov> Well, it doesn't look like the falkon provider in SVN has been updated at all in terms of fixing synchronization issues. All commits on provider-deef come from either ben or me: bash-3.1$ svn log ------------------------------------------------------------------------ r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug 2007) | 1 line removed gt4 stuff and added them as a dependency ------------------------------------------------------------------------ r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug 2007) | 1 line removed gt4 stuff and added them as a dependency ------------------------------------------------------------------------ r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug 2007) | 1 line a very small readme for provider-deef ------------------------------------------------------------------------ r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun 2007) | 1 line remove dist directory form svn ------------------------------------------------------------------------ r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun 2007) | 20 lines provider-deef, the Falkon/cog provider based on source in below message, with .class files deleted Date: Wed, 27 Jun 2007 09:27:23 -0500 From: Veronika Nefedova To: Yong Zhao Cc: Ben Clifford , Mihael Hategan , iraicu at cs.uchicago.edu, Ian Foster , Mike Wilde , Tiberiu Stef-Praun Subject: Re: 244 molecule MolDyn run... 
its on viper.uchicago.edu in : /home/nefedova/cogl/modules/provider-deef/ I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz Nika ------------------------------------------------------------------------ On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: > Mihael, do you have any clues on why this run has failed? Ioan - my > answers to your questions are below... > > On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: > > > It looks like viper (where Swift is running) is idle, and so is tg- > > viz-login2 (where Falkon is running). > > What looks evident to me is that the normal list of events is for a > > successful task: > > iraicu at viper:/home/nefedova/alamines> grep "urn: > > 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log > > 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: > > 0-1-73-2-31-0-0-1186444341989) setting status to Submitted > > 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: > > 0-1-73-2-31-0-0-1186444341989 0 > > 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: > > 0-1-73-2-31-0-0-1186444341989) setting status to Completed > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > 17566 175660 2179412 > > > > iraicu at viper:/home/nefedova/alamines> grep "NotificationThread > > notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > 7959 55713 785035 > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > 190968 1909680 24003796 > > > > Now, 17566 tasks were submitted, 7959 notifiation were received > > from Falkon, and 190968 tasks were set to completed... > > > > Obviously this isn't right. Falkon only saw 7959 tasks, so I would > > argue that the # of notifications received is correct. The > > submitted # of tasks looks like the # I would have expected, but > > all the tasks did not make it to Falkon. The Falkon provider is > > what sits between the change of status to submitted, and the > > receipt of the notification, so I would say that is the first place > > we need to look for more details... there used to some extra debug > > info in the Falkon provider that simply printed all the tasks that > > were actually being submitted to Falkon (as opposed to just the > > change of status within Karajan). I don't see those debug > > statements, I bet they got overwritten in the SVN update. > > What about the completed tasks, why are there so many (190K) > > completed tasks? Where did they come from? > > > > > "Task" doesn't mean job. It could be just data being staged in , etc. > The first 2 are important -- (Submitted vs Completed). Since it > differs, this is the problem... > > > > Yong, are you keeping up with these emails? Do you still have a > > copy of the latest Falkon provider that you edited just before you > > left? Can you just take a look through there to make sure nothing > > has been broken with the SVN updates? If you don't have time for > > this now (considering today was your first day on the new job), > > I'll dig through there and see if I can make some sense of what is > > happening! > > > > One last thing, Ben mentioned that the Falkon provider you saw in > > Nika's account was different than what was in SVN. Ben, did you at > > least look at modification dates? How old was one as opposed to > > the other? I hope we did not revert back to an older version that > > might have had some bug in it.... 
> > > > I had to update to the latest version of provider-deef from SVN since > without the update nothing worked. The version I am at now is 1050. > But this is exactly the same version of swift/deef I used for our > Friday run (which 'worked' from Falcon/Swift point of view) > > Nika > > > > Ioan > > > > Veronika Nefedova wrote: > >> Well, there are some discrepancies: > >> > >> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- > >> zhgo6be8tjhi1.log | wc > >> 7959 244749 3241072 > >> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- > >> zhgo6be8tjhi1.log | wc > >> 17207 564648 7949388 > >> nefedova at viper:~/alamines> > >> > >> I.e. almost half of the jobs haven't finished (according to swift) > >> > >> I also have some exceptions: > >> > >> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: > >> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception > >> in getFile > >> (80 of those): > >> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- > >> zhgo6be8tjhi1.log | wc > >> 80 880 9705 > >> nefedova at viper:~/alamines> > >> > >> > >> Nika > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Tue Aug 7 11:29:44 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 07 Aug 2007 11:29:44 -0500 Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <46B89B48.1040209@cs.uchicago.edu> References: <469BE095.4010608@cs.uchicago.edu> <46B89B48.1040209@cs.uchicago.edu> Message-ID: <1186504184.18998.4.camel@blabla.mcs.anl.gov> Remove the build directory. cd ~/java/svn/falkon/service svn rm build svn ci On Tue, 2007-08-07 at 11:18 -0500, Ioan Raicu wrote: > Hi Ben, > I finally took the plunge, and wanted to see how SVN works :) Thanks > for setting it up, I think its going to come in very useful, as > several people are about to start editing the Falkon code... Catalin, > Zhao, you, maybe Mihael, myself, etc... > > I did the svn co.... > > did a little house cleaning, removed some files, moved others, edited > some scripts, did a clean, make, and run to test things.... and now I > wanted to commit all my changes. > > I did: > iraicu at viper:~/java/svn/falkon> svn commit > svn: Commit failed (details follow): > svn: Working copy '/home/iraicu/java/svn/falkon/service/build' is > missing or not locked > > My compile scripts are set to remove the service/build directory, and > it gets created new every time you compile the service. Is this a > problem? My guess is that the service/build directory should not be > in SVN, as it gets generated at compile time! > > Any hints on what I can do to commit my changes? > > Thanks, > Ioan > > Ben Clifford wrote: > > On Mon, 16 Jul 2007, Ioan Raicu wrote: > > > > > > > Hey Ben, > > > Here is the latest Falkon code base, including all compiled classes, scripts, > > > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, > > > etc... its the entire branch that is needed containing all the different > > > Falkon components. I would have preffered to clean things up a bit, but here > > > it is, and I'll do the clean-up later... > > > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz > > > > > > > I just imported this into the vdl2 subversion repo. > > > > Type: > > > > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > > > to get the checkout. 
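Since service/build is regenerated at every compile, one way to keep it from reappearing as unversioned clutter after the removal suggested above is the standard svn:ignore property (a sketch; paths taken from this thread):

  cd ~/java/svn/falkon/service
  svn rm build                      # drop the generated tree from version control
  svn propset svn:ignore "build" .  # stop svn status from flagging it after rebuilds
  svn ci -m "remove generated build directory and ignore it"

Both the deletion and the property change go out in the same commit.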
> > > > I removed the embedded JRE (putting aside issues of whether we should big > > binaries like that in the SVN, a quick glance at the JRE redistribution > > licence looked like it was not something acceptable) > > > > If you edit files, you can commit them with: > > > > svn commit > > > > which will require you to feed in your CI password. > > > > Type svn update in the root directory of your checkout to pull down > > changes that other people have made since your last checkout/update > > (probably you'll find me making a bunch of those to tidy some things up) > > > > If you add files, you will need to: > > > > svn add myfile.java > > > > before committing it. > > > > This is the tarball as I received it, so has lots of built cruft in there > > (.class files and things). > > > > I'll help work on tidying that up in the repository. > > > > Please commit any changes you have made since this tarball, and begin > > making your releases from committed SVN code rather than from your own > > private codebase - that way, people can talk about 'falkon built from > > r972' and then everyone can look at the exact code version from SVN. > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel From iraicu at cs.uchicago.edu Tue Aug 7 11:32:13 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 07 Aug 2007 11:32:13 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186503852.18998.3.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> Message-ID: <46B89E8D.1060809@cs.uchicago.edu> Could it be that the fixes were done before the original SVN checkin? If not, then at least we know why things aren't working. I bet the latest provider source was in Nika's Swift install on viper. Nika, I take it you don't have this anymore, as SVN updates overwrote this. Yong, is there any other place you might have the latest provider source? If not, I guess we need to take another look through the provider source to fix the issues that we knew of... Ioan Mihael Hategan wrote: > Well, it doesn't look like the falkon provider in SVN has been updated > at all in terms of fixing synchronization issues. 
All commits on > provider-deef come from either ben or me: > > bash-3.1$ svn log > ------------------------------------------------------------------------ > r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug > 2007) | 1 line > > removed gt4 stuff and added them as a dependency > ------------------------------------------------------------------------ > r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug > 2007) | 1 line > > removed gt4 stuff and added them as a dependency > ------------------------------------------------------------------------ > r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug > 2007) | 1 line > > a very small readme for provider-deef > ------------------------------------------------------------------------ > r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun > 2007) | 1 line > > remove dist directory form svn > ------------------------------------------------------------------------ > r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun > 2007) | 20 lines > > provider-deef, the Falkon/cog provider > > based on source in below message, with .class files deleted > > > Date: Wed, 27 Jun 2007 09:27:23 -0500 > From: Veronika Nefedova > To: Yong Zhao > Cc: Ben Clifford , Mihael Hategan > , > iraicu at cs.uchicago.edu, Ian Foster , > Mike Wilde , > Tiberiu Stef-Praun > Subject: Re: 244 molecule MolDyn run... > > its on viper.uchicago.edu > in : /home/nefedova/cogl/modules/provider-deef/ > I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz > > Nika > > > ------------------------------------------------------------------------ > > > On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: > >> Mihael, do you have any clues on why this run has failed? Ioan - my >> answers to your questions are below... >> >> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >> >> >>> It looks like viper (where Swift is running) is idle, and so is tg- >>> viz-login2 (where Falkon is running). >>> What looks evident to me is that the normal list of events is for a >>> successful task: >>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>> 0-1-73-2-31-0-0-1186444341989 0 >>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>> >>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>> 17566 175660 2179412 >>> >>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>> 7959 55713 785035 >>> >>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>> 190968 1909680 24003796 >>> >>> Now, 17566 tasks were submitted, 7959 notifiation were received >>> from Falkon, and 190968 tasks were set to completed... >>> >>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>> argue that the # of notifications received is correct. The >>> submitted # of tasks looks like the # I would have expected, but >>> all the tasks did not make it to Falkon. 
The Falkon provider is >>> what sits between the change of status to submitted, and the >>> receipt of the notification, so I would say that is the first place >>> we need to look for more details... there used to some extra debug >>> info in the Falkon provider that simply printed all the tasks that >>> were actually being submitted to Falkon (as opposed to just the >>> change of status within Karajan). I don't see those debug >>> statements, I bet they got overwritten in the SVN update. >>> What about the completed tasks, why are there so many (190K) >>> completed tasks? Where did they come from? >>> >>> >> "Task" doesn't mean job. It could be just data being staged in , etc. >> The first 2 are important -- (Submitted vs Completed). Since it >> differs, this is the problem... >> >> >> >>> Yong, are you keeping up with these emails? Do you still have a >>> copy of the latest Falkon provider that you edited just before you >>> left? Can you just take a look through there to make sure nothing >>> has been broken with the SVN updates? If you don't have time for >>> this now (considering today was your first day on the new job), >>> I'll dig through there and see if I can make some sense of what is >>> happening! >>> >>> One last thing, Ben mentioned that the Falkon provider you saw in >>> Nika's account was different than what was in SVN. Ben, did you at >>> least look at modification dates? How old was one as opposed to >>> the other? I hope we did not revert back to an older version that >>> might have had some bug in it.... >>> >>> >> I had to update to the latest version of provider-deef from SVN since >> without the update nothing worked. The version I am at now is 1050. >> But this is exactly the same version of swift/deef I used for our >> Friday run (which 'worked' from Falcon/Swift point of view) >> >> Nika >> >> >> >>> Ioan >>> >>> Veronika Nefedova wrote: >>> >>>> Well, there are some discrepancies: >>>> >>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>> zhgo6be8tjhi1.log | wc >>>> 7959 244749 3241072 >>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>> zhgo6be8tjhi1.log | wc >>>> 17207 564648 7949388 >>>> nefedova at viper:~/alamines> >>>> >>>> I.e. almost half of the jobs haven't finished (according to swift) >>>> >>>> I also have some exceptions: >>>> >>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>> in getFile >>>> (80 of those): >>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>> zhgo6be8tjhi1.log | wc >>>> 80 880 9705 >>>> nefedova at viper:~/alamines> >>>> >>>> >>>> Nika >>>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Tue Aug 7 11:54:28 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 07 Aug 2007 11:54:28 -0500 Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <1186504184.18998.4.camel@blabla.mcs.anl.gov> References: <469BE095.4010608@cs.uchicago.edu> <46B89B48.1040209@cs.uchicago.edu> <1186504184.18998.4.camel@blabla.mcs.anl.gov> Message-ID: <46B8A3C4.9030300@cs.uchicago.edu> OK, made it through all the commits, but now its asking for a user id and pass. Who do I ask to reset my pass? 
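If the client keeps prompting under the wrong default login, the username can be passed explicitly, and Subversion will cache the accepted credentials under ~/.subversion/auth for later commits (a sketch; 'iraicu' is the CI login discussed in the replies that follow):

  svn commit --username iraicu

This only helps once the account actually has commit rights to the repository, which turns out to be the real issue resolved further down the thread.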
Ioan Mihael Hategan wrote: > Remove the build directory. > > cd ~/java/svn/falkon/service > svn rm build > svn ci > > > On Tue, 2007-08-07 at 11:18 -0500, Ioan Raicu wrote: > >> Hi Ben, >> I finally took the plunge, and wanted to see how SVN works :) Thanks >> for setting it up, I think its going to come in very useful, as >> several people are about to start editing the Falkon code... Catalin, >> Zhao, you, maybe Mihael, myself, etc... >> >> I did the svn co.... >> >> did a little house cleaning, removed some files, moved others, edited >> some scripts, did a clean, make, and run to test things.... and now I >> wanted to commit all my changes. >> >> I did: >> iraicu at viper:~/java/svn/falkon> svn commit >> svn: Commit failed (details follow): >> svn: Working copy '/home/iraicu/java/svn/falkon/service/build' is >> missing or not locked >> >> My compile scripts are set to remove the service/build directory, and >> it gets created new every time you compile the service. Is this a >> problem? My guess is that the service/build directory should not be >> in SVN, as it gets generated at compile time! >> >> Any hints on what I can do to commit my changes? >> >> Thanks, >> Ioan >> >> Ben Clifford wrote: >> >>> On Mon, 16 Jul 2007, Ioan Raicu wrote: >>> >>> >>> >>>> Hey Ben, >>>> Here is the latest Falkon code base, including all compiled classes, scripts, >>>> libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, >>>> etc... its the entire branch that is needed containing all the different >>>> Falkon components. I would have preffered to clean things up a bit, but here >>>> it is, and I'll do the clean-up later... >>>> http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz >>>> >>>> >>> I just imported this into the vdl2 subversion repo. >>> >>> Type: >>> >>> svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon >>> >>> to get the checkout. >>> >>> I removed the embedded JRE (putting aside issues of whether we should big >>> binaries like that in the SVN, a quick glance at the JRE redistribution >>> licence looked like it was not something acceptable) >>> >>> If you edit files, you can commit them with: >>> >>> svn commit >>> >>> which will require you to feed in your CI password. >>> >>> Type svn update in the root directory of your checkout to pull down >>> changes that other people have made since your last checkout/update >>> (probably you'll find me making a bunch of those to tidy some things up) >>> >>> If you add files, you will need to: >>> >>> svn add myfile.java >>> >>> before committing it. >>> >>> This is the tarball as I received it, so has lots of built cruft in there >>> (.class files and things). >>> >>> I'll help work on tidying that up in the repository. >>> >>> Please commit any changes you have made since this tarball, and begin >>> making your releases from committed SVN code rather than from your own >>> private codebase - that way, people can talk about 'falkon built from >>> r972' and then everyone can look at the exact code version from SVN. >>> >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From nefedova at mcs.anl.gov Tue Aug 7 11:53:29 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Tue, 7 Aug 2007 11:53:29 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B89E8D.1060809@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D.1060809@cs.uchicago.edu> Message-ID: Ioan, It looks like the Falcon (including provider-deef) was put in SVN on June 27th. You really were supposed to use the SVN code from that point. Sigh. Did you do any changes to viper install after June 27th? Nika On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: > Could it be that the fixes were done before the original SVN > checkin? If not, then at least we know why things aren't > working. I bet the latest provider source was in Nika's Swift > install on viper. Nika, I take it you don't have this anymore, as > SVN updates overwrote this. Yong, is there any other place you > might have the latest provider source? If not, I guess we need to > take another look through the provider source to fix the issues > that we knew of... > > Ioan > > Mihael Hategan wrote: >> Well, it doesn't look like the falkon provider in SVN has been >> updated >> at all in terms of fixing synchronization issues. All commits on >> provider-deef come from either ben or me: >> >> bash-3.1$ svn log >> --------------------------------------------------------------------- >> --- >> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, >> 03 Aug >> 2007) | 1 line >> >> removed gt4 stuff and added them as a dependency >> --------------------------------------------------------------------- >> --- >> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, >> 03 Aug >> 2007) | 1 line >> >> removed gt4 stuff and added them as a dependency >> --------------------------------------------------------------------- >> --- >> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug >> 2007) | 1 line >> >> a very small readme for provider-deef >> --------------------------------------------------------------------- >> --- >> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun >> 2007) | 1 line >> >> remove dist directory form svn >> --------------------------------------------------------------------- >> --- >> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun >> 2007) | 20 lines >> >> provider-deef, the Falkon/cog provider >> >> based on source in below message, with .class files deleted >> >> >> Date: Wed, 27 Jun 2007 09:27:23 -0500 >> From: Veronika Nefedova >> To: Yong Zhao >> Cc: Ben Clifford , Mihael Hategan >> , >> iraicu at cs.uchicago.edu, Ian Foster , >> Mike Wilde , >> Tiberiu Stef-Praun >> Subject: Re: 244 molecule MolDyn run... 
>> >> its on viper.uchicago.edu >> in : /home/nefedova/cogl/modules/provider-deef/ >> I also tared it up and put in my home on terminable: ~nefedova/ >> cogl.tgz >> >> Nika >> >> >> --------------------------------------------------------------------- >> --- >> >> >> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >> >>> Mihael, do you have any clues on why this run has failed? Ioan - my >>> answers to your questions are below... >>> >>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>> >>> >>>> It looks like viper (where Swift is running) is idle, and so is tg- >>>> viz-login2 (where Falkon is running). >>>> What looks evident to me is that the normal list of events is for a >>>> successful task: >>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>>> 0-1-73-2-31-0-0-1186444341989 0 >>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>> >>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>> 17566 175660 2179412 >>>> >>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>> 7959 55713 785035 >>>> >>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>> 190968 1909680 24003796 >>>> >>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>> from Falkon, and 190968 tasks were set to completed... >>>> >>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>>> argue that the # of notifications received is correct. The >>>> submitted # of tasks looks like the # I would have expected, but >>>> all the tasks did not make it to Falkon. The Falkon provider is >>>> what sits between the change of status to submitted, and the >>>> receipt of the notification, so I would say that is the first place >>>> we need to look for more details... there used to some extra debug >>>> info in the Falkon provider that simply printed all the tasks that >>>> were actually being submitted to Falkon (as opposed to just the >>>> change of status within Karajan). I don't see those debug >>>> statements, I bet they got overwritten in the SVN update. >>>> What about the completed tasks, why are there so many (190K) >>>> completed tasks? Where did they come from? >>>> >>>> >>> "Task" doesn't mean job. It could be just data being staged in , >>> etc. >>> The first 2 are important -- (Submitted vs Completed). Since it >>> differs, this is the problem... >>> >>> >>> >>>> Yong, are you keeping up with these emails? Do you still have a >>>> copy of the latest Falkon provider that you edited just before you >>>> left? Can you just take a look through there to make sure nothing >>>> has been broken with the SVN updates? If you don't have time for >>>> this now (considering today was your first day on the new job), >>>> I'll dig through there and see if I can make some sense of what is >>>> happening! >>>> >>>> One last thing, Ben mentioned that the Falkon provider you saw in >>>> Nika's account was different than what was in SVN. Ben, did you at >>>> least look at modification dates? 
How old was one as opposed to >>>> the other? I hope we did not revert back to an older version that >>>> might have had some bug in it.... >>>> >>>> >>> I had to update to the latest version of provider-deef from SVN >>> since >>> without the update nothing worked. The version I am at now is 1050. >>> But this is exactly the same version of swift/deef I used for our >>> Friday run (which 'worked' from Falcon/Swift point of view) >>> >>> Nika >>> >>> >>> >>>> Ioan >>>> >>>> Veronika Nefedova wrote: >>>> >>>>> Well, there are some discrepancies: >>>>> >>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>>> zhgo6be8tjhi1.log | wc >>>>> 7959 244749 3241072 >>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>> zhgo6be8tjhi1.log | wc >>>>> 17207 564648 7949388 >>>>> nefedova at viper:~/alamines> >>>>> >>>>> I.e. almost half of the jobs haven't finished (according to swift) >>>>> >>>>> I also have some exceptions: >>>>> >>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>>> in getFile >>>>> (80 of those): >>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>> zhgo6be8tjhi1.log | wc >>>>> 80 880 9705 >>>>> nefedova at viper:~/alamines> >>>>> >>>>> >>>>> Nika >>>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Tue Aug 7 11:56:19 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 07 Aug 2007 11:56:19 -0500 Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <46B8A3C4.9030300@cs.uchicago.edu> References: <469BE095.4010608@cs.uchicago.edu> <46B89B48.1040209@cs.uchicago.edu> <1186504184.18998.4.camel@blabla.mcs.anl.gov> <46B8A3C4.9030300@cs.uchicago.edu> Message-ID: <46B8A433.4070407@cs.uchicago.edu> Actually, did the commit occur successfully? Maybe it was just my comment that was not saved? Ioan iraicu at viper:~/java/svn/falkon> svn ci just same basic cleanup...and testing how SVN works! 
-This line, and those below, will be ignored-- M container/share/schema/GenericPortal/FactoryService/Factory_service.wsdl M container/share/schema/GenericPortal/FactoryService/Factory_flattened.wsdl M container/share/schema/GenericPortal/FactoryService/Factory_bindings.wsdl M container/share/schema/GenericPortal/GPService_instance/GP_flattened.wsdl M container/share/schema/GenericPortal/GPService_instance/GP_bindings.wsdl M container/share/schema/GenericPortal/GPService_instance/GP_service.wsdl M container/lib/org_globus_GenericPortal_services_core_WS.jar M container/lib/org_globus_GenericPortal_services_core_WS_stubs.jar M container/lib/org_globus_GenericPortal_common.jar D service/build M service/org_globus_GenericPortal_common.jar M service/make.gpws.sh M service/run.gpws_local.sh M service/clean.gpws.sh M service/org_globus_GenericPortal_services_core_WS.gar M worker/lib/org_globus_GenericPortal_common.jar M worker/WorkerEPR.txt M worker/org_globus_GenericPortal_common.jar D client/drp_test M client/make.user.sh D client/pssh_stdout D client/DeeF_tests_nosec_drp_test D client/pssh_stderr A client/workloads A client/workloads/1c A client/workloads/10c A client/workloads/10s A client/workloads/100c A client/workloads/1000c A client/workloads/101c A client/workloads/30c A client/workloads/30s A client/workloads/30ss A client/workloads/sleep A client/workloads/sleep/sleep_8 A client/workloads/sleep/sleep_480 A client/workloads/sleep/sleep_0_2M A client/workloads/sleep/sleep_0 A client/workloads/sleep/sleep_1 A client/workloads/sleep/sleep_2 A client/workloads/sleep/sleep_local_8 A client/workloads/sleep/sleep_120 A client/workloads/sleep/sleep_4 A client/workloads/sleep/sleep_32 A client/workloads/sleep/sleep_240 A client/workloads/sleep/sleep_60 A client/workloads/sleep/sleep_16 D client/lib_old M client/org_globus_GenericPortal_common.jar ~ ~ ~ ~ ~ ~ ~ ~ "svn-commit.tmp" 52L, 2074C written Deleting client/DeeF_tests_nosec_drp_test Authentication realm: SVN Login Password for 'iraicu': Authentication realm: SVN Login Username: iraicu Password for 'iraicu': Authentication realm: SVN Login Username: svn: Commit failed (details follow): svn: CHECKOUT of '/svn/vdl2/!svn/ver/999/falkon/client': authorization failed (https://svn.ci.uchicago.edu) svn: Your commit message was left in a temporary file: svn: '/home/iraicu/java/svn/falkon/svn-commit.tmp' iraicu at viper:~/java/svn/falkon> Ioan Raicu wrote: > OK, made it through all the commits, but now its asking for a user id > and pass. Who do I ask to reset my pass? > > Ioan > > Mihael Hategan wrote: >> Remove the build directory. >> >> cd ~/java/svn/falkon/service >> svn rm build >> svn ci >> >> >> On Tue, 2007-08-07 at 11:18 -0500, Ioan Raicu wrote: >> >>> Hi Ben, >>> I finally took the plunge, and wanted to see how SVN works :) Thanks >>> for setting it up, I think its going to come in very useful, as >>> several people are about to start editing the Falkon code... Catalin, >>> Zhao, you, maybe Mihael, myself, etc... >>> >>> I did the svn co.... >>> >>> did a little house cleaning, removed some files, moved others, edited >>> some scripts, did a clean, make, and run to test things.... and now I >>> wanted to commit all my changes. 
>>> >>> I did: >>> iraicu at viper:~/java/svn/falkon> svn commit >>> svn: Commit failed (details follow): >>> svn: Working copy '/home/iraicu/java/svn/falkon/service/build' is >>> missing or not locked >>> >>> My compile scripts are set to remove the service/build directory, and >>> it gets created new every time you compile the service. Is this a >>> problem? My guess is that the service/build directory should not be >>> in SVN, as it gets generated at compile time! >>> >>> Any hints on what I can do to commit my changes? >>> >>> Thanks, >>> Ioan >>> >>> Ben Clifford wrote: >>> >>>> On Mon, 16 Jul 2007, Ioan Raicu wrote: >>>> >>>> >>>> >>>>> Hey Ben, >>>>> Here is the latest Falkon code base, including all compiled classes, scripts, >>>>> libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, >>>>> etc... its the entire branch that is needed containing all the different >>>>> Falkon components. I would have preffered to clean things up a bit, but here >>>>> it is, and I'll do the clean-up later... >>>>> http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz >>>>> >>>>> >>>> I just imported this into the vdl2 subversion repo. >>>> >>>> Type: >>>> >>>> svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon >>>> >>>> to get the checkout. >>>> >>>> I removed the embedded JRE (putting aside issues of whether we should big >>>> binaries like that in the SVN, a quick glance at the JRE redistribution >>>> licence looked like it was not something acceptable) >>>> >>>> If you edit files, you can commit them with: >>>> >>>> svn commit >>>> >>>> which will require you to feed in your CI password. >>>> >>>> Type svn update in the root directory of your checkout to pull down >>>> changes that other people have made since your last checkout/update >>>> (probably you'll find me making a bunch of those to tidy some things up) >>>> >>>> If you add files, you will need to: >>>> >>>> svn add myfile.java >>>> >>>> before committing it. >>>> >>>> This is the tarball as I received it, so has lots of built cruft in there >>>> (.class files and things). >>>> >>>> I'll help work on tidying that up in the repository. >>>> >>>> Please commit any changes you have made since this tarball, and begin >>>> making your releases from committed SVN code rather than from your own >>>> private codebase - that way, people can talk about 'falkon built from >>>> r972' and then everyone can look at the exact code version from SVN. >>>> >>>> >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> >> >> > ------------------------------------------------------------------------ > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Aug 7 11:58:52 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 07 Aug 2007 11:58:52 -0500 Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <46B8A3C4.9030300@cs.uchicago.edu> References: <469BE095.4010608@cs.uchicago.edu> <46B89B48.1040209@cs.uchicago.edu> <1186504184.18998.4.camel@blabla.mcs.anl.gov> <46B8A3C4.9030300@cs.uchicago.edu> Message-ID: <1186505932.22351.0.camel@blabla.mcs.anl.gov> It's your CI login. 
On Tue, 2007-08-07 at 11:54 -0500, Ioan Raicu wrote: > OK, made it through all the commits, but now its asking for a user id > and pass. Who do I ask to reset my pass? > > Ioan > > Mihael Hategan wrote: > > Remove the build directory. > > > > cd ~/java/svn/falkon/service > > svn rm build > > svn ci > > > > > > On Tue, 2007-08-07 at 11:18 -0500, Ioan Raicu wrote: > > > > > Hi Ben, > > > I finally took the plunge, and wanted to see how SVN works :) Thanks > > > for setting it up, I think its going to come in very useful, as > > > several people are about to start editing the Falkon code... Catalin, > > > Zhao, you, maybe Mihael, myself, etc... > > > > > > I did the svn co.... > > > > > > did a little house cleaning, removed some files, moved others, edited > > > some scripts, did a clean, make, and run to test things.... and now I > > > wanted to commit all my changes. > > > > > > I did: > > > iraicu at viper:~/java/svn/falkon> svn commit > > > svn: Commit failed (details follow): > > > svn: Working copy '/home/iraicu/java/svn/falkon/service/build' is > > > missing or not locked > > > > > > My compile scripts are set to remove the service/build directory, and > > > it gets created new every time you compile the service. Is this a > > > problem? My guess is that the service/build directory should not be > > > in SVN, as it gets generated at compile time! > > > > > > Any hints on what I can do to commit my changes? > > > > > > Thanks, > > > Ioan > > > > > > Ben Clifford wrote: > > > > > > > On Mon, 16 Jul 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > Hey Ben, > > > > > Here is the latest Falkon code base, including all compiled classes, scripts, > > > > > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, > > > > > etc... its the entire branch that is needed containing all the different > > > > > Falkon components. I would have preffered to clean things up a bit, but here > > > > > it is, and I'll do the clean-up later... > > > > > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz > > > > > > > > > > > > > > I just imported this into the vdl2 subversion repo. > > > > > > > > Type: > > > > > > > > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > > > > > > > to get the checkout. > > > > > > > > I removed the embedded JRE (putting aside issues of whether we should big > > > > binaries like that in the SVN, a quick glance at the JRE redistribution > > > > licence looked like it was not something acceptable) > > > > > > > > If you edit files, you can commit them with: > > > > > > > > svn commit > > > > > > > > which will require you to feed in your CI password. > > > > > > > > Type svn update in the root directory of your checkout to pull down > > > > changes that other people have made since your last checkout/update > > > > (probably you'll find me making a bunch of those to tidy some things up) > > > > > > > > If you add files, you will need to: > > > > > > > > svn add myfile.java > > > > > > > > before committing it. > > > > > > > > This is the tarball as I received it, so has lots of built cruft in there > > > > (.class files and things). > > > > > > > > I'll help work on tidying that up in the repository. > > > > > > > > Please commit any changes you have made since this tarball, and begin > > > > making your releases from committed SVN code rather than from your own > > > > private codebase - that way, people can talk about 'falkon built from > > > > r972' and then everyone can look at the exact code version from SVN. 
> > > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > From benc at hawaga.org.uk Tue Aug 7 12:00:59 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 7 Aug 2007 17:00:59 +0000 (GMT) Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <1186505932.22351.0.camel@blabla.mcs.anl.gov> References: <469BE095.4010608@cs.uchicago.edu> <46B89B48.1040209@cs.uchicago.edu> <1186504184.18998.4.camel@blabla.mcs.anl.gov> <46B8A3C4.9030300@cs.uchicago.edu> <1186505932.22351.0.camel@blabla.mcs.anl.gov> Message-ID: Though Ioan doesn't have committ access to the SVN repo. I'll get that fixed, though - please hold. On Tue, 7 Aug 2007, Mihael Hategan wrote: > It's your CI login. > > On Tue, 2007-08-07 at 11:54 -0500, Ioan Raicu wrote: > > OK, made it through all the commits, but now its asking for a user id > > and pass. Who do I ask to reset my pass? > > > > Ioan > > > > Mihael Hategan wrote: > > > Remove the build directory. > > > > > > cd ~/java/svn/falkon/service > > > svn rm build > > > svn ci > > > > > > > > > On Tue, 2007-08-07 at 11:18 -0500, Ioan Raicu wrote: > > > > > > > Hi Ben, > > > > I finally took the plunge, and wanted to see how SVN works :) Thanks > > > > for setting it up, I think its going to come in very useful, as > > > > several people are about to start editing the Falkon code... Catalin, > > > > Zhao, you, maybe Mihael, myself, etc... > > > > > > > > I did the svn co.... > > > > > > > > did a little house cleaning, removed some files, moved others, edited > > > > some scripts, did a clean, make, and run to test things.... and now I > > > > wanted to commit all my changes. > > > > > > > > I did: > > > > iraicu at viper:~/java/svn/falkon> svn commit > > > > svn: Commit failed (details follow): > > > > svn: Working copy '/home/iraicu/java/svn/falkon/service/build' is > > > > missing or not locked > > > > > > > > My compile scripts are set to remove the service/build directory, and > > > > it gets created new every time you compile the service. Is this a > > > > problem? My guess is that the service/build directory should not be > > > > in SVN, as it gets generated at compile time! > > > > > > > > Any hints on what I can do to commit my changes? > > > > > > > > Thanks, > > > > Ioan > > > > > > > > Ben Clifford wrote: > > > > > > > > > On Mon, 16 Jul 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > Hey Ben, > > > > > > Here is the latest Falkon code base, including all compiled classes, scripts, > > > > > > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, > > > > > > etc... its the entire branch that is needed containing all the different > > > > > > Falkon components. I would have preffered to clean things up a bit, but here > > > > > > it is, and I'll do the clean-up later... > > > > > > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz > > > > > > > > > > > > > > > > > I just imported this into the vdl2 subversion repo. > > > > > > > > > > Type: > > > > > > > > > > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > > > > > > > > > to get the checkout. 
> > > > > > > > > > I removed the embedded JRE (putting aside issues of whether we should big > > > > > binaries like that in the SVN, a quick glance at the JRE redistribution > > > > > licence looked like it was not something acceptable) > > > > > > > > > > If you edit files, you can commit them with: > > > > > > > > > > svn commit > > > > > > > > > > which will require you to feed in your CI password. > > > > > > > > > > Type svn update in the root directory of your checkout to pull down > > > > > changes that other people have made since your last checkout/update > > > > > (probably you'll find me making a bunch of those to tidy some things up) > > > > > > > > > > If you add files, you will need to: > > > > > > > > > > svn add myfile.java > > > > > > > > > > before committing it. > > > > > > > > > > This is the tarball as I received it, so has lots of built cruft in there > > > > > (.class files and things). > > > > > > > > > > I'll help work on tidying that up in the repository. > > > > > > > > > > Please commit any changes you have made since this tarball, and begin > > > > > making your releases from committed SVN code rather than from your own > > > > > private codebase - that way, people can talk about 'falkon built from > > > > > r972' and then everyone can look at the exact code version from SVN. > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > From hategan at mcs.anl.gov Tue Aug 7 12:01:51 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 07 Aug 2007 12:01:51 -0500 Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <46B8A433.4070407@cs.uchicago.edu> References: <469BE095.4010608@cs.uchicago.edu> <46B89B48.1040209@cs.uchicago.edu> <1186504184.18998.4.camel@blabla.mcs.anl.gov> <46B8A3C4.9030300@cs.uchicago.edu> <46B8A433.4070407@cs.uchicago.edu> Message-ID: <1186506111.22504.0.camel@blabla.mcs.anl.gov> Not it didn't. Try again. If it still doesn't work, send mail to ci-support. On Tue, 2007-08-07 at 11:56 -0500, Ioan Raicu wrote: > Actually, did the commit occur successfully? Maybe it was just my > comment that was not saved? > > Ioan > > iraicu at viper:~/java/svn/falkon> svn ci > > just same basic cleanup...and testing how SVN works! 
> -This line, and those below, will be ignored-- > > M > container/share/schema/GenericPortal/FactoryService/Factory_service.wsdl > M > container/share/schema/GenericPortal/FactoryService/Factory_flattened.wsdl > M > container/share/schema/GenericPortal/FactoryService/Factory_bindings.wsdl > M > container/share/schema/GenericPortal/GPService_instance/GP_flattened.wsdl > M > container/share/schema/GenericPortal/GPService_instance/GP_bindings.wsdl > M > container/share/schema/GenericPortal/GPService_instance/GP_service.wsdl > M container/lib/org_globus_GenericPortal_services_core_WS.jar > M container/lib/org_globus_GenericPortal_services_core_WS_stubs.jar > M container/lib/org_globus_GenericPortal_common.jar > D service/build > M service/org_globus_GenericPortal_common.jar > M service/make.gpws.sh > M service/run.gpws_local.sh > M service/clean.gpws.sh > M service/org_globus_GenericPortal_services_core_WS.gar > M worker/lib/org_globus_GenericPortal_common.jar > M worker/WorkerEPR.txt > M worker/org_globus_GenericPortal_common.jar > D client/drp_test > M client/make.user.sh > D client/pssh_stdout > D client/DeeF_tests_nosec_drp_test > D client/pssh_stderr > A client/workloads > A client/workloads/1c > A client/workloads/10c > A client/workloads/10s > A client/workloads/100c > A client/workloads/1000c > A client/workloads/101c > A client/workloads/30c > A client/workloads/30s > A client/workloads/30ss > A client/workloads/sleep > A client/workloads/sleep/sleep_8 > A client/workloads/sleep/sleep_480 > A client/workloads/sleep/sleep_0_2M > A client/workloads/sleep/sleep_0 > A client/workloads/sleep/sleep_1 > A client/workloads/sleep/sleep_2 > A client/workloads/sleep/sleep_local_8 > A client/workloads/sleep/sleep_120 > A client/workloads/sleep/sleep_4 > A client/workloads/sleep/sleep_32 > A client/workloads/sleep/sleep_240 > A client/workloads/sleep/sleep_60 > A client/workloads/sleep/sleep_16 > D client/lib_old > M client/org_globus_GenericPortal_common.jar > ~ > ~ > ~ > ~ > ~ > ~ > ~ > ~ > "svn-commit.tmp" 52L, 2074C > written > Deleting client/DeeF_tests_nosec_drp_test > Authentication realm: SVN Login > Password for 'iraicu': > Authentication realm: SVN Login > Username: iraicu > Password for 'iraicu': > Authentication realm: SVN Login > Username: svn: Commit failed (details follow): > svn: CHECKOUT of '/svn/vdl2/!svn/ver/999/falkon/client': authorization > failed (https://svn.ci.uchicago.edu) > svn: Your commit message was left in a temporary file: > svn: '/home/iraicu/java/svn/falkon/svn-commit.tmp' > iraicu at viper:~/java/svn/falkon> > > Ioan Raicu wrote: > > OK, made it through all the commits, but now its asking for a user > > id and pass. Who do I ask to reset my pass? > > > > Ioan > > > > Mihael Hategan wrote: > > > Remove the build directory. > > > > > > cd ~/java/svn/falkon/service > > > svn rm build > > > svn ci > > > > > > > > > On Tue, 2007-08-07 at 11:18 -0500, Ioan Raicu wrote: > > > > > > > Hi Ben, > > > > I finally took the plunge, and wanted to see how SVN works :) Thanks > > > > for setting it up, I think its going to come in very useful, as > > > > several people are about to start editing the Falkon code... Catalin, > > > > Zhao, you, maybe Mihael, myself, etc... > > > > > > > > I did the svn co.... > > > > > > > > did a little house cleaning, removed some files, moved others, edited > > > > some scripts, did a clean, make, and run to test things.... and now I > > > > wanted to commit all my changes. 
> > > > > > > > I did: > > > > iraicu at viper:~/java/svn/falkon> svn commit > > > > svn: Commit failed (details follow): > > > > svn: Working copy '/home/iraicu/java/svn/falkon/service/build' is > > > > missing or not locked > > > > > > > > My compile scripts are set to remove the service/build directory, and > > > > it gets created new every time you compile the service. Is this a > > > > problem? My guess is that the service/build directory should not be > > > > in SVN, as it gets generated at compile time! > > > > > > > > Any hints on what I can do to commit my changes? > > > > > > > > Thanks, > > > > Ioan > > > > > > > > Ben Clifford wrote: > > > > > > > > > On Mon, 16 Jul 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > Hey Ben, > > > > > > Here is the latest Falkon code base, including all compiled classes, scripts, > > > > > > libraries, 1.4 JRE, ploticus binaries, GT4 WS-core container, web server, > > > > > > etc... its the entire branch that is needed containing all the different > > > > > > Falkon components. I would have preffered to clean things up a bit, but here > > > > > > it is, and I'll do the clean-up later... > > > > > > http://people.cs.uchicago.edu/~iraicu/research/Falkon/Falkon_v0.8.1.tgz > > > > > > > > > > > > > > > > > I just imported this into the vdl2 subversion repo. > > > > > > > > > > Type: > > > > > > > > > > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > > > > > > > > > to get the checkout. > > > > > > > > > > I removed the embedded JRE (putting aside issues of whether we should big > > > > > binaries like that in the SVN, a quick glance at the JRE redistribution > > > > > licence looked like it was not something acceptable) > > > > > > > > > > If you edit files, you can commit them with: > > > > > > > > > > svn commit > > > > > > > > > > which will require you to feed in your CI password. > > > > > > > > > > Type svn update in the root directory of your checkout to pull down > > > > > changes that other people have made since your last checkout/update > > > > > (probably you'll find me making a bunch of those to tidy some things up) > > > > > > > > > > If you add files, you will need to: > > > > > > > > > > svn add myfile.java > > > > > > > > > > before committing it. > > > > > > > > > > This is the tarball as I received it, so has lots of built cruft in there > > > > > (.class files and things). > > > > > > > > > > I'll help work on tidying that up in the repository. > > > > > > > > > > Please commit any changes you have made since this tarball, and begin > > > > > making your releases from committed SVN code rather than from your own > > > > > private codebase - that way, people can talk about 'falkon built from > > > > > r972' and then everyone can look at the exact code version from SVN. 
> > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > Swift-devel mailing list > > > > Swift-devel at ci.uchicago.edu > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > ____________________________________________________________________ > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From iraicu at cs.uchicago.edu Tue Aug 7 12:03:35 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 07 Aug 2007 12:03:35 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> Message-ID: <46B8A5E7.3010100@cs.uchicago.edu> Veronika Nefedova wrote: > Ioan, > > It looks like the Falcon (including provider-deef) was put in SVN on > June 27th. The Falkon provider yes, but I believe the Falkon code only made it in on July 25th. I have not made any changes in Falkon itself since then, so there are no issues there. > You really were supposed to use the SVN code from that point. Sigh. > Did you do any changes to viper install after June 27th? Offcourse we have, this has been exactly the time that we have been trying to run the 244 mol run! The Falkon provider has seen quite a few fixes over the last 1~2 months. If we are now using a version from June 27th, I bet we are using one that had problems and were later fixed... in your Swift install on viper. If you had it checked out, and we modified it, you should have just checked it in. If you never had it checked out, then we were just editing a local copy. Let's see if Yong responds back with hopefully another copy of the provider source. Ioan > > Nika > > On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: > >> Could it be that the fixes were done before the original SVN >> checkin? If not, then at least we know why things aren't working. >> I bet the latest provider source was in Nika's Swift install on >> viper. Nika, I take it you don't have this anymore, as SVN updates >> overwrote this. Yong, is there any other place you might have the >> latest provider source? If not, I guess we need to take another look >> through the provider source to fix the issues that we knew of... >> >> Ioan >> >> Mihael Hategan wrote: >>> Well, it doesn't look like the falkon provider in SVN has been updated >>> at all in terms of fixing synchronization issues. 
All commits on >>> provider-deef come from either ben or me: >>> >>> bash-3.1$ svn log >>> ------------------------------------------------------------------------ >>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug >>> 2007) | 1 line >>> >>> removed gt4 stuff and added them as a dependency >>> ------------------------------------------------------------------------ >>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug >>> 2007) | 1 line >>> >>> removed gt4 stuff and added them as a dependency >>> ------------------------------------------------------------------------ >>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug >>> 2007) | 1 line >>> >>> a very small readme for provider-deef >>> ------------------------------------------------------------------------ >>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun >>> 2007) | 1 line >>> >>> remove dist directory form svn >>> ------------------------------------------------------------------------ >>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun >>> 2007) | 20 lines >>> >>> provider-deef, the Falkon/cog provider >>> >>> based on source in below message, with .class files deleted >>> >>> >>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>> From: Veronika Nefedova >>> To: Yong Zhao >>> Cc: Ben Clifford , Mihael Hategan >>> , >>> iraicu at cs.uchicago.edu, Ian Foster , >>> Mike Wilde , >>> Tiberiu Stef-Praun >>> Subject: Re: 244 molecule MolDyn run... >>> >>> its on viper.uchicago.edu >>> in : /home/nefedova/cogl/modules/provider-deef/ >>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz >>> >>> Nika >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>> >>>> Mihael, do you have any clues on why this run has failed? Ioan - my >>>> answers to your questions are below... >>>> >>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>> >>>> >>>>> It looks like viper (where Swift is running) is idle, and so is tg- >>>>> viz-login2 (where Falkon is running). >>>>> What looks evident to me is that the normal list of events is for a >>>>> successful task: >>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>> 17566 175660 2179412 >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>> 7959 55713 785035 >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>> 190968 1909680 24003796 >>>>> >>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>> from Falkon, and 190968 tasks were set to completed... >>>>> >>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>>>> argue that the # of notifications received is correct. 
The >>>>> submitted # of tasks looks like the # I would have expected, but >>>>> all the tasks did not make it to Falkon. The Falkon provider is >>>>> what sits between the change of status to submitted, and the >>>>> receipt of the notification, so I would say that is the first place >>>>> we need to look for more details... there used to some extra debug >>>>> info in the Falkon provider that simply printed all the tasks that >>>>> were actually being submitted to Falkon (as opposed to just the >>>>> change of status within Karajan). I don't see those debug >>>>> statements, I bet they got overwritten in the SVN update. >>>>> What about the completed tasks, why are there so many (190K) >>>>> completed tasks? Where did they come from? >>>>> >>>>> >>>> "Task" doesn't mean job. It could be just data being staged in , etc. >>>> The first 2 are important -- (Submitted vs Completed). Since it >>>> differs, this is the problem... >>>> >>>> >>>> >>>>> Yong, are you keeping up with these emails? Do you still have a >>>>> copy of the latest Falkon provider that you edited just before you >>>>> left? Can you just take a look through there to make sure nothing >>>>> has been broken with the SVN updates? If you don't have time for >>>>> this now (considering today was your first day on the new job), >>>>> I'll dig through there and see if I can make some sense of what is >>>>> happening! >>>>> >>>>> One last thing, Ben mentioned that the Falkon provider you saw in >>>>> Nika's account was different than what was in SVN. Ben, did you at >>>>> least look at modification dates? How old was one as opposed to >>>>> the other? I hope we did not revert back to an older version that >>>>> might have had some bug in it.... >>>>> >>>>> >>>> I had to update to the latest version of provider-deef from SVN since >>>> without the update nothing worked. The version I am at now is 1050. >>>> But this is exactly the same version of swift/deef I used for our >>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>> >>>> Nika >>>> >>>> >>>> >>>>> Ioan >>>>> >>>>> Veronika Nefedova wrote: >>>>> >>>>>> Well, there are some discrepancies: >>>>>> >>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>>>> zhgo6be8tjhi1.log | wc >>>>>> 7959 244749 3241072 >>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>>> zhgo6be8tjhi1.log | wc >>>>>> 17207 564648 7949388 >>>>>> nefedova at viper:~/alamines> >>>>>> >>>>>> I.e. almost half of the jobs haven't finished (according to swift) >>>>>> >>>>>> I also have some exceptions: >>>>>> >>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>>>> in getFile >>>>>> (80 of those): >>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>> zhgo6be8tjhi1.log | wc >>>>>> 80 880 9705 >>>>>> nefedova at viper:~/alamines> >>>>>> >>>>>> >>>>>> Nika >>>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
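The log triage being done by hand above (counting Submitted lines, Falkon notifications, Completed lines, and failures) can be wrapped in one small script so all the counts come out in a single pass. A rough sketch, assuming a Swift run log named like the one quoted in this thread (the file name is only the example used here):

    #!/bin/sh
    # Summarize task-status lines in a Swift run log.
    LOG=${1:-MolDyn-244-loops-zhgo6be8tjhi1.log}
    for pattern in "setting status to Submitted" \
                   "NotificationThread notification" \
                   "setting status to Completed" \
                   "Completed job" \
                   "Running job" \
                   "setting status to Failed"; do
        count=$(grep -c "$pattern" "$LOG")
        printf '%-35s %s\n' "$pattern" "$count"
    done

A large gap between "setting status to Submitted" and the notification count, as seen here, points at tasks that never reached Falkon or never produced a notification, rather than at jobs that ran and then failed.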
URL: From benc at hawaga.org.uk Tue Aug 7 12:05:45 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 7 Aug 2007 17:05:45 +0000 (GMT) Subject: [Swift-devel] Re: Falkon code and logs In-Reply-To: <46B8A433.4070407@cs.uchicago.edu> References: <469BE095.4010608@cs.uchicago.edu> <46B89B48.1040209@cs.uchicago.edu> <1186504184.18998.4.camel@blabla.mcs.anl.gov> <46B8A3C4.9030300@cs.uchicago.edu> <46B8A433.4070407@cs.uchicago.edu> Message-ID: On Tue, 7 Aug 2007, Ioan Raicu wrote: > Actually, did the commit occur successfully? If you want to see all the recent commits for any module, you can look at the web interface: http://www.ci.uchicago.edu/trac/swift/timeline At time of writing, the most recent is r1073 by me. When you make a commit successfully (which will happen sometime after you get commit rights - please wait), you should see the next in sequence appear at the top of that page with your name and commentary attached. -- From bugzilla-daemon at mcs.anl.gov Tue Aug 7 12:31:31 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 7 Aug 2007 12:31:31 -0500 (CDT) Subject: [Swift-devel] [Bug 86] recompilation should not be suppressed if compiler version has changed In-Reply-To: Message-ID: <20070807173131.B7C4E164EC@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=86 ------- Comment #1 from hategan at mcs.anl.gov 2007-08-07 12:31 ------- Both org.griphyn.vdl.toolkit.VDLt2VDLx and org.griphyn.vdl.engine.Karajan should have a getVersion() or preferably getLastModificationDate(). -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From benc at hawaga.org.uk Tue Aug 7 14:08:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 7 Aug 2007 19:08:51 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B8A5E7.3010100@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B8A5E7.3010100@cs.uchicago.edu> Message-ID: On Tue, 7 Aug 2007, Ioan Raicu wrote: > viper. If you had it checked out, and we modified it, you should have just > checked it in. No, the person who made the modifications should have 'just checked [them] in'. That's the way the whole world works! -- From benc at hawaga.org.uk Tue Aug 7 14:17:35 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 7 Aug 2007 19:17:35 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 
09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D.1060809@cs.uchicago.edu> Message-ID: On Tue, 7 Aug 2007, Veronika Nefedova wrote: > Did > you do any changes to viper install after June 27th? You really shouldn't be letting people mess round with your install - the *only* way in which code should land there is by you putting it there, and ideally you would only be putting it there from the various SVN locations. -- From iraicu at cs.uchicago.edu Tue Aug 7 14:40:36 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 07 Aug 2007 14:40:36 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D.1060809@cs.uchicago.edu> Message-ID: <46B8CAB4.6020702@cs.uchicago.edu> Right, ideally... but the debugging process was going so slow, that we started messing around with Yong and I making changes in Nika's account, having Nika use my credentials, etc... I spoke to Yong, and he said he did some tests from his account on viper just before he left, so there should be a good copy of the provider there! I'll look for it and see if I can patch up the Falkon provider! BTW, I still can't commit changes to SVN... Ioan Ben Clifford wrote: > On Tue, 7 Aug 2007, Veronika Nefedova wrote: > > >> Did >> you do any changes to viper install after June 27th? >> > > You really shouldn't be letting people mess round with your install - the > *only* way in which code should land there is by you putting it there, and > ideally you would only be putting it there from the various SVN locations. > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Tue Aug 7 14:43:22 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 7 Aug 2007 19:43:22 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B8CAB4.6020702@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D.1060809@cs.uchicago.edu> <46B8CAB4.6020702@cs.uchicago.edu> Message-ID: On Tue, 7 Aug 2007, Ioan Raicu wrote: > BTW, I still can't commit changes to SVN... Wait for response to that request i put in... 
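Besides the Trac timeline page mentioned earlier, whether a commit actually landed can be checked from the command line with plain svn. A quick sketch (the repository URL is the one already quoted in this thread; the revision shown will of course differ):

    # show the last few revisions recorded in the repository
    svn log -l 5 https://svn.ci.uchicago.edu/svn/vdl2/falkon
    # show the revision the working copy is at
    svn info | grep Revision
    # anything still listed as M (modified) or A (added) has not been committed
    svn status

If the commit succeeded, it appears at the top of the svn log output under your username; if svn status still lists changes, it did not.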
-- From iraicu at cs.uchicago.edu Wed Aug 8 11:59:26 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 08 Aug 2007 11:59:26 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> Message-ID: <46B9F66E.5060103@cs.uchicago.edu> OK everyone, I found Yong's version of the provider dated July 26th, much more recent than what was in SVN on June 27th. I updated Nika's version of the provider (which has been checked out of SVN), and recompiled&deploy! ant distclean ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist I even updated updated some of the logging info to use the logger (some were not using the logger). Nika, Falkon is freshly restarted and ready for another test run! Falkon Factory Service: http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm Ioan Veronika Nefedova wrote: > Ioan, > > It looks like the Falcon (including provider-deef) was put in SVN on > June 27th. You really were supposed to use the SVN code from that > point. Sigh. Did you do any changes to viper install after June 27th? > > Nika > > On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: > >> Could it be that the fixes were done before the original SVN >> checkin? If not, then at least we know why things aren't working. >> I bet the latest provider source was in Nika's Swift install on >> viper. Nika, I take it you don't have this anymore, as SVN updates >> overwrote this. Yong, is there any other place you might have the >> latest provider source? If not, I guess we need to take another look >> through the provider source to fix the issues that we knew of... >> >> Ioan >> >> Mihael Hategan wrote: >>> Well, it doesn't look like the falkon provider in SVN has been updated >>> at all in terms of fixing synchronization issues. 
All commits on >>> provider-deef come from either ben or me: >>> >>> bash-3.1$ svn log >>> ------------------------------------------------------------------------ >>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug >>> 2007) | 1 line >>> >>> removed gt4 stuff and added them as a dependency >>> ------------------------------------------------------------------------ >>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug >>> 2007) | 1 line >>> >>> removed gt4 stuff and added them as a dependency >>> ------------------------------------------------------------------------ >>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug >>> 2007) | 1 line >>> >>> a very small readme for provider-deef >>> ------------------------------------------------------------------------ >>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun >>> 2007) | 1 line >>> >>> remove dist directory form svn >>> ------------------------------------------------------------------------ >>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun >>> 2007) | 20 lines >>> >>> provider-deef, the Falkon/cog provider >>> >>> based on source in below message, with .class files deleted >>> >>> >>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>> From: Veronika Nefedova >>> To: Yong Zhao >>> Cc: Ben Clifford , Mihael Hategan >>> , >>> iraicu at cs.uchicago.edu, Ian Foster , >>> Mike Wilde , >>> Tiberiu Stef-Praun >>> Subject: Re: 244 molecule MolDyn run... >>> >>> its on viper.uchicago.edu >>> in : /home/nefedova/cogl/modules/provider-deef/ >>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz >>> >>> Nika >>> >>> >>> ------------------------------------------------------------------------ >>> >>> >>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>> >>>> Mihael, do you have any clues on why this run has failed? Ioan - my >>>> answers to your questions are below... >>>> >>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>> >>>> >>>>> It looks like viper (where Swift is running) is idle, and so is tg- >>>>> viz-login2 (where Falkon is running). >>>>> What looks evident to me is that the normal list of events is for a >>>>> successful task: >>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>> 17566 175660 2179412 >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>> 7959 55713 785035 >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>> 190968 1909680 24003796 >>>>> >>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>> from Falkon, and 190968 tasks were set to completed... >>>>> >>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>>>> argue that the # of notifications received is correct. 
The >>>>> submitted # of tasks looks like the # I would have expected, but >>>>> all the tasks did not make it to Falkon. The Falkon provider is >>>>> what sits between the change of status to submitted, and the >>>>> receipt of the notification, so I would say that is the first place >>>>> we need to look for more details... there used to some extra debug >>>>> info in the Falkon provider that simply printed all the tasks that >>>>> were actually being submitted to Falkon (as opposed to just the >>>>> change of status within Karajan). I don't see those debug >>>>> statements, I bet they got overwritten in the SVN update. >>>>> What about the completed tasks, why are there so many (190K) >>>>> completed tasks? Where did they come from? >>>>> >>>>> >>>> "Task" doesn't mean job. It could be just data being staged in , etc. >>>> The first 2 are important -- (Submitted vs Completed). Since it >>>> differs, this is the problem... >>>> >>>> >>>> >>>>> Yong, are you keeping up with these emails? Do you still have a >>>>> copy of the latest Falkon provider that you edited just before you >>>>> left? Can you just take a look through there to make sure nothing >>>>> has been broken with the SVN updates? If you don't have time for >>>>> this now (considering today was your first day on the new job), >>>>> I'll dig through there and see if I can make some sense of what is >>>>> happening! >>>>> >>>>> One last thing, Ben mentioned that the Falkon provider you saw in >>>>> Nika's account was different than what was in SVN. Ben, did you at >>>>> least look at modification dates? How old was one as opposed to >>>>> the other? I hope we did not revert back to an older version that >>>>> might have had some bug in it.... >>>>> >>>>> >>>> I had to update to the latest version of provider-deef from SVN since >>>> without the update nothing worked. The version I am at now is 1050. >>>> But this is exactly the same version of swift/deef I used for our >>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>> >>>> Nika >>>> >>>> >>>> >>>>> Ioan >>>>> >>>>> Veronika Nefedova wrote: >>>>> >>>>>> Well, there are some discrepancies: >>>>>> >>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>>>> zhgo6be8tjhi1.log | wc >>>>>> 7959 244749 3241072 >>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>>> zhgo6be8tjhi1.log | wc >>>>>> 17207 564648 7949388 >>>>>> nefedova at viper:~/alamines> >>>>>> >>>>>> I.e. almost half of the jobs haven't finished (according to swift) >>>>>> >>>>>> I also have some exceptions: >>>>>> >>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>>>> in getFile >>>>>> (80 of those): >>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>> zhgo6be8tjhi1.log | wc >>>>>> 80 880 9705 >>>>>> nefedova at viper:~/alamines> >>>>>> >>>>>> >>>>>> Nika >>>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -------------- next part -------------- An HTML attachment was scrubbed... 
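For completeness, the step that keeps a recovered copy like this from being lost on the next svn update is to push the changes back through the checkout, not only into the local build. A sketch under the paths quoted in this thread (the diff file name is illustrative, and svn add is only needed for files that svn status marks with '?'):

    cd /home/nefedova/cogl/modules/provider-deef
    # after copying the newer July 26 sources over the checked-out ones:
    svn status                                # see what differs from the repository
    svn diff > /tmp/provider-deef-sync.diff   # optional: keep a record of the delta
    # svn add <new files>                     # only for files svn status marks with '?'
    svn ci -m "sync provider-deef with the July 26 working version"

    # then rebuild and redeploy exactly as described above
    ant distclean
    ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist

That way a later svn update cannot silently roll the provider back to the June 27th version.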
URL: From hategan at mcs.anl.gov Wed Aug 8 13:00:43 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 08 Aug 2007 13:00:43 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B9F66E.5060103@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> Message-ID: <1186596043.28685.2.camel@blabla.mcs.anl.gov> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: > OK everyone, I found Yong's version of the provider dated July 26th, > much more recent than what was in SVN on June 27th. I updated Nika's > version of the provider (which has been checked out of SVN), No. P u t t h e c h a n g e s i n S V N ! > and recompiled&deploy! > > ant distclean > ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ > dist > > I even updated updated some of the logging info to use the logger > (some were not using the logger). > > Nika, Falkon is freshly restarted and ready for another test run! > > Falkon Factory Service: > http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService > Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm > > Ioan > > Veronika Nefedova wrote: > > Ioan, > > > > > > It looks like the Falcon (including provider-deef) was put in SVN on > > June 27th. You really were supposed to use the SVN code from that > > point. Sigh. Did you do any changes to viper install after June > > 27th? > > > > > > Nika > > > > On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: > > > > > Could it be that the fixes were done before the original SVN > > > checkin? If not, then at least we know why things aren't > > > working. I bet the latest provider source was in Nika's Swift > > > install on viper. Nika, I take it you don't have this anymore, as > > > SVN updates overwrote this. Yong, is there any other place you > > > might have the latest provider source? If not, I guess we need to > > > take another look through the provider source to fix the issues > > > that we knew of... > > > > > > Ioan > > > > > > Mihael Hategan wrote: > > > > Well, it doesn't look like the falkon provider in SVN has been updated > > > > at all in terms of fixing synchronization issues. 
All commits on > > > > provider-deef come from either ben or me: > > > > > > > > bash-3.1$ svn log > > > > ------------------------------------------------------------------------ > > > > r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug > > > > 2007) | 1 line > > > > > > > > removed gt4 stuff and added them as a dependency > > > > ------------------------------------------------------------------------ > > > > r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug > > > > 2007) | 1 line > > > > > > > > removed gt4 stuff and added them as a dependency > > > > ------------------------------------------------------------------------ > > > > r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug > > > > 2007) | 1 line > > > > > > > > a very small readme for provider-deef > > > > ------------------------------------------------------------------------ > > > > r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun > > > > 2007) | 1 line > > > > > > > > remove dist directory form svn > > > > ------------------------------------------------------------------------ > > > > r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun > > > > 2007) | 20 lines > > > > > > > > provider-deef, the Falkon/cog provider > > > > > > > > based on source in below message, with .class files deleted > > > > > > > > > > > > Date: Wed, 27 Jun 2007 09:27:23 -0500 > > > > From: Veronika Nefedova > > > > To: Yong Zhao > > > > Cc: Ben Clifford , Mihael Hategan > > > > , > > > > iraicu at cs.uchicago.edu, Ian Foster , > > > > Mike Wilde , > > > > Tiberiu Stef-Praun > > > > Subject: Re: 244 molecule MolDyn run... > > > > > > > > its on viper.uchicago.edu > > > > in : /home/nefedova/cogl/modules/provider-deef/ > > > > I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz > > > > > > > > Nika > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > > > > > On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: > > > > > > > > > Mihael, do you have any clues on why this run has failed? Ioan - my > > > > > answers to your questions are below... > > > > > > > > > > On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: > > > > > > > > > > > > > > > > It looks like viper (where Swift is running) is idle, and so is tg- > > > > > > viz-login2 (where Falkon is running). 
> > > > > > What looks evident to me is that the normal list of events is for a > > > > > > successful task: > > > > > > iraicu at viper:/home/nefedova/alamines> grep "urn: > > > > > > 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log > > > > > > 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: > > > > > > 0-1-73-2-31-0-0-1186444341989) setting status to Submitted > > > > > > 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: > > > > > > 0-1-73-2-31-0-0-1186444341989 0 > > > > > > 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: > > > > > > 0-1-73-2-31-0-0-1186444341989) setting status to Completed > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > > > > > Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > 17566 175660 2179412 > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "NotificationThread > > > > > > notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > 7959 55713 785035 > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > > > > > Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > 190968 1909680 24003796 > > > > > > > > > > > > Now, 17566 tasks were submitted, 7959 notifiation were received > > > > > > from Falkon, and 190968 tasks were set to completed... > > > > > > > > > > > > Obviously this isn't right. Falkon only saw 7959 tasks, so I would > > > > > > argue that the # of notifications received is correct. The > > > > > > submitted # of tasks looks like the # I would have expected, but > > > > > > all the tasks did not make it to Falkon. The Falkon provider is > > > > > > what sits between the change of status to submitted, and the > > > > > > receipt of the notification, so I would say that is the first place > > > > > > we need to look for more details... there used to some extra debug > > > > > > info in the Falkon provider that simply printed all the tasks that > > > > > > were actually being submitted to Falkon (as opposed to just the > > > > > > change of status within Karajan). I don't see those debug > > > > > > statements, I bet they got overwritten in the SVN update. > > > > > > What about the completed tasks, why are there so many (190K) > > > > > > completed tasks? Where did they come from? > > > > > > > > > > > > > > > > > "Task" doesn't mean job. It could be just data being staged in , etc. > > > > > The first 2 are important -- (Submitted vs Completed). Since it > > > > > differs, this is the problem... > > > > > > > > > > > > > > > > > > > > > Yong, are you keeping up with these emails? Do you still have a > > > > > > copy of the latest Falkon provider that you edited just before you > > > > > > left? Can you just take a look through there to make sure nothing > > > > > > has been broken with the SVN updates? If you don't have time for > > > > > > this now (considering today was your first day on the new job), > > > > > > I'll dig through there and see if I can make some sense of what is > > > > > > happening! > > > > > > > > > > > > One last thing, Ben mentioned that the Falkon provider you saw in > > > > > > Nika's account was different than what was in SVN. Ben, did you at > > > > > > least look at modification dates? How old was one as opposed to > > > > > > the other? I hope we did not revert back to an older version that > > > > > > might have had some bug in it.... 
> > > > > > > > > > > > > > > > > I had to update to the latest version of provider-deef from SVN since > > > > > without the update nothing worked. The version I am at now is 1050. > > > > > But this is exactly the same version of swift/deef I used for our > > > > > Friday run (which 'worked' from Falcon/Swift point of view) > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > Ioan > > > > > > > > > > > > Veronika Nefedova wrote: > > > > > > > > > > > > > Well, there are some discrepancies: > > > > > > > > > > > > > > nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > 7959 244749 3241072 > > > > > > > nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > 17207 564648 7949388 > > > > > > > nefedova at viper:~/alamines> > > > > > > > > > > > > > > I.e. almost half of the jobs haven't finished (according to swift) > > > > > > > > > > > > > > I also have some exceptions: > > > > > > > > > > > > > > 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: > > > > > > > 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception > > > > > > > in getFile > > > > > > > (80 of those): > > > > > > > nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > 80 880 9705 > > > > > > > nefedova at viper:~/alamines> > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From iraicu at cs.uchicago.edu Wed Aug 8 13:04:16 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 08 Aug 2007 13:04:16 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186596043.28685.2.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186596043.28685.2.camel@blabla.mcs.anl.gov> Message-ID: <46BA05A0.2070909@cs.uchicago.edu> Shouldn't we be certain that things work before we commit the changes? I thought the commit would take place after we try MolDyn out and we see things are back to normal. Ioan Mihael Hategan wrote: > On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: > >> OK everyone, I found Yong's version of the provider dated July 26th, >> much more recent than what was in SVN on June 27th. I updated Nika's >> version of the provider (which has been checked out of SVN), >> > > No. P u t t h e c h a n g e s i n S V N ! 
> > >> and recompiled&deploy! >> >> ant distclean >> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >> dist >> >> I even updated updated some of the logging info to use the logger >> (some were not using the logger). >> >> Nika, Falkon is freshly restarted and ready for another test run! >> >> Falkon Factory Service: >> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >> >> Ioan >> >> Veronika Nefedova wrote: >> >>> Ioan, >>> >>> >>> It looks like the Falcon (including provider-deef) was put in SVN on >>> June 27th. You really were supposed to use the SVN code from that >>> point. Sigh. Did you do any changes to viper install after June >>> 27th? >>> >>> >>> Nika >>> >>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>> >>> >>>> Could it be that the fixes were done before the original SVN >>>> checkin? If not, then at least we know why things aren't >>>> working. I bet the latest provider source was in Nika's Swift >>>> install on viper. Nika, I take it you don't have this anymore, as >>>> SVN updates overwrote this. Yong, is there any other place you >>>> might have the latest provider source? If not, I guess we need to >>>> take another look through the provider source to fix the issues >>>> that we knew of... >>>> >>>> Ioan >>>> >>>> Mihael Hategan wrote: >>>> >>>>> Well, it doesn't look like the falkon provider in SVN has been updated >>>>> at all in terms of fixing synchronization issues. All commits on >>>>> provider-deef come from either ben or me: >>>>> >>>>> bash-3.1$ svn log >>>>> ------------------------------------------------------------------------ >>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug >>>>> 2007) | 1 line >>>>> >>>>> removed gt4 stuff and added them as a dependency >>>>> ------------------------------------------------------------------------ >>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug >>>>> 2007) | 1 line >>>>> >>>>> removed gt4 stuff and added them as a dependency >>>>> ------------------------------------------------------------------------ >>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug >>>>> 2007) | 1 line >>>>> >>>>> a very small readme for provider-deef >>>>> ------------------------------------------------------------------------ >>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun >>>>> 2007) | 1 line >>>>> >>>>> remove dist directory form svn >>>>> ------------------------------------------------------------------------ >>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun >>>>> 2007) | 20 lines >>>>> >>>>> provider-deef, the Falkon/cog provider >>>>> >>>>> based on source in below message, with .class files deleted >>>>> >>>>> >>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>> From: Veronika Nefedova >>>>> To: Yong Zhao >>>>> Cc: Ben Clifford , Mihael Hategan >>>>> , >>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>> Mike Wilde , >>>>> Tiberiu Stef-Praun >>>>> Subject: Re: 244 molecule MolDyn run... 
>>>>> >>>>> its on viper.uchicago.edu >>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz >>>>> >>>>> Nika >>>>> >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> >>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>> >>>>> >>>>>> Mihael, do you have any clues on why this run has failed? Ioan - my >>>>>> answers to your questions are below... >>>>>> >>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>> >>>>>> >>>>>> >>>>>>> It looks like viper (where Swift is running) is idle, and so is tg- >>>>>>> viz-login2 (where Falkon is running). >>>>>>> What looks evident to me is that the normal list of events is for a >>>>>>> successful task: >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>> >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>> 17566 175660 2179412 >>>>>>> >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>> 7959 55713 785035 >>>>>>> >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>> 190968 1909680 24003796 >>>>>>> >>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>> >>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>>>>>> argue that the # of notifications received is correct. The >>>>>>> submitted # of tasks looks like the # I would have expected, but >>>>>>> all the tasks did not make it to Falkon. The Falkon provider is >>>>>>> what sits between the change of status to submitted, and the >>>>>>> receipt of the notification, so I would say that is the first place >>>>>>> we need to look for more details... there used to some extra debug >>>>>>> info in the Falkon provider that simply printed all the tasks that >>>>>>> were actually being submitted to Falkon (as opposed to just the >>>>>>> change of status within Karajan). I don't see those debug >>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>> completed tasks? Where did they come from? >>>>>>> >>>>>>> >>>>>>> >>>>>> "Task" doesn't mean job. It could be just data being staged in , etc. >>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>> differs, this is the problem... >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Yong, are you keeping up with these emails? Do you still have a >>>>>>> copy of the latest Falkon provider that you edited just before you >>>>>>> left? Can you just take a look through there to make sure nothing >>>>>>> has been broken with the SVN updates? 
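A quick way to answer that sort of question without eyeballing every file is to let Subversion report the drift itself; a minimal sketch, assuming the suspect copy is an SVN working copy of provider-deef (this only catches edits relative to the revision that was checked out, so a tree that was never under version control still has to be diffed by hand):

  cd cog/modules/provider-deef
  svn status -q               # locally modified, added or deleted files
  svn diff                    # line-by-line differences against the checked-out revision
  svn info | grep Revision    # which revision this working copy is based on
  svn log -l 5                # the last few commits that actually reached the repository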
If you don't have time for >>>>>>> this now (considering today was your first day on the new job), >>>>>>> I'll dig through there and see if I can make some sense of what is >>>>>>> happening! >>>>>>> >>>>>>> One last thing, Ben mentioned that the Falkon provider you saw in >>>>>>> Nika's account was different than what was in SVN. Ben, did you at >>>>>>> least look at modification dates? How old was one as opposed to >>>>>>> the other? I hope we did not revert back to an older version that >>>>>>> might have had some bug in it.... >>>>>>> >>>>>>> >>>>>>> >>>>>> I had to update to the latest version of provider-deef from SVN since >>>>>> without the update nothing worked. The version I am at now is 1050. >>>>>> But this is exactly the same version of swift/deef I used for our >>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>> >>>>>> Nika >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Ioan >>>>>>> >>>>>>> Veronika Nefedova wrote: >>>>>>> >>>>>>> >>>>>>>> Well, there are some discrepancies: >>>>>>>> >>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>> 7959 244749 3241072 >>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>> 17207 564648 7949388 >>>>>>>> nefedova at viper:~/alamines> >>>>>>>> >>>>>>>> I.e. almost half of the jobs haven't finished (according to swift) >>>>>>>> >>>>>>>> I also have some exceptions: >>>>>>>> >>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>>>>>> in getFile >>>>>>>> (80 of those): >>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>> 80 880 9705 >>>>>>>> nefedova at viper:~/alamines> >>>>>>>> >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>>> >>>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Aug 8 13:19:05 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 08 Aug 2007 13:19:05 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BA05A0.2070909@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 
1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186596043.28685.2.camel@blabla.mcs.anl.gov> <46BA05A0.2070909@cs.uchicago.edu> Message-ID: <1186597145.29195.8.camel@blabla.mcs.anl.gov> On Wed, 2007-08-08 at 13:04 -0500, Ioan Raicu wrote: > Shouldn't we be certain that things work before we commit the changes? No. > I thought the commit would take place after we try MolDyn out and we > see things are back to normal. The whole problem we've seen the past few days was due to the fact that Nika had no clear place to get the code from, so she repeatedly ended up with broken versions. S o p u t t h e c h a n g e s i n S V N ! > > Ioan > > Mihael Hategan wrote: > > On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: > > > > > OK everyone, I found Yong's version of the provider dated July 26th, > > > much more recent than what was in SVN on June 27th. I updated Nika's > > > version of the provider (which has been checked out of SVN), > > > > > > > No. P u t t h e c h a n g e s i n S V N ! > > > > > > > and recompiled&deploy! > > > > > > ant distclean > > > ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ > > > dist > > > > > > I even updated updated some of the logging info to use the logger > > > (some were not using the logger). > > > > > > Nika, Falkon is freshly restarted and ready for another test run! > > > > > > Falkon Factory Service: > > > http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService > > > Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm > > > > > > Ioan > > > > > > Veronika Nefedova wrote: > > > > > > > Ioan, > > > > > > > > > > > > It looks like the Falcon (including provider-deef) was put in SVN on > > > > June 27th. You really were supposed to use the SVN code from that > > > > point. Sigh. Did you do any changes to viper install after June > > > > 27th? > > > > > > > > > > > > Nika > > > > > > > > On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: > > > > > > > > > > > > > Could it be that the fixes were done before the original SVN > > > > > checkin? If not, then at least we know why things aren't > > > > > working. I bet the latest provider source was in Nika's Swift > > > > > install on viper. Nika, I take it you don't have this anymore, as > > > > > SVN updates overwrote this. Yong, is there any other place you > > > > > might have the latest provider source? If not, I guess we need to > > > > > take another look through the provider source to fix the issues > > > > > that we knew of... > > > > > > > > > > Ioan > > > > > > > > > > Mihael Hategan wrote: > > > > > > > > > > > Well, it doesn't look like the falkon provider in SVN has been updated > > > > > > at all in terms of fixing synchronization issues. 
All commits on > > > > > > provider-deef come from either ben or me: > > > > > > > > > > > > bash-3.1$ svn log > > > > > > ------------------------------------------------------------------------ > > > > > > r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug > > > > > > 2007) | 1 line > > > > > > > > > > > > removed gt4 stuff and added them as a dependency > > > > > > ------------------------------------------------------------------------ > > > > > > r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug > > > > > > 2007) | 1 line > > > > > > > > > > > > removed gt4 stuff and added them as a dependency > > > > > > ------------------------------------------------------------------------ > > > > > > r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug > > > > > > 2007) | 1 line > > > > > > > > > > > > a very small readme for provider-deef > > > > > > ------------------------------------------------------------------------ > > > > > > r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun > > > > > > 2007) | 1 line > > > > > > > > > > > > remove dist directory form svn > > > > > > ------------------------------------------------------------------------ > > > > > > r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun > > > > > > 2007) | 20 lines > > > > > > > > > > > > provider-deef, the Falkon/cog provider > > > > > > > > > > > > based on source in below message, with .class files deleted > > > > > > > > > > > > > > > > > > Date: Wed, 27 Jun 2007 09:27:23 -0500 > > > > > > From: Veronika Nefedova > > > > > > To: Yong Zhao > > > > > > Cc: Ben Clifford , Mihael Hategan > > > > > > , > > > > > > iraicu at cs.uchicago.edu, Ian Foster , > > > > > > Mike Wilde , > > > > > > Tiberiu Stef-Praun > > > > > > Subject: Re: 244 molecule MolDyn run... > > > > > > > > > > > > its on viper.uchicago.edu > > > > > > in : /home/nefedova/cogl/modules/provider-deef/ > > > > > > I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > > > > > > > > > > > On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: > > > > > > > > > > > > > > > > > > > Mihael, do you have any clues on why this run has failed? Ioan - my > > > > > > > answers to your questions are below... > > > > > > > > > > > > > > On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > It looks like viper (where Swift is running) is idle, and so is tg- > > > > > > > > viz-login2 (where Falkon is running). 
> > > > > > > > What looks evident to me is that the normal list of events is for a > > > > > > > > successful task: > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "urn: > > > > > > > > 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log > > > > > > > > 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: > > > > > > > > 0-1-73-2-31-0-0-1186444341989) setting status to Submitted > > > > > > > > 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: > > > > > > > > 0-1-73-2-31-0-0-1186444341989 0 > > > > > > > > 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: > > > > > > > > 0-1-73-2-31-0-0-1186444341989) setting status to Completed > > > > > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > > > > > > > Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > > > 17566 175660 2179412 > > > > > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "NotificationThread > > > > > > > > notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > > > 7959 55713 785035 > > > > > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > > > > > > > Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > > > 190968 1909680 24003796 > > > > > > > > > > > > > > > > Now, 17566 tasks were submitted, 7959 notifiation were received > > > > > > > > from Falkon, and 190968 tasks were set to completed... > > > > > > > > > > > > > > > > Obviously this isn't right. Falkon only saw 7959 tasks, so I would > > > > > > > > argue that the # of notifications received is correct. The > > > > > > > > submitted # of tasks looks like the # I would have expected, but > > > > > > > > all the tasks did not make it to Falkon. The Falkon provider is > > > > > > > > what sits between the change of status to submitted, and the > > > > > > > > receipt of the notification, so I would say that is the first place > > > > > > > > we need to look for more details... there used to some extra debug > > > > > > > > info in the Falkon provider that simply printed all the tasks that > > > > > > > > were actually being submitted to Falkon (as opposed to just the > > > > > > > > change of status within Karajan). I don't see those debug > > > > > > > > statements, I bet they got overwritten in the SVN update. > > > > > > > > What about the completed tasks, why are there so many (190K) > > > > > > > > completed tasks? Where did they come from? > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > "Task" doesn't mean job. It could be just data being staged in , etc. > > > > > > > The first 2 are important -- (Submitted vs Completed). Since it > > > > > > > differs, this is the problem... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Yong, are you keeping up with these emails? Do you still have a > > > > > > > > copy of the latest Falkon provider that you edited just before you > > > > > > > > left? Can you just take a look through there to make sure nothing > > > > > > > > has been broken with the SVN updates? If you don't have time for > > > > > > > > this now (considering today was your first day on the new job), > > > > > > > > I'll dig through there and see if I can make some sense of what is > > > > > > > > happening! > > > > > > > > > > > > > > > > One last thing, Ben mentioned that the Falkon provider you saw in > > > > > > > > Nika's account was different than what was in SVN. 
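The grep | wc counts quoted above are hard to compare directly, for exactly the reason raised in the reply: a Task is not necessarily a job (in the log excerpts in this thread, jobs show up as type=1 and the getFile transfer failure as type=2), so the task types have to be tallied separately before Submitted and Completed can be matched up. A minimal sketch of such a tally, assuming the log keeps the TaskImpl format shown above:

  grep "setting status to" MolDyn-244-loops-zhgo6be8tjhi1.log \
    | sed 's/.*Task(type=\([0-9]*\),.*setting status to \([A-Za-z]*\).*/type=\1 \2/' \
    | sort | uniq -c | sort -rn

  # and the notifications that actually came back from Falkon
  grep -c "NotificationThread notification" MolDyn-244-loops-zhgo6be8tjhi1.log

Comparing the type=1 Submitted count with the type=1 Completed count then shows how many execution jobs really went missing, without the staging tasks inflating the Completed total.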
Ben, did you at > > > > > > > > least look at modification dates? How old was one as opposed to > > > > > > > > the other? I hope we did not revert back to an older version that > > > > > > > > might have had some bug in it.... > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > I had to update to the latest version of provider-deef from SVN since > > > > > > > without the update nothing worked. The version I am at now is 1050. > > > > > > > But this is exactly the same version of swift/deef I used for our > > > > > > > Friday run (which 'worked' from Falcon/Swift point of view) > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > Ioan > > > > > > > > > > > > > > > > Veronika Nefedova wrote: > > > > > > > > > > > > > > > > > > > > > > > > > Well, there are some discrepancies: > > > > > > > > > > > > > > > > > > nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- > > > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > > > 7959 244749 3241072 > > > > > > > > > nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- > > > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > > > 17207 564648 7949388 > > > > > > > > > nefedova at viper:~/alamines> > > > > > > > > > > > > > > > > > > I.e. almost half of the jobs haven't finished (according to swift) > > > > > > > > > > > > > > > > > > I also have some exceptions: > > > > > > > > > > > > > > > > > > 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: > > > > > > > > > 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception > > > > > > > > > in getFile > > > > > > > > > (80 of those): > > > > > > > > > nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- > > > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > > > 80 880 9705 > > > > > > > > > nefedova at viper:~/alamines> > > > > > > > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > > Swift-devel mailing list > > > > > > > Swift-devel at ci.uchicago.edu > > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > From nefedova at mcs.anl.gov Wed Aug 8 13:25:32 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Wed, 8 Aug 2007 13:25:32 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186597145.29195.8.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 
1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186596043.28685.2.camel@blabla.mcs.anl.gov> <46BA05A0.2070909@cs.uchicago.edu> <1186597145.29195.8.camel@blabla.mcs.anl.gov> Message-ID: <1AE8D4D2-1A05-4666-B731-8A7840116064@mcs.anl.gov> the current changes screwed up my logging again... Please, do not touch my install --- I'd rather get everything from SVN, nefedova at viper:~/alamines> swift -tc.file tc-uc.data -sites.file sites-uc-64.xml -debug MolDyn-244-loops.swift& [1] 10562 nefedova at viper:~/alamines> WARN - Failed to configure log file name DEBUG - Booting deef Nika On Aug 8, 2007, at 1:19 PM, Mihael Hategan wrote: > On Wed, 2007-08-08 at 13:04 -0500, Ioan Raicu wrote: >> Shouldn't we be certain that things work before we commit the >> changes? > > No. > >> I thought the commit would take place after we try MolDyn out >> and we >> see things are back to normal. > > The whole problem we've seen the past few days was due to the fact > that > Nika had no clear place to get the code from, so she repeatedly > ended up > with broken versions. S o p u t t h e c h a n g e s i n S V N ! > >> >> Ioan >> >> Mihael Hategan wrote: >>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>> >>>> OK everyone, I found Yong's version of the provider dated July >>>> 26th, >>>> much more recent than what was in SVN on June 27th. I updated >>>> Nika's >>>> version of the provider (which has been checked out of SVN), >>>> >>> >>> No. P u t t h e c h a n g e s i n S V N ! >>> >>> >>>> and recompiled&deploy! >>>> >>>> ant distclean >>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2- >>>> dev/ >>>> dist >>>> >>>> I even updated updated some of the logging info to use the logger >>>> (some were not using the logger). >>>> >>>> Nika, Falkon is freshly restarted and ready for another test run! >>>> >>>> Falkon Factory Service: >>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/ >>>> GenericPortal/core/WS/GPFactoryService >>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>> >>>> Ioan >>>> >>>> Veronika Nefedova wrote: >>>> >>>>> Ioan, >>>>> >>>>> >>>>> It looks like the Falcon (including provider-deef) was put in >>>>> SVN on >>>>> June 27th. You really were supposed to use the SVN code from that >>>>> point. Sigh. Did you do any changes to viper install after June >>>>> 27th? >>>>> >>>>> >>>>> Nika >>>>> >>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>> >>>>> >>>>>> Could it be that the fixes were done before the original SVN >>>>>> checkin? If not, then at least we know why things aren't >>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>> install on viper. Nika, I take it you don't have this >>>>>> anymore, as >>>>>> SVN updates overwrote this. Yong, is there any other place you >>>>>> might have the latest provider source? If not, I guess we >>>>>> need to >>>>>> take another look through the provider source to fix the issues >>>>>> that we knew of... >>>>>> >>>>>> Ioan >>>>>> >>>>>> Mihael Hategan wrote: >>>>>> >>>>>>> Well, it doesn't look like the falkon provider in SVN has >>>>>>> been updated >>>>>>> at all in terms of fixing synchronization issues. 
All commits on >>>>>>> provider-deef come from either ben or me: >>>>>>> >>>>>>> bash-3.1$ svn log >>>>>>> ---------------------------------------------------------------- >>>>>>> -------- >>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 >>>>>>> (Fri, 03 Aug >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> removed gt4 stuff and added them as a dependency >>>>>>> ---------------------------------------------------------------- >>>>>>> -------- >>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 >>>>>>> (Fri, 03 Aug >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> removed gt4 stuff and added them as a dependency >>>>>>> ---------------------------------------------------------------- >>>>>>> -------- >>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 >>>>>>> (Fri, 03 Aug >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> a very small readme for provider-deef >>>>>>> ---------------------------------------------------------------- >>>>>>> -------- >>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, >>>>>>> 27 Jun >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> remove dist directory form svn >>>>>>> ---------------------------------------------------------------- >>>>>>> -------- >>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, >>>>>>> 27 Jun >>>>>>> 2007) | 20 lines >>>>>>> >>>>>>> provider-deef, the Falkon/cog provider >>>>>>> >>>>>>> based on source in below message, with .class files deleted >>>>>>> >>>>>>> >>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>> From: Veronika Nefedova >>>>>>> To: Yong Zhao >>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>> , >>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>> Mike Wilde , >>>>>>> Tiberiu Stef-Praun >>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>> >>>>>>> its on viper.uchicago.edu >>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>> I also tared it up and put in my home on terminable: >>>>>>> ~nefedova/cogl.tgz >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> >>>>>>> ---------------------------------------------------------------- >>>>>>> -------- >>>>>>> >>>>>>> >>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>> >>>>>>> >>>>>>>> Mihael, do you have any clues on why this run has failed? >>>>>>>> Ioan - my >>>>>>>> answers to your questions are below... >>>>>>>> >>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> It looks like viper (where Swift is running) is idle, and >>>>>>>>> so is tg- >>>>>>>>> viz-login2 (where Falkon is running). 
>>>>>>>>> What looks evident to me is that the normal list of events >>>>>>>>> is for a >>>>>>>>> successful task: >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops- >>>>>>>>> zhgo6be8tjhi1.log >>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>>> notification: urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>>> identity=urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>>> >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>> 17566 175660 2179412 >>>>>>>>> >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>> 7959 55713 785035 >>>>>>>>> >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>> 190968 1909680 24003796 >>>>>>>>> >>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were >>>>>>>>> received >>>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>>> >>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so >>>>>>>>> I would >>>>>>>>> argue that the # of notifications received is correct. The >>>>>>>>> submitted # of tasks looks like the # I would have >>>>>>>>> expected, but >>>>>>>>> all the tasks did not make it to Falkon. The Falkon >>>>>>>>> provider is >>>>>>>>> what sits between the change of status to submitted, and the >>>>>>>>> receipt of the notification, so I would say that is the >>>>>>>>> first place >>>>>>>>> we need to look for more details... there used to some >>>>>>>>> extra debug >>>>>>>>> info in the Falkon provider that simply printed all the >>>>>>>>> tasks that >>>>>>>>> were actually being submitted to Falkon (as opposed to just >>>>>>>>> the >>>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>>>> completed tasks? Where did they come from? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> "Task" doesn't mean job. It could be just data being staged >>>>>>>> in , etc. >>>>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>>>> differs, this is the problem... >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Yong, are you keeping up with these emails? Do you still >>>>>>>>> have a >>>>>>>>> copy of the latest Falkon provider that you edited just >>>>>>>>> before you >>>>>>>>> left? Can you just take a look through there to make sure >>>>>>>>> nothing >>>>>>>>> has been broken with the SVN updates? If you don't have >>>>>>>>> time for >>>>>>>>> this now (considering today was your first day on the new >>>>>>>>> job), >>>>>>>>> I'll dig through there and see if I can make some sense of >>>>>>>>> what is >>>>>>>>> happening! >>>>>>>>> >>>>>>>>> One last thing, Ben mentioned that the Falkon provider you >>>>>>>>> saw in >>>>>>>>> Nika's account was different than what was in SVN. Ben, >>>>>>>>> did you at >>>>>>>>> least look at modification dates? How old was one as >>>>>>>>> opposed to >>>>>>>>> the other? 
I hope we did not revert back to an older >>>>>>>>> version that >>>>>>>>> might have had some bug in it.... >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> I had to update to the latest version of provider-deef from >>>>>>>> SVN since >>>>>>>> without the update nothing worked. The version I am at now >>>>>>>> is 1050. >>>>>>>> But this is exactly the same version of swift/deef I used >>>>>>>> for our >>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Ioan >>>>>>>>> >>>>>>>>> Veronika Nefedova wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>> >>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244- >>>>>>>>>> loops- >>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>> 7959 244749 3241072 >>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244- >>>>>>>>>> loops- >>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>> 17207 564648 7949388 >>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>> >>>>>>>>>> I.e. almost half of the jobs haven't finished (according >>>>>>>>>> to swift) >>>>>>>>>> >>>>>>>>>> I also have some exceptions: >>>>>>>>>> >>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>>> identity=urn: >>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed >>>>>>>>>> Exception >>>>>>>>>> in getFile >>>>>>>>>> (80 of those): >>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>> 80 880 9705 >>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>> >>> >>> > From iraicu at cs.uchicago.edu Wed Aug 8 14:20:48 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 08 Aug 2007 14:20:48 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1AE8D4D2-1A05-4666-B731-8A7840116064@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186596043.28685.2.camel@blabla.mcs.anl.gov> <46BA05A0.2070909@c s.uchicago.edu> <1186597145.29195.8.camel@blabla.mcs.anl.gov> <1AE8D4D2-1A05-4666-B731-8A7840116064@mcs.anl.gov> Message-ID: <46BA1790.2010901@cs.uchicago.edu> All my work was related to the deef-provider... I did not touch anything else! in the folder nefedova at viper:~/cogl/modules/provider-deef I did: cp yongs_source_files src/org/globus/cog/abstraction/impl/execution/deef/ svn update ant distclean ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ Now why would this screw up your logging or anything else in Swift? 
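One way to confirm that a rebuild and redeploy like the one described in this message actually reached the installation Swift runs from is to look at what is sitting in the deployed dist rather than at the source tree; a minimal sketch, assuming the provider ends up as a jar with "deef" in its name under the dist's lib/ directory (the jar name and layout are assumptions, not taken from this thread):

  DIST=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev
  ls -l $DIST/lib/*deef*.jar                                   # build timestamp of the deployed provider
  unzip -l $DIST/lib/*deef*.jar | grep -i deef | head          # classes actually packaged in it
  cd /home/nefedova/cogl/modules/provider-deef && svn status   # local edits that never made it into SVN

If the jar's timestamp predates the rebuild, the running Swift is still loading the old provider, whatever the source tree says.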
Unless it screwed something up in the deef-provider (which was already screwed up prior). Now, the message "booting deef" comes from Boot.java. This file was from SVN, as Mihael modified it a few days ago, so Yong's Boot.java was not carried over. Should I have used the older Boot.java (Yong's version from July 26th)? If this is not the issue, and its something else related to the deef-provider, you can find the old deef-provider that you had before at: viper:/home/nefedova/cogl/modules/provider-deef_8-8-07_svn Ioan PS: I don't have rights to commit changes to SVN, so if you don't want me to make any more changes to your Swift install, we can wait until I get the right to commit my changes so you can see them and pull them in yourself through SVN. Veronika Nefedova wrote: > the current changes screwed up my logging again... > Please, do not touch my install --- I'd rather get everything from SVN, > > nefedova at viper:~/alamines> swift -tc.file tc-uc.data -sites.file > sites-uc-64.xml -debug MolDyn-244-loops.swift& > [1] 10562 > nefedova at viper:~/alamines> WARN - Failed to configure log file name > DEBUG - Booting deef > > > Nika > > On Aug 8, 2007, at 1:19 PM, Mihael Hategan wrote: > >> On Wed, 2007-08-08 at 13:04 -0500, Ioan Raicu wrote: >>> Shouldn't we be certain that things work before we commit the changes? >> >> No. >> >>> I thought the commit would take place after we try MolDyn out and we >>> see things are back to normal. >> >> The whole problem we've seen the past few days was due to the fact that >> Nika had no clear place to get the code from, so she repeatedly ended up >> with broken versions. S o p u t t h e c h a n g e s i n S V N ! >> >>> >>> Ioan >>> >>> Mihael Hategan wrote: >>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>>> >>>>> OK everyone, I found Yong's version of the provider dated July 26th, >>>>> much more recent than what was in SVN on June 27th. I updated Nika's >>>>> version of the provider (which has been checked out of SVN), >>>>> >>>> >>>> No. P u t t h e c h a n g e s i n S V N ! >>>> >>>> >>>>> and recompiled&deploy! >>>>> >>>>> ant distclean >>>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >>>>> dist >>>>> >>>>> I even updated updated some of the logging info to use the logger >>>>> (some were not using the logger). >>>>> >>>>> Nika, Falkon is freshly restarted and ready for another test run! >>>>> >>>>> Falkon Factory Service: >>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>> >>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>> >>>>> Ioan >>>>> >>>>> Veronika Nefedova wrote: >>>>> >>>>>> Ioan, >>>>>> >>>>>> >>>>>> It looks like the Falcon (including provider-deef) was put in SVN on >>>>>> June 27th. You really were supposed to use the SVN code from that >>>>>> point. Sigh. Did you do any changes to viper install after June >>>>>> 27th? >>>>>> >>>>>> >>>>>> Nika >>>>>> >>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>>> >>>>>> >>>>>>> Could it be that the fixes were done before the original SVN >>>>>>> checkin? If not, then at least we know why things aren't >>>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>>> install on viper. Nika, I take it you don't have this anymore, as >>>>>>> SVN updates overwrote this. Yong, is there any other place you >>>>>>> might have the latest provider source? 
If not, I guess we need to >>>>>>> take another look through the provider source to fix the issues >>>>>>> that we knew of... >>>>>>> >>>>>>> Ioan >>>>>>> >>>>>>> Mihael Hategan wrote: >>>>>>> >>>>>>>> Well, it doesn't look like the falkon provider in SVN has been >>>>>>>> updated >>>>>>>> at all in terms of fixing synchronization issues. All commits on >>>>>>>> provider-deef come from either ben or me: >>>>>>>> >>>>>>>> bash-3.1$ svn log >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 >>>>>>>> (Fri, 03 Aug >>>>>>>> 2007) | 1 line >>>>>>>> >>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 >>>>>>>> (Fri, 03 Aug >>>>>>>> 2007) | 1 line >>>>>>>> >>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, >>>>>>>> 03 Aug >>>>>>>> 2007) | 1 line >>>>>>>> >>>>>>>> a very small readme for provider-deef >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, >>>>>>>> 27 Jun >>>>>>>> 2007) | 1 line >>>>>>>> >>>>>>>> remove dist directory form svn >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, >>>>>>>> 27 Jun >>>>>>>> 2007) | 20 lines >>>>>>>> >>>>>>>> provider-deef, the Falkon/cog provider >>>>>>>> >>>>>>>> based on source in below message, with .class files deleted >>>>>>>> >>>>>>>> >>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>>> From: Veronika Nefedova >>>>>>>> To: Yong Zhao >>>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>>> , >>>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>>> Mike Wilde , >>>>>>>> Tiberiu Stef-Praun >>>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>>> >>>>>>>> its on viper.uchicago.edu >>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>>> I also tared it up and put in my home on terminable: >>>>>>>> ~nefedova/cogl.tgz >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> >>>>>>>> ------------------------------------------------------------------------ >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Mihael, do you have any clues on why this run has failed? Ioan >>>>>>>>> - my >>>>>>>>> answers to your questions are below... >>>>>>>>> >>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> It looks like viper (where Swift is running) is idle, and so >>>>>>>>>> is tg- >>>>>>>>>> viz-login2 (where Falkon is running). 
>>>>>>>>>> What looks evident to me is that the normal list of events is >>>>>>>>>> for a >>>>>>>>>> successful task: >>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" >>>>>>>>>> MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn: >>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>>>> notification: urn: >>>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>>>> identity=urn: >>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>>>> >>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>> 17566 175660 2179412 >>>>>>>>>> >>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>> 7959 55713 785035 >>>>>>>>>> >>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>> 190968 1909680 24003796 >>>>>>>>>> >>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>>>> >>>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I >>>>>>>>>> would >>>>>>>>>> argue that the # of notifications received is correct. The >>>>>>>>>> submitted # of tasks looks like the # I would have expected, but >>>>>>>>>> all the tasks did not make it to Falkon. The Falkon provider is >>>>>>>>>> what sits between the change of status to submitted, and the >>>>>>>>>> receipt of the notification, so I would say that is the first >>>>>>>>>> place >>>>>>>>>> we need to look for more details... there used to some extra >>>>>>>>>> debug >>>>>>>>>> info in the Falkon provider that simply printed all the tasks >>>>>>>>>> that >>>>>>>>>> were actually being submitted to Falkon (as opposed to just the >>>>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>>>>> completed tasks? Where did they come from? >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> "Task" doesn't mean job. It could be just data being staged in >>>>>>>>> , etc. >>>>>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>>>>> differs, this is the problem... >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Yong, are you keeping up with these emails? Do you still have a >>>>>>>>>> copy of the latest Falkon provider that you edited just >>>>>>>>>> before you >>>>>>>>>> left? Can you just take a look through there to make sure >>>>>>>>>> nothing >>>>>>>>>> has been broken with the SVN updates? If you don't have time >>>>>>>>>> for >>>>>>>>>> this now (considering today was your first day on the new job), >>>>>>>>>> I'll dig through there and see if I can make some sense of >>>>>>>>>> what is >>>>>>>>>> happening! >>>>>>>>>> >>>>>>>>>> One last thing, Ben mentioned that the Falkon provider you >>>>>>>>>> saw in >>>>>>>>>> Nika's account was different than what was in SVN. Ben, did >>>>>>>>>> you at >>>>>>>>>> least look at modification dates? How old was one as opposed to >>>>>>>>>> the other? 
I hope we did not revert back to an older version >>>>>>>>>> that >>>>>>>>>> might have had some bug in it.... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> I had to update to the latest version of provider-deef from >>>>>>>>> SVN since >>>>>>>>> without the update nothing worked. The version I am at now is >>>>>>>>> 1050. >>>>>>>>> But this is exactly the same version of swift/deef I used for our >>>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Ioan >>>>>>>>>> >>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>>> >>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" >>>>>>>>>>> MolDyn-244-loops- >>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>> 7959 244749 3241072 >>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>> 17207 564648 7949388 >>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>> >>>>>>>>>>> I.e. almost half of the jobs haven't finished (according to >>>>>>>>>>> swift) >>>>>>>>>>> >>>>>>>>>>> I also have some exceptions: >>>>>>>>>>> >>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>>>> identity=urn: >>>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed >>>>>>>>>>> Exception >>>>>>>>>>> in getFile >>>>>>>>>>> (80 of those): >>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>> 80 880 9705 >>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Swift-devel mailing list >>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>>> >>>> >>>> >> > > From nefedova at mcs.anl.gov Wed Aug 8 14:52:55 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Wed, 8 Aug 2007 14:52:55 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BA1790.2010901@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186596043.28685.2.camel@blabla.mcs.anl.gov> <46BA05A0.2070909@c s.uchicago.edu> <1186597145.29195.8.camel@blabla.mcs.anl.gov> <1AE8D4D2-1A05-4666-B731-8A7840116064@mcs.anl.gov> <46BA1790.2010901@cs.uchicago.edu> Message-ID: <1A17265B-5F1A-4C56-B8B2-776E6D15DDE2@mcs.anl.gov> anyway - I fixed the log4j.properties file and started the run Nika On Aug 8, 2007, at 2:20 PM, Ioan Raicu wrote: > All my work was related to the deef-provider... I did not touch > anything else! 
> > in the folder > nefedova at viper:~/cogl/modules/provider-deef > > I did: > > cp yongs_source_files src/org/globus/cog/abstraction/impl/execution/ > deef/ > svn update > ant distclean > ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ > > Now why would this screw up your logging or anything else in > Swift? Unless it screwed something up in the deef-provider (which > was already screwed up prior). Now, the message "booting deef" > comes from Boot.java. This file was from SVN, as Mihael modified > it a few days ago, so Yong's Boot.java was not carried over. > Should I have used the older Boot.java (Yong's version from July > 26th)? If this is not the issue, and its something else related to > the deef-provider, you can find the old deef-provider that you had > before at: > viper:/home/nefedova/cogl/modules/provider-deef_8-8-07_svn > > Ioan > PS: I don't have rights to commit changes to SVN, so if you don't > want me to make any more changes to your Swift install, we can wait > until I get the right to commit my changes so you can see them and > pull them in yourself through SVN. > > Veronika Nefedova wrote: >> the current changes screwed up my logging again... >> Please, do not touch my install --- I'd rather get everything from >> SVN, >> >> nefedova at viper:~/alamines> swift -tc.file tc-uc.data -sites.file >> sites-uc-64.xml -debug MolDyn-244-loops.swift& >> [1] 10562 >> nefedova at viper:~/alamines> WARN - Failed to configure log file name >> DEBUG - Booting deef >> >> >> Nika >> >> On Aug 8, 2007, at 1:19 PM, Mihael Hategan wrote: >> >>> On Wed, 2007-08-08 at 13:04 -0500, Ioan Raicu wrote: >>>> Shouldn't we be certain that things work before we commit the >>>> changes? >>> >>> No. >>> >>>> I thought the commit would take place after we try MolDyn out >>>> and we >>>> see things are back to normal. >>> >>> The whole problem we've seen the past few days was due to the >>> fact that >>> Nika had no clear place to get the code from, so she repeatedly >>> ended up >>> with broken versions. S o p u t t h e c h a n g e s i n S V N ! >>> >>>> >>>> Ioan >>>> >>>> Mihael Hategan wrote: >>>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>>>> >>>>>> OK everyone, I found Yong's version of the provider dated July >>>>>> 26th, >>>>>> much more recent than what was in SVN on June 27th. I updated >>>>>> Nika's >>>>>> version of the provider (which has been checked out of SVN), >>>>>> >>>>> >>>>> No. P u t t h e c h a n g e s i n S V N ! >>>>> >>>>> >>>>>> and recompiled&deploy! >>>>>> >>>>>> ant distclean >>>>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/ >>>>>> vdsk-0.2-dev/ >>>>>> dist >>>>>> >>>>>> I even updated updated some of the logging info to use the logger >>>>>> (some were not using the logger). >>>>>> >>>>>> Nika, Falkon is freshly restarted and ready for another test run! >>>>>> >>>>>> Falkon Factory Service: >>>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/ >>>>>> GenericPortal/core/WS/GPFactoryService >>>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>> >>>>>> Ioan >>>>>> >>>>>> Veronika Nefedova wrote: >>>>>> >>>>>>> Ioan, >>>>>>> >>>>>>> >>>>>>> It looks like the Falcon (including provider-deef) was put in >>>>>>> SVN on >>>>>>> June 27th. You really were supposed to use the SVN code from >>>>>>> that >>>>>>> point. Sigh. Did you do any changes to viper install after June >>>>>>> 27th? 
>>>>>>> >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>>>> >>>>>>> >>>>>>>> Could it be that the fixes were done before the original SVN >>>>>>>> checkin? If not, then at least we know why things aren't >>>>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>>>> install on viper. Nika, I take it you don't have this >>>>>>>> anymore, as >>>>>>>> SVN updates overwrote this. Yong, is there any other place you >>>>>>>> might have the latest provider source? If not, I guess we >>>>>>>> need to >>>>>>>> take another look through the provider source to fix the issues >>>>>>>> that we knew of... >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> Mihael Hategan wrote: >>>>>>>> >>>>>>>>> Well, it doesn't look like the falkon provider in SVN has >>>>>>>>> been updated >>>>>>>>> at all in terms of fixing synchronization issues. All >>>>>>>>> commits on >>>>>>>>> provider-deef come from either ben or me: >>>>>>>>> >>>>>>>>> bash-3.1$ svn log >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------- >>>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 >>>>>>>>> (Fri, 03 Aug >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------- >>>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 >>>>>>>>> (Fri, 03 Aug >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------- >>>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 >>>>>>>>> (Fri, 03 Aug >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> a very small readme for provider-deef >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------- >>>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 >>>>>>>>> (Wed, 27 Jun >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> remove dist directory form svn >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------- >>>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 >>>>>>>>> (Wed, 27 Jun >>>>>>>>> 2007) | 20 lines >>>>>>>>> >>>>>>>>> provider-deef, the Falkon/cog provider >>>>>>>>> >>>>>>>>> based on source in below message, with .class files deleted >>>>>>>>> >>>>>>>>> >>>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>>>> From: Veronika Nefedova >>>>>>>>> To: Yong Zhao >>>>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>>>> , >>>>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>>>> Mike Wilde , >>>>>>>>> Tiberiu Stef-Praun >>>>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>>>> >>>>>>>>> its on viper.uchicago.edu >>>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>>>> I also tared it up and put in my home on terminable: >>>>>>>>> ~nefedova/cogl.tgz >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> >>>>>>>>> -------------------------------------------------------------- >>>>>>>>> ---------- >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Mihael, do you have any clues on why this run has failed? >>>>>>>>>> Ioan - my >>>>>>>>>> answers to your questions are below... 
>>>>>>>>>> >>>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> It looks like viper (where Swift is running) is idle, and >>>>>>>>>>> so is tg- >>>>>>>>>>> viz-login2 (where Falkon is running). >>>>>>>>>>> What looks evident to me is that the normal list of >>>>>>>>>>> events is for a >>>>>>>>>>> successful task: >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops- >>>>>>>>>>> zhgo6be8tjhi1.log >>>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>>>>> identity=urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>>>>> notification: urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>>>>> identity=urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>>>>> >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting >>>>>>>>>>> status to >>>>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>> 17566 175660 2179412 >>>>>>>>>>> >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep >>>>>>>>>>> "NotificationThread >>>>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>> 7959 55713 785035 >>>>>>>>>>> >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting >>>>>>>>>>> status to >>>>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>> 190968 1909680 24003796 >>>>>>>>>>> >>>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were >>>>>>>>>>> received >>>>>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>>>>> >>>>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, >>>>>>>>>>> so I would >>>>>>>>>>> argue that the # of notifications received is correct. The >>>>>>>>>>> submitted # of tasks looks like the # I would have >>>>>>>>>>> expected, but >>>>>>>>>>> all the tasks did not make it to Falkon. The Falkon >>>>>>>>>>> provider is >>>>>>>>>>> what sits between the change of status to submitted, and the >>>>>>>>>>> receipt of the notification, so I would say that is the >>>>>>>>>>> first place >>>>>>>>>>> we need to look for more details... there used to some >>>>>>>>>>> extra debug >>>>>>>>>>> info in the Falkon provider that simply printed all the >>>>>>>>>>> tasks that >>>>>>>>>>> were actually being submitted to Falkon (as opposed to >>>>>>>>>>> just the >>>>>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>>>>>> completed tasks? Where did they come from? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> "Task" doesn't mean job. It could be just data being >>>>>>>>>> staged in , etc. >>>>>>>>>> The first 2 are important -- (Submitted vs Completed). >>>>>>>>>> Since it >>>>>>>>>> differs, this is the problem... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Yong, are you keeping up with these emails? Do you still >>>>>>>>>>> have a >>>>>>>>>>> copy of the latest Falkon provider that you edited just >>>>>>>>>>> before you >>>>>>>>>>> left? Can you just take a look through there to make >>>>>>>>>>> sure nothing >>>>>>>>>>> has been broken with the SVN updates? 
If you don't have >>>>>>>>>>> time for >>>>>>>>>>> this now (considering today was your first day on the new >>>>>>>>>>> job), >>>>>>>>>>> I'll dig through there and see if I can make some sense >>>>>>>>>>> of what is >>>>>>>>>>> happening! >>>>>>>>>>> >>>>>>>>>>> One last thing, Ben mentioned that the Falkon provider >>>>>>>>>>> you saw in >>>>>>>>>>> Nika's account was different than what was in SVN. Ben, >>>>>>>>>>> did you at >>>>>>>>>>> least look at modification dates? How old was one as >>>>>>>>>>> opposed to >>>>>>>>>>> the other? I hope we did not revert back to an older >>>>>>>>>>> version that >>>>>>>>>>> might have had some bug in it.... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> I had to update to the latest version of provider-deef >>>>>>>>>> from SVN since >>>>>>>>>> without the update nothing worked. The version I am at now >>>>>>>>>> is 1050. >>>>>>>>>> But this is exactly the same version of swift/deef I used >>>>>>>>>> for our >>>>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Ioan >>>>>>>>>>> >>>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>>>> >>>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" >>>>>>>>>>>> MolDyn-244-loops- >>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 7959 244749 3241072 >>>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244- >>>>>>>>>>>> loops- >>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 17207 564648 7949388 >>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>> >>>>>>>>>>>> I.e. almost half of the jobs haven't finished (according >>>>>>>>>>>> to swift) >>>>>>>>>>>> >>>>>>>>>>>> I also have some exceptions: >>>>>>>>>>>> >>>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>>>>> identity=urn: >>>>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed >>>>>>>>>>>> Exception >>>>>>>>>>>> in getFile >>>>>>>>>>>> (80 of those): >>>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 80 880 9705 >>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Nika >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Swift-devel mailing list >>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>> >>>>> >>>>> >>> >> >> > From iraicu at cs.uchicago.edu Wed Aug 8 15:35:45 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 08 Aug 2007 15:35:45 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1A17265B-5F1A-4C56-B8B2-776E6D15DDE2@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 
1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186596043.28685.2.camel@blabla.mcs.anl.gov> <46BA05A0.2070909@c s.uchicago.edu> <1186597145.29195.8.camel@blabla.mcs.anl.gov> <1AE8D4D2-1A05-4666-B731-8A7840116064@mcs.anl.gov> <46BA1790.2010901@cs.uc hicago.edu> <1A17265B-5F1A-4C56-B8B2-776E6D15DDE2@mcs.anl.gov> Message-ID: <46BA2921.3020402@cs.uchicago.edu> Did you try just a small workflow to test? It looks to be idle 13014.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 13015.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 13016.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 13017.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 13018.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 13019.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 with 489 jobs completed... is this normal? Veronika Nefedova wrote: > anyway - I fixed the log4j.properties file and started the run > > Nika > > On Aug 8, 2007, at 2:20 PM, Ioan Raicu wrote: > >> All my work was related to the deef-provider... I did not touch >> anything else! >> >> in the folder >> nefedova at viper:~/cogl/modules/provider-deef >> >> I did: >> >> cp yongs_source_files >> src/org/globus/cog/abstraction/impl/execution/deef/ >> svn update >> ant distclean >> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >> >> Now why would this screw up your logging or anything else in Swift? >> Unless it screwed something up in the deef-provider (which was >> already screwed up prior). Now, the message "booting deef" comes >> from Boot.java. This file was from SVN, as Mihael modified it a few >> days ago, so Yong's Boot.java was not carried over. Should I have >> used the older Boot.java (Yong's version from July 26th)? If this is >> not the issue, and its something else related to the deef-provider, >> you can find the old deef-provider that you had before at: >> viper:/home/nefedova/cogl/modules/provider-deef_8-8-07_svn >> >> Ioan >> PS: I don't have rights to commit changes to SVN, so if you don't >> want me to make any more changes to your Swift install, we can wait >> until I get the right to commit my changes so you can see them and >> pull them in yourself through SVN. >> >> Veronika Nefedova wrote: >>> the current changes screwed up my logging again... >>> Please, do not touch my install --- I'd rather get everything from SVN, >>> >>> nefedova at viper:~/alamines> swift -tc.file tc-uc.data -sites.file >>> sites-uc-64.xml -debug MolDyn-244-loops.swift& >>> [1] 10562 >>> nefedova at viper:~/alamines> WARN - Failed to configure log file name >>> DEBUG - Booting deef >>> >>> >>> Nika >>> >>> On Aug 8, 2007, at 1:19 PM, Mihael Hategan wrote: >>> >>>> On Wed, 2007-08-08 at 13:04 -0500, Ioan Raicu wrote: >>>>> Shouldn't we be certain that things work before we commit the >>>>> changes? >>>> >>>> No. >>>> >>>>> I thought the commit would take place after we try MolDyn out >>>>> and we >>>>> see things are back to normal. >>>> >>>> The whole problem we've seen the past few days was due to the fact >>>> that >>>> Nika had no clear place to get the code from, so she repeatedly >>>> ended up >>>> with broken versions. S o p u t t h e c h a n g e s i n S V N ! >>>> >>>>> >>>>> Ioan >>>>> >>>>> Mihael Hategan wrote: >>>>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>>>>> >>>>>>> OK everyone, I found Yong's version of the provider dated July >>>>>>> 26th, >>>>>>> much more recent than what was in SVN on June 27th. 
I updated >>>>>>> Nika's >>>>>>> version of the provider (which has been checked out of SVN), >>>>>>> >>>>>> >>>>>> No. P u t t h e c h a n g e s i n S V N ! >>>>>> >>>>>> >>>>>>> and recompiled&deploy! >>>>>>> >>>>>>> ant distclean >>>>>>> ant >>>>>>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >>>>>>> dist >>>>>>> >>>>>>> I even updated updated some of the logging info to use the logger >>>>>>> (some were not using the logger). >>>>>>> >>>>>>> Nika, Falkon is freshly restarted and ready for another test run! >>>>>>> >>>>>>> Falkon Factory Service: >>>>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>>>> >>>>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>>> >>>>>>> Ioan >>>>>>> >>>>>>> Veronika Nefedova wrote: >>>>>>> >>>>>>>> Ioan, >>>>>>>> >>>>>>>> >>>>>>>> It looks like the Falcon (including provider-deef) was put in >>>>>>>> SVN on >>>>>>>> June 27th. You really were supposed to use the SVN code from that >>>>>>>> point. Sigh. Did you do any changes to viper install after June >>>>>>>> 27th? >>>>>>>> >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Could it be that the fixes were done before the original SVN >>>>>>>>> checkin? If not, then at least we know why things aren't >>>>>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>>>>> install on viper. Nika, I take it you don't have this >>>>>>>>> anymore, as >>>>>>>>> SVN updates overwrote this. Yong, is there any other place you >>>>>>>>> might have the latest provider source? If not, I guess we >>>>>>>>> need to >>>>>>>>> take another look through the provider source to fix the issues >>>>>>>>> that we knew of... >>>>>>>>> >>>>>>>>> Ioan >>>>>>>>> >>>>>>>>> Mihael Hategan wrote: >>>>>>>>> >>>>>>>>>> Well, it doesn't look like the falkon provider in SVN has >>>>>>>>>> been updated >>>>>>>>>> at all in terms of fixing synchronization issues. 
All commits on >>>>>>>>>> provider-deef come from either ben or me: >>>>>>>>>> >>>>>>>>>> bash-3.1$ svn log >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> >>>>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 >>>>>>>>>> (Fri, 03 Aug >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> >>>>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 >>>>>>>>>> (Fri, 03 Aug >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> >>>>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 >>>>>>>>>> (Fri, 03 Aug >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> a very small readme for provider-deef >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> >>>>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, >>>>>>>>>> 27 Jun >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> remove dist directory form svn >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> >>>>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, >>>>>>>>>> 27 Jun >>>>>>>>>> 2007) | 20 lines >>>>>>>>>> >>>>>>>>>> provider-deef, the Falkon/cog provider >>>>>>>>>> >>>>>>>>>> based on source in below message, with .class files deleted >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>>>>> From: Veronika Nefedova >>>>>>>>>> To: Yong Zhao >>>>>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>>>>> , >>>>>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>>>>> Mike Wilde , >>>>>>>>>> Tiberiu Stef-Praun >>>>>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>>>>> >>>>>>>>>> its on viper.uchicago.edu >>>>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>>>>> I also tared it up and put in my home on terminable: >>>>>>>>>> ~nefedova/cogl.tgz >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Mihael, do you have any clues on why this run has failed? >>>>>>>>>>> Ioan - my >>>>>>>>>>> answers to your questions are below... >>>>>>>>>>> >>>>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> It looks like viper (where Swift is running) is idle, and >>>>>>>>>>>> so is tg- >>>>>>>>>>>> viz-login2 (where Falkon is running). 
>>>>>>>>>>>> What looks evident to me is that the normal list of events >>>>>>>>>>>> is for a >>>>>>>>>>>> successful task: >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" >>>>>>>>>>>> MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>> identity=urn: >>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>>>>>> notification: urn: >>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>> identity=urn: >>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>>>>>> >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 17566 175660 2179412 >>>>>>>>>>>> >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 7959 55713 785035 >>>>>>>>>>>> >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 190968 1909680 24003796 >>>>>>>>>>>> >>>>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were >>>>>>>>>>>> received >>>>>>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>>>>>> >>>>>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so >>>>>>>>>>>> I would >>>>>>>>>>>> argue that the # of notifications received is correct. The >>>>>>>>>>>> submitted # of tasks looks like the # I would have >>>>>>>>>>>> expected, but >>>>>>>>>>>> all the tasks did not make it to Falkon. The Falkon >>>>>>>>>>>> provider is >>>>>>>>>>>> what sits between the change of status to submitted, and the >>>>>>>>>>>> receipt of the notification, so I would say that is the >>>>>>>>>>>> first place >>>>>>>>>>>> we need to look for more details... there used to some >>>>>>>>>>>> extra debug >>>>>>>>>>>> info in the Falkon provider that simply printed all the >>>>>>>>>>>> tasks that >>>>>>>>>>>> were actually being submitted to Falkon (as opposed to just >>>>>>>>>>>> the >>>>>>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>>>>>>> completed tasks? Where did they come from? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> "Task" doesn't mean job. It could be just data being staged >>>>>>>>>>> in , etc. >>>>>>>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>>>>>>> differs, this is the problem... >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Yong, are you keeping up with these emails? Do you still >>>>>>>>>>>> have a >>>>>>>>>>>> copy of the latest Falkon provider that you edited just >>>>>>>>>>>> before you >>>>>>>>>>>> left? Can you just take a look through there to make sure >>>>>>>>>>>> nothing >>>>>>>>>>>> has been broken with the SVN updates? If you don't have >>>>>>>>>>>> time for >>>>>>>>>>>> this now (considering today was your first day on the new >>>>>>>>>>>> job), >>>>>>>>>>>> I'll dig through there and see if I can make some sense of >>>>>>>>>>>> what is >>>>>>>>>>>> happening! 
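
A note on the counts quoted above: the traced example shows the compute job itself as Task(type=1, ...), while the getFile failures elsewhere in the thread appear as Task(type=2, ...), so the raw "setting status to Completed" count mixes jobs with data-staging and transfer tasks. A rough, job-only comparison against the Falkon notifications could be done as sketched below; it assumes type=1 really does mark execution tasks, which the excerpts suggest but the thread never confirms.

    LOG=MolDyn-244-loops-zhgo6be8tjhi1.log
    # state transitions for type=1 tasks only (type per the traced example above)
    grep 'Task(type=1' $LOG | grep -c 'setting status to Submitted'
    grep 'Task(type=1' $LOG | grep -c 'setting status to Completed'
    # notifications received back from Falkon, as counted in the thread
    grep -c 'NotificationThread notification' $LOG
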
>>>>>>>>>>>> >>>>>>>>>>>> One last thing, Ben mentioned that the Falkon provider you >>>>>>>>>>>> saw in >>>>>>>>>>>> Nika's account was different than what was in SVN. Ben, >>>>>>>>>>>> did you at >>>>>>>>>>>> least look at modification dates? How old was one as >>>>>>>>>>>> opposed to >>>>>>>>>>>> the other? I hope we did not revert back to an older >>>>>>>>>>>> version that >>>>>>>>>>>> might have had some bug in it.... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> I had to update to the latest version of provider-deef from >>>>>>>>>>> SVN since >>>>>>>>>>> without the update nothing worked. The version I am at now >>>>>>>>>>> is 1050. >>>>>>>>>>> But this is exactly the same version of swift/deef I used >>>>>>>>>>> for our >>>>>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Ioan >>>>>>>>>>>> >>>>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>>>>> >>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" >>>>>>>>>>>>> MolDyn-244-loops- >>>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 7959 244749 3241072 >>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" >>>>>>>>>>>>> MolDyn-244-loops- >>>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 17207 564648 7949388 >>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>> >>>>>>>>>>>>> I.e. almost half of the jobs haven't finished (according >>>>>>>>>>>>> to swift) >>>>>>>>>>>>> >>>>>>>>>>>>> I also have some exceptions: >>>>>>>>>>>>> >>>>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>>>>>> identity=urn: >>>>>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed >>>>>>>>>>>>> Exception >>>>>>>>>>>>> in getFile >>>>>>>>>>>>> (80 of those): >>>>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 80 880 9705 >>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Nika >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Swift-devel mailing list >>>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Swift-devel mailing list >>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>> >>>>>> >>>>>> >>>>>> >>>> >>> >>> >> > > From nefedova at mcs.anl.gov Wed Aug 8 15:45:52 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Wed, 8 Aug 2007 15:45:52 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BA2921.3020402@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 
1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186596043.28685.2.camel@blabla.mcs.anl.gov> <46BA05A0.2070909@c s.uchicago.edu> <1186597145.29195.8.camel@blabla.mcs.anl.gov> <1AE8D4D2-1A05-4666-B731-8A7840116064@mcs.anl.gov> <46BA1790.2010901@cs.uc hicago.edu> <1A17265B-5F1A-4C56-B8B2-776E6D15DDE2@mcs.anl.gov> <46BA2921.3020402@cs.uchicago.edu> Message-ID: <8F79B5CC-AB60-4250-92CF-A3C8B15ACE92@mcs.anl.gov> nope, its a 244-mol workflow. I have no errors or exceptions in the log. nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- p2p6vy21s5fj0.log | wc 247 6411 45191 nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- p2p6vy21s5fj0.log | wc 489 13923 149727 nefedova at viper:~/alamines> grep "xception" MolDyn-244-loops- p2p6vy21s5fj0.log | wc 0 0 0 So I guess something else is wrong here? Nika On Aug 8, 2007, at 3:35 PM, Ioan Raicu wrote: > Did you try just a small workflow to test? It looks to be idle > > 13014.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 > 13015.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 > 13016.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 > 13017.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 > 13018.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 > 13019.996 0 1 42 188 188 0 0 0.0 0.0 0.0 0.0 489.0 0.0 > > with 489 jobs completed... is this normal? > > Veronika Nefedova wrote: >> anyway - I fixed the log4j.properties file and started the run >> >> Nika >> >> On Aug 8, 2007, at 2:20 PM, Ioan Raicu wrote: >> >>> All my work was related to the deef-provider... I did not touch >>> anything else! >>> >>> in the folder >>> nefedova at viper:~/cogl/modules/provider-deef >>> >>> I did: >>> >>> cp yongs_source_files src/org/globus/cog/abstraction/impl/ >>> execution/deef/ >>> svn update >>> ant distclean >>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >>> >>> Now why would this screw up your logging or anything else in >>> Swift? Unless it screwed something up in the deef-provider >>> (which was already screwed up prior). Now, the message "booting >>> deef" comes from Boot.java. This file was from SVN, as Mihael >>> modified it a few days ago, so Yong's Boot.java was not carried >>> over. Should I have used the older Boot.java (Yong's version >>> from July 26th)? If this is not the issue, and its something >>> else related to the deef-provider, you can find the old deef- >>> provider that you had before at: >>> viper:/home/nefedova/cogl/modules/provider-deef_8-8-07_svn >>> >>> Ioan >>> PS: I don't have rights to commit changes to SVN, so if you don't >>> want me to make any more changes to your Swift install, we can >>> wait until I get the right to commit my changes so you can see >>> them and pull them in yourself through SVN. >>> >>> Veronika Nefedova wrote: >>>> the current changes screwed up my logging again... >>>> Please, do not touch my install --- I'd rather get everything >>>> from SVN, >>>> >>>> nefedova at viper:~/alamines> swift -tc.file tc-uc.data -sites.file >>>> sites-uc-64.xml -debug MolDyn-244-loops.swift& >>>> [1] 10562 >>>> nefedova at viper:~/alamines> WARN - Failed to configure log file >>>> name >>>> DEBUG - Booting deef >>>> >>>> >>>> Nika >>>> >>>> On Aug 8, 2007, at 1:19 PM, Mihael Hategan wrote: >>>> >>>>> On Wed, 2007-08-08 at 13:04 -0500, Ioan Raicu wrote: >>>>>> Shouldn't we be certain that things work before we commit the >>>>>> changes? >>>>> >>>>> No. 
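
As a quick check on the gap between the 489 "Running job" and 247 "Completed job" lines above, one could look at whether completions are still trickling in and which jobs never finished. This is only a sketch: it assumes the "Running job"/"Completed job" lines carry the same timestamp prefix as the TaskImpl lines quoted earlier, and the awk field holding the job identifier is a guess that would need adjusting to the real line format.

    LOG=MolDyn-244-loops-p2p6vy21s5fj0.log
    # completions per hour; a flat tail suggests the run has stalled
    grep 'Completed job' $LOG | cut -c1-13 | uniq -c
    # identifiers seen running but never completed (field position is a guess)
    grep 'Running job'   $LOG | awk '{print $NF}' | sort -u > running.ids
    grep 'Completed job' $LOG | awk '{print $NF}' | sort -u > completed.ids
    comm -23 running.ids completed.ids | wc -l
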
>>>>> >>>>>> I thought the commit would take place after we try MolDyn >>>>>> out and we >>>>>> see things are back to normal. >>>>> >>>>> The whole problem we've seen the past few days was due to the >>>>> fact that >>>>> Nika had no clear place to get the code from, so she repeatedly >>>>> ended up >>>>> with broken versions. S o p u t t h e c h a n g e s i n S >>>>> V N ! >>>>> >>>>>> >>>>>> Ioan >>>>>> >>>>>> Mihael Hategan wrote: >>>>>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>>>>>> >>>>>>>> OK everyone, I found Yong's version of the provider dated >>>>>>>> July 26th, >>>>>>>> much more recent than what was in SVN on June 27th. I >>>>>>>> updated Nika's >>>>>>>> version of the provider (which has been checked out of SVN), >>>>>>>> >>>>>>> >>>>>>> No. P u t t h e c h a n g e s i n S V N ! >>>>>>> >>>>>>> >>>>>>>> and recompiled&deploy! >>>>>>>> >>>>>>>> ant distclean >>>>>>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/ >>>>>>>> vdsk-0.2-dev/ >>>>>>>> dist >>>>>>>> >>>>>>>> I even updated updated some of the logging info to use the >>>>>>>> logger >>>>>>>> (some were not using the logger). >>>>>>>> >>>>>>>> Nika, Falkon is freshly restarted and ready for another test >>>>>>>> run! >>>>>>>> >>>>>>>> Falkon Factory Service: >>>>>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/ >>>>>>>> GenericPortal/core/WS/GPFactoryService >>>>>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/ >>>>>>>> index.htm >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> Veronika Nefedova wrote: >>>>>>>> >>>>>>>>> Ioan, >>>>>>>>> >>>>>>>>> >>>>>>>>> It looks like the Falcon (including provider-deef) was put >>>>>>>>> in SVN on >>>>>>>>> June 27th. You really were supposed to use the SVN code >>>>>>>>> from that >>>>>>>>> point. Sigh. Did you do any changes to viper install after >>>>>>>>> June >>>>>>>>> 27th? >>>>>>>>> >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Could it be that the fixes were done before the original SVN >>>>>>>>>> checkin? If not, then at least we know why things aren't >>>>>>>>>> working. I bet the latest provider source was in Nika's >>>>>>>>>> Swift >>>>>>>>>> install on viper. Nika, I take it you don't have this >>>>>>>>>> anymore, as >>>>>>>>>> SVN updates overwrote this. Yong, is there any other >>>>>>>>>> place you >>>>>>>>>> might have the latest provider source? If not, I guess we >>>>>>>>>> need to >>>>>>>>>> take another look through the provider source to fix the >>>>>>>>>> issues >>>>>>>>>> that we knew of... >>>>>>>>>> >>>>>>>>>> Ioan >>>>>>>>>> >>>>>>>>>> Mihael Hategan wrote: >>>>>>>>>> >>>>>>>>>>> Well, it doesn't look like the falkon provider in SVN has >>>>>>>>>>> been updated >>>>>>>>>>> at all in terms of fixing synchronization issues. 
All >>>>>>>>>>> commits on >>>>>>>>>>> provider-deef come from either ben or me: >>>>>>>>>>> >>>>>>>>>>> bash-3.1$ svn log >>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>> ------------ >>>>>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 >>>>>>>>>>> -0500 (Fri, 03 Aug >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>> ------------ >>>>>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 >>>>>>>>>>> -0500 (Fri, 03 Aug >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>> ------------ >>>>>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 >>>>>>>>>>> (Fri, 03 Aug >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> a very small readme for provider-deef >>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>> ------------ >>>>>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 >>>>>>>>>>> (Wed, 27 Jun >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> remove dist directory form svn >>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>> ------------ >>>>>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 >>>>>>>>>>> (Wed, 27 Jun >>>>>>>>>>> 2007) | 20 lines >>>>>>>>>>> >>>>>>>>>>> provider-deef, the Falkon/cog provider >>>>>>>>>>> >>>>>>>>>>> based on source in below message, with .class files deleted >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>>>>>> From: Veronika Nefedova >>>>>>>>>>> To: Yong Zhao >>>>>>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>>>>>> , >>>>>>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>>>>>> Mike Wilde , >>>>>>>>>>> Tiberiu Stef-Praun >>>>>>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>>>>>> >>>>>>>>>>> its on viper.uchicago.edu >>>>>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>>>>>> I also tared it up and put in my home on terminable: >>>>>>>>>>> ~nefedova/cogl.tgz >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ------------------------------------------------------------ >>>>>>>>>>> ------------ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Mihael, do you have any clues on why this run has >>>>>>>>>>>> failed? Ioan - my >>>>>>>>>>>> answers to your questions are below... >>>>>>>>>>>> >>>>>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> It looks like viper (where Swift is running) is idle, >>>>>>>>>>>>> and so is tg- >>>>>>>>>>>>> viz-login2 (where Falkon is running). 
>>>>>>>>>>>>> What looks evident to me is that the normal list of >>>>>>>>>>>>> events is for a >>>>>>>>>>>>> successful task: >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops- >>>>>>>>>>>>> zhgo6be8tjhi1.log >>>>>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>>> identity=urn: >>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>>>>>>> notification: urn: >>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>>> identity=urn: >>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>>>>>>> >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting >>>>>>>>>>>>> status to >>>>>>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 17566 175660 2179412 >>>>>>>>>>>>> >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep >>>>>>>>>>>>> "NotificationThread >>>>>>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 7959 55713 785035 >>>>>>>>>>>>> >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting >>>>>>>>>>>>> status to >>>>>>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 190968 1909680 24003796 >>>>>>>>>>>>> >>>>>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were >>>>>>>>>>>>> received >>>>>>>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>>>>>>> >>>>>>>>>>>>> Obviously this isn't right. Falkon only saw 7959 >>>>>>>>>>>>> tasks, so I would >>>>>>>>>>>>> argue that the # of notifications received is correct. >>>>>>>>>>>>> The >>>>>>>>>>>>> submitted # of tasks looks like the # I would have >>>>>>>>>>>>> expected, but >>>>>>>>>>>>> all the tasks did not make it to Falkon. The Falkon >>>>>>>>>>>>> provider is >>>>>>>>>>>>> what sits between the change of status to submitted, >>>>>>>>>>>>> and the >>>>>>>>>>>>> receipt of the notification, so I would say that is the >>>>>>>>>>>>> first place >>>>>>>>>>>>> we need to look for more details... there used to some >>>>>>>>>>>>> extra debug >>>>>>>>>>>>> info in the Falkon provider that simply printed all the >>>>>>>>>>>>> tasks that >>>>>>>>>>>>> were actually being submitted to Falkon (as opposed to >>>>>>>>>>>>> just the >>>>>>>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>>>>>> What about the completed tasks, why are there so many >>>>>>>>>>>>> (190K) >>>>>>>>>>>>> completed tasks? Where did they come from? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> "Task" doesn't mean job. It could be just data being >>>>>>>>>>>> staged in , etc. >>>>>>>>>>>> The first 2 are important -- (Submitted vs Completed). >>>>>>>>>>>> Since it >>>>>>>>>>>> differs, this is the problem... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Yong, are you keeping up with these emails? Do you >>>>>>>>>>>>> still have a >>>>>>>>>>>>> copy of the latest Falkon provider that you edited just >>>>>>>>>>>>> before you >>>>>>>>>>>>> left? Can you just take a look through there to make >>>>>>>>>>>>> sure nothing >>>>>>>>>>>>> has been broken with the SVN updates? 
If you don't >>>>>>>>>>>>> have time for >>>>>>>>>>>>> this now (considering today was your first day on the >>>>>>>>>>>>> new job), >>>>>>>>>>>>> I'll dig through there and see if I can make some sense >>>>>>>>>>>>> of what is >>>>>>>>>>>>> happening! >>>>>>>>>>>>> >>>>>>>>>>>>> One last thing, Ben mentioned that the Falkon provider >>>>>>>>>>>>> you saw in >>>>>>>>>>>>> Nika's account was different than what was in SVN. >>>>>>>>>>>>> Ben, did you at >>>>>>>>>>>>> least look at modification dates? How old was one as >>>>>>>>>>>>> opposed to >>>>>>>>>>>>> the other? I hope we did not revert back to an older >>>>>>>>>>>>> version that >>>>>>>>>>>>> might have had some bug in it.... >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> I had to update to the latest version of provider-deef >>>>>>>>>>>> from SVN since >>>>>>>>>>>> without the update nothing worked. The version I am at >>>>>>>>>>>> now is 1050. >>>>>>>>>>>> But this is exactly the same version of swift/deef I >>>>>>>>>>>> used for our >>>>>>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>>>>>>> >>>>>>>>>>>> Nika >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Ioan >>>>>>>>>>>>> >>>>>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>>>>>> >>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" >>>>>>>>>>>>>> MolDyn-244-loops- >>>>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>>>> 7959 244749 3241072 >>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" >>>>>>>>>>>>>> MolDyn-244-loops- >>>>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>>>> 17207 564648 7949388 >>>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I.e. almost half of the jobs haven't finished >>>>>>>>>>>>>> (according to swift) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I also have some exceptions: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>>>>>>> identity=urn: >>>>>>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to >>>>>>>>>>>>>> Failed Exception >>>>>>>>>>>>>> in getFile >>>>>>>>>>>>>> (80 of those): >>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>>>> 80 880 9705 >>>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Nika >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Swift-devel mailing list >>>>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Swift-devel mailing list >>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>> >>>> >>> >> >> > From hategan at mcs.anl.gov Wed Aug 8 17:07:50 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 08 Aug 2007 17:07:50 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46B9F66E.5060103@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <10C2510A-1E5E-43DE-A7CD-32F3B1F62EE2@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> 
<46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> Message-ID: <1186610870.6478.0.camel@blabla.mcs.anl.gov> Where exactly is this version? On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: > OK everyone, I found Yong's version of the provider dated July 26th, > much more recent than what was in SVN on June 27th. I updated Nika's > version of the provider (which has been checked out of SVN), and > recompiled&deploy! > > ant distclean > ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ > dist > > I even updated updated some of the logging info to use the logger > (some were not using the logger). > > Nika, Falkon is freshly restarted and ready for another test run! > > Falkon Factory Service: > http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService > Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm > > Ioan > > Veronika Nefedova wrote: > > Ioan, > > > > > > It looks like the Falcon (including provider-deef) was put in SVN on > > June 27th. You really were supposed to use the SVN code from that > > point. Sigh. Did you do any changes to viper install after June > > 27th? > > > > > > Nika > > > > On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: > > > > > Could it be that the fixes were done before the original SVN > > > checkin? If not, then at least we know why things aren't > > > working. I bet the latest provider source was in Nika's Swift > > > install on viper. Nika, I take it you don't have this anymore, as > > > SVN updates overwrote this. Yong, is there any other place you > > > might have the latest provider source? If not, I guess we need to > > > take another look through the provider source to fix the issues > > > that we knew of... > > > > > > Ioan > > > > > > Mihael Hategan wrote: > > > > Well, it doesn't look like the falkon provider in SVN has been updated > > > > at all in terms of fixing synchronization issues. 
All commits on > > > > provider-deef come from either ben or me: > > > > > > > > bash-3.1$ svn log > > > > ------------------------------------------------------------------------ > > > > r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug > > > > 2007) | 1 line > > > > > > > > removed gt4 stuff and added them as a dependency > > > > ------------------------------------------------------------------------ > > > > r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug > > > > 2007) | 1 line > > > > > > > > removed gt4 stuff and added them as a dependency > > > > ------------------------------------------------------------------------ > > > > r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug > > > > 2007) | 1 line > > > > > > > > a very small readme for provider-deef > > > > ------------------------------------------------------------------------ > > > > r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun > > > > 2007) | 1 line > > > > > > > > remove dist directory form svn > > > > ------------------------------------------------------------------------ > > > > r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun > > > > 2007) | 20 lines > > > > > > > > provider-deef, the Falkon/cog provider > > > > > > > > based on source in below message, with .class files deleted > > > > > > > > > > > > Date: Wed, 27 Jun 2007 09:27:23 -0500 > > > > From: Veronika Nefedova > > > > To: Yong Zhao > > > > Cc: Ben Clifford , Mihael Hategan > > > > , > > > > iraicu at cs.uchicago.edu, Ian Foster , > > > > Mike Wilde , > > > > Tiberiu Stef-Praun > > > > Subject: Re: 244 molecule MolDyn run... > > > > > > > > its on viper.uchicago.edu > > > > in : /home/nefedova/cogl/modules/provider-deef/ > > > > I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz > > > > > > > > Nika > > > > > > > > > > > > ------------------------------------------------------------------------ > > > > > > > > > > > > On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: > > > > > > > > > Mihael, do you have any clues on why this run has failed? Ioan - my > > > > > answers to your questions are below... > > > > > > > > > > On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: > > > > > > > > > > > > > > > > It looks like viper (where Swift is running) is idle, and so is tg- > > > > > > viz-login2 (where Falkon is running). 
> > > > > > What looks evident to me is that the normal list of events is for a > > > > > > successful task: > > > > > > iraicu at viper:/home/nefedova/alamines> grep "urn: > > > > > > 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log > > > > > > 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: > > > > > > 0-1-73-2-31-0-0-1186444341989) setting status to Submitted > > > > > > 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: > > > > > > 0-1-73-2-31-0-0-1186444341989 0 > > > > > > 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: > > > > > > 0-1-73-2-31-0-0-1186444341989) setting status to Completed > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > > > > > Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > 17566 175660 2179412 > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "NotificationThread > > > > > > notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > 7959 55713 785035 > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> grep "setting status to > > > > > > Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > > > > > > 190968 1909680 24003796 > > > > > > > > > > > > Now, 17566 tasks were submitted, 7959 notifiation were received > > > > > > from Falkon, and 190968 tasks were set to completed... > > > > > > > > > > > > Obviously this isn't right. Falkon only saw 7959 tasks, so I would > > > > > > argue that the # of notifications received is correct. The > > > > > > submitted # of tasks looks like the # I would have expected, but > > > > > > all the tasks did not make it to Falkon. The Falkon provider is > > > > > > what sits between the change of status to submitted, and the > > > > > > receipt of the notification, so I would say that is the first place > > > > > > we need to look for more details... there used to some extra debug > > > > > > info in the Falkon provider that simply printed all the tasks that > > > > > > were actually being submitted to Falkon (as opposed to just the > > > > > > change of status within Karajan). I don't see those debug > > > > > > statements, I bet they got overwritten in the SVN update. > > > > > > What about the completed tasks, why are there so many (190K) > > > > > > completed tasks? Where did they come from? > > > > > > > > > > > > > > > > > "Task" doesn't mean job. It could be just data being staged in , etc. > > > > > The first 2 are important -- (Submitted vs Completed). Since it > > > > > differs, this is the problem... > > > > > > > > > > > > > > > > > > > > > Yong, are you keeping up with these emails? Do you still have a > > > > > > copy of the latest Falkon provider that you edited just before you > > > > > > left? Can you just take a look through there to make sure nothing > > > > > > has been broken with the SVN updates? If you don't have time for > > > > > > this now (considering today was your first day on the new job), > > > > > > I'll dig through there and see if I can make some sense of what is > > > > > > happening! > > > > > > > > > > > > One last thing, Ben mentioned that the Falkon provider you saw in > > > > > > Nika's account was different than what was in SVN. Ben, did you at > > > > > > least look at modification dates? How old was one as opposed to > > > > > > the other? I hope we did not revert back to an older version that > > > > > > might have had some bug in it.... 
> > > > > > > > > > > > > > > > > I had to update to the latest version of provider-deef from SVN since > > > > > without the update nothing worked. The version I am at now is 1050. > > > > > But this is exactly the same version of swift/deef I used for our > > > > > Friday run (which 'worked' from Falcon/Swift point of view) > > > > > > > > > > Nika > > > > > > > > > > > > > > > > > > > > > Ioan > > > > > > > > > > > > Veronika Nefedova wrote: > > > > > > > > > > > > > Well, there are some discrepancies: > > > > > > > > > > > > > > nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > 7959 244749 3241072 > > > > > > > nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > 17207 564648 7949388 > > > > > > > nefedova at viper:~/alamines> > > > > > > > > > > > > > > I.e. almost half of the jobs haven't finished (according to swift) > > > > > > > > > > > > > > I also have some exceptions: > > > > > > > > > > > > > > 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: > > > > > > > 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception > > > > > > > in getFile > > > > > > > (80 of those): > > > > > > > nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- > > > > > > > zhgo6be8tjhi1.log | wc > > > > > > > 80 880 9705 > > > > > > > nefedova at viper:~/alamines> > > > > > > > > > > > > > > > > > > > > > Nika > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > _______________________________________________ > > > Swift-devel mailing list > > > Swift-devel at ci.uchicago.edu > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > From iraicu at cs.uchicago.edu Wed Aug 8 17:36:05 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 08 Aug 2007 17:36:05 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186610870.6478.0.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186610870.6478.0.camel@blabla.mcs.anl.gov> Message-ID: <46BA4555.2010401@cs.uchicago.edu> viper in Yong's account... he ran some tests just before he left with this version, and it worked just fine! I saved Nika's provider which I replaced, so we can always go back to that if we need to. Ioan Mihael Hategan wrote: > Where exactly is this version? > > On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: > >> OK everyone, I found Yong's version of the provider dated July 26th, >> much more recent than what was in SVN on June 27th. 
I updated Nika's >> version of the provider (which has been checked out of SVN), and >> recompiled&deploy! >> >> ant distclean >> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >> dist >> >> I even updated updated some of the logging info to use the logger >> (some were not using the logger). >> >> Nika, Falkon is freshly restarted and ready for another test run! >> >> Falkon Factory Service: >> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >> >> Ioan >> >> Veronika Nefedova wrote: >> >>> Ioan, >>> >>> >>> It looks like the Falcon (including provider-deef) was put in SVN on >>> June 27th. You really were supposed to use the SVN code from that >>> point. Sigh. Did you do any changes to viper install after June >>> 27th? >>> >>> >>> Nika >>> >>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>> >>> >>>> Could it be that the fixes were done before the original SVN >>>> checkin? If not, then at least we know why things aren't >>>> working. I bet the latest provider source was in Nika's Swift >>>> install on viper. Nika, I take it you don't have this anymore, as >>>> SVN updates overwrote this. Yong, is there any other place you >>>> might have the latest provider source? If not, I guess we need to >>>> take another look through the provider source to fix the issues >>>> that we knew of... >>>> >>>> Ioan >>>> >>>> Mihael Hategan wrote: >>>> >>>>> Well, it doesn't look like the falkon provider in SVN has been updated >>>>> at all in terms of fixing synchronization issues. All commits on >>>>> provider-deef come from either ben or me: >>>>> >>>>> bash-3.1$ svn log >>>>> ------------------------------------------------------------------------ >>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug >>>>> 2007) | 1 line >>>>> >>>>> removed gt4 stuff and added them as a dependency >>>>> ------------------------------------------------------------------------ >>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug >>>>> 2007) | 1 line >>>>> >>>>> removed gt4 stuff and added them as a dependency >>>>> ------------------------------------------------------------------------ >>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug >>>>> 2007) | 1 line >>>>> >>>>> a very small readme for provider-deef >>>>> ------------------------------------------------------------------------ >>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun >>>>> 2007) | 1 line >>>>> >>>>> remove dist directory form svn >>>>> ------------------------------------------------------------------------ >>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun >>>>> 2007) | 20 lines >>>>> >>>>> provider-deef, the Falkon/cog provider >>>>> >>>>> based on source in below message, with .class files deleted >>>>> >>>>> >>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>> From: Veronika Nefedova >>>>> To: Yong Zhao >>>>> Cc: Ben Clifford , Mihael Hategan >>>>> , >>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>> Mike Wilde , >>>>> Tiberiu Stef-Praun >>>>> Subject: Re: 244 molecule MolDyn run... 
>>>>> >>>>> its on viper.uchicago.edu >>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz >>>>> >>>>> Nika >>>>> >>>>> >>>>> ------------------------------------------------------------------------ >>>>> >>>>> >>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>> >>>>> >>>>>> Mihael, do you have any clues on why this run has failed? Ioan - my >>>>>> answers to your questions are below... >>>>>> >>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>> >>>>>> >>>>>> >>>>>>> It looks like viper (where Swift is running) is idle, and so is tg- >>>>>>> viz-login2 (where Falkon is running). >>>>>>> What looks evident to me is that the normal list of events is for a >>>>>>> successful task: >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>> >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>> 17566 175660 2179412 >>>>>>> >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>> 7959 55713 785035 >>>>>>> >>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>> 190968 1909680 24003796 >>>>>>> >>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>> >>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>>>>>> argue that the # of notifications received is correct. The >>>>>>> submitted # of tasks looks like the # I would have expected, but >>>>>>> all the tasks did not make it to Falkon. The Falkon provider is >>>>>>> what sits between the change of status to submitted, and the >>>>>>> receipt of the notification, so I would say that is the first place >>>>>>> we need to look for more details... there used to some extra debug >>>>>>> info in the Falkon provider that simply printed all the tasks that >>>>>>> were actually being submitted to Falkon (as opposed to just the >>>>>>> change of status within Karajan). I don't see those debug >>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>> completed tasks? Where did they come from? >>>>>>> >>>>>>> >>>>>>> >>>>>> "Task" doesn't mean job. It could be just data being staged in , etc. >>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>> differs, this is the problem... >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Yong, are you keeping up with these emails? Do you still have a >>>>>>> copy of the latest Falkon provider that you edited just before you >>>>>>> left? Can you just take a look through there to make sure nothing >>>>>>> has been broken with the SVN updates? 
If you don't have time for >>>>>>> this now (considering today was your first day on the new job), >>>>>>> I'll dig through there and see if I can make some sense of what is >>>>>>> happening! >>>>>>> >>>>>>> One last thing, Ben mentioned that the Falkon provider you saw in >>>>>>> Nika's account was different than what was in SVN. Ben, did you at >>>>>>> least look at modification dates? How old was one as opposed to >>>>>>> the other? I hope we did not revert back to an older version that >>>>>>> might have had some bug in it.... >>>>>>> >>>>>>> >>>>>>> >>>>>> I had to update to the latest version of provider-deef from SVN since >>>>>> without the update nothing worked. The version I am at now is 1050. >>>>>> But this is exactly the same version of swift/deef I used for our >>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>> >>>>>> Nika >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Ioan >>>>>>> >>>>>>> Veronika Nefedova wrote: >>>>>>> >>>>>>> >>>>>>>> Well, there are some discrepancies: >>>>>>>> >>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>> 7959 244749 3241072 >>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>> 17207 564648 7949388 >>>>>>>> nefedova at viper:~/alamines> >>>>>>>> >>>>>>>> I.e. almost half of the jobs haven't finished (according to swift) >>>>>>>> >>>>>>>> I also have some exceptions: >>>>>>>> >>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>>>>>> in getFile >>>>>>>> (80 of those): >>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>> 80 880 9705 >>>>>>>> nefedova at viper:~/alamines> >>>>>>>> >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>>> >>>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Wed Aug 8 17:45:44 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 08 Aug 2007 17:45:44 -0500 Subject: [Swift-devel] regexp mapper Message-ID: <1186613144.6478.7.camel@blabla.mcs.anl.gov> I've been looking at the regexp mapper, in particular the map() function. It looks like all things in there are invariants or not used. The path is not used and all mapper parameters are, by definition, invariants for a particular instance of the mapper. So it looks more like a single file mapper with a very intricate way of simulating a regular expression replacement function, which makes even less sense if the source is a static string. What's going on? 
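
For what it's worth, the operation described above (capture part of a source string with a regular expression and rewrite it) is a one-liner outside the mapper. The file name and pattern below are made up purely for illustration and say nothing about the mapper's actual parameters or implementation:

    # hypothetical rename: solv_m001.crd -> solv_m001.wham
    echo "solv_m001.crd" | sed -E 's/(.*)\.crd$/\1.wham/'
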
Mihael From nefedova at mcs.anl.gov Wed Aug 8 19:44:41 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Wed, 8 Aug 2007 19:44:41 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BA4555.2010401@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B751AF.9050502@cs.uchicago.edu> <5BF06E34-7FB7-476A-A291-4E3ADC3BD25B@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186610870.6478.0.camel@blabla.mcs.anl.gov> <46BA4555.2010401@cs.uchicago.edu> Message-ID: Everything seemed to come to a halt. This is the last stdout that I have: Staged out MolDyn-244-loops-knt9h8fru9sm2/shared/ solv_repu_0.7_0.8_a0_m040.wham to solv_repu_0.7_0.8_a0_m040.wham from UC-64 Staged out MolDyn-244-loops-knt9h8fru9sm2/shared/ solv_repu_0.7_0.8_a0_m040_done to solv_repu_0.7_0.8_a0_m040_done from UC-64 Submitting task Task(type=4, identity=urn: 0-1-91-2-29-0-0-2-1186617126510) No host specified Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting status to Active Submitting task Task(type=4, identity=urn: 0-1-91-2-29-0-0-1-1186617126513) No host specified Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting status to Completed Submitting task Task(type=4, identity=urn: 0-1-91-2-29-0-0-3-1186617126516) No host specified Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting status to Active Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) Completed. Waiting: 1, Running: 14926. Heap size: 1518M, Heap free: 962M, Max heap: 1518M Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting status to Completed Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) Completed. Waiting: 0, Running: 14926. Heap size: 1518M, Heap free: 962M, Max heap: 1518M Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting status to Active Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting status to Completed Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) Completed. Waiting: 0, Running: 14925. Heap size: 1518M, Heap free: 962M, Max heap: 1518M Submitting task Task(type=4, identity=urn: 0-1-91-2-29-0-0-4-1186617126519) No host specified Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting status to Active Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting status to Completed Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) Completed. Waiting: 0, Running: 14925. Heap size: 1518M, Heap free: 962M, Max heap: 1518M Resolved 2078 to UC-64 chrm_long completed Notice 'No host specified' -- this message was printing throughout the whole execution, from the very beginning. The log is in ~nefedova/alamines/MolDyn-244-loops-knt9h8fru9sm2.log on viper Nika On Aug 8, 2007, at 5:36 PM, Ioan Raicu wrote: > viper in Yong's account... he ran some tests just before he left > with this version, and it worked just fine! 
> I saved Nika's provider which I replaced, so we can always go back > to that if we need to. > > Ioan > > Mihael Hategan wrote: >> Where exactly is this version? >> >> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >> >>> OK everyone, I found Yong's version of the provider dated July 26th, >>> much more recent than what was in SVN on June 27th. I updated >>> Nika's >>> version of the provider (which has been checked out of SVN), and >>> recompiled&deploy! >>> >>> ant distclean >>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >>> dist >>> >>> I even updated updated some of the logging info to use the logger >>> (some were not using the logger). >>> >>> Nika, Falkon is freshly restarted and ready for another test run! >>> >>> Falkon Factory Service: >>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/ >>> GenericPortal/core/WS/GPFactoryService >>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>> >>> Ioan >>> >>> Veronika Nefedova wrote: >>> >>>> Ioan, >>>> >>>> >>>> It looks like the Falcon (including provider-deef) was put in >>>> SVN on >>>> June 27th. You really were supposed to use the SVN code from that >>>> point. Sigh. Did you do any changes to viper install after June >>>> 27th? >>>> >>>> >>>> Nika >>>> >>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>> >>>> >>>>> Could it be that the fixes were done before the original SVN >>>>> checkin? If not, then at least we know why things aren't >>>>> working. I bet the latest provider source was in Nika's Swift >>>>> install on viper. Nika, I take it you don't have this anymore, as >>>>> SVN updates overwrote this. Yong, is there any other place you >>>>> might have the latest provider source? If not, I guess we need to >>>>> take another look through the provider source to fix the issues >>>>> that we knew of... >>>>> >>>>> Ioan >>>>> >>>>> Mihael Hategan wrote: >>>>> >>>>>> Well, it doesn't look like the falkon provider in SVN has been >>>>>> updated >>>>>> at all in terms of fixing synchronization issues. 
All commits on >>>>>> provider-deef come from either ben or me: >>>>>> >>>>>> bash-3.1$ svn log >>>>>> ----------------------------------------------------------------- >>>>>> ------- >>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 >>>>>> (Fri, 03 Aug >>>>>> 2007) | 1 line >>>>>> >>>>>> removed gt4 stuff and added them as a dependency >>>>>> ----------------------------------------------------------------- >>>>>> ------- >>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 >>>>>> (Fri, 03 Aug >>>>>> 2007) | 1 line >>>>>> >>>>>> removed gt4 stuff and added them as a dependency >>>>>> ----------------------------------------------------------------- >>>>>> ------- >>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, >>>>>> 03 Aug >>>>>> 2007) | 1 line >>>>>> >>>>>> a very small readme for provider-deef >>>>>> ----------------------------------------------------------------- >>>>>> ------- >>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, >>>>>> 27 Jun >>>>>> 2007) | 1 line >>>>>> >>>>>> remove dist directory form svn >>>>>> ----------------------------------------------------------------- >>>>>> ------- >>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, >>>>>> 27 Jun >>>>>> 2007) | 20 lines >>>>>> >>>>>> provider-deef, the Falkon/cog provider >>>>>> >>>>>> based on source in below message, with .class files deleted >>>>>> >>>>>> >>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>> From: Veronika Nefedova >>>>>> To: Yong Zhao >>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>> , >>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>> Mike Wilde , >>>>>> Tiberiu Stef-Praun >>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>> >>>>>> its on viper.uchicago.edu >>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>> I also tared it up and put in my home on terminable: ~nefedova/ >>>>>> cogl.tgz >>>>>> >>>>>> Nika >>>>>> >>>>>> >>>>>> ----------------------------------------------------------------- >>>>>> ------- >>>>>> >>>>>> >>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>> >>>>>> >>>>>>> Mihael, do you have any clues on why this run has failed? >>>>>>> Ioan - my >>>>>>> answers to your questions are below... >>>>>>> >>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> It looks like viper (where Swift is running) is idle, and so >>>>>>>> is tg- >>>>>>>> viz-login2 (where Falkon is running). 
>>>>>>>> What looks evident to me is that the normal list of events >>>>>>>> is for a >>>>>>>> successful task: >>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops- >>>>>>>> zhgo6be8tjhi1.log >>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn: >>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>> notification: urn: >>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>> identity=urn: >>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>> >>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>> 17566 175660 2179412 >>>>>>>> >>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>> 7959 55713 785035 >>>>>>>> >>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>> 190968 1909680 24003796 >>>>>>>> >>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>> >>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so >>>>>>>> I would >>>>>>>> argue that the # of notifications received is correct. The >>>>>>>> submitted # of tasks looks like the # I would have expected, >>>>>>>> but >>>>>>>> all the tasks did not make it to Falkon. The Falkon >>>>>>>> provider is >>>>>>>> what sits between the change of status to submitted, and the >>>>>>>> receipt of the notification, so I would say that is the >>>>>>>> first place >>>>>>>> we need to look for more details... there used to some extra >>>>>>>> debug >>>>>>>> info in the Falkon provider that simply printed all the >>>>>>>> tasks that >>>>>>>> were actually being submitted to Falkon (as opposed to just the >>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>>> completed tasks? Where did they come from? >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> "Task" doesn't mean job. It could be just data being staged >>>>>>> in , etc. >>>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>>> differs, this is the problem... >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Yong, are you keeping up with these emails? Do you still >>>>>>>> have a >>>>>>>> copy of the latest Falkon provider that you edited just >>>>>>>> before you >>>>>>>> left? Can you just take a look through there to make sure >>>>>>>> nothing >>>>>>>> has been broken with the SVN updates? If you don't have >>>>>>>> time for >>>>>>>> this now (considering today was your first day on the new job), >>>>>>>> I'll dig through there and see if I can make some sense of >>>>>>>> what is >>>>>>>> happening! >>>>>>>> >>>>>>>> One last thing, Ben mentioned that the Falkon provider you >>>>>>>> saw in >>>>>>>> Nika's account was different than what was in SVN. Ben, did >>>>>>>> you at >>>>>>>> least look at modification dates? How old was one as >>>>>>>> opposed to >>>>>>>> the other? I hope we did not revert back to an older >>>>>>>> version that >>>>>>>> might have had some bug in it.... 
>>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> I had to update to the latest version of provider-deef from >>>>>>> SVN since >>>>>>> without the update nothing worked. The version I am at now is >>>>>>> 1050. >>>>>>> But this is exactly the same version of swift/deef I used for >>>>>>> our >>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> Veronika Nefedova wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Well, there are some discrepancies: >>>>>>>>> >>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244- >>>>>>>>> loops- >>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>> 7959 244749 3241072 >>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244- >>>>>>>>> loops- >>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>> 17207 564648 7949388 >>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>> >>>>>>>>> I.e. almost half of the jobs haven't finished (according to >>>>>>>>> swift) >>>>>>>>> >>>>>>>>> I also have some exceptions: >>>>>>>>> >>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>> identity=urn: >>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed >>>>>>>>> Exception >>>>>>>>> in getFile >>>>>>>>> (80 of those): >>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>> 80 880 9705 >>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>> >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>>>>>> >>>>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Wed Aug 8 20:34:40 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 08 Aug 2007 20:34:40 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186610870.6478.0.camel@blabla.mcs.anl.gov> < 46BA4555.2010401@cs.uchicago.edu> Message-ID: <46BA6F30.3040908@cs.uchicago.edu> Things are not halted, Falkon is still running, and its delivering results really slowly... http://tg-viz-login2.uc.teragrid.org:51000/index.htm notice the black area in the second graph, that is the time to deliver notificaitons to Swift... all machines are basically idle, I don't know what it could be... there is ample space on the disks... CPU is idle, memory is OK, yet things are just crawling, and Swift seems to have stopped printing anything to the screen or file. The logs show nothing strange... but there is obviosuly something that is not right... 
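For the record, the same counts we pulled on the last run can be re-run against this log in one pass (just a sketch -- the log name is the one Nika gave above, and the patterns are the ones we grepped for before; adjust them if the provider's wording has changed):

  LOG=MolDyn-244-loops-knt9h8fru9sm2.log
  for p in "setting status to Submitted" "NotificationThread notification" \
           "setting status to Completed" "setting status to Failed"; do
      printf '%-35s %s\n' "$p" "$(grep -c "$p" "$LOG")"
  done
  # timeline for a single task, e.g.:
  # grep "urn:0-1-91-2-29-0-0-2-1186617126510" "$LOG"

If Submitted and Completed counts for type=1 tasks diverge again the way they did last time, that narrows it back down to the provider.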
I'll let the experiment keep going for now, and I'll dig into it deeper later tonight... Ioan Veronika Nefedova wrote: > Everything seemed to come to a halt. > > This is the last stdout that I have: > > Staged out > MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040.wham > to solv_repu_0.7_0.8_a0_m040.wham from UC-64 > Staged out > MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040_done > to solv_repu_0.7_0.8_a0_m040_done from UC-64 > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) > No host specified > Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting > status to Active > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) > No host specified > Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting > status to Completed > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) > No host specified > Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting > status to Active > Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) Completed. > Waiting: 1, Running: 14926. Heap size: 1518M, Heap free: 962M, Max > heap: 1518M > Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting > status to Completed > Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) Completed. > Waiting: 0, Running: 14926. Heap size: 1518M, Heap free: 962M, Max > heap: 1518M > Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting > status to Active > Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting > status to Completed > Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) Completed. > Waiting: 0, Running: 14925. Heap size: 1518M, Heap free: 962M, Max > heap: 1518M > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) > No host specified > Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting > status to Active > Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting > status to Completed > Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) Completed. > Waiting: 0, Running: 14925. Heap size: 1518M, Heap free: 962M, Max > heap: 1518M > Resolved 2078 to UC-64 > chrm_long completed > > > > Notice 'No host specified' -- this message was printing throughout the > whole execution, from the very beginning. > The log is in ~nefedova/alamines/MolDyn-244-loops-knt9h8fru9sm2.log on > viper > > Nika > > On Aug 8, 2007, at 5:36 PM, Ioan Raicu wrote: > >> viper in Yong's account... he ran some tests just before he left with >> this version, and it worked just fine! >> I saved Nika's provider which I replaced, so we can always go back to >> that if we need to. >> >> Ioan >> >> Mihael Hategan wrote: >>> Where exactly is this version? >>> >>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>> >>>> OK everyone, I found Yong's version of the provider dated July 26th, >>>> much more recent than what was in SVN on June 27th. I updated Nika's >>>> version of the provider (which has been checked out of SVN), and >>>> recompiled&deploy! >>>> >>>> ant distclean >>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >>>> dist >>>> >>>> I even updated updated some of the logging info to use the logger >>>> (some were not using the logger). >>>> >>>> Nika, Falkon is freshly restarted and ready for another test run! 
>>>> >>>> Falkon Factory Service: >>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>> >>>> Ioan >>>> >>>> Veronika Nefedova wrote: >>>> >>>>> Ioan, >>>>> >>>>> >>>>> It looks like the Falcon (including provider-deef) was put in SVN on >>>>> June 27th. You really were supposed to use the SVN code from that >>>>> point. Sigh. Did you do any changes to viper install after June >>>>> 27th? >>>>> >>>>> >>>>> Nika >>>>> >>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>> >>>>> >>>>>> Could it be that the fixes were done before the original SVN >>>>>> checkin? If not, then at least we know why things aren't >>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>> install on viper. Nika, I take it you don't have this anymore, as >>>>>> SVN updates overwrote this. Yong, is there any other place you >>>>>> might have the latest provider source? If not, I guess we need to >>>>>> take another look through the provider source to fix the issues >>>>>> that we knew of... >>>>>> >>>>>> Ioan >>>>>> >>>>>> Mihael Hategan wrote: >>>>>> >>>>>>> Well, it doesn't look like the falkon provider in SVN has been updated >>>>>>> at all in terms of fixing synchronization issues. All commits on >>>>>>> provider-deef come from either ben or me: >>>>>>> >>>>>>> bash-3.1$ svn log >>>>>>> ------------------------------------------------------------------------ >>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> removed gt4 stuff and added them as a dependency >>>>>>> ------------------------------------------------------------------------ >>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> removed gt4 stuff and added them as a dependency >>>>>>> ------------------------------------------------------------------------ >>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> a very small readme for provider-deef >>>>>>> ------------------------------------------------------------------------ >>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun >>>>>>> 2007) | 1 line >>>>>>> >>>>>>> remove dist directory form svn >>>>>>> ------------------------------------------------------------------------ >>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun >>>>>>> 2007) | 20 lines >>>>>>> >>>>>>> provider-deef, the Falkon/cog provider >>>>>>> >>>>>>> based on source in below message, with .class files deleted >>>>>>> >>>>>>> >>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>> From: Veronika Nefedova >>>>>>> To: Yong Zhao >>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>> , >>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>> Mike Wilde , >>>>>>> Tiberiu Stef-Praun >>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>> >>>>>>> its on viper.uchicago.edu >>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> >>>>>>> ------------------------------------------------------------------------ >>>>>>> >>>>>>> >>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>> >>>>>>> >>>>>>>> Mihael, do you have any clues on why this run has failed? Ioan - my >>>>>>>> answers to your questions are below... 
>>>>>>>> >>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> It looks like viper (where Swift is running) is idle, and so is tg- >>>>>>>>> viz-login2 (where Falkon is running). >>>>>>>>> What looks evident to me is that the normal list of events is for a >>>>>>>>> successful task: >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>>> >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>> 17566 175660 2179412 >>>>>>>>> >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>> 7959 55713 785035 >>>>>>>>> >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>> 190968 1909680 24003796 >>>>>>>>> >>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>>> >>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>>>>>>>> argue that the # of notifications received is correct. The >>>>>>>>> submitted # of tasks looks like the # I would have expected, but >>>>>>>>> all the tasks did not make it to Falkon. The Falkon provider is >>>>>>>>> what sits between the change of status to submitted, and the >>>>>>>>> receipt of the notification, so I would say that is the first place >>>>>>>>> we need to look for more details... there used to some extra debug >>>>>>>>> info in the Falkon provider that simply printed all the tasks that >>>>>>>>> were actually being submitted to Falkon (as opposed to just the >>>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>>>> completed tasks? Where did they come from? >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> "Task" doesn't mean job. It could be just data being staged in , etc. >>>>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>>>> differs, this is the problem... >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Yong, are you keeping up with these emails? Do you still have a >>>>>>>>> copy of the latest Falkon provider that you edited just before you >>>>>>>>> left? Can you just take a look through there to make sure nothing >>>>>>>>> has been broken with the SVN updates? If you don't have time for >>>>>>>>> this now (considering today was your first day on the new job), >>>>>>>>> I'll dig through there and see if I can make some sense of what is >>>>>>>>> happening! >>>>>>>>> >>>>>>>>> One last thing, Ben mentioned that the Falkon provider you saw in >>>>>>>>> Nika's account was different than what was in SVN. Ben, did you at >>>>>>>>> least look at modification dates? How old was one as opposed to >>>>>>>>> the other? 
I hope we did not revert back to an older version that >>>>>>>>> might have had some bug in it.... >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> I had to update to the latest version of provider-deef from SVN since >>>>>>>> without the update nothing worked. The version I am at now is 1050. >>>>>>>> But this is exactly the same version of swift/deef I used for our >>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Ioan >>>>>>>>> >>>>>>>>> Veronika Nefedova wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>> >>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>> 7959 244749 3241072 >>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>> 17207 564648 7949388 >>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>> >>>>>>>>>> I.e. almost half of the jobs haven't finished (according to swift) >>>>>>>>>> >>>>>>>>>> I also have some exceptions: >>>>>>>>>> >>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>>>>>>>> in getFile >>>>>>>>>> (80 of those): >>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>> 80 880 9705 >>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>>>>> >>>>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>> >>> > From hategan at mcs.anl.gov Thu Aug 9 12:33:17 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 09 Aug 2007 12:33:17 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BA6F30.3040908@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <9A591F19-7F78-49C2-86BD-A2EB8FF6972D@mcs.anl.gov> <46B776A7.7040005@cs.uchicago.edu> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186610870.6478.0.camel@blabla.mcs.anl.gov> < 46BA4555.2010401@cs.uchicago.edu> <46BA6F30.3040908@cs.uchicago.edu> Message-ID: <1186680798.26452.3.camel@blabla.mcs.anl.gov> So I see a gap in the log from 19:32 to 21:45. No log messages whatsoever in between. Which is weird. I wonder what could cause log4j to stop writing things to the log file. On Wed, 2007-08-08 at 20:34 -0500, Ioan Raicu wrote: > Things are not halted, Falkon is still running, and its delivering > results really slowly... > > http://tg-viz-login2.uc.teragrid.org:51000/index.htm > > notice the black area in the second graph, that is the time to deliver > notificaitons to Swift... 
all machines are basically idle, I don't know > what it could be... there is ample space on the disks... CPU is idle, > memory is OK, yet things are just crawling, and Swift seems to have > stopped printing anything to the screen or file. > > The logs show nothing strange... but there is obviosuly something that > is not right... > > I'll let the experiment keep going for now, and I'll dig into it deeper > later tonight... > > Ioan > > Veronika Nefedova wrote: > > Everything seemed to come to a halt. > > > > This is the last stdout that I have: > > > > Staged out > > MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040.wham > > to solv_repu_0.7_0.8_a0_m040.wham from UC-64 > > Staged out > > MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040_done > > to solv_repu_0.7_0.8_a0_m040_done from UC-64 > > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) > > No host specified > > Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting > > status to Active > > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) > > No host specified > > Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting > > status to Completed > > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) > > No host specified > > Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting > > status to Active > > Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) Completed. > > Waiting: 1, Running: 14926. Heap size: 1518M, Heap free: 962M, Max > > heap: 1518M > > Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting > > status to Completed > > Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) Completed. > > Waiting: 0, Running: 14926. Heap size: 1518M, Heap free: 962M, Max > > heap: 1518M > > Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting > > status to Active > > Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting > > status to Completed > > Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) Completed. > > Waiting: 0, Running: 14925. Heap size: 1518M, Heap free: 962M, Max > > heap: 1518M > > Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) > > No host specified > > Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting > > status to Active > > Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting > > status to Completed > > Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) Completed. > > Waiting: 0, Running: 14925. Heap size: 1518M, Heap free: 962M, Max > > heap: 1518M > > Resolved 2078 to UC-64 > > chrm_long completed > > > > > > > > Notice 'No host specified' -- this message was printing throughout the > > whole execution, from the very beginning. > > The log is in ~nefedova/alamines/MolDyn-244-loops-knt9h8fru9sm2.log on > > viper > > > > Nika > > > > On Aug 8, 2007, at 5:36 PM, Ioan Raicu wrote: > > > >> viper in Yong's account... he ran some tests just before he left with > >> this version, and it worked just fine! > >> I saved Nika's provider which I replaced, so we can always go back to > >> that if we need to. > >> > >> Ioan > >> > >> Mihael Hategan wrote: > >>> Where exactly is this version? > >>> > >>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: > >>> > >>>> OK everyone, I found Yong's version of the provider dated July 26th, > >>>> much more recent than what was in SVN on June 27th. 
I updated Nika's > >>>> version of the provider (which has been checked out of SVN), and > >>>> recompiled&deploy! > >>>> > >>>> ant distclean > >>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ > >>>> dist > >>>> > >>>> I even updated updated some of the logging info to use the logger > >>>> (some were not using the logger). > >>>> > >>>> Nika, Falkon is freshly restarted and ready for another test run! > >>>> > >>>> Falkon Factory Service: > >>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService > >>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm > >>>> > >>>> Ioan > >>>> > >>>> Veronika Nefedova wrote: > >>>> > >>>>> Ioan, > >>>>> > >>>>> > >>>>> It looks like the Falcon (including provider-deef) was put in SVN on > >>>>> June 27th. You really were supposed to use the SVN code from that > >>>>> point. Sigh. Did you do any changes to viper install after June > >>>>> 27th? > >>>>> > >>>>> > >>>>> Nika > >>>>> > >>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: > >>>>> > >>>>> > >>>>>> Could it be that the fixes were done before the original SVN > >>>>>> checkin? If not, then at least we know why things aren't > >>>>>> working. I bet the latest provider source was in Nika's Swift > >>>>>> install on viper. Nika, I take it you don't have this anymore, as > >>>>>> SVN updates overwrote this. Yong, is there any other place you > >>>>>> might have the latest provider source? If not, I guess we need to > >>>>>> take another look through the provider source to fix the issues > >>>>>> that we knew of... > >>>>>> > >>>>>> Ioan > >>>>>> > >>>>>> Mihael Hategan wrote: > >>>>>> > >>>>>>> Well, it doesn't look like the falkon provider in SVN has been updated > >>>>>>> at all in terms of fixing synchronization issues. 
All commits on > >>>>>>> provider-deef come from either ben or me: > >>>>>>> > >>>>>>> bash-3.1$ svn log > >>>>>>> ------------------------------------------------------------------------ > >>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug > >>>>>>> 2007) | 1 line > >>>>>>> > >>>>>>> removed gt4 stuff and added them as a dependency > >>>>>>> ------------------------------------------------------------------------ > >>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug > >>>>>>> 2007) | 1 line > >>>>>>> > >>>>>>> removed gt4 stuff and added them as a dependency > >>>>>>> ------------------------------------------------------------------------ > >>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug > >>>>>>> 2007) | 1 line > >>>>>>> > >>>>>>> a very small readme for provider-deef > >>>>>>> ------------------------------------------------------------------------ > >>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun > >>>>>>> 2007) | 1 line > >>>>>>> > >>>>>>> remove dist directory form svn > >>>>>>> ------------------------------------------------------------------------ > >>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun > >>>>>>> 2007) | 20 lines > >>>>>>> > >>>>>>> provider-deef, the Falkon/cog provider > >>>>>>> > >>>>>>> based on source in below message, with .class files deleted > >>>>>>> > >>>>>>> > >>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 > >>>>>>> From: Veronika Nefedova > >>>>>>> To: Yong Zhao > >>>>>>> Cc: Ben Clifford , Mihael Hategan > >>>>>>> , > >>>>>>> iraicu at cs.uchicago.edu, Ian Foster , > >>>>>>> Mike Wilde , > >>>>>>> Tiberiu Stef-Praun > >>>>>>> Subject: Re: 244 molecule MolDyn run... > >>>>>>> > >>>>>>> its on viper.uchicago.edu > >>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ > >>>>>>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz > >>>>>>> > >>>>>>> Nika > >>>>>>> > >>>>>>> > >>>>>>> ------------------------------------------------------------------------ > >>>>>>> > >>>>>>> > >>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: > >>>>>>> > >>>>>>> > >>>>>>>> Mihael, do you have any clues on why this run has failed? Ioan - my > >>>>>>>> answers to your questions are below... > >>>>>>>> > >>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> It looks like viper (where Swift is running) is idle, and so is tg- > >>>>>>>>> viz-login2 (where Falkon is running). 
> >>>>>>>>> What looks evident to me is that the normal list of events is for a > >>>>>>>>> successful task: > >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: > >>>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log > >>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: > >>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted > >>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: > >>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 > >>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: > >>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed > >>>>>>>>> > >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to > >>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > >>>>>>>>> 17566 175660 2179412 > >>>>>>>>> > >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread > >>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > >>>>>>>>> 7959 55713 785035 > >>>>>>>>> > >>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to > >>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc > >>>>>>>>> 190968 1909680 24003796 > >>>>>>>>> > >>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were received > >>>>>>>>> from Falkon, and 190968 tasks were set to completed... > >>>>>>>>> > >>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would > >>>>>>>>> argue that the # of notifications received is correct. The > >>>>>>>>> submitted # of tasks looks like the # I would have expected, but > >>>>>>>>> all the tasks did not make it to Falkon. The Falkon provider is > >>>>>>>>> what sits between the change of status to submitted, and the > >>>>>>>>> receipt of the notification, so I would say that is the first place > >>>>>>>>> we need to look for more details... there used to some extra debug > >>>>>>>>> info in the Falkon provider that simply printed all the tasks that > >>>>>>>>> were actually being submitted to Falkon (as opposed to just the > >>>>>>>>> change of status within Karajan). I don't see those debug > >>>>>>>>> statements, I bet they got overwritten in the SVN update. > >>>>>>>>> What about the completed tasks, why are there so many (190K) > >>>>>>>>> completed tasks? Where did they come from? > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> "Task" doesn't mean job. It could be just data being staged in , etc. > >>>>>>>> The first 2 are important -- (Submitted vs Completed). Since it > >>>>>>>> differs, this is the problem... > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> Yong, are you keeping up with these emails? Do you still have a > >>>>>>>>> copy of the latest Falkon provider that you edited just before you > >>>>>>>>> left? Can you just take a look through there to make sure nothing > >>>>>>>>> has been broken with the SVN updates? If you don't have time for > >>>>>>>>> this now (considering today was your first day on the new job), > >>>>>>>>> I'll dig through there and see if I can make some sense of what is > >>>>>>>>> happening! > >>>>>>>>> > >>>>>>>>> One last thing, Ben mentioned that the Falkon provider you saw in > >>>>>>>>> Nika's account was different than what was in SVN. Ben, did you at > >>>>>>>>> least look at modification dates? How old was one as opposed to > >>>>>>>>> the other? I hope we did not revert back to an older version that > >>>>>>>>> might have had some bug in it.... 
> >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>> I had to update to the latest version of provider-deef from SVN since > >>>>>>>> without the update nothing worked. The version I am at now is 1050. > >>>>>>>> But this is exactly the same version of swift/deef I used for our > >>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) > >>>>>>>> > >>>>>>>> Nika > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> Ioan > >>>>>>>>> > >>>>>>>>> Veronika Nefedova wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> Well, there are some discrepancies: > >>>>>>>>>> > >>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- > >>>>>>>>>> zhgo6be8tjhi1.log | wc > >>>>>>>>>> 7959 244749 3241072 > >>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- > >>>>>>>>>> zhgo6be8tjhi1.log | wc > >>>>>>>>>> 17207 564648 7949388 > >>>>>>>>>> nefedova at viper:~/alamines> > >>>>>>>>>> > >>>>>>>>>> I.e. almost half of the jobs haven't finished (according to swift) > >>>>>>>>>> > >>>>>>>>>> I also have some exceptions: > >>>>>>>>>> > >>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: > >>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception > >>>>>>>>>> in getFile > >>>>>>>>>> (80 of those): > >>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- > >>>>>>>>>> zhgo6be8tjhi1.log | wc > >>>>>>>>>> 80 880 9705 > >>>>>>>>>> nefedova at viper:~/alamines> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> Nika > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>> _______________________________________________ > >>>>>>>> Swift-devel mailing list > >>>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>> _______________________________________________ > >>>>>> Swift-devel mailing list > >>>>>> Swift-devel at ci.uchicago.edu > >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>> > >>>>> > >>> > > > From iraicu at cs.uchicago.edu Thu Aug 9 13:01:54 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 13:01:54 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186680798.26452.3.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186610870.6478.0.camel@blabla.mcs.anl.gov> < 46BA4555.2010401@cs.uchicago.edu> <46BA6F30.3040908@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> Message-ID: <46BB5692.90305@cs.uchicago.edu> I don't know, and the machine looked relatively idle... I am trying to track down what Falkon stubs are in Nika's Swift install, from the looks of it, they are really old, back from March. I have not made many changes, but in the last month or so, I did change the notificaiton engine to support persistent connections to speed it up a bit. I made it so its backwards compatible, but maybe its not 100%. 
Basically, the service was using persistent socket support, while the client was not, and maybe that cause some problem. I am now updating the Falkon stubs. I found a single jar file in the modules/provider-deef, which I have updated and committed! But, there are another bunch of them all over the place... let me track them down, and see if I can clean them up... as I suppose there should only be a single instance of these stubs, in a lib directory! Where should this master jar be, in which lib directory? Ioan Mihael Hategan wrote: > So I see a gap in the log from 19:32 to 21:45. No log messages > whatsoever in between. Which is weird. I wonder what could cause log4j > to stop writing things to the log file. > > On Wed, 2007-08-08 at 20:34 -0500, Ioan Raicu wrote: > >> Things are not halted, Falkon is still running, and its delivering >> results really slowly... >> >> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >> >> notice the black area in the second graph, that is the time to deliver >> notificaitons to Swift... all machines are basically idle, I don't know >> what it could be... there is ample space on the disks... CPU is idle, >> memory is OK, yet things are just crawling, and Swift seems to have >> stopped printing anything to the screen or file. >> >> The logs show nothing strange... but there is obviosuly something that >> is not right... >> >> I'll let the experiment keep going for now, and I'll dig into it deeper >> later tonight... >> >> Ioan >> >> Veronika Nefedova wrote: >> >>> Everything seemed to come to a halt. >>> >>> This is the last stdout that I have: >>> >>> Staged out >>> MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040.wham >>> to solv_repu_0.7_0.8_a0_m040.wham from UC-64 >>> Staged out >>> MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040_done >>> to solv_repu_0.7_0.8_a0_m040_done from UC-64 >>> Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) >>> No host specified >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting >>> status to Active >>> Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) >>> No host specified >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting >>> status to Completed >>> Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) >>> No host specified >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting >>> status to Active >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) Completed. >>> Waiting: 1, Running: 14926. Heap size: 1518M, Heap free: 962M, Max >>> heap: 1518M >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting >>> status to Completed >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) Completed. >>> Waiting: 0, Running: 14926. Heap size: 1518M, Heap free: 962M, Max >>> heap: 1518M >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting >>> status to Active >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting >>> status to Completed >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) Completed. >>> Waiting: 0, Running: 14925. 
Heap size: 1518M, Heap free: 962M, Max >>> heap: 1518M >>> Submitting task Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) >>> No host specified >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting >>> status to Active >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting >>> status to Completed >>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) Completed. >>> Waiting: 0, Running: 14925. Heap size: 1518M, Heap free: 962M, Max >>> heap: 1518M >>> Resolved 2078 to UC-64 >>> chrm_long completed >>> >>> >>> >>> Notice 'No host specified' -- this message was printing throughout the >>> whole execution, from the very beginning. >>> The log is in ~nefedova/alamines/MolDyn-244-loops-knt9h8fru9sm2.log on >>> viper >>> >>> Nika >>> >>> On Aug 8, 2007, at 5:36 PM, Ioan Raicu wrote: >>> >>> >>>> viper in Yong's account... he ran some tests just before he left with >>>> this version, and it worked just fine! >>>> I saved Nika's provider which I replaced, so we can always go back to >>>> that if we need to. >>>> >>>> Ioan >>>> >>>> Mihael Hategan wrote: >>>> >>>>> Where exactly is this version? >>>>> >>>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>>>> >>>>> >>>>>> OK everyone, I found Yong's version of the provider dated July 26th, >>>>>> much more recent than what was in SVN on June 27th. I updated Nika's >>>>>> version of the provider (which has been checked out of SVN), and >>>>>> recompiled&deploy! >>>>>> >>>>>> ant distclean >>>>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >>>>>> dist >>>>>> >>>>>> I even updated updated some of the logging info to use the logger >>>>>> (some were not using the logger). >>>>>> >>>>>> Nika, Falkon is freshly restarted and ready for another test run! >>>>>> >>>>>> Falkon Factory Service: >>>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>> >>>>>> Ioan >>>>>> >>>>>> Veronika Nefedova wrote: >>>>>> >>>>>> >>>>>>> Ioan, >>>>>>> >>>>>>> >>>>>>> It looks like the Falcon (including provider-deef) was put in SVN on >>>>>>> June 27th. You really were supposed to use the SVN code from that >>>>>>> point. Sigh. Did you do any changes to viper install after June >>>>>>> 27th? >>>>>>> >>>>>>> >>>>>>> Nika >>>>>>> >>>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Could it be that the fixes were done before the original SVN >>>>>>>> checkin? If not, then at least we know why things aren't >>>>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>>>> install on viper. Nika, I take it you don't have this anymore, as >>>>>>>> SVN updates overwrote this. Yong, is there any other place you >>>>>>>> might have the latest provider source? If not, I guess we need to >>>>>>>> take another look through the provider source to fix the issues >>>>>>>> that we knew of... >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> Mihael Hategan wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Well, it doesn't look like the falkon provider in SVN has been updated >>>>>>>>> at all in terms of fixing synchronization issues. 
All commits on >>>>>>>>> provider-deef come from either ben or me: >>>>>>>>> >>>>>>>>> bash-3.1$ svn log >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 (Fri, 03 Aug >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 (Fri, 03 Aug >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 (Fri, 03 Aug >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> a very small readme for provider-deef >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 (Wed, 27 Jun >>>>>>>>> 2007) | 1 line >>>>>>>>> >>>>>>>>> remove dist directory form svn >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 (Wed, 27 Jun >>>>>>>>> 2007) | 20 lines >>>>>>>>> >>>>>>>>> provider-deef, the Falkon/cog provider >>>>>>>>> >>>>>>>>> based on source in below message, with .class files deleted >>>>>>>>> >>>>>>>>> >>>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>>>> From: Veronika Nefedova >>>>>>>>> To: Yong Zhao >>>>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>>>> , >>>>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>>>> Mike Wilde , >>>>>>>>> Tiberiu Stef-Praun >>>>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>>>> >>>>>>>>> its on viper.uchicago.edu >>>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>>>> I also tared it up and put in my home on terminable: ~nefedova/cogl.tgz >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> >>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>> >>>>>>>>> >>>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> Mihael, do you have any clues on why this run has failed? Ioan - my >>>>>>>>>> answers to your questions are below... >>>>>>>>>> >>>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> It looks like viper (where Swift is running) is idle, and so is tg- >>>>>>>>>>> viz-login2 (where Falkon is running). 
>>>>>>>>>>> What looks evident to me is that the normal list of events is for a >>>>>>>>>>> successful task: >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Submitted >>>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread notification: urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, identity=urn: >>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989) setting status to Completed >>>>>>>>>>> >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>>>> Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>> 17566 175660 2179412 >>>>>>>>>>> >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "NotificationThread >>>>>>>>>>> notification" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>> 7959 55713 785035 >>>>>>>>>>> >>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status to >>>>>>>>>>> Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>> 190968 1909680 24003796 >>>>>>>>>>> >>>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were received >>>>>>>>>>> from Falkon, and 190968 tasks were set to completed... >>>>>>>>>>> >>>>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, so I would >>>>>>>>>>> argue that the # of notifications received is correct. The >>>>>>>>>>> submitted # of tasks looks like the # I would have expected, but >>>>>>>>>>> all the tasks did not make it to Falkon. The Falkon provider is >>>>>>>>>>> what sits between the change of status to submitted, and the >>>>>>>>>>> receipt of the notification, so I would say that is the first place >>>>>>>>>>> we need to look for more details... there used to some extra debug >>>>>>>>>>> info in the Falkon provider that simply printed all the tasks that >>>>>>>>>>> were actually being submitted to Falkon (as opposed to just the >>>>>>>>>>> change of status within Karajan). I don't see those debug >>>>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>>>> What about the completed tasks, why are there so many (190K) >>>>>>>>>>> completed tasks? Where did they come from? >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> "Task" doesn't mean job. It could be just data being staged in , etc. >>>>>>>>>> The first 2 are important -- (Submitted vs Completed). Since it >>>>>>>>>> differs, this is the problem... >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Yong, are you keeping up with these emails? Do you still have a >>>>>>>>>>> copy of the latest Falkon provider that you edited just before you >>>>>>>>>>> left? Can you just take a look through there to make sure nothing >>>>>>>>>>> has been broken with the SVN updates? If you don't have time for >>>>>>>>>>> this now (considering today was your first day on the new job), >>>>>>>>>>> I'll dig through there and see if I can make some sense of what is >>>>>>>>>>> happening! >>>>>>>>>>> >>>>>>>>>>> One last thing, Ben mentioned that the Falkon provider you saw in >>>>>>>>>>> Nika's account was different than what was in SVN. Ben, did you at >>>>>>>>>>> least look at modification dates? How old was one as opposed to >>>>>>>>>>> the other? I hope we did not revert back to an older version that >>>>>>>>>>> might have had some bug in it.... 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> I had to update to the latest version of provider-deef from SVN since >>>>>>>>>> without the update nothing worked. The version I am at now is 1050. >>>>>>>>>> But this is exactly the same version of swift/deef I used for our >>>>>>>>>> Friday run (which 'worked' from Falcon/Swift point of view) >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Ioan >>>>>>>>>>> >>>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>>>> >>>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" MolDyn-244-loops- >>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 7959 244749 3241072 >>>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" MolDyn-244-loops- >>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 17207 564648 7949388 >>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>> >>>>>>>>>>>> I.e. almost half of the jobs haven't finished (according to swift) >>>>>>>>>>>> >>>>>>>>>>>> I also have some exceptions: >>>>>>>>>>>> >>>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, identity=urn: >>>>>>>>>>>> 0-1-101-2-37-0-0-1186444363341) setting status to Failed Exception >>>>>>>>>>>> in getFile >>>>>>>>>>>> (80 of those): >>>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 80 880 9705 >>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> Nika >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Swift-devel mailing list >>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> > > > From benc at hawaga.org.uk Thu Aug 9 13:07:59 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 9 Aug 2007 18:07:59 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB5692.90305@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> Message-ID: On Thu, 9 Aug 2007, Ioan Raicu wrote: > I can clean them up... as I suppose there should only be a single instance of > these stubs, in a lib directory! Where should this master jar be, in which > lib directory? in the SVN, ideally only one copy of any piece of code. I think the right place is https://svn.ci.uchicago.edu/svn/vdl2/provider-deef/lib That's the only place I see GenericPortal.jar in the provider-deef and swift SVN trees. Do you see elsewhere? if so where? 
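a quick way to check, from the top of a checkout (or of a deployed dist) - just a sketch, assuming md5sum is on the box:

  find . -name GenericPortal.jar -exec ls -l {} \;
  find . -name GenericPortal.jar -exec md5sum {} \;

identical checksums mean the copies are the same stubs; if they differ, the dates should show which one is stale.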
-- From nefedova at mcs.anl.gov Thu Aug 9 13:07:48 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 9 Aug 2007 13:07:48 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB5692.90305@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186610870.6478.0.camel@blabla.mcs.anl.gov> < 46BA4555.2010401@cs.uchicago.edu> <46BA6F30.3040908@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> Message-ID: <0FA3AC5C-E258-48C7-8852-5907E6FCBDC5@mcs.anl.gov> Its such a mess... really, we should start using SVN asap. Nika On Aug 9, 2007, at 1:01 PM, Ioan Raicu wrote: > I don't know, and the machine looked relatively idle... I am > trying to track down what Falkon stubs are in Nika's Swift install, > from the looks of it, they are really old, back from March. I have > not made many changes, but in the last month or so, I did change > the notificaiton engine to support persistent connections to speed > it up a bit. I made it so its backwards compatible, but maybe its > not 100%. Basically, the service was using persistent socket > support, while the client was not, and maybe that cause some > problem. I am now updating the Falkon stubs. I found a single jar > file in the modules/provider-deef, which I have updated and > committed! But, there are another bunch of them all over the > place... let me track them down, and see if I can clean them up... > as I suppose there should only be a single instance of these stubs, > in a lib directory! Where should this master jar be, in which lib > directory? > Ioan > > Mihael Hategan wrote: >> So I see a gap in the log from 19:32 to 21:45. No log messages >> whatsoever in between. Which is weird. I wonder what could cause >> log4j >> to stop writing things to the log file. >> >> On Wed, 2007-08-08 at 20:34 -0500, Ioan Raicu wrote: >> >>> Things are not halted, Falkon is still running, and its >>> delivering results really slowly... >>> >>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>> >>> notice the black area in the second graph, that is the time to >>> deliver notificaitons to Swift... all machines are basically >>> idle, I don't know what it could be... there is ample space on >>> the disks... CPU is idle, memory is OK, yet things are just >>> crawling, and Swift seems to have stopped printing anything to >>> the screen or file. >>> The logs show nothing strange... but there is obviosuly something >>> that is not right... >>> >>> I'll let the experiment keep going for now, and I'll dig into it >>> deeper later tonight... >>> >>> Ioan >>> >>> Veronika Nefedova wrote: >>> >>>> Everything seemed to come to a halt. 
>>>> >>>> This is the last stdout that I have: >>>> >>>> Staged out MolDyn-244-loops-knt9h8fru9sm2/shared/ >>>> solv_repu_0.7_0.8_a0_m040.wham to solv_repu_0.7_0.8_a0_m040.wham >>>> from UC-64 >>>> Staged out MolDyn-244-loops-knt9h8fru9sm2/shared/ >>>> solv_repu_0.7_0.8_a0_m040_done to solv_repu_0.7_0.8_a0_m040_done >>>> from UC-64 >>>> Submitting task Task(type=4, identity=urn: >>>> 0-1-91-2-29-0-0-2-1186617126510) >>>> No host specified >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) >>>> setting status to Active >>>> Submitting task Task(type=4, identity=urn: >>>> 0-1-91-2-29-0-0-1-1186617126513) >>>> No host specified >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) >>>> setting status to Completed >>>> Submitting task Task(type=4, identity=urn: >>>> 0-1-91-2-29-0-0-3-1186617126516) >>>> No host specified >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) >>>> setting status to Active >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) >>>> Completed. Waiting: 1, Running: 14926. Heap size: 1518M, Heap >>>> free: 962M, Max heap: 1518M >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) >>>> setting status to Completed >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) >>>> Completed. Waiting: 0, Running: 14926. Heap size: 1518M, Heap >>>> free: 962M, Max heap: 1518M >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) >>>> setting status to Active >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) >>>> setting status to Completed >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) >>>> Completed. Waiting: 0, Running: 14925. Heap size: 1518M, Heap >>>> free: 962M, Max heap: 1518M >>>> Submitting task Task(type=4, identity=urn: >>>> 0-1-91-2-29-0-0-4-1186617126519) >>>> No host specified >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) >>>> setting status to Active >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) >>>> setting status to Completed >>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) >>>> Completed. Waiting: 0, Running: 14925. Heap size: 1518M, Heap >>>> free: 962M, Max heap: 1518M >>>> Resolved 2078 to UC-64 >>>> chrm_long completed >>>> >>>> >>>> >>>> Notice 'No host specified' -- this message was printing >>>> throughout the whole execution, from the very beginning. >>>> The log is in ~nefedova/alamines/MolDyn-244-loops- >>>> knt9h8fru9sm2.log on viper >>>> >>>> Nika >>>> >>>> On Aug 8, 2007, at 5:36 PM, Ioan Raicu wrote: >>>> >>>> >>>>> viper in Yong's account... he ran some tests just before he >>>>> left with this version, and it worked just fine! >>>>> I saved Nika's provider which I replaced, so we can always go >>>>> back to that if we need to. >>>>> >>>>> Ioan >>>>> >>>>> Mihael Hategan wrote: >>>>> >>>>>> Where exactly is this version? >>>>>> >>>>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>>>>> >>>>>>> OK everyone, I found Yong's version of the provider dated >>>>>>> July 26th, >>>>>>> much more recent than what was in SVN on June 27th. I >>>>>>> updated Nika's >>>>>>> version of the provider (which has been checked out of SVN), and >>>>>>> recompiled&deploy! >>>>>>> ant distclean >>>>>>> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/ >>>>>>> vdsk-0.2-dev/ >>>>>>> dist >>>>>>> >>>>>>> I even updated updated some of the logging info to use the >>>>>>> logger >>>>>>> (some were not using the logger). >>>>>>> >>>>>>> Nika, Falkon is freshly restarted and ready for another test >>>>>>> run! 
>>>>>>> >>>>>>> Falkon Factory Service: >>>>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/ >>>>>>> GenericPortal/core/WS/GPFactoryService >>>>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>>> >>>>>>> Ioan >>>>>>> >>>>>>> Veronika Nefedova wrote: >>>>>>>> Ioan, >>>>>>>> >>>>>>>> It looks like the Falcon (including provider-deef) was put >>>>>>>> in SVN on >>>>>>>> June 27th. You really were supposed to use the SVN code from >>>>>>>> that >>>>>>>> point. Sigh. Did you do any changes to viper install after June >>>>>>>> 27th? >>>>>>>> >>>>>>>> >>>>>>>> Nika >>>>>>>> >>>>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>>>>> >>>>>>>> >>>>>>>>> Could it be that the fixes were done before the original SVN >>>>>>>>> checkin? If not, then at least we know why things aren't >>>>>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>>>>> install on viper. Nika, I take it you don't have this >>>>>>>>> anymore, as >>>>>>>>> SVN updates overwrote this. Yong, is there any other place >>>>>>>>> you >>>>>>>>> might have the latest provider source? If not, I guess we >>>>>>>>> need to >>>>>>>>> take another look through the provider source to fix the >>>>>>>>> issues >>>>>>>>> that we knew of... >>>>>>>>> >>>>>>>>> Ioan >>>>>>>>> >>>>>>>>> Mihael Hategan wrote: >>>>>>>>>> Well, it doesn't look like the falkon provider in SVN has >>>>>>>>>> been updated >>>>>>>>>> at all in terms of fixing synchronization issues. All >>>>>>>>>> commits on >>>>>>>>>> provider-deef come from either ben or me: >>>>>>>>>> >>>>>>>>>> bash-3.1$ svn log >>>>>>>>>> ------------------------------------------------------------- >>>>>>>>>> ----------- >>>>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 >>>>>>>>>> -0500 (Fri, 03 Aug >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>> ------------------------------------------------------------- >>>>>>>>>> ----------- >>>>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 >>>>>>>>>> -0500 (Fri, 03 Aug >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>> ------------------------------------------------------------- >>>>>>>>>> ----------- >>>>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 >>>>>>>>>> (Fri, 03 Aug >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> a very small readme for provider-deef >>>>>>>>>> ------------------------------------------------------------- >>>>>>>>>> ----------- >>>>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 >>>>>>>>>> (Wed, 27 Jun >>>>>>>>>> 2007) | 1 line >>>>>>>>>> >>>>>>>>>> remove dist directory form svn >>>>>>>>>> ------------------------------------------------------------- >>>>>>>>>> ----------- >>>>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 >>>>>>>>>> (Wed, 27 Jun >>>>>>>>>> 2007) | 20 lines >>>>>>>>>> >>>>>>>>>> provider-deef, the Falkon/cog provider >>>>>>>>>> >>>>>>>>>> based on source in below message, with .class files deleted >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>>>>> From: Veronika Nefedova >>>>>>>>>> To: Yong Zhao >>>>>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>>>>> , >>>>>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>>>>> Mike Wilde , >>>>>>>>>> Tiberiu Stef-Praun >>>>>>>>>> Subject: Re: 244 molecule MolDyn run... 
>>>>>>>>>> >>>>>>>>>> its on viper.uchicago.edu >>>>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>>>>> I also tared it up and put in my home on terminable: >>>>>>>>>> ~nefedova/cogl.tgz >>>>>>>>>> >>>>>>>>>> Nika >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> ------------------------------------------------------------- >>>>>>>>>> ----------- >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>>>>> >>>>>>>>>>> Mihael, do you have any clues on why this run has failed? >>>>>>>>>>> Ioan - my answers to your questions are below... >>>>>>>>>>> >>>>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> It looks like viper (where Swift is running) is idle, >>>>>>>>>>>> and so is tg- viz-login2 (where Falkon is running). >>>>>>>>>>>> What looks evident to me is that the normal list of >>>>>>>>>>>> events is for a successful task: >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" MolDyn-244-loops- >>>>>>>>>>>> zhgo6be8tjhi1.log >>>>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>> identity=urn: 0-1-73-2-31-0-0-1186444341989) setting >>>>>>>>>>>> status to Submitted >>>>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>>>>>> notification: urn: 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>> identity=urn: 0-1-73-2-31-0-0-1186444341989) setting >>>>>>>>>>>> status to Completed >>>>>>>>>>>> >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting >>>>>>>>>>>> status to Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>>>>>>> | wc >>>>>>>>>>>> 17566 175660 2179412 >>>>>>>>>>>> >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep >>>>>>>>>>>> "NotificationThread notification" MolDyn-244-loops- >>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>> 7959 55713 785035 >>>>>>>>>>>> >>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting >>>>>>>>>>>> status to Completed" MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>>>>>>> | wc >>>>>>>>>>>> 190968 1909680 24003796 >>>>>>>>>>>> >>>>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were >>>>>>>>>>>> received from Falkon, and 190968 tasks were set to >>>>>>>>>>>> completed... >>>>>>>>>>>> >>>>>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, >>>>>>>>>>>> so I would argue that the # of notifications received >>>>>>>>>>>> is correct. The submitted # of tasks looks like the # >>>>>>>>>>>> I would have expected, but all the tasks did not make >>>>>>>>>>>> it to Falkon. The Falkon provider is what sits between >>>>>>>>>>>> the change of status to submitted, and the receipt of >>>>>>>>>>>> the notification, so I would say that is the first >>>>>>>>>>>> place we need to look for more details... there used to >>>>>>>>>>>> some extra debug info in the Falkon provider that >>>>>>>>>>>> simply printed all the tasks that were actually being >>>>>>>>>>>> submitted to Falkon (as opposed to just the change of >>>>>>>>>>>> status within Karajan). I don't see those debug >>>>>>>>>>>> statements, I bet they got overwritten in the SVN update. >>>>>>>>>>>> What about the completed tasks, why are there so many >>>>>>>>>>>> (190K) completed tasks? Where did they come from? >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> "Task" doesn't mean job. It could be just data being >>>>>>>>>>> staged in , etc. The first 2 are important -- (Submitted >>>>>>>>>>> vs Completed). Since it differs, this is the problem... 
>>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Yong, are you keeping up with these emails? Do you >>>>>>>>>>>> still have a copy of the latest Falkon provider that >>>>>>>>>>>> you edited just before you left? Can you just take a >>>>>>>>>>>> look through there to make sure nothing has been broken >>>>>>>>>>>> with the SVN updates? If you don't have time for this >>>>>>>>>>>> now (considering today was your first day on the new >>>>>>>>>>>> job), I'll dig through there and see if I can make some >>>>>>>>>>>> sense of what is happening! >>>>>>>>>>>> >>>>>>>>>>>> One last thing, Ben mentioned that the Falkon provider >>>>>>>>>>>> you saw in Nika's account was different than what was >>>>>>>>>>>> in SVN. Ben, did you at least look at modification >>>>>>>>>>>> dates? How old was one as opposed to the other? I >>>>>>>>>>>> hope we did not revert back to an older version that >>>>>>>>>>>> might have had some bug in it.... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>> I had to update to the latest version of provider-deef >>>>>>>>>>> from SVN since without the update nothing worked. The >>>>>>>>>>> version I am at now is 1050. But this is exactly the >>>>>>>>>>> same version of swift/deef I used for our Friday run >>>>>>>>>>> (which 'worked' from Falcon/Swift point of view) >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>>> Ioan >>>>>>>>>>>> >>>>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>>> >>>>>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>>>>> >>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" >>>>>>>>>>>>> MolDyn-244-loops- zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 7959 244749 3241072 >>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" >>>>>>>>>>>>> MolDyn-244-loops- zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 17207 564648 7949388 >>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>> >>>>>>>>>>>>> I.e. 
almost half of the jobs haven't finished >>>>>>>>>>>>> (according to swift) >>>>>>>>>>>>> >>>>>>>>>>>>> I also have some exceptions: >>>>>>>>>>>>> >>>>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>>>>>> identity=urn: 0-1-101-2-37-0-0-1186444363341) setting >>>>>>>>>>>>> status to Failed Exception in getFile >>>>>>>>>>>>> (80 of those): >>>>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244- >>>>>>>>>>>>> loops- zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 80 880 9705 >>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>>> Nika >>>>>>>>>>>>> >>>>>>>>>>> _______________________________________________ >>>>>>>>>>> Swift-devel mailing list >>>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Swift-devel mailing list >>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>> >>>>>>>> >>>>>> >> >> >> > From benc at hawaga.org.uk Thu Aug 9 13:10:35 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 9 Aug 2007 18:10:35 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> Message-ID: I don't see a commit from you on http://www.ci.uchicago.edu/trac/swift/timeline When you commit, you should get a revision number like this: $ svn commit Sending security/GridProxyCertificates.xml Sending security/UsingCertificates.xml Sending security/index.xml Transmitting file data ... Committed revision 80. and the 80 should be (for the swift SVN) somewhere around 1074. -- From benc at hawaga.org.uk Thu Aug 9 13:12:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 9 Aug 2007 18:12:51 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> Message-ID: also note that if you really do ahve stuff in SVN properly, it should take around 5 minutes to make a completely new cog+swift+provider-deef install in a completely new directory. -- From iraicu at cs.uchicago.edu Thu Aug 9 13:54:17 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 13:54:17 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> Message-ID: <46BB62D9.9040409@cs.uchicago.edu> a trimmed version of ls -R from viper:/home/nefedova/cogl/modules directory, yielded... ./karajan/dist/karajan-0.35/lib: GenericPortal.jar ./vdsk/dist/oldstuff/vdsk-0.1-dev/lib: GenericPortal.jar ./vdsk/dist/oldstuff/vdsk-1.0: GenericPortal.jar ./vdsk/dist/oldstuff/vdsk-1.0/lib: GenericPortal.jar ./vdsk/dist/vdsk-0.2-dev/lib: GenericPortal.jar Now, I renamed the jar to FalkonStubs.jar and it can be found in /home/nefedova/cogl/modules/provider-deef/lib/ Is it safe to assume that I can remove all the GenericPortal.jar... 
when I deploy the provider-deef, will it automatically copy the new FalkonStubs.jar to the installed Swift lib directory? Thanks, Ioan Ben Clifford wrote: > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > >> I can clean them up... as I suppose there should only be a single instance of >> these stubs, in a lib directory! Where should this master jar be, in which >> lib directory? >> > > in the SVN, ideally only one copy of any piece of code. I think the right > place is https://svn.ci.uchicago.edu/svn/vdl2/provider-deef/lib > > That's the only place I see GenericPortal.jar in the provider-deef and > swift SVN trees. Do you see elsewhere? if so where? > From iraicu at cs.uchicago.edu Thu Aug 9 13:55:27 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 13:55:27 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <0FA3AC5C-E258-48C7-8852-5907E6FCBDC5@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <46B790B8.3080004@cs.uchicago.edu> <9D95EF12-F343-4E9E-BAC2-D954C1694CA4@mcs.anl.gov> <3154CC6E-E677-4BB9-B589-C89F4021B252@mcs.anl.gov> <46B7A919.1050507@cs.uchicago.edu> <1EF6C229-88B4-4F58-844A-ED03F8A86822@wideopenwest.com> <46B7E6EF.60809! 09@cs.uchicago.edu> <5538B31F-FA27-4725-96E3-CCF18FB8E589@wideopenwest.com> <1186503852.18998.3.camel@blabla.mcs.anl.gov> <46B89E8D. 1060809@cs.uchicago.edu> <46B9F66E.5060103@cs.uchicago.edu> <1186610870.6478.0.camel@blabla.mcs.anl.gov> < 46BA4555.2010401@cs.uchicago.edu> <46BA6F30.3040908@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB 5692.90305@cs.uchicago.edu> <0FA3AC5C-E258-48C7-8852-5907E6FCBDC5@mcs.anl.gov> Message-ID: <46BB631F.1010906@cs.uchicago.edu> Right, that is what we are trying to do now... get a working version of the falkon provider, and commit the changes... Veronika Nefedova wrote: > Its such a mess... really, we should start using SVN asap. > > Nika > > On Aug 9, 2007, at 1:01 PM, Ioan Raicu wrote: > >> I don't know, and the machine looked relatively idle... I am trying >> to track down what Falkon stubs are in Nika's Swift install, from the >> looks of it, they are really old, back from March. I have not made >> many changes, but in the last month or so, I did change the >> notificaiton engine to support persistent connections to speed it up >> a bit. I made it so its backwards compatible, but maybe its not >> 100%. Basically, the service was using persistent socket support, >> while the client was not, and maybe that cause some problem. I am >> now updating the Falkon stubs. I found a single jar file in the >> modules/provider-deef, which I have updated and committed! But, >> there are another bunch of them all over the place... let me track >> them down, and see if I can clean them up... as I suppose there >> should only be a single instance of these stubs, in a lib directory! >> Where should this master jar be, in which lib directory? >> Ioan >> >> Mihael Hategan wrote: >>> So I see a gap in the log from 19:32 to 21:45. No log messages >>> whatsoever in between. Which is weird. I wonder what could cause log4j >>> to stop writing things to the log file. >>> >>> On Wed, 2007-08-08 at 20:34 -0500, Ioan Raicu wrote: >>> >>>> Things are not halted, Falkon is still running, and its delivering >>>> results really slowly... >>>> >>>> http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>> >>>> notice the black area in the second graph, that is the time to >>>> deliver notificaitons to Swift... 
all machines are basically idle, >>>> I don't know what it could be... there is ample space on the >>>> disks... CPU is idle, memory is OK, yet things are just crawling, >>>> and Swift seems to have stopped printing anything to the screen or >>>> file. >>>> The logs show nothing strange... but there is obviosuly something >>>> that is not right... >>>> >>>> I'll let the experiment keep going for now, and I'll dig into it >>>> deeper later tonight... >>>> >>>> Ioan >>>> >>>> Veronika Nefedova wrote: >>>> >>>>> Everything seemed to come to a halt. >>>>> >>>>> This is the last stdout that I have: >>>>> >>>>> Staged out >>>>> MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040.wham >>>>> to solv_repu_0.7_0.8_a0_m040.wham from UC-64 >>>>> Staged out >>>>> MolDyn-244-loops-knt9h8fru9sm2/shared/solv_repu_0.7_0.8_a0_m040_done >>>>> to solv_repu_0.7_0.8_a0_m040_done from UC-64 >>>>> Submitting task Task(type=4, >>>>> identity=urn:0-1-91-2-29-0-0-2-1186617126510) >>>>> No host specified >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting >>>>> status to Active >>>>> Submitting task Task(type=4, >>>>> identity=urn:0-1-91-2-29-0-0-1-1186617126513) >>>>> No host specified >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) setting >>>>> status to Completed >>>>> Submitting task Task(type=4, >>>>> identity=urn:0-1-91-2-29-0-0-3-1186617126516) >>>>> No host specified >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting >>>>> status to Active >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-2-1186617126510) >>>>> Completed. Waiting: 1, Running: 14926. Heap size: 1518M, Heap >>>>> free: 962M, Max heap: 1518M >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) setting >>>>> status to Completed >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-1-1186617126513) >>>>> Completed. Waiting: 0, Running: 14926. Heap size: 1518M, Heap >>>>> free: 962M, Max heap: 1518M >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting >>>>> status to Active >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) setting >>>>> status to Completed >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-3-1186617126516) >>>>> Completed. Waiting: 0, Running: 14925. Heap size: 1518M, Heap >>>>> free: 962M, Max heap: 1518M >>>>> Submitting task Task(type=4, >>>>> identity=urn:0-1-91-2-29-0-0-4-1186617126519) >>>>> No host specified >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting >>>>> status to Active >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) setting >>>>> status to Completed >>>>> Task(type=4, identity=urn:0-1-91-2-29-0-0-4-1186617126519) >>>>> Completed. Waiting: 0, Running: 14925. Heap size: 1518M, Heap >>>>> free: 962M, Max heap: 1518M >>>>> Resolved 2078 to UC-64 >>>>> chrm_long completed >>>>> >>>>> >>>>> >>>>> Notice 'No host specified' -- this message was printing throughout >>>>> the whole execution, from the very beginning. >>>>> The log is in >>>>> ~nefedova/alamines/MolDyn-244-loops-knt9h8fru9sm2.log on viper >>>>> >>>>> Nika >>>>> >>>>> On Aug 8, 2007, at 5:36 PM, Ioan Raicu wrote: >>>>> >>>>> >>>>>> viper in Yong's account... he ran some tests just before he left >>>>>> with this version, and it worked just fine! >>>>>> I saved Nika's provider which I replaced, so we can always go >>>>>> back to that if we need to. >>>>>> >>>>>> Ioan >>>>>> >>>>>> Mihael Hategan wrote: >>>>>> >>>>>>> Where exactly is this version? 
>>>>>>> >>>>>>> On Wed, 2007-08-08 at 11:59 -0500, Ioan Raicu wrote: >>>>>>> >>>>>>>> OK everyone, I found Yong's version of the provider dated July >>>>>>>> 26th, >>>>>>>> much more recent than what was in SVN on June 27th. I updated >>>>>>>> Nika's >>>>>>>> version of the provider (which has been checked out of SVN), and >>>>>>>> recompiled&deploy! >>>>>>>> ant distclean >>>>>>>> ant >>>>>>>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ >>>>>>>> dist >>>>>>>> >>>>>>>> I even updated updated some of the logging info to use the logger >>>>>>>> (some were not using the logger). >>>>>>>> >>>>>>>> Nika, Falkon is freshly restarted and ready for another test run! >>>>>>>> >>>>>>>> Falkon Factory Service: >>>>>>>> http://tg-viz-login2.uc.teragrid.org:50020/wsrf/services/GenericPortal/core/WS/GPFactoryService >>>>>>>> >>>>>>>> Web Server: http://tg-viz-login2.uc.teragrid.org:51000/index.htm >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> Veronika Nefedova wrote: >>>>>>>>> Ioan, >>>>>>>>> >>>>>>>>> It looks like the Falcon (including provider-deef) was put in >>>>>>>>> SVN on >>>>>>>>> June 27th. You really were supposed to use the SVN code from that >>>>>>>>> point. Sigh. Did you do any changes to viper install after June >>>>>>>>> 27th? >>>>>>>>> >>>>>>>>> >>>>>>>>> Nika >>>>>>>>> >>>>>>>>> On Aug 7, 2007, at 11:32 AM, Ioan Raicu wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>>> Could it be that the fixes were done before the original SVN >>>>>>>>>> checkin? If not, then at least we know why things aren't >>>>>>>>>> working. I bet the latest provider source was in Nika's Swift >>>>>>>>>> install on viper. Nika, I take it you don't have this >>>>>>>>>> anymore, as >>>>>>>>>> SVN updates overwrote this. Yong, is there any other place you >>>>>>>>>> might have the latest provider source? If not, I guess we >>>>>>>>>> need to >>>>>>>>>> take another look through the provider source to fix the issues >>>>>>>>>> that we knew of... >>>>>>>>>> >>>>>>>>>> Ioan >>>>>>>>>> >>>>>>>>>> Mihael Hategan wrote: >>>>>>>>>>> Well, it doesn't look like the falkon provider in SVN has >>>>>>>>>>> been updated >>>>>>>>>>> at all in terms of fixing synchronization issues. 
All >>>>>>>>>>> commits on >>>>>>>>>>> provider-deef come from either ben or me: >>>>>>>>>>> >>>>>>>>>>> bash-3.1$ svn log >>>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>>> >>>>>>>>>>> r1053 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:49:48 -0500 >>>>>>>>>>> (Fri, 03 Aug >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>>> >>>>>>>>>>> r1052 | hategan at CI.UCHICAGO.EDU | 2007-08-03 14:48:25 -0500 >>>>>>>>>>> (Fri, 03 Aug >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> removed gt4 stuff and added them as a dependency >>>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>>> >>>>>>>>>>> r1051 | benc at CI.UCHICAGO.EDU | 2007-08-03 14:20:21 -0500 >>>>>>>>>>> (Fri, 03 Aug >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> a very small readme for provider-deef >>>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>>> >>>>>>>>>>> r875 | benc at CI.UCHICAGO.EDU | 2007-06-27 15:00:12 -0500 >>>>>>>>>>> (Wed, 27 Jun >>>>>>>>>>> 2007) | 1 line >>>>>>>>>>> >>>>>>>>>>> remove dist directory form svn >>>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>>> >>>>>>>>>>> r873 | benc at CI.UCHICAGO.EDU | 2007-06-27 10:23:15 -0500 >>>>>>>>>>> (Wed, 27 Jun >>>>>>>>>>> 2007) | 20 lines >>>>>>>>>>> >>>>>>>>>>> provider-deef, the Falkon/cog provider >>>>>>>>>>> >>>>>>>>>>> based on source in below message, with .class files deleted >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Date: Wed, 27 Jun 2007 09:27:23 -0500 >>>>>>>>>>> From: Veronika Nefedova >>>>>>>>>>> To: Yong Zhao >>>>>>>>>>> Cc: Ben Clifford , Mihael Hategan >>>>>>>>>>> , >>>>>>>>>>> iraicu at cs.uchicago.edu, Ian Foster , >>>>>>>>>>> Mike Wilde , >>>>>>>>>>> Tiberiu Stef-Praun >>>>>>>>>>> Subject: Re: 244 molecule MolDyn run... >>>>>>>>>>> >>>>>>>>>>> its on viper.uchicago.edu >>>>>>>>>>> in : /home/nefedova/cogl/modules/provider-deef/ >>>>>>>>>>> I also tared it up and put in my home on terminable: >>>>>>>>>>> ~nefedova/cogl.tgz >>>>>>>>>>> >>>>>>>>>>> Nika >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> ------------------------------------------------------------------------ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> On Tue, 2007-08-07 at 10:01 -0500, Veronika Nefedova wrote: >>>>>>>>>>> >>>>>>>>>>>> Mihael, do you have any clues on why this run has failed? >>>>>>>>>>>> Ioan - my answers to your questions are below... >>>>>>>>>>>> >>>>>>>>>>>> On Aug 6, 2007, at 10:28 PM, Ioan Raicu wrote: >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> It looks like viper (where Swift is running) is idle, and >>>>>>>>>>>>> so is tg- viz-login2 (where Falkon is running). 
>>>>>>>>>>>>> What looks evident to me is that the normal list of events >>>>>>>>>>>>> is for a successful task: >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "urn: >>>>>>>>>>>>> 0-1-73-2-31-0-0-1186444341989" >>>>>>>>>>>>> MolDyn-244-loops-zhgo6be8tjhi1.log >>>>>>>>>>>>> 2007-08-06 19:08:25,121 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>>> identity=urn: 0-1-73-2-31-0-0-1186444341989) setting >>>>>>>>>>>>> status to Submitted >>>>>>>>>>>>> 2007-08-06 20:58:17,685 DEBUG NotificationThread >>>>>>>>>>>>> notification: urn: 0-1-73-2-31-0-0-1186444341989 0 >>>>>>>>>>>>> 2007-08-06 20:58:17,723 DEBUG TaskImpl Task(type=1, >>>>>>>>>>>>> identity=urn: 0-1-73-2-31-0-0-1186444341989) setting >>>>>>>>>>>>> status to Completed >>>>>>>>>>>>> >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status >>>>>>>>>>>>> to Submitted" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 17566 175660 2179412 >>>>>>>>>>>>> >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep >>>>>>>>>>>>> "NotificationThread notification" >>>>>>>>>>>>> MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 7959 55713 785035 >>>>>>>>>>>>> >>>>>>>>>>>>> iraicu at viper:/home/nefedova/alamines> grep "setting status >>>>>>>>>>>>> to Completed" MolDyn-244-loops-zhgo6be8tjhi1.log | wc >>>>>>>>>>>>> 190968 1909680 24003796 >>>>>>>>>>>>> >>>>>>>>>>>>> Now, 17566 tasks were submitted, 7959 notifiation were >>>>>>>>>>>>> received from Falkon, and 190968 tasks were set to >>>>>>>>>>>>> completed... >>>>>>>>>>>>> >>>>>>>>>>>>> Obviously this isn't right. Falkon only saw 7959 tasks, >>>>>>>>>>>>> so I would argue that the # of notifications received is >>>>>>>>>>>>> correct. The submitted # of tasks looks like the # I >>>>>>>>>>>>> would have expected, but all the tasks did not make it to >>>>>>>>>>>>> Falkon. The Falkon provider is what sits between the >>>>>>>>>>>>> change of status to submitted, and the receipt of the >>>>>>>>>>>>> notification, so I would say that is the first place we >>>>>>>>>>>>> need to look for more details... there used to some extra >>>>>>>>>>>>> debug info in the Falkon provider that simply printed all >>>>>>>>>>>>> the tasks that were actually being submitted to Falkon >>>>>>>>>>>>> (as opposed to just the change of status within >>>>>>>>>>>>> Karajan). I don't see those debug statements, I bet they >>>>>>>>>>>>> got overwritten in the SVN update. >>>>>>>>>>>>> What about the completed tasks, why are there so many >>>>>>>>>>>>> (190K) completed tasks? Where did they come from? >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> "Task" doesn't mean job. It could be just data being staged >>>>>>>>>>>> in , etc. The first 2 are important -- (Submitted vs >>>>>>>>>>>> Completed). Since it differs, this is the problem... >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Yong, are you keeping up with these emails? Do you still >>>>>>>>>>>>> have a copy of the latest Falkon provider that you edited >>>>>>>>>>>>> just before you left? Can you just take a look through >>>>>>>>>>>>> there to make sure nothing has been broken with the SVN >>>>>>>>>>>>> updates? If you don't have time for this now >>>>>>>>>>>>> (considering today was your first day on the new job), >>>>>>>>>>>>> I'll dig through there and see if I can make some sense of >>>>>>>>>>>>> what is happening! >>>>>>>>>>>>> >>>>>>>>>>>>> One last thing, Ben mentioned that the Falkon provider you >>>>>>>>>>>>> saw in Nika's account was different than what was in >>>>>>>>>>>>> SVN. Ben, did you at least look at modification dates? 
>>>>>>>>>>>>> How old was one as opposed to the other? I hope we did >>>>>>>>>>>>> not revert back to an older version that might have had >>>>>>>>>>>>> some bug in it.... >>>>>>>>>>>>> >>>>>>>>>>>>> >>>>>>>>>>>> I had to update to the latest version of provider-deef from >>>>>>>>>>>> SVN since without the update nothing worked. The version I >>>>>>>>>>>> am at now is 1050. But this is exactly the same version of >>>>>>>>>>>> swift/deef I used for our Friday run (which 'worked' from >>>>>>>>>>>> Falcon/Swift point of view) >>>>>>>>>>>> >>>>>>>>>>>> Nika >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>>>>> Ioan >>>>>>>>>>>>> >>>>>>>>>>>>> Veronika Nefedova wrote: >>>>>>>>>>>>> >>>>>>>>>>>>>> Well, there are some discrepancies: >>>>>>>>>>>>>> >>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Completed job" >>>>>>>>>>>>>> MolDyn-244-loops- zhgo6be8tjhi1.log | wc >>>>>>>>>>>>>> 7959 244749 3241072 >>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "Running job" >>>>>>>>>>>>>> MolDyn-244-loops- zhgo6be8tjhi1.log | wc >>>>>>>>>>>>>> 17207 564648 7949388 >>>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>>> >>>>>>>>>>>>>> I.e. almost half of the jobs haven't finished (according >>>>>>>>>>>>>> to swift) >>>>>>>>>>>>>> >>>>>>>>>>>>>> I also have some exceptions: >>>>>>>>>>>>>> >>>>>>>>>>>>>> 2007-08-06 19:08:49,378 DEBUG TaskImpl Task(type=2, >>>>>>>>>>>>>> identity=urn: 0-1-101-2-37-0-0-1186444363341) setting >>>>>>>>>>>>>> status to Failed Exception in getFile >>>>>>>>>>>>>> (80 of those): >>>>>>>>>>>>>> nefedova at viper:~/alamines> grep "ailed" MolDyn-244-loops- >>>>>>>>>>>>>> zhgo6be8tjhi1.log | wc >>>>>>>>>>>>>> 80 880 9705 >>>>>>>>>>>>>> nefedova at viper:~/alamines> >>>>>>>>>>>>>> >>>>>>>>>>>>>> >>>>>>>>>>>>>> Nika >>>>>>>>>>>>>> >>>>>>>>>>>> _______________________________________________ >>>>>>>>>>>> Swift-devel mailing list >>>>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>>>> >>>>>>>>>>>> >>>>>>>>>> _______________________________________________ >>>>>>>>>> Swift-devel mailing list >>>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>>> >>>>>>>>> >>>>>>> >>> >>> >>> >> > > From iraicu at cs.uchicago.edu Thu Aug 9 14:00:02 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 14:00:02 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> Message-ID: <46BB6432.3010107@cs.uchicago.edu> I did the commit, but I don't have it on the screen anymore. When I tried to do it again, it seems to ask me for a user name and passwd, which it didn't ask for last time, so obviously, it must not have worked. I don't seem to know my CI password, so I'll look into reseting it! I'll commit the changes as soon as I have a password. Ioan Ben Clifford wrote: > I don't see a commit from you on > http://www.ci.uchicago.edu/trac/swift/timeline > > When you commit, you should get a revision number like this: > $ svn commit > Sending security/GridProxyCertificates.xml > Sending security/UsingCertificates.xml > Sending security/index.xml > Transmitting file data ... > Committed revision 80. > > and the 80 should be (for the swift SVN) somewhere around 1074. 
> From hategan at mcs.anl.gov Thu Aug 9 14:02:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 09 Aug 2007 14:02:30 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB62D9.9040409@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> Message-ID: <1186686150.29197.3.camel@blabla.mcs.anl.gov> On Thu, 2007-08-09 at 13:54 -0500, Ioan Raicu wrote: > a trimmed version of ls -R from viper:/home/nefedova/cogl/modules > directory, yielded... > ./karajan/dist/karajan-0.35/lib: > GenericPortal.jar > > ./vdsk/dist/oldstuff/vdsk-0.1-dev/lib: > GenericPortal.jar > > ./vdsk/dist/oldstuff/vdsk-1.0: > GenericPortal.jar > > ./vdsk/dist/oldstuff/vdsk-1.0/lib: > GenericPortal.jar > > ./vdsk/dist/vdsk-0.2-dev/lib: > GenericPortal.jar > > Now, I renamed the jar to FalkonStubs.jar and it can be found in > /home/nefedova/cogl/modules/provider-deef/lib/ > > Is it safe to assume that I can remove all the GenericPortal.jar... There's no need to. Those are dist directories. Just make sure you do ant distclean in vdsk before building. > when > I deploy the provider-deef, will it automatically copy the new > FalkonStubs.jar to the installed Swift lib directory? No. You need to edit provider-deef/project.properties. > > Thanks, > Ioan > > > Ben Clifford wrote: > > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > > > > >> I can clean them up... as I suppose there should only be a single instance of > >> these stubs, in a lib directory! Where should this master jar be, in which > >> lib directory? > >> > > > > in the SVN, ideally only one copy of any piece of code. I think the right > > place is https://svn.ci.uchicago.edu/svn/vdl2/provider-deef/lib > > > > That's the only place I see GenericPortal.jar in the provider-deef and > > swift SVN trees. Do you see elsewhere? if so where? > > > From hategan at mcs.anl.gov Thu Aug 9 14:03:45 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 09 Aug 2007 14:03:45 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB6432.3010107@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB6432.3010107@cs.uchicago.edu> Message-ID: <1186686225.29197.5.camel@blabla.mcs.anl.gov> If you don't know your CI password and you've never done a commit to the Swift SVN before, then you didn't do the commit. On Thu, 2007-08-09 at 14:00 -0500, Ioan Raicu wrote: > I did the commit, but I don't have it on the screen anymore. When I > tried to do it again, it seems to ask me for a user name and passwd, > which it didn't ask for last time, so obviously, it must not have > worked. I don't seem to know my CI password, so I'll look into reseting > it! I'll commit the changes as soon as I have a password. > > Ioan > > Ben Clifford wrote: > > I don't see a commit from you on > > http://www.ci.uchicago.edu/trac/swift/timeline > > > > When you commit, you should get a revision number like this: > > $ svn commit > > Sending security/GridProxyCertificates.xml > > Sending security/UsingCertificates.xml > > Sending security/index.xml > > Transmitting file data ... > > Committed revision 80. 
> > > > and the 80 should be (for the swift SVN) somewhere around 1074. > From benc at hawaga.org.uk Thu Aug 9 14:22:03 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 9 Aug 2007 19:22:03 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186686150.29197.3.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <1186686150.29197.3.camel@blabla.mcs.anl.gov> Message-ID: On Thu, 9 Aug 2007, Mihael Hategan wrote: > There's no need to. Those are dist directories. Just make sure you do > ant distclean in vdsk before building. However, as I think you are still insistent on building in Nika's install tree rather than using your own, you should be careful that you don't destroy Nika's work. It would be wise to take 10 minutes to build Swift yourself on your own machine. --
From benc at hawaga.org.uk Thu Aug 9 14:23:45 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 9 Aug 2007 19:23:45 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB62D9.9040409@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> Message-ID: On Thu, 9 Aug 2007, Ioan Raicu wrote: > Now, I renamed the jar to FalkonStubs.jar and it can be found in > /home/nefedova/cogl/modules/provider-deef/lib/ You need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add FalkonStubs.jar' before you commit. --
From iraicu at cs.uchicago.edu Thu Aug 9 14:29:01 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 14:29:01 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> Message-ID: <46BB6AFD.4010004@cs.uchicago.edu> Right! Ben Clifford wrote: > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > >> Now, I renamed the jar to FalkonStubs.jar and it can be found in >> /home/nefedova/cogl/modules/provider-deef/lib/ >> > > you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add > FalkonStubs.jar' before you commit. > >
From iraicu at cs.uchicago.edu Thu Aug 9 14:36:05 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 14:36:05 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB6AFD.4010004@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> Message-ID: <46BB6CA5.8060006@cs.uchicago.edu> Now I am trying to compile the provider-deef module, and it fails... it seems that it's not finding the stubs I just added. I changed the file name from GenericPortal.jar to FalkonStubs.jar. Do I need to do anything special, like rebuilding the classpath with the new jar name, so the build can pick it up?
Ioan nefedova at viper:~/cogl/modules/provider-deef> ant distclean Buildfile: build.xml distclean: .... BUILD SUCCESSFUL Total time: 4 seconds nefedova at viper:~/cogl/modules/provider-deef> ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist Buildfile: build.xml dist: ... delete.jar: [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) [delete] Deleting: /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar compile: [echo] [provider-deef]: COMPILE [mkdir] Created dir: /home/nefedova/cogl/modules/provider-deef/build [javac] Compiling 8 source files to /home/nefedova/cogl/modules/provider-deef/build [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.*; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.GPPortType; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: package org.globus.GenericPortal.stubs.GPService_instance.service does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: package org.globus.GenericPortal.stubs.Factory does not exist [javac] import org.globus.GenericPortal.stubs.Factory.CreateResource; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: package org.globus.GenericPortal.stubs.Factory does not exist [javac] import org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: package org.globus.GenericPortal.stubs.Factory does not exist [javac] import org.globus.GenericPortal.stubs.Factory.FactoryPortType; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: package org.globus.GenericPortal.stubs.Factory.service does not exist [javac] import org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:50: package org.globus.GenericPortal.common does not exist [javac] import org.globus.GenericPortal.common.Notification; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:19: package org.globus.GenericPortal.stubs.Factory does not exist [javac] import org.globus.GenericPortal.stubs.Factory.CreateResource; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:20: package org.globus.GenericPortal.stubs.Factory does not exist [javac] import org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; [javac] ^ [javac] 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:21: package org.globus.GenericPortal.stubs.Factory does not exist [javac] import org.globus.GenericPortal.stubs.Factory.FactoryPortType; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:22: package org.globus.GenericPortal.stubs.Factory.service does not exist [javac] import org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:24: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.UserJob; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:25: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:26: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.InitResponse; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:27: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.Init; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:28: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.DeInitResponse; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:29: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.DeInit; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:30: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.GPPortType; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:31: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.Executable; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:32: package org.globus.GenericPortal.stubs.GPService_instance.service does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:35: package org.globus.GenericPortal.common does not exist [javac] import org.globus.GenericPortal.common.Notification; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:48: cannot resolve symbol [javac] symbol : class 
FactoryPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] private FactoryPortType gpFactory = null; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:54: cannot resolve symbol [javac] symbol : class Notification [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] private Notification userNot = null; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:16: package org.globus.GenericPortal.common does not exist [javac] import org.globus.GenericPortal.common.Notification; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:22: cannot resolve symbol [javac] symbol : class Notification [javac] location: class org.globus.cog.abstraction.impl.execution.deef.NotificationThread [javac] private Notification userNot = null; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:31: cannot resolve symbol [javac] symbol : class Notification [javac] location: class org.globus.cog.abstraction.impl.execution.deef.NotificationThread [javac] public NotificationThread(Map tasks, Notification userNot, List completedQueue){ [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:16: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.GPPortType; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:17: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.Executable; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:18: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.UserJob; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:19: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:26: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.SubmissionThread [javac] private Executable[] execs = null; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:27: cannot resolve symbol [javac] symbol : class UserJob [javac] location: class org.globus.cog.abstraction.impl.execution.deef.SubmissionThread [javac] private UserJob job = null; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:97: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.SubmissionThread [javac] public void setStatus (Executable execs[]) { 
[javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:16: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.GPPortType; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:17: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.Executable; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:18: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.UserJob; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:19: package org.globus.GenericPortal.stubs.GPService_instance does not exist [javac] import org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:59: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] private Executable[] execs = null; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:60: cannot resolve symbol [javac] symbol : class UserJob [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] private UserJob job = null; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:174: cannot resolve symbol [javac] symbol : class GPPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] public GPPortType getNextResourcePort() { [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:184: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] public void submit(Task task, Executable exec) { [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:210: cannot resolve symbol [javac] symbol : class Notification [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] public static String getMachNamePort(Notification userNot){ [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:229: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] public Executable removeFirstExec() throws NoSuchElementException { [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:63: cannot resolve symbol [javac] symbol : class GPPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] private GPPortType gGP = null; [javac] ^ [javac] 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:206: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] private Executable prepareSpecification(JobSpecification spec) [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:157: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] Executable job = prepareSpecification(spec); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] Executable exec = new Executable(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] Executable exec = new Executable(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: cannot resolve symbol [javac] symbol : class DeInit [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] DeInit deInit = new DeInit(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: cannot resolve symbol [javac] symbol : class DeInit [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] DeInit deInit = new DeInit(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:291: cannot resolve symbol [javac] symbol : class DeInitResponse [javac] location: class org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler [javac] DeInitResponse dr = this.gGP.deInit(deInit); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:69: cannot resolve symbol [javac] symbol : class Notification [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] rp.userNot = new Notification(SO_TIMEOUT); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: cannot resolve symbol [javac] symbol : class FactoryServiceAddressingLocator [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] FactoryServiceAddressingLocator factoryLocator = new FactoryServiceAddressingLocator(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: cannot resolve symbol [javac] symbol : class FactoryServiceAddressingLocator [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] FactoryServiceAddressingLocator factoryLocator = new FactoryServiceAddressingLocator(); [javac] ^ [javac] 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:130: cannot resolve symbol [javac] symbol : class CreateResourceResponse [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] CreateResourceResponse createResponse = gpFactory [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:131: cannot resolve symbol [javac] symbol : class CreateResource [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] .createResource(new CreateResource()); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: cannot resolve symbol [javac] symbol : class GPServiceAddressingLocator [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] GPServiceAddressingLocator instanceLocator = new GPServiceAddressingLocator(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: cannot resolve symbol [javac] symbol : class GPServiceAddressingLocator [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] GPServiceAddressingLocator instanceLocator = new GPServiceAddressingLocator(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: cannot resolve symbol [javac] symbol : class GPPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] GPPortType gGP = instanceLocator.getGPPortTypePort(instanceEPR); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: cannot resolve symbol [javac] symbol : class Init [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] Init initMsg = new Init(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: cannot resolve symbol [javac] symbol : class Init [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] Init initMsg = new Init(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: cannot resolve symbol [javac] symbol : class InitResponse [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] InitResponse ir = gGP.init(initMsg); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: cannot resolve symbol [javac] symbol : class GPPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] return (GPPortType) gptPool.get(next); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] return (Executable) execQueue.remove(0); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: cannot resolve symbol [javac] symbol : class DeInit [javac] location: class 
org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] DeInit deInit = new DeInit(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: cannot resolve symbol [javac] symbol : class DeInit [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] DeInit deInit = new DeInit(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: cannot resolve symbol [javac] symbol : class GPPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] GPPortType gGP = (GPPortType)gptPool.get(i); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: cannot resolve symbol [javac] symbol : class GPPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] GPPortType gGP = (GPPortType)gptPool.get(i); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: cannot resolve symbol [javac] symbol : class DeInitResponse [javac] location: class org.globus.cog.abstraction.impl.execution.deef.ResourcePool [javac] DeInitResponse dr = gGP.deInit(deInit); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: cannot resolve symbol [javac] symbol : class UserJob [javac] location: class org.globus.cog.abstraction.impl.execution.deef.SubmissionThread [javac] job = new UserJob(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: cannot resolve symbol [javac] symbol : class Executable [javac] location: class org.globus.cog.abstraction.impl.execution.deef.SubmissionThread [javac] execs = new Executable[num_execs]; [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: cannot resolve symbol [javac] symbol : class GPPortType [javac] location: class org.globus.cog.abstraction.impl.execution.deef.SubmissionThread [javac] GPPortType gGP = rp.getNextResourcePort(); [javac] ^ [javac] /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: cannot resolve symbol [javac] symbol : class UserJobResponse [javac] location: class org.globus.cog.abstraction.impl.execution.deef.SubmissionThread [javac] UserJobResponse jobRP = gGP.userJob(job); [javac] ^ [javac] Note: /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java uses or overrides a deprecated API. [javac] Note: Recompile with -deprecation for details. [javac] 74 errors BUILD FAILED /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following error occurred while executing this line: /home/nefedova/cogl/mbuild.xml:463: The following error occurred while executing this line: /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler error output for details. Total time: 23 seconds Ioan Raicu wrote: > Right! 
>
> Ben Clifford wrote:
>> On Thu, 9 Aug 2007, Ioan Raicu wrote:
>>
>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in
>>> /home/nefedova/cogl/modules/provider-deef/lib/
>>
>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add
>> FalkonStubs.jar' before you commit.

From hategan at mcs.anl.gov Thu Aug 9 14:42:35 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 09 Aug 2007 14:42:35 -0500
Subject: [Swift-devel] Q about MolDyn
In-Reply-To: <46BB6CA5.8060006@cs.uchicago.edu>
References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu>
Message-ID: <1186688556.31075.0.camel@blabla.mcs.anl.gov>

That's because you didn't change the project.properties file. You need
to edit that and replace GenericPortal.jar with whatever the new thing
is.

On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote:
> Now I am trying to compile provider-deef, and it fails... it seems
> that it's not finding the stubs I just added... I changed the file name
> from GenericPortal.jar to FalkonStubs.jar. Do I need to do anything
> special, like rebuilding the classpath with the new jar name so the
> build can pick it up?
>
> Ioan
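The steps Ben and Mihael describe above amount to roughly the following shell session. This is only a sketch: the thread does not show which entry in provider-deef's project.properties names the stub jar, so it is located with grep rather than assumed, and the sed invocation assumes GNU sed (editing the file by hand works just as well).

  # swap the stub jar under svn in the provider-deef module
  cd /home/nefedova/cogl/modules/provider-deef/lib
  svn rm GenericPortal.jar
  svn add FalkonStubs.jar

  # find where the build references the old jar name
  cd /home/nefedova/cogl/modules/provider-deef
  grep -n "GenericPortal.jar" project.properties build.xml

  # point that reference at the new jar, then commit both changes
  sed -i 's/GenericPortal\.jar/FalkonStubs.jar/g' project.properties
  svn commit -m "replace GenericPortal.jar with FalkonStubs.jar"

  # rebuild and redeploy into the vdsk dist
  ant distclean
  ant -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist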
From iraicu at cs.uchicago.edu Thu Aug 9 14:46:29 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 09 Aug 2007 14:46:29 -0500
Subject: [Swift-devel] Q about MolDyn
In-Reply-To: <46BB6CA5.8060006@cs.uchicago.edu>
References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu>
Message-ID: <46BB6F15.5020308@cs.uchicago.edu>

Renaming the jar back to the old name, GenericPortal.jar, didn't have any
effect; the build still fails. I see that the classpath is not set; is it
being set at compile time?

Ioan Raicu wrote:
> Now I am trying to compile provider-deef, and it fails... it seems
> that it's not finding the stubs I just added... I changed the file name
> from GenericPortal.jar to FalkonStubs.jar. Do I need to do anything
> special, like rebuilding the classpath with the new jar name so the
> build can pick it up?
>
> Ioan

From hategan at mcs.anl.gov Thu Aug 9 14:52:39 2007
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 09 Aug 2007 14:52:39 -0500
Subject: [Swift-devel] Q about MolDyn
In-Reply-To: <46BB6F15.5020308@cs.uchicago.edu>
References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <46BB6F15.5020308@cs.uchicago.edu>
Message-ID: <1186689159.31536.0.camel@blabla.mcs.anl.gov>

Yeah, but now you changed the jar name in project.properties to
FalkonStubs.jar.

On Thu, 2007-08-09 at 14:46 -0500, Ioan Raicu wrote:
> Renaming the jar back to the old name, GenericPortal.jar, didn't have any
> effect; the build still fails. I see that the classpath is not set; is it
> being set at compile time?
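When a rename like this still produces "package ... does not exist" from javac, two quick checks usually narrow it down. The commands below are only a sketch; the jar and dist paths are taken from this thread, and the exact wording of ant's verbose output differs between versions, so the grep may need adjusting.

  # 1. confirm the renamed stub jar really contains the packages javac reports missing
  jar tf /home/nefedova/cogl/modules/provider-deef/lib/FalkonStubs.jar | grep GPService_instance

  # 2. rerun the compile verbosely and look for the classpath handed to javac;
  #    if the stub jar never appears there, the build is not picking it up from lib/,
  #    which points back at the project.properties entry discussed above
  cd /home/nefedova/cogl/modules/provider-deef
  ant -verbose -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist 2>&1 | grep -i classpath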
From iraicu at cs.uchicago.edu Thu Aug 9 14:52:22 2007
From: iraicu at cs.uchicago.edu (Ioan Raicu)
Date: Thu, 09 Aug 2007 14:52:22 -0500
Subject: [Swift-devel] Q about MolDyn
In-Reply-To: <1186688556.31075.0.camel@blabla.mcs.anl.gov>
References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov>
Message-ID: <46BB7076.3080502@cs.uchicago.edu>

OK, I updated project.properties to reflect the new name; still no luck,
same errors...

Mihael Hategan wrote:
> That's because you didn't change the project.properties file. You need
> to edit that and replace GenericPortal.jar with whatever the new thing
> is.
>
> On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote:
>> Now I am trying to compile provider-deef, and it fails... it seems
>> that it's not finding the stubs I just added... I changed the file name
>> from GenericPortal.jar to FalkonStubs.jar. Do I need to do anything
>> special, like rebuilding the classpath with the new jar name so the
>> build can pick it up?
>>
>> Ioan
>>
>> nefedova at viper:~/cogl/modules/provider-deef> ant distclean
>> Buildfile: build.xml
>>
>> distclean:
>> ....
>> BUILD SUCCESSFUL
>> Total time: 4 seconds
>> nefedova at viper:~/cogl/modules/provider-deef> ant
>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist
>> Buildfile: build.xml
>>
>> dist:
>> ...
>> delete.jar: >> [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) >> [delete] Deleting: >> /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar >> >> compile: >> [echo] [provider-deef]: COMPILE >> [mkdir] Created dir: /home/nefedova/cogl/modules/provider-deef/build >> [javac] Compiling 8 source files to >> /home/nefedova/cogl/modules/provider-deef/build >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import org.globus.GenericPortal.stubs.GPService_instance.*; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: >> package org.globus.GenericPortal.stubs.GPService_instance.service does >> not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >> >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: >> package org.globus.GenericPortal.stubs.Factory does not exist >> [javac] import org.globus.GenericPortal.stubs.Factory.CreateResource; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: >> package org.globus.GenericPortal.stubs.Factory does not exist >> [javac] import >> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: >> package org.globus.GenericPortal.stubs.Factory does not exist >> [javac] import org.globus.GenericPortal.stubs.Factory.FactoryPortType; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: >> package org.globus.GenericPortal.stubs.Factory.service does not exist >> [javac] import >> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:50: >> package org.globus.GenericPortal.common does not exist >> [javac] import org.globus.GenericPortal.common.Notification; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:19: >> package org.globus.GenericPortal.stubs.Factory does not exist >> [javac] import org.globus.GenericPortal.stubs.Factory.CreateResource; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:20: >> package org.globus.GenericPortal.stubs.Factory does not exist >> [javac] import >> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >> [javac] ^ >> [javac] >> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:21: >> package org.globus.GenericPortal.stubs.Factory does not exist >> [javac] import org.globus.GenericPortal.stubs.Factory.FactoryPortType; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:22: >> package org.globus.GenericPortal.stubs.Factory.service does not exist >> [javac] import >> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:24: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:25: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:26: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.InitResponse; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:27: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import org.globus.GenericPortal.stubs.GPService_instance.Init; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:28: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.DeInitResponse; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:29: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import org.globus.GenericPortal.stubs.GPService_instance.DeInit; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:30: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:31: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.Executable; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:32: >> package org.globus.GenericPortal.stubs.GPService_instance.service does >> not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >> >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:35: >> package org.globus.GenericPortal.common does not exist >> [javac] import 
org.globus.GenericPortal.common.Notification; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:48: >> cannot resolve symbol >> [javac] symbol : class FactoryPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] private FactoryPortType gpFactory = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:54: >> cannot resolve symbol >> [javac] symbol : class Notification >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] private Notification userNot = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:16: >> package org.globus.GenericPortal.common does not exist >> [javac] import org.globus.GenericPortal.common.Notification; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:22: >> cannot resolve symbol >> [javac] symbol : class Notification >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >> [javac] private Notification userNot = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:31: >> cannot resolve symbol >> [javac] symbol : class Notification >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >> [javac] public NotificationThread(Map tasks, Notification >> userNot, List completedQueue){ >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:16: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:17: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.Executable; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:18: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:19: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:26: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >> [javac] private Executable[] execs = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:27: >> cannot resolve symbol >> [javac] symbol : class UserJob >> 
[javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >> [javac] private UserJob job = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:97: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >> [javac] public void setStatus (Executable execs[]) { >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:16: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:17: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.Executable; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:18: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:19: >> package org.globus.GenericPortal.stubs.GPService_instance does not exist >> [javac] import >> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:59: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] private Executable[] execs = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:60: >> cannot resolve symbol >> [javac] symbol : class UserJob >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] private UserJob job = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:174: >> cannot resolve symbol >> [javac] symbol : class GPPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] public GPPortType getNextResourcePort() { >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:184: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] public void submit(Task task, Executable exec) { >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:210: >> cannot resolve symbol >> [javac] symbol : class Notification >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] public static String getMachNamePort(Notification userNot){ >> [javac] ^ >> [javac] >> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:229: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] public Executable removeFirstExec() throws >> NoSuchElementException { >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:63: >> cannot resolve symbol >> [javac] symbol : class GPPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] private GPPortType gGP = null; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:206: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] private Executable prepareSpecification(JobSpecification >> spec) >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:157: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] Executable job = prepareSpecification(spec); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] Executable exec = new Executable(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] Executable exec = new Executable(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >> cannot resolve symbol >> [javac] symbol : class DeInit >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] DeInit deInit = new DeInit(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >> cannot resolve symbol >> [javac] symbol : class DeInit >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] DeInit deInit = new DeInit(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:291: >> cannot resolve symbol >> [javac] symbol : class DeInitResponse >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >> [javac] DeInitResponse dr = this.gGP.deInit(deInit); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:69: >> cannot resolve symbol >> [javac] symbol : class Notification >> [javac] location: class >> 
org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] rp.userNot = new Notification(SO_TIMEOUT); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >> cannot resolve symbol >> [javac] symbol : class FactoryServiceAddressingLocator >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] FactoryServiceAddressingLocator factoryLocator = new >> FactoryServiceAddressingLocator(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >> cannot resolve symbol >> [javac] symbol : class FactoryServiceAddressingLocator >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] FactoryServiceAddressingLocator factoryLocator = new >> FactoryServiceAddressingLocator(); >> >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:130: >> cannot resolve symbol >> [javac] symbol : class CreateResourceResponse >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] CreateResourceResponse createResponse = gpFactory >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:131: >> cannot resolve symbol >> [javac] symbol : class CreateResource >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] .createResource(new CreateResource()); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >> cannot resolve symbol >> [javac] symbol : class GPServiceAddressingLocator >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] GPServiceAddressingLocator instanceLocator = new >> GPServiceAddressingLocator(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >> cannot resolve symbol >> [javac] symbol : class GPServiceAddressingLocator >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] GPServiceAddressingLocator instanceLocator = new >> GPServiceAddressingLocator(); >> >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: >> cannot resolve symbol >> [javac] symbol : class GPPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] GPPortType gGP = >> instanceLocator.getGPPortTypePort(instanceEPR); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >> cannot resolve symbol >> [javac] symbol : class Init >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] Init initMsg = new Init(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >> cannot resolve symbol >> [javac] symbol : class Init >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] Init initMsg = new Init(); >> [javac] ^ >> [javac] >> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: >> cannot resolve symbol >> [javac] symbol : class InitResponse >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] InitResponse ir = gGP.init(initMsg); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: >> cannot resolve symbol >> [javac] symbol : class GPPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] return (GPPortType) gptPool.get(next); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] return (Executable) execQueue.remove(0); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >> cannot resolve symbol >> [javac] symbol : class DeInit >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] DeInit deInit = new DeInit(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >> cannot resolve symbol >> [javac] symbol : class DeInit >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] DeInit deInit = new DeInit(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >> cannot resolve symbol >> [javac] symbol : class GPPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >> cannot resolve symbol >> [javac] symbol : class GPPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: >> cannot resolve symbol >> [javac] symbol : class DeInitResponse >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >> [javac] DeInitResponse dr = gGP.deInit(deInit); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: >> cannot resolve symbol >> [javac] symbol : class UserJob >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >> [javac] job = new UserJob(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: >> cannot resolve symbol >> [javac] symbol : class Executable >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >> [javac] execs = new Executable[num_execs]; >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: >> 
cannot resolve symbol >> [javac] symbol : class GPPortType >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >> [javac] GPPortType gGP = rp.getNextResourcePort(); >> [javac] ^ >> [javac] >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: >> cannot resolve symbol >> [javac] symbol : class UserJobResponse >> [javac] location: class >> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >> [javac] UserJobResponse jobRP = gGP.userJob(job); >> [javac] ^ >> [javac] Note: >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java >> uses or overrides a deprecated API. >> [javac] Note: Recompile with -deprecation for details. >> [javac] 74 errors >> >> BUILD FAILED >> /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following >> error occurred while executing this line: >> /home/nefedova/cogl/mbuild.xml:463: The following error occurred while >> executing this line: >> /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler >> error output for details. >> >> Total time: 23 seconds >> >> Ioan Raicu wrote: >> >>> Right! >>> >>> Ben Clifford wrote: >>> >>>> On Thu, 9 Aug 2007, Ioan Raicu wrote: >>>> >>>> >>>> >>>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in >>>>> /home/nefedova/cogl/modules/provider-deef/lib/ >>>>> >>>>> >>>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add >>>> FalkonStubs.jar' before you commit. >>>> >>>> >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > > From hategan at mcs.anl.gov Thu Aug 9 15:00:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 09 Aug 2007 15:00:19 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB7076.3080502@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> Message-ID: <1186689619.31721.1.camel@blabla.mcs.anl.gov> Looks like that jar file contains two other jar files instead of directories and class files. On Thu, 2007-08-09 at 14:52 -0500, Ioan Raicu wrote: > OK, updated this to reflect the new name, still no luck, same errors... > > Mihael Hategan wrote: > > That's because you didn't change the project.properties file. You need > > to edit that and replace GenericPortal.jar with whatever the new thing > > is. > > > > On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote: > > > >> Now i am trying to compile the provider-deef, and I fails... it seems > >> that its not finding the stubs I just added... I changed the file name > >> from GenericPortal.jar to FalkonStubs.jar. I need to do anything > >> special, like to rebuild the classpath with the new jar name so the > >> build can pick it up? 
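The symptom being discussed here, javac reporting the org.globus.GenericPortal packages as missing and then failing to resolve every stub class (GPPortType, Executable, and so on), is what a stub jar that never reaches javac's classpath looks like. A minimal sanity check before rebuilding, sketched in shell with the file names used in this thread (the exact property key inside project.properties is whatever the CoG module build defines, so the sketch greps for the jar names rather than assuming a key):

# Sketch only: confirm the renamed stub jar is present and referenced.
# FalkonStubs.jar and GenericPortal.jar are the names used in this thread.
cd /home/nefedova/cogl/modules/provider-deef
ls lib/ | grep -iE 'falkonstubs|genericportal'
grep -nE 'FalkonStubs|GenericPortal' project.properties
# A stale GenericPortal.jar reference here would explain the missing packages.

If the lib/ listing and project.properties disagree, editing project.properties as Mihael suggests and rerunning 'ant distclean' followed by the dist target is the next step.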
> >> 
> >> Ioan
> >> 
> >> nefedova at viper:~/cogl/modules/provider-deef> ant distclean
> >> Buildfile: build.xml
> >> 
> >> distclean:
> >> ....
> >> BUILD SUCCESSFUL
> >> Total time: 4 seconds
> >> nefedova at viper:~/cogl/modules/provider-deef> ant
> >> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist
> >> Buildfile: build.xml
> >> 
> >> dist:
> >> ...
> >> delete.jar:
> >> [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar)
> >> [delete] Deleting:
> >> /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar
> >> 
> >> compile:
> >> [echo] [provider-deef]: COMPILE
> >> [mkdir] Created dir: /home/nefedova/cogl/modules/provider-deef/build
> >> [javac] Compiling 8 source files to
> >> /home/nefedova/cogl/modules/provider-deef/build
> >> [...]
> >> [javac] Note:
> >> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java
> >> uses or overrides a deprecated API.
> >> [javac] Note: Recompile with -deprecation for details.
> >> [javac] 74 errors
> >> 
> >> BUILD FAILED
> >> /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following
> >> error occurred while executing this line:
> >> /home/nefedova/cogl/mbuild.xml:463: The following error occurred while
> >> executing this line:
> >> /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler
> >> error output for details.
> >> 
> >> Total time: 23 seconds
> >> 
> >> Ioan Raicu wrote:
> >> 
> >>> Right!
> >>> 
> >>> Ben Clifford wrote:
> >>> 
> >>>> On Thu, 9 Aug 2007, Ioan Raicu wrote:
> >>>> 
> >>>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in
> >>>>> /home/nefedova/cogl/modules/provider-deef/lib/
> >>>> 
> >>>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add
> >>>> FalkonStubs.jar' before you commit.
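Mihael's diagnosis at the top of this message, a stub jar whose entries are two further jar files rather than org/globus/... class files, would produce exactly these errors: javac only reads classes that sit directly inside a jar on the classpath and does not descend into nested jars. A rough way to check for this and to flatten the jar, sketched in shell (the repacking steps and the FalkonStubs-flat.jar name are illustrative assumptions, not commands taken from this thread):

# Sketch only: nested *.jar entries here, instead of org/globus/... classes,
# mean the stubs are invisible to javac.
jar tf /home/nefedova/cogl/modules/provider-deef/lib/FalkonStubs.jar

# If so, unpack the nested jars and repack their classes into one flat jar.
# FalkonStubs-flat.jar is a made-up name for illustration.
mkdir /tmp/stubs && cd /tmp/stubs
jar xf /home/nefedova/cogl/modules/provider-deef/lib/FalkonStubs.jar
for j in *.jar; do jar xf "$j" && rm "$j"; done
jar cf /home/nefedova/cogl/modules/provider-deef/lib/FalkonStubs-flat.jar org

Whichever name the repacked jar keeps, project.properties has to point at that same file, per the exchange above, before the provider-deef compile can see the stubs.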
> >>>> > >>>> > >>>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >>> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > >> > > > > > > > From iraicu at cs.uchicago.edu Thu Aug 9 15:01:12 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 15:01:12 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186689159.31536.0.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <46BB6F15.5020308@cs.uchicago.edu> <1186689159.31536.0.camel@blabla.mcs.anl.gov> Message-ID: <46BB7288.4040708@cs.uchicago.edu> no.. i tried it with the old name, without changing the properties, and with the new name and with changing the properties... I'll try using the old jar, just to see if it compiles! Ioan Mihael Hategan wrote: > Yeah, but now you changed the jar name in project.properties to > FalkonStubs.jar. > > On Thu, 2007-08-09 at 14:46 -0500, Ioan Raicu wrote: > >> Renaming the jar to the old name, GenericPortal.jar didn't have any >> effect, and the build still fails. I see that the classpath is not set, >> is this being set at compile time? >> >> Ioan Raicu wrote: >> >>> Now i am trying to compile the provider-deef, and I fails... it seems >>> that its not finding the stubs I just added... I changed the file name >>> from GenericPortal.jar to FalkonStubs.jar. I need to do anything >>> special, like to rebuild the classpath with the new jar name so the >>> build can pick it up? >>> >>> Ioan >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> > > > From iraicu at cs.uchicago.edu Thu Aug 9 15:01:17 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 15:01:17 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186689619.31721.1.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> Message-ID: <46BB728D.8060605@cs.uchicago.edu> Is that a problem? I can certainly expand the two jar files... but I thought this worked in Yong's install, where I got this new and updated jar! Mihael Hategan wrote: > Looks like that jar file contains two other jar files instead of > directories and class files. > > On Thu, 2007-08-09 at 14:52 -0500, Ioan Raicu wrote: > >> OK, updated this to reflect the new name, still no luck, same errors... >> >> Mihael Hategan wrote: >> >>> That's because you didn't change the project.properties file. 
You need >>> to edit that and replace GenericPortal.jar with whatever the new thing >>> is. >>> >>> On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote: >>> >>> >>>> Now i am trying to compile the provider-deef, and I fails... it seems >>>> that its not finding the stubs I just added... I changed the file name >>>> from GenericPortal.jar to FalkonStubs.jar. I need to do anything >>>> special, like to rebuild the classpath with the new jar name so the >>>> build can pick it up? >>>> >>>> Ioan >>>> >>>> nefedova at viper:~/cogl/modules/provider-deef> ant distclean >>>> Buildfile: build.xml >>>> >>>> distclean: >>>> .... >>>> BUILD SUCCESSFUL >>>> Total time: 4 seconds >>>> nefedova at viper:~/cogl/modules/provider-deef> ant >>>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist >>>> Buildfile: build.xml >>>> >>>> dist: >>>> ... >>>> delete.jar: >>>> [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) >>>> [delete] Deleting: >>>> /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar >>>> >>>> compile: >>>> [echo] [provider-deef]: COMPILE >>>> [mkdir] Created dir: /home/nefedova/cogl/modules/provider-deef/build >>>> [javac] Compiling 8 source files to >>>> /home/nefedova/cogl/modules/provider-deef/build >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import org.globus.GenericPortal.stubs.GPService_instance.*; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: >>>> package org.globus.GenericPortal.stubs.GPService_instance.service does >>>> not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >>>> >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import org.globus.GenericPortal.stubs.Factory.CreateResource; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import org.globus.GenericPortal.stubs.Factory.FactoryPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: >>>> package org.globus.GenericPortal.stubs.Factory.service does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >>>> [javac] ^ >>>> [javac] 
GPServiceAddressingLocator(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >>>> cannot resolve symbol >>>> [javac] symbol : class GPServiceAddressingLocator >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPServiceAddressingLocator instanceLocator = new >>>> GPServiceAddressingLocator(); >>>> >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPPortType gGP = >>>> instanceLocator.getGPPortTypePort(instanceEPR); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>> cannot resolve symbol >>>> [javac] symbol : class Init >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] Init initMsg = new Init(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>> cannot resolve symbol >>>> [javac] symbol : class Init >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] Init initMsg = new Init(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: >>>> cannot resolve symbol >>>> [javac] symbol : class InitResponse >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] InitResponse ir = gGP.init(initMsg); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] return (GPPortType) gptPool.get(next); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] return (Executable) execQueue.remove(0); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInit >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] DeInit deInit = new DeInit(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInit >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] DeInit deInit = new DeInit(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: 
class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInitResponse >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] DeInitResponse dr = gGP.deInit(deInit); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: >>>> cannot resolve symbol >>>> [javac] symbol : class UserJob >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] job = new UserJob(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] execs = new Executable[num_execs]; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] GPPortType gGP = rp.getNextResourcePort(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: >>>> cannot resolve symbol >>>> [javac] symbol : class UserJobResponse >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] UserJobResponse jobRP = gGP.userJob(job); >>>> [javac] ^ >>>> [javac] Note: >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java >>>> uses or overrides a deprecated API. >>>> [javac] Note: Recompile with -deprecation for details. >>>> [javac] 74 errors >>>> >>>> BUILD FAILED >>>> /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following >>>> error occurred while executing this line: >>>> /home/nefedova/cogl/mbuild.xml:463: The following error occurred while >>>> executing this line: >>>> /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler >>>> error output for details. >>>> >>>> Total time: 23 seconds >>>> >>>> Ioan Raicu wrote: >>>> >>>> >>>>> Right! >>>>> >>>>> Ben Clifford wrote: >>>>> >>>>> >>>>>> On Thu, 9 Aug 2007, Ioan Raicu wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in >>>>>>> /home/nefedova/cogl/modules/provider-deef/lib/ >>>>>>> >>>>>>> >>>>>>> >>>>>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add >>>>>> FalkonStubs.jar' before you commit. 
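For reference, the rename-and-commit sequence described above would look roughly like this (assuming the old stub jar was GenericPortal.jar, as it is named elsewhere in the thread, and that it sits in provider-deef/lib):

    cd ~/cogl/modules/provider-deef/lib
    svn rm GenericPortal.jar
    svn add FalkonStubs.jar
    svn commit -m "replace GenericPortal stubs jar with FalkonStubs.jar"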
>>>>>> >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>>> >>> >>> > > > From iraicu at cs.uchicago.edu Thu Aug 9 15:04:39 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 15:04:39 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186689619.31721.1.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> Message-ID: <46BB7357.7040301@cs.uchicago.edu> So the old jar builds successfully... I'll try to convert the new stubs to contain directories rather than jars... Mihael Hategan wrote: > Looks like that jar file contains two other jar files instead of > directories and class files. > > On Thu, 2007-08-09 at 14:52 -0500, Ioan Raicu wrote: > >> OK, updated this to reflect the new name, still no luck, same errors... >> >> Mihael Hategan wrote: >> >>> That's because you didn't change the project.properties file. You need >>> to edit that and replace GenericPortal.jar with whatever the new thing >>> is. >>> >>> On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote: >>> >>> >>>> Now i am trying to compile the provider-deef, and I fails... it seems >>>> that its not finding the stubs I just added... I changed the file name >>>> from GenericPortal.jar to FalkonStubs.jar. I need to do anything >>>> special, like to rebuild the classpath with the new jar name so the >>>> build can pick it up? >>>> >>>> Ioan >>>> >>>> nefedova at viper:~/cogl/modules/provider-deef> ant distclean >>>> Buildfile: build.xml >>>> >>>> distclean: >>>> .... >>>> BUILD SUCCESSFUL >>>> Total time: 4 seconds >>>> nefedova at viper:~/cogl/modules/provider-deef> ant >>>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist >>>> Buildfile: build.xml >>>> >>>> dist: >>>> ... 
>>>> delete.jar: >>>> [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) >>>> [delete] Deleting: >>>> /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar >>>> >>>> compile: >>>> [echo] [provider-deef]: COMPILE >>>> [mkdir] Created dir: /home/nefedova/cogl/modules/provider-deef/build >>>> [javac] Compiling 8 source files to >>>> /home/nefedova/cogl/modules/provider-deef/build >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import org.globus.GenericPortal.stubs.GPService_instance.*; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: >>>> package org.globus.GenericPortal.stubs.GPService_instance.service does >>>> not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >>>> >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import org.globus.GenericPortal.stubs.Factory.CreateResource; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import org.globus.GenericPortal.stubs.Factory.FactoryPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: >>>> package org.globus.GenericPortal.stubs.Factory.service does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:50: >>>> package org.globus.GenericPortal.common does not exist >>>> [javac] import org.globus.GenericPortal.common.Notification; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:19: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import org.globus.GenericPortal.stubs.Factory.CreateResource; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:20: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >>>> 
[javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:21: >>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>> [javac] import org.globus.GenericPortal.stubs.Factory.FactoryPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:22: >>>> package org.globus.GenericPortal.stubs.Factory.service does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:24: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:25: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:26: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.InitResponse; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:27: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import org.globus.GenericPortal.stubs.GPService_instance.Init; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:28: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.DeInitResponse; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:29: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import org.globus.GenericPortal.stubs.GPService_instance.DeInit; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:30: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:31: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:32: >>>> package org.globus.GenericPortal.stubs.GPService_instance.service does >>>> not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >>>> >>>> [javac] ^ >>>> [javac] >>>> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:35: >>>> package org.globus.GenericPortal.common does not exist >>>> [javac] import org.globus.GenericPortal.common.Notification; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:48: >>>> cannot resolve symbol >>>> [javac] symbol : class FactoryPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] private FactoryPortType gpFactory = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:54: >>>> cannot resolve symbol >>>> [javac] symbol : class Notification >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] private Notification userNot = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:16: >>>> package org.globus.GenericPortal.common does not exist >>>> [javac] import org.globus.GenericPortal.common.Notification; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:22: >>>> cannot resolve symbol >>>> [javac] symbol : class Notification >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >>>> [javac] private Notification userNot = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:31: >>>> cannot resolve symbol >>>> [javac] symbol : class Notification >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >>>> [javac] public NotificationThread(Map tasks, Notification >>>> userNot, List completedQueue){ >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:16: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:17: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:18: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:19: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:26: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: 
class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] private Executable[] execs = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:27: >>>> cannot resolve symbol >>>> [javac] symbol : class UserJob >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] private UserJob job = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:97: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] public void setStatus (Executable execs[]) { >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:16: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:17: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:18: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:19: >>>> package org.globus.GenericPortal.stubs.GPService_instance does not exist >>>> [javac] import >>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:59: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] private Executable[] execs = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:60: >>>> cannot resolve symbol >>>> [javac] symbol : class UserJob >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] private UserJob job = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:174: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] public GPPortType getNextResourcePort() { >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:184: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] public void submit(Task task, Executable exec) { >>>> [javac] 
^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:210: >>>> cannot resolve symbol >>>> [javac] symbol : class Notification >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] public static String getMachNamePort(Notification userNot){ >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:229: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] public Executable removeFirstExec() throws >>>> NoSuchElementException { >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:63: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] private GPPortType gGP = null; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:206: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] private Executable prepareSpecification(JobSpecification >>>> spec) >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:157: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] Executable job = prepareSpecification(spec); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] Executable exec = new Executable(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] Executable exec = new Executable(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInit >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] DeInit deInit = new DeInit(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInit >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] DeInit deInit = new DeInit(); >>>> [javac] ^ >>>> [javac] >>>> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:291: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInitResponse >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>> [javac] DeInitResponse dr = this.gGP.deInit(deInit); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:69: >>>> cannot resolve symbol >>>> [javac] symbol : class Notification >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] rp.userNot = new Notification(SO_TIMEOUT); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >>>> cannot resolve symbol >>>> [javac] symbol : class FactoryServiceAddressingLocator >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] FactoryServiceAddressingLocator factoryLocator = new >>>> FactoryServiceAddressingLocator(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >>>> cannot resolve symbol >>>> [javac] symbol : class FactoryServiceAddressingLocator >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] FactoryServiceAddressingLocator factoryLocator = new >>>> FactoryServiceAddressingLocator(); >>>> >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:130: >>>> cannot resolve symbol >>>> [javac] symbol : class CreateResourceResponse >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] CreateResourceResponse createResponse = gpFactory >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:131: >>>> cannot resolve symbol >>>> [javac] symbol : class CreateResource >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] .createResource(new CreateResource()); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >>>> cannot resolve symbol >>>> [javac] symbol : class GPServiceAddressingLocator >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPServiceAddressingLocator instanceLocator = new >>>> GPServiceAddressingLocator(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >>>> cannot resolve symbol >>>> [javac] symbol : class GPServiceAddressingLocator >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPServiceAddressingLocator instanceLocator = new >>>> GPServiceAddressingLocator(); >>>> >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPPortType gGP = >>>> 
instanceLocator.getGPPortTypePort(instanceEPR); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>> cannot resolve symbol >>>> [javac] symbol : class Init >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] Init initMsg = new Init(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>> cannot resolve symbol >>>> [javac] symbol : class Init >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] Init initMsg = new Init(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: >>>> cannot resolve symbol >>>> [javac] symbol : class InitResponse >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] InitResponse ir = gGP.init(initMsg); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] return (GPPortType) gptPool.get(next); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] return (Executable) execQueue.remove(0); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInit >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] DeInit deInit = new DeInit(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInit >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] DeInit deInit = new DeInit(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: >>>> cannot resolve symbol >>>> [javac] symbol : class DeInitResponse >>>> [javac] location: class >>>> 
org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>> [javac] DeInitResponse dr = gGP.deInit(deInit); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: >>>> cannot resolve symbol >>>> [javac] symbol : class UserJob >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] job = new UserJob(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: >>>> cannot resolve symbol >>>> [javac] symbol : class Executable >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] execs = new Executable[num_execs]; >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: >>>> cannot resolve symbol >>>> [javac] symbol : class GPPortType >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] GPPortType gGP = rp.getNextResourcePort(); >>>> [javac] ^ >>>> [javac] >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: >>>> cannot resolve symbol >>>> [javac] symbol : class UserJobResponse >>>> [javac] location: class >>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>> [javac] UserJobResponse jobRP = gGP.userJob(job); >>>> [javac] ^ >>>> [javac] Note: >>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java >>>> uses or overrides a deprecated API. >>>> [javac] Note: Recompile with -deprecation for details. >>>> [javac] 74 errors >>>> >>>> BUILD FAILED >>>> /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following >>>> error occurred while executing this line: >>>> /home/nefedova/cogl/mbuild.xml:463: The following error occurred while >>>> executing this line: >>>> /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler >>>> error output for details. >>>> >>>> Total time: 23 seconds >>>> >>>> Ioan Raicu wrote: >>>> >>>> >>>>> Right! >>>>> >>>>> Ben Clifford wrote: >>>>> >>>>> >>>>>> On Thu, 9 Aug 2007, Ioan Raicu wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in >>>>>>> /home/nefedova/cogl/modules/provider-deef/lib/ >>>>>>> >>>>>>> >>>>>>> >>>>>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add >>>>>> FalkonStubs.jar' before you commit. 
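A rough sketch of checking and flattening the stubs jar, assuming the nested layout described above (jar files inside FalkonStubs.jar instead of directories of class files) and the lib path shown in the build output; the inner jar names are not known here:

    cd ~/cogl/modules/provider-deef/lib
    jar tf FalkonStubs.jar            # a nested layout lists *.jar entries instead of .class files
    mkdir stubs-tmp && cd stubs-tmp
    jar xf ../FalkonStubs.jar         # unpack the outer jar
    for j in *.jar; do jar xf "$j" && rm "$j"; done   # unpack each inner jar, then drop it
    jar cf ../FalkonStubs.jar org     # repack as plain directories of .class files
    cd .. && rm -r stubs-tmp

project.properties would still need to reference FalkonStubs.jar rather than GenericPortal.jar, as noted earlier in the thread.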
>>>>>> >>>>>> >>>>>> >>>>>> >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>>> >>> >>> > > > From benc at hawaga.org.uk Thu Aug 9 15:19:01 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 9 Aug 2007 20:19:01 +0000 (GMT) Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB728D.8060605@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> Message-ID: I suspect the new and updated jar didn't work, but that you had old and not-updated jars around in your install too, because you weren't running distclean enough. On Thu, 9 Aug 2007, Ioan Raicu wrote: > Is that a problem? I can certainly expand the two jar files... but I thought > this worked in Yong's install, where I got this new and updated jar! > > Mihael Hategan wrote: > > Looks like that jar file contains two other jar files instead of > > directories and class files. > > > > On Thu, 2007-08-09 at 14:52 -0500, Ioan Raicu wrote: > > > > > OK, updated this to reflect the new name, still no luck, same errors... > > > > > > Mihael Hategan wrote: > > > > > > > That's because you didn't change the project.properties file. You need > > > > to edit that and replace GenericPortal.jar with whatever the new thing > > > > is. > > > > > > > > On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote: > > > > > > > > > Now i am trying to compile the provider-deef, and I fails... it seems > > > > > that its not finding the stubs I just added... I changed the file name > > > > > from GenericPortal.jar to FalkonStubs.jar. I need to do anything > > > > > special, like to rebuild the classpath with the new jar name so the > > > > > build can pick it up? > > > > > > > > > > Ioan > > > > > > > > > > nefedova at viper:~/cogl/modules/provider-deef> ant distclean > > > > > Buildfile: build.xml > > > > > > > > > > distclean: > > > > > .... > > > > > BUILD SUCCESSFUL > > > > > Total time: 4 seconds > > > > > nefedova at viper:~/cogl/modules/provider-deef> ant > > > > > -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist > > > > > Buildfile: build.xml > > > > > > > > > > dist: > > > > > ... 
> > > > > delete.jar: > > > > > [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) > > > > > [delete] Deleting: > > > > > /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar > > > > > > > > > > compile: > > > > > [echo] [provider-deef]: COMPILE > > > > > [mkdir] Created dir: > > > > > /home/nefedova/cogl/modules/provider-deef/build > > > > > [javac] Compiling 8 source files to > > > > > /home/nefedova/cogl/modules/provider-deef/build > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.*; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: > > > > > package org.globus.GenericPortal.stubs.GPService_instance.service does > > > > > not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; > > > > > [javac] > > > > > ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: > > > > > package org.globus.GenericPortal.stubs.Factory does not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.Factory.CreateResource; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: > > > > > package org.globus.GenericPortal.stubs.Factory does not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: > > > > > package org.globus.GenericPortal.stubs.Factory does not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.Factory.FactoryPortType; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: > > > > > package org.globus.GenericPortal.stubs.Factory.service does not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:50: > > > > > package org.globus.GenericPortal.common does not exist > > > > > [javac] import org.globus.GenericPortal.common.Notification; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:19: > > > > > package org.globus.GenericPortal.stubs.Factory does not exist > > > > > [javac] import > > > 
> > org.globus.GenericPortal.stubs.Factory.CreateResource; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:20: > > > > > package org.globus.GenericPortal.stubs.Factory does not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:21: > > > > > package org.globus.GenericPortal.stubs.Factory does not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.Factory.FactoryPortType; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:22: > > > > > package org.globus.GenericPortal.stubs.Factory.service does not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:24: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.UserJob; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:25: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:26: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.InitResponse; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:27: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.Init; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:28: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.DeInitResponse; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:29: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.DeInit; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:30: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > > > > > 
[javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:31: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.Executable; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:32: > > > > > package org.globus.GenericPortal.stubs.GPService_instance.service does > > > > > not exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; > > > > > [javac] > > > > > ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:35: > > > > > package org.globus.GenericPortal.common does not exist > > > > > [javac] import org.globus.GenericPortal.common.Notification; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:48: > > > > > cannot resolve symbol > > > > > [javac] symbol : class FactoryPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] private FactoryPortType gpFactory = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:54: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Notification > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] private Notification userNot = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:16: > > > > > package org.globus.GenericPortal.common does not exist > > > > > [javac] import org.globus.GenericPortal.common.Notification; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:22: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Notification > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.NotificationThread > > > > > [javac] private Notification userNot = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:31: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Notification > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.NotificationThread > > > > > [javac] public NotificationThread(Map tasks, Notification > > > > > userNot, List completedQueue){ > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:16: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:17: > 
> > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.Executable; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:18: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.UserJob; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:19: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:26: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > > > > > [javac] private Executable[] execs = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:27: > > > > > cannot resolve symbol > > > > > [javac] symbol : class UserJob > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > > > > > [javac] private UserJob job = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:97: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > > > > > [javac] public void setStatus (Executable execs[]) { > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:16: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:17: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.Executable; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:18: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.UserJob; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:19: > > > > > package org.globus.GenericPortal.stubs.GPService_instance does not > > > > > exist > > > > > [javac] import > > > > > org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; > > > > > [javac] ^ > > 
> > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:59: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] private Executable[] execs = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:60: > > > > > cannot resolve symbol > > > > > [javac] symbol : class UserJob > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] private UserJob job = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:174: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] public GPPortType getNextResourcePort() { > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:184: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] public void submit(Task task, Executable exec) { > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:210: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Notification > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] public static String getMachNamePort(Notification > > > > > userNot){ > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:229: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] public Executable removeFirstExec() throws > > > > > NoSuchElementException { > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:63: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] private GPPortType gGP = null; > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:206: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] private Executable > > > > > prepareSpecification(JobSpecification spec) > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:157: > > > > > cannot resolve symbol > > > > > [javac] symbol 
: class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] Executable job = prepareSpecification(spec); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] Executable exec = new Executable(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] Executable exec = new Executable(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: > > > > > cannot resolve symbol > > > > > [javac] symbol : class DeInit > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] DeInit deInit = new DeInit(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: > > > > > cannot resolve symbol > > > > > [javac] symbol : class DeInit > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] DeInit deInit = new DeInit(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:291: > > > > > cannot resolve symbol > > > > > [javac] symbol : class DeInitResponse > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > > > > > [javac] DeInitResponse dr = this.gGP.deInit(deInit); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:69: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Notification > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] rp.userNot = new Notification(SO_TIMEOUT); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: > > > > > cannot resolve symbol > > > > > [javac] symbol : class FactoryServiceAddressingLocator > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] FactoryServiceAddressingLocator factoryLocator = > > > > > new FactoryServiceAddressingLocator(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: > > > > > cannot resolve symbol > > > > > [javac] symbol : class FactoryServiceAddressingLocator > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > 
> > > > [javac] FactoryServiceAddressingLocator factoryLocator = > > > > > new FactoryServiceAddressingLocator(); > > > > > [javac] > > > > > ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:130: > > > > > cannot resolve symbol > > > > > [javac] symbol : class CreateResourceResponse > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] CreateResourceResponse createResponse = > > > > > gpFactory > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:131: > > > > > cannot resolve symbol > > > > > [javac] symbol : class CreateResource > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] .createResource(new CreateResource()); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPServiceAddressingLocator > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] GPServiceAddressingLocator instanceLocator = > > > > > new GPServiceAddressingLocator(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPServiceAddressingLocator > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] GPServiceAddressingLocator instanceLocator = > > > > > new GPServiceAddressingLocator(); > > > > > [javac] > > > > > ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] GPPortType gGP = > > > > > instanceLocator.getGPPortTypePort(instanceEPR); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Init > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] Init initMsg = new Init(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Init > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] Init initMsg = new Init(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: > > > > > cannot resolve symbol > > > > > [javac] symbol : class InitResponse > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] InitResponse ir = gGP.init(initMsg); > > > > > [javac] ^ > > > > > [javac] > > > 
> > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] return (GPPortType) gptPool.get(next); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] return (Executable) execQueue.remove(0); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: > > > > > cannot resolve symbol > > > > > [javac] symbol : class DeInit > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] DeInit deInit = new DeInit(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: > > > > > cannot resolve symbol > > > > > [javac] symbol : class DeInit > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] DeInit deInit = new DeInit(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] GPPortType gGP = (GPPortType)gptPool.get(i); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] GPPortType gGP = (GPPortType)gptPool.get(i); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: > > > > > cannot resolve symbol > > > > > [javac] symbol : class DeInitResponse > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.ResourcePool > > > > > [javac] DeInitResponse dr = gGP.deInit(deInit); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: > > > > > cannot resolve symbol > > > > > [javac] symbol : class UserJob > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > > > > > [javac] job = new UserJob(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: > > > > > cannot resolve symbol > > > > > [javac] symbol : class Executable > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > > > > > [javac] execs = new Executable[num_execs]; > > > > 
> [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: > > > > > cannot resolve symbol > > > > > [javac] symbol : class GPPortType > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > > > > > [javac] GPPortType gGP = rp.getNextResourcePort(); > > > > > [javac] ^ > > > > > [javac] > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: > > > > > cannot resolve symbol > > > > > [javac] symbol : class UserJobResponse > > > > > [javac] location: class > > > > > org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > > > > > [javac] UserJobResponse jobRP = gGP.userJob(job); > > > > > [javac] ^ > > > > > [javac] Note: > > > > > /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java > > > > > uses or overrides a deprecated API. > > > > > [javac] Note: Recompile with -deprecation for details. > > > > > [javac] 74 errors > > > > > > > > > > BUILD FAILED > > > > > /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following > > > > > error occurred while executing this line: > > > > > /home/nefedova/cogl/mbuild.xml:463: The following error occurred while > > > > > executing this line: > > > > > /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler > > > > > error output for details. > > > > > > > > > > Total time: 23 seconds > > > > > > > > > > Ioan Raicu wrote: > > > > > > > > > > > Right! > > > > > > > > > > > > Ben Clifford wrote: > > > > > > > > > > > > > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > > > > > > > > > > > > > > > > > > > > > Now, I renamed the jar to FalkonStubs.jar and it can be found in > > > > > > > > /home/nefedova/cogl/modules/provider-deef/lib/ > > > > > > > > > > > > > > > you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add > > > > > > > FalkonStubs.jar' before you commit. > > > > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > > Swift-devel mailing list > > > > > > Swift-devel at ci.uchicago.edu > > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > _______________________________________________ > > > > > Swift-devel mailing list > > > > > Swift-devel at ci.uchicago.edu > > > > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > > > > > > > > > > > > > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From iraicu at cs.uchicago.edu Thu Aug 9 15:21:52 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 15:21:52 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> Message-ID: <46BB7760.3020203@cs.uchicago.edu> I went through and cleaned them up (manually)... not through dist clean.... 
for now, I am trying as little as possible of the Swift install... so I don't break anything else... I am just about to try a new jar... Ben Clifford wrote: > I suspect the new and updated jar! didn't work but that you old old and > not updated jars around in your install too because you weren't running > distclean enough. > > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > > From hategan at mcs.anl.gov Thu Aug 9 15:29:25 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 09 Aug 2007 15:29:25 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB7760.3020203@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB7760.3020203@cs.uchicago.edu> Message-ID: <1186691365.524.2.camel@blabla.mcs.anl.gov> Careful not to end up with both new and old stuff in there. On Thu, 2007-08-09 at 15:21 -0500, Ioan Raicu wrote: > I went through and cleaned them up (manually)... not through dist > clean.... for now, I am trying as little as possible of the Swift > install... so I don't break anything else... > > I am just about to try a new jar... > > Ben Clifford wrote: > > I suspect the new and updated jar! didn't work but that you old old and > > not updated jars around in your install too because you weren't running > > distclean enough. > > > > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > > > > > > From iraicu at cs.uchicago.edu Thu Aug 9 15:28:38 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 15:28:38 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> Message-ID: <46BB78F6.2090309@cs.uchicago.edu> OK, it compiled succesfully after I reworked the jar to include directories and files, and not just tow other jars... now on to testing! Ben Clifford wrote: > I suspect the new and updated jar! didn't work but that you old old and > not updated jars around in your install too because you weren't running > distclean enough. > > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > >> Is that a problem? I can certainly expand the two jar files... but I thought >> this worked in Yong's install, where I got this new and updated jar! >> >> Mihael Hategan wrote: >> >>> Looks like that jar file contains two other jar files instead of >>> directories and class files. >>> >>> On Thu, 2007-08-09 at 14:52 -0500, Ioan Raicu wrote: >>> >>> >>>> OK, updated this to reflect the new name, still no luck, same errors... >>>> >>>> Mihael Hategan wrote: >>>> >>>> >>>>> That's because you didn't change the project.properties file. You need >>>>> to edit that and replace GenericPortal.jar with whatever the new thing >>>>> is. 
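[A quick way to check whether the renamed stubs jar that project.properties points at actually exposes the packages javac reports as missing below is to list its entries and see whether they are .class files or nested jar files. The sketch below is only illustrative, not part of the provider-deef build: the jar path and the package prefix are assumptions taken from the file names and error messages quoted in this thread.]

    import java.util.Enumeration;
    import java.util.jar.JarEntry;
    import java.util.jar.JarFile;

    // Minimal sketch: report whether a jar carries real class entries for the
    // Falkon stub packages, or only nested jar files (which javac cannot see).
    public class CheckStubsJar {
        public static void main(String[] args) throws Exception {
            // assumed location of the renamed stubs jar
            String path = args.length > 0 ? args[0] : "lib/FalkonStubs.jar";
            JarFile jar = new JarFile(path);
            boolean stubClasses = false;
            boolean nestedJars = false;
            for (Enumeration<JarEntry> e = jar.entries(); e.hasMoreElements();) {
                String name = e.nextElement().getName();
                if (name.startsWith("org/globus/GenericPortal/") && name.endsWith(".class")) {
                    stubClasses = true;
                }
                if (name.endsWith(".jar")) {
                    nestedJars = true;
                }
            }
            jar.close();
            System.out.println("stub .class entries found: " + stubClasses);
            System.out.println("nested .jar entries found: " + nestedJars);
        }
    }

[If the second flag is true and the first is false, the "package ... does not exist" errors quoted below are expected: the stub classes would have to be unpacked into the jar as directories and class files, or the nested jars put on the classpath individually, before javac can resolve them.]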
>>>>> >>>>> On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote: >>>>> >>>>> >>>>>> Now i am trying to compile the provider-deef, and I fails... it seems >>>>>> that its not finding the stubs I just added... I changed the file name >>>>>> from GenericPortal.jar to FalkonStubs.jar. I need to do anything >>>>>> special, like to rebuild the classpath with the new jar name so the >>>>>> build can pick it up? >>>>>> >>>>>> Ioan >>>>>> >>>>>> nefedova at viper:~/cogl/modules/provider-deef> ant distclean >>>>>> Buildfile: build.xml >>>>>> >>>>>> distclean: >>>>>> .... >>>>>> BUILD SUCCESSFUL >>>>>> Total time: 4 seconds >>>>>> nefedova at viper:~/cogl/modules/provider-deef> ant >>>>>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist >>>>>> Buildfile: build.xml >>>>>> >>>>>> dist: >>>>>> ... >>>>>> delete.jar: >>>>>> [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) >>>>>> [delete] Deleting: >>>>>> /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar >>>>>> >>>>>> compile: >>>>>> [echo] [provider-deef]: COMPILE >>>>>> [mkdir] Created dir: >>>>>> /home/nefedova/cogl/modules/provider-deef/build >>>>>> [javac] Compiling 8 source files to >>>>>> /home/nefedova/cogl/modules/provider-deef/build >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.*; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance.service does >>>>>> not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >>>>>> [javac] >>>>>> ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResource; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.Factory.FactoryPortType; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: >>>>>> package org.globus.GenericPortal.stubs.Factory.service does not exist >>>>>> [javac] import >>>>>> 
org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:50: >>>>>> package org.globus.GenericPortal.common does not exist >>>>>> [javac] import org.globus.GenericPortal.common.Notification; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:19: >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResource; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:20: >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:21: >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.Factory.FactoryPortType; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:22: >>>>>> package org.globus.GenericPortal.stubs.Factory.service does not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:24: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:25: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:26: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.InitResponse; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:27: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Init; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:28: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.DeInitResponse; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:29: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist 
>>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.DeInit; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:30: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:31: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:32: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance.service does >>>>>> not exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >>>>>> [javac] >>>>>> ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:35: >>>>>> package org.globus.GenericPortal.common does not exist >>>>>> [javac] import org.globus.GenericPortal.common.Notification; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:48: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class FactoryPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] private FactoryPortType gpFactory = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:54: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Notification >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] private Notification userNot = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:16: >>>>>> package org.globus.GenericPortal.common does not exist >>>>>> [javac] import org.globus.GenericPortal.common.Notification; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:22: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Notification >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >>>>>> [javac] private Notification userNot = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:31: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Notification >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >>>>>> [javac] public NotificationThread(Map tasks, Notification >>>>>> userNot, List completedQueue){ >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:16: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import 
>>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:17: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:18: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:19: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:26: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>> [javac] private Executable[] execs = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:27: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class UserJob >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>> [javac] private UserJob job = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:97: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>> [javac] public void setStatus (Executable execs[]) { >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:16: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:17: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:18: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:19: >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>> exist >>>>>> [javac] import >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; 
>>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:59: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] private Executable[] execs = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:60: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class UserJob >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] private UserJob job = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:174: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] public GPPortType getNextResourcePort() { >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:184: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] public void submit(Task task, Executable exec) { >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:210: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Notification >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] public static String getMachNamePort(Notification >>>>>> userNot){ >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:229: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] public Executable removeFirstExec() throws >>>>>> NoSuchElementException { >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:63: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] private GPPortType gGP = null; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:206: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] private Executable >>>>>> prepareSpecification(JobSpecification spec) >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:157: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] Executable job = prepareSpecification(spec); 
>>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] Executable exec = new Executable(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] Executable exec = new Executable(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class DeInit >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] DeInit deInit = new DeInit(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class DeInit >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] DeInit deInit = new DeInit(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:291: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class DeInitResponse >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>> [javac] DeInitResponse dr = this.gGP.deInit(deInit); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:69: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Notification >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] rp.userNot = new Notification(SO_TIMEOUT); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class FactoryServiceAddressingLocator >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] FactoryServiceAddressingLocator factoryLocator = >>>>>> new FactoryServiceAddressingLocator(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class FactoryServiceAddressingLocator >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] FactoryServiceAddressingLocator factoryLocator = >>>>>> new FactoryServiceAddressingLocator(); >>>>>> [javac] >>>>>> ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:130: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class CreateResourceResponse >>>>>> [javac] location: class >>>>>> 
org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] CreateResourceResponse createResponse = >>>>>> gpFactory >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:131: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class CreateResource >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] .createResource(new CreateResource()); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPServiceAddressingLocator >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] GPServiceAddressingLocator instanceLocator = >>>>>> new GPServiceAddressingLocator(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPServiceAddressingLocator >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] GPServiceAddressingLocator instanceLocator = >>>>>> new GPServiceAddressingLocator(); >>>>>> [javac] >>>>>> ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] GPPortType gGP = >>>>>> instanceLocator.getGPPortTypePort(instanceEPR); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Init >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] Init initMsg = new Init(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Init >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] Init initMsg = new Init(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class InitResponse >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] InitResponse ir = gGP.init(initMsg); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] return (GPPortType) gptPool.get(next); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> 
org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] return (Executable) execQueue.remove(0); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class DeInit >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] DeInit deInit = new DeInit(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class DeInit >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] DeInit deInit = new DeInit(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class DeInitResponse >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>> [javac] DeInitResponse dr = gGP.deInit(deInit); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class UserJob >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>> [javac] job = new UserJob(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class Executable >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>> [javac] execs = new Executable[num_execs]; >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class GPPortType >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>> [javac] GPPortType gGP = rp.getNextResourcePort(); >>>>>> [javac] ^ >>>>>> [javac] >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: >>>>>> cannot resolve symbol >>>>>> [javac] symbol : class UserJobResponse >>>>>> [javac] location: class >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>> [javac] UserJobResponse jobRP = gGP.userJob(job); >>>>>> [javac] ^ >>>>>> [javac] Note: 
>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java >>>>>> uses or overrides a deprecated API. >>>>>> [javac] Note: Recompile with -deprecation for details. >>>>>> [javac] 74 errors >>>>>> >>>>>> BUILD FAILED >>>>>> /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following >>>>>> error occurred while executing this line: >>>>>> /home/nefedova/cogl/mbuild.xml:463: The following error occurred while >>>>>> executing this line: >>>>>> /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler >>>>>> error output for details. >>>>>> >>>>>> Total time: 23 seconds >>>>>> >>>>>> Ioan Raicu wrote: >>>>>> >>>>>> >>>>>>> Right! >>>>>>> >>>>>>> Ben Clifford wrote: >>>>>>> >>>>>>> >>>>>>>> On Thu, 9 Aug 2007, Ioan Raicu wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in >>>>>>>>> /home/nefedova/cogl/modules/provider-deef/lib/ >>>>>>>>> >>>>>>>>> >>>>>>>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add >>>>>>>> FalkonStubs.jar' before you commit. >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> _______________________________________________ >>>>>>> Swift-devel mailing list >>>>>>> Swift-devel at ci.uchicago.edu >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>> >>>>>>> >>>>>>> >>>>>> _______________________________________________ >>>>>> Swift-devel mailing list >>>>>> Swift-devel at ci.uchicago.edu >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>> >>>>>> >>>>>> >>>>> >>>>> >>> >>> >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> >> >> > > From hategan at mcs.anl.gov Thu Aug 9 15:32:37 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 09 Aug 2007 15:32:37 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <46BB78F6.2090309@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> Message-ID: <1186691557.524.5.camel@blabla.mcs.anl.gov> You do have both FalkonStubs and GenericPortal jars in the dist. It's a guess as to what java will pick first. On Thu, 2007-08-09 at 15:28 -0500, Ioan Raicu wrote: > OK, it compiled succesfully after I reworked the jar to include > directories and files, and not just tow other jars... > now on to testing! > > Ben Clifford wrote: > > I suspect the new and updated jar! didn't work but that you old old and > > not updated jars around in your install too because you weren't running > > distclean enough. > > > > On Thu, 9 Aug 2007, Ioan Raicu wrote: > > > > > >> Is that a problem? I can certainly expand the two jar files... but I thought > >> this worked in Yong's install, where I got this new and updated jar! > >> > >> Mihael Hategan wrote: > >> > >>> Looks like that jar file contains two other jar files instead of > >>> directories and class files. 
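[Since both FalkonStubs.jar and the old GenericPortal.jar end up in the dist, one way to see where the duplicate copies live and which one the JVM actually picks is the sketch below. It is only illustrative; the class name is one of the stub classes from the compile errors and is assumed to be present in both jars.]

    import java.net.URL;
    import java.util.Enumeration;

    // Minimal sketch: list every classpath location that provides a given stub
    // class, then show which copy the JVM actually loaded it from.
    public class WhichJarWins {
        public static void main(String[] args) throws Exception {
            // assumed to exist in both the old and the renamed stubs jar
            String cls = "org.globus.GenericPortal.stubs.GPService_instance.GPPortType";
            String resource = cls.replace('.', '/') + ".class";
            ClassLoader cl = WhichJarWins.class.getClassLoader();
            for (Enumeration<URL> copies = cl.getResources(resource); copies.hasMoreElements();) {
                System.out.println("copy on classpath: " + copies.nextElement());
            }
            Class<?> c = Class.forName(cls);
            System.out.println("actually loaded from: "
                    + c.getProtectionDomain().getCodeSource().getLocation());
        }
    }

[Run with the same classpath the Swift install uses; if more than one copy shows up, removing the stale jar, or the distclean Ben suggests, takes the guesswork out of which one wins.]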
> >>> > >>> On Thu, 2007-08-09 at 14:52 -0500, Ioan Raicu wrote: > >>> > >>> > >>>> OK, updated this to reflect the new name, still no luck, same errors... > >>>> > >>>> Mihael Hategan wrote: > >>>> > >>>> > >>>>> That's because you didn't change the project.properties file. You need > >>>>> to edit that and replace GenericPortal.jar with whatever the new thing > >>>>> is. > >>>>> > >>>>> On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote: > >>>>> > >>>>> > >>>>>> Now i am trying to compile the provider-deef, and I fails... it seems > >>>>>> that its not finding the stubs I just added... I changed the file name > >>>>>> from GenericPortal.jar to FalkonStubs.jar. I need to do anything > >>>>>> special, like to rebuild the classpath with the new jar name so the > >>>>>> build can pick it up? > >>>>>> > >>>>>> Ioan > >>>>>> > >>>>>> nefedova at viper:~/cogl/modules/provider-deef> ant distclean > >>>>>> Buildfile: build.xml > >>>>>> > >>>>>> distclean: > >>>>>> .... > >>>>>> BUILD SUCCESSFUL > >>>>>> Total time: 4 seconds > >>>>>> nefedova at viper:~/cogl/modules/provider-deef> ant > >>>>>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist > >>>>>> Buildfile: build.xml > >>>>>> > >>>>>> dist: > >>>>>> ... > >>>>>> delete.jar: > >>>>>> [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) > >>>>>> [delete] Deleting: > >>>>>> /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar > >>>>>> > >>>>>> compile: > >>>>>> [echo] [provider-deef]: COMPILE > >>>>>> [mkdir] Created dir: > >>>>>> /home/nefedova/cogl/modules/provider-deef/build > >>>>>> [javac] Compiling 8 source files to > >>>>>> /home/nefedova/cogl/modules/provider-deef/build > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.*; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance.service does > >>>>>> not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; > >>>>>> [javac] > >>>>>> ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: > >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResource; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: > >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: > >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.FactoryPortType; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: > >>>>>> package org.globus.GenericPortal.stubs.Factory.service does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:50: > >>>>>> package org.globus.GenericPortal.common does not exist > >>>>>> [javac] import org.globus.GenericPortal.common.Notification; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:19: > >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResource; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:20: > >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:21: > >>>>>> package org.globus.GenericPortal.stubs.Factory does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.FactoryPortType; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:22: > >>>>>> package org.globus.GenericPortal.stubs.Factory.service does not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:24: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:25: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:26: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.InitResponse; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:27: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > 
>>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Init; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:28: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.DeInitResponse; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:29: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.DeInit; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:30: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:31: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:32: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance.service does > >>>>>> not exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; > >>>>>> [javac] > >>>>>> ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:35: > >>>>>> package org.globus.GenericPortal.common does not exist > >>>>>> [javac] import org.globus.GenericPortal.common.Notification; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:48: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class FactoryPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] private FactoryPortType gpFactory = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:54: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Notification > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] private Notification userNot = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:16: > >>>>>> package org.globus.GenericPortal.common does not exist > >>>>>> [javac] import org.globus.GenericPortal.common.Notification; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:22: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Notification > >>>>>> [javac] location: class > >>>>>> 
org.globus.cog.abstraction.impl.execution.deef.NotificationThread > >>>>>> [javac] private Notification userNot = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:31: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Notification > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.NotificationThread > >>>>>> [javac] public NotificationThread(Map tasks, Notification > >>>>>> userNot, List completedQueue){ > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:16: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:17: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:18: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:19: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:26: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > >>>>>> [javac] private Executable[] execs = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:27: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class UserJob > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > >>>>>> [javac] private UserJob job = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:97: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > >>>>>> [javac] public void setStatus (Executable execs[]) { > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:16: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; > >>>>>> [javac] ^ > >>>>>> [javac] > 
>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:17: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:18: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:19: > >>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not > >>>>>> exist > >>>>>> [javac] import > >>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:59: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] private Executable[] execs = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:60: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class UserJob > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] private UserJob job = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:174: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] public GPPortType getNextResourcePort() { > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:184: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] public void submit(Task task, Executable exec) { > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:210: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Notification > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] public static String getMachNamePort(Notification > >>>>>> userNot){ > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:229: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] public Executable removeFirstExec() throws > >>>>>> NoSuchElementException { > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:63: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] private GPPortType gGP = null; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:206: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] private Executable > >>>>>> prepareSpecification(JobSpecification spec) > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:157: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] Executable job = prepareSpecification(spec); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] Executable exec = new Executable(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] Executable exec = new Executable(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class DeInit > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] DeInit deInit = new DeInit(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class DeInit > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] DeInit deInit = new DeInit(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:291: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class DeInitResponse > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler > >>>>>> [javac] DeInitResponse dr = this.gGP.deInit(deInit); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:69: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Notification > >>>>>> [javac] 
location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] rp.userNot = new Notification(SO_TIMEOUT); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class FactoryServiceAddressingLocator > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] FactoryServiceAddressingLocator factoryLocator = > >>>>>> new FactoryServiceAddressingLocator(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class FactoryServiceAddressingLocator > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] FactoryServiceAddressingLocator factoryLocator = > >>>>>> new FactoryServiceAddressingLocator(); > >>>>>> [javac] > >>>>>> ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:130: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class CreateResourceResponse > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] CreateResourceResponse createResponse = > >>>>>> gpFactory > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:131: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class CreateResource > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] .createResource(new CreateResource()); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPServiceAddressingLocator > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] GPServiceAddressingLocator instanceLocator = > >>>>>> new GPServiceAddressingLocator(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPServiceAddressingLocator > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] GPServiceAddressingLocator instanceLocator = > >>>>>> new GPServiceAddressingLocator(); > >>>>>> [javac] > >>>>>> ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] GPPortType gGP = > >>>>>> instanceLocator.getGPPortTypePort(instanceEPR); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Init > >>>>>> [javac] location: class > >>>>>> 
org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] Init initMsg = new Init(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Init > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] Init initMsg = new Init(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class InitResponse > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] InitResponse ir = gGP.init(initMsg); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] return (GPPortType) gptPool.get(next); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] return (Executable) execQueue.remove(0); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class DeInit > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] DeInit deInit = new DeInit(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class DeInit > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] DeInit deInit = new DeInit(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class DeInitResponse > >>>>>> [javac] location: class > >>>>>> 
org.globus.cog.abstraction.impl.execution.deef.ResourcePool > >>>>>> [javac] DeInitResponse dr = gGP.deInit(deInit); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class UserJob > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > >>>>>> [javac] job = new UserJob(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class Executable > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > >>>>>> [javac] execs = new Executable[num_execs]; > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class GPPortType > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > >>>>>> [javac] GPPortType gGP = rp.getNextResourcePort(); > >>>>>> [javac] ^ > >>>>>> [javac] > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: > >>>>>> cannot resolve symbol > >>>>>> [javac] symbol : class UserJobResponse > >>>>>> [javac] location: class > >>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread > >>>>>> [javac] UserJobResponse jobRP = gGP.userJob(job); > >>>>>> [javac] ^ > >>>>>> [javac] Note: > >>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java > >>>>>> uses or overrides a deprecated API. > >>>>>> [javac] Note: Recompile with -deprecation for details. > >>>>>> [javac] 74 errors > >>>>>> > >>>>>> BUILD FAILED > >>>>>> /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following > >>>>>> error occurred while executing this line: > >>>>>> /home/nefedova/cogl/mbuild.xml:463: The following error occurred while > >>>>>> executing this line: > >>>>>> /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler > >>>>>> error output for details. > >>>>>> > >>>>>> Total time: 23 seconds > >>>>>> > >>>>>> Ioan Raicu wrote: > >>>>>> > >>>>>> > >>>>>>> Right! > >>>>>>> > >>>>>>> Ben Clifford wrote: > >>>>>>> > >>>>>>> > >>>>>>>> On Thu, 9 Aug 2007, Ioan Raicu wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in > >>>>>>>>> /home/nefedova/cogl/modules/provider-deef/lib/ > >>>>>>>>> > >>>>>>>>> > >>>>>>>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add > >>>>>>>> FalkonStubs.jar' before you commit. 
> >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> _______________________________________________ > >>>>>>> Swift-devel mailing list > >>>>>>> Swift-devel at ci.uchicago.edu > >>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> _______________________________________________ > >>>>>> Swift-devel mailing list > >>>>>> Swift-devel at ci.uchicago.edu > >>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>>>> > >>>>>> > >>>>>> > >>>>> > >>>>> > >>> > >>> > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > >> > >> > > > > > From iraicu at cs.uchicago.edu Thu Aug 9 15:39:25 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 09 Aug 2007 15:39:25 -0500 Subject: [Swift-devel] Q about MolDyn In-Reply-To: <1186691557.524.5.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46B7844F.6020801@cs.uchicago.edu> <6AF28AFA-F52E-4D0B-B236-BDDC967C258C@mcs.anl.gov> <46B78EEB.80800@cs.uchicago.edu> <1186680798.26452.3.camel@blabla.mcs.anl.gov> <46BB5692.90305@cs.uchicago.edu> <46BB62D9.9040409@cs.uchicago.edu> <46BB6AFD.4010004@cs.uchicago.edu> <46BB6CA5.8060006@cs.uchicago.edu> <1186688556.31075.0.camel@blabla.mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> Message-ID: <46BB7B7D.90602@cs.uchicago.edu> I removed all references of GenericPortal, and placed FalkonStubs in their place... there are in total 8 FalkonStubs.jar floating around, they are all the same... perhaps one of you, Mihael or Ben could clean this up, and make sure that there is only 1 FalkonStubs.jar! Mihael Hategan wrote: > You do have both FalkonStubs and GenericPortal jars in the dist. It's a > guess as to what java will pick first. > > On Thu, 2007-08-09 at 15:28 -0500, Ioan Raicu wrote: > >> OK, it compiled succesfully after I reworked the jar to include >> directories and files, and not just tow other jars... >> now on to testing! >> >> Ben Clifford wrote: >> >>> I suspect the new and updated jar! didn't work but that you old old and >>> not updated jars around in your install too because you weren't running >>> distclean enough. >>> >>> On Thu, 9 Aug 2007, Ioan Raicu wrote: >>> >>> >>> >>>> Is that a problem? I can certainly expand the two jar files... but I thought >>>> this worked in Yong's install, where I got this new and updated jar! >>>> >>>> Mihael Hategan wrote: >>>> >>>> >>>>> Looks like that jar file contains two other jar files instead of >>>>> directories and class files. >>>>> >>>>> On Thu, 2007-08-09 at 14:52 -0500, Ioan Raicu wrote: >>>>> >>>>> >>>>> >>>>>> OK, updated this to reflect the new name, still no luck, same errors... >>>>>> >>>>>> Mihael Hategan wrote: >>>>>> >>>>>> >>>>>> >>>>>>> That's because you didn't change the project.properties file. You need >>>>>>> to edit that and replace GenericPortal.jar with whatever the new thing >>>>>>> is. >>>>>>> >>>>>>> On Thu, 2007-08-09 at 14:36 -0500, Ioan Raicu wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>>> Now i am trying to compile the provider-deef, and I fails... it seems >>>>>>>> that its not finding the stubs I just added... I changed the file name >>>>>>>> from GenericPortal.jar to FalkonStubs.jar. 
I need to do anything >>>>>>>> special, like to rebuild the classpath with the new jar name so the >>>>>>>> build can pick it up? >>>>>>>> >>>>>>>> Ioan >>>>>>>> >>>>>>>> nefedova at viper:~/cogl/modules/provider-deef> ant distclean >>>>>>>> Buildfile: build.xml >>>>>>>> >>>>>>>> distclean: >>>>>>>> .... >>>>>>>> BUILD SUCCESSFUL >>>>>>>> Total time: 4 seconds >>>>>>>> nefedova at viper:~/cogl/modules/provider-deef> ant >>>>>>>> -Ddist.dir=/home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/ dist >>>>>>>> Buildfile: build.xml >>>>>>>> >>>>>>>> dist: >>>>>>>> ... >>>>>>>> delete.jar: >>>>>>>> [echo] [provider-deef]: DELETE.JAR (cog-provider-deef-1.0.jar) >>>>>>>> [delete] Deleting: >>>>>>>> /home/nefedova/cogl/modules/vdsk/dist/vdsk-0.2-dev/lib/cog-provider-deef-1.0.jar >>>>>>>> >>>>>>>> compile: >>>>>>>> [echo] [provider-deef]: COMPILE >>>>>>>> [mkdir] Created dir: >>>>>>>> /home/nefedova/cogl/modules/provider-deef/build >>>>>>>> [javac] Compiling 8 source files to >>>>>>>> /home/nefedova/cogl/modules/provider-deef/build >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:41: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.*; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:42: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:43: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance.service does >>>>>>>> not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >>>>>>>> [javac] >>>>>>>> ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:45: >>>>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.CreateResource; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:46: >>>>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:47: >>>>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.FactoryPortType; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:48: >>>>>>>> package org.globus.GenericPortal.stubs.Factory.service does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:50: >>>>>>>> package org.globus.GenericPortal.common does not exist >>>>>>>> [javac] import org.globus.GenericPortal.common.Notification; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:19: >>>>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.CreateResource; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:20: >>>>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.CreateResourceResponse; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:21: >>>>>>>> package org.globus.GenericPortal.stubs.Factory does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.FactoryPortType; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:22: >>>>>>>> package org.globus.GenericPortal.stubs.Factory.service does not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.Factory.service.FactoryServiceAddressingLocator; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:24: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:25: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:26: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.InitResponse; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:27: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.Init; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:28: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.DeInitResponse; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:29: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not 
>>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.DeInit; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:30: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:31: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:32: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance.service does >>>>>>>> not exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.service.GPServiceAddressingLocator; >>>>>>>> [javac] >>>>>>>> ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:35: >>>>>>>> package org.globus.GenericPortal.common does not exist >>>>>>>> [javac] import org.globus.GenericPortal.common.Notification; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:48: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class FactoryPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] private FactoryPortType gpFactory = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:54: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Notification >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] private Notification userNot = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:16: >>>>>>>> package org.globus.GenericPortal.common does not exist >>>>>>>> [javac] import org.globus.GenericPortal.common.Notification; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:22: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Notification >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >>>>>>>> [javac] private Notification userNot = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java:31: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Notification >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.NotificationThread >>>>>>>> [javac] public NotificationThread(Map tasks, Notification >>>>>>>> userNot, List completedQueue){ >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> 
/home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:16: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:17: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:18: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:19: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:26: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>>>> [javac] private Executable[] execs = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:27: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class UserJob >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>>>> [javac] private UserJob job = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:97: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>>>> [javac] public void setStatus (Executable execs[]) { >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:16: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.GPPortType; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:17: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.Executable; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:18: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> 
org.globus.GenericPortal.stubs.GPService_instance.UserJob; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java:19: >>>>>>>> package org.globus.GenericPortal.stubs.GPService_instance does not >>>>>>>> exist >>>>>>>> [javac] import >>>>>>>> org.globus.GenericPortal.stubs.GPService_instance.UserJobResponse; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:59: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] private Executable[] execs = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:60: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class UserJob >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] private UserJob job = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:174: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] public GPPortType getNextResourcePort() { >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:184: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] public void submit(Task task, Executable exec) { >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:210: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Notification >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] public static String getMachNamePort(Notification >>>>>>>> userNot){ >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:229: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] public Executable removeFirstExec() throws >>>>>>>> NoSuchElementException { >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:63: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] private GPPortType gGP = null; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:206: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> 
org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] private Executable >>>>>>>> prepareSpecification(JobSpecification spec) >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:157: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] Executable job = prepareSpecification(spec); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] Executable exec = new Executable(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:211: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] Executable exec = new Executable(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class DeInit >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] DeInit deInit = new DeInit(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:289: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class DeInit >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] DeInit deInit = new DeInit(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java:291: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class DeInitResponse >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.JobSubmissionTaskHandler >>>>>>>> [javac] DeInitResponse dr = this.gGP.deInit(deInit); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:69: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Notification >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] rp.userNot = new Notification(SO_TIMEOUT); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class FactoryServiceAddressingLocator >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] FactoryServiceAddressingLocator factoryLocator = >>>>>>>> new FactoryServiceAddressingLocator(); >>>>>>>> [javac] ^ >>>>>>>> [javac] 
>>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:110: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class FactoryServiceAddressingLocator >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] FactoryServiceAddressingLocator factoryLocator = >>>>>>>> new FactoryServiceAddressingLocator(); >>>>>>>> [javac] >>>>>>>> ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:130: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class CreateResourceResponse >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] CreateResourceResponse createResponse = >>>>>>>> gpFactory >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:131: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class CreateResource >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] .createResource(new CreateResource()); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPServiceAddressingLocator >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] GPServiceAddressingLocator instanceLocator = >>>>>>>> new GPServiceAddressingLocator(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:137: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPServiceAddressingLocator >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] GPServiceAddressingLocator instanceLocator = >>>>>>>> new GPServiceAddressingLocator(); >>>>>>>> [javac] >>>>>>>> ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:138: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] GPPortType gGP = >>>>>>>> instanceLocator.getGPPortTypePort(instanceEPR); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Init >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] Init initMsg = new Init(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:140: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Init >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] Init initMsg = new Init(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:142: >>>>>>>> cannot resolve 
symbol >>>>>>>> [javac] symbol : class InitResponse >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] InitResponse ir = gGP.init(initMsg); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:177: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] return (GPPortType) gptPool.get(next); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:230: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] return (Executable) execQueue.remove(0); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class DeInit >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] DeInit deInit = new DeInit(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:247: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class DeInit >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] DeInit deInit = new DeInit(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:252: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] GPPortType gGP = (GPPortType)gptPool.get(i); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java:254: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class DeInitResponse >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.ResourcePool >>>>>>>> [javac] DeInitResponse dr = gGP.deInit(deInit); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:44: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class UserJob >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>>>> [javac] job = new UserJob(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:67: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : 
class Executable >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>>>> [javac] execs = new Executable[num_execs]; >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:74: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class GPPortType >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>>>> [javac] GPPortType gGP = rp.getNextResourcePort(); >>>>>>>> [javac] ^ >>>>>>>> [javac] >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java:75: >>>>>>>> cannot resolve symbol >>>>>>>> [javac] symbol : class UserJobResponse >>>>>>>> [javac] location: class >>>>>>>> org.globus.cog.abstraction.impl.execution.deef.SubmissionThread >>>>>>>> [javac] UserJobResponse jobRP = gGP.userJob(job); >>>>>>>> [javac] ^ >>>>>>>> [javac] Note: >>>>>>>> /home/nefedova/cogl/modules/provider-deef/src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java >>>>>>>> uses or overrides a deprecated API. >>>>>>>> [javac] Note: Recompile with -deprecation for details. >>>>>>>> [javac] 74 errors >>>>>>>> >>>>>>>> BUILD FAILED >>>>>>>> /home/nefedova/cogl/modules/provider-deef/build.xml:56: The following >>>>>>>> error occurred while executing this line: >>>>>>>> /home/nefedova/cogl/mbuild.xml:463: The following error occurred while >>>>>>>> executing this line: >>>>>>>> /home/nefedova/cogl/mbuild.xml:227: Compile failed; see the compiler >>>>>>>> error output for details. >>>>>>>> >>>>>>>> Total time: 23 seconds >>>>>>>> >>>>>>>> Ioan Raicu wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> Right! >>>>>>>>> >>>>>>>>> Ben Clifford wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> On Thu, 9 Aug 2007, Ioan Raicu wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> Now, I renamed the jar to FalkonStubs.jar and it can be found in >>>>>>>>>>> /home/nefedova/cogl/modules/provider-deef/lib/ >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> you need to 'svn rm WhateverTheOldJarWas.jar' and 'svn add >>>>>>>>>> FalkonStubs.jar' before you commit. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> _______________________________________________ >>>>>>>>> Swift-devel mailing list >>>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>> _______________________________________________ >>>>>>>> Swift-devel mailing list >>>>>>>> Swift-devel at ci.uchicago.edu >>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> >>>>> >>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>>> >>>> >>>> >>> >>> > > > From iraicu at cs.uchicago.edu Sun Aug 12 00:22:04 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 12 Aug 2007 00:22:04 -0500 Subject: [Swift-devel] 244 MolDyn run was successful! 
In-Reply-To: <46BDB67D.2040207@cs.uchicago.edu>
References: <46AF37D9.7000301@mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> <46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu>
Message-ID: <46BE98FC.8040606@cs.uchicago.edu>

Hi,
Here is a quick recap of the 244-molecule MolDyn run we made this weekend.

I have posted the logs and graphs at:
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/

11079 tasks failed with an exit code of -1, and 2 failed with an exit code of 127. Inspecting the logs revealed the infamous stale NFS handle error! A single machine (192.5.198.37) had all of the failed tasks (11081); the machine was not completely broken, as it did complete 4 tasks successfully, although their completion times were considerably higher than on the other machines.

20836 tasks finished with an exit code of 0. I was expecting 20497 tasks, broken down as follows:

     1 x   1 =     1
     1 x 244 =   244
     1 x 244 =   244
    68 x 244 = 16592
     1 x 244 =   244
    11 x 244 =  2684
     1 x 244 =   244
     1 x 244 =   244
               -----
               20497

I do not know why there were 339 more tasks than we were expecting.

A close look at the summary graph (http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/summary_graph-med.jpg) shows that after the large number of failed tasks, the queue length (blue line) quickly went to 0 and then stayed there, as Swift trickled in only about 100 tasks at a time. For the rest of the experiment, only about 100 tasks were ever running at once. This is not the first time we have seen this, and it seems to show up only when there is a bad machine failing many tasks: Swift doesn't try to resubmit them quickly, and the jobs only trickle in thereafter, not keeping all the processors busy. When we had runs with no bad nodes and no large number of failures, this did not happen, and Swift essentially submitted all independent tasks to Falkon.

I know there is a heuristic within Karajan that is probably affecting the submit rate of tasks after the large number of failures, but I think it needs to be tuned to recover from a large number of failures so that, in time, it again attempts to send more. A good analogy is TCP: think of its window size growing larger and larger, then a large number of packets getting lost and TCP collapsing its window, but never recovering from that and remaining with a small window for the rest of the connection, regardless of the fact that it could keep increasing the window size until the next round of lost packets. I believe the normal behavior should allow Swift to recover and again submit many tasks to Falkon. If this heuristic cannot be easily tweaked or made to recover from the "window collapse", could we disable it when we are running on Falkon at a single site?
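To make the "window collapse" analogy concrete, here is a toy model of the behavior described above. It is not the actual Karajan scheduler code: the class name, the multiplicative penalty, the additive reward, and the score floor are all made up for illustration; only the job throttle of 10000 and the collapsed score of 0.01 are taken from Mihael's reply further down in this thread.

    // Toy model only: a TCP-like submit window that shrinks multiplicatively
    // on failures and recovers additively (slowly) on successes.
    public class SubmitWindowModel {
        static final int JOB_THROTTLE = 10000;      // job throttle quoted below
        static final double MIN_SCORE = 0.01;       // collapsed score seen in this run
        static final double FAIL_FACTOR = 0.999;    // made-up penalty per failed task
        static final double SUCCESS_BONUS = 1e-5;   // made-up reward per successful task

        static double score = 1.0;                  // starts healthy

        static void taskFailed()     { score = Math.max(MIN_SCORE, score * FAIL_FACTOR); }
        static void taskSucceeded()  { score = Math.min(1.0, score + SUCCESS_BONUS); }
        static int effectiveWindow() { return (int) (JOB_THROTTLE * score); }

        public static void main(String[] args) {
            for (int i = 0; i < 11081; i++) taskFailed();      // one bad node's failures
            System.out.println(effectiveWindow());             // ~100 tasks in flight

            for (int i = 0; i < 20000; i++) taskSucceeded();   // the rest of the run
            System.out.println(effectiveWindow());             // ~2100, still far below 10000
        }
    }

Under these made-up constants, the window collapses to about 100 tasks after the bad node's failures and claws its way back only slowly, which is qualitatively what the summary graph shows; a faster recovery rate, or a cap on how much a single host can hurt the score, would let the run refill the workers once the failures stop.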
http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ In this run, notice that there were no bad nodes that caused many tasks to fail, and Swift submitted many tasks to Falkon, and managed to keep all processors busy! I think we can call the 244-mol MolDyn run a success, both the current run and the previous run from 7-16-07 that almost finished! We need to figure out how to control the job throttling better, and perhaps on how to automatically detect this plaguing problem with "Stale NFS handle", and possibly contain the damage to significantly fewer task failures. I also think that increasing the # of retries from Swift's end should be considered when running over Falkon. Notice that a single worker can fail as many as 1000 tasks per minute, which are many tasks given that when the NFS stale handle shows up, its around for tens of seconds to minutes at a time. BTW, the run we just made consummed about 1556.9 CPU hours (937.7 used and 619.2 wasted) in 8.5 hours. In contrast, the run we made on 7-16-07 which almost finished, but behaved much better since there were no node failures, consumed about 866.4 CPU hours (866.3 used and 0.1 wasted) in 4.18 hours. When Nika comes back from vacation, we can try the real application, which should consume some 16K CPU hours (service units)! She also has her own temporary allocation at ANL/UC now, so we can use that! Ioan Ioan Raicu wrote: > I think the workflow finally completed successfully, but there are > still some oddities in the way the logs look (especially job > throttling, a few hundred more jobs than I was expecting, etc). At > least, we have all the output we needed for every molecule! > > I'll write up a summary of what happened, and draw up some nice > graphs, and send it out later today. > > Ioan > > iraicu at viper:/home/nefedova/alamines> ls fe_* | wc > 488 488 6832 > >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Sun Aug 12 11:39:43 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Sun, 12 Aug 2007 16:39:43 +0000 (GMT) Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46BE98FC.8040606@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> Message-ID: please make sure that all the code associated with this is in version control somewhere. -- From hategan at mcs.anl.gov Sun Aug 12 12:00:48 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 12 Aug 2007 12:00:48 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: <46BE98FC.8040606@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BB7076.3080502@cs.uchicago.edu> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> Message-ID: <1186938048.24879.8.camel@blabla.mcs.anl.gov> On Sun, 2007-08-12 at 00:22 -0500, Ioan Raicu wrote: > Hi, > Here is a quick recap of the 244 MolDyn run we made this weekend... > > I have posted the logs and graphs at: > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/ > > 11079 failed with a -1. > 2 failed with an exit code of 127. > > Inspecting the logs revealed the infamous stale NFS handle error! > > A single machine (192.5.198.37) had all the failed tasks (11081 > tasks); the machine was not completely broken, as it did complete 4 > tasks successfully, although the completion times were considerably > higher than the other machines. It seems a bit inefficient that 1/3 of the tasks would go to the one machine (out of a fairly large number) that consistently fails tasks. > > 20836 tasks finished with an exit code 0. > > I was expecting 20497 tasks broken down as follows: > > 1 > 1 > 1 > 1 > 244 > 244 > 1 > 244 > 244 > 68 > 244 > 16592 > 1 > 244 > 244 > 11 > 244 > 2684 > 1 > 244 > 244 > 1 > 244 > 244 > > > > > > > > > > > 20497 > > I do not know why there were 339 more tasks than we were expecting. > > A close look at the summary graph > (http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/summary_graph-med.jpg), we see that after the large number of failed tasks, the queue length (blue line) quickly went to 0, and then stayed there as Swift was trickling in only about 100 tasks at a time. > For the rest of the experiment, only about 100 tasks at a time were > ever running. This is not the first time we have seen this, and it > seems that is only showing up when there is a bad machine failing many > tasks, and essentially Swift doesn't try to resubmit them fast, and > the jobs only trickle in thereafter not keeping all the processors > busy. That's the job throttle set to 10000, multiplied by a score of 0.01 (after all those failures). > > When we had runs with no bad nodes and no large number of failures, > this did not happen, and Swift essentially submitted all independent > tasks to Falkon. I know there is a heuristic within Karajan that is > probably affecting the submit rate of tasks after the large number of > failures happened, but I think it needs to be tuned to recover from > large number of failures so in time, it again attempts to send more. It does. Unfortunately jobs keep failing. Set the aforementioned throttle higher until a better algorithm is stuck in the scheduler. That or stop sending jobs to a machine that keeps failing them. 
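For illustration, the ~100-job ceiling mentioned above is just the configured throttle scaled by the site score. A minimal Java sketch of that relationship, assuming the simple product described here (the class and method names are hypothetical, and the real scheduler applies its own rounding and bounds):

    public class ThrottleSketch {
        // Effective concurrency is roughly jobThrottle * siteScore, per the
        // explanation above; this helper is hypothetical, not Karajan code.
        static int allowedConcurrentJobs(double jobThrottle, double siteScore) {
            return (int) Math.max(1.0, Math.floor(jobThrottle * siteScore));
        }

        public static void main(String[] args) {
            // Throttle 10000 with the score collapsed to 0.01: about 100 jobs,
            // matching the flat line seen in this run.
            System.out.println(allowedConcurrentJobs(10000, 0.01));
            // A higher throttle (e.g. 25000) at the same collapsed score:
            // about 250 jobs, enough to keep 200+ workers busy.
            System.out.println(allowedConcurrentJobs(25000, 0.01));
        }
    }
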
> A good analogy is TCP, think of its window size increasing larger and > larger, but then a large number of packets get lost, and TCP collapses > its window size, but then never recovering from this and remaining > with a small window size for the rest of the connection, regardless of > the fact that it could again increase the window size until the next > round of lost packets... Your analogy is incorrect. In this case the score is kept low because jobs keep on failing, even after the throttling kicks in. Mihael > I believe the normal behavior should allow Swift to recover and again > submit many tasks to Falkon. If this heuristic cannot be easily > tweaked or made to recover from the "window collapse", could we > disable it when we are running on Falkon at a single site? > > BTW, here were the graphs from a previous run when only the last few > jobs didn't finish due to a bug in the application code. > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ > In this run, notice that there were no bad nodes that caused many > tasks to fail, and Swift submitted many tasks to Falkon, and managed > to keep all processors busy! > > I think we can call the 244-mol MolDyn run a success, both the current > run and the previous run from 7-16-07 that almost finished! > > We need to figure out how to control the job throttling better, and > perhaps on how to automatically detect this plaguing problem with > "Stale NFS handle", and possibly contain the damage to significantly > fewer task failures. I also think that increasing the # of retries > from Swift's end should be considered when running over Falkon. > Notice that a single worker can fail as many as 1000 tasks per minute, > which are many tasks given that when the NFS stale handle shows up, > its around for tens of seconds to minutes at a time. > > BTW, the run we just made consummed about 1556.9 CPU hours (937.7 used > and 619.2 wasted) in 8.5 hours. In contrast, the run we made on > 7-16-07 which almost finished, but behaved much better since there > were no node failures, consumed about 866.4 CPU hours (866.3 used and > 0.1 wasted) in 4.18 hours. > > When Nika comes back from vacation, we can try the real application, > which should consume some 16K CPU hours (service units)! She also > has her own temporary allocation at ANL/UC now, so we can use that! > > Ioan > > Ioan Raicu wrote: > > I think the workflow finally completed successfully, but there are > > still some oddities in the way the logs look (especially job > > throttling, a few hundred more jobs than I was expecting, etc). At > > least, we have all the output we needed for every molecule! > > > > I'll write up a summary of what happened, and draw up some nice > > graphs, and send it out later today. > > > > Ioan > > > > iraicu at viper:/home/nefedova/alamines> ls fe_* | wc > > 488 488 6832 > > From iraicu at cs.uchicago.edu Sun Aug 12 22:46:06 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 12 Aug 2007 22:46:06 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: <1186938048.24879.8.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> Message-ID: <46BFD3FE.1090205@cs.uchicago.edu> Mihael Hategan wrote: > On Sun, 2007-08-12 at 00:22 -0500, Ioan Raicu wrote: > >> Hi, >> Here is a quick recap of the 244 MolDyn run we made this weekend... >> >> I have posted the logs and graphs at: >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/ >> >> 11079 failed with a -1. >> 2 failed with an exit code of 127. >> >> Inspecting the logs revealed the infamous stale NFS handle error! >> >> A single machine (192.5.198.37) had all the failed tasks (11081 >> tasks); the machine was not completely broken, as it did complete 4 >> tasks successfully, although the completion times were considerably >> higher than the other machines. >> > > It seems a bit inefficient that 1/3 of the tasks would go to the one > machine (out of a fairly large number) that consistently fails tasks. > > Only one machine had problems with the GPFS mount. The errors we happening within the first 10 ms or so, and the communication overhead was around 20~30 ms, so we are talking about a bad machine that is failing tasks every 30~40 ms. while other machines that were operating normally had jobs lasting a few minutes. Now, the GPFS mount errors came in bursts of some tens of seconds to maybe a minute or two (several of these), in which it failed all the tasks in a few batches. >> >> 20836 tasks finished with an exit code 0. >> >> I was expecting 20497 tasks broken down as follows: >> >> 1 >> 1 >> 1 >> 1 >> 244 >> 244 >> 1 >> 244 >> 244 >> 68 >> 244 >> 16592 >> 1 >> 244 >> 244 >> 11 >> 244 >> 2684 >> 1 >> 244 >> 244 >> 1 >> 244 >> 244 >> >> >> >> >> >> >> >> >> >> >> 20497 >> >> I do not know why there were 339 more tasks than we were expecting. >> >> A close look at the summary graph >> (http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/summary_graph-med.jpg), we see that after the large number of failed tasks, the queue length (blue line) quickly went to 0, and then stayed there as Swift was trickling in only about 100 tasks at a time. >> For the rest of the experiment, only about 100 tasks at a time were >> ever running. This is not the first time we have seen this, and it >> seems that is only showing up when there is a bad machine failing many >> tasks, and essentially Swift doesn't try to resubmit them fast, and >> the jobs only trickle in thereafter not keeping all the processors >> busy. >> > > That's the job throttle set to 10000, multiplied by a score of 0.01 > (after all those failures). > OK, so should we set the job throttle higher, ideally to make sure that even in the worst case (such as the one we found), it still sends enough jobs to keep the processors busy? 
In our case, we should have set it to 25000 to get about 250 concurrent jobs. > >> When we had runs with no bad nodes and no large number of failures, >> this did not happen, and Swift essentially submitted all independent >> tasks to Falkon. I know there is a heuristic within Karajan that is >> probably affecting the submit rate of tasks after the large number of >> failures happened, but I think it needs to be tuned to recover from >> large number of failures so in time, it again attempts to send more. >> > > It does. Unfortunately jobs keep failing. I don't think that is the case... they failed in a few bunches over a relatively small amount of time... > Set the aforementioned > throttle higher until a better algorithm is stuck in the scheduler. That > or stop sending jobs to a machine that keeps failing them. > This is not hard to do in Falkon, look at the exit codes of the application and do some housekeeping around that, but its not all that clear that this kind of logic should be in Falkon. I am not sure how easy its going to be to discern between machine failures and other errors. I believe the reaction within Falkon should be different between a machine failure and other errors, so its important to discern between these. If Falkon is to take some action when a certain machine keeps failing jobs, what does everyone recommend? Should it blacklist the machine to never send jobs again to it, should it just suspend the machine job dispatch for some time, should it actually retry failed jobs on other nodes, etc... > >> A good analogy is TCP, think of its window size increasing larger and >> larger, but then a large number of packets get lost, and TCP collapses >> its window size, but then never recovering from this and remaining >> with a small window size for the rest of the connection, regardless of >> the fact that it could again increase the window size until the next >> round of lost packets... >> > > Your analogy is incorrect. In this case the score is kept low because > jobs keep on failing, even after the throttling kicks in. > I would argue against your theory.... the last job (#12794) failed at 3954 seconds into the experiment, yet the last job (#31917) was ended at 30600 seconds. There were no failed jobs in the last 26K+ seconds with 19K+ jobs. Now my question is again, why would the score not improve at all over this large period of time and jobs, as the throtling seems to be relatively constant throughout the experiment (after the failed jobs). Ioan > Mihael > > >> I believe the normal behavior should allow Swift to recover and again >> submit many tasks to Falkon. If this heuristic cannot be easily >> tweaked or made to recover from the "window collapse", could we >> disable it when we are running on Falkon at a single site? >> >> BTW, here were the graphs from a previous run when only the last few >> jobs didn't finish due to a bug in the application code. >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ >> In this run, notice that there were no bad nodes that caused many >> tasks to fail, and Swift submitted many tasks to Falkon, and managed >> to keep all processors busy! >> >> I think we can call the 244-mol MolDyn run a success, both the current >> run and the previous run from 7-16-07 that almost finished! 
>> >> We need to figure out how to control the job throttling better, and >> perhaps on how to automatically detect this plaguing problem with >> "Stale NFS handle", and possibly contain the damage to significantly >> fewer task failures. I also think that increasing the # of retries >> from Swift's end should be considered when running over Falkon. >> Notice that a single worker can fail as many as 1000 tasks per minute, >> which are many tasks given that when the NFS stale handle shows up, >> its around for tens of seconds to minutes at a time. >> >> BTW, the run we just made consummed about 1556.9 CPU hours (937.7 used >> and 619.2 wasted) in 8.5 hours. In contrast, the run we made on >> 7-16-07 which almost finished, but behaved much better since there >> were no node failures, consumed about 866.4 CPU hours (866.3 used and >> 0.1 wasted) in 4.18 hours. >> >> When Nika comes back from vacation, we can try the real application, >> which should consume some 16K CPU hours (service units)! She also >> has her own temporary allocation at ANL/UC now, so we can use that! >> >> Ioan >> >> Ioan Raicu wrote: >> >>> I think the workflow finally completed successfully, but there are >>> still some oddities in the way the logs look (especially job >>> throttling, a few hundred more jobs than I was expecting, etc). At >>> least, we have all the output we needed for every molecule! >>> >>> I'll write up a summary of what happened, and draw up some nice >>> graphs, and send it out later today. >>> >>> Ioan >>> >>> iraicu at viper:/home/nefedova/alamines> ls fe_* | wc >>> 488 488 6832 >>> >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sun Aug 12 23:21:31 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sun, 12 Aug 2007 23:21:31 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46BFD3FE.1090205@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> Message-ID: <1186978892.21992.12.camel@blabla.mcs.anl.gov> > > > > Your analogy is incorrect. In this case the score is kept low because > > jobs keep on failing, even after the throttling kicks in. > > > I would argue against your theory.... the last job (#12794) failed at > 3954 seconds into the experiment, yet the last job (#31917) was ended > at 30600 seconds. Strange. I've seen jobs failing all throughout the run. Did something make falkon stop sending jobs to that broken node? Statistically every job would have a 1/total_workers chance of going the the one place where it shouldn't. Higher if some "good" workers are busy doing actual stuff. > There were no failed jobs in the last 26K+ seconds with 19K+ jobs. > Now my question is again, why would the score not improve at all Quantify "at all". 
Point is it may take quite a few jobs to make up for ~10000 failed ones. The ratio is 1/5 (i.e. it takes 5 successful jobs to make up for a failed one). Should the probability of jobs failing be less than 1/5 in a certain time window, the score should increase. In the context in which jobs are sent to non-busy workers, the system would tend to produce lots of failed jobs if it takes little time (compared to the normal run-time of a job) for a bad worker to fail a job. This *IS* why the swift scheduler throttles in the beginning: to avoid sending a large number of jobs to a resource that is broken. Mihael > over this large period of time and jobs, as the throtling seems to be > relatively constant throughout the experiment (after the failed jobs). > > Ioan > > Mihael > > > > > > > I believe the normal behavior should allow Swift to recover and again > > > submit many tasks to Falkon. If this heuristic cannot be easily > > > tweaked or made to recover from the "window collapse", could we > > > disable it when we are running on Falkon at a single site? > > > > > > BTW, here were the graphs from a previous run when only the last few > > > jobs didn't finish due to a bug in the application code. > > > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ > > > In this run, notice that there were no bad nodes that caused many > > > tasks to fail, and Swift submitted many tasks to Falkon, and managed > > > to keep all processors busy! > > > > > > I think we can call the 244-mol MolDyn run a success, both the current > > > run and the previous run from 7-16-07 that almost finished! > > > > > > We need to figure out how to control the job throttling better, and > > > perhaps on how to automatically detect this plaguing problem with > > > "Stale NFS handle", and possibly contain the damage to significantly > > > fewer task failures. I also think that increasing the # of retries > > > from Swift's end should be considered when running over Falkon. > > > Notice that a single worker can fail as many as 1000 tasks per minute, > > > which are many tasks given that when the NFS stale handle shows up, > > > its around for tens of seconds to minutes at a time. > > > > > > BTW, the run we just made consummed about 1556.9 CPU hours (937.7 used > > > and 619.2 wasted) in 8.5 hours. In contrast, the run we made on > > > 7-16-07 which almost finished, but behaved much better since there > > > were no node failures, consumed about 866.4 CPU hours (866.3 used and > > > 0.1 wasted) in 4.18 hours. > > > > > > When Nika comes back from vacation, we can try the real application, > > > which should consume some 16K CPU hours (service units)! She also > > > has her own temporary allocation at ANL/UC now, so we can use that! > > > > > > Ioan > > > > > > Ioan Raicu wrote: > > > > > > > I think the workflow finally completed successfully, but there are > > > > still some oddities in the way the logs look (especially job > > > > throttling, a few hundred more jobs than I was expecting, etc). At > > > > least, we have all the output we needed for every molecule! > > > > > > > > I'll write up a summary of what happened, and draw up some nice > > > > graphs, and send it out later today. 
> > > > > > > > Ioan > > > > > > > > iraicu at viper:/home/nefedova/alamines> ls fe_* | wc > > > > 488 488 6832 > > > > > > > > > > > > > > From iraicu at cs.uchicago.edu Sun Aug 12 22:13:36 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Sun, 12 Aug 2007 22:13:36 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <1186689619.31721.1.camel@blabla.mcs.anl.gov> <46BB728D.8060605@cs.uchicago.edu> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> Message-ID: <46BFCC60.8080207@cs.uchicago.edu> I don't have my CI password yet, so I still cannot commit my changes. I'll stop by Greg's office tomorrow, maybe that will speed up the password reset request. Ioan Ben Clifford wrote: > please make sure that all the code associated with this is in version > control somewhere. > From iraicu at cs.uchicago.edu Mon Aug 13 15:17:06 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 13 Aug 2007 15:17:06 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <1186978892.21992.12.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> Message-ID: <46C0BC42.6050108@cs.uchicago.edu> Mihael Hategan wrote: >>> Your analogy is incorrect. In this case the score is kept low because >>> jobs keep on failing, even after the throttling kicks in. >>> >>> >> I would argue against your theory.... the last job (#12794) failed at >> 3954 seconds into the experiment, yet the last job (#31917) was ended >> at 30600 seconds. >> > > Strange. I've seen jobs failing all throughout the run. Did something > make falkon stop sending jobs to that broken node? At time xxx, the bad node was deregistered for failing to answer notifications. The graph below shows just the jobs for the bad node. So for the first hour or so of the experiment, there were 4 jobs that were successful (these are the faint black lines below that are horizontal, showing their long execution time), and the rest all failed (denoted by the small dots... showing their short execution time). Then the node de-registered, and did not come back for the rest of the experiment. > Statistically every > job would have a 1/total_workers chance of going the the one place where > it shouldn't. 
Higher if some "good" workers are busy doing actual stuff. > > >> There were no failed jobs in the last 26K+ seconds with 19K+ jobs. >> Now my question is again, why would the score not improve at all >> > > Quantify "at all". Look at http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/summary_graph-med.jpg! Do you see the # of active (green) workers as a relatively flat line at around 100 (and this is with the wait queue length being 0, so Swift was simply not sending enough work to keep Falkon's 200+ workers busy)? If the score would have improved, then I would have expected an upward trend on the number of active workers! > Point is it may take quite a few jobs to make up for > ~10000 failed ones. The ratio is 1/5 (i.e. it takes 5 successful jobs to > make up for a failed one). Should the probability of jobs failing be > less than 1/5 in a certain time window, the score should increase. > So you are saying that 19K+ successful jobs was not enough to counteract the 10K+ failed jobs from the early part of the experiment? Can this ratio (1:5) be changed? From this experiment, it would seem that the euristic is a slow learner... maybe you ahve ideas on how to make it more quick to adapt to changes? > In the context in which jobs are sent to non-busy workers, the system > would tend to produce lots of failed jobs if it takes little time > (compared to the normal run-time of a job) for a bad worker to fail a > job. This *IS* why the swift scheduler throttles in the beginning: to > avoid sending a large number of jobs to a resource that is broken. > But not the whole resource is broken... that is the whole point here... anyways, I think this is a valid case that we need to discuss how to handle, to make the entire Swift+Falkon more robust! BTW, here is another experiment with MolDyn that shows the throttling and this heuristic behaving as I would expected! http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed/summary_graph.jpg Notice the queue lenth (blue line) at around 11K seconds dropped sharply, but then grew back up. That sudden drop was many jobs failing fast on a bad node, and the sudden growth back up was Swift re-submitting almost the same # of jobs that failed back to Falkon. The same thing happened again at around 16K seconds. Now my question is, why did it work so nicely in this experiment, and not in our latest? Could it be that there were many succesful jobs done (10K+) at the time of the first failure? And the failure was short enough that it only produced maybe 1K failed jobs? If this is the reason, then one way to make the playing field more even, to handle both cases is to use a sliding window when training the heuristic, instead of the entire history. You can then adjust the window size to make the heuristic more responsive or more consistent! We should certainly talk around this issue what needs to be done, and who will do it! Ioan > Mihael > > >> over this large period of time and jobs, as the throtling seems to be >> relatively constant throughout the experiment (after the failed jobs). >> >> Ioan >> >>> Mihael >>> >>> >>> >>>> I believe the normal behavior should allow Swift to recover and again >>>> submit many tasks to Falkon. If this heuristic cannot be easily >>>> tweaked or made to recover from the "window collapse", could we >>>> disable it when we are running on Falkon at a single site? 
>>>> >>>> BTW, here were the graphs from a previous run when only the last few >>>> jobs didn't finish due to a bug in the application code. >>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ >>>> In this run, notice that there were no bad nodes that caused many >>>> tasks to fail, and Swift submitted many tasks to Falkon, and managed >>>> to keep all processors busy! >>>> >>>> I think we can call the 244-mol MolDyn run a success, both the current >>>> run and the previous run from 7-16-07 that almost finished! >>>> >>>> We need to figure out how to control the job throttling better, and >>>> perhaps on how to automatically detect this plaguing problem with >>>> "Stale NFS handle", and possibly contain the damage to significantly >>>> fewer task failures. I also think that increasing the # of retries >>>> from Swift's end should be considered when running over Falkon. >>>> Notice that a single worker can fail as many as 1000 tasks per minute, >>>> which are many tasks given that when the NFS stale handle shows up, >>>> its around for tens of seconds to minutes at a time. >>>> >>>> BTW, the run we just made consummed about 1556.9 CPU hours (937.7 used >>>> and 619.2 wasted) in 8.5 hours. In contrast, the run we made on >>>> 7-16-07 which almost finished, but behaved much better since there >>>> were no node failures, consumed about 866.4 CPU hours (866.3 used and >>>> 0.1 wasted) in 4.18 hours. >>>> >>>> When Nika comes back from vacation, we can try the real application, >>>> which should consume some 16K CPU hours (service units)! She also >>>> has her own temporary allocation at ANL/UC now, so we can use that! >>>> >>>> Ioan >>>> >>>> Ioan Raicu wrote: >>>> >>>> >>>>> I think the workflow finally completed successfully, but there are >>>>> still some oddities in the way the logs look (especially job >>>>> throttling, a few hundred more jobs than I was expecting, etc). At >>>>> least, we have all the output we needed for every molecule! >>>>> >>>>> I'll write up a summary of what happened, and draw up some nice >>>>> graphs, and send it out later today. >>>>> >>>>> Ioan >>>>> >>>>> iraicu at viper:/home/nefedova/alamines> ls fe_* | wc >>>>> 488 488 6832 >>>>> >>>>> >>>>> >>> >>> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... Name: moz-screenshot-1.jpg Type: image/jpeg Size: 24336 bytes Desc: not available URL: From hategan at mcs.anl.gov Mon Aug 13 15:47:11 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 13 Aug 2007 15:47:11 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: <46C0BC42.6050108@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BB78F6.2090309@cs.uchicago.edu> <1186691557.524.5.camel@blabla.mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> Message-ID: <1187038031.5916.23.camel@blabla.mcs.anl.gov> On Mon, 2007-08-13 at 15:17 -0500, Ioan Raicu wrote: > > > Mihael Hategan wrote: > > > > Your analogy is incorrect. In this case the score is kept low because > > > > jobs keep on failing, even after the throttling kicks in. > > > > > > > > > > > I would argue against your theory.... the last job (#12794) failed at > > > 3954 seconds into the experiment, yet the last job (#31917) was ended > > > at 30600 seconds. > > > > > > > Strange. I've seen jobs failing all throughout the run. Did something > > make falkon stop sending jobs to that broken node? > At time xxx, the bad node was deregistered for failing to answer > notifications. The graph below shows just the jobs for the bad node. > So for the first hour or so of the experiment, there were 4 jobs that > were successful (these are the faint black lines below that are > horizontal, showing their long execution time), and the rest all > failed (denoted by the small dots... showing their short execution > time). Then the node de-registered, and did not come back for the > rest of the experiment. > Ok. > > > Statistically every > > job would have a 1/total_workers chance of going the the one place where > > it shouldn't. Higher if some "good" workers are busy doing actual stuff. > > > > > > > There were no failed jobs in the last 26K+ seconds with 19K+ jobs. > > > Now my question is again, why would the score not improve at all > > > > > > > Quantify "at all". > Look at > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/summary_graph-med.jpg! > Do you see the # of active (green) workers as a relatively flat line > at around 100 (and this is with the wait queue length being 0, so > Swift was simply not sending enough work to keep Falkon's 200+ workers > busy)? If the score would have improved, then I would have expected > an upward trend on the number of active workers! small != not at all > > Point is it may take quite a few jobs to make up for > > ~10000 failed ones. The ratio is 1/5 (i.e. it takes 5 successful jobs to > > make up for a failed one). Should the probability of jobs failing be > > less than 1/5 in a certain time window, the score should increase. > > > So you are saying that 19K+ successful jobs was not enough to > counteract the 10K+ failed jobs from the early part of the > experiment? Yep. 19*1/5 = 3.8 < 10. > Can this ratio (1:5) be changed? Yes. The scheduler has two relevant properties: successFactor (currently 0.1) and failureFactor (currently -0.5). The term "factor" is not used formally, since these get added to the current score. 
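To make the 1:5 ratio concrete, here is a minimal sketch of that additive update (the class is hypothetical, and it ignores how the raw score is normalized into the throttling multiplier):

    public class SiteScoreSketch {
        static final double SUCCESS_FACTOR = 0.1;   // added to the score for each successful job
        static final double FAILURE_FACTOR = -0.5;  // added to the score for each failed job

        private double score = 0.0;

        void jobSucceeded() { score += SUCCESS_FACTOR; }
        void jobFailed()    { score += FAILURE_FACTOR; }

        public static void main(String[] args) {
            SiteScoreSketch s = new SiteScoreSketch();
            for (int i = 0; i < 11081; i++) s.jobFailed();     // the bad node's failures in this run
            for (int i = 0; i < 19000; i++) s.jobSucceeded();  // the later stretch of successes
            // 19000 * 0.1 = 1900 does not offset 11081 * 0.5 = 5540.5,
            // so the score stays strongly negative (about -3640.5).
            System.out.println(s.score);
        }
    }
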
> From this experiment, it would seem that the euristic is a slow > learner... maybe you ahve ideas on how to make it more quick to adapt > to changes? That could perhaps be done. > > In the context in which jobs are sent to non-busy workers, the system > > would tend to produce lots of failed jobs if it takes little time > > (compared to the normal run-time of a job) for a bad worker to fail a > > job. This *IS* why the swift scheduler throttles in the beginning: to > > avoid sending a large number of jobs to a resource that is broken. > > > But not the whole resource is broken... No, just slightly more than 1/3 of it. At least that's how it appears from the outside. > that is the whole point here... This point comes because you KNOW how things work internally. All Swift sees is 10K failed jobs out of 29K. > anyways, I think this is a valid case that we need to discuss how to > handle, to make the entire Swift+Falkon more robust! > > BTW, here is another experiment with MolDyn that shows the throttling > and this heuristic behaving as I would expected! > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed/summary_graph.jpg > > Notice the queue lenth (blue line) at around 11K seconds dropped > sharply, but then grew back up. That sudden drop was many jobs > failing fast on a bad node, and the sudden growth back up was Swift > re-submitting almost the same # of jobs that failed back to Falkon. That failing many jobs fast behavior is not right, regardless of whether Swift can deal with it or not. Frankly I'd rather Swift not be the part to deal with it because it has to resort to heuristics, whereas Falkon has direct knowledge of which nodes do what. Mihael > The same thing happened again at around 16K seconds. Now my > question is, why did it work so nicely in this experiment, and not in > our latest? Could it be that there were many succesful jobs done > (10K+) at the time of the first failure? And the failure was short > enough that it only produced maybe 1K failed jobs? If this is the > reason, then one way to make the playing field more even, to handle > both cases is to use a sliding window when training the heuristic, > instead of the entire history. You can then adjust the window size to > make the heuristic more responsive or more consistent! > > We should certainly talk around this issue what needs to be done, and > who will do it! > > Ioan > > Mihael > > > > > > > over this large period of time and jobs, as the throtling seems to be > > > relatively constant throughout the experiment (after the failed jobs). > > > > > > Ioan > > > > > > > Mihael > > > > > > > > > > > > > > > > > I believe the normal behavior should allow Swift to recover and again > > > > > submit many tasks to Falkon. If this heuristic cannot be easily > > > > > tweaked or made to recover from the "window collapse", could we > > > > > disable it when we are running on Falkon at a single site? > > > > > > > > > > BTW, here were the graphs from a previous run when only the last few > > > > > jobs didn't finish due to a bug in the application code. > > > > > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed-7-16-07/ > > > > > In this run, notice that there were no bad nodes that caused many > > > > > tasks to fail, and Swift submitted many tasks to Falkon, and managed > > > > > to keep all processors busy! > > > > > > > > > > I think we can call the 244-mol MolDyn run a success, both the current > > > > > run and the previous run from 7-16-07 that almost finished! 
> > > > > > > > > > We need to figure out how to control the job throttling better, and > > > > > perhaps on how to automatically detect this plaguing problem with > > > > > "Stale NFS handle", and possibly contain the damage to significantly > > > > > fewer task failures. I also think that increasing the # of retries > > > > > from Swift's end should be considered when running over Falkon. > > > > > Notice that a single worker can fail as many as 1000 tasks per minute, > > > > > which are many tasks given that when the NFS stale handle shows up, > > > > > its around for tens of seconds to minutes at a time. > > > > > > > > > > BTW, the run we just made consummed about 1556.9 CPU hours (937.7 used > > > > > and 619.2 wasted) in 8.5 hours. In contrast, the run we made on > > > > > 7-16-07 which almost finished, but behaved much better since there > > > > > were no node failures, consumed about 866.4 CPU hours (866.3 used and > > > > > 0.1 wasted) in 4.18 hours. > > > > > > > > > > When Nika comes back from vacation, we can try the real application, > > > > > which should consume some 16K CPU hours (service units)! She also > > > > > has her own temporary allocation at ANL/UC now, so we can use that! > > > > > > > > > > Ioan > > > > > > > > > > Ioan Raicu wrote: > > > > > > > > > > > > > > > > I think the workflow finally completed successfully, but there are > > > > > > still some oddities in the way the logs look (especially job > > > > > > throttling, a few hundred more jobs than I was expecting, etc). At > > > > > > least, we have all the output we needed for every molecule! > > > > > > > > > > > > I'll write up a summary of what happened, and draw up some nice > > > > > > graphs, and send it out later today. > > > > > > > > > > > > Ioan > > > > > > > > > > > > iraicu at viper:/home/nefedova/alamines> ls fe_* | wc > > > > > > 488 488 6832 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > From iraicu at cs.uchicago.edu Mon Aug 13 23:07:20 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 13 Aug 2007 23:07:20 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <1187038031.5916.23.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> Message-ID: <46C12A78.5000602@cs.uchicago.edu> Mihael Hategan wrote: > On Mon, 2007-08-13 at 15:17 -0500, Ioan Raicu wrote: > >> Mihael Hategan wrote: >> > >>> >> Look at >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/summary_graph-med.jpg! >> Do you see the # of active (green) workers as a relatively flat line >> at around 100 (and this is with the wait queue length being 0, so >> Swift was simply not sending enough work to keep Falkon's 200+ workers >> busy)? 
If the score would have improved, then I would have expected >> an upward trend on the number of active workers! >> > > small != not at all > Check out these two graphs, showing the # of active tasks within Falkon! Active tasks = queued+pending+active+done_and_not_delivered. http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks.jpg http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks-zoom.jpg Notice that after 3600 some seconds (after all the jobs that failed had failed), the # of active tasks in Falkon oscillates between 100 and 101 active tasks! The # presented on these graphs are taken from the median value per minute (the raw samples were 60 samples per minute). Notice that only at the very end of the experiment, at 30K+ seconds, the # of active tasks increases to a max of 109 for a brief period of time before it drops towards 0 as the workflow completes. I did notice that towards the end of the workflow, the jobs were typically shorter, and perhaps that somehow influenced the # of active tasks within Falkon... So, when I said not at all, I was refering to this flat line 100~101 active tasks that is shown in these figures! > >>> >> So you are saying that 19K+ successful jobs was not enough to >> counteract the 10K+ failed jobs from the early part of the >> experiment? >> > > Yep. 19*1/5 = 3.8 < 10. > > >> Can this ratio (1:5) be changed? >> > > Yes. The scheduler has two relevant properties: successFactor (currently > 0.1) and failureFactor (currently -0.5). The term "factor" is not used > formally, since these get added to the current score. > > >> From this experiment, it would seem that the euristic is a slow >> learner... maybe you ahve ideas on how to make it more quick to adapt >> to changes? >> > > That could perhaps be done. > > >>> In the context in which jobs are sent to non-busy workers, the system >>> would tend to produce lots of failed jobs if it takes little time >>> (compared to the normal run-time of a job) for a bad worker to fail a >>> job. This *IS* why the swift scheduler throttles in the beginning: to >>> avoid sending a large number of jobs to a resource that is broken. >>> >>> >> But not the whole resource is broken... >> > > No, just slightly more than 1/3 of it. At least that's how it appears > from the outside. > But a failed job should not be given the same weight as a succesful job, in my oppinion. For example, it seems to me that you are giving the failed jobs 5 times more weight than succesful jobs, but in reality it should be the other way around. Failed jobs usually will fail quickly (as in the case that we have in MolDyn), or they will fail slowly (within the lifetime of the resource allocation). On the other hand, most successful jobs will likely take more time to complete that it takes for a job to fail (if it fails quickly). Perhaps instead of > successFactor (currently > 0.1) and failureFactor (currently -0.5) it should be more like: successFactor: +1*(executionTime) failureFactor: -1*(failureTime) The 1 could of course be changed with some other weight to give preference to successful jobs, or to failed jobs. With this kind of strategy, the problems we are facing with throttling when there are large # of short failures wouldn't be happening! Do you see any drawbacks to this approach? > >> that is the whole point here... >> > > This point comes because you KNOW how things work internally. All Swift > sees is 10K failed jobs out of 29K. 
> > >> anyways, I think this is a valid case that we need to discuss how to >> handle, to make the entire Swift+Falkon more robust! >> >> BTW, here is another experiment with MolDyn that shows the throttling >> and this heuristic behaving as I would expected! >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed/summary_graph.jpg >> >> Notice the queue lenth (blue line) at around 11K seconds dropped >> sharply, but then grew back up. That sudden drop was many jobs >> failing fast on a bad node, and the sudden growth back up was Swift >> re-submitting almost the same # of jobs that failed back to Falkon. >> > > That failing many jobs fast behavior is not right, regardless of whether > Swift can deal with it or not. If its a machine error, then it would be best to not fail many jobs fast... however, if its an app error, you want to fail the tasks as fast as possible to fail the entire workflow faster, so the app can be fixed and the workflow retried! For example, say you had 1000 tasks (all independent), and had a wrong path set to the app... with the current Falkon behaviour, the entire workflow would likely fail within some 10~20 seconds of it submitting the first task! However, if Falkon does some "smart" throttling when it sees failures, its going to take time proportional to the failures to fail the workflow! Essentially, I am not a bit fan of throttling task dispatch due to failed executions, unless we know why these tasks failed! Exit codes are not usually enough in general, unless we define our own and the app and wrapper scripts generate these particular exit codes that Falkon can intercept and interpret reliably! > Frankly I'd rather Swift not be the part > to deal with it because it has to resort to heuristics, whereas Falkon > has direct knowledge of which nodes do what. > That's fine, but I don't think Falkon can do it alone, it needs context and failure definition, which I believe only the application and Swift could say for certain! Ioan -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Aug 13 23:31:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 13 Aug 2007 23:31:18 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46C12A78.5000602@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BB7B7D.90602@cs.uchicago.edu> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> Message-ID: <1187065878.4015.19.camel@blabla.mcs.anl.gov> On Mon, 2007-08-13 at 23:07 -0500, Ioan Raicu wrote: > > > > > > > small != not at all > > > Check out these two graphs, showing the # of active tasks within > Falkon! Active tasks = queued+pending+active+done_and_not_delivered. 
> > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks.jpg > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks-zoom.jpg > > Notice that after 3600 some seconds (after all the jobs that failed > had failed), the # of active tasks in Falkon oscillates between 100 > and 101 active tasks! The # presented on these graphs are taken from > the median value per minute (the raw samples were 60 samples per > minute). Notice that only at the very end of the experiment, at 30K+ > seconds, the # of active tasks increases to a max of 109 for a brief > period of time before it drops towards 0 as the workflow completes. I > did notice that towards the end of the workflow, the jobs were > typically shorter, and perhaps that somehow influenced the # of active > tasks within Falkon... So, when I said not at all, I was refering to > this flat line 100~101 active tasks that is shown in these figures! Then say "it appears (from x and y) that the number of concurrent jobs does not increase by an observable amount". This is not the same as "the score does not increase at all". > > > So you are saying that 19K+ successful jobs was not enough to > > > counteract the 10K+ failed jobs from the early part of the > > > experiment? > > > > > > > Yep. 19*1/5 = 3.8 < 10. > > > > > > > Can this ratio (1:5) be changed? > > > > > > > Yes. The scheduler has two relevant properties: successFactor (currently > > 0.1) and failureFactor (currently -0.5). The term "factor" is not used > > formally, since these get added to the current score. > > > > > > > From this experiment, it would seem that the euristic is a slow > > > learner... maybe you ahve ideas on how to make it more quick to adapt > > > to changes? > > > > > > > That could perhaps be done. > > > > > > > > In the context in which jobs are sent to non-busy workers, the system > > > > would tend to produce lots of failed jobs if it takes little time > > > > (compared to the normal run-time of a job) for a bad worker to fail a > > > > job. This *IS* why the swift scheduler throttles in the beginning: to > > > > avoid sending a large number of jobs to a resource that is broken. > > > > > > > > > > > But not the whole resource is broken... > > > > > > > No, just slightly more than 1/3 of it. At least that's how it appears > > from the outside. > > > But a failed job should not be given the same weight as a succesful > job, in my oppinion. Nope. I'd punish failures quite harshly. That's because the expected behavior is for things to work. I would not want a site that fails half the jobs to be anywhere near keeping a constant score. > For example, it seems to me that you are giving the failed jobs 5 > times more weight than succesful jobs, but in reality it should be the > other way around. Failed jobs usually will fail quickly (as in the > case that we have in MolDyn), or they will fail slowly (within the > lifetime of the resource allocation). On the other hand, most > successful jobs will likely take more time to complete that it takes > for a job to fail (if it fails quickly). Perhaps instead of > > successFactor (currently > > 0.1) and failureFactor (currently -0.5) > it should be more like: > successFactor: +1*(executionTime) > failureFactor: -1*(failureTime) That's a very good idea. Biasing score based on run-time (at least when known). Please note: you should still fix Falkon to not do that thing it's doing. 
> > The 1 could of course be changed with some other weight to give > preference to successful jobs, or to failed jobs. With this kind of > strategy, the problems we are facing with throttling when there are > large # of short failures wouldn't be happening! Do you see any > drawbacks to this approach? None that are obvious. It's in fact a good thing if the goal is performance, since it takes execution time into account. I've had manual "punishments" for connection time-outs because they take a long time to happen. But this time biasing naturally integrates that kind of stuff. So thanks. > > > that is the whole point here... > > > > > > > This point comes because you KNOW how things work internally. All Swift > > sees is 10K failed jobs out of 29K. > > > > > > > anyways, I think this is a valid case that we need to discuss how to > > > handle, to make the entire Swift+Falkon more robust! > > > > > > BTW, here is another experiment with MolDyn that shows the throttling > > > and this heuristic behaving as I would expected! > > > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed/summary_graph.jpg > > > > > > Notice the queue lenth (blue line) at around 11K seconds dropped > > > sharply, but then grew back up. That sudden drop was many jobs > > > failing fast on a bad node, and the sudden growth back up was Swift > > > re-submitting almost the same # of jobs that failed back to Falkon. > > > > > > > That failing many jobs fast behavior is not right, regardless of whether > > Swift can deal with it or not. > If its a machine error, then it would be best to not fail many jobs > fast... > however, if its an app error, you want to fail the tasks as fast as > possible to fail the entire workflow faster, But you can't distinguish between the two. The best you can do is assume that the failure is a linear combination between broken application and broken node. If it's broken node, rescheduling would do (which does not happen in your case: jobs keep being sent to the worker that is not busy, and that's the broken one). If it's a broken application, then the way to distinguish it from the other one is that after a bunch of retries on different nodes, it still fails. Notice that different nodes is essential here. > so the app can be fixed and the workflow retried! For example, say > you had 1000 tasks (all independent), and had a wrong path set to the > app... with the current Falkon behaviour, the entire workflow would > likely fail within some 10~20 seconds of it submitting the first task! > However, if Falkon does some "smart" throttling when it sees failures, > its going to take time proportional to the failures to fail the > workflow! You're missing the part where all nodes fail the jobs equally, thus not creating the inequality we're talking about (the ones where broken nodes get higher chances of getting more jobs). > Essentially, I am not a bit fan of throttling task dispatch due to > failed executions, unless we know why these tasks failed! Stop putting exclamation marks after every sentence. It diminishes the meaning of it! Well, you can't know why these tasks failed. That's the whole problem. You're dealing with incomplete information and you have to devise heuristics that get things done efficiently. > Exit codes are not usually enough in general, unless we define our > own and the app and wrapper scripts generate these particular exit > codes that Falkon can intercept and interpret reliably! 
That would be an improvement, but probably not a universally valid assumption. So I wouldn't design with only that in mind. > > Frankly I'd rather Swift not be the part > > to deal with it because it has to resort to heuristics, whereas Falkon > > has direct knowledge of which nodes do what. > > > That's fine, but I don't think Falkon can do it alone, it needs > context and failure definition, which I believe only the application > and Swift could say for certain! Nope, they can't. Swift does not meddle with semantics of applications. They're all equally valuable functions. Now, there's stuff you can do to improve things, I'm guessing. You can choose not to, and then we can keep having this discussion. There might be stuff Swift can do, but it's not insight into applications, so you'll have to ask for something else. Mihael > > Ioan > From iraicu at cs.uchicago.edu Mon Aug 13 23:52:24 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 13 Aug 2007 23:52:24 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <1187065878.4015.19.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> Message-ID: <46C13508.3070000@cs.uchicago.edu> Mihael Hategan wrote: > On Mon, 2007-08-13 at 23:07 -0500, Ioan Raicu wrote: > >>>> >>>> >>> small != not at all >>> >>> >> Check out these two graphs, showing the # of active tasks within >> Falkon! Active tasks = queued+pending+active+done_and_not_delivered. >> >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks.jpg >> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks-zoom.jpg >> >> Notice that after 3600 some seconds (after all the jobs that failed >> had failed), the # of active tasks in Falkon oscillates between 100 >> and 101 active tasks! The # presented on these graphs are taken from >> the median value per minute (the raw samples were 60 samples per >> minute). Notice that only at the very end of the experiment, at 30K+ >> seconds, the # of active tasks increases to a max of 109 for a brief >> period of time before it drops towards 0 as the workflow completes. I >> did notice that towards the end of the workflow, the jobs were >> typically shorter, and perhaps that somehow influenced the # of active >> tasks within Falkon... So, when I said not at all, I was refering to >> this flat line 100~101 active tasks that is shown in these figures! >> > > Then say "it appears (from x and y) that the number of concurrent jobs > does not increase by an observable amount". This is not the same as "the > score does not increase at all". > You are playing with words here... 
the bottom line is that after 19K+ jobs and several hours of successful jobs, there was no indication that the heuristic was adapting to the new conditions, in which no jobs were failing! > >>>> So you are saying that 19K+ successful jobs was not enough to >>>> counteract the 10K+ failed jobs from the early part of the >>>> experiment? >>>> >>>> >>> Yep. 19*1/5 = 3.8 < 10. >>> >>> >>> >>>> Can this ratio (1:5) be changed? >>>> >>>> >>> Yes. The scheduler has two relevant properties: successFactor (currently >>> 0.1) and failureFactor (currently -0.5). The term "factor" is not used >>> formally, since these get added to the current score. >>> >>> >>> >>>> From this experiment, it would seem that the euristic is a slow >>>> learner... maybe you ahve ideas on how to make it more quick to adapt >>>> to changes? >>>> >>>> >>> That could perhaps be done. >>> >>> >>> >>>>> In the context in which jobs are sent to non-busy workers, the system >>>>> would tend to produce lots of failed jobs if it takes little time >>>>> (compared to the normal run-time of a job) for a bad worker to fail a >>>>> job. This *IS* why the swift scheduler throttles in the beginning: to >>>>> avoid sending a large number of jobs to a resource that is broken. >>>>> >>>>> >>>>> >>>> But not the whole resource is broken... >>>> >>>> >>> No, just slightly more than 1/3 of it. At least that's how it appears >>> from the outside. >>> >>> >> But a failed job should not be given the same weight as a succesful >> job, in my oppinion. >> > > Nope. I'd punish failures quite harshly. That's because the expected > behavior is for things to work. I would not want a site that fails half > the jobs to be anywhere near keeping a constant score. > That is fine, but you have a case (such as this one) in which this is not ideal... how do you propose we adapt to cover this corner case? > >> For example, it seems to me that you are giving the failed jobs 5 >> times more weight than succesful jobs, but in reality it should be the >> other way around. Failed jobs usually will fail quickly (as in the >> case that we have in MolDyn), or they will fail slowly (within the >> lifetime of the resource allocation). On the other hand, most >> successful jobs will likely take more time to complete that it takes >> for a job to fail (if it fails quickly). Perhaps instead of >> >>> successFactor (currently >>> 0.1) and failureFactor (currently -0.5) >>> >> it should be more like: >> successFactor: +1*(executionTime) >> failureFactor: -1*(failureTime) >> > > That's a very good idea. Biasing score based on run-time (at least when > known). Please note: you should still fix Falkon to not do that thing > it's doing. > Its not clear to me this should be done all the time, Falkon needs to know why the failure happened to decide to throttle! > >> The 1 could of course be changed with some other weight to give >> preference to successful jobs, or to failed jobs. With this kind of >> strategy, the problems we are facing with throttling when there are >> large # of short failures wouldn't be happening! Do you see any >> drawbacks to this approach? >> > > None that are obvious. It's in fact a good thing if the goal is > performance, since it takes execution time into account. I've had manual > "punishments" for connection time-outs because they take a long time to > happen. But this time biasing naturally integrates that kind of stuff. > So thanks. > > >>>> that is the whole point here... >>>> >>>> >>> This point comes because you KNOW how things work internally. 
All Swift >>> sees is 10K failed jobs out of 29K. >>> >>> >>> >>>> anyways, I think this is a valid case that we need to discuss how to >>>> handle, to make the entire Swift+Falkon more robust! >>>> >>>> BTW, here is another experiment with MolDyn that shows the throttling >>>> and this heuristic behaving as I would expected! >>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed/summary_graph.jpg >>>> >>>> Notice the queue lenth (blue line) at around 11K seconds dropped >>>> sharply, but then grew back up. That sudden drop was many jobs >>>> failing fast on a bad node, and the sudden growth back up was Swift >>>> re-submitting almost the same # of jobs that failed back to Falkon. >>>> >>>> >>> That failing many jobs fast behavior is not right, regardless of whether >>> Swift can deal with it or not. >>> >> If its a machine error, then it would be best to not fail many jobs >> fast... >> however, if its an app error, you want to fail the tasks as fast as >> possible to fail the entire workflow faster, >> > > But you can't distinguish between the two. The best you can do is assume > that the failure is a linear combination between broken application and > broken node. If it's broken node, rescheduling would do (which does not > happen in your case: jobs keep being sent to the worker that is not > busy, and that's the broken one). If it's a broken application, then the > way to distinguish it from the other one is that after a bunch of > retries on different nodes, it still fails. Notice that different nodes > is essential here. > Right, I could try to keep track of statistics on each node, and when failures happen, try to determine if its a system wide failure (all nodes reporting errors), or are the faiures isolated on a single (or small set) node(s)... I'll have to think about how to do this efficiently! > >> so the app can be fixed and the workflow retried! For example, say >> you had 1000 tasks (all independent), and had a wrong path set to the >> app... with the current Falkon behaviour, the entire workflow would >> likely fail within some 10~20 seconds of it submitting the first task! >> However, if Falkon does some "smart" throttling when it sees failures, >> its going to take time proportional to the failures to fail the >> workflow! >> > > You're missing the part where all nodes fail the jobs equally, thus not > creating the inequality we're talking about (the ones where broken nodes > get higher chances of getting more jobs). > Right, maybe we can use this to distinguish between node failure and app failure! > >> Essentially, I am not a bit fan of throttling task dispatch due to >> failed executions, unless we know why these tasks failed! >> > > Stop putting exclamation marks after every sentence. It diminishes the > meaning of it! > So you are going from playing with words to picking on my exclamation! :) > Well, you can't know why these tasks failed. That's the whole problem. > You're dealing with incomplete information and you have to devise > heuristics that get things done efficiently. > But Swift might know why it failed, it has a bunch of STDOUT/STDERR that it always captures! Falkon might capture the same output, but its optional ;( Could these outputs not be parsed for certain well know errors, and have different exit codes to mean different kinds of errors? 
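As an illustration of the exit-code idea in the previous paragraph, here is a minimal sketch (the code ranges and names are purely hypothetical; neither Falkon nor the Swift wrapper defines such a convention today) of how a dispatcher could map agreed-upon exit codes to failure categories:

    // Hypothetical exit-code convention; the ranges below are illustrative only.
    public final class ExitCodes {
        public enum FailureKind { NONE, APPLICATION, NODE, UNKNOWN }

        public static FailureKind classify(int exitCode) {
            if (exitCode == 0) return FailureKind.NONE;
            if (exitCode >= 64 && exitCode <= 79) return FailureKind.APPLICATION; // e.g. bad path, bad arguments
            if (exitCode >= 80 && exitCode <= 95) return FailureKind.NODE;        // e.g. stale NFS handle, full scratch disk
            return FailureKind.UNKNOWN;                                           // fall back to the usual retry heuristics
        }
    }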
> >> Exit codes are not usually enough in general, unless we define our >> own and the app and wrapper scripts generate these particular exit >> codes that Falkon can intercept and interpret reliably! >> > > That would be an improvement, but probably not a universally valid > assumption. So I wouldn't design with only that in mind. > But it would be an improvement over what we currently have... > >>> Frankly I'd rather Swift not be the part >>> to deal with it because it has to resort to heuristics, whereas Falkon >>> has direct knowledge of which nodes do what. >>> >>> >> That's fine, but I don't think Falkon can do it alone, it needs >> context and failure definition, which I believe only the application >> and Swift could say for certain! >> > > Nope, they can't. Swift does not meddle with semantics of applications. > They're all equally valuable functions. > > Now, there's stuff you can do to improve things, I'm guessing. You can > choose not to, and then we can keep having this discussion. There might > be stuff Swift can do, but it's not insight into applications, so you'll > have to ask for something else. > Any suggestions? Ioan > Mihael > > >> Ioan >> >> > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Aug 14 00:26:15 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 14 Aug 2007 00:26:15 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46C13508.3070000@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchicago.edu> Message-ID: <1187069175.5653.30.camel@blabla.mcs.anl.gov> On Mon, 2007-08-13 at 23:52 -0500, Ioan Raicu wrote: > > > Mihael Hategan wrote: > > On Mon, 2007-08-13 at 23:07 -0500, Ioan Raicu wrote: > > > > > > small != not at all > > > > > > > > > > > Check out these two graphs, showing the # of active tasks within > > > Falkon! Active tasks = queued+pending+active+done_and_not_delivered. > > > > > > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks.jpg > > > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks-zoom.jpg > > > > > > Notice that after 3600 some seconds (after all the jobs that failed > > > had failed), the # of active tasks in Falkon oscillates between 100 > > > and 101 active tasks! The # presented on these graphs are taken from > > > the median value per minute (the raw samples were 60 samples per > > > minute). Notice that only at the very end of the experiment, at 30K+ > > > seconds, the # of active tasks increases to a max of 109 for a brief > > > period of time before it drops towards 0 as the workflow completes. 
I > > > did notice that towards the end of the workflow, the jobs were > > > typically shorter, and perhaps that somehow influenced the # of active > > > tasks within Falkon... So, when I said not at all, I was refering to > > > this flat line 100~101 active tasks that is shown in these figures! > > > > > > > Then say "it appears (from x and y) that the number of concurrent jobs > > does not increase by an observable amount". This is not the same as "the > > score does not increase at all". > > > You are playing with words here... I don't think so. All I'm saying is that there is a distinction between things not happening and you not observing things happening. And you should not claim things are not happening because you don't see enough things happening. You are making inferences about things based on certain observations. The observations are correct. The inferences are wrong. > the bottom line is that after 19K+ jobs and several hours of > successful jobs, there was no indication that the heuristic was > adapting to the new conditions, in which no jobs were failing! > > > > > So you are saying that 19K+ successful jobs was not enough to > > > > > counteract the 10K+ failed jobs from the early part of the > > > > > experiment? > > > > > > > > > > > > > > Yep. 19*1/5 = 3.8 < 10. > > > > > > > > > > > > > > > > > Can this ratio (1:5) be changed? > > > > > > > > > > > > > > Yes. The scheduler has two relevant properties: successFactor (currently > > > > 0.1) and failureFactor (currently -0.5). The term "factor" is not used > > > > formally, since these get added to the current score. > > > > > > > > > > > > > > > > > From this experiment, it would seem that the euristic is a slow > > > > > learner... maybe you ahve ideas on how to make it more quick to adapt > > > > > to changes? > > > > > > > > > > > > > > That could perhaps be done. > > > > > > > > > > > > > > > > > > In the context in which jobs are sent to non-busy workers, the system > > > > > > would tend to produce lots of failed jobs if it takes little time > > > > > > (compared to the normal run-time of a job) for a bad worker to fail a > > > > > > job. This *IS* why the swift scheduler throttles in the beginning: to > > > > > > avoid sending a large number of jobs to a resource that is broken. > > > > > > > > > > > > > > > > > > > > > > > But not the whole resource is broken... > > > > > > > > > > > > > > No, just slightly more than 1/3 of it. At least that's how it appears > > > > from the outside. > > > > > > > > > > > But a failed job should not be given the same weight as a succesful > > > job, in my oppinion. > > > > > > > Nope. I'd punish failures quite harshly. That's because the expected > > behavior is for things to work. I would not want a site that fails half > > the jobs to be anywhere near keeping a constant score. > > > That is fine, but you have a case (such as this one) in which this is > not ideal... how do you propose we adapt to cover this corner case? Falkon not sending 1/3 of the jobs to one particularly bad node. That or Falkon: 1. exposing information about what nodes the jobs run on and 2. allowing control over what nodes future jobs will go to. For efficiency purposes, I'd prefer the former. That's because scheduling is an expensive thing and making it hierarchical and distributed may improve overall performance (although this is debatable). 
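The two capabilities listed above could be expressed as a small provider-side interface; the sketch below uses hypothetical names and is not an existing Falkon or provider-deef API:

    // Hypothetical interface; Falkon does not expose this today.
    public interface NodeAwareDispatcher {
        // 1. Expose information: which worker node ran (or failed) a given task.
        String getNodeForTask(String taskId);

        // 2. Allow control: keep future tasks away from a suspect node.
        void blacklistNode(String nodeId);

        // Undo the blacklist once the node looks healthy again.
        void reinstateNode(String nodeId);
    }

With something along these lines, the upper-level scheduler would not have to infer from aggregate failure counts which third of the resource is broken.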
> > > For example, it seems to me that you are giving the failed jobs 5 > > > times more weight than succesful jobs, but in reality it should be the > > > other way around. Failed jobs usually will fail quickly (as in the > > > case that we have in MolDyn), or they will fail slowly (within the > > > lifetime of the resource allocation). On the other hand, most > > > successful jobs will likely take more time to complete that it takes > > > for a job to fail (if it fails quickly). Perhaps instead of > > > > > > > successFactor (currently > > > > 0.1) and failureFactor (currently -0.5) > > > > > > > it should be more like: > > > successFactor: +1*(executionTime) > > > failureFactor: -1*(failureTime) > > > > > > > That's a very good idea. Biasing score based on run-time (at least when > > known). Please note: you should still fix Falkon to not do that thing > > it's doing. > > > Its not clear to me this should be done all the time, Falkon needs to > know why the failure happened to decide to throttle! Whatever. Do your best. > > > The 1 could of course be changed with some other weight to give > > > preference to successful jobs, or to failed jobs. With this kind of > > > strategy, the problems we are facing with throttling when there are > > > large # of short failures wouldn't be happening! Do you see any > > > drawbacks to this approach? > > > > > > > None that are obvious. It's in fact a good thing if the goal is > > performance, since it takes execution time into account. I've had manual > > "punishments" for connection time-outs because they take a long time to > > happen. But this time biasing naturally integrates that kind of stuff. > > So thanks. > > > > > > > > > that is the whole point here... > > > > > > > > > > > > > > This point comes because you KNOW how things work internally. All Swift > > > > sees is 10K failed jobs out of 29K. > > > > > > > > > > > > > > > > > anyways, I think this is a valid case that we need to discuss how to > > > > > handle, to make the entire Swift+Falkon more robust! > > > > > > > > > > BTW, here is another experiment with MolDyn that shows the throttling > > > > > and this heuristic behaving as I would expected! > > > > > http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed/summary_graph.jpg > > > > > > > > > > Notice the queue lenth (blue line) at around 11K seconds dropped > > > > > sharply, but then grew back up. That sudden drop was many jobs > > > > > failing fast on a bad node, and the sudden growth back up was Swift > > > > > re-submitting almost the same # of jobs that failed back to Falkon. > > > > > > > > > > > > > > That failing many jobs fast behavior is not right, regardless of whether > > > > Swift can deal with it or not. > > > > > > > If its a machine error, then it would be best to not fail many jobs > > > fast... > > > however, if its an app error, you want to fail the tasks as fast as > > > possible to fail the entire workflow faster, > > > > > > > But you can't distinguish between the two. The best you can do is assume > > that the failure is a linear combination between broken application and > > broken node. If it's broken node, rescheduling would do (which does not > > happen in your case: jobs keep being sent to the worker that is not > > busy, and that's the broken one). If it's a broken application, then the > > way to distinguish it from the other one is that after a bunch of > > retries on different nodes, it still fails. Notice that different nodes > > is essential here. 
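The "retries on different nodes" point above could be implemented roughly as follows; this is a sketch with hypothetical names, not code from Swift or Falkon:

    import java.util.HashSet;
    import java.util.Set;

    // Hypothetical helper: record which nodes a task has failed on and only
    // declare an application-level failure after several distinct nodes agree.
    public class FailureDiscriminator {
        private static final int DISTINCT_NODES_REQUIRED = 3;
        private final Set<String> failedNodes = new HashSet<String>();

        public synchronized void recordFailure(String nodeId) {
            failedNodes.add(nodeId);
        }

        // True once the task has failed on enough different nodes that the
        // application (or its invocation) is the likely culprit.
        public synchronized boolean looksLikeApplicationFailure() {
            return failedNodes.size() >= DISTINCT_NODES_REQUIRED;
        }

        // A node to avoid when picking where to retry this task.
        public synchronized boolean shouldAvoid(String nodeId) {
            return failedNodes.contains(nodeId);
        }
    }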
> > > Right, I could try to keep track of statistics on each node, and when > failures happen, try to determine if its a system wide failure (all > nodes reporting errors), or are the faiures isolated on a single (or > small set) node(s)... I'll have to think about how to do this > efficiently! > > > so the app can be fixed and the workflow retried! For example, say > > > you had 1000 tasks (all independent), and had a wrong path set to the > > > app... with the current Falkon behaviour, the entire workflow would > > > likely fail within some 10~20 seconds of it submitting the first task! > > > However, if Falkon does some "smart" throttling when it sees failures, > > > its going to take time proportional to the failures to fail the > > > workflow! > > > > > > > You're missing the part where all nodes fail the jobs equally, thus not > > creating the inequality we're talking about (the ones where broken nodes > > get higher chances of getting more jobs). > > > Right, maybe we can use this to distinguish between node failure and > app failure! > > > Essentially, I am not a bit fan of throttling task dispatch due to > > > failed executions, unless we know why these tasks failed! > > > > > > > Stop putting exclamation marks after every sentence. It diminishes the > > meaning of it! > > > So you are going from playing with words to picking on my > exclamation! :) No playing, no picking. You can keep doing it, but then people will equate exclamation marks at the end of your sentences with "generic sentence termination mark" (traditionally "."). And then when you want to emphasize a sentence, you won't have the right tool to do it (although I'm guessing having more than one exclamation mark could work). > > Well, you can't know why these tasks failed. That's the whole problem. > > You're dealing with incomplete information and you have to devise > > heuristics that get things done efficiently. > > > But Swift might know why it failed, it has a bunch of STDOUT/STDERR > that it always captures! Falkon might capture the same output, but > its optional ;( Could these outputs not be parsed for certain well > know errors I'm not aware of what these "well known errors" are. And that still doesn't solve the problem when you get the not so well known errors. > , and have different exit codes to mean different kinds of errors? > > > Exit codes are not usually enough in general, unless we define our > > > own and the app and wrapper scripts generate these particular exit > > > codes that Falkon can intercept and interpret reliably! > > > > > > > That would be an improvement, but probably not a universally valid > > assumption. So I wouldn't design with only that in mind. > > > But it would be an improvement over what we currently have... > > > > Frankly I'd rather Swift not be the part > > > > to deal with it because it has to resort to heuristics, whereas Falkon > > > > has direct knowledge of which nodes do what. > > > > > > > > > > > That's fine, but I don't think Falkon can do it alone, it needs > > > context and failure definition, which I believe only the application > > > and Swift could say for certain! > > > > > > > Nope, they can't. Swift does not meddle with semantics of applications. > > They're all equally valuable functions. > > > > Now, there's stuff you can do to improve things, I'm guessing. You can > > choose not to, and then we can keep having this discussion. There might > > be stuff Swift can do, but it's not insight into applications, so you'll > > have to ask for something else. 
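One cheap way to do the per-node bookkeeping described a few paragraphs above is to compare each node's failure rate with the overall rate and flag the outliers; the following is a hypothetical sketch (thresholds are illustrative, and this is not existing Falkon code):

    import java.util.HashMap;
    import java.util.Map;

    // Hypothetical per-node statistics: if failures are rare overall but one
    // node fails most of what it runs, suspect the node rather than the app.
    public class NodeStats {
        private final Map<String, int[]> counts = new HashMap<String, int[]>(); // nodeId -> {failures, total}
        private int totalFailures = 0;
        private int totalTasks = 0;

        public synchronized void record(String nodeId, boolean failed) {
            int[] c = counts.get(nodeId);
            if (c == null) {
                c = new int[2];
                counts.put(nodeId, c);
            }
            c[1]++;
            totalTasks++;
            if (failed) {
                c[0]++;
                totalFailures++;
            }
        }

        // A node is suspect if it has run enough tasks and its failure rate is
        // far above the overall failure rate.
        public synchronized boolean isSuspect(String nodeId) {
            int[] c = counts.get(nodeId);
            if (c == null || c[1] < 10 || totalTasks == 0) {
                return false;
            }
            double nodeRate = (double) c[0] / c[1];
            double overallRate = (double) totalFailures / totalTasks;
            return nodeRate > 0.5 && nodeRate > 4 * overallRate;
        }
    }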
> > > Any suggestions? You're supposed to tell me what you need, not ask me what it is that you need :) > > Ioan > > Mihael > > > > > > > Ioan > > > > > > > > > > > > From iraicu at cs.uchicago.edu Tue Aug 14 18:26:23 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 14 Aug 2007 18:26:23 -0500 Subject: [Swift-devel] cannot commit to SVN... Message-ID: <46C23A1F.5020808@cs.uchicago.edu> Hi, I can't commit to SVN at CI. Here is the output I get from trying to commit a simple test! iraicu at viper:~/java/svn/falkon> svn ci test just testing --This line, and those below, will be ignored-- A test "svn-commit.5.tmp" 4L, 72C written Adding test Authentication realm: SVN Login Password for 'iraicu': svn: Commit failed (details follow): svn: CHECKOUT of '/svn/vdl2/!svn/ver/1075/falkon': 401 Authorization Required (https://svn.ci.uchicago.edu) svn: Your commit message was left in a temporary file: svn: '/home/iraicu/java/svn/falkon/svn-commit.5.tmp' Thanks, Ioan From iraicu at cs.uchicago.edu Tue Aug 14 19:59:19 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 14 Aug 2007 19:59:19 -0500 Subject: [Swift-devel] Re: cannot commit to SVN... In-Reply-To: <46C23A1F.5020808@cs.uchicago.edu> References: <46C23A1F.5020808@cs.uchicago.edu> Message-ID: <46C24FE7.9040500@cs.uchicago.edu> Hi, I just tried again, with a clean SVN checkout on login.ci.uchicago.edu. I included the date so you can look it up in the logs, if you need to. [iraicu at login container]$ date Tue Aug 14 19:55:43 CDT 2007 [iraicu at login container]$ svn status A hello [iraicu at login container]$ cd .. [iraicu at login falkon]$ svn cleanup [iraicu at login falkon]$ cd container/ [iraicu at login container]$ svn ci hello testing --This line, and those below, will be ignored-- A hello "svn-commit.tmp" 4L, 68C written Adding hello And then it just hangs there... Can you keep looking into this to see what is wrong? Thanks, Ioan Ioan Raicu wrote: > Hi, > I can't commit to SVN at CI. Here is the output I get from trying to > commit a simple test! > > iraicu at viper:~/java/svn/falkon> svn ci test > > just testing > --This line, and those below, will be ignored-- > > A test > > "svn-commit.5.tmp" 4L, 72C > written > > Adding test > Authentication realm: SVN Login > Password for 'iraicu': > svn: Commit failed (details follow): > svn: CHECKOUT of '/svn/vdl2/!svn/ver/1075/falkon': 401 Authorization > Required (https://svn.ci.uchicago.edu) > svn: Your commit message was left in a temporary file: > svn: '/home/iraicu/java/svn/falkon/svn-commit.5.tmp' > > Thanks, > Ioan > From leggett at ci.uchicago.edu Wed Aug 15 06:17:25 2007 From: leggett at ci.uchicago.edu (Ti Leggett) Date: Wed, 15 Aug 2007 06:17:25 -0500 Subject: [Swift-devel] Re: cannot commit to SVN... In-Reply-To: <46C24FE7.9040500@cs.uchicago.edu> References: <46C23A1F.5020808@cs.uchicago.edu> <46C24FE7.9040500@cs.uchicago.edu> Message-ID: <69460737-E2FD-4E5C-A6D9-36CF37ABD164@ci.uchicago.edu> There was a typo in the svn authz file. Try now. On Aug 14, 2007, at 7:59 PM, Ioan Raicu wrote: > Hi, > I just tried again, with a clean SVN checkout on > login.ci.uchicago.edu. I included the date so you can look it up > in the logs, if you need to. > > [iraicu at login container]$ date > Tue Aug 14 19:55:43 CDT 2007 > [iraicu at login container]$ svn status > A hello > [iraicu at login container]$ cd .. 
> [iraicu at login falkon]$ svn cleanup > [iraicu at login falkon]$ cd container/ > [iraicu at login container]$ svn ci hello > > testing > --This line, and those below, will be ignored-- > > A hello > "svn-commit.tmp" 4L, 68C written > Adding hello > > And then it just hangs there... > > Can you keep looking into this to see what is wrong? > > Thanks, > Ioan > > Ioan Raicu wrote: >> Hi, >> I can't commit to SVN at CI. Here is the output I get from trying >> to commit a simple test! >> >> iraicu at viper:~/java/svn/falkon> svn ci test >> >> just testing >> --This line, and those below, will be ignored-- >> >> A test >> >> "svn-commit.5.tmp" 4L, 72C written >> Adding test >> Authentication realm: SVN Login >> Password for 'iraicu': >> svn: Commit failed (details follow): >> svn: CHECKOUT of '/svn/vdl2/!svn/ver/1075/falkon': 401 >> Authorization Required (https://svn.ci.uchicago.edu) >> svn: Your commit message was left in a temporary file: >> svn: '/home/iraicu/java/svn/falkon/svn-commit.5.tmp' >> >> Thanks, >> Ioan >> From iraicu at cs.uchicago.edu Wed Aug 15 12:10:05 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 15 Aug 2007 12:10:05 -0500 Subject: [Swift-devel] Re: cannot commit to SVN... In-Reply-To: <69460737-E2FD-4E5C-A6D9-36CF37ABD164@ci.uchicago.edu> References: <46C23A1F.5020808@cs.uchicago.edu> <46C24FE7.9040500@cs.uchicago.edu> <69460737-E2FD-4E5C-A6D9-36CF37ABD164@ci.uchicago.edu> Message-ID: <46C3336D.8090100@cs.uchicago.edu> It works! iraicu at viper:~/java/svn/falkon> svn ci test just testing --This line, and those below, will be ignored-- A test "svn-commit.7.tmp" 4L, 72C written Adding test Authentication realm: SVN Login Password for 'iraicu': Transmitting file data . Committed revision 1076. Thanks, Ioan Ti Leggett wrote: > There was a typo in the svn authz file. Try now. > > On Aug 14, 2007, at 7:59 PM, Ioan Raicu wrote: > >> Hi, >> I just tried again, with a clean SVN checkout on >> login.ci.uchicago.edu. I included the date so you can look it up in >> the logs, if you need to. >> >> [iraicu at login container]$ date >> Tue Aug 14 19:55:43 CDT 2007 >> [iraicu at login container]$ svn status >> A hello >> [iraicu at login container]$ cd .. >> [iraicu at login falkon]$ svn cleanup >> [iraicu at login falkon]$ cd container/ >> [iraicu at login container]$ svn ci hello >> >> testing >> --This line, and those below, will be ignored-- >> >> A hello >> "svn-commit.tmp" 4L, 68C written >> Adding hello >> >> And then it just hangs there... >> >> Can you keep looking into this to see what is wrong? >> >> Thanks, >> Ioan >> >> Ioan Raicu wrote: >>> Hi, >>> I can't commit to SVN at CI. Here is the output I get from trying to >>> commit a simple test! >>> >>> iraicu at viper:~/java/svn/falkon> svn ci test >>> >>> just testing >>> --This line, and those below, will be ignored-- >>> >>> A test >>> >>> "svn-commit.5.tmp" 4L, 72C written >>> Adding test >>> Authentication realm: SVN Login >>> Password for 'iraicu': >>> svn: Commit failed (details follow): >>> svn: CHECKOUT of '/svn/vdl2/!svn/ver/1075/falkon': 401 Authorization >>> Required (https://svn.ci.uchicago.edu) >>> svn: Your commit message was left in a temporary file: >>> svn: '/home/iraicu/java/svn/falkon/svn-commit.5.tmp' >>> >>> Thanks, >>> Ioan >>> > > From iraicu at cs.uchicago.edu Thu Aug 16 11:57:36 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 16 Aug 2007 11:57:36 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: <1187069175.5653.30.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchicago.edu> <1187069175.5653.30.camel@blabla.mcs.anl.gov> Message-ID: <46C48200.4020503@cs.uchicago.edu> Hi all, I finally committed all the updates for the provider-deef! I still have to commit the latest Falkon (if the consensus is that you want the latest changes in SVN, then perhaps I can do this on a weekly basis), but all the latest changes have been new features, and aren't essential for anyone to really test yet (especially as I am not doing much testing to make sure I am not breaking the existing code). You should be able to use the latest SVN Falkon to run Swift apps over. Ioan nefedova at viper:~/cogl/modules/provider-deef> svn ci updated with Yong's latest code from 7-26-07... made a succesful run with MolDyn 244 molecules! --This line, and those below, will be ignored-- M project.properties AM lib/FalkonStubs.jar D lib/GenericPortal.jar M src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java M src/org/globus/cog/abstraction/impl/execution/deef/Boot.java M src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java M src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java M src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java M src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java "svn-commit.3.tmp" 13L, 679C written Adding (bin) lib/FalkonStubs.jar Authentication realm: SVN Login Password for 'nefedova': Authentication realm: SVN Login Username: iraicu Password for 'iraicu': Deleting lib/GenericPortal.jar Sending project.properties Sending src/org/globus/cog/abstraction/impl/execution/deef/Boot.java Sending src/org/globus/cog/abstraction/impl/execution/deef/JobSubmissionTaskHandler.java Sending src/org/globus/cog/abstraction/impl/execution/deef/NotificationThread.java Sending src/org/globus/cog/abstraction/impl/execution/deef/ResourcePool.java Sending src/org/globus/cog/abstraction/impl/execution/deef/StatusThread.java Sending src/org/globus/cog/abstraction/impl/execution/deef/SubmissionThread.java Transmitting file data ........ Committed revision 1079. From wilde at mcs.anl.gov Tue Aug 21 11:56:15 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 21 Aug 2007 11:56:15 -0500 Subject: [Swift-devel] [Fwd: [vds-devel] kickstart, seqexec] Message-ID: <46CB192F.2010905@mcs.anl.gov> -------- Original Message -------- Subject: [vds-devel] kickstart, seqexec Date: Mon, 20 Aug 2007 10:47:41 -0700 From: Jens-Soenke Voeckler To: VDS Developers List Hi, there was a bad bug of switched arguments to a kill() call in kickstart and seqexec by code that was recently added. I've checked in the bug fix into both, Pegasus's SVN and VDS's CVS. 
You may want to check out the new, bug-fixed version, if you are still using either. Aloha, Dipl.-Ing. Jens-S. V?ckler voeckler at isi dot edu University of Southern California Viterbi School of Engineering Information Sciences Institute; 4676 Admiralty Way Ste 1001 Marina Del Rey, CA 90292-6611; USA; +1 310 448 8427 From hategan at mcs.anl.gov Thu Aug 23 21:59:12 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 23 Aug 2007 21:59:12 -0500 Subject: [Swift-devel] updates Message-ID: <1187924352.14048.2.camel@blabla.mcs.anl.gov> Lots of changes went in. That goes for cog, karajan, and swift. It's mostly cleanups for the type system and mapping related code, but also a prototype dcache provider, and no more manual editing of scheduler.xml to add providers. The language behavior tests seem to pass, but I think more testing would be needed. Mihael From benc at hawaga.org.uk Mon Aug 27 04:18:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 09:18:23 +0000 (GMT) Subject: [Swift-devel] language-behaviour/150 broken Message-ID: Language behaviour test 150 seems to be broken in r1113. That code is this: type file; file f[] ; I made a brief attempt to bisect where it got broken in between r1091 (which was what was working last week) and r1113, but the two revisions I tried between 1091 and 1113 both fail to compile Swift itself (r1102 and r1108) -- From benc at hawaga.org.uk Mon Aug 27 04:41:21 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 09:41:21 +0000 (GMT) Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46C48200.4020503@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchicago.edu> <1187069175.5653.30.camel@blabla.mcs.anl.gov> <46C48200.4020503@cs.uchicago.edu> Message-ID: On Thu, 16 Aug 2007, Ioan Raicu wrote: > (if the consensus is that you want the latest changes in SVN, then > perhaps I can do this on a weekly basis) One model for what you should put in the trunk of the SVN is code that you think works well enough for a user to be making regular use of for their work (eg Nika). approximately equivalently, you shouldn't be pointing users to anything other than the SVN trunk (or some snapshot of trunk)) to obtain code. -- From benc at hawaga.org.uk Mon Aug 27 04:50:18 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 09:50:18 +0000 (GMT) Subject: [Swift-devel] language-behaviour/150 broken In-Reply-To: References: Message-ID: Looks like maybe r1108 caused a regression bug that was previously fixed in r1050 - RootArrayDataNode expects a java.lang.String for its "prefix" parameter, but if it is passed a SwiftScript expression the value is not of that class. 
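For readers outside the code base: a regression of the kind described above is usually fixed with a defensive conversion at the point where the parameter is read. The sketch below only illustrates that idea, with made-up names; it is not the actual RootArrayDataNode code nor the eventual fix:

    // Hypothetical sketch: accept either a plain String or an evaluated
    // SwiftScript value for "prefix" instead of assuming a String.
    public final class PrefixCoercion {
        public static String asPrefixString(Object raw) {
            if (raw == null) {
                return null;                // let the caller decide how to treat a missing prefix
            }
            if (raw instanceof String) {
                return (String) raw;
            }
            return raw.toString();          // fall back to the value's text form
        }
    }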
-- From benc at hawaga.org.uk Mon Aug 27 05:06:44 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 10:06:44 +0000 (GMT) Subject: [Swift-devel] language-behaviour/150 broken In-Reply-To: References: Message-ID: I put in a fix for this in r1115 and all the tests seem to run now. -- From nefedova at mcs.anl.gov Mon Aug 27 12:21:11 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Mon, 27 Aug 2007 12:21:11 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46C13508.3070000@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BB97A8.3060006@cs.uchicago.edu> <5CB62511-5C52-4C63-8EE6-C36A4A4457DF@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchicago.edu> Message-ID: OK. I looked at the output and it looks like 14 molecules have still failed. They all failed due to hardware problems -- I saw nothing application-specific in applications logs, all very consistent with staled NFS handle that Ioan reported seeing. It would be great to be able to stop submitting jobs to 'bad' nodes during the run (long term), or to increase the number of retries in swift(short term) to enable the whole workflow to go through. Nika On Aug 13, 2007, at 11:52 PM, Ioan Raicu wrote: > > > Mihael Hategan wrote: >> On Mon, 2007-08-13 at 23:07 -0500, Ioan Raicu wrote: >> >>>>> >>>> small != not at all >>>> >>>> >>> Check out these two graphs, showing the # of active tasks within >>> Falkon! Active tasks = queued+pending+active >>> +done_and_not_delivered. >>> >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-success-8-10-07/number-of-active-tasks.jpg >>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>> mol-success-8-10-07/number-of-active-tasks-zoom.jpg >>> >>> Notice that after 3600 some seconds (after all the jobs that failed >>> had failed), the # of active tasks in Falkon oscillates between 100 >>> and 101 active tasks! The # presented on these graphs are taken >>> from >>> the median value per minute (the raw samples were 60 samples per >>> minute). Notice that only at the very end of the experiment, at >>> 30K+ >>> seconds, the # of active tasks increases to a max of 109 for a brief >>> period of time before it drops towards 0 as the workflow >>> completes. I >>> did notice that towards the end of the workflow, the jobs were >>> typically shorter, and perhaps that somehow influenced the # of >>> active >>> tasks within Falkon... So, when I said not at all, I was >>> refering to >>> this flat line 100~101 active tasks that is shown in these figures! >>> >> Then say "it appears (from x and y) that the number of concurrent >> jobs >> does not increase by an observable amount". This is not the same >> as "the >> score does not increase at all". >> > You are playing with words here... 
the bottom line is that after 19K > + jobs and several hours of successful jobs, there was no > indication that the heuristic was adapting to the new conditions, > in which no jobs were failing! >> >>>>> So you are saying that 19K+ successful jobs was not enough to >>>>> counteract the 10K+ failed jobs from the early part of the >>>>> experiment? >>>>> >>>>> >>>> Yep. 19*1/5 = 3.8 < 10. >>>> >>>> >>>> >>>>> Can this ratio (1:5) be changed? >>>>> >>>>> >>>> Yes. The scheduler has two relevant properties: successFactor >>>> (currently >>>> 0.1) and failureFactor (currently -0.5). The term "factor" is >>>> not used >>>> formally, since these get added to the current score. >>>> >>>> >>>> >>>>> From this experiment, it would seem that the euristic is a slow >>>>> learner... maybe you ahve ideas on how to make it more quick to >>>>> adapt >>>>> to changes? >>>>> >>>>> >>>> That could perhaps be done. >>>> >>>> >>>> >>>>>> In the context in which jobs are sent to non-busy workers, the >>>>>> system >>>>>> would tend to produce lots of failed jobs if it takes little time >>>>>> (compared to the normal run-time of a job) for a bad worker to >>>>>> fail a >>>>>> job. This *IS* why the swift scheduler throttles in the >>>>>> beginning: to >>>>>> avoid sending a large number of jobs to a resource that is >>>>>> broken. >>>>>> >>>>>> >>>>>> >>>>> But not the whole resource is broken... >>>>> >>>>> >>>> No, just slightly more than 1/3 of it. At least that's how it >>>> appears >>>> from the outside. >>>> >>>> >>> But a failed job should not be given the same weight as a succesful >>> job, in my oppinion. >>> >> Nope. I'd punish failures quite harshly. That's because the expected >> behavior is for things to work. I would not want a site that fails >> half >> the jobs to be anywhere near keeping a constant score. >> > That is fine, but you have a case (such as this one) in which this > is not ideal... how do you propose we adapt to cover this corner case? >> >>> For example, it seems to me that you are giving the failed jobs 5 >>> times more weight than succesful jobs, but in reality it should >>> be the >>> other way around. Failed jobs usually will fail quickly (as in the >>> case that we have in MolDyn), or they will fail slowly (within the >>> lifetime of the resource allocation). On the other hand, most >>> successful jobs will likely take more time to complete that it takes >>> for a job to fail (if it fails quickly). Perhaps instead of >>> >>>> successFactor (currently >>>> 0.1) and failureFactor (currently -0.5) >>>> >>> it should be more like: >>> successFactor: +1*(executionTime) >>> failureFactor: -1*(failureTime) >>> >> That's a very good idea. Biasing score based on run-time (at least >> when >> known). Please note: you should still fix Falkon to not do that thing >> it's doing. >> > Its not clear to me this should be done all the time, Falkon needs > to know why the failure happened to decide to throttle! >> >>> The 1 could of course be changed with some other weight to give >>> preference to successful jobs, or to failed jobs. With this kind of >>> strategy, the problems we are facing with throttling when there are >>> large # of short failures wouldn't be happening! Do you see any >>> drawbacks to this approach? >>> >> None that are obvious. It's in fact a good thing if the goal is >> performance, since it takes execution time into account. I've had >> manual >> "punishments" for connection time-outs because they take a long >> time to >> happen. 
But this time biasing naturally integrates that kind of >> stuff. >> So thanks. >> >> >>>>> that is the whole point here... >>>>> >>>>> >>>> This point comes because you KNOW how things work internally. >>>> All Swift >>>> sees is 10K failed jobs out of 29K. >>>> >>>> >>>> >>>>> anyways, I think this is a valid case that we need to discuss >>>>> how to >>>>> handle, to make the entire Swift+Falkon more robust! >>>>> >>>>> BTW, here is another experiment with MolDyn that shows the >>>>> throttling >>>>> and this heuristic behaving as I would expected! >>>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244- >>>>> mol-failed/summary_graph.jpg >>>>> >>>>> Notice the queue lenth (blue line) at around 11K seconds dropped >>>>> sharply, but then grew back up. That sudden drop was many jobs >>>>> failing fast on a bad node, and the sudden growth back up was >>>>> Swift >>>>> re-submitting almost the same # of jobs that failed back to >>>>> Falkon. >>>>> >>>>> >>>> That failing many jobs fast behavior is not right, regardless of >>>> whether >>>> Swift can deal with it or not. >>>> >>> If its a machine error, then it would be best to not fail many jobs >>> fast... >>> however, if its an app error, you want to fail the tasks as fast as >>> possible to fail the entire workflow faster, >>> >> But you can't distinguish between the two. The best you can do is >> assume >> that the failure is a linear combination between broken >> application and >> broken node. If it's broken node, rescheduling would do (which >> does not >> happen in your case: jobs keep being sent to the worker that is not >> busy, and that's the broken one). If it's a broken application, >> then the >> way to distinguish it from the other one is that after a bunch of >> retries on different nodes, it still fails. Notice that different >> nodes >> is essential here. >> > Right, I could try to keep track of statistics on each node, and > when failures happen, try to determine if its a system wide failure > (all nodes reporting errors), or are the faiures isolated on a > single (or small set) node(s)... I'll have to think about how to > do this efficiently! >> >>> so the app can be fixed and the workflow retried! For example, say >>> you had 1000 tasks (all independent), and had a wrong path set to >>> the >>> app... with the current Falkon behaviour, the entire workflow would >>> likely fail within some 10~20 seconds of it submitting the first >>> task! >>> However, if Falkon does some "smart" throttling when it sees >>> failures, >>> its going to take time proportional to the failures to fail the >>> workflow! >>> >> You're missing the part where all nodes fail the jobs equally, >> thus not >> creating the inequality we're talking about (the ones where broken >> nodes >> get higher chances of getting more jobs). >> > Right, maybe we can use this to distinguish between node failure > and app failure! >> >>> Essentially, I am not a bit fan of throttling task dispatch due to >>> failed executions, unless we know why these tasks failed! >>> >> Stop putting exclamation marks after every sentence. It diminishes >> the >> meaning of it! >> > So you are going from playing with words to picking on my > exclamation! :) >> Well, you can't know why these tasks failed. That's the whole >> problem. >> You're dealing with incomplete information and you have to devise >> heuristics that get things done efficiently. >> > But Swift might know why it failed, it has a bunch of STDOUT/STDERR > that it always captures! 
Falkon might capture the same output, but > its optional ;( Could these outputs not be parsed for certain well > know errors, and have different exit codes to mean different kinds > of errors? >> >>> Exit codes are not usually enough in general, unless we define our >>> own and the app and wrapper scripts generate these particular exit >>> codes that Falkon can intercept and interpret reliably! >>> >> That would be an improvement, but probably not a universally valid >> assumption. So I wouldn't design with only that in mind. >> > But it would be an improvement over what we currently have... >> >>>> Frankly I'd rather Swift not be the part >>>> to deal with it because it has to resort to heuristics, whereas >>>> Falkon >>>> has direct knowledge of which nodes do what. >>>> >>>> >>> That's fine, but I don't think Falkon can do it alone, it needs >>> context and failure definition, which I believe only the application >>> and Swift could say for certain! >>> >> Nope, they can't. Swift does not meddle with semantics of >> applications. >> They're all equally valuable functions. >> >> Now, there's stuff you can do to improve things, I'm guessing. You >> can >> choose not to, and then we can keep having this discussion. There >> might >> be stuff Swift can do, but it's not insight into applications, so >> you'll >> have to ask for something else. >> > Any suggestions? > > Ioan >> Mihael >> >> >>> Ioan >>> >>> >> >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Mon Aug 27 12:30:20 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 27 Aug 2007 12:30:20 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> Message-ID: <46D30A2C.7040009@cs.uchicago.edu> Hi, I will look at the Falkon scheduler to what I can do to either throttle or blacklist task dispatches to bad nodes. On a similar note, IMO, the heuristic in Karajan should be modified to take into account the task execution time of the failed or successful task, and not just the number of tasks. This would ensure that Swift is not throttling task submission to Falkon when there are 1000s of successful tasks that take on the order of 100s of second to complete, yet there are also 1000s of failed tasks that are only 10 ms long. This is exactly the case with MolDyn, when we get a bad node in a bunch of 100s of nodes, which ends up throttling the number of active and running tasks to about 100, regardless of the number of processors Falkon has. I also think that when Swift runs in conjunction with Falkon, we should increase the number of retry attempts Swift is willing to make per task before giving up. 
Currently, it is set to 3, but a higher number of would be better, considering the low overhead of task submission Falkon has! I think the combination of these three changes (one from Falkon and another from Swift) should increase the probability of large workflows completing on a large number of resources! Ioan Veronika Nefedova wrote: > OK. I looked at the output and it looks like 14 molecules have still > failed. They all failed due to hardware problems -- I saw nothing > application-specific in applications logs, all very consistent with > staled NFS handle that Ioan reported seeing. > It would be great to be able to stop submitting jobs to 'bad' nodes > during the run (long term), or to increase the number of retries in > swift(short term) to enable the whole workflow to go through. > > Nika > > On Aug 13, 2007, at 11:52 PM, Ioan Raicu wrote: > >> >> >> Mihael Hategan wrote: >>> On Mon, 2007-08-13 at 23:07 -0500, Ioan Raicu wrote: >>> >>>>>> >>>>> small != not at all >>>>> >>>>> >>>> Check out these two graphs, showing the # of active tasks within >>>> Falkon! Active tasks = queued+pending+active+done_and_not_delivered. >>>> >>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks.jpg >>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-success-8-10-07/number-of-active-tasks-zoom.jpg >>>> >>>> Notice that after 3600 some seconds (after all the jobs that failed >>>> had failed), the # of active tasks in Falkon oscillates between 100 >>>> and 101 active tasks! The # presented on these graphs are taken from >>>> the median value per minute (the raw samples were 60 samples per >>>> minute). Notice that only at the very end of the experiment, at 30K+ >>>> seconds, the # of active tasks increases to a max of 109 for a brief >>>> period of time before it drops towards 0 as the workflow completes. I >>>> did notice that towards the end of the workflow, the jobs were >>>> typically shorter, and perhaps that somehow influenced the # of active >>>> tasks within Falkon... So, when I said not at all, I was refering to >>>> this flat line 100~101 active tasks that is shown in these figures! >>>> >>> Then say "it appears (from x and y) that the number of concurrent jobs >>> does not increase by an observable amount". This is not the same as "the >>> score does not increase at all". >>> >> You are playing with words here... the bottom line is that after 19K+ >> jobs and several hours of successful jobs, there was no indication >> that the heuristic was adapting to the new conditions, in which no >> jobs were failing! >>> >>>>>> So you are saying that 19K+ successful jobs was not enough to >>>>>> counteract the 10K+ failed jobs from the early part of the >>>>>> experiment? >>>>>> >>>>>> >>>>> Yep. 19*1/5 = 3.8 < 10. >>>>> >>>>> >>>>> >>>>>> Can this ratio (1:5) be changed? >>>>>> >>>>>> >>>>> Yes. The scheduler has two relevant properties: successFactor (currently >>>>> 0.1) and failureFactor (currently -0.5). The term "factor" is not used >>>>> formally, since these get added to the current score. >>>>> >>>>> >>>>> >>>>>> From this experiment, it would seem that the euristic is a slow >>>>>> learner... maybe you ahve ideas on how to make it more quick to adapt >>>>>> to changes? >>>>>> >>>>>> >>>>> That could perhaps be done. 
>>>>> >>>>> >>>>> >>>>>>> In the context in which jobs are sent to non-busy workers, the system >>>>>>> would tend to produce lots of failed jobs if it takes little time >>>>>>> (compared to the normal run-time of a job) for a bad worker to fail a >>>>>>> job. This *IS* why the swift scheduler throttles in the beginning: to >>>>>>> avoid sending a large number of jobs to a resource that is broken. >>>>>>> >>>>>>> >>>>>>> >>>>>> But not the whole resource is broken... >>>>>> >>>>>> >>>>> No, just slightly more than 1/3 of it. At least that's how it appears >>>>> from the outside. >>>>> >>>>> >>>> But a failed job should not be given the same weight as a succesful >>>> job, in my oppinion. >>>> >>> Nope. I'd punish failures quite harshly. That's because the expected >>> behavior is for things to work. I would not want a site that fails half >>> the jobs to be anywhere near keeping a constant score. >>> >> That is fine, but you have a case (such as this one) in which this is >> not ideal... how do you propose we adapt to cover this corner case? >>> >>>> For example, it seems to me that you are giving the failed jobs 5 >>>> times more weight than succesful jobs, but in reality it should be the >>>> other way around. Failed jobs usually will fail quickly (as in the >>>> case that we have in MolDyn), or they will fail slowly (within the >>>> lifetime of the resource allocation). On the other hand, most >>>> successful jobs will likely take more time to complete that it takes >>>> for a job to fail (if it fails quickly). Perhaps instead of >>>> >>>>> successFactor (currently >>>>> 0.1) and failureFactor (currently -0.5) >>>>> >>>> it should be more like: >>>> successFactor: +1*(executionTime) >>>> failureFactor: -1*(failureTime) >>>> >>> That's a very good idea. Biasing score based on run-time (at least when >>> known). Please note: you should still fix Falkon to not do that thing >>> it's doing. >>> >> Its not clear to me this should be done all the time, Falkon needs to >> know why the failure happened to decide to throttle! >>> >>>> The 1 could of course be changed with some other weight to give >>>> preference to successful jobs, or to failed jobs. With this kind of >>>> strategy, the problems we are facing with throttling when there are >>>> large # of short failures wouldn't be happening! Do you see any >>>> drawbacks to this approach? >>>> >>> None that are obvious. It's in fact a good thing if the goal is >>> performance, since it takes execution time into account. I've had manual >>> "punishments" for connection time-outs because they take a long time to >>> happen. But this time biasing naturally integrates that kind of stuff. >>> So thanks. >>> >>> >>>>>> that is the whole point here... >>>>>> >>>>>> >>>>> This point comes because you KNOW how things work internally. All Swift >>>>> sees is 10K failed jobs out of 29K. >>>>> >>>>> >>>>> >>>>>> anyways, I think this is a valid case that we need to discuss how to >>>>>> handle, to make the entire Swift+Falkon more robust! >>>>>> >>>>>> BTW, here is another experiment with MolDyn that shows the throttling >>>>>> and this heuristic behaving as I would expected! >>>>>> http://people.cs.uchicago.edu/~iraicu/research/docs/MolDyn/244-mol-failed/summary_graph.jpg >>>>>> >>>>>> Notice the queue lenth (blue line) at around 11K seconds dropped >>>>>> sharply, but then grew back up. 
That sudden drop was many jobs >>>>>> failing fast on a bad node, and the sudden growth back up was Swift >>>>>> re-submitting almost the same # of jobs that failed back to Falkon. >>>>>> >>>>>> >>>>> That failing many jobs fast behavior is not right, regardless of whether >>>>> Swift can deal with it or not. >>>>> >>>> If its a machine error, then it would be best to not fail many jobs >>>> fast... >>>> however, if its an app error, you want to fail the tasks as fast as >>>> possible to fail the entire workflow faster, >>>> >>> But you can't distinguish between the two. The best you can do is assume >>> that the failure is a linear combination between broken application and >>> broken node. If it's broken node, rescheduling would do (which does not >>> happen in your case: jobs keep being sent to the worker that is not >>> busy, and that's the broken one). If it's a broken application, then the >>> way to distinguish it from the other one is that after a bunch of >>> retries on different nodes, it still fails. Notice that different nodes >>> is essential here. >>> >> Right, I could try to keep track of statistics on each node, and when >> failures happen, try to determine if its a system wide failure (all >> nodes reporting errors), or are the faiures isolated on a single (or >> small set) node(s)... I'll have to think about how to do this >> efficiently! >>> >>>> so the app can be fixed and the workflow retried! For example, say >>>> you had 1000 tasks (all independent), and had a wrong path set to the >>>> app... with the current Falkon behaviour, the entire workflow would >>>> likely fail within some 10~20 seconds of it submitting the first task! >>>> However, if Falkon does some "smart" throttling when it sees failures, >>>> its going to take time proportional to the failures to fail the >>>> workflow! >>>> >>> You're missing the part where all nodes fail the jobs equally, thus not >>> creating the inequality we're talking about (the ones where broken nodes >>> get higher chances of getting more jobs). >>> >> Right, maybe we can use this to distinguish between node failure and >> app failure! >>> >>>> Essentially, I am not a bit fan of throttling task dispatch due to >>>> failed executions, unless we know why these tasks failed! >>>> >>> Stop putting exclamation marks after every sentence. It diminishes the >>> meaning of it! >>> >> So you are going from playing with words to picking on my exclamation! :) >>> Well, you can't know why these tasks failed. That's the whole problem. >>> You're dealing with incomplete information and you have to devise >>> heuristics that get things done efficiently. >>> >> But Swift might know why it failed, it has a bunch of STDOUT/STDERR >> that it always captures! Falkon might capture the same output, but >> its optional ;( Could these outputs not be parsed for certain well >> know errors, and have different exit codes to mean different kinds of >> errors? >>> >>>> Exit codes are not usually enough in general, unless we define our >>>> own and the app and wrapper scripts generate these particular exit >>>> codes that Falkon can intercept and interpret reliably! >>>> >>> That would be an improvement, but probably not a universally valid >>> assumption. So I wouldn't design with only that in mind. >>> >> But it would be an improvement over what we currently have... >>> >>>>> Frankly I'd rather Swift not be the part >>>>> to deal with it because it has to resort to heuristics, whereas Falkon >>>>> has direct knowledge of which nodes do what. 
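The per-node bookkeeping Ioan mentions above (deciding whether failures are spread across the whole resource or isolated to one or a few nodes) can be outlined with a simple counter per worker. This is only a sketch of the idea under discussion, with made-up names and an arbitrary threshold, not existing Falkon code:

// Hypothetical sketch: classify failures as node-local vs. widespread by tracking
// per-node failure counts. Names and the 10% threshold are illustrative only.
import java.util.HashMap;
import java.util.Map;

public class NodeFailureTracker {
    private final Map<String, Integer> failuresByNode = new HashMap<String, Integer>();
    private int totalFailures = 0;

    public void recordFailure(String nodeId) {
        Integer n = failuresByNode.get(nodeId);
        failuresByNode.put(nodeId, n == null ? 1 : n + 1);
        totalFailures++;
    }

    // If the failures come from a small fraction of the nodes, suspect bad nodes:
    // hold them and reschedule elsewhere. If failures are spread across most nodes,
    // suspect the application itself and fail the workflow quickly instead.
    public boolean looksLikeIsolatedNodeProblem(int totalNodes) {
        if (totalFailures == 0) {
            return false;
        }
        int failingNodes = failuresByNode.size();
        return failingNodes <= Math.max(1, totalNodes / 10);
    }
}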
>>>>> >>>>> >>>> That's fine, but I don't think Falkon can do it alone, it needs >>>> context and failure definition, which I believe only the application >>>> and Swift could say for certain! >>>> >>> Nope, they can't. Swift does not meddle with semantics of applications. >>> They're all equally valuable functions. >>> >>> Now, there's stuff you can do to improve things, I'm guessing. You can >>> choose not to, and then we can keep having this discussion. There might >>> be stuff Swift can do, but it's not insight into applications, so you'll >>> have to ask for something else. >>> >> Any suggestions? >> >> Ioan >>> Mihael >>> >>> >>>> Ioan >>>> >>>> >>> > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From benc at hawaga.org.uk Mon Aug 27 12:37:51 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 17:37:51 +0000 (GMT) Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46D30A2C.7040009@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> Message-ID: On Mon, 27 Aug 2007, Ioan Raicu wrote: > On a similar note, IMO, the heuristic in Karajan should be modified to take > into account the task execution time of the failed or successful task, and not > just the number of tasks. This would ensure that Swift is not throttling task > submission to Falkon when there are 1000s of successful tasks that take on the > order of 100s of second to complete, yet there are also 1000s of failed tasks > that are only 10 ms long. This is exactly the case with MolDyn, when we get a > bad node in a bunch of 100s of nodes, which ends up throttling the number of > active and running tasks to about 100, regardless of the number of processors > Falkon has. Is that different from when submitting to PBS or GRAM where there are 1000s of successful tasks taking 100s of seconds to complete but with 1000s of failed tasks that are only 10ms long? -- From hategan at mcs.anl.gov Mon Aug 27 13:07:59 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 27 Aug 2007 13:07:59 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> < 46BC7C46.6030004@cs.uchicago.edu> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> Message-ID: <1188238079.31798.25.camel@blabla.mcs.anl.gov> On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: > > On Mon, 27 Aug 2007, Ioan Raicu wrote: > > > On a similar note, IMO, the heuristic in Karajan should be modified to take > > into account the task execution time of the failed or successful task, and not > > just the number of tasks. This would ensure that Swift is not throttling task > > submission to Falkon when there are 1000s of successful tasks that take on the > > order of 100s of second to complete, yet there are also 1000s of failed tasks > > that are only 10 ms long. This is exactly the case with MolDyn, when we get a > > bad node in a bunch of 100s of nodes, which ends up throttling the number of > > active and running tasks to about 100, regardless of the number of processors > > Falkon has. > > Is that different from when submitting to PBS or GRAM where there are > 1000s of successful tasks taking 100s of seconds to complete but with > 1000s of failed tasks that are only 10ms long? In your scenario, assuming that GRAM and PBS do work (since some jobs succeed), then you can't really submit that fast. So the same thing would happen, but slower. Unfortunately, in the PBS case, there's not much that can be done but to throttle until no more jobs than good nodes are being run at one time. Now, there is the probing part, which makes the system start with a lower throttle which increases until problems appear. If this is disabled (as it was in the ModDyn run), large numbers of parallel jobs will be submitted causing a large number of failures. So this whole thing is close to a linear system with negative feedback. If the initial state is very far away from stability, there will be large transients. You're more than welcome to study how to make it converge faster, or how to guess the initial state better (knowing the number of nodes a cluster has would be a step). > From iraicu at cs.uchicago.edu Mon Aug 27 13:19:31 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 27 Aug 2007 13:19:31 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> Message-ID: <46D315B3.9030605@cs.uchicago.edu> Yes it is, it is VERY different! With GRAM/PBS, although the failed job only takes 10ms to fail, there is about a 1 sec overhead to submit the job and get the error code. In Falkon, the overhead is about 20ms. Also, in the time that the 1 node was faulty (~30 sec), Falkon can submit and return about 1000 failed tasks, while GRAM/PBS could only do about 15~30 failed jobs. The fact that Falkon's submit/execute throughput is 2 orders of magnitude higher than GRAM/PBS is what makes is different, and hence needs to be handled different. Ioan Ben Clifford wrote: > On Mon, 27 Aug 2007, Ioan Raicu wrote: > > >> On a similar note, IMO, the heuristic in Karajan should be modified to take >> into account the task execution time of the failed or successful task, and not >> just the number of tasks. This would ensure that Swift is not throttling task >> submission to Falkon when there are 1000s of successful tasks that take on the >> order of 100s of second to complete, yet there are also 1000s of failed tasks >> that are only 10 ms long. This is exactly the case with MolDyn, when we get a >> bad node in a bunch of 100s of nodes, which ends up throttling the number of >> active and running tasks to about 100, regardless of the number of processors >> Falkon has. >> > > Is that different from when submitting to PBS or GRAM where there are > 1000s of successful tasks taking 100s of seconds to complete but with > 1000s of failed tasks that are only 10ms long? > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Mon Aug 27 13:25:30 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 27 Aug 2007 13:25:30 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: <1188238079.31798.25.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> Message-ID: <46D3171A.9030001@cs.uchicago.edu> The question I am interested in, can you modify the heuristic to take into account the execution time of tasks when updating the site score? I think it is important you use only the execution time (and not Falkon queue time + execution time + result delivery time); in this case, how does Falkon pass this information back to Swift? Ioan Mihael Hategan wrote: > On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: > >> On Mon, 27 Aug 2007, Ioan Raicu wrote: >> >> >>> On a similar note, IMO, the heuristic in Karajan should be modified to take >>> into account the task execution time of the failed or successful task, and not >>> just the number of tasks. This would ensure that Swift is not throttling task >>> submission to Falkon when there are 1000s of successful tasks that take on the >>> order of 100s of second to complete, yet there are also 1000s of failed tasks >>> that are only 10 ms long. This is exactly the case with MolDyn, when we get a >>> bad node in a bunch of 100s of nodes, which ends up throttling the number of >>> active and running tasks to about 100, regardless of the number of processors >>> Falkon has. >>> >> Is that different from when submitting to PBS or GRAM where there are >> 1000s of successful tasks taking 100s of seconds to complete but with >> 1000s of failed tasks that are only 10ms long? >> > > In your scenario, assuming that GRAM and PBS do work (since some jobs > succeed), then you can't really submit that fast. So the same thing > would happen, but slower. Unfortunately, in the PBS case, there's not > much that can be done but to throttle until no more jobs than good nodes > are being run at one time. > > Now, there is the probing part, which makes the system start with a > lower throttle which increases until problems appear. If this is > disabled (as it was in the ModDyn run), large numbers of parallel jobs > will be submitted causing a large number of failures. > > So this whole thing is close to a linear system with negative feedback. > If the initial state is very far away from stability, there will be > large transients. You're more than welcome to study how to make it > converge faster, or how to guess the initial state better (knowing the > number of nodes a cluster has would be a step). > > > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Aug 27 13:47:46 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 27 Aug 2007 13:47:46 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46D3171A.9030001@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> Message-ID: <1188240466.1493.9.camel@blabla.mcs.anl.gov> On Mon, 2007-08-27 at 13:25 -0500, Ioan Raicu wrote: > The question I am interested in, can you modify the heuristic to take > into account the execution time of tasks when updating the site score? I thought I mentioned I can. > I think it is important you use only the execution time (and not > Falkon queue time + execution time + result delivery time); in this > case, how does Falkon pass this information back to Swift? I thought I mentioned why that's not a good idea. Here's a short version: If Falkon is slow for some reason, that needs to be taken into account. Excluding it from measurements under the assumption that it will always be fast is not a particularly good idea. And if it is always fast then it doesn't matter much since it won't add much overhead. > > Ioan > > Mihael Hategan wrote: > > On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: > > > > > On Mon, 27 Aug 2007, Ioan Raicu wrote: > > > > > > > > > > On a similar note, IMO, the heuristic in Karajan should be modified to take > > > > into account the task execution time of the failed or successful task, and not > > > > just the number of tasks. This would ensure that Swift is not throttling task > > > > submission to Falkon when there are 1000s of successful tasks that take on the > > > > order of 100s of second to complete, yet there are also 1000s of failed tasks > > > > that are only 10 ms long. This is exactly the case with MolDyn, when we get a > > > > bad node in a bunch of 100s of nodes, which ends up throttling the number of > > > > active and running tasks to about 100, regardless of the number of processors > > > > Falkon has. > > > > > > > Is that different from when submitting to PBS or GRAM where there are > > > 1000s of successful tasks taking 100s of seconds to complete but with > > > 1000s of failed tasks that are only 10ms long? > > > > > > > In your scenario, assuming that GRAM and PBS do work (since some jobs > > succeed), then you can't really submit that fast. So the same thing > > would happen, but slower. 
Unfortunately, in the PBS case, there's not > > much that can be done but to throttle until no more jobs than good nodes > > are being run at one time. > > > > Now, there is the probing part, which makes the system start with a > > lower throttle which increases until problems appear. If this is > > disabled (as it was in the ModDyn run), large numbers of parallel jobs > > will be submitted causing a large number of failures. > > > > So this whole thing is close to a linear system with negative feedback. > > If the initial state is very far away from stability, there will be > > large transients. You're more than welcome to study how to make it > > converge faster, or how to guess the initial state better (knowing the > > number of nodes a cluster has would be a step). > > > > > > > > > > > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ From hategan at mcs.anl.gov Mon Aug 27 13:54:34 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 27 Aug 2007 13:54:34 -0500 Subject: [Swift-devel] language-behaviour/150 broken In-Reply-To: References: Message-ID: <1188240874.2086.2.camel@blabla.mcs.anl.gov> My bad. There was a conflict and what I saw as being the last was the typecast version. Probably because I did a reverse diff when looking at the issue. On Mon, 2007-08-27 at 09:50 +0000, Ben Clifford wrote: > Looks like maybe r1108 caused a regression bug that was previously fixed > in r1050 - RootArrayDataNode expects a java.lang.String for its "prefix" > parameter, but if it is passed a SwiftScript expression the value is not > of that class. > From foster at mcs.anl.gov Mon Aug 27 14:40:41 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Mon, 27 Aug 2007 14:40:41 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! In-Reply-To: <46D3171A.9030001@cs.uchicago.edu> References: <46AF37D9.7000301@mcs.anl.gov> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> Message-ID: <46D328B9.9000503@mcs.anl.gov> It's still not clear to me why Karajan is throttling at all when working with Falkon. I've asked this question before, and I don't recall receiving a satisfactory answer. So far at least, this behavior has just created problems for us. Can we turn it off? Ian. 
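Part of the answer to that question is structural: the scheduler turns each site's score into a cap on how many jobs that site may have in flight, which is how a score dragged down by fast failures pins MolDyn at roughly 100 active tasks no matter how many workers Falkon has free. The exact mapping is not spelled out in this thread; the fragment below is only a schematic of the relationship, with invented names and constants, to show what "turning it off" would amount to:

// Schematic only: deriving allowed concurrency from the site score. The real
// Karajan scheduler's formula and constants are not given in this thread.
public class ThrottleSketch {
    static int allowedConcurrentJobs(double score, boolean throttlingOff, int workerCount) {
        if (throttlingOff) {
            return workerCount;  // "off": up to one outstanding job per Falkon worker
        }
        int base = 100;                                      // illustrative probing baseline
        int allowed = (int) Math.round(base + 10.0 * score); // a negative score shrinks the cap
        return Math.max(1, Math.min(allowed, workerCount));
    }
}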
Ioan Raicu wrote: > The question I am interested in, can you modify the heuristic to take > into account the execution time of tasks when updating the site > score? I think it is important you use only the execution time (and > not Falkon queue time + execution time + result delivery time); in > this case, how does Falkon pass this information back to Swift? > > Ioan > > Mihael Hategan wrote: >> On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: >> >>> On Mon, 27 Aug 2007, Ioan Raicu wrote: >>> >>> >>>> On a similar note, IMO, the heuristic in Karajan should be modified to take >>>> into account the task execution time of the failed or successful task, and not >>>> just the number of tasks. This would ensure that Swift is not throttling task >>>> submission to Falkon when there are 1000s of successful tasks that take on the >>>> order of 100s of second to complete, yet there are also 1000s of failed tasks >>>> that are only 10 ms long. This is exactly the case with MolDyn, when we get a >>>> bad node in a bunch of 100s of nodes, which ends up throttling the number of >>>> active and running tasks to about 100, regardless of the number of processors >>>> Falkon has. >>>> >>> Is that different from when submitting to PBS or GRAM where there are >>> 1000s of successful tasks taking 100s of seconds to complete but with >>> 1000s of failed tasks that are only 10ms long? >>> >> >> In your scenario, assuming that GRAM and PBS do work (since some jobs >> succeed), then you can't really submit that fast. So the same thing >> would happen, but slower. Unfortunately, in the PBS case, there's not >> much that can be done but to throttle until no more jobs than good nodes >> are being run at one time. >> >> Now, there is the probing part, which makes the system start with a >> lower throttle which increases until problems appear. If this is >> disabled (as it was in the ModDyn run), large numbers of parallel jobs >> will be submitted causing a large number of failures. >> >> So this whole thing is close to a linear system with negative feedback. >> If the initial state is very far away from stability, there will be >> large transients. You're more than welcome to study how to make it >> converge faster, or how to guess the initial state better (knowing the >> number of nodes a cluster has would be a step). >> >> >> >> >> > > -- > ============================================ > Ioan Raicu > Ph.D. Student > ============================================ > Distributed Systems Laboratory > Computer Science Department > University of Chicago > 1100 E. 58th Street, Ryerson Hall > Chicago, IL 60637 > ============================================ > Email: iraicu at cs.uchicago.edu > Web: http://www.cs.uchicago.edu/~iraicu > http://dsl.cs.uchicago.edu/ > ============================================ > ============================================ -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Mon Aug 27 15:04:00 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 27 Aug 2007 15:04:00 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: <46D328B9.9000503@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <46D328B9.9000503@mcs.anl.gov> Message-ID: <1188245040.4490.19.camel@blabla.mcs.anl.gov> On Mon, 2007-08-27 at 14:40 -0500, Ian Foster wrote: > It's still not clear to me why Karajan is throttling at all when > working with Falkon. I've asked this question before, and I don't > recall receiving a satisfactory answer. So far at least, this behavior > has just created problems for us. The suggestion that throttling has created "just" problems for us is, I'd say, misleading and unnecessary. We're discussing exactly the issue that better (not necessarily more) throttling is needed in order to prevent the workflow from failing badly. > Can we turn it off? Sure. I mentioned how to do that. Perhaps there should be an "off" option for each throttling configuration. I'll see to that. Mihael > > Ian. > > Ioan Raicu wrote: > > The question I am interested in, can you modify the heuristic to > > take into account the execution time of tasks when updating the site > > score? I think it is important you use only the execution time (and > > not Falkon queue time + execution time + result delivery time); in > > this case, how does Falkon pass this information back to Swift? > > > > Ioan > > > > Mihael Hategan wrote: > > > On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: > > > > > > > On Mon, 27 Aug 2007, Ioan Raicu wrote: > > > > > > > > > > > > > On a similar note, IMO, the heuristic in Karajan should be modified to take > > > > > into account the task execution time of the failed or successful task, and not > > > > > just the number of tasks. This would ensure that Swift is not throttling task > > > > > submission to Falkon when there are 1000s of successful tasks that take on the > > > > > order of 100s of second to complete, yet there are also 1000s of failed tasks > > > > > that are only 10 ms long. This is exactly the case with MolDyn, when we get a > > > > > bad node in a bunch of 100s of nodes, which ends up throttling the number of > > > > > active and running tasks to about 100, regardless of the number of processors > > > > > Falkon has. > > > > > > > > > Is that different from when submitting to PBS or GRAM where there are > > > > 1000s of successful tasks taking 100s of seconds to complete but with > > > > 1000s of failed tasks that are only 10ms long? > > > > > > > > > > In your scenario, assuming that GRAM and PBS do work (since some jobs > > > succeed), then you can't really submit that fast. So the same thing > > > would happen, but slower. Unfortunately, in the PBS case, there's not > > > much that can be done but to throttle until no more jobs than good nodes > > > are being run at one time. 
> > > > > > Now, there is the probing part, which makes the system start with a > > > lower throttle which increases until problems appear. If this is > > > disabled (as it was in the ModDyn run), large numbers of parallel jobs > > > will be submitted causing a large number of failures. > > > > > > So this whole thing is close to a linear system with negative feedback. > > > If the initial state is very far away from stability, there will be > > > large transients. You're more than welcome to study how to make it > > > converge faster, or how to guess the initial state better (knowing the > > > number of nodes a cluster has would be a step). > > > > > > > > > > > > > > > > > > > -- > > ============================================ > > Ioan Raicu > > Ph.D. Student > > ============================================ > > Distributed Systems Laboratory > > Computer Science Department > > University of Chicago > > 1100 E. 58th Street, Ryerson Hall > > Chicago, IL 60637 > > ============================================ > > Email: iraicu at cs.uchicago.edu > > Web: http://www.cs.uchicago.edu/~iraicu > > http://dsl.cs.uchicago.edu/ > > ============================================ > > ============================================ > > -- > > Ian Foster, Director, Computation Institute > Argonne National Laboratory & University of Chicago > Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 > Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 > Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. > Globus Alliance: www.globus.org. From wilde at mcs.anl.gov Mon Aug 27 15:07:38 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 27 Aug 2007 15:07:38 -0500 Subject: [Swift-devel] Request for control over throttle algorithm In-Reply-To: <1188240466.1493.9.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> Message-ID: <46D32F0A.3010400@mcs.anl.gov> [changing subject line to start a new thread] Mihael, all, I'm observing again that Karajan job throttling algorithms need more discussion, design and testing, and that in the meantime - and perhaps always - we need simple ways to override the algorithms and manually control the throttle. This is true for throttling both successful and failing jobs. Right now MolDyn progress is being impeded by a situation where a single bad cluster node (with stale FS file handles) has an unduly negative impact on overall workflow performance. I feel that before we discuss and work on the nuances of throttling algorithms (which will take some time to perfect) we should provide a simple and reliable way for the user to override the default heuristics and achieve good performance in situations that are currently occurring. 
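Concretely, that kind of override maps onto swift.properties. The keys below are the throttling and retry properties as documented in later Swift user guides, so they may not all exist in this 0.2-era tree, and at the time of this thread an explicit "off" value was not yet implemented, which is why "virtually off" meant very large numbers:

# Hedged example only: property names from later Swift documentation; values are
# chosen to make the client-side throttles effectively inactive for one Falkon site.

# retry each failed job more times before failing the whole workflow
execution.retries=10

# effectively remove the score-based cap on concurrently running jobs
throttle.score.job.factor=10000

# allow many concurrent submissions, overall and per host
throttle.submit=1000
throttle.host.submit=1000

# loosen the data-movement throttles as well
throttle.transfers=100
throttle.file.operations=100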
How much work it would take to provide a config parameter that causes failed jobs to get retried immediately with no delay or scheduling penalty? I.e., let the user set the "failure penalty" ratio to reduce or eliminate the penalty for failures. Its possible that once we have this control, we'd need a few other parameters to make reasonable things happen in the case of running on one or more Falkon sites. In tandem with this, Falkon will provide parameters to control what happens to a node after a failure: - a failure analyzer will attempt to recognize node failures as opposed to app failures (some of this may need to go into the Swift launcher, wrapper.sh - on known node failures Falkon will log the failure to bring to sysadmin attention, and will also leave the node held - In the future falcon will add new nodes to compensate for nodes that it has disabled. I'd like to ask that we focus discussion on what is needed to design and implement these basic changes, and whether they would solve the current problems and be useful in general. - Mike Mihael Hategan wrote: > On Mon, 2007-08-27 at 13:25 -0500, Ioan Raicu wrote: >> The question I am interested in, can you modify the heuristic to take >> into account the execution time of tasks when updating the site score? > > I thought I mentioned I can. > >> I think it is important you use only the execution time (and not >> Falkon queue time + execution time + result delivery time); in this >> case, how does Falkon pass this information back to Swift? > > I thought I mentioned why that's not a good idea. Here's a short > version: > If Falkon is slow for some reason, that needs to be taken into account. > Excluding it from measurements under the assumption that it will always > be fast is not a particularly good idea. And if it is always fast then > it doesn't matter much since it won't add much overhead. > >> Ioan >> >> Mihael Hategan wrote: >>> On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: >>> >>>> On Mon, 27 Aug 2007, Ioan Raicu wrote: >>>> >>>> >>>>> On a similar note, IMO, the heuristic in Karajan should be modified to take >>>>> into account the task execution time of the failed or successful task, and not >>>>> just the number of tasks. This would ensure that Swift is not throttling task >>>>> submission to Falkon when there are 1000s of successful tasks that take on the >>>>> order of 100s of second to complete, yet there are also 1000s of failed tasks >>>>> that are only 10 ms long. This is exactly the case with MolDyn, when we get a >>>>> bad node in a bunch of 100s of nodes, which ends up throttling the number of >>>>> active and running tasks to about 100, regardless of the number of processors >>>>> Falkon has. >>>>> >>>> Is that different from when submitting to PBS or GRAM where there are >>>> 1000s of successful tasks taking 100s of seconds to complete but with >>>> 1000s of failed tasks that are only 10ms long? >>>> >>> In your scenario, assuming that GRAM and PBS do work (since some jobs >>> succeed), then you can't really submit that fast. So the same thing >>> would happen, but slower. Unfortunately, in the PBS case, there's not >>> much that can be done but to throttle until no more jobs than good nodes >>> are being run at one time. >>> >>> Now, there is the probing part, which makes the system start with a >>> lower throttle which increases until problems appear. If this is >>> disabled (as it was in the ModDyn run), large numbers of parallel jobs >>> will be submitted causing a large number of failures. 
>>> >>> So this whole thing is close to a linear system with negative feedback. >>> If the initial state is very far away from stability, there will be >>> large transients. You're more than welcome to study how to make it >>> converge faster, or how to guess the initial state better (knowing the >>> number of nodes a cluster has would be a step). >>> >>> >>> >>> >>> >> -- >> ============================================ >> Ioan Raicu >> Ph.D. Student >> ============================================ >> Distributed Systems Laboratory >> Computer Science Department >> University of Chicago >> 1100 E. 58th Street, Ryerson Hall >> Chicago, IL 60637 >> ============================================ >> Email: iraicu at cs.uchicago.edu >> Web: http://www.cs.uchicago.edu/~iraicu >> http://dsl.cs.uchicago.edu/ >> ============================================ >> ============================================ > > From hategan at mcs.anl.gov Mon Aug 27 15:34:41 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 27 Aug 2007 15:34:41 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <46D32F0A.3010400@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> Message-ID: <1188246881.5795.13.camel@blabla.mcs.anl.gov> On Mon, 2007-08-27 at 15:07 -0500, Michael Wilde wrote: > [changing subject line to start a new thread] > > Mihael, all, > > I'm observing again that Karajan job throttling algorithms need more > discussion, design and testing, and that in the meantime - and perhaps > always - we need simple ways to override the algorithms and manually > control the throttle. Here's what happens: 1. somebody says "I don't like throttling because it decreases the performance" (that's what throttles do, in order to make things not fail) 2. we collectively conclude that we should disable throttling 3. there are options to change those in swift.properties (and one in scheduler.xml which I will also add to swift.properties), and they are increased to "virtually off" numbers (I need to add an explicit "off" to make things easier) 4. the workflows still don't work very well because there are lots of failures now, and quality drops 5. throttles are set back to reasonable values 6. maybe some things are changed (i.e. gram -> falkon), but fundamentally the problems are the same (different scales though) 7. GOTO 1 > > This is true for throttling both successful and failing jobs. > > Right now MolDyn progress is being impeded by a situation where a single > bad cluster node (with stale FS file handles) has an unduly negative > impact on overall workflow performance. Yes. And this is how things work. There are problems. It's a statement of fact. 
> > I feel that before we discuss and work on the nuances of throttling > algorithms (which will take some time to perfect) we should provide a > simple and reliable way for the user to override the default heuristics > and achieve good performance in situations that are currently occurring. Groovy. Would the above (all throttling parameters in swift.properties and the "off" option for each) work? > > How much work it would take to provide a config parameter that causes > failed jobs to get retried immediately with no delay or scheduling > penalty? I.e., let the user set the "failure penalty" ratio to reduce or > eliminate the penalty for failures. I'd suggest simply not throttling on such things. There can also be an option for tweaking the factors, but I have at least one small adversion towards having too many things in swift.properties. Mihael > > Its possible that once we have this control, we'd need a few other > parameters to make reasonable things happen in the case of running on > one or more Falkon sites. > > In tandem with this, Falkon will provide parameters to control what > happens to a node after a failure: > - a failure analyzer will attempt to recognize node failures as opposed > to app failures (some of this may need to go into the Swift launcher, > wrapper.sh > - on known node failures Falkon will log the failure to bring to > sysadmin attention, and will also leave the node held > - In the future falcon will add new nodes to compensate for nodes that > it has disabled. > > I'd like to ask that we focus discussion on what is needed to design and > implement these basic changes, and whether they would solve the current > problems and be useful in general. > > - Mike > > > > > > Mihael Hategan wrote: > > On Mon, 2007-08-27 at 13:25 -0500, Ioan Raicu wrote: > >> The question I am interested in, can you modify the heuristic to take > >> into account the execution time of tasks when updating the site score? > > > > I thought I mentioned I can. > > > >> I think it is important you use only the execution time (and not > >> Falkon queue time + execution time + result delivery time); in this > >> case, how does Falkon pass this information back to Swift? > > > > I thought I mentioned why that's not a good idea. Here's a short > > version: > > If Falkon is slow for some reason, that needs to be taken into account. > > Excluding it from measurements under the assumption that it will always > > be fast is not a particularly good idea. And if it is always fast then > > it doesn't matter much since it won't add much overhead. > > > >> Ioan > >> > >> Mihael Hategan wrote: > >>> On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: > >>> > >>>> On Mon, 27 Aug 2007, Ioan Raicu wrote: > >>>> > >>>> > >>>>> On a similar note, IMO, the heuristic in Karajan should be modified to take > >>>>> into account the task execution time of the failed or successful task, and not > >>>>> just the number of tasks. This would ensure that Swift is not throttling task > >>>>> submission to Falkon when there are 1000s of successful tasks that take on the > >>>>> order of 100s of second to complete, yet there are also 1000s of failed tasks > >>>>> that are only 10 ms long. This is exactly the case with MolDyn, when we get a > >>>>> bad node in a bunch of 100s of nodes, which ends up throttling the number of > >>>>> active and running tasks to about 100, regardless of the number of processors > >>>>> Falkon has. 
> >>>>> > >>>> Is that different from when submitting to PBS or GRAM where there are > >>>> 1000s of successful tasks taking 100s of seconds to complete but with > >>>> 1000s of failed tasks that are only 10ms long? > >>>> > >>> In your scenario, assuming that GRAM and PBS do work (since some jobs > >>> succeed), then you can't really submit that fast. So the same thing > >>> would happen, but slower. Unfortunately, in the PBS case, there's not > >>> much that can be done but to throttle until no more jobs than good nodes > >>> are being run at one time. > >>> > >>> Now, there is the probing part, which makes the system start with a > >>> lower throttle which increases until problems appear. If this is > >>> disabled (as it was in the ModDyn run), large numbers of parallel jobs > >>> will be submitted causing a large number of failures. > >>> > >>> So this whole thing is close to a linear system with negative feedback. > >>> If the initial state is very far away from stability, there will be > >>> large transients. You're more than welcome to study how to make it > >>> converge faster, or how to guess the initial state better (knowing the > >>> number of nodes a cluster has would be a step). > >>> > >>> > >>> > >>> > >>> > >> -- > >> ============================================ > >> Ioan Raicu > >> Ph.D. Student > >> ============================================ > >> Distributed Systems Laboratory > >> Computer Science Department > >> University of Chicago > >> 1100 E. 58th Street, Ryerson Hall > >> Chicago, IL 60637 > >> ============================================ > >> Email: iraicu at cs.uchicago.edu > >> Web: http://www.cs.uchicago.edu/~iraicu > >> http://dsl.cs.uchicago.edu/ > >> ============================================ > >> ============================================ > > > > > From benc at hawaga.org.uk Mon Aug 27 16:12:05 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 21:12:05 +0000 (GMT) Subject: [Swift-devel] language-behaviour/150 broken In-Reply-To: <1188240874.2086.2.camel@blabla.mcs.anl.gov> References: <1188240874.2086.2.camel@blabla.mcs.anl.gov> Message-ID: ok. that matches up sanely with what I thought happened. On Mon, 27 Aug 2007, Mihael Hategan wrote: > My bad. There was a conflict and what I saw as being the last was the > typecast version. Probably because I did a reverse diff when looking at > the issue. > > On Mon, 2007-08-27 at 09:50 +0000, Ben Clifford wrote: > > Looks like maybe r1108 caused a regression bug that was previously fixed > > in r1050 - RootArrayDataNode expects a java.lang.String for its "prefix" > > parameter, but if it is passed a SwiftScript expression the value is not > > of that class. 
> > > > From wilde at mcs.anl.gov Mon Aug 27 16:15:20 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Mon, 27 Aug 2007 16:15:20 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <1188246881.5795.13.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> Message-ID: <46D33EE8.9040900@mcs.anl.gov> Mihael Hategan wrote: > On Mon, 2007-08-27 at 15:07 -0500, Michael Wilde wrote: >> [changing subject line to start a new thread] >> >> Mihael, all, >> >> I'm observing again that Karajan job throttling algorithms need more >> discussion, design and testing, and that in the meantime - and perhaps >> always - we need simple ways to override the algorithms and manually >> control the throttle. > > Here's what happens: > 1. somebody says "I don't like throttling because it decreases the > performance" (that's what throttles do, in order to make things not > fail) No. What was said was: We are trying to get a workflow running for a real science user - on whose success we depend on. And in the process of doing that, the current obstacle to good performance is a failure-retry behavior that is not working well. > 2. we collectively conclude that we should disable throttling Several of us believe that in *this* case it will enable the workflow to *finally* succeed and will also yield better performance. Note that the default settings do not even let the workflow complete successully. > 3. there are options to change those in swift.properties (and one in > scheduler.xml which I will also add to swift.properties), and they are > increased to "virtually off" numbers (I need to add an explicit "off" to > make things easier) This is great - just what we need. But I think Ioan cant find the prior email in which you describe them, and I couldnt either. Could you re-state what to set, please? > 4. the workflows still don't work very well because there are lots of > failures now, and quality drops That would be a different scenario. In this case, Ioan will try to take the offending node(s) out of service as seen by Falkon. > 5. throttles are set back to reasonable values Yes, thats the goal. I believe that automated failure handling is difficult and takes a while - lots of design, measurement, test, improve - before they work well. Certainly the internet and TCP/IP teaches us that. Critical, necessary, but a long road. > 6. maybe some things are changed (i.e. gram -> falkon), but > fundamentally the problems are the same (different scales though) > 7. GOTO 1 Yes, as often as needed. Its iteration, but not endless, if done thoughtfully. > >> This is true for throttling both successful and failing jobs. I agree. 
>> >> Right now MolDyn progress is being impeded by a situation where a single >> bad cluster node (with stale FS file handles) has an unduly negative >> impact on overall workflow performance. > > Yes. And this is how things work. There are problems. It's a statement > of fact. > >> I feel that before we discuss and work on the nuances of throttling >> algorithms (which will take some time to perfect) we should provide a >> simple and reliable way for the user to override the default heuristics >> and achieve good performance in situations that are currently occurring. > > Groovy. Would the above (all throttling parameters in swift.properties > and the "off" option for each) work? Yes, I think so - again, please (re)re-iterate what they are, please. :) > >> How much work it would take to provide a config parameter that causes >> failed jobs to get retried immediately with no delay or scheduling >> penalty? I.e., let the user set the "failure penalty" ratio to reduce or >> eliminate the penalty for failures. > > I'd suggest simply not throttling on such things. Agreed. Cool. > > There can also be an option for tweaking the factors, but I have at > least one small adversion towards having too many things in > swift.properties. Sounds reasonable. Lets start with the basics. Now, having said all this - perhaps Ioan can catch and retry the failure all in falkon. Is wrapper.sh capable of getting re-run on a different node of the same cluster? (If not I think we can enance it to be). Thanks, Mike > > Mihael > >> Its possible that once we have this control, we'd need a few other >> parameters to make reasonable things happen in the case of running on >> one or more Falkon sites. >> >> In tandem with this, Falkon will provide parameters to control what >> happens to a node after a failure: >> - a failure analyzer will attempt to recognize node failures as opposed >> to app failures (some of this may need to go into the Swift launcher, >> wrapper.sh >> - on known node failures Falkon will log the failure to bring to >> sysadmin attention, and will also leave the node held >> - In the future falcon will add new nodes to compensate for nodes that >> it has disabled. >> >> I'd like to ask that we focus discussion on what is needed to design and >> implement these basic changes, and whether they would solve the current >> problems and be useful in general. >> >> - Mike >> >> >> >> >> >> Mihael Hategan wrote: >>> On Mon, 2007-08-27 at 13:25 -0500, Ioan Raicu wrote: >>>> The question I am interested in, can you modify the heuristic to take >>>> into account the execution time of tasks when updating the site score? >>> I thought I mentioned I can. >>> >>>> I think it is important you use only the execution time (and not >>>> Falkon queue time + execution time + result delivery time); in this >>>> case, how does Falkon pass this information back to Swift? >>> I thought I mentioned why that's not a good idea. Here's a short >>> version: >>> If Falkon is slow for some reason, that needs to be taken into account. >>> Excluding it from measurements under the assumption that it will always >>> be fast is not a particularly good idea. And if it is always fast then >>> it doesn't matter much since it won't add much overhead. 
>>> >>>> Ioan >>>> >>>> Mihael Hategan wrote: >>>>> On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: >>>>> >>>>>> On Mon, 27 Aug 2007, Ioan Raicu wrote: >>>>>> >>>>>> >>>>>>> On a similar note, IMO, the heuristic in Karajan should be modified to take >>>>>>> into account the task execution time of the failed or successful task, and not >>>>>>> just the number of tasks. This would ensure that Swift is not throttling task >>>>>>> submission to Falkon when there are 1000s of successful tasks that take on the >>>>>>> order of 100s of second to complete, yet there are also 1000s of failed tasks >>>>>>> that are only 10 ms long. This is exactly the case with MolDyn, when we get a >>>>>>> bad node in a bunch of 100s of nodes, which ends up throttling the number of >>>>>>> active and running tasks to about 100, regardless of the number of processors >>>>>>> Falkon has. >>>>>>> >>>>>> Is that different from when submitting to PBS or GRAM where there are >>>>>> 1000s of successful tasks taking 100s of seconds to complete but with >>>>>> 1000s of failed tasks that are only 10ms long? >>>>>> >>>>> In your scenario, assuming that GRAM and PBS do work (since some jobs >>>>> succeed), then you can't really submit that fast. So the same thing >>>>> would happen, but slower. Unfortunately, in the PBS case, there's not >>>>> much that can be done but to throttle until no more jobs than good nodes >>>>> are being run at one time. >>>>> >>>>> Now, there is the probing part, which makes the system start with a >>>>> lower throttle which increases until problems appear. If this is >>>>> disabled (as it was in the ModDyn run), large numbers of parallel jobs >>>>> will be submitted causing a large number of failures. >>>>> >>>>> So this whole thing is close to a linear system with negative feedback. >>>>> If the initial state is very far away from stability, there will be >>>>> large transients. You're more than welcome to study how to make it >>>>> converge faster, or how to guess the initial state better (knowing the >>>>> number of nodes a cluster has would be a step). >>>>> >>>>> >>>>> >>>>> >>>>> >>>> -- >>>> ============================================ >>>> Ioan Raicu >>>> Ph.D. Student >>>> ============================================ >>>> Distributed Systems Laboratory >>>> Computer Science Department >>>> University of Chicago >>>> 1100 E. 58th Street, Ryerson Hall >>>> Chicago, IL 60637 >>>> ============================================ >>>> Email: iraicu at cs.uchicago.edu >>>> Web: http://www.cs.uchicago.edu/~iraicu >>>> http://dsl.cs.uchicago.edu/ >>>> ============================================ >>>> ============================================ >>> > > From iraicu at cs.uchicago.edu Mon Aug 27 16:20:42 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Mon, 27 Aug 2007 16:20:42 -0500 Subject: [Swift-devel] Re: 244 MolDyn run was successful! 
In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46BC950E.4080503@cs.uchicago.edu> <46BCBBB0.70705@cs.uchicago.edu> <5DC342F2-74C9-4D7F-B93E-51AEE08A0C03@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchicago.edu> <1187069175.5653.30.camel@blabla.mcs.anl.gov> <46C48200.4020503@cs.uchicago.edu> Message-ID: <46D3402A.600@cs.uchicago.edu> Right, I have been working out of SVN for a while now, just haven't done many commits... I will try to get my latest Falkon (not provider as this is currently up to date) changes in today into SVN! Ioan Ben Clifford wrote: > On Thu, 16 Aug 2007, Ioan Raicu wrote: > > >> (if the consensus is that you want the latest changes in SVN, then >> perhaps I can do this on a weekly basis) >> > > One model for what you should put in the trunk of the SVN is code that you > think works well enough for a user to be making regular use of for their > work (eg Nika). approximately equivalently, you shouldn't be pointing > users to anything other than the SVN trunk (or some snapshot of trunk)) to > obtain code. > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From foster at mcs.anl.gov Mon Aug 27 16:21:12 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Mon, 27 Aug 2007 16:21:12 -0500 Subject: [Swift-devel] Request for control over throttle algorithm In-Reply-To: <46D32F0A.3010400@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BCC442.1080500@cs.uchicago.edu> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> Message-ID: <46D34048.8010509@mcs.anl.gov> Yes, well put. I appreciate how important throttling is in many circumstances, and the care and thought that has gone into its design. It's just that "running with a single Falkon-controlled site" is not one of those circumstances where throttling is useful. 
It's a special case, certainly, but an important one at present. Ian. Michael Wilde wrote: > [changing subject line to start a new thread] > > Mihael, all, > > I'm observing again that Karajan job throttling algorithms need more > discussion, design and testing, and that in the meantime - and perhaps > always - we need simple ways to override the algorithms and manually > control the throttle. > > This is true for throttling both successful and failing jobs. > > Right now MolDyn progress is being impeded by a situation where a > single bad cluster node (with stale FS file handles) has an unduly > negative impact on overall workflow performance. > > I feel that before we discuss and work on the nuances of throttling > algorithms (which will take some time to perfect) we should provide a > simple and reliable way for the user to override the default > heuristics and achieve good performance in situations that are > currently occurring. > > How much work it would take to provide a config parameter that causes > failed jobs to get retried immediately with no delay or scheduling > penalty? I.e., let the user set the "failure penalty" ratio to reduce > or eliminate the penalty for failures. > > Its possible that once we have this control, we'd need a few other > parameters to make reasonable things happen in the case of running on > one or more Falkon sites. > > In tandem with this, Falkon will provide parameters to control what > happens to a node after a failure: > - a failure analyzer will attempt to recognize node failures as > opposed to app failures (some of this may need to go into the Swift > launcher, wrapper.sh > - on known node failures Falkon will log the failure to bring to > sysadmin attention, and will also leave the node held > - In the future falcon will add new nodes to compensate for nodes that > it has disabled. > > I'd like to ask that we focus discussion on what is needed to design > and implement these basic changes, and whether they would solve the > current problems and be useful in general. > > - Mike > > > > > > Mihael Hategan wrote: >> On Mon, 2007-08-27 at 13:25 -0500, Ioan Raicu wrote: >>> The question I am interested in, can you modify the heuristic to take >>> into account the execution time of tasks when updating the site score? >> >> I thought I mentioned I can. >> >>> I think it is important you use only the execution time (and not >>> Falkon queue time + execution time + result delivery time); in this >>> case, how does Falkon pass this information back to Swift? >> >> I thought I mentioned why that's not a good idea. Here's a short >> version: >> If Falkon is slow for some reason, that needs to be taken into account. >> Excluding it from measurements under the assumption that it will always >> be fast is not a particularly good idea. And if it is always fast then >> it doesn't matter much since it won't add much overhead. >> >>> Ioan >>> >>> Mihael Hategan wrote: >>>> On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: >>>> >>>>> On Mon, 27 Aug 2007, Ioan Raicu wrote: >>>>> >>>>> >>>>>> On a similar note, IMO, the heuristic in Karajan should be >>>>>> modified to take >>>>>> into account the task execution time of the failed or successful >>>>>> task, and not >>>>>> just the number of tasks. 
This would ensure that Swift is not >>>>>> throttling task >>>>>> submission to Falkon when there are 1000s of successful tasks >>>>>> that take on the >>>>>> order of 100s of second to complete, yet there are also 1000s of >>>>>> failed tasks >>>>>> that are only 10 ms long. This is exactly the case with MolDyn, >>>>>> when we get a >>>>>> bad node in a bunch of 100s of nodes, which ends up throttling >>>>>> the number of >>>>>> active and running tasks to about 100, regardless of the number >>>>>> of processors >>>>>> Falkon has. >>>>> Is that different from when submitting to PBS or GRAM where there >>>>> are 1000s of successful tasks taking 100s of seconds to complete >>>>> but with 1000s of failed tasks that are only 10ms long? >>>>> >>>> In your scenario, assuming that GRAM and PBS do work (since some jobs >>>> succeed), then you can't really submit that fast. So the same thing >>>> would happen, but slower. Unfortunately, in the PBS case, there's not >>>> much that can be done but to throttle until no more jobs than good >>>> nodes >>>> are being run at one time. >>>> >>>> Now, there is the probing part, which makes the system start with a >>>> lower throttle which increases until problems appear. If this is >>>> disabled (as it was in the ModDyn run), large numbers of parallel jobs >>>> will be submitted causing a large number of failures. >>>> >>>> So this whole thing is close to a linear system with negative >>>> feedback. >>>> If the initial state is very far away from stability, there will be >>>> large transients. You're more than welcome to study how to make it >>>> converge faster, or how to guess the initial state better (knowing the >>>> number of nodes a cluster has would be a step). >>>> >>>> >>>> >>>> >>> -- >>> ============================================ >>> Ioan Raicu >>> Ph.D. Student >>> ============================================ >>> Distributed Systems Laboratory >>> Computer Science Department >>> University of Chicago >>> 1100 E. 58th Street, Ryerson Hall >>> Chicago, IL 60637 >>> ============================================ >>> Email: iraicu at cs.uchicago.edu >>> Web: http://www.cs.uchicago.edu/~iraicu >>> http://dsl.cs.uchicago.edu/ >>> ============================================ >>> ============================================ >> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. 
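A minimal sketch of the node-vs-application failure check proposed above, assuming it would run near the top of the Swift launcher (wrapper.sh); the WORKDIR variable and the exit code 70 are illustrative placeholders, not the launcher's real interface:

# probe the shared filesystem before running the app, so that a stale NFS handle
# or a missing mount is reported as a node failure rather than an application failure
probe="$WORKDIR/.nodecheck.$$"
if ! touch "$probe" 2>/dev/null; then
    echo "node failure: cannot write to $WORKDIR" >&2
    exit 70   # placeholder code the service could map to "hold this node, alert the sysadmin"
fi
rm -f "$probe"

With a distinguished exit code like this, Falkon's failure analyzer would only have to inspect the code to decide between holding the node and reporting an application error.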
From benc at hawaga.org.uk Mon Aug 27 16:38:32 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 21:38:32 +0000 (GMT) Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <46D33EE8.9040900@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> Message-ID: On Mon, 27 Aug 2007, Michael Wilde wrote: > > 3. there are options to change those in swift.properties (and one in > > scheduler.xml which I will also add to swift.properties), and they are > > increased to "virtually off" numbers (I need to add an explicit "off" to > > make things easier) > > This is great - just what we need. But I think Ioan cant find the prior email > in which you describe them, and I couldnt either. Could you re-state what to > set, please? or, for long term preservation, make the user guide such that it is apparent. -- From hategan at mcs.anl.gov Mon Aug 27 16:55:33 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 27 Aug 2007 16:55:33 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <46D33EE8.9040900@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> Message-ID: <1188251733.9813.33.camel@blabla.mcs.anl.gov> On Mon, 2007-08-27 at 16:15 -0500, Michael Wilde wrote: > Mihael Hategan wrote: > > On Mon, 2007-08-27 at 15:07 -0500, Michael Wilde wrote: > >> [changing subject line to start a new thread] > >> > >> Mihael, all, > >> > >> I'm observing again that Karajan job throttling algorithms need more > >> discussion, design and testing, and that in the meantime - and perhaps > >> always - we need simple ways to override the algorithms and manually > >> control the throttle. > > > > Here's what happens: > > 1. somebody says "I don't like throttling because it decreases the > > performance" (that's what throttles do, in order to make things not > > fail) > > No. 
What was said was: We are trying to get a workflow running for a > real science user - on whose success we depend on. And in the process > of doing that, the current obstacle to good performance is a > failure-retry behavior that is not working well. I think the problem is a misunderstanding about what "good performance" means given the current assumptions. Sub-systems begin to break as more and more performance is requested from them, causing chained failures. Throttling tries to achieve that balance between what's too much and what's too little. Yong and I played with some of the numbers. And some of those approximate that balance. But not everybody is convinced it seems. Which is fine. The reaction has however always been "let's disable throttling". Which is also fine. But only once. > > > 2. we collectively conclude that we should disable throttling > > Several of us believe that in *this* case it will enable the workflow to > *finally* succeed and will also yield better performance. I think that's wrong, and I think the long discussions on the mol-dyn run topic explain why. In short lack of throttling will cause large numbers of failures. But beyond that, please, I'm not trying to stop anybody from disabling these. I've mentioned how, if there are further questions, I'm happy to answer them. > Note that the > default settings do not even let the workflow complete successully. That's only correlation. I highly doubt the throttles are the reason the workflow didn't complete. > > > 3. there are options to change those in swift.properties (and one in > > scheduler.xml which I will also add to swift.properties), and they are > > increased to "virtually off" numbers (I need to add an explicit "off" to > > make things easier) > > This is great - just what we need. But I think Ioan cant find the prior > email in which you describe them, and I couldnt either. Could you > re-state what to set, please? all throttle.* properties to, say 100000021. libexec/scheduler.xml > > > > 4. the workflows still don't work very well because there are lots of > > failures now, and quality drops > > That would be a different scenario. In this case, Ioan will try to take > the offending node(s) out of service as seen by Falkon. Right. Which is a particular case of throttling done because there's better info available (i.e. set throttle to 0 on bad nodes). > > > 5. throttles are set back to reasonable values > > Yes, thats the goal. I believe that automated failure handling is > difficult and takes a while - lots of design, measurement, test, improve > - before they work well. Certainly the internet and TCP/IP teaches us > that. Critical, necessary, but a long road. > > > 6. maybe some things are changed (i.e. gram -> falkon), but > > fundamentally the problems are the same (different scales though) > > 7. GOTO 1 > > Yes, as often as needed. Its iteration, but not endless, if done > thoughtfully. Only if there's any learning. But the conflict between what we think is achievable and what we can achieve seems to remain. That's pretty much the problem: instead of trying to reconcile these, we keep saying that the other side is wrong, and either the other side fails to provide some proof of that or we refuse to listen to (or don't care about) the other side because we *know* we are right. Pretty much like the feedback system described a while ago, but this one has a very low dampening factor. Mihael > > > > >> This is true for throttling both successful and failing jobs. > > I agree. 
> > >> > >> Right now MolDyn progress is being impeded by a situation where a single > >> bad cluster node (with stale FS file handles) has an unduly negative > >> impact on overall workflow performance. > > > > Yes. And this is how things work. There are problems. It's a statement > > of fact. > > > >> I feel that before we discuss and work on the nuances of throttling > >> algorithms (which will take some time to perfect) we should provide a > >> simple and reliable way for the user to override the default heuristics > >> and achieve good performance in situations that are currently occurring. > > > > Groovy. Would the above (all throttling parameters in swift.properties > > and the "off" option for each) work? > > Yes, I think so - again, please (re)re-iterate what they are, please. :) > > > > >> How much work it would take to provide a config parameter that causes > >> failed jobs to get retried immediately with no delay or scheduling > >> penalty? I.e., let the user set the "failure penalty" ratio to reduce or > >> eliminate the penalty for failures. > > > > I'd suggest simply not throttling on such things. > > Agreed. Cool. > > > > > There can also be an option for tweaking the factors, but I have at > > least one small adversion towards having too many things in > > swift.properties. > > Sounds reasonable. Lets start with the basics. > > Now, having said all this - perhaps Ioan can catch and retry the failure > all in falkon. Is wrapper.sh capable of getting re-run on a different > node of the same cluster? (If not I think we can enance it to be). > > Thanks, > > Mike > > > > > Mihael > > > >> Its possible that once we have this control, we'd need a few other > >> parameters to make reasonable things happen in the case of running on > >> one or more Falkon sites. > >> > >> In tandem with this, Falkon will provide parameters to control what > >> happens to a node after a failure: > >> - a failure analyzer will attempt to recognize node failures as opposed > >> to app failures (some of this may need to go into the Swift launcher, > >> wrapper.sh > >> - on known node failures Falkon will log the failure to bring to > >> sysadmin attention, and will also leave the node held > >> - In the future falcon will add new nodes to compensate for nodes that > >> it has disabled. > >> > >> I'd like to ask that we focus discussion on what is needed to design and > >> implement these basic changes, and whether they would solve the current > >> problems and be useful in general. > >> > >> - Mike > >> > >> > >> > >> > >> > >> Mihael Hategan wrote: > >>> On Mon, 2007-08-27 at 13:25 -0500, Ioan Raicu wrote: > >>>> The question I am interested in, can you modify the heuristic to take > >>>> into account the execution time of tasks when updating the site score? > >>> I thought I mentioned I can. > >>> > >>>> I think it is important you use only the execution time (and not > >>>> Falkon queue time + execution time + result delivery time); in this > >>>> case, how does Falkon pass this information back to Swift? > >>> I thought I mentioned why that's not a good idea. Here's a short > >>> version: > >>> If Falkon is slow for some reason, that needs to be taken into account. > >>> Excluding it from measurements under the assumption that it will always > >>> be fast is not a particularly good idea. And if it is always fast then > >>> it doesn't matter much since it won't add much overhead. 
> >>> > >>>> Ioan > >>>> > >>>> Mihael Hategan wrote: > >>>>> On Mon, 2007-08-27 at 17:37 +0000, Ben Clifford wrote: > >>>>> > >>>>>> On Mon, 27 Aug 2007, Ioan Raicu wrote: > >>>>>> > >>>>>> > >>>>>>> On a similar note, IMO, the heuristic in Karajan should be modified to take > >>>>>>> into account the task execution time of the failed or successful task, and not > >>>>>>> just the number of tasks. This would ensure that Swift is not throttling task > >>>>>>> submission to Falkon when there are 1000s of successful tasks that take on the > >>>>>>> order of 100s of second to complete, yet there are also 1000s of failed tasks > >>>>>>> that are only 10 ms long. This is exactly the case with MolDyn, when we get a > >>>>>>> bad node in a bunch of 100s of nodes, which ends up throttling the number of > >>>>>>> active and running tasks to about 100, regardless of the number of processors > >>>>>>> Falkon has. > >>>>>>> > >>>>>> Is that different from when submitting to PBS or GRAM where there are > >>>>>> 1000s of successful tasks taking 100s of seconds to complete but with > >>>>>> 1000s of failed tasks that are only 10ms long? > >>>>>> > >>>>> In your scenario, assuming that GRAM and PBS do work (since some jobs > >>>>> succeed), then you can't really submit that fast. So the same thing > >>>>> would happen, but slower. Unfortunately, in the PBS case, there's not > >>>>> much that can be done but to throttle until no more jobs than good nodes > >>>>> are being run at one time. > >>>>> > >>>>> Now, there is the probing part, which makes the system start with a > >>>>> lower throttle which increases until problems appear. If this is > >>>>> disabled (as it was in the ModDyn run), large numbers of parallel jobs > >>>>> will be submitted causing a large number of failures. > >>>>> > >>>>> So this whole thing is close to a linear system with negative feedback. > >>>>> If the initial state is very far away from stability, there will be > >>>>> large transients. You're more than welcome to study how to make it > >>>>> converge faster, or how to guess the initial state better (knowing the > >>>>> number of nodes a cluster has would be a step). > >>>>> > >>>>> > >>>>> > >>>>> > >>>>> > >>>> -- > >>>> ============================================ > >>>> Ioan Raicu > >>>> Ph.D. Student > >>>> ============================================ > >>>> Distributed Systems Laboratory > >>>> Computer Science Department > >>>> University of Chicago > >>>> 1100 E. 
58th Street, Ryerson Hall > >>>> Chicago, IL 60637 > >>>> ============================================ > >>>> Email: iraicu at cs.uchicago.edu > >>>> Web: http://www.cs.uchicago.edu/~iraicu > >>>> http://dsl.cs.uchicago.edu/ > >>>> ============================================ > >>>> ============================================ > >>> > > > > > From benc at hawaga.org.uk Mon Aug 27 16:56:20 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Mon, 27 Aug 2007 21:56:20 +0000 (GMT) Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <46D33EE8.9040900@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> Message-ID: it seems superficially to say "we shouldn't be forcing our execution management semantics on the underlying execution system"; which perhaps means both that we shouldn't be rate limiting but also that we shouldn't be doing retries and things like that - we either expect someone else to deal with this stuff or we do it ourselves. -- From hategan at mcs.anl.gov Mon Aug 27 16:59:37 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Mon, 27 Aug 2007 16:59:37 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> Message-ID: <1188251977.9813.35.camel@blabla.mcs.anl.gov> On Mon, 2007-08-27 at 21:56 +0000, Ben Clifford wrote: > it seems superficially to say "we shouldn't be forcing our execution > management semantics on the underlying execution system"; which perhaps > means both that we shouldn't be rate limiting but also that we shouldn't > be doing retries and things like that - we either expect someone else to > deal with this stuff or we do it ourselves. Or both, but at different levels? 
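For reference, a sketch of the "virtually off" settings under discussion, assuming the throttle.* property names documented in the user guide of this period and an etc/swift.properties layout; the explicit off value is only being added at this point in the thread, so older builds would need a very large number instead:

# not a recommendation -- this removes the protection the throttles provide
cat >> etc/swift.properties <<'EOF'
throttle.submit=off
throttle.host.submit=off
throttle.score.job.factor=off
throttle.transfers=off
throttle.file.operations=off
EOF
# one further limit lives in libexec/scheduler.xml, as noted earlier in the thread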
> From bugzilla-daemon at mcs.anl.gov Tue Aug 28 02:04:29 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 28 Aug 2007 02:04:29 -0500 (CDT) Subject: [Swift-devel] [Bug 87] New: quoting parse failure. Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=87 Summary: quoting parse failure. Product: Swift Version: unspecified Platform: Macintosh OS/Version: Mac OS Status: NEW Severity: normal Priority: P2 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: benc at hawaga.org.uk CC: swift-devel at ci.uchicago.edu in r1115, the single line: string s = "\"foo\""; causes this error: $ swift quote.swift Could not compile SwiftScript source: line 1:18: unexpected char: '\' It looks like there is no way to get a " character into a string. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From foster at mcs.anl.gov Mon Aug 27 20:04:56 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Mon, 27 Aug 2007 20:04:56 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> Message-ID: <46D374B8.20205@mcs.anl.gov> These are good questions: where do different things belong. Working out the right approach in this specific example will surely provide insights into how to do things in other cases. Ian. Ben Clifford wrote: > it seems superficially to say "we shouldn't be forcing our execution > management semantics on the underlying execution system"; which perhaps > means both that we shouldn't be rate limiting but also that we shouldn't > be doing retries and things like that - we either expect someone else to > deal with this stuff or we do it ourselves. > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. 
From hategan at mcs.anl.gov Tue Aug 28 12:02:58 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 28 Aug 2007 12:02:58 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <46D374B8.20205@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> <46D374B8.20205@mcs.anl.gov> Message-ID: <1188320578.7029.11.camel@blabla.mcs.anl.gov> On Mon, 2007-08-27 at 20:04 -0500, Ian Foster wrote: > These are good questions: where do different things belong. > > Working out the right approach in this specific example will surely > provide insights into how to do things in other cases. Will it? The exact same statement could have been made when such "concepts" were put into cog/karajan, or when Yong figured out specific numbers (and on all the occasions that have worked to build up to what we have today). > > Ian. > > Ben Clifford wrote: > > it seems superficially to say "we shouldn't be forcing our execution > > management semantics on the underlying execution system"; which perhaps > > means both that we shouldn't be rate limiting but also that we shouldn't > > be doing retries and things like that - we either expect someone else to > > deal with this stuff or we do it ourselves. > > > > > From hategan at mcs.anl.gov Tue Aug 28 12:05:39 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 28 Aug 2007 12:05:39 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <46D33EE8.9040900@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <1186777351.2369.2.camel@blabla.mcs.anl.gov> <46BCCAEB.5030207@cs.uchicago.edu> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> Message-ID: <1188320739.7029.14.camel@blabla.mcs.anl.gov> > > Groovy. Would the above (all throttling parameters in swift.properties > > and the "off" option for each) work? > > Yes, I think so - again, please (re)re-iterate what they are, please. :) > 1. "off" as a valid value for throttles and the job throttling bit were added to swift.properties. Perhaps unrelated, but to save me some clicks: 2. 
PBS and dCache providers were added as dependencies, so they're built by default From benc at hawaga.org.uk Tue Aug 28 12:19:25 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Tue, 28 Aug 2007 17:19:25 +0000 (GMT) Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <1188320739.7029.14.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> <1188320739.7029.14.camel@blabla.mcs.anl.gov> Message-ID: On Tue, 28 Aug 2007, Mihael Hategan wrote: > 1. "off" as a valid value for throttles and the job throttling bit were > added to swift.properties. so all of the throttle.* properties mentioned in the userguide? -- From foster at mcs.anl.gov Tue Aug 28 12:46:22 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Tue, 28 Aug 2007 12:46:22 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <1188320578.7029.11.camel@blabla.mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> <46D374B8.20205@mcs.anl.gov> <1188320578.7029.11.camel@blabla.mcs.anl.gov> Message-ID: <46D45F6E.7010105@mcs.anl.gov> I seem no reason to assume that our understanding will not advance as we study more cases. Mihael Hategan wrote: > On Mon, 2007-08-27 at 20:04 -0500, Ian Foster wrote: > >> These are good questions: where do different things belong. >> >> Working out the right approach in this specific example will surely >> provide insights into how to do things in other cases. >> > > Will it? The exact same statement could have been made when such > "concepts" were put into cog/karajan, or when Yong figured out specific > numbers (and on all the occasions that have worked to build up to what > we have today). > > >> Ian. >> >> Ben Clifford wrote: >> >>> it seems superficially to say "we shouldn't be forcing our execution >>> management semantics on the underlying execution system"; which perhaps >>> means both that we shouldn't be rate limiting but also that we shouldn't >>> be doing retries and things like that - we either expect someone else to >>> deal with this stuff or we do it ourselves. 
>>> >>> >>> > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Tue Aug 28 12:48:11 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 28 Aug 2007 12:48:11 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: References: <46AF37D9.7000301@mcs.anl.gov> <46BCE5AD.3030401@cs.uchicago.edu> <1186787247.8088.2.camel@blabla.mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> <1188320739.7029.14.camel@blabla.mcs.anl.gov> Message-ID: <1188323291.9313.0.camel@blabla.mcs.anl.gov> On Tue, 2007-08-28 at 17:19 +0000, Ben Clifford wrote: > > On Tue, 28 Aug 2007, Mihael Hategan wrote: > > > 1. "off" as a valid value for throttles and the job throttling bit were > > added to swift.properties. > > so all of the throttle.* properties mentioned in the userguide? Not the added one. > From hategan at mcs.anl.gov Tue Aug 28 12:54:02 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Tue, 28 Aug 2007 12:54:02 -0500 Subject: [Swift-devel] Re: Request for control over throttle algorithm In-Reply-To: <46D45F6E.7010105@mcs.anl.gov> References: <46AF37D9.7000301@mcs.anl.gov> <46BDB67D.2040207@cs.uchicago.edu> <46BE98FC.8040606@cs.uchicago.edu> <1186938048.24879.8.camel@blabla.mcs.anl.gov> <46BFD3FE.1090205@cs.uchicago.edu> <1186978892.21992.12.camel@blabla.mcs.anl.gov> <46C0BC42.6050108@cs.uchicago.edu> <1187038031.5916.23.camel@blabla.mcs.anl.gov> <46C12A78.5000602@cs.uchicago.edu> <1187065878.4015.19.camel@blabla.mcs.anl.gov> <46C13508.3070000@cs.uchic ago.edu> <46D30A2C.7040009@cs.uchicago.edu> <1188238079.31798.25.camel@blabla.mcs.anl.gov> <46D3171A.9030001@cs.uchicago.edu> <1188240466.1493.9.camel@blabla.mcs.anl.gov> <46D32F0A.3010400@mcs.anl.gov> <1188246881.5795.13.camel@blabla.mcs.anl.gov> <46D33EE8.9040900@mcs.anl.gov> <46D374B8.20205@mcs.anl.gov> <1188320578.7029.11.camel@blabla.mcs.anl.gov> <46D45F6E.7010105@mcs.anl.gov> Message-ID: <1188323642.9313.6.camel@blabla.mcs.anl.gov> Our understanding as a group is not the sum of our understandings as individuals (although ideally I'd like to think it should be). On Tue, 2007-08-28 at 12:46 -0500, Ian Foster wrote: > I seem no reason to assume that our understanding will not advance as > we study more cases. > > Mihael Hategan wrote: > > On Mon, 2007-08-27 at 20:04 -0500, Ian Foster wrote: > > > > > These are good questions: where do different things belong. > > > > > > Working out the right approach in this specific example will surely > > > provide insights into how to do things in other cases. > > > > > > > Will it? 
The exact same statement could have been made when such > > "concepts" were put into cog/karajan, or when Yong figured out specific > > numbers (and on all the occasions that have worked to build up to what > > we have today). > > > > > > > Ian. > > > > > > Ben Clifford wrote: > > > > > > > it seems superficially to say "we shouldn't be forcing our execution > > > > management semantics on the underlying execution system"; which perhaps > > > > means both that we shouldn't be rate limiting but also that we shouldn't > > > > be doing retries and things like that - we either expect someone else to > > > > deal with this stuff or we do it ourselves. > > > > > > > > > > > > > > > > > > -- > > Ian Foster, Director, Computation Institute > Argonne National Laboratory & University of Chicago > Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 > Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 > Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. > Globus Alliance: www.globus.org. From iraicu at cs.uchicago.edu Tue Aug 28 16:43:42 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 28 Aug 2007 16:43:42 -0500 Subject: [Swift-devel] latest Falkon code is in SVN! Message-ID: <46D4970E.4000809@cs.uchicago.edu> Hi all, I finally have the latest Falkon code in SVN! Transmitting file data ...................................................................................... Committed revision 1126. To checkout Falkon (service, worker code, client code, GT4 container, web server, ploticus, etc...): svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon To compile everything: cd falkon ./make-falkon.sh The latest Falkon provider code has been in SVN for a while now. Assuming you have cog and swift: Cog: svn co https://cogkit.svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog Swift: cd cog/modules svn co https://svn.ci.uchicago.edu/svn/vdl2/trunk vdsk You can get the Falkon provider by: svn co https://svn.ci.uchicago.edu/svn/vdl2/provider-deef You can build the falkon provider by: cd provider-deef ant distclean ant -Ddist.dir=../vdsk/dist/vdsk-0.2-dev/ dist Mike, do you want to post these instructions on the Wiki? Ioan -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From grog at ci.uchicago.edu Tue Aug 14 18:33:07 2007 From: grog at ci.uchicago.edu (Greg Cross) Date: Tue, 14 Aug 2007 18:33:07 -0500 Subject: [Swift-devel] Re: cannot commit to SVN... In-Reply-To: <46C23A1F.5020808@cs.uchicago.edu> References: <46C23A1F.5020808@cs.uchicago.edu> Message-ID: <6B341F5C-BA01-4DC6-9222-50B3F2AC89D7@ci.uchicago.edu> Your original account application apparently did not include the CVS Resource. I have added you to the necessary netgroup; try again. -- G On Tue 14 Aug 2007, at 18:26, Ioan Raicu wrote: > Hi, > I can't commit to SVN at CI. Here is the output I get from trying > to commit a simple test! 
> > iraicu at viper:~/java/svn/falkon> svn ci test > > just testing > --This line, and those below, will be ignored-- > > A test > > "svn-commit.5.tmp" 4L, 72C > written > Adding test > Authentication realm: SVN Login > Password for 'iraicu': > svn: Commit failed (details follow): > svn: CHECKOUT of '/svn/vdl2/!svn/ver/1075/falkon': 401 > Authorization Required (https://svn.ci.uchicago.edu) > svn: Your commit message was left in a temporary file: > svn: '/home/iraicu/java/svn/falkon/svn-commit.5.tmp' > > Thanks, > Ioan From wilde at mcs.anl.gov Tue Aug 28 17:40:44 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Tue, 28 Aug 2007 17:40:44 -0500 Subject: [Swift-devel] Re: latest Falkon code is in SVN! In-Reply-To: <46D4970E.4000809@cs.uchicago.edu> References: <46D4970E.4000809@cs.uchicago.edu> Message-ID: <46D4A46C.6070806@mcs.anl.gov> Excellent - thanks Ioan! I will try it. Ioan Raicu wrote: > Hi all, > I finally have the latest Falkon code in SVN! > > Transmitting file data > ...................................................................................... > > Committed revision 1126. > > To checkout Falkon (service, worker code, client code, GT4 container, > web server, ploticus, etc...): > svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon > > To compile everything: > cd falkon > ./make-falkon.sh > > The latest Falkon provider code has been in SVN for a while now. > Assuming you have cog and swift: > Cog: > svn co > https://cogkit.svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog > > Swift: > cd cog/modules > svn co https://svn.ci.uchicago.edu/svn/vdl2/trunk vdsk > > You can get the Falkon provider by: > svn co https://svn.ci.uchicago.edu/svn/vdl2/provider-deef > > You can build the falkon provider by: > cd provider-deef > ant distclean > ant -Ddist.dir=../vdsk/dist/vdsk-0.2-dev/ dist > > Mike, do you want to post these instructions on the Wiki? Doing that right now. - Mike > > Ioan > From iraicu at cs.uchicago.edu Tue Aug 28 18:08:53 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Tue, 28 Aug 2007 18:08:53 -0500 Subject: [Swift-devel] Re: latest Falkon code is in SVN! In-Reply-To: <46D4A46C.6070806@mcs.anl.gov> References: <46D4970E.4000809@cs.uchicago.edu> <46D4A46C.6070806@mcs.anl.gov> Message-ID: <46D4AB05.9070900@cs.uchicago.edu> Hi, I forgot to mention how you can test that everything works once you build all the Falkon components: Shell #1: #comment: starts GT4 container on port 50001 cd falkon/service ./run.gpws_local.sh 50001 Shell #2: #comment: starts 1 worker on local machine in interactive mode, terminate by simply typing any key and hit enter cd falkon/worker ./run.worker-debug.sh 0 0 localhost 50001 etc/client-security-config.xml 1 Shell #3: #comment: starts the command line client that submits 10 sleep 1 tasks cd falkon/client ./run.user.file.sh localhost 50001 workloads/sleep/sleep_1 10 1 etc/client-security-config.xml I attached below the sample output of the command line client, if everything went OK! iraicu at gto:~/java/svn/falkon/client> ./run.user.file.sh localhost 50001 workloads/sleep/sleep_1 10 1 etc/client-security-config.xml Starting Falkon Command Line Client v0.8.1... Starting non-interactive mode.... Reading file: workloads/sleep/sleep_1... time 0.0050 pend_not_queue 0 tasks_recv 0 tasks_sent 0 completed 0.0 not_tp 0.0 tasks_tp 0.0 ETA ? time 1.007 pend_not_queue 0 tasks_recv 0 tasks_sent 10 completed 0.0 not_tp 0.0 tasks_tp 0.0 ETA ? 
time 2.012 pend_not_queue 0 tasks_recv 1 tasks_sent 10 completed 10.0 not_tp 0.0 tasks_tp 1.0 ETA 18.108 time 3.015 pend_not_queue 0 tasks_recv 2 tasks_sent 10 completed 20.0 not_tp 0.0 tasks_tp 1.0 ETA 12.064 time 4.019 pend_not_queue 0 tasks_recv 3 tasks_sent 10 completed 30.0 not_tp 0.0 tasks_tp 1.0 ETA 9.378 time 5.023 pend_not_queue 0 tasks_recv 4 tasks_sent 10 completed 40.0 not_tp 0.0 tasks_tp 1.0 ETA 7.535 time 6.027 pend_not_queue 0 tasks_recv 5 tasks_sent 10 completed 50.0 not_tp 0.0 tasks_tp 1.0 ETA 6.027 time 7.031 pend_not_queue 0 tasks_recv 6 tasks_sent 10 completed 60.0 not_tp 0.0 tasks_tp 1.0 ETA 4.687 time 8.035 pend_not_queue 0 tasks_recv 7 tasks_sent 10 completed 70.0 not_tp 0.0 tasks_tp 1.0 ETA 3.444 time 9.045 pend_not_queue 0 tasks_recv 8 tasks_sent 10 completed 80.0 not_tp 0.0 tasks_tp 0.99 ETA 2.261 time 10.047 pend_not_queue 0 tasks_recv 8 tasks_sent 10 completed 80.0 not_tp 0.0 tasks_tp 0.0 ETA 2.512 time 11.051 pend_not_queue 0 tasks_recv 9 tasks_sent 10 completed 90.0 not_tp 0.0 tasks_tp 1.0 ETA 1.228 time 11.196 pend_not_queue 0 tasks_recv 10 tasks_sent 10 completed 100.0 not_tp 0.0 tasks_tp 0.0 ETA 0.0 10 tasks completed in 11.197 sec Successful tasks: 10 Failed tasks: 0 Notification Errors: 0 Overall Throughput (tasks/sec): 0.89 For more serious stuff, you'll have to run the provisioner: vi falkon/worker/etc/Provisioner.config cd falkon/worker ./run.drp.sh etc/Provisioner.config 60 BTW, don't forget to update the falkon/worker/run.worker.sh script with the correct JAVA_HOME for the Grid site you will run on... ideally, this should not have to happen, but without this, the script doesn't run correctly at ANL/UC. OK, that is about it for now... Ioan Michael Wilde wrote: > Excellent - thanks Ioan! I will try it. > > Ioan Raicu wrote: >> Hi all, >> I finally have the latest Falkon code in SVN! >> >> Transmitting file data >> ...................................................................................... >> >> Committed revision 1126. >> >> To checkout Falkon (service, worker code, client code, GT4 container, >> web server, ploticus, etc...): >> svn co https://svn.ci.uchicago.edu/svn/vdl2/falkon >> >> To compile everything: >> cd falkon >> ./make-falkon.sh >> >> The latest Falkon provider code has been in SVN for a while now. >> Assuming you have cog and swift: >> Cog: >> svn co >> https://cogkit.svn.sourceforge.net/svnroot/cogkit/trunk/current/src/cog >> >> Swift: >> cd cog/modules >> svn co https://svn.ci.uchicago.edu/svn/vdl2/trunk vdsk >> >> You can get the Falkon provider by: >> svn co https://svn.ci.uchicago.edu/svn/vdl2/provider-deef >> >> You can build the falkon provider by: >> cd provider-deef >> ant distclean >> ant -Ddist.dir=../vdsk/dist/vdsk-0.2-dev/ dist >> >> Mike, do you want to post these instructions on the Wiki? > > Doing that right now. > > - Mike > >> >> Ioan >> > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From bugzilla-daemon at mcs.anl.gov Tue Aug 28 21:31:20 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Tue, 28 Aug 2007 21:31:20 -0500 (CDT) Subject: [Swift-devel] [Bug 88] New: functions inside the implicit single file mapper Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=88 Summary: functions inside the implicit single file mapper Product: Swift Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: minor Priority: P2 Component: SwiftScript language AssignedTo: benc at hawaga.org.uk ReportedBy: hategan at mcs.anl.gov type name <@f()>; causes a parsing exception. It should be equivalent to ; -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are the assignee for the bug, or are watching the assignee. From benc at hawaga.org.uk Wed Aug 29 02:21:30 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 29 Aug 2007 07:21:30 +0000 (GMT) Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: <46D4970E.4000809@cs.uchicago.edu> References: <46D4970E.4000809@cs.uchicago.edu> Message-ID: On Tue, 28 Aug 2007, Ioan Raicu wrote: > Mike, do you want to post these instructions on the Wiki? I'd prefer that you keep the instructions in a README text file with the source code. That way, they have the same distribution and version control semantics as the source code - eg available in same place, available offline, versioned in same system as the code, available publicly. -- From benc at hawaga.org.uk Wed Aug 29 08:41:23 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 29 Aug 2007 13:41:23 +0000 (GMT) Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: <46D4970E.4000809@cs.uchicago.edu> References: <46D4970E.4000809@cs.uchicago.edu> Message-ID: During ./make-falkon.sh I noticed this: Compiling GramClient Compiling C Executor /usr/bin/ld: can't locate file for: -lcrt0.o collect2: ld returned 1 exit status It didn't halt the build. It probably should. Probably better to use make and/or ant as build language. -- From wilde at mcs.anl.gov Wed Aug 29 09:22:14 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Wed, 29 Aug 2007 09:22:14 -0500 Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: References: <46D4970E.4000809@cs.uchicago.edu> Message-ID: <46D58116.6090602@mcs.anl.gov> I didnt see this error wen compiling on terminable. Ben Clifford wrote: > During ./make-falkon.sh I noticed this: > > Compiling GramClient > Compiling C Executor > /usr/bin/ld: can't locate file for: -lcrt0.o > collect2: ld returned 1 exit status > > It didn't halt the build. It probably should. Probably better to use make > and/or ant as build language. > From benc at hawaga.org.uk Wed Aug 29 09:25:02 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Wed, 29 Aug 2007 14:25:02 +0000 (GMT) Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: <46D58116.6090602@mcs.anl.gov> References: <46D4970E.4000809@cs.uchicago.edu> <46D58116.6090602@mcs.anl.gov> Message-ID: Looks like --static doesn't work for gcc here - I get the same error trying to statically link hello world. 
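For what it's worth, a quick check of that theory on the build host, assuming a throwaway hello.c and that the host is Mac OS X, which ships no static libc, so any -static link fails this way:

gcc -static hello.c -o hello-static   # fails: /usr/bin/ld: can't locate file for: -lcrt0.o
gcc hello.c -o hello                  # a plain dynamic link works
# dropping -static from the C executor's compile line in make-falkon.sh (or making it
# conditional on the platform) should let the build get past this step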
On Wed, 29 Aug 2007, Michael Wilde wrote: > I didnt see this error wen compiling on terminable. > > Ben Clifford wrote: > > During ./make-falkon.sh I noticed this: > > > > Compiling GramClient > > Compiling C Executor > > /usr/bin/ld: can't locate file for: -lcrt0.o > > collect2: ld returned 1 exit status > > > > It didn't halt the build. It probably should. Probably better to use make > > and/or ant as build language. > > > > From iraicu at cs.uchicago.edu Wed Aug 29 09:53:41 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 29 Aug 2007 09:53:41 -0500 Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: References: <46D4970E.4000809@cs.uchicago.edu> Message-ID: <46D58875.1040702@cs.uchicago.edu> Sure, I'll update the readme file in SVN as well. Ioan Ben Clifford wrote: > On Tue, 28 Aug 2007, Ioan Raicu wrote: > > >> Mike, do you want to post these instructions on the Wiki? >> > > I'd prefer that you keep the instructions in a README text file with the > source code. That way, they have the same distribution and version control > semantics as the source code - eg available in same place, available > offline, versioned in same system as the code, available publicly. > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From iraicu at cs.uchicago.edu Wed Aug 29 10:01:15 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Wed, 29 Aug 2007 10:01:15 -0500 Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: References: <46D4970E.4000809@cs.uchicago.edu> Message-ID: <46D58A3B.8050702@cs.uchicago.edu> Right, it should have warned the user that the build failed. I don't know how to use the ant build language, and I don't have time right now to learn it. If anyone wants to take a stab at reworking my compile scripts into ant build scripts, I think that would be great! BTW, about the particular error you saw, the C Executor is a new experimental implementation that could replace the Java based worker code and WS-based communication protocol. If you simply remove the -static option, it will likely compile; the static option is generally helpful when you want to compile on one machine and take the binary elsewhere. By default, the Java executor code base is used, so even if the C Executor didn't compile, Falkon could still work from the Java code base. Ioan Ben Clifford wrote: > During ./make-falkon.sh I noticed this: > > Compiling GramClient > Compiling C Executor > /usr/bin/ld: can't locate file for: -lcrt0.o > collect2: ld returned 1 exit status > > It didn't halt the build. It probably should. Probably better to use make > and/or ant as build language. > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 
58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From bugzilla-daemon at mcs.anl.gov Wed Aug 29 19:39:38 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 29 Aug 2007 19:39:38 -0500 (CDT) Subject: [Swift-devel] [Bug 89] New: Use unique package names! Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=89 Summary: Use unique package names! Product: Swift Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: hategan at mcs.anl.gov CC: benc at hawaga.org.uk org.gryphin.vdl.parser.VDLtParser.java (though it may not be the only one) exists both in Swift and VDS (whose jar file is in the Swift lib). So far, it seems, jar files were always looked up in the "right" order, and we've never seen the problem, but should the VDS class with the above name be loaded first, something like the following is produced: Caused by: java.lang.IllegalArgumentException: Can't load template globalVariable.st at org.antlr.stringtemplate.StringTemplateGroup.lookupTemplate(StringTemplateGroup.java:301) at org.antlr.stringtemplate.StringTemplateGroup.getInstanceOf(StringTemplateGroup.java:246) at org.griphyn.vdl.parser.VDLtParser.template(VDLtParser.java:34) at org.griphyn.vdl.parser.VDLtParser.variableDecl(VDLtParser.java:1090) at org.griphyn.vdl.parser.VDLtParser.declaration(VDLtParser.java:715) at org.griphyn.vdl.parser.VDLtParser.topLevelStatement(VDLtParser.java:253) at org.griphyn.vdl.parser.VDLtParser.program(VDLtParser.java:125) at org.griphyn.vdl.toolkit.VDLt2VDLx.compile(VDLt2VDLx.java:59) ... 40 more -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. From bugzilla-daemon at mcs.anl.gov Wed Aug 29 20:04:19 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Wed, 29 Aug 2007 20:04:19 -0500 (CDT) Subject: [Swift-devel] [Bug 89] Use unique package names! In-Reply-To: Message-ID: <20070830010419.011F316505@foxtrot.mcs.anl.gov> http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=89 ------- Comment #1 from hategan at mcs.anl.gov 2007-08-29 20:04 ------- The solution is to refactor the conflicting packages. Given that any change in the package names will cause an equal amount of bad things, regardless of what the new name is, we now have the option of choosing any package name. Also, given that org.gryphin.vdl isn't technically the right prefix root for Swift packages anyway, there's the question of what the new one should be: 1. edu.uchicago.ci.swift? 2. gov.anl.mcs.swift? 3. org.globus.swift? 4. org.globus.cog.swift? :) Please vote. (PS: Only the absolutely necessary package names should be changed first, and we can analyze the implications of changing the other ones later, but there is this minimal conflicting set that needs to be changed) -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You are on the CC list for the bug, or are watching someone who is. 
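A quick way to confirm the collision described in this bug, assuming the jars land under dist/vdsk-0.2-dev/lib (adjust to your tree); jar files are ordinary zip archives, so unzip -l is enough:

cd dist/vdsk-0.2-dev/lib
for j in *.jar; do
  unzip -l "$j" 2>/dev/null | grep -q 'org/griphyn/vdl/parser/VDLtParser.class' && echo "$j"
done
# more than one jar printed means classpath order decides which copy gets loaded,
# which is exactly why a unique package prefix is needed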
From foster at mcs.anl.gov Wed Aug 29 23:51:49 2007 From: foster at mcs.anl.gov (Ian Foster) Date: Wed, 29 Aug 2007 23:51:49 -0500 Subject: [Swift-devel] [Bug 89] Use unique package names! In-Reply-To: <20070830010419.011F316505@foxtrot.mcs.anl.gov> References: <20070830010419.011F316505@foxtrot.mcs.anl.gov> Message-ID: <46D64CE5.2050703@mcs.anl.gov> as Swift is a Globus incubator, org.globus.swift looks good to me. bugzilla-daemon at mcs.anl.gov wrote: > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=89 > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-08-29 20:04 ------- > The solution is to refactor the conflicting packages. Given that any change in > the package names will cause an equal amount of bad things, regardless of what > the new name is, we now have the option of choosing any package name. > > Also, given that org.gryphin.vdl isn't technically the right prefix root for > Swift packages anyway, there's the question of what the new one should be: > > 1. edu.uchicago.ci.swift? > 2. gov.anl.mcs.swift? > 3. org.globus.swift? > 4. org.globus.cog.swift? :) > > Please vote. > > (PS: Only the absolutely necessary package names should be changed first, and > we can analyze the implications of changing the other ones later, but there is > this minimal conflicting set that needs to be changed) > > > -- Ian Foster, Director, Computation Institute Argonne National Laboratory & University of Chicago Argonne: MCS/221, 9700 S. Cass Ave, Argonne, IL 60439 Chicago: Rm 405, 5640 S. Ellis Ave, Chicago, IL 60637 Tel: +1 630 252 4619. Web: www.ci.uchicago.edu. Globus Alliance: www.globus.org. From hategan at mcs.anl.gov Thu Aug 30 00:12:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 00:12:18 -0500 Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: <46D58A3B.8050702@cs.uchicago.edu> References: <46D4970E.4000809@cs.uchicago.edu> <46D58A3B.8050702@cs.uchicago.edu> Message-ID: <1188450738.21808.25.camel@blabla.mcs.anl.gov> On Wed, 2007-08-29 at 10:01 -0500, Ioan Raicu wrote: > Right, it should have warned the user that the build failed. I don't > know how to use the ant build language, and I don't have time right now > to learn it. If anyone wants to take a stab at reworking my compile > scripts into ant build scripts, I think that would be great! Great, but unlikely. I doubt that any of us have appreciably more time than you do. It is probably a bit unwise to count on the adoption of Falkon if you rely on people understanding its internal details, and doing many little things here and there. The little things tend to add up to a lot. In our case the choice would boil down to the amount of time we spend on Swift vs. the amount of time we spend on non-Swift. Should Swift be perfect, the choice would be easy. But it, too, needs many little things done here and there. And while there is undeniable value in what Falkon does, it is not a blank check. Mihael > > BTW, about the particular error you saw, the C Executor is a new > experimental implementation that could replace the Java based worker > code and WS-based communication protocol. If you simply remove the > -static option, it will likely compile; the static option is generally > helpful when you want to compile on one machine and take the binary > elsewhere. By default, the Java executor code base is used, so even if > the C Executor didn't compile, Falkon could still work from the Java > code base. 
> > Ioan > > Ben Clifford wrote: > > During ./make-falkon.sh I noticed this: > > > > Compiling GramClient > > Compiling C Executor > > /usr/bin/ld: can't locate file for: -lcrt0.o > > collect2: ld returned 1 exit status > > > > It didn't halt the build. It probably should. Probably better to use make > > and/or ant as build language. > > > > > From hategan at mcs.anl.gov Thu Aug 30 00:16:05 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 00:16:05 -0500 Subject: [Swift-devel] [Bug 89] Use unique package names! In-Reply-To: <46D64CE5.2050703@mcs.anl.gov> References: <20070830010419.011F316505@foxtrot.mcs.anl.gov> <46D64CE5.2050703@mcs.anl.gov> Message-ID: <1188450966.21808.29.camel@blabla.mcs.anl.gov> That seems like the most reasonable choice. On Wed, 2007-08-29 at 23:51 -0500, Ian Foster wrote: > as Swift is a Globus incubator, org.globus.swift looks good to me. > > bugzilla-daemon at mcs.anl.gov wrote: > > http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=89 > > > > > > > > > > > > ------- Comment #1 from hategan at mcs.anl.gov 2007-08-29 20:04 ------- > > The solution is to refactor the conflicting packages. Given that any change in > > the package names will cause an equal amount of bad things, regardless of what > > the new name is, we now have the option of choosing any package name. > > > > Also, given that org.gryphin.vdl isn't technically the right prefix root for > > Swift packages anyway, there's the question of what the new one should be: > > > > 1. edu.uchicago.ci.swift? > > 2. gov.anl.mcs.swift? > > 3. org.globus.swift? > > 4. org.globus.cog.swift? :) > > > > Please vote. > > > > (PS: Only the absolutely necessary package names should be changed first, and > > we can analyze the implications of changing the other ones later, but there is > > this minimal conflicting set that needs to be changed) > > > > > > > From iraicu at cs.uchicago.edu Thu Aug 30 00:22:24 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Thu, 30 Aug 2007 00:22:24 -0500 Subject: [Swift-devel] latest Falkon code is in SVN! In-Reply-To: <1188450738.21808.25.camel@blabla.mcs.anl.gov> References: <46D4970E.4000809@cs.uchicago.edu> <46D58A3B.8050702@cs.uchicago.edu> <1188450738.21808.25.camel@blabla.mcs.anl.gov> Message-ID: <46D65410.4040505@cs.uchicago.edu> Thats fine... the current build scripts are more than enough for me at the moment. If this changes in the future, I am sure I or someone else will adapt them accordingly. Ioan Mihael Hategan wrote: > On Wed, 2007-08-29 at 10:01 -0500, Ioan Raicu wrote: > >> Right, it should have warned the user that the build failed. I don't >> know how to use the ant build language, and I don't have time right now >> to learn it. If anyone wants to take a stab at reworking my compile >> scripts into ant build scripts, I think that would be great! >> > > Great, but unlikely. I doubt that any of us have appreciably more time > than you do. It is probably a bit unwise to count on the adoption of > Falkon if you rely on people understanding its internal details, and > doing many little things here and there. The little things tend to add > up to a lot. In our case the choice would boil down to the amount of > time we spend on Swift vs. the amount of time we spend on non-Swift. > Should Swift be perfect, the choice would be easy. But it, too, needs > many little things done here and there. And while there is undeniable > value in what Falkon does, it is not a blank check. 
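As a hedged aside on the build-script discussion quoted above: a plain shell build script can be made to halt on the first failing step without rewriting it in ant. The sketch below is only illustrative; the component names mirror the quoted build output, but the actual compile commands in make-falkon.sh are not shown in this thread and are placeholders here.

    #!/bin/bash
    set -e                         # abort the script on the first failing command
    echo "Compiling GramClient"
    javac GramClient.java          # placeholder for the real compile step
    echo "Compiling C Executor"
    gcc executor.c -o executor     # placeholder; a failed link now stops the build
    echo "Falkon build completed"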
> > Mihael > > >> BTW, about the particular error you saw, the C Executor is a new >> experimental implementation that could replace the Java based worker >> code and WS-based communication protocol. If you simply remove the >> -static option, it will likely compile; the static option is generally >> helpful when you want to compile on one machine and take the binary >> elsewhere. By default, the Java executor code base is used, so even if >> the C Executor didn't compile, Falkon could still work from the Java >> code base. >> >> Ioan >> >> Ben Clifford wrote: >> >>> During ./make-falkon.sh I noticed this: >>> >>> Compiling GramClient >>> Compiling C Executor >>> /usr/bin/ld: can't locate file for: -lcrt0.o >>> collect2: ld returned 1 exit status >>> >>> It didn't halt the build. It probably should. Probably better to use make >>> and/or ant as build language. >>> >>> >>> > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ -------------- next part -------------- An HTML attachment was scrubbed... URL: From benc at hawaga.org.uk Thu Aug 30 07:24:49 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 30 Aug 2007 12:24:49 +0000 (GMT) Subject: [Swift-devel] Re: [Bug 89] Use unique package names! In-Reply-To: <20070830010420.0B0DB16506@foxtrot.mcs.anl.gov> References: <20070830010420.0B0DB16506@foxtrot.mcs.anl.gov> Message-ID: On Wed, 29 Aug 2007, bugzilla-daemon at mcs.anl.gov wrote: > 1. edu.uchicago.ci.swift? > 2. gov.anl.mcs.swift? > 3. org.globus.swift? > 4. org.globus.cog.swift? :) > > Please vote. Note that the swift group doesn't own any of those names so really should be co-ordinated with whoever. I'd prefer option 3. -- From benc at hawaga.org.uk Thu Aug 30 07:41:10 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 30 Aug 2007 12:41:10 +0000 (GMT) Subject: [Swift-devel] svn info in displayed version info Message-ID: As of r1141, swift will display SVN revision number and an attempt to guess whether the source has been modified from SVN. This introduces a built dependency on SVN, but I don't think anyone builds without SVN around. -- From wilde at mcs.anl.gov Thu Aug 30 08:06:01 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 08:06:01 -0500 Subject: [Swift-devel] svn info in displayed version info In-Reply-To: References: Message-ID: <46D6C0B9.1050808@mcs.anl.gov> Display when it starts? Check for changes at build time? Is there (or should we make) a --version option? Ben Clifford wrote: > As of r1141, swift will display SVN revision number and an attempt to > guess whether the source has been modified from SVN. > > This introduces a built dependency on SVN, but I don't think anyone builds > without SVN around. > From benc at hawaga.org.uk Thu Aug 30 08:08:14 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Thu, 30 Aug 2007 13:08:14 +0000 (GMT) Subject: [Swift-devel] svn info in displayed version info In-Reply-To: <46D6C0B9.1050808@mcs.anl.gov> References: <46D6C0B9.1050808@mcs.anl.gov> Message-ID: On Thu, 30 Aug 2007, Michael Wilde wrote: > Display when it starts? Check for changes at build time? 
At the start of execution, where it previously displayed v0.2-dev it now says something like this: $ swift foo.swift Swift v0.2-dev r1141 (modified locally) RunID: esdqkazt5fxe2 -- From hategan at mcs.anl.gov Thu Aug 30 08:26:14 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 08:26:14 -0500 Subject: [Swift-devel] svn info in displayed version info In-Reply-To: References: Message-ID: <1188480374.26157.0.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 12:41 +0000, Ben Clifford wrote: > As of r1141, swift will display SVN revision number and an attempt to > guess whether the source has been modified from SVN. It's ok. Without SVN Swift does not become incorrect, but slightly (more) inconvenient. > > This introduces a built dependency on SVN, but I don't think anyone builds > without SVN around. > From hategan at mcs.anl.gov Thu Aug 30 11:30:51 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 11:30:51 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <46D6E892.5000706@mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> Message-ID: <1188491451.31884.27.camel@blabla.mcs.anl.gov> Note: moved to swift-devel. On Thu, 2007-08-30 at 10:56 -0500, Michael Wilde wrote: > Great - thanks. That was indeed the problem: my application script had > a typo and was trying to run the 32-bit binary regardless what processor > type it wound up on. When I last run successfully, I was getting most > or all i686 machines; this time I was getting ia64 machines. > > I'll try to re-run it w/o debug, and see if the messages need improvement. There is no translation for the cryptic missing file message I know of, so I doubt that will improve. > > Kickstart would have helped here - would have told me that Im running on > ia64. What stops you from enabling it? > > This is the kind of problem that on a local machine would have been > recognizable instantly but on a remote machine through swift, karajan, > globus and PBS is a much greater challenge to diagnose. We should think > in terms of how to make that long pipeline to the remote execution > environment much more transparent to the user. I don't think It's the long pipeline that is the problem, but the fact that the assumptions that you can usually make about your local machine don't hold for a random machine out there. Moreover, they change depending on where your job happens to run, whereas your machine stays the same. We can improve things, I hope, and for that we need concrete ideas. > > Think: "what would I see if I ran this locally" and "how do I bring that > environment to the swift user"? You can't bring that environment to the swift user. Remote != local, and it may take a long time until it will be if at all. Question is "what is a useful set of things/information to troubleshoot such problems and how do we get that without compromising other things too much". > > Also noted that: > > - the retry logic here did more harm than good. Can you be more specific? > Maybe we want the > default for this to be off, especially during debugging. That, I'm guessing, could be added as an option. > > - in my latest run, which succeeded, the final job completion was > excessively delayed. The output files were all back on the submit host, > 4 of 5 jobs were logged as completed, and the completion of the final > job seemed to take a few minutes longer. 
> > I'll work through the error logs more closely and file an enhancement > request in bugz. > > I can batch these for later discussion or bring them as I encounter > things, whatever people prefer. I dont want to distract anyone at the > moment into long discssions on these; I'll organize them into bug > reports and enhancement requests and file for discussion when we next > review priorities. > > Ian was suggesting that this be soon - now is when we need to pick the > next features for you to work on, Ben and Mihael. Maybe a review of > bugs and requests next week, which can be started by email discussion, > and we'll note which topics needs voice or f2f discussion. Action items! Yummy. Mihael > > - Mike > > > Mihael Hategan wrote: > > Ok. You have a bunch of errors, mainly of two types: > > 1. Missing output file (we should add a rule in error.properties to make > > that verbose message a little more readable). This may be because the > > application didn't run or because the filesystem is broken. Right now an > > exit code file is produced by the wrapper only if the exit code of the > > application is not 0. This does not allow telling between the > > application having completed successfully or the filesystem being > > broken. I believe that a stamp file should also be created by the > > wrapper in order to distinguish between the two. The reason for the > > stamp file instead of always having an exit code file is that it is more > > efficient to check the existence of a file than to stage it out and look > > at its contents. > > > > 2. Exit code != 0. Looks like some issues with R. > > > > Mihael > > > > On Thu, 2007-08-30 at 08:31 -0500, Michael Wilde wrote: > >> Resending this after changing list to take larger attachments. > >> Previous message seems to have gotten lost (I musta pressed the wrong > >> button in the list manager?) > >> > >> --- > >> > >> I'm progressing on the angle runs. Previous errors were due to problems > >> with svn update, and then apparently needing ant clean and distclean. > >> > >> Now I'm executing but getting I/O errors. Ive attached all the logs and > >> output from this run. > >> > >> My result files are coming back zero-length and Im seeing I/O errors in > >> the logs (eg, in swift.out): > >> > >> ... > >> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to > >> SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting > >> status to Active > >> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Active > >> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to > >> Failed Exception in getFile > >> > >> ... > >> > >> My suspcion is that the app is failing and not proucing an expected > >> output file. Perhaps theres a clean error in the log that says this but > >> I havent found it yet. I think I saw error #500's from gridftp in the log. > >> > >> While I debug further, if anyone sees a different or obvious cause, I'd > >> appreciate your eyeballs on it. 
> >> > >> Thanks, > >> > >> Mike > >> > >> _______________________________________________ > >> Swift-user mailing list > >> Swift-user at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > From wilde at mcs.anl.gov Thu Aug 30 12:22:10 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 12:22:10 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <46D6E892.5000706@mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> Message-ID: <46D6FCC2.7000007@mcs.anl.gov> Following up on this, Mihael, you said: >> 2. Exit code != 0. Looks like some issues with R. I dont see where in the logs you observed that the jobs were failing. I think that would have tipped me off earlier that I have an app problem. I must be looking in the wrong place. I redirected stdout and stderr into a file, starting swift like this: $ swift -debug awf2.swift >swift.out 2>&1 & from which I get the following logs when all is done: $ wc -l *log *out 1 awf2-rm4p72i7lp0r0.0.rlog 1322 awf2-rm4p72i7lp0r0.log 1 swift.log 1400 swift.out 2724 total $ The awf2*.log file seems to be more or less a timestamped version of stdout/err. (Interesting to note where the extra lines are going that are in swift.out but not in awf2*.log, though. ) In the .log file I see the text that Ive excerpted below. I think the following impovements could be made and wonder if you agree: - Clearly show job exit code (I still dont see this) - Use mnemonic codes for task types (rather than 1,2...) - for the logs, map task URNs to simple integers; display the mapping up front - Mike 2007-08-29 18:23:53,895 INFO vdl:dostagein Staged in pc1.pcap to awf2-rm4p72i7lp0r0/shared/ on UC 2007-08-29 18:23:53,896 INFO vdl:execute2 Running job angle4-h2fjbhgi angle4 with arguments [pc1.pcap, of-75398839-775c-40ac-bd5c-49275e3269d5-0-1, cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1] in awf2-rm4p72i7lp0r0/angle4-h2fjbhgi on\ UC 2007-08-29 18:23:54,078 DEBUG TaskImpl Task(type=1, identity=urn:0-0-3-0-1188429807105) setting status to Submitted 2007-08-29 18:23:54,943 DEBUG TaskImpl Task(type=1, identity=urn:0-0-2-0-1188429807107) setting status to Submitted 2007-08-29 18:23:55,364 DEBUG TaskImpl Task(type=1, identity=urn:0-0-6-0-1188429807109) setting status to Submitted 2007-08-29 18:23:55,503 DEBUG TaskImpl Task(type=1, identity=urn:0-0-3-0-1188429807105) setting status to Active 2007-08-29 18:23:57,057 DEBUG TaskImpl Task(type=2, identity=urn:0-0-1-0-1-1188429807096) setting status to Completed ... 
2007-08-29 18:23:58,117 DEBUG TaskImpl Task(type=1, identity=urn:0-0-1-0-1188429807111) setting status to Submitted 2007-08-29 18:24:01,480 DEBUG TaskImpl Task(type=1, identity=urn:0-0-4-0-1188429807103) setting status to Active 2007-08-29 18:24:06,322 DEBUG TaskImpl Task(type=1, identity=urn:0-0-2-0-1188429807107) setting status to Active 2007-08-29 18:24:06,727 DEBUG TaskImpl Task(type=1, identity=urn:0-0-6-0-1188429807109) setting status to Completed 2007-08-29 18:24:06,729 DEBUG TaskImpl Task(type=4, identity=urn:0-0-6-0-1188429807113) setting status to Active 2007-08-29 18:24:06,734 DEBUG TaskImpl Task(type=4, identity=urn:0-0-6-0-1188429807113) setting status to Completed 2007-08-29 18:24:06,735 INFO vdl:execute2 Completed job angle4-h2fjbhgi angle4 with arguments [pc1.pcap, of-75398839-775c-40ac-bd5c-49275e3269d5-0-1, cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1] on UC 2007-08-29 18:24:06,744 INFO vdl:dostageout Staging out awf2-rm4p72i7lp0r0/shared/of-75398839-775c-40ac-bd5c-49275e3269d5-0-1 to file://localhost/of-75398839-775c-40ac-bd5c-49275e3269d5-0-1 from UC 2007-08-29 18:24:06,744 INFO vdl:dostageout Staging out awf2-rm4p72i7lp0r0/shared/cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1 to file://localhost/cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1 from UC 2007-08-29 18:24:06,745 DEBUG TaskImpl Task(type=4, identity=urn:0-0-6-0-1-1188429807115) setting status to Active Michael Wilde wrote: > Great - thanks. That was indeed the problem: my application script had > a typo and was trying to run the 32-bit binary regardless what processor > type it wound up on. When I last run successfully, I was getting most > or all i686 machines; this time I was getting ia64 machines. > > I'll try to re-run it w/o debug, and see if the messages need improvement. > > Kickstart would have helped here - would have told me that Im running on > ia64. > > This is the kind of problem that on a local machine would have been > recognizable instantly but on a remote machine through swift, karajan, > globus and PBS is a much greater challenge to diagnose. We should think > in terms of how to make that long pipeline to the remote execution > environment much more transparent to the user. > > Think: "what would I see if I ran this locally" and "how do I bring that > environment to the swift user"? > > Also noted that: > > - the retry logic here did more harm than good. Maybe we want the > default for this to be off, especially during debugging. > > - in my latest run, which succeeded, the final job completion was > excessively delayed. The output files were all back on the submit host, > 4 of 5 jobs were logged as completed, and the completion of the final > job seemed to take a few minutes longer. > > I'll work through the error logs more closely and file an enhancement > request in bugz. > > I can batch these for later discussion or bring them as I encounter > things, whatever people prefer. I dont want to distract anyone at the > moment into long discssions on these; I'll organize them into bug > reports and enhancement requests and file for discussion when we next > review priorities. > > Ian was suggesting that this be soon - now is when we need to pick the > next features for you to work on, Ben and Mihael. Maybe a review of > bugs and requests next week, which can be started by email discussion, > and we'll note which topics needs voice or f2f discussion. > > - Mike > > > Mihael Hategan wrote: >> Ok. You have a bunch of errors, mainly of two types: >> 1. 
Missing output file (we should add a rule in error.properties to make >> that verbose message a little more readable). This may be because the >> application didn't run or because the filesystem is broken. Right now an >> exit code file is produced by the wrapper only if the exit code of the >> application is not 0. This does not allow telling between the >> application having completed successfully or the filesystem being >> broken. I believe that a stamp file should also be created by the >> wrapper in order to distinguish between the two. The reason for the >> stamp file instead of always having an exit code file is that it is more >> efficient to check the existence of a file than to stage it out and look >> at its contents. >> >> 2. Exit code != 0. Looks like some issues with R. >> >> Mihael >> >> On Thu, 2007-08-30 at 08:31 -0500, Michael Wilde wrote: >>> Resending this after changing list to take larger attachments. >>> Previous message seems to have gotten lost (I musta pressed the wrong >>> button in the list manager?) >>> >>> --- >>> >>> I'm progressing on the angle runs. Previous errors were due to problems >>> with svn update, and then apparently needing ant clean and distclean. >>> >>> Now I'm executing but getting I/O errors. Ive attached all the logs and >>> output from this run. >>> >>> My result files are coming back zero-length and Im seeing I/O errors in >>> the logs (eg, in swift.out): >>> >>> ... >>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>> SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting >>> status to Active >>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>> Active >>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>> Failed Exception in getFile >>> >>> ... >>> >>> My suspcion is that the app is failing and not proucing an expected >>> output file. Perhaps theres a clean error in the log that says this but >>> I havent found it yet. I think I saw error #500's from gridftp in >>> the log. >>> >>> While I debug further, if anyone sees a different or obvious cause, I'd >>> appreciate your eyeballs on it. >>> >>> Thanks, >>> >>> Mike >>> >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> >> > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > From wilde at mcs.anl.gov Thu Aug 30 12:36:38 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 12:36:38 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <46D6FCC2.7000007@mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> <46D6FCC2.7000007@mcs.anl.gov> Message-ID: <46D70026.9030201@mcs.anl.gov> i should note that i did se the jobs getting retried, but no clear indication of what their exit code was. Michael Wilde wrote: > Following up on this, Mihael, you said: > > >> 2. Exit code != 0. Looks like some issues with R. > > I dont see where in the logs you observed that the jobs were failing. I > think that would have tipped me off earlier that I have an app problem. > > I must be looking in the wrong place. 
I redirected stdout and stderr > into a file, starting swift like this: > > $ swift -debug awf2.swift >swift.out 2>&1 & > > from which I get the following logs when all is done: > > $ wc -l *log *out > 1 awf2-rm4p72i7lp0r0.0.rlog > 1322 awf2-rm4p72i7lp0r0.log > 1 swift.log > 1400 swift.out > 2724 total > $ > > The awf2*.log file seems to be more or less a timestamped version of > stdout/err. (Interesting to note where the extra lines are going that > are in swift.out but not in awf2*.log, though. ) > > In the .log file I see the text that Ive excerpted below. I think the > following impovements could be made and wonder if you agree: > > - Clearly show job exit code (I still dont see this) > - Use mnemonic codes for task types (rather than 1,2...) > - for the logs, map task URNs to simple integers; > display the mapping up front > > - Mike > > > 2007-08-29 18:23:53,895 INFO vdl:dostagein Staged in pc1.pcap to > awf2-rm4p72i7lp0r0/shared/ on UC > 2007-08-29 18:23:53,896 INFO vdl:execute2 Running job angle4-h2fjbhgi > angle4 with arguments [pc1.pcap, > of-75398839-775c-40ac-bd5c-49275e3269d5-0-1, > cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1] in > awf2-rm4p72i7lp0r0/angle4-h2fjbhgi on\ > UC > 2007-08-29 18:23:54,078 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-3-0-1188429807105) setting status to Submitted > 2007-08-29 18:23:54,943 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-2-0-1188429807107) setting status to Submitted > 2007-08-29 18:23:55,364 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-6-0-1188429807109) setting status to Submitted > 2007-08-29 18:23:55,503 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-3-0-1188429807105) setting status to Active > 2007-08-29 18:23:57,057 DEBUG TaskImpl Task(type=2, > identity=urn:0-0-1-0-1-1188429807096) setting status to Completed > ... > 2007-08-29 18:23:58,117 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-1-0-1188429807111) setting status to Submitted > 2007-08-29 18:24:01,480 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-4-0-1188429807103) setting status to Active > 2007-08-29 18:24:06,322 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-2-0-1188429807107) setting status to Active > 2007-08-29 18:24:06,727 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-6-0-1188429807109) setting status to Completed > 2007-08-29 18:24:06,729 DEBUG TaskImpl Task(type=4, > identity=urn:0-0-6-0-1188429807113) setting status to Active > 2007-08-29 18:24:06,734 DEBUG TaskImpl Task(type=4, > identity=urn:0-0-6-0-1188429807113) setting status to Completed > 2007-08-29 18:24:06,735 INFO vdl:execute2 Completed job angle4-h2fjbhgi > angle4 with arguments [pc1.pcap, > of-75398839-775c-40ac-bd5c-49275e3269d5-0-1, > cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1] on UC > 2007-08-29 18:24:06,744 INFO vdl:dostageout Staging out > awf2-rm4p72i7lp0r0/shared/of-75398839-775c-40ac-bd5c-49275e3269d5-0-1 to > file://localhost/of-75398839-775c-40ac-bd5c-49275e3269d5-0-1 from UC > 2007-08-29 18:24:06,744 INFO vdl:dostageout Staging out > awf2-rm4p72i7lp0r0/shared/cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1 to > file://localhost/cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1 from UC > 2007-08-29 18:24:06,745 DEBUG TaskImpl Task(type=4, > identity=urn:0-0-6-0-1-1188429807115) setting status to Active > > > Michael Wilde wrote: >> Great - thanks. That was indeed the problem: my application script >> had a typo and was trying to run the 32-bit binary regardless what >> processor type it wound up on. 
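The architecture mix-up described just above lends itself to a small guard in the application wrapper script. This is only an illustrative sketch: the binary names and the wrapper itself are hypothetical, not the script that was actually used; only the application name angle4 is taken from the logs in this thread.

    case "$(uname -m)" in
        i686) exec ./angle4.i686 "$@" ;;   # hypothetical 32-bit build
        ia64) exec ./angle4.ia64 "$@" ;;   # hypothetical Itanium build
        *)    echo "unsupported architecture: $(uname -m)" >&2; exit 1 ;;
    esac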
When I last run successfully, I was >> getting most or all i686 machines; this time I was getting ia64 machines. >> >> I'll try to re-run it w/o debug, and see if the messages need >> improvement. >> >> Kickstart would have helped here - would have told me that Im running >> on ia64. >> >> This is the kind of problem that on a local machine would have been >> recognizable instantly but on a remote machine through swift, karajan, >> globus and PBS is a much greater challenge to diagnose. We should >> think in terms of how to make that long pipeline to the remote >> execution environment much more transparent to the user. >> >> Think: "what would I see if I ran this locally" and "how do I bring >> that environment to the swift user"? >> >> Also noted that: >> >> - the retry logic here did more harm than good. Maybe we want the >> default for this to be off, especially during debugging. >> >> - in my latest run, which succeeded, the final job completion was >> excessively delayed. The output files were all back on the submit >> host, 4 of 5 jobs were logged as completed, and the completion of the >> final job seemed to take a few minutes longer. >> >> I'll work through the error logs more closely and file an enhancement >> request in bugz. >> >> I can batch these for later discussion or bring them as I encounter >> things, whatever people prefer. I dont want to distract anyone at the >> moment into long discssions on these; I'll organize them into bug >> reports and enhancement requests and file for discussion when we next >> review priorities. >> >> Ian was suggesting that this be soon - now is when we need to pick the >> next features for you to work on, Ben and Mihael. Maybe a review of >> bugs and requests next week, which can be started by email discussion, >> and we'll note which topics needs voice or f2f discussion. >> >> - Mike >> >> >> Mihael Hategan wrote: >>> Ok. You have a bunch of errors, mainly of two types: >>> 1. Missing output file (we should add a rule in error.properties to make >>> that verbose message a little more readable). This may be because the >>> application didn't run or because the filesystem is broken. Right now an >>> exit code file is produced by the wrapper only if the exit code of the >>> application is not 0. This does not allow telling between the >>> application having completed successfully or the filesystem being >>> broken. I believe that a stamp file should also be created by the >>> wrapper in order to distinguish between the two. The reason for the >>> stamp file instead of always having an exit code file is that it is more >>> efficient to check the existence of a file than to stage it out and look >>> at its contents. >>> >>> 2. Exit code != 0. Looks like some issues with R. >>> >>> Mihael >>> >>> On Thu, 2007-08-30 at 08:31 -0500, Michael Wilde wrote: >>>> Resending this after changing list to take larger attachments. >>>> Previous message seems to have gotten lost (I musta pressed the >>>> wrong button in the list manager?) >>>> >>>> --- >>>> >>>> I'm progressing on the angle runs. Previous errors were due to problems >>>> with svn update, and then apparently needing ant clean and distclean. >>>> >>>> Now I'm executing but getting I/O errors. Ive attached all the logs >>>> and >>>> output from this run. >>>> >>>> My result files are coming back zero-length and Im seeing I/O errors in >>>> the logs (eg, in swift.out): >>>> >>>> ... 
>>>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>>> SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting >>>> status to Active >>>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>>> Active >>>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>>> Failed Exception in getFile >>>> >>>> ... >>>> >>>> My suspcion is that the app is failing and not proucing an expected >>>> output file. Perhaps theres a clean error in the log that says this >>>> but >>>> I havent found it yet. I think I saw error #500's from gridftp in >>>> the log. >>>> >>>> While I debug further, if anyone sees a different or obvious cause, I'd >>>> appreciate your eyeballs on it. >>>> >>>> Thanks, >>>> >>>> Mike >>>> >>>> _______________________________________________ >>>> Swift-user mailing list >>>> Swift-user at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>> >>> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From wilde at mcs.anl.gov Thu Aug 30 12:43:03 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 12:43:03 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <1188491451.31884.27.camel@blabla.mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> <1188491451.31884.27.camel@blabla.mcs.anl.gov> Message-ID: <46D701A7.4080409@mcs.anl.gov> Mihael Hategan wrote: > Note: moved to swift-devel. > > On Thu, 2007-08-30 at 10:56 -0500, Michael Wilde wrote: >> Great - thanks. That was indeed the problem: my application script had >> a typo and was trying to run the 32-bit binary regardless what processor >> type it wound up on. When I last run successfully, I was getting most >> or all i686 machines; this time I was getting ia64 machines. >> >> I'll try to re-run it w/o debug, and see if the messages need improvement. > > There is no translation for the cryptic missing file message I know of, > so I doubt that will improve. > >> Kickstart would have helped here - would have told me that Im running on >> ia64. > > What stops you from enabling it? Nothing - that was just an observation. I'll try it once I get comfortable with how the default options behave. > >> This is the kind of problem that on a local machine would have been >> recognizable instantly but on a remote machine through swift, karajan, >> globus and PBS is a much greater challenge to diagnose. We should think >> in terms of how to make that long pipeline to the remote execution >> environment much more transparent to the user. > > I don't think It's the long pipeline that is the problem, but the fact > that the assumptions that you can usually make about your local machine > don't hold for a random machine out there. Moreover, they change > depending on where your job happens to run, whereas your machine stays > the same. We can improve things, I hope, and for that we need concrete > ideas. > >> Think: "what would I see if I ran this locally" and "how do I bring that >> environment to the swift user"? > > You can't bring that environment to the swift user. 
Remote != local, and > it may take a long time until it will be if at all. Question is "what is > a useful set of things/information to troubleshoot such problems and how > do we get that without compromising other things too much". > >> Also noted that: >> >> - the retry logic here did more harm than good. > > Can you be more specific? In this case there was a script error. Every retry that wound up on an IA64 host would fail. But there was no feedback on this aspect of the runtime environment. I suspect a better default is "stop the workflow on first failure", then let the user re-run till the wf is considered "debugged" and then let the user set how things should be retried. - Mike > >> Maybe we want the >> default for this to be off, especially during debugging. > > That, I'm guessing, could be added as an option. > >> - in my latest run, which succeeded, the final job completion was >> excessively delayed. The output files were all back on the submit host, >> 4 of 5 jobs were logged as completed, and the completion of the final >> job seemed to take a few minutes longer. >> >> I'll work through the error logs more closely and file an enhancement >> request in bugz. >> >> I can batch these for later discussion or bring them as I encounter >> things, whatever people prefer. I dont want to distract anyone at the >> moment into long discssions on these; I'll organize them into bug >> reports and enhancement requests and file for discussion when we next >> review priorities. >> >> Ian was suggesting that this be soon - now is when we need to pick the >> next features for you to work on, Ben and Mihael. Maybe a review of >> bugs and requests next week, which can be started by email discussion, >> and we'll note which topics needs voice or f2f discussion. > > Action items! Yummy. > > Mihael > >> - Mike >> >> >> Mihael Hategan wrote: >>> Ok. You have a bunch of errors, mainly of two types: >>> 1. Missing output file (we should add a rule in error.properties to make >>> that verbose message a little more readable). This may be because the >>> application didn't run or because the filesystem is broken. Right now an >>> exit code file is produced by the wrapper only if the exit code of the >>> application is not 0. This does not allow telling between the >>> application having completed successfully or the filesystem being >>> broken. I believe that a stamp file should also be created by the >>> wrapper in order to distinguish between the two. The reason for the >>> stamp file instead of always having an exit code file is that it is more >>> efficient to check the existence of a file than to stage it out and look >>> at its contents. >>> >>> 2. Exit code != 0. Looks like some issues with R. >>> >>> Mihael >>> >>> On Thu, 2007-08-30 at 08:31 -0500, Michael Wilde wrote: >>>> Resending this after changing list to take larger attachments. >>>> Previous message seems to have gotten lost (I musta pressed the wrong >>>> button in the list manager?) >>>> >>>> --- >>>> >>>> I'm progressing on the angle runs. Previous errors were due to problems >>>> with svn update, and then apparently needing ant clean and distclean. >>>> >>>> Now I'm executing but getting I/O errors. Ive attached all the logs and >>>> output from this run. >>>> >>>> My result files are coming back zero-length and Im seeing I/O errors in >>>> the logs (eg, in swift.out): >>>> >>>> ... 
>>>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>>> SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting >>>> status to Active >>>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to Active >>>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to >>>> Failed Exception in getFile >>>> >>>> ... >>>> >>>> My suspcion is that the app is failing and not proucing an expected >>>> output file. Perhaps theres a clean error in the log that says this but >>>> I havent found it yet. I think I saw error #500's from gridftp in the log. >>>> >>>> While I debug further, if anyone sees a different or obvious cause, I'd >>>> appreciate your eyeballs on it. >>>> >>>> Thanks, >>>> >>>> Mike >>>> >>>> _______________________________________________ >>>> Swift-user mailing list >>>> Swift-user at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user >>> > > From hategan at mcs.anl.gov Thu Aug 30 12:45:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 12:45:19 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <46D6FCC2.7000007@mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> <46D6FCC2.7000007@mcs.anl.gov> Message-ID: <1188495919.1180.22.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 12:22 -0500, Michael Wilde wrote: > Following up on this, Mihael, you said: > > >> 2. Exit code != 0. Looks like some issues with R. > > I dont see where in the logs you observed that the jobs were failing. I > think that would have tipped me off earlier that I have an app problem. It normally comes out on stderr. grep -A 1000 "The following errors have occurred" swift.out But that's fundamentally the problem with information overload: it's hard to tell what the relevant part is. That's why you shouldn't run with -d. That information is in the logs anyway. > > I must be looking in the wrong place. I redirected stdout and stderr > into a file, starting swift like this: > > $ swift -debug awf2.swift >swift.out 2>&1 & > > from which I get the following logs when all is done: > > $ wc -l *log *out > 1 awf2-rm4p72i7lp0r0.0.rlog > 1322 awf2-rm4p72i7lp0r0.log > 1 swift.log > 1400 swift.out > 2724 total > $ > > The awf2*.log file seems to be more or less a timestamped version of > stdout/err. That's because you run with -d, which pretty much means "show me everything on stdout". > (Interesting to note where the extra lines are going that > are in swift.out but not in awf2*.log, though. ) Those are the error reports. They are printed on stderr. And yes, the actual log should also contain these. Bug report. > > In the .log file I see the text that Ive excerpted below. I think the > following impovements could be made and wonder if you agree: > > - Clearly show job exit code (I still dont see this) grep -A 10 "exit code" swift.out. I'm not sure what can be more clear in a log file than spelling "application x failed with an exit code of y". Please, don't confuse clarity of a particular message with the difficulty to find a particular message in a haystack of messages. > - Use mnemonic codes for task types (rather than 1,2...) Makes sense. Should be cog bug report. > - for the logs, map task URNs to simple integers; That's not such a good idea. The current scheme shows allows one to figure out the thread hierarchy. 
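Pulling the grep suggestions above together in one place, as a sketch: the match strings are exactly the ones quoted in this thread, and the file names are the ones from this particular run, so adjust both for other workflows.

    grep -A 1000 "The following errors have occurred" swift.out   # failure summary printed on stderr
    grep -A 10 "exit code" swift.out                              # per-application exit code reports
    grep "setting status to Failed" awf2-*.log                    # task status transitions in the run log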
> display the mapping up front grep "running in" swift.out > > - Mike > > > 2007-08-29 18:23:53,895 INFO vdl:dostagein Staged in pc1.pcap to > awf2-rm4p72i7lp0r0/shared/ on UC > 2007-08-29 18:23:53,896 INFO vdl:execute2 Running job angle4-h2fjbhgi > angle4 with arguments [pc1.pcap, > of-75398839-775c-40ac-bd5c-49275e3269d5-0-1, > cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1] in > awf2-rm4p72i7lp0r0/angle4-h2fjbhgi on\ > UC > 2007-08-29 18:23:54,078 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-3-0-1188429807105) setting status to Submitted > 2007-08-29 18:23:54,943 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-2-0-1188429807107) setting status to Submitted > 2007-08-29 18:23:55,364 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-6-0-1188429807109) setting status to Submitted > 2007-08-29 18:23:55,503 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-3-0-1188429807105) setting status to Active > 2007-08-29 18:23:57,057 DEBUG TaskImpl Task(type=2, > identity=urn:0-0-1-0-1-1188429807096) setting status to Completed > ... > 2007-08-29 18:23:58,117 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-1-0-1188429807111) setting status to Submitted > 2007-08-29 18:24:01,480 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-4-0-1188429807103) setting status to Active > 2007-08-29 18:24:06,322 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-2-0-1188429807107) setting status to Active > 2007-08-29 18:24:06,727 DEBUG TaskImpl Task(type=1, > identity=urn:0-0-6-0-1188429807109) setting status to Completed > 2007-08-29 18:24:06,729 DEBUG TaskImpl Task(type=4, > identity=urn:0-0-6-0-1188429807113) setting status to Active > 2007-08-29 18:24:06,734 DEBUG TaskImpl Task(type=4, > identity=urn:0-0-6-0-1188429807113) setting status to Completed > 2007-08-29 18:24:06,735 INFO vdl:execute2 Completed job angle4-h2fjbhgi > angle4 with arguments [pc1.pcap, > of-75398839-775c-40ac-bd5c-49275e3269d5-0-1, > cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1] on UC > 2007-08-29 18:24:06,744 INFO vdl:dostageout Staging out > awf2-rm4p72i7lp0r0/shared/of-75398839-775c-40ac-bd5c-49275e3269d5-0-1 to > file://localhost/of-75398839-775c-40ac-bd5c-49275e3269d5-0-1 from UC > 2007-08-29 18:24:06,744 INFO vdl:dostageout Staging out > awf2-rm4p72i7lp0r0/shared/cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1 to > file://localhost/cf-a8272a9e-0f23-472f-8b4e-9f7825877a5a-0-1 from UC > 2007-08-29 18:24:06,745 DEBUG TaskImpl Task(type=4, > identity=urn:0-0-6-0-1-1188429807115) setting status to Active > > > Michael Wilde wrote: > > Great - thanks. That was indeed the problem: my application script had > > a typo and was trying to run the 32-bit binary regardless what processor > > type it wound up on. When I last run successfully, I was getting most > > or all i686 machines; this time I was getting ia64 machines. > > > > I'll try to re-run it w/o debug, and see if the messages need improvement. > > > > Kickstart would have helped here - would have told me that Im running on > > ia64. > > > > This is the kind of problem that on a local machine would have been > > recognizable instantly but on a remote machine through swift, karajan, > > globus and PBS is a much greater challenge to diagnose. We should think > > in terms of how to make that long pipeline to the remote execution > > environment much more transparent to the user. > > > > Think: "what would I see if I ran this locally" and "how do I bring that > > environment to the swift user"? > > > > Also noted that: > > > > - the retry logic here did more harm than good. 
Maybe we want the > > default for this to be off, especially during debugging. > > > > - in my latest run, which succeeded, the final job completion was > > excessively delayed. The output files were all back on the submit host, > > 4 of 5 jobs were logged as completed, and the completion of the final > > job seemed to take a few minutes longer. > > > > I'll work through the error logs more closely and file an enhancement > > request in bugz. > > > > I can batch these for later discussion or bring them as I encounter > > things, whatever people prefer. I dont want to distract anyone at the > > moment into long discssions on these; I'll organize them into bug > > reports and enhancement requests and file for discussion when we next > > review priorities. > > > > Ian was suggesting that this be soon - now is when we need to pick the > > next features for you to work on, Ben and Mihael. Maybe a review of > > bugs and requests next week, which can be started by email discussion, > > and we'll note which topics needs voice or f2f discussion. > > > > - Mike > > > > > > Mihael Hategan wrote: > >> Ok. You have a bunch of errors, mainly of two types: > >> 1. Missing output file (we should add a rule in error.properties to make > >> that verbose message a little more readable). This may be because the > >> application didn't run or because the filesystem is broken. Right now an > >> exit code file is produced by the wrapper only if the exit code of the > >> application is not 0. This does not allow telling between the > >> application having completed successfully or the filesystem being > >> broken. I believe that a stamp file should also be created by the > >> wrapper in order to distinguish between the two. The reason for the > >> stamp file instead of always having an exit code file is that it is more > >> efficient to check the existence of a file than to stage it out and look > >> at its contents. > >> > >> 2. Exit code != 0. Looks like some issues with R. > >> > >> Mihael > >> > >> On Thu, 2007-08-30 at 08:31 -0500, Michael Wilde wrote: > >>> Resending this after changing list to take larger attachments. > >>> Previous message seems to have gotten lost (I musta pressed the wrong > >>> button in the list manager?) > >>> > >>> --- > >>> > >>> I'm progressing on the angle runs. Previous errors were due to problems > >>> with svn update, and then apparently needing ant clean and distclean. > >>> > >>> Now I'm executing but getting I/O errors. Ive attached all the logs and > >>> output from this run. > >>> > >>> My result files are coming back zero-length and Im seeing I/O errors in > >>> the logs (eg, in swift.out): > >>> > >>> ... > >>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to > >>> SubmittedTask(type=2, identity=urn:0-0-6-0-1-1188429807121) setting > >>> status to Active > >>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to > >>> Active > >>> Task(type=2, identity=urn:0-0-6-0-2-1188429807124) setting status to > >>> Failed Exception in getFile > >>> > >>> ... > >>> > >>> My suspcion is that the app is failing and not proucing an expected > >>> output file. Perhaps theres a clean error in the log that says this but > >>> I havent found it yet. I think I saw error #500's from gridftp in > >>> the log. > >>> > >>> While I debug further, if anyone sees a different or obvious cause, I'd > >>> appreciate your eyeballs on it. 
> >>> > >>> Thanks, > >>> > >>> Mike > >>> > >>> _______________________________________________ > >>> Swift-user mailing list > >>> Swift-user at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > >> > >> > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-user > > > > > From hategan at mcs.anl.gov Thu Aug 30 13:00:25 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 13:00:25 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <46D701A7.4080409@mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> <1188491451.31884.27.camel@blabla.mcs.anl.gov> <46D701A7.4080409@mcs.anl.gov> Message-ID: <1188496825.1180.37.camel@blabla.mcs.anl.gov> > >> Also noted that: > >> > >> - the retry logic here did more harm than good. > > > > Can you be more specific? > > In this case there was a script error. Every retry that wound up on an > IA64 host would fail. But there was no feedback on this aspect of the > runtime environment. > > I suspect a better default is "stop the workflow on first failure", then > let the user re-run till the wf is considered "debugged" and then let > the user set how things should be retried. I think that's an over generalization of a solution to your particular case. It ignores errors due to sites having problems, which is pretty standard, and would cause lots of annoyances. Ioan asked for more retries, and I can understand why. Now you're asking for no retries. The assumption was this: if there's a problem with the application invocation, all retries will eventually fail. There is no way to tell between application failures and site failures (even the exit code may not be the right indicator). Retries dramatically decrease the odds of failing the whole workflow because of a bad node/site (although it depends on the exact initial probability of finding bad nodes). But they do not change much if the invocation is broken. The application not being installed properly is, to a certain extent, a site problem, and chances are that running the same thing on a different site will make it work. Perhaps there should be two different sets of settings: one for setting up the workflow, and one for running it in production mode. Or, perhaps, the information about the workflow should be organized better, using interfaces more intuitive than endless streams of loosely structured text, so that the user can, interactively, explore the various details of what has happened. Now, there's retries and there's lazy errors (compute everything that's possible and only stop after nothing more can be done). You can disable that. swift -help. I think it's -lazy.errors=false. Mihael > > - Mike > From wilde at mcs.anl.gov Thu Aug 30 13:23:24 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 13:23:24 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <1188496825.1180.37.camel@blabla.mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> <1188491451.31884.27.camel@blabla.mcs.anl.gov> <46D701A7.4080409@mcs.anl.gov> <1188496825.1180.37.camel@blabla.mcs.anl.gov> Message-ID: <46D70B1C.1070308@mcs.anl.gov> You make some good points here, Mihael. 
I'll wait till I get a bit more experience. But I don't want to lose the "newbie" perspective, as that's where most users will start (and end) their experience with Swift. I went back to the log/out-err files and I think I see where I was confused: the indication of nonzero exit codes comes out much later in the log; it seems like the earlier jobs failed on output file retrieval long before there was any indication of a non-zero job exit code. This seems to me to need much more scrutiny; either I need to try several more controlled test cases and annotate the logs, or we should walk through a log together and I can explain what questions a newbie has about various messages and what an improved format might be. I'll try the same with debug off to see what the default looks like. Onwards for now but we need to come back to this. - Mike Mihael Hategan wrote: >>>> Also noted that: >>>> >>>> - the retry logic here did more harm than good. >>> Can you be more specific? >> In this case there was a script error. Every retry that wound up on an >> IA64 host would fail. But there was no feedback on this aspect of the >> runtime environment. >> >> I suspect a better default is "stop the workflow on first failure", then >> let the user re-run till the wf is considered "debugged" and then let >> the user set how things should be retried. > > I think that's an over generalization of a solution to your particular > case. It ignores errors due to sites having problems, which is pretty > standard, and would cause lots of annoyances. Ioan asked for more > retries, and I can understand why. Now you're asking for no retries. > > The assumption was this: if there's a problem with the application > invocation, all retries will eventually fail. There is no way to tell > between application failures and site failures (even the exit code may > not be the right indicator). Retries dramatically decrease the odds of > failing the whole workflow because of a bad node/site (although it > depends on the exact initial probability of finding bad nodes). But they > do not change much if the invocation is broken. The application not > being installed properly is, to a certain extent, a site problem, and > chances are that running the same thing on a different site will make it > work. > > Perhaps there should be two different sets of settings: one for setting > up the workflow, and one for running it in production mode. > > Or, perhaps, the information about the workflow should be organized > better, using interfaces more intuitive than endless streams of loosely > structured text, so that the user can, interactively, explore the > various details of what has happened. > > Now, there's retries and there's lazy errors (compute everything that's > possible and only stop after nothing more can be done). You can disable > that. swift -help. I think it's -lazy.errors=false.
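For reference, a run that stops at the first failure instead of computing everything else first would look something like the sketch below. The flag name is taken only from the message above and should be double-checked against swift -help; the script name is the one used earlier in this thread.

    $ swift -lazy.errors=false awf2.swift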
> > Mihael > >> - Mike >> > > > From hategan at mcs.anl.gov Thu Aug 30 14:03:06 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 14:03:06 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <46D70B1C.1070308@mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> <1188491451.31884.27.camel@blabla.mcs.anl.gov> <46D701A7.4080409@mcs.anl.gov> <1188496825.1180.37.camel@blabla.mcs.anl.gov> <46D70B1C.1070308@mcs.anl.gov> Message-ID: <1188500586.4347.22.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 13:23 -0500, Michael Wilde wrote: > You make some good points here, Mihael. > I'll wait till I get a bit more experience. > But I dont want to loose the "newbie" perspective, as thats where most > users will start (and end) their experience with Swift. Then don't try all the fancy non-newbie flags. "-debug" means "I want the details because I think I can make sense of them". > > I went back to the log/out-err files and I think I see where I was > confused: the indication of nonzero exit codes comes out much later in > the log; it seems like the earlier jobs failed on output file retreival > long before there was any indication of a non-zero job exitcode. The exit code is checked first. So exit code errors and missing file errors for a given job are mutually exclusive. Normally these are only reported at the end of the workflow. Anyway, I'll try to put in the stamp file, to distinguish between application failures and filesystem failures. > > This seems to me to need much more scrutiny; either I need to try > several more controlled test cases and annotate the logs, or we should > walk through a log together and I can explain what questions a newbie > has about various messages and what an improved format might be. There are two directions here. One is improving what we have, and the other is re-inventing what we have. Now, I'm not saying that there are no mistakes at all in the reasoning leading to the current state. But most of the things in there were not randomly thrown in, but the result of (I'd like to think) careful thinking. As much as there can be given the complexity of the problem. So there is a fine line between improving and re-inventing. If it's aggressively crossed, we may end up improving few things at the expense of considerable time. Of course, not crossing that line assumes a certain level of trust. Which is hard to formally define. In any event, those are, I think, the options. Mihael > > I'll try same with debug off to see what the default looks like. > > Onwards for now but we need to come back to this. > > - Mike > > > Mihael Hategan wrote: > >>>> Also noted that: > >>>> > >>>> - the retry logic here did more harm than good. > >>> Can you be more specific? > >> In this case there was a script error. Every retry that wound up on an > >> IA64 host would fail. But there was no feedback on this aspect of the > >> runtime environment. > >> > >> I suspect a better default is "stop the workflow on first failure", then > >> let the user re-run till the wf is considered "debugged" and then let > >> the user set how things should be retried. > > > > I think that's an over generalization of a solution to your particular > > case. It ignores errors due to sites having problems, which is pretty > > standard, and would cause lots of annoyances. Ioan asked for more > > retries, and I can understand why. Now you're asking for no retries. 
> > > > The assumption was this: if there's a problem with the application > > invocation, all retries will eventually fail. There is no way to tell > > between application failures and site failures (even the exit code may > > not be the right indicator). Retries dramatically decrease the odds of > > failing the whole workflow because of a bad node/site (although it > > depends on the exact initial probability of finding bad nodes). But they > > do not change much if the invocation is broken. The application not > > being installed properly is, to a certain extent, a site problem, and > > chances are that running the same thing on a different site will make it > > work. > > > > Perhaps there should be two different sets of settings: one for setting > > up the workflow, and one for running it in production mode. > > > > Or, perhaps, the information about the workflow should be organized > > better, using interfaces more intuitive than endless streams of loosely > > structured text, so that the user can, interactively, explore the > > various details of what has happened. > > > > Now, there's retries and there's lazy errors (compute everything that's > > possible and only stop after nothing more can be done). You can disable > > that. swift -help. I think it's -lazy.errors=false. > > > > Mihael > > > >> - Mike > >> > > > > > > > From bugzilla-daemon at mcs.anl.gov Thu Aug 30 14:08:43 2007 From: bugzilla-daemon at mcs.anl.gov (bugzilla-daemon at mcs.anl.gov) Date: Thu, 30 Aug 2007 14:08:43 -0500 (CDT) Subject: [Swift-devel] [Bug 90] New: Ability to identify filesystem problems Message-ID: http://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=90 Summary: Ability to identify filesystem problems Product: Swift Version: unspecified Platform: PC OS/Version: Linux Status: NEW Severity: normal Priority: P2 Component: General AssignedTo: hategan at mcs.anl.gov ReportedBy: hategan at mcs.anl.gov Currently there is some ambiguity between a problem with a shared file system on a cluster and an application not having produced certain output files. To help in troubleshooting, this distinction should be eliminated. A way to do this would be for the wrapper, which runs on the worker node, to always produce a well-determined file that can be checked before staging out files. Should that file be missing, it would indicate that there is some conflict between what is on the filesystem on the worker node, and what is seen by the file server on, typically, the head node. -- Configure bugmail: http://bugzilla.mcs.anl.gov/swift/userprefs.cgi?tab=email ------- You are receiving this mail because: ------- You reported the bug, or are watching the reporter. You are the assignee for the bug, or are watching the assignee.
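The stamp-file idea proposed in bug 90 can be sketched roughly as follows. This is not the actual Swift wrapper or stage-out code, just a minimal Java illustration of the check being proposed: the submit side waits briefly for a marker file that the remote wrapper always writes, and only treats missing outputs as an application error once that marker is visible. The file name, polling interval, and timeout here are hypothetical.

===
import java.io.File;

// Minimal sketch of the bug 90 proposal; not the actual Swift wrapper or
// stage-out code. All names and timeouts are hypothetical.
public class StampCheck {

    // Poll the job directory on the shared filesystem for the stamp file
    // that the wrapper is supposed to write when it finishes.
    public static boolean waitForStamp(File jobDir, long timeoutMillis)
            throws InterruptedException {
        File stamp = new File(jobDir, "wrapper.done"); // hypothetical name
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (stamp.exists()) {
                return true; // wrapper finished and the shared FS shows it
            }
            Thread.sleep(1000); // the shared FS may lag behind the worker node
        }
        return false; // stamp never appeared: suspect the filesystem, not the app
    }

    public static void main(String[] args) throws InterruptedException {
        File jobDir = new File(args[0]);
        if (waitForStamp(jobDir, 30000)) {
            // Only now is a missing output file evidence of an application error.
            System.out.println("stamp present; check the declared output files");
        } else {
            System.err.println("stamp missing; likely a shared-filesystem visibility problem");
        }
    }
}
===

The point of the distinction is the one the bug makes: a missing stamp points at a disagreement between what the worker node wrote and what the head node sees, while a present stamp plus missing outputs points back at the application itself.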
From hategan at mcs.anl.gov Thu Aug 30 14:13:55 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 14:13:55 -0500 Subject: [Swift-devel] Re: [Swift-user] Resending: I/O errors in swift script In-Reply-To: <1188500586.4347.22.camel@blabla.mcs.anl.gov> References: <46D6C6B9.6020708@mcs.anl.gov> <1188483739.27541.8.camel@blabla.mcs.anl.gov> <46D6E892.5000706@mcs.anl.gov> <1188491451.31884.27.camel@blabla.mcs.anl.gov> <46D701A7.4080409@mcs.anl.gov> <1188496825.1180.37.camel@blabla.mcs.anl.gov> <46D70B1C.1070308@mcs.anl.gov> <1188500586.4347.22.camel@blabla.mcs.anl.gov> Message-ID: <1188501235.5419.4.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 14:03 -0500, Mihael Hategan wrote: > On Thu, 2007-08-30 at 13:23 -0500, Michael Wilde wrote: > > > > I went back to the log/out-err files and I think I see where I was > > confused: the indication of nonzero exit codes comes out much later in > > the log; it seems like the earlier jobs failed on output file retreival > > long before there was any indication of a non-zero job exitcode. > > The exit code is checked first. So exit code errors and missing file > errors for a given job are mutually exclusive. Normally these are only > reported at the end of the workflow. Anyway, I'll try to put in the > stamp file, to distinguish between application failures and filesystem > failures. Makes me think though. Could this be a sfs synchronization problem? I know Globus waits for a similar stamp file from a job to be visible on the head node. But does that guarantee that all files produced by the job will be visible and their contents up to date? Can we assume that individual items in the set of things that can be observed from a sfs on a node have the same ordering as their individual causes on another node? > From hategan at mcs.anl.gov Thu Aug 30 14:50:40 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 14:50:40 -0500 Subject: [Swift-devel] svn info in displayed version info In-Reply-To: References: Message-ID: <1188503440.10591.1.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 12:41 +0000, Ben Clifford wrote: > As of r1141, swift will display SVN revision number and an attempt to > guess whether the source has been modified from SVN. > > This introduces a built dependency on SVN, Also on bash for building. May I suggest writing a simple java class instead of a bash script, so that this can still be run on windows, and other systems that may not have bash for that matter? > but I don't think anyone builds > without SVN around. > From wilde at mcs.anl.gov Thu Aug 30 18:47:43 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 18:47:43 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift Message-ID: <46D7571F.2010605@mcs.anl.gov> I'm using trunk (release 1139) and getting the following error when I run swift: === Execution failed: No security context can be found or created for service (provider deef): No 'deef' provider or alias found. Available providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; gsiftp <-> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 .0.2, gt4.0.1, gt4.0.0; === I did what I think was a clean Swift build (ant dist after clean and distclean); then from the modules/provider-deef dir did a ant distclean and ant dist pointing to my swift vdsk dir that was built in the prior step). 
I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, and my libexec/vds-sc.k file has: === element(execution, [provider, url] service(type="execution", provider=provider, url=url) ) === which should match the pool entry in my sites.xml: /home/wilde/swift/tmp/UC === Does anyone know what I missed to messed up here in wiring things together? Ive asked Ioan, but he's stumped because this is on the swift provider side of things. Thanks, Mike From hategan at mcs.anl.gov Thu Aug 30 19:35:00 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 19:35:00 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: <46D7571F.2010605@mcs.anl.gov> References: <46D7571F.2010605@mcs.anl.gov> Message-ID: <1188520501.18147.11.camel@blabla.mcs.anl.gov> Hmm. Strange: hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml -tc.file ./tc.data test.swift WARN - Failed to configure log file name DEBUG - Booting Falkon ... hategan at tg-viz-login1:~> which swift /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift Mihael On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: > I'm using trunk (release 1139) and getting the following error when I > run swift: > > === > Execution failed: > No security context can be found or created for service > (provider deef): No 'deef' provider or alias found. Available > providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, > dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: > local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; gsiftp <-> > gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 > .0.2, gt4.0.1, gt4.0.0; > === > > I did what I think was a clean Swift build (ant dist after clean and > distclean); then from the modules/provider-deef dir did a ant distclean > and ant dist pointing to my swift vdsk dir that was built in the prior > step). > > I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, and > my libexec/vds-sc.k file has: > === > element(execution, [provider, url] > service(type="execution", > provider=provider, url=url) > ) > === > > which should match the pool entry in my sites.xml: > > > storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> > url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> > /home/wilde/swift/tmp/UC > > > === > > Does anyone know what I missed to messed up here in wiring things together? > > Ive asked Ioan, but he's stumped because this is on the swift provider > side of things. > > Thanks, > > Mike > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From nefedova at mcs.anl.gov Thu Aug 30 19:44:19 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 30 Aug 2007 19:44:19 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: <1188520501.18147.11.camel@blabla.mcs.anl.gov> References: <46D7571F.2010605@mcs.anl.gov> <1188520501.18147.11.camel@blabla.mcs.anl.gov> Message-ID: You need to fix log4j.properties, I had the same problem some time ago. SVN update somehow messes it up. Nika On Aug 30, 2007, at 7:35 PM, Mihael Hategan wrote: > Hmm. Strange: > hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml > -tc.file ./tc.data test.swift > WARN - Failed to configure log file name > DEBUG - Booting Falkon > ... 
> > hategan at tg-viz-login1:~> which swift > /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift > > > Mihael > > On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: >> I'm using trunk (release 1139) and getting the following error when I >> run swift: >> >> === >> Execution failed: >> No security context can be found or created for service >> (provider deef): No 'deef' provider or alias found. Available >> providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, >> dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: >> local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; >> gsiftp <-> >> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 >> .0.2, gt4.0.1, gt4.0.0; >> === >> >> I did what I think was a clean Swift build (ant dist after clean and >> distclean); then from the modules/provider-deef dir did a ant >> distclean >> and ant dist pointing to my swift vdsk dir that was built in the >> prior >> step). >> >> I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, >> and >> my libexec/vds-sc.k file has: >> === >> element(execution, [provider, url] >> service(type="execution", >> provider=provider, url=url) >> ) >> === >> >> which should match the pool entry in my sites.xml: >> >> >> > storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> >> > url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> >> /home/wilde/swift/tmp/UC >> >> >> === >> >> Does anyone know what I missed to messed up here in wiring things >> together? >> >> Ive asked Ioan, but he's stumped because this is on the swift >> provider >> side of things. >> >> Thanks, >> >> Mike >> _______________________________________________ >> Swift-devel mailing list >> Swift-devel at ci.uchicago.edu >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >> > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From hategan at mcs.anl.gov Thu Aug 30 20:02:18 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 20:02:18 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: References: <46D7571F.2010605@mcs.anl.gov> <1188520501.18147.11.camel@blabla.mcs.anl.gov> Message-ID: <1188522138.18147.22.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 19:44 -0500, Veronika Nefedova wrote: > You need to fix log4j.properties, I had the same problem some time > ago. SVN update somehow messes it up. Irrespective of how logging is set up. When I run, seemingly the same thing as Mike, it works and it finds the deef provider. So the problem of finding deef is probably not in the build, but in something else, unless this is nondeterministic. But from what I understand, Mike repeatedly got this. Or no? Mihael > > Nika > On Aug 30, 2007, at 7:35 PM, Mihael Hategan wrote: > > > Hmm. Strange: > > hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml > > -tc.file ./tc.data test.swift > > WARN - Failed to configure log file name > > DEBUG - Booting Falkon > > ... > > > > hategan at tg-viz-login1:~> which swift > > /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift > > > > > > Mihael > > > > On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: > >> I'm using trunk (release 1139) and getting the following error when I > >> run swift: > >> > >> === > >> Execution failed: > >> No security context can be found or created for service > >> (provider deef): No 'deef' provider or alias found. 
Available > >> providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, > >> dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: > >> local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; > >> gsiftp <-> > >> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 > >> .0.2, gt4.0.1, gt4.0.0; > >> === > >> > >> I did what I think was a clean Swift build (ant dist after clean and > >> distclean); then from the modules/provider-deef dir did a ant > >> distclean > >> and ant dist pointing to my swift vdsk dir that was built in the > >> prior > >> step). > >> > >> I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, > >> and > >> my libexec/vds-sc.k file has: > >> === > >> element(execution, [provider, url] > >> service(type="execution", > >> provider=provider, url=url) > >> ) > >> === > >> > >> which should match the pool entry in my sites.xml: > >> > >> > >> >> storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> > >> >> url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> > >> /home/wilde/swift/tmp/UC > >> > >> > >> === > >> > >> Does anyone know what I missed to messed up here in wiring things > >> together? > >> > >> Ive asked Ioan, but he's stumped because this is on the swift > >> provider > >> side of things. > >> > >> Thanks, > >> > >> Mike > >> _______________________________________________ > >> Swift-devel mailing list > >> Swift-devel at ci.uchicago.edu > >> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >> > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > > From nefedova at mcs.anl.gov Thu Aug 30 20:09:44 2007 From: nefedova at mcs.anl.gov (Veronika Nefedova) Date: Thu, 30 Aug 2007 20:09:44 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: <1188522138.18147.22.camel@blabla.mcs.anl.gov> References: <46D7571F.2010605@mcs.anl.gov> <1188520501.18147.11.camel@blabla.mcs.anl.gov> <1188522138.18147.22.camel@blabla.mcs.anl.gov> Message-ID: <64C4F84D-9469-41A7-B63C-5826E55320B2@mcs.anl.gov> I was commenting on these errors: WARN - Failed to configure log file name DEBUG - Booting Falkon If you have this, it means your logging is completely screwed up (no log file). It could be fixed in log4j.properties. On Aug 30, 2007, at 8:02 PM, Mihael Hategan wrote: > On Thu, 2007-08-30 at 19:44 -0500, Veronika Nefedova wrote: >> You need to fix log4j.properties, I had the same problem some time >> ago. SVN update somehow messes it up. > > Irrespective of how logging is set up. When I run, seemingly the same > thing as Mike, it works and it finds the deef provider. > > So the problem of finding deef is probably not in the build, but in > something else, unless this is nondeterministic. But from what I > understand, Mike repeatedly got this. Or no? > > Mihael > >> >> Nika >> On Aug 30, 2007, at 7:35 PM, Mihael Hategan wrote: >> >>> Hmm. Strange: >>> hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml >>> -tc.file ./tc.data test.swift >>> WARN - Failed to configure log file name >>> DEBUG - Booting Falkon >>> ... 
>>> >>> hategan at tg-viz-login1:~> which swift >>> /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift >>> >>> >>> Mihael >>> >>> On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: >>>> I'm using trunk (release 1139) and getting the following error >>>> when I >>>> run swift: >>>> >>>> === >>>> Execution failed: >>>> No security context can be found or created for service >>>> (provider deef): No 'deef' provider or alias found. Available >>>> providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, >>>> dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: >>>> local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; >>>> gsiftp <-> >>>> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 >>>> .0.2, gt4.0.1, gt4.0.0; >>>> === >>>> >>>> I did what I think was a clean Swift build (ant dist after clean >>>> and >>>> distclean); then from the modules/provider-deef dir did a ant >>>> distclean >>>> and ant dist pointing to my swift vdsk dir that was built in the >>>> prior >>>> step). >>>> >>>> I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, >>>> and >>>> my libexec/vds-sc.k file has: >>>> === >>>> element(execution, [provider, url] >>>> service(type="execution", >>>> provider=provider, url=url) >>>> ) >>>> === >>>> >>>> which should match the pool entry in my sites.xml: >>>> >>>> >>>> >>> storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> >>>> >>> url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> >>>> /home/wilde/swift/tmp/UC >>>> >>>> >>>> === >>>> >>>> Does anyone know what I missed to messed up here in wiring things >>>> together? >>>> >>>> Ive asked Ioan, but he's stumped because this is on the swift >>>> provider >>>> side of things. >>>> >>>> Thanks, >>>> >>>> Mike >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> >> > From wilde at mcs.anl.gov Thu Aug 30 20:10:40 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Thu, 30 Aug 2007 20:10:40 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: <1188522138.18147.22.camel@blabla.mcs.anl.gov> References: <46D7571F.2010605@mcs.anl.gov> <1188520501.18147.11.camel@blabla.mcs.anl.gov> <1188522138.18147.22.camel@blabla.mcs.anl.gov> Message-ID: <46D76A90.40700@mcs.anl.gov> RIght, but I'll check my env to see if I spot anything suspicious. And see if I can duplicate your startup command Mihael. WIll also check the log4j thing Nika. - Mike Mihael Hategan wrote: > On Thu, 2007-08-30 at 19:44 -0500, Veronika Nefedova wrote: >> You need to fix log4j.properties, I had the same problem some time >> ago. SVN update somehow messes it up. > > Irrespective of how logging is set up. When I run, seemingly the same > thing as Mike, it works and it finds the deef provider. > > So the problem of finding deef is probably not in the build, but in > something else, unless this is nondeterministic. But from what I > understand, Mike repeatedly got this. Or no? > > Mihael > >> Nika >> On Aug 30, 2007, at 7:35 PM, Mihael Hategan wrote: >> >>> Hmm. Strange: >>> hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml >>> -tc.file ./tc.data test.swift >>> WARN - Failed to configure log file name >>> DEBUG - Booting Falkon >>> ... 
>>> >>> hategan at tg-viz-login1:~> which swift >>> /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift >>> >>> >>> Mihael >>> >>> On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: >>>> I'm using trunk (release 1139) and getting the following error when I >>>> run swift: >>>> >>>> === >>>> Execution failed: >>>> No security context can be found or created for service >>>> (provider deef): No 'deef' provider or alias found. Available >>>> providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, >>>> dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: >>>> local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; >>>> gsiftp <-> >>>> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 >>>> .0.2, gt4.0.1, gt4.0.0; >>>> === >>>> >>>> I did what I think was a clean Swift build (ant dist after clean and >>>> distclean); then from the modules/provider-deef dir did a ant >>>> distclean >>>> and ant dist pointing to my swift vdsk dir that was built in the >>>> prior >>>> step). >>>> >>>> I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, >>>> and >>>> my libexec/vds-sc.k file has: >>>> === >>>> element(execution, [provider, url] >>>> service(type="execution", >>>> provider=provider, url=url) >>>> ) >>>> === >>>> >>>> which should match the pool entry in my sites.xml: >>>> >>>> >>>> >>> storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> >>>> >>> url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> >>>> /home/wilde/swift/tmp/UC >>>> >>>> >>>> === >>>> >>>> Does anyone know what I missed to messed up here in wiring things >>>> together? >>>> >>>> Ive asked Ioan, but he's stumped because this is on the swift >>>> provider >>>> side of things. >>>> >>>> Thanks, >>>> >>>> Mike >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >>> _______________________________________________ >>> Swift-devel mailing list >>> Swift-devel at ci.uchicago.edu >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>> > > From hategan at mcs.anl.gov Thu Aug 30 20:21:19 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 20:21:19 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: <64C4F84D-9469-41A7-B63C-5826E55320B2@mcs.anl.gov> References: <46D7571F.2010605@mcs.anl.gov> <1188520501.18147.11.camel@blabla.mcs.anl.gov> <1188522138.18147.22.camel@blabla.mcs.anl.gov> <64C4F84D-9469-41A7-B63C-5826E55320B2@mcs.anl.gov> Message-ID: <1188523279.18147.27.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 20:09 -0500, Veronika Nefedova wrote: > I was commenting on these errors: > > WARN - Failed to configure log file name > DEBUG - Booting Falkon > > If you have this, it means your logging is completely screwed up (no > log file). It could be fixed in log4j.properties. Yeah. The build system cannot very well cope with log4j.properties when multiple builds are done. We should probably make provider-deef build as a dependency from the start. > > > On Aug 30, 2007, at 8:02 PM, Mihael Hategan wrote: > > > On Thu, 2007-08-30 at 19:44 -0500, Veronika Nefedova wrote: > >> You need to fix log4j.properties, I had the same problem some time > >> ago. SVN update somehow messes it up. > > > > Irrespective of how logging is set up. When I run, seemingly the same > > thing as Mike, it works and it finds the deef provider. 
> > > > So the problem of finding deef is probably not in the build, but in > > something else, unless this is nondeterministic. But from what I > > understand, Mike repeatedly got this. Or no? > > > > Mihael > > > >> > >> Nika > >> On Aug 30, 2007, at 7:35 PM, Mihael Hategan wrote: > >> > >>> Hmm. Strange: > >>> hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml > >>> -tc.file ./tc.data test.swift > >>> WARN - Failed to configure log file name > >>> DEBUG - Booting Falkon > >>> ... > >>> > >>> hategan at tg-viz-login1:~> which swift > >>> /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift > >>> > >>> > >>> Mihael > >>> > >>> On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: > >>>> I'm using trunk (release 1139) and getting the following error > >>>> when I > >>>> run swift: > >>>> > >>>> === > >>>> Execution failed: > >>>> No security context can be found or created for service > >>>> (provider deef): No 'deef' provider or alias found. Available > >>>> providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, > >>>> dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: > >>>> local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; > >>>> gsiftp <-> > >>>> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 > >>>> .0.2, gt4.0.1, gt4.0.0; > >>>> === > >>>> > >>>> I did what I think was a clean Swift build (ant dist after clean > >>>> and > >>>> distclean); then from the modules/provider-deef dir did a ant > >>>> distclean > >>>> and ant dist pointing to my swift vdsk dir that was built in the > >>>> prior > >>>> step). > >>>> > >>>> I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, > >>>> and > >>>> my libexec/vds-sc.k file has: > >>>> === > >>>> element(execution, [provider, url] > >>>> service(type="execution", > >>>> provider=provider, url=url) > >>>> ) > >>>> === > >>>> > >>>> which should match the pool entry in my sites.xml: > >>>> > >>>> > >>>> >>>> storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> > >>>> >>>> url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> > >>>> /home/wilde/swift/tmp/UC > >>>> > >>>> > >>>> === > >>>> > >>>> Does anyone know what I missed to messed up here in wiring things > >>>> together? > >>>> > >>>> Ive asked Ioan, but he's stumped because this is on the swift > >>>> provider > >>>> side of things. 
> >>>> > >>>> Thanks, > >>>> > >>>> Mike > >>>> _______________________________________________ > >>>> Swift-devel mailing list > >>>> Swift-devel at ci.uchicago.edu > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>>> > >>> > >>> _______________________________________________ > >>> Swift-devel mailing list > >>> Swift-devel at ci.uchicago.edu > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > >>> > >> > > > From hategan at mcs.anl.gov Thu Aug 30 20:23:30 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Thu, 30 Aug 2007 20:23:30 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: <1188523279.18147.27.camel@blabla.mcs.anl.gov> References: <46D7571F.2010605@mcs.anl.gov> <1188520501.18147.11.camel@blabla.mcs.anl.gov> <1188522138.18147.22.camel@blabla.mcs.anl.gov> <64C4F84D-9469-41A7-B63C-5826E55320B2@mcs.anl.gov> <1188523279.18147.27.camel@blabla.mcs.anl.gov> Message-ID: <1188523410.18147.29.camel@blabla.mcs.anl.gov> On Thu, 2007-08-30 at 20:21 -0500, Mihael Hategan wrote: > On Thu, 2007-08-30 at 20:09 -0500, Veronika Nefedova wrote: > > I was commenting on these errors: > > > > WARN - Failed to configure log file name > > DEBUG - Booting Falkon > > > > If you have this, it means your logging is completely screwed up (no > > log file). It could be fixed in log4j.properties. > > Yeah. The build system cannot very well cope with log4j.properties when > multiple builds are done. We should probably make provider-deef build as > a dependency from the start. Or I should fix the build system. > > > > > > > On Aug 30, 2007, at 8:02 PM, Mihael Hategan wrote: > > > > > On Thu, 2007-08-30 at 19:44 -0500, Veronika Nefedova wrote: > > >> You need to fix log4j.properties, I had the same problem some time > > >> ago. SVN update somehow messes it up. > > > > > > Irrespective of how logging is set up. When I run, seemingly the same > > > thing as Mike, it works and it finds the deef provider. > > > > > > So the problem of finding deef is probably not in the build, but in > > > something else, unless this is nondeterministic. But from what I > > > understand, Mike repeatedly got this. Or no? > > > > > > Mihael > > > > > >> > > >> Nika > > >> On Aug 30, 2007, at 7:35 PM, Mihael Hategan wrote: > > >> > > >>> Hmm. Strange: > > >>> hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml > > >>> -tc.file ./tc.data test.swift > > >>> WARN - Failed to configure log file name > > >>> DEBUG - Booting Falkon > > >>> ... > > >>> > > >>> hategan at tg-viz-login1:~> which swift > > >>> /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift > > >>> > > >>> > > >>> Mihael > > >>> > > >>> On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: > > >>>> I'm using trunk (release 1139) and getting the following error > > >>>> when I > > >>>> run swift: > > >>>> > > >>>> === > > >>>> Execution failed: > > >>>> No security context can be found or created for service > > >>>> (provider deef): No 'deef' provider or alias found. Available > > >>>> providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, > > >>>> dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. 
Aliases: > > >>>> local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; > > >>>> gsiftp <-> > > >>>> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 > > >>>> .0.2, gt4.0.1, gt4.0.0; > > >>>> === > > >>>> > > >>>> I did what I think was a clean Swift build (ant dist after clean > > >>>> and > > >>>> distclean); then from the modules/provider-deef dir did a ant > > >>>> distclean > > >>>> and ant dist pointing to my swift vdsk dir that was built in the > > >>>> prior > > >>>> step). > > >>>> > > >>>> I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, > > >>>> and > > >>>> my libexec/vds-sc.k file has: > > >>>> === > > >>>> element(execution, [provider, url] > > >>>> service(type="execution", > > >>>> provider=provider, url=url) > > >>>> ) > > >>>> === > > >>>> > > >>>> which should match the pool entry in my sites.xml: > > >>>> > > >>>> > > >>>> > >>>> storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> > > >>>> > >>>> url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> > > >>>> /home/wilde/swift/tmp/UC > > >>>> > > >>>> > > >>>> === > > >>>> > > >>>> Does anyone know what I missed to messed up here in wiring things > > >>>> together? > > >>>> > > >>>> Ive asked Ioan, but he's stumped because this is on the swift > > >>>> provider > > >>>> side of things. > > >>>> > > >>>> Thanks, > > >>>> > > >>>> Mike > > >>>> _______________________________________________ > > >>>> Swift-devel mailing list > > >>>> Swift-devel at ci.uchicago.edu > > >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >>>> > > >>> > > >>> _______________________________________________ > > >>> Swift-devel mailing list > > >>> Swift-devel at ci.uchicago.edu > > >>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > >>> > > >> > > > > > > > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > From benc at hawaga.org.uk Fri Aug 31 04:56:26 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 31 Aug 2007 09:56:26 +0000 (GMT) Subject: [Swift-devel] svn info in displayed version info In-Reply-To: <1188503440.10591.1.camel@blabla.mcs.anl.gov> References: <1188503440.10591.1.camel@blabla.mcs.anl.gov> Message-ID: On Thu, 30 Aug 2007, Mihael Hategan wrote: > > This introduces a built dependency on SVN, > > Also on bash for building. May I suggest writing a simple java class > instead of a bash script, so that this can still be run on windows, and > other systems that may not have bash for that matter? I added an OS test so that the SVN version info will only be added for linux and osx builds. -- From benc at hawaga.org.uk Fri Aug 31 05:53:22 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 31 Aug 2007 10:53:22 +0000 (GMT) Subject: [Swift-devel] Re: latest Falkon code is in SVN! In-Reply-To: <46D4AB05.9070900@cs.uchicago.edu> References: <46D4970E.4000809@cs.uchicago.edu> <46D4A46C.6070806@mcs.anl.gov> <46D4AB05.9070900@cs.uchicago.edu> Message-ID: When I run the client, I get this error, which looks like maybe you've got some pre-compiled code that is compiled with something later than the 1.5 JDK that I use... 
2007-08-31 11:50:59,646 ERROR container.ServiceThread [ServiceThread-2,run:297] Unexpected error during request processing java.lang.UnsupportedClassVersionError: Bad version number in .class file at java.lang.ClassLoader.defineClass1(Native Method) at java.lang.ClassLoader.defineClass(ClassLoader.java:620) at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124) at java.net.URLClassLoader.defineClass(URLClassLoader.java:260) at java.net.URLClassLoader.access$100(URLClassLoader.java:56) at java.net.URLClassLoader$1.run(URLClassLoader.java:195) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:188) at java.lang.ClassLoader.loadClass(ClassLoader.java:306) at java.lang.ClassLoader.loadClass(ClassLoader.java:251) at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:242) at org.apache.axis.utils.ClassUtils$2.run(ClassUtils.java:176) at java.security.AccessController.doPrivileged(Native Method) at org.apache.axis.utils.ClassUtils.loadClass(ClassUtils.java:160) at org.apache.axis.utils.ClassUtils.forName(ClassUtils.java:142) at org.apache.axis.utils.cache.ClassCache.lookup(ClassCache.java:85) at org.apache.axis.providers.java.JavaProvider.getServiceClass(JavaProvider.java:424) at org.apache.axis.providers.java.JavaProvider.initServiceDesc(JavaProvider.java:457) at org.apache.axis.handlers.soap.SOAPService.getInitializedServiceDesc(SOAPService.java:283) at org.apache.axis.deployment.wsdd.WSDDService.makeNewInstance(WSDDService.java:487) at org.apache.axis.deployment.wsdd.WSDDDeployableItem.getNewInstance(WSDDDeployableItem.java:274) at org.apache.axis.deployment.wsdd.WSDDDeployableItem.getInstance(WSDDDeployableItem.java:260) at org.apache.axis.deployment.wsdd.WSDDDeployment.getService(WSDDDeployment.java:478) at org.apache.axis.configuration.DirProvider.getService(DirProvider.java:156) at org.apache.axis.AxisEngine.getService(AxisEngine.java:323) at org.apache.axis.MessageContext.setTargetService(MessageContext.java:757) at org.globus.wsrf.handlers.AddressingHandler.setTargetService(AddressingHandler.java:152) at org.apache.axis.message.addressing.handler.AddressingHandler.processServerRequest(AddressingHandler.java:344) at org.globus.wsrf.handlers.AddressingHandler.processServerRequest(AddressingHandler.java:77) at org.apache.axis.message.addressing.handler.AddressingHandler.invoke(AddressingHandler.java:114) at org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32) at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) at org.apache.axis.server.AxisServer.invoke(AxisServer.java:248) at org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:664) at org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:382) at org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:291) -- From benc at hawaga.org.uk Fri Aug 31 06:01:45 2007 From: benc at hawaga.org.uk (Ben Clifford) Date: Fri, 31 Aug 2007 11:01:45 +0000 (GMT) Subject: [Swift-devel] Re: latest Falkon code is in SVN! In-Reply-To: References: <46D4970E.4000809@cs.uchicago.edu> <46D4A46C.6070806@mcs.anl.gov> <46D4AB05.9070900@cs.uchicago.edu> Message-ID: I think by default this looks like there's no security enabled for job submission on Falkon. That is wrong. 
-- From iraicu at cs.uchicago.edu Fri Aug 31 06:37:07 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 31 Aug 2007 06:37:07 -0500 Subject: [Swift-devel] Re: latest Falkon code is in SVN! In-Reply-To: References: <46D4970E.4000809@cs.uchicago.edu> <46D4A46C.6070806@mcs.anl.gov> <46D4AB05.9070900@cs.uchicago.edu> Message-ID: <46D7FD63.2070502@cs.uchicago.edu> Yes, there is pre-compiled code there, and its probably from Java 1.6... I would do a falkon/clean-falkon.sh before the falkon/make-falkon.sh! This should clean up any leftover code that might have been, and allow you to build cleanly with the current version of Java. Ioan Ben Clifford wrote: > When I run the client, I get this error, which looks like maybe you've got > some pre-compiled code that is compiled with something later than the 1.5 > JDK that I use... > > 2007-08-31 11:50:59,646 ERROR container.ServiceThread > [ServiceThread-2,run:297] Unexpected error during request processing > java.lang.UnsupportedClassVersionError: Bad version number in .class file > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:620) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:260) > at java.net.URLClassLoader.access$100(URLClassLoader.java:56) > at java.net.URLClassLoader$1.run(URLClassLoader.java:195) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) > at java.lang.ClassLoader.loadClass(ClassLoader.java:306) > at java.lang.ClassLoader.loadClass(ClassLoader.java:251) > at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:242) > at org.apache.axis.utils.ClassUtils$2.run(ClassUtils.java:176) > at java.security.AccessController.doPrivileged(Native Method) > at org.apache.axis.utils.ClassUtils.loadClass(ClassUtils.java:160) > at org.apache.axis.utils.ClassUtils.forName(ClassUtils.java:142) > at > org.apache.axis.utils.cache.ClassCache.lookup(ClassCache.java:85) > at > org.apache.axis.providers.java.JavaProvider.getServiceClass(JavaProvider.java:424) > at > org.apache.axis.providers.java.JavaProvider.initServiceDesc(JavaProvider.java:457) > at > org.apache.axis.handlers.soap.SOAPService.getInitializedServiceDesc(SOAPService.java:283) > at > org.apache.axis.deployment.wsdd.WSDDService.makeNewInstance(WSDDService.java:487) > at > org.apache.axis.deployment.wsdd.WSDDDeployableItem.getNewInstance(WSDDDeployableItem.java:274) > at > org.apache.axis.deployment.wsdd.WSDDDeployableItem.getInstance(WSDDDeployableItem.java:260) > at > org.apache.axis.deployment.wsdd.WSDDDeployment.getService(WSDDDeployment.java:478) > at > org.apache.axis.configuration.DirProvider.getService(DirProvider.java:156) > at org.apache.axis.AxisEngine.getService(AxisEngine.java:323) > at > org.apache.axis.MessageContext.setTargetService(MessageContext.java:757) > at > org.globus.wsrf.handlers.AddressingHandler.setTargetService(AddressingHandler.java:152) > at > org.apache.axis.message.addressing.handler.AddressingHandler.processServerRequest(AddressingHandler.java:344) > at > org.globus.wsrf.handlers.AddressingHandler.processServerRequest(AddressingHandler.java:77) > at > org.apache.axis.message.addressing.handler.AddressingHandler.invoke(AddressingHandler.java:114) > at > org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32) > 
at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) > at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) > at org.apache.axis.server.AxisServer.invoke(AxisServer.java:248) > at > org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:664) > at > org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:382) > at > org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:291) > > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From iraicu at cs.uchicago.edu Fri Aug 31 06:40:01 2007 From: iraicu at cs.uchicago.edu (Ioan Raicu) Date: Fri, 31 Aug 2007 06:40:01 -0500 Subject: [Swift-devel] Re: latest Falkon code is in SVN! In-Reply-To: References: <46D4970E.4000809@cs.uchicago.edu> <46D4A46C.6070806@mcs.anl.gov> <46D4AB05.9070900@cs.uchicago.edu> Message-ID: <46D7FE11.7070204@cs.uchicago.edu> Right, bu default, all the scripts are without security. To enable security, one would have to modify 3 scripts (the service script -- remove -nosec option, the worker script -- replace http with https, and the client script -- replace http with https), and update the etc/client-security-config.xml on the worker and client accordingly with the relevant security parameters. It is pretty straight forward, but I haven't got the chance to document it yet. Ioan Ben Clifford wrote: > I think by default this looks like there's no security enabled for job > submission on Falkon. > > That is wrong. > > -- ============================================ Ioan Raicu Ph.D. Student ============================================ Distributed Systems Laboratory Computer Science Department University of Chicago 1100 E. 58th Street, Ryerson Hall Chicago, IL 60637 ============================================ Email: iraicu at cs.uchicago.edu Web: http://www.cs.uchicago.edu/~iraicu http://dsl.cs.uchicago.edu/ ============================================ ============================================ From wilde at mcs.anl.gov Fri Aug 31 07:01:39 2007 From: wilde at mcs.anl.gov (Michael Wilde) Date: Fri, 31 Aug 2007 07:01:39 -0500 Subject: [Swift-devel] Cant get Falkon provider connected to Swift In-Reply-To: <46D76A90.40700@mcs.anl.gov> References: <46D7571F.2010605@mcs.anl.gov> <1188520501.18147.11.camel@blabla.mcs.anl.gov> <1188522138.18147.22.camel@blabla.mcs.anl.gov> <46D76A90.40700@mcs.anl.gov> Message-ID: <46D80323.1070009@mcs.anl.gov> Found the problem: I had been using symlinks from the swift etc dir to my local sites and tc files, and set the links wrong when I switched to the falkon-enabled build. Its running now - thanks. - Mike Michael Wilde wrote: > RIght, but I'll check my env to see if I spot anything suspicious. > And see if I can duplicate your startup command Mihael. WIll also check > the log4j thing Nika. > > - Mike > > > Mihael Hategan wrote: >> On Thu, 2007-08-30 at 19:44 -0500, Veronika Nefedova wrote: >>> You need to fix log4j.properties, I had the same problem some time >>> ago. SVN update somehow messes it up. >> >> Irrespective of how logging is set up. 
When I run, seemingly the same >> thing as Mike, it works and it finds the deef provider. >> >> So the problem of finding deef is probably not in the build, but in >> something else, unless this is nondeterministic. But from what I >> understand, Mike repeatedly got this. Or no? >> >> Mihael >> >>> Nika >>> On Aug 30, 2007, at 7:35 PM, Mihael Hategan wrote: >>> >>>> Hmm. Strange: >>>> hategan at tg-viz-login1:~> swift -d -sites.file ./sites.xml >>>> -tc.file ./tc.data test.swift >>>> WARN - Failed to configure log file name >>>> DEBUG - Booting Falkon >>>> ... >>>> >>>> hategan at tg-viz-login1:~> which swift >>>> /home/wilde/swift1139f/vdsk-0.2-dev/bin/swift >>>> >>>> >>>> Mihael >>>> >>>> On Thu, 2007-08-30 at 18:47 -0500, Michael Wilde wrote: >>>>> I'm using trunk (release 1139) and getting the following error when I >>>>> run swift: >>>>> >>>>> === >>>>> Execution failed: >>>>> No security context can be found or created for service >>>>> (provider deef): No 'deef' provider or alias found. Available >>>>> providers: [gt2ft, gsiftp, condor, pbs, ssh, gt4ft, cobalt, local, >>>>> dcache, gt4, gsiftp-old, http, gt2, ftp, webdav]. Aliases: >>>>> local <-> file; pbs <-> pbslocal; gsiftp-old <-> gridftp-old; >>>>> gsiftp <-> >>>>> gridftp; cobalt <-> cobaltlocal; gt4 <-> gt3.9.5, gt4 >>>>> .0.2, gt4.0.1, gt4.0.0; >>>>> === >>>>> >>>>> I did what I think was a clean Swift build (ant dist after clean and >>>>> distclean); then from the modules/provider-deef dir did a ant >>>>> distclean >>>>> and ant dist pointing to my swift vdsk dir that was built in the >>>>> prior >>>>> step). >>>>> >>>>> I have a cog-provider-deef-1.0.jar file in the lib dir of my dist, >>>>> and >>>>> my libexec/vds-sc.k file has: >>>>> === >>>>> element(execution, [provider, url] >>>>> service(type="execution", >>>>> provider=provider, url=url) >>>>> ) >>>>> === >>>>> >>>>> which should match the pool entry in my sites.xml: >>>>> >>>>> >>>>> >>>> storage="/home/wilde/swift/tmp/UC" major="2" minor="2" /> >>>>> >>>> url="tg-grid.uc.teragrid.org/jobmanager-pbs" major="2" minor="2"/> >>>>> /home/wilde/swift/tmp/UC >>>>> >>>>> >>>>> === >>>>> >>>>> Does anyone know what I missed to messed up here in wiring things >>>>> together? >>>>> >>>>> Ive asked Ioan, but he's stumped because this is on the swift >>>>> provider >>>>> side of things. >>>>> >>>>> Thanks, >>>>> >>>>> Mike >>>>> _______________________________________________ >>>>> Swift-devel mailing list >>>>> Swift-devel at ci.uchicago.edu >>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>>> >>>> _______________________________________________ >>>> Swift-devel mailing list >>>> Swift-devel at ci.uchicago.edu >>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel >>>> >> >> > _______________________________________________ > Swift-devel mailing list > Swift-devel at ci.uchicago.edu > http://mail.ci.uchicago.edu/mailman/listinfo/swift-devel > > From hategan at mcs.anl.gov Fri Aug 31 08:53:54 2007 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Fri, 31 Aug 2007 08:53:54 -0500 Subject: [Swift-devel] Re: latest Falkon code is in SVN! In-Reply-To: References: <46D4970E.4000809@cs.uchicago.edu> <46D4A46C.6070806@mcs.anl.gov> <46D4AB05.9070900@cs.uchicago.edu> Message-ID: <1188568435.12219.0.camel@blabla.mcs.anl.gov> Use javac -source 1.4 -target 1.4 when compiling. 
On Fri, 2007-08-31 at 10:53 +0000, Ben Clifford wrote: > When I run the client, I get this error, which looks like maybe you've got > some pre-compiled code that is compiled with something later than the 1.5 > JDK that I use... > > 2007-08-31 11:50:59,646 ERROR container.ServiceThread > [ServiceThread-2,run:297] Unexpected error during request processing > java.lang.UnsupportedClassVersionError: Bad version number in .class file > at java.lang.ClassLoader.defineClass1(Native Method) > at java.lang.ClassLoader.defineClass(ClassLoader.java:620) > at > java.security.SecureClassLoader.defineClass(SecureClassLoader.java:124) > at java.net.URLClassLoader.defineClass(URLClassLoader.java:260) > at java.net.URLClassLoader.access$100(URLClassLoader.java:56) > at java.net.URLClassLoader$1.run(URLClassLoader.java:195) > at java.security.AccessController.doPrivileged(Native Method) > at java.net.URLClassLoader.findClass(URLClassLoader.java:188) > at java.lang.ClassLoader.loadClass(ClassLoader.java:306) > at java.lang.ClassLoader.loadClass(ClassLoader.java:251) > at java.lang.ClassLoader.loadClassInternal(ClassLoader.java:319) > at java.lang.Class.forName0(Native Method) > at java.lang.Class.forName(Class.java:242) > at org.apache.axis.utils.ClassUtils$2.run(ClassUtils.java:176) > at java.security.AccessController.doPrivileged(Native Method) > at org.apache.axis.utils.ClassUtils.loadClass(ClassUtils.java:160) > at org.apache.axis.utils.ClassUtils.forName(ClassUtils.java:142) > at > org.apache.axis.utils.cache.ClassCache.lookup(ClassCache.java:85) > at > org.apache.axis.providers.java.JavaProvider.getServiceClass(JavaProvider.java:424) > at > org.apache.axis.providers.java.JavaProvider.initServiceDesc(JavaProvider.java:457) > at > org.apache.axis.handlers.soap.SOAPService.getInitializedServiceDesc(SOAPService.java:283) > at > org.apache.axis.deployment.wsdd.WSDDService.makeNewInstance(WSDDService.java:487) > at > org.apache.axis.deployment.wsdd.WSDDDeployableItem.getNewInstance(WSDDDeployableItem.java:274) > at > org.apache.axis.deployment.wsdd.WSDDDeployableItem.getInstance(WSDDDeployableItem.java:260) > at > org.apache.axis.deployment.wsdd.WSDDDeployment.getService(WSDDDeployment.java:478) > at > org.apache.axis.configuration.DirProvider.getService(DirProvider.java:156) > at org.apache.axis.AxisEngine.getService(AxisEngine.java:323) > at > org.apache.axis.MessageContext.setTargetService(MessageContext.java:757) > at > org.globus.wsrf.handlers.AddressingHandler.setTargetService(AddressingHandler.java:152) > at > org.apache.axis.message.addressing.handler.AddressingHandler.processServerRequest(AddressingHandler.java:344) > at > org.globus.wsrf.handlers.AddressingHandler.processServerRequest(AddressingHandler.java:77) > at > org.apache.axis.message.addressing.handler.AddressingHandler.invoke(AddressingHandler.java:114) > at > org.apache.axis.strategies.InvocationStrategy.visit(InvocationStrategy.java:32) > at org.apache.axis.SimpleChain.doVisiting(SimpleChain.java:118) > at org.apache.axis.SimpleChain.invoke(SimpleChain.java:83) > at org.apache.axis.server.AxisServer.invoke(AxisServer.java:248) > at > org.globus.wsrf.container.ServiceThread.doPost(ServiceThread.java:664) > at > org.globus.wsrf.container.ServiceThread.process(ServiceThread.java:382) > at > org.globus.wsrf.container.ServiceThread.run(ServiceThread.java:291) > >
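The "Bad version number in .class file" error above can be pinned down without guessing which JDK produced a jar's contents: the class-file header records the compiler's target version (major 48 = JDK 1.4, 49 = Java 5, 50 = Java 6). The following is a small stand-alone diagnostic sketch, not part of Falkon or Swift; the class name is made up.

===
import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.IOException;

// Stand-alone diagnostic: print the class-file version of a .class file
// (for example, one extracted from a suspect jar) to see which JDK
// compiled it. Not part of Falkon or Swift.
public class ClassVersion {
    public static void main(String[] args) throws IOException {
        DataInputStream in = new DataInputStream(new FileInputStream(args[0]));
        try {
            if (in.readInt() != 0xCAFEBABE) {
                System.err.println(args[0] + " is not a class file");
                return;
            }
            int minor = in.readUnsignedShort();
            int major = in.readUnsignedShort();
            // major 48 = JDK 1.4, 49 = Java 5, 50 = Java 6
            System.out.println(args[0] + ": major " + major + ", minor " + minor);
        } finally {
            in.close();
        }
    }
}
===

A class compiled with javac -source 1.4 -target 1.4, as suggested above, should report major 48, which a 1.5 JVM (and anything newer) can load.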