From wilde at mcs.anl.gov Thu Aug 1 08:22:34 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 1 Aug 2013 08:22:34 -0500 (CDT)
Subject: [Swift-devel] Issues in running 0.94-latest on Tukey and Vesta
Message-ID: <182606706.18960284.1375363354297.JavaMail.root@mcs.anl.gov>

Hi Mihael,

We're encountering several issues in this environment, and I'm hoping you can jump in on this and fix/patch things to work well enough for initial use.

Tukey is a rather vanilla Intel x86_64 cluster running Cobalt. It shares a GridFTP server with Mira and Cetus, the production BG/Q systems.

Vesta is a 2-rack development BG/Q. It has a PPC head node, IBM Java 6, and no GridFTP server. It runs the Cobalt scheduler with a unique BG/Q-specific "subjob" capability, which we are trying to exploit.

Both systems have cryptocard-only ssh access.

Here are the problems so far:

- Swift on Tukey can't access the Tukey GridFTP server. (Need to try to other places) - I sent you the error yesterday

- Swift on Tukey can't run jobs on Vesta through automated, tunneled coasters (using the methods that work well on Orthros). It hits problems in authentication, due to what looks like incompatibilities with IBM Java.

- Swift on Tukey can run jobs to a persistent automatic coaster server on Vesta, but not reliably. Every other run works/fails/works/fails, then fails hard. I think this was achieved with Oracle Java 1.7 on Tukey and IBM Java on Vesta.

This message is mainly a heads-up. I'll create tickets for each of these to associate logs and messages with each problem, but I need more hands to keep all this moving forward.

- Mike

From wilde at mcs.anl.gov Thu Aug 1 08:54:15 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 1 Aug 2013 08:54:15 -0500 (CDT)
Subject: [Swift-devel] Fwd: tutorial
In-Reply-To:
Message-ID: <2064095183.18965152.1375365255535.JavaMail.root@mcs.anl.gov>

Here's a useful annotated version of the tutorial, from Eric Skogen of Clemson.
This may be a good time to review and integrate Eric's edits.

- Mike

----- Forwarded Message -----
From: "Eric Skogen"
To: "Michael Wilde"
Sent: Wednesday, May 8, 2013 9:29:00 AM
Subject: tutorial

I took the stuff you sent me and turned it into something more usable without a live demonstrator. I attached it as a pdf, but the code insertion is an ugly hack because I broke my ability to build the documentation. I'll be fixing that soon so I can debug the attached and send it in as buildable.

-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: Programming Quickstart.txt
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Programming Quickstart.pdf
Type: application/pdf
Size: 56167 bytes
Desc: not available
URL:

From wilde at mcs.anl.gov Thu Aug 1 08:56:00 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 1 Aug 2013 08:56:00 -0500 (CDT)
Subject: [Swift-devel] Fwd: Revised Quickstart
In-Reply-To:
Message-ID: <491521126.18965591.1375365360850.JavaMail.root@mcs.anl.gov>

I'm reposting this for quick reference related to the preceding message.

- Mike

----- Forwarded Message -----
From: "Eric Skogen"
To: swift-devel at ci.uchicago.edu
Sent: Friday, May 10, 2013 9:03:27 AM
Subject: [Swift-devel] Revised Quickstart

Attached is a revised Quickstart guide. Building it requires that the UofC_2013-04-09 folder be moved to the examples folder and renamed "quickstart".

We hope this will help new users become comfortable with swift more quickly.

_______________________________________________
Swift-devel mailing list
Swift-devel at ci.uchicago.edu
https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
-------------- next part --------------
An embedded and charset-unspecified text was scrubbed...
Name: quickstart.txt
URL:

From wilde at mcs.anl.gov Wed Aug 7 09:40:40 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Wed, 7 Aug 2013 09:40:40 -0500 (CDT)
Subject: [Swift-devel] Swift/K on Titan
In-Reply-To:
Message-ID: <1650280547.1409298.1375886440132.JavaMail.root@mcs.anl.gov>

Scott, David,

The problem here is this: Swift is still generating a submit file for a "vanilla" PBS cluster and trying to ssh to the nodes of the job to start a coaster on each node. For Cray systems it needs to use aprun instead. So that part of the sites file needs to be re-adjusted.

The *.submit.stderr file shows all the errors from ssh, and the .submit file shows that the ssh logic is there at all, which is incorrect for Cray systems.

You should check the beagle logic and behavior, and then get the same to work on Titan but with the Titan-specific adjustments to the submit-file.

Maybe "mpp" should be "cray" with variants "cray-titan" and "cray-kraken".

- Mike

----- Original Message -----
> From: "Scott Krieder"
> To: "Michael Wilde"
> Cc: "Swift Language"
> Sent: Wednesday, August 7, 2013 9:14:03 AM
> Subject: Re: Swift/K on Titan
>
> Ok, I made sure there was nothing else in the queue and I ran again.
> I'm still having the same issue where the job was marked as running
> for around 2 minutes through qstat -u csep44 but swift was only
> reporting "submitted."
>
> I've attached the latest log, submit and stderr from that run.
>
> -Scott
>
> On Wed, Aug 7, 2013 at 8:56 AM, Michael Wilde < wilde at mcs.anl.gov >
> wrote:
>
> Scott, David,
>
> I think this message in the log explains the root cause:
>
> 2013-08-06 21:28:16,528-0400 DEBUG AbstractExecutor Output from qsub
> is: " Job not submitted You currently have a job in the debug queue.
> Each user is allowed to have only one job at a time in the debug
> queue. Please wait until job 1695274 completes before submitting
> another job to the debug queue. "
>
> The "bug" here is simply that Swift doesn't send these messages back
> to the user in a clear manner.
>
> Can you remove the offending job (if it's still queued) and try again?
>
> Thanks,
>
> - Mike
>
> ----- Original Message -----
> > From: "Scott Krieder" < skrieder at iit.edu >
> > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > Cc: "Swift Language" < davidk at ci.uchicago.edu >
> > Sent: Tuesday, August 6, 2013 11:42:37 PM
> > Subject: Re: Swift/K on Titan
> >
> > Hi Mike,
> >
> > Here is the *.log from the modis01 run.
> >
> > -Scott
> >
> > On Tue, Aug 6, 2013 at 11:30 PM, Michael Wilde < wilde at mcs.anl.gov >
> > wrote:
> >
> > Scott, can you also send the large *.log file?
> >
> > ----- Original Message -----
> > > From: "Scott Krieder" < skrieder at iit.edu >
> > > To: "Michael Wilde" < wilde at mcs.anl.gov >
> > > Cc: "Swift Language" < davidk at ci.uchicago.edu >
> > > Sent: Tuesday, August 6, 2013 10:28:40 PM
> > > Subject: Swift/K on Titan
> > >
> > > Hi Mike,
> > >
> > > Just wanted to keep you in the loop. David and I got a lot closer to
> > > getting Swift/K running on Titan.
> > >
> > > We ended up leaving off with the following error:
> > > The Swift job would join the queue and show as running under qstat,
> > > but would only show as submitted through the swift report.
> > >
> > > The submit script was also writing this to stderr:
> > > ssh: connect to host 811 port 22: Invalid argument
> > >
> > > I've attached the submit script that swift generated as well as the
> > > stderr that was generated.
> > >
> > > -Scott
> > >
> > > --
> > > Scott J. Krieder
> > > C: 419-685-0410
> > > E: skrieder at iit.edu
> > > http://datasys.cs.iit.edu/~skrieder/
> >
> > --
> > Scott J. Krieder
> > C: 419-685-0410
> > E: skrieder at iit.edu
> > http://datasys.cs.iit.edu/~skrieder/
>
> --
> Scott J. Krieder
> C: 419-685-0410
> E: skrieder at iit.edu
> http://datasys.cs.iit.edu/~skrieder/
-------------- next part --------------
A non-text attachment was scrubbed...
Name: modis01-20130807-1005-u53lntkc.log
Type: text/x-log
Size: 17886 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PBS1071539250834089601.submit
Type: application/octet-stream
Size: 1312 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: PBS1071539250834089601.submit.stderr
Type: application/octet-stream
Size: 863 bytes
Desc: not available
URL:

From hategan at mcs.anl.gov Thu Aug 8 21:33:52 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 08 Aug 2013 19:33:52 -0700
Subject: [Swift-devel] monitor updates
Message-ID: <1376015632.13257.6.camel@echo>

Hi,

I just committed some updates to the monitors. Mainly support for charts in the http monitor and sampling feeds from coasters, in particular the number of active cores. This was motivated by Mike's observation that some surprising coaster behavior regarding node allocation could be better understood if there was an easy way to look at both active coaster cores and active jobs.

I also spent two days on a jfreechart based gantt chart that turned out to be waaaaay too slow if you had a few hundred jobs (which is a low number for most serious applications). So I'm pretty upset about that.

Please test if you can. I'll suspend improvements on the monitors for now and focus on bug fixing since various inconvenient bugs have started to crop up. But the long term plan is terminal windows into nodes with coasters and ability to use the swing monitor code for off-line plotting.
Mihael

From wilde at mcs.anl.gov Fri Aug 9 09:59:09 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Fri, 9 Aug 2013 09:59:09 -0500 (CDT)
Subject: [Swift-devel] monitor updates
In-Reply-To: <1376015632.13257.6.camel@echo>
Message-ID: <515662612.2633637.1376060349310.JavaMail.root@mcs.anl.gov>

Mihael, this sounds great.

I may have missed something, though. I tried the earlier monitor version through Swing.

Is the http version ready to try, and if so, where do you point a browser to view it?

The use of the charting for offline plotting is a good topic for discussion today.

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Swift Devel"
> Sent: Thursday, August 8, 2013 9:33:52 PM
> Subject: [Swift-devel] monitor updates
>
> Hi,
>
> I just committed some updates to the monitors. Mainly support for
> charts in the http monitor and sampling feeds from coasters, in
> particular the number of active cores. This was motivated by Mike's
> observation that some surprising coaster behavior regarding node
> allocation could be better understood if there was an easy way to
> look at both active coaster cores and active jobs.
>
> I also spent two days on a jfreechart based gantt chart that turned
> out to be waaaaay too slow if you had a few hundred jobs (which is a
> low number for most serious applications). So I'm pretty upset about
> that.
>
> Please test if you can. I'll suspend improvements on the monitors for
> now and focus on bug fixing since various inconvenient bugs have
> started to crop up. But the long term plan is terminal windows into
> nodes with coasters and ability to use the swing monitor code for
> off-line plotting.
>
> Mihael
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

From davidk at ci.uchicago.edu Fri Aug 9 11:12:11 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 9 Aug 2013 11:12:11 -0500 (CDT)
Subject: [Swift-devel] UC3 mini-tutorial
In-Reply-To: <1997803119.1515423.1376064173874.JavaMail.root@ci.uchicago.edu>
Message-ID: <64409016.1521241.1376064731639.JavaMail.root@ci.uchicago.edu>

For anyone interested in running Swift on uc3, there is a mini-tutorial available at http://swiftlang.org/guides/trunk/uc3/README.html. It's aimed at new Swift users and doesn't go into too much detail, but it does include a bit of info at the end about running on specific uc3 resources. It's done in a similar style to recent tutorials, using a shell script number generator as a mock science application.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Fri Aug 9 13:51:20 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Fri, 09 Aug 2013 11:51:20 -0700
Subject: [Swift-devel] monitor updates
In-Reply-To: <515662612.2633637.1376060349310.JavaMail.root@mcs.anl.gov>
References: <515662612.2633637.1376060349310.JavaMail.root@mcs.anl.gov>
Message-ID: <1376074280.17740.1.camel@echo>

On Fri, 2013-08-09 at 09:59 -0500, Michael Wilde wrote:
> Mihael, this sounds great.
>
> I may have missed something, though. I tried the earlier monitor version through Swing.
>
> Is the http version ready to try, and if so, where do you point a browser to view it?

It's ready to try. If you start swift with -ui http it will print a port (default is 3456 I think), or you can start it with -ui http:.
Mihael

From hategan at mcs.anl.gov Sat Aug 17 16:28:03 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 17 Aug 2013 14:28:03 -0700
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
Message-ID: <1376774883.31117.2.camel@echo>

f = quicklyFailingApp();
foreach i in [0:sufficientlyLargeNumber] {
    a[i] = someApp(f);
}

dummy = sleep(15 seconds or so) // to give the other stuff time to do its thing.

Mihael

From hategan at mcs.anl.gov Sat Aug 17 17:01:28 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 17 Aug 2013 15:01:28 -0700
Subject: [Swift-devel] user guide re-write
Message-ID: <1376776888.31117.16.camel@echo>

Here's the user guide version Mike was talking about. I only messed with it up to and right after the picture of Shane.

The goals were (and I'm not sure I succeeded in reaching them):

- relatively formal (i.e.
leave little room for ambiguity)
- try to use a single term for any given concept (we should be extra careful about this since we developed all these synonyms that we are comfortable with but that may be confusing for new users)
- examples backing up most statements about the language. In particular, users should be able to quickly figure out stuff by just looking at the examples, and the text is used mostly to clarify subtle points not obvious from the examples.
- links to relevant sections whenever a major concept particular to swift is used and its meaning is not clearly obvious.
- hopefully correct grammar/syntax/spelling and simple sentences.
- "see also" lists at the end of every section for concepts somewhat related to the current section.

Mihael
-------------- next part --------------
A non-text attachment was scrubbed...
Name: ug.tar.gz
Type: application/x-compressed-tar
Size: 41771 bytes
Desc: not available
URL:

From wilde at mcs.anl.gov Mon Aug 19 11:24:17 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 19 Aug 2013 11:24:17 -0500 (CDT)
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <1376774883.31117.2.camel@echo>
Message-ID: <2086040295.5496570.1376929457351.JavaMail.root@mcs.anl.gov>

This is filed as bug 1067: https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1067

The ticket contains a working implementation of Mihael's test sketch below, but on simple tests on bridled I'm not yet able to reproduce the failure.

ALL: Can anyone recall who other than Jason and Lorenzo on Beagle have encountered this failure?

I vaguely recall having seen it in my own tests, but can't find any other trace of it in my email at the moment.

Mihael, all, please update the ticket with what you know about this. I really want to work hard to resolve this bug this week, as (I think) it's holding up some important science work.
Thanks,

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Swift Devel"
> Sent: Saturday, August 17, 2013 4:28:03 PM
> Subject: [Swift-devel] test for "got one name (derr) and 0 values"
>
> f = quicklyFailingApp();
> foreach i in [0:sufficientlyLargeNumber] {
>     a[i] = someApp(f);
> }
>
> dummy = sleep(15 seconds or so) // to give the other stuff time to do its thing.
>
> Mihael
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

From davidk at ci.uchicago.edu Mon Aug 19 11:36:03 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Mon, 19 Aug 2013 11:36:03 -0500 (CDT)
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <2086040295.5496570.1376929457351.JavaMail.root@mcs.anl.gov>
Message-ID: <1113209002.3633558.1376930163298.JavaMail.root@ci.uchicago.edu>

I haven't seen this recently, but I have in the past when an application fails. In the ticket below it was the old modis scripts.

https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=457

----- Original Message -----
> From: "Michael Wilde"
> To: "Swift Devel"
> Sent: Monday, August 19, 2013 11:24:17 AM
> Subject: Re: [Swift-devel] test for "got one name (derr) and 0 values"
>
> This is filed as bug 1067:
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1067
>
> The ticket contains a working implementation of Mihael's test sketch
> below, but on simple tests on bridled I'm not yet able to reproduce
> the failure.
>
> ALL: Can anyone recall who other than Jason and Lorenzo on Beagle
> have encountered this failure?
>
> I vaguely recall having seen it in my own tests, but can't find any
> other trace of it in my email at the moment.
>
> Mihael, all, please update the ticket with what you know about this.
> I really want to work hard to resolve this bug this week, as (I
> think) it's holding up some important science work.
>
> Thanks,
>
> - Mike
>
> ----- Original Message -----
> > From: "Mihael Hategan"
> > To: "Swift Devel"
> > Sent: Saturday, August 17, 2013 4:28:03 PM
> > Subject: [Swift-devel] test for "got one name (derr) and 0 values"
> >
> > f = quicklyFailingApp();
> > foreach i in [0:sufficientlyLargeNumber] {
> >     a[i] = someApp(f);
> > }
> >
> > dummy = sleep(15 seconds or so) // to give the other stuff time to do its thing.
> >
> > Mihael
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From hategan at mcs.anl.gov Mon Aug 19 11:40:57 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 19 Aug 2013 09:40:57 -0700
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <1113209002.3633558.1376930163298.JavaMail.root@ci.uchicago.edu>
References: <1113209002.3633558.1376930163298.JavaMail.root@ci.uchicago.edu>
Message-ID: <1376930457.12805.0.camel@echo>

Just a friendly reminder that lazy errors must be enabled.

Mihael

On Mon, 2013-08-19 at 11:36 -0500, David Kelly wrote:
> I haven't seen this recently, but I have in the past when an application fails. In the ticket below it was the old modis scripts.
>
> https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=457
>
> ----- Original Message -----
> > From: "Michael Wilde"
> > To: "Swift Devel"
> > Sent: Monday, August 19, 2013 11:24:17 AM
> > Subject: Re: [Swift-devel] test for "got one name (derr) and 0 values"
> >
> > This is filed as bug 1067:
> > https://bugzilla.mcs.anl.gov/swift/show_bug.cgi?id=1067
> >
> > The ticket contains a working implementation of Mihael's test sketch
> > below, but on simple tests on bridled I'm not yet able to reproduce
> > the failure.
> >
> > ALL: Can anyone recall who other than Jason and Lorenzo on Beagle
> > have encountered this failure?
> >
> > I vaguely recall having seen it in my own tests, but can't find any
> > other trace of it in my email at the moment.
> >
> > Mihael, all, please update the ticket with what you know about this.
> > I really want to work hard to resolve this bug this week, as (I
> > think) it's holding up some important science work.
> >
> > Thanks,
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Mihael Hategan"
> > > To: "Swift Devel"
> > > Sent: Saturday, August 17, 2013 4:28:03 PM
> > > Subject: [Swift-devel] test for "got one name (derr) and 0 values"
> > >
> > > f = quicklyFailingApp();
> > > foreach i in [0:sufficientlyLargeNumber] {
> > >     a[i] = someApp(f);
> > > }
> > >
> > > dummy = sleep(15 seconds or so) // to give the other stuff time to do its thing.
> > >
> > > Mihael
> > >
> > > _______________________________________________
> > > Swift-devel mailing list
> > > Swift-devel at ci.uchicago.edu
> > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel

From wilde at mcs.anl.gov Mon Aug 19 11:46:58 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 19 Aug 2013 11:46:58 -0500 (CDT)
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <1376930457.12805.0.camel@echo>
Message-ID: <194236248.5528012.1376930818411.JavaMail.root@mcs.anl.gov>

> Just a friendly reminder that lazy errors must be enabled.

Indeed, and it was, as indicated in ticket 1067 (see below).

Can you suggest any other adjustments to the test that might trigger the problem?

Are you able to run the test, and does it replicate the logic you sketched?

Thanks,

- Mike

bri$ cat ~/.swift/swift.properties
sites.file=sites.xml
tc.file=apps

wrapperlog.always.transfer=true
sitedir.keep=true
file.gc.enabled=false
status.mode=provider

execution.retries=5
lazy.errors=true

use.wrapper.staging=false
use.provider.staging=false
provider.staging.pin.swiftfiles=false
bri$

From wilde at mcs.anl.gov Mon Aug 19 12:03:38 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 19 Aug 2013 12:03:38 -0500 (CDT)
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <194236248.5528012.1376930818411.JavaMail.root@mcs.anl.gov>
Message-ID: <40413530.5546745.1376931818421.JavaMail.root@mcs.anl.gov>

I've tested now with provider staging and localhost on communicado. Will now shift to beagle, then try running on the compute nodes using local:pbs.
Can you explain more about what you know and theorize about the problem? Is the error message coming from one of the vdl-int*.k scripts, or a karajan lib method, or ???

Thanks,

- Mike

----- Original Message -----
> From: "Michael Wilde"
> To: "Mihael Hategan"
> Cc: "Swift Devel"
> Sent: Monday, August 19, 2013 11:46:58 AM
> Subject: Re: [Swift-devel] test for "got one name (derr) and 0 values"
>
> > Just a friendly reminder that lazy errors must be enabled.
>
> Indeed, and it was, as indicated in ticket 1067 (see below).
>
> Can you suggest any other adjustments to the test that might trigger
> the problem?
>
> Are you able to run the test, and does it replicate the logic you
> sketched?
>
> Thanks,
>
> - Mike
>
> bri$ cat ~/.swift/swift.properties
> sites.file=sites.xml
> tc.file=apps
>
> wrapperlog.always.transfer=true
> sitedir.keep=true
> file.gc.enabled=false
> status.mode=provider
>
> execution.retries=5
> lazy.errors=true
>
> use.wrapper.staging=false
> use.provider.staging=false
> provider.staging.pin.swiftfiles=false
> bri$
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
>

From wilde at mcs.anl.gov Mon Aug 19 12:06:36 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Mon, 19 Aug 2013 12:06:36 -0500 (CDT)
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <40413530.5546745.1376931818421.JavaMail.root@mcs.anl.gov>
Message-ID: <1883809772.5547194.1376931996540.JavaMail.root@mcs.anl.gov>

> Can you explain more about what you know and theorize about the
> problem? Is the error message coming from one of the vdl-int*.k
> scripts, or a karajan lib method, or ???

OK, I see it's from here:

src/org/globus/cog/karajan/workflow/nodes/SetVarK.java: throw new ExecutionException("Got one name (" + ident + ") and " + vargs.size() + " values: " + vargs);

From hategan at mcs.anl.gov Mon Aug 19 12:26:18 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 19 Aug 2013 10:26:18 -0700
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <194236248.5528012.1376930818411.JavaMail.root@mcs.anl.gov>
References: <194236248.5528012.1376930818411.JavaMail.root@mcs.anl.gov>
Message-ID: <1376933178.13266.1.camel@echo>

On Mon, 2013-08-19 at 11:46 -0500, Michael Wilde wrote:
> > Just a friendly reminder that lazy errors must be enabled.
>
> Indeed, and it was, as indicated in ticket 1067 (see below).
>
> Can you suggest any other adjustments to the test that might trigger the problem?

I don't know any. I am mostly shooting in the dark at this point.

>
> Are you able to run the test, and does it replicate the logic you sketched?

Haven't tried yet.

From hategan at mcs.anl.gov Mon Aug 19 12:32:31 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 19 Aug 2013 10:32:31 -0700
Subject: [Swift-devel] test for "got one name (derr) and 0 values"
In-Reply-To: <40413530.5546745.1376931818421.JavaMail.root@mcs.anl.gov>
References: <40413530.5546745.1376931818421.JavaMail.root@mcs.anl.gov>
Message-ID: <1376933551.13266.7.camel@echo>

On Mon, 2013-08-19 at 12:03 -0500, Michael Wilde wrote:
> I've tested now with provider staging and localhost on communicado. Will now shift to beagle, then try running on the compute nodes using local:pbs.
>
> Can you explain more about what you know and theorize about the problem? Is the error message coming from one of the vdl-int*.k scripts, or a karajan lib method, or ???

The only place where derr is assigned is execute-default.k:
derr := try(deperror, false)

And deperror is an optional argument to the current function.
So that statement tries to read deperror and, if it's not assigned, goes on to evaluate false. One of deperror or false wrongly contains a Java null, which karajan treats as no value in this case, and that produces the error.

That shouldn't be happening. If deperror is assigned, it should have a non-null value and derr <- deperror. If it is not assigned, false is a global constant that should have a Java Boolean.FALSE in it.

Mihael

From wilde at mcs.anl.gov Thu Aug 22 13:05:52 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Thu, 22 Aug 2013 13:05:52 -0500 (CDT)
Subject: [Swift-devel] For OSGConnect: Fwd: ProjectName required to submit jobs
In-Reply-To:
Message-ID: <668047823.1125748.1377194752519.JavaMail.root@mcs.anl.gov>

----- Forwarded Message -----
From: "Marco Mambelli"
To: rynge at isi.edu, "Mike Wilde"
Cc: "Rob Gardner", "Lincoln Bryant", "Suchandra Thapa", champions at hep.uchicago.edu
Sent: Thursday, August 22, 2013 12:58:22 PM
Subject: ProjectName required to submit jobs

Mats, Mike,
please circulate this in your groups.

Later today we'll start enforcing on OSG-Connect the requirement that every HTCondor job specify a project name.
condor_submit will fail if no valid project name is specified.

Check the "Choose your project name" section in the job submission guide for OSG Connect:
https://confluence.grid.iu.edu/display/CON/OSG+Connect+Quickstart#OSGConnectQuickstart-ChoosetheProjectName
This will tell you how to check valid names and how to set the project name.

Thank you,
Marco

From ketancmaheshwari at gmail.com Mon Aug 26 13:54:06 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 26 Aug 2013 13:54:06 -0500
Subject: [Swift-devel] ui http error on browser
Message-ID:

Hi,

I am trying to run swift trunk (swift-r7005 cog-r3767) to test the new http ui feature.

When running with -ui http:8080 on midway machine, the browser shows the following error message:
Error: The requested resource is not available

It seems the http server starts but is missing index.html or the default file to display. Or something is wrong with my setup.

The log for the run is:
http://www.mcs.anl.gov/~ketan/catsn-20130826-1846-vgqnbo9b.log

Thanks for any suggestions to fix this.

Regards,
--
Ketan

From ketancmaheshwari at gmail.com Mon Aug 26 20:28:09 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 26 Aug 2013 20:28:09 -0500
Subject: [Swift-devel] trunk execution fails with absolute path
Message-ID:

Hi,

I notice that execution fails in trunk when providing an absolute path for a file. I tested this with a catsn example. When providing a relative path, it works. The mode is coasters, local:slurm on midway.

Error message is as follows:

Execution failed:
Exception in cat:
Arguments: [__root__/scratch/midway/maheshwari/globusonline-galaxy-globus-738fb324c285/tools/swift/data.txt]
Host: midway
Directory: catsn-20130827-0122-c0vppi56/jobs/d/cat-dpf6fdel
exception @ swift-int-staging.k, line: 159
Caused by: Job failed with and exit code of 1
org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
    at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
    at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
    at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
    at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
    at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
    at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)

Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1
org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
    at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
    at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
    at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
    at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
    at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
    at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)

Log for this run is:
http://www.mcs.anl.gov/~ketan/absolutepath.log

Thanks for any suggestions on fixing this.

Regards,
--
Ketan

From ketancmaheshwari at gmail.com Mon Aug 26 20:31:52 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Mon, 26 Aug 2013 20:31:52 -0500
Subject: [Swift-devel] trunk execution fails with absolute path
In-Reply-To:
References:
Message-ID:

Just a note that I tested the same configuration with 0.94, which works.

On Mon, Aug 26, 2013 at 8:28 PM, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:

> Hi,
>
> I notice that execution fails in trunk when providing an absolute path for a
> file. I tested this with a catsn example. When providing a relative path, it
> works. The mode is coasters, local:slurm on midway.
>
> Error message is as follows:
>
> Execution failed:
> Exception in cat:
> Arguments: [__root__/scratch/midway/maheshwari/globusonline-galaxy-globus-738fb324c285/tools/swift/data.txt]
> Host: midway
> Directory: catsn-20130827-0122-c0vppi56/jobs/d/cat-dpf6fdel
> exception @ swift-int-staging.k, line: 159
> Caused by: Job failed with and exit code of 1
> org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
>     at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
>     at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
>     at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
>     at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
>     at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
>     at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
>
> Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1
> org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
>     at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
>     at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
>     at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
>     at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
>     at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
>     at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
>
> Log for this run is:
> http://www.mcs.anl.gov/~ketan/absolutepath.log
>
> Thanks for any suggestions on fixing this.
>
> Regards,
> --
> Ketan

--
Ketan

From hategan at mcs.anl.gov Mon Aug 26 22:35:04 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Mon, 26 Aug 2013 20:35:04 -0700
Subject: [Swift-devel] tests changes
Message-ID: <1377574504.18379.1.camel@echo>

https://trac.ci.uchicago.edu/swift/changeset/6909

But I use local all the time!

Please don't remove just because something is not used in one particular case. Remove only if something is not used at all.

Mihael

From wilde at mcs.anl.gov Tue Aug 27 09:34:42 2013
From: wilde at mcs.anl.gov (Michael Wilde)
Date: Tue, 27 Aug 2013 09:34:42 -0500 (CDT)
Subject: [Swift-devel] tests changes
In-Reply-To: <1377574504.18379.1.camel@echo>
Message-ID: <1931048936.2272296.1377614082549.JavaMail.root@mcs.anl.gov>

Indeed, we should query swift-devel before making changes with potential negative impact (like removing or deprecating features, tests, docs etc).

Regarding the tests removed: all the clusters there are still active. Maybe just move ones for which the tests are not yet usable to "pending/" or similar?

I'm not sure why beagle, local, local-coaster were on the list?

- Mike

----- Original Message -----
> From: "Mihael Hategan"
> To: "Swift Devel"
> Sent: Monday, August 26, 2013 10:35:04 PM
> Subject: [Swift-devel] tests changes
>
> https://trac.ci.uchicago.edu/swift/changeset/6909
>
> But I use local all the time!
>
> Please don't remove just because something is not used in one
> particular case. Remove only if something is not used at all.
>
> Mihael

From yadudoc1729 at gmail.com Tue Aug 27 10:11:34 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Tue, 27 Aug 2013 10:11:34 -0500
Subject: [Swift-devel] tests changes
In-Reply-To: <1931048936.2272296.1377614082549.JavaMail.root@mcs.anl.gov>
References: <1377574504.18379.1.camel@echo> <1931048936.2272296.1377614082549.JavaMail.root@mcs.anl.gov>
Message-ID:

Hi Mike, Mihael,

David and I discussed this last week and concluded that, since they are not being run regularly in the test suite, they may be obsolete. I am now running sanity tests and IO tests regularly, which test most of this functionality. I had moved the site-specific configs to the sites/todo folder, for reference.

I could add all of the deleted content back into either the todo folder or a pending folder. Please let me know what you would prefer.

I see how deleting those files before checking with the group was a mistake. Sorry about that.

Thanks,
-Yadu

On Tue, Aug 27, 2013 at 9:34 AM, Michael Wilde wrote:

> Indeed, we should query swift-devel before making changes with potential
> negative impact (like removing or deprecating features, tests, docs etc).
>
> Regarding the tests removed: all the clusters there are still active.
> Maybe just move ones for which the tests are not yet usable to "pending/"
> or similar?
>
> I'm not sure why beagle, local, local-coaster were on the list?
>
> - Mike
>
> ----- Original Message -----
> > From: "Mihael Hategan"
> > To: "Swift Devel"
> > Sent: Monday, August 26, 2013 10:35:04 PM
> > Subject: [Swift-devel] tests changes
> >
> > https://trac.ci.uchicago.edu/swift/changeset/6909
> >
> > But I use local all the time!
> >
> > Please don't remove just because something is not used in one
> > particular case. Remove only if something is not used at all.
> >
> > Mihael

--
Yadu Nand B

From hategan at mcs.anl.gov Tue Aug 27 10:54:01 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Aug 2013 08:54:01 -0700
Subject: [Swift-devel] tests changes
In-Reply-To:
References: <1377574504.18379.1.camel@echo> <1931048936.2272296.1377614082549.JavaMail.root@mcs.anl.gov>
Message-ID: <1377618841.9637.1.camel@echo>

The local dir was the default when running suite.sh in tree mode (-t). It is how I test non-trivial changes. Compile, cd tests, ./suite.sh -t groups/group-all-local. That should work whether or not it is used for the wider battery of tests.

Mihael

On Tue, 2013-08-27 at 10:11 -0500, Yadu Nand wrote:
> Hi Mike, Mihael,
>
> David and I had discussed this last week, and we came to the conclusion
> that since they are not being run regularly in the test-suite they may
> be obsolete. I am now running sanity tests and IO tests regularly which
> test most of this functionality. I had moved the site-specific configs
> to the sites/todo folder, for reference.
>
> I could add all of this deleted content back into either the todo folder
> or a pending folder. Please let me know what you would prefer.
>
> I see how deleting those files before checking with the group was a
> mistake. Sorry about that.
>
> Thanks,
> -Yadu
>
>
> On Tue, Aug 27, 2013 at 9:34 AM, Michael Wilde wrote:
>
> > Indeed, we should query swift-devel before making changes with potential
> > negative impact (like removing or deprecating features, tests, docs etc).
> >
> > Regarding the tests removed: all the clusters there are still active.
> > Maybe just move ones for which the tests are not yet usable to "pending/"
> > or similar?
> >
> > I'm not sure why beagle, local, local-coaster were on the list?
> >
> > - Mike
> >
> > ----- Original Message -----
> > > From: "Mihael Hategan"
> > > To: "Swift Devel"
> > > Sent: Monday, August 26, 2013 10:35:04 PM
> > > Subject: [Swift-devel] tests changes
> > >
> > > https://trac.ci.uchicago.edu/swift/changeset/6909
> > >
> > > But I use local all the time!
> > >
> > > Please don't remove just because something is not used in one
> > > particular case. Remove only if something is not used at all.
> > >
> > > Mihael
>
> --
> Yadu Nand B

From yadudoc1729 at gmail.com Tue Aug 27 11:38:02 2013
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Tue, 27 Aug 2013 11:38:02 -0500
Subject: [Swift-devel] tests changes
In-Reply-To: <1377618841.9637.1.camel@echo>
References: <1377574504.18379.1.camel@echo> <1931048936.2272296.1377614082549.JavaMail.root@mcs.anl.gov> <1377618841.9637.1.camel@echo>
Message-ID:

Hi Mihael,

Are any tests failing for you when you run ./suite.sh -t groups/group-all-local? It is working for me on both trunk and 0.94.1-RC2, and that pretty much runs every night.

Are there any other groups that you run?

-Yadu

On Tue, Aug 27, 2013 at 10:54 AM, Mihael Hategan wrote:

> The local dir was the default when running suite.sh in tree mode (-t).
> It is how I test non-trivial changes. Compile, cd tests, ./suite.sh -t
> groups/group-all-local. That should work whether or not it is used for
> the wider battery of tests.
>
> Mihael
>
> On Tue, 2013-08-27 at 10:11 -0500, Yadu Nand wrote:
> > Hi Mike, Mihael,
> >
> > David and I had discussed this last week, and we came to the conclusion
> > that since they are not being run regularly in the test-suite they may
> > be obsolete. I am now running sanity tests and IO tests regularly which
> > test most of this functionality. I had moved the site-specific configs
> > to the sites/todo folder, for reference.
> >
> > I could add all of this deleted content back into either the todo folder
> > or a pending folder. Please let me know what you would prefer.
> >
> > I see how deleting those files before checking with the group was a
> > mistake. Sorry about that.
> >
> > Thanks,
> > -Yadu
> >
> >
> > On Tue, Aug 27, 2013 at 9:34 AM, Michael Wilde wrote:
> >
> > > Indeed, we should query swift-devel before making changes with potential
> > > negative impact (like removing or deprecating features, tests, docs etc).
> > >
> > > Regarding the tests removed: all the clusters there are still active.
> > > Maybe just move ones for which the tests are not yet usable to "pending/"
> > > or similar?
> > >
> > > I'm not sure why beagle, local, local-coaster were on the list?
> > >
> > > - Mike
> > >
> > > ----- Original Message -----
> > > > From: "Mihael Hategan"
> > > > To: "Swift Devel"
> > > > Sent: Monday, August 26, 2013 10:35:04 PM
> > > > Subject: [Swift-devel] tests changes
> > > >
> > > > https://trac.ci.uchicago.edu/swift/changeset/6909
> > > >
> > > > But I use local all the time!
> > > >
> > > > Please don't remove just because something is not used in one
> > > > particular case. Remove only if something is not used at all.
> > > >
> > > > Mihael
> >
> > --
> > Yadu Nand B

--
Yadu Nand B

From hategan at mcs.anl.gov Tue Aug 27 15:56:35 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Aug 2013 13:56:35 -0700
Subject: [Swift-devel] tests changes
In-Reply-To:
References: <1377574504.18379.1.camel@echo> <1931048936.2272296.1377614082549.JavaMail.root@mcs.anl.gov> <1377618841.9637.1.camel@echo>
Message-ID: <1377636995.11804.1.camel@echo>

On Tue, 2013-08-27 at 11:38 -0500, Yadu Nand wrote:
> Hi Mihael,
>
> Are any tests failing for you when you run ./suite.sh -t
> groups/group-all-local?
> It is working for me on both trunk and 0.94.1-RC2 and that pretty much runs
> every night.

Didn't work for me. Also didn't work for the nightly tests:
http://www.ci.uchicago.edu/swift/tests/swift-trunk/run-2013-08-27/tests-2013-08-27.html

Mihael

From hategan at mcs.anl.gov Tue Aug 27 18:51:56 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Aug 2013 16:51:56 -0700
Subject: [Swift-devel] ui http error on browser
In-Reply-To:
References:
Message-ID: <1377647516.11804.3.camel@echo>

I added log messages when a request fails. They weren't there before, so can you try with the latest trunk?

Also, can you let me know exactly what URL you typed in the browser window? And if possible the exact error message that you got back?

Mihael

On Mon, 2013-08-26 at 13:54 -0500, Ketan Maheshwari wrote:
> Hi,
>
> I am trying to run swift trunk (swift-r7005 cog-r3767) to test the new http
> ui feature.
>
> When running with -ui http:8080 on midway machine, the browser shows the
> following error message:
> Error: The requested resource is not available
>
> It seems the http server starts but is missing index.html or the default
> file to display. Or something is wrong with my setup.
>
> The log for the run is:
> http://www.mcs.anl.gov/~ketan/catsn-20130826-1846-vgqnbo9b.log
>
> Thanks for any suggestions to fix this.
>
> Regards,
> --
> Ketan

From hategan at mcs.anl.gov Tue Aug 27 20:50:11 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 27 Aug 2013 18:50:11 -0700
Subject: [Swift-devel] trunk execution fails with absolute path
In-Reply-To:
References:
Message-ID: <1377654611.11804.4.camel@echo>

Should be fixed now. Please give it a try and let me know if it works.

On Mon, 2013-08-26 at 20:31 -0500, Ketan Maheshwari wrote:
> Just a note that I tested the same configuration with 0.94 which works.
>
>
> On Mon, Aug 26, 2013 at 8:28 PM, Ketan Maheshwari <ketancmaheshwari at gmail.com> wrote:
>
> > Hi,
> >
> > I notice that execution fails in trunk when providing an absolute path for a
> > file. I tested this with a catsn example. When providing a relative path, it
> > works. The mode is coasters, local:slurm on midway.
> >
> > Error message is as follows:
> >
> > Execution failed:
> > Exception in cat:
> > Arguments: [__root__/scratch/midway/maheshwari/globusonline-galaxy-globus-738fb324c285/tools/swift/data.txt]
> > Host: midway
> > Directory: catsn-20130827-0122-c0vppi56/jobs/d/cat-dpf6fdel
> > exception @ swift-int-staging.k, line: 159
> > Caused by: Job failed with and exit code of 1
> > org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
> >     at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
> >     at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> >     at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
> >     at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
> >     at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> >     at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> >
> > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1
> > org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
> >     at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
> >     at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> >     at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
> >     at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
> >     at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> >     at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> >
> > Log for this run is:
> > http://www.mcs.anl.gov/~ketan/absolutepath.log
> >
> > Thanks for any suggestions on fixing this.
> >
> > Regards,
> > --
> > Ketan
>
> --
> Ketan

From ketancmaheshwari at gmail.com Wed Aug 28 20:35:05 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Wed, 28 Aug 2013 20:35:05 -0500
Subject: [Swift-devel] ui http error on browser
In-Reply-To: <1377647516.11804.3.camel@echo>
References: <1377647516.11804.3.camel@echo>
Message-ID:

Hi Mihael,

I tried with the updated version. The log is attached.

In the browser, I type: localhost:8080

The exact message is the same as earlier:
Error: The requested resource is not available (in large fonts)

Thanks,
Ketan

On Tue, Aug 27, 2013 at 6:51 PM, Mihael Hategan wrote:

> I added log messages when a request fails. They weren't there before, so
> can you try with the latest trunk?
> > Also, can you let me know exactly what URL you typed in the browser > window? And if possible the exact error message that you got back? > > Mihael > > On Mon, 2013-08-26 at 13:54 -0500, Ketan Maheshwari wrote: > > Hi, > > > > I am trying to run swift trunk (swift-r7005 cog-r3767) to test the new > http > > ui feature. > > > > When running with -ui http:8080 on midway machine, the browser shows the > > following error message: > > Error: The requested resource is not available > > > > It seems the http server starts but is missing index.html or the default > > file to display. Or something is wrong with my setup. > > > > The log for the run is: > > http://www.mcs.anl.gov/~ketan/catsn-20130826-1846-vgqnbo9b.log > > > > Thanks for any suggestions to fix this. > > > > Regards, > > Hi, > > > > > > I am trying to run swift trunk (swift-r7005 cog-r3767) to test the new > > http ui feature. > > > > > > When running with -ui http:8080 on midway machine, the browser shows > > the following error message: > > Error: The requested resource is not available > > > > > > It seems the http server starts but is missing index.html or the > > default file to display. Or something is wrong with my setup. > > > > > > > > The log for the run is: > > http://www.mcs.anl.gov/~ketan/catsn-20130826-1846-vgqnbo9b.log > > > > > > Thanks for any suggestions to fix this. > > > > > > > > Regards, > > > > -- > > Ketan > > > > > > _______________________________________________ > > Swift-devel mailing list > > Swift-devel at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel > > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: uilog.tgz Type: application/x-gzip Size: 558147 bytes Desc: not available URL: From hategan at mcs.anl.gov Wed Aug 28 20:50:39 2013 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Wed, 28 Aug 2013 18:50:39 -0700 Subject: [Swift-devel] ui http error on browser In-Reply-To: References: <1377647516.11804.3.camel@echo> Message-ID: <1377741039.4674.0.camel@echo> Odd. Can you send me the swift jar file from the dist/[...]/lib dir? Mihael On Wed, 2013-08-28 at 20:35 -0500, Ketan Maheshwari wrote: > Hi Mihael, > > I tried with the updated version. The log is attached. > > In the browser, I type: localhost:8080 > > The exact message is same as earlier: > Error: The requested resource is not available (in large fonts) > > Thanks, > Ketan > From ketancmaheshwari at gmail.com Wed Aug 28 21:41:07 2013 From: ketancmaheshwari at gmail.com (Ketan Maheshwari) Date: Wed, 28 Aug 2013 21:41:07 -0500 Subject: [Swift-devel] ui http error on browser In-Reply-To: <1377741039.4674.0.camel@echo> References: <1377647516.11804.3.camel@echo> <1377741039.4674.0.camel@echo> Message-ID: Please find attached. On Wed, Aug 28, 2013 at 8:50 PM, Mihael Hategan wrote: > Odd. Can you send me the swift jar file from the dist/[...]/lib dir? > > Mihael > > On Wed, 2013-08-28 at 20:35 -0500, Ketan Maheshwari wrote: > > Hi Mihael, > > > > I tried with the updated version. The log is attached. > > > > In the browser, I type: localhost:8080 > > > > The exact message is same as earlier: > > Error: The requested resource is not available (in large fonts) > > > > Thanks, > > Ketan > > > > > -- Ketan -------------- next part -------------- An HTML attachment was scrubbed... URL: -------------- next part -------------- A non-text attachment was scrubbed... 
Name: cog-swift-svn.jar
Type: application/java-archive
Size: 1616864 bytes
Desc: not available
URL:

From ketancmaheshwari at gmail.com Thu Aug 29 14:59:14 2013
From: ketancmaheshwari at gmail.com (Ketan Maheshwari)
Date: Thu, 29 Aug 2013 14:59:14 -0500
Subject: [Swift-devel] trunk execution fails with absolute path
In-Reply-To: <1377654611.11804.4.camel@echo>
References: <1377654611.11804.4.camel@echo>
Message-ID:

Hi Mihael,

This works now. Just a note that I see status ticker showing every second
instead of every 30 seconds.

Thanks,
Ketan


On Tue, Aug 27, 2013 at 8:50 PM, Mihael Hategan wrote:

> Should be fixed now. Please give it a try and let me know if it works.
>
> On Mon, 2013-08-26 at 20:31 -0500, Ketan Maheshwari wrote:
> > Just a note that I tested the same configuration with 0.94 which works.
> >
> >
> > On Mon, Aug 26, 2013 at 8:28 PM, Ketan Maheshwari <
> > ketancmaheshwari at gmail.com> wrote:
> >
> > > Hi,
> > >
> > > I notice the execution fails in trunk when providing absolute path for a
> > > file. I tested this with a catsn example. When providing relative path, it
> > > works. The mode is coasters, local:slurm on midway.
> > >
> > > Error message is as follows:
> > >
> > > Execution failed:
> > > Exception in cat:
> > > Arguments:
> > > [__root__/scratch/midway/maheshwari/globusonline-galaxy-globus-738fb324c285/tools/swift/data.txt]
> > > Host: midway
> > > Directory: catsn-20130827-0122-c0vppi56/jobs/d/cat-dpf6fdel
> > > exception @ swift-int-staging.k, line: 159
> > > Caused by: Job failed with and exit code of 1
> > > org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
> > > at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
> > > at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> > > at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
> > > at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
> > > at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> > > at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> > >
> > > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1
> > > org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
> > > at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
> > > at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> > > at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
> > > at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
> > > at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> > > at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> > >
> > >
> > > Log for this run is:
> > > http://www.mcs.anl.gov/~ketan/absolutepath.log
> > >
> > > Thanks for any suggestions on fixing this.
> > >
> > > Regards,
> > > --
> > > Ketan
> > >
> >
>

--
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From davidk at ci.uchicago.edu Fri Aug 30 16:39:32 2013
From: davidk at ci.uchicago.edu (David Kelly)
Date: Fri, 30 Aug 2013 16:39:32 -0500 (CDT)
Subject: [Swift-devel] Bugs related to execution providers
In-Reply-To: <2012326951.6914428.1377898605002.JavaMail.root@ci.uchicago.edu>
Message-ID: <1765477359.6914657.1377898772711.JavaMail.root@ci.uchicago.edu>

Hello,

I've put together this list of bugzilla bugs related to execution providers and assigned them to myself. I'm not sure yet how many of these are still issues. Some are very old and need to be tested.

133 - PBS walltime violation poor reporting
138 - Spaces in filenames sometimes don't work right with GRAM2+Condor
210 - Job exceeding wallclock limit -- error is not reported by swift
224 - PBS job submission fails when maxnodes greater than queue limit
227 - Always keep submit and stdout/err files for failing jobs from localscheduler provider
327 - PBS job name
409 - Clean up stack dump on manageable qsub errors
453 - Swift accepts bad sites parameters without warning to user
591 - Bad project in PBS-coaster sites file causes swift to hang with no message to user
620 - On ranger, jobs are not cancelled after Swift run is killed
682 - Make location and naming of execution provider logs and scripts more manageable flexible
740 - Swift continues if the project you specify has no allocation
746 - Cobalt provider on Eureka only uses 8 cores in multi-node jobs
758 - Cobalt provider on Eureka fails silently if user project is invalid
776 - Swift waits on failed job with no messages to user
778 - With status.mode=file, app non-zero exit code is returned as success
979 - Runtime errors in SLURM provider are not halting run and getting sent to user
992 - Allow user defined settings in SGE provider
1010 - Condor jobs linger in C state after completing successfully
1016 - Slurm options from sites definitions does not make into job def using gt2 provider
1044 - PBS provider generates incorrect submit file directives for coaster worker logging
1054 - Implement retry around local scheduler commands
1060 - Add option to condor provider to save condor logs

Are there any other known issues that are not represented here? Please let me know. Thanks!

Regards,
David
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From tim.g.armstrong at gmail.com Fri Aug 30 17:01:31 2013
From: tim.g.armstrong at gmail.com (Tim Armstrong)
Date: Fri, 30 Aug 2013 17:01:31 -0500
Subject: [Swift-devel] Submitting jobs on Blue Waters
Message-ID:

Hi All,
I just had some preliminary success in getting Swift/T running on Blue Waters. Mike suggested I should email swift-devel to summarize what that entailed.

Essentially things are the same as on other Cray systems (Beagle, Raven, etc), except the PBS directives for job size are different. You need to use something of the form:

#PBS -l nodes=1:ppn=32

The regular mpp* directives don't work (and strange things like jobs stuck in the queue seem to happen if you do specify them).

A complete submit script is pasted below for context.

Cheers,
Tim



-------------------------------------------------------------------------------------------------

#PBS -N TURBINE
#PBS -q normal
#PBS -l walltime=00:15:00
#PBS -o /u/sciteam/tarmstro/turbine-output/2013/08/30/16/09/40/output.txt

# Set the job size using appropriate directives for this system
#PBS -l nodes=1:ppn=32

# Pass all environment variables to the job
#PBS -V

# Merge stdout/stderr
#PBS -j oe
# Disable mail
#PBS -m n

SCRIPT=/mnt/a/u/sciteam/tarmstro/helloworld.tcl
ARGS=""
NODES=1
WALLTIME=00:15:00
TURBINE_OUTPUT=/u/sciteam/tarmstro/turbine-output/2013/08/30/16/09/40
TCLSH=/usr/bin/tclsh8.5

cd ${TURBINE_OUTPUT}
OUTPUT_FILE=/u/sciteam/tarmstro/turbine-output/2013/08/30/16/09/40/output.txt
aprun -n 32 -N 32 -cc none -d 1 ${TCLSH} ${SCRIPT} ${ARGS} \
    2>&1 > "${OUTPUT_FILE}.${PBS_JOBID}.out"
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From foster at anl.gov Fri Aug 30 17:13:27 2013
From: foster at anl.gov (Ian Foster)
Date: Fri, 30 Aug 2013 17:13:27 -0500
Subject: [Swift-devel] Submitting jobs on Blue Waters
In-Reply-To:
References:
Message-ID: <161E1527-B78A-4214-8ED5-A951CCA239D9@anl.gov>

Nice!

On Aug 30, 2013, at 5:01 PM, Tim Armstrong wrote:

> Hi All,
> I just had some preliminary success in getting Swift/T running on Blue Waters. Mike suggested I should email swift-devel to summarize what that entailed.
>
> Essentially things are the same as on other Cray systems (Beagle, Raven, etc), except the PBS directives for job size are different. You need to use something of the form:
>
> #PBS -l nodes=1:ppn=32
>
> The regular mpp* directives don't work (and strange things like jobs stuck in the queue seem to happen if you do specify them).
>
> A complete submit script is pasted below for context.
>
> Cheers,
> Tim
>
>
>
> -------------------------------------------------------------------------------------------------
>
> #PBS -N TURBINE
> #PBS -q normal
> #PBS -l walltime=00:15:00
> #PBS -o /u/sciteam/tarmstro/turbine-output/2013/08/30/16/09/40/output.txt
>
> # Set the job size using appropriate directives for this system
> #PBS -l nodes=1:ppn=32
>
> # Pass all environment variables to the job
> #PBS -V
>
> # Merge stdout/stderr
> #PBS -j oe
> # Disable mail
> #PBS -m n
>
> SCRIPT=/mnt/a/u/sciteam/tarmstro/helloworld.tcl
> ARGS=""
> NODES=1
> WALLTIME=00:15:00
> TURBINE_OUTPUT=/u/sciteam/tarmstro/turbine-output/2013/08/30/16/09/40
> TCLSH=/usr/bin/tclsh8.5
>
> cd ${TURBINE_OUTPUT}
> OUTPUT_FILE=/u/sciteam/tarmstro/turbine-output/2013/08/30/16/09/40/output.txt
> aprun -n 32 -N 32 -cc none -d 1 ${TCLSH} ${SCRIPT} ${ARGS} \
>     2>&1 > "${OUTPUT_FILE}.${PBS_JOBID}.out"
>

From hategan at mcs.anl.gov Sat Aug 31 14:56:09 2013
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Sat, 31 Aug 2013 12:56:09 -0700
Subject: [Swift-devel] trunk execution fails with absolute path
In-Reply-To:
References: <1377654611.11804.4.camel@echo>
Message-ID: <1377978969.27620.0.camel@echo>

The ticker should now
be fixed.

Mihael

On Thu, 2013-08-29 at 14:59 -0500, Ketan Maheshwari wrote:
> Hi Mihael,
>
> This works now. Just a note that I see status ticker showing every second
> instead of every 30 seconds.
>
> Thanks,
> Ketan
>
>
> On Tue, Aug 27, 2013 at 8:50 PM, Mihael Hategan wrote:
>
> > Should be fixed now. Please give it a try and let me know if it works.
> >
> > On Mon, 2013-08-26 at 20:31 -0500, Ketan Maheshwari wrote:
> > > Just a note that I tested the same configuration with 0.94 which works.
> > >
> > >
> > > On Mon, Aug 26, 2013 at 8:28 PM, Ketan Maheshwari <
> > > ketancmaheshwari at gmail.com> wrote:
> > >
> > > > Hi,
> > > >
> > > > I notice the execution fails in trunk when providing absolute path for a
> > > > file. I tested this with a catsn example. When providing relative path, it
> > > > works. The mode is coasters, local:slurm on midway.
> > > >
> > > > Error message is as follows:
> > > >
> > > > Execution failed:
> > > > Exception in cat:
> > > > Arguments:
> > > > [__root__/scratch/midway/maheshwari/globusonline-galaxy-globus-738fb324c285/tools/swift/data.txt]
> > > > Host: midway
> > > > Directory: catsn-20130827-0122-c0vppi56/jobs/d/cat-dpf6fdel
> > > > exception @ swift-int-staging.k, line: 159
> > > > Caused by: Job failed with and exit code of 1
> > > > org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
> > > > at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
> > > > at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> > > > at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
> > > > at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
> > > > at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> > > > at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> > > >
> > > > Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1
> > > > org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with and exit code of 1 (exit code: 1)
> > > > at org.globus.cog.abstraction.coaster.service.local.JobStatusHandler.requestComplete(JobStatusHandler.java:40)
> > > > at org.globus.cog.coaster.handlers.RequestHandler.receiveCompleted(RequestHandler.java:88)
> > > > at org.globus.cog.coaster.channels.AbstractCoasterChannel.handleRequest(AbstractCoasterChannel.java:527)
> > > > at org.globus.cog.coaster.channels.AbstractStreamCoasterChannel.stepNIO(AbstractStreamCoasterChannel.java:238)
> > > > at org.globus.cog.coaster.channels.NIOMultiplexer.loop(NIOMultiplexer.java:97)
> > > > at org.globus.cog.coaster.channels.NIOMultiplexer.run(NIOMultiplexer.java:56)
> > > >
> > > >
> > > > Log for this run is:
> > > > http://www.mcs.anl.gov/~ketan/absolutepath.log
> > > >
> > > > Thanks for any suggestions on fixing this.
> > > >
> > > > Regards,
> > > > --
> > > > Ketan
> > > >
> > >
> >
>
