From justinbbt at gmail.com Mon Sep 1 16:20:57 2014
From: justinbbt at gmail.com (Justin bbt)
Date: Mon, 1 Sep 2014 17:20:57 -0400
Subject: [Swift-user] running jobs on cluster or cloud
In-Reply-To:
References:
Message-ID:

I solved the permission problem by using the ssh-add command to add the key to the list of keys. (This step is required if the local system is Linux; I am using Ubuntu.)

(More here: https://help.github.com/articles/error-agent-admitted-failure-to-sign)

Now start-coaster-service connects to the cluster without a password, but it does not terminate. This is the output:

Service address: localhost
Starting coaster-service
Service port: 35925
Local port: 40681
Generating sites.xml
Starting worker on W.X.Y.Z
WORKER_LOGGING_LEVEL=DEBUG: Command not found.

If I just use my sites.xml

passive
1
10
10000
.

it fails with the following error:

Execution failed:
Exception in simulate:
Arguments: []
Host: persistent-coasters
Directory: p1-20140901-1648-r8mdqbse/jobs/z/simulate-zcyxesvl

Caused by:
Could not submit job
Caused by:
Failed to create socket
Caused by:
Connection refused
simulation, p1.swift, line 9

On Sat, Aug 30, 2014 at 1:28 AM, Justin bbt wrote:

> For cluster:
>
> When I run the start-coaster-service, I receive the following, in which it
> asks for password and then says Permission is denied
>
> Start-coaster-service...
> Configuration: /home/lenovo/swift-cloud-tutorial/scs/coaster-service.conf
> Service address: localhost
> Starting coaster-service
> Service port: 52809
> Local port: 58460
> Generating sites.xml
> username at ipadress's password:
> username at ipadress's password:
> Starting worker on username@
> lenovo at lenovo-laptop:~/swift-cloud-tutorial/scs$username at ipadress's
> password:
> Permission denied, please try again.
> username at ipadress's password:
> Permission denied, please try again.
> username at ipadress's password:
> Permission denied (publickey,gssapi-keyex,gssapi-with-mic,password).
>
> This happens though I have created my keys with ssh-keygen. (only changed
> that I made was to create rsa keys rather than dsa keys - my cluster did
> not accept dsa). I can connect with rsa keygen and my passphrase for
> regular ssh
>
> The output of my sites.xml from this partial running of
> start-coaster-service is
>
> url="http://localhost:37584"
> jobmanager="local:local"/>
> passive
> 1
> 10
> 10000
>
> .
>
> Using this XML , I just get a sequence of job submission every 30 seconds,
> no finished jobs.
>
> BTW, I have a public ip for my cluster and then each compute node has a
> local/private ip.
> In
> export WORKER_HOSTS=" "
> currently I just set the public IP address which still I am not successful
> with this one node even. I was wondering how should I set the other IPs?
> Does it mean that I have to install swift on the cluster?
>
> I will look at the new release of swift for AWS.
>
> Thanks,
> J.
>
> On Fri, Aug 29, 2014 at 11:43 AM, Yadu Nand wrote:
>
>> Hi Justin,
>>
>> Did you do the following steps:
>> export WORKER_LOCATION="/home/ubuntu"
>> export WORKER_HOSTS=" "
>> export WORKER_USERNAME=ubuntu
>>
>> and then run "source setup.sh" ?
>> When you source the setup.sh scripts you must've gotten a sites.xml and a
>> start-coaster-service.log in your scs folder, could you send us those ?
>> The setup script should start a persistent coaster service and connect to
>> the nodes on amazon, start workers, and generate a sites.xml file
>> that would let your swift scripts run across the amazon nodes. You
>> shouldn't have to make changes to the sites.xml.
>>
>> Alternatively, you could try using the beta release of swift, Swift 0.95
>> RC6 with the new cloud mechanism:
>> https://github.com/swift-lang/swift-on-cloud/tree/master/aws
>>
>> That will set you up with a headnode on AWS with a few worker nodes that
>> you define, with everything setup to run swift.
>>
>> Thanks,
>> Yadu
>>
>> On Thu, Aug 28, 2014 at 6:57 PM, Justin bbt wrote:
>>
>>> Hi all,
>>>>
>>>> I could successfully run swift on my local system.
>>>> Next, I want to use the swift to run some jobs on a cluster.
>>>>
>>>> I followed this tutorial. (I am using just a simple cluster- I even
>>>> could not run the job on one remote node of the cluster)
>>>> http://swift-lang.org/tutorials/cloud/tutorial.html
>>>>
>>>> But, I get this when I run swift p1.swift or other swift
>>>>
>>>> Swift 0.94.1 swift-r7114 cog-r3803
>>>>
>>>> RunID: 20140828-1758-ea4phzag
>>>> Progress: time: Thu, 28 Aug 2014 17:58:15 -0400
>>>> Progress: time: Thu, 28 Aug 2014 17:58:24 -0400 Submitted:1
>>>> Execution failed:
>>>> Exception in simulate:
>>>> Arguments: []
>>>> Host: remotehost2
>>>> Directory: p1-20140828-1758-ea4phzag/jobs/7/simulate-7k2fxlvl
>>>>
>>>> Caused by:
>>>> Job failed with an exit code of 127
>>>> simulation, p1.swift, line 9
>>>>
>>>> --- this is my site.xml file setting
>>>>
>>> url="myclusteturl"/>
>>>>
>>>> 0
>>>> 10000
>>>> /path/to/remote/workdirectory
>>>>
>>>> --- if I use this one
>>>>
>>> url="myclusterurl"
>>>> jobmanager="local:local"/>
>>>> passive
>>>> 1
>>>> 10
>>>> 10000
>>>>
>>>> .l
>>>>
>>>> --- then it loops to my localhost and just repeat submitting the jobs
>>>>
>>>> 1. Is this a correct setting?
>>>> 2. Should I use coaster? I could not understand the description in user
>>>> guides and documentation about the concepts of coaster and the required
>>>> setting. Is there any better tutorial which would describe the coaster ?
>>>> 3. I plan to use the swift later on the cloud (Microsoft Azure). What
>>>> are the setting required for that? for site.xml and if any other file
>>>>
>>>> Thanks in Advance.
>>>>
>>>
>>> _______________________________________________
>>> Swift-user mailing list
>>> Swift-user at ci.uchicago.edu
>>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
>>>
>>
>> --
>> Yadu Nand B
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From Matthew.Shaxted at som.com Tue Sep 2 10:45:40 2014
From: Matthew.Shaxted at som.com (Matthew Shaxted)
Date: Tue, 2 Sep 2014 11:45:40 -0400
Subject: [Swift-user] Swift Coaster Service & Docker Containers
In-Reply-To: <41C141B3-570C-43A7-A30C-D6740B38E829@som.com>
References: <54012A0C.4070703@anl.gov> <41C141B3-570C-43A7-A30C-D6740B38E829@som.com>
Message-ID:

Hi Mike/Yadu,

Thanks for these suggestions. I am successfully starting the coaster-service, making the connection and running jobs on the docker containers with the below commands:

On the coaster-service host:

coaster-service -p 50200 -localport 50100 -nosec -passive &> /var/log/coaster-service.logs

On each container I start this command:

./worker.pl http://172.20.24.20:50100 172.20.24.101 ~/swiftwork

MATTHEW SHAXTED
SKIDMORE, OWINGS & MERRILL LLP
224 SOUTH MICHIGAN AVENUE
CHICAGO, IL 60604
T (312) 360-4368
MATTHEW.SHAXTED at SOM.COM

The information contained in this communication may be confidential, is intended only for the use of the recipient(s) named above, and may be legally privileged. If the reader of this message is not the intended recipient, you are hereby notified that any dissemination, distribution, or copying of this communication, or any of its contents, is strictly prohibited and may be unlawful. If you have received this communication in error, please return it to the sender immediately and delete the original message and any copy of it from your computer system. If you have any questions concerning this message, please contact the sender.
From: Matthew Shaxted
Sent: Saturday, August 30, 2014 8:40 AM
To: Yadu Nand
Cc: Michael Wilde; Swift User
Subject: Re: [Swift-user] Swift Coaster Service & Docker Containers

Thanks Mike and Yadu,

Yes, it would be great if there were a published config for this use case, and I am willing to help in whatever capacity I can.

The docker containers can reach out to external IP addresses, so as you both mention the best way may be to run the worker.pl in each container and have it connect back to the coaster service. I start the coaster service on a known ip/port host using the command you specify. I think I conceptually understand how this could work and will try to test something.

I have an ubuntu docker container configured to start ssh services with the command below. It sounds like, however, if I can run the worker.pl on each container and connect to the coaster service, ssh is not even required.

docker run -i -p 22 -t mattshax/precise_ep_node sh -c '/usr/sbin/sshd -D'

On Aug 29, 2014, at 9:33 PM, "Yadu Nand" wrote:

Hi Matthew,

You can use the coaster-service directly with the following arguments:

coaster-service -p $SERVICEPORT -localport $WORKERPORT -nosec -passive &> /var/log/coaster-service.logs

The workers running under docker should be connecting to the WORKERPORT and the swift client would use the SERVICEPORT. Another thing to remember is to make sure that you use the coaster-service and worker.pl from the same swift release.

The start-coaster-service command is a wrapper over coaster-service which, in addition to starting the coaster service, also tries to ssh the worker.pl to the WORKER_HOSTS. In the situation that you describe, I think it makes sense to have the workers just connect back to the coaster-service which is listening on a known IP / port.
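The two port roles described above can be sketched in a few lines of Python. This is only an illustration, not part of Swift: the helper `coaster_endpoints` and its class are hypothetical names, and the only facts taken from the thread are that workers connect to the `-localport` value and the Swift client uses the `-p` value.

```python
from typing import NamedTuple


class CoasterEndpoints(NamedTuple):
    """URLs derived from the two ports passed to coaster-service."""
    client_url: str  # what the Swift client (sites.xml) points at
    worker_url: str  # what each worker.pl connects back to


def coaster_endpoints(host: str, service_port: int, worker_port: int) -> CoasterEndpoints:
    """Hypothetical helper: map the -p and -localport values to the URL each side uses."""
    for port in (service_port, worker_port):
        if not 0 < port < 65536:
            raise ValueError(f"invalid TCP port: {port}")
    if service_port == worker_port:
        # Two listeners on one host cannot share a port.
        raise ValueError("service and worker ports must differ")
    return CoasterEndpoints(
        client_url=f"http://{host}:{service_port}",
        worker_url=f"http://{host}:{worker_port}",
    )
```

With the values from the commands reported earlier in this thread (`-p 50200 -localport 50100` on 172.20.24.20), the worker URL comes out as the first argument that was passed to worker.pl.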
Thanks,
Yadu

On Fri, Aug 29, 2014 at 8:34 PM, Michael Wilde wrote:

Matthew, I would treat this case similar to the configuration you'd use for a set of virtual machines.

Start one coaster service for each pool of identical docker containers that you want to run.

For each pool of containers, run a Swift worker (worker.pl) in the container and have it connect back to the coaster service you designate to manage that pool. I'm assuming that from a docker container you can connect out to any reachable IP address.

Another approach is to treat the containers like a set of ad-hoc compute nodes, and ssh into them with automatic coasters using the ssh:local jobmanager setting.

We'll try to test and publish a config for such cases. We'd welcome your help with that.

- Mike

On 8/29/14, 7:48 PM, Matthew Shaxted wrote:

Hi All,

I'm trying to find a way to run Swift workflows on multi-host docker containers, and wondering if anybody has had success with this.

When I start a docker container and define specific ports to open on the container, they are mapped to random ports on the host machine. So for example, I can start a container with an ssh port open from a host "10.1.1.1", and can then access this container across hosts with "ssh compute@10.1.1.1 -p 49160".

Now I'm hoping to link these docker containers to Swift's start-coaster-service. I think it would be possible and relatively easy if I can, say, start the coasters on a series of IP addresses AND ports. So a hosts file perhaps would look something like below, and coasters would be started on the correct docker container:

10.1.1.1: 49160
10.1.1.2: 34155
10.1.1.2: 34156
...

Does this make sense? Is it possible to start coasters by specifying an IP address and port number? Any thoughts are greatly appreciated.
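The "IP address: port" hosts file proposed above could be read with a short Python sketch like the one below. It is an assumption-laden illustration: nothing in this thread says start-coaster-service reads such a file, and the function names `parse_host_ports` and `ssh_command` are hypothetical. The generated command mirrors the `ssh compute@10.1.1.1 -p 49160` example from the message.

```python
from typing import List, Tuple


def parse_host_ports(text: str) -> List[Tuple[str, int]]:
    """Parse lines of the form 'HOST: PORT' (hypothetical file format)."""
    entries = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        host, sep, port = line.rpartition(":")
        if not sep or not host.strip():
            raise ValueError(f"expected 'HOST: PORT', got {raw!r}")
        entries.append((host.strip(), int(port)))
    return entries


def ssh_command(user: str, host: str, port: int) -> List[str]:
    """Build the ssh argument list for one container's mapped port."""
    return ["ssh", "-p", str(port), f"{user}@{host}"]
```

Each parsed entry yields one ssh invocation, so two containers on the same host (as with 10.1.1.2 above) are kept apart by their mapped ports.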
Thanks,
Matthew

--
Michael Wilde
Mathematics and Computer Science
Computation Institute
Argonne National Laboratory
The University of Chicago

--
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image003.gif
Type: image/gif
Size: 566 bytes
Desc: image003.gif
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: image004.png
Type: image/png
Size: 5311 bytes
Desc: image004.png
URL:

From yadudoc1729 at gmail.com Tue Sep 2 12:13:45 2014
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Tue, 2 Sep 2014 12:13:45 -0500
Subject: [Swift-user] running jobs on cluster or cloud
In-Reply-To:
References:
Message-ID:

Hi Justin,

The sites.xml generated by start-coaster-service has to be used by swift, so that swift can connect to the coaster service. I cannot tell whether you are using the generated sites.xml or some other sites.xml file that you wrote. In the case where the coaster-service output says "Service port: 35925", the url in the sites.xml should be http://localhost:35925.

Could you send me a paste of the sites.xml file that was generated by start-coaster-service, the start-coaster-service.log, and the logs from swift when you attempted to run a swift script?

Thanks,
Yadu

--
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
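The rule Yadu states in the message above (the url in sites.xml must carry the port printed on the "Service port:" line) can be checked with a small Python sketch. The function name `service_url_from_output` is hypothetical, and the sketch assumes the output format shown in this thread, with one "Service port: N" entry per line.

```python
import re

# Matches a line such as "Service port: 35925" anywhere in the captured output.
_SERVICE_PORT = re.compile(r"^\s*Service port:\s*(\d+)\s*$", re.MULTILINE)


def service_url_from_output(output: str, host: str = "localhost") -> str:
    """Derive the URL the Swift client should use from start-coaster-service output."""
    match = _SERVICE_PORT.search(output)
    if match is None:
        raise ValueError("no 'Service port:' line found in output")
    return f"http://{host}:{int(match.group(1))}"
```

Comparing this value against the url attribute in the sites.xml actually passed to swift is a quick way to spot a stale or hand-written file, which is one possible cause of a "Connection refused" error.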
From ketan at mcs.anl.gov Tue Sep 2 16:00:28 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Tue, 2 Sep 2014 16:00:28 -0500
Subject: [Swift-user] new config for Cobalt provider option ignored
Message-ID:

Hi,

So, trying the new config from trunk for the cobalt provider, the config translator is ignoring the script mode. Is there an alternative way to tell the Cobalt provider to be invoked in script mode? The old-style sites file is:

1
script

2.99
10000
4
00:15:00
1500

128
128

/home/ketan/swiftwork

The generated config is:

options {
    maxNodesPerJob: 128
    maxJobs: 1
    tasksPerNode: 4
    # Option ignored: globus:mode = script
    nodeGranularity: 128
    # Option ignored: globus:walltime = 1500
}
...

Thanks,
Ketan
-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From justinbbt at gmail.com Tue Sep 2 13:26:28 2014
From: justinbbt at gmail.com (Justin bbt)
Date: Tue, 2 Sep 2014 14:26:28 -0400
Subject: [Swift-user] running jobs on cluster or cloud
In-Reply-To:
References:
Message-ID:

I re-ran my coaster service to make sure I am sending you an updated log. The log is attached.

The sites.xml is now:

passive
1
10
10000
/scratchspace

The first question is: why does start-coaster-service not terminate?
Anyhow, if I use this sites.xml, then the swift output is:

lenovo at lenovo-laptop:~/swift-cloud-tutorial/part01$ swift p1.swift
Swift 0.94.1 swift-r7114 cog-r3803

RunID: 20140902-1406-he5yo1s3
Progress: time: Tue, 02 Sep 2014 14:06:50 -0400
Progress: time: Tue, 02 Sep 2014 14:07:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:07:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:08:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:08:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:09:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:09:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:10:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:10:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:11:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:11:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:12:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:12:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:13:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:13:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:14:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:14:50 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:15:20 -0400 Submitted:1
Progress: time: Tue, 02 Sep 2014 14:15:50 -0400 Submitted:1

I am also attaching the log file for that.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: jobspernode-20140902-1401-nux8gdl0.log
Type: text/x-log
Size: 14003 bytes
Desc: not available
URL:
-------------- next part --------------
A non-text attachment was scrubbed...
Name: p1-20140902-1406-he5yo1s3.log
Type: text/x-log
Size: 10845 bytes
Desc: not available
URL:

From hategan at mcs.anl.gov Tue Sep 2 17:09:11 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Tue, 2 Sep 2014 15:09:11 -0700
Subject: [Swift-user] new config for Cobalt provider option ignored
In-Reply-To:
References:
Message-ID: <1409695751.16233.12.camel@echo>

Hi Ketan,

The config translator does not generally translate new features that were added just now by you, and it even tells you when it doesn't recognize an option:

# Option ignored: globus:mode = script

You can say instead:

options {
    mode: "script"
}

Mihael

From jozik at uchicago.edu Thu Sep 4 10:44:56 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Thu, 4 Sep 2014 10:44:56 -0500
Subject: [Swift-user] Bus Error
Message-ID:

Hi all,

I'm trying to launch a large number of runs, 16807 to be precise, and am running into a couple of issues.
The first issue was a Java heap out of memory exception. I looked around to see where the best place to increase the max heap would be, and I settled on line 13 ("HEAPMAX=4096M") of the swift executable. I should mention that all of this is using swift-0.95-RC6.
That did get rid of the out of memory exception, but this time I got a Bus error:

/home/ozik/swift_dist/swift-0.95-RC6/bin/swift: line 188: 108521 Bus error (core dumped) java -Xmx4096M -XX:+HeapDumpOnOutOfMemoryError -Djava.endorsed.dirs=/home/ozik/swift_dist/swift-0.95-RC6/lib/endorsed -DUID=5702 -DGLOBUS_HOSTNAME=blogin3 -DCOG_INSTALL_PATH=/home/ozik/swift_dist/swift-0.95-RC6 -Dswift.home=/home/ozik/swift_dist/swift-0.95-RC6 -Djava.security.egd=file:///dev/urandom -Dscript.dir=/lcrc/project/gcmat/runs/heatmap7_7x7/complete_model/run003/scripts -Drestart.log.name=/lcrc/project/gcmat/runs/heatmap7_7x7/complete_model/run003/restart.log -Ddebug.dir.prefix=run003/ -classpath
/home/ozik/swift_dist/swift-0.95-RC6/etc:/home/ozik/swift_dist/swift-0.95-RC6/libexec:/home/ozik/swift_dist/swift-0.95-RC6/lib/ant.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/antlr-2.7.5.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/castor-0.9.6.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/coaster-bootstrap.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-abstraction-common-2.4.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-grapheditor-0.47.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-jglobus-1.7.0.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-karajan-0.36-dev.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-provider-coaster-0.3.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-provider-dcache-0.1.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-provider-gt2-2.4.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-provider-local-2.2.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-provider-localscheduler-0.4.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-provider-ssh-2.4.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-provider-webdav-2.1.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-resources-1.0.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-swift-svn.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cog-util-0.92.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/commons-httpclient.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/commons-logging-1.1.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cryptix32.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cryptix-asn1.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/cryptix.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/j2ssh-common-0.2.2.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/j2ssh-core-0.2.2-patch-b.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jakarta-regexp-1.2.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jakarta-slide-webdavlib-2.0.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jaxrpc.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jce-jdk13-131.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jcommon-1.0.18.jar:/home/ozik/swift_
dist/swift-0.95-RC6/lib/jfreechart-1.0.15.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jgss.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jline-0.9.94.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jsr173_1.0_api.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/jug-lgpl-2.0.0.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/junit.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/log4j-1.2.16.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/puretls.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/resolver.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/stringtemplate.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/vdldefinitions.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/xbean.jar:/home/ozik/swift_dist/swift-0.95-RC6/lib/xbean_xpath.jar: org.griphyn.vdl.karajan.Loader -runid run003 -logfile /lcrc/project/gcmat/runs/heatmap7_7x7/complete_model/run003/run003.log -sites.file /lcrc/project/gcmat/runs/heatmap7_7x7/complete_model/run003/sites.xml -tc.file /lcrc/project/gcmat/runs/heatmap7_7x7/complete_model/run003/apps -config /lcrc/project/gcmat/runs/heatmap7_7x7/complete_model/run003/cf repast.swift -lines=823543 -inst=16807 -upf=unrolledParamFile.txt

Any ideas on this?

Jonathan

From hategan at mcs.anl.gov Thu Sep 4 14:28:50 2014
From: hategan at mcs.anl.gov (Mihael Hategan)
Date: Thu, 4 Sep 2014 12:28:50 -0700
Subject: [Swift-user] Bus Error
In-Reply-To:
References:
Message-ID: <1409858930.32145.3.camel@echo>

Hi Jonathan,

It's a bus error, which is just slightly worse than a segfault, both of
which would reflect bugs in the JVM.

Was there a hs_err_pid???.log produced? It might help.

Also, it might help to know the exact version of java as well as the
hardware details.

Mihael

From jozik at uchicago.edu Thu Sep 4 14:52:56 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Thu, 4 Sep 2014 14:52:56 -0500
Subject: [Swift-user] Bus Error
In-Reply-To: <1409858930.32145.3.camel@echo>
References: <1409858930.32145.3.camel@echo>
Message-ID:

Hi Mihael,

This was on Blues, java version "1.7.0_55". There's a ~4GB core-java-7-5702-3026-108521-1409822206 core dump file that was generated. (Actually there were two; the other one is ~1.3GB.) Following this (http://www.javacodegeeks.com/2013/02/analysing-a-java-core-dump.html), this is the error:

Program terminated with signal 7, Bus error.
#0 0x00002b873a0fe6e0 in signalHandler(int, siginfo*, void*) () from /fusion/gpfs/software/linux-rhel5-x86_64/jdk/1.7.0_55/jre/lib/amd64/server/libjvm.so

And this is where it happened:

(gdb) where
#0 0x00002b873a0fe6e0 in signalHandler(int, siginfo*, void*) () from /fusion/gpfs/software/linux-rhel5-x86_64/jdk/1.7.0_55/jre/lib/amd64/server/libjvm.so
#1 <signal handler called>
#2 0x00002b873a102428 in os::PlatformEvent::park(long) () from /fusion/gpfs/software/linux-rhel5-x86_64/jdk/1.7.0_55/jre/lib/amd64/server/libjvm.so
#3 0x00002b873a0f12eb in ObjectMonitor::wait(long, bool, Thread*) () from /fusion/gpfs/software/linux-rhel5-x86_64/jdk/1.7.0_55/jre/lib/amd64/server/libjvm.so
#4 0x00002b8739f63148 in JVM_MonitorWait () from /fusion/gpfs/software/linux-rhel5-x86_64/jdk/1.7.0_55/jre/lib/amd64/server/libjvm.so
#5 0x00002b87402b1608 in ?? ()
#6 0x00000007c7208000 in ?? ()
#7 0x0000000000000000 in ?? ()

Does this help at all?

Jonathan

From ketan at mcs.anl.gov Fri Sep 5 20:03:26 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Fri, 5 Sep 2014 20:03:26 -0500
Subject: [Swift-user] Pointer to Swift tutorials for computational science education and research
In-Reply-To: <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
References: <539FC360.6090800@anl.gov> <59fc9be3f4d445ab8d502b2b1ce63af7@GEORGE.anl.gov> <13482362cf5c4f13a4d98c897b022f89@LUCKMAN.anl.gov> <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
Message-ID:

Hi Andrew,

Yes, I remember: thanks for getting back on this.

From the error message and tc.data, it does look like the executable is provided as an absolute path, but somehow Swift is looking in the system path and not finding it. One possibility is that the node on which catnap.sh is running does not have it installed at the path specified in tc.data. Can you also check whether catnap.sh has the executable bit set? That is less likely to be causing the issue, though.

Also, from the tc.data line it looks like you are using persistent coasters. Have you started the coaster service beforehand and made sure it started correctly, without any error messages? Could you tell us more about your cluster? Depending on the type of cluster, it is possible that we can run Swift in a non-persistent, implicit coasters mode.

Can you also send the Swift-generated log for this run?

Thanks,
Ketan

On Fri, Sep 5, 2014 at 7:16 PM, Andrew Stocker wrote:

> Hi Ketan,
>
> I'm not sure if you remember, but myself and my research advisor
> Xiaosheng spoke to you at LBL in Oakland at the beginning of the summer
> about starting to use Swift at our school. We have been working hard on
> setting it up, and I am trying to get your demo to run but I'm having a
> problem. For some reason I keep getting the following error when I try to
> run your catsnsleep demo:
>
> Execution failed:
> Exception in catnap:
> Arguments: [5, data.txt]
> Host: persistent-coasters
> Directory: catsnsleep-20140905-1702-mihcat06/jobs/u/catnap-utx080xl
>
> Caused by:
> Cannot find executable catnap.sh on site system path
> catnap, catsnsleep.swift, line 13
>
> However I'm not sure why. In our tc.data file we have the line:
>
> persistent-coasters catnap /usr/local/swift-0.94.1/oakland-demo/catnap.sh
>
> which I think should work but obviously something is going wrong. I
> have been browsing the documentation articles but I can't find anything
> about why this might be happening. We would greatly appreciate your advice!
>
> Regards,
>
> Andrew Stocker
>
>
> On Fri, Jun 20, 2014 at 12:00 PM, Xiaosheng Huang
> wrote:
>
>>
>> ---------- Forwarded message ----------
>> From: Ketan Maheshwari
>> Date: Fri, Jun 20, 2014 at 11:45 AM
>> Subject: Re: Pointer to Swift tutorials for computational science
>> education and research
>> To: Xiaosheng Huang
>> Cc: Wilde
>>
>>
>> Hi Xiaosheng,
>>
>> The tarball is: http://www.mcs.anl.gov/~ketan/oakland-demo.tgz
>>
>> There is a small README in there which outlines the steps.
>>
>> Best,
>> Ketan
>>
>> ************************************************************
>> Xiaosheng Huang, Assistant Professor
>> Department of Physics and Astronomy
>> University of San Francisco
>> 2130 Fulton Street, San Francisco, CA 94117-1080
>>
>> Phone: (415) 422-6281
>> E-mail: xhuang22 at usfca.edu
>> ************************************************************
>>

From jozik at uchicago.edu Mon Sep 8 13:04:29 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Mon, 8 Sep 2014 13:04:29 -0500
Subject: [Swift-user] CPU failed leading to end of run
Message-ID:

Hello all,

I'm getting failed runs on Blues which might be due to CPUs failing. I've put the run log at: /tmp/run004_ozik.log

Any help is greatly appreciated,

Jonathan

From yadunand at uchicago.edu Mon Sep 8 13:18:57 2014
From: yadunand at uchicago.edu (Yadu Nand Babuji)
Date: Mon, 08 Sep 2014 13:18:57 -0500
Subject: [Swift-user] CPU failed leading to end of run
In-Reply-To:
References:
Message-ID: <540DF311.5080308@uchicago.edu>

Hi Jonathan,

Could you add read permissions to the log, please?

Thanks,
Yadu

From jozik at uchicago.edu Mon Sep 8 13:45:25 2014
From: jozik at uchicago.edu (Jonathan Ozik)
Date: Mon, 8 Sep 2014 13:45:25 -0500
Subject: [Swift-user] CPU failed leading to end of run
In-Reply-To: <540DF311.5080308@uchicago.edu>
References: <540DF311.5080308@uchicago.edu>
Message-ID: <23DDE6BD-5D60-41D4-BD84-22CEA340DEB2@uchicago.edu>

Yadu,

Here's the location of the log, and I did add read permissions:
/lcrc/project/gcmat/runs/heatmap7_7x7/complete_model/run004/run004.log

Thank you,

Jonathan

From amstocker at dons.usfca.edu Mon Sep 8 18:28:47 2014
From: amstocker at dons.usfca.edu (Andrew Stocker)
Date: Mon, 8 Sep 2014 16:28:47 -0700
Subject: [Swift-user] Pointer to Swift tutorials for computational science education and research
In-Reply-To:
References: <539FC360.6090800@anl.gov> <59fc9be3f4d445ab8d502b2b1ce63af7@GEORGE.anl.gov> <13482362cf5c4f13a4d98c897b022f89@LUCKMAN.anl.gov> <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
Message-ID:

Thanks for your response!
Since we're just in the stages of experimentation, our preliminary cluster is just four iMacs connected to a switch. I set up password-less ssh communication between the four and I'm able to start the coaster service (in the folder with coaster-service.conf) without any errors. I am running Swift from the computer which has catnap.sh installed at the correct path, and I'm pretty sure it has the executable bit set (#!/bin/sh is the first line of the program). None of the other three computers have Swift installed, nor do they have catnap.sh at the location specified in tc.data. Is this a problem?

Attached is the log file from the run when I got the error I copy+pasted above. Interestingly, when I run the catnap swift script with only 3 concurrent instances, it seems to run fine since we allow 3 jobs per node and so it is probably only running locally.

Regards,
Andrew

-------------- next part --------------
A non-text attachment was scrubbed...
Name: cps-2014-09-05_17-01-54.log
Type: application/octet-stream
Size: 1999136 bytes
Desc: not available
URL:

From ketan at mcs.anl.gov Tue Sep 9 09:49:10 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Tue, 9 Sep 2014 09:49:10 -0500
Subject: [Swift-user] Pointer to Swift tutorials for computational science education and research
In-Reply-To:
References: <539FC360.6090800@anl.gov> <59fc9be3f4d445ab8d502b2b1ce63af7@GEORGE.anl.gov> <13482362cf5c4f13a4d98c897b022f89@LUCKMAN.anl.gov> <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
Message-ID:

On Mon, Sep 8, 2014 at 6:28 PM, Andrew Stocker wrote:

> Thanks for your response!
>
> Since we're just in the stages of experimentation, our preliminary
> cluster is just four iMacs connected to a switch. I set up password-less
> ssh communication between the four and I'm able to start the coaster
> service (in the folder with coaster-service.conf) without any errors. I am
> running Swift from the computer which has the catnap.sh installed at the
> correct path, and I'm pretty sure it has the executable bit set ( #!/bin/sh
> is the first line of the program). None of the other three computers have
> Swift installed, nor do they have catnap.sh at the location specified in
> tc.data, is this a problem?
>

Yes, that seems to be the issue. The executable (catnap.sh in this case) must be available on all compute nodes at the location specified in tc.data.
An alternative in this case is to treat catnap.sh as data and stage it to the target compute nodes along with the input data. However, we can do that later. For now, could you put catnap.sh in a common location on each of the compute nodes and try again?

No, Swift does not need to be installed on the compute nodes. Swift just needs to be on the submit node.
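The suggestion in Ketan's reply (place catnap.sh at the same location on every compute node) could be carried out along the lines of the sketch below. This is only an illustration and not part of the original thread: the host names imac2, imac3 and imac4, the helper names run and distribute_app, and the DRY_RUN switch are all made up for the example, while the path is the one quoted from tc.data. It assumes password-less ssh already works and that the remote user may write to the target directory. The chmod step is included because a #!/bin/sh first line only selects the interpreter; whether the file can be executed directly is decided by its permission bits.

```shell
#!/bin/sh
# Sketch: copy an app script to the same absolute path on each worker node.
# Host names and the DRY_RUN switch are illustrative assumptions.

# run CMD...: print the command when DRY_RUN=1 (the default), otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "$@"
    else
        "$@"
    fi
}

# distribute_app "HOST1 HOST2 ..." /absolute/path/to/app
distribute_app() {
    hosts=$1
    app=$2
    dir=$(dirname "$app")
    for host in $hosts; do
        run ssh "$host" mkdir -p "$dir"    # make sure the target directory exists
        run scp -p "$app" "$host:$app"     # -p preserves mode bits and timestamps
        run ssh "$host" chmod +x "$app"    # make sure the execute bit is set
    done
}

# Dry run: only prints the commands for three placeholder hosts.
distribute_app "imac2 imac3 imac4" /usr/local/swift-0.94.1/oakland-demo/catnap.sh
```

Setting DRY_RUN=0 in the environment makes the script execute the commands instead of printing them. Running `ls -l` on the script on any node afterwards shows whether the x bits are present.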
URL: From ganesh at cs.utah.edu Mon Sep 8 23:41:09 2014 From: ganesh at cs.utah.edu (Ganesh Gopalakrishnan) Date: Mon, 8 Sep 2014 22:41:09 -0600 Subject: [Swift-user] Fwd: Fwd: Swift/K node creation In-Reply-To: <53F78E7E.9010704@uchicago.edu> References: <53F51FF7.7010107@cs.utah.edu> <53F6295F.5040402@anl.gov> <53F78E7E.9010704@uchicago.edu> Message-ID: Hi Yadu and Swift folks (and Martin B): The new semester 'hit us' . We are going to certainly follow up and let you know . Thanks for your help! (Just wanted to allay the impression that we are ignoring you.) Thanks again, Ganesh On Fri, Aug 22, 2014 at 12:39 PM, Yadu Nand Babuji wrote: > Hi Ian, > > Reposting as I suspect that the original email bounced. > > Here's a tar ball of the swift tutorial with minor changes to the > swift.properties config file to connect to a few > remote machines : > http://users.rcc.uchicago.edu/~yadunand/remote_pool.tar.gz > > Before you run the tests, please make sure to edit the > remote_pool/swift.properties file to have the URLs of > your remote machines. > > There's an online swift sandbox here -> http://swift-lang.org/tryswift/ > > You can get documentation here for different releases of Swift -> > http://swift-lang.org/docs/index.php > > I would recommend getting Swift 0.95-RC6 from ( > http://swift-lang.org/downloads/index.php). > > Thanks! > Yadu > > On 08/21/2014 12:16 PM, Michael Wilde wrote: > > > > > -------- Original Message -------- Subject: Fwd: Swift/K node creation Date: > Wed, 20 Aug 2014 16:23:51 -0600 From: Ganesh Gopalakrishnan > To: Michael Wilde > CC: Martin Burtscher > , ian briggs > , Mark Baranowski > > > Hi Mike, > > After yesterday's call, Ian Briggs and Mark Baranowski have looked into > Swift/K. I encouraged them to post a question to the Swift group. Hence > this email is strictly redundant - but I thought I'd drop in an intro to > give you some context for this question. Thanks! 
> > Best, > > Ganesh > > > -------- Original Message -------- Subject: Swift/K node creation Date: Wed, > 20 Aug 2014 16:19:57 -0600 From: Ian Briggs > To: swift-user at ci.uchicago.edu CC: Ganesh > Gopalakrishnan > > > Thank you for reading this. > > I was able to install and run swift/k on my local computer. > From here I would like to install it on a remote machine > and use that machine as a remote site. How should I go > about doing this? If there is a guide I missed I apologize. > > Thanks, > Ian Briggs > > > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From burtscher at txstate.edu Wed Sep 10 08:53:53 2014 From: burtscher at txstate.edu (Martin Burtscher) Date: Wed, 10 Sep 2014 08:53:53 -0500 Subject: [Swift-user] Fwd: Fwd: Swift/K node creation In-Reply-To: References: <53F51FF7.7010107@cs.utah.edu> <53F6295F.5040402@anl.gov> <53F78E7E.9010704@uchicago.edu> Message-ID: <541057F1.2090203@txstate.edu> Hello all! I'm also very busy with the start of the semester, but I have hired a student who has started to learn Swift. He installed it successfully on one of our machines and is going through the tutorials now. I hope to have some feedback from him to share with you soon. Kind regards, Martin On 9/8/2014 11:41 PM, Ganesh Gopalakrishnan wrote: > Hi Yadu and Swift folks (and Martin B): > > The new semester 'hit us' . We are going to certainly follow up and let > you know . Thanks for your help! (Just wanted to allay the impression > that we are ignoring you.) > > Thanks again, > > Ganesh > > > > On Fri, Aug 22, 2014 at 12:39 PM, Yadu Nand Babuji > > wrote: > > Hi Ian, > > Reposting as I suspect that the original email bounced. 
> > Here's a tar ball of the swift tutorial with minor changes to the > swift.properties config file to connect to a few > remote machines : > http://users.rcc.uchicago.edu/~yadunand/remote_pool.tar.gz > > Before you run the tests, please make sure to edit the > remote_pool/swift.properties file to have the URLs of > your remote machines. > > There's an online swift sandbox here -> http://swift-lang.org/tryswift/ > > You can get documentation here for different releases of Swift -> > http://swift-lang.org/docs/index.php > > I would recommend getting Swift 0.95-RC6 from > (http://swift-lang.org/downloads/index.php). > > Thanks! > Yadu > > On 08/21/2014 12:16 PM, Michael Wilde wrote: >> >> >> >> -------- Original Message -------- >> Subject: Fwd: Swift/K node creation >> Date: Wed, 20 Aug 2014 16:23:51 -0600 >> From: Ganesh Gopalakrishnan >> >> To: Michael Wilde >> CC: Martin Burtscher >> , ian briggs >> , Mark Baranowski >> >> >> >> >> Hi Mike, >> >> After yesterday's call, Ian Briggs and Mark Baranowski have looked >> into Swift/K. I encouraged them to post a question to the Swift >> group. Hence this email is strictly redundant - but I thought >> I'd drop in an intro to give you some context for this question. >> Thanks! >> >> Best, >> >> Ganesh >> >> >> -------- Original Message -------- >> Subject: Swift/K node creation >> Date: Wed, 20 Aug 2014 16:19:57 -0600 >> From: Ian Briggs >> >> To: swift-user at ci.uchicago.edu >> CC: Ganesh Gopalakrishnan >> >> >> >> >> Thank you for reading this. >> >> I was able to install and run swift/k on my local computer. >> >From here I would like to install it on a remote machine >> and use that machine as a remote site. How should I go >> about doing this? If there is a guide I missed I apologize. >> >> Thanks, >> Ian Briggs >> >> >> >> > > -- Martin Burtscher, Ph.D. 
http://www.cs.txstate.edu/~burtscher/
Associate Professor E-mail: burtscher at txstate.edu
Department of Computer Science Phone: (512) 245-3443
Texas State University Office: Comal 309C

From amstocker at dons.usfca.edu Wed Sep 10 20:45:50 2014
From: amstocker at dons.usfca.edu (Andrew Stocker)
Date: Wed, 10 Sep 2014 18:45:50 -0700
Subject: [Swift-user] Pointer to Swift tutorials for computational science education and research
In-Reply-To:
References: <539FC360.6090800@anl.gov> <59fc9be3f4d445ab8d502b2b1ce63af7@GEORGE.anl.gov> <13482362cf5c4f13a4d98c897b022f89@LUCKMAN.anl.gov> <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
Message-ID:

Ketan,

I copied the catnap executable to the same directory on each of the computers and now the swift script is working perfectly without error. Thanks for your help! What are the next steps we can take to set up our cluster to not require the script to be on all the computers? Since we are fairly new to parallel computing with a cluster, could you point us towards any resources regarding the technical configuration for Swift? I've looked at the documentation for tc.data but I am still a bit confused by it.

Thanks,
Andrew

On Tue, Sep 9, 2014 at 7:49 AM, Ketan Maheshwari wrote:
>
> On Mon, Sep 8, 2014 at 6:28 PM, Andrew Stocker wrote:
>
>> Thanks for your response!
>>
>> Since we're just in the stages of experimentation, our preliminary
>> cluster is just four iMacs connected to a switch. I set up password-less
>> ssh communication between the four and I'm able to start the coaster
>> service (in the folder with coaster-service.conf) without any errors. I am
>> running Swift from the computer which has the catnap.sh installed at the
>> correct path, and I'm pretty sure it has the executable bit set ( #!/bin/sh
>> is the first line of the program). None of the other three computers
>> have Swift installed, nor do they have catnap.sh at the location
>> specified in tc.data, is this a problem?
>>
>
> Yes, that seems to be the issue. The executable--catnap.sh in this case
> must be available on all compute nodes in the location specified in the tc.
>
> An alternative in this case is to use catnap.sh as data and move it along
> with data to target compute nodes. However, we can do that later. For now,
> could you try to put catnap.sh in a common location on each of the compute
> nodes and try again.
>
> No, Swift is not needed to be installed on compute nodes. Swift just needs
> to be on the submit node.
>
>
>>
>> Attached is the log file from the run when I got the error I
>> copy+pasted above. Interestingly, when I run the catnap swift script with
>> only 3 concurrent instances, it seems to run fine since we allow 3 jobs per
>> node and so it is probably only running locally.
>>
>> Regards,
>> Andrew
>>
>> On Fri, Sep 5, 2014 at 6:03 PM, Ketan Maheshwari
>> wrote:
>>
>>> Hi Andrew,
>>>
>>> Yes, I remember: thanks for getting back on this.
>>>
>>> From the error message and tc.data, indeed it looks like the
>>> executable is provided as absolute path but somehow Swift is looking into
>>> system path and not finding it. One possibility is that the node on which
>>> catnap.sh is running does not have it installed on the path specified in
>>> the tc.data. Can you also check if catnap.sh has the executable bit set.
>>> Less likely that this is causing the issue though.
>>>
>>> Also, from the tc.data line it looks like you are using persistent
>>> coasters. Have started the coaster service beforehand and made sure the
>>> service started correctly without any error messages. Could you indicate
>>> more about your cluster. Depending on the type of cluster, it is possible
>>> that we can run Swift in a non-persistent, implicit coasters mode.
>>>
>>> Can you also send the Swift generated log for this run.
>>>
>>> Thanks,
>>> Ketan
>>>
>>>
>>> On Fri, Sep 5, 2014 at 7:16 PM, Andrew Stocker wrote:
>>>
>>>> Hi Ketan,
>>>>
>>>> I'm not sure if you remember, but myself and my research advisor
>>>> Xiaosheng spoke to you at LBL in Oakland at the beginning of the summer
>>>> about starting to use Swift at our school. We have been working hard on
>>>> setting it up, and I am trying to get your demo to run but I'm having a
>>>> problem. For some reason I keep getting the following error when I try to
>>>> run your catsnsleep demo:
>>>>
>>>> Execution failed:
>>>> Exception in catnap:
>>>> Arguments: [5, data.txt]
>>>> Host: persistent-coasters
>>>> Directory: catsnsleep-20140905-1702-mihcat06/jobs/u/catnap-utx080xl
>>>>
>>>> Caused by:
>>>> Cannot find executable catnap.sh on site system path
>>>> catnap, catsnsleep.swift, line 13
>>>>
>>>> However I'm not sure why. In our tc.data file we have the line:
>>>>
>>>> persistent-coasters catnap
>>>> /usr/local/swift-0.94.1/oakland-demo/catnap.sh
>>>>
>>>> which I think should work but obviously something is going wrong. I
>>>> have been browsing the documentation articles but I can't find anything
>>>> about why this might be happening. We would greatly appreciate your advice!
>>>>
>>>> Regards,
>>>>
>>>> Andrew Stocker
>>>>
>>>>
>>>> On Fri, Jun 20, 2014 at 12:00 PM, Xiaosheng Huang
>>>> wrote:
>>>>
>>>>>
>>>>> ---------- Forwarded message ----------
>>>>> From: Ketan Maheshwari
>>>>> Date: Fri, Jun 20, 2014 at 11:45 AM
>>>>> Subject: Re: Pointer to Swift tutorials for computational science
>>>>> education and research
>>>>> To: Xiaosheng Huang
>>>>> Cc: Wilde
>>>>>
>>>>>
>>>>> Hi Xiaosheng,
>>>>>
>>>>> The tarball is: http://www.mcs.anl.gov/~ketan/oakland-demo.tgz
>>>>>
>>>>> There is a small README in there which outlines the steps.
>>>>>
>>>>> Best,
>>>>> Ketan
>>>>>
>>>>> ************************************************************
>>>>> Xiaosheng Huang, Assistant Professor
>>>>> Department of Physics and Astronomy
>>>>> University of San Francisco
>>>>> 2130 Fulton Street, San Francisco, CA 94117-1080
>>>>>
>>>>> Phone: (415) 422-6281
>>>>> E-mail: xhuang22 at usfca.edu
>>>>> ************************************************************
>>>>>
-------------- next part --------------
An HTML attachment was scrubbed...

From yadudoc1729 at gmail.com Wed Sep 10 23:03:21 2014
From: yadudoc1729 at gmail.com (Yadu Nand)
Date: Wed, 10 Sep 2014 23:03:21 -0500
Subject: [Swift-user] Pointer to Swift tutorials for computational science education and research
In-Reply-To:
References: <539FC360.6090800@anl.gov> <59fc9be3f4d445ab8d502b2b1ce63af7@GEORGE.anl.gov> <13482362cf5c4f13a4d98c897b022f89@LUCKMAN.anl.gov> <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
Message-ID:

Hi Andrew,

If you would like to have swift move the executable for you, you could try the method used in the following example:

type file;

/* App definition calls bash which in-turn executes the bash script
   defined as an argument here
   Every file in the input parameter list is staged by swift to the worker nodes
   Every file in the return list is staged back to the client node by swift
*/
app (file out, file err) foo (file script, file input)
{
    bash @script @input stdout=@out stderr=@err;
}

// Script to be executed
file wrapper <"wrapper.sh">;
file hello <"hello.txt">;

file out <"foo.out">;
file err <"foo.err">;

(out, err) = foo (wrapper, hello);

Thanks,
Yadu

On Wed, Sep 10, 2014 at 8:45 PM, Andrew Stocker wrote:
> Ketan,
>
> I copied the catnap executable to the same directory on each of the
> computers and now the swift script is working perfectly without error.
> Thanks for your help! What are the next steps we can take to set up our
> cluster to not require the script to be on all the computers? Since we are
> fairly new to parallel computing with a cluster, could you point us towards
> any resources regarding the technical configuration for Swift? I've looked
> at the documentation for tc.data but I am still a bit confused by it.
>
> Thanks,
>
> Andrew

--
Yadu Nand B
-------------- next part --------------
An HTML attachment was scrubbed...
From ketan at mcs.anl.gov Thu Sep 11 08:52:18 2014
From: ketan at mcs.anl.gov (Ketan Maheshwari)
Date: Thu, 11 Sep 2014 08:52:18 -0500
Subject: [Swift-user] Pointer to Swift tutorials for computational science education and research
In-Reply-To:
References: <539FC360.6090800@anl.gov> <59fc9be3f4d445ab8d502b2b1ce63af7@GEORGE.anl.gov> <13482362cf5c4f13a4d98c897b022f89@LUCKMAN.anl.gov> <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
Message-ID:

Andrew,

Just to be clear, in the new script that Yadu posted, you will need to add one entry in your tc.data as follows:

persistent-coasters bash /bin/bash

Thanks,
Ketan

On Wed, Sep 10, 2014 at 11:03 PM, Yadu Nand wrote:
> Hi Andrew,
>
> If you would like to have swift move the executable for you, you could try
> the method used in the following example :
>
> type file;
>
> /* App definition calls bash which in-turn executes the bash script
>    defined as an argument here
>    Every file in the input parameter list is staged by swift to the worker
>    nodes
>    Every file in the return list is staged back to the client node by swift
> */
> app (file out, file err) foo (file script, file input)
> {
>     bash @script @input stdout=@out stderr=@err;
> }
>
> // Script to be executed
> file wrapper <"wrapper.sh">;
> file hello <"hello.txt">;
>
> file out <"foo.out">;
> file err <"foo.err">;
>
> (out, err) = foo (wrapper, hello);
>
> Thanks,
> Yadu

-------------- next part --------------
An HTML attachment was scrubbed...
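Yadu's example stages a wrapper.sh to the worker nodes, but the thread never shows what is inside it. A minimal sketch of what such a wrapper might look like, assuming bash receives the staged input file as its first argument (as the `bash @script @input` line suggests). The function name process_input and the output format are hypothetical and do not come from the thread:

```shell
#!/bin/bash
# wrapper.sh: hypothetical sketch, not the script used in this thread.
# Assumes Swift runs it as: bash wrapper.sh <staged-input-file>,
# with stdout/stderr redirected to the files named in the app definition.

process_input() {
    local input="$1"
    if [ ! -r "$input" ]; then
        echo "wrapper.sh: cannot read input file: $input" >&2
        return 1
    fi
    # Report which node ran the job, then pass the input through.
    echo "ran on: $(uname -n)"
    cat "$input"
}

# Only act when an argument is given, so the file can also be sourced.
if [ "$#" -gt 0 ]; then
    process_input "$1"
fi
```

Because the script travels with the job as an ordinary input file, only /bin/bash has to exist on each worker, which matches the single tc.data entry Ketan gives above.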
From amstocker at dons.usfca.edu Fri Sep 12 18:23:13 2014
From: amstocker at dons.usfca.edu (Andrew Stocker)
Date: Fri, 12 Sep 2014 16:23:13 -0700
Subject: [Swift-user] Pointer to Swift tutorials for computational science education and research
In-Reply-To:
References: <539FC360.6090800@anl.gov> <59fc9be3f4d445ab8d502b2b1ce63af7@GEORGE.anl.gov> <13482362cf5c4f13a4d98c897b022f89@LUCKMAN.anl.gov> <362b5dde41ab45f5a0f455a267c058da@GEORGE.anl.gov>
Message-ID:

Ketan and Yadu,

I made changes to our Swift script according to your example and now things are working fine without each node having the script. Thanks very much for your suggestions!

Andrew

On Thu, Sep 11, 2014 at 6:52 AM, Ketan Maheshwari wrote:
> Andrew,
>
> Just to be clear, in the new script that Yadu posted, you will need to add
> one entry in your tc.data as follows:
>
> persistent-coasters bash /bin/bash
>
> Thanks,
> Ketan

-------------- next part --------------
An HTML attachment was scrubbed...

From justinbbt at gmail.com Fri Sep 19 20:49:16 2014
From: justinbbt at gmail.com (Justin bbt)
Date: Fri, 19 Sep 2014 21:49:16 -0400
Subject: [Swift-user] number of simultaneous jobs on a node
Message-ID:

Hi,

How can we configure the number of simultaneous jobs on a node? The goal is to use all cores of a multi-core node by running several instances of a program simultaneously when the program itself needs only one core.

Best,
Justin

-------------- next part --------------
An HTML attachment was scrubbed...

From justinbbt at gmail.com Fri Sep 19 20:48:48 2014
From: justinbbt at gmail.com (Justin bbt)
Date: Fri, 19 Sep 2014 21:48:48 -0400
Subject: [Swift-user] problem with running coaster
Message-ID:

Hi all,

I have a problem running swift on the Microsoft Azure cloud. I have two nodes on the cloud. I installed swift on one of them and I can run part01-03 locally successfully. I am trying to use the coaster to run part04-part06 on the other node.
Here is the output I get for "swift p4.swift" Swift 0.95 RC6 swift-r7900 cog-r3908 RunID: run005 Warning: The @ syntax for function invocation is deprecated Progress: Sat, 20 Sep 2014 01:36:02+0000 Exception in thread "Scheduler" java.lang.NullPointerException at org.globus.cog.abstraction.impl.common.task.TaskImpl.hashCode(TaskImpl.java:364) at java.util.HashMap.hash(HashMap.java:338) at java.util.HashMap.get(HashMap.java:556) at org.griphyn.vdl.karajan.VDSAdaptiveScheduler.failTask(VDSAdaptiveScheduler.java:400) at org.globus.cog.karajan.scheduler.LateBindingScheduler.run(LateBindingScheduler.java:266) Progress: Sat, 20 Sep 2014 01:36:03+0000 Selecting site:10 No events in 1s. Finding dependency loops... Waiting threads: Thread: R-6, waiting on sims (declared on line 21) swift:execute, p4, line 96 analyze, p4, line 211 Thread: R-5-2-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-7-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-8-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-9-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-3x2, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-0-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-1-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-6-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5-4-3, waiting on simout (declared on line 24) assignment, p4, line 28 Thread: R-5x2-3, waiting on simout (declared on line 24) assignment, p4, line 28 ---- No dependency loops found. The following threads are independently hung: Thread: R-6, waiting on sims (declared on line 21) swift:execute, p4, line 96 analyze, p4, line 211 ---- Irrecoverable error found. Exiting. This is the content of my coaster-service.conf export WORKER_LOCATION=. 
export WORKER_HOSTS="191.238.1.187" export WORKER_MODE=ssh export WORKER_USERNAME=azureuser export IPADDR=191.238.1.33 export WORKER_LOG_DIR=. export WORK=/home/ubuntu/work export JOBSPERNODE=1 export JOBTHROTTLE=10 export SSH_TUNNELING=yes And here is the content of start-coaster-service.log Running /home/azureuser/swift-0.95-RC6/bin/coaster-service -nosec -portfile /tmp/tmp.8WkG0mKKzo -localportfile /tmp/tmp.hZvsxQd5J1 -passive Switching log to: cps-2014-09-20_01-21-41.log 2014-09-20 01:21:41,305+0000 WARN CoasterPersistentService Switching log to: cps-2014-09-20_01-21-41.log Local contacts: [http://100.74.60.3:55071] 2014-09-20 01:21:41,338+0000 INFO Settings Local contacts: [ http://100.74.60.3:55071] Starting... id=0920-2101410 2014-09-20 01:21:41,350+0000 INFO BlockQueueProcessor Starting... id=0920-2101410 Started local service: http://100.74.60.3:55071 2014-09-20 01:21:41,350+0000 INFO CoasterService Started local service: http://100.74.60.3:55071 Started coaster service: http://100.74.60.3:42972 2014-09-20 01:21:41,351+0000 INFO CoasterService Started coaster service: http://100.74.60.3:42972 Started coaster service: http://100.74.60.3:42972 Worker connection URL: http://100.74.60.3:55071 Running ssh -N -T -R *:55071:localhost:55071 azureuser at 191.238.1.187 Running ssh azureuser at 191.238.1.187 mkdir -p . && mkdir -p . Running scp /home/azureuser/swift-0.95-RC6/bin/worker.pl azureuser at 191.238.1.187:. Running ssh azureuser at 191.238.1.187 WORKER_LOGGING_LEVEL= nohup ./worker.pl http://191.238.1.33:55071 191.238.1.187 . 
&> /dev/null & HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768 2014-09-20 01:21:51,356+0000 INFO CoasterService HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768 HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768 2014-09-20 01:22:01,364+0000 INFO CoasterService HeapMax: 13134987264, CrtHeap: 886571008, UsedHeap: 41712768 I had this problem before with my laptop connecting to a remote server and Yadu Nand Babuji told me it is because I don't have a public ip. But, I have the problem again now with my nodes in the cloud having the public ip. Though as can be seen in the log, during the connection coaster uses the private ip (100.74.60.3) Is this really a problem with public IP? Does anybody know how to solve this problem ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From justinbbt at gmail.com Mon Sep 22 12:24:45 2014 From: justinbbt at gmail.com (Justin bbt) Date: Mon, 22 Sep 2014 13:24:45 -0400 Subject: [Swift-user] Fwd: number of simultaneous jobs on a node In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: Justin bbt Date: Fri, Sep 19, 2014 at 9:49 PM Subject: number of simultaneous jobs on a node To: Swift User -------------- next part -------------- An HTML attachment was scrubbed... URL: From justinbbt at gmail.com Mon Sep 22 12:22:12 2014 From: justinbbt at gmail.com (Justin bbt) Date: Mon, 22 Sep 2014 13:22:12 -0400 Subject: [Swift-user] Fwd: problem with running coaster In-Reply-To: References: Message-ID: ---------- Forwarded message ---------- From: Justin bbt Date: Fri, Sep 19, 2014 at 9:48 PM Subject: problem with running coaster To: Swift User Cc: Yadu Nand -------------- next part -------------- An HTML attachment was scrubbed... URL: From justinbbt at gmail.com Mon Sep 22 20:32:49 2014 From: justinbbt at gmail.com (Justin bbt) Date: Mon, 22 Sep 2014 21:32:49 -0400 Subject: [Swift-user] question about xsede Message-ID: Hi If I want to use resources on the xsede https://www.xsede.org/overview which site config should I use ?
-------------- next part -------------- An HTML attachment was scrubbed... URL: From ketan at mcs.anl.gov Tue Sep 23 09:31:53 2014 From: ketan at mcs.anl.gov (Ketan Maheshwari) Date: Tue, 23 Sep 2014 09:31:53 -0500 Subject: [Swift-user] question about xsede In-Reply-To: References: Message-ID: Hi Justin, If you are using the regular (non-Xeon Phi) nodes of XSEDE Stampede, here is a site configuration that has worked for me in the past, connecting over SSH to Slurm: 1 1 7500 00:10:00 100 100 normal 1 1 1 TG-EAR130015 .3199 10000 /tmp/{env.USER}/swift.work You will need to replace the project ID with your own. Thanks, Ketan On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt wrote: > Hi > > If I want to use resources on the xsede > https://www.xsede.org/overview > which site config should I use ? > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -------------- next part -------------- An HTML attachment was scrubbed... URL: From justinbbt at gmail.com Tue Sep 23 17:22:07 2014 From: justinbbt at gmail.com (Justin bbt) Date: Tue, 23 Sep 2014 18:22:07 -0400 Subject: [Swift-user] question about xsede In-Reply-To: References: Message-ID: Thank you very much. I am actually using Blacklight, which I guess is PBS-based. So, should I use the Cray tutorial and its settings? http://swift-lang.org/tutorials/cray/tutorial.html On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari wrote: > Hi Justin, > > If you are using xsede Stampede regular nodes (non xeon phi), here is a > site configuration that has worked for me in the past, connecting over ssh > to slurm: > > > > > > > 1 > 1 > 7500 > 00:10:00 > 100 > 100 > normal > 1 > 1 > 1 > TG-EAR130015 > .3199 > 10000 > /tmp/{env.USER}/swift.work > > > > You will need to replace project id with yours.
> > Thanks, > Ketan > > On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt wrote: > >> Hi >> >> If I want to use resources on the xsede >> https://www.xsede.org/overview >> which site config should I use ? >> >> >> _______________________________________________ >> Swift-user mailing list >> Swift-user at ci.uchicago.edu >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >> > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at anl.gov Tue Sep 23 17:30:48 2014 From: wilde at anl.gov (Michael Wilde) Date: Tue, 23 Sep 2014 17:30:48 -0500 Subject: [Swift-user] question about xsede In-Reply-To: References: Message-ID: <5421F498.7070909@anl.gov> Justin, Blacklight is an SGI machine, and thus a more standard PBS machine to which the Cray-specific sites file settings do not apply. It does seem to have a few quirks, though, as described at: https://portal.xsede.org/psc-blacklight#running In particular, the page above suggests using "-l ncpus=N", where N is the number of CPUs desired and must be a multiple of 16. But by default Swift will generate "-l nodes=n", where n is the number of nodes. So we'll need to either do some testing (once we get on the machine) or send you our best guess of the right sites settings and work with you to test them for us. We'll look into this further and report back. - Mike On 9/23/14, 5:22 PM, Justin bbt wrote: > Thank you very much. > > I am actually using the Blacklight, which I guess is PBS based. > So, should I use the Crays tutorial and setting ?
> http://swift-lang.org/tutorials/cray/tutorial.html > > On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari > wrote: > > Hi Justin, > > If you are using xsede Stampede regular nodes (non xeon phi), here > is a site configuration that has worked for me in the past, > connecting over ssh to slurm: > > > > > url="stampede.tacc.utexas.edu "/> > > 1 > 1 > 7500 > 00:10:00 > 100 > 100 > normal > 1 > 1 > 1 > TG-EAR130015 > .3199 > 10000 > /tmp/{env.USER}/swift.work > > > > You will need to replace project id with yours. > > Thanks, > Ketan > > On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt > wrote: > > Hi > > If I want to use resources on the xsede > https://www.xsede.org/overview > which site config should I use ? > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user -- Michael Wilde Mathematics and Computer Science Computation Institute Argonne National Laboratory The University of Chicago -------------- next part -------------- An HTML attachment was scrubbed... URL: From yadudoc1729 at gmail.com Thu Sep 25 17:25:01 2014 From: yadudoc1729 at gmail.com (Yadu Nand) Date: Thu, 25 Sep 2014 17:25:01 -0500 Subject: [Swift-user] question about xsede In-Reply-To: <5421F498.7070909@anl.gov> References: <5421F498.7070909@anl.gov> Message-ID: Hi Justin, Here are some tested configs and a small README from running the sanity test on Blacklight: http://users.rcc.uchicago.edu/~yadunand/blacklight-sanity/ There's an example each of configs for Swift 0.94, Swift 0.95 and the configs we would use going forward (Swift 0.96 and current trunk) in that folder. In the example, I've used ppn=16 (or any multiple of 16) which seems to work as a substitute for ncpus. Hope that helps! 
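A minimal sketch of the multiple-of-16 rule discussed above, assuming a POSIX shell on the login node. The function name round_up_to_16 is hypothetical and is not part of Swift or PBS; it only rounds a desired core count up to the next multiple of 16, and the result could then be tried as the ppn or ncpus value:

```shell
#!/bin/sh
# Hypothetical helper (not part of Swift or PBS): round a requested core
# count up to the next multiple of 16, the granularity mentioned in this
# thread for Blacklight.
round_up_to_16() {
    cores=$1
    # Integer arithmetic: add 15, divide by 16 (truncating), multiply back.
    echo $(( (cores + 15) / 16 * 16 ))
}

# Example: a request for 20 cores.
round_up_to_16 20
```

A value that is already a multiple of 16 is returned unchanged, so the helper can be applied to any requested count before it is written into the site configuration.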
-Yadu On Tue, Sep 23, 2014 at 5:30 PM, Michael Wilde wrote: > Justin, > > Blacklight is an SGI machine, and thus a more standard PBS machine to > which the Cray-specific sites file settings do not apply. > > It does seem to have a few quirks, though, as describe at: > https://portal.xsede.org/psc-blacklight#running > > In particular, the page above suggests using "-l ncpus=N" where N is the > number of CPUS desired and a multiple of 16. But by default Swift will > generate "-l nodes=n" where n is the number of nodes. > > So we'll need to either do some testing (once we get on the machine) or > send you our best guess of the right sites settings and work with you to > test them for us. > > We'll look into this further and report back. > > - Mike > > > On 9/23/14, 5:22 PM, Justin bbt wrote: > > Thank you very much. > > I am actually using the Blacklight, which I guess is PBS based. > So, should I use the Crays tutorial and setting ? > http://swift-lang.org/tutorials/cray/tutorial.html > > On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari > wrote: > >> Hi Justin, >> >> If you are using xsede Stampede regular nodes (non xeon phi), here is a >> site configuration that has worked for me in the past, connecting over ssh >> to slurm: >> >> >> >> >> >> >> 1 >> 1 >> 7500 >> 00:10:00 >> 100 >> 100 >> normal >> 1 >> 1 >> 1 >> TG-EAR130015 >> .3199 >> 10000 >> /tmp/{env.USER}/swift.work >> >> >> >> You will need to replace project id with yours. >> >> Thanks, >> Ketan >> >> On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt wrote: >> >>> Hi >>> >>> If I want to use resources on the xsede >>> https://www.xsede.org/overview >>> which site config should I use ? 
>>> >>> >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >>> >> >> > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > -- > Michael Wilde > Mathematics and Computer Science Computation Institute > Argonne National Laboratory The University of Chicago > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > -- Yadu Nand B -------------- next part -------------- An HTML attachment was scrubbed... URL: From pwj3 at txstate.edu Fri Sep 26 13:18:22 2014 From: pwj3 at txstate.edu (Janovics, Peter W) Date: Fri, 26 Sep 2014 18:18:22 +0000 Subject: [Swift-user] Swift parallel scripting language questions Message-ID: <20FEAE5E-D09E-404C-A82F-51CC5F5C5DB5@txstate.edu> Dear Swift development team, My name is Peter Janovics, and I am a researcher in Dr. Burtscher's lab at Texas State University. For the past month I have been learning the Swift programming language because it supports parallel execution and can execute other applications that are parallel. I have several questions and encountered some difficulties that you may want to be made aware of. I noticed that the "Try Swift online" application on your webpage does not work and now appears to be offline. Is there a timeframe for when it will be back online and working? Is an updated version of the tutorial available that requires a slightly lower learning curve and/or includes more examples with explanations?
I have been trying to piece together information but feel that there is a gap between the tutorial, which is based on an older version of Swift, and the documentation, which explains some of the new keywords but does not give enough complete examples with descriptions. Are there any examples of executing simple Linux commands using Swift? The documentation mentions having applications appended to a file. Does this mean that any command that I would like to execute in Swift needs to be included in this file, including shell commands? There is a section that explains how to use Swift with MPI, but the files shown do not seem complete. Moreover, I would like to execute MPI code on a supercomputer in a batch environment. Are there any instructions on how to use Swift in such an environment? Also, what is the required folder structure to ensure that Swift will find the files needed to execute MPI code? Thank you for your time, Peter Janovics From justinbbt at gmail.com Sat Sep 27 14:41:53 2014 From: justinbbt at gmail.com (Justin bbt) Date: Sat, 27 Sep 2014 15:41:53 -0400 Subject: [Swift-user] question about xsede In-Reply-To: References: Message-ID: So, I am using this config: {env.PWD}/../app 1 normal 16 00:01:00 3600 100 100 2 1 1 .320 10000 1 TG-CCR134513 /brashear/usrname local but it does not work and gives this error: Swift 0.95 RC6 swift-r7900 cog-r3908 RunID: run011 Warning: The @ syntax for function invocation is deprecated [Error] sites.xml, line 2, col 10: cvc-elt.1: Cannot find the declaration of element 'config'. Progress: Sat, 27 Sep 2014 15:35:52-0400 Could not submit job (qsub reported an exit code of 170).
qsub: Unknown queue MSG=cannot locate queue Execution failed: Exception in simulate: Arguments: [--timesteps, 1, --range, 100, --nvalues, 5] Host: black Directory: p4-run011/jobs/s/simulate-sdm041yl exception @ swift-int-staging.k, line: 181 Caused by: exception @ swift-int-staging.k, line: 177 Caused by: Block task failed: Error submitting block task org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could not submit job (qsub reported an exit code of 170). qsub: Unknown queue MSG=cannot locate queue at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63) at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61) at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:70) Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could not submit job (qsub reported an exit code of 170). qsub: Unknown queue MSG=cannot locate queue at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113) at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53) ... 3 more Also, in some configs I saw the "providerAttributes" setting below, but I don't know what it is or what values I should give it: pbs.aprun;pbs.mpp;depth=32 On Tue, Sep 23, 2014 at 6:22 PM, Justin bbt wrote: > Thank you very much. > > I am actually using the Blacklight, which I guess is PBS based. > So, should I use the Crays tutorial and setting ?
> http://swift-lang.org/tutorials/cray/tutorial.html > > On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari > wrote: > >> Hi Justin, >> >> If you are using xsede Stampede regular nodes (non xeon phi), here is a >> site configuration that has worked for me in the past, connecting over ssh >> to slurm: >> >> >> >> >> >> >> 1 >> 1 >> 7500 >> 00:10:00 >> 100 >> 100 >> normal >> 1 >> 1 >> 1 >> TG-EAR130015 >> .3199 >> 10000 >> /tmp/{env.USER}/swift.work >> >> >> >> You will need to replace project id with yours. >> >> Thanks, >> Ketan >> >> On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt wrote: >> >>> Hi >>> >>> If I want to use resources on the xsede >>> https://www.xsede.org/overview >>> which site config should I use ? >>> >>> >>> _______________________________________________ >>> Swift-user mailing list >>> Swift-user at ci.uchicago.edu >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user >>> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hategan at mcs.anl.gov Sat Sep 27 16:17:10 2014 From: hategan at mcs.anl.gov (Mihael Hategan) Date: Sat, 27 Sep 2014 14:17:10 -0700 Subject: [Swift-user] question about xsede In-Reply-To: References: Message-ID: <1411852630.2517.1.camel@echo> Hi Justin, Is there a queue named "normal" on that machine (qstat -q should tell you)? Mihael On Sat, 2014-09-27 at 15:41 -0400, Justin bbt wrote: > So, I am using this config > > > > > {env.PWD}/../app > 1 > normal > 16 > 00:01:00 > 3600 > 100 > 100 > 2 > 1 > 1 > .320 > 10000 > 1 > TG-CCR134513 > /brashear/usrname > local > > > > but, it does not work and give this error : > > Swift 0.95 RC6 swift-r7900 cog-r3908 > RunID: run011 > Warning: The @ syntax for function invocation is deprecated > [Error] sites.xml, line 2, col 10: cvc-elt.1: Cannot find the declaration > of element 'config'. > Progress: Sat, 27 Sep 2014 15:35:52-0400 > > Could not submit job (qsub reported an exit code of 170). 
> qsub: Unknown queue MSG=cannot locate queue > > Execution failed: > Exception in simulate: > Arguments: [--timesteps, 1, --range, 100, --nvalues, 5] > Host: black > Directory: p4-run011/jobs/s/simulate-sdm041yl > exception @ swift-int-staging.k, line: 181 > Caused by: > exception @ swift-int-staging.k, line: 177 > Caused by: Block task failed: Error submitting block task > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: Could > not submit job (qsub reported an exit code of 170). > qsub: Unknown queue MSG=cannot locate queue > > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63) > at > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) > at > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61) > at > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:70) > Caused by: > org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could > not submit job (qsub reported an exit code of 170). > qsub: Unknown queue MSG=cannot locate queue > > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113) > at > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53) > ... 3 more > > > > also, in some configs I saw this. But I dont know what this is and what > values should I set to > > key="providerAttributes">pbs.aprun;pbs.mpp;depth=32 > > > On Tue, Sep 23, 2014 at 6:22 PM, Justin bbt wrote: > > > Thank you very much. > > > > I am actually using the Blacklight, which I guess is PBS based. > > So, should I use the Crays tutorial and setting ? 
> > http://swift-lang.org/tutorials/cray/tutorial.html > > > > On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari > > wrote: > > > >> Hi Justin, > >> > >> If you are using xsede Stampede regular nodes (non xeon phi), here is a > >> site configuration that has worked for me in the past, connecting over ssh > >> to slurm: > >> > >> > >> > >> > >> > >> > >> 1 > >> 1 > >> 7500 > >> 00:10:00 > >> 100 > >> 100 > >> normal > >> 1 > >> 1 > >> 1 > >> TG-EAR130015 > >> .3199 > >> 10000 > >> /tmp/{env.USER}/swift.work > >> > >> > >> > >> You will need to replace project id with yours. > >> > >> Thanks, > >> Ketan > >> > >> On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt wrote: > >> > >>> Hi > >>> > >>> If I want to use resources on the xsede > >>> https://www.xsede.org/overview > >>> which site config should I use ? > >>> > >>> > >>> _______________________________________________ > >>> Swift-user mailing list > >>> Swift-user at ci.uchicago.edu > >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > >>> > >> > >> > > > _______________________________________________ > Swift-user mailing list > Swift-user at ci.uchicago.edu > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user From justinbbt at gmail.com Sun Sep 28 18:03:19 2014 From: justinbbt at gmail.com (Justin bbt) Date: Sun, 28 Sep 2014 19:03:19 -0400 Subject: [Swift-user] question about xsede In-Reply-To: <1411852630.2517.1.camel@echo> References: <1411852630.2517.1.camel@echo> Message-ID: changing the queue to "batch", I get this Execution failed: Exception in simulate: Arguments: [--timesteps, 1, --range, 100, --nvalues, 5] Host: black Directory: p4-run026/jobs/5/simulate-58d6x2yl exception @ swift-int-staging.k, line: 181 Caused by: exception @ swift-int-staging.k, line: 177 Caused by: null Caused by: org.globus.cog.abstraction.impl.common.execution.JobException: Job failed with an exit code of 1 ???? 
On Sat, Sep 27, 2014 at 5:17 PM, Mihael Hategan wrote: > Hi Justin, > > Is there a queue named "normal" on that machine (qstat -q should tell > you)? > > Mihael > > On Sat, 2014-09-27 at 15:41 -0400, Justin bbt wrote: > > So, I am using this config > > > > > > > > > > key="PATHPREFIX">{env.PWD}/../app > > 1 > > normal > > 16 > > 00:01:00 > > 3600 > > 100 > > 100 > > 2 > > 1 > > 1 > > .320 > > 10000 > > 1 > > TG-CCR134513 > > /brashear/usrname > > local > > > > > > > > but, it does not work and give this error : > > > > Swift 0.95 RC6 swift-r7900 cog-r3908 > > RunID: run011 > > Warning: The @ syntax for function invocation is deprecated > > [Error] sites.xml, line 2, col 10: cvc-elt.1: Cannot find the declaration > > of element 'config'. > > Progress: Sat, 27 Sep 2014 15:35:52-0400 > > > > Could not submit job (qsub reported an exit code of 170). > > qsub: Unknown queue MSG=cannot locate queue > > > > Execution failed: > > Exception in simulate: > > Arguments: [--timesteps, 1, --range, 100, --nvalues, 5] > > Host: black > > Directory: p4-run011/jobs/s/simulate-sdm041yl > > exception @ swift-int-staging.k, line: 181 > > Caused by: > > exception @ swift-int-staging.k, line: 177 > > Caused by: Block task failed: Error submitting block task > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Could > > not submit job (qsub reported an exit code of 170). 
> > qsub: Unknown queue MSG=cannot locate queue > > > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63) > > at > > > org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45) > > at > > > org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61) > > at > > > org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:70) > > Caused by: > > org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could > > not submit job (qsub reported an exit code of 170). > > qsub: Unknown queue MSG=cannot locate queue > > > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113) > > at > > > org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53) > > ... 3 more > > > > > > > > also, in some configs I saw this. But I dont know what this is and what > > values should I set to > > > > > key="providerAttributes">pbs.aprun;pbs.mpp;depth=32 > > > > > > On Tue, Sep 23, 2014 at 6:22 PM, Justin bbt wrote: > > > > > Thank you very much. > > > > > > I am actually using the Blacklight, which I guess is PBS based. > > > So, should I use the Crays tutorial and setting ? 
> > > http://swift-lang.org/tutorials/cray/tutorial.html > > > > > > On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari > > > wrote: > > > > > >> Hi Justin, > > >> > > >> If you are using xsede Stampede regular nodes (non xeon phi), here is > a > > >> site configuration that has worked for me in the past, connecting > over ssh > > >> to slurm: > > >> > > >> > > >> > > >> > > >> > > >> > > >> 1 > > >> 1 > > >> 7500 > > >> 00:10:00 > > >> 100 > > >> 100 > > >> normal > > >> 1 > > >> 1 > > >> 1 > > >> TG-EAR130015 > > >> .3199 > > >> 10000 > > >> /tmp/{env.USER}/swift.work > > >> > > >> > > >> > > >> You will need to replace project id with yours. > > >> > > >> Thanks, > > >> Ketan > > >> > > >> On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt > wrote: > > >> > > >>> Hi > > >>> > > >>> If I want to use resources on the xsede > > >>> https://www.xsede.org/overview > > >>> which site config should I use ? > > >>> > > >>> > > >>> _______________________________________________ > > >>> Swift-user mailing list > > >>> Swift-user at ci.uchicago.edu > > >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > >>> > > >> > > >> > > > > > _______________________________________________ > > Swift-user mailing list > > Swift-user at ci.uchicago.edu > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From wilde at anl.gov Mon Sep 29 09:24:23 2014 From: wilde at anl.gov (Michael Wilde) Date: Mon, 29 Sep 2014 09:24:23 -0500 Subject: [Swift-user] question about xsede In-Reply-To: References: <1411852630.2517.1.camel@echo> Message-ID: <54296B97.6060506@anl.gov> Justin, in your sites entry below, this line looks suspect: /brashear/usrname That needs to be the name of a writable directory. 
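One way to check a candidate work directory before putting it in the sites file, sketched under the assumption of a POSIX shell on the machine where the jobs will run. The function name check_workdir is hypothetical; it only tests that the path exists and is writable by the current user:

```shell
#!/bin/sh
# Hypothetical helper: report whether a candidate work directory exists
# and is writable by the current user.
check_workdir() {
    dir=$1
    if [ -d "$dir" ] && [ -w "$dir" ]; then
        echo "ok: $dir"
    else
        echo "not writable: $dir"
        return 1
    fi
}

# Example call on a temporary directory; substitute the intended path.
check_workdir "${TMPDIR:-/tmp}" || echo "pick a different work directory"
```

Running it with the path from the sites entry in place of the temporary directory shows whether that path is usable as it stands.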
- Mike On 9/28/14, 6:03 PM, Justin bbt wrote: > changing the queue to "batch", I get this > Execution failed: > Exception in simulate: > Arguments: [--timesteps, 1, --range, 100, --nvalues, 5] > Host: black > Directory: p4-run026/jobs/5/simulate-58d6x2yl > exception @ swift-int-staging.k, line: 181 > Caused by: > exception @ swift-int-staging.k, line: 177 > Caused by: null > Caused by: > org.globus.cog.abstraction.impl.common.execution.JobException: Job > failed with an exit code of 1 > > ???? > > On Sat, Sep 27, 2014 at 5:17 PM, Mihael Hategan > wrote: > > Hi Justin, > > Is there a queue named "normal" on that machine (qstat -q should tell > you)? > > Mihael > > On Sat, 2014-09-27 at 15:41 -0400, Justin bbt wrote: > > So, I am using this config > > > > > > > > URL="none"/> > > key="PATHPREFIX">{env.PWD}/../app > > 1 > > normal > > 16 > > 00:01:00 > > 3600 > > key="lowOverAllocation">100 > > key="highOverAllocation">100 > > 2 > > 1 > > 1 > > .320 > > 10000 > > 1 > > TG-CCR134513 > > /brashear/usrname > > local > > > > > > > > but, it does not work and give this error : > > > > Swift 0.95 RC6 swift-r7900 cog-r3908 > > RunID: run011 > > Warning: The @ syntax for function invocation is deprecated > > [Error] sites.xml, line 2, col 10: cvc-elt.1: Cannot find the > declaration > > of element 'config'. > > Progress: Sat, 27 Sep 2014 15:35:52-0400 > > > > Could not submit job (qsub reported an exit code of 170). > > qsub: Unknown queue MSG=cannot locate queue > > > > Execution failed: > > Exception in simulate: > > Arguments: [--timesteps, 1, --range, 100, --nvalues, 5] > > Host: black > > Directory: p4-run011/jobs/s/simulate-sdm041yl > > exception @ swift-int-staging.k, line: 181 > > Caused by: > > exception @ swift-int-staging.k, line: 177 > > Caused by: Block task failed: Error submitting block task > > > org.globus.cog.abstraction.impl.common.task.TaskSubmissionException: > Could > > not submit job (qsub reported an exit code of 170). 
> > qsub: Unknown queue MSG=cannot locate queue
> >
> > at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:63)
> > at org.globus.cog.abstraction.impl.common.AbstractTaskHandler.submit(AbstractTaskHandler.java:45)
> > at org.globus.cog.abstraction.impl.common.task.ExecutionTaskHandler.submit(ExecutionTaskHandler.java:61)
> > at org.globus.cog.abstraction.coaster.service.job.manager.BlockTaskSubmitter.run(BlockTaskSubmitter.java:70)
> > Caused by: org.globus.cog.abstraction.impl.scheduler.common.ProcessException: Could
> > not submit job (qsub reported an exit code of 170).
> > qsub: Unknown queue MSG=cannot locate queue
> >
> > at org.globus.cog.abstraction.impl.scheduler.common.AbstractExecutor.start(AbstractExecutor.java:113)
> > at org.globus.cog.abstraction.impl.scheduler.common.AbstractJobSubmissionTaskHandler.submit(AbstractJobSubmissionTaskHandler.java:53)
> > ... 3 more
> >
> > also, in some configs I saw this. But I dont know what this is and what
> > values should I set to
> >
> > key="providerAttributes">pbs.aprun;pbs.mpp;depth=32
> >
> > On Tue, Sep 23, 2014 at 6:22 PM, Justin bbt wrote:
> >
> > > Thank you very much.
> > >
> > > I am actually using the Blacklight, which I guess is PBS based.
> > > So, should I use the Crays tutorial and setting ?
> > > http://swift-lang.org/tutorials/cray/tutorial.html
> > >
> > > On Tue, Sep 23, 2014 at 10:31 AM, Ketan Maheshwari wrote:
> > >
> > >> Hi Justin,
> > >>
> > >> If you are using xsede Stampede regular nodes (non xeon phi), here is a
> > >> site configuration that has worked for me in the past, connecting over ssh
> > >> to slurm:
> > >>
> > >> 1
> > >> 1
> > >> 7500
> > >> key="maxwalltime">00:10:00
> > >> key="lowOverallocation">100
> > >> key="highOverallocation">100
> > >> normal
> > >> 1
> > >> 1
> > >> 1
> > >> key="project">TG-EAR130015
> > >> .3199
> > >> key="initialScore">10000
> > >> /tmp/{env.USER}/swift.work
> > >>
> > >> You will need to replace project id with yours.
> > >>
> > >> Thanks,
> > >> Ketan
> > >>
> > >> On Mon, Sep 22, 2014 at 8:32 PM, Justin bbt wrote:
> > >>
> > >>> Hi
> > >>>
> > >>> If I want to use resources on the xsede
> > >>> https://www.xsede.org/overview
> > >>> which site config should I use ?
> > >>>
> > >>>
> > >>> _______________________________________________
> > >>> Swift-user mailing list
> > >>> Swift-user at ci.uchicago.edu
> > >>> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> > >>>
> > >>
> > >>
> > >
> > _______________________________________________
> > Swift-user mailing list
> > Swift-user at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user
> >
>
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-user

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From wilde at anl.gov Mon Sep 29 12:23:04 2014
From: wilde at anl.gov (Michael Wilde)
Date: Mon, 29 Sep 2014 12:23:04 -0500
Subject: [Swift-user] question about xsede
In-Reply-To: <54296B97.6060506@anl.gov>
References: <1411852630.2517.1.camel@echo> <54296B97.6060506@anl.gov>
Message-ID: <54299578.5000301@anl.gov>

Justin,

I see that Yadu provided tested configurations for running with Swift 0.94.1, 0.95RC7, and trunk, in his email to this list on 9/25 (pasted below).

He pointed you to this location for a sites.xml example for 0.94 and 0.95:

http://users.rcc.uchicago.edu/~yadunand/blacklight-sanity/0.94configs/sites.xml

The config he provided for 0.94.1 should also work for 0.95RC6, which I see you are using. (Note that we will be posting a 0.95 final release, or an RC7, by the end of this week.)

Based on the discussion so far, I think the following is a good base for a sites entry for running on Blacklight:

debug
.320
10000
16
900
00:10:00
16
/usr/users/8/yadunand/swiftwork

For the "queue" tag, use either debug or batch.

For the "workdirectory" tag, specify a fully qualified directory pathname that you can write to and create files in. Note that you might have a different $HOME "users/N" dir than Yadu's.

Please double-check the latest code at the link Yadu provided, to make sure I did not get anything wrong above.

Regards,

- Mike

-------- Original Message --------
Subject: Re: [Swift-user] question about xsede
Date: Thu, 25 Sep 2014 17:25:01 -0500
From: Yadu Nand
To: Michael Wilde
CC: Swift User

Hi Justin,

Here are some tested configs and a small README from running the sanity test on Blacklight:
http://users.rcc.uchicago.edu/~yadunand/blacklight-sanity/

There's an example each of configs for Swift 0.94, Swift 0.95 and the configs we would use going forward (Swift 0.96 and current trunk) in that folder.

In the example, I've used ppn=16 (or any multiple of 16), which seems to work as a substitute for ncpus.

Hope that helps!
-Yadu

On 9/29/14, 9:24 AM, Michael Wilde wrote:
> Justin, in your sites entry below, this line looks suspect:
>
> /brashear/usrname
>
> That needs to be the name of a writable directory.
>
> - Mike

--
Michael Wilde
Mathematics and Computer Science          Computation Institute
Argonne National Laboratory               The University of Chicago

-------------- next part --------------
An HTML attachment was scrubbed...
URL: