[Swift-devel] Swift and BGP

skenny at uchicago.edu skenny at uchicago.edu
Wed Nov 18 16:36:25 CST 2009


>On Mon, 2009-10-26 at 11:56 -0500, Mihael Hategan wrote:
>> So here's how one would go with this on intrepid:
>> - determine the maximum number of workers (avg-exec-time * 100)
>> - set the nodeGranularity to 512 nodes, 4 workers per node. Also set
>> maxWorkers to 512 so that only 512 node blocks are requested. For some
>> reason 512 node partitions start almost instantly (even if you have 6 of
>> them) while 1024 node partitions you have to wait for.
>> - set the total number of blocks ("slots" parameter) to
>> no-of-workers/2048.
>> - set the jobThrottle to 2*no-of-workers/100
>> - make sure you also have foreach.max.threads set to 2*no-of-workers
>> (though that depends on the structure of the program).
>> - run on login6. There is no point in using the normal login machines
>> since they have a limit of 1024 file descriptors per process.
>> 

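If I'm following that recipe, then with an average task time of (say) 30 seconds the arithmetic would work out roughly like this; this is just my back-of-the-envelope sketch in bash, and the 30 s figure is an assumption on my part:

avg_exec_time=30                       # seconds per task (assumed)
workers=$(( avg_exec_time * 100 ))     # max number of workers -> 3000
slots=$(( workers / 2048 ))            # blocks of 512 nodes x 4 workers = 2048 workers each -> 1
job_throttle=$(( 2 * workers / 100 ))  # jobThrottle -> 60
max_threads=$(( 2 * workers ))         # foreach.max.threads -> 6000
echo "workers=$workers slots=$slots jobThrottle=$job_throttle foreach.max.threads=$max_threads"
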
So, am I correct in understanding that Swift can currently only run on login6 when running on Intrepid? I ask because I'm not able to get onto login6 at the moment, but I decided to try a 512-job workflow on login3 and got this:


Progress:  Submitted:56  Active:456
Server died: Too many open files
java.net.SocketException: Too many open files
        at java.net.PlainSocketImpl.accept(PlainSocketImpl.java:457)
        at java.net.ServerSocket.implAccept(ServerSocket.java:473)
        at java.net.ServerSocket.accept(ServerSocket.java:444)
        at org.globus.net.BaseServer.run(BaseServer.java:226)
        at java.lang.Thread.run(Thread.java:810)
Worker task failed:
Progress:  Active:511 Failed but can retry:1
Worker task failed:
Failed to transfer wrapper log from test-20091118-1622-lx3lzyx5/info/r on localhost
Execution failed:
        Failed to transfer wrapper log from test-20091118-1622-lx3lzyx5/info/t on localhost
Exception in RInvoke:
Arguments: [scripts/4reg_dummy.R, matrices/net1_gestspeech.cov, 440, 0.5, gestspeech, net1]
Host: localhost
Directory: test-20091118-1622-lx3lzyx5/jobs/r/RInvoke-rznt0njj
stderr.txt:

stdout.txt:

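For reference, the per-process descriptor limit Mihael mentioned can be checked (and, if the hard limit allows, raised) with plain bash on whichever login node you land on, before launching swift; nothing Swift-specific here:

ulimit -n        # soft limit on open file descriptors (1024 on the normal login nodes, per the above)
ulimit -Hn       # hard limit; the soft limit can only be raised up to this value
ulimit -n 4096   # try to raise the soft limit for this shell; fails if the hard limit is lower
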
Here's my swift.properties file:

skenny@login3.intrepid:~/swift_runs/exhaustive_sem> cat config/swift.properties
sites.file=/home/skenny/cnari/config/local_sites.xml
tc.file=/home/skenny/cnari/config/tc.data

lazy.errors=false
caching.algorithm=LRU
pgraph=false
pgraph.graph.options=splines="compound", rankdir="TB"
pgraph.node.options=color="seagreen", style="filled"
clustering.enabled=false
clustering.queue.delay=4
clustering.min.time=60
kickstart.enabled=maybe
kickstart.always.transfer=false
wrapperlog.always.transfer=false
throttle.submit=3
throttle.host.submit=8
throttle.score.job.factor=64
throttle.transfers=16
throttle.file.operations=16
sitedir.keep=true
execution.retries=0
replication.enabled=false
replication.min.queue.time=60
replication.limit=3
foreach.max.threads=2048

skenny@login3.intrepid:~/scratch/g/RInvoke-g8ot0njj> cat ~/cnari/config/local_sites.xml
<config>
  <pool handle="localhost">
    <filesystem provider="local"/>
    <execution provider="coaster" jobmanager="local:cobalt"/>
    <profile namespace="globus" key="slots">10</profile>
    <profile namespace="globus" key="nodeGranularity">512</profile>
    <profile namespace="globus" key="workersPerNode">4</profile>
    <profile namespace="globus" key="maxNodes">512</profile>
    <profile namespace="globus" key="project">HTCScienceApps</profile>
    <profile namespace="globus" key="kernelprofile">zeptoos</profile>
    <profile namespace="globus" key="maxtime">3000</profile>
    <profile namespace="globus" key="alcfbgpnat">true</profile>
    <profile namespace="karajan" key="initialScore">100000</profile>
    <workdirectory>/intrepid-fs0/users/skenny/scratch</workdirectory>
    <scratch>/home/skenny/scratch</scratch>
  </pool>
</config>

I guess I'm wondering if there's a configuration I can use that will allow me to run on login nodes other than login6?
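
The only workaround I can come up with myself (a pure guess on my part, untested) is to shrink the block so the total worker count stays under the 1024-descriptor cap, something along these lines in local_sites.xml:

    <profile namespace="globus" key="slots">1</profile>
    <profile namespace="globus" key="nodeGranularity">128</profile>
    <profile namespace="globus" key="maxNodes">128</profile>
    <profile namespace="globus" key="workersPerNode">4</profile>

That would be 128 nodes x 4 workers = 512 workers total, though it obviously gives up most of the machine.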

thnx
~sk



