[Swift-devel] Dynamic profiles and count

David Kelly davidk at ci.uchicago.edu
Wed Sep 19 13:01:25 CDT 2012


That seems to be working great. Thanks!

David

----- Original Message -----
> From: "Mihael Hategan" <hategan at mcs.anl.gov>
> To: "Justin M Wozniak" <wozniak at mcs.anl.gov>
> Cc: "swift-devel Devel" <swift-devel at ci.uchicago.edu>
> Sent: Tuesday, September 18, 2012 11:25:33 PM
> Subject: Re: [Swift-devel] Dynamic profiles and count
> David,
> 
> Can you try r5934?
> 
> Mihael
> 
> On Tue, 2012-09-18 at 20:42 -0700, Mihael Hategan wrote:
> > Ah, thanks. I can check and see what's happening.
> >
> > On Tue, 2012-09-18 at 21:40 -0500, Justin M Wozniak wrote:
> > > Just to clarify: this is based on changes already in svn regarding
> > > dynamic profiles. That's why the map is there.
> > >
> > > http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_dynamic_profiles
> > >
> > > David is trying to solve a specific issue with using dynamic profiles
> > > and "count": we need to know all about "count" to solve this.
> > >
> > > This is a paste of my recent email to David about dynamic profiles and
> > > issues with "count":
> > >
> > > Ok, the place to look is in the Swift repo: svn diff -r5206:5207
> > >
> > > Apparently, no changes were made to CoG.
> > >
> > > The dynamic profiles should be visible in the KML.
> > >
> > > From there, you really need to know your Karajan semantics. The
> > > attributes are known to execute2() and thus execute() and Swift's
> > > Execute.
> > >
> > > Now Java. You should be able to see the attributes in GridExec. They
> > > are applied to the Task's JobSpecification. These are good places to
> > > add some trace-level logging.
> > >
> > > Then, the Task is sent to PBSExecutor. You could add logging here too
> > > to make sure the attributes made it.
> > >
> > > One thing to consider is the possibility that the attribute is being
> > > overwritten by some other component. If you add logging to
> > > JobSpecification.setAttribute(), you might be able to find that.
> > >
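The logging suggestion above can be sketched as follows. This is a minimal illustration only: TracedSpec is a hypothetical stand-in written for this example, not the CoG JobSpecification class, and the real class may store and expose attributes differently. It assumes only that attributes are set by name, as the message describes.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a job specification; NOT the CoG JobSpecification class.
class TracedSpec {
    private final Map<String, Object> attributes = new LinkedHashMap<>();
    private final List<String> trace = new ArrayList<>();

    // Records every write and flags writes that replace an existing value.
    public void setAttribute(String name, Object value) {
        Object previous = attributes.put(name, value);
        if (previous == null) {
            trace.add("set " + name + "=" + value);
        } else {
            trace.add("overwrite " + name + ": " + previous + " -> " + value);
        }
    }

    public Object getAttribute(String name) {
        return attributes.get(name);
    }

    public List<String> getTrace() {
        return trace;
    }
}

public class AttributeTraceDemo {
    public static void main(String[] args) {
        TracedSpec spec = new TracedSpec();
        spec.setAttribute("count", 2); // e.g. the value from a dynamic profile
        spec.setAttribute("count", 1); // e.g. a later component replacing it
        for (String line : spec.getTrace()) {
            System.out.println(line);
        }
    }
}
```

If a second write to "count" shows up in such a trace, the component that issued it is the one to inspect.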
> > > On 9/18/2012 4:58 PM, Mihael Hategan wrote:
> > > > I'm not sure. The way I understand things, your change modifies the
> > > > way execute is invoked from:
> > > >
> > > > execute(executable, ..., count=n)
> > > >
> > > > to:
> > > >
> > > > execute(executable, ..., attributes=map(map:entry("count", n), ...))
> > > >
> > > > But the semantics are the same. The task object, in both cases, will
> > > > have an attribute named "count" equal to n.
> > > >
> > > > Can you send me the full diff of your changes?
> > > >
> > > > Mihael
> > > >
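The equivalence described above can be illustrated with a small Java sketch. The class and method names here (CountInvocationDemo, withNamedCount, withAttributeMap) are hypothetical and exist only for this example; they model the two invocation forms as plain maps and do not reproduce how Karajan or GridExec actually build a task.

```java
import java.util.HashMap;
import java.util.Map;

public class CountInvocationDemo {
    // Form 1: count passed as its own named argument.
    static Map<String, Object> withNamedCount(int n) {
        Map<String, Object> taskAttributes = new HashMap<>();
        taskAttributes.put("count", n);
        return taskAttributes;
    }

    // Form 2: count passed inside a general attributes map.
    static Map<String, Object> withAttributeMap(Map<String, Object> attributes) {
        return new HashMap<>(attributes);
    }

    public static void main(String[] args) {
        Map<String, Object> dynamic = new HashMap<>();
        dynamic.put("count", 2);
        // Both forms yield a task attribute named "count" with the same value.
        System.out.println(withNamedCount(2).equals(withAttributeMap(dynamic)));
    }
}
```

Under this model the two forms are indistinguishable once the task exists, so any difference in behavior would have to come from how the attribute is handled elsewhere.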
> > > > On Tue, 2012-09-18 at 13:33 -0500, David Kelly wrote:
> > > >> Hello,
> > > >>
> > > >> I have been working on a namd scaling test using Swift. I am
> > > >> using the plain PBS provider at the moment (no coasters). The
> > > >> swift script sets a minimum number of nodes, a maximum number
> > > >> of nodes, iterates through those values, then uses dynamic
> > > >> profiles to change the value of 'count' to modify the number of
> > > >> nodes to request. Here is the script:
> > > >>
> > > >> ---
> > > >> type file;
> > > >>
> > > > >> app (file out, file err) namd_wrapper (int numnodes, file psf_file,
> > > > >>                                         file pdb_file,
> > > > >>                                         file coord_restart_file,
> > > > >>                                         file velocity_restart_file,
> > > > >>                                         file system_restart_file)
> > > > >> {
> > > > >>     profile "count" = numnodes;
> > > > >>     namd_wrapper @psf_file @pdb_file @coord_restart_file
> > > > >>         @velocity_restart_file @system_restart_file
> > > > >>         stdout=@out stderr=@err;
> > > >> }
> > > >>
> > > >> # Range of nodes to test on
> > > >> int minNodes=1;
> > > >> int maxNodes=2;
> > > >> int delta=1;
> > > >>
> > > >> # Files
> > > >> file psf <"h0_solvion.psf">;
> > > >> file pdb <"h0_solvion.pdb">;
> > > >> file coord_restart <"h0_eq.0.restart.coor">;
> > > >> file velocity_restart <"h0_eq.0.restart.vel">;
> > > >> file system_restart <"h0_eq.0.restart.xsc">;
> > > >>
> > > >> foreach nodes in [minNodes:maxNodes:delta] {
> > > > >>     file output <single_file_mapper;
> > > > >>         file=@strcat("logs/scaling-", nodes, ".out.txt")>;
> > > > >>     file error <single_file_mapper;
> > > > >>         file=@strcat("logs/scaling-", nodes, ".err.txt")>;
> > > > >>     (output, error) = namd_wrapper(nodes, psf, pdb, coord_restart,
> > > > >>                                    velocity_restart, system_restart);
> > > >> }
> > > >> ---
> > > >>
> > > >> In sites.xml, I also set the jobtype to "single" so it doesn't
> > > >> start a worker on each node (namd uses MPI).
> > > >>
> > > >> The problem that I'm running into is that, as is, dynamic
> > > >> profiles do not seem to allow you to modify the value for
> > > >> count. I have a workaround for this which involves removing
> > > >> references to "count" in GridExec.java, Execute.java, and
> > > >> TCProfile.java. This works for me in terms of this script, and
> > > >> it works with a few other simple catsn type scripts I've
> > > >> tested. I just wanted to double check to make sure this
> > > >> wouldn't cause any other issues before committing. Here are the
> > > >> changes:
> > > >>
> > > >> Index:
> > > >> modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/GridExec.java
> > > >> ===================================================================
> > > >> ---
> > > >> modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/GridExec.java
> > > >> (revision 3472)
> > > >> +++
> > > >> modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/GridExec.java
> > > >> (working copy)
> > > >> @@ -56,7 +56,7 @@
> > > >>           public static final Arg A_STDIN = new
> > > >>           Arg.Optional("stdin");
> > > >>           public static final Arg A_PROVIDER = new
> > > >>           Arg.Optional("provider");
> > > >>           public static final Arg A_SECURITY_CONTEXT = new
> > > >>           Arg.Optional("securitycontext");
> > > >> - public static final Arg A_COUNT = new Arg.Optional("count");
> > > >> + // public static final Arg A_COUNT = new
> > > >> Arg.Optional("count");
> > > >>           public static final Arg A_HOST_COUNT = new
> > > >>           Arg.Optional("hostcount");
> > > >>           public static final Arg A_JOBTYPE = new
> > > >>           Arg.Optional("jobtype");
> > > >>           public static final Arg A_MAXTIME = new
> > > >>           Arg.Optional("maxtime");
> > > >> @@ -86,7 +86,8 @@
> > > >>           static {
> > > >>                   setArguments(GridExec.class, new Arg[] {
> > > >>                   A_EXECUTABLE, A_ARGS, A_ARGUMENTS, A_HOST,
> > > >>                                   A_STDOUT, A_STDERR,
> > > >>                                   A_STDOUTLOCATION,
> > > >>                                   A_STDERRLOCATION, A_STDIN,
> > > >>                                   A_PROVIDER,
> > > >> - A_COUNT, A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME,
> > > >> A_MAXCPUTIME,
> > > >> + // A_COUNT,
> > > >> + A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME,
> > > >> A_MAXCPUTIME,
> > > >>                                   A_ENVIRONMENT, A_QUEUE,
> > > >>                                   A_PROJECT, A_MINMEMORY,
> > > >>                                   A_MAXMEMORY, A_REDIRECT,
> > > >>                                   A_SECURITY_CONTEXT,
> > > >>                                   A_DIRECTORY, A_NATIVESPEC,
> > > >>                                   A_DELEGATION, A_ATTRIBUTES,
> > > >>                                   C_ENVIRONMENT,
> > > >>                                   A_FAIL_ON_JOB_ERROR, A_BATCH,
> > > >>                                   C_STAGEIN, C_STAGEOUT,
> > > >>                                   C_CLEANUP,
> > > >> @@ -346,7 +347,7 @@
> > > >>                   }
> > > >>           }
> > > >>
> > > >> - protected final static Arg[] MISC_ATTRS = new Arg[] {
> > > >> A_COUNT, A_HOST_COUNT, A_JOBTYPE,
> > > >> + protected final static Arg[] MISC_ATTRS = new Arg[] {
> > > >> A_HOST_COUNT, A_JOBTYPE,
> > > >>                           A_MAXTIME, A_MAXWALLTIME,
> > > >>                           A_MAXCPUTIME, A_QUEUE, A_PROJECT,
> > > >>                           A_MINMEMORY, A_MAXMEMORY };
> > > >>
> > > >>           protected void setMiscAttributes(JobSpecification js,
> > > >>           VariableStack stack)
> > > >>
> > > >> Index: src/org/griphyn/vdl/karajan/lib/TCProfile.java
> > > >> ===================================================================
> > > >> --- src/org/griphyn/vdl/karajan/lib/TCProfile.java (revision
> > > >> 5930)
> > > >> +++ src/org/griphyn/vdl/karajan/lib/TCProfile.java (working
> > > >> copy)
> > > >> @@ -63,7 +63,6 @@
> > > >>
> > > >>           static {
> > > >>                   PROFILE_T = new HashMap<String, Arg>();
> > > >> - PROFILE_T.put("count", GridExec.A_COUNT);
> > > >>                   PROFILE_T.put("jobtype", GridExec.A_JOBTYPE);
> > > >>                   PROFILE_T.put("maxcputime",
> > > >>                   GridExec.A_MAXCPUTIME);
> > > >>                   PROFILE_T.put("maxmemory",
> > > >>                   GridExec.A_MAXMEMORY);
> > > >>
> > > >> Index: src/org/griphyn/vdl/karajan/lib/Execute.java
> > > >> ===================================================================
> > > >> --- src/org/griphyn/vdl/karajan/lib/Execute.java (revision
> > > >> 5930)
> > > >> +++ src/org/griphyn/vdl/karajan/lib/Execute.java (working copy)
> > > >> @@ -47,7 +47,7 @@
> > > >>           static {
> > > >>                   setArguments(Execute.class, new Arg[] {
> > > >>                   A_EXECUTABLE, A_ARGS, A_ARGUMENTS, A_HOST,
> > > >>                                   A_STDOUT, A_STDERR,
> > > >>                                   A_STDOUTLOCATION,
> > > >>                                   A_STDERRLOCATION, A_STDIN,
> > > >>                                   A_PROVIDER,
> > > >> - A_COUNT, A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME,
> > > >> A_MAXCPUTIME,
> > > >> + A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME,
> > > >> A_MAXCPUTIME,
> > > >>                                   A_ENVIRONMENT, A_QUEUE,
> > > >>                                   A_PROJECT, A_MINMEMORY,
> > > >>                                   A_MAXMEMORY, A_REDIRECT,
> > > >>                                   A_SECURITY_CONTEXT,
> > > >>                                   A_DIRECTORY, A_NATIVESPEC,
> > > >>                                   A_DELEGATION, A_ATTRIBUTES,
> > > >>                                   C_ENVIRONMENT,
> > > >>                                   A_FAIL_ON_JOB_ERROR, A_BATCH,
> > > >>                                   A_REPLICATION_GROUP,
> > > >> _______________________________________________
> > > >> Swift-devel mailing list
> > > >> Swift-devel at ci.uchicago.edu
> > > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > > >
> > > > _______________________________________________
> > > > Swift-devel mailing list
> > > > Swift-devel at ci.uchicago.edu
> > > > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >
> > >
> >
> >
> > _______________________________________________
> > Swift-devel mailing list
> > Swift-devel at ci.uchicago.edu
> > https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> 
> 
> _______________________________________________
> Swift-devel mailing list
> Swift-devel at ci.uchicago.edu
> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
