[Swift-devel] Dynamic profiles and count

Mihael Hategan hategan at mcs.anl.gov
Tue Sep 18 23:25:33 CDT 2012


David,

Can you try r5934?

Mihael

On Tue, 2012-09-18 at 20:42 -0700, Mihael Hategan wrote:
> Ah, thanks. I can check and see what's happening.
> 
> On Tue, 2012-09-18 at 21:40 -0500, Justin M Wozniak wrote:
> > Just to clarify: this is based on changes already in svn regarding 
> > dynamic profiles.  That's why the map is there.
> > 
> > http://www.ci.uchicago.edu/swift/guides/trunk/userguide/userguide.html#_dynamic_profiles
> > 
> > David is trying to solve a specific issue with using dynamic profiles 
> > and "count": we need to know all about "count" to solve this.
> > 
> > This is a paste of my recent email to David about dynamic profiles and 
> > issues with "count":
> > 
> > Ok, the place to look is in the Swift repo: svn diff -r5206:5207
> > 
> > Apparently, no changes were made to CoG.
> > 
> > The dynamic profiles should be visible in the KML.
> > 
> > From there, you really need to know your Karajan semantics.  The 
> > attributes are known to execute2() and thus execute() and Swift's Execute.
> > 
> > Now Java.  You should be able to see the attributes in GridExec. They 
> > are applied to the Task's JobSpecification.  These are good places to add 
> > some trace-level logging.
> > 
> > Then, the Task is sent to PBSExecutor.  You could add logging here too 
> > to make sure the attributes made it.
> > 
> > One thing to consider is the possibility that the attribute is being 
> > overwritten by some other component.  If you add logging to 
> > JobSpecification.setAttribute(), you might be able to find that.
> > 
> > On 9/18/2012 4:58 PM, Mihael Hategan wrote:
> > > I'm not sure. The way I understand things, your change modifies the way
> > > execute is invoked from:
> > >
> > > execute(executable, ..., count=n)
> > >
> > > to:
> > >
> > > execute(executable, ..., attributes=map(map:entry("count", n), ...))
> > >
> > > But the semantics are the same. The task object, in both cases, will
> > > have an attribute named "count" equal to n.
> > >
> > > Can you send me the full diff of your changes?
> > >
> > > Mihael
> > >
> > > On Tue, 2012-09-18 at 13:33 -0500, David Kelly wrote:
> > >> Hello,
> > >>
> > >> I have been working on a namd scaling test using Swift. I am using the plain PBS provider at the moment (no coasters). The Swift script sets a minimum and a maximum number of nodes, iterates over that range, and uses dynamic profiles to set 'count' to the number of nodes to request. Here is the script:
> > >>
> > >> ---
> > >> type file;
> > >>
> > >> app (file out, file err) namd_wrapper (int numnodes, file psf_file, file pdb_file, file coord_restart_file,
> > >>                                         file velocity_restart_file, file system_restart_file)
> > >> {
> > >>     profile "count" = numnodes;
> > >>     namd_wrapper @psf_file @pdb_file @coord_restart_file @velocity_restart_file @system_restart_file stdout=@out stderr=@err;
> > >> }
> > >>
> > >> # Range of nodes to test on
> > >> int minNodes=1;
> > >> int maxNodes=2;
> > >> int delta=1;
> > >>
> > >> # Files
> > >> file psf <"h0_solvion.psf">;
> > >> file pdb <"h0_solvion.pdb">;
> > >> file coord_restart <"h0_eq.0.restart.coor">;
> > >> file velocity_restart <"h0_eq.0.restart.vel">;
> > >> file system_restart <"h0_eq.0.restart.xsc">;
> > >>
> > >> foreach nodes in [minNodes:maxNodes:delta] {
> > >>     file output <single_file_mapper; file=@strcat("logs/scaling-", nodes, ".out.txt")>;
> > >>     file error <single_file_mapper; file=@strcat("logs/scaling-", nodes, ".err.txt")>;
> > >>     (output, error) = namd_wrapper(nodes, psf, pdb, coord_restart, velocity_restart, system_restart);
> > >> }
> > >> ---
> > >>
> > >> In sites.xml, I also set the jobtype to "single" so it doesn't start a worker on each node (namd uses MPI).
> > >>
> > >> The problem I'm running into is that, as is, dynamic profiles do not seem to let you modify the value of "count". I have a workaround that removes the references to "count" in GridExec.java, Execute.java, and TCProfile.java. It works for this script and for a few other simple catsn-type scripts I've tested. I just wanted to double-check that this wouldn't cause any other issues before committing. Here are the changes:
> > >>
> > >> Index: modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/GridExec.java
> > >> ===================================================================
> > >> --- modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/GridExec.java        (revision 3472)
> > >> +++ modules/karajan/src/org/globus/cog/karajan/workflow/nodes/grid/GridExec.java        (working copy)
> > >> @@ -56,7 +56,7 @@
> > >>           public static final Arg A_STDIN = new Arg.Optional("stdin");
> > >>           public static final Arg A_PROVIDER = new Arg.Optional("provider");
> > >>           public static final Arg A_SECURITY_CONTEXT = new Arg.Optional("securitycontext");
> > >> -        public static final Arg A_COUNT = new Arg.Optional("count");
> > >> +        // public static final Arg A_COUNT = new Arg.Optional("count");
> > >>           public static final Arg A_HOST_COUNT = new Arg.Optional("hostcount");
> > >>           public static final Arg A_JOBTYPE = new Arg.Optional("jobtype");
> > >>           public static final Arg A_MAXTIME = new Arg.Optional("maxtime");
> > >> @@ -86,7 +86,8 @@
> > >>           static {
> > >>                   setArguments(GridExec.class, new Arg[] { A_EXECUTABLE, A_ARGS, A_ARGUMENTS, A_HOST,
> > >>                                   A_STDOUT, A_STDERR, A_STDOUTLOCATION, A_STDERRLOCATION, A_STDIN, A_PROVIDER,
> > >> -                                A_COUNT, A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME, A_MAXCPUTIME,
> > >> +                                // A_COUNT,
> > >> +                                A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME, A_MAXCPUTIME,
> > >>                                   A_ENVIRONMENT, A_QUEUE, A_PROJECT, A_MINMEMORY, A_MAXMEMORY, A_REDIRECT,
> > >>                                   A_SECURITY_CONTEXT, A_DIRECTORY, A_NATIVESPEC, A_DELEGATION, A_ATTRIBUTES,
> > >>                                   C_ENVIRONMENT, A_FAIL_ON_JOB_ERROR, A_BATCH, C_STAGEIN, C_STAGEOUT, C_CLEANUP,
> > >> @@ -346,7 +347,7 @@
> > >>                   }
> > >>           }
> > >>   
> > >> -        protected final static Arg[] MISC_ATTRS = new Arg[] { A_COUNT, A_HOST_COUNT, A_JOBTYPE,
> > >> +        protected final static Arg[] MISC_ATTRS = new Arg[] { A_HOST_COUNT, A_JOBTYPE,
> > >>                           A_MAXTIME, A_MAXWALLTIME, A_MAXCPUTIME, A_QUEUE, A_PROJECT, A_MINMEMORY, A_MAXMEMORY };
> > >>   
> > >>           protected void setMiscAttributes(JobSpecification js, VariableStack stack)
> > >>
> > >> Index: src/org/griphyn/vdl/karajan/lib/TCProfile.java
> > >> ===================================================================
> > >> --- src/org/griphyn/vdl/karajan/lib/TCProfile.java        (revision 5930)
> > >> +++ src/org/griphyn/vdl/karajan/lib/TCProfile.java        (working copy)
> > >> @@ -63,7 +63,6 @@
> > >>   
> > >>           static {
> > >>                   PROFILE_T = new HashMap<String, Arg>();
> > >> -                PROFILE_T.put("count", GridExec.A_COUNT);
> > >>                   PROFILE_T.put("jobtype", GridExec.A_JOBTYPE);
> > >>                   PROFILE_T.put("maxcputime", GridExec.A_MAXCPUTIME);
> > >>                   PROFILE_T.put("maxmemory", GridExec.A_MAXMEMORY);
> > >>
> > >> Index: src/org/griphyn/vdl/karajan/lib/Execute.java
> > >> ===================================================================
> > >> --- src/org/griphyn/vdl/karajan/lib/Execute.java        (revision 5930)
> > >> +++ src/org/griphyn/vdl/karajan/lib/Execute.java        (working copy)
> > >> @@ -47,7 +47,7 @@
> > >>           static {
> > >>                   setArguments(Execute.class, new Arg[] { A_EXECUTABLE, A_ARGS, A_ARGUMENTS, A_HOST,
> > >>                                   A_STDOUT, A_STDERR, A_STDOUTLOCATION, A_STDERRLOCATION, A_STDIN, A_PROVIDER,
> > >> -                                A_COUNT, A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME, A_MAXCPUTIME,
> > >> +                                A_HOST_COUNT, A_JOBTYPE, A_MAXTIME, A_MAXWALLTIME, A_MAXCPUTIME,
> > >>                                   A_ENVIRONMENT, A_QUEUE, A_PROJECT, A_MINMEMORY, A_MAXMEMORY, A_REDIRECT,
> > >>                                   A_SECURITY_CONTEXT, A_DIRECTORY, A_NATIVESPEC, A_DELEGATION, A_ATTRIBUTES,
> > >>                                   C_ENVIRONMENT, A_FAIL_ON_JOB_ERROR, A_BATCH, A_REPLICATION_GROUP,
> > >> _______________________________________________
> > >> Swift-devel mailing list
> > >> Swift-devel at ci.uchicago.edu
> > >> https://lists.ci.uchicago.edu/cgi-bin/mailman/listinfo/swift-devel
> > >
> > 
> > 
> 
> 
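For readers following the thread above, here is a minimal stand-alone sketch of the two points under discussion: that passing count=n directly and passing it through an attributes map should leave the same "count" attribute on the job, and that logging inside setAttribute() can reveal a later overwrite. SketchJobSpec and CountAttributeSketch are hypothetical names invented for this illustration; they are not the CoG JobSpecification or Swift's Execute, and the overwrite shown is only an assumed failure mode, not a confirmed diagnosis.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical stand-in for a job specification. This is NOT the CoG
// JobSpecification class; it only mimics a name -> value attribute store.
class SketchJobSpec {
    private static final Logger LOG = Logger.getLogger(SketchJobSpec.class.getName());
    private final Map<String, Object> attributes = new LinkedHashMap<>();
    private int overwrites = 0;

    void setAttribute(String name, Object value) {
        Object previous = attributes.put(name, value);
        if (previous != null && !previous.equals(value)) {
            overwrites++;
            // Attaching a Throwable records the call stack, which shows
            // which component replaced the earlier value.
            LOG.log(Level.FINEST,
                    "attribute '" + name + "' overwritten: " + previous + " -> " + value,
                    new Throwable("setAttribute caller"));
        }
    }

    Object getAttribute(String name) {
        return attributes.get(name);
    }

    int getOverwriteCount() {
        return overwrites;
    }
}

class CountAttributeSketch {
    // Mimics execute(executable, ..., count=n)
    static SketchJobSpec viaNamedArgument(int n) {
        SketchJobSpec js = new SketchJobSpec();
        js.setAttribute("count", n);
        return js;
    }

    // Mimics execute(executable, ..., attributes=map(map:entry("count", n), ...))
    static SketchJobSpec viaAttributeMap(Map<String, Object> attrs) {
        SketchJobSpec js = new SketchJobSpec();
        for (Map.Entry<String, Object> e : attrs.entrySet()) {
            js.setAttribute(e.getKey(), e.getValue());
        }
        return js;
    }

    public static void main(String[] args) {
        Map<String, Object> attrs = new LinkedHashMap<>();
        attrs.put("count", 2);

        SketchJobSpec a = viaNamedArgument(2);
        SketchJobSpec b = viaAttributeMap(attrs);
        System.out.println("same count: "
                + a.getAttribute("count").equals(b.getAttribute("count")));

        // Assumed failure mode: a second writer sets "count" after the map.
        b.setAttribute("count", 1);
        System.out.println("count=" + b.getAttribute("count")
                + " overwrites=" + b.getOverwriteCount());
    }
}
```

Running main prints "same count: true" followed by "count=1 overwrites=1". The FINEST-level message stays hidden unless the logger and its handler are configured to show that level, which is the usual trade-off with trace logging.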




