[Swift-user] swift with matlab

Park, Jun-Sang parkjs at aps.anl.gov
Fri May 8 02:20:49 CDT 2015


Hello,

I am tinkering with the matlab + swift capability and am having problems.

As a trial, I modified the "hello world" example (hwsq, the magic square problem): instead of passing a csv input file path as in the vanilla example, I pass a mat-file path as the input and load it inside the application.

The problem appears when I increase the size of the problem (the number of iterations in the foreach statement): smaller runs work, but bigger runs fail with the messages below. Some jobs go through and some don't. Is it possible that too many jobs are hitting the same mat-file at once, and Swift or the file system doesn't like that?
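
One way to test that guess outside Swift would be to create many symlinks to a single file in parallel and count how many fail. The script below is only a sketch: the directory layout and the marker files are made up for the test, and it does not reproduce how Swift itself links inputs. By default it runs in the system temp directory; setting TMPDIR to a directory on the same file system as the Swift work directory should make the test more relevant.

```shell
#!/bin/sh
# Sketch only: link one shared file into many job directories at once
# and count failures. Everything lives in a throwaway temp directory.
njobs=41                       # [0:40] in the Swift script gives 41 iterations
work=$(mktemp -d)              # honours TMPDIR if it is set
target="$work/mat.mat"
: > "$target"                  # empty stand-in for the real mat-file
mkdir -p "$work/jobs"

i=0
while [ "$i" -lt "$njobs" ]; do
  (
    mkdir -p "$work/jobs/$i" &&
    ln -s "$target" "$work/jobs/$i/mat.mat" ||
    : > "$work/jobs/$i.failed"  # hypothetical marker for a failed link
  ) &
  i=$((i + 1))
done
wait                           # let all background links finish

linked=$(find "$work/jobs" -type l | wc -l | tr -d ' ')
failed=$(find "$work/jobs" -name '*.failed' | wc -l | tr -d ' ')
echo "linked=$linked failed=$failed"
rm -rf "$work"
```

If this never reports a failure even with a much larger njobs, the file system is probably not the limiting factor and the cause is more likely in how the input path is mapped or staged.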

Anyway, if there is someone I can talk to about this, or if we can discuss how to go about it, I'd appreciate it.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Swift 0.94 swift-r6637 cog-r3742

RunID: 20150507-1148-9kb4by0e
Progress:  time: Thu, 07 May 2015 11:48:18 -0500
Progress:  time: Thu, 07 May 2015 11:48:21 -0500  Selecting site:25  Submitted:15  Active:1
Progress:  time: Thu, 07 May 2015 11:48:35 -0500  Selecting site:25  Active:15  Checking status:1
Progress:  time: Thu, 07 May 2015 11:48:36 -0500  Selecting site:18  Stage in:1  Active:14  Finished successfully:8
Progress:  time: Thu, 07 May 2015 11:48:37 -0500  Selecting site:10  Stage in:1  Active:15  Finished successfully:15
Progress:  time: Thu, 07 May 2015 11:48:48 -0500  Selecting site:9  Active:16  Finished successfully:16
Progress:  time: Thu, 07 May 2015 11:48:49 -0500  Selecting site:9  Active:15  Checking status:1  Finished successfully:16
Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0017.dat, 17]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/c/run_hwsq_matload-csn64g8m
Caused by: Failed to link input file ../mat.mat

Progress:  time: Thu, 07 May 2015 11:48:50 -0500  Submitted:1  Active:14  Failed:2  Finished successfully:24
Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0000.dat, 0]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/d/run_hwsq_matload-dsn64g8m
Caused by: Failed to link input file ../mat.mat

Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0001.dat, 1]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/e/run_hwsq_matload-esn64g8m
Caused by: Failed to link input file ../mat.mat

Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0011.dat, 11]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/f/run_hwsq_matload-fsn64g8m
Caused by: Failed to link input file ../mat.mat

Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0028.dat, 28]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/g/run_hwsq_matload-gsn64g8m
Caused by: Failed to link input file ../mat.mat

Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0005.dat, 5]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/h/run_hwsq_matload-hsn64g8m
Caused by: Failed to link input file ../mat.mat

Progress:  time: Thu, 07 May 2015 11:48:52 -0500  Active:3  Checking status:1  Failed:6  Finished successfully:31
Progress:  time: Thu, 07 May 2015 11:49:02 -0500  Active:2  Checking status:1  Failed:6  Finished successfully:32
Final status: Thu, 07 May 2015 11:49:02 -0500  Failed:6  Finished successfully:35
The following errors have occurred:
1. Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0017.dat, 17]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/c/run_hwsq_matload-csn64g8m
Caused by:
        Failed to link input file ../mat.mat
2. Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0011.dat, 11]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/f/run_hwsq_matload-fsn64g8m
Caused by:
        Failed to link input file ../mat.mat
3. Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0001.dat, 1]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/e/run_hwsq_matload-esn64g8m
Caused by:
        Failed to link input file ../mat.mat
4. Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0000.dat, 0]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/d/run_hwsq_matload-dsn64g8m
Caused by:
        Failed to link input file ../mat.mat
5. Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0028.dat, 28]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/g/run_hwsq_matload-gsn64g8m
Caused by:
        Failed to link input file ../mat.mat
6. Exception in run_hwsq_matload:
    Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0005.dat, 5]
    Host: cluster
    Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/h/run_hwsq_matload-hsn64g8m
Caused by:
        Failed to link input file ../mat.mat
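
To cross-check the failed indices in the log against what is actually on disk, something like the following could be run after the workflow finishes. It is a sketch that assumes the outputs land in the directory passed as the argument and that the names follow the sqmat.NNNN.dat pattern seen in the log; the function name is made up.

```shell
#!/bin/sh
# List the expected outputs sqmat.0000.dat .. sqmat.0040.dat that are
# absent from a directory. The 4-digit padding is copied from the log.
missing_outputs() {
  dir=$1
  i=0
  while [ "$i" -le 40 ]; do
    name=$(printf 'sqmat.%04d.dat' "$i")
    [ -e "$dir/$name" ] || echo "$name"
    i=$((i + 1))
  done
}

# Usage: check the directory where swift was started.
missing_outputs .
```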
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% run.sh script
#! /bin/bash

PATH=~wilde/swift/rev/swift-0.94.1/bin:$PATH
swift -sites.file sites.xml -tc.file tc -config cf hwsq_matload.swift

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% run_hwsq_matload.sh script (autogenerated by MATLAB; I edited it for my environment)
#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes 
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`
echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo    $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
  # LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/runtime/glnxa64:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/bin/glnxa64:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/os/glnxa64:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/java/jre/glnxa64/jre/lib/amd64/server:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/java/jre/glnxa64/jre/lib/amd64;

  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  # XAPPLRESDIR=XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  
  export XAPPLRESDIR;
  export LD_LIBRARY_PATH;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  args=
  while [ $# -gt 0 ]; do
      token=$1
      args="${args} \"${token}\"" 
      shift
  done
  eval "/clhome/TOMO1/tomo/park_swift/bin_hwsq_matload/hwsq_matload" $args
fi
exit
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% hwsq_matload.swift
type file;

app (file outdata) hwsq_matload (file indata, int factor)
{
  run_hwsq_matload "/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/" @indata @outdata factor;
}

file degreeData<"../mat.mat">;

int factors[] = [0:40];
file squareMats[] <simple_mapper; prefix="sqmat.",suffix=".dat">;

foreach f, i in factors {
  squareMats[i] = hwsq_matload (degreeData, f);
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% tc
cluster run_hwsq_matload /clhome/TOMO1/tomo/park_swift/bin_hwsq_matload/run_hwsq_matload.sh

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% sites.xml
<config>

 <pool handle="localhost">
   <execution provider="local"/>
   <filesystem provider="local"/>
   <workdirectory>{env.HOME}/swiftwork</workdirectory>
 </pool>

 <pool handle="cluster">
   <execution provider="coaster" jobmanager="local:sge"/>

   <!-- Set partition and account here: -->
   <profile namespace="globus" key="queue">sec1bigmem</profile>
   <profile namespace="globus" key="pe">sec1_bigmem</profile>
   <profile namespace="globus" key="ppn">32</profile>
   <!-- <profile namespace="globus" key="project">pi-wilde</profile> -->

   <!-- Set number of jobs and nodes per job here: -->
   <profile namespace="globus" key="slots">1</profile>
   <profile namespace="globus" key="maxnodes">1</profile>
   <profile namespace="globus" key="nodegranularity">1</profile>
   <profile namespace="globus" key="jobsPerNode">16</profile> <!-- apps per node! -->
   <profile namespace="karajan" key="jobThrottle">.15</profile> <!-- eg .11 -> 12 -->

   <!-- Set estimated app time (maxwalltime) and requested job time (maxtime) here: -->
   <profile namespace="globus" key="maxWalltime">00:15:00</profile>
   <profile namespace="globus" key="maxtime">1800</profile>  <!-- in seconds! -->

   <!-- Set data staging model and work dir here: -->
   <filesystem provider="local"/>
   <workdirectory>{env.HOME}/swiftwork</workdirectory>

   <!-- Typically leave these constant: -->
   <!-- <profile namespace="globus" key="slurm.exclusive">false</profile> -->
   <profile namespace="globus" key="highOverAllocation">100</profile>
   <profile namespace="globus" key="lowOverAllocation">100</profile>
   <profile namespace="karajan" key="initialScore">10000</profile>
 </pool>

</config>
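
For reference, the ".11 -> 12" comment next to jobThrottle suggests the usual rule of thumb that the number of concurrent jobs is jobThrottle * 100 + 1. Assuming that rule holds for this Swift version, the value .15 above would allow 16 jobs at once, which matches jobsPerNode. A small helper (the name is made up) to do the arithmetic:

```shell
#!/bin/sh
# Assumed rule, taken from the ".11 -> 12" comment in sites.xml:
#   concurrent jobs = jobThrottle * 100 + 1
# Rounding guards against floating-point error in t * 100.
throttle_to_jobs() {
  awk -v t="$1" 'BEGIN { printf "%d\n", int(t * 100 + 0.5) + 1 }'
}

throttle_to_jobs .15   # prints 16
throttle_to_jobs .11   # prints 12
```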

