[Swift-user] swift with matlab
Park, Jun-Sang
parkjs at aps.anl.gov
Fri May 8 02:20:49 CDT 2015
Hello,
I am tinkering with the MATLAB + Swift capability and am running into problems.
As a trial, I am modifying the "hello world" problem (hwsq, the magic square problem): instead of passing a CSV input file path as in the vanilla version, I pass a MAT-file path as the input and load it inside the program.
The problem is that when I increase the size of the problem (the number of iterations in the foreach statement), it no longer works: smaller problems run, but bigger problems fail with the message below. Some jobs go through and some don't. Is it possible that there are simply too many accesses to the MAT-file, and Swift or the file system doesn't like that?
Anyway, if there is someone I can talk to about how to approach this, I'd appreciate it.
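One way to check the "too many links to one file" idea outside of Swift is sketched below. It assumes that the "link" in the error message behaves like an ordinary symbolic link made with `ln -s`, which is not confirmed here. The temporary directory layout and the `FAILED` marker file are made up for the test, and the count of 41 only mirrors `factors[] = [0:40]`.

```shell
#!/bin/bash
# Sketch: create many symlinks to one input file in parallel and count failures.
# Assumption: Swift's input "link" behaves like a plain `ln -s`.
# All paths are throwaway temp paths, not the ones Swift actually uses.
workdir=$(mktemp -d)
printf 'dummy' > "$workdir/mat.mat"

njobs=41   # same count as factors [0:40]
for i in $(seq 0 $((njobs - 1))); do
  (
    jobdir="$workdir/jobs/job-$i"
    mkdir -p "$jobdir"
    # Leave a marker file behind if the link cannot be created.
    ln -s "$workdir/mat.mat" "$jobdir/mat.mat" 2>/dev/null || touch "$jobdir/FAILED"
  ) &
done
wait   # block until every background subshell has finished

failed=$(find "$workdir/jobs" -name FAILED | wc -l | tr -d ' ')
linked=$(find "$workdir/jobs" -type l -name mat.mat | wc -l | tr -d ' ')
echo "linked=$linked failed=$failed"
rm -rf "$workdir"
```

If every link succeeds when this is run on the cluster file system, sheer link volume is unlikely to be the cause, and the relative `../mat.mat` path may be worth examining next.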
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Swift 0.94 swift-r6637 cog-r3742
RunID: 20150507-1148-9kb4by0e
Progress: time: Thu, 07 May 2015 11:48:18 -0500
Progress: time: Thu, 07 May 2015 11:48:21 -0500 Selecting site:25 Submitted:15 Active:1
Progress: time: Thu, 07 May 2015 11:48:35 -0500 Selecting site:25 Active:15 Checking status:1
Progress: time: Thu, 07 May 2015 11:48:36 -0500 Selecting site:18 Stage in:1 Active:14 Finished successfully:8
Progress: time: Thu, 07 May 2015 11:48:37 -0500 Selecting site:10 Stage in:1 Active:15 Finished successfully:15
Progress: time: Thu, 07 May 2015 11:48:48 -0500 Selecting site:9 Active:16 Finished successfully:16
Progress: time: Thu, 07 May 2015 11:48:49 -0500 Selecting site:9 Active:15 Checking status:1 Finished successfully:16
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0017.dat, 17]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/c/run_hwsq_matload-csn64g8m
Caused by: Failed to link input file ../mat.mat
Progress: time: Thu, 07 May 2015 11:48:50 -0500 Submitted:1 Active:14 Failed:2 Finished successfully:24
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0000.dat, 0]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/d/run_hwsq_matload-dsn64g8m
Caused by: Failed to link input file ../mat.mat
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0001.dat, 1]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/e/run_hwsq_matload-esn64g8m
Caused by: Failed to link input file ../mat.mat
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0011.dat, 11]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/f/run_hwsq_matload-fsn64g8m
Caused by: Failed to link input file ../mat.mat
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0028.dat, 28]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/g/run_hwsq_matload-gsn64g8m
Caused by: Failed to link input file ../mat.mat
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0005.dat, 5]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/h/run_hwsq_matload-hsn64g8m
Caused by: Failed to link input file ../mat.mat
Progress: time: Thu, 07 May 2015 11:48:52 -0500 Active:3 Checking status:1 Failed:6 Finished successfully:31
Progress: time: Thu, 07 May 2015 11:49:02 -0500 Active:2 Checking status:1 Failed:6 Finished successfully:32
Final status: Thu, 07 May 2015 11:49:02 -0500 Failed:6 Finished successfully:35
The following errors have occurred:
1. Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0017.dat, 17]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/c/run_hwsq_matload-csn64g8m
Caused by:
Failed to link input file ../mat.mat
2. Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0011.dat, 11]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/f/run_hwsq_matload-fsn64g8m
Caused by:
Failed to link input file ../mat.mat
3. Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0001.dat, 1]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/e/run_hwsq_matload-esn64g8m
Caused by:
Failed to link input file ../mat.mat
4. Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0000.dat, 0]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/d/run_hwsq_matload-dsn64g8m
Caused by:
Failed to link input file ../mat.mat
5. Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0028.dat, 28]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/g/run_hwsq_matload-gsn64g8m
Caused by:
Failed to link input file ../mat.mat
6. Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0005.dat, 5]
Host: cluster
Directory: hwsq_matload-20150507-1148-9kb4by0e/jobs/h/run_hwsq_matload-hsn64g8m
Caused by:
Failed to link input file ../mat.mat
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
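Since some jobs go through and some don't, it can help to list exactly which outputs failed. The sketch below assumes the console output has been saved to a file. To stay self-contained it writes a short excerpt of the log above to a temporary file instead. The variable names are illustrative only.

```shell
#!/bin/bash
# Sketch: list which output files belong to failed jobs in a saved Swift log.
# Assumption: "Arguments:" lines appear only inside exception reports,
# as they do in the log above.
log=$(mktemp)
cat > "$log" <<'EOF'
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0017.dat, 17]
Caused by: Failed to link input file ../mat.mat
Exception in run_hwsq_matload:
Arguments: [/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/, ../mat.mat, sqmat.0000.dat, 0]
Caused by: Failed to link input file ../mat.mat
EOF

# Extract the output file name from each Arguments line and drop duplicates
# (the final error summary repeats every exception once).
failed_outputs=$(grep '^Arguments:' "$log" | grep -o 'sqmat\.[0-9]*\.dat' | sort -u)
nfailed=$(printf '%s\n' "$failed_outputs" | wc -l | tr -d ' ')
echo "$failed_outputs"
echo "failed jobs: $nfailed"
rm -f "$log"
```

Comparing the failed indices with the job directories (`jobs/c`, `jobs/d`, ...) may show whether the failures cluster in any particular way.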
% run.sh script
#! /bin/bash
PATH=~wilde/swift/rev/swift-0.94.1/bin:$PATH
swift -sites.file sites.xml -tc.file tc -config cf hwsq_matload.swift
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% run_hwsq_matload.sh script (autogenerated by MATLAB; I edited it for my environment)
#!/bin/sh
# script for execution of deployed applications
#
# Sets up the MCR environment for the current $ARCH and executes
# the specified command.
#
exe_name=$0
exe_dir=`dirname "$0"`
echo "------------------------------------------"
if [ "x$1" = "x" ]; then
  echo Usage:
  echo $0 \<deployedMCRroot\> args
else
  echo Setting up environment variables
  MCRROOT="$1"
  echo ---
  LD_LIBRARY_PATH=.:${MCRROOT}/runtime/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/bin/glnxa64 ;
  LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${MCRROOT}/sys/os/glnxa64;
  # LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/runtime/glnxa64:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/bin/glnxa64:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/os/glnxa64:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/java/jre/glnxa64/jre/lib/amd64/native_threads:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/java/jre/glnxa64/jre/lib/amd64/server:/clhome/TOMO1/tomo/MATLAB_Compiler_Runtime/v82/sys/java/jre/glnxa64/jre/lib/amd64;
  XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  # XAPPLRESDIR=XAPPLRESDIR=${MCRROOT}/X11/app-defaults ;
  export XAPPLRESDIR;
  export LD_LIBRARY_PATH;
  echo LD_LIBRARY_PATH is ${LD_LIBRARY_PATH};
  shift 1
  args=
  while [ $# -gt 0 ]; do
    token=$1
    args="${args} \"${token}\""
    shift
  done
  eval "/clhome/TOMO1/tomo/park_swift/bin_hwsq_matload/hwsq_matload" $args
fi
exit
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% hwsq_matload.swift
type file;
app (file outdata) hwsq_matload (file indata, int factor)
{
  run_hwsq_matload "/clhome/TOMO1/tomo/MATLAB/MATLAB_Compiler_Runtime/v84/" @indata @outdata factor;
}
file degreeData<"../mat.mat">;
int factors[] = [0:40];
file squareMats[] <simple_mapper; prefix="sqmat.",suffix=".dat">;
foreach f, i in factors {
  squareMats[i] = hwsq_matload (degreeData, f);
}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% tc
cluster run_hwsq_matload /clhome/TOMO1/tomo/park_swift/bin_hwsq_matload/run_hwsq_matload.sh
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
% sites.xml
<config>
<pool handle="localhost">
<execution provider="local"/>
<filesystem provider="local"/>
<workdirectory>{env.HOME}/swiftwork</workdirectory>
</pool>
<pool handle="cluster">
<execution provider="coaster" jobmanager="local:sge"/>
<!-- Set partition and account here: -->
<profile namespace="globus" key="queue">sec1bigmem</profile>
<profile namespace="globus" key="pe">sec1_bigmem</profile>
<profile namespace="globus" key="ppn">32</profile>
<!-- <profile namespace="globus" key="project">pi-wilde</profile> -->
<!-- Set number of jobs and nodes per job here: -->
<profile namespace="globus" key="slots">1</profile>
<profile namespace="globus" key="maxnodes">1</profile>
<profile namespace="globus" key="nodegranularity">1</profile>
<profile namespace="globus" key="jobsPerNode">16</profile> <!-- apps per node! -->
<profile namespace="karajan" key="jobThrottle">.15</profile> <!-- eg .11 -> 12 -->
<!-- Set estimated app time (maxwalltime) and requested job time (maxtime) here: -->
<profile namespace="globus" key="maxWalltime">00:15:00</profile>
<profile namespace="globus" key="maxtime">1800</profile> <!-- in seconds! -->
<!-- Set data staging model and work dir here: -->
<filesystem provider="local"/>
<workdirectory>{env.HOME}/swiftwork</workdirectory>
<!-- Typically leave these constant: -->
<!-- <profile namespace="globus" key="slurm.exclusive">false</profile> -->
<profile namespace="globus" key="highOverAllocation">100</profile>
<profile namespace="globus" key="lowOverAllocation">100</profile>
<profile namespace="karajan" key="initialScore">10000</profile>
</pool>
</config>