[Swift-user] Using swift with falkon on teraport

Michael Wilde wilde at mcs.anl.gov
Tue Mar 25 11:18:42 CDT 2008


Hi Quan,

I'm doing something similar at the moment on machines at Argonne.

Do you already have Falkon built? (I'm using the attached file of notes 
that I compiled from Ioan).

I run Swift and Falkon together on a host that has access to the cluster's
shared filesystem, which in your case would be tp-login (or, better yet, a
cluster node that you allocate using qsub -I, so as not to over-tax
a login host).

I use the local data provider, so that Swift uses direct
shared-filesystem access to move data back and forth and to manage
directories and status files.

My sites file is below, followed by my working doc of info from Ioan and
Zhao (also attached as a Word document).

- Mike

<pool handle="sico">
  <gridftp url="local://localhost"/>
  <execution provider="deef"
      url="http://140.221.37.30:50001/wsrf/services/GenericPortal/core/WS/GPFactoryService"/>
  <workdirectory>/home/wilde/swiftwork</workdirectory>
</pool>
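If you end up needing more than one such entry (for example, one for teraport with tp-login's IP in place of 140.221.37.30), a small shell function can stamp out the template above. This is only a sketch: the name falkon_pool is made up, and it does nothing more than substitute the pool handle, service IP, port, and work directory into the same XML. Paste its output inside your sites file's top-level element.

```bash
# Hypothetical helper (not part of Swift or Falkon): print a sites.xml
# <pool> entry that sends jobs to a Falkon service via the "deef"
# execution provider and uses the local data provider for file access.
# Usage: falkon_pool HANDLE SERVICE_IP PORT WORKDIR
falkon_pool() {
  if [ "$#" -ne 4 ]; then
    echo "usage: falkon_pool HANDLE SERVICE_IP PORT WORKDIR" >&2
    return 2
  fi
  # $1 = pool handle, $2 = Falkon service IP, $3 = service port,
  # $4 = Swift work directory on the shared filesystem.
  cat <<EOF
<pool handle="$1">
  <gridftp url="local://localhost"/>
  <execution provider="deef"
      url="http://$2:$3/wsrf/services/GenericPortal/core/WS/GPFactoryService"/>
  <workdirectory>$4</workdirectory>
</pool>
EOF
}
```

With placeholder values, falkon_pool teraport 10.0.0.5 50001 "$HOME/swiftwork" prints an entry for a service at 10.0.0.5:50001.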


Compiling Swift with Falkon support:

When you build Swift, add the -Dwith-provider-deef option:

cd ${FALKON_ROOT}/cog/modules/vdsk/
ant -Dwith-provider-deef redist

Security Note

BGexec supports no security. The BGexec workers connect back to the Falkon
service and get work from there; they don't have any server sockets. Someone
would have to hijack those connections and fake the service in order to
inject jobs into the workers. If the workers had server sockets listening on
some ports, it would be different, but they are simple clients that only
make outgoing connections to one specific IP: the service IP.

The Falkon service can run on the same box as Swift, behind a firewall with
only 3 ports open.

Java Needs

IA64 nodes require Java 1.4; versions up to 1.6 work.
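To check a node before running Falkon on it, a sketch like the following compares what java -version reports against the 1.4 to 1.6 range above. The function names are made up, and the check assumes the 1.x version strings used by those JVMs (e.g. "1.5.0_11").

```bash
# Hypothetical helper: succeed if a Java version string falls in the
# 1.4 to 1.6 range noted above, e.g. "1.4.2_13" or "1.6.0".
java_version_ok() {
  case $1 in
    1.4|1.4.*|1.5|1.5.*|1.6|1.6.*) return 0 ;;
    *) return 1 ;;
  esac
}

# Extract the quoted version from the first line of `java -version`,
# which the JVM prints on stderr, e.g.:  java version "1.5.0_11"
current_java_version() {
  java -version 2>&1 | sed -n '1s/.*"\(.*\)".*/\1/p'
}
```

On a node, java_version_ok "$(current_java_version)" || echo "unsupported Java" >&2 flags a JVM outside that range.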

Falkon Tarball

wget http://people.cs.uchicago.edu/~iraicu/source/falkon-r83.tgz
tar xfz falkon-r83.tgz
cd falkon-r83/
source falkon.env

If you want to rebuild (not needed for this tarball):
falkon-clean.sh
falkon-build.sh

Building Falkon

The SVN archive has grown rather large recently, and some of the
directories (e.g., workloads and AstroPortal) make up the largest part of
the contents. With its current organization, here is how you would do a
minimal checkout (~43MB; see Falkon User Guide, Section 2.1,
http://dev.globus.org/images/0/0e/Falkon_User_Guide_v2.pdf) and compile:

export ANT_HOME=/home/wilde/ant/dist

svn co https://svn.globus.org/repos/falkon -N
cd falkon
svn co https://svn.globus.org/repos/falkon/bin
source falkon.env
falkon-checkout-minimal.sh
source falkon.env
falkon-build.sh

This checkout takes 62 seconds for me, and the compile takes 43 seconds.
BTW, the entire tree (including all .svn dirs and compiled output) is 148MB
after a clean checkout and compilation.

Starting Falkon

On screen 1:

cd falkon-r83
source falkon.env
falkon-service-stdout.sh 50001 config/Falkon-TCPCore.config

On screen 2:

cd falkon-r83
source falkon.env
falkon-worker-stdout.sh localhost 50001

At this point you have the service and a worker running. To terminate the
worker, press any key and then Enter in the worker's screen.

BGexecs on SiCo:

The file /home/iraicu/java/svn/falkon/worker/ServiceName.txt tells each
BGexec where the service is running, so before starting the BGexecs you
need to update that file with the IP of the service.

Then, to start them:

cd ~iraicu/java/svn/falkon/worker
./run.drp-slurm.sh 6 60

This starts 6 BGexecs for 60 minutes.

You might need to copy over the BGexec source (1 file) and the starting
scripts (2 of them), and compile BGexec on the SiCo itself.
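Forgetting to update ServiceName.txt is an easy way to end up with workers that never connect, so a tiny helper can do it before each start. This sketch assumes the file holds nothing but the service IP; look at the existing file first, since it may also carry a port or a hostname. The function name is made up.

```bash
# Hypothetical helper: point the BGexecs at the Falkon service by
# rewriting ServiceName.txt. ASSUMPTION: the file contains only the
# service IP; check the real file's format before relying on this.
# Usage: set_falkon_service_ip FILE IP
set_falkon_service_ip() {
  local file=$1 ip=$2
  # Reject anything that is not shaped like a dotted-quad IPv4 address
  # (octet ranges are not checked).
  if ! printf '%s\n' "$ip" | grep -Eq '^([0-9]{1,3}\.){3}[0-9]{1,3}$'; then
    echo "not an IPv4 address: $ip" >&2
    return 1
  fi
  # Keep a copy of the previous setting, if there is one.
  if [ -f "$file" ]; then
    cp "$file" "$file.bak" || return 1
  fi
  printf '%s\n' "$ip" > "$file"
}
```

For the setup above, that would be set_falkon_service_ip /home/iraicu/java/svn/falkon/worker/ServiceName.txt 140.221.37.30, followed by the run.drp-slurm.sh command.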

Testing:

Create a 3rd screen:

cd falkon-r83
source falkon.env
falkon-client.sh 140.221.37.30 50001 workloads/sleep/sleep_1x10

The IP can also be localhost at this point.

Debugging and Logs

Here are the logs you need to capture when running in
debug mode:

cd ~/java/svn/falkon/config
cat Falkon-TCPCore.config

GenericPortalWS=falkon_task_submission_history.txt
GenericPortalWS_perf_per_sec=falkon_summary.txt
GenericPortalWS_taskPerf=falkon_task_perf.txt
GenericPortalWS_task=falkon_task_status.txt

When running in normal mode (when we know things work fine), we just need:

cd ~/java/svn/falkon/config
cat Falkon-TCPCore.config

GenericPortalWS_perf_per_sec=falkon_summary.txt
GenericPortalWS_taskPerf=falkon_task_perf.txt

In the event that we can't figure out things from the Swift and Falkon 
service logs, we might have to enable worker side logs as well, which 
you do from the run.worker-c.sh (or run.worker-c-ram.sh) script(s).
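When something goes wrong, it helps to grab all of the logs named in Falkon-TCPCore.config in one go before sending them along. The sketch below reads the four GenericPortalWS keys shown above and copies each named file into a bundle directory. The function name is made up, and it assumes the log files are written to a directory you pass in (for example, wherever you started the service); adjust if yours end up elsewhere.

```bash
# Hypothetical helper: copy the log files named by the GenericPortalWS*
# keys listed above from LOGDIR into DEST.
# ASSUMPTION: the logs live in LOGDIR (e.g. the service's start directory).
# Usage: collect_falkon_logs CONFIG LOGDIR DEST
collect_falkon_logs() {
  local config=$1 logdir=$2 dest=$3 key name
  mkdir -p "$dest" || return 1
  # Split each "key=value" line on '='; also handle a last line that
  # lacks a trailing newline.
  while IFS='=' read -r key name || [ -n "$key" ]; do
    case $key in
      GenericPortalWS|GenericPortalWS_perf_per_sec|GenericPortalWS_taskPerf|GenericPortalWS_task) ;;
      *) continue ;;  # not one of the log-file keys
    esac
    if [ -f "$logdir/$name" ]; then
      cp "$logdir/$name" "$dest/"
    else
      echo "missing: $logdir/$name" >&2
    fi
  done < "$config"
}
```

Something like collect_falkon_logs ~/java/svn/falkon/config/Falkon-TCPCore.config ~/falkon-run ~/falkon-logs would work in debug and normal mode alike, since it only copies the files the config actually names.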



On 3/25/08 11:01 AM, Quan Tran Pham wrote:
> Hi,
> 
> I just wonder if anyone has run Swift with Falkon on teraport? How do
> you config Swift (sites.xml (no sample on falkon), tc.data (no need to
> change?)). I find a link about Swift and Falkon here
> http://dev.globus.org/wiki/Incubator/Falkon#Project_Branches , but the
> link to the article has no content.
> 
> I am trying to: run falkon on tp-login, run swift on that same
> machine to submit jobs to falkon to run on teraport.
> 
> Thank you very much
> 
> Quan Pham
> _______________________________________________
> Swift-user mailing list
> Swift-user at ci.uchicago.edu
> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
> 
> 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Falkon.SiCo.FromIoan.2008.0311.doc
Type: application/msword
Size: 37376 bytes
Desc: not available
URL: <http://lists.mcs.anl.gov/pipermail/swift-user/attachments/20080325/79a9a9a6/attachment.doc>

