[Swift-user] queue problem?

Sheri Mickelson mickelso at mcs.anl.gov
Thu May 19 12:13:22 CDT 2011


I hit the queue problem again in my latest attempt (queuedsize > 0 but no job dequeued. Queued: {}).
Here's a summary of my attempts in the order I tried them:

-I used swift-0.92.1 on the mcs machines and everything worked fine.
-I used swift-0.92.1 on fusion and hit the queue failure (used coasters).
-I used Justin's trunk branch on fusion and got the "mapper.existing() returned a path [3] that it
cannot subsequently map" error.
-I used Mike's older version (/home/wilde/swift/rev/0.92/bin/swift) on fusion and it worked after I
fixed the path problem.
-I tried swift-0.92.1 again on fusion and got the queue error again. This used the same
code, directories, etc. as the run with Mike's version. I only changed the hard-coded
path that selects which version of swift to run.

I have the last attempt I tried in /fusion/gpfs/home/mickelso/amwg-swift/swift
Also, I piped the output to script.out.
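
A minimal sketch for checking a captured output file such as script.out for the queue error, so that runs with different swift versions can be compared quickly. The function name count_queue_errors and the sample text are illustrative assumptions; only the error message itself comes from the run output.

```python
# Message printed by the coaster service when the queue failure occurs.
QUEUE_ERROR = "queuedsize > 0 but no job dequeued"


def count_queue_errors(text):
    """Return how many lines of the captured output contain the queue error."""
    return sum(1 for line in text.splitlines() if QUEUE_ERROR in line)


# Illustrative sample; in practice the text would be read from the output file,
# e.g. open("script.out").read()
sample = """Progress: Selecting site:167 Stage in:1 Submitted:22 Active:2
queuedsize > 0 but no job dequeued. Queued: {}
java.lang.Throwable
queuedsize > 0 but no job dequeued. Queued: {}
Shutting down worker
"""

print(count_queue_errors(sample))
```

A count of zero for one swift version and a nonzero count for another, on the same code and directories, would point at the version rather than the scripts.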

-Sheri


Justin M Wozniak wrote:
> 
> Yeah, I was about to suggest that the file might not be there.  Let me 
> know what you find.
> 
> On Thu, 19 May 2011, Sheri Mickelson wrote:
> 
>> Hi Mike,
>>
>> I was originally running 0.92.1, but I got the "mapper.existing() 
>> returned a path [3] that it cannot subsequently map" error using 
>> Justin's trunk version.
>>
>> I went back to an older version of swift and I think I might have 
>> found what was causing the initial error (an error in one of my csh 
>> scripts that had the wrong path in it).  I'm still looking into it and
>> will let you know how it goes.
>>
>> Justin, the path to my working directory is 
>> /home/climate1/mickelso/amwg-swift/test-swift.
>>
>> -Sheri
>>
>> Michael Wilde wrote:
>>> Also, Sheri - are you using Swift 0.92.1?  This looks a bit like the 
>>> bug in 0.92 that was fixed in 0.92.1
>>>
>>> - Mike
>>>
>>> ----- Original Message -----
>>>> Is this a SwiftScript that ran successfully on the MCS machines but fails
>>>> on Fusion? If so, can you point me to the working directory for this run?
>>>> Justin
>>>>
>>>> On Mon, 16 May 2011, Sheri Mickelson wrote:
>>>>
>>>>> I'm seeing a different error now:
>>>>> mapper.existing() returned a path [3] that it cannot subsequently map
>>>>>
>>>>> It starts up, but dies shortly after that. I attached the log file.
>>>>>
>>>>> -Sheri
>>>>>
>>>>> Justin M Wozniak wrote:
>>>>>> That's probably a perms thing. I just reapplied the permissions;
>>>>>> please try again.
>>>>>>
>>>>>> On Mon, 16 May 2011, Sheri Mickelson wrote:
>>>>>>
>>>>>>> Hi Justin,
>>>>>>>
>>>>>>> I'm getting this error when swift tries to run:
>>>>>>>
>>>>>>> Exception in thread "main" java.lang.NoClassDefFoundError: org/griphyn/vdl/karajan/Loader
>>>>>>> Caused by: java.lang.ClassNotFoundException: org.griphyn.vdl.karajan.Loader
>>>>>>>     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
>>>>>>>     at java.security.AccessController.doPrivileged(Native Method)
>>>>>>>     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
>>>>>>>     at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
>>>>>>>     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
>>>>>>> Could not find the main class: org.griphyn.vdl.karajan.Loader. Program will exit.
>>>>>>>
>>>>>>> -Sheri
>>>>>>>
>>>>>>> Justin M Wozniak wrote:
>>>>>>>> Let's go with my trunk-based installation in the location below for now.
>>>>>>>> I tried testing this again over the weekend but did not get through the
>>>>>>>> queue. I have already set up the additional logging in this installation.
>>>>>>>>
>>>>>>>> /homes/wozniak/Public/cog/modules/swift/dist/swift-svn/bin/swift
>>>>>>>>
>>>>>>>>     Justin
>>>>>>>>
>>>>>>>> On Fri, 13 May 2011, Sheri Mickelson wrote:
>>>>>>>>
>>>>>>>>> Here's the log file.
>>>>>>>>> This is the first time I'm running this version of swift on fusion. I
>>>>>>>>> had done my development work with this swift version on an mcs compute
>>>>>>>>> machine.
>>>>>>>>>
>>>>>>>>> -Sheri
>>>>>>>>>
>>>>>>>>> Justin M Wozniak wrote:
>>>>>>>>>> Hello
>>>>>>>>>>     Can you send the log for this run?
>>>>>>>>>>     Is this a new issue that appeared after an update?
>>>>>>>>>>     Also, in any future runs regarding this issue, please add
>>>>>>>>>>
>>>>>>>>>> log4j.logger.org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor = DEBUG
>>>>>>>>>>
>>>>>>>>>> (one line) to your etc/log4j.properties file.
>>>>>>>>>>
>>>>>>>>>>     Thanks
>>>>>>>>>>     Justin
>>>>>>>>>>
>>>>>>>>>> On Fri, 13 May 2011, Sheri Mickelson wrote:
>>>>>>>>>>
>>>>>>>>>>> I'm running into a problem running swift version 0.92.1 on fusion with
>>>>>>>>>>> coasters.
>>>>>>>>>>> This is the error I'm seeing:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ----------------------------------------------------------------------------- 
>>>>>>>>>>>
>>>>>>>>>>> Progress: Selecting site:168 Submitted:23 Active:2
>>>>>>>>>>> Progress: Selecting site:168 Submitted:23 Active:1 Checking status:1
>>>>>>>>>>> Progress: Selecting site:167 Stage in:1 Submitted:22 Active:2 Finished successfully:1
>>>>>>>>>>> queuedsize > 0 but no job dequeued. Queued: {}
>>>>>>>>>>> java.lang.Throwable
>>>>>>>>>>>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
>>>>>>>>>>>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
>>>>>>>>>>>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
>>>>>>>>>>> queuedsize > 0 but no job dequeued. Queued: {}
>>>>>>>>>>> java.lang.Throwable
>>>>>>>>>>>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.requeueNonFitting(BlockQueueProcessor.java:252)
>>>>>>>>>>>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.updatePlan(BlockQueueProcessor.java:520)
>>>>>>>>>>>     at org.globus.cog.abstraction.coaster.service.job.manager.BlockQueueProcessor.run(BlockQueueProcessor.java:109)
>>>>>>>>>>> Shutting down worker
>>>>>>>>>>> Shutting down worker
>>>>>>>>>>> Shutting down worker
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ----------------------------------------------------------------------------- 
>>>>>>>>>>>
>>>>>>>>>>> And here's my sites file:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ----------------------------------------------------------------------------- 
>>>>>>>>>>>
>>>>>>>>>>> <config>
>>>>>>>>>>> <pool handle="fusion">
>>>>>>>>>>>  <execution jobmanager="local:pbs" provider="coaster" url="none"/>
>>>>>>>>>>>  <profile namespace="globus" key="maxtime">3600</profile>
>>>>>>>>>>>  <profile namespace="globus" key="workersPerNode">1</profile>
>>>>>>>>>>>  <profile namespace="globus" key="slots">1</profile>
>>>>>>>>>>>  <profile namespace="globus" key="nodeGranularity">4</profile>
>>>>>>>>>>>  <profile namespace="globus" key="maxNodes">2</profile>
>>>>>>>>>>>  <profile namespace="globus" key="queue">batch</profile>
>>>>>>>>>>>  <profile namespace="karajan" key="jobThrottle">0.23</profile>
>>>>>>>>>>>  <profile namespace="karajan" key="initialScore">10000</profile>
>>>>>>>>>>>  <profile namespace="globus" key="project">parvis</profile>
>>>>>>>>>>>  <profile namespace="globus" key="lowOverAllocation">100</profile>
>>>>>>>>>>>  <profile namespace="globus" key="highOverAllocation">100</profile>
>>>>>>>>>>>  <filesystem provider="local"/>
>>>>>>>>>>>  <workdirectory>/fusion/gpfs/home/mickelso/amwg-swift/swift/</workdirectory>
>>>>>>>>>>> </pool>
>>>>>>>>>>> </config>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> ----------------------------------------------------------------------------- 
>>>>>>>>>>>
>>>>>>>>>>> Do you know what might be causing this?
>>>>>>>>>>>
>>>>>>>>>>> Thanks, Sheri
>>>>>>>>>>> _______________________________________________
>>>>>>>>>>> Swift-user mailing list
>>>>>>>>>>> Swift-user at ci.uchicago.edu
>>>>>>>>>>> http://mail.ci.uchicago.edu/mailman/listinfo/swift-user
>>>>>>>>>>>
>>>> -- 
>>>> Justin M Wozniak
>>>
>>
> 


