[AG-TECH] Re: [AG-L] Latest kernels (Fedora) causing issues with the AG Software

Thomas D. Uram turam at mcs.anl.gov
Mon Jun 23 22:04:35 CDT 2008


It's a problem with the current SVN code that's included in the 3.2 
pre-beta packages.

The problem is due to services being optionally loaded inline rather 
than run as separate processes.  This is controlled by an inlineClass 
option in the .svc file.  Your fix of using a random string in the 
startupFile name is very similar to the approach that I had in mind, and 
that I thought was already in place.  I'm puzzling over that at the 
moment...
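
To illustrate why inline loading matters here, a minimal sketch (not the 
AG code; pid_in_child is a hypothetical helper, and the two lists just 
stand in for two services): anything running inline in one process sees 
that process's pid, while a separately spawned process reports its own.

```python
import os
import subprocess
import sys

def pid_in_child():
    """Start a separate Python process and return the pid it reports."""
    out = subprocess.check_output(
        [sys.executable, "-c", "import os; print(os.getpid())"])
    return int(out.decode().strip())

# Two "services" loaded inline both see the pid of the hosting process.
inline_pids = [os.getpid(), os.getpid()]

# Two "services" run as separate processes each report their own pid.
child_pids = [pid_in_child(), pid_in_child()]

print("inline:", inline_pids)
print("separate:", child_pids)
```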

I'd suggest you commit the fix using getrandbits and cut new packages.  
We may change it later, but getrandbits will solve the problem for now.
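
Roughly, the difference looks like this (a sketch only; the function 
names and the "startup_" / ".txt" pieces are made up for illustration and 
are not the names used in VideoService.py):

```python
import os
from random import getrandbits

def pid_based_name(prefix="startup_", suffix=".txt"):
    # Same result for every caller inside one process.
    return "%s%d%s" % (prefix, os.getpid(), suffix)

def random_name(prefix="startup_", suffix=".txt"):
    # getrandbits(16) returns an integer in the range 0..65535.
    return "%s%d%s" % (prefix, getrandbits(16), suffix)

print(pid_based_name(), pid_based_name())  # identical names
print(random_name(), random_name())        # very likely different names
```

With only 16 bits a collision between two services is unlikely but not 
impossible, which is one reason to revisit this later; Python's 
tempfile.mkstemp() would be one way to get a name that is guaranteed 
not to clash with an existing file.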

Tom


On 6/23/08 9:51 PM, Christoph Willing wrote:
>
> On 24/06/2008, at 9:27 AM, Christoph Willing wrote:
>
>>
>> On 23/06/2008, at 11:58 AM, Jason Bell wrote:
>>
>>> Colleagues
>>>
>>> This is just a quick email to warn about issues with updating
>>> to the latest kernels in Fedora.
>>>
>>> It has been found that kernels 2.6.25.6-27.fc8 and 2.6.25.4-10.fc8
>>> (Fedora 8) cause multiple instances of vic (producer and service)
>>> to lock up.  Basically, only one of the streams will transmit,
>>> whilst the others will lock up.
>>
>>
>> This isn't just a Fedora thing, I just had the same problem with 
>> another distro upgraded to 2.6.25.8 (although no vic lock ups in this 
>> case). Rather, it's a kernel version (2.6.25) issue - in particular 
>> its interaction with Python.
>
>
> More info - I can replicate this bug with a clean install of 3.2 with 
> earlier kernel versions too. This means the problem is not a Linux 
> kernel version issue and, by extension, won't be limited to Linux but 
> will affect all platforms.
>
>
> chris
>
>
>> For those in a hurry, here's a short fix until something more elegant 
>> is devised:
>> 1. edit your 
>> ~/.AccessGrid/local_services/VideoService/VideoService.py (or 
>> VideoProducerService/VideoProducerService.py) to include a new import:
>>     from random import getrandbits
>> 2. then search for "os.getpid()" and replace it with "getrandbits(16)"
>> 3. now restart your VenueClient
>>
>>
>> For those interested in the background to the problem, each time 
>> VideoService.py is run (i.e. for each video device) an individual 
>> temporary startup file is created containing various startup 
>> parameters including the video device name (/dev/video0 etc.). Each 
>> of these startup files should have a unique name, normally 
>> constructed using the process id of the current (VideoService.py) 
>> process using python's os.getpid() function. Previously this returned 
>> a new pid for each service, but for some reason it now (with kernel 
>> 2.6.25) returns the process id of the VenueClient itself. This means 
>> that each of the startup files now has the same name (in fact each 
>> new one overwrites the previous one) resulting in all vics being 
>> started with the same startup file. Each vic then tries to open the 
>> single video device named in that startup file. Only the first vic to 
>> run succeeds in grabbing the video device, the others trying to grab 
>> it just cause problems.
>>
>> Replacing os.getpid() with getrandbits() ensures that each of the vic 
>> startup files has a unique name so that all vics can run as configured.
>>
>>
>> chris
>>
>>
>>
>>> An interesting side effect is that you cannot manually "kill" the
>>> locked up "vic" process, though xkill seems to work.
>>>
>>> Additionally, if you add each stream manually (using configure node
>>> service), rather than loading from a configuration, this appears to
>>> work fine.
>>>
>>> Will keep you posted when thorough testing has been conducted and any
>>> possible resolutions found.
>>>
>>> Therefore, for now, I recommend not updating your kernel and/or
>>> nvidia / xorg drivers.
>>>
>>> We are also investigating whether there are any issues with other Linux
>>> variants, but it appears that most other Linux distributions still only
>>> run a 2.6.24.xxx kernel, rather than 2.6.25.xxx, which appears to be the
>>> problematic version.
>>>
>>> Thanks for your time,
>>> Jason.
>>>
>>> --------------------------------------------
>>> Jason Bell, B.I.T. (Honours)
>>>
>>> Research Systems Support Officer
>>> Information Technology Division
>>> Central Queensland University
>>>
>>> Australian Research Collaboration Service
>>> http://www.arcs.org.au/
>>>
>>> E-mail : j.bell at cqu.edu.au
>>>          jason.bell at arcs.org.au
>>> Work   : +61 7 4930 9229
>>> Mobile : 0409 630897
>>> Postal : Building 19
>>>          Central Queensland University
>>>          Bruce Highway
>>>          Rockhampton, Queensland, Australia, 4702
>>> --------------------------------------------
>>> Patience is a virtue.
>>>
>>> But if I wanted Patience,
>>> I would have become a Doctor.
>>> --------------------------------------------
>>>
>>>
>>> _______________________________________________
>>> accessgrid-l mailing list
>>> accessgrid-l at lists.aarnet.edu.au
>>> http://lists.aarnet.edu.au/mailman/listinfo/accessgrid-l
>>
>> Christoph Willing                        +617 3365 8350
>> QCIF Access Grid Manager
>> University of Queensland
>>
