Stabilizing Text service

Robert Olson olson at mcs.anl.gov
Mon Jul 14 15:46:17 CDT 2003


I think the current fd problems -- this morning's crash was an fd table 
overflow:

(gdb) where
#0  0x402f02f2 in globus_l_io_table_add (handle=0xad4c4c0) at globus_io_core.c:505
#1  0x402f0487 in globus_i_io_register_read_func (handle=0xad4c4c0, callback_func=0x402fbe64 <globus_l_io_read_auth_token>,
     callback_arg=0xacd0578, arg_destructor=0, register_select=1) at globus_io_core.c:627
#2  0x402f9251 in globus_i_io_securesocket_register_accept (handle=0xad4c4c0,
     callback_func=0x402efec0 <globus_i_io_accept_callback>, callback_arg=0xa9fafe8) at globus_io_securesocket.c:391
#3  0x402ff217 in globus_io_tcp_register_accept (listener_handle=0x83bdc78, attr=0x83ad668, new_handle=0xad4c4c0,
     callback=0x402efc94 <globus_i_io_monitor_callback>, callback_arg=0xbe3fecec) at globus_io_tcp.c:925
#4  0x402ff3fd in globus_io_tcp_accept (listener_handle=0x83bdc78, attr=0x83ad668, handle=0xad4c4c0) at globus_io_tcp.c:1028
#5  0x402d43ea in tcp_accept (listenerHandle=0x83bdc78, attr=0x83ad668) at src/io_wrap.c:2371
#6  0x402dc31c in _wrap_tcp_accept (self=0x0, args=0x8e049f4) at src/io_wrap.c:4188
#7  0x080cb709 in PyCFunction_Call ()
[snip]
(gdb) fr 0
#0  0x402f02f2 in globus_l_io_table_add (handle=0xad4c4c0) at globus_io_core.c:505
505             globus_l_io_fd_table[handle->fd]->handle = handle;
(gdb) list
500          *                  ("globus_l_io_table_add()\n"));
501          */
502
503         if (globus_l_io_fd_table[handle->fd])
504         {
505             globus_l_io_fd_table[handle->fd]->handle = handle;
506
507             goto fn_exit;
508         }
509         select_info = (globus_io_select_info_t *)
(gdb) p handle->fd
$3 = 257
(gdb) p globus_l_io_fd_tablesize
$4 = 256

with the TVS are due to the text service not yet having had the sanity fixes 
applied to it that the event service has. Note all the CLOSE_WAIT sockets 
lingering from the text service:


[root@vv2 bin]# /usr/sbin/lsof -i tcp:9006 -a -p 29581 |sort +7
python2 29581   ag   11u  IPv4 184475379       TCP *:9006 (LISTEN)
COMMAND   PID USER   FD   TYPE    DEVICE SIZE NODE NAME
python2 29581   ag   30u  IPv4 184546394       TCP vv2.mcs.anl.gov:9006->131.193.77.118:1459 (CLOSE_WAIT)
python2 29581   ag   35u  IPv4 184546508       TCP vv2.mcs.anl.gov:9006->131.193.77.118:1499 (CLOSE_WAIT)
python2 29581   ag   36u  IPv4 184546521       TCP vv2.mcs.anl.gov:9006->131.193.77.118:1538 (CLOSE_WAIT)
python2 29581   ag   50u  IPv4 184583926       TCP vv2.mcs.anl.gov:9006->131.193.77.118:1774 (ESTABLISHED)
python2 29581   ag   41u  IPv4 184546644       TCP vv2.mcs.anl.gov:9006->aero.east.isi.edu:32781 (CLOSE_WAIT)
python2 29581   ag   42u  IPv4 184546731       TCP vv2.mcs.anl.gov:9006->aero.east.isi.edu:32800 (CLOSE_WAIT)
python2 29581   ag   49u  IPv4 184547014       TCP vv2.mcs.anl.gov:9006->aero.east.isi.edu:32937 (CLOSE_WAIT)
python2 29581   ag   48u  IPv4 184554088       TCP vv2.mcs.anl.gov:9006->aero.east.isi.edu:32957 (CLOSE_WAIT)
python2 29581   ag   51u  IPv4 184579268       TCP vv2.mcs.anl.gov:9006->aero.east.isi.edu:32974 (ESTABLISHED)
python2 29581   ag   44u  IPv4 184546811       TCP vv2.mcs.anl.gov:9006->aglaptop.arsc.edu:1711 (CLOSE_WAIT)
python2 29581   ag   45u  IPv4 184546832       TCP vv2.mcs.anl.gov:9006->aglaptop.arsc.edu:1738 (ESTABLISHED)
python2 29581   ag   13u  IPv4 184475660       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4317 (CLOSE_WAIT)
python2 29581   ag   14u  IPv4 184475678       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4345 (CLOSE_WAIT)
python2 29581   ag   15u  IPv4 184475704       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4370 (CLOSE_WAIT)
python2 29581   ag   16u  IPv4 184485219       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4395 (CLOSE_WAIT)
python2 29581   ag   17u  IPv4 184486808       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4420 (CLOSE_WAIT)
python2 29581   ag   22u  IPv4 184546116       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4537 (CLOSE_WAIT)
python2 29581   ag   24u  IPv4 184546160       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4570 (CLOSE_WAIT)
python2 29581   ag   25u  IPv4 184546172       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4602 (CLOSE_WAIT)
python2 29581   ag   26u  IPv4 184546190       TCP vv2.mcs.anl.gov:9006->colomb.trace.wisc.edu:4634 (CLOSE_WAIT)
python2 29581   ag   19u  IPv4 184546024       TCP vv2.mcs.anl.gov:9006->deng.cs.uiuc.edu:2931 (ESTABLISHED)
python2 29581   ag   28u  IPv4 184546383       TCP vv2.mcs.anl.gov:9006->dhcp-7.ccr.buffalo.edu:1231 (CLOSE_WAIT)
python2 29581   ag   31u  IPv4 184546413       TCP vv2.mcs.anl.gov:9006->dhcp-7.ccr.buffalo.edu:1302 (CLOSE_WAIT)
python2 29581   ag   32u  IPv4 184546446       TCP vv2.mcs.anl.gov:9006->dhcp-7.ccr.buffalo.edu:1334 (CLOSE_WAIT)
python2 29581   ag   33u  IPv4 184546460       TCP vv2.mcs.anl.gov:9006->dhcp-7.ccr.buffalo.edu:1366 (CLOSE_WAIT)
python2 29581   ag   38u  IPv4 184546561       TCP vv2.mcs.anl.gov:9006->dhcp-7.ccr.buffalo.edu:1472 (CLOSE_WAIT)
python2 29581   ag   39u  IPv4 184546574       TCP vv2.mcs.anl.gov:9006->dhcp-7.ccr.buffalo.edu:1504 (CLOSE_WAIT)
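
To tie the two outputs together: every CLOSE_WAIT entry is a descriptor 
the process still holds after the peer has gone away. If those are never 
released, newly accepted connections get ever higher descriptor numbers, 
until one no longer fits a table of 256 entries (fd 257 in the gdb 
session). Below is a toy Python model of that failure. It is only a 
sketch, not the Globus code; FdTable and its add method are names made up 
for illustration. It shows the kind of guard that turns the overflow into 
a reported error.

```python
TABLE_SIZE = 256  # globus_l_io_fd_tablesize as printed by gdb


class FdTable:
    """Toy model of a fixed-size table indexed by file descriptor."""

    def __init__(self, size=TABLE_SIZE):
        self.slots = [None] * size

    def add(self, fd, handle):
        # Refuse descriptors that do not fit instead of indexing blindly.
        if not 0 <= fd < len(self.slots):
            raise IndexError(
                "fd %d outside table of %d entries" % (fd, len(self.slots))
            )
        self.slots[fd] = handle


table = FdTable()
table.add(255, "last descriptor that fits")
try:
    table.add(257, "handle from the crash")  # handle->fd in the gdb session
except IndexError as exc:
    print(exc)
```

A guard like this only makes the failure visible. The underlying problem, 
descriptors that are never closed, still has to be fixed in the service.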

A solution is to modify the text service to use the AsynchIO object that is 
in the asynch branch. The transformation should be fairly straightforward; 
if you look at EventServiceAsynch2.py in the asynch branch you'll find a 
version of the event service to which that transformation has been applied. 
It would probably be a good thing to work that code into the release code 
for both the text and event services.
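
For anyone without the asynch branch at hand, the general idea of closing 
a connection as soon as the peer goes away can be sketched with Python's 
standard selectors module. This is not the AsynchIO interface and not the 
Globus I/O calls the services actually use; run_until_idle and the list 
used as per-connection data are assumptions made for illustration, and the 
code is written for current Python, not the python2 running on vv2.

```python
import selectors
import socket


def run_until_idle(sel):
    """Dispatch readable sockets until no registered connections remain."""
    while sel.get_map():
        for key, _events in sel.select(timeout=1):
            conn = key.fileobj
            data = conn.recv(4096)
            if data:
                key.data.append(data)   # hand the bytes to the application
            else:
                # Empty read: the peer closed its side. Closing ours now
                # releases the descriptor, so nothing lingers in CLOSE_WAIT.
                sel.unregister(conn)
                conn.close()


# Usage: a connected pair stands in for a client and the service side.
client, server_side = socket.socketpair()
received = []
sel = selectors.DefaultSelector()
sel.register(server_side, selectors.EVENT_READ, received)

client.sendall(b"hello")
client.close()                          # the client goes away
run_until_idle(sel)
sel.close()
```

The point is only that the empty read is handled in the same place as 
ordinary data, so a departed client cannot leave a descriptor behind.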

--bob



