<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
  <head>
    <meta content="text/html; charset=ISO-8859-1"
      http-equiv="Content-Type">
  </head>
  <body bgcolor="#ffffff" text="#000000">
    I will look into the issue and fix it.<br>
    -Paul<br>
    <blockquote cite="mid:4f4c866a.a123440a.3710.ffff9efe@mx.google.com"
      type="cite">
      <div class="WordSection1">
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">John, Paul,</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">&nbsp;</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">I ran the example with the same
            options, and the code aborts at a different location in cusp,
            although it is still reached through PCSetUp_SACUSP. The
            example works fine if txpetscgpu is not used.</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">&nbsp;</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">Valgrind does not show any relevant
            issues prior to the std::terminate.</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">&nbsp;</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">My best guess, based on this and some
            investigation, is that this happens because of inconsistent
            C-style casts in the code (which are #ifdef'ed out when
            txpetscgpu is not used). They could be related to the
            different code paths taken when calling MatCUSPCopyToGPU in
            sacusp.cu, depending on the txpetscgpu macro.</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">&nbsp;</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">I’m busy with other stuff, but I’ll
            let you know when this gets fixed.</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">&nbsp;</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">Chetan</span></p>
        <p class="MsoNormal"><span style="font-size: 11pt; font-family:
            'Verdana','sans-serif';">&nbsp;</span></p>
        <div style="border-width: medium medium medium 1.5pt;
          border-style: none none none solid; border-color:
          -moz-use-text-color -moz-use-text-color -moz-use-text-color
          blue; padding: 0in 0in 0in 4pt;">
          <div>
            <div style="border-right: medium none; border-width: 1pt
              medium medium; border-style: solid none none;
              border-color: rgb(181, 196, 223) -moz-use-text-color
              -moz-use-text-color; padding: 3pt 0in 0in;">
              <p class="MsoNormal"><b><span style="font-size: 10pt;
                    font-family:
                    'Tahoma','sans-serif';">From:</span></b><span
                  style="font-size: 10pt; font-family:
                  'Tahoma','sans-serif';"> <a
                    moz-do-not-send="true"
                    href="mailto:petsc-dev-bounces@mcs.anl.gov">petsc-dev-bounces@mcs.anl.gov</a>
                  [<a moz-do-not-send="true"
                    href="mailto:petsc-dev-bounces@mcs.anl.gov">mailto:petsc-dev-bounces@mcs.anl.gov</a>]
                  <b>On Behalf Of </b>John Fettig<br>
                  <b>Sent:</b> Monday, February 27, 2012 2:02 PM<br>
                  <b>To:</b> For users of the development version of
                  PETSc<br>
                  <b>Subject:</b> Re: [petsc-dev] PETSc GPU capabilities<o:p></o:p></span></p>
            </div>
          </div>
          <p class="MsoNormal"><o:p> </o:p></p>
          <p class="MsoNormal" style="margin-bottom: 12pt;">It finally
            finished running through cuda-gdb.  Here's a backtrace. 
            new_size=46912574500784 in the call to
            thrust::detail::vector_base&lt;double,
            thrust::device_malloc_allocator&lt;double&gt; &gt;::resize
            looks suspicious.<br>
            <br>
            #0  0x0000003e1c832885 in raise () from /lib64/libc.so.6<br>
            #1  0x0000003e1c834065 in abort () from /lib64/libc.so.6<br>
            #2  0x0000003e284bea7d in
            __gnu_cxx::__verbose_terminate_handler() ()<br>
               from /usr/lib64/libstdc++.so.6<br>
            #3  0x0000003e284bcc06 in ?? () from
            /usr/lib64/libstdc++.so.6<br>
            #4  0x0000003e284bcc33 in std::terminate() () from
            /usr/lib64/libstdc++.so.6<br>
            #5  0x0000003e284bcd2e in __cxa_throw () from
            /usr/lib64/libstdc++.so.6<br>
            #6  0x00002aaaab45ad71 in
            thrust::detail::backend::cuda::malloc&lt;0u&gt;
            (n=375300596006272)<br>
                at malloc.inl:50<br>
            #7  0x00002aaaab454322 in
            thrust::detail::backend::dispatch::malloc&lt;0u&gt;
            (n=375300596006272)<br>
                at malloc.h:56<br>
            #8  0x00002aaaab453555 in thrust::device_malloc
            (n=375300596006272) at device_malloc.inl:32<br>
            #9  0x00002aaaab46477d in
            thrust::device_malloc&lt;double&gt; (n=46912574500784)<br>
                at device_malloc.inl:38<br>
            #10 0x00002aaaab461fce in
            thrust::device_malloc_allocator&lt;double&gt;::allocate (<br>
                this=0x7fffffff9880, cnt=46912574500784) at
            device_malloc_allocator.h:101<br>
            #11 0x00002aaaab45ee91 in
            thrust::detail::contiguous_storage&lt;double,
            thrust::device_malloc_allocator&lt;double&gt; &gt;::allocate
            (this=0x7fffffff9880, n=46912574500784)<br>
                at contiguous_storage.inl:134<br>
            #12 0x00002aaaab46ebba in
            thrust::detail::contiguous_storage&lt;double,
            thrust::device_malloc_allocator&lt;double&gt;
            &gt;::contiguous_storage (this=0x7fffffff9880,
            n=46912574500784)<br>
                at contiguous_storage.inl:46<br>
            #13 0x00002aaaab46cd1e in
            thrust::detail::vector_base&lt;double,
            thrust::device_malloc_allocator&lt;double&gt;
            &gt;::fill_insert (this=0x13623990, position=...,
            n=46912574500784, <br>
                x=@0x7fffffff9f18) at vector_base.inl:792<br>
            #14 0x00002aaaab46b058 in
            thrust::detail::vector_base&lt;double,
            thrust::device_malloc_allocator&lt;double&gt; &gt;::insert
            (this=0x13623990, position=..., n=46912574500784,
            x=@0x7fffffff9f18)<br>
                at vector_base.inl:561<br>
            #15 0x00002aaaab4692a3 in
            thrust::detail::vector_base&lt;double,
            thrust::device_malloc_allocator&lt;double&gt; &gt;::resize
            (this=0x13623990, new_size=46912574500784,
            x=@0x7fffffff9f18)<br>
                at vector_base.inl:222<br>
            #16 0x00002aaaac2c3d9b in
            cusp::precond::smoothed_aggregation&lt;int, double,
            thrust::detail::cuda_device_space_tag&gt;::smoothed_aggregation&lt;cusp::csr_matrix&lt;int,
            double, thrust::detail::cuda_device_space_tag&gt; &gt;
            (this=0x136182b0, A=..., theta=0) at
            smoothed_aggregation.inl:210<br>
            #17 0x00002aaaac27cf84 in PCSetUp_SACUSP (pc=0x1360f330) at
            sacusp.cu:76<br>
            #18 0x00002aaaac1f0024 in PCSetUp (pc=0x1360f330) at
            precon.c:832<br>
            #19 0x00002aaaabd02144 in KSPSetUp (ksp=0x135d2a00) at
            itfunc.c:261<br>
            #20 0x00002aaaabd0396e in KSPSolve (ksp=0x135d2a00,
            b=0x135a0fa0, x=0x135a2b50)<br>
                at itfunc.c:385<br>
            #21 0x0000000000403619 in main (argc=17,
            args=0x7fffffffc538) at ex2.c:217<br>
            <br>
          </p>
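          <p class="MsoNormal">The numbers in the backtrace above can be
            checked with a little arithmetic. The following is a minimal
            Python sketch, not part of PETSc, cusp, or Thrust. The two
            values are copied from frames #6 through #15; the 8-byte size
            of a double is an assumption (it holds on x86-64), and the
            variable names are illustrative only.</p>
          <pre>
```python
# Values copied from the backtrace (frames #6 through #15).
new_size = 46912574500784          # element count passed to vector_base::resize
bytes_requested = 375300596006272  # n passed to thrust::device_malloc

SIZEOF_DOUBLE = 8  # assumption: 8-byte double, as on x86-64

# The byte count is exactly the element count times sizeof(double), so the
# bad value is already present in new_size and is only scaled afterwards.
scaled = new_size * SIZEOF_DOUBLE
print(scaled == bytes_requested)

# Printed in hex, the element count lands in the same 0x2aaa... range as
# the frame addresses listed in the backtrace.
as_hex = hex(new_size)
print(as_hex)

# Size of the request in TiB (2**40 bytes), far beyond any GPU memory.
tib = bytes_requested / 2**40
print(round(tib, 1))
```
          </pre>
          <p class="MsoNormal">A size that looks like an address is
            consistent with a pointer or an uninitialized field being read
            as a length, although this check alone does not identify where
            that happens.</p>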
          <div>
            <p class="MsoNormal">On Mon, Feb 27, 2012 at 4:48 PM, John
              Fettig &lt;<a moz-do-not-send="true"
                href="mailto:john.fettig@gmail.com">john.fettig@gmail.com</a>&gt;
              wrote:<o:p></o:p></p>
            <p class="MsoNormal">Hi Paul,<br>
              <br>
              This is very interesting.  I tried building the code with
              --download-txpetscgpu and it doesn't work for me.  It runs
              out of memory, no matter how small the problem (this is
              ex2 from src/ksp/ksp/examples/tutorials):<br>
              <br>
              mpirun -np 1 ./ex2 -n 10 -m 10 -ksp_type cg -pc_type
              sacusp -mat_type aijcusp -vec_type cusp
              -cusp_storage_format csr -use_cusparse 0<br>
              <br>
              terminate called after throwing an instance of
              'thrust::system::detail::bad_alloc'<br>
                what():  std::bad_alloc: out of memory<br>
              MPI Application rank 0 killed before MPI_Finalize() with
              signal 6<br>
              <br>
              This example works fine when I build without your gpu
              additions (and for much larger problems too).  Am I doing
              something wrong?<br>
              <br>
              For reference, I'm using CUDA 4.1, CUSP 0.3, and Thrust
              1.5.1<span style="color: rgb(136, 136, 136);"><br>
                <br>
                <span class="hoenzb">John</span></span><o:p></o:p></p>
            <div>
              <div>
                <p class="MsoNormal" style="margin-bottom: 12pt;"><o:p> </o:p></p>
                <div>
                  <p class="MsoNormal">On Fri, Feb 10, 2012 at 5:04 PM,
                    Paul Mullowney &lt;<a moz-do-not-send="true"
                      href="mailto:paulm@txcorp.com" target="_blank">paulm@txcorp.com</a>&gt;
                    wrote:<o:p></o:p></p>
                  <p class="MsoNormal">Hi All,<br>
                    <br>
                    I've been developing GPU capabilities for PETSc. The
                    development has focused mostly on<br>
                    (1) An efficient multi-GPU SpMV, i.e. MatMult. This
                    is working well.<br>
                    (2) Triangular Solve used in ILU preconditioners;
                    i.e. MatSolve. The performance of this ... is what
                    it is :|<br>
                    This code is in beta mode. Keep that in mind, if you
                    decide to use it. It supports single and double
                    precision, real numbers only! Complex will be
                    supported at some point in the future, but not any
                    time soon.<br>
                    <br>
                    To build with these capabilities, add the following
                    to your configure line.<br>
                    --download-txpetscgpu=yes<br>
                    <br>
                    The capabilities of the SpMV code are accessed with
                    the following 2 command line flags<br>
                    -cusp_storage_format csr (other options are coo
                    (coordinate), ell (ellpack), dia (diagonal). hyb
                    (hybrid) is not yet supported)<br>
                    -use_cusparse (this is a boolean and at the moment
                    is only supported with csr format matrices. In the
                    future, cusparse will work with ell, coo, and hyb
                    formats).<br>
                    <br>
                    Regarding the number of GPUs to run on:<br>
                    Imagine a system with P nodes, N cores per node, and
                    M GPUs per node. Then, to use only the GPUs, I would
                    run with M ranks per node over P nodes.  As an
                    example, I have a system with 2 nodes. Each node has
                    8 cores, and 4 GPUs attached to each node (P=2, N=8,
                    M=4). In a PBS queue script, one would use 2 nodes
                    at 4 processors per node. Each MPI rank (CPU
                    processor) will be attached to a GPU.<br>
                    <br>
                    You do not need to explicitly manage the GPUs, apart
                    from understanding what type of system you are
                    running on. To learn how many devices are available
                    per node, use the command line flag:<br>
                    -cuda_show_devices<span style="color: rgb(136, 136,
                      136);"><br>
                      <br>
                      -Paul</span><o:p></o:p></p>
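                  <p class="MsoNormal">The rank count described above (M
                    ranks per node over P nodes) can be written out as a
                    small Python sketch. The function names are
                    hypothetical, and the rank-to-device mapping assumes
                    that consecutive ranks fill one node before moving to
                    the next; the message does not say how txpetscgpu
                    actually assigns devices, so treat that part as an
                    illustration only.</p>
                  <pre>
```python
def gpu_only_layout(nodes, cores_per_node, gpus_per_node):
    """Rank counts for a GPU-only run: one MPI rank per GPU."""
    ranks_per_node = gpus_per_node
    total_ranks = nodes * ranks_per_node
    unused_cores_per_node = cores_per_node - ranks_per_node
    return ranks_per_node, total_ranks, unused_cores_per_node


def rank_to_device(rank, gpus_per_node):
    """Hypothetical block placement: returns (node index, device index)."""
    return rank // gpus_per_node, rank % gpus_per_node


# The example from the message: P=2 nodes, N=8 cores and M=4 GPUs per node.
layout = gpu_only_layout(nodes=2, cores_per_node=8, gpus_per_node=4)
print(layout)

# One (node, device) pair per rank under the assumed block placement.
placement = [rank_to_device(r, 4) for r in range(layout[1])]
print(placement)
```
                  </pre>
                  <p class="MsoNormal">For the example system this gives 4
                    ranks per node and 8 ranks in total, which leaves 4
                    cores per node without a rank.</p>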
                </div>
                <p class="MsoNormal"><o:p> </o:p></p>
              </div>
            </div>
          </div>
          <p class="MsoNormal"><o:p> </o:p></p>
        </div>
      </div>
    </blockquote>
    <br>
  </body>
</html>