[petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data
Smith, Barry F.
bsmith at mcs.anl.gov
Thu Jul 11 10:56:32 CDT 2019
Sorry, I wasn't clear. Just meant something simpler. Compress the matrix to copy it to the GPU for faster transfers (and uncompress it appropriately on the GPU).
Barry
> On Jul 11, 2019, at 10:49 AM, Jed Brown <jed at jedbrown.org> wrote:
>
> I don't know anything about zstd (or competitive compression) for GPU,
> but doubt it works at the desired granularity. I think SpMV on late-gen
> CPUs can be accelerated by zstd column index compression, especially for
> semi-structured problems, but likely also for unstructured problems
> numbered by breadth-first search or similar. But we'd need to demo that
> use specifically.
>
> "Smith, Barry F." <bsmith at mcs.anl.gov> writes:
>
>> CPU to GPU? Especially matrices?
>>
>>> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>>>
>>> Zstd is a remarkably good compressor. I've experimented with it for
>>> compressing column indices for sparse matrices on structured grids and
>>> (after a simple transform: subtracting the row number) gotten
>>> decompression speeds in the neighborhood of 10 GB/s (i.e., faster per
>>> core than DRAM bandwidth). I've been meaning to follow up. The
>>> transformation described below (splitting the bytes) yields decompression
>>> speeds of around 1 GB/s (see the link below), which isn't competitive for
>>> things like MatMult, but could be useful for things like trajectory
>>> checkpointing.
>>>
>>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
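>>>
>>> A minimal sketch of the row-number transform (illustrative only; not the
>>> code behind the numbers above): for a matrix in CSR format, store
>>> col[j] - row instead of col[j] before handing the index array to a
>>> byte-oriented compressor such as zstd, and undo it after decompression.
>>>
>>>   #include <cstdint>
>>>   #include <vector>
>>>
>>>   // rowptr has n+1 entries, col has rowptr[n] entries (CSR layout).
>>>   // On (semi-)structured grids the shifted indices are small and highly
>>>   // repetitive across rows, which a general-purpose compressor exploits.
>>>   void encode_cols(const std::vector<int32_t> &rowptr, std::vector<int32_t> &col) {
>>>     for (size_t row = 0; row + 1 < rowptr.size(); ++row)
>>>       for (int32_t j = rowptr[row]; j < rowptr[row + 1]; ++j)
>>>         col[j] -= (int32_t)row;
>>>   }
>>>
>>>   void decode_cols(const std::vector<int32_t> &rowptr, std::vector<int32_t> &col) {
>>>     for (size_t row = 0; row + 1 < rowptr.size(); ++row)
>>>       for (int32_t j = rowptr[row]; j < rowptr[row + 1]; ++j)
>>>         col[j] += (int32_t)row;  // restore the original column indices
>>>   }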
>>>
>>>
>>> From: "Radev, Martin" <martin.radev at tum.de>
>>> Subject: Re: Adding a new encoding for FP data
>>> Date: July 11, 2019 at 4:55:03 AM CDT
>>> To: "dev at arrow.apache.org" <dev at arrow.apache.org>
>>> Cc: "Raoofy, Amir" <amir.raoofy at tum.de>, "Karlstetter, Roman" <roman.karlstetter at tum.de>
>>> Reply-To: <dev at arrow.apache.org>
>>>
>>>
>>> Hello Liya Fan,
>>>
>>>
>>> This explains the technique, but for a more complex case:
>>>
>>> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>>>
>>> For FP data, the approach which seemed to be the best is the following.
>>>
>>> Say we have a buffer of two 32-bit floating point values:
>>>
>>> buf = [af, bf]
>>>
>>> We interpret each FP value as a 32-bit uint and look at each individual byte. We have 8 bytes in total for this small input.
>>>
>>> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>>>
>>> Then we apply stream splitting and the new buffer becomes:
>>>
>>> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>>>
>>> We compress newbuf.
>>>
>>> Due to similarities in the sign bits, exponent bits and most significant mantissa bits, we might have a lot more repetitions in the data. For scientific data, the 2nd and 3rd bytes of each 32-bit value are probably largely noise. Thus, in the original representation we would always have a few bytes of data which could appear elsewhere in the buffer, followed by a couple of bytes of likely noise. In the new representation we have a long stream of data which compresses well, followed by a sequence of noise towards the end.
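>>>
>>> A minimal sketch of the split for 32-bit floats (illustrative only; not the
>>> actual Arrow/Parquet patch):
>>>
>>>   #include <cstdint>
>>>   #include <cstring>
>>>   #include <vector>
>>>
>>>   // Gather byte k of every 32-bit float into the k-th contiguous stream.
>>>   std::vector<uint8_t> split_streams(const std::vector<float> &values) {
>>>     const size_t n = values.size();
>>>     std::vector<uint8_t> out(4 * n);
>>>     for (size_t i = 0; i < n; ++i) {
>>>       uint8_t b[4];
>>>       std::memcpy(b, &values[i], 4);  // reinterpret the float as 4 raw bytes
>>>       for (size_t k = 0; k < 4; ++k)
>>>         out[k * n + i] = b[k];        // byte k of value i goes to stream k
>>>     }
>>>     // for buf = [af, bf] this is [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>>>     return out;
>>>   }
>>>
>>>   // Inverse transform: reassemble the floats from the four byte streams.
>>>   std::vector<float> merge_streams(const std::vector<uint8_t> &in) {
>>>     const size_t n = in.size() / 4;
>>>     std::vector<float> values(n);
>>>     for (size_t i = 0; i < n; ++i) {
>>>       uint8_t b[4];
>>>       for (size_t k = 0; k < 4; ++k)
>>>         b[k] = in[k * n + i];
>>>       std::memcpy(&values[i], b, 4);
>>>     }
>>>     return values;
>>>   }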
>>>
>>> This transformation improved the compression ratio, as can be seen in the report.
>>>
>>> It also improved speed for zstd. This could be because zstd decides how to encode each block of data - RLE, a new Huffman tree, the Huffman tree of the previous block, or a raw copy. Each option can achieve a different compression ratio and compression/decompression speed. It turned out that with the transformation applied, zstd attempts to compress fewer blocks and copies the others raw, which could mean fewer attempts to build a Huffman tree. It's hard to pinpoint the exact reason.
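>>>
>>> One way to check the effect on a given dataset (assuming libzstd is
>>> available; compile with -lzstd) is to compare the compressed size of the
>>> raw bytes against that of the split streams:
>>>
>>>   #include <zstd.h>
>>>   #include <cstdint>
>>>   #include <vector>
>>>
>>>   // Returns the zstd-compressed size of buf at the given level, or 0 on error.
>>>   size_t zstd_size(const std::vector<uint8_t> &buf, int level) {
>>>     std::vector<uint8_t> dst(ZSTD_compressBound(buf.size()));
>>>     size_t n = ZSTD_compress(dst.data(), dst.size(), buf.data(), buf.size(), level);
>>>     return ZSTD_isError(n) ? 0 : n;
>>>   }
>>>
>>>   // e.g. compare zstd_size(raw_bytes, 3) with zstd_size(split_streams(values), 3)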
>>>
>>> I did not try other lossless text compressors but I expect similar results.
>>>
>>> For code, I can polish my patches, create a Jira task and submit the patches for review.
>>>
>>>
>>> Regards,
>>>
>>> Martin
>>>
>>>
>>> ________________________________
>>> From: Fan Liya <liya.fan03 at gmail.com>
>>> Sent: Thursday, July 11, 2019 11:32:53 AM
>>> To: dev at arrow.apache.org
>>> Cc: Raoofy, Amir; Karlstetter, Roman
>>> Subject: Re: Adding a new encoding for FP data
>>>
>>> Hi Radev,
>>>
>>> Thanks for the information. It seems interesting.
>>> IMO, Arrow has much to do for data compression. However, it seems there are
>>> some differences between in-memory data compression and external-storage data
>>> compression.
>>>
>>> Could you please provide some reference for stream splitting?
>>>
>>> Best,
>>> Liya Fan
>>>
>>> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <martin.radev at tum.de> wrote:
>>>
>>>> Hello people,
>>>>
>>>>
>>>> There has been discussion on the Apache Parquet mailing list about adding a
>>>> new encoder for FP data.
>>>> The reason for this is that the compressors supported by Apache Parquet
>>>> (zstd, gzip, etc.) do not compress raw FP data well.
>>>>
>>>>
>>>> In my investigation it turns out that a very simple technique, named
>>>> stream splitting, can improve the compression ratio and even the speed
>>>> for some of the compressors.
>>>>
>>>> You can read about the results here:
>>>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>>>
>>>>
>>>> I went through the developer guide for Apache Arrow and wrote a patch to
>>>> add the new encoding and test coverage for it.
>>>>
>>>> I will polish my patch and work in parallel to extend the Apache Parquet
>>>> format for the new encoding.
>>>>
>>>>
>>>> If you have any concerns, please let me know.
>>>>
>>>>
>>>> Regards,
>>>>
>>>> Martin
>>>>
>>>>
>>>
>>>