[petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data
Jed Brown
jed at jedbrown.org
Thu Jul 11 10:49:17 CDT 2019
I don't know anything about zstd (or competitive compression) for GPU,
but doubt it works at the desired granularity. I think SpMV on late-gen
CPUs can be accelerated by zstd column index compression, especially for
semi-structured problems, but likely also for unstructured problems
numbered by breadth-first search or similar. But we'd need to demo that
use specifically.
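A minimal sketch of that column-index transform (illustrative names,
standard CSR arrays assumed; this is not PETSc API): each column index is
replaced by its offset from the row, so a structured-grid stencil yields
the same few small values in every row, which a byte-oriented compressor
like zstd handles well.

  #include <cstddef>
  #include <cstdint>

  // Hypothetical sketch: subtract the row number from each column index.
  // The transform is inverted after decompression by adding it back.
  void ShiftColsByRow(std::size_t nrows, const std::int64_t *rowptr,
                      std::int32_t *colidx) {
    for (std::size_t i = 0; i < nrows; ++i)
      for (std::int64_t k = rowptr[i]; k < rowptr[i + 1]; ++k)
        colidx[k] -= static_cast<std::int32_t>(i);
  }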
"Smith, Barry F." <bsmith at mcs.anl.gov> writes:
> CPU to GPU? Especially matrices?
>
>> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>>
>> Zstd is a remarkably good compressor. I've experimented with it for
>> compressing column indices for sparse matrices on structured grids and
>> (after a simple transform: subtracting the row number) gotten
>> decompression speed in the neighborhood of 10 GB/s (i.e., faster per
>> core than DRAM). I've been meaning to follow up. The transformation
>> described below (splitting the bytes) yields decompression speed
>> around 1 GB/s (see the linked report below), which isn't competitive for
>> things like MatMult, but could be useful for things like trajectory
>> checkpointing.
>>
>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>
>>
>> From: "Radev, Martin" <martin.radev at tum.de>
>> Subject: Re: Adding a new encoding for FP data
>> Date: July 11, 2019 at 4:55:03 AM CDT
>> To: "dev at arrow.apache.org" <dev at arrow.apache.org>
>> Cc: "Raoofy, Amir" <amir.raoofy at tum.de>, "Karlstetter, Roman" <roman.karlstetter at tum.de>
>> Reply-To: <dev at arrow.apache.org>
>>
>>
>> Hello Liya Fan,
>>
>>
>> this explains the technique but for a more complex case:
>>
>> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>>
>> For FP data, the approach which seemed to be the best is the following.
>>
>> Say we have a buffer of two 32-bit floating point values:
>>
>> buf = [af, bf]
>>
>> We interpret each FP value as a 32-bit uint and look at each individual byte. We have 8 bytes in total for this small input.
>>
>> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>>
>> Then we apply stream splitting and the new buffer becomes:
>>
>> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>>
>> We compress newbuf.
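>>
>> Here is a minimal C++ sketch of the split (illustrative only, not the
>> actual patch; SplitStreams is a made-up name), assuming little-endian
>> float32 input:
>>
>>   #include <cstddef>
>>   #include <cstdint>
>>   #include <cstring>
>>   #include <vector>
>>
>>   // Byte k of every value is written contiguously, so the output is
>>   // four streams: all byte-0s, then all byte-1s, and so on.
>>   std::vector<std::uint8_t> SplitStreams(const float *values,
>>                                          std::size_t n) {
>>     std::vector<std::uint8_t> out(n * sizeof(float));
>>     for (std::size_t i = 0; i < n; ++i) {
>>       std::uint8_t b[sizeof(float)];
>>       std::memcpy(b, &values[i], sizeof(float)); // type-pun via memcpy
>>       for (std::size_t k = 0; k < sizeof(float); ++k)
>>         out[k * n + i] = b[k]; // stream k holds byte k of value i
>>     }
>>     return out; // feed this buffer to the compressor
>>   }
>>
>> Decoding simply gathers byte k of value i back from out[k * n + i].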
>>
>> Due to similarities in the sign bits, mantissa bits and MSB exponent bits, we may get many more repetitions in the data. For scientific data, the 2nd and 3rd bytes of each 32-bit value are probably largely noise. Thus in the original representation we would always have a few bytes of data which could appear elsewhere in the buffer, followed by a couple of bytes of likely noise. In the new representation we have a long stream of data which compresses well, followed by a sequence of noise towards the end.
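>>
>> As a concrete illustration (example values chosen here for exposition),
>> two nearby float32 values agree in their high-order bytes (sign and
>> exponent) while their low-order mantissa bytes look like noise (bytes
>> shown most-significant first):
>>
>>   1.00f -> 0x3F800000 -> bytes 3F 80 00 00
>>   1.01f -> 0x3F8147AE -> bytes 3F 81 47 AE
>>
>> After splitting, the repetitive high bytes are grouped together at the
>> front of the buffer and the noisy low bytes at the end.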
>>
>> This transformation improved the compression ratio, as can be seen in the report.
>>
>> It also improved speed for ZSTD. This could be because ZSTD decides how to compress each block of data - RLE, a new Huffman tree, the Huffman tree of the previous frame, or a raw representation. Each can potentially achieve a different compression ratio and compression/decompression speed. It turned out that when the transformation is applied, zstd would attempt to compress fewer frames and copy the others verbatim, which could mean fewer attempts to build a Huffman tree. It's hard to pinpoint the exact reason.
>>
>> I did not try other lossless text compressors but I expect similar results.
>>
>> For code, I can polish my patches, create a Jira task and submit the patches for review.
>>
>>
>> Regards,
>>
>> Martin
>>
>>
>> ________________________________
>> From: Fan Liya <liya.fan03 at gmail.com>
>> Sent: Thursday, July 11, 2019 11:32:53 AM
>> To: dev at arrow.apache.org
>> Cc: Raoofy, Amir; Karlstetter, Roman
>> Subject: Re: Adding a new encoding for FP data
>>
>> Hi Radev,
>>
>> Thanks for the information. It seems interesting.
>> IMO, Arrow has much to do for data compression. However, there seem to be
>> some differences between in-memory data compression and external-storage
>> data compression.
>>
>> Could you please provide some reference for stream splitting?
>>
>> Best,
>> Liya Fan
>>
>> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <martin.radev at tum.de> wrote:
>>
>>> Hello people,
>>>
>>>
>>> there has been discussion in the Apache Parquet mailing list on adding a
>>> new encoder for FP data.
>>> The reason for this is that the compressors supported by Apache Parquet
>>> (zstd, gzip, etc.) do not compress raw FP data well.
>>>
>>>
>>> In my investigation it turns out that a very simple technique,
>>> named stream splitting, can improve the compression ratio and even the
>>> speed for some of the compressors.
>>>
>>> You can read about the results here:
>>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>>
>>>
>>> I went through the developer guide for Apache Arrow and wrote a patch to
>>> add the new encoding and test coverage for it.
>>>
>>> I will polish my patch and work in parallel to extend the Apache Parquet
>>> format for the new encoding.
>>>
>>>
>>> If you have any concerns, please let me know.
>>>
>>>
>>> Regards,
>>>
>>> Martin
>>>
>>>
>>
>>