[petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data

Jed Brown jed at jedbrown.org
Thu Jul 11 10:49:17 CDT 2019


I don't know anything about zstd (or competitive compression) for GPU,
but doubt it works at the desired granularity.  I think SpMV on late-gen
CPUs can be accelerated by zstd column index compression, especially for
semi-structured problems, but likely also for unstructured problems
numbered by breadth-first search or similar.  But we'd need to demo that
use specifically.
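
As an illustrative sketch (not the code from these experiments; the toy
matrix and helper are assumptions), the column-index transform amounts to
delta-encoding CSR column indices against the row number before handing
them to zstd's single-shot API:

    #include <zstd.h>
    #include <cstdint>
    #include <cstdio>
    #include <vector>

    int main() {
      // Toy CSR column indices for a 1-D Laplacian: row i holds {i-1, i, i+1}.
      const int32_t n = 100000;
      std::vector<int32_t> colidx, delta;
      for (int32_t i = 0; i < n; ++i)
        for (int32_t j = i - 1; j <= i + 1; ++j)
          if (j >= 0 && j < n) {
            colidx.push_back(j);
            delta.push_back(j - i); // the transform: subtract the row number
          }

      // Compress each stream with zstd and report the compressed size.
      auto zsize = [](const std::vector<int32_t> &v) {
        size_t bytes = v.size() * sizeof(int32_t);
        std::vector<char> dst(ZSTD_compressBound(bytes));
        return ZSTD_compress(dst.data(), dst.size(), v.data(), bytes, 3);
      };
      printf("zstd(raw indices):   %zu bytes\n", zsize(colidx));
      printf("zstd(delta indices): %zu bytes\n", zsize(delta));
    }

On a structured grid the deltas repeat the same stencil offsets row after
row, which is what makes the transformed stream compress so well and
decode so quickly.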

"Smith, Barry F." <bsmith at mcs.anl.gov> writes:

>   CPU to GPU? Especially matrices?
>
>> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>> 
>> Zstd is a remarkably good compressor.  I've experimented with it for
>> compressing column indices for sparse matrices on structured grids and
>> (after a simple transform: subtracting the row number) gotten
>> decompression speed in the neighborhood of 10 GB/s (i.e., faster per
>> core than DRAM).  I've been meaning to follow up.  The transformation
>> described below (splitting the bytes) yields decompression speed
>> around 1 GB/s (see the link below), which isn't competitive for things
>> like MatMult, but could be useful for things like trajectory
>> checkpointing.
>> 
>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>> 
>> 
>> From: "Radev, Martin" <martin.radev at tum.de>
>> Subject: Re: Adding a new encoding for FP data
>> Date: July 11, 2019 at 4:55:03 AM CDT
>> To: "dev at arrow.apache.org" <dev at arrow.apache.org>
>> Cc: "Raoofy, Amir" <amir.raoofy at tum.de>, "Karlstetter, Roman" <roman.karlstetter at tum.de>
>> Reply-To: <dev at arrow.apache.org>
>> 
>> 
>> Hello Liya Fan,
>> 
>> 
>> This explains the technique, but for a more complex case:
>> 
>> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>> 
>> For FP data, the approach which seemed to be the best is the following.
>> 
>> Say we have a buffer of two 32-bit floating point values:
>> 
>> buf = [af, bf]
>> 
>> We interpret each FP value as a 32-bit uint and look at each individual byte. We have 8 bytes in total for this small input.
>> 
>> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>> 
>> Then we apply stream splitting and the new buffer becomes:
>> 
>> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>> 
>> We compress newbuf.
>> 
>> Due to similarities in the sign bits, the mantissa bits, and the most-significant exponent bits, we may see many more repetitions in the data. For scientific data, the 2nd and 3rd bytes of each 32-bit value are probably largely noise. Thus, in the original representation we would always have a few bytes of data which could appear somewhere else in the buffer, followed by a couple of bytes of likely noise. In the new representation we get one long stream of data which compresses well, followed by a stream of noise towards the end.
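>> 
>> A minimal sketch of the transform (illustrative only, not the actual
>> patch; the function names are assumptions):
>> 
>>     #include <cstdint>
>>     #include <vector>
>> 
>>     // Split: byte k of value i moves to position k*n + i, giving
>>     // [af0, bf0, af1, bf1, ...] for buf = [af, bf].
>>     std::vector<uint8_t> split_streams(const std::vector<float> &buf) {
>>       const size_t n = buf.size(), w = sizeof(float);
>>       const uint8_t *src = reinterpret_cast<const uint8_t *>(buf.data());
>>       std::vector<uint8_t> out(n * w);
>>       for (size_t i = 0; i < n; ++i)
>>         for (size_t k = 0; k < w; ++k)
>>           out[k * n + i] = src[i * w + k];
>>       return out;
>>     }
>> 
>>     // Merge: the inverse permutation, applied after decompression.
>>     std::vector<float> merge_streams(const std::vector<uint8_t> &split) {
>>       const size_t w = sizeof(float), n = split.size() / w;
>>       std::vector<float> buf(n);
>>       uint8_t *dst = reinterpret_cast<uint8_t *>(buf.data());
>>       for (size_t i = 0; i < n; ++i)
>>         for (size_t k = 0; k < w; ++k)
>>           dst[i * w + k] = split[k * n + i];
>>       return buf;
>>     }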
>> 
>> This transformation improved compression ratio as can be seen in the report.
>> 
>> It also improved speed for ZSTD. This could be because ZSTD decides how to encode each frame - RLE, a new Huffman tree, the Huffman tree of the previous frame, or a raw representation - and each option achieves a different compression ratio and compression/decompression speed. It turned out that when the transformation is applied, zstd attempts to compress fewer frames and copies the others verbatim, which could mean fewer attempts to build a Huffman tree. It's hard to pinpoint the exact reason.
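>> 
>> To see the effect end-to-end (a hedged sketch; the synthetic data and
>> whatever sizes it prints are illustrative, not the report's benchmarks),
>> one can compress the same buffer raw and split and compare the sizes:
>> 
>>     #include <zstd.h>
>>     #include <cstdint>
>>     #include <cstdio>
>>     #include <vector>
>> 
>>     int main() {
>>       // Synthetic data: shared sign/exponent/high-mantissa bytes,
>>       // noisy low-mantissa bytes.
>>       std::vector<float> buf(1 << 20);
>>       for (size_t i = 0; i < buf.size(); ++i)
>>         buf[i] = 1.0f + 1e-7f * (float)(i % 9973);
>> 
>>       // Byte-split copy: byte k of value i lands at position k*n + i.
>>       const size_t n = buf.size(), w = sizeof(float);
>>       const uint8_t *src = reinterpret_cast<const uint8_t *>(buf.data());
>>       std::vector<uint8_t> split(n * w);
>>       for (size_t i = 0; i < n; ++i)
>>         for (size_t k = 0; k < w; ++k)
>>           split[k * n + i] = src[i * w + k];
>> 
>>       auto zsize = [](const void *p, size_t len) {
>>         std::vector<char> dst(ZSTD_compressBound(len));
>>         return ZSTD_compress(dst.data(), dst.size(), p, len, 3);
>>       };
>>       printf("raw:   %zu -> %zu bytes\n", n * w, zsize(src, n * w));
>>       printf("split: %zu -> %zu bytes\n", n * w, zsize(split.data(), n * w));
>>     }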
>> 
>> I did not try other lossless text compressors but I expect similar results.
>> 
>> For code, I can polish my patches, create a Jira task and submit the patches for review.
>> 
>> 
>> Regards,
>> 
>> Martin
>> 
>> 
>> ________________________________
>> From: Fan Liya <liya.fan03 at gmail.com>
>> Sent: Thursday, July 11, 2019 11:32:53 AM
>> To: dev at arrow.apache.org
>> Cc: Raoofy, Amir; Karlstetter, Roman
>> Subject: Re: Adding a new encoding for FP data
>> 
>> Hi Radev,
>> 
>> Thanks for the information. It seems interesting.
>> IMO, Arrow has much to do for data compression. However, there seem to be
>> some differences between in-memory data compression and external storage
>> data compression.
>> 
>> Could you please provide a reference for stream splitting?
>> 
>> Best,
>> Liya Fan
>> 
>> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <martin.radev at tum.de> wrote:
>> 
>>> Hello people,
>>> 
>>> 
>>> There has been discussion in the Apache Parquet mailing list on adding a
>>> new encoder for FP data.
>>> The reason for this is that the compressors supported by Apache Parquet
>>> (zstd, gzip, etc.) do not compress raw FP data well.
>>> 
>>> 
>>> In my investigation it turns out that a very simple technique, named
>>> stream splitting, can improve the compression ratio, and even the speed,
>>> for some of the compressors.
>>> 
>>> You can read about the results here:
>>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>> 
>>> 
>>> I went through the developer guide for Apache Arrow and wrote a patch to
>>> add the new encoding and test coverage for it.
>>> 
>>> I will polish my patch and work in parallel to extend the Apache Parquet
>>> format for the new encoding.
>>> 
>>> 
>>> If you have any concerns, please let me know.
>>> 
>>> 
>>> Regards,
>>> 
>>> Martin
>>> 
>>> 
>> 
>> 

