[petsc-dev] [Radev, Martin] Re: Adding a new encoding for FP data
Smith, Barry F.
bsmith at mcs.anl.gov
Thu Jul 11 10:35:35 CDT 2019
Could this help with CPU to GPU transfers? Especially for matrices?
> On Jul 11, 2019, at 9:05 AM, Jed Brown via petsc-dev <petsc-dev at mcs.anl.gov> wrote:
>
> Zstd is a remarkably good compressor. I've experimented with it for
> compressing column indices for sparse matrices on structured grids and
> (after a simple transform: subtracting the row number) gotten
> decompression speeds in the neighborhood of 10 GB/s (i.e., faster per
> core than DRAM). I've been meaning to follow up. The transformation
> described below (splitting the bytes) yields decompression speeds
> around 1 GB/s (see the link below), which isn't competitive for things
> like MatMult, but could be useful for things like trajectory
> checkpointing.
>
> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
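>
> For concreteness, a minimal sketch of that index transform, assuming
> CSR storage (the function name and layout here are illustrative):
>
>   #include <cstdint>
>   #include <vector>
>
>   // Subtract the row number from each column index of a CSR matrix.
>   // On a structured grid this turns the indices of every row into a
>   // small repeating set of stencil offsets, which compresses well.
>   void DeltaTransform(const std::vector<int32_t> &rowptr,
>                       std::vector<int32_t> *colidx) {
>     const size_t nrows = rowptr.size() - 1;
>     for (size_t row = 0; row < nrows; ++row)
>       for (int32_t k = rowptr[row]; k < rowptr[row + 1]; ++k)
>         // e.g. a 2-D 5-point stencil row becomes {-n, -1, 0, 1, n}
>         (*colidx)[k] -= static_cast<int32_t>(row);
>   }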
>
>
> From: "Radev, Martin" <martin.radev at tum.de>
> Subject: Re: Adding a new encoding for FP data
> Date: July 11, 2019 at 4:55:03 AM CDT
> To: "dev at arrow.apache.org" <dev at arrow.apache.org>
> Cc: "Raoofy, Amir" <amir.raoofy at tum.de>, "Karlstetter, Roman" <roman.karlstetter at tum.de>
> Reply-To: <dev at arrow.apache.org>
>
>
> Hello Liya Fan,
>
>
> This post explains the technique, but for a more complex case:
>
> https://fgiesen.wordpress.com/2011/01/24/x86-code-compression-in-kkrunchy/
>
> For FP data, the approach which seemed best is the following.
>
> Say we have a buffer of two 32-bit floating point values:
>
> buf = [af, bf]
>
> We interpret each FP value as a 32-bit uint and look at each individual byte. We have 8 bytes in total for this small input.
>
> buf = [af0, af1, af2, af3, bf0, bf1, bf2, bf3]
>
> Then we apply stream splitting and the new buffer becomes:
>
> newbuf = [af0, bf0, af1, bf1, af2, bf2, af3, bf3]
>
> We compress newbuf.
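>
> A minimal sketch of the split, assuming 32-bit floats in a contiguous
> buffer (the function name is illustrative, not from the actual patch):
>
>   #include <cstdint>
>   #include <vector>
>
>   // Byte i of value v moves to position i * n + v, so all first
>   // bytes come first, then all second bytes, and so on.
>   std::vector<uint8_t> StreamSplit(const std::vector<float> &values) {
>     const size_t n = values.size();
>     std::vector<uint8_t> split(n * sizeof(float));
>     const uint8_t *raw =
>         reinterpret_cast<const uint8_t *>(values.data());
>     for (size_t v = 0; v < n; ++v)
>       for (size_t i = 0; i < sizeof(float); ++i)
>         split[i * n + v] = raw[v * sizeof(float) + i];
>     return split;
>   }
>
> For buf = [af, bf] this produces exactly the newbuf layout above.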
>
> Due to similarities in the sign bits, the exponent bits, and the most
> significant mantissa bits, we may get many more repetitions in the
> data. For scientific data, the 2nd and 3rd bytes of each 32-bit value
> are probably largely noise. Thus, in the original representation we
> would always have a few bytes of data which could appear somewhere
> else in the buffer, followed by a couple of bytes of likely noise. In
> the new representation we get long streams of data which should
> compress well, followed by a stream of noise towards the end.
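>
> To make the similarity concrete, here is a tiny self-contained check
> (the values are chosen purely for illustration): nearby floats agree
> in their leading bytes, while the low-order mantissa bytes carry the
> variation.
>
>   #include <cstdint>
>   #include <cstdio>
>   #include <cstring>
>
>   int main() {
>     const float vals[] = {1.0f, 1.25f, 1.5f};
>     for (float f : vals) {
>       uint32_t u;
>       std::memcpy(&u, &f, sizeof u);  // reinterpret the IEEE-754 bits
>       unsigned b0 = u >> 24, b1 = (u >> 16) & 0xff;
>       unsigned b2 = (u >> 8) & 0xff, b3 = u & 0xff;
>       std::printf("%-5g -> %02x %02x %02x %02x\n", f, b0, b1, b2, b3);
>     }
>     return 0;
>   }
>
> This prints 3f 80 00 00, 3f a0 00 00 and 3f c0 00 00: the first bytes
> form a constant run once the streams are split.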
>
> This transformation improved the compression ratio, as can be seen in
> the report.
>
> It also improved speed for zstd. This could be because zstd decides
> how to compress each block of data: RLE, a new Huffman tree, the
> Huffman tree of the previous block, or the raw representation. Each
> choice can achieve a different compression ratio and
> compression/decompression speed. It turned out that when the
> transformation is applied, zstd attempts to compress fewer blocks and
> copies the others verbatim, which could mean fewer attempts to build
> a Huffman tree. It's hard to pinpoint the exact reason.
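>
> As a sketch of how one could measure this, zstd's single-shot API
> from <zstd.h> can compress the raw buffer and the split buffer at the
> same level, and the returned sizes can be compared (the helper itself
> is illustrative):
>
>   #include <zstd.h>
>
>   #include <cstdint>
>   #include <vector>
>
>   size_t CompressedSize(const std::vector<uint8_t> &src, int level) {
>     std::vector<uint8_t> dst(ZSTD_compressBound(src.size()));
>     const size_t n = ZSTD_compress(dst.data(), dst.size(), src.data(),
>                                    src.size(), level);
>     return ZSTD_isError(n) ? 0 : n;  // 0 signals an error here
>   }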
>
> I did not try other general-purpose lossless compressors, but I
> expect similar results.
>
> For code, I can polish my patches, create a Jira task, and submit
> them for review.
>
>
> Regards,
>
> Martin
>
>
> ________________________________
> From: Fan Liya <liya.fan03 at gmail.com>
> Sent: Thursday, July 11, 2019 11:32:53 AM
> To: dev at arrow.apache.org
> Cc: Raoofy, Amir; Karlstetter, Roman
> Subject: Re: Adding a new encoding for FP data
>
> Hi Radev,
>
> Thanks for the information. It seems interesting.
> IMO, Arrow has much to do for data compression. However, there seem
> to be some differences between compressing in-memory data and
> compressing data for external storage.
>
> Could you please provide a reference for stream splitting?
>
> Best,
> Liya Fan
>
> On Thu, Jul 11, 2019 at 5:15 PM Radev, Martin <martin.radev at tum.de> wrote:
>
>> Hello people,
>>
>>
>> there has been a discussion on the Apache Parquet mailing list about
>> adding a new encoder for FP data.
>> The reason for this is that the compressors supported by Apache
>> Parquet (zstd, gzip, etc.) do not compress raw FP data well.
>>
>>
>> In my investigation, it turned out that a very simple technique,
>> named stream splitting, can improve the compression ratio, and even
>> the speed, for some of the compressors.
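>>
>> A sketch of the inverse (decoding) transform, assuming the
>> byte-stream layout shown above (the name is illustrative):
>>
>>   #include <cstdint>
>>   #include <vector>
>>
>>   std::vector<float> StreamUnsplit(const std::vector<uint8_t> &split) {
>>     const size_t n = split.size() / sizeof(float);
>>     std::vector<float> values(n);
>>     uint8_t *raw = reinterpret_cast<uint8_t *>(values.data());
>>     for (size_t v = 0; v < n; ++v)
>>       for (size_t i = 0; i < sizeof(float); ++i)
>>         // gather byte i of value v back into place
>>         raw[v * sizeof(float) + i] = split[i * n + v];
>>     return values;
>>   }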
>>
>> You can read about the results here:
>> https://drive.google.com/file/d/1wfLQyO2G5nofYFkS7pVbUW0-oJkQqBvv/view
>>
>>
>> I went through the developer guide for Apache Arrow and wrote a patch to
>> add the new encoding and test coverage for it.
>>
>> I will polish my patch and work in parallel to extend the Apache Parquet
>> format for the new encoding.
>>
>>
>> If you have any concerns, please let me know.
>>
>>
>> Regards,
>>
>> Martin
>>
>>
>
>