generate_compressor_model.py generates invalid .h files #27

hansmaad · 2017-02-11T09:18:24Z

I'm using quite a large file (7 MB) containing only ASCII characters. The words look like this:

hninetrakierf hnisnoitknuf hrettuf hcsnedobSuf htsrUf g hbeilnetrag htsag...

With default options, the tool generates a model file where each row of successor_ids_by_chr_id_and_chr_id looks like this:

static const int8_t successor_ids_by_chr_id_and_chr_id[32][32] = {
  {-1, 9, 3, 10, 14, -1, 4, 1, 6, 5, -1, 0, ... -1, -1, -1, -1, -1, None},
  {0, -1, 1, 7, 4, 6, 2, 3, 5, 8, 12, 10, 1..., -1, -1, -1, -1, -1, None},

If you don't have time to fix this, do you have any idea how I could repair this file?
If I replace None with -1, the compression is slightly better than with the default shoco_model.h file, but maybe I could do better?
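One way to do that replacement without hand-editing is a small script that swaps every bare None token in the header for -1. This is a minimal sketch, assuming None only ever appears as a placeholder inside the generated array initializers (as in the rows above); the script name repair_model.py is hypothetical.

```python
import re
import sys
from pathlib import Path


def repair_header(text: str, fill: str = "-1") -> str:
    """Replace every bare `None` token with a valid int8_t value.

    The \\b word boundaries keep identifiers that merely contain "None"
    (e.g. "NoneType") untouched.
    """
    return re.sub(r"\bNone\b", fill, text)


def main(argv: list) -> None:
    # Usage (hypothetical script name): python repair_model.py shoco_model.h
    for name in argv[1:]:
        path = Path(name)
        path.write_text(repair_header(path.read_text()))


if __name__ == "__main__":
    main(sys.argv)
```

Running it with no arguments does nothing; pass the generated header's path to rewrite it in place.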


EverydayApps · 2019-12-20T18:02:28Z

For anyone else who stumbles upon this issue: I just ran the script and got the same result.

I only have 30 unique characters in my data set, so I get two Nones (Python's null value) in each of those inner arrays. I'm going to guess that you have 31 unique characters in yours.
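Following that reasoning (each character missing from the 32-slot alphabet leaves one None per row), you can check your own count by tallying the None tokens in one full row of the generated header. unique_chars_from_row is a hypothetical helper; the default of 32 comes from the [32][32] declaration in the original post.

```python
import re


def unique_chars_from_row(row: str, table_size: int = 32) -> int:
    """Infer the number of distinct characters from one generated row.

    Assumes, as observed in this thread, that every slot not filled by a
    real character shows up as exactly one None in each row.
    """
    missing = len(re.findall(r"\bNone\b", row))
    return table_size - missing
```

The rows quoted in the issue are truncated ("..."), so copy a complete row from the file before running this.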

Those values are never read, so anything between INT8_MIN and INT8_MAX (-128 and 127) should work. You could also just delete those slots (the compiler then zero-initializes them), use the null character, or put in any arbitrary number. However, the loop that reads those values short-circuits if the value is less than zero, so if for some reason they were read (they won't be), you would want them to be negative.

There's probably a bug in the Python script that prints Python's null object ("None") verbatim instead of a valid C value such as the null character ('\0').

Count the number of elements in chrs_by_chr_id[32]. In my case there's space for 32, but the array holds only thirty characters. The last two entries are missing (so they are null), and the compression loop stops when it reaches a null entry, so it never gets to the inner part that would read the None slots. In other words, whatever you replace None with, no further compression gain is possible.
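To make that argument concrete, here is a simplified Python model of the behaviour described above, not shoco's actual C code: character IDs come from a linear scan of a 32-slot chrs_by_chr_id that stops at the first '\0' padding entry, and a pair lookup is skipped whenever either ID is negative. Under those assumptions, no successor cell beyond the real characters is ever touched. chr_id_of and touched_cells are hypothetical names.

```python
TABLE_SIZE = 32


def chr_id_of(ch: str, chrs_by_chr_id: list) -> int:
    """Return the ID of ch, or -1 if it isn't in the model.

    The scan stops at the first '\\0' padding entry, mirroring the
    "loop breaks when it gets the null value" behaviour described above.
    """
    for i, c in enumerate(chrs_by_chr_id):
        if c == "\0":
            break
        if c == ch:
            return i
    return -1


def touched_cells(text: str, chrs_by_chr_id: list) -> set:
    """Collect every (row, col) of the successor table a pair lookup reads."""
    cells = set()
    for a, b in zip(text, text[1:]):
        ia = chr_id_of(a, chrs_by_chr_id)
        ib = chr_id_of(b, chrs_by_chr_id)
        if ia < 0 or ib < 0:  # negative ID: short-circuit, no table read
            continue
        cells.add((ia, ib))
    return cells


# Three real characters, padded with '\0' like a partially initialized C array.
model = list("abc") + ["\0"] * (TABLE_SIZE - 3)
print(sorted(touched_cells("abcabcxyz", model)))
```

Every cell it reports has row and column below 3, so padding slots (where the Nones live) are never read.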

Still, the Python script could be fixed.
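A fix on the generator side would be to map None to a sentinel whenever a table is written out. I haven't checked how generate_compressor_model.py actually builds its output, so format_int8_table below is a hypothetical helper, not code from the script; it writes -1 for missing entries (the negative value the lookup loop short-circuits on) and refuses values outside the int8_t range.

```python
from typing import Optional, Sequence

INT8_MIN, INT8_MAX = -128, 127


def format_int8_table(
    name: str, rows: Sequence[Sequence[Optional[int]]], fill: int = -1
) -> str:
    """Render a 2-D table as a C int8_t initializer.

    Writes `fill` wherever the Python data holds None, instead of the
    literal text "None" that the C compiler rejects.
    """
    lines = [f"static const int8_t {name}[{len(rows)}][{len(rows[0])}] = {{"]
    for row in rows:
        values = []
        for v in row:
            value = fill if v is None else v
            if not INT8_MIN <= value <= INT8_MAX:
                raise ValueError(f"{value} does not fit in int8_t")
            values.append(str(value))
        lines.append(f"  {{{', '.join(values)}}},")
    lines.append("};")
    return "\n".join(lines)


print(format_int8_table("successor_ids_by_chr_id_and_chr_id", [[0, None], [None, 1]]))
```

Any in-range fill value would compile; -1 keeps the "negative means no successor" convention in case a slot is ever read.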
