Merge pull request #3 from cubicibo/MR/managepalette
Implement ahead-of-time decoding, double buffering, and event filtering.
cubicibo authored Aug 2, 2023
2 parents d8975f2 + 37e611e commit 60cc4ad
Showing 13 changed files with 781 additions and 804 deletions.
36 changes: 21 additions & 15 deletions README.md
@@ -1,23 +1,30 @@
# SUPer
SUPer is a subtitle rendering and manipulation tool specifically for the PGS (SUP) format. Unlike any other .SUP exporting tool, SUPer re-renders the subtitle graphics internally to make full use of the BDSup format. Caption files generated with SUPer can feature softsub karaokes, masking and fades and are likely to work nicely on your favorite Blu-Ray player.
SUPer is a tool to convert BDNXML+PNG assets to Blu-ray SUP subtitles.
Unlike any other .SUP conversion tool, SUPer analyzes and re-renders the subtitle graphics internally to make full use of the BD SUP format (Presentation Graphic Stream). Caption files generated with SUPer can feature softsub karaokes, masking, fades and basic moves, and are guaranteed to work nicely on your favorite Blu-ray player.

## Usage
SUPer is made easy to use with the graphical user interface `supergui.py` - it lets you choose your input BDNXML, the output file name and optionally a SUP file to merge with. A command line client is also available as `supercli.py`. See below for further details.

## Suggested workflow
- Generate a BDNXML with PNG assets using ass2bdnxml, avs2bdnxml or SubtitleEdit.
- Use SUPer to convert the BDNXML to a BD SUP; simply load a BDNXML file in the GUI, set an output file and have an espresso while the fan spins.
The common usage is the following:
- Generate a BDNXML with PNG assets using [ass2bdnxml](https://github.com/cubicibo/ass2bdnxml) or avs2bdnxml.
- Use SUPer to convert the BDNXML to a Blu-ray SUP; simply load a BDNXML file in the GUI, set an output file and have an espresso while the fan spins.

## GUI client
This is the client executed when you download the stand-alone binary or when you run `python3 supergui.py`. Its interface is very simple yet covers all types of conversion.
This is the client executed when you download the stand-alone binary or when you run `python3 supergui.py`. Its interface is very simple yet covers all types of conversion. The GUI always runs alongside a command-line window which shows conversion progress and logging information.

- Select the input BDN XML file. The file must reside in the same directory as the PNG assets.
- Select the desired output file and extension using the Windows explorer.
- "Make it SUPer" starts the conversion process. The actual conversion progress is printed in the command line window.

The GUI supports two output formats: SUP and PES+MUI (Scenarist BD).

## Command line client
`supercli.py` is essentially the command line equivalent to `supergui.py`.

### Usage
### CLI Usage
`python3 supercli.py [PARAMETERS] outputfile`

### Parameters
### CLI Parameters
```
-i, --input Input BDNXML file.
-c, --compression Time threshold for acquisitions. [int, 0-100, def: 80],
@@ -33,17 +40,16 @@ This is the client executed when you download the stand-alone binary or when you
```
The output file extension is used to infer the desired output type (SUP or PES).
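
For instance, a minimal conversion from a BDNXML to a SUP (file names are illustrative; other parameters keep their defaults) could look like:
```
python3 supercli.py -i subtitles.xml -c 80 output.sup
```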

## Misc
Some miscellaneous notes and trivia about SUPer, how it works and manages to generate complex streams with animations that work on hardware decoders:
## How SUPer works
SUPer implements a conversion engine that uses the entirety of the PG specs described in the two patents US8638861B2 and US20090185789A1. PG decoders, while designed to be as cheap as possible, feature a few nifty capabilities that include palette updates, object redefinition, object cropping and events buffering.

### Behind the scene
SUPer tries to re-use existing objects in the stream and exploits PG decoder capabilities like palette updates to encode animations. This saves bandwidth significantly and makes it possible to perform animations that are otherwise impossible given the limited bandwidth of the PG object decoder.
SUPer analyzes each input image and encodes a sequence of similar images together into a single presentation graphic (bitmap). This PG object has the animation encoded into it, and a sequence of palette updates then displays the successive images. This dramatically reduces the decoding and composition bandwidth and allows complex animations to play while the hardware PG decoder is busy decoding the next PG objects.
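
The sketch below illustrates the idea (it is not SUPer's actual optimiser, which lives in `SUPer/optim.py`; the function name and error handling are hypothetical): every pixel's colour sequence across the grouped frames becomes one palette entry, so the whole animation collapses into a single index bitmap plus one CLUT per frame.
```python
import numpy as np

def encode_palette_animation(frames: list[np.ndarray], max_colors: int = 256):
    """Collapse N same-sized RGBA frames (HxWx4, uint8) into one index
    bitmap plus one palette per frame; cycling the palettes animates it."""
    stack = np.stack(frames, axis=2)              # (H, W, N, 4)
    h, w, n, _ = stack.shape
    index_map = np.zeros((h, w), dtype=np.uint8)
    seq_to_idx: dict[bytes, int] = {}
    for i in range(h):
        for j in range(w):
            key = stack[i, j].tobytes()           # this pixel's colour over time
            if key not in seq_to_idx:
                if len(seq_to_idx) == max_colors:
                    raise ValueError("more unique sequences than palette entries")
                seq_to_idx[key] = len(seq_to_idx)
            index_map[i, j] = seq_to_idx[key]
    # Palette entry k of frame t is the colour that sequence k takes at time t.
    seqs = np.stack([np.frombuffer(k, dtype=np.uint8).reshape(n, 4)
                     for k in seq_to_idx])        # (entries, N, 4)
    palettes = [seqs[:, t, :] for t in range(n)]  # one (entries, 4) CLUT per frame
    return index_map, palettes
```
A full optimiser would additionally merge near-identical sequences by a distance measure when the palette budget is exceeded, rather than bail out.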

### PGS Limitations to keep in mind
- There can be at most two PGS objects on screen at a time. SUPer packs as many subtitle lines as it can into a single PGS object and minimizes the window areas in which said objects are displayed. Palette updates are then used to show or hide specific lines associated with a given object.
- A hardware PG decoder has a limited bandwidth and can refresh an object only every so often. SUPer distributes the object definitions in the stream to ease the work of the decoder. SUPer then uses palette updates to fill in the missing "steps" between two object definitions. However, SUPer defines the steps depending on a similarity measure with the previous bitmaps. If the image changes too much, SUPer must insert the new object in the stream, as visual quality remains the most important aspect.

- A hardware PG decoder has a limited bandwidth and can refresh an object only every so often. SUPer distributes the object definitions in the stream and uses double buffering to ease the work of the decoder. However, the bigger the objects (= windows), the longer they take to decode. SUPer may be forced to drop events now and then if an event cannot be decoded and displayed in due time. This will happen frequently if the graphics differ excessively between successive events.
- Moves, within a reasonable area, are doable at lower framerates like 23.976, 24 or 25. The ability to perform moves diminishes if the epoch is complex or if the PG windows within which the objects are displayed are large.

## Special thanks
- TheScorpius666, NLScavenger, Prince 7, Masstock
- FFmpeg libavcodec pgssubdec authors
- TheScorpius666, Masstock, NLScavenger, Prince 7
- FFmpeg libavcodec pgssubdec authors
2 changes: 1 addition & 1 deletion SUPer/__metadata__.py
@@ -18,7 +18,7 @@

__MAJOR = 0
__MINOR = 1
__REVISION = 7
__REVISION = 8

__name__ = "SUPer"
__version__ = '.'.join(map(str, [__MAJOR, __MINOR, __REVISION]))
94 changes: 32 additions & 62 deletions SUPer/interface.py
@@ -23,6 +23,7 @@
from scenaristream import EsMuiStream

from .utils import Shape, TimeConv as TC, _pinit_fn, get_super_logger
from .pgraphics import PGDecoder
from .render2 import GroupingEngine, WOBSAnalyzer, is_compliant
from .filestreams import BDNXML, SUPFile

@@ -33,11 +34,7 @@ def __init__(self, bdnf: str, kwargs: dict[str, int]) -> None:
self.bdn_file = bdnf
self._epochs = []
self.skip_errors = kwargs.pop("skip_errors", False)
#Leave norm threshold to zero, it can generate unexpected behaviours.
#Colors should be 256. Anything above is illegal, anything below results in a
# loss of quality.
self.kwargs = {'colors': 256}
self.kwargs |= kwargs
self.kwargs = kwargs

def optimise(self) -> None:
kwargs = self.kwargs
@@ -54,76 +51,62 @@ def optimise(self) -> None:
sys.exit(1)

clip_framerate = bdn.fps
if self.kwargs.pop('adjust_dropframe', False):
if self.kwargs.get('adjust_dropframe', False):
if isinstance(bdn.fps, float):
bdn.fps = round(bdn.fps)
logger.info(f"NTSC timing flag: using {bdn.fps} for timestamps rather than BDNXML {clip_framerate:.03f}.")
logger.info(f"NTSC timing flag: using {round(bdn.fps)} for timestamps rather than BDNXML {clip_framerate:.03f}.")
else:
self.kwargs['adjust_dropframe'] = False
logger.warning("Ignored NDF flag with integer framerate.")

logger.info("Finding epochs...")

#Empirical max: we need <=6 frames @23.976 to clear the buffers and windows.
# This is doing coarse epoch definitions, without any consideration to
# what's being displayed on screen.
delay_refresh = 0.01+0.25*np.multiply(*bdn.format.value)/(1920*1080)
for group in bdn.groups(delay_refresh):
offset = len(group)-1
#In the worst case, there is a single composition object for the whole screen.
screen_area = np.multiply(*bdn.format.value)
epochstart_dd_fn = lambda o_area: max(PGDecoder.copy_gp_duration(screen_area), PGDecoder.decode_obj_duration(o_area)) + PGDecoder.copy_gp_duration(o_area)
#Round to the nearest 90 kHz tick
epochstart_dd_fnr = lambda o_area: round(epochstart_dd_fn(o_area)*PGDecoder.FREQ)/PGDecoder.FREQ

for group in bdn.groups(epochstart_dd_fn(screen_area)):
subgroups = []
last_split = len(group)
largest_shape = Shape(0, 0)

#Backward pass for fine epochs definition
# We consider the delay between events and the size of the overall
# graphic that we want to display.
for k, event in enumerate(reversed(group[1:])):
offset -= 1
if np.multiply(*group[offset].shape) > np.multiply(*largest_shape):
largest_shape = event.shape
nf = TC.tc2f(event.tc_in, bdn.fps) - TC.tc2f(group[offset].tc_out, bdn.fps)

if nf > 0 and nf/bdn.fps > 3*_pinit_fn(largest_shape)/90e3:
subgroups.append(group[offset+1:last_split])
last_split = offset + 1
if group[offset+1:last_split] != []:
subgroups.append(group[offset+1:last_split])
if subgroups:
subgroups[-1].insert(0, group[0])
offset = len(group)
max_area = 0

for k, event in enumerate(reversed(group[1:]), 1):
max_area = max(np.multiply(*event.shape), max_area)

delay = TC.tc2s(event.tc_in, bdn.fps) - TC.tc2s(group[len(group)-k-1].tc_out, bdn.fps)
if epochstart_dd_fnr(max_area) <= delay:
subgroups.append(group[offset-k:offset])
offset -= len(subgroups[-1])
max_area = 0
if len(group[:offset]) > 0:
subgroups.append(group[:offset])
else:
subgroups = [[group[0]]]
assert offset == 0
assert sum(map(len, subgroups)) == len(group)

#Epoch generation (each subgroup will be its own epoch)
for subgroup in reversed(subgroups):
logger.info(f"Generating epoch {subgroup[0].tc_in}->{subgroup[-1].tc_out}...")
logger.info(f"Identified epoch {subgroup[0].tc_in}->{subgroup[-1].tc_out}:")

wob, box = GroupingEngine(n_groups=2, **kwargs).group(subgroup)
logger.info(f" => Screen layout: {len(wob)} window(s), analyzing objects...")

wobz = WOBSAnalyzer(wob, subgroup, box, clip_framerate, bdn, **kwargs)
epoch = wobz.analyze()
self._epochs.append(epoch)
logger.info(f" => optimised as {len(epoch)} display sets on {len(wob)} window(s).")
logger.info(f" => optimised as {len(epoch)} display sets.")
gc.collect()

if clip_framerate != bdn.fps:
self.ndf_shift(bdn, clip_framerate)

scaled_fps = False
if self.kwargs.get('scale_fps', False):
scaled_fps = self.scale_pcsfps()

if self.kwargs.get('enforce_dts', False):
self.compute_set_dts()

# Final check
is_compliant(self._epochs, bdn.fps * int(1+scaled_fps))
is_compliant(self._epochs, bdn.fps * int(1+scaled_fps), self.kwargs.get('enforce_dts', True))
####

def ndf_shift(self, bdn: BDNXML, clip_framerate: float) -> None:
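#Presumably, per the NTSC flag above: timestamps were computed at the rounded
# integer rate (e.g. 24), so scale by 1.001 to map them back to the NTSC rate
# (24/1.001), minus a small 3-tick safety margin on the 90 kHz clock.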
adjustment_ratio = 1.001
for epoch in self._epochs:
for ds in epoch:
for seg in ds:
seg.pts = seg.pts*adjustment_ratio - 3/90e3

def scale_pcsfps(self) -> bool:
from SUPer.utils import BDVideo
pcs_fps = self._epochs[0].ds[0].pcs.fps.value
Expand All @@ -136,19 +119,6 @@ def scale_pcsfps(self) -> bool:
logger.error(f"Expexcted 25 or 30 fps for 2x scaling. Got '{BDVideo.LUT_FPS_PCSFPS[pcs_fps]}'.")
return scaled_fps

def compute_set_dts(self) -> None:
logger.info("Setting DTS values in the stream.")
prev_ds_pts = 0
for epoch in self._epochs:
for ds in epoch:
for seg in ds:
# -0.3735 because: (decode 4 MiB + screen flush + screen refresh)
# i.e. this is the max shift we would need in the worst case
seg.dts = max(seg.pts - 0.3735, prev_ds_pts)
seg.dts = seg.pts #enforce == for END segment
# set DTS one tick in the future.
prev_ds_pts = seg.pts + 1/90e3

def merge(self, input_sup) -> None:
epochs = SUPFile(input_sup).epochs()
if not self._epochs:
Expand Down
116 changes: 4 additions & 112 deletions SUPer/optim.py
@@ -46,7 +46,9 @@ def quantize(img: Image.Image, colors: int = 256, kmeans_quant: bool = False, km
#use cv2 for high transparency images, pillow has issues

alpha = np.asarray(img.split()[-1], dtype=np.uint16)
kmeans_fade = (np.mean(alpha[alpha > 0]) < 38) and kmeans_fade
non_tsp_pix = alpha[alpha > 0]
if non_tsp_pix.size > 0:
kmeans_fade = (np.mean(non_tsp_pix) < 38 * (1 + kwargs.get('tsp_thresh', 0))) and kmeans_fade

if kmeans_quant or kmeans_fade:
# Use PIL to get approximate number of clusters
Expand Down Expand Up @@ -95,7 +97,7 @@ def palettize_events(events: list[ImageEvent], flags: PalettizeMode,
:return: The events with optimised images.
"""
if 2 <= colors > 256:
raise ValueError("Palettization is always performed on 2<'colors'<=256.")
raise ValueError("Palettization is always performed on 2< colors <=256.")

if not PalettizeMode(flags):
logging.info("No known optimisation selected, skipping.")
@@ -213,115 +215,6 @@ def palettize_img(img: Image, pal: npt.NDArray[np.uint8], *,


class Optimise:
@staticmethod
def prepare_sequence(events: list[ImageEvent], **kwargs) -> tuple[npt.NDArray[np.uint8],
npt.NDArray[np.uint8],
list[int]]:
"""
This function gets a list of images to optimize as a single one + PAL updates
:param events: Set of images events where a palette animation takes places.
:return: color look up table for each image (stacked), sequence of pixels and
length of each CLUT
"""

n_colors = kwargs.get('colors', 256)

maps = []
clut = []
clut_lens = []
for event in events:
img, img_pal = Preprocess.quantize(event.img, n_colors, **kwargs)
maps.append(img)

a = np.zeros((256, 4))
pilpal = list(img_pal.keys())
clut_lens.append(len(pilpal))

b = np.asarray(pilpal, dtype=np.uint8)
a[:b.shape[0], :b.shape[1]] = b
clut.append(a)

cluts = np.stack(clut).astype(np.uint8) # Stack all CLUT in the sequence
px_sequences = np.asarray(maps, dtype=np.uint8) #All cmaps

return cluts, px_sequences, clut_lens

@staticmethod
def solve_sequence(cluts, cmaps, clut_len, **kwargs) -> tuple[npt.NDArray[np.uint8],
npt.NDArray[np.uint8]]:
"""
This function finds a solution for the provided subtitle animation.
:param cluts: Color look-up tables of each bitmap, stacked one after the other
:param cmaps: P-Images linked to their respective CLUT, stacked like CLUTs
:param clut_len: Length for each CLUT
:param **kwargs: Additional parameters to adjust inner params of the solver.
:return: P-Image for the PGStream, Sequence of RGBA values for the animation.
"""

N_SEQUENCES_MAX = kwargs.get('colors', 256)

sequences = [clut[cmap] for clut, cmap in zip(cluts, cmaps)]
sequences = np.stack(sequences, axis=2).astype(np.uint8) #(LEN_SEQ, H, W, RGBA=4)

#Find all sequences and count them
seq_occ = {}
for i in range(sequences.shape[0]):
for j in range(sequences.shape[1]):
seq = hash(sequences[i, j, :, :].data.tobytes())
try:
seq_occ[seq][0] += 1
except KeyError:
seq_occ[seq] = [1, sequences[i, j, :, :]]

#Sort sequences by commonness
seq_sorted = {k: v[1] for k, v in list(sorted(seq_occ.items(),
key=lambda item: item[1][0],
reverse=True))}

#Fill a new array with kept sequences to perform fast norm calculations
norm_mat = np.ndarray((N_SEQUENCES_MAX,
sequences[i,j,:,:].shape[0],
sequences[i,j,:,:].shape[1]))
seqs, cnt = {}, 0
remap = {}

for k, v in seq_sorted.items():
if cnt < N_SEQUENCES_MAX:
seqs[k] = (cnt, v) # cnt will be the CLUT id for v
norm_mat[cnt, :, :] = v
cnt += 1
elif k not in remap:
nm = np.linalg.norm(norm_mat - v[None, :], 2, axis=2)

id1 = np.argsort(np.sum(nm, axis=1))
id2 = np.argsort(np.sum(nm, axis=1)/np.sum(nm != 0, axis=1))

best_fit = np.abs(id1 - id2[:, None])
id1_i, id2_i = best_fit.argmin() % id1.size, best_fit.argmin()//id1.size

assert id1[id1_i] == id2[id2_i], "Something inconceivable has happened."

remap[k] = hash(norm_mat[id1[id1_i]].astype(np.uint8).data.tobytes())

out_map = np.zeros(sequences.shape[0:2], dtype=np.uint8)

for i in range(sequences.shape[0]):
for j in range(sequences.shape[1]):
seq_hash = hash(sequences[i, j, :, :].data.tobytes())
if seq_hash in seqs:
assert np.all(sequences[i, j] == seqs[seq_hash][1]), \
"Sequences did not match (hash collision?)"
out_map[i, j] = seqs[seq_hash][0]
elif seq_hash in remap:
out_map[i, j] = seqs[remap[seq_hash]][0]
else:
logging.error("Sequence not found in any map.")
# Output map, Sequences for the N_SEQUENCES_MAX RGBA values
return out_map, \
np.asarray(list(seq_sorted.values())[:N_SEQUENCES_MAX]).astype(np.uint8)

@staticmethod
def solve_sequence_fast(events, colors: int = 256, **kwargs) -> tuple[npt.NDArray[np.uint8], npt.NDArray[np.uint8]]:
"""
@@ -340,7 +233,6 @@ def solve_sequence_fast(events, colors: int = 256, **kwargs) -> tuple[npt.NDArra
sequences.append(clut[img])

sequences = np.stack(sequences, axis=2).astype(np.uint8)

#catalog the sequences
seq_occ: dict[int, tuple[int, npt.NDArray[np.uint8]]] = {}
for i in range(sequences.shape[0]):