Merge pull request #3 from cubicibo/MR/managepalette
Implement ahead-of-time decoding, double buffering, and event filtering.
cubicibo authored Aug 2, 2023
2 parents d8975f2 + 37e611e commit 60cc4ad
Showing 13 changed files with 781 additions and 804 deletions.
36 changes: 21 additions & 15 deletions README.md
@@ -1,23 +1,30 @@
# SUPer
SUPer is a subtitle rendering and manipulation tool specifically for the PGS (SUP) format. Unlike any other .SUP exporting tool, SUPer re-renders the subtitle graphics internally to make full use of the BDSup format. Caption files generated with SUPer can feature softsub karaokes, masking and fades and are likely to work nicely on your favorite Blu-Ray player.
SUPer is a tool to convert BDNXML+PNG assets to Blu-ray SUP subtitles.
Unlike any other .SUP conversion tool, SUPer analyzes and re-renders the subtitle graphics internally to make full use of the BD SUP format (Presentation Graphic Stream). Caption files generated with SUPer can feature softsub karaokes, masking, fades and basic moves, and are guaranteed to work nicely on your favorite Blu-ray player.

## Usage
SUPer is made easy to use with the graphical user interface `supergui.py` - it lets you choose your input BDNXML, the output file name and optionally a SUP file to merge with. A command line client is also available as `supercli.py`. See below for further details.

## Suggested workflow
- Generate a BDNXML with PNG assets using ass2bdnxml, avs2bdnxml or SubtitleEdit.
- Use SUPer to convert the BDNXML to a BD SUP; simply load a BDNXML file in the GUI, set an output file and have an espresso while the fan spins.
The common usage is the following:
- Generate a BDNXML with PNG assets using [ass2bdnxml](https://github.com/cubicibo/ass2bdnxml) or avs2bdnxml.
- Use SUPer to convert the BDNXML to a Blu-ray SUP; simply load a BDNXML file in the GUI, set an output file and have an espresso while the fan spins.

## GUI client
This is the client executed when you download the stand-alone binary or when you run `python3 supergui.py`. Its interface is very simple yet covers all types of conversion.
This is the client executed when you download the stand-alone binary or when you run `python3 supergui.py`. Its interface is very simple yet covers all types of conversion. The GUI always runs alongside a command-line window which shows conversion progress and logging information.

- Select the input BDN XML file. The file must reside in the same directory as the PNG assets.
- Select the desired output file and extension using the Windows explorer.
- "Make it SUPer" starts the conversion process. The actual conversion progress is printed in the command line window.

The GUI supports two output formats: SUP and PES+MUI (Scenarist BD).

## Command line client
`supercli.py` is essentially the command line equivalent to `supergui.py`.

### Usage
### CLI Usage
`python3 supercli.py [PARAMETERS] outputfile`

### Parameters
### CLI Parameters
```
-i, --input Input BDNXML file.
-c, --compression Time threshold for acquisitions. [int, 0-100, def: 80],
@@ -33,17 +40,16 @@ This is the client executed when you download the stand-alone binary or when you
```
The output file extension is used to infer the desired output type (SUP or PES).
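
For instance, a minimal conversion from a BDNXML to a SUP (file names are illustrative; other parameters keep their defaults) could look like:
```
python3 supercli.py -i subtitles.xml -c 80 output.sup
```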

## Misc
Some miscellaneous notes and trivia about SUPer, how it works and manages to generate complex streams with animations that work on hardware decoders:
## How SUPer works
SUPer implements a conversion engine that uses the entirety of the PG specs described in the two patents US8638861B2 and US20090185789A1. PG decoders, while designed to be as cheap as possible, feature a few nifty capabilities that include palette updates, object redefinition, object cropping and events buffering.

### Behind the scene
SUPer tries to re-use existing objects in the stream and exploits PG decoder capabilities like palette updates to encode animations. This saves bandwidth significantly and makes it possible to perform animations that are otherwise impossible given the limited bandwidth of the PG object decoder.
SUPer analyzes each input image and encodes a sequence of similar images together into a single presentation graphic (bitmap). This PG object has the animation encoded into it, and a sequence of palette updates then displays the successive images. This dramatically reduces the decoding and composition bandwidth and allows complex animations to play while the hardware PG decoder is busy decoding the next PG objects.
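
The sketch below illustrates the idea (it is not SUPer's actual optimiser, which lives in `SUPer/optim.py`; the function name and error handling are hypothetical): every pixel's colour sequence across the grouped frames becomes one palette entry, so the whole animation collapses into a single index bitmap plus one CLUT per frame.
```python
import numpy as np

def encode_palette_animation(frames: list[np.ndarray], max_colors: int = 256):
    """Collapse N same-sized RGBA frames (HxWx4, uint8) into one index
    bitmap plus one palette per frame; cycling the palettes animates it."""
    stack = np.stack(frames, axis=2)              # (H, W, N, 4)
    h, w, n, _ = stack.shape
    index_map = np.zeros((h, w), dtype=np.uint8)
    seq_to_idx: dict[bytes, int] = {}
    for i in range(h):
        for j in range(w):
            key = stack[i, j].tobytes()           # this pixel's colour over time
            if key not in seq_to_idx:
                if len(seq_to_idx) == max_colors:
                    raise ValueError("more unique sequences than palette entries")
                seq_to_idx[key] = len(seq_to_idx)
            index_map[i, j] = seq_to_idx[key]
    # Palette entry k of frame t is the colour that sequence k takes at time t.
    seqs = np.stack([np.frombuffer(k, dtype=np.uint8).reshape(n, 4)
                     for k in seq_to_idx])        # (entries, N, 4)
    palettes = [seqs[:, t, :] for t in range(n)]  # one (entries, 4) CLUT per frame
    return index_map, palettes
```
A full optimiser would additionally merge near-identical sequences by a distance measure when the palette budget is exceeded, rather than bail out.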

### PGS Limitations to keep in mind
- There can be at most two PGS objects on screen at a time. SUPer packs as many subtitle lines as it can into a single PGS object and minimizes the window areas in which said objects are displayed. Palette updates are then used to show or hide specific lines associated with a given object.
- A hardware PG decoder has a limited bandwidth and can refresh an object only every so often. SUPer distributes the object definitions in the stream to ease the work of the decoder. SUPer then uses palette updates to fill in the missing "steps" between two object definitions. However, SUPer defines the steps depending on a similarity measure with the previous bitmaps. If the image changes too much, SUPer must insert the new object in the stream, as visual quality remains the most important aspect.

- A hardware PG decoder has a limited bandwidth and can refresh an object only every so often. SUPer distributes the object definitions in the stream and uses double buffering to ease the work of the decoder. However, the bigger the objects (= windows), the longer they take to decode. SUPer may be forced to drop events now and then if an event cannot be decoded and displayed in due time. This will happen frequently if the graphics differ excessively between successive events.
- Moves, within a reasonable area, are doable at lower framerates like 23.976, 24 or 25. The ability to perform moves diminishes if the epoch is complex or if the PG windows within which the objects are displayed are large.

## Special thanks
- TheScorpius666, NLScavenger, Prince 7, Masstock
- FFmpeg libavcodec pgssubdec authors
- TheScorpius666, Masstock, NLScavenger, Prince 7
- FFmpeg libavcodec pgssubdec authors
2 changes: 1 addition & 1 deletion SUPer/__metadata__.py
@@ -18,7 +18,7 @@

__MAJOR = 0
__MINOR = 1
__REVISION = 7
__REVISION = 8

__name__ = "SUPer"
__version__ = '.'.join(map(str, [__MAJOR, __MINOR, __REVISION]))
94 changes: 32 additions & 62 deletions SUPer/interface.py
@@ -23,6 +23,7 @@
from scenaristream import EsMuiStream

from .utils import Shape, TimeConv as TC, _pinit_fn, get_super_logger
from .pgraphics import PGDecoder
from .render2 import GroupingEngine, WOBSAnalyzer, is_compliant
from .filestreams import BDNXML, SUPFile

@@ -33,11 +34,7 @@ def __init__(self, bdnf: str, kwargs: dict[str, int]) -> None:
self.bdn_file = bdnf
self._epochs = []
self.skip_errors = kwargs.pop("skip_errors", False)
#Leave norm threshold to zero, it can generate unexpected behaviours.
#Colors should be 256. Anything above is illegal, anything below results in a
# loss of quality.
self.kwargs = {'colors': 256}
self.kwargs |= kwargs
self.kwargs = kwargs

def optimise(self) -> None:
kwargs = self.kwargs
@@ -54,76 +51,62 @@ def optimise(self) -> None:
sys.exit(1)

clip_framerate = bdn.fps
if self.kwargs.pop('adjust_dropframe', False):
if self.kwargs.get('adjust_dropframe', False):
if isinstance(bdn.fps, float):
bdn.fps = round(bdn.fps)
logger.info(f"NTSC timing flag: using {bdn.fps} for timestamps rather than BDNXML {clip_framerate:.03f}.")
logger.info(f"NTSC timing flag: using {round(bdn.fps)} for timestamps rather than BDNXML {clip_framerate:.03f}.")
else:
self.kwargs['adjust_dropframe'] = False
logger.warning("Ignored NDF flag with integer framerate.")

logger.info("Finding epochs...")

#Empirical max: we need <=6 frames @23.976 to clear the buffers and windows.
# This is doing coarse epoch definitions, without any consideration to
# what's being displayed on screen.
delay_refresh = 0.01+0.25*np.multiply(*bdn.format.value)/(1920*1080)
for group in bdn.groups(delay_refresh):
offset = len(group)-1
#In the worst case, there is a single composition object for the whole screen.
screen_area = np.multiply(*bdn.format.value)
epochstart_dd_fn = lambda o_area: max(PGDecoder.copy_gp_duration(screen_area), PGDecoder.decode_obj_duration(o_area)) + PGDecoder.copy_gp_duration(o_area)
#Round to the nearest 90 kHz tick
epochstart_dd_fnr = lambda o_area: round(epochstart_dd_fn(o_area)*PGDecoder.FREQ)/PGDecoder.FREQ

for group in bdn.groups(epochstart_dd_fn(screen_area)):
subgroups = []
last_split = len(group)
largest_shape = Shape(0, 0)

#Backward pass for fine epochs definition
# We consider the delay between events and the size of the overall
# graphic that we want to display.
for k, event in enumerate(reversed(group[1:])):
offset -= 1
if np.multiply(*group[offset].shape) > np.multiply(*largest_shape):
largest_shape = event.shape
nf = TC.tc2f(event.tc_in, bdn.fps) - TC.tc2f(group[offset].tc_out, bdn.fps)

if nf > 0 and nf/bdn.fps > 3*_pinit_fn(largest_shape)/90e3:
subgroups.append(group[offset+1:last_split])
last_split = offset + 1
if group[offset+1:last_split] != []:
subgroups.append(group[offset+1:last_split])
if subgroups:
subgroups[-1].insert(0, group[0])
offset = len(group)
max_area = 0

for k, event in enumerate(reversed(group[1:]), 1):
max_area = max(np.multiply(*event.shape), max_area)

delay = TC.tc2s(event.tc_in, bdn.fps) - TC.tc2s(group[len(group)-k-1].tc_out, bdn.fps)
if epochstart_dd_fnr(max_area) <= delay:
subgroups.append(group[offset-k:offset])
offset -= len(subgroups[-1])
max_area = 0
if len(group[:offset]) > 0:
subgroups.append(group[:offset])
else:
subgroups = [[group[0]]]
assert offset == 0
assert sum(map(len, subgroups)) == len(group)

#Epoch generation (each subgroup will be its own epoch)
for subgroup in reversed(subgroups):
logger.info(f"Generating epoch {subgroup[0].tc_in}->{subgroup[-1].tc_out}...")
logger.info(f"Identified epoch {subgroup[0].tc_in}->{subgroup[-1].tc_out}:")

wob, box = GroupingEngine(n_groups=2, **kwargs).group(subgroup)
logger.info(f" => Screen layout: {len(wob)} window(s), analyzing objects...")

wobz = WOBSAnalyzer(wob, subgroup, box, clip_framerate, bdn, **kwargs)
epoch = wobz.analyze()
self._epochs.append(epoch)
logger.info(f" => optimised as {len(epoch)} display sets on {len(wob)} window(s).")
logger.info(f" => optimised as {len(epoch)} display sets.")
gc.collect()

if clip_framerate != bdn.fps:
self.ndf_shift(bdn, clip_framerate)

scaled_fps = False
if self.kwargs.get('scale_fps', False):
scaled_fps = self.scale_pcsfps()

if self.kwargs.get('enforce_dts', False):
self.compute_set_dts()

# Final check
is_compliant(self._epochs, bdn.fps * int(1+scaled_fps))
is_compliant(self._epochs, bdn.fps * int(1+scaled_fps), self.kwargs.get('enforce_dts', True))
####

def ndf_shift(self, bdn: BDNXML, clip_framerate: float) -> None:
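#Presumably, per the NTSC flag above: timestamps were computed at the rounded
# integer rate (e.g. 24), so scale by 1.001 to map them back to the NTSC rate
# (24/1.001), minus a small 3-tick safety margin on the 90 kHz clock.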
adjustment_ratio = 1.001
for epoch in self._epochs:
for ds in epoch:
for seg in ds:
seg.pts = seg.pts*adjustment_ratio - 3/90e3

def scale_pcsfps(self) -> bool:
from SUPer.utils import BDVideo
pcs_fps = self._epochs[0].ds[0].pcs.fps.value
Expand All @@ -136,19 +119,6 @@ def scale_pcsfps(self) -> bool:
logger.error(f"Expexcted 25 or 30 fps for 2x scaling. Got '{BDVideo.LUT_FPS_PCSFPS[pcs_fps]}'.")
return scaled_fps

def compute_set_dts(self) -> None:
logger.info("Setting DTS values in the stream.")
prev_ds_pts = 0
for epoch in self._epochs:
for ds in epoch:
for seg in ds:
# -0.3735 because: (decode 4 MiB + screen flush + screen refresh)
# i.e. this is the max shift we would need in the worst case
seg.dts = max(seg.pts - 0.3735, prev_ds_pts)
seg.dts = seg.pts #enforce == for END segment
# set DTS one tick in the future.
prev_ds_pts = seg.pts + 1/90e3

def merge(self, input_sup) -> None:
epochs = SUPFile(input_sup).epochs()
if not self._epochs:
Expand Down
116 changes: 4 additions & 112 deletions SUPer/optim.py
@@ -46,7 +46,9 @@ def quantize(img: Image.Image, colors: int = 256, kmeans_quant: bool = False, km
#use cv2 for high transparency images, pillow has issues

alpha = np.asarray(img.split()[-1], dtype=np.uint16)
kmeans_fade = (np.mean(alpha[alpha > 0]) < 38) and kmeans_fade
non_tsp_pix = alpha[alpha > 0]
if non_tsp_pix.size > 0:
kmeans_fade = (np.mean(non_tsp_pix) < 38 * (1 + kwargs.get('tsp_thresh', 0))) and kmeans_fade

if kmeans_quant or kmeans_fade:
# Use PIL to get approximate number of clusters
Expand Down Expand Up @@ -95,7 +97,7 @@ def palettize_events(events: list[ImageEvent], flags: PalettizeMode,
:return: The events with optimised images.
"""
if 2 <= colors > 256:
raise ValueError("Palettization is always performed on 2<'colors'<=256.")
raise ValueError("Palettization is always performed on 2< colors <=256.")

if not PalettizeMode(flags):
logging.info("No known optimisation selected, skipping.")
@@ -213,115 +215,6 @@ def palettize_img(img: Image, pal: npt.NDArray[np.uint8], *,


class Optimise:
@staticmethod
def prepare_sequence(events: list[ImageEvent], **kwargs) -> tuple[npt.NDArray[np.uint8],
npt.NDArray[np.uint8],
list[int]]:
"""
This function gets a list of images to optimize as a single one + PAL updates
:param events: Set of images events where a palette animation takes places.
:return: color look up table for each image (stacked), sequence of pixels and
length of each CLUT
"""

n_colors = kwargs.get('colors', 256)

maps = []
clut = []
clut_lens = []
for event in events:
img, img_pal = Preprocess.quantize(event.img, n_colors, **kwargs)
maps.append(img)

a = np.zeros((256, 4))
pilpal = list(img_pal.keys())
clut_lens.append(len(pilpal))

b = np.asarray(pilpal, dtype=np.uint8)
a[:b.shape[0], :b.shape[1]] = b
clut.append(a)

cluts = np.stack(clut).astype(np.uint8) # Stack all CLUT in the sequence
px_sequences = np.asarray(maps, dtype=np.uint8) #All cmaps

return cluts, px_sequences, clut_lens

@staticmethod
def solve_sequence(cluts, cmaps, clut_len, **kwargs) -> tuple[npt.NDArray[np.uint8],
npt.NDArray[np.uint8]]:
"""
This function finds a solution for the provided subtitle animation.
:param cluts: Color look-up tables of each bitmap, stacked one after the other
:param cmaps: P-Images linked to their respective CLUT, stacked like CLUTs
:param clut_len: Length for each CLUT
:param **kwargs: Additional parameters to adjust inner params of the solver.
:return: P-Image for the PGStream, Sequence of RGBA values for the animation.
"""

N_SEQUENCES_MAX = kwargs.get('colors', 256)

sequences = [clut[cmap] for clut, cmap in zip(cluts, cmaps)]
sequences = np.stack(sequences, axis=2).astype(np.uint8) #(LEN_SEQ, H, W, RGBA=4)

#Find all sequences and count them
seq_occ = {}
for i in range(sequences.shape[0]):
for j in range(sequences.shape[1]):
seq = hash(sequences[i, j, :, :].data.tobytes())
try:
seq_occ[seq][0] += 1
except KeyError:
seq_occ[seq] = [1, sequences[i, j, :, :]]

#Sort sequences by commonness
seq_sorted = {k: v[1] for k, v in list(sorted(seq_occ.items(),
key=lambda item: item[1][0],
reverse=True))}

#Fill a new array with kept sequences to perform fast norm calculations
norm_mat = np.ndarray((N_SEQUENCES_MAX,
sequences[i,j,:,:].shape[0],
sequences[i,j,:,:].shape[1]))
seqs, cnt = {}, 0
remap = {}

for k, v in seq_sorted.items():
if cnt < N_SEQUENCES_MAX:
seqs[k] = (cnt, v) # cnt will be the CLUT id for v
norm_mat[cnt, :, :] = v
cnt += 1
elif k not in remap:
nm = np.linalg.norm(norm_mat - v[None, :], 2, axis=2)

id1 = np.argsort(np.sum(nm, axis=1))
id2 = np.argsort(np.sum(nm, axis=1)/np.sum(nm != 0, axis=1))

best_fit = np.abs(id1 - id2[:, None])
id1_i, id2_i = best_fit.argmin() % id1.size, best_fit.argmin()//id1.size

assert id1[id1_i] == id2[id2_i], "Something inconceivable has happened."

remap[k] = hash(norm_mat[id1[id1_i]].astype(np.uint8).data.tobytes())

out_map = np.zeros(sequences.shape[0:2], dtype=np.uint8)

for i in range(sequences.shape[0]):
for j in range(sequences.shape[1]):
seq_hash = hash(sequences[i, j, :, :].data.tobytes())
if seq_hash in seqs:
assert np.all(sequences[i, j] == seqs[seq_hash][1]), \
"Sequences did not match (hash collision?)"
out_map[i, j] = seqs[seq_hash][0]
elif seq_hash in remap:
out_map[i, j] = seqs[remap[seq_hash]][0]
else:
logging.error("Sequence not found in any map.")
# Output map, Sequences for the N_SEQUENCES_MAX RGBA values
return out_map, \
np.asarray(list(seq_sorted.values())[:N_SEQUENCES_MAX]).astype(np.uint8)

@staticmethod
def solve_sequence_fast(events, colors: int = 256, **kwargs) -> tuple[npt.NDArray[np.uint8], npt.NDArray[np.uint8]]:
"""
@@ -340,7 +233,6 @@ def solve_sequence_fast(events, colors: int = 256, **kwargs) -> tuple[npt.NDArra
sequences.append(clut[img])

sequences = np.stack(sequences, axis=2).astype(np.uint8)

#catalog the sequences
seq_occ: dict[int, tuple[int, npt.NDArray[np.uint8]]] = {}
for i in range(sequences.shape[0]):