From 17ce397eb907f893fdc611e6a2f8524e5c057e4a Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=ED=99=A9=EC=84=B1=ED=9D=AC/=EC=B0=A8=EC=84=B8=EB=8C=80=20?= =?UTF-8?q?Display=20Lab=28SR=29/=EC=82=BC=EC=84=B1=EC=A0=84=EC=9E=90?= Date: Tue, 28 Nov 2023 15:02:56 +0900 Subject: [PATCH 1/2] Fix #768, Improve scalable channel group and layer text --- index.bs | 129 ++++++++++++++++++++++++++++--------------------------- 1 file changed, 65 insertions(+), 64 deletions(-) diff --git a/index.bs b/index.bs index 2a60f573..9ef8b7de 100644 --- a/index.bs +++ b/index.bs @@ -697,34 +697,7 @@ audio_element_type: The type of audio representation. num_substreams specifies the number of [=Audio Substream=]s that are used to reconstruct this [=Audio Element=]. It SHALL NOT be set to 0. -audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to. - -Let a particular [=Channel Group=]'s [=Audio Substream=]s be indexed as \(\left[c, n_c\right]\), where a [=Channel Group=] format is described in [[#scalablechannelaudio-channelgroupformat]] and -- \(c = \left[1, \ldots, C\right]\) is the [=Channel Group=] index and \(C\) is the number of [=Channel Group=]s. -- \(n_c = \left[1, \ldots, N_c\right]\) is the [=Audio Substream=] index in the \(c\)-th [=Channel Group=] and \(N_c\) is the number of [=Audio Substream=]s in the \(c\)-th [=Channel Group=]. - -Then, the i-th [=audio_element_obu/audio_substream_id=] maps to a [=Channel Group=]'s [=Audio Substream=]s as follows, where i is the index of the array: - -\[ -\left[ -\left[ 1, 1 \right], -\left[ 1, 2 \right], -\cdots, -\left[ 1, N_1 \right], -\left[ 2, 1 \right], -\left[ 2, 2 \right], -\cdots, -\left[ 2, N_2 \right], -\cdots, -\left[ C, 1 \right], -\left[ C, 2 \right], -\cdots, -\left[ C, N_c \right] -\right] -\] - -The order of the [=Audio Substream=]s in each [=Channel Group=] (i.e., the semantics of \(n_c\)) is specified in [[#syntax-scalable-channel-layout-config]]. 
- +audio_substream_id indicates the identifier for an [=Audio Substream=] which this [=Audio Element=] refers to. When [=audio_element_type=] is CHANNEL_BASED, the ordering of [=audio_element_obu/audio_substream_id=]s within this loop SHALL comply with [[#scalablechannelaudio-orderingofaudiosubstreamidentifiers]]. num_parameters specifies the number of [=Parameter Substream=]s that are used by the algorithms specified in this [=Audio Element=]. - When [=audio_element_type=] = 0, this field SHALL be set to 0, 1, or 2. @@ -910,25 +883,6 @@ class ChannelAudioLayerConfig(i) { } ``` -When an [=Audio Element=] is composed of \(G(r)\) number of [=Audio Substream=]s, its scalable channel audio representation is layered into \(r\) [=num_layers=] of [=Channel Group=]s. - - -- The order of the [=Channel Group=]s in each [=Temporal Unit=] SHALL be same as the order of the [=channel_audio_layer_config=]s in [=ScalableChannelLayoutConfig()=]. -- The \(q\)-th [=Channel Group=] consists of \(G(q) - G(q - 1)\) number of [=Audio Substream=]s, where \(q = 1, 2, \ldots, r\) and \(G(0) = 0\). -- Let the term "Audio Frames" mean the set of all [=Audio Frame OBU=]s (for this [=Audio Element=]) that have the same start timestamp. All Audio Frames in an [=IA Sequence=] SHALL have the same number of [=Audio Frame OBU=]s. -- [=Parameter Block OBU=]s MAY be associated with Audio Frames. - -
-
Immersive Audio Sequence with scalable channel audio (before OBU packing). See [[#standalone]] for related details on OBU ordering within an IA Sequence.
- - -Each [=Channel Group=] (or scalable channel audio layer) is associated with a different [=loudspeaker_layout=]. The IA decoder SHALL select one of the layers according to the following rules, in order: - -- The IA decoder SHOULD first attempt to select the layer with a [=loudspeaker_layout=] that matches the physical playback layout. -- If there is no match, the IA decoder SHOULD select the layer with the closest [=loudspeaker_layout=] to the physical layout and then apply up- or down-mixing appropriately, after decoding and reconstruction of the channel audio. Sections [[#iamfgeneration-scalablechannelaudio-downmixmechanism]] and [[#processing-downmixmatrix]] provide examples of dynamic and static down-mixing matrices for some common layouts that MAY be used. - -The relationship among all [=Channel Group=]s for the given scalable channel audio representation SHALL comply with [[#scalablechannelaudio-channelgroupformat]] and the relationship among all channel layouts indicated by [=loudspeaker_layout=]s specified in an [=Audio Element OBU=] SHALL comply with [[#scalablechannelaudio-channellayoutgenerationrule]]. - Semantics num_layers indicates the number of [=Channel Group=]s for scalable channel audio. It SHALL NOT be set to zero and its maximum value SHALL be 6. @@ -1029,15 +983,34 @@ Bit position : Channel Name output_gain indicates the gain value to be applied to the mixed channels which are indicated by [=output_gain_flags=], where each mixed channel is generated by down-mixing two or more input channels. It is computed as \(20 \times \log_{10}(f)\), where \(f\) is the factor by which to scale the mixed channels. It is stored as a 16-bit, signed, two’s complement fixed-point value with 8 fractional bits (i.e., Q7.8)([[Q-Format]]). 
+### Scalable Channel Group and Layout ### {#scalablechannelaudio-channelgroupandlayout} + +When an [=Audio Element=] is composed of \(G(r)\) [=Audio Substream=]s, its scalable channel audio representation is layered into \(r\) [=Channel Group=]s, where \(r\) is given by [=num_layers=]. +- The order of the [=Channel Group=]s in each [=Temporal Unit=] SHALL be the same as the order of the [=channel_audio_layer_config=]s in [=ScalableChannelLayoutConfig()=]. +- The \(q\)-th [=Channel Group=] consists of \(G(q) - G(q - 1)\) [=Audio Substream=]s, where \(q = 1, 2, \ldots, r\) and \(G(0) = 0\). +- Let the term "Audio Frames" mean the set of all [=Audio Frame OBU=]s (for this [=Audio Element=]) that have the same start timestamp. All Audio Frames in an [=IA Sequence=] SHALL have the same number of [=Audio Frame OBU=]s. +- [=Parameter Block OBU=]s MAY be associated with Audio Frames. + +
+
Immersive Audio Sequence with scalable channel audio (before OBU packing). See [[#standalone]] for related details on OBU ordering within an IA Sequence.
+ + +Each [=Channel Group=] (or scalable channel audio layer) is associated with a different [=loudspeaker_layout=]. The IA decoder SHALL select one of the layers according to the following rules, in order: + +- The IA decoder SHOULD first attempt to select the layer with a [=loudspeaker_layout=] that matches the physical playback layout. +- If there is no match, the IA decoder SHOULD select the layer with the closest [=loudspeaker_layout=] to the physical layout and then apply up- or down-mixing appropriately, after decoding and reconstruction of the channel audio. Sections [[#iamfgeneration-scalablechannelaudio-downmixmechanism]] and [[#processing-downmixmatrix]] provide examples of dynamic and static down-mixing matrices for some common layouts that MAY be used. + +The relationship among all [=Channel Group=]s for the given scalable channel audio representation SHALL comply with [[#scalablechannelaudio-channelgroupformat]] and the relationship among all channel layouts indicated by [=loudspeaker_layout=]s specified in an [=Audio Element OBU=] SHALL comply with [[#scalablechannelaudio-channellayoutgenerationrule]]. + #### Channel Layout Generation Rule #### {#scalablechannelaudio-channellayoutgenerationrule} This section describes the generation rule for channel layouts for scalable channel audio. -For a given channel layout (CL #n) of a channel-based input [=3D audio signal=], any list of CLs ({CL #i: i = 1, 2, ..., n}) for scalable channel audio SHALL conform with the following rules: -- Xi ≤ Xi+1 and Yi ≤ Yi+1 and Zi ≤ Zi+1 except Xi = Xi+1, Yi = Yi+1 and Zi = Zi+1 for i = n-1, n-2, ..., 1, where the i-th channel layout CL #i = Xi.Yi.Zi, Xi is the number of surround channels, Yi is the number of LFE channels, and Zi is the number of height channels. -- CL #i is one of the [=loudspeaker_layout=]s supported in this version of the specification. 
+For a given channel layout (\(CL \text{#}n\)) of a channel-based input [=3D audio signal=], any list of CLs (\(\{CL \text{#}i: i = 1, 2, \ldots, n\}\)) for scalable channel audio SHALL conform to the following rules: +- \(\text{Xi} \le \text{Xi+1}\) and \(\text{Yi} \le \text{Yi+1}\) and \(\text{Zi} \le \text{Zi+1}\), excluding the case where \(\text{Xi} = \text{Xi+1}\), \(\text{Yi} = \text{Yi+1}\) and \(\text{Zi} = \text{Zi+1}\) all hold, for \(i = n-1, n-2, \ldots, 1\), where the \(i\)-th channel layout \(CL \text{#}i = \text{Xi}.\text{Yi}.\text{Zi}\), \(\text{Xi}\) is the number of surround channels, \(\text{Yi}\) is the number of LFE channels, and \(\text{Zi}\) is the number of height channels. +- \(CL \text{#}i\) is one of the [=loudspeaker_layout=]s supported in this version of the specification. -Scalable channel audio with [=num_layers=] > 1 SHALL only allow down-mix paths that conform to the rules above, as depicted in the figure below. +Scalable channel audio with [=num_layers=] \(> 1\) SHALL only allow down-mix paths that conform to the rules above, as depicted in the figure below. 
IA Down-mix Path for scalable channel audio
@@ -1045,19 +1018,47 @@ Scalable channel audio with [=num_layers=] > 1 SHALL only allow down-mix paths t #### Channel Group Format #### {#scalablechannelaudio-channelgroupformat} The [=Channel Group=] format SHALL conform to the following rules: -- It consists of C number of channels and is structured to n number of [=Channel Group=]s, where C is the number of channels for the input [=3D audio signal=]. -- [=Channel Group=] #1 (as called BCG): This [=Channel Group=] is the [=down-mixed audio=] itself for CL #1 generated from the input [=3D audio signal=]. It contains a C1 number of channels. -- [=Channel Group=] #i (as called DCG, i = 2, 3, …, n): This [=Channel Group=] contains (Ci – Ci-1) number of channels. (Ci – Ci-1) channel(s) consists of as follows: - - (Xi – Xi-1) surround channel(s) if Xi > Xi-1 . When \(S_{\text{set}} = \{x \mid \text{Xi}-1 < x \le \text{Xi}\} \) and \(x\) is an integer, - - If 2 is an element of \(S_{\text{set}}\), the L2 channel is contained in this CG #i. - - If 3 is an element of \(S_{\text{set}}\), the Center channel is contained in this CG #i. - - If 5 is an element of \(S_{\text{set}}\), the L5 and R5 channels are contained in this CG #i. - - If 7 is an element of \(S_{\text{set}}\), the Lss7 and Rss7 channels are contained in this CG #i. - - The LFE channel if Yi > Yi-1. - - (Zi - Zi-1) top channels if Zi > Zi-1. - - If Zi-1 = 0, the top channels of the [=down-mixed audio=] for CL #i are contained in this [=Channel Group=] #i. - - If Zi-1 = 2, the Ltf and Rtf channels of the [=down-mixed audio=] for CL #i are contained in this [=Channel Group=] #i. - - Where Xi.Yi.Zi denotes the channel layout in CL #i, where Xi is the number of surround channels, Yi is the number of LFE channels and Zi is the number of height channels. +- It consists of C number of channels and is structured to \(r\) number of [=Channel Group=]s, where \(C\) is the number of channels for the input [=3D audio signal=]. 
+- [=Channel Group=] \(\text{#}1\) (also called BCG): This [=Channel Group=] is the [=down-mixed audio=] itself for \(CL \text{#}1\) generated from the input [=3D audio signal=]. It contains \(\text{C1}\) channels. +- [=Channel Group=] \(\text{#}i\) (also called DCG, \(i = 2, 3, \ldots, n\)): This [=Channel Group=] contains \((\text{Ci} - \text{Ci-1})\) channels. The \((\text{Ci} - \text{Ci-1})\) channel(s) consist of the following: + - \((\text{Xi} - \text{Xi-1})\) surround channel(s) if \(\text{Xi} > \text{Xi-1}\). When \(S_{\text{set}} = \{x \mid \text{Xi-1} < x \le \text{Xi}\} \) and \(x\) is an integer, + - If 2 is an element of \(S_{\text{set}}\), the L2 channel is contained in this \(CG \text{#}i\). + - If 3 is an element of \(S_{\text{set}}\), the Center channel is contained in this \(CG \text{#}i\). + - If 5 is an element of \(S_{\text{set}}\), the L5 and R5 channels are contained in this \(CG \text{#}i\). + - If 7 is an element of \(S_{\text{set}}\), the Lss7 and Rss7 channels are contained in this \(CG \text{#}i\). + - The LFE channel if \(\text{Yi} > \text{Yi-1}\). + - \((\text{Zi} - \text{Zi-1})\) top channels if \(\text{Zi} > \text{Zi-1}\). + - If \(\text{Zi-1} = 0\), the top channels of the [=down-mixed audio=] for \(CL \text{#}i\) are contained in this [=Channel Group=] \(\text{#}i\). + - If \(\text{Zi-1} = 2\), the Ltf and Rtf channels of the [=down-mixed audio=] for \(CL \text{#}i\) are contained in this [=Channel Group=] \(\text{#}i\). + - Where \(\text{Xi}.\text{Yi}.\text{Zi}\) denotes the channel layout in \(CL \text{#}i\), where \(\text{Xi}\) is the number of surround channels, \(\text{Yi}\) is the number of LFE channels and \(\text{Zi}\) is the number of height channels. 
+ +#### Ordering of Audio Substream Identifiers #### {#scalablechannelaudio-orderingofaudiosubstreamidentifiers} + +Let a particular [=Channel Group=]'s [=Audio Substream=]s be indexed as \(\left[c, n_c\right]\), where a [=Channel Group=] format is described in [[#scalablechannelaudio-channelgroupformat]] and +- \(c\) is the [=Channel Group=] index, where \(c = 1, 2, \ldots, C\) and \(C\) is the number of [=Channel Group=]s. +- \(n_c\) is the [=Audio Substream=] index in the \(c\)-th [=Channel Group=], where \(n_c = 1, 2, \ldots, N_c\) and \(N_c\) is the number of [=Audio Substream=]s in the \(c\)-th [=Channel Group=]. + +Then, the i-th [=audio_element_obu/audio_substream_id=] maps to a [=Channel Group=]'s [=Audio Substream=]s as follows, where i is the index of the array: + +\[ +\left[ +\left[ 1, 1 \right], +\left[ 1, 2 \right], +\cdots, +\left[ 1, N_1 \right], +\left[ 2, 1 \right], +\left[ 2, 2 \right], +\cdots, +\left[ 2, N_2 \right], +\cdots, +\left[ C, 1 \right], +\left[ C, 2 \right], +\cdots, +\left[ C, N_C \right] +\right] +\] + +The order of the [=Audio Substream=]s in each [=Channel Group=] (i.e., the semantics of \(n_c\)) is specified in [[#syntax-scalable-channel-layout-config]]. 
### Ambisonics Config Syntax and Semantics ### {#syntax-ambisonics-config} From badad8334f7b4d8ba46a2e3493057830bc307530 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?=ED=99=A9=EC=84=B1=ED=9D=AC/=EC=B0=A8=EC=84=B8=EB=8C=80=20?= =?UTF-8?q?Display=20Lab=28SR=29/=EC=82=BC=EC=84=B1=EC=A0=84=EC=9E=90?= Date: Wed, 29 Nov 2023 10:37:31 +0900 Subject: [PATCH 2/2] Follow reviewer's comment on substreams' order --- index.bs | 17 ++++++++--------- 1 file changed, 8 insertions(+), 9 deletions(-) diff --git a/index.bs b/index.bs index 9ef8b7de..68dad3a6 100644 --- a/index.bs +++ b/index.bs @@ -956,16 +956,11 @@ NOTE: This specification allows down-mixing mechanisms (e.g., as specified in [[ coupled_substream_count specifies the number of referenced [=Audio Substream=]s, each of which is coded as coupled stereo channels. -Each pair of [=Coupled stereo channels|coupled stereo channels=] in the same [=Channel Group=] SHALL be coded in stereo mode to generate one single coded [=Audio Substream=], also referred to as a coupled substream. Each [=Non-coupled channels|non-coupled channel=] in the same [=Channel Group=] SHALL be coded in mono mode to generate one single coded [=Audio Substream=], also known as a non-coupled substream. +Each pair of [=Coupled stereo channels|coupled stereo channels=] in the same [=Channel Group=] SHALL be coded in stereo mode to generate one single coded [=Audio Substream=], also referred to as a coupled substream. Each [=Non-coupled channels|non-coupled channel=] in the same [=Channel Group=] SHALL be coded in mono mode to generate one single coded [=Audio Substream=], also known as a non-coupled substream. - Coupled stereo channels: L/R, Ls/Rs, Lss/Rss, Lrs/Rrs, Ltf/Rtf, Ltb/Rtb - Non-coupled channels: C, LFE, L -The order of the [=Audio Substream=]s in each [=Channel Group=] SHALL be as follows: -- Coupled substreams come first and are followed by non-coupled substreams. 
-- The coupled substreams for the surround channels come first and are followed by the coupled substreams for the top channels. -- The coupled substreams for the front channels come first and are followed by the coupled substreams for the side, rear and back channels. -- The coupled substreams for the side channels come first and are followed by the coupled substreams for the rear channels. -- The Center channel comes first and is followed by the LFE channel, and then the L channel. +The order of the [=Audio Substream=]s in each [=Channel Group=] is specified in [[#scalablechannelaudio-orderingofaudiosubstreamidentifiers]]. output_gain_flags indicates the channels which [=output_gain=] is applied to. If a bit is set to 1, [=output_gain=] SHALL be applied to the channel. Otherwise, [=output_gain=] SHALL NOT be applied to the channel. @@ -1058,8 +1053,12 @@ Then, the i-th [=audio_element_obu/audio_substream_id=] maps to a [=Channel Grou \right] \] -The order of the [=Audio Substream=]s in each [=Channel Group=] (i.e., the semantics of \(n_c\)) is specified in [[#syntax-scalable-channel-layout-config]]. - +The order of the [=Audio Substream=]s in each [=Channel Group=] (i.e., the semantics of \(n_c\)) SHALL be as follows: +- [=Coupled substream=]s come first and are followed by [=non-coupled substream=]s. +- The [=coupled substream=]s for the surround channels come first and are followed by the [=coupled substream=]s for the top channels. +- The [=coupled substream=]s for the front channels come first and are followed by the [=coupled substream=]s for the side, rear and back channels. +- The [=coupled substream=]s for the side channels come first and are followed by the [=coupled substream=]s for the rear channels. +- The Center channel comes first and is followed by the LFE channel, and then the L channel. ### Ambisonics Config Syntax and Semantics ### {#syntax-ambisonics-config}
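
The ordering and layering rules that this series moves into the new subsections can be sketched in Python as follows. This is an illustrative sketch only, not normative text and not part of the patch: the helper names (`order_within_group`, `flatten_substream_index`, `cumulative_substream_counts`, `is_valid_layout_sequence`) and the `SUBSTREAM_PRIORITY` list are hypothetical, and the priority list is one reading of the ordering bullets (coupled before non-coupled, surround before top, front before side before rear, then C, LFE, L). The check on the layout sequence covers only the monotonicity rule; it does not verify that each layout is a supported [=loudspeaker_layout=].

```python
from itertools import accumulate

# Assumed priority, read off the ordering bullets: coupled surround pairs
# (front, side, rear), coupled top pairs (front, back), then C, LFE, L.
SUBSTREAM_PRIORITY = [
    "L/R", "Ls/Rs", "Lss/Rss", "Lrs/Rrs", "Ltf/Rtf", "Ltb/Rtb",
    "C", "LFE", "L",
]


def order_within_group(labels):
    """Sort one Channel Group's substream labels by the assumed priority."""
    return sorted(labels, key=SUBSTREAM_PRIORITY.index)


def flatten_substream_index(substreams_per_group):
    """Return 1-based [c, n_c] pairs in audio_substream_id loop order.

    substreams_per_group[c - 1] is N_c, the substream count of group c.
    """
    return [
        (c, n)
        for c, count in enumerate(substreams_per_group, start=1)
        for n in range(1, count + 1)
    ]


def cumulative_substream_counts(substreams_per_group):
    """Return [G(0), G(1), ..., G(r)] with G(0) = 0."""
    return [0] + list(accumulate(substreams_per_group))


def is_valid_layout_sequence(layouts):
    """Check the X/Y/Z rule for consecutive (X, Y, Z) layouts.

    Each component must be non-decreasing, and two consecutive
    layouts must not be identical.
    """
    return all(
        a[0] <= b[0] and a[1] <= b[1] and a[2] <= b[2] and a != b
        for a, b in zip(layouts, layouts[1:])
    )


if __name__ == "__main__":
    counts = [1, 3, 2]  # hypothetical N_1, N_2, N_3
    print(flatten_substream_index(counts))
    print(cumulative_substream_counts(counts))
    print(order_within_group(["LFE", "C", "Ls/Rs"]))
    print(is_valid_layout_sequence([(2, 0, 0), (5, 1, 0), (7, 1, 4)]))
```

With the hypothetical counts above, the \(q\)-th group's size can be recovered as the difference of consecutive entries of `cumulative_substream_counts`, matching \(G(q) - G(q - 1)\).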