From 04fce195698c7007cee013126e90d6fa6b5b576c Mon Sep 17 00:00:00 2001
From: singagan <53442471+singagan@users.noreply.github.com>
Date: Thu, 25 Apr 2024 19:00:36 +0200
Subject: [PATCH] Updated documentation for all convolution-based designs
(#1399)
Co-authored-by: Joseph Melber
Co-authored-by: Kristof Denolf
---
programming_examples/ml/bottleneck/README.md | 103 +++++++-----------
.../ml/bottleneck/bottleneck_pipeline.png | Bin 0 -> 79590 bytes
.../ml/bottleneck/requirements.txt | 1 -
programming_examples/ml/conv2d/README.md | 71 ++++++++----
programming_examples/ml/conv2d/act_layout.png | Bin 0 -> 45710 bytes
.../ml/conv2d/requirements.txt | 1 -
.../ml/conv2d_fused_relu/README.md | 69 ++++--------
.../ml/conv2d_fused_relu/requirements.txt | 1 -
programming_examples/ml/resnet/README.md | 100 ++++++-----------
.../ml/resnet/layers_conv2_x/requirements.txt | 1 -
.../layers_conv2_x/resnet_conv2x_pipeline.png | Bin 0 -> 180088 bytes
programming_guide/section-6/README.md | 2 +-
12 files changed, 140 insertions(+), 209 deletions(-)
create mode 100644 programming_examples/ml/bottleneck/bottleneck_pipeline.png
delete mode 100644 programming_examples/ml/bottleneck/requirements.txt
create mode 100644 programming_examples/ml/conv2d/act_layout.png
delete mode 100644 programming_examples/ml/conv2d/requirements.txt
delete mode 100644 programming_examples/ml/conv2d_fused_relu/requirements.txt
delete mode 100755 programming_examples/ml/resnet/layers_conv2_x/requirements.txt
create mode 100644 programming_examples/ml/resnet/layers_conv2_x/resnet_conv2x_pipeline.png
diff --git a/programming_examples/ml/bottleneck/README.md b/programming_examples/ml/bottleneck/README.md
index 40a69e8576..b1a2229537 100644
--- a/programming_examples/ml/bottleneck/README.md
+++ b/programming_examples/ml/bottleneck/README.md
@@ -8,15 +8,15 @@
//
//===----------------------------------------------------------------------===//-->
-# The Bottleneck Block
+# Bottleneck Block
## Introduction
-The bottleneck block is a key component in deep neural network architectures, such as ResNet. It is designed to help address the challenge of training very deep networks by reducing the computational cost while maintaining or improving performance. This README provides an overview of the process and considerations for accelerating a single bottleneck block.
+The bottleneck block is a key component in deep neural network architectures like ResNet. It is designed to help address the challenge of training deep networks by reducing computational costs while maintaining or improving performance. This README provides an overview of the process and considerations for accelerating a bottleneck block on a single NPU column using four AI Engine (AIE) cores.
## Bottleneck Block Overview
The components and functionality of a standard bottleneck block:
-* Identity Mapping: The core idea behind bottleneck blocks is the concept of identity mapping. Traditional neural network layers aim to learn a mapping from input to output. In contrast, a bottleneck block learns a residual mapping, which is the difference between the input and the output. The original input is then added back to this residual mapping to obtain the final output. Mathematically, this can be represented as `output = input+ residual.`
+* Identity Mapping: The core idea behind bottleneck blocks is identity mapping. Traditional neural network layers aim to learn a mapping from input to output. In contrast, a bottleneck block learns a residual mapping, which is the difference between the input and the output. The original input is then added back to this residual mapping to obtain the final output. Mathematically, this can be represented as `output = input + residual`.
* Convolutional Layers: Bottleneck blocks typically consist of one or more convolutional layers. These layers are responsible for learning features from the input data. Convolutional layers apply filters/kernels to the input feature maps to extract relevant patterns and features. The number of filters, kernel size, and other parameters can vary based on the specific architecture and requirements.
@@ -24,87 +24,58 @@ The components and functionality of a standard bottleneck block:
* Batch Normalization: Batch normalization is often employed after convolutional layers to stabilize and accelerate the training process. It normalizes the activations of each layer, making optimization more robust and efficient.
-* Skip Connection (Identity Shortcut): This is the hallmark of bottleneck blocks. The skip connection directly passes the input from one layer to a later layer without any modification. It provides an alternative, shorter path for gradient flow during training. If the input and output dimensions of the bottleneck block are the same, the skip connection directly adds the input to the output. If the dimensions differ, the skip connection might include a 1x1 convolutional layer to adjust the dimensions accordingly.
+* Skip Connection (Identity Shortcut): This is the hallmark of bottleneck blocks. The skip connection directly passes the input from one layer to a later layer without modification. It provides an alternative, shorter path for gradient flow during training. If the input and output dimensions of the bottleneck block are the same, the skip connection directly adds the input to the output. If the dimensions differ, the skip connection might include a 1x1 convolutional layer to adjust the dimensions accordingly.
* Final Output: The final output of the bottleneck block is obtained by adding the input to the output of the convolutional layers (including any adjustments made to match dimensions via the skip connection).
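The skip-connection arithmetic above can be sketched in a few lines of plain Python. This is a toy model for illustration only, not the AIE implementation; `residual_fn` is a hypothetical stand-in for the 1x1 -> 3x3 -> 1x1 convolution stack:

```python
def relu(v):
    # Elementwise ReLU: negatives become zero, positives pass through.
    return [max(0.0, x) for x in v]

def bottleneck_forward(x, residual_fn):
    """Toy bottleneck forward pass: residual_fn models the conv stack."""
    residual = residual_fn(x)
    # Skip connection: output = input + residual, followed by the final ReLU.
    return relu([xi + ri for xi, ri in zip(x, residual)])

# Example: a residual path that simply negates and halves its input.
out = bottleneck_forward([1.0, -2.0, 3.0], lambda v: [-0.5 * xi for xi in v])
```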
-
-## Acceleration Techniques
-1. Depth-First Implementation: Spatial architectures provide coarse-grained flexibility that allows for tailoring of the dataflow to optimize data movement. By tailoring the dataflow, we implement depth-first schedule for a bottleneck block routing the output of one convolutional operation on an AIE core directly to another convolutional operation on a separate AIE core, all without the need to transfer intermediate results off-chip. This approach effectively minimizes the memory footprint associated with intermediate data, mitigating the overhead of costly off-chip accesses leading to increase in the overall performance.
+## Source Files Overview
-2. Data Layout: Optimize activation and weight layout to enhance memory access patterns and enables effective utilization of AIE parallel processing units, ultimately improving the performance of 2D convolution operations.
-
-3. Kernel Optimzation: To optimize convolution operations on AIE, we vectorize the code using AIE vector intrinsics. We load 8 elements of the input channel into vector registers using vector load intrinsic. We apply the convolution operation on this loaded data, utilizing for enhanced computational efficiency. To ensure accurate convolution results, particularly at the edges of feature maps, we implement zero-padding to handle boundary conditions. This comprehensive approach optimizes convolution processing on AIE, facilitating efficient and accurate feature extraction in neural network applications. Input is 4x8 matrix corresponding to 4 element of row and 8 input channels.
-
-4. Quantization: We use int8 precision for activationa and weights. At int8 precision, AIE offers the highest compute density with 256 MAC/cycle.
-
-5. Layer Fused: We perform two levels of fusion. First, we fuse ReLU in convolution using SRS capabilities of AIE. Second, we fuse BatchNorm into convolution weights.
-
-
-
-## Data Layout
-We need to ensure that the data layout is compatible with efficient SIMD processing and rearrange the input data into a format where contiguous elements represent consecutive X-dimension values for each channel. For more efficient processing, we adopt a channels-last memory ordering, denoted as NYCXC8, to ensure that channels become the densest dimension. Operating on 8 elements simultaneously, we process 8 channels with the same width at once. Subsequently, we traverse the entire width dimension, handling the remaining channels in batches of 8. This process continues row-wise, resulting in our final data layout pattern: NYCXC8. This optimized layout enhances memory access patterns and enables effective utilization of parallel processing units, ultimately improving the performance of 2D convolution operations. This transformation ensures that data can be efficiently loaded into SIMD registers and processed in parallel.
-
-YCXC8 Input/Output Data Layout:
-
-In the YCXC8 (with N=1) data layout, the data is organized in memory as follows:
-
-* Y: Represents the output feature map dimension.
-* C: Denotes the number of channels.
-* X: Represents the input feature map dimension.
-* C8: Indicates that 8 elements of the input channel are processed together.
-
-OIYXI8O8 Weight Layout:
-
-We align the weight layout as specified: O,I,Y,X,I8,O8, to match the input image processing. We first load the weight tensor, organizing it to match this layout, where dimensions represent: output channels, input channels, kernel height, kernel width, input channel groups of 8, and output channel groups of 8. By aligning the weight layout in this manner, we enable seamless integration with the input data layout, maximizing parallelism and minimizing memory access overhead.
-
-In the OIYXI8O8 data layout, the data is organized in memory as follows:
-
-* O: Denotes the number of output channels.
-* I: Denotes the number of input channels.
-* Y: Represents the kernel height.
-* X: Represents the kernel weight.
-* I8: Indicates that 8 elements of the input channel are processed together.
-* O8: Indicates that 8 elements of the output channel are processed together.
+```
+.
++-- aie2.py # A Python script that defines the AIE array structural design using MLIR-AIE operations.
++-- bottleneck_block.png # Figure describing the layers in the bottleneck block after fusing ReLU and batch norm into the convolution layer.
++-- bottleneck_pipeline.png # Figure describing our implementation of the bottleneck block on a single NPU column.
++-- Makefile # Contains instructions for building and compiling software projects.
++-- README.md # This file.
++-- run.lit # For LLVM Integrated Tester (LIT) of the design.
++-- test.py # Python code testbench for the design example.
+```
-## Fusing Convolution and Batch Normalization
+## NPU Implementation
-We assume the BatchNorm layer is fused into Convoluion Layer. Fusing BatchNorm into convolution involves incorporating the normalization step directly into the convolution operation. This is achieved by modifying the weights of the convolutional filters to include the scaling and shifting factors. Specifically, the weights are adjusted such that the convolution operation performs the normalization, scaling, and shifting in a single step.
+We map the bottleneck block onto a single NPU column in a depth-first manner, where the output of one convolutional operation on an AIE core is sent directly to another convolutional operation on a separate AIE core, without transferring intermediate results off-chip.
+In our bottleneck pipeline implementation, every adjacent ReLU operation is fused into the convolution operation using the approach described in [conv2d_fused_relu](../conv2d_fused_relu). Fusing adjacent convolution and batch norm layers is another inference-time optimization, which involves updating the weight and bias of the convolution layer. The remaining layers of the bottleneck block are mapped onto a single NPU column with one `Shim Tile (0,0)` and one `Mem Tile (0,1)`, along with four AIE compute tiles spanning from (0,2) to (0,5), as illustrated in the figure below.
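The batch-norm folding mentioned above can be sketched as a per-channel rescaling of the convolution weights and bias. This is a generic, framework-free sketch under the usual inference-time folding formula; the function and variable names are illustrative, not taken from this repository:

```python
import math

def fold_batchnorm(weight, bias, gamma, beta, mean, var, eps=1e-5):
    """Fold a per-channel batch norm into the preceding conv at inference.

    weight: list of per-output-channel weight lists
    bias, gamma, beta, mean, var: per-output-channel BN/conv parameters
    """
    folded_w, folded_b = [], []
    for w_c, b_c, g, bt, mu, v in zip(weight, bias, gamma, beta, mean, var):
        scale = g / math.sqrt(v + eps)
        # w' = w * gamma / sqrt(var + eps)
        folded_w.append([w * scale for w in w_c])
        # b' = (b - mean) * gamma / sqrt(var + eps) + beta
        folded_b.append((b_c - mu) * scale + bt)
    return folded_w, folded_b
```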
-## Fusing ReLU
+
+
+
Depth-first implementation of the bottleneck block pipeline on a single NPU column.
+
+
-Fusing ReLU into the convolution operation can further optimize the implementation by reducing memory bandwidth requirements and computational overhead. ReLU activation function introduces non-linearity by setting negative values to zero and leaving positive values unchanged. Utilize SIMD instructions to efficiently compute ReLU activation in parallel with convolution. After performing the convolution operation, apply ReLU activation function at vector register level.
-We use `aie::set_rounding()` and `aie::set_saturation()` to set the rounding and saturation modes for the computed results in the accumulator. Seeting round mode `postitive_inf` rounds halfway towards positive infinity while setting saturation to `aie::saturation_mode::saturate` saturation rounds an uint8 range (0, 255).
+The data movement within this pipeline is orchestrated using the ObjectFifo (OF) primitive. Initially, input activation is brought into the array via the `Shim Tile (0,0)`. We broadcast the data to both `AIE (0,2)` and `AIE (0,4)` via `Mem Tile (0,1)` to perform the very first convolution and skip addition operation in the bottleneck block, respectively. Since `AIE (0,4)` must await additional data from other kernels before proceeding with its execution, buffering the data for tile (0,4) within the `Mem Tile (0,1)` is imperative to prevent any stalls in the broadcast process. Due to the data's size, direct buffering in the smaller L1 memory module of `AIE (0,4)` is impractical. Therefore, we require two OFs: one for broadcasting to tile (0,2) and the Mem tile and another for data movement between the Mem tile and tile (0,4). These two OFs are interconnected to indicate that data from the first OF should be implicitly copied to the second OF through the Mem tile's DMA.
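The buffering argument above can be modeled behaviorally in plain Python (this is a toy model of the dataflow, not the MLIR-AIE ObjectFifo API; the names are illustrative). The point is that the broadcast copy destined for tile (0,4) needs a deeper buffer behind the Mem tile than the copy consumed immediately by tile (0,2):

```python
from collections import deque

# Behavioral sketch of the broadcast: the Shim tile's stream is copied both
# to compute tile (0,2) and, through a deeper Mem-tile-backed buffer, to
# tile (0,4), which consumes its copy much later (for the skip addition).
of_act = deque(maxlen=2)   # Shim -> AIE (0,2): shallow L1 buffer, drained fast
of_skip = deque(maxlen=4)  # Mem tile -> AIE (0,4): deeper buffer so the
                           # broadcast never stalls while (0,4) waits

def shim_produce(tile_data):
    of_act.append(tile_data)   # copy seen by AIE (0,2)
    of_skip.append(tile_data)  # linked OF: implicit copy via Mem-tile DMA

for i in range(4):
    shim_produce(i)
```

After four productions, the deep buffer still holds every tile for the skip path, while the shallow one only retains the most recent data.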
-```
-::aie::set_saturation(
- aie::saturation_mode::saturate); // Needed to saturate properly to uint8
-::aie::set_rounding(
- aie::rounding_mode::positive_inf); // Needed to saturate properly to uint8
-```
-After convolution and ReLU fusion, the output data is generate in YCXC8 layout. Ensure that the output data layout is compatible with subsequent layers or processing steps in the neural network architecture.
+Starting from `AIE (0,2)`, data is processed by each compute tile, with the intermediate activations forwarded to the subsequent tile. `AIE (0,2)` handles the 1x1 convolution with a fused ReLU operation. Based on our hand analysis, we partition the 3x3 convolution across two cores, `AIE (0,3)` and `AIE (0,5)`, to balance computation and weight distribution effectively. Therefore, the feature map from the 1x1 convolution is broadcast to `AIE (0,3)` and `AIE (0,5)` to ensure all required input channels are available for generating output feature maps in the subsequent 3x3 convolution. We split the output feature map processing across these cores, with each core computing half of the total output channels. The outputs from `AIE (0,3)` and `AIE (0,5)` are then merged in `AIE (0,4)` to perform the final 1x1 convolution. This final convolution also integrates skip addition, utilizing the initial input to the bottleneck block and the output of the 1x1 convolution. The final ReLU activation is applied to obtain the final output feature map, which is transmitted from `AIE (0,4)` back to the output via the `Shim Tile (0,0)`. Although not shown in the figure, weights are transferred separately using a `Shim Tile (0,0)` channel into `Mem Tile (0,1)`, which distributes them across the appropriate AIE cores in parallel, leveraging the large number of Mem Tile channels.
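Why splitting the output channels across two cores is safe can be seen with a small sketch (plain Python, illustrative only, not the AIE kernel): modeling a 1x1 convolution as a matrix-vector product, each core computes half of the output-channel rows on the full set of input channels, and concatenating the halves reproduces the full result:

```python
def conv1x1(weights, x):
    """1x1 conv as matrix-vector product: one weight row per output channel."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]

def conv1x1_split(weights, x):
    # Split output channels across two cores; both see all input channels.
    half = len(weights) // 2
    top = conv1x1(weights[:half], x)      # e.g. computed on "AIE (0,3)"
    bottom = conv1x1(weights[half:], x)   # e.g. computed on "AIE (0,5)"
    return top + bottom                   # merged downstream, e.g. in (0,4)

w = [[1, 0], [0, 1], [1, 1], [1, -1]]  # 4 output channels, 2 input channels
x = [2, 3]
assert conv1x1_split(w, x) == conv1x1(w, x)
```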
+We use the following architectural techniques to implement our bottleneck pipeline:
-### Benefits of ReLU Fusion:
+1. Depth-First Implementation: Spatial architectures provide coarse-grained flexibility to tailor the dataflow and optimize data movement. We implement a depth-first schedule for the bottleneck block, where the output of one convolutional operation on an AIE core is sent directly to another convolutional operation on a separate AIE core, without transferring intermediate results off-chip. This approach minimizes the memory footprint of intermediate data, mitigating the overhead of costly off-chip accesses and increasing overall performance.
-1. Reduced Memory Bandwidth:
-By fusing ReLU into the convolution operation, unnecessary memory accesses and data transfers associated with separate ReLU computation are eliminated, leading to reduced memory bandwidth requirements.
+2. Data Layout: Optimize activation and weight layout to enhance memory access patterns and enable effective utilization of AIE parallel processing units, ultimately improving the performance of 2D convolution operations. Please refer to our [conv2d](../conv2d) design for details on the data layout.
-2. Improved Performance:
-Fusing ReLU reduces the number of instructions executed per element, resulting in improved computational efficiency and overall performance of the convolution operation.
+3. Kernel Optimization: Please refer to our [conv2d](../conv2d) design for details on vectorizing convolution 2D.
-3. Simplified Code Structure:
-Fusing ReLU into the convolution kernel simplifies the code structure and reduces the overhead associated with separate activation function calls, leading to cleaner and more maintainable code.
+4. Quantization: We use int8 precision for activations and weights. At int8 precision, AIE offers the highest compute density with 256 MAC/cycle.
-4. Enhanced Resource Utilization:
-By combining convolution and ReLU operations, computational resources such as CPU cores or SIMD units are utilized more efficiently, maximizing throughput and achieving better resource utilization.
+5. Layer Fusion: We employ the AIE's SRS capabilities to fuse ReLU directly into the convolution operation. This eliminates separate ReLU computations, streamlining the convolution process. Please refer to our [conv2d_fused_relu](../conv2d_fused_relu) design for details on fusing ReLU into the convolution layer.
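The quantization and ReLU-fusion points can be illustrated with a scalar model of the output stage (plain Python, not the AIE intrinsic): after the int8 MACs accumulate in a wide register, a shift-round-saturate (SRS) step scales the accumulator down and clamps it to the uint8 range, and the clamp at 0 realizes ReLU for free. Rounding here follows the `positive_inf` mode (halfway values round toward positive infinity); `shift` is assumed to be at least 1:

```python
def srs_relu_uint8(acc, shift):
    """Scalar model of shift-round-saturate with fused ReLU.

    acc: wide integer accumulator value; shift: right-shift amount (>= 1).
    """
    # Round toward positive infinity at the halfway point: add 2^(shift-1)
    # before the arithmetic right shift (Python's >> floors, so halfway
    # values land on the next integer up, matching positive_inf rounding).
    rounded = (acc + (1 << (shift - 1))) >> shift
    # Saturate to uint8: the clamp at 0 is exactly ReLU, so no separate
    # activation pass is needed.
    return max(0, min(255, rounded))
```

Negative accumulator values saturate to 0 (the fused ReLU), while large positive values saturate to 255.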
## Compilation
To compile the design:
diff --git a/programming_examples/ml/bottleneck/bottleneck_pipeline.png b/programming_examples/ml/bottleneck/bottleneck_pipeline.png
new file mode 100644
index 0000000000000000000000000000000000000000..e91b231be5c1ce94e7cb2adeb68d623a2b47dbcc
GIT binary patch
literal 79590
zcma(2Wmufe(gg~`V1p0tFu1$>;1VoYaCd?Pcb6f-Aqj3FNP-6qF2UV{TOhc*oymT-
zeBZgwyPwS;u7SRLx~saY)~Z@}jE0&5CK@Rk92^{`lA^3892~MW92|TX5E=Ft%Hk1c
z*e`f@O$BMVic#`iI5;qzlB|@rkI6wGN*aOO)guxEyv%5FxZ-Gq*-Ebc2o9GmK5yE$
zv|0YJ{S0?P{qFCMvz{}aQXu>omSp!FY7AbLn`d?Ly_Zu{k0#gt%bYp)fkAsMduYX9
zM3%I6Rt$7tKhm@S*$^NIo(6&lg~I{=pC5qup$2r;fFB><IS=
z7=Q{1DO3SKZ1Dc4g@Bp6`Tu6Ef9)a^5e(5}fCqm={ky&WV|-%jI_X~z{>}a%01%it
z1upR?;(syMe`Zn~bjAF8^TNi1;em;%N^t$J!~Q3$=_?ExxppG)xUNB(9?ph@@43NM
z=H197!fu7JWP)l}tHIC24r}zfh&e%b_UR&?U)ENIg@qBJe>XV~erOr3wTRaNlmAgu
zalq*hA`+3$omh@5Exu>Pl8;IpEho}VhxLnLPHZn6sKoth10GKMcr1G{>}SjLKg4Eq
z)BT@k=pRP0q5@PXrT-fGIHW)oNXKqipV}FY@*-2fY4S{4Qjz2$a4?nmr{H3<{KJa>
z@xTXm!$RA!oW0zxF6lYF;dEYwc#%D(&mVl(LQ#xN`@%+nMjjJ#XS>C@gTc@DruM6C
z;`ucHeF30Q47m<|=JTsXmm$wN<5!qb9LC=jSK9(VDXsWYV-hHm
zl;Gqk_v>~WQDdan-(Bv%cz$~P{HoHVX3oTafI6Cdqhx-(a9D{kn{Kl&L7fcvLRr~r
z=8H}kf2M-YykE12wneTm1e|Hi>)Pao**1BbS)aqiV5p4Q3vTcu7
zhKt)cgc!w736f7f17l+eU*CPR8A(RPW9T>bUZ!_itdBF8%YA;ZwVy6YGV=OqjWZOM
zx;vg9zDZmlkQxDCeN~Z)j7eG&{J;fzog4hr#HLpTA>=T!GyDa!8^=zzcaF#LO6$V0
zT$=`nYyC;iBOyGZR!2fq#355f>st%?($-LyY>IXGlulN%v
zA@n`HG7bDpK2#NWz5L3=XC0~3u)%rQIB56t$E|qLqeRDwCi#k%v%!+fkk_G)XG5Ht
z8X9JXZ3P9=%=Dm|{S&o>J1X;Ge?>IMguOJSB-I#l?&Ry(DZ^89!le0Dx28686dQYx(wc_+Z}
zr1sxqDi|<>bD}bs`u;sCKDVny@XdPkjKlqDAJwnq7aT9m3!aXjpVUoYS1|1~2hqyPJy>jIWDLhlf>L0g9G8aE^va{<9CP+T4WAngvSmIfv9Qr9i!oX%MhrRvr
zDs$`CuiM#j*k#kE^xeYoz5Vs^*IGC>)?9Poz1L?=zq{Synbnox=Rgu6*StHIL=-Si
zMcbW2)83b_0sHUtan6>_0x>B>v?1iT=2QVEp+;|h5X9@`i%0BC3H^x(Nxx4`H!V&EP&r7k8F^-uO2+n*SQKA<};=flol7)E@LmE)yap
z?7o$xFj()jxG|WNj~XuEv`{jxRYK%M`;0n`cYoX-Ow6QOzlIIw%8P|G^6M3=WpBX@
z_v*RTb6
z$)asoa(joSLzH@(eUBn+op=vKzQmr^{Nqr9id#q&*{q*X@g6A#zj)+rtA9o+_3oS1jiDC=v@*6N)
zR~h`d4{LIUPSJpw#2Fc3#rD_dw*1Fy3mE-QmCJx3A9
z5Lp@iE(n(ZeHg4^ZY_io3N_T>={JN`X~AOX$DQ0@lgD^+65VJKZFY@TumW1v^1|9h
z{R?vJGnAO0Qbh-Kb0(7-aop1I%Spu%lLy%GK{5u3`N@z`doKk2Lk0ee0y
z9Q4|CK_(voP-xyAg(DS#M|G>0<(AO#Q^I->e_&)pXT4FQGCNdH+OEp)a<6pUg+U-A
z)%ScvQ0LiPBk18g_gUM4&Mcgq(*)B8!JElO4Hh6BK7JbO-vvyfZ;in6M0t<`6K@2&
zjL$bbb+Kys6b;^T{crM@RPF~>8k*E76MHjX-#Z}9r#Z@{@U%bOecjsG`RXuRVf+;+
zkVvS9XVYJ2XZieaKK82%owFA6YqXIJI5-b&a(yzd0G7&HE$hI(5QqhZ{Lmw=ldAav
zm)X%Qgih*qrtyts-;08}{d)Efe~aIOL9m5c{p?l{<@u|APkZu4z~yMB__y`^%Iyye$QUuS0VvXfU$;89KjVu
zYT0}GFT<*g)Fo~UyFlj45RTA@P=GQXVhLFq?C5BFV(%qoy0q!%_->PUQeYG!pXGwt
z(n<}&c#y2HQ^7^5Ib!<1R3{vf+Vk1m=BLXIUlGg{V(fL#j~~HuJYmK5&1S+Y7TkuY
zTi=sT6ar?ie(LCwL^JSim2O+7{`xVVyTbO&@0W_Q{*&?1|mCQyV
z{Ox^|{!Cei7&isaVAN<#G6q;IczKT;k*Acl>$qHI-OyM~{&x`Z=1(Z-9UfVq#!O@p
zS;AAhpnWMK?P9+VOWy}eObgxQe8MjzzQyOCg2ZB;qi$n+3BSQT`UI4t>_ewCR
zH;%0M4v2)N&9RUuh$Pev$uJu=&q%F~CwU9r@|LQ|vqK|rH~MJol}>}1jS+~vTfG`|
zA34Afi;4)vnu@B!2t$>PpBZ=ADwuf7+~BC3;JMZjsvL>9SZx_?CLRju5qe1kRk*-u
z+&s4Nkz|^zWf^$UjKi2CiTgJwKIHgM{8>voTBviJj)x3{o@ffyB}_do}mMf`F1Nsy=x0VJ_2)h8O`gwE~t%$NeH4ZEaRnGE{{UqhO}_&hyf#`wvNCi
zYZvnEIDY7(H=p7~lywkrk?BzTNh1Uh1_}rye+y<^XuN
z7R+)p_!tBp6hr_NG20v2)2xM-ueA@DE4|f3%;6WAi8rJ_UkJZ3{KN#~kAr{);Ijk|W-zs9+tv
zeh)0<1>`7^VCaet<2a*@ZeH%su>#;_k==sqej%iyg4q+%k)smLUHLMRlTjF--(dC=
z3GJI3%vi7hY2Y!7CGi=HEPfbJwtvbZIYU)We8JH~XM}&NpZG%dE6aFrxep1Sod$m-
zpv$u}KY`1n1HQ?lj1E5)9C&x}Qfaa7J3I>*BpHr^tsDGpBkaTkW+ScqMKg^n2z=oP
zDJjP+5)c>8$#U5BfL3L;#(K!=>EZTEyTNkAY3S?F-9%gPIK&|@YgECqSaZ@hz?HqL
ziTZ$CIw%lEkXVdJF{nFN03g^IQ;B?<8ni&*aAo1@v
z;!ii5>qfE+`Zu$NE(4f?OL@IWViJ!RT62!dcpTt|9G?x;8H%m1@JiuTzQ9}*P4vQK
zfeR`pGx1fw{c^Ya!-lV}t|qf?Y(i%l!_shH__tC36R}w@`a7-p6QfLnZa>p}$2sGC
z8RgGDZ90$FI5rORkZUv?R>;HcSg`JxwpjCYKrsoeH_@&yAJKmOqkZ
zJKlyYw#FSZZ<$TsqJ_)i9{2ma>h(s$D$+lPBL5WF|10Z<9>{?S+AVKYhF328lkh@)
zK+u@fi7yDemCz$ZMZ%UmS_QPTATZr~uPbUnXnF_b95{kVf*hijRv4@MC9M#+_q0j%
z%%q_TVYXP1BQh=|pRvi+pi)FhWalUrFKN6a?wyOUkql^6=wdCY(V*Cz@^
zWD1Aqg_uo5wa|cl%v||8PO+mUyXu$S$HdpU)AK9VYNNj)j(Tam+1dilIED7Z*#q9A
zxQ1y_vf#6RFd+)lZAv4~^h|T0Hbam@xx?mBOQF-ipw1#rCDRboSN$q+;>~2f2A0O6
z1d%6UI0DuWFVb)fF!Gzv&dyq}cvT$W1c#Fh!~Iaz%!MkX#CClJp(Naz25k%48`19~a
zpTi3g!eVK<88p@6dp^&K_FFlSQ}$BBkGdsyCOyobQqiN#x-2lU#C0o->j!T~hRncN
zg@X>wm82`5jN#o+tbH1Q2?$al)MB}o4Pmu!HKOv+Pb}&8Mft^GdY;;@1B~rSZa*CA^~p6N3tav-KkR4?J4U$A@PwpqK)B7tQb%xyh)n!Eknd
zg+gGzIbdp!B%4T1lm;gZ4GkfMxHB3ZqfC^X;3ltg*EPGrY9xc7We131$<-aSJ8r)k
za31GCWnKZ2$29Lh2}LA$`Hql1b<;Q+UPe5_Cq(A?4;#3ZDYId50a2ny0J~mwDxuDx
zK`GbB-tD?NH+l^u)AN6U0C|`M)~Y7ffAcXT=zylB+8@r|fulhfo3mJ8GxR=?ws@Lt
zKuPHI#QMNVH88qz8Pzz^5&Y8({Hbb%&%~9oo4SaA-?{@2$`=oT3$^wvMigud8PX)GgJzhJz(Uop6IDfdR*(|Dw5Djjm(aS9
zpxvr4=LJtBIP@P8A-<+kKJwWqOhto$Mqm$zBEeFMRC12Tq*8skUiW2sq${T^RDLPg
zg4+y#SC;(_T9w@By|xNORiXt9$U`LYT5um?_8PE|(4>*k)ERY{R+e-8kAxSU*B?>$
zhU`$lRSHDx28Gj|J}xquLRp=G*(E=1M1yD&QT@qQb%bi8u8l;eP-2(Xbd8M4uxi&^
zDNuOwjF8?m+fOlEI=^$_+tTY~%3;6GrFUUzkMon)n9Bi6BJv;ee?zjtG1(Ar9r((6giy~yr>n|)7^?7Jt=jTlB20<6?n-R`t8p-qfS
z+gNd^R+~w+5e2?JamU92Oa!?Gm1NgTK0=@1jd3h2@-d?h
zgDWfszc
z6|vWjOA1vxd@u2skBegXr+?miicjvb{vTkqIkaXf!_*%
z!GZTw*kcOhvOk(RFrFj>G7Z@A+AL2G<7$z>CLVt9;%eF9dQ^Jbyxih*;{K|L$uS|U
zn#k2Lm%G8=zPZkJr$=ceYkd5j#mt1T0rv$fkvd~qs*NOd3Yi|mWbIAO5J&zwj8js0
z;r@1nd|#tB^F+*WK>C)jjRGX6)TqqMb@h^Q%Gcp=t?Bq_w?iI*KwvJlrh;gnm%aL(
zA_5?M(t$9ut0_d5UlEd@>*}M2xd${tutuN^@PXR*wdoTUB4*KQWI}`sbQqZ4kuzif
zPm>vu0;kPuhU-t(XRFL*e&O@TROAn@K0jTbJUCj%?xuF#LDB@7l2vMeR1q4i@Cd4O
z9ifpmBBiYaWD(k(ePBf@_ag{mmr}=KItMVf`wTYPDKY)`i}wJ*M)u$wCZ8+$_ae
z%MP)1cyYfAyU!74HAQQU;ddy)u^Po#T2TVrL5bFS`!^>FqZC3Hs#91TCGQ{sVIFSZ
zKJ2c=3*1*fW6@woP^z#Fmo*h-IiT2wPwfc(im!`kg$fH?zxQ>j9gm*%a^>trr{6<(
zQ$C|J>jnH7kg5yk>2o?QkVs(Z0&K-1R5F;#m2_#jx*>;C2~mRKt6;e=|!{c|`rUa_lZbo?`Tjd15d;`K5a};Z!P4{Ku&=fv!$xgv*;!%ht
z&VL(jCGP=8`WJ~Tqyl)YR?mj_czx=kHcSm~XG>|b^qfY%!Ny7N69d^xaK3>U1&f}i
zrv>fjh%*2NWIG6$!-@Qs@-Byh3bDFFqqUM$`%s&PuI3?dngzAtj}Hv9^*WOv~vLZjLck^opaLx5#qYQ!P1P?$7ArK
zZo9bg5Tbk_bK1w-)`)^U0wz(_(6ipowC`3qh)ff8y&g&9mI)lU3qwC$NeV{}Ykb?)EJagoVBpnv(Kr4jv;(0jV&Zzl=1i=Xb&k$vZ84p~!bZ7C|FV2KUIG
za8N?C&?&+NVI*53$}NpVy#c%}=TJAB=f~?;#R;v%y_j7iN40K+ersagOXe*Zshzt@
z55Y=GF?T4I{3H7rfYQil!PyvnqAN5tYkA(*tCo7Qa+9Yc3V7CAKmieBZ?%_{#ztG1
zFcLMO+a%%+G^*Y*ydlWzopc1RD0zT+Ah(s=8rK8iWyPUse`@q~SoDK$30SAU4e_ORuOmGevReM3&_^UD
z>Y4i~hazP~Q+A#vL?A(+xC7>!qHd2p|BuGv;Q0K2nWkwCWXrxsNc4>?(2$WI^V&1O
zG3nL)yO60BWT}`-j18PI?iNAIF>?QdYKPS|4TPNbp%3gY?pEa61Y2Y}PO##x(WW3-
z8ycj&Rc4)lzGEgM%VcezqplB}rWI{3T&vJl&4`;XHKG{|r6Btw)G
z_#f`@Z$Y3VpLjb{LaZbjKZFOKlbKHM6s6Vb6URx|MQ2@_E>Mg&B?IL
zb^rr%lcv5PPCpx>Q4)_|{|&Hj1Xa*dh-5vnm-z{{byi!Nrx<*7F~Z+#J+l6G-Er$h
zF(~05gF0dWC~t#0^cJN?v9h9&sJD7UFbDU!B)#T3%;6OZr%{UmctUUAboGyV=#g-qp6~`{zX^zugNs)ZvZ#}qurIcfD$&`J
zoXj^IG1(LIy&;n@!ubNC`%HCD>Q9glG6$<<9iiV~HXWX*g}2(hQe(9rms}d7Az7hF
z9Kv4b9l2%%sc6iHLA-d5_IoPK>unNX?5sSSRihU|lI!44mIQ>s^Y2jM`HOocz75^4
zM{7`NA;N|tnU%Gtux?!_8{}WOU8-ZmyX>jL{jE_l%v}cj5sAyNT|GYcZ)xUbf+{sbrDVy4JPH)gnSuJ9c%181w5-_a0*SADpqi$4xm(azrGAtL-NaCiAxS1fa%Zk|PRiH^wM&qT2z
z`*G$`7)F|PR(M3a>x6vYfw=#Ha${uS_6hxQiV;uM;N+lMP_8UuSz=lkU$XzYYpYn8
z#aSJTdCaC^&{dl}RYAGc9GZbdE#R5P<0zc{f?e~5I}mfaO(Y=ciR=hfgH1u}o11AL
z&s0mVI%3O#Q_1ri><;fRuxsJ8{xby1BE)4|r*a$5uk+~ydx3(lYaL0pt6KD+
z!`9+#mdB8xV<|c5Us?dbZWNqSx<@AWm7qNkT#4=0aC4~fVj*O{$_3bg8k!{uws5V%
z-@`6ZI!I*zJHDJ%vJNAXi~vn?HiLgA4xz5MB}N0z?6Nb_dr-4FvPHeU_z5XKuqAiN
z^;S+a$A3_A@dXKj?)QvNTe@8MiZQj01b6`>_6Vy!czL*Rxh7OLNXPDb~1=As<_p
zzrNzE6OSN@a$n6z2GfnY`7XW`GJpHaLED2Bcc_jU`I+x!WA7`ow*t0f6Y
z+1JQcLn(kA_s6<1O59NGKjH+IVZePNAt3
z=#C$r*cZcnc?N5Aml91lR<`O!I=uanX!?Tjh3pM2uViGMMD|V>hldKA(>W>48Z!J=
z5I`7!(D~ix)S|9$J>-W(wE-XhXRoXb2N6dKjq&wOVt?{=%;gM;0je04nOm#8s0q6G
zx}b8COg{T$u#pU3retv2_qk9FX-bJDlsfZO8mj`WG~J_pyWeNmd7zV;hEoh==5Sm5
z938I@!>z_JX6I(!<(g14v;%c5*2vsg!DTMS$WrN7Cm5jl6&pi1%TRJ_)%uri14ITe
zX}5f1hD{xi3nP-k2UXn;d>PCz$Z}GYH14TcdX`SSn6afJ3ts#qpm5?HJ?)K!v#`o_
z8)RlVz*iGo47}+z66RDfM1*ufC=eJKrO(!^O1**>2C^@eJc`MgMI_2mfwSb~^cwRW
z+=cEE)Um{8YE5y=@l_7};_!@kd{)CyKo8&G
zE47&?=C%K><(t(Dpg#ZoDM8+o`17H(MV}U0DE*5MC0gtAto_~*;^h=gu=YSb4Otm-
zANz2;iWD(Igyc6;d|RMk#3M8?z7>OjWSY}N+gyYQW6zsL+ry+b_;KIVoU!63mDVlp
zR~9E2RS27kRF{0LMO!9i=A*v$RA#>|+l`%TEq|X6Hl~85U{WD^f0s(rb`xZ5%Cfu2
z>YpS>s6w(h>|0DZ$3i|4t?v-%HNG9SKR->DYBBx_nL?8)4#)rcs`5kN)z3ceKSYiP
zEA@|q&1m={hN6EPj7F_d6RH8BrC0Z&pG~j)g@II(4{A
zbt=R)#oht)5`DPMn}aSQj-xh(Kh*qKruv;!R>#cz-?0U*7b}s{&lm4&C;9+$XZ_#1hsad
zZ2@=1NXx;*5+7%N%hy~BTm{Wl80)mJLH1T!{p%z)vI(x~TT*)5tKhGJOV_RmO)v}m
zT5=yN4~>lPq>6huwdBx8`3so$L_^qCs~K|xYBc3UZ>4!oVT|!}p3qE`cThQlf}(oB
z3)-;AnH!Z@m%uXznnZS(%xB<3`nFJ@6dWg5gK3lC0&8h(cfNSYl%_N$_4
zoAei#=iB`XiuY>1%ny*qBn8Tsqa?%|fXspKNrI@iDQ2@;8rw>K%~^QBIKYNnU!ej8
zOzWUMTBeAsiOg7+hhIgiMh$D&ckXRD*K*kgnRSam;kwQtmxKd(AEb`n|LJ&Iz$#*z
z`_s&0$!(^xgFE<~XE3dZwiQcORpYhUak|pLrM)7kjVkcsWnGnpOZ%gjP;}gKtNmz}
zuyX!fwWY$`vQb+=)Awt5qQu#vH>S-|Nl1aR;18|k*HKzIpS9-~MSgNkFJVhQxDs;<
zB_(E}9h3XC(d@+fCIZ@JZsSgDq3@|iOoAUh-8@mQ|+?Ha-
zt1%h4Dqog2tl&n7Rp)^ElyoA%2)^(?P^Gh}tDT*lD=^3eRD?A>P0+Fo6mAL*7aN9P
zVu~4lpjMbxQx!~h#6mwRe!cYROx=e&%SR51E97KOjGlb9F)bm@NNt6B_jannwdhY1
zaxGS>k(#)RT`I>pN_C3bWZd?#Al}RG`lx&lAR#22V8rhg#SP)N#MNu0B<>oB*iVWs
z@;Y2Fi5jet?cfLKxoE^t8>QQ$PL7G>S-#z!ISL7xS
z5wOKHgk`Fq)`@?t^yYL(f^T6b_Up9eP4GcDY`y_~+w9L>h<^M5R-5>=dtD|Gv?SN$
z#83ibMs42tbFjjnSi)s*K@=txEy06YygOCQ8iu@%myL0)#1zJ5(=#*M8+hoct1n@K
zXtBK5h$XxNI>U}80x@8W1umgOtk&SdM)5ulx5n-8U2$OeQ{HC4tW=b9-tUAGJ~<}F
z71~Z+NOIM{kfr?Mg^rLelcPrtX`Hw;UKq5q6I1k(3`=coCCZzCPEg`%-f}Zrz(oD8
zdLTR)wC~M!2HYS69e?Tn6t+X%_TuyrdTm%)800AMV%g5_gR(_nU0j@wAsz6_uB&o}MG)o#RN5&Y;2yg~<1a+7M9oEfh
zzX9>~;~;3t<+Amm5MGI4n4?v{S{p`O$7NpOOU|Kz7u;}d=00EtiMposTKrz@PonL<
zCUrjsFVSfGaMgypqGuR+4Zi;*^YZt7dU`itoh;9zyA}8hQAJ2by(clBbZsZ&chF87
z|J4?6E$^!;2JES6!1;oBk34puIA+PT1(~PwccEn?18OY4I#0(u$jZqTd8SaLU|R4v
zk~IqVN9s2xAMB7qU!tn6e6xCUw_o{L?j50n0Zet4-|tW!C9??pw9Qa!FLplY81oNJ
zXoo1EI-qZ(ih6~Df52sq1?$epXFZbSn!-^~S5__bjG`V!7(E*`$#P>>=IAQHfS>WO
zWb@h-o?H?ZB6C3Fj3H~o5NCoC%-`9_$-vTUH$RvIvWgV2fb93P
zzY=9(<-x75A>R-fr`GB0$MZsqgBA(of8#PCz8w990NO;r@Z?x4lqVjKw~
znl=}>%!9|w=-tQQ2-Qi(U);_m5yN6U&=_o#3z+(#SUR<{+GE9cCo(h29caF4Cktjm
zys-D4PDoH^nk$PUUeL(zkhnkKl^X=3*3VR_SG!1Qgw=+GIUg-GvDBv73pqhku3@S+
zE%V;I*PSa@-RZY5MH!kR;Q{>x3iW`Q##TpY>}kd|4d|1Xuc-!dAzJUdz_>?D6fS(H
zBF%AolW_rQM)TJhjWgCwmEi^>WpjUGWZa{O&2Sni#6n#L6P*CHYyRr@gbYq?X;nkM
zE>b%BnI74ZjE{Qt7n2jwoO2jb*Oim10mI;|_Ewyfu~!rV*higx=eBME4X-}`3P?#-ecC>n9{nH
zpJX(NSbvnK(fSrtuo)Cl5n&~1X2vYnFp)^CjUUXZgwprmY3!~XSm|AhSI8VO8W|YZ
z(jrEGE}%a98$K*AWC|~yKuG8TY+4YCQcTd57e^*ok&!}j_PPXat*jH>JB2}&{OhE-
z2A4_fqJ!dFXdaS~xXFudY{^uZ>t2;w
zW=V9(Cs490K9sA!rKL+8T@O2v06rE|WY3C#HsIGo*KT@N1iGJpECx64Pf(_cY3$k|
zfdzhlZYX3Cd!XOr<9O6Kq+p%
z7H9**Jl9njDv!uR0G~)Ig~w7FM&Y*E(S)7}6ZmSLPe36>G>NMI_nQflgo(pm!ZjBU
z8{)_xRF8Smzt;?5&%p#P)L>iKg~=c_m0|ApbrP{YIMIy9R9w1gFd
z5o`Cbi#Ks5W8`7U)dn2GWd$^Z%HRvBNd2`T81{?~?Cq
z|7W()S_L>?7LQ>+UJT_0>{tXaKtDcY6d^pKh_mg+;=`xTb-d7FDYys>Xu6?GtIn^u
z0-S$pHu~jZOU#V>Hp-0`IxPj8a&H*PU;0nBP<(;t+6AoMO(K5`MWIw2;;x#4ebuxq
zWG6d~%RDU0XCo#8rf?3$kb+4ZB@$UH8l~Ir_E>i(Kfj2IecSyHqrxEb8YX;DpI`U8
z?1;LIj`8^RzLr0Gl4R#27eSNt!YF`mKfjL8a}yZ@BPuPx4*ve*j5(B;2)^%?P8y
zKO0%q)E&q3^RAmZf9#k5v*Qi773N=dg#4A-{;v})DH%n(!wJjCtNqV^3TR-XS!CJ%mvz8uoj71-
zJjtlC|LrY+mw!eR&=CIT+y2X5ArG)i5|3`@gNEg4Ra-z0&{x#st6n
zLwV&=M)Gk(Rph~h(TZ$yRyjE5^+5N`oDHqSesglIbxh;FG21-&yRrV!DKZuiGD45o-{jT*JX&tupKD%l(iH0dPvAt;)
z=WJ?hXn*^wC2qczpZuo)#7q!S%@P%3?umFqu985IpPP9{k{atGlgzVQtvmk$<^eBZ
z%hQRSOKPHhg|~BFLQ1E!pnW4>`=28%@f4!E*=6do!7Uf!WP=t#O9$@FO>cM`eukLW
z^rj8o|FCy5tL#ob#my=6RhDh`qx!_odC&8kf9o@6Ob0XB0@PcrbvT{Dp)ohyP+u39
zVd0@i^m#%axBSs=?%ey|&IdOE5g#k^6n3_=rP^OG3=+v|GBi0Rl*25z27s?rGvHdj
z>&+=4Yw;v@cD~_=%DJE!=;Fjdm~&vQB~W@JSL{^v4M+9Trj2+vr_uPM+C3M=b@3v1
zb2_`^zT1=u%lmXLbLmFW?E&Wp9RKXmr5l@}k#mDe$v_vmcj=7`Q>PM^*^k;8L|ShM
z4x7rad7nnaDOPiOXPk4Um_DK%ubw6dxM2>=&kfj3VMTg~NaeDmA+=KKOX8f6g=+g9
z-E-G#>#lz$oh_ZElH6V`#o|0pm7;L+2r8C@3zVVpg#s80v5BvrwL;=1$7fmWu+1O5Dhoy@(TzHY!#(*Cd
zs4|e;46raB)mDs8bxdV+i^w7l*Bnz;B8(FcdnuClQ7Mh=?ew8Bh5&u!I(GHL{Tc;m
zKiU6?xB4(Q(|7jo+X`Wz1{DS;7pgFn6F;#KR#g$vG>p?Z)0qJj6
z(Lc}uCUB}MbU?h?t~z}qN+W~*78jR4aARb^not$o3jS+8temtjg1CAyhF@RD{QIRF
zMJL2CpuegnL)}hRD=k1jGUSq>FQKz6FcpQU
zmv%6sz(UNBt(QZY;Z*?xKrI@eI2=*6!JE+>_3xR%umzoDrcpEFQMiAAg)XoZFink;
z;Z_aw`!$0NVj1HP-;5PXWydyDkw!}sw-zo`VA>eHh^q{q*+-eE2o+pH#>|TdxEK=u
zMj!Ct6pw}Zf3C|Aj_Y76(NF`8QaYQd3b_Q1A_1Zgjj)bQ*yG7txVZ{m3$J1qOJDs{
z5k@w3#0Tk)fL(@DjdNLXR|bxta)DVqK0r{eGsgFJsW}%<7FSM`IH7&Uf9D$>B=~))
zI=z)oFk(08uqFhBkkXp141ZwLPwu5PpZi7tkITz%n^Wxa%~Q~7rE#P#>+r2mIr-!_
z@}WM3?Q&2&Q#PoI)~3eOH1j^)UpQNr`(F9qiIR>CUi3%ooVRi^&p?h`y*gqIIsEx%
z8dZdT8}VeoO-9p#%OipGrIrX`3qIaO)<;IsdtVZ9HTR5s4IzeY42KTY`=qudBmw71C<{E!leUhSKDL!kOXf|0JjZX>iUz
z7M)RtSclFCz9p6q~&c
zl&8#4w;kRZLj@njjc%@JtN!2^!(T`V_}S84nyl1%*-!Z+8q&f@w&^I|^y>UkwCU!X
z=YO324Fu&^1}dcw8=yGw-SiVn2DRJ?mWu$uHjwF965gosLO8&7+&jiyEi>u)$VQws
zd%q~z7=Ijn7Ovc1r($y>;izJTphdRyd(p)l{b72%#f){G(m041Bbk=JRg%ALeCFE*
zj~u0|Y+xZa;IwVbVAc4RBU8eCEawi^th@$aN3;w1myh^`jFba13u9juYBdIwSq+b`
zv|8&OudFAj=I}CZFcr(*E^|W1D-~_7gf*K^UD}=Qc_pPEGD_$7)|p%9UC}pKD&tKg
zPL_p5TYB*@Eu2$cGV1VD=U3#e-#Jv?^VI9om(lRgeAE6Y@~JdC-PyF(GW(-;bNu~6
zE=f#%CDChR^16HpnY)3I-qwN-mhj#{tHp(SmWY}-Gp`1g*UE=RBzK#mI{CpLkM9~X
z(cBlbIo~@z1Rqg4JD8EsB%T4+RxAR0cYo7Qz@l*b!oNs~*i{XtwpLaOd{O&{gve%V$x3~H&_~xgl)^;DqKt>`diTkfB
zxkdhQW~1@D_)chz=^BdhK1BRYW7){B$(wO9;8v
zqsOU|gB-utjvw#x*5g>wmh0f$_)&Rk@75ddQ~l-bh(LLvUA+R`+h1$wN+j*NZ*cdk
zgH-ZZx6$eQKUQ}~+ftz7*()=sWN3$H4b*E@DjC(77Zctk?v7FfIwB!XRw*Bwd-ie!phI$K2=HlW~#}g|WrI-up$CN5rnI(OE2wyjL2v>fROdEsOpNHD!W`DqS@NHJU}Lacc74?eo>!$8%z=0`F$oc>Q%f
z&fKoXCWd4k7au762k5LXT^6ok7=oy!O)EL|v}RDVtW2xFrkYm1KW)8RtE{1Pkph=D
zCYQ|p&}NWXygi0E#cl%etx{E5&x;M)D!+H;1w}#%D#?B5Z&K`KRqLa;rbcsA+~Cq2
z1j`=4{kp8XHTTal>(GZU;p)}}
z4kYo1)ynE48TZ_Mkpo5Yl%7lJGphv@F9GpyyqCi4zxJ&kKB>Reem|IM-`=z$lA#EX
z8um;n&?r(}ApK7Nb;;-0{oV~GA?rKmWSc?k7v21-E0^Aabffb_GUl4mSfUzHN5LxO
zt?QqEhCpgv3_;9EK+pr$8GZXJSTPX&6c>B+cP7K@mPCF!`wb~^(TdBIkcq;fQrj#F
zw5ARt9PdF2xa4Rmr9
zY@XyzUj8dFJ*U!ZRtrPk0+Ecd^a3@eV_5Bb6Jrk3w)8ruWu2>_K~{q!q65Q{8R=)W
z!$2}@ule(gyq^~R=GT8o{9tL+czl<$@z;ckLR1Y}?
zux4D;&^Rur-|_qcpDIsVoxy)f&CJ3-Of+4M(mkUmfPVC$792>|i^)*Am!DoGy)zyf
z5v5m3_jzTJT+#5w3RHm+fx+&DsA9pOHBDCi3FqMbZh*3SFvipIkowkr?v5NH+7>!K
z`xr<0*Fo_a+jn=t8BB6<3=Vt}YK?{aF-C`tp66V}@}(&M(gKV|@F!KE2{kb8c(%Q_
zbGH^3PdH0I=$P?Ii#GxZB&jVI-=q7$OsF#F`e_17Rdy&db&w^^?iwr^+}CDuyCd8T
zEz!8?vAZ{H$aNkqd+lRWeCU4H{+xUo9B9(A9R!b)vP&21*&d!sBD}|cX)7KC(!k4k
z(tX4nBxUT9XN`*77juTB#Nk{XYCi@m1*+
zji|0s&Fy!^Xy2u^bVjc<6YN6C9vE-@!E2z=k4Bh3@2>wH>GSP~um0G${l0&!@x>cxDt{TQj3
z{_^mJSY5hPB>zi1Rn-)ZcuMzjTZD;BwY^Xu)B0+`-C9rYXgBvgiDgC7WS*mKiHXm{
z8sR2g7;c<>0$Q0Xh7}yKMMX+PzZ{E78fiyt77}?xU$Fpe9wJb=BWVzDQc9JH1)B*F
zF}f<`y1}F`KXqmQI!eO9;t_1LwoF&Fie=EL1uz&Ze(>3OnZl<=JvMom&2J)B^Xm?_
zbH0JFMB^J5$KMsN8O@i;TD0<@F_0d}fVqyZ
z)hE*4`y&r;Bt=q2de!gbP&dKJ1CGzwTrJm^4K+T!*fSk)r^!xqDz!5jV$2?st7>v7
zgRA)yy+2lEsGXA1R{RnbQrLjp+p;uyv7-nnsQLZ4t&~g2d_*Q5)qEwUoQGH_KEjAH
znhX