From 3b6d73440b667a490f4fa942d09c138164cac7f6 Mon Sep 17 00:00:00 2001
From: lezcano
Date: Mon, 18 Sep 2023 12:47:07 +0000
Subject: [PATCH] google docs suggestion

---
 blogpost/post.md | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/blogpost/post.md b/blogpost/post.md
index e70a3a82..77b3ae0f 100644
--- a/blogpost/post.md
+++ b/blogpost/post.md
@@ -85,7 +85,7 @@ rather than generating CUDA code directly, `torch.compile` generates rather
 readable [triton](https://triton-lang.org/main/index.html) code
 
 ```python
-def triton_(in_ptr0, in_ptr1, out_ptr0, xnumel, XBLOCK : tl.constexpr):
+def triton_(in_ptr0, in_ptr1, out_ptr0, XBLOCK : tl.constexpr):
     xnumel = 20000000
     xoffset = tl.program_id(0) * XBLOCK
     xindex = xoffset + tl.arange(0, XBLOCK)[:]
@@ -174,7 +174,7 @@ NumPy and then do an if/else depending on values within the array, or perform
 operations in-place, perhaps via boolean masks. These constructions, while
 supported by `torch.compile`, hamper its performance. Changes like moving from
 in-place indexing to using `np.where`, writing the code in a branchless way, or
-avoid using in-place ops in favor of out-of-place ops can go a long way.
+avoiding in-place ops in favor of out-of-place ops can go a long way.
 
 To write fast NumPy code, it is best to avoid loops, but sometimes they are
 unavoidable. When tracing through a loop, `torch.compile` will try to fully
@@ -222,10 +222,10 @@ explicit times, a bit surprising
 
 ```python
->>> np.asarray([1], dtype=np.int8) + 126
+>>> np.zeros(1, dtype=np.int8) + 127
 array([127], dtype=int8)
->>> np.asarray([1], dtype=np.int8) + 128
-array([129], dtype=int16)
+>>> np.zeros(1, dtype=np.int8) + 128
+array([128], dtype=int16)
 ```
 
 NumPy 2.0 is changing these rules to follow others that are closer to those
 PyTorch. The relevant technical document is [NEP 50](https://numpy.org/neps/nep-0050-scalar-promotion.html).
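
A note on the second hunk: the advice it touches, preferring out-of-place,
branchless code over in-place masked assignment, can be made concrete with a
hypothetical before/after pair (the function and its name are illustrative,
not from the post):

```python
import numpy as np

def relu_inplace(x):
    # In-place masked assignment: supported by torch.compile, but the
    # data-dependent mutation hampers the code it can generate.
    x[x < 0] = 0.0
    return x

def relu_branchless(x):
    # Out-of-place, branchless equivalent via np.where: every element
    # takes the same code path, which traces and fuses cleanly.
    return np.where(x < 0, 0.0, x)
```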
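
And on the last hunk: the new examples make the value-based promotion rule
easier to see, since 127 is exactly the `int8` maximum. A minimal sketch of the
pre-NEP-50 behaviour the post describes, assuming NumPy < 2.0 (under NEP 50 the
Python int is instead cast to the array's dtype):

```python
import numpy as np

# NumPy < 2.0 promotes based on the *value* of a Python int scalar:
# 127 fits in int8, so the result dtype is preserved.
a = np.zeros(1, dtype=np.int8) + 127
print(a, a.dtype)  # [127] int8

# 128 does not fit in int8, so the result silently widens to int16,
# regardless of the array's actual contents.
b = np.zeros(1, dtype=np.int8) + 128
print(b, b.dtype)  # [128] int16
```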