## Quickstart

[![Open In Colab](https://img.shields.io/badge/Open%20In-Colab-blue?style=for-the-badge&logo=google-colab)](https://colab.research.google.com/github/dnth/x.infer/blob/main/nbs/quickstart.ipynb)
[![Open In Kaggle](https://img.shields.io/badge/Open%20In-Kaggle-blue?style=for-the-badge&logo=kaggle)](https://kaggle.com/kernels/welcome?src=https://github.com/dnth/x.infer/blob/main/nbs/quickstart.ipynb)

This notebook shows how to get started with x.infer.

x.infer relies on PyTorch and torchvision, so make sure you have them installed on your system. Uncomment the following line to install them.


```python
# !pip install -Uqq torch torchvision
```

Let's verify the PyTorch installation by checking the version.


```python
import torch

torch.__version__
```




'2.2.0+cu121'



Let's also check if CUDA is available.


```python
torch.cuda.is_available()
```




True
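If CUDA isn't available, everything still runs on the CPU. A minimal sketch (plain PyTorch, not an x.infer API) for picking a device dynamically:


```python
# Fall back to CPU when no GPU is present
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```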



x.infer supports various optional dependencies such as transformers, ultralytics, and timm.
You don't need to install all of them; just install x.infer with the extras you want.

For example, if you'd like to use models from the transformers library, install the `transformers` extra:


```bash
pip install -Uqq "xinfer[transformers]"
```

To install all the dependencies, you can run:
```bash
pip install -Uqq "xinfer[all]"
```

For this example, we'll install all the dependencies.


```python
!pip install -Uqq "xinfer[all]"
```

It's recommended to restart the kernel once all the dependencies are installed. Uncomment the following lines to restart the kernel.


```python
# from IPython import get_ipython
# get_ipython().kernel.do_shutdown(restart=True)
```

Once completed, let's import x.infer, check the version, and list all the available models.


```python
import xinfer

print(xinfer.__version__)
xinfer.list_models()
```

0.0.7



<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"><span style="font-style: italic"> Available Models </span>
┏━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┓
┃<span style="font-weight: bold"> Implementation </span>┃<span style="font-weight: bold"> Model ID </span>┃<span style="font-weight: bold"> Input --&gt; Output </span>┃
┡━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━┩
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_large_patch14_448.mim_m38m_ft_in22k_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_large_patch14_448.mim_m38m_ft_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_large_patch14_448.mim_in22k_ft_in22k_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_large_patch14_448.mim_in22k_ft_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_base_patch14_448.mim_in22k_ft_in22k_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_base_patch14_448.mim_in22k_ft_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_small_patch14_336.mim_in22k_ft_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> timm </span>│<span style="color: #800080; text-decoration-color: #800080"> eva02_tiny_patch14_336.mim_in22k_ft_in1k </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; class </span>│
│<span style="color: #008080; text-decoration-color: #008080"> transformers </span>│<span style="color: #800080; text-decoration-color: #800080"> Salesforce/blip2-opt-6.7b-coco </span>│<span style="color: #008000; text-decoration-color: #008000"> image-text --&gt; text </span>│
│<span style="color: #008080; text-decoration-color: #008080"> transformers </span>│<span style="color: #800080; text-decoration-color: #800080"> Salesforce/blip2-flan-t5-xxl </span>│<span style="color: #008000; text-decoration-color: #008000"> image-text --&gt; text </span>│
│<span style="color: #008080; text-decoration-color: #008080"> transformers </span>│<span style="color: #800080; text-decoration-color: #800080"> Salesforce/blip2-opt-6.7b </span>│<span style="color: #008000; text-decoration-color: #008000"> image-text --&gt; text </span>│
│<span style="color: #008080; text-decoration-color: #008080"> transformers </span>│<span style="color: #800080; text-decoration-color: #800080"> Salesforce/blip2-opt-2.7b </span>│<span style="color: #008000; text-decoration-color: #008000"> image-text --&gt; text </span>│
│<span style="color: #008080; text-decoration-color: #008080"> transformers </span>│<span style="color: #800080; text-decoration-color: #800080"> fancyfeast/llama-joycaption-alpha-two-hf-llava </span>│<span style="color: #008000; text-decoration-color: #008000"> image-text --&gt; text </span>│
│<span style="color: #008080; text-decoration-color: #008080"> transformers </span>│<span style="color: #800080; text-decoration-color: #800080"> vikhyatk/moondream2 </span>│<span style="color: #008000; text-decoration-color: #008000"> image-text --&gt; text </span>│
│<span style="color: #008080; text-decoration-color: #008080"> transformers </span>│<span style="color: #800080; text-decoration-color: #800080"> sashakunitsyn/vlrm-blip2-opt-2.7b </span>│<span style="color: #008000; text-decoration-color: #008000"> image-text --&gt; text </span>│
│<span style="color: #008080; text-decoration-color: #008080"> ultralytics </span>│<span style="color: #800080; text-decoration-color: #800080"> yolov8x </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; objects </span>│
│<span style="color: #008080; text-decoration-color: #008080"> ultralytics </span>│<span style="color: #800080; text-decoration-color: #800080"> yolov8m </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; objects </span>│
│<span style="color: #008080; text-decoration-color: #008080"> ultralytics </span>│<span style="color: #800080; text-decoration-color: #800080"> yolov8l </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; objects </span>│
│<span style="color: #008080; text-decoration-color: #008080"> ultralytics </span>│<span style="color: #800080; text-decoration-color: #800080"> yolov8s </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; objects </span>│
│<span style="color: #008080; text-decoration-color: #008080"> ultralytics </span>│<span style="color: #800080; text-decoration-color: #800080"> yolov8n </span>│<span style="color: #008000; text-decoration-color: #008000"> image --&gt; objects </span>│
│<span style="color: #008080; text-decoration-color: #008080"> ... </span>│<span style="color: #800080; text-decoration-color: #800080"> ... </span>│<span style="color: #008000; text-decoration-color: #008000"> ... </span>│
│<span style="color: #008080; text-decoration-color: #008080"> ... </span>│<span style="color: #800080; text-decoration-color: #800080"> ... </span>│<span style="color: #008000; text-decoration-color: #008000"> ... </span>│
└────────────────┴─────────────────────────────────────────────────┴─────────────────────┘
</pre>



You can pick any model from the list above.
Let's create a model from `vikhyatk/moondream2`. We can optionally specify the device and dtype; by default, the model is created on the CPU with `float32` precision.

Since we have a GPU available, let's create the model on the GPU with `float16` precision.


```python
model = xinfer.create_model("vikhyatk/moondream2", device="cuda", dtype="float16")
```

2024-10-21 22:37:07.833 | INFO  | xinfer.transformers.moondream:__init__:24 - Model: vikhyatk/moondream2
2024-10-21 22:37:07.835 | INFO  | xinfer.transformers.moondream:__init__:25 - Revision: 2024-08-26
2024-10-21 22:37:07.835 | INFO  | xinfer.transformers.moondream:__init__:26 - Device: cuda
2024-10-21 22:37:07.835 | INFO  | xinfer.transformers.moondream:__init__:27 - Dtype: float16
PhiForCausalLM has generative capabilities, as `prepare_inputs_for_generation` is explicitly overwritten. However, it doesn't directly inherit from `GenerationMixin`. From 👉v4.50👈 onwards, `PreTrainedModel` will NOT inherit from `GenerationMixin`, and this model will lose the ability to call `generate` and other related functions.
- If you're using `trust_remote_code=True`, you can get rid of this warning by loading the model with an auto class. See https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes
- If you are the owner of the model architecture code, please modify your model class such that it inherits from `GenerationMixin` (after `PreTrainedModel`, otherwise you'll get an exception).
- If you are not the owner of the model architecture class, please contact the model code owner to update it.


Now that we have the model, let's load the image we'll run inference on.


```python
from PIL import Image
import requests

# Download and display the sample image
image_url = "https://raw.githubusercontent.com/vikhyat/moondream/main/assets/demo-1.jpg"
Image.open(requests.get(image_url, stream=True).raw)
```





![png](quickstart_files/quickstart_15_0.png)




You can pass in a URL or a path to an image file.


```python
image = "https://raw.githubusercontent.com/vikhyat/moondream/main/assets/demo-1.jpg"
prompt = "Caption this image."

model.infer(image, prompt)
```




'An anime-style illustration depicts a young girl with white hair and green eyes, wearing a white jacket, holding a large burger in her hands and smiling.'
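Since `model.infer` also accepts a local file path, you could download the image first and pass the path instead. A minimal sketch; the filename here is arbitrary:


```python
import requests

# Download the image to a local file, then infer from its path
local_path = "demo-1.jpg"
with open(local_path, "wb") as f:
    f.write(requests.get(image_url).content)

model.infer(local_path, prompt)
```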



If you'd like to generate a longer caption, you can do so by setting the `max_new_tokens` parameter. You can also pass in any generation parameters supported by the `transformers` library.


```python
image = "https://raw.githubusercontent.com/vikhyat/moondream/main/assets/demo-1.jpg"
prompt = "Caption this image highlighting the focus of the image and the background in detail."

model.infer(image, prompt, max_new_tokens=500)
```




'The image depicts a young girl with long, white hair and blue eyes sitting at a table, holding a large burger in her hands. The background shows a cozy indoor setting with a window and a chair visible.'
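Other generation keyword arguments should work the same way. A sketch, assuming the underlying model supports the standard sampling parameters that transformers' `generate` accepts:


```python
# Sampling-based generation; these kwargs are forwarded to the
# underlying transformers model (exact support varies by model)
model.infer(image, prompt, max_new_tokens=200, do_sample=True, temperature=0.7)
```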



If you'd like to see the inference stats, call the `print_stats` method. This is useful if you're benchmarking inference time.


```python
model.stats.print_stats()
```


<pre style="white-space:pre;overflow-x:auto;line-height:normal;font-family:Menlo,'DejaVu Sans Mono',consolas,'Courier New',monospace"><span style="font-style: italic"> Model Stats </span>
╭───────────────────────────┬─────────────────────╮
│<span style="font-weight: bold"> Attribute </span>│<span style="font-weight: bold"> Value </span>│
├───────────────────────────┼─────────────────────┤
│<span style="color: #008080; text-decoration-color: #008080"> Model ID </span>│<span style="color: #800080; text-decoration-color: #800080"> vikhyatk/moondream2 </span>│
│<span style="color: #008080; text-decoration-color: #008080"> Device </span>│<span style="color: #800080; text-decoration-color: #800080"> cuda </span>│
│<span style="color: #008080; text-decoration-color: #008080"> Dtype </span>│<span style="color: #800080; text-decoration-color: #800080"> torch.float16 </span>│
│<span style="color: #008080; text-decoration-color: #008080"> Number of Inferences </span>│<span style="color: #800080; text-decoration-color: #800080"> 2 </span>│
│<span style="color: #008080; text-decoration-color: #008080"> Total Inference Time (ms) </span>│<span style="color: #800080; text-decoration-color: #800080"> 2087.3517 </span>│
│<span style="color: #008080; text-decoration-color: #008080"> Average Latency (ms) </span>│<span style="color: #800080; text-decoration-color: #800080"> 1043.6759 </span>│
╰───────────────────────────┴─────────────────────╯
</pre>
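If you want to sanity-check these numbers yourself, a simple wall-clock measurement (plain Python, independent of x.infer's built-in stats) might look like this:


```python
import time

# Time a single inference call and compare against the reported average latency
start = time.perf_counter()
model.infer(image, prompt)
elapsed_ms = (time.perf_counter() - start) * 1000
print(f"Wall-clock latency: {elapsed_ms:.1f} ms")
```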



You can also run batch inference by passing in a list of images and a list of prompts.


```python
model.infer_batch([image, image], [prompt, prompt])
```




['The image depicts a young girl with long, white hair and blue eyes sitting at a table, holding a large burger in her hands. The background shows a cozy indoor setting with a window and a chair visible.',
'The image depicts a young girl with long, white hair and blue eyes sitting at a table, holding a large burger in her hands. The background shows a cozy indoor setting with a window and a chair visible.']
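The prompts don't have to be identical; presumably each prompt is paired with the image at the same index (an assumption about `infer_batch`, with both lists the same length):


```python
# One prompt per image; the lists are assumed to be the same length
prompts = ["Caption this image.", "What is the girl holding?"]
model.infer_batch([image, image], prompts)
```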



For convenience, you can also launch a Gradio interface to interact with the model.


```python
model.launch_gradio()
```

* Running on local URL: http://127.0.0.1:7860

To create a public link, set `share=True` in `launch()`.





That's it! You've successfully run inference with x.infer.

We hope this simplifies the process of running inference with your favorite computer vision models!

<div align="center">
<img src="https://raw.githubusercontent.com/dnth/x.infer/refs/heads/main/assets/github_banner.png" alt="x.infer" width="600"/>
<br />
<br />
<a href="https://dnth.github.io/x.infer" target="_blank" rel="noopener noreferrer"><strong>Explore the docs »</strong></a>
<br />
<a href="#quickstart" target="_blank" rel="noopener noreferrer">Quickstart</a>
·
<a href="https://github.com/dnth/x.infer/issues/new?assignees=&labels=Feature+Request&projects=&template=feature_request.md" target="_blank" rel="noopener noreferrer">Feature Request</a>
·
<a href="https://github.com/dnth/x.infer/issues/new?assignees=&labels=bug&projects=&template=bug_report.md" target="_blank" rel="noopener noreferrer">Report Bug</a>
·
<a href="https://github.com/dnth/x.infer/discussions" target="_blank" rel="noopener noreferrer">Discussions</a>
·
<a href="https://dicksonneoh.com/" target="_blank" rel="noopener noreferrer">About</a>
</div>