
The repository for U of A Datalab’s “NLP for All” workshop series, where we cover the basics of Natural Language Processing (NLP) and its practical applications for everyday tasks.


ua-datalab/NLP-Speech


Image source: Jeevan chavan's article "NLP: Tokenization, Stemming, Lemmatization, Bag of Words, TF-IDF, POS"

Natural Language Processing for All

Join us for an engaging and accessible introduction to Natural Language Processing (NLP) and its practical applications for everyday tasks! In "NLP for All," we will explore the fundamental concepts behind NLP, from understanding how computers interpret human language to discovering how to improve search queries, use regular expressions, find datasets, and build pipelines for working with language. Whether you're curious about chatbots, voice assistants, or automated text transcription and analysis, this series will demystify popular technologies and show you how they work.

What We Will Cover:

  • Foundations of NLP: Gain a solid grasp of NLP concepts and terminology without needing a technical background.
  • Real-World Applications: Explore practical uses of NLP in various contexts, such as improving search and information retrieval, generating and evaluating automatic transcriptions, and working with popular libraries such as spaCy, PyTorch and scikit-learn.
  • Hands-On Experience: We will illustrate NLP concepts in action with a well-documented code notebook built around practical examples. We will also explore online sources for NLP tools and datasets, such as Hugging Face.

Pre-requisites:

Coordinator: Megh Krishnaswamy.
Location: Albert B. Weaver Science-Engineering Library. Room 212.
When: Thursdays at 3PM.

Calendar:

Date Title Topic Description Materials
09/05/2024 3PM Introduction to NLP with spaCy

Join us for an informative session on the basics of Natural Language Processing (NLP) with spaCy, a leading open-source library for advanced text processing in Python. Designed for production use and capable of handling large volumes of text efficiently, spaCy offers a Swiss Army knife approach to text processing across multiple languages.

In this workshop, we will cover tools for key NLP tasks such as tokenization, part-of-speech tagging, named entity recognition, dependency parsing, and text similarity calculation.

Link to Notebook
09/12/2024 3PM Regular Expressions for NLP

Regular expressions (regex) are an essential tool for advanced search and analysis. Join us for a comprehensive introduction to the basics of building regular expressions, with a focus on creating and applying patterns to extract, clean, and transform text data effectively. We will explore practical NLP use cases, such as extracting specific information from unstructured data, performing search-and-replace operations, and validating text inputs.

Our materials will include resources for getting started with regex syntax, along with practical code examples that run regex searches on your desktop, in browser tools, and with Python and R libraries. Join us for a practical demonstration of how to get the best out of your text searches with regex!

Link to notebook
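
As a small illustration of the three use cases named in this session (extraction, search-and-replace, and validation), here is a minimal sketch using Python's built-in `re` module. The sample text, both patterns, and the helper name `is_valid_date` are invented for this example and are not taken from the workshop notebook; the email pattern in particular is a deliberate simplification.

```python
import re

# Hypothetical sample text, invented for this illustration.
text = "Contact alice@example.edu or help@example.org before 09/12/2024."

# 1. Extraction: pull out anything that looks like an email address.
#    Simplified pattern (an assumption); real email syntax is broader.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
emails = EMAIL.findall(text)

# 2. Search-and-replace: rewrite MM/DD/YYYY dates as YYYY-MM-DD
#    by reordering the three captured groups.
DATE = re.compile(r"\b(\d{2})/(\d{2})/(\d{4})\b")
iso_text = DATE.sub(r"\3-\1-\2", text)

# 3. Validation: check that a whole string has the MM/DD/YYYY shape.
#    This checks the format only, not whether the date exists.
def is_valid_date(s: str) -> bool:
    return DATE.fullmatch(s) is not None

print(emails)
print(iso_text)
print(is_valid_date("09/12/2024"), is_valid_date("9/12/24"))
```

Compiling each pattern once with `re.compile` keeps the pattern and its purpose in one named place, which helps when the same pattern is reused for both replacing and validating.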
09/19/2024 3PM NLP with Transformers

In this workshop, we will introduce foundational concepts of the transformer architecture.

We will also look at how to choose pre-trained models that fit our use case, and how to work with them using popular Python frameworks like TensorFlow and PyTorch.

Join us for an informative session on the technology behind Large Language Models, and what it can do to enhance your skills as a researcher!

Link to notebook Link to Slides
09/26/2024 3PM Introduction to Semantic Search

Dive into the world of semantic search with this workshop, where we will explore NLP-powered options to enhance text search. Unlike traditional keyword-based search, semantic search understands the meaning and context behind queries, providing more relevant and contextualized results. This workshop covers the fundamentals of semantic search technologies, with introductions to vector representations, embeddings, and advanced search algorithms.

In this workshop, we will explore how to implement semantic search with a simple real-world use-case. We will also learn how to find, choose and use pre-trained models and datasets for our tasks. Join us to learn more about how meaning and context can help you get the best out of your search experience!

Link to Notebook Link to optional notebook
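
To make the idea of searching by meaning rather than keywords concrete, here is a minimal sketch of ranking documents by cosine similarity between vectors. The three-dimensional "embeddings" are hand-assigned toy values, and the names `documents` and `search` are hypothetical; in practice the vectors would come from a pre-trained embedding model, which this sketch does not include.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings, hand-assigned for illustration only.
# The dimensions loosely stand for (animals, vehicles, food).
documents = {
    "How to care for a puppy": [0.9, 0.1, 0.2],
    "Used car buying guide": [0.1, 0.9, 0.0],
    "Easy pasta recipes": [0.0, 0.1, 0.9],
}

def search(query_vector, docs, top_k=2):
    """Return the top_k document titles, most similar first."""
    scored = [(cosine_similarity(query_vector, vec), title)
              for title, vec in docs.items()]
    scored.sort(reverse=True)
    return [title for _, title in scored[:top_k]]

# A query like "dog training tips" shares no keywords with the puppy
# document, but its (assumed) embedding points in a similar direction.
query = [0.8, 0.0, 0.1]
results = search(query, documents)
print(results)
```

A keyword search for "dog" would find nothing in these titles, while the vector comparison still ranks the puppy document first because the two vectors point in nearly the same direction.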
10/03/2024 3PM Introduction to Information Extraction

Join us for an introductory session on Information Extraction (IE)! This session focuses on the automatic extraction of structured information from unstructured text, and we will explore why information extraction is a key skill for a variety of research tasks.

IE is a critical component of many NLP applications, from data mining to knowledge graph construction. In this workshop, we will cover the fundamentals of information extraction, such as named entity recognition, relationship extraction, and event detection. We will look at various algorithms and tools used in IE. This workshop will provide hands-on experience with a simple project, implemented in Python, that demonstrates how to extract valuable insights from large text corpora.

Build your ability to automate information extraction and transform raw text into meaningful data!

Link to Notebook
10/10/2024 3PM Text pre-processing for NLP

Prepare your text data for advanced analysis with our primer on text pre-processing for Natural Language Processing. Text pre-processing is a crucial step in any NLP pipeline, ensuring that your data is clean, normalized, and ready for modeling. This workshop will introduce pre-processing techniques for text data from sources such as web scraping and online datasets. We will take a look at tools available for categorizing, organizing and tagging our text.

With a practical demonstration, we will explore handling various text formats, dealing with noise, and transforming text into a format suitable for machine learning algorithms. Whether you are interested in an NLP task or just making sense of a data dump, join us to gain the tools and knowledge to prepare your text data effectively!

Link to Notebook
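
As a rough picture of what cleaning and normalizing scraped text can involve, here is a minimal sketch that uses only the Python standard library. The function name `preprocess`, the order of the steps, and the sample string are assumptions made for this example; the workshop notebook may use different tools and steps.

```python
import html
import re
import unicodedata

def preprocess(raw: str) -> list[str]:
    """Turn noisy, HTML-flavoured text into a list of lowercase tokens."""
    text = re.sub(r"<[^>]+>", " ", raw)          # drop HTML tags
    text = html.unescape(text)                   # decode entities such as &amp;
    text = unicodedata.normalize("NFKC", text)   # unify Unicode variants
    text = text.lower()                          # case-fold
    text = re.sub(r"https?://\S+", " ", text)    # remove URLs
    # Keep runs of word characters, allowing an inner apostrophe (don't).
    return re.findall(r"\w+(?:'\w+)?", text)

# Hypothetical scraped snippet, invented for this illustration.
raw = "<p>Visit   https://example.com for NLP&nbsp;tips &amp; tricks!</p>"
tokens = preprocess(raw)
print(tokens)
```

Each step removes one kind of noise (markup, entities, inconsistent Unicode, casing, links), so steps can be reordered or dropped depending on the data source and the downstream task.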
10/17/2024 3PM Introduction to Speech Technology

Explore the field of Speech Technology with this introductory workshop, designed to improve your knowledge of the principles and applications of speech processing. This is a beginner-friendly, hands-on workshop that covers the basics of acoustic modeling and phonetics, along with a brief look at how speech technology is used in modern applications.

We will discuss real-world applications such as automatic transcription, speech recognition, text-to-speech synthesis, and speaker identification, and take a look at existing tools and techniques for building simple speech-powered tools.

10/24/2024 3PM Speech-to-Text with Whisper AI

Whisper AI, known for its high accuracy and efficiency, is transforming the way we convert spoken language into written text. This workshop provides an overview of Whisper AI's architecture and features, and covers the process of building, training, and deploying speech-to-text models. We will explore real-world applications such as automatic transcription, and look at ways to effectively evaluate our output, for example with Word Error Rate (WER) scores.

With practical coding examples, we will cover handling speech data in various languages to achieve high-quality transcription, and explore ways of creating pipelines in Python to save and process our outputs.
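
For readers new to WER, here is a minimal sketch of how the score is commonly computed: the word-level edit distance (substitutions, deletions, and insertions) between a reference transcript and the system output, divided by the number of reference words. The function name `word_error_rate` and the sample sentences are invented for this example, and lowercasing plus whitespace splitting is an assumed normalization; evaluation toolkits often normalize text more carefully.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = word-level edit distance / number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)

# Hypothetical transcripts: one substituted word and one missing word.
reference = "the quick brown fox jumps"
hypothesis = "the quick brow fox"
wer = word_error_rate(reference, hypothesis)
print(f"WER: {wer:.2f}")
```

Here two errors against five reference words give a WER of 0.40. Because insertions count as errors too, WER can exceed 1.0 when the output is much longer than the reference.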

