Week 01 Introduction
LLMs in Linguistic Research WiSe 2024/25
Akhilesh Kakolu Ramarao
HHU
09 Oct 2024
What we will cover today:
Who am I: Akhilesh
Akhilesh Kakolu Ramarao (he/him) PhD researcher at the English Language and Linguistics department
Who am I: Anna
Main interests: anything Natural Language Processing, open source, …
Part of research across the Anglistics, Linguistics, and Computer Science departments
Currently writing my MA thesis on making LLMs better at pragmatics
Part of the Slamlab: https://slam.phil.hhu.de/
What to expect
You will see code and math
Math will be annotated and explained in natural language
You will not be asked to do calculations or derive equations
You'll learn how to work with code snippets but not how to build something from scratch (unless you already know how to code)
Syllabus Overview
09.10
Admin; Architectures: Statistical and Probabilistic Language Models
Familiarization with Google Colab
16.10
Architectures: Perceptrons and Neural Networks
23.10
Architectures: Recurrent Neural Networks
Assignment 1
Syllabus Overview
30.10
Transformer: General architecture
06.11
Transformer: The Attention mechanism
13.11
Transformer: Decoder-only and Encoder-only models
Assignment 2
20.11
Using pre-trained models
Syllabus Overview
27.11
Study week
04.12
Transfer learning: fine-tuning
11.12
Adapting models for specific tasks
Assignment 3
18.12
Adapting models for specific tasks
08.01
Adapting models for specific tasks
Assignment 4
15.01
Probing LLMs
22.01
Probing LLMs
Assignment 5
29.01
TBD
BN (Beteiligungsnachweis) Requirements
Completion of the homework
Active participation in the class
Pass 4 out of 5 assignments
Who are you
Who are you? What name do you prefer?
What are you studying?
Do you have any prior programming experience? If so, with what language?
Have you worked with LLMs before?
What are you hoping to get out of this course?
Basics of Language Models
A model trained to predict the probability distribution over words or sequences of words in a language
A probability distribution is a mathematical description of how likely each possible word sequence is
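As a small anchor for this definition (the notation is mine, not from the slides), the chain rule of probability writes the probability of a whole sequence as a product of per-word conditional probabilities:

P(w_1, w_2, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})

In words: the probability of a sequence is the product of each word's probability given all the words before it. The models in this course differ mainly in how they estimate these conditional probabilities.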
Statistical Language Models
These models use statistical patterns in the data to make predictions about the likelihood of specific sequences of words
N-gram models are the most common type of statistical language model; they predict the probability of a word given the previous n-1 words (a small sketch follows this list)
They are simple and computationally efficient but have limitations in capturing long-range dependencies
They are widely used in speech recognition, machine translation, and other NLP tasks
They are used as a baseline for more complex language models
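To make the n-gram idea concrete, here is a minimal bigram (n = 2) sketch in Python; the toy corpus and the function name bigram_prob are illustrative choices, not from the slides:

from collections import Counter, defaultdict

# Toy corpus; a real model would be trained on a large text collection
corpus = "the cat sat on the mat . the dog sat on the rug .".split()

# Count how often each word follows each preceding word
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def bigram_prob(prev, word):
    """Estimate P(word | prev) by relative frequency."""
    context = bigram_counts[prev]
    total = sum(context.values())
    return context[word] / total if total else 0.0

print(bigram_prob("the", "cat"))  # 0.25: "the" is followed by cat/mat/dog/rug

Note that an unseen word pair gets probability zero here; in practice n-gram models use smoothing to avoid exactly this.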
Probabilistic Language Models
These models assign a probability to sequences of words based on the training data (a scoring sketch follows this list)
They are based on the principles of probability theory and use probabilistic methods to model the language
They can capture complex patterns in the data and are more flexible than statistical models
They are used in a wide range of NLP tasks, such as machine translation, text generation, and speech recognition
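Building on the chain rule from above, a hedged sketch of how such a model scores a whole sentence, again with illustrative toy data (log probabilities are used so that products of small numbers do not underflow):

import math
from collections import Counter, defaultdict

# Same toy bigram counts as in the previous sketch
corpus = "the cat sat on the mat . the dog sat on the rug .".split()
bigram_counts = defaultdict(Counter)
for prev, word in zip(corpus, corpus[1:]):
    bigram_counts[prev][word] += 1

def sentence_logprob(words):
    """Chain rule with a bigram approximation:
    log P(w_1..w_n) is approximated by the sum of log P(w_i | w_{i-1})."""
    total = 0.0
    for prev, word in zip(words, words[1:]):
        context = bigram_counts[prev]
        total += math.log(context[word] / sum(context.values()))
    return total

print(sentence_logprob("the cat sat on the mat .".split()))  # about -2.77

A sentence containing an unseen word pair would raise a math domain error here (log of zero), which is the weakness that smoothing addresses.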
Neural Language Models
These models use neural networks to predict the likelihood of a sequence of words
They are trained on a large corpus of text data and can learn complex patterns and dependencies
They are more powerful than traditional statistical and probabilistic models
They are used in a wide range of NLP tasks, such as machine translation, text generation, and sentiment analysis (a minimal model sketch follows this list)
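A minimal sketch of a neural language model, assuming PyTorch is installed; the class name, layer sizes, and the one-word context window are illustrative simplifications, not an architecture from the slides:

import torch
import torch.nn as nn

class TinyNeuralLM(nn.Module):
    """Predict a distribution over the next word from the current word id."""
    def __init__(self, vocab_size, embed_dim=32, hidden_dim=64):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)   # word id -> vector
        self.hidden = nn.Linear(embed_dim, hidden_dim)     # learned features
        self.out = nn.Linear(hidden_dim, vocab_size)       # one score per word

    def forward(self, word_ids):
        h = torch.tanh(self.hidden(self.embed(word_ids)))
        return self.out(h)  # raw logits; softmax turns them into probabilities

model = TinyNeuralLM(vocab_size=1000)
logits = model(torch.tensor([42]))       # a single word id as input
probs = torch.softmax(logits, dim=-1)    # P(next word | current word)
print(probs.shape)                       # torch.Size([1, 1000])

The key contrast with the n-gram sketches above: probabilities come from learned, continuous representations rather than raw counts, which is what lets these models generalize to word combinations never seen in training.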
Large Language Models
They are advanced language models with billions of parameters
They are trained on large-scale datasets and can generate human-like text
They are used in a wide range of NLP tasks, such as machine translation, text generation, and question answering
They are the focus of this course; a short preview of using one follows
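As a preview of the "Using pre-trained models" session, a hedged example with the Hugging Face transformers library (assuming it is installed, e.g. in Google Colab; gpt2 is just one small, freely available model):

from transformers import pipeline

# Load a small pre-trained language model; larger LLMs are used the same way
generator = pipeline("text-generation", model="gpt2")
result = generator("Language models are", max_new_tokens=20)
print(result[0]["generated_text"])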