Natural Language Processing 2023
Academic Year 2023/2024
This is not the current (2024) edition of the course.
Visit the UniBO website of the lecture for official and administrative details.
Prerequisites
A gentle introduction to Python
This topic won't be covered in class.
if you are a student of TraTec:
    you had the intro to Python in PBR
elif you are a student of SpecTra:
    you had the intro to Python in APS
else:
    check the slides, notebooks, and 2021 video recordings
Regardless, you can find the materials on virtuale.
Whether or not you attended one of the introductions, I suggest that you do (or revisit) all the exercises ASAP.
Course contents
Although the contents may be (slightly) adapted to the students' skills and interests, the general structure of the course is as follows.
As of June 2024, the links are broken. I am working on transferring the contents from the old website.
1. Introduction to Natural Language Processing
- 02/10/23 Slides
2. Words and the vector space model
- 03/10/23 Slides on tokens
- 03/10/23 Notebook on tokens and normalisation
- 09/10/23 Slides on VSM
- 09/10/23 Notebook on VSM
- 10/10/23 Slides on RB sentiment (+ Naïve Bayes)
- 10/10/23 Notebook on RB sentiment
3. Naïve Bayes
- 10/10/23 Slides on Naïve Bayes
- 16/10/23 Notebook on Naïve Bayes
4. Word vectors
- 17/10/23 Slides on tf-idf
- 17/10/23 Notebook
5. From Word Counts to Meaning
- 23/10/23 Slides introducing topic modelling
- 23/10/23 Notebook on topic modelling
- 24/10/23 Slides introducing LSA and SVD
- 24/10/23 Notebook on LSA
6. Training and Evaluation
- 30/10/23 Slides on training and evaluation
- 30/10/23 Notebook
7. Intro to NN
- 31/10/23 Slides on the perceptron
- 31/10/23 Notebook on the perceptron
- 06/11/23 Slides introducing neural networks and keras
- 06/11/23 Notebook introducing neural networks and keras
8. Word Embeddings
9. Doc2Vec
- 14/11/23 Slides
- 14/11/23 Notebook
- 14/11/23 Project reminder
10. Convolutions for text
11. Text is Sequential / LSTM
- 21/11/23 Slides on RNN
- 21/11/23 Notebook on RNN
- 27/11/23 Slides on BiRNN and LSTM
- 27/11/23 Notebook on BiRNN
- 27/11/23 Notebook on LSTM
12. Text generation
- 28/11/23 Slides on characters and generation
- 28/12/23 Notebook on characters
- 28/12/23 Notebook on generation
13. Intro to Seq2Seq and Transformers; Closing Remarks
- 05/12/23 Slides for part one
14. A brief intro to LLMs
- 11/12/23 CLIC-it 2023 tutorial (we will pay a visit to the cool materials from D. Croce and C.D. Hromei)
Embed videos, podcasts, code, LaTeX math, and even test students!
On this page, you’ll find some examples of the types of technical content that can be rendered with Hugo Blox.
Video
Teach your course by sharing videos with your students. Choose from one of the following approaches:
Youtube:
{{< youtube w7Ft2ymGmfc >}}
Bilibili:
{{< bilibili id="BV1WV4y1r7DF" >}}
Video file
Videos may be added to a page by either placing them in your assets/media/ media library or in your page’s folder, and then embedding them with the video shortcode:
{{< video src="my_video.mp4" controls="yes" >}}
Podcast
You can add a podcast or music to a page by placing the MP3 file in the page’s folder or the media library folder and then embedding the audio on your page with the audio shortcode:
{{< audio src="ambient-piano.mp3" >}}
Test students
Provide a simple yet fun self-assessment by revealing the solutions to challenges with the spoiler shortcode:
{{< spoiler text="👉 Click to view the solution" >}}
You found me!
{{< /spoiler >}}
renders as
👉 Click to view the solution
Math
Hugo Blox Builder supports a Markdown extension for $\LaTeX$ math. You can enable this feature by toggling the math option in your config/_default/params.yaml file.
To render inline or block math, wrap your LaTeX math with {{< math >}}$...${{< /math >}} or {{< math >}}$$...$${{< /math >}}, respectively.
Example math block:
{{< math >}}
$$
\gamma_{n} = \frac{ \left | \left (\mathbf x_{n} - \mathbf x_{n-1} \right )^T \left [\nabla F (\mathbf x_{n}) - \nabla F (\mathbf x_{n-1}) \right ] \right |}{\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2}
$$
{{< /math >}}
renders as
$$\gamma_{n} = \frac{ \left | \left (\mathbf x_{n} - \mathbf x_{n-1} \right )^T \left [\nabla F (\mathbf x_{n}) - \nabla F (\mathbf x_{n-1}) \right ] \right |}{\left \|\nabla F(\mathbf{x}_{n}) - \nabla F(\mathbf{x}_{n-1}) \right \|^2}$$
Example inline math {{< math >}}$\nabla F(\mathbf{x}_{n})${{< /math >}} renders as $\nabla F(\mathbf{x}_{n})$.
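To make the step-size formula in the math block above more concrete, here is a minimal Python sketch that evaluates $\gamma_{n}$ for two consecutive iterates. The helper names `step_size` and `grad_f`, the example function $F(\mathbf{x}) = x_0^2 + 2x_1^2$, and the sample points are all assumptions chosen for illustration; they are not part of the original page.

```python
def step_size(x_prev, x_curr, grad_prev, grad_curr):
    """Evaluate gamma_n from the formula above, using plain Python lists."""
    # Differences between consecutive iterates and between their gradients.
    dx = [c - p for c, p in zip(x_curr, x_prev)]
    dg = [c - p for c, p in zip(grad_curr, grad_prev)]
    # Numerator: absolute value of the inner product dx^T dg.
    numerator = abs(sum(a * b for a, b in zip(dx, dg)))
    # Denominator: squared Euclidean norm of the gradient difference.
    denominator = sum(b * b for b in dg)
    if denominator == 0.0:
        raise ValueError("gradients are identical; the step size is undefined")
    return numerator / denominator


def grad_f(x):
    # Gradient of the illustrative function F(x) = x_0^2 + 2 * x_1^2 (an assumption).
    return [2.0 * x[0], 4.0 * x[1]]


# Hypothetical consecutive iterates x_{n-1} and x_n.
x_prev = [1.0, 1.0]
x_curr = [0.5, 0.0]
gamma = step_size(x_prev, x_curr, grad_f(x_prev), grad_f(x_curr))
print(gamma)
```

The guard on the denominator covers the case where both gradients coincide, for which the formula has no defined value.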
Example multi-line math using the math linebreak (\\):
{{< math >}}
$$f(k;p_{0}^{*}) = \begin{cases}p_{0}^{*} & \text{if }k=1, \\
1-p_{0}^{*} & \text{if }k=0.\end{cases}$$
{{< /math >}}
renders as
$$ f(k;p_{0}^{*}) = \begin{cases}p_{0}^{*} & \text{if }k=1, \\ 1-p_{0}^{*} & \text{if }k=0.\end{cases} $$
Code
Hugo Blox Builder utilises Hugo’s Markdown extension for highlighting code syntax. The code theme can be selected in the config/_default/params.yaml file.
```python
import pandas as pd
data = pd.read_csv("data.csv")
data.head()
```
renders as
import pandas as pd
data = pd.read_csv("data.csv")
data.head()
Inline Images
{{< icon name="python" >}} Python
renders as
Python