Natural Language Processing

Academic Year 2025/2026

(Front-page illustration produced with DeepAI's tool in October 2024, using the prompt "natural language processing class for translation and technology masters".)

Visit the UniBO website of the lecture for official and administrative details.

Prerequisites

A gentle introduction to Python

This topic won't be covered in class.

if you are a student of TraTec:
  you had the intro to Python in PBR
elif you are a student of SpecTra:
  you had the intro to Python in APS
else: 
  check the slides, notebooks, and 2021 video recordings

Regardless, you can find the materials on virtuale.

Whether or not you attended either of the introductions, I suggest that you do (or revisit) all the exercises as soon as possible.

Homework

Homework will be handled through virtuale. No other contents are expected to be shared there. On 09/10, you should have received the access password from me. If you did not, ping me. Each homework assignment has a hard deadline.

Course contents

Although the contents may be (slightly) adapted to the students' skills and interests, the general structure of the course is as follows.

1. Introduction to Natural Language Processing

  • Lesson 1. MO 29/09/25 Slides Introduction

2. Words and the vector space model

  • Lesson 2. WE 01/10/25 Slides Tokens and normalisation
  • Lesson 2. WE 01/10/25 Notebook Tokens and normalisation
  • Lesson 3. MO 06/10/25 Slides Vector Space Model
  • Lesson 3. MO 06/10/25 Notebook Vector Space Model

3. Rule-based and Naïve Bayes’ classifier

  • Lesson 4. WE 08/10/25 Slides Rule-based sentiment analysis
  • Lesson 4. WE 08/10/25 Notebook Rule-based sentiment analysis
  • Lesson 5. MO 13/10/25 Slides Naïve Bayes’ classifier

4. Word vectors

5. From Word Counts to Meaning

6. Training and Evaluation

7. Intro to NN

Intermezzo

8. Word Embeddings

9. Doc2Vec

10. Convolutions for text

11. Text is Sequential / LSTM

- CLIC-it 2024

12. Text generation

13. Intro to Seq2Seq and Transformers

14. A brief intro to LLMs + Closing Remarks

FIN

Calendars

This year, NLP has one sibling lesson:

  • Selected Topics in Natural Language Processing is an optional course (with credits). Further information about it is available on the UniBO website. Table 1 shows the calendar of the 8 lessons.

Lesson  Date       Time   Location
1       TU 14 Oct  13:30  lab 10
2       TU 21 Oct  13:30  lab 10
3       TU 28 Oct  13:30  lab 10
4       TU 04 Nov  13:30  lab 10
5       TU 11 Nov  13:30  lab 10
6       TU 18 Nov  13:30  lab 10
7       TU 25 Nov  13:30  lab 10
8       TU 02 Dec  13:30  lab 10

Table 1: Calendar of the 8 planned Selected Topics in NLP lessons.

Projects

The final project accounts for 80% of your final mark. Look for inspiration in the projects presented in previous years.

Some project ideas

  • Given an entry from a restaurant menu, split into name, description, and price.
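
As a possible starting point for this idea, a rule-based baseline could look like the sketch below. It rests on assumptions that are not part of the task statement: each entry is a single line, the price sits at the end (with an optional currency symbol), and the name is separated from the description by a colon or by a dash surrounded by spaces. The names MenuItem and split_menu_entry are illustrative only; real menus are far messier, which is where a learned model could come in.

```python
import re
from typing import NamedTuple, Optional


class MenuItem(NamedTuple):
    name: str
    description: str
    price: Optional[float]


# Assumption: the price is the last thing in the entry, with an optional
# currency symbol before or after it and an optional decimal part.
PRICE_RE = re.compile(
    r"(?:[€$£]\s*)?(\d+(?:[.,]\d{1,2})?)\s*(?:[€$£]|euro)?\s*$",
    re.IGNORECASE,
)

# Assumption: name and description are separated by a colon followed by
# whitespace, or by a dash with whitespace on both sides.
SEP_RE = re.compile(r"\s*:\s+|\s+[-–]\s+")


def split_menu_entry(entry: str) -> MenuItem:
    """Split one menu line into name, description, and price (if found)."""
    text = entry.strip()
    price = None
    match = PRICE_RE.search(text)
    if match:
        # Normalise a decimal comma to a dot before converting.
        price = float(match.group(1).replace(",", "."))
        # Drop the price and any trailing punctuation left behind.
        text = text[: match.start()].rstrip(" .,-–")
    # Split only once, so dashes inside the description are preserved.
    parts = SEP_RE.split(text, maxsplit=1)
    name = parts[0].strip()
    description = parts[1].strip() if len(parts) > 1 else ""
    return MenuItem(name, description, price)


if __name__ == "__main__":
    print(split_menu_entry(
        "Tagliatelle al ragù - fresh egg pasta with slow-cooked meat sauce 12,50 €"
    ))
```

A baseline of this kind is mainly useful as a point of comparison: entries that it splits wrongly are good candidates for the evaluation set of a more robust approach.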

Over time, I will add more ideas for final projects here.

Previous final projects

2025-2026

yours will be here

2024-2025

  • Santangelo, D.P. (2025). No Stupid Questions, Only Labeled Ones: Intent Classification for University FAQs
    🗎

  • Forzatti, A. (2025). Benchmarking Bilingual Text Anonymization and Automatic Term Extraction Approaches
    🗎

2023-2024

  • Cupin, E., Galiero, L., and Ciminari, D. (2023). Back to the Roots: Tracing Source Languages in Wikipedia with LABSE
    🗎

2022-2023

  • Mainardi, P. (2023). Identifying masculine generics in Italian
    🗎

2021-2022

  • Gajo, P. (2022). Hate Speech Detection in Incel Online Spaces
    🗎

  • Kovacs, M. (2022). Fishing for catfishes: using a model trained on Twitter data to predict author gender in Reddit posts
    🗎

2020-2021

  • Hopkins, D. (2022). Assessing Semantic Similarity between Original Texts and Machine Translations
    🗎
  • Galletti, E. (2021). Identifying Characters’ Lines in Original and Translated Plays. The case of Golden and Horan’s Class
    🗎

  • Yu, X. (2021). Classifying An Imbalanced Dataset with CNN, RNN, and LSTM
    🗎

2019-2020

  • Fernicola, F. and Zhang, S. (2020). AriEmozione: Identifying Emotions in Opera Verses
    (developed under CRICC; published in CLiC-it 2020)
    🗎 🎦

  • Muti, A. (2020). UniBO@AMI: A Multi-Class Approach to Misogyny and Aggressiveness Identification on Twitter Posts Using AlBERTo
    (top-performing model in Evalita’s 2020 AMI shared task)
    🗎 🎦