Computational Linguistics

Academic Year 2019/2020

This is not the current edition of this course.

Learning outcomes

The student will learn the basic theoretical aspects of computational linguistics/natural language processing and will acquire practical skills ranging from tokenization and vectorization to the computation of similarities and the training of supervised models (e.g., for topic identification, structural analysis, and meaning analysis).
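To make the first steps of that pipeline concrete, here is a minimal sketch in plain Python (not taken from the course notebooks): a regex tokenizer, a bag-of-words count vector, and cosine similarity between two documents. The function names and the toy sentences are hypothetical and chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens (\\w+ runs)."""
    return re.findall(r"\w+", text.lower())


def vectorize(tokens):
    """Bag-of-words vector: a mapping from each token to its raw count."""
    return Counter(tokens)


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    # Counter returns 0 for missing keys, so absent words contribute nothing.
    dot = sum(count * v[tok] for tok, count in u.items())
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


# Hypothetical toy documents.
docs = [
    "The cat sat on the mat.",
    "A cat sat on a mat.",
    "Stock markets fell sharply today.",
]
vectors = [vectorize(tokenize(d)) for d in docs]
print(round(cosine(vectors[0], vectors[1]), 3))  # → 0.5
print(round(cosine(vectors[0], vectors[2]), 3))  # → 0.0
```

The first two sentences share four of their five word types and therefore get a similarity of 0.5, while the third shares no words with the first and scores 0. Libraries such as scikit-learn offer more complete vectorizers; the hand-written version simply makes each step visible.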

Course contents

Although the contents may be slightly adapted to the students' skills and interests, the general structure of the course will be as follows.

0. Introduction to Computational Linguistics

1. Introduction to Python scripting

2. Words and vector space model

3. Naive Bayes

4. Word vectors

5. From Word Counts to Meaning

6. Training and Evaluation

7. Introduction to Latent Semantic Analysis (LSA)

8. Introduction to Neural Networks (NN)

9. Word Embeddings

10. Visualisation

11. From document representations towards sequences

12. Convolutions for text

13. Text is Sequential
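As an illustration of topic 3 (Naive Bayes) applied to topic identification, the following is a minimal sketch of a multinomial Naive Bayes classifier with add-one (Laplace) smoothing, written in plain Python. It is not the course's own implementation; the function names, the four training sentences, and the "sport"/"politics" labels are hypothetical toy data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def train_nb(docs, labels):
    """Estimate log priors and add-one-smoothed log likelihoods per class."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for doc, label in zip(docs, labels):
        tokens = tokenize(doc)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    n_docs = len(docs)
    log_prior = {c: math.log(n / n_docs) for c, n in class_counts.items()}
    log_lik = {}
    for c in class_counts:
        # Laplace smoothing: add 1 to every count, |V| to the denominator.
        denom = sum(word_counts[c].values()) + len(vocab)
        log_lik[c] = {w: math.log((word_counts[c][w] + 1) / denom) for w in vocab}
    return log_prior, log_lik, vocab


def predict_nb(model, doc):
    """Return the class with the highest log posterior (up to a constant)."""
    log_prior, log_lik, vocab = model
    scores = {}
    for c in log_prior:
        score = log_prior[c]
        for tok in tokenize(doc):
            if tok in vocab:  # words never seen in training are ignored
                score += log_lik[c][tok]
        scores[c] = score
    return max(scores, key=scores.get)


# Hypothetical toy training data.
train_docs = [
    "the team won the match",
    "a great goal in the final match",
    "the government passed a new law",
    "parliament voted on the law",
]
train_labels = ["sport", "sport", "politics", "politics"]
model = train_nb(train_docs, train_labels)
print(predict_nb(model, "the match ended with a late goal"))  # → sport
print(predict_nb(model, "a new law on taxes"))  # → politics
```

Working in log space avoids numerical underflow when many small probabilities are multiplied; the smoothing keeps a word that never occurred with a class from driving that class's probability to zero.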

Beyond the course

Some interesting papers on corpus creation
Some quick guides to building corpora

Projects

Notice: if you opt for turning your project into a participation in a shared task, it is fine for more than one person to target the same task.

Submitted projects (up to February 2021)

Some project alternatives

Are you defending in the first or second exam session (appello)? Why not turn your project into a CLiC-it paper? The deadline is 15/07/2020.

2020 students and their projects

Student | Project name | Status | Call (exam session)
Alfieri, A | TBD | TBD | TBD
Compagnoni, A | TBD | TBD | TBD
Contarino, A | TBD | TBD | TBD
Fabbri, E | TBD | TBD | TBD
Fernicola, F | AriEmotion | submitted | Sep 2020
Ferraiuolo, M | TBD | TBD | TBD
Galletti, E | Theatre’s character recognition | submitted | Feb 2021
Giannoni, L | TBD | TBD | TBD
Guarino, E | TBD | TBD | TBD
Ippoliti, C | TBD | TBD | TBD
Martinelli, M | TBD | TBD | TBD
Moro, E | TBD | TBD | TBD
Muti, A | Evalita’s AMI (task A) | submitted | Sep
Norova-Lukina, V | Cognates for text intercomprehension | green flag | TBD
Polverino, F | TBD | TBD | TBD
Ravanelli, S | TBD | TBD | TBD
Tedesco, N | Geolocalised COVID-19 Twitter Discussion Explorer | tentative | TBD
Terenzi, L | TBD | TBD | TBD
Vázquez C, A | TBD | TBD | TBD
Wang, X | TBD | TBD | TBD
Yu, X X (Catherine) | Focused hate speech during the pandemic | submitted | Feb 2021
Zhang, S | AriEmotion | submitted | Sep 2020

Readings/Bibliography

Core

Optional

Teaching methods

The course combines seminars and practical sessions; in both, active participation by the students is expected. We will start with an introduction to the Python programming language and continue with a practical description of various models and tasks.

Attendance of at least 70% of the lessons is mandatory.

Assessment methods

Each student will address a problem within their own research interests, applying the knowledge acquired during the course. Once the topic has been agreed upon, the student will work on solving the problem and write a report. A poster session will be organized at the end of the course, in which the students will present their research work.

The final grade will combine the evaluations of the written report and the poster presentation.

Teaching tools

Seminars will be delivered with slides, and coding sessions will use Jupyter notebooks. Exercises will be assigned throughout the course.

Office hours

TBD