Computational Linguistics

Academic Year 2020/2021

Learning outcomes

The student will learn the basic theoretical aspects of computational linguistics/natural language processing and will acquire practical skills ranging from tokenization and vectorization to the computation of similarities and the training of supervised models (e.g., for topic identification, structural analysis, meaning analysis).
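To give a flavour of the practical side, the first steps named above (tokenization, vectorization, similarity computation) can be sketched in a few lines of plain Python. This is only a minimal illustration and not course material: the function names `tokenize`, `vectorize` and `cosine_similarity`, the regular-expression tokenizer, and the two example sentences are all assumptions made for this sketch.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def vectorize(tokens):
    """Represent a document as a bag of words: token -> raw count."""
    return Counter(tokens)


def cosine_similarity(vec_a, vec_b):
    """Cosine of the angle between two sparse count vectors."""
    # A Counter returns 0 for missing tokens, so unseen words add nothing.
    dot = sum(count * vec_b[token] for token, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical example documents, chosen only for illustration.
doc_a = vectorize(tokenize("The cat sat on the mat."))
doc_b = vectorize(tokenize("The cat lay on the rug."))
print(round(cosine_similarity(doc_a, doc_b), 3))
```

Raw counts are the simplest possible weighting; in practice, weighting schemes and library implementations would typically replace this hand-written version.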

Course contents

Although the contents may be (slightly) adapted to the students' skills and interests, the general structure of the course will be as follows.

0. A gentle introduction to Python

1. Introduction to Computational Linguistics

2. Words and vector space model

3. Naive Bayes

4. Word vectors

5. From Word Counts to Meaning

6. Training and Evaluation

Intermezzo

7. Intro to LSA

8. Intro to NN

9. Word Embeddings

10. Visualisation

11. From document representations, towards sequences

12. Convolutions for text

13. Text is Sequential

14. Long Short-Term Memory Networks

Projects

The evaluation is based on a project. If you want some inspiration, look at the projects presented last year.

Some project ideas

Although you are expected to apply the acquired knowledge to a problem of your own interest, here are some ideas in case you find yourself lost.

Standard research

Shared Tasks

Readings/Bibliography

Core

Optional

Teaching methods

The course is a combination of seminar and practical sessions. In both cases, active participation of the students is expected. Assuming you know the basics of programming (e.g., by completing the Python course in Topic 0), we will cover a (practical) description of diverse models and tasks.

Evaluation

The student will address a problem within her own research interests using the knowledge acquired during the course. Once the topic has been agreed upon, the student will work on solving the problem and will produce a written report. A poster session will be organized at the end of the course (or before every exam session, "appello"), in which the students will present their research work.

The final evaluation will combine the report and the poster presentation.

Important points

Teaching tools

Seminars will be carried out with slides, and coding will be done in Jupyter notebooks. Exercises will be assigned throughout the course.

Office hours

See my UniBO website