# Natural Language Processing 2024.
## LM Specialised Translation

# Playing with a rule-based sentiment analyser

The [VADER](http://comp.social.gatech.edu/papers/icwsm14.vader.hutto.pdf) (Valence Aware Dictionary and sEntiment Reasoner) sentiment analyser was introduced at ICWSM 2014. [ICWSM](https://www.icwsm.org/) is the International AAAI Conference on Web and Social Media.

VADER has been [released](https://pypi.org/project/vaderSentiment/) as a Python package.

In [None]:
# installing the package (library)
! pip3 install vaderSentiment

In [None]:
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
sa = SentimentIntensityAnalyzer()

In [None]:
# Let us have a look at the "full" lexicon
sa.lexicon

# this is "just" a dictionary!
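
The lexicon really is a plain Python dict. As far as we know, the package builds it from a tab-separated text file in which each line holds a token, its mean valence, its standard deviation and the raw human ratings, and only the first two fields are kept. The sketch below mimics that idea on three made-up lines in the same format; `SAMPLE_LEXICON` and `make_lexicon` are hypothetical names, not part of vaderSentiment, and the numbers are invented for illustration.

```python
# Three made-up lines in VADER's tab-separated lexicon format:
# token <TAB> mean valence <TAB> std <TAB> raw ratings (NOT real entries).
SAMPLE_LEXICON = """\
happyish\t1.2\t0.4\t[1, 1, 2, 1, 1, 2, 1, 1, 1, 1]
meh\t-0.3\t0.46\t[0, -1, 0, 0, -1, 0, 0, 0, -1, 0]
:-)\t1.5\t0.5\t[1, 2, 2, 1, 1, 2, 2, 1, 2, 1]
"""


def make_lexicon(raw: str) -> dict[str, float]:
    """Map each token to its mean valence (the second tab-separated field)."""
    lexicon = {}
    for line in raw.splitlines():
        if not line.strip():
            continue  # skip blank lines
        token, mean = line.strip().split("\t")[:2]
        lexicon[token] = float(mean)
    return lexicon


lexicon = make_lexicon(SAMPLE_LEXICON)
print(lexicon)  # {'happyish': 1.2, 'meh': -0.3, ':-)': 1.5}
```

Note that emoticons such as `:-)` are ordinary keys, which is why they can be looked up exactly like words.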

In [None]:
# Or just part of it: the entries that start with "no"
[(tok, score) for tok, score in sa.lexicon.items() if tok.startswith("no")]

In [None]:
# Let us see if there are bigrams

# BTW, what is a bigram?

[(tok, score) for tok, score in sa.lexicon.items() if " " in tok]
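
A bigram is a pair of adjacent tokens. A minimal sketch, using a hypothetical `bigrams` helper and a made-up sentence: `zip` pairs each token with the next one, and joining each pair with a single space produces strings in the format the filter above looks for, so each candidate could then be tested with `candidate in sa.lexicon`.

```python
def bigrams(tokens: list[str]) -> list[tuple[str, str]]:
    """Return every pair of adjacent tokens, in order."""
    return list(zip(tokens, tokens[1:]))


tokens = "this is kind of ok".split()
pairs = bigrams(tokens)
print(pairs)
# [('this', 'is'), ('is', 'kind'), ('kind', 'of'), ('of', 'ok')]

# Multi-word lexicon keys contain a single space, so join each pair the same way.
candidates = [" ".join(pair) for pair in pairs]
print(candidates)  # ['this is', 'is kind', 'kind of', 'of ok']
```

A text of n tokens yields n - 1 bigrams; a single token yields none.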

In [None]:
# Finally, let's score!!
text = "Python is very readable and it's great for NLP."
sa.polarity_scores(text=text)

What is the meaning of these scores?

Let us look at the [documentation](https://github.com/cjhutto/vaderSentiment#about-the-scoring)
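
In short, following that documentation: `neg`, `neu` and `pos` are the proportions of the text that fall into each category (so they should add up to about 1), while `compound` is the sum of the valences of the words, adjusted by VADER's rules, and then normalised into the range [-1, +1]. In the package source the normalisation of a raw sum $x$ takes the form below with $\alpha = 15$; the value $x = 4$ is only an illustrative example:

$$
\text{compound} = \frac{x}{\sqrt{x^2 + \alpha}}, \qquad \alpha = 15
$$

$$
x = 4 \;\Rightarrow\; \text{compound} = \frac{4}{\sqrt{16 + 15}} = \frac{4}{\sqrt{31}} \approx 0.718
$$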

In [None]:
# Let us see the dictionary entries for this vocabulary

for token in "Python is very readable and it's great for NLP.".lower().split():
    if token in sa.lexicon:
        print(token, sa.lexicon[token])
    else:
        print(token, "unk")

In [None]:
# Let us look at some interesting examples...

print(sa.polarity_scores(text="Python is not very readable and it isn't great for NLP."))
print(sa.polarity_scores(text="Python is not a bad choice for many applications."))
sa.polarity_scores(text="Python is a bad choice for many applications.")
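
The two "bad choice" sentences differ only by "not". A minimal sketch of the idea, assuming a simplified rule: in the vaderSentiment source a word's valence is multiplied by the constant `N_SCALAR = -0.74` when a negation word occurs among the three tokens before it, which both flips and dampens it. The real analyser uses a much longer negation list and some extra special cases; `TOY_LEXICON`, `NEGATIONS` and `word_valences` are hypothetical, and the valences are illustrative rather than read from `sa.lexicon`.

```python
# A simplified negation rule in the spirit of VADER (illustrative values only).
TOY_LEXICON = {"bad": -2.5, "great": 3.1}  # assumed valences, not read from sa.lexicon
NEGATIONS = {"not", "isn't", "never", "no"}  # tiny hypothetical subset of VADER's list
N_SCALAR = -0.74  # flip-and-dampen factor found in the vaderSentiment source


def word_valences(text: str) -> list[tuple[str, float]]:
    """Score lexicon words, flipping any preceded by a negation within three tokens."""
    tokens = text.lower().replace(".", "").split()
    scored = []
    for i, tok in enumerate(tokens):
        if tok not in TOY_LEXICON:
            continue
        valence = TOY_LEXICON[tok]
        window = tokens[max(0, i - 3):i]  # the (up to) three preceding tokens
        if any(w in NEGATIONS for w in window):
            valence *= N_SCALAR
        scored.append((tok, round(valence, 3)))
    return scored


print(word_valences("Python is a bad choice for many applications."))      # [('bad', -2.5)]
print(word_valences("Python is not a bad choice for many applications."))  # [('bad', 1.85)]
print(word_valences("Python is not very readable and it isn't great for NLP."))  # [('great', -2.294)]
```

Because the factor is -0.74 rather than -1, "not bad" comes out less positive than "bad" is negative.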

In [None]:
for word in "Absolutely perfect! Love it! :-) :-) :-)".split():
    print(word, sa.polarity_scores(word))
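
Scoring the words one at a time also shows that punctuation matters. In the vaderSentiment source, each exclamation mark (up to four) adds 0.292 to the magnitude of the raw valence sum, in the direction of its sign, so it only has an effect when the text already contains some sentiment-bearing word. The sketch below reproduces just that rule; `exclamation_boost`, `emphasise` and the valence 3.2 used for "love" are illustrative assumptions, and the real analyser also boosts repeated question marks.

```python
EP_INCREMENT = 0.292  # per-"!" boost used in the vaderSentiment source
MAX_EP = 4            # exclamation marks beyond four add nothing


def exclamation_boost(text: str) -> float:
    """Boost contributed by the exclamation marks in the text (capped)."""
    return min(text.count("!"), MAX_EP) * EP_INCREMENT


def emphasise(sum_valence: float, text: str) -> float:
    """Push a raw valence sum further from zero, in its own direction."""
    boost = exclamation_boost(text)
    if sum_valence > 0:
        return sum_valence + boost
    if sum_valence < 0:
        return sum_valence - boost
    return sum_valence  # neutral text gains nothing from "!"


for text in ["Love it", "Love it!", "Love it!!!!!!"]:
    print(text, round(emphasise(3.2, text), 3))
# Love it 3.2
# Love it! 3.492
# Love it!!!!!! 4.368
```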

In [None]:
corpus = ["Absolutely perfect! Love it! :-)",
          "Horrible! Completely useless. :(",
          "It was OK. Some good and some bad things.",
          "Absolutely perfect! Love it! :("]

for doc in corpus:
    scores = sa.polarity_scores(doc)
    # here "{:+}" is forcing the sign to be displayed, even if positive
    print('{:+}: {}'.format(scores['compound'], doc))

In [None]:
# Scoring an (incomplete) Amazon review

text = """This monitor is definitely a good value. Does it have superb color and
contrast? No. Does it boast the best refresh rate on the market? No.
But if you're tight on money, this thing looks and preforms great for the money.
It has a Matte screen which does a great job at eliminating glare. The chassis it's enclosed
within is absolutely stunning."""
print("Length of the text (in tokens):", len(text.split()), "\n")

for i in [3, 5, 10, 20, 45, 60]:
    t = " ".join(text.split()[:i])
    print(i,"\t", t)
    print("ONE TIME", sa.polarity_scores(t))
    print("THREE TIMES", sa.polarity_scores(" ".join([t, t, t])))
    print()
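
Why does repeating the review change the compound score? A sketch, assuming (roughly) that three copies of a text have three times its raw valence sum: the compound score passes that sum through x / sqrt(x^2 + alpha), with alpha = 15 in the vaderSentiment source, so larger sums are pushed towards ±1. `pos`, `neu` and `neg` are proportions, so tripling every count leaves them roughly unchanged. The raw sum of 2.0 below is an illustrative value, not one computed by VADER.

```python
import math

ALPHA = 15  # normalisation constant used in the vaderSentiment source


def normalize(score: float, alpha: float = ALPHA) -> float:
    """Map an unbounded valence sum into the range (-1, 1)."""
    norm = score / math.sqrt(score * score + alpha)
    return max(-1.0, min(1.0, norm))  # safety clamp


raw = 2.0  # illustrative valence sum for one copy of a text
for copies in (1, 3, 10):
    print(copies, round(normalize(raw * copies), 4))
# 1 0.4588
# 3 0.8402
# 10 0.9818
```

The curve saturates: the same sentiment, repeated, looks ever more "certain" to the compound score.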


In [None]:
print(sa.polarity_scores("this is not good"))
print(sa.polarity_scores("this is not good at all"))

In [None]:
# Scoring a tweet
sa.polarity_scores("""His ass didnt concede until July 12, 2016.
Because he was throwing a tantrum. I can't say this enough: Fuck Bernie Sanders""")

In [None]:
# Scoring a more recent tweet
# https://twitter.com/KremlinRussia_E/status/1497265862784339971
sa.polarity_scores("""Meeting with permanent members of the Security Council.
The main topic is the situation in Ukraine""")

In [None]:
# Scoring a line from a political magazine
# https://www.frontpagemag.com/the-lefts-virtue-signalers-recoil-at-illegal-immigrants-on-their-doorsteps/
sa.polarity_scores("""The administration’s cynical strategy is to deflect attention from a border crisis
of its own making and cast Governors Abbott and DeSantis in particular as inhumane villains.""")

In [None]:
sa.polarity_scores("Tbilisi tonight. Thank you, Georgia! #StandWithUkriane")

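**Warning**: these tools are far from perfect. A lexicon- and rule-based analyser scores words and a few patterns around them; it has no access to context, irony or the writer's intent.

Its output has to be analysed with care.
For instance, let us consider [this tweet](https://twitter.com/Umarbison/status/1498812392951611395)...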

In [None]:
tweet_1498812392951611395 = """What if the difference? You know what!
it is only hate! Worse thing is about media showing it hero and villian.
End of the day we breathe same oxygen to survive. Innocents are dying
everywhere but only the best actors are supported by media and people.
#UkraineRussiaWar #Palestine"""
sa.polarity_scores(tweet_1498812392951611395)

...versus this [Amazon review](https://www.amazon.co.uk/gp/customer-reviews/R2M5ZU6IKXGQT6/ref=cm_cr_dp_d_rvw_ttl?ie=UTF8&ASIN=B07XLDLZPF) (which was rated 1 star)

In [None]:
amzn_review = """Monitor seems good, but for some reason LG have taken it upon themselves to
sink the VESA mount points into a proprietary 6mm deep cutout in the rear of the monitor,
meaning that the Duronic monitor mounts I've been using for *every single* monitor on my desk
for years, categorically do not fit this monitor.
Totally defeats the point of the VESA mount standard. Stupid move LG, really dumb design."""
sa.polarity_scores(amzn_review)
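
One way to make such comparisons explicit is to turn the compound score into a label. The VADER README suggests the cut-offs compound >= 0.05 for positive, compound <= -0.05 for negative, and neutral in between. `label` is a hypothetical helper implementing that convention; in the notebook it could be applied as `label(sa.polarity_scores(amzn_review)["compound"])` and compared with the review's star rating.

```python
def label(compound: float, threshold: float = 0.05) -> str:
    """Map a VADER compound score to a coarse sentiment label."""
    if compound >= threshold:
        return "positive"
    if compound <= -threshold:
        return "negative"
    return "neutral"


for score in (0.6, 0.01, -0.4):
    print(f"{score:+}: {label(score)}")
# +0.6: positive
# +0.01: neutral
# -0.4: negative
```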

# Homework

1. Get the polarity scores for each of the words in "Python is very readable and it's great for NLP" after running a proper pre-processing pipeline
2. Get the polarity scores for each of the words in "Python is not very readable and it isn't great for NLP." after running a proper pre-processing pipeline
3. Play with some other interesting examples from Twitter, Amazon, Rotten Tomatoes...