A blog from the University of Borås

Monday 16 May 2022

RAI - FTA Internship (2/3)

Hey guys, hope you are all fine!

My last few weeks mainly consisted of finding suitable companies and contacting them to arrange a meeting. However, since many companies have no interest in, or time for, taking part in the study, it became more about finding the right strategy to reach the right people and actually land an interview.

Hence, in our last meeting with Vijay and Frederic from WLY, we focused on extending my research work by text-mining documents and Twitter feeds. This should help compensate for the little information I am getting from interviews and give an additional perspective on the research.

Text mining is a process that uses natural language processing (NLP) to analyse text for patterns and relationships. To search Twitter, I formulated a query with specific keywords that a tweet must contain. Twitter then sends back a file with the tweets that match the query. The gathered data then needs to be cleaned and filtered before the analysis.

Tokenisation: Splits a sentence into its individual words and stores each one as a string. This step makes it easier for the later processing to recognise the words of a sentence.
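To make this step a bit more concrete, here is a minimal sketch of a tokeniser. It is not the script used in the project; the function name `tokenise` and the example sentence are made up for illustration, and a real NLP library would handle many more edge cases.

```python
import re

def tokenise(text: str) -> list[str]:
    """Split a text into lowercase word tokens (letters, digits, apostrophes)."""
    # Lowercase first so "Fashion" and "fashion" end up as the same token.
    return re.findall(r"[a-z0-9']+", text.lower())

# Made-up example sentence, only for illustration.
tokens = tokenise("Sustainable fashion is growing fast!")
print(tokens)  # ['sustainable', 'fashion', 'is', 'growing', 'fast']
```

Punctuation is simply dropped here, which is usually fine for counting words.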

Lemmatisation: Takes every word and reduces it to its base form (the lemma). This means words like worked or working are reduced to work, since that is the original root word. If this is not done, the variants would be counted as separate words, which can distort the analysis.
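The idea can be sketched with a tiny lookup table. This is only a toy version under the assumption that every inflected form is listed by hand; the table `LEMMAS` and the function `lemmatise` are hypothetical, and real lemmatisers rely on a full dictionary and grammar information instead.

```python
from collections import Counter

# Hypothetical hand-made table: inflected form -> lemma.
LEMMAS = {
    "worked": "work",
    "working": "work",
    "works": "work",
}

def lemmatise(tokens: list[str]) -> list[str]:
    """Replace each token by its lemma; unknown tokens are left unchanged."""
    return [LEMMAS.get(token, token) for token in tokens]

tokens = ["worked", "working", "work"]
print(Counter(tokens))             # three different words, each counted once
print(Counter(lemmatise(tokens)))  # Counter({'work': 3})
```

Without the lemmatisation step the three forms are counted separately, which is exactly the distortion described above.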

These main processing steps are followed by a filter that keeps only words that appear more than ten times. Links to pictures and URLs are dropped and special characters are filtered out, so that only the information that is important for the analysis is left.
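A rough sketch of such a filter could look like the following. The names `clean` and `frequent_words` are made up, and the sketch assumes that links start with http:// or https:// and that everything except the letters a-z counts as a special character, which may not match the actual filter used in the project.

```python
import re
from collections import Counter

URL_PATTERN = re.compile(r"https?://\S+")     # assumption: links start with http(s)://
NON_LETTER_PATTERN = re.compile(r"[^a-z\s]")  # anything that is not a-z or whitespace

def clean(text: str) -> list[str]:
    """Drop URLs and special characters, then split into words."""
    text = URL_PATTERN.sub(" ", text.lower())
    text = NON_LETTER_PATTERN.sub(" ", text)
    return text.split()

def frequent_words(texts: list[str], min_count: int = 10) -> dict[str, int]:
    """Keep only words that appear more than `min_count` times over all texts."""
    counts = Counter(word for text in texts for word in clean(text))
    return {word: n for word, n in counts.items() if n > min_count}

# Made-up data: the same tweet eleven times plus one rare tweet.
tweets = ["Recycling rocks! https://t.co/abc123"] * 11 + ["rare word"]
print(frequent_words(tweets))  # {'recycling': 11, 'rocks': 11}
```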

In the analysis I then looked at the most frequent words and how they appeared over time. This helps to understand what people are tweeting about my research topic, and how.
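Counting words over time can be sketched like this, assuming each tweet is available as a date plus its already cleaned tokens. The function `counts_per_month`, the monthly grouping and the sample tweets are all assumptions for illustration, not the real data or the real analysis.

```python
from collections import Counter, defaultdict
from datetime import date

def counts_per_month(tweets):
    """tweets: iterable of (date, tokens) pairs. Returns {'YYYY-MM': Counter}."""
    per_month = defaultdict(Counter)
    for day, tokens in tweets:
        per_month[day.strftime("%Y-%m")].update(tokens)
    return dict(per_month)

# Made-up sample tweets, only for illustration.
sample = [
    (date(2022, 3, 14), ["circular", "fashion"]),
    (date(2022, 3, 20), ["fashion", "recycling"]),
    (date(2022, 4, 2), ["recycling"]),
]
monthly = counts_per_month(sample)
print(monthly["2022-03"].most_common(1))  # [('fashion', 2)]
```

From a table like this it is a small step to plot how often a keyword shows up each month.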

I additionally want to text-mine CSR and sustainability reports and apply the same processing steps to the sentences that contain the keywords. Since time is very limited in this research, I might only focus on a few reports.

With a presentation of our findings at the beginning of June this project will be finished, but by then I will have written my third report and told you more.

I hope I could give you a good idea of what I have been doing over the last weeks and what I am looking forward to.

I wish you all a very nice last period of your project and internship.

Regards,

Fabian
