My research focuses on improving the utility of social media with Natural Language Processing (NLP). In particular, I have worked on Twitter text normalisation and geolocation prediction. My research applies a divide-and-conquer paradigm to the huge volume of social media data.

On the "conquer the noise" side, lexical normalisation converts non-standard words in social media text to their canonical forms, e.g., 4eva ("forever"). The normalised data is then more accessible to existing NLP tools and downstream applications. On the "divide the data" side, geolocation prediction typically takes a Twitter user's tweets (incl. metadata) as input and outputs the most probable location from a discrete set of pre-defined locations, such as metropolitan cities. This enables partitioning the data by location, which makes location-based applications feasible (e.g., local event detection, regional sentiment analysis) and avoids processing massive amounts of irrelevant data.
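As a toy illustration only, the simplest form of lexical normalisation can be sketched as a lexicon lookup; the token-to-form mappings below are assumed examples for demonstration, not the lexicon or method from my actual research:

```python
# Illustrative lexicon of non-standard tokens and their canonical forms.
# These entries are assumed examples, not a real normalisation lexicon.
NORM_LEXICON = {
    "4eva": "forever",
    "u": "you",
    "2moro": "tomorrow",
}

def normalise(tweet: str) -> str:
    """Replace known non-standard tokens with their canonical forms.

    Tokens not found in the lexicon are passed through unchanged.
    """
    return " ".join(NORM_LEXICON.get(tok.lower(), tok) for tok in tweet.split())

print(normalise("see u 2moro"))  # -> "see you tomorrow"
```

Real systems go well beyond a static lookup (handling context, unseen variants, and candidate ranking), but the sketch shows the basic input-output contract of the task.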
From time to time, I also code side data science projects. These projects are like Unix utilities, each dedicated to solving a particular information need. I found them useful when I bought my property and car.
Cars (in Australia)