author = "Muhammad Hassan Arif and Jianxin Li and
Muhammad Iqbal",
booktitle = "2017 IEEE 29th International Conference on Tools with
Artificial Intelligence (ICTAI)",
title = "Solving Social Media Text Classification Problems
Using Code Fragment-Based XCSR",
year = "2017",
pages = "485--492",
abstract = "Sentiment analysis and spam detection of social media
text messages are two challenging data analysis tasks
due to sparse and high-dimensional feature vectors.
Learning classifier systems (LCS) are rule-based
evolutionary computing systems and have limited
capabilities to handle real-valued, sparse,
high-dimensional big data sets. LCS techniques use
interval-based representations to handle real-valued
feature vectors. In the work presented here, the
interval-based representation is replaced by genetic
programming-based tree-like structures to classify
high-dimensional real-valued text feature vectors.
Multiple experiments
are conducted on different social media text data sets,
i.e., tweets, film reviews, Amazon and Yelp reviews, and
SMS and email spam messages, to evaluate the proposed scheme.
Real-valued feature vectors are generated from these
data sets using term frequency-inverse document
frequency and/or sentiment-lexicon-based features.
The results show the superiority of the new encoding
scheme over interval-based representations on both
small and large social media text data sets.",