Hate Speech Classification of Codeswitched Data: Leveraging Psycho-social Features to classify Hate Speech: Case of Kenyan Tweets during 2017 Election
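Product Information
ISBN13: 9781952751899
Publisher: Lightning Source Inc
Author: Lawrence Muchemi
Publication date: 2020/09/23
Binding: Paperback
Dimensions: 22.9 cm × 15.2 cm × 1 cm (height/width/thickness)
Product Description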
Identifying short text messages that contain hate speech within the vast volume of user-generated content on social media is a challenging classification task. Social media data poses unprecedented challenges for conventional natural language processing techniques, which must extract high-quality features from noisy, high-dimensional, codeswitched, and large-scale unstructured data. In addition, a systematic review of previous studies revealed a lack of publicly available annotated datasets for comparative studies, little evidence of theoretical underpinning for the annotation schemes used, and hardly any work on codeswitched data. To address these gaps, this book explores a data-driven approach to identifying high-quality, discriminative features in hate messages from social media. The goal was then to use these features to train a better-performing classification model that effectively captures subtle hate speech in social media messages.

Approximately 400k messages were crawled from social media over a one-year period around the 2017 general election in Kenya, using a combination of problematic hashtags, ethnic slurs, hate patterns, and messages from pro-hate user accounts. A random sample of 50k messages was manually labeled into three classes (Hate Speech, Offensive, or Neither) by a team of 27 human annotators. This dataset was then further reduced by extracting a psychosocial feature subset (PDC), informed by the conceptual framework, using a hierarchical probability modeling technique. To evaluate and select the best model, a grid search was performed over all combinations of features using 5-fold cross-validation, with a tenth of the data held out for evaluation and to guard against overfitting.
In the experiments, the novel psychosocial feature set (PDC) was effective in identifying hate speech and outperformed conventional features: it produced the best classifier, a linear SVM, with an accuracy of 82.8%. The Passion (P) and Distance (D) components proved the most salient, with accuracies of 74.3% and 74.2%, respectively. The psychosocial feature framework also generalized better to other types of hate speech.
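The evaluation protocol described in the blurb (a held-out tenth of the data, 5-fold cross-validation, and a search over feature combinations) can be illustrated with a minimal sketch. This is not the author's implementation: the function names are hypothetical, the feature groups are referred to only by the letters P, D, and C from the text, and `score_fn` is a caller-supplied placeholder standing in for training a classifier (such as a linear SVM) and returning its validation accuracy.

```python
import random
from itertools import combinations
from statistics import mean


def holdout_split(n_items, holdout_fraction=0.1, seed=0):
    """Shuffle item indices and reserve a fraction for final evaluation."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)
    n_holdout = int(round(n_items * holdout_fraction))
    # Returns (development indices, held-out indices).
    return indices[n_holdout:], indices[:n_holdout]


def k_fold(indices, k=5):
    """Yield (train, validation) index lists for k roughly equal folds."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, folds[i]


def feature_combinations(feature_groups):
    """Yield every non-empty combination of the given feature groups."""
    for size in range(1, len(feature_groups) + 1):
        yield from combinations(feature_groups, size)


def select_best(feature_groups, dev_indices, score_fn, k=5):
    """Pick the feature combination with the highest mean validation score.

    score_fn(features, train, validation) is a hypothetical callback that
    trains a model on `train` and returns its accuracy on `validation`.
    """
    results = {}
    for combo in feature_combinations(feature_groups):
        scores = [score_fn(combo, tr, va) for tr, va in k_fold(dev_indices, k)]
        results[combo] = mean(scores)
    best = max(results, key=results.get)
    return best, results
```

Under this sketch, the held-out indices would be touched only once, after the best feature combination has been chosen on the development folds, so the reported accuracy is not inflated by the search itself.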

