A machine learning system that classifies email and SMS messages as valid or spam using four trained classification models. Built for MIS 542 (Spam Detection Project).
The goal of this project is to develop a machine learning model capable of automatically classifying email and SMS messages as either valid or spam. Spam messages are unsolicited, often fraudulent communications that clutter inboxes and pose security risks to recipients.
By training on a labeled dataset of real-world messages, the system learns distinguishing linguistic patterns — such as urgency triggers, prize claims, and unusual punctuation — that are strongly associated with spam. The resulting model can then filter new, unseen messages in real time through a web interface.
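As a rough illustration of the surface cues mentioned above, the sketch below counts urgency words, prize-related words, and runs of punctuation in a single message. The word lists, the regular expressions, and the function name `extract_cues` are hypothetical and chosen only for this example; the trained models in this project learn their own patterns from the labeled data, not from hand-written rules like these.

```python
import re

# Hypothetical keyword lists for illustration only; not taken from the project.
URGENCY_WORDS = {"urgent", "now", "immediately", "expires", "final"}
PRIZE_WORDS = {"win", "won", "winner", "prize", "free", "cash", "claim"}


def extract_cues(message: str) -> dict:
    """Count simple spam-like surface cues in a single message."""
    tokens = re.findall(r"[a-z']+", message.lower())
    letters = sum(c.isalpha() for c in message)
    return {
        "urgency_hits": sum(t in URGENCY_WORDS for t in tokens),
        "prize_hits": sum(t in PRIZE_WORDS for t in tokens),
        # Runs of two or more '!' or '?' characters, e.g. "!!!" or "?!"
        "punct_runs": len(re.findall(r"[!?]{2,}", message)),
        # Share of alphabetic characters that are uppercase
        "upper_ratio": sum(c.isupper() for c in message) / max(1, letters),
    }


spam_like = extract_cues("URGENT!!! You have WON a FREE prize. Claim now!!")
ordinary = extract_cues("Are we still meeting for lunch tomorrow?")
```

On these two toy messages the spam-like text scores higher on every cue, which is the kind of separation a classifier can exploit once such signals are encoded as features.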
This project encompasses four distinct phases:
The dataset contains 5,572 labeled messages sourced from real SMS and email communications.
Total Messages: 5,572
Valid Messages: 4,825 (86.6% of total)
Spam Messages: 747 (13.4% of total)
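The percentages follow directly from the reported counts. The short check below recomputes them and also derives the valid-to-spam ratio; the variable names are illustrative only.

```python
# Counts as reported in the dataset summary.
valid_count = 4825
spam_count = 747
total = valid_count + spam_count

valid_share = 100 * valid_count / total
spam_share = 100 * spam_count / total
# Ratio of majority-class to minority-class messages.
imbalance_ratio = valid_count / spam_count

print(f"total={total}, valid={valid_share:.1f}%, spam={spam_share:.1f}%")
print(f"valid-to-spam ratio: {imbalance_ratio:.2f} to 1")
```

One consequence of these numbers is that a classifier labeling every message as valid would already reach about 86.6% accuracy on this data, so accuracy alone says little about how well spam is detected.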
Class Distribution
The dataset is heavily imbalanced — only 13.4% of messages are spam. This motivates the use of SMOTE oversampling during model training.
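To make the idea behind SMOTE concrete, here is a minimal standard-library sketch of its core step: creating a synthetic minority sample by interpolating between an existing minority sample and one of its nearest minority-class neighbors. This is a simplified illustration, not the project's training code. The function name `smote_sketch`, its parameters, and the toy 2-D vectors are assumptions made for this example, and a real pipeline would more likely rely on a library implementation applied to vectorized text.

```python
import random


def smote_sketch(minority, n_new, k=3, seed=0):
    """Generate n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbors."""
    rng = random.Random(seed)

    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))

    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbors of `base` among the other minority samples
        neighbors = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sq_dist(base, p),
        )[:k]
        neighbor = rng.choice(neighbors)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbor)])
    return synthetic


# Toy 2-D feature vectors standing in for vectorized spam messages (hypothetical).
spam_vectors = [[1.0, 2.0], [1.5, 1.8], [1.2, 2.4], [0.9, 2.1]]
new_points = smote_sketch(spam_vectors, n_new=6)
```

Each synthetic point lies on a line segment between two real spam samples, so the minority class grows without exact duplication. Oversampling of this kind is usually applied to the training split only, so that evaluation data still reflects the original class balance.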