Exp 12 — Spam detection with TF-IDF

Record-ready template: replace the placeholders with your dataset, code, outputs, plots, screenshots, and viva answers.

AIML355 • Fundamentals of Deep Learning Lab

Submission checklist

Aim ✓ • Environment ✓ • Dataset ✓ • Procedure ✓ • Code ✓ • Output ✓ • Discussion ✓ • Viva ✓

1) Aim

To build an NLP model for spam detection using TF-IDF in Python.

Learning outcomes

Prepare a labeled SMS/email dataset and clean the text.
Build TF-IDF features and train a classifier on them.
Evaluate the classifier using precision, recall, F1, and a confusion matrix.
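
The text-cleaning step in the first outcome can be sketched as follows. This is a minimal illustration, not a prescribed method: the helper name clean_text and the specific rules (lowercasing, dropping URLs and punctuation, collapsing whitespace) are assumptions, and the sample message is made up.

```python
import re

# Hypothetical helper for illustration; the cleaning rules are assumptions.
_URL_RE = re.compile(r"https?://\S+|www\.\S+")
_NON_ALNUM_RE = re.compile(r"[^a-z0-9\s]")
_WHITESPACE_RE = re.compile(r"\s+")


def clean_text(message: str) -> str:
    """Lowercase, drop URLs and punctuation, and collapse whitespace."""
    text = message.lower()
    text = _URL_RE.sub(" ", text)        # replace links with a space
    text = _NON_ALNUM_RE.sub(" ", text)  # keep only letters, digits, whitespace
    return _WHITESPACE_RE.sub(" ", text).strip()


# Made-up sample message, not taken from any dataset.
print(clean_text("WIN a FREE prize!!! Visit http://spam.example/claim NOW"))
```

With pandas, the same function could be applied to a text column via df['text'].map(clean_text). Whether to drop URLs and digits is a judgment call, since both can carry useful signal for spam; compare results with and without each rule.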

2) Requirements / Environment

Software

Python 3.10+ (recommended)
TensorFlow/Keras (or PyTorch where applicable)
NumPy, Pandas, Matplotlib
scikit-learn (TF-IDF vectorizer, LinearSVC, metrics)
Jupyter/Colab optional

Hardware

CPU is OK for small runs; GPU optional
RAM: 4–8 GB+ recommended

Reproducibility

Record library versions and random seed in your final report.
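
One possible way to collect this information is sketched below, using only the standard library. The helper name library_versions, the package list, and the seed value 42 are assumptions chosen for illustration (42 matches the random_state used in the code skeleton).

```python
import platform
import random
from importlib import metadata

SEED = 42  # assumed value; use whatever seed you actually ran with


def library_versions(packages):
    """Map each distribution name to its installed version, or 'not installed'."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return versions


random.seed(SEED)  # seeds Python's built-in RNG only
report = library_versions(["numpy", "pandas", "scikit-learn", "matplotlib"])
for name, version in report.items():
    print(f"{name}: {version}")
print(f"seed: {SEED}")
```

Note that random.seed does not affect scikit-learn; there, reproducibility comes from passing random_state to calls such as train_test_split.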

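import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.metrics import classification_report, confusion_matrix

# TODO: load spam dataset
# df = pd.read_csv(...)
# X = df['text']
# y = df['label']

# Split the raw text first so the TF-IDF vocabulary and IDF weights
# are learned from the training messages only (avoids test-set leakage).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)

# Unigrams and bigrams, capped at the 20,000 most frequent terms.
vec = TfidfVectorizer(max_features=20000, ngram_range=(1, 2))
X_train_vec = vec.fit_transform(X_train)
X_test_vec = vec.transform(X_test)

# Linear SVM classifier on the sparse TF-IDF features.
clf = LinearSVC()
clf.fit(X_train_vec, y_train)
pred = clf.predict(X_test_vec)

# Per-class precision, recall, and F1, followed by the confusion matrix.
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))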
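
To help interpret the evaluation output, the sketch below computes precision, recall, and F1 for the spam class directly from confusion-matrix counts. The function name binary_metrics and the counts are made up for illustration; they are not results from any dataset. For a binary problem, scikit-learn's confusion_matrix (which the skeleton imports) lays out the counts as [[tn, fp], [fn, tp]], assuming spam is the second label in sorted order.

```python
def binary_metrics(tn, fp, fn, tp):
    """Precision, recall, and F1 for the positive (spam) class."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical counts for illustration only.
tn, fp, fn, tp = 880, 10, 30, 90
precision, recall, f1 = binary_metrics(tn, fp, fn, tp)
print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```

In spam filtering, a false positive (a legitimate message flagged as spam) is often considered more costly than a false negative, so precision on the spam class deserves particular attention in the discussion.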

3) Dataset

4) Procedure / Steps

5) Code (Skeleton)

6) Results / Output

7) Observations / Discussion

8) Conclusion

9) Viva Questions

10) Post-lab Assignment