Exp 11 — NLP analysis of restaurant reviews

Record-ready template. Fill placeholders with your dataset, code, outputs, plots, and viva.
AIML355 • Fundamentals of Deep Learning Lab

Submission checklist
Aim ✓ • Environment ✓ • Dataset ✓ • Procedure ✓ • Code ✓ • Output ✓ • Discussion ✓ • Viva ✓

1) Aim

To perform NLP analysis of Restaurant Reviews in Python.

Learning outcomes
  • Clean text (lowercase, stopwords, stemming/lemmatization).
  • Convert to features (Bag-of-Words or TF-IDF).
  • Train a classifier (Logistic Regression or Naive Bayes) and evaluate it.
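
The cleaning step in the first outcome can be sketched with the standard library alone. This is a minimal illustration, not a prescribed method: the function name clean_review and the short STOPWORDS set are assumptions made for the example, and a real run would normally use a fuller stopword list.

```python
import re

# Illustrative stopword list (an assumption, not a standard resource).
# "not" is deliberately kept because negation changes the meaning of a review.
STOPWORDS = {
    "the", "a", "an", "was", "is", "are", "and", "or", "of",
    "to", "in", "it", "this", "that", "i", "we", "they",
}


def clean_review(text):
    """Lowercase, drop non-letter characters, and remove stopwords."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # punctuation and digits become spaces
    tokens = [tok for tok in text.split() if tok not in STOPWORDS]
    # A stemmer or lemmatizer could be applied to each token here.
    return " ".join(tokens)


print(clean_review("The food was NOT good!!!"))  # food not good
```

Stemming or lemmatization is left out to keep the sketch dependency-free; a library stemmer such as NLTK's PorterStemmer could be applied per token at the marked line.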

2) Requirements / Environment

Software
  • Python 3.10+ (recommended)
  • scikit-learn (used by the code skeleton)
  • TensorFlow/Keras or PyTorch (only needed for the optional deep model)
  • NumPy, Pandas, Matplotlib
  • Jupyter/Colab optional
Hardware
  • CPU is OK for small runs; GPU optional
  • RAM: 4–8 GB+ recommended
Reproducibility
Record library versions and random seed in your final report.
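
One possible way to capture this information is sketched below. The helper name environment_report and the package list are assumptions for illustration; adjust the list to whatever you actually import.

```python
import platform
import random
from importlib import metadata

SEED = 42  # assumed value; it matches random_state in the code skeleton


def environment_report(packages):
    """Return the Python version plus the installed version of each package."""
    report = {"python": platform.python_version()}
    for name in packages:
        try:
            report[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "not installed"
    return report


random.seed(SEED)  # seeds Python's own RNG; NumPy and TensorFlow need their own calls

packages = ["numpy", "pandas", "scikit-learn", "matplotlib"]
for name, version in environment_report(packages).items():
    print(f"{name}: {version}")
```

Paste the printed lines into the report next to the seed value so that a reader can rebuild the same environment.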

3) Dataset

  • Source: Use restaurant reviews dataset (Kaggle or course-provided) and cite it.
  • Features/Labels: [Describe X and y; mention classes if classification]
  • Split: [Train/Validation/Test or K-fold]
  • Preprocessing: [Lowercasing, stopword removal, stemming/lemmatization, tokenization, etc.]
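
The items above can be checked quickly in code before filling in the placeholders. The sketch below uses a small in-memory table as a stand-in for the real file; the rows are made up for illustration, the column names review_text and label follow the code skeleton, and the 1/0 label encoding is an assumption.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Toy stand-in for the real dataset (made-up rows, for illustration only).
df = pd.DataFrame({
    "review_text": [
        "Loved the pasta", "Great service", "Tasty and fresh",
        "Will come again", "Best dessert ever",
        "Cold food", "Rude staff", "Too salty",
        "Waited an hour", "Never again",
    ],
    "label": [1, 1, 1, 1, 1, 0, 0, 0, 0, 0],  # 1 = positive, 0 = negative (assumed)
})

# Basic hygiene before splitting: drop missing rows and duplicate reviews.
df = df.dropna(subset=["review_text", "label"]).drop_duplicates(subset="review_text")
print(df["label"].value_counts())  # check class balance

# Stratify so both splits keep the same class proportions.
train_df, test_df = train_test_split(
    df, test_size=0.2, random_state=42, stratify=df["label"]
)
print(len(train_df), "training rows,", len(test_df), "test rows")
```

With the real dataset, replace the DataFrame construction with your own loading code and report the class counts in this section.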

4) Procedure / Steps

  1. Load dataset and perform preprocessing.
  2. Extract features (TF-IDF or Bag-of-Words), define the model, and justify key choices.
  3. For the optional deep model, compile it (loss + optimizer + metrics).
  4. Train with validation; for the deep model, log the training curves.
  5. Evaluate on test set and compute required metrics.
  6. Summarize observations and limitations.
Model hint
TF-IDF + classifier baseline; optional deep model: Embedding + 1D CNN / RNN.
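
As a rough sketch of the baseline in the hint, the example below chains TF-IDF and a Naive Bayes classifier in one scikit-learn pipeline and scores it with stratified K-fold cross-validation. The ten reviews and their labels are made up for illustration, and the fold count is an arbitrary choice for such a tiny sample.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Made-up reviews for illustration; 1 = positive, 0 = negative (assumed encoding).
texts = [
    "loved the pasta", "great service and food", "tasty and fresh food",
    "will come again", "best dessert ever",
    "cold food", "rude staff", "too salty and cold",
    "waited an hour", "never coming again",
]
labels = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]

# The pipeline refits TF-IDF inside every fold, so held-out text never
# influences the vocabulary or the IDF weights.
model = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), MultinomialNB())

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
scores = cross_val_score(model, texts, labels, cv=cv, scoring="accuracy")
print("fold accuracies:", scores)
print("mean accuracy:", scores.mean())
```

Scores on ten sentences are not meaningful; the point is the structure, which carries over unchanged to the real dataset.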

5) Code (Skeleton)

Paste your complete runnable code below (or attach a notebook link in the final submission).

import re
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, confusion_matrix

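# TODO: load the restaurant reviews dataset; X and y must be defined before the code below runs
# df = pd.read_csv(...)
# X = df['review_text']
# y = df['label']

# Split the raw text first so the vectorizer never sees the test reviews
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42, stratify=y)

# Learn the TF-IDF vocabulary on the training text only, then reuse it for the test text
vec = TfidfVectorizer(max_features=5000, ngram_range=(1, 2))
Xv_train = vec.fit_transform(X_train)
Xv_test = vec.transform(X_test)

clf = LogisticRegression(max_iter=200)
clf.fit(Xv_train, y_train)
pred = clf.predict(Xv_test)
print(classification_report(y_test, pred))
print(confusion_matrix(y_test, pred))  # rows = true labels, columns = predicted labels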

6) Results / Output

  • Metrics: [Write your final values: accuracy, precision, recall, F1]
  • Plots: [Attach a confusion matrix plot; add loss/metric curves if you trained the optional deep model]
  • Screenshots: [Paste screenshots of outputs, confusion matrix, sample predictions]
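
To make the reported metrics easier to interpret, the sketch below derives accuracy, precision, recall, and F1 for a binary task directly from the confusion-matrix counts. The function name binary_metrics and the two label lists are assumptions made up for this worked example; they are not results from any dataset.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall, and F1 from raw label lists."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"tp": tp, "fp": fp, "fn": fn, "tn": tn,
            "accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]
result = binary_metrics(y_true, y_pred)
print(result)
```

Here every metric works out to 0.75 because the errors are symmetric (one false positive, one false negative); on real data the four values usually differ, which is why all of them are worth reporting.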

7) Observations / Discussion

  • [Observation 1: what changed when you tuned max_features/ngram_range (or epochs/batch size for the deep model)?]
  • [Observation 2: evidence of overfitting/underfitting?]
  • [Observation 3: what improved performance (text cleaning, regularization, a different classifier)?]

8) Conclusion

Write 3–6 lines summarizing what you implemented, the key result, and what you learned.

9) Viva Questions

  1. What is TF-IDF and why is it useful?
  2. What are n-grams?
  3. What is the difference between training and validation data?
  4. Explain overfitting and two ways to reduce it.
  5. Why do we normalize/scale inputs?
  6. What do batch size and epoch mean?
  7. How do you choose a loss function for a task?
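
As preparation for the first two questions, the sketch below builds n-grams and a TF-IDF weight by hand. It uses the plain textbook form (term frequency times log(N / document frequency)), which is an assumption of this example; scikit-learn's TfidfVectorizer uses a smoothed IDF and normalizes each row by default, so its numbers will differ. The function names and the three tokenized documents are made up for illustration.

```python
import math


def ngrams(tokens, n):
    """Return all runs of n consecutive tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(term, doc, corpus):
    """Textbook TF-IDF: (count / doc length) * log(N / document frequency)."""
    doc_freq = sum(term in d for d in corpus)
    if doc_freq == 0 or not doc:
        return 0.0
    tf = doc.count(term) / len(doc)
    return tf * math.log(len(corpus) / doc_freq)


print(ngrams(["the", "food", "was", "good"], 2))
# [('the', 'food'), ('food', 'was'), ('was', 'good')]

corpus = [["good", "food"], ["bad", "food"], ["good", "service"]]
print(tf_idf("food", corpus[0], corpus))     # "food" is in 2 of 3 documents
print(tf_idf("service", corpus[2], corpus))  # "service" is in 1 of 3 documents
```

The rarer term "service" receives the larger weight, which is the intuition behind the IDF factor: words that appear in few documents are more informative about those documents.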

10) Post-lab Assignment

  • Compare TF-IDF with Bag-of-Words.
  • Try a deep model (Embedding + LSTM) and compare.