Housing Pricing Prediction

A model to predict resale prices of HDB flats in Singapore.

Data Analysis, Machine Learning, SciKitLearn, PyTorch, TensorFlow, Pandas, Numpy, Time Series, XGBoost, CatBoost

Problem

HDB resale flat prices in Singapore are shaped by dozens of geospatial and temporal factors. Existing valuation tools were opaque, failed to account for local amenity density, and couldn't forecast future price trends.

Solution

Built a multi-model forecasting pipeline across 16 Jupyter notebooks, engineering 20+ features from MRT proximity, school density, BTO supply, and SORA rates, then benchmarked XGBoost, Random Forest, CatBoost, GNNWR, and LSTM head-to-head.
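As an illustration of one such feature, MRT proximity can be expressed as the great-circle distance from a flat to its nearest station. The sketch below is a minimal example under assumptions: the function names and the coordinates are hypothetical and not taken from the project's notebooks, and the haversine formula is only one reasonable choice of distance.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points in kilometres."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def nearest_station_km(flat, stations):
    """Distance from one flat to the closest station in a list of (lat, lon) pairs."""
    return min(haversine_km(flat[0], flat[1], s[0], s[1]) for s in stations)


# Hypothetical coordinates, used only for illustration.
stations = [(1.3329, 103.7423), (1.3530, 103.9452)]
flat = (1.3400, 103.7500)
dist = nearest_station_km(flat, stations)
print(f"Nearest station: {dist:.2f} km")
```

The same helper could feed a density feature, for example by counting the schools that fall within a fixed radius of each flat.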

Achievement

  • 5 ML models benchmarked on the same dataset
  • 20+ engineered geospatial and temporal features
  • OneMapSG API integration for coordinate resolution
  • Best model achieved <8% MAPE on 2024 held-out data
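For reference, MAPE (mean absolute percentage error) averages the absolute error of each prediction as a fraction of the true price. A minimal sketch, with made-up prices that are not project results:

```python
def mape(actual, predicted):
    """Mean absolute percentage error, returned in percent."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - p) / a) for a, p in zip(actual, predicted))
    return 100.0 * total / len(actual)


# Hypothetical resale prices in SGD, for illustration only.
actual = [500_000, 620_000, 450_000]
predicted = [520_000, 600_000, 459_000]
score = mape(actual, predicted)
print(f"MAPE: {score:.2f}%")
```

Because each error is scaled by the true price, MAPE lets cheaper and more expensive flats contribute on a comparable footing, which suits a market with a wide price range.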

This project develops time series forecasting models for predicting housing prices in Singapore's HDB market using transaction data and geographical information.

Housing Price Prediction Methodology

ML Pipeline

Data Cleaning & Processing

Feature Engineering

Model Building & Comparison

XGBoost — Gradient Boosting

Trained on the working dataset and evaluated on 2024 resale prices.
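Evaluating on 2024 prices implies a split by time rather than a random one, so that the model never sees transactions from the evaluation period. A minimal sketch of such a split, assuming hypothetical field names (`month`, `resale_price`) and made-up rows:

```python
from datetime import date


def temporal_split(rows, holdout_year):
    """Split transactions into rows before the hold-out year and rows within it."""
    train = [r for r in rows if r["month"].year < holdout_year]
    test = [r for r in rows if r["month"].year == holdout_year]
    return train, test


# Hypothetical transactions, for illustration only.
rows = [
    {"month": date(2022, 6, 1), "resale_price": 480_000},
    {"month": date(2023, 3, 1), "resale_price": 510_000},
    {"month": date(2023, 11, 1), "resale_price": 530_000},
    {"month": date(2024, 2, 1), "resale_price": 545_000},
    {"month": date(2024, 7, 1), "resale_price": 560_000},
]
train, test = temporal_split(rows, holdout_year=2024)
print(len(train), "training rows,", len(test), "hold-out rows")
```

A random split would leak future price levels into training and tend to make the error on the held-out year look better than it is.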

Random Forest — Ensemble Learning

Built and validated using the out-of-bag (OOB) error estimate and 10-fold cross-validation.
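The two validation schemes differ in where the unseen rows come from. The sketch below shows only the index mechanics in plain Python, with hypothetical function names. In practice a library would handle this, and nothing here is taken from the project's notebooks.

```python
import random


def bootstrap_with_oob(n_rows, rng):
    """Draw a bootstrap sample; rows never drawn are out-of-bag for that tree."""
    in_bag = [rng.randrange(n_rows) for _ in range(n_rows)]
    oob = sorted(set(range(n_rows)) - set(in_bag))
    return in_bag, oob


def k_fold_indices(n_rows, k, rng):
    """Yield (train, validation) index lists for k shuffled, disjoint folds."""
    idx = list(range(n_rows))
    rng.shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f_no, fold in enumerate(folds) if f_no != i for j in fold]
        yield train, folds[i]


rng = random.Random(0)  # fixed seed so the sketch is reproducible
in_bag, oob = bootstrap_with_oob(1000, rng)
splits = list(k_fold_indices(1000, 10, rng))
print(len(oob), "out-of-bag rows;", len(splits), "folds")
```

Each tree leaves out roughly 37% of the rows (about 1/e), so OOB error comes almost for free during training, while 10-fold cross-validation retrains the model ten times and scores every row exactly once.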

CatBoost — Gradient Boosting

Trained on cleaned and normalized data with categorical feature support.
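One way to prepare such data is to scale the numeric columns and leave categorical columns as raw labels for a model that handles them natively. The sketch below assumes z-score standardization and hypothetical column names and rows. The source does not say which normalization was used.

```python
from statistics import mean, pstdev


def standardize(values):
    """Z-score a list of numbers; a constant column maps to all zeros."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Hypothetical rows, for illustration only.
rows = [
    {"town": "A", "flat_type": "3 ROOM", "floor_area_sqm": 70.0},
    {"town": "B", "flat_type": "4 ROOM", "floor_area_sqm": 90.0},
    {"town": "A", "flat_type": "5 ROOM", "floor_area_sqm": 110.0},
]
numeric_cols = ["floor_area_sqm"]
categorical_cols = ["town", "flat_type"]  # left untouched as string labels

for col in numeric_cols:
    scaled = standardize([r[col] for r in rows])
    for row, z in zip(rows, scaled):
        row[col] = z

print(rows[0])
```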

GNNWR — Geospatial Model

Incorporates latitude and longitude via neural networks for geospatial weighting.

LSTM — Deep Learning

Three-layered neural network with dropout regularization for capturing temporal dependencies.
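A recurrent model consumes fixed-length sequences, so a price series is typically cut into sliding windows before training. The sketch below shows that preparation step only, not the network itself. The window length and the values are assumptions chosen for illustration.

```python
def make_windows(series, window):
    """Turn a series into (input window, next value) pairs for sequence models."""
    inputs, targets = [], []
    for i in range(len(series) - window):
        inputs.append(series[i:i + window])
        targets.append(series[i + window])
    return inputs, targets


# Hypothetical monthly median prices in thousands of SGD.
monthly_median = [500, 505, 510, 508, 515, 520]
X, y = make_windows(monthly_median, window=3)
for features, target in zip(X, y):
    print(features, "->", target)
```

Each target is the value immediately after its window, so the model only ever predicts from past observations.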

XGBoost Model Performance

Random Forest Model Performance

CatBoost Model Performance