7  KNN Imputation

(with faiss package in Python script)

Published

June 26, 2025

#| output: false
library(reticulate)
library(readr)
library(tidyr)
use_condaenv(condaenv = 'dssg_env') 

Note: use_condaenv(condaenv = 'dssg_env') should be run only once per R session. If an error appears in the console, restart the R session and re-run the first code cell above.

7.1 KNN Imputation

The imputation workflow can be run by executing the script stored at:

scripts/week-06-knn-faiss-impute.py

There are two ways to execute this script:

  1. Run the script directly in a terminal using the following bash command (assuming Python is installed on your system):

    python scripts/week-06-knn-faiss-impute.py

  2. Run all the code cells in this Quarto notebook:

  • Note 1: Rendering this notebook does not automatically run the following code block. To produce the imputed dataset, run the code block manually, because the data provided by Mentor Canada is not hosted in the current GitHub repo.
  • Note 2: Make sure the dssg_env conda environment is NOT activated in your terminal when running this report; an already activated dssg_env will conflict with the dssg_env used in this report.
py_run_file("scripts/week-06-knn-faiss-impute.py")

The resulting train and test data frames are stored under ../../dssg-2025-mentor-canada/Data/, relative to your local summer2025-dssg_mentor_canada git repository.

  • Example dataset output path:
'../../dssg-2025-mentor-canada/Data/ohe_unimputed_train.feather'
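
For readers unfamiliar with the technique, the following is a minimal sketch of the general idea behind KNN imputation: each missing value is filled with the average of that column over the most similar rows. It is not the code in scripts/week-06-knn-faiss-impute.py. The function name knn_impute, the parameter k, and the toy data are all hypothetical, and the neighbour search here is a brute-force NumPy computation standing in for a faiss index so that the example runs without extra dependencies. It also assumes every column has at least one observed value.

```python
import numpy as np


def knn_impute(X, k=3):
    """Fill each NaN with the mean of its column over the k nearest rows.

    Illustrative only: brute-force distances, not a faiss index.
    Assumes every column contains at least one observed value.
    """
    X = np.asarray(X, dtype=float)
    missing = np.isnan(X)

    # Temporarily fill gaps with column means so that distances
    # can be computed on complete vectors.
    col_means = np.nanmean(X, axis=0)
    filled = np.where(missing, col_means, X)

    # Pairwise squared Euclidean distances between all rows.
    diff = filled[:, None, :] - filled[None, :, :]
    dist = (diff ** 2).sum(axis=2)
    np.fill_diagonal(dist, np.inf)  # a row is never its own neighbour

    out = X.copy()
    for i in np.where(missing.any(axis=1))[0]:
        order = np.argsort(dist[i])  # nearest rows first
        for j in np.where(missing[i])[0]:
            # Keep only neighbours that actually observed column j.
            donors = [r for r in order if not missing[r, j]][:k]
            out[i, j] = X[donors, j].mean() if donors else col_means[j]
    return out


# Toy data (hypothetical): row 1 is missing its second feature.
X = np.array([
    [1.0, 2.0],
    [1.1, np.nan],
    [5.0, 6.0],
    [0.9, 2.2],
])
imputed = knn_impute(X, k=2)
print(imputed)
```

In this toy example the two rows closest to row 1 are rows 3 and 0, so the missing entry is filled with the mean of 2.2 and 2.0. A faiss-based version would typically replace the pairwise distance matrix with an index lookup, which matters for datasets too large for an n-by-n distance matrix.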