```r
#| output: false
library(reticulate)
library(readr)
library(tidyr)
use_condaenv(condaenv = 'dssg_env')
```
7 KNN Imputation (with the faiss package in a Python script)
Note: `use_condaenv(condaenv = 'dssg_env')` should be run only once per R session, because reticulate can bind to only one Python environment per session. If it raises an error in the console, restart the R session and re-run the first code cell above.
7.1 KNN Imputation
The imputation workflow is run by executing the script `scripts/week-06-knn-faiss-impute.py`.
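For orientation, here is a minimal sketch of what k-nearest-neighbour imputation does. It is not the repository script: a brute-force search over squared Euclidean distances stands in for a faiss index (faiss's flat L2 index also ranks neighbours by squared L2 distance), only fully observed rows act as donors, and each missing value is replaced by the mean of that column over the k nearest donors. The function name `knn_impute`, the donor rule, and the mean aggregation are assumptions for illustration.

```python
def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k nearest complete rows.

    Hypothetical sketch: brute-force search replaces a faiss index, and
    only rows without missing values are used as donors.
    """
    donors = [r for r in rows if None not in r]
    if not donors:
        raise ValueError("need at least one complete row to use as a donor")

    result = []
    for row in rows:
        observed = [j for j, v in enumerate(row) if v is not None]
        if len(observed) == len(row):
            result.append(list(row))  # nothing to impute in this row
            continue
        # Rank donors by squared L2 distance over the columns this row observes
        ranked = sorted(
            donors,
            key=lambda d: sum((row[j] - d[j]) ** 2 for j in observed),
        )
        neighbours = ranked[:k]
        # Keep observed values; replace each missing one with the neighbours' mean
        filled = [
            v if v is not None else sum(d[j] for d in neighbours) / len(neighbours)
            for j, v in enumerate(row)
        ]
        result.append(filled)
    return result
```

For example, `knn_impute([[1.0, 2.0, 3.0], [1.1, 2.1, 3.3], [10.0, 10.0, 10.0], [1.0, 2.0, None]], k=2)` fills the last row's missing value with the mean of 3.0 and 3.3, taken from its two nearest donors. A faiss index speeds up the neighbour search on large data; the ranking logic is the same idea.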
There are two ways to execute this script:

1. Run the script directly in a terminal with the following bash command (assuming Python is installed on your system):

   ```bash
   python scripts/week-06-knn-faiss-impute.py
   ```

2. Run all the code cells in this Quarto notebook:
- Note 1: Rendering this notebook does not execute the code cell below. Because the data provided by Mentor Canada is not hosted in this GitHub repository, you must run that cell manually to produce the imputed dataset.
- Note 2: Make sure the `dssg_env` conda environment is NOT activated in your terminal when running this report. An already activated `dssg_env` will conflict with the `dssg_env` that this report loads through `use_condaenv()`.
```r
py_run_file("scripts/week-06-knn-faiss-impute.py")
```
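To check the condition in Note 2 from a terminal before launching R, a minimal sketch can read the `CONDA_DEFAULT_ENV` environment variable, which `conda activate` sets to the name of the active environment. The helper name `dssg_env_active` is hypothetical and not part of the repository.

```python
import os


def dssg_env_active(environ=None):
    """Return True if the dssg_env conda environment appears to be active.

    Hypothetical helper: relies on CONDA_DEFAULT_ENV, which `conda activate`
    sets to the active environment's name.
    """
    if environ is None:
        environ = os.environ
    return environ.get("CONDA_DEFAULT_ENV") == "dssg_env"
```

If it returns `True`, run `conda deactivate` in that terminal before running this report.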
The resulting train and test data frames are saved to `../../dssg-2025-mentor-canada/Data/`, relative to your local `summer2025-dssg_mentor_canada` Git repository.
- Example output path: `../../dssg-2025-mentor-canada/Data/ohe_unimputed_train.feather`
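A minimal sketch of resolving that output location to an absolute path, assuming POSIX-style paths and that `../../` is taken relative to the repository root as described above. The helper `output_path` and the example repository location are hypothetical.

```python
import posixpath


def output_path(repo_root, filename):
    """Resolve a data file under ../../dssg-2025-mentor-canada/Data/.

    Hypothetical helper: joins the layout onto the repository root and
    collapses the '..' segments lexically (POSIX separators assumed).
    """
    return posixpath.normpath(
        posixpath.join(repo_root, "..", "..", "dssg-2025-mentor-canada", "Data", filename)
    )
```

For a repository cloned at `/home/user/projects/summer2025-dssg_mentor_canada`, `output_path(..., "ohe_unimputed_train.feather")` gives `/home/user/dssg-2025-mentor-canada/Data/ohe_unimputed_train.feather`. The file can then be loaded with `pandas.read_feather`, which requires the `pyarrow` package.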