Prepping the Environment
Introduction
Before we begin building chatbots—whether rule-based, retrieval-based, or generative—we need to establish a clean, reproducible development environment. In machine learning–driven applications, environment consistency is critical. Differences in Python versions, libraries, or tooling can lead to confusing bugs, incompatible dependencies, or inconsistent model behavior.
In this lecture, we will:
- Create and manage a Python virtual environment
- Set up Jupyter Notebook for experimentation and learning
- Install foundational machine learning libraries including PyTorch, scikit-learn, pandas, and NumPy
- Configure VSCode for an efficient ML-focused workflow
By the end of this lesson, every student should have a standardized environment capable of supporting the entire chatbot curriculum.
Project Structure End-State
By the end of this lecture, your project layout should look like this:
root/
├── .venv/
├── notebooks/
├── data/
└── requirements.txt
This layout will help you organize and manage your AI-related projects throughout this module.
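As a rough sketch, the parts of this layout other than .venv/ could also be created from Python with pathlib. The helper name create_layout is an illustrative assumption, and the example runs in a temporary folder so it does not touch your real project.

```python
import tempfile
from pathlib import Path


def create_layout(root: Path) -> None:
    """Create notebooks/, data/, and an empty requirements.txt under root.

    Hypothetical helper for illustration; .venv/ is created separately
    by the venv command.
    """
    for folder in ("notebooks", "data"):
        (root / folder).mkdir(parents=True, exist_ok=True)
    # touch() creates the file if missing and keeps existing contents.
    (root / "requirements.txt").touch(exist_ok=True)


# Demonstrate in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    demo_root = Path(tmp) / "root"
    create_layout(demo_root)
    print(sorted(p.name for p in demo_root.iterdir()))
    # ['data', 'notebooks', 'requirements.txt']
```

To apply it to a real project, you would call create_layout(Path(".")) from the project directory.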
Python Virtual Environment
What is a Virtual Environment
A Python virtual environment (venv) is an isolated workspace that allows you to install Python packages without affecting your system-wide Python installation. Each virtual environment maintains its own versions of libraries and dependencies.
Virtual environments are especially important in machine learning because:
- ML libraries evolve rapidly
- Different projects often require different dependency versions
- Reproducibility is essential for debugging and collaboration
Using a virtual environment ensures that everyone in the class is working with the same tools and avoids the common “it works on my machine” problem.
Creating a Python VENV
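We will standardize on Python 3.11. On macOS (Homebrew) and Ubuntu (apt), the python3 command may point to a different version, so run python3 --version first. If it does not report 3.11, install Python 3.11 and use python3.11 in place of python3 in the command below.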
Ensure you are in your project directory before running these commands.
python3 -m venv .venv
This command creates a folder named .venv/ containing an isolated Python interpreter and package manager.
Management Commands
Activate the Virtual Environment
source .venv/bin/activate
Once activated, your terminal prompt will change to indicate you are inside the virtual environment.
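If the prompt change is easy to miss, the interpreter itself can report whether it is running inside a virtual environment. The snippet below is a minimal sketch using only the standard library; the function name in_virtualenv is an assumption made for illustration.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a virtual environment."""
    # Inside a venv, sys.prefix points at the environment, while
    # sys.base_prefix still points at the Python it was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Python version:", ".".join(map(str, sys.version_info[:3])))
print("Inside a virtual environment:", in_virtualenv())
```

Run it with python after activating; the interpreter path should sit inside .venv/ and the version should start with 3.11.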
Deactivate the Virtual Environment
deactivate
You should deactivate the environment when switching projects or when you are done working.
VSCode Consideration
VSCode must be explicitly told to use the Python interpreter inside your virtual environment.
Steps:
- Open the project folder in VSCode
- Press Cmd + Shift + P (macOS) or Ctrl + Shift + P (Linux)
- Select Python: Select Interpreter
- Choose the interpreter located at:
./.venv/bin/python
If this path isn't among the listed options, activate your .venv and run which python3 in the terminal. Copy the output of the command and enter it as the interpreter path.
This ensures:
- Jupyter uses the correct kernel
- Installed ML libraries are recognized
- Linting and IntelliSense work correctly
Jupyter Notebook
What is Jupyter Notebook
Jupyter Notebook is an interactive computing environment that allows you to combine live code, explanatory text, equations, and visualizations in a single document. Originally developed from the IPython project, Jupyter has become a cornerstone of the data science and machine learning ecosystem.
Jupyter is widely used because it:
- Encourages experimentation and iteration
- Makes data exploration visual and interactive
- Is ideal for teaching, prototyping, and analysis
Throughout this chatbot curriculum, Jupyter Notebooks will be used to explore datasets, prototype models, and understand ML concepts step by step.
Installing Jupyter Notebook into the VENV
Ensure your virtual environment is activated, then install Jupyter:
pip install notebook ipykernel
- notebook provides the classic Jupyter interface
- ipykernel allows the virtual environment to be used as a kernel
Register the virtual environment with Jupyter:
python -m ipykernel install --user --name chatbot-venv --display-name "Chatbot VENV"
Creating a Jupyter Notebook File
Create a notebook within VSCode by creating a file with the .ipynb extension:
touch my_notebook.ipynb
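An .ipynb file is a JSON document. The sketch below writes a minimal notebook with one Markdown cell and one code cell, assuming the nbformat version 4 schema. The helper name write_notebook and the cell contents are illustrative, and the example writes to a temporary folder.

```python
import json
import tempfile
from pathlib import Path

# Minimal nbformat 4 document: one Markdown cell and one code cell.
notebook = {
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# My first notebook"]},
        {
            "cell_type": "code",
            "metadata": {},
            "execution_count": None,  # not run yet
            "outputs": [],
            "source": ["print('hello')"],
        },
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 4,
}


def write_notebook(path: Path) -> None:
    """Serialize the notebook dictionary above to path as JSON (hypothetical helper)."""
    path.write_text(json.dumps(notebook, indent=1), encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "my_notebook.ipynb"
    write_notebook(target)
    loaded = json.loads(target.read_text(encoding="utf-8"))
    print([cell["cell_type"] for cell in loaded["cells"]])  # ['markdown', 'code']
```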
Connecting a Kernel to Your Jupyter Notebook File
Always confirm the kernel in the top-right corner matches your virtual environment. If it does not:
- Click Kernel
- Select Change Kernel
- Choose Chatbot VENV
This prevents dependency mismatches and runtime errors.
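To double-check from inside the notebook itself, a cell can inspect which interpreter the kernel is running. This is a small sketch that assumes the environment folder is named .venv as in this lecture; uses_project_venv is a hypothetical helper name.

```python
import sys
from pathlib import Path


def uses_project_venv(executable: str, venv_dir_name: str = ".venv") -> bool:
    """Return True when the interpreter path contains a folder named venv_dir_name."""
    return venv_dir_name in Path(executable).parts


# In a correctly configured notebook, this path should sit inside .venv/.
print(sys.executable)
print(uses_project_venv(sys.executable))
```

If the second line prints False, the notebook is attached to a different interpreter and the kernel should be changed as described above.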
Notebook Blocks
Markdown Block
Markdown cells are used to write explanations, notes, and documentation. They support headings, lists, code formatting, and images.
Use Markdown blocks to:
- Explain logic
- Document experiments
- Describe results and conclusions
Python Block
Python cells execute live code. These blocks are used to:
- Load data
- Train models
- Run experiments
- Visualize results
Notebook execution is stateful, meaning variables persist across cells. Execution order matters.
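The effect of statefulness can be simulated in plain Python. In this simplified sketch, a shared dictionary stands in for the kernel's memory and each string stands in for a cell. The names kernel_namespace and run_cell are illustrative and are not part of Jupyter.

```python
# A shared namespace stands in for the notebook kernel's memory.
kernel_namespace = {}


def run_cell(source: str) -> None:
    """Execute a cell's source in the shared namespace (simplified stand-in for a kernel)."""
    exec(source, kernel_namespace)


cell_1 = "counter = 0"
cell_2 = "counter += 1"

run_cell(cell_1)
run_cell(cell_2)
run_cell(cell_2)  # re-running a cell changes the state again

print(kernel_namespace["counter"])  # 2
```

Running the second cell twice leaves counter at 2, even though the notebook on screen would still show only two cells. Restarting the kernel and running all cells from the top is the usual way to get back to a known state.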
PyTorch
What is PyTorch
PyTorch is an open-source deep learning framework widely used in research and industry. It provides tools for building neural networks, training models, and performing tensor computations efficiently.
PyTorch is popular because:
- It uses dynamic computation graphs
- It feels “Pythonic” and intuitive
- It is widely adopted in modern AI research
In this curriculum, PyTorch will be used to explore neural networks and generative chatbot concepts.
Installing PyTorch (CPU-only)
With your virtual environment activated:
pip install torch torchvision torchaudio
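On macOS, this installs a CPU build of PyTorch, which is sufficient for learning and experimentation. On Linux, the default wheels bundle CUDA libraries and are a much larger download; for a CPU-only build there, append --index-url https://download.pytorch.org/whl/cpu to the command.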
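A quick way to confirm the installation is to run a small tensor computation. The sketch below also shows the dynamic computation graph in action: PyTorch records operations as they execute and then differentiates through them. The values are arbitrary illustration data.

```python
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())

# requires_grad=True asks PyTorch to record operations on x as they run.
x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)
y = (x ** 2).sum()  # y = x1^2 + x2^2 + x3^2

y.backward()  # walk the recorded graph to compute dy/dx = 2x
print(x.grad)  # tensor([2., 4., 6.])
```

On a CPU-only setup, the CUDA line is expected to print False.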
Scikit-learn
What is scikit-learn
scikit-learn is a machine learning library focused on classical ML algorithms for tasks such as classification, regression, clustering, and similarity search.
It is commonly used for:
- Feature extraction
- Text vectorization
- Similarity-based retrieval
- Model evaluation
Retrieval-based chatbots often rely on scikit-learn techniques.
Installing scikit-learn
pip install scikit-learn
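To make the retrieval idea concrete, here is a minimal sketch that vectorizes a few questions with TF-IDF and picks the one most similar to a user message. The example sentences are made up for illustration and are not course data; a real retrieval chatbot would map the matched question to a stored answer.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Toy knowledge base (made-up examples).
known_questions = [
    "How do I reset my password?",
    "What are your opening hours?",
    "Where is my order?",
]

# Learn the vocabulary and turn each question into a TF-IDF vector.
vectorizer = TfidfVectorizer()
question_vectors = vectorizer.fit_transform(known_questions)

# Vectorize the incoming message with the same vocabulary.
user_message = "I forgot my password"
user_vector = vectorizer.transform([user_message])

# One similarity score per known question; the highest wins.
scores = cosine_similarity(user_vector, question_vectors)[0]
best_index = int(scores.argmax())
print(known_questions[best_index])  # How do I reset my password?
```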
Pandas
What is pandas
pandas is a data manipulation and analysis library built on top of NumPy. It introduces the DataFrame, a powerful structure for working with tabular data.
pandas is commonly used for:
- Loading datasets
- Cleaning and transforming data
- Preparing data for ML models
Installing pandas
pip install pandas
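As a small illustration of loading and cleaning data, the sketch below builds a DataFrame by hand, drops a row with a missing question, and normalizes the text. The column names and rows are invented for the example.

```python
import pandas as pd

# Hand-made sample data with messy whitespace, mixed case, and a missing value.
raw = pd.DataFrame(
    {
        "question": ["  Hello there ", "What TIME is it?", None],
        "intent": ["greeting", "ask_time", "unknown"],
    }
)

clean = (
    raw.dropna(subset=["question"])  # remove rows without a question
    .assign(question=lambda df: df["question"].str.strip().str.lower())
    .reset_index(drop=True)
)
print(clean)
```

The result has two rows, with the questions reduced to "hello there" and "what time is it?".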
NumPy
What is NumPy
NumPy is the foundational numerical computing library in Python. It provides efficient array operations and mathematical functions.
Many ML libraries, including pandas and scikit-learn, are built on top of NumPy, and PyTorch tensors convert to and from NumPy arrays.
Installing NumPy
pip install numpy
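A short check that NumPy works, using arbitrary example vectors: array operations apply element by element without explicit loops, and the same building blocks give the cosine similarity used in retrieval.

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

print(a + b)  # [5. 7. 9.]
print(a * 2)  # [2. 4. 6.]
print(a @ b)  # 32.0

# Cosine similarity: dot product divided by the product of the vector lengths.
cosine = (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))
print(round(float(cosine), 4))  # 0.9746
```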
requirements.txt
To ensure reproducibility, we track dependencies using a requirements.txt file.
Generate it with:
pip freeze > requirements.txt
This file allows anyone to recreate the environment using:
pip install -r requirements.txt
This practice becomes critical when deploying or collaborating on ML projects.
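One way to see whether an environment has drifted from its requirements.txt is to compare the pinned versions against what is installed. The sketch below handles only simple name==version lines, as produced by pip freeze for ordinary packages. The helper names and the sample package names are hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_pins(requirements_text: str) -> dict[str, str]:
    """Map package name -> pinned version for simple 'name==version' lines."""
    pins = {}
    for line in requirements_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and entries that are not pinned
        name, pinned = line.split("==", 1)
        pins[name.strip()] = pinned.strip()
    return pins


def find_mismatches(pins: dict[str, str]) -> dict[str, str]:
    """Return packages that are missing or whose installed version differs from the pin."""
    problems = {}
    for name, pinned in pins.items():
        try:
            installed = version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        if installed != pinned:
            problems[name] = f"installed {installed}, pinned {pinned}"
    return problems


# Placeholder content; a real file would come from pip freeze.
sample = "example-package==1.0.0\n# a comment\n\nanother_pkg==2.3\n"
print(parse_pins(sample))
# {'example-package': '1.0.0', 'another_pkg': '2.3'}
```

In a project, you would pass the text of requirements.txt to parse_pins and then call find_mismatches on the result; an empty dictionary means the environment matches the file.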
VSCode Extensions
Recommended VSCode extensions for this module:
- Python (Microsoft)
- Jupyter (Microsoft)
- Pylance
- Python Data Science Extension Pack
These extensions improve:
- Notebook integration
- Code completion
- Inline documentation
- ML workflow productivity
Conclusion
A well-prepared environment is the foundation of every successful machine learning project. In this lecture, we created an isolated Python 3.11 virtual environment, configured Jupyter Notebook, installed essential ML libraries, and aligned VSCode with our setup. With these tools in place, you are ready to begin building chatbots—from simple rule-based systems to advanced generative models—without friction or configuration issues. In the next lecture, we will begin implementing our first chatbot and explore how conversational logic works in practice.