Python Project Example
Let’s go through an entire Python data science project example.
1 Install Python
Install the latest Python 3.x.x version known by pyenv
pyenv install 32 Create a new python project
cd ~/Desktop
mkdir my_python_project
cd ~/Desktop/my_python_projectWe’ll create a small python script
{.python, include="example_project/01-create_data.py", eval=FALSE}
import pandas as pd
# Create example data
data = {
"Date": [
"2023-01-01",
"2023-01-02",
"2023-01-03",
"2023-01-04",
"2023-01-05",
],
"Product": ["A", "B", "A", "C", "B"],
"Sales": [100, 150, 120, 80, 200],
"Profit": [30, 40, 25, 10, 50],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Save the DataFrame to a CSV file
df.to_csv("sales_data.csv", index=False)
# Group by Product and calculate total sales and profit
product_group = df.groupby("Product").agg(
{"Sales": "sum", "Profit": "sum"}
)
# Save the product group to a CSV file
product_group.to_csv("product_group.csv", index=False)
print(
"Data saved to 'sales_data.csv' and\n"
"product group data saved to 'product_group.csv'"
)Save this code to 01-create_data.py
my_python_project % ls
01-create_data.py3 Switch to the proper python version
Check all the installed versions
my_python_project % pyenv versions
system
3.9.11
3.9.18
3.10.3
3.10.4
3.11.0rc1
* 3.11.5 (set by /Users/danielchen/.pyenv/version)
3.12.0Switch to the version of interest.
pyenv shell 3.12.0my_python_project % pyenv versions
system
3.9.11
3.9.18
3.10.3
3.10.4
3.11.0rc1
3.11.5
* 3.12.0 (set by PYENV_VERSION environment variable)Re-confirm the python version
my_python_project % pyenv which python
/Users/danielchen/.pyenv/versions/3.12.0/bin/pythonmy_python_project % python --version
Python 3.12.04 An empty slate
Currently, our python environment is the default base environment without any extra packages.
If we try to run the script, it will fail because the pandas module is not installed.
my_python_project % python 01-create_data.py
Traceback (most recent call last):
File "/Users/danielchen/Desktop/my_python_project/01-create_data.py", line 1, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'5 Create a venv
The Iron Law of Python Management states that every project should have their own virtual environment. Before we start installing packages (e.g., pandas), we need to create and activate a virtual environment first.
We will use the built-in venv python module to create a venv. The venv will be saved into a folder called venv in the current directory.
my_python_project % python -m venv venvHere’s the folder structure after creating a venv
my_python_project % tree -L 4 .
.
├── 01-create_data.py
└── venv
├── bin
│ ├── Activate.ps1
│ ├── activate
│ ├── activate.csh
│ ├── activate.fish
│ ├── pip
│ ├── pip3
│ ├── pip3.12
│ ├── python -> /Users/danielchen/.pyenv/versions/3.12.0/bin/python
│ ├── python3 -> python
│ └── python3.12 -> python
├── include
│ └── python3.12
├── lib
│ └── python3.12
│ └── site-packages
└── pyvenv.cfgYou can see here that the python, python3, and python3.12 are all pointing to the same python we installed. This is why you want to keep the base version environment clean.
6 Activate venv
The venv/bin/ directory has a few activate scripts. These scripts are to activate for different operating system. Currently we are using a Mac/*nix environment.
source venv/bin/activatevenv/bin/Activate.ps1You will notice your prompt change and the name of the venv will be prepended to the beginning of the prompt:
Before:
my_python_project %After:
(venv) my_python_project %Now you are ready to install packages and run your code!
7 Install packages into venv
We’re finally able to install packages. Our current project only needs pandas
pip install pandas8 Run your code
Our code now runs!
my_python_project % python 01-create_data.py
Data saved to 'sales_data.csv' and
product group data saved to 'product_group.csv'9 Rinse and Repeat
We’ll create a new script 02-viz_pandas.py with the following bits of code:
import pandas as pd
import matplotlib.pyplot as plt
# Read product group data from CSV
product_group = pd.read_csv('product_group.csv')
# Bar chart for total sales by product
plt.figure(figsize=(8, 6)) # Set the figure size
product_group['Sales'].plot(kind='bar')
plt.title('Total Sales by Product')
plt.xlabel('Product')
plt.ylabel('Total Sales')
plt.savefig('sales_by_product.png') # Save the figure as a PNG
plt.show()This code will load a dataset from our 01 script, create, save, and show a figure.
my_python_project % ls
01-create_data.py product_group.csv venv
02-viz_pandas.py sales_data.csvNow let’s run the script.
(venv) my_python_project % python 02-viz_pandas.py
Traceback (most recent call last):
File "/Users/danielchen/Desktop/my_python_project/02-viz_pandas.py", line 2, in <module>
import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'We need to install matplotlib in our environment.
pip install matplotlibAnd now things work!
(venv) my_python_project % python 02-viz_pandas.py
(venv) my_python_project % ls
01-create_data.py product_group.csv sales_data.csv
02-viz_pandas.py sales_by_product.png venv10 One more time
In a new 03-viz_mpl.py file:
import pandas as pd
import matplotlib.pyplot as plt
# Scatter plot of Sales vs. Profit
df = pd.read_csv('sales_data.csv')
plt.figure(figsize=(8, 6)) # Set the figure size
plt.scatter(df['Sales'], df['Profit'])
plt.title('Scatter Plot of Sales vs. Profit')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.savefig('scatter_plot.png') # Save the figure as a PNG
plt.show()Voilà!
(venv) my_python_project % python 03-viz_mpl.py
(venv) my_python_project % ls
01-create_data.py 03-viz_mpl.py sales_by_product.png scatter_plot.png
02-viz_pandas.py product_group.csv sales_data.csv venv11 Save requirements.txt
The pip freeze command will show you all the packages (and dependencies) you have installed in the current virtual environment.
(venv) my_python_project % pip freeze
contourpy==1.1.1
cycler==0.12.0
fonttools==4.43.0
kiwisolver==1.4.5
matplotlib==3.8.0
numpy==1.26.0
packaging==23.2
pandas==2.1.1
Pillow==10.0.1
pyparsing==3.1.1
python-dateutil==2.8.2
pytz==2023.3.post1
six==1.16.0
tzdata==2023.3We can save the contents of this file out to a requirements.txt file.
pip freeze > requirements.txtNow you have full python project!
(venv) my_python_project % ls
01-create_data.py product_group.csv sales_data.csv
02-viz_pandas.py requirements.txt scatter_plot.png
03-viz_mpl.py sales_by_product.png venvYou will need to manually run pip freeze > requirements.txt when you want to update your requirements.txt file.
12 Deactivate your virtual environment
When you want to leave your project environment you can run deactivate in the terminal.
deactivateThis will remove the venv name that was original prepended to your terminal:
In the venv:
(venv) my_python_project % deactivateDeactivated:
my_python_project %And we’re back to our original package environment
my_python_project % python 01-create_data.py
Traceback (most recent call last):
File "/Users/danielchen/Desktop/my_python_project/01-create_data.py", line 1, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'14 Conclusion
The general workflow for working with python and python projects:
- Install Python version you want
- Switch into Python
- Create new python project directory
- Create
venv - Activate
venv - Install packages into
venv - Run your code
- Rinse and repeat
There are other tools that can be installed to streamline the process. But this should be the bare minimum python project setup you use going forward.