Python Project Example
Let’s go through an entire Python data science project example.
1 Install Python
Install the latest Python 3.x.x version known by pyenv
pyenv install 3
2 Create a new python project
cd ~/Desktop
mkdir my_python_project
cd ~/Desktop/my_python_project
We’ll create a small python script
import pandas as pd
# Create example data
data = {
    "Date": [
        "2023-01-01",
        "2023-01-02",
        "2023-01-03",
        "2023-01-04",
        "2023-01-05",
    ],
    "Product": ["A", "B", "A", "C", "B"],
    "Sales": [100, 150, 120, 80, 200],
    "Profit": [30, 40, 25, 10, 50],
}
# Create a DataFrame
df = pd.DataFrame(data)
# Save the DataFrame to a CSV file
df.to_csv("sales_data.csv", index=False)
# Group by Product and calculate total sales and profit
product_group = df.groupby("Product").agg(
    {"Sales": "sum", "Profit": "sum"}
)
# Save the product group to a CSV file
product_group.to_csv("product_group.csv", index=False)
print(
    "Data saved to 'sales_data.csv' and\n"
    "product group data saved to 'product_group.csv'"
)
Save this code to 01-create_data.py
my_python_project % ls
01-create_data.py
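One detail of this script is worth knowing: groupby("Product") makes Product the index of the grouped result, so writing it with index=False leaves only the Sales and Profit columns in product_group.csv. The sketch below reproduces this on the same data in memory and shows one possible variation, as_index=False, that keeps Product as a regular column. The variation is an assumption for illustration, not a change to the script above.

```python
import io

import pandas as pd

# Same example data as 01-create_data.py (dates omitted for brevity)
df = pd.DataFrame(
    {
        "Product": ["A", "B", "A", "C", "B"],
        "Sales": [100, 150, 120, 80, 200],
        "Profit": [30, 40, 25, 10, 50],
    }
)

# As in the script: Product becomes the index of the grouped result
grouped = df.groupby("Product").agg({"Sales": "sum", "Profit": "sum"})

# index=False drops the index, so the Product labels are not written
buffer = io.StringIO()
grouped.to_csv(buffer, index=False)
header_without_product = buffer.getvalue().splitlines()[0]

# Variation (an assumption, not the original script): keep Product as a column
grouped_col = df.groupby("Product", as_index=False).agg(
    {"Sales": "sum", "Profit": "sum"}
)
buffer = io.StringIO()
grouped_col.to_csv(buffer, index=False)
header_with_product = buffer.getvalue().splitlines()[0]

print(header_without_product)  # Sales,Profit
print(header_with_product)     # Product,Sales,Profit
```

Either form works for the rest of this walkthrough; the difference only matters if you want the product labels available when reading the file back.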
3 Switch to the proper python version
Check all the installed versions
my_python_project % pyenv versions
system
3.9.11
3.9.18
3.10.3
3.10.4
3.11.0rc1
* 3.11.5 (set by /Users/danielchen/.pyenv/version)
3.12.0
Switch to the version of interest.
pyenv shell 3.12.0
my_python_project % pyenv versions
system
3.9.11
3.9.18
3.10.3
3.10.4
3.11.0rc1
3.11.5
* 3.12.0 (set by PYENV_VERSION environment variable)
Re-confirm the python version
my_python_project % pyenv which python
/Users/danielchen/.pyenv/versions/3.12.0/bin/python
my_python_project % python --version
Python 3.12.0
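The same check can be made from inside Python, which is handy when a script should refuse to run on the wrong interpreter. This is a minimal sketch using only the standard library; the helper name meets_minimum and the 3.8 threshold are hypothetical choices for illustration.

```python
import sys


def meets_minimum(minimum: tuple) -> bool:
    """Return True if the running interpreter is at least `minimum`."""
    return sys.version_info >= minimum


# Path of the interpreter actually running this code
print(sys.executable)
# Version as a (major, minor, micro) tuple
print(tuple(sys.version_info[:3]))

# Hypothetical project requirement: Python 3.8 or newer
if not meets_minimum((3, 8)):
    raise SystemExit("This project expects Python 3.8 or newer")
```

With pyenv shell 3.12.0 active, the printed path should sit under the pyenv versions directory shown above.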
4 An empty slate
Currently, our python environment is the default base environment without any extra packages.
If we try to run the script, it will fail because the pandas module is not installed.
my_python_project % python 01-create_data.py
Traceback (most recent call last):
File "/Users/danielchen/Desktop/my_python_project/01-create_data.py", line 1, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
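If you would rather check for a module without triggering a traceback, the standard library can look it up for you. The following is a small sketch; is_installed is a hypothetical helper name, and it is only meant for top-level module names like the ones used here.

```python
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the current interpreter can find `module_name`."""
    return importlib.util.find_spec(module_name) is not None


# pandas is what our script needs; json ships with Python, for contrast
for name in ("pandas", "json"):
    status = "available" if is_installed(name) else "missing"
    print(f"{name}: {status}")
```

In the clean base environment described above, this should report pandas as missing and json as available.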
5 Create a venv
The Iron Law of Python Management states that every project should have its own virtual environment. Before we start installing packages (e.g., pandas), we need to create and activate a virtual environment.
We will use the built-in venv Python module to create a venv. The venv will be saved into a folder called venv in the current directory.
my_python_project % python -m venv venv
Here’s the folder structure after creating a venv
my_python_project % tree -L 4 .
.
├── 01-create_data.py
└── venv
    ├── bin
    │   ├── Activate.ps1
    │   ├── activate
    │   ├── activate.csh
    │   ├── activate.fish
    │   ├── pip
    │   ├── pip3
    │   ├── pip3.12
    │   ├── python -> /Users/danielchen/.pyenv/versions/3.12.0/bin/python
    │   ├── python3 -> python
    │   └── python3.12 -> python
    ├── include
    │   └── python3.12
    ├── lib
    │   └── python3.12
    │       └── site-packages
    └── pyvenv.cfg
You can see here that python, python3, and python3.12 all point to the same Python we installed with pyenv. This is why you want to keep the base environment clean.
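The link back to the base interpreter is also recorded in pyvenv.cfg. As a rough illustration, the sketch below builds a throwaway venv in a temporary directory with the standard venv module and reads that file; read_pyvenv_cfg is a hypothetical helper, and pip is skipped only to keep the example fast.

```python
import tempfile
import venv
from pathlib import Path


def read_pyvenv_cfg(env_dir: Path) -> dict:
    """Parse the simple `key = value` lines of a venv's pyvenv.cfg."""
    settings = {}
    for line in (env_dir / "pyvenv.cfg").read_text().splitlines():
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


# Build a throwaway venv (without pip, to keep it fast) and inspect it
with tempfile.TemporaryDirectory() as tmp:
    env_dir = Path(tmp) / "venv"
    venv.create(env_dir, with_pip=False)
    cfg = read_pyvenv_cfg(env_dir)

# Directory of the interpreter the venv was created from
print(cfg["home"])
# Whether the venv can see packages installed in the base environment
print(cfg["include-system-site-packages"])
```

By default the second value is false, which is what keeps each project's packages separate from the base installation.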
6 Activate venv
The venv/bin/ directory has a few activate scripts. These scripts activate the venv under different shells and operating systems. Currently we are using a Mac/*nix environment.
source venv/bin/activate
For PowerShell:
venv/bin/Activate.ps1
You will notice your prompt change, and the name of the venv will be prepended to the beginning of the prompt:
Before:
my_python_project %
After:
(venv) my_python_project %
Now you are ready to install packages and run your code!
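Besides the prompt, Python itself can tell you whether it is running from a virtual environment. This is a minimal sketch relying on the standard sys.prefix and sys.base_prefix attributes; in_virtualenv is a hypothetical helper name. Note that it reports on the interpreter being used, so it is also true if you call venv/bin/python directly without activating.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when running inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the venv folder while
    # sys.base_prefix still points at the base installation.
    return sys.prefix != sys.base_prefix


print("venv active:", in_virtualenv())
print("prefix:", sys.prefix)
```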
7 Install packages into venv
We’re finally able to install packages. Our current project only needs pandas
pip install pandas
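To confirm that a package really landed in the active environment, you can ask the standard importlib.metadata module for its version. The helper name installed_version below is hypothetical, and this is only a sketch of one way to check.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


print("pandas:", installed_version("pandas"))
```

Run with the venv active, this should print a version number; run from the base environment, it should print None.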
8 Run your code
Our code now runs!
(venv) my_python_project % python 01-create_data.py
Data saved to 'sales_data.csv' and
product group data saved to 'product_group.csv'
9 Rinse and Repeat
We’ll create a new script 02-viz_pandas.py with the following bits of code:
import pandas as pd
import matplotlib.pyplot as plt
# Read product group data from CSV
product_group = pd.read_csv('product_group.csv')
# Bar chart for total sales by product
plt.figure(figsize=(8, 6))  # Set the figure size
product_group['Sales'].plot(kind='bar')
plt.title('Total Sales by Product')
plt.xlabel('Product')
plt.ylabel('Total Sales')
plt.savefig('sales_by_product.png')  # Save the figure as a PNG
plt.show()
This code loads a dataset written by our 01 script, then creates, saves, and shows a figure.
my_python_project % ls
01-create_data.py product_group.csv venv
02-viz_pandas.py sales_data.csv
Now let’s run the script.
(venv) my_python_project % python 02-viz_pandas.py
Traceback (most recent call last):
File "/Users/danielchen/Desktop/my_python_project/02-viz_pandas.py", line 2, in <module>
import matplotlib.pyplot as plt
ModuleNotFoundError: No module named 'matplotlib'
We need to install matplotlib in our environment.
pip install matplotlib
And now things work!
(venv) my_python_project % python 02-viz_pandas.py
(venv) my_python_project % ls
01-create_data.py product_group.csv sales_data.csv
02-viz_pandas.py sales_by_product.png venv
10 One more time
In a new 03-viz_mpl.py file:
import pandas as pd
import matplotlib.pyplot as plt
# Scatter plot of Sales vs. Profit
df = pd.read_csv('sales_data.csv')
plt.figure(figsize=(8, 6))  # Set the figure size
plt.scatter(df['Sales'], df['Profit'])
plt.title('Scatter Plot of Sales vs. Profit')
plt.xlabel('Sales')
plt.ylabel('Profit')
plt.savefig('scatter_plot.png')  # Save the figure as a PNG
plt.show()
Voilà!
(venv) my_python_project % python 03-viz_mpl.py
(venv) my_python_project % ls
01-create_data.py 03-viz_mpl.py sales_by_product.png scatter_plot.png
02-viz_pandas.py product_group.csv sales_data.csv venv
11 Save requirements.txt
The pip freeze command will show you all the packages (and dependencies) you have installed in the current virtual environment.
(venv) my_python_project % pip freeze
contourpy==1.1.1
cycler==0.12.0
fonttools==4.43.0
kiwisolver==1.4.5
matplotlib==3.8.0
numpy==1.26.0
packaging==23.2
pandas==2.1.1
Pillow==10.0.1
pyparsing==3.1.1
python-dateutil==2.8.2
pytz==2023.3.post1
six==1.16.0
tzdata==2023.3
We can save this output to a requirements.txt file.
pip freeze > requirements.txt
Now you have a full Python project!
(venv) my_python_project % ls
01-create_data.py product_group.csv sales_data.csv
02-viz_pandas.py requirements.txt scatter_plot.png
03-viz_mpl.py sales_by_product.png venv
You will need to manually run pip freeze > requirements.txt whenever you want to update your requirements.txt file.
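Because requirements.txt is only updated when you re-run pip freeze, it can drift from what is actually installed. As a rough sketch, the code below parses name==version lines and compares them with the current environment using importlib.metadata; parse_pins and find_drift are hypothetical helper names, and the parser deliberately handles only the simple pinned format that pip freeze produced above.

```python
from importlib import metadata


def parse_pins(text: str) -> dict:
    """Map package name -> pinned version for `name==version` lines."""
    pins = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "==" not in line:
            continue  # skip blanks, comments, and unpinned entries
        name, _, version = line.partition("==")
        pins[name.strip()] = version.strip()
    return pins


def find_drift(pins: dict) -> dict:
    """Return {name: (pinned, installed)} for entries that differ."""
    drift = {}
    for name, pinned in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # pinned but not installed here
        if installed != pinned:
            drift[name] = (pinned, installed)
    return drift


# A few lines taken from the pip freeze output above
sample = """\
matplotlib==3.8.0
numpy==1.26.0
pandas==2.1.1
"""
pins = parse_pins(sample)
print(pins)
print(find_drift(pins))
```

An empty result from find_drift suggests the file still matches the environment; anything else is a hint that it is time to run pip freeze > requirements.txt again.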
12 Deactivate your virtual environment
When you want to leave your project environment you can run deactivate in the terminal.
deactivate
This will remove the venv name that was originally prepended to your prompt:
In the venv:
(venv) my_python_project % deactivate
Deactivated:
my_python_project %
And we’re back to our original package environment
my_python_project % python 01-create_data.py
Traceback (most recent call last):
File "/Users/danielchen/Desktop/my_python_project/01-create_data.py", line 1, in <module>
import pandas as pd
ModuleNotFoundError: No module named 'pandas'
14 Conclusion
The general workflow for working with Python and Python projects:
- Install the Python version you want
- Switch to that Python version
- Create a new Python project directory
- Create a venv
- Activate the venv
- Install packages into the venv
- Run your code
- Rinse and repeat
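The steps from creating the project directory onward can also be scripted. The following is a minimal sketch using the standard venv and subprocess modules; the helper names venv_python and bootstrap are hypothetical. Instead of activating the venv, it calls the venv's own interpreter directly, which has the same effect for installing packages.

```python
import subprocess
import sys
import venv
from pathlib import Path


def venv_python(env_dir: Path) -> Path:
    """Return the interpreter path inside a venv (layout differs on Windows)."""
    if sys.platform == "win32":
        return env_dir / "Scripts" / "python.exe"
    return env_dir / "bin" / "python"


def bootstrap(project_dir: Path, packages: list, with_pip: bool = True) -> Path:
    """Create project_dir/venv and install `packages` into it."""
    project_dir.mkdir(parents=True, exist_ok=True)
    env_dir = project_dir / "venv"
    venv.create(env_dir, with_pip=with_pip)
    python = venv_python(env_dir)
    if packages:
        # Running pip through the venv's interpreter installs into that venv
        subprocess.run(
            [str(python), "-m", "pip", "install", *packages], check=True
        )
    return python


# Example usage (downloads packages, so it is left commented out):
# python = bootstrap(Path("my_python_project"), ["pandas", "matplotlib"])
# subprocess.run([str(python), "01-create_data.py"], check=True)
```

This does not replace pyenv for choosing the Python version; the venv is built from whichever interpreter runs the script.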
There are other tools that can be installed to streamline the process, but this should be the bare minimum Python project setup you use going forward.