Exploring the Penguins Dataset with Streamlit
In this chapter, we'll explore the [Penguins dataset](https://github.com/dataprofessor/data/blob/master/penguins_cleaned.csv) and use it as the basis for an interactive ML application built with Streamlit.
This chapter covers:
- Loading, preprocessing, and preparing the dataset for visualization
- Using the Streamlit expander to display:
    - Dataset information
    - Machine learning model features (X)
    - Prediction target (species) variable (y)
- Creating interactive scatter plots to identify patterns and relationships
Download Dataset
Download the dataset locally:
mkdir -p "$TUTORIAL_HOME/data"
curl -sSL \
-o "$TUTORIAL_HOME/data/penguins_cleaned.csv" \
https://raw.githubusercontent.com/dataprofessor/data/refs/heads/master/penguins_cleaned.csv
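To confirm that the download produced a usable file, a quick header check can help. The sketch below is only an aid, not part of the tutorial's application: the helper name `check_penguins_csv` is hypothetical, and the expected column names are an assumption about the published CSV, so adjust them if your copy differs.

```python
import csv
from pathlib import Path

# Assumed column names of penguins_cleaned.csv; adjust if your copy differs.
EXPECTED_COLUMNS = [
    "species",
    "island",
    "bill_length_mm",
    "bill_depth_mm",
    "flipper_length_mm",
    "body_mass_g",
    "sex",
]


def check_penguins_csv(path):
    """Return the number of data rows, raising if the header looks wrong."""
    with Path(path).open(newline="") as handle:
        reader = csv.reader(handle)
        header = next(reader, None)
        if header != EXPECTED_COLUMNS:
            raise ValueError(f"Unexpected header in {path}: {header}")
        # Count only non-empty rows so a trailing blank line is ignored.
        return sum(1 for row in reader if row)
```

Calling `check_penguins_csv("data/penguins_cleaned.csv")` from `$TUTORIAL_HOME` should return the row count if the file downloaded correctly.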
Displaying the Data
Edit and update $TUTORIAL_HOME/streamlit_app.py so that it loads the dataset and displays it.
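A minimal sketch of such a `streamlit_app.py`, under a few assumptions: the app is started from `$TUTORIAL_HOME`, so the relative path resolves to the downloaded file; pandas is installed; and the page title is a placeholder. The helper name `load_penguins` is hypothetical.

```python
from pathlib import Path

import pandas as pd

# Relative path assumes the app is started from $TUTORIAL_HOME.
DATA_PATH = Path("data/penguins_cleaned.csv")


def load_penguins(path=DATA_PATH):
    """Read the penguins CSV into a pandas DataFrame."""
    return pd.read_csv(path)


def main():
    # Imported here so load_penguins stays usable without Streamlit installed.
    import streamlit as st

    st.title("Penguins Explorer")  # placeholder title
    df = load_penguins()
    st.write("Raw data")
    st.dataframe(df)  # renders an interactive, scrollable table


# Render the page only when the dataset is present (see the download step).
if DATA_PATH.exists():
    main()
else:
    print(f"{DATA_PATH} not found; download the dataset first.")
```

Running `streamlit run streamlit_app.py` from `$TUTORIAL_HOME` should then show the full table in the browser.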
Application Overview
As part of this machine learning application, we will build a simple classification model that predicts penguin species (y) from input variables (X). Using Streamlit's interactive widgets, we'll display these variables to make the application user-friendly and intuitive.
This classification model will help us categorize penguins into their respective species based on their physical characteristics. The input variables and target variable will be presented through Streamlit's interface, allowing users to easily interact with and understand the prediction process.
Adding Our First Widget
Let us add our first Streamlit widget, an expander, so that the data frame can be expanded and collapsed.
Edit and update $TUTORIAL_HOME/streamlit_app.py to wrap the data frame in an expander.
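One way this step might look, as a sketch rather than a definitive listing. It uses `st.expander` as a context manager, so everything inside the `with` block stays hidden until the user opens it. The expander label, the page title, and the `dataset_summary` helper are illustrative assumptions, as is starting the app from `$TUTORIAL_HOME`.

```python
from pathlib import Path

import pandas as pd

# Relative path assumes the app is started from $TUTORIAL_HOME.
DATA_PATH = Path("data/penguins_cleaned.csv")


def dataset_summary(df):
    """Return a one-line description of the data frame's shape."""
    rows, cols = df.shape
    return f"{rows} rows x {cols} columns"


def main():
    # Imported here so dataset_summary can be exercised without Streamlit.
    import streamlit as st

    df = pd.read_csv(DATA_PATH)
    st.title("Penguins Explorer")  # placeholder title

    # Content inside the `with` block is collapsed until the user expands it.
    with st.expander("Data"):
        st.caption(dataset_summary(df))
        st.dataframe(df)


# Render the page only when the dataset is present (see the download step).
if DATA_PATH.exists():
    main()
else:
    print(f"{DATA_PATH} not found; download the dataset first.")
```

Passing `expanded=True` to `st.expander` would show the table by default; leaving it out keeps the page compact until the reader asks for the data.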
Displaying the Variables
Let us create and display the input features (X) and the target (y).
Edit and update $TUTORIAL_HOME/streamlit_app.py to add both to the page.
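The split into features and target can be sketched as follows. This assumes that `species` is the target column, as the chapter states, and that every other column is treated as an input feature. The helper name `split_features_target` and the widget labels are hypothetical.

```python
from pathlib import Path

import pandas as pd

# Relative path assumes the app is started from $TUTORIAL_HOME.
DATA_PATH = Path("data/penguins_cleaned.csv")
TARGET = "species"  # prediction target (y)


def split_features_target(df, target=TARGET):
    """Return (X, y): all columns except the target, and the target column."""
    X = df.drop(columns=[target])  # drop returns a copy; df is not modified
    y = df[target]
    return X, y


def main():
    # Imported here so split_features_target works without Streamlit.
    import streamlit as st

    df = pd.read_csv(DATA_PATH)
    X, y = split_features_target(df)

    st.title("Penguins Explorer")  # placeholder title
    with st.expander("Data"):
        st.write("Raw data")
        st.dataframe(df)
        st.write("X (input features)")
        st.dataframe(X)
        st.write("y (target)")
        st.dataframe(y)


# Render the page only when the dataset is present (see the download step).
if DATA_PATH.exists():
    main()
else:
    print(f"{DATA_PATH} not found; download the dataset first.")
```

Keeping the split in a small function makes it easy to reuse the same X and y later when a model is trained.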
Data Visualization
Let us visualize the Penguins data using a scatter plot.
Edit and update $TUTORIAL_HOME/streamlit_app.py to add the chart.
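A possible version of the chart, offered as a sketch under stated assumptions: it relies on `st.scatter_chart`, which is available in recent Streamlit releases, and the choice of `bill_length_mm` against `body_mass_g`, colored by `species`, is illustrative rather than prescribed. The helper name `scatter_frame` is hypothetical.

```python
from pathlib import Path

import pandas as pd

# Relative path assumes the app is started from $TUTORIAL_HOME.
DATA_PATH = Path("data/penguins_cleaned.csv")

# Illustrative axis choices; any two numeric columns would work.
X_AXIS = "bill_length_mm"
Y_AXIS = "body_mass_g"
COLOR_BY = "species"


def scatter_frame(df, x=X_AXIS, y=Y_AXIS, color=COLOR_BY):
    """Keep only the plotted columns and drop rows with missing values."""
    return df[[x, y, color]].dropna()


def main():
    # Imported here so scatter_frame can be exercised without Streamlit.
    import streamlit as st

    df = pd.read_csv(DATA_PATH)
    st.title("Penguins Explorer")  # placeholder title

    with st.expander("Data visualization"):
        # One point per penguin, colored by species.
        st.scatter_chart(scatter_frame(df), x=X_AXIS, y=Y_AXIS, color=COLOR_BY)


# Render the page only when the dataset is present (see the download step).
if DATA_PATH.exists():
    main()
else:
    print(f"{DATA_PATH} not found; download the dataset first.")
```

Coloring by species makes it easier to see whether the chosen measurements separate the classes, which hints at how useful they will be as model features.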
Now that we have our variables and target displayed for reference, let's move to the next chapter where we'll explore Streamlit's interactive features.