Data Preparation
Before training our machine learning model, we need to convert categorical variables into a numerical format using One-Hot Encoding. For our penguin dataset, we'll use the pandas get_dummies() function to encode:
Features (X):
- island - the penguin's location (categorical)
- sex - the penguin's sex (categorical)
Target (y):
- species - the type of penguin (our prediction target)
📝 One-Hot Encoding converts a categorical variable into binary (0 or 1) indicator columns, one per category. For example:
# Original: island = ['Torgersen', 'Biscoe']
# After encoding:
# island_Torgersen = [1, 0]
# island_Biscoe = [0, 1]
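A minimal sketch of this behaviour, running get_dummies() on the two-value example above. Two details worth noting: pandas orders the new columns alphabetically, so island_Biscoe comes before island_Torgersen, and dtype=int is passed because pandas 2.0 and later return True/False columns by default.

```python
import pandas as pd

# The two-row example from above: one island value per penguin
island = pd.Series(["Torgersen", "Biscoe"], name="island")

# One 0/1 column per distinct value, named "<prefix>_<value>";
# dtype=int yields 0/1 instead of the default True/False booleans
encoded = pd.get_dummies(island, prefix="island", dtype=int)
print(encoded)
```

Each row has exactly one 1: the first row (Torgersen) gets island_Torgersen = 1, and the second row (Biscoe) gets island_Biscoe = 1, matching the column values shown in the comments above.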
Encoding Features and Target
Edit $TUTORIAL_HOME/streamlit_app.py and update it with the following code:
streamlit_app.py (142-line code listing)
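The encoding step itself can be sketched in a few lines of pandas. This is an illustrative sketch, not the tutorial's listing: the DataFrame df, its four sample rows, and the numeric bill_length_mm column are hypothetical stand-ins for the real penguins data. Following the description above, it one-hot encodes the island and sex features and the species target with get_dummies(), while leaving numeric columns untouched.

```python
import pandas as pd

# Hypothetical sample rows standing in for the penguins dataset;
# the real app loads its own data, which may contain more columns
df = pd.DataFrame({
    "species": ["Adelie", "Gentoo", "Chinstrap", "Adelie"],
    "island": ["Torgersen", "Biscoe", "Dream", "Biscoe"],
    "bill_length_mm": [39.1, 46.1, 46.5, 38.8],
    "sex": ["male", "female", "female", "male"],
})

# Separate the prediction target from the input features
X_raw = df.drop(columns="species")
y_raw = df["species"]

# One-hot encode only the categorical feature columns;
# numeric columns such as bill_length_mm pass through unchanged
X = pd.get_dummies(X_raw, columns=["island", "sex"], dtype=int)

# One-hot encode the target too: one 0/1 column per species
y = pd.get_dummies(y_raw, prefix="species", dtype=int)

print(X.columns.tolist())
print(y.columns.tolist())
```

Passing columns=[...] restricts encoding to the listed columns, which is why the numeric feature survives as-is. If the classifier you train later expects a single label column rather than one-hot target columns, y_raw can be kept as the target instead; scikit-learn classifiers, for example, accept string class labels directly.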
With the penguin dataset preprocessed (categorical columns encoded and features selected), we can move on to training our model and calculating species prediction probabilities. These probabilities are what we will display in the interactive Streamlit visualizations.