Breaking down SDSU AI Club’s introduction to machine learning workshop conceptually.
Hey there! 👋
Glad you could make it to our meeting today! Today’s meeting (September 6th) had the largest member turnout we’ve seen in a while, and our club officers were thrilled to see so many new faces! This will be a fun semester, and I’m excited for what’s to come!
I’m only loosely covering the topics discussed in our meeting, since there’s a lot I could say and I’d end up writing on and on forever, so let’s get started with what we discussed!
What is artificial intelligence (and machine learning)? 💭🤔
We gave an introduction to the realm of artificial intelligence, mentioning natural language processing, deep learning, neural networks, and machine learning (ML). Machine learning is seen as a sub-area of artificial intelligence and is the study of computer algorithms that improve automatically through experience.
In many areas of machine learning and data science, we work with sample data, known as training data, which allows the model to make predictions and decisions without being explicitly programmed to do so.
Workshop 0: The Introduction to Machine Learning ✨
We discussed a bunch of ML concepts that might be unfamiliar if you haven’t taken a data science or AI/ML course, and I’m aware that this is many of our new members’ first time seeing them, so I decided to write an overview of the ideas presented in the introductory workshop.
Necessary Libraries: numpy and matplotlib ✅
Be sure you import these for the calculations and plotting we’ll do later!
import numpy as np
import matplotlib.pyplot as plt
Sample data: Features (x_vals) and Labels (y_vals)
Create two numpy arrays that represent the features and labels of our dataset.
x_vals = np.array([1, 2, 3, 4, 5, 6, 7, 8])
y_vals = np.array([2, 3, 5, 7, 8, 13, 15, 17])
Model: Linear Regression 📈
We gave a simple example of a machine learning technique, linear regression, that finds the relationship between features and labels. We had two lists in our demo; Logan compared them to the square footage of a house (the feature) and the corresponding house price (the label). We trained a model that draws the line of best fit through the given data, algebraically given as:
y = mx + b
The ML version of the equation is similar, but the names of the parameters are slightly different, especially once you get into artificial neural networks.
y’ = b + wx
Here, y’ is our prediction, x is our feature value, and b and w are the bias and weight, which are found through training. The weight and bias determine the prediction function, since they are essentially the slope and the y-intercept, respectively.
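For example, with a weight of w = 2 and a bias of b = 1 (made-up numbers, just to illustrate), a feature value of x = 3 gives a prediction of y’ = 1 + 2*3 = 7.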
In the workshop, we created a function called `approximate` that returns the predicted y-value (label) based on the x-value (feature). Note that this mirrors our y = mx + b equation.
def approximate(m, b, x_vals):
    # Predict a label for every feature value at once (numpy broadcasting)
    y_pred = m * x_vals + b
    return y_pred
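As a quick sanity check, you can call it on our sample arrays with some made-up values for m and b (these aren’t the trained parameters, just an illustration); thanks to numpy, the whole array of predictions comes back at once:

approximate(2, 0, x_vals)
# -> array([ 2,  4,  6,  8, 10, 12, 14, 16])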
Loss: Mean Square Error 👌
Here’s the code from the workshop:
def mse(m, b, x_vals, y_vals):
    # Average of the squared differences between the true labels and our predictions
    y_pred = approximate(m, b, x_vals)
    mse = np.mean((y_vals - y_pred)**2)
    return mse
What does this mean? In machine learning, loss is a numerical metric that describes how wrong a model’s predictions are compared to the true values. When we train a model by giving it data, we want to minimize the loss. We used the mean square error (a type of loss) in our demo today: the average of the squared differences between the given labels and the predicted values.
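To make that concrete, here’s the loss for the same made-up guess from before (m = 2, b = 0); these numbers are just for illustration, not the trained values:

# Differences between labels and predictions: [0, -1, -1, -1, -2, 1, 1, 1]
mse(2, 0, x_vals, y_vals)
# -> 1.25  (the average of the squared differences)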
Optimizer: Gradient Descent 👾
Gradient descent (GD) is an optimization technique that helps produce a model with the lowest loss by iteratively guiding the model toward the best weight and bias. In our demo, the gradient descent function uses the mean square error to determine which direction to move the weight and bias in order to reduce the loss. We start with random parameters and work our way towards tuning them.
m = np.random.rand()   # random starting weight (slope)
b = np.random.rand()   # random starting bias (y-intercept)
learning_rate = 0.01
iterations = 1000
GD Parameters: Learning Rate and Iterations
Gradient descent has two parameters that help with convergence and optimality: the learning rate and the number of iterations. A larger learning rate means a bigger step towards minimizing the cost function, but too large a learning rate can keep the loss from converging at all. The number of iterations is simply how many times we run the gradient descent update; if your model’s loss converges quickly, extra iterations are unnecessary.
Reducing loss means that our prediction (in this case, our line of best fit) better represents the data points given. Here’s the code from the workshop:
def gradient_descent(m, b, x_vals, y_vals, learning_rate, iterations):
    n = len(x_vals)
    for _ in range(iterations):
        y_pred = approximate(m, b, x_vals)
        # Partial derivatives of the mean square error with respect to m and b
        dm = (-2/n) * np.sum(x_vals * (y_vals - y_pred))
        db = (-2/n) * np.sum(y_vals - y_pred)
        # Step both parameters in the direction that decreases the loss
        m -= learning_rate * dm
        b -= learning_rate * db
    return m, b
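In case you’re curious where the dm and db lines come from, they are just the partial derivatives of the mean square error with respect to m and b (a bit of calculus applied to the loss we defined earlier, written in the same plain style as the equations above):

MSE = (1/n) * Σ (y - (m*x + b))²
∂MSE/∂m = (-2/n) * Σ x * (y - (m*x + b))
∂MSE/∂b = (-2/n) * Σ (y - (m*x + b))

These are exactly the expressions the code computes with np.sum before taking a step.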
When the loss calculated by our loss function stops decreasing, we can say that our ML model is optimized for our data. Our demonstration, however, simply ran for a fixed number of iterations; since each iteration uses the entire dataset, each of these passes is what machine learning calls an epoch.
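If you’d rather have the loop stop on its own once the loss stops improving (instead of always running all 1000 iterations), one simple approach is a tolerance check like the sketch below. This isn’t from the workshop; the function name and the tolerance value are just illustrative.

def gradient_descent_early_stop(m, b, x_vals, y_vals, learning_rate, iterations, tol=1e-9):
    n = len(x_vals)
    prev_loss = mse(m, b, x_vals, y_vals)
    for _ in range(iterations):
        y_pred = approximate(m, b, x_vals)
        dm = (-2/n) * np.sum(x_vals * (y_vals - y_pred))
        db = (-2/n) * np.sum(y_vals - y_pred)
        m -= learning_rate * dm
        b -= learning_rate * db
        loss = mse(m, b, x_vals, y_vals)
        if prev_loss - loss < tol:   # loss barely moved, call it converged
            break
        prev_loss = loss
    return m, b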
Mathematically, gradient descent is just finding a local minimum of our cost function by repeatedly moving the parameters in the direction of steepest decrease. Gradient descent is used in a variety of machine learning and deep learning tasks for model training, not just linear regression. Many real-world problems come down to minimizing a function.
m, b = gradient_descent(m, b, x_vals, y_vals, learning_rate, iterations)
Output 🎉
Lastly, we print the final line of best fit along with its mean square error, and plot the result.
print(f"Final Best line of fit: y = {m:.2f}*x + {b:.2f}")
final_mse = mse(m, b, x_vals, y_vals)
print(f"Final MSE: {final_mse:.2f}")
y_pred = approximate(m, b, x_vals)
plt.scatter(x_vals, y_vals, color="blue")
plt.plot(x_vals, y_pred, color="red")
plt.xlabel("Features")
plt.ylabel("Labels")
plt.show()
Final Best line of fit: y = 2.25*x + -1.38
Final MSE: 0.83
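As a side experiment (not part of the workshop code, and your exact numbers will vary since the starting parameters are random), you could rerun training with a few different learning rates to see the convergence behavior we talked about earlier; very small rates tend to leave the loss high after 1000 iterations, while very large ones can make the loss blow up (you may even see overflow warnings):

for lr in [0.0001, 0.01, 0.1]:
    # Fresh random starting parameters for each trial learning rate
    m_trial, b_trial = gradient_descent(np.random.rand(), np.random.rand(),
                                        x_vals, y_vals, lr, iterations)
    print(f"learning_rate={lr}: MSE = {mse(m_trial, b_trial, x_vals, y_vals):.2f}")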
Visual Representation: A Hilly Terrain 🌄
Most things get more confusing when you add a third dimension to them, but I think gradient descent is significantly easier to understand in 3D. Think of our cost function as a really hilly terrain. When we perform gradient descent, we are placed in a random location (our random parameters), and we try to find the fastest way to get to the bottom of the hill (the optimized parameters that minimize loss).
Note that, in general, gradient descent is only guaranteed to find a local minimum. For linear regression, though, the mean square error is a convex (bowl-shaped) function, so the minimum we land in is also the global one, and our optimized parameters are the best parameters for this data.
Hopefully, that wasn’t too much for an introductory workshop. The beautiful thing about machine learning is that there are many resources online to help! One of my favorites is Google’s Machine Learning Crash Course. Feel free to check it out in your own time; it’s completely free too!
Here’s our club flyer too:
Until next meeting!