If You Understand Bananas, You Can Understand Machine Learning
A simplified high-level overview of primary machine learning algorithms for anyone to understand
I want this article to serve three purposes:
Show that the big picture of machine learning is easy for anyone to understand.
Act as a resource you can point anyone to (even less-technical individuals) so they can understand machine learning and why it matters to everyone, including consumers.
Create a baseline for understanding more complex, but equally important, machine learning topics coming in future articles.
Over the upcoming weeks, I'll get into more complex machine learning topics, make them easier to understand, and explain why they matter to you. If this interests you and you haven’t joined Society’s Backend, join us now.
The truth about machine learning is that the big picture is really easy to understand. In fact, it's basically intuitive because it’s something you've been doing your entire life. The basic one-phrase definition of machine learning is teaching computers to learn about systems the way we do.
While simple, this definition isn't very descriptive, so I'll break it down using some examples. I'll explain the three main types of machine learning in simple terms:
Supervised learning
Unsupervised learning
Reinforcement learning
And then I'll explain why machine learning is important to you.
Machine Learning is Bananas, B-A-N-A-N-A-S
Imagine running a banana stand in a market. Each day you visit a banana farm early in the morning, pick some bananas and sell them at the local market. Then each day, you keep track of which bananas sell and which ones don’t.
You notice trends such as large bananas selling better than small ones, yellow bananas outselling green ones, and bunches of bananas being more popular than individual ones. By the end of the year, you can accurately predict which bananas will sell based on their size, color, and whether they're sold in bunches or individually.
Congrats! You’ve just trained a supervised learning algorithm. Machines can learn this too and they learn very similarly to how you do. We give the machine an input (the banana), have it predict whether or not it will sell, and let it know if it sold. The machine will remember if it got the prediction right or wrong (and even how it got it wrong) and use that info to improve the next prediction.
This algorithm is considered ‘supervised’ because we provide it with the correct predictions during the learning process. The biggest difference between the machine's learning process and yours is that the machine can learn in only a few minutes what took you a year—and it’ll likely be more accurate in its predictions.
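To make the banana stand concrete, here is a minimal sketch of that guess, check, and adjust loop in Python. It uses a perceptron, which is one simple supervised learning method I picked for illustration, not the only one. The sales log, the three yes/no features, and the function names are all made up for this example.

```python
# Hypothetical sales log (invented data): each banana is described by three
# yes/no features (is_large, is_yellow, is_bunch), followed by whether it sold.
SALES_LOG = [
    ((1, 1, 1), 1),
    ((1, 1, 0), 1),
    ((0, 1, 1), 1),
    ((0, 1, 0), 0),
    ((1, 0, 1), 0),
    ((1, 0, 0), 0),
    ((0, 0, 1), 0),
    ((0, 0, 0), 0),
]


def predict(weights, bias, banana):
    """Guess 1 (will sell) when the weighted score is positive, else 0."""
    score = bias + sum(w * x for w, x in zip(weights, banana))
    return 1 if score > 0 else 0


def train(log, max_epochs=1000):
    """Perceptron training: guess, compare with the real outcome, adjust."""
    weights, bias = [0.0, 0.0, 0.0], 0.0
    for _ in range(max_epochs):
        mistakes = 0
        for banana, sold in log:
            error = sold - predict(weights, bias, banana)  # 0 if the guess was right
            if error:
                mistakes += 1
                # Nudge each weight toward the correct answer.
                weights = [w + error * x for w, x in zip(weights, banana)]
                bias += error
        if mistakes == 0:  # a full pass with no wrong guesses: done learning
            break
    return weights, bias


weights, bias = train(SALES_LOG)
print("Will a large, yellow bunch sell?", predict(weights, bias, (1, 1, 1)))
```

The "supervision" is the second value in each log entry: the model only improves because it is told the real outcome after every guess.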
Machine Learning is Also Berries and Melons
Let's say you add more types of fruit to your banana stand. If you bring apples, oranges, berries, melons, and more to sell at your stand, how would you go about placing them? You'd probably look at each fruit, identify its characteristics, and place it in your display accordingly.
At the end, you'll likely have a fruit stand where berries are grouped together on one side of the stand with melons grouped together on the other. In the middle will be a gradient of other fruits, with fruits more similar to berries on the berry side and fruits more similar to melons on the melon side.
Now you've trained your first unsupervised machine learning algorithm. Unsupervised algorithms are great because they don't require an answer to the problem the machine needs to solve (in our previous example the answer was whether or not the banana sold). They're excellent for pattern recognition and clustering tasks.
In this example, we let a machine know what fruits are in the stand and it uses the fruits' characteristics to relate them to one another. As we learned back when we only had a banana stand, machines can do this faster than we can. They really shine when there are thousands or even millions of fruits: it would take a person forever to go through and relate them all, but a machine can do it quickly and consistently.
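Here is a small sketch of the fruit stand in Python, assuming we describe each fruit with two made-up measurements scaled from 0 to 1. It uses k-means clustering, which is one common unsupervised method chosen here for illustration. The fruit numbers, the feature names, and the choice to start the two groups at the blueberry and the watermelon are all assumptions of this example.

```python
import math

# Hypothetical measurements (invented data): (size, rind_thickness), both 0-1.
FRUITS = {
    "blueberry": (0.05, 0.10),
    "raspberry": (0.08, 0.05),
    "strawberry": (0.12, 0.10),
    "grape": (0.10, 0.15),
    "apple": (0.35, 0.40),
    "orange": (0.40, 0.50),
    "cantaloupe": (0.80, 0.80),
    "honeydew": (0.85, 0.85),
    "watermelon": (1.00, 0.90),
}


def closest(point, centers):
    """Return the index of the center nearest to this fruit."""
    distances = [math.dist(point, center) for center in centers]
    return distances.index(min(distances))


def k_means(points, centers, rounds=10):
    """Repeatedly group fruits by nearest center, then re-center each group."""
    for _ in range(rounds):
        groups = [[] for _ in centers]
        for point in points:
            groups[closest(point, centers)].append(point)
        # The new center is the average fruit in the group (keep the old
        # center if a group ended up empty).
        centers = [
            tuple(sum(axis) / len(group) for axis in zip(*group)) if group else center
            for group, center in zip(groups, centers)
        ]
    return centers


centers = k_means(list(FRUITS.values()), [FRUITS["blueberry"], FRUITS["watermelon"]])
groups = {name: closest(point, centers) for name, point in FRUITS.items()}
for name, group in groups.items():
    print(name, "-> group", group)
```

Notice that nothing in the data says "berry" or "melon". The groups come only from how similar the measurements are, which is the point of unsupervised learning.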
You now understand how unsupervised learning works!
What You Can Learn From Your Roomba
Reinforcement learning is the easiest machine learning method to understand because it's the same way most of us learned as kids. When you did something wrong as a child, your parents let you know it was wrong. When you did something right, they let you know it was correct. This positive and negative reinforcement taught us which behaviors are acceptable and which are not.
Reinforcement learning is excellent for teaching robots how to do a task (among other applications). To get more specific, let's take a simplified look at your friendly household robot: your Roomba. When your Roomba gets stuck or hits an obstacle, that’s negative feedback. When it successfully cleans an area, that’s positive feedback.
This feedback system allows you to simply plug in your Roomba, press a 'map' button and watch it learn how to navigate your floor plan and clean your house optimally, without any manual training. It also enables the Roomba to adapt when it encounters an obstacle during cleaning that wasn't there during its initial mapping.
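The feedback idea can be sketched in a few lines of Python. This is a toy, not how any real robot vacuum is programmed: I'm assuming a hallway of five cells with a wall on the left and a dirty spot at the far right, and I'm using tabular Q-learning, one standard reinforcement learning method. To keep the result repeatable, the robot tries every move from every cell on each pass instead of exploring at random. All names and reward values are made up for this example.

```python
ACTIONS = {"left": -1, "right": +1}
DIRTY_CELL = 4  # hypothetical: the spot at the end of the hallway that needs cleaning


def step(cell, action):
    """Return (next_cell, reward, done) for one move in the 5-cell hallway."""
    target = cell + ACTIONS[action]
    if target < 0:
        return cell, -1.0, False  # bumped into the wall: negative feedback
    if target == DIRTY_CELL:
        return target, 1.0, True  # cleaned the dirty spot: positive feedback
    return target, 0.0, False  # an ordinary move: no feedback either way


def learn(sweeps=200, learning_rate=0.5, discount=0.9):
    """Tabular Q-learning: score each move by the feedback it leads to."""
    q = {cell: {action: 0.0 for action in ACTIONS} for cell in range(DIRTY_CELL)}
    for _ in range(sweeps):
        for cell in q:
            for action in ACTIONS:
                next_cell, reward, done = step(cell, action)
                # Value of the best follow-up move (nothing follows the goal).
                future = 0.0 if done else max(q[next_cell].values())
                target = reward + discount * future
                q[cell][action] += learning_rate * (target - q[cell][action])
    return q


q_table = learn()
# The learned behavior: in each cell, pick the move with the highest score.
policy = {cell: max(moves, key=moves.get) for cell, moves in q_table.items()}
print(policy)
```

Nobody tells the robot "go right". It ends up preferring that direction in every cell only because moving right eventually earns the reward and bumping the wall costs it.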
Reinforcement learning is an intuitive way to teach machines because it mirrors how we often learn ourselves. Now you understand reinforcement learning!
Why This Matters to You
Up until machine learning, we've always told computers what to do. With machine learning, we tell them to figure out what to do based on the information we give them. To put this in perspective, this is the difference between a human being given a set of instructions for an assignment and a human learning how to complete that assignment from observation. The human with the given instructions will have an expected behavior. The human learning from observation may come up with their own thing.
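The difference between the two approaches shows up even in a tiny Python sketch. In the first function a person writes the rule. In the second, the program picks its own rule from past sales. The banana lengths, the 18 cm cutoff, and the function names are all invented for illustration.

```python
# Instruction-based: a person decides the rule and writes it down.
def sells_by_rule(length_cm):
    return length_cm >= 18  # cutoff chosen by the programmer


# Learning-based: the program chooses the cutoff that best fits past sales.
# Hypothetical records: (banana length in cm, whether it sold).
PAST_SALES = [(12, False), (14, False), (15, False), (17, True), (19, True), (21, True)]


def learn_threshold(examples):
    """Try each observed length as a cutoff and keep the one with fewest mistakes."""
    def mistakes(threshold):
        return sum((length >= threshold) != sold for length, sold in examples)

    candidates = sorted(length for length, _ in examples)
    return min(candidates, key=mistakes)


threshold = learn_threshold(PAST_SALES)
print("Hand-written rule says a 17.5 cm banana sells:", sells_by_rule(17.5))
print("Learned rule says a 17.5 cm banana sells:", 17.5 >= threshold)
```

With this made-up data the two rules disagree about a 17.5 cm banana. The hand-written rule behaves exactly as instructed, while the learned rule depends entirely on the examples it saw.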
You might think this doesn't matter to you if you aren't the one telling machines what to do, but that's incorrect. The instruction-based programming of machines has shaped users' understanding of how a machine should respond to a given input. When you provide input to your phone, you know what the output should be. If something goes wrong, you know the instructions given to the phone were incorrect.
This isn't always the case with machine learning: models will sometimes behave unexpectedly, and consumers need to be aware of that. ChatGPT and other large language models have shown us why this is the case. ChatGPT has exposed unexpected information, AI chatbots have promoted self-harm, and large language models frequently hallucinate [1], which can indirectly promote something such as unhealthy cooking practices (this is something I've seen myself). This doesn't mean machine learning is harmful, but it does mean it is important to understand how it can be harmful.
This starts with the examples above. Even though they lack detail, understanding how the primary types of machine learning work shows the value of machine learning and also helps us see where it can go wrong. That understanding helps both the people building machine learning systems and the people using them to do so properly.
To recap, there are three main types of machine learning algorithms:
Supervised learning algorithms: Inputs and outputs are fed to the model. Inputs are used to make a prediction; outputs are used to gauge the correctness of that prediction and make changes. Over many iterations of inputs and outputs, the model gets better at predicting.
Unsupervised learning algorithms: Inputs are given to the model and it identifies characteristics. Proper outputs are not needed for training. These are great for pattern recognition and clustering tasks.
Reinforcement learning algorithms: Use a trial-and-error approach, driven by positive and negative feedback, to teach a model how to interact with an environment.
And you need to understand machine learning because it fundamentally changes the way we compute and, in turn, the way you use your electronic devices.
Everyone Needs a Machine Learning Education
I can't stress it enough: everyone needs a machine learning education. This article should provide a good baseline for anyone to understand the bigger picture of machine learning. If you have any questions, drop them in the comments below. If I got anything wrong or left out any important information, please clarify that below as well.
I'll be breaking down more topics about the importance of good machine learning engineering in future articles. If this interests you, join our network of machine learners.
[1] Hallucinating, in the context of LLMs, means generating text that is nonsensical or untrue. This means a user asking a question may be given false information, which is why many LLM products include a notice reminding users to check the validity of their outputs. There are ways to mitigate this, but it’s a difficult problem to solve entirely.