Computer Vision in Machine Learning
Jaden Zhang December 19, 2023
What is Computer Vision?
Computer vision (CV), a branch of machine learning and artificial intelligence, is a fast-growing field in which machines are given visual input to process (IBM). By allowing computers to “see” the world and process what they are seeing, it pushes technology towards one of humanity’s most valuable senses: sight. The CV market is valued at 22.27 billion USD, and models are becoming much more accurate, with identification accuracy of up to 99% now possible (Statista).
Simplified Explanation
CV models use different types of layers for different functions, but I will focus on classification-type CV. A simple way for a computer to process an image is to use only black and white pictures. The brightness of each pixel is converted to a numerical value and placed in an array. The array can then be analysed for many different types of patterns (Simplilearn). The patterns are associated with the labels of the objects in the images. After training on many different images from a data set, with tweaks to the template patterns along the way, the result is a model that knows the pixel patterns for specific shapes (V7Labs). When a novel image is given to the model, it compares the new patterns to the known patterns and gives each label a similarity rating. The label with the highest rating is the classification given to the image (IBM).
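The idea above can be sketched in a few lines of plain Python. This is only a toy illustration under stated assumptions: the two 3x3 templates, the label names, and the choice of cosine similarity as the "similarity rating" are all made up for the example. Real classification models learn their patterns from data instead of using hand-written templates.

```python
import math

def flatten(image):
    """Turn a 2D grid of brightness values (0-255) into a flat list scaled to 0..1."""
    return [pixel / 255.0 for row in image for pixel in row]

def cosine_similarity(a, b):
    """Similarity rating between two pixel lists: 1.0 means the same pattern."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an all-black image matches nothing
    return dot / (norm_a * norm_b)

def classify(image, templates):
    """Score the image against every known pattern and return the best label."""
    pixels = flatten(image)
    scores = {
        label: cosine_similarity(pixels, flatten(template))
        for label, template in templates.items()
    }
    best_label = max(scores, key=scores.get)
    return best_label, scores

# Hypothetical "known patterns": one tiny template per label.
TEMPLATES = {
    "vertical_line": [[0, 255, 0], [0, 255, 0], [0, 255, 0]],
    "horizontal_line": [[0, 0, 0], [255, 255, 255], [0, 0, 0]],
}

# A slightly noisy new image that is mostly a vertical line.
new_image = [[10, 240, 0], [0, 250, 20], [5, 245, 0]]
label, scores = classify(new_image, TEMPLATES)
print(label, scores)
```

The noisy image scores highest against the vertical line template, so that label is chosen, which mirrors the "highest similarity rating wins" step described above.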
Congruent Fields of Study
To build up to its current-day abilities, CV needed a base of fields to learn from!
Digital Imaging
- Learning how to get visual input to CV models for processing is the first step to vision (Microsoft).
- It is vital for computer vision to receive the data with as much detail as possible, so that the models can be precise.
- If other types of imaging are also researched and developed (such as IR and sound wave imaging), computers can receive more than the typical visual information.
- Simply put, to allow computers to see, you need to have digital stuff to look at!
Personal Experiences
- I have used TensorFlow Keras with Python to set up a deep learning program that trains a CV ML model to classify foods, based on my own data set of food categories (Tensorflow).
- My team and I used ReLU as the activation function in each layer. The activation function decides how strongly each node fires, which mimics the action potential of a neuron (Krishnamurthy). Click here to see the Devpost of my first hackathon! We have gotten better since!
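As a rough sketch of what a ReLU node does, here is a single artificial neuron in plain Python. The function names, inputs, weights, and bias are made up for illustration, and this is not the actual JART code, which relied on Keras layers.

```python
def relu(x):
    """ReLU: pass positive values through unchanged, clamp negatives to zero."""
    return max(0.0, x)

def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus a bias, passed through ReLU."""
    weighted_sum = sum(i * w for i, w in zip(inputs, weights)) + bias
    return relu(weighted_sum)

# The node "fires" only when the weighted sum is above zero.
quiet = neuron_output([1.0, 2.0], [0.5, -1.0], 0.25)   # weighted sum is negative
firing = neuron_output([1.0, 2.0], [0.5, 1.0], 0.25)   # weighted sum is positive
print(quiet, firing)
```

The first call stays at zero because its weighted sum is negative, while the second passes its positive sum straight through.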
This video was my excellent starting point!
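Our model, named JART, was part of our first ever hackathon project. JART was not particularly amazing, even though it had a training accuracy rate of 99%. The problem is that this rate was only achieved with the training set, not with new data (Renotte). A gap like that usually points to overfitting: the model memorised its training images instead of learning patterns that generalise.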
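To show why training accuracy alone can be misleading, here is a toy sketch that scores a model on both its training examples and on examples it has never seen. Everything in it is an assumption for illustration: the "memorizer" model, the feature tuples, and the food labels are invented and are not JART or its data.

```python
def accuracy(predict, examples):
    """Fraction of (features, label) examples the model labels correctly."""
    correct = sum(1 for features, label in examples if predict(features) == label)
    return correct / len(examples)

def make_memorizer(training_examples, default_label):
    """A deliberately overfit model: it only remembers the exact training inputs."""
    lookup = {tuple(features): label for features, label in training_examples}

    def predict(features):
        # Unseen inputs fall back to a fixed guess.
        return lookup.get(tuple(features), default_label)

    return predict

# Hypothetical toy data, split into a training set and a held-out set.
train = [((1, 0), "pizza"), ((0, 1), "salad"), ((1, 1), "pizza"), ((0, 0), "salad")]
held_out = [((2, 0), "pizza"), ((0, 2), "salad"), ((2, 1), "pizza"), ((1, 2), "salad")]

model = make_memorizer(train, default_label="pizza")
train_accuracy = accuracy(model, train)
held_out_accuracy = accuracy(model, held_out)
print(train_accuracy, held_out_accuracy)
```

The memorizer is perfect on the data it was built from and no better than a fixed guess on the held-out set, which is the same kind of gap described above. Checking accuracy on data the model never trained on is the usual way to catch it.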
Images had to be rescaled before they could be processed. This was done by unlocking the aspect ratio and forcing each image into a fixed square size.
We used: TypeScript (T3 Stack), CSS (really just Tailwind though), React, Python, TensorFlow Keras, and HTML to create Pentous!
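A minimal sketch of that kind of rescaling, assuming nearest-neighbour sampling on a grayscale image stored as a list of rows. The function name and the sampling method are assumptions for illustration. The project itself used library tooling, and its exact resize method is not stated here.

```python
def resize_to_square(image, size):
    """Stretch or squash a 2D image to size x size, ignoring its aspect ratio.

    Each output pixel copies the nearest source pixel (nearest-neighbour sampling).
    """
    src_height = len(image)
    src_width = len(image[0])
    resized = []
    for y in range(size):
        # Map the output row back to a source row.
        src_y = min(src_height - 1, y * src_height // size)
        row = []
        for x in range(size):
            # Map the output column back to a source column.
            src_x = min(src_width - 1, x * src_width // size)
            row.append(image[src_y][src_x])
        resized.append(row)
    return resized

# A wide 2x4 image squashed into a 2x2 square.
wide = [[0, 50, 100, 150], [200, 210, 220, 230]]
square = resize_to_square(wide, 2)
print(square)
```

Because the width and height are scaled independently, the picture gets distorted, but every image ends up with the same input shape for the model.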