Computer Vision in Machine Learning


Jaden Zhang December 19, 2023

What is Computer Vision?

Computer vision (CV), a branch of machine learning and artificial intelligence, is a fast-growing field in which machines are given visual input to process (IBM). By allowing computers to “see” the world and interpret what they see, it gives technology a version of one of humanity’s most valuable senses: sight. The CV market is valued at 22.27 billion USD, and models are becoming much more accurate, with identification accuracy of up to 99% now possible (Statista).

Simplified Explanation

CV models use different types of layers for different purposes, but I will focus on classification-type CV. A simple way for a computer to process an image is to use only black-and-white (grayscale) pictures. The brightness of each pixel is converted to a number, and the numbers are placed in an array. The array can then be analysed for many different types of patterns (Simplilearn). Each pattern is associated with the label of the object it appears in. After training on many different images from a data set and tweaking the template patterns, the result is a model that knows which patterns of pixels correspond to specific shapes (V7Labs). When a novel image is given to the model, it compares the patterns in the new image to the patterns it knows and gives each label a similarity rating. The label with the highest rating is the classification given to the image (IBM).
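The steps above can be sketched in a few lines of Python with NumPy. This is only a toy illustration, not how a production model works: the 3x3 "images", the two template patterns, and the function names (`to_brightness_array`, `similarity`, `classify`) are all made up for this example, and the similarity rating is assumed to be one minus the mean absolute pixel difference. Real classifiers learn their patterns from data instead of using hand-written templates.

```python
import numpy as np


def to_brightness_array(pixels):
    """Turn rows of 0-255 grayscale pixel values into an array scaled to [0, 1]."""
    return np.asarray(pixels, dtype=np.float64) / 255.0


def similarity(image, template):
    """Similarity rating in [0, 1]; 1.0 means every pixel matches exactly.

    Assumption for this sketch: 1 minus the mean absolute pixel difference.
    """
    return 1.0 - float(np.mean(np.abs(image - template)))


def classify(image, templates):
    """Rate the image against every known pattern and return the best label."""
    scores = {name: similarity(image, t) for name, t in templates.items()}
    best_label = max(scores, key=scores.get)
    return best_label, scores


# Hypothetical "known patterns": a vertical and a horizontal line on a 3x3 grid.
templates = {
    "vertical line": to_brightness_array([[0, 255, 0], [0, 255, 0], [0, 255, 0]]),
    "horizontal line": to_brightness_array([[0, 0, 0], [255, 255, 255], [0, 0, 0]]),
}

# A novel image: a vertical line whose last pixel is slightly dimmer.
novel_image = to_brightness_array([[0, 255, 0], [0, 255, 0], [0, 200, 0]])

label, scores = classify(novel_image, templates)
print(label)
print(scores)
```

The novel image is not an exact copy of either template, but it is much closer to the vertical line, so that label gets the highest rating and becomes the classification.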


Related Fields of Study

To build up to its current-day abilities, CV needed a base of fields to learn from!

Digital Imaging

  • Learning how to get visual input to CV models for processing is the first step to vision (Microsoft).
  • It is vital for computer vision to receive the data with as much detail as possible, so that the models can be precise.
  • If other types of imaging, such as infrared (IR) and sound-wave imaging, are researched and developed as well, computers can receive more than the typical visual information.
  • Simply put, to allow computers to see, you need to have digital stuff to look at!

Personal Experiences

We used TypeScript (T3 Stack), CSS (really just Tailwind, though), React, Python, TensorFlow Keras, and HTML to create Pentous!

Created with ♥ by Jaden Zhang
Resources