Augmented reality is the art of adding information to the raw data perceived by a device. This raw data usually comes in the form of camera input: around 30 images every second. To add relevant information to these images, the device must first go through a highly sophisticated process of understanding the input.
This is known as image processing.
Take this image of São Paulo for example.
To a human like myself, it is rather easy to understand what this image represents. I see lots of buildings, most of them residential and built between 1960 and 1990. I deduce this is a large and dense city. From there, I could easily count how many floors there are in some of the buildings and draw huge floating numbers on top of them so no one else has to count them, ever.
What my device's camera sees is very different, however. She (my device is a she) sees an array of pixels, each of which has a very precise colour assigned to it. To put ourselves in my device's shoes, we zoom in until we can no longer recognize individual objects.
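To make this concrete, here is a toy sketch (my own example, not tied to any particular camera API) of what the device is actually handed: a grid of pixels, each one just a triple of colour values.

```python
# A tiny 2x2 "image": two reddish pixels on top, two bluish ones below.
# Each pixel is an (R, G, B) triple in the range 0-255.
image = [
    [(200, 30, 40), (210, 25, 35)],
    [(20, 40, 190), (25, 35, 200)],
]

# From the device's point of view there are no "buildings" here,
# only numbers. Even reading one pixel's colour is just indexing:
r, g, b = image[0][0]  # → (200, 30, 40)
```

A full camera frame is exactly this, only with millions of such triples instead of four.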
She (I'm still talking about my device) is free to scroll around, but she is still unable to recognize buildings, let alone count the floors. She can only obtain very "local" information about parts of the image. By looking around she realizes that a certain pixel is much darker than its neighbour to the right. Another one over there is much lighter than its neighbour below. It occurs to her that by following contrasting pixels, she can obtain "contours" which may let her deduce the shapes of individual buildings in the image.
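The "compare each pixel with its neighbours" idea can be sketched in a few lines. The function below is a minimal illustration of my own (the names and the threshold value are assumptions, not any standard algorithm): it marks a pixel as part of a contour when its brightness differs sharply from the neighbour to its right or below it.

```python
def find_contour_pixels(gray, threshold=50):
    """gray: 2-D list of brightness values (0-255).
    Returns a same-sized grid of booleans: True = likely contour pixel."""
    h, w = len(gray), len(gray[0])
    contour = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Strong contrast with the neighbour to the right?
            if x + 1 < w and abs(gray[y][x] - gray[y][x + 1]) > threshold:
                contour[y][x] = True
            # Strong contrast with the neighbour below?
            if y + 1 < h and abs(gray[y][x] - gray[y + 1][x]) > threshold:
                contour[y][x] = True
    return contour

# A tiny 4x4 grayscale image: a dark block on the left, a light one
# on the right. The boundary between them should light up as a contour.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
edges = find_contour_pixels(image)
# The dark pixels in column 1 contrast with column 2, so they are marked.
```

Real edge detectors are far more robust than this (they smooth away noise first and look in all directions), but the underlying principle, chaining together local contrast into contours, is the same.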
This is the basis of image processing: obtaining relevant object information from an image by putting together clusters of local information. My job as a developer is to research, come up with, and write down the algorithms that make a device able to perform image processing. This is not limited to finding contours, and sometimes it involves concepts as deep as "teaching" the device to improve its processing methods by analyzing its own output.
In a sequel to this blog post, we will learn a little bit more about image processing and how it is possible for a device to deduce perspective information from camera input.