Do Computers Dream?

Okay, that title is a bit misleading, but it’s hard not to go there… stick with this for a minute.

At Google I/O this year, Google announced the new Google Photos – a new approach to storing, organizing, and searching photos. It’s been dubbed “Gmail for photos”, and Google is hoping to do for photos what Gmail did for email. To that end, it’s a service (website + mobile apps) that allows you to easily sync and store photos. The differentiator is search: the service automatically analyzes what’s in your photos, so you can search for a specific photo rather than hunt it down:

Google Photos automatically organizes your memories by the people, places, and things that matter. You don’t have to tag or label any of them, and you don’t need to laboriously create albums. When you want to find a particular shot, with a simple search you can instantly find any photo—whether it’s your dog, your daughter’s birthday party, or your favorite beach in Santa Barbara. And all of this auto-grouping is private, for your eyes only.

Google recently revealed some of how they’re doing this in a post called “Inceptionism: Going Deeper into Neural Networks”, and it’s fascinating. They’re using machine learning to accomplish this; more specifically, they train artificial neural networks to determine what’s in the images.

Well, we train networks by simply showing them many examples of what we want them to learn, hoping they extract the essence of the matter at hand (e.g., a fork needs a handle and 2-4 tines), and learn to ignore what doesn’t matter (a fork can be any shape, size, color or orientation). But how do you check that the network has correctly learned the right features? It can help to visualize the network’s representation of a fork.

Like many things, if you can visualize what’s going on, you can understand it better, and ideally improve it or learn from it. To this end, Google created some tools to help them see what the networks are doing, and while the tools did help them learn, they also found that they created some amazing imagery.

image credit: Google

Some of the simple results highlight the structure of a photo, turning it into what looks like an impressionist painting. At higher levels of inspection, they can tell the software to look for certain things in images (like animals, or eyes), and it will try to find them, with unexpected results. By exploiting how this works, asking the software to look for these things and then amplifying whatever it finds, you can see some amazing things emerge. Due to great interest and demand, Google open-sourced the software, DeepDream, and published a post about it. If you’re a developer, you can get the software up and running quickly. If you’re using a Mac, a company is selling an app called Deep Dreamer that wraps it all up in an easy-to-use standalone program.
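To make “look for something, then amplify it” a bit more concrete, here is a toy sketch in plain Python. It is not Google’s code: the real DeepDream runs a trained convolutional network, while this uses a single made-up stripe detector (`pattern`) and a four-pixel “image”. The names `activation`, `objective`, and `amplify_step` are hypothetical and exist only for this illustration. The idea it shows is gradient ascent on the image itself: leave the detector alone, and nudge the pixels in whichever direction makes the detector respond more strongly.

```python
# Toy illustration only: one hand-written linear "feature detector"
# stands in for a layer of a trained neural network.

def activation(image, pattern):
    """How strongly the detector responds to the image (a dot product)."""
    return sum(p * x for p, x in zip(pattern, image))


def objective(image, pattern):
    """The quantity we try to increase: half the squared activation."""
    a = activation(image, pattern)
    return 0.5 * a * a


def amplify_step(image, pattern, step_size=0.1):
    """Take one gradient-ascent step on the image; the detector stays fixed."""
    a = activation(image, pattern)
    # The gradient of 0.5 * (pattern . image)^2 with respect to the image
    # is (pattern . image) * pattern.
    grad = [a * p for p in pattern]
    # Scale the step by the mean absolute gradient so its size is predictable.
    scale = sum(abs(g) for g in grad) / len(grad)
    if scale == 0:
        return list(image)  # the detector sees nothing to amplify
    return [x + step_size * g / scale for x, g in zip(image, grad)]


pattern = [1.0, -1.0, 1.0, -1.0]  # hypothetical detector: alternating stripes
image = [0.6, 0.4, 0.5, 0.5]      # mostly flat, with a faint hint of stripes

before = objective(image, pattern)
after = objective(amplify_step(image, pattern), pattern)
print("response before:", before)
print("response after: ", after)
```

After one step the faint stripes in the image are a little stronger, so the detector’s response goes up. Swap the stripe detector for a network layer that responds to eyes or animals, and the same nudge starts drawing eyes or animals into the picture.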

Below are a few images we’ve run through the software. Billy, our Founder and CEO, has an amazing telescope. This is a photo taken through it on a recent night of stargazing:

Frame 0 2015-08-02 23-30-06.jpg copy

One thing the software will do is look for the structure of the photo. When you do that, you can see the lines that it uses to determine the shapes and structure of the photo:


When we run this photo through the software and tell it to look for “eyes”, you can see what emerges: it has found possible locations in the photo where eyes could exist, and they follow and fit the contours of the original photo.

Frame 1 2015-08-02 23-30-06

If we run this through 10 times, we can see the effects are even more exaggerated:
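“Running it through 10 times” means each pass starts from the output of the previous one, so the changes compound. The sketch below shows only that feedback loop. The `enhance` function is a hypothetical stand-in (a simple contrast boost on a four-pixel “image”), not what DeepDream actually does in a pass, and `run_passes` and `contrast` are likewise names made up for this example.

```python
def enhance(image, strength=0.2):
    """Stand-in for one pass: push each pixel away from the image mean."""
    mean = sum(image) / len(image)
    # Clamp to the 0..1 range so pixel values stay valid.
    return [min(1.0, max(0.0, x + strength * (x - mean))) for x in image]


def run_passes(image, passes):
    """Feed each pass's output back in as the next input; keep every frame."""
    frames = [list(image)]  # frame 0 is the untouched original
    for _ in range(passes):
        frames.append(enhance(frames[-1]))
    return frames


def contrast(image):
    """Difference between the brightest and darkest pixel."""
    return max(image) - min(image)


frames = run_passes([0.45, 0.5, 0.55, 0.5], passes=10)
for index in (0, 1, 10):
    print("frame", index, "contrast:", contrast(frames[index]))
```

Frame 1 is only slightly different from the original, but by frame 10 the same small change has been applied on top of itself ten times, which is why the later frames look so much more exaggerated.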

Frame 10 2015-08-03 01-07-31

If we want this software to look for animals, we get a different result:

Frame 1 2015-08-04 15-51-23

So often the underlying methods and algorithms of the code we write are hidden and can’t be visualized. Using machine learning, and exposing how it’s done, helps deepen our understanding of how this code works. At space150, we use computer vision in a lot of the interactive / physical-to-digital experiences we create (like the Forever 21 Times Square Billboard). When we’re creating these experiences, we often need to see what the computer sees, so it’s a natural thing to want to do.

What if you send a different type of image through this? I’ll end with 10 iterations of the classic Super Mario Bros. title screen (HT @gregswan for the SMB idea).


Happy dreaming.
