Now, did you hear what I was saying? Clearly enough that you could, say, write
it down in the comments? If so, you just experienced a phenomenon called the Cocktail Party Effect. You can hear me while there are people talking
right next to us or if there’s a jazz band across the room. This is because of selective attention – our
ability to focus on one particular thing while tuning out our surroundings. And it’s the same effect that allows us
to separate the vocals from the background music in a song. This comes so naturally to us, but machines
find these tasks extremely hard. To a machine, a voice singing is just another
track in a song that isn’t easily discernible from the piano track or the violin track or
the harmonica track. So how do you train a machine to separate
voices at a party or vocals from a song like people can? Well, the answer lies in algorithms and lots
of data. Recently, researchers developed an algorithm
that can identify the vocals in multiple songs. And this is thanks to breakthroughs in machine
learning – a method used in artificial intelligence to allow machines to learn by analysing data. To do so, researchers used a deep neural network
– these networks are software inspired by how our brain works. They can learn using a method called deep
learning, a kind of machine learning technique that works through a series of layers. An input layer, an output layer and middle
hidden layers. These hidden layers are where the magic happens. And to train an artificial neural network,
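The layered structure just described – an input layer, hidden layers, an output layer – can be sketched in a few lines of code. This is a minimal illustration, not the researchers' actual network: the layer sizes, weights, and the idea of one "audio frame" per input are all made up for the example.

```python
import numpy as np

# A toy layered network: input layer -> two hidden layers -> output layer.
# All sizes here are invented for illustration; the real
# vocal-separation network was far larger.

rng = np.random.default_rng(0)

def layer(n_in, n_out):
    """One fully connected layer with small random starting weights."""
    return rng.standard_normal((n_in, n_out)) * 0.1

W1 = layer(64, 32)   # input (64 numbers describing a slice of audio) -> hidden
W2 = layer(32, 32)   # hidden -> hidden: "where the magic happens"
W3 = layer(32, 64)   # hidden -> output (the estimated vocal component)

def forward(x):
    # Each layer transforms the previous layer's output; the
    # nonlinearity (ReLU) is what lets hidden layers learn
    # patterns rather than just rescale the input.
    h1 = np.maximum(0, x @ W1)
    h2 = np.maximum(0, h1 @ W2)
    return h2 @ W3

x = rng.standard_normal(64)   # a fake audio frame
y = forward(x)
print(y.shape)                # (64,)
```

Untrained, this network outputs noise – the point of the sketch is only the shape of the pipeline: data flows in one side, through the hidden layers, and out the other.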
you have to feed it a ton of data – just like us, the more it knows, the better it
can learn. So researchers trained their neural network
by giving it 50 songs. They let the neural network try to separate
the vocals and the non-vocal components (the other instruments), and compare its results
with the correct answer – which is the particular song already separated into the different
components. Every time the neural network gets closer
to the correct result, it’s rewarded. So it improves with each run. It was then tested with 13 new songs, and
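The training idea described above – guess, compare with the correct answer, nudge the network so the error shrinks with each run – can be sketched like this. It is a hedged toy version, not the study's code: the "songs" are random numbers, the model is a single linear layer rather than a deep network, and the learning rate and step count are invented.

```python
import numpy as np

# Toy training loop: the model guesses the vocal part of each mixture,
# the guess is compared with the known correct answer (the song already
# separated into components), and the weights are nudged to reduce the
# error on every run. All data here is fake, for illustration only.

rng = np.random.default_rng(1)

# 50 fake "songs": each mixture = vocals + instruments.
vocals      = rng.standard_normal((50, 16))
instruments = rng.standard_normal((50, 16))
mixtures    = vocals + instruments

W = np.zeros((16, 16))   # the model: one linear layer, starting from scratch
lr = 0.01

initial_loss = float(np.mean((mixtures @ W - vocals) ** 2))

for step in range(500):
    guess = mixtures @ W           # try to separate the vocals
    error = guess - vocals         # compare with the correct answer
    # Nudge the weights downhill on the squared error (gradient descent):
    W -= lr * mixtures.T @ error / len(mixtures)

final_loss = float(np.mean((mixtures @ W - vocals) ** 2))
print(initial_loss, final_loss)    # the error shrinks with training
```

The reward signal in the transcript corresponds to this shrinking error: every pass brings the guess closer to the already-separated components it is checked against.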
it correctly separated the vocals from the background music in each one. It taught itself to tell the vocals apart
from the other instruments. What separates deep learning from previous
types of machine learning is this layered structure, which is modelled specifically
after the cortex, the wrinkly outer layer of the brain. It’s the part responsible for higher-order
brain function like sensory perception, cognition, spatial reasoning and language. Basically it’s the part that makes you… different from a lizard. It’s made up of 6 layers, and different
aspects of processing happen at each level. For example, when you see an apple, the first
layer might identify the color red, the second layer detects the round edges, and so on until
finally the last layer puts it all together and says hey, that’s an apple! Deep learning software tries to imitate this
hierarchical structure of neurons in the cortex. The first few layers of a deep neural network
learn to identify simple patterns, like single units of sound. The next layers learn to recognize more complicated
patterns, like words. Eventually, the result is that extremely complicated
patterns like the entire vocals of a song can be recognized and distinguished from the
other instruments. This layered process is at the heart of deep
learning’s success. Starting with simple ideas and building them
into more and more generalized concepts seems to capture something fundamental
about intelligence. Humans used to have a clear advantage in pattern
recognition, but in 2015 a deep neural network beat a human at image recognition for the
first time. This means we’re able to make better and
more sophisticated machines that can master tasks we thought were unique to humans. Machines are helping doctors make better diagnoses
and robots are learning to cook by watching YouTube videos. And when a robot can learn to cook by watching
YouTube videos – that makes you question what it really means to be human.