Welcome to the December edition of our favorite AI articles published this month. We are a Paris-based company that does Agile data development. This month, we spotted articles about deep learning, visualization, ethics and more. We advise you to have a Python environment ready if you want to follow some of the tutorials :). Let’s kick off with the comic of the month:
It could be a revolution in Deep Learning.
Artificial Neural Networks rely on layers. One might argue that the more layers there are, the more accurate the model is. To capture the most information, each layer will represent a type of feature (from a geometrical shape to a specific concept such as a species).
But what if we replaced the stack of discrete layers with a differential equation, so that the model behaves like a network with infinitely many layers? This idea was implemented by David Duvenaud’s team and earned them a “Best Paper” award at the NeurIPS conference earlier in December.
This could help define continuous-time models, especially in the medical field, where patients’ health record data arrives at irregular times.
Even if the ODE net (its nickname) is at an early stage, I’m sure other studies will follow and have a huge impact!
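To make the continuous-depth idea a bit more concrete, here is a minimal sketch, not the authors' implementation. A residual layer computes h + f(h), which has the same form as one Euler step of the equation dh/dt = f(h, t). The names `dynamics` and `odeint_euler`, the random weights and the observation times are all assumptions made up for illustration, and a real ODE net would rely on a proper ODE solver rather than fixed-step Euler.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(scale=0.5, size=(4, 4))  # hypothetical weights, untrained

def dynamics(h, t):
    """Hypothetical f(h, t): the rate of change of the hidden state."""
    return np.tanh(W @ h)

def odeint_euler(f, h0, t0, t1, n_steps=100):
    """Integrate dh/dt = f(h, t) from t0 to t1 with fixed-step Euler."""
    h = np.asarray(h0, dtype=float).copy()
    dt = (t1 - t0) / n_steps
    t = t0
    for _ in range(n_steps):
        h = h + dt * f(h, t)  # same form as a residual block: h + f(h)
        t += dt
    return h

# Irregularly spaced observation times (made-up values)
times = [0.0, 0.3, 1.7, 1.9, 4.2]
h = np.ones(4)
states = [h]
for t_prev, t_next in zip(times[:-1], times[1:]):
    # The hidden state can be evaluated at any time, not only at layer indices
    h = odeint_euler(dynamics, h, t_prev, t_next)
    states.append(h)
```

Because the state is defined for every time t, gaps of 0.2 or 2.3 between observations are handled the same way, which is what makes the approach attractive for irregular records.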
If you want more about NeurIPS 2018, check this selection of papers we made for you.
With Grasp2Vec, the Google Robotics team designed an algorithm that teaches object representation to a robotic arm. By grasping a single object from an initial scene, the robot creates 3 different pieces of information: the initial scene, the final scene, and the grasped object.
The difference between the initial and final scenes corresponds to the grasped object. From this intuition, the team trained a model that helps in:
representing a specific object in the scene
The good news is that the robot creates its own dataset by picking object after object!
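The scene-difference intuition can be illustrated with a toy example. This is only a sketch under strong assumptions: the object embeddings below are random vectors and a scene embedding is simply the sum of its objects, whereas the real system learns its embeddings from images. All names (`objects`, `embed_scene`, `cosine`) are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical object embeddings; a trained network would produce these
objects = {name: rng.normal(size=16) for name in ["cup", "ball", "pen", "block"]}

def embed_scene(names):
    """Toy scene embedding: the sum of the embeddings of the objects present."""
    return sum(objects[n] for n in names)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

before = embed_scene(["cup", "ball", "pen", "block"])
after = embed_scene(["cup", "pen", "block"])  # the ball was grasped
difference = before - after

# The object whose embedding is closest to the difference is the grasped one
best_match = max(objects, key=lambda n: cosine(difference, objects[n]))
```

In this toy setting the difference vector matches the embedding of the removed object, so a nearest-neighbor lookup recovers which object left the scene.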
Anomaly Detection is central to many businesses, but hard to establish.
Take the example of a company specialized in healthcare insurance. Every day, it processes thousands of care transactions. Among them, a very small portion is fraudulent: you have to block those and investigate. But how do you detect them when you have no model and you lack technological infrastructure?
You can start by following this step-by-step article. It will help you implement, train, and deploy your own anomaly detection system. Based on Amazon SageMaker, the AWS machine learning service, the system provides you with a complete and operational baseline model for your business.
By the way, we’re preparing an article to address fraud detection using the Python graph library NetworkX. Don’t forget to follow us!
Who said that representing data with Python was painful and inefficient?
Each example comes with its own code snippet you can copy, paste and adapt to your situation.
Hope it will save you time. It did for me!
For those using Jupyter Notebooks, did you know that you could customize your notebooks?
There are several extensions you can enable/disable at will, for more modularity. Each of them will make your daily use of notebooks more efficient, convenient and neat!
One year ago, the DeepMind team released AlphaZero, a system that could beat any player at chess, shogi and Go. They have now published the full evaluation of their creation.
AlphaZero is based on a Monte-Carlo Tree Search method that is guided by a self-taught Neural Network trained through Reinforcement Learning. And it works. Stubbornly well. But it also brings a new vision to these centuries-old games.
The implications go far beyond my beloved chessboard… Not only do these self-taught expert machines perform incredibly well, but we can actually learn from the new knowledge they produce.
What experts would call risky moves (such as sacrificing valuable chess pieces) are central to AlphaZero’s style. It offers inspiring material for chess, shogi or Go players, from beginner to expert.
What should a fully autonomous driving agent do in an imminent crash situation? On a strangely curved road? Or when facing an unanticipated pedestrian?
These are tough questions when you have only trained your agent on “standard” expert driving data, because such rare situations are usually not part of those datasets.
To synthesize such situations and train their RNN, Waymo used a distorted version of expert trajectories. The resulting trajectories are then combined with additional loss terms in order to train their agent, ChauffeurNet, and make it more robust to unusual situations.
Still not convinced by self-driving cars?
Splitting a dataset into chunks that are processed in parallel by copies of a single model is called data parallelism.
Data parallelism is one of those smart moves that help our field progress. But could we go further and work out the ideal batch size for data parallelism beforehand?
The OpenAI team related the optimal batch size to the gradient noise scale (a measure of how much the gradient varies between training examples). By computing the gradient noise scale on the training dataset, we obtain the critical batch size. Intuitively, the more complex the problem is (e.g. training a Dota agent), the bigger the batches are.
A handy way to train your Neural Networks.
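As a rough illustration of the gradient noise scale, the sketch below computes the simplified form tr(Σ) / |G|², where G is the mean gradient and Σ the covariance of per-example gradients. Everything here is an assumption made for the example: the toy linear regression, the function name `simple_noise_scale` and the naive estimator, which plugs in the sample mean and is therefore biased. It is not the OpenAI team's code.

```python
import numpy as np

def simple_noise_scale(per_example_grads):
    """Naive estimate of tr(Sigma) / |G|^2 from an (n_examples, n_params) array."""
    g_mean = per_example_grads.mean(axis=0)                     # estimate of G
    trace_sigma = per_example_grads.var(axis=0, ddof=1).sum()   # trace of the covariance
    return trace_sigma / np.dot(g_mean, g_mean)

# Toy linear regression data (made-up values)
rng = np.random.default_rng(0)
n, d = 2000, 5
X = rng.normal(size=(n, d))
true_w = rng.normal(size=d)
y = X @ true_w + rng.normal(scale=0.5, size=n)

# Per-example gradient of the squared error (x.w - y)^2 at w = 0
w = np.zeros(d)
residuals = X @ w - y
per_example_grads = 2 * residuals[:, None] * X

noise_scale = simple_noise_scale(per_example_grads)
```

A large value means individual gradients disagree a lot, so averaging over more examples per step still pays off. A small value means the gradients already agree and bigger batches bring little benefit.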
How many people leave your building every working day to go to your office outside the city, pass by your favorite bakery every once in a while, and sometimes go to play tennis on Sundays? Only you.
And how many companies have access to this valuable information? Dozens.
Scary, isn’t it?
Many mobile applications require access to your location. Location data is then used as a product for companies to deliver you personalized services.
Even if this is not new information, I hope that this compelling article will get you familiar with your favorite apps’ privacy settings.
A little bit of music to conclude this best of.
I suppose you knew that an audio signal was a wave. Perhaps you also knew that it could be represented by its spectrogram. But what about Zero Crossing Rate, Spectral Rolloff or the Mel-Frequency Cepstral Coefficients?
All three are features you can use to describe an audio signal. Along with other useful features, this article will get you familiar with audio processing. It also proposes a nice application: music genre classification.
Everything with Python!
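To give a feel for two of these features, here is a minimal NumPy sketch, computed over a whole signal rather than frame by frame as audio libraries usually do. The function names, the 85% rolloff threshold and the synthetic sine waves are assumptions chosen for illustration. MFCCs involve more steps and are better left to an audio library.

```python
import numpy as np

def zero_crossing_rate(signal):
    """Fraction of consecutive sample pairs whose signs differ."""
    signs = np.signbit(signal)
    return np.mean(signs[1:] != signs[:-1])

def spectral_rolloff(signal, sample_rate, roll_percent=0.85):
    """Frequency below which roll_percent of the total spectral magnitude lies."""
    magnitudes = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    cumulative = np.cumsum(magnitudes)
    idx = np.searchsorted(cumulative, roll_percent * cumulative[-1])
    return freqs[idx]

sample_rate = 8000
t = np.arange(sample_rate) / sample_rate   # one second of samples
low = np.sin(2 * np.pi * 100 * t)          # 100 Hz tone
high = np.sin(2 * np.pi * 1000 * t)        # 1000 Hz tone

zcr_low, zcr_high = zero_crossing_rate(low), zero_crossing_rate(high)
rolloff_low = spectral_rolloff(low, sample_rate)
rolloff_high = spectral_rolloff(high, sample_rate)
```

The higher-pitched tone crosses zero more often and has a higher rolloff frequency, which is the kind of difference a genre classifier can pick up on.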