January 8, 2019

The Best of AI: New Articles Published This Month (December 2018)

10 data articles handpicked by the Sicara team, just for you.

Welcome to the December edition of our best and favorite articles in AI that were published this month. We are a Paris-based company that does Agile data development. This month, we spotted articles about deep learning, visualization, ethics and more. We advise you to have a Python environment ready if you want to follow some tutorials :). Let’s kick off with the comic of the month:

“I find that when someone’s taking time to do something right in the present, they’re a perfectionist with no ability to prioritize, whereas when someone took time to do something right in the past, they’re a master artisan of great foresight”

1 — Towards Continuous Neural Networks

It could be a revolution in Deep Learning.

Artificial Neural Networks rely on layers. One might argue that the more layers there are, the more accurate the model is. To capture the most information, each layer will represent a type of feature (from a geometrical shape to a specific concept such as a species).

But what if we replaced all the layers with calculus equations (ordinary differential equations), so that the model behaves like a network with infinitely many layers? This idea was implemented by David Duvenaud's team and earned them a “Best Paper” award at the NeurIPS conference earlier in December.

This could help define continuous-time models, especially in the medical field, where patients' health record data arrive at irregular times.

Even if the ODE net (its nickname) is at an early stage, I’m sure other studies will follow and have a huge impact!
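
One way to see the link between layers and calculus: a residual layer updates its state as h + f(h), which is exactly one Euler step of the equation dh/dt = f(h, t). The sketch below is only an illustration under simple assumptions. `dynamics` is a hypothetical stand-in for the learned function, and the fixed-step loop is a simplification rather than the solver approach used in the paper.

```python
import math

def dynamics(h, t):
    # Hypothetical stand-in for the learned function f(h, t); here dh/dt = -h.
    return -h

def residual_network(h0, n_layers):
    # A stack of residual layers: h <- h + step * f(h, t).
    # Each layer is one explicit Euler step of the ODE from t=0 to t=1.
    step = 1.0 / n_layers
    h = h0
    for layer in range(n_layers):
        h = h + step * dynamics(h, layer * step)
    return h

exact = math.exp(-1.0)  # closed-form solution of dh/dt = -h at t=1 with h(0)=1
for n_layers in (1, 10, 1000):
    # The more (and thinner) layers, the closer we get to the continuous solution.
    print(n_layers, residual_network(1.0, n_layers), exact)
```

As the number of layers grows, the stacked updates approach the continuous solution, which is the intuition behind treating depth as continuous time.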

If you want more about NeurIPS 2018, check out this selection of papers we made for you.

Read A radical new neural network design could overcome big challenges in AI — from MIT Technology Review

2 — Grasp2Vec: Robot Autonomously Learning Object Representation

With Grasp2Vec, the Google Robotics team designed an algorithm that teaches object representation to a robotic arm. By grasping a single object from an initial scene, the robot creates 3 different pieces of information: the initial scene, the final scene, and the grasped object.

The difference between the initial and final scenes corresponds to the grasped object. Building on this intuition, the team trained a model that helps with:

  • comparing objects

  • representing a specific object in the scene
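
The scene-difference intuition can be shown with a toy example. This is not the trained Grasp2Vec model: the embeddings below are made-up values, and `embed_scene` and `grasped_object` are hypothetical helpers that simply assume a scene embedding is the sum of its objects' embeddings.

```python
# Hypothetical 3-dimensional embeddings for three objects (made-up values).
OBJECT_EMBEDDINGS = {
    "cup": (1.0, 0.0, 0.0),
    "ball": (0.0, 1.0, 0.0),
    "pen": (0.0, 0.0, 1.0),
}

def embed_scene(objects):
    # Toy scene embedding: the sum of the embeddings of the objects present.
    total = [0.0, 0.0, 0.0]
    for name in objects:
        for i, value in enumerate(OBJECT_EMBEDDINGS[name]):
            total[i] += value
    return tuple(total)

def grasped_object(before, after):
    # Subtract the final scene from the initial one, then find the closest object.
    diff = tuple(b - a for b, a in zip(embed_scene(before), embed_scene(after)))

    def distance(name):
        return sum((d - e) ** 2 for d, e in zip(diff, OBJECT_EMBEDDINGS[name]))

    return min(OBJECT_EMBEDDINGS, key=distance)

print(grasped_object(["cup", "ball", "pen"], ["cup", "pen"]))
```

The same nearest-neighbor lookup covers both uses listed above: comparing objects, and finding a specific object in a scene.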

The good news is that the robot creates its own dataset by picking up object after object!

Read Grasp2Vec: Learning Object Representations from Self-Supervised Grasping — from Google AI Blog

3 — Anomaly Detection with AWS

Anomaly detection is central to many businesses, but hard to set up.

Take the example of a company specialized in healthcare insurance. Every day, it processes thousands of care transactions. Among them, a very small portion are fraudulent: you have to block them and investigate. But how do you detect them when you have no model and you lack technological infrastructure?

You can start by following this step-by-step article. It will help you implement, train, and deploy your own anomaly detection system. Based on Amazon SageMaker, the AWS machine learning service, the system provides you with a complete and operational baseline model for your business.
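
If you just want a feel for the problem before setting up any infrastructure, here is a deliberately simple local baseline. It is a swapped-in technique, not the SageMaker approach from the AWS article: a median-based outlier rule (modified z-score). The transaction amounts and the `flag_anomalies` helper are made up for illustration, and the 3.5 threshold is a common rule of thumb rather than a tuned value.

```python
import statistics

def flag_anomalies(amounts, threshold=3.5):
    # Modified z-score based on the median and the median absolute deviation (MAD).
    median = statistics.median(amounts)
    mad = statistics.median([abs(x - median) for x in amounts])
    if mad == 0:
        return []  # no spread around the median: this rule cannot flag anything
    return [
        i for i, x in enumerate(amounts)
        if abs(0.6745 * (x - median) / mad) > threshold
    ]

amounts = [20, 22, 19, 21, 23, 20, 500, 18, 22]  # made-up transaction amounts
print(flag_anomalies(amounts))  # indices of suspicious transactions
```

A rule like this only looks at one number per transaction, which is why a trained model becomes worthwhile as soon as fraud depends on several features at once.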

By the way, we’re preparing an article on fraud detection using the Python graph library NetworkX. Don’t forget to follow us!

Read Anomaly detection on Amazon DynamoDB Streams — from AWS Machine Learning Blog

4 — Data Visualization with Python

Who said that representing data with Python was painful and inefficient?

I did… until I discovered this article. Mixing examples that use matplotlib and seaborn, this best-of gathers 50 plots sorted by category: distribution, ranking, correlation, …

Each example comes with its own code snippet that you can copy-paste and adapt to your situation.

Hope it will save you time. It did for me!

Read the Top 50 matplotlib Visualizations — from Machine Learning Plus

5 — Upgrading Jupyter Notebook

For those using Jupyter Notebooks, did you know that you could customize your notebooks?

There are several extensions you can enable/disable at will, for more modularity. Each of them will make your daily use of notebooks more efficient, convenient and neat!

My favorites?

Read Jupyter Notebook Extensions — by Will Koehrsen

6 — AlphaZero… Again

One year ago, the DeepMind team released a system that could beat any player at chess, shogi and Go. They have now published the full evaluation of their creation.

AlphaZero is based on a Monte-Carlo Tree Search method that is guided by a self-taught Neural Network trained through Reinforcement Learning. And it works. Stubbornly well. But it also brings a new vision to these centuries-old games.
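
To make "guided by a neural network" more concrete, here is a minimal sketch of the selection step of such a tree search, in the spirit of the PUCT rule described in the AlphaZero papers. Each move is scored by its average value so far plus an exploration bonus weighted by the network's prior. The numbers, the `c_puct` constant and the `select_child` helper are illustrative assumptions, not DeepMind's implementation.

```python
import math

def select_child(children, c_puct=1.5):
    # children maps a move to (prior, visit_count, value_sum).
    total_visits = sum(visits for _, visits, _ in children.values())

    def score(move):
        prior, visits, value_sum = children[move]
        q = value_sum / visits if visits else 0.0  # mean value observed so far
        # Exploration bonus: large for moves the network likes but the search
        # has rarely visited, shrinking as the visit count grows.
        u = c_puct * prior * math.sqrt(total_visits) / (1 + visits)
        return q + u

    return max(children, key=score)

children = {
    "a": (0.6, 10, 5.0),  # well explored, decent value
    "b": (0.3, 0, 0.0),   # never visited, moderate prior
    "c": (0.1, 2, 1.6),   # promising value, low prior
}
print(select_child(children))
```

With these made-up statistics, the unvisited move gets picked next: the bonus term pushes the search to try moves it has not evaluated yet.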

The implications go far beyond my beloved chessboard… Not only do these self-taught expert machines perform incredibly well, but we can actually learn from the new knowledge they produce.

Garry Kasparov

What experts would call risky moves (such as sacrificing valuable chess pieces) are central to AlphaZero's style. It offers inspiring material for chess, shogi or Go players, from beginner to expert.

Read AlphaZero: Shedding new light on the grand games of chess, shogi and Go — from DeepMind

7 — A Driving Agent Inspired by Synthetic Situations

What should a fully autonomous driving agent do in an imminent crash situation? On a strangely curved road? Or when facing an unanticipated pedestrian?

These are tough questions when your agent was only trained on “standard” expert driving data, because such rare situations are usually not part of those datasets.

To synthesize such situations and train their RNN, Waymo used distorted versions of expert trajectories. The resulting trajectories are then combined with additional losses that penalize undesirable events in order to train their agent, ChauffeurNet, and make it more robust to unusual situations.
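
As a rough picture of what "distorting" an expert trajectory can mean, the sketch below pushes the middle of a path sideways while keeping its start and end fixed, so the agent sees itself off course and has to recover. It is a simplified assumption, not Waymo's actual procedure: `perturb_trajectory`, the triangular bump and the straight-line path are all hypothetical.

```python
def perturb_trajectory(points, max_offset):
    # points: list of (x, y) waypoints, at least two of them.
    # Shift y by a triangular bump that is zero at both ends and max_offset
    # in the middle, so the start and the goal stay where the expert put them.
    n = len(points)
    perturbed = []
    for i, (x, y) in enumerate(points):
        weight = 1.0 - abs(2.0 * i / (n - 1) - 1.0)
        perturbed.append((x, y + max_offset * weight))
    return perturbed

# A made-up expert trajectory: driving straight along the x axis.
expert = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0), (4.0, 0.0)]
print(perturb_trajectory(expert, 0.5))
```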

Still not convinced by self-driving cars?

Read Learning to Drive: Beyond Pure Imitation — from Waymo Team

8 — Improve AI Training through Data-Parallelism

Splitting a dataset into chunks that are processed in parallel by copies of a single model is called data-parallelism.

Data-parallelism is one of those smart techniques that speed up training. But could we go further and work out the ideal batch size beforehand?

The OpenAI team related the optimal batch size to the gradient noise scale (a measure of how much the gradient varies between training examples). By computing the gradient noise scale on the training dataset, we obtain the critical batch size. And intuitively, the more complex the problem is (e.g. training a Dota agent), the larger that critical batch size is.
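
For intuition, the simplified noise scale in the OpenAI write-up is the ratio between the total variance of per-example gradients and the squared norm of the average gradient. The sketch below computes that ratio naively on made-up gradients. `simple_noise_scale` is a hypothetical helper, and using the sample mean as the "true" gradient is a shortcut that biases the estimate on small samples.

```python
def simple_noise_scale(per_example_grads):
    # per_example_grads: one gradient vector (list of floats) per training example.
    n = len(per_example_grads)
    dim = len(per_example_grads[0])
    mean = [sum(g[d] for g in per_example_grads) / n for d in range(dim)]
    # Trace of the covariance matrix = sum of per-coordinate sample variances.
    trace_cov = sum(
        sum((g[d] - mean[d]) ** 2 for g in per_example_grads) / (n - 1)
        for d in range(dim)
    )
    grad_norm_sq = sum(m ** 2 for m in mean)
    return trace_cov / grad_norm_sq

# Made-up gradients: they mostly agree on the first coordinate.
grads = [[1.0, 0.0], [3.0, 0.0], [2.0, 1.0], [2.0, -1.0]]
print(simple_noise_scale(grads))
```

Noisy gradients relative to their mean give a large ratio, which is when averaging over bigger batches still pays off.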

A handy way to train your Neural Networks.

Read How AI Training Scales — from OpenAI blog

9 — Your Location Data is No Secret

How many people leave your building every working day to go to your office outside the city, pass by your favorite bakery every once in a while, and sometimes play tennis on Sundays? Only you.

And how many companies have access to this valuable information? Dozens.

Scary, isn’t it?

Many mobile applications require access to your location. Location data then becomes a product that companies use to deliver personalized services to you.

Even if this is not news, I hope that this compelling article will get you to familiarize yourself with your favorite apps' privacy settings.

Read Your Apps Know Where You Were Last Night — from The New York Times

10 — Audio Processing

A little bit of music to conclude this best of.

I suppose you knew that an audio signal was a wave. Perhaps you also knew that it could be represented by its spectrogram. But what about the Zero Crossing Rate, the Spectral Rolloff or the Mel-Frequency Cepstral Coefficients?

All three are features you can use to describe an audio signal. Along with other useful features, this article will get you familiar with audio processing. It also proposes a nice application: music genre classification.
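
The simplest of these features fits in a few lines. The zero crossing rate is the fraction of consecutive samples between which the signal changes sign. The sketch below uses plain Python on a synthetic tone; `zero_crossing_rate` is a hypothetical helper written for illustration, whereas the article itself relies on dedicated audio libraries.

```python
import math

def zero_crossing_rate(signal):
    # Fraction of consecutive sample pairs whose values have opposite signs.
    crossings = sum(1 for a, b in zip(signal, signal[1:]) if a * b < 0)
    return crossings / (len(signal) - 1)

# One second of a synthetic 440 Hz tone sampled at 8 kHz (made-up test signal).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 440 * t / sample_rate) for t in range(sample_rate)]
print(zero_crossing_rate(tone))
```

A pure tone crosses zero twice per period, so higher-pitched or noisier signals give a higher rate, which is part of what makes the feature useful for telling genres apart.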

Everything with Python!

Read Music Genre Classification with Python — by Parul Pandey

Thanks to Nicolas Jean, Alexandre Sapet, Raphaël Meudec, and Flavian Hautbois. 
