June 22, 2018

The Best of AI: New Articles Published This Month (May 2018)

10 data articles handpicked by the Sicara team, just for you.

Welcome to the May edition of our favorite AI articles published this month. We are a Paris-based company that does Agile data development. This month, we spotted articles about the latest breakthroughs in AI as well as some controversies. Let’s kick off with the comic of the month:

[Comic: machine learning system]

1 — “garbage in, garbage out”

While most machine learning researchers focus on improving their models, Tesla’s data scientists spend 75% of their time trying to improve their data set. This blog post explains why you should do the same if you want to build a cracking machine learning product!

The author advises you to quickly pick one of the well-known models and then concentrate on building a large, high-quality and diverse training data set. He also gives some tips to continuously improve this precious data set.

Read Why you need to improve your training data, and how to do it — from Pete Warden

2 — 3.5 billion images

Facebook seems to have understood the power of a large data set. As explained in this article, the company has used a smart approach to collect a huge amount of data for training object recognition models. You may even have contributed to this data set, since it consists of pictures people have posted on Instagram, with their hashtags as labels.

Of course, smart and efficient as this approach is, it raises some privacy issues: some people may not appreciate Facebook’s data scientists looking at their pictures.
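
The article does not detail Facebook’s pipeline, but the core idea of turning hashtags into (noisy) labels can be sketched in a few lines of Python. Everything below is a hypothetical illustration: the `hashtags_to_labels` and `build_training_set` helpers, the tiny label vocabulary and the sample posts are made up, and the real system works at a vastly larger scale.

```python
def hashtags_to_labels(hashtags, vocabulary):
    """Map a post's raw hashtags onto a fixed label vocabulary (hypothetical helper)."""
    labels = set()
    for tag in hashtags:
        # Normalize: drop the leading '#' and lowercase, so '#Dog' and '#dog' match.
        normalized = tag.lstrip("#").lower()
        if normalized in vocabulary:
            labels.add(normalized)
    return sorted(labels)


def build_training_set(posts, vocabulary):
    """Keep only posts whose hashtags hit the vocabulary; return (image_id, labels) pairs."""
    dataset = []
    for image_id, hashtags in posts:
        labels = hashtags_to_labels(hashtags, vocabulary)
        if labels:  # a post with no usable hashtag carries no label and is skipped
            dataset.append((image_id, labels))
    return dataset


# Made-up vocabulary and posts, for illustration only.
vocabulary = {"dog", "beach", "sunset"}
posts = [
    ("img_001", ["#Dog", "#tbt", "#beach"]),
    ("img_002", ["#love", "#instagood"]),
    ("img_003", ["#sunset", "#Sunset"]),
]
print(build_training_set(posts, vocabulary))
# → [('img_001', ['beach', 'dog']), ('img_003', ['sunset'])]
```

Restricting hashtags to a fixed vocabulary is one simple way to discard tags such as `#tbt` that say nothing about the image content; the labels that survive are still noisy, which is the trade-off of this kind of weak supervision.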

Read Facebook is using billions of Instagram images to train artificial intelligence algorithms — from Nick Statt

3 — GDPR and Machine Learning

Speaking of privacy issues, an important event happened this month: on May 25, the new European privacy regulation, the General Data Protection Regulation (GDPR), came into effect. Many people worry about the impact of this new regulation on the data science world, and especially on Machine Learning. This article tries to answer the most commonly asked questions on the subject.

Two questions remain partially unanswered:

  • To what extent will companies have to “explain” to their users how their algorithms work?

  • Will people have the ability to ask companies not to train their algorithms with their personal data?

Read How will the GDPR impact machine learning? — from Andrew Burt

4 — An AI journalist

Would you believe me if I told you that this Best of AI article had been written by an algorithm? Don’t worry, it was not. But this article makes me think that it will be possible one day: it is about a new algorithm developed by Salesforce that can summarize long documents in a surprisingly coherent way.

Salesforce’s data scientists use reinforcement learning techniques, scoring the output summaries with an automated evaluation metric called ROUGE.
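
ROUGE measures how many n-grams (sequences of n words) a generated summary shares with a human-written reference summary. The sketch below computes ROUGE-N precision, recall and F1 in plain Python. It assumes naive lowercase whitespace tokenization and is not the exact ROUGE variant or implementation used by Salesforce, which this post does not specify; official ROUGE tools also apply extra preprocessing such as stemming.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of the n-grams (as tuples) in a list of tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n(candidate, reference, n=1):
    """ROUGE-N precision, recall and F1 of a candidate summary against a reference.

    Tokenization is a naive lowercase whitespace split (an assumption of this sketch).
    """
    cand = ngrams(candidate.lower().split(), n)
    ref = ngrams(reference.lower().split(), n)
    # Clipped overlap: Counter '&' keeps the minimum count of each shared n-gram.
    overlap = sum((cand & ref).values())
    precision = overlap / max(sum(cand.values()), 1)
    recall = overlap / max(sum(ref.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1


reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_n(candidate, reference, n=1))  # all three scores equal 5/6 here
print(rouge_n(candidate, reference, n=2))  # all three scores equal 3/5 here
```

In a reinforcement learning setup, a score like this can serve as the reward for a sampled summary: summaries that overlap more with the reference get a higher reward, so the model is encouraged to produce them more often.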

Read An Algorithm Summarizes Lengthy Text Surprisingly Well — from Will Knight

5 — The Book of Why

Correlation does not imply causation, and that is why it is often tricky to infer causal relationships from data. But Judea Pearl wants to take up that challenge. This computer scientist and philosopher, a winner of the Turing Award, just wrote a book called The Book of Why: The New Science of Cause and Effect.

According to this article, Judea Pearl is disappointed by recent progress in machine learning, which he considers “just curve fitting”. He thinks that teaching AI to find causes is the real next step toward human-level intelligence.
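
Why is curve fitting not enough? A toy simulation makes the point. In this hypothetical model, a hidden variable `Z` drives both `X` and `Y`, while `X` has no effect at all on `Y`. On observational data the two are strongly correlated (about 0.8 in theory, given the chosen noise levels), yet when `X` is set from outside, an intervention Pearl writes as do(X), the correlation disappears. The model, its coefficients and the `simulate` helper are illustrative assumptions, not an example taken from the book.

```python
import random


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5


def simulate(n=10_000, intervene=False, seed=0):
    """Toy model: a hidden confounder Z drives both X and Y; X has no effect on Y."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        z = rng.gauss(0, 1)              # hidden common cause
        if intervene:
            x = rng.gauss(0, 1)          # do(X): X is set from outside, cut off from Z
        else:
            x = z + rng.gauss(0, 0.5)    # observational regime: X follows Z
        y = z + rng.gauss(0, 0.5)        # Y depends only on Z, never on X
        xs.append(x)
        ys.append(y)
    return pearson(xs, ys)


print(f"observed corr(X, Y):    {simulate():.2f}")
print(f"corr(X, Y) under do(X): {simulate(intervene=True):.2f}")
```

A model fitted to the observational data would happily predict `Y` from `X`, but it would be wrong about what happens when we act on `X`. That gap between seeing and doing is the one Pearl wants machines to close.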

Read To Build Truly Intelligent Machines, Teach Them Cause and Effect — from Kevin Hartnett

6 — A new Deep Learning Computer Vision toolkit

The author of this MXNet blog post explains that he and his team ran into many problems when trying to replicate experimental results from papers. To solve this issue, they built GluonCV, a new toolkit that lets any newcomer to Deep Learning try pre-trained models from important recent papers.

Not only will it help beginners learn the concepts, but it should also be very useful for an engineer who wants to quickly test a new model to see whether it suits their problem.

Read GluonCV — Deep Learning Toolkit for Computer Vision — from Mu Li

7 — “OK Google, make me an appointment at the hairdresser!”

One of the most impressive pieces of news this month was Google’s demonstration of its new technology, Google Duplex, an assistant that can call restaurants or shops to take care of booking tasks. The demo is so stunning that you may want to understand how this Deep Learning system works, and luckily, Google explains it in a blog post.

Read Google Duplex: An AI System for Accomplishing Real-World Tasks Over the Phone — Google AI Blog

8 — Using 3D maps is cheating

Until now, self-driving cars have needed dense 3D maps of the roads they drive on. These maps are used to determine the car’s precise trajectory on the road. This limitation makes it very difficult to drive on less traveled country roads, which are rarely mapped in such detail.

But according to this article, MIT researchers have, for the first time, built a prototype that does not need such maps. It relies only on standard, imprecise maps (from Google Maps) and on sensors that detect the curve of the road.

Read MIT built a self-driving car that can navigate unmapped country roads — from Andrew J. Hawkins

9 — Controversy at Google

As this article from The New York Times reports, one of Google’s projects sparked a controversy this month, both inside and outside the company. This computer vision project for the Pentagon, called Maven, consists of analyzing images taken by drones, a technology that could help automate some attacks.

Several thousand Google employees, strongly opposed to their company’s involvement in warfare technology, have signed a petition to stop the project.

Read How a Pentagon Contract Became an Identity Crisis for Google — from Scott Shane

10 — Ten free must-read books

Let us conclude this Best of AI with a Best-of-ception. The last article I want to recommend presents 10 useful books about Machine Learning and Data Science that are available online for free! You will find all you need to learn or improve in Python, Neural Networks, NLP, Data Mining, or even Bayesian statistics.

This compilation follows a first one which was written by the same author last year.

Read 10 More Free Must-Read Books for Machine Learning and Data Science — from Matthew Mayo

Thanks to Raphaël Meudec and Adil Baaj. 

