Welcome to the May edition of our best and favorite articles in AI published this month. We are a Paris-based company that does Agile data development. This month, we spotted articles about the latest breakthroughs in AI as well as some controversies. Let’s kick off with the comic of the month:
While most of the machine learning researchers focus on improving their models, Tesla’s data scientists spend 75% of their time trying to improve their data set. This blog post explains why you should do the same if you want to build a cracking machine learning product!
The author advises you to quickly pick a model among the well-known ones, and then concentrate on building a large, high-quality, and diverse training data set. He also gives some tips for continuously improving this precious data set.
Facebook seems to have well understood the power that a large data set provides. As explained in this article, the company has used a smart approach to collect a huge amount of data for training object recognition models. You may even have contributed to this data set yourself, because it consists of all the pictures people have posted on Instagram, with their hashtags as labels.
Of course, as smart and efficient as this approach is, it raises some privacy issues. Some people may not appreciate Facebook’s data scientists looking at their pictures.
Speaking of privacy issues, an important event happened this month: on May 25, the new European privacy regulation called the General Data Protection Regulation (GDPR) came into effect. Many people worry about the impact of these new laws on the data science world, and especially on Machine Learning. That is why this article tries to answer the most commonly asked questions on the subject.
Two questions remain partially unanswered:
To what extent will companies have to “explain” to their users how their algorithms work?
Will people have the ability to ask companies not to train their algorithms with their personal data?
Would you believe me if I told you that this Best of AI article was written by an algorithm? Don’t worry, it wasn’t. But this article makes me think that it will be possible one day: it is about a new algorithm developed by Salesforce which is able to summarise any long document in a surprisingly coherent way.
Salesforce’s data scientists use reinforcement learning techniques, scoring the output summaries with an automated evaluation metric called ROUGE.
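To give a feel for what ROUGE measures, here is a minimal sketch of ROUGE-1 recall: the fraction of words in a human reference summary that also appear in the generated one. This is a simplification for illustration only; the actual metric family also includes bigram (ROUGE-2) and longest-common-subsequence (ROUGE-L) variants, and real evaluations use proper tokenization and dedicated packages.

```python
from collections import Counter

def rouge_1_recall(reference, candidate):
    """ROUGE-1 recall: proportion of reference unigrams
    that also appear in the candidate summary (with clipping)."""
    ref_counts = Counter(reference.lower().split())
    cand_counts = Counter(candidate.lower().split())
    # Each reference word counts at most as many times as it occurs
    # in the candidate ("clipped" overlap).
    overlap = sum(min(count, cand_counts[word])
                  for word, count in ref_counts.items())
    return overlap / sum(ref_counts.values())

reference = "the cat sat on the mat"
candidate = "the cat lay on the mat"
print(rouge_1_recall(reference, candidate))  # 5 of the 6 reference words match
```

Because a score like this can be computed automatically, it can serve as a reward signal for the reinforcement learning loop described in the article.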
Correlation does not imply causation, and that is why it is often tricky to infer causal relationships from data. But Judea Pearl wants to take up that challenge. This computer scientist and philosopher, a winner of the Turing Award, just wrote a book called The Book of Why: The New Science of Cause and Effect.
As I read in this article, Judea Pearl is disappointed by the latest advances in machine learning, which he dismisses as “just curve fitting”. He thinks that teaching AI to find causes is the real next step toward human-level intelligence.
The author of this MXNet blog post explained that he and his team had encountered many problems when trying to replicate experimental results from papers. In order to solve this issue, they built GluonCV, a new toolkit designed to allow any newcomer to the Deep Learning field to try pre-trained models from recent important papers.
Not only will it be useful for beginners learning the concepts, but it should also be very helpful for engineers who want to quickly test a new model to see whether it could be suitable for their problem.
One of the most impressive pieces of news this month was Google’s demonstration of their new technology called Google Duplex, an assistant able to call restaurants or shops to handle booking tasks on your behalf. As stunning as the demo is, you may want to understand a bit of how this Deep Learning algorithm works. Luckily, Google explained it in their blog post.
Until now, self-driving cars have always needed dense 3D maps of roads in order to drive on them. These maps were used to determine the precise trajectory of the car on those roads. This limitation made it really difficult to drive on less-traveled country roads.
But as this article reports, MIT researchers have for the first time built a prototype that does not need such maps. It uses only standard, imprecise maps (from Google Maps) and sensors to detect the curve of the road.
As reported in this article from The New York Times, this month saw a controversy over one of Google’s projects, which has raised debate both inside and outside the company. This computer vision project for the Pentagon, called Maven, consists of analyzing images taken from drones in order to automate some attacks.
Several thousand Google employees, strongly opposed to their company’s participation in warfare technologies, have signed a petition to stop the project.
Let us conclude this Best of AI with a Best-of-ception. The last article I wanted to recommend presents 10 useful books about Machine Learning and Data Science which are available online for free! You will find all you need if you want to learn or improve in Python, Neural Networks, NLP, Data Mining, or even Bayesian statistics.
This compilation follows a first one which was written by the same author last year.
Git Branch Control when Deploying on AWS with Serverless Framework
Use a Serverless Plugin to check your Git branch before deploying on AWS.
Introduction to Deep Q-learning with SynapticJS & ConvNetJS
An application to the Connect 4 game.
Publish Data Outside Your Data Lake with a Spark Connector
Feedback on implementing a Spark connector for Tableau.