Welcome to the February edition of our favorite AI articles published this month. We are a Paris-based company that does Agile data development. This month, we spotted articles about Natural Language Processing, understanding neural network training, and what it means to be a data scientist in 2019. We advise you to have a Python environment ready if you want to follow some of the tutorials :). Let’s kick off with the comic of the month:
OpenAI trained a neural network they do not want to share. Why? This neural network is able to generate a completely fake news story from a single sentence. In other words, it could be a means of mass-producing fake news.
This article presents the results they obtained by training their neural network to predict the next word, using 40GB of text. The results are great, but a question remains: what if these results fall into the wrong hands?
OpenAI justifies its decision not to share the trained model: they “see this current work as potentially representing the early beginnings of such concerns”.
This is, indeed, one of the first examples of a model that both represents a potential threat and would be easily accessible to a large part of the population. To me, it definitely marks a step towards AI responsibility.
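To make the "predict the next word" objective concrete, here is a minimal sketch in Python. It is not OpenAI's approach: their model is a large neural network, whereas this toy simply counts which word follows which in a tiny made-up corpus. The function names and the corpus are assumptions chosen for illustration.

```python
from collections import Counter, defaultdict


def train_bigram_model(text):
    """Count, for each word, which words follow it in the training text."""
    words = text.lower().split()
    followers = defaultdict(Counter)
    for current_word, next_word in zip(words, words[1:]):
        followers[current_word][next_word] += 1
    return followers


def predict_next_word(followers, word):
    """Return the most frequent follower of `word`, or None if unseen."""
    counts = followers.get(word.lower())
    if not counts:
        return None
    return counts.most_common(1)[0][0]


def generate(followers, first_word, length=5):
    """Greedily extend a sentence one predicted word at a time."""
    sentence = [first_word]
    for _ in range(length - 1):
        next_word = predict_next_word(followers, sentence[-1])
        if next_word is None:
            break  # no known continuation for this word
        sentence.append(next_word)
    return " ".join(sentence)


# Toy corpus (hypothetical); a real model would be trained on far more text.
corpus = "the cat sat on the mat the cat sat on the sofa"
model = train_bigram_model(corpus)
print(predict_next_word(model, "cat"))
print(generate(model, "the", length=4))
```

Generating text is then just repeated next-word prediction, which is why a model that is very good at this single task can produce a whole article from one sentence.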
Fighting fake news is a topic of increasing importance. This article describes techniques and obstacles for separating the wheat from the chaff.
Among the obstacles described, the bias induced by topic frequency in the training set is the most important. Indeed, texts containing “Trump” or “Clinton” are more likely to be flagged as fake news. Training neural networks that perform well on unseen topics is the real challenge here.
This article makes accessible an important topic that, in my opinion, everyone should be aware of.
Read Peering under the hood of fake-news detectors — from Rob Matheson.
Data science can bring real value to companies if it is efficiently plugged into their existing business processes. That is the point made by the author, who explains why data science is hard to bring to production.
Data science is demanding in terms of resources, so developers and operations have to work together to deliver new, stable features to the customer quickly. Breaking down silos within the team and getting the whole company on board with automated decision-making are, in my opinion, prerequisites for its successful integration.
Also, his note about lean thinking and efficiency when implementing IT infrastructure is, in my opinion, a key to success.
Read Why Is It so Hard to Put Data Science in Production? — from Sebastian Neubauer.
This article goes straight to the point. Today, there is a common misconception about what the daily work of a data scientist really is. This is particularly important for the hundred thousand junior data scientists currently aiming to start a professional career in this field. No, mastering a few state-of-the-art algorithms will not be sufficient.
However, there is still hope! Here are the skills to master in order to break into the data industry and thrive as a data scientist. I really like how honest the author is about her job. There is no free lunch, so you will need hard work and patience.
Yet, by tackling the real issues, bird by bird, things may get done faster than you think. “Don’t let the hype overwhelm you.”
“The cost of turbines has plummeted and its adoption has surged”: this is why DeepMind thinks the conditions are met to boost the value of wind power. For this, Deep Learning might be of great help.
Why? Because accurate wind predictions will allow turbine owners to optimize their commitments to the power grid. The more the energy can be scheduled, the more valuable it is to the grid.
This article is short, but I found it effective and interesting. It gives context, arguments and a glimpse of the different steps of the wind power process to which DeepMind is going to bring value. Thumbs up for carbon-free technologies!
Pedagogical and beautiful. The basis of this piece is another article presenting a neural network that generates beautiful combinations of colors. While trying to reproduce that example, the author ran into several problems that led to poor-quality images.
He therefore presents the different steps of his investigation, providing intermediate results and hypotheses. Will the enhancements and modifications improve the quality of the results over the course of the article?
Code to implement your own NN and make your improvements is provided as a Jupyter Notebook so you have no excuse for not being enough of an artist anymore!
This amazing post is a must-read for anyone who wants to start working with TensorFlow. The author covers 7 architectural paradigms, from MLP to Deep Reinforcement Learning, and, above all, provides code examples to implement each of them.
This article presents the material for a course given at MIT in a really clear and pedagogical way. I hope you will enjoy it and not only store it in your favorite links!
Video games rapidly became a playground for challenging algorithms. StarCraft has been put in the spotlight because it is both a fabulously successful and a really challenging game. It involves partially available information, a mix of short- and long-term strategies, and latency between actions and their effects.
After a brief description of the neural network used during the learning phase, the article details the network's learning path with visual and interactive schematics.
It is also really interesting to compare the way AlphaStar plays with the way real players do. The algorithm outclasses humans in some interesting areas, such as the number of actions per minute or context switching.
This article presents a simple but great idea that could help manually fix biased neural networks.
Researchers from MIT worked on a tool (NeuroX) that ranks the neurons of a neural network trained for language translation. The goal of this ranking is to measure the importance of each neuron in the translation process.
Ablating an important neuron drastically decreases the performance of the network, while ablating a less important one has little impact. This could make it possible to be more targeted when fighting bias in a trained network. Spot the highest-ranked neurons responsible for word gender and ablate them: congratulations, your neural network is not sexist anymore!
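The ablation idea can be illustrated with a small Python sketch. This is not NeuroX's ranking method, which the article describes in more detail. It is a toy, assuming a one-hidden-layer network with hand-picked weights, where each neuron is switched off in turn and ranked by how much the error grows. All names and values below are hypothetical.

```python
def relu(value):
    return max(0.0, value)


def forward(inputs, hidden_weights, output_weights, ablated=None):
    """Run a one-hidden-layer network, skipping the ablated neuron if any."""
    total = 0.0
    for index, (weights, out_weight) in enumerate(zip(hidden_weights, output_weights)):
        if index == ablated:
            continue  # ablation: this neuron contributes nothing
        activation = relu(sum(w * x for w, x in zip(weights, inputs)))
        total += out_weight * activation
    return total


def mean_squared_error(samples, targets, hidden_weights, output_weights, ablated=None):
    errors = [
        (forward(x, hidden_weights, output_weights, ablated) - y) ** 2
        for x, y in zip(samples, targets)
    ]
    return sum(errors) / len(errors)


def rank_neurons_by_ablation(samples, targets, hidden_weights, output_weights):
    """Rank neurons by the error increase caused by removing each one."""
    baseline = mean_squared_error(samples, targets, hidden_weights, output_weights)
    impact = []
    for index in range(len(hidden_weights)):
        error = mean_squared_error(
            samples, targets, hidden_weights, output_weights, ablated=index
        )
        impact.append((index, error - baseline))
    return sorted(impact, key=lambda item: item[1], reverse=True)


# Hand-picked toy weights: neuron 0 matters a lot, neuron 1 a little, neuron 2 not at all.
hidden_weights = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]
output_weights = [2.0, 0.1, 0.0]
samples = [[1.0, 2.0], [2.0, 1.0], [3.0, 3.0]]
# Targets are the intact network's own outputs, so the baseline error is zero.
targets = [forward(x, hidden_weights, output_weights) for x in samples]

ranking = rank_neurons_by_ablation(samples, targets, hidden_weights, output_weights)
for index, increase in ranking:
    print(f"neuron {index}: error increase {increase:.4f}")
```

In this toy, the neuron with the largest output weight comes first in the ranking, and the neuron whose output weight is zero can be removed at no cost.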
Read Putting Neural Networks Under the Microscope — from Rob Matheson
Uber generates “more than a hundred petabytes of raw data” per day. Their need for a reliable, scalable and maintainable system shared between operations teams and machine learning engineers is therefore clear. This led the company to develop Piper, their centralized workflow management system.
This glimpse of the tool Uber is building to suit its needs is really interesting. Managing “tens of thousands of workflows running hundreds of thousands tasks a day” requires continuous improvement. The improvements Uber is iteratively bringing to this tool are clearly presented in this article, which should interest anyone working with data systems.