Welcome to the November edition of our favorite AI articles published this month. We are a Paris-based company doing Agile data development.
This month, we spotted articles about AI that can identify who wrote each scene in Shakespeare’s Henry VIII, and teach non-native speakers how to pronounce English words! Let’s start, as usual, with the comic of the month:
In a recent article, researchers describe how they trained machine-learning algorithms to predict which features of a song influence people’s emotional responses.
They predicted brain and heart activity, as well as other physiological responses, using musical features such as timbre, harmony, and dynamics.
This work helps us understand how music affects human experience and has applications in music emotion recognition and neuroscience.
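The pipeline described above boils down to regressing a physiological response onto hand-crafted audio features. Here is a minimal sketch of that idea with simulated data and closed-form ridge regression; the feature names and the data are illustrative assumptions, not the paper’s actual model or dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 songs x 3 hypothetical features: [tempo, spectral centroid (timbre proxy), chord changes]
X = rng.normal(size=(200, 3))
true_w = np.array([0.8, 0.5, -0.3])          # hidden ground-truth weights
y = X @ true_w + 0.1 * rng.normal(size=200)  # simulated emotional-response score

# Ridge regression in closed form: w = (X^T X + lam*I)^-1 X^T y
lam = 1e-2
w = np.linalg.solve(X.T @ X + lam * np.eye(3), X.T @ y)

print(np.round(w, 2))  # recovered weights, close to true_w
```

With enough songs, the fitted weights recover how strongly each feature drives the response, which is the kind of relationship the study quantifies.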
How can you improve your English pronunciation when, like me, you do not always understand what you are mispronouncing? A startup used machine learning to tackle this challenge. Blue Canoe created a mobile app that prompts its users to repeat sentences. Speech-recognition technology then analyzes the recordings, and machine-learning models point out the differences. When users spend 10 minutes per day on the app, personalized feedback tells them precisely how they mispronounced words. The startup started by digitizing a 20-year-old methodology called the Color Vowel System, then hired linguists to listen to users’ recordings and tag the problems. These tagged recordings are then used to improve the machine-learning models.
While California last month became the third state to regulate police use of facial-recognition software, Police Scotland this month unveiled a new drone that uses computer vision to search for missing and vulnerable people, the BBC reported. Its recognition software is lightweight enough to run on a smartphone and relies on an optical camera and a heat-detecting sensor. Police Scotland’s air support unit detailed aspects of its drone to argue it will not be used to spy on citizens: “We’ll comply fully with all the human rights legislation — in fact a data protection impact assessment has been carried out and we review that yearly. Also, before we deploy we’ll use social media to tell the public this is what we’re doing.” In addition, its blue light and the sound of its rotors should alert people to its presence, the BBC notes.
PySlowFast, Facebook’s video-recognition system, is now available on GitHub, and its mechanisms are explained in a preprint paper. The main intuition behind this system is to reproduce the behavior of primate retinal cells: some cells operate at low frequency and focus on fine details, while others respond to swift changes. Transposed to video recognition, the video is processed at both a low and a high temporal rate: the lower rate captures static areas, and the higher rate captures dynamic ones. The model was evaluated on two popular datasets, DeepMind’s Kinetics-400 and Google’s AVA, and achieved state-of-the-art results on both.
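The two-rate idea can be sketched in a few lines: the same clip is sampled at two temporal densities, a slow pathway with few frames and a fast pathway with many. This is a minimal illustration of the sampling scheme, not the actual PySlowFast API; the function name and parameters are assumptions.

```python
import numpy as np

def two_rate_sampling(video, alpha=4, slow_frames=4):
    """video: array of shape (T, H, W, C). Returns (slow, fast) clips.

    The fast pathway sees alpha times more frames than the slow one,
    mirroring the SlowFast intuition of fine detail vs swift motion."""
    fast_frames = slow_frames * alpha
    t = video.shape[0]
    slow_idx = np.linspace(0, t - 1, slow_frames).astype(int)
    fast_idx = np.linspace(0, t - 1, fast_frames).astype(int)
    return video[slow_idx], video[fast_idx]

video = np.zeros((64, 224, 224, 3))  # dummy 64-frame clip
slow, fast = two_rate_sampling(video)
print(slow.shape, fast.shape)        # (4, 224, 224, 3) (16, 224, 224, 3)
```

In the real network, each pathway feeds its own 3D-convolutional branch, with lateral connections fusing the two.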
On November 5, OpenAI finally released the largest version of its controversial model GPT-2, claiming it has found no “strong evidence of misuse so far”. GPT-2 is a deep-learning model able to produce credible text from a minimal prompt (demo here). The full version was not released last February because OpenAI was concerned it could be used to automatically produce fake news (summary of the debate here). They justified this delayed release with the following arguments:
this model version has only a marginally higher “credibility score” than the already released versions (according to a survey by Cornell University).
they acknowledge that “GPT-2 can be fine-tuned for misuse” but argue that “despite having low detection accuracy on synthetic outputs, ML-based detection methods can give experts reasonable suspicion that an actor is generating synthetic text”
they conducted in-house detection research and developed a model that detects ~95% of 1.5B-parameter GPT-2-generated text. By releasing this version they aim “to aid the study of research into the detection of synthetic text, although this does let adversaries with access better evade detection”.
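ML-based detection of synthetic text is, at its core, a text-classification problem. The toy below trains a naive Bayes classifier over word counts on two tiny hand-made corpora standing in for human and generated text; it is only an illustration of the approach, not OpenAI’s released detector, and the corpora are invented.

```python
import math
from collections import Counter

human = ["the cat sat on the mat", "she walked to the old market"]
synthetic = ["the the model model outputs outputs text",
             "text text is is generated generated"]

def word_counts(docs):
    c = Counter()
    for d in docs:
        c.update(d.split())
    return c

h_counts, s_counts = word_counts(human), word_counts(synthetic)
vocab = set(h_counts) | set(s_counts)

def log_likelihood(text, counts):
    total = sum(counts.values())
    # Laplace smoothing so unseen words do not zero out the probability
    return sum(math.log((counts[w] + 1) / (total + len(vocab)))
               for w in text.split())

def predict(text):
    return ("synthetic"
            if log_likelihood(text, s_counts) > log_likelihood(text, h_counts)
            else "human")

print(predict("the model model generated text"))  # prints "synthetic"
```

A real detector uses far richer features (or a fine-tuned language model) over millions of samples, but the decision rule, comparing class likelihoods, is the same.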
The French version of BERT has been released on the huggingface/transformers repo! BERT, or Bidirectional Encoder Representations from Transformers, is a language-representation pre-training method that has obtained state-of-the-art results on a wide array of Natural Language Processing tasks (Google’s explanation here). This French version was trained on 138 GB of French text and is available in both PyTorch and TensorFlow 2. This release is the result of a collaboration between Facebook AI, INRIA and Sorbonne Université.
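BERT-style pre-training asks the model to fill in masked words using context on both sides. The snippet below is a drastically simplified, self-contained stand-in for that objective (not the released model): it scores each candidate word by how well it fits both its left and right neighbors, using bigram counts from a tiny invented French corpus.

```python
from collections import Counter, defaultdict

corpus = "le chat dort le chat mange le chien dort bien".split()

# Count word-to-word transitions in the toy corpus
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

vocab = set(corpus)

def fill_mask(left, right):
    # Score each candidate by how well it fits BOTH neighbors,
    # mimicking the bidirectional conditioning of BERT-style models
    return max(vocab, key=lambda w: bigrams[left][w] * bigrams[w][right])

print(fill_mask("le", "dort"))  # prints "chat"
```

The real model replaces these counts with a deep Transformer trained on 138 GB of text, but the task, predicting a masked token from bidirectional context, is the same.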
In an interesting blog post, DeepMind explained its approaches to implementing recommendation algorithms for the Google Play Store, in order to “help users discover personalized apps”. The first approach, based on an LSTM (a neural network used to process sequences), was replaced by Transformers, which improved model performance but also increased the training cost. The third and final solution was to implement “an efficient additive attention model that works for any combination of sequence features, while incurring low computational cost”. The post also introduced the recommendation bias problem and how they deal with it: “For instance, if app A is shown in the Play Store 10 times more than app B, it’s more likely to be installed by the user, and thus more likely to be recommended by our model”. They detailed the refinements they introduced when re-ranking recommendations and optimizing for multiple objectives, such as relevance, popularity, or personal preferences.
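To make the efficiency argument concrete, here is a minimal numpy sketch of additive attention pooling over a sequence of interaction embeddings: each item gets a scalar score from a small learned projection, and the sequence is summarized as a softmax-weighted sum, costing O(n) in sequence length rather than the O(n²) of pairwise self-attention. The shapes and parameters are illustrative assumptions, not DeepMind’s code.

```python
import numpy as np

rng = np.random.default_rng(0)
seq_len, d = 5, 8                    # 5 past interactions, 8-dim embeddings
x = rng.normal(size=(seq_len, d))

W = rng.normal(size=(d, d)) * 0.1    # hypothetical learned parameters
v = rng.normal(size=d) * 0.1

scores = np.tanh(x @ W) @ v                       # one scalar score per item
weights = np.exp(scores) / np.exp(scores).sum()   # softmax over the sequence
summary = weights @ x                             # (d,) user representation

print(weights.round(3), summary.shape)
```

The resulting summary vector can then feed a ranking head, regardless of which sequence features were concatenated into the embeddings.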
Some literary analysts believe that Shakespeare did not write his play Henry VIII alone but was helped by John Fletcher, the writer who replaced him as playwright of the King’s Men after his death. In the mid-nineteenth century, the literary analyst James Spedding already proposed a division based on the use of eleven-syllable lines. In 1962, an influential analyst divided the play between Shakespeare and Fletcher based on their distinctive word choices, for example Fletcher’s use of ye for you and ’em for them. And last month, Petr Plecháč of the Czech Academy of Sciences in Prague claimed to have studied the problem using machine learning to attribute authorship at a finer granularity than whole scenes: “Our results highly support the canonical division of the play between William Shakespeare and John Fletcher proposed by James Spedding”.
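The word-choice approach from 1962 can be sketched in a few lines of stylometry: score a passage by which author’s marker words it uses more often. This is only a caricature of the method (Plecháč’s model uses many features and a trained classifier); the marker sets below are just the two examples cited above.

```python
import string

MARKERS = {
    "Fletcher": {"ye", "em"},        # 'em normalizes to "em" after stripping
    "Shakespeare": {"you", "them"},
}

def attribute(passage):
    # Lowercase and strip punctuation so "ye," and "'em" match the markers
    words = [w.strip(string.punctuation) for w in passage.lower().split()]
    scores = {author: sum(w in markers for w in words)
              for author, markers in MARKERS.items()}
    return max(scores, key=scores.get)

print(attribute("I tell ye, 'em all shall hear of it"))  # prints "Fletcher"
```

A modern model replaces the two hand-picked markers with hundreds of frequency features and a rolling window over the text, which is how sub-scene attribution becomes possible.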
A startup named Heliogen aims to increase solar energy production by using advanced computer-vision software. The technology’s impact should not be limited to increasing energy production: by accurately aligning mirrors, Heliogen expects to reach temperatures over 1,000 degrees Celsius. Such high temperatures could serve the industrial applications that currently account for roughly 75 percent of the energy demand met by fossil fuels. In addition, this technology could ultimately provide an alternative to gasoline for powering automobiles by “splitting carbon dioxide and water molecules to produce clean-burning fuels like hydrogen”, the article explains. This AI-backed technology could therefore be a step toward using solar energy in fields that are still dependent on fossil fuels.
In an article submitted on November 11, three researchers explained how they obtained 87.4% top-1 accuracy on ImageNet, 1.0% better than the state-of-the-art model that requires 3.5B weakly labeled Instagram images. ImageNet is a famous image database often used to measure the performance of image-classification neural networks. To achieve this result, they first trained an EfficientNet on labeled ImageNet images and used it to label 300M unlabeled images (creating pseudo-labels, as these labels are not the ground truth but a prediction). This first EfficientNet is called the teacher. They then trained a larger EfficientNet, called the student, to classify both the ImageNet images and the newly labeled ones. They iterated this process by using the larger EfficientNet as the teacher, i.e., to re-label the dataset of 300M unlabeled images. While training the student, they injected noise such as data augmentation, dropout, and stochastic depth, so that the student network is forced to learn harder from the pseudo-labels. During the pseudo-labelling of the 300M unlabeled images, however, the teacher is not noised, so that the pseudo-labels are as good as possible. The researchers stressed that the “main difference between [their] work and prior works is that [they] identify the importance of noise, and aggressively inject noise to make the student better”. The following results show the impact of noise on the network’s performance:
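The teacher–student loop described above has a simple structure. The sketch below reproduces it with a toy stand-in: a nearest-centroid “model” on 1-D points, with input jitter standing in for the augmentation, dropout, and stochastic depth the paper injects. The data and the model are invented for illustration; only the loop structure mirrors the method.

```python
import numpy as np

rng = np.random.default_rng(0)

labeled_x = np.array([-2.0, -1.5, 1.5, 2.0])
labeled_y = np.array([0, 0, 1, 1])
unlabeled_x = rng.normal(scale=2.0, size=50)     # stands in for the 300M images

def fit(x, y):
    # "Training" here: one centroid per class
    return np.array([x[y == c].mean() for c in (0, 1)])

def predict(centroids, x):
    return np.abs(x[:, None] - centroids[None, :]).argmin(axis=1)

teacher = fit(labeled_x, labeled_y)
for _ in range(3):                               # iterate teacher -> student
    pseudo_y = predict(teacher, unlabeled_x)     # teacher is NOT noised
    x_all = np.concatenate([labeled_x, unlabeled_x])
    y_all = np.concatenate([labeled_y, pseudo_y])
    noisy_x = x_all + 0.1 * rng.normal(size=x_all.shape)  # noise the student's inputs
    student = fit(noisy_x, y_all)
    teacher = student                            # student becomes the next teacher

print(predict(teacher, np.array([-3.0, 3.0])))   # prints [0 1]
```

The key asymmetry, a clean teacher producing pseudo-labels for a noised student, is exactly what the quoted passage identifies as the paper’s main contribution.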
Do you need data science services for your business? Do you want to apply for a data science job at Sicara? Feel free to contact us; we would be glad to welcome you to our Paris office.
About Convolutional Layer and Convolution Kernel
A story of Convnet in machine learning from the perspective of kernel sizes.
Artificial Intelligence: The End of Law Firms? (In French)
The next Silicon Valley unicorn will likely be a law firm.
3 Steps to Improve the Data Quality of a Data lake
From Customising Logs in the Code to Monitoring in Kibana