AI Scholar: Weekly 3

Source: Deep Learning on Medium


Cross-domain NLP with BioBERT, training systems to detect deepfakes, split mobile-cloud deep learning systems, real-time collision threat assessment, and more.

BioBERT: Bidirectional Encoder Representations from Transformers for Biomedical Text Mining


Driven by medical innovations and the rapid growth of biomedical content, biomedical text mining is becoming increasingly important. Progressive developments in machine learning, combined with active research and engineering, have made it possible to extract valuable information from biomedical literature. The icing on the cake: deep learning is gradually boosting the field through the development of effective biomedical text mining models.

Nonetheless, deep learning models demand massive amounts of training data, which is scarce in biomedical fields. As you've probably guessed, the result is that progress has been slow. But BioBERT has pushed the boundaries of what is possible.

Potential Uses and Effects

What does all this mean for computational biologists and the entire community of AI researchers and engineers? BioBERT can be used to help advance the design and development of efficient and accurate bioinformatics text mining tools whose demand is significantly on the rise.

Additionally, while BERT was built for general-purpose language understanding, BioBERT will greatly drive research developments in the biomedical industry. The code can easily be adapted to support other niche text domains, such as machine learning papers.
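
One reason domain adaptation matters is that BERT-family models keep a fixed sub-word vocabulary, so unseen biomedical terms get split into generic pieces. The toy tokenizer below sketches WordPiece-style greedy longest-match splitting with a tiny hypothetical vocabulary (the real BioBERT vocabulary has ~30k entries); it is an illustration of the mechanism, not BioBERT's actual tokenizer.

```python
def wordpiece(word, vocab):
    """Toy greedy longest-match-first sub-word split (WordPiece-style)."""
    tokens, start = [], 0
    while start < len(word):
        end, match = len(word), None
        while end > start:
            piece = word[start:end]
            if start > 0:
                piece = "##" + piece  # continuation pieces are ## prefixed
            if piece in vocab:
                match = piece
                break
            end -= 1
        if match is None:
            return ["[UNK]"]  # no piece matched: unknown token
        tokens.append(match)
        start = end
    return tokens

# Hypothetical mini-vocabulary for illustration only.
VOCAB = {"immuno", "##glob", "##ulin", "anti", "##body"}
print(wordpiece("immunoglobulin", VOCAB))  # ['immuno', '##glob', '##ulin']
print(wordpiece("antibody", VOCAB))        # ['anti', '##body']
```

A domain-specific model sees these fragments often enough during pre-training to learn useful representations for them, which is the gap BioBERT's biomedical-corpus pre-training addresses.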

Read More: https://arxiv.org/abs/1901.08746v2

GitHub: https://github.com/dmis-lab/biobert

Intelligent Mobile/Cloud Computing Services with Deep Learning

A new deep learning architecture, BottleNet, significantly reduces the size of the features that need to be sent to the cloud. It is trained using a compression-aware method that allows considerable bit savings while still providing acceptable accuracy.

When tested across different wireless network settings, BottleNet achieves 30× improvements in end-to-end latency and 40× improvements in mobile energy consumption, with negligible accuracy loss (less than 2%).

The model aims to select the best partition point between mobile and cloud to minimize end-to-end latency and mobile energy consumption at run time. It can adapt to any DNN architecture, wireless network setting, hardware platform, and mobile or server load level.
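
The source of the bit savings is easy to see with back-of-envelope arithmetic: a bottleneck layer at the partition point shrinks both the channel count and the numeric precision of the transmitted feature map. The numbers below are illustrative assumptions, not the paper's exact configuration.

```python
# Hypothetical intermediate feature map of a ResNet-style network.
H, W = 56, 56             # spatial size at the mobile/cloud split point
c_in, bits_in = 256, 32   # channels and float precision without compression
c_out, bits_out = 12, 8   # bottleneck-reduced channels, quantized precision

before = H * W * c_in * bits_in    # bits sent to the cloud, uncompressed
after = H * W * c_out * bits_out   # bits sent with the bottleneck in place

print(f"bit savings: {before / after:.1f}x")
```

With these assumed numbers the savings land in the same ballpark as the ~84× figure quoted below; the actual ratio depends on where the network is partitioned and how aggressively the bottleneck compresses.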

Potential Uses and Effects

Without BottleNet, the traditional cloud-based ResNet-50 model achieves 76 percent accuracy on the miniImageNet dataset. With less than 2 percent accuracy loss, the BottleNet collaborative intelligence framework achieves approximately 84× bit savings in mobile-cloud applications. This means considerably lower communication costs for transmitting features between mobile and cloud, paving the way for low-cost development and deployment of DNNs for mobile-cloud applications in the near future.

Read more: https://arxiv.org/abs/1902.01000v1

New Method for Predicting Automotive Collision Risk in Real-time

A group of researchers has recently presented a low-cost methodology for real-time automotive risk prediction using minimal inputs and reasonable computational power. The proposed model can predict automotive collision risk from a monocular video source. The system integrates object detection, object tracking, and state estimation components to produce real-time predictions of automotive risk for the next 10 seconds at over 5 Hz. The system is also modular, so alternative components can be substituted with minimal effort.
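
The modular detect-track-estimate-predict structure can be sketched with a much simpler stand-in for the final stage: a constant-velocity time-to-collision check over the 10-second horizon. This is a hypothetical illustration of the pipeline's shape, not the paper's learned risk model, and the `Track` fields are assumed outputs of the upstream detection and tracking stages.

```python
from dataclasses import dataclass

@dataclass
class Track:
    """Assumed output of the detection/tracking/state-estimation stages."""
    distance_m: float        # estimated range to the tracked vehicle
    closing_speed_ms: float  # positive when the gap is shrinking

def time_to_collision(track):
    """Constant-velocity time-to-collision; infinite when the gap grows."""
    if track.closing_speed_ms <= 0:
        return float("inf")
    return track.distance_m / track.closing_speed_ms

def risk_flag(track, horizon_s=10.0):
    # Flag a threat if a collision is possible within the prediction horizon.
    return time_to_collision(track) < horizon_s

print(risk_flag(Track(40.0, 8.0)))   # 5 s to collision -> True
print(risk_flag(Track(120.0, 5.0)))  # 24 s to collision -> False
```

Because each stage only consumes the previous stage's output, any component (for example, the risk predictor) can be swapped without touching the rest, which is the modularity the paragraph describes.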

Potential Uses and Effects

The proposed framework predicts intermediate automotive collision risk for a driver at low cost and, in so doing, helps advance the development of efficient, extendable, and effective automotive collision risk systems.

Assuming the parameters of an autonomous car's camera, such as focal length and mounting position, are known beforehand, the system can greatly reduce the need for costly LIDAR sensors. Future developments could support full-view scenes through multiple cameras, and Generative Adversarial Imitation Learning could be incorporated to generate more realistic driving scenarios for simulation-based risk prediction.

Read more: https://arxiv.org/pdf/1902.01293v1.pdf

FaceForensics: Learning to Detect Manipulated Facial Images

The last two decades have seen a surge of interest in synthetic image generation and manipulation, much of it aimed at developing and advancing intelligent AI systems. But this rapid progress in virtual image synthesis also has a dark side, with numerous negative implications.

Anyone can now use computer graphics and vision techniques for facial expression and identity manipulation. This has already led to a loss of trust in digital content across the globe and might cause further damage through the creation of fake news and the dissemination of false information.

New Approach to Detect Manipulated Images

For both machines and humans, properly executed image manipulations are difficult to detect. But this new research demonstrates an approach that outperforms human observers and can automatically detect such manipulations. The approach leverages a key advance in deep learning, the ability to learn image features with convolutional neural networks (CNNs), to handle the detection problem.

The researchers generated datasets of manipulations based on FaceSwap, Face2Face, and DeepFakes, and used domain-specific knowledge to train forgery detector models in a supervised fashion. A user study was then conducted to assess the effectiveness of the selected face manipulation methods and the ability of human observers to detect fake images.
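
The supervised setup boils down to: generate labeled real/fake pairs, then train a binary classifier on features that expose manipulation artifacts. The sketch below uses a toy perceptron on two synthetic, hand-crafted "artifact" features as a minimal stand-in for the paper's CNN; the feature names and distributions are invented for illustration.

```python
import random

random.seed(0)

def make_sample(fake):
    # Hypothetical features: fakes get a higher simulated blending-artifact
    # score; the second feature is pure noise. Not the paper's real features.
    base = 0.8 if fake else 0.2
    return [base + random.gauss(0, 0.05), random.gauss(0, 0.05)], int(fake)

data = [make_sample(i % 2 == 0) for i in range(200)]  # balanced real/fake

# Simple perceptron training loop standing in for CNN training.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(50):
    for x, y in data:
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = y - pred
        w = [w[0] + lr * err * x[0], w[1] + lr * err * x[1]]
        b += lr * err

acc = sum((1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == y
          for x, y in data) / len(data)
print(f"training accuracy: {acc:.2f}")
```

In the real system, the CNN learns such discriminative features from pixels instead of being handed them, which is why it can outperform human observers on high-quality forgeries.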

Potential Uses and Effects

As fake images and videos become more commonplace, society will have to find ways of dealing with fake news. The proposed algorithm is the first step in a long journey of securing our information sources.

Read More: https://arxiv.org/abs/1901.11528v1

Artificial Agent Capable of Interacting with Humans in Collaborative Dialogue

Think for a minute about what makes human conversations interesting. An engaging collaborative dialogue results from knowledge that speakers iteratively build on by introducing new information into the conversation. But for artificial agents, such collaborative dialogue is easier said than done.

AI has come a long way, from rule-based conversational agents limited to programmed rules to deep learning agents that can hold fairly interesting conversations with humans. Designing and developing such agents remains a grand challenge, however, because they lack global consistency. Much ongoing research aims to build agents that can generate grammatically correct sentences that make sense in the context of a collaborative dialogue.

Enabling Effective Collaborative Dialogues

Researchers have presented work that enables artificial agents to hold creative and engaging human-machine dialogues. The new model is based on a narrative arc that incrementally constructs shared knowledge, and it can augment any conversational agent by allowing it to reason about universal information.

With each response, the agent reveals only the necessary information without limiting the conversation, which significantly improves its ability to keep the conversation going. The algorithm was validated in an expert user study involving professional theatre performers, who preferred interactions with the model-augmented agent over those with an unaugmented agent.
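
The incremental-reveal idea can be sketched as an agent that tracks what has already been shared and surfaces one new, ordered fact per turn. This toy class is a hypothetical stand-in for the narrative-arc model, which learns what to reveal rather than following a fixed list.

```python
class NarrativeAgent:
    """Toy agent that reveals one unshared story fact per turn."""

    def __init__(self, facts):
        self.facts = list(facts)  # ordered "universe" of story facts
        self.shared = set()       # facts already revealed to the listener

    def respond(self):
        for fact in self.facts:
            if fact not in self.shared:
                self.shared.add(fact)
                return fact
        return "I've told you everything I know."

agent = NarrativeAgent(["A ship sets sail.", "A storm hits.",
                        "Land is sighted."])
print(agent.respond())  # A ship sets sail.
print(agent.respond())  # A storm hits.
```

Keeping an explicit record of shared knowledge is what prevents the agent from repeating itself, one of the global-consistency failures the section describes.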

Potential Uses and Effects

With the increasing popularity of personal digital assistants and conversational bots, these developments can enable more effective conversational systems capable of creative and helpful human-machine interactions in industries such as health care, customer service, and marketing.

The development also fills a research gap by connecting utterance-level improvements in language models with conversation-level improvements in universe tracking, and it is compatible with image- or video-based conversations.

Read More: https://arxiv.org/abs/1901.11528v1

Scalable and Adaptable 6-DOF Pose Estimation

Pose estimation, localization, and mapping are some of the most challenging problems in computer vision. First, it is no easy task to locate and map every focus point on an object because of occlusions caused by diverse camera viewpoints, textures, background environments, and more. Second, it is difficult to recover a 3D pose from 2D image data in a single image.

With the advent of real-world robotics, autonomous driving, and augmented reality (AR), all of which rely on advances in localization and mapping algorithms, visual localization has become a key component. Most research in the field, however, has focused on improving accuracy and precision, leaving issues such as model scalability and the flexibility to adapt to available computational resources untouched.

SASSE: New 6-DOF Localization Algorithm

The latest research in the field introduces a new 6-DOF localization algorithm that, for the first time, simultaneously achieves efficient storage through sub-linear storage growth, tolerance of uncertainty in image descriptors, and customization to the available storage and computational resources.

Key features of the model come from adapting multi-label learning, integrated with effective dimensionality reduction and learning techniques, to achieve simple and efficient optimization. The model has been evaluated on a number of large benchmark datasets and achieves accuracy competitive with existing pose estimation methods.
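
To see how continuous pose estimation can be cast as multi-label learning, consider discretizing each of the six pose axes into bins so that a pose becomes a small set of active labels. The bin count, ranges, and axis names below are illustrative assumptions, not SASSE's actual formulation.

```python
def pose_to_labels(pose, bins=8, lo=-1.0, hi=1.0):
    """Discretize a 6-DOF pose into one active label per axis."""
    axes = ["x", "y", "z", "roll", "pitch", "yaw"]
    labels = []
    for axis, value in zip(axes, pose):
        v = min(max(value, lo), hi - 1e-9)      # clamp into the bin range
        idx = int((v - lo) / (hi - lo) * bins)  # which bin the value falls in
        labels.append(f"{axis}:{idx}")
    return labels

print(pose_to_labels([0.0, 0.5, -0.5, 0.0, 0.25, -0.25]))
```

A multi-label classifier then only needs to store per-label parameters (6 × bins labels here), which hints at how label-based formulations can keep storage growth sub-linear in the number of reference poses.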

Potential Uses and Effects

The SASSE 6-DOF localization system offers sub-linear storage scaling and reduced storage and computational requirements, enabling faster model training and deployment. Its uses can also be extended to applications where 6-DOF poses are unavailable. In addition, the proposed SASSE techniques can be applied to object pose estimation tasks such as robot grasping and manipulation.

Read More: https://arxiv.org/abs/1902.01549v1

Beholder-GAN Can Generate and Beautify Facial Images

Researchers have developed a deep learning model, Beholder-GAN, a variant of PGGAN that can generate realistic facial images conditioned on a beauty score.

Sequences of facial images are generated with the same latent-space vector and different beauty levels, offering insights into what humans consider beautiful; human biases with regard to age, race, and gender are also revealed. The study also presents a technique that uses a trained generator to recover the latent vector of any given real face image, then applies Beholder-GAN to "beautify" it. While we can agree that beauty is subjective, beauty scores from different raters show correlation.

Potential Uses and Effects

Facial beauty prediction can be used to produce automatic, human-consistent facial attractiveness assessments. It can be applied to face beautification, facial makeup synthesis and recommendation, content-based image retrieval, aesthetic surgery, and more. It should, however, be used with care lest it become a racist algorithm. Additionally, very small or very large changes to the beauty score can produce harsh facial transformation effects.

Read More: https://arxiv.org/abs/1902.02593v1

Face-Based Emotion Recognition Enhancement Using Attention

Research in the field of Face Recognition Systems has been ongoing for many years. Automatic facial recognition is undoubtedly an important component of human-computer interfaces with a myriad of practical applications in biometrics, video surveillance, law enforcement, smart cards, access control, information security, and more.

But even with ongoing active research, reliable face recognition systems still face great challenges. Conventional systems have always performed reasonably well on controlled inputs, but they fail to perform as well on challenging, uncontrolled datasets with large image variations. To deal with this, deep learning models have been used to advance facial expression recognition systems. However, there is still room for even more efficient and accurate systems.

Attentional Convolutional Network for Deep Emotion Face Recognition

This recent paper presents a deep learning model that improves facial expression recognition through an attentional convolutional network. Such a network can extract distinct features from input images and focus on the most important facial regions.
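
The core attention mechanism can be sketched as softmax-normalized weights over spatial cells of a feature map, so salient regions dominate the pooled representation. The tiny numbers below are invented for illustration; in the paper's model both the features and the saliency scores are learned end to end by the network.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

features = [0.2, 0.9, 0.1, 0.4]  # activations at four spatial cells
saliency = [0.1, 3.0, 0.0, 0.5]  # higher score = more important cell

weights = softmax(saliency)                            # sum to 1
attended = sum(w * f for w, f in zip(weights, features))  # weighted pooling
print(round(attended, 3))
```

Because the second cell gets almost all of the attention mass, the pooled value sits close to its activation (0.9), which is exactly the "narrow down on important facial portions" behavior the paragraph describes.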

On multiple datasets, including JAFFE, FER-2013, and FERG, the model achieves significant improvements over conventional models. The researchers also used a visualization technique to find the face regions most important for detecting a wide range of emotions.

Potential Uses and Effects

Face recognition systems have numerous applications, and their improvement will spur the development of even more effective systems. With many business and personal procedures now conducted virtually, improved facial recognition will play a key role in secure identity verification, with applications including electoral registration, virtual banking, e-commerce, electronic IDs, e-passports, employee IDs, sign-language interpretation, and more.

Read More: https://arxiv.org/abs/1902.01019v1

Other interesting papers