SAP Leonardo Machine Learning at ECCV 2018

Computer vision experts will gather this month in Munich, Germany, for the 15th European Conference on Computer Vision (ECCV), taking place from 8 to 14 September. As a gold sponsor, SAP will join its peers to exchange ideas about recent developments and trends in the field and to share its latest computer vision research papers.

The SAP Machine Learning Research team is one of the organizers of the workshop on Shortcomings in Vision and Language (SiVL), and our researchers will present their recent research papers at the VizWiz and SiVL workshops. Our team is also among the top-performing teams in the VizWiz challenge.

Workshop on the Shortcomings in Vision and Language (SiVL)

Highlighting the challenges faced by the joint communities of computer vision and natural language processing is one of our primary research areas. This workshop is a collaboration with researchers from the University of Amsterdam, University of Edinburgh, University of Trento, Georgia Institute of Technology, Rochester Institute of Technology and Facebook AI Research. We aim to initiate a continuous dialogue among researchers from both areas to examine current approaches and their limitations, and to steer development towards novel methods that take a truly integrative approach, with multimodality as the focal point of vision and language models.

Our researchers will present two accepted papers at this workshop: “Be Different to Be Better: Toward the Integration of Vision and Language” and “An Evaluative Look at the Evaluation of VQA.”

Our Papers at SiVL

Be Different to Be Better: Toward the Integration of Vision and Language

Sandro Pezzelle, Claudio Greco, Aurélie Herbelot, Tassilo Klein, Moin Nabi, and Raffaella Bernardi

This paper gives a comprehensive analysis of current vision and language approaches and highlights their limitations. It proposes a truly integrative approach aimed at developing models that mimic human interaction, eliminating redundancy by integrating and using information from multiple modalities.

An Evaluative Look at the Evaluation of VQA

Shailza Jolly, Sandro Pezzelle, Tassilo Klein, and Moin Nabi

The second workshop paper dissects current VQA evaluation metrics and proposes a multi-component metric for measuring the performance of VQA models. The proposed metric scores predictions while taking three factors into account: the majority-voted answers, the subjectivity of questions and the semantic similarity between answers. Evaluating VQA predictions more comprehensively can lead to a more accurate understanding of VQA performance.
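To make the three factors more concrete, here is a minimal sketch of how they could be computed for a single question. It is not the metric proposed in the paper: the function names, the use of annotator disagreement as a subjectivity proxy, and the word-overlap similarity (a simple stand-in for a real semantic similarity) are all assumptions made for illustration.

```python
from collections import Counter


def token_jaccard(a: str, b: str) -> float:
    """Hypothetical stand-in for semantic similarity: word-overlap Jaccard."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def evaluate_prediction(prediction: str, human_answers: list[str]) -> dict[str, float]:
    """Return three illustrative components for one VQA prediction.

    Assumption: human_answers holds the raw answers given by annotators
    to the same question about the same image.
    """
    if not human_answers:
        raise ValueError("at least one human answer is required")

    counts = Counter(answer.lower().strip() for answer in human_answers)
    total = sum(counts.values())
    majority_answer, majority_count = counts.most_common(1)[0]
    pred = prediction.lower().strip()

    # Factor 1: does the prediction match the majority-voted answer?
    majority_match = 1.0 if pred == majority_answer else 0.0

    # Factor 2: subjectivity proxy, the share of annotators who
    # disagree with the majority answer (0.0 means full agreement).
    subjectivity = 1.0 - majority_count / total

    # Factor 3: similarity to every distinct human answer, weighted
    # by how often that answer was given.
    weighted_similarity = sum(
        (n / total) * token_jaccard(pred, answer) for answer, n in counts.items()
    )

    return {
        "majority_match": majority_match,
        "subjectivity": subjectivity,
        "weighted_similarity": weighted_similarity,
    }


if __name__ == "__main__":
    answers = ["red"] * 6 + ["dark red"] * 3 + ["maroon"]
    print(evaluate_prediction("red", answers))
```

The sketch returns the three components separately rather than merging them into a single number, since how they should be combined is exactly the kind of design decision the paper addresses.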

VizWiz Workshop Challenge: AI and the progress of assistive technologies

Artificial intelligence has the potential to become a driving force behind the progress of assistive technology. Encouraging the AI community to engage more in developing new models and algorithms, tailor-made to meet the industry’s needs, should therefore be a priority. AI can unlock a breadth of new possibilities that empower people with different disabilities and assist them in overcoming their daily challenges.

The VizWiz workshop challenge leads discussions in this area by bringing together the AI community across various domains, from academia to industry, to foster collaborations that lead to more inventions and applications for the assistive technology industry.

The workshop opens submissions for a challenge built around a new VQA dataset of real-life images taken by blind people, together with the questions they asked about those images. The challenge involves two tasks: answering the visual questions (VQA) and determining whether a question is answerable at all.

As participants in the VizWiz challenge and one of the top-performing teams, our researchers will present our approach to solving the challenge, along with one accepted poster at the workshop.

When the Distribution Is the Answer: An Analysis of the Responses in VizWiz

Denis Dushi, Sandro Pezzelle, Tassilo Klein and Moin Nabi

The paper examines how answers are distributed in the VizWiz dataset and proposes a new model that exploits the distribution of responses, which can lead to more accurate answers.
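As a rough illustration of what using the distribution of responses can mean in practice, the sketch below turns the annotators' answers for one question into a probability distribution and scores a model's predicted probabilities against it with a soft-target cross-entropy. This is not the model from the paper: the helper names, the fixed answer vocabulary and the choice of loss are assumptions made for the example.

```python
import math
from collections import Counter


def answer_distribution(human_answers: list[str], vocabulary: list[str]) -> list[float]:
    """Empirical distribution of annotator answers over a fixed vocabulary.

    Assumption: answers outside the vocabulary are simply ignored.
    """
    counts = Counter(answer.lower().strip() for answer in human_answers)
    in_vocab = [counts.get(word, 0) for word in vocabulary]
    total = sum(in_vocab)
    if total == 0:
        raise ValueError("no human answer falls inside the vocabulary")
    return [count / total for count in in_vocab]


def soft_cross_entropy(target: list[float], predicted: list[float], eps: float = 1e-12) -> float:
    """Cross-entropy between a soft target and predicted probabilities."""
    return -sum(t * math.log(p + eps) for t, p in zip(target, predicted) if t > 0)


if __name__ == "__main__":
    vocabulary = ["yes", "no", "unanswerable"]
    answers = ["yes"] * 5 + ["no"] * 3 + ["unanswerable"] * 2
    target = answer_distribution(answers, vocabulary)

    # A prediction that mirrors the annotators' spread is penalised less
    # than a uniform guess over the vocabulary.
    print(soft_cross_entropy(target, target))
    print(soft_cross_entropy(target, [1 / 3, 1 / 3, 1 / 3]))
```

Training against the full distribution, rather than only the single most frequent answer, keeps the information that annotators disagreed, which is the kind of signal the paper's analysis focuses on.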

