Winning Topic: Extraction to Analysis Systems Approach Across All Data Categories - $30,000
Team Summary: The CMU team is led by Research Professor Alex Hauptmann, who has been influential through groundbreaking research in the areas of man-machine communication, natural language processing, speech understanding, video analysis, machine learning, and multi-modal fusion, with success documented by outstanding performance in numerous evaluations and competitions. The UCF team is guided by Professors Mubarak Shah and Yogesh Rawat, who have vast research experience with innovative analytic approaches in video surveillance, visual tracking, human activity recognition, visual analysis of crowded scenes, video registration, UAV video analysis, multimedia, and social computing. Together, these two teams have performed research in the public safety domain that has already benefited and is being used by the military, the intelligence community, law enforcement, news organizations, and human rights organizations.
Team members: Yogesh Singh Rawat (UCF), Mubarak Shah (UCF), Alexander G. Hauptmann (CMU), Praveen Tirupattur (UCF), Junwei Liang (CMU), Shruti Vyas (UCF)
Alexander G. Hauptmann and Mubarak Shah are members of a winning Contest 1 team. They joined the ASAPS team via Zoom to answer some questions about their team’s winning solution and give some insight about the ASAPS Challenge.
To begin with, can you give us an idea of what your mission was and how you achieved it in Contest 1?
Alex: We see our mission as enabling timely responses to critical public emergencies. And to do that, you need to identify and analyze the events: what they are, what they mean. This can come from surveillance videos, social media, and all sorts of other types of information. The challenge, in addition to analysis, is to fuse and summarize all the information into one sort of three-dimensional or multidimensional reconstruction that allows improved decision-making and deployment of resources. That is really the key to what we are trying to do.
One thing that is unique about the ASAPS challenge is this combination of real-time requirements and its dramatically multimodal nature. What do you think are the unique challenges of integrating a fusion from these very heterogeneous data streams? And what do you think are the best approaches to taking that challenge on?
Alex: I think that is fundamentally the toughest problem to figure out – Should we care about this audio stream? How can we present the insights from that audio stream to somebody who is supposed to make decisions? And at the same time there are these videos that show chaos over in this corner, and there are some police officers trying to quell the violence over there. How do we put this all together and make it a reasonable interface? – The real-time challenge I think we are in pretty good shape to solve, in the sense that we have systems that are running in sub-real time on reasonably priced computational architectures. But having reasonably affordable processing for each stream and then being able to merge the analytic insights from each stream is key. You can get better identification, better analysis if you have merged your streams: if your muzzle-flash detection agrees with your gunshot detection, then you are much more certain that this really was a gunshot.
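As a toy illustration of the kind of cross-modal corroboration Alex describes (not the team's actual pipeline), independent per-stream detector confidences can be combined with a simple noisy-OR rule: each modality contributes evidence, and the fused confidence exceeds either one alone. The detector names, scores, and threshold here are all hypothetical:

```python
def noisy_or_fusion(confidences):
    """Fuse independent detector confidences with a noisy-OR rule:
    the probability that at least one detector's evidence is real."""
    p_none = 1.0
    for p in confidences:
        p_none *= (1.0 - p)  # probability every detector is a false alarm
    return 1.0 - p_none

# Hypothetical per-modality confidences for the same candidate event.
p_muzzle_flash = 0.7   # video-based muzzle-flash detector (assumed)
p_gunshot_audio = 0.6  # audio-based gunshot detector (assumed)

fused = noisy_or_fusion([p_muzzle_flash, p_gunshot_audio])
# Fused confidence (0.88) is higher than either single-modality score,
# so an alerting threshold of 0.8 fires only when both streams agree.
is_gunshot = fused >= 0.8
```

The noisy-OR rule assumes the detectors fail independently; in practice, correlated failure modes (e.g., both sensors degraded at night) would call for a learned fusion model instead.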
What are you most looking forward to in terms of the resources and infrastructure in the ASAPS challenge? What kinds of things are important to you in terms of applying your research to the contest?
Mubarak: I think that for us, the most exciting thing is the dataset, because we utilize different modalities to solve the problem. You have the text information and the video, images, and maybe audio. I think it would be good for us to challenge ourselves to see how well we can do.
Alex: In addition to the technical challenges, to me, what is exciting is that this has the potential for the research to really make a difference in decision-making for public safety. I mean, we publish papers, and we show that we can do a little bit better at face recognition or a little bit better at action recognition, but in the end – so what? And I think being able to sort of complete that and say, "Okay, here's how we need to extract the information and present it to somebody who makes decisions that could really change things around." And that would be just a phenomenal thing for us to achieve as researchers – to make some difference in the world.