
Automatic Sports Match Highlight Generation

KURT JOSEPH ABELA | SUPERVISOR: Dr Claudia Borg COURSE: B.Sc. IT (Hons.) Artificial Intelligence

Big sporting events are reported in real time by many news outlets in the form of minute-by-minute updates. Many outlets also issue a news summary immediately after the event ends. This is generally a time-consuming task, and journalists are always under pressure to be among the first to publish their report at the end of a match, when the level of interest is at its peak.


This project proposes a method for detecting highlights from a football match automatically, using both audio and text features extracted from the commentary. Once the highlights are detected, the system creates a textual summary of them, thus facilitating the production of news articles based on sports reporting.

The dataset comprises some of the best FIFA World Cup matches published on YouTube. The audio was extracted from each video, and the text was generated through automatic closed captioning (CC). The project entailed creating a corpus of 14 matches, totalling 23.5 hours. Previously available corpora were either text-only (such as minute-by-minute reports) or focused on features aimed at classifying game outcomes. To use the audio files as input to a classifier, spectrograms had to be produced, through which any identified features could be analysed. The data was split into a 70% training set and a 30% test set.
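The abstract does not state which tools were used for this step; as an illustration, a mel spectrogram of a commentary clip could be produced with the librosa library roughly as follows (the file name and parameter values are hypothetical, not taken from the project):

```python
import librosa
import librosa.display
import matplotlib.pyplot as plt
import numpy as np

# Load a commentary clip; librosa resamples to a common rate and mixes to mono.
audio, sr = librosa.load("match_clip.wav", sr=22050, mono=True)

# Compute a mel-scaled spectrogram and convert power to decibels,
# the usual input representation for image-style classifiers.
mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=128)
mel_db = librosa.power_to_db(mel, ref=np.max)

# Render the spectrogram so it can be inspected (or saved) as an image.
librosa.display.specshow(mel_db, sr=sr, x_axis="time", y_axis="mel")
plt.colorbar(format="%+2.0f dB")
plt.title("Mel spectrogram of commentary clip")
plt.tight_layout()
plt.savefig("match_clip_spectrogram.png")
```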

To build the system, two separate classifiers were trained: one for the audio and one for the text. The audio classifier was based on a DenseNet architecture, whereas the text classifier was based on a 1D convolutional neural network (CNN) architecture. Once the individual components were trained, a further ensemble classifier was trained to determine how much confidence to place in the audio and text classifiers, outputting a final classification for the corresponding timestamp. Finally, once the highlights were classified, the system used the text from the relevant detected timestamps to produce a summary report.
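The abstract does not specify the exact combination rule used by the ensemble; one common approach is stacking, where a small meta-classifier learns how to weight the base models' outputs. A minimal sketch under that assumption, with made-up probabilities and labels:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical per-timestamp highlight probabilities from the two base models.
audio_probs = np.array([0.91, 0.20, 0.65, 0.05])   # DenseNet on spectrograms
text_probs  = np.array([0.80, 0.35, 0.40, 0.10])   # 1D CNN on captions
labels      = np.array([1, 0, 1, 0])               # gold highlight labels

# Stack the base-model outputs as features; the meta-classifier learns
# how much confidence to place in each modality.
features = np.column_stack([audio_probs, text_probs])
ensemble = LogisticRegression().fit(features, labels)

# Final highlight classification for a new timestamp.
print(ensemble.predict_proba([[0.7, 0.6]]))
```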

The audio and text classifiers were evaluated as separate components, whilst the summary outputs were evaluated by comparing them to gold-standard reports from various sports sites. In future work, such a system could also include video analysis, providing a richer feature set for detecting and reporting highlights. The same approach could also be applied to other types of sporting event.
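The abstract does not name the metric used for this comparison; ROUGE is one common choice for scoring generated summaries against reference reports. A minimal sketch using the rouge-score package, with invented example sentences:

```python
from rouge_score import rouge_scorer

# Hypothetical generated summary and a gold-standard report sentence.
generated = "Smith scored in the 89th minute to seal a 2-1 win."
reference = "A late 89th-minute goal from Smith secured the 2-1 victory."

# ROUGE-1 measures unigram overlap; ROUGE-L measures the longest common
# subsequence between the generated and reference texts.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, generated)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
```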

Figure 1. System architecture

A system to support audio authenticity in digital forensics

NICHOLAS JOHN APAP BOLOGNA | SUPERVISOR: Dr Joseph G. Vella COURSE: B.Sc. IT (Hons.) Software Development

This project contributes to audio-forensic procedures by providing a system that processes and checks audio files for their authenticity through various methods. To achieve this, it was necessary to build a collection of tools and techniques, since a single tool would be very difficult to adapt to all the situations and scenarios encountered in the field of audio forensics.

In its original version, the project is a web-based system that allows the forensics expert to upload and analyse audio files. However, provisions were put in place, through the construction of an application programming interface (API), to allow other types of client, such as mobile, web and desktop applications, to access the system simultaneously (as depicted in Figure 1). In addition, the API allows clients to run analysis tasks on the available data.
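As an illustration of what such an endpoint might look like in Django REST Framework (the model, field names and module layout below are hypothetical, not the project's actual code):

```python
# Hypothetical upload endpoint; names are illustrative only.
from rest_framework import serializers, viewsets

from .models import AudioFile  # assumed model with 'file' and 'uploaded_at'

class AudioFileSerializer(serializers.ModelSerializer):
    class Meta:
        model = AudioFile
        fields = ["id", "file", "uploaded_at"]

class AudioFileViewSet(viewsets.ModelViewSet):
    """Lets any API client (web, mobile, desktop) upload and list audio files."""
    queryset = AudioFile.objects.all()
    serializer_class = AudioFileSerializer
```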

The system accepts multiple file formats as input, such as FLAC, AAC, WAV and MP3, with the various audio formats being supported by the Librosa library, which allows them all to be imported into a standard format. It is recommended that source audio files be stored as WAV or another lossless format, to preserve details that could otherwise be lost in lossy formats.
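For instance, librosa can decode the various input containers into one standard in-memory representation, which can then be written back out losslessly; the file names and sample rate below are illustrative, not taken from the project:

```python
import librosa
import soundfile as sf

# Decode any supported container (FLAC, MP3, WAV, ...) into mono
# floating-point samples at a fixed rate, so every analysis sees
# the same standard input format.
samples, rate = librosa.load("evidence.mp3", sr=44100, mono=True)

# Persist the standardised copy as lossless WAV so later analyses
# are repeatable without further loss of detail.
sf.write("evidence_standard.wav", samples, rate)
```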

Furthermore, in order to achieve the desired functionality, several libraries, frameworks and tools were used, among them: Django, Django REST Framework, React, SciPy, NumPy, Matplotlib and Librosa. Django and the Django REST Framework were used to construct the API needed by the front end (which was built using the React and Material-UI libraries, among others). SciPy, NumPy, Matplotlib and Librosa are used to load and process the audio files and to output the results of the analyses performed on them.

The available methods fall into two categories: container-based analysis and content-based analysis. Container-based analysis examines the properties of the file (e.g., date and time of creation or modification), whereas content-based analysis examines the actual content of the file (i.e., the waveform). Authenticity could be verified by: searching for a discontinuity in a feature hidden within the waveform (e.g., the electric network frequency); checking whether different microphones recorded the audio file (microphone signature analysis); checking whether different environments were used to record the audio file (environmental signature analysis); checking whether different speakers recorded the audio (speaker identification); and checking for discrepancies within the properties of the file (container analysis).
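As one simplified illustration of content-based analysis (not the project's actual implementation), an electric network frequency (ENF) check could track the dominant frequency near the 50 Hz mains hum over time and flag abrupt jumps, which can indicate a splice:

```python
import numpy as np
from scipy import signal

def enf_discontinuities(samples, rate, mains_hz=50.0, jump_hz=0.5):
    """Track the mains-hum frequency over time and flag sudden jumps,
    which can indicate that segments from different recordings were joined."""
    # Short-time Fourier transform with 4-second windows,
    # giving roughly 0.25 Hz frequency resolution.
    freqs, times, stft = signal.stft(samples, fs=rate, nperseg=4 * rate)
    # Keep only a narrow band around the nominal mains frequency.
    band = (freqs > mains_hz - 1.0) & (freqs < mains_hz + 1.0)
    # Per frame, take the strongest bin in the band as the ENF estimate.
    enf = freqs[band][np.argmax(np.abs(stft[band, :]), axis=0)]
    # Flag frames where the estimate jumps by more than the threshold.
    suspicious = np.where(np.abs(np.diff(enf)) > jump_hz)[0]
    return times[suspicious], enf
```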

The system pays particular attention to the integrity of data through file hashes: fixed-length strings of characters computed from the entire contents of a file. Any change to the file would result in a completely different hash. The original file hash is recorded immediately upon upload, safeguarding against any tampering with the data after that point. In addition, the system also enforces access levels for users, meaning that each user is able to access only those parts of the system they are authorised to view.
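For example, such a hash can be computed with Python's standard library; the abstract does not state which hash algorithm the project uses, so SHA-256 is an assumption here:

```python
import hashlib

def file_hash(path, chunk_size=8192):
    """Compute a SHA-256 digest of a file, reading it in chunks so that
    large audio files do not need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

# Recorded immediately on upload; any later change to the file yields
# a completely different digest, exposing tampering.
print(file_hash("evidence_standard.wav"))
```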

Figure 1. Architecture of the proposed system
