AIS Signals Cleaning and Normalisation

Supervisors

Giorgio Orsi (Oxford Martin Fellow)

Suitable for

Abstract

This project falls under the field of geospatial data engineering, a subfield of Computer Science concerned with geospatial data management and analysis.

The Automatic Identification System (AIS) is the primary mechanism used by the shipping industry to support safe navigation over the oceans. The International Maritime Organization (IMO) has been leading and enforcing the use of AIS for vessel monitoring and maritime traffic management.

Satellite AIS signals are transmitted by transponders carried by the vessels, received by satellites carrying specialised equipment, and then retransmitted to ground stations. Ground stations can also receive AIS signals directly from vessels that are near shore. Regardless of the technology used, AIS signals suffer from a large amount of noise, which can be roughly classified into:

● Data corruption: usually caused by interference due to equipment or atmospheric events

● Data errors: the information carried by an AIS signal is mostly entered manually and is therefore subject to human error

● Data obfuscation: the identity of a vessel (e.g., its MMSI) can be intentionally changed to hide the true identity of the vessel and its purpose (spoofing)

● Data gaps: AIS signals can be missing either because of operational signal loss or because the AIS transponder has been intentionally disabled, e.g., for safety reasons

AIS noise poses substantial challenges for the correct identification of the vessel, its nature, and its current position and route.
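To make the noise categories above more concrete, the following is a minimal sketch of a rule-based pass over the time-ordered position reports of a single vessel. It is not the method proposed by this project. The `Fix` record, the `label_fixes` function, and the thresholds `max_speed_knots` and `max_gap_s` are all hypothetical names and values chosen for illustration.

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius
KM_PER_NAUTICAL_MILE = 1.852


@dataclass(frozen=True)
class Fix:
    """One position report (hypothetical record layout)."""
    mmsi: int
    ts: float   # seconds since epoch
    lat: float  # degrees
    lon: float  # degrees


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points in kilometres."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def label_fixes(fixes, max_speed_knots=40.0, max_gap_s=6 * 3600):
    """Label each fix of one vessel as 'ok', 'invalid', 'out_of_order',
    'after_gap', or 'jump'. Both thresholds are illustrative assumptions."""
    labels = []
    last_good = None
    for fix in fixes:
        # Out-of-range coordinates; AIS itself uses lat 91 / lon 181
        # to signal "position not available".
        if not (-90 <= fix.lat <= 90 and -180 <= fix.lon <= 180):
            labels.append("invalid")
            continue
        if last_good is None:
            labels.append("ok")
            last_good = fix
            continue
        dt = fix.ts - last_good.ts
        if dt <= 0:
            labels.append("out_of_order")
            continue
        if dt > max_gap_s:
            # Too long since the last trusted fix to judge the speed: mark the gap.
            labels.append("after_gap")
            last_good = fix
            continue
        dist_nm = haversine_km(last_good.lat, last_good.lon, fix.lat, fix.lon) / KM_PER_NAUTICAL_MILE
        speed_knots = dist_nm / (dt / 3600)
        if speed_knots > max_speed_knots:
            labels.append("jump")  # implied speed is implausible for a ship
        else:
            labels.append("ok")
            last_good = fix
    return labels
```

A fixed-threshold pass like this trusts the first valid fix and cannot distinguish corruption from spoofing, which is one reason the project looks beyond purely geospatial rules.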

Current approaches to the problem of AIS cleaning largely rely on batch clustering and analysis, i.e., the AIS signals are processed and corrected in bulk. Batch processing is no longer suitable for the dynamic, fast-moving requirements of the trading applications that are emerging in the energy sector, in which Vortexa operates. Trajectory prediction algorithms rely on the implicit assumption that the input data is clean and received in timestamp order. AIS data, however, is noisy and often received out of order, which undermines the accuracy of existing trajectory prediction algorithms. Data cleaning algorithms, on the other hand, especially those relying on clustering, assume that the entire dataset to be cleaned is available beforehand. AIS data is naturally streaming, and its geospatial nature limits the effectiveness of traditional off-the-shelf cleaning algorithms.
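The out-of-order problem described above can be illustrated with a small reordering buffer. This is a sketch under stated assumptions, not part of the proposed work: the `ReorderBuffer` class and its `max_delay_s` lateness bound are hypothetical. Messages are held in a min-heap and released in timestamp order once they fall behind a watermark, defined here as the largest timestamp seen so far minus `max_delay_s`. Messages that arrive after the watermark has already passed them are set aside as late.

```python
import heapq


class ReorderBuffer:
    """Release streamed messages in timestamp order, tolerating bounded lateness."""

    def __init__(self, max_delay_s=300.0):
        self.max_delay_s = max_delay_s  # assumed lateness bound, in seconds
        self._heap = []                 # entries are (ts, seq, payload)
        self._max_ts = float("-inf")
        self._seq = 0                   # tie-breaker so payloads are never compared
        self.late = []                  # messages that arrived too late to reorder

    def push(self, ts, payload):
        """Add one message; return the messages that are now safe to release."""
        if ts < self._max_ts - self.max_delay_s:
            # Older than the current watermark: releasing it would break ordering.
            self.late.append((ts, payload))
            return []
        heapq.heappush(self._heap, (ts, self._seq, payload))
        self._seq += 1
        self._max_ts = max(self._max_ts, ts)
        watermark = self._max_ts - self.max_delay_s
        released = []
        while self._heap and self._heap[0][0] <= watermark:
            t, _, p = heapq.heappop(self._heap)
            released.append((t, p))
        return released

    def flush(self):
        """Release everything still buffered, in timestamp order."""
        released = [(t, p) for t, _, p in sorted(self._heap)]
        self._heap.clear()
        return released


# Usage: feed messages in arrival order and collect them in timestamp order.
arrivals = [(0, "a"), (100, "b"), (50, "c"), (400, "d"), (350, "e"), (20, "f"), (800, "g")]
buf = ReorderBuffer(max_delay_s=300.0)
ordered = []
for ts, payload in arrivals:
    ordered.extend(buf.push(ts, payload))
ordered.extend(buf.flush())
```

The trade-off is latency against completeness: a larger `max_delay_s` recovers more late messages but delays every downstream consumer by the same amount, which matters for the time-sensitive applications mentioned above.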

At this time, all approaches to the AIS noise problem aim at providing general algorithms that are based only on the geospatial information provided by the AIS signal, e.g., positions, vessel type, and identity, largely ignoring other potentially useful information that is specific to the industries the vessels operate in, e.g., tankers vs dry-bulk carriers. The aims of this project are as follows:

1) to attain an understanding of the patterns and phenomenology of AIS noise at a fundamental level.

2) to design algorithms with theoretical quality guarantees to tackle the AIS noise problem, building on top of established research such as TREAD (Pallotta et al., 2013), OPTICS (Ankerst et al., 1999), Spectral Clustering (Shi and Malik, 2000), and Adaptive HDBSCAN (Bai et al., 2023).

3) to demonstrate the effectiveness of these new noise cleaning algorithms in a real-world setting of oil, gas, and chemical tankers as well as supporting vessels (e.g., lightering and bunkering).

The specific innovations we are bringing include:

1) Exploring the use of product information (i.e., the type of cargo carried by the vessel) to correctly classify the behaviour of the vessel.

2) The use of commercial information about the vessel to determine how aggressive the noise cleaning algorithm needs to be (e.g., dark-fleet vessels typically require more aggressive cleaning).

3) The use of knowledge about the source and location of the AIS receivers and satellites to adapt the noise cleaning algorithm (e.g., genuine AIS signal gaps are more likely and frequent in South East Asia than in the Mediterranean Sea).

4) The study, identification, and classification of known patterns of AIS spoofing as a means to complement general-purpose algorithms with human-provided knowledge about spoofing behaviour.

We believe this is the right time to investigate this problem in a more principled way, as the number of satellite constellations capable of receiving AIS signals has dramatically increased in recent years. Moreover, the number of actors who actively attempt to undermine the effectiveness of AIS receivers is also increasing, especially in conflict areas where electronic warfare is predominant. Dark fleets from sanctioned countries such as Iran and Russia actively adopt AIS spoofing and interference technology to mask or disrupt the AIS network, making this research area especially important for the entire maritime industry.

Skills and Experience Required:

● Driven by working in an intellectually engaging environment with the top minds in the industry, where constructive and friendly challenges and debates are encouraged, not avoided

● Strong foundation in software engineering and machine learning, with coursework in advanced machine learning, databases, or data science preferred.

● Proficiency in Python, especially in machine learning libraries and geospatial data processing.

● Interest in online machine-learning algorithms and data streams.

● Interest in applying machine learning to real-world maritime challenges and developing cutting-edge algorithms.
