Detecting Anomalous Vessels at Maritime Ports
Every day, hundreds of thousands of vessels cross the oceans, traveling from country to country and port to port. Most of these trips are mundane, routine, and fairly uninteresting. These vessels are carrying cargo and have declared their port calls beforehand.
Instances where vessels don’t declare a port visit are less common, but it can be valuable to know about them as soon as possible. A vessel can be unusual for a given port because of its type, flag country, speed, size, or other attributes, and its arrival is important information for port authorities and port operators. Real-time alerting on all vessel arrivals, with particular attention to vessels that are anomalous from the receiving port’s perspective, can quickly cue port authorities to inspect a vessel further.
Defining an abnormal vessel visit can be tricky. When it comes to machine learning algorithms, finding training data for such events is even more challenging. GA-CCRi has created a port-based anomaly detection model and demonstrated how it can be used to score incoming vessels for potential anomalies.
One example of such potentially anomalous vessels was a group of Iranian oil tankers that sailed to Venezuela. This was an unusual journey that sparked global interest. The image below shows historical Automatic Identification System (AIS) vessel position data collected and provided by exactEarth.
Vessel and Port Descriptions
A challenging factor for tracking port arrivals, departures, and current port activity is deciding what geographical boundaries to use as a bounding area. GA-CCRi recently launched a new data product that includes data-driven and detected port boundaries based on real-time and historical vessel positions. The image below illustrates distinct port boundaries created through this process. It also shows live vessel AIS data for slow-moving or stopped vessels that are moored and how they reside in these port boundaries. The port boundaries will be the basis for creating subsequent anomaly models.
Detecting Abnormal Vessels
Training Data
Looking at historical AIS within Venezuela’s Port El Palito shows three of the five tankers visiting this port. These three tankers, with their Maritime Mobile Service Identity (MMSI) numbers, are the Fortune (422303000), Petunia (422232600), and Clavel (422232300). Shown below is Port El Palito with its corresponding port boundary and tracks of AIS-reported positions, colored by vessel, of the tankers arriving at the port.
Using this port geometry, we collected historical AIS data over the time period of 1 May 2019 – 1 May 2020 for vessels that were either moving slowly or stopped. This resulted in a data set containing 54 unique vessels from 12 different countries covering 6 vessel types. To construct the training set, we used a subset of numerical and categorical fields that relate to the physical characteristics: vessel type (categorical), flag country (categorical), length (numerical), and width (numerical).
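The collection step above can be sketched in a few lines. This is a minimal illustration rather than the pipeline used for the model: the record field names, the `build_training_set` helper, and the 1-knot speed threshold are all assumptions, and the records are assumed to have already been filtered to positions inside the port boundary. The vessel attributes in the sample come from the results table later in this post; the speed values are made up for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VesselFeatures:
    """Physical characteristics used as model inputs (one row per vessel)."""
    mmsi: int
    vessel_type: str    # categorical
    flag_country: str   # categorical
    length_m: float     # numerical
    width_m: float      # numerical


def build_training_set(ais_records, max_speed_knots=1.0):
    """Keep one feature row per unique vessel that was slow-moving or stopped.

    `max_speed_knots` is a hypothetical threshold; the source does not state one.
    """
    rows = {}
    for rec in ais_records:
        if rec["speed_knots"] > max_speed_knots:
            continue  # vessel is transiting, not moored or stopped
        rows.setdefault(
            rec["mmsi"],
            VesselFeatures(
                mmsi=rec["mmsi"],
                vessel_type=rec["vessel_type"],
                flag_country=rec["flag_country"],
                length_m=rec["length_m"],
                width_m=rec["width_m"],
            ),
        )
    return list(rows.values())


# Illustrative in-port AIS records (speeds are invented for the example).
sample_records = [
    {"mmsi": 422303000, "speed_knots": 0.2, "vessel_type": "Tanker",
     "flag_country": "Iran", "length_m": 175, "width_m": 31},
    {"mmsi": 422303000, "speed_knots": 0.0, "vessel_type": "Tanker",
     "flag_country": "Iran", "length_m": 175, "width_m": 31},
    {"mmsi": 775994450, "speed_knots": 9.5, "vessel_type": "Tug",
     "flag_country": "Venezuela", "length_m": 28, "width_m": 10},
    {"mmsi": 323147000, "speed_knots": 0.4, "vessel_type": "Tanker",
     "flag_country": "Cuba", "length_m": 144, "width_m": 23},
]

training_set = build_training_set(sample_records)
for row in training_set:
    print(row)
```

Deduplicating by MMSI matters here because a moored vessel reports its position many times, and without it the most talkative vessels would dominate the training set.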
Model Training
Once we created useful training data, we trained an isolation forest model using flag country, vessel type, length, and width as input features. At first, we used the isolation forest model from the scikit-learn library, but we ran into issues caused by the inclusion of categorical features. To make the categorical features usable, we one-hot encoded the flag countries and vessel types, but the algorithm did not handle the encoded features well and produced results that did not match our expectations. Further research into this issue led us to the H2O (h2o.ai) library, which handles categorical features with no need to encode them beforehand. This indeed appeared to be the case, as the results aligned much better with what we expected.
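The source does not diagnose why one-hot encoding gave poor results. One effect that is easy to quantify, offered here as a possible contributor rather than a confirmed cause, is how encoding changes the column count. With the 12 flag countries and 6 vessel types in the training set, the four original features become 20 columns, so a tree that picks its split column uniformly at random would land on length or width only a small fraction of the time. The sketch below just does that arithmetic; the `one_hot_columns` helper and the placeholder level names are hypothetical.

```python
def one_hot_columns(categorical_levels, numeric_features):
    """Return the column names produced by one-hot encoding the categoricals."""
    columns = list(numeric_features)
    for feature, levels in categorical_levels.items():
        columns.extend(f"{feature}={level}" for level in levels)
    return columns


# Placeholder level names: the source gives only the counts (12 and 6).
categorical_levels = {
    "flag_country": [f"country_{i}" for i in range(1, 13)],
    "vessel_type": [f"type_{i}" for i in range(1, 7)],
}
numeric_features = ["length_m", "width_m"]

columns = one_hot_columns(categorical_levels, numeric_features)
numeric_share = len(numeric_features) / len(columns)

print(f"columns after encoding: {len(columns)}")
print(f"share of columns that are numeric: {numeric_share:.0%}")
```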
Model Results
The isolation forest is a tree-based model that identifies anomalies explicitly rather than first building a profile of “normal” instances. The algorithm takes advantage of two properties of anomalies: they are few in number, and their attribute values differ markedly from those of the more abundant normal instances.
As mentioned above, the isolation forest does not require training data to be free of anomalies, which is an advantage when labeled training data is hard to come by. During model training, each tree recursively partitions the data by randomly selecting a feature and a split value, and it continues until the instances are separated from one another. For each tree, an observation’s path length is the number of edges it passes from the root before reaching a terminal node. Random partitioning produces shorter paths for anomalous instances, so when the average path length over many trees is short, it is highly likely that the instance is anomalous. The anomaly score decreases as the average path length increases, so a larger score indicates a more anomalous observation.
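The random partitioning described above can be sketched in plain Python. This is a simplified illustration, not the H2O or scikit-learn implementation: it handles numeric features only, uses the full data set for every tree instead of subsampling, and follows only the branch that contains the point of interest. The `path_length` and `average_path_length` helpers and the (length, width) values are made up for the example.

```python
import random


def path_length(point, data, rng, depth=0, max_depth=50):
    """Count the random splits needed to isolate `point` from the rest of `data`."""
    if len(data) <= 1 or depth >= max_depth:
        return depth
    # Only features that still vary within this partition can be split on.
    splittable = [
        f for f in range(len(point))
        if min(row[f] for row in data) != max(row[f] for row in data)
    ]
    if not splittable:
        return depth  # remaining rows are identical and cannot be separated
    feature = rng.choice(splittable)
    values = [row[feature] for row in data]
    split = rng.uniform(min(values), max(values))
    # Keep only the side of the split that contains the point.
    if point[feature] < split:
        subset = [row for row in data if row[feature] < split]
    else:
        subset = [row for row in data if row[feature] >= split]
    return path_length(point, subset, rng, depth + 1, max_depth)


def average_path_length(point, data, n_trees=200, seed=0):
    """Average the path length of `point` over many random trees."""
    rng = random.Random(seed)
    return sum(path_length(point, data, rng) for _ in range(n_trees)) / n_trees


# Illustrative (length, width) pairs in meters: a cluster of similar
# vessels plus one much smaller vessel. Not the actual training set.
vessels = [
    (183, 32), (185, 32), (175, 31), (180, 30), (188, 33),
    (190, 32), (178, 31), (186, 33), (24, 6),
]

outlier_path = average_path_length((24, 6), vessels)
typical_path = average_path_length((183, 32), vessels)
print(f"small vessel average path length:   {outlier_path:.2f}")
print(f"typical vessel average path length: {typical_path:.2f}")
```

The small vessel sits far from the cluster on both features, so most random splits cut it off immediately, while a vessel inside the cluster needs several more splits before it stands alone.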
The following table shows the unique vessels that are believed to have visited Port El Palito, along with the average path length and anomaly score assigned to each by the trained model. The three Iranian tankers we are interested in all receive relatively high anomaly scores, and the two of identical size receive the same score. Other vessels with high scores include a Venezuelan diving vessel, which scores high due to its type. The most anomalous vessel was a Cuban tanker, likely because of its flag country and because it is physically smaller than a typical tanker.
MMSI | Flag Country | Vessel Type | Length (Meters) | Width (Meters) | Isolation Forest Anomaly Score | Isolation Forest Path Length |
---|---|---|---|---|---|---|
323147000 | Cuba | Tanker | 144 | 23 | 0.75 | 3.19 |
422303000 | Iran | Tanker | 175 | 31 | 0.69 | 3.38 |
636016905 | Liberia | Tanker | 240 | 36 | 0.63 | 3.58 |
775996002 | Venezuela | Diving | 24 | 6 | 0.62 | 3.64 |
422232300 | Iran | Tanker | 183 | 32 | 0.60 | 3.70 |
422232600 | Iran | Tanker | 183 | 32 | 0.60 | 3.70 |
620586000 | Comoros | Tanker | 185 | 32 | 0.58 | 3.75 |
775994450 | Venezuela | Tug | 28 | 10 | 0.58 | 3.75 |
355823000 | Panama | Tug | 30 | 6 | 0.48 | 4.08 |
371182000 | Panama | Tug | 29 | 10 | 0.45 | 4.18 |
775054000 | Venezuela | Tanker | 282 | 32 | 0.42 | 4.29 |
775090000 | Venezuela | Tanker | 228 | 42 | 0.26 | 4.80 |
775092000 | Venezuela | Tanker | 183 | 32 | 0.25 | 4.86 |
775048000 | Venezuela | Tanker | 183 | 32 | 0.25 | 4.86 |
352332000 | Panama | Tanker | 251 | 46 | 0.23 | 4.92 |
371382000 | Panama | Tanker | 246 | 42 | 0.12 | 5.26 |
370955000 | Panama | Tanker | 228 | 32 | 0.05 | 5.50 |
373527000 | Panama | Tanker | 183 | 32 | 0.00 | 5.67 |
Scored vessels for Port El Palito over the time period for which the Iranian tankers visited
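The relationship between the two model columns in the table can be checked directly. The short script below fits a straight line to the distinct (path length, anomaly score) pairs exactly as printed and reports the largest deviation from that line. That the printed values fall close to a line is an observation about this table only; it is not a statement of how the library normalizes its scores.

```python
# Distinct (anomaly score, path length) pairs copied from the table above.
pairs = [
    (0.75, 3.19), (0.69, 3.38), (0.63, 3.58), (0.62, 3.64), (0.60, 3.70),
    (0.58, 3.75), (0.48, 4.08), (0.45, 4.18), (0.42, 4.29), (0.26, 4.80),
    (0.25, 4.86), (0.23, 4.92), (0.12, 5.26), (0.05, 5.50), (0.00, 5.67),
]

n = len(pairs)
mean_path = sum(path for _, path in pairs) / n
mean_score = sum(score for score, _ in pairs) / n

# Ordinary least-squares fit of score as a linear function of path length.
slope = (
    sum((path - mean_path) * (score - mean_score) for score, path in pairs)
    / sum((path - mean_path) ** 2 for _, path in pairs)
)
intercept = mean_score - slope * mean_path

max_residual = max(
    abs(score - (slope * path + intercept)) for score, path in pairs
)

print(f"slope: {slope:.3f}, intercept: {intercept:.3f}")
print(f"largest deviation from the fitted line: {max_residual:.4f}")
```

A negative slope with deviations on the order of the table's rounding suggests that, for this model, the score behaves like a rescaled path length: the shortest observed path maps to the highest score and the longest to zero.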
Port Anomaly Detection at Scale
This technique could be extended to any number of ports that have a port boundary available, where each model can be used to score real-time AIS observations as they are collected and transmitted. Real-time anomaly scores could be made available alongside the live AIS with a small latency, providing near real-time anomaly detection at a global scale. Early warnings could also be generated by scoring vessels that are inbound to ports using their forecasted routes, which could allow port operators, ship owners, and customs control to anticipate unusual vessel arrivals before they occur. Next steps for this work would be to model global ports and then use our streaming detection platform to score all incoming AIS messages against each model, creating real-time anomaly scores.