How to mark organisms cut by the border of the frame?

by jo.irisson translator, scientist

This frame (APK0007f0v) prompted the question: how should partially cut organisms be marked? In particular, should this jelly (a solmaris) be marked on this frame or should one wait for the frame with the main part of the body?

A bit of information first. ISIIS takes a continuous stream of images, 2048 px wide. From this stream, 1024x560 pixel windows, or cutouts, are extracted and displayed on planktonportal. For the California dataset, those cutouts are cleverly placed to center on big organisms, but this makes it difficult to capture everything, on the borders in particular (if I am not mistaken; correct me if I am). For the Mediterranean dataset, cutouts are just placed everywhere and those with nothing in them are eliminated; this cuts more organisms on the sides but ensures good representation of all regions.

The orientation of cutouts is such that the top and bottom are along the continuous stream of pixels recorded by ISIIS. The sides may be on the actual side of the recorded stream. In other words, for this image, in the Mediterranean dataset, the other part of the jelly is surely on another frame (and it actually is APK0007f0u). If it had been cut on the left or right side, it would have been less certain (two planktonportal frames fit in the 2048px width of the ISIIS pixel stream; one side is therefore a real side, the other is just the middle of the stream and the rest is on another frame; but there is no way to tell reliably which is which here).
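To make the geometry above concrete, here is a minimal sketch in Python. It assumes a simple non-overlapping tiling of the stream and a coordinate convention (x across the 2048 px width, y along the stream) that are illustrative only; the names frame_offsets and edge_kinds are hypothetical and not part of any planktonportal code.

```python
from typing import Dict, List, Tuple

STREAM_WIDTH = 2048            # px across the ISIIS stream (from the post)
FRAME_W, FRAME_H = 1024, 560   # px, planktonportal frame size (from the post)


def frame_offsets(stream_length: int) -> List[Tuple[int, int]]:
    """Offsets (x across the stream, y along it) of a non-overlapping tiling.

    The non-overlapping layout is an assumption made for illustration.
    """
    return [
        (x, y)
        for y in range(0, stream_length - FRAME_H + 1, FRAME_H)
        for x in range(0, STREAM_WIDTH - FRAME_W + 1, FRAME_W)
    ]


def edge_kinds(x: int, y: int, stream_length: int) -> Dict[str, str]:
    """For each edge of the frame at offset (x, y), say whether it lies on a
    real border of the recorded stream or is an internal cut, in which case
    the rest of a cut organism is on a neighbouring frame."""
    def kind(is_border: bool) -> str:
        return "stream border" if is_border else "cut"

    return {
        "left": kind(x == 0),
        "right": kind(x + FRAME_W == STREAM_WIDTH),
        "top": kind(y == 0),
        "bottom": kind(y + FRAME_H == stream_length),
    }
```

With the frame offsets in hand, the ambiguity between left and right sides disappears for whoever does the post-processing, even though a classifier looking at a single frame cannot resolve it.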

Now, to be operational, however, I would suggest marking everything you can recognise, on every frame.

Indeed, the left and right sides are ambiguous (as pointed out above). It may sometimes be difficult to really know whether the largest part of the animal will be on another frame or is on the current one (large siphonophores with repeated structures, which can be cut at any point, come to mind). Even if the largest part is somewhere else, there is no guarantee that it will be easily identifiable. For all these reasons and probably others, we need to have all the information and, after the fact, come up with an algorithm to detect individuals marked twice, on two frames.

We have the coordinates of all planktonportal frames within the original stream and can therefore place all the marks back into their original, uncut, environment. We will need to post-process these images and detect the potential organisms (i.e. the white regions) anyway, to measure various things on them (now that we only mark locations; see in particular the posts about the particle detector). So it should be pretty easy to check whether two marks originally on two distinct planktonportal frames fall into the bounding box of the same object as detected by the particle detector on the full stream. Of course this can lead to false positives (e.g. a tiny copepod within the bounding box of a large jelly) but a combination of size and connected particle regions can probably take care of that easily.
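The check described above could look roughly like the following Python sketch. Everything in it is an assumption made for illustration: the Mark and Box classes, the frame_origins lookup, and the rule of assigning each mark to the smallest bounding box that contains it. That last rule is only a simple stand-in for the "size and connected regions" idea and is not the actual particle detector.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class Mark:
    frame_id: str   # planktonportal frame the mark was made on
    x: float        # position within that frame, in px
    y: float


@dataclass(frozen=True)
class Box:
    """Bounding box of one detected object, in full-stream coordinates."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float

    def contains(self, x: float, y: float) -> bool:
        return self.x_min <= x <= self.x_max and self.y_min <= y <= self.y_max

    @property
    def area(self) -> float:
        return (self.x_max - self.x_min) * (self.y_max - self.y_min)


def to_stream(mark: Mark, frame_origins: Dict[str, Tuple[float, float]]) -> Tuple[float, float]:
    """Place a mark back into the uncut stream using its frame's origin."""
    ox, oy = frame_origins[mark.frame_id]
    return ox + mark.x, oy + mark.y


def find_duplicates(marks: List[Mark],
                    frame_origins: Dict[str, Tuple[float, float]],
                    boxes: List[Box]) -> List[Tuple[Mark, Mark]]:
    """Return pairs of marks from different frames that fall on the same object."""
    by_box: Dict[int, List[Mark]] = {}
    for mark in marks:
        sx, sy = to_stream(mark, frame_origins)
        containing = [i for i, box in enumerate(boxes) if box.contains(sx, sy)]
        if not containing:
            continue  # mark does not fall on any detected object
        # Assumed heuristic: the smallest containing box wins, so a small
        # organism inside a large one's box is not merged with it.
        best = min(containing, key=lambda i: boxes[i].area)
        by_box.setdefault(best, []).append(mark)

    pairs = []
    for group in by_box.values():
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if a.frame_id != b.frame_id:
                    pairs.append((a, b))
    return pairs
```

For example, with two vertically adjacent frames whose (made-up) origins are (0, 0) and (0, 560), a jelly whose box spans both frames and is marked once on each would come back as one duplicate pair, while a copepod marked inside the jelly's box but with its own smaller box would stay separate. The heuristic only works if the small organism was detected as its own object.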

Jessica, Cédric, others, what do you think?

Posted May 17, 2015 10:07 PM
by kirstenr

Thanks for this detailed post. I love it when you explain how things are actually done behind the scenes: It helps us be better classifiers.

Posted May 24, 2015 4:37 PM