It is a type of HTML subtitle: the coordinates point to the exact position of the text on the existing image. If you look at the durations, every single sub seems to be 1:29 .09x long. You can also see that the URL pointers are incremented by constant values. I think this indicates that this VTT is a bogus or dummy filler rather than a viable subtitle file.
This makes me wonder how the streaming server picks up the correct subs at playback time.
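If you want to check the two observations above (identical cue durations, coordinates advancing by a constant step) on your own copy of the file, here is a rough heuristic sketch. It assumes each cue payload carries a spatial media fragment of the form `<url>#xywh=x,y,w,h`. That payload shape is my assumption, not something confirmed for this particular file, and the function names `analyse_vtt` and `looks_like_filler` are hypothetical helpers, not part of any library.

```python
import re

# Matches a WebVTT cue timing line such as "00:01:29.090 --> 00:02:58.180".
# The hours field is optional in WebVTT timestamps.
TIMING = re.compile(
    r"(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})\s+-->\s+(?:(\d+):)?(\d{2}):(\d{2})\.(\d{3})"
)
# ASSUMPTION: cue payloads look like "<url>#xywh=x,y,w,h".
XYWH = re.compile(r"#xywh=(\d+),(\d+),(\d+),(\d+)")


def _ms(h, m, s, ms):
    """Convert timestamp fields to milliseconds (missing hours count as 0)."""
    return ((int(h or 0) * 60 + int(m)) * 60 + int(s)) * 1000 + int(ms)


def analyse_vtt(text):
    """Return (durations_ms, xywh_tuples) for every cue in a WebVTT string."""
    durations, boxes = [], []
    for line in text.splitlines():
        t = TIMING.search(line)
        if t:
            g = t.groups()
            durations.append(_ms(*g[4:]) - _ms(*g[:4]))
            continue
        f = XYWH.search(line)
        if f:
            boxes.append(tuple(int(v) for v in f.groups()))
    return durations, boxes


def looks_like_filler(text):
    """Hypothetical heuristic: True if every cue has the same duration and
    the xywh coordinates advance by one constant step from cue to cue."""
    durations, boxes = analyse_vtt(text)
    same_length = len(durations) > 0 and len(set(durations)) == 1
    # Differences between consecutive (x, y, w, h) tuples.
    steps = {
        tuple(b - a for a, b in zip(prev, cur))
        for prev, cur in zip(boxes, boxes[1:])
    }
    constant_step = len(boxes) > 1 and len(steps) == 1
    return same_length and constant_step
```

Usage: read the .vtt into a string and call `looks_like_filler(text)`. It is only a pattern check; a `True` result says the cues are uniform, not what the server actually does with them at playback.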