By Jeremy Hsu on September 24, 2024
Popular smart TV models made by Samsung and LG can take multiple snapshots of what you are watching every second – even when they are being used as external displays for your laptop or video game console.
Smart TV manufacturers use these frequent screenshots, as well as audio recordings, in their automatic content recognition systems, which track viewing habits in order to target people with specific advertising. But researchers showed this tracking by some of the world’s most popular smart TV brands – Samsung TVs can take screenshots every 500 milliseconds and LG TVs every 10 milliseconds – can occur when people least expect it.
“When a user connects their laptop via HDMI just to browse stuff on their laptop on a bigger screen by using the TV as a ‘dumb’ display, they are unsuspecting of their activity being screenshotted,” says Yash Vekaria at the University of California, Davis. Samsung and LG did not respond to a request for comment.
Vekaria and his colleagues connected smart TVs from Samsung and LG to their own computer server. Their server, which was equipped with software for analysing network traffic, acted as a middleman to see what visual snapshots or audio data the TVs were uploading.
They found the smart TVs did not appear to upload any screenshots or audio data when streaming from Netflix or other third-party apps, mirroring YouTube content streamed on a separate phone or laptop or when sitting idle. But the smart TVs did upload snapshots when showing broadcasts from the TV antenna or content from an HDMI-connected device.
The researchers also discovered country-specific differences when users streamed the free ad-supported TV channel provided by Samsung or LG platforms. Such user activities were uploaded when the TV was operating in the US but not in the UK.
By recording user activity even when it’s coming from connected laptops, smart TVs might capture sensitive data, says Vekaria. For example, it might record if people are browsing for baby products or other personal items.
Customers can opt out of such tracking for Samsung and LG TVs. But the process requires customers to either enable or disable between six and 11 different options in the TV settings.
“This is the sort of privacy-intrusive technology that should require people to opt into sharing their data with clear language explaining exactly what they’re agreeing to, not baked into initial setup agreements that people tend to speed through,” says Thorin Klosowski at the Electronic Frontier Foundation, a digital privacy non-profit based in California.
Even a $0.30 CH32V003 could handle this tiny amount of data. It's not a resource limit.
I was curious enough to check, and with 2KB of SRAM that thing doesn't have anywhere near enough memory to process a 320x200 RGB image, much less 1080p or 4K.
Further, you definitely don't want to send 2 images per second to a server in uncompressed format (even 1080p RGB, with an encoding that loses a bit of color fidelity to use just two bytes per pixel, adds up to about 4MB uncompressed per image), so it's either using something with hardware compression or it's spending processing cycles on that.
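To make the arithmetic in the comment above easy to check, here is a minimal sketch of the raw frame sizes. The constant and function names are illustrative, and the 2 frames per second rate is the Samsung figure from the article (one screenshot every 500 milliseconds).

```python
def raw_frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size of one uncompressed frame in bytes."""
    return width * height * bytes_per_pixel


# 2KB of SRAM, as quoted in the comment above.
CH32V003_SRAM_BYTES = 2 * 1024

# 320x200 at 3 bytes per pixel (8-bit R, G and B).
small_rgb = raw_frame_bytes(320, 200, 3)

# 1080p and 4K at 2 bytes per pixel (reduced color fidelity).
full_hd_16bit = raw_frame_bytes(1920, 1080, 2)
uhd_16bit = raw_frame_bytes(3840, 2160, 2)

# Two screenshots per second, sent uncompressed.
FRAMES_PER_SECOND = 2
upload_bytes_per_second = full_hd_16bit * FRAMES_PER_SECOND

print(f"320x200 RGB:      {small_rgb:>12,} bytes")
print(f"1080p, 2 B/pixel: {full_hd_16bit:>12,} bytes")
print(f"4K, 2 B/pixel:    {uhd_16bit:>12,} bytes")
print(f"SRAM available:   {CH32V003_SRAM_BYTES:>12,} bytes")
print(f"Raw 1080p upload: {upload_bytes_per_second:>12,} bytes/s")
```

Even the 320x200 frame is roughly 94 times larger than the 2KB of SRAM, and uncompressed 1080p at two frames per second would come to about 8.3MB per second of upload.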
My expectation is that it's not the snapshotting itself that would eat CPU cycles, it's the compression.
That said, I think you make a good point, just with the wrong example. I would've gone with: a thing capable of handling video decoding at 50 fps, i.e. one frame per 20ms (even if it's actually using hardware video decoding), can probably handle compressing two frames per second and sending them over the network, though performance might suffer if they're using a chip without hardware compression support and a complex compression method like JPEG instead of something simpler like LZW or similar.
I don't think they will compress the screenshots and send them. More likely they run the content through a TensorFlow Lite model, or even just hash a few of the pixels to try for an ID match.
Well, that makes sense, but it might be even more processor-intensive unless they're using an SoC that includes an NFU or similar.
I doubt it's a straightforward hash, because a hash database for video, which includes all manner of small clips and has to somehow match something missing over 90% of frames (if indeed the thing is sampling at 2 fps, it only sees 2 frames out of every 25), would be huge.
A rough calculation: take a system of hashes for groups of 13 frames in a row (so that at least one would be hit if sampling at 2 fps on a 25 fps system), storing just one block of 13 frame hashes per minute, with each hash in a 5-byte value (so large enough to have about 1.1 trillion distinct values). That is 65 bytes per minute, so 1GB would store enough hashes for roughly 137k 2h movies in hashes alone. It would be maybe feasible if the system had 2GB+ of main memory, though even then I'm not so sure the CPU speed would be enough to search it every 500ms (though if the hashes are ordered by value in a long array and there's a matching array of clip IDs, it might be doable, since there are some pretty good algorithms for that).
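The back-of-the-envelope numbers above can be reproduced with a short sketch. It assumes each of the 13 frame hashes in a block takes 5 bytes and that "1GB" means 2**30 bytes; all constant names and the `lookup` helper are made up for illustration, and the binary search stands in for the "pretty good algorithms" the comment alludes to. Note that 5 bytes is 40 bits, which gives 2**40 (about 1.1 trillion) distinct values.

```python
import bisect
import math

SOURCE_FPS = 25          # frame rate of the content
SAMPLE_FPS = 2           # screenshots taken per second
HASH_BYTES = 5           # size of one frame hash
FRAMES_PER_BLOCK = 13    # consecutive frames hashed per block
BLOCKS_PER_MINUTE = 1
MOVIE_MINUTES = 120
BUDGET_BYTES = 2**30     # assumption: "1GB" read as 1 GiB

# Samples land every 12.5 source frames, so any run of 13 consecutive
# frames contains at least one sampled frame.
sample_gap_frames = SOURCE_FPS / SAMPLE_FPS
assert FRAMES_PER_BLOCK >= sample_gap_frames

distinct_hashes = 2 ** (8 * HASH_BYTES)
bytes_per_movie = (MOVIE_MINUTES * BLOCKS_PER_MINUTE
                   * FRAMES_PER_BLOCK * HASH_BYTES)
movies_in_budget = BUDGET_BYTES // bytes_per_movie
total_hashes = movies_in_budget * (bytes_per_movie // HASH_BYTES)

# Binary search over a sorted array needs about log2(n) comparisons.
comparisons = math.ceil(math.log2(total_hashes))


def lookup(sorted_hashes, clip_ids, frame_hash):
    """Return the clip ID stored for frame_hash, or None if absent.

    sorted_hashes is sorted ascending; clip_ids[i] belongs to
    sorted_hashes[i] (the two parallel arrays from the comment).
    """
    i = bisect.bisect_left(sorted_hashes, frame_hash)
    if i < len(sorted_hashes) and sorted_hashes[i] == frame_hash:
        return clip_ids[i]
    return None


print(f"distinct 5-byte hashes: {distinct_hashes:,}")
print(f"bytes per 2h movie:     {bytes_per_movie:,}")
print(f"movies per budget:      {movies_in_budget:,}")
print(f"comparisons per lookup: {comparisons}")
print(lookup([10, 42, 99], ["clip_a", "clip_b", "clip_c"], 42))
```

Under these assumptions a lookup costs fewer than 30 comparisons, so searching every 500ms looks cheap in CPU terms; memory is the tighter constraint, and the parallel array of clip IDs adds to the hash storage counted here.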
I would sample a few dozen equally spaced pixels out of the frame, then drop frames with similar values, and send that with a timestamp. In the cloud, you run those few pixels through a content recognition model.
It doesn't have to be especially accurate or know any niche content, the point is to make a psychomarketing profile of the customer like "car guy, watches tool reviews".
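The sampling idea in the comment above could look roughly like this. It is only a sketch of that suggestion, not how any TV vendor's system is known to work: the 6x6 grid (36 pixels), the difference threshold, and all function names are assumptions chosen for illustration, and frames are modelled as plain nested lists of RGB tuples.

```python
from typing import List, Optional, Sequence, Tuple

Pixel = Tuple[int, int, int]
Frame = Sequence[Sequence[Pixel]]  # indexed as frame[y][x]


def sample_pixels(frame: Frame, grid: int = 6) -> List[Pixel]:
    """Pick grid*grid equally spaced pixels (cell centers) from a frame."""
    height, width = len(frame), len(frame[0])
    samples = []
    for row in range(grid):
        y = (2 * row + 1) * height // (2 * grid)
        for col in range(grid):
            x = (2 * col + 1) * width // (2 * grid)
            samples.append(frame[y][x])
    return samples


def mean_abs_diff(a: List[Pixel], b: List[Pixel]) -> float:
    """Average per-channel absolute difference between two sample sets."""
    total = sum(abs(ca - cb) for pa, pb in zip(a, b) for ca, cb in zip(pa, pb))
    return total / (3 * len(a))


def select_fingerprints(frames_with_ts, threshold: float = 8.0):
    """Keep (timestamp, samples) only when the samples changed enough."""
    kept = []
    last: Optional[List[Pixel]] = None
    for timestamp, frame in frames_with_ts:
        samples = sample_pixels(frame)
        if last is None or mean_abs_diff(samples, last) > threshold:
            kept.append((timestamp, samples))
            last = samples
    return kept


# Tiny demo with solid-color 64x36 frames captured every 500ms.
black = [[(0, 0, 0)] * 64 for _ in range(36)]
white = [[(255, 255, 255)] * 64 for _ in range(36)]
kept = select_fingerprints([(0.0, black), (0.5, black), (1.0, white)])
print([timestamp for timestamp, _ in kept])
```

In the demo the second black frame is dropped as a near-duplicate, so only the frames at 0.0s and 1.0s would be sent. Each kept fingerprint is 36 pixels at 3 bytes each, or 108 bytes plus a timestamp, which is a tiny payload compared with a full screenshot.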