I’m recovering from the hyperventilating hyperbole in the reportage on IBM’s labeling of a dataset of facial photographs, which IBM made available to researchers to help reduce bias in facial recognition. NBC News went with a headline that read: “Facial recognition’s ‘dirty little secret’: Millions of online photos scraped without consent.” That might merit a “pants on fire” rating if it were in the realm of political reporting. The photos were not “scraped.” They were licensed.
Licensed, not scraped
The NBC story linked to IBM’s discussion of its work, which, in turn, identified the dataset that IBM used. It is the one and only YFCC100M, a set of 100 million images (and associated metadata), each of which was posted on the photo-sharing site Flickr and licensed under a Creative Commons license by its photographer. The project is described briefly on a Yahoo! page here (Flickr has since been sold by Yahoo! to SmugMug), and in slightly greater detail here.
There are various flavors of Creative Commons licenses, and the use of the YFCC100M dataset is subject to its own license and set of conditions. For example, licensees of the dataset must cite a journal article that describes it, and must credit the photographers as required by the Creative Commons licenses they attached to their images.
IBM allows any subject of a photograph in the dataset to notify IBM and elect to have his or her image deleted from the dataset. However, IBM did not make public a list of individuals represented in the set of images. (There is no such list. NBC did obtain a list of the Flickr user names of the photographers whose work is represented in the dataset; that list is searchable, so IBM could be asked to remove a photo taken by a particular photographer.)
(A dozen years ago, a photo with a Creative Commons license permitting commercial use, with attribution, was used by a cellular phone carrier in an ad. The subject of the photo did not approve of the use and tried to have it taken down. Her case was tossed out of federal court for jurisdictional reasons (the company was a foreign affiliate of a US company), but had it been heard, the plaintiff might have prevailed on privacy grounds, since her image was used for commercial purposes without compensation, and the photographer could waive his rights but not hers.)
Is this a problem?
Is there a potential issue with IBM’s use of photographs of individuals (the subjects of the photos, not the photographers) who may not have consented to this use of the image when the photo was first taken? What consent, if any, should have been obtained by the photographers? Is there a right to privacy that may be asserted now even if, as is likely the case for many, if not most, of them, the photograph was taken in a public location and not for commercial purposes? Are there moral rights that IBM should honor above and beyond any legal rights?
I assume that many, if not most, of the photos in the dataset were taken in public locations (where consent of the subject is not required) or taken by photographers who knew their subjects and had their implicit or verbal consent to take the photo and post it online. Obviously, none of the subjects specifically consented to the use of their photos as part of a facial recognition training dataset. However, that is not necessarily fatal to the images being used in research. The Creative Commons license must be honored, but to the extent that IBM hews to its published condition that the use of the labeled dataset is to be limited to research, rather than commercial use, it is not clear that a privacy right of the subjects is being violated. It seems to me that the common-law privacy right that might be triggered by this scenario would require a showing of damages in order to be recognized.
On the other hand, what did the photographers think they were agreeing to when they attached Creative Commons licenses to their work? What were the subjects thinking about online privacy (if they were thinking about online privacy) when they were photographed? One way of thinking about this problem is that context is everything; when the context changes, the assumptions about consent must change. (“What you tell your bank, you might not tell your doctor. What you tell your friend, you might not tell your father-in-law.” What you told Flickr ten years ago, you might not tell Flickr today.) But the internet is constantly changing. The context for the use of data is constantly changing, while a data subject’s intent regarding the use of their data may not change. I can be delighted by one unanticipated use and dismayed by another. The question is whether there is a reliable mechanism that we can use to capture and communicate intent in a manner that transcends context. For intent to be honored as more than a gut feeling (cf. Supreme Court Justice Potter Stewart’s famous statement on pornography: “I know it when I see it”), we would need to revisit the legal underpinnings of the whole notion of consent to, authorization of, or licensing of the use of an image.
There oughta be a law?
Maybe. But I don’t think there is one on the books that recognizes the rights that the NBC piece, and other pieces published about IBM’s labeling of the dataset, suggest have been violated. The suggestion that we have arrived in the age of Minority Report, or in the age of a surveillance economy or surveillance state, is not to be lightly dismissed. But these images were already out in the wild under Creative Commons licenses. Changing the paradigm to require advance licensing by the subject of a candid street photograph would require unwinding years of precedent and would place an unreasonable burden on photographers. The context has changed, but how can we recalibrate our intent, and our communication of that intent, after the fact? A privacy right that is a variation or an expression of the right to be forgotten might be available in some jurisdictions, but it is an after-the-fact solution, which can limit the perceived harm but not necessarily avoid it.
Susan Sontag wrote in On Photography (quoted in the NY Times article linked to above):
Photography does not simply reproduce the real, it recycles it — a key procedure of modern society. In the form of photographic images, things and events are put to new uses, assigned new meanings which go beyond the distinctions between the beautiful and the ugly, the true and false, the useful and the useless, good taste and bad.
We haven’t yet come up with a framework to change the dynamic at play here. It may be difficult to do so, and it may be too late.
The Harlow Group LLC
Health Care Law and Consulting
Originally published on my award-winning blog, HealthBlawg