Can Internet Filters Identify Obscene Images?

(Third in a series of five articles on Internet Filters)

A common misunderstanding about Internet Filters is the belief that such programs examine photographs or similar computer images and decide whether the content of the image is pornographic.  This is not a capability that filtering programs have, nor is it reasonable to expect that they can.

At a technical level, computers think quite differently from the way humans think.  Recognizing patterns, even imperfect ones, is easy for human minds, but extremely difficult for computers. A person, for example, can read the wildly varying handwriting of many others, while computers can decipher very few of these. Humans can understand spoken language with many different accents, while computers are easily confused by even slight differences in pronunciation (think about the last time you spoke your account number into one of those automated telephone banking or airline reservation systems).

Character Recognition is a good example.  Most of us have seen an image like the one depicted on the right when creating some kind of online account.  The whole point of this kind of image is to make sure that it’s a human being, and not a computer program, that is creating the account.  This is effective because the distortion of the letters makes it almost impossible for a computer program to recognize them, even though humans can usually identify the letters quite easily.

While recognizing letters is more complicated than most people realize, it is vastly simpler than identifying the thematic contents of an image.  Consider the picture on the left.  Try to imagine how difficult it is for a computer, already challenged by recognizing mere letters, to determine what is going on in this picture.  Are there human bodies or body parts in the picture?  What are they doing?  Is it pornographic?  Such questions are probably beyond any computer program.

Some confusion arises because Internet Filtering programs sometimes do make choices about whether to allow or prohibit access to image files.  In most cases, though, the filtering program is making this choice on the basis of text, not the image contents of a file.  For one thing, the program can look at the text surrounding a link on a web page, and, assuming that the text gives some idea of the contents of the image to which the link leads, can prohibit access if that text contains tabooed terms.  The name of the file itself is also text that can be checked for tabooed terms.  In addition, the image file may contain text that is hidden from most viewers.  Depending on the format of the image file (gif, jpeg, png, etc.), there may be “tags” inside the file, text that describes the file contents.  These tags are not visible when displaying the image in the file, but they are present behind the scenes, and the filtering program can find them and check them for tabooed terms.
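
To make the text-based approach concrete, here is a minimal sketch in Python of how such a check might work.  It is an illustration only, not the method of any particular filtering product: the term list, the function names, and the idea of passing in the embedded tags as plain strings are all assumptions made for the example.  Notice that the code never looks at a single pixel of the image.

```python
import re
from urllib.parse import urlparse, unquote

# Hypothetical list of tabooed terms, for illustration only.
TABOO_TERMS = {"xxx", "porn", "nude"}

def tokens(text):
    """Split text into a set of lowercase alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def should_block_image(link_text, image_url, embedded_tags=()):
    """Return True if any text associated with the image contains a tabooed term.

    link_text     -- the text surrounding the link on the web page
    image_url     -- the address of the image; only the file name is checked
    embedded_tags -- descriptive text found inside the image file, assumed
                     here to have been extracted already as plain strings
    """
    # Take the last path segment of the URL as the file name.
    filename = unquote(urlparse(image_url).path.rsplit("/", 1)[-1])
    sources = [link_text, filename, *embedded_tags]
    return any(tokens(source) & TABOO_TERMS for source in sources)
```

With this sketch, should_block_image("click here", "http://example.com/img/xxx_01.jpg") returns True because of the file name, while an image with an innocent file name, innocent link text, and no tags is allowed no matter what it actually shows.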

Beyond the technical issues lies a much more important and entirely human one: people can’t agree on a definition of pornography.  A precise legal definition has evaded lawyers and judges for decades, and today the determination is left up to juries using “community standards” as to what is prurient or patently offensive, and a “reasonable person’s” definition as to whether a work has “serious value.” Such vagueness is something humans may be able to grapple with, but it is quite beyond the reach of computational logic.

Progress is being made in the field of Artificial Intelligence, but we’re not there yet.  For the foreseeable future, computers will have to be told in great detail what to do and how to do it, and that makes it impossible for them to accomplish the nearly instantaneous pattern-recognition that is natural for the human brain. For now, Internet Filters will have to rely on Black Lists and White Lists determined by human review of websites, and on recognizing keywords that indicate possibly objectionable content. 
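
The combination of lists and keywords described above can be sketched in a few lines of Python.  This is a simplified illustration under stated assumptions, not a description of any real product: the host names, the keyword list, and the order in which the checks are applied (White List first, then Black List, then keywords) are all hypothetical choices made for the example.

```python
from urllib.parse import urlparse

# Hypothetical lists, standing in for ones built by human review of websites.
WHITE_LIST = {"kids.example.org"}
BLACK_LIST = {"adult.example.com"}
# Hypothetical keywords that indicate possibly objectionable content.
KEYWORDS = {"porn", "xxx"}

def filter_decision(url, page_text=""):
    """Return "allow" or "block" for a page, using lists first, then keywords."""
    host = (urlparse(url).hostname or "").lower()
    if host in WHITE_LIST:
        return "allow"   # a human reviewer approved this site
    if host in BLACK_LIST:
        return "block"   # a human reviewer rejected this site
    # Unknown site: fall back on keyword matching against the page text.
    words = set(page_text.lower().split())
    if words & KEYWORDS:
        return "block"
    return "allow"
```

In this sketch the lists take priority, so a site approved by a human reviewer is allowed even if its text happens to contain a keyword.  Only sites that no one has reviewed are judged by keywords alone.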

