This lecture was presented at the 3D Digital Documentation Summit held July 10-12, 2012 at the Presidio, San Francisco, CA
Automated Classification of Surface Texture for Photographic Paper
Surface texture is a vital attribute defining the appearance of a photographic print. Texture impacts tonal range, rendering of detail, and reflectance, and conveys subtle qualitative information about the aesthetic intent of a photographer. During the 20th century, manufacturers created a huge diversity of specialized textures. Identification of these textures can yield important information about the origin of a photographic print, including its date and region of origin.
Assembled over the past decade using a simple system for capturing photomicrographs, a texture library of photographic papers now contains over 2,000 identified surfaces. Lacking a query and retrieval mechanism, this library has only the most basic application for the identification of unknown textures. Addressing this deficit, practical applications are being tested as part of the Museum of Modern Art’s project to characterize photographs from its Thomas Walther collection (funded by the Andrew W. Mellon Foundation). As part of this project, texture is being documented by reflectance transformation imaging, raking light, and differential interference contrast.
Using these tools, image data “training sets” were assembled from 65 reference samples of photographic paper. Within the 65 items in the training set, about 30% have matches derived from the same package of photographic paper or from the same manufacturer’s surface designation made during the same time period. As part of a “Photographic Paper Classification Challenge,” these training sets have been distributed to teams with the objective of producing an automated classification system that matches an unknown texture to a short list of identified references gleaned from a library that may include thousands of samples. Accepting this challenge, five university teams from Worcester Polytechnic Institute (USA), the University of Wisconsin (USA), the University of Arizona (USA), Tilburg University (the Netherlands), and Ecole Normale Supérieure de Lyon (France) are developing separate approaches to solving this classification problem.
Such classification procedures typically divide into two parts: feature vector extraction from the images, followed by similarity evaluation of the feature vectors. For these tasks many algorithms are plausible, with strengths and weaknesses dependent on the peculiarities of the materials being analyzed. For example, Fourier, wavelet, and multi-fractal analysis may have greater or lesser success on certain types of surfaces based on physical characteristics including isotropy or roughness. The success of these and other strategies from the Photographic Paper Classification Challenge will be assessed just prior to the 3D Digital Documentation Summit and will be summarized in the presentation. The techniques developed through the challenge may have applications for rapidly and inexpensively assembling texture libraries of other materials, such as textiles and painted surfaces, and for accessing these texture collections through database query and retrieval methods.
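To make the two-part pipeline concrete, here is a minimal, purely illustrative sketch (not any challenge team's actual method) of one plausible approach: a rotation-invariant Fourier feature, the radially averaged log power spectrum, followed by Euclidean distance as the similarity measure. All function names are hypothetical.

```python
import numpy as np

def radial_power_spectrum(image, n_bins=32):
    """Rotation-invariant texture feature: radially averaged power spectrum.

    Averaging the 2-D power spectrum over rings of constant spatial
    frequency discards orientation, which helps when texture samples
    are photographed at arbitrary rotations.
    """
    f = np.fft.fftshift(np.fft.fft2(image - image.mean()))
    power = np.abs(f) ** 2
    h, w = image.shape
    yy, xx = np.mgrid[0:h, 0:w]
    r = np.hypot(yy - h / 2, xx - w / 2)          # radius of each frequency bin
    bins = np.minimum((r / r.max() * n_bins).astype(int), n_bins - 1)
    total = np.bincount(bins.ravel(), weights=power.ravel(), minlength=n_bins)
    counts = np.bincount(bins.ravel(), minlength=n_bins)
    feat = total / np.maximum(counts, 1)           # mean power per ring
    return np.log1p(feat)                          # log compresses dynamic range

def similarity(f1, f2):
    """Smaller distance = more similar textures."""
    return np.linalg.norm(f1 - f2)
```

A real system would compare every unknown against the full library with such a distance and return the closest few candidates; wavelet or multi-fractal features would simply replace `radial_power_spectrum` in the same pipeline.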
Church: Automated classification of surface texture for photographic paper. We have, I’m not sure in what order, but we have Richard Johnson and Paul Messier. Richard received his PhD in Electrical Engineering from Stanford University, along with the first PhD minor in Art History granted by Stanford, in 1977. He was then on the faculty at Virginia Tech and later joined Cornell University’s faculty, where he is a senior professor of engineering. Paul Messier is an independent conservator of photographs working in Boston. Founded in 1994, his studio provides conservation services for leading private and institutional clients throughout the world. The heart of his practice is the Messier Reference Collection of Photographic Papers, which plays a vital role in his work. The largest resource of its kind in the world, the collection has over 5,000 samples. It provides, for scholars and connoisseurs, an objective baseline for dating and authenticating photographic prints. I’d like to welcome both.
Messier: Alright. Thanks very much. Let’s just get started. So, right away we’re going to get into some texture issues here, although you can’t see them, because what you’re looking at are images of Ansel Adams’ “Moonrise” and maybe you can’t even see “Moonrise” anymore, it’s such an icon of photography. But this is an image that Adams returned to over basically forty years, four decades of his life, and each time he’s reinterpreting it. Again, you can’t see that here because you’re looking at a two-dimensional image; you’re not seeing the three-dimensional object. There are all kinds of texture cues built into these four different photographs that we cannot see when it’s projected as an image, but if you hold it in your hand, or you’re in a gallery situation and you have lighting, you can change the angle of your eyes to the incident light and immediately perceive that there are fundamental object-based differences among these four prints. Those fundamental differences come in large part from texture, from the texture of the photographic paper. It all starts with the photographic paper, and these are tactile qualities that Adams himself was responding to when he was selecting the paper. These are tactile qualities built into photographic prints, whether they are 19th-century photographic prints, 20th-century photographic prints, or even 21st-century photographic prints, whether it is inkjet paper or whatever.
So it starts with the paper. This is a sort of breakthrough brand of silver gelatin paper that was introduced in the latter part of the 19th century, this Velox brand, which is one of the oldest brands in photography. It is roughly a 100-year-old brand, and I collect this material, as was noted in the introduction. One of the ways that I have tried to get a handle on this universe of papers is texture. Texture is a manufactured-in attribute; in other words, different manufacturers have different ways of producing this paper. They deliberately apply different textures, so from my standpoint, 100 years out, trying to catalog all of this material, texture becomes a vital component for classifying this universe. They competed a lot, the different manufacturers. You have to remember that black and white photography, at a certain period of the 20th century, was the preeminent imaging medium that existed, and before World War II especially, there were many different manufacturers all cranking out these highly specialized, very sophisticated surfaces. These are a couple of sample books from the thirties and early forties. Each one of these surfaces in the back here represents a different texture and gloss combination.
So the universe is pretty big, and I just very quickly want to acknowledge that yes, you can look at this universe through direct measurements; we’ve done that kind of experiment with confocal microscopy and other methods. So I’m acknowledging that, but I’m going to push past it and talk about a sort of semi-quantitative approach. The scheme I’ve come up with is something extremely simple, which is to classify the textures in this library. This is an image here: a microscope lens, focuser, and an LED light source projecting constant-angle raking light across my surface. I’m making these images, and these images look like this.
I have a standard kind of domain, which is one centimeter tall, and again, I’ve got a lot of them. One of the keys, just like when I was talking about how to classify photographic papers as a whole, as a universe: how do you start classifying a big collection of textures? In terms of a visual library, I’ve got roughly 2,500 to 2,600 of these photomicrographs. So another way that we are looking at this, not just with raking light, is through RTI, and, let’s see if I can get this to work, RTI through the microscope, adapting that microscope I showed you to an RTI system. The video worked in rehearsal. Okay, so what we are looking at is the underside of a little mini RTI dome, which is made of the finest materials available to man. In other words, we cut a wiffle ball in half, drilled a hole in it, and then drilled a bunch of holes for cell phone LEDs. What’s nice about it is the integration that we have between the camera and the dome. So the dome is actually fairly crude, but the software that controls the camera and the dome is actually quite nice. We can control the intensity of the individual light sources and things like that.
So we’ve got these two systems, this raking light system and this sort of micro-RTI system. This is what the sample looks like. This is just a video-cam capture, so I’m moving the virtual light source here, going around the sample. I think these RTI files are not necessarily intrinsically valuable, but where they do have value, and Rick’s going to talk about it, is in all the normals being generated here, basically the mathematical rendering of that surface. That’s something we can actually work on and do some signal processing on when it comes to classifying all these thousands of different textures. Just a final point before I bring Rick up. One thing that’s very important here, going back to direct measurement with confocal microscopy: that’s a six-figure instrument, while this is about a five-thousand-dollar instrument, and we can describe the geometries extremely precisely and make it interoperable. So we can deploy this as a classification system very easily to other institutions. The National Gallery has built one, and MFA Houston is using this as a technique, so we can not only classify my collection of 5,000 or so; potentially, what we are trying to build is a system to start classifying collections of photographs in institutions, and the vision of course is to unify all of that in some sort of query and retrieval system so all these different institutions might be able to share.
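For readers unfamiliar with how normals fall out of RTI-style capture: under a Lambertian reflectance assumption, a surface normal can be recovered at every pixel by per-pixel least squares from a handful of images taken under known light directions. The sketch below is generic photometric stereo, not the specific fitting done by the project's RTI software; all names are illustrative.

```python
import numpy as np

def estimate_normals(images, lights):
    """Lambertian photometric stereo: intensity = L @ (albedo * n).

    images: (k, h, w) array of intensities under k known light directions
    lights: (k, 3) array of unit light-direction vectors
    Returns an (h, w, 3) array of unit surface normals.
    """
    k, h, w = images.shape
    I = images.reshape(k, -1)                        # one column per pixel
    # Solve lights @ g = I in the least-squares sense; g = albedo * normal
    g, *_ = np.linalg.lstsq(lights, I, rcond=None)   # shape (3, h*w)
    norm = np.linalg.norm(g, axis=0)                 # per-pixel albedo
    n = g / np.maximum(norm, 1e-12)                  # normalize to unit normals
    return n.T.reshape(h, w, 3)
```

The resulting normal map is exactly the kind of per-pixel geometric rendering that signal-processing methods can then operate on for texture classification.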
Johnson: If you can’t hear me raise your hand. Basically what I’m going to do is introduce a challenge that we have created.
So what I want to do is tell you about a challenge that we introduced to try and get a handful of engineering teams to address that question. In other words, if I showed you images taken, either RTI images or raking light images, how would you use those to try and find the two in the haystack that match each other. So this could be pieces of images that were taken from the same piece of photographic paper or more likely two pieces of photographic paper that are from the same package and therefore the assumption is that they were possibly printed by the same person and this seems to be something that really matters to art historians. Or maybe the two pieces of paper are coming from the same manufacturer to the same specifications; there are these different types of matches and I know many of you here are technical, like I am, but I’ll reveal a certain truth. Anything we do in image processing to try and assist you in doing classifications, etc, if the humans can’t do it, the computer is not a miracle machine. So the first thing we’re going to do is to see how well you humans do.
There’s a pairing here. Do you see which one it is? These are the raking light photos blown up, and correct me if I don’t remember the dimensions, but I think it’s about .67 centimeters, 1024 pixels. Of course, when you’re looking at this image here, it’s gone through the projection system, so it’s not that great, but there is a pair that matches. Now, does anybody want to be bold and make a guess? Or, if you want, you can be like my dad on the game show: just as they’re announcing the answer, he’d say what it was like he knew, you know. So to avoid that, if you think you really, really know what the pair is, you can just whisper it to your neighbor, so somebody could vouch for you when you say, “Yes, I knew.” Since we’re pinched for time (it’s 10:05), what I’m going to do is tell you the answer to this one, and we’ll call it practice. Okay, so does anybody think it’s like one or two match with anything? Those are the two that match. See, there you go, do you believe her? Okay, so one person out of how many. Okay, the next one is going to be a little bit easier for those of you who didn’t have a good time last time, because we’re going to take one away. So now there are only eight. So you’ve only got to find the pair in these eight.
When you’re done, we’re going to have you write down the algorithm you used, because that’s what we’re after. We’re looking for the algorithm to do this automatically, and that’s what all the teams are looking for. So we had four teams. I’ll tell you who they were in a minute, but they used different algorithms to come up with different approaches to try and do this straight in the computer. No human intervention.
Okay. How many people got that one? Whoa, we’re doing better, and in fact the next one you should do even better because it’s your third try. Okay, there’s a pair here as well. I’ll give you a second. How about that. That was about seven people out of about one hundred. Not quite. You think you know which one it is? Are you ready? After all, if the computer’s going to do it, how fast is the computer going to do it? It’s done them all before we get started, right? So you know, you’re never going to catch up. Those two. How many people got those? Whoa. See, you are doing a little better. What you may not have noticed is this one on the first page matched that one on the second page. Did you get that one? Damn. The problem is how many of these do we have. How many did you say? 5,000, and how many of these do you want to look through?
So this is the problem we faced. The idea was to try and do this automatically, and we started with a set of twenty-five images that MOMA actually imaged for us. So basically there were two pairs. Remember the red pairs that I just showed you? So what we did was start the competition. All of them, by the way, did you notice, all three pairs were in one group. They were all from the same kind of paper. It’s not that they just match each other all on one page. So now we have a baseline. We can ask the teams: what’s the best you can do with this set? But before I get there, it is a lot of numbers. What I’m going to do is look at the first three columns, just the first three, and if I get any matches in there, I’m going to color that box a certain color, and if they don’t match, it’s going to be blank. So if I did that for this, it would look like that. In other words, the first few choices are in group one. Neither has a match in the first three; that’s empty, and that’s not good. What you’d like to have happen is [Johnson steps away from microphone].
And we know we’re not done, but that’s what we could do on our first pass with the 25 I showed you a minute ago. So we nailed the first group and we barely missed the second group in its entirety. Now, as you may well know if you’ve ever done any classification, the smaller the group, the easier the task. Remember, when it was eight, nobody got it, but when it was seven, people started getting it. If I’d made the group four, it would have been easier, right? So once you blow this group up to be more like twenty-five hundred, you can expect you’re going to be in trouble. In fact, that’s visible in some other examples we did. So here are fourteen possible choices, each of which has one match in a set of eighty, and these are all from Paul Messier’s data set and from the same package of papers. So this is the case where, if you thought the manufacturing process was perfect, these ought to all be so similar that even your pet could get it right, okay? And it turns out that it’s not that easy. In fact, the algorithms did well on the one I showed you a minute ago; here they’re struggling a bit. But what really happens, and this is really the way we were charged by the museum, is that you don’t have to have the match come up as number one; after all, that cuts the conservator out of having to do anything. So you want to give them a choice. You want to tell them what the top three or the top five is. So all we really want is one yellow block in each line; if so, the match is in the top three, and that would be satisfactory to the museum. As it turns out, we’re only missing two or three at this stage. And they’re actually somewhere in the top five or ten; they’re just not in the top three. So this is an improvement over what we started with six months ago.
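The "one yellow block in each line" criterion is just top-k retrieval accuracy. Here is a minimal sketch (hypothetical names, assuming a precomputed distance matrix between each query texture and every library texture) of how that scoring might be computed:

```python
import numpy as np

def top_k_hits(dist, match_index, k=3):
    """Score a retrieval run by top-k accuracy.

    dist: (n_queries, n_library) distance matrix, smaller = more similar
    match_index: for each query, the library index of its true match
    Returns a boolean array: True where the true match ranks in the top k.
    """
    order = np.argsort(dist, axis=1)   # library indices, best match first
    topk = order[:, :k]                # the k candidates shown to the conservator
    return np.any(topk == np.asarray(match_index)[:, None], axis=1)
```

Averaging the returned booleans gives the fraction of queries whose match lands in the short list, the figure of merit described here for the museum's use case.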
But now there’s a harder problem, and that’s when the papers are from the same manufacturer designation. You might expect this would be not as strong a connection; in other words, they’re not from the same package. They may not even be made the same exact year, but supposedly they’re made to the same manufacturer’s specifications. And if you’re a good paper conservator, you ought to be able to tell that. So humans should be able to do this. Can the computer assist? This becomes something where matching the images is much harder to do because they sit across a broader collection. There is more variability across this specification than there is within one pack of paper or within one sheet of paper. So we’ve got more to do.
The reason for this effort, and I’ll take the last minute twenty-four to explain where my involvement comes from: I’m absolutely committed to trying to see high-quality scientific data made available to teams that want to do research for the cultural heritage industry. I’ve been involved with projects where we gathered data to do brushstroke analysis and fake identification for Van Goghs in the Van Gogh Museum about six years ago. Since then, in the interim, we’ve counted the threads in all the Van Gogh paintings in the world, or at least all the Van Gogh paintings in the Van Gogh Museum, and we had our first article in the Burlington Magazine this last February about thread counting. So we’ve looked at brush strokes, we’ve looked at counting threads from x-rays, and this is basically the third project that I’m trying to get started and draw people from my community into, and it only works once the data is available. That’s the absolute hardest thing for us as outsiders to come by. So we’ve been very fortunate in working with the Museum of Modern Art and with Paul and his collection, and we hope to expand to inkjet papers and wove papers and other things, but this paper classification problem is a natural. It’s just like the classification problems that engineers solve for thousands of other applications, and most of you who are device types and technology types probably recognize all this, but one of the problems we face in trying to work with the cultural heritage industry is collecting this data.
So hopefully, at some point, we’ll be able to share this, and hopefully next year we’ll be able to report to you that we have a scheme that puts yellow in the first column for everything.
Thank you for your time.
C. Richard Johnson, Jr. was born in Macon, GA in 1950. He received a PhD in Electrical Engineering from Stanford University, along with the first PhD minor in Art History granted by Stanford, in 1977. Following 4 years on the faculty at Virginia Tech, he joined the Cornell University faculty in 1981, where he is the Geoffrey S. M. Hedrick Senior Professor of Engineering. Since 2007, after 30 years of research on adaptive feedback systems theory and blind equalization in communication receivers, Professor Johnson has been promoting the application of signal processing in technical art history.
Paul Messier is an independent conservator of photographs working in Boston Massachusetts, USA. Founded in 1994, his studio provides conservation services for leading private and institutional clients throughout the world. The heart of this practice is unique knowledge and ongoing research into photographic papers. The Messier Reference Collection of Photographic Papers plays a vital role in this work. The largest resource of its kind in the world, the collection of over 5,000 samples was started in the late 1990s to provide, for scholars and connoisseurs, an objective baseline for dating and authenticating photographic prints.