1. Introduction: Theoretical Developments; From Unconscious Inference to Intrinsic Image

1.1. Basic ambiguity: luminance vs. lightness; lightness constancy

We present here a new theory of how the visual system assigns lightness values, or perceived black, white, or gray values, to various regions of the retinal image. The theory grew out of two problems facing what are probably the most advanced models of lightness perception: the intrinsic image models.

To understand the problem of lightness constancy, it is necessary to understand the ambiguous relationship between luminance values (light intensities) within the retinal image and the lightness values of the surfaces in our perceived world. A key problem is that luminance values in the retinal image are a product not only of the actual physical shade of gray of the imaged surfaces but also, and even more so, of the intensity of the light illuminating those surfaces. The luminance of any region of the retinal image can vary by a factor of no more than thirty to one as a function of the physical reflectance of that surface. But it can vary by a factor of a billion to one as a function of the amount of illumination on that surface. The net result is that any given luminance value can be perceived as literally any shade of gray, depending on its context within the image. Despite this challenge, we perceive shades of surface gray with rough accuracy. This is the well-known problem of lightness constancy. Here is a brief history of attempts to solve this problem.

1.2. Inferring the illuminance: Helmholtz

In one of the earliest attempts to solve this problem, Helmholtz (1866) proposed that the luminance of a region in the retinal image is compared with the perceived intensity of the illumination in that region of the visual scene. Dividing the luminance by the illumination yields reflectance, and this is how a physicist might determine the reflectance of a surface.
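The arithmetic behind this ambiguity, and behind the physicist's division, can be illustrated with a minimal sketch. It assumes the simplified relation luminance = reflectance × illuminance in arbitrary units; the function names and numeric values are hypothetical and chosen only for illustration.

```python
import math

def luminance(reflectance, illuminance):
    """Luminance reaching the eye under the simplified product model (an assumption)."""
    return reflectance * illuminance

def recover_reflectance(luminance_value, illuminance):
    """Helmholtz-style recovery: divide luminance by the (assumed known) illumination."""
    return luminance_value / illuminance

# Illustrative values only (arbitrary units): a black paper reflecting 3% of the
# light under strong illumination, and a white paper reflecting 90% under weak
# illumination. The 30:1 reflectance range follows the figure quoted in the text.
black_in_bright_light = luminance(0.03, 3000.0)
white_in_dim_light = luminance(0.90, 100.0)

# The two very different surfaces send the same luminance to the eye.
same_luminance = math.isclose(black_in_bright_light, white_in_dim_light)

# Reflectance can be recovered by division only if the illumination is known.
recovered_black = recover_reflectance(black_in_bright_light, 3000.0)
recovered_white = recover_reflectance(white_in_dim_light, 100.0)
```

The sketch shows why luminance alone is ambiguous: the division works only when the illumination is supplied independently, which is exactly what the retinal image does not provide.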
But as an account of human lightness, the theory has been fraught with both logical and empirical difficulties. Hering (1874) argued that Helmholtz's position is illogical because, given the luminance at any point, one would need to know the surface reflectance to infer the illumination, but one would also need to know the illumination to infer the surface reflectance. Hering emphasized the role of sensory mechanisms like pupil size and adaptation. But he emphasized lateral inhibition as the main mechanism, many years before its actual discovery, thus creating the prototype of the modern contrast theories.

The publication of Katz's (1911, 1935) book The World of Colour gave enormous momentum to the emerging field of lightness perception. "Its importance at the time of its publication can hardly be overrated," wrote Koffka (1935, p. 241). Katz presented a thorough phenomenological analysis of the visual experience of color. He outlined the various modes of color appearance, emphasizing especially the distinction between surface color and film color. Yet at the same time Katz was a rigorous experimentalist, especially in relation to his historical context. He developed a variety of experimental methods for studying lightness constancy, including the now standard method involving side-by-side fields of light and shadow. He showed that lightness constancy holds even when all three of Hering's mechanisms plus memory color are ruled out. Katz's theoretical perspective, though initially closest to that of Helmholtz, became strongly influenced by the Gestalt theorists.

1.4. Gestalt and the relational approach

The Gestalt theorists rejected the assumption that light per se is the stimulus for lightness, in favor of luminance ratios or gradients. As Marr (1982, p.
259) has noted: "It is a widespread and time-honored view, going back at least to Ernst Mach, that object color depends upon the ratios of light reflected from the various parts of the visual field rather than on the absolute amount of light reflected." But Koffka (1935, p. 244) was the first to base a theory of lightness perception strictly on relative luminance: "Our theory of whiteness constancy will be based on this characteristic of colours...that perceived qualities depend upon stimulus gradients." Gelb (1929) showed that a piece of black paper appears white when presented alone in a spotlight, but much darker when a real white is placed next to it in the spotlight. Experiments of fundamental importance were also conducted by Kardos (1934, 1935), Burzlaff (1931), Wolff (1933, 1934), and Katona (1935), to name only a few. Reading the current literature, one would hardly suspect the vigorous empirical and theoretical developments that took place during the several decades following the publication of the first edition of Katz's book in 1911. Those developments were derailed by the events leading to World War II. After the war, the spotlight shifted to America, where contrast theories, spurred by the first direct physiological evidence for lateral inhibition, came to dominate the field completely. The stimulus conditions became highly reduced: luminous patches presented in dark rooms. Relative luminance came to mean contrast. The Gestalt lessons were lost in the stampede to explain lightness at the physiological level. The contrast theorists argued that the facts of lateral inhibition rendered the vague ideas of the Gestaltists obsolete. Consequently they felt no need to cite the earlier European work, and they did not (see Gilchrist, 1996). Several important publications in the tradition of relational determination appeared in the 1940s, but they were quickly assimilated to the contrast interpretation. 
Helson (1943, 1964), like Koffka, based lightness values on stimulus gradients. He proposed that the luminance of a target surface is compared with the average luminance (in fact a weighted average) in the retinal image, such that a surface with a luminance equal to the average luminance is seen as middle gray, luminances above the average are seen as light gray or white, and those below the average are seen as dark gray or black. Wallach (1948), in a landmark paper, proposed a simple ratio theory of lightness. Presenting observers with two disk/annulus displays, he showed that disks of different luminance appear equal in lightness as long as the disk/annulus luminance ratios are equal.

1.5. Contrast theories versus relational theories

Hering (1874) is often described as one of the earliest to point out the importance of relative luminance. But, as Koffka (1935, p. 245) wrote about Hering: "contrast...implies an explanation not in terms of gradient, but in terms of absolute amounts of light." This is explicit in more recent contrast theories within the Hering tradition. Cornsweet (1970, p. 303) defines the correlate of perceived lightness as "the frequency of firing of the spatially corresponding part of the visual system (after inhibition)." Likewise, Jameson and Hurvich (1964) seem to acknowledge a fundamental role for relative luminance, but their position is that the lightness values produced by a given luminance ratio in the stimulus depend, in the end, on the absolute luminance values (Jameson & Hurvich, 1961). This was the central point of their celebrated report, which has proved essentially unreplicable (Flock & Noguchi, 1970; Gilchrist & Jacobsen, 1988; Haimson, 1974; Heinemann, 1988; Jacobsen & Gilchrist, 1988a, 1988b; Noguchi & Masuda, 1971). Relational theories like those of Koffka, Helson, and Wallach can be driven equally well by an input consisting either of absolute or relative luminance values.
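The sense in which relational theories need only relative luminance can be illustrated with a minimal sketch. It is not an implementation of either theory: the function names are hypothetical, the luminance values are invented, and the equal default weights in the Helson-style average are an assumption, since the weighting scheme is not specified here.

```python
import math

def wallach_equal_lightness(disk_a, annulus_a, disk_b, annulus_b):
    """Ratio rule: two disks match when their disk/annulus luminance ratios are equal."""
    return math.isclose(disk_a / annulus_a, disk_b / annulus_b)

def helson_relative_level(target, luminances, weights=None):
    """Target luminance divided by a weighted average of the image luminances.

    Values above 1 correspond to surfaces lighter than middle gray, values
    below 1 to darker ones. Equal weights are an assumption of this sketch.
    """
    if weights is None:
        weights = [1.0] * len(luminances)
    average = sum(w * l for w, l in zip(weights, luminances)) / sum(weights)
    return target / average

# Two disk/annulus displays with a tenfold difference in absolute luminance
# but the same 1:2 ratio; the ratio rule treats the disks as equal in lightness.
display_a = (10.0, 20.0)    # (disk, annulus), arbitrary units
display_b = (100.0, 200.0)
disks_match = wallach_equal_lightness(*display_a, *display_b)

# Multiplying every luminance by the same factor (a change of illumination)
# leaves the Helson-style relative level unchanged.
scene = [5.0, 20.0, 80.0]
level = helson_relative_level(20.0, scene)
scaled_level = helson_relative_level(20.0 * 1000, [x * 1000 for x in scene])
scale_invariant = math.isclose(level, scaled_level)
```

Because both quantities are ratios, a uniform scaling of all luminances cancels out, which is why such theories can be driven by relative luminance alone.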
Contrast theories, while sometimes couched in relational terms, require absolute luminance information as well. Although lightness perception continues to be attributed to contrast mechanisms by non-specialists, those who study surface lightness have long recognized that lateral inhibition, although crucial to the encoding of stimulus values, plays no more substantial role. Empirical evidence accumulated more recently (Shapley & Enroth-Cugell, 1984; Whittle & Challands, 1969; Yarbus, 1967) tends to support the idea that retinal encoding processes simply encode relative luminance (see Gilchrist, 1994). And yet there has been a reluctance to embrace such a simple idea. In the past two decades, more effective models of lightness perception have emerged. They have evolved, not out of contrast theories, but out of attempts to correct several limitations in Wallach's simple ratio formula. Although it has become widely agreed that the concept of luminance ratios at edges goes far toward explaining the traditional problem of lightness constancy, this same insight has revealed a second constancy problem. When the same piece of gray paper is viewed successively against different backgrounds, the luminance ratio at the edge of the paper changes dramatically, yet the paper appears to change very little in lightness, contrary to the ratio principle. This constancy of lightness with respect to changing background has been labeled Type II constancy, to distinguish it from lightness constancy with respect to changing illumination, or Type I constancy. Ross and Pessoa (personal communication) have proposed the more memorable terms illumination-independent constancy (Type I) and background-independent constancy (Type II) and we have adopted that usage. A third kind might be called veil-independent constancy (Gilchrist & Jacobsen, 1983). Wallach's ratio rule deals effectively with adjacent luminances. 
But background-independent constancy seems to require a mechanism by which the luminance values of widely separated regions in the retinal image can be compared. The recognition of this need led to several papers, which appeared at almost the same time. Land and McCann (1971); Arend, Buehler, and Lockhead (1971); and Whittle and Challands (1969) offered evidence and arguments suggesting that the visual system is capable of deriving the luminance ratio between two surfaces remote from each other in the image. The exact mechanism for this is unknown, but one suggestion is that luminance ratios at every edge encountered along an arbitrary path from one surface to its remote pair are mathematically integrated. Gilchrist (1977, 1979, 1980) demonstrated empirically an observation that had been made by Koffka in 1935 (p. 248) that "not all gradients are equally effective as regards the appearance of a particular field part....," and, in another passage (p. 246): "Clearly two parts at the same apparent distance will, ceteris paribus, belong more closely together than field parts organized in different planes." Gilchrist found that perceived lightness values depend primarily on luminance ratios between adjacent regions perceived to lie in the same plane, as opposed to luminance ratios between any two adjacent parts of the visual field. Inspired by Koffka's observation that some luminance ratios are relatively effective in determining surface lightness while others are relatively ineffective, Gilchrist (1977; 1979; Gilchrist, Delman, & Jacobsen, 1983) proposed a distinction between what he called reflectance edges and illuminance edges. 
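The edge-integration suggestion mentioned above, in which the luminance ratios met along a path between two remote surfaces are combined, can be sketched as a running product of edge ratios. This is only an illustration under strong assumptions (sharp edges, noise-free ratios, no gradual gradients); the function names and luminance values are hypothetical, and the sketch is not the mechanism of any of the cited models.

```python
import math

def edge_ratios_along(path_luminances):
    """Luminance ratio at each edge crossed along a path of adjacent regions."""
    return [b / a for a, b in zip(path_luminances, path_luminances[1:])]

def integrate_edges(edge_ratios):
    """Multiply successive edge ratios to get the ratio of the last region to the first."""
    return math.prod(edge_ratios)

# Two different paths between the same pair of remote regions
# (start luminance 20, end luminance 40, arbitrary units).
long_path = [20.0, 80.0, 10.0, 40.0]
short_path = [20.0, 5.0, 40.0]

ratio_via_long_path = integrate_edges(edge_ratios_along(long_path))
ratio_via_short_path = integrate_edges(edge_ratios_along(short_path))

# The direct ratio between the two remote regions, for comparison.
direct_ratio = 40.0 / 20.0
```

Under these idealized assumptions the product of edge ratios equals the direct ratio regardless of the path taken, which is what allows widely separated regions to be compared using only local edge information.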
Reflectance edges are those luminance borders in the retinal image that are caused by a change in the reflectance (or pigment) of the surface being viewed, while illuminance edges are those that are caused by changes in the intensity of the illumination on a surface, such as the border of a cast shadow or the luminance step at a corner. Gilchrist proposed that the visual system must classify edges in the image into one of these two main categories, before edge integration. Then an integration of all the edges in the reflectance category can yield a map of all the reflectances in the visual field, just as an integration of all the edges in the illuminance class yields a map of the illuminance pattern within the visual field. In effect, Gilchrist proposed to use edge classification as a wedge to pry the retinal image into two overlapping layers, one representing surface lightness values, the other representing the pattern of illuminance on those surfaces.

At the same time, Bergström (1977) proposed that luminance variations within the retinal image are vector analyzed into three components: one for surface reflectance, one for illumination, and one for three-dimensional form. Bergström's distinction between illumination changes and three-dimensional form changes is roughly equivalent to Gilchrist's further breakdown of illumination edges into cast illuminance edges and attached illuminance edges.

Adelson and Pentland (1990) have recently proposed an elegant scheme that bears a striking resemblance to Bergström's model. They liken the visual system to a three-person workshop crew that produces theatrical sets. One person is a painter, one is an illumination expert, and one bends metal. Any luminance pattern can be produced by any of the three specialists. The painter can paint the pattern. The lighting expert can produce the pattern with variations in the illumination.
And the metalworker can create the pattern by shaping the surface, as in shape from shading. But the cost of these three methods is not the same, and this sets up an economy principle in which each desired pattern is to be created in the cheapest possible way, reminiscent of the Gestalt simplicity principle (Gerbino, 1994).

In 1978, Barrow and Tenenbaum proposed that every retinal image is composed of a set of what they called intrinsic images. One intrinsic image would contain the array of reflectance values in the scene, a second, the array of illumination intensities, a third, the array of depth values, and so on. The Bergström, Gilchrist, Barrow and Tenenbaum, and Adelson and Pentland models all have in common the idea that the retinal image is parsed into a set of overlapping layers, much as in the scission idea made popular by Metelli (1985; Metelli et al., 1985) in his analysis of perceived transparency. We will refer to these models as intrinsic image models. An excellent discussion of them can be found in a recent chapter by Arend (1994).

1.8. Two weaknesses of the intrinsic image models

Intrinsic image models are the most advanced models in the continuing development of lightness theory. They offer an explanation of both illumination-independent and background-independent constancy. Yet they are incomplete in two very important, though different, ways: (1) they cannot account for errors in lightness perception and (2) they have no anchoring rule. We consider these in turn.

The goal of the computational enterprise that produced the intrinsic image models has been the modeling of a completely veridical lightness perception system. This is implied in terms like "inverse optics" and "recovering reflectance." In that sense it has been consistent with the goals of machine vision. Failures of constancy and other perceptual errors have been largely ignored. Even in human vision there is a good reason for this approach.
The achievement of constancy and veridicality in the perception of surface lightness is stunning, especially when the various challenges to constancy are recognized. This degree of veridicality does not happen by accident; it cannot be merely the byproduct of a system with goals other than veridicality. Somehow a very robust truth-seeking quality must lie at the heart of the system. Thus if the achievement of veridicality is considered to be more impressive than the degree of failure, it makes sense to begin by trying to model a system that comes as close to veridical perception as possible.

Nevertheless, unlike the situation in machine vision, a theory of human lightness must include an account of errors. To the extent that veridicality fails in lightness perception and to the extent that the intrinsic image models predict veridicality, to that extent the models must fail to account for human lightness perception. But perhaps more importantly, a systematic analysis of human lightness errors can be a powerful tool for revealing how surface lightness is processed by the human visual system. There is a simple and compelling logic behind the study of errors:

1. Errors in lightness perception are always present, however slight.
2. These errors are systematic, not random.
3. The pattern of errors must reflect visual processing. This pattern is the signature of the visual system.

The theoretical picture would be brighter if the empirical pattern of errors could be produced by tweaking the veridicality models. But no such prospect is in sight. For example, there is no coherent approach that can explain both illumination-dependent failures of constancy (Type I) and background-dependent failures of constancy (Type II), despite an attempt by Gilchrist (1988).
Background-dependent failures of constancy, of which the textbook version of simultaneous lightness contrast is the best-known example, are typically described as simply reflecting the operation of the mechanism that achieves illumination-independent constancy. But how are illumination-dependent failures of constancy to be explained? Is there no systematic relationship between these two kinds of failure? We consider a unified account of illumination-dependent and background-dependent failures to be a major goal of a theory of lightness errors.

The second shortcoming in the intrinsic image models, in their current form, is that they are missing an essential component for veridical perception: an anchoring rule. We now turn to an extended discussion of the anchoring problem, which in turn will bring us back to the errors problem. We will find that the attempt to fill a crucial gap in the intrinsic image approach by finding the missing anchoring rule in fact turns out to undermine the entire intrinsic image approach. At the same time it provides the outlines of a very different approach to surface lightness, one that excels in its ability to explain the pattern of errors found across a very broad range of empirical results.