|
5. Anchoring in Complex Images: A New Theory

So far the anchoring approach has been applied only to simple retinal images, where it has proven compellingly effective. But the ultimate challenge for a lightness theory lies in the kind of complex images we encounter routinely. How can the rules of anchoring by relative luminance and by relative area be applied to complex images? These rules cannot be applied to such images directly. A more plausible approach would be to decompose the image into components, or sub-images, and then apply our rules of anchoring to each of these components. But here we encounter several difficulties. There are a variety of kinds of components into which the image can be decomposed, and we must find the appropriate one. Even if we do, it is not reasonable to expect that each component sub-image can be treated in total isolation from the rest of the image. Surely there is interaction among the sub-images, and the exact nature of this interaction must be identified.

5.1. What are the relevant components of a complex image?

Both the phenomenologist Katz (1935) and Gestalt theorists like Koffka (1935) spoke of regions of the image they called fields or frameworks. These are regions of common illumination. All the surfaces lying within a shadowed region, for example, would constitute a field. Applying the rules of anchoring within such fields of common illumination makes good intuitive sense. But it is not immediately obvious how the visual system can identify and segregate such fields. Edge classification (Gilchrist et al., 1983) would be enormously useful here, but spelling out the rules of edge classification presents its own challenge to theory. Another, related approach would be to divide the image into coplanar regions, as proposed by Gilchrist (1980), but there are pitfalls here as well. What happens, for example, when a shadow falls across half of a set of coplanar regions?
Intrinsic images provide yet another kind of sub-image. As we shall see, however, each of these schemes fails when confronted with the empirical data.

We propose to define a framework in terms of the Gestalt grouping principles. A framework is a group of surfaces that belong to each other, more or less. By this definition it is clear that complex images contain multiple frameworks. These frameworks can be related to each other in several ways, depending on the distribution of grouping factors in the image. Some images are divided into separate but adjacent local frameworks, much as a country is divided into provinces. Some images are structured as a nested hierarchy, with several superordinate and subordinate levels. In yet other cases the frameworks intersect one another (see Figures 25 and 28).

5.3. Local and global frameworks

The largest framework consists of the entire visual field and will be called the global framework. Subordinate frameworks will be called local frameworks. Local frameworks are defined by local grouping factors, not by distance: there is no fixed degree of proximity within which a group of regions will be called local. A target will always be a member of at least two frameworks: the global framework and one or more local frameworks. In each framework, target lightness is computed according to Formula 1 or Formula 2 (Section 4.5), just as it is computed in simple images. Except by coincidence, the target will have a different computed lightness when anchored within each of these frameworks. According to our proposed model, the net lightness predicted for a given target is a weighted average of its computed lightness values in each of these frameworks, in proportion to the strength of each framework. Because the grouping factors are graded, as opposed to all-or-none, and because a given framework can be supported either by a single grouping factor or by several, frameworks can be said to be stronger or weaker.
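The weighted-average rule just described can be sketched in a few lines of Python. This is a minimal illustration, not the model's implementation. It assumes that each framework assigns lightness by the highest-luminance rule in the form that appears in the terms of Formula 3, (Lt/Lh * 90%), and it ignores area effects. The `Framework` container, the function names, and the numeric strengths are hypothetical; the text does not specify how framework strength is to be measured.

```python
from dataclasses import dataclass

WHITE_REFLECTANCE = 90.0  # percent; value assigned to the highest luminance in a framework


@dataclass
class Framework:
    """One framework as seen from a target (hypothetical container, not from the text)."""
    highest_luminance: float  # L_h: highest luminance within the framework
    strength: float           # relative weight; how to measure it is left open here


def lightness_in_framework(target_luminance: float, framework: Framework) -> float:
    """Highest-luminance rule, (L_t / L_h) * 90%, with area effects ignored."""
    return target_luminance / framework.highest_luminance * WHITE_REFLECTANCE


def net_lightness(target_luminance: float, frameworks: list[Framework]) -> float:
    """Weighted average of the per-framework values, weights proportional to strength."""
    total = sum(f.strength for f in frameworks)
    if total <= 0:
        raise ValueError("at least one framework must have positive strength")
    return sum(
        f.strength / total * lightness_in_framework(target_luminance, f)
        for f in frameworks
    )


# Illustrative numbers only: a target of luminance 10 in a local framework
# (highest luminance 20) nested inside a global framework (highest luminance 100).
local = Framework(highest_luminance=20.0, strength=0.75)
global_ = Framework(highest_luminance=100.0, strength=0.25)
print(net_lightness(10.0, [local, global_]))
```

With these made-up values the target would be computed as 45% within the local framework and 9% within the global framework, and the net prediction falls between the two, closer to the value from the stronger framework.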
The strength of a framework also depends strongly on the size of the framework and on the number of distinct patches within it. A target that belongs to a framework containing many distinct patches will be anchored strongly by that framework. Likewise, a target will be more strongly anchored by a large framework to which it belongs than by a small one.

A stimulus configuration that has been frequently studied in lightness perception involves a single superordinate framework that is subdivided into two subordinate frameworks. Katz's experimental arrangement composed of adjacent lighted and shadowed fields is one example. Another is the simultaneous lightness contrast illusion, consisting of two gray targets on adjacent black and white backgrounds. We can sketch out fairly well the rules of anchoring for images that contain two such levels of framework, using the more convenient terms local and global even when the stimulus pattern does not fill the entire visual field. The following formula predicts the appearance of the target:

$$PR = W_l \left(\frac{L_t}{L_{hl}} \times 90\%\right) + (1 - W_l)\left(\frac{L_t}{L_{hg}} \times 90\%\right) \tag{3}$$

where $W_l$ is the weight of the local framework, $1 - W_l$ is the weight of the global framework, $L_t$ is the luminance of the target, $L_{hl}$ is the highest luminance in the local framework, and $L_{hg}$ is the highest luminance in the global framework. When area effects apply, this formula would have to be modified as shown in Formula 2 (Section 4.5).

5.5. Belongingness and grouping factors

The grouping factors produce the perceptual quality of belongingness, or appurtenance as Koffka (1935) called it. A set of coplanar surfaces appears to belong together and thus constitutes a framework. A set of surfaces moving in the same direction also constitutes a framework, based on the principle of common fate. A group of surfaces lying in shadow constitutes a framework as well. The strongest factor is probably coplanarity, at least when the luminance range is large (Gilchrist, 1980, p. 533).
Classic Gestalt grouping factors like proximity, good continuation, common fate, and similarity are also effective. Edge sharpness, T-junctions, and X-junctions (especially when they are ratio-invariant) are important factors in belongingness as well, as we shall see. Finally, many empirical results appear to require that retinal proximity be treated as a weak but inescapable grouping factor.

Grouping factors can segregate as well as integrate. The T-junction appears to be a potent grouping factor. In our model, T-junctions function as illustrated in Figure 7. The general principle seems to be that the two "occluded" quadrants (B & C) appear to belong to each other very strongly, while the "occluding" border provides a strong segregative factor, perceptually segregating the occluding region (A) from the two occluded quadrants. Todorovic (1997), Ross and Pessoa (in press), and Anderson (1997) have recently emphasized the role of T-junctions in such illusions, but they have given somewhat different interpretations. Todorovic speaks of contrast between regions bounded by the same collinear edge, with the basis of contrast unexplained. Ross and Pessoa propose the idea of contrast reduction at context boundaries, signaled in some cases by T-junctions. Anderson emphasizes the role of T-junctions in producing scission.

Luminance gradients are held to segregate the luminance values at their opposite ends from each other. If two different but adjacent luminance values are divided by a sharp edge, they belong together strongly for anchoring purposes. But if they are separated by a luminance ramp, the same two luminances will be only weakly anchored by each other. This may seem backwards. One might argue that when

5.6. Theoretical value of belongingness

There are several important theoretical advantages to the belongingness definition of a framework.
First, it allows us to define frameworks in terms of retinal variables rather than in terms of a perceived variable like perceived illumination, avoiding the percept-percept coupling issue. Second, it allows a unified account of both illumination-dependent errors and background-dependent errors (simultaneous contrast). If a framework were defined as a region of common illumination, as in the usage of Katz (1935) and Koffka (1935), our anchoring analysis would not work for the simultaneous contrast display (the standard textbook version), because there both local frameworks lie in the same field of illumination. Third, the belongingness construction bypasses the problem of edge classification.

At the same time, grouping factors must be identified that create the visual experience of a special region of illumination such as a shadow. The segregating role of the penumbra (luminance gradient) is one. Another might be called luminance similarity: all regions in the shadow share a lowered luminance relative to regions outside the shadow.

It is true that we now have a lot of evidence, both empirical and phenomenological, that edges are perceptually classified. And yet the basis of edge classification has never been fully spelled out. Indeed, our emphasis on belongingness may well provide a new angle from which to attack the classification problem, touching, as it does, on the central problem of perceptual structure. Moreover, the anchoring model need not deny that humans can classify edges. Rather, the model carries the more modest implication that lightness computation does not depend on edge classification. Edge classification might depend on a process that runs in parallel with lightness computation.

5.7. The scale normalization effect

The term scaling refers to the relationship between the range of luminances in the image (or within a framework) and the corresponding range of perceived lightness values.
The range of lightness values can be either expanded or compressed relative to the range of luminance values. Unless otherwise stated, our model assumes veridical, or 1:1, scaling: the range of lightness values is equal to the range of luminance values. However, we do postulate a scale normalization effect whereby the range of perceived lightness values in a framework tends to normalize on the luminance range between black and white (30:1). Whenever the luminance range within a framework is greater than 30:1, some compression occurs; whenever the range is less than this, some expansion occurs. In both cases the amount of compression or expansion is proportional to the deviation of the stimulus range from the standard range (30:1).
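One way to make the proportionality claim concrete is the following Python sketch. It rests on two assumptions that are not stated in the text: that the deviation from the standard range is measured in log units, and that a single coefficient `k` (a hypothetical free parameter) sets how much of that deviation is made up. With `k = 0` the sketch reduces to veridical 1:1 scaling, and with `k = 1` every framework would be fully normalized to 30:1.

```python
import math

STANDARD_RANGE = 30.0  # black-to-white luminance range given in the text (30:1)


def perceived_range(luminance_range: float, k: float = 0.5) -> float:
    """Perceived lightness range for a framework with the given luminance range.

    The shift toward the standard range is k times the deviation
    log(30) - log(range), computed in log units (an assumption).
    k is hypothetical: 0 means 1:1 scaling, 1 means full normalization.
    """
    if luminance_range < 1.0:
        raise ValueError("luminance range must be at least 1 (highest / lowest)")
    if not 0.0 <= k <= 1.0:
        raise ValueError("k must lie in [0, 1]")
    log_stimulus = math.log10(luminance_range)
    log_standard = math.log10(STANDARD_RANGE)
    log_perceived = log_stimulus + k * (log_standard - log_stimulus)
    return 10.0 ** log_perceived


# Illustrative ranges: below, at, and above the standard 30:1 range.
for stimulus_range in (5.0, 30.0, 100.0):
    print(stimulus_range, perceived_range(stimulus_range))
```

Under these assumptions a 5:1 framework is expanded, a 100:1 framework is compressed, and a 30:1 framework is left unchanged, which matches the qualitative pattern described above. The value of `k` would have to be estimated from data.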
|
|||