It has been argued that scene-selective areas in the human brain represent both the 3D structure of the local visual environment and low-level 2D features that provide cues for 3D structure. Here we develop an encoding model of 3D scene structure and test it against a model of low-level 2D features. We fit the models to fMRI data recorded while subjects viewed visual scenes. The fit models reveal that scene-selective areas represent the distance to and orientation of large surfaces, at least partly independent of low-level features. Principal component analysis of the model weights reveals that the most important dimensions of 3D structure are distance and openness. Finally, reconstructions of the stimuli based on the model weights demonstrate that our model captures unprecedented detail about the local visual environment from scene-selective areas.