I'm obviously not Ian Roberts, so I may get this wrong, but:
It's always easier to say Yes and agree to what the other person is doing. If someone comes out with a particular attitude, we agree that the attitude is real, that it is actually how that character is feeling, and that it is not a lie. We can show that agreement a million different ways (matching their attitude is a perfectly awesome way to do that), but no matter what, we say Yes. Treating the other person's idea with a big No ("No, you don't feel that way.") or as stupid is not agreement and slows us way down. Now we have to negotiate our ideas about what the scene is rather than just jumping into it in the first place.
With regard to environment: Imagine someone coming out and miming driving a car. A really easy way to agree to their environment would be to sit down next to them inside the same car. A way to not agree with them would be to stand on the other side of the stage and mime making a pizza. Now, we could have a guy practicing his driving skills in preparation for his driver's ed test inside his friend's pizza parlor, but isn't that a huge amount of unnecessary exposition when just being inside the same car would be so much easier? We can establish environment a million ways. The above example is non-verbal. A player could also come out and say "Just pump a few more quarters into Street Fighter. I'm gonna win this time," implying an arcade (or at least a place with an arcade game). It would be a good idea to just say yes to this verbal exposition and pump some quarters into the machine. No matter how you go about it, it's always going to be a better idea to agree - we can get to what the scene is really about faster.
I think this is also about what to focus on in the beginning. The first few moments of a scene are pretty critical. Think of improv as a playground. You can do whatever you want and have fun in it, but you have to build that playground first. If we have the first few moments be super-solid, it makes it much easier to figure out the rest. How to play is very clear. Environment is just one block we can use to begin building our playground... emotion is another, mood is a third, etc., etc.
So, the scene is not about ____. In our previous examples, although we started with, say, driving a car, the scene is not about driving the car. It's about the people inside that car. Yes, the car does influence and inform what will happen to them. It may be a catalyst in some way. In the car, we may see a character's claustrophobia flare up. So the scene is not about the car. It's more about the claustrophobia, and it's definitely about the other guy's reaction to the claustrophobia (indifference, panic, exasperation, whatever). So, while we start with the car/environment, the scene is not just about two people driving.
Of course it's not impossible for two people to start out in two different environments, and of course it's not fatal if there's some miscommunication on that point. But in general, we should make it easier on ourselves and endeavor to agree at the beginning. The alternative is having to scramble and figure it all out instead of just knowing and going.