What type of bias is assessed by examining how outcomes were measured? Are the measurement tools appropriate for the outcomes? Are the tools standardized and validated? Were the measurements taken at the same time points, and were they performed centrally or locally?