How can the theoretical peak performance of 4 double-precision floating-point operations per cycle be achiev