Why are multi-threaded apps difficult to write?

I read a good article on why it is difficult to write a multi-threaded app that scales well with the number of hardware cores, and I wanted to share it with you all.

http://www.javacodegeeks.com/2012/08/what-makes-parallel-programming-hard.html

One point that stood out: speed optimizations can themselves introduce unintended, artificial limitations. Something to think about.