Apache Spark is a fast, general-purpose engine for large-scale data processing. It is written in Scala, a language that combines functional and object-oriented programming and runs on the JVM. Spark provides high-level APIs in Java, Scala, Python and R, along with an optimized engine that supports general execution graphs. You can use Spark interactively through the Spark shell for learning or data exploration (in Scala or Python, and in R since Spark 1.4), or through Spark applications for large-scale data processing (mainly in Python, Scala or Java).
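As a rough illustration of the two usage modes described above, the following Python sketch counts words with Spark's RDD API. The names `tokenize`, `word_counts` and `main`, the application name `WordCountSketch`, and the sample sentences are all made up for this example, and `local[*]` assumes a single-machine run rather than a cluster.

```python
from operator import add


def tokenize(line):
    """Split one line of text into lowercase words."""
    return line.lower().split()


def word_counts(spark, lines):
    """Count words across `lines` using Spark's RDD API.

    `spark` is a SparkSession; the PySpark shell predefines one under
    that name, and an application creates its own (see main below).
    """
    rdd = spark.sparkContext.parallelize(lines)
    pairs = rdd.flatMap(tokenize).map(lambda word: (word, 1))
    # reduceByKey sums the 1s per word; collect brings the result to the driver.
    return dict(pairs.reduceByKey(add).collect())


def main():
    # Imported here so the helpers above can be loaded without Spark installed.
    from pyspark.sql import SparkSession

    # Assumption: "local[*]" runs Spark in-process on all local cores. On a
    # cluster the master URL would normally be supplied by spark-submit.
    spark = (
        SparkSession.builder
        .master("local[*]")
        .appName("WordCountSketch")  # hypothetical application name
        .getOrCreate()
    )
    try:
        print(word_counts(spark, ["Spark is fast", "Hadoop is scalable"]))
    finally:
        spark.stop()
```

In the PySpark shell (Spark 2.0 or later), where a session named `spark` is already defined, `word_counts(spark, ["a b", "b c"])` can be called directly after pasting the two helper functions. For a standalone application, adding a call to `main()` at the end of the script and launching it with `spark-submit` should do the same job.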
Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. It is designed to scale from a single server to thousands of machines, each offering local computation and storage.