Basic ideas about Apache Spark

Big Data
Sep 11, 2015

Apache Spark is a fast, general-purpose engine for large-scale data processing. It is written in Scala, a language that combines object-oriented and functional programming and runs on the JVM. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. You can use Spark through the Spark Shell for learning or data exploration (in Scala or Python, and since 1.4, in R), or through Spark applications for large-scale data processing (mainly in Python, Scala or Java).

Basic ideas about Hadoop

Big Data
Sep 4, 2015

Apache Hadoop is an open-source software framework for the storage and large-scale processing of data sets on clusters of commodity hardware. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage.