Built a distributed data processing environment on AWS EC2 using the Hadoop ecosystem to understand cluster architecture, data flow, and large-scale processing systems.
The goal of this project was to learn how distributed computing systems work by building a Hadoop-based data processing environment on AWS. It provided hands-on experience with cluster setup, data storage, and processing workflows.
The system was deployed across multiple AWS EC2 instances that together formed a distributed cluster. Hadoop handled data storage, Hive provided SQL-style querying, and Spark ran the data processing tasks.
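To illustrate the split, process, and merge pattern that a cluster like this relies on, here is a minimal single-machine sketch in plain Python. It is not the project's code and does not use the Hadoop, Hive, or Spark APIs. The function names (`partition`, `map_partition`, `reduce_counts`, `word_count`) and the `num_nodes` parameter are hypothetical, and the "nodes" are simulated as list slices.

```python
from collections import Counter
from functools import reduce


def partition(records, num_nodes):
    """Split records round-robin into one chunk per simulated worker node."""
    return [records[i::num_nodes] for i in range(num_nodes)]


def map_partition(chunk):
    """Map step: count words using only the lines held by one node."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge the per-node partial counts into one result."""
    return reduce(lambda a, b: a + b, partials, Counter())


def word_count(records, num_nodes=3):
    """Run the whole pipeline; on a real cluster the map step would run in parallel."""
    chunks = partition(records, num_nodes)
    partials = [map_partition(chunk) for chunk in chunks]
    return reduce_counts(partials)


if __name__ == "__main__":
    lines = ["spark reads data", "hive queries data", "hadoop stores data"]
    print(word_count(lines, num_nodes=2))
```

The final counts do not depend on how many simulated nodes the data is split across, which is the property that lets this kind of job scale out over a cluster.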
The main challenge was understanding and managing distributed system behavior across multiple nodes.
This project improved my understanding of distributed systems, cluster-based architecture, and large-scale data processing workflows in cloud environments.