Here are some good big-data ecosystem questions, shared with the community for interview preparation.
HDFS
- What are the main components and node types that make up HDFS?
- Briefly explain the purpose and function of the FsImage and the EditLog.
- What is the difference between a Secondary NameNode and a Standby NameNode?
- What are the options for replicating an HDFS cluster, for example between a production cluster and a DR cluster?
- How are corrupted HDFS files detected and subsequently recovered?
- Describe the HDFS tiered storage concepts.
Kafka
- Describe the functions of Kafka MirrorMaker.
- How do you integrate Kafka MirrorMaker into an existing architecture?
Spark
- What are the definitions of job, stage, and task in Spark?
- Describe each of the execution modes that Spark supports.
YARN
- What is YARN used for? Describe each of its core components.
Cassandra
- Explain how Cassandra stores data.
- What is the difference between partitioning and clustering?
Thanks to the interview panel for sharing a copy of these questions.