Experience: 5+ Years
Type: Hybrid (3 days WFH, 2 days WFO)
Location: Bangalore
Notice Period: Immediate to 15 days
Budget: As per Company Norms
Technology: IT
Mandatory Skills:
- Apache Spark (PySpark or Scala): Extensive hands-on experience with Spark for large-scale data processing and analysis, with proficiency in either PySpark or Scala for developing Spark applications.
- Databricks: Strong expertise in using Databricks for big data analytics, data engineering, and collaborative development of Apache Spark applications.
- GitHub: Proficient in version control with Git and GitHub for managing and tracking changes in the codebase.
- Data Warehousing (DWH): Experience with one or more DWH technologies such as Snowflake, Presto, Hive, or Hadoop. Ability to design, implement, and optimize data warehouses.
- Python: Advanced programming skills in Python for data manipulation, analysis, and scripting tasks.
- SQL: Strong proficiency in SQL for querying, analyzing, and manipulating large datasets in relational databases.
- Data Streaming and Batch Processing: In-depth knowledge and hands-on experience with both streaming and batch processing methodologies (a short illustrative sketch follows this list).
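To make the batch-processing expectation concrete, here is a minimal PySpark sketch of the kind of work the skills above describe: reading raw data, aggregating it with the DataFrame API and with Spark SQL, and writing a warehouse-friendly output. The bucket paths, table name, and column names (orders, customer_id, amount, status) are hypothetical illustrations, not part of this posting; the same code runs on Databricks or any standard Spark cluster.

```python
# Minimal PySpark batch-processing sketch; all paths and column
# names below are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("daily-orders-batch").getOrCreate()

# Read one day of raw order events (hypothetical location and schema).
orders = spark.read.parquet("s3://example-bucket/raw/orders/dt=2024-01-01/")

# Aggregate completed-order revenue per customer with the DataFrame API...
daily_revenue = (
    orders
    .filter(F.col("status") == "COMPLETED")
    .groupBy("customer_id")
    .agg(F.sum("amount").alias("total_amount"),
         F.count("*").alias("order_count"))
)

# ...or express the same transformation in Spark SQL.
orders.createOrReplaceTempView("orders")
daily_revenue_sql = spark.sql("""
    SELECT customer_id,
           SUM(amount) AS total_amount,
           COUNT(*)    AS order_count
    FROM orders
    WHERE status = 'COMPLETED'
    GROUP BY customer_id
""")

# Write the result in a warehouse-friendly format (hypothetical path).
daily_revenue.write.mode("overwrite").parquet(
    "s3://example-bucket/curated/daily_revenue/dt=2024-01-01/"
)
```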
Good to Have:
- Kafka: Familiarity with Apache Kafka for building real-time data pipelines and streaming applications (see the streaming sketch after this list).
- Jenkins: Experience with Jenkins for continuous integration and continuous delivery (CI/CD) in the data engineering workflow.
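For the streaming side, a brief sketch of how Kafka typically pairs with Spark Structured Streaming: subscribing to a topic and maintaining a windowed event count. The broker address, topic name, and checkpoint path are placeholders, and the job assumes the spark-sql-kafka connector package is supplied at submit time (e.g. via --packages).

```python
# Structured Streaming sketch reading from Kafka; broker, topic, and
# checkpoint path are hypothetical placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("orders-stream").getOrCreate()

# Subscribe to a Kafka topic (requires the spark-sql-kafka package
# on the classpath at submit time).
events = (
    spark.readStream
    .format("kafka")
    .option("kafka.bootstrap.servers", "broker:9092")
    .option("subscribe", "orders")
    .load()
)

# Kafka delivers key/value as binary; cast the value to a string and
# count events per 1-minute window as a simple streaming aggregation.
counts = (
    events
    .select(F.col("timestamp"), F.col("value").cast("string"))
    .withWatermark("timestamp", "5 minutes")
    .groupBy(F.window("timestamp", "1 minute"))
    .count()
)

# Stream results to the console for demonstration purposes.
query = (
    counts.writeStream
    .outputMode("update")
    .option("checkpointLocation", "/tmp/checkpoints/orders-stream")
    .format("console")
    .start()
)
query.awaitTermination()
```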
Responsibilities:
- Design, develop, and maintain scalable and efficient data engineering solutions using Apache Spark and related technologies.
- Collaborate with cross-functional teams to understand data requirements, design data models, and implement data processing pipelines.
- Utilize Databricks for collaborative development, debugging, and optimization of Spark applications.
- Work with various data warehousing technologies such as Snowflake, Presto, Hive, or Hadoop to build robust and high-performance data storage solutions.
- Develop and optimize SQL queries for efficient data retrieval and transformation.
- Implement both batch and streaming data processing solutions to meet business requirements.
- Collaborate with other teams to integrate data engineering solutions into larger software systems.
- Utilize version control (Git/GitHub) for managing codebase and ensuring code quality through collaborative development practices.
- Stay updated on industry best practices and emerging technologies, and contribute to the adoption of new tools and techniques within the data engineering team.
Qualifications:
- Bachelor’s or Master’s degree in Computer Science, Information Technology, or a related field.
- 5+ years of experience in data engineering with a focus on big data processing.
- Proven experience in designing and implementing scalable data solutions using Spark, Databricks, and other relevant technologies.
- Strong programming skills in Python and proficiency in SQL.
- Experience with both batch and streaming data processing.
- Good communication skills and the ability to work collaboratively in a team environment.
- Optional: Certifications in relevant technologies such as Spark, Databricks, or AWS/GCP/Azure.