
Senior Data Engineer

India (Preferred - Hyderabad/Bengaluru)

Job Type

Full Time

Workspace

Remote/Hybrid

About the Role

The Senior Data Engineer will play a pivotal role in building, optimizing, and maintaining data pipelines for LakeFusion.AI's platform. This role emphasizes advanced expertise in Databricks and its ecosystem, requiring a proven ability to design scalable, high-performance data solutions aligned with best practices for data quality, governance, and compliance. Ideal candidates will have a skill set comparable to that of a Databricks MVP, with deep technical knowledge of, and leadership in, Databricks workflows and features.

Requirements

Key Responsibilities:
  • Design, build, and maintain scalable and high-performance data pipelines using Databricks and Delta Lake.

  • Implement robust data ingestion processes, including real-time ingestion using tools like Databricks Auto Loader and Structured Streaming.

  • Work extensively with Unity Catalog to manage metadata, enforce governance, and ensure data quality across the platform.

  • Optimize data storage and transformations for structured and unstructured data from multiple sources (e.g., Salesforce, Workday, ADLS).

  • Collaborate with data scientists and engineers to integrate data pipelines with LLMs and vector database workflows.

  • Define and implement advanced ETL/ELT processes, focusing on performance tuning and cost optimization in a cloud-native environment.

  • Ensure compliance with HIPAA and other relevant data privacy and security standards.

  • Provide guidance on best practices for Databricks usage, and stay up to date with the latest Databricks features and enhancements.


Qualifications:
  • Educational Background:

    • Bachelor’s or Master’s degree in Computer Science, Data Engineering, or a related field.

  • Experience:

    • 7+ years of experience in data engineering roles, with at least 3 years of hands-on experience with Databricks.

    • Proven expertise in designing and optimizing data pipelines using Spark on Databricks.

    • Experience with Delta Lake for data storage and processing.

    • Strong understanding of data governance and metadata management using Unity Catalog.

    • Experience integrating data workflows with advanced AI/ML models, including LLMs and graph-based solutions.

  • Technical Skills:

    • Proficiency in Databricks SQL, Python, and Scala for data engineering tasks.

    • Expertise in Databricks Auto Loader and real-time streaming pipelines.

    • Familiarity with vector databases (e.g., Pinecone, Milvus) and their integration with Databricks workflows.

    • Knowledge of cloud platforms such as AWS or Azure, with hands-on experience in cloud-native data solutions.

    • Strong debugging and optimization skills for large-scale data pipelines.

  • Soft Skills:

    • Ability to collaborate with cross-functional teams and translate business needs into technical requirements.

    • Excellent problem-solving and analytical skills.

    • Strong communication skills for explaining technical concepts to non-technical stakeholders.


Preferred Qualifications:
  • Databricks certification, such as Databricks Certified Data Engineer Associate or Databricks Certified Data Engineer Professional.

  • Hands-on experience with MLflow on Databricks for experiment tracking and model deployment.

  • Previous contributions to Databricks-related open-source projects, blogs, or technical community engagements.

  • Familiarity with HIPAA compliance and healthcare data management.

About the Company

Frisco Analytics is a forward-thinking data consulting firm dedicated to empowering businesses with cutting-edge analytics and insights. We specialize in transforming complex data into actionable strategies that drive growth and innovation. Our expert team leverages advanced technologies and a deep understanding of industry trends to deliver tailored solutions that meet the unique needs of our clients. At Frisco Analytics, we believe in the power of data to unlock potential and create lasting impact, partnering with businesses to navigate the ever-evolving landscape of modern analytics.
