Work Experience:
Python Data Engineer: Adobe (Oct 2021 to present)
● Led the Data and ML teams for the DSP (Demand-Side Platform).
● Led several projects for the Data, ML and Product teams.
● Worked closely with the Reporting, Analytics and Accounting teams of the advertising platform.
● Led the Qubole-to-EMR migration project for the ML and Data teams and helped other teams with theirs.
● Helped teams improve query/job performance using Spark.
● Configured EMR clusters for ad-hoc Spark, Hive and Presto jobs/queries.
● Implemented and improved alerting for failed jobs and data spikes using Python, Slack, PagerDuty and email.
● Managed the job scheduling tool and improved its performance and monitoring.
● Added several features to the job scheduling platform, such as pushing data to S3, Snowflake and Hive.
● Enabled data pipelines for ML model and product dashboards.
● Enabled data pipelines for reporting on advertisement platforms.
● Enabled data pipelines to exchange data with external partners and vendors.
● Managed ML model training pipelines, model serving and model monitoring dashboards.
● Deployed new ML models on a K8s cluster and automated model training and prediction.
● Conducted several POCs of tools, technologies and platforms.
● Identified and decommissioned unused compute and storage resources.
Data Engineer 3: PayPal (May 2021 to Oct 2021)
● Investigated and resolved issues in existing data pipelines.
● Onboarded new tables to Hadoop clusters from PayPal properties such as Venmo and Xoom.
● Extracted data from different databases and ingested it into the Hadoop cluster.
● Scheduled pipelines using crontab and UC4.
● Ingested historical data into existing tables, with or without modified schemas.
● Enabled partitioned and non-partitioned Hive tables on other clusters used by the management and reporting teams.
● Resolved existing issues in Kafka-Spark streaming pipelines.
● Automated data transformations and data pipelines.
Senior Software Engineer: Freshworks (Nov 2019 to May 2021)
● Developed and improved big data pipelines on Spark, Hive, Impala, AWS and RDS.
● Refactored the existing data pipelines to improve code maintainability and performance.
● Developed a data pipeline to process feedback data for model training, monitoring and report generation.
● Gathered machine learning requirements from Data Scientists and developed data pipelines to provide appropriate data for model training.
● Developed data archival and purging solutions to remove PII.
● Designed database schemas, classified fields and migrated data.
● Coordinated with different solution providers to modernize existing data platforms.
● Worked on POCs to evaluate technologies and platforms that could help modernize the existing platforms.
● Developed and conducted learning sessions on Big Data and related technologies.
Specialist Programmer: Infosys Ltd (May 2016 to Nov 2019)
TECHNICAL SKILLS AND TOOLS
● Programming Languages – Scala, Java, Python, C, SQL, HiveQL, Shell Script
● Big Data – MapReduce, HDFS, HBase, Sqoop, ZooKeeper, Hive, Impala, Kafka, Spark SQL, Spark
Streaming, Zeppelin, Databricks, Cloudera, UC4, Snowflake, Qubole, Presto
● Cloud Computing – AWS, Amazon EMR, Kinesis, S3, DynamoDB, RDS, EC2, Azure
● Databases – MySQL, HBase, PostgreSQL
● IDEs – IntelliJ IDEA, Eclipse, PyCharm, MS Visual Studio, Jupyter Notebooks
● Operating Systems – Windows, Linux, macOS
● Other – GitHub, Bitbucket, Bamboo, Jira, Confluence, Jenkins