
Data Analyst
Candidate Summary
Senior Data Analyst with 10+ years of experience transforming complex data into actionable insights across the
healthcare, insurance, and retail domains. Cloud Analytics, AI-Driven Insights & Business Intelligence.
My expertise is deeply rooted in the major cloud platforms: AWS, Google Cloud Platform and Microsoft Azure.
I am committed to spearheading advanced data engineering projects, with a focus on enhancing security,
scalability and efficiency.
Utilized GCP services such as BigQuery, Dataflow, Pub/Sub, and Dataproc to build scalable, high-performance
data processing solutions.
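As a minimal sketch of this kind of BigQuery workload, using the google-cloud-bigquery Python client (the project, dataset and column names are hypothetical):

    from google.cloud import bigquery

    # Authenticates via Application Default Credentials
    client = bigquery.Client()

    # Hypothetical claims table; aggregates daily claim volume
    query = """
        SELECT claim_date, COUNT(*) AS claim_count
        FROM `my-project.claims.daily_claims`
        GROUP BY claim_date
        ORDER BY claim_date
    """
    for row in client.query(query).result():
        print(row.claim_date, row.claim_count)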
Advanced skills in Python, Scala, JavaScript and shell scripting within UNIX/Linux environments, enabling
powerful automation solutions.
Mastery in utilizing AWS (including EMR, EC2, RDS, S3, Lambda, Glue, Redshift), Azure (spanning Data Lake,
Storage, SQL, Databricks) and Google Cloud Platform, facilitating scalable and resilient cloud-based solutions.
Proficient in architecting and executing complex data ingestion strategies and building robust data pipelines;
adept in managing Hadoop infrastructure, data modeling and mining, and refining ETL processes for optimal
performance.
In-depth knowledge of the Hadoop ecosystem; proficient in HDFS, MapReduce, Hive, Pig, Oozie, Flume,
Cassandra and Spark technologies (including Scala integration, PySpark, RDDs, DataFrames, Spark SQL, Spark
MLlib and Spark GraphX), ensuring high efficiency in big data processing and analysis.
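As a minimal sketch of the PySpark DataFrame and Spark SQL work this covers (the input path and column names are hypothetical):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("claims-analysis").getOrCreate()

    # Hypothetical input: newline-delimited JSON claim records
    claims = spark.read.json("s3://example-bucket/claims/")
    claims.createOrReplaceTempView("claims")

    # Spark SQL over the registered view
    top_providers = spark.sql("""
        SELECT provider_id, SUM(amount) AS total_billed
        FROM claims
        GROUP BY provider_id
        ORDER BY total_billed DESC
        LIMIT 10
    """)
    top_providers.show()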
Solid background in applying ETL methodologies with tools such as SQL Server Integration Services, Informatica
PowerCenter and SnowSQL, alongside a deep understanding of OLAP and OLTP systems, enhancing data
transformation and loading processes.
Expertise in developing lightweight, scalable microservices with Spring Boot for real-time data processing and
seamless integration, leveraging its convention-over-configuration principle for efficient application
development.
Experienced in formulating both logical and physical data models, employing Star Schema and Snowflake
Schema designs to support complex data analysis and reporting needs.
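To illustrate the star-schema pattern, a toy PySpark query joining a hypothetical fact table to its dimensions before aggregating (all table and column names are assumptions):

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("star-schema-demo").getOrCreate()

    # Hypothetical warehouse tables: one fact, two dimensions
    fact_sales = spark.table("fact_sales")      # date_key, product_key, sales_amount
    dim_date = spark.table("dim_date")          # date_key, month
    dim_product = spark.table("dim_product")    # product_key, category

    # Classic star join: fact joined to each dimension on its surrogate key
    monthly_by_category = (
        fact_sales
        .join(dim_date, "date_key")
        .join(dim_product, "product_key")
        .groupBy("month", "category")
        .sum("sales_amount")
    )
    monthly_by_category.show()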
High proficiency in SQL Server and NoSQL databases (such as DynamoDB and MongoDB), executing complex
Oracle queries with PL/SQL and leveraging SSIS for effective data extraction, complemented by enhanced
reporting through SSRS.
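On the NoSQL side, a minimal DynamoDB point lookup via boto3 (the table and key names are hypothetical):

    import boto3

    # Hypothetical table with member_id as its partition key
    table = boto3.resource("dynamodb").Table("members")

    response = table.get_item(Key={"member_id": "M-1001"})
    print(response.get("Item"))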
Competent in utilizing data visualization tools such as Tableau and Power BI, and in employing Talend to
construct scalable data processing pipelines, facilitating insightful data presentation and decision-making.
Skilled in developing and managing Golang-based data processing pipelines, efficiently handling voluminous data
through ingestion, transformation and loading, with expertise in Snowflake data warehouse management within
Azure and applying Terraform for infrastructure as code across multiple clouds.
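For the Snowflake side of this work, a minimal query through the snowflake-connector-python package (the account, credentials and warehouse names are placeholders):

    import snowflake.connector

    # In practice, credentials would come from a secrets manager
    conn = snowflake.connector.connect(
        account="xy12345.east-us-2.azure",
        user="ANALYTICS_SVC",
        password="<from-secrets-manager>",
        warehouse="ANALYTICS_WH",
        database="ANALYTICS",
        schema="PUBLIC",
    )
    cur = conn.cursor()
    try:
        cur.execute("SELECT CURRENT_VERSION()")
        print(cur.fetchone()[0])
    finally:
        cur.close()
        conn.close()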
Proficient in designing secure API endpoints, incorporating JWT, OAuth2 and API keys for robust authentication
and authorization mechanisms, adept in managing various file formats (Text, Sequence, XML, JSON) for versatile
data interaction.
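As one concrete example of the JWT piece, issuing and verifying a short-lived token with the PyJWT library (the secret and claims are illustrative):

    import datetime
    import jwt  # PyJWT

    SECRET = "replace-with-a-real-secret"

    # Issue a token that expires in 15 minutes
    token = jwt.encode(
        {
            "sub": "client-42",
            "exp": datetime.datetime.now(datetime.timezone.utc)
                   + datetime.timedelta(minutes=15),
        },
        SECRET,
        algorithm="HS256",
    )

    # Verify on each request; raises jwt.ExpiredSignatureError once stale
    claims = jwt.decode(token, SECRET, algorithms=["HS256"])
    print(claims["sub"])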
Strong adoption of Agile and Scrum methodologies, focusing on iterative development, collaboration and
efficiency, proficient in Test-Driven Development (TDD) and leveraging CI/CD pipelines (with tools like Jenkins,
Docker, Concourse and Bitbucket) for continuous integration and delivery.
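A minimal pytest example of the TDD style this refers to, written test-first against a hypothetical normalization helper:

    # test_normalize.py -- run with `pytest`
    def normalize_state(code: str) -> str:
        """Hypothetical helper: canonicalize a US state code."""
        return code.strip().upper()

    def test_normalize_state_trims_and_uppercases():
        assert normalize_state("  tx ") == "TX"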
Proficient with leading testing tools like Apache JMeter, QuerySurge and Talend Data Quality, ensuring rigorous
validation of data transformations and ETL processes for accuracy and performance.
Deep understanding of network protocols (DNS, TCP/IP, VPN), with specialized skills in configuration and
troubleshooting to ensure secure and reliable data communication across networks.
Proficient in leveraging cutting-edge technologies like Apache Kafka and Apache Storm for real-time data
streaming and analytics. Capable of designing and implementing high-throughput systems that facilitate
immediate data processing and insights, enabling dynamic decision-making processes.
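A minimal Kafka consumer sketch using the kafka-python package (the broker address and topic name are hypothetical):

    from kafka import KafkaConsumer

    # Subscribe to a hypothetical clickstream topic
    consumer = KafkaConsumer(
        "clickstream-events",
        bootstrap_servers="localhost:9092",
        group_id="analytics",
        auto_offset_reset="earliest",
    )
    for message in consumer:
        # message.value is raw bytes unless a value_deserializer is supplied
        print(message.topic, message.offset, message.value)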
Expertise in establishing robust data governance frameworks to ensure data integrity, quality and compliance
with global data protection regulations (such as GDPR and CCPA).
Skilled in implementing data lifecycle management practices, metadata management and access controls to
safeguard sensitive information and promote ethical data usage.
Proficiency in integrating machine learning models and AI algorithms into data processing pipelines, using
platforms like TensorFlow and PyTorch.
Adept at developing predictive models and intelligent systems that enhance business operations, customer
experiences and decision-making capabilities through actionable insights derived from large datasets.
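As a minimal sketch of embedding a trained model in a pipeline step, here using PyTorch (the architecture, feature count and batch are hypothetical):

    import torch
    import torch.nn as nn

    # Hypothetical scoring model: 10 input features -> 1 risk score
    model = nn.Sequential(nn.Linear(10, 16), nn.ReLU(), nn.Linear(16, 1))
    model.eval()  # in a real pipeline, weights come from model.load_state_dict(...)

    batch = torch.randn(4, 10)  # a batch of 4 incoming records
    with torch.no_grad():
        scores = model(batch)
    print(scores.squeeze(1))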
Technical Skills:
Programming & Scripting: Python Scala SQL PySpark Shell Scripting, JavaScript
Cloud Platforms: AWS: EMR, Glue, Redshift Lambda S3,Azure: Synapse, Data Lake, Databricks, GCP: Big Query Dataflow
Pub/Sub
Big Data & Analytics: Spark Hadoop Hive Kafka
Data Warehousing: Snowflake, Delta Lake Iceberg
ETL/Orchestration: dbt Airflow Azure Data Factory
Databases: SQL: Snowflake Redshift Oracle SQL Server, NoSQL: MongoDB DynamoDB Cassandra
BI & Visualization: Power BI, Tableau, Looker Plotly
DevOps & Engineering: CI/CD: Jenkins GitHub Actions Docker
Infrastructure: Terraform Kubernetes
Machine Learning: Scikit-learn, TensorFlow, PyTorch, MLflow
Healthcare Tech: Epic Clarity, Cerner Millennium, FHIR/HL7 APIs, SMART on FHIR, HIPAA Compliance