India Spring SDE 2022


Theme: Application of Open-source Technologies in Data Science and Analytics
As we all know, open-source tools have started to mature in their reliability and flexibility. In the last few years, we have witnessed the expansion of data science toolsets. In the fields of statistics, analytics and visualisation, in addition to SAS (one of the most talked about languages in recent times) we have R and Python. The technology space continues to expand, and it is paramount that we stay ahead of the learning curve in order to take advantage of the cutting-edge solutions that data science problem solving requires.

During this Single Day Event, we shall look at the use of the R, Python and Julia languages for clinical trial data analysis, and explore the potential opportunities for using open-source technology in regulatory data submissions.

Suhas Kirani Ravindra
SDE Chair

Biography: Suhas Kirani Ravindra is an Associate Director Programming, Technical Excellence & Innovation, Biostatistics at GSK. He leads a team of passionate R and R Shiny developers and data scientists.

He has 15 years of experience in the industry and extensive knowledge of clinical trial reporting across multiple phases and disease areas. Suhas is passionate about innovation, process optimisation, and redesigning or developing enterprise solutions. His focus is always on being customer-centric and following agile principles, and he strongly believes in a continuous learning culture with lean agile leadership.

Bhaskar Subramanian
SDE Chair
Labcorp Drug Development

Biography: Bhaskar Subramanian has worked at various levels in programming and management roles for 21+ years, within areas of clinical data flow such as data acquisition, standards, statistical analysis and submission.

Prior to joining Labcorp, Bhaskar worked for GSK, Quintiles, Octagon and Accenture. He has trained 250+ professionals in CDISC SDTM.

Sanjeev SR
Associate Vice President – Statistical Programming

Abstract: It is no surprise that companies in the pharmaceutical industry are adopting more and more open-source tools and methods to potentially move faster, while performing exceptionally well in clinical submissions.

Julia is a free and open-source language, with multi-paradigm functionality and object-oriented programming. Julia provides ease and expressiveness for high-level numerical computing, similar to R and Python, and supports general programming. It works with almost all databases. Julia has been downloaded over 25 million times, and more than 5,000 Julia packages have been registered for community use. These include mathematical libraries, data manipulation tools and packages for general purpose computing. In addition, we can easily use libraries from Python/R, C, C++ and Java. Julia’s compilation is different from that of languages like Python or R. It’s easy to understand, so let’s learn how to use it to run as fast as C.


Biography: Sanjeev SR is responsible for Operations & Client Delivery for the Statistical Programming team at Accenture Life Sciences R&D. He has 18+ years of experience in clinical data management, eDC build, global standards and clinical programming (SDTM/CDISC) in various therapeutic areas. Sanjeev is accountable for global teams delivering to project timelines, major project milestones and overall project quality. He leads strategic technical or process improvement initiatives. Sanjeev is an expert in people management, operational leadership, project management and client management within health and life sciences R&D.

Jagadish Katam
Principal Statistical Programmer
Princeps Technologies

Abstract: Shiny (a web application framework for R) is an R package that makes it easy to build highly interactive web apps directly in R. It combines the computational power of R and modern visualisation techniques to create interactive applications. Harnessing this power, R users can develop Shiny apps for visualising clinical data, as well as applications that aid in study design and analysis. Shiny apps can empower non-statisticians to explore and visualise their data or perform their own analysis with methods we develop.

In this presentation we will try to understand what Shiny is, the basic structure of a Shiny app and how we can make our own Shiny app. We will look into some examples of Shiny applications that use the power of R and Shiny, for viewing the relationship between variables in multiple dimensions and altering our visualisation with real-time parameter refinement using the UI component of the app.


Biography: Jagadish Katam is a Principal Statistical Programmer at Princeps Technologies, working on end-to-end programming activities. He has more than nine years’ experience as a SAS programmer and in leading studies. He has worked on successful clinical submissions and has experience of working in therapeutic areas such as infectious diseases and oncology. Jagadish’s expertise ranges from SDTM, ADaM and TFLs to macro programming.


He likes to spend his free time improving his programming skills in SAS and exploring the use of R to support innovative statistical methodologies and advanced visualisation in the pharma industry. Jagadish sees PHUSE as a good platform to share and gain knowledge on various innovative topics in the ever-evolving clinical industry.

Joseph Rajan
Principal Statistical Programmer

Abstract: The growth of open-source software and programming languages has been a great game-changer in the world of information technology. The move from simple snake games on our Nokias to millions of complicated applications on our phones and computers was made possible by open-source software and programming languages. A simple observation: only a limited number of applications worked on the Blackberry phone and the Symbian-operated Samsung/Nokia phones. Once the era of the Android OS arrived, it produced millions of applications in the Play Store, developed by millions of developers worldwide. The clinical trial world is a little more complicated than simply adopting open source for the benefit of completing research faster.


This presentation will focus on the big “WHY”: why we should consider open source and, specifically, the Python language. We will explore a few of the pros and cons of this programming language.


Biography: Joseph Rajan has 11 years of experience in information technology, spending eight years at Cognizant before moving to Pfizer India. He has worked with multiple technologies throughout his career. He gained his Bachelor of Engineering in Computer Science from Mount Zion College of Engineering, an affiliate college of Anna University, Chennai.


Joseph’s primary interest is in developing automation, to minimise manual effort and increase productivity. His thought process is very simple: “Never do the same routine twice.” Joseph would gladly spend ten hours developing a simple automation for a task that can be completed in two hours, just so that task need not be done manually.

Arwa Topiwalla
Senior Principal Consultant
Saama Technology

Abstract: When we think about clinical trial datasets (SDTM/ADaM), we often only think about closed-source, expensive, licensed software. The majority of regulatory submissions are done using such licensed software; however, there are alternatives for clinical programming, including biostatistical programming. The objective is to bring out alternatives to this traditional approach by discussing open-source technology which provides all the functionality required to create a CDISC-compliant dataset and cater to regulatory submissions. Here, we will discuss whether expensive technologies are required for creating a CDISC submission package and what the advantages of open-source technologies are for related and common business use cases.
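As a rough illustration of the kind of dataset work this abstract refers to, the sketch below uses the open-source pandas library to map a small raw extract onto a few SDTM Demographics (DM) variables. The raw column names, the study identifier and the USUBJID convention are all hypothetical, invented for this example; a real submission dataset would also need the full set of required variables, controlled terminology checks and validation.

```python
import pandas as pd

# Hypothetical raw extract; column names and values are made up for illustration.
raw = pd.DataFrame({
    "subject": ["001", "002", "003"],
    "site": ["10", "10", "20"],
    "gender": ["Male", "Female", "Male"],
    "age_years": [34, 51, 47],
    "first_dose": ["03/01/2022", "15/01/2022", "28/02/2022"],  # DD/MM/YYYY
})

STUDYID = "STUDY01"                      # assumed study identifier
SEX_MAP = {"Male": "M", "Female": "F"}   # raw text -> SDTM-style codes

dm = pd.DataFrame({
    "STUDYID": STUDYID,
    "DOMAIN": "DM",
    # One common (assumed) convention: study, site and subject joined by hyphens
    "USUBJID": STUDYID + "-" + raw["site"] + "-" + raw["subject"],
    "SUBJID": raw["subject"],
    "SITEID": raw["site"],
    # Convert the raw date text to ISO 8601 (YYYY-MM-DD)
    "RFSTDTC": pd.to_datetime(raw["first_dose"], format="%d/%m/%Y").dt.strftime("%Y-%m-%d"),
    "AGE": raw["age_years"],
    "AGEU": "YEARS",
    # Anything not in the map falls back to "U" (unknown)
    "SEX": raw["gender"].map(SEX_MAP).fillna("U"),
})

print(dm)
```

The point of the sketch is only that renaming, recoding and date standardisation are ordinary data-frame operations in an open-source tool.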

Biography: Arwa Topiwalla has 15+ years of experience in life sciences as a biostatistical programmer. She has significant experience in leading multiple projects for Business Process as a Service (BPaaS). Arwa is proficient in statistical programming activities for Phase I to IV clinical trials including CDISC and submission packages, and is part of the standards team to set up global standards for CDISC and TFLs. Arwa is currently standards lead for Saama’s smart series SAM (Smart Auto Mapper).

Soumita Chel
Programmer Analyst

Abstract: You have spent a considerable amount of time in the clinical domain and recently started working with data science. One fine day, you get an email about a Python upskilling session, which leaves you wondering whether this could be useful in your current role.


In this presentation, we will seek to find that answer by exploring Python as an open-source tool for data science. We will see how clinical data representation, manipulation and visualisation can be achieved through different Python libraries like Pandas, NumPy, Matplotlib and Seaborn. Lastly, we will go through a practical machine learning demonstration of a clinical scenario using the Python sklearn package.
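To give a flavour of the workflow described here, the following is a minimal sketch that represents, summarises and models a small clinical-style dataset with pandas, NumPy and scikit-learn. All of the data is randomly generated and every variable name and parameter is an assumption made for this example; it is not taken from the presentation itself.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic subject-level data: generated at random, not from any real trial.
rng = np.random.default_rng(seed=42)
n_subjects = 200
adsl = pd.DataFrame({
    "USUBJID": [f"SUBJ-{i:03d}" for i in range(n_subjects)],
    "ARM": rng.choice(["Placebo", "Treatment"], size=n_subjects),
    "AGE": rng.integers(18, 80, size=n_subjects),
    "BASELINE": rng.normal(loc=50.0, scale=10.0, size=n_subjects),
})

# Simulate a responder flag that is more likely on active treatment.
treated = (adsl["ARM"] == "Treatment").to_numpy()
response_prob = np.where(treated, 0.7, 0.3)
adsl["RESPONDER"] = (rng.random(n_subjects) < response_prob).astype(int)

# Manipulation: per-arm counts, mean age and observed response rate.
summary = adsl.groupby("ARM").agg(
    n=("USUBJID", "count"),
    mean_age=("AGE", "mean"),
    response_rate=("RESPONDER", "mean"),
)

# Machine learning: predict response from arm, age and baseline value.
features = pd.DataFrame({
    "TREATED": treated.astype(int),
    "AGE": adsl["AGE"],
    "BASELINE": adsl["BASELINE"],
})
X_train, X_test, y_train, y_test = train_test_split(
    features, adsl["RESPONDER"],
    test_size=0.25, random_state=0, stratify=adsl["RESPONDER"],
)
model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # fraction of hold-out subjects classified correctly

print(summary)
print(f"Hold-out accuracy: {accuracy:.2f}")
```

The `summary` table could equally be passed to Matplotlib or Seaborn for plotting; that step is left out here to keep the sketch short.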


Biography: Soumita Chel is a programmer analyst at Bayer Pharmaceuticals, with six-and-a-half years’ experience. After graduating in engineering, Soumita joined Tech Mahindra as a software engineer in a role based on automation and data analysis using Unix, Linux, Python and Oracle. Handling huge amounts of data gave Soumita the opportunity to explore data science further and pursue a full-time master’s degree in data science at the University of Glasgow. Soumita’s academic research was based on machine learning modelling of single cell RNA sequencing data. On joining IIT Hyderabad as a Data Science Research Fellow, Soumita worked on applying ML techniques to cancer drug data, which led to publications in IEEE. Soumita’s role at Bayer focuses on developing Azure- and Python-based utilities. When she’s not at work, Soumita can be found reading, writing research papers or doing origami.

Ross Farrugia
Data Engineering Product Family Lead

Abstract: Roche and GSK have entered a co-development collaboration towards an early 2022 open-source release of a modular ADaM framework solution (using R) named admiral (ADaM in R Asset Library). A further 15 companies are involved in testing the prototype, as we progress development throughout the year. The framework will rely on community contributions from all companies to grow it, given the endless nature of analysis datasets, so this is an early chance to engage with an industry-wide collaboration that we hope many will want to further contribute to in the future. We see this as one component of a wider potential for pharma open-source collaborations to cover our e2e clinical reporting flow.


This is your opportunity to join the journey early on and help us lessen the burden of ADaM, thus accelerating the speed at which our industry is collectively able to bring treatments to patients.

Biography: Ross Farrugia is the Data Engineering Product Family Lead within the Data Sciences group at Roche. In this position, he has accountability for our analysis data pipeline products that set the foundations ready for insights generation. Prior to this, Ross worked for 15 years as a statistical programmer, where he was involved in leading numerous filings, line management activities, and a global technology & innovation lead role.

He has a passion for transformative ways of working in order to help better meet the needs of our patients, including driving greater x-pharma alignment and collaboration through open-source solutions.


Ross holds a BSc in Maths, Operational Research & Statistics from Cardiff University. Outside of work, he is an avid football fan and loves spending time with his two young children and his dog named Batman.

