Protein Sequence and Structure Analysis Using Google Cloud Engine

Ajay Arya1, Johan Nyström-Persson2, Shandar Ahmad1

1School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067, India

2National Institute of Biomedical Innovation, Health and Nutrition, Osaka, Japan



The computational needs of bioinformatics are constantly increasing. Although sophisticated ready-made tools are increasingly available, in order to fully control their methods, bioinformaticians and other data scientists will need to write or modify their own software. Recently, there has also been a shift in computational architectures, from single-core desktop and laptop computers to multicore and distributed systems such as cloud computing. This shift necessitates a change in the way that we approach programming and think about algorithms in general and also specifically in bioinformatics. In this workshop we will introduce Google cloud computing techniques and tools to perform basic sequence and structure analysis of proteins.


This course will give a theoretical background as well as hands-on experience in the following topics.

  • Scalable computation with Google Cloud Platform
  • High-performance, concurrent algorithms for data analysis on Google Compute Engine
  • Basic protein sequence and structure analysis using Google Cloud

Requirement for signing up to Google Cloud Platform (GCP):

All participants in this workshop will need access to Google Cloud services. Every
participant is required to sign up for a Google Cloud Platform (GCP) account. This
entails the following steps:
  1. If you do not already have a Google account (e.g. Gmail), you need to create one
    for free.
  2. After creating a Google account, log in to it and sign up specifically for a
    Cloud account. To do so, navigate to and click on “Try it free!”. This will start
    your GCP sign-up process. Complete the process. You will need to enter your billing
    information, but the free services can be used and are enough to complete all the
    activities in the workshop. The GCP free trial gives you approximately $300 worth of
    cloud services, valid for 12 months from the date of signing up.
  3. If you have used up your GCP free account allocation before or during the
    workshop, you may have to switch to the paid option and complete the payment.

Workshop details

The workshop will consist of three one-hour lectures. Each session will start with a lecture giving a theoretical introduction to the topic, followed by hands-on exercises.

Lecture 1. Scalable programming basics

Basic ideas of parallel and scalable programming will be introduced. General tools for parallelization in standard software such as R will be reviewed.

Lecture 2. Google Cloud Engine

A basic introduction to cloud computing will be given, followed by its specific implementation in Google Cloud. The MapReduce and Apache Spark cloud computing frameworks will be introduced and compared.
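
As a rough illustration of the map-and-reduce pattern that these frameworks build on, the sketch below counts residues across a set of sequences in plain Python. It does not use the Hadoop or Spark APIs, and the function names (map_count, reduce_counts, total_counts) are hypothetical and chosen only for this example.

```python
from collections import Counter
from functools import reduce


def map_count(sequence):
    """Map step: count the residues in one sequence independently of the others."""
    return Counter(sequence.upper())


def reduce_counts(left, right):
    """Reduce step: merge two partial counts into one."""
    return left + right


def total_counts(sequences):
    """Apply the map step to every sequence, then fold the partial results together."""
    return reduce(reduce_counts, map(map_count, sequences), Counter())


if __name__ == "__main__":
    # Two toy sequences, invented for illustration only.
    print(total_counts(["ACA", "CC"]))
```

Because each map step depends on a single sequence and the reduce step is associative, a framework is free to run the map steps on different machines and merge the partial counts in any order.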

Lecture 3. Sequence and Structure Analysis using Google Cloud

Basic sequence analysis tasks, such as amino acid composition calculation, amino acid propensity in a pair of sequence sets, and information content from multiple alignments, will be demonstrated, and hands-on exercises will be provided. For structure analysis, structures from PDB will be taken as examples, and secondary structure, solvent accessibility and binding sites will be computed.
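
To give a flavour of these sequence analysis tasks, here is a minimal Python sketch of two of them: amino acid composition, and per-column information content of a multiple alignment. It is not the workshop's own code. It assumes the 20 standard amino acids, ignores gaps and non-standard symbols, and computes information content as log2(20) minus the Shannon entropy of the column, with no background-frequency or small-sample correction. All function names are hypothetical.

```python
import math
from collections import Counter

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(sequence):
    """Return the fraction of each standard amino acid in a sequence."""
    counts = Counter(res for res in sequence.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return {aa: 0.0 for aa in AMINO_ACIDS}
    return {aa: counts[aa] / total for aa in AMINO_ACIDS}


def column_information(column):
    """Information content in bits of one alignment column (gaps ignored)."""
    counts = Counter(res for res in column.upper() if res in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    entropy = -sum((n / total) * math.log2(n / total) for n in counts.values())
    return math.log2(len(AMINO_ACIDS)) - entropy


def alignment_information(aligned_sequences):
    """Information content of every column of an equal-length alignment."""
    return [column_information("".join(col)) for col in zip(*aligned_sequences)]


if __name__ == "__main__":
    # Toy inputs, invented for illustration only.
    print(composition("AAGG")["A"])
    print(alignment_information(["AC", "AD"]))
```

A fully conserved column reaches the maximum of log2(20) bits under these assumptions, while a column split evenly between two residues scores one bit less.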

Who can attend

Participants are expected to have at least some minimal prior experience in programming, using any language (R, Python, or Perl would suffice). A basic working knowledge of molecular biology is helpful but not essential.

Maximum intake:

A maximum of 50 participants will be accommodated. Selection (in case of excess requests) will be made by the course coordinator and teaching faculty based on the compatibility and usefulness of the course and also considering regional, social and gender diversity.


Users will be required to subscribe to basic Google Cloud services. This may amount to about USD 50 per person and will be arranged by the users themselves. No financial transaction with the organizers or hosts is needed.

Duration of Course:

Lectures: 3 hours
Hands-on: 2 hours

Course coordinator:

Professor Shandar Ahmad,

School of Computational and Integrative Sciences, Jawaharlal Nehru University, New Delhi-110067

Phone: +91-11-2674-8788 (O)                Email:


Shandar Ahmad:                                   Ajay Arya: