## Training solutions: Python, SageMath, Stata, R, Excel and Tableau

#### Data Management and Analysis for Project Managers

Participants will learn the principles of statistics and gain skills in using statistical tools to describe, explore and investigate the variables in survey data sets using Stata or R. The course comprehensively covers the statistical background required to conduct research, describe and summarize data, develop hypotheses, assess associations, analyze data, and interpret and communicate results.

The course targets professionals in research (or research-related) organizations and institutions who wish to acquire or strengthen their computational skills in Stata or R. It is designed to benefit data managers, research officers, project managers, statisticians, data analysts, undergraduate and postgraduate students, trainers and facilitators, policy makers, and staff of research and non-governmental organizations, among others.

**Course software (either of them, not both):**

- Stata
- R programming

##### Course content

- Data management and manipulation
- Introduction to the software
- Software overview
- Directory management commands
- Data types
- Basic arithmetic
- Syntax and its format
- Syntax editors
- Creating data sets
- Display and view data sets
- Interrupting computations

- Working with data sets, dates and help
- Renaming variables
- Managing variables and/or variable properties
- Importing data from other software
- Exporting data to other software
- Count number of observations
- Generate sequential numbers
- Change order of variables
- Saving data sets
- Loading data into the memory
- Conditional statements
- Create subsets
- Create random variables (from distributions)
- Sort variables
- Random sampling
- Working with dates
- Obtaining help in Stata

- Creating and changing variables
- Create new variables
- Extended generate command
- Duplicate an existing variable
- Replace contents of a variable
- Convert numeric to string
- Convert string numbers to numeric
- Convert numeric values to missing and vice versa
- Recode string variables
- Decode numerically coded variables
- Transforming a continuous variable to categorical
- Reduce number of categories of a categorical variable
- Managing duplicates

- Transforming variables
- Split variables
- Extract parts of variables
- Standardize variables
- Create dummy variables
- Create separate variables
- Transpose variables
- Stack variables
- Unstack variables
- Convert datasets from wide to long
- Convert datasets from long to wide

- Appending and merging data sets
- Appending data sets by rows
- Appending data sets by columns
- One-to-one merging
- One-to-many merging
- Many-to-one merging
- Many-to-many merging
- Using merge to update data sets

- Descriptive statistics
- Introduction to statistical concepts
- Review of research process
- Research designs
- Sampling techniques
- Types of data
- Descriptive statistics
- Graphs for descriptive statistics

- Exploratory data analysis in Stata
- One way frequency tables
- Cross-tabulations
- Tables of descriptive statistics
- Exporting tables to Excel
- Multiple responses

- Data visualization (charts)
- Introduction to graphing
- The graphics dialog windows
- Graph elements (x and y labels, titles, legends)
- Graph appearance (marker symbol, color, size, line width, pattern, etc.)
- Multiple graphs (by option)
- Graphics syntax
- Adding text and annotations to graphs
- Saving and printing graphs
- Combining active graphs into one figure
- Graphics window (interactive plotting)

- Common graphs and charts
- Pie chart
- Simple bar graph
- Grouped/clustered bar graph
- Stacked bar graph
- Confidence interval bands
- Line graph
- Combined graph
- Area bar graph
- Scatter plot
- Box plot

- Hypothesis testing
- Hypothesis testing background
- Definitions
- Statistical inference
- Generalizability
- Confidence intervals in clinical research
- P-values in clinical research
- Hypothesis testing
- Interpreting hypothesis test results

- Tests of differences in population means
- One sample z tests
- One sample t tests
- Two sample independent z tests
- Two sample independent t tests
- Two sample paired t test
- One way analysis of variance
- Two way analysis of variance
- Analysis of covariance

- Analysis of contingency tables
- Proportion test
- Chi-square goodness of fit test
- Chi-square test of independence
- Chi-square test of homogeneity
- Fisher’s exact test
- McNemar matched pairs for binary response

- Non-parametric methods
- Sign test
- Wilcoxon signed-rank test
- Median test
- Wilcoxon rank-sum (Mann-Whitney) test
- Kolmogorov-Smirnov goodness-of-fit test
- Kruskal-Wallis one way analysis of variance
- Friedman two-way analysis of variance
- Spearman's rank order correlation
- Kendall's correlation

- Regression analysis and correlation
- Pearson correlation coefficient
- Linear regression for continuous response
- Overview of linear regression analysis
- Simple linear regression
- Multiple linear regression
- Prediction in linear regression
- Testing for assumptions of linear regression
- Regression diagnostics for linear regression

- Logistic regression for categorical response
- Overview of logistic regression analysis
- Simple logistic regression
- Multiple logistic regression
- Tests for logistic regression
- Ordinal logistic regression
- Multinomial logistic regression
- Conditional logistic regression

- Poisson regression for count response
- Overview of Poisson regression analysis
- Simple Poisson regression
- Multiple Poisson regression
- Ordinal Poisson regression

#### Biostatistics and Epidemiology for Public Health Professionals

Participants will learn the principles of epidemiology and biostatistics and gain skills in using epidemiological and biostatistical tools to describe, monitor and investigate the determinants of population health. The course comprehensively covers the statistical background required to conduct research, describe and summarize data, develop hypotheses, assess associations, analyze data, and interpret and communicate results.

The course targets health care professionals who wish to consolidate their knowledge and skills and deepen their understanding of the importance of epidemiology and statistics in public health today. Relevant career paths include public health physician, epidemiologist, biostatistician, surveillance officer, monitoring and evaluation coordinator, data manager and research officer, among others.

**Course software (either of them, not both):**

- Stata
- R programming

##### Course content

- Descriptive statistics and graphics
- Exploratory data analysis in Stata
- One way frequency tables
- Cross-tabulations
- Tables of descriptive statistics
- Exporting tables to Excel
- Multiple responses

- Introduction to graphing
- The graphics dialog windows
- Graph elements (x and y labels, titles, legends)
- Graph appearance (marker symbol, color, size, line width, pattern, etc.)
- Multiple graphs (by option)
- Graphics syntax
- Adding text and annotations to graphs
- Saving and printing graphs
- Combining active graphs into one figure
- Graphics window (interactive plotting)

- Common graphs and charts
- Pie chart
- Simple bar graph
- Grouped/clustered bar graph
- Stacked bar graph
- Confidence interval bands
- Line graph
- Combined graph
- Area bar graph
- Scatter plot
- Box plot

- Hypothesis testing
- Hypothesis testing background
- Definitions
- Statistical inference
- Generalizability
- Confidence intervals in clinical research
- P-values in clinical research
- Hypothesis testing
- Interpreting hypothesis test results

- Tests of differences in population means
- One sample z tests
- One sample t tests
- Two sample independent z tests
- Two sample independent t tests
- Two sample paired t test
- One way analysis of variance
- Two way analysis of variance
- Analysis of covariance

- Analysis of contingency tables
- Proportion test
- Chi-square goodness of fit test
- Chi-square test of independence
- Chi-square test of homogeneity
- Fisher’s exact test
- McNemar matched pairs for binary response

- Non-parametric methods
- Sign test
- Wilcoxon signed-rank test
- Median test
- Wilcoxon rank-sum (Mann-Whitney) test
- Kolmogorov-Smirnov goodness-of-fit test
- Kruskal-Wallis one way analysis of variance
- Friedman two-way analysis of variance
- Spearman's rank order correlation
- Kendall's correlation

- Regression analysis and correlation
- Pearson correlation coefficient
- Linear regression for continuous response
- Overview of linear regression analysis
- Simple linear regression
- Multiple linear regression
- Prediction in linear regression
- Testing for assumptions of linear regression
- Regression diagnostics for linear regression

- Basics of epidemiology
- Overview of epidemiology
- Measures of disease frequency
- Importance of measures of disease frequency
- Measures of risk and association
- Risk versus prevention
- Prevalence
- Incidence, cumulative incidence & incidence density
- Relationship between prevalence and incidence
- Stratification of disease frequency

- Measures of effect for categorical data
- Risk difference
- Risk ratio
- Attributable fraction
- Attributable risk
- Relative risk
- Odds ratio

- Measures of effect for stratified categorical data
- Mantel-Haenszel test
- Odds ratio for stratified data
- Odds ratio for matched pairs studies
- Testing for trends

- Vital statistics
- Introduction
- Death rates and ratios
- Measures of fertility
- Measures of morbidity

- Epidemiological studies
- Clinical research designs
- Study population
- Exposure and outcome
- Study designs
- Causation

- Case report and series
- Cross-sectional studies
- Cohort studies
- Cohort study design
- Ascertainment
- Advantages
- Disadvantages
- Poisson regression for cohort studies

- Case-control studies
- Case-control study design
- Advantages
- Disadvantages
- Unconditional logistic regression
- Conditional logistic regression

- Other analysis
- Misclassification
- Definition
- Simple linear regression
- Non-differential misclassification
- Differential misclassification
- Assessing misclassification

- Confounding
- Confounding overview
- Evaluation of confounding factors
- Confounding by indication

- Remedies for confounding
- Restriction
- Stratification
- Matching
- Regression
- Randomization
- Interpretation after adjusting for confounding
- Unadjusted versus adjusted association: confounding

- Effect modification
- Overview
- Synergy between exposure variables
- Effect modification versus confounding
- Evaluation of effect modification
- Effect modification in clinical research articles
- Effect modification on the relative and absolute scales

- Introduction to survival analysis
- Overview
- Organizing survival data for computer use
- Censoring (right and left)
- Truncation (right and left)
- Plotting survival data (the Kaplan-Meier curve)
- Log-rank tests
- Hazard rates
- Cox proportional hazards models
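
Several of the measures of effect for categorical data listed above have simple closed forms. The block below uses standard 2×2 table notation, with labels of our own choosing: $a$ and $b$ are exposed individuals with and without the outcome, and $c$ and $d$ are unexposed individuals with and without the outcome.

$$
\begin{aligned}
\text{Risk difference (RD)} &= \frac{a}{a+b} - \frac{c}{c+d} \\
\text{Risk ratio (RR)} &= \frac{a/(a+b)}{c/(c+d)} \\
\text{Odds ratio (OR)} &= \frac{a/b}{c/d} = \frac{ad}{bc} \\
\text{Attributable fraction among the exposed} &= \frac{\mathrm{RR} - 1}{\mathrm{RR}}
\end{aligned}
$$

For example, with hypothetical counts $a = 30$, $b = 70$, $c = 10$ and $d = 90$, the risks are 0.30 and 0.10, so RD = 0.20, RR = 3.0, OR = 2700/700 ≈ 3.86 and the attributable fraction among the exposed is 2/3.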


#### Numerical Analysis and Modelling for Scientists and Engineers

Numerical analysis is a multidisciplinary subject, recognized as an integral part of many fields, including mathematics, computer science, physics, commerce, engineering and biology.

Participants will get a comprehensive introduction to either Python or SageMath, after which they will use the chosen language to find numerical solutions to applied scientific problems. Although analytic (symbolic) methods are not the main focus, we will also use the Python SymPy library or SageMath's symbolic tools to find analytic solutions to symbolic expressions.
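
To give a flavour of the symbolic side mentioned above, here is a minimal sketch using SymPy. It assumes SymPy is installed (for example via `pip install sympy`); the expressions are arbitrary illustrations chosen for this example, not course material, and the SageMath equivalent is not shown.

```python
import sympy as sp

x = sp.symbols("x")
y = sp.Function("y")

# Exact roots of a polynomial equation: x**2 - 2 = 0
roots = sp.solve(x**2 - 2, x)

# Symbolic derivative (product rule) and an exact definite integral
f = sp.exp(x) * sp.sin(x)
df = sp.diff(f, x)
area = sp.integrate(x**2, (x, 0, 3))

# General solution of the ODE y' = y (contains an arbitrary constant C1)
ode_solution = sp.dsolve(sp.Eq(y(x).diff(x), y(x)), y(x))

# Floating-point value of an exact result, to 10 significant digits
sqrt2_numeric = sp.N(sp.sqrt(2), 10)

print(roots)
print(df)
print(area)
print(ode_solution)
print(sqrt2_numeric)
```

Each result is an exact SymPy object; `sp.N` (or the `.evalf()` method) converts it to a floating-point approximation when a numerical value is needed.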

This numerical analysis course targets students and professionals in mathematics, engineering, finance, economics and biology, among others. Those with an interest in mathematical modelling and simulation are especially encouraged to apply.

**Course software (either of them, not both):**

- Python
- SageMath

##### Course content

- Getting started
- Overview
- Programming editors
- Modules (Python only)
- Structure of commands
- Creating variables and arrays
- Working with strings
- Input and output functions
- Accessing help

- Arithmetic operations and built-in functions
- Real numbers
- Complex numbers
- Lists (Python only)
- Tuples (Python only)
- Rounding functions
- Mathematical functions

- Functions and program control flow
- Introduction
- Scripts
- User defined functions
- Program control flow

- Vectors and matrices
- Introduction
- Creating scalars and arrays
- Sequences
- Subscripting arrays
- Special matrices
- Restructuring matrices
- Operations on matrices

- Symbolic mathematics
- Introduction
- Polynomials and function simplification
- Solutions of equations
- Limits
- Series expansion
- Series summation
- Symbolic operations on matrices
- Differentiation and integration
- Ordinary differential equations
- Transforms
- Vector differential calculus
- Vector integral calculus

- Introduction to graphs and plots
- Introduction to plotting
- The plot functions
- Titles and axis labels (x and y)
- Creating multiple graphs
- Adding annotations / text to graphs
- x- and y-axis properties
- Additional options
- Common 2D and 3D plots

- Direct solutions of linear systems of equations
- Introduction
- Elementary row operations
- Elementary row operation applications
- LU factorization
- Solutions of linear systems with built-in functions

- Iterative and conjugate gradient methods
- Introduction
- Vector norms
- Matrix norms
- Iterative techniques
- Conjugate gradient methods

- Solutions of single variable equations
- Introduction
- Closed domain methods
- Open domain methods
- Other methods
- Equations with multiple roots

- Solutions of systems of non-linear equations
- Introduction
- Fixed point iteration
- Newton’s method
- Quasi-Newton methods: Broyden method
- Steepest gradient techniques: Steepest descent
- Homotopy and continuation methods: Continuation algorithm

- Numerical differentiation
- Introduction
- Direct polynomial fit
- Newton difference methods
- Three point formulas
- Five point formulas
- Richardson extrapolation
- Second derivative mid-point formula

- Numerical integration
- Introduction
- Direct polynomial fit
- Newton-Cotes formulas
- Composite rules
- Romberg integration
- Gaussian quadrature
- Double integration

- Curve fitting and interpolation
- Introduction
- Least squares regression
- Linearizing nonlinear data
- Polynomial interpolation
- Interpolation using splines

- Initial value problems
- Introduction
- Single step methods
- Adaptive Runge-Kutta methods
- Multi-step methods
- Predictor-corrector methods
- Extrapolation method
- Systems of ordinary differential equations
- Higher order ordinary differential equations

- Boundary value problems
- Introduction
- Shooting method
- Finite difference method
- Rayleigh-Ritz method
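
As an illustration of two of the listed topics, the sketch below implements Newton's method (an open domain method for single variable equations) and the composite Simpson's rule (a composite Newton-Cotes rule). It uses only the Python standard library; the function names, tolerances and test problems are our own illustrative choices, not the course's reference code.

```python
import math


def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of f(x) = 0 with Newton's method, starting from x0."""
    x = x0
    for _ in range(max_iter):
        dfx = df(x)
        if dfx == 0:
            raise ZeroDivisionError("derivative vanished; choose another x0")
        step = f(x) / dfx
        x -= step
        # Stop once the update is smaller than the tolerance
        if abs(step) < tol:
            return x
    raise RuntimeError("Newton's method did not converge")


def composite_simpson(f, a, b, n):
    """Approximate the integral of f over [a, b] using n (even) subintervals."""
    if n % 2:
        raise ValueError("n must be even")
    h = (b - a) / n
    total = f(a) + f(b)
    for i in range(1, n):
        # Interior weights alternate 4, 2, 4, ..., 4
        weight = 4 if i % 2 else 2
        total += weight * f(a + i * h)
    return total * h / 3


# Root of x**2 - 2 (i.e. sqrt(2)) and the integral of sin(x) over [0, pi] (exactly 2)
root = newton(lambda x: x**2 - 2, lambda x: 2 * x, x0=1.0)
integral = composite_simpson(math.sin, 0.0, math.pi, n=10)
print(root, integral)
```

Newton's method converges quickly near a simple root but needs the derivative and a reasonable starting point; Simpson's rule integrates cubic polynomials exactly, so its error on smooth functions shrinks rapidly as `n` grows.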

#### Data Analytics and Machine Learning for Data Scientists

Data analytics and data science have been ranked among the most sought-after professions. Demand for data practitioners remains very high, and businesses in all industries have been working to capitalize on the growing volume of data and on the new tools becoming available for analyzing big data and extracting value from it.

The course is intended for anyone interested in developing the skills and experience to pursue a career in data science and machine learning. By the end of the course, participants will have completed several practical assignments with real data sets, giving them the confidence they need to enter the data science profession.

**Course software (either of them, not both):**

- Python
- R programming

##### Course content

#### Data Visualization and Dashboards for Enhanced Reporting

Visualizations of data generated by statistical models help audiences grasp finer details in large and complex datasets through dynamic and interactive exploration. In this way, big data can realize its potential to inform decisions and policy making.

This course is intended for those with no prior knowledge of data visualization in Tableau. It draws on Tableau's capabilities to demonstrate best practices for data visualization and mining using real-world data sets.

By the end of the training, participants will be expected to generate powerful reports and dashboards that will help people make decisions on the basis of what the data says.

**Course software (either of them, not both):**

- Tableau
- Microsoft Excel