8 COURSES | 8 WEEKS
Starts: October 6th
Ends: November 26th
Tuesday & Thursday 17.00 – 20.00
8 courses / 16 sessions
– 6 hours per course
– 3 hours each session
Online – synchronous classes: Cisco Webex
Each Course: 140 Euros *
4+ Courses: 10% discount
All Courses: 15% discount
* VAT exempt
We are pleased to announce the first edition of the Applied Data Analysis School, a series of 8 courses that can be taken individually or as a whole, as required. Courses are taught interactively using a blend of theory, follow-along demonstrations and exercises. The courses will be taught using R, RStudio, Jupyter Notebooks and JupyterLab.
The courses are designed for academic researchers, including Master/PhD students, who have a basic knowledge of statistics and econometrics and who deal with different types of data and projects in their day-to-day work. The courses are also recommended for non-academic staff with an interest in data analysis from an econometric perspective. Professionals who are interested in learning different techniques and raising their awareness of methodologies that can be used in their current or future projects will greatly benefit from these courses. The courses will be taught in English.
The instructors have extensive experience teaching statistics, economics and applied econometrics. Applicants are encouraged to bring their research questions, as they will benefit from the instructors’ wide collaborations with researchers across several countries, as well as their involvement in supervising graduate students. Our instructors are:
• Anabela Carneiro | University of Porto
• Cristina Amado | University of Minho
• João Cerejeira | University of Minho
• Miguel Portela | University of Minho
• Nelson Areal | University of Minho
• Rita Sousa | Bank of Portugal
1. INTRODUCTION TO R | By: Nelson Areal | 6 & 8 October
Date: 6 & 8 October | 17h00-20h00
Delivered by: Nelson Areal, University of Minho
The goal of this module is to introduce R so that participants can understand and use the basics of the language. We will start with base R and its fundamental data structures and quickly progress to the concept of tidy data and the tidyverse packages used to import and manipulate data (namely, selecting, mutating, filtering, summarizing and merging data sets following tidy data principles). We will also cover how to transform data from wide to long format and vice versa. We will then move on to the more advanced R language concepts and strategies needed to create your own functions, profile your code and parallelize code execution.
1. Introduction to R: what it is, and why you should use it
2. RStudio IDE
3. R Basics
3.1 Mathematical operations; comparisons; functions; help
3.2 R object types (variables; data types and data structures)
3.3 Subsetting and creating sequences. Model formulae in R
3.4 Packages in R and package management
4. Data manipulation with R
4.1 Reading and exporting data (csv, Excel, Stata, SPSS, Matlab, databases and other file formats)
4.2 Filtering, cleaning, merging, sorting and transforming data (dplyr, tidyr and pipes)
4.3 String manipulation and date manipulation
4.4 Using apply functions
5. Introduction to R programming
5.1 Creating your own functions
5.2 Loops and conditional statements
5.3 Profiling and benchmarking R code
5.4 Parallel code execution
- Matloff, Norman (2011) “The Art of R Programming: A Tour of Statistical Software Design”, No Starch Press, 1st Ed., pp. 400. ISBN: 1593273843.
- Wickham, H. (2014). Tidy data. Journal of Statistical Software, 59(10), 1-23.
- Wickham, H., & Grolemund, G. (2017). R for data science: import, tidy, transform, visualize, and model data. O’Reilly Media, Inc.
2. DATA VISUALIZATION WITH R | By: Nelson Areal | 13 & 15 October
Date: 13 & 15 October | 17h00-20h00
Delivered by: Nelson Areal, University of Minho
This module presents basic visualization principles, the characteristics of a good plot, how to avoid deceiving the reader, and how to create plots. We will learn how to create plots for categorical and continuous variables and plots that display several variables simultaneously, and how to add layers such as summaries, create small multiples, annotate plots and create maps. The module will use the ggplot2 package extensively. This package is based on the Grammar of Graphics, is very flexible, and gives the user control over the end result of the plot.
- Basic visualization principles. What we should know, and what we should avoid.
- Introduction to ggplot2: core principles
- Plotting categorical and continuous variables
- Small multiples
- Adding summaries
- Annotating plots
- Creating simple maps
- Healy, K. (2019). Data visualization: a practical introduction. Princeton University Press.
- Wickham, H. (2020). ggplot2: elegant graphics for data analysis. 3rd edition. https://ggplot2-book.org
- Wilkinson, L. (2005). The Grammar of Graphics. 2nd ed. Statistics and Computing. Springer
3. BIG DATA: PREPARATION AND EXPLORATORY DATA ANALYSIS USING R | By: Rita Sousa | 20 & 22 October
Date: 20 & 22 October | 17h00-20h00
Delivered by: Rita Sousa, Bank of Portugal
Exploratory data analysis helps an analyst understand complex data sets and answer research questions using multiple types of numerical and graphical summarization techniques. This module shows how to explore and prepare your data using the R programming language and its packages. We will see tools and examples of good practice in accessing and manipulating data, addressing the most common challenges faced when dealing with big data.
- Importing datasets and database connection
- Data cleaning and manipulation
- Data manipulation with pipes
- Exploratory techniques for summarizing data
- Gillespie, C., Lovelace, R. (2016). Efficient R Programming. O’Reilly Media, Inc.
- Grolemund, G., Wickham, H. (2016). R for Data Science. O’Reilly Media, Inc.
- Lander, J. (2014). R for Everyone: Advanced Analytics and Graphics, 2nd Edition. Addison-Wesley Data and Analytics.
- Matloff, N. (2011). The Art of R Programming. No Starch Press.
- Pearson, R. K. (2018). Exploratory Data Analysis Using R. Chapman and Hall/CRC.
- Teetor, P. (2019). R Cookbook, 2nd Edition. O’Reilly Media, Inc.
- Walkowiak, S. (2016). Big Data Analytics with R: Leverage R Programming to uncover hidden patterns in your Big Data. Packt Publishing.
4. LITERATE PROGRAMMING IN R MARKDOWN | By: Miguel Portela and Nelson Areal | 27 & 29 October
Date: 27 & 29 October | 17h00-20h00
Delivered by: Miguel Portela, University of Minho and Nelson Areal, University of Minho
Literate programming refers to melding a descriptive narrative and computer code into a single document, from which both human-friendly documentation and computer-readable files can be created. Your work should be transparent, easy to update, easy to maintain, and easy to replicate. Literate programming saves time and effort, so we can dedicate more time to doing research. It is also useful for teaching.
- Markdown and Pandoc
- Create a markdown document and run code
- Develop a report
- Publish the report
- Xie, Y., Allaire, J.J. and Grolemund, G. (2018). R Markdown: The Definitive Guide. CRC Press. https://bookdown.org/yihui/rmarkdown/
5. WEB-BASED TOOLS FOR DATA ANALYSIS: JUPYTERLAB ENVIRONMENT AND WORKFLOW OPTIMIZATION | By: Miguel Portela | 3 & 5 November
Date: 3 & 5 November | 17h00-20h00
Delivered by: Miguel Portela, University of Minho
The Jupyter Notebook is an interactive computing environment that enables users to develop a data science project. It is an open-source web application that allows you to create and share documents that contain live code, equations, and visualizations, among others. It can be used for data cleaning and transformation tasks, numerical simulation, statistical modeling, data visualization, or machine learning. The code and its output are integrated into a single document that combines visualizations, mathematical equations and discussion.
- Create your first notebook
- Develop an R data analysis template using a notebook
- Run online notebooks
- Share your notebook
- The Jupyter Notebook: https://jupyter-notebook.readthedocs.io/
- Project Jupyter: https://jupyter.org/
6. REGRESSION ANALYSIS AND CAUSALITY WITH R | By: João Cerejeira | 10 & 12 November
Date: 10 & 12 November | 17h00-20h00
Delivered by: João Cerejeira, University of Minho
Regression modeling is a fundamental tool for researchers who want to quantify causal relationships from observational data. This course is intended as an overview of the theoretical concepts necessary to understand regression models and how to implement them using R. Focusing on regression analysis applied to program evaluation, the course will cover instrumental variables, propensity score matching and difference-in-differences methodologies, which are widely employed for designing and conducting impact evaluations.
- Econometric concepts: Regression analysis — OLS & GLS
- Causality under cross-section data
- Basic issues in program evaluation; Causality and the problem of selection bias
- Regression and causality
- Instrumental variables (two stage least squares (2SLS); weak instruments; overidentification tests)
- Binary choice models
- Propensity score matching and estimation and Pro
- Longitudinal data: Difference-in-differences (DD)
- Angrist, Joshua D. and Jörn-Steffen Pischke (2009), Mostly Harmless Econometrics: An Empiricist’s Companion, Princeton University Press.
- Baum, Christopher (2006), An Introduction to Modern Econometrics Using Stata, Stata Press.
- Cameron, Colin and Pravin Trivedi (2010), Microeconometrics Using Stata, Stata Press.
- Cameron, Colin and Pravin Trivedi (2005), Microeconometrics: Methods and Applications, Cambridge University Press.
7. PANEL DATA MODELS WITH R | By: Miguel Portela and Anabela Carneiro | 17 & 19 November
Date: 17 & 19 November | 17h00-20h00
Delivered by: Miguel Portela, University of Minho and Anabela Carneiro, University of Porto
Prior knowledge of introductory econometrics is recommended. Both static fixed- and random-effects models, as well as dynamic models, will be specified and estimated. Lectures will combine theoretical discussion of the models with substantive empirical analysis of longitudinal data using R. Appropriate hypothesis tests are defined, not only for the significance of parameters but equally for the validity of underlying assumptions regarding homoscedasticity, autocorrelation and endogeneity. The module aims to provide key guidelines on panel data handling and exploration. The concept underlying the module is to design a roadmap for carrying out research that takes advantage of these data.
- Panel data regression: dealing with endogeneity issues
- Data structure & discussion
- Model specification
- Fixed and Random Effects in Static Models
- Hausman test for the validity of the random effects model
- Hypothesis testing
- Heteroscedasticity, Autocorrelation, Robust Estimation
- Dynamic Panel Data: endogeneity & GMM
- First-differences: Arellano & Bond; exogenous vs. endogenous regressors
- Validity of the Instruments: Sargan test
- The system GMM estimator
- Dynamic panel data models & variance correction
- Unit root tests for panel data
- Arellano, M. (2003), Panel Data Econometrics, Oxford University Press: New York.
- Verbeek, M. (2012), A Guide to Modern Econometrics, 4th ed., John Wiley & Sons, Ltd.: Chichester, England.
- Wooldridge, J. (2010), Econometric Analysis of Cross Section and Panel Data, 2nd ed., The MIT Press: Cambridge, Massachusetts.
8. FORECASTING METHODS AND APPLICATIONS WITH R | By: Cristina Amado | 24 & 26 November
Date: 24 & 26 November | 17h00-20h00
Delivered by: Cristina Amado, University of Minho
The course introduces the most widely used forecasting methods. The emphasis is on the practice of time series techniques to forecast economic and financial data, but the course also reviews the underlying theoretical concepts. Students are trained to enhance their forecasting skills as economic and financial decision-making tools. At the end of the course, students should be able to select the best method for producing economic and financial forecasts.
- Introduction to Forecasting
- Regression Analysis and Forecasting
- Exponential Smoothing Methods
- Univariate Time Series Models
- Trend and Seasonality
- Forecasting Volatility
- Diebold, F. X. (2007), Elements of Forecasting, 4th edition, South-Western College Publishing.
- Montgomery, D. C., Jennings, C. L. and Kulahci, M. (2015), Introduction to Time Series Analysis and Forecasting, 2nd edition, Wiley.
- Brockwell, P. J. and Davis, R. A. (2002), Introduction to Time Series and Forecasting, 2nd edition, Springer-Verlag, New York.
Anabela Carneiro is an Assistant Professor (with Habilitation) at the University of Porto (FEP, U.Porto) and a Research Economist at the Center for Economics and Finance at UPorto. She is also the Director of the Master in Economics at FEP, U.Porto. She holds a doctoral degree in Economics (2003) from the University of Porto. Her research interests are in Labor Economics, Economics of Education, and Entrepreneurship. She has published in leading economics journals, including the American Economic Journal: Macroeconomics, Journal of Human Resources, and Journal of Management Studies. She has also participated in the production of several technical reports on the labor market for entities such as the Norte Portugal Regional Coordination and Development Commission, the Portuguese Ministry of Economy and the Ministry of Labor of Mozambique.
Cristina Amado is currently an Assistant Professor in the Department of Economics at the University of Minho, Portugal, and an international research fellow at CREATES, Aarhus University. She holds a PhD in Economic Statistics (2009) from the Stockholm School of Economics; her thesis, entitled “Four essays on the econometric modelling of volatility and durations”, was written under the supervision of Professor Timo Teräsvirta. She is also a research member at the Centre for Research in Economics and Management (NIPE). Her main research interests lie within the fields of time series analysis, nonlinear time series modelling and mathematical statistics.
João Cerejeira holds a PhD in Economics from the European University Institute. He is an Assistant Professor at the University of Minho, Portugal. His research interests are in labor economics, human capital and urban economics. He has published in The World Economy, Economics Letters and Higher Education. He has written policy-oriented reports on Minimum Wage, Education and Employment in the Portuguese labor market. He also has consultancy experience for both private and public institutions.
Miguel Portela holds a PhD in Economics from the Tinbergen Institute/University of Amsterdam (2007). He is currently Associate Professor with Habilitation at Universidade do Minho, and Director of its Doctoral Programme in Economics. He is also affiliated with NIPE/U Minho, CIPES and IZA, Bonn. He has an ongoing collaboration with the Bank of Portugal. His research interests lie in the areas of labor economics, economics of education and applied econometrics. He has authored several papers, books and book chapters, and has published in Econometrica, Scandinavian Journal of Economics, Regional Studies and Studies in Higher Education. His work has been cited in Brookings Papers on Economic Activity, Review of Economics and Statistics, Journal of Business and Economic Statistics, Industrial and Labor Relations Review and Labour Economics, among others. He has ongoing research collaborations across different countries, leads and takes part in research teams working on funded projects, and has written policy-oriented reports on Minimum Wage, Education and Employment in the Portuguese labor market. He also has consultancy experience for both private and public institutions.
Nelson Areal is an Associate Professor of Finance at the School of Economics and Management, Department of Management, University of Minho. His research interests include risk measurement and forecasting, option valuation using numerical methods, performance measurement, socially responsible investments, and management education. He has a PhD in Accounting and Finance (Lancaster University, 2006; thesis title “Essays on FTSE-100 volatility and options valuation”), an MSc in Business Administration with specialization in Corporate Finance (University of Minho, 1998) and a BSc in Management (University of Minho, 1992). His career also includes two years (1992-1994) as an Information Systems Auditor at Ernst & Young.
Rita Sousa holds a PhD in Statistics and Risk Management from the New University of Lisbon. She has been a Senior Data Analyst in the Microdata Research Laboratory (BPLIM), Economics and Research Department, at Banco de Portugal since 2015. She has 15 years of experience as a Methodologist in the Department of Methodology and Information Systems at Statistics Portugal, from 2000 to 2015. She was an Invited Assistant Professor at the University of Porto for more than 10 years. She also has experience as a Trainer in Statistical Software. Her research interests are Sampling, Analysis of Large Data Sets, Statistical Disclosure Control and Estimation of Sensitive Variables.
Prerequisites and Methodology
The courses will be delivered online in synchronous sessions, using Cisco Webex.
Courses are taught interactively using a blend of theory, follow-along demonstrations and exercises. The courses will be taught using R, RStudio, Jupyter Notebooks and JupyterLab.
Learning Ratio: Theory: 50% | Practical: 50%
Integrated Solutions for Data Analysis | We Find Solutions, We Deliver Knowledge
Advanced and scientific training using the most advanced tools such as R, Python, SPSS, NVivo, MAXQDA, Stata, Eviews, MATLAB, among others. Our courses are certified by DGERT and our associated academic institutions. We work in conjunction with a wide range of leading academics and professionals to deliver our courses.
We distribute scientific software to several areas of knowledge. At GADES Solutions, we provide an integrated service of advice, sales and technical assistance in all software.
Focused on the needs of each client, our projects use an integrated on-site Consulting and Training approach, making use of the most advanced Data Analysis tools.
Talk to us
Rua Ferreira de Castro nº19
2635-361, Sintra, Portugal
Telephone: +351 210 124 743
Cellphone: +351 932 027 860
COPYRIGHT © 2019 • GADES SOLUTIONS • ALL RIGHTS RESERVED