Professional Skills Framework

Mapping the Data Analytics and Data Science Profession

Our Certification is mapped to the EDISON Data Science Framework (EDSF), which has been developed to support, guide and ultimately accelerate the education of fit-for-purpose Data Science professionals.

The EU-funded EDISON Project has put in place foundation mechanisms to accelerate the growth in the number of competent, qualified Data Scientists across Europe and beyond.

The EDSF is a collection of documents that define the Data Science profession. Freely available, these documents have been developed to guide educators and trainers, employers and managers, and Data Scientists themselves.

Together, these documents break down the complexity of the skills and competences needed to define Data Science as a professional practice.

 

Defining the Profession

In a rapidly developing profession like Data Analytics and Data Science, it is important for students, professionals and employers to be able to map how each role is defined, where roles interface and how they interact. The Professional Skills Framework brings these elements together into one cohesive model, defining the skills, knowledge items and competences required for each role.
Business Analytics Lifecycle

Each professional role defined in the Framework plays an important part in the overall Analytics Lifecycle. At different stages in the lifecycle, some roles become more prominent while others support. This can change and evolve as the project progresses.
Tools and Technologies

The Framework defines four roles: Data Analytics Professional, Data Application Engineer, Data Engineer and Data Scientist. Each role is expected to be proficient in tools drawn from the categories below, typically at least one tool from each category relevant to the role.

Data integration and ETL, at least one of the following: Informatica, Ab Initio, Talend, IBM Data Studio, SAS Data Integration, DataFlux

Statistical analysis, at least one of the following: R, IBM SPSS, SAS, Alteryx

Data visualisation, at least one of the following: Tableau, Spotfire, Qlik, Power BI, SAS Visual Analytics

Programming: Python, plus at least one of the following: Java, JavaScript (or similar), Go, Ruby, C, C#, VBA

Query and data-processing languages, at least one of the following: SQL, T-SQL (stored procedures and functions), PL/SQL, PL/pgSQL, Pig, Hive, Impala

Databases, at least one of the following: SQL Server, PostgreSQL, Teradata, Oracle, IBM DB2, MySQL, SAP HANA, MongoDB

Scripting, at least one of the following: DOS batch, PowerShell, Bash (UNIX/Linux utilities)

Workflow and version control, at least one of the following: JIRA, MOVEit, SVN (Subversion), Git, Monarch
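The query languages listed above (SQL, T-SQL, PL/SQL, Hive, Impala and the like) centre on set-based filtering and aggregation. As a minimal, illustrative sketch, here is a basic SQL aggregation run through Python's built-in sqlite3 module; the table and column names are invented for the example.

```python
import sqlite3

# An in-memory database with a small invented "sales" table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?)",
    [("North", 120.0), ("South", 80.0), ("North", 50.0)],
)

# Aggregate sales by region -- the kind of query the SQL-family
# tools in the list are used for, at much larger scale.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM sales GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('North', 170.0), ('South', 80.0)]
```

The same GROUP BY pattern carries over directly to T-SQL, PL/SQL, Hive and Impala, which is why proficiency in any one of them transfers well to the others.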
Knowledge Areas

Data Analytics Professional
Machine Learning (supervised): decision trees, Naïve Bayes classification, ordinary least squares regression, logistic regression, neural networks, SVM (Support Vector Machines), ensemble methods, others
Machine Learning (unsupervised): clustering algorithms, Principal Components Analysis (PCA), Singular Value Decomposition (SVD), Independent Components Analysis (ICA)
Machine Learning (reinforcement): Q-learning, TD-learning, genetic algorithms
Data Mining: text mining, anomaly detection, regression, time series, classification, feature selection, association, clustering
Text Data Mining: statistical methods, NLP, feature selection, apriori algorithm, etc.
Prescriptive Analytics
Graph Data Analytics: path analysis, connectivity analysis, community analysis, centrality analysis, subgraph isomorphism, etc.
Qualitative analytics
Natural language processing
Business Analytics (BA) and Business Intelligence (BI); methods and data analysis; cognitive technologies

Data Application Engineer
Systems Engineering and Software Engineering principles, methods and models; distributed systems design and organisation
Cloud Computing, cloud based services and cloud powered services design
Big Data technologies for large dataset processing: batch, parallel and streaming systems, in particular cloud based
Applications software requirements and design, agile development technologies, DevOps and the continuous improvement cycle
Systems and data security, data access, including data anonymisation, federated access control systems
Compliance based security models, privacy and IPR protection
Relational and non-relational databases (SQL and NoSQL), Data Warehouse solutions, ETL (Extract, Transform, Load), OLTP and OLAP processes for large datasets
Big Data infrastructures, high-performance networks, infrastructure and services management and operation
Modeling and simulation, theory and systems
Information systems, collaborative systems
Optimisation
Data driven User Experience (UX) requirements and design

Data Engineer
Data management and enterprise data infrastructure, private and public data storage systems and services
Data storage systems, data archive services, digital libraries, and their operational models
Data governance, data governance strategy, Data Management Plan (DMP)
Data Architecture, data types and data formats, data modeling and design, including related technologies (ETL, OLAP, OLTP, etc.)
Data lifecycle and organisational workflow, data provenance and linked data
Data curation and data quality, data integration and interoperability
Data protection, backup, privacy, IPR, ethics and responsible data use
Metadata, PID, data registries, data factories, standards and compliance
Open Data, Open Science, research data archives/repositories, Open Access, ORCID
Data preparation and pre-processing
Data Warehouses technologies, data integration and analytics

Data Scientist
Combines the knowledge areas of all three roles above.
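One of the supervised techniques named above, ordinary least squares regression, can be sketched in a few lines of plain Python. This is a toy closed-form fit of y = a + b*x on invented data; in practice the roles described here would use R, SAS, SPSS or a Python statistics library.

```python
def ols_fit(xs, ys):
    """Return intercept a and slope b minimising sum((y - (a + b*x))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Closed-form solution for simple (one-variable) linear regression.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / \
        sum((x - mean_x) ** 2 for x in xs)
    a = mean_y - b * mean_x
    return a, b

# Perfectly linear toy data generated from y = 1 + 2x.
a, b = ols_fit([0, 1, 2, 3], [1, 3, 5, 7])
print(a, b)  # 1.0 2.0
```

The point of the sketch is that the model underlying a large part of the "supervised learning" knowledge area is a small, well-defined calculation; the listed tools add scale, diagnostics and robustness around it.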
Competences

Data Analytics Professional
Use appropriate data analytics and statistical techniques on available data to discover new relations, deliver insights into research problems or organisational processes, and support decision-making.
Effectively use a variety of data analytics techniques, such as Machine Learning (including supervised, unsupervised and semi-supervised learning), Data Mining, and Prescriptive and Predictive Analytics, for complex data analysis through the whole Business Analytics Lifecycle.
Apply designated quantitative techniques, including statistics, time series analysis, optimisation and simulation, to deploy appropriate models for analysis and prediction.
Identify, extract and pull together available and pertinent heterogeneous data, including modern data sources such as social media data, open data and governmental data.
Understand and use different performance and accuracy metrics for model validation in analytics projects, hypothesis testing and information retrieval, in line with the Business Analytics Lifecycle.
Develop the data analytics required for organisational tasks; integrate data analytics and processing applications into organisational workflow and business processes to enable agile decision-making (Stages 5 and 6 of the Business Analytics Lifecycle).

Data Application Engineer
Use engineering principles and modern computer technologies to research, design and implement new data analytics applications; develop experiments, processes, instruments, systems and infrastructures to support data handling during the whole data lifecycle.
Use engineering principles (general and software) to research, design, develop and implement new instruments and applications for data collection, storage, analysis and visualisation.
Develop and apply computational and data driven solutions to domain related problems using a wide range of data analytics platforms, including Big Data technologies for large datasets and cloud based data analytics platforms.
Develop and prototype specialised data analysis applications, tools and supporting infrastructures for data driven scientific, business or organisational workflows; use distributed, parallel, batch and streaming processing platforms, including online and cloud based solutions for on-demand provisioned and scalable services.
Develop, deploy and operate large scale data storage and processing solutions using different distributed and cloud based platforms for storing data.
Consistently apply data security mechanisms and controls at each stage of data processing, including data anonymisation, privacy and IPR protection.
Design, build and operate relational and non-relational databases (SQL and NoSQL), integrate them with modern Data Solutions, and ensure effective ETL (Extract, Transform, Load), OLTP and OLAP processes as appropriate to the Data Application being engineered.

Data Engineer
Develop and implement data engineering strategy for data collection, integration, quality, lineage, security, storage, preservation, and availability for further processing.
Develop and implement data strategy, in particular in the form of a data management policy and Data Management Plan, together with a path to executing the plan - tooling and steps.
Develop and implement relevant data models and define metadata using common standards and practices, for different data sources in a variety of scientific and industry domains.
Integrate heterogeneous data from multiple sources and provide them for further analysis and use.
Maintain historical information on data handling, including references to published data and corresponding data sources - Data Lineage and Data Dictionary.
Ensure data quality, accessibility, interoperability, compliance to standards, and publication.
Design, build and operate appropriate, effective ETL (Extract, Transform, Load) solutions and processes for the data analysis being performed, such that they can be both implemented and scaled into target environments.

Data Scientist
Combines the competences of all three roles above, spanning data strategy, engineering, analysis and decision support across the entire data lifecycle.

All roles: visualise results of data analysis, design dashboards and use storytelling methods.
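The ETL (Extract, Transform, Load) competences above can be illustrated in miniature. The following toy pipeline in plain Python, with invented data and field names, stands in for what tools like Informatica or Talend do at production scale.

```python
import csv
import io
import sqlite3

# Extract: read raw records (here from an in-memory CSV string
# standing in for a real file or upstream system).
raw = "name,amount\nalice,10\nbob,not_a_number\ncarol,5\n"
records = list(csv.DictReader(io.StringIO(raw)))

# Transform: clean and type the data, dropping rows that fail validation.
clean = []
for r in records:
    try:
        clean.append((r["name"].title(), float(r["amount"])))
    except ValueError:
        continue  # a real pipeline would log or quarantine bad rows

# Load: write the cleaned rows to a target store (an in-memory SQLite DB).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE payments (name TEXT, amount REAL)")
db.executemany("INSERT INTO payments VALUES (?, ?)", clean)
total = db.execute("SELECT SUM(amount) FROM payments").fetchone()[0]
print(total)  # 15.0
```

Each stage maps onto a competence above: extraction and integration of heterogeneous sources, data quality and validation during transformation, and scalable loading into a target environment.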