Fill-in-the-Blank: Data Science Foundations, Workflow, Ethics, and Cloud

Complete the sentences by filling in the blanks. Each correct answer earns points!

15 Questions • 150 Total Points
1

_____ is a field that draws methods and skills from multiple disciplines to solve problems.

Context: Definition and Scope of Data Science

2

Data science uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms, and systems to extract or extrapolate knowledge from _____ data.

Context: Meaning of noisy data in data science

3

_____ follows an organized, predefined format, while unstructured data lacks a fixed schema.

Context: Structured vs unstructured data concept
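
To make this contrast concrete, here is a minimal standard-library sketch. The customer records and the review text are invented for illustration. The regular expression is a deliberately crude, hypothetical extraction rule and is not a recommended technique.

```python
import csv
import io
import re

# Organized format: every record follows the same named columns (a fixed schema).
structured_text = "customer_id,age,city\n101,34,Lisbon\n102,29,Oslo\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
ages = [int(row["age"]) for row in rows]  # fields are addressable by name

# Unstructured: free text with no schema; any "field" must be extracted.
review = "Bought this in Oslo last week. Battery died after 3 days, very disappointed."
# Hypothetical rule: capture integers that directly precede the word "days".
durations = [int(m) for m in re.findall(r"(\d+)\s+days", review)]

print(ages)
print(durations)
```

The schema makes the first query trivial. The second query needs a hand-written rule that would break on a review phrased differently, such as "three days".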

4

Exploratory data analysis (EDA) uses graphics and descriptive statistics to explore patterns and generate _____.

Context: EDA purpose (generate hypotheses)
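
As an illustration of the exploratory step, the sketch below computes descriptive statistics for two groups of hypothetical daily order counts (the data is invented). In practice a plotting library would supply the graphics. The point here is that the gap between the groups is an observation that suggests something worth testing, not a conclusion.

```python
import statistics

# Hypothetical daily order counts, split by whether a promotion ran that day.
orders_promo = [52, 61, 58, 66, 59, 63]
orders_no_promo = [48, 50, 47, 55, 49, 51]


def summarize(values):
    """Descriptive statistics of the kind EDA relies on."""
    return {
        "n": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


promo = summarize(orders_promo)
no_promo = summarize(orders_no_promo)
print("promo:   ", promo)
print("no promo:", no_promo)

# An observed pattern only: it motivates a test, it does not establish an effect.
gap = promo["mean"] - no_promo["mean"]
print(f"difference in means: {gap:.2f}")
```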

5

Confirmatory analysis applies statistical _____ to test hypotheses and quantify uncertainty.

Context: Confirmatory analysis mechanism
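
One way to sketch the confirmatory step is a two-sided permutation test on the same kind of hypothetical order data. This is only one choice of test; a two-sample t-test is a common alternative. The function name and its defaults are illustrative assumptions, not a standard API.

```python
import random

# Hypothetical daily order counts (illustrative data only).
orders_promo = [52, 61, 58, 66, 59, 63]
orders_no_promo = [48, 50, 47, 55, 49, 51]


def mean(xs):
    return sum(xs) / len(xs)


def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test for a difference in means.

    Null hypothesis: the group labels are exchangeable, i.e. there is no
    real difference between the groups. Returns the observed difference
    and a Monte Carlo estimate of the p-value.
    """
    rng = random.Random(seed)  # fixed seed so the estimate is repeatable
    observed = mean(a) - mean(b)
    pooled = list(a) + list(b)
    n_a = len(a)
    extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value strictly positive.
    p_value = (extreme + 1) / (n_resamples + 1)
    return observed, p_value


observed, p_value = permutation_test(orders_promo, orders_no_promo)
print(f"observed difference = {observed:.2f}, estimated p-value = {p_value:.4f}")
```

A small p-value says the observed gap would be unusual if the labels did not matter. It says nothing about why the gap exists.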

6

EDA and confirmatory analysis differ because EDA explores patterns to generate hypotheses, while confirmatory analysis uses statistical inference to test hypotheses and quantify _____.

Context: Distinguishing EDA vs confirmatory analysis
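
Where EDA reports a single difference, a confirmatory analysis typically attaches an interval to it. Below is a sketch of a percentile bootstrap 95% interval on the same hypothetical data. The bootstrap is one of several interval methods, and with only six values per group it is a rough approximation.

```python
import random
import statistics

# Hypothetical daily order counts (illustrative data only).
orders_promo = [52, 61, 58, 66, 59, 63]
orders_no_promo = [48, 50, 47, 55, 49, 51]


def mean(xs):
    return sum(xs) / len(xs)


def bootstrap_ci_95(a, b, n_resamples=5_000, seed=0):
    """Percentile bootstrap 95% interval for mean(a) - mean(b)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_resamples):
        # Resample each group with replacement, keeping group sizes fixed.
        ra = rng.choices(a, k=len(a))
        rb = rng.choices(b, k=len(b))
        diffs.append(mean(ra) - mean(rb))
    # n=40 yields 39 cut points at 2.5%, 5%, ..., 97.5%; keep the outermost pair.
    cuts = statistics.quantiles(diffs, n=40)
    return cuts[0], cuts[-1]


low, high = bootstrap_ci_95(orders_promo, orders_no_promo)
print(f"difference in means: {mean(orders_promo) - mean(orders_no_promo):.2f}")
print(f"95% bootstrap interval: [{low:.2f}, {high:.2f}]")
```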

7

Typical data science activities include data collection/integration, cleaning/preparation, _____, visualization/descriptive statistics, modeling, and communication/reproducibility.

Context: Core workflow activities

8

_____ is the ability of others to repeat and verify results using shared artifacts such as reports or notebooks.

Context: Meaning of reproducibility
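
Two simple habits support repeatable results: control every source of randomness with a recorded seed, and record a fingerprint of the exact input data. The sketch below shows both. `run_analysis` is a hypothetical stand-in for a real analysis. The chosen provenance fields are assumptions; real projects also pin package versions, for example in a lockfile.

```python
import hashlib
import json
import platform
import random


def run_analysis(data, seed):
    """Hypothetical toy analysis whose randomness is fully controlled by `seed`."""
    rng = random.Random(seed)
    sample = rng.sample(data, k=3)
    return sum(sample) / len(sample)


data = [3, 1, 4, 1, 5, 9, 2, 6, 5, 3]
seed = 42
result = run_analysis(data, seed)

# Record what someone else would need to repeat and verify this run.
provenance = {
    "python_version": platform.python_version(),
    "seed": seed,
    "data_sha256": hashlib.sha256(json.dumps(data).encode("utf-8")).hexdigest(),
    "result": result,
}
print(json.dumps(provenance, indent=2))

# Re-running with the same data and seed reproduces the result exactly.
assert run_analysis(data, seed) == result
```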

9

CRISP-DM is a lifecycle framework describing steps from business understanding through _____ and monitoring.

Context: CRISP-DM lifecycle scope

10

Big data workloads require heavy computation and storage, so _____ and distributed frameworks are used to process data efficiently.

Context: Cause→effect: big workloads lead to cloud/distributed processing
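
Distributed frameworks commonly follow a split, process in parallel, then combine pattern (the MapReduce idea). Below is a single-machine sketch of that pattern, in which a thread pool stands in for a cluster of workers. It illustrates the structure only. Threads will not speed up CPU-bound pure-Python work, and real frameworks add scheduling, data locality, and fault tolerance.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def partition(items, n_parts):
    """Split a list into roughly equal chunks, one per worker."""
    size = max(1, -(-len(items) // n_parts))  # ceiling division, at least 1
    return [items[i:i + size] for i in range(0, len(items), size)]


def map_count(lines):
    """Map step: count words within one partition, independently of the others."""
    counts = Counter()
    for line in lines:
        counts.update(line.lower().split())
    return counts


def reduce_counts(partials):
    """Reduce step: merge per-partition counts into one result."""
    total = Counter()
    for c in partials:
        total.update(c)
    return total


lines = [
    "the cloud stores data",
    "distributed frameworks process data",
    "the data is large",
    "cloud compute scales",
]

with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(map_count, partition(lines, 2)))

word_counts = reduce_counts(partials)
print(word_counts.most_common(3))
```

Because counting is associative, merging partial counts gives the same answer as counting everything in one place. That property is what lets the work be split across machines.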

11

Data science results must be trustworthy and reusable, which is why _____ practices (reports, notebooks, dashboards) are emphasized.

Context: Cause→effect: trust/reuse leads to reproducibility

12

When machine learning models are trained on biased data, they can produce discriminatory or unfair outcomes due to _____.

Context: Cause→effect: biased training data leads to unfair outcomes via bias amplification
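
A first diagnostic for unfair outcomes is to compare a model's positive-decision rate across groups. The sketch uses invented predictions. It computes per-group selection rates and their ratio, then flags ratios below 0.8, the widely cited "four-fifths" rule of thumb. A low ratio signals that something needs investigating. It does not by itself explain the cause, and other fairness metrics exist.

```python
from collections import defaultdict

# Hypothetical model decisions (1 = approved) paired with a protected attribute.
predictions = [
    ("A", 1), ("A", 1), ("A", 0), ("A", 1), ("A", 1),
    ("B", 0), ("B", 1), ("B", 0), ("B", 0), ("B", 1),
]


def selection_rates(pairs):
    """Fraction of positive decisions per group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, decision in pairs:
        totals[group] += 1
        positives[group] += decision
    return {g: positives[g] / totals[g] for g in totals}


rates = selection_rates(predictions)
ratio = min(rates.values()) / max(rates.values())
print(rates)
print(f"selection-rate ratio = {ratio:.2f}")
if ratio < 0.8:
    print("Below the four-fifths rule of thumb: inspect training data and model.")
```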

13

As AI systems grow larger and more complex, _____ approaches become increasingly important.

Context: Cause→effect: complexity leads to data-centric approaches

14

Data science involves collecting and analyzing personal and sensitive information, which creates ethical risks such as privacy violations and negative societal impacts. These risks arise because, without safeguards, data handling and analysis can expose or misuse sensitive attributes; addressing them falls under _____.

Context: Ethics in data science as the umbrella concept for privacy risks
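
One common safeguard is to strip direct identifiers and sensitive fields before analysis. Records stay linkable through a keyed pseudonym. In the sketch, the field names, the `SENSITIVE_FIELDS` policy, and the in-memory key are all hypothetical; in practice the key would be kept in a secrets manager, away from the data. Pseudonymization reduces risk but does not anonymize. Remaining quasi-identifiers such as age plus region can still enable re-identification.

```python
import hashlib
import hmac
import secrets

# Hypothetical key, generated per run here for illustration only.
PSEUDONYM_KEY = secrets.token_bytes(32)

# Hypothetical policy: fields that must not reach the analysis dataset.
SENSITIVE_FIELDS = {"name", "email", "diagnosis"}


def pseudonymize(value: str) -> str:
    """Keyed hash (HMAC-SHA256): stable for a given key, so records can still
    be joined, but not recomputable from a known value without the key."""
    return hmac.new(PSEUDONYM_KEY, value.encode("utf-8"), hashlib.sha256).hexdigest()


def prepare_for_analysis(record: dict) -> dict:
    """Drop sensitive fields and replace the identifier with a pseudonym."""
    safe = {k: v for k, v in record.items() if k not in SENSITIVE_FIELDS}
    safe["person_id"] = pseudonymize(record["email"])
    return safe


raw = {"name": "Ana", "email": "ana@example.com", "diagnosis": "asthma",
       "age": 41, "region": "North"}
print(prepare_for_analysis(raw))
```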

15

Cloud computing provides scalable storage and compute, but it does not replace the need for _____ skills such as data cleaning, feature engineering, modeling, evaluation, and communication.

Context: Common confusion: cloud does not replace data science skills