Complete the sentences by filling in the blanks. Each correct answer earns points!
_____ is a field that draws methods and skills from multiple disciplines to solve problems.
Context: Definition and Scope of Data Science
Data science uses statistics, scientific computing, scientific methods, processing, scientific visualization, algorithms, and systems to extract or extrapolate knowledge from _____ data.
Context: Meaning of noisy data in data science
_____ has organized formats, while unstructured data lacks a fixed schema.
Context: Structured vs unstructured data concept
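As a rough illustration of the distinction, the sketch below parses a small table with named columns and then a free-text sentence. The sample strings and variable names are made up for this example and are not part of the quiz material.

```python
import csv
import io

# Structured: every row follows the same fixed schema (named columns).
# The sample data here is hypothetical.
structured_text = "name,age\nAda,36\nGrace,45\n"
rows = list(csv.DictReader(io.StringIO(structured_text)))
ages = [int(row["age"]) for row in rows]  # fields are addressable by name

# Unstructured: free text with no schema; any "fields" must be inferred.
unstructured_text = "Ada is 36 and Grace just turned 45."
words = unstructured_text.split()  # only a bag of tokens, no column names
```

In the structured case the ages can be read directly by column name; in the unstructured case the same facts are present but would need extra parsing to recover.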
EDA uses graphics and descriptive statistics to explore patterns and generate _____.
Context: EDA purpose (generate hypotheses)
Confirmatory analysis applies statistical _____ to test hypotheses and quantify uncertainty.
Context: Confirmatory analysis mechanism
EDA and confirmatory analysis differ because EDA explores patterns to generate hypotheses, while confirmatory analysis uses statistical inference to test hypotheses and quantify _____.
Context: Distinguishing EDA vs confirmatory analysis
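The contrast can be sketched in a few lines, assuming two small hypothetical samples. The first step only describes the data, which is the exploratory side; the second runs an exact one-sided permutation test, chosen here as one simple form of inference, to attach a p-value to the difference in means.

```python
from itertools import combinations
from statistics import mean, stdev

# Hypothetical measurements for two groups (illustrative values only).
group_a = [12.1, 11.8, 12.5, 12.9, 12.3]
group_b = [13.0, 13.4, 12.8, 13.6, 13.1]

# Exploratory step: descriptive statistics suggest a hypothesis
# ("group b tends to be larger than group a").
summary = {
    name: (mean(values), stdev(values))
    for name, values in (("a", group_a), ("b", group_b))
}

# Confirmatory step: exact permutation test of that hypothesis.
observed = mean(group_b) - mean(group_a)
pooled = group_a + group_b
n_b = len(group_b)
n_a = len(pooled) - n_b
total = sum(pooled)

n_splits = 0
n_extreme = 0
for idx in combinations(range(len(pooled)), n_b):
    sum_b = sum(pooled[i] for i in idx)
    diff = sum_b / n_b - (total - sum_b) / n_a
    n_splits += 1
    # Small tolerance guards against floating-point rounding in the sums.
    if diff >= observed - 1e-9:
        n_extreme += 1

# Share of relabelings at least as extreme as the observed difference.
p_value = n_extreme / n_splits
```

The summary dictionary is where a hypothesis comes from; the p-value is how the confirmatory step expresses the remaining uncertainty about it.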
Typical data science activities include data collection/integration, cleaning/preparation, _____, visualization/descriptive statistics, modeling, and communication/reproducibility.
Context: Core workflow activities
_____ is the ability for others to repeat and verify results using shared artifacts like reports or notebooks.
Context: Meaning of reproducibility
CRISP-DM is a lifecycle framework describing steps from business understanding through _____ and monitoring.
Context: CRISP-DM lifecycle scope
Big data workloads require heavy computation and storage, so _____ and distributed frameworks are used to process data efficiently.
Context: Cause→effect: big workloads lead to cloud/distributed processing
Data science results must be trusted and reused, which is why _____ practices (reports, notebooks, dashboards) are emphasized.
Context: Cause→effect: trust/reuse leads to reproducibility
When machine learning models are trained on biased data, they can produce discriminatory or unfair outcomes due to _____.
Context: Cause→effect: biased training data leads to unfair outcomes via bias amplification
As AI systems grow larger and more complex, _____ approaches become increasingly important.
Context: Cause→effect: complexity leads to data-centric approaches
Data science involves collecting and analyzing personal and sensitive information, which creates ethical risks such as privacy violations and negative societal impacts. These risks arise because data handling and analysis can expose or misuse sensitive attributes when safeguards are missing, which links to _____.
Context: Ethics in data science as the umbrella concept for privacy risks
Cloud computing provides scalable storage and compute, but it does not replace the need for _____ skills like cleaning, feature work, modeling, evaluation, and communication.
Context: Common confusion: cloud does not replace data science skills