Term | Definition | Formula
---|---|---
Mean | The arithmetic average of a set of values. | $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
Median | The middle value of a sorted dataset. For an odd number of values it is the single middle value; for an even number it is the mean of the two middle values. | $\tilde{x} = x_{(n+1)/2}$ for odd $n$
Mode | The most frequently occurring value in a dataset. |
Central Limit Theorem | A theorem stating that the sampling distribution of the sample mean approaches a normal distribution as the sample size grows, regardless of the shape of the population distribution. | $\bar{X} \sim N\!\left(\mu, \frac{\sigma^2}{n}\right)$ for large $n$
Weighted Mean | The average of a set of values, each multiplied by a corresponding value called its weight. | $\bar{x}_w = \frac{\sum_i w_i x_i}{\sum_i w_i}$
Trimmed Mean | The mean of the values that remain after removing a fixed percentage of the lowest and highest values from a dataset. |
Mean Absolute Deviation | A measure of the average absolute deviation of a set of values from their mean. | $\mathrm{MAD} = \frac{1}{n}\sum_{i=1}^{n} \lvert x_i - \bar{x} \rvert$
Median Absolute Deviation | A measure of the spread of data points around the median. | $\operatorname{median}(\lvert x_i - \tilde{x} \rvert)$
Moving Average | A technique that smooths out fluctuations in data by averaging data points over a sliding window of fixed size. It helps identify trends by reducing the impact of random variations. | $\mathrm{MA}_t = \frac{1}{k}\sum_{i=t-k+1}^{t} x_i$
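The measures of central tendency above can be sketched with the Python standard library alone; the dataset below is made up for illustration, and the trimmed-mean and moving-average helpers are hypothetical names, not library functions.

```python
import statistics

data = [2, 3, 3, 5, 7, 8, 9, 12, 14, 40]  # 40 is a deliberate outlier

mean = statistics.mean(data)      # arithmetic average
median = statistics.median(data)  # middle value of the sorted data
mode = statistics.mode(data)      # most frequent value

# Weighted mean: each value multiplied by its weight, divided by the total weight.
weights = [1, 1, 1, 1, 2, 2, 2, 1, 1, 1]
weighted_mean = sum(x * w for x, w in zip(data, weights)) / sum(weights)

def trimmed_mean(values, proportion=0.1):
    """Drop the lowest and highest `proportion` of values, then average the rest."""
    values = sorted(values)
    k = int(len(values) * proportion)
    return statistics.mean(values[k:len(values) - k] if k else values)

# Mean absolute deviation: average distance from the mean.
mad_mean = sum(abs(x - mean) for x in data) / len(data)

# Median absolute deviation: median distance from the median.
mad_median = statistics.median(abs(x - median) for x in data)

def moving_average(values, k):
    """Simple moving average with window size k."""
    return [sum(values[i - k + 1:i + 1]) / k for i in range(k - 1, len(values))]
```

Note how the 10% trimmed mean and the median absolute deviation are barely moved by the outlier 40, while the plain mean is pulled well above most of the data.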
Term | Definition | Formula
---|---|---
Hypothesis Testing | A statistical method for making inferences about population parameters based on sample data. |
p-value | The probability, assuming the null hypothesis is true, of obtaining a result at least as extreme as the one observed; a measure of the strength of evidence against the null hypothesis. |
Type I Error | Occurs when the null hypothesis is rejected when it is actually true. | $\alpha$
Type II Error | Occurs when the null hypothesis is not rejected when it is actually false. | $\beta$
Effect Size | A measure of the magnitude of the difference or relationship between two groups or variables in a statistical analysis. | $d = \frac{\bar{x}_1 - \bar{x}_2}{s}$ (Cohen's $d$)
Z-test | A statistical test used to compare a sample mean to a known population mean when the population standard deviation is known. | $z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}}$
F-test | A statistical test used to compare the variances of two or more groups. | $F = \frac{s_1^2}{s_2^2}$
False Positive Rate | The proportion of actual negatives that are incorrectly classified as positive. | $\mathrm{FPR} = \frac{FP}{FP + TN}$
True Positive Rate | The proportion of actual positives that are correctly classified as positive; also called recall or sensitivity. | $\mathrm{TPR} = \frac{TP}{TP + FN}$
T-test | A statistical test used to compare the means of two groups and determine whether they are significantly different from each other. | $t = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{1/n_1 + 1/n_2}}$
Linear Regression | A statistical method for modeling the relationship between a dependent variable and one or more independent variables by fitting a linear equation to observed data. | $y = \beta_0 + \beta_1 x + \varepsilon$
Recall | The proportion of actual positives that are correctly identified. | $\frac{TP}{TP + FN}$
Accuracy | The proportion of all predictions that are correct. | $\frac{TP + TN}{TP + TN + FP + FN}$
Precision | The proportion of positive predictions that are actually positive. | $\frac{TP}{TP + FP}$
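The classification rates and the z-statistic above can be sketched directly from their formulas; the confusion-matrix counts and the z-test inputs below are made-up illustration values.

```python
import math

# Made-up confusion-matrix counts: true/false positives and negatives.
TP, FP, TN, FN = 40, 10, 45, 5

accuracy = (TP + TN) / (TP + TN + FP + FN)  # fraction of correct predictions
precision = TP / (TP + FP)                  # correctness of positive predictions
recall = TP / (TP + FN)                     # true positive rate
fpr = FP / (FP + TN)                        # false positive rate

def z_statistic(sample_mean, mu, sigma, n):
    """One-sample z-test statistic: population sigma assumed known."""
    return (sample_mean - mu) / (sigma / math.sqrt(n))
```

For example, a sample mean of 105 against a population mean of 100 with sigma 15 and n = 36 gives z = 2.0.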
Term | Definition
---|---
Independent Variable | The variable that is manipulated or controlled by the experimenter.
Dependent Variable | The variable that is measured or observed to determine the effect of the independent variable.
Control Group | A group of subjects or samples that is treated identically to the experimental group(s) except for the manipulation of the independent variable. The control group provides a baseline for comparison to assess the effect of the independent variable.
Factorial Design | An experimental design in which multiple independent variables, known as factors, are manipulated simultaneously to investigate their main effects and interactions.
Term | Definition | Formula
---|---|---
Minimum | The smallest value in a dataset. | $\min(x_1, \ldots, x_n)$
Maximum | The largest value in a dataset. | $\max(x_1, \ldots, x_n)$
Range | The difference between the maximum and minimum values in a dataset. | $\max(x) - \min(x)$
Interquartile Range | The range within which the middle 50% of the data values in a dataset lie. | $\mathrm{IQR} = Q_3 - Q_1$
Outliers | Data points that significantly differ from the rest of the dataset. |
Sum of Squares | The sum of the squared differences between each data point and the mean of the dataset. | $SS = \sum_{i=1}^{n} (x_i - \bar{x})^2$
Standard Deviation | A measure of the amount of variation or dispersion in a set of values. | $s = \sqrt{\frac{\sum_{i=1}^{n}(x_i - \bar{x})^2}{n - 1}}$
Coefficient of Variation | A measure of relative variability, calculated as the ratio of the standard deviation to the mean, expressed as a percentage. | $CV = \frac{s}{\bar{x}} \times 100\%$
Upper Fence | The threshold beyond which data points are considered potential outliers. | $Q_3 + 1.5 \times \mathrm{IQR}$
Lower Fence | The threshold below which data points are considered potential outliers. | $Q_1 - 1.5 \times \mathrm{IQR}$
Standard Error of the Mean | An estimate of how much the sample mean is expected to vary from the true population mean. | $SE = \frac{s}{\sqrt{n}}$
Confidence Intervals | A range of values that is likely to contain the true population parameter, with a certain level of confidence. | $\bar{x} \pm z \cdot \frac{s}{\sqrt{n}}$
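The dispersion measures above can be sketched with the standard library's `statistics` module; the dataset is made up, and the quartiles use Python's default "exclusive" method, so other tools may report slightly different $Q_1$ and $Q_3$.

```python
import statistics

data = [4, 8, 15, 16, 23, 42]

rng = max(data) - min(data)                    # range
q1, q2, q3 = statistics.quantiles(data, n=4)   # quartiles (exclusive method)
iqr = q3 - q1                                  # interquartile range
upper_fence = q3 + 1.5 * iqr                   # potential outliers above this
lower_fence = q1 - 1.5 * iqr                   # potential outliers below this
outliers = [x for x in data if x < lower_fence or x > upper_fence]

sd = statistics.stdev(data)                    # sample standard deviation (n - 1)
cv = sd / statistics.mean(data) * 100          # coefficient of variation, in %
sem = sd / len(data) ** 0.5                    # standard error of the mean
# 95% confidence interval for the mean (normal approximation, z = 1.96)
ci = (statistics.mean(data) - 1.96 * sem, statistics.mean(data) + 1.96 * sem)
```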
Term | Definition | Symbol
---|---|---
Sample Space | The set of all possible outcomes of a random experiment. | $S$ (or $\Omega$)
Event | A subset of the sample space, representing a collection of outcomes of interest. | $E \subseteq S$
Independence | Two events are independent if the occurrence of one event does not affect the occurrence of the other. | $P(A \cap B) = P(A)\,P(B)$
Conditional Probability | The probability of an event $A$ occurring given that another event $B$ has occurred. | $P(A \mid B) = \frac{P(A \cap B)}{P(B)}$
Random Variables | A variable whose possible values are numerical outcomes of a random phenomenon. | $X$
Bayes' Theorem | A formula that calculates the probability of an event $A$ given that event $B$ has occurred. | $P(A \mid B) = \frac{P(B \mid A)\,P(A)}{P(B)}$
Addition Rule | The probability of the union of two mutually exclusive events $A$ and $B$ is the sum of their individual probabilities. | $P(A \cup B) = P(A) + P(B)$
Multiplication Rule | The probability of the intersection of two independent events $A$ and $B$ is the product of their individual probabilities. | $P(A \cap B) = P(A)\,P(B)$
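Bayes' theorem and the two combination rules above can be illustrated with a short worked example; the disease-screening numbers below are made up for the sketch.

```python
# Made-up screening scenario: A = "has disease", B = "test is positive".
p_disease = 0.01             # P(A): prior probability
p_pos_given_disease = 0.95   # P(B|A): test sensitivity
p_pos_given_healthy = 0.05   # false positive rate

# Total probability of a positive test, P(B), over both cases.
p_pos = (p_pos_given_disease * p_disease
         + p_pos_given_healthy * (1 - p_disease))

# Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B).
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos

# Multiplication rule for independent events: P(A and B) = P(A) * P(B).
p_two_heads = 0.5 * 0.5

# Addition rule for mutually exclusive events: P(A or B) = P(A) + P(B).
p_one_or_six = 1 / 6 + 1 / 6
```

Even with a fairly accurate test, the posterior probability of disease given a positive result is only about 16%, because the prior is so low.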
Term | Definition | Probability Mass Function
---|---|---
Binomial Distribution | A discrete probability distribution that describes the number of successes in a fixed number of independent Bernoulli trials. | $P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$
Poisson Distribution | A discrete probability distribution that describes the number of events occurring in a fixed interval of time or space. | $P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$
Bernoulli Distribution | A probability distribution representing a single trial with two possible outcomes: success (usually denoted as 1) and failure (usually denoted as 0). | $P(X = k) = p^k (1-p)^{1-k},\ k \in \{0, 1\}$
Geometric Distribution | A probability distribution representing the number of trials needed to achieve the first success in a sequence of independent and identically distributed Bernoulli trials, each with success probability $p$. | $P(X = k) = (1-p)^{k-1} p$
Hypergeometric Distribution | A discrete probability distribution that describes the probability of a specified number of successes in a fixed number of draws without replacement from a finite population. | $P(X = k) = \frac{\binom{K}{k}\binom{N-K}{n-k}}{\binom{N}{n}}$
Negative Binomial Distribution | A probability distribution that models the number of failures that occur before a specified number of successes $r$ is reached. | $P(X = k) = \binom{k+r-1}{k} (1-p)^k p^r$
Multinomial Distribution | A probability distribution that generalizes the binomial distribution to more than two categories. | $P(X_1 = k_1, \ldots, X_m = k_m) = \frac{n!}{k_1! \cdots k_m!} p_1^{k_1} \cdots p_m^{k_m}$
Probability Mass Function | A function that gives the probability of a discrete random variable taking on a particular value. | $p(x) = P(X = x)$
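A few of the probability mass functions above translate directly into standard-library Python; these helpers are a sketch of the formulas, not library functions.

```python
import math

def binomial_pmf(k, n, p):
    """P(X = k): k successes in n independent Bernoulli trials, success prob p."""
    return math.comb(n, k) * p**k * (1 - p)**(n - k)

def poisson_pmf(k, lam):
    """P(X = k): k events in an interval with average rate lam."""
    return lam**k * math.exp(-lam) / math.factorial(k)

def geometric_pmf(k, p):
    """P(X = k): first success occurs on trial k (k = 1, 2, ...)."""
    return (1 - p)**(k - 1) * p
```

As a sanity check, each PMF sums to 1 over its support; for example, the binomial probabilities for k = 0..n add up to exactly 1.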
Term | Definition | Probability Density Function
---|---|---
Normal Distribution | A probability distribution that is symmetric, bell-shaped, and characterized by its mean and standard deviation. | $f(x) = \frac{1}{\sigma\sqrt{2\pi}} e^{-\frac{(x-\mu)^2}{2\sigma^2}}$
Log-Normal Distribution | A probability distribution of a random variable whose logarithm is normally distributed. | $f(x) = \frac{1}{x\sigma\sqrt{2\pi}} e^{-\frac{(\ln x - \mu)^2}{2\sigma^2}},\ x > 0$
Exponential Distribution | A probability distribution that describes the time between events in a Poisson process, where events occur continuously and independently at a constant average rate. It is commonly used to model waiting times between successive events, such as the time between customer arrivals at a service center or the lifespan of electronic components. | $f(x) = \lambda e^{-\lambda x},\ x \ge 0$
Chi-Squared Distribution | A probability distribution that arises in the context of hypothesis testing and is characterized by a single parameter, often denoted $k$, representing the degrees of freedom. The chi-squared distribution is the distribution of the sum of the squares of $k$ independent standard normal random variables. | $f(x) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$
Beta Distribution | A probability distribution defined on the interval [0, 1], often used to model random variables representing proportions, probabilities, or percentages. The beta distribution is characterized by two shape parameters, denoted $\alpha$ and $\beta$, which determine the shape of the distribution. | $f(x) = \frac{x^{\alpha-1}(1-x)^{\beta-1}}{B(\alpha, \beta)}$
Mixed Probability Distribution | A probability distribution that is a mixture of two or more component distributions. Each component distribution is weighted by a probability, and the mixture distribution is obtained by summing or integrating the component distributions according to their weights. | $f(x) = \sum_i w_i f_i(x)$
Joint Distribution | A probability distribution that describes the simultaneous behavior of two or more random variables. | $f(x, y)$
Probability Density Function | A function that describes the likelihood of a continuous random variable falling within a particular range of values. |
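The simpler densities above can likewise be sketched with the standard library; these helpers are illustrations of the formulas, not library functions.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of the normal distribution with mean mu and std dev sigma."""
    coeff = 1.0 / (sigma * math.sqrt(2 * math.pi))
    return coeff * math.exp(-((x - mu) ** 2) / (2 * sigma ** 2))

def exponential_pdf(x, lam):
    """Density of the exponential distribution with rate lam; zero for x < 0."""
    return lam * math.exp(-lam * x) if x >= 0 else 0.0
```

At the mean, the standard normal density peaks at $1/\sqrt{2\pi} \approx 0.3989$, and the exponential density at $x = 0$ equals the rate $\lambda$.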
Term | Definition
---|---
Simpson's Paradox | A statistical paradox where a trend appears in different groups of data but disappears or reverses when the groups are combined. It occurs when a confounding variable is not taken into account in the analysis, leading to misleading conclusions.
Sampling Bias | Bias in which a sample is collected in such a way that it is not representative of the population being studied, often resulting in skewed or inaccurate results.
Survivorship Bias | The tendency to focus on individuals or things that have survived a process while overlooking those that did not, leading to an incomplete or biased analysis.
Selection Bias | A bias introduced when individuals or data points are not selected randomly or systematically, leading to a non-representative sample and potentially biased results.
Overfitting | A modeling error that occurs when a statistical model captures noise or random fluctuations in the training data rather than the underlying relationship, resulting in poor generalization to new data.
Data Drift | Changes in the underlying data distribution over time, leading to a decrease in the performance of predictive models trained on historical data.
Concept Drift | Changes in the relationship between variables over time, making predictive models less accurate as the underlying patterns evolve.
Seasonality | Patterns in data that occur at regular intervals due to seasonal variations, such as temperature changes, holiday seasons, or sales cycles.
Underfitting | A modeling error that occurs when a statistical model is too simplistic to capture the underlying structure of the data, leading to poor predictive performance.
Term | Definition
---|---
Entropy | Measures the uncertainty or randomness associated with a random variable's possible outcomes.
Entropy of Probability Function | The entropy of a discrete probability distribution $p$: $H(p) = -\sum_i p_i \log_2 p_i$.
Cross Entropy | The average number of bits needed to encode outcomes drawn from a distribution $p$ when using a code optimized for a distribution $q$: $H(p, q) = -\sum_i p_i \log_2 q_i$.
Kullback-Leibler Divergence | A measure of how one probability distribution $p$ diverges from a reference distribution $q$: $D_{\mathrm{KL}}(p \parallel q) = \sum_i p_i \log_2 \frac{p_i}{q_i}$. It is zero exactly when the two distributions are equal.
Information Gain | The reduction in entropy achieved by splitting a dataset on an attribute; commonly used to choose splits in decision trees.
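The information-theoretic quantities above can be sketched in a few lines of standard-library Python (base-2 logarithms, so results are in bits); these helpers are illustrations, not library functions.

```python
import math

def entropy(p):
    """Shannon entropy H(p) = -sum p_i log2 p_i, in bits; 0 log 0 taken as 0."""
    return -sum(pi * math.log2(pi) for pi in p if pi > 0)

def cross_entropy(p, q):
    """H(p, q) = -sum p_i log2 q_i: cost of coding p with a code built for q."""
    return -sum(pi * math.log2(qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D_KL(p || q) = H(p, q) - H(p); zero exactly when p equals q."""
    return cross_entropy(p, q) - entropy(p)
```

A fair coin has entropy 1 bit, a certain outcome has entropy 0, and the KL divergence of any distribution from itself is 0.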
Term | Definition
---|---
Skew | A measure of the asymmetry of a probability distribution about its mean.
Kurtosis | A measure of the 'tailedness' or sharpness of the peak of a frequency distribution.
Kurtosis Excess | A measure of kurtosis that adjusts for the standard normal distribution's kurtosis value of 3.
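Skewness and excess kurtosis can be sketched as the third and fourth standardized moments; the helpers below use the simple population form (dividing by $n$) rather than the bias-corrected sample form some packages report.

```python
import statistics

def skewness(data):
    """Third standardized moment (population form): 0 for symmetric data."""
    m = statistics.fmean(data)
    n = len(data)
    s = (sum((x - m) ** 2 for x in data) / n) ** 0.5
    return sum(((x - m) / s) ** 3 for x in data) / n

def excess_kurtosis(data):
    """Fourth standardized moment minus 3: 0 for a normal distribution."""
    m = statistics.fmean(data)
    n = len(data)
    s = (sum((x - m) ** 2 for x in data) / n) ** 0.5
    return sum(((x - m) / s) ** 4 for x in data) / n - 3
```

Symmetric data has skewness 0; a long right tail gives positive skew, and flat, short-tailed data gives negative excess kurtosis.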
AI Study Tools for STEM Students Worldwide.
© 2025 CompSciLib™, LLC. All rights reserved.
info@compscilib.com