Any statistical study of mass social phenomena includes 3 main stages:

    Statistical observation - at this stage the primary statistical data (the initial statistical information) that forms the basis of the study is collected. If errors are made while collecting the primary data, or the material turns out to be of poor quality, the correctness and reliability of both the theoretical and the practical conclusions suffer;

    Data summary and grouping - at this stage the population is divided by attributes of difference and combined by attributes of similarity, and totals are calculated for each group and for the population as a whole. With the grouping method, the phenomena under study are divided into types, groups and subgroups according to their essential characteristics. Grouping makes it possible to delimit populations that are qualitatively homogeneous in essential respects, which is a prerequisite for defining and applying generalizing indicators;

    Processing and analysis of the data obtained, and identification of patterns - at this stage generalizing indicators are used: relative and average values are calculated, the variation of characteristics is summarized, the dynamics of phenomena are characterized, indices and balance constructions are applied, and indicators of the closeness of relationships between changing characteristics are calculated. For the most rational and clear presentation, the numerical material is given in tables and graphs.

Lecture No. 2. Statistical observation

1. Concept and forms of statistical observation

Statistical observation is the first stage of any statistical research.

Statistical observation is a scientifically organized work to collect mass primary data on the phenomena and processes of social life.

However, not every collection of information is a statistical observation. We can talk about statistical observation only when statistical patterns are studied, i.e. those that manifest themselves only in a mass process, in a large number of units of some aggregate.

Therefore, the statistical observation should be:

    planned - prepared and carried out according to a developed plan that covers methodology, organization, the technique of collecting information, quality control of the collected material and its reliability, and the presentation of the final results;

    mass - covering a number of cases of the process large enough to yield true statistical data that characterize not only individual units but the population as a whole;

    systematic - carried out regularly rather than sporadically, since trends and patterns in socio-economic processes, which involve quantitative and qualitative change, can be studied only on that basis.

The following basic requirements apply to statistical observation:

    completeness of statistical data (completeness of coverage of units of the population being studied, aspects of a particular phenomenon, as well as completeness of coverage over time);

    reliability and accuracy of data;

    uniformity and comparability of data.

In statistical practice, two organizational forms of observation are used:

1) reporting is an organizational form in which observation units submit information about their activities on forms of a prescribed format. The distinguishing feature of reporting is that it is mandatory, documented and legally confirmed by the signature of the manager;

2) specially organized statistical surveys, such as population censuses, sociological studies, inventories of material stocks and other observations that are carried out when problems arise for which the available information is insufficient. They either supplement the reporting data or are used to verify them.

1. STAGES OF STATISTICAL RESEARCH

The process of studying socio-economic phenomena through a system of statistical methods and quantitative characteristics - a system of indicators - is called statistical research.

The main stages of conducting a statistical study are:

1) statistical observation;

2) summary of the data obtained;

3) statistical analysis.

If necessary, a statistical study may contain an additional stage - a statistical forecast.

Statistical observation is a scientifically organized collection of data about the phenomena and processes of social life through registration according to a pre-developed program for observing their essential features. Observation data represents primary statistical information about the observed objects, which is the basis for obtaining their general characteristics. Observation acts as one of the main methods of statistics and as one of the most important stages of statistical research.

A statistical study is impossible without a high-quality information base obtained through statistical observation. For this reason, ever since statistics ceased to be regarded as a purely descriptive science, special rules for conducting observation and special requirements for its results, the statistical data, have been developed.

Observation is the first stage of statistical research, the quality of which determines the achievement of the final objectives of the study.

1.1. Observation is carried out according to a specially prepared program.

The program includes a list of characteristics of the research object, data about which must be obtained as a result of observation.

When preparing an observation, it is necessary to determine in advance:

1. An observation program in which:

a) the object of observation is determined, i.e. the set of units of the phenomenon that needs to be investigated. The observation unit must be distinguished from the reporting unit. A reporting unit is the unit that supplies the statistical data; it may comprise several population units or coincide with a single population unit. For example, in a population survey the observation unit might be the household member and the reporting unit the household.

b) the boundaries of the observation object are determined.

c) the characteristics of the object of observation are identified, information about which must be obtained as a result of observation.

2. Time of observation of an object - the time as of which or for which information about the object being studied is recorded.

3. Timing of observation. That is, the period of time for data collection and the date of completion of observation are determined. The observation period affects the completion time of the overall statistical study and the timeliness of its conclusions.

4. Funds and resources required for monitoring: number of qualified specialists; material resources; means for processing observation results.

5. Requirements for statistical data. The main requirements are: a) reliability, i.e. information about the object of research should reflect its real state at the time of observation; b) comparability of data, i.e. information obtained as a result of observation must be comparable, which is ensured by a unified methodology for collecting and analyzing data, by units of measurement, etc.

1.2. There are several types of statistical observation.

1. By coverage of population units:

a) complete (covering all units of the population);

b) incomplete (sample observation, monographic observation, observation by the main-array method)

2. According to the time of registration of facts: a) current (continuous); b) discontinuous (periodic, one-time)

3. By the method of collecting information: a) direct observation; b) documentary observation; c) survey (questionnaire, correspondent, etc.)

Summary is the process of bringing the received data into the system, processing it and calculating intermediate and general results, calculating interrelated quantities of an analytical nature.

The next stage of statistical research is the preparation of information obtained during observation for analysis. This stage is called summary.

Summary includes:

    systematization of the information obtained during observation;

    grouping of this information;

    development of a system of indicators characterizing the groups formed;

    creation of working tables for the grouped data;

    calculation of derived quantities from the working tables.

In the literature on the theory of statistics, one often encounters consideration of summary and grouping as independent stages of research. However, it should be noted that the concept of summary includes actions to group statistical data, so here the concept of “summary” is adopted as the name of the research stage.

Statistical analysis is a study of the characteristic features of the structure, relationships of phenomena, trends, patterns of development of socio-economic phenomena, for which specific economic-statistical and mathematical-statistical methods are used. Statistical analysis concludes with the interpretation of the results obtained.

Statistical forecast is a scientific identification of the state and probable paths of development of phenomena and processes, based on a system of established cause-and-effect relationships and patterns.

TASK 1

As a result of a sample survey of wages of 60 employees of an industrial enterprise, the following data were obtained (Table 1).

Construct an interval distribution series based on the effective attribute, forming five groups with equal intervals.

Determine the main indicators of variation (variance, standard deviation, coefficient of variation), the power mean (the average value of the characteristic) and the structural averages. Represent the series graphically in the form of: a) a histogram; b) a cumulative curve (cumulate); c) an ogive. Draw a conclusion.

SOLUTION

1. Determine the range of variation of the effective attribute, production experience, using the formula:

R = Xmax – Xmin = 36 – 5 = 31

where Xmax is the maximum value of the attribute;

Xmin is the minimum value of the attribute.

2. Determine the width of the interval

i = R/n = 31/5 = 6.2

where n is the number of groups.

Using the intervals obtained, we group the employees and obtain the distribution given in the auxiliary table.

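3. Let's build an auxiliary table

Group | Interval    | f_i | ω_i, % | S_i, % | x_i  | x_i·f_i | x_i·ω_i | (x_i – x̄)² | (x_i – x̄)²·f_i
I     | 5 – 11.2    | 26  | 43.3   | 43.3   | 8.1  | 210.6   | 350.73  | 46.24      | 1202.24
II    | 11.2 – 17.4 | 13  | 21.7   | 65.0   | 14.3 | 185.9   | 310.31  | 0.36       | 4.68
III   | 17.4 – 23.6 | 13  | 21.7   | 86.7   | 20.5 | 266.5   | 444.85  | 31.36      | 407.68
IV    | 23.6 – 29.8 | 5   | 8.3    | 95.0   | 26.7 | 133.5   | 221.61  | 139.24     | 696.2
V     | 29.8 – 36   | 3   | 5.0    | 100.0  | 32.9 | 98.7    | 164.5   | 324        | 972
TOTAL |             | 60  | 100.0  |        |      | 895.2   | 1492    | 541.2      | 3282.8

Here f_i is the frequency (number of employees in the group), ω_i is the frequency in % of the total, S_i is the cumulative frequency in %, and x_i is the middle of the interval.

Values of the attribute in each group:

I: 6, 8, 7, 5, 8, 6, 10, 9, 9, 7, 6, 6, 9, 10, 7, 9, 10, 10, 11, 8, 9, 8, 7, 6, 9, 10

II: 16, 15, 13, 12, 14, 14, 12, 14, 17, 13, 15, 17, 14

III: 18, 21, 20, 20, 21, 18, 19, 22, 21, 21, 21, 18, 19

IV: 28, 29, 25, 28, 24

V: 36, 35, 33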
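The grouping step can be checked with a short script. This is a minimal sketch, assuming that the 60 values are exactly those listed per group in the auxiliary table and that a value lying on a boundary belongs to the upper interval (no value in this data set actually lies on an inner boundary). The function name `group_into_equal_intervals` is illustrative, not part of the original solution.

```python
# Production experience (years) of the 60 employees, as listed in the auxiliary table.
experience = [
    6, 8, 7, 5, 8, 6, 10, 9, 9, 7, 6, 6, 9, 10, 7, 9, 10, 10, 11, 8, 9, 8, 7, 6, 9, 10,
    16, 15, 13, 12, 14, 14, 12, 14, 17, 13, 15, 17, 14,
    18, 21, 20, 20, 21, 18, 19, 22, 21, 21, 21, 18, 19,
    28, 29, 25, 28, 24,
    36, 35, 33,
]


def group_into_equal_intervals(values, n_groups):
    """Return (range, interval width, frequencies) for equal-width grouping."""
    x_min, x_max = min(values), max(values)
    value_range = x_max - x_min
    width = value_range / n_groups
    frequencies = [0] * n_groups
    for x in values:
        # The maximum lies on the upper boundary, so clamp it into the last group.
        index = min(int((x - x_min) / width), n_groups - 1)
        frequencies[index] += 1
    return value_range, width, frequencies


value_range, width, frequencies = group_into_equal_intervals(experience, 5)
# Share of each group in % of the total, rounded to one decimal place.
shares = [round(100 * f / len(experience), 1) for f in frequencies]
print(value_range, width, frequencies, shares)
```

The frequencies obtained this way are the weights used in the mean, variance, mode and median calculations that follow.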

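4. The average value of the characteristic in the population under study is determined by the weighted arithmetic mean formula:

$$\bar{x} = \frac{\sum x_i f_i}{\sum f_i} = \frac{895.2}{60} \approx 14.9 \text{ years}$$

5. The variance and standard deviation of the characteristic are determined by the formulas

$$\sigma^2 = \frac{\sum (x_i - \bar{x})^2 f_i}{\sum f_i} = \frac{3282.8}{60} = 54.713, \qquad \sigma = \sqrt{\sigma^2} = \sqrt{54.713} = 7.397$$

Determination of the coefficient of variation

$$V = \frac{\sigma}{\bar{x}} \cdot 100\% = \frac{7.397}{14.9} \cdot 100\% \approx 49.6\%$$

Thus, V > 33.3%; therefore, the population is heterogeneous.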
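The indicators of variation can be recomputed from the grouped data. The sketch below assumes the interval midpoints 8.1, 14.3, 20.5, 26.7 and 32.9 and the group frequencies 26, 13, 13, 5 and 3 (counted from the values listed per group in the auxiliary table). The function name `grouped_variation` is illustrative.

```python
from math import sqrt

# Interval midpoints and group frequencies (assumed from the auxiliary table).
midpoints = [8.1, 14.3, 20.5, 26.7, 32.9]
frequencies = [26, 13, 13, 5, 3]


def grouped_variation(midpoints, frequencies):
    """Weighted mean, variance, standard deviation and coefficient of variation (%)."""
    n = sum(frequencies)
    mean = sum(x * f for x, f in zip(midpoints, frequencies)) / n
    variance = sum((x - mean) ** 2 * f for x, f in zip(midpoints, frequencies)) / n
    std = sqrt(variance)
    cv = 100 * std / mean
    return mean, variance, std, cv


mean, variance, std, cv = grouped_variation(midpoints, frequencies)
print(f"mean={mean:.2f} variance={variance:.3f} std={std:.3f} V={cv:.1f}%")
# A coefficient of variation above 33.3% is read as a heterogeneous population.
print("heterogeneous" if cv > 33.3 else "homogeneous")
```

The script uses the unrounded mean (14.92), so its variance can differ in the last decimal places from a hand calculation based on the rounded mean of 14.9.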

6. Determination of the mode

The mode is the value of the characteristic that occurs most frequently in the population being studied. In an interval variation series, the mode is calculated using the formula:

$$Mo = x_{Mo} + i_{Mo} \cdot \frac{f_{Mo} - f_{Mo-1}}{(f_{Mo} - f_{Mo-1}) + (f_{Mo} - f_{Mo+1})}$$

where

x_Mo is the lower limit of the modal interval;

i_Mo is the width of the modal interval;

f_Mo, f_Mo-1, f_Mo+1 are the frequencies (relative frequencies) of the modal, pre-modal and post-modal intervals, respectively.

The modal interval is the interval with the greatest frequency. In our problem, this is the first interval (5 – 11.2, frequency 26); there is no pre-modal interval, so f_Mo-1 = 0 and

$$Mo = 5 + 6.2 \cdot \frac{26 - 0}{(26 - 0) + (26 - 13)} \approx 9.13$$


7. Calculate the median.

The median is the value located in the middle of an ordered variation series. It divides the series into two equal parts, so that half of the population units have attribute values less than the median and half have values greater than the median.

In an interval series, the median is determined by the formula:

$$Me = x_{Me} + i_{Me} \cdot \frac{\frac{1}{2}\sum f_i - S_{Me-1}}{f_{Me}}$$

where x_Me is the lower limit of the median interval;

i_Me is the width of the median interval;

f_Me is the frequency of the median interval;

S_Me-1 is the sum of accumulated frequencies up to the pre-median interval inclusive.

The median interval is the interval that contains the serial number of the median. To find it, the frequencies are accumulated until their sum exceeds half of the population.

From column 5 of the auxiliary table (cumulative frequency) we find the interval in which the cumulative frequency first exceeds 50%. This is the second interval, from 11.2 to 17.4, and it is the median interval.

Then

$$Me = 11.2 + 6.2 \cdot \frac{30 - 26}{13} \approx 13.108$$

Consequently, half of the workers have less than 13.108 years of work experience, and half have more.
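The two structural averages can be verified numerically. This sketch applies the standard interval-series formulas for the mode and the median. The inputs are assumptions read from the worked example: interval width 6.2, first interval 5 – 11.2 with frequency 26, second interval 11.2 – 17.4 with frequency 13, and 60 units in total. The function names are illustrative.

```python
def interval_mode(lower, width, f_modal, f_before, f_after):
    """Mode of an interval series from the modal interval and its neighbours."""
    return lower + width * (f_modal - f_before) / (
        (f_modal - f_before) + (f_modal - f_after)
    )


def interval_median(lower, width, total, cum_before, f_median):
    """Median of an interval series from the median interval."""
    return lower + width * (total / 2 - cum_before) / f_median


# Modal interval: the first one; there is no interval before it, so f_before = 0.
mode = interval_mode(lower=5, width=6.2, f_modal=26, f_before=0, f_after=13)

# Median interval: the second one; 26 units are accumulated before it.
median = interval_median(lower=11.2, width=6.2, total=60, cum_before=26, f_median=13)

print(round(mode, 2), round(median, 3))
```

Both results agree with the values of 9.13 and 13.108 quoted in the conclusion of the task.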

8. Let us depict the series in the form of a polygon, a histogram, a cumulative curve and an ogive.

Graphic representation plays an important role in the study of variation series, as it allows statistical data to be analyzed in a simple and visual form.

There are several ways to display series graphically (histogram, polygon, cumulate, ogive); the choice depends on the purpose of the study and on the type of variation series.

A distribution polygon is mainly used to depict a discrete series, but a polygon can also be constructed for an interval series if it is first converted to a discrete series. The distribution polygon is a closed broken line in a rectangular coordinate system with coordinates (x_i, q_i), where x_i is the i-th value of the attribute and q_i is its frequency or relative frequency.

A distribution histogram is used to display an interval series. To construct a histogram, segments equal to the intervals of the attribute are laid out in sequence on the horizontal axis, and rectangles are built on these segments as bases. The heights of the rectangles equal the frequencies or relative frequencies for a series with equal intervals, and the densities for a series with unequal intervals.


A cumulate (cumulative curve) is a graphical representation of a variation series in which the accumulated frequencies or relative frequencies are plotted on the vertical axis and the attribute values on the horizontal axis. The cumulate is used for both discrete and interval variation series.


Conclusion: The main indicators of variation of the series under study were calculated. The average value of the attribute, production experience, is 14.9 years; the variance is 54.713 and the standard deviation is 7.397. The mode is 9.13, and the modal interval is the first interval of the series. The median, 13.108, divides the series into two equal parts: half of the employees in the organization have less than 13.108 years of work experience, and half have more.

TASK 2

The following initial data characterize the dynamics of granulated sugar production for 1997 – 2001 (Table 2).

Table 2 Initial data

Year                                          | 1997 | 1998 | 1999 | 2000 | 2001
Production of granulated sugar, thousand tons | 1620 | 1660 | 1700 | 1680 | 1700

Determine the main indicators of the dynamics series. Present the calculation in the form of a table. Calculate the average annual values of the indicators. Show the dynamics of the analyzed indicator graphically, as a polygon. Draw a conclusion.

SOLUTION

Given

Year                                          | 1997 | 1998 | 1999 | 2000 | 2001
Production of granulated sugar, thousand tons | 1620 | 1660 | 1700 | 1680 | 1700

1) The average level of the dynamics series is calculated using the formula

$$\bar{y} = \frac{\sum y_i}{n}$$

where n is the number of levels of the series.

2) Base and chain indicators of dynamics are calculated as follows:

1. Absolute increase is determined by the formulas:

Aib = yi – y0 (base)

Aic = yi – yi-1 (chain)

2. The growth rate is determined by the formulas (%):

Trb = (yi / y0) * 100 (base)

Trc = (yi / yi-1) * 100 (chain)

3. The rate of increase is determined by the formulas (%):

Tnrb = Trb – 100 (base)

Tnrc = Trc – 100 (chain)

4. Average absolute increase:

$$\bar{A} = \frac{y_n - y_0}{n_c}$$

where y_n is the final level of the time series;

y_0 is the initial level of the time series;

n_c is the number of chain absolute increases.

5. Average annual growth rate:

$$\bar{T}_r = \sqrt[n_c]{\frac{y_n}{y_0}} \cdot 100\%$$

6. Average annual rate of increase:

$$\bar{T}_{nr} = \bar{T}_r - 100\%$$

3) Absolute content of 1% increase:

A% = yi-1 / 100

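We summarize all calculated indicators in a table.

Indicator                                        | 1997 | 1998  | 1999  | 2000  | 2001
1. Production of granulated sugar, thousand tons | 1620 | 1660  | 1700  | 1680  | 1700
2. Absolute increase, thousand tons:
   base, Aib                                     | -    | 40    | 80    | 60    | 80
   chain, Aic                                    | -    | 40    | 40    | -20   | 20
3. Growth rate, %:
   base, Trb                                     | -    | 102.5 | 104.9 | 103.7 | 104.9
   chain, Trc                                    | -    | 102.5 | 102.4 | 98.8  | 101.2
4. Rate of increase, %:
   base, Tnrb                                    | -    | 2.5   | 4.9   | 3.7   | 4.9
   chain, Tnrc                                   | -    | 2.5   | 2.4   | -1.2  | 1.2
5. Value of 1% increase, thousand tons           | -    | 16.2  | 16.6  | 17.0  | 16.8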
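The indicators in the table can be reproduced with a few lines of code. This is a minimal sketch that uses the production levels from Table 2 and takes 1997 as the base year. The dictionary keys and the function name `dynamics_indicators` are illustrative.

```python
# Production of granulated sugar, thousand tons (Table 2).
levels = {1997: 1620, 1998: 1660, 1999: 1700, 2000: 1680, 2001: 1700}


def dynamics_indicators(levels):
    """Base and chain indicators for every year after the base year."""
    years = sorted(levels)
    y0 = levels[years[0]]
    rows = []
    for prev_year, year in zip(years, years[1:]):
        y_prev, y = levels[prev_year], levels[year]
        rows.append({
            "year": year,
            "abs_base": y - y0,                      # absolute increase vs base year
            "abs_chain": y - y_prev,                 # absolute increase vs previous year
            "growth_base": round(100 * y / y0, 1),   # growth rate, %
            "growth_chain": round(100 * y / y_prev, 1),
            "increase_base": round(100 * y / y0 - 100, 1),   # rate of increase, %
            "increase_chain": round(100 * y / y_prev - 100, 1),
            "one_percent": y_prev / 100,             # absolute content of 1% increase
        })
    return rows


rows = dynamics_indicators(levels)
for row in rows:
    print(row)

# Average annual values over the whole period.
years = sorted(levels)
n_chain = len(years) - 1
mean_level = sum(levels.values()) / len(levels)
mean_abs_increase = (levels[years[-1]] - levels[years[0]]) / n_chain
mean_growth_rate = 100 * (levels[years[-1]] / levels[years[0]]) ** (1 / n_chain)
print(mean_level, mean_abs_increase, round(mean_growth_rate, 1))
```

The average growth rate is computed as a geometric mean of the chain growth rates, which reduces to the n_c-th root of the ratio of the final level to the initial level.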

4) Average annual values of the indicators:

average level: (1620 + 1660 + 1700 + 1680 + 1700) / 5 = 1672 thousand tons;

average absolute increase: (1700 – 1620) / 4 = 20 thousand tons;

average annual growth rate: (1700 / 1620)^(1/4) * 100 ≈ 101.2%;

average annual rate of increase: 101.2 – 100 = 1.2%.

5) Let's depict the series graphically in the form of a polygon.


Thus, the following is obtained. Sugar production reached its highest level, 1700 thousand tons, in 1999 and again in 2001; in both years the absolute increase over the base year 1997 was 80 thousand tons, the base growth rate was 104.9% and the base rate of increase was 4.9%. The largest chain absolute increases occurred in 1998 and 1999, 40 thousand tons each. The highest chain growth rate was observed in 1998 (102.5%) and the lowest in 2000 (98.8%).

TASK 3

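There are data on sales of goods (see Table 3).

Table 3 Initial data on sales of goods

Product | Base year quantity | Base year price | Reporting year quantity | Reporting year price
N       | …                  | …               | …                       | …
O       | 1100               | …               | 1000                    | …
P       | …                  | …               | …                       | …
R       | …                  | 1350            | …                       | 1300
S       | …                  | …               | …                       | …
T       | 1650               | …               | 1700                    | …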

Determine: a) individual indices ( i p , i q); b) general indices (I p, I q, I pq); c) absolute change in trade turnover due to: 1) the number of goods; 2) prices.

Draw a conclusion based on the calculated indicators.

SOLUTION

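Let's create an auxiliary table

        | Base year          | Reporting year     | Turnover          | Indices                         |
Product | Quantity, q0 | Price, p0 | Quantity, q1 | Price, p1 | q0 * p0 | q1 * p1 | i_q = q1 / q0 | i_p = p1 / p0 | q1 * p0
N       | …            | …         | …            | …         | 44000   | 35000   | 0.875         | 0.909         | 38500
O       | 1100         | …         | 1000         | …         | 41800   | 40000   | 0.909         | 1.053         | 38000
P       | …            | …         | …            | …         | 7500    | 8400    | 1.200         | 0.933         | 9000
R       | …            | 1350      | …            | 1300      | 40500   | 26000   | 0.667         | 0.963         | 27000
S       | …            | …         | …            | …         | 45000   | 44000   | 1.100         | 0.889         | 49500
T       | 1650         | …         | 1700         | …         | 26400   | 25500   | 1.030         | 0.938         | 27200
TOTAL   |              |           |              |           | 205200  | 178900  |               |               | 189200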


Conclusion: Total trade turnover for the year changed by -26,300 conventional units, of which -16,000 is due to the change in the quantity of goods sold and -10,300 to the change in prices. Turnover in the reporting year was 87.2% of the base-year level. The individual quantity indices show that sales of product “P” rose to 120% and of product “S” to 110% of the base-year level, while sales of product “T” rose only slightly, to 103%. Sales of product “R” fell quite significantly, to 66.7% of the base-year level; sales of product “N” fell to 87.5% and of product “O” to 90.9%. The individual price indices show that the price rose only for product “O”, to 105.3% of the base-year price; for all other products, “N”, “P”, “R”, “S” and “T”, prices fell to 90.9%, 93.3%, 96.3%, 88.9% and 93.8% of the base-year level, respectively.

The general index of physical sales volume, 92.2%, indicates a 7.8% decrease in total sales volume; the general price index, 94.6%, indicates a 5.4% decrease in the prices of goods sold; and the general turnover index, 87.2%, indicates a 12.8% decrease in trade turnover.
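The general indices and the absolute changes can be re-derived from the three totals of the auxiliary table. The sketch below makes one explicit assumption: the volume index is weighted by base-year prices and the price index by reporting-year quantities. This is the usual pairing and is consistent with the q1 * p0 column and with the split of -16,000 and -10,300 in the conclusion. Variable names are illustrative.

```python
# Totals from the auxiliary table.
sum_q0p0 = 205200  # base-year turnover
sum_q1p1 = 178900  # reporting-year turnover
sum_q1p0 = 189200  # reporting-year quantities at base-year prices

# General indices.
index_turnover = sum_q1p1 / sum_q0p0   # I_pq
index_volume = sum_q1p0 / sum_q0p0     # I_q, weighted by base-year prices (assumed)
index_price = sum_q1p1 / sum_q1p0      # I_p, weighted by reporting-year quantities (assumed)

# Absolute change in turnover and its decomposition.
delta_total = sum_q1p1 - sum_q0p0
delta_due_to_quantity = sum_q1p0 - sum_q0p0
delta_due_to_price = sum_q1p1 - sum_q1p0

print(f"I_pq = {100 * index_turnover:.1f}%")
print(f"I_q  = {100 * index_volume:.1f}%")
print(f"I_p  = {100 * index_price:.1f}%")
print(delta_total, delta_due_to_quantity, delta_due_to_price)
```

With this weighting the three indices are linked by I_pq = I_q * I_p, and the two partial changes add up to the total change in turnover.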

TASK 4

Using the initial data of Table 1 (rows 14 to 23) for two characteristics, length of service and wages, carry out a correlation and regression analysis and determine the coefficients of correlation and determination. Construct a graph of the correlation between the two characteristics (resultant and factor). Draw a conclusion.

SOLUTION

Initial data

Production experience | Salary amount
…                     | 1800
…                     | 2500
…                     | 1750
…                     | 1580
…                     | 1750
…                     | 1560
…                     | 1210
…                     | 1860
…                     | 1355
…                     | 1480

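Straight-line dependence: y = a0 + a1 * x, where x is production experience and y is the salary amount.

The parameters of the equation are determined using the least squares method, using the system of normal equations

$$\begin{cases} n a_0 + a_1 \sum x = \sum y \\ a_0 \sum x + a_1 \sum x^2 = \sum xy \end{cases}$$

To solve the system we use the method of determinants.

The parameters are calculated using the formulas

$$a_0 = \frac{\sum y \sum x^2 - \sum xy \sum x}{n \sum x^2 - \left(\sum x\right)^2}, \qquad a_1 = \frac{n \sum xy - \sum x \sum y}{n \sum x^2 - \left(\sum x\right)^2}$$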
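The method of determinants for a straight-line regression can be sketched as follows. The production experience values for this task are not available here, so the example runs on a small hypothetical data set; only the method is illustrated, not the result of the task. The symbols a0 and a1 and the function names are assumptions made for this sketch.

```python
from math import sqrt


def fit_line_by_determinants(xs, ys):
    """Solve the normal equations of y = a0 + a1*x by Cramer's rule."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    det = n * sxx - sx * sx            # determinant of the system
    if det == 0:
        raise ValueError("all x values are equal; the line is not defined")
    a0 = (sy * sxx - sxy * sx) / det   # free term replaced in the first column
    a1 = (n * sxy - sx * sy) / det     # free term replaced in the second column
    return a0, a1


def correlation(xs, ys):
    """Pearson linear correlation coefficient."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    return (n * sxy - sx * sy) / sqrt((n * sxx - sx * sx) * (n * syy - sy * sy))


# Hypothetical data: experience in years and salary, NOT the values of Table 1.
experience = [1, 2, 3, 4]
salary = [3, 5, 7, 9]

a0, a1 = fit_line_by_determinants(experience, salary)
r = correlation(experience, salary)
determination = r ** 2
print(a0, a1, r, determination)
```

The coefficient of determination is taken as the square of the correlation coefficient, which holds for a straight-line regression with one factor.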

2.1 Statistical study design

Statistical data analysis systems are a modern, effective tool for statistical research. Special statistical analysis systems, as well as universal tools - Excel, Matlab, Mathcad, etc., have ample opportunities for processing statistical data.

But even the most advanced tool cannot replace the researcher, who must formulate the purpose of the study, collect data, select methods, approaches, models and tools for processing and analyzing data, and interpret the results obtained.

Figure 2.1 shows a diagram of the statistical study.

Fig. 2.1 - Schematic diagram of statistical research

The starting point of statistical research is the formulation of the problem. When determining it, the purpose of the study is taken into account, what information is needed and how it will be used when making a decision is determined.

The statistical study itself begins with the preparatory stage. During this stage, analysts study the terms of reference, a document drawn up by the customer of the study. The terms of reference must clearly state the objectives of the research:

    the object of research is defined;

    the assumptions and hypotheses that must be confirmed or refuted during the study are listed;

    the way the research results will be used is described;

    the time frame within which the study must be carried out and the budget for the study are specified.

Based on the terms of reference, two documents are developed: the structure of the analytical report, that is, the form in which the results of the study must be presented, and the statistical observation program. The program is a list of attributes that must be recorded during observation (or questions to which reliable answers must be obtained for each observation unit surveyed). The content of the program is determined by the characteristics of the observed object, by the objectives of the study, and by the methods the analysts have chosen for further processing of the collected information.

The main stage of statistical research includes the collection of necessary data and their analysis.

The final stage of the research is drawing up an analytical report and submitting it to the customer.

Figure 2.2 presents a diagram of statistical data analysis.

Fig.2.2 – Main stages of statistical analysis

2.2 Collection of statistical information

Collecting materials involves analyzing the terms of reference of the study, identifying sources of the necessary information and (if necessary) developing questionnaires. When the sources of information are examined, all required data is divided into primary (data that is not yet available and must be collected specifically for this study) and secondary (data previously collected for other purposes).

Secondary data collection is often referred to as “desk” or “library” research.

Examples of collecting primary data: observing store visitors, surveying hospital patients, discussing a problem at a meeting.

Secondary data is divided into internal and external.

Examples of internal secondary data sources:

    the organization's information system (including the accounting subsystem, the sales management subsystem, the CRM (Customer Relationship Management) system, i.e. application software designed to automate an organization's strategies for interacting with customers, and others);

    previous studies;

    written reports from employees.

Examples of external secondary data sources:

    reports from statistical bodies and other government agencies;

    reports from marketing agencies, professional associations, etc.;

    electronic databases (address directories, GIS, etc.);

    libraries;

    mass media.

The main outputs at the data collection stage are:

    planned sample size;

    sample structure (presence and size of quotas);

    type of statistical observation (data collection, survey, questionnaire, measurement, experiment, examination, etc.);

    information about survey parameters (for example, the possibility of falsification of questionnaires);

    scheme for encoding variables in the database of the program selected for processing;

    data conversion plan;

    plan diagram of the statistical procedures used.

This same stage includes the survey procedure itself. Of course, questionnaires are developed only to obtain primary information.

The received data must be edited and prepared accordingly. Each questionnaire or observation form is checked and, if necessary, adjusted. Each answer is assigned numeric or letter codes - the information is encoded. Data preparation includes editing, transcribing and checking data, coding and necessary transformations.

2.3 Determination of sample characteristics

As a rule, the data collected through statistical observation for statistical analysis form a sample. The sequence of data transformations in the course of a statistical study can be represented schematically as follows (Fig. 2.3).

Fig. 2.3 Statistical data conversion scheme

By analyzing a sample, it is possible to draw conclusions about the population that the sample represents.

The final determination of the general sampling parameters is made once all the questionnaires have been collected. It includes:

    determining the actual number of respondents,

    determination of the sampling structure,

    distribution by survey location,

    establishing a confidence level for the statistical reliability of the sample,

    calculation of statistical error and determination of representativeness of the sample.

The actual number of respondents may turn out to be larger or smaller than planned. The first case is better for the analysis but is disadvantageous for the customer of the study. The second may reduce the quality of the research and therefore benefits neither the analysts nor the customer.

The sampling structure may be random or non-random (respondents are selected by a criterion known in advance, for example by the quota method). Random samples are a priori representative. Non-random samples may be intentionally unrepresentative of the population and still provide important information for the research. In this case, the screening questions of the questionnaire, which are designed specifically to filter out respondents who do not meet the requirements, also need careful attention.

To determine the accuracy of the estimate, the confidence level (95% or 99%) must be set first. The maximum statistical error of the sample is then calculated as

$$\Delta = t \sqrt{\frac{pq}{n}} \quad \text{or} \quad \Delta = t \sqrt{\frac{\sigma^2}{n}},$$

where n is the sample size, p is the probability of the occurrence of the event under study (the respondent being included in the sample), q = 1 - p is the probability of the opposite event (the respondent not being included in the sample), t is the confidence coefficient, and σ² is the variance of the characteristic.

Table 2.4 shows the most commonly used values of the confidence probability and the confidence coefficients.

Table 2.4
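The link between the confidence probability and the confidence coefficient can be illustrated numerically. The sketch below assumes that the confidence coefficient is the two-sided quantile of the standard normal distribution and that the maximum error is the coefficient multiplied by the standard error of a share or of a mean. The function names and the example inputs (p = 0.5, n = 400) are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def confidence_coefficient(confidence_level):
    """Two-sided standard normal quantile for a confidence probability."""
    return NormalDist().inv_cdf((1 + confidence_level) / 2)


def max_error_of_share(p, n, confidence_level):
    """Maximum sampling error of a share p for a sample of size n."""
    return confidence_coefficient(confidence_level) * sqrt(p * (1 - p) / n)


def max_error_of_mean(variance, n, confidence_level):
    """Maximum sampling error of a mean for a given variance of the characteristic."""
    return confidence_coefficient(confidence_level) * sqrt(variance / n)


for level in (0.95, 0.99):
    print(level, round(confidence_coefficient(level), 2))

# Hypothetical survey: 400 respondents, the least favourable share p = 0.5.
print(round(max_error_of_share(0.5, 400, 0.95), 3))
```

Taking p = 0.5 gives the largest possible product p * q, so it is a cautious choice when the share is not known in advance.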

2.5 Data processing on a computer

Analyzing data using a computer involves performing a number of necessary steps.

1. Determination of the structure of the source data.

2. Entering data into the computer in accordance with its structure and program requirements. Editing and converting data.

3. Specifying a data processing method in accordance with the objectives of the study.

4. Obtaining the result of data processing. Editing it and saving it in the required format.

5. Interpretation of the processing result.

No computer program can perform steps 1 (preparatory) and 5 (final) - the researcher does them himself. Steps 2-4 are performed by the researcher using the program, but it is the researcher who determines the necessary procedures for editing and transforming data, methods of data processing, as well as the format for presenting the processing results. The computer's help (steps 2–4) ultimately involves moving from a long sequence of numbers to a more compact one. At the “input” of the computer, the researcher submits an array of initial data that is inaccessible to comprehension, but suitable for computer processing (step 2). Then the researcher gives the program a command to process the data in accordance with the task and data structure (step 3). At the “output”, he receives the result of processing (step 4) - also an array of data, only smaller, accessible to comprehension and meaningful interpretation. At the same time, an exhaustive analysis of data usually requires repeated processing using different methods.

2.6 Selecting a data analysis strategy

The choice of strategy for analyzing the collected data is based on knowledge of the theoretical and practical aspects of the subject area under study, the specifics and known characteristics of the information, the properties of specific statistical methods, as well as the experience and views of the researcher.

It must be remembered that data analysis is not the final goal of the study. Its goal is to obtain information that will help solve a specific problem and make adequate management decisions. The choice of analysis strategy should begin with an examination of the results of the previous stages of the process: defining the problem and developing a research plan. A preliminary data analysis plan developed as one element of a research plan is used as a “draft”. Then, as additional information becomes available at later stages of the research process, certain changes may need to be made.

Statistical methods are divided into univariate and multivariate. Univariate methods are used when every element of the sample is assessed by a single indicator, or when there are several indicators for each element but each variable is analyzed separately from all the others.

Multivariate methods are appropriate when two or more indicators are used to evaluate each sample element and these variables are analyzed simultaneously. Such methods are used to determine dependencies between phenomena.

Multivariate methods differ from univariate methods primarily in that the focus of attention shifts from the levels (averages) and spread (variances) of phenomena to the degree of relationship (correlation or covariance) between them.

Univariate methods can be classified according to whether the data being analyzed are metric or non-metric (Fig. 2.4). Metric data are measured on an interval scale or a ratio scale. Non-metric data are measured on a nominal or ordinal scale.

Additionally, these methods are divided into classes based on how many samples—one, two, or more—are analyzed in the study.

The classification of one-dimensional statistical methods is presented in Fig. 2.4.

Fig. 2.4 Classification of univariate statistical methods depending on the analyzed data

The number of samples is determined by how the data is handled for a particular analysis, not by how the data was collected. For example, data on males and females can be obtained within the same sample, but if the analysis is aimed at identifying differences in perception based on gender differences, the researcher will have to operate with two different samples. Samples are considered independent if they are not experimentally related to each other. Measurements taken in one sample do not affect the values of variables in another. For analysis, data from different groups of respondents, such as those collected from females and males, are usually treated as independent samples.

On the other hand, if the data from two samples refer to the same group of respondents, the samples are considered paired - dependent.

If there is a single sample of metric data, the z-test and t-test can be used. For two independent samples, the two-sample z- and t-tests can be used; for more than two, one-way analysis of variance (ANOVA). For two related samples, a paired t-test is used. For non-metric data from a single sample, the researcher can use frequency distribution tests: chi-square, the Kolmogorov-Smirnov (K-S) test, the runs test and the binomial test. For two independent samples with non-metric data, the following methods are available: chi-square, the Mann-Whitney test, the median test and the K-S test; for more than two, Kruskal-Wallis one-way analysis of variance. If there are two or more related samples of non-metric data, the sign, McNemar and Wilcoxon tests should be used.
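As an illustration of the case of two related samples, the statistic of a paired t-test can be computed by hand. This is a minimal sketch on hypothetical measurements of the same four respondents before and after some event. It returns only the t statistic and the degrees of freedom; comparing the statistic with a critical value is left out.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t_statistic(first, second):
    """t statistic and degrees of freedom for two related samples."""
    if len(first) != len(second):
        raise ValueError("related samples must have the same length")
    differences = [b - a for a, b in zip(first, second)]
    n = len(differences)
    # stdev() is the sample standard deviation (n - 1 in the denominator).
    t = mean(differences) / (stdev(differences) / sqrt(n))
    return t, n - 1


# Hypothetical scores of the same four respondents.
before = [10, 12, 9, 11]
after = [12, 14, 10, 13]

t, dof = paired_t_statistic(before, after)
print(t, dof)
```

The test works on the differences within each pair, which is why the samples must refer to the same group of respondents.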

Multivariate statistical methods are aimed at identifying existing patterns: interdependence of variables, relationship or sequence of events, inter-object similarity.

By convention, five standard types of patterns of significant interest can be distinguished: association, sequence, classification, clustering and forecasting.

An association occurs when several events are related to each other. For example, a study conducted in a supermarket may show that 65% of those who buy corn chips also buy Coca-Cola, and if there is a discount for such a set, they buy Coke in 85% of cases. Having information about such an association, it is easy for managers to assess how effective the discount provided is.
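A percentage of this kind is the share of purchases containing one item that also contain another. A minimal sketch of that calculation follows; the function name `rule_confidence` and the list of transactions are invented for illustration and do not reproduce the supermarket figures quoted above.

```python
def rule_confidence(transactions, antecedent, consequent):
    """Share of transactions containing `antecedent` that also contain `consequent`."""
    with_antecedent = [t for t in transactions if antecedent in t]
    if not with_antecedent:
        raise ValueError("no transaction contains the antecedent")
    both = sum(1 for t in with_antecedent if consequent in t)
    return both / len(with_antecedent)


# Hypothetical shopping baskets, one set of items per purchase.
transactions = [
    {"corn chips", "cola"},
    {"corn chips", "cola", "bread"},
    {"corn chips"},
    {"cola"},
    {"corn chips", "cola", "milk"},
    {"bread", "milk"},
]

confidence = rule_confidence(transactions, "corn chips", "cola")
print(f"{confidence:.0%}")  # 3 of the 4 baskets with corn chips also contain cola
```

Running the same calculation on baskets bought with and without the discount gives the two percentages that a manager would compare.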

If there is a chain of events related in time, then we talk about a sequence. For example, after buying a house, in 45% of cases, a new kitchen stove is purchased within a month, and within two weeks, 60% of new residents acquire a refrigerator.

Classification identifies the features that characterize the group to which a particular object belongs. This is done by analyzing objects that have already been classified and formulating a set of rules.

Clustering differs from classification in that the groups themselves are not predefined. Using clustering, various homogeneous groups of data are identified.

All forecasting systems are based on historical information stored in the form of time series. If patterns can be constructed that adequately reflect the dynamics of the target indicators, they can probably be used to predict the future behavior of the system.

Multivariate statistical methods can be divided into relationship analysis methods and classification analysis (Fig. 2.5).

Fig. 2.5 – Classification of multivariate statistical methods

The idea of studying the quantitative aspects of objects and phenomena arose long ago, as soon as people developed basic skills in working with information. However, the term “statistics”, which has survived to our time, was borrowed much later from Latin and comes from the word “status”, which means “a certain state of things”. “Status” was also used in the sense of “political state” and entered almost all European languages in this sense: the English “state”, the German “Staat”, the Italian “stato” and its derivative “statista”, an expert on the state.

The word “statistics” received widespread use in the 18th century and was used to mean “state science.” Statistics is a branch of practical activity aimed at collecting, processing, analyzing and providing for public use data about phenomena and processes of social life.

Analysis is a method of scientific study of an object by considering its individual aspects and components.

Economic-statistical analysis is the development of a methodology, based on the widespread use of traditional statistical and mathematical-statistical methods, for verifying that the phenomena and processes under study are adequately reflected.

Stages of statistical research. Statistical research takes place in three stages:

  • 1) statistical observation;
  • 2) summary of the data obtained;
  • 3) statistical analysis.

At the first stage, primary statistical data is collected using the mass observation method.

At the second stage of statistical research, the collected data undergo primary processing, summary and grouping. The grouping method makes it possible to identify homogeneous populations and divide them into groups and subgroups. The summary produces totals for the population as a whole and for its individual groups and subgroups.

The grouping and summary results are presented in the form of statistical tables. The main content of this stage is the transition from the characteristics of each observation unit to the summary characteristics of the population as a whole or its groups.
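The transition from individual observation units to group and population totals can be sketched as follows. The function name `summarize_by_group`, the field names and the enterprise figures are hypothetical and chosen only to illustrate the idea.

```python
from collections import defaultdict


def summarize_by_group(units, group_key, value_key):
    """Group observation units by one characteristic and summarize another."""
    groups = defaultdict(list)
    for unit in units:
        groups[unit[group_key]].append(unit[value_key])

    summary = {}
    for name, values in groups.items():
        total = sum(values)
        summary[name] = {
            "count": len(values),          # number of units in the group
            "total": total,                # group total
            "mean": total / len(values),   # group average
        }
    return summary


# Hypothetical observation units: enterprises with their size class and output.
enterprises = [
    {"size": "small", "output": 10},
    {"size": "small", "output": 14},
    {"size": "large", "output": 100},
    {"size": "large", "output": 120},
    {"size": "large", "output": 140},
]

summary = summarize_by_group(enterprises, "size", "output")
overall_total = sum(group["total"] for group in summary.values())

for name, row in summary.items():
    print(name, row)
print("population total:", overall_total)
```

Each row of the printed result corresponds to one row of a statistical table, and the last line gives the total for the population as a whole.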

At the third stage, the obtained summary data is analyzed by the method of generalizing indicators (absolute, relative and average values, variation indicators, index systems, methods of mathematical statistics, tabular method, graphical method, etc.).
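A few of the generalizing indicators named above can be computed with the Python standard library. The sketch below assumes that the data cover the whole population, so the population standard deviation (`statistics.pstdev`) is used; the function name `variation_summary` and the sample values are hypothetical.

```python
import statistics


def variation_summary(values):
    """Absolute, average and variation indicators for one characteristic."""
    mean = statistics.mean(values)
    std_dev = statistics.pstdev(values)  # assumption: values cover the whole population
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a zero mean")
    return {
        "total": sum(values),                       # absolute value
        "mean": mean,                               # average value
        "std_dev": std_dev,                         # variation indicator
        "coefficient_of_variation": std_dev / mean, # relative variation
    }


values = [2, 4, 4, 4, 5, 5, 7, 9]
indicators = variation_summary(values)
print(indicators)
```

The coefficient of variation expresses the spread relative to the mean, which makes the variation of characteristics measured in different units comparable.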

Basics of statistical analysis:

  • 1) establishing the facts and assessing them;
  • 2) identifying the characteristic features and causes of the phenomenon;
  • 3) comparison of the phenomenon with normative, planned and other phenomena that are taken as the basis for comparison;
  • 4) formulation of conclusions, forecasts, assumptions and hypotheses;
  • 5) statistical testing of the put forward assumptions (hypotheses).

Analysis and generalization of statistical data is the final stage of statistical research. Its ultimate goal is to obtain theoretical and practical conclusions about the trends and patterns of the socio-economic phenomena and processes being studied. The objectives of statistical analysis are to determine and assess the specific features of the phenomena and processes being studied, and to study their structure, relationships and patterns of development.

Statistical analysis of data is inseparable from theoretical, qualitative analysis of the essence of the phenomena under study, and it relies on the corresponding quantitative tools for studying their structure, connections and dynamics.

Statistical analysis is the study of the characteristic features of the structure, relationships of phenomena, trends, patterns of development of socio-economic phenomena, for which specific economic-statistical and mathematical-statistical methods are used. Statistical analysis concludes with the interpretation of the results obtained.

In statistical analysis, characteristics are divided according to the nature of their influence on each other:

  • 1. Result attribute: the characteristic analyzed in the study. Its individual values in the elements of the population are influenced by one or more other characteristics. In other words, the result attribute is treated as a consequence of the action of other factors;
  • 2. Factor attribute: a characteristic that influences the characteristic under study (the result attribute). The relationship between the factor attribute and the result attribute can be determined quantitatively. Synonyms for this term in statistics are “factor characteristic” and “factor”. A factor attribute must be distinguished from a weight attribute. A weight attribute is a characteristic that must be taken into account in calculations but does not affect the characteristic being studied. A factor attribute can serve as a weight attribute, i.e., be taken into account in calculations, but not every weight attribute is a factor attribute. For example, when studying the relationship between exam preparation time and exam score in a group of students, a third characteristic should also be taken into account: the number of students who received each score. This characteristic does not affect the result, but it enters the analytical calculations. Such a characteristic is a weight attribute, not a factor attribute.
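The role of a weight attribute in the exam example can be shown with a weighted average. In the sketch below the number of students in each row acts only as a weight: it enters the calculation but is not treated as a cause of the score. The function name `weighted_mean` and all figures are hypothetical.

```python
def weighted_mean(values, weights):
    """Average of `values` in which each value counts `weights[i]` times."""
    if len(values) != len(weights):
        raise ValueError("values and weights must have the same length")
    total_weight = sum(weights)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total_weight


# Hypothetical grouped data: (preparation hours, exam score, number of students).
rows = [
    (2, 3, 5),
    (5, 4, 10),
    (8, 5, 5),
]
hours = [r[0] for r in rows]     # factor attribute
scores = [r[1] for r in rows]    # result attribute
students = [r[2] for r in rows]  # weight attribute

mean_hours = weighted_mean(hours, students)
mean_score = weighted_mean(scores, students)
print(mean_hours, mean_score)
```

Without the weights, every row would count equally, although the middle row represents twice as many students as either of the others.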

Before starting the analysis, it is necessary to check whether the conditions are met to ensure its reliability and correctness:

  • - Reliability of primary numerical data;
  • - Completeness of coverage of the population being studied;
  • - Comparability of indicators (by accounting units, territory, calculation method).

The main concepts of statistical analysis are:

  • 1. Hypothesis;
  • 2. Decision function and decision rule;
  • 3. Sample from the general population;
  • 4. Estimation of the characteristics of the general population;
  • 5. Confidence interval;
  • 6. Trend;
  • 7. Statistical relationship.

Analysis is the final stage of statistical research, the essence of which is to identify relationships and patterns of the phenomenon being studied, formulate conclusions and proposals.

Any statistical study is based on three interrelated stages of work:

1) statistical observation;

2) summary and grouping of observation data;

3) scientific processing and analysis of the summary results. Each subsequent stage of a statistical study can be carried out only after the preceding stages have been completed.

Statistical observation is the first stage of statistical research.

Statistical observation is the systematic, scientifically organized collection of information about a particular set of social and, in particular, economic phenomena or processes.

Statistical observations are very diverse and differ in the nature of the phenomena being studied, the form of organization, the time of observation, and the completeness of coverage of the phenomena being studied. For this reason, statistical observations are classified according to several characteristics.

1. According to the form of organization, statistical observations are divided into reporting and specially organized statistical observations.

Reporting is the main organizational form of statistical observation. It consists of collecting information from enterprises, institutions and organizations about various aspects of their activities on special forms called reports. Reporting is mandatory. Depending on the length of the period it covers, reporting is divided into basic and current.

Basic reporting, also called annual reporting, contains the widest range of indicators and covers all aspects of the enterprise’s activities.

Current reporting is submitted throughout the year for periods of varying duration.

However, some data are fundamentally impossible to obtain from reporting, and some are inappropriate to include in it. Specially organized statistical observations, that is, various types of surveys and censuses, are used to obtain these two types of data.

Statistical surveys are specially organized observations in which the studied set of phenomena is observed over a certain period of time.

A census is a form of specially organized statistical observation in which the studied set of phenomena is observed as of a certain date (at a certain moment).

2. Based on time, all statistical observations are divided into continuous and discontinuous.

Continuous (current) statistical observation is observation that is carried out without interruption over time. With this type of observation, individual phenomena, facts and events are recorded as they occur.


Intermittent (discontinuous) statistical observation is observation in which phenomena, facts and events are recorded not continuously but at intervals of equal or unequal duration. There are two types of discontinuous observation: periodic and one-time. Periodic observation is carried out at intervals of equal duration. One-time observation is carried out at intervals of unequal duration or only once.

3. Based on the completeness of coverage of the studied mass phenomena, facts and events, statistical observations are divided into complete and incomplete (partial).

Complete observation aims to take into account all phenomena, facts and events, without exception, that form the population under study.

Partial observation aims to take into account only a certain part of the phenomena, facts, events that form the population under study.