Data Science is a field of activity that involves collecting, processing, and analyzing data, both structured and unstructured, and not necessarily large. It draws on methods of mathematical and statistical analysis as well as software tools. Data Science also works with Big Data, but its main goal is to find something valuable in the data and apply it to specific tasks.

Supercomputers are used to process large amounts of data in real time: their power and computing capabilities far exceed those of conventional machines.

How Big Data works: how is big data collected and stored?

Big data is needed to analyze all the relevant factors and make the right decision. With the help of Big Data, simulation models are built to test a particular solution, idea, or product.

The main sources of big data, according to the book Taming Big Data with Apache Spark (Frank Kane, 2020), are:

  • Internet of Things (IoT) and devices connected to it;
  • social networks, blogs and media;
  • company data: transactions, orders for goods and services, taxi rides and car sharing, customer profiles;
  • instrument readings: weather stations, sensors measuring the composition of air and water, satellite data;
  • city and state statistics: data on population movement, births, and deaths;
  • medical data: test results, diagnoses, diagnostic images.

Since 2007, the FBI and the CIA have had PRISM at their disposal, one of the most advanced surveillance programs, which collects personal data on users of social networks and of services from Microsoft, Google, Apple, and Yahoo, along with telephone records.

Modern computing systems provide near-instant access to massive amounts of data, which are stored in dedicated data centers with high-performance servers.

In addition to traditional physical servers, big data is kept in cloud storage, in data lakes (repositories that hold large volumes of raw, unstructured data from many sources in their native formats), and in Hadoop, a framework comprising a set of utilities for developing and running distributed computing programs. Working with Big Data also relies on advanced methods of data integration, data management, and preparation of data for analytics.
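The "schema on read" idea behind a data lake can be sketched in a few lines: raw files sit in storage in their native formats, and structure is applied only when the data is read. The file names and payloads below are invented for illustration.

```python
import csv
import io
import json

# A data lake stores raw files as-is; structure is applied on read.
# The raw strings here stand in for files in the lake (illustrative).
RAW_FILES = {
    "orders.json": '[{"order_id": 1, "amount": 250.0}]',
    "rides.csv": "ride_id,fare\n17,12.5\n18,9.0\n",
}

def read_record_batch(name, payload):
    """Parse one raw file into a list of dicts, based on its format."""
    if name.endswith(".json"):
        return json.loads(payload)
    if name.endswith(".csv"):
        return list(csv.DictReader(io.StringIO(payload)))
    raise ValueError(f"unsupported format: {name}")

records = {name: read_record_batch(name, data) for name, data in RAW_FILES.items()}
print(records["rides.csv"][0]["fare"])  # CSV fields stay strings until cast
```

The key design point is that nothing is cleaned or normalized at ingest time; each analysis decides for itself how to interpret the raw bytes.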

Big Data Analytics - How is Big Data Analyzed?

With high-performance technologies such as grid computing or in-memory analytics, companies can analyze any amount of big data. Sometimes the data is structured first, with only the records needed for the analysis selected. Increasingly, big data is used for advanced analytics tasks, including artificial intelligence.
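The step of selecting only the records and fields needed for analysis can be sketched as a lazy filter over a raw event stream; processed this way, the full dataset never has to fit in memory at once. The event layout below is invented.

```python
# Sketch: keep only purchase events and only the columns an analysis
# needs, from a raw event stream (field names are illustrative).
raw_events = [
    {"user": "a1", "action": "buy", "amount": 30.0, "ua": "Mozilla/5.0"},
    {"user": "b2", "action": "view", "amount": 0.0, "ua": "Mozilla/5.0"},
    {"user": "a1", "action": "buy", "amount": 12.0, "ua": "curl/8.0"},
]

def select_purchases(events):
    """Yield only purchase events, trimmed to the analysis columns."""
    for event in events:
        if event["action"] == "buy":
            yield {"user": event["user"], "amount": event["amount"]}

purchases = list(select_purchases(raw_events))
total = sum(p["amount"] for p in purchases)
print(total)  # 42.0
```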

There are four main methods of Big Data analysis:

  • Descriptive analytics is the most common. It answers the question "What happened?" by analyzing both real-time and historical data. The main goal is to uncover the causes and patterns behind success or failure in a given area and use them to build more effective models. Descriptive analytics relies on basic math functions; a typical example is case studies or the web statistics a company receives through Google Analytics.
  • Predictive analytics helps forecast the most likely course of events based on available data. It uses ready-made templates built on objects or phenomena with a similar set of characteristics. With predictive analytics you can, for example, anticipate a collapse or shift in stock-market prices, or assess a potential borrower's ability to repay a loan.
  • Prescriptive analytics is the next level beyond predictive analytics. With Big Data and modern technologies, it identifies problem points in a business or other activity and calculates the scenarios in which they can be avoided in the future. The Aurora Health Care network of medical centers saves $6 million annually thanks to prescriptive analytics, which helped it cut hospital readmissions by 10%.
  • Diagnostic analytics uses data to analyze why something happened. It helps identify anomalies and chance correlations between events and actions.
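The first method above, descriptive analytics with basic math functions, can be sketched with the Python standard library: summarize historical data and flag values that stand out. The daily visit counts are made up for the example.

```python
import statistics

# Descriptive analytics: answer "what happened?" with basic math
# functions over historical data (daily site visits, invented).
daily_visits = [120, 135, 128, 410, 131, 125, 140]

mean_visits = statistics.mean(daily_visits)
median_visits = statistics.median(daily_visits)

# A value far from the median relative to the overall spread flags an
# anomaly, e.g. a traffic spike worth a closer diagnostic look.
spread = statistics.pstdev(daily_visits)
spikes = [v for v in daily_visits if abs(v - median_visits) > 2 * spread]
print(median_visits, spikes)  # 131 [410]
```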

For example, Amazon analyzes sales and gross margin data for various products to find out why they generated less revenue than expected.
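Predictive analytics, described above, can also be sketched in miniature: fit a least-squares trend line to past monthly sales and extrapolate one month ahead. The numbers are invented, and real forecasting uses far richer models than a straight line.

```python
# Minimal predictive-analytics sketch: least-squares trend forecast.
# Monthly sales figures are illustrative, not real data.
months = [1, 2, 3, 4, 5]
sales = [100.0, 110.0, 121.0, 128.0, 141.0]

n = len(months)
mean_x = sum(months) / n
mean_y = sum(sales) / n
slope = (
    sum((x - mean_x) * (y - mean_y) for x, y in zip(months, sales))
    / sum((x - mean_x) ** 2 for x in months)
)
intercept = mean_y - slope * mean_x

forecast = slope * 6 + intercept  # predicted sales for month 6
print(round(forecast, 1))  # 150.0
```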

Data is processed and analyzed using various tools and technologies.

  • special software: NoSQL databases, MapReduce, Hadoop, the R language;
  • data mining: extraction of previously unknown patterns from data arrays using a wide range of techniques;
  • AI and neural networks: used to build models based on Big Data, including text and image recognition. For example, lottery operator Stoloto has made big data the backbone of its Data-driven Organization strategy: using Big Data and artificial intelligence, the company analyzes customer experience and offers personalized products and services;
  • analytic data visualization: animated models or graphs created from big data.
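The MapReduce model listed above can be shown in miniature with the classic word-count example: map each document to partial counts, then reduce the partial results into one total. On Hadoop or Spark the same two phases run in parallel across many machines.

```python
from collections import Counter
from functools import reduce

# MapReduce in miniature: word count over a tiny corpus (invented).
documents = [
    "big data needs big storage",
    "data science finds value in data",
]

def map_phase(doc):
    """Map step: one document -> its partial word counts."""
    return Counter(doc.split())

def reduce_phase(a, b):
    """Reduce step: merge two partial counts (Counter addition)."""
    return a + b

word_counts = reduce(reduce_phase, map(map_phase, documents), Counter())
print(word_counts["data"])  # 3
```

In a real cluster the map outputs are shuffled so that all counts for the same word land on the same reducer; here the single `reduce` call plays that role.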