Data Science



Dr. Rajeswari Purushothaman

Publication Date: May 2021

No. of Pages: 282
Price: ₹ 300/-



Publisher: Magestic Technology Solutions (P) Ltd.
Place of Publication: Chennai, India.


Companies use data science to discover patterns in large quantities of structured and unstructured big data. Increased efficiency, better cost management, and the identification of new market opportunities help businesses gain an edge in the marketplace. Getting a suggestion from a personal assistant such as Alexa or Siri, operating a self-driving vehicle, using a search engine that returns relevant results, and interacting with a customer-support chatbot are all real-world examples of data science in action. As a discipline, data science involves mining massive datasets of raw data, both structured and unstructured, to discover patterns and derive meaningful knowledge from them. Because data science is a multidisciplinary subject, its underlying principles include statistics, inference, computer science, predictive analytics, machine learning algorithm development, and new tools for extracting insights from large amounts of data. The life cycle of data science is a good starting point for understanding the field and for improving data science project management. The first step of the data science pipeline is data collection, which includes acquiring data, extracting it where necessary, and entering it into the system. The next step is data management, which involves data warehousing, data cleaning, data processing, data staging, and data architecture, among other things. Data processing follows and is considered one of the foundations of data science. It is in data exploration and processing that data scientists are distinguished from data engineers. At this stage, data mining, data classification and clustering, data modelling, and summarising the insights gained from the data are all performed.
Data analysis follows and is an equally important step. At this stage, data scientists carry out exploratory and confirmatory analysis, regression analysis, forecasting, qualitative analysis, and text mining. When data science is done correctly, this step demonstrates why there is no cookie-cutter approach to it. In the final step, the data scientist communicates the findings: data visualisation, data reporting, and the use of different business intelligence tools all help companies, policymakers, and others make more informed decisions. Data preparation and analysis are the essential data science skills, yet data preparation alone takes up 60 to 70% of a data scientist’s work on average, according to industry estimates. Data is seldom produced in a format that is correct, organised, and free of noise. In the preparation phase, the data is converted and made ready for later use. This phase includes transforming and sampling the data, verifying both characteristics and observations, and applying statistical methods to remove noise. It also reveals whether the various characteristics of the data set are independent of one another and whether the data set has any missing values, among other things. By 2023, there will be about 40 zettabytes of data (40 trillion gigabytes) on the planet, and the quantity of data in existence is increasing at an exponential rate. According to IBM and SINTEF, about 90 percent of the data in existence at any given point in time was produced in the preceding two years. It is estimated that internet users produce about 2.5 quintillion bytes of data every day. Estimates for 2025 put data generation at approximately 146,880 gigabytes per person per day and 165 zettabytes per year.
This implies that a tremendous amount of work remains to be done in data science, with a great deal more to be discovered. Simply put, basic data analysis may be enough to analyse information from a single source or a small quantity of data. To make sense of big data and data from many sources in a meaningful way, however, data science techniques are essential. Particular applications of data science in business illustrate this and serve as an excellent introduction to the field. This book focuses on the fundamentals of data science, which will help aspiring science and engineering graduates understand the basics, enrich their knowledge, and prepare themselves for the next level of development.
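
As an illustration of the data preparation phase described in the abstract (checking for missing values and testing whether characteristics are independent of one another), the following is a minimal Python sketch, not a method taken from the book. The toy records, the function names, the choice of median imputation, and the use of Pearson correlation as a stand-in for an independence check are all assumptions made for illustration; Pearson correlation only detects linear dependence.

```python
from statistics import mean, median

# Hypothetical toy records; None marks a missing value.
records = [
    {"age": 25, "income": 30000},
    {"age": 32, "income": None},
    {"age": 47, "income": 52000},
    {"age": None, "income": 41000},
    {"age": 51, "income": 58000},
]


def count_missing(rows):
    """Count missing (None) values for each characteristic."""
    counts = {}
    for row in rows:
        for key, value in row.items():
            counts[key] = counts.get(key, 0) + (value is None)
    return counts


def impute_median(rows):
    """Return new rows with each missing value replaced by its column median.

    Median imputation is an assumed strategy, chosen here for simplicity.
    """
    medians = {
        key: median(r[key] for r in rows if r[key] is not None)
        for key in rows[0]
    }
    return [
        {key: medians[key] if value is None else value
         for key, value in row.items()}
        for row in rows
    ]


def pearson(xs, ys):
    """Pearson correlation; values near 0 suggest no linear dependence."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


missing = count_missing(records)
clean = impute_median(records)
r = pearson(
    [row["age"] for row in clean],
    [row["income"] for row in clean],
)
print("missing values per characteristic:", missing)
print("age/income correlation:", round(r, 3))
```

In practice a library such as pandas would usually handle these steps; the standard-library version above is only meant to make the individual checks explicit.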

Keywords: Data Science, Data scientists, Data Processing, Data Analysis




Rajeswari Purushothaman (2021). Data Science (1st ed., Vol. 1). Magestic Technology Solutions (P) Ltd. ISBN: 978-81-947070-1-1. DOI:
