Data Quality
199 pages
English

Description

Good data is a source of myriad opportunities, while bad data is a tremendous burden. Companies that manage their data effectively are able to achieve a competitive advantage in the marketplace, while bad data, like cancer, can weaken and kill an organization.
In this comprehensive book, Rupa Mahanti provides guidance on the different aspects of data quality with the aim of improving it. Specifically, the book addresses:
Causes of bad data quality, bad data quality impacts, and importance of data quality to justify the case for data quality
Butterfly effect of data quality
A detailed description of data quality dimensions and their measurement
Data quality strategy approach
Six Sigma - DMAIC approach to data quality
Data quality management techniques
Data quality in relation to data initiatives like data migration, MDM, data governance, etc.
Data quality myths, challenges, and critical success factors
Students, academicians, professionals, and researchers can all use the content in this book to further their knowledge and get guidance on their own specific projects. It balances technical details (for example, SQL statements, relational database components, and the measurement of data quality dimensions) and higher-level qualitative discussions (cost of data quality, data quality strategy, data quality maturity, the case made for data quality, and so on) with case studies, illustrations, and real-world examples throughout.
About the Author
Rupa Mahanti, Ph.D., is a Business and Information Management consultant and has worked in different solution environments and industry sectors in the United States, United Kingdom, India, and Australia. She helps clients with activities such as business process mapping, information management, data quality, and strategy. With more than a decade and a half of work experience (academic, industry, and research), Rupa has guided a doctoral dissertation and published a large number of research articles. She is an associate editor with the journal Software Quality Professional and a reviewer for several international journals.
"This is not the kind of book that you'll read one time and be done with. So scan it quickly the first time through to get an idea of its breadth. Then dig in on one topic of special importance to your work. Finally, use it as a reference to guide your next steps, learn details, and broaden your perspective."
from the foreword by Thomas C. Redman, Ph.D., the Data Doc
"Dr. Mahanti provides a very detailed and thorough coverage of all aspects of data quality management that would suit all ranges of expertise from a beginner to an advanced practitioner. With plenty of examples, diagrams, etc., the book is easy to follow and will deepen your knowledge in the data domain. I will certainly keep this handy as my go-to reference. I can't imagine the level of effort and passion that Dr. Mahanti has put into this book that captures so much knowledge and experience for the benefit of the reader. I would highly recommend this book for its comprehensiveness, depth, and detail. A must-have for a data practitioner at any level."
Clint D'Souza, CEO and Director, CDZM Consulting

Information

Publication date: March 18, 2019
EAN13: 9781951058685
Language: English
File size: 1 MB

Excerpt

Data Quality



Also available from ASQ Quality Press:
Quality Experience Telemetry: How to Effectively Use Telemetry for Improved Customer Success
Alka Jarvis, Luis Morales, and Johnson Jose
Linear Regression Analysis with JMP and R
Rachel T. Silvestrini and Sarah E. Burke
Navigating the Minefield: A Practical KM Companion
Patricia Lee Eng and Paul J. Corney
The Certified Software Quality Engineer Handbook, Second Edition
Linda Westfall
Introduction to 8D Problem Solving: Including Practical Applications and Examples
Ali Zarghami and Don Benbow
The Quality Toolbox, Second Edition
Nancy R. Tague
Root Cause Analysis: Simplified Tools and Techniques, Second Edition
Bjørn Andersen and Tom Fagerhaug
The Certified Six Sigma Green Belt Handbook, Second Edition
Roderick A. Munro, Govindarajan Ramu, and Daniel J. Zrymiak
The Certified Manager of Quality/Organizational Excellence Handbook, Fourth Edition
Russell T. Westcott, editor
The Certified Six Sigma Black Belt Handbook, Third Edition
T. M. Kubiak and Donald W. Benbow
The ASQ Auditing Handbook, Fourth Edition
J.P. Russell, editor
The ASQ Quality Improvement Pocket Guide: Basic History, Concepts, Tools, and Relationships
Grace L. Duffy, editor
To request a complimentary catalog of ASQ Quality Press publications, call 800-248-1946, or visit our website at http://www.asq.org/quality-press.


Data Quality
Dimensions, Measurement, Strategy, Management, and Governance
Dr. Rupa Mahanti
ASQ Quality Press
Milwaukee, Wisconsin



American Society for Quality, Quality Press, Milwaukee 53203
© 2018 by ASQ
All rights reserved. Published 2018
Library of Congress Cataloging-in-Publication Data
Names: Mahanti, Rupa, author.
Title: Data quality : dimensions, measurement, strategy, management, and
governance / Dr. Rupa Mahanti.
Description: Milwaukee, Wisconsin : ASQ Quality Press, [2019] | Includes
bibliographical references and index.
Identifiers: LCCN 2018050766 | ISBN 9780873899772 (hard cover : alk. paper)
Subjects: LCSH: Database management—Quality control.
Classification: LCC QA76.9.D3 M2848 2019 | DDC 005.74—dc23
LC record available at https://lccn.loc.gov/2018050766
ISBN: 978-0-87389-977-2
No part of this book may be reproduced in any form or by any means, electronic, mechanical, photocopying, recording, or otherwise, without the prior written permission of the publisher.
Publisher: Seiche Sanders
Sr. Creative Services Specialist: Randy L. Benson
ASQ Mission: The American Society for Quality advances individual, organizational, and community excellence worldwide through learning, quality improvement, and knowledge exchange.
Attention Bookstores, Wholesalers, Schools, and Corporations: ASQ Quality Press books, video, audio, and software are available at quantity discounts with bulk purchases for business, educational, or instructional use. For information, please contact ASQ Quality Press at 800-248-1946, or write to ASQ Quality Press, P.O. Box 3005, Milwaukee, WI 53201-3005.
To place orders or to request ASQ membership information, call 800-248-1946. Visit our website at http://www.asq.org/quality-press.



List of Figures and Tables
Figure 1.1 Categories of data.
Figure 1.2 Metadata categories.
Table 1.1 Characteristics of data that make them fit for use.
Figure 1.3 The data life cycle.
Figure 1.4 Causes of bad data quality.
Figure 1.5 Data migration/conversion process.
Figure 1.6 Data integration process.
Figure 1.7 Bad data quality impacts.
Figure 1.8 Prevention cost:Correction cost:Failure cost.
Figure 1.9 Butterfly effect on data quality.
Figure 2.1a Layout of a relational table.
Figure 2.1b Table containing customer data.
Figure 2.2 Customer and order tables.
Figure 2.3a Data model—basic styles.
Figure 2.3b Conceptual, logical, and physical versions of a single data model.
Table 2.1 Comparison of conceptual, logical, and physical model.
Figure 2.4 Possible sources of data for data warehousing.
Figure 2.5 Star schema design.
Figure 2.6 Star schema example.
Figure 2.7 Snowflake schema design.
Figure 2.8 Snowflake schema example.
Figure 2.9 Data warehouse structure.
Figure 2.10 Data hierarchy in a database.
Table 2.2 Common terminologies.
Figure 3.1 Data hierarchy and data quality metrics.
Figure 3.2 Commonly cited data quality dimensions.
Figure 3.3 Data quality dimensions.
Figure 3.4 Customer contact data set completeness.
Figure 3.5 Incompleteness illustrated through a data set containing product IDs and product names.
Figure 3.6 Residential address data set having incomplete ZIP code data.
Figure 3.7 Customer data—applicable and inapplicable attributes.
Figure 3.8 Different representations of an individual’s name.
Figure 3.9 Name format.
Table 3.1 Valid and invalid values for employee ID.
Figure 3.10 Standards/formats defined for the customer data set in Figure 3.11.
Figure 3.11 Customer data set—conformity as defined in Figure 3.10.
Figure 3.12 Customer data set—uniqueness.
Figure 3.13 Employee data set to illustrate uniqueness.
Figure 3.14 Data set in database DB1 compared to data set in database DB2.
Table 3.2 Individual customer name formatting guidelines for databases DB1, DB2, DB3, and DB4.
Figure 3.15 Customer name data set from database DB1.
Figure 3.16 Customer name data set from database DB2.
Figure 3.17 Customer name data set from database DB3.
Figure 3.18 Customer name data set from database DB4.
Figure 3.19 Name data set to illustrate intra-record consistency.
Figure 3.20 Full Name field values and values after concatenating First Name, Middle Name, and Last Name.
Figure 3.21 Name data set as per January 2, 2016.
Figure 3.22 Name data set as per October 15, 2016.
Figure 3.23 Customer table and order table relationships and integrity.
Figure 3.24 Employee data set illustrating data integrity.
Figure 3.25 Name granularity.
Table 3.3 Coarse granularity versus fine granularity for name.
Figure 3.26 Address granularity.
Table 3.4 Postal address at different levels of granularity.
Figure 3.27 Employee data set with experience in years recorded values having less precision.
Figure 3.28 Employee data set with experience in years recorded values having greater precision.
Figure 3.29 Address data set in database DB1 and database DB2.
Figure 3.30 Organizational data flow.
Table 3.5 Data quality dimensions—summary table.
Table 4.1 Data quality dimensions and measurement.
Table 4.2 Statistics for annual income column in the customer database.
Table 4.3 Employee data set for Example 4.1.
Table 4.4 Social security number occurrences for Example 4.1.
Figure 4.1 Customer data set for Example 4.2.
Figure 4.2 Business rules for date of birth completeness for Example 4.2.
Table 4.5 “Customer type” counts for Example 4.2.
Figure 4.3 Employee data set—incomplete records for Example 4.3.
Table 4.6 Employee reference data set.
Table 4.7 Employee data set showing duplication of social security number (highlighted in the same shade) for Example 4.5.
Table 4.8 Number of occurrences of employee ID values for Example 4.5.
Table 4.9 Number of occurrences of social security number values for Example 4.5.
Table 4.10 Employee reference data set for Example 4.6.
Table 4.11 Employee data set for Example 4.6.
Figure 4.4 Metadata for data elements Employee ID, Employee Name, and Social Security Number for Example 4.7.
Table 4.12 Employee data set for Example 4.7.
Table 4.13 Valid and invalid records for Example 4.8.
Table 4.14 Reference employee data set for Example 4.9.
Table 4.15 Employee data set for Example 4.9.
Table 4.16 Employee reference data set for Example 4.10.
Table 4.17 Accurate versus inaccurate records for Example 4.10.
Table 4.18 Sample customer data set for Example 4.11.
Figure 4.5 Customer data—data definitions for Example 4.11.
Table 4.19 Title and gender mappings for Example 4.11.
Table 4.20 Title and gender—inconsistent and consistent values for Example 4.11.
Table 4.21 Consistent and inconsistent values (date of birth and customer start date combination) for Example 4.11.
Table 4.22 Consistent and inconsistent values (customer start date and customer end date combination) for Example 4.11.
Table 4.23 Consistent and inconsistent values (date of birth and customer end date combination) for Example 4.11.
Table 4.24 Consistent and inconsistent values (full name, first name, middle name, and last name data element combination) for Example 4.11.
Table 4.25 Consistency results for different data element combinations for Example 4.11.
Table 4.26 Record level consistency/inconsistency for Example 4.12.
Table 4.27a Customer data table for Example 4.13.
Table 4.27b Claim data table for Example 4.13.
Table 4.28a Customer data and claim data inconsistency/consistency for Example 4.13.
Table 4.28b Customer data and claim data inconsistency/consistency for Example 4.13.
Table 4.29 Customer sample data set for Example 4.17.
Table 4.30 Order sample data set for Example 4.17.
Table 4.31 Customer–Order relationship–integrity for Example 4.17.
Table 4.32 Address data set for Example 4.18.
Table 4.33 Customers who have lived in multiple addresses for Example 4.18.
Table 4.34 Difference in time between old address and current address for Example 4.18.
Figure 4.6 Data flow through systems where data are captured after the occurrence of the event.
Figure 4.7 Data flow through systems where data are captured at
