Market Data Analysis Using JMP
738 pages
English

Description

With the powerful interactive and visual functionality of JMP, you can dynamically analyze market data to transform it into actionable and useful
information with clear, concise, and insightful reports and displays. Market Data Analysis Using JMP is a unique example-driven book because it has a specific application focus: market data analysis. A working knowledge of JMP will help you turn your market data into vital knowledge that will help you
succeed in a highly competitive, fast-moving, and dynamic business world.


This book can be used as a stand-alone resource for working professionals, or as a supplement to a business school course in market data research. Anyone
who works with market data will benefit from reading and studying this book, then using JMP to apply the dynamic analytical concepts to their market data.


After reading this book, you will be able to quickly and effortlessly use JMP to:


  • prepare market data for analysis
  • use and interpret sophisticated statistical methods
  • build choice models
  • estimate regression models to turn data into useful and actionable information

Market Data Analysis Using JMP will teach you how to use dynamic graphics to illustrate your market data analysis and explore the vast possibilities that your data can offer!

Information

Publication date: February 5, 2018
EAN13 9781629604855
Language: English
File size: 7 MB

Excerpt

The correct bibliographic citation for this manual is as follows: SAS Institute Inc. 2016. Market Data Analysis Using JMP®. Cary, NC: SAS Institute Inc.
Market Data Analysis Using JMP®
Copyright © 2016, SAS Institute Inc., Cary, NC, USA
ISBN 978-1-62960-408-4 (Hard copy)
ISBN 978-1-62960-485-5 (Epub)
ISBN 978-1-62960-486-2 (Mobi)
ISBN 978-1-62960-487-9 (PDF)
All Rights Reserved. Produced in the United States of America.
For a hard copy book: No part of this publication may be reproduced, stored in a retrieval system, or transmitted, in any form or by any means, electronic, mechanical, photocopying, or otherwise, without the prior written permission of the publisher, SAS Institute Inc.
For a web download or e-book: Your use of this publication shall be governed by the terms established by the vendor at the time you acquire this publication.
The scanning, uploading, and distribution of this book via the Internet or any other means without the permission of the publisher is illegal and punishable by law. Please purchase only authorized electronic editions and do not participate in or encourage electronic piracy of copyrighted materials. Your support of others' rights is appreciated.
U.S. Government License Rights; Restricted Rights: The Software and its documentation is commercial computer software developed at private expense and is provided with RESTRICTED RIGHTS to the United States Government. Use, duplication, or disclosure of the Software by the United States Government is subject to the license terms of this Agreement pursuant to, as applicable, FAR 12.212, DFAR 227.7202-1(a), DFAR 227.7202-3(a), and DFAR 227.7202-4, and, to the extent required under U.S. federal law, the minimum restricted rights as set out in FAR 52.227-19 (DEC 2007). If FAR 52.227-19 is applicable, this provision serves as notice under clause (c) thereof and no other notice is required to be affixed to the Software or documentation. The Government’s rights in Software and documentation shall be only those set forth in this Agreement.
SAS Institute Inc., SAS Campus Drive, Cary, NC 27513-2414
February 2018

SAS® and all other SAS Institute Inc. product or service names are registered trademarks or trademarks of SAS Institute Inc. in the USA and other countries. ® indicates USA registration.

Other brand and product names are trademarks of their respective companies.
Last updated: January 4, 2018
About This Book

Purpose
Businesses are drowning in data. Market data, whether from surveys or a data warehouse or data mart, must be analyzed dynamically to reveal patterns and relationships. Simple static tables and pie and bar charts don’t suffice. To convert market data into actionable, data-driven decisions, you need software that offers three key features: powerful data handling and scripting tools; dynamic graphing and tabulation capabilities to drill down and link data; and market-oriented statistical methods, such as choice models.
JMP® gives you access to the right data-handling tools, a rich array of analytical methods, and dynamic views of your data so that you can drill down and link the data across several displays for comprehensive views. Regardless of your level of experience, it offers tools and analysis reports that are accessible to novices and sophisticates alike.
The goal of this book is to introduce JMP to the market research community and to explain how to use JMP to implement some statistical techniques that can increase the power and sophistication of the data analysis process.

Is This Book for You?
This book is for data analysts, who, like me, are charged with analyzing market data. These economists, statisticians, and market researchers must estimate models, look for relationships, and produce insight from data to help business leaders make data-driven decisions.
It’s also for students and professors in quantitative methods courses that deal with market data. Students who focus on quantitative methods can learn about the high-powered statistical methods in JMP and can discover how the graphing and tabulation capabilities in JMP can help them easily and dynamically gain insight into their data.
This book is slightly technical at times, but not greatly so. An undergraduate background in statistics (through regression analysis) and perhaps an econometrics course are helpful. The focus is on showing how to use JMP with market data. Therefore, it's assumed that basic statistics, probability theory, and regression analysis are understood.
Scope of This Book
Here’s the general outline of the book:


Chapters 1–3 set the stage. They cover the state of market research, describe how JMP can help, and introduce JMP, for those unfamiliar with it.

Chapters 4 and 5 describe the JMP features that are useful for preparing data for analysis. They discuss analyzing survey data with JMP.

Chapters 6 and 7 apply JMP to specific business and marketing problems. They cover using choice models to study consumer behavior and using survey data to segment a market.

Chapter 8 shifts the focus from survey data to sales data from a data warehouse or data mart. Emphasis is placed on data visualization using the JMP Graph Builder.

Exercises are included at the end of each chapter. Solutions are available at the author page at http://support.sas.com/paczkowski. The exercises are divided into three groups:


Research concepts covering the technicalities introduced in the chapter.

JMP concepts covering the main tools of JMP that can be used for analyzing market data.

JMP problems that use the research concepts and JMP concepts to address data problems.

Some of the examples in this book come from my work with market data related to consumer products, but the material and methods described can be used with other types of market data. With one exception, the data used in this book is simulated. In Chapter 3, I use the Big Class data set that comes with JMP.
Throughout, JMP commands are capitalized and italicized. Menu commands, also capitalized and italicized, are run together and separated by a forward slash. For example, File/Preferences refers to the File menu on the main menu bar and the Preferences submenu. Data table column names are also italicized.
In this book, as in any human endeavor, you might find a mistake. Please let me know about anything I missed by contacting me at walt@dataanalyticscorp.com.

Keep in Touch

We look forward to hearing from you. We invite questions, comments, and concerns. If you want to contact us about a specific book, please include the book title in your correspondence.
To Contact the Author through SAS Press

By e-mail: saspress@sas.com

Via the Web: http://support.sas.com/author_feedback
SAS Books
For a complete list of books available through SAS, visit http://support.sas.com/bookstore.

Phone: 1-800-727-3228

Fax: 1-919-677-8166

E-mail: sasbook@sas.com
SAS Book Report
Receive up-to-date information about all new SAS publications via e-mail by subscribing to the SAS Book Report monthly eNewsletter. Visit http://support.sas.com/sbr .
Publish with SAS
SAS is recruiting authors! Are you interested in writing a book? Visit http://support.sas.com/saspress for more information.
About the Author


Walter R. Paczkowski is a market research consultant focused on helping companies turn their market data into actionable market information in a wide range of industries, such as telecommunications, pharmaceuticals, jewelry, food and beverage, and automotive. With nearly 40 years of extensive quantitative experience as an analyst in AT&T’s Analytical Support Center, a member of the technical staff at AT&T Bell Labs, head of pricing research at AT&T’s Computer Systems Division, and founder of his consulting company Data Analytics Corporation, he brings a wealth of knowledge to share about data analysis. He is currently an adjunct professor in the Department of Economics at Rutgers University, and is also an adjunct professor in the Department of Mathematics and Statistics at The College of New Jersey. He received his PhD in Economics from Texas A&M University.
Learn more about this author by visiting his author page at http://support.sas.com/paczkowski. There you can download free book excerpts, access example code and data, read the latest reviews, get updates, and more.
Acknowledgments
Many people have contributed to this book either through their direct input or through their encouragement. For the former, I received invaluable help from members of the JMP development team and JMP Press. Among those deserving special mention are Melinda Thielbar, who provided technical assistance; Brenna Leath, who guided me through the process and was very patient with me; Robert Harris, who designed the cover; and Cindy Puryear, Monica McClain, Robin Langford, and Stacey Hamilton, who all helped with the editing. My daughters, Kristin and Melissa, provided encouragement and an extra set of eyes when I needed proofreading. This got me through with minimal headaches. And then there’s my wonderful wife and life partner, Gail. What can I say about how she read many pages of the manuscript, provided a large number of helpful suggestions, and, most of all, made sure I kept on track? She definitely deserves a very special thanks.
Chapter 1: The State of Market Research


Introduction
Market Research Role in Business
Surveys for Data Collection
Tabs as an Analysis Tool
Pie and Bar Charts for Data Analysis
Spreadsheets As Data Managers and Analysis Tools
You Can Do Better
Points to Remember
Exercises
Works Cited
End Notes
Introduction

The role and function of market research is to enable information-driven decision making in businesses. Business leaders need actionable market information, not just “data,” for making intelligent decisions in highly competitive, differentiated, and rapidly evolving markets fraught with uncertainties and risks. Information is insight into customer purchasing behaviors, product preferences, willingness-to-pay, and groupings (a.k.a., segments) that increases understanding by showing previously unknown trends, patterns, anomalies, and relationships. The uncertainties and risks of markets are greatly reduced, albeit not eliminated, because of this information, so that better decisions can be made and the best actions can be taken for a business problem. Data are the building blocks for information, almost like Lego bricks that haven't been assembled. As building blocks, they can be assembled in many different ways with rules specifying how to do the assembling to create information that leads to decisions.
Market researchers provide that information while others in the business, such as those in the IT department, provide data, not information. This is an important distinction because there’s a division of labor that, on the one hand, separates the two functional areas but, on the other hand, interconnects them in a complex, synergistic way. Market researchers analyze the data for trends, patterns, and relationships while the IT staff organize and maintain the data in data warehouses and make them accessible in a useful form to a wide audience, usually through data marts. The market researchers, in turn, inform and guide the IT staff regarding the type of data they need and the form or organization of the data most convenient for them. This synergy is illustrated in Figure 1.1 Market Research – IT Synergy.

Figure 1.1 Market Research – IT Synergy

This chapter discusses the major tools, statistical and software, that market researchers use and need to turn data into information through analysis. These tools, however, are often unknown or inadequate for the job. So, this chapter identifies problems with some tools and argues what needs to be done to better address market researchers' statistical and data handling problems.
This chapter is also a guide to succeeding chapters. In later chapters, I describe how the problems and issues I raise here are addressed with JMP, a fully functional, user-friendly statistical package with dynamic capabilities for performing penetrating analysis and extracting information that can serve as the focal point for handling the problems. Market researchers need software with three features:


powerful data handling methods to manipulate and shape data for meaningful analyses;

dynamic graphing and tabulation capabilities to drill down and link data for deep and penetrating insight;

only those statistical methods appropriate for the type of data being analyzed.
JMP fills the bill perfectly. It provides appropriate tools and analysis reports tailored for those who are novice data analysts and for those who are more sophisticated in the tools-of-the-trade, while also providing rich analytical methods as well as scripting capabilities for creating custom reports and analyses.
So, although this chapter is about how I see the current state of market research, it’s also about the role that JMP plays in converting data into information; that is, it is about its role in data analysis. Data analysis is not just looking at data and reporting what you see; it’s about breaking raw data into parts and reassembling them to extract information. I expand on the role that JMP plays in this task in Chapter 2.
This chapter has six main sections:


The first introduces the role of market research in providing the information that key decision makers need to run a business.

The second outlines the sources of the data that are turned into information. Surveys are the main source that most researchers use, but they’re not the only one. Data can also come from a data warehouse or data mart.

The third section is the first of three that discuss the main tools that most researchers rely on. This section focuses on “the tabs.”

The fourth section adds the typical visual displays, pie and bar charts, to the discussion of tools. These two charts not only are the visual reporting tools, but they’re also quite often the analytical tools.

The fifth section discusses the role of spreadsheets in data analysis and the issues they have for analyzing data.

The final main section argues that we can do better – and that JMP is the tool to use.
Market Research Role in Business

At the end of each day, every business must sell a product. If nothing is sold, nothing is earned and the business ceases to exist. What is sold must meet customer needs at the right price, be the best, be the first, and be easily accessible. To make sure their product is sold, business managers need to know four things:


what customers want;

how much customers will pay;

what competitors are offering;

how consumers are grouped or segmented for optimal pricing, product development, and selling.
With this information, they can strategize and measure the effects of their strategic and tactical decisions on their key business metrics. A few key business metrics are profit, revenue, contribution, and sales, but there are probably many more that any business manager can devise.
The market research department in any major business is the organization typically called upon to provide this information. This information-providing function is displayed in Figure 1.2 Market Research Functions.

Figure 1.2 Market Research Functions

In order to fulfill their objectives, market researchers typically use four tools. Their use, of course, depends on the researchers’ sophistication and the problem they’re addressing. Nonetheless, these four tools are used more often than not. These are:


surveys as the prime data collection methodology;

cross-tabulations as the main analytical tool;

simple, static charts such as pie and bar charts for analysis and display;

spreadsheets for managing, organizing, analyzing, and displaying survey data.
I’ll discuss each of these in detail in the following sections.
Surveys for Data Collection

A Focus on Surveys

Surveys are definitely the major source of data for market studies in modern businesses. They're unsurpassed for learning opinions and revealing preferences. Opinions are varied but are usually about the quality of service and performance on key measures or attributes of the product or business. This could include service rep responsiveness, pricing and payment options, delivery promptness, product quality, respect and courtesy, and so forth. Customer satisfaction studies, for example, try to isolate these opinions to measure how well the business is performing, especially relative to the competition.
Understanding preferences is important because they're usually about new product features, price points, messages, etc., but these preferences can only be revealed when people are exposed to the object in question (e.g., a new price point), which may be impossible to do in an actual market implementation. In a pricing study, for example, surveys reveal preferences for pricing programs, strategies, and willingness-to-pay without the firm actually changing prices in the market, which could alienate vital customers if not done correctly.
Surveys are used to collect experimental data as opposed to transactional data. I often refer to survey data as experimental data because they’re collected under controlled conditions with an experimental design in at least one part of the questionnaire. An experimental design is also used to sample respondents and to show them questions or tasks. The data collection design could be a simple random sample (not typically used) or a stratified random sample (more typically used). Some examples of studies that use an experimental approach are conjoint, MaxDiff, and discrete choice. I’ll show in Chapter 6 how these can be designed and analyzed using JMP. I discuss survey data, in general, in several chapters of this book where I’ll also show how JMP is a powerful and robust tool for turning survey data into actionable and useful information.
Transactional or sales data, also called observational data, are records of what people actually did in the market: what was purchased, how much was purchased, price paid, purchase date, location, and much more. The emphasis is on what was done (retrospective), not what will be or could be done (prospective). In addition, the data are for customers of that business only; data on customers of like products sold by competitors are not, and cannot be, included. This results in an incomplete view of the market. I discuss transactional data, now called Big Data, in Chapter 8 and show how JMP will enable you to manage and analyze these data for information.

The Need for Speed

In our modern technology-driven economy, most surveys are done online with sophisticated programming that allows questionnaires to be quite complex. Paper and pencil and telephone surveys are still used, but online surveys are now the norm rather than the exception. Their use contributes to a subtle problem: the need for speed. They can be quickly implemented with panels of respondents that can likewise be quickly assembled to meet pre-specified criteria (e.g., ethnic/racial profile, education level, product experience), thus shortening implementation time. In addition, many online survey vendors have templates for most standard questionnaires and question sets that just require users to modify wording for their particular problem, thus speeding up the process of writing and fielding a study.
This speed translates into clients requiring (almost demanding) a quick turn-around of reports, which are expected hours, or less, after a study leaves the field. The easiest way to meet this expectation is to ask online vendors to provide canned reports that have simple charts and tables for each question to go along with the questionnaire templates.

Surveys Are Overemphasized

Transaction data, maintained in either a data warehouse or data mart, can also be used for developing information about markets. A data warehouse is an all-encompassing compilation of data on every aspect of the business, and a data mart is a subset of the data warehouse for a specific functional area (e.g., marketing, finance, logistics). Scanner data are a prime example of transaction data.
Market researchers leave working with databases to data scientists who are usually in the IT department, thus ignoring this data’s richness for providing insight about market strengths, weaknesses, opportunities, and threats (SWOT). There are two reasons for this. The first is the sheer complexity of these databases, which is daunting to most. A very special skill set is needed to handle, let alone analyze, them. I touch on some issues in Chapter 8.
The second reason is the belief that only surveys can tell you what customers want, and only surveys can reveal their opinions and preferences. To a good extent this is true, especially regarding new products, enhancements to a product line based on modifications to product features, or new price points. These will never appear in a database, which, by its nature, must be historical. Showing what did happen is unimportant for many business decisions. Opinions and preferences for new products, features, and price points can only be obtained by asking.
What people actually did, as reflected in a database, is equally important for information-driven business decisions because you can learn how people behaved under different conditions (e.g., price points), what sold, when sales increased and decreased (e.g., seasonal patterns), what appealed to which group of consumers (i.e., segments), and so forth. So databases should not be ignored; relying on surveys to the exclusion of databases goes too far. Wheeler (2012) cites an anecdote about a data mining professional who was told by a client: “The problem with your data … is that it's not the real data. We should use real data … data from the surveys we take, not data from the web.” This is misguided because an overreliance on surveys is dangerous. Quite often, what people say they want and will buy is not what they actually buy. People tend to be inconsistent. Because of this, discrete choice experiments, based on surveys, are sometimes combined with actual purchase data to gain better insight into preferences. I briefly discuss this in Chapter 6.
Tabs as an Analysis Tool

A Focus on Tabs

In addition to surveys being overworked as a data collection method, cross-tabulations of the data, or simply “the tabs,” are equally overworked as the primary analysis tool. A client once told me that “Everything you need to know is in the tabs.” In my opinion, this sums up the prevalent view of market researchers.
Tabs are quite often printed in books with voluminous pages and 8-point font. They’re predefined once the survey is designed with a complex set of programming code generating them, code that becomes difficult to change so that the tabs are usually unchangeable; i.e., they’re static. This has an important implication for data analysis: you can’t look at rearrangements or reconfigurations of the data without having to reprogram the tab software, and maybe printing new books. There’s a time cost, as well as a monetary cost, which plays to the “need for speed” issue. To minimize these costs and to quickly generate a report, only one set of static tabs will be produced. This definitely handicaps analysis.

Tabs Are Just Contingency Tables

The concept of a cross-tab is familiar to statisticians; cross-tabs are just contingency tables, each created by crossing two discrete variables to form a single table. The cells of the table hold the number of observations (the count or frequency) that fall in a particular combination of one category from each of the two variables. Dividing by an appropriate base changes the cell frequencies into cell proportions or percentages. The data in the cells of the table can thus be presented in several ways:



raw counts or frequencies;

percentages of the total sample (base: total sample);

percentages of the columns (base: column totals);

percentages of the rows (base: row totals).
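
The four presentations listed above can be sketched in a few lines of Python. This is only an illustration, not how JMP builds its tables: the responses are made-up data, and the function name `crosstab` and the store and brand labels are hypothetical.

```python
from collections import Counter

# Hypothetical survey responses: one (row category, column category) pair
# per respondent. These values are invented purely for illustration.
responses = [
    ("Grocery", "Brand A"), ("Grocery", "Brand B"), ("Grocery", "Brand A"),
    ("Club", "Brand A"), ("Club", "Brand B"), ("Club", "Brand B"),
    ("Grocery", "Brand A"), ("Club", "Brand B"), ("Grocery", "Brand B"),
]

def crosstab(pairs):
    """Return, for every cell, the count and its three percentage bases."""
    counts = Counter(pairs)
    rows = sorted({r for r, _ in pairs})
    cols = sorted({c for _, c in pairs})
    total = len(pairs)
    row_totals = {r: sum(counts[(r, c)] for c in cols) for r in rows}
    col_totals = {c: sum(counts[(r, c)] for r in rows) for c in cols}
    table = {}
    for r in rows:
        for c in cols:
            n = counts[(r, c)]
            table[(r, c)] = {
                "count": n,                          # raw frequency
                "total_pct": 100 * n / total,        # base: total sample
                "col_pct": 100 * n / col_totals[c],  # base: column total
                "row_pct": 100 * n / row_totals[r],  # base: row total
            }
    return table

table = crosstab(responses)
for cell, stats in table.items():
    print(cell, stats)
```

The same cell count gives three different percentages depending on the base, which is why a printed tab has to state which base it uses.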

The columns are called banner points or variables and the rows studs. The banner points are interpreted as independent variables that determine the responses to survey questions. Demographics, consumer segments, and quota groups are typical banner points. The studs are the question responses and so can be interpreted as dependent variables. A typical cross-tab page would have several banner points (usually at a very small point size) but only one stud variable. The process of creating a cross-tab table or tab is called “tabbing.”
A typical cross-tab is shown in Figure 1.3 Example Cross-tab. For this example, the two variables are brand of yogurt and where yogurt may be purchased. The Brand is the banner point and the Store where yogurt is typically purchased is the stud. There were 906 survey respondents. I produced this cross-tab using JMP; I explain how in Chapter 5. When continuous variables are involved (for example, dollars spent), sums, means, medians, and standard errors are also included.

Figure 1.3 Example Cross-tab


Problems with Tabs

Too Much Reliance

Tabs, although widely used, have five problems that compromise their use:


they can quickly become voluminous thus hampering a search for information;

they have static rather than dynamic views of the data;

they largely contain just simple univariate descriptive statistics;

they don’t show or measure effects;

they don’t visualize relationships.
This is not to say you shouldn't use them; just that you shouldn't be so heavily reliant on them for analysis.

Problem 1: Tabs Can Be Voluminous

Cross-tabs can quickly become voluminous since tabs compare just two variables. Suppose a questionnaire has 100 variables and you tab every pair. This means you have 4,950 tables. Imagine looking through 4,950 tables trying to find actionable information! Where do you begin?
For the 4,950 tables, you need 990 sheets of paper, assuming that five tables fit on an 8.5 x 11 sheet of paper with an 8-point font. A ream of paper is 500 sheets (one standard package), so you need approximately two reams of paper. One ream is approximately 2.25 inches thick, so you would have a stack of paper almost 4.5 inches thick! And just for 100 variables. A typical questionnaire has more than 100 variables, so the size of the tabs has the potential to be very large.
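
The arithmetic above is easy to reproduce and to rerun for a larger questionnaire. The sketch below is an illustration only; the function name `tab_volume` is hypothetical, and its default parameters are simply the figures quoted in the text (five tables per sheet, 500 sheets per ream, 2.25 inches per ream).

```python
import math

def tab_volume(n_variables, tables_per_sheet=5, sheets_per_ream=500,
               ream_thickness_in=2.25):
    """Size of the tabs if every pair of variables is crossed once."""
    tables = math.comb(n_variables, 2)               # number of distinct pairs
    sheets = math.ceil(tables / tables_per_sheet)    # whole sheets of paper
    reams = sheets / sheets_per_ream
    thickness_in = reams * ream_thickness_in
    return tables, sheets, reams, thickness_in

tables, sheets, reams, thickness_in = tab_volume(100)
print(tables, sheets, round(reams, 2), round(thickness_in, 2))
```

Because the number of pairs grows roughly with the square of the number of variables, doubling the questionnaire to 200 variables gives 19,900 tables, about four times as many.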
Not long ago, the tabs were physically printed and bound into books (hence, this reams of paper issue). This made gleaning any insight from them challenging to say the least because of the sheer volume of pages, not to mention the fact that the pages are just static snapshots of the market based on people's responses to survey questions. The tabs are now in digital books, although physical books are still popular. Digital books are equally difficult to work with because the volume issue still remains. Plus, there's the added factor that paging from one part of a digital book to another makes it difficult to compare one page to another, unlike for a physical book.

Problem 2: Tabs Are Static Views

I mentioned that tabs are static, not providing a dynamic view of your data. Dynamic in this context means highly interactive. You can:



drag and drop variables to a table or graph canvas to quickly create new views;

click an icon or select from a menu to change views of the same data;

drill down and link tables and graphs to quickly see interrelationships.

“Dynamic interactive graphics are graphics that can move smoothly and change in response to the data analyst's actions, the changes being computed and presented in real time” (Young, Valero-Mora, and Friendly, 2006; emphasis in original). Static means non-interactive or fixed, so drilling down and linking are impossible. You can only see the one pattern or maybe two patterns in the view given by a static table or chart. The view can be changed, but not easily. If time is of the essence for most studies, then changing the tabs to create new views becomes a luxury many analysts can’t afford. The static tabs impose an unforeseen cost on doing data analysis as noted above.
JMP is designed to allow you to have dynamic views of your data, so that you can drill down and link this data across several displays for comprehensive views. This is the strength of JMP, its forte, for analyzing market data for information. I illustrate this in Chapters 4 and 5 and use this dynamic analytical capability in Chapters 6–8.

Problem 3: Tabs Have Descriptive Statistics

Tabs are just tables of descriptive statistics at best and these statistics are univariate. Univariate means that only one variable is summarized or analyzed. Means, medians, proportions, and standard errors are examples of univariate, descriptive statistics. Even though two variables are “crossed” in the table, just simple univariate summary statistics are calculated and reported. A simple correlation coefficient for a bivariate relationship, for example, is not calculated and shown.
Markets, however, are multivariate , meaning that many variables or factors interact and work simultaneously to produce whatever result you're studying. It's never the case that only one variable explains another or one summary measure (e.g., a mean or proportion) reveals complex interrelationships. An example is purchase intent, which is frequently measured on a 5-point Likert Scale ranging from “ Extremely Unlikely ” to “ Extremely Likely .” A univariate statistic, such as the mean (assuming the measurement scale is at least interval), is frequently calculated, shown in the tabs, and reported to the client. This, however, ignores how it interacts with price, product features, types of stores where available, etc. -- and all simultaneously! These complex, simultaneous relationships can only be uncovered, and the direction and magnitude of the interactions determined, by a regression model (or an appropriate member of the regression family). Regressions are not in the tabs.
A model separate from the tabs is needed to estimate multivariate relationships. A model is a statistical representation, an abstraction, of what you believe is the relationship between a factor of interest, a dependent variable, and other key variables, the independent variables. There are many different types of models; the applicability of each depends on the nature of the data. This is summarized in Figure 1.4 Statistical Model Highlights. The type of model that can be used depends on the type of dependent and independent variables. The modeling types are Continuous, Ordinal, and Nominal. JMP “knows” the type (and sometimes the role, e.g., dependent variable) and gives you appropriate modeling options in modeling platforms. For example, if the dependent variable is nominal, then logistic regression will be available as the recommended method or personality in the Fit Model platform, although you can choose something else. JMP allows you to estimate many different types of models as I illustrate in Chapters 5, 6, and 8. In fact, JMP helps the analysis process by providing only those model types appropriate for the data types being analyzed. I give a broad, high-level introduction to JMP platforms and personalities in Chapter 3.

Figure 1.4 Statistical Model Highlights


Problem 4: Tabs Don't Show Effects

Consider purchase intent again. A logical question to ask is: “How much does purchase intent change because of a price change?” Yes, intent will rise if price falls. That's obvious and simple economics, but the real issue is how much. This is given by an elasticity, something else that’s not in the tabs. An elasticity shows the responsiveness of a change in one quantity (e.g., purchase intent) to a change in another (e.g., price). A key business metric, revenue, is intimately connected to a price elasticity. A model provides this measure of effect, which is another reason to go beyond tabs. I show you how to estimate elasticities for choice models in Chapter 6 and how to estimate them for sales data in Chapter 8.
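To make the idea of an elasticity concrete, here is a minimal sketch in Python (not JMP) of an arc, or midpoint, elasticity calculated from two price points. The function name and all the numbers are hypothetical and chosen only for illustration; the models in later chapters estimate elasticities from data rather than from two points.

```python
def arc_elasticity(q_old, q_new, p_old, p_new):
    """Midpoint (arc) elasticity: percent change in quantity
    divided by percent change in price."""
    pct_change_q = (q_new - q_old) / ((q_new + q_old) / 2)
    pct_change_p = (p_new - p_old) / ((p_new + p_old) / 2)
    return pct_change_q / pct_change_p

# Hypothetical figures: price falls from 2.00 to 1.80 and the share of
# respondents intending to buy rises from 40% to 46%.
e = arc_elasticity(q_old=0.40, q_new=0.46, p_old=2.00, p_new=1.80)
print(round(e, 2))

# Revenue per respondent is proportional to price times intent.
revenue_old = 2.00 * 0.40
revenue_new = 1.80 * 0.46
print(revenue_new > revenue_old)
```

With these made-up numbers the elasticity is larger than 1 in absolute value, so the price cut raises revenue. This is the connection between elasticity and revenue mentioned above.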

Problem 5: Tabs Don't Visualize Relationships

If a cross-tab table is large (i.e., bigger than a 2 x 2 table [ 1 ] ), then it becomes very difficult to see relationships because people tend to have visual problems seeing such relationships in tables of numbers, especially large tables. For instance, in the table in Figure 1.3 Example Cross-tab you can’t tell which brands of yogurt are more closely associated (in a proximity sense) with the client's brand or which brands are more closely associated with each type of store. The best you can hope for is to spot a large number (frequency or percent) and then draw attention to it as if that number is the most important piece of information in the table. A graph of the table would be a tremendous help. This is what correspondence analysis produces. I discuss correspondence analysis and its associated graph in Chapter 5.
Last updated: January 4, 2018
Pie and Bar Charts for Data Analysis

Visualization Issues

Pie and bar charts, used to display categorical data or data summaries such as proportions, are ubiquitous; everyone knows them. They're simple to produce and understand -- sometimes. Since most survey questions have categorical responses, these charts are liberally used. But they have problems.
Data visualization experts abhor pie charts. Tufte, probably the most dominant and best known data visualization expert, once wrote: “The only thing worse than a pie chart is several of them” (Tufte, 1983). Pie charts are not effective data displays because we have problems with angles, especially comparing them, just as we have problems with large tables. Lengths are easier for us to see and compare because they're linear; angles are not. This is illustrated in Figure 1.5 Comparison of Pie and Bar Charts. The pie chart in Panel A, which shows market shares for three brands, illustrates the main problem with this type of display. Which has the smallest share? The actual shares are:



A: 35%

B: 30%

C: 35%.

The bar chart in Panel B shows the same data, but now the difference in shares is clear: Brand B has the smallest share. See Few (2007) for similar comparisons.
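As a rough illustration of why lengths are easier to compare than angles, the following Python sketch draws the three shares as a plain-text bar chart. It is not how JMP builds charts; the function name and the `#` symbol are choices made only for this example.

```python
# Market shares from the text, in percent
shares = {"A": 35, "B": 30, "C": 35}

def text_bar_chart(data):
    """Return one line per category, with one '#' per percentage point."""
    lines = []
    for label, value in data.items():
        lines.append(f"{label} | {'#' * value} {value}%")
    return "\n".join(lines)

chart = text_bar_chart(shares)
print(chart)

# The shortest bar identifies the brand with the smallest share.
smallest = min(shares, key=shares.get)
print(smallest)
```

Even in this crude form, the shorter bar for Brand B stands out in a way that a five-point difference in a pie slice does not.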

Figure 1.5 Comparison of Pie and Bar Charts

In addition to the angle issue, pie charts are frequently drawn with too many “slices,” which becomes a cognitive challenge for deciphering key messages. This is compounded when you're asked to compare across several pies in a single slide or, worse yet, across multiple slides in a report.
Because of these problems, some visualization experts recommend replacing pie charts with bar charts as in Panel B of Figure 1.5 Comparison of Pie and Bar Charts , another very popular, static chart that appears in most market research reports. Bar charts are viewed as “unrolled” pie charts—they contain the same information, but you can discern differences much faster and more accurately because of the linear nature of the bars. See Few (2007) for interesting comparisons. Also, bar charts can show more quantities than simple pie charts. For instance, they can show:



counts or frequencies;

means;

proportions;

sums;

trends.

3-D effects are often added to bar charts (as well as to pie charts) to show depth, but this just counters their advantage because people have depth perception problems as well as angle problems.

Other Chart Forms Are Ignored

There are many other chart forms that can be used to display data. Some are:



Box plots

Bubble plots—to display three quantities at once. Colors add a fourth dimension—but not for color-challenged people!

Dot plots

Heat maps

Line plots

Mosaic plots

Scatter plots

Treemaps

Panel (also called trellis) displays

JMP has a great graphing platform, Graph Builder , that not only enables you to create all these graph types but also enables you to create them dynamically, so that you can see your data from many different perspectives. In short, it actually allows you to analyze your data. I discuss Graph Builder in Chapter 5 and illustrate its use throughout the remainder of this book. See Cleveland and McGill (1984) for a more technical discussion of graphical displays.
Ignoring visual issues, pie and bar charts frequently replace analysis of market data. Many reports have a pie followed by a bar followed by a pie followed by—well, it keeps going on and on, so that there’s one chart for each question. This is just reporting at worst and simple descriptive “analysis” at best so that the analysis and reporting of data are the same; the same tools are used to study and report the data. Information, however, is not extracted because a connection is not made between (or among) questions (variables) to show relationships, trends, patterns, and anomalies (i.e., outliers). This is the analysis of data.

The Chartjunk Issue

I would be remiss if I didn't comment on chartjunk , a term coined by Tufte (1983):
The interior decoration of graphics generates a lot of ink that does not tell the viewer anything new. The purpose of decoration varies -- to make the graphic appear more scientific and precise, to enliven the display, to give the designer an opportunity to exercise artistic skills. Regardless of its cause, it is all non-data-ink or redundant data-ink, and it is often chartjunk.
Microsoft PowerPoint and spreadsheet software (discussed next) make it very easy to create elaborate charts that look impressive but are not really statistical and don't add to the extraction of information from data. The chartjunk graphs pass the burden of analysis onto the report reader (i.e., the client or manager) by forcing him/her to sift through the junk to find the market information the study was designed to deliver in the first place. Statistical charts and graphs should be used. See Su (2008) for a discussion of spreadsheet graphics that don’t adhere to basic principles of statistical graphics and which thus overburden the reader. These principles have been well stated by others. See the references in Su (2008) and a slightly different perspective on chartjunk by Few (2011).
Spreadsheets As Data Managers and Analysis Tools

Overreliance on Spreadsheets

Spreadsheets are very heavily relied upon for analyses and database management. There are several major issues associated with their use. They're easy to use in general, but this is not a reason to use them as database managers. They're not database managers.
Figure 1.6 Example of Spreadsheet Data shows an example of a section of a large spreadsheet typical of what you might receive from an online survey vendor. There are 18,492 rows or records, which represent that number of people, and 81 columns or variables on each person. There’s no indication of what Q1 , Q2 , and so on are, no indication of their values, and no definitions for the Code variable. Sometimes documentation is provided, perhaps as another worksheet in the same workbook as the data, but often it's not. You have to refer to the questionnaire to find the definition for the variables, such as Q1 , and even then, it's not clear what they are.

Figure 1.6 Example of Spreadsheet Data

Spreadsheets are good as simple data structures such as flat files . Flat files have just a rectangular array of data with no links to other data; they have just simple rows and columns. They’re inadequate for complex data structures involving several tables, which are, effectively, a 3-D data cube. They lack:


Variable documentation , except by creating yet another spreadsheet, preferably in the same workbook.

Value mapping from numerics to descriptive labels.

Variable grouping for quick data location and management.

Table operations such as joining/splitting/stacking.

Programming capabilities , aside from Visual Basic for Applications ( VBA ), which is not a statistical programming language.

Sophisticated statistical operations beyond arithmetic operations and simple regression analysis. Add-on packages help, but they tend to lack depth and rely on the spreadsheet engine.

Cell-centric Problem

Spreadsheets are notorious for making it difficult to track formulas and catch errors because they’re cell-centric. Each cell could have a separate formula; even cells in the same column for a single variable could have different formulas. As an example, for the sample spreadsheet in Figure 1.6 Example of Spreadsheet Data, creating a new variable that is Q1 + Q2 requires creating 18,492 formulas, one for each cell. The chance for error is, of course, huge.
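The alternative to a cell-centric formula is a column-centric one: a single formula defined once and applied to every row. The Python sketch below illustrates that idea with three hypothetical respondents; `add_column` is a made-up helper for this example, not a JMP or spreadsheet function.

```python
# Hypothetical survey rows; Q1 and Q2 are the column names used in the text.
rows = [
    {"Q1": 3, "Q2": 4},
    {"Q1": 5, "Q2": 1},
    {"Q1": 2, "Q2": 2},
]

def add_column(table, name, formula):
    """Apply one formula to every row and store the result under `name`."""
    for row in table:
        row[name] = formula(row)
    return table

# One formula for the whole column, however many rows there are.
add_column(rows, "Q1_plus_Q2", lambda r: r["Q1"] + r["Q2"])
totals = [r["Q1_plus_Q2"] for r in rows]
print(totals)
```

Because the formula exists in exactly one place, there is only one thing to audit, whether the table has three rows or 18,492.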

Auditing Problem

The cell-centric formula issue leads to an auditing problem. The cells in a spreadsheet are often linked to other cells, either across spreadsheets in the same workbook or across workbooks, and often with no clear pattern. Tracing and reproducing these links is difficult or impossible for very large spreadsheets so that auditing calculations becomes difficult. The chance of an undetected error is higher the more complex the spreadsheet.

Easy Graphs

Spreadsheets provide the ability to easily graph data. The static nature of the graphs, however, makes it difficult to quickly explore and test ideas. Often what is presented is the first and only analysis. There are no links back to the original data for further analyses. As an example, there's no ability to click a bar in a bar chart to identify or subset the data in that bar. The static graphs are limited. These simple charts, combined with the tabs, are often the sole forms of (descriptive) analyses.
You Can Do Better

The result of focusing almost exclusively on surveys, static univariate tabs, and spreadsheets with static pie and bar charts is that the wealth of information that could be gained about market forces is left untapped. Important relationships are left uncovered and unexplored, resulting in simple conclusions and recommendations. But you should and can do better.
Your goal should be to provide actionable information about market forces based on dynamic tools and sophisticated analyses, so your client or other business leaders can make the best informed information-driven decisions. You need to go beyond a cult of surveys, tabs, and static pie/bar charts, and move beyond just market research to researching the market with more sophisticated, dynamic analysis tools. So, an agenda item for this book is to promote more sophisticated research and analysis of markets with broader data and enhanced, dynamic tools.
This is where JMP enters the picture. I focus in Chapter 2 on how JMP can help you in this area.
Points to Remember

There are several main points to remember from this chapter:


Market researchers provide information about market operations, organization, needs, and performance.

Market researchers rely on four tools:


surveys : overworked to the exclusion of databases;

tabs : provide static snapshots of survey results that are difficult to change and are univariate;

simple charts : static and simplistic;

spreadsheets : error-prone and used for analysis and as database managers.

The goal of market researchers should be to provide actionable information about market forces with dynamic tools and sophisticated analyses.
Exercises

Research Concepts



What is the role of market research in business?

Distinguish between data and information.

What are the major data sources used by market researchers?

Compare and contrast the major data sources used by market researchers.

Describe some implications of a “need for speed” attitude in market research.

What is a “tab”? How are tabs used in market research?

List and discuss five problems with tabs.

How are the “need for speed” and tabs related?

What are two popular graphic displays of data? Summarize some issues with each one.

Compare and contrast dynamic and static graphs and tables.

Compare and contrast univariate and multivariate statistics. Give a marketing example of each.

How prevalent are spreadsheets? How are they generally used by market researchers? Summarize issues with using spreadsheets.

JMP Concepts



Name three data modeling types recognized by JMP.

Briefly state how JMP can aid you in gaining a comprehensive view of your data.

Describe how JMP can help you extract information from raw data.
Works Cited

Cleveland, William S., and Robert McGill. (September 1984). Graphical Perception: Theory, Experimentation, and Application to the Development of Graphical Methods. Journal of the American Statistical Association , Vol. 79, No. 387, pp. 531 - 554.
Few, Stephen. (2007, August). Save the Pies for Dessert. Retrieved from Perceptual Edge: http://perceptualedge.com/library.php
Few, Stephen. (2011, April/May/June). The Chartjunk Debate: A Close Examination of Recent Findings. Retrieved from Perceptual Edge: http://perceptualedge.com/library.php
Su, Yu-Sung. (June 15, 2008). It’s easy to produce chartjunk using Microsoft Excel 2007 but hard to make good graphs. Computational Statistics & Data Analysis , Vol. 52, No.10, pp. 4594–4601.
Tufte, Edward R. (1983). The Visual Display of Quantitative Information. Cheshire, CT: Graphics Press.
Wheeler, Schaun. (2012, April 2). Surveys, Assumptions, and the Need for Data Collection Alternatives. Retrieved May 22, 2015, from R-Bloggers: https://www.r-bloggers.com/surveys-assumptions-and-the-need-for-data-collection-alternatives/
Young, Forrest W., Pedro M. Valero-Mora, and Michael Friendly. (2006). Visual Statistics: Seeing Data with Dynamic Interactive Graphics. Hoboken, NJ: John Wiley & Sons.
End Notes
1: A 2 X 2 table has two rows and two columns so there are only four cells in the table. [ return ]
Chapter 2: Using JMP for Market Research


Introduction
Project Management Steps
Documentation and Reproducibility
Managing Data
Statistical Capabilities
Dynamic Graphs and Tables
Scripting Capabilities
Integrating with SAS and R
Points to Remember
Exercises
Works Cited
End Notes
Introduction

You must do more with your data than just use it to create tabs, calculate means, and report results in static pie and bar charts. These simplistic approaches to data analysis are usually enabled by the software used, i.e., spreadsheets. Spreadsheets have limited functionality, and therefore only enable naive analytics approaches. More complex problems require more sophisticated software. This, however, begs a question: "What software?"
The goals for this chapter are to:


demonstrate why JMP should be used in market research studies;

describe the statistical capabilities of JMP;

introduce the powerful integration of JMP, SAS, and R for market research.
In the remaining chapters of this book, I show you how to use JMP to analyze market data and convert them into useful, actionable information.
This chapter has seven main sections. The first section to follow describes six properties of a project and six properties of JMP that make it ideal for market data analysis. These six properties are the base for the remaining six sections.
Project Management Steps

Studying markets and how people and businesses buy and sell products and services is not trivial or new. Economists have the job of theorizing about markets while market researchers have the job of collecting and distilling information for practical problems. These problems are complex and require thoughtful analyses with sophisticated tools and techniques. Due to this complexity, no market research project, or any statistical project in general, should be done without a clear project management framework for the successful completion of the research. Figure 2.1 Market Research Parts in JMP shows six parts of a framework that are also properties of JMP that make it an ideal tool for analyzing market data. Any statistical software should have these six properties, although the last one would obviously vary depending on the software. JMP is a powerful tool for handling the major requirements for project management and statistical analysis.

Figure 2.1 Market Research Parts in JMP

Documentation and Reproducibility

Why Worry about Documentation and Reproducibility?

Documentation and reproducibility are two separate but connected tasks that are important in any research project. I chose to put them first in the list because they must be planned before research begins and actively pursued while research is conducted. Trying to recall what was done afterward is a daunting task usually subordinated to the other daunting tasks of research. Researchers typically get to them after the work is done but at this point they’re incomplete at best and error prone at worst.
Documentation is the logging of important steps in the research process including data sources, important transformations, and analyses used to answer a client’s question. Reproducibility is the ability to start with the same data, follow the documented steps, and get the same answer. Quite often, analysts produce a report only to have the client call, weeks or months later, requesting a clarification of how a calculation was done or to request further analysis along the same lines. This means that you must recall exactly what you did, not to mention what data you used.
Reproducibility has become a major issue not only in the research community but in business as a whole. [ 1 ] See Gandrud (2014) and Fomel and Claerbout (2009) for some comments.
JMP enables effective documentation and reproducibility in three ways:


saving scripts to re-run analyses and data transformations;

using Table Variables for notes about data sources or transformations;

creating Journals that save analysis results.

Saving Scripts

The major data management paradigm in JMP is the data table , which is the main repository of your data. A JMP data table for a fictitious market research project on consumers’ yogurt-buying habits is shown in Figure 2.2 Example of a JMP Data Table . Notice that the table has rows and columns much like a spreadsheet, but the similarity ends there.
Every platform in JMP automatically generates a script that records the user’s choices. These scripts can be saved as part of the data table and then run to re-generate the analysis. This serves two purposes:


the scripts themselves document the type of analysis used and the options chosen;

the scripts can be re-run to reproduce the analysis quickly and easily.
Moreover, the JMP Scripting Language ( JSL ) is a powerful and flexible language in its own right that allows experienced users to extend the capabilities of JMP. I do not discuss script writing in this book. See Utlaut, Morgan, and Anderson (2001) for a good treatment of writing scripts.
Each green triangle in the yogurt data denotes a script that was either automatically generated by a platform or written by a user. The ability to save the script with the data allows for easy organization of analyses and data.

Figure 2.2 Example of a JMP Data Table


Using Table Variables

Table Variables are special variables saved as part of a data table. Table Variables are very versatile since they can be defined to hold any documentation. For instance, I typically create Table Variables to hold the title of a project, the client name, the project number, and the date on which the data table was created. See Figure 2.2 Example of a JMP Data Table . Creating Table Variables for documentation is discussed in Chapter 3.

Creating Journals

Journals are a powerful way to save any documentation in JMP. A Journal can hold scripts to re-create a report, PowerPoint slides, and any text or documentation in a form similar to what appears in JMP reports. The Journal basically acts as a general repository for everything that can be done in JMP as well as in other programs that might be needed for a project. Some features of a Journal are illustrated in Chapter 3.
Managing Data

Data Management Tasks

Managing data is an important part of any project. There are three tasks associated with data management:


obtaining and distributing data;

wrangling data tables and variables;

creating new variables.
JMP has great facilities for making it easy and painless to do these major data management functions. I discuss these in the following subsections.

Obtaining and Distributing Data

Introduction

It should go without saying that you have to obtain, or import , your data from some source before you can do anything. For market researchers who rely on surveys, that might mean obtaining the data from an online survey program as I discussed in Chapter 1. Once you have the data in one program, chances are you’ll need it in another program because analyses are typically done using several software packages. So you'll have to export the data to another format. For example, your initial data might be downloaded from an online survey program in a text format and then imported into JMP for processing and analysis. You might then send processed data from JMP to SAS for creating charts or to Microsoft Excel for an eventual hand-off to the client.
Every statistical package can import and export data from one format to another. The distinguishing feature of JMP is the ease with which this can be done.

Importing Data

The Open function, extensively discussed in Chapter 3, reads the major formats such as:
Table 2.1 JMP File Open Formats

Source          Format or Extension
Excel           *.xls, *.xlsx, *.xlsm
SAS             *.sas7bdat
SPSS            *.sav
Text files      *.csv, *.txt, *.dat
xBase           *.dbf
Many others     Many others

This function is also versatile enough that it can open and read PowerPoint decks, PDF files, and even execute .exe programs.
The ability to import SPSS, Excel, and CSV is important because these are the three most popular formats for market research applications. The SPSS format is particularly important because it’s very popular in market research since it has metadata [ 2 ] that includes:



variable name;

variable label;

value labels.

These metadata are brought into JMP along with the data and maintained in the data table created during the import process. The variable name is the mnemonic used to identify the variable while the variable label is a longer description, which is often the question itself. So in the example data table in Figure 2.2 Example of a JMP Data Table, the SPSS variable name is Brand, which is used as the JMP column name, and the SPSS variable label (not shown) is “Which brand of yogurt do you typically buy?” which is the question as it appears in the questionnaire. Data are usually numerically coded so Value Labels associate words with each numeric value, making them more intelligible. For the Brand variable, the labels are: 1 = "C", 2 = "Client", 3 = "Major Competitor", 4 = "D", 5 = "E", 6 = "G". In JMP, the numeric values are used in computations, but the value labels are displayed in reports.
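The split between stored codes and displayed labels can be sketched in a few lines of Python. This is only an illustration of the concept, not how JMP stores value labels internally; the label mapping is the one listed above, and the coded responses are hypothetical.

```python
from collections import Counter

# Value labels for the Brand variable, as listed in the text
brand_labels = {1: "C", 2: "Client", 3: "Major Competitor", 4: "D", 5: "E", 6: "G"}

# Hypothetical coded responses from six respondents
brand_codes = [2, 3, 2, 1, 6, 2]

# Computations use the numeric codes...
counts = Counter(brand_codes)

# ...while the report shows the labels.
report = {brand_labels[code]: n for code, n in sorted(counts.items())}
print(report)
```

The data keep their compact numeric form, and the mapping is applied only at the point where a person reads the result.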
Excel is universally used for many data management and analysis operations -- and everyone has it! So, it's natural for online survey programs to export data in this format. Unfortunately, an Excel spreadsheet doesn’t have a metadata feature, which is a major drawback. JMP has a robust Excel import feature that allows you to import Excel spreadsheets stored in a variety of common formats. It is important to note here that Excel is often used for data management, statistical analysis, and graphing. There are, however, many well documented issues with Excel's statistical and graphing capabilities that should cause one to be cautious when using it. See McCullough (2008), B.D. McCullough (2008), McCullough and Heiser (2008), Yalta (2008), and Su (2008). The ease of importing data from Excel into JMP, along with the robust and full-featured statistics available in JMP, should make JMP a natural choice for most of the data management and analysis tasks commonly performed in Excel.
The Comma Separated Value (CSV) format is a text file where data values are separated by commas. It is very popular because all programs can read it. But like Excel files, CSV files have the downside of not storing metadata. Column labels, value labels, and other data about the data can be lost.
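A short Python sketch using the standard `csv` module shows what this loss looks like in practice. The brand codes and labels are hypothetical, and an in-memory buffer stands in for a file on disk.

```python
import csv
import io

# Hypothetical value labels and coded responses
brand_labels = {1: "C", 2: "Client", 3: "Major Competitor"}
codes = [2, 3, 1]

# Write the coded column to CSV (an in-memory buffer stands in for a file).
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Brand"])
for code in codes:
    writer.writerow([code])

# Read it back: only the column name and the raw values survive.
buffer.seek(0)
rows = list(csv.DictReader(buffer))
print(rows)
```

The values come back as plain text such as "2", with no record that 2 means "Client" and no question wording. That information has to travel separately, for example in a codebook.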
I discuss importing or opening files in JMP in Chapter 3.

Exporting Data

Just as JMP can import or open a wide range of data formats, it can also export to many formats. Excel, SAS, and CSV are the most common.

Data Wrangling

Data are never exactly in the form you need for analysis. In fact, most of your time is spent reorganizing your data so that you can use them in statistical functions or display them in appropriate graphs. Some estimate that data reorganizing and cleaning accounts for 80% of the effort in a data mining environment. See Dasu and Johnson (2003), and Kandel and others (2011). Before anything can be done, the data have to be wrangled or wrestled into another form. Another, almost crude word is munge. Wrangle, an interesting word, means to modify data in some way to make it more suitable for your analysis. See Cross (2001), McKinney (2013), and Kandel and others (2011). An advanced book with a focus on Python is Kazil and Jarmul (2016). Kandel and others (2011) define data wrangling as an iterative process of data exploration and transformation that puts the data into a form useful for “downstream” analysis. They define data as useful if it’s usable (in the right form for the analysis tools, such as regression analysis and graph building or visualization) and credible (representative of the population being studied).
This obviously is a broad definition; vague is a better word. That's because the one word tries to capture all the things you have to do to your data before analysis (the “downstream” process) can begin, some of which are:



filter;

merge;

reshape;

sort;

subset.
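Several of these operations can be sketched with plain Python lists and dictionaries. The two small tables below are hypothetical, and the code illustrates the operations themselves rather than the JMP tools discussed in Chapter 4.

```python
# Hypothetical respondent table and a second table with store information
respondents = [
    {"id": 1, "age": 34, "brand": 2},
    {"id": 2, "age": 51, "brand": 3},
    {"id": 3, "age": 27, "brand": 2},
]
stores = [
    {"id": 1, "store": "Grocery"},
    {"id": 2, "store": "Convenience"},
    {"id": 3, "store": "Grocery"},
]

# Filter: keep only respondents who chose brand code 2.
brand2 = [r for r in respondents if r["brand"] == 2]

# Sort: order the remaining rows by age.
by_age = sorted(brand2, key=lambda r: r["age"])

# Merge: join the store table onto the respondent rows by id.
store_by_id = {s["id"]: s["store"] for s in stores}
merged = [{**r, "store": store_by_id[r["id"]]} for r in by_age]

# Subset: keep only the columns needed downstream.
subset = [{"id": r["id"], "store": r["store"]} for r in merged]
print(subset)
```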

As an example of reshaping, data are often in wide-form when received from an online survey tool. This means there is a column in the data table for each level of a survey question with each row of the table representing a respondent. For a conjoint study, which I discuss in Chapter 6, consumers rate a series of alternative product configurations. Each rating is a separate column in the data table. Most statistical methods, such as OLS regression estimation, require the data be in long-form -- with one row per level and each respondent in several rows. So the data have to be wrangled from wide- to long-form. I discuss this in Chapter 4 and, as I just noted, illustrate it for a conjoint, as well as a discrete choice model in Chapter 6.
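The wide-to-long reshape can be sketched in Python as follows. The ratings, the column names, and the `wide_to_long` helper are all hypothetical; JMP has its own table tools for this, which are covered in Chapter 4.

```python
# Hypothetical wide-form conjoint ratings:
# one row per respondent, one column per product profile.
wide = [
    {"respondent": 1, "profile_1": 7, "profile_2": 4, "profile_3": 9},
    {"respondent": 2, "profile_1": 5, "profile_2": 6, "profile_3": 2},
]

def wide_to_long(rows, id_col, value_cols, var_name, value_name):
    """Stack the value columns so each output row holds one rating."""
    long_rows = []
    for row in rows:
        for col in value_cols:
            long_rows.append(
                {id_col: row[id_col], var_name: col, value_name: row[col]}
            )
    return long_rows

long_form = wide_to_long(
    wide, "respondent", ["profile_1", "profile_2", "profile_3"], "profile", "rating"
)
for row in long_form:
    print(row)
```

Two respondents with three ratings each become six rows, one per respondent and profile, which is the shape a regression routine typically expects.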
JMP has very powerful, easy to use tools that allow you to wrangle your data into an alternative form. All the standard wrangling operations can be easily done in JMP as I discuss in Chapter 4.

Creating New Variables

In addition to never having the data in the form that you need, you often don't have the variables that you need and have to create them. This, of course, does not mean you can just magically produce new data from nothing. You create new variables using formulas incorporating variables that already exist in the data table. JMP allows you to add new variables (i.e., columns) to your data table either through a script or through the graphical user interface (GUI). Variable creation is illustrated throughout this book.
Statistical Capabilities

JMP has a rich array of statistical capabilities. Some of these are shown in Figure 2.3 Statistical Capabilities of JMP . These capabilities span the univariate and multivariate statistical space. Univariate and multivariate graphing add to the statistical understanding of the data, especially for visualizing patterns and relationships. The statistical capabilities include the usual univariate descriptive statistics as well as standard multivariate methods such as regression analysis. It also has powerful methods such as correspondence analysis, factor analysis, discriminant analysis, clustering algorithms, and experimental design functions that are state-of-the-art. I illustrate these for market research applications throughout this book.

Figure 2.3 Statistical Capabilities of JMP

An important feature of JMP is the ability to define a variable's modeling type, that is, a property that determines how it should be used in statistical analysis. Once the modeling type is set, JMP presents analysis choices that are appropriate for that type of variable. For instance, if purchase intent is the dependent variable in a regression model but intent is coded simply as Yes and No (i.e., binary), then JMP will only present logistic regression functions for modeling. This is a great advantage because it makes it easy for even novice users to choose the right methods for the data that they are analyzing and easily avoid using the wrong methods.
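The idea of letting the modeling type drive the choice of method can be sketched as a simple lookup in Python. This is an illustration only and is not JMP's internal logic: the pairing of a nominal dependent variable with logistic regression comes from the text, while the other two entries and the function name are assumptions made for the example.

```python
# Illustrative mapping only; this is not JMP's internal logic.
# The nominal -> logistic regression pairing comes from the text;
# the other two entries are assumptions added for this sketch.
SUGGESTED_METHOD = {
    "continuous": "least squares regression",
    "nominal": "logistic regression",
    "ordinal": "ordinal logistic regression",
}

def suggest_method(dependent_modeling_type):
    """Return a suggested method for the dependent variable's modeling type."""
    key = dependent_modeling_type.lower()
    if key not in SUGGESTED_METHOD:
        raise ValueError(f"Unknown modeling type: {dependent_modeling_type}")
    return SUGGESTED_METHOD[key]

# Purchase intent coded as Yes/No is a nominal variable.
purchase_intent_method = suggest_method("Nominal")
print(purchase_intent_method)
```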
Dynamic Graphs and Tables

Data are the building blocks for discovering information about markets and the opportunities that they offer. To accomplish this, data can be "looked at" using simple, static tables or pie and bar charts. Or, they can be organized and studied (i.e., analyzed) dynamically with powerful graphing and tabulation tools that allow drilling down and linking data for insight. Static analyses lead to shallow, naive, and incomplete views of a market; dynamic analyses lead to penetrating insight and understanding.
The graphing and tabulation capabilities of JMP allow you to dynamically:



build graphs to drill down on data;

build summary tables of key statistics;

link data for penetrating insight.

The JMP Graph Builder, the centerpiece of its dynamic graphing capability, allows you to create different views of your data by dragging variables to different locations on a palette to paint a picture. But the picture can be dynamically changed to another form for different views. In addition, a data filter can be used to focus on one subset, or “slice,” of the data table. Finally, clicking on segments of a graph highlights rows in the associated data table and segments in other graphs. Some of these features allow you to:



build simple to complex scatter plots;

change scatter plots to box plots or histograms;

add smooth lines to scatter plots to see general patterns or trends;

build panel graphs;

create pie and bar charts;

link to the data table for further analysis.

I illustrate these features throughout this book.
Scripting Capabilities

Some advanced data management tasks require the use of a programming language. The JMP Scripting Language ( JSL ) is a powerful statistical, data-handling language unto itself. It has the expected language capabilities such as:



conditional statements;

looping;

table manipulation

to name a few. In addition, it has many built-in probability distributions, matrix operations, and statistical methods that make it especially useful for wrangling data in preparation for statistical analysis.
I don’t discuss the scripting language in this book, but it is referred to as needed.
Integrating with SAS and R

Increasing the Power and Flexibility of JMP

No statistical package can be all encompassing. JMP is no different despite its power, flexibility, and versatility. In many instances, you’ll have to draw on other packages for specialized routines or methods. JMP integrates with both SAS and R; you can build on these software packages to expand and enhance the functionality of JMP. [ 3 ]

SAS Integration

SAS is undeniably the most powerful and comprehensive statistical package available. Because of the tight integration with JMP, the power of SAS is readily available through JMP. With the SAS integration, you can:



read and write SAS data sets;

submit SAS programs;

retrieve results and graphs as JMP data tables or reports.

The use of SAS is illustrated several times throughout this book.

R Integration

R is considered by many to be the de facto standard for statistical analysis. It's a programming language that is function based, meaning that every command is a function. [ 4 ] R is a dialect of the S programming language originally developed at Bell Labs for handling data analyses. R is built on the premise that its functionality can be extended through packages written and made readily available through a worldwide network of users. There are literally thousands of R packages available.
Its data focus, package extensibility, and the fact that it's free make it the product of choice among many, especially in academia. There are four problems with R, however. [ 5 ]


It has a very steep learning curve, especially considering that everything in R must be programmed; there are few graphical interfaces that remove the burden of programming. Market researchers who are challenged by programming, let alone statistics, would certainly shy away from R for this reason alone.

R’s output is completely unadorned, meaning that computation results are returned as lists or the simplest, most basic tables. Presentation quality output must be created elsewhere or by using specialized R packages.

Finding a needed function can become a task unto itself because the function can be in any package, and in fact the same function name can appear in different packages. Also, there are several functions that do the same things but have different names and argument lists.

R is unforgiving. It does exactly what you ask it to do even if it's inappropriate for your data. This is a big issue for market researchers who may have to work with many different types of data with specialized analysis needs.
JMP integrates with R so that, like the SAS integration, data can be sent to R to take advantage of specialized packages and have results returned to JMP either as data tables or graphs. This makes JMP an effective front-end for R, greatly increasing the functionality of JMP.

Integrating with Other Software

JMP allows you to integrate with software that can be executed through a command-line call. For instance, you could execute a Python script with pandas function calls for special data processing. From JMP, you would first export data as a CSV file (which is discussed in Chapter 3), execute the JSL RunProgram function with appropriate options, and then import results for further processing in JMP. This capability is not demonstrated in this book.
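The external side of such a round trip might look like the following minimal sketch. It is an illustration only: the column names (segment, spend), the sample contents, and the function names are hypothetical, and it uses Python's standard csv module rather than pandas so that it runs without extra packages. The JMP side (the CSV export and the RunProgram call) is not shown.

```python
import csv
import io
from collections import defaultdict


def mean_spend_by_segment(rows):
    """Return {segment: mean spend}, skipping rows with a missing spend."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        if row["spend"] == "":      # empty cell = missing value
            continue
        totals[row["segment"]] += float(row["spend"])
        counts[row["segment"]] += 1
    return {seg: totals[seg] / counts[seg] for seg in totals}


def process(in_stream, out_stream):
    """Read the exported CSV and write one summary row per segment."""
    means = mean_spend_by_segment(csv.DictReader(in_stream))
    writer = csv.writer(out_stream, lineterminator="\n")
    writer.writerow(["segment", "mean_spend"])
    for seg in sorted(means):
        writer.writerow([seg, f"{means[seg]:.2f}"])


# Stand-in for a CSV file exported from JMP (hypothetical contents).
exported = io.StringIO("segment,spend\nA,10\nA,20\nB,5\nB,\n")
result = io.StringIO()
process(exported, result)
print(result.getvalue())
```

In a real workflow the two in-memory streams would be replaced by the exported file and a results file that JMP then imports.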
Points to Remember

The main point of this chapter is that JMP can handle all your market research needs. This is summarized in Figure 2.4 Market Research Functions in JMP . JMP has a depth of functionality for market research as shown in this "fishbone" chart.

Figure 2.4 Market Research Functions in JMP

Exercises

Research Concepts



Distinguish between documentation and reproducibility. Why is each important?

What are metadata?

What are three tasks of data management?

Describe data wrangling. Is this an iterative process? How so?

Discuss data wrangling processes typically done before any analysis of data.

What data management activities are associated with data wrangling?

JMP Concepts



Briefly describe how JMP can help you with documentation and analysis reproducibility. Why should you be concerned with reproducibility?

What is a script? Why are scripts useful? Do you need to know the scripting language to successfully use JMP?

List some uses of Table Variables .

Discuss some statistical capabilities in JMP that would be used by you or your organization.

What does it mean to say that JMP is extensible?

What file formats does JMP support?

Discuss some of the capabilities of the JMP Graph Builder .
Works Cited

Cross, David. (2001). Data Munging with Perl. Greenwich, CT: Manning.
Dasu, Tamraparni, and Theodore Johnson. (2003). Exploratory Data Mining and Data Cleaning. Hoboken, NJ: John Wiley & Sons.
Fomel, Sergey, and Jon F. Claerbout. (January 2009). Reproducible Research. Computing in Science & Engineering , Vol. 11, No. 1, pp. 5–7.
Gandrud, Christopher. (2015). Reproducible Research with R and RStudio. 2d ed. Boca Raton, FL: CRC Press.
Kandel, Sean, Jeffrey Heer, Catherine Plaisant, Jessie Kennedy, Frank van Ham, Nathalie Henry Riche, Chris Weaver, Bongshin Lee, Dominique Brodbeck, Paolo Buono. (October, 2011). Research directions in data wrangling: Visualizations and transformations for usable and credible data. Information Visualization , Vol. 10, No. 4, pp. 271–288.
Kazil, Jacqueline, and Katharine Jarmul. (2016). Data Wrangling with Python. Sebastopol, CA: O'Reilly Media, Inc.
McCullough, B. D. (June 15, 2008). Microsoft Excel’s 'Not The Wichmann–Hill' random number generators. Computational Statistics & Data Analysis , Vol. 52, No. 10, pp. 4587–4593.
McCullough, B. D. (June 15, 2008). Special section on Microsoft Excel 2007. Computational Statistics & Data Analysis , Vol. 52, No. 10, pp. 4568–4569.
McCullough, B. D., and David A. Heiser. (June 15, 2008). On the accuracy of statistical procedures in Microsoft Excel 2007. Computational Statistics & Data Analysis , Vol. 52, No. 10, pp. 4570–4578.
McKinney, Wes. (2013). Python for Data Analysis. Sebastopol, CA: O'Reilly Media, Inc.
Su, Yu-Sung. (June 15, 2008). It’s easy to produce chartjunk using Microsoft Excel 2007 but hard to make good graphs. Computational Statistics & Data Analysis , Vol. 52, No. 10, pp. 4594–4601.
Utlaut, Theresa L., Georgia Z. Morgan, and Kevin C. Anderson. (2011). JSL Companion: Applications of the JMP Scripting Language. Cary, NC: SAS Institute Inc.
Yalta, A. Talha. (June 15, 2008). The accuracy of statistical distributions in Microsoft Excel 2007. Computational Statistics & Data Analysis , Vol. 52, No. 10, pp. 4579–4586.
End Notes
1: This is a narrow approach to reproducibility in that I focus only on project work for a market research client, either an internal or external client. For academic researchers, this is a very big issue since their research findings must be replicated and duplicated by others as part of the scientific approach. I'm not concerned with the academic in this book, but certainly JMP will help here as well. [ return ]
2: Metadata is data about the data; basically, it's documentation. For an extensive discussion and references on metadata, see the Wikipedia article http://en.wikipedia.org/wiki/Metadata. Last accessed on November 3, 2015. [ return ]
3: MATLAB integration is also available but this is not covered in this book. [ return ]
4: See http://adv-r.had.co.nz/Functional-programming.html. Last accessed on November 3, 2015. [ return ]
5: See http://www.kdnuggets.com/2015/05/r-vs-python-data-science.html for discussions of other R problems and a comparison to Python. Last accessed on November 3, 2015. [ return ]
Chapter 3: A Very Short Introduction to JMP


Introduction
JMP Workspace
Data Table
JMP Reports
Points to Remember
Exercises
Works Cited
End Notes
Introduction

JMP has many powerful features for the thorough and dynamic analysis of market data, capabilities that go beyond software used by most analysts. My goal in this chapter is to provide a very short introduction to JMP that helps new users understand the software's structure and features. For other information on using JMP, see the three JMP books Discovering JMP , Using JMP , and Basic Analysis that come bundled with your JMP installation. The material below shows the Windows installation, though the Mac version is very similar. The scripting language ( JSL ), although mentioned in this chapter and referred to in others, is not covered in this book. See the JMP Scripting Guide for details on JSL . Also see Utlaut, Morgan, and Anderson (2011) for an excellent introduction to JSL .
You can skip this chapter on a first reading if you already have a good understanding of JMP. Otherwise, view it as a way to gain an overview of the JMP structure for analyzing market data and an understanding of some of the phraseology unique to JMP that is used throughout this book.
This chapter has three main sections.


The first section lays out the structure of the JMP workspace. Understanding this structure will make you more proficient in using JMP because you'll spend less time looking for menu options and ways to navigate data tables.

The second section describes the JMP data table and how it is different from a spreadsheet and other software packages' methods of storing data.

The third section provides a brief overview of JMP reports.
JMP Workspace

Introduction

When you first open JMP you see a workspace, or JMP Home Window , divided into two panes: Recent Files and Window List . [ 1 ] There is a main menu bar at the top of the workspace that provides the usual software options plus options specific to JMP. Below the main menu bar is a tool bar with short-cut icons. You should be familiar with the general structure and functionality of the menu and tool bar concept from other Windows products. JMP uses the same concepts, but, of course, it has its own twists added. These are shown in Figure 3.1 JMP Workspace .

Figure 3.1 JMP Workspace


Main Menu Bar

Introduction

The main menu bar has the usual menu options -- File , Window , and Help -- plus several others.

File

The File option allows you to do the following:



create new data tables (discussed below), scripts, Journals, and queries;

open files;

connect to databases;

establish connections with SAS;

interact with the Internet;

set preferences;

perform typical file functions such as print, send, and save.

The Preferences option is an important one because this is where you can make JMP "your own" by specifying the look and feel of reports, tables, graphs, fonts, and so forth. To access the Preferences dialog box:


Click File on the main menu bar.

Click Preferences .
Figure 3.2 Main Preferences Dialog Window shows the main Preferences dialog window with my general preference settings.

Figure 3.2 Main Preferences Dialog Window

An important Preferences option is Platforms . Platforms in JMP are used to organize the different types of statistical analyses. For example, there's a Categorical platform for categorical analysis, a Choice platform for choice modeling, a Graph Builder platform for creating graphs, a Fit Least Squares platform for regression analysis, and many more. Each one has a multitude of options for look and feel (e.g., colors), report content, and analysis specifications so you can tailor JMP to your liking. For example, the Distribution platform, which is often used for initial analysis, can calculate many different types of summary statistics. Most market researchers have their favorites and won't want to see all of them in most reports. Using File/Preferences/Platforms/Distribution you can set which statistics appear by default so that each time the Distribution platform appears your report is automatically customized to your needs.
Many of the platforms have personalities which are collections of analysis options within a platform. For example, the Fit Model platform has several personalities, each of which allows you to specify a particular type of regression model. Figure 3.3 JMP Fit Model Platform Personalities shows the personalities for this particular platform. I illustrate platforms and personalities appropriate for market research applications throughout this book.

Figure 3.3 JMP Fit Model Platform Personalities

If SAS is available either on a network server or locally on your computer, then a connection to SAS can be established. The connection establishes a hand-shake between the two programs so you can transfer data and submit SAS programming code. The drop-down menu options for a connection are shown in Figure 3.4 The JMP SAS Menu .

Figure 3.4 The JMP SAS Menu


Tables

Data tables are the main repository for your data. I discuss tables and their structure in a separate section below.

DOE

" DOE " stands for Design of Experiments , a very large, complex, and important sub-discipline of statistics. It should go without saying that before any form of statistical analysis can even begin, data must first be collected. For many applications, including market research, how the data are collected can substantively influence the results. A designed experiment can help ensure that relationships found in the analysis stage are "true" relationships and not artifacts of the collection scheme. Though mostly used in laboratory and industrial settings, experimental design can also benefit market researchers. Experimental designs applicable to choice studies are developed in Chapter 6.

Analyze

Two very important main menu options for market data analysis are Analyze and Graph . Analyze is where most of your statistical analysis is done. Tabulations and categorical analyses, model estimations including choice models, multivariate analyses, correlation analyses, and basic descriptive analyses are all platforms that can be selected here. The array of platforms is shown in Figure 3.5 Analyze Main Menu Items .

Figure 3.5 Analyze Main Menu Items

I focus on nine of these in this book, the ones most important for market data analysis.
1). Distribution
The Distribution platform calculates univariate, descriptive summary statistics appropriate for categorical and continuous variables and draws histograms and boxplots as appropriate. There are many summary statistics that can be calculated. To make the displayed list manageable, you can specify the ones you typically use (e.g., mean, standard deviation, and range) and set the look and feel of histograms in File/Preferences . For example, I set my histogram preferences so that I see a horizontal display, with percents over the bars, and a scale that shows percents rather than frequency counts.
If you want to also see this same type of histogram display as a default, then do the following:


Click on File/Preferences.

Click on Platforms in the Preference Group list.

Click on Distribution in the Platform list.

Check the options as in Figure 3.6 My Default Histogram Settings .

Click the OK action button at the bottom of the window.
Your defaults are now set.

Figure 3.6 My Default Histogram Settings

2). Fit Y by X
This is a general platform for analyzing a simple statistical model with a Y and an X, where Y is the dependent variable and X is the independent variable. The type of analysis you can do depends on the modeling type of the X and the Y variables (continuous, ordinal, or nominal). If the modeling types have been set properly (as discussed below), the user only needs to assign an X and a Y column. JMP then selects the appropriate analysis for that type of data. The possible analysis methods for the different modeling types for this platform are shown in Table 3.1 Fit Y by X Platform Analytical Options.
Table 3.1 Fit Y by X Platform Analytical Options

                              Y (Dependent Variable)
X (Independent Variable)      Continuous              Nominal/Ordinal
Continuous                    Bivariate Regression    Logistic Regression
Nominal/Ordinal               ANOVA                   Contingency Table
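The selection logic in Table 3.1 can be pictured as a simple lookup on the pair of modeling types. The sketch below is only an illustration of that idea in Python; the function and dictionary names are hypothetical and it does not represent how JMP implements the platform.

```python
# Lookup keyed by (Y modeling type group, X modeling type group),
# following the combinations listed in Table 3.1.
ANALYSIS = {
    ("continuous", "continuous"): "Bivariate Regression",
    ("continuous", "categorical"): "ANOVA",
    ("categorical", "continuous"): "Logistic Regression",
    ("categorical", "categorical"): "Contingency Table",
}


def group(modeling_type):
    """Collapse Nominal and Ordinal into one 'categorical' group."""
    name = modeling_type.lower()
    if name == "continuous":
        return "continuous"
    if name in ("nominal", "ordinal"):
        return "categorical"
    raise ValueError(f"unknown modeling type: {modeling_type!r}")


def choose_analysis(y_type, x_type):
    """Return the analysis named in Table 3.1 for a Y/X modeling-type pair."""
    return ANALYSIS[(group(y_type), group(x_type))]


# A Yes/No purchase-intent response modeled on a continuous price.
print(choose_analysis("Nominal", "Continuous"))
```

The point of the design is that the analyst declares the modeling types once and the choice of method follows from them.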
3). Tabulate
The Tabulate platform allows you to create tables through a drag-and-drop method. Variables can be dropped to row and/or column locations on a canvas to create one-way, two-way, and multi-way tables. Cell values can be easily changed to reflect different statistical summary measures such as counts, means, percentages, and sums. The measures are summarized in Table 3.2 Tabulate Platform Summary Statistics. In addition, you can nest variables to see how statistics differ across different combinations of values. Finally, the table can be converted to a data table for further analysis. A script can be saved, perhaps to the source data table, to reproduce a tabulation if needed. This is illustrated in Chapter 4.
Table 3.2 Tabulate Platform Summary Statistics

Statistic              Description
N                      Frequency count
Mean                   Arithmetic average
Std Dev                Standard deviation
Min                    Minimum
Max                    Maximum
Range                  Range = Maximum – Minimum
% of Total N           Percent of total values
Missing                Number of missing values
N Categories           Number of categories
Sum                    Sum of data
Sum Wgt                Sum of weights
Variance               Sample variance
Std Err                Standard error of the mean
CV                     Coefficient of variation
Median                 Median or second quartile (Q2)
Interquartile Range    Interquartile range = Q3 – Q1
Quantiles              Quantiles (e.g., Q1, Q2, Q3, Q4)
Column %               Percent of column total
Row %                  Percent of row total
All                    Grand or aggregate totals
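Several of the measures in Table 3.2 can be written out in a few lines, which may help clarify how they relate to one another. The following is a minimal sketch with made-up data, using Python's standard statistics module. It assumes no missing values and expresses the coefficient of variation as a percentage. Quantile-based measures are omitted because software packages differ in their quantile conventions.

```python
import statistics

values = [12.0, 15.0, 9.0, 20.0, 14.0]   # hypothetical data, no missing values

n = len(values)
mean = statistics.mean(values)
std_dev = statistics.stdev(values)        # sample standard deviation (n - 1)

summary = {
    "N": n,
    "Mean": mean,
    "Std Dev": std_dev,
    "Min": min(values),
    "Max": max(values),
    "Range": max(values) - min(values),
    "Sum": sum(values),
    "Variance": statistics.variance(values),   # sample variance
    "Std Err": std_dev / n ** 0.5,             # standard error of the mean
    "CV": 100 * std_dev / mean,                # assumed to be in percent
    "Median": statistics.median(values),
}

for name, value in summary.items():
    print(f"{name:10s} {value:.3f}")
```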
4). Text Explorer
This is an innovative platform for analyzing text, a process sometimes called Text Mining as opposed to Data Mining. [ 2 ] In market research, free text, or verbatim, questions often appear as part of a survey. This may be as simple as the ubiquitous “Other (Please Specify)” so common in questionnaires, or more complex, such as a question asking consumers to provide a word or phrase that describes a product or purchase experience. Market researchers often analyze text by manually coding the words and phrases (i.e., assigning them numbers or placing them into groups) and then counting the occurrences of each. This is a labor-intensive process, and because of the "need for speed" in market research, much of the verbatim text often goes unanalyzed. The Text Explorer platform alleviates this problem. Text Explorer is illustrated in Chapter 7.
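To make the idea of automated coding and counting concrete, here is a minimal sketch of term counting on a few made-up verbatim responses. It illustrates the general technique only and is not a description of how Text Explorer works; the stop-word list and the responses are hypothetical.

```python
import re
from collections import Counter

# Hypothetical list of common words to ignore.
STOP_WORDS = {"the", "a", "and", "is", "was", "to", "it", "too"}


def term_counts(responses):
    """Count how often each non-stop word appears across all responses."""
    counts = Counter()
    for text in responses:
        words = re.findall(r"[a-z']+", text.lower())
        counts.update(w for w in words if w not in STOP_WORDS)
    return counts


verbatims = [
    "The price was too high",
    "Great price and great service",
    "Service is slow",
]
counts = term_counts(verbatims)
for term, count in counts.most_common(3):
    print(term, count)
```

Even this crude count shows which themes (price, service) dominate the responses without any manual coding.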
5). Fit Model
The Fit Model platform is an extension of the Fit Y by X platform for advanced regression methods with multiple X s. Independent variables can be specified as main effects and interaction effects or nested in a hierarchy. All variables can also be transformed "on-the-fly" without changing the original data. The most common transformations for market analysis are the log (i.e., natural log) and logit transforms.
Table 3.3 Fit Model Platform Functions

Personality                  Description
Standard Least Squares       Conventional OLS
Stepwise                     Stepwise analysis for standard OLS and ordinal logistic analyses
Generalized Regression       Generalized regression analysis with a number of response distributions
Mixed Models                 Mixed models with fixed and random effects and repeated structures analyses
MANOVA                       Multivariate ANOVA
Loglinear Variance           Loglinear models
Nominal Logistic             Nominal logistic models
Ordinal Logistic             Ordinal logistic models
Proportional Hazard          Cox proportional hazards model for Survival Analysis
Parametric Survival          Survival analysis
Generalized Linear Models    Generalized Linear Models (GLM) with several distribution and link functions
Partial Least Squares        Partial least squares analysis
Response Screening           Linear models across a number of responses
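The log and logit transforms mentioned before Table 3.3 are simple to state. The sketch below, with made-up sales and share values, shows both transforms applied to copies of the data so that the original values are left unchanged, which mirrors the "on-the-fly" idea. The function and variable names are hypothetical.

```python
import math


def logit(p):
    """Log-odds of a proportion p, defined only for 0 < p < 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("logit requires a proportion strictly between 0 and 1")
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Map a log-odds value back to a proportion."""
    return 1.0 / (1.0 + math.exp(-z))


sales = [100.0, 250.0, 400.0]    # hypothetical unit sales
shares = [0.2, 0.5, 0.8]         # hypothetical market shares

# New lists are created; the original data are not modified.
log_sales = [math.log(s) for s in sales]          # natural log
logit_shares = [logit(p) for p in shares]

print(log_sales)
print(logit_shares)
```

The logit stretches proportions near 0 and 1 onto the whole real line, which is why it suits share and probability data.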
6). Predictive Modeling
This platform has a number of non-regression methods for modeling data, but for market data analysis the Partition platform is the one you would most likely use. The model for partitioning a data set is sometimes called a decision tree because it repeatedly splits (bifurcates) the data on the values of the independent variables to explain the dependent variable, and the results are shown as a tree diagram. The dependent and independent variables can have any one of the three modeling types: Continuous, Ordinal, or Nominal. The Partition platform is often used for segmentation studies, where customers are placed into groups according to their preferences. It is also useful for key driver analysis, where the researcher attempts to determine what product qualities drive a particular satisfaction measure.
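A single step of a decision tree can be sketched as a search for the cut point on one independent variable that best separates the dependent variable. The example below uses the within-group sum of squared errors as the criterion, which is a common textbook choice and an assumption here. JMP's Partition platform has its own split criteria and options, which are not reproduced. The data and names are made up.

```python
def sse(values):
    """Sum of squared deviations from the group mean."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (cut_point, total_sse) for the best single binary split of y on x."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue                      # cannot split between equal x values
        cut = (pairs[i - 1][0] + pairs[i][0]) / 2
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        total = sse(left) + sse(right)
        if best is None or total < best[1]:
            best = (cut, total)
    return best


age = [22, 25, 31, 45, 52, 60]                   # hypothetical respondents
satisfaction = [3.0, 3.0, 4.0, 8.0, 9.0, 8.0]    # hypothetical ratings
cut, total = best_split(age, satisfaction)
print(cut, round(total, 3))
```

A full tree repeats this search within each resulting group, which is what produces the tree diagram.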
7). Multivariate Methods
This platform has several traditional multivariate methods: correlation analysis, principal components, discriminant analysis, and partial least squares. These are organized as subplatforms under the larger umbrella of Multivariate Methods. Correlations are illustrated in Chapter 4.
8). Clustering
A major function of market research is to segment the market, as mentioned in Chapter 1. Segmenting means finding groups of consumers who are similar to each other, but different from consumers in other groups (i.e., consumers are homogeneous within a group and heterogeneous among groups). The segments or groups are sometimes referred to as clusters. This menu provides a wealth of sophisticated clustering methodologies as platforms (Hierarchical, k Means, Normal Mixtures, and Latent Class analysis) that can be used to segment your market. These methods are illustrated in Chapter 7, which is devoted to market segmentation.
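The idea of "homogeneous within, heterogeneous among" can be illustrated with a bare-bones k-means procedure on a single variable. This is a textbook sketch with made-up spending data and hand-picked starting centers. It is not the algorithm as implemented in any JMP platform, and all names are hypothetical.

```python
def assign(values, centers):
    """Place each value in the cluster of its nearest center."""
    clusters = [[] for _ in centers]
    for v in values:
        nearest = min(range(len(centers)), key=lambda j: abs(v - centers[j]))
        clusters[nearest].append(v)
    return clusters


def k_means_1d(values, start_centers, max_iter=100):
    """Alternate assignment and center updates until the centers stop moving."""
    centers = list(start_centers)
    for _ in range(max_iter):
        clusters = assign(values, centers)
        # An empty cluster keeps its previous center.
        new_centers = [
            sum(c) / len(c) if c else centers[j] for j, c in enumerate(clusters)
        ]
        if new_centers == centers:
            break
        centers = new_centers
    return centers, assign(values, centers)


spend = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]   # hypothetical weekly spend
centers, segments = k_means_1d(spend, start_centers=[0.0, 5.0])
print(centers)
print(segments)
```

The two resulting groups (low spenders and high spenders) are tight internally and far apart from each other, which is the defining property of a useful segment.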
9). Consumer Research
This is a very important menu for market research because so much of research deals with understanding consumer behavior in a business-to-consumer (B2C) context, and many of the same techniques are relevant for business-to-business (B2B) problems. This menu contains platforms for Categorical Analysis, Multiple Correspondence Analysis, Factor Analysis, MaxDiff Analysis, and Choice Analysis. [ 3 ] All of these, except Factor Analysis, are illustrated later in this book: Categorical Analysis and Multiple Correspondence Analysis in Chapter 5, and MaxDiff Analysis and Choice Analysis in Chapter 6.

Graph

Next to the Analyze option, this is the second most important main menu option because it allows you to dynamically view your data. There are many graphing options below this menu option, but Graph Builder is the main one that is illustrated throughout the book.

Tools

Most of what is available under this menu option is useful for graph annotation and as aids to interacting with JMP graphs. I don’t say much about these tools in this book.

Add-ins

A strong feature of JMP is its extensibility via script packages written in JSL . These add-in packages can be developed by you or someone in your organization and distributed for general use by your colleagues. There is also a community of JMP users who write and distribute add-ins for general use. Packages are available at the JMP web site https://community.jmp.com/t5/Add-Ins/ct-p/Addins. Two that I find particularly useful are:


Interactive Binning

Switch to short or long names
Some data that are collected as continuous are better analyzed as categorical data. The process of turning a continuous variable into a categorical variable is called "binning" (because you are placing data values into groups or "bins"). The Interactive Binning add-in allows you to interactively create bins of continuous data by moving a vertical line on a histogram. Other vertical lines can be added by clicking a button to define new bins. You can also set cut-points at percentiles or based on the mean and the standard deviation. A new column with a formula can be added to your data table so that the newly binned data can be used in further analyses. This add-in is illustrated in Chapter 4.
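The binning operation itself is easy to sketch. The following minimal example assigns each continuous value to a labeled bin defined by a set of cut points. It is an illustration of the concept only, not the add-in's code. The ages, cut points, labels, and the choice that a value equal to a cut point goes into the upper bin are all assumptions.

```python
from bisect import bisect_right


def bin_values(values, cut_points, labels):
    """Assign each value to a labeled bin; a value equal to a cut point
    falls into the bin above that cut point."""
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    cuts = sorted(cut_points)
    return [labels[bisect_right(cuts, v)] for v in values]


ages = [19, 25, 34, 35, 52, 67]                        # hypothetical data
age_group = bin_values(ages, [35, 55], ["Under 35", "35-54", "55+"])
print(age_group)
```

The resulting list plays the role of the new categorical column described above.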
I find the second add-in to be useful when I receive SPSS-formatted data because these files usually have short variable names as well as long descriptive names (SPSS labels). This add-in allows you to switch back and forth between the two in a data table. The long names are good for reports while the short names are convenient for analyzing the data, but this is purely a personal preference.
Once an add-in is downloaded from the JMP website, installing it is very easy:


Click File/Open and navigate to the folder where you saved the add-in.

Select the type of file to open by clicking on the drop-down menu in the lower right corner of the File/Open dialog box as illustrated in Figure 3.7 Add-in Main Menu . You want the JMP add-in extension .jmpaddin.

Select the JMP add-in you want to install.

Click Open in the action area at the bottom of the window.
JMP will install the add-in and make it available through the Add-ins main menu option. You’re all set to use this package at any time.

Figure 3.7 Add-in Main Menu


View

This option allows you to view different windows, the most important of which is the Log if you write scripts in JSL. The Log is where all script commands are repeated and, most importantly, error messages are printed. This is very helpful during script development since errors will always occur -- no script is ever written perfectly the first time. If you're familiar with SAS, then you'll quickly see that the JMP log is similar to the SAS log. The Log is not discussed further since scripts are not a focus of this book.

Windows

The usual Windows features (such as Close , Minimize , Restore , etc.) are available here.

Help

Last, but not least, is the Help option. The usual help features are available, but the most important are:



Books

Statistics Index

Scripting Index

Sample Data

The Books option gives you access to all the JMP documentation in PDF format, so it is a great reference source. The Statistics and Scripting Indexes have examples of statistics and programming functions. The Sample Data option gives you quick access to sample data tables provided with a JMP installation. Many of these sample tables have scripts that show you the types of analyses that can be done, so this option is useful when first learning JMP.

Main Menu Bar Customization

You’re not restricted to just the main menu options factory-installed with JMP. You can create your own to customize JMP for your purposes and work style. For instance, I created two menu options to help me be more organized for teaching and delivering presentations and workshops. My Teaching menu option has a drop-down list of courses so I can quickly pick one and display the course roster, syllabus, lectures, demonstration scripts, and example data I need for that course. My Presentation Manager menu has a drop-down list of conference and workshop presentations that allow me to pick a presentation and relevant example data without having to search during the presentation. See Figure 3.8 My Customized Menu for an example. Menu structures can be created for other purposes such as project management.
To customize the main menu:


Right click anywhere on the main menu bar and select Customize/Menus and Toolbars. See Figure 3.9 Accessing the Customized Menu Options .

Select where you want the new menu item to be placed using the left panel of the dialog box that appears.

Complete the necessary information, including the name of the menu item and any other pertinent information about the item.

Press the OK action button and the menu item will be added to the main menu bar.

Figure 3.8 My Customized Menu


Figure 3.9 Accessing the Customized Menu Options


Recent Files Pane

File Management Features

The Recent Files section shows all previously opened files of any type, making it relatively easy to locate a file to reopen. Just double click on the file to open it. Alternatively, you can select the file and click on the file open icon on the small menu bar at the top of this section (first icon from the left). See Figure 3.10 File Management Icons .
The recent files list can become long, making it difficult to find the recent file you want. JMP has three file management features to ease the burden:


sorting;

filtering;

pinning.
These features can be accessed from the small set of five icons on the Recent Files top bar as illustrated in Figure 3.10 File Management Icons . From left to right, these icons are:


Open Selected JMP data table or file. This is a quick way to open a table.

Sort the files by name and most recently opened.

Filter files by their type (i.e., data table, report, script, etc.). The list of possible filter items is shown in Figure 3.11 File Management Filter Icon .

Change Icon Size.

Close the Recent Files pane (shown as a small “x”).

Figure 3.10 File Management Icons


Figure 3.11 File Management Filter Icon


Sorting

You can sort your file list by right clicking just to the left of the file list and picking Sort by Name or Sort by Most Recent Usage from the pop-up menu. The sorting function can also be accessed from the sort icon on the small menu bar (second icon from the left next to the open icon).

Filtering

The file filter, accessed by an icon also on the small menu bar (third icon from the left) allows you to narrow the list to certain file types (JMP data table, script file, text file including CSV formatted files, etc.).

Pinning

If you have a file, or set of files, that you use often, you can pin it to the top of the file list so that it's always available. Just select the file to pin, right click as if to sort, but instead select Pin File from the pop-up menu. To unpin a file, just select it, right click, and select Unpin File . If you sort the files, the pinned files are also sorted, but as a separate group at the top of the list. Finally, if you use the file filter, the pinned files are also filtered.

Window List Pane

All currently open, active JMP windows, such as data tables, script windows, the Log window, and reports are listed in this section. Clicking on the window name brings the window to the front. A Filter icon similar to the one described above is available to narrow the list.
Open reports based on a particular data table are listed under that data table in an outline format. See Figure 3.12 Window List . In this example, the data table Big Class has a chart for sex and a distribution report for height listed under it. Both can be quickly accessed. Big Class is a sample data table installed with JMP and often used for examples in the JMP documentation. It has five variables for 40 people: Name , Age , Sex , Height , and Weight .
To open the Big Class data table:


Select Help/Sample Data from the main menu bar.

Click on the Open Sample Data Directory button.

Navigate to the Big Class.jmp file, where .jmp is the JMP data table file extension.

Click on the file and it will open.

Figure 3.12 Window List

Data Table

The Data Table Paradigm

The data table (much like a SAS data set, R data frame, or pandas DataFrame in Python) is the main data repository in JMP. Data tables are like spreadsheets with rows and columns forming a rectangular array, but the similarity ends there. The rows are the observations (e.g., objects, people, firms, instances, or time periods) for which data are collected, while the columns are the variables or measures on those objects. In a JMP data table, operations (equations or formulas) are done at the column level, unlike in a spreadsheet where the operations are done at the cell level and have to be copied from one cell to another. [ 4 ] You can therefore think of a JMP data table as being column-centric instead of cell-centric. Scripts can be written to operate on a data table as in SAS, R, and pandas, but this is not necessary since there are ample menu-driven options for doing almost all needed operations.
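The column-centric idea can be sketched in a few lines of Python: one formula is defined once for the column and is evaluated for every row, rather than being typed into one cell and copied down. This is an analogy only, not JMP or JSL code; the table contents and the helper name are hypothetical.

```python
# A tiny "data table" stored column by column (hypothetical values).
table = {
    "price": [2.50, 3.00, 4.25],
    "units": [10, 4, 8],
}


def add_formula_column(table, name, formula):
    """Evaluate one row-level formula for every row and store the
    results as a new column."""
    n_rows = len(next(iter(table.values())))
    rows = [{col: vals[i] for col, vals in table.items()} for i in range(n_rows)]
    table[name] = [formula(row) for row in rows]


# The formula is stated once, for the whole column.
add_formula_column(table, "revenue", lambda row: row["price"] * row["units"])
print(table["revenue"])
```

Because the formula belongs to the column, every row is guaranteed to use the same calculation, which removes the copy-and-paste errors that cell-level formulas invite.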
There are four sections to a JMP data table:


Documentation and scripting;

Column identification: number of variables, variable types, and current variable states;

Row features: number of rows, number of active rows, row states;

Data grid.
These sections are highlighted in Figure 3.13 Basic Data Table Structure. Notice that each section has a small red triangle that points downward. These icons open menus that are called red triangle menus (or RTMs).
Red triangle menus appear on almost all JMP reports and in many other windows. They are always context-specific, meaning that the menu selections are tailored to the report or section where they appear. Going forward in this book, I refer to the red triangle menu as the RTM because it is so important and is referred to so frequently.

Figure 3.13 Basic Data Table Structure

Each section in the data table has a separate drop-down menu of options accessed by clicking the RTM. The data grid has two RTMs: one for the columns and one for the rows.
I describe each part in detail in the following four sections.

Documentation and Scripting

Introduction

There are two ways you can document data and analysis steps:


Table Variables;

Table Scripts.

Table Variables

Table Variables , briefly mentioned in Chapter 2, can hold any documentation; basically, metadata. Metadata are important because any project can quickly generate a plethora of tables, sometimes disconnected from each other. Table variables can be used to record important information such as the project title, client name, the original source for the data, and a brief description of the data's purpose for answering the client's question.
To add a Table Variable :


Open a data table or select a data table that is already open. In Figure 3.14 Example Table Variable , the Big Class data table is open as an example.

Click the RTM at the upper left of the scripts and documentation pane and select New Table Variable from the pop-up menu.

Give the Table Variable a name in the Name box and enter whatever content you want in the Value box. See Figure 3.14 Example Table Variable for an example for the Big Class data table.

Figure 3.14 Example Table Variable


Table Scripts

Scripts, called Table Scripts , can be saved to the data table to recreate an analysis, report, or graph thus enabling reproducibility. I describe this in Chapter 4.

Columns

Background

The Columns section is an important part of the data table display because it lets you manage your variables. The top of this section indicates the number of columns in the table and the number currently selected. Just below this is a list of the columns in the data table. See Figure 3.13 Basic Data Table Structure.
When you right click on the icon to the left of a column name, a small menu appears that allows you to set the column's modeling type, which tells JMP which statistical analyses can be performed on that column. The valid choices depend on the data type. Character data (data stored as text or strings) can have a modeling type of Nominal or Ordinal, with Nominal as the default. Numeric data can have a modeling type of Continuous, Ordinal, or Nominal, with Continuous as the default.
Ordinal variables have only a few distinct levels that have an inherent order. A survey question with the possible answers "Agree," "Neutral," or "Disagree" would have an ordinal modeling type. Nominal variables also have only a few distinct values, but they do not have an inherent order. A survey question such as "What is your favorite color?" with the possible answers blue, red, yellow, or green would have a nominal modeling type. Verbatims for the “Other (Please Specify)” option found in most questionnaires are probably independent of any order, so they are nominal as well. The Character/Nominal icon is also shown in Figure 3.15 Modeling Icons.
A continuous numeric variable corresponds to a ratio or interval scaled measure and has a decimal part, such as 3.14159, although the decimal doesn’t have to be shown. So 3 is the same as 3.0. The difference between an ordinal numeric variable and a continuous numeric variable is not always obvious. The survey literature contains much discussion about when ordered variables can be treated as continuous, and how the scale used affects analysis outcomes. See Stevens (1946) for the original discussion on scales. Also see the Wikipedia article "Level of measurement" at https://en.wikipedia.org/wiki/Level_of_measurement, last accessed September 21, 2016.
The small icon next to the variable name in the column pane identifies the modeling type assigned to a variable. The icons are shown in Figure 3.15 Modeling Icons. You can change the modeling type by right clicking on an icon and selecting the new modeling type from the pop-up menu.
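The rules above about which modeling types go with which data types can be written down as a small lookup. This is only a sketch in plain Python, under the assumption that a column is a list of values; the names VALID_MODELING_TYPES, data_type, default_modeling_type, and set_modeling_type are hypothetical and are not part of JMP.

```python
# Valid modeling types for each data type, as described in the text.
# The first entry of each tuple is the default.
VALID_MODELING_TYPES = {
    "Numeric": ("Continuous", "Ordinal", "Nominal"),
    "Character": ("Nominal", "Ordinal"),
}


def data_type(values):
    """Classify a column as Numeric or Character, ignoring missing (None) cells."""
    non_missing = [v for v in values if v is not None]
    is_numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_missing
    )
    return "Numeric" if is_numeric else "Character"


def default_modeling_type(values):
    """Return the default modeling type for the column's data type."""
    return VALID_MODELING_TYPES[data_type(values)][0]


def set_modeling_type(values, requested):
    """Return `requested` if it is valid for the column; otherwise raise."""
    allowed = VALID_MODELING_TYPES[data_type(values)]
    if requested not in allowed:
        raise ValueError(
            f"{requested} is not valid for {data_type(values)} data; "
            f"choose one of {allowed}"
        )
    return requested


print(default_modeling_type([59, 61.5, 55]))  # numeric column
print(default_modeling_type(["F", "M", "F"]))  # character column
print(set_modeling_type([1, 2, 3], "Ordinal"))  # numeric codes treated as ordered
```

Asking for Continuous on a character column raises an error in this sketch, which mirrors the fact that Continuous is offered only for numeric data.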

Figure 3.15 Modeling Icons

Data types and modeling types are summarized in Table 3.4 Variable Types in JMP.
Table 3.4 Variable Types in JMP
| Modeling Type | Use | Examples: General | Examples: Marketing |
| --- | --- | --- | --- |
| Continuous (numeric data only) | Show decimals | 3.14159 | Price; Time spent; Agree/Disagree (Interval level) |
| Ordinal | Show order; strings | First, second, third; Low, medium, high; any text or string | Numeric coding for: Rank brands; Agree/Disagree (Non-interval level); Household Income (Ranges); “1”, “2”, “3” |
| Nominal | Show choice, categories; capture expressions, opinions, any text | 0/1 Dummy/binary/indicator variable; any text or string | Numeric coding for: Yes/No; Male/Female; Buy/Don't Buy; Agree/Disagree; Marital Status; “What word best describes this product?”; “Other (Please Specify)”; ZIP Code |
There are other, specialized column types that can be used to help document data or for advanced analysis. See the Using JMP book included with your documentation.
Clicking the RTM for the Column Identification section opens a menu of options for managing columns. The most important are:


New Columns;

Column Info;

Standardize Attributes;

Hide/Unhide;

Exclude/Unexclude;

Recode;

Utilities;

Group Columns;

Ungroup Columns.
I’ll discuss a few of these next. See the JMP book Using JMP that comes bundled with your JMP installation for a complete explanation of all of the properties. This book can be accessed using Help/Books/Using JMP .

New Columns

This allows you to add a new column to the data table (you can also add multiple columns at once). When you create a new column, you can specify its Data Type (i.e., Character or Numeric), its Modeling Type, and its format (e.g., fixed decimal, percent, currency), and you can set its properties. Properties are characteristics of the columns the way Row States are characteristics of the rows. There is a long list of properties; the most important for the analyses shown in this book are displayed in Table 3.5 Column Properties.
Table 3.5 Column Properties
| Property | Function |
| --- | --- |
| Formula | Create and edit a formula for the whole column |
| Notes | Record a note about the column |
| Value Labels | Assign labels to numeric values |
| Value Ordering | Specify the order in which the data will be handled (e.g., 0 before 1 or 1 before 0; the default is 0 before 1) |
| Missing Value Codes | Specify what numeric values should be treated as missing data |
| Other | Define your own property |
Column properties are accessed by right clicking on a column and selecting either Column Info… or Column Properties from the pop-up menu. I recommend that you use Column Info… because it gives you a complete picture of the column (i.e., variable).

Figure 3.16 Accessing JMP Column Properties


Formula

Formulas tell JMP to perform an operation (calculation, conditional assignment, random number generation, character extraction, etc.) on each row in the column. This is a major difference between JMP and a spreadsheet program, where a formula applies to a single cell and must then be copied appropriately to all the other cells in the column; the spreadsheet is cell-centric. In JMP, a single formula is applied automatically to all cells (i.e., rows) of the column at once; JMP is column-centric. The Formula property is where a formula is created, edited, and stored. A discussion on creating formulas would require a chapter unto itself. See the JMP book Using JMP that comes bundled with your installation of JMP. You can access the book by using Help/Books.

Notes

Notes can add additional information about a variable beyond the name. For example, a common question in a customer satisfaction survey is the likelihood to recommend the brand. The variable name is sometimes just recorded as "rec." A Note would identify this as, say, "Likelihood to recommend to a family member, friend, or coworker." Notes are displayed whenever the mouse cursor is hovered over the column name, either in the Columns section, or in the data grid. A Note could also record where that specific variable came from, any issues you should remember, and so forth.

Value Labels

The Value Labels property allows you to associate meaningful labels or tags with the stored column values. For example, it’s not uncommon in a questionnaire to code gender as “1 = Male” and “2 = Female.” The respondent would check the appropriate box, but the numeric codes would appear in the data table. Some columns may have a large number of codes. With the Value Labels property, you can assign the label “Male” to 1 and the label “Female” to 2. You then have the option to display these labels instead of the numeric codes in the data table, reports, and graphs, something I definitely recommend.
To turn Value Labels on or off:


Right click on the column you want to change.

Check or uncheck Use Value Labels from the pop-up list.
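The idea behind Value Labels can be sketched in a few lines of plain Python. This is an analogy rather than JMP's mechanism: the stored codes never change, and a separate mapping decides what is displayed. The function name display_values and its use_value_labels switch are hypothetical, chosen to echo the menu item above.

```python
# Stored values are the numeric codes from the questionnaire.
gender_codes = [1, 2, 2, 1, 2]

# The label mapping is kept separately from the data.
value_labels = {1: "Male", 2: "Female"}


def display_values(values, labels, use_value_labels=True):
    """Return what would be shown: labels if switched on, otherwise raw codes.

    Codes without a label are shown unchanged.
    """
    if not use_value_labels:
        return list(values)
    return [labels.get(v, v) for v in values]


print(display_values(gender_codes, value_labels))
print(display_values(gender_codes, value_labels, use_value_labels=False))
```

Because the codes themselves are untouched, switching the labels off again loses nothing.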

Figure 3.17 Gender Coding Snippet


Value Ordering

JMP automatically sorts values in alphanumeric order, meaning that “a” comes before “b”, which comes before “c”, etc. and 0 comes before 1 which comes before 2, and so on. This may not be what you want. For example, referring to the Male/Female coding above, you will always have males appearing before females in a report since 1 comes before 2. If you’re studying the jewelry market where women are the dominant owners of jewelry, you may want females first to emphasize their role in this market.
You can change the value order using the Value Ordering property, accessed from Column Properties. JMP initially places the values in alphanumeric order, as described above. Figure 3.18 Value Ordering illustrates the initial ordering for the sex variable in Big Class. Notice that Females (“F”) is listed before Males (“M”).

Figure 3.18 Value Ordering

To change the value ordering:


Select the column you want to change.

Right click on this column and select Column Info … from the pop-up menu.

Select Value Ordering from the property list on the left of the column information dialog box. The values are initially in alphanumeric order if this is the first time you’re accessing the Value Ordering property. Otherwise, they’re in whatever order you previously specified.

To move an item in the list, select the item and press the Move Up or Move Down button as needed.

If you only need to reverse the existing order, use the Reverse button. This reverses the entire list.

Click the OK action button on the Column Info… dialog box to save the order.
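The effect of a custom value order on a report can be sketched as follows. This is a plain-Python illustration under stated assumptions, not JMP's algorithm: the function ordered_counts is hypothetical, the default order is ordinary sorted order, and any level missing from the custom list is appended in sorted order.

```python
from collections import Counter

# Illustrative values for a two-level column such as sex in Big Class.
sex = ["M", "F", "F", "M", "F"]


def ordered_counts(values, value_order=None):
    """Count each level and report the counts in the requested order.

    With no `value_order`, levels appear in sorted (alphanumeric) order.
    Levels not named in `value_order` follow the named ones, sorted.
    """
    counts = Counter(values)
    if value_order is None:
        levels = sorted(counts)
    else:
        listed = [level for level in value_order if level in counts]
        unlisted = sorted(level for level in counts if level not in value_order)
        levels = listed + unlisted
    return [(level, counts[level]) for level in levels]


print(ordered_counts(sex))                          # default order: F first
print(ordered_counts(sex, value_order=["M", "F"]))  # custom order: M first
```

The counts are identical in both calls; only the order in which the levels are presented changes, which is what drives the placement of bars, slices, and table rows.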
The order of values determines the order of bars in bar charts, slices in pie charts, and rows and columns in tables. Changing the order in these reports can have an impact on how results are interpreted, something I illustrate in Chapter 5 regarding odds ratio calculations. Table 3.6 Value Ordering and Placements shows the relationship between the Value Order and the placement of the associated bars, slices, rows, and columns. For example, for a bar chart with vertical bars, the first value in the Value Ordering (starting at the top of the list) will be the first bar at the left of the chart while the last value will be the last bar at the right. Similarly, for a bar chart with horizontal bars, the first in the list will be the bottom bar and the last will be the top bar.
Table 3.6 Value Ordering and Placements
| Object | Object Part | Value Ordering: First in List | Value Ordering: Last in List |
| --- | --- | --- | --- |
| Bar Chart | Vertical | Bar at Left | Bar at Right |
| Bar Chart | Horizontal | Bar at Bottom | Bar at Top |
| Bar Chart | Stacked Vertical | Bar at Bottom | Bar at Top |
| Bar Chart | Stacked Horizontal | Bar at Left | Bar at Right |
| Bar Chart | Side-by-Side Vertical | Bar at Left | Bar at Right |
| Bar Chart | Side-by-Side Horizontal | Bar at Bottom | Bar at Top |
| Pie Chart | Counter clockwise | First Slice | Last Slice |
| Table | Rows | Top of Table | Bottom of Table |
| Table | Columns | Left of Table | Right of Table |

Missing Value Codes

Many questionnaires have a “Don’t Know” as an option for a question. You may not want to have this displayed in any reports or be used in calculations. You can tell JMP to treat a “Don’t Know” as a missing value by adding it to the column's Missing Value Codes. For example, if the possible options for a question are “1 = Yes”, “2 = No”, and “3 = Don’t Know,” then you would specify 3 as a missing value code for this question. You can specify as many missing value codes as necessary for a variable.
To add a value to the column’s missing value codes:


Select the column you intend to change.

Right click on the column and select Column Info … from the pop-up menu.

Select Missing Value Codes from the property list on the left of the column information dialog box.

Enter the value you want treated as a missing value. For example, if “3 = Don’t Know” is in the column and you want 3 to be treated as a missing value, then enter 3. If a character string represents a missing value, such as “.V”, then enter the string.

Click the OK action button on the Column Info… dialog box to save the codes.
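To see why a missing value code matters for calculations, consider this minimal plain-Python sketch. It is an analogy only: the function apply_missing_value_codes and the sample answers are assumptions made for illustration, with None standing in for a missing value.

```python
def apply_missing_value_codes(values, missing_codes):
    """Return a copy of `values` with every listed code replaced by None."""
    codes = set(missing_codes)
    return [None if v in codes else v for v in values]


# Illustrative answers coded 1 = Yes, 2 = No, 3 = Don't Know.
answers = [1, 2, 3, 1, 1, 3, 2]

cleaned = apply_missing_value_codes(answers, missing_codes=[3])
valid = [v for v in cleaned if v is not None]

# Share of "Yes" among respondents who gave a substantive answer.
share_yes = valid.count(1) / len(valid)
print(cleaned)
print(share_yes)
```

Without the missing value code, the two "Don't Know" answers would stay in the denominator and the share of "Yes" would be 3 out of 7 rather than 3 out of 5.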

Other

The Other property allows you to create your own properties simply by naming them. So you could create, for example, a new property called “Source” to describe the source for the column. Just enter a name for the property (e.g., “Source”) and a space will be created for you to enter text for the property.

Standardize Attributes

Standardize Attributes is a powerful tool for changing the same property for several columns at once, i.e., to “standardize” them. For example, you may want to ask consumers in a customer satisfaction study to rate a number of service attributes (sometimes called customer touch points) on a 5-point Likert Scale. You may provide a “Don’t Know” option, coded 99, for those who have no experience with a particular service attribute. This “Don’t Know” can be defined as missing for all the service attributes at once using missing value codes. To do this:


Select the columns to standardize.

Click the RTM in the Column Identification section or right click on the selected columns.

Select the Standardize Attributes option.

Press the Column Properties button in the Standardize Attributes section of the dialog box.

Select the Missing Value Codes option.

Enter 99 for the code to treat as missing and click the Add button. You can add as many missing value codes as needed.

Click the OK action button.
All the selected columns will have the Missing Value Codes property, with a missing value code of 99. See Figure 3.19 Standardizing Column Attributes for an example.
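The same idea, setting one property on several columns in a single step, can be sketched in plain Python. Everything here is hypothetical and for illustration only: the column names, the ratings, and the helpers standardize_missing_value_codes and column_mean are not JMP features.

```python
# Each column carries its values and a dictionary of properties.
# The ratings are illustrative 5-point scores, with 99 meaning "Don't Know".
columns = {
    "wait_time": {"values": [5, 4, 99, 3], "properties": {}},
    "courtesy": {"values": [99, 5, 4, 4], "properties": {}},
    "billing": {"values": [2, 99, 99, 5], "properties": {}},
    "respondent": {"values": [101, 102, 103, 104], "properties": {}},
}


def standardize_missing_value_codes(columns, selected, codes):
    """Give every selected column the same Missing Value Codes property."""
    for name in selected:
        columns[name]["properties"]["Missing Value Codes"] = list(codes)


def column_mean(column):
    """Mean of a column, skipping any value listed as a missing value code."""
    missing = set(column["properties"].get("Missing Value Codes", []))
    valid = [v for v in column["values"] if v not in missing]
    return sum(valid) / len(valid)


standardize_missing_value_codes(
    columns, selected=["wait_time", "courtesy", "billing"], codes=[99]
)
for name in ("wait_time", "courtesy", "billing"):
    print(name, column_mean(columns[name]))
```

The respondent column was not selected, so it keeps no missing value codes and its values are left alone.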

Figure 3.19 Standardizing Column Attributes


Recode

Recode is a versatile utility for changing the values of a column, either by replacing the values within the column or by creating a new column of values based on another column. The utility is accessed either from the Cols main menu option, or by right clicking on a column name in the column section. Recode is demonstrated in Chapter 4.
If the column is Character, then Recode gives you text editing options accessible through the RTM. See Table 3.7 Recode RTM Options. The options are:
Table 3.7 Recode RTM Options
| Recode Text Option | Action |
| --- | --- |
| Convert to Titlecase | Converts the first letter of each word to upper case. The remaining letters of each word are converted to lower case. |
| Convert to Uppercase | Converts all letters to upper case. |
| Convert to Lowercase | Converts all letters to lower case. |
| Trim Whitespace | Trims whitespace at both the beginning and end of a character string. |
| Collapse Whitespace | Trims whitespace at the beginning and end of a character string as well as within the string, leaving one whitespace between each word. |
| First Word | Deletes all words but the first; only the first word is retained. |
| Last Word | Deletes all words but the last; only the last word is retained. |
| All But First Word | Deletes the first word and retains all the other words. |
| All But Last Word | Deletes the last word and retains all the other words. |
| Group Similar Values… | Groups words that are "similar," which is useful for correcting typos or data entry errors. |
| Start Over | Returns you to the default Recode window. |
| Recall | Recalls previous changes you made. |
| Script | Various scripting options to record how the column was recoded. |

Recode is a powerful feature for cleaning your data. For more information, see the Discovering JMP book of the JMP documentation.
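Several of the text options in Table 3.7 have simple equivalents that can be sketched in plain Python. These functions are illustrations of the described behavior, not JMP's code; the function names are hypothetical, a "word" is assumed to be any run of non-whitespace characters, and JMP's handling of edge cases may differ.

```python
import re


def convert_to_titlecase(text):
    """Upper-case the first letter of each word and lower-case the rest."""
    return re.sub(
        r"\S+",
        lambda m: m.group(0)[:1].upper() + m.group(0)[1:].lower(),
        text,
    )


def trim_whitespace(text):
    """Remove whitespace at the beginning and end of the string."""
    return text.strip()


def collapse_whitespace(text):
    """Trim the ends and leave a single space between words."""
    return " ".join(text.split())


def first_word(text):
    """Keep only the first word (empty string if there are no words)."""
    words = text.split()
    return words[0] if words else ""


def last_word(text):
    """Keep only the last word (empty string if there are no words)."""
    words = text.split()
    return words[-1] if words else ""


def all_but_first_word(text):
    """Drop the first word and keep the rest, separated by single spaces."""
    return " ".join(text.split()[1:])


def all_but_last_word(text):
    """Drop the last word and keep the rest, separated by single spaces."""
    return " ".join(text.split()[:-1])


raw = "  nEW   yORK  city "
print(repr(trim_whitespace(raw)))
print(repr(collapse_whitespace(raw)))
print(repr(convert_to_titlecase(collapse_whitespace(raw))))
```

Chaining the functions, as in the last line, is a common cleaning pattern for free-text survey responses: collapse the whitespace first, then fix the capitalization.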

Figure 3.20 Recoding Column Text

When you click the Done button to save your changes, JMP gives you several options for how the changes should be saved. You can save them “In Place,” which means that the original column is changed; once a column is recoded in place, it cannot be changed back. You can save them to a new column so that the original column is unchanged. You can save them to a Formula column, which documents the instructions for how the data are recoded. Or you can save a recode script that can be used on other columns with similar values. See Figure 3.21 The Recode Window.

Figure 3.21 The Recode Window


Group Columns

The Group Columns (and, to a lesser extent, the Ungroup Columns) utility is very useful for organizing tables that have many columns. Placing variables into a group has the following effects:

The columns are placed together in one location in the table;

The columns are given a common group name (e.g., "demo");

The columns are collapsed in the Column Identification section.

The group can be moved as a whole, renamed, and clicked on for an analysis of the whole group (e.g., creating distribution tables of each demographic variable by selecting the group as opposed to selecting each variable separately). I often collapse demographic variables into a group called "demo," satisfaction variables into a group called "satisfaction," screener variables into a group called "screener," and junk variables (e.g., loop counters, empty columns, etc. that online survey tools produce, and any variables of little interest) into "junk."
To group a set of columns:


Select the columns to group from the Column Identification pane. (The columns don’t have to be contiguous. If they’re not contiguous, hold down the Ctrl key while selecting the columns.)

Right click on the selected columns and choose Group Columns .

You can rename a group by clicking on the group and changing the name.

You can move a group by selecting it and dragging it to where you want it. You can also rearrange the components inside a group by selecting what you want to move and dragging it to a new location in the group.
See Figure 3.22 Grouping Columns for an illustration.

Figure 3.22 Grouping Columns

The Ungroup Columns utility does just what its name says.

Rows Section

Background

The Rows Section shows five pieces of information about the rows or observations in the data table:


number of rows;

number of rows selected or highlighted;

number of rows excluded from analysis;

number of rows hidden;

number of rows that are labelled.
There is also an RTM for additional functionality. The features discussed in this book are:



Data Filter

Clear Row States


Figure 3.23 Rows Section


Clear Row States

Row States are row-level properties that tell JMP how the rows should be used in an analysis. Rows can be excluded so that they are not used to calculate statistics. They can also be hidden so that their values do not show in JMP plots. For example, if you wanted to do an analysis for Males separately from Females, you could use row selection to select only the Females and then set those rows to be hidden and excluded, leaving only the Males in the analysis. Later, when you want to do another analysis for the whole table, you can clear the row states by using the Clear Row States selection from the RTM.
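Row states can be pictured as a pair of flags attached to each row. The following plain-Python sketch is an analogy under stated assumptions, not JMP's implementation: the rows are invented, and the helpers exclude_and_hide, clear_row_states, and mean_height are hypothetical. Here the Female rows are excluded and hidden so that a statistic is computed for the Males only, and then the row states are cleared.

```python
# Illustrative rows, not the real Big Class data.
rows = [
    {"sex": "F", "height": 59},
    {"sex": "M", "height": 63},
    {"sex": "F", "height": 61},
    {"sex": "M", "height": 65},
]

# One set of row-state flags per row; everything starts out active.
row_states = [{"excluded": False, "hidden": False} for _ in rows]


def exclude_and_hide(rows, row_states, predicate):
    """Mark every row that satisfies `predicate` as excluded and hidden."""
    for row, state in zip(rows, row_states):
        if predicate(row):
            state["excluded"] = True
            state["hidden"] = True


def clear_row_states(row_states):
    """Reset all flags so that every row is used again."""
    for state in row_states:
        state["excluded"] = False
        state["hidden"] = False


def mean_height(rows, row_states):
    """Mean height over the rows that are not excluded."""
    used = [r["height"] for r, s in zip(rows, row_states) if not s["excluded"]]
    return sum(used) / len(used)


exclude_and_hide(rows, row_states, lambda row: row["sex"] == "F")
males_only = mean_height(rows, row_states)

clear_row_states(row_states)
whole_table = mean_height(rows, row_states)

print(males_only, whole_table)
```

The data are never deleted; only the flags change, which is why clearing the row states restores the full table for the next analysis.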

Data Filter

The Data Filter allows you to identify subsets of data according to column values. Say, for example, that you want to do a special analysis that is only for men (Gender = "M"); the Data Filter lets you select just those rows. You can also fine-tune the values used for a Numeric/Continuous variable to narrow the range you want to study. This is important when looking at a histogram in, say, Graph/Graph Builder because outliers will distort the histogram. Using the Local Data Filter, however, enables you to filter out the outliers. The histogram adjusts automatically, an example of the dynamic views I talk about in Chapter 2. You can also select specific values for a Character/Nominal, Numeric/Ordinal, or Numeric/Nominal variable. This is illustrated in later chapters.
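The two kinds of filtering described above, picking levels of a categorical column and narrowing the range of a continuous one, can be sketched in plain Python. This is an illustration only: the function filter_rows, its levels and ranges arguments, and the sample rows (including the deliberately extreme height of 170) are assumptions, not JMP features.

```python
# Illustrative rows; the height of 170 plays the role of an outlier.
rows = [
    {"sex": "F", "height": 59},
    {"sex": "M", "height": 63},
    {"sex": "M", "height": 170},
    {"sex": "F", "height": 61},
]


def filter_rows(rows, levels=None, ranges=None):
    """Return the rows that match every condition.

    levels: column name -> set of allowed values (categorical filter)
    ranges: column name -> (low, high), both ends inclusive (range filter)
    """
    levels = levels or {}
    ranges = ranges or {}
    selected = []
    for row in rows:
        in_levels = all(row[col] in allowed for col, allowed in levels.items())
        in_ranges = all(
            low <= row[col] <= high for col, (low, high) in ranges.items()
        )
        if in_levels and in_ranges:
            selected.append(row)
    return selected


men = filter_rows(rows, levels={"sex": {"M"}})
no_outliers = filter_rows(rows, ranges={"height": (50, 75)})
men_no_outliers = filter_rows(
    rows, levels={"sex": {"M"}}, ranges={"height": (50, 75)}
)
print(len(men), len(no_outliers), len(men_no_outliers))
```

A histogram drawn from no_outliers would not be stretched by the extreme value, which is the motivation given above for filtering a continuous variable.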
To access a Data Filter, click Rows/Data Filter or click the icon on the tool bar for a Local Data Filter. Once the Data Filter has been selected, a dialog box appears in which you indicate which variables to filter on and which values of those variables to use. Using Big Class:


Click Rows/Data Filter .

Select sex from the column listing and click Add . You can choose several columns if you wish.

Figure 3.24 Example Gray Triangle Menu
