rdp2 book

Author

Kristensen Kor

Published

November 3, 2024

Preface

Welcome to the rdp2 user manual, your comprehensive guide to efficient data processing in R for survey and marketing research. This manual is designed to help you navigate the functionalities of the rdp2 package, understand its core principles, and apply its tools to streamline your data analysis workflow.

Introduction to the rdp2 Package

rdp2 is an R package for the data processing tasks common in marketing, sociological, psychological, and other survey-based research. Where most packages offer a collection of functions that operate on data frames, rdp2 manages datasets through R6 classes. This design makes it closer to a framework for data manipulation than a function library.

The DS (DataSet) class holds both the raw data and its associated metadata (variable and value labels), providing a structured framework for data manipulation. Because the metadata is stored alongside the data rather than attached to it, variable and value labels are preserved throughout the data processing pipeline.

Core Principles

rdp2 is built around several core principles that differentiate it from traditional R data processing approaches:

Labeled Numeric Vectors for Categorical Variables: Categorical variables are treated as labeled numeric vectors instead of factors, providing greater flexibility and consistency in data manipulation.
Native Support for Multiple-Response Questions: The package offers robust support for multiple-response variables, enabling workflows for these variables to be consistent with those for single-categorical variables.
Support for Weighted Data: Recognizing the importance of weights in survey data, rdp2 seamlessly integrates weighting into its data processing functions.
Mutable Data Operations with R6 Classes: Standard R objects follow copy-on-modify semantics, so the result of every change has to be assigned back to a name. rdp2 uses R6 classes to allow mutable, in-place operations, which gives a more natural workflow and reduces the need for repetitive data reassignment.
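
To make the first and third principles concrete, here is a minimal base R sketch. It does not use rdp2 and does not show its API; the variable names and data are invented for illustration. It contrasts a factor with a numeric vector that keeps its original codes, and then computes a weighted share.

```r
# Illustrative only: base R, not rdp2's API. All names and data are made up.
# A 5-point scale with a special "Don't know" code of 99.
codes  <- c(1, 5, 99, 4, 5)
labels <- c("Very bad" = 1, "Bad" = 2, "Neutral" = 3,
            "Good" = 4, "Very good" = 5, "Don't know" = 99)

# As a factor, the original codes are replaced by level positions 1..6.
f <- factor(codes, levels = unname(labels), labels = names(labels))
as.integer(f)                        # 1 5 6 4 5 (code 99 has become 6)

# As a labeled numeric vector, the codes stay intact and labels are a lookup.
names(labels)[match(codes, labels)]  # "Very bad" "Very good" "Don't know" ...
mean(codes[codes != 99])             # 3.75, computed on the original codes

# Weighted share of "Very good" (code 5) using respondent weights.
w <- c(1.2, 0.8, 1.0, 0.5, 1.5)
weighted.mean(codes == 5, w)         # 0.46
```

Keeping the codes numeric means that conventions such as 99 for "Don't know" survive every step, and arithmetic on the scale needs no conversion back from factor levels.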

Main Functionality

The rdp2 package provides a suite of tools designed to streamline various aspects of survey data processing:

Variable Creation and Recoding: Easily create new variables and perform recoding operations such as calculating means, top-two box scores (T2B), Net Promoter Scores (NPS), and more.
Data Import: Import SPSS datasets directly into R, preserving variable and value labels for seamless integration.
Dataset Restructuring: Reshape and reorganize datasets to suit your analysis needs, including handling of multiple-response variables and transposing data structures.
Excel Table Generation: Create formatted Excel tables with built-in significance checks, facilitating professional reporting and presentation of results.
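
For readers unfamiliar with the recodes named above, the following base R sketch shows what top-two box and Net Promoter Score compute. The helper names t2b() and nps() are hypothetical and are not functions exported by rdp2. The sketch assumes a 5-point scale for T2B and the usual 0 to 10 scale for NPS.

```r
# Illustrative base R helpers; t2b() and nps() are hypothetical names,
# not part of rdp2.

# Top-two box: share of valid answers in the two highest scale points.
t2b <- function(x, top = 5) {
  x <- x[!is.na(x)]
  mean(x >= top - 1)
}

# Net Promoter Score on a 0-10 scale:
# % promoters (9-10) minus % detractors (0-6), in percentage points.
nps <- function(x) {
  x <- x[!is.na(x)]
  100 * (mean(x >= 9) - mean(x <= 6))
}

satisfaction <- c(5, 4, 3, 5, 2, NA, 4, 1)
recommend    <- c(10, 9, 7, 6, 8, 10, 3, 9, 0, 10)

t2b(satisfaction)  # 4 of the 7 valid answers are 4 or 5
nps(recommend)     # 5 promoters and 3 detractors out of 10 respondents
```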

Development Status

Please note that rdp2 is currently in the early stages of development. As an evolving project, it may undergo significant changes, and some features might be experimental. No other official documentation or tutorials exist at this time, so this manual serves as a foundational guide to the package’s functionality.

Why rdp2? Was there rdp1?

rdp2 grew out of lessons learned while developing its predecessor, rdp1, which was never publicly released. The original version relied on tibbles and variable attributes to store metadata, but this approach proved unreliable because many R operations strip attributes during data manipulation. Additionally, the copy-on-modify behavior of standard R data structures made data processing cumbersome, since the dataset had to be reassigned after each operation.
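
The attribute problem is easy to reproduce in base R. The sketch below attaches labels to a plain numeric vector as attributes (the attribute names are chosen for illustration) and shows two everyday operations that silently drop them.

```r
# Base R only; attribute names "label" and "labels" are illustrative.
x <- c(1, 2, 2, 1)
attr(x, "label")  <- "Gender"
attr(x, "labels") <- c(Male = 1, Female = 2)

attr(x, "label")        # "Gender"
attr(x[1:2], "label")   # NULL: subsetting drops custom attributes
attr(c(x, 1), "label")  # NULL: so does combining vectors
```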

To address these challenges, rdp2 was rewritten from scratch using R6 classes. This architecture separates metadata from the main data object while maintaining compatibility with standard data frames (tibbles). The result is a more intuitive and efficient workflow that aligns with the functionalities of the tidyverse suite of packages.
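
The general idea can be sketched with a toy R6 class. This is not rdp2's DS class: the class name LabelledData, its fields, and its add_var() method are hypothetical and exist only to illustrate the design, assuming the R6 package is installed.

```r
library(R6)

# Hypothetical toy class for illustration; rdp2's actual DS class will differ.
LabelledData <- R6Class("LabelledData",
  public = list(
    data = NULL,        # plain data frame, carries no label attributes
    var_labels = NULL,  # named list: variable name -> variable label
    initialize = function(data, var_labels = list()) {
      self$data <- data
      self$var_labels <- var_labels
    },
    # Adds a column and its label in place; returns self invisibly.
    add_var = function(name, values, label) {
      self$data[[name]] <- values
      self$var_labels[[name]] <- label
      invisible(self)
    }
  )
)

ds <- LabelledData$new(
  data.frame(age = c(25, 40, 61)),
  var_labels = list(age = "Age in years")
)

# No reassignment needed: the object is modified by reference.
ds$add_var("senior", as.numeric(ds$data$age >= 60), "Aged 60 or over")

# Filtering rows cannot lose labels, because they are stored separately.
ds$data <- ds$data[ds$data$age > 30, , drop = FALSE]
names(ds$data)        # "age" "senior"
ds$var_labels$senior  # "Aged 60 or over"
```

Because the labels live in their own field, any operation on the data frame leaves them untouched, and the method call replaces the usual pattern of assigning the result back to the same name.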

rdp itself stands simply for “R data processing”.

Who Should Read This Manual

This manual is intended for data analysts, researchers, and practitioners in the fields of marketing, sociology, psychology, and other disciplines that involve survey data analysis. Whether you are a seasoned R user or new to data processing in R, this guide provides the knowledge and tools to leverage rdp2 for efficient and effective data management.

The manual also aims to offer practical guidance on managing and processing survey data efficiently. By the end of this guide, readers should be comfortable applying rdp2 to a range of survey data scenarios.