
Project Status: Active - The project has reached a stable, usable state and is being actively developed.


The R package “NeEDS4BigData” implements sampling methods for analysing big data.

What is “NeEDS4BigData” an abbreviation for?

New Experimental Design based Sampling methods for Big Data.

How do you get started with “NeEDS4BigData”?

## Installing the package from GitHub (requires the devtools package)
devtools::install_github("Amalan-ConStat/NeEDS4BigData")

## Installing the package from CRAN
install.packages("NeEDS4BigData")

Sampling Methods

  1. A- and L-optimality based subsampling for GLMs.
  2. A-optimality based subsampling for Gaussian Linear Models.
  3. Leverage sampling for GLMs.
  4. Local case control sampling for logistic regression.
  5. A-optimality based sampling under measurement constraints for GLMs.
  6. Model robust subsampling method for GLMs.
  7. Sampling method for GLMs when the model is potentially misspecified.
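To give a feel for how a method like item 3 works, here is a minimal base-R sketch of basic leverage sampling for a Gaussian linear model. It is not the NeEDS4BigData implementation and uses none of the package's functions. The simulated data, the subsample size `r`, and the variable names are assumptions made for illustration. Rows are drawn with probability proportional to their leverage score, and the subsample is then fitted by weighted least squares with weights `1 / p` to correct for the unequal sampling probabilities.

```r
# Basic leverage sampling for a linear model: an illustrative sketch only,
# not the NeEDS4BigData implementation.
set.seed(1)

# Simulated "big" data (assumed sizes and coefficients, for illustration)
N <- 10000
X <- cbind(1, matrix(rnorm(N * 3), ncol = 3))
beta <- c(1, 0.5, -0.5, 2)
y <- drop(X %*% beta + rnorm(N))

# Leverage scores h_ii = diag(X (X'X)^{-1} X'), computed from the thin QR factor
Q <- qr.Q(qr(X))
h <- rowSums(Q^2)

# Sampling probabilities proportional to leverage; they sum to 1
p <- h / sum(h)

# Draw a subsample of r rows with replacement
r <- 500
idx <- sample.int(N, size = r, replace = TRUE, prob = p)

# Weighted least squares with weights 1 / p to correct for unequal sampling
fit <- lm.wfit(x = X[idx, ], y = y[idx], w = 1 / p[idx])
beta_hat <- fit$coefficients
print(round(beta_hat, 3))
```

Only the subsample of `r` rows enters the model fit, which is where the saving comes from when `N` is very large. The leverage scores, however, still require one pass over the full data.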

These seven methods are described in the following articles:

  1. Introduction - explains the need for sampling methods.
  2. Linear Regression - Model based sampling.
  3. Linear Regression - Model robust and misspecification.
  4. Logistic Regression - Model based sampling.
  5. Logistic Regression - Model robust and misspecification.
  6. Poisson Regression - Model based sampling.
  7. Poisson Regression - Model robust and misspecification.

In articles 2, 4 and 6 we assume that the main effects model can describe the data. In articles 3, 5 and 7 we first consider the case where several models can describe the big data, and then the case where the given main effects model is misspecified. Under these conditions, articles 2 to 7 explore sampling for three big data sets.
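The setting behind articles 3, 5 and 7 can be illustrated with a small base-R sketch. It is not the package's method and calls none of its functions. For illustration, we assume a true model that contains a quadratic term in `X1`. Under that assumption the main effects model `Y ~ X1 + X2 + X3` is misspecified. We also assume a hypothetical candidate set that adds one squared term to the main effects. Each candidate is fitted to a uniform random subsample and compared by AIC, which shows why a sampling scheme that targets only the main effects model can be a poor choice.

```r
# Illustrative only: a main effects model plus a small set of alternative
# candidate models (hypothetical), fitted to a uniform random subsample.
set.seed(2)
N <- 10000
dat <- data.frame(X1 = rnorm(N), X2 = rnorm(N), X3 = rnorm(N))

# Assumed true model includes X1^2, so the main effects model is misspecified
dat$Y <- 1 + 0.5 * dat$X1 - 0.5 * dat$X2 + 2 * dat$X3 + 0.8 * dat$X1^2 + rnorm(N)

# Candidate models: main effects, and main effects plus one squared term
candidates <- list(
  main  = Y ~ X1 + X2 + X3,
  X1_sq = Y ~ X1 + X2 + X3 + I(X1^2),
  X2_sq = Y ~ X1 + X2 + X3 + I(X2^2),
  X3_sq = Y ~ X1 + X2 + X3 + I(X3^2)
)

# Uniform random subsample of 500 rows, then AIC for each candidate
sub <- dat[sample.int(N, 500), ]
aic <- sapply(candidates, function(f) AIC(lm(f, data = sub)))
print(round(aic, 1))
print(names(which.min(aic)))
```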

Thank You
