Microsoft R family


R at Microsoft
Guillermo Julca (Yemo)
The Marketing Advantage, Inc @2017



• 1993: Research project in Auckland, NZ
   • Ross Ihaka and Robert Gentleman
• 1995: Released as open-source software
   • Generally compatible with the “S” language
• 1997: R core group formed
• 2000: R 1.0.0 released
• 2003: R Foundation formed in Austria
• 2004: First international user conference
• 2007: Revolution Analytics founded
• 2009: New York Times article on R
• 2013: Revolution R Open released
• 2015: Microsoft acquires Revolution Analytics



New York Times, June 25 2009 (3 hours after Michael Jackson’s death)









1. Memory bound: the product can only process datasets that fit into available memory.

2. Because the Intel Math Kernel Library (MKL) is included in Microsoft R Open, a generic R solution generally performs better. MKL replaces the standard R implementations of the Basic Linear Algebra Subprograms (BLAS) and the LAPACK library with multithreaded versions. As a result, calls to those low-level routines tend to execute faster on Microsoft R than on a conventional installation of R.
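
The memory-bound limitation noted above is commonly worked around by reading data one chunk at a time and merging per-chunk summaries, so that only one chunk is ever held in memory. The sketch below illustrates that general idea in Python for a mean and standard deviation. It is not the implementation used by any Microsoft R product; the function names (`chunk_stats`, `merge_stats`, `chunked_mean_sd`) and the sample data are hypothetical.

```python
import csv
import io
import math
from itertools import islice


def chunk_stats(values):
    """Return (count, mean, sum of squared deviations) for one chunk."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values)
    return n, mean, m2


def merge_stats(a, b):
    """Combine two (count, mean, m2) summaries without revisiting the data."""
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta * delta * n_a * n_b / n
    return n, mean, m2


def chunked_mean_sd(rows, column, chunk_size=1000):
    """Stream rows (dicts) in chunks; return (count, mean, sample std dev)."""
    iterator = iter(rows)
    summary = (0, 0.0, 0.0)
    while True:
        chunk = list(islice(iterator, chunk_size))  # only this chunk is in memory
        if not chunk:
            break
        values = [float(row[column]) for row in chunk]
        summary = merge_stats(summary, chunk_stats(values))
    n, mean, m2 = summary
    if n < 2:
        raise ValueError("need at least two rows")
    return n, mean, math.sqrt(m2 / (n - 1))


if __name__ == "__main__":
    # Hypothetical sample data standing in for a file too large to load at once.
    text = "x\n" + "\n".join(str(i) for i in range(1, 11))
    reader = csv.DictReader(io.StringIO(text))
    print(chunked_mean_sd(reader, "x", chunk_size=3))
```

Because the per-chunk summaries merge exactly, the chunk size affects memory use but not the result (up to floating-point rounding). The same merge step would also let separate workers summarize separate chunks.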
















More at deployr.revolutionanalytics.com








• Multithreaded library replaces standard BLAS/LAPACK algorithms
• Intel MKL on Windows/Linux; Accelerate on Mac
• High-performance algorithms
• Sequential → Parallel
• Uses as many threads as there are available cores
• No need to change any R code
• Included with RRO binary distributions

More at Revolutions blog


R Usage Growth
• Rexer Data Miner Survey, 2007-2013

Language Popularity
• IEEE Spectrum Top Programming Languages, July 2014: R ranked #9

More at blog.revolutionanalytics.com/popularity


Data Step
▪ Data import: Delimited, Fixed, SAS, SPSS, ODBC
▪ Variable creation & transformation
▪ Recode variables
▪ Factor variables
▪ Missing value handling
▪ Sort, Merge, Split
▪ Aggregate by category (means, sums)

Descriptive Statistics
▪ Min / Max, Mean, Median (approx.)
▪ Quantiles (approx.)
▪ Standard Deviation
▪ Variance
▪ Correlation
▪ Covariance
▪ Sum of Squares (cross product matrix for set variables)
▪ Pairwise Cross tabs
▪ Risk Ratio & Odds Ratio
▪ Cross-Tabulation of Data (standard tables & long form)
▪ Marginal Summaries of Cross Tabulations

Statistical Tests
▪ Chi Square Test
▪ Kendall Rank Correlation
▪ Fisher’s Exact Test
▪ Student’s t-Test

Sampling
▪ Subsample (observations & variables)
▪ Random Sampling

Predictive Models
▪ Sum of Squares (cross product matrix for set variables)
▪ Multiple Linear Regression
▪ Generalized Linear Models (GLM): exponential family distributions (binomial, Gaussian, inverse Gaussian, Poisson, Tweedie); standard link functions (cauchit, identity, log, logit, probit); user-defined distributions & link functions
▪ Covariance & Correlation Matrices
▪ Logistic Regression
▪ Classification & Regression Trees
▪ Predictions/scoring for models
▪ Residuals for all models

Variable Selection
▪ Stepwise Regression

Simulation
▪ Simulation (e.g. Monte Carlo)
▪ Parallel Random Number Generation

Cluster Analysis
▪ K-Means

Classification
▪ Decision Trees
▪ Decision Forests
▪ Gradient Boosted Decision Trees
▪ Naïve Bayes

Combination
▪ PEMA-R API
▪ rxDataStep
▪ rxExec

Legend: New in v7.3; Coming in v7.4
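
The Simulation entries above pair Monte Carlo methods with parallel random number generation. As a rough illustration of why the two go together, the Python sketch below gives each worker its own seeded generator, so the workers draw from separate streams and the combined estimate is reproducible. This is a generic sketch, not the product's API: `count_hits`, `estimate_pi`, and the seeding scheme are assumptions made for the example, and deriving streams from distinct seeds is a simplification of what dedicated parallel generators provide.

```python
import random
from concurrent.futures import ThreadPoolExecutor


def count_hits(seed, worker_id, n_samples):
    """Count random points in the unit square that fall inside the quarter circle."""
    # Each worker gets its own generator, seeded from (seed, worker_id),
    # so no generator state is shared between workers.
    rng = random.Random(f"{seed}:{worker_id}")
    hits = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits


def estimate_pi(n_workers=4, samples_per_worker=50_000, seed=2017):
    """Monte Carlo estimate of pi from independent per-worker streams."""
    # Threads keep the sketch portable; a process pool would be needed
    # for an actual CPU speedup in CPython.
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        futures = [
            pool.submit(count_hits, seed, worker, samples_per_worker)
            for worker in range(n_workers)
        ]
        hits = sum(future.result() for future in futures)
    return 4.0 * hits / (n_workers * samples_per_worker)


if __name__ == "__main__":
    print(estimate_pi())
```

Since every stream depends only on the base seed and the worker index, rerunning with the same arguments gives the same estimate regardless of how the workers are scheduled.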







R IN THE CLOUD


• Exposing the expertise of data scientists as APIs
• Bringing the utility of data science to applications
• Addressing the Data Science talent gap


Azure: Huge infrastructure scale

19 Regions ONLINE…huge datacenter capacity around the world…and we’re growing

Regions shown on the map (region: location):
▪ North Europe: Ireland
▪ West Europe: Netherlands
▪ Central US: Iowa
▪ North Central US: Illinois
▪ East US: Virginia
▪ East US 2: Virginia
▪ West US: California
▪ South Central US: Texas
▪ US Gov: Iowa
▪ US Gov: Virginia
▪ Brazil South: Sao Paulo
▪ China North *: Beijing
▪ China South *: Shanghai
▪ Japan East: Saitama
▪ Japan West: Osaka
▪ India West: TBD
▪ India East: TBD
▪ East Asia: Hong Kong
▪ SE Asia: Singapore
▪ Australia East: Sydney
▪ Australia West: Melbourne

Map legend: Announced; Operational; * Operated by 21Vianet

▪ 100+ datacenters
▪ One of the top 3 networks in the world (coverage, speed, connections)
▪ 2x as many offered regions as AWS and 6x as many as Google
▪ G Series: largest VM available in the market (32 cores, 448 GB RAM, SSD…)



