Homepage on Andrea Perlato
/
Recent content in Homepage on Andrea Perlato
Hugo -- gohugo.io
en-us
Thu, 05 May 2016 21:48:51 -0700
NLP Glossary
/theorypost/nlp-glossary/
Wed, 11 Dec 2019 00:00:00 +0000
Text Similarity using Vector Space Model
The idea is to find the words that texts have in common. The figure below shows an example with texts that share the common words “house” and “white”. Each text is represented as a point whose coordinates are the frequencies of these words in that text.
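To make the representation concrete, here is a minimal sketch that turns two texts into count vectors over the words “white” and “house”. The two sample texts are made up for illustration, and cosine similarity is assumed as the way to compare the vectors; it is one common choice, not necessarily the one used in the post.

```python
from collections import Counter
import math
import re


def term_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return [counts[word] for word in vocabulary]


def cosine_similarity(u, v):
    """Cosine of the angle between two count vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


vocabulary = ["white", "house"]  # the two common words from the example
text_a = "The white house is a big house near another house."  # made-up text
text_b = "A white fence and a white door on a small house."    # made-up text

vec_a = term_vector(text_a, vocabulary)
vec_b = term_vector(text_b, vocabulary)
print(vec_a, vec_b, cosine_similarity(vec_a, vec_b))
```

Each vector is a point in the (“white”, “house”) plane, so texts that use the two words in similar proportions end up pointing in similar directions.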
For example, looking at the red point (2,7), 2 is the number of times that the word “white” appears in the text and 7 is the number of times that the word “house” appears in the same text.
Logistic Regression as a Neuron
/aipost/logistic-regression-as-a-neuron/
Mon, 02 Dec 2019 00:00:00 +0000
Deep Learning is the study of neural networks, which are networks of neurons; the neuron is the fundamental unit of computation.
From the multiple linear regression above, we converted the beta coefficients into the weights w1 and w2, which are essentially the slopes for the individual inputs x1 and x2. Another way to think of the weights is that they tell us how important each input is for predicting the output.
Predict Breast Cancer using TensorFlow
/aipost/predict-breast-cancer-using-tensorflow/
Mon, 02 Dec 2019 00:00:00 +0000
In this article we try to predict whether a patient’s breast tissue diagnosis is malignant or benign. From the code below, using feature_names we can explore the names of all the predictors in the breast cancer data, for example mean radius, mean perimeter and so forth. Moreover, using target_names we get the two levels of the response variable (malignant, benign).
Hyperparameters Tuning in AI
/aipost/hyperparameters-tuning-in-ai/
Tue, 26 Nov 2019 00:00:00 +0000
The tuning process is a painful process in deep learning because we have many hyperparameters:
1 - the learning rate alpha
2 - the momentum beta
3 - beta1, beta2, epsilon in AdaM
4 - number of layers L
5 - number of hidden units
6 - learning rate decay parameters
7 - mini-batch size
Some of the hyperparameters listed above are more important than others.
Softmax Regression
/aipost/softmax-regression/
Tue, 26 Nov 2019 00:00:00 +0000
When we have to deal with a classification problem with more than two possible classes, we use a generalization of logistic regression called softmax regression, that is, logistic regression for multi-class classification tasks. In softmax regression, we replace the sigmoid logistic function with the so-called softmax function.
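As a quick illustration, here is a minimal sketch of the softmax function in plain Python. The net inputs are hypothetical values, and subtracting the maximum before exponentiating is an added numerical-stability step that does not change the result.

```python
import math


def softmax(z):
    """Map a list of net inputs to probabilities that sum to 1."""
    shift = max(z)  # subtracting the max avoids overflow and leaves the result unchanged
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


probs = softmax([1.0, 2.0, 3.0])  # hypothetical net inputs for three classes
print(probs)
```

The class with the largest net input gets the largest probability, and with only two equal inputs the function returns 0.5 for each class, as the sigmoid would.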
\[
\begin{array}{l}{\qquad P\left(y=j \mid z^{(i)}\right)=\phi_{\text {softmax}}\left(z^{(i)}\right)=\frac{e^{z_{j}^{(i)}}}{\sum_{k=0}^{K} e^{z_{k}^{(i)}}}} \\ {\text { where we define the net input } z \text { as }} \\ {\qquad z=w_{1} x_{1}+\ldots+w_{m} x_{m}+b=\sum_{l=1}^{m} w_{l} x_{l}+b=\mathbf{w}^{T} \mathbf{x}+b}\end{array}
\]
Adaptive Momentum
/aipost/adaptive-momentum/
Mon, 25 Nov 2019 00:00:00 +0000
AdaM is usually expanded as Adaptive Moment Estimation and is referred to in this post as Adaptive Momentum. It combines Momentum and RMSprop in a single approach, making AdaM a very powerful and fast optimizer.
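To show how the two ideas combine, here is a minimal sketch of a single AdaM step for one scalar parameter. The function name, the toy cost function and the learning rate are assumptions made for illustration; the defaults for beta1, beta2 and epsilon are the commonly quoted ones, and t is the iteration counter.

```python
import math


def adam_step(w, grad, v, s, t, alpha=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Return the updated (w, v, s) after one AdaM step at iteration t (t >= 1)."""
    v = beta1 * v + (1 - beta1) * grad        # Momentum-style average of gradients
    s = beta2 * s + (1 - beta2) * grad ** 2   # RMSprop-style average of squared gradients
    v_hat = v / (1 - beta1 ** t)              # bias correction for the early iterations
    s_hat = s / (1 - beta2 ** t)
    w = w - alpha * v_hat / (math.sqrt(s_hat) + eps)
    return w, v, s


# Toy problem (an assumption for illustration): minimise J(w) = w**2, so dJ/dw = 2*w.
w, v, s = 5.0, 0.0, 0.0
for t in range(1, 2001):
    w, v, s = adam_step(w, 2 * w, v, s, t, alpha=0.05)
print(w)
```

On the very first iteration the bias correction makes the step size almost exactly alpha, whatever the magnitude of the gradient.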
\[
\begin{aligned}
V_{d W}&=\beta_{1} V_{d W}+\left(1-\beta_{1}\right) d W ; \quad V_{d b}=\beta_{1} V_{d b}+\left(1-\beta_{1}\right) d b \\
V_{d W}^{\text{corrected}}&=\frac{V_{d W}}{1-\beta_{1}^{i}} ; \quad V_{d b}^{\text{corrected}}=\frac{V_{d b}}{1-\beta_{1}^{i}} \\
S_{d W}&=\beta_{2} S_{d W}+\left(1-\beta_{2}\right) d W^{2} ; \quad S_{d b}=\beta_{2} S_{d b}+\left(1-\beta_{2}\right) d b^{2} \\
S_{d W}^{\text{corrected}}&=\frac{S_{d W}}{1-\beta_{2}^{i}} ; \quad S_{d b}^{\text{corrected}}=\frac{S_{d b}}{1-\beta_{2}^{i}} \\
W&=W-\alpha \cdot \frac{V_{d W}^{\text{corrected}}}{\sqrt{S_{d W}^{\text{corrected}}}+\epsilon} \\
b&=b-\alpha \cdot \frac{V_{d b}^{\text{corrected}}}{\sqrt{S_{d b}^{\text{corrected}}}+\epsilon}
\end{aligned}
\]
Learning Rate Decay and Local Optima
/aipost/learning-rate-decay-and-local-optima/
Mon, 25 Nov 2019 00:00:00 +0000
Suppose we are implementing mini-batch gradient descent with mini-batches of just 64 or 128 examples. During the iterations we may run into the problem of not converging to the minimum. That is especially true when we use a fixed value of the learning rate alpha. On the contrary, when we slowly reduce the learning rate alpha, we end up oscillating in a tighter region around the minimum.
Root Mean Square Propagation
/aipost/root-mean-square-propagation/
Mon, 25 Nov 2019 00:00:00 +0000
Root Mean Square Propagation (RMSprop) is similar to Momentum: it is a technique to dampen the motion along the y-axis and speed up gradient descent.
For better understanding, let us denote the y-axis as the bias b and the x-axis as the weight W.
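The update rule is not spelled out in this summary, so the following is only a sketch that assumes the standard RMSprop form. The function name, the gradient values and the hyperparameter defaults are hypothetical and chosen for illustration.

```python
import math


def rmsprop_step(param, grad, s, alpha=0.01, beta=0.9, eps=1e-8):
    """Return the updated (param, s) after one RMSprop step."""
    s = beta * s + (1 - beta) * grad ** 2                 # running average of the squared gradient
    param = param - alpha * grad / (math.sqrt(s) + eps)   # large recent gradients shrink the step
    return param, s


# Hypothetical gradients: steep along b (the oscillating direction), shallow along W.
W, b = 1.0, 1.0
s_dW, s_db = 0.0, 0.0
W, s_dW = rmsprop_step(W, grad=0.1, s=s_dW)
b, s_db = rmsprop_step(b, grad=10.0, s=s_db)
print(W, b)
```

After this single step both parameters move by nearly the same amount, even though the gradient along b is 100 times larger: dividing by the root of the squared gradient is what dampens the steep direction.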
It is called Root Mean Square because we square the derivatives of both the W and b parameters and then take the square root of their running average.
Gradient Checking
/aipost/gradient-checking/
Fri, 22 Nov 2019 00:00:00 +0000
When we implement backpropagation, there is a test called Gradient Checking that helps to make sure that the implementation of backpropagation is correct.
Looking at the figure above, we can get a much better estimate of the gradient if we use a two-sided approximation of the derivative, that is, the larger triangle obtained by stepping both to the left and to the right of the point.
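A minimal sketch of the two-sided check follows. The cost function is a toy example chosen for illustration, and the analytic derivative stands in for whatever a backpropagation routine would return.

```python
def numerical_gradient(f, theta, eps=1e-5):
    """Two-sided estimate of df/dtheta at theta."""
    return (f(theta + eps) - f(theta - eps)) / (2 * eps)


def f(theta):
    return theta ** 3          # toy cost function, chosen only for illustration


def analytic_gradient(theta):
    return 3 * theta ** 2      # what a backpropagation routine should return for f


theta = 1.0
approx = numerical_gradient(f, theta)
exact = analytic_gradient(theta)
relative_error = abs(approx - exact) / max(abs(approx), abs(exact))
print(approx, exact, relative_error)
```

If the relative error is tiny, the analytic gradient agrees with the numerical estimate; a large value would point to a bug in the derivative code.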
The height of the triangle in the figure can be seen as follows:
Gradient Descent with Momentum
/aipost/gradient-descent-with-momentum/
Fri, 22 Nov 2019 00:00:00 +0000
Gradient Descent with momentum, or just Momentum, is an advanced optimization algorithm that speeds up the optimization of the cost function J. It makes use of a moving average to update the trainable parameters of the neural network. A moving average is the average calculated over n successive values rather than the whole set of values. Mathematically, it is denoted as follows:
\[
A_{t}=\beta A_{t-1}+(1-\beta) X_{t}
\]
Mini-batch Gradient Descent
/aipost/mini-batch-gradient-descent/
Fri, 22 Nov 2019 00:00:00 +0000
In Batch Gradient Descent, on every iteration we go through the entire training set.
In the figure below, on the left we can see the cost function J of batch gradient descent, which decreases on every single iteration. On the right we have the cost function J of mini-batch gradient descent, where on every iteration we are training on a different mini-batch of the training set; that is why the cost function J is going to be a little noisier.
Vanishing Gradient
/aipost/vanishing-gradient/
Fri, 22 Nov 2019 00:00:00 +0000
One of the problems of training a deep neural network is vanishing and exploding gradients: when we train a deep network, the derivatives (the slopes) can get very big or very small, even exponentially so, and this makes training difficult. We have to choose the random weight initialization very carefully in order to avoid this problem.
\[
\omega^{[l]}=\left[\begin{array}{cc}{1.5} & {0} \\ {0} & {1.\ldots}\end{array}\right]
\]
Azure Databricks and RStudio
/mlpost/azure-databricks-and-rstudio/
Wed, 20 Nov 2019 00:00:00 +0000
In the analytics market, Spark is taking off for ETL and Machine Learning.
Azure Databricks is a managed version of Spark with which a data scientist can very quickly start from zero and get to insights. Moreover, it is integrated with Azure AD, so it uses the same authentication model that is used for every other Azure service. A very interesting topic is the integration of RStudio with Azure Databricks and Spark.
NLP Step by Step
/mlpost/nlp-step-by-step/
Thu, 14 Nov 2019 00:00:00 +0000
This post aims to show all the processes related to NLP and how to use the Naive Bayes Classifier using Python and the nltk library.
We use data from spam detection.
In NLP a large part of the processing is Feature Engineering. Typically the steps are:
Regular Expression: a formal language for specifying text strings. For example, the same word can appear with a plural s, with a capital first letter, or with any combination of those.
Graphical Representation of Missing Data
/graphpost/graphical-representatioin-of-missing-data/
Tue, 12 Nov 2019 00:00:00 +0000
Most statistical methods assume that you’re working with complete matrices, vectors, and data frames. In most cases, we have to eliminate missing data before we address the substantive questions that led us to collect the data. We can eliminate missing data by removing cases with missing data, or replacing missing data with reasonable substitute values. In either case, the end result is a dataset without missing values.
Bias Variance Trade-Off
/theorypost/bias-variance-trade-off/
Thu, 07 Nov 2019 00:00:00 +0000
The Bias Variance Trade-Off is used to understand the model’s performance and evaluation. When the training error goes down but the test error starts to go up, the model we created begins to overfit.
Imagine we have a Linear Regression model that cannot accurately replicate the curve of the true relationship between height and weight.
The inability of a machine learning method to capture the true relationship is called bias.
Find Credit Card Info from Ecommerce using Python
/graphpost/find-credit-card-info-from-ecommerce-using-python/
Tue, 05 Nov 2019 00:00:00 +0000
This is an example of how to extract customer information, such as the credit card number, from an Ecommerce dataset using Python.
We start by loading the data and showing the first 10 observations. We can also see the number of columns (14) and rows (10000) of the dataset.
import pandas as pd
# load the dataset from the local path used in the post
ecom = pd.read_csv('C:/07 - R Website/dataset/Graph/Ecommerce Purchases')
# show the first 10 observations
ecom.head(10)
## Address ... Purchase Price
## 0 16629 Pace Camp Apt.
Sankey Diagram with D3js
/graphpost/sankey-diagram-with-d3js/
Fri, 18 Oct 2019 00:00:00 +0000
The Sankey diagram is a way of visualizing the flow of data. A Sankey diagram consists of three sets of elements: the nodes, the links, and the instructions which determine their positions.
A node is wherever the lines change direction. The second element is the link (or edge), which connects the nodes together. These links have a value associated with them, which is represented by the thickness of the link.
Vectorization in Python
/aipost/vectorization-in-python/
Fri, 18 Oct 2019 00:00:00 +0000
In deep learning we often deal with large data sets. It is important that the code runs quickly, because otherwise it might take a long time to get the results.
That is why performing vectorization has become a key skill.
For example, in logistic regression we need to compute w transpose x; in a non-vectorized implementation we can use the following code:
Introduction to Support Vector Machine
/theorypost/introduction-to-support-vector-machine/
Wed, 02 Oct 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video explained by Josh Starmer.
We use as an example the measurement of the mass of mice (g). The red dots in the figure below represent mice that are not obese and the green dots represent obese mice. Based on these observations, we can pick a threshold, and when a new observation has less mass than the threshold we can classify it as not obese.
Extract the Main Topics from Books
/mlpost/extract-the-main-topics-from-books/
Fri, 09 Aug 2019 00:00:00 +0000
Topic modeling is the process of identifying topics in a set of documents. This can be useful for search engines, customer service automation, and any other instance where knowing the topics of documents is important. There are multiple methods for doing this. The most commonly used is Latent Dirichlet Allocation (LDA).
LDA builds a topic-per-document model and a words-per-topic model, both modeled as Dirichlet distributions.
Introduction to Topic Model
/theorypost/introduction-to-topic-model/
Thu, 08 Aug 2019 00:00:00 +0000
The Topic Model is a type of statistical model for finding the topics that occur in a collection of documents.
It is a frequently used text-mining tool for the discovery of hidden semantic structures in a text body. For example, imagine we have some articles or a series of social media messages and we want to understand what is going on inside them. A common way to tackle this problem is with an Unsupervised Machine Learning model.
Life Expectancy based on World Bank Indicators
/graphpost/life-expectancy-based-on-world-bank-indicators/
Wed, 07 Aug 2019 00:00:00 +0000
We used the data from the World Bank Indicators Portal, which is an incredibly rich resource, to make an animation of Life Expectancy as a function of Gross Domestic Product per capita. More precisely, GDP per capita is the value of all final goods and services produced within a nation in a given year, converted at market exchange rates to current U.S. dollars, divided by the average population for the same year.
The Beauty of Animation
/graphpost/the-beauty-of-animation/
Wed, 07 Aug 2019 00:00:00 +0000
Graphs are a common method to visually illustrate relationships in the data, and at first sight they give a clear and compact idea of the matter they contain. Graphs are extremely important, and in data analysis they are considered the first statistics to perform.
In this post we describe the R package gganimate, an extension of the ggplot2 package for creating animated ggplots. It provides a range of new functionality that can be added to the plot object in order to customize how it should change with time.
Introduction to Naive Bayes
/theorypost/introduction-to-naive-bayes/
Tue, 06 Aug 2019 00:00:00 +0000
Naïve Bayes is a probabilistic classifier. It is the first algorithm that should be considered for solving text classification problems, which involve high-dimensional training datasets. A few examples are Sentiment Analysis and Classifying Topics on Social Media.
Its name refers to Bayes’ Theorem, also known as Bayes’ Law, which gives us a method to calculate the Conditional Probability: that is, the probability of an event based on previous knowledge available about the events.
Introduction to Word2Vec
/theorypost/introduction-to-word2vec/
Tue, 06 Aug 2019 00:00:00 +0000
Word2Vec is a semantic learning framework that uses a neural network to learn the representation of words or phrases in a text. It is useful for understanding the semantic meaning behind a term. This algorithm uses two methods:
1 - CBOW
2 - SkipGram
Continuous bag of words (CBOW) predicts the current word from a window of surrounding context words; in other words, given a set of context words, it predicts the missing word that is likely to appear in that context.
Predict Movie Sentiment via DOC2VEC
/aipost/predict-movie-sentiment-via-doc2vec/
Tue, 06 Aug 2019 00:00:00 +0000
For an introduction to Word2Vec, look at this post.
Using this method we try to predict the movie sentiment (positive vs. negative) using text data as the predictor. We use the movie review data set, and we use the power of Doc2Vec to transform our data into predictors.
library(text2vec)
library(tm)
data(movie_review)
names(movie_review)
[1] "id" "sentiment" "review"
The data set contains an id, a sentiment (0=negative, 1=positive), and a review that contains the text saying whether people liked the movie or not.
Identify Spam Emails
/mlpost/identify-smap-emails/
Fri, 02 Aug 2019 00:00:00 +0000
We want to differentiate between spam and non-spam (called ham) email based on the content. We use a training set of textual data that is already labeled as spam or non-spam.
We start by removing empty columns, and we call our columns label and text.
We also create a corpus, remove punctuation, transform everything into lowercase, and remove numbers and stop words. Then we have to stem the document, and finally we have a corpus of terms.
Introduction to K-Means and Hierarchical clustering
/theorypost/introduction-to-k-means-and-hierarchical-clustering/
Fri, 02 Aug 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video explained by Josh Starmer.
K-Means Clustering
This is a popular unsupervised machine learning algorithm. The objective of K-means is simple: group similar data points together and discover underlying patterns.
Step 1 is to identify the number of clusters.
Suppose we have K=3; Step 2 is to randomly select 3 data points, and these are our Initial Cluster Points.
Text Mining Clustering Tweets
/mlpost/text-mining-clustering-tweets/
Fri, 02 Aug 2019 00:00:00 +0000
We can use Unsupervised Classification (clustering) to learn more about the text that we have to analyze. More specifically, we use Hierarchical clustering to cluster our text into groups of terms that have the propensity to occur together. In this example, the texts are tweets about the Catalan Independence Referendum.
The first step is to convert our tweets into a corpus, remove punctuation, transform everything into lowercase, and remove numbers and stop words.
Feature of Automobiles via Web App
/graphpost/feature-of-automobiles-via-web-app/
Wed, 31 Jul 2019 00:00:00 +0000
This is an interactive web application created in shiny.
It is an interactive approach to telling a data story. Below is an example using features of cars from the mtcars dataset. The Web App needs about 15 seconds to load below, please wait :-)
In this example we have a dropdown menu on the left, dynamically populated from the data. For each feature inside the dropdown menu we have a count of its unique values.
Auto Encoder
/aipost/auto-encoder/
Mon, 29 Jul 2019 00:00:00 +0000
An Auto Encoder encodes its Visible Input Nodes into Hidden Nodes, and the Visible Output Nodes are decoded from the Hidden Nodes so as to be identical to the Input Nodes.
It is not a pure Unsupervised Deep Learning algorithm, but rather a Self-Supervised Deep Learning algorithm.
Auto Encoders can be used for Feature Detection. Once we have encoded our data, the Hidden Nodes, also called Encoder Nodes, will represent certain features which are important in our data.
Boltzmann Machine
/aipost/boltzmann-machine/
Sun, 28 Jul 2019 00:00:00 +0000
The Boltzmann Machine is an Unsupervised Deep Learning model used for Recommendation Systems.
Boltzmann Machines are undirected models: they don’t have a direction in the connections, as described in the figure below.
From the figure above, we can see that there is no output layer.
This makes the Boltzmann Machine fundamentally different from all other algorithms: it doesn’t expect input data, but it generates information regardless of input nodes.
Ensemble Learning
/theorypost/ensemble-learning/
Fri, 26 Jul 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video explained by Josh Starmer.
The Boosting and Ensemble Learning concepts can be applied to many Machine Learning models: Boosting is a Meta Algorithm used to convert many weak learners into a strong learner in order to achieve good performance in supervised problems.
Principal Component Analysis
/theorypost/principal-component-analysis/
Fri, 26 Jul 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video explained by Josh Starmer.
Principal Component Analysis (PCA) is a deterministic method (a given input will always produce the same output). It is always good to perform a PCA: it is a data reduction technique that transforms a larger number of correlated variables into a much smaller set of uncorrelated variables called Principal Components.
CNN and Softmax
/aipost/cnn-and-softmax/
Tue, 23 Jul 2019 00:00:00 +0000
A Convolutional Neural Network (CNN) is a Supervised Deep Learning model used for Computer Vision.
The process of Convolutional Neural Networks can be divided into the following steps: Convolution, Max Pooling, Flattening, Full Connection.
STEP 1 - Convolution
At the basis of Convolution there is a filter, also called a Feature Detector or Kernel. We basically multiply a portion of the image by the filter and check how many 1s they have in common.
Self Organizing Maps
/aipost/self-organizing-maps/
Tue, 23 Jul 2019 00:00:00 +0000
A SOM is an Unsupervised Deep Learning model used for Feature Detection.
SOMs are great for dimensionality reduction.
They take a multidimensional data set with lots of columns and end up with a map in a two-dimensional representation using an unsupervised algorithm.
It is an approach similar to K-Means Clustering.
How SOMs Learn: Best Matching Unit BMU
The weights in SOMs are different.
Recurrent Neural Network in Theory
/aipost/recurrent-neural-network-in-theory/
Fri, 19 Jul 2019 00:00:00 +0000
An RNN is a Supervised Deep Learning model used for Time Series Analysis.
Recurrent Neural Networks represent one of the most advanced algorithms that exist in the world of supervised deep learning.
Frontal lobe and RNN
RNNs are like short-term memory. We will learn that they can remember things that just happened in the previous couple of observations and apply that knowledge going forward.
Simple Regression with Python
/theorypost/simple-regression-with-python/
Fri, 19 Jul 2019 00:00:00 +0000
The reticulate package provides a comprehensive set of tools for interoperability between Python and R.
# Data Preprocessing Template
# Importing the libraries
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
# print(r.df.head())
# Importing the dataset
df = pd.read_csv(r"C:\07 - R Website\dataset\PY\Salary_Data.csv") # load the data
df.head(10) # show first 10 observations
# # Preprocessing Input data
## YearsExperience Salary
## 0 1.
Alpha Beta Pruning
/aipost/alpha-beta-pruning/
Thu, 20 Jun 2019 00:00:00 +0000
Introduction
A fascinating aspect of our brain is synaptic pruning. One of the grand strategies nature uses to construct nervous systems is to overproduce neural elements, such as neurons, axons and synapses, and then prune the excess. In fact, this overproduction is so substantial that only about half of the neurons mammalian embryos generate will survive until birth. In the same way, pruning in an ANN is used to eliminate redundant connections between neurons during training.
The Activation Function
/aipost/the-activation-function/
Thu, 20 Jun 2019 00:00:00 +0000
What an artificial neuron does is calculate a weighted sum of its inputs, add a bias, and then decide whether it should be “fired” or not.
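The computation described above can be sketched in a few lines. The inputs, weights and bias are hypothetical values, and a simple threshold (step) function is assumed here as the most basic firing rule; it is only one of several possible activation functions.

```python
def neuron_output(inputs, weights, bias):
    """Weighted sum of the inputs plus the bias (the value Y, unbounded)."""
    return sum(w * x for w, x in zip(weights, inputs)) + bias


def step_activation(y, threshold=0.0):
    """Fire (1) if the weighted sum exceeds the threshold, otherwise stay silent (0)."""
    return 1 if y > threshold else 0


inputs = [0.5, -1.0, 2.0]      # hypothetical input values
weights = [0.8, 0.2, 0.1]      # hypothetical weights
bias = -0.3                    # hypothetical bias

y = neuron_output(inputs, weights, bias)
print(y, step_activation(y))
```

The weighted sum itself has no bounds; it is the activation function that turns it into a fire or no-fire decision.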
Considering the neuron in the figure above, the value of Y can be anything ranging from -inf to +inf. The neuron really doesn’t know the bounds of the value. How do we decide whether the neuron should fire or not?
Fitting a curve to data LOWESS and LOESS
/theorypost/fitting-a-curve-to-data-lowess-and-loess/
Tue, 18 Jun 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video explained by Josh Starmer.
Locally weighted regression scatterplot smoothing - LOWESS
The main idea for fitting a curve to data points is to use a type of sliding window to divide the data into smaller blobs. The second main idea is that, at each data point, we use least squares to fit a line.
Introduction to Convolutional Neural Networks
/aipost/introduction-to-convolutional-neural-networks/
Tue, 18 Jun 2019 00:00:00 +0000
Convolutional Neural Networks are among the most successful and widely used Neural Network algorithms. The three main components of a CNN are the Convolutional Layer, the Pooling Layer (used to reduce the computational space), and the Fully Connected Layer. Imagine we have to classify 32x32 images with three colour channels. A single Fully-Connected Neuron in a first hidden layer would have 32x32x3=3072 weights, and this structure cannot scale to larger images.
Ridge and Lasso Regression
/theorypost/ridge-and-lasso-regression/
Tue, 18 Jun 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video on Ridge Regression explained by Josh Starmer.
Click here to see the video on Lasso Regression explained by Josh Starmer.
Overfitting
In statistics, overfitting is the production of an analysis that corresponds too closely or exactly to a particular set of data, and may therefore fail to fit additional data or predict future observations reliably.
Backpropagation Intuition
/aipost/backpropagation/
Mon, 17 Jun 2019 00:00:00 +0000
Introduction
We already know that in an ANN there is a Forward Propagation, where the information is entered into the input layer and then propagated forward to get our output values, which we compare with the actual values in our training set to calculate the errors. Then the errors are back-propagated through the network in the opposite direction in order to adjust the weights.
Synthetic Minority Oversampling Technique
/theorypost/synthetic-minority-oversampling-technique/
Mon, 17 Jun 2019 00:00:00 +0000
Oversampling and undersampling in data analysis are techniques used to adjust the class distribution of a data set. Both oversampling and undersampling involve introducing a bias to select more samples from one class than from another, to compensate for an imbalance that is already present in the data. Data Imbalance can be of the following types: Under-representation of a class in one or more important predictor variables.
Cluster Analysis in Theory
/theorypost/cluster-analysis-in-theory/
Wed, 12 Jun 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video on K-Means explained by Josh Starmer.
Click here to see the video on Hierarchical Clustering explained by Josh Starmer.
Cluster analysis or clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
Stochastic Gradient Descent
/aipost/stochastic-gradient-descent/
Tue, 11 Jun 2019 00:00:00 +0000
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video explained by Josh Starmer.
In Gradient Descent, we used the sum of the squared residuals as the Loss Function to determine how well the initial line fit the data. Then, we took the derivative of the sum of the squared residuals with respect to the intercept and slope.
How to create 3D and 4D plot
/graphpost/how-to-create-3d-and-4d-plot/
Fri, 07 Jun 2019 00:00:00 +0000
A 3D plot is quite popular, in particular in business presentations, but it is almost always inappropriately used. In fact, it is rare to see a 3D plot that could not be improved by turning it into a regular 2D figure.
Visualizations using 3D position scales can sometimes be appropriate, however. If the visualization is shown slowly rotating, rather than as a static image from one perspective, it will allow the viewer to discern where in 3D space the different graphical elements reside.
How to compare data at different scales
/graphpost/how-to-compare-data-at-different-scales/
Tue, 04 Jun 2019 00:00:00 +0000
I recently underwent two different tests of the so-called Hair Analysis, and as a result I was given different conclusions. More precisely, I sent two of my hair samples to two different labs, one in Italy and the second in Switzerland. The results are quite different, as described by the table and graph below. In 2011 a comprehensive review of the scientific literature on hair elemental (mineral) analysis was published.
The Learning Process
/aipost/the-learning-process/
Mon, 20 May 2019 00:00:00 +0000
Learning means generalizing what we have learned and improving the performance on the same task based on a given measure. More specifically, it means adjusting the parameters of the model in order to accurately predict the dependent variables on new input data. More formally we can define two main functions: the Score Function and the Loss Function. The Score Function describes our mapping from the input space x to the output space y.
Cox Proportional Hazards Model
/tspost/cox-proportional-hazards-model/
Mon, 13 May 2019 00:00:00 +0000
A hazard rate is the probability estimate of the time it takes for an event to take place. The event can be anything, ranging from the death of an organism to the failure of a machine or any other time-to-event setting. There are external factors that influence the probability of an event: the covariates. For example: how many miles was the car driven, or did the owner change the oil regularly?
Parametric Regression Model in Survival Analysis
/tspost/parametric-regression-model-in-survival-analysis/
Mon, 13 May 2019 00:00:00 +0000
There are differences between Non-Parametric Models (e.g. Kaplan-Meier), Semi-Parametric Models (e.g. Cox Proportional Hazard), and Parametric Models. The graph below gives the main pieces of information. A survival analysis can be defined as consisting of two parts. The first part is the core survival object, with a time indicator plus the corresponding event status (used to calculate the baseline hazard). The second part of the survival model consists of the covariates.
Survival Trees
/tspost/survival-trees/
Mon, 13 May 2019 00:00:00 +0000/tspost/survival-trees/body {
text-align: justify}
A survival tree is a decision tree fitted on survival data. It allows covariates to be incorporated, much as in a Cox Proportional Hazard model. When we use a survival tree, we have to keep a few things in mind. First of all, it is a very good choice for huge datasets. Decision or survival trees require a huge amount of data to be precise enough.Survival Analysis: Kaplan-Meier & Logrank test
/tspost/introduction-of-survival-analysis/
Tue, 30 Apr 2019 00:00:00 +0000/tspost/introduction-of-survival-analysis/body {
text-align: justify}
The ultimate goal of survival analysis is to gain information on the expected duration of time until one or more events happen. Survival analysis is applied in different fields, and most of these fields have different terms for the same concept. So, it is sometimes called Reliability Theory or Reliability Analysis in engineering. It is also called Duration Analysis in economics or Event-history Analysis in sociology.Time Series Classification
/tspost/time-series-classification/
Fri, 05 Apr 2019 00:00:00 +0000/tspost/time-series-classification/body {
text-align: justify}
To perform time series classification we use a decision tree, and then we look at the performance of the classification.
We use the Synthetic Control Chart Time Series. This dataset contains 600 examples of control charts synthetically generated by the process in Alcock and Manolopoulos (1999).
data <- read.table("C:/07 - R Website/dataset/TS/synthetic_control.txt", header = FALSE)
# Data Preparation
pattern100 <- c(rep('Normal', 100),
rep('Cyclic', 100),
rep('Increasing trend', 100),
rep('Decreasing trend', 100),
rep('Upward shift', 100),
rep('Downward shift', 100))
# Create data frame
newdata <- data.Linear Discriminant Analysis
/mlpost/linear-discriminant-analysis/
Thu, 04 Apr 2019 00:00:00 +0000/mlpost/linear-discriminant-analysis/body {
text-align: justify}
Linear Discriminant Analysis was originally developed by R.A. Fisher to classify subjects into one of two clearly defined groups. It was later expanded to classify subjects into more than two groups. It helps to find linear combinations of the original variables that provide the best possible separation between the groups.
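As a minimal sketch of the idea, assuming the built-in iris data (not the dataset used in the post) and the lda() function from the MASS package, the linear combinations and the resulting classification can be inspected as follows:

```r
# Minimal LDA sketch on the built-in iris data (an assumed example dataset)
library(MASS)  # provides lda()

# Fit LDA with Species as the known grouping and all other columns as predictors
fit <- lda(Species ~ ., data = iris)

# Coefficients of the linear discriminants (linear combinations of the predictors)
print(fit$scaling)

# Classify the training observations and compute the training accuracy
pred <- predict(fit, iris)$class
accuracy <- mean(pred == iris$Species)
print(accuracy)
```

With three groups there are at most two discriminant functions, which is why fit$scaling has two columns here. The accuracy is computed on the training data only, so it is an optimistic estimate.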
Linear Discriminant Analysis is focused on maximizing the separability among known categories. A problem arises when 2 features are not sufficient to capture most of the variation.Time Series Clustering
/tspost/time-series-clustering/
Thu, 04 Apr 2019 00:00:00 +0000/tspost/time-series-clustering/body {
text-align: justify}
Clustering is the task of grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. Connectivity-based clustering connects objects to form clusters based on their distance. A cluster can be described by the maximum distance needed to connect parts of the cluster. At different distances, different clusters will form, which can be represented using a dendrogram.Classification and Prediction with Support Vector Machine
/mlpost/classification-and-prediction-with-support-vector-machine/
Wed, 03 Apr 2019 00:00:00 +0000/mlpost/classification-and-prediction-with-support-vector-machine/body {
text-align: justify}
Support Vector Machine (SVM) is a linear classifier. We can consider SVM for linearly separable binary sets. The goal is to design a hyperplane (a subspace whose dimension is one less than that of its ambient space; if a space is 3-dimensional, then its hyperplanes are the 2-dimensional planes). The hyperplane classifies all the training vectors into two classes. We can have many possible hyperplanes that are able to classify correctly all the elements in the feature set, but the best choice will be the hyperplane that leaves the Maximum Margin from both classes.Interactive Forecasting
/graphpost/interactive-forecasting/
Wed, 03 Apr 2019 00:00:00 +0000/graphpost/interactive-forecasting/body {
text-align: justify}
In this post, we use some fairly new techniques for time series analysis, namely neural nets and interactive charting tools.
INTERACTIVE GRAPH The time series results should be presented interactively in order to highlight certain features.
# Handle outliers
library(forecast)
myts <- tsclean(myts)
# Set up a NN
mynnetar <- nnetar(myts)
# Forecast 3 years
nnetforecast <- forecast(mynnetar, h = 36, PI = TRUE) # PI = TRUE computes the prediction intervals for the forecast
library(ggplot2)
# Data we need for the graph
data <- nnetforecast$x # raw data
lower <- nnetforecast$lower[,2] # prediction interval lower bound (second level)
upper <- nnetforecast$upper[,2] # prediction interval upper bound (second level)
pforecast <- nnetforecast$mean # point forecasts (the mean element)
mydata <- cbind(data, lower, upper, pforecast) # combine everything into one object
library(dygraphs)
dygraph(mydata, main = "Campsite Restaurant") %>% # get data and the caption
dyRangeSelector() %>% # the zoom tool
dySeries(name = "data", label = "Revenue Data") %>% # add the time series stored in: data <- nnetforecast$x
dySeries(c("lower","pforecast","upper"), label = "Revenue Forecast") %>% # add the forecast and its interval
dyLegend(show = "always", hideOnMouseOut = FALSE) %>% # add the legend (time series + forecast)
dyAxis("y", label = "Monthly Revenue USD") %>% # label the y axis
dyHighlight(highlightCircleSize = 5, # specify what happens when the mouse hovers over the graph
highlightSeriesOpts = list(strokeWidth = 2)) %>%
dyOptions(axisLineColor = "navy", gridLineColor = "grey") %>% # set axis and gridline colors
dyAnnotation("2010-8-1", text = "CF", tooltip = "Camp Festival", attachAtBottom = TRUE) # add annotation
Feature Selection using Boruta Algorithm
/mlpost/feature-selection-using-boruta-algorithm/
Tue, 02 Apr 2019 00:00:00 +0000/mlpost/feature-selection-using-boruta-algorithm/body {
text-align: justify}
Variable selection is an important aspect because it helps in building predictive models free from correlated variables, biases and unwanted noise. The Boruta Algorithm is a feature selection algorithm. As a matter of interest, the Boruta algorithm derives its name from a demon in Slavic mythology who lived in pine forests.
How Boruta Algorithm works Firstly, it adds randomness to the given data set by creating shuffled copies of all features which are called Shadow Features.Random Forest Hyperparameters Tuning
/mlpost/random-forest-hyperparameters-tuning/
Tue, 02 Apr 2019 00:00:00 +0000/mlpost/random-forest-hyperparameters-tuning/body {
text-align: justify}
Random Forest is a Bagging process of Ensemble Learners. Random Forests are built from Decision Trees. Decision Trees work great, but they are not flexible when it comes to classifying new samples. Random Forest creates a bootstrapped dataset of the same size as the original by randomly selecting rows with replacement. After creating the bootstrapped dataset, it builds a decision tree on it, using only a random subset of variables at each step.Principal Component Analysis
/mlpost/principal-component-analysis/
Mon, 01 Apr 2019 00:00:00 +0000/mlpost/principal-component-analysis/body {
text-align: justify}
Principal Component Analysis (PCA) is a deterministic method (a given input will always produce the same output). It is often worth performing a PCA: it is a data reduction technique that transforms a larger number of correlated variables into a much smaller set of uncorrelated variables called PRINCIPAL COMPONENTS. For example, we might use PCA to transform many correlated (and possibly redundant) variables into a smaller number of uncorrelated variables that retain as much information from the original set of variables as possible.Interactive Dashboard
/graphpost/interactive-dashboard/
Wed, 27 Mar 2019 00:00:00 +0000/graphpost/interactive-dashboard/body {
text-align: justify}
Dashboards are often implemented for one very specific dataset. The problem with this is that we have to rebuild them from scratch every time. The advantage of using shiny is the possibility to create an interactive dashboard or web app, and by reusing the code already written we can adapt it to new data. Below are some examples of interactive dashboards.Polynomial Regression & Smoothing Splines
/mlpost/polynomial-regression-smoothing-splines/
Wed, 27 Mar 2019 00:00:00 +0000/mlpost/polynomial-regression-smoothing-splines/body {
text-align: justify}
Polynomial Linear Regression Polynomial Linear Regression fits a nonlinear relationship between the value of x and the corresponding conditional mean of y, and has been used to describe nonlinear phenomena such as the progression of disease epidemics. Although polynomial regression fits a nonlinear model to the data, as a statistical estimation problem it is linear, in the sense that the regression function is linear in the unknown parameters that are estimated from the data.Fuzzy Matching Addresses to Prevent Fraudulent Application
/mlpost/fuzzy-matching-addresses-to-prevent-fraudulent-application/
Mon, 25 Mar 2019 00:00:00 +0000/mlpost/fuzzy-matching-addresses-to-prevent-fraudulent-application/body {
text-align: justify}
Application fraud refers to fraud committed by submitting a new credit application with fraudulent details to a credit provider. Normally, fraudsters collect the personal and financial data of innocent users from identity documents, pay slips, bank statements, and other source documents to commit the application fraud. The information collected from all these documents will be either forged or sometimes the document itself will be stolen illegally or the details in the documents will be changed for the purpose of submitting a new credit application.Generalized Additive Models GAMs
/mlpost/generalized-addictive-models-gams/
Fri, 22 Mar 2019 00:00:00 +0000/mlpost/generalized-addictive-models-gams/body {
text-align: justify}
Generalized Additive Models (GAMs) incorporate nonlinear forms of the predictors and are useful when the relationship between the response variable and the predictors is not linear. GAMs do not force the predictors into a fixed form such as a square, as polynomial regression does; instead they fit a smooth curve. The data we use here is the biocapacity of different countries.
library(psych)
eco <- read.csv("C:/07 - R Website/dataset/ML/biocap.csv")
pairs.panels(eco, method = "pearson", # correlation method
hist.Deal Multicollinearity with LASSO Regression
/mlpost/deal-multicollinearity-with-lasso-regression/
Thu, 21 Mar 2019 00:00:00 +0000/mlpost/deal-multicollinearity-with-lasso-regression/body {
text-align: justify}
Multicollinearity is a phenomenon in which two or more predictors in a multiple regression are highly correlated (R-squared more than 0.7); this can inflate the variance of our regression coefficients. We can test for multicollinearity with the Variance Inflation Factor (VIF), the ratio of the variance of a coefficient in a model with multiple terms to its variance in a model with that term alone. VIF = 1 / (1 - R-squared), where R-squared comes from regressing that predictor on the other predictors. A rule of thumb is that if VIF > 10 then multicollinearity is high (a cutoff of 5 is also commonly used).Deal Multicollinearity with Ridge Regression
/mlpost/deal-multicollinearity-with-ridge-regression/
Thu, 21 Mar 2019 00:00:00 +0000/mlpost/deal-multicollinearity-with-ridge-regression/body {
text-align: justify}
Multicollinearity is a phenomenon in which two or more predictors in a multiple regression are highly correlated (R-squared more than 0.7); this can inflate the variance of our regression coefficients. We can test for multicollinearity with the Variance Inflation Factor (VIF), the ratio of the variance of a coefficient in a model with multiple terms to its variance in a model with that term alone. VIF = 1 / (1 - R-squared), where R-squared comes from regressing that predictor on the other predictors. A rule of thumb is that if VIF > 10 then multicollinearity is high (a cutoff of 5 is also commonly used).Deal Outliers with Robust Regression
/mlpost/deal-outliers-with-robust-regression/
Thu, 21 Mar 2019 00:00:00 +0000/mlpost/deal-outliers-with-robust-regression/body {
text-align: justify}
This is a regression technique that can help us alleviate the problem of outliers. Robust Regression is a family of regression techniques that are largely insensitive to the presence of outliers. Least Trimmed Squares Regression is a technique that fits a regression function and is not affected by the presence of outliers. It attempts to minimise the sum of squared residuals over a subset of k points.Quantile Regression in Medical Expenditures
/mlpost/quantile-regression-in-medical-expenditures/
Wed, 20 Mar 2019 00:00:00 +0000/mlpost/quantile-regression-in-medical-expenditures/body {
text-align: justify}
Quantile regression gives a more comprehensive picture of the effect of the independent variables on the dependent variable. Instead of estimating the model with average effects using the OLS linear model, quantile regression produces different effects along the distribution (quantiles) of the dependent variable. The dependent variable should be continuous, with no zeros and not too many repeated values. Examples include estimating the effects of household income on food expenditures for low- and high-expenditure households, or the factors influencing total medical expenditures for people with low, medium and high expenditures.Interactive Graphs
/graphpost/interactive-graphs/
Fri, 15 Mar 2019 00:00:00 +0000/graphpost/interactive-graphs/body {
text-align: justify}
In this post we are going to create interactive graphs using Plotly. Plotly allows us to create interactive charts, plots and maps with R. Plotly is designed to build a vast range of visualizations. Crucially, it has the ability to automatically create interactive charts from the output of ggplot2, which is one of the most advanced R libraries for creating scientific graphs.Interactive Tables
/graphpost/interactive-tables/
Thu, 14 Mar 2019 00:00:00 +0000/graphpost/interactive-tables/body {
text-align: justify}
Often, it is useful to provide interactive tables alongside charts. Responsively designed web content reflows itself depending on the width of the browser window. There are many columns in the table which are over to the right-hand side and we need to scroll to access them. So it would be nice if the columns that don’t fit on the screen were instead collapsed, with the option to expand them.Naive Bayes Classification
/mlpost/naive-bayes-classification/
Thu, 14 Mar 2019 00:00:00 +0000/mlpost/naive-bayes-classification/body {
text-align: justify}
Naive Bayes is an effective and commonly used machine learning classifier. It is a probabilistic classifier that makes classifications using the Maximum A Posteriori decision rule in a Bayesian setting. It can also be represented using a very simple Bayesian network. Naive Bayes classifiers have been especially popular for text classification, and are a traditional solution for problems such as spam detection.
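The Maximum A Posteriori rule can be sketched in a few lines of base R. The priors, the word likelihoods and the helper classify_map() below are hypothetical values and names chosen only for illustration; they do not come from the post.

```r
# Hypothetical priors and word likelihoods (illustrative numbers only)
prior <- c(spam = 0.4, ham = 0.6)
likelihood <- rbind(
  spam = c(offer = 0.30, meeting = 0.05),
  ham  = c(offer = 0.02, meeting = 0.20)
)

# MAP rule under the naive independence assumption:
# pick the class that maximizes log P(class) + sum of log P(word | class)
classify_map <- function(words) {
  log_post <- log(prior) + rowSums(log(likelihood[, words, drop = FALSE]))
  names(which.max(log_post))
}

print(classify_map("offer"))
print(classify_map(c("meeting", "meeting")))
```

Working with logarithms avoids numerical underflow when many word probabilities are multiplied; the normalizing constant is omitted because it is the same for every class and does not change which class has the largest posterior.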
An intuitive explanation for the Maximum A Posteriori Probability (MAP) is to think of probabilities as degrees of belief.Extreme Gradient Boosting Algorithm
/mlpost/extreme-gradient-boosting-algorithm/
Wed, 13 Mar 2019 00:00:00 +0000/mlpost/extreme-gradient-boosting-algorithm/body {
text-align: justify}
Extreme Gradient Boosting is extensively used because it is fast and accurate, and can handle missing values. Gradient boosting is a machine learning technique for regression and classification problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stage-wise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function.Forecasting Product Demand
/tspost/forecasting-product-demand/
Thu, 28 Feb 2019 00:00:00 +0000/tspost/forecasting-product-demand/body {
text-align: justify}
Accurately predicting demand for products allows a company to stay ahead of the market. We will predict demand for multiple products across a region of a state in the US. Then we will roll up these predictions across many different regions of the same state to form a complete hierarchical forecasting system. We need to forecast the future values of our data.Interactive Network
/graphpost/interactive-network/
Thu, 28 Feb 2019 00:00:00 +0000/graphpost/interactive-network/body {
text-align: justify}
An interactive network is very useful for visualizing the connections and relationships between individuals, locations, and other data sets.
library("tidyverse")
library("leaflet")
# library("oidnChaRts")
transport_data <- read_csv("C:/07 - R Website/dataset/Graph/transport_data.csv")
colnames(transport_data) <- colnames(transport_data) %>%
gsub("sender", "start", .) %>%
gsub("receiver", "end", .)
transport_data <- transport_data %>%
unite(start.location, c(start.country, start.city, start.state)) %>%
unite(end.location, c(end.country, end.city, end.state))
# transport_data %>%
# geo_lines_plot()
The map above shows a number of locations on the globe with lines between them, marking both start and end points.Applied Time Series and Forecasting
/tspost/applied-time-series-and-forecasting/
Wed, 27 Feb 2019 00:00:00 +0000/tspost/applied-time-series-and-forecasting/body {
text-align: justify}
In this particular field R is favored over Python. In fact, R has more features for Time Series. A valuable resource is Rob Hyndman’s blog. It explains step by step the standard univariate time series analysis.
FIRST TRENDING DATA In this example we explore how many people are working in a country: unemployment rate vs. labor force participation rate. The unemployment rate is often used for propaganda purposes, because low unemployment rates paint an optimistic picture of the economy of a country.Neural Nets and Interactive Graphs
/tspost/neural-nets-and-interactive-graphs/
Wed, 27 Feb 2019 00:00:00 +0000/tspost/neural-nets-and-interactive-graphs/body {
text-align: justify}
In this post, we use some fairly new techniques for time series analysis, namely neural nets and interactive charting tools. These techniques are the state of the art. The dataset we use for this example has missing data, outliers and poor formatting. The dataset is about a restaurant at a campsite that is open the whole year. There is a peak season in summer, and so we expect the data to be seasonal, and a trend might be present.Supply Chain Foundations
/tspost/supply-chain-foundations/
Mon, 25 Feb 2019 00:00:00 +0000/tspost/supply-chain-foundations/body { text-align: justify} INTRODUCTION Why do companies care about supply chain management? We know that food is always at the grocery store, and clothing at the department store. How do they get there, and who makes sure those items are there every single day? This is the job of the supply chain manager.
In supply chain management we start with purchasing, which some people call Procurement. The second part is Manufacturing and Operations, where the product is made, and we have to do that quickly and be able to do it on a day.Gradient Descent Step by Step
/theorypost/gradient-descent-step-by-step/
Wed, 13 Feb 2019 00:00:00 +0000/theorypost/gradient-descent-step-by-step/body {
text-align: justify}
This article is a summary of the StatQuest video made by Josh Starmer.
Click here to see the video explained by Josh Starmer.
Introduction
In statistics, Machine Learning and other Data Science fields, we optimize a lot of stuff. For example, in linear regression we optimize the Intercept and Slope, and when we use Logistic Regression we optimize the squiggle. Moreover, in t-SNE we optimize clusters.Data Science using Agile Methodology
/theorypost/data-science-using-agile-methodology/
Mon, 11 Feb 2019 00:00:00 +0000/theorypost/data-science-using-agile-methodology/body { text-align: justify} INTRODUCTION A data science team asks great questions, explores the data, and delivers key insights. The best way to generate business value is to deliver a constant stream of key insights in short two-week sprints. A short sprint will also help the team pivot so they can ask new questions based on what they learn from the data.
WORK ON A DATA SCIENCE PROJECT Typical projects have upfront requirements, and we need to understand what we are going to build before starting to plan the project.Statistical Background for Time Series
/tspost/statistica-background-for-time-series/
Wed, 06 Feb 2019 00:00:00 +0000/tspost/statistica-background-for-time-series/body {
text-align: justify}
In this post we will review the statistical background for time series analysis and forecasting. We start with how to compare different time series models against each other.
Forecast Accuracy It determines how much difference there is between the actual value and the forecast for that value. The simplest way to make a comparison is via scale-dependent errors, which require all the models to be on the same scale, such as the Mean Absolute Error (MAE) and the Root Mean Squared Error (RMSE).Distinguish Benign and Malign Tumor via ANN
/aipost/distinguish-benign-and-malign-tumor-via-ann/
Tue, 05 Feb 2019 00:00:00 +0000/aipost/distinguish-benign-and-malign-tumor-via-ann/body {
text-align: justify}
We try to recognize cancer in the human breast using a multi-hidden-layer artificial neural network via the H2O package. We use the Wisconsin Breast-Cancer Dataset, which is a collection of Dr. Wolberg's real clinical cases. There are no images, but we can recognize malignant tumors based on 10 biomedical attributes. We have a total of 699 patients divided into two classes: malignant and benign cancer.Event Processing
/theorypost/event-processing/
Wed, 23 Jan 2019 00:00:00 +0000/theorypost/event-processing/body {
text-align: justify}
Process Data
Event data consists of three basic components: the why, the what and the who.
Analysing event data is an iterative process of three steps: extraction (from raw data to event log), processing (removing redundant details, enriching data by calculating variables) and analysis.
The analysis could look, for instance, at the roles of different doctors and nurses in an organization and how they work together.Continuous Probability
/theorypost/continuous-probability/
Tue, 22 Jan 2019 00:00:00 +0000/theorypost/continuous-probability/body {
text-align: justify}
Empirical Cumulative Distribution Function When summarizing a list of numeric values such as heights, it’s not useful to construct a distribution that assigns a proportion to each possible outcome. It is much more practical to define a function that operates on intervals rather than single values. The standard way of doing this is using the cumulative distribution function (CDF). As an example, we define the empirical cumulative distribution function (eCDF) for heights for male adult students.How many Monte Carlo are enough?
/theorypost/how-many-monte-carlo-are-enough/
Sun, 20 Jan 2019 00:00:00 +0000/theorypost/how-many-monte-carlo-are-enough/body {
text-align: justify}
Here is an example of solving the birthday problem via Monte Carlo. Suppose you’re in a classroom with 22 people. If we assume this is a randomly selected group, what is the chance that at least two people have the same birthday?
This is a problem of discrete probability.
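A minimal Monte Carlo sketch of this question is given here, assuming 365 equally likely birthdays (leap years ignored); the seed and the number of replications are arbitrary choices made only for this illustration.

```r
set.seed(1)       # assumed seed, only for reproducibility
n_people <- 22    # group size from the question
n_sims <- 10000   # assumed number of Monte Carlo replications

# One replication: draw birthdays uniformly from 1..365 and check for a duplicate
shared <- replicate(n_sims, {
  bdays <- sample(1:365, n_people, replace = TRUE)
  any(duplicated(bdays))
})
mc_estimate <- mean(shared)

# Exact probability for comparison: 1 - P(all birthdays are distinct)
exact <- 1 - prod((365 - 0:(n_people - 1)) / 365)

print(c(monte_carlo = mc_estimate, exact = exact))
```

Comparing the simulated proportion with the exact value is one way to judge whether the chosen number of replications is large enough: the estimate should stabilize around the exact probability as n_sims grows.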
All right, first, note that birthdays can be represented as numbers between 1 and 365.The Monty Hall Problem
/theorypost/the-monty-hall-problem/
Sun, 20 Jan 2019 00:00:00 +0000/theorypost/the-monty-hall-problem/body {
text-align: justify}
In the 1970s, there was a game show called Let’s Make a Deal. Monty Hall was the host; this is where the name of the problem comes from. At some point in the game, contestants were asked to pick one of three doors. Behind one door, there was a prize. The other two had a goat behind them. And this basically meant that you lost.Automotive Multivariate Visualization
/graphpost/automotive-multivariate-visualization/
Sat, 19 Jan 2019 00:00:00 +0000/graphpost/automotive-multivariate-visualization/body {
text-align: justify}
This is a session dedicated to multivariate data visualization using some typical features of automobiles. Below we can see the correlation matrix between features and a graphical representation.
mpg cyl disp hp drat wt qsec vs am gear carb
mpg 1.00 -0.85 -0.85 -0.78 0.68 -0.87 0.42 0.66 0.60 0.48 -0.55
cyl -0.85 1.00 0.90 0.83 -0.70 0.78 -0.59 -0.81 -0.How Neural Network learn? An example of risk of churn.
/aipost/how-neural-network-learn/
Mon, 14 Jan 2019 00:00:00 +0000/aipost/how-neural-network-learn/body {
text-align: justify}
Consider a one-layer neural network (single-layer feedforward) whose output value is to be compared to the actual value. Based on the activation function we have our output. In order to be able to learn, we have to compare the output value with the actual value via the cost function, which is half of the squared difference between the output and the actual value.Customer segmentation via K-Means & Hierarchical clustering
/mlpost/customer-segmentation/
Sat, 12 Jan 2019 00:00:00 +0000/mlpost/customer-segmentation/body {
text-align: justify}
Consider a big mall in a specific city that holds information on its clients who subscribed to a membership card. The last feature is the Spending Score, a score that the mall computed for each of its clients based on several criteria including, for example, their income, the number of times per week they show up in the mall and, of course, the amount of dollars they spent in a year.Assessing the success of a new product via multiple classifiers
/mlpost/estimate-the-sucess-of-a-new-product-with-logistic-regression/
Thu, 10 Jan 2019 00:00:00 +0000/mlpost/estimate-the-sucess-of-a-new-product-with-logistic-regression/body {
text-align: justify}
This is a series of analyses to illustrate the main classification algorithms and their advantages.
The table shows the business clients of a company that has just launched a new product online. Some of the clients responded positively to the ads by buying the product and others responded negatively by not buying the product. The last column of the table tells for each user if the user bought the product or not.Interview
/about/
Thu, 05 May 2016 21:48:51 -0700/about/body { text-align: justify} .rwd-video { height: 0; overflow: hidden; padding-bottom: 56.25%; padding-top: 30px; position: relative; } .rwd-video iframe, .rwd-video object, .rwd-video embed { height: 100%; left: 0; position: absolute; top: 0; width: 100%; } I started my career as a Data Scientist more than 10 years ago in the field of Neuroscience at the University of Verona - School of Medicine, Italy. As the point person in data analysis, I studied the single and multi-unit recordings from behaving non-human primates.