## Posts Tagged ‘neural network’

### It is not the machine that is learning. Are human algorithms forcing everyone to adapt or die?

Posted on: November 16, 2020

# Which machine learning algorithm should I use? How many and which one is best?

Note: In the early 1990s, I took graduate classes in Artificial Intelligence (AI), built around the “if… then” series of questions and answers elicited from experts in their fields of work, and in neural networks developed by psychologists.

The concepts are the same, though upgraded with new algorithms and automation.

I recall a book with a table (like the Mendeleev table in chemistry) that contained the terms, mental processes, and mathematical concepts behind the ideas that formed the AI trend…

There are several lists of methods, depending on the field of study you are more concerned with.

One list consists of methods that human factors professionals are trained to use if need be, such as:

Verbal protocol, neural network, utility theory, preference judgments, psycho-physical methods, operational research, prototyping, information theory, cost/benefit methods, various statistical modeling packages, and expert systems.

There are those that are intrinsic to artificial intelligence methodology such as:

Fuzzy logic, robotics, discrimination nets, pattern matching, knowledge representation, frames, schemata, semantic network, relational databases, searching methods, zero-sum games theory, logical reasoning methods, probabilistic reasoning, learning methods, natural language understanding, image formation and acquisition, connectedness, cellular logic, problem solving techniques, means-end analysis, geometric reasoning system, algebraic reasoning system.

Hui Li posted on Subconscious Musings on April 12, 2017, under Advanced Analytics | Machine Learning:

This resource is designed primarily for beginner to intermediate data scientists or analysts who are interested in identifying and applying machine learning algorithms to address the problems of their interest.

A typical question asked by a beginner, when facing a wide variety of machine learning algorithms, is “which algorithm should I use?”

The answer to the question varies depending on many factors, including:

• The size, quality, and nature of data.
• The available computational time.
• The urgency of the task.
• What you want to do with the data.

Even an experienced data scientist cannot tell which algorithm will perform the best before trying different algorithms.

We are not advocating a one-and-done approach, but we do hope to provide some guidance on which algorithms to try first depending on some clear factors.

## The machine learning algorithm cheat sheet

The machine learning algorithm cheat sheet helps you to choose from a variety of machine learning algorithms to find the appropriate algorithm for your specific problems.

This article walks you through the process of how to use the sheet.

Since the cheat sheet is designed for beginner data scientists and analysts, we will make some simplified assumptions when talking about the algorithms.

The algorithms recommended here result from compiled feedback and tips from several data scientists and machine learning experts and developers.

There are several issues on which we have not reached an agreement and for these issues we try to highlight the commonality and reconcile the difference.

Additional algorithms will be added later as our library grows to encompass a more complete set of available methods.

### How to use the cheat sheet

Read the path and algorithm labels on the chart as “If <path label> then use <algorithm>.” For example:

• If you want to perform dimension reduction then use principal component analysis.
• If you need a numeric prediction quickly, use decision trees or linear regression.
• If you need a hierarchical result, use hierarchical clustering.

Sometimes more than one branch will apply, and other times none of them will be a perfect match.

It’s important to remember these paths are intended to be rule-of-thumb recommendations, so some of the recommendations are not exact.

Several data scientists I talked with said that the only sure way to find the very best algorithm is to try all of them.

(Is that a process to find an algorithm that matches your world view on an issue? Or an answer that satisfies your boss?)

## Types of machine learning algorithms

This section provides an overview of the most popular types of machine learning. If you’re familiar with these categories and want to move on to discussing specific algorithms, you can skip this section and go to “When to use specific algorithms” below.

### Supervised learning

Supervised learning algorithms make predictions based on a set of examples.

For example, historical sales can be used to estimate the future prices. With supervised learning, you have an input variable that consists of labeled training data and a desired output variable.

You use an algorithm to analyze the training data to learn the function that maps the input to the output. This inferred function maps new, unknown examples by generalizing from the training data to anticipate results in unseen situations.

• Classification: When the data are being used to predict a categorical variable, supervised learning is also called classification. This is the case when assigning a label or indicator, such as dog or cat, to an image. When there are only two labels, this is called binary classification. When there are more than two categories, the problem is called multi-class classification.
• Regression: When predicting continuous values, the problem becomes a regression problem.
• Forecasting: This is the process of making predictions about the future based on past and present data. It is most commonly used to analyze trends. A common example is estimating next year's sales based on the sales of the current year and previous years.

### Semi-supervised learning

The challenge with supervised learning is that labeling data can be expensive and time consuming. If labels are limited, you can use unlabeled examples to enhance supervised learning. Because the machine is not fully supervised in this case, we say the machine is semi-supervised. With semi-supervised learning, you use unlabeled examples with a small amount of labeled data to improve the learning accuracy.

### Unsupervised learning

When performing unsupervised learning, the machine is presented with totally unlabeled data. It is asked to discover the intrinsic patterns that underlie the data, such as a clustering structure, a low-dimensional manifold, or a sparse tree and graph.

• Clustering: Grouping a set of data examples so that examples in one group (or one cluster) are more similar to each other (according to some criteria) than to those in other groups. This is often used to segment the whole dataset into several groups. Analysis can be performed in each group to help users find intrinsic patterns.
• Dimension reduction: Reducing the number of variables under consideration. In many applications, the raw data have very high dimensional features and some features are redundant or irrelevant to the task. Reducing the dimensionality helps to find the true, latent relationship.

### Reinforcement learning

Reinforcement learning analyzes and optimizes the behavior of an agent based on the feedback from the environment. Machines try different scenarios to discover which actions yield the greatest reward, rather than being told which actions to take. Trial-and-error and delayed reward distinguish reinforcement learning from other techniques.

## Considerations when choosing an algorithm

When choosing an algorithm, always take these aspects into account: accuracy, training time and ease of use. Many users put the accuracy first, while beginners tend to focus on algorithms they know best.

When presented with a dataset, the first thing to consider is how to obtain results, no matter what those results might look like. Beginners tend to choose algorithms that are easy to implement and can obtain results quickly. This works fine, as long as it is just the first step in the process. Once you obtain some results and become familiar with the data, you may spend more time using more sophisticated algorithms to strengthen your understanding of the data, hence further improving the results.

Even in this stage, the best algorithms might not be the methods that have achieved the highest reported accuracy, as an algorithm usually requires careful tuning and extensive training to obtain its best achievable performance.

## When to use specific algorithms

Looking more closely at individual algorithms can help you understand what they provide and how they are used. These descriptions provide more details and give additional tips for when to use specific algorithms, in alignment with the cheat sheet.

### Linear regression and Logistic regression

Linear regression is an approach for modeling the relationship between a continuous dependent variable $y$ and one or more predictors $X$. The relationship between $y$ and $X$ can be linearly modeled as $y=\beta^T X+\epsilon$. Given the training examples $\{x_i,y_i\}_{i=1}^N$, the parameter vector $\beta$ can be learnt.
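As a minimal sketch of learning the parameters from training examples, assume a single predictor plus an intercept, so the model reads $y=\beta_0+\beta_1 x+\epsilon$, and assume ordinary least squares as the fitting criterion (the text does not name one). The function name `fit_line` and the toy data are illustrative only.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = b0 + b1 * x (one predictor)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope = covariance(x, y) / variance(x)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b1 = sxy / sxx
    b0 = mean_y - b1 * mean_x
    return b0, b1

# Toy data generated from y = 1 + 2x with no noise
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 3.0, 5.0, 7.0]
b0, b1 = fit_line(xs, ys)
print(b0, b1)
```

With noise-free data the fit recovers the generating intercept and slope; with real data the same formulas give the best straight line in the squared-error sense.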

If the dependent variable is not continuous but categorical, linear regression can be transformed to logistic regression using a logit link function. Logistic regression is a simple, fast yet powerful classification algorithm.

Here we discuss the binary case where the dependent variable $y$ only takes binary values, $y_i\in\{-1,1\}$ for $i=1,\ldots,N$ (which can be easily extended to multi-class classification problems).

In logistic regression we use a different hypothesis class to try to predict the probability that a given example belongs to the “1” class versus the probability that it belongs to the “-1” class. Specifically, we will try to learn a function of the form: $p(y_i=1|x_i)=\sigma(\beta^T x_i)$ and $p(y_i=-1|x_i)=1-\sigma(\beta^T x_i)$.

Here $\sigma(x)=\frac{1}{1+\exp(-x)}$ is a sigmoid function. Given the training examples $\{x_i,y_i\}_{i=1}^N$, the parameter vector $\beta$ can be learnt by maximizing the log-likelihood of $\beta$ given the data set.
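The model $p(y_i=1|x_i)=\sigma(\beta^T x_i)$ with labels in $\{-1,1\}$ can be sketched in a few lines. This sketch assumes the fitting criterion is the log-likelihood $\sum_i \log\sigma(y_i\beta^T x_i)$, maximized by plain gradient ascent. The learning rate, step count, function names and toy data are all illustrative choices, not part of the original article.

```python
import math

def sigmoid(a):
    """Numerically stable logistic function 1 / (1 + exp(-a))."""
    if a >= 0:
        return 1.0 / (1.0 + math.exp(-a))
    e = math.exp(a)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.1, steps=2000):
    """Gradient ascent on sum_i log sigmoid(y_i * beta . x_i), y_i in {-1, 1}."""
    beta = [0.0] * len(xs[0])
    for _ in range(steps):
        grad = [0.0] * len(beta)
        for x, y in zip(xs, ys):
            margin = y * sum(b * xj for b, xj in zip(beta, x))
            coeff = (1.0 - sigmoid(margin)) * y  # derivative of log sigmoid(margin)
            for j, xj in enumerate(x):
                grad[j] += coeff * xj
        beta = [b + lr * g / len(xs) for b, g in zip(beta, grad)]
    return beta

def predict_proba(beta, x):
    """Estimated probability that x belongs to the "1" class."""
    return sigmoid(sum(b * xj for b, xj in zip(beta, x)))

# Toy data: the first feature is a constant 1.0 acting as the intercept
xs = [[1.0, -2.0], [1.0, -1.0], [1.0, 1.0], [1.0, 2.0]]
ys = [-1, -1, 1, 1]
beta = fit_logistic(xs, ys)
p_pos = predict_proba(beta, [1.0, 2.0])
p_neg = predict_proba(beta, [1.0, -2.0])
```

Because this toy set is perfectly separable, the coefficients keep growing with more steps; practical implementations usually add regularization to keep them bounded.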

### Linear SVM and kernel SVM

A support vector machine (SVM) training algorithm finds the classifier represented by the normal vector $w$ and bias $b$ of the hyperplane. This hyperplane (boundary) separates different classes by as wide a margin as possible. The problem can be converted into a constrained optimization problem:

$\underset{w}{\text{minimize}} \; \|w\| \quad \text{subject to} \quad y_i\left(w^T x_i-b\right)\geq 1, \; i=1,\ldots,n.$

When the classes are not linearly separable, a kernel trick can be used to map a non-linearly separable space into a higher dimension linearly separable space.
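To illustrate why a higher-dimensional space can help, here is a small sketch under simplifying assumptions. It uses an explicit feature map $\phi(x)=(x, x^2)$ rather than a true kernel (a kernel computes the same inner products implicitly), and the hyperplane is hand-picked rather than found by an SVM solver. The data and names are hypothetical.

```python
def separable_1d(xs, ys):
    """True if a single threshold on x separates the two classes."""
    labels = [y for _, y in sorted(zip(xs, ys))]
    changes = sum(1 for a, b in zip(labels, labels[1:]) if a != b)
    return changes <= 1

def feature_map(x):
    """Explicit map from 1-D input to 2-D: phi(x) = (x, x^2)."""
    return (x, x * x)

def satisfies_margin(w, b, points, ys):
    """Check the SVM constraint y_i * (w . x_i - b) >= 1 for every point."""
    return all(
        y * (sum(wj * pj for wj, pj in zip(w, p)) - b) >= 1
        for p, y in zip(points, ys)
    )

# Class +1 lies on both sides of class -1, so no threshold on x works
xs = [-3.0, -2.0, -0.5, 0.0, 0.5, 2.0, 3.0]
ys = [1, 1, -1, -1, -1, 1, 1]
mapped = [feature_map(x) for x in xs]

w, b = (0.0, 1.0), 2.0  # hand-picked hyperplane x^2 = 2 in the mapped space
ok_before = separable_1d(xs, ys)
ok_after = satisfies_margin(w, b, mapped, ys)
```

The chosen hyperplane satisfies every constraint but is not claimed to be the minimum-norm solution; finding that one is the job of the optimizer.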

When most independent variables (features) are numeric, logistic regression and SVM should be the first try for classification. These models are easy to implement, their parameters are easy to tune, and they perform well, so they are appropriate for beginners.

### Trees and ensemble trees

Decision trees, random forest and gradient boosting are all algorithms based on decision trees.

There are many variants of decision trees, but they all do the same thing – subdivide the feature space into regions with mostly the same label. Decision trees are easy to understand and implement.

However, they tend to overfit the data when we exhaust the branches and grow very deep trees. Random forest and gradient boosting are two popular ways to use tree algorithms to achieve good accuracy while overcoming the overfitting problem.
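The idea of subdividing the feature space into regions with mostly the same label can be sketched with a single split on one numeric feature (a depth-1 tree, often called a stump). This sketch assumes misclassification count as the split criterion for simplicity; real tree learners typically use impurity measures such as Gini or entropy and apply the split recursively. The function name and data are illustrative.

```python
def best_split(xs, ys):
    """Find the threshold on one numeric feature that minimizes errors
    when each side of the split predicts its majority label."""
    def errors(labels):
        # Misclassified count if this region predicts its majority label
        if not labels:
            return 0
        majority = max(set(labels), key=labels.count)
        return len(labels) - labels.count(majority)

    values = sorted(set(xs))
    best = None
    # Candidate thresholds: midpoints between consecutive distinct values
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        cost = errors(left) + errors(right)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = ["cat", "cat", "cat", "dog", "dog", "dog"]
threshold, mistakes = best_split(xs, ys)
```

Repeating this search inside each resulting region is what makes a tree go deeper, and it is that repeated splitting that eventually fits noise.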

### Neural networks and deep learning

Neural networks flourished in the mid-1980s due to their parallel and distributed processing ability.

Research in this field was impeded by the ineffectiveness of the back-propagation training algorithm that is widely used to optimize the parameters of neural networks. Support vector machines (SVM) and other simpler models, which can be easily trained by solving convex optimization problems, gradually replaced neural networks in machine learning.

In recent years, new and improved training techniques such as unsupervised pre-training and layer-wise greedy training have led to a resurgence of interest in neural networks.

Increasingly powerful computational capabilities, such as graphics processing units (GPUs) and massively parallel processing (MPP), have also spurred the revived adoption of neural networks. The resurgent research in neural networks has given rise to the invention of models with thousands of layers.

Shallow neural networks have evolved into deep learning neural networks.

Deep neural networks have been very successful for supervised learning. When used for speech and image recognition, deep learning can perform as well as, or even better than, humans on some tasks.

Applied to unsupervised learning tasks, such as feature extraction, deep learning also extracts features from raw images or speech with much less human intervention.

A neural network consists of three parts: input layer, hidden layers and output layer.

The training samples define the input and output layers. When the output layer is a categorical variable, then the neural network is a way to address classification problems. When the output layer is a continuous variable, then the network can be used to do regression.

When the output layer is the same as the input layer, the network can be used to extract intrinsic features.

The number of hidden layers defines the model complexity and modeling capacity.

### k-means/k-modes, GMM (Gaussian mixture model) clustering

k-means/k-modes and GMM clustering aim to partition n observations into k clusters. K-means defines a hard assignment: each sample is associated with one and only one cluster. GMM, however, defines a soft assignment: each sample has a probability of being associated with each cluster. Both algorithms are simple and fast enough for clustering when the number of clusters k is given.
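The difference between hard and soft assignment can be sketched for one-dimensional data with fixed cluster centers. This shows only the assignment step, not the fitting of the centers. The soft version assumes equal mixing weights and a shared standard deviation, which is a simplification of a general GMM; the function names are illustrative.

```python
import math

def hard_assign(x, centers):
    """k-means style: index of the single nearest center."""
    return min(range(len(centers)), key=lambda k: abs(x - centers[k]))

def soft_assign(x, means, sigma=1.0):
    """GMM style: probability of x belonging to each component,
    assuming equal mixing weights and a shared standard deviation."""
    dens = [math.exp(-((x - m) ** 2) / (2 * sigma ** 2)) for m in means]
    total = sum(dens)
    return [d / total for d in dens]

centers = [0.0, 4.0]
x = 1.5
cluster = hard_assign(x, centers)  # exactly one cluster index
probs = soft_assign(x, centers)    # one probability per cluster
```

The sample is closer to the first center, so the hard assignment picks it outright, while the soft assignment still leaves some probability on the second cluster.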

### DBSCAN

When the number of clusters k is not given, DBSCAN (density-based spatial clustering of applications with noise) can be used; it connects samples through density diffusion.

### Hierarchical clustering

Hierarchical partitions can be visualized using a tree structure (a dendrogram). Hierarchical clustering does not need the number of clusters as an input, and the partitions can be viewed at different levels of granularity (i.e., clusters can be refined or coarsened) using different k.

### PCA, SVD and LDA

We generally do not want to feed a large number of features directly into a machine learning algorithm since some features may be irrelevant or the “intrinsic” dimensionality may be smaller than the number of features. Principal component analysis (PCA), singular value decomposition (SVD), and latent Dirichlet allocation (LDA) can all be used to perform dimension reduction.

PCA is an unsupervised dimension reduction method which maps the original data space into a lower dimensional space while preserving as much information as possible. PCA finds the subspace that preserves the most data variance, with the subspace defined by the dominant eigenvectors of the data’s covariance matrix.
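As a rough sketch of that idea, the snippet below finds the dominant eigenvector of the covariance matrix for two-dimensional data and projects the data onto it, reducing two features to one. Power iteration is used here as an assumed, simple way to get the dominant eigenvector; library implementations normally rely on an eigen or singular value decomposition. The names and data are hypothetical.

```python
import math

def first_principal_component(points, iterations=100):
    """Return (mean, direction): the data mean and the dominant eigenvector
    of the 2x2 sample covariance matrix, found by power iteration."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    centered = [(p[0] - mx, p[1] - my) for p in points]
    # Entries of the symmetric covariance matrix
    cxx = sum(x * x for x, _ in centered) / (n - 1)
    cxy = sum(x * y for x, y in centered) / (n - 1)
    cyy = sum(y * y for _, y in centered) / (n - 1)
    v = (1.0, 0.0)  # starting guess; must not be orthogonal to the answer
    for _ in range(iterations):
        vx = cxx * v[0] + cxy * v[1]
        vy = cxy * v[0] + cyy * v[1]
        norm = math.hypot(vx, vy)
        v = (vx / norm, vy / norm)
    return (mx, my), v

# Points scattered close to the line y = x
points = [(-2.0, -1.9), (-1.0, -1.1), (0.0, 0.1), (1.0, 0.9), (2.0, 2.0)]
mean, pc = first_principal_component(points)
# One coordinate per point instead of two
projected = [(p[0] - mean[0]) * pc[0] + (p[1] - mean[1]) * pc[1] for p in points]
```

Since the points lie near a diagonal line, the recovered direction is close to that diagonal, and the single projected coordinate keeps almost all of the variance.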

The SVD is related to PCA in the sense that SVD of the centered data matrix (features versus samples) provides the dominant left singular vectors that define the same subspace as found by PCA. However, SVD is a more versatile technique as it can also do things that PCA may not do.

For example, the SVD of a user-versus-movie matrix is able to extract the user profiles and movie profiles which can be used in a recommendation system. In addition, SVD is also widely used as a topic modeling tool, known as latent semantic analysis, in natural language processing (NLP).

A related technique in NLP is latent Dirichlet allocation (LDA). LDA is a probabilistic topic model that decomposes documents into topics in a similar way to how a Gaussian mixture model (GMM) decomposes continuous data into Gaussian densities. Unlike the GMM, LDA models discrete data (words in documents) and constrains the topics to be a priori distributed according to a Dirichlet distribution.

## Conclusions

This workflow is easy to follow. The takeaway messages when trying to solve a new problem are:

• Define the problem. What problems do you want to solve?
• Start simple. Be familiar with the data and the baseline results.
• Then try something more complicated.

Dr. Hui Li is a Principal Staff Scientist of Data Science Technologies at SAS. Her current work focuses on Deep Learning, Cognitive Computing and SAS recommendation systems in SAS Viya. She received her PhD degree and Master’s degree in Electrical and Computer Engineering from Duke University.

Before joining SAS, she worked at Duke University as a research scientist and at Signal Innovation Group, Inc. as a research engineer. Her research interests include machine learning for big, heterogeneous data, collaborative filtering recommendations, Bayesian statistical modeling and reinforcement learning.

### Neural Network? Sciences or networking?

Posted on: July 7, 2014

I took a couple of graduate courses in neural networks in 1989, when the field was in its early days, covering modeling and how experiments are run and interpreted using this learning algorithm that came out of psychology.

# What Does a Neural Network Actually Do?

There has been a lot of renewed interest lately in neural networks (NNs) due to their popularity as a model for deep learning architectures (there are also non-NN deep learning approaches based on sum-product networks and support vector machines with deep kernels, among others).

Perhaps due to their loose analogy with biological brains, the behavior of neural networks has acquired an almost mystical status. This is compounded by the fact that theoretical analysis of multilayer perceptrons (one of the most common architectures) remains very limited, although the situation is gradually improving.

To gain an intuitive understanding of what a learning algorithm does, I usually like to think about its representational power, as this provides insight into what can, if not necessarily what does, happen inside the algorithm to solve a given problem.

I will do this here for the case of multilayer perceptrons. By the end of this informal discussion I hope to provide an intuitive picture of the surprisingly simple representations that NNs encode.

I should note at the outset that what I will describe applies only to a very limited subset of neural networks, namely the feedforward architecture known as a multilayer perceptron.

There are many other architectures that are capable of very different representations. Furthermore, I will be making certain simplifying assumptions that do not generally hold even for multilayer perceptrons. I find that these assumptions help to substantially simplify the discussion while still capturing the underlying essence of what this type of neural network does. I will try to be explicit about everything.

Let’s begin with the simplest configuration possible: two input nodes wired to a single output node. Our NN looks like this:

The label associated with a node denotes its output value, and the label associated with an edge denotes its weight. The topmost node $h$ represents the output of this NN, which is:

$h = f\left(w_1 x_1+w_2 x_2+b\right)$

In other words, the NN computes a linear combination of the two inputs $x_1$ and $x_2$, weighted by $w_1$ and $w_2$ respectively, adds an arbitrary bias term $b$ and then passes the result through a function $f$, known as the activation function.

There are a number of different activation functions in common use and they all typically exhibit a nonlinearity. The sigmoid activation $f(a)=\frac{1}{1+e^{-a}}$, plotted below, is a common example.

As we shall see momentarily, the nonlinearity of an activation function is what enables neural networks to represent complicated input-output mappings.

The linear regime of an activation function can also be exploited by a neural network, but for the sake of simplifying our discussion, we will choose an activation function without a linear regime. In other words, $f$ will be a simple step function:

This will allow us to reason about the salient features of a neural network without getting bogged down in the details.

In particular, let’s consider what our current neural network is capable of. The output node can generate one of two values, and this is determined by a linear weighting of the values of the input nodes. Such a function is a binary linear classifier.
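A single node of this kind is small enough to write out directly. In this sketch I assume the step function fires at $a \geq 0$ (the threshold convention is my choice), and the weights are hypothetical values picked so that the decision boundary is the line $x_1 + x_2 = 1$.

```python
def step(a):
    """Step activation; firing at a >= 0 is an assumed convention."""
    return 1 if a >= 0 else 0

def neuron(x1, x2, w1, w2, b):
    """h = f(w1*x1 + w2*x2 + b) with f a step function."""
    return step(w1 * x1 + w2 * x2 + b)

# Hypothetical weights: the node fires on the side where x1 + x2 >= 1
w1, w2, b = 1.0, 1.0, -1.0
inputs = [(0.0, 0.0), (1.0, 1.0), (0.2, 0.9)]
outputs = [neuron(x1, x2, w1, w2, b) for x1, x2 in inputs]
print(outputs)  # [0, 1, 1]
```

Changing the weights rotates and shifts the line, but the node can never do more than split the plane into two half-planes.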

As shown below, depending on the values of $w_1$ and $w_2$, one regime in this two-dimensional input space yields a response of $0$ (white) and the other a response of $1$ (shaded):

Let’s now add two more output nodes (a neural network can have more than a single output). I will need to introduce a bit of notation to keep track of everything. The weight associated with an edge from the $j$th node in the first layer to the $i$th node in the second layer will be denoted by $w_{ij}^{(1)}$. The output of the $i$th node in the $n$th layer will be denoted by $a_i^{(n)}$.

Thus $x_1 = a_1^{(1)}$ and $x_2 = a_2^{(1)}$.

Every output node in this NN is wired to the same set of input nodes, but the weights are allowed to vary. Below is one possible configuration, where the regions triggering a value of $1$ are overlaid and colored in correspondence with the colors of the output nodes:

So far we haven’t really done anything, because we just overlaid the decision boundaries of three linear classifiers without combining them in any meaningful way. Let’s do that now, by feeding the outputs of the top three nodes as inputs into a new node.

I will hollow out the nodes in the middle layer to indicate that they are no longer the final output of the NN.

The value of the single output node at the third layer is:

$a_1^{(3)} = f \left(w_{11}^{(2)} a_1^{(2)}+w_{12}^{(2)} a_2^{(2)}+w_{13}^{(2)} a_3^{(2)}+b_1^{(2)}\right)$

Let’s consider what this means for a moment. Every node in the middle layer is acting as an indicator function, returning $0$ or $1$ depending on where the input lies in $\mathbb{R}^2$.

We are then taking a weighted sum of these indicator functions and feeding it into yet another nonlinearity. The possibilities may seem endless, since we are not placing any restrictions on the weight assignments.

In reality, characterizing the set of NNs (with the above architecture) that exhibit distinct behaviors does require a little bit of work–see Aside–but the point, as we shall see momentarily, is that we do not need to worry about all such possibilities.

One specific choice of assignments already gives the key insight into the representational power of this type of neural network. By setting all weights in the middle layer to $1/3$, and setting the bias of the middle layer $(b_1^{(2)})$ to $-1$, the activation function of the output neuron $(a_1^{(3)})$ will output $1$ whenever the input lies in the intersection of all three half-spaces defined by the decision boundaries, and $0$ otherwise.

Since there was nothing special about our choice of decision boundaries, we are able to carve out any arbitrary polygon and have the NN fire precisely when the input is inside the polygon (in the general case we set the weights to $1/k$, where $k$ is the number of hyperplanes defining the polygon).
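Here is a minimal sketch of that construction for a triangle. I assume the step function fires at $a \geq 0$, and I use exact fractions for the $1/k$ weights so that $k$ copies of $1/k$ sum to exactly $1$ rather than falling just short because of floating-point rounding. The triangle and the function names are illustrative.

```python
from fractions import Fraction

def step(a):
    """Step activation; firing at a >= 0 is an assumed convention."""
    return 1 if a >= 0 else 0

def inside_polygon(x, half_spaces):
    """Three-layer network: one hidden step neuron per half-space
    (w1, w2, b), then an output neuron with weights 1/k and bias -1."""
    hidden = [step(w1 * x[0] + w2 * x[1] + b) for w1, w2, b in half_spaces]
    k = len(half_spaces)
    # Fraction keeps the weighted sum exact, so it equals 1 only if all k fire
    total = sum(Fraction(h, k) for h in hidden) - 1
    return step(total)

# Triangle with vertices (0, 0), (1, 0), (0, 1) as three half-spaces
triangle = [
    (1.0, 0.0, 0.0),    # x1 >= 0
    (0.0, 1.0, 0.0),    # x2 >= 0
    (-1.0, -1.0, 1.0),  # x1 + x2 <= 1
]
results = [inside_polygon(p, triangle) for p in [(0.25, 0.25), (1.0, 1.0), (-0.5, 0.2)]]
print(results)  # [1, 0, 0]
```

Only the first point satisfies all three half-space conditions, so only there does the weighted sum reach the threshold.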

This fact demonstrates both the power and limitation of this type of NN architecture.

On the one hand, it is capable of carving out decision boundaries comprised of arbitrary polygons (or more generally polytopes). Creating regions comprised of multiple polygons, even disjoint ones, can be achieved by adding a set of neurons for each polygon and setting the weights of their respective edges to $1/k_i$, where $k_i$ is the number of hyperplanes defining the $i$th polygon.

This explains why, from an expressiveness standpoint, we don’t need to worry about all possible weight combinations, because defining a binary classifier over unions of polygons is all we can do. Any combination of weights that we assign to the middle layer in the above NN will result in a discrete set of values, up to one unique value per region formed by the union or intersection of the half-spaces defined by the decision boundaries, that are inputted to the $a_1^{(3)}$ node.

Since the bias $b_1^{(2)}$ can only adjust the threshold at which $a_1^{(3)}$ will fire, the resulting behavior of any weight assignment is activation over some union of polygons defined by the shaded regions.

Thus our restricted treatment, where we only consider weights equal to $1/k$, already captures the representational power of this NN architecture.

A few caveats merit mention.

First, the above says nothing about representational efficiency, only power. A more thoughtful choice of weights, presumably identified by training the NN using backpropagation, can provide a more compact representation comprised of a smaller set of nodes and edges.

Second, I oversimplified the discussion by focusing only on polygons. In reality, any intersection of half-spaces is possible, even ones that do not result in bounded regions.

Third, and most seriously, feedforward NNs are not restricted to step functions for their activation functions. In particular, modern NNs that utilize Rectified Linear Units (ReLUs) most likely exploit their linear regions.

Nonetheless, the above simplified discussion illustrates a limitation of this type of NN. While they are able to represent any boundary with arbitrary accuracy, this would come at a significant cost, much like the cost of polygonally rendering smoothly curved objects in computer graphics.

In principle, NNs with sigmoidal activation functions are universal approximators, meaning they can approximate any continuous function with arbitrary accuracy. In practice I suspect that real NNs with a limited number of neurons behave more like my simplified toy models, carving out sharp regions in high-dimensional space, but on a much larger scale.

Regardless, NNs still provide far more expressive power than most other machine learning techniques, and my focus on $\mathbb{R}^2$ disguises the fact that even simple decision boundaries, operating in high-dimensional spaces, can be surprisingly powerful.

Before I wrap up, let me highlight one other aspect of NNs that this “union of polygons” perspective helps make clear.

It has long been known that an NN with a single hidden layer, i.e. the three-layer architecture discussed here, is equal in representational power to a neural network with arbitrary depth, as long as the hidden layer is made sufficiently wide.

Why this is so is obvious in the simplified setting described here, because unions of sets of unions of polygons can be flattened out in terms of unions of the underlying polygons. For example, consider the set of polygons formed by the following 10 boundaries:

We would like to create 8 neurons that correspond to the 8 possible activation patterns formed by the polygons (i.e. fire when the input is in none of them (1 case), one of them (3 cases), two of them (3 cases), or all of them (1 case)).

In the “deep” case, we can set up a four-layer NN such that the second layer defines the edges, the third layer defines the polygons, and the fourth layer contains the 8 possible activation patterns:

The third layer composes the second layer, by creating neurons that are specific to each closed region.

However, we can just as well collapse this into the following three-layer architecture, where each neuron in the third layer “rediscovers” the polygons and how they must be combined to yield a specific activation pattern:

Deeper architectures allow deeper compositions, where more complex polygons are made up of simpler ones, but in principle all this complexity can be collapsed onto one (hidden) layer.

There is a difference in representational efficiency however, and the two architectures above illustrate this important point.

While the three-layer approach is just as expressive as the four-layer one, it is not as efficient: the three-layer NN has a 2-10-8 configuration, resulting in 100 parameters (20 edges connecting first to second layer plus 80 edges connecting second to third layer), while the four-layer NN, with a 2-10-3-8 configuration, only has 74 parameters.
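The two tallies are easy to check. This small sketch counts only the edges between consecutive fully connected layers and leaves out bias terms, which matches how the parameters are counted above; the function name is my own.

```python
def count_edges(layer_sizes):
    """Number of weights in a fully connected feedforward network.
    Biases are not counted, matching the tally in the text."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))

three_layer = count_edges([2, 10, 8])    # 2*10 + 10*8
four_layer = count_edges([2, 10, 3, 8])  # 2*10 + 10*3 + 3*8
print(three_layer, four_layer)  # 100 74
```

The saving comes entirely from the narrow 3-node layer: routing through it replaces 80 direct edges with 30 + 24.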

Herein lies the promise of deeper architectures, by enabling the inference of complex models using a relatively small number of parameters. In particular, lower-level features such as the polygons above can be learned once and then reused by higher layers of the network.

That’s it for now. I hope this discussion provided some insight into the workings of neural networks.

If you’d like to read more, see the Aside, and I also recommend this blog entry by Christopher Olah which takes a topological view of neural networks.

### An exercise: taxonomy of methods

Posted on: June 10, 2009

“An exercise for taxonomy of methods”

Article #14 in Human Factors

I am going to let you have a hand at classifying methods by providing a list of various methods that could be used in Industrial engineering, Human Factors, Ergonomics, and Industrial Psychology.

The first list of methods is organized in the sequence used to analyze part of a system or a mission.

The second list is not strictly randomized, though it is thrown together without much order; otherwise it would not be a good exercise.

First, let us agree that a method is a procedure, a step-by-step process, that our forerunners, geniuses and scholars, have tested, found good, agreed on by consensus, and offered for you to use for the benefit of progress and science.

Many of you will still try hard to find shortcuts to anything, including methods, on the petty argument that the best criterion for discriminating among clever people is who wastes time on methods and who is a nerd.

Actually, the main reason I don’t try to teach many new methods in this course is that students might run smack into real occupational stress, to which they are not immune, especially since methods in human factors are complex and time consuming.

Here is this famous list of a few methods and you are to decide which ones are still in the conceptual phases and which have been “operationalized”.

The first list contains the following methods: operational analysis, activity analysis, critical incidents, function flow, decision/action, action/information analyses, functional allocation, task, fault tree, failure modes and effects analyses, time line, link analyses, simulation, controlled experimentation, operational sequence analysis, and workload assessment.

The second list consists of methods that human factors professionals are trained to use if need be, such as: verbal protocol, neural network, utility theory, preference judgments, psycho-physical methods, operational research, prototyping, information theory, cost/benefit methods, various statistical modeling packages, and expert systems.

Just wait, let me resume.

There are those that are intrinsic to artificial intelligence methodology such as: fuzzy logic, robotics, discrimination nets, pattern matching, knowledge representation, frames, schemata, semantic network, relational databases, searching methods, zero-sum games theory, logical reasoning methods, probabilistic reasoning, learning methods, natural language understanding, image formation and acquisition, connectedness, cellular logic, problem solving techniques, means-end analysis, geometric reasoning system, algebraic reasoning system.

If your education is multidisciplinary, you may catalog the above methods according to specialty disciplines such as: artificial intelligence, robotics, econometrics, marketing, human factors, industrial engineering, other engineering majors, psychology, or mathematics.

The most logical grouping is along the purpose, input, process/procedure, and output/product of the method; without these four elements it would be impossible to define and understand any method.

Methods could be used to analyze systems, provide heuristic data about human performance, make predictions, generate subjective data, discover the causes and effects of the main factors, or evaluate the human-machine performance of products or systems.

The inputs could be qualitative or quantitative, such as declarative, categorical, or numerical data generated from structured observations, records, interviews, or questionnaires, generated by computer, or taken from the outputs of prior methods.

The outputs could be point data, behavioral trends, graphical in nature, context-specific, generic, or a reduction in alternatives.

The process could be a creative graphical or pictorial model, a logical hierarchy or a network of alternatives, operational, empirical, informal, or systematic.
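The four-facet grouping described above (purpose, input, process, output) can be sketched as a small record type plus a grouping helper. This is only a minimal illustration: the `Method` class, its field names, the `group_by` helper, and above all the placements of the three sample methods are assumptions made for the sake of the example, not the answers to the exercise.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Method:
    """One method described along the four facets named in the text."""
    name: str
    purpose: str      # e.g. "analyze systems", "make predictions"
    input_type: str   # e.g. "qualitative", "quantitative"
    process: str      # e.g. "graphical model", "logical hierarchy", "empirical"
    output_type: str  # e.g. "point data", "graphical", "reduction in alternatives"


def group_by(methods, facet):
    """Group method names by the value of one facet (an attribute name)."""
    groups = defaultdict(list)
    for method in methods:
        groups[getattr(method, facet)].append(method.name)
    return dict(groups)


# Hypothetical placements, for illustration only; classify them yourself.
methods = [
    Method("fault tree analysis", "analyze systems",
           "qualitative", "logical hierarchy", "graphical"),
    Method("link analysis", "analyze systems",
           "quantitative", "graphical model", "graphical"),
    Method("controlled experimentation", "discover causes and effects",
           "quantitative", "empirical", "point data"),
]

by_purpose = group_by(methods, "purpose")
by_process = group_by(methods, "process")
```

Passing a different facet name to `group_by` yields a different taxonomy from the same records, which is the point of the exercise: one list of methods, many ways to sort it.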

You may also group these methods according to their mathematical branches such as algebraic, probabilistic, or geometric.

You may collect them according to their deterministic, statistical-sampling, or probabilistic character.

You may differentiate the methods as belonging to categorical, ordinal, discrete or continuous measurements.

You may wish to investigate the methods as parametric or nonparametric, and as assuming a distribution-free or a normally distributed population.

You may separate them by their representation forms, such as verbal, graphical, pictorial, or tabular.

You may discriminate them on heuristic, observational, or experimental scientific values.

You may bundle these methods on qualitative or quantitative values.

You may as well separate them on their historical values or modern techniques based on newer technologies.

You may select them by how current they are: ancient methods whose validity new information and new paradigms have refuted, or recently developed ones.

You may define the methods as those amenable to digital or to analytical problem solving.

You may choose to draw several lists of those methods that are economically sound, esoteric, or just plainly fuzzy sounding.

You may opt to differentiate between methods that require a level of mathematical reasoning beyond your capability and those that can be comprehended through persistent effort.

You could as well sort them according to which ones fit nicely into the courses that you have already taken, but failed to recollect that they were indeed methods worth acquiring for your career.

You may use any of these taxonomies to answer an optional exam question with no guarantees that you might get a substantial grade.

It would be interesting to collect statistics on how often these methods are being used, by whom, for what rationale, by which lines of business, and by which universities.

It would be interesting to translate these methods into Arabic, Chinese, Japanese, Hindi, or Russian.
