# Machine learning

Started some wiki pages on Machine learning, Neural networks and Deep learning.

The well-known finitist Jürgen Schmidhuber has a comprehensive historical survey of neural networks, not just deep learning.

The video by Yann LeCun presents some possibly surprising results about machine learning. The arguments are updated in Geoff Hinton's video.

1.
edited November 2014

I'm just at the very start of learning about machine learning. Too bad I'm not a machine!

I've never heard of this "well-known finitist". I mainly know mathematicians and physicists who are finitists.

2.
edited November 2014

I'd like to know what the Davids, Paul, Dara, Nick or anybody else make of Yann LeCun's argument in the NIPS 2009 video that largely underfitted models with far more parameters than nodes perform best, rather than the one-node-per-attribute approach. This seems to suggest that dimensionality reduction isn't needed?

He also demonstrates that stochastic gradient descent is the only method needed for optimal minimisation and more evolved methods like BFGS, conjugate gradient etc. can be ignored.
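For anyone who hasn't met it, here is a minimal sketch of what stochastic gradient descent looks like, assuming a toy least-squares line fit. It is not taken from the video; the function name `sgd_linear_fit`, the learning rate and the synthetic data are all illustrative assumptions. The point is only that each update uses the gradient from a single sample rather than from the whole data set.

```python
import random

def sgd_linear_fit(xs, ys, lr=0.05, epochs=200, seed=0):
    """Fit y ~ w*x + b by stochastic gradient descent on squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    order = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(order)              # visit the samples in random order
        for i in order:
            err = (w * xs[i] + b) - ys[i]
            # Gradient of 0.5*err**2 for this one sample only.
            w -= lr * err * xs[i]
            b -= lr * err
    return w, b

# Synthetic, noise-free data from the (assumed) line y = 3x - 1.
xs = [i / 10 for i in range(-10, 11)]
ys = [3.0 * x - 1.0 for x in xs]
w, b = sgd_linear_fit(xs, ys)
print(w, b)
```

On this noise-free toy problem the fit recovers the slope and intercept closely. Whether the same simplicity wins on large, noisy problems is the claim made in the video, not something this sketch demonstrates.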

I think I heard Yann LeCun slightly disparage recurrent networks, which is something Schmidhuber was big on. I think these are something that the people associated with Neneman and Tishby might have been into.

Jürgen Schmidhuber is an AI guy. I first came across him reading about [digital physics](http://wikipedia.org/Digital_physics), where he's well known for his universe-as-a-computer formulations.

> His contributions also include generalizations of Kolmogorov complexity and the Speed Prior. From 2004 to 2009 he was professor of Cognitive Robotics at the Tech. University Munich. Since 1995 he has been co-director of the Swiss AI Lab IDSIA in Lugano, since 2009 also professor of Artificial Intelligence at the University of Lugano. Between 2009 and 2012, the recurrent neural networks and deep feedforward neural networks developed in his research group have won eight international competitions in pattern recognition and machine learning.[1] In honor of his achievements he was elected to the European Academy of Sciences and Arts in 2008.

PS. This reminds me of an argument I read in one climate paper where a multivariate model performs better than a bilinear one.

3.
edited November 2014

Hello Jim

If you'd like to survey the literature for the latest known machine learning algorithms, or if you have a particular one you want to research, I am willing to help out, since I am continuously upgrading myself in this field. It would have to be in December.

I had not heard of LeCun, but if you know of his papers or where his source repository is, I could look them up.

My area of interest, if you find it interesting, is parallelization and real-time applications of Machine Learning algorithms, or their applications to very large data.

I actually want to code these algorithms, parallelize them and run them on large servers or clusters.

4.
edited November 2014

I've added quite a few more references and started a Bayesian methods page.

Dara, networks trained with Hessian-free optimisation don't have to keep recalculating their full Hessians and thus achieve superior performance to SVMs, NNs etc.
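To make the "Hessian-free" idea concrete, here is a minimal sketch under stated assumptions: the Newton system H d = -g is solved by conjugate gradient, which only ever needs products of the Hessian with a vector, so the matrix itself is never built. The toy quadratic objective, the names `grad`, `hess_vec` and `newton_step_cg`, and the finite-difference approximation of the product are all illustrative choices, not Martens' implementation; practical code would typically get the product exactly from automatic differentiation.

```python
def grad(x):
    """Gradient of the toy objective f(x) = 0.5*x^T A x - b^T x,
    with A = [[3, 1], [1, 2]] and b = [1, 1] (illustrative values)."""
    return [3.0 * x[0] + 1.0 * x[1] - 1.0,
            1.0 * x[0] + 2.0 * x[1] - 1.0]

def hess_vec(x, v, eps=1e-6):
    """Approximate H(x) @ v from two gradient calls; H is never stored."""
    g0 = grad(x)
    g1 = grad([xi + eps * vi for xi, vi in zip(x, v)])
    return [(a - b) / eps for a, b in zip(g1, g0)]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def newton_step_cg(x, iters=10, tol=1e-12):
    """Solve H d = -grad(x) by conjugate gradient using only hess_vec."""
    d = [0.0] * len(x)
    r = [-g for g in grad(x)]          # residual of H d = -g at d = 0
    p = r[:]
    rs = dot(r, r)
    for _ in range(iters):
        if rs < tol:
            break
        hp = hess_vec(x, p)
        alpha = rs / dot(p, hp)
        d = [di + alpha * pi for di, pi in zip(d, p)]
        r = [ri - alpha * hi for ri, hi in zip(r, hp)]
        rs_new = dot(r, r)
        p = [ri + (rs_new / rs) * pi for ri, pi in zip(r, p)]
        rs = rs_new
    return d

x0 = [2.0, -1.0]
step = newton_step_cg(x0)
x1 = [a + b for a, b in zip(x0, step)]
print(x1)
```

Because the toy objective is quadratic, a single Newton step lands on its minimiser, which solves A x = b. For a network with millions of weights the attraction is that each product costs about as much as a gradient evaluation, whereas storing the full Hessian would be out of the question.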

Yann LeCun's groups are at the Courant Institute and Facebook, where IIUC he's researching image recognition with deep learning and convolutional nets.

I've also added some Schmidhuber papers. You can easily find more on Google Scholar or somewhere.

John, the HLearn software I want to use is being developed by Mike Izbicki as part of his PhD in CS at UCR. His HLearn paper on deep learning, linked on the Machine learning page, makes a start at a categorical specification of polymorphic monoidal neural networks. You might want to talk to him to get a rundown on machine learning and NIPS so you're familiar with the state-of-the-art jargon, and he might be interested in the Azimuth project.

NIPS keynotes seem to be videoed. I bet I'm not the only person who'd like it if the climate tutorial was available.

PS It's in my interest, as HLearn is going through a major refactoring (i.e. doesn't install easily) and talking to you might give him a nudge :)

5.

I watched the video by LeCun. He had a good intuitive explanation of getting around local minima using a higher-dimensional space. Essentially, in one dimension you are blocked by a mountain that gets in your way, but with a few extra dimensions, one can always find a path to "move around" the mountain.
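A toy illustration of the "mountain" picture, as a sketch under my own assumptions rather than anything from the video: the function f(x, y) = (x^2 + y^2 - 1)^2 + 0.3x is a tilted ring-shaped valley. Restricted to the line y = 0 it is a double well with a barrier between the two minima, but in the full plane the shallower well becomes a saddle and plain gradient descent can slide around the ring. The function, step size and starting points are all hypothetical choices.

```python
def descend_1d(x, lr=0.05, steps=5000):
    """Plain gradient descent on f(x) = (x^2 - 1)^2 + 0.3*x."""
    for _ in range(steps):
        x -= lr * (4.0 * x * (x * x - 1.0) + 0.3)
    return x

def descend_2d(x, y, lr=0.05, steps=5000):
    """Plain gradient descent on f(x, y) = (x^2 + y^2 - 1)^2 + 0.3*x."""
    for _ in range(steps):
        radial = 4.0 * (x * x + y * y - 1.0)
        x, y = x - lr * (radial * x + 0.3), y - lr * radial * y
    return x, y

# Same starting x; the 2D run gets a tiny nudge in the extra dimension.
x_1d = descend_1d(1.0)
x_2d, y_2d = descend_2d(1.0, 0.01)
print(x_1d)        # stays in the shallow well on the positive side
print(x_2d, y_2d)  # ends in the deeper well on the negative side
```

The 1D run stays stuck at roughly x = 0.96, while the 2D run ends at roughly x = -1.04 with y back near zero. This shows the mechanism for one hand-built function only; it does not show that such a path always exists.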

When I solve the DiffEqs for the sloshing wave equation and optimize by hand, it seems that one can always find a path that gets around the local minimum, yet solvers such as those supplied by Mathematica always seem to get stuck.

I wonder if this has something to do with the fact that humans are good at supplying a stochastic element to the search. LeCun also claims that all these sophisticated algorithms are not as good as the rather primitive stochastic gradient.

I have lots of ideas for heuristics in doing these searches but have yet to try to automate the process.

6.
edited November 2014

Those are a couple of good takeaway points from the video. Now I need to dig out the online sources for the Martens Hessian-free approach. Dara might be interested in LeCun's arguments for the universality of neural networks being inadequate in practice.

7.
edited November 2014

I've added links to my current deep learning, Hessian-free convolutional network reading and some extracts from Schmidhuber. LeCun has a criticism of recurrent neural networks (RNNs), so there seem to have been many camps and a plethora of models; I will be interested to see whether LeCun has pruned and consolidated them.

It's bad form to post citations which I've not read but this is v. much a work in progress. Posts about any errors, omissions, corrections or additions to the wiki pages would be most welcome.

8.
edited November 2014

Jim, the wiki needs stuff on machine learning, so it's good you're doing it. But I think you should change the page name "Bayesian methods" to "Bayesian networks" since that seems to be what it's about. "Bayesian methods" covers a huge area, mostly not within machine learning. Also, start separate threads here for "Bayesian networks", Deep learning, etc.

9.

Thanks Graham, I've changed the Bayes page and will start separate threads if I post more. Alternatively, maybe the next person who wants to discuss NNs or deep learning can do so.

10.

Hello Jim

The machine learning systems I am working on can do 1000s of training sessions per second. That is not enough for real-life applications; I need to reach 1000000s of training sessions per second.

11.

Tim Palmer, who wrote an article for the Azimuth blog, was great a few weeks ago on the BBC's flagship news programme 'Newsnight'. Commenting on the announcement of the UK Met Office's new supercomputer (I think it was a paltry £10m), he explained that it would be used for integrated weather and climate modelling. He said we need at least the same investment in climate modelling computer power as the LHC was allocated. Too right!

I think a reason to be interested in Hessian-free optimisation is its computational efficiency in not having to recompute full Hessians.

12.

Pondering the LeCun presentation again, I have to remind myself that he is mainly dealing with the disciplines of speech and handwriting recognition. In that regime, grading classification accuracy is straightforward, because there is always an objective truth to each test trial.

In contrast, applying machine learning to climate science is open-ended and an objective truth is virtually impossible to achieve. And it gets worse. Besides not knowing what the right answer is, we don't even have the luxury of running controlled experiments to see what the possible outcomes might be.

The analogy would be doing handwriting recognition on an alien written language. You might be able to classify the markings but you would not know what any of it means. That is about where we are with trying to classify EOFs in climate dynamics. The EOFs are letters and words in a language that we have yet to understand.
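For readers who haven't computed one, here is a minimal sketch of how a leading EOF comes out of a data set: it is the top eigenvector of the covariance matrix of the anomalies, found here by power iteration. The data are synthetic (a dominant spatial pattern plus a weaker orthogonal one at three hypothetical grid points), and the names `leading_eof`, `pattern` and `second` are made up for illustration. Real analyses would normally use an SVD from a numerical library.

```python
import math

def leading_eof(data, iters=100):
    """Leading EOF (top eigenvector of the covariance matrix) by power iteration.

    data is a list of time steps, each a list of values at the grid points.
    """
    n_t, n_x = len(data), len(data[0])
    means = [sum(row[j] for row in data) / n_t for j in range(n_x)]
    # Anomalies: remove the time mean at each grid point.
    anom = [[row[j] - means[j] for j in range(n_x)] for row in data]
    cov = [[sum(a[i] * a[j] for a in anom) / (n_t - 1) for j in range(n_x)]
           for i in range(n_x)]
    v = [1.0] * n_x
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(n_x)) for i in range(n_x)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

# Synthetic field: a strong mode plus a weak orthogonal mode (both unit length).
pattern = [1 / 3, 2 / 3, 2 / 3]
second = [2 / 3, -2 / 3, 1 / 3]
data = [[math.sin(0.3 * t) * pattern[j] + 0.1 * math.cos(0.7 * t) * second[j]
         for j in range(3)]
        for t in range(100)]

eof1 = leading_eof(data)
overlap = abs(sum(e * p for e, p in zip(eof1, pattern)))
print(eof1, overlap)
```

Here the leading EOF lines up almost exactly with the planted pattern, because we built the data that way. With real climate fields nobody hands you `pattern`, which is the point of the alien-language analogy: the algorithm returns the letters, not their meaning.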

13.

Yes. The whole community focus seems to be on image recognition for YouTube and face recognition. Face recognition experiments done in Florida a few years ago were a spectacular failure.

LeCun points out that simple techniques like bilinear regression are more appropriate for some domains (David Tweed has suggested he might do a few such calcs if and when he has time).

I haven't got any realistic idea atm. of how to construct an appropriate deep network for 3D volumetric data.

14.

> Pondering the LeCun presentation again, I have to remind myself that he is mainly dealing with the disciplines of speech and handwriting recognition. In that regime, grading classification accuracy is straightforward, because there is always an objective truth to each test trial.

Yes. I spent many years working in the pattern recognition/machine vision area (especially printed music notation), and it is certainly very different. To be a bit pedantic, obtaining 'objective truth' (often called 'ground truth') is not that easy. For example you may ask someone to write a 4, and they produce a squiggle which in the opinion of most experts looks more like a 9. What is it really? But still, basically, yes.

Another thing makes recognition of writing, faces, and speech different. Humans are very good at these, so we know a high accuracy system is possible. That gives moral encouragement for the enterprise (whereas we don't know how accurate climate prediction could be). It also means we can get some clues from the human visual and auditory systems.

On the other hand we do have physical models for the atmosphere and oceans. There is nothing equivalent in machine vision. It's not so much 'no models' as 85 different models per researcher.
