Let’s Write a Pipeline – Machine Learning Recipes #4

100 Comments

  • Luis Leal

    Josh, what's your opinion on train/cross-validation/test splitting? Will it be covered in future episodes?
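For reference, one common way to get all three sets is to call scikit-learn's train_test_split twice; the fractions below are illustrative choices, not anything prescribed by the video:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()
X, y = iris.data, iris.target

# First carve off a held-out test set, then split the remainder
# into training and validation sets.
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)

print(len(X_train), len(X_val), len(X_test))  # 90 30 30
```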

  • lobachevscki

    You seriously improved your pronunciation from "android that is amazing at reading" to "actual human being". I think it's mostly the mannerisms and the body language. Great job; jokes aside, I think it helps to absorb the content.

    Please, do more of these videos for advanced level.

    Thanks!

  • chakree ten

    Hello, this is what I got while running it in Sublime Text 3:
    from sklearn.cross_Validation import train_test_Split
    ImportError: No module named cross_Validation

  • Abhishek Murali

    I just had one question: when we split the data using the split command, we basically make the first 75 samples training and the remaining 75 testing. However, if we don't give it training data for the 3rd label, how can it classify that as well? Or am I interpreting the split function wrongly? Great videos though. As a beginner, these are really helping me.

  • Devender Shekhawat

    What if the splitting method takes all the data related to one flower (in the iris example) and assigns it to the test data? Can we select the order/randomness?
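train_test_split shuffles the data by default, and its stratify and random_state parameters give explicit control over class balance and reproducibility. A small sketch (the 50/50 split matches the video; the rest is illustrative):

```python
from collections import Counter
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

iris = load_iris()

# stratify=y keeps the class proportions equal in both halves, so no
# flower species can end up entirely in the test set; random_state
# makes the shuffle reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5,
    stratify=iris.target, random_state=0)

print(Counter(y_train))  # each of the 3 classes appears 25 times
```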

  • Raghunandan Kavi

    sklearn.cross_validation is deprecated; you need to change it to sklearn.model_selection. Using an IDE is always better than typing code in Notepad.

  • Mārtiņš Mālmanis

    Something doesn't work for me – a DeprecationWarning appears… What should I do in this case? Here is a screenshot:

    https://www.screencast.com/t/jwEiWyAGa6V

  • Turkey Sandwich

    10/10 quality production. The teacher speaks clearly and is easy to understand and enthusiastic. The content is well organized and can be followed each step of the way. I found this series to be the most valuable guide on machine learning on YouTube at the moment. All of the code has worked for me in Python 3.4 as well.

  • Gandluri Sai Kishan 13BCE1039

    Please help me out.
    I am getting this error:
    File "<ipython-input-116-f9c2da5a35bb>", line 6, in <module>
    x = iris.data
    AttributeError: 'function' object has no attribute 'data'

  • Tiago

    I have created a GitHub repo with all of the code for all of the recipes in this series. I've used Python 3 for all recipes. I've also updated all of the libraries and have added some things to the code here and there. Check it out: https://github.com/TheCoinTosser/MachineLearningGoogleSeries

  • Fistro Man

    About features in KNN: the features are finite, so you can create all combinations of them and then see whether the results change much if we drop one… OK, it could take a lot of computing power, but then it's the machine deciding by its own rules, with no human interaction. This only helps if some of your features are good; if all of them are bad, it doesn't solve anything 🙂

  • 常Bright

    My major is Statistics and I want to apply for a PhD position in Statistics. But after seeing this series, I have changed my mind!

  • Rex Asabor

    Would we select the classifier with the highest accuracy after we test? Also, after we test, shouldn't we feed it the testing data too, to increase accuracy?

  • Mayank Gupta

    Hi All, I created a nicely formatted repository containing the code from this video, but updated to work with new packages.
    https://github.com/officialgupta/MachineLearningRecipes
    Like this so people can see it!

  • TimePass

    +Josh Gordon Hey, I am getting a "ValueError: too many values to unpack" on execution.
    I have tried using model_selection instead of cross_validation, and the same error still pops up.
    Can you help me out?

  • TatTvamAsi

    OMG, I finally see a reason for learning math in high school. I'm so happy I took the time to learn about the equation of a line and finding slopes. XD

  • Akadehmix

    If anyone is watching this when cross_validation becomes deprecated, replace cross_validation with model_selection. The classes and functions should work the same, as they are being refactored and moved to this namespace.

  • prashant vaishla

    There are lots of different classifier algorithms available, but how can one select a suitable algorithm for classification? What should the criteria be for selecting a classification algorithm?

  • Omar Salim

    What if the new dot is neither red nor green – how can the classifier recognize that and return the value 'false' instead of a wrong prediction? I'm working on a face recognition project and I'm using this sklearn library… any ideas how I can recognize a face that's not in the training data? Thanks.
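A crude sketch of one common heuristic: reject predictions whose probability falls below a threshold. The 0.9 threshold and the helper name are arbitrary choices for illustration; this won't catch every unknown input, and face recognition normally calls for dedicated open-set techniques:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
clf = KNeighborsClassifier(n_neighbors=5).fit(iris.data, iris.target)

def predict_or_reject(clf, x, threshold=0.9):
    """Return the predicted label, or False when the classifier is
    not confident enough (a simple confidence-based rejection)."""
    proba = clf.predict_proba([x])[0]
    best = np.argmax(proba)
    return clf.classes_[best] if proba[best] >= threshold else False

# A training sample should be classified confidently.
print(predict_or_reject(clf, iris.data[0]))
```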

  • M15H4

    If you have trouble executing this…
    1) make sure you have "sklearn.model_selection" instead of "sklearn.cross_validation"
    2) if your dataset is undefined, check your spelling. Uppercase X and lowercase y are used consistently in this example

  • Pranav Desai

    Can we achieve more accuracy, or perhaps even 100% accuracy, by making the classifier more complex or giving it more parameters? Example: we could classify the (more random) dots better by having a more complex function such as a cubic or bi-quadratic one, right?
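Not necessarily: a more complex model can fit the training set perfectly and still do worse on new data (overfitting). A small sketch with k-nearest neighbors, where k=1 effectively memorizes the training set (the split and classifier here are illustrative):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0)

# n_neighbors=1 reproduces the training labels perfectly, because
# every training point is its own nearest neighbor.
clf = KNeighborsClassifier(n_neighbors=1).fit(X_train, y_train)
train_acc = clf.score(X_train, y_train)
test_acc = clf.score(X_test, y_test)

# train accuracy is 1.0; test accuracy is typically lower
print(train_acc, test_acc)
```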

  • nebulousJames12345

    I put this on at night and slept to 12 in the afternoon. I put it back on 3 hours after I woke up and fell asleep again for 2 hours

  • Olaseni Odebiyi

    from sklearn.cross_validation import train_test_split
    /anaconda3/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
    "This module will be removed in 0.20.", DeprecationWarning)

  • Joyjit Chatterjee

    Great. Here is my code for classifying the iris flower dataset using the Random Forest Classifier:

    from sklearn.datasets import load_iris
    iris=load_iris()
    X=iris.data
    Y=iris.target

    from sklearn import ensemble
    from sklearn.model_selection import train_test_split

    X_train,X_test,Y_train,Y_test=train_test_split(X,Y,test_size=0.5)

    clf=ensemble.RandomForestClassifier()
    clf.fit(X_train,Y_train)
    predictions=clf.predict(X_test)
    print("Using Random Forest Classifier, Predictions are:")
    print(predictions)

    from sklearn.metrics import accuracy_score

    print("Accuracy Score in percent is:")
    score=accuracy_score(predictions,Y_test)
    print(score*100)

  • dario27

    If you get the deprecation warning, simply replace:
    from sklearn.cross_validation import train_test_split
    with
    from sklearn.model_selection import train_test_split

  • AD ForKnowledge

    Very nice videos, I liked them all..!!! 🙂 The way you presented the example triggered me to learn Python… you made it look simple 🙂 I am an Android developer and have a strong interest in machine learning… 🙂 Thanks for the good content… 🙂

  • Xavier X

    aaaaaaaaaaaaah! Ridiculous pace.
    Hint: watch these videos at 0.5 speed or slower.
    Press pause frequently to digest what's going on.

  • Abdullah Aghazadah

    Here is a quick summary of the video:

    – scikit-learn has a handy function for splitting data sets into a training and a testing set
    – it's sklearn.model_selection.train_test_split(features, labels, test_size=fraction)
    – this function will return 1) training_features 2) testing_features 3) training_labels and 4) testing_labels
    – i.e. it returns a tuple of 4 elements
    – note, the test_size argument specifies the fraction of the data you want to use for testing
    – so if you put 0.5, it means you want to use half the data for testing (and the other half for training obviously)

    – recall that the .predict() method returns a list of predictions for the list of examples you pass it
    – you can use sklearn.metrics.accuracy_score(test_labels, predicted_labels) to compare two lists of labels

    – supervised learning is also known as function approximation because ultimately what you are doing is finding a function that matches your training examples well
    – you start with some general form of the function (e.g. y = mx+b) and then you tune the parameters such that it best describes your training examples (i.e. change m and b until you get a line that best splits your data)

    Key thing to take away from the video:
    Supervised learning is just function approximation. You start with a general function and then tweak the parameters of the function based on your training examples until your function describes the training data well.
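The steps in the summary above can be sketched end to end; the classifier choice and the 50/50 split follow the video, while random_state is added here for reproducibility:

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

iris = load_iris()

# 1) split: half for training, half for testing
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.5, random_state=0)

# 2) fit a classifier on the training half only
clf = KNeighborsClassifier().fit(X_train, y_train)

# 3) predict labels for the held-out examples
predictions = clf.predict(X_test)

# 4) compare predicted labels against the true test labels
acc = accuracy_score(y_test, predictions)
print(acc)
```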

  • Robin Dong

    Josh, you are not only knowledgeable about all this ML, but also an outstanding instructor. You simplified all these complicated methods. Can't thank you enough.

  • Mithilesh Thakkar

    I am getting this error while doing the accuracy check: accuracy_score() missing 1 required positional argument: 'y_pred'

    Can someone help me sort it out?
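That error means accuracy_score was called with only one argument; it takes the true labels first and the predictions second. A minimal example (the label lists here are made up for illustration):

```python
from sklearn.metrics import accuracy_score

y_true = [0, 1, 2, 2]
y_pred = [0, 1, 2, 1]

# Passing only one list raises:
#   accuracy_score() missing 1 required positional argument: 'y_pred'
score = accuracy_score(y_true, y_pred)
print(score)  # 0.75 (3 of 4 labels match)
```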

  • Fennec Besixdouze

    By the way, the cross_validation module has been renamed model_selection. Lesson 0: learn to read the documentation of the modules you use. Stuff changes constantly.
