# Assignment Writing Service: Naive Bayes Classifier

2018-03-15 Source: 51due faculty team Category: More sample essays

Problem Definition

You will be given a movie review corpus in which each review consists of a number of sentences and is associated with a binary label based on its sentiment polarity (0 for negative and 1 for positive). You must write a program that automatically classifies these movie reviews as either positive or negative by implementing a Naive Bayes Classifier.

A Naive Bayes Classifier computes, for each class (positive and negative), the probability that a movie review belongs to that class, and typically chooses the most probable class. To this end, you will need to represent each movie review as a feature vector (you must choose the appropriate features). Please refer to the lecture notes and textbook for these concepts, and explain any design and model choices you make in a report.
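The idea above can be sketched in a few lines of Python. This is only one possible design, not the required one: it assumes bag-of-words features, a very simple lowercase tokenizer, and add-one (Laplace) smoothing, and the names `tokenize`, `train` and `predict` are hypothetical. The assignment leaves the choice of features and model details to you.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens.

    Assumption: plain word counts are used as features.
    """
    return re.findall(r"[a-z0-9]+", text.lower())


def train(documents):
    """Collect per-class document and word counts.

    documents: iterable of (text, label) pairs, with label 0 or 1.
    """
    doc_counts = Counter()
    word_counts = {0: Counter(), 1: Counter()}
    for text, label in documents:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(text))
    vocabulary = set(word_counts[0]) | set(word_counts[1])
    return doc_counts, word_counts, vocabulary


def predict(model, text):
    """Return the label (0 or 1) with the highest log-posterior score."""
    doc_counts, word_counts, vocabulary = model
    total_docs = sum(doc_counts.values())
    best_label, best_score = None, -math.inf
    for label in (0, 1):
        if doc_counts[label] == 0:
            continue  # class never seen in training
        # Log prior: fraction of training documents with this label.
        score = math.log(doc_counts[label] / total_docs)
        # Add-one smoothing: every vocabulary word gets one extra count.
        denominator = sum(word_counts[label].values()) + len(vocabulary)
        for word in tokenize(text):
            if word not in vocabulary:
                continue  # ignore words never seen in training
            score += math.log((word_counts[label][word] + 1) / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

Working with sums of logarithms rather than products of probabilities avoids numeric underflow on long reviews. Skipping unseen words is a design choice of this sketch; other treatments are equally defensible and worth discussing in the report.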

The executable of your program must be named “NaiveBayesClassifier”.

Program Input: You will be given two separate datasets: a training dataset, contained in a file named “training.txt”, and a testing dataset, contained in a file named “testing.txt”. Each of these files will have an arbitrary number of lines, and each line represents one movie review (also referred to as a document). Each line will have the format “D\tL”, where L is the class (label) of the document (either 0 or 1) and D is the text of the document. Note that D might contain several sentences. Here is an example of a line:

I didn’t know this came from Canada, but it is very good. Very good!

where the label of the document is 1 (positive). And here is an example of a negative review:

0 Starts out with a opening scene that is a terrific example of absurd comedy. A formal orchestra audience is turned into an insane, violent mob by the crazy chantings of it’s singers.
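A minimal sketch of reading such lines in Python follows. Because the stated format puts the label after the text while the negative example above shows the label first, this sketch accepts either layout. That tolerance is an assumption of the sketch, as are the hypothetical names `parse_line` and `load_dataset` and the UTF-8 encoding; check the actual data files before relying on it.

```python
def parse_line(line):
    """Split one dataset line into (text, label).

    Returns None when no 0/1 label can be found.
    """
    line = line.rstrip("\r\n")

    # Stated format "D\tL": the label follows the last tab.
    text, tab, label = line.rpartition("\t")
    if tab and label.strip() in ("0", "1"):
        return text.strip(), int(label.strip())

    # Assumed fallback: the label is the first whitespace-separated token.
    parts = line.split(None, 1)
    if len(parts) == 2 and parts[0] in ("0", "1"):
        return parts[1].strip(), int(parts[0])

    return None


def load_dataset(path):
    """Read a dataset file into a list of (text, label) pairs."""
    documents = []
    with open(path, encoding="utf-8") as handle:  # encoding is an assumption
        for line in handle:
            parsed = parse_line(line)
            if parsed is not None:  # silently skip unparseable lines
                documents.append(parsed)
    return documents
```

Using `rpartition` rather than `split` keeps any tab characters inside the review text intact, since only the last tab is treated as the separator.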

Your program should take the path names of the training and testing dataset files (e.g., “training.txt” and “testing.txt”) as command-line arguments.

Here is an example of how you should execute your code, depending on which language you use (assuming that the dataset files are in the same directory as the executable):

• C/C++: ./NaiveBayesClassifier training.txt testing.txt

• Java: java NaiveBayesClassifier training.txt testing.txt

• Python: python NaiveBayesClassifier.py training.txt testing.txt
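For the Python variant, the two path arguments could be read along these lines. This is a sketch only: `parse_paths` and `main` are hypothetical names, the exit code 2 for a usage error is an assumed convention, and the training and classification steps are left as a placeholder comment.

```python
import sys

USAGE = "usage: python NaiveBayesClassifier.py training.txt testing.txt"


def parse_paths(argv):
    """Return (training_path, testing_path) from an argv-style list.

    argv[0] is the script name, so exactly three entries are expected.
    """
    if len(argv) != 3:
        raise ValueError(USAGE)
    return argv[1], argv[2]


def main(argv):
    """Validate the arguments and return a process exit code."""
    try:
        training_path, testing_path = parse_paths(argv)
    except ValueError as error:
        print(error, file=sys.stderr)
        return 2  # assumed convention for a usage error
    # Placeholder: train on training_path, then classify testing_path.
    return 0
```

Calling `sys.exit(main(sys.argv))` under the usual `if __name__ == "__main__":` guard at the bottom of `NaiveBayesClassifier.py` turns this into the script's entry point.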

