########################################################################
## This is a sample R session. To get the most out of it, FIRST read
## Chapter 2 of "An Introduction to R" by Venables and Smith (there's a
## link on the course website).

## Mostly your job is to run the commands below and ponder the
## results. In several places there are questions to answer. Make
## sure that all of your answers are "comments" like this. Comments
## are ignored by R; they are meant for humans to read. Comments
## should begin with two #'s, like this: ##
######################################################################

## CHECK to make sure that Emacs is running in ESS mode. If it is,
## this text will be red (ESS colors comments red) and the gray bar
## at the bottom of this window should read something like this:

## -U:**-  week1.R  Top L13  (ESS[S] [R] Rox) ----------------------

## The key bit is the "(ESS".

## A TRICK FOR KEEPING YOUR
## comments neat and easy to edit is to
## put the cursor anywhere in a block of commented text like this
## one. To format it
## neatly
## as
## a block of comments (MAKE SURE THAT EACH line BEGINS with ##),
## hit the Esc key (release it) and then hit the letter q.

##----------------------------------------------------------------
## Quick review of critical Emacs tricks.
##----------------------------------------------------------------

## Ctrl+g (hereafter C-g) cancels any command that you are in the
## middle of typing. It never hurts to type C-g.

## C-x 1 (that's the numeral one, NOT a lowercase L) closes all Emacs
## windows except the current one. It does not destroy the buffers in
## the other windows; it just makes them go away. You can visit them
## again anytime via the "Buffers" menu item.

## C-x 2 splits the current window into two, both showing the same buffer.

## C-x k kills the current buffer -- casting its contents into
## oblivion -- unless you wrote them to disk first!
## C-x C-s saves the buffer to the file from which it came. If the
## buffer did not come from a file, then use C-x C-w to write the
## contents of the buffer to a new file name.

###----------------------------------------------------------------------
## LAUNCHING an R process inside an Emacs window
###----------------------------------------------------------------------

## BEFORE YOU LAUNCH R -- pause and reflect on the commands to
## split and hide windows and on the Buffers menu that lets you
## revisit them. AFTER you launch R, you will want to have TWO
## windows: one containing this buffer and the other containing the
## running R process. If it doesn't come out that way, then you'll
## need to use the tricks you just learned to make it so.

## Launching an R process inside of Emacs: to launch an R process,
## either type Alt+x R [ENTER] (hereafter M-x R) or click on the
## fancy R button in the Emacs toolbar. NOTE that the "R" icon will
## only show if your cursor is in a window with ESS mode running.

###----------------------------------------------------------------------
## Typing commands in your week1.R buffer and executing them in the R
## process running in the other window
###----------------------------------------------------------------------

## Once you have two Emacs windows, one with the week1.R file's
## contents and the other with an R process, you are ready to do
## science. This is the typical setup that you will use for doing
## assignments in this class and for making original and significant
## contributions to the field.

## THE big happy trick is to put the cursor somewhere in the line of
## code that you would like to execute and then *either* click on
## the "Eval line & step" icon in the toolbar OR hit C-c C-n. The
## effect will be to pass the line that your cursor is on to the R
## process running in the other window, and then advance your cursor
## to the next executable line.
## There are other tricks for passing larger chunks of code to the R
## process, but C-c C-n is the one you'll use most often.

############################################
## Getting Help in R
############################################

## This will open a help application in a browser. The browser-based
## help is ONE way of reading the help screens.
help.start()

## The web-based help facility is only one way of struggling to figure
## out how to do something. Another is C-c C-v, as in:

## C-c C-v seq

## to see what the seq command might be good for.

## Notice that after typing C-c C-v seq your *other* window is now full
## of information on seq. C-c C-v created a new buffer to hold the
## information you requested. Your R process buffer is still available
## and R is still running in it. You can use the Buffers menu to get
## it back, or you can simply execute a line of R code via C-c C-n OR
## by clicking the "Eval line & step" icon in the toolbar.

## All this switching of buffers is disconcerting at first, but you
## will soon grow to love it.

## Another handy way of learning about R commands is the example() function.
example(mean)

#####################################################
## Creating variables and assigning to them
#####################################################

## Execute this line to figure out what it does
1:10

## QUESTION: In the following two lines, what does the "<-" operator
## do? And what does R do when you simply pass it the name of an
## object?
var1 <- 1:10
var1

## The ls() command lists the objects in your workspace
ls()

## Notice that the () indicate that ls is a "function" to be
## executed.
## If you tell R to execute ls *without* the parentheses,
## something altogether different happens:
ls

## QUESTION: Explain what is going on here with ls vs. ls(). If
## your explanation is complete, it should allow you to predict what
## will happen if you type the following:
foo <- ls
foo()
foo2 <- ls()
foo2

#############################################
## OK, TIME TO TAKE A BREAK.
##
## Quit R by typing q() in the R window.
## ALWAYS ELECT NOT to save your workspace;
## Save this buffer/file
## Quit Emacs
## Take a short wholesome break; then come back, open this file and
## pick up where you left off. The reason for doing this is to help
## you remember the procedure for launching Emacs and R.
#############################################

##############################
## Behaviors and properties of "vectors"
##############################

## var1 is a "vector"
var1 <- 1:10
var1

## So is var2. The c() function "concatenates" or links together its
## arguments, thus forming a "vector"
var2 <- c(1, 1.5, pi, log(2), sqrt(2), 2^3)
var2
is.vector(var1)
is.vector(var2)

## var3 is also a vector as far as R is concerned, although most of us
## would call it a "scalar"
var3 <- 22
is.vector(var3)

## QUESTION: Explain this: note that it produces a warning message,
## but it also completes
var1 * var2

#####################################
## Numeric and other types of data
#####################################

## QUESTION: Explain what's going on with the next 4 lines of
## code. Why does the second line produce an error message and not
## complete? What do the quote marks in the output of the last two
## lines tell us?
words <- c(1:10, "11")
words * 10
1:10
words

## A "logical" or "boolean" is an object that takes the value TRUE
## or FALSE, for example:
1:7 <= 3

## Other operators that can be used in "logical" expressions are:
## <, >, <=, >=, ==, !=, %in% (an example of %in% will come up later)

## You can of course assign the value of logical expressions to objects
logical.vector1 <- rnorm(15) > 0  # look up rnorm() or at least guess what it does
logical.vector1

## HEADS UP: = is NOT the same as ==
vect5 <- seq(1, 15, by = 3)
vect5 == 7   ## comparison
vect5 = 7    ## assignment!
vect5

## Unlike character values, TRUE/FALSE values are "coerced" into 1 and
## 0 respectively if the context demands it:
sum(logical.vector1)
mean(logical.vector1) / var(logical.vector1)

## Operators that work on objects with logical values include &
## ("and"), | ("or") and ! ("not"), which negates the logical value
## that follows it
sum(logical.vector1 | (runif(5) > .25))
sum(logical.vector1 & (runif(5) > .5))
sum(logical.vector1 | (! runif(5) > .95))

###############################################
## Missing values, NA
###############################################

## NA is a special value which can fill an element of a vector without
## changing its type:
c(1:3, NA)
## BUT
c(1:3, "NA")
## is something quite different. This can be very confusing when
## reading in data from external sources.

## NAs propagate mercilessly in R. The rule of thumb is that almost any
## operation on NA gives NA.
vect3 <- c(1, 2, 15, NA)
vect3 * 2
mean(vect3)
mean(vect3, na.rm = TRUE)
mean(na.omit(vect3))

#############################################
## OK, TIME TO TAKE ANOTHER BREAK.
##
## Quit R by typing q() in the R window.
## ALWAYS ELECT NOT to save your workspace;
## Save this buffer/file
## Quit Emacs
## Then open this file again and pick up where you left off.
#############################################

#######################
## Selecting elements with [ ]
#######################

## To select or replace specific elements of objects, we can use
## square brackets, [].
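## A minimal sketch of the two jobs [] does, selecting and replacing,
## on a small throwaway vector. The name v is hypothetical and made up
## for this illustration; it is not used anywhere else in this file.
v <- c(5, 10, 15, 20)
v[2]          ## select one element: 10
v[c(1, 4)]    ## select several elements: 5 20
v[2] <- 100   ## replace an element in place
v             ## now 5 100 15 20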
length(vect3)

## If this produced an error (object 'vect3' not found), it would be
## because your vect3 no longer exists. That is supposed to happen
## when you quit R and elect NOT to save your workspace. This is
## absolutely a feature and not a bug. If your habit were to save
## your workspace, then someday an object called "vect3" -- but from an
## entirely different project -- would wind up getting used in a
## critical calculation... won't you be bummed then! It's easy to
## scroll up to the command that created vect3 and re-execute it.
vect3[3]
vect3[4]
vect3[7]

## QUESTION: Do you think this is fair? Why or why not?
vect3[4] == vect3[7]

## Square brackets are much more versatile than you might think
vect4 <- seq(0, 100, by = 3)
vect4

## Positive integers are the most common expressions inside []
vect4[1:3]
vect4[3:1]
vect4[seq(1, length(vect4), by = 3)]

## Logical expressions can be used inside []
vect4[vect4 > 85]
vect4[(vect4 / 2) %in% c(12, 15)]

## but be careful when using logical expressions inside []
vect4[vect4 > 6]

## QUESTION: Explain the difference between the following two
## expressions and why R behaves as it does:
vect4[(vect4/3) > 10 +0]
vect4[((vect4/3) > 10) +0]

## Negative integers can also be used in [] -- to exclude certain elements
vect4[-1:-5]
vect4[-length(vect4)]

## Square brackets can also be used to assign new values to elements
## of objects.
vect4[1:3] <- 999
vect4
vect4 <- sqrt(vect4 - 18)
## The warning message appears because negative numbers have no real
## square roots; sqrt() returns NaN for them.
vect4
vect4[is.na(vect4)] <- 0

## Vectors (and other objects) can have "names". That is, each element
## of a vector can be associated with a character string, which can
## be used in square brackets to select or assign new values to that
## element. Later in the course, this will turn out to be surprisingly useful.
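## A minimal sketch of two equivalent ways of attaching names, using a
## small hypothetical vector (temps, temps2 and the day names are made
## up for this illustration): names can be given inside c(), or
## assigned afterwards with names() <- .
temps <- c(mon = 12.5, tue = 14.0, wed = 9.5)
temps2 <- c(12.5, 14.0, 9.5)
names(temps2) <- c("mon", "tue", "wed")
identical(temps, temps2)  ## TRUE: same values, same names
temps["tue"]              ## select an element by name
temps["wed"] <- 10        ## assign a new value by name
temps[c("mon", "wed")]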
vect5 <- c(mean(vect4, na.rm = TRUE),
           var(vect4, na.rm = TRUE),
           median(vect4, na.rm = TRUE),
           sum(vect4, na.rm = TRUE),
           min(vect4, na.rm = TRUE),
           max(vect4, na.rm = TRUE))
names(vect5) <- c("mean", "var", "median", "sum", "min", "max")
vect5["var"]
vect5["mean"] / sqrt(vect5["var"])

#############################################
## Higher dimensional data objects
#############################################

## Vectors are the most basic data object in R. They have only the two
## "intrinsic" attributes that all objects must have: length and
## mode. We have already seen both. Length is the number of elements,
## and "mode" refers to whether the data object contains numeric,
## logical, or character values.

## Arrays are simply vectors with an additional attribute: "dim", for
## dimensions.

## A 2-dimensional array is generally called a matrix
mat1 <- array(data = 1:100, dim = c(20, 5))
mat1

## HEADS UP -- even though the row dimension is the *first* dimension,
## R fills 2-dimensional arrays by column. Another way of thinking of
## this is that the leftmost subscript changes fastest.
mat2 <- array(data = 1:40, dim = c(4, 5, 2))
mat2

## Since arrays are just vectors with a "dim" attribute:
dim(mat2)
## we can change vectors to arrays and back again by changing the "dim" attribute.
dim(mat2) <- c(10, 4)
mat2
is.array(mat2)
dim(mat2) <- NULL
mat2
is.array(mat2)
## and back to where we started
dim(mat2) <- c(4, 5, 2)
is.array(mat2)

### Square brackets can be used to select and assign to elements of
### arrays. To specify an element of a 3-dimensional array, give the
### index number for each dimension, separated by commas.
mat2[3, 1, 2]
## Print mat2 to verify that this makes sense:
mat2
## We can of course also assign to arrays
mat2[3, 1, 2] <- 99
mat2

## QUESTION: What does it mean that you can also reference array
## elements with a single index?
mat2[23]

#############################################
## cbind() and rbind() are convenient ways of combining vectors and
## arrays (by Column or Row, respectively) into other arrays
#############################################
cbind(0:5, 10:15, 20:25, 30:35, 40:45, 50:55)
rbind(0:5, 10:15, 20:25, 30:35, 40:45, 50:55)

## rbind() and cbind() also take arrays as arguments
rbind(cbind(0:5, 10:15, 20:25, 30:35, 40:45, 50:55),
      rbind(0:5, 10:15, 20:25, 30:35, 40:45, 50:55))

#############################################
## The "factor" is a particularly mischievous type of object. Its
## purpose is to represent categorical data such as eye color, or
## "strongly agree" vs. "agree" vs. "disagree". In some cases the
## categories have a natural order; other times they do not. Before
## version 4.0.0, R *frequently* tried to convert vectors of character
## strings into factors (data.frame() and read.table() did so by
## default), on the assumption that you are more likely to be
## estimating statistical models than writing poetry, so character
## strings are more likely to be categorical data than simply
## words. Newer versions of R no longer do this by default, but
## factors still turn up often, and you'll get bitten by this bug a
## few times.
#############################################

## Let's create a factor and see what it does.

## This brings some harmless data objects into the workspace. There
## are several data sets like this for learning and testing R.
data(state)
state.name   ## one of the data objects in "state"; it is just a vector
             ## of character strings holding the names of the states
state.factr <- factor(state.name)
state.factr  ## note the lack of quotation marks and the "50 Levels"
attributes(state.name)
attributes(state.factr)

## What's mischievous about factors is that they often behave like
## vectors of character strings (to trick you into thinking that
## that's what they are).
state.factr == state.name
sum(state.factr == "Montana")
sum(state.name == "Montana")

## But once you think you know what's going on...
## it sneaks up and misbehaves:
state.factr[2]
state.factr[2] <- "Palin Land"
state.factr[2]
levels(state.factr)
sum(state.factr == "Alaska")

## It's even worse when the levels of your factor look like
## numbers.
age.num <- trunc(runif(n = 10000, min = 0, max = 90))  ## look up runif() and trunc()
age.factr <- factor(age.num)
head(age.num)   ## head() shows the first 6 elements of a vector or
                ## the first 6 rows of a matrix
head(age.factr)
mean(age.num)
mean(age.factr)

## When this happens, one may resort to the as.numeric() function to
## convert the factor back into numbers:
mean(as.numeric(age.factr))
## Look carefully at the result.

## QUESTION: Why does R do this?
sum(age.num == age.factr)
sum(age.num == as.numeric(age.factr))
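## For comparison once you have answered the question above: R's own
## help page for factors (see ?factor) recommends converting a factor
## whose levels look like numbers with as.numeric(levels(f))[f], or
## the slightly slower as.numeric(as.character(f)). A minimal sketch on
## a small hypothetical factor (f and f.num are made up here):
f <- factor(c(10, 20, 20, 35))
levels(f)                      ## "10" "20" "35"
as.numeric(f)                  ## the internal codes: 1 2 2 3
as.numeric(as.character(f))    ## the original values: 10 20 20 35
f.num <- as.numeric(levels(f))[f]
f.num                          ## also 10 20 20 35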