Monday, May 11, 2020

How to Split Strings in Ruby

Unless user input is a single word or number, that input will need to be split, or turned into a list of strings or numbers. For instance, if a program asks for your full name, including middle initial, it will first need to split that input into three separate strings before it can work with your individual first, middle and last name. This is achieved using the String#split method.

How String#split Works

In its most basic form, String#split takes a single argument: the field delimiter as a string. This delimiter will be removed from the output, and an array of strings split on the delimiter will be returned.

So, in the following example, assuming the user input their name correctly, you should receive a three-element Array from the split.

    #!/usr/bin/env ruby
    print "What is your full name? "
    full_name = gets.chomp
    name = full_name.split(' ')
    puts "Your first name is #{name.first}"
    puts "Your last name is #{name.last}"

If we run this program and enter a name, we'll get some expected results. Also, note that name.first and name.last are not special name-parsing methods. The name variable is an Array, and those two method calls are equivalent to name[0] and name[-1] respectively.

    $ ruby split.rb
    What is your full name? Michael C. Morin
    Your first name is Michael
    Your last name is Morin

However, String#split is a bit smarter than you'd think. If the argument to String#split is a string, it does indeed use that as the delimiter, but if the argument is a string with a single space (as we used), then it infers that you want to split on any amount of whitespace and that you also want to ignore any leading whitespace. So, if we were to give it some slightly malformed input such as "  Michael   C.  Morin" (with extra spaces), then String#split would still do what is expected. However, that's the only special case when you pass a String as the first argument.

Regular Expression Delimiters

You can also pass a regular expression as the first argument. Here, String#split becomes a bit more flexible.
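To make the single-space special case concrete, here is a small sketch comparing it with two regular-expression delimiters on the same padded input. The sample string is made up for illustration, and the comments assume the default limit, under which trailing empty strings are dropped in every case.

```ruby
# Compare the single-space special case with two Regexp delimiters.
# The sample input is hypothetical and deliberately padded with extra spaces.
messy = "  Michael   C.  Morin "

# A single-space String: split on runs of whitespace, ignoring leading whitespace.
awk_style = messy.split(" ")

# A Regexp matching exactly one space: every space is a delimiter,
# so runs of spaces produce empty strings.
one_space = messy.split(/ /)

# A Regexp matching runs of whitespace: no empty strings in the middle,
# but the leading whitespace still yields one empty string at the front.
runs = messy.split(/\s+/)

p awk_style  # => ["Michael", "C.", "Morin"]
p one_space  # => ["", "", "Michael", "", "", "C.", "", "Morin"]
p runs       # => ["", "Michael", "C.", "Morin"]
```

Only the single-space string gives a clean three-element array here; even /\s+/ leaves a leading empty string, which is why the special case is handy for raw user input.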
We can also make our little name splitting code a bit smarter. We don't want the period at the end of the middle initial. We know it's a middle initial, and the database won't want a period there, so we can remove it while we split. When String#split matches a regular expression, it does exactly the same thing as if it had just matched a string delimiter: it removes the match from the output and splits the string at that point. So, we can evolve our example a little bit:

    $ cat split.rb
    #!/usr/bin/env ruby
    print "What is your full name? "
    full_name = gets.chomp
    # Split on whitespace, swallowing an optional period before it
    name = full_name.split(/\.?\s/)
    puts "Your first name is #{name.first}"
    puts "Your middle initial is #{name[1]}"
    puts "Your last name is #{name.last}"

Default Field Separator

Ruby is not really big on the special variables you might find in languages like Perl, but String#split does use one you need to be aware of. This is the default field separator variable, also known as $;. It's a global, something you don't often see in Ruby, so if you change it, it might affect other parts of the code; just be sure to change it back when finished. However, all this variable does is act as the default value for the first argument to String#split. By default, this variable is set to nil. If String#split's first argument is omitted or nil, it uses $;, and if $; is also nil, it splits as though it had been given a single space string.

Zero-Length Delimiters

If the delimiter passed to String#split is a zero-length string or regular expression, then String#split will act a bit differently. It will remove nothing at all from the original string and will split between every character. This essentially turns the string into an array with the same length as the string, containing only one-character strings, one for each character in the string. This can be useful for iterating over the string, and it was used in Ruby versions before 1.8.7 (which backported a number of features from 1.9.x) to iterate over the characters in a string without worrying about breaking up multi-byte Unicode characters.
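Here is a short sketch checking two claims from this part of the post: that an omitted or nil delimiter falls back to whitespace splitting while $; is left at its default of nil, and that zero-length delimiters split per character rather than per byte. The sample strings are hypothetical, and the sketch assumes Ruby 2.0 or later, where source files are UTF-8 by default.

```ruby
# Assumes $; has not been changed from its default value of nil.
padded = "  a  b "

# With no argument, or nil, split falls back to the single-space behavior.
p padded.split == padded.split(" ")   # => true
p padded.split(nil) == ["a", "b"]     # => true

# Zero-length delimiters: nothing is removed; the string is split
# between every pair of characters.
word = "naïve" # the "ï" is one character but two bytes in UTF-8

by_string = word.split("")   # zero-length String delimiter
by_regex  = word.split(//)   # zero-length Regexp delimiter

p by_string == ["n", "a", "ï", "v", "e"]  # => true
p by_regex == by_string                   # => true
p by_string.length == word.length         # => true (5 characters)
p word.bytesize                           # => 6
```

The sketch never assigns to $; itself, so it does not risk affecting other code that relies on the default.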
However, if what you really want to do is iterate over a string, and you're using 1.8.7 or 1.9.x, you should probably use String#each_char instead. For comparison, here is the zero-length split approach, printing one character per line:

    #!/usr/bin/env ruby
    str = "She turned me into a newt!"
    # With 1.8.7 or later, str.each_char { |c| puts c } prints the same thing
    str.split('').each do |c|
      puts c
    end

Limiting The Length of the Returned Array

So, back to our name parsing example: what if someone has a space in their last name? For instance, Dutch surnames can often begin with "van" (meaning "of" or "from"). We only really want a 3-element array, so we can use the second argument to String#split that we have so far ignored. The second argument is expected to be a Fixnum (an Integer in Ruby 2.4 and later). If this argument is positive, at most that many elements will be filled in the array, and the last element holds the rest of the string, unsplit. So in our case, we would want to pass 3 for this argument.

    #!/usr/bin/env ruby
    print "What is your full name? "
    full_name = gets.chomp
    # Split into at most 3 parts; everything after the second split stays together
    name = full_name.split(/\.?\s/, 3)
    puts "Your first name is #{name.first}"
    puts "Your middle initial is #{name[1]}"
    puts "Your last name is #{name.last}"

If we run this again and give it a Dutch name, it will act as expected.

    $ ruby split.rb
    What is your full name? Vincent Willem van Gogh
    Your first name is Vincent
    Your middle initial is Willem
    Your last name is van Gogh

However, if this argument is negative (any negative number), then there will be no limit on the number of elements in the output array, and any trailing delimiters will produce zero-length strings at the end of the array. This is demonstrated in this IRB snippet:

    irb(main):001:0> "this,is,a,test,,,,".split(',', -1)
    => ["this", "is", "a", "test", "", "", "", ""]
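The sketch below runs the limit cases side by side on the string from the IRB snippet, plus the two names used in this post, so a zero (default), positive and negative limit can be compared directly. The variable names are illustrative only.

```ruby
fields = "this,is,a,test,,,,"

# Default limit (0): trailing empty strings are dropped.
p fields.split(",")       # => ["this", "is", "a", "test"]

# Positive limit: at most that many elements; the last one holds the rest.
p fields.split(",", 2)    # => ["this", "is,a,test,,,,"]

# Negative limit: no cap on elements, and trailing empty strings are kept.
p fields.split(",", -1)   # => ["this", "is", "a", "test", "", "", "", ""]

# The name-parsing Regexp with a limit of 3 keeps "van Gogh" together.
name_pattern = /\.?\s/
p "Michael C. Morin".split(name_pattern, 3)         # => ["Michael", "C", "Morin"]
p "Vincent Willem van Gogh".split(name_pattern, 3)  # => ["Vincent", "Willem", "van Gogh"]
```

Note that the positive limit does not strip anything from the final element: the leftover delimiters in "is,a,test,,,," are kept as-is.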
